stringr 1.5.0

@hadley hadley released this 04 Dec 17:34

Breaking changes

  • stringr functions now consistently implement the tidyverse recycling rules
    (#372). There are two main changes:

    • Only vectors of length 1 are recycled. Previously, (e.g.)
      str_detect(letters, c("x", "y")) worked, but it now errors.

    • str_c() ignores NULLs, rather than treating them as length 0
      vectors.

    Additionally, many more arguments now throw errors, rather than warnings,
    if supplied the wrong type of input.

  • regex() and friends now generate class names with a stringr_ prefix (#384).

  • str_detect(), str_starts(), str_ends() and str_subset() now error
    when used with either an empty string ("") or a boundary(). These
    operations didn't really make sense (str_detect(x, "") returned TRUE
    for all non-empty strings) and made it easy to make mistakes when programming.
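
The breaking changes above can be sketched as follows. This is a minimal illustration that assumes stringr 1.5.0 or later is installed; the variable names are hypothetical, and tryCatch() is used only so the script keeps running past the calls that now error.

```r
library(stringr)

# A length-1 pattern is still recycled across the whole input.
recycled <- str_detect(c("apple", "banana"), "an")

# A length-2 pattern against 26 strings is no longer recycled; it errors.
mismatch <- tryCatch(
  {
    str_detect(letters, c("x", "y"))
    "no error"
  },
  error = function(e) "error"
)

# str_c() ignores NULL inputs rather than treating them as length 0 vectors.
joined <- str_c("a", NULL, "b")

# An empty pattern is now rejected by str_detect().
empty_pattern <- tryCatch(
  {
    str_detect("abc", "")
    "no error"
  },
  error = function(e) "error"
)
```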

New features

  • Many tweaks to the documentation to make it more useful and consistent.

  • New vignette("from-base") by @sastoudt provides a comprehensive comparison
    between base R functions and their stringr equivalents. It's designed to
    help you move to stringr if you're already familiar with base R string
    functions (#266).

  • New str_escape() escapes regular expression metacharacters, providing
    an alternative to fixed() if you want to compose a pattern from user
    supplied strings (#408).

  • New str_equal() compares two character vectors using unicode rules,
    optionally ignoring case (#381).

  • str_extract() can now optionally extract a capturing group instead of
    the complete match (#420).

  • New str_flatten_comma() is a special case of str_flatten() designed for
    comma separated flattening. It handles the Oxford comma correctly,
    dropping it when there are only two elements (#444).

  • New str_split_1() is tailored for the special case of splitting up a single
    string (#409).

  • New str_split_i() extracts a single piece from a string (#278, @bfgray3).

  • New str_like() allows the use of SQL wildcards (#280, @rjpat).

  • New str_rank() to complete the set of order/rank/sort functions (#353).

  • New str_sub_all() to extract multiple substrings from each string.

  • New str_unique() is a wrapper around stri_unique() and returns unique
    string values in a character vector (#249, @seasmith).

  • str_view() uses ANSI colouring rather than an HTML widget (#370). This
    works in more places and requires fewer dependencies. It includes a number
    of other small improvements:

    • It no longer requires a pattern so you can use it to display strings with
      special characters.
    • It highlights unusual whitespace characters.
    • It's vectorised over both string and pattern (#407).
    • It defaults to displaying all matches, making str_view_all() redundant
      (and hence deprecated) (#455).

  • New str_width() returns the display width of a string (#380).

  • stringr is now licensed as MIT (#351).
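
A few of the new functions listed above, shown in a short sketch. It assumes stringr 1.5.0 or later; the inputs and variable names are made up for illustration, and only a subset of the new functions is covered.

```r
library(stringr)

# str_escape(): treat a user-supplied string literally inside a regex.
dot_only <- str_detect(c("a", "."), str_escape("."))

# str_equal(): compare strings, optionally ignoring case.
same <- str_equal("a", "A", ignore_case = TRUE)

# str_extract(): pull out one capturing group instead of the whole match.
value <- str_extract("key=value", "(\\w+)=(\\w+)", group = 2)

# str_flatten_comma(): the comma in `last` is dropped for two elements.
three <- str_flatten_comma(c("a", "b", "c"), last = ", and ")
two <- str_flatten_comma(c("a", "b"), last = ", and ")

# str_split_1(): split a single string into a character vector.
pieces <- str_split_1("a-b-c", "-")

# str_split_i(): take just the i-th piece.
second <- str_split_i("a-b-c", "-", 2)

# str_like(): SQL-style wildcards, where % matches any run of characters.
liked <- str_like("apple", "app%")

# str_unique(): unique values of a character vector.
uniq <- str_unique(c("a", "b", "a"))
```

Note that str_split_1() returns a plain character vector, so its result can be indexed directly without unpacking a list first.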

Minor improvements and bug fixes

  • Better error message if you supply a non-string pattern (#378).

  • A new data source for sentences has fixed many small errors.

  • str_extract() and str_extract_all() now work correctly when pattern
    is a boundary().

  • str_flatten() gains a last argument that optionally overrides the
    final separator (#377). It also gains a na.rm argument to remove missing
    values (since it's a summary function) (#439).

  • str_pad() gains a use_width argument to control whether to use the total
    code point width or the number of code points as "width" of a string (#190).

  • str_replace() and str_replace_all() can use standard tidyverse formula
    shorthand for replacement function (#331).

  • str_starts() and str_ends() now correctly respect regex operator
    precedence (@carlganz).

  • str_wrap() breaks only at whitespace by default; set
    whitespace_only = FALSE to return to the previous behaviour (#335, @rjpat).

  • word() now returns the whole sentence when using a negative start parameter
    whose absolute value is greater than or equal to the number of words
    (@pdelboca, #245).
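
Two of the improvements above, sketched under the assumption that stringr 1.5.0 or later is installed. The inputs and variable names are illustrative only.

```r
library(stringr)

# str_flatten(): `last` overrides the final separator.
listed <- str_flatten(c("a", "b", "c"), collapse = ", ", last = ", and ")

# str_flatten(): `na.rm = TRUE` drops missing values before collapsing.
no_na <- str_flatten(c("a", NA, "b"), collapse = "-", na.rm = TRUE)

# str_replace_all(): the replacement can be a formula, where .x is the match.
shouted <- str_replace_all("abc", "[ab]", ~ toupper(.x))
```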