tidyverse / stringr

A fresh approach to string manipulation in R

Home Page: https://stringr.tidyverse.org

License: Other

stringr's Issues

New Feature Request - fuzzywuzzy-style string matching/scoring

For a while now, I've wanted a way to use fuzzywuzzy in R. I've even tried installing R-Python translators, to no avail. If stringr could include any part of this type of functionality, it would make my life much, much easier.

str_replace and str_replace_all should take functions

They should accept a function as the third argument, instead of only fixed text.

unexpected behavior when passing "\" to str_sub

Issue:

I am not sure if this is expected behavior, but I was testing out some possible solutions to a question on Stack Overflow and got the following behavior:

Apologies in advance for abusing the function. I hope it helps.

Examples:

> str_length("\123")
[1] 1
> str_sub("\123", -1)
[1] "S"
> str_sub("\123", -20)
[1] "S"
> str_sub("\123123", end = -11)
[1] ""
> str_sub("\123123", end = -1)
[1] "S123"
> str_sub("\001", end = -1)
[1] "\001"
> str_sub("\001", end = -0)
[1] ""
> str_sub("\001", end = -2)
[1] ""
> str_sub("\001", end = 1)
[1] "\001"
> str_sub("\001", end = 6)
[1] "\001"
> str_sub("\00001", end = 6)
Error: embedded nul in string: '\001'

related feature request:

Options to simplify handling escapes would be a great feature; e.g. see these other SO questions:

Can R paste() output “\”?

Replacing escaped double quotes by double quotes in R

How to gsub('%', '%', … in R?

Problem with str_detect and perl pattern

This function is based on grepl, but in the str_detect interface I can't find a way to set the options that choose between extended POSIX and Perl regular expressions, i.e. perl = FALSE, value = FALSE.

Suppose I need to detect a simple string like:

s1 <- paste("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", sep="")

When I try to use:

isDetect <- str_detect("string to match", s1)

I get this error:

Error in grepl("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", "Sd",
fixed = FALSE, :
invalid regular expression '(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2'
In addition: Warning message:
In grepl("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", "Sd", fixed = FALSE, :
regcomp error: 'Invalid regexp'

So I have to use the standard grepl function instead:

isDetect <- grepl(s1, "string to match", TRUE, TRUE)

I think it would be useful for str_detect to have parameters like perl = FALSE, value = FALSE.

Not working for $

This could be a totally dumb question, but I am trying to strip out prices from a set of strings using the position of a "$". However, str_detect("Jokesonme", "$") gives me TRUE even though there is no "$" in the string.
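
A likely explanation, sketched here with base R's grepl rather than stringr itself: in a regular expression, "$" is the end-of-string anchor, so it matches every string. To find a literal dollar sign it has to be escaped or matched as fixed text (in stringr that would presumably be str_detect(x, "\\$") or str_detect(x, fixed("$"))). The sample strings are made up for illustration.

```r
# Hypothetical sample data, not taken from the original report.
x <- c("Jokesonme", "Price: $5")

# An unescaped "$" is the end-of-string anchor, so it matches everything.
anchor_match <- grepl("$", x)

# Escaping the dollar sign, or matching it as fixed text, finds a literal "$".
escaped_match <- grepl("\\$", x)
fixed_match <- grepl("$", x, fixed = TRUE)

print(anchor_match)
print(escaped_match)
print(fixed_match)
```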

documentation typo, "regexp" should be "regex"

In ?perl and the deprecation message that prints when you use perl, regexp is referred to instead of regex.

current version of stringr requires R 2.11.0 (for vapply)

The DESCRIPTION file should indicate this dependency.

Recommendation: exact match modifier

I recommend adding an exact match modifier like perl, fixed and ignore.case.

The exact modifier should match only on exact matches unlike fixed which matches on part of the string. Although this can be done using ==, the exact modifier would allow developers a parallel idiom to switch between exact and less-exact matchings.

An alternative, of course, is to use perl with a pattern wrapped between ^ and $, but this solution requires applying a function to the pattern and not the string, thus breaking the parallel construction.

Internally, this could use the perl construct described in the preceding paragraph or use the ==, which should be faster.

This could probably be implemented mostly within the re_call function.

str_wrap Bug with Empty String Input

There was a bug introduced in the latest version of stringr in the str_wrap function. In previous versions (stringr_0.6.2), if an empty string was passed as input, the function worked fine, but stringr_1.0.0 throws an error.

Example Code

> library(stringr)
> str_wrap("",width=5)

Error in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE) : 
  argument `...` should be a character vector (or an object coercible to)

Session Info

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] graphics  grDevices datasets  stats     utils     methods   base     

other attached packages:
[1] stringr_1.0.0

loaded via a namespace (and not attached):
[1] magrittr_1.5  stringi_0.4-1 tools_3.1.1

check fails

Hi Hadley!

Trying to add my two little functions to stringr (as discussed by mail some time ago), I found the following problem when checking the original version first:
If I build stringr using RStudio, everything works as expected, but the check fails, throwing the following error:

==> roxygenize('.', roclets=c('rd', 'collate', 'namespace'))

* checking for changes ... ERROR

Error in stri_replace_all_regex(string, pattern, replacement, vectorize_all = vec,  : 
  Missing closing bracket on a bracket expression. (U_REGEX_MISSING_CLOSE_BRACKET)

Using the command line R CMD check fails with this output:

...
* checking examples ... ERROR
Running examples in ‘stringr-Ex.R’ failed
The error most likely occurred in:

> ### Name: case
> ### Title: Convert case of a string.
> ### Aliases: case str_to_lower str_to_title str_to_upper
>
> ### ** Examples
>
> dog <- "The quick brown dog"
> str_to_upper(dog)
[1] "THE QUICK BROWN DOG"
> str_to_lower(dog)
[1] "the quick brown dog"
> str_to_title(dog)
Error in stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale)) :
  The requested ICU resource cannot be found. Possible problem: ICU data has not been downloaded yet. Call `stri_install_check()`. (U_MISSING_RESOURCE_ERROR)
Calls: str_to_title -> stri_trans_totitle -> .Call
Execution halted

Starting R to check the mentioned possible problem gives

> library(stringi)
> stri_install_check()
stringi_0.5.1; en_US.UTF-8; ICU4C 51.2; Unicode 6.2
All tests completed successfully.
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stringi_0.5-1

loaded via a namespace (and not attached):
[1] tools_3.1.1

Cheers,
Gerhard

str_pad Cannot Deal With NAs

This does not work, although it should return the missing value unchanged.

str_pad(c("hello", NA), 8)
Error in rep.int(string[i], times[i]) : invalid 'times' value

Not sure about whether other stringr functions are affected by this bug as well.

Lookaheads might be unsupported?

I'm trying to pull the first 2 characters before an underscore out of a string, if an underscore exists. So, for:

mystr <- "cp_awesome"

I'm just trying to get "cp"

> str_extract(mystr, "[a-z]{2}(?=_)")
Error in regexpr("[a-z]{2}(?=_)", "cp_awesome", fixed = FALSE,  : 
  invalid regular expression '[a-z]{2}(?=_)', reason 'Invalid regexp'

fails, but

> str_extract(mystr, "[a-z]{2}(?:_)")
[1] "cp_"

This succeeds, but (?:_) acts as a non-capturing group rather than a lookahead, so the underscore is included in the match. stringr seems to be rejecting the (?=) syntax.
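
For comparison, here is a minimal base R sketch, assuming the error above comes from R's default regex engine (the message shows a plain regexpr call), which does not support lookaheads. With perl = TRUE the same pattern is accepted; whether the stringr version in the report exposes this through its perl() modifier is not confirmed here.

```r
mystr <- "cp_awesome"

# perl = TRUE switches to the PCRE engine, which understands (?=...).
m <- regexpr("[a-z]{2}(?=_)", mystr, perl = TRUE)

# The lookahead requires the underscore but does not consume it.
prefix <- regmatches(mystr, m)
print(prefix)
```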

Vectorisation problems

(From Stavros)

Your email said "all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters".

The doc for str_extract does not reflect this change; it says: "'pattern' should be a single pattern", though in fact it does vectorize over pattern:

> str_extract(c('abc'),c('.','..'))
[1] "a"  "ab"

On the other hand, str_extract_all is buggy:

> str_extract_all(c('abcd'),c('.','..'))
[[1]]
[1] "a" "b" "c" "d"          <<<<<<<< what happened to the matches for '..'?

But when we duplicate the string part, we get the correct result:

> str_extract_all(rep(c('abcd'),2),c('.','..'))
[[1]]
[1] "a" "b" "c" "d"

[[2]]
[1] "ab" "cd"

In str_match, the doc says that pattern should be a single pattern, and I get an error message if it isn't, but the result seems to use both patterns:

> str_match(c('abc','xy'),c('(.)','(..)'))
     [,1] [,2]
[1,] "a"  "a" 
[2,] "xy" "xy"
Warning messages:
1: In if (n == 0) { :
  the condition has length > 1 and only the first element will be used
2: In seq_len(n) : first element used of 'length.out' argument

str_lower, str_upper, str_capitalise, and str_CamelCase

Please add functions to convert a character string to all lowercase, all UPPERCASE, all First Letters Of Words In Capitalized Case and all camelCase. You could call the functions: str_lower, str_upper, str_capitalise, and str_CamelCase.

The first two are more straightforward and should be modeled on the tolower() and toupper() in base R. The last two are more tricky to get right. One source of inspiration could be the tocamel() function in the development version of the 'rapport' package: https://github.com/Rapporter/rapport/tree/development . The associated issues have been partially discussed on r-help: http://r.789695.n4.nabble.com/how-to-transform-string-to-quot-Camel-Case-quot-td4664222.html

Should you decide to take the 'rapport' approach and merge str_capitalise and str_CamelCase into one function, then you could call it str_camel.
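
To make the request concrete, here is a rough base R sketch of the four conversions. It is only an illustration of the idea, not the rapport implementation, and the variable names are hypothetical. It relies on the Perl-style \\U replacement escape, so it handles ASCII letters only.

```r
x <- "the quick brown dog"

lower <- tolower(x)
upper <- toupper(x)

# Capitalise the first letter of every word: \\U upper-cases the captured
# letter (Perl-style replacement, so perl = TRUE is required).
capitalised <- gsub("\\b([a-z])", "\\U\\1", lower, perl = TRUE)

# camelCase: drop each run of non-alphanumerics and upper-case what follows.
camel <- gsub("[^[:alnum:]]+([[:alnum:]])", "\\U\\1", lower, perl = TRUE)

print(c(lower, upper, capitalised, camel))
```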

word function

word() grabs words from char strings. For example:
str = 'abc.123.999..'
word(str, 1, delim='.') would return 'abc'
word(str, 2, delim='.') would return '123'
word(str, -1, delim='.') would return '999'

suggested by David Cooper
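
A minimal sketch of how such a function might behave, written in base R. The name word_sketch and the choice to ignore empty pieces (so that -1 returns '999' despite the trailing dots) are assumptions made to match the examples above; it handles a single string only.

```r
# Hypothetical helper: return the n-th delimiter-separated word of one string.
word_sketch <- function(string, n, delim = " ") {
  # Split on the literal delimiter (fixed = TRUE, so "." is not a regex).
  pieces <- strsplit(string, delim, fixed = TRUE)[[1]]
  # Drop empty pieces produced by repeated or trailing delimiters.
  pieces <- pieces[nzchar(pieces)]
  # Negative positions count from the end.
  if (n < 0) n <- length(pieces) + n + 1
  pieces[n]
}

s <- "abc.123.999.."
print(word_sketch(s, 1, delim = "."))
print(word_sketch(s, -1, delim = "."))
```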

Verify str_split_fixed behaviour

On empty strings and zero-length character vectors

str_trim(character(0)) should return "", not character(0)

Ref: r-lib/pkgdown#49

Add Citation to the package.

Hi @hadley ,
Could you please add a citation to the package? Although I could do it myself (both the correct citation and the pull request), I think it would be better to wait for you.

In case I should do it, let me know. Thanks.

str_match with non-capturing groups is broken in release version

This minimal case demonstrates the problem (a bunch of non-capturing groups have been appended to make the problem obvious). The problem only occurs when at least one string does not match.

library(magrittr)
library(stringr)

x <- c("A_B_C", "THIS DOES NOT MATCH")
matcher <- regexec("(A)_(B)_(?:C)", x)
matches <- regmatches(x, matcher) %>% print
x %>% str_match("(A)_(B+)_(?:C)(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?")

The last line produces this result:

> x %>% str_match("(A)_(B+)_(?:C)(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?")
     [,1]    [,2] [,3] [,4]    [,5] [,6] [,7]    [,8] [,9] [,10]   [,11]
[1,] "A_B_C" "A"  "B"  "A_B_C" "A"  "B"  "A_B_C" "A"  "B"  "A_B_C" "A"  
[2,] NA      NA   NA   NA      NA   NA   NA      NA   NA   NA      NA   
Warning message:
In rbind(c("A_B_C", "A", "B"), c(NA_character_, NA_character_, NA_character_,  :
  number of columns of result is not a multiple of vector length (arg 1)

The bug is in these lines:

    tmp <- str_replace_all(pattern, "\\\\\\(", "")
    n <- str_length(str_replace_all(tmp, "[^(]", "")) + 1

which attempt to count the number of capture groups, but fail to exclude non-capturing groups. (Thinking about it, they probably also fail to include a capture group preceded by an even number of backslashes.)

I realize this code has all been replaced by stringi in the devel version, but if you're still maintaining the release version, it would be good to fix this.
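
One possible way to count groups more carefully is sketched below in base R. The helper name is hypothetical and this is not the stringr code. It counts every "(" that is not preceded by a backslash and not followed by "?". It still has known gaps: it miscounts parentheses inside character classes, groups preceded by an even number of backslashes, and named groups.

```r
# Hypothetical helper: approximate the number of capturing groups in a pattern.
count_capture_groups <- function(pattern) {
  # Regex: "(" with no backslash before it and no "?" after it.
  hits <- gregexpr("(?<!\\\\)\\((?!\\?)", pattern, perl = TRUE)[[1]]
  # gregexpr() reports -1 when nothing matched.
  if (hits[1] == -1) 0L else length(hits)
}

print(count_capture_groups("(A)_(B+)_(?:C)(?:.)?(?:.)?"))
```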

Bug in str_subset / fixed

> str_subset("I", fixed("i", ignore_case = TRUE))
character(0)

I was expecting to get "I", not an empty character vector.

NA treatment

Often I want str_c (and friends) to behave correspondingly with e.g. sum.

I.e. sum(NA, 2) yields NA and I can do sum(NA, 2, na.rm = T) to get 2.
str_c(NA,2) yields NA2. The shortest way around this, that I've found to yield NA is
df[ ,strung_together := ifelse( any( is.na(col1), is.na(col2) ), NA, str_c(col1, col2))]

So, it would be cool to get str_c(col1, col2, na.rm = F) = NA.
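
The requested behaviour could look roughly like this base R sketch. The function name concat_na and its na.rm argument are hypothetical, chosen to mirror sum(); it is not a stringr function.

```r
# Hypothetical helper: concatenate two vectors element-wise, with sum()-like
# NA handling.
concat_na <- function(a, b, na.rm = FALSE) {
  a <- as.character(a)
  b <- as.character(b)
  if (na.rm) {
    # Treat missing values as empty strings.
    a[is.na(a)] <- ""
    b[is.na(b)] <- ""
    return(paste0(a, b))
  }
  # Otherwise a missing value on either side makes the result missing.
  ifelse(is.na(a) | is.na(b), NA_character_, paste0(a, b))
}

print(concat_na(NA, 2))
print(concat_na(NA, 2, na.rm = TRUE))
```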

Named capture groups

Like in Python, e.g.

> str_match(strings,"([2-9][0-9]{2})[- .](?P<area>[0-9]{3})[- .]([0-9]{4})")
                    area
 [1,] "219 733 8965" "219" "733" "8965"
 [2,] "329-293-8753" "329" "293" "8753"
 [3,] NA             NA    NA    NA
 [4,] "595 794 7569" "595" "794" "7569"

str_match mistakes "(" in a character class for the beginning of a group

The group identification behavior in str_match requires the ( character to be escaped in character classes, in contrast to the group identification behavior in base R.

For example, with gsub, capturing a ( in the group does not require escaping it if it is in a character class:

gsub("([(]...[)])","123", c("(abc)", "xyz"))
 [1] "123" "xyz"

but it does with str_match

str_match(c("(abc)", "xyz"), "([(]...[)])")
      [,1]    [,2]    [,3]
 [1,] "(abc)" "(abc)" "(abc)"
 [2,] NA      NA      NA
 Warning message:
 In rbind(c("(abc)", "(abc)"), c(NA_character_, NA_character_, NA_character_ :
   number of columns of result is not a multiple of vector length (arg 1)

While it is possible to get around this by explicitly escaping \\( like this

str_match(c("(abc)", "xyz"), "([\\(]...[\\)])")

the documentation says that the syntax should be consistent with base R.

str_pad doesn't accept NA's.

str_pad(c(120,123), width = 6, pad = '0')
[1] "000120" "000123"
str_pad(c(120,123,NA), width = 6, pad = '0')
Error in rep.int(string[i], times[i]) : invalid 'times' value

Just need to skip the NA's.

Implementing a str_elide function for stringr

I was looking for an elide function that could shorten long strings by replacing the overly long middle part with “…”. Since I couldn’t quickly find one for R (I couldn't find one in the stringi package either), I wrote my own. I think others may also be interested in it, and I would appreciate it if you could incorporate it into your package. Below is my implementation (which is public domain licensed).

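str_elide <- function(s, length = 20, elideText = "...") {
  el <- str_length(elideText)
  # Number of characters kept from the start of the string.
  l <- (length %/% 2) - (el %/% 2)
  s1 <- str_sub(s, 1, l)
  # The remaining budget (length - el - l) is taken from the end.
  s2 <- str_sub(s, str_length(s) - (length - el - l) + 1, str_length(s))
  s12 <- paste0(s1, elideText, s2)
  # Only strings longer than `length` are shortened.
  ifelse(str_length(s) > length, s12, s)
}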

Problem with str_sub<-

mytext <- c("bob","hadley","george")
str_sub(mytext, 1, 1) <- toupper(str_sub(mytext, 1, 1))
mytext

str_split creates empty leading "" when splitting on ""

Compare

str_split("abc","")
[[1]]
[1] ""  "a" "b" "c"

with

strsplit("abc","")
[[1]]
[1] "a" "b" "c"

Add tools for non-ASCII charsets

e.g. guess encoding function, and stuff based on charToRaw

ignore_case is not working for ICU regex() patterns

It doesn't seem like ignore_case argument is working for regex patterns:

library(stringr)
x <- c("a", "A")
str_detect(regex("a"), x)

gives

[1]  TRUE FALSE

and

str_detect(regex("a", ignore_case = TRUE), x)

gives

[1]  TRUE FALSE

My system is

> devtools::session_info()
Session info -------------------------------------------------------------------
 setting  value                       
 version  R version 3.1.3 (2015-03-09)
 system   x86_64, darwin13.4.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/New_York            

Packages -----------------------------------------------------------------------
 package   * version  date       source        
 bitops      1.0-6    2013-08-17 CRAN (R 3.1.0)
 devtools    1.8.0    2015-05-09 CRAN (R 3.1.3)
 digest      0.6.8    2014-12-31 CRAN (R 3.1.2)
 git2r       0.10.1   2015-05-07 CRAN (R 3.1.3)
 magrittr    1.5      2014-11-22 CRAN (R 3.1.2)
 memoise     0.2.1    2014-04-22 CRAN (R 3.1.0)
 RCurl       1.95-4.6 2015-04-24 CRAN (R 3.1.3)
 rversions   1.0.0    2015-04-22 CRAN (R 3.1.3)
 stringi     0.4-1    2014-12-14 CRAN (R 3.1.2)
 stringr   * 1.0.0    2015-04-30 CRAN (R 3.1.3)
 XML         3.98-1.1 2013-06-20 CRAN (R 3.1.0)

Check str_match/str_match_all output with no matches

e.g.

str_match_all("abc", "d")
str_match("abc", "d")

Should have one row for each input, and one column for each match + 1.

Missing stringi dependency

Since stringr now depends on stringi, the latter should be included in the "Depends:" field and not only in the "Imports:" field in DESCRIPTION.
In my case, not having stringi installed, update.packages() failed on stringr.

str_pad() breaks with NAs as input

The current version of stringr's str_pad() function (stringr_0.6.2 on R_3.1.3 on Win8) behaves unexpectedly with NA inputs:

str_pad(NA, 2, "left", 0)
## Error in rep.int(string[i], times[i]) : invalid 'times' value

... instead of giving back NA.

That being said, the current version on GitHub (via ...

devtools::install_github("Rexamine/stringi")
devtools::install_github("hadley/stringr")

... ) does behave as I would have expected by giving back NA output whenever there is NA input.

documentation: ?stringr

version 1.0.0
?stringr gives an almost empty help:

Fast and friendly string manipulation.

Description

Fast and friendly string manipulation.

I think it would be nice if ?stringr listed the commands of the stringr package and referred the reader to the help pages of the specific commands and to the vignette.

list in str_c

You may have considered and rejected this idea, but there are a few cases for me where this would be useful to pass a list containing only character vectors of length one. Is this something you want to support? paste currently does handle this.

str_detect() feature suggestion

For example, I want to validate the string argument of a function with a regex and the argument must exactly match the regex.

dummy <- function(x) {
    stopifnot(str_detect(x, "[ABC]{3}"))
}

I want this function to accept only argument in the format of "BBC", "AAA", "CBC" or "AAB". But I don't want this function to accept "ABCD" or "AAAA".
One approach is str_extract(x, "[ABC]{3}") == x but it is not intuitive.

UPDATE: perhaps I should use a better regex. Thanks gagolwes.
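
The "better regex" mentioned in the update is presumably an anchored one. Here is a minimal sketch in base R (str_detect(x, "^[ABC]{3}$") should behave the same way); the helper name is hypothetical.

```r
# Hypothetical validator: TRUE only when the whole string is three of A, B, C.
is_valid_code <- function(x) {
  # ^ and $ anchor the pattern to the start and end of the string.
  grepl("^[ABC]{3}$", x)
}

print(is_valid_code(c("BBC", "AAA", "ABCD", "AAAA")))
```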

FR: resolve incompatible pattern modification earlier

Please have incompatible search modifiers (fixed vs perl / ignore.case) resolved when these functions are called, rather than deferred to the str_* calls. Take the following example:

pattern <- 
  "str" %>%
  ignore.case %>%
  perl  # %>%
  # fixed -> pattern


str(pattern)
# Overriding Perl regexp matching
#  atomic [1:1] pattern
#  - attr(*, "ignore.case")= logi TRUE
#  - attr(*, "perl")= logi TRUE
#  - attr(*, "fixed")= logi TRUE

In this case, each of the match modifiers sets an attribute to TRUE, though they are incompatible. If one were to examine pattern as is done in the example, the effects are unclear, as they will be resolved later. A better method would have successive calls to the modifiers adjust pattern as appropriate. This can be done by changing the functions. For example, fixed might become:

fixed <-   function(string) {
  if (stringr::is.perl(string)) 
    message("Overriding Perl regexp matching")
  structure(string, fixed = TRUE, perl = NULL, ignore.case = NULL )
}

Or perl = FALSE as another alternative.

str_zpad

str_zpad <- function(string, width = max(str_length(string)), side = "left", pad = "0")
  str_pad(string, width, side, pad)

Idea from Bill Venables

New feature request: str_between

Per your callout
Excuse me if it is already there under another guise
e.g.
myText <- "1-10 of 1,224 reviews"

res <- str_between(myText,"of "," reviews")

res # 1,224

It would be the cherry on top to have a toInteger parameter available to result in 1224
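
A rough base R sketch of what is being asked for. The function name, the to_integer argument and the comma-stripping rule are all assumptions for illustration; it handles one string at a time and treats both delimiters as literal text.

```r
# Hypothetical helper: text between the first `left` and the next `right`.
str_between_sketch <- function(string, left, right, to_integer = FALSE) {
  missing_value <- if (to_integer) NA_integer_ else NA_character_
  # Locate the left delimiter as fixed text.
  start <- as.integer(regexpr(left, string, fixed = TRUE))
  if (start == -1) return(missing_value)
  rest <- substring(string, start + nchar(left))
  # Locate the right delimiter in what follows.
  end <- as.integer(regexpr(right, rest, fixed = TRUE))
  if (end == -1) return(missing_value)
  out <- substring(rest, 1, end - 1)
  # Optionally strip thousands separators and convert.
  if (to_integer) as.integer(gsub(",", "", out, fixed = TRUE)) else out
}

myText <- "1-10 of 1,224 reviews"
print(str_between_sketch(myText, "of ", " reviews"))
print(str_between_sketch(myText, "of ", " reviews", to_integer = TRUE))
```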

str_sort and str_reverse

I think stringr will be better for having two functions added to it:

str_sort to sort each element of a string, e.g. str_sort(c("cba", "zxy", "fge")) will return c("abc", "xyz", "efg")
str_reverse to reverse the characters in each string, e.g. str_reverse(c("abcde", "fghij")) will return c("edcba", "jihgf")

I am prepared to contribute the functions, documentation and test_that code if you think this is a good idea.
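
For illustration, the two proposed functions could be sketched in base R as follows. The _sketch names are hypothetical and this is not the contributor's code. Note that sort() is locale-dependent, and NA inputs are not handled specially.

```r
# Hypothetical helpers operating on each element of a character vector.
str_reverse_sketch <- function(x) {
  # Split into single characters, reverse them, and glue them back together.
  vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

str_sort_sketch <- function(x) {
  # Same idea, but sort the characters instead of reversing them.
  vapply(strsplit(x, ""), function(ch) paste(sort(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

print(str_sort_sketch(c("cba", "zxy", "fge")))
print(str_reverse_sketch(c("abcde", "fghij")))
```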

bug with new stringr release? wildcard does not match \n

The R-package stringr behaves differently between R-Version 3.1.1 and 3.2.0 (verified on two different machines). Under 3.2.0 the wildcard does not match \n

simplified example:
x <- "abc\n23"

in version R-3.1.1
str_extract_all(x, "a.+?[[:digit:]]{2}")
[1] "abc\n23"

in version R-3.2.0
str_extract_all(x, "a.+?[[:digit:]]{2}")
[[1]]
character(0)
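
A plausible explanation, offered as an assumption rather than a confirmed diagnosis: the newer stringr delegates to stringi's ICU engine (as other reports on this page indicate), and in ICU, as in PCRE, "." does not match a newline unless the dotall flag is set. The base R sketch below shows the same rule with perl = TRUE; in stringr, the inline (?s) flag or regex(pattern, dotall = TRUE) should have the same effect.

```r
x <- "abc\n23"
pattern <- "a.+?[[:digit:]]{2}"

# Without the dotall flag, "." stops at the newline, so nothing matches.
no_flag <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# The inline (?s) flag lets "." match the newline as well.
with_flag <- regmatches(x, regexpr(paste0("(?s)", pattern), x, perl = TRUE))

print(no_flag)
print(with_flag)
```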

R CMD check failed: The requested ICU resource file cannot be found

Travis has been failing but your Travis script failed to capture the failure for some reason: https://travis-ci.org/hadley/stringr/builds/61152178 I noticed it because my knitr repo started to fail after stringr was upgraded (https://travis-ci.org/yihui/knitr/jobs/61492954):

The requested ICU resource file cannot be found. Possible problem: ICU data has not been downloaded yet. Call stri_install_check(). (U_FILE_ACCESS_ERROR)

stringr has the same error, which I don't completely understand. This may be related to #52.

BTW, your most recent check also failed (for a different reason): https://travis-ci.org/hadley/stringr/builds/61467874 and Travis failed to capture that one as well.

Match locations in str_locate_all

Is this intended behavior?

[[1]]
     start end
[1,]     1   0
[2,]     2   1
[3,]     3   2
[4,]     4   3
[5,]     5   4

Because, to me, this shouldn't be:

[[1]]
[1] ""  "h" "e" "l" "l" "o"

(Note the first value in the vector is an empty string. I would expect "h", "e", "l", "l", "o".)

Add support for named capture groups

Base R supports Python-style named capture groups with the perl option.

pat <- '-(?<food>[a-z]+)-'
string <- '-bacon-'
regexpr(pat, string, perl=TRUE)

It would be great to be able to use these patterns with stringr. Right now, a pattern such as this generates an error:

str_match_all(string, regex(pat))
str_match_all(string, perl(pat))
# Error in stri_match_all_regex(string, pattern, cg_missing = "", omit_no_match = TRUE,  : 
#   Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
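
For reference, this is how the named group can be pulled out in base R today. It is a small sketch using the capture.start and capture.length attributes that regexpr attaches when perl = TRUE; it says nothing about how stringr might expose the feature.

```r
pat <- "-(?<food>[a-z]+)-"
string <- "-bacon-"

m <- regexpr(pat, string, perl = TRUE)

# With perl = TRUE, the match carries capture.start / capture.length
# matrices whose columns are named after the groups.
start <- attr(m, "capture.start")[, "food"]
len <- attr(m, "capture.length")[, "food"]

food <- substring(string, start, start + len - 1)
print(food)
```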

Installation failure via install_github()

I wanted the stringr vignette, which didn't seem available on CRAN, so I decided to install from GitHub and request vignette build at install time.

First I tried install_github("hadley/stringr", build_vignettes = TRUE)

> devtools::install_github("hadley/stringr", build_vignettes = TRUE)
Downloading github repo hadley/stringr@master
Installing stringr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD build  \
  '/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed3119acff56/hadley-stringr-bd4e71f'  \
  --no-manual --no-resave-data 

* checking for file ‘/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed3119acff56/hadley-stringr-bd4e71f/DESCRIPTION’ ... OK
* preparing ‘stringr’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Error: processing vignette 'stringr.Rmd' failed with diagnostics:
unused argument (omit_no_match = TRUE)
Execution halted
Error: Command failed (1)

Then I tried without requesting the vignette:

> devtools::install_github("hadley/stringr")
Downloading github repo hadley/stringr@master
Installing stringr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL  \
  '/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed315912cf67/hadley-stringr-bd4e71f'  \
  --library='/Users/jenny/resources/R/libraryCRAN' --install-tests 

* installing *source* package ‘stringr’ ...
** R
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (stringr)
Reloading installed stringr
unloadNamespace("stringr") not successful. Forcing unload.

Then I just grabbed a copy of the Rmd for the vignette, saved as foo.rmd, and tried "Knit":

processing file: foo.rmd
Error in stri_locate_all_regex(string, pattern, omit_no_match = TRUE,  : 
  unused argument (omit_no_match = TRUE)
Calls: <Anonymous> ... parse_inline -> str_locate_all -> stri_locate_all_regex
Execution halted

Then I tried walking through the code "by hand" and got my first error here:

> str_detect(strings, phone)
[1] FALSE  TRUE  TRUE  TRUE
> str_subset(strings, phone)
Error in stri_subset_regex(string, pattern, omit_na = TRUE, opts_regex = attr(pattern,  : 
  unused argument (omit_na = TRUE)

At this point, here's what session info looks like:

> devtools::session_info()
Session info---------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.1.2 (2014-10-31)
 system   x86_64, darwin10.8.0        
 ui       RStudio (0.98.1091)         
 language (EN)                        
 collate  en_CA.UTF-8                 
 tz       America/Vancouver           

Packages-------------------------------------------------------------------------------------
 package    * version    date       source                          
 devtools     1.6.0.9000 2014-11-30 Github (hadley/devtools@bd9c252)
 evaluate     0.5.5      2014-04-29 CRAN (R 3.1.0)                  
 formatR      1.0        2014-08-25 CRAN (R 3.1.1)                  
 knitr        1.8.3      2014-11-30 Github (yihui/knitr@21da020)    
 magrittr     1.5        2014-11-22 CRAN (R 3.1.2)                  
 rstudioapi   0.1        2014-03-27 CRAN (R 3.1.0)                  
 stringi      0.3.1      2014-11-06 CRAN (R 3.1.2)                  
 stringr    * 0.9.0.9000 2015-01-08 Github (hadley/stringr@bd4e71f)

Feature request : adding %.% operator to concatenate strings

Using paste or str_c quickly becomes hard to read. Other languages have some kind of operator to paste strings together, like '.' or '+'. It would be nice to have such a thing here as well, e.g.:

'%.%' <- function(a, b) paste0(a,b)

str_wrap

Should work similarly to strwrap but should return strings combined with newlines.

str_c with Dates broken in 1.0.0 cran + github

stringr_1.0.0

str_c("x",Sys.Date())
[1] "x16556"

stringr_1.0.0.9000

str_c("x",Sys.Date())
[1] "x16556"

stringr_0.6.2

str_c("x",Sys.Date())
[1] "x2015-05-01"

Possible unintended behaviour of invert_match

If you use str_locate_all() on a string with consecutive matches, e.g.
str_locate_all(c("hello"), c("l"))
and then try to invert_match() it, you get a row in the resulting matrix that is potentially problematic:
invert_match(str_locate_all(c("hello"), c("l"))[[1]])
gives

     start end
[1,]     0   2
[2,]     4   3
[3,]     5  -1

Row 2 is odd: the span starts at position 4 and ends at position 3. Perhaps this behaviour is intended, but perhaps not. I would have expected the matrix to be 2x2, since there are two regions with non-matched characters, "he" and "o". The zero-length "match" between the "l"s could be unexpected for some users.
I believe that this case should at least be addressed in the help file to manage users' expectations of how the function behaves, or possibly corrected if the function isn't intended to produce that sort of result.

Incorrect error messages

For dev version: 0.9.0.9000

Error messages ask the user to use the regexp function, but the function appears to be named regex. See, for example:

> packageVersion('stringr')
[1] "0.9.0.9000"

> perl("test")
perl is deprecated. Please use regexp instead
...

> ignore.case("test")
Please use (fixed|coll|regexp)(x, ignore_case = TRUE) instead of ignore.case(x)
...
>

And this:

type.regexp <- function(x) "regex"

Seems that there is some more general confusion between regex and regexp. Outside of R, I believe regex is more common. I would nominate using regex

`str_match` does not work well with non-capturing groups

str_match(state.name, "^(?:Ala|Mas).*(.)$")[1:3,]
[,1] [,2] [,3]
[1,] "Alabama" "a" "Alabama"
[2,] "Alaska" "a" "Alaska"
[3,] NA NA NA
Warning message:
In rbind(c("Alabama", "a"), c("Alaska", "a"), c(NA_character_, NA_character_, :
number of columns of result is not a multiple of vector length (arg 1)

The problem appears to be that for non-matching rows, the number of groups is counted so as to include the non-capturing group. I think this is the problem line, since it will count the non-capturing parenthesis:

n <- str_length(str_replace_all(tmp, "[^(]", "")) + 1

A possible fix is to add this line just before to remove the non-capturing paren:

tmp <- str_replace_all(tmp, "(?:", "")

which appears to work, but I have not tested it thoroughly.

This is on version 0.6.2
