tidyverse / stringr
A fresh approach to string manipulation in R
Home Page: https://stringr.tidyverse.org
License: Other
For a while now, I've wanted a way to use fuzzywuzzy in R. I've even tried installing R-Python translators, to no avail. If stringr could include any part of this type of functionality, it would make my life much, much easier.
as third argument instead of fixed text.
I am not sure if this is expected behavior, but I was testing out some possible solutions to a question on stackoverflow and got the following behavior:
Apologies in advance for abusing the function. I hope it helps.
> str_length("\123")
[1] 1
> str_sub("\123", -1)
[1] "S"
> str_sub("\123", -20)
[1] "S"
> str_sub("\123123", end = -11)
[1] ""
> str_sub("\123123", end = -1)
[1] "S123"
> str_sub("\001", end = -1)
[1] "\001"
> str_sub("\001", end = -0)
[1] ""
> str_sub("\001", end = -2)
[1] ""
> str_sub("\001", end = 1)
[1] "\001"
> str_sub("\001", end = 6)
[1] "\001"
> str_sub("\00001", end = 6)
Error: embedded nul in string: '\001'
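The outputs above are consistent with how R's parser reads string literals: a backslash followed by up to three octal digits denotes a single character, so "\123" is the one character "S". A minimal base R sketch (not stringr-specific) of the difference between the octal escape and a literal backslash:

```r
# "\123" is parsed as one character: octal 123 is code point 83, i.e. "S".
octal <- "\123"
n_octal <- nchar(octal)
code_point <- utf8ToInt(octal)

# A doubled backslash yields a literal backslash followed by three digits.
literal <- "\\123"
n_literal <- nchar(literal)

n_octal     # 1
code_point  # 83
n_literal   # 4
```

So the str_sub() results in the transcript operate on "S", "S123" and a single control character rather than on text containing a backslash.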
Options to simplify handling escapes would be a great feature; e.g. see these other SO questions:
This function is built on grepl, but in the str_detect interface I can't find a way to use the extended options, i.e. perl = FALSE, value = FALSE.
Suppose I need to detect a simple string like:
s1 <- paste("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", sep="")
When I try to use:
isDetect <- str_detect("string to match", s1)
I get this error:
Error in grepl("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", "Sd",
fixed = FALSE, :
invalid regular expression '(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2'
In addition: Warning message:
In grepl("(?!NEGATE_TERMS_I_DONT_HAVE_IT) term1 term2", "Sd", fixed = FALSE, :
regcomp error: 'Invalid regexp'
So I must use the standard grepl function:
isDetect <- grepl(s1, "string to match", TRUE, TRUE)
I think it would be useful for str_detect to accept parameters like perl = FALSE, value = FALSE.
This could be a totally dumb question, but I am trying to strip out prices for a set of string using position of a "$". However str_detect("Jokesonme", "$") gives me TRUE even if there is no "$" in the string.
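A likely explanation, sketched here with base R's grepl rather than stringr itself: an unescaped $ in a regular expression is an end-of-string anchor, so it matches every string. Escaping it, or asking for fixed (literal) matching, tests for the actual character. stringr offers the fixed() pattern wrapper for the same purpose. The sample strings are made up for illustration.

```r
x <- c("Jokesonme", "Price: $5")

# Unescaped "$" anchors at the end of the string, so every string matches.
as_anchor <- grepl("$", x)

# Escaping the dollar sign matches the literal character.
escaped <- grepl("\\$", x)

# fixed = TRUE treats the whole pattern as literal text.
literal <- grepl("$", x, fixed = TRUE)

# Position of the first "$" (-1 when absent), useful for slicing out a price.
pos <- as.integer(regexpr("$", x, fixed = TRUE))

as_anchor  # TRUE TRUE
escaped    # FALSE TRUE
literal    # FALSE TRUE
pos        # -1 8
```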
In ?perl and the deprecation message that prints when you use perl, regexp is referred to instead of regex.
The DESCRIPTION file should indicate this dependency.
I recommend adding an exact match modifier like perl, fixed and ignore.case.
The exact modifier should match only on exact matches, unlike fixed, which matches on part of the string. Although this can be done using ==, the exact modifier would give developers a parallel idiom for switching between exact and less-exact matching.
An alternative, of course, is to use perl with a pattern wrapped between ^ and $, but this solution requires applying a function to the pattern and not the string, thus breaking the parallel construction.
Internally, this could use the perl construct described in the preceding paragraph or use ==, which should be faster.
This could probably be implemented mostly within the re_call function.
There was a bug introduced in the latest version of stringr in the str_wrap function. In previous versions (stringr_0.6.2), if an empty string was passed as input, the function worked fine, but stringr_1.0.0 throws an error.
Example Code
> library(stringr)
> str_wrap("",width=5)
Error in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE) :
argument `...` should be a character vector (or an object coercible to)
Session Info
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] graphics grDevices datasets stats utils methods base
other attached packages:
[1] stringr_1.0.0
loaded via a namespace (and not attached):
[1] magrittr_1.5 stringi_0.4-1 tools_3.1.1
Hi Hadley!
Trying to implement my two little functions in stringr (as discussed by mail some time ago), I found the following problem when checking the original version first:
If I build stringr using RStudio, everything works as expected, but the check fails with the following error:
==> roxygenize('.', roclets=c('rd', 'collate', 'namespace'))
* checking for changes ... ERROR
Error in stri_replace_all_regex(string, pattern, replacement, vectorize_all = vec, :
Missing closing bracket on a bracket expression. (U_REGEX_MISSING_CLOSE_BRACKET)
Running R CMD check from the command line fails with this output:
...
* checking examples ... ERROR
Running examples in ‘stringr-Ex.R’ failed
The error most likely occurred in:
> ### Name: case
> ### Title: Convert case of a string.
> ### Aliases: case str_to_lower str_to_title str_to_upper
>
> ### ** Examples
>
> dog <- "The quick brown dog"
> str_to_upper(dog)
[1] "THE QUICK BROWN DOG"
> str_to_lower(dog)
[1] "the quick brown dog"
> str_to_title(dog)
Error in stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale)) :
The requested ICU resource cannot be found. Possible problem: ICU data has not been downloaded yet. Call `stri_install_check()`. (U_MISSING_RESOURCE_ERROR)
Calls: str_to_title -> stri_trans_totitle -> .Call
Execution halted
Starting R to check the mentioned possible problem gives
> library(stringi)
> stri_install_check()
stringi_0.5.1; en_US.UTF-8; ICU4C 51.2; Unicode 6.2
All tests completed successfully.
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-suse-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringi_0.5-1
loaded via a namespace (and not attached):
[1] tools_3.1.1
Cheers,
Gerhard
This does not work, although it should return the missing value unchanged.
str_pad(c("hello", NA), 8)
Error in rep.int(string[i], times[i]) : invalid 'times' value
Not sure about whether other stringr functions are affected by this bug as well.
I'm trying to pull the first 2 characters before an underscore out of a string, if an underscore exists. So, for:
mystr <- "cp_awesome"
I'm just trying to get "cp"
> str_extract(mystr, "[a-z]{2}(?=_)")
Error in regexpr("[a-z]{2}(?=_)", "cp_awesome", fixed = FALSE, :
invalid regular expression '[a-z]{2}(?=_)', reason 'Invalid regexp'
fails, but
> str_extract(mystr, "[a-z]{2}(?:_)")
[1] "cp_"
succeeds, but operates as a grouping param instead. stringr seems to be rejecting the (?=) syntax.
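The error message in this report comes from base R's regexpr, whose default engine does not support lookahead. As a workaround sketch in base R (not a stringr call), perl = TRUE switches to the PCRE engine, which does. The second input string is made up to show the no-match case.

```r
mystr <- c("cp_awesome", "nounderscore")

# perl = TRUE switches base R to the PCRE engine, which supports lookahead.
m <- regexpr("[a-z]{2}(?=_)", mystr, perl = TRUE)

# Start with NA so strings without an underscore stay aligned with the input.
prefix <- rep(NA_character_, length(mystr))
prefix[m > 0] <- regmatches(mystr, m)

prefix  # "cp" NA
```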
(From Stavros)
Your email said "all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters".
The doc for str_extract does not reflect this change; it says: "'pattern' should be a single pattern", though in fact it does vectorize over pattern:
> str_extract(c('abc'),c('.','..'))
[1] "a" "ab"
On the other hand, str_extract_all is buggy:
> str_extract_all(c('abcd'),c('.','..'))
[[1]]
[1] "a" "b" "c" "d" <<<<<<<< what happened to the matches for '..'?
But when we duplicate the string part, we get the correct result:
> str_extract_all(rep(c('abcd'),2),c('.','..'))
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] "ab" "cd"
In str_match, the doc says that pattern should be a single pattern, and I get an error message if it isn't, but the result seems to use both patterns:
> str_match(c('abc','xy'),c('(.)','(..)'))
[,1] [,2]
[1,] "a" "a"
[2,] "xy" "xy"
Warning messages:
1: In if (n == 0) { :
the condition has length > 1 and only the first element will be used
2: In seq_len(n) : first element used of 'length.out' argument
Please add functions to convert a character string to all lowercase, all UPPERCASE, all First Letters Of Words In Capitalized Case and all camelCase. You could call the functions: str_lower, str_upper, str_capitalise, and str_CamelCase.
The first two are more straightforward and should be modeled on the tolower() and toupper() in base R. The last two are more tricky to get right. One source of inspiration could be the tocamel() function in the development version of the 'rapport' package: https://github.com/Rapporter/rapport/tree/development . The associated issues have been partially discussed on r-help: http://r.789695.n4.nabble.com/how-to-transform-string-to-quot-Camel-Case-quot-td4664222.html
Should you decide to take the 'rapport' approach and merge str_capitalise and str_CamelCase into one function, then you could call it str_camel.
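A rough sketch of the four conversions in base R, to make the request concrete. The helper names to_title_sketch and to_camel_sketch are hypothetical, and the simple word-boundary rule is an assumption that will mishandle cases such as apostrophes ("don't") and non-ASCII letters.

```r
# Hypothetical helpers; names are illustrative and not part of stringr.
to_title_sketch <- function(x) {
  # \\U upper-cases the captured first letter of each word (needs perl = TRUE).
  gsub("\\b([a-z])", "\\U\\1", tolower(x), perl = TRUE)
}

to_camel_sketch <- function(x) {
  # Drop each run of non-alphanumerics and upper-case the character after it.
  gsub("[^[:alnum:]]+([[:alnum:]])", "\\U\\1", tolower(x), perl = TRUE)
}

dog <- "The quick brown dog"
upper <- toupper(dog)           # "THE QUICK BROWN DOG"
lower <- tolower(dog)           # "the quick brown dog"
title <- to_title_sketch(dog)   # "The Quick Brown Dog"
camel <- to_camel_sketch(dog)   # "theQuickBrownDog"
```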
word() grabs words from char strings. For example:
str = 'abc.123.999..'
word(str, 1, delim='.') would return 'abc'
word(str, 2, delim='.') would return '123'
word(str, -1, delim='.') would return '999'
suggested by David Cooper
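One way the requested behavior could look, as a base R sketch. The helper name pick_word is hypothetical, and dropping empty pieces (so that -1 returns '999' despite the trailing dots) is an assumption inferred from the examples above.

```r
# Hypothetical helper; the name and the choice to drop empty pieces are assumptions.
pick_word <- function(x, i, delim = " ") {
  vapply(strsplit(x, delim, fixed = TRUE), function(pieces) {
    # Ignore empty pieces produced by repeated or trailing delimiters.
    pieces <- pieces[nzchar(pieces)]
    # Negative positions count from the end.
    idx <- if (i < 0) length(pieces) + i + 1 else i
    if (idx < 1 || idx > length(pieces)) NA_character_ else pieces[idx]
  }, character(1))
}

s <- "abc.123.999.."
first <- pick_word(s, 1, delim = ".")   # "abc"
second <- pick_word(s, 2, delim = ".")  # "123"
last <- pick_word(s, -1, delim = ".")   # "999"
```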
On empty strings and zero-length character vectors
Ref: r-lib/pkgdown#49
Hi @hadley ,
Could you please add a citation to the package? Although I could do it myself (both the correct citation and a pull request), I think it would be better to wait for you.
In case I should do it, let me know. Thanks.
This minimal case demonstrates the problem (a bunch of non-capturing groups have been appended to make the problem obvious). The problem only occurs when at least one string does not match.
library(magrittr)
library(stringr)
x <- c("A_B_C", "THIS DOES NOT MATCH")
matcher <- regexec("(A)_(B)_(?:C)", x)
matches <- regmatches(x, matcher) %>% print
x %>% str_match("(A)_(B+)_(?:C)(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?")
The last line produces this result:
> x %>% str_match("(A)_(B+)_(?:C)(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?(?:.)?")
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] "A_B_C" "A" "B" "A_B_C" "A" "B" "A_B_C" "A" "B" "A_B_C" "A"
[2,] NA NA NA NA NA NA NA NA NA NA NA
Warning message:
In rbind(c("A_B_C", "A", "B"), c(NA_character_, NA_character_, NA_character_, :
number of columns of result is not a multiple of vector length (arg 1)
The bug is in these lines:
tmp <- str_replace_all(pattern, "\\\\\\(", "")
n <- str_length(str_replace_all(tmp, "[^(]", "")) + 1
which attempt to count the number of capture groups, but fail to exclude non-capturing groups. (Thinking about it, they probably also fail to include a capture group preceded by an even number of backslashes.)
I realize this code has all been replaced by stringi in the devel version, but if you're still maintaining the release version, it would be good to fix this.
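To illustrate the counting problem described above, here is a minimal base R sketch that skips non-capturing groups. The helper name count_capture_groups is hypothetical and this is not the code from the package. It still ignores several cases, listed in the comments.

```r
# Hypothetical helper: counts "(" that is neither escaped nor followed by "?".
# Known gaps (assumptions of this sketch): parentheses inside character classes,
# an escaped backslash directly before a real group, and named groups, which
# also start with "(?" but do capture.
count_capture_groups <- function(pattern) {
  hits <- gregexpr("(?<!\\\\)\\((?!\\?)", pattern, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

n_mixed <- count_capture_groups("(A)_(B+)_(?:C)(?:.)?(?:.)?")  # 2
n_states <- count_capture_groups("^(?:Ala|Mas).*(.)$")         # 1
n_none <- count_capture_groups("abc")                          # 0
```

The result matrix would then need n + 1 columns: one for the full match plus one per capture group.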
> str_subset("I", fixed("i", ignore_case = TRUE))
character(0)
I was expecting to get "I", not the empty string.
Often I want str_c (and friends) to behave correspondingly with e.g. sum.
I.e. sum(NA, 2) yields NA and I can do sum(NA, 2, na.rm = T) to get 2.
str_c(NA,2) yields NA2. The shortest way around this that I've found to yield NA is
df[, strung_together := ifelse(is.na(col1) | is.na(col2), NA, str_c(col1, col2))]
So, it would be cool to get str_c(col1, col2, na.rm = F) = NA.
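A sketch of the requested semantics in base R, assuming two inputs only. The helper name concat_na and its na.rm argument are hypothetical and are not part of stringr's API.

```r
# Hypothetical helper; name and na.rm semantics are assumptions for illustration.
concat_na <- function(a, b, na.rm = FALSE) {
  a <- as.character(a)
  b <- as.character(b)
  if (na.rm) {
    # Treat missing pieces as empty strings.
    a[is.na(a)] <- ""
    b[is.na(b)] <- ""
    return(paste0(a, b))
  }
  # Element-wise check, so a missing value only affects its own position.
  ifelse(is.na(a) | is.na(b), NA_character_, paste0(a, b))
}

strict <- concat_na(c("x", NA), c("1", "2"))                  # "x1" NA
lenient <- concat_na(c("x", NA), c("1", "2"), na.rm = TRUE)   # "x1" "2"
```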
Like in python. e.g.
> str_match(strings,"([2-9][0-9]{2})[- .](?P<area>[0-9]{3})[- .]([0-9]{4})")
area
[1,] "219 733 8965" "219" "733" "8965"
[2,] "329-293-8753" "329" "293" "8753"
[3,] NA NA NA NA
[4,] "595 794 7569" "595" "794" "7569"
The group identification behavior in str_match requires the ( character to be escaped in character classes, in contrast to the group identification behavior in base R.
For example, with gsub, capturing a ( in the group does not require escaping it if it is in a character class:
gsub("([(]...[)])","123", c("(abc)", "xyz"))
[1] "123" "xyz"
but it does with str_match:
str_match(c("(abc)", "xyz"), "([(]...[)])")
[,1] [,2] [,3]
[1,] "(abc)" "(abc)" "(abc)"
[2,] NA NA NA
Warning message:
In rbind(c("(abc)", "(abc)"), c(NA_character_, NA_character_, NA_character_ :
number of columns of result is not a multiple of vector length (arg 1)
While it is possible to get around this by explicitly escaping \\( like this
str_match(c("(abc)", "xyz"), "([\\(]...[\\)])")
the documentation says that the syntax should be consistent with base R.
str_pad(c(120,123), width = 6, pad = '0')
[1] "000120" "000123"
str_pad(c(120,123,NA), width = 6, pad = '0')
Error in rep.int(string[i], times[i]) : invalid 'times' value
Just need to skip the NA's.
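A sketch of what skipping the NAs could look like, written in base R. The helper name pad_left_sketch is hypothetical and this is not stringr's implementation. It pads only the non-missing elements and leaves NA in place.

```r
# Hypothetical helper; illustrates skipping NA, not stringr's implementation.
pad_left_sketch <- function(x, width, pad = "0") {
  x <- as.character(x)
  out <- rep(NA_character_, length(x))
  ok <- !is.na(x)
  # Number of pad characters needed per element; never negative.
  n <- pmax(width - nchar(x[ok]), 0)
  out[ok] <- paste0(strrep(pad, n), x[ok])
  out
}

padded <- pad_left_sketch(c(120, 123, NA), width = 6)
padded  # "000120" "000123" NA
```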
I was looking for an elide function that could shorten long strings by replacing the overly long middle part with “…”. Since I couldn’t quickly find one for R (I couldn't find one in the stringi package either), I wrote my own. I think others may also be interested in it, and I would appreciate it if you could incorporate it into your package. Below is my implementation (which is in the public domain).
str_elide = function(s, length = 20, elideText = "...") {
el = str_length(elideText)
l = (length %/% 2) - (el %/% 2)
s1 = str_sub(s, 1, l)
s2 = str_sub(s, str_length(s)-(length-el-l)+1, str_length(s))
s12 = paste0(s1, elideText, s2)
ifelse(str_length(s) > length, s12, s)
}
mytext <- c("bob","hadley","george")
str_sub(mytext, 1, 1) <- toupper(str_sub(mytext, 1, 1))
mytext
Compare
str_split("abc","")
[[1]]
[1] "" "a" "b" "c"
with
strsplit("abc","")
[[1]]
[1] "a" "b" "c"
e.g. guess encoding function, and stuff based on charToRaw
It doesn't seem like the ignore_case argument is working for regex patterns:
library(stringr)
x <- c("a", "A")
str_detect(regex("a"), x)
gives
[1] TRUE FALSE
and
str_detect(regex("a", ignore_case = TRUE), x)
gives
[1] TRUE FALSE
My system is
> devtools::session_info()
Session info -------------------------------------------------------------------
setting value
version R version 3.1.3 (2015-03-09)
system x86_64, darwin13.4.0
ui X11
language (EN)
collate en_US.UTF-8
tz America/New_York
Packages -----------------------------------------------------------------------
package * version date source
bitops 1.0-6 2013-08-17 CRAN (R 3.1.0)
devtools 1.8.0 2015-05-09 CRAN (R 3.1.3)
digest 0.6.8 2014-12-31 CRAN (R 3.1.2)
git2r 0.10.1 2015-05-07 CRAN (R 3.1.3)
magrittr 1.5 2014-11-22 CRAN (R 3.1.2)
memoise 0.2.1 2014-04-22 CRAN (R 3.1.0)
RCurl 1.95-4.6 2015-04-24 CRAN (R 3.1.3)
rversions 1.0.0 2015-04-22 CRAN (R 3.1.3)
stringi 0.4-1 2014-12-14 CRAN (R 3.1.2)
stringr * 1.0.0 2015-04-30 CRAN (R 3.1.3)
XML 3.98-1.1 2013-06-20 CRAN (R 3.1.0)
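One thing worth checking in the str_detect calls of this report: str_detect() takes the string first and the pattern second, so regex("a") sits in the string position and each element of x is used as a pattern. A base R sketch of both readings (note that grepl takes the pattern first):

```r
x <- c("a", "A")

# Each element of x used as a pattern against the string "a",
# which reproduces the reported TRUE FALSE.
as_patterns <- vapply(x, function(p) grepl(p, "a"), logical(1), USE.NAMES = FALSE)

# "a" used as a case-insensitive pattern against x.
as_strings <- grepl("a", x, ignore.case = TRUE)

as_patterns  # TRUE FALSE
as_strings   # TRUE TRUE
```

If that is what happened here, swapping the arguments, i.e. str_detect(x, regex("a", ignore_case = TRUE)), should give the expected result.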
e.g.
str_match_all("abc", "d")
str_match("abc", "d")
Should have one row for each input, and one column for each match + 1.
Since now stringr depends on stringi, the latter should be included in the "Depends:" field and not only in the "Imports:" in DESCRIPTION.
In my case, not having stringi installed, update.packages() failed on stringr.
The current version of stringr's str_pad() function (stringr_0.6.2 on R_3.1.3 on Win8) behaves unexpectedly for NA inputs:
str_pad(NA, 2, "left", 0)
## Error in rep.int(string[i], times[i]) : invalid 'times' value
... instead of giving back NA.
That being said, the current version on GitHub (via ...
devtools::install_github("Rexamine/stringi")
devtools::install_github("hadley/stringr")
... ) does behave as I would have expected, giving back NA output whenever there is NA input.
version 1.0.0
?stringr gives an almost empty help:
Fast and friendly string manipulation.
Description
Fast and friendly string manipulation.
I think it would be nice if ?stringr listed the commands of the stringr package and referred the reader to the help for the specific commands and to the vignette.
You may have considered and rejected this idea, but there are a few cases for me where it would be useful to pass a list containing only character vectors of length one. Is this something you want to support? paste currently does handle this.
For example, I want to validate the string argument of a function with a regex and the argument must exactly match the regex.
dummy <- function(x) {
stopifnot(str_detect(x, "[ABC]{3}"))
}
I want this function to accept only argument in the format of "BBC", "AAA", "CBC" or "AAB". But I don't want this function to accept "ABCD" or "AAAA".
One approach is str_extract(x, "[ABC]{3}") == x but it is not intuitive.
UPDATE: perhaps I should use a better regex. Thanks gagolwes.
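The "better regex" mentioned in the update is presumably an anchored one. A minimal sketch with base R's grepl; the helper name is_abc_triplet is hypothetical. The ^ and $ anchors force the whole string to match instead of any substring.

```r
# Anchors force the whole string to match, not just a substring.
is_abc_triplet <- function(x) grepl("^[ABC]{3}$", x)

dummy <- function(x) {
  stopifnot(is_abc_triplet(x))
  invisible(x)
}

checks <- is_abc_triplet(c("BBC", "AAA", "CBC", "AAB", "ABCD", "AAAA"))
checks  # TRUE TRUE TRUE TRUE FALSE FALSE
```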
Please have incompatible search modifiers (fixed vs perl / ignore.case) resolved at the call of these functions rather than deferred to the str_* calls. Take the following example:
pattern <-
"str" %>%
ignore.case %>%
perl # %>%
# fixed -> pattern
str(pattern)
# Overriding Perl regexp matching
# atomic [1:1] pattern
# - attr(*, "ignore.case")= logi TRUE
# - attr(*, "perl")= logi TRUE
# - attr(*, "fixed")= logi TRUE
In this case, each of the match modifiers sets an attribute to TRUE, though they are incompatible. If one were to examine pattern as is done in the example, the effects are unclear, as they will be resolved later. A better method would have successive calls to the modifiers adjust pattern as appropriate. This can be done by changing the functions. For example, fixed might become:
fixed <- function(string) {
if (stringr::is.perl(string))
message("Overriding Perl regexp matching")
structure(string, fixed = TRUE, perl = NULL, ignore.case = NULL )
}
Or perl = FALSE as another alternative.
str_zpad <- function(string, width = max(str_length(string)), side = "left", pad = "0")
str_pad(string, width, side, pad)
Idea from Bill Venables
Per your callout
Excuse me if it is already there under another guise
e.g.
myText <- "1-10 of 1,224 reviews"
res <- str_between(myText,"of "," reviews")
res # 1,224
It would be the cherry on top to have a toInteger parameter available to result in 1224
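A sketch of how such a function might behave, written in base R with literal (fixed) delimiters. The helper name between_fixed is hypothetical, and the integer conversion is left as a separate step here instead of a toInteger parameter.

```r
# Hypothetical helper: text between the first `left` and the next `right`,
# both treated as literal strings. Returns NA when either is missing.
between_fixed <- function(x, left, right) {
  start <- regexpr(left, x, fixed = TRUE)
  rest <- substring(x, start + nchar(left))
  end <- regexpr(right, rest, fixed = TRUE)
  out <- substring(rest, 1, end - 1)
  found <- !is.na(start) & !is.na(end) & start > 0 & end > 0
  out[!found] <- NA_character_
  out
}

myText <- "1-10 of 1,224 reviews"
res <- between_fixed(myText, "of ", " reviews")
res  # "1,224"

# Strip the thousands separator before converting to an integer.
n_reviews <- as.integer(gsub(",", "", res, fixed = TRUE))
n_reviews  # 1224
```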
I think stringr will be better for having two functions added to it:
str_sort to sort each element of a string, e.g. str_sort(c("cba", "zxy", "fge")) will return c("abc", "xyz", "efg")
str_reverse to reverse the characters in each string, e.g. str_reverse(c("abcde", "fghij")) will return c("edcba", "jihgf")
I am prepared to contribute the functions, documentation and test_that code if you think this is a good idea.
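For reference, a base R sketch of the two proposed behaviors. The helper names are hypothetical, the split is per code point (so combining characters are not kept together), sorting follows the current locale's collation, and NA inputs are not handled.

```r
# Hypothetical sketches of the proposed functions; names are illustrative.
map_chars <- function(x, f) {
  # Split each string into characters, transform them, and glue them back.
  vapply(strsplit(x, "", fixed = TRUE),
         function(ch) paste(f(ch), collapse = ""),
         character(1))
}
chars_sorted <- function(x) map_chars(x, sort)
chars_reversed <- function(x) map_chars(x, rev)

sorted <- chars_sorted(c("cba", "zxy", "fge"))    # "abc" "xyz" "efg"
reversed <- chars_reversed(c("abcde", "fghij"))   # "edcba" "jihgf"
```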
The R-package stringr behaves differently between R-Version 3.1.1 and 3.2.0 (verified on two different machines). Under 3.2.0 the wildcard does not match \n
simplified example:
x <- "abc\n23"
in version R-3.1.1
str_extract_all(x, "a.+?[[:digit:]]{2}")
[1] "abc\n23"
in version R-3.2.0
str_extract_all(x, "a.+?[[:digit:]]{2}")
[[1]]
character(0)
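The difference may come from the regular expression engine, not from R itself: stringr 1.0.0 delegates to stringi (ICU), and in ICU and PCRE alike a dot does not match a newline unless "dotall" mode is on. The sketch below shows the effect with base R's PCRE engine and the inline (?s) flag. In stringr, regex(pattern, dotall = TRUE) is the corresponding option. Which stringr version each machine had installed is an assumption, since the report does not say.

```r
x <- "abc\n23"
pattern <- "a.+?[[:digit:]]{2}"

# With PCRE (perl = TRUE), "." does not match a newline by default.
default_dot <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# The inline flag (?s) lets "." match newlines as well.
dotall <- regmatches(x, regexpr(paste0("(?s)", pattern), x, perl = TRUE))

length(default_dot)  # 0
dotall               # "abc\n23"
```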
Travis has been failing but your Travis script failed to capture the failure for some reason: https://travis-ci.org/hadley/stringr/builds/61152178 I noticed it because my knitr repo started to fail after stringr was upgraded (https://travis-ci.org/yihui/knitr/jobs/61492954):
The requested ICU resource file cannot be found. Possible problem: ICU data has not been downloaded yet. Call `stri_install_check()`. (U_FILE_ACCESS_ERROR)
stringr has the same error, which I don't completely understand. This may be related to #52.
BTW, your most recent check also failed (for a different reason): https://travis-ci.org/hadley/stringr/builds/61467874 and Travis failed to capture that one, too.
Is this intended behavior?
[[1]]
start end
[1,] 1 0
[2,] 2 1
[3,] 3 2
[4,] 4 3
[5,] 5 4
Because, to me, this shouldn't be:
[[1]]
[1] "" "h" "e" "l" "l" "o"
(Note the first value in the vector is an empty string. I would expect "h", "e", "l", "l", "o".)
Base R supports Python-style named capture groups with the perl option.
pat <- '-(?<food>[a-z]+)-'
string <- '-bacon-'
regexpr(pat, string, perl=TRUE)
It would be great to be able to use these patterns with stringr. Right now, a pattern such as this generates an error:
str_match_all(string, regex(pat))
str_match_all(string, perl(pat))
# Error in stri_match_all_regex(string, pattern, cg_missing = "", omit_no_match = TRUE, :
# Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
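For comparison, this is how the named group can be pulled out in base R today. With perl = TRUE, regexpr attaches capture.start, capture.length and capture.names attributes to its result. This is a base R sketch and does not involve stringr.

```r
pat <- "-(?<food>[a-z]+)-"
string <- "-bacon-"

m <- regexpr(pat, string, perl = TRUE)

# With perl = TRUE the match carries capture metadata as attributes.
starts <- as.vector(attr(m, "capture.start"))
lens <- as.vector(attr(m, "capture.length"))
groups <- substring(string, starts, starts + lens - 1)
names(groups) <- attr(m, "capture.names")

groups  # named character vector: food = "bacon"
```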
I wanted the stringr vignette, which didn't seem available on CRAN, so I decided to install from GitHub and request vignette build at install time.
First I tried install_github("hadley/stringr", build_vignettes = TRUE)
> devtools::install_github("hadley/stringr", build_vignettes = TRUE)
Downloading github repo hadley/stringr@master
Installing stringr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD build \
'/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed3119acff56/hadley-stringr-bd4e71f' \
--no-manual --no-resave-data
* checking for file ‘/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed3119acff56/hadley-stringr-bd4e71f/DESCRIPTION’ ... OK
* preparing ‘stringr’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Error: processing vignette 'stringr.Rmd' failed with diagnostics:
unused argument (omit_no_match = TRUE)
Execution halted
Error: Command failed (1)
Then I tried without requesting the vignette:
> devtools::install_github("hadley/stringr")
Downloading github repo hadley/stringr@master
Installing stringr
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL \
'/private/var/folders/bb/xs02zqls0snbgswbgkjwcbph0000gn/T/Rtmp0I9C3A/devtoolsed315912cf67/hadley-stringr-bd4e71f' \
--library='/Users/jenny/resources/R/libraryCRAN' --install-tests
* installing *source* package ‘stringr’ ...
** R
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (stringr)
Reloading installed stringr
unloadNamespace("stringr") not successful. Forcing unload.
Then I just grabbed a copy of the Rmd for the vignette, saved it as foo.rmd, and tried "Knit":
processing file: foo.rmd
Error in stri_locate_all_regex(string, pattern, omit_no_match = TRUE, :
unused argument (omit_no_match = TRUE)
Calls: <Anonymous> ... parse_inline -> str_locate_all -> stri_locate_all_regex
Execution halted
Then I tried walking through the code "by hand" and got my first error here:
> str_detect(strings, phone)
[1] FALSE TRUE TRUE TRUE
> str_subset(strings, phone)
Error in stri_subset_regex(string, pattern, omit_na = TRUE, opts_regex = attr(pattern, :
unused argument (omit_na = TRUE)
At this point, here's what session info looks like:
> devtools::session_info()
Session info---------------------------------------------------------------------------------
setting value
version R version 3.1.2 (2014-10-31)
system x86_64, darwin10.8.0
ui RStudio (0.98.1091)
language (EN)
collate en_CA.UTF-8
tz America/Vancouver
Packages-------------------------------------------------------------------------------------
package * version date source
devtools 1.6.0.9000 2014-11-30 Github (hadley/devtools@bd9c252)
evaluate 0.5.5 2014-04-29 CRAN (R 3.1.0)
formatR 1.0 2014-08-25 CRAN (R 3.1.1)
knitr 1.8.3 2014-11-30 Github (yihui/knitr@21da020)
magrittr 1.5 2014-11-22 CRAN (R 3.1.2)
rstudioapi 0.1 2014-03-27 CRAN (R 3.1.0)
stringi 0.3.1 2014-11-06 CRAN (R 3.1.2)
stringr * 0.9.0.9000 2015-01-08 Github (hadley/stringr@bd4e71f)
Using paste or str_c quickly becomes hard to read. Other languages have some kind of operator to paste strings together, like '.' or '+'. It would be nice to have such a thing here as well, e.g.:
'%. %' <- function(a, b) paste0(a,b)
Should work similarly to strwrap but should return strings combined with newlines.
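The requested behavior can be approximated in base R as below. The helper name wrap_lines is hypothetical and this is not stringr's implementation. It wraps each input with strwrap and joins the resulting lines with newlines, so the output has one string per input.

```r
# Hypothetical helper built on base strwrap; not stringr's implementation.
wrap_lines <- function(x, width = 80) {
  vapply(x,
         function(s) paste(strwrap(s, width = width), collapse = "\n"),
         character(1),
         USE.NAMES = FALSE)
}

wrapped <- wrap_lines("aaa bbb ccc", width = 5)
wrapped  # "aaa\nbbb\nccc"

# An empty string comes back as an empty string.
empty <- wrap_lines("", width = 5)
```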
stringr_1.0.0
str_c("x",Sys.Date())
[1] "x16556"
stringr_1.0.0.9000
str_c("x",Sys.Date())
[1] "x16556"
stringr_0.6.2
str_c("x",Sys.Date())
[1] "x2015-05-01"
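The number 16556 is the Date's internal representation (days since 1970-01-01), which suggests the class is being dropped before concatenation. Until that is resolved, converting explicitly sidesteps the issue. A small base R illustration:

```r
d <- as.Date("2015-05-01")

# A Date is stored as the number of days since 1970-01-01.
days <- unclass(d)

# Converting explicitly keeps the formatted date, whatever function does the pasting.
label <- paste0("x", as.character(d))

days   # 16556
label  # "x2015-05-01"
```

The same explicit conversion works inside str_c, e.g. str_c("x", as.character(Sys.Date())).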
If you use str_locate_all() on a string with consecutive matches, e.g.
str_locate_all(c("hello"), c("l"))
and then try to invert_match() it, you get a row of the resultant matrix which is potentially problematic:
invert_match(str_locate_all(c("hello"), c("l"))[[1]])
gives
start end
[1,] 0 2
[2,] 4 3
[3,] 5 -1
Row 2 is odd: the string starts at position 4 and ends at position 3. Perhaps this behaviour is intended but perhaps not. I would have expected the matrix to be 2x2, since there are two regions with non-matched characters, "he" and "o". The zero-length "match" between the "l"s could be unexpected for some users.
I believe that this case should at least be addressed in the help file to manage users' expectations of how the function behaves, or possibly corrected if the function isn't intended to produce that sort of result.
For dev version: 0.9.0.9000
Error messages ask the user to use the regexp function; the function appears to be named regex. See, for example:
> packageVersion('stringr')
[1] "0.9.0.9000"
> perl("test")
perl is deprecated. Please use regexp instead
...
> ignore.case("test")
Please use (fixed|coll|regexp)(x, ignore_case = TRUE) instead of ignore.case(x)
...
>
And this:
type.regexp <- function(x) "regex"
There seems to be some more general confusion between regex and regexp. Outside of R, I believe regex is more common. I would nominate using regex.
str_match(state.name, "^(?:Ala|Mas).*(.)$")[1:3,]
[,1] [,2] [,3]
[1,] "Alabama" "a" "Alabama"
[2,] "Alaska" "a" "Alaska"
[3,] NA NA NA
Warning message:
In rbind(c("Alabama", "a"), c("Alaska", "a"), c(NA_character_, NA_character_, :
number of columns of result is not a multiple of vector length (arg 1)
The problem appears to be that for non-matching rows, the number of matches is counted to include the non-matching group. I think this is the problem line since this will count the non-capturing parenthesis:
n <- str_length(str_replace_all(tmp, "[^(]", "")) + 1
A possible fix is to add this line just before to remove the non-capturing paren:
tmp <- str_replace_all(tmp, "(?:", "")
which appears to work, but I have not tested it thoroughly at all.
This is on version 0.6.2