Comments (6)
The behaviour of strsplit confused me a bit. Is this consistent?
strsplit("", "")
# [[1]]
# character(0)
strsplit(" ", " ")
# [[1]]
# [1] ""
from stringr.
After reading the algorithm in ?strsplit, I found that this is consistent, but not typical. In most other languages, the second command would return two empty strings, such as ["", ""].
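For reference, here is a minimal sketch of the loop that ?strsplit describes, written for a single string and a non-empty, fixed (non-regex) separator. The name `split_sketch` is hypothetical and is not part of base R or stringr. The empty-separator case (`split = ""`) is deliberately not modelled.

```r
# Hypothetical helper: mimics the loop described in ?strsplit for one
# string and one non-empty, fixed separator. Illustration only.
split_sketch <- function(x, sep) {
  stopifnot(is.character(x), length(x) == 1, nzchar(sep))
  out <- character(0)
  # Repeat until the remaining string is empty.
  while (nzchar(x)) {
    pos <- as.integer(regexpr(sep, x, fixed = TRUE))
    if (pos == -1L) {
      # No match: the rest of the string is the final piece.
      out <- c(out, x)
      break
    }
    # Match: keep the text to the left of it (possibly ""), then drop
    # the match and everything to its left.
    out <- c(out, substr(x, 1, pos - 1))
    x <- substr(x, pos + nchar(sep), nchar(x))
  }
  out
}

split_sketch("", " ")    # character(0): the loop never runs
split_sketch(" ", " ")   # "": one leading empty piece, no trailing one
split_sketch("a", "a")   # "": same reasoning
split_sketch("a b", " ") # "a" "b"
```

Because the loop stops as soon as the remainder is empty, a match at the start produces a leading "" while a match at the end produces nothing. That is why the second command returns one empty string rather than two.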
from stringr.
The latter seems correct to me, since it should behave the same as
> strsplit("a","a")
[[1]]
[1] ""
But I think once you've decided that splitting on "" means to 'split into individual characters', whether strsplit("","") should return "" or character(0) seems like a judgement call to me.
My issue with the behavior of str_split is that pulling only the leading zero-length string seems very arbitrary. One could argue that there are infinitely many instances of "" in every string! Or at least one between every single character. So why shouldn't it return something like:
str_split("abc","")
[[1]]
[1] "" "a" "" "b" "" "c" ""
That's why I think 'split into individual characters' should generally mean 'split into non-zero length characters' with the exception of when you're splitting something that is already empty.
from stringr.
Ok, that helps. Unfortunately I don't have time to write a fix at the moment, but tests/patches would be appreciated! Does this problem also affect str_split_fixed?
from stringr.
It does affect str_split_fixed, in the sense that it also assumes that every string has a leading empty string:
str_split_fixed("abc","",4)
     [,1] [,2] [,3] [,4]
[1,] ""   "a"  "b"  "c"
Again, I think this is mainly confusing because ?str_split_fixed also says that splitting on "" will split into single characters. At the moment, str_split_fixed behaves like this:
> str_split_fixed("abc","",1)
     [,1]
[1,] "abc"
> str_split_fixed("abc","",2)
     [,1] [,2]
[1,] ""   "abc"
> str_split_fixed("abc","",3)
     [,1] [,2] [,3]
[1,] ""   "a"  "bc"
> str_split_fixed("abc","",4)
     [,1] [,2] [,3] [,4]
[1,] ""   "a"  "b"  "c"
whereas I would expect either that pattern = "" overrides n completely and always forces a split into "a" "b" "c", or possibly something more subtle like this:
> str_split_fixed("abc","",1)
     [,1]
[1,] "abc"
> str_split_fixed("abc","",2)
     [,1] [,2]
[1,] "a"  "bc"
> str_split_fixed("abc","",3)
     [,1] [,2] [,3]
[1,] "a"  "b"  "c"
> str_split_fixed("abc","",4)
     [,1] [,2] [,3] [,4]
[1,] "a"  "b"  "c"  ""
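To make the "more subtle" proposal concrete, here is a rough base R sketch of it for a single string. The function name `split_chars_fixed` is made up for illustration and is not how stringr implements anything. It assumes the intended rule is: the first n - 1 characters each get their own column, any remainder goes into the next column, and unused columns are padded with "".

```r
# Hypothetical helper illustrating the proposed pattern = "" behaviour.
# x: a single string; n: the number of columns to return (n >= 1).
split_chars_fixed <- function(x, n) {
  stopifnot(is.character(x), length(x) == 1, n >= 1)
  # Individual characters of x; character(0) when x is empty.
  chars <- vapply(seq_len(nchar(x)), function(i) substr(x, i, i), character(1))
  pieces <- character(n)          # pre-filled with ""
  k <- min(n - 1, length(chars))  # characters that get their own column
  pieces[seq_len(k)] <- chars[seq_len(k)]
  if (length(chars) > k) {
    # Whatever is left over is kept together in the next column.
    pieces[k + 1] <- paste(chars[(k + 1):length(chars)], collapse = "")
  }
  matrix(pieces, nrow = 1)
}

split_chars_fixed("abc", 1)  # "abc"
split_chars_fixed("abc", 2)  # "a" "bc"
split_chars_fixed("abc", 3)  # "a" "b" "c"
split_chars_fixed("abc", 4)  # "a" "b" "c" ""
```

With this rule there is never a leading empty string, and padding only appears on the right once the characters run out.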
from stringr.
Fixed in stringi branch
from stringr.