Comments (13)
Here's one way:
macro wherex(df, ex)
    df = esc(df)
    ex = esc(ex)
    quote
        df2 = transform($df, _idx = true)
        $df[(@byrow! df2 :_idx = $ex)[:_idx],:]
    end
end
using DataFrames, DataFramesMeta, RDatasets
iris = dataset("datasets", "iris")
i2 = @wherex iris :Species in ("setosa", "virginica")
It's definitely worth thinking about expanding this idea. @transform is an interesting one. Somehow we need to have a way to specify the type of the column to be created (or we have to try the first one and use that type for the rest of the entries).
from dataframesmeta.jl.
Interesting. Though it doesn't look like the most efficient way of doing this: it would be good to avoid creating a temporary data frame.
Also, I wonder whether the non-vectorized form shouldn't be the recommended one (or even the only supported one): vectorized expressions require storing temporaries when combining operators, which is inefficient.
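As a rough illustration of the two styles being compared here, the following sketch uses a plain vector rather than a data frame. The sample values and the helper name keep are made up for illustration, and the dotted operator .| is the current spelling of the element-wise | used elsewhere in this thread. On recent Julia versions dotted calls are fused into a single loop, so the cost difference may be smaller than it was when this was written.

```julia
species = ["setosa", "versicolor", "virginica", "setosa"]  # made-up sample column

# Vectorized style: whole-column comparisons combined element-wise.
mask_vec = (species .== "setosa") .| (species .== "virginica")

# Row-wise style: scalar operators applied to one value at a time.
# `keep` is a hypothetical helper, not part of DataFramesMeta.
keep(s) = s == "setosa" || s == "virginica"
mask_row = map(keep, species)

# Both masks select the same rows.
println(mask_vec == mask_row)
```

With a row-wise predicate, only the final Boolean mask needs to be allocated.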
Creating a temporary DataFrame is relatively inexpensive, but you could get around it with more effort.
Another issue with byrow operations is that the following won't work.
@where(df, :colA .> mean(:colB))
Supporting that requires something like Devectorize.jl. Maybe embedding @devec would do the trick.
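To make the problem concrete, here is a minimal sketch in plain Julia, without DataFramesMeta. The names colA and colB mirror the example above, but the values are invented. A row-wise predicate only sees one value at a time, so an aggregate such as the mean has to be computed once beforehand.

```julia
using Statistics

colA = [1.0, 5.0, 3.0, 8.0]   # made-up values
colB = [2.0, 4.0, 6.0, 4.0]   # made-up values

# Vectorized form: mean(colB) sees the whole column.
mask_vec = colA .> mean(colB)

# Row-wise form: the aggregate is hoisted out of the per-row predicate.
m = mean(colB)
mask_row = [a > m for a in colA]

println(mask_vec == mask_row)
```

A row-wise @where would need some way of doing this hoisting, which is the gap being pointed out here.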
I came up with this:
function Base.in{T}(xs::PooledDataArray{T}, ys::AbstractArray{T})
    Bool[any(x in ys) for x in xs]
end
Of course it would need more methods for when xs is a DataArray, and when ys is a Tuple or a Range (or use Unions?).
Some micro benchmarks (after JIT warm up):
julia> @time a = @where iris :Species in ["setosa", "virginica"];
0.004165 seconds (1.11 k allocations: 56.937 KB)
julia> @time b = @where iris (:Species .== "setosa") | (:Species .== "virginica");
0.004514 seconds (1.55 k allocations: 84.269 KB)
julia> @time c = @wherex iris :Species in ["setosa", "virginica"];
0.006636 seconds (2.34 k allocations: 117.952 KB)
julia> a == b == c
true
@Ismael-VC The problem with this method for in is that you wouldn't be able to check whether an array is present in another as one of the elements, e.g. [1,2] in Any[[1,2], [3,4]]. This would make in unpredictable depending on the element type of the arrays. That's why we would need a different operator (JuliaLang/julia#5212).
Also, with complex conditions, a non-vectorized form will always be faster because the vectorized form creates temporary arrays for each one.
@tshort Operations relying on aggregate values would indeed no longer be possible with my proposal. Not sure what can be done about it (except having two different forms of @where).
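The ambiguity around in described in this comment can be sketched as follows, assuming Julia 1.x, where broadcasting with Ref treats the collection as a single value. The variable names are illustrative only.

```julia
a = [1, 2]
b = Any[[1, 2], [3, 4]]

# Scalar `in`: is the array `a` itself one of the elements of `b`?
whole = a in b

# Element-wise membership: is each element of `a` an element of `b`?
# Ref(b) keeps `b` from being broadcast over.
elementwise = in.(a, Ref(b))

println(whole)        # true
println(elementwise)  # Bool[0, 0]
```

The two readings give different answers for the same inputs, which is why a vectorized method for in itself would be unpredictable.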
@nalimilan what about using small in (\smallin: ∊):
julia> function ∊{T}(xs::PooledDataArray{T}, ys::AbstractArray{T})
           Bool[any(x in ys) for x in xs]
       end
∊ (generic function with 1 method)
julia> @where iris :Species ∊ ["setosa", "virginica"];
∊ is an alternative form of ∈ (“element of”); see https://en.wiktionary.org/wiki/∊
@Ismael-VC I don't think it's a good idea. The two operators are easily confused, and nothing in the definition of "small in" implies it's vectorized. Anyway, the present issue is not about vectorizing in, even if that question is of course related; let's discuss in in the other issues.
I would find it both natural and practical to allow using non-vectorized operators, like == instead of .==
The two operators are easily confused, and nothing in the definition of "small in" implies it's vectorized.
@nalimilan I think you are contradicting yourself in those statements. Also, I thought the idea would be for the documentation to explain that. And yes, I just focused on in because of the example; obviously this is still way out of my league, perhaps some day. 😄
@Ismael-VC Sorry, I don't see where I'm contradicting myself. Here I propose to work row-wise, and use only non-vectorized operators. You proposed to add a new vectorized operator which looks closely like the non-vectorized one.
Oh yeah, you are right. I missed the point of @where working by row, so I got this all wrong from the beginning. Well, at least I've learned a lot in the process, thanks!
Integrating conditional deletion of rows into @byrow! might work (like how if statements work in SAS).
EDIT: Nevermind, posted this to the wrong issue.
After JuliaLang/julia#22089, this can be "solved" with
julia> a = [1, 2]; b = [1, 2, 3];
julia> ∈′(a,b) = in.(a, [b])
∈′ (generic function with 1 method)
julia> a ∈′ b
2-element BitArray{1}:
true
true
Closed in favor of #165