huangyxi commented on June 9, 2024

Another example:

julia> combine(groupby(a, :k), :t=>stack)
4×2 DataFrame
 Row │ k      t_stack
     │ Int64  Int64
─────┼────────────────
   1 │     1        1
   2 │     1        2
   3 │     2        3
   4 │     2        4

Expected behavior:

julia> combine(groupby(a, :k), :t=>stack)
2×2 DataFrame
 Row │ k      t_stack
     │ Int64  Array…
─────┼────────────────
   1 │     1  [1, 2]
   2 │     2  [3, 4]

I'm not sure about the internal mechanism, but it appears that the result is flattened after combining the grouped DataFrame.

from dataframes.jl.

huangyxi commented on June 9, 2024

An alternative, using Base.vect and Ref, is described here:

https://discourse.julialang.org/t/groupby-and-aggregate-a-dataframe-with-custom-function-that-return-a-vector/69976
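
A minimal sketch of both alternatives, assuming DataFrames.jl 1.x is installed. The data frame a and its values are hypothetical (modeled on the output above), collect stands in for any function that returns a vector, and the output name :t_vec is chosen only for illustration.

```julia
using DataFrames

# Hypothetical data modeled on the example above: two groups, two rows each.
a = DataFrame(k = [1, 1, 2, 2], t = [1, 2, 3, 4])
gd = groupby(a, :k)

# collect returns a plain vector, so its elements are expanded into rows (4 rows).
flat = combine(gd, :t => collect => :t_vec)

# Wrapping the vector in a Ref makes each group produce a single row (2 rows).
by_ref = combine(gd, :t => (Ref ∘ collect) => :t_vec)

# Base.vect(x) returns the 1-element vector [x]; expanding it also yields one row per group.
by_vect = combine(gd, :t => (Base.vect ∘ collect) => :t_vec)
```

Only the wrapped calls keep one row per group; the plain collect call is expanded exactly as in the report.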

bkamins commented on June 9, 2024

This is by design. The rule is that if a function returns a vector, that vector is expanded into multiple rows. The reason is that in the vast majority of cases this is what users expect, and requiring them to flatten the result every time would be inconvenient.
Note that even the simplest transformation, :a => identity, relies on this flattening to produce a correct result.
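
A small sketch of the expansion rule, assuming DataFrames.jl 1.x is installed; the data frame df, its values, and the output names are hypothetical. identity returns each group's column as a vector, which has to be expanded to reproduce the original rows, while sum returns a scalar and so gives one row per group.

```julia
using DataFrames

# Hypothetical data: two groups, two rows each.
df = DataFrame(k = [1, 1, 2, 2], a = [10, 20, 30, 40])
gd = groupby(df, :k)

# identity returns a vector per group; it is expanded, giving back all 4 rows.
rows = combine(gd, :a => identity => :a)

# sum returns a scalar per group, so each group contributes exactly one row.
totals = combine(gd, :a => sum => :a_sum)
```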

It is important to understand that aggregation functions (such as combine) decide how to handle the result of a transformation based on the VALUE returned, not on the function that was called. Relying on the function called would produce many special cases that would be even harder to learn.
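
To illustrate that the returned value, not the function, decides the shape, here is a sketch that applies the same function, sum, to two columns. It assumes DataFrames.jl 1.x is installed; the data frame a, its columns :s and :v, and the output names are hypothetical.

```julia
using DataFrames

# Hypothetical data: :s holds numbers, :v holds vectors.
a = DataFrame(k = [1, 1, 2, 2], s = [1, 2, 3, 4],
              v = [[1, 2], [3, 4], [5, 6], [7, 8]])
gd = groupby(a, :k)

# sum over numbers returns a number, so each group gives one row.
scalar_sum = combine(gd, :s => sum => :s_sum)

# sum over vectors returns a vector (e.g. [1, 2] + [3, 4] = [4, 6]),
# so that vector is expanded into rows.
vector_sum = combine(gd, :v => sum => :v_sum)
```

Nothing about sum changed between the two calls; only the type of the returned value did.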

Your case (applying sum over a vector of vectors) is rare, so the decision was to handle it with a special rule. As you have found, and as the docstring states:

In all of these cases, function can return either a single row or multiple rows. As a particular rule, values wrapped in a Ref or a 0-dimensional AbstractArray are unwrapped and then treated as a single row.
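
The two wrappers named in the docstring can be inspected with Base Julia alone. This sketch only shows that each wrapper holds exactly one value; the unwrapping inside combine is DataFrames.jl's own logic, and the empty indexing x[] here is just a way to look inside.

```julia
# fill(x) with no dimension arguments builds a 0-dimensional array holding x.
z = fill([4, 6])

# A Ref also holds exactly one value.
r = Ref([4, 6])

# Empty indexing, x[], retrieves the single wrapped value from either wrapper.
inner_z = z[]
inner_r = r[]
```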

So you can write e.g. one of these (whichever is easier to remember for you):

julia> combine(groupby(a, :k), :v=>Ref∘sum)
2×2 DataFrame
 Row │ k      v_Ref_sum
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [4, 6]
   2 │     2  [12, 14]

julia> combine(groupby(a, :k), :v=>fill∘sum)
2×2 DataFrame
 Row │ k      v_fill_sum
     │ Int64  Array…
─────┼───────────────────
   1 │     1  [4, 6]
   2 │     2  [12, 14]

Either form gives the result you want.
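
A sketch that reproduces both outputs, assuming DataFrames.jl 1.x is installed. The data frame a is hypothetical: its :v values were chosen so the per-group sums match the tables above; the column names v_Ref_sum and v_fill_sum are the ones shown in those tables.

```julia
using DataFrames

# Hypothetical input chosen so the per-group sums match the output above:
# group 1: [1, 2] + [3, 4] = [4, 6]; group 2: [5, 6] + [7, 8] = [12, 14].
a = DataFrame(k = [1, 1, 2, 2], v = [[1, 2], [3, 4], [5, 6], [7, 8]])
gd = groupby(a, :k)

# Ref wraps the summed vector, so it is kept as a single cell per group.
res_ref = combine(gd, :v => Ref ∘ sum)

# fill(x) builds a 0-dimensional array, which is unwrapped in the same way.
res_fill = combine(gd, :v => fill ∘ sum)
```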

huangyxi commented on June 9, 2024

Thank you for your response. I have updated the document to ensure proper dissemination.