dance's Introduction

dance

Dancing 💃 with the stats, aka tibble() dancing 🕺. dance is a sort of reinvention of the classic dplyr verbs with a more modern stack underneath, i.e. it leans heavily on vctrs and rlang.

Installation

You can install the development version from GitHub.

# install.packages("pak")
pak::pkg_install("romainfrancois/dance")

Usage

We'll illustrate tibble dancing with iris grouped by Species.

library(dance)
g <- iris %>% group_by(Species)

waltz(), polka(), tango(), charleston()

These are in the neighborhood of dplyr::summarise().

waltz() takes a grouped tibble and a list of formulas, and returns a tibble with one column per supplied formula and one row per group. It does not prepend the grouping variables (see tango() for that).

g %>% 
  waltz(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1         5.01        3.43
#> 2         5.94        2.77
#> 3         6.59        2.97

polka() deals with peeling off one layer of grouping:

g %>% 
  polka()
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

tango() binds the results of polka() and waltz(), so it is the closest to dplyr::summarise():

g %>% 
  tango(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97
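
For readers who think in dplyr terms, the relationship between polka(), waltz() and tango() can be sketched with dplyr alone. This is an approximation for comparison, not a description of how dance is implemented; the names polka_like, waltz_like and tango_like are made up for illustration.

```r
library(dplyr)

g <- iris %>% group_by(Species)

# Roughly what polka() returns: one row per group, grouping columns only.
polka_like <- group_keys(g)

# Roughly what tango() returns: grouping column plus one column per summary.
tango_like <- g %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width  = mean(Sepal.Width)
  )

# Roughly what waltz() returns: the same summaries without the grouping column.
waltz_like <- tango_like %>% select(-Species)

names(polka_like)
names(tango_like)
names(waltz_like)
```

Seen this way, tango() is the column-wise combination of the other two results.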

charleston() is like tango(), but it packs the new columns in a tibble:

g %>% 
  charleston(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Species    data$Sepal.Length $Sepal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  5.01         3.43
#> 2 versicolor              5.94         2.77
#> 3 virginica               6.59         2.97
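
A "packed" result means that the new columns live inside a single data frame column (called data above). The sketch below reproduces that shape with dplyr only; it assumes dplyr 1.0.0 or later, where a named data frame returned inside summarise() becomes one packed column. It is meant to show the resulting structure, not how charleston() works internally.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0: a named tibble inside summarise() is kept as one column.
packed <- iris %>%
  group_by(Species) %>%
  summarise(
    data = tibble(
      Sepal.Length = mean(Sepal.Length),
      Sepal.Width  = mean(Sepal.Width)
    )
  )

# Two top-level columns; `data` is itself a tibble with two columns.
names(packed)
names(packed$data)

# Inner columns are reached through the packed column.
packed$data$Sepal.Width
```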

swing, twist

There is no waltz_at(), tango_at(), etc. Instead, we can apply the same function to a set of columns, or a set of functions to the same column.

For this, we need to learn new dance moves:

swing() and twist() are for applying the same function to a set of columns:

library(tidyselect)

g %>% 
  tango(swing(mean, starts_with("Petal")))
#> # A tibble: 3 x 3
#>   Species    Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             1.46       0.246
#> 2 versicolor         4.26       1.33 
#> 3 virginica          5.55       2.03

g %>% 
  tango(data = twist(mean, starts_with("Petal")))
#> # A tibble: 3 x 2
#>   Species    data$Petal.Length $Petal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  1.46        0.246
#> 2 versicolor              4.26        1.33 
#> 3 virginica               5.55        2.03

They differ in the type of column that is created and in how the columns are named:

  • swing() makes as many new columns as are selected by the tidy selection, and names them using a .name glue pattern. This way we can swing() several times in the same call.
g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    swing(median, starts_with("Petal"), .name = "median_{var}"), 
  )
#> # A tibble: 3 x 5
#>   Species mean_Petal.Leng… mean_Petal.Width median_Petal.Le…
#>   <fct>              <dbl>            <dbl>            <dbl>
#> 1 setosa              1.46            0.246             1.5 
#> 2 versic…             4.26            1.33              4.35
#> 3 virgin…             5.55            2.03              5.55
#> # … with 1 more variable: median_Petal.Width <dbl>
  • twist() instead creates a single data frame column.
g %>% 
  tango(
    mean   = twist(mean, starts_with("Petal")), 
    median = twist(median, starts_with("Petal")), 
  )
#> # A tibble: 3 x 3
#>   Species    mean$Petal.Length $Petal.Width median$Petal.Leng… $Petal.Width
#>   <fct>                  <dbl>        <dbl>              <dbl>        <dbl>
#> 1 setosa                  1.46        0.246               1.5           0.2
#> 2 versicolor              4.26        1.33                4.35          1.3
#> 3 virginica               5.55        2.03                5.55          2

The first arguments of swing() and twist() are either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.
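
The formula form can be read as shorthand for an anonymous function of one argument. A minimal sketch using rlang::as_function(), which converts a one-sided formula with a . placeholder into a function; whether dance relies on this exact helper internally is an assumption, and center is a name made up for the example.

```r
library(rlang)

# A one-sided formula where `.` stands for the column being processed.
center <- as_function(~ . - mean(.))

# Behaves like function(x) x - mean(x).
center(c(1, 2, 3))

# A plain function is passed through unchanged.
identical(as_function(mean), mean)
```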

You can combine swing() and twist() in the same tango() or waltz():

g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    median = twist(median, contains("."))
  )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

rumba, zumba

Similarly, rumba() and zumba() apply several functions to a single column: rumba() creates separate columns, while zumba() packs them into a data frame column.

g %>% 
  tango(
    rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
    Petal = zumba(Petal.Width, mean = mean, median = median)
  )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2
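
For comparison, the glue-style naming used by swing() and rumba() has a close relative in dplyr::across() and its .names argument. The sketch below assumes dplyr 1.0.0 or later and uses across()'s own placeholders ({.col} and {.fn}) rather than dance's {var} and {fun}; swing_like and rumba_like are illustrative names.

```r
library(dplyr)

g <- iris %>% group_by(Species)

# One function over several columns, named from the column (compare swing()).
swing_like <- g %>%
  summarise(across(starts_with("Petal"), mean, .names = "mean_{.col}"))

# Several functions over one column, named from the function (compare rumba()).
rumba_like <- g %>%
  summarise(
    across(Sepal.Width, list(mean = mean, median = median), .names = "Sepal_{.fn}")
  )

names(swing_like)
names(rumba_like)
```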

salsa, chacha, samba, madison

Now we enter the realm of dplyr::mutate() with:

  • salsa() : to create new columns
  • chacha(): to reorganize a grouped tibble so that data for each group is contiguous
  • samba() : chacha() + salsa()
g %>% 
  salsa(
    Sepal = ~Sepal.Length * Sepal.Width, 
    Petal = ~Petal.Length * Petal.Width
  )
#> # A tibble: 150 x 2
#>    Sepal Petal
#>    <dbl> <dbl>
#>  1  17.8 0.280
#>  2  14.7 0.280
#>  3  15.0 0.26 
#>  4  14.3 0.3  
#>  5  18   0.280
#>  6  21.1 0.68 
#>  7  15.6 0.42 
#>  8  17   0.3  
#>  9  12.8 0.280
#> 10  15.2 0.15 
#> # … with 140 more rows

You can swing(), twist(), rumba() and zumba() here too, and if you want the original data, you can use samba() instead of salsa():

g %>% 
  samba(centered = twist(~ . - mean(.), everything(), -Species))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 4 more variables: centered$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>, $Petal.Length <dbl>, $Petal.Width <dbl>
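
The difference between salsa() and samba() mirrors the one between dplyr::transmute() and dplyr::mutate(). A rough dplyr-only sketch for comparison follows; salsa_like and samba_like are illustrative names, and the code approximates the shape of the results rather than dance's implementation.

```r
library(dplyr)

# Compare salsa(): only the newly created columns are kept.
# transmute() is applied to ungrouped iris here, because on a grouped
# data frame dplyr would also keep the grouping column.
salsa_like <- iris %>%
  transmute(
    Sepal = Sepal.Length * Sepal.Width,
    Petal = Petal.Length * Petal.Width
  )

# Compare samba(): the original columns are kept and the new ones are added.
samba_like <- iris %>%
  group_by(Species) %>%
  mutate(
    Sepal = Sepal.Length * Sepal.Width,
    Petal = Petal.Length * Petal.Width
  )

dim(salsa_like)
dim(samba_like)
```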

madison() packs the columns salsa() would have created into a data frame column:

g %>% 
  madison(swing(~ . - mean(.), starts_with("Sepal")))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 2 more variables: data$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>

bolero and mambo

bolero() is similar to dplyr::filter(). The formulas may be made by mambo() if you want to apply the same predicate to a tidyselection of columns:

g %>% 
  bolero(~Sepal.Width > 4)
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows
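
Judging by the outputs above, mambo() combines the per-column predicates with "and" unless .op = or is given. The same two filters can be written out by hand in dplyr for comparison; this is a sketch of the logic, not of dance internals, and all_sepal and any_sepal are illustrative names.

```r
library(dplyr)

g <- iris %>% group_by(Species)

# Predicate applied to both Sepal columns, combined with "and".
all_sepal <- g %>% filter(Sepal.Length > 4 & Sepal.Width > 4)

# Same predicate, combined with "or".
any_sepal <- g %>% filter(Sepal.Length > 4 | Sepal.Width > 4)

nrow(all_sepal)  # 3
nrow(any_sepal)  # 150
```

The "or" variant keeps every row because every Sepal.Length in iris is above 4.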

dance's People

Contributors

almostnotarobot, dragosmg, romainfrancois, sharlagelfand

dance's Issues

doc for choreography()

choreography() is not featured in the README but it's a central function to what the dance package does.

add tests for twist()

Add unit tests for twist(). You can create the test file with usethis::use_test("twist")

Create separate issues for testing each dance

Each dance() function needs to be tested. Create separate issues for each and perhaps comment here that you've done it.

This way, these issues are small goals that can be tackled separately.

add tests for samba()

Add unit tests for samba(). You can create the test file with usethis::use_test("samba")

add tests for madison()

Add unit tests for madison(). You can create the test file with usethis::use_test("madison")

add tests for tango()

Add unit tests for tango(). The test file for tango() has already been created and can be found in tests/testthat/test-tango.R

add tests for swing()

Add unit tests for swing(). You can create the test file with usethis::use_test("swing")

documentation for polka()

polka() in Readme.Rmd:

polka() deals with peeling off one layer of grouping:

and a reprex of polka() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    polka()
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

Created on 2019-03-08 by the reprex package (v0.2.1)

create issues

This needs issues for:

  • roxygen documenting exported functions, e.g. starting by those used in the README
  • setup testthat
  • unit tests for all functions

add tests for rumba()

Add unit tests for rumba(). You can create the test file with usethis::use_test("rumba")

documentation for zumba()

Here is what the README.Rmd says about zumba() :

`rumba()` and `zumba()` can be used to apply several functions to a single column:

and a reprex of zumba() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 2
#>   Species    Petal$mean $median
#>   <fct>           <dbl>   <dbl>
#> 1 setosa          0.246     0.2
#> 2 versicolor      1.33      1.3
#> 3 virginica       2.03      2

The difference between rumba() and zumba() is that rumba() creates single columns while zumba() packs them into a data frame column. For zumba(), .name sets the name of the packed column directly, without a glue pattern.

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for chacha()

chacha() reorganizes a grouped tibble so that data for each group is contiguous

and a reprex of chacha() in action:

library(dance)

g <- data.frame(a = rep(1:3, 2), b = 1:6) %>% 
  group_by(a)

g
#> # A tibble: 6 x 2
#> # Groups:   a [3]
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     1     4
#> 5     2     5
#> 6     3     6

chacha(g)
#> # A tibble: 6 x 2
#> # Groups:   a [3]
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     1     4
#> 3     2     2
#> 4     2     5
#> 5     3     3
#> 6     3     6

Created on 2019-03-08 by the reprex package (v0.2.1)
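
For comparison, the same reordering can be sketched in base R on an ungrouped data frame. This only illustrates what "contiguous" means here; it is not how chacha() is implemented, and the names df and contiguous are made up for the example.

```r
df <- data.frame(a = rep(1:3, 2), b = 1:6)

# order() leaves ties in their original order, so rows keep their
# relative position within each value of `a`.
contiguous <- df[order(df$a), ]
rownames(contiguous) <- NULL

contiguous$a  # 1 1 2 2 3 3
contiguous$b  # 1 4 2 5 3 6
```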

add tests for zumba()

Add unit tests for zumba(). You can create the test file with usethis::use_test("zumba")

documentation for bolero()

Here is what the README.Rmd says about bolero() :

`bolero()` is similar to `dplyr::filter()`. The formulas may be made by `mambo()` if you
 want to apply the same predicate to a tidyselection of columns

bolero() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    bolero(~Sepal.Width > 4)
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-03-09 by the reprex package (v0.2.1)

documentation for samba()

Similar to dplyr::mutate(), samba() is used to create new columns. Useful to note that samba() will keep both the original columns and the newly created ones.

A reprex of samba() in action:

library(dance)

g <- iris %>% group_by(Species)

g %>% 
  samba(
    Sepal = ~Sepal.Length * Sepal.Width, 
    Petal = ~Petal.Length * Petal.Width
  )
#> # A tibble: 150 x 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal Petal
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   17.8 0.280
#>  2          4.9         3            1.4         0.2 setosa   14.7 0.280
#>  3          4.7         3.2          1.3         0.2 setosa   15.0 0.26 
#>  4          4.6         3.1          1.5         0.2 setosa   14.3 0.3  
#>  5          5           3.6          1.4         0.2 setosa   18   0.280
#>  6          5.4         3.9          1.7         0.4 setosa   21.1 0.68 
#>  7          4.6         3.4          1.4         0.3 setosa   15.6 0.42 
#>  8          5           3.4          1.5         0.2 setosa   17   0.3  
#>  9          4.4         2.9          1.4         0.2 setosa   12.8 0.280
#> 10          4.9         3.1          1.5         0.1 setosa   15.2 0.15 
#> # … with 140 more rows

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for tango()

Here is what the README.Rmd says about tango():

`tango()` binds the results of `polka()` and `waltz()` so is the closest to `dplyr::summarise()`

and a reprex of tango() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    tango(
        Sepal.Length = ~mean(Sepal.Length), 
        Sepal.Width  = ~mean(Sepal.Width)
    )
#> # A tibble: 3 x 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97

Created on 2019-03-08 by the reprex package (v0.2.1)

add tests for salsa()

Add unit tests for salsa(). You can create the test file with usethis::use_test("salsa")

documentation for charleston()

Here is what the README.Rmd says about charleston() :

`charleston()` is like `tango()` but it packs the new columns in a tibble:

and a reprex of charleston() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    charleston(
        Sepal.Length = ~mean(Sepal.Length), 
        Sepal.Width  = ~mean(Sepal.Width)
    )
#> # A tibble: 3 x 2
#>   Species    data$Sepal.Length $Sepal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  5.01         3.43
#> 2 versicolor              5.94         2.77
#> 3 virginica               6.59         2.97

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for swing()

Here is what the README.Rmd says about swing() :

both `swing()` and `twist()` are for applying the same function to a set of columns. 

and a reprex of swing() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(swing(mean, starts_with("Petal")))
#> # A tibble: 3 x 3
#>   Species    Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             1.46       0.246
#> 2 versicolor         4.26       1.33 
#> 3 virginica          5.55       2.03

swing() differs from twist() in the type of column created and how naming happens:

  • swing() makes as many new columns as are selected by the tidy selection, and the columns
    are named using a .name glue pattern, this way we might swing() several times.
library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        swing(median, starts_with("Petal"), .name = "median_{var}"), 
    )
#> # A tibble: 3 x 5
#>   Species mean_Petal.Leng… mean_Petal.Width median_Petal.Le…
#>   <fct>              <dbl>            <dbl>            <dbl>
#> 1 setosa              1.46            0.246             1.5 
#> 2 versic…             4.26            1.33              4.35
#> 3 virgin…             5.55            2.03              5.55
#> # … with 1 more variable: median_Petal.Width <dbl>

The first arguments of swing() and twist() are either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.

You can combine swing() and twist() in the same tango() or waltz():

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        twist(median, contains("."), .name = "median")
    )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for salsa()

Similar to dplyr::transmute(), salsa() is used to create new columns. Useful to note that salsa() will drop existing variables and only keep the newly created ones.

A reprex of salsa() in action:

library(dance)

g <- iris %>% group_by(Species)

g %>% 
    salsa(
        Sepal = ~Sepal.Length * Sepal.Width, 
        Petal = ~Petal.Length * Petal.Width
    )
#> # A tibble: 150 x 2
#>    Sepal Petal
#>    <dbl> <dbl>
#>  1  17.8 0.280
#>  2  14.7 0.280
#>  3  15.0 0.26 
#>  4  14.3 0.3  
#>  5  18   0.280
#>  6  21.1 0.68 
#>  7  15.6 0.42 
#>  8  17   0.3  
#>  9  12.8 0.280
#> 10  15.2 0.15 
#> # … with 140 more rows

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for waltz()

Here is what the README.Rmd says about waltz() :

`waltz()` takes a grouped tibble and a list of formulas and returns a tibble with: 
as many columns as supplied formulas, one row per group. It does not prepend the grouping 
variables (see `tango` for that)

and a reprex of waltz() in action:

library(dance)

iris %>% 
  group_by(Species) %>% 
  waltz(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1         5.01        3.43
#> 2         5.94        2.77
#> 3         6.59        2.97

dance related example data set

It'd be nicer to manipulate a 💃 👯‍♂️ 👯 🕺 data set rather than iris.

  • find some data
  • usethis::use_data_raw() and have some code in data-raw/dance.R to import it

add tests for waltz()

Add unit tests for waltz(). You can create the test file with usethis::use_test("waltz")

add tests for mambo()

Add unit tests for mambo(). You can create the test file with usethis::use_test("mambo")

add tests for chacha()

Add unit tests for chacha(). You can create the test file with usethis::use_test("chacha")

documentation for mambo()

Here is what the README.Rmd says about mambo():

`bolero()` uses a formula. The formulas may be made by `mambo()` if you want to apply 
the same predicate to a tidyselection of columns:

mambo() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-03-09 by the reprex package (v0.2.1)

documentation for madison()

Here is what the README.Rmd says about madison() :

`madison()` packs the columns `salsa()` would have created into a data frame column.

and a reprex of madison() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    madison(swing(~ . - mean(.), starts_with("Sepal")))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 2 more variables: data$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>

Created on 2019-03-09 by the reprex package (v0.2.1)

benchmark plan

This is not a top priority, since dance is more of a syntax experiment, but it'd be nice to bench::mark() dance against dplyr.

dance does not do hybrid evaluation yet, but has other tricks. I therefore expect operations that dplyr handles with hybrid evaluation, e.g. mean(), to be faster in dplyr, but I'm genuinely curious about how dance performs against dplyr in general.

add tests for bolero()

Add unit tests for bolero(). You can create the test file with usethis::use_test("bolero")

add tests for polka()

Add unit tests for polka(). You can create the test file with usethis::use_test("polka")

documentation for rumba()

Here is what the README.Rmd says about rumba() :

`rumba()` and `zumba()` can be used to apply several functions to a single column:

and a reprex of rumba() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}")
    )
#> # A tibble: 3 x 3
#>   Species    Sepal_mean Sepal_median
#>   <fct>           <dbl>        <dbl>
#> 1 setosa           3.43          3.4
#> 2 versicolor       2.77          2.8
#> 3 virginica        2.97          3

rumba() creates single columns and zumba() packs them into a data frame column. rumba() uses the .name glue pattern for naming columns.

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for twist()

Here is what the README.Rmd says about twist() :

both `swing()` and `twist()` are for applying the same function to a set of columns. 

and a reprex of twist() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(twist(mean, starts_with("Petal")))
#> # A tibble: 3 x 2
#>   Species    data$Petal.Length $Petal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  1.46        0.246
#> 2 versicolor              4.26        1.33 
#> 3 virginica               5.55        2.03

Created on 2019-03-08 by the reprex package (v0.2.1)

twist() differs from swing() in the type of column created and how naming happens:

  • twist() instead creates a single data frame column, and .name controls its name:
library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        twist(mean, starts_with("Petal"), .name = "mean"), 
        twist(median, starts_with("Petal"), .name = "median"), 
    )
#> # A tibble: 3 x 3
#>   Species    mean$Petal.Length $Petal.Width median$Petal.Leng… $Petal.Width
#>   <fct>                  <dbl>        <dbl>              <dbl>        <dbl>
#> 1 setosa                  1.46        0.246               1.5           0.2
#> 2 versicolor              4.26        1.33                4.35          1.3
#> 3 virginica               5.55        2.03                5.55          2

The first arguments of swing() and twist() are either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.

You can combine swing() and twist() in the same tango() or waltz():

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        twist(median, contains("."), .name = "median")
    )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

Created on 2019-03-08 by the reprex package (v0.2.1)

Create separate issues for documenting each function

Each function used in the README needs its own roxygen documentation. This issue is about creating a separate issue for each of those functions: tango(), waltz() ...

For example you can add some context in the issue: a reprex that you created by extracting some code from the README

See example issue #8 about waltz() and its pull request #7 by @cecilesauder. Feel free to add other issues and reference them here.

  • waltz() issue #8 and pull request #7
  • choreography() issue #22
  • polka() - issue #11
  • tango() - issue #12
  • charleston() - issue #13
  • swing() - issue #14
  • twist() - issue #15
  • rumba() - issue #16
  • zumba() - issue #17
  • salsa() - issue #18
  • samba() - issue #19
  • chacha() - issue #20
  • madison() - issue #23
  • bolero() - issue #24
  • mambo() - issue #27
