Comments (3)

gonzalezeb commented on September 22, 2024

I think the simple solution is ok.
These filters reminded me that the columns related to wood specific gravity (anything "wsg") don't need to appear in the master table, since we are retrieving wsg (wood density values) from a citable source (the global wood density database, via the BIOMASS package). We can discuss this later.

maurolepore commented on September 22, 2024

I changed my mind about how to approach this problem and I now think that the safest and cleanest way is to do as little as possible. That is, to give users all the information by storing the data in data/ as text. That way the different kinds of missing values will appear as entered (even in columns that are, for example, meant to be numeric). To use the tables we can then provide a helper that converts each column to the corresponding type. I'll soon demonstrate this with code.
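
To make the idea concrete before the real demonstration, here is a minimal base R sketch. It assumes a small made-up table and a hypothetical helper, `convert_columns()`, which is not the allodb API; the missing-value code "NRA" is taken from the data discussed in this issue, and everything else is illustrative.

```r
# Made-up raw table: every column is stored as text, so unusual
# missing-value codes such as "NRA" survive exactly as entered.
raw <- data.frame(
  species     = c("species A", "species B", "species C"),
  wsg         = c("0.49", "NRA", "0.62"),
  sample_size = c(NA, "NRA", "10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not part of allodb): convert the named
# columns to the requested types, first turning every code listed in
# `na_codes` into a regular NA.
convert_columns <- function(data, types, na_codes = "NRA") {
  converters <- list(integer = as.integer, double = as.numeric)
  stopifnot(all(types %in% names(converters)))
  for (col in names(types)) {
    values <- data[[col]]
    values[values %in% na_codes] <- NA
    data[[col]] <- converters[[types[[col]]]](values)
  }
  data
}

# The raw table is left untouched; the converted copy is ready for computation.
clean <- convert_columns(raw, c(wsg = "double", sample_size = "integer"))
str(clean)
```

The point of the design is that the conversion is lossy on purpose and happens only on request: the text table keeps "NRA" for anyone who needs to know why a value is missing.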

maurolepore commented on September 22, 2024

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(allodb)

# All columns as text
allodb::wsg %>% 
  filter(sample_size == "NRA")
#> # A tibble: 16 x 8
#>    wsg_id family  species   wsg   wsg_specificity sample_size site  ref_id
#>    <chr>  <chr>   <chr>     <chr> <chr>           <chr>       <chr> <chr> 
#>  1 <NA>   Rhamna~ Ceanothu~ 0.67  <NA>            NRA         UCSC  <NA>  
#>  2 <NA>   Grossu~ Ribes di~ 0.73  <NA>            NRA         UCSC  <NA>  
#>  3 <NA>   Ericac~ Vacciniu~ 0.47  <NA>            NRA         UCSC  <NA>  
#>  4 <NA>   Rosace~ Holodisc~ 0.71  <NA>            NRA         Wind~ <NA>  
#>  5 <NA>   Rosace~ Rubus le~ NRA   <NA>            NRA         Wind~ <NA>  
#>  6 <NA>   Rosace~ Rubus sp~ NRA   <NA>            NRA         Wind~ <NA>  
#>  7 <NA>   Ericac~ Vacciniu~ 0.47  <NA>            NRA         Wind~ <NA>  
#>  8 <NA>   Ericac~ Vacciniu~ 0.47  <NA>            NRA         Wind~ <NA>  
#>  9 <NA>   Ericac~ Vacciniu~ 0.47  <NA>            NRA         Wind~ <NA>  
#> 10 <NA>   Ericac~ Arctosta~ 0.72  <NA>            NRA         Yose~ <NA>  
#> 11 <NA>   Rhamna~ Ceanothu~ 0.67  <NA>            NRA         Yose~ <NA>  
#> 12 <NA>   Rhamna~ Ceanothu~ 0.67  <NA>            NRA         Yose~ <NA>  
#> 13 <NA>   Rhamna~ Ceanothu~ 0.67  <NA>            NRA         Yose~ <NA>  
#> 14 <NA>   Rosace~ Holodisc~ 0.71  <NA>            NRA         Yose~ <NA>  
#> 15 <NA>   Grossu~ Ribes ne~ NRA   <NA>            NRA         Yose~ <NA>  
#> 16 <NA>   Grossu~ Ribes ro~ NRA   <NA>            NRA         Yose~ <NA>

# Storing the columns as text preserves the different representations of
# missing values: "NRA" appears above exactly as entered, in both `wsg` and
# `sample_size`.



# Each column converted to the type most suitable for computation.**
# E.g., notice that `sample_size` is now integer.
as_allodb(allodb::wsg)
#> # A tibble: 419 x 8
#>    wsg_id family  species   wsg   wsg_specificity sample_size site  ref_id
#>  * <chr>  <chr>   <chr>     <chr> <chr>                 <int> <chr> <chr> 
#>  1 <NA>   Sapind~ Acer rub~ 0.49  <NA>                     NA Lill~ <NA>  
#>  2 <NA>   Sapind~ Acer sac~ 0.56  <NA>                     NA Lill~ <NA>  
#>  3 <NA>   Rosace~ Amelanch~ 0.66  <NA>                     NA Lill~ <NA>  
#>  4 <NA>   Rosace~ Amelanch~ 0.66  <NA>                     NA Lill~ <NA>  
#>  5 <NA>   Rosace~ Amelanch~ 0.66  <NA>                     NA Lill~ <NA>  
#>  6 <NA>   Annona~ Asimina ~ 0.47  <NA>                     NA Lill~ <NA>  
#>  7 <NA>   Betula~ Carpinus~ 0.58  <NA>                     NA Lill~ <NA>  
#>  8 <NA>   Juglan~ Carya al~ 0.62  <NA>                     10 Lill~ <NA>  
#>  9 <NA>   Juglan~ Carya co~ 0.6   <NA>                     10 Lill~ <NA>  
#> 10 <NA>   Juglan~ Carya gl~ 0.66  <NA>                     10 Lill~ <NA>  
#> # ... with 409 more rows

# Unusual representations of missing values are lost (e.g. no more "NRA").
as_allodb(allodb::wsg) %>% 
  filter(sample_size == "NRA")
#> # A tibble: 0 x 8
#> # ... with 8 variables: wsg_id <chr>, family <chr>, species <chr>,
#> #   wsg <chr>, wsg_specificity <chr>, sample_size <int>, site <chr>,
#> #   ref_id <chr>



# ** Notice a possible bug: `wsg` should be double -- not character.

Created on 2018-09-25 by the reprex package (v0.2.1)
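
The possible bug noted at the end of the reprex (`wsg` staying character instead of becoming double) is the kind of thing a small check could catch. The sketch below is only an illustration in base R: `is_numeric_like()` is a hypothetical function, not part of allodb, and the two input vectors are made up to resemble the `wsg` and `site` columns.

```r
# Hypothetical check (an assumption, not part of allodb): a text column is
# "numeric-like" if every value that is neither NA nor a known missing code
# parses as a number.
is_numeric_like <- function(x, na_codes = "NRA") {
  values <- x[!is.na(x) & !(x %in% na_codes)]
  length(values) > 0 && !anyNA(suppressWarnings(as.numeric(values)))
}

# Made-up vectors resembling the `wsg` and `site` columns.
wsg_text  <- c("0.67", "0.73", "NRA", NA, "0.47")
site_text <- c("UCSC", "UCSC")

is_numeric_like(wsg_text)   # candidate for conversion to double
is_numeric_like(site_text)  # should stay character
```

Running a check like this over every column that is still character after conversion would flag `wsg` as a candidate for double.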
