
madtraits's People

Contributors

akoontz11, alexrego, amcmanis, aprilstabbins, ccarnivale, dimanti, gsmith1330, khafen74, maxfarrell, mrhelmus, nahuron, perimyotis, spencerbrucehudson, srlgadey, sylviakinosian, tom766, willpearse

madtraits's Issues

.delaRiva.2015

readxl very nicely loads all the column names in Unicode, which means some column names have sub/super-scripts for the units. I'm concerned this could cause problems down the line; check that it doesn't...
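
A minimal sketch of one way to check for this, assuming the worry is non-ASCII characters in the column names. The column names below are made up for illustration, and `has_non_ascii` is a hypothetical helper, not something in the package.

```r
# Hypothetical column names of the kind readxl might return (not from the real file)
cols <- c("Species", "SLA (m\u00b2 kg\u207b\u00b9)", "Height (m)")

# TRUE for every string that contains at least one code point above ASCII
has_non_ascii <- function(x) {
  vapply(x, function(s) any(utf8ToInt(s) > 127L), logical(1), USE.NAMES = FALSE)
}

# Column names that would need a closer look before being used downstream
flagged <- cols[has_non_ascii(cols)]
print(flagged)
```

Running this on `names(data)` right after loading would at least show which columns carry sub/super-scripts.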

Running list of errors for each data set up to .friedman.2014

Here is a running list of problems with NATDB downloading on a PC. I give a description of what I think is going on, and then below that the code and the error.

First, there are lots that work. Basically anything that is a csv or a txt works. I stopped adding the ones that work.

# work fine -------------------------------------------
# .albouy.2015
data <- read.csv(unzip(suppdata("E096-203","Functional_data.zip", "esa_archives"), "Functional_data.csv"), sep=";")

# .anderson.2015
data<-read.csv(suppdata("10.1371/journal.pone.0166714",2))

# .artacho.2015
data <- read.csv2(suppdata("10.5061/dryad.qg062", "phenotypictraits.csv"), sep=';')

# .augspurger.2016a
  data <- read.csv(suppdata("10.5061/dryad.56cn4","Data File 1. Diaspore traits.csv"), as.is=TRUE)

# .bengtsson.2016
  data <- read.csv(suppdata('10.5061/dryad.62054', 'bengtsson_etal_2016_traits.csv'), as.is=TRUE)

# .comeault.2013
  data <- read.table(suppdata("10.5061/dryad.ck2cm","Tcris_FHA_phenotypes.txt"),header=TRUE)

# .edwards.2015a
    data <- read.csv(suppdata("E096-202", "Table1.csv", "esa_archives"))
# stopped adding ones that worked    

ERROR 1

This is probably the hardest error. It occurs with read_excel and suppdata when downloading from Dryad. I tried using suppdata alone to download xls files from Dryad and could not open the downloaded files... so there is some interaction going on here.

# does not work -----------------------------------------------------------------

# .ameztegui.2016
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.12b0h/2","FunctionalTraits_Dryad.xlsx")))

# .arnold.2016
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.t3d52", "Arnold_etal_2016_functecol_dataset.xlsx"), as.is = TRUE, skip = 3))

# .aubret.2012a
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.14cr5345", "Aubret%2053172.xlsx"),  sheet=1))


#all of the above give this error

Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim,  :
  Evaluation error: error -103 with zipfile in unzGetCurrentFileInfo
.
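
One thing that might help narrow this down is checking whether the downloaded file is an Excel file at all. The sketch below reads the first bytes of a file and compares them with the standard zip signature (used by .xlsx) and the OLE2 signature (used by .xls). `file_signature` is a hypothetical helper. A matching signature does not prove the file is intact: a download written in text mode on Windows (that is, `download.file()` without `mode = "wb"`) can keep its first bytes and still be damaged further in, which is one possibility worth ruling out here, though nothing in this thread confirms it.

```r
# Classify a file by its leading "magic" bytes
file_signature <- function(path) {
  magic <- readBin(path, what = "raw", n = 8L)
  zip_magic <- as.raw(c(0x50, 0x4b, 0x03, 0x04))                          # "PK\003\004"
  ole_magic <- as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1))  # OLE2 container
  if (length(magic) >= 4L && identical(magic[1:4], zip_magic)) {
    "xlsx-like (zip)"
  } else if (length(magic) >= 8L && identical(magic, ole_magic)) {
    "xls-like (OLE2)"
  } else {
    "unrecognised"
  }
}

# Demonstration on two throwaway files: one with a zip header, one with HTML text
zip_like <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00)), zip_like)
html_like <- tempfile()
writeLines("<html>Not found</html>", html_like)

print(file_signature(zip_like))
print(file_signature(html_like))
```

If a file from the suppdata cache comes back as "unrecognised", the download itself returned something other than a spreadsheet (for example an error page), and read_excel is not the culprit.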

ERROR 2

This one just runs and runs. It is like the above one but it deals with figshare. So I think the problem is the suppdata and read_excel interaction (it does not matter where you are downloading from). If you can fix this interaction then ERRORS 1 and 2 might go away...

# .bello_bedoy.2015b
data <- as.data.frame(read_excel(
        suppdata("10.6084/m9.figshare.1190766.v2","mating occurrencePzoe_2010.xls")
    ))    

ERROR 3
I think the problem is with the suppdata function, not with unzip. It is having trouble with Wiley... is it because these articles are not open access?

# .benesh.2017
data <- read.csv(unzip(suppdata("10.1002/ecy.1680", 1), "CLC_database_lifehistory.csv"))
Error in download.file(url, destination, quiet = TRUE) : 
  cannot open URL 'https://onlinelibrary.wiley.com/action/'
In addition: Warning message:
In download.file(url, destination, quiet = TRUE) :
  cannot open URL 'https://onlinelibrary.wiley.com/action/': HTTP status was '404 Not Found'

# .engemann.2016
 data <- read.delim(unzip(suppdata("10.1002/ecy.1569", 1), "DataS1/GrowthForm_Final.txt"))
 Error in download.file(url, destination, quiet = TRUE) : 
  cannot open URL 'https://onlinelibrary.wiley.com/action/'
In addition: Warning message:
In download.file(url, destination, quiet = TRUE) :
  cannot open URL 'https://onlinelibrary.wiley.com/action/': HTTP status was '404 Not Found'

ERROR 4
This one needs a file extension.

# .cariveau.2016
data <- as.data.frame(read_excel(suppdata("10.1371/journal.pone.0151482", 3), sheet="TableS1_v2"))
Error: Missing file extension.
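
A possible workaround, sketched under the assumption that the downloaded file is fine and only lacks an extension: copy it to a temporary file that has one and read that copy instead. `with_extension` is a hypothetical helper, and the file created below is an empty stand-in rather than a real download.

```r
# Copy a file to a new temporary path that carries the given extension
with_extension <- function(path, ext) {
  target <- tempfile(fileext = paste0(".", ext))
  if (!file.copy(path, target)) stop("could not copy ", path)
  target
}

# Stand-in for a download that arrived without an extension
no_ext <- tempfile()
file.create(no_ext)

fixed <- with_extension(no_ext, "xlsx")
print(tools::file_ext(fixed))
```

The path in `fixed` could then be passed to read_excel. Calling readxl's `read_xlsx()` directly may also avoid the extension check, but I have not verified that against this data set.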

ERROR 5
This one crashes R and gives me a bomb in RStudio!

.friedman.2014 <- function(...){
    data <- as.data.frame(read_excel(suppdata("10.5061/dryad.489c7","NILs_rawdata.xls")))

 fread: wanted 1 got 0 loc=570826752
 seek: wanted to seek to sector...
THEN R ABORTS

OK I got through to .friedman.2014 ...
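
For keeping a running list like this one without stopping at the first failure, something along these lines could work. It is only a sketch: the two loaders are made-up stand-ins for the real dataset functions, and `try_loader` is a hypothetical helper.

```r
# Hypothetical stand-ins for dataset loaders: one succeeds, one fails
loaders <- list(
  .works = function(...) data.frame(species = "quercus_robur", height = 20),
  .fails = function(...) stop("cannot open URL")
)

# Run one loader and record either its value or its error message
try_loader <- function(f) {
  tryCatch(
    list(ok = TRUE, value = f(), error = NA_character_),
    error = function(e) list(ok = FALSE, value = NULL, error = conditionMessage(e))
  )
}

results <- lapply(loaders, try_loader)

# One row per dataset: did it load, and if not, why
report <- data.frame(
  dataset = names(results),
  ok = vapply(results, function(r) r$ok, logical(1)),
  error = vapply(results, function(r) r$error, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(report)
```

Note that this only catches ordinary R errors. A hard crash of the R session, like ERROR 5, cannot be caught with tryCatch.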

A list of problems with functions .kraft.2015b to .marx.2016

I checked 16 functions, from .kraft.2015b to .marx.2016, alphabetically. The following functions did not have problems in the downloads:
.lagisz.2013
.limpens.2014a
.limpens.2014b
.lu.2016a
.lu.2016b
.lu.2016c
.lupold.2013
.martin.2016

The following functions had problems in the downloads:

.kraft.2015b: Potentially non-trait variables: abundance

.kuo.2014: Needs to be run twice to work, otherwise good

.lawson.2015: One species is "Tree (Sportsmans 57)", otherwise good

.lessard.2016: Missing metadata. Some units seem wrong: "?" and "range"

.lislevand.2006: "egg_mass" is measured in "f"

.madin.2016: No metadata. "Skeletal micro-density" units should be "g cm^-3". It looks like there are some numeric variables in the character list.

.martin.2014: species name is just "O_canadensis"

.marx.2016: There are a few hundred empty rows in the numeric list with units. For example, see "data.frame(a$numeric$variable,a$numeric$units)[620:650,]". Otherwise, good
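
Two of the problems above (empty rows in the numeric list, numeric variables sitting in the character list) can be checked for mechanically. A rough sketch on toy data follows. The `variable` and `units` columns are the ones referred to above; the `value` column name and the shape of the character list are assumptions, and `looks_numeric` is a hypothetical helper.

```r
# Toy stand-ins for a$numeric and a$character (the 'value' column name is an assumption)
num <- data.frame(variable = c("height", "", NA),
                  units = c("m", "", NA),
                  stringsAsFactors = FALSE)
chars <- data.frame(variable = c("growth_form", "growth_form", "depth", "depth"),
                    value = c("massive", "branching", "3.5", "10"),
                    stringsAsFactors = FALSE)

# Rows of the numeric list that have no variable name at all
empty_rows <- which(is.na(num$variable) | !nzchar(num$variable))

# TRUE if every non-empty entry can be parsed as a number
looks_numeric <- function(x) {
  x <- x[!is.na(x) & nzchar(x)]
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

# Character variables whose values are all numbers in disguise
flags <- vapply(split(chars$value, chars$variable), looks_numeric, logical(1))
numeric_like <- names(flags)[flags]

print(empty_rows)
print(numeric_like)
```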

I added citations to citations.tsv for the following functions:
.lagisz.2013
.lislevand.2006
.lupold.2013
.martin.2014

Warning: 'xml_find_one' is deprecated

The xml2 package has been updated, and some internal functions within fulltext still use the deprecated xml_find_one. This is causing natdb's package checks to fail, because it raises a deprecation warning.

I need to update fulltext before this can be submitted, and before the travis-ci builds can stop erroring!

Metadata is strange

.kolbe.2011 seems to have meta-data that is a copy of the data. What is going on?!

Unit tests

  • .df.melt
  • natdb
  • clean.natdb
  • convert.natdb.units
  • lookup.natdb.names
  • add coveralls.io badge

.Tian.2016

Currently doesn't work; needs a skip attribute and some love

Long format datasets

...can't be loaded easily at present, which is silly as .df.melt turns them into that format. This is a problem for .madin.2016 and one other function you wrote 'on the day'.
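
For reference, the wide-to-long conversion in question looks roughly like this in base R. This is an illustration of the idea only, not the package's .df.melt; the data, the column names, and the units are made up.

```r
# Made-up wide-format data: one row per species, one column per trait
wide <- data.frame(species = c("quercus_robur", "pinus_sylvestris"),
                   height = c(1.2, 3.4),
                   seed_mass = c(10, 20),
                   stringsAsFactors = FALSE)
units <- c(height = "m", seed_mass = "mg")

trait_cols <- setdiff(names(wide), "species")

# Stack the trait columns on top of each other, one row per species-trait pair
long <- data.frame(
  species = rep(wide$species, times = length(trait_cols)),
  variable = rep(trait_cols, each = nrow(wide)),
  value = unlist(wide[trait_cols], use.names = FALSE),
  units = rep(unname(units[trait_cols]), each = nrow(wide)),
  stringsAsFactors = FALSE
)
print(long)
```

A dataset that already arrives in the `long` shape has nothing left to melt, which is why it needs its own path into the database.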

Error, Do I Need to install Perl?

If this error is because of a TyPEo you can take away my R card!

I am unable to download a lot of the data sets on my PC. I think the error is coming from fulltext::ft_get_si or gdata::read.xls

Example Code:

mydats  <- c(".artacho.2015", ".vanier.2013", ".aubret.2012b")
dat <- natdb(datasets = mydats)

Error in findPerl(verbose = verbose) : 
  perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument

But this code without the 3rd dataset works fine:

mydats  <- c(".artacho.2015", ".vanier.2013")
dat <- natdb(datasets = mydats)

Note that these first two data sets use read.csv2 and read.delim not gdata::read.xls
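
A quick way to see whether R can find Perl at all, which is what the findPerl error above is complaining about. This is a sketch; `find_perl` is a hypothetical helper built on base R's Sys.which.

```r
# Return the path to the perl executable, or NA if none is on the PATH
find_perl <- function() {
  path <- unname(Sys.which("perl"))
  if (nzchar(path)) path else NA_character_
}

perl <- find_perl()
if (is.na(perl)) {
  message("No perl on the PATH; read.xls will need its perl= argument or a Perl install.")
} else {
  message("Perl found at ", perl)
}
```

If this reports no Perl, installing one or passing its location through the perl= argument mentioned in the error message should be the next thing to try.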

Functions that need careful checking

  • .fitzgerald.2017
  • .wright.2004
  • .kamath.2016
  • .husak.2016
  • .jennings.2016a - this one is weird, as it's been duplicated but the two versions are different. Remember square root is x^1/2, log(mm), and abundance is #

fitzgerald.2017

I don't think these traits have been transformed; I think they were transformed before analysis in the paper (not here).

Also, report log-transformed units as log10(mm) or ln(mm), and use log(mm) only if you don't know the base. Never log_mm.
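
A small sketch of how stray log_mm-style labels could be rewritten to the log(mm) form. `fix_log_units` is a hypothetical helper, and it cannot recover the base of the logarithm, so it only produces the "don't know" form.

```r
# Rewrite "log_<unit>" as "log(<unit>)"; other labels are left untouched
fix_log_units <- function(units) {
  sub("^log_(.+)$", "log(\\1)", units)
}

fixed_units <- fix_log_units(c("log_mm", "log10(mm)", "cm^2"))
print(fixed_units)
```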

About MADtraits

I would like to better understand what this package does, as it sounds interesting. I have read the main page, but I am not sure I understand it.

Does it take a .csv file and convert it into a format that follows the trait database standard described in Kissling et al. 2018?

um et al.

I presume that's meant to be µm - if so, please go through and correct

In citations.tsv, the last 4 entries have no first column (the counter).

Change function names / RDS caches

...because the dot-files are invisible on Mac, which is great, but probably makes people not trust their cache!... Maybe zip up the cache at the very end or something so people can move it around? A thought, anyway...

Things to check in every function

  • Numeric and categorical data have been assigned correctly. If they've not been, then you've probably not loaded the data correctly. Sometimes commas are decimals...
  • You've not got the unit names in the column names
  • You've removed metadata from the data itself
  • You've added meta-data if it's there :D
  • You've used a wrapper
  • You've not downloaded a file to the hard drive in a specific place (i.e., only use tempdir, but always use the wrappers)
  • Go through the list of variables that are already in the database. If you think yours should be one, submit a pull request on the cleaning.R file
  • gsub(" ", "_", species) !!!
  • um isn't a unit; use µm (copy from here if you need to)
  • Authors have given permission to use data
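
Three of the checks in this list are mechanical enough to sketch in code. The helper names below are hypothetical and the inputs are made up; lower-casing the species names follows what the existing functions do.

```r
# Species names: spaces to underscores, lower case
clean_species <- function(x) tolower(gsub(" ", "_", x, fixed = TRUE))

# "um" as a whole unit token becomes the micro sign plus "m"
fix_micro <- function(units) gsub("\\bum\\b", "\u00b5m", units, perl = TRUE)

# Numbers written with a decimal comma, read in as text
parse_decimal_comma <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

species <- clean_species(c("Quercus robur", "Pinus sylvestris"))
units <- fix_micro(c("um", "um^2", "g"))
values <- parse_decimal_comma(c("1,5", "2"))

print(species)
print(values)
```

For whole files with decimal commas, read.csv2 (or dec = "," in read.table) does the same job at load time.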

natdb_citations

It would be helpful if the user could see all the data sets that can be loaded by NATDB before trying to download all the data. Currently, natdb_citations is stored in sysdata to be used by citations(), but I suggest moving it (or just making a copy of it) to /data so that the user can load it directly with data(natdb_citations).

I can make some changes and put in a pull request, but first I need a bit of a tutorial on how you maintain the package so I do not mess it up (i.e., do you use Roxygen or not etc.).

Installation fails with devtools::install_github('willpearse/MADtraits')

Hi,

In an R project using renv, I just tried installing MADtraits with devtools::install_github('willpearse/MADtraits') but it failed, the error message being:

* installing *source* package 'MADtraits' ...
** using staged installation
** R
Error in load(srcFile, e) : 
  (converted from warning) strings not representable in native encoding will be translated to UTF-8
ERROR: unable to build sysdata DB for package 'MADtraits'
* removing 'C:/Users/as80fywe/idiv/sTeTra/sTeTra/renv/library/R-4.0/x86_64-w64-mingw32/MADtraits'
Error: Failed to install 'MADtraits' from GitHub:
  (converted from warning) installation of package ‘C:/Users/as80fywe/AppData/Local/Temp/Rtmpqalzq5/file1fd479567573/MADtraits_1.0-0.tar.gz’ had non-zero exit status
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers

After a search online, installation succeeded using:

> withr::with_envvar(c(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true"), 
+                    remotes::install_github('willpearse/MADtraits')
+ )

Using this method, the same warnings appeared but no error:

* installing *source* package 'MADtraits' ...
** using staged installation
** R
Warning in load(srcFile, e) :
  strings not representable in native encoding will be translated to UTF-8
Warning in load(srcFile, e) :
  input string 'diplopylidium_nü¾�¦˜¼lleri' cannot be translated to UTF-8, is it valid in 'UTF-8' ?
** inst
** byte-compile and prepare package for lazy loading
n*** Successfully loaded .Rprofile ***n
** help
*** installing help indices
  converting help for package 'MADtraits'
    finding HTML links ... done
    MADtraits                               html  
    MADtraits_citations                     html  
    MADtraits_datasets                      html  
    citations                               html  
    clean.MADtraits                         html  
    datasets                                html  
** building package indices
n*** Successfully loaded .Rprofile ***n
** installing vignettes
   'MADtraits-intro.Rnw' 
** testing if installed package can be loaded from temporary location
n*** Successfully loaded .Rprofile ***n
** testing if installed package can be loaded from final location
n*** Successfully loaded .Rprofile ***n
** testing if installed package keeps a record of temporary installation path
* DONE (MADtraits)

Sorry that I have no clue about what the problem is...

Cheers,

Alban

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] drake_7.12.5    MADtraits_1.0-0

loaded via a namespace (and not attached):
  [1] nlme_3.1-149      fs_1.5.0          bold_1.1.0        usethis_1.6.3    
  [5] lubridate_1.7.9   devtools_2.3.2    progress_1.2.2    filelock_1.0.2   
  [9] httr_1.4.2        rprojroot_1.3-2   tools_4.0.2       backports_1.1.10 
 [13] R6_2.4.1          DT_0.15           withr_2.3.0       tidyselect_1.1.0 
 [17] prettyunits_1.1.1 processx_3.4.4    curl_4.3          compiler_4.0.2   
 [21] cli_2.0.2         xml2_1.3.2        desc_1.2.0        triebeard_0.3.0  
 [25] mvtnorm_1.1-1     callr_3.4.4       handlr_0.2.0      convertr_0.1     
 [29] stringr_1.4.0     digest_0.6.25     txtq_0.2.3        rmarkdown_2.4    
 [33] pkgconfig_2.0.3   htmltools_0.5.0   bibtex_0.4.2.3    sessioninfo_1.1.1
 [37] fastmap_1.0.1     htmlwidgets_1.5.2 rlang_0.4.7       readxl_1.3.1     
 [41] rstudioapi_0.11   httpcode_0.3.0    shiny_1.5.0       generics_0.0.2   
 [45] zoo_1.8-8         jsonlite_1.7.1    gtools_3.8.2      dplyr_1.0.2      
 [49] magrittr_1.5      Rcpp_1.0.5        fansi_0.4.1       ape_5.4-1        
 [53] RefManageR_1.2.12 lifecycle_0.2.0   stringi_1.5.3     yaml_2.2.1       
 [57] storr_1.2.1       MASS_7.3-53       pkgbuild_1.1.0    plyr_1.8.6       
 [61] grid_4.0.2        parallel_4.0.2    gdata_2.18.0      promises_1.1.1   
 [65] crayon_1.3.4      miniUI_0.1.1.1    lattice_0.20-41   conditionz_0.1.0 
 [69] hms_0.5.3         knitr_1.30        ps_1.3.4          pillar_1.4.6     
 [73] uuid_0.1-4        taxize_0.9.98     igraph_1.2.5      caper_1.0.1      
 [77] base64url_1.4     codetools_0.2-16  reshape2_1.4.4    pkgload_1.1.0    
 [81] crul_1.0.0        glue_1.4.2        rcrossref_1.1.0   evaluate_0.14    
 [85] data.table_1.13.0 remotes_2.2.0     renv_0.12.0       foreach_1.5.0    
 [89] vctrs_0.3.4       httpuv_1.5.4      urltools_1.7.3    testthat_2.3.2   
 [93] cellranger_1.1.0  purrr_0.3.4       tidyr_1.1.2       reshape_0.8.8    
 [97] assertthat_0.2.1  xfun_0.18         mime_0.9          xtable_1.8-4     
[101] later_1.1.0.1     tibble_3.0.3      iterators_1.0.12  suppdata_1.1-4   
[105] tinytex_0.26      memoise_1.1.0     ellipsis_0.3.1   
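
The warning about a string that "cannot be translated to UTF-8" suggests that at least one name in the package's internal data is not valid UTF-8. A sketch of how such strings could be located and re-encoded follows. The example string is made up, and treating the bad bytes as latin1 is an assumption, not something confirmed for this package.

```r
# Hypothetical example: the second string holds a latin1 byte (0xFC), so it is not valid UTF-8
x <- c("quercus_robur", "example_m\xfcller")

# Flag the strings that are not valid UTF-8
bad <- !validUTF8(x)

# Re-encode only the flagged strings, assuming they were written as latin1
x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")

print(bad)
print(all(validUTF8(x)))
```

Running the same check over the species names in the stored citation and dataset tables would show which entries need fixing before the data are saved again.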

Duplicates

It looks like there are duplicate functions in the R script:

.wilman.2014a, starting at line 98 (I have made some changes to this) and starting at line 445.

.cariveau.2016, starting at line 87 and starting at line 296.

.jennings.2016a, starting at 1013 and 1034.
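
Duplicates like these can be found automatically by parsing the script and counting top-level function definitions. A sketch on an inline example follows; for the real script, `parse(file = ...)` with the path to the downloads script would replace `parse(text = src)`. `is_fun_def` is a hypothetical helper.

```r
# Inline stand-in for the R script, with one function defined twice
src <- "
.wilman.2014a <- function(...) 1
.cariveau.2016 <- function(...) 2
.wilman.2014a <- function(...) 3
"
exprs <- as.list(parse(text = src, keep.source = FALSE))

# TRUE for top-level expressions of the form `name <- function(...) ...`
is_fun_def <- function(e) {
  is.call(e) &&
    identical(e[[1]], as.name("<-")) &&
    is.name(e[[2]]) &&
    is.call(e[[3]]) &&
    identical(e[[3]][[1]], as.name("function"))
}

defined <- vapply(Filter(is_fun_def, exprs),
                  function(e) as.character(e[[2]]), character(1))

# Names that are assigned a function more than once
dups <- unique(defined[duplicated(defined)])
print(dups)
```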

Make test suite

It's a bit terrifying we've gotten this far without one.

The downloads.R files do need one, regardless of what I might say about their pointlessness in the future, because in doing things like fixing #69 it's evident that having something to test against would be useful!!!

Function issues perez and delaRiva

Will, there is an error when loading the data that says: EOF within quoted string. I think it might be an issue with the quotes in Lat and Lon, but I'm not sure how to fix it. These are not as big data sets as I thought, so it might not be worth trying to fix them.


Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals,  : 
  invalid multibyte string at '<ba> 26<27> 23''S'
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

Everything else is good to go with these functions...


.perez.2014 <- function(...){
  data <- read.xls(ft_get_si('10.5061/dryad.d61jk/1', 'leaf%20traits%2c%20foliar%20freezing%20resistance%2c%20climatic%20niche.xlsx'), as.is=TRUE, sheet=1)
  data$Species <- tolower(gsub(" ", "_", data$Species, ignore.case = TRUE))
  metadata <- data[,c(2:4)]
  data <- data[,-c(2:4)]
  units <- c("cm^2", "gr/m^2", rep("N/mm^2",2), "NA", "latitude", "longitude")
  data <- .df.melt(data, "Species", units, metadata)
  return(data)
}

#similar issue for this as above.  I think it is the quotes in lat and long
.delaRiva.2015 <- function(...){
  data <- read.xls(ft_get_si('10.5061/dryad.dr275.2', 'Dryad_database.xls'), as.is=TRUE, sheet='Traits')
  data$Species <- tolower(gsub(" ", "_", data$Species, ignore.case = TRUE))
  metadata <- data[,c(2:7)]
  data <- data[,-c(2:7)]
  units <- c("m", "m^2", "cm^2", "g^-1", "m^2 Kg^-1", "μg g^-1", "%", "%", "g^-1", "g cm^-3", "g g^-1", "m g^-1", rep("NA", 4), "Latitude", "Longitude")
  data <- .df.melt(data, "Species", units, metadata)
  return(data)
}

Cheers,
Mal

Adding a function onto natdb

Hi Dr. Pearse,
I want to commit a function to the natdb downloads/r, but have had no luck whatsoever. I tried cloning the repository and pushing my changes, but to no avail.

.paquette.2015

Hi,

After a fresh installation of MADtraits, I tried downloading trait data but it failed.

> library(MADtraits)
> data <- MADtraits(cache = "./data/downloaded data/traits/MADtraits/cache")
Downloading/loading data
'.' --> 1%; '|' --> 10% complete
|.........|.........|.........|.........|.........|.........|..No encoding supplied: defaulting to UTF-8.
.......|....Error in output[[i]] : subscript out of bounds
In addition: There were 45 warnings (use warnings() to see them)

It appears that the error comes from .paquette.2015 because the downloaded supplementary file, a zip archive, is corrupt. Code from .paquette.2015():

> data <- as.data.frame(readxl::read_xls(unzip(unzip(suppdata::suppdata("10.1002/ece3.1456", 
+                                                     1)))[2], sheet = 2, na = c("", "NA")))
Error in unzip(unzip(suppdata::suppdata("10.1002/ece3.1456", 1))) : 
  invalid zip name argument
In addition: Warning message:
In unzip(suppdata::suppdata("10.1002/ece3.1456", 1)) : zip file is corrupt
  • I tried opening the archive from suppdata cache but 7zip confirms it can't be opened.
  • I tried opening other supplementary files in the same format that suppdata just downloaded and could open them.
  • I went on the journal website, downloaded and opened the supplementary without problem
  • If the corrupt archive is replaced by the good one manually downloaded from the site, a modified .paquette.2015() with only one unzip succeeds (copied below).
> .paquette.2015 <- function(...) {
+ data <- as.data.frame(readxl::read_xls(unzip(suppdata::suppdata("10.1002/ece3.1456", 
+                                                     1))[2], sheet = 2, na = c("", "NA")))
+ data <- data[8:nrow(data), 2:8]
+ colnames(data) <- c("species", "occurrence", 
+                     "average_maximum_height", "wood_density", 
+                     "seed_mass", "shade_tolerance", "nitrogen_per_leaf_mass_unit")
+ data <- data[, names(data) != "occurrence"]
+ units <- c("m", "g/cm^3", "mg", NA, "%")
+ data$species <- tolower(gsub(" ", "_", data$species))
+ for (i in 2:6) data[, i] <- as.numeric(data[, i])
+ allNAs <- apply(data[2:6], 2, is.na)
+ data <- data[rowSums(allNAs) < 5, ]
+ return(MADtraits:::.df.melt(data, "species", units = units))
+ }
> .paquette.2015()
New names:
* `` -> ...3
* `` -> ...4
* `` -> ...5
* `` -> ...6
* `` -> ...7
* ...
A Trait DataBase containing:
            Species Traits Data-points:
Numeric          61      5          297
Categorical       0      0            0
Total            61      5          297
Units present. 
Warning message:
In .paquette.2015() : NAs introduced by coercion

I will look closer into that, hope I can help,

Alban

One for later

  • Traits of riparian woody plants responding to hydrological and hydraulic conditions: a northern Swedish database. María Dolores Bejarano, Judit Maroto, Christer Nilsson, Francisca Constança Aguiar. 2016. Vol: 97, Pages: 2892. DOI: 10.1002/ecy.1533
  • LEDA
  • TRY
  • Root database

Windows errors

The following functions seem to error out on Windows machines. I would be grateful if someone could check them for me (please tick-off if these have been checked and found to work):

  • .ameztegui.2016
  • .delaRiva.2015
  • .goncalves.2018

...this merges two other issues I'm closing because it's likely the same bug related to read_excel.

Use of fulltext::ft_get_si() with gdata::read.xls()

ft_get_si() is not working when used in conjunction with read.xls(), as shown for the functions below. ft_get_si() works when used in conjunction with read.csv() and read.xls() works when used without ft_get_si(). This could potentially be a problem with my machine. For each function, I've included the function name and the line of code that throws an error.

Error message
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, : Intermediate file 'C:\Users\konrad\AppData\Local\Temp\RtmpsF6ra0\file1df45a2e6a0f.csv' missing!

kefi.2016
data <- read.xls(ft_get_si("10.5061/dryad.b4vg0", "chilean_metadata.xls"), sheet = 1)

maire.2016
data <- read.xls(ft_get_si("10.5061/dryad.j42m7.2", "globamax_data_160609 (for GEB ms).xlsx"), sheet = "Data")

These functions successfully download and read the files if I do not use ft_get_si(). grutters.2017, molinari.2014, and valido.2011 all use .xls(x) files from dryad and can be updated if this issue is fixed.

download file doesn't work

Loading from the downloaded temp file in .McCullough.2015() returns an error, whereas loading a local copy of the file from its file path does not.

Functions that don't work

Below is a list of the functions that don't work. This isn't a "list of shame", but please figure out what's going on with your functions and check each one off when it's done.

  • .valido.2011
  • .perez.2013
  • .mesquita.2015
  • .maire.2016
  • .delgado.2015
  • .abakumova.2016
  • .tian.2016
  • .McCullough.2015
