willpearse / madtraits
Make A Database of trait data
License: Other
readxl very nicely loads all the column names in Unicode, which means some column names have sub/superscripts for the units. I'm concerned this could cause problems down the line; check that it doesn't.
Here is a running list of problems with NATDB downloading on a PC. I give a description of what I think is going on, and then below that the code and the error.
First, there are lots that work. Basically anything that is a csv or a txt works. I stopped adding the ones that work.
# work fine -------------------------------------------
# .albouy.2015
data <- read.csv(unzip(suppdata("E096-203","Functional_data.zip", "esa_archives"), "Functional_data.csv"), sep=";")
# .anderson.2015
data<-read.csv(suppdata("10.1371/journal.pone.0166714",2))
# .artacho.2015
data <- read.csv2(suppdata("10.5061/dryad.qg062", "phenotypictraits.csv"), sep=';')
# .augspurger.2016a
data <- read.csv(suppdata("10.5061/dryad.56cn4","Data File 1. Diaspore traits.csv"), as.is=TRUE)
# .bengtsson.2016
data <- read.csv(suppdata('10.5061/dryad.62054', 'bengtsson_etal_2016_traits.csv'), as.is=TRUE)
# .comeault.2013
data <- read.table(suppdata("10.5061/dryad.ck2cm","Tcris_FHA_phenotypes.txt"),header=TRUE)
# .edwards.2015a
data <- read.csv(suppdata("E096-202", "Table1.csv", "esa_archives"))
# stopped adding ones that worked
ERROR 1
This is probably the hardest error. It involves read_excel and suppdata with a Dryad download. I tried using suppdata alone to download the xls files from Dryad and could not open the downloaded files, so there is some interaction going on here.
# does not work -----------------------------------------------------------------
# .ameztegui.2016
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.12b0h/2","FunctionalTraits_Dryad.xlsx")))
# .arnold.2016
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.t3d52", "Arnold_etal_2016_functecol_dataset.xlsx"), as.is = TRUE, skip = 3))
# .aubret.2012a
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.14cr5345", "Aubret%2053172.xlsx"), sheet=1))
#all of the above give this error
Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Evaluation error: error -103 with zipfile in unzGetCurrentFileInfo.
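One way to narrow this down is to check whether the file that suppdata hands back is a spreadsheet at all before read_excel touches it. The sketch below is only a diagnostic, not a fix: `sniff_spreadsheet` is a made-up helper name, and it relies on the fact that .xlsx files are zip archives and legacy .xls files are OLE2 compound documents, each with a fixed four-byte signature. Anything else (an HTML error page, a file mangled in transfer) comes back as "unknown".

```r
# Classify a downloaded spreadsheet by its leading "magic" bytes.
# .xlsx files are zip archives (they start with "PK\003\004");
# legacy .xls files are OLE2 compound documents (they start with D0 CF 11 E0).
sniff_spreadsheet <- function(path) {
  if (!file.exists(path)) return("missing")
  bytes <- readBin(path, what = "raw", n = 4L)
  if (length(bytes) < 4L) return("unknown")
  if (identical(bytes, as.raw(c(0x50, 0x4b, 0x03, 0x04)))) return("zip/xlsx")
  if (identical(bytes, as.raw(c(0xd0, 0xcf, 0x11, 0xe0)))) return("ole/xls")
  "unknown"
}

# Intended use (needs a network connection, so it is left commented out):
# sniff_spreadsheet(suppdata("10.5061/dryad.12b0h/2", "FunctionalTraits_Dryad.xlsx"))
```

If the result is "unknown" on Windows but "zip/xlsx" elsewhere, that would point at the download step rather than at read_excel.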
ERROR 2
This one just runs and runs. It is like the above one but it deals with figshare. So I think the problem is the suppdata and read_excel interaction (it does not matter where you are downloading from). If you can fix this interaction then ERRORS 1 and 2 might go away...
# .bello_bedoy.2015b
data <- as.data.frame(read_excel(
suppdata("10.6084/m9.figshare.1190766.v2","mating occurrencePzoe_2010.xls")
))
ERROR 3
I think the problem is with the suppdata function, not the unzip. It is having a problem with Wiley... is it because these are not open access?
# .benesh.2017
data <- read.csv(unzip(suppdata("10.1002/ecy.1680", 1), "CLC_database_lifehistory.csv"))
Error in download.file(url, destination, quiet = TRUE) :
cannot open URL 'https://onlinelibrary.wiley.com/action/'
In addition: Warning message:
In download.file(url, destination, quiet = TRUE) :
cannot open URL 'https://onlinelibrary.wiley.com/action/': HTTP status was '404 Not Found'
# .engemann.2016
data <- read.delim(unzip(suppdata("10.1002/ecy.1569", 1), "DataS1/GrowthForm_Final.txt"))
Error in download.file(url, destination, quiet = TRUE) :
cannot open URL 'https://onlinelibrary.wiley.com/action/'
In addition: Warning message:
In download.file(url, destination, quiet = TRUE) :
cannot open URL 'https://onlinelibrary.wiley.com/action/': HTTP status was '404 Not Found'
ERROR 4
This one needs a file extension.
# .cariveau.2016
data <- as.data.frame(read_excel(suppdata("10.1371/journal.pone.0151482", 3), sheet="TableS1_v2"))
Error: Missing file extension.
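A possible workaround, sketched under the assumption that the downloaded temp file simply has no extension: copy it to a temporary path that does have one and read that instead. `with_extension` is a hypothetical helper, and whether this particular supplement is .xlsx or .xls is not something I have checked.

```r
# Copy a file to a temporary path that carries the given extension, so that
# readers which guess the format from the file name have something to go on.
with_extension <- function(path, ext = ".xlsx") {
  stopifnot(file.exists(path))
  target <- tempfile(fileext = ext)
  if (!file.copy(path, target)) stop("could not copy ", path)
  target
}

# Intended use (needs readxl, suppdata and a network connection):
# path <- with_extension(suppdata("10.1371/journal.pone.0151482", 3), ".xlsx")
# data <- as.data.frame(readxl::read_excel(path, sheet = "TableS1_v2"))
```

readxl also provides read_xlsx() and read_xls(), which skip the format guessing, so calling one of those directly may be enough if the format is known.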
ERROR 5
This one crashes R and gives me a bomb in RStudio!
.friedman.2014 <- function(...){
data <- as.data.frame(read_excel(suppdata("10.5061/dryad.489c7","NILs_rawdata.xls")))
fread: wanted 1 got 0 loc=570826752
seek: wanted to seek to sector...
THEN R ABORTS
OK I got through to .friedman.2014 ...
I checked 16 functions, from .kraft.2015b to .marx.2016, alphabetically. The following functions did not have problems in the downloads:
.lagisz.2013
.limpens.2014a
.limpens.2014b
.lu.2016a
.lu.2016b
.lu.2016c
.lupold.2013
.martin.2016
The following functions had problems in the downloads:
.kraft.2015b: Potentially non-trait variables: abundance
.kuo.2014: Needs to be run twice to work, otherwise good
.lawson.2015: One species is "Tree (Sportsmans 57)", otherwise good
.lessard.2016: Missing metadata. Some units seem wrong: "?" and "range"
.lislevand.2006: "egg_mass" is measured in "f"
.madin.2016: No metadata. "Skeletal micro-density" units should be "g cm^-3". It looks like there are some numeric variables in the character list
.martin.2014: species name is just "O_canadensis"
.marx.2016: There are a few hundred empty rows in the numeric list with units. For example, see “data.frame(a$numeric$variable,a$numeric$units)[620:650,]”. Otherwise, good
I added citations to citations.tsv for the following functions:
.lagisz.2013
.lislevand.2006
.lupold.2013
.martin.2014
The xml2 package has been updated, and some internal functions within fulltext still use it. This is causing natdb's package checks to fail, as this is a deprecation warning.
I need to update fulltext before this can be submitted, and before the travis-ci builds can stop erroring!
@willpearse I do not know how to make a pull request for a wiki, but here I made some edits to the GitHub Things page that you can include if you like.
.kolbe.2011 seems to have metadata that is a copy of the data. What is going on?!
Currently doesn't work; needs a skip attribute and some love
...can't be loaded easily at present, which is silly as .df.melt turns them into that format. This is a problem for .madin.2016 and one other function you wrote 'on the day'.
If this error is because of a TyPEo you can take away my R card!
I am unable to download a lot of the data sets on my PC. I think the error is coming from fulltext::ft_get_si or gdata::read.xls
Example Code:
mydats <- c(".artacho.2015", ".vanier.2013", ".aubret.2012b")
dat <- natdb(datasets = mydats)
Error in findPerl(verbose = verbose) :
perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
But this code without the 3rd dataset works fine:
mydats <- c(".artacho.2015", ".vanier.2013")
dat <- natdb(datasets = mydats)
Note that these first two data sets use read.csv2 and read.delim not gdata::read.xls
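Since the first error comes from findPerl, a quick check is whether R can see a perl executable at all. The sketch below only inspects the PATH; `find_perl` is a made-up name, and gdata may look in other places too, so treat a negative result as a hint rather than proof.

```r
# Report whether a perl executable is visible on the PATH.
find_perl <- function() {
  perl <- unname(Sys.which("perl"))
  if (nzchar(perl)) perl else NA_character_
}

perl <- find_perl()
if (is.na(perl)) {
  message("perl not found on PATH; gdata::read.xls will need its perl= argument")
} else {
  message("perl found at: ", perl)
}
```

If perl is installed somewhere off the PATH, the full path can be passed to read.xls through the perl= argument that the error message mentions.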
has a weird encoding thing
has got encoding problems I think
...and never did. Whoever uploaded it needs to fix the error in it
Will - clean up the names
.fitzgerald.2017
.wright.2004
.kamath.2016
.husak.2016
jennings.2016a - this one is weird, as it's been duplicated but the two versions are different. Remember square root is x^1/2, log(mm), and abundance is #
When loading the data, an error is thrown because the DOI cannot be found. Run on a Windows system.
I don't think these traits have been transformed; I think they were transformed before analysis in the paper (not here).
Also, report logged units in the form log(mm): use log10(mm) or ln(mm) where the base is known, and log(mm) only if you don't know it. Not log_mm.
I would like to better understand what this package does, as it sounds interesting. I have read the main page, but I am not sure I understand.
Does it take a .csv file and format it to follow the trait database standard described in Kissling et al. 2018?
It doesn't look like you have this one: Functional traits of the understory plant community of a pyrogenic longleaf pine forest across environmental gradients
Currently only .wright.2004, but worth checking
I presume that's meant to be µm - if so, please go through and correct
Dear willpearse,
The last 4 entries in the citations.tsv file are missing the first column, which seems to be a row counter.
https://onlinelibrary.wiley.com/doi/full/10.1111/jeb.12068
https://doi.org/10.5061/dryad.kf490/4
https://royalsocietypublishing.org/doi/pdf/10.1098/rsbl.2005.0428
https://doi.org/10.1098/rsbl.2005.0428
https://onlinelibrary.wiley.com/doi/full/10.1111/evo.12132
https://doi.org/10.5061/dryad.qj811/1
https://royalsocietypublishing.org/doi/full/10.1098/rsbl.2014.0043
10.1098/rsbl.2014.0043
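Rows with a missing column can be found mechanically by counting the fields on each line. The sketch below is an assumption-laden illustration: `ragged_rows` is a hypothetical helper, the demo file is made up (it is not the real citations.tsv), and the "expected" field count is taken to be the most common one in the file.

```r
# Flag lines of a tab-separated file whose field count differs from the
# most common field count in that file.
ragged_rows <- function(path, sep = "\t") {
  n <- utils::count.fields(path, sep = sep, quote = "", comment.char = "")
  expected <- as.integer(names(which.max(table(n))))
  which(n != expected)
}

# Demo on a small made-up file: the last line lacks its row counter.
demo <- tempfile(fileext = ".tsv")
writeLines(c("1\t.lagisz.2013\tdoi-a",
             "2\t.lupold.2013\tdoi-b",
             ".martin.2014\tdoi-c"), demo)
ragged_rows(demo)
```

The returned values are line numbers in the file, so they can be checked by hand before anything is rewritten.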
...because the dot-files are invisible on Mac, which is great, but probably makes people not trust their cache!... Maybe zip up the cache at the very end or something so people can move it around? A thought, anyway...
cleaning.R file
gsub(" ", "_", species)
It would be helpful if the user could see all the data sets that can be loaded by NATDB before trying to download all the data. Currently, natdb_citations is stored in sysdata to be used by citations(), but I suggest moving it (or just making a copy of it) to /data so that the user can load it directly with data(natdb_citations).
I can make some changes and put in a pull request, but first I need a bit of a tutorial on how you maintain the package so I do not mess it up (i.e., do you use Roxygen or not etc.).
don't work
Hi,
In an R project using renv, I just tried installing MADtraits with devtools::install_github('willpearse/MADtraits')
but it failed, the error message being:
* installing *source* package 'MADtraits' ...
** using staged installation
** R
Error in load(srcFile, e) :
(converted from warning) strings not representable in native encoding will be translated to UTF-8
ERROR: unable to build sysdata DB for package 'MADtraits'
* removing 'C:/Users/as80fywe/idiv/sTeTra/sTeTra/renv/library/R-4.0/x86_64-w64-mingw32/MADtraits'
Error: Failed to install 'MADtraits' from GitHub:
(converted from warning) installation of package ‘C:/Users/as80fywe/AppData/Local/Temp/Rtmpqalzq5/file1fd479567573/MADtraits_1.0-0.tar.gz’ had non-zero exit status
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
After a search online, installation succeeded using:
> withr::with_envvar(c(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true"),
+ remotes::install_github('willpearse/MADtraits')
+ )
Using this method, the same warnings appeared but no error:
* installing *source* package 'MADtraits' ...
** using staged installation
** R
Warning in load(srcFile, e) :
strings not representable in native encoding will be translated to UTF-8
Warning in load(srcFile, e) :
input string 'diplopylidium_nü¾�¦˜¼lleri' cannot be translated to UTF-8, is it valid in 'UTF-8' ?
** inst
** byte-compile and prepare package for lazy loading
n*** Successfully loaded .Rprofile ***n
** help
*** installing help indices
converting help for package 'MADtraits'
finding HTML links ... done
MADtraits html
MADtraits_citations html
MADtraits_datasets html
citations html
clean.MADtraits html
datasets html
** building package indices
n*** Successfully loaded .Rprofile ***n
** installing vignettes
'MADtraits-intro.Rnw'
** testing if installed package can be loaded from temporary location
n*** Successfully loaded .Rprofile ***n
** testing if installed package can be loaded from final location
n*** Successfully loaded .Rprofile ***n
** testing if installed package keeps a record of temporary installation path
* DONE (MADtraits)
Sorry that I have no clue about what the problem is...
Cheers,
Alban
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] drake_7.12.5 MADtraits_1.0-0
loaded via a namespace (and not attached):
[1] nlme_3.1-149 fs_1.5.0 bold_1.1.0 usethis_1.6.3
[5] lubridate_1.7.9 devtools_2.3.2 progress_1.2.2 filelock_1.0.2
[9] httr_1.4.2 rprojroot_1.3-2 tools_4.0.2 backports_1.1.10
[13] R6_2.4.1 DT_0.15 withr_2.3.0 tidyselect_1.1.0
[17] prettyunits_1.1.1 processx_3.4.4 curl_4.3 compiler_4.0.2
[21] cli_2.0.2 xml2_1.3.2 desc_1.2.0 triebeard_0.3.0
[25] mvtnorm_1.1-1 callr_3.4.4 handlr_0.2.0 convertr_0.1
[29] stringr_1.4.0 digest_0.6.25 txtq_0.2.3 rmarkdown_2.4
[33] pkgconfig_2.0.3 htmltools_0.5.0 bibtex_0.4.2.3 sessioninfo_1.1.1
[37] fastmap_1.0.1 htmlwidgets_1.5.2 rlang_0.4.7 readxl_1.3.1
[41] rstudioapi_0.11 httpcode_0.3.0 shiny_1.5.0 generics_0.0.2
[45] zoo_1.8-8 jsonlite_1.7.1 gtools_3.8.2 dplyr_1.0.2
[49] magrittr_1.5 Rcpp_1.0.5 fansi_0.4.1 ape_5.4-1
[53] RefManageR_1.2.12 lifecycle_0.2.0 stringi_1.5.3 yaml_2.2.1
[57] storr_1.2.1 MASS_7.3-53 pkgbuild_1.1.0 plyr_1.8.6
[61] grid_4.0.2 parallel_4.0.2 gdata_2.18.0 promises_1.1.1
[65] crayon_1.3.4 miniUI_0.1.1.1 lattice_0.20-41 conditionz_0.1.0
[69] hms_0.5.3 knitr_1.30 ps_1.3.4 pillar_1.4.6
[73] uuid_0.1-4 taxize_0.9.98 igraph_1.2.5 caper_1.0.1
[77] base64url_1.4 codetools_0.2-16 reshape2_1.4.4 pkgload_1.1.0
[81] crul_1.0.0 glue_1.4.2 rcrossref_1.1.0 evaluate_0.14
[85] data.table_1.13.0 remotes_2.2.0 renv_0.12.0 foreach_1.5.0
[89] vctrs_0.3.4 httpuv_1.5.4 urltools_1.7.3 testthat_2.3.2
[93] cellranger_1.1.0 purrr_0.3.4 tidyr_1.1.2 reshape_0.8.8
[97] assertthat_0.2.1 xfun_0.18 mime_0.9 xtable_1.8-4
[101] later_1.1.0.1 tibble_3.0.3 iterators_1.0.12 suppdata_1.1-4
[105] tinytex_0.26 memoise_1.1.0 ellipsis_0.3.1
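The second warning in the log names a string that cannot be translated to UTF-8, so one thing worth trying is to scan the character data that goes into sysdata for entries that are not valid UTF-8. The sketch below uses base R's validUTF8(); `invalid_utf8` is a made-up helper, and which object in the package actually holds the offending string is not something the log tells us.

```r
# Return the elements of a character vector that are not valid UTF-8.
invalid_utf8 <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & !validUTF8(x)]
}

# Demo: a Latin-1 encoded u-umlaut (the single byte 0xFC) is not valid UTF-8.
bad  <- rawToChar(as.raw(c(0x6e, 0xfc, 0x6c)))  # the bytes for "n", 0xFC, "l"
good <- "diplopylidium"
invalid_utf8(c(good, bad))
```

Strings flagged this way could then be re-encoded with iconv() from whatever encoding the source file really used, which would have to be established separately.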
It looks like there are duplicate functions in the R script:
.wilman.2014a, starting at line 98 (I have made some changes to this) and starting at line 445.
.cariveau.2016, starting at line 87 and starting at line 296.
.jennings.2016a, starting at 1013 and 1034.
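Duplicates like these can be listed automatically by parsing the script and looking for names that are assigned more than once at the top level. This is a rough sketch: `duplicated_definitions` is a hypothetical helper, the demo script is made up, and it reports any repeated top-level assignment, not only functions.

```r
# List names that are assigned more than once at the top level of an R script.
duplicated_definitions <- function(path) {
  exprs <- parse(path, keep.source = FALSE)
  # Keep only top-level calls of the form `name <- value` or `name = value`.
  is_assign <- vapply(exprs, function(e) {
    is.call(e) && as.character(e[[1]])[1] %in% c("<-", "=") && is.name(e[[2]])
  }, logical(1))
  targets <- vapply(exprs[is_assign], function(e) as.character(e[[2]]), character(1))
  unique(targets[duplicated(targets)])
}

# Demo on a small made-up script with one repeated definition.
demo <- tempfile(fileext = ".R")
writeLines(c(".a.2016 <- function(...) 1",
             ".b.2014 <- function(...) 2",
             ".a.2016 <- function(...) 3"), demo)
duplicated_definitions(demo)
```

Something along these lines could also run as a package test, so that a later definition cannot silently override an earlier one.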
It's a bit terrifying we've gotten this far without one.
The downloads.R files do need one, regardless of what I might say about their pointlessness in the future, because in doing things like fixing #69 it's evident that having something to test against would be useful!!!
Will, there is an uploading error that says: EOF within quoted string. I think it might be an issue with the quotes in Lat and Lon, but I'm not sure how to fix it. The data sets are not as big as I thought, so it might not be worth trying to fix them.
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals, :
invalid multibyte string at '<ba> 26<27> 23''S'
In addition: Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
Everything else is good to go with these functions...
.perez.2014 <- function(...){
data <- read.xls(ft_get_si('10.5061/dryad.d61jk/1', 'leaf%20traits%2c%20foliar%20freezing%20resistance%2c%20climatic%20niche.xlsx'), as.is=TRUE, sheet=1)
data$Species <- tolower(gsub(" ", "_", data$Species, ignore.case = TRUE))
metadata <- data[,c(2:4)]
data <- data[,-c(2:4)]
units <- c("cm^2", "gr/m^2", rep("N/mm^2",2), "NA", "latitude", "longitude")
data <- .df.melt(data, "Species", units, metadata)
return(data)
}
#similar issue for this as above. I think it is the quotes in lat and long
.delaRiva.2015 <- function(...){
data <- read.xls(ft_get_si('10.5061/dryad.dr275.2', 'Dryad_database.xls'), as.is=TRUE, sheet='Traits')
data$Species <- tolower(gsub(" ", "_", data$Species, ignore.case = TRUE))
metadata <- data[,c(2:7)]
data <- data[,-c(2:7)]
units <- c("m", "m^2", "cm^2", "g^-1", "m^2 Kg^-1", "μg g^-1", "%", "%", "g^-1", "g cm^-3", "g g^-1", "m g^-1", rep("NA", 4), "Latitude", "Longitude")
data <- .df.melt(data, "Species", units, metadata)
return(data)
}
Cheers,
Mal
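If the quote marks in the coordinates are the culprit, the usual remedy for "EOF within quoted string" is to switch off quote handling when reading. The sketch below shows that on a made-up file using plain read.table; whether the same argument passes cleanly through read.xls for these two datasets is an assumption I have not tested. The "invalid multibyte string" part of the error looks like a separate encoding issue (the degree sign), which might additionally need a fileEncoding argument.

```r
# Demo file (made-up rows) in which coordinates contain an unmatched apostrophe.
demo <- tempfile(fileext = ".csv")
writeLines(c("Species,Latitude",
             "a_b,26d 23'S",
             "c_d,10d 05'N"), demo)

# quote = "" tells the reader not to treat ' or " as string delimiters,
# so minute and second marks are kept as ordinary characters.
coords <- read.table(demo, header = TRUE, sep = ",", quote = "",
                     stringsAsFactors = FALSE)
coords
```

The trade-off is that genuinely quoted fields (for example ones containing the separator) are no longer protected, so this only suits files where that does not occur.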
Hi Dr.Pearse,
I want to commit a function to the natdb downloads/r, but had no luck whatsoever. I tried making clones of the repository and pushing for change, but to no avail.
Hi,
After a fresh installation of MADtraits, I tried downloading trait data but it failed.
> library(MADtraits)
> data <- MADtraits(cache = "./data/downloaded data/traits/MADtraits/cache")
Downloading/loading data
'.' --> 1%; '|' --> 10% complete
|.........|.........|.........|.........|.........|.........|..No encoding supplied: defaulting to UTF-8.
.......|....Error in output[[i]] : subscript out of bounds
In addition: There were 45 warnings (use warnings() to see them)
It appears that the error comes from .paquette.2015, because the downloaded supplementary file, a zip archive, is corrupt. Code from .paquette.2015():
> data <- as.data.frame(readxl::read_xls(unzip(unzip(suppdata::suppdata("10.1002/ece3.1456",
+ 1)))[2], sheet = 2, na = c("", "NA")))
Error in unzip(unzip(suppdata::suppdata("10.1002/ece3.1456", 1))) :
invalid zip name argument
In addition: Warning message:
In unzip(suppdata::suppdata("10.1002/ece3.1456", 1)) : zip file is corrupt
> .paquette.2015 <- function(...) {
+ data <- as.data.frame(readxl::read_xls(unzip(suppdata::suppdata("10.1002/ece3.1456",
+ 1))[2], sheet = 2, na = c("", "NA")))
+ data <- data[8:nrow(data), 2:8]
+ colnames(data) <- c("species", "occurrence",
+ "average_maximum_height", "wood_density",
+ "seed_mass", "shade_tolerance", "nitrogen_per_leaf_mass_unit")
+ data <- data[, names(data) != "occurrence"]
+ units <- c("m", "g/cm^3", "mg", NA, "%")
+ data$species <- tolower(gsub(" ", "_", data$species))
+ for (i in 2:6) data[, i] <- as.numeric(data[, i])
+ allNAs <- apply(data[2:6], 2, is.na)
+ data <- data[rowSums(allNAs) < 5, ]
+ return(MADtraits:::.df.melt(data, "species", units = units))
+ }
> .paquette.2015()
New names:
* `` -> ...3
* `` -> ...4
* `` -> ...5
* `` -> ...6
* `` -> ...7
* ...
A Trait DataBase containing:
Species Traits Data-points:
Numeric 61 5 297
Categorical 0 0 0
Total 61 5 297
Units present.
Warning message:
In .paquette.2015() : NAs introduced by coercion
I will look closer into that, hope I can help,
Alban
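Separately from fixing .paquette.2015, the run above shows one failing dataset taking the whole download down with it. A common pattern for that is to wrap each loader in tryCatch and collect the failures. The sketch below is not how MADtraits is implemented; `load_all` and the two stand-in loaders are hypothetical.

```r
# Run each loader, collecting failures instead of stopping at the first one.
load_all <- function(loaders) {
  results <- lapply(loaders, function(f) tryCatch(f(), error = function(e) e))
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(data = results[!failed], errors = results[failed])
}

# Hypothetical stand-ins for dataset functions such as .paquette.2015
loaders <- list(
  ok     = function() data.frame(species = "a_b", value = 1),
  broken = function() stop("zip file is corrupt")
)
out <- load_all(loaders)
names(out$data)    # loaders that succeeded
names(out$errors)  # loaders that failed
```

The error objects keep their messages, so a summary of which datasets failed and why can be printed at the end of the run.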
The best way I can think of to keep an updated list of all the things to be incorporated is this Google Drive document: https://docs.google.com/spreadsheets/d/1J9t8nHJfY8qkMcdbm3PJKIwRqtQ8um_ED9FKFZbocUQ/edit?usp=sharing
I realise the ridiculousness of using such a thing on GitHub, but I really cannot bring myself to deal with a pull request every time someone finds a paper
problemo
The following functions seem to error out on Windows machines. I would be grateful if someone could check them for me (please tick-off if these have been checked and found to work):
.ameztegui.2016
.delaRiva.2015
goncalves.2018
...this merges two other issues I'm closing because it's likely the same bug related to read_excel.
Doesn't work
ft_get_si() is not working when used in conjunction with read.xls(), as shown for the functions below. ft_get_si() works when used in conjunction with read.csv(), and read.xls() works when used without ft_get_si(). This could potentially be a problem with my machine. For each function, I've included the function name and the line of code that throws an error.
Error message
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, : Intermediate file 'C:\Users\konrad\AppData\Local\Temp\RtmpsF6ra0\file1df45a2e6a0f.csv' missing!
kefi.2016
data <- read.xls(ft_get_si("10.5061/dryad.b4vg0", "chilean_metadata.xls"), sheet = 1)
maire.2016
data <- read.xls(ft_get_si("10.5061/dryad.j42m7.2", "globamax_data_160609 (for GEB ms).xlsx"), sheet = "Data")
These functions successfully download and read the files if I do not use ft_get_si(). grutters.2017, molinari.2014, and valido.2011 all use .xls(x) files from dryad and can be updated if this issue is fixed.
Loading from the downloaded temp file in .McCullough.2015() returns an error where a local copy of the file and associated file path does not.
I am getting this error when I try to install natdb... my guess is that the build is failing now?
Downloading GitHub repo willpearse/natbd@master
from URL https://api.github.com/repos/willpearse/natbd/zipball/master
Installation failed: Not Found (404)
Same is true for nacdb
Downloading GitHub repo pearselab/nacbd@master
from URL https://api.github.com/repos/pearselab/nacbd/zipball/master
Installation failed: Not Found (404)
Below is a list of the functions that don't work. This isn't a "list of shame", but figure out what's going on with your functions and check it off when it's done
.valido.2011
.perez.2013
.mesquita.2015
.maire.2016
.delgado.2015
.abakumova.2016
.tian.2016
.McCullough.2015
@willpearse I am wondering if there is an existing MADtraits db that it makes sense for us to contribute to. Is there a list of existing instances of MADtraits to examine?
This is for https://github.com/Extended-Bee-Network/bee-interaction-database
Thank you!
Genus names are ambiguous; they need to be filled in