ecologicaltraitdata / traitdataform

A package to manage and compile functional trait data into predefined templates

Home Page: https://ecologicaltraitdata.github.io/traitdataform/

License: Other

R 100.00%

dataset trait-datasets harmonization r-package ecology

traitdataform's Issues

include more trait datasets incl. Std version

the package should provide more datasets from the living spreadsheet (fdschneider/bexis_traits#20).

identify data for integration
write script to extract data upon call of data() (files are placed in 'data/' directory)
include documentation in package files 'R/data.R'

A standardised version of each dataset should be provided as well (linking to trait Thesauri and taxon Ontologies).

save warnings of taxon matching into own column

If a synonym was mapped to an accepted name, or spelling was corrected, an entry should show up in column warnings or taxonRemarks.

example data

I want to include a suite of example trait data. Criteria:

data are already publicly available
authors agree to re-publication within the package
they should cover the different scenarios for trait data: factorial vs. numeric, logical; matrix format vs. table format; literature vs. measured data

Possible data:

trait data from Gossner et al
pollinator traits of C. Weiner (Open Access on Bexis)
...

Readme symbols not translating on .io

An FYI, for some reason the apostrophes (') at the first header of your README (Package 'traitdataform') are not translating correctly to your .io landing page.

The apostrophes are showing up for me (Chrome Browser, English language) as " â��"

add traitmap structure

add traitmap as universal source for

trait input names
input of units and other measurement info
thesaurus mapping
factor level mapping
derived columns, via mutate() (e.g. to add ratios or indices, or logical traits)

pulldata("pantheria") not working

There is an error in the pantheria.R code that prevents pulldata("pantheria") from working.

The object name amniota needs to be changed to pantheria, and
read.csv needs to be changed to read.delim.

# current code (wrong object name, wrong reader)
amniota <- utils::read.csv("PanTHERIA_1-0_WR05_Aug2008.txt",
                           fileEncoding = "UTF-8",
                           stringsAsFactors = FALSE)

# proposed fix
pantheria <- utils::read.delim("PanTHERIA_1-0_WR05_Aug2008.txt",
                               fileEncoding = "UTF-8",
                               stringsAsFactors = FALSE)
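Why the reader matters can be shown with a small, self-contained sketch. It assumes the PanTHERIA file is tab-delimited, which is what the proposed switch to read.delim implies; the file content and column names below are made up for illustration and are not the real header.

```r
# Write a tiny tab-delimited file (made-up columns, not the real PanTHERIA header)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Order\tGenus\tBodyMass_g", "Rodentia\tMus\t19.3"), tmp)

# read.csv splits on commas, so every line collapses into a single column
wrong <- utils::read.csv(tmp, stringsAsFactors = FALSE)

# read.delim defaults to sep = "\t" and separates the fields as intended
right <- utils::read.delim(tmp, stringsAsFactors = FALSE)

ncol(wrong)
ncol(right)
```

With a tab-delimited input, only the read.delim result has one column per field.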

I composed the function mutate.traitdata() to modify a traitdata object and add derived traits.
Still open is the derivation of units for these traits: this requires units being given in the first place.

Since units can be provided in the trait standardization procedure, this is of minor importance.

This replaces issue #11.

add functionality to update user provided unit

After running the standardize.taxonomy function, the user-provided unit might need to be changed. Re-running as.traitdata at that point is not valid, and the mapping function does not offer this either. It might not be necessary to write a dedicated function; providing a one-liner that maps units according to the column "traitName" may be enough.
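A minimal sketch of such a mapping in base R, assuming a long-format table with the columns traitName and traitUnit (column names and values are assumptions for illustration, not taken from the package). It only relabels the unit; it does not convert the values.

```r
# Toy long-format table; column names are assumptions for illustration
traits <- data.frame(
  traitName  = c("body_length", "antenna_length", "body_mass"),
  traitValue = c(12.1, 4.3, 0.8),
  traitUnit  = c("mm", "mm", "mg"),
  stringsAsFactors = FALSE
)

# Named vector: trait name -> corrected unit
new_units <- c(body_mass = "g")

# Update only the rows whose traitName appears in the lookup
hit <- traits$traitName %in% names(new_units)
traits$traitUnit[hit] <- unname(new_units[traits$traitName[hit]])

traits
```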

Latin-1 character issues on CRAN

On 11.04.2019 at 08:00, Prof Brian Ripley wrote:

This concerns packages

[...] traitdataform [...]

which are failing their checks in a strict Latin-1 locale: see the debian-clang results. (Several of these seem to stem from vcr.)

On Linux, such a locale can be ensured via LC_CTYPE=en_US (which may need installing for distros that micro-package). AFAWK it cannot be done on Windows.

The character in don't is an (ASCII) apostrophe, not a right quote:

don’t

(with a right quote) is used in packages [...] (and others not failing).

en and em dashes are not portable, found in packages
[...] traitdataform [...] .

Using \uxxxx coding for non-ASCII chars in R character strings should help in some cases (see 'Writing R Extensions').

Please correct before May 10 to safely retain the package on CRAN.
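As a small illustration of the \uxxxx advice, the sketch below writes a right single quotation mark and an en dash as escapes so that the source file itself stays pure ASCII. The helper is_ascii is a hypothetical name introduced here, not part of any package.

```r
plain <- "don't"        # ASCII apostrophe (U+0027)
curly <- "don\u2019t"   # right single quotation mark, written as an escape
dash  <- "1\u20132"     # en dash, written as an escape

# Hypothetical helper: TRUE if every code point of a single string is ASCII
is_ascii <- function(x) all(utf8ToInt(x) < 128L)

is_ascii(plain)
is_ascii(curly)
is_ascii(dash)
```

For checking files rather than strings, tools::showNonASCIIfile() lists the offending lines.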

extract Trait Thesauri via API

TOP and T-Sita, as well as some other physiological Ontologies could be tapped as source for looking up and matching trait names to definitions and get URIs as traitID.

The package that seems to allow access to Ontologies from R is ontoCAT.

I'm not experienced enough with API usage. Maybe someone wants to invest time in this for a later version.

install issue on mac: package units

can't install package on mac:

ERROR: dependency ‘units’ is not available for package ‘traitdataform’

> install.packages('units')

   package ‘units’ is available as a source package but not as a binary

Warning in install.packages :
  package ‘units’ is not available (for R version 3.1.2)

add other methods for taxonomy matching

if GBIF matching does not return a match, fall back to other name resolution services.
Example:

get_gbif_taxonomy("Carabus arvensis")

add as.metadata() function

improve metadata handling by providing standard object class. A list of named objects, each a named list of metadata.

Method print.metadata() should produce the output we see at print.traitdata.
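A rough sketch of what such a class could look like, using hypothetical names (as_metadata_sketch, class "metadata_sketch") so it is not mistaken for the package's eventual implementation. For brevity it uses a flat named list rather than the nested structure described above.

```r
# Hypothetical constructor: wraps named metadata fields in a classed list
as_metadata_sketch <- function(...) {
  structure(list(...), class = "metadata_sketch")
}

# print method using the generic's signature (x, ...)
print.metadata_sketch <- function(x, ...) {
  for (field in names(x)) {
    cat(field, ": ", format(x[[field]]), "\n", sep = "")
  }
  invisible(x)
}

meta <- as_metadata_sketch(
  datasetName = "carabids",
  license = "http://creativecommons.org/publicdomain/zero/1.0/"
)
print(meta)
```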

`write_metadata()` function

to produce

human readable txt or md file, containing a header, trait definitions (if available), license statement, and dedicated space for manual additions.
metadata.xml or .eml

Wrapper function and option to map to ETS v0.9 for backwards compatibility

With recent commits the package now produces output according to ETS v0.10.

This may break code that relies on columns with a *Std ending. To fix this, you should redirect those calls to the plain terms. Calls to plain terms, e.g. scientificName should now point to verbatimScientificName. See definitions in ETS.

For data publications, always refer to the version of ETS that has been applied to avoid misunderstandings.

I will provide a wrapper function to produce output according to v0.9. (please vote here if you require it urgently)

Add argument 'template' to standardize()

The output created by the standardize functions is parsimonious, i.e. contains only the columns that have been explicitly provided. A template argument should provide the terms and order of terms that are desired as output. This would be used to create a standardized output, e.g. for upload to BExIS or other services that expect a particular structure.

The template could be just a vector of exact column names (from the vocabulary), or a named vector that renames columns according to the desired output. If wrapping around transform, this could even provide new computed columns.

This functionality adds considerable power to the package, since it allows mapping any input onto any output.
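The idea of a template could be sketched as follows. The helper apply_template is hypothetical and not part of traitdataform; it assumes the template is a character vector of column names, optionally named to rename columns, and it fills requested but missing columns with NA.

```r
# Hypothetical helper, not part of traitdataform
apply_template <- function(x, template) {
  # Unnamed entries keep their input name as output name
  if (is.null(names(template))) names(template) <- template
  empty <- names(template) == ""
  names(template)[empty] <- template[empty]

  # Add requested columns that are missing from the input, filled with NA
  for (col in setdiff(template, names(x))) x[[col]] <- NA

  # Select and order columns by the template, then rename
  out <- x[, unname(template), drop = FALSE]
  names(out) <- names(template)
  out
}

dat <- data.frame(
  traitValue = c(1.2, 3.4),
  scientificName = c("Carabus arvensis", "Carabus nemoralis"),
  stringsAsFactors = FALSE
)

res <- apply_template(dat, c(species = "scientificName", "traitName", "traitValue"))
names(res)
```

The output always has the same columns in the same order, regardless of which columns the input happened to contain.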

Installation Error on Windows

When I try to install the package on Windows, I get the following error message:

Warnung: Ausführung von Kommando 'curl -s -S "http://onlinelibrary.wiley.com/store/10.1002/ecy.1783/asset/supinfo/ecy1783-sup-0002-DataS1.zip?v=1&s=361647dd673d04c9b0838931cda1cf28e1f6eb1f" -o "C:\Users\mbiber\AppData\Local\Temp\RtmpkNdjTP\file2b702e22858.zip"' ergab Status 127
Error in download.file("http://onlinelibrary.wiley.com/store/10.1002/ecy.1783/asset/supinfo/ecy1783-sup-0002-DataS1.zip?v=1&s=361647dd673d04c9b0838931cda1cf28e1f6eb1f", :
'curl' call had nonzero exit status
Error : unable to load R code in package 'traitdataform'
ERROR: lazy loading failed for package 'traitdataform'
removing 'C:/Users/mbiber/Documents/R/win-library/3.4/traitdataform'
Installation failed: Command failed (1)

Cheers,

Matthias

factorial traits mapping

the function standardize.traits() is supposed to map factor levels provided into harmonized factor levels. For this, a more advanced mapping structure might be required in parameter traitmap.

Very Minor: documentation error for get_gbif_taxonomy

Hi,
This hardly deserves an "issue" but noticed that get_gbif_taxonomy documentation says that the default is fuzzy = FALSE but I think the code as written has the default as fuzzy = TRUE. Should be a quick fix when you next update :-)

get_gbif_taxonomy fails on synonymous genus

Hi,

I've encountered an issue with get_gbif_taxonomy breaking when trying to resolve a synonymous genus. See the example below:

traitdataform::get_gbif_taxonomy("Epiptera septentrionalis",subspecies = FALSE, verbose=TRUE, higherrank=FALSE, fuzzy=TRUE ,resolve_synonyms = TRUE )

The problem seems to be that a taxon is flagged as synonymous at any rank, but this function conducts a new get_gbifid_ search for only the species:

taxize::get_gbifid_(temp[[i]]$species[which.max(temp[[i]]$confidence)], messages = verbose)

Which is NULL, breaking the function

add Std versions of all datasets

each main dataset might contain a Std version, which is harmonized according to the traitdata standard. This requires

copying the examples from the vignette to the dataset definition
extending the documentation to describe the Std version of each dataset

standardize.taxonomy with subspecies names

I am trying to apply the function standardize.taxonomy to the passerines dataset.
Unfortunately, the call to standardize.taxonomy aborts at the species (Acrocephalus familiaris kingi).
Here is the code:

library(traitdataform)
# Merge Genus and Species into one column
passerines <- tidyr::unite(passerines, Genus, Species, col="scientificName", sep=" ")
# Separate species and subspecies by " " rather than "_"
passerines$scientificName <- sapply(passerines$scientificName, function(x) paste0(strsplit(x, split="_")[[1]][1:2], collapse=" "))
passerines$scientificName <- factor(passerines$scientificName)
passerines_std <- standardize.taxonomy(passerines, return="scientificNameStd")

Thank you very much for your help.

parameter in standardize(): contain all columns of glossary for BExIS

An additional parameter can be set to produce data that can easily be uploaded to BExIS.
It should be a parameter in standardize.exploratories() which will be applied only within the wrapper function standardize() if one of its parameters is set.

Problems following the instructions in README

I am trying to follow the steps in the README but stumbled over some issues.
I'm running traitdataform_0.2.6 (installed when using devtools::install_github()) in:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: NixOS 18.03.133070.89ff9f94b67 (Impala)

As this might be a rather specific setup I'm also using an Ubuntu docker container for testing:

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

In both instances I get the same errors. The first error encountered is:

> library(traitdataform)
> data(carabids)
Warning message:
In data(carabids) : data set 'carabids' not found

This dataset can be manually loaded via:

> source("/usr/local/lib/R/site-library/traitdataform/extdata/carabids.R")
> ls()
[1] "carabids"

Next, the creation of the thesaurus works, but an error occurs in a subtask of standardize():

> thesaurus <- as.thesaurus(
+        body_length = as.trait("body_length", 
+          expectedUnit = "mm", 
+          identifier = "length"
+          ), 
+        antenna_length = as.trait("antenna_length", 
+          expectedUnit = "mm", 
+          identifier = "antenna"
+          ),
+        metafemur_length = as.trait("metafemur_length", 
+          expectedUnit = "mm", 
+          identifier = "metafemur"
+          ),
+        eyewidth = as.trait("eyewidth_corr", 
+          expectedUnit = "mm", 
+          identifier = "eyewidth"
+          )
+ )
>                           
> traitdataset1 <- standardize(carabids,
+             thesaurus = thesaurus,
+             taxa = "name_correct",
+             units = "mm"
+             )
Input is taken to be a species -- trait matrix. If this is not the case, please provide parameters!
Error in taxize::get_gbifid_(resolved$matched_name2, verbose = verbose) : 
  unused argument (verbose = verbose)

Also setting verbose in standardize to any value explicitly does not help as the internal function taxize::get_gbifid_ does not seem to accept this parameter at all. At least not in the version installed.

Here is the full `sessionInfo` of the Ubuntu instance

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] traitdataform_0.2.6

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18      xml2_1.2.0        magrittr_1.5      units_0.6-0      
 [5] getPass_0.2-2     ape_5.1           lattice_0.20-35   R6_2.2.2         
 [9] rlang_0.2.2       foreach_1.4.4     httr_1.3.1        stringr_1.3.1    
[13] plyr_1.8.4        tools_3.4.4       parallel_3.4.4    bold_0.5.0       
[17] grid_3.4.4        data.table_1.11.4 nlme_3.1-131      iterators_1.0.10 
[21] tibble_1.4.2      httpcode_0.2.0    taxize_0.9.4      crayon_1.3.4     
[25] reshape2_1.4.3    codetools_0.2-15  bitops_1.0-6      triebeard_0.3.0  
[29] RCurl_1.95-4.11   curl_3.2          crul_0.6.0        stringi_1.2.4    
[33] pillar_1.3.0      compiler_3.4.4    urltools_1.7.1    XML_3.98-1.16    
[37] jsonlite_1.5      reshape_0.8.7     zoo_1.8-3

functionality to add occurrence and measurement level information

feeding in a data table to link to measurementID, occurrenceID, or locationID (for georeferencing exploratories data).

Update to ETS v0.10

The Ecological Trait-data Standard Vocabulary has been updated to v0.10 in the process of paper re-submission. Key terms have been modified to better reflect the verbatim or standardised character of entries. This now requires some major changes in the as.traitdata() as well as standardize() functions.

replace input column labels with verbatim* terms.
replace standard column labels with plain terms.
switch glossary to v0.10/ETS.csv

update function

there is need for a function that updates an already formatted traitdataset by

adding derived traits, e.g. ratios or binary traits,
updating units or measurement level information (#3)

derived columns, via mutate() (e.g. to add ratios or indices, or logical traits)

add print.trait() function

to preview trait definitions

revise vignette to use demo data

include datasets for testing

Since the references to the external trait datasets on Dryad and other sources are not stable enough, a next version should include the raw data in the package. This would include

arthropodtraits
heteroptera
heteroptera_raw
carabids

probably the other demo data would be removed to minimize these issues. Ideally, recipes for pulling datasets from published trait data (e.g. pantheria #43) would be provided elsewhere, e.g. through the Open Traits Network.

traitdataform not available from CRAN anymore (since April 30th 2021)

Hello traitdataform developers 👋
I don't know to what extent you're still working on traitdataform but I wanted to warn you that it is not available on CRAN with the following message:

Archived on 2021-04-30 as check problems were not corrected in time.

The check issues are available here: https://cran-archive.r-project.org/web/checks/2021/2021-04-30_check_results_traitdataform.html

It seems that there are two causes for the check errors:

The dataset arthropodtraits used in several places doesn't seem to be accessible through the package
A number of times a table is used but with duplicated row.names which make R complain

I can offer to try to make a PR to solve these issues and help have traitdataform back on CRAN as I think it's important that this package is accessible to a maximum number of people!

Thanks :)

resolve issues in get_gbif_taxonomy (bug report by Felix Neff)

in the function "get_gbif_taxonomy", the argument "verbose" is passed on to "taxize::get_gbifid_", which does not support it at all (line 8). Because of this I always got an error message right away. Once it is deleted, the function works.
also in "get_gbif_taxonomy", the function "taxize::gnr_resolve" is called (line 4). Two problems are hidden there:
- the function is given the argument "preferred_data_sources = c(11)". If the argument "data_source_ids = c(11)" is passed instead, the results are better, i.e. there are fewer missing matches (e.g. Cicindela_silvatica from the example dataset carabids is then found). I have not found out why this is, but it definitely seems to work more reliably.
  - more serious is that this function (taxize::gnr_resolve) returns a table that contains only the names for which matches were found. Since this table is ultimately joined to the original dataset by row number, even a single missing match leads to an error that propagates through all following rows (this also happens with Cicindela_silvatica in the example dataset!). I was able to solve the problem for myself by replacing "scientificName = x[i]" with "scientificName = resolved$user_supplied_name[i]" in line 56 of "get_gbif_taxonomy". In addition, I added the argument "all = TRUE" to the merge.data.frame call in "standardize.taxonomy" (line 7). Otherwise this function returns a shortened table (without the missing matches).
Also in standardize.taxonomy (line 5): temp <- method(levels(droplevels(as.factor(x$scientificName))), ...) prevents what is probably a frequent error message (when class == character).
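The row-number problem and the effect of keeping unmatched rows in the merge can be reproduced without any web service. In the sketch below, the data frame resolved is a stand-in for a resolver result that silently drops unmatched names; the IDs are made up. The report used all = TRUE; all.x = TRUE is used here because only the input rows need to be preserved.

```r
input <- data.frame(
  scientificName = c("Carabus arvensis", "Cicindela silvatica", "Abax ovalis"),
  stringsAsFactors = FALSE
)

# Stand-in for a resolver result that omits names without a match
resolved <- data.frame(
  scientificName = c("Carabus arvensis", "Abax ovalis"),
  taxonID = c("id-1", "id-3"),  # made-up identifiers
  stringsAsFactors = FALSE
)

# Joining by name (not by row number) keeps names and IDs aligned;
# the default merge still drops the unmatched input row
short <- merge(input, resolved, by = "scientificName")

# all.x = TRUE keeps every input row, with NA where no match was found
full <- merge(input, resolved, by = "scientificName", all.x = TRUE)

# Coercing to factor first makes levels() work for character input as well
unique_names <- levels(droplevels(as.factor(input$scientificName)))

nrow(short)
nrow(full)
```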

prepare for CRAN

Cleaning the package up for CRAN will require quite some work. The check produces plenty of warnings, mostly related to ASCII encoding, incomplete function definitions and documentation, generic method consistency, and some package dependencies.

Errors

* checking examples ... ERROR
Running examples in 'traitdataform-Ex.R' failed
The error occurred in:
...

library('traitdataform')
Error in library("traitdataform") :
there is no package called 'traitdataform'
Execution halted

Warnings

* checking S3 generic/method consistency ... WARNING
print:
function(x, ...)
print.thesaurus:
function(x)
print:
function(x, ...)
print.trait:
function(x)
print:
function(x, ...)
print.traitdata:
function(x)
See section 'Generic functions and methods' in the 'Writing R
Extensions' manual.
* checking Rd files ... WARNING
amphibio.Rd: non-ASCII input and no declared encoding
arthropodtraits.Rd: non-ASCII input and no declared encoding
as.traitdata.Rd: non-ASCII input and no declared encoding
heteroptera_raw.Rd: non-ASCII input and no declared encoding
mammaldiet.Rd: non-ASCII input and no declared encoding
read.service.Rd: non-ASCII input and no declared encoding
problems found in 'amphibio.Rd', 'arthropodtraits.Rd', 'as.traitdata.Rd', 'heteroptera_raw.Rd', 'mammaldiet.Rd', 'read.service.Rd'
* checking for missing documentation entries ... WARNING
Undocumented code objects:
'glossary'
All user-level objects in a package should have documentation entries.
* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'as.trait'
'relationSource' 'source'
Undocumented arguments in documentation object 'as.traitdata'
'id.vars' 'mutate' 'thesaurus'
Undocumented arguments in documentation object 'standardize'
'warnings'
Undocumented arguments in documentation object 'standardize.exploratories'
'plots' 'user' 'pswd' 'verbose' 'warnings'
Undocumented arguments in documentation object 'standardize.taxonomy'
'...'
Undocumented arguments in documentation object 'standardize.traits'
'...'
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.
* checking Rd contents ... WARNING
Argument items with no description in Rd object 'as.trait':
'version' 'author' '...'
Argument items with no description in Rd object 'as.traitdata':
'...'
Argument items with no description in Rd object 'standardize.traits':
'output'
* checking data for non-ASCII characters ... WARNING
Warning: found non-ASCII strings
* checking data for ASCII and uncompressed saves ... OK
WARNING
'qpdf' is needed for checks on size reduction of PDFs

Notes

* checking installed package size ... NOTE
installed size is 34.0Mb
sub-directories of 1Mb or more:
Meta 8.0Mb
R 3.0Mb
data 8.0Mb
doc 4.0Mb
help 5.0Mb
html 2.0Mb
* checking DESCRIPTION meta-information ... NOTE
License components which are templates and need '+ file LICENSE':
MIT
* checking top-level files ... NOTE
Non-standard files/directories found at top level:
'LICENSE.md' 'data_test' 'docs' 'draft'
* checking R code for possible problems ... NOTE
get_gbif_taxonomy: no visible binding for global variable 'matchtype'
get_gbif_taxonomy: no visible binding for global variable 'status'
mutate.traitdata: no visible binding for global variable 'traitName'
read.service: no visible global function definition for 'read.table'
read.service.blocks: no visible global function definition for
'zip.unpack'
read.service.blocks: no visible binding for global variable
'read.table'
standardize.exploratories: no visible binding for global variable
'Landuse'
standardize.exploratories: no visible binding for global variable
'Plot_ID'
standardize.traits: no visible binding for global variable
'traitNameStd'
Undefined global functions or variables:
Landuse Plot_ID matchtype read.table status traitName traitNameStd
zip.unpack
Consider adding
importFrom("utils", "read.table", "zip.unpack")
to your NAMESPACE file.
* checking Rd line widths ... NOTE
Rd file 'as.thesaurus.Rd':
\examples lines wider than 100 characters:
traitDescription = c("body length in mm", "length of antenna in mm", "length of metafemur in mm", "eye width in mm"),
traits1 <- as.thesaurus(read.csv("https://raw.githubusercontent.com/EcologicalTraitData/TraitDataList/master/traitdatastandard_traitlis ... [TRUNCATED]
Rd file 'cast.traitdata.Rd':
\examples lines wider than 100 characters:
traits = c("Body_Size", "Dispersal_ability", "Feeding_guild","Feeding_guild_short", "Feeding_mode", "Feeding_s ... [TRUNCATED]
metadata = list(license = "http://creativecommons.org/publicdomain/zero/1.0/")
Rd file 'get_gbif_taxonomy.Rd':
\examples lines wider than 100 characters:
get_gbif_taxonomy(c("Chorthippus albomarginatus", "Chorthippus apricarius", "Chorthippus biguttulus", "Chorthippus dorsatus", "Chorthip ... [TRUNCATED]
Rd file 'mutate.traitdata.Rd':
\examples lines wider than 100 characters:
traits = c("Body_Size", "Dispersal_ability", "Feeding_guild","Feeding_guild_short", "Feeding_mode", "Feeding_s ... [TRUNCATED]
metadata = list(license = "http://creativecommons.org/publicdomain/zero/1.0/")
Rd file 'rbind.traitdata.Rd':
\examples lines wider than 100 characters:
dataset2 <- mutate.traitdata(dataset2, antenna_length = Antenna_Seg1 + Antenna_Seg2 + Antenna_Seg3 + Antenna_Seg4 + Antenna_Seg3 )
Rd file 'standardize.traits.Rd':
\examples lines wider than 100 characters:
dataset2 <- mutate.traitdata(dataset2, antenna_length = Antenna_Seg1 + Antenna_Seg2 + Antenna_Seg3 + Antenna_Seg4 + Antenna_Seg3 )
These lines will be truncated in the PDF manual.
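The S3 generic/method consistency warning above is resolved by giving each print method the same arguments as the generic, print(x, ...). A minimal sketch with a toy class (trait_sketch is a made-up class name standing in for the package's own classes):

```r
# Toy object standing in for one of the package's classed objects
obj <- structure(
  list(trait = "body_length", expectedUnit = "mm"),
  class = "trait_sketch"
)

# The method signature matches the generic: print(x, ...)
print.trait_sketch <- function(x, ...) {
  cat("Trait:", x$trait, "\n")
  cat("Unit: ", x$expectedUnit, "\n")
  invisible(x)
}

print(obj)
names(formals(print.trait_sketch))
```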

call traitdatastandard directly from github source

include updated version,

add 'order' to traitdatastandard

standardize.exploratories()

function to resolve a locationID based on Biodiversities EPPlotID. matches provided ID against lookup table and fills in longitude and latitude.

define output of standardize.taxonomy()

short scientific name without author info
author info in extra column
kingdom, order, family only if requested by parameter
taxonID and taxonRank moved to back of output table (change order in template file)

refine warnings on get_gbif_taxonomy

Loving get_gbif_taxonomy so far.

I just ran it on a list my colleagues maintain of about 20,000 "valid" names of hymenopteran species. About 16 came back with the warning " Selected first of multiple equally ranked concepts!". Of these, the majority meet the following condition: scientificName == scientificNameStd. However, the ones that do not (at the threshold I used) seem likely to be mis-matched. It would be super helpful, I think, to provide a different warning for these two cases: when going through and manually checking results, it's great to have warnings in cases where the automation probably worked, but it's also nice to be able to focus easily on the ones most likely to be a problem.

Thanks!
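Until the warnings are refined, the two cases can be separated after the fact. This is a sketch on a toy result table, assuming the output has the columns scientificName, scientificNameStd and warnings; the taxon names are placeholders and the review column is introduced here for illustration.

```r
# Toy result table with placeholder names
res <- data.frame(
  scientificName    = c("Aus bus", "Cus dus"),
  scientificNameStd = c("Aus bus", "Cus eus"),
  warnings = " Selected first of multiple equally ranked concepts!",
  stringsAsFactors = FALSE
)

# Rows carrying the ambiguity warning
flagged <- grepl("multiple equally ranked", res$warnings, fixed = TRUE)

# Rows where the standardized name equals the input name
same <- res$scientificName == res$scientificNameStd

# Separate the likely-fine cases from those that deserve manual review
res$review <- ifelse(!flagged, "none",
               ifelse(same, "ambiguous, name unchanged", "ambiguous, name changed"))

res[, c("scientificName", "scientificNameStd", "review")]
```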

add unit handling in mutate.traitdata()

take given units into account when calculating derived traits (e.g. ratios) and return new unit.
update standardized terms
keep occurrenceID, if originating from one specimen.

function to combine datasets

a simple merger function that

combines the provided columns and
allows to add metadata columns.

function to cast traitdata back into matrix format

This is applying reshape::cast(). Only for 'aggregated' data without multiple measurements of one trait.

Otherwise, definitions must be set for how to aggregate data, if one taxon has multiple measurements for one trait. It is impossible to make advanced assumptions about the data quality.
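To illustrate why an aggregation rule is needed, the sketch below casts a toy long table into a taxon by trait matrix with base R's tapply instead of reshape::cast(). The column names and values are assumptions for illustration, and the mean is only one possible rule.

```r
# Toy long-format data: one taxon has two measurements of the same trait
long <- data.frame(
  scientificName = c("Aus bus", "Aus bus", "Aus bus", "Cus dus"),
  traitName  = c("body_length", "body_length", "antenna_length", "body_length"),
  traitValue = c(10, 12, 4, 20),
  stringsAsFactors = FALSE
)

# tapply builds the taxon x trait matrix; the aggregation function
# (here: mean) has to be chosen explicitly by the user
wide <- tapply(
  long$traitValue,
  list(long$scientificName, long$traitName),
  mean
)

wide
```

Combinations without any measurement come out as NA.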