
baad's Introduction

BAAD: a Biomass And Allometry Database for woody plants


About

The Biomass And Allometry Database (BAAD) contains data on the construction of woody plants across the globe. These data were gathered from over 170 published and unpublished scientific studies, most of which were not previously available in the public domain. We hope that making these data available will improve our ability to understand plant growth, ecosystem dynamics, and carbon cycling in the world's woody vegetation. The dataset is described further in the publication:

Falster, DS, RA Duursma, MI Ishihara, DR Barneche, RG FitzJohn, A Vårhammar, M Aiba, M Ando, N Anten, MJ Aspinwall, JL Baltzer, C Baraloto, M Battaglia, JJ Battles, B Bond-Lamberty, M van Breugel, J Camac, Y Claveau, L Coll, M Dannoura, S Delagrange, J-C Domec, F Fatemi, W Feng, V Gargaglione, Y Goto, A Hagihara, JS Hall, S Hamilton, D Harja, T Hiura, R Holdaway, LS Hutley, T Ichie, EJ Jokela, A Kantola, JWG Kelly, T Kenzo, D King, BD Kloeppel, T Kohyama, A Komiyama, J-P Laclau, CH Lusk, DA Maguire, G le Maire, A Mäkelä, L Markesteijn, J Marshall, K McCulloh, I Miyata, K Mokany, S Mori, RW Myster, M Nagano, SL Naidu, Y Nouvellon, AP O'Grady, KL O'Hara, T Ohtsuka, N Osada, OO Osunkoya, PL Peri, AM Petritan, L Poorter, A Portsmuth, C Potvin, J Ransijn, D Reid, SC Ribeiro, SD Roberts, R Rodríguez, A Saldaña-Acosta, I Santa-Regina, K Sasa, NG Selaya, SC Sillett, F Sterck, K Takagi, T Tange, H Tanouchi, D Tissue, T Umehara, H Utsugi, MA Vadeboncoeur, F Valladares, P Vanninen, JR Wang, E Wenk, R Williams, F de Aquino Ximenes, A Yamaba, T Yamada, T Yamakura, RD Yanai, and RA York (2015) BAAD: a Biomass And Allometry Database for woody plants. Ecology 96:1445. doi:10.1890/14-1889.1

At the time of publication, the BAAD contained 258526 measurements collected in 175 different studies, from 20950 individuals across 674 species. Details about the individual studies contributed to the BAAD are available in these online reports.

Using BAAD

The data in BAAD are released under the Creative Commons Zero public domain waiver, and can therefore be reused without restriction. To recognise the work that has gone into building the database, we kindly ask that you cite the above article or, when using data from only one or a few of the individual studies, the original articles if you prefer.

There are two options for accessing data within BAAD.

Download compiled database

You can download a compiled version of the database from any of the following:

  1. Ecological Archives. This is the version of the database associated with the corresponding paper in the journal Ecology.
  2. Releases we have posted on GitHub.
  3. The baad.data package for R.

The database contains the following elements:

  • data: amalgamated dataset (table), with columns as defined in dictionary
  • dictionary: a table of variable definitions
  • metadata: a table with columns "studyName","Topic","Description", containing written information about the methods used to collect the data
  • methods: a table with columns as in data, but containing a code for the methods used to collect the data. See config/methodsDefinitions.csv for codes.
  • references: both a summary table and BibTeX entries containing the primary source for each study
  • contacts: a table with contact information and affiliations for each study

These elements are available from Ecological Archives and from the GitHub releases as a series of CSV and text files.

If you are using R, by far the best way to access data is via our package baad.data. After installing the package (instructions here), users can run

baad.data::baad_data("1.0.0")

to download the version stored in Ecological Archives, or

baad.data::baad_data("x.y.z")

to download an earlier or more recent version (version numbers follow the semantic versioning guidelines). The baad.data package caches everything, so subsequent calls, even across sessions, are very fast. This should facilitate reproducibility by making it easy to depend on the version used for a particular analysis, and by allowing different analyses to use different versions of the database.
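Because the versions passed to baad_data() follow semantic versioning, you may want to check a version string before pinning an analysis to it. Below is a minimal sketch in base R. The helpers is_semver(), major_version() and is_major_change() are hypothetical and are not part of baad.data. The sketch assumes the usual reading of semantic versioning, in which only a change in the MAJOR number signals changes that may not be backward compatible.

```r
# Hypothetical helpers (not part of baad.data).
# TRUE if x has the semantic-versioning form MAJOR.MINOR.PATCH
is_semver <- function(x) grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", x)

# The MAJOR component of a single version string, as an integer
major_version <- function(x) as.integer(strsplit(x, ".", fixed = TRUE)[[1]][1])

# TRUE if moving from `from` to `to` crosses a MAJOR version boundary
is_major_change <- function(from, to) {
  stopifnot(is_semver(from), is_semver(to))
  major_version(to) != major_version(from)
}

is_major_change("1.0.0", "2.0.0")  # TRUE
is_major_change("1.0.0", "1.3.1")  # FALSE

# Base R orders version strings correctly ("1.10.0" > "1.9.0"),
# which a plain string comparison gets wrong
numeric_version("1.10.0") > numeric_version("1.9.0")  # TRUE
```

numeric_version() is used for ordering instead of comparing strings, because string comparison works character by character and puts "1.10.0" before "1.9.0".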

Further details about the different versions, and the changes between them, are available on the GitHub releases page and in the CHANGELOG.

Details about the data distribution system

The BAAD is designed to be a living database -- we will be making periodic releases as we add more data. These updates will correspond with changes to the version number of this resource, and each version of the database will be available on github and via the baad.data package. If you use this resource for a published analysis, please note the version number in your publication. This will allow anyone in the future to go back and find exactly the same version of the data that you used.

Rebuilding from source

The BAAD can be rebuilt from source (raw data files) using our scripted workflow in R. Beyond base R, building the BAAD requires the package 'remake'. To install remake, run the following from within R:

# installs the package devtools
install.packages("devtools")
# use devtools to install remake
devtools::install_github("richfitz/remake")

A number of other packages are also required (rmarkdown, knitr, knitcitations, plyr, whisker, maps, mapdata, gdata, bibtex, taxize, Taxonstand, jsonlite). These can be installed either within R using install.packages, or more easily using remake (instructions below).

The database can then be rebuilt using remake.

First download the code and raw data, either from Ecological Archives or from GitHub (as a zip file, or by cloning the baad repository):

git clone git@github.com:dfalster/baad.git

Then open R, set the downloaded folder as your working directory, and run:

# ask remake to install any missing packages
remake::install_missing_packages()

# build the dataset
remake::make("export")

# load dataset into R
baad <- readRDS('export/baad.rds')

A copy of the dataset is saved in the folder export, both as an rds file (compressed data for R) and as csv files.
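One quick sanity check after a rebuild is to compare the loaded dataset against the totals quoted in the About section. Below is a rough sketch. summarise_baad() is a hypothetical helper, and it rests on three assumptions: the data element has one row per individual, its identifier columns are named studyName and species, and "measurements" means non-missing values in all other columns. These assumptions may not match how the published totals were computed. In particular, any other non-measurement columns (such as location) would need to be added to id_cols.

```r
# Hypothetical helper: rough counts for the `data` element of the BAAD.
# Assumes one row per individual and identifier columns named studyName and
# species; every other column is treated as a measured variable.
summarise_baad <- function(data, id_cols = c("studyName", "species")) {
  measured <- data[, setdiff(names(data), id_cols), drop = FALSE]
  c(measurements = sum(!is.na(measured)),
    studies      = length(unique(data$studyName)),
    individuals  = nrow(data),
    species      = length(unique(data$species)))
}

# Toy example; with the rebuilt database this would be summarise_baad(baad$data)
toy <- data.frame(studyName = c("A", "A", "B"),
                  species   = c("x", "y", "x"),
                  h.t       = c(1.0, NA, 3.0),
                  m.lf      = c(0.1, 0.2, NA))
summarise_baad(toy)  # 4 measurements, 2 studies, 3 individuals, 2 species
```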

Reproducing older versions of the BAAD and the paper from Ecology

You can reproduce any version of the BAAD by checking out the commit that generated it, or by using the links provided under the releases tab. For example, to reproduce v1.0.0 of the database, which corresponds to the paper in Ecology and the manuscript submitted to Ecology, run:

git checkout v1.0.0

Then, in R, run:

remake::make("export")
remake::make("manuscript")

Contributing data to the BAAD

We welcome further contributions to the BAAD.

If you would like to contribute data, the requirements are

  1. Data collected are for woody plants
  2. You collected biomass or size data for multiple individuals within a species
  3. You collected either total leaf area or at least one biomass measure
  4. Your biomass measurements (where present) were from direct harvests, not estimated via allometric equations.
  5. You are willing to release the data under the Creative Commons Zero public domain dedication.
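Requirements 2 and 3 can be checked from the data itself before submitting. Requirements 1, 4 and 5 (woody plants, direct harvests, CC0 waiver) cannot. Below is a minimal sketch of such a check. check_contribution() is hypothetical, and the column names it expects are assumptions: species, a leaf area column a.lf, and biomass columns whose names start with "m.". Treat it as a quick screen, not as the project's acceptance test.

```r
# Hypothetical pre-submission check covering requirements 2 and 3 only.
# Assumed column names: `species`, leaf area `a.lf`, biomass columns "m.*".
check_contribution <- function(data, leaf_area_col = "a.lf", mass_prefix = "m.") {
  n_per_species <- table(data$species)
  mass_cols <- names(data)[startsWith(names(data), mass_prefix)]
  has_leaf_area <- leaf_area_col %in% names(data) &&
    any(!is.na(data[[leaf_area_col]]))
  has_mass <- length(mass_cols) > 0 &&
    any(!is.na(data[, mass_cols, drop = FALSE]))
  c(multiple_individuals = any(n_per_species > 1),  # requirement 2
    leaf_area_or_biomass = has_leaf_area || has_mass)  # requirement 3
}

ok <- data.frame(species = c("x", "x"), h.t = c(2.1, 3.4), m.lf = c(0.5, NA))
check_contribution(ok)  # both TRUE
```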

See these instructions on how to prepare and submit your contribution.

Once sufficient additional data have been contributed, we plan to submit an update to the first data paper, inviting as co-authors anyone who has contributed since the first data paper.

Acknowledgements

We are extremely grateful to everyone who has contributed data. We would also like to acknowledge the following funding sources for supporting the data compilation. D.S. Falster, A. Vårhammar and D.R. Barneche were employed on an ARC discovery grant to Falster (DP110102086) and a UWS start-up grant to R.A. Duursma. R.G. FitzJohn was supported by the Science and Industry Endowment Fund (RP04-174). M.I. Ishihara was supported by the Environmental Research and Technology Development Fund (S-9-3) of the Ministry of the Environment, Japan.


baad's Issues

Check encoding option

Previously, the 'latin1' encoding caused most of the Ishihara0000 data not to be read in at all. Check the encoding settings elsewhere.
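One way to check the encoding settings is to find which raw files are not valid UTF-8. The encoding can then be chosen per file, instead of applying one setting such as 'latin1' everywhere. Below is a minimal sketch using base R's validUTF8(). find_non_utf8() is a hypothetical helper, not an existing project function.

```r
# Hypothetical helper: return the files containing bytes that are not
# valid UTF-8 (readLines does not re-encode, so the raw bytes are checked).
find_non_utf8 <- function(files) {
  bad <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    !all(validUTF8(lines))
  }, logical(1))
  files[bad]
}

# Example with two temporary files: one plain ASCII, one with a Latin-1 "é"
ascii_file <- tempfile(fileext = ".csv")
latin1_file <- tempfile(fileext = ".csv")
writeLines("species,h.t", ascii_file)
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), latin1_file)  # bytes of "caf\xe9\n"
find_non_utf8(c(ascii_file, latin1_file))  # only latin1_file
```

The flagged files could then be read with read.csv(f, fileEncoding = "latin1") and the rest with fileEncoding = "UTF-8". This assumes every non-UTF-8 file really is Latin-1, which is worth confirming study by study.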

Yamada2000 and Yamada1996 - Incorrect status info

The author replied the same thing for both studies:
6. Is the 'Stand description' complete?
If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
ANSWER: status a primay hill dipterocarp forest

I have added a new_question.txt file asking for it to be fixed according to our standards.

Review Issue - Delagrange0000a - outliers

In the reply file 'report1-questions.txt', the author acknowledged that some data had incorrect units (I have already corrected those) and that the others are outliers. What do you want to do?
See reply below:
9. Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
If so, could you please review the original data file you provided us with and verify that all information is correct.
ANSWER: The crown width of some individuals and 2 LMA value were out of fit. These data were not in the right unit and it was corrected in "Cor_data_delagrange0000a.csv". Others are outliers.

Some data has disappeared?

In previous reports, it says we have 14152 measurements, but we now have only 13671, although I don't recall deleting anything?

I compared the number of observations in each raw data file with the number that shows up in the final dataset (code further below).

Results:
          study nmiss
1      Coll2008   132
2  McCulloh2010   348
3   O'Grady2006    10
4     Osada0000   184
5     Osada2003   223
6     Osada2005    49
7      Peri0000    12
8      Peri2008     9
9      Peri2011     9
10   Selaya2007   464
11   Selaya2008   535
12  Selaya2008b   263
13     Wang2011     4

NOTE: McCulloh2010 is expected to have many missing rows, because all samples were branch-level and only some were also tree-level. We still have to go through the other studies and check!

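dat <- loadStudies()$data
# as.character() in case dataset is stored as a factor
studynames <- as.character(unique(dat$dataset))
# read each raw data file, in the same order as studynames
rawdat <- lapply(studynames, readDataRaw)

# rows per study in the final data; index by name so the order matches
# studynames (table() sorts its names alphabetically)
nstudy <- as.vector(table(dat$dataset)[studynames])
nstudyraw <- sapply(rawdat, nrow)
nmiss <- nstudyraw - nstudy
# keep only the studies that lost rows during processing
df <- data.frame(study = studynames[nmiss > 0],
                 nmiss = nmiss[nmiss > 0])
df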

Aiba2005 - disregard light?

The author replied

  1. I see in your paper that you provide an average value for canopy openness [average 5.9 +- 0.7%], however we did not get individual information. Would you be willing to share individual-based info in case you have it?
    ANSWER: Canopy openness is available only for three individuals per species and therefore would be useless for purpose of a meta-analysis.

I followed the author's advice and removed this information. I did this in a separate commit on purpose in case you want to go back and keep it in the final dataset.

Peri 2008 and 2011 - reference style

Dan, could you please edit the BibTeX references for these studies? One (2008) is a proceedings abstract in Spanish and the other (2011) seems to be a book chapter.
The full references provided by the author can be found within the review folder.

Stancioiu2005 - coordinates

Guys, we got the following reply from the author

  1. Do your locations fall in the right spot in both world and country map?
    If not, please outline the issues here and provide us with updated longitude and latitude data.
    ANSWER: MAP IS CORRECT BUT LONG SHOWS NEGATIVE AND IT SHOLD BE POSITIVE

The longitude we have is -123.6556, which corresponds to California in the US. So I don't see any problems? Why does the author think that it should be positive?

Send test email to before sending to entire group

Sending email to the following people, based on code as of commit e685da2

rm(list=ls())

source('report/report-fun.R')
dat <- loadStudies(reprocess=TRUE)

emailReport(dat, "Lusk2012")
emailReport(dat, "Kelly0000a")
emailReport(dat, "Baltzer2007")
emailReport(dat, "Bond-Lamberty2002")

O'Hara0000 - reference

The author provided an 'in review' reference:
O'Hara, K.L., York, R.A. (in review). Leaf area and crown architecture in a giant sequoia spacing study: A new model for LAI development. Forest Science

What do we do?

Change variable names

During review, Remko and Daniel identified a number of variable name changes (recorded in a column of the variable definitions file). We need a function to find and replace a variable name throughout the project. Untested draft:

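    # Replace a variable name in every R script under `path` (untested draft).
    # Whole names only: "raw" is not replaced inside "rawdata" or "a.raw".
    rename_variable <- function(old, new, path = ".") {
      old_re <- gsub("([.\\\\^$|?*+()\\[\\]{}])", "\\\\\\1", old, perl = TRUE)  # escape regex characters
      pattern <- paste0("(?<![[:alnum:]._])", old_re, "(?![[:alnum:]._])")
      for (f in list.files(path, pattern = "\\.[Rr]$", recursive = TRUE, full.names = TRUE)) {
        r <- readLines(f)
        writeLines(gsub(pattern, new, r, perl = TRUE), f)
      }
    }
    rename_variable("raw", "rawdata")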

Non-existing data

For studies in which a certain variable (e.g., 'traits') was not measured, how do we treat it?
Keep as an empty cell, call it NA, N/A, or delete that row?
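One option, an assumption here and not a project decision, is to record every unmeasured value as NA. Deleting the row would also drop the variables that were measured for that individual. When the data are read, read.csv's na.strings argument maps empty cells and spellings such as "N/A" to NA, so numeric columns stay numeric. The small inline CSV below is made up for illustration.

```r
# Made-up example: one empty cell and two different spellings of "missing"
txt <- "species,h.t,m.lf\nx,1.2,\ny,N/A,0.5\nz,NA,0.7\n"

# Treat "", "NA" and "N/A" as missing when reading
d <- read.csv(text = txt, na.strings = c("", "NA", "N/A"))
str(d)  # h.t and m.lf are numeric, with NA where nothing was measured
```

Without "N/A" in na.strings, h.t would be read as a character column.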

Check leaf area for all conifers

  • Conifer leaf area - define ideal, then implement it.
  • Definition should be 'half of total surface area' (other studies often report either 'projected area' or 'total surface area')
  • check all studies with conifers to standardise measurements

Sterck0000 - Outliers

  1. Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
    If so, could you please review the original data file you provided us with and verify that all information is correct.
    ANSWER: for some surface area plots, the data are clear outliers, but apparantly because similar combinations of traits were not available for studies with large trees (and large surface areas....)?

O'Hara0000 - metadata

The author did not provide any metadata; I have added a new_question.txt file requesting it.

Attributes to a.ilf and m.ilf

dat <- loadStudies(reprocess=TRUE)
str(dat$data)

Note the

..- attr(*, "names")= chr NA NA NA NA ...

for a.ilf and m.ilf. Is this a side effect of a data processing problem?
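One possible cause, an assumption not confirmed in the processing code, is a lookup by name in a named vector. Keys that are missing from the lookup return NA values whose names are also NA, and assigning the result to a data frame column keeps that names attribute. The sketch below reproduces this with a hypothetical lookup table and removes the attribute with unname().

```r
lookup <- c(spA = 1.5, spB = 2.0)  # hypothetical named lookup table
keys <- c("spC", "spD")            # keys that are not in the lookup

df <- data.frame(id = 1:2)
df$a.ilf <- lookup[keys]           # NA values, names attribute c(NA, NA)
str(df)                            # a.ilf now carries a names attribute

# Fix: drop the names when creating the column
df$a.ilf <- unname(lookup[keys])
str(df)
```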

O'Hara1995 - no stand description

The author provided incomplete information:
6. Is the 'Stand description' complete?
If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
ANSWER: STAND DESCRIPTION: Multiaged ponderosa pine stands on range of soils and plant associations.

I have added a new_question.txt file so that he can re-check his answer against the variable definition guidelines.

Review Issue - O'Hara1995 - all-sided-leaf area

What do we do with this? dataMatchColumns indicates m2 anyway, but will all-sided leaf area lead to odd comparisons with other studies?

  1. It seems that the units you provided for leaf area might be wrong. You provided m2 but we suspect it may be cm2. Could you please double-check this information? The original indicates m2 indeed, but maybe the original dataset is expressed in cm2?
    ANSWER: should be m2, but should be specified as "all-sided leaf area"

Review Issue - Ilomaki2003 - Stand?

Guys, please check this out. It will be easy to modify, but we first need to define one status category for each stand description, even if arbitrarily.

  1. Is the 'Stand description' complete?
    If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
    ANSWER: I don't quite understand what is meant by the codes dominant, codominant and suppressed in terms of a stand as they relate to trees, but in my understanding the sparse stand was all dominants, the medium and the dense had dominants and suppressed trees (now the medium seems to have the most dominant trees)

Function to check setup is correct

  • Check input files (make function to do this routinely, perhaps as
    the project files are updated).
    • order of column names should correspond
    • no duplicate column names
    • no NA values in var_in (e.g., was in O'Hara0000)
      This is harder than it looks because things like "Genus" are
      sometimes missing from the matching columns. Some work will be
      needed here.
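The checks listed above can be sketched as a single function per study. check_match_columns() is hypothetical. The inputs are assumptions: data_cols holds the column names of a study's data.csv, and var_in holds the matching column from its column-matching table (e.g. dataMatchColumns.csv). The order check is strict, so cases like a missing "Genus" would need extra handling, as noted above.

```r
# Hypothetical setup check for one study. Returns a character vector
# describing each problem found (empty if none).
check_match_columns <- function(data_cols, var_in) {
  problems <- character(0)
  dups <- unique(data_cols[duplicated(data_cols)])
  if (length(dups) > 0)
    problems <- c(problems, paste("duplicate column names:", paste(dups, collapse = ", ")))
  if (anyNA(var_in))
    problems <- c(problems, paste(sum(is.na(var_in)), "NA value(s) in var_in"))
  # strict check: same names in the same order
  if (!identical(as.character(data_cols), as.character(var_in)))
    problems <- c(problems, "column names and var_in do not correspond in order")
  problems
}

check_match_columns(c("Genus", "h.t"), c("Genus", "h.t"))         # character(0)
check_match_columns(c("Genus", "h.t", "h.t"), c("Genus", "h.t", NA))  # three problems
```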

King1994 - Stand description

How do we define this description within our status categories?

"Saplings in understory and gaps of tropical humid forest on Barro Colorado Island, Panama"

Review Issue - O'Hara1995 - sapwood area

The author indicated that one of the variables was misinterpreted:

  1. Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
    If so, could you please review the original data file you provided us with and verify that all information is correct.
    ANSWER: WE DID NOT Have sapwood data at crown base so the one graph that shows our data as outliers is incorrect. We should have no data on that graph.

I'll leave it to you to decide what to do with the variable a.ss2 (wrongly converted into a.ssbc).

Review Issue - Aiba2007 - No height from base to first branch?

We originally added a question:

  1. Is crown height equivalent to height to crown base, being the height of first branch bearing a live leaf?
    ANSWER: Yes.

I went back to check the data.csv file but can't see this variable there. Did we delete it at some point? If so, why?

Ribeiro2011 - Delete 0 leaf mass?

  1. There are 9 data points where 'Leaf mass (kg)' is zero. Can you please check whether these data are correct?

ANSWER: The data is correct. The data are from deciduous species that at the moment were without leaves.

Kantola2004: individual-specific location still missing

Kantola provided location information for his dataset, giving lat, lon, status and vegetation type. However, these three locations are not linked to any particular individual, so they cannot be incorporated yet. I have added a new_question.txt file to the review folder so we can email him asking him to add this information to the data.csv file. This means that the location-level information he provided will ONLY be incorporated once we can assign each individual to a particular location.

Review Issue - Delagrange2004 - leafN

The author has provided the following answer:

  1. Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
    If so, could you please review the original data file you provided us with and verify that all information is correct.
    ANSWER: It seems that 2 populations or units are used for leaf N concentration in the entire dataset.

Function to compare two csv files and identify differences

We should write a function that compares two csv files and identifies the differences, so that we can see what has changed. Git does this, but only on a line-by-line basis. We want cell by cell.

Any ideas how to do this?

The testthat package can be used to compare two csv files, but it only reports whether or not they are equal:
library(testthat)
df1 <- read.csv(...)
df2 <- read.csv(...)
expect_equal(df1, df2)
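A cell-by-cell comparison is not hard to write in base R, provided both versions have the same shape. Below is a minimal sketch. diff_cells() is a hypothetical helper. It assumes both tables have the same dimensions, column names and row order, so added, removed or reordered rows are not handled. Values are compared as text, which also catches a cell changing to or from NA.

```r
# Hypothetical helper: one row per differing cell (row, column, old, new).
diff_cells <- function(old, new) {
  stopifnot(identical(dim(old), dim(new)), identical(names(old), names(new)))
  out <- list()
  for (col in names(old)) {
    a <- as.character(old[[col]])
    b <- as.character(new[[col]])
    # changed if exactly one side is NA, or both are present and differ
    changed <- xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
    rows <- which(changed)
    if (length(rows) > 0)
      out[[col]] <- data.frame(row = rows, column = col, old = a[rows],
                               new = b[rows], stringsAsFactors = FALSE)
  }
  if (length(out) == 0)
    return(data.frame(row = integer(0), column = character(0),
                      old = character(0), new = character(0),
                      stringsAsFactors = FALSE))
  res <- do.call(rbind, out)
  res <- res[order(res$row, res$column), ]
  rownames(res) <- NULL
  res
}

old <- data.frame(id = 1:3, leaf = c(1.5, 2, NA))
new <- data.frame(id = 1:3, leaf = c(1.5, 2.5, 3))
diff_cells(old, new)  # rows 2 and 3 of column "leaf"
```

With real files, old and new would come from read.csv() on the two versions of, say, data.csv.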
