dfalster / baad Goto Github PK

BAAD: a Biomass And Allometry Database for woody plants

License: Other

R 24.75% TeX 74.74% Shell 0.52%

baad's Introduction

BAAD: a Biomass And Allometry Database for woody plants

About

The Biomass And Allometry Database (BAAD) contains data on the construction of woody plants across the globe. These data were gathered from over 170 published and unpublished scientific studies, most of which were not previously available in the public domain. It is our hope that making these data available will improve our ability to understand plant growth, ecosystem dynamics, and carbon cycling in the world's woody vegetation. The dataset is described further in the publication:

Falster, DS, RA Duursma, MI Ishihara, DR Barneche, RG FitzJohn, A Vårhammar, M Aiba, M Ando, N Anten, MJ Aspinwall, JL Baltzer, C Baraloto, M Battaglia, JJ Battles, B Bond-Lamberty, M van Breugel, J Camac, Y Claveau, L Coll, M Dannoura, S Delagrange, J-C Domec, F Fatemi, W Feng, V Gargaglione, Y Goto, A Hagihara, JS Hall, S Hamilton, D Harja, T Hiura, R Holdaway, LS Hutley, T Ichie, EJ Jokela, A Kantola, JWG Kelly, T Kenzo, D King, BD Kloeppel, T Kohyama, A Komiyama, J-P Laclau, CH Lusk, DA Maguire, G le Maire, A Mäkelä, L Markesteijn, J Marshall, K McCulloh, I Miyata, K Mokany, S Mori, RW Myster, M Nagano, SL Naidu, Y Nouvellon, AP O'Grady, KL O'Hara, T Ohtsuka, N Osada, OO Osunkoya, PL Peri, AM Petritan, L Poorter, A Portsmuth, C Potvin, J Ransijn, D Reid, SC Ribeiro, SD Roberts, R Rodríguez, A Saldaña-Acosta, I Santa-Regina, K Sasa, NG Selaya, SC Sillett, F Sterck, K Takagi, T Tange, H Tanouchi, D Tissue, T Umehara, H Utsugi, MA Vadeboncoeur, F Valladares, P Vanninen, JR Wang, E Wenk, R Williams, F de Aquino Ximenes, A Yamaba, T Yamada, T Yamakura, RD Yanai, and RA York (2015) BAAD: a Biomass And Allometry Database for woody plants. Ecology 96:1445. doi:10.1890/14-1889.1

At the time of publication, the BAAD contained 258526 measurements collected in 175 different studies, from 20950 individuals across 674 species. Details about individual studies contributed to the BAAD are available in these online reports.

Using BAAD

The data in BAAD are released under the Creative Commons Zero public domain waiver, and can therefore be reused without restriction. To recognise the work that has gone into building the database, we kindly ask that you cite the above article or, when using data from only one or a few of the individual studies, the original articles if you prefer.

There are two options for accessing data within BAAD.

Download compiled database

You can download a compiled version of the database from any of:

Ecological Archives. This is the version of the database associated with the corresponding paper in the journal Ecology.
Releases we have posted on GitHub.
The baad.data package for R.

The database contains the following elements:

data: amalgamated dataset (table), with columns as defined in dictionary
dictionary: a table of variable definitions
metadata: a table with columns "studyName","Topic","Description", containing written information about the methods used to collect the data
methods: a table with columns as in data, but containing a code for the methods used to collect the data. See config/methodsDefinitions.csv for codes.
references: both a summary table and BibTeX entries containing the primary source for each study
contacts: table with contact information and affiliations for each study

These elements are available from Ecological Archives and the GitHub releases as a series of CSV and text files.

If you are using R, by far the best way to access data is via our package baad.data. After installing the package (instructions here), users can run

baad.data::baad_data("1.0.0")

to download the version stored in Ecological Archives, or

baad.data::baad_data("x.y.z")

to download an earlier or more recent version (version numbers follow the semantic versioning guidelines). The baad.data package caches everything, so subsequent calls, even across sessions, are very fast. This should facilitate greater reproducibility by making it easy to depend on the version used for a particular analysis, and by allowing different analyses to use different versions of the database.

Further details about the different versions and the changes between them are available on the GitHub releases page and in the CHANGELOG.

Details about the data distribution system

The BAAD is designed to be a living database: we will be making periodic releases as we add more data. These updates will correspond with changes to the version number of this resource, and each version of the database will be available on GitHub and via the baad.data package. If you use this resource for a published analysis, please note the version number in your publication. This will allow anyone in the future to go back and find exactly the same version of the data that you used.

Rebuilding from source

The BAAD can be rebuilt from source (raw data files) using our scripted workflow in R. Beyond base R, building the BAAD requires the package 'remake'. To install remake, from within R, run:

# installs the package devtools
install.packages("devtools")
# use devtools to install remake
devtools::install_github("richfitz/remake")

A number of other packages are also required (rmarkdown, knitr, knitcitations, plyr, whisker, maps, mapdata, gdata, bibtex, taxize, Taxonstand, jsonlite). These can be installed either within R using install.packages, or more easily using remake (instructions below).

The database can then be rebuilt using remake.

First download the code and raw data, either from Ecological Archives or from GitHub (as a zip file or by cloning the baad repository):

git clone git@github.com:dfalster/baad.git

Then open R, set the downloaded folder as your working directory, and run:

# ask remake to install any missing packages
remake::install_missing_packages()

# build the dataset
remake::make("export")

# load dataset into R
baad <- readRDS('export/baad.rds')

A copy of the dataset is saved in the export folder, both as an rds file (compressed data for R) and as csv files.

Reproducing older versions of the BAAD and the paper from Ecology

You can reproduce any version of the BAAD by checking out the commit that generated it, or by using the links provided under the releases tab. For example, to reproduce v1.0.0 of the database (corresponding to the paper in Ecology) and the manuscript submitted to Ecology:

git checkout v1.0.0

Then in R run

remake::make("export")
remake::make("manuscript")

Contributing data to the BAAD

We welcome further contributions to the BAAD.

If you would like to contribute data, the requirements are:

Data collected are for woody plants
You collected biomass or size data for multiple individuals within a species
You collected either total leaf area or at least one biomass measure
Your biomass measurements (where present) were from direct harvests, not estimated via allometric equations.
You are willing to release the data under the Creative Commons Zero public domain dedication.

See these instructions on how to prepare and submit your contribution.

Once sufficient additional data have been contributed, we plan to submit an update to the first data paper, inviting as co-authors anyone who has contributed since the first data paper.

Acknowledgements

We are extremely grateful to everyone who has contributed data. We would also like to acknowledge the following funding sources for supporting the data compilation. D.S. Falster, A. Vårhammar and D.R. Barneche were employed on an ARC Discovery grant to Falster (DP110102086) and a UWS start-up grant to R.A. Duursma. R.G. FitzJohn was supported by the Science and Industry Endowment Fund (RP04-174). M.I. Ishihara was supported by the Environmental Research and Technology Development Fund (S-9-3) of the Ministry of the Environment, Japan.


baad's Issues

Check encoding option

Previously, the 'latin1' encoding caused most of the Ishihara0000 data not to be read in at all. Check encoding settings elsewhere.
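One way to check encoding settings is to test whether a file's bytes are valid UTF-8 before choosing fileEncoding for read.csv(). The sketch below uses only base R; guess_file_encoding() and read_csv_checked() are hypothetical helpers, not project functions, and falling back to latin1 is an assumption. It is a heuristic: a latin1 file could, rarely, also be valid UTF-8.

```r
# Hypothetical helper: returns "UTF-8" if every line of the file is valid
# UTF-8, otherwise falls back to "latin1" (an assumed default, not a rule
# taken from the project).
guess_file_encoding <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # validUTF8() checks the raw bytes and ignores any declared encoding
  if (all(validUTF8(lines))) "UTF-8" else "latin1"
}

# Hypothetical wrapper: read a csv file, re-encoding from the guessed encoding
read_csv_checked <- function(path) {
  read.csv(path, fileEncoding = guess_file_encoding(path),
           stringsAsFactors = FALSE)
}
```

Running guess_file_encoding() over every raw data file would show which studies need a non-default encoding.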

Yamada2000 and Yamada1996 - Incorrect status info

The author replied the same thing for both studies:
6. Is the 'Stand description' complete?
If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
ANSWER: status: a primary hill dipterocarp forest

I have added a new_question.txt file asking for it to be fixed according to our standards.

Review Issue - Delagrange0000a - outliers

In the reply file 'report1-questions.txt', the author recognised that some data had incorrect units (I have already corrected those) and that others are outliers. What do you want to do?
See reply below:
9. Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
If so, could you please review the original data file you provided us with and verify that all information is correct.
ANSWER: The crown width of some individuals and 2 LMA values were out of fit. These data were not in the right units and were corrected in "Cor_data_delagrange0000a.csv". Others are outliers.

Some data has disappeared?

Previous reports say we have 14152 measurements, but we now have only 13671, although I don't recall deleting anything.

I compared the number of observations in each raw data file with what shows up in the final dataset (code further below).

Results:
study nmiss
1 Coll2008 132
2 McCulloh2010 348
3 O'Grady2006 10
4 Osada0000 184
5 Osada2003 223
6 Osada2005 49
7 Peri0000 12
8 Peri2008 9
9 Peri2011 9
10 Selaya2007 464
11 Selaya2008 535
12 Selaya2008b 263
13 Wang2011 4

NOTE: McCulloh2010 is expected to have many missing rows, because all samples were branch-level and only some were tree-level. We have to go through the other studies and check!

dat <- loadStudies()$data
studynames <- unique(dat$dataset)
rawdat <- lapply(studynames, readDataRaw)

# table() sorts studies alphabetically while unique() keeps the order of
# appearance, so index the counts by name to keep the two aligned
nstudy <- as.vector(table(dat$dataset)[studynames])
nstudyraw <- sapply(rawdat, nrow)
nmiss <- nstudyraw - nstudy
df <- data.frame(study = studynames, nmiss = nmiss)[nmiss > 0, ]
df
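The comparison above can also be written so that it never depends on the order in which studies appear. The sketch below is base R only; count_missing_rows() is a hypothetical helper, and it assumes the raw row counts are supplied as a vector named by study.

```r
# Hypothetical helper: rows present in the raw files but missing from the
# processed data, per study. `raw_counts` is assumed to be a named vector
# (names = study names) of raw row counts.
count_missing_rows <- function(processed_study, raw_counts) {
  # Count processed rows per study in the same order as raw_counts;
  # studies with no processed rows get a count of 0
  processed <- table(factor(processed_study, levels = names(raw_counts)))
  nmiss <- as.vector(raw_counts) - as.vector(processed)
  out <- data.frame(study = names(raw_counts), nmiss = nmiss,
                    stringsAsFactors = FALSE)
  out[out$nmiss > 0, , drop = FALSE]
}

# Toy example: the studies are deliberately not in alphabetical order
raw_counts <- c(Osada2003 = 5, Coll2008 = 3)
processed_study <- c(rep("Osada2003", 5), "Coll2008")
count_missing_rows(processed_study, raw_counts)
```

Matching by factor levels rather than by position means a study that lost all of its rows still appears, with its full raw count as missing.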

Aiba2005 - disregard light?

The author replied

I see in your paper that you provide an average value for canopy openness [average 5.9 +- 0.7%], however we did not get individual information. Would you be willing to share individual-based info in case you have it?
ANSWER: Canopy openness is available only for three individuals per species and therefore would be useless for the purpose of a meta-analysis.

I followed the author's advice and removed this information. I did this in a separate commit on purpose in case you want to go back and keep it in the final dataset.

New report function that emails correct files

Change leaf N to per mass, so consistent with nitrogen measures
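Area-based leaf nitrogen can be converted to a per-mass basis by dividing by leaf mass per area (LMA), since (mass of N / leaf area) / (leaf mass / leaf area) = mass of N / leaf mass. A minimal sketch; leaf_n_per_mass() and its argument names are hypothetical, and both inputs are assumed to use the same area units.

```r
# Convert area-based leaf N to mass-based: N_mass = N_area / LMA.
# Assumes n_area (kg N per m2 leaf) and lma (kg leaf per m2 leaf) share
# area units; the result is kg N per kg leaf (multiply by 1000 for mg/g).
leaf_n_per_mass <- function(n_area, lma) {
  # Missing or non-positive LMA cannot give a meaningful ratio
  ifelse(is.na(lma) | lma <= 0, NA_real_, n_area / lma)
}
```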

In the prepMapInfo function (plotting.r), why do some locations not return a country?

Are they in the sea?

Test code:

source('report/report-fun.R')
dat <- loadStudies(reprocess=FALSE)
prepdata <- prepMapInfo(dat$data)

Peri 2008 and 2011 - ref style

Dan, could you please edit the BibTeX reference for these studies? One (2008) is a proceedings abstract in Spanish and the other (2011) seems to be a book chapter.
The full references provided by the author can be found within the review folder.

Stancioiu2005 - coordinates

Guys, we got the following reply from the author

Do your locations fall in the right spot in both world and country map?
If not, please outline the issues here and provide us with updated longitude and latitude data.
ANSWER: MAP IS CORRECT BUT LONG SHOWS NEGATIVE AND IT SHOULD BE POSITIVE

The longitude we have is -123.6556, which corresponds to California in the US. So I don't see any problems? Why does the author think that it should be positive?
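Sign questions like this are easier to settle when each site's implied hemisphere is listed next to its name. In decimal degrees, negative longitudes are west of Greenwich and negative latitudes are south of the equator. The sketch below is hypothetical (check_coordinates() is not a project function) and only flags values outside the valid ranges; it cannot tell whether a sign is wrong without knowing where the site really is.

```r
# Hypothetical sanity check for site coordinates in decimal degrees:
# flags out-of-range values and reports the hemisphere implied by each sign.
check_coordinates <- function(lat, lon) {
  data.frame(
    lat = lat, lon = lon,
    valid = !is.na(lat) & !is.na(lon) & abs(lat) <= 90 & abs(lon) <= 180,
    ns = ifelse(lat >= 0, "N", "S"),
    ew = ifelse(lon >= 0, "E", "W"),
    stringsAsFactors = FALSE)
}
```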

Send test email before sending to the entire group

Sending email to the following people, based on code as of commit e685da2

rm(list=ls())

source('report/report-fun.R')
dat <- loadStudies(reprocess=TRUE)

emailReport(dat, "Lusk2012")
emailReport(dat, "Kelly0000a")
emailReport(dat, "Baltzer2007")
emailReport(dat, "Bond-Lamberty2002")

O'Hara0000 - reference

The author provided an 'in review' reference:
O'Hara, K.L., York, R.A. (in review). Leaf area and crown architecture in a giant sequoia spacing study: A new model for LAI development. Forest Science

What do we do?

Change variable names

During review, Remko and Daniel identified a number of variable name changes, listed in a column of the variable definitions file. We need a function to find and replace a variable name throughout the project. Untested draft:

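    # list only files ending in .R or .r
    fn <- list.files(pattern="\\.[Rr]$", recursive=TRUE, full.names=TRUE)
    for(i in seq_along(fn)){
       r <- readLines(fn[i])
       # \\b word boundaries stop "raw" from matching inside e.g. "rawdata"
       r <- gsub("\\braw\\b", "rawdata", r)
       writeLines(r, fn[i])
    }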
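Project variable names contain dots (e.g. h.c, a.ilf), which are regex wildcards and which \\b treats as word edges. A minimal sketch of a safer rename, assuming only letters, digits, dots and underscores make up a name; replace_variable_name() is hypothetical, not an existing project function.

```r
# Hypothetical helper: rename a variable in every .R/.r file under `dir`,
# matching whole names only. Dots and underscores count as part of a name,
# so renaming "h.c" leaves "h.c2" untouched.
replace_variable_name <- function(old, new, dir = ".") {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  # \Q...\E makes PCRE treat `old` literally (its dots are not wildcards);
  # the lookarounds require no name character on either side
  pattern <- paste0("(?<![A-Za-z0-9._])\\Q", old, "\\E(?![A-Za-z0-9._])")
  changed <- character(0)
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    updated <- gsub(pattern, new, lines, perl = TRUE)
    if (!identical(lines, updated)) {
      writeLines(updated, f)
      changed <- c(changed, f)
    }
  }
  # Return the modified files so the change can be reviewed with git diff
  invisible(changed)
}
```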

Non-existing data

For studies in which a certain variable (e.g., 'traits') was not measured, how do we treat it?
Keep as an empty cell, call it NA, N/A, or delete that row?

Revise report so that only requests "Essential" variables (i.e. not mat, map, family)

Review Issue - Bond-Lamberty2002 - some height data are wrong.

Check leaf area for all conifers

Conifer leaf area: define the ideal, then implement it.
The definition should be 'half of total surface area' (other measurements often use either 'projected area' or 'total surface area').
Check all studies with conifers to standardise measurements.
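Under the proposed definition, a total surface area converts directly (divide by 2), but projected area needs a species-specific conversion ratio that is not given here. A minimal sketch; the function, its method labels and the ratio argument are hypothetical, and the ratio must come from the literature or the contributor.

```r
# Standardise conifer leaf area to half of total surface area.
# `method` labels and the projected-to-half-total `ratio` are assumptions;
# the ratio is species-specific and must be supplied by the user.
standardise_conifer_leaf_area <- function(area, method, ratio = NA_real_) {
  switch(method,
    half_total = area,
    total      = area / 2,
    projected  = {
      if (is.na(ratio)) stop("a projected-to-half-total ratio is required")
      area * ratio
    },
    stop("unknown method: ", method))
}
```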

Sterck0000 - Outliers

Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
If so, could you please review the original data file you provided us with and verify that all information is correct.
ANSWER: for some surface area plots, the data are clear outliers, but apparently because similar combinations of traits were not available for studies with large trees (and large surface areas....)?

O'Hara0000 - metadata

The author did not provide metadata; I have added a new_question.txt file requesting it.

Attributes on a.ilf and m.ilf

dat <- loadStudies(reprocess=TRUE)
str(dat$data)

note the
..- attr(*, "names")= chr NA NA NA NA ...

for a.ilf and m.ilf.
Is this a side effect of a data processing problem?
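One possible source of NA names is indexing a named lookup vector with keys it does not contain: c(a = 1)[c("a", "b")] has names "a" and NA, and assigning such a vector to a data-frame column keeps them. Whatever the cause, a hedged sketch for stripping them (drop_column_names() is hypothetical):

```r
# Remove "names" attributes from every column of a data frame, so str()
# no longer reports ..- attr(*, "names") for columns such as a.ilf.
drop_column_names <- function(df) {
  df[] <- lapply(df, function(col) {
    names(col) <- NULL
    col
  })
  df
}
```

Fixing the lookup that introduced the names would be preferable; this only cleans up the result.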

O'Hara1995 - no stand description

The author provided incomplete information
6. Is the 'Stand description' complete?
If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
ANSWER: STAND DESCRIPTION: Multiaged ponderosa pine stands on range of soils and plant associations.

I have added a new_question.txt file so he can re-check the answer based on variable definitions guidelines.

Remove vegetation type for any non-field grown plants

Review Issue - O'Hara1995 - all-sided-leaf area

What do we do with this? dataMatchColumns indicates m2 anyway, but could this lead to some odd comparisons?

It seems that the units you provided for leaf area might be wrong. You provided m2 but we suspect it may be cm2. Could you please double-check this information? The original indicates m2 indeed, but maybe the original dataset is expressed in cm2?
ANSWER: should be m2, but should be specified as "all-sided leaf area"

Review Issue - Ilomaki2003 - Stand?

Guys, please check this out. It will be easy to modify, but we first need to arbitrarily define one status category for each stand description.

Is the 'Stand description' complete?
If not, could you please provide more information? Codes and legends for Growing_condition and Status are found within Variable.definition.csv file attached.
ANSWER: I don't quite understand what is meant by the codes dominant, codominant and suppressed in terms of a stand as they relate to trees, but in my understanding the sparse stand was all dominants, the medium and the dense had dominants and suppressed trees (now the medium seems to have the most dominant trees)

Check methods for height of crown base and crown depth

Check methods variables for h.c
Delete c.d variable, for these studies ensure h.c is calculated correctly (Aiba2005 Delagrange2004 Osada0000 Osada2003 Osada2005 Osunkoya2007 Petritan2009 Sterck0000)
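Crown depth is total height minus height to crown base, so where h.c is missing it can be derived as h.c = h.t - c.d before c.d is dropped. A minimal sketch, assuming all three variables are in the same units; fill_crown_base_height() is hypothetical.

```r
# Fill missing height to crown base (h.c) from total height (h.t) and
# crown depth (c.d) via h.c = h.t - c.d, then drop c.d.
fill_crown_base_height <- function(df) {
  stopifnot(all(c("h.t", "c.d") %in% names(df)))
  if (!"h.c" %in% names(df)) df$h.c <- NA_real_
  derive <- is.na(df$h.c) & !is.na(df$h.t) & !is.na(df$c.d)
  df$h.c[derive] <- df$h.t[derive] - df$c.d[derive]
  # A negative h.c means crown depth exceeds tree height: flag and blank it
  bad <- !is.na(df$h.c) & df$h.c < 0
  if (any(bad)) warning(sum(bad), " rows have c.d > h.t; h.c set to NA")
  df$h.c[bad] <- NA_real_
  df$c.d <- NULL
  df
}
```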

Function to check setup is correct

Check input files (make function to do this routinely, perhaps as
the project files are updated).
- order of column names should correspond
- no duplicate column names
- no NA values in var_in (e.g., was in O'Hara0000)
  This is harder than it looks because things like "Genus" are
  sometimes missing from the matching columns. Some work will be
  needed here.
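The three checks listed above can be sketched as one function. Everything here is hypothetical: check_setup() is not an existing project function, data is assumed to be a study's data.csv read with check.names = FALSE (otherwise read.csv silently renames duplicates), and match_cols its column-matching table with a var_in column. To cope with columns such as "Genus" being absent from the matching file, missing names are reported separately and the order check only compares names present in both.

```r
# Hypothetical checker: returns a character vector of problems (empty if none).
check_setup <- function(data, match_cols) {
  problems <- character(0)
  cols <- names(data)
  var_in <- as.character(match_cols$var_in)

  dups <- unique(cols[duplicated(cols)])
  if (length(dups) > 0)
    problems <- c(problems, paste("duplicate column names:",
                                  paste(dups, collapse = ", ")))
  if (anyNA(var_in))
    problems <- c(problems, "NA values in var_in")

  var_in <- var_in[!is.na(var_in)]
  absent <- setdiff(cols, var_in)
  if (length(absent) > 0)
    problems <- c(problems, paste("not listed in var_in:",
                                  paste(absent, collapse = ", ")))
  # Compare order only over names that appear in both files
  shared_data <- cols[cols %in% var_in]
  shared_match <- var_in[var_in %in% cols]
  if (!identical(shared_data, shared_match))
    problems <- c(problems, "column order differs between data and var_in")
  problems
}
```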

Write new email to send when everything is ok or needs further review.

Resolve taxonomy (family names, higher?)

using taxize or other method?

King1994 - Stand description

How do we define this description within our status categories?

"Saplings in understory and gaps of tropical humid forest on Barro Colorado Island, Panama"

Fix lai values that are text
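A hedged sketch for this: coerce the column to numeric but report which entries failed to parse, so they can be fixed at source rather than silently becoming NA. to_numeric_report() is a hypothetical helper.

```r
# Coerce a text column (e.g. lai) to numeric, reporting unparseable entries.
to_numeric_report <- function(x) {
  x_chr <- trimws(as.character(x))
  num <- suppressWarnings(as.numeric(x_chr))
  # Non-empty, non-missing entries that still failed to parse
  bad <- is.na(num) & !is.na(x_chr) & x_chr != ""
  if (any(bad))
    message("Unparseable values: ", paste(unique(x_chr[bad]), collapse = ", "))
  num
}
```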

Review Issue - Martin1998 - TempF or TempRF

The author said that the vegetation type was 'mesic temperate deciduous forest'
How does that fit into our existing categories? TempF or TempRF?

Check branch mass and determine consistent definition and check implemented this way

Check all studies with branch mass. Is definition consistent across studies?

Review Issue - O'Hara1995 - sapwood area

The author indicated that one of the variables was mistakenly interpreted

Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
If so, could you please review the original data file you provided us with and verify that all information is correct.
ANSWER: WE DID NOT HAVE sapwood data at crown base so the one graph that shows our data as outliers is incorrect. We should have no data on that graph.

I'll leave it to you to decide what to do with the variable a.ss2 (wrongly converted into a.ssbc).

User manual, exp for importing data

Review instructions in readme file

Parviainen1999 - report1_questions.txt came back empty

Will close this when the author returns the correct file.

Process for managing "stage" of import for each data folder

We want to track which folders within the data folder are complete or incomplete.

Review Issue - Nouvellon2010 - Solve problem with bib ref file

The abstract has special characters, and the authors' initials do not match the actual citation. How do we solve this?

Here's a possible solution that I tried for the problem with authors' names:
http://www.tex.ac.uk/cgi-bin/texfaq2html?label=bibtranscinit

What do you think?

Review Issue - Aiba2007 - No height from base to first branch?

We originally added a question:

Is crown height equivalent to height to crown base, being the height of first branch bearing a live leaf?
ANSWER: Yes.

I went back to check the data.csv file but can't see this variable there. Did we delete it at some point? If so, why?

Ribeiro2011 - Delete 0 leaf mass?

There are 9 data points where 'Leaf mass (kg)' is zero. Can you please check whether these data are correct?

ANSWER: The data is correct. The data are from deciduous species that at the moment were without leaves.

Send reminder email to contributors who haven't responded

Add custom diff to fix excel line endings problem

Look into git custom diff drivers.
Possible solution?
http://git-scm.com/book/en/Customizing-Git-Git-Attributes#Binary-Files

Check and or remove zero values
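Before deciding whether to remove zeros, it helps to list them by study and variable, since some are real (e.g. the leafless deciduous trees in Ribeiro2011) and others look like errors (e.g. h.t = 0 in Delagrange0000a). A minimal sketch assuming wide data with one row per individual and a studyName column; find_zero_values() is hypothetical.

```r
# List every zero in the numeric columns, with its study and row number.
find_zero_values <- function(df, study_col = "studyName") {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  hits <- lapply(num_cols, function(v) {
    rows <- which(df[[v]] == 0)  # which() skips NA comparisons
    if (length(rows) == 0) return(NULL)
    data.frame(study = df[[study_col]][rows], variable = v, row = rows,
               stringsAsFactors = FALSE)
  })
  # rbind ignores NULL entries; returns NULL if no zeros were found
  do.call(rbind, hits)
}
```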

Parviainen1999 - missing vegetation type

I have already asked the author for this specific item; just waiting for the reply.

Kantola2004: individual-specific location still missing

Kantola provided location information for his dataset indicating lat, lon, status and vegetation type. These three locations, however, are not linked to any particular individual, which makes it impossible to incorporate them. I have added a new_question.txt file into the review folder so we can email him asking him to add this information to the data.csv file. This means that the location-level information he provided will ONLY be incorporated once we can assign each individual to a particular location.

Get missing DOIs and abstracts

Martin1998 - location name missing

I added a new_question.txt file asking for it.

Move key functions to dataMashR package

I have created a new public repo called dataMashR (https://github.com/dfalster/dataMashR). The core data import functions (in R/import.R) should be moved over there and then used within this project as a package installed from GitHub.

Review Issue - Delagrange2004 - leafN

The author has provided the following answer:

Please review the plots showing how your data compares to other data in the study. Does your data fall well within the rest of the dataset? Are there outliers?
If so, could you please review the original data file you provided us with and verify that all information is correct.
ANSWER: It seems that 2 populations or units are used for leaf N concentration in the entire dataset.

Function to compare two csv files and identify differences

We should write a function that compares two csv files and identifies the differences, so that we can see what has changed. Git does this, but only on a line-by-line basis. We want cell by cell.

Any ideas how to do this?

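The testthat package can compare two csv files, but it is designed to pass or fail a test rather than to list every changed cell:
library(testthat)
df1 <- read.csv(...)
df2 <- read.csv(...)
# expect_equal() replaces the older expect_that(df1, equals(df2)) form
expect_equal(df1, df2)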
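A cell-by-cell report can be built in base R. This is a sketch under stated assumptions: diff_cells() is hypothetical, both versions must have the same columns with rows in the same order (no key-based row matching), and values are compared as text, so reading both files with read.csv(path, colClasses = "character") gives the most literal comparison.

```r
# Report each cell that differs between two versions of a table.
# A change between NA and a value counts as a difference.
diff_cells <- function(old, new) {
  stopifnot(identical(dim(old), dim(new)), identical(names(old), names(new)))
  out <- list()
  for (col in names(old)) {
    a <- as.character(old[[col]])
    b <- as.character(new[[col]])
    changed <- which(xor(is.na(a), is.na(b)) |
                     (!is.na(a) & !is.na(b) & a != b))
    if (length(changed) > 0)
      out[[col]] <- data.frame(row = changed, column = col,
                               old = a[changed], new = b[changed],
                               stringsAsFactors = FALSE)
  }
  if (length(out) > 0) {
    do.call(rbind, unname(out))
  } else {
    data.frame(row = integer(0), column = character(0),
               old = character(0), new = character(0))
  }
}
```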

Delagrange0000a bad h.t

Delagrange0000a has four values where h.t = 0
