ropensci / allodb
An R package for biomass estimation at extratropical forest plots.
Home Page: https://docs.ropensci.org/allodb/
License: GNU General Public License v3.0
@teixeirak and @gonzalezeb,
allodb is now public (following email thread). Here are the changes I made to adapt to this openness.
Added a few documents into .github/. These are fairly standard documents adapted from the tidyverse. For example, the document ISSUE_TEMPLATE is automatically used to guide users every time they open a new issue on GitHub.
Added some badges to reflect development status. The build status updates automatically every time we push a new commit to the master branch.
Tweaked the website a little bit -- more changes will come as I improve the template for all fgeo websites.
@gonzalezeb, please confirm the type of new columns:
New column should be
-------------------------------
sample_size integer
site_dbh_unit character
equation_form character
equation_allometry character
To ensure column types are interpreted as expected, write a function that outputs the column types to be passed to col_types (see ?readr::read_csv). For example, I used this approach in fgeo.tool::type_vft (https://forestgeo.github.io/fgeo.tool/reference/type_fgeo.html). Erika documented the type of each column in the metadata tables.
The code below outputs a list that can be passed to the col_types argument of readr::read_csv(). Before I can do this I need help from @gonzalezeb. I asked her to clean the column names of the master data (#31).
# Import and clean --------------------------------------------------------
library(tidyverse)
path_to_data <- here::here("data-raw/allotemp_main.csv")
# First pass: let readr guess the column types. Once types_allodb_master
# (see below) is filled in, pass it via col_types = types_allodb_master.
master <- readr::read_csv(path_to_data)
# This prints to screen the contents of types_allodb_master (see below)
types <- map(master, class) %>%
  enframe() %>%
  unnest(cols = value) %>%
  mutate(
    type = case_when(
      value == "character" ~ "c",
      value == "integer" ~ "i",
      value == "numeric" ~ "d"
    ),
    type = paste0(name, " = '", type, "',")
  ) %>%
  pull(type)
cat(types, sep = "\n")
# Determine column type explicitly to avoid surprises (see readr::read_csv)
# c = character,
# i = integer,
# n = number,
# d = double,
# l = logical,
# D = date,
# T = date time,
# t = time,
# ? = guess,
# _/- to skip the column.
types_allodb_master <- list(
# xxx
)
Once this is done I need to ask @gonzalezeb to confirm the type of each variable.
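As a rough illustration of the approach, here is a minimal sketch using only the four new columns listed above. The types are still pending confirmation, the helper name type_new_columns() is hypothetical, and the CSV contents are made up for the example.

```r
library(readr)

# Hypothetical helper: the four new columns mapped to readr's compact type
# codes ("i" = integer, "c" = character). Types are pending confirmation.
type_new_columns <- function() {
  list(
    sample_size = "i",
    site_dbh_unit = "c",
    equation_form = "c",
    equation_allometry = "c"
  )
}

# Toy file standing in for the master data (values are made up).
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "sample_size,site_dbh_unit,equation_form,equation_allometry",
    "12,cm,a*(DBH^b),0.1*(dbh^2.4)"
  ),
  path
)

# Build an explicit column specification instead of letting readr guess.
spec <- do.call(cols, type_new_columns())
master <- read_csv(path, col_types = spec)

# Check that each column came back with the declared type.
vapply(master, class, character(1))
```

Keeping the types in a function means the same specification can be reused wherever the file is read.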
Some data has the wrong encoding. devtools::check() throws these warnings:
Following this post, below is my best attempt to fix the problem. But the solution is not good enough: at best, the non-ASCII characters are removed. What I want is to replace them with the correct characters.
Suzanne said the encoding is "latin1" (https://goo.gl/KZiVbQ). But the conversion from latin-ascii doesn't work well enough (see below).
Maybe if I receive the data in .csv format, I can read it with the right encoding? Something like this: read.csv(data, encoding = "latin1").
library(tidyverse)
library(stringi)
library(allodb)
WSG %>%
mutate(encode = stri_enc_mark(species)) %>%
filter(encode != "ASCII") %>%
transmute(
original = species,
with_stri = stri_trans_general(species, "latin-ascii"),
with_iconv = iconv(species, "latin1", "ASCII", sub = "")
)
# # A tibble: 7 x 3
# original with_stri with_iconv
# <chr> <chr> <chr>
# 1 bigll3¡ bigll3A,A¡ bigll3
# 2 pequeña pequeAfA±a pequea
# 3 sp. ‘hairy’ sp. A¢a,¬EoehairyA¢a,¬a,,¢ sp. hairy
# 4 ‘giant A¢a,¬Eoegiant giant
# 5 dewevrei (De Wild.) J.LÀ dewevrei (De Wild.) J.LA-a,¬ dewevrei (De Wild.) J.L
# 6 normandii AubrÀŒ©v. & Pe normandii AubrA-a,¬A'A(C)v. & Pe normandii Aubrv. & Pe
# 7 pellegrinianum (J.LÀŒ©on pellegrinianum (J.LA-a,¬A'A(C)on pellegrinianum (J.Lon
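For comparison, here is a minimal sketch of re-encoding rather than transliterating, assuming the bytes really are latin1. The toy string is built by hand for the example; the output above suggests the real data may have been mangled in a different way, in which case this would not be enough.

```r
# "pequeña" as latin1 bytes: the ñ is the single byte 0xF1.
latin1_bytes <- c(charToRaw("peque"), as.raw(0xf1), charToRaw("a"))
species_latin1 <- rawToChar(latin1_bytes)

# Re-encode to UTF-8 instead of converting to ASCII, so that no character
# is dropped or replaced by a look-alike.
species_utf8 <- iconv(species_latin1, from = "latin1", to = "UTF-8")

Encoding(species_utf8)
species_utf8 == "peque\u00f1a"
```

When reading from a file, base R's read.csv() also takes a fileEncoding argument that re-encodes on input, which may be closer to what is needed than the encoding argument.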
Jim Lutz recommends that the R code should generate warning messages when unreliable allometries are applied.
There are several issues that need to be decided:
We don't currently have any way to estimate the error on biomass estimates for individual trees, and I'm not sure it's possible in the context of what we're doing.
@ervanSTRI, do you have any advice on this?
Hi Ervan, Erika and Krista,
(@ervanSTRI, @gonzalezeb, @teixeirak)
I'm following up the idea of building a function to calculate biomass for sites in the CTFS-ForestGEO network. I now need a table with one allometric equation per site and I would like to start with tropical forests.
-- Ervan, I remember you offered such a table; can you share it with me now?
-- Erika and Krista, I hope to meet you in the upcoming weeks.
BACKGROUND
I wrote a prototype function that calculates biomass using either default equations or the user's equations -- whichever is most specific. The prototype is explained here: https://forestgeo.github.io/bmss/articles/biomass.html. For now the function works with a dummy data set.
These are my next steps:
(1) For site-level equations, replace the dummy data set by the real data set for tropical forests (Ervan);
(2) Same for temperate forests (Erika and Krista);
(3) For species-level equations, replace the dummy data set by the real data set for tropical forests (Ervan);
(4) For species-level equations, replace the dummy data set by the real data set for temperate forests (Erika and Krista).
Cheers,
Mauro
From: Helene Muller-Landau
Sent: Tuesday, February 13, 2018 4:14 PM
best,
Helene
On Tue, Feb 13, 2018 at 3:07 PM, Teixeira, Kristina A. wrote:
Hi Helene,
Erika’s digging into adding allometry fitting methods to her database. What are the key features of fitting methods that you’d recommend she be sure to capture? Thanks, K.
The following files live in data-raw. Could you please document their source and purpose?
See the following files in the directory data-raw/is_this_to_document_or_relocate/:
If you are unsure whether these files should live in data-raw or elsewhere, talk to me or see http://r-pkgs.had.co.nz/data.html.
An issue (more of a reminder to myself) to tackle is converting biomass units from the original equations to the final output unit we want allodb to give (kg, Mg). That conversion factor (from inches or mm to cm, etc.) should be incorporated in the equation. Of special attention is the DBH used to build the original allometry. We have two options:
(Just to keep track of when and who has been invited.)
From: Teixeira, Kristina A.
Sent: Wednesday, January 24, 2018 1:40:02 PM (UTC-05:00) Eastern Time (US & Canada)
To: Lepore, Mauro; Rutishauser, Ervan; Muller-Landau, Helene; Davies, Stuart J.; McMahon, Sean; Arellano, Gabriel; Nathan G. Swenson; Wright, Joe; Jim Lutz
Cc: Gonzalez, Erika B.
Subject: ForestGEO biomass allometry data pub
Hi all,
As most/all of you already know, Mauro is preparing a new function to calculate biomass for the CTFS R package, and Erika is working to compile the best available biomass allometries and wood density data for all ForestGEO sites. Specifically, she’s consulting site PIs (particularly for temperate sites) as to the best allometries for their site, compiling species-specific allometries for temperate sites, compiling wood density data for tropical species, etc. We are aiming to formally document this and publish it as a data paper in Ecology, which will provide citable documentation of our methods and present a platform for future updates. We plan to invite all site PIs who contribute allometries or wood density data as coauthors.
At this early stage, we’d like to set up a videoconference for those of you who have already been contributing to this discussion and/or have significant interest and expertise in the subject in order to give an overview of our plans and get your comments. Those of you on this email are people who I think will be most interested in this effort, but please don’t feel obligated to engage, and feel free to add in anyone I’ve missed that you think would like to engage on this level.
We’re hoping to set up a meeting sometime within the next couple weeks. Erika will set up a poll.
We are managing this project in Github, and anyone who would like to have access to the repository should let Mauro know.
all my best,
Krista
This issue is in relation to issue #38, about the use of dba (diameter at stem base) in shrub allometries.
I have summarized what I think is the correct way to present these calculations. I will need Mauro’s help to convert this into a function that depends only on DBH.
Problem (this paragraph can be included in the description of the function):
Most available equations to calculate shrub biomass in temperate regions (Smith et al 1983, Lutz et al, 2014, Halpern and Millet, 1996) use diameter at the base of the stem (15 cm above ground or close to ground) as the independent variable. Given that at CTFS-ForestGEO plots the standardized diameter measurement for woody stems is at breast height (DBH, 1.37 m), we wrote a function to use stem DBH as input to calculate biomass for some shrub species (see site.species table). We suggest the following:
Potential solution (where I need help):
Step 1: Calculate the basal area of each stem within a tree
BA <- (pi / 4) * dataset$dbh^2
Step 2: Sum the basal area of each stem to get the basal area per tree
tree.sum.BA <- tapply(BA, dataset$tag, sum, na.rm = TRUE)
Step 3: Calculate the contribution of each stem to the summed basal area of its tree
BA.contribution <- BA / tree.sum.BA[as.character(dataset$tag)]
a) If an allometric equation uses DBA (diameter at base) as the independent variable, then:
Step 4: Calculate the diameter at the base of the shrub, assuming area is preserved
Step 5: Calculate AGB using the basal diameter equation (a*(DBA^b) or exp(a+b*ln(DBA)))
Step 6: Redistribute the biomass of the main stem to other stems, using the basal area contribution
b) If DBH is the independent variable in the equation (assuming that only the diameter of the main stem was measured in a shrub), then:
Step 4: Identify the main stem (stem with the largest DBH)
Step 5: Calculate AGB using the DBH of the main stem
Step 6: Redistribute the biomass to other stems within the tree, using the basal area contribution of each stem
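The steps above can be sketched end to end on toy data. This is only one reading of the proposal. It assumes "area preserving" means the basal area at the base equals the summed basal area of the stems, and it uses a power-law equation with made-up coefficients a and b.

```r
# Toy data: one shrub (tag 1) with three stems, one single-stem shrub (tag 2).
dataset <- data.frame(
  tag = c(1, 1, 1, 2),
  dbh = c(3, 2, 1, 4) # cm
)

# Hypothetical coefficients for a power-law allometry, AGB = a * D^b.
a <- 0.1
b <- 2.4

# Steps 1-3: basal area per stem, per shrub, and each stem's share.
BA <- (pi / 4) * dataset$dbh^2
tree.sum.BA <- tapply(BA, dataset$tag, sum, na.rm = TRUE)
stem_tag <- as.character(dataset$tag)
BA.contribution <- as.numeric(BA / tree.sum.BA[stem_tag])

# Case a, step 4: basal diameter whose area equals the summed stem area.
DBA <- sqrt(4 * tree.sum.BA / pi)
# Case a, step 5: shrub-level biomass from the basal-diameter equation.
agb_shrub_a <- a * DBA^b
# Case a, step 6: share the shrub's biomass among its stems.
agb_stem_a <- as.numeric(agb_shrub_a[stem_tag]) * BA.contribution

# Case b, steps 4-6: same idea, driven by the DBH of the largest stem.
main_dbh <- tapply(dataset$dbh, dataset$tag, max, na.rm = TRUE)
agb_shrub_b <- a * main_dbh^b
agb_stem_b <- as.numeric(agb_shrub_b[stem_tag]) * BA.contribution

data.frame(dataset, BA.contribution, agb_stem_a, agb_stem_b)
```

In both cases the stem-level values add up to the shrub-level estimate, because the contributions within each tag sum to one.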
Erika pushed her dataset Allometries_Temperate sites.xlsx.
Consider building a template for other databases the researchers may want to offer web-browsing.
The master data contains different representations of missing values, which are described here:
Now there is a problem. If we specify all possible representations of missing values (e.g. via the argument na to read_csv()), then we lose the information about what kind of missing value each one is.
A simple solution, I think, is to represent the kind of missing value as a new column. The original representations will all be coerced to NA but we could identify what kind of NA it is using the new column.
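A minimal sketch of that idea in base R, using one toy column. The codes "NI" and "NRA" come from the listing in this issue; the suffix _na_kind for the new column is a hypothetical naming choice.

```r
# Toy column mixing numbers with a missing-value code from the master data.
raw <- data.frame(
  dbh_max_cm = c("12.5", "NI", NA, "30"),
  stringsAsFactors = FALSE
)
na_codes <- c("NI", "NRA")

# Record the kind of missing value before coercing the codes to NA.
is_code <- raw$dbh_max_cm %in% na_codes
raw$dbh_max_cm_na_kind <- ifelse(is_code, raw$dbh_max_cm, NA_character_)

# Now the original column can be safely converted to its real type.
raw$dbh_max_cm[is_code] <- NA
raw$dbh_max_cm <- as.numeric(raw$dbh_max_cm)
raw
```

The value column ends up numeric, while the companion column still distinguishes a coded missing value from a plain NA.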
Here are the columns that have some representation of missing values, and the corresponding kind:
$`wsg`
[1] "NRA"
$wsg_id
[1] NA
$wsg_specificity
[1] NA
$c
[1] NA
$d
[1] NA
$dbh_min_cm
[1] "NI"
$dbh_max_cm
[1] NA "NI"
$sample_size
[1] NA "NRA"
$equation_id
[1] NA
$regression_model
[1] NA
$other_equations_tested
[1] NA "NRA"
$log_biomass
[1] NA
$bias_corrected
[1] NA
$bias_correction_factor
[1] NA "NRA"
$notes_fitting_model
[1] NA
$development_species
[1] NA
$ref_id
[1] NA
$wsg_source
[1] NA
$ref_wsg_id
[1] NA
$original_data_availability
[1] NA
$notes_to_consider
[1] NA
$warning
[1] NA
How do I generate a new equation_id at the same time that I am:
This is a direct question for @maurolepore
Some sites have local height allometries, which can be used to improve estimates.
We need to plan the structure for incorporating these in the database.
This is something that @gonzalezeb, @maurolepore, and @teixeirak should discuss in person.
Making a table of site and equation – based on E and wood-density – is possible. Currently, such an equation is in Ervan’s code. Example from comp.AGB():
On 2018-02-27 Erika and Krista and I met at SCBI. These are the topics we covered:
Workflow (Erika and Mauro)
Structure of the allodb project (Erika and Mauro)
Pending issues (Erika, Krista and Mauro)
We discussed and updated open issues.
FastField (Erika and Mauro)
We discussed work I've done with Jess Shue in preparation for using the FastField software to collect census data.
Overview of fgeo (All of Krista's lab)
I introduced Krista's lab to the fgeo package -- the single place to look for all things related to ForestGEO's software.
I’ll study the table and try it with the code I have.
https://twitter.com/mauro_lepore/status/968817868262526978
I'll be chatting with Daniel Falster later in March (2018). Daniel is the author of the database baad, published in Ecology.
I plan to ask Daniel general questions about his experience building baad -- mostly from the software side of things. @teixeirak and @gonzalezeb, if you want me to ask him some questions from the scientific side of things, please list them in this issue.
Include this in references:
Miles, P.D., Smith, W.B. (2009) Specific Gravity and Other Properties of Wood and Bark for 156 Tree Species Found in North America. Research Note NRS-38. Newtown Square, PA: U.S. Dept. of Agriculture, Forest Service, Northern Research Station.
I think you should install tools for package development. Even if you don't use them directly, you may want to run code that I wrote -- which often uses such tools.
To do this easily see this article.
Windows: Install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and install as instructed.
-- From http://usethis.r-lib.org/articles/articles/usethis-setup.html
I use these packages a lot and you may come across scripts that you need to run that require them. If you install them now, your workflow won't be interrupted later.
Now the type of the column wsg is defined as character. This is likely a bug and it should be of type double. @gonzalezeb can you confirm?
(Pulled from #12 (comment))
As we'll have a ton of coauthors on this paper, it will help to create a spreadsheet with name, email, affiliation. I have one from our GCB review that can be modified.
A reference for each dataset is stored and automatically tested for changes whenever we run all tests. I've just updated the reference datasets. A quick look to the difference between the old and new references suggests the changes are intentional.
This feature can be very useful in helping detect unintentional changes. @gonzalezeb let me know if at some point you want to learn how to use it yourself. You'll need to install devtools and testthat.
This was a silent issue. I should write a test to expose it should it happen again (which is most likely).
The database will allow application of different equations to trees of different size (e.g., switch from a highly specific allometry for small sizes to a more generic one at sizes above that sampled for the specific one). This is desirable. However, when a tree crosses a size threshold, estimates of woody growth and ANPP will be seriously flawed. To address this, we need to incorporate code that forces the application of only one allometric equation -- even outside its range -- to trees that grow over such a threshold.
Create R/data.R and create the roxygen skeleton to document all datasets in data/. Then ask Erika to fill the gaps in the documentation, reveal the missing datasets, and the mismatches between the data and the data_metadata -- which BTW may need to be specified programmatically so it doesn't get out of sync (via some function that would compare the names of the columns in each data set with the values of "Field" in its corresponding metadata-dataset).
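Such a comparison function could be as small as the sketch below. The name compare_with_metadata() and the toy tables are hypothetical; only the "Field" column name comes from the description above.

```r
# Hypothetical helper: compare a dataset's column names with the "Field"
# values of its metadata table and report what each side is missing.
compare_with_metadata <- function(data, metadata) {
  list(
    undocumented = setdiff(names(data), metadata$Field),
    not_in_data = setdiff(metadata$Field, names(data))
  )
}

# Toy inputs (not the real allodb tables).
equations <- data.frame(
  equation_id = "a1",
  ref_id = "r1",
  region = "x",
  stringsAsFactors = FALSE
)
equations_metadata <- data.frame(
  Field = c("equation_id", "ref_id", "notes"),
  stringsAsFactors = FALSE
)

result <- compare_with_metadata(equations, equations_metadata)
result
```

Run inside a test, two empty vectors would mean the data and its metadata are in sync.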
Follows https://github.com/forestgeo/allodb/issues/36#issuecomment-423217920.
Here is a good way to generate random ids in R: ids::random_id().
Disregard this issue. Put here in error.
Relates to https://github.com/forestgeo/allodb/issues/36.
For a few sites whose species lists I got from the ForestGEO website, I don't have species codes.
Here are some ways I think we may deal with this issue:
Throw a warning indicating the species that match the user's data for which allodb has no species-code. The match may be by species name -- if the user provides species data -- or by site or region, in which case we can list all species in that site or region with unknown species-code.
Provide some way of filling the gap, e.g. ask the user to provide a table with species-names and species-codes.
Could the opposite problem happen? That is, can the user provide codes that don't match any code in allodb? We could deal with this in a way similar to that described above.
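The first option could look roughly like the sketch below. The function name, the column names, and the species codes are all made up for illustration; they are not the allodb tables.

```r
# Toy lookup table: one species has no code yet (codes are hypothetical).
species_codes <- data.frame(
  species = c("Acer rubrum", "Quercus alba", "Nyssa sylvatica"),
  code = c("acru", "qual", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: warn about species in the user's data that have no
# species-code, and return them invisibly so the caller can fill the gap.
warn_missing_codes <- function(user_species, lookup) {
  with_code <- lookup$species[!is.na(lookup$code)]
  missing_code <- setdiff(user_species, with_code)
  if (length(missing_code) > 0) {
    warning(
      "No species-code for: ", paste(missing_code, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(missing_code)
}

missing_code <- suppressWarnings(
  warn_missing_codes(c("Acer rubrum", "Nyssa sylvatica"), species_codes)
)
missing_code
```

The same shape would work for the opposite problem, with user-supplied codes checked against the codes known to the database.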
I plan to write a test to automatically check that each equation can be evaluated -- i.e. that it contains valid R code. Once that is done, we'll be able to press Control + Shift + T to run this (and all other) tests, and we'll get an informative error message if some equation isn't valid R code.
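A minimal sketch of the check itself, using toy equation strings. It only tests that each string parses as R code, which is weaker than evaluating it; the helper name is_valid_r() is hypothetical.

```r
# Hypothetical helper: TRUE if the string parses as R code, FALSE otherwise.
is_valid_r <- function(equation) {
  tryCatch(
    {
      parse(text = equation)
      TRUE
    },
    error = function(e) FALSE
  )
}

# Toy equations; the last one has an unbalanced parenthesis.
equations <- c(
  "0.1 * (dbh^2.4)",
  "exp(-2.5 + 2.4 * log(dbh))",
  "0.1 * (dbh^2.4"
)
valid <- vapply(equations, is_valid_r, logical(1), USE.NAMES = FALSE)

# The equations that would need fixing.
equations[!valid]
```

Inside a testthat test, the same vector could feed an expectation that all equations are valid, with the offending strings included in the failure message.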
FYI I have stored a copy of each dataset as a reference to test for unexpected changes. You can see the datasets HERE; they are named with the format ref-datasetname.
Although poorly, this feature is documented in the help file of testthat::expect_known_output(). I suggest you don't worry about this for now -- it is better to talk about this in person and I can show you how it works.
I leave this here as a reminder for future discussion, but I close the issue because it requires no more action.
This issue describes the process by which I export and document data. You can track this process searching for commits tagged with the number of this issue (#29).
To help assess accuracy of allometries, this table should contain the following information:
It has all but 1-2. @gonzalezeb, please add any that aren't in there yet.
2.1. allodb: extract all unique pieces in the allometric equations, so I can ask Erika what each one means. For splitting symbols, use * + ( ) [ ] ^ - CAPITAL-LETTERS exp ln
standardize categories in variable_biomass_component
We'll want to set bounds on what should be considered a reasonable biomass estimate and flag any allometries that fall outside of this range.
The first step is to define how those bounds should be determined.
Suggested by Ervan (via @).
A simple step-by-step tutorial guiding them through, after having installed a package with the functions + objects (i.e. wood density and site info) is probably the best way to go.
--Ervan
For this, maybe use the BIOMASS package which is already available.
Ervan suggested that the most specific equation (i.e. closer geographically or of higher taxonomic resolution) may not always be the best equation. Instead, the best might be one generic equation. [Here generic does not mean “of taxonomic genus level”; it means general – an equation that has been produced based on measuring many trees]. He said that there are generic equations for three regions: North America, Europe and China. For each region there may be multiple equations: one per taxonomic group.
One important aspect of making the database easy to work with and maintainable in the long run is to normalize it. Eventually, we will need to move in that direction. Our current not-normalized structure already seems to be exposing some issues. For us to assess how urgent it is to normalize the database, I'll document the issues I'm noticing here.
As I restructure the allodb project I would like to delete useless files. This issue documents the files I would like to remove but I need to first confirm with @gonzalezeb.
@gonzalezeb, please fix type_allodb_master() (file R/type_allodb_master.R) to reflect the recent change in column names of master.
From something like this:
type_allodb_master <- function() {
list(
...
<old name> = <old type>,
...
)
}
To something like this:
type_allodb_master <- function() {
list(
...
<new name> = <new type>,
...
)
}
library(allodb)
> master <- read_csv_as_chr(here::here("data-raw/allodb_master.csv"))
>
> setdiff(names(master), names(allodb:::type_allodb_master()))
[1] "biogeographic_zone" "region" "proxy_species"
[4] "notes_on_species"
> setdiff(names(allodb:::type_allodb_master()), names(master))
[1] "development_species" "notes_to_consider"
Ervan suggests that the biomass() function needs a parameter to input wood density because new wood density data becomes available frequently and users may want to incorporate it. That is why he prefers to compute biomass not with a fixed equation but as a function of wood density.
We previously had DBH range in the site-species table, and this commit removed it. I think we need it. The purpose would be to allow assignment of different allometric equations to a single species based on size. For example, we may trust a local/species-specific equation for only part of the possible size range.
I'm envisioning that the equation is selected based on the dbh range in this sheet. The min and max dbh in the equations sheet would only be for reference and to give warnings if there's an attempt to apply an allometry outside the range for which it was developed.
If this doesn't make sense or if you disagree, let's discuss in person.
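To make the proposal concrete, here is a minimal sketch with toy tables. The column names, the equation ids, and the function pick_equation() are hypothetical; the point is only that the site-species range selects the equation, while the range in the equations table triggers a warning.

```r
# Toy site-species table: two equations for one species, chosen by DBH range.
site_species <- data.frame(
  species = c("sp1", "sp1"),
  equation_id = c("local", "generic"),
  dbh_min_cm = c(1, 10),
  dbh_max_cm = c(10, Inf),
  stringsAsFactors = FALSE
)

# Toy equations table: the range over which each allometry was developed.
equations <- data.frame(
  equation_id = c("local", "generic"),
  dbh_min_cm = c(1, 5),
  dbh_max_cm = c(10, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: select by the site-species range, then warn if the
# tree falls outside the range the selected equation was developed for.
pick_equation <- function(species, dbh) {
  in_range <- site_species$species == species &
    dbh >= site_species$dbh_min_cm &
    dbh < site_species$dbh_max_cm
  if (!any(in_range)) {
    stop("No equation assigned to ", species, " at dbh = ", dbh, call. = FALSE)
  }
  id <- site_species$equation_id[in_range][1]
  developed <- equations[equations$equation_id == id, ]
  if (dbh < developed$dbh_min_cm || dbh > developed$dbh_max_cm) {
    warning(
      "Equation '", id, "' applied at dbh = ", dbh,
      " cm, outside the range it was developed for.",
      call. = FALSE
    )
  }
  id
}

pick_equation("sp1", 5)
```

A 60 cm tree of the same species would still get the generic equation, but with a warning, since that equation's toy development range stops at 50 cm.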
The master allotemp_main.csv data has names that are difficult to work with. Can you change its names to not have "(" or ")" or spaces? Ideally all names should start with a letter and include only letters, numbers and "_", not "." nor spaces.
If this is not possible let's discuss.