reproductive_allocation_kuringgai's Issues

Output data for baad compilation

here is code used:

library(plyr)
library(dplyr)

data <- tbl_df(read.csv("data/2013_Kuringgai_harvest.csv", stringsAsFactors=FALSE)) %.%
  filter(year==2013, plant_status=="alive") %.%
  arrange(tag_ID, date, segment)

segments <- list()
segments[["1"]] <- c(as.character(1:8), "1.2", "1.3", "1.2.1", "1.2.1.1", "1.1.2", "1.1.2.1", "1.1.1.2")

## Get total mass of all material subtended by given node
get.mass.at.node <- function(data, segments){
  data %.%  filter(segment %in% segments) %.%
    group_by(tag_ID) %.%
    summarise(
        species = species[1],
        site= site[1],
        age=age[1],
      height=max(height, na.rm=TRUE),
        leaf_weight=sum(leaf_weight),
        stem_weight=sum(stem_weight),
        total_weight=sum(leaf_weight, stem_weight))
}

mass <- ldply(segments,  function(x) get.mass.at.node(data, x)) %.%
        rename(c('.id'='node_above')) %.%  # rename plyr name to something more meaningful
        arrange(tag_ID, node_above)

# Get average diameter for base of segment. This is given by diameter readings for segment
# with names in list above
get.diam.node.above <- function(data, level){
  data %.%  filter(node_above == level) %.%
    group_by(tag_ID) %.%
    summarise(
        node_above=level,
        dia=mean(c(diameter_1,diameter_2, diameter_3), na.rm=TRUE),
      stem.area=dia^2*pi/4
        )
}

diameters <- ldply(names(segments),  function(x) get.diam.node.above(data, x)) %.%
        arrange(tag_ID, node_above)

# Merge mass and diameter measurements
merged1 <- merge(mass, diameters, by=c("tag_ID", "node_above"))

# import lma
lma <- tbl_df(read.csv("data/LMA.csv", stringsAsFactors=FALSE)) %.%
  filter(species !="" & species !=" ") %.%
  group_by(species, site) %.%
  summarise(lma=mean(LMA, na.rm=TRUE)*1000/100,
    leaf_size= mean(leaf_area/leaf_number, na.rm=TRUE))

merged2 <- tbl_df(merge(merged1, lma, by=c("species", "site"))) %.%
  mutate(leaf_area = leaf_weight/lma) %.%
  rename(replace=c("species" = "Abbreviation"))

# import species names
species <- tbl_df(read.csv("data/species_names.csv", stringsAsFactors=FALSE))

merged3 <- tbl_df(merge(species, merged2, by=c("Abbreviation"))) %.%
  select(-Abbreviation, -Common_name, -Previous_names, -tag_ID, -node_above)

write.csv(x=merged3,file="output/data.csv", row.names=FALSE)

PHPH weights not correct

The total plant weights for many PHPH are a huge underestimate, because they are only the weights of one of several stems emerging from the ground. RA should be accurate, since it is the ratio of reproduction to (growth + repro) on a single stem. But plots of plant size vs. RA will be very strange. Since I don't have repro from the side branches, this means I need (at least for PHPH) to introduce two "plant weight" columns, one to calculate total plant weight, leaf weight, growth, and a second to calculate RA. Happy to take this one myself.

BAER cone weights are too high

We have never quite finished the calculation for BAER reproductive investment. Right now investment is over-estimated, because the "cone weight" used includes the weight of the "seed pods" (since they are fused to the cone), but there is also a separate weight of "seed pods" included. So the cone weight for each "brown cone" needs to be reduced by the weight of the seed pods on that cone (seed pod count × weight per seed pod). If easiest I can do this manually, but I suspect I'll need help to automate it using R.
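A minimal sketch of how the correction could be automated, assuming one row per brown cone with a seed pod count and a known mean weight per seed pod. The function name `correct_cone_weight`, the column names and all values are hypothetical and not taken from the repository.

```r
# Hypothetical per-cone data: cone_weight still includes the fused seed pods.
cones <- data.frame(
  cone_id        = c("c1", "c2", "c3"),
  cone_weight    = c(12.0, 8.5, 15.2),
  seed_pod_count = c(4, 0, 7),
  stringsAsFactors = FALSE
)

# Assumed mean weight of a single seed pod (invented value).
mean_seed_pod_weight <- 0.9

# Subtract the total seed pod weight from each cone's weight.
correct_cone_weight <- function(cone_weight, seed_pod_count, pod_weight) {
  corrected <- cone_weight - seed_pod_count * pod_weight
  if (any(corrected < 0, na.rm = TRUE)) {
    warning("some corrected cone weights are negative; check the inputs")
  }
  corrected
}

cones$cone_weight_corrected <- correct_cone_weight(
  cones$cone_weight, cones$seed_pod_count, mean_seed_pod_weight
)
```

If the seed pod weight varies by individual, the scalar could be replaced by a per-row vector without changing the function.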

ReproductionAllocation_all does not include RA

Need to rename functions and variables, as this now only includes plant growth and not RA. RA is added in the function combine_by_individual, which draws data from AccessoryCosts_all.

Convert UTM coords to latitude and longitude

Achieved using this code:

library(rgdal)
file <- "data/sites.csv"
data <-  read.csv(file, stringsAsFactors=FALSE, check.names=FALSE)
# prepare UTM coordinates matrix
utmcoor <- SpatialPoints(cbind(data[["UTM easting"]],data[["UTM northing"]]), proj4string=CRS("+proj=utm +zone=56 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs "))
longlatcoor <- spTransform(utmcoor,CRS("+proj=longlat"))

add "se" back in as function

I'd like to add "se" back into the automatically calculated variables for SummarySpp and

Line 186, in Summaries.R

function to apply

fs <- c("max", "mean", "sd", "length")
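Base R has no built-in standard error function, so "se" would need to be defined before it can be listed in `fs`. A minimal sketch, assuming the functions in `fs` are looked up by name and applied to a numeric vector (how Summaries.R actually applies them is not shown here):

```r
# Standard error of the mean; NAs are dropped by default (an assumption).
se <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Functions to apply, with "se" added back in.
fs <- c("max", "mean", "sd", "length", "se")

# Illustrative application to a toy vector.
x <- c(2, 4, 4, 6)
summary_values <- sapply(fs, function(f) do.call(f, list(x)))
```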

missing individuals from data summaries

HATE_003 not showing up in data summaries, but listed as "TRUE" under "use_for_allocation" in file "individuals"
GRBU_906 - showing up as zero reproductive investment, but has reproduction listed
PELA_902 not showing up in data summaries, but listed as "TRUE" under "use_for_allocation" in file "individuals"
PUTU_904 not showing up in data summaries, but listed as "TRUE" under "use_for_allocation" in file "individuals"

I have not started tracing any of these problems - but also not sure where to start looking.
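One possible starting point is to compare the expected individuals against each intermediate data frame and report the first stage where each one disappears. This is a sketch with toy data. The stage names and the helper `find_first_drop` are hypothetical, and it assumes each stage has an `individual` column.

```r
# Toy stand-ins for the "individuals" file and successive workflow outputs.
individuals <- data.frame(
  individual         = c("a", "b", "c", "d"),
  use_for_allocation = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors   = FALSE
)
stages <- list(
  harvest    = data.frame(individual = c("a", "b", "c"), stringsAsFactors = FALSE),
  investment = data.frame(individual = c("a", "b"), stringsAsFactors = FALSE),
  summary    = data.frame(individual = c("a"), stringsAsFactors = FALSE)
)

# Individuals that should appear in the final summary but do not.
expected <- individuals$individual[individuals$use_for_allocation]
missing_ids <- setdiff(expected, stages$summary$individual)

# For each id, name the first stage that lacks it (NA if present everywhere).
find_first_drop <- function(ids, stages) {
  sapply(ids, function(id) {
    present <- vapply(stages, function(d) id %in% d$individual, logical(1))
    if (all(present)) NA_character_ else names(stages)[which(!present)[1]]
  })
}

dropped_at <- find_first_drop(expected, stages)
```

The stage named in `dropped_at` narrows down which script to inspect first.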

Recalculate accessory costs for individuals, not species

Right now we are calculating prepollencosts, dispersalcosts and dispersal costs to the point of pollination at the species level. These calculations should be redone at the individual level – especially relevant for the species with variable numbers of ovules per inflorescence/infructescence

  1. Prepollencosts = fixed costs up to the point of pollination.

Formula by species is in R/Summaries line 40

Scalings:
• Most scalings are identical to those listed in the file Data/MultiplierTable
• Some scalings need to be done at the individual level:
BAER, PEPU – cone_green (scaling = total ovule count / total cone count)
COER, GRBU, GRSP – inflorescence stalk (scaling = total ovule count / total inflorescence count)
PILI –inflorescence stalk + bract (scaling = total ovule count / total inflorescence count)
Steps:
a. for the species*parts with unique multipliers, select the individuals and create data frame with scalings
- For all individuals, total ovule count = SummaryInd$repro_all_count
- Inflorescence count: SummaryInd$inflorescence_count
- cone count: SummaryInd$cone_count
(These count calculations are done in the file data/Investment.R and I want to make sure they are using FD (Final Development) numbers)

b. Have “lookup” function that pulls from either generic multiplierTable or the new dataframe as appropriate

  • I can create this quickly, but I'm not quite sure what format would be best
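Steps a and b could be sketched as follows: a data frame of individual-level scalings (total ovule count / total cone count) plus a lookup that falls back to the generic table. All data values are invented. `multiplier_table` and `summary_ind` are stand-ins for Data/MultiplierTable and SummaryInd, and `lookup_multiplier` is a hypothetical name.

```r
# Generic species-level multipliers (stand-in for Data/MultiplierTable).
multiplier_table <- data.frame(
  species    = c("BAER", "BAER", "COER"),
  part       = c("cone_green", "petals", "inflorescence_stalk"),
  multiplier = c(50, 1, 10),
  stringsAsFactors = FALSE
)

# Individual-level counts (stand-in for SummaryInd).
summary_ind <- data.frame(
  individual      = c("BAER_001", "BAER_002"),
  species         = c("BAER", "BAER"),
  repro_all_count = c(120, 300),
  cone_count      = c(4, 5),
  stringsAsFactors = FALSE
)

# Step a: individual-level scalings for the species*parts with unique multipliers.
individual_multipliers <- data.frame(
  individual = summary_ind$individual,
  part       = "cone_green",
  multiplier = summary_ind$repro_all_count / summary_ind$cone_count,
  stringsAsFactors = FALSE
)

# Step b: prefer the individual-level value, otherwise use the generic table.
lookup_multiplier <- function(individual, species, part,
                              by_individual = individual_multipliers,
                              generic = multiplier_table) {
  hit <- by_individual$multiplier[by_individual$individual == individual &
                                    by_individual$part == part]
  if (length(hit) == 1) return(hit)
  fallback <- generic$multiplier[generic$species == species & generic$part == part]
  if (length(fallback) == 0) NA_real_ else fallback[1]
}
```

A long-format data frame keyed by individual and part keeps the lookup symmetrical with the generic table, which is one reason to prefer it over a wide layout.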

Parts weights:
• Non-cone parts:
o When actual flower parts have been weighed, i.e. when there is a match between flower part & individual in the file Data/flowerParts, this number should be used for determining weight (should be able to find code where Konrad did this for other calculations)
o Otherwise use the species average for the specific flower part from the data frame Data/PartsSummary
• Cones for PEPU and BAER
o Difficult because we don’t have the actual weights of cones that continued developing. We do have dimensions of those cones at the time of flowering and the regressions to turn the dimensions into a weight.
o For these individuals Konrad has already calculated the green cone weight at the time of flowering (not currently an actual output; I can't remember the name of the dataframe that lists midway development parts)
o So we need to take these green cone weights and the correct census. I've created a table for the individuals where this is relevant, listing the census and the cone dimensions at that census; we should be able to match and use these numbers

  2. Dispersal Costs

Scalings:
• Most scalings are identical to those listed in the file Data/MultiplierTable
• Some scalings need to be done at the individual level – same as for prepollencosts above

Parts weights:
• Non-cone parts: as above
• Cones: I think it is appropriate to take the actual cone weight and divide it by the number of mature seeds. There is an argument that some of the cone’s weight could be attributed to the many shed flowers, but I think that is the weight that is already part of “cone green”

  3. Dispersal costs to the point of pollination – same as above

linear fit to estimate leaf growth doesn't work for older individuals

  • As described below, I don't think using a linear fit to estimate leaf growth is appropriate, especially in the older plants
  • The plots below show that for all plants, even old ones, if just the upper (narrower diameter) segments are used, the leaf weight vs. diameter plots on the main line
  • However, for the points that are "basal diameter,all leaves on plant" or even (diameter of segment 2, leaf weight above segment 2), the diameter increases with little increase in leaf weight
  • this is of course because there are few leaves on the lower branches of an older plant and overall, much lower increase in whole plant leaf weight vs basal diameter
  • this means that for the plots, segment 1 (which are mostly the labeled dots) fall "above" the main line of points; for some individuals segment 2 does the same
  • This leads to an overestimate in leaf weight increase in older plants
  • stem weights always look good
  • looking at BOLE, age 7
    • colors represent segments, with red the entire plant, then purple, gray, green
    • for stems, linear no matter what portion plant included
    • for leaves, older plants (ages 5,7,9 for BOLE, older for other species) have continued diameter increase with little increase in leaf area
par(mfrow=c(1,2), cex=1, omi=c(.1,.1,.1,.1), mai=c(1.1,1.1,.1,0.2))
check <- BOLE_HarvestData

plot(dia~stem_weight,data=subset(check), pch=16,log="xy",col=col.lots[as.factor(segment)])
text(dia~stem_weight,data=subset(check,segment==1),labels=individual,cex=.5,pos=3,offset=-.7)

plot(dia~leaf_weight,data=subset(check), pch=16,log="xy",col=col.lots[as.factor(segment)])
text(dia~leaf_weight,data=subset(check,segment==1),labels=individual,cex=.5,pos=3,offset=-.7)
  • Plots of just the age 7 BOLE individuals
plot(dia~stem_weight,data=subset(check,age==7), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~stem_weight,data=subset(check,segment==1&age==7),labels=individual,cex=.5,pos=3,offset=-.7)

plot(dia~leaf_weight,data=subset(check,age==7), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~leaf_weight,data=subset(check,segment==1&age==7),labels=individual,cex=.5,pos=3,offset=-.7)
  • At the other extreme, with PELA, the effect is only seen among age 32 plants, and only among 5 of the 7 reps (colors here represent individuals)
check <- PELA_HarvestData

plot(dia~stem_weight,data=subset(check,age==32), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~stem_weight,data=subset(check,segment==1&age==32),labels=individual,cex=.5,pos=3,offset=-.7)

plot(dia~leaf_weight,data=subset(check,age==32), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~leaf_weight,data=subset(check,segment==1&age==32),labels=individual,cex=.5,pos=3,offset=-.7)
  • Similarly no asymptoting at all for HATE
check <- HATE_HarvestData

plot(dia~stem_weight,data=subset(check,age==32), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~stem_weight,data=subset(check,segment==1&age==32),labels=individual,cex=.5,pos=3,offset=-.7)

plot(dia~leaf_weight,data=subset(check,age==32), pch=16,log="xy",col=col.spp[as.factor(individual)])
text(dia~leaf_weight,data=subset(check,segment==1&age==32),labels=individual,cex=.5,pos=3,offset=-.7)
  • COER is similar - as are many other species
  • Dots colored by age, only including 7 & 9 year old plants
check <- COER_HarvestData

plot(dia~stem_weight,data=subset(check,age>5), pch=16,log="xy",col=col.spp[as.factor(segment)])
text(dia~stem_weight,data=subset(check,segment==1&age>5),labels=individual,cex=.5,pos=3,offset=-.7)

plot(dia~leaf_weight,data=subset(check,age>5), pch=16,log="xy",col=col.spp[as.factor(segment)])
text(dia~leaf_weight,data=subset(check,segment==1&age>5),labels=individual,cex=.5,pos=3,offset=-.7)

accessory data missing

There are currently 0 observations in the file "accessory" generated from "AccessoryCosts_all". There are also 0 observations in the individual species files (i.e. BAER_AccessoryCosts), so the problem is in creating the individual species files.

Counts still slightly inaccurate.

Most problems with "counts" of parts (aborted pre-pollination, aborted pre-provisioning, aborted during provisioning, seeds) are now fixed, but three species still have parts defined that don't fit nicely in the same framework as other species:

BOLE: finished_flower can’t be on list
flower_aborted required for some species, not others!
HATE: problems because inflorescence bud now in flower units!

I think it is safer to create separate count lists for each species rather than try and change nomenclature in all the files. Would it be easy to incorporate summing "counts" in to the yml file where there are lists of parts for each species? And one could add additional lines for parts to count?

Problem running mclapply on Lizzy's machine

mclapply(species$Abbreviation, CalculateInvestmentForSpecies,  mc.cores=detectCores()-1 )
Error in mclapply(species$Abbreviation, CalculateInvestmentForSpecies,  : 
  'mc.cores' > 1 is not supported on Windows
> source('analysis/RA_Calculations.R')
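`mclapply` relies on forking, which is unavailable on Windows, but it does run there when `mc.cores = 1`. A minimal sketch of a platform-aware core count follows. `safe_cores` is a hypothetical helper, and the toy function stands in for `CalculateInvestmentForSpecies`.

```r
library(parallel)

# Use a single core on Windows; elsewhere use all but one core (at least 1).
safe_cores <- function() {
  if (.Platform$OS.type == "windows") return(1L)
  n <- detectCores()
  if (is.na(n)) 1L else max(1L, n - 1L)
}

# Toy stand-in for species$Abbreviation and CalculateInvestmentForSpecies.
abbreviations <- c("BAER", "BOLE", "COER")
res <- mclapply(abbreviations, function(sp) nchar(sp), mc.cores = safe_cores())
```

For genuine parallelism on Windows, `parallel::parLapply` with a cluster from `makeCluster` is the usual alternative, at the cost of exporting the required objects to the workers.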

seedset underestimated for some species

For some species, where the second year's buds are starting to form at the time of harvest, the count of all ovules produced currently includes the count of buds at both the beginning and end of the year. We will need to create a filter so one set is ignored. This probably needs to be discussed on a species-by-species basis

Use correct version of "repro inv"

  • The first plot below shows that there are individuals where absolute investment in accessory tissues is larger than total reproductive investment.
  • There are two different sources of reproductive investment: one comes from the file "accessory" and is the sum of all the different accessory tissues, and one comes from the dataframe "investment". The "investment" variant is what is currently being used, but I am not sure which one should be used.
  • They are quite different from each other
plot(accessory_inv~ReproInv,data=subset(SummaryInd,accessory_inv>0),log="xy",pch=16,col=col.spp[as.factor(species)])
abline(0,1)
text(120,350,"all RE to accessory tissues",srt=45)

plot(investment$ReproInv~accessory$total_repro_inv,log="xy",pch=16,col=col.spp[as.factor(investment$species)])

two other places use deleted columns to filter data

I have found two other places that use the deleted columns from "harvest". The correct data is now in "IndividualsList".

File:ReproductiveAllocation.R, line 21
Original code:

filter(segment == 1, use_status == "use", plant_status == "alive")  %>% ...

• Now need to filter on “IndividualsList”, with “use_for_allocation_calculations ==TRUE” and “alive==TRUE”
• Would this work?

filter(segment == 1, 
  IndividualsList$use_for_allocation_calculations[match(HarvestData$individual, IndividualsList$individual)], 
IndividualsList$alive[match(HarvestData$individual, IndividualsList$individual)])  %>% ...

File: AccessoryTissues.R, Lines 5-8
Original code:

 AgeData <- unique(
    filter(HarvestData, segment == 1, plant_status == "alive", individual %in% Reproduction$individual) %>%
    select(age, individual)
      )

Would this work?

AgeData <- unique(
    filter(IndividualsList, alive, individual %in% Reproduction$individual) %>%
    select(age, individual)
      )
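Both proposals look workable in principle. One caveat for the `match()` version is that `match()` returns NA for any individual absent from IndividualsList, so those rows are silently dropped by `filter`. An alternative sketch in base R, using toy data with the column names mentioned above (all values invented), first derives the set of usable individuals and then subsets by membership:

```r
# Toy stand-ins for the real data frames.
IndividualsList <- data.frame(
  individual = c("BAER_001", "BAER_002", "BOLE_001"),
  age        = c(7, 7, 9),
  use_for_allocation_calculations = c(TRUE, FALSE, TRUE),
  alive      = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
HarvestData <- data.frame(
  individual = c("BAER_001", "BAER_001", "BAER_002", "BOLE_001"),
  segment    = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)
Reproduction <- data.frame(individual = c("BAER_001", "BOLE_001"),
                           stringsAsFactors = FALSE)

# ReproductiveAllocation.R: keep segment 1 of usable, living individuals.
keep <- with(IndividualsList,
             individual[use_for_allocation_calculations & alive])
filtered <- HarvestData[HarvestData$segment == 1 &
                          HarvestData$individual %in% keep, ]

# AccessoryTissues.R: ages of living individuals with reproduction data.
AgeData <- unique(
  IndividualsList[IndividualsList$alive &
                    IndividualsList$individual %in% Reproduction$individual,
                  c("age", "individual")]
)
```

Subsetting by `%in%` does not depend on row order or on every harvested individual having an entry in IndividualsList.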

Tasks for finishing Accessory paper

Supp Matt (Daniel)

  • Finish description of investment
  • Review text around species graphs and tables
  • Review text on code

Main text (Lizzy)

  • Proof

Code

  • Better readme
  • Post reduced code and dataset
  • Add details about code to main text

Submission

  • Revise abstract
  • Cover letter

small fixes for accessory costs

problem 1 - can't enter different scaling values for different parts (problem for COER, GRSP inflorescence stalks in fruit vs flower)

problem 2 - if there are multiple entries for a part (at different census times), it is not adding them all together and instead just taking the value from the first census period
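For problem 2, the intended behaviour (summing a part's entries across census periods rather than taking the first) could be sketched with `aggregate`. The data frame and its column names are hypothetical.

```r
# Toy data: COER_001 has inflorescence stalks recorded at two censuses.
parts <- data.frame(
  individual = c("COER_001", "COER_001", "COER_001", "COER_002"),
  part       = c("inflorescence_stalk", "inflorescence_stalk",
                 "petals", "inflorescence_stalk"),
  census     = c(1, 2, 1, 1),
  weight     = c(0.5, 0.7, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Sum weights over all census periods for each individual and part.
totals <- aggregate(weight ~ individual + part, data = parts, FUN = sum)
```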

question - for PEPU, should I divide brown cone weight by seed count or flower count - i.e. in cone with low seedset, should some of the cone weight be considered "aborted"

question - for HATE, how to deal with "nearly mature" seeds. (This is an issue for all species where buds begin forming before the "start" of the year or seeds continue maturing after the "end" of the year.) I'm starting to rethink how I've gone about the calculations and am thinking that I should, where possible, only use a single cohort of data and project forwards or backwards.

BAER - project backwards since I know original cone dimensions and flower counts
BOLE - correction not possible since so much early bud abortion
EPMI - project backwards since buds just barely forming at start
GRSP - correction not possible since so much early bud abortion, but limited effect because few flowers at start of year
HEPU - correction not possible, but few flowers at start of year
HATE - project forward, since I know which seeds are maturing
LEES - project backwards since buds just barely forming at start
PELA - correction not possible and year truly started midway through season

FinDev numbers have changed

I have just been trying to extract the "up to date" FinDev (Final Development) dataframe, which should give me the counts of all reproductive parts.

results_best.rmd, line 568

tmpBAER <- BAER_Investment
FD_BAER <- as.data.frame(tmpBAER$FD)
BAER_count <- FD_BAER %>%
group_by(Individual, what) %>%
summarise_each(funs(sum), count, weight)

Some of the numbers looked odd to me - there are negative numbers and large number of buds (and other parts) which I was fairly sure had "progressed" to a different "final development stage".

I have now checked the counts for both BAER_001 and BOLE_001 against the original data spreadsheet and the problem is definitely with the current Fin Dev numbers - as extracted using the above code - not the ones I archived in February.

It is possible I've "extracted" the wrong data, but it certainly looks the same as what I've worked with before - most counts haven't changed.

Any clues? I'm wondering whether this problem is feeding into the current investment numbers I've been working with, or whether it is only an issue for the data I am now trying to use.

Lizzy
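A quick sanity check that might help localise the problem is to sum counts and weights per individual and part, then flag any negative totals. This is a base R sketch on toy data. The column names follow the snippet above, but the values are invented.

```r
# Toy stand-in for a species' FD (Final Development) data frame.
fd <- data.frame(
  Individual = c("BAER_001", "BAER_001", "BAER_001", "BOLE_001"),
  what       = c("bud", "bud", "flower", "bud"),
  count      = c(5, -8, 3, 2),
  weight     = c(0.5, -0.8, 0.6, 0.1),
  stringsAsFactors = FALSE
)

# Sum count and weight for each Individual x part combination.
totals <- aggregate(cbind(count, weight) ~ Individual + what, data = fd, FUN = sum)

# Rows with negative counts should not occur and point to a processing error.
suspect <- totals[totals$count < 0, ]
```

Running the same summary on the archived February data and merging the two results by `Individual` and `what` would show exactly which counts changed.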

PEPU aborted cones have incorrect mass

  • There are two problems with the calculation of PEPU cones. I discovered this, because there were a few plants, especially PEPU_802, with huge ReproInv relative to repro_all_count
  • Looking at various data sources I discovered that the estimated FD (Final Development) parts weights for those cones were much too big, for PEPU_802 more than 10x too big
  • In addition, for PEPU_802 and PEPU_807 the aborted cones were actually collected and marked as "to use" in the flower parts data, but aren't being used
  • I've looked at Konrad's scripts and can't immediately see what would be wrong
plot(ReproInv~repro_all_count,data=subset(SummaryInd, repro_all_count>0&species=="PEPU"),log="xy")
text(ReproInv~repro_all_count,data=subset(SummaryInd, repro_all_count>0&species=="PEPU"),labels=individual, pos=1,offset=.6,cex=0.8,col="black")

check_FD <-as.data.frame(PEPU_Investment[3])
check_FD_aborted <- subset(check_FD,FD.what=="cone_aborted")
check_FD_aborted

check_parts <- subset(as.data.frame(PEPU_FlowerPartsData),part=="cone_aborted")
check_parts

Further fixes in data workflow

  • Check file deletions at this commit (are any of these needed?)
  • IAT in R/AccessoryTissues.R: expand name
  • IAT in R/AccessoryTissues.R: does not need reproduction data, just list of individuals
