habitatconnectivity's People

Contributors

aaronplex, garrettlab, krishnakeshav

habitatconnectivity's Issues

Using data from CroplandCROS

Here is a code template to download data from CroplandCROS (high spatial resolution: 30 m × 30 m), transform it into a SpatRaster that can be used in geohabnet, and start assembling our vignette.

You can download a GeoTIFF file from CroplandCROS, and check their user guide if needed.

  1. First, choose your crop, area of interest (AOI), and year.
  2. Then, make sure to use Web Mercator (WGS 84) and DPI = 96 to export.
  3. Check that you get a compressed (zipped) folder with all the information, including (among other files) the TIF file called "clipped".
  4. Read the clipped.tif file in R and check that a coordinate reference system (CRS) comes with the file.
  5. Plot the SpatRaster and check the value of each pixel. There is only one integer, and that value just represents 'presence' (e.g., 56) or 'absence' (NA).
  6. Transform the CRS to the one used and accepted by geohabnet.
  7. Generate a map of cropland density at a coarser resolution by summing all small pixels into larger pixels (i.e., aggregating pixels or reducing spatial resolution).
  8. Finally, save the resulting SpatRaster to be used as the input to geohabnet. Still pending is how to convert it into a map of host density instead of host abundance; this is the same issue as the one for data from GBIF.

Here is the code I used that worked:

library(terra)

# Read the clipped GeoTIFF exported from CroplandCROS
r <- rast("clipped.tif")
plot(r)

# Pixels carry a single class value (72 in this Florida citrus export);
# divide by it so presence becomes 1
v <- 72
r <- r / v
plot(r)

# Reproject to lon/lat WGS 84, the CRS accepted by geohabnet
r <- project(r, "+proj=longlat +datum=WGS84")

# Aggregate 40 x 40 blocks of 30 m pixels and divide by the block size
# to get cropland density (fraction of each coarse cell that is cropland)
f <- 40
a <- aggregate(r, fact = f, fun = "sum", na.rm = TRUE) / (f * f)
plot(a)

writeRaster(a, "FloridaCitrus01.tif")

*The project() call took about 15 minutes and aggregate() about 5 minutes, since the dataset is at high resolution.

Options/parameters for multiple data sources

  • 1. Monfreda - has data only up to 2005.
  • 2. SPAM - dataset up to 2017 (Africa only; consider whether there should be an Africa/global parameter). The global dataset is available up to 2010. Choose the "A" (all technologies) layer for each crop -
Browse[2]> test_spam
class       : SpatRaster 
dimensions  : 732, 1008, 3  (nrow, ncol, nlyr)
resolution  : 0.08333333, 0.08333333  (x, y)
extent      : -26, 58, -35, 26  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       : H_POTA_A, H_POTA_I, H_POTA_R 
min values  :      0.0,      0.0,      0.0 
max values  :   8468.4,   2656.5,   8468.4 

Potential designs -

  1. Select source, then crop.
  2. Select crop, then source.
  3. The source of data matters, so display the list of crops from all the data sources.

All are part of the geodata package.
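
For reference, here is a minimal sketch of pulling both sources through geodata. The crop_monfreda() call mirrors the one used elsewhere on this page; the crop_spam() arguments (var, africa) are assumptions and should be checked against the geodata documentation.

library(geodata)
library(terra)

# Monfreda harvested-area fraction (same call pattern as used elsewhere in this repo)
potato_mon <- crop_monfreda(crop = "potato", var = "area_f", path = tempdir())

# SPAM harvested area; the var and africa arguments are assumptions -- see ?crop_spam
potato_spam <- crop_spam(crop = "potato", var = "harv_area", path = tempdir(), africa = FALSE)

# Keep the "_A" (all technologies) layer, matching the H_POTA_A layer shown above
potato_spam_all <- potato_spam[[grep("_A$", names(potato_spam))]]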

Flexible choice of dispersal kernels

@krishnakeshav @GarrettLab, this is an improvement issue.

Specifically, the functions in geohabnet should be flexible enough that the user can choose which dispersal kernels (inverse power law, negative exponential, or both) to analyze. Currently, the functions force the user to run the analysis with both dispersal kernel models.
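
As a hypothetical sketch of that flexibility (this is a proposal, not the current msean() behavior, which expects both kernel lists), passing NULL for one kernel could skip it:

# Hypothetical future usage: run only the inverse power law kernel
avocado_mon <- geohabnet::cropharvest_rast("avocado", "monfreda")
result_ipl_only <- geohabnet::msean(
  avocado_mon,
  global = TRUE,
  link_threshold = 0.000001,
  inv_pl = list(beta = 0.05, metrics = "betweeness", weights = 100, cutoff = -1),
  neg_exp = NULL,   # proposed: NULL would skip the negative exponential model
  res = 24
)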

merging MAPSPAM and 'Monfreda' datasets

Check that, if users specify the same crop in both datasets, a map of the mean between MAPSPAM and Monfreda is generated.

Sum if the crops are different in each dataset.
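
A minimal terra sketch of that rule, assuming the layers have already been read in (the object names are illustrative):

library(terra)

# Align grids first; the two sources may not share exactly the same geometry
spam_on_mon <- resample(spam_avocado, monfreda_avocado, method = "bilinear")

# Same crop in both datasets: cell-wise mean
avocado_host <- mean(c(monfreda_avocado, spam_on_mon), na.rm = TRUE)

# Different crops from each dataset: sum them instead
mixed_host <- sum(c(monfreda_banana, spam_on_mon), na.rm = TRUE)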

Dynamically load parameters.

As it turns out, the directory structure cannot be customized and should follow CRAN recommendations. The current issue is to figure out a way to load parameters dynamically within the package, since the installed structure and the source structure of an R package are different.
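
The usual approach is to keep the file under inst/ in the source tree and resolve it at run time with system.file(); the extdata location below is an assumption about where parameters.yaml would live after installation.

# Source tree: inst/extdata/parameters.yaml; installed tree: extdata/parameters.yaml
param_path <- system.file("extdata", "parameters.yaml",
                          package = "geohabnet", mustWork = TRUE)
params <- yaml::read_yaml(param_path)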

add support for more metrics

Reference - igraph

  1. betweenness -> accounts for the number of shortest paths between nodes
  2. node_strength -> takes into account the first layer of nodes a node is connected to
  3. sum_of_nearest_neighbors -> also takes the second layer into account
  4. eigenvector_centrality
  5. node degree -> how many links a node has; similar to node strength
  6. closeness centrality -> similar to betweenness; based on shortest paths between nodes
  7. page rank
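
For reference, a small igraph sketch of the corresponding calls on a weighted toy graph (the mapping to the list above is approximate; knn() is used for the nearest-neighbor-degree idea):

library(igraph)

g <- sample_gnp(20, 0.2)                  # toy graph
E(g)$weight <- runif(ecount(g))           # link weights

# Note: betweenness()/closeness() treat weights as distances
# (see the "Link weight in shortest paths" issue below)
betweenness(g, weights = E(g)$weight)              # 1
strength(g, weights = E(g)$weight)                 # 2
knn(g, weights = E(g)$weight)$knn                  # 3 (average nearest-neighbor degree)
eigen_centrality(g, weights = E(g)$weight)$vector  # 4
degree(g)                                          # 5
closeness(g, weights = E(g)$weight)                # 6
page_rank(g, weights = E(g)$weight)$vector         # 7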

Plot maps

Plots are needed only for the following:

  1. Mean cropland connectivity (final)
  2. Difference between CC and cropland density
  3. Variance in CC

Difference map

Old difference = sum - mean

Difference = mean CCRI - host density.
For sum, aggregate host density and compare with the mean CCRI.
For both, generate two host density maps, take the mean of both, and compare with the mean CCRI.
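
A small terra sketch of the difference map, assuming mean_ccri and host_density are SpatRasters produced earlier (the object names are illustrative):

library(terra)

# Put host density on the CCRI grid, then subtract
host_on_ccri <- resample(host_density, mean_ccri, method = "bilinear")
diff_map <- mean_ccri - host_on_ccri
plot(diff_map, main = "Mean CCRI minus host density")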

Link weight in shortest paths

  1. Fix how link weights are provided in the betweenness() and closeness() functions.

link.weight <- seq(0, 1, 0.01)  # example link weights in [0, 1]
# betweenness() and closeness() treat weights as distances (costs), so strong
# links must be mapped to small values before being passed in:
max.inverse <- 1.0001 * max(link.weight) - link.weight
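
A sketch of how the transformed weights would then be supplied (the graph and its edge weights are illustrative):

library(igraph)

g <- sample_gnp(15, 0.3)
E(g)$weight <- runif(ecount(g))   # dispersal-likelihood style weights in (0, 1)

# Invert so that high-likelihood links become short "distances"
dist_w <- 1.0001 * max(E(g)$weight) - E(g)$weight

betweenness(g, weights = dist_w)
closeness(g, weights = dist_w)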

Storing dataset

The program uses TIF files to produce spatial rasters for the map background and zero values. It currently downloads the files from Amazon S3. S3 is also used as a backup for the MapSPAM dataset. We need to find an alternative way to store these datasets.

A few options:

  1. Calculate the required spatial rasters at runtime.
  2. Use external storage for the data.
  3. Publish the dataset as a package or independently, e.g., on Zenodo.

Test issues

  1. We observed that plots don't appear on Windows (reported by @jrobledob).
  2. Custom analysis fails on Windows.
    Both have a common cause: terra appears to be setting an invalid extent for the background raster. Filed an issue here.

check for libraries

Could you check which libraries are necessary to run the cropland connectivity analysis? For example, dismo is currently present in the code, but probably no function from this package is used.

errors in

Hello,

I was exploring the package last week, and I had these issues:

geohabnet::senstivity_analysis()
Error: 'senstivity_analysis' is not an exported object from 'namespace:geohabnet'

I also tried the example for avocado, changing "potato" to "avocado", and got an error. For "potato" it kind of worked.

avocado_mon <- geohabnet::cropharvest_rast("avocado", "monfreda")
> avocado_result <- geohabnet::msean(avocado_mon, global = TRUE, link_threshold  = 0.000001,
+                                    inv_pl = list(beta = c(0.05),
+                                                  metrics = c("betweeness"),
+                                                  weights = c(100),
+                                                  cutoff = -1), res = 24,
+                                    neg_exp = list(gamma = c(0.05),
+                                                   metrics = c("betweeness"),
+                                                   weights = c(100), cutoff = -1))
Error in legend(x, y, legend, fill = fill, xpd = xpd, bty = bty, cex = cex,  : 
  unused arguments (ext = c(-110, 150, -88, -78), horizontal = TRUE)

similar for this:

# A test for berries
> berries_mon <- geohabnet::cropharvest_rast("berries", "monfreda")
Error in geodata::crop_monfreda(crop = crop_name, path = tempdir(), var = "area_f") : 
   is not available; see monfredaCrops()
> monfredaCrops()
Error in monfredaCrops() : could not find function "monfredaCrops"
> geohabnet::monfredaCrops()
Error: 'monfredaCrops' is not an exported object from 'namespace:geohabnet'

Further work on CC - TBD

Some of the features we still need to work on include:
(a) assigning weights for network metrics (parameter, e.g., 10% for metric1, 20% for metric2, etc.),

(b) including other sources of data such as GBIF (and deciding which file format these data need to be in) (create a different card later), and
(c) using weights for each crop when the user selects multiple crops. Same as (a): assign a weight to each crop in case of multiple crops. Parameterize them.

Additionally, check that cropharvest gives information about the HarvestedAreaFraction, not production.
(Fix the code here.)

Originally posted by @AaronPlex in #1 (comment)

Unit Tests

add unit tests for CC to demonstrate the usage of functions. Use minimal parameters.
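
A minimal testthat sketch under stated assumptions (that cropharvest_rast() is reachable online and returns a terra SpatRaster; adjust the expectation to the actual return type):

library(testthat)

test_that("cropharvest_rast returns a raster for a known crop", {
  skip_if_offline()                   # the data are downloaded at run time
  r <- geohabnet::cropharvest_rast("potato", "monfreda")
  expect_s4_class(r, "SpatRaster")    # assumption: a terra SpatRaster is returned
})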

set_parameters_object() issue

The package has a new function, set_parameters_object(), which allows users to specify parameter values through a function instead of parameters.yaml. Currently, using the function modifies the YAML in an undesired way. It needs to retain the structure of the unmodified YAML.

assign weights to hosts

  1. All crops in Monfreda = 1
  2. All crops in SPAM = 1
  3. newraster <- sum(monfredaAvocado, spamAvocado) * weight
  4. newraster <- sum(monfredaAvocado * w1, spamAvocado * w2) / 2 (see the sketch below)
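
A short terra sketch of option 4, aligning the grids first and normalizing by the sum of the weights (the objects and weight values are illustrative):

library(terra)

w1 <- 0.5   # weight for the Monfreda layer (illustrative)
w2 <- 0.5   # weight for the SPAM layer (illustrative)

spam_on_mon  <- resample(spamAvocado, monfredaAvocado)
host_density <- (monfredaAvocado * w1 + spam_on_mon * w2) / (w1 + w2)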

Rename fields in parameters.yaml

#TODO:
rename -

  • DispersalParameterBeta
  • DispersalParameterGamma
  • Hosts
  • NetworkMetrics
  • PriorityMaps
  • HostDensityThreshold
  • LinkThreshold

link threshold consideration

Another option is to apply a distance threshold (more commonly used in papers and easier to interpret) instead of a link-weight threshold (which is harder to interpret).

Adjacency matrix

Support a function that can return an adjacency matrix and also accept an adjacency matrix as a parameter.
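
igraph already provides both directions of this conversion, so the new function could be a thin wrapper around it; a sketch:

library(igraph)

g <- sample_gnp(10, 0.3)
E(g)$weight <- runif(ecount(g))

# Graph -> weighted adjacency matrix
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)

# Adjacency matrix -> graph
g2 <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)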

Wrong output

Hi @krishnakeshav , @AaronPlex
I ran a habitat connectivity analysis for Lauraceae using all four parameters (individually and together) with the "msean" function and always got the same result.
Plex and I also ran the sensitivity analysis with just betweenness, and the result is different.

It seems like the "msean" function needs to be validated.

opt parameter for distance function

From Friday's meeting: requirement to add a new parameter -
distance_model : fun_name

Bard reference -

The geosphere::distGeo() and geosphere::distVincentyEllipsoid() functions both calculate the distance between two points on the WGS84 ellipsoid. geosphere::distGeo() computes geodesic distances and is generally faster and highly accurate; geosphere::distVincentyEllipsoid() uses Vincenty's inverse formula, is slower, and can fail to converge for nearly antipodal points.

Here is a table that summarizes the two functions:

Function Performance Accuracy
geosphere::distGeo() Faster Highly accurate
geosphere::distVincentyEllipsoid() Slower Highly accurate (may not converge for nearly antipodal points)

For calculating distances between cells in an adjacency matrix, geosphere::distGeo() is a good default; geosphere::distVincentyEllipsoid() can remain an option for users who specifically want Vincenty distances.
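
A sketch of how the proposed distance_model parameter could be honored internally: geosphere::distm() already accepts a distance function, so the chosen function could be passed straight through (the coordinates below are illustrative):

library(geosphere)

# Cell-center coordinates (lon, lat) of two sets of cells -- illustrative values
from <- cbind(c(-80.2, -81.0), c(27.5, 28.1))
to   <- cbind(c(-80.5, -82.3), c(26.9, 27.7))

# distance_model could be forwarded as the `fun` argument of distm()
d_geo  <- distm(from, to, fun = distGeo)
d_vinc <- distm(from, to, fun = distVincentyEllipsoid)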

Automate site

Automate GitHub Pages so that the site is built and deployed whenever changes are pushed to the main branch.

network metrics checking

It seems like page rank is not currently working in the package.

sensitivity_analysis()
New analysis started for given raster
Running sensitivity analysis for the extent: [ -24, 180, -58, 60 ],
Link threshold: 1e-06
Host density threshold: 1e-04

Error in pagerank(graph, param) : could not find function "pagerank"

GBIF is a global database for all organisms, so extract only the applicable taxa and let users select from them. It uses scientific names.

Sample -

Please try the code below, where we download information directly from GBIF and then convert that information into raster files of a desired resolution. In this case, myTaxon asks for the scientific name of the host (plant) instead of the common name: say, "Persea americana" instead of "avocado". Let me know if it works on your side.
Also, the user needs to create a GBIF account and then use their credentials. (For the moment, I trust that people in this repo can see my credentials.) lol

library(rgbif)
library(magrittr)  # for the %>% pipe used below

# User provides a taxon name and R generates the taxon key to search in GBIF,
# or the user provides the taxon key directly
myTaxon <- c("Persea americana")
taxonkey <- name_backbone(myTaxon)$usageKey

# Downloading info from GBIF (username, password, and email are your GBIF credentials)
downloadID <- occ_download(
  pred_in("taxonKey", taxonkey),
  format = "SIMPLE_CSV",
  user = username, pwd = password, email = email
)
occ_download_wait(downloadID)  # wait until GBIF has prepared the download

hostOccGBIF <- occ_download_get(downloadID) %>%
  occ_download_import()

# Cleaning the dataset
hostOccGBIF <- hostOccGBIF[hostOccGBIF$countryCode != "", ]
hostOccGBIF <- hostOccGBIF[hostOccGBIF$countryCode != "ZZ", ]
hostOccGBIF <- hostOccGBIF[hostOccGBIF$occurrenceStatus == "PRESENT", ]
hostOccGBIF$Lon <- as.numeric(hostOccGBIF$decimalLongitude)
hostOccGBIF$Lat <- as.numeric(hostOccGBIF$decimalLatitude)
hostOccGBIF <- hostOccGBIF[, colnames(hostOccGBIF) %in%
                             c("species", "Lon", "Lat")]
hostOccGBIF <- hostOccGBIF[!is.na(hostOccGBIF$Lon), ]
hostOccGBIF <- hostOccGBIF[!is.na(hostOccGBIF$Lat), ]
length(hostOccGBIF$species)

library(terra)

# Convert occurrences to points and count occurrences per grid cell
vectorHost <- vect(hostOccGBIF, crs = "+proj=longlat", geom = c("Lon", "Lat"))
e <- ext(-180, 180, -60, 90)  # left, right, bottom, top
vectorHost <- crop(vectorHost, e)
r <- rast(res = 0.5, ext = e)  # res = 1 here is equivalent to resolution = 12 in geohabnet
rasterHost <- rasterize(vectorHost, r, fun = length)
plot(rasterHost)

# Optional: apply the focal operation to mitigate sampling bias
frasterHost <- focal(rasterHost, 3, mean, na.policy = "all", na.rm = TRUE)
plot(frasterHost)

Replace usage of raster with terra

@AaronPlex pointed out that both packages provide similar functionality. Upon looking further, it seems that terra performs better than raster for large objects. This will require some analysis.
Start here.

CCRI to HCI

Please change all references to the cropland connectivity risk index (or CCRI) to the habitat connectivity index (or HCI).

Also, please add something toward the beginning of the descriptions (wherever geohabnet is introduced for the first time) discussing how the CCRI as described by Xing et al. relates to the HCI.

Metrics weight

  • Issue raised in a meeting by @manoj044 regarding how metrics are scaled using weights if we make weighting optional. Found the following description of how the CCRI metrics are scaled in a paper and will implement it:

The summary index (the CCRI) was calculated as a weighted sum of 1/2 betweenness centrality, 1/6 node strength, 1/6 sum of nearest neighbors’ node degrees, and 1/6 eigenvector centrality, such that each of the four metrics was scaled before summing by dividing by the maximum value observed for that metric. The weighting emphasizes betweenness because betweenness will particularly capture a potential role as a bridge that is not obvious when individual cropland area is considered alone, and also to include connectedness of a node at different scales.

  • Additionally, parameterize the weight assigned to each individual metric (see the sketch below).
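
Following that description, here is a sketch of the weighted sum with each metric scaled by its observed maximum (the graph and its weights are illustrative; see also the metrics sketch earlier on this page):

library(igraph)

g <- sample_gnp(30, 0.25)
E(g)$weight <- runif(ecount(g))

scale_max <- function(x) x / max(x, na.rm = TRUE)   # scale by the maximum observed value

# betweenness() treats weights as distances, so invert them first
dist_w <- 1.0001 * max(E(g)$weight) - E(g)$weight

btw <- scale_max(betweenness(g, weights = dist_w))
str <- scale_max(strength(g, weights = E(g)$weight))
nnd <- scale_max(knn(g, weights = E(g)$weight)$knn)   # isolated nodes give NA here
eig <- scale_max(eigen_centrality(g, weights = E(g)$weight)$vector)

ccri <- (1/2) * btw + (1/6) * str + (1/6) * nnd + (1/6) * eig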
