habitatconnectivity's People

Contributors

aaronplex, garrettlab, krishnakeshav

habitatconnectivity's Issues

Using data from CroplandCROS

Here is a code template to download data from CroplandCROS (high spatial resolution: 30 m × 30 m), transform it into a SpatRaster that can be used in geohabnet, and start assembling our vignette.

You can download a GeoTIFF file from CroplandCROS, and check their user guide if needed.

  1. First, choose your crop, area of interest (AOI), and year.
  2. Then, make sure to use Web Mercator (WGS 84) and DPI = 96 to export.
  3. Check that you get a compressed (zipped) folder with all the information, including (among other files) the TIF file called "clipped".
  4. Read the clipped.tif file in R and check that a coordinate reference system (CRS) comes with the file.
  5. Plot the SpatRaster and check the value of each pixel. There is only one integer, and that value just represents 'presence' (e.g., 56) or 'absence' (NA).
  6. Transform the CRS to the one used and accepted by geohabnet.
  7. Generate a map of cropland density at a coarser resolution by summing all small pixels into larger pixels (i.e., aggregating pixels or reducing spatial resolution).
  8. Finally, save the resulting SpatRaster to be used as the input to geohabnet. Still pending is how to convert it into a map of host density instead of host abundance; this is the same issue as the one for data from GBIF.

Here is the code I used that worked:

library(terra)

# Read the clipped GeoTIFF exported from CroplandCROS
r <- rast("clipped.tif")
plot(r)

# Pixels carry a single class value (72 in this Florida citrus export);
# divide by it so presence becomes 1
v <- 72
r <- r / v
plot(r)

# Reproject to lon/lat WGS 84, the CRS accepted by geohabnet
r <- project(r, "+proj=longlat +datum=WGS84")

# Aggregate 40 x 40 blocks of 30 m pixels and divide by the block size
# to get cropland density (fraction of each coarse cell that is cropland)
f <- 40
a <- aggregate(r, fact = f, fun = "sum", na.rm = TRUE) / (f * f)
plot(a)

writeRaster(a, "FloridaCitrus01.tif")

*The project() call took about 15 minutes and aggregate() about 5 minutes, since the dataset is at high resolution.

Options/parameters for multiple data sources

  • 1. Monfreda - has data only up to 2005.
  • 2. SPAM - dataset up to 2017 (Africa only; consider whether there should be an Africa/global parameter). The global dataset is available up to 2010. Choose the "A" (all technologies) layer for each crop -
Browse[2]> test_spam
class       : SpatRaster 
dimensions  : 732, 1008, 3  (nrow, ncol, nlyr)
resolution  : 0.08333333, 0.08333333  (x, y)
extent      : -26, 58, -35, 26  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
names       : H_POTA_A, H_POTA_I, H_POTA_R 
min values  :      0.0,      0.0,      0.0 
max values  :   8468.4,   2656.5,   8468.4 

Potential designs -

  1. Select source, then crop.
  2. Select crop, then source.
  3. The source of data matters, so display the list of crops from all the data sources.

All are part of the geodata package.
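
For reference, here is a minimal sketch of pulling both sources through geodata. The crop_monfreda() call mirrors the one used elsewhere on this page; the crop_spam() arguments (var, africa) are assumptions and should be checked against the geodata documentation.

library(geodata)
library(terra)

# Monfreda harvested-area fraction (same call pattern as used elsewhere in this repo)
potato_mon <- crop_monfreda(crop = "potato", var = "area_f", path = tempdir())

# SPAM harvested area; the var and africa arguments are assumptions -- see ?crop_spam
potato_spam <- crop_spam(crop = "potato", var = "harv_area", path = tempdir(), africa = FALSE)

# Keep the "_A" (all technologies) layer, matching the H_POTA_A layer shown above
potato_spam_all <- potato_spam[[grep("_A$", names(potato_spam))]]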

Flexible choice of dispersal kernels

@krishnakeshav @GarrettLab, this is an improvement issue.

Specifically, the functions in geohabnet should be flexible enough that the user can choose which dispersal kernels (inverse power law, negative exponential, or both) to analyze. Currently, the functions force the user to run the analysis with both dispersal kernel models.
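
As a hypothetical sketch of that flexibility (this is a proposal, not the current msean() behavior, which expects both kernel lists), passing NULL for one kernel could skip it:

# Hypothetical future usage: run only the inverse power law kernel
avocado_mon <- geohabnet::cropharvest_rast("avocado", "monfreda")
result_ipl_only <- geohabnet::msean(
  avocado_mon,
  global = TRUE,
  link_threshold = 0.000001,
  inv_pl = list(beta = 0.05, metrics = "betweeness", weights = 100, cutoff = -1),
  neg_exp = NULL,   # proposed: NULL would skip the negative exponential model
  res = 24
)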

merging MAPSPAM and 'Monfreda' datasets

Check that, if users specify the same crop in both datasets, a map of the mean between MAPSPAM and Monfreda is generated.

Sum if the crops are different in each dataset.
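
A minimal terra sketch of that rule, assuming the layers have already been read in (the object names are illustrative):

library(terra)

# Align grids first; the two sources may not share exactly the same geometry
spam_on_mon <- resample(spam_avocado, monfreda_avocado, method = "bilinear")

# Same crop in both datasets: cell-wise mean
avocado_host <- mean(c(monfreda_avocado, spam_on_mon), na.rm = TRUE)

# Different crops from each dataset: sum them instead
mixed_host <- sum(c(monfreda_banana, spam_on_mon), na.rm = TRUE)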

Dynamically load parameters.

As it turns out, the directory structure cannot be customized and should follow CRAN recommendations. The current issue is to figure out a way to load parameters dynamically within the package, since the installed structure and the source structure of an R package are different.
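
The usual approach is to keep the file under inst/ in the source tree and resolve it at run time with system.file(); the extdata location below is an assumption about where parameters.yaml would live after installation.

# Source tree: inst/extdata/parameters.yaml; installed tree: extdata/parameters.yaml
param_path <- system.file("extdata", "parameters.yaml",
                          package = "geohabnet", mustWork = TRUE)
params <- yaml::read_yaml(param_path)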

add support for more metrics

Reference - igraph

  1. betweenness -> accounts for the number of shortest paths between nodes
  2. node_strength -> takes into account the first layer of nodes a node is connected to
  3. sum_of_nearest_neighbors -> also takes the second layer into account
  4. eigenvector_centrality
  5. node degree -> how many links a node has; similar to node strength
  6. closeness centrality -> similar to betweenness; based on shortest paths between nodes
  7. page rank
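
For reference, a small igraph sketch of the corresponding calls on a weighted toy graph (the mapping to the list above is approximate; knn() is used for the nearest-neighbor-degree idea):

library(igraph)

g <- sample_gnp(20, 0.2)                  # toy graph
E(g)$weight <- runif(ecount(g))           # link weights

# Note: betweenness()/closeness() treat weights as distances
# (see the "Link weight in shortest paths" issue below)
betweenness(g, weights = E(g)$weight)              # 1
strength(g, weights = E(g)$weight)                 # 2
knn(g, weights = E(g)$weight)$knn                  # 3 (average nearest-neighbor degree)
eigen_centrality(g, weights = E(g)$weight)$vector  # 4
degree(g)                                          # 5
closeness(g, weights = E(g)$weight)                # 6
page_rank(g, weights = E(g)$weight)$vector         # 7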

Plot maps

Plots are needed only for the following:

  1. Mean cropland connectivity (final)
  2. Difference between CC and cropland density
  3. Variance in CC

Difference map

Old difference = sum - mean

Difference = mean CCRI - host density.
For sum, aggregate host density and compare with the mean CCRI.
For both, generate two host density maps, take the mean of both, and compare with the mean CCRI.
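
A small terra sketch of the difference map, assuming mean_ccri and host_density are SpatRasters produced earlier (the object names are illustrative):

library(terra)

# Put host density on the CCRI grid, then subtract
host_on_ccri <- resample(host_density, mean_ccri, method = "bilinear")
diff_map <- mean_ccri - host_on_ccri
plot(diff_map, main = "Mean CCRI minus host density")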

Link weight in shortest paths

  1. Fix how link weights are provided in the betweenness() and closeness() functions.

link.weight <- seq(0, 1, 0.01)  # example link weights in [0, 1]
# betweenness() and closeness() treat weights as distances (costs), so strong
# links must be mapped to small values before being passed in:
max.inverse <- 1.0001 * max(link.weight) - link.weight
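
A sketch of how the transformed weights would then be supplied (the graph and its edge weights are illustrative):

library(igraph)

g <- sample_gnp(15, 0.3)
E(g)$weight <- runif(ecount(g))   # dispersal-likelihood style weights in (0, 1)

# Invert so that high-likelihood links become short "distances"
dist_w <- 1.0001 * max(E(g)$weight) - E(g)$weight

betweenness(g, weights = dist_w)
closeness(g, weights = dist_w)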

Storing dataset

The program uses TIF files to produce spatial rasters for the map background and zero values. It currently downloads the files from Amazon S3. S3 is also used as a backup for the MapSPAM dataset. We need to find an alternative way to store these datasets.

A few options:

  1. Calculate the required spatial rasters at runtime.
  2. Use external storage for the data.
  3. Publish the dataset as a package or independently, e.g., on Zenodo.

Test issues

  1. We observed that plots don't appear on Windows (reported by @jrobledob).
  2. Custom analysis fails on Windows.
    Both have a common cause: terra appears to be setting an invalid extent for the background raster. Filed an issue here.

check for libraries

Could you check which libraries are necessary to run the cropland connectivity analysis? For example, dismo is currently present in the code, but probably no function from this package is used.

errors in

Hello,

I was exploring the package last week, and I had these issues:

geohabnet::senstivity_analysis()
Error: 'senstivity_analysis' is not an exported object from 'namespace:geohabnet'

I also tried the example for avocado, changing "potato" to "avocado", and got an error. For "potato" it kind of worked.

avocado_mon <- geohabnet::cropharvest_rast("avocado", "monfreda")
> avocado_result <- geohabnet::msean(avocado_mon, global = TRUE, link_threshold  = 0.000001,
+                                    inv_pl = list(beta = c(0.05),
+                                                  metrics = c("betweeness"),
+                                                  weights = c(100),
+                                                  cutoff = -1), res = 24,
+                                    neg_exp = list(gamma = c(0.05),
+                                                   metrics = c("betweeness"),
+                                                   weights = c(100), cutoff = -1))
Error in legend(x, y, legend, fill = fill, xpd = xpd, bty = bty, cex = cex,  : 
  unused arguments (ext = c(-110, 150, -88, -78), horizontal = TRUE)

similar for this:

# A test for berries
> berries_mon <- geohabnet::cropharvest_rast("berries", "monfreda")
Error in geodata::crop_monfreda(crop = crop_name, path = tempdir(), var = "area_f") : 
   is not available; see monfredaCrops()
> monfredaCrops()
Error in monfredaCrops() : could not find function "monfredaCrops"
> geohabnet::monfredaCrops()
Error: 'monfredaCrops' is not an exported object from 'namespace:geohabnet'

Further work on CC - TBD

Some of the features we still need to work on include:
(a) assigning weights for network metrics (parameter, e.g., 10% for metric1, 20% for metric2, etc.),

(b) including other sources of data such as GBIF (and deciding which file format these data need to be in) (create a different card later), and
(c) using weights for each crop when the user selects multiple crops. Same as (a): assign a weight to each crop in case of multiple crops. Parameterize them.

Additionally, check that cropharvest gives information about the HarvestedAreaFraction, not production.
(Fix the code here.)

Originally posted by @AaronPlex in #1 (comment)

Unit Tests

add unit tests for CC to demonstrate the usage of functions. Use minimal parameters.
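
A minimal testthat sketch under stated assumptions (that cropharvest_rast() is reachable online and returns a terra SpatRaster; adjust the expectation to the actual return type):

library(testthat)

test_that("cropharvest_rast returns a raster for a known crop", {
  skip_if_offline()                   # the data are downloaded at run time
  r <- geohabnet::cropharvest_rast("potato", "monfreda")
  expect_s4_class(r, "SpatRaster")    # assumption: a terra SpatRaster is returned
})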

set_parameters_object() issue

The package has a new function, set_parameters_object(), which allows users to specify parameter values through a function instead of parameters.yaml. Currently, using the function modifies the YAML in an undesired way. It needs to retain the structure of the unmodified YAML.

assign weights to hosts

  1. All crops in Monfreda = 1
  2. All crops in SPAM = 1
  3. newraster <- sum(monfredaAvocado, spamAvocado) * weight
  4. newraster <- sum(monfredaAvocado * w1, spamAvocado * w2) / 2 (see the sketch below)
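
A short terra sketch of option 4, aligning the grids first and normalizing by the sum of the weights (the objects and weight values are illustrative):

library(terra)

w1 <- 0.5   # weight for the Monfreda layer (illustrative)
w2 <- 0.5   # weight for the SPAM layer (illustrative)

spam_on_mon  <- resample(spamAvocado, monfredaAvocado)
host_density <- (monfredaAvocado * w1 + spam_on_mon * w2) / (w1 + w2)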

Rename fields in parameters.yaml

#TODO:
rename -

  • DispersalParameterBeta
  • DispersalParameterGamma
  • Hosts
  • NetworkMetrics
  • PriorityMaps
  • HostDensityThreshold
  • LinkThreshold

link threshold consideration

Another option is to apply a distance threshold (more commonly used in papers and easier to interpret) instead of a link-weight threshold (which is harder to interpret).

Adjacency matrix

Support a function that can return an adjacency matrix and also accept an adjacency matrix as a parameter.
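
igraph already provides both directions of this conversion, so the new function could be a thin wrapper around it; a sketch:

library(igraph)

g <- sample_gnp(10, 0.3)
E(g)$weight <- runif(ecount(g))

# Graph -> weighted adjacency matrix
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)

# Adjacency matrix -> graph
g2 <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)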

Wrong output

Hi @krishnakeshav , @AaronPlex
I ran a habitat connectivity analysis for Lauraceae using all four parameters (individually and together) with the "msean" function and always got the same result.
Plex and I also ran the sensitivity analysis with just betweenness, and the result is different.

It seems like the "msean" function needs to be validated.

opt parameter for distance function

From Friday's meeting: requirement to add a new parameter -
distance_model : fun_name

Bard reference -

The geosphere::distGeo() and geosphere::distVincentyEllipsoid() functions both calculate the distance between two points on the WGS84 ellipsoid. geosphere::distGeo() computes geodesic distances and is generally faster and highly accurate; geosphere::distVincentyEllipsoid() uses Vincenty's inverse formula, is slower, and can fail to converge for nearly antipodal points.

Here is a table that summarizes the two functions:

Function Performance Accuracy
geosphere::distGeo() Faster Highly accurate
geosphere::distVincentyEllipsoid() Slower Highly accurate (may not converge for nearly antipodal points)

For calculating distances between cells in an adjacency matrix, geosphere::distGeo() is a good default; geosphere::distVincentyEllipsoid() can remain an option for users who specifically want Vincenty distances.
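
A sketch of how the proposed distance_model parameter could be honored internally: geosphere::distm() already accepts a distance function, so the chosen function could be passed straight through (the coordinates below are illustrative):

library(geosphere)

# Cell-center coordinates (lon, lat) of two sets of cells -- illustrative values
from <- cbind(c(-80.2, -81.0), c(27.5, 28.1))
to   <- cbind(c(-80.5, -82.3), c(26.9, 27.7))

# distance_model could be forwarded as the `fun` argument of distm()
d_geo  <- distm(from, to, fun = distGeo)
d_vinc <- distm(from, to, fun = distVincentyEllipsoid)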

Automate site

Automate GitHub Pages so that the site is built and deployed whenever changes are pushed to the main branch.

network metrics checking

It seems like page rank is not currently working in the package.

sensitivity_analysis()
New analysis started for given raster
Running sensitivity analysis for the extent: [ -24, 180, -58, 60 ],
Link threshold: 1e-06
Host density threshold: 1e-04

Error in pagerank(graph, param) : could not find function "pagerank"

GBIF is a global database for all organisms, so extract only the applicable taxa and let users select from them. It uses scientific names.

Sample -

Please try the code below, where we download information directly from GBIF and then convert that information into raster files of a desired resolution. In this case, myTaxon asks for the scientific name of the host (plant) instead of the common name: say, "Persea americana" instead of "avocado". Let me know if it works on your side.
Also, the user needs to create a GBIF account and then use their credentials. (For the moment, I trust that people in this repo can see my credentials.) lol

library(rgbif)
library(magrittr)  # for the %>% pipe used below

# User provides a taxon name and R generates the taxon key to search in GBIF,
# or the user provides the taxon key directly
myTaxon <- c("Persea americana")
taxonkey <- name_backbone(myTaxon)$usageKey

# Downloading info from GBIF (username, password, and email are your GBIF credentials)
downloadID <- occ_download(
  pred_in("taxonKey", taxonkey),
  format = "SIMPLE_CSV",
  user = username, pwd = password, email = email
)
occ_download_wait(downloadID)  # wait until GBIF has prepared the download

hostOccGBIF <- occ_download_get(downloadID) %>%
  occ_download_import()

# Cleaning the dataset
hostOccGBIF <- hostOccGBIF[hostOccGBIF$countryCode != "", ]
hostOccGBIF <- hostOccGBIF[hostOccGBIF$countryCode != "ZZ", ]
hostOccGBIF <- hostOccGBIF[hostOccGBIF$occurrenceStatus == "PRESENT", ]
hostOccGBIF$Lon <- as.numeric(hostOccGBIF$decimalLongitude)
hostOccGBIF$Lat <- as.numeric(hostOccGBIF$decimalLatitude)
hostOccGBIF <- hostOccGBIF[, colnames(hostOccGBIF) %in%
                             c("species", "Lon", "Lat")]
hostOccGBIF <- hostOccGBIF[!is.na(hostOccGBIF$Lon), ]
hostOccGBIF <- hostOccGBIF[!is.na(hostOccGBIF$Lat), ]
length(hostOccGBIF$species)

library(terra)

# Convert occurrences to points and count occurrences per grid cell
vectorHost <- vect(hostOccGBIF, crs = "+proj=longlat", geom = c("Lon", "Lat"))
e <- ext(-180, 180, -60, 90)  # left, right, bottom, top
vectorHost <- crop(vectorHost, e)
r <- rast(res = 0.5, ext = e)  # res = 1 here is equivalent to resolution = 12 in geohabnet
rasterHost <- rasterize(vectorHost, r, fun = length)
plot(rasterHost)

# Optional: apply the focal operation to mitigate sampling bias
frasterHost <- focal(rasterHost, 3, mean, na.policy = "all", na.rm = TRUE)
plot(frasterHost)

Replace usage of raster with terra

@AaronPlex pointed out that both packages provide similar functionality. Upon looking further, it seems that terra performs better than raster for large objects. This will require some analysis.
Start here.

CCRI to HCI

Please change all references to the cropland connectivity risk index (or CCRI) to the habitat connectivity index (or HCI).

Also, please add something toward the beginning of the descriptions (wherever geohabnet is introduced for the first time) discussing how the CCRI as described by Xing et al. relates to the HCI.

Metrics weight

  • Issue raised in a meeting by @manoj044 regarding how metrics are scaled using weights if we make weighting optional. Found the following description of how the CCRI metrics are scaled in a paper and will implement it:

The summary index (the CCRI) was calculated as a weighted sum of 1/2 betweenness centrality, 1/6 node strength, 1/6 sum of nearest neighbors’ node degrees, and 1/6 eigenvector centrality, such that each of the four metrics was scaled before summing by dividing by the maximum value observed for that metric. The weighting emphasizes betweenness because betweenness will particularly capture a potential role as a bridge that is not obvious when individual cropland area is considered alone, and also to include connectedness of a node at different scales.

  • Additionally, parameterize the weight assigned to each individual metric (see the sketch below).
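
Following that description, here is a sketch of the weighted sum with each metric scaled by its observed maximum (the graph and its weights are illustrative; see also the metrics sketch earlier on this page):

library(igraph)

g <- sample_gnp(30, 0.25)
E(g)$weight <- runif(ecount(g))

scale_max <- function(x) x / max(x, na.rm = TRUE)   # scale by the maximum observed value

# betweenness() treats weights as distances, so invert them first
dist_w <- 1.0001 * max(E(g)$weight) - E(g)$weight

btw <- scale_max(betweenness(g, weights = dist_w))
str <- scale_max(strength(g, weights = E(g)$weight))
nnd <- scale_max(knn(g, weights = E(g)$weight)$knn)   # isolated nodes give NA here
eig <- scale_max(eigen_centrality(g, weights = E(g)$weight)$vector)

ccri <- (1/2) * btw + (1/6) * str + (1/6) * nnd + (1/6) * eig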
