
rmangal's Introduction

rmangal 📦 - an R Client for the Mangal database

R CMD Check lint codecov Project Status: Active – The project has reached a stable, usable state and is being actively developed. CRAN status

Context

Mangal -- a global ecological interactions database -- serializes ecological interaction matrices into nodes (e.g. taxa, individuals or populations) and interactions (i.e. edges). For each network, Mangal offers the opportunity to store study context such as the location, sampling environment, inventory date and information pertaining to the original publication. For every node involved in the ecological networks, Mangal references unique taxonomic identifiers such as Encyclopedia of Life (EOL), Catalogue of Life (COL), Global Biodiversity Information Facility (GBIF) etc. and can extend node information to individual traits.

rmangal is an R client to the Mangal database and provides various search functions to explore its content. It offers methods to retrieve networks structured as mgNetwork or mgNetworksCollection S3 objects, and methods to convert mgNetwork objects for use with igraph, tidygraph, and ggraph in order to analyze and visualize network properties.

Installation

So far, only the development version is available; it can be installed via the remotes 📦:

R> remotes::install_github("ropensci/rmangal")
R> library("rmangal")

How to use rmangal

There are seven search_*() functions to explore the content of Mangal, for instance search_datasets():

R> mgs <- search_datasets("lagoon")
Found 2 datasets

Once this first step is done, the networks found can be retrieved with the get_collection() function.

R> mgn <- get_collection(mgs)

get_collection() returns an mgNetwork object if a single network is returned; otherwise it returns an mgNetworksCollection object, which is a list of mgNetwork objects.

R> class(mgn)
[1] "mgNetworksCollection"
R> mgn
A collection of 3 networks

* Network # from data set #
* Description: Dietary matrix of the Huizache–Caimanero lagoon
* Includes 189 edges and 26 nodes
* Current taxonomic IDs coverage for nodes of this network:
  --> ITIS: 81%, BOLD: 81%, EOL: 85%, COL: 81%, GBIF: 0%, NCBI: 85%
* Published in ref # DOI:10.1016/s0272-7714(02)00410-9

* Network # from data set #
* Description: Food web of the Brackish lagoon
* Includes 27 edges and 11 nodes
* Current taxonomic IDs coverage for nodes of this network:
  --> ITIS: 45%, BOLD: 45%, EOL: 45%, COL: 45%, GBIF: 18%, NCBI: 45%
* Published in ref # DOI:NA

* Network # from data set #
* Description: Food web of the Costal lagoon
* Includes 34 edges and 13 nodes
* Current taxonomic IDs coverage for nodes of this network:
  --> ITIS: 54%, BOLD: 54%, EOL: 54%, COL: 54%, GBIF: 15%, NCBI: 54%
* Published in ref # DOI:NA
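
Because get_collection() can return either class, downstream code may want to treat both cases uniformly. The following is a minimal sketch, not part of rmangal: as_network_list() is a hypothetical helper, and the mock objects only imitate the class attributes described above.

```r
# Hypothetical helper (not part of rmangal): always return a plain list of
# mgNetwork objects, whichever of the two classes was received.
as_network_list <- function(x) {
  if (inherits(x, "mgNetworksCollection")) {
    return(unclass(x))
  }
  if (inherits(x, "mgNetwork")) {
    return(list(x))
  }
  stop("expected an mgNetwork or mgNetworksCollection object")
}

# Mock objects standing in for real query results
one_net <- structure(list(network = "a"), class = "mgNetwork")
many_nets <- structure(list(one_net, one_net), class = "mgNetworksCollection")

length(as_network_list(one_net))
length(as_network_list(many_nets))
```

With such a helper, a loop or lapply() call can be written once for both return types.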

igraph and tidygraph offer powerful features to analyze networks and rmangal provides functions to convert mgNetwork to igraph and tbl_graph so that the user can easily benefit from those packages.

R> ig <- as.igraph(mgn[[1]])
R> class(ig)
[1] "igraph"
R> library(tidygraph)
R> tg <- as_tbl_graph(mgn[[1]])
R> class(tg)
[1] "tbl_graph" "igraph"

📖 Note that the vignette "Get started with rmangal" will guide the reader through several examples and provide further details about rmangal features.

How to publish ecological networks

We are working on that part. The network publication process will be facilitated with structured objects and a test suite to maintain data integrity and quality. Comments and suggestions are welcome; feel free to open issues.

rmangal vs rglobi

Those interested only in pairwise interactions among taxa may consider using rglobi, an R package that provides an interface to the GloBI infrastructure. GloBI provides open access to aggregated interactions from heterogeneous sources. In contrast, Mangal gives access to the original networks and opens the door to studying ecological network properties (e.g. connectance, degree etc.) along large environmental gradients, which wasn't possible using the GloBI infrastructure.
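
As an illustration of one such property, here is a minimal sketch that computes connectance from the edge and node counts printed earlier for the three lagoon networks. It assumes the directed definition L / S^2 (links over squared number of nodes), which is only one of several definitions in use; connectance() is a hypothetical helper, not an rmangal function.

```r
# Directed connectance: realised links divided by possible links (S^2).
# This definition is an assumption; other conventions exist.
connectance <- function(n_edges, n_nodes) {
  stopifnot(n_nodes > 0, n_edges >= 0)
  n_edges / n_nodes^2
}

# Edge and node counts reported for the three lagoon networks
edges <- c(189, 27, 34)
nodes <- c(26, 11, 13)

round(connectance(edges, nodes), 3)
```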

Older versions

Acknowledgment

We are grateful to Noam Ross for acting as an editor during the review process. We also thank Anna Willoughby and Thomas Lin Pedersen for reviewing the package. Their comments contributed greatly to improving the quality of rmangal.

Code of conduct

Please note that the rmangal project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


rmangal's People

Contributors

clementviolet, gabrielbouleau, kevcaz, lstmemery, maelle, noamross, steveviss


rmangal's Issues

POST_user URL error

The URL used to send the user table is wrong and returns a 404 error. The problem is that the character string "users" should not actually end with an "s". The offending line in the function is shown just below.

#path <- httr::modify_url(server, path = paste0(mangal.env$base, "/users/?name=", users[[1]]))

Hex logo

Transform the font as path with Inkscape.

Request bug on search_taxa()

> small_net <- search_taxa("superba")
Full text search
Error in `$<-.data.frame`(`*tmp*`, "original_name", value = c("Holonomada superba",  : 
  replacement has 5 rows, data has 3

POST_environment

POST_environment() can only accept one attribute and one value. It could be useful to have a function for posting a whole environmental dataset, where each row is a different site and each column is an environmental variable.
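
A minimal sketch of how such a table could be broken down into single attribute/value pairs is given below. env_to_long() and the column names are hypothetical, and the call that would actually post each row is left out because its signature is not described here.

```r
# Hypothetical helper: reshape a site-by-variable table into one
# (site, attribute, value) row per measurement.
env_to_long <- function(env, site_col = "site") {
  vars <- setdiff(names(env), site_col)
  do.call(rbind, lapply(vars, function(v) {
    data.frame(
      site = env[[site_col]],
      attribute = v,
      # values are stored as text so that mixed variable types can be stacked
      value = as.character(env[[v]]),
      stringsAsFactors = FALSE
    )
  }))
}

env <- data.frame(
  site = c("A", "B"),
  temperature = c(12.5, 14.1),
  depth = c(3, 7)
)
long <- env_to_long(env)
long
```

Each row of `long` then carries exactly one attribute and one value, which is the shape a single-attribute function expects.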

bottleneck for large networks

reprex: get_network_by_id(19)

I've been waiting for >15min, still not done, all data are in, it's a list of >15,000 elements, not that big.

The culprit is the do.call(rbind, ...) approach for large lists of sf objects; see r-spatial/sf#798. I'm working on a workaround: basically, I'll delay the creation of the sf object as much as possible.
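
The idea of delaying the spatial conversion can be sketched as follows. This is an illustration under assumptions, not the actual fix: the mock responses and the lon/lat column names are invented, and the sf call is left as a comment so that the example runs without sf installed.

```r
# Mock API responses: one small plain data frame per node
responses <- lapply(1:1000, function(i) {
  data.frame(id = i, lon = -70 + i / 1000, lat = 45, stringsAsFactors = FALSE)
})

# Bind the plain data frames once, before any spatial class is attached
nodes <- do.call(rbind, responses)

# Only then build the spatial object, a single time (requires sf):
# nodes_sf <- sf::st_as_sf(nodes, coords = c("lon", "lat"), crs = 4326)
dim(nodes)
```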

Goodpractice output

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 90% of code lines are covered by test cases.

    R/get_citations.R:15:NA
    R/get_citations.R:21:NA
    R/get_citations.R:27:NA
    R/get_collection.R:88:NA
    R/get_collection.R:91:NA
    ... and 23 more lines

  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    inst/doc/rmangal.R:26:1
    R/as.igraph.R:40:1
    R/get_network_by_id.R:21:1
    R/get_network_by_id.R:34:1
    R/search_datasets.R:46:1
    ... and 15 more lines

  ✖ avoid sapply(), it is not type safe. It might return a
    vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/search_taxa.R:36:12
    R/search_taxa.R:38:19

  ✖ checking tests ... Running 'testthat.R' ERROR Running the
    tests in 'tests/testthat.R' failed. Last 13 lines of output: 6:
    resp_to_spatial(get_singletons(endpoints()$network, ids = id)$body)
    7: get_singletons(endpoints()$network, ids = id) 8: httr::GET(url,
    config = httr::add_headers(`Content-type` = "application/json"),
    ua, ...) 9: request_perform(req, hu$handle$handle) 10:
    request_fetch(req$output, req$url, handle) 11:
    request_fetch.write_memory(req$output, req$url, handle) 12:
    curl::curl_fetch_memory(url, handle = handle) ══ testthat results
    ═══════════════════════════════════════════════════════════ OK: 46
    SKIPPED: 0 FAILED: 1 1. Error: (unknown) (@test-as.igraph.R#4)
    Error: testthat unit tests failed Execution halted
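
The sapply() remark in the output above can be illustrated with a short base R sketch; the empty input is chosen only because it makes the difference visible.

```r
empty <- list()

# sapply() picks its return type from the data: here it gives back a list
class(sapply(empty, length))

# vapply() is told the type up front and keeps it, even for empty input
class(vapply(empty, length, integer(1)))
```

Code that later assumes an integer vector would therefore behave differently with sapply() when a search returns nothing.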

Package cleanup

  • Set previous Gab version on a branch (in order to not break mangal-wg/mangal-datasets), and set mangal_dev_v2 as master
  • Move generic API function (GET) to ZZZ.r to improve user experience

search_datasets() purrr:map_() strange behavior

Hi,

I've just noticed a curious behavior of search_datasets(). When I use purrr::map_*() functions, the code runs much more slowly; see the result of this benchmark below:

### Data

dataset_name <- c("fautin_1993", "ricciardi_2010", "edwards_1982")
 
#### First benchmark
tini <- Sys.time()

res_test_1 <- dataset_name %>%
  map_dfr(~search_datasets(search = x))
 
tfin <- Sys.time()

tfin - tini

Time difference of 3.554202 mins

### Second benchmark
tini <- Sys.time()

res_test_2 <- search_datasets(dataset_name[1])

Found 1 dataset(s) for query: fautin_1993

for(i in 2:length(dataset_name)){
  res_test_2 <- bind_rows(res_test_2, search_datasets(dataset_name[i]))
}

Found 1 dataset(s) for query: ricciardi_2010
Found 1 dataset(s) for query: edwards_1982

tfin <- Sys.time()

tfin - tini

Time difference of 2.091179 secs

I can't explain this huge difference.
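
One detail that may be worth checking, offered only as a guess: in the first benchmark the formula lambda refers to `x`, whereas purrr passes the current element as `.x`. A bare `x` is looked up outside the lambda, so every call would receive the same outer value. Whether this accounts for the timing is not established. The sketch below uses an invented outer `x` and requires purrr.

```r
library(purrr)

# An object named `x` that happens to exist outside the lambda (assumption)
x <- "value found outside the lambda"
queries <- c("fautin_1993", "ricciardi_2010")

# Every call sees the outer `x`, not the current element
map_chr(queries, ~ x)

# Every call sees the current element
map_chr(queries, ~ .x)
```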

add `verbose` argument

message() is called in many functions, I think it would make sense to add an argument verbose to suppress them if required.
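
A minimal sketch of what such an argument could look like is shown below. search_demo() and its toy data are hypothetical and only stand in for a real search function.

```r
# Hypothetical search function with a `verbose` switch
search_demo <- function(query, verbose = TRUE) {
  hits <- grep(query, c("lagoon_a", "lagoon_b", "reef"), value = TRUE)
  if (verbose) message("Found ", length(hits), " datasets")
  hits
}

search_demo("lagoon")                   # prints the message
search_demo("lagoon", verbose = FALSE)  # stays silent

# Until such an argument exists, callers can silence any function this way:
suppressMessages(search_demo("lagoon"))
```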

Force coercion of each entry

Make sure that every entry is of the correct type: use as.character() or as.integer() inside each POST_*() function.
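
One way to centralise this coercion is sketched below. coerce_fields() and the field names are hypothetical; the real POST_*() functions may need a different set of fields and types.

```r
# Hypothetical helper: coerce named fields of a record to declared types.
# `types` maps each field name to a coercion function.
coerce_fields <- function(record, types) {
  for (field in names(types)) {
    record[[field]] <- types[[field]](record[[field]])
  }
  record
}

record <- list(name = factor("kemp_1977"), ref_id = "12")
clean <- coerce_fields(
  record,
  list(name = as.character, ref_id = as.integer)
)
str(clean)
```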

Error `list_datasets()` if some references field are NULL

Hi,

Maybe there were some recent changes to the db architecture, because list_datasets() is not working anymore.

When I tried to download a dataset (e.g. kemp_1977) I got this:

> list_datasets("kemp_1977")

Found 1 dataset(s) for keywork: kemp_1977
Error: All columns in a tibble must be 1d or 2d objects:

  • Column doi is NULL
  • Column jstor is NULL
  • Column pmid is NULL
  • Column paper_url is NULL
    Call rlang::last_error() to see a backtrace

See the return of the db below.
[Screenshot 2019-03-28 at 09.30.11]
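
A possible way around NULL fields, sketched under the assumption that the reference comes back as a list in which missing fields are NULL: replace each NULL with NA before building the table. null_to_na() is a hypothetical helper and the values in `ref` are invented.

```r
# Hypothetical helper: replace NULL fields with NA so that each field
# becomes a length-one column
null_to_na <- function(x) {
  lapply(x, function(el) if (is.null(el)) NA else el)
}

# Mock reference record with the NULL fields named in the error above
ref <- list(id = 3L, doi = NULL, jstor = NULL, pmid = NULL, paper_url = NULL)

ref_df <- as.data.frame(null_to_na(ref))
ref_df
```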

print edges errors after coercion done in as.igraph()

R>  insects_networks <- get_collection(search_networks(query='insect%'))
Found 14 networks                                                            
R> insects_network <- insects_networks[[1]]                                  
R>  ig_network <- as.igraph(insects_network)
R> ig_network
IGRAPH eac6924 DN-- 29 38 -- 
+ attr: name (v/c), original_name (v/c), node_level (v/c), network_id
| (v/n), taxonomy_id (v/n), created_at (v/c), updated_at (v/c),
| taxonomy.id (v/n), taxonomy.name (v/c), taxonomy.ncbi (v/n),
| taxonomy.tsn (v/n), taxonomy.eol (v/n), taxonomy.bold (v/n),
| taxonomy.gbif (v/n), taxonomy.col (v/c), taxonomy.rank (v/c),
| taxonomy.created_at (v/c), taxonomy.updated_at (v/c), taxonomy (v/l),
| date (e/c), direction (e/c), type (e/c), method (e/c), attr_id (e/n),
| value (e/n), public (e/l), network_id (e/n), created_at (e/c),
| updated_at (e/c), attribute.id (e/n), attribute.name (e/c),
| attribute.description (e/c), attribute.unit (e/c),
| attribute.created_at (e/c)
+ edges from eac6924 (vertex names):
Error in seq_len(no) : argument must be coercible to non-negative integer

expand mgNetwork with traits

Mangal db structure offers the opportunity to store traits taken on nodes (nodes table) and/or to store generic traits on taxa (taxonomy table). I plan to add a logical argument expand_trait (default: FALSE) to the function get_network_by_id() to add traits to mgNetwork (network$traits).

merge in `search_interactions()`

So far, the approach is to add data frames as columns, which I think is confusing for users.
This is the case for the networks column and for the additional columns appended when expand_node = TRUE. For the latter I propose something like:

if (expand_node) {
    tmp <- as.data.frame(get_singletons(endpoints()$node,
        interactions$node_from))
    names(tmp) <- paste0("node_from_", names(tmp))
    interactions <- cbind(tmp, interactions)
    #
    tmp <- as.data.frame(get_singletons(endpoints()$node,
        interactions$node_to))
    names(tmp) <- paste0("node_to_", names(tmp))
    interactions <- cbind(tmp, interactions)
 }

But there may be a better option.

pkgdown: issue on inherit methods.

Hard to fix this one... don't know what it is.
@KevCaz , could you have a look if you have a moment.
Thanks !

> pkgdown::build_article("rmangal")
Reading 'vignettes/rmangal.Rmd'
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function 'mapView' for signature '"tbl_df"'

Returning API error messages

We should run a check to make sure that the POST functions properly cover all the types of error messages that the API can return:

  • POST_trait
  • POST_line
  • POST_table
  • ...

output argument in `search_datasets()`

Are raw and list useless in this context?

R> res4 <- search_datasets(query = "2011", output = "list")
Found 1 datasets
Error in datasets[i, "id"] : no 'dimnames' attribute for array
R> res4 <- search_datasets(query = "2011", output = "raw")
Found 1 datasets
Error in datasets[i, "id"] : no 'dimnames' attribute for array

`list_datasets` from known datasets names

Hi guys,

I am testing the new functions of rmangal. Right now I am trying to retrieve datasets whose names I know. According to the documentation, list_datasets() can accept arguments that are passed on to the get_gen() function.

So, for example, I want to retrieve the dataset named kemp_1977, and I try this:

list_datasets(query = paste0("name=", "kemp_1977"))

Or I try something like this:

list_datasets(query = list(name = "kemp_1977"))

Each time I get the same error message:

Error in get_gen(endpoints()$dataset, query = list(q = search), ...) :
formal argument "query" matched by multiple actual arguments

Is it a bug or am I doing something wrong?
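
The error can be reproduced with plain functions, which suggests where it comes from: the call shown in the error message already sets `query` and also forwards `...`, so a user-supplied `query` arrives a second time. The sketch below uses hypothetical stand-ins (get_demo(), list_demo()) rather than the real rmangal functions.

```r
# Stand-in for the low-level function: it simply returns its `query`
get_demo <- function(endpoint, query = NULL, ...) query

# Stand-in wrapper that sets `query` itself and also forwards `...`
list_demo <- function(search, ...) {
  get_demo("dataset", query = list(q = search), ...)
}

# Works: the search term is wrapped into the query by the wrapper
list_demo("kemp_1977")

# Fails: `query` travels through `...` and collides with the one set inside
res <- tryCatch(
  list_demo(query = "name=kemp_1977"),
  error = function(e) conditionMessage(e)
)
res
```

Under this reading the search term would be given as the first argument rather than through `query`.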

weird behavior with `sf`

Test commented in #43

   Running the tests in 'tests/testthat.R' failed.
   Last 13 lines of output:
     Did you rename it, without setting st_geometry(obj) <- "newname"?
     1: search_networks(area) at testthat/test-search_networks.R:8
     2: sf::st_transform(polygon, crs = 4326)
     3: st_transform.sf(polygon, crs = 4326)
     4: st_transform(st_geometry(x), crs, ...)
     5: st_geometry(x)
     6: st_geometry.sf(x)
     7: stop("attr(obj, \"sf_column\") does not point to a geometry column.\nDid you rename it, without setting st_geometry(obj) <- \"newname\"?")
     
     โ•โ• testthat results  โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
     OK: 12 SKIPPED: 0 FAILED: 1
     1. Error: (unknown) (@test-search_networks.R#8) 
     
     Error: testthat unit tests failed
     Execution halted

POST_Ref() URL error

The function that sends the ref table does not use the right URL. There is no indication of what the right URL would be.

search_interactions()

Search network interactions based on type:

avail_type <- function() c(
  "competition",
  "amensalism",
  "neutralism",
  "commensalism",
  "mutualism",
  "parasitism",
  "predation",
  "herbivory",
  "symbiosis",
  "scavenger",
  "detritivore",
  "unspecified"
)
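
A search function could validate its input against this list before querying the API. The sketch below uses base R's match.arg(); check_type() is a hypothetical helper, and avail_type() is repeated so that the example is self-contained.

```r
avail_type <- function() c(
  "competition", "amensalism", "neutralism", "commensalism",
  "mutualism", "parasitism", "predation", "herbivory",
  "symbiosis", "scavenger", "detritivore", "unspecified"
)

# Hypothetical validator: returns the matched type or throws an error
check_type <- function(type) match.arg(type, avail_type())

check_type("predation")

# An unknown type is rejected with an error listing the valid choices
tryCatch(check_type("pollination"), error = function(e) conditionMessage(e))
```

Note that match.arg() also accepts unambiguous abbreviations, which may or may not be desirable here.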

Put attribute in lower case

Attributes that are used by the API for the unique constraint should be in lower case.

-> Add tolower(attribute) in the injection functions

search_taxa()

Args

  • query: full-text search on node.original_name (provided by authors) and taxonomy.name (homogenized taxonomy)
  • bold: specific search on taxonomy.bold, tied to the BOLD Systems infrastructure
  • eol: specific search on taxonomy.eol, tied to the Encyclopedia of Life (EOL) infrastructure
  • tsn: specific search on taxonomy.tsn, tied to the ITIS infrastructure

Remove tibble dependency

If I am correct, tibble is only used for its well-thought-out print method. A way of using it without depending on tibble is to set the class of data frames to c("tbl_df", "tbl", "data.frame"); then, if tibble is loaded, tibble's print method will be used! I got the hint from Noam Ross (ropensci/software-review#244).
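
The trick can be sketched in a few lines of base R. as_tbl_like() is a hypothetical helper name; the class vector is the one quoted above.

```r
# Hypothetical helper: tag a data frame with the tibble classes
# without importing tibble
as_tbl_like <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("tbl_df", "tbl", "data.frame")
  df
}

res <- as_tbl_like(data.frame(id = 1:3, name = c("a", "b", "c")))
class(res)

# Uses tibble's print method if tibble is loaded,
# and falls back to print.data.frame() otherwise
print(res)
```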

Update README

  • Travis badges, R dev status, appVeyor status
  • Getting started (install procedure)
  • Authentication mandatory for POST / Procedure to get ORCID token

`mg_network()` 404 error

I had tested the mg_network() function before pull request #24 and it was working fine. Now that I have updated my rmangal version, I get an error when I try to use this function:

mg_network(16) # id of the kemp_1977 dataset

API request failed: [404]
FALSE

It seems that the function is stuck in an infinite loop or something similar, because I need to abort the operation by hand. It is not a server issue, because I can access this network with my own function clementviolet/get_mangal/, and in Postman too.

Vignettes

List of comprehensive vignettes

  1. How to list datasets?
  2. How to retrieve networks from one and/or list of datasets?
  3. How to retrieve all/specific networks?
  4. Search by taxonomic name, and retrieve networks that include this taxon
  5. Search by location, using buffers or polygons to retrieve networks.
  6. How to use the ids (bold, tsn, ncbi etc.) to get extra taxonomic information using taxize?

Citation file

Hi everyone!

I am writing my manuscript about my internship, and I would like to cite the rmangal package. I don't know how you want it to be cited, but I am making a proposal below.

@Manual{,
title = {rmangal},
author = {}, 
year = {2019},
note = {R package version 0.6.0},
url = {https://github.com/mangal-wg/rmangal}
}

I intentionally left the author field blank, because I don't know the order of the authors.

`rbind.sf` not exported by `sf`

rbind.sf() is not exported by sf (you can access it with sf:::rbind.sf()) so :

purrr::reduce(purrr::map(responses,"body"), sf::rbind.sf)

(in api_utils.R) won't work:

R> sf::rbind.sf

Error: 'rbind.sf' is not an exported object from 'namespace:sf'

It looks like we can use the default rbind() (as explained in r-spatial/sf#92); in the worst case, we will have to copy and paste the code of the function.
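
A minimal sketch of the rbind() route is shown below, using plain data frames as stand-ins for the response bodies. The assumption is that, for sf objects, calling the rbind() generic lets R find the method registered by sf, so the unexported function never has to be named.

```r
# Mock responses; in the package the bodies would be sf objects
responses <- list(
  list(body = data.frame(id = 1L, name = "a")),
  list(body = data.frame(id = 2L, name = "b"))
)

# Base R equivalent of purrr::map(responses, "body")
bodies <- lapply(responses, `[[`, "body")

# Call the generic and let dispatch pick the method for the objects' class
combined <- Reduce(rbind, bodies)
combined
```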

get_network_by_id() enhancements

  1. If the geom.type is polygon, then only the first pair of coordinates is returned.
    See get_network_by_id(1101)$network$geom.coordinates

  2. If the id doesn't exist, the function returns a 404 error, which is right, but it doesn't stop the subsequent API calls for nodes, edges, etc.
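
The second point can be sketched as an early exit. Everything below is hypothetical: fetch_demo() is a mock that stands in for an API call, and get_network_demo() only shows the control flow, not the real get_network_by_id().

```r
# Mock fetcher standing in for an API call: returns a status code and a body
fetch_demo <- function(endpoint, id) {
  known <- c(1L, 2L)
  if (id %in% known) {
    list(status = 200L, body = paste(endpoint, id))
  } else {
    list(status = 404L, body = NULL)
  }
}

get_network_demo <- function(id) {
  net <- fetch_demo("network", id)
  # Stop here so that no further requests are sent for a missing network
  if (net$status != 200L) {
    stop("network ", id, " not found (status ", net$status, ")", call. = FALSE)
  }
  list(
    network = net$body,
    nodes = fetch_demo("node", id)$body,
    edges = fetch_demo("interaction", id)$body
  )
}

get_network_demo(1L)
tryCatch(get_network_demo(99L), error = function(e) conditionMessage(e))
```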

Javascript environment table

Change the type of variable from FLOAT to STRING, so we can put qualitative environmental information as well.
