rfishbase's Introduction

rfishbase

Welcome to rfishbase 5! This is the fourth rewrite of the original rfishbase package described in Boettiger et al. (2012).

This release is another streamlined redesign that follows new capabilities for data hosting and access. It relies on HuggingFace datasets hosting for both data and metadata, using parquet and schema.org.

Data access is simplified to use the HuggingFace datasets API instead of the previous contentid-based resolution. This allows metadata to be defined directly alongside the data on the hosting platform, independent of the R package.

A simplified access protocol relies on duckdbfs for direct reads of tables. Several functions previously used only to manage connections are now deprecated or removed, along with a significant number of dependencies.

Core use still centers around the same package API using the fb_tbl() function. Legacy helper functions for common tables, like species(), are still accessible and can still optionally filter by species name where appropriate. As before, loading the full tables and subsetting manually is still recommended.

Historic helper functions like load_taxa() (combining the taxonomic classification from Species, Genus, Family and Order tables), validate_names(), and common_to_sci() and sci_to_common() should be in working order, all using table-based outputs.

  • rfishbase 1.0 relied on parsing of XML pages served directly from Fishbase.org.
  • rfishbase 2.0 relied on calls to a ruby-based API, fishbaseapi, that provided access to SQL snapshots of about 20 of the more popular tables in FishBase or SeaLifeBase.
  • rfishbase 3.0 side-stepped the API by making queries which directly downloaded compressed csv tables from a static web host. This substantially improved performance and reliability, particularly for large queries. The release largely remained backwards compatible with 2.0, and added more tables.
  • rfishbase 4.0 extends the static model and interface. Static tables are distributed in parquet and accessed through a provenance-based identifier. While old functions are retained, a new interface is introduced to provide easy access to all fishbase tables.

We welcome any feedback, issues or questions that users may encounter through our issues tracker on GitHub: https://github.com/ropensci/rfishbase/issues

Installation

remotes::install_github("ropensci/rfishbase")
library("rfishbase")
library("dplyr") # convenient but not required

Getting started

Generic table interface

All fishbase tables can be accessed by name using the fb_tbl() function:

fb_tbl("ecosystem")
# A tibble: 160,334 × 18
   autoctr E_CODE EcosystemRefno Speccode Stockcode Status CurrentPresence
     <int>  <int>          <int>    <int>     <int> <chr>  <chr>          
 1       1      1          50628      549       565 native Present        
 2       2      1            189      552       568 native Present        
 3       3      1            189      554       570 native Present        
 4       4      1          79732      873       889 native Present        
 5       5      1           5217      948       964 native Present        
 6       7      1          39852      956       972 native Present        
 7       8      1          39852      957       973 native Present        
 8       9      1          39852      958       974 native Present        
 9      10      1            188     1526      1719 native Present        
10      11      1            188     1626      1819 native Present        
# ℹ 160,324 more rows
# ℹ 11 more variables: Abundance <chr>, LifeStage <chr>, Remarks <chr>,
#   Entered <int>, Dateentered <dttm>, Modified <int>, Datemodified <dttm>,
#   Expert <int>, Datechecked <dttm>, WebURL <chr>, TS <dttm>

Use fb_tables() to list all the table names (specify sealifebase if desired). Careful, there are a lot of them! The FishBase databases have grown a lot over the decades and were not intended to be used directly by most end-users, so you may have considerable work to determine what's what. Keep in mind that many variables can be estimated in different ways (e.g. trophic level), and thus may report different values in different tables. Also note that species name (or SpecCode) is not always the primary key for a table: many tables are specific to stocks or even individual samples, and some tables are reference lists that are not species focused at all, but meant to be joined to other tables (faoareas, etc). Compare tables against what you see on fishbase.org, or ask on our issues forum for advice!

fish <- c("Oreochromis niloticus", "Salmo trutta")

fb_tbl("species") %>% 
  mutate(sci_name = paste(Genus, Species)) %>%
  filter(sci_name %in% fish) %>% 
  select(sci_name, FBname, Length)
# A tibble: 2 × 3
  sci_name              FBname       Length
  <chr>                 <chr>         <dbl>
1 Oreochromis niloticus Nile tilapia     60
2 Salmo trutta          Sea trout       140

In most tables, species are identified by SpecCode (as per best practices) rather than scientific names. Multiple tables can be joined on the SpecCode to more fully describe a species.
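As a rough sketch of what joining on SpecCode looks like, the snippet below uses two small hand-made data frames in place of real FishBase tables, so it runs without a download. The table contents, the StockCode values and the FoodTroph numbers are made up for illustration; only the idea of using SpecCode as the join key comes from the text above.

```r
# Toy stand-ins for two FishBase tables; all values are illustrative only.
species_tbl <- data.frame(
  SpecCode = c(2L, 238L, 69L),
  Genus    = c("Oreochromis", "Salmo", "Gadus"),
  Species  = c("niloticus", "trutta", "morhua"),
  stringsAsFactors = FALSE
)

# A stock-level table: SpecCode is NOT unique here (two stocks for SpecCode 2).
ecology_tbl <- data.frame(
  SpecCode  = c(2L, 2L, 238L),
  StockCode = c(1L, 3125L, 248L),
  FoodTroph = c(2.0, 2.1, 3.4)
)

# Left join on SpecCode: keep every species, even those without ecology rows.
joined <- merge(species_tbl, ecology_tbl, by = "SpecCode", all.x = TRUE)
print(joined)
```

Because one species can have several stocks, the joined table has more rows than species_tbl, and species without a match are kept with NA values. The same pattern applies with dplyr::left_join(x, y, by = "SpecCode") on tables returned by fb_tbl().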

To filter species by taxonomic names, use the taxa table from load_taxa(), which provides a joined table of taxonomy from subspecies up through Class, along with the corresponding FishBase taxon id codes. Here is an example workflow joining two of the spawning tables and filtering to the grouper family, Epinephelidae:

library(rfishbase)
library(dplyr)

## Get the whole spawning and spawn agg table, joined together:
spawn <- left_join(fb_tbl("spawning"),  
                   fb_tbl("spawnagg"), 
                   relationship = "many-to-many")

# Filter taxa down to the desired species
groupers <- load_taxa() |> filter(Family == "Epinephelidae")

## A "filtering join" (inner join) 
spawn |> inner_join(groupers)
# A tibble: 227 × 95
   autoctr StockCode SpecCode SpawningRefNo SourceRef C_Code E_CODE
     <int>     <int>    <int>         <int>     <int> <chr>   <int>
 1      18        18       12          5222      3092 528A       NA
 2      19        18       12         26409      1784 388       145
 3      20        20       14         26409        NA 192        NA
 4    9147        20       14        118249    118249 826E        8
 5      22        21       15          5241      5241 630        NA
 6      23        21       15          5241      6484 388        NA
 7      24        21       15          5241      3095 060        NA
 8      24        21       15          5241      3095 060        NA
 9      24        21       15          5241      3095 060        NA
10      24        21       15          5241      3095 060        NA
# ℹ 217 more rows
# ℹ 88 more variables: SpawningGround <chr>, Spawningarea <chr>, Jan <dbl>,
#   Feb <dbl>, Mar <dbl>, Apr <dbl>, May <dbl>, Jun <dbl>, Jul <dbl>,
#   Aug <dbl>, Sep <dbl>, Oct <dbl>, Nov <dbl>, Dec <dbl>, GSI <int>,
#   PercentFemales <int>, TempLow <dbl>, TempHigh <dbl>, SexRatiomid <dbl>,
#   SexRmodRef <int>, FecundityMin <int>, WeightMin <dbl>,
#   LengthFecunMin <dbl>, LengthTypeFecMin <chr>, FecundityRef <int>, …

Species Names

Always keep in mind that taxonomy is a dynamic concept. Species can be split or lumped based on new evidence, and naming authorities can disagree over which name is an ‘accepted name’ or ‘synonym’ for any given species. When providing your own list of species names, consider first checking that those names are “valid” in the current taxonomy established by FishBase:

validate_names("Abramites ternetzi")
[1] "Abramites hypselonotus"

rfishbase can also provide a table of synonyms(), a table of common_names() in multiple languages, and conversions with common_to_sci() or sci_to_common():

common_to_sci(c("Bicolor cleaner wrasse", "humphead parrotfish"), Language="English")
# A tibble: 5 × 4
  Species                ComName                     Language SpecCode
  <chr>                  <chr>                       <chr>       <int>
1 Labroides bicolor      Bicolor cleaner wrasse      English      5650
2 Chlorurus cyanescens   Blue humphead parrotfish    English      7909
3 Bolbometopon muricatum Green humphead parrotfish   English      5537
4 Bolbometopon muricatum Humphead parrotfish         English      5537
5 Chlorurus oedema       Uniform humphead parrotfish English      8394

Note that the results are returned as a table, potentially indicating other common names for the same species, as well as potentially different species that match the provided common name! Please always be careful with names, and use unique SpecCodes to refer to unique species.
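One way to act on this advice is to narrow the returned table to the rows you actually meant and then carry the SpecCode forward. The sketch below rebuilds the result shown above as a plain data frame (so it runs offline) and keeps only exact, case-insensitive matches on the common name; the exact-match rule is an assumption made for illustration, not something the package does for you.

```r
# The common_to_sci() result shown above, rebuilt by hand as a data frame.
matches <- data.frame(
  Species  = c("Labroides bicolor", "Chlorurus cyanescens",
               "Bolbometopon muricatum", "Bolbometopon muricatum",
               "Chlorurus oedema"),
  ComName  = c("Bicolor cleaner wrasse", "Blue humphead parrotfish",
               "Green humphead parrotfish", "Humphead parrotfish",
               "Uniform humphead parrotfish"),
  SpecCode = c(5650L, 7909L, 5537L, 5537L, 8394L),
  stringsAsFactors = FALSE
)

wanted <- c("Bicolor cleaner wrasse", "humphead parrotfish")

# Keep only rows whose common name equals a requested name (ignoring case).
exact <- matches[tolower(matches$ComName) %in% tolower(wanted), ]

# Refer to the species by SpecCode from here on.
print(exact$SpecCode)
```

Partial matches such as "Blue humphead parrotfish" are dropped, leaving one SpecCode per requested name in this example.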

SeaLifeBase

SeaLifeBase.org is maintained by the same organization and largely parallels the database structure of FishBase. As such, almost all rfishbase functions can instead be instructed to address the SeaLifeBase database, as in this example:

fb_tbl("species", "sealifebase")
# A tibble: 102,464 × 111
   SpecCode Genus   Species Author SpeciesRefNo FBname FamCode Subfamily GenCode
      <int> <chr>   <chr>   <chr>         <int> <chr>    <int> <chr>       <int>
 1    57969 Abdopus horrid… (D'Or…        96968 Red S…    1890 Octopodi…   24384
 2    57836 Abdopus tenebr… (Smit…           19 <NA>      1890 Octopodi…   24384
 3    57142 Abdopus tongan… (Hoyl…           19 <NA>      1890 Octopodi…   24384
 4  2381155 Abdopus undula… Huffa…        84307 <NA>      1890 <NA>        24384
 5    14647 Abebai… troglo… Vande…           19 <NA>       572 <NA>         9260
 6   165283 Aberom… muranoi Baces…       104101 <NA>       616 <NA>        33537
 7   140720 Aberra… banyul… Macki…        85340 <NA>       174 <NA>         9262
 8    40346 Aberra… enigma… unspe…           19 <NA>       174 <NA>         9262
 9    20199 Aberra… aberra… (Barn…           19 <NA>       308 <NA>         9263
10    93706 Aberro… verruc… Kasat…         3696 <NA>       922 <NA>        17969
# ℹ 102,454 more rows
# ℹ 102 more variables: TaxIssue <int>, Remark <chr>, PicPreferredName <chr>,
#   PicPreferredNameM <chr>, PicPreferredNameF <chr>, PicPreferredNameJ <chr>,
#   Source <chr>, AuthorRef <int>, SubGenCode <int>, Fresh <int>, Brack <int>,
#   Saltwater <int>, Land <int>, BodyShapeI <chr>, DemersPelag <chr>,
#   Amphibious <chr>, AmphibiousRef <int>, AnaCat <chr>, MigratRef <int>,
#   DepthRangeShallow <int>, DepthRangeDeep <int>, DepthRangeRef <int>, …

Versions and importing all tables

By default, tables are downloaded the first time they are used. rfishbase defaults to downloading the latest available snapshot; be aware that the most recent snapshot may be months behind the latest data on fishbase.org. Check available releases:

available_releases()
[1] "19.04" "21.06" "23.01" "23.05" "24.07"

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

rfishbase's People

Contributors

cboettig, harryganz, karthik, oharac, philipp-neubauer, raymondben, sckott

rfishbase's Issues

can't get getTrophicLevel() to work

Hi,
I would like to extrapolate trophic levels for a subset of fish.data:

getTrophicLevel(fish.data[1:10]) # works

reef <- which_fish("reef", "habitat", fish.data)
reef_troph<-getTrophicLevel(fish.data[reef]) # does not work

Error in xpathApply(summaryPage, "//*[contains(@href, '/Ecology/FishEcologySummary.php')][1]", :
subscript out of bounds

What am I doing wrong?

Cheers,
Cami

List categories that are included within "trophic"

My aim is to obtain data on species richness and biomass for herbivorous fishes at a global extent. The first step is to determine which fish within the database are herbivores. To do this I am using the which_fish function, e.g.,

data(fishbase)
herb <- which_fish("herbivore","trophic")

I am sure I am missing species because they are categorized as something other than "herbivore". So I was trying to get a list of all the existing categories within "trophic" so I can make sure I include all the relevant search terms in addition to herbivore (e.g., "algae", "plant", "seaweed"...).

I wondered about using

unique(trophic)

but cannot seem to extract the "trophic" data from the list.

I hope that makes sense.
Thanks
Sal

SeaLifeBase?

Any chance of doing something like rfishbase, but for SeaLifeBase.org? They are similar websites.

Cannot download from GIT

Hello! I am trying to download the most recent version of rfishbase in order to use the length_weight command. I used this code:

library("devtools", lib.loc="/Library/Frameworks/R.framework/Versions/3.0/Resources/library")
devtools::install_github("ropensci/rfishbase")

Which failed (see below)! Why did this happen?

Downloading github repo ropensci/rfishbase@master
Installing rfishbase
Installing dependencies for rfishbase:
dplyr, lazyeval, tidyr

There is a binary version available (and will be installed) but the source version is later:
binary source
dplyr 0.2 0.4.1

trying URL 'http://cran.rstudio.com/bin/macosx/contrib/3.0/dplyr_0.2.tgz'
Content type 'application/x-gzip' length 2991757 bytes (2.9 Mb)

opened URL

downloaded 2.9 Mb

The downloaded binary packages are in
/var/folders/3_/f4qx5lb92235mm8_spksxfdw0000gp/T//RtmpJRG46J/downloaded_packages
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL
'/private/var/folders/3_/f4qx5lb92235mm8_spksxfdw0000gp/T/RtmpJRG46J/devtools47d1b77d660/ropensci-rfishbase-503f476'
--library='/Library/Frameworks/R.framework/Versions/3.0/Resources/library' --install-tests

ERROR: dependencies ‘lazyeval’, ‘tidyr’ are not available for package ‘rfishbase’

  • removing ‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library/rfishbase’
    Error: Command failed (1)
    In addition: Warning message:
    packages ‘lazyeval’, ‘tidyr’ are not available (for R version 3.0.2)

Access to 'Predators' table

Another request... I'm wondering if it would be relatively simple to add access to the 'Predators' table on Fishbase?

Error: could not find function "validate_names"

I installed rfishbase from github using devtools under R 3.1.2 using OSX Mavericks. When I run validate_names I get the following error:

Error: could not find function "validate_names"

I am testing it on a single species: validate_names("Abramis brama"). Other functions I have tried like ecology("Abramis brama") are working just fine.

I'm assuming this is similar to the issue here: #32
since I just tried

rfishbase:::validate_names("Abramis brama")

and that seemed to work. Is there a new function that does something similar?

Thanks!

familySearch Error: could not find function "familySearch"

Hey,

I installed rfishbase from github using devtools under R 3.1.2 using OSX Mavericks. When I run familySearch or habitatSearch I get the following error: Error: could not find function "familySearch". I am using the examples from the tutorial (https://github.com/ropensci/rfishbase/blob/master/inst/doc/rfishbase/rfishbase_github.md).

My install commands were:
library(devtools)
install_github("rfishbase", "ropensci")

Then I ran with these:
rm(list=ls())
require(rfishbase)
require(XML)
require(RCurl)

We installed from the CRAN packages as well with the same results.

?morphology and column names

Hi @cboettig,
When I type in the following I get this error.

?morphology
Warning messages:
1: In fetch(key) : internal error -3 in R_decompress1
2: In strsplit(msg, "\n") : input string 1 is invalid in this locale

Any thoughts? I want to find the Max total length and common length column names.
Thanks,
K

Min/Max temp?

In rfishbase 2.0, is there a function to extract the min and max temperatures per species? This info is found under Environment / Climate / Range section on fishbase - similar to requests in issue #19. In the previous version, getEnviroClimateRange() spit it out along with range info but I'm not seeing the same functionality in either species(), ecology() or distribution(). Not sure if I'm missing something but it would be nice to have min/max and preferred min/max temps if available per species (similar to how Depth is provided)

Review of rfishbase

Fishbase.org serves as a source for user-generated categorical and quantitative data on fish species, providing information on both species-specific summary pages and through linked data tables.
The rfishbase package provides an easy and intuitive way to access and collate these data, which would otherwise require the user to either manually generate their own data tables or navigate to linked pages to individually download them. rfishbase's species function quickly assembles these data. The package provides functions that enabled comparisons across multiple species and allowed me to tailor my data tables to variables of interest. The documentation provides straightforward descriptions of the main functions with examples.

A major strength of rfishbase is that it does the data wrangling for you - tables are automatically formatted using the plyr and tidyr packages. As a result, I found it easy to visualize my data as a nice ggplot graphic.

As mentioned in the package documentation, there are hidden data tables that aren't readily apparent from calling the species() command. I found myself wishing I had a function to list the accessible hidden data tables. e.g., I was able to use maturity() and reproduction() to access associated data but got an error trying to access the online recruitment data with recruitment(). Also on my wish list would be a way to list locations for the StockCode values that I found in several of the hidden files. All in all, thanks for making a really great and useful package! I'm looking forward to using rfishbase in my research.

Lauren Yamane

getGrowth?

I would be thrilled to pull out k, Linf, and what type of length measure it is (TL, SL, FL). But I will say it has been very fun exploring the length, weight, and age data so far!

species giving odd subscript error

I attempted to walk through the instructions in the readme and got the following error:

> species(fish[1:2])
Error in `[[.default`(col, i, exact = exact) : subscript out of bounds

access trophic level

Listed on its own page in a table, e.g. http://fishbase.org/Ecology/FishEcologySummary.php?StockCode=1180&GenusName=Sparus&SpeciesName=aurata

and on the html summary pages, e.g. http://www.fishbase.org/Summary/speciesSummary.php?ID=347&AT=nile+perch

this information is sadly not available in the summary XML...
http://www.fishbase.us/maintenance/FB/showXML.php?identifier=FB-347&ProviderDbase=03

As this is such a common request maybe I should just attempt scraping this information.

Error using getEnviroClimateRange()

Hi Carl-
I'm trying to use getEnviroClimateRange() and it is giving me an error:

> data(fishbase)
> test = getEnviroClimateRange(fish.data[3])
Error in which(value == defs) : 
  argument "code" is missing, with no default
In addition: Warning message:
XML content does not seem to be XML: '' 

Oddly, it worked once for me and then has not since. I'll take a look at the function to see if I can figure out what is going on but would love help if possible.
I'm using R 3.1.0 and loaded rfisheries from GitHub yesterday. Let me know if you want more info

Package onboarding review

General comments

Great package and project!

  • I have very little to say about the codebase itself - it's well designed, and clever in its modularity and extensibility. I have more comments about the documentation below, but I realized that several of them pertain to the fact that function documentation is somewhat cookie-cutter from function to function, as you use roxygen2 inheritance rules to re-use parts of the documentation. Some functions could use more specific documentation, which may require breaking out of this templated approach.
  • Is there any way to query the database aside from just providing species names or codes? In general I found myself wondering if I could query the database in other ways. For instance, I'd like to get all species that had a certain diet or met certain morphometric criteria. There are a number of different views on the fishbase website that allow one to do this. Of course, this would require a lot more functions, but perhaps it's worth it to build in some more general way to access the SQL database through the API?

Tests

  • Lots of tests and excellent coverage! Passes, though I had to bypass needs_api() because the server was returning "mysql_server_up": false.
  • ping() could use a more verbose name (ping_fishbase_server()?).

README.md / vignette

  • I dislike the use of "Species list" because it is easy to confuse with an R list. Maybe you can use the term "vector" from the start?
  • Some of the output in README.md is prepended with a funny # output:. This probably has to do with a custom knitr setup.
  • Nice warning for misapplied names!
  • Vignette should document the list of all taxa

Functions

  • Maintain naming consistency: speciesnames --> species_names, commonnames --> common_names
  • commonnames returns ? in place of Mandarin Chinese names. This appears to occur at the server level, though not on the fishbase website. Can the server not serve unicode? If so, I would recommend some notice in the documentation of this current limitation, as it may preclude use by users of non-Roman languages.
  • Several tables return a reference field (e.g., DietRefNo from diet()), but there is no function (or API functionality) to retrieve those references. It would be useful to be able to look these up.

Code of conduct

Documentation

  • I'm not sure of the target user, but many might encounter FishBase through rfishbase for the first time. The README/vignette/help files could use some links to relevant fishbase documentation.
  • Some examples (e.g., speed("Oreochromis niloticus"), also ration()) return zero results. Also the species_list() example in the vignette returns zero results. Examples should be queries that return results.
  • All function documentation should have a link to the relevant field description pages in the Fishbase manual for the fields parameter, or at least standard guidance on how to find the relevant fields in the Fishbase manual. I realize that this documentation is incomplete, but at least many of the fields are described in detail. Another useful approach would be a function (or function option) to return the available fields of a table.
  • Similarly, is there a page that lists available languages that could be linked to in the docs for common_to_sci()?
  • species() and species_info() do the same thing. The help file is a bit confusing as the title, usage, and examples alternate between using each. I recommend eliminating/deprecating one usage and making the help file consistent. The species_fields convenience object also should be documented in here, or its file should be linked to.

occurrence broken (api?)

Might be the same api issue as before

occurrence("Oreochromis niloticus")
Source: local data frame [0 x 0]

Warning messages:
1: In check_and_parse(resp) : server error: (500) Internal Server Error
2: In error_checks(parsed, resp = resp) :
  server error for query http://fishbase.ropensci.org/occurrence?SpecCode=2&limit=200&fields=

Request: Common Name

It'd be nice to grab the common name (i.e., "English name"). So for Gadus macrocephalus, I could get "Pacific Cod".

validate_names returns array of suggestions errantly

I am using validate_names on a list of over 1000 scientific names, which are mostly correct, but may have small spelling errors occasionally. I've included a subset of the species below.

> species = c("Ablennes hians", "Acanthopagrus schlegeli", "Acanthopagrus berda", "Auxis thazard", "Auxis rochei")

Of these species, I can (after-the-fact) verify that A. schlegeli, A. thazard, and A. rochei are misspelled. However, when I run validate_names, several interesting things happen...

> validate_names(species)

[[1]]
[1] "Ablennes hians"

[[2]]
[1] "Acanthopagrus schlegelii schlegelii"

[[3]]
[1] "Acanthopagrus berda"     "Acanthopagrus vagus"     "Acanthopagrus pacificus"

[[4]]
[1] "Auxis rochei rochei"   "Auxis thazard thazard"

[[5]]
[1] "Auxis rochei rochei"

The function returns an array of arrays. Within that...

  1. A. hians had no spelling issues, and was returned correctly;
  2. A. schlegelii had a spelling issue and returned fixed;
  3. A. berda had no spelling issues, but an array of alternatives were also returned;
  4. A. thazard was misspelled, and was returned fixed, but placed second in an array of alternatives;
  5. A. rochei was misspelled, and was returned fixed.

It seems like there are two issues here. The first is that even when there is an exact string match, an array of suggestions is returned (#3), when I would have thought that just the original string would be returned. Is this intentional? Second, when a species name does need to be corrected, the closest match isn't always returned first in the array of suggestions (#4).

For me, ideally only the best match would be returned, and at the very least the best match would be returned first in an array of alternatives. This is because, now, if I use this array of arrays as an input to another function, like species_info, only the first entry of each array is used as the species name parameter. This leads to duplication issues, and left out A. thazard entirely, as you can see below.

> speciesList = validate_names(species)
> species_info(speciesList, fields = c("SpecCode", "Genus", "Species", "FBname"))

Source: local data frame [5 x 4]

  SpecCode         Genus               Species             FBname
1      972      Ablennes                 hians    Flat needlefish
2     6531 Acanthopagrus schlegelii schlegelii Blackhead seabream
3     5526 Acanthopagrus                 berda  Goldsilk seabream
4       93         Auxis         rochei rochei        Bullet tuna
5       93         Auxis         rochei rochei        Bullet tuna
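Until the function itself changes, a user-side workaround is to reduce each vector of suggestions to a single name before passing it on. The sketch below is one possible approach, not package behaviour: pick_best() is a hypothetical helper that prefers an exact match, then a suggestion that starts with the queried name, and otherwise falls back to the smallest edit distance. The inputs are copied from the output reported above.

```r
# Names as queried and suggestions as reported in this issue.
queried <- c("Ablennes hians", "Acanthopagrus schlegeli",
             "Acanthopagrus berda", "Auxis thazard", "Auxis rochei")
suggestions <- list(
  "Ablennes hians",
  "Acanthopagrus schlegelii schlegelii",
  c("Acanthopagrus berda", "Acanthopagrus vagus", "Acanthopagrus pacificus"),
  c("Auxis rochei rochei", "Auxis thazard thazard"),
  "Auxis rochei rochei"
)

# Hypothetical helper: choose one candidate per queried name.
pick_best <- function(query, candidates) {
  if (length(candidates) == 0) return(NA_character_)
  # 1. An exact match wins outright.
  if (query %in% candidates) return(query)
  # 2. Otherwise prefer a candidate that begins with the queried name.
  prefixed <- candidates[startsWith(candidates, query)]
  if (length(prefixed) > 0) return(prefixed[1])
  # 3. Fall back to the candidate with the smallest edit distance.
  candidates[which.min(utils::adist(query, candidates))]
}

best <- mapply(pick_best, queried, suggestions, USE.NAMES = FALSE)
print(best)
```

With these inputs, A. berda resolves to itself and A. thazard resolves to "Auxis thazard thazard" rather than the first suggestion, so each queried name maps to exactly one name downstream.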

More Testing

Hehe, more generally @jebyrnes is clearly demonstrating that I need to add tests for these (shows you how meaningless 99% test coverage can be).

@sckott Ideally we should just have some automated testing of the endpoints on fishbaseapi ropensci/fishbaseapi#40, which would be a more general solution than just testing the R functions.

Migrating from earlier versions of rfishbase

Many of the function calls and return types in rfishbase 0.* were relatively crude, and thus it does not make much sense to mimic them. Instead, we should simply provide a guide to users to map any functionality in the original version into the 2.0 version.

Here's a complete list of functions in the current (v0.2-3) NAMESPACE, along with mappings that exist so far. Will continue to update this as we get further along in rfishbase2.0.

  • findSpecies, fish_names, which_fish(Family/Order/Class = x), getIds are all replaced by species_list()
  • getDepth() is now in species_info() (see fields list)
  • getEnviroClimateRange() basic information is in species_info(), more details in the various distribution functions and in ecology() table.
  • getFaoArea() is now faoareas()
  • getFoodItems is now fooditems()
  • getLengthWeight() is now length_weight()
  • getMetabolism is now oxygen()
  • getPictures() pending ...
  • getPredators() is now predators()
  • getQuantTraits was a rather arbitrary combination of traits, now better allocated. Some of these are returned by the general species_info() table, others by specific tables such as morphology() or morphometrics()
  • getRefs() pending
  • getSize() is in species_info()
  • getTrophicLevel is in ecology()
  • loadCache and updateCache are no longer needed, all functions are live except for queries to the taxa-table, which still uses caching and has load_taxa(update=TRUE) to update the cache.
  • which_fish() There are no longer generic methods to query for which species match certain criteria. The current approach requires querying the complete table wherein such data are held and filtering that. Some examples of this should be added to the vignette.
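The "query the complete table and filter it" approach from the last item can be sketched as follows. To keep the example runnable offline, a small data frame stands in for the table that ecology() would return; the three rows reuse the SpecCode and FoodTroph values quoted in another report on this page, and the 3.5 cut-off is an arbitrary threshold chosen only for illustration.

```r
# Stand-in for a full ecology table (in practice: the table returned by ecology()).
ecology_tbl <- data.frame(
  SpecCode  = c(2513L, 512L, 3821L),
  FoodTroph = c(4.01, 4.11, 3.27)
)

# Filter the whole table on the criterion of interest, dropping missing values.
low_troph <- subset(ecology_tbl, !is.na(FoodTroph) & FoodTroph < 3.5)

# The surviving SpecCodes identify the matching species.
print(low_troph$SpecCode)
```

For a categorical column, unique() on that column of the full table lists every category present, which helps when choosing filter terms.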

Get FAO areas?

I'm hoping to look at species distributions per FAO fishing area and am wondering if it's possible to run a quick query for this within rfishbase for a given list of species. It seems that there isn't a search term for the FAO areas within the function "which_fish", but the information is on fishbase.org.

Missing getTrophicLevel data?

Hi Carl,

I'm using the rfishbase package to gather a series of data for a modelling exercise. I just noticed that some of the species that have trophic level data on their FishBase summary pages display only NA when queried using the getTrophicLevel() command. Examples are Carangoides_armatus, Lethrinus_erythracanthus.

What caused this? Is there a fix?

Many thanks! Xueying

Get references

Get full reference information from the specified field or query.

This information is in the XML of each post -- a parsing function needs to extract all this data and index it by fishbase number, then another function needs to match the ref id numbers I currently extract with getRef with the full reference information.

Methods should all use latest cache, with a warning

Currently methods use the provided cache unless specifically handed the path to a new cache, even after the updateCache() function has been run. Ideally, the code should automatically search for new caches, and use them if available (with warning and override options), while never overwriting the original Cache.

Waiting on this as the entire cache structure might change once we have a real API.

Trophic info Ecology table imported differently?

Great project! Thanks for your effort in developing this package.

When I import info from the ecology table, in my case regarding the fish diet, it seems as if some info is imported differently. Maybe this is the result of the package importing something other than the info that is displayed on the ecology summary page on the fishbase website, but I have not been able to deduce where it comes from exactly.

For example:

sp1<-c("Myxine glutinosa","Anoplopoma fimbria","Ammodytes dubius")
diet<-c("SpecCode","DietTroph","DietSeTroph","DietTLu","DietseTLu","FoodTroph","FoodSeTroph")
ecology(sp1,fields=diet)

gives:
SpecCode DietTroph DietSeTroph DietTLu DietseTLu FoodTroph FoodSeTroph
1 2513 3.45 0.66 3.40 0.59 4.01 0.70
2 512 3.83 0.61 NA NA 4.11 0.88
3 3821 3.14 0.22 3.11 0.17 3.27 0.38

The summary of the ecology on the fishbase website gives the following (variables in the same order):
1 2514 4.54 NA 3.40 0.59 4.01 0.70
2 512 3.84 0.21 NA NA 4.11 0.88
3 3821 3.20 0.07 3.11 0.17 3.27 0.38

The "DietTLu" and "FoodTroph" variables match perfectly, but the "DietTroph" info deviates. This is especially obvious for the first species in the example.

What do you think creates this difference? Thanks so much!
Floor

minor bug in selecting server

First, great work.

I'm using your work to do something similar for PHP - the current idea is just to use it internally in the effort of "automatically fill in data fields" for the websites I work for (as volunteer developer) with Julian Dignall, Jools, (owner + developer), www.planetcatfish.com and www.aquaticrepublic.com - these sites contain data for aquarium hobbyists (and also used as a reference by many professionals, ranging from scientists to commercial vendors of various fish)

The catfish side is pretty well established and we have a large number of species (3281 at time of writing) and many pictures (around 14000).

The "every fish" site of aquatic republic is still being developed - we only have around 1900 species, and most of those have "poor data", and we only have around 2000 pictures for those 1900 species.

Anyway, that's my "longer than I wanted it to be" introduction... I found a little bug, I think:

#' @param server the index of the server to use. 1 is sinica (Taiwan), 2 is US.

...
fishbase <- function(fish.id, curl=getCurlHandle(), server=2){
  servers <- c("http://fishbase.sinica.edu.tw/", "http://www.fishbase.us/")
  ...
  url <- paste(servers[2],
               "maintenance/FB/showXML.php?identifier=FB-",
               fish.id, "&ProviderDbase=03", sep="")

surely that should be:

  url <- paste(servers[server],
               "maintenance/FB/showXML.php?identifier=FB-",
               fish.id, "&ProviderDbase=03", sep="")

If I find anything else, I'll let you know.

getLengthWeight

I am not able to get the getLengthWeight function to work, and not sure if it is a bug, or something I am doing. I have tried it out on a few species, and get similar errors:

getLengthWeight(fish.data[1])
Error in `$<-.data.frame`(`*tmp*`, "r2", value = numeric(0)) :
  replacement has 0 rows, data has 25
getLengthWeight(fish.data[2])
Error in `$<-.data.frame`(`*tmp*`, "r2", value = numeric(0)) :
  replacement has 0 rows, data has 22
getLengthWeight(fish.data[3])
Error in `$<-.data.frame`(`*tmp*`, "r2", value = numeric(0)) :
  replacement has 0 rows, data has 6

Summary of endpoints / tables to be implemented

See ropensci/fishbaseapi#2

Table list, as given under "More Information" on a species summary page. Doesn't quite map to the table list of the SQL, but probably more sensible for the endpoints.

  • Countries
  • FAO areas
  • Ecosystems
  • Occurrences
  • Introductions
  • Stocks
  • Ecology
  • Diet
  • Food items
  • Food consumption
  • Ration
  • Common names
  • Synonyms
  • Metabolism
  • Predators
  • Ecotoxicology
  • Reproduction
  • Maturity
  • Spawning
  • Fecundity
  • Eggs
  • Egg development
  • Age/Size
  • Growth
  • Length-weight
  • Length-length
  • Length-frequencies
  • Morphometrics
  • Morphology
  • Larvae
  • Larval dynamics
  • Recruitment
  • Abundance
  • References
  • Aquaculture
  • Aquaculture profile
  • Strains
  • Genetics
  • Allele frequencies
  • Heritability
  • Diseases
  • Processing
  • Mass conversion
  • Collaborators
  • Pictures
  • Stamps, Coins
  • Sounds
  • Ciguatera
  • Speed
  • Swim. type
  • Gill area
  • Otoliths
  • Brains
  • Vision

Parsing `getEnviroClimateRange` output

The Environment / Climate / Range paragraph seems to have some kind of structure, which I would be tempted to read as

Primary environment; info; info. Secondary environment; info; info; info.

But splitting on \\. is not a good solution, as the string also contains reference markers; see http://www.fishbase.org/summary/14154. I think we can either remove the reference markers (of the form (Ref. NNNNNN)) or write a clever regexp to take care of them. Then we can change the output format to something more like

Environment  Info
Freshwater   benthopelagic
Freshwater   pH range: 6.5 - 7.5
Tropical     22°C - 26°C

That's quite a significant rewrite; any comments?
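A minimal sketch of the first option (strip the reference markers, then split) might look like the following. The function name `parse_enviro()` and the sample string are made up for illustration, and the sketch assumes references always look like "(Ref. 12345)" and that sentences are separated by a period followed by whitespace.

```r
# Hypothetical helper: turn an Environment / Climate / Range paragraph into
# a two-column data frame (Environment, Info).
parse_enviro <- function(x) {
  # Drop reference markers such as " (Ref. 2060)".
  x <- gsub("\\s*\\(Ref\\.\\s*[0-9]+\\)", "", x)
  # Drop the final period, then split on ". " so that decimal points
  # such as "6.5" are left alone.
  x <- sub("\\.\\s*$", "", trimws(x))
  sentences <- strsplit(x, "\\.\\s+")[[1]]
  rows <- lapply(sentences, function(s) {
    parts <- trimws(strsplit(s, ";")[[1]])
    # The first part is the environment; the remaining parts are its info.
    info <- if (length(parts) > 1) parts[-1] else NA_character_
    data.frame(Environment = parts[1], Info = info,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Illustrative input only; not copied from a FishBase page.
example <- paste("Freshwater; benthopelagic; pH range: 6.5 - 7.5 (Ref. 12345).",
                 "Tropical; 22°C - 26°C (Ref. 2060).")
parsed <- parse_enviro(example)
```

Real summary pages may contain other patterns (several references in one marker, abbreviations ending in a period), so the regular expressions here would likely need extending.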

Have findSpecies() match subspecies

Hi Carl-
First, thanks for your work on this package... it is VERY useful and I think it will have some far-reaching impacts on fisheries research once it gains some traction and even more so once the API is in place.
This is probably more of a request than an issue.
I have a data set that includes a few species which return no records for findSpecies(). However, these species DO have subspecies entries in the Fishbase database. It seems that this could be confusing for anyone that expected all species that exist in Fishbase to be returned. I'm wondering if you can have findSpecies() return all subspecies as TRUE or (even better) add an argument that allows you to choose to include them or not.
As an example:
sum(findSpecies("Clupea pallasii pallasii"))
[1] 1
sum(findSpecies("Clupea pallasii"))
[1] 0

If the matching is done internally using a regular expression, it might be as simple as adding a wildcard to the search terms you give findSpecies(), but there would need to be documentation for that.
Again, thanks for making this package a reality! Also, I'm not an R guru by any means, but let me know if there is anything that I can do to help with the package!
-Cotton
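One way the requested argument could behave is sketched below. This is not how findSpecies() is implemented; `match_species()`, its `include_subspecies` argument, and the small name vector are assumptions for the example.

```r
# Hypothetical matcher: TRUE for exact matches and, optionally, for names
# that extend the query with a subspecies epithet.
match_species <- function(query, sci_names, include_subspecies = TRUE) {
  exact <- sci_names == query
  if (!include_subspecies) return(exact)
  # A subspecies name is the species name followed by a space and an epithet.
  exact | startsWith(sci_names, paste0(query, " "))
}

# Illustrative names only.
sci_names <- c("Clupea pallasii pallasii", "Clupea pallasii marisalbi",
               "Clupea harengus")
sum(match_species("Clupea pallasii", sci_names))
sum(match_species("Clupea pallasii", sci_names, include_subspecies = FALSE))
```

Matching on the prefix plus a trailing space avoids a regular expression and prevents "Clupea pallasii" from also matching a longer, unrelated species epithet.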

Environment / Climate / Range

This is more of a feature wish:
I was wondering whether it would be easy to add Environment / Climate / Range as one of the types for the which_fish function. Alternatively, maybe a keyword search across everything (not just within one category). Thanks!

Improve discoverability of tables

As @layamene so effectively put it in her review (#46), most tables are effectively "hidden." Tables should be more discoverable.

FishBase has too many tables. We need:

  • Clear, concise & searchable documentation of all endpoints / implemented tables
  • An endpoint to search across tables to find which one has a particular field
  • Merge more tables in advance:
    • Tables with a common structure (e.g. 1 row per species) should be merged
    • More ID values should be supplemented with the most relevant information from common tables
  • ...
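On the client side, the "which table has this field" search could be sketched as below, assuming the tables are already available as a named list of data frames. The helper `find_field()` and the toy tables are hypothetical.

```r
# Hypothetical helper: return the names of all tables that contain a field.
find_field <- function(tables, field) {
  hits <- vapply(tables, function(tbl) field %in% names(tbl), logical(1))
  names(tables)[hits]
}

# Toy stand-ins for real tables; the columns are for illustration only.
tables <- list(
  ecology = data.frame(SpecCode = 1L, DietTroph = 3.4),
  species = data.frame(SpecCode = 1L, Genus = "Clupea")
)
find_field(tables, "DietTroph")
find_field(tables, "SpecCode")
```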

user agent `rfishbase`

@cboettig any interest in including the string rfishbase along with the others that httr uses?

curl/7.38.0 Rcurl/1.95.4.5 httr/0.6.1

so it would be like

curl/7.38.0 Rcurl/1.95.4.5 httr/0.6.1 rfishbase/0.1.0

This would allow us to differentiate requests from rfishbase vs. those from users not using rfishbase, though I imagine nearly all requests coming from R will come from this pkg.
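A rough sketch of building such a string is shown here. `make_user_agent()` is a made-up helper, and the base string is simply the one quoted above; `httr::user_agent()` is only called if httr happens to be installed.

```r
# Hypothetical helper: append "pkg/version" to an existing user-agent string.
make_user_agent <- function(base, pkg = "rfishbase", version = "0.1.0") {
  paste(base, paste0(pkg, "/", version))
}

ua <- make_user_agent("curl/7.38.0 Rcurl/1.95.4.5 httr/0.6.1")

# With httr available, the string can be wrapped as a request config and
# passed to calls such as httr::GET(url, cfg).
if (requireNamespace("httr", quietly = TRUE)) {
  cfg <- httr::user_agent(ua)
}
```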

can't get getLengthWeight() to work

Hi,

I really like the rfishbase package - finding it really useful for my PhD!

When I run:

require(rfishbase)
data(fishbase)
out <- getLengthWeight(fish.data[1:2])

getLengthWeight(fish.data[1:2]) generates this error:

Error in `$<-.data.frame`(`*tmp*`, "r2", value = numeric(0)) :
  replacement has 0 rows, data has 24

Do you know what's wrong?

Thanks
