gn_crossmap's People

Contributors

dimus, waffle-iron

gn_crossmap's Issues

bizarre matchedScore when adding a non-matching name

Probably an issue for GNresolver in general, but matching the name "Acanthurus chirurgus" gives a sensible result, with 2 possible names of equal score (0.988):

1,Acanthurus chirurgus,Canonical form exact match,Acanthurus chirurgus,"Acanthurus chirurgus (Bloch, 1787)",Acanthurus chirurgus,species,species,,,0,0.988,26846886
1,Acanthurus chirurgus,Canonical form exact match,Acanthurus chirurgus,"Acanthurus chirurgus (non Bloch, 1787)",Acanthurus chirurgus,species,species,synonym,"Acanthurus bahianus Castelnau, 1855",0,0.988,26831504

By contrast, matching the same name with scientificNameAuthorship = "(Linnaeus)", which shouldn't match either hit, gives priority to the name that is a synonym (scores of 0.988 vs 0.75):

1,Acanthurus chirurgus,(Linnaeus),Canonical form exact match,Acanthurus chirurgus (Linnaeus),"Acanthurus chirurgus (non Bloch, 1787)",Acanthurus chirurgus,species,species,synonym,"Acanthurus bahianus Castelnau, 1855",0,0.988,26831504
1,Acanthurus chirurgus,(Linnaeus),Canonical form exact match,Acanthurus chirurgus (Linnaeus),"Acanthurus chirurgus (Bloch, 1787)",Acanthurus chirurgus,species,species,,,0,0.75,26846886
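
To illustrate why this matters downstream, here is a minimal Python sketch of a consumer that simply keeps the highest-scoring row. The column positions (matchedName at index 5, synonymStatus at index 9, matchedScore at index 12) are assumptions read off the two sample rows above, not a documented layout, and best_match is a hypothetical helper.

```python
import csv
import io

# The two rows reported above, copied verbatim.
SAMPLE = '''1,Acanthurus chirurgus,(Linnaeus),Canonical form exact match,Acanthurus chirurgus (Linnaeus),"Acanthurus chirurgus (non Bloch, 1787)",Acanthurus chirurgus,species,species,synonym,"Acanthurus bahianus Castelnau, 1855",0,0.988,26831504
1,Acanthurus chirurgus,(Linnaeus),Canonical form exact match,Acanthurus chirurgus (Linnaeus),"Acanthurus chirurgus (Bloch, 1787)",Acanthurus chirurgus,species,species,,,0,0.75,26846886
'''

# Assumed column positions, inferred from the sample rows only.
MATCHED_NAME, SYNONYM_STATUS, MATCHED_SCORE = 5, 9, 12


def best_match(text):
    """Return the row with the highest matchedScore."""
    rows = list(csv.reader(io.StringIO(text)))
    return max(rows, key=lambda row: float(row[MATCHED_SCORE]))


best = best_match(SAMPLE)
print(best[MATCHED_NAME], "|", best[SYNONYM_STATUS])
```

A consumer written this way ends up with the "(non Bloch, 1787)" synonym rather than the accepted name, even though neither authorship matches "(Linnaeus)".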

Enable tab-separated output from crossmap

Since the return values from crossmap often include commas, the output fields are often quoted strings with embedded commas, making it rather tedious to parse the output without using a proper CSV parser. If it were possible to get a tab-separated output, this would presumably be much easier, since none of the return fields are likely to contain a tab.

So could we have a switch (or perhaps even the default) to output with tabs as separators? Or perhaps you could simply use the same separator for output as was used in the input, which would reduce the number of switches needed.

On this topic, I don't know if the absence of tabs in the return fields is guaranteed (e.g. no tabs in the matchedName), but perhaps it should be, as I can't see any use for them, and it seems sensible to replace every run of whitespace characters with a single, normal space.
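
As a stopgap, the conversion can be done outside crossmap. The following Python sketch is one way to do it, assuming the output is standard quoted CSV; csv_to_tsv is a hypothetical helper, not part of crossmap. It also collapses each run of whitespace to a single space, so no field can contain a tab afterwards.

```python
import csv
import io
import re


def csv_to_tsv(text):
    """Convert CSV text to tab-separated text, collapsing whitespace runs."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Replace any run of whitespace (tabs included) with one space.
        writer.writerow([re.sub(r"\s+", " ", field) for field in row])
    return out.getvalue()


sample = '1,"Acanthurus chirurgus (Bloch, 1787)",species\n'
print(csv_to_tsv(sample), end="")
```

Fields that themselves contain a double quote would still be quoted by the csv writer, so a plain split on tabs is only safe when the data has none.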

If rank is empty, program exits

During the ingestion phase we try to infer rank. If a rank has not been created, the program exits, because it tries to call an array method on nil.

Allow input files from stdin

This is not important, but for simple use it might be nice if the command-line crossmap script accepted stdin as an input file (by convention this is usually done as e.g. -i - rather than -i filename.csv). For example, I have a Dublin Core file with species names as one of the lines, and I was thinking I would pass it through a simple Perl script to pick out the names and feed them into crossmap. It's slightly more elegant if I can do this in a single pipeline, rather than having to save to intermediate files.

I only mention it as this is usually allowed by default by most command-line parsing libraries (e.g. argparse in Python; I'm not familiar with Ruby). So I'm surprised that it doesn't just come "for free" in crossmap.
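
For reference, the "-" convention takes only a few lines to support by hand. This Python sketch shows the general pattern; open_input and parse_args are hypothetical names, and none of this reflects how crossmap itself parses its options.

```python
import argparse
import sys


def open_input(path):
    """Return stdin when path is "-", otherwise open the named file."""
    if path == "-":
        return sys.stdin
    return open(path, newline="", encoding="utf-8")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="stdin-aware input sketch")
    parser.add_argument("-i", "--input", default="-",
                        help='input file, or "-" to read from stdin')
    return parser.parse_args(argv)


def main(argv=None):
    # Echo the chosen input to stdout, line by line.
    args = parse_args(argv)
    for line in open_input(args.input):
        sys.stdout.write(line)
```

Calling main(["-i", "-"]) reads from the pipe, while main(["-i", "names.csv"]) reads the named file.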

scientificNameAuthorship should not require braces

At the moment, if I run the following input into crossmap -d 1, I get a 'canonical name exact match' rather than an 'exact match', because the author name isn't in braces.

taxonID,scientificNameAuthorship,scientificName
0,"Linnaeus, 1758",Abramis bjoerkna

So I have to do

taxonID,scientificNameAuthorship,scientificName
0,"(Linnaeus, 1758)",Abramis bjoerkna

Can you remove the requirement for the braces in the scientificNameAuthorship field? It would be nice if this could also match against "L., 1758", and maybe even "L. 1758".
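
One way to picture the request is a normalisation step applied to both sides before comparison. The Python sketch below is only an illustration: normalize_authorship and the one-entry ABBREVIATIONS table are hypothetical, and this is not how the resolver compares authorships.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"L.": "Linnaeus"}


def normalize_authorship(authorship):
    """Reduce an authorship string to an "Author, year" form for comparison."""
    text = authorship.strip()
    # Drop one pair of enclosing parentheses, if present.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1].strip()
    # Split off a trailing four-digit year, with or without a comma.
    match = re.match(r"^(?P<author>.*?)[,\s]+(?P<year>\d{4})$", text)
    if match:
        author, year = match.group("author").strip(), match.group("year")
    else:
        author, year = text, ""
    author = ABBREVIATIONS.get(author, author)
    return f"{author}, {year}" if year else author


for raw in ["Linnaeus, 1758", "(Linnaeus, 1758)", "L., 1758", "L. 1758"]:
    print(raw, "->", normalize_authorship(raw))
```

Note that in zoological names the parentheses carry meaning (the species was originally described in a different genus), so a matcher might want to ignore them for scoring while still reporting them.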

make it possible to send new headers and save them on the server

The web GUI lets a user replace the headers supplied with a file by new headers that are compatible with gn_crossmap. These modified headers have to be used instead of the ones supplied with the file.

Scope of work

  • make it possible to supply modified headers
  • if modified headers are detected -- use these headers instead of original ones.
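
The selection rule in the second item could look roughly like this Python sketch. effective_headers is a hypothetical name, and the length check is an added assumption (that a modified header list must cover every original column), not something stated in the issue.

```python
def effective_headers(original, modified=None):
    """Return the modified headers when supplied, else the original ones."""
    if not modified:
        return list(original)
    # Assumption: the replacement must name every column of the file.
    if len(modified) != len(original):
        raise ValueError("modified headers must match the number of columns")
    return list(modified)


print(effective_headers(["id", "name"], ["taxonID", "scientificName"]))
print(effective_headers(["taxonID", "scientificName"]))
```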

Allow chaining of crossmap instances

I have been trying to chain crossmap instances together, to look up synonyms using CoL, then look up the returned (accepted) name from another provider (e.g. EoL / OpenTree / whatever). I do this using the acceptedName from the first call as the scientificName for the second.

However, this requires changing the column names between instances, so that acceptedName of one instance is treated as the scientificName of another (and also to change the output names so that the input fields in the first command aren't taken as input for the second command).

At the moment I'm switching the names using a simple perl regexp substitution (and blatting the scientificName fields in the initial input):

cat names.csv | crossmap -i - -o - | perl -pe 's/scientificName/input/g; s/acceptedName/scientificName/' | crossmap -i - -o - -d 179

but perhaps if this is a use case that might see more general usage, it might be worth having a switch which allows this sort of relabelling automatically?
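
Until such a switch exists, the relabelling can be made a little stricter than a regexp by renaming whole column names in the header row only. This Python sketch assumes CSV with a single header line; relabel_header and the "input..." names are hypothetical.

```python
import csv
import io


def relabel_header(text, mapping):
    """Rename header columns by exact name, leaving data rows untouched."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""
    # Each column is looked up once, so renames cannot cascade.
    writer.writerow([mapping.get(name, name) for name in header])
    writer.writerows(reader)
    return out.getvalue()


mapping = {
    "scientificName": "inputName_1",
    "scientificNameAuthorship": "inputAuthorship_1",
    "acceptedName": "scientificName",
}
sample = "taxonID,scientificName,acceptedName\n1,Aus bus,Cus dus\n"
print(relabel_header(sample, mapping), end="")
```

Because the lookup is by exact name, scientificNameAuthorship is only renamed if it is listed in the mapping, unlike the substring substitution in the perl one-liner.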

Make intermediate progress details available for external clients.

When used as a library -- collect and show the following results

All Steps

  • which step is happening and how far it went.
  • how long this step is going to take (estimation)

Resolution step

  • how many names are submitted
  • how many are done
  • how the names that are already done break down by match result type.
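
The items above could be carried by a small progress record that a client polls or receives through a callback. This is a Python sketch under assumptions: the Progress class, its field names, and the linear time estimate are all hypothetical and not part of the gn_crossmap library.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Progress:
    step: str                      # which step is happening
    total: int                     # how many names were submitted
    done: int = 0                  # how many are finished
    matches: dict = field(default_factory=dict)   # counts per match type
    started: float = field(default_factory=time.monotonic)

    def record(self, match_type):
        """Register one finished name and its match result."""
        self.done += 1
        self.matches[match_type] = self.matches.get(match_type, 0) + 1

    def fraction(self):
        """How far the step has gone, from 0.0 to 1.0."""
        return self.done / self.total if self.total else 1.0

    def eta_seconds(self, now=None):
        """Estimate remaining time, assuming a constant rate so far."""
        if self.done == 0:
            return None
        now = time.monotonic() if now is None else now
        elapsed = now - self.started
        return elapsed * (self.total - self.done) / self.done


progress = Progress(step="resolution", total=4, started=0.0)
progress.record("Exact match")
progress.record("Canonical form exact match")
print(progress.fraction(), progress.matches, progress.eta_seconds(now=10.0))
```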

acceptedName omitted from output if not a synonym

for example

echo -e "taxonID,scientificName\n1,Acanthurus chirurgus" | crossmap -i - -o - -d 1

gives

1,Acanthurus chirurgus,Canonical form exact match,Acanthurus chirurgus,"Acanthurus chirurgus (Bloch, 1787)",Acanthurus chirurgus,species,species,,,0,0.988,26846886
1,Acanthurus chirurgus,Canonical form exact match,Acanthurus chirurgus,"Acanthurus chirurgus (non Bloch, 1787)",Acanthurus chirurgus,species,species,synonym,"Acanthurus bahianus Castelnau, 1855

So the first line is missing an acceptedName field. In this case I would like to default to the first line, since it is not a synonym. But I can't simply do that by filtering out lines with 'synonym' and looking for the "acceptedName" field, which I would like to do.
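
A possible workaround on the consumer side is to fill the empty field from matchedName. The Python sketch below assumes the column positions seen in the first output row above (matchedName at index 4, synonymStatus at index 8, acceptedName at index 9); these are inferred from the sample, and fill_accepted_name is a hypothetical helper.

```python
import csv
import io

# Assumed column positions, inferred from the sample output only.
MATCHED_NAME, SYNONYM_STATUS, ACCEPTED_NAME = 4, 8, 9


def fill_accepted_name(row):
    """Copy matchedName into acceptedName when the row is not a synonym."""
    filled = list(row)
    if not filled[SYNONYM_STATUS] and not filled[ACCEPTED_NAME]:
        filled[ACCEPTED_NAME] = filled[MATCHED_NAME]
    return filled


line = ('1,Acanthurus chirurgus,Canonical form exact match,Acanthurus chirurgus,'
        '"Acanthurus chirurgus (Bloch, 1787)",Acanthurus chirurgus,'
        'species,species,,,0,0.988,26846886\n')
row = next(csv.reader(io.StringIO(line)))
print(fill_accepted_name(row)[ACCEPTED_NAME])
```

With the field filled this way, every row has an acceptedName and the filtering described above becomes a single column lookup.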
