globalnamesarchitecture / gnlist-resolver-gui
This tool allows cross-mapping scientific name checklists to a variety of biodiversity databases
License: MIT License
Currently we cannot have more than one pod, because the non-commercial version of ingress can send a request to any pod in the system. We need to organize a mount that can be shared by all pods. This will also allow us to do rolling restarts. I will also double-check that minikube works correctly.
Use this data in the name of the file:
name of the original file
name of the data source
date for the data source release
date when the match was done
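A minimal sketch of composing such a file name from the four pieces of data listed above. The function name, the slug rules, the ISO date format and the ".csv" extension are all assumptions made for illustration, not the project's actual naming scheme.

```python
import re
from datetime import date


def _slug(text: str) -> str:
    """Lowercase text and replace runs of non-alphanumeric characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def result_file_name(original_file: str, data_source: str,
                     release_date: date, match_date: date) -> str:
    # Hypothetical helper: drop the extension of the uploaded file,
    # e.g. "my plants.csv" -> "my plants".
    stem = original_file.rsplit(".", 1)[0]
    parts = [
        _slug(stem),                # name of the original file
        _slug(data_source),         # name of the data source
        release_date.isoformat(),   # data source release date
        match_date.isoformat(),     # date when the match was done
    ]
    # The ".csv" extension is an assumption about the output format.
    return "_".join(parts) + ".csv"
```

For example, `result_file_name("my plants.csv", "Catalogue of Life", date(2017, 4, 1), date(2017, 6, 15))` yields `my-plants_catalogue-of-life_2017-04-01_2017-06-15.csv`. ISO dates keep such files sortable by date in a directory listing.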
As a separate ticket, add a setting that allows configuring the file name.
Add a field that shows how many rows correspond to this match.
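One possible reading of this request, sketched below: for every output row, count how many rows share the same matched name. The column names `matchedName` and `matchCount` are hypothetical, not the resolver's actual output fields.

```python
from collections import Counter


def add_match_counts(rows: list[dict]) -> list[dict]:
    # Hypothetical columns: "matchedName" holds the matched name,
    # "matchCount" is the new field with the number of rows sharing it.
    counts = Counter(row["matchedName"] for row in rows if row.get("matchedName"))
    for row in rows:
        name = row.get("matchedName")
        # Rows without a match get 0 instead of a count.
        row["matchCount"] = counts[name] if name else 0
    return rows
```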
We need to figure out the feasibility of this approach. If people install the project as a static web page, they would need to download Elm's JS file from a remote link, and then some combination of CORS and a proxy would handle connecting to the server.
This ticket covers the investigation phase. If we find the approach feasible, we will set up our system to support the "application as a static web page" approach.
Placeholder for MDL framework, with initial implementation
Error handling (top level) placeholder
We moved the prototype code from gn_crossmap_web with its whole history. As a result, when we create a ticket, its number duplicates ticket numbers already referenced in the old code's history. We need to flatten the old history.
An infraspecific name without authors (Natrix natrix cypriaca) gives n/a as inputRank.
Orobanche densiflora Salzmann ex Reuter in DC. gives inputRank n/a. Why?
See the attached input and result files.
An infraspecific epithet written with a capital letter produces a wrong canonical match, e.g.:
Iris humilis Georgi subsp. Arenaria (Waldst. et Kit.) A.et D.Löve
Astragalus macrocarpus DC. subsp. Lefkarensis
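A possible pre-processing workaround, sketched under the assumption that lowercasing the word right after a rank marker is acceptable before the name reaches the parser. The list of rank markers is illustrative and not exhaustive; a proper fix would belong in the name parser itself.

```python
import re

# Illustrative (not exhaustive) rank markers that precede an infraspecific epithet.
RANK_MARKERS = ("subsp.", "ssp.", "var.", "subvar.", "f.", "forma")

# (?<!\S) requires the marker to start the string or follow whitespace,
# so author abbreviations such as "Rchb.f." are not treated as markers.
_PATTERN = re.compile(
    r"(?<!\S)(" + "|".join(re.escape(m) for m in RANK_MARKERS) + r")\s+([A-Z])(\w*)"
)


def lowercase_infraspecific_epithet(name: str) -> str:
    # Lowercase only the first letter of the word after a rank marker,
    # e.g. "subsp. Lefkarensis" -> "subsp. lefkarensis".
    return _PATTERN.sub(
        lambda m: f"{m.group(1)} {m.group(2).lower()}{m.group(3)}", name
    )
```

Applied to the first example, it turns "subsp. Arenaria" into "subsp. arenaria" and leaves the authorship "(Waldst. et Kit.) A.et D.Löve" untouched.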
Placeholder for i18n
The ingestion step depends on the biodiversity gem. It makes sense to migrate to the gnresolver project, as it is significantly faster and has more features. For this we need to move the gn_list_resolver gem to JRuby.
Output is in UTF-8. Why are special characters in some accepted species names displayed correctly, while in others they are not? E.g. & versus &, etc.
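The ticket does not say what causes the difference. Two common causes of this symptom are HTML character references left in the data (such as "&amp;" or "&eacute;") and mixed Unicode normalization forms; treating either as the cause here is an assumption. A minimal sketch that handles both before writing the UTF-8 output:

```python
import html
import unicodedata


def clean_name(name: str) -> str:
    # 1. Turn HTML character references ("&amp;", "&eacute;", "&#233;")
    #    into the characters they represent.
    decoded = html.unescape(name)
    # 2. Normalize to NFC so that "o" + combining diaeresis and the
    #    precomposed "ö" are written out as the same UTF-8 bytes.
    return unicodedata.normalize("NFC", decoded)
```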
If my data does not have enough information for the resolution process, I want to see an error message that explains what needs to be done for this step to succeed. I should not be able to click the "Continue" button in this case.
Migrate to the GraphQL API of gnindex.
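A sketch of such a check: it returns whether "Continue" may be enabled and, if not, a message telling the user what to fix. The rule itself (a `scientificName` column, or both `genus` and `specificEpithet`) and the column names are hypothetical; the real requirements depend on what the resolver accepts.

```python
from dataclasses import dataclass


@dataclass
class HeaderCheck:
    can_continue: bool  # whether the "Continue" button should be enabled
    message: str = ""   # explanation shown to the user when it is not


def check_headers(headers: list[str]) -> HeaderCheck:
    # Hypothetical rule: resolution needs either a full scientificName column
    # or enough parts (genus + specificEpithet) to assemble a name.
    present = set(headers)
    if "scientificName" in present:
        return HeaderCheck(True)
    if {"genus", "specificEpithet"} <= present:
        return HeaderCheck(True)
    return HeaderCheck(
        False,
        "Cannot resolve names: add a 'scientificName' column, "
        "or both 'genus' and 'specificEpithet' columns.",
    )
```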
Clone gn_crossmap_web and start refactoring
Simple results have 3 categories:
match
fuzzy
no match
Advanced results have:
exact + canonical = match
fuzzy = fuzzy
partial + partial fuzzy + genus + unmatch + error = no match
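The mapping above can be expressed as a lookup table. The match-type strings come from the ticket; the fallback for unknown types is an assumption.

```python
# Advanced match types (as listed in the ticket) -> simple categories.
SIMPLE_CATEGORY = {
    "exact": "match",
    "canonical": "match",
    "fuzzy": "fuzzy",
    "partial": "no match",
    "partial fuzzy": "no match",
    "genus": "no match",
    "unmatch": "no match",
    "error": "no match",
}


def simplify(advanced_type: str) -> str:
    # Unknown types fall back to "no match" (an assumption).
    return SIMPLE_CATEGORY.get(advanced_type.lower(), "no match")


def summarize(advanced_types) -> dict:
    # Count results per simple category for a summary view.
    counts = {"match": 0, "fuzzy": 0, "no match": 0}
    for t in advanced_types:
        counts[simplify(t)] += 1
    return counts
```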
This ticket grew from a bug fix into an improvement, so I'll give it an estimate.
I would expect Adenophora lilifolia to get some score, because Adenophora liliifolia, which has only one additional character, is an accepted name. See http://webservice.catalogueoflife.org/col/webservice?name=Adenophora+liliifolia
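To make "only one additional character" concrete, here is a standard Levenshtein edit distance. This is the textbook algorithm, not necessarily the distance or scoring that the resolver uses.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping only two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("Adenophora lilifolia", "Adenophora liliifolia"))  # 1
```

A distance of 1 falls within the tolerance of most fuzzy matchers, which is why some score would be expected here.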
We do have a Material Design framework, but it is not polished yet, so we need to enhance it.
We have to normalize headers to the terms that the program understands. To do that, we need to create a workflow that prevents entering wrong combinations of data.
Currently the Resolution page has to stay open from the start to the end of the process. Users should be able to leave the page and return to it while the resolution continues in the background.
The tax_id field that the resolver imports from CoL and outputs in the results is useless: it is just a database index id that changes with every edition and is not used in other CoL services. Unfortunately, CoL is in transition from offering LSIDs to 'natural keys' after the AC2017 edition. These keys are, however, already in the DwC files, in the references field, so I would suggest extracting them from there. Example: from http://www.catalogueoflife.org/annual-checklist/details/species/id/1d761fa6e15f9ba277ad7784af78c8b4/synonym/5fc5c8ab89caeede5ede03be346369a7 the identifier can be extracted for both the synonym and its current accepted name. It may also be helpful to include the whole URL in the output.
File upload takes about the same time, but the 'wait' period after upload increased from 7 to 40 seconds for a 1-million-name file, according to my measurements on production servers.
We need to set up Kubernetes configurations for both projects.
The canonical field shows differences in fuzzy matching, so placing the edit distance next to it would be convenient.
Napkin design
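A minimal sketch of header normalization: map user-supplied column names to a fixed set of terms, and report unknown headers and duplicate mappings (one example of a wrong combination). The term list uses Darwin Core names, and both the terms and the accepted spellings are assumptions rather than what the program actually understands.

```python
import re

# Hypothetical mapping from simplified spellings to the terms the program
# understands (Darwin Core names are used here as an assumption).
KNOWN_TERMS = {
    "taxonid": "taxonID",
    "id": "taxonID",
    "scientificname": "scientificName",
    "name": "scientificName",
    "scientificnameauthorship": "scientificNameAuthorship",
    "author": "scientificNameAuthorship",
    "genus": "genus",
    "species": "specificEpithet",
    "specificepithet": "specificEpithet",
    "infraspecificepithet": "infraspecificEpithet",
    "rank": "taxonRank",
    "taxonrank": "taxonRank",
}


def _key(header: str) -> str:
    # Drop a namespace prefix such as "dwc:", then lowercase and keep letters only.
    header = header.split(":")[-1]
    return re.sub(r"[^a-z]", "", header.lower())


def normalize_headers(headers):
    normalized, unknown = [], []
    for h in headers:
        term = KNOWN_TERMS.get(_key(h))
        normalized.append(term)  # None marks a header we do not understand
        if term is None:
            unknown.append(h)
    # Two columns mapped to the same term is a combination to reject.
    seen = [t for t in normalized if t]
    duplicates = sorted({t for t in seen if seen.count(t) > 1})
    return normalized, unknown, duplicates
```

The UI could then require the user to fix every unknown header and duplicate before continuing.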
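One way to decouple the job from the page, sketched below: the server keeps each resolution job in a registry keyed by a job id, and a page that the user reopens only polls the status for that id. This in-memory, thread-based registry is a hypothetical single-process illustration. With several pods it would need shared storage instead of a dict.

```python
import threading
import uuid


class JobRegistry:
    """Keeps resolution jobs running independently of any open page."""

    def __init__(self):
        self._jobs = {}
        self._threads = {}
        self._lock = threading.Lock()

    def start(self, names, resolve_one) -> str:
        # resolve_one is a hypothetical callable that resolves a single name.
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"done": 0, "total": len(names), "finished": False}
        t = threading.Thread(target=self._run, args=(job_id, names, resolve_one),
                             daemon=True)
        self._threads[job_id] = t
        t.start()
        return job_id  # the page stores this id (e.g. in its URL) to reconnect

    def _run(self, job_id, names, resolve_one):
        for name in names:
            resolve_one(name)
            with self._lock:
                self._jobs[job_id]["done"] += 1
        with self._lock:
            self._jobs[job_id]["finished"] = True

    def status(self, job_id) -> dict:
        # What a returning page polls to show progress.
        with self._lock:
            return dict(self._jobs[job_id])

    def wait(self, job_id, timeout=None):
        self._threads[job_id].join(timeout)
```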
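A sketch of extracting both identifiers from the references field. The pattern is inferred from the single example URL above; treating the id after `/species/id/` as the accepted name and the id after `/synonym/` as the synonym follows the ticket's description but is still an assumption, and other CoL reference URLs may be shaped differently.

```python
import re
from typing import Optional

# Pattern inferred from one example URL; the synonym part is optional.
_COL_DETAILS = re.compile(
    r"/details/species/id/(?P<accepted>[0-9a-f]+)"
    r"(?:/synonym/(?P<synonym>[0-9a-f]+))?"
)


def col_ids(reference_url: str) -> Optional[dict]:
    m = _COL_DETAILS.search(reference_url)
    if m is None:
        return None  # not a CoL "details" URL
    return {"accepted_id": m.group("accepted"), "synonym_id": m.group("synonym")}
```

Keeping the whole URL in the output as well, as suggested, would let users follow the link even where this pattern does not match.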
Currently the Dockerfile is huge and contains a lot of stuff that is only important for development. We need a new Dockerfile that shaves off all the unnecessary baggage.