globalnamesarchitecture / biodiversity
Scientific Name Parser
License: MIT License
Shaanxilithes Xing, Yue & Zhang, 1984 Shaanxilithes ing
Parapandorina Xue, Tang, Yu & Zhou 1995 Parapandorina ue
Update
Parser should return f. fo. forma as they are without converting them into one representation
@mjy commented on Wed Jan 18 2017
Assuming the biodiversity gem is for all intents and purposes deprecated (i.e. no longer tracking individual improvements realized here), and assuming I want to use native Ruby, what's the best practice for using gnparser?
My use case is query processing, I get a single name string, break it down, and use the pieces to search against my index. Spawning to the shell is possible, but the JVM is loaded every time, so this isn't a good solution. Sending the query to a web endpoint would take too long. It seems like what's missing is a daemon style approach? Or, more than likely, I'm missing something.
It seems like this is an issue for all the native scripting languages (R, Python, etc) that can't/won't J-ify themselves for whatever reason.
@alexander-myltsev commented on Thu Jan 19 2017
@mjy , did you consider using https://github.com/GlobalNamesArchitecture/gnparser#usage-as-a-socket-server ? It expects new-line delimited list of strings -- each string is a name to parse.
@mjy commented on Thu Jan 19 2017
@alexander-myltsev Exploring that- currently not working, maybe my Java version? I followed the wget instructions, and the gnparse name "Homo sapiens" command worked, then:
matt@MacBook-Pro-71 Downloads$ gnparse socket --port 1234
Exception in thread "main" java.lang.UnsupportedClassVersionError: akka/actor/ExtensionId : Unsupported major.minor version 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at org.globalnames.GnParser$.main(GnParser.scala:75)
    at org.globalnames.GnParser.main(GnParser.scala)
@alexander-myltsev commented on Mon Jan 23 2017
Sorry for late reply.
Akka requires Java 8 (class file version 52.0 in the error corresponds to Java 8). Alas, you will need to update Java to use it.
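For the Ruby use case described in this thread, a client for the socket server could look roughly like the sketch below. It assumes the server was started with `gnparse socket --port 1234` as shown above, that it accepts one name per line, and that it answers with exactly one line per name. The reply format is an assumption and is not confirmed in this thread. The helper name `parse_names` is made up for illustration.

```ruby
require "socket"

# Hypothetical helper: sends each name on its own line and collects
# one reply line per name. Assumes a one-line-per-name reply protocol.
def parse_names(names, host: "localhost", port: 1234)
  TCPSocket.open(host, port) do |sock|
    names.map do |name|
      sock.puts(name)       # newline-delimited input, one name per line
      sock.gets&.chomp      # assumed: one reply line per name
    end
  end
end

# Usage, with a gnparse socket server already listening on port 1234:
#   parse_names(["Homo sapiens", "Plantago major"])
```

Keeping one connection open for a batch of names avoids paying the JVM startup cost on every query, which was the concern with spawning a shell command.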
Lacanobia nr. subjuncta Bold:Aab, 0925
Hi,
Just installed the latest biodiversity gem.
When trying the example:
Biodiversity::Parser.parse("Plantago major", simple = true)
I get that error:
/root/.gem/gems/biodiversity-5.5.2/lib/biodiversity/parser.rb:36:in `parse': wrong number of arguments (given 2, expected 1) (ArgumentError)
Has the API changed? If so then the example could perhaps be modified.
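The error message is consistent with `simple` being a keyword argument, in which case `simple = true` is passed as a second positional argument. The gem's actual signature is not shown in this report, so the sketch below uses a hypothetical stand-in method (not the gem's code) to reproduce the same error and show the keyword form.

```ruby
# Hypothetical stand-in for illustration only; NOT the gem's implementation.
# Assumption: the real method takes `simple` as a keyword argument.
def parse(name, simple: false)
  { name: name, simple: simple }
end

begin
  # `simple = true` assigns a local variable and passes `true` positionally.
  parse("Plantago major", simple = true)
rescue ArgumentError => e
  error_message = e.message
end

puts error_message

# Keyword form, which the stand-in accepts:
result = parse("Plantago major", simple: true)
puts result.inspect
```

If the assumption holds, the documented example would need `simple: true` instead of `simple = true`.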
To ease creation of UUIDv5 from scientific names
I am going to automatically create it in the parser.
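As a rough illustration of what generating a UUIDv5 from a name string involves, here is a minimal sketch using only the Ruby standard library. The namespace is an assumption: it is derived here from the standard DNS namespace and the string "globalnames.org", and the issue does not say which namespace the parser uses. The function name `uuid_v5` is also made up for this example.

```ruby
require "digest/sha1"

# Standard DNS namespace UUID from RFC 4122.
DNS_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"

# Minimal UUIDv5: SHA-1 over namespace bytes + name, truncated to 16 bytes,
# with the version and variant bits set.
def uuid_v5(namespace_uuid, name)
  ns_bytes = [namespace_uuid.delete("-")].pack("H*")
  bytes = Digest::SHA1.digest(ns_bytes + name.encode("UTF-8").b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0f) | 0x50   # version 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80   # RFC 4122 variant
  hex = bytes.pack("C*").unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

# ASSUMPTION: namespace derived from "globalnames.org"; not confirmed by the issue.
ASSUMED_NAMESPACE = uuid_v5(DNS_NAMESPACE, "globalnames.org")

name_id = uuid_v5(ASSUMED_NAMESPACE, "Plantago major")
puts name_id
```

Because UUIDv5 is deterministic, the same name string always yields the same ID, so the ID only stays stable if the input string is passed through unchanged.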
It looks like names with these elements are not yet recognized as viruses, so capitalized words are stripped from the canonical form:
NPV, e.g., Papilio polyxenes NPV: http://eol.org/pages/41592578
RNA, e.g., Alternaria zinniae dsRNA element: http://eol.org/pages/11611917
virophage, e.g., Organic Lake virophage: http://eol.org/pages/20868817
satellites, e.g., Double-stranded RNA satellites: http://eol.org/pages/11603787
satellite, e.g., Whitefly VEM satellite: http://eol.org/pages/20858522
betasatellite, e.g., Tomato leaf curl China betasatellite: http://eol.org/pages/11603870
alphasatellite, e.g., Ageratum yellow vein Singapore alphasatellite: http://eol.org/pages/39738381
particle, e.g., Mouse Intracisternal A-particle: http://eol.org/pages/11609198
subgroup, e.g., Subgroup B: http://eol.org/pages/11623168 -- This is probably not limited to viruses, but it's very unlikely that any name that has this string in it will have author information associated with it.
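One way to picture the requested check is a simple marker-word test over the terms listed above. This is only a sketch and not how the parser detects viruses. The constant and method names are hypothetical, and making NPV and RNA case-sensitive is a design choice made here, so that epithets ending in "rna" do not match.

```ruby
# Hypothetical marker lists built from the terms in this issue.
# NPV and RNA are matched case-sensitively; "RNA" may be a suffix (e.g. dsRNA).
CASE_SENSITIVE_MARKERS = /\bNPV\b|RNA\b/
CASE_INSENSITIVE_MARKERS =
  /\b(?:virophage|satellites?|betasatellite|alphasatellite|particle|subgroup)\b/i

# Returns true when the name contains one of the marker words.
def virus_like?(name)
  name.match?(CASE_SENSITIVE_MARKERS) || name.match?(CASE_INSENSITIVE_MARKERS)
end

examples = [
  "Papilio polyxenes NPV",
  "Alternaria zinniae dsRNA element",
  "Mouse Intracisternal A-particle",
  "Plantago major"
]
examples.each { |n| puts "#{n}: #{virus_like?(n)}" }
```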
The verbatim field was not truly verbatim. It should be, because we want the ID to represent the string exactly as it was given.
Hi there. I'm interested in porting this to R. However, I'm not sure how the treetop gem is used here. Is treetop required at run time (seems like it might be, considering e.g. https://github.com/GlobalNamesArchitecture/biodiversity/blob/master/lib/biodiversity/parser/scientific_name_canonical.rb#L523), or just during development to create the Ruby classes/functions that are used in the biodiversity gem? Curious if I can just port your Ruby functions to R, but if treetop is required at run time that seems much harder, as I don't think there's anything like treetop in R.
Eupithecia cf. maestosa
Names like this are not parsed correctly because the parser does not handle lowercase genus information.
Monochamus (monohammus) galloprovincialis De Fluiter, 1950
Solution would be to accept names like this but ignore the content of the subgenus.
Previously, new lines were stripped as part of the verbatim handling. Now that verbatim stays unchanged, parserver needs to strip new lines itself.
Removing underscores is very unlikely to create false positives, as underscores are not allowed by the codes.
Paddy looked at the Parser 3.1.10 release and commented via email that apostrophes are not legal under the code and should be removed. There is another positive outcome: names where apostrophes have already been removed will now match as exact matches rather than fuzzy matches.
NoMethodError: undefined method `canonical' for #<Treetop::Runtime::SyntaxNode:0x00007fa29c67bb78>
... realize it's no longer supported, but thought I'd log it here.