ala-names-generator

This file outlines the steps that need to be performed to generate an ALA names list.

Some of the steps are manual.  It may be nice to incorporate them into a complete package.

1) Download the latest CSV files from http://biodiversity.org.au/dataexport/ and save them in the /data/bie-staging/anbg directory.

2) Rename the files, stripping the date and timestamp information from the file names.
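
The renaming in step 2 could be scripted. The sketch below is only an illustration: the exported file name pattern is not documented here, so it assumes a hypothetical `NAME-YYYYMMDD-HHMMSS.csv` form (for example `APNI_NAMES-20110530-101500.csv`), and the function names `strip_stamp` and `rename_exports` are made up for this example. Adjust the pattern to match the actual download names.

```shell
#!/bin/sh
# Hypothetical helper: strip a trailing "-YYYYMMDD-HHMMSS" stamp from a CSV name.
# The stamp format is an assumption; check it against the real export files.
strip_stamp() {
    printf '%s\n' "$1" | sed 's/-[0-9]\{8\}-[0-9]\{6\}\.csv$/.csv/'
}

# Rename every stamped CSV in the given directory (defaults to the staging dir).
rename_exports() {
    dir="${1:-/data/bie-staging/anbg}"
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue          # directory holds no CSV files
        base=$(basename "$f")
        new=$(strip_stamp "$base")
        # Only move files whose name actually changed.
        [ "$base" = "$new" ] || mv -- "$f" "$dir/$new"
    done
}
```

Sourcing the file and calling `rename_exports` with no argument would operate on /data/bie-staging/anbg; files whose names do not match the assumed pattern are left untouched.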

3) Run the ala-names-setup.sql script to reset the MySQL environment:
mysql -uroot -ppassword < ala-names-setup.sql

4) If the CoL name staging tables are not populated, run the import-col-names.sql script.

5) Import each of the DwC archives that will be used as additional names sources, e.g.: --dwc /data/bie-staging/names-lists/dwc-col

6) Decide how each of the DwC archives is going to be used in padding out the NSL.  At the moment this is a manual step that requires executing some SQL to define the type and level at which padding occurs.
Based on names_list 1 = AusMoss, 2 = AusFungi and 3 = CoL, here is the latest SQL that was executed to populate it:
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (1,'class','Equisetopsida','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (2,'kingdom','Chromista','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (2,'kingdom','Fungi','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (2,'kingdom','Protozoa','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'kingdom','Chromista','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'kingdom','Fungi','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'kingdom','Protozoa','merge');
INSERT INTO names_list_padding (id,taxon_rank,pad_type) VALUES (1,'kingdom','all');
INSERT INTO names_list_padding (id,taxon_rank,pad_type) VALUES (2,'kingdom','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Diphysciales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Archidiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Hypnales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Scouleriales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Bartramiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Encalyptales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Pottiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Buxbaumiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Hedwigiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Hookeriales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Ptychomniales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Splachnales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Gigaspermales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Sphagnales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Orthotrichales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Andreaeales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Polytrichales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Bryales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Grimmiales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Funariales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Hypnodendrales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Rhizogoniales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'order','Dicranales','merge');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'kingdom','Viruses','all');
INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (3,'kingdom','Bacteria','all');
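
Because many of the rows above differ only in the scientific name, statements of this shape could be generated rather than typed by hand. The following is a minimal sketch, not part of the project: `pad_insert` and `col_merge_orders` are hypothetical helper names, and the sketch assumes the values contain no single quotes, since no SQL escaping is done.

```shell
#!/bin/sh
# Hypothetical helper: print one names_list_padding INSERT statement.
# Arguments: names list id, taxon rank, scientific name, pad type.
# Assumes the values contain no single quotes (no SQL escaping is done).
pad_insert() {
    printf "INSERT INTO names_list_padding (id,taxon_rank,scientific_name,pad_type) VALUES (%s,'%s','%s','%s');\n" "$1" "$2" "$3" "$4"
}

# Print a 'merge' row against names list 3 (CoL) for each order name given.
col_merge_orders() {
    for order in "$@"; do
        pad_insert 3 order "$order" merge
    done
}
```

For example, `col_merge_orders Hypnales Bryales > padding.sql` would write two statements to a file that can be reviewed before it is passed to mysql.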

7) To generate the names list in the MySQL database, execute the following command:
java -Xmx1G -Xms1G -cp .:ala-names-generator-1.0-SNAPSHOT-assembly.jar au.org.ala.names.NamesGenerator --all

This command performs the following individual steps (each of these steps can be run separately using the corresponding argument):
a) --tnexp : Generates the sound expressions (taxa specific, to account for masculine/feminine) used to assist in identifying potential duplicates - 9.83255 minutes
b) --nsl : Calculates the depth and number of children for the NSL taxon concepts.  This aids in the selection of an "accepted" concept - total time: 38.595684 minutes (886,982 taxon concepts)
c) --init : Extracts the "Accepted" AFD and APC taxon concepts from the NSL.  It also associates synonyms with the concepts - 34 minutes
d) --caab : Inserts the missing names from a subset of CAAB names.  This is not available as a DwCA and thus needs specialised attention
e) --synexp : Populates the sound-like expressions for the synonyms. These are used to determine whether or not to include
f) --lists all : Incorporates the names lists that should have "all" their concepts included
g) --lists merge : Incorporates the names lists that should have their concepts "merge"d with the existing concepts.  This step needs to be performed after --lists all to take into account the concepts added in that step.
h) --apni : Extracts the APNI species level and below concepts for use in the names list.
i) --col : Pads out AFD with concepts that have records in Australia
j) --class : Generates the classification for all the ALA accepted concepts. This includes a denormalised view of the major Linnaean ranks.
k) --clean : Renames the MySQL names dumps so that there is no conflict in file names

o) --kingdoms : Incorporates all the CoL kingdoms that are missing from the NSL. This includes synonyms. (This step is obsolete/deprecated since support for the DwC lists was added.)

8) Dump the ALA accepted concepts and synonyms to file:
mysql -uroot -ppassword < create-dumps.sql
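
The individual stages listed for the NamesGenerator command above can also be run one at a time, for example to resume after a failure. The sketch below is a rough illustration under stated assumptions: the assembly jar is in the current directory as in the command above, and the `DRY_RUN` variable and the `run_stage` and `run_all_stages` names are hypothetical conveniences, not part of the tool. The obsolete --kingdoms stage is left out.

```shell
#!/bin/sh
# Stage arguments in the order they are listed for --all, one stage per line.
STAGES="--tnexp
--nsl
--init
--caab
--synexp
--lists all
--lists merge
--apni
--col
--class
--clean"

# Run one stage; with DRY_RUN=1 the command is only printed, not executed.
run_stage() {
    # $1 is deliberately unquoted so that "--lists all" splits into two arguments.
    set -- java -Xmx1G -Xms1G -cp .:ala-names-generator-1.0-SNAPSHOT-assembly.jar \
        au.org.ala.names.NamesGenerator $1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run every stage in order, stopping at the first one that fails.
run_all_stages() {
    printf '%s\n' "$STAGES" | while IFS= read -r stage; do
        # Redirect stdin so the java process cannot consume the stage list.
        run_stage "$stage" < /dev/null || exit 1
    done
}
```

Setting `DRY_RUN=1` before calling `run_all_stages` prints the eleven commands without running them, which is a cheap way to check the order before starting a run that takes well over an hour.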
