
Bioplatforms Australia - Operational taxonomic unit (OTU) query system

BPA-OTU is a web-based portal into Operational Taxonomic Unit (OTU) data, developed to access data from the Australian Microbiome.

System overview

  • The backend is implemented in Django, but uses SQLAlchemy for most database operations.
  • The frontend is implemented in React and uses Plotly for charts and Leaflet for maps. It has its own webserver, separate from Django, which serves the React assets and also proxies requests from the user interface through to Django.
  • In production, the browser session must be logged in to the configured CKAN instance (see settings.py). This is an administrative restriction; the system itself doesn't depend on CKAN authentication to function.
  • All data for the system is contained within a Postgres database which is loaded from a set of files by an ingest operation (see below). Some ancillary data is fetched using the Python ckanapi (e.g. sample site images and sample metagenome data). For this reason the docker containers (at least runserver and celeryworker) need to run with a valid CKAN_API_KEY environment variable (see ./.env_local and ./docker-compose.yml).
  • It depends on another Bioplatforms Australia project called bpa-ingest (maintained externally). The version of bpa-ingest used is maintained in the runtime-requirements.txt file. When updating the AM metadata schema, the bpa-ingest repository requires changes. These changes will be associated with a git tag by the bpa-ingest team for the new version. The entry in runtime-requirements.txt must be updated to use the version at this new tag. Note: This dependency was handled previously as a git submodule.
  • For development, Django runs in a Docker container, while the frontend webserver is started from a shell prompt outside of the container. The container mounts ./ as a volume, which means that Django will monitor all of its *.py files and restart when they are updated outside of the container.
  • The production instance is hosted at https://data.bioplatforms.com/
  • For production, both Django and the frontend webserver run in Docker containers.
  • Deployment into production from github is performed by Bioplatforms Australia using CircleCI

Development environment setup

Backend (Django)

  • Install docker and compose

    • Note: the Docker compose plugin (docker compose) does not seem to work with the docker-compose-build.yml file, but the older executable (docker-compose) does work
    • On the docker compose install page, there is a note that Compose V1 won't be supported anymore from the end of June 2023 (which may affect these steps)
  • Generate ./.env_local. This should contain KEY=value lines. See ./.env for keys. This must have a valid CKAN_API_KEY so that site images and sample metagenome data can be fetched during development. You can use your personal CKAN_API_KEY in the development environment. This key can be found on the profile page after logging on to the bioplatforms.com data portal.

    Note that .env_local is used to supply environment variables to the backend running in a docker container. Don't confuse this with the various .env.* files that can be used by React to supply environment variables to the frontend.

    In particular, the only purpose of ./.env is to document the available keys for manual generation of ./.env_local.

    Ensure that other keys have a value set so the page will work (dummy values are fine). In particular, CKAN_DEVEL_USER_EMAIL and BPAOTU_AUTH_SECRET_KEY need values, and possibly others.

  • Build the docker images

    docker-compose -f docker-compose-build.yml build base dev

  • Start all of the containers

    docker-compose up

    There are 4 containers: runserver, db, cache, celeryworker

    If the local machine already has a PostgreSQL server instance, it will need to be stopped, since the ports will conflict (sudo service postgresql stop)

    This will start the docker containers attached to the current terminal process. If you want the containers to persist running after closing the terminal, start the containers with the -d argument:

    docker-compose up -d

    And then manage the containers with the usual docker-compose commands (docker-compose ps, docker-compose stop, docker-compose start)

Ingest

Once the backend is operational it's possible to do a data ingest. This is described in detail in the Input data description section. For quick reference:

/path/to/bpaotu is the app root (i.e. where docker-compose.yml is)

  • Extract the ingest archive to /path/to/bpaotu/data/dev

    tar -zxvf </path/to/dataarchive.tar.gz> -C /path/to/bpaotu/data/dev

  • Update the sample contextual database for the import

    cp /path/to/bpaotu/data/dev/$ingest_dir/db/AM_db_* /path/to/bpaotu/data/dev/amd-metadata/amd-samplecontextual/

  • Run the otu_ingest management task on the app container

    docker-compose exec runserver bash

    /app/docker-entrypoint.sh django-admin otu_ingest $ingest_dir $yyyy-mm-dd --use-sql-context --no-force-fetch

    Where: $ingest_dir is the directory of the extracted ingest archive (note: tab complete will work here), $yyyy-mm-dd is the date of the ingest (i.e. today's date)

Frontend (React)

These steps are performed in a separate terminal, i.e. not in the container, and from the frontend/ directory.

  • Install node

    • The required version is in frontend/package.json under the "engines" property
    • Most systems will already have a version of node installed. The easiest way to install the required version for this app is to use nvm (Node Version Manager)
    • Once nvm is installed, install the required version of node, e.g.: nvm install x.y.z
    • There is also a file in the frontend/ directory called .nvmrc that specifies the version of node to be used for this project in the event that the local system has multiple versions of node.
  • Install yarn

    • This is the preferred package manager for this project

    npm install -g yarn

  • Install node modules for the web app

    • Run yarn install to install the node modules
  • Start the React frontend

    • Run yarn start
    • The page will be accessible on port 3000 by default

Input data description

BPA-OTU loads input data to generate a PostgreSQL schema named otu. The importer functionality completely erases all previously loaded data.

Three categories of file are ingested:

  • contextual metadata (extension: .xlsx for Excel file [default] or .db for SQLite DB)
  • taxonomy files (extension: .taxonomy)
  • OTU abundance tables (extension: .txt)

Note that /data/dev is a mount point in a Docker container. See ./docker-compose.yml

By default the contextual metadata will be downloaded during the ingest operation, or it can be provided locally as either an SQLite database or an Excel spreadsheet:

./data/dev/amd-metadata/amd-samplecontextual/*.db # sqlite database
./data/dev/amd-metadata/amd-samplecontextual/*.xlsx # Excel spreadsheet

See "Additional arguments" below for more context on these.

Abundance and taxonomy files must be placed under a base directory for the particular ingest $dir, which is under the mount point for the Docker container, structured as follows:

./data/dev/$dir/$amplicon_code/*.txt.gz
./data/dev/$dir/$amplicon_code/*.$classifier_db.$classifier_method.taxonomy.gz

$classifier_db and $classifier_method describe the database and method used to generate a given taxonomy. They can be arbitrary strings.

The ingest is then run as a Django management command. To run this you will need to shell into the runserver container

cd ~/bpaotu # or wherever docker-compose.yml lives
# either this
docker-compose exec runserver bash
# or this
docker exec -it bpaotu_runserver_1 bash

## Either ingest using local sqlite db file for contextual metadata...
root@05abc9e1ecb2:~# /app/docker-entrypoint.sh django-admin otu_ingest $dir $yyyy_mm_dd --use-sql-context --no-force-fetch

## or download contextual metadata and use that:
root@420c1d1e9fe4:~# /app/docker-entrypoint.sh django-admin otu_ingest $dir $yyyy_mm_dd

If docker-compose exec runserver bash does not work, find the id of the container with docker container ls (the system needs to be running for this to work, i.e. with docker-compose up) and then run docker exec -it 2361ab2339af bash (the container id will differ on your machine)

$dir is the base directory for the abundance and taxonomy files.

$yyyy_mm_dd is the ingest date, e.g. 2022-01-01

Example usage:

Get data file, unarchive and copy data to ./data/dev, and ingest data using a particular date:

cd ./data/dev
tar -xvzf </path/to/dataarchive.tar.gz> ./

cd ~/bpaotu # or wherever docker-compose.yml lives
docker-compose exec runserver bash
/app/docker-entrypoint.sh django-admin otu_ingest AM_data_db_submit_202303211107/ 2023-11-29 --use-sql-context --no-force-fetch

Additional arguments:

NOTE: the order is important if supplying both of these arguments

  • --use-sql-context: Add this to use contextual metadata file in format of SQLite DB instead of XLSX file (default: use XLSX file)
  • --no-force-fetch: Add this to avoid fetch of contextual metadata file from server and instead use the one available in local folder (default: fetch from server)

Contextual Metadata

This file describes sample specific metadata. The current schema of the contextual metadata can be found here

Taxonomy files

A gzip-compressed tab-delimited file with extension .taxonomy.gz

The first row of this file must contain a header. The required header fields are:

#OTU ID\tkingdom\tphylum\tclass\torder\tfamily\tgenus\tspecies\tamplicon\ttraits

or

#OTU ID\tkingdom\tsupergroup\tdivision\tclass\torder\tfamily\tgenus\tspecies\tamplicon\ttraits

Each column value is an arbitrary character string, with the following restrictions:

  • #OTU ID: a string describing the OTU (GATC string, md5sum or string prefixed with mxa_)
  • kingdom...species: taxon as a text string, e.g., d_Bacteria
  • amplicon: text string (e.g. 16S, A16S, 18S, ITS, ...)
  • traits: text string (multiple traits are comma separated)

NB: Taxonomic ranks must be forward filled with the last known field assignment if empty (e.g. d__bacteria, d__bacteria_unclassified, d__bacteria_unclassified, d__bacteria_unclassified, d__bacteria_unclassified, d__bacteria_unclassified, d__bacteria_unclassified)

Example:

hou098@terrible-hf:~/bpaotu$ zcat  data/dev/202203050842/16S/16S_PWSW_seqs_listSET_OTU_taxon_20220304_withAMPLICON_FAPROTAXv124.silva132.SKlearn.taxonomy.gz  | head -4
#OTU ID confidence  kingdom phylum  class order family  genus species amplicon  traits
GATTGGCTCACGGACGCAAAACCACCAAAAAACACGTGACGTTACTGGTTGTCCGTCCTTTTGGTTTTTTTGCCCTTCTATGGTAATGCTATGAGTGCTTTTTGCAAAATGCTGCTCTGGGATTCGCTCCCGAACGCAACGCGCTACCTATTACTACTATCATAATTACATCACGCAAATTCAGGAGCTCATCAATGGTGAGCCAGCCAAGTTCATTCAAGATAGGTGAAATATGATCAAATTTCTTAGTATTAGTCAAAATACGGGCAGCAAAATTTTGTATAAGTTGTAGTTTATGAACATTATCCTTTGAAGTCCCAGACCATACAGTAGAACAGTAAAATAATTTACTAAAAACTAGTGAATTCAAAATGGTGTTCAATACCTCTCTAGAAAATAGGTGACGGACTCTATTTACTTGACATAAAGTAGATAAAAGGGAAGAACTAAGTGATGTAACGTAGTCATTAAAGTTAAAGTTCGAGTCTAGCAGAAGCCACGGGTTTTAACTCTTGACCAAGAAAAGGCACAGTGACATCTGGGAGCTGAGATAGGAGCTGTCTTACTCCGAA  0.4340600531226606  d__Unassigned d__Unassigned_unclassified  d__Unassigned_unclassified  d__Unassigned_unclassified  d__Unassigned_unclassified  d__Unassigned_unclassified  d__Unassigned_unclassified  27f519r_bacteria
AACGAACGCCGGCGGCGTGCTTAACACATGCAAGTCGAACGCGAAAGCCTGGGCAACTGGGCGAGTAGAGTGGCGAACGGGTGAGTAATACGTGAGTAACCTGCCCTTGAGTGGGGAATAACTCCTCGAAAGGGGAGCTAATACCGCATAAGACCACGACCCCGATGGGAGTTGCGGTCAAAGGTGGCCTCATGCACCAGAGCGTTTGGGCACAGATTCTGCGTGCCGGAAAAGAATCTGTACCCCAGCGCTTTGTCAGTGAAGCTATCGCTTGAGGAGGGGCTCGCGGCCCATCAGCTAGTTGGTAGGGTAATGGCCTACCAAGGCGACGACGGGTAGCTGGTCTGAGAGGACGACCAGCCACACGGGAATTGAGAGACGGTCCCGACTCCTACGGGAGGCAGCAGTGGGGAATCTTGGGCAATGGGGGAAACCCTGACCCAGCGACGCCGCGTGGGGGATGAAGGCCTTCGGGTTGTAAACCCCTGTTCGGTGGGACGAACATCTTCCCATGAACAGTGGGAAGATTTGACGGTACCACCAGAGTAAGCCCCGGCTAACTCCGTGC  0.9999802845765206  d__Bacteria d__Bacteria_unclassified  d__Bacteria_unclassified  d__Bacteria_unclassified  d__Bacteria_unclassified  d__Bacteria_unclassified  d__Bacteria_unclassified  27f519r_bacteria
GATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGTAACAGGCTTTCACTGTTTACTGCTCTTCTTTCGATATGGAGCAAAGGTTTTCCAAACCTTATTCCTAACGGAGGAGTATCATCTCGTACTTTGACCTAGTCAAGATACGAAATGTAGAGAAGTGAAGAGTGAAAGTGCTGACGAGTGGCGGACGGCTGAGTAACGCGTGGGAACGTGCCCCAAAGTGAGGGATAAGCACCGGAAACGGTGTCTAATACCGCATATGATCTTCGGATTAAAGCAGAAATGCGCTTTGGGAGCGGCCCGCGTTGGATTAGGTAGTTGGTGAGGTAAAGGCTCACCAAGCCGACGATCCATAGCTGGTCTGAGAGGATGACCAGCCAGACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATCTTCCACAATGGGCGAAAGCCTGATGGAGCAACGCCGCGTGCAGGATGAAGGCCTTAGGGTCGTAAACTGCTTTTATTAGTGAGGAATATGACGGTAACTAATGAATAAGGGTCGGCTAACTACGTGC 0.8979041295444753  d__Bacteria p__Patescibacteria  c__Saccharimonadia  o__Saccharimonadales  f__Saccharimonadales  g__Saccharimonadales  g__Saccharimonadales_unclassified 27f519r_bacteria
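The forward-filling rule described above can be sketched in Python (an illustrative helper, not the actual ingest code):

```python
def forward_fill_ranks(ranks):
    """Forward-fill empty taxonomy ranks with the last known assignment,
    suffixed with '_unclassified', as in the NB and examples above.
    (Illustrative helper, not the actual bpaotu ingest code.)"""
    filled = []
    fill = "Unassigned_unclassified"  # fallback if even kingdom is empty
    for rank in ranks:
        if rank:
            filled.append(rank)
            # remember what an empty rank below this one should become
            fill = rank if rank.endswith("_unclassified") else rank + "_unclassified"
        else:
            filled.append(fill)
    return filled

# kingdom known, everything below it unassigned:
forward_fill_ranks(["d__bacteria", "", "", "", "", "", ""])
```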

Abundance files

A gzip-compressed tab-delimited file with the extension .txt.gz

The first row is a header, with the following format:

#OTU ID\tSample_only\tAbundance\tAbundance_20K

Each column has the following format:

  • #OTU ID: text string, corresponding to the strings in the taxonomy file
  • Sample_only: the identifier for the sample ID for which this column specifies abundance
  • Abundance (floating point) : the abundance of the OTU in the sample
  • Abundance_20K (integer): the abundance of the OTU in the sample after randomly sub-sampling 20,000 reads.

Missing values for Abundance or Abundance_20K are indicated by empty strings. Abundance can be the last field on the line if Abundance_20K is missing.

Example:

#OTU ID Sample_only Abundance Abundance_20K
AAAAGAAGTAAGTAGTCTAACCGCAAGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGG  21646 17
AAAAGAAGTAAGTAGTCTAACCGTTTACGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGG  21653 14
AAAAGAAGTAGATAGCTTAACCTTCGGGAGGGCGTTTACCACTTTGTGATTCATGACTGGGG  21644 70  2
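Parsing one such row, with the optional trailing Abundance_20K, might look like this (an illustrative sketch, not the actual importer code):

```python
def parse_abundance_line(line):
    """Parse one tab-delimited abundance row. Abundance_20K may be empty
    or absent entirely, as described above: missing values are empty
    strings, and Abundance can be the last field on the line.
    (Illustrative sketch, not the actual bpaotu importer code.)"""
    fields = line.rstrip("\n").split("\t")
    otu_id, sample_only = fields[0], fields[1]
    abundance = float(fields[2]) if len(fields) > 2 and fields[2] else None
    abundance_20k = int(fields[3]) if len(fields) > 3 and fields[3] else None
    return otu_id, sample_only, abundance, abundance_20k
```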

Database visualisation

To generate an SVG diagram of the database schema, install the postgresql-autodoc and graphviz packages (Ubuntu), and then

PGPASSWORD=$db_password postgresql_autodoc -d webapp -h localhost -u webapp -s otu
dot  -Tsvg webapp.dot  > webapp.svg

Database login

Start a bash terminal on the db container and log into psql with the webapp role:

psql -U webapp

Then set the search path to the "otu" schema at the psql prompt

SET search_path TO otu;

Testing

There is a script to test the output of the OTU and Contextual Download feature. This counts and displays the number of unique OTU hashes in the OTU.fasta file, the number of unique Sample IDs in the contextual.csv file, and for each domain .csv file, counts and displays the number of unique OTU hashes and unique Sample IDs. The results can then be inspected to ensure they are as expected for the given search.

To run, download a search, extract the results to a directory, cd to that directory and run the script:

. /path/to/bpaotu/test/verify-otu-contextual-export.sh
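The kind of counting the script performs can be sketched in Python (illustrative only; the shell script's own commands differ, and column names such as sample_id here are assumptions):

```python
import csv

def unique_values(lines, column):
    """Unique values in one CSV column, e.g. Sample IDs in contextual.csv
    or OTU hashes in a domain .csv. (Illustrative sketch of the check the
    script performs; the real column names may differ.)"""
    return {row[column] for row in csv.DictReader(lines)}

# usage with an in-memory CSV; a real check would open the extracted file
rows = ["sample_id,abundance", "7031,1", "7031,2", "7032,3"]
count = len(unique_values(rows, "sample_id"))
```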

Deployments

Bioplatforms Australia - Australian Microbiome Search Facility

Licence

Copyright © 2017, Bioplatforms Australia.

BPA OTU is released under the GNU Affero GPL. See source for a licence copy.

Contributing

  • Fork next_release branch
  • Make changes on a feature branch
  • Submit pull request


bpaotu's Issues

Drop empty contextual metadata columns from the CSV export

If you subset the data (e.g. to just environment = Soil), many of the contextual metadata fields become irrelevant and will be blank for all samples.

We already deal with this in the biom export, by doing a first pass to determine the interesting fields, and then writing only those fields into the BIOM file. We need to do the same thing for the CSV export code.

Take a look at the code and have a go at making the change. If you look at biom.py you'll see how we approach the issue for the other output format.
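The two-pass approach described above might be sketched like this (illustrative, not the actual export code; field handling is simplified):

```python
import csv
import io

def write_nonempty_columns(rows, out):
    """Two-pass CSV export sketch: first determine which contextual
    fields are non-blank for at least one sample, then write only
    those columns. (Illustrative; not the actual bpaotu export code.)"""
    rows = list(rows)
    keep = set()
    for row in rows:  # pass 1: find the interesting fields
        keep.update(k for k, v in row.items() if v not in ("", None))
    cols = sorted(keep)
    writer = csv.DictWriter(out, fieldnames=cols, extrasaction="ignore")
    writer.writeheader()
    for row in rows:  # pass 2: write rows, dropping blank-everywhere columns
        writer.writerow(row)
    return cols

# usage: column "b" is blank for all samples, so it is dropped
buf = io.StringIO()
write_nonempty_columns([{"a": "1", "b": ""}, {"a": "2", "b": ""}], buf)
```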

Improve filenames for data downloads

Make the filenames for the Zip export of OTU data (CSV and BIOM) meaningful.

  • improve the outer filename (the .zip)
  • improve the inner filename (only applicable to BIOM)

Perhaps we should just include the date and time in ISO format, with a sensible prefix. So, 'BiomExport-20180524T1406.zip'. Note we drop the colons from the ISO format as they'll cause problems on macOS.
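A sketch of such a filename builder (prefix and format as proposed above; the helper name is hypothetical):

```python
from datetime import datetime

def export_filename(prefix="BiomExport", when=None):
    """Build a download name like 'BiomExport-20180524T1406.zip'.
    The compact timestamp deliberately avoids colons, which cause
    problems on macOS. (Hypothetical helper sketching the proposal.)"""
    when = when or datetime.now()
    return "{}-{}.zip".format(prefix, when.strftime("%Y%m%dT%H%M"))
```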

Biom Format header incorrect

Reported by Jeff:
Modify BIOM format produced from CCG so that the value of "format" = "Biological Observation Matrix 1.0.0" (and not "1.0.0" as it is now)

Contextual combo box bug

Steps to reproduce:

  • add a contextual filter with a combo box (e.g. Australian Soil Classification)
  • leave as defaults
  • press the search button
  • error will be shown

I've investigated, the underlying issue appears to be that when we emit selectContextualFilter we add a clone of EmptyContextualFilter to the redux store. This doesn't actually match what we're displaying in the UI - in the UI we're showing the first combo box option as being selected.

I'm not sure the best approach to fix this. We'll probably need a different template EmptyContextualFilter for each type of filter (string, combo box / ontology, float range, ...)

@sztamas when you're back could you maybe suggest a fix?

Incompatible BIOM file for phyloseq

Phyloseq Installation
https://joey711.github.io/phyloseq/install.html
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')

The phyloseq uses import_biom to import BIOM file.
e.g.,
x<-import_biom("GP.biom")

An issue occurs when using the same feature to import the following biom file.

x<-import_biom("json_biom_file.biom")

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Warning messages:
1: In parseFunction(i$metadata$taxonomy) : Empty taxonomy vector encountered.
2: In parseFunction(i$metadata$taxonomy) : Empty taxonomy vector encountered.
3: In parseFunction(i$metadata$taxonomy) : Empty taxonomy vector encountered.
4: In parseFunction(i$metadata$taxonomy) : Empty taxonomy vector encountered.
....

Some phyloseq users have posted the same issue here.
joey711/phyloseq#674

Prod build has broken

Looks like it's related to commit 70a78ad.

The egg-link from /env to bpaotu in /app references non-existent path /data/app.

This is affecting staging.

BIOM import: missing data

A lot of the bacteria data doesn't seem to be making it into the database. The methodology is unchanged, we just have more input data into Andrew's pipeline, so we should have more data, or the same amount, but nothing that was there before should be gone now.

Example missing data point:

Sample ID: 7031
OTU: AACAAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAGGAAGTCCCTTCGGGGATTGTACTAGTGGCGTACGGGTGAGTAACGCGTGGATAATCTTCCTTAAGGTGGGGAATAACTAGTCGAAAGATTAGCTAATACCGCATAAGACCACAGGCTCTTCGGAGCAAGGGGTTAAAGCCGAAAGGCGCCATAAGATGAGTCTGCGCCCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGACGATCGGTAGCTGGTCTGAGAGGACGGCCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGACGAAGGCCTTCGGGTTGTAAAGCTCTGTTCTCAGGGAAAAAGAAAGTGATTGTACCTGAGAAGAAAGGACCGGCTAACTTCGTGC
Count: 3

webapp=# select otu.code, sample_otu.count from sample_otu join otu on otu.id=sample_otu.otu_id join ontology_otuamplicon a on a.id=otu.amplicon_id where sample_id=7031 and otu.amplicon_id=3 and otu.code='AACAAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAGGAAGTCCCTTCGGGGATTGTACTAGTGGCGTACGGGTGAGTAACGCGTGGATAATCTTCCTTAAGGTGGGGAATAACTAGTCGAAAGATTAGCTAATACCGCATAAGACCACAGGCTCTTCGGAGCAAGGGGTTAAAGCCGAAAGGCGCCATAAGATGAGTCTGCGCCCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGACGATCGGTAGCTGGTCTGAGAGGACGGCCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGACGAAGGCCTTCGGGTTGTAAAGCTCTGTTCTCAGGGAAAAAGAAAGTGATTGTACCTGAGAAGAAAGGACCGGCTAACTTCGTGC';
 code | count 
------+-------
(0 rows)

this vs the old dataset:

bpaotu=# select otu.code, sample_otu.count from sample_otu join otu on otu.id=sample_otu.otu_id join ontology_otuamplicon a on a.id=otu.amplicon_id where sample_id=7031 and otu.amplicon_id=3 and otu.code='AACAAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAGGAAGTCCCTTCGGGGATTGTACTAGTGGCGTACGGGTGAGTAACGCGTGGATAATCTTCCTTAAGGTGGGGAATAACTAGTCGAAAGATTAGCTAATACCGCATAAGACCACAGGCTCTTCGGAGCAAGGGGTTAAAGCCGAAAGGCGCCATAAGATGAGTCTGCGCCCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGACGATCGGTAGCTGGTCTGAGAGGACGGCCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGACGAAGGCCTTCGGGTTGTAAAGCTCTGTTCTCAGGGAAAAAGAAAGTGATTGTACCTGAGAAGAAAGGACCGGCTAACTTCGTGC';
 code | count
------+-------
 AACAAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGAGGAAGTCCCTTCGGGGATTGTACTAGTGGCGTACGGGTGAGTAACGCGTGGATAATCTTCCTTAAGGTGGGGAATAACTAGTCGAAAGATTAGCTAATACCGCATAAGACCACAGGCTCTTCGGAGCAAGGGGTTAAAGCCGAAAGGCGCCATAAGATGAGTCTGCGCCCGATTAGCTAGTTGGTGAGGTAATGGCTCACCAAGGCGACGATCGGTAGCTGGTCTGAGAGGACGGCCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATATTGCACAATGGAGGAAACTCTGATGCAGCGACGCCGCGTGAGTGACGAAGGCCTTCGGGTTGTAAAGCTCTGTTCTCAGGGAAAAAGAAAGTGATTGTACCTGAGAAGAAAGGACCGGCTAACTTCGTGC |     3
(1 row)

Contextual metadata query bug

Reported by Anna Fitzgerald:

If you query the database for "Broad Land Use" - "Conservation and Natural Environments - National Park" and then hit search, no results are found.

That's suspicious - we need to look into why there are no matches.

Ideally, the hierarchy in this ontology should be respected when querying. So searching for "Conservation and Natural Environments" should return all samples which have that key set, or any of the child keys. This may mean that we need to change the ontology columns from integer type to a set type, so we can do "IN" queries on the column.
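One possible shape for the hierarchy expansion, using a plain parent-to-children mapping (a hypothetical structure; the real ontology tables in bpaotu differ):

```python
def descendants(term, children):
    """Expand an ontology term to itself plus all of its descendants,
    given a parent -> children mapping. The expanded list can then drive
    an SQL "IN (...)" filter instead of an equality test on one term.
    (Hypothetical structure; the real ontology storage differs.)"""
    result = [term]
    stack = list(children.get(term, []))
    while stack:
        t = stack.pop()
        result.append(t)
        stack.extend(children.get(t, []))
    return result

hierarchy = {"Conservation and Natural Environments": ["National Park"]}
terms = descendants("Conservation and Natural Environments", hierarchy)
```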

Local dev: API endpoints returning 403

Steps to reproduce:

  • run with CKAN integration turned on, on the same host and port (e.g. localhost:8000)
  • disable CKAN integration (or swap from dockercompose-bpa-ckan to plain bpaotu)
  • browser will still be sending the cookie through; this seems to cause 403s

Workaround is to use private browsing mode; then everything works fine.

This is a low priority as it only affects developers.

BIOM: modify format to work well with Phyloseq

  • sort the metadata fields
  • always the same fields in each entry, even if the value is 'null'
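A sketch of the normalisation proposed above (the helper name is hypothetical; the real BIOM writer differs):

```python
def normalise_metadata(samples, null=None):
    """Give every sample's metadata entry the same, sorted set of fields,
    filling absent ones with null, so each BIOM entry has identical keys.
    (Hypothetical helper sketching the fix proposed above.)"""
    keys = sorted({k for s in samples for k in s})
    return [{k: s.get(k, null) for k in keys} for s in samples]
```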

Unsorted column header of metadata table

The first thing I have noticed is that the key order (the column header of the metadata, in this case) is not alphabetically sorted when extracting using json.load in Python, and python2 and python3 do not produce the same result.
Having the column headers in alphabetical order is important for feeding the phyloseq wrapper the correct column number when creating an NMDS plot.
Regarding this finding, I have posted an issue on the biom-format GitHub after going over the Python script (parse.py) in the biom-format repository.

Missing values in metadata table

The next thing I found is that the metadata table contains missing values, which is reasonable in itself, but it produces inconsistent key/value pairs for each sample inside the JSON structure.
These differing key/value pairs may be what confuses phyloseq when the metadata is missing.

Marine data ingest truncated

  • only a few hundred samples have contextual metadata
  • this causes most of the abundance data to be excluded from the ingest

Current user (email address) no longer shown

This happened when we moved to React, it should be simple to add this feature back in. I'll take a look, it'd be good to get across the new React stuff anyway!

Reported by Jeff from QCIF.

Sample map: samples off coast of NZ misplaced

There's a line of samples off the east coast of NZ. They're not showing up in the default zoom, but if you zoom out enough so that the world map wraps, you'll see them appear. I wonder if it's some issue with them being too close to the wrapping point in longitude?

Show contextual filters in Search Results Table

When samples are returned from a query of the BPA Data Portal, in the summary list, also include a column for each of the contextual filters included in the search

This makes it easier for the user to review the results and satisfy themselves that the results are correct.

e.g. if one stipulates a search of vegetation type = grassland and vegetation type = forest, a column called "vegetation type" should appear and only include the values of grassland and forest.

BIOM export: ontologies not supported

We're writing out the ontology numbers (which are meaningless) into the BIOM file. We should instead write out the textual version, by doing an ontology lookup.

BIOM export support

Export data in the BIOM format (hdf5)

http://biom-format.org/

We can't stream BIOM data, so we'll need to prepare it on the server and then email the person a link.

Start off by reviewing the BIOM format docs, and then come back with a proposed way to add this support. We must keep the current CSV format support, this is an additional feature.

Note this is required for the RDC project to move forward.

Add webpack hot-reloading for the frontend code

Set up webpack and django to have hot-reloading while developing.

Also, add a hash to the js bundles generated by webpack, so we invalidate the cache when publishing a new version of the app.

contextual view: broad land use doesn't work

Add broad land use, get this traceback

runserver_1 | sqlalchemy.exc.InvalidRequestError: Could not find a FROM clause to join from. Tried joining to <class 'bpaotu.otu.SampleLandUse'>, but got: Can't determine join between 'Join object on sample_context(140241661005216) and ontology_sampleaustraliansoilclassification(140241661226288)' and 'ontology_samplelanduse'; tables have more than one foreign key constraint relationship between them. Please specify the 'onclause' of this join explicitly.

Map improvements

  • show sample contextual metadata when a point is clicked upon in the map
  • if there are two points at the same (lat, lng) but different depths, show them in as tabs, and only show a single point

Taxonomic filters either do not populate, or populate slowly

The kingdom filters always work, but the second-level is either too slow, or not working at all in some cases. We need to track this down and resolve.

We are receiving bug reports from users on this, and it happened to me during a demo, so it's becoming urgent.

BPAOTU: Map on separate page

  • make the map reachable via a direct link, e.g. /map (relative to the base URL for the app)

Just show all the samples.

Speed up BIOM export

The BIOM export is working, but is a little slow.

Investigate performance improvements.

Most of the data output should be coming from the abundance_tbl function; we should heavily optimise this function. I suspect the call to format() at line 96 of biom.py is actually more expensive than we'd like. We could change the calls to .index() to use an O(1) lookup table rather than the current O(N) approach.
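The suggested lookup optimisation, illustrated (names are illustrative, not taken from biom.py):

```python
# Replace repeated list.index() calls (O(N) each) with a dict built once,
# giving O(1) lookups thereafter. Names are illustrative, not from biom.py.
def make_index(items):
    return {item: i for i, item in enumerate(items)}

sample_ids = ["7031", "21646", "21653"]
idx = make_index(sample_ids)
assert idx["21653"] == sample_ids.index("21653")  # same answer, O(1) lookup
```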

BPAOTU: link to Galaxy history once upload complete

Once we've uploaded the exported BIOM data to Galaxy, provide a link to that history to the user. Probably use the existing "it worked" status message, but remove the timeout and add the link over to Galaxy.

Add BLAST to UI

  • add text box (sequence to search) and button for BLAST under taxonomy box.
  • enable text box only if Amplicon is provided in the search form.
