swabseq-analysis

Turns kkovary/swabseq_aws into a containerized Flask API with authentication.

Original code has been edited in the following ways:

  • removed the option to push results to GitHub
  • results are instead written to local disk, at a directory path passed in by the caller (the Flask API wrapper uses this to write to a random temp directory)
  • added lab-grid/script-runner
  • added a Dockerfile that logs into BaseSpace so that files can be downloaded, and that installs the necessary R and Python dependencies

Development

To run the server locally:

docker-compose up --build

To test, run two scripts: one to generate results for the demo sequencing data, and one to retrieve those results (use the .ps variants only if you are using Microsoft PowerShell):

./test_unauthenticated.sh
<record the id returned>
<wait several minutes until server stops printing processing messages>
./test_unauthenticated-results.sh <id> > demo_output.json

Before running for the first time, create a .env file:

cp example.env .env

Before running for the first time, if you will pull sequencing data from BaseSpace, generate a default.cfg file:

docker-compose run --rm server bs auth \
    --scopes "BROWSE GLOBAL,READ GLOBAL,CREATE GLOBAL,MOVETOTRASH GLOBAL,START APPLICATIONS,MANAGE APPLICATIONS" \
    --force

This will create a default.cfg file in the ./.basespace directory. Future calls to docker-compose up will use the credentials saved in the ./.basespace directory.

Usage instructions for the original demo script:

  • Rscript countAmpliconsAWS.R --basespaceID [ID for run] --threads [number of threads for running bcl2fastq]
  • The --basespaceID identifies the run on BaseSpace. The raw data for that run is downloaded, demultiplexed with bcl2fastq, and then analyzed.
  • The analysis generates a PDF of run info and results, along with a csv file containing the unique DNA barcodes for each sample, the location of that sample on 96 and 384 well plates, the number of counts for the targeted amplicons, and the classification of the sample (COVID positive, COVID negative, or inconclusive/failed sample).

swabseq-analysis's Issues

version suggestion

ARG SERVER_VERSION=local+container

From Jamie: "I recommend not setting this here and force that the value gets passed in directly and force an error. This will ensure that the health check will have the right version."
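One way to get the "force an error" behaviour on the server side is to refuse to start (or to fail the health check) when no version was passed in. The sketch below is illustrative only: the function name `get_server_version` is hypothetical, and it assumes the build argument is exposed to the process as an environment variable named SERVER_VERSION.

```python
import os


def get_server_version(environ=None):
    """Return the SERVER_VERSION value, raising if it was never passed in.

    Assumption: the Docker build argument is forwarded to the container
    as an environment variable with the same name.
    """
    env = os.environ if environ is None else environ
    version = env.get("SERVER_VERSION", "").strip()
    if not version:
        # No silent fallback such as "local+container": fail loudly instead.
        raise RuntimeError("SERVER_VERSION is not set; pass it in explicitly")
    return version
```

With no default in place, a health check that reports this value can only ever report a version that was set deliberately.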

Only return necessary files

The QA/QC pdf is necessary, and LIMS_results.csv contains the per-DNA-barcode results. The run_info.csv data is already in the pdf, and the remaining outputs are not needed to QA/QC the run.
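A minimal sketch of such a filter is shown below. It is not the project's implementation: the function name `select_result_files` is hypothetical, and since the exact name of the QA/QC pdf is not stated here, the sketch assumes that any file with a .pdf extension should be kept.

```python
from pathlib import PurePath


def select_result_files(filenames):
    """Keep only the files needed to QA/QC a run.

    Assumption: the QA/QC report is the only PDF the pipeline writes,
    so every *.pdf is kept along with LIMS_results.csv.
    """
    keep = []
    for name in filenames:
        path = PurePath(name)
        if path.suffix.lower() == ".pdf" or path.name == "LIMS_results.csv":
            keep.append(name)
    return keep
```

For example, given the names `qc_report.pdf`, `LIMS_results.csv`, and `run_info.csv`, only the first two would be returned.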

New arguments for pipeline

I've added two new arguments to the pipeline that I think could be useful:

--season

  • Here we can specify winter, spring, summer, or fall
  • This allows the pipeline to pull in the correct forward and reverse barcode information so that the plots in the PDF file will be correct

--debug

  • There are some outputs and plots that take extra time to generate and may not always be necessary.
    • We can discuss if this is how we want to move forward or if we just want all analyses to be done all of the time.
  • If --debug TRUE is passed, the pipeline will carry out these extra steps if there is a potential issue with the run.
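As an illustration of how a wrapper might pass the two new arguments through to the R script, here is a small sketch. The helper name `build_pipeline_command` is hypothetical. The flags themselves are the ones described above, and the sketch assumes that leaving out --debug is equivalent to not requesting the extra steps.

```python
SEASONS = ("winter", "spring", "summer", "fall")


def build_pipeline_command(basespace_id, threads, season, debug=False):
    """Build the argument list for countAmpliconsAWS.R (illustrative only)."""
    if season not in SEASONS:
        raise ValueError(f"season must be one of {SEASONS}, got {season!r}")
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    command = [
        "Rscript", "countAmpliconsAWS.R",
        "--basespaceID", str(basespace_id),
        "--threads", str(threads),
        "--season", season,
    ]
    if debug:
        # Assumption: omitting the flag means the extra steps are skipped.
        command += ["--debug", "TRUE"]
    return command
```

The resulting list can be handed to `subprocess.run` without going through a shell, which avoids quoting problems with the run ID.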

Add classification for control success / control fail

The following changes should be made to the Sample Categorization plot in the qc_report:

  1. Add control success / control fail logic for positions A1 and B1 so that it's easy to see if there was an issue
  2. Update color scheme so that COVID positive and COVID negative classifications are more striking, along with control wells.

Test wrapping the Python script into the R script using reticulate

At the moment there are two major scripts in the pipeline, countAmpliconsAWS.R and dict_align.py.

Currently, countAmpliconsAWS.R runs dict_align.py towards the beginning of the pipeline in order to align and count the amplicons. When dict_align.py finishes, it saves its output as results.csv, which countAmpliconsAWS.R then loads into memory for downstream analysis. This write/read step takes extra time, and it would be better to keep the results in memory instead of writing them to the drive and reading them back in.

The reticulate library for R provides an R interface to python that may allow us to bypass this write/read step (https://rstudio.github.io/reticulate/). I haven't used this library yet, but I'm interested in trying it out to see if it improves speed.

run bcl2fastq without compression

We could shave off ~30 seconds or so by adding the argument --no-bgzf-compression to bcl2fastq to convert bcl files to fastq files instead of fastq.gz files.

Normally fastq.gz is preferred since fastq files are so large, but since we're deleting the run files after analysis this is not an issue, and decompression takes a while.

I haven't tested this out yet but I'm curious if it improves speed.

Use distribution based model to adjust RPP30 threshold in water control wells

At the moment, the water control wells use a fixed RPP30 threshold (>10 counts), but this will lead to a high number of control failures. Instead, we will use a threshold that is based on the distribution of RPP30 reads in the run.

The implicit assumption is that RPP30 reads come from a mixture of distributions (RPP30 present and RPP30 absent). For samples, we check whether the reads could plausibly come from the left tail of the RPP30-present distribution; for negative controls, we check whether the reads could plausibly come from the right tail of the RPP30-absent distribution.
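The idea can be sketched as follows. This is only one possible reading, not the model the pipeline uses: it assumes two Gaussian components on log10(count + 1), fits them with a plain EM loop, and places each cutoff `z` standard deviations into the relevant tail. The function names, the log transform, and the default `z = 3` are all assumptions made for illustration.

```python
import math


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1D Gaussian mixture with EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    params = []
    # Initialise by splitting the sorted data into a low and a high half.
    for part in (xs[:half], xs[half:]):
        mean = sum(part) / len(part)
        var = sum((x - mean) ** 2 for x in part) / len(part)
        params.append([len(part) / n, mean, max(math.sqrt(var), 1e-3)])
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in params
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            params[k] = [nk / n, mean, max(math.sqrt(var), 1e-3)]
    return sorted((tuple(p) for p in params), key=lambda p: p[1])


def rpp30_thresholds(counts, z=3.0):
    """Return (control_max, sample_min) in raw counts.

    control_max: right-tail cutoff of the RPP30-absent component.
    sample_min: left-tail cutoff of the RPP30-present component.
    """
    logs = [math.log10(c + 1) for c in counts]
    (_, m_abs, s_abs), (_, m_pres, s_pres) = fit_two_gaussians(logs)
    control_max = 10 ** (m_abs + z * s_abs) - 1
    sample_min = 10 ** (m_pres - z * s_pres) - 1
    return control_max, sample_min
```

Under these assumptions, a water control whose RPP30 count is above `control_max` is no longer plausibly drawn from the RPP30-absent distribution, and a sample whose count is below `sample_min` is not plausibly drawn from the RPP30-present distribution. Both cutoffs move with the run instead of being fixed at 10 counts.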
