lab-grid / swabseq-analysis
Turning kkovary/swabseq_aws into a containerized Flask API
License: MIT License
At the moment, the water control wells use a fixed RPP30 threshold (>10 counts), but this will lead to a high number of control failures. Instead, we will use a threshold based on the distribution of RPP30 reads in the run.
The implicit assumption is that RPP30 reads come from a mixture of distributions. For samples, we check whether the reads could plausibly come from the left tail of the RPP30-present distribution; for negative controls, we check whether the reads could plausibly come from the right tail of the RPP30-absent distribution.
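One possible way to sketch the mixture idea in Python is shown here. It is not the pipeline's implementation. It assumes a two-component Gaussian mixture on log10(count + 1), fitted with a small hand-written EM loop, and then reports the two tail probabilities described above. The function names, the log transform, and the example counts are all hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def to_log(count):
    # Assumed transform: log10(count + 1) keeps zero-count wells finite.
    return math.log10(count + 1)

def fit_rpp30_mixture(counts, n_iter=100, min_sigma=0.05):
    """Fit a two-component 1-D Gaussian mixture by EM (needs >= 2 wells).

    Returns {"absent": (mu, sigma), "present": (mu, sigma)} on the log scale,
    where "absent" is the component with the lower mean.
    """
    xs = sorted(to_log(c) for c in counts)
    half = len(xs) // 2
    # Initialise the two means from the lower and upper halves of the data.
    mus = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sigmas = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each well.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, mus[k], sigmas[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)
            weights[k] = nk / len(xs)
    absent, present = sorted(zip(mus, sigmas))
    return {"absent": absent, "present": present}

def control_right_tail(count, model):
    """P(X >= count) under the RPP30-absent component (for negative controls)."""
    mu, sigma = model["absent"]
    return 1.0 - normal_cdf(to_log(count), mu, sigma)

def sample_left_tail(count, model):
    """P(X <= count) under the RPP30-present component (for samples)."""
    mu, sigma = model["present"]
    return normal_cdf(to_log(count), mu, sigma)

# Made-up RPP30 counts: ten empty-looking wells and ten wells with human material.
counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2,
          800, 1200, 950, 1500, 700, 1100, 2000, 900, 1300, 1000]
model = fit_rpp30_mixture(counts)
```

A well would then be flagged when its tail probability drops below some cutoff. Both the cutoff and whether a Gaussian on the log scale is a reasonable fit for real runs would need to be checked against actual data.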
Line 4 in 27cc1e4
From Jamie: "I recommend not setting this here and force that the value gets passed in directly and force an error. This will ensure that the health check will have the right version."
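On the Python side, the "no default, fail loudly" behaviour might look like the sketch below. The helper `require_env` and the variable name `SWABSEQ_VERSION` are hypothetical and do not come from the repository. The sketch only illustrates raising an error when a required value was not passed in.

```python
import os

def require_env(name, environ=os.environ):
    """Return the value of a required variable, or raise if it is missing or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} must be passed in explicitly; refusing to fall back to a default"
        )
    return value

# Hypothetical usage at application start-up, so a missing version fails fast
# instead of surfacing as a wrong value in the health check:
# SERVER_VERSION = require_env("SWABSEQ_VERSION")
```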
https://github.com/lab-grid/swabseq-analysis/blob/main/Dockerfile#L16
This URL might change at some point and break the Docker build, so we should just keep this file in the repo.
(Suggestion from Jamie.)
The following changes should be made to the Sample Categorization plot in the qc_report:
A1 and B1 so that it's easy to see if there was an issue

We could shave off ~30 seconds or so by adding the argument --no-bgzf-compression to bcl2fastq to convert bcl files to fastq files instead of fastq.gz files.
Normally fastq.gz is preferred since fastq files are so large, but since we're deleting the run files after analysis this is not an issue, and decompression takes a while.
I haven't tested this out yet but I'm curious if it improves speed.
I've added two new arguments to the pipeline that I think could be useful:
--season: winter, spring, summer, or fall
--debug: with --debug TRUE, the pipeline will carry out these extra steps if there is a potential issue with the run.

The QA/QC PDF is necessary, and LIMS_results.csv contains the per-DNA-barcode results. The run_info.csv data is already in the PDF, and the other info is not necessary to QA/QC the run.
At the moment there are two major scripts in the pipeline, countAmpliconsAWS.R and dict_align.py.
At the moment, countAmpliconsAWS.R runs dict_align.py towards the beginning of the pipeline in order to align and count the amplicons. When dict_align.py is finished running, it saves the output as results.csv, which is then loaded into memory by countAmpliconsAWS.R for downstream analysis. This write/read step takes extra time, and it would be better to keep the results in memory for downstream analysis instead of writing them to the drive and then reading them back in.
The reticulate library for R provides an R interface to Python that may allow us to bypass this write/read step (https://rstudio.github.io/reticulate/). I haven't used this library yet, but I'm interested in trying it out to see if it improves speed.
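For the in-memory idea to work, the Python side would presumably need to expose a function that returns the counts rather than writing results.csv. The sketch below is a hypothetical stand-in and not the actual contents of dict_align.py. The function name, its arguments, and the exact-match lookup are assumptions made for illustration.

```python
from collections import Counter

def count_amplicons(reads, amplicon_lookup):
    """Count reads per amplicon name and return the counts in memory.

    reads: iterable of read sequences (strings).
    amplicon_lookup: dict mapping an expected sequence to an amplicon name.
    Reads that do not match any expected sequence are ignored.
    """
    counts = Counter()
    for read in reads:
        name = amplicon_lookup.get(read)
        if name is not None:
            counts[name] += 1
    return dict(counts)
```

A function shaped like this could be loaded into an R session with reticulate's source_python(), in which case the returned dict would arrive in R as a named list without touching the disk. Whether that is actually faster than the CSV round trip would still need to be measured.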