msettles / dbcAmplicons
Analysis of Double Barcoded Illumina Amplicon Data
License: GNU Lesser General Public License v3.0
Our collaborators used 'dbcamplicons preprocess' in a 2018 analysis to demultiplex dual-barcoded sequences. We attempted to download dbcAmplicons but can no longer install it successfully (we encounter numerous Python errors).
Is this software still being maintained? If not, what alternative demultiplexing workflows would you suggest? Thank you.
Rather than reverse complementing barcode1 automatically, what do you think about a couple of switches (-rc1, -rc2 perhaps) that would allow flexibly reverse complementing barcode 1 or 2 depending on the library (or the spreadsheet provided by a client)? Currently it is a little confusing to check that barcode1 indeed matches the barcode sequences in the fastq files, and then have to reverse complement that barcode sequence in order to get dbcAmplicons to work.
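For reference, a minimal sketch of how the proposed -rc1/-rc2 switches might be applied per barcode column (function names here are illustrative, not dbcAmplicons' actual API):

```python
# Sketch only: none of these names exist in dbcAmplicons today.
def reverse_complement(seq):
    comp = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def normalize_barcode(seq, rc=False):
    # With an -rc switch set, the barcode column is flipped to match
    # the orientation actually observed in the fastq files.
    return reverse_complement(seq) if rc else seq
```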
We recently ran into a tricky problem where some primer names were accidentally left off the SampleSheet. The preprocess log showed a large drop-off between the number of reads identified to barcode and primer and the number successfully assigned to a sample, which helped tip us off to the problem. However, the Identified_Barcodes table listed all of the reads associated with each sample+barcode, even if the barcode wasn't listed under that sample (a significant difference between the total reads listed in the preprocess log and in the Identified_Barcodes table also helped with troubleshooting). It would be very useful to have one additional table generated by preprocess which listed:
Sample | PrimersExpected | PrimersIdentified | ReadsByBarcode | ReadsByBarcodeAndPrimer |
---|---|---|---|---|
sample1 | 96 | 96 | 1000000 | 800000 |
sample2 | 48 | 96 | 1000000 | 50000 |
This would make troubleshooting issues with samplesheet formatting much easier.
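A rough sketch of how such a per-sample summary could be tallied during preprocess (field names mirror the proposed columns; this is not existing dbcAmplicons code):

```python
from collections import defaultdict

def summarize(assignments, expected_primers):
    """assignments: (sample, primer) per read, primer None if unidentified.
    expected_primers: sample -> primer names listed on the sheet."""
    by_barcode = defaultdict(int)          # ReadsByBarcode
    by_barcode_primer = defaultdict(int)   # ReadsByBarcodeAndPrimer
    primers_seen = defaultdict(set)        # PrimersIdentified
    for sample, primer in assignments:
        by_barcode[sample] += 1
        if primer is not None:
            by_barcode_primer[sample] += 1
            primers_seen[sample].add(primer)
    return [(s, len(expected_primers.get(s, ())), len(primers_seen[s]),
             by_barcode[s], by_barcode_primer[s])
            for s in sorted(by_barcode)]
```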
Hello,
I notice that the update history for version 0.9.0 states that single barcoded reads can be processed, but I can't figure out how to do that.
Is there an option I need to set or a way to set up the input files to do this?
I have paired-end reads with a third read containing the barcodes.
For the Barcode file I put just two columns (BarcodeID, BarcodeSeq).
I then ran preprocess with --R1 Seq_R1.fastq --R2 Seq_R2.fastq --BC1 Barcodes.fastq
I get the following Traceback, which I interpret as the program missing the other barcode read:
File "build/bdist.linux-x86_64/egg/dbcAmplicons/preprocess_app.py", line 161, in start
self.run = FourReadIlluminaRun(fastq_file1, fastq_file2, fastq_file3, fastq_file4)
File "build/bdist.linux-x86_64/egg/dbcAmplicons/illuminaRun.py", line 48, in __init__
self.fbc2.append(misc.infer_read_file_name(fread, "3"))
File "build/bdist.linux-x86_64/egg/dbcAmplicons/misc.py", line 104, in infer_read_file_name
raise Exception("Error inferring read " + seakread + " from read 1, found " + str(len(read)) + " suitable matches.")
Exception: Error inferring read 3 from read 1, found 0 suitable matches.
Thanks for the help!
join is the only subcommand that requires you to specify both -1 and -2; allow it to search for -2 when only -1 is specified.
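In outline, the -2 file could be inferred from -1 by swapping the read token, similar in spirit to misc.infer_read_file_name; the sketch below assumes standard Illumina _R1_/_R2_ naming:

```python
import re

def infer_r2(r1_path):
    # Replace the first _R1 token (followed by "_" or ".") with _R2.
    candidate = re.sub(r"_R1(?=[_.])", "_R2", r1_path, count=1)
    if candidate == r1_path:
        raise ValueError("could not infer an R2 file from %s" % r1_path)
    return candidate
```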
Using an external path to the RDP classifier creates problems if the user is not familiar with the path to RDP. Determine whether the classifier.jar file can be packaged within the dbcAmplicons app and the classifier invoked from within it.
Support output of BIOM-formatted files to facilitate usage of downstream software.
http://biom-format.org/
This can be accomplished using their Python package.
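A minimal dense BIOM 1.0 (JSON) writer could look like the sketch below. The field names follow the BIOM 1.0 format specification rather than the biom-format package API, and the function itself is only an illustration, not proposed dbcAmplicons code:

```python
import datetime
import json

def to_biom_json(counts, observation_ids, sample_ids):
    """counts: dense matrix as a list of rows (observations x samples)."""
    table = {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "dbcAmplicons abundance (sketch)",
        "date": datetime.datetime.now().isoformat(),
        "matrix_type": "dense",
        "matrix_element_type": "int",
        "shape": [len(observation_ids), len(sample_ids)],
        "rows": [{"id": o, "metadata": None} for o in observation_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": counts,
    }
    return json.dumps(table)
```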
Cleaning up.
A fatal error was encountered.
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/dbcAmplicons/abundance_app.py", line 244, in start
sampleList_md = [{'primers': ";".join(primers[v])} for v in sampleList]
TypeError: sequence item 0: expected string, NoneType found
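One possible guard for this TypeError (a sketch of one fix, not necessarily the intended one) is to substitute missing primer entries before joining:

```python
def primer_string(primers, sample):
    # primers[sample] may be None or contain None entries, which is
    # what triggers the TypeError in abundance_app above.
    values = primers.get(sample) or []
    return ";".join(p if p is not None else "" for p in values)
```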
Allow one to go through the pipeline along a single path and then split at the very end.
Using the following call:
dbcAmplicons/scripts/python/convert2ReadTo4Read.py -1 data/003237_H-2D_S30_R1_filtered.fastq.gz -2 data/003237_H-2D_S30_R2_filtered.fastq.gz --debug
I get the error:
ERROR:[TwoSequenceReadSet] Unknown error occured generating four read set
Cleaning up.
A fatal error was encountered.
Traceback (most recent call last):
File "dbcAmplicons/scripts/python/convert2ReadTo4Read.py", line 48, in start
self.run_out.addRead(read.getFourReads(bc1_length=barcode1, bc2_length=barcode2))
File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 355, in getFourReads
raise Exception("string in the barcode is not %s characters" % str(bc1_length + bc2_length))
Exception: string in the barcode is not 16 characters
When I try to add the -p or -q flags the error remains the same.
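A pre-flight check along these lines might surface the length mismatch more clearly (a sketch; the default lengths are assumptions, not the script's actual defaults):

```python
def check_barcode_length(barcode, bc1_length=8, bc2_length=8):
    # Mirrors the check in sequenceReads.getFourReads: the combined
    # barcode string must be exactly bc1_length + bc2_length long.
    expected = bc1_length + bc2_length
    if len(barcode) != expected:
        raise ValueError("barcode %r is %d characters, expected %d"
                         % (barcode, len(barcode), expected))
```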
Even if there are no primer sequences in the samples and no primers identified in the sample sheet, validation requires a primer sheet. A dummy sheet will pass validation.
Generate a new Python script in scripts/python that will take data processed by dbcAmplicons and split it by sample into a form suitable for upload to the SRA.
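In outline, such a script could bucket preprocessed records by the SampleID carried in the read header (the header layout assumed by get_sample is up to the caller; this is a sketch, not the script itself):

```python
from collections import defaultdict

def split_by_sample(records, get_sample):
    """records: iterable of fastq records (header, seq, qual);
    get_sample: callable mapping a header to its SampleID."""
    buckets = defaultdict(list)
    for record in records:
        buckets[get_sample(record[0])].append(record)
    # One per-sample fastq would then be written per key for SRA upload.
    return buckets
```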
Change the output to stdout to summarize the total percentage of reads identified in a run and their respective allocations to every project, including "unidentified" reads.
Currently "dbcAmplicons preprocess" processes all projects within a single set of fastq files in the same way. However, some clients prefer to receive files with primers intact, others wish to have primers removed, and some would prefer 4-read format (R1, R2, I1, I2). It would be great if a more sophisticated configuration option could be provided to generate the correct output per-project.
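One way to express that configuration would be a per-project options map (purely a proposal sketch; none of these option names exist in dbcAmplicons today):

```python
# Hypothetical per-project output settings keyed by project name.
project_output = {
    "ProjectA": {"keep_primers": True,  "four_read": False},
    "ProjectB": {"keep_primers": False, "four_read": False},
    "ProjectC": {"keep_primers": False, "four_read": True},  # R1/R2/I1/I2
}

def output_mode(project):
    # Fall back to today's uniform behavior for unlisted projects.
    return project_output.get(project,
                              {"keep_primers": False, "four_read": False})
```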
Fix parsing of FLASH output.
For two reads, add support for clipping of reads due to quality.
Test whether clipping reads improves the quality of classification.
phoebe11@c11-42:~/MattsPipeline/CarolynsQiime/CQiime_metadata$ dbcAmplicons validate -B Carolyn_M_dbcBarcodeTable.txt -S Carolyn_M_SampleSheet2.txt --debug
/share/apps/python-2.7.4/lib/python2.7/site-packages/pkg_resources.py:1031: UserWarning: /home/phoebe11/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
A newer version (0.8.5) of dbcAmplicons is available at https://github.com/msettles/dbcAmplicons
barcode table length: 587
Cleaning up.
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/dbcAmplicons/validate_app.py", line 114, in start
prTable = primerTable(primerFile)
File "build/bdist.linux-x86_64/egg/dbcAmplicons/primers.py", line 39, in __init__
prfile = open(primerfile, 'r')
TypeError: coercing to Unicode: need string or buffer, NoneType found
A more graceful and helpful error message is needed here.
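Something like the following check (a sketch of the kind of guard meant, not the actual fix) would fail fast with an actionable message:

```python
def open_primer_file(primerfile):
    # primers.py currently passes None straight to open(), producing
    # the TypeError above whenever -P was omitted.
    if primerfile is None:
        raise SystemExit("ERROR: a primer file is required (-P); none was given")
    return open(primerfile, "r")
```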
Wrong order of files.
dbcAmplicons preprocess -B barcodeTable2.txt -P primerTable.txt -S Judelson-sample.txt -1 Illumina_RawData/Undetermined_S0_L001_R1_001.fastq.gz -2 Illumina_RawData/Undetermined_S0_L001_I1_001.fastq.gz -3 Illumina_RawData/Undetermined_S0_L001_R2_001.fastq.gz -4 Illumina_RawData/Undetermined_S0_L001_I2_001.fastq.gz
Error message:
/home/msettles/Python_venv/local/lib/python2.7/site-packages/pkg_resources.py:991: UserWarning: /home/jli/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
A newer version (0.8.1-20160418) of dbcAmplicons is available at https://github.com/msettles/dbcAmplicons/tree/develop
barcode table length: 71
primer table length P5 Primer Sequences:8, P7 Primer Sequences:9
sample table length: 1, and 1 projects.
Cleaning up.
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/dbcAmplicons/preprocess_app.py", line 98, in start
bcsuccesscount += read.assignBarcode(bcTable, barcodeMaxDiff) # barcode
File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 130, in assignBarcode
bc2, bc2Mismatch = barcodeDist(bcTable.getP5(), self.bc_2, max_diff)
File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 31, in barcodeDist
bc_i, bc_mismatch = editdist.hamming_distance_list(b_l, b_2, max_diff+1)
SystemError: Bad Arguments
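A cheap sanity check could catch this kind of argument swap before it reaches the C extension: index reads are barcode-length, far shorter than the biological reads (the threshold below is an assumption):

```python
def looks_like_index_read(seq, max_barcode_len=12):
    # -2/-4 should hold short index reads; -1/-3 the long biological
    # reads. A swapped argument order makes this test fail loudly.
    return len(seq) <= max_barcode_len
```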
The default output of dbcAmplicons preprocess is documented in the DBC_ampliconsUserManual as being formatted like:
{Sequence ID (Illumina header) } : {SampleID} : {PrimerPairID} {Barcode1|#differences1|Barcode2|#differences2} {PrimerForward|#differences|bpTrimmed}
However, this format is only used when "--keepPrimers" is not passed to preprocess.
Example without "--keepPrimers" (format described in the manual):
@M01380:62:000000000-B547W:1:1102:20354:1000 1:N:0:AG_5856:ITS3_ITS4 NTATCGCT|1|NTCTCTAT|1 ITS3_CS1|1|20|
Example with "--keepPrimers" (differs from what is described in the manual):
@M01380:62:000000000-B547W:1:1102:12519:1279 1:N:0:AG_1105 ACGAATTC|0|CAGGACGT|0
This might be intended behavior; however, I couldn't find it documented in the manual, and it caused issues with a downstream pipeline. Additionally, because no information is reported about the target-specific primer (which primer was identified, number of differences, bp trimmed), it isn't clear whether dbcAmplicons is still looking for the primer within the read.
I think the preferred behavior would be to still report the primer and number of mismatches, along with 0 bp trimmed.
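For illustration, a parser of the documented annotation might look like this (a sketch based on the manual's description; under --keepPrimers the final primer field is absent, which is what tripped up our pipeline):

```python
def parse_annotation(comment):
    """Parse a preprocess header comment such as
    '1:N:0:AG_5856:ITS3_ITS4 NTATCGCT|1|NTCTCTAT|1 ITS3_CS1|1|20|'."""
    fields = comment.split(" ")
    ids = fields[0].split(":")
    info = {"sample": ids[3],
            "primer_pair": ids[4] if len(ids) > 4 else None,
            "barcodes": fields[1].split("|")}
    if len(fields) > 2:  # primer field present (without --keepPrimers)
        info["primer"] = fields[2].split("|")
    return info
```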
Try to replicate, then try to guard against it.
If the expected header is not identified, error out.
Removing spaces will prevent downstream errors.
Specifically, the RDP os.path.isfile check fails when the path is referenced with a tilde.
Hello! I've gotten almost all the way through the pipeline with no problem until the abundance step. When I run:
dbcAmplicons abundance -S SampleSheet.txt -O test-results/16sV4 -F NWRD001_1.classified.fixrank --biom > abundance.16sV4.log
I get this output, but all of my output files are empty. I checked with the HPC support people at my institution and they said that the error messages about NumPy were safe to ignore:
/global/home/users/rpduncan/src/dbcA_virtualenv/lib/python2.7/site-packages/scipy/special/__init__.py:640: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._ufuncs import *
(the same RuntimeWarning repeats for dozens of other scipy and pandas modules, interleaved with the progress lines below)
processed 100000 total lines, 38930.0 lines/second
processed 200000 total lines, 41795.0 lines/second
processed 300000 total lines, 42559.0 lines/second
processed 400000 total lines, 43132.0 lines/second
processed 500000 total lines, 43520.0 lines/second
processed 600000 total lines, 43753.0 lines/second
processed 700000 total lines, 43928.0 lines/second
processed 800000 total lines, 44070.0 lines/second
processed 900000 total lines, 44183.0 lines/second
processed 1000000 total lines, 44272.0 lines/second
processed 1100000 total lines, 44343.0 lines/second
processed 1200000 total lines, 44403.0 lines/second
processed 1300000 total lines, 44431.0 lines/second
processed 1400000 total lines, 44371.0 lines/second
processed 1500000 total lines, 44361.0 lines/second
Writing output
Writing json formatted biom file to: results/16sV4.biom
Writing abundance file to: results/16sV4.abundance.txt
Writing proportions file to: results/16sV4.proportions.txt
finished in 0.57 minutes
Cleaning up.