dbcamplicons's People

Contributors

msettles, pmhenry

dbcamplicons's Issues

reproduce analysis from 2018

Our collaborators used 'dbcamplicons preprocess' in a 2018 analysis to demultiplex dual-barcoded sequences. We attempted to download dbcAmplicons but can no longer install it successfully (we run into numerous Python errors).

Is this software no longer being maintained? If so, what would you suggest for alternative demultiplexing workflows? Thank you.

Easier RC support for barcodes

Rather than reverse complementing barcode1 automatically, what do you think about adding a couple of switches (perhaps -rc1 and -rc2) that would allow flexibly reverse complementing barcode 1 or 2 depending on the library (or the spreadsheet provided by a client)? Currently it is a little confusing to check that barcode1 indeed matches the barcode sequences in the fastq files, and then have to reverse complement that barcode sequence to get dbcAmplicons to work.
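A minimal sketch of the behavior the proposed flags would enable. Note that -rc1 and -rc2 are hypothetical names, and this is not dbcAmplicons' actual implementation (which reverse complements barcode1 unconditionally):

```python
# Sketch only: the rc1/rc2 options are hypothetical, mirroring the
# proposed -rc1/-rc2 command-line switches.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a barcode sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def orient_barcodes(bc1, bc2, rc1=False, rc2=False):
    """Optionally reverse complement either barcode before matching."""
    if rc1:
        bc1 = reverse_complement(bc1)
    if rc2:
        bc2 = reverse_complement(bc2)
    return bc1, bc2
```

With flags like these, the barcode table could be checked against the fastq files as written, and either barcode flipped only when the library design requires it.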

More detailed statistics for troubleshooting

We recently ran into a tricky problem where some primer names were accidentally left off of the SampleSheet. The preprocess log indicated a big drop-off between the number of reads identified to barcode and primer and the number successfully assigned to a sample, which helped tip us off to the problem. However, the Identified_Barcodes table listed all of the reads associated with each sample+barcode, even if the barcode wasn't listed under that sample (a significant difference between the total reads listed in the preprocess log and in the Identified_Barcodes table also helped with troubleshooting). It would be very useful to have one additional table generated by preprocess which listed:

Sample   PrimersExpected  PrimersIdentified  ReadsByBarcode  ReadsByBarcodeAndPrimer
sample1  96               96                 1000000         800000
sample2  48               96                 1000000         50000

This would make troubleshooting issues with sample sheet formatting much easier.
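A hedged sketch of how preprocess could assemble the proposed table. The input dictionaries are hypothetical stand-ins for per-sample counts that preprocess already tracks internally; the actual internal data structures may differ:

```python
# Sketch only: the four input mappings are hypothetical placeholders
# for counts the preprocess step already accumulates.
def troubleshooting_table(expected_primers, identified_primers,
                          reads_by_barcode, reads_by_barcode_and_primer):
    """Render one tab-delimited row per sample for troubleshooting."""
    header = ("Sample", "PrimersExpected", "PrimersIdentified",
              "ReadsByBarcode", "ReadsByBarcodeAndPrimer")
    rows = [header]
    for sample in sorted(expected_primers):
        rows.append((sample,
                     len(expected_primers[sample]),
                     len(identified_primers.get(sample, ())),
                     reads_by_barcode.get(sample, 0),
                     reads_by_barcode_and_primer.get(sample, 0)))
    return "\n".join("\t".join(str(v) for v in row) for row in rows)
```

A large gap between PrimersExpected and PrimersIdentified (or between the two read columns) for a sample would immediately point at a SampleSheet problem like the one described above.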

Preprocess single barcoded reads

Hello,

I notice that the update history for version 0.9.0 states that single barcoded reads can be processed, but I can't figure out how to do that.

Is there an option I need to set or a way to set up the input files to do this?

I have paired-end reads with a third read containing the barcodes.
For the Barcode file I put just two columns (BarcodeID, BarcodeSeq).
I then ran preprocess with --R1 Seq_R1.fastq --R2 Seq_R2.fastq --BC1 Barcodes.fastq

I get the following Traceback, which I interpret as the program missing the other barcode read:

  File "build/bdist.linux-x86_64/egg/dbcAmplicons/preprocess_app.py", line 161, in start
    self.run = FourReadIlluminaRun(fastq_file1, fastq_file2, fastq_file3, fastq_file4)
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/illuminaRun.py", line 48, in __init__
    self.fbc2.append(misc.infer_read_file_name(fread, "3"))
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/misc.py", line 104, in infer_read_file_name
    raise Exception("Error inferring read " + seakread + " from read 1, found " + str(len(read)) + " suitable matches.")
Exception: Error inferring read 3 from read 1, found 0 suitable matches.

Thanks for the help!
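For context, the traceback suggests the four-read constructor derives the read-3 filename from the read-1 name and fails when no such file exists. A simplified reimplementation of that inference (not the actual misc.py logic; the substitution rule here is an assumption) shows why it reports 0 suitable matches:

```python
# Sketch only: a simplified stand-in for misc.infer_read_file_name;
# the "_R1" -> "_R<n>" substitution rule is an assumption.
import re

def infer_read_file_name(read1_name, seekread, available):
    """Guess a companion read file by swapping the read token in R1's name."""
    candidate = re.sub(r"_R1", "_R" + seekread, read1_name)
    matches = [f for f in available if f == candidate]
    if len(matches) != 1:
        raise Exception("Error inferring read %s from read 1, found %d "
                        "suitable matches." % (seekread, len(matches)))
    return matches[0]
```

With only R1, R2, and BC1 provided, no file matches the inferred read-3 name, so FourReadIlluminaRun raises exactly the exception shown above; a single-barcode mode would presumably need a three-read run class or an explicit option.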

allow search of pairs in join

join is the only subcommand that requires you to specify both -1 and -2; allow it to search for -2 when only -1 is specified.

classify app should include RDP classifier

Using an external path to the RDP classifier creates problems if the user is not familiar with where RDP is installed. Determine whether the classifier.jar file can be packaged within the dbcAmplicons app so that the classifier is bundled and invoked internally.

biom file error when primers are absent

Cleaning up.
A fatal error was encountered.
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/abundance_app.py", line 244, in start
    sampleList_md = [{'primers': ";".join(primers[v])} for v in sampleList]
TypeError: sequence item 0: expected string, NoneType found

convert2ReadTo4Read.py barcode error

Using the following call:
dbcAmplicons/scripts/python/convert2ReadTo4Read.py -1 data/003237_H-2D_S30_R1_filtered.fastq.gz -2 data/003237_H-2D_S30_R2_filtered.fastq.gz --debug

I get the error:
ERROR:[TwoSequenceReadSet] Unknown error occured generating four read set
Cleaning up.
A fatal error was encountered.
Traceback (most recent call last):
  File "dbcAmplicons/scripts/python/convert2ReadTo4Read.py", line 48, in start
    self.run_out.addRead(read.getFourReads(bc1_length=barcode1, bc2_length=barcode2))
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 355, in getFourReads
    raise Exception("string in the barcode is not %s characters" % str(bc1_length + bc2_length))
Exception: string in the barcode is not 16 characters

When I try to add in the -p or -q flags the error remains the same.
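For context, the exception appears to come from a length check on the barcode string carried in the read header: getFourReads expects bc1_length + bc2_length characters (8 + 8 = 16 by default), so reads carrying a single 8 bp index (or none) trip it. A hedged sketch of that check (not the actual sequenceReads.py code):

```python
# Sketch only: a simplified stand-in for the length check in
# sequenceReads.getFourReads; default lengths of 8 + 8 are assumed.
def split_barcodes(barcode, bc1_length=8, bc2_length=8):
    """Split a combined dual-barcode string into its two halves."""
    if len(barcode) != bc1_length + bc2_length:
        raise Exception("string in the barcode is not %s characters"
                        % str(bc1_length + bc2_length))
    return barcode[:bc1_length], barcode[bc1_length:]
```

If the input fastq headers only carry one index, adjusting the barcode-length parameters (or the input files) so the combined length matches would presumably avoid the error.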

validation fails without primer sheet

Even if there are no primer sequences in the samples and no primers identified in the sample sheet, validation requires a primer sheet. A dummy sheet will pass validation.

preprocess output percentages

Change the stdout output so it summarizes the total percentage of reads identified in a run and their respective allocations to each project, including "unidentified" reads.

Project specific handling of primer trimming and output format

Currently "dbcAmplicons preprocess" processes all projects within a single set of fastq files in the same way. However, some clients prefer to receive files with primers intact, others wish to have primers removed, and some would prefer 4-read format (R1, R2, I1, I2). It would be great if a more sophisticated configuration option could be provided to generate the correct output per-project.
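One way the requested configuration could look, as a hedged sketch. The option names and the mapping structure here are hypothetical, not an existing dbcAmplicons feature:

```python
# Sketch only: a hypothetical per-project configuration; dbcAmplicons
# does not currently accept anything like this.
PROJECT_OPTIONS = {
    "projectA": {"trim_primers": True,  "output": "2-read"},
    "projectB": {"trim_primers": False, "output": "2-read"},
    "projectC": {"trim_primers": False, "output": "4-read"},  # R1, R2, I1, I2
}

def options_for(project, default=None):
    """Look up a project's output options, falling back to a default."""
    return PROJECT_OPTIONS.get(
        project, default or {"trim_primers": True, "output": "2-read"})
```

Preprocess could then consult the mapping once per read's assigned project to decide whether to trim primers and which file layout to write.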

validate needs primer, fix for no primer

phoebe11@c11-42:~/MattsPipeline/CarolynsQiime/CQiime_metadata$ dbcAmplicons validate -B Carolyn_M_dbcBarcodeTable.txt -S Carolyn_M_SampleSheet2.txt --debug
/share/apps/python-2.7.4/lib/python2.7/site-packages/pkg_resources.py:1031: UserWarning: /home/phoebe11/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
A newer version (0.8.5) of dbcAmplicons is available at https://github.com/msettles/dbcAmplicons
barcode table length: 587
Cleaning up.
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/validate_app.py", line 114, in start
    prTable = primerTable(primerFile)
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/primers.py", line 39, in init
    prfile = open(primerfile, 'r')
TypeError: coercing to Unicode: need string or buffer, NoneType found

error when you give it the wrong order of files

A more graceful and helpful error message is needed when the files are given in the wrong order.

dbcAmplicons preprocess -B barcodeTable2.txt -P primerTable.txt -S Judelson-sample.txt -1 Illumina_RawData/Undetermined_S0_L001_R1_001.fastq.gz -2 Illumina_RawData/Undetermined_S0_L001_I1_001.fastq.gz -3 Illumina_RawData/Undetermined_S0_L001_R2_001.fastq.gz -4 Illumina_RawData/Undetermined_S0_L001_I2_001.fastq.gz

Error message:

/home/msettles/Python_venv/local/lib/python2.7/site-packages/pkg_resources.py:991: UserWarning: /home/jli/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
A newer version (0.8.1-20160418) of dbcAmplicons is available at https://github.com/msettles/dbcAmplicons/tree/develop
barcode table length: 71
primer table length P5 Primer Sequences:8, P7 Primer Sequences:9
sample table length: 1, and 1 projects.
Cleaning up.
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/preprocess_app.py", line 98, in start
    bcsuccesscount += read.assignBarcode(bcTable, barcodeMaxDiff) # barcode
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 130, in assignBarcode
    bc2, bc2Mismatch = barcodeDist(bcTable.getP5(), self.bc_2, max_diff)
  File "build/bdist.linux-x86_64/egg/dbcAmplicons/sequenceReads.py", line 31, in barcodeDist
    bc_i, bc_mismatch = editdist.hamming_distance_list(b_l, b_2, max_diff+1)
SystemError: Bad Arguments
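For context, a pure-Python sketch of what editdist.hamming_distance_list appears to do (this is an assumed reimplementation, not the actual C extension). Validating the query barcode up front, as below, would let preprocess fail with a clear message when the files are swapped, instead of the opaque "SystemError: Bad Arguments":

```python
# Sketch only: an assumed pure-Python stand-in for the C extension
# editdist.hamming_distance_list, with added input validation.
def hamming_distance_list(barcodes, query, max_diff):
    """Return (index, distance) of the closest barcode within max_diff,
    or (-1, max_diff + 1) if none qualifies."""
    if query is None or any(len(b) != len(query) for b in barcodes):
        raise ValueError("query missing or barcode lengths differ; "
                         "are the input files in the right order?")
    best_i, best_d = -1, max_diff + 1
    for i, bc in enumerate(barcodes):
        d = sum(a != b for a, b in zip(bc, query))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

When reads are passed in the wrong order, the sequence handed in as a barcode is the wrong length (or absent), which the check above would surface directly.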

Format of read ID differs from manual when using "--keepPrimers"

The default output of dbcAmplicons preprocess is documented in the DBC_ampliconsUserManual as being formatted like:

{Sequence ID (Illumina header) } : {SampleID} : {PrimerPairID} {Barcode1|#differences1|Barcode2|#differences2} {PrimerForward|#differences|bpTrimmed}

However, this format is only used when "--keepPrimers" is not passed to preprocess.

Example without "--keepPrimers" (format described in the manual):
@M01380:62:000000000-B547W:1:1102:20354:1000 1:N:0:AG_5856:ITS3_ITS4 NTATCGCT|1|NTCTCTAT|1 ITS3_CS1|1|20|

Example with "--keepPrimers" (differs from what is described in the manual):
@M01380:62:000000000-B547W:1:1102:12519:1279 1:N:0:AG_1105 ACGAATTC|0|CAGGACGT|0

This might be intended behavior; however, I couldn't find documentation for it in the manual, and it resulted in some issues with a downstream pipeline. Additionally, because no information is reported about the target-specific primer (number of differences, which primer was identified, bp trimmed), it isn't clear whether dbcAmplicons is still looking for the primer within the read.

I think the preferred behavior would be to still report the primer and the number of mismatches, along with 0 bp trimmed?

abundance step appears to run but output files are empty

Hello! I've gotten almost all the way through the pipeline with no problem until the abundance step. When I run:

dbcAmplicons abundance -S SampleSheet.txt -O test-results/16sV4 -F NWRD001_1.classified.fixrank --biom > abundance.16sV4.log
I get the output below, but all of my output files are empty. I checked with the HPC support people at my institution, and they said the error messages about NumPy were safe to ignore:

/global/home/users/rpduncan/src/dbcA_virtualenv/lib/python2.7/site-packages/scipy/special/init.py:640: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
from ._ufuncs import *
[the same RuntimeWarning is repeated for many other scipy and pandas modules]
processed 100000 total lines, 38930.0 lines/second
processed 200000 total lines, 41795.0 lines/second
processed 300000 total lines, 42559.0 lines/second
processed 400000 total lines, 43132.0 lines/second
processed 500000 total lines, 43520.0 lines/second
processed 600000 total lines, 43753.0 lines/second
processed 700000 total lines, 43928.0 lines/second
processed 800000 total lines, 44070.0 lines/second
processed 900000 total lines, 44183.0 lines/second
processed 1000000 total lines, 44272.0 lines/second
processed 1100000 total lines, 44343.0 lines/second
processed 1200000 total lines, 44403.0 lines/second
processed 1300000 total lines, 44431.0 lines/second
processed 1400000 total lines, 44371.0 lines/second
processed 1500000 total lines, 44361.0 lines/second
Writing output
Writing json formatted biom file to: results/16sV4.biom
Writing abundance file to: results/16sV4.abundance.txt
Writing proportions file to: results/16sV4.proportions.txt
finished in 0.57 minutes
Cleaning up.
