adamjorr / kbbq-py
k-mer based base quality recalibration
License: MIT License
Should look up the read given an authoritative BAM and a set of variable sites, and grab the erroneous positions.
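One possible reading of this, as a minimal sketch: take the alignment in the authoritative BAM as truth, and call a base erroneous when it mismatches the reference at a site that is not in the variable-site set. The function name `erroneous_positions`, its arguments, and the `(read_offset, ref_pos)` pair representation are assumptions made for illustration, not kbbq's actual API. Indels and soft clips are ignored here.

```python
def erroneous_positions(read_seq, aligned_pairs, ref_seq, variable_sites):
    """Return read offsets whose base mismatches the reference at a non-variable site.

    read_seq: the read's bases as a string
    aligned_pairs: iterable of (read_offset, ref_pos) for aligned bases only
    ref_seq: mapping (or sequence) giving the reference base at ref_pos
    variable_sites: set of reference positions to skip (hypothetical input)
    """
    errors = []
    for read_offset, ref_pos in aligned_pairs:
        if ref_pos in variable_sites:
            continue  # a mismatch here may be a real variant, not an error
        if read_seq[read_offset].upper() != ref_seq[ref_pos].upper():
            errors.append(read_offset)
    return errors


# Toy example: a 5-base read aligned to reference positions 10..14.
reference = {10: "A", 11: "C", 12: "G", 13: "T", 14: "A"}
read = "ACTTC"
pairs = [(i, 10 + i) for i in range(len(read))]
example_errors = erroneous_positions(read, pairs, reference, variable_sites={12})
```

In the toy example the mismatch at reference position 12 is skipped because it is a variable site, so only the last base of the read is reported.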
I'm thinking we should support 3 commands right now.
Given any number of BAM files or FASTQ files, recalibrate the BAM or FASTQ files. We should support these options:

- `--use-oq` to use the OQ tag in BAM files
- `--set-oq` to set the OQ tag before calibrating the read.
- `--method` to set the error detection method; `gatk` or `lighter`
- `--model` to set the calibration model
- `lighter` method
- `--prefix` see below
- `--output` see below

Currently I think it's OK if we support 1 BAM or 1 FASTQ at a time. But it would be neat if we could support an intersection of read groups when multiple input files are specified on the command line. Perhaps such a scheme would have rules like:
When multiple files are given at the command line, the output rules should probably be something like:
However, these rules should be able to be overridden with some options like:

- `--prefix` to output the recalibrated file with the same name and type as the input files, but with a prefix.
- `--output` specified as many times as input files are specified, where the ordering specifies which input is associated with each output. That is, the recalibrated first input specified is output to the first output specified, the second input goes to the second output, and so on. An error should be thrown if `--output` is used too many or too few times. We could probably do a file type conversion but I think it would be OK if `--output` made an output of the same filetype as the corresponding input.

Related to #6. Given a BAM + VCF + BED truthset, and optionally a FASTQ file of reads, output a TSV file with data to be plotted by the plot command. This file should probably be something like:

predicted q | actual q | dataset | number of bases |
---|---|---|---|
0 | 0 | conf_regions.bam | 300 |
... | ... | ... | ... |
Note that we shouldn't actually include the header, so the user can call `benchmark` with many different datasets and append to the output file each time.
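The header-less, append-friendly output described above could look roughly like the sketch below. The function name `append_benchmark_rows` and the row tuple layout are assumptions for illustration; only the column order (predicted q, actual q, dataset, number of bases) comes from the table.

```python
import csv


def append_benchmark_rows(path, rows):
    """Append (predicted_q, actual_q, dataset, n_bases) rows as TSV, without a header.

    Opening in append mode means repeated calls (one per dataset) keep
    adding rows to the same file instead of overwriting it.
    """
    with open(path, "a", newline="") as handle:
        # csv.writer defaults to "\r\n" line endings; use "\n" for a plain TSV.
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for predicted_q, actual_q, dataset, n_bases in rows:
            writer.writerow([predicted_q, actual_q, dataset, n_bases])
```

Because no header is ever written, concatenating the results of several runs yields a file that is still uniformly four columns wide.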
A convenience command to plot the calibration data output from a benchmark command.
Given a file with columns predicted q, actual q, dataset, and number of bases, plot either:
And possibly other types of plots. The plot type should be specified with a flag like `--type`.
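Whatever plot types end up being supported, the plot command first has to read the four-column, header-less file and split it by dataset. A minimal sketch of that step follows; the name `load_benchmark` is hypothetical, and parsing predicted q and number of bases as integers and actual q as a float is an assumption about the file's contents.

```python
import csv
from collections import defaultdict


def load_benchmark(path):
    """Read a header-less benchmark TSV.

    Returns {dataset: [(predicted_q, actual_q, n_bases), ...]} with rows
    kept in file order, so each dataset can be drawn as its own series.
    """
    by_dataset = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # tolerate blank lines from manual concatenation
            predicted_q, actual_q, dataset, n_bases = row
            by_dataset[dataset].append(
                (int(predicted_q), float(actual_q), int(n_bases))
            )
    return dict(by_dataset)
```

Grouping by the dataset column is what lets a single file hold the appended output of many benchmark runs.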
samtools pulls any read that overlaps a valid region, so there may be bases on the edges that are getting counted when they shouldn't. These should be skipped.
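A minimal sketch of skipping those edge bases: compute a per-base mask that is true only where the base's reference position falls inside a valid region, and count only the masked bases. The function name `in_region_mask` is hypothetical, regions are assumed to be 0-based half-open intervals as in BED, and everything is assumed to be on a single contig. A position list like this could come from pysam's `AlignedSegment.get_reference_positions(full_length=True)`, which gives `None` for bases without a reference position.

```python
def in_region_mask(ref_positions, regions):
    """Return one boolean per read base: True if the base lies inside a region.

    ref_positions: reference coordinate for each read base (None if unaligned)
    regions: list of (start, end) intervals, 0-based and half-open
    """
    mask = []
    for pos in ref_positions:
        inside = pos is not None and any(
            start <= pos < end for start, end in regions
        )
        mask.append(inside)
    return mask


# A read covering reference 98..103 that overlaps the region [100, 102):
# only the two bases at 100 and 101 should be counted.
example_mask = in_region_mask([98, 99, 100, 101, 102, 103], [(100, 102)])
```

The linear scan over regions is fine for a sketch; with many regions, a sorted list plus binary search or an interval tree would be the obvious replacement.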