
kmer-scripts's Introduction

Dependencies

  • R
    • ggplot2
install.packages("ggplot2")

Usage

kmer-plot --help
kmer-plot MODE [OPT=VAL ...] [name=]FILE [[name=]FILE ...]

Mode and Options

# kcov mode
kmer-plot kcov

[name=]FILE         jellyfish hash or two column count histogram (count abundance).
                    name is optional, should be enclosed in escaped \",
                    e.g. \"Set1\"=set1.jf

out="kcov.pdf"      output file name, supports .png, .pdf, .ps.
coverage.max=300    coverage axis maximum, 0 is data max.
count.max=0         count axis maximum, 0 is data max.
anscombe=FALSE      display Anscombe transformed data (variance stabilization)


theme=c("gg","bw","classic")   choose one.
plot.lines=!anscombe           draw line graph, TRUE or FALSE, default opposite of anscombe=.
plot.bars=anscombe             draw bar graph, TRUE or FALSE, default same as anscombe=.
plot.facet=F                   draw multiple sets in separate facets
plot.lines.width=.5            ...
plot.peaks=!anscombe
plot.peak.labels=plot.peaks
plot.peak.points=plot.peaks
plot.peak.ranges=plot.peaks
peak.size.min=10000
peak.label.angle=0
peak.label.hjust=.5
peak.label.size=3
width=10
height=6
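
The `anscombe=` option refers to the Anscombe transform, which for a Poisson-like count x is classically defined as 2*sqrt(x + 3/8). The sketch below applies that textbook formula to the count column of a two-column histogram so the effect can be inspected outside the plot. Whether kmer-plot uses exactly this formula, and applies it to this column, is an assumption; `anscombe_histo` is a hypothetical helper name and not part of kmer-scripts.

```shell
# anscombe_histo FILE
# Reads a whitespace-separated two-column histogram (coverage, count) and
# prints coverage, count and the transformed count 2*sqrt(count + 3/8),
# separated by tabs. Assumption: the transform is applied to the count column.
anscombe_histo() {
  LC_ALL=C awk '{ printf "%s\t%s\t%.4f\n", $1, $2, 2 * sqrt($2 + 3/8) }' "$1"
}
```

For example, `anscombe_histo e-coli.tsv | head` would preview the transformed counts of a histogram written by `jellyfish histo`.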

# gccov mode
kmer-plot gccov
[name=]FILE         character separated file with one row per contig, 4 or 5
                    columns: (id length GC coverage [taxonomic-group])

sep="\t"            column-separator
out="gccov.pdf"     output file name, supports .png, .pdf, .ps.
coverage.max=300    coverage axis maximum, 0 is data max.
length.min=1000     minimum length of contigs to display/use in sum of length
tax.occ.min=1       minimum occurrence of a taxon to display
bin.num=100         number of bins for total length histogram
tax.ignore=FALSE    ignore taxonomy column (color by length)
length.min.scatter=0     minimum length for contigs to display in scatter plot,
                         still counted in sum of length
jitter=FALSE        add jitter to better visualize fully overlapping data points
sample.scatter=0    only display this random fraction of data points in scatter plot
width=10
height=6
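
Since gccov mode expects every row to have 4 or 5 columns (id, length, GC, coverage, optional taxonomic group), a quick field-count check before plotting can catch malformed tables. The following is a minimal sketch, assuming a file without a header line; `check_gccov_table` is a hypothetical helper name and not part of kmer-scripts.

```shell
# check_gccov_table FILE [SEP]
# Reports every row that does not have 4 or 5 fields and returns a non-zero
# status if at least one such row is found. SEP defaults to a tab, matching
# the documented default of sep="\t".
check_gccov_table() {
  sep='\t'
  if [ "$#" -ge 2 ]; then sep=$2; fi
  awk -F "$sep" '
    NF != 4 && NF != 5 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

A call such as `check_gccov_table contigs.tsv && kmer-plot gccov contigs.tsv` would only plot when all rows pass the check; `contigs.tsv` is a placeholder file name.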

Examples

kcov Distribution of 19-mers in 5*10^6 reads of E. coli K-12
# download http://www.ncbi.nlm.nih.gov/sra/?term=ERR008613
# get 500 Mbp read data
head -n 10000000 ../raw/ERR008613_1.fastq > ERR008613-h10M_1.fq
head -n 10000000 ../raw/ERR008613_2.fastq > ERR008613-h10M_2.fq
# count kmers
jellyfish count -t 10 -C -m 19 -s 1G ERR008613-h10M_* -o e-coli.jf
jellyfish histo e-coli.jf > e-coli.tsv

# plot distribution
kmer-plot kcov e-coli.tsv plot.peaks=F theme=gg coverage.max=200 width=5 height=3 out=e-coli-kcov.png

sample/e-coli-kcov.png

kcov Random data with two kmer populations
kmer-plot kcov AB-m19.tsv coverage.max=1200 height=3 width=9 anscombe=F out=AB-kcov.png
kmer-plot kcov AB-m19.tsv plot.peaks=T coverage.max=1200 height=3 width=9 anscombe=T out=AB-kcov-ansc.png

sample/AB-kcov.png sample/AB-kcov-ansc.png

kcov Overlay / facetting of distributions of two genomic samples
kmer-plot kcov width=7 height=3 out=dm-kcov.png \
 \"diploid-clonal\"=dm-il-raw-m19-histo.tsv \
 \"diploid-mix\"=dm-gen-bgi-pe-raw-m19.tsv

kmer-plot kcov width=8 height=5 anscombe=T plot.facet=TRUE out=dm-kcov-facet.png \
 \"diploid-clonal\"=dm-il-raw-m19-histo.tsv \
 \"diploid-mix\"=dm-gen-bgi-pe-raw-m19.tsv

sample/dm-kcov.png sample/dm-kcov-facet.png

gccov GC-coverage plot for (meta-) genome assemblies
kmer-plot gccov <(cut -f1-4 tg.genome-nn-stats-franks-taxa.tsv) palette=Dark2 coverage.max=200 out=tg-gccov.png
kmer-plot gccov <(cut -f1- tg.genome-nn-stats-franks-taxa.tsv) palette=Dark2 coverage.max=200 out=tg-gccov-tax.png

sample/tg-gccov.png sample/tg-gccov-tax.png

kmer-scripts's People

Contributors

thackl

kmer-scripts's Issues

data frame error message in kmer-plot kcov

error

I get a strange error message when running kmer-plot kcov. Do you have any idea what could be causing this?

reading kmers:  AV-28x.jf
NULL
Error in `$<-.data.frame`(`*tmp*`, "cnt.a", value = NA_real_) :
  replacement has 1 row, data has 0
Calls: kcov ... FUN -> FUN -> peaks_refine -> $<- -> $<-.data.frame
Execution halted

some more details

  • I am running the same analysis for five different species. Four of them work fine and one causes the error.
  • I used jellyfish count (version 2.2.6) to create the .jf hash file and kmer-plot kcov (pulled from master today) for plotting
  • jellyfish does not produce any error message
  • The dataset I want to analyze is downsampled from a larger sequencing library. I do not get an error message when running the same analysis on the complete dataset

Do you have any idea what might be going on?

Reads not being written to disk

Hi Thomas,

Trying to filter reads with:
kmer-filter -k mer.jlf -1 R1.fq -2 R2.fq -o merFilter -n 50 -u 15 -l 5 -m 27

The script runs with no errors, but at the end the log indicates:
[01-10 18:58:40] [kfr] Using binaries: jellyfish
[01-10 18:58:40] [kfr] Loading kmer hash
[01-10 19:16:55] [kfr] 335465346 distinct kmers loaded
[01-10 19:16:55] [kfr] Filtering: paired end mode
[01-10 19:16:55] [kfr] Files: /tmp/R1.fq /tmp/R2.fq
[01-11 06:18:26] [kfr] Kept 20908042 of 65046198 (32.1%) reads/pairs

But my output files contain only a small fraction of the 20908042 reads/pairs that the log reports as kept.
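
One way to compare what is on disk with the "Kept ... reads/pairs" line in the log is to count the FASTQ records in each output file directly. The sketch below assumes uncompressed FASTQ with exactly four lines per record; `fastq_count` is a hypothetical helper name, and the output file names that kmer-filter derives from `-o merFilter` are not stated here, so the name in the comment is a placeholder.

```shell
# fastq_count FILE
# Prints the number of records in an uncompressed FASTQ file, assuming
# exactly four lines per record (no wrapped sequence or quality lines).
fastq_count() {
  awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Example call (placeholder file name, substitute the files actually written):
#   fastq_count merFilter_1.fq
```

If the count for each file is far below the number in the log, the discrepancy lies in writing the output rather than in the filtering statistics.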

Add license

Hi Thomas,

can you please assign a license to this repository? I would prefer MIT, but I am also happy with any other Open Source license.

Thanks :-)

Required libraries

Hello.
Thanks for sharing this tool. It would be helpful if you could list the .pm libraries required to run the programs. I noticed that some of them are available at https://github.com/BioInf-Wuerzburg

However, I'm not sure whether the Jellyfish.pm library is the same as the Perl wrapper one gets after compiling jellyfish.
Thanks!
