deltamp's People

Contributors

lentendu

Forkers

a-h-b

deltamp's Issues

Test datasets

In GitLab by @lentendu on Feb 15, 2018, 22:00

  • test directory
  • configuration files for test datasets (one for 454 and one for Illumina)

Create a DeltaMP module by make

In GitLab by @lentendu on Feb 6, 2018, 16:23

  • approximately contains:
#%Module1.0

module-whatis   "DeltaMP version ${VERSION[DELTAMP]}"

module load xxx
[..]
prepend-path    PATH CLONE_DIRECTORY_PATH/DeltaMP/${VERSION[DELTAMP]}/bin
  • the modulefiles directory path needs to be read from a configuration file (e.g. config.txt)
  • the module(s) containing all dependencies need to be read from a configuration file (e.g. config.txt)
  • make files and step scripts requesting the module to be loaded
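A minimal sketch of how the make step might write such a modulefile. It assumes a `KEY=value` style config.txt; the key names `MODULEFILES_DIR` and `DEPENDENCY_MODULES` and the function names are hypothetical, chosen only for illustration.

```shell
# Sketch only: the layout and key names of config.txt are assumptions.

# Print the value of KEY from a KEY=value configuration file.
get_config() {
    local key="$1" file="$2"
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# write_modulefile CONFIG CLONE_DIR VERSION
# Writes <MODULEFILES_DIR>/DeltaMP/<VERSION> and prints its path.
write_modulefile() {
    local config="$1" clone_dir="$2" version="$3"
    local moddir deps dep out
    moddir=$(get_config MODULEFILES_DIR "$config")
    deps=$(get_config DEPENDENCY_MODULES "$config")
    [ -n "$moddir" ] || { echo "MODULEFILES_DIR not set in $config" >&2; return 1; }
    mkdir -p "$moddir/DeltaMP" || return 1
    out="$moddir/DeltaMP/$version"
    {
        printf '#%%Module1.0\n\n'
        printf 'module-whatis   "DeltaMP version %s"\n\n' "$version"
        # one "module load" line per dependency listed in the config file
        for dep in $deps; do
            printf 'module load %s\n' "$dep"
        done
        printf 'prepend-path    PATH %s/DeltaMP/%s/bin\n' "$clone_dir" "$version"
    } > "$out"
    echo "$out"
}
```

Called as `write_modulefile config.txt CLONE_DIRECTORY_PATH VERSION`, this keeps both the modulefiles location and the dependency list out of the scripts themselves.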

repair integration of mcl

In GitLab by @ahb-ufz on Sep 18, 2018, 11:12

After MCL clustering, the taxonomy displayed in the all_OTUs table and the taxonomy files don't agree. Likely, the taxonomy data in the all_OTUs table are wrong.

A config file to check if this error is reproduced is attached.
configuration.cerc_dnY_preclC_chiB_clM_dbredY_454_18S_4117590897.tsv

some detail:
-> unique.mcl.pick.0.wang.cons.taxo has OTU, repseq and tax string; its taxonomy-repseq link agrees with the all_OTUs.tsv, but its OTU-taxonomy and OTU-repseq links don't

-> the OTU-taxonomy link in unique.mcl.pick.0.wang.cons.taxonomy doesn't agree with the all_OTUs.tsv

-> the repseq-taxonomy link in unique.mcl.pick.0.wang.taxonomy doesn't agree with the all_OTUs.tsv

-> the bottom of the all_OTUs.tsv is missing taxonomic annotation.

-> the reason seems to be that the .list file doesn't feature the repseq as first member in all cases. Later on there is no real joining, but pasting, which messes up the results
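The difference between pasting and a real join can be sketched as follows. The two-column, tab-separated layouts of both inputs and the function name `join_by_repseq` are assumptions for illustration, not the actual DeltaMP file formats.

```shell
# join_by_repseq OTU_TABLE TAXONOMY
# OTU_TABLE: tab-separated columns OTU, repseq      (hypothetical layout)
# TAXONOMY:  tab-separated columns repseq, taxonomy (hypothetical layout)
# Looks up the taxonomy by repseq key, so row order in either file is irrelevant.
join_by_repseq() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { tax[$1] = $2; next }   # first file: taxonomy indexed by repseq
        { print $1, $2, (($2 in tax) ? tax[$2] : "NA") }
    ' "$2" "$1"
}
```

Unlike `paste`, a keyed lookup cannot silently shift annotations when the two files are sorted differently, and OTUs without a match are marked explicitly instead of being left blank at the bottom of the table.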

Add trimming methods based on maximum error rate for Illumina

In GitLab by @lentendu on Jun 13, 2018, 15:19

  • vsearch --fastq_filter (maxEE)

  • DADA2 as a replacement for trimming and pre-clustering. Problem: it should be applied to primer-clipped, unpaired reads (between the Illumina_fastq and Illumina_pair_end steps, skipping the Illumina_raw_stats step), which would require a significant change in the workflow.
    This could look like the following (for each library in an array job):

Rscript --vanilla dada2_wrap.R $LIB

and the content of dada2_wrap.R:

library(dada2)
# library name: first argument after the script name
samp<-commandArgs(trailingOnly=TRUE)[1]
fnFs<-paste0(samp,".fwd.fastq")
fnRs<-paste0(samp,".rvs.fastq")
pdf(paste0("dada_qual_",samp,".pdf"),width=10,height=5)
plotQualityProfile(c(fnFs,fnRs))
dev.off()
#some tricks to get optimal truncation length (length reached by at least 80 % of reads)
filtFs<-sub("\\.fastq","\\.filter\\.fastq",fnFs)
filtRs<-sub("\\.fastq","\\.filter\\.fastq",fnRs)
out<-filterAndTrim(fnFs,filtFs,fnRs,filtRs,truncLen=c(260,250),maxEE=c(5,5))
# some exports of filtering read counts
errF<-learnErrors(filtFs)
errR<-learnErrors(filtRs)
# pdf(paste0("dada_err_",samp,".pdf"),height=6,width=6)
# plotErrors(errF, nominalQ=TRUE)
# dev.off()
derepFs <- derepFastq(filtFs)
derepRs <- derepFastq(filtRs)
# some exports of dereplicated read counts
dadaFs <- dada(derepFs, err=errF)
dadaRs <- dada(derepRs, err=errR)
# some exports of ASV counts
# format the first two columns ([[2]][,1:2]) of the dadaFs and dadaRs results to fasta files
# add sequence tracking to name ASV sequences properly and create mothur like names files (if necessary...)
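For the first option, the maximum expected error criterion can be illustrated without vsearch. The sketch below assumes 4-line FASTQ records with Phred+33 qualities; the function name `filter_maxee` is hypothetical. The expected error of a read is the sum of 10^(-Q/10) over its bases, and reads above the threshold are discarded.

```shell
# filter_maxee MAXEE < in.fastq > out.fastq
# Keeps only reads whose expected number of errors is <= MAXEE.
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
filter_maxee() {
    awk -v maxee="$1" '
        BEGIN {
            # lookup table: quality character -> Phred score
            for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
        }
        { rec[NR % 4] = $0 }
        NR % 4 == 0 {
            ee = 0
            n = length($0)
            for (i = 1; i <= n; i++)
                ee += exp(-phred[substr($0, i, 1)] / 10 * log(10))   # 10^(-Q/10)
            if (ee <= maxee)
                printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
        }
    '
}

# With vsearch itself, the equivalent filter would be along the lines of:
#   vsearch --fastq_filter in.fastq --fastq_maxee 1.0 --fastqout out.fastq
```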

License header

In GitLab by @lentendu on Feb 6, 2018, 16:49

Insert a reminder of the GNU licence at the beginning of each script

Database selection in the configuration file

In GitLab by @lentendu on Feb 14, 2018, 17:25

  • No default database set in deltamp
  • The path to the database directory and the database prefix $DB need to be provided in the configuration file to match $DB.fasta and $DB.taxonomy
  • The aligned version of the database will be searched under $DB.align.fasta (e.g. for SILVA database)
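A minimal sketch of the corresponding check at pipeline start. The function name `check_database` and its return convention are assumptions; only the `$DB.fasta`, `$DB.taxonomy` and `$DB.align.fasta` file names come from the points above.

```shell
# check_database DB_DIR DB_PREFIX
# Fails if $DB.fasta or $DB.taxonomy is missing or empty; otherwise prints
# "aligned" when $DB.align.fasta is also available, "unaligned" when not.
check_database() {
    local dir="$1" db="$2" ext missing=0
    for ext in fasta taxonomy; do
        if [ ! -s "$dir/$db.$ext" ]; then
            echo "missing or empty: $dir/$db.$ext" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1
    if [ -s "$dir/$db.align.fasta" ]; then
        echo "aligned"
    else
        echo "unaligned"
    fi
}
```

Since no default database is set, failing early with the name of the missing file is likely more useful than an error from a later step.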

DeltaMP configuration file example

In GitLab by @lentendu on Feb 6, 2018, 16:57

Provide example configuration files needed to reproduce the full bioinformatic workflow of published studies, at least one per target gene (16S, 18S, ITS, COI)

Clip primers with linked strategy with cutadapt, and check for reverse-complement sequence orientation

In GitLab by @lentendu on Sep 14, 2018, 15:24

Use the linked strategy with an anchored 5' adapter and a non-anchored 3' adapter, which works in both situations: read-through (traversed) sequencing and partial sequencing.

Libraries often contain reads oriented in both directions, with, for example, half of the R1 library having the forward primer at the 5' end and half having the reverse primer at the 5' end.
So each library needs to be checked for both directions.
If a library contains reads in only one direction, this will have no effect.
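The two adapter strings needed to cover both orientations can be sketched as below. The function names are hypothetical; the `^FWD...REVCOMP` form follows cutadapt's linked-adapter notation, where `^` anchors the 5' adapter and the 3' adapter is left non-anchored.

```shell
# Reverse-complement a primer, including IUPAC ambiguity codes.
revcomp() {
    printf '%s\n' "$1" |
        awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' |
        tr 'ACGTRYKMBDHVacgtrykmbdhv' 'TGCAYRMKVHDBtgcayrmkvhdb'
}

# linked_adapters FWD_PRIMER REV_PRIMER
# Prints one linked-adapter string per read orientation:
#   line 1: forward primer at the 5' end, reverse-complemented reverse primer at 3'
#   line 2: reverse primer at the 5' end, reverse-complemented forward primer at 3'
linked_adapters() {
    local fwd="$1" rvs="$2"
    printf '^%s...%s\n' "$fwd" "$(revcomp "$rvs")"
    printf '^%s...%s\n' "$rvs" "$(revcomp "$fwd")"
}

# Possible use (sketch): pass each string to cutadapt with its own -a option,
#   cutadapt -a "<line 1>" -a "<line 2>" -o clipped.fastq reads.fastq
# and then reverse-complement the reads that matched the second adapter.
```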

Add archiving script

In GitLab by @lentendu on Feb 15, 2018, 09:53

  • script to tar.gz archive and MD5sum check outputs, processing files and demultiplexed read files (if needed)
  • add archiving to queueing commands in pipeline master
  • control symlinking of raw reads in DeltaMP main
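A minimal sketch of the archiving step for one directory, assuming GNU `tar` and `md5sum` are available. The function name `archive_dir` and the `.md5` file naming are illustrative choices.

```shell
# archive_dir DIR
# Creates DIR.tar.gz next to DIR, writes its MD5 sum to DIR.tar.gz.md5
# and verifies the archive against that checksum.
archive_dir() {
    local dir="${1%/}" parent name
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # -C keeps paths inside the archive relative to the parent directory
    tar -czf "$dir.tar.gz" -C "$parent" "$name" || return 1
    (
        cd "$parent" &&
        md5sum "$name.tar.gz" > "$name.tar.gz.md5" &&
        md5sum -c "$name.tar.gz.md5" > /dev/null
    )
}
```

The same function could be called once each for outputs, processing files and, if needed, demultiplexed read files.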

check header of archived input for compliance with pandaseq (Illumina)

In GitLab by @ahb-ufz on Mar 15, 2018, 15:22

pandaseq is very strict about the read header format, and many datasets in ENA have headers that do not conform to it. We could include
pandaseq-checkid
to check the format after the download and abort the pipeline with a report of the problem. At the moment the failure is not always transparent: sometimes a BADID warning is issued, and sometimes it is not and the pipeline simply gets stuck in the quality step because no reads were merged.
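The abort-early idea can be sketched with a plain pattern check. This is a stand-in, not pandaseq's own parser: it only accepts headers in the CASAVA 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`), and the function name `check_fastq_headers` is hypothetical.

```shell
# check_fastq_headers FASTQ
# Returns non-zero and reports the first header line that does not look like
# a CASAVA 1.8+ Illumina header. Assumes 4-line FASTQ records.
check_fastq_headers() {
    awk '
        NR % 4 == 1 && $0 !~ /^@[^: ]+:[0-9]+:[^: ]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+ [12]:[YN]:[0-9]+:[^ ]*$/ {
            printf "line %d: unexpected header: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
            exit
        }
        END { exit bad + 0 }
    ' "$1"
}

# Possible use right after the download step:
#   check_fastq_headers "$LIB.fastq" || { echo "header format problem, aborting" >&2; exit 1; }
```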

Allow DBCHOP without DBALIGN

In GitLab by @lentendu on Sep 19, 2018, 20:51

No need for the full database aligned version if the aligned version for the region between the primers is available.

add size information to fasta file for sumaclust or remove -s size from sumaclust command

In GitLab by @ahb-ufz on Sep 18, 2018, 12:02

The workflow fails in the OTU step with sumaclust because *unique.sort.fasta has no size attribute in the header, which the command requires. Either the size could be added to the fasta file, or the "-s size" option could be removed from l. 118 of the OTU script.
(The second option would mean sorting by count, which is reasonable; the workflow then runs to the end with the default setting.)
@lentendu, you decide.
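The first option could look roughly like this. It assumes the read counts come from a mothur-style names file (representative name, tab, comma-separated member names) and that sumaclust reads the attribute as `size=N;` in the header; both points and the function name `add_size` are assumptions to be checked against the actual files.

```shell
# add_size FASTA NAMES
# Appends " size=N;" to every fasta header, where N is the number of
# comma-separated members listed for that sequence in NAMES.
# Sequences absent from NAMES get size=1.
add_size() {
    awk '
        NR == FNR { size[$1] = split($2, members, ","); next }   # first file: NAMES
        /^>/ {
            id = substr($1, 2)
            print $0 " size=" ((id in size) ? size[id] : 1) ";"
            next
        }
        { print }
    ' "$2" "$1"
}
```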

Database formatting scripts

In GitLab by @lentendu on Feb 6, 2018, 16:59

Provide auxiliary scripts to download and format reference databases (SILVA, UNITE, PR2, GenBank query search, etc.).

ITSx

In GitLab by @lentendu on Feb 6, 2018, 16:51

Integrate ITSx at trim step for fungal ITS
