lentendu / deltamp
A flexible, reproducible and resource efficient metabarcoding amplicon pipeline for HPC
License: GNU General Public License v3.0
In GitLab by @lentendu on Jun 22, 2018, 14:35
In GitLab by @lentendu on Feb 15, 2018, 22:00
In GitLab by @lentendu on Sep 17, 2018, 15:14
In GitLab by @ahb-ufz on Sep 18, 2018, 11:15
There's code to use vsearch rather than the Bayesian classifier for taxonomy, but no field for it in the example config file.
Is the code ready to go?
In GitLab by @lentendu on Sep 19, 2018, 21:16
At the end of pipeline_master, echo all jobids and their corresponding step names (stored in variables) into a file:
echo -e "get\t${get_jobid}#${TECH}_${RAW_EXT}\t${raw_jobid} ... " | tr "#" "\n" > config/jobid
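A minimal sketch of how such a file could be written and read back, assuming two steps only. The job IDs, the `TECH` and `RAW_EXT` values and the temporary output directory are hypothetical placeholders, and `printf` is swapped in for `echo -e` purely for portability:

```shell
# Hypothetical values standing in for what pipeline_master would hold
get_jobid=1670590
raw_jobid=1670598
TECH=Illumina
RAW_EXT=fastq

outdir=$(mktemp -d)   # stands in for the config/ directory

# One "step<TAB>jobid" record per line; "#" is the temporary record separator
printf 'get\t%s#%s_%s\t%s\n' "$get_jobid" "$TECH" "$RAW_EXT" "$raw_jobid" \
  | tr "#" "\n" > "$outdir/jobid"

# Read a job ID back by step name
step_jobid=$(awk -F'\t' -v step="${TECH}_${RAW_EXT}" '$1 == step {print $2}' "$outdir/jobid")
echo "$step_jobid"
```

Keeping one step per line makes the file easy to query later with awk or grep, for instance from a status or cancel utility.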
In GitLab by @lentendu on Jun 15, 2018, 18:18
Good practice would be to first run with the quality-only option, then inspect the quality of the reads and choose the position at which quality drops as the cut position (as long as this keeps enough nt to allow proper paired-end merging).
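The choice of cut position could be sketched as below, assuming a two-column table of position and mean Phred quality is available from the quality-only run. The table contents, the quality threshold `min_q` and the minimum length `min_len` are all hypothetical and would need tuning per dataset:

```shell
# Hypothetical input: position <TAB> mean Phred quality at that position
qual_table=$(mktemp)
printf '%s\t%s\n' 1 36 2 36 3 35 4 30 5 24 6 19 7 15 > "$qual_table"

min_q=25    # assumed quality threshold
min_len=4   # assumed minimum length still allowing paired-end merging

# Last position before the mean quality first falls below the threshold;
# if it never does, keep the full length
trunc_len=$(awk -F'\t' -v q="$min_q" '
  $2 < q { print $1 - 1; found = 1; exit }
  END    { if (!found) print NR }' "$qual_table")

if [ "$trunc_len" -lt "$min_len" ]; then
  echo "truncating at $trunc_len nt would leave too little sequence for merging" >&2
else
  echo "truncate at position $trunc_len"
fi
```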
In GitLab by @lentendu on Jun 12, 2018, 10:19
Illumina_fastq and Illumina_pair_end
In GitLab by @lentendu on Feb 6, 2018, 16:23
#%Module1.0
module-whatis "DeltaMP version ${VERSION[DELTAMP]}"
module load xxx
[..]
prepend-path PATH CLONE_DIRECTORY_PATH/DeltaMP/${VERSION[DELTAMP]}/bin
In GitLab by @ahb-ufz on Sep 18, 2018, 11:12
After MCL clustering, the taxonomy displayed in the all_OTUs table and the taxonomy files don't agree. Likely, the taxonomy data in the all_OTUs table are wrong.
A config file to check if this error is reproduced is attached.
configuration.cerc_dnY_preclC_chiB_clM_dbredY_454_18S_4117590897.tsv
Some details:
-> unique.mcl.pick.0.wang.cons.taxo has OTU, repseq and taxonomy string; its taxonomy-repseq links agree with all_OTUs.tsv, but its OTU-taxonomy and OTU-repseq links don't
-> the OTU-taxonomy link in unique.mcl.pick.0.wang.cons.taxonomy doesn't agree with all_OTUs.tsv
-> the repseq-taxonomy link in unique.mcl.pick.0.wang.taxonomy doesn't agree with all_OTUs.tsv
-> the bottom of all_OTUs.tsv is missing taxonomic annotation
-> the reason seems to be that the .list file doesn't always have the repseq as first member. Later on there is no real joining, only pasting, which messes up the results
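To illustrate the last point, here is a small sketch contrasting `paste` with a key-based `join`. The two tables, their file names and the taxonomy strings are made up for the example and are not taken from the pipeline:

```shell
tmp=$(mktemp -d)
TAB=$(printf '\t')

# Hypothetical OTU -> repseq and repseq -> taxonomy tables, in different row orders
printf 'Otu1\tseqA\nOtu2\tseqB\nOtu3\tseqC\n' > "$tmp/otu_repseq.tsv"
printf 'seqC\tEukaryota;Fungi\nseqA\tEukaryota;Alveolata\nseqB\tEukaryota;Rhizaria\n' > "$tmp/repseq_taxo.tsv"

# paste pairs rows by position only, so Otu1 gets the taxonomy of seqC
pasted=$(paste "$tmp/otu_repseq.tsv" "$tmp/repseq_taxo.tsv" | cut -f 1,2,4)

# join pairs rows by repseq name; both inputs must be sorted on the key
LC_ALL=C sort -t "$TAB" -k 2,2 "$tmp/otu_repseq.tsv" > "$tmp/otu_repseq.sorted"
LC_ALL=C sort -t "$TAB" -k 1,1 "$tmp/repseq_taxo.tsv" > "$tmp/repseq_taxo.sorted"
joined=$(LC_ALL=C join -t "$TAB" -1 2 -2 1 -o 1.1,1.2,2.2 \
  "$tmp/otu_repseq.sorted" "$tmp/repseq_taxo.sorted")

echo "$joined"
```

Joining on the repseq name stays correct regardless of where the repseq sits in the .list file, which pasting cannot guarantee.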
In GitLab by @lentendu on May 24, 2018, 14:39
During cut_db step
In GitLab by @lentendu on Jun 13, 2018, 15:19
vsearch --fastq_filter (maxEE)
DADA2 as a replacement for trimming and pre-clustering. Problem: it should be applied to primer-clipped unpaired reads (between the Illumina_fastq and Illumina_pair_end steps, skipping the Illumina_raw_stats step) and would need a significant change in the workflow.
This could look like this (for each library in an array job):
Rscript --vanilla dada2_wrap.R $LIB
and the content of dada2_wrap.R:
library(dada2)
samp<-commandArgs(trailingOnly=TRUE)[1] # library name, first argument after the script name
fnFs<-paste0(samp,".fwd.fastq")
fnRs<-paste0(samp,".rvs.fastq")
pdf(paste0("dada_qual_",samp,".pdf"),width=10,height=5)
plotQualityProfile(c(fnFs,fnRs))
dev.off()
#some tricks to get optimal truncation length (length reached by at least 80 % of reads)
filtFs<-sub("\\.fastq","\\.filter\\.fastq",fnFs)
filtRs<-sub("\\.fastq","\\.filter\\.fastq",fnRs)
out<-filterAndTrim(fnFs,filtFs,fnRs,filtRs,truncLen=c(260,250),maxEE=c(5,5))
# some exports of filtering read counts
errF<-learnErrors(filtFs)
errR<-learnErrors(filtRs)
# pdf(paste0("dada_err_",samp,".pdf"),height=6,width=6)
# plotErrors(errF, nominalQ=TRUE)
# dev.off()
derepFs <- derepFastq(filtFs)
derepRs <- derepFastq(filtRs)
# some exports of dereplicated read counts
dadaFs <- dada(derepFs, err=errF)
dadaRs <- dada(derepRs, err=errR)
# some exports of ASV counts
# format dadaFs@.Data[[2]][,1:2] and dadaRs@.Data[[2]][,1:2] to fasta files
# add sequence tracking to name ASV sequences properly and create mothur like names files (if necessary...)
In GitLab by @lentendu on Feb 16, 2018, 11:23
In GitLab by @lentendu on Feb 6, 2018, 16:49
insert recall of GNU licence at the beginning of each script
In GitLab by @lentendu on Aug 8, 2018, 17:04
Only variables containing the jobids of the dependent steps are used in steps.final, no jobnames.
In GitLab by @lentendu on Sep 19, 2018, 14:27
In GitLab by @lentendu on Sep 19, 2018, 16:35
In GitLab by @lentendu on Feb 6, 2018, 16:43
In GitLab by @lentendu on Feb 14, 2018, 17:25
In GitLab by @lentendu on Jun 12, 2018, 13:59
In GitLab by @lentendu on Feb 6, 2018, 16:57
Provide the example configuration files needed to reproduce the full bioinformatic workflow of published studies, at least one per target gene (16S, 18S, ITS, COI)
In GitLab by @lentendu on Mar 7, 2018, 11:29
In GitLab by @lentendu on Sep 19, 2018, 16:24
no value for numOtus on second line
no otu label on first line
In GitLab by @lentendu on May 24, 2018, 16:21
SOP and paper based
In GitLab by @lentendu on Feb 15, 2018, 14:20
Allow selection between MOTHUR pre.cluster or cd-hit-454 for preclustering step
In GitLab by @lentendu on Feb 6, 2018, 16:54
In GitLab by @lentendu on May 30, 2018, 13:52
In GitLab by @lentendu on Sep 14, 2018, 15:24
Linked strategy with an anchored 5' adapter and a non-anchored 3' adapter, which works in both situations: traversed sequencing and partial sequencing.
Libraries often contain reads oriented in both directions, with, for example, half of the R1 library having the forward primer at the 5' end and half having the reverse primer at the 5' end.
So each library needs to be checked for both directions.
If a library contains reads in only one direction, this will have no effect.
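A rough sketch of how the two linked adapter strings could be built, one per orientation. The primer sequences are hypothetical, and the `^FIVEPRIME...THREEPRIME` notation follows cutadapt's linked-adapter syntax, which is an assumption since the tool is not named here:

```shell
# Reverse complement, including IUPAC ambiguity codes (S, W and N are self-complementary)
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

FWD=GTGYCAG   # hypothetical forward primer
REV=GGACTAC   # hypothetical reverse primer

# Anchored 5' primer ("^"), linked ("...") to the reverse complement of the other primer
fwd_first="^${FWD}...$(revcomp "$REV")"
rev_first="^${REV}...$(revcomp "$FWD")"

echo "$fwd_first"
echo "$rev_first"
```

Each library would then be searched with both strings, so reads starting with either primer are recovered.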
In GitLab by @lentendu on Feb 15, 2018, 09:53
In GitLab by @ahb-ufz on Mar 15, 2018, 15:22
As pandaseq is very strict about the format of the header, many datasets in ENA have headers that don't conform to it. We could include
pandaseq-checkid
to check the format after the download and abort the pipeline with a report of the problem, because at the moment the failure is not always transparent (sometimes a BADID warning is issued, sometimes it isn't and the pipeline only gets stuck in the quality step when no reads are merged).
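As a stand-in for that check, the sketch below tests fastq headers against two common Illumina layouts and returns a non-zero status on the first mismatch. The regular expressions only approximate those layouts and are not pandaseq's actual parser; the function name and the example records are hypothetical:

```shell
# Return 0 if every fastq header matches one of two assumed Illumina layouts
check_fastq_ids() {
  awk -v new='^@[^: ]+:[0-9]+:[^: ]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+ [12]:[YN]:[0-9]+:' \
      -v old='^@[^: ]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+#[^/ ]+/[12]$' '
    NR % 4 == 1 && $0 !~ new && $0 !~ old {
      printf "line %d: unexpected header: %s\n", NR, $0
      bad = 1
      exit
    }
    END { exit bad + 0 }' "$1" >&2
}

good_fq=$(mktemp); bad_fq=$(mktemp)
printf '@M01:12:000000000-A1B2C:1:1101:15000:1500 1:N:0:1\nACGT\n+\nIIII\n' > "$good_fq"
printf '@ERR000001.1 1/1\nACGT\n+\nIIII\n' > "$bad_fq"

check_fastq_ids "$good_fq" && echo "headers ok"
check_fastq_ids "$bad_fq" || echo "aborting: headers not in the expected format"
```

Run right after the download step, a failing check would stop the pipeline with an explicit message instead of letting it stall later.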
In GitLab by @lentendu on Aug 8, 2018, 15:40
Has to output full jobnames, jobids, status, etc.
In GitLab by @lentendu on Feb 6, 2018, 16:47
In GitLab by @lentendu on May 24, 2018, 16:15
In GitLab by @lentendu on Sep 19, 2018, 20:51
No need for the full database aligned version if the aligned version for the region between the primers is available.
In GitLab by @ahb-ufz on Sep 18, 2018, 12:02
The workflow runs into an error in the OTU step with sumaclust, because *unique.sort.fasta has no size attribute in the header, which the command requires. Either the size could be added to the fasta file, or the "-s size" could be removed from l. 118 of the OTU script.
(The second option would mean sorting by count, which is reasonable; the workflow runs to the end with the default setting.)
@lentendu, you decide.
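For the first option, a sketch of adding the size to the fasta headers from a mothur-style names file (representative name, then a comma-separated member list). The file names and sequences are hypothetical, and the `size=N;` header layout is an assumption about what sumaclust reads with "-s size", to be checked against its documentation:

```shell
tmp=$(mktemp -d)

# Hypothetical names file and dereplicated fasta
printf 'seqA\tseqA,seqD,seqE\nseqB\tseqB\n' > "$tmp/example.names"
printf '>seqA\nACGTACGT\n>seqB\nACGTTCGT\n' > "$tmp/example.unique.sort.fasta"

# First file: count members per representative; second file: append the count to each header
awk -F'\t' '
  NR == FNR { n[$1] = split($2, members, ","); next }
  /^>/      { id = substr($1, 2); print ">" id " size=" ((id in n) ? n[id] : 1) ";"; next }
            { print }' "$tmp/example.names" "$tmp/example.unique.sort.fasta" \
  > "$tmp/example.size.fasta"

cat "$tmp/example.size.fasta"
```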
In GitLab by @lentendu on Jun 13, 2018, 01:02
Use Swarm for (pre)clustering, without fastidious and OTU breaking options
In GitLab by @lentendu on Jun 15, 2018, 09:39
In GitLab by @lentendu on Mar 7, 2018, 22:27
In GitLab by @lentendu on Feb 15, 2018, 18:11
Describe dependencies and installation information
In GitLab by @lentendu on Sep 14, 2018, 17:29
Not found yet, could be the one from Christina's paper
In GitLab by @lentendu on Apr 26, 2018, 14:44
https://en.wikipedia.org/wiki/Lua_(programming_language)
https://wiki.ufz.de/eve/index.php/Lmod
In GitLab by @lentendu on Feb 6, 2018, 16:59
Provide auxiliary scripts to download and format reference databases (SILVA, UNITE, PR2, GenBank query search, etc.).
In GitLab by @ahb-ufz on Sep 18, 2018, 11:17
swarm dies on the Ns introduced during denoising, so either this combination should not be allowed, or the Ns should be removed between the two steps
In GitLab by @lentendu on Feb 16, 2018, 10:58
Also check if the raw read archiver step needs to be queued
In GitLab by @lentendu on Jun 22, 2018, 15:06
ERROR: [hostlist.c:1737] Invalid range: `1-154%400': Invalid argument
scancel: error: Invalid job id 1670598_[1-154%400]
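One possible workaround, sketched under the assumption that the stored job specification keeps the `%400` array throttle that scancel rejects: strip the throttle, or fall back to the base job ID to cancel the whole array. The variable names are hypothetical and the scancel calls are left commented out:

```shell
jobspec='1670598_[1-154%400]'   # as stored, including the array throttle

# Drop the "%<limit>" throttle, which is only meaningful at submission time
no_throttle=$(printf '%s\n' "$jobspec" | sed 's/%[0-9][0-9]*//')

# Or keep only the base job ID, which addresses the whole array
base_id=${jobspec%%_*}

echo "$no_throttle"
echo "$base_id"

# scancel "$no_throttle"
# scancel "$base_id"
```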
In GitLab by @lentendu on Sep 14, 2018, 16:52
In GitLab by @lentendu on May 24, 2018, 16:14
so far only possible with cd-hit-est
In GitLab by @lentendu on May 8, 2018, 21:04
Option --without-progress-bar for all OBITools commands,
--quiet for vsearch, etc...
In GitLab by @lentendu on Feb 6, 2018, 16:51
Integrate ITSx at trim step for fungal ITS
In GitLab by @lentendu on Jun 12, 2018, 14:00
In GitLab by @lentendu on May 24, 2018, 16:10
During cut_db step