PEMA: a flexible Pipeline for Environmental DNA Metabarcoding Analysis of the 16S/18S rRNA, ITS and COI marker genes
It is very common that NGS data contain adapter sequences other than the ones used during library construction, mainly because sequencing centers may sequence many different libraries at the same time. This cross-contamination, and any other project-specific contamination of reads with short sequences, can be handled easily if the user has the option to provide a custom adapter file in PEMA.
Thank you for considering!
In convertIllumunaRawDataToEnaFormat.sh,
replace the regular expression with a more permissive alternative, e.g. from:
/^@M0.*/
to:
/^@[MA]0.*/
so that it also works for other instruments, such as NovaSeq.
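One detail: inside a character class the `|` is a literal, so `[MA]` is the form that matches only M or A (`[M|A]` would also match a literal pipe). A minimal Python check of the broadened pattern (the example header strings below are illustrative, not taken from the issue):

```python
import re

# Match FASTQ header lines from either MiSeq-style (@M0...) or
# NovaSeq-style (@A0...) runs. Inside a character class '|' is a
# literal character, so [MA] is used rather than [M|A].
header_re = re.compile(r"^@[MA]0.*")

assert header_re.match("@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:1")
assert header_re.match("@A00456:78:HXXXXDSXX:1:1101:2790:1000 1:N:0:ATCACG")
assert not header_re.match("@V350194505L1C001R00100000782/1")
```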
This could be very informative, especially in the case of COI samples (and, in general, useful for protein-coding markers).
A tool that could be added is metaMATE (https://github.com/tjcreedy/metamate).
Also, this publication (https://doi.org/10.1186/s12859-021-04180-x) discusses how to remove putative pseudogenes; the method (based on the NCBI ORFfinder program) is implemented in MetaWorks (https://github.com/terrimporter/MetaWorks).
It would be nice to also include in one file all the metadata required when PEMA outputs are published (e.g. to OBIS): this is just a summary of information available elsewhere in the PEMA outputs, but gathered in one file for ease of findability. I can provide guidance on the content of this when the issue is worked on.
Hello there,
I am reaching out for help regarding an error I have now faced twice; I don't know exactly what the problem is or how to solve it.
I am running my data with the latest PEMA version through Docker v3.2.2 on macOS 10.13.6.
I have 100 samples, and the problem occurs after approximately 3 days and 5 hours of running.
It looks to me like a timeout error (perhaps I am mistaken!); I have copied the error below and would be thankful if you could guide me on it.
Thanks in advance; looking forward to hearing from you.
Best Regards,
Elham
SPAdes log can be found here: /mnt/analysis/pema_result_22April2021/3.correct_by_BayesHammer/ERR0000100/spades.log
Thank you for using SPAdes!
Error correction using BayesHammer is completed!
filtered_max_ERR0000011_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz
filtered_max_ERR0000011_2.fastq.gz.2P.fastq.00.0_0.cor.fastq.gz
ERR NOFILE filtered_max_ERR0000011_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz
Too confused to continue.
Try -h for help.
Task failed:
Program & line : './pema_latest.bds', line 444
Task Name : ''
Task ID : 'pema_latest.bds.20210422_075345_852/task.pema_latest.line_444.id_659'
Task PID : '55544'
Task hint : '/home/tools/PANDAseq/bin/pandaseq -f filtered_max_ERR0000011_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz -r filtered_max_ERR0000011_2.fastq.gz.2P.fastq.00'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[]'
Output files : '[]'
Script file : '/home/pema_latest.bds.20210422_075345_852/task.pema_latest.line_444.id_659.sh'
Exit status : '1'
StdErr (10 lines) :
ERR NOFILE filtered_max_ERR0000011_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz
Too confused to continue.
Try -h for help.
Fatal error: ./pema_latest.bds, line 446, pos 4. Task/s failed.
Creating checkpoint file '/home/pema_latest.bds.line_446.chp'
Consider adding as alternatives the MIDORI dbs and
the Eukaryote CO1 Reference Set for the RDP Classifier.
This issue is about describing the parameter file required to run PEMA in a machine-interoperable way: describing the formats and including defaults for all entries in the file.
Work has begun on this, but it needs a review and completion.
@marc-portier BigDataScript supports reading json files; see here.
The readParameterFile function of pema reads the .tsv parameters file line-by-line to return a bds "dictionary".
We could edit this function to read a .json file instead and have this .json file in an RO-Crate oriented way. 😎
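As a sketch of the change (in Python rather than bds, with illustrative key names; the real readParameterFile lives in the bds code), the move amounts to swapping a line-by-line TSV parser for a json.load that yields the same dictionary:

```python
import json

def read_parameter_file_tsv(text: str) -> dict:
    """Current behaviour (sketch): parse 'key<TAB>value' lines into a dict.
    Assumes tab-separated entries, per the .tsv extension; comment lines
    starting with '#' and blank lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("\t")
        params[key] = value.strip()
    return params

def tsv_params_to_json(text: str) -> str:
    """Proposed behaviour (sketch): emit the same dictionary as JSON,
    ready to be embedded in an RO-Crate description."""
    return json.dumps(read_parameter_file_tsv(text), indent=2)

tsv = "clusteringAlgo\talgo_Swarm\npandaseqThreads\t20\n"
as_json = tsv_params_to_json(tsv)
```

On the bds side the function would then simply `json.read()` the file instead of splitting lines.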
This is about describing the PEMA outputs in a formal way, so that they are more machine-interoperable.
Similarly to the parameters.tsv file and its structured format (see issue #35), we need to do the same for the PEMA output.
We had started working on this in the PEMA's output files file.
We shall go for it step-by-step.
@cpavloud feel free to contribute 😛
Hi, I'm getting an error from pandaseq when using BGI reads. Before that, I successfully ran a small dataset from the same BGI run (different barcode).
I'm using singularity v. 3.8.6 and pema_v.2.1.4.sif.
How can I run `pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok"` using a container?
Here's the output:
ERR BADID V350194505L1C001R00100000782:::1:0:0:0: V350194505L1C001R00100000782/1 BH:ok
* * * * * Something is wrong with this ID. If tags are absent, try passing the -B option.
* * * * * Consult `pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok"` to get an idea of the problem..
Task failed:
Program & line : '/home/modules/preprocess.bds', line 334
Task Name : ''
Task ID : 'pema_latest.bds.20240326_090348_413/task.preprocess.line_334.id_15'
Task PID : '1399067'
Task hint : '/home/tools/PANDAseq/bin/pandaseq -f filtered_max_ERR0000001_1.fastq.gz.1P.fastq.00.0_0.cor.fastq.gz -r filtered_max_ERR0000001_2.fastq.gz.2P.fastq.00'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[]'
Output files : '[]'
Script file : '/home/bioinf/pema_latest.bds.20240326_090348_413/task.preprocess.line_334.id_15.sh'
Exit status : '1'
StdErr (10 lines) :
0x1677340:1 STAT READS 0
0x1677340:1 STAT NOALGN 0
0x1677340:1 STAT LOWQ 0
0x1677340:1 STAT BADR 0
0x1677340:1 STAT SLOW 0
0x1677340:1 STAT OK 0
0x1677340:1 STAT OVERLAPS 0
ERR BADID V350194505L1C001R00100000782:::1:0:0:0: V350194505L1C001R00100000782/1 BH:ok
* * * * * Something is wrong with this ID. If tags are absent, try passing the -B option.
* * * * * Consult `pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok"` to get an idea of the problem..
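For context, pandaseq parses Illumina CASAVA-style read headers, and BGI IDs like `V350194505L1C001R00100000782/1` do not carry the colon-separated fields it expects, which is what triggers `ERR BADID` (hence the suggestion to pass `-B`). The exact pattern pandaseq accepts is an assumption here; this sketch only illustrates the format mismatch:

```python
import re

# Rough shape of an Illumina CASAVA 1.8+ header:
# instrument:run:flowcell:lane:tile:x:y [optional description]
# (an approximation for illustration, not pandaseq's actual parser)
ILLUMINA_RE = re.compile(r"^@?[\w-]+:\d+:[\w-]+:\d+:\d+:\d+:\d+(\s|$)")

def looks_like_illumina_id(read_id: str) -> bool:
    return bool(ILLUMINA_RE.match(read_id))

assert looks_like_illumina_id("@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:1")
# BGI-style ID from the error above: no colon-separated fields
assert not looks_like_illumina_id("V350194505L1C001R00100000782/1 BH:ok")
```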
It would be very useful and time-saving, when we are trying to determine the appropriate parameters for each dataset, if we could run PEMA in distinct steps, so as to be able to run the analysis partially.
It would be nice to consider adding this option!
Thank you
Using the current PEMA version (pema v.2.1.3), when choosing
EnaData No
the sequence files are converted to the necessary format for PEMA to run.
During this conversion, they get new names such as ERR0000001, ERR0000002, ERR0000003 etc.
However, in the finalTable.tsv (and the other output files in the XXX_taxon_assign folder), the sample names are ERR1, ERR2, ERR3 etc.
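The mismatch suggests the zero padding is dropped somewhere downstream of the conversion step; formatting both with the same padding would keep the names consistent. Purely illustrative:

```python
def ena_style_name(index: int, width: int = 7) -> str:
    """Zero-padded ERR identifier, matching the conversion step's output."""
    return f"ERR{index:0{width}d}"

# what the conversion step produces vs. the unpadded form in the tables
assert ena_style_name(1) == "ERR0000001"
assert f"ERR{1}" == "ERR1"
```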
One thing that has been requested is to enhance the final_table.tsv file to include (in addition to the columns it already has) the NCBI Taxon ID for each ASV/OTU and the accession number of the sequence that was its closest match in the database used. The NCBI Taxon ID could then be used as the taxonConceptID when submitting data to GBIF/OBIS using the DwC-A format (as discussed here).
For example, instead of the current final_table.tsv file, which looks like this
OTU_id,ERR0000008,ERR0000009,Classification
Otu1,1123,2,Eukaryota;Arthropoda;Insecta;Plecoptera;Capniidae;Allocapnia;Allocapnia aurora
Otu2,3,0,Eukaryota;Porifera;Demospongiae;Hadromerida;Polymastiidae;Polymastia;Polymastia littoralis
Ideally, it could be something like this:
OTU_id,ERR0000008,ERR0000009,Classification,Accession_number,NCBI_Taxon_ID
Otu1,1123,2,Eukaryota;Arthropoda;Insecta;Plecoptera;Capniidae;Allocapnia;Allocapnia aurora,JN200445,608846
Otu2,3,0,Eukaryota;Porifera;Demospongiae;Hadromerida;Polymastiidae;Polymastia;Polymastia littoralis,NC_023834,1473587
If it is not possible to retrieve the accession number and/or the NCBI taxon ID, I think we can find some workarounds.
Perhaps it will be possible to retrieve the NCBI Taxon ID using the Bio.Entrez package
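Below is a sketch of how the extra columns could be filled in. The Bio.Entrez lookup itself is only indicated in a comment (it needs network access), and the row-building logic takes an injectable lookup table so it can run offline; column names follow the example above:

```python
import csv, io

def augment_final_table(csv_text: str, lookup: dict) -> str:
    """Append Accession_number and NCBI_Taxon_ID columns to final_table rows.

    `lookup` maps a species name (last field of Classification) to an
    (accession, taxon_id) pair; in practice it could be populated via
    Bio.Entrez, e.g. Entrez.esearch(db="taxonomy", term=species_name).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] += ["Accession_number", "NCBI_Taxon_ID"]
    for row in rows[1:]:
        species = row[-1].split(";")[-1]
        accession, taxid = lookup.get(species, ("NA", "NA"))
        row += [accession, taxid]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

table = ("OTU_id,ERR0000008,ERR0000009,Classification\n"
         "Otu1,1123,2,Eukaryota;Arthropoda;Insecta;Plecoptera;"
         "Capniidae;Allocapnia;Allocapnia aurora\n")
lookup = {"Allocapnia aurora": ("JN200445", "608846")}
augmented = augment_final_table(table, lookup)
```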
Running PEMA for the ITS sample under /sanity_checks with the given parameters file fails with the following message:
mv: target 'data_after_cutadapt/' is not a directory
Fatal error: ./pema_latest.bds, line 342, pos 5. Exec failed.
Exit value : 1
Command : mv *fastq.gz data_after_cutadapt/
Error with the PEMA ASV inference, possibly due to a spelling mismatch.
The line that was possibly skipped because of the spelling in the parameters file:
} else if (paramsDereplication{'clusteringAlgo'} == 'algo_Swarm') {
In the parameters file I write clusteringAlgo algo_swarm, while the suggestion is: write "Swarm" or "vsearch" or "CROP" after algo_.
In the initialize.bds script there is a line that creates the folder Swarm.
The error:
Fatal error: /home/modules/taxAssignment.bds, line 11. Directory '/mnt/analysis/isd_crete_2016_20230823/7.mainOutput/gene_16S/swarm' does not exists
pema_latest.bds, line 156 : if ( paramsForTaxAssign{'custom_ref_db'} != 'Yes'){
pema_latest.bds, line 158 : if ( paramsForTaxAssign{'gene'} == 'gene_16S') {
pema_latest.bds, line 170 : if (paramsForTaxAssign{'taxonomyAssignmentMethod'} != 'phylogeny') {
pema_latest.bds, line 172 : crestAssign(paramsForTaxAssign, globalVars)
taxAssignment.bds, line 4 : string crestAssign(string{} params, string{} globalVars) {
taxAssignment.bds, line 6 : if ( params{'custom_ref_db'} != 'Yes') {
taxAssignment.bds, line 9 : if ( (params{'gene'} == 'gene_16S' || params{'gene'} == 'gene_18S') && params{'taxonomyAssignmentMethod'} != 'phylogeny' ) {
taxAssignment.bds, line 11 : globalVars{'assignmentPath'}.chdir()
The parameters file:
parameters0f.isd_crete_2016_20230823.txt
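For reference, the root cause is just a case-sensitive string comparison: `algo_swarm` never equals `algo_Swarm`, so the branch is skipped silently and the swarm directory is never used. A normalization step like the Python sketch below (illustrative only; PEMA itself is written in BigDataScript) would make the check robust to user spelling:

```python
def select_clustering_branch(value: str) -> str:
    """Case-insensitive dispatch on the clusteringAlgo parameter value."""
    normalized = value.strip().lower()
    branches = {
        "algo_swarm": "Swarm",
        "algo_vsearch": "vsearch",
        "algo_crop": "CROP",
    }
    if normalized not in branches:
        # fail loudly instead of silently skipping the branch
        raise ValueError(f"Unknown clusteringAlgo: {value!r}")
    return branches[normalized]

assert select_clustering_branch("algo_swarm") == "Swarm"   # user's spelling
assert select_clustering_branch("algo_Swarm") == "Swarm"   # documented spelling
```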
I ran the tutorial command in step 3: singularity run -B /root/Desktop/pema/analysis_directory/:/mnt/analysis /root/Desktop/pema_v.1.3.1.sif
Approx 95% complete for SRR3231901_1.fastq.gz
Analysis complete for SRR3231901_1.fastq.gz
Task failed:
Program & line : '/home/pema_latest.bds', line 173
Task Name : ''
Task ID : 'pema_latest.bds.20201123_083118_572/task.pema_latest.line_173.id_6'
Task PID : '561'
Task hint : '/home/tools/fastqc/FastQC/fastqc --outdir /mnt/analysis/16S_final_test/1.quality_control /mnt/analysis/mydata/README.md'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[]'
Output files : '[]'
Script file : '/root/Desktop/pema_latest.bds.20201123_083118_572/task.pema_latest.line_173.id_6.sh'
Exit status : '1'
StdErr (10 lines) :
Picked up _JAVA_OPTIONS: -Dawt.useSystemAAFontSettings=on -Dswing.aatext=true
Failed to process /mnt/analysis/mydata/README.md
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
at uk.ac.babraham.FastQC.Sequence.FastQFile.(FastQFile.java:89)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:152)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.(OfflineRunner.java:121)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
Fatal error: /home/pema_latest.bds, line 175, pos 1. Task/s failed.
pema_latest.bds, line 175 : wait
Hi!
When checking "outputPerSample", for some of my samples the accession numbers are correct in the file names, but inside the files there is another accession number. See an example below with the files for ERR4914067.
This ERR number does not appear inside these files, but ERR4914068 and ERR4914071 do. "profile_ERR4914067.csv" even has 3 ERR numbers, including two occurrences of ERR4914068.
profile_ERR4914067.csv
Relative_Abundance_ERR4914067.csv
Richness_ERR4914067.csv
All_Cumulative_ERR4914067.csv
Additionally, ERR4914067 does not appear in the final table. Maybe this is related to what I described above.
Thanks for your input!
According to recent emails with the MIDORI developers, it seems wise to update PEMA to point to where the MIDORI db is now published. Hopefully this will solve a couple of issues we have had: (1) gaps in the taxonomic classification output when taxon nodes are missing, and (2) some errors and discrepancies in the classifications with respect to NCBI.
Copy of the emails (latest to first):
Sorry to say that we are no longer updating the databases on the "MIDORI server".
We are updating only the databases you can download from here: http://www.reference-midori.info/download.php#
Hi Christina,
Thank you for your email.
I think PEMA is using an old MIDORI database.
I fixed this problem quite a long time ago.
In all formats, except RAW files, we have inserted missing taxonomy by creating it from a lower taxonomic ranking (ex. description in class-level was missing, so it was created from order-level in the following example, >JF502242.1.7041.7724 root_1;Eukaryota_2759;Chordata_7711;class_Crocodylia_1294634;Crocodylia_1294634;Crocodylidae_8493;Crocodylus_8500;Crocodylus intermedius_184240).
Would it be possible for you to download the recent databases from our site and perform the taxonomic assignment locally?
We are using NCBI taxonomy for all MIDORI databases.
I think those inconsistencies are happening because PEMA is using an old database (the NCBI taxonomy has been consistently revised).
If you have further questions, please write me back again.
Best regards, Ryuji
Dear Dr Machida,
My name is Christina Pavloudi and I am a Post Doctoral Researcher at the CNRS.
In my previous Post Doc position, I was working for the ARMS-MBON project (my colleagues are in CC), where we were sequencing ARMS samples for COI (among other genes) and we were using PEMA for the analyses of the results.
PEMA is using MIDORI for the taxonomic assignment of COI reads, hence I am contacting you regarding an issue we came across.
At the moment, the MIDORI output does not always have the same number of columns, i.e. the same number of taxonomic levels, for all the assignments.
You can see an example in the attached file ("Example_species_notall.tsv").
For some assignments, the output has all the 8 levels: root, superkingdom, phylum, class, order, family, genus, species (see attached file "Example_species_alllevels.tsv").
It would be extremely helpful, in terms of FAIRness for the ARMS-MBON project, if the MIDORI output was consistent and always contained the 8 levels, even if some columns were empty (see attached "Example_species_emptylevels.tsv"). Do you perhaps consider doing something like this for future versions of MIDORI?
Also, could I ask which taxonomy you are using in MIDORI?
Because, as you can see in "Example_species_emptylevels_completed.tsv", for some of the species in question the missing taxonomic levels do exist (if we check WoRMS, but also the NCBI Taxonomy). Also, some of them differ from the output produced by MIDORI.
It has been observed that the current setup (we'll call it the original approach) can be time-consuming.
In cases of hundreds of samples this can be a limiting factor for PEMA.
An alternative would be to try the fastp tool (the fastp preprocessing approach).
Issue
When setting the "removeSingletons" parameter to "Yes" in PEMA, I'm noticing that many singletons still appear in the final output table (often hundreds of them towards the end of the table), although they should not be there.
Details
I am using PEMA version 2.1.4, running it on ARMS data using the LifeWatch workflow on the Tesseract platform.
It would be appreciated if this behavior could be looked into, as it's crucial for the accuracy of our analyses to ensure that unwanted singletons are excluded when specified.
Thank you!
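For reference, removing singletons at the final-table level means dropping every OTU/ASV whose total abundance across all samples is 1. A quick post-hoc filter like the Python sketch below (column layout assumed from the final_table.tsv examples in these issues) can both verify the behaviour and work around it:

```python
def drop_singletons(rows):
    """Keep only OTU rows whose summed abundance across samples exceeds 1.

    Each row is assumed to be: [otu_id, count_sample1, ..., count_sampleN,
    classification] — the final_table layout shown elsewhere in these issues.
    """
    kept = []
    for row in rows:
        counts = [int(c) for c in row[1:-1]]
        if sum(counts) > 1:
            kept.append(row)
    return kept

rows = [
    ["Otu1", "1123", "2", "Eukaryota;Arthropoda"],
    ["Otu2", "1", "0", "Eukaryota;Porifera"],   # singleton: should be dropped
]
filtered = drop_singletons(rows)
```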
This is more of a question about how much storage PEMA uses for each run. For my project I have 140 samples with PE sequences, resulting in 14 GB of data.
14G ./my data
196G /pema215_otu
Is it possible to reduce the storage needed for a PEMA run, or is all of the output required?
For example, I have two all_samples.fasta files (one in mainOutput and one in the PEMA folder) and one final_all_samples.fasta; are all of them necessary?
Also, some intermediate folders, like linearizedSequences and mergedSequences, take up about as much space as the mydata folder.
The reason for raising this is that in large-scale projects it can lead to exceeding the disk quota.
To facilitate use and understanding of PEMA by external users (e.g. future Tesseract users), the file PEMA's output files.md needs a few updates and more details:
A new user should be able to understand what each file/folder corresponds to without having to dive into the depths of the PEMA code.
When I am running 16S data with the vsearch algorithm using the current version of PEMA (pema v.2.1.3), the "8.outputPerSample" folder is empty.
I haven't tried it yet with other genes/sets of parameters to see whether the error recurs.
A new version of the CREST algorithm is now available. This version includes the latest SILVA release and the PR2 database.
Thus, by integrating this CREST version, the related issues (#21 and #26) will be addressed.
Here is the repo of the previous CREST version, the one that is currently used in PEMA.
@lanzen could you please let us know when ready? 🎉 thanks a lot in advance!
Choosing PEAR as the merging algorithm option for PANDAseq causes PEMA v. 2.1.4 to fail, while the steps before PANDAseq run fine.
Below are the last lines of the output file, after SPAdes has finished:
Adjusting sequences using the BayesHammer algorithm of SPAdes has been completed.
Fatal error: /home/modules/preprocess.bds, line 334, pos 1. Map 'params' does not have key 'elimination'.
pema_latest.bds, line 84 : merging(paramsSpadesMerging, globalVars)
preprocess.bds, line 263 : string merging(string{} params, string{} globalVars){
preprocess.bds, line 287 : for ( string correctFile : correct ) {
preprocess.bds, line 334 : task $globalVars{'path'}/tools/PANDAseq/bin/pandaseq -f $forwardFile -r $reverseFile -6 \
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 9 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554 -> 4775 -> 4779 -> 4903 -> 4904 -> 4906
Node Id : 4927
bdsNode Id : 4906
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 8 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554 -> 4775 -> 4779 -> 4903 -> 4904
Node Id : 4906
bdsNode Id : 4904
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 7 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554 -> 4775 -> 4779 -> 4903
Node Id : 4904
bdsNode Id : 4903
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 6 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554 -> 4775 -> 4779
Node Id : 4903
bdsNode Id : 4779
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 5 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554 -> 4775
Node Id : 4779
bdsNode Id : 4775
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 4 / 0, nodes: 1422 -> 9771 -> 9772 -> 4554
Node Id : 4775
bdsNode Id : 4554
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 3 / 0, nodes: 1422 -> 9771 -> 9772
Node Id : 4554
bdsNode Id : 9772
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 2 / 0, nodes: 1422 -> 9771
Node Id : 9772
bdsNode Id : 9771
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 1 / 0, nodes: 1422
Node Id : 9771
bdsNode Id : 1422
Note that the elimination parameter is set, as mentioned above; copied from the parameters file used:
################################################################
################# PANDAseq (v. 2.11) ################### // https://storage.googleapis.com/pandaseq/pandaseq.html
################################################################
pandaseqAlgorithm pear
pandaseqThreads 20
pandaseqMinlen
minoverlap 12
threshold 0.6
elimination -N
Nevertheless, PEAR can run standalone on Zorbas with the following commands:
module load pear/0.9.11
pear -f -r -o
Could this please be looked into and fixed?
When I analyze 18S using my custom_ref_db, I get this error.
This is the parameter and custom db I made.
parameters.txt (Original file name is "parameters.tsv")
hikim_test_SSEQ.nds.txt (Original file name is "hikim_test_SSEQ.nds")
hikim_test.fasta.txt (Original file name is "hikim_test.fasta")
So, I tested your "crest_algo_example" (http://pema/analysis_directory/custom_ref_db/crest_algo_example/).
Even when I use the crest example files you uploaded, I get the same error, so I have doubts about whether the example files work correctly.
Could you please check?
Thank you!
Consider adding the new SILVA release
https://www.arb-silva.de/documentation/release-1381/
We would like the outputs from PEMA to be described in an ro-crate.json file. VLIZ can help with how to fill the file with content. It will be the same idea as the ro-crate produced by MetaGOflow (see comment below), but with some extra provenance-related fields (see e.g. https://github.com/emo-bon/observatory-bpns-data/blob/main/ro-crate-metadata.json)
@kmexter can advise on the content of this file.
Using reference databases from this repo, PEMA could integrate the analysis of the 12S rRNA marker gene.
Christina has found a bug when using 18S data.
@cpavloud Could you please describe it?
Check repo and publication
Hello, I ran through the demo data given and there wasn't a problem. I have a few fastq files that I converted to fastq.gz using the provided convertIllumunaRawDataToEnaFormat.sh. This created a directory and a file in the mydata dir (namely mapping_files_for_PEMA.txt and rawDataInEnaFormat/). I ran with those files in the directory and got an error (as others have gotten, by keeping the README.md, as I thought I had addressed initially here with Akhilbiju01's question). I promptly moved them out so that I only have the fastq.gz files, and am now getting this error involving "no space*". I looked through the source code in PEMA_v1.2.bds: at lines 507 to 515 the file is created (a temp file, I imagine) and then deleted. It appears this non-existent file is supposed to # merge all lines of a fastq entry into one and only one line,
given by this line:
sys awk 'NR==1 {print ; next} {printf /^>/ ? "\n"$0"\n" : $1} END {printf "\n"}' se.$derepl > nospace.$derepl
So at this point I'm uncertain whether the data is bad or there's a bug in the code. Here's my full output from the run:
$ singularity run -B /p/home/tclack/bio/pema-1.2/test/analysis_folder/:/mnt/analysis ./pema_v.1.1.sif
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
this ouput file already exists
1E_S29_L001_R1_001.fastq.gz1E_S29_L001_R2_001.fastq.gz2G_S42_L001_R2_001.fastq.gz2H_S48_L001_R1_001.fastq.gz3C_S17_L001_R1_001.fastq.gz3C_S17_L001_R2_001.fastq.gzperl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US.UTF-8",
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Started analysis of 3C_S17_L001_R1_001.fastq.gz
Started analysis of 2H_S48_L001_R1_001.fastq.gz
Started analysis of 1E_S29_L001_R1_001.fastq.gz
Started analysis of 1E_S29_L001_R2_001.fastq.gz
Started analysis of 2G_S42_L001_R2_001.fastq.gz
Approx 5% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 5% complete for 2H_S48_L001_R1_001.fastq.gz
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Started analysis of 3C_S17_L001_R2_001.fastq.gz
Approx 5% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 10% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 10% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 15% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 5% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 10% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 15% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 15% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 20% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 10% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 20% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 25% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 15% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 5% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 5% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 20% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 25% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 25% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 30% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 20% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 30% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 30% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 35% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 40% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 25% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 35% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 35% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 45% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 50% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 30% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 10% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 40% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 45% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 40% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 55% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 35% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 10% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 50% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 45% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 60% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 55% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 60% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 50% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 65% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 70% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 40% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 15% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 65% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 55% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 75% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 45% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 70% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 60% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 80% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 50% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 15% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 75% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 65% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 85% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 55% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 20% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 80% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 70% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 90% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 95% complete for 2H_S48_L001_R1_001.fastq.gz
Approx 60% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 85% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 75% complete for 3C_S17_L001_R1_001.fastq.gz
Analysis complete for 2H_S48_L001_R1_001.fastq.gz
Approx 65% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 20% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 90% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 95% complete for 2G_S42_L001_R2_001.fastq.gz
Approx 80% complete for 3C_S17_L001_R1_001.fastq.gz
Analysis complete for 2G_S42_L001_R2_001.fastq.gz
Approx 70% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 25% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 85% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 75% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 90% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 80% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 95% complete for 3C_S17_L001_R1_001.fastq.gz
Approx 30% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 25% complete for 1E_S29_L001_R2_001.fastq.gz
Analysis complete for 3C_S17_L001_R1_001.fastq.gz
Approx 85% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 90% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 95% complete for 3C_S17_L001_R2_001.fastq.gz
Approx 35% complete for 1E_S29_L001_R1_001.fastq.gz
Analysis complete for 3C_S17_L001_R2_001.fastq.gz
Approx 30% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 40% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 35% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 45% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 50% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 40% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 55% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 45% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 60% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 50% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 55% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 65% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 60% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 70% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 65% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 75% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 70% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 80% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 85% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 75% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 80% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 90% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 95% complete for 1E_S29_L001_R1_001.fastq.gz
Approx 85% complete for 1E_S29_L001_R2_001.fastq.gz
Analysis complete for 1E_S29_L001_R1_001.fastq.gz
Approx 90% complete for 1E_S29_L001_R2_001.fastq.gz
Approx 95% complete for 1E_S29_L001_R2_001.fastq.gz
Analysis complete for 1E_S29_L001_R2_001.fastq.gz
FastQC is completed!
readF is: 1E_S29_L001_R1_001.fastq.gz
readF is: 1E_S29_L001_R2_001.fastq.gz
readF is: 2G_S42_L001_R2_001.fastq.gz
readF is: 2H_S48_L001_R1_001.fastq.gz
readF is: 3C_S17_L001_R1_001.fastq.gz
readF is: 3C_S17_L001_R2_001.fastq.gz
Trimmomatic is done
Error correction using BayesHammer is completed!
Merging step by SPAdes is completed
all the first steps are done! clustering is about to start!
rm: cannot remove 'se.*': No such file or directory
rm: cannot remove 'nospace.*': No such file or directory
Fatal error: /home/PEMA_v1.bds, line 517, pos 1. Exec failed.
Exit value : 1
Command : rm se.* nospace.*
PEMA_v1.bds, line 517 : sys rm se.* nospace.*
Hi Haris! I'd like to use PEMA on some metabarcoding data for my graduate work. I successfully installed the Singularity image on my university's HPC environment, but I was hoping for some advice about how to estimate the HPC resources I would need to run PEMA in my job script (we use Slurm). Specifically, do you have any guidance for the #SBATCH specifications and values I should use?
For context, I have 96 samples that were PE sequenced for COI, 18S, and 16S amplicons (euk eDNA metabarcoding) on an Illumina MiSeq, so I have 576 fastq files as my raw sequencing data. I would like to use a custom ref db for COI, so I will follow your instructions about training the RDP classifier (I know this will likely affect computational load). Thank you!
Reading file all.nonchimeras.fasta 100%
1242678 nt in 4409 seqs, min 200, max 390, avg 282
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 155 Size min 2, max 692, avg 28.4
Singletons: 0, 0.0% of seqs, 0.0% of clusters
Writing OTU table (classic) 100%
cp: cannot stat '16S_otutab.txt': No such file or directory
Fatal error: ./pema_latest.bds, line 694, pos 3. Exec failed.
Exit value : 1
Command : cp 16S_otutab.txt rna_otutab_its_taxon_assign.txt
We should add in the parameters file the version of SWARM algorithm that is implemented in PEMA.
Also, the version of CROP and of the RAxML-ng (and PaPaRa and EPA-ng).
And the version of cutadapt that is being used for the primer removal in the case of ITS.
And for the MIDORI database, we need to specify the GenBank release that it was based on.
I think that for all the other tools, the versioning information is already there.
Also, we should mention somewhere in the parameters file that the RDP Classifier is being used for the COI gene, along with the version of the RDP Classifier.
Similarly, we should also mention that CREST is being used for the 16S, 18S and ITS markers.
Also, we should add the thresholds/default values used by the classifiers for the taxonomic identification of the sequences.
Then, we could add this information in the otu_seq_comp_appr term when submitting data to GBIF/OBIS using the DwC-A format.
Then, after every analysis, the user will have full provenance (regarding tools and parameters implemented) stored in the copy of the parameters file inside the output folder.
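To make the idea concrete, here is a minimal sketch of how such provenance could be appended to the copy of the parameters file in the output folder. The tool names, versions, and file layout below are placeholders, not PEMA's actual internals:

```python
# Hypothetical sketch: append tool/version provenance to the copy of the
# parameters file that PEMA stores in the output folder.
versions = {            # illustrative names and version strings only
    "Swarm": "2.1.13",
    "CROP": "1.33",
    "RDPClassifier": "2.13",
}

def append_provenance(params_path, versions):
    """Append one commented 'tool<TAB>version' line per tool."""
    with open(params_path, "a") as fh:
        fh.write("\n# Tool versions used in this run\n")
        for tool, version in sorted(versions.items()):
            fh.write(f"# {tool}\t{version}\n")
```

Because the lines are comments, the amended file would still parse as a normal parameters file while carrying the full provenance.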
A new db, mostly for metazoans, was recently published:
repo
publication
Consider training CREST or RDP with it and integrating it into the PEMA workflow.
@cpavloud check and share your thoughts! 😅
In the next release of PEMA, could the extended final table that is produced when requested in the parameters file always be written to the same location?
This is to make it so that the ARMS workflow in the tesseract can look in just one place to get this file, rather than a slightly different place depending on the parameters set by the user.
I wanted to run 18S data using the current version (pema v.2.1.3) and with the swarm algorithm.
I used the attached parameter setting
parameters_1st_try.txt
However, the analysis went until step 4.mergingPairedEndFiles and then an error came up
`
Merging step by SPAdes is completed
Marker gene under study 18S.
Fatal error: /home/modules/initialize.bds, line 193, pos 18. Map 'params' does not have key 'clusteringAlgoFor16S_18SrRNA'.
pema_latest.bds, line 95 : buildDirectories(paramsSpadesMerging, globalVars )
initialize.bds, line 158 : string buildDirectories(string{} params, string{ } globalVars){
initialize.bds, line 162 : if ( params{'gene'} == 'gene_COI' ) {
initialize.bds, line 175 : } else if ( params{'gene'} == 'gene_16S' ) {
initialize.bds, line 188 : } else if ( params{'gene'} == 'gene_18S' ) {
initialize.bds, line 193 : if ( params{'clusteringAlgoFor16S_18SrRNA' } == 'algo_Swarm' ) {
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 10 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 230 3 -> 2356 -> 2409 -> 2415 -> 2432 -> 2433
Node Id : 2434
bdsNode Id : 2433
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 9 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 2303 -> 2356 -> 2409 -> 2415 -> 2432
Node Id : 2433
bdsNode Id : 2432
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 8 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 2303 -> 2356 -> 2409 -> 2415
Node Id : 2432
bdsNode Id : 2415
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 7 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 2303 -> 2356 -> 2409
Node Id : 2415
bdsNode Id : 2409
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 6 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 2303 -> 2356
Node Id : 2409
bdsNode Id : 2356
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 5 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295 -> 2303
Node Id : 2356
bdsNode Id : 2303
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 4 / 0, nodes: 1422 -> 8178 -> 8179 -> 2295
Node Id : 2303
bdsNode Id : 2295
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 3 / 0, nodes: 1422 -> 8178 -> 8179
Node Id : 2295
bdsNode Id : 8179
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 2 / 0, nodes: 1422 -> 8178
Node Id : 8179
bdsNode Id : 8178
ProgramCounter.pop(100): Node ID does not match!
PC : PC: size 1 / 0, nodes: 1422
Node Id : 8178
bdsNode Id : 1422
`
Then, I thought that maybe it needed a little tricking (like the one I did here), so I changed the gene (in the parameters)
parameters_2nd_try.txt
This time, the analysis went until step 7.mainOutput and the files
asvs_representatives_all_samples.fasta all.denovo.nonchimeras.fasta asvs_repr_with_singletons.fasta all_samples.fasta asvs.stats all_sequences_grouped.fa asvs.swarms amplicon_contingency_table.tsv mysilvamod132_18S_taxon_assign.xml asvs_contingency_table.tsv
were created
but nothing was added in the 18S_taxon_assign folder and this error came up
`
Traceback (most recent call last):
File "/home/tools/CREST/LCAClassifier/bin/classify", line 16, in
sys.exit(LCAClassifier.classify.main())
File "/home/tools/CREST/LCAClassifier/src/LCAClassifier/classify.py", line 662, in main
otuFile=open(options.otus,"r")
IOError: [Errno 2] No such file or directory: 'allTab_18S_taxon_assign.tsv'
Task failed:
Program & line : '/home/modules/taxAssignment.bds', line 59
Task Name : ''
Task ID : 'pema_latest.bds.20211019_114721_581/task.taxAssignment.line_59.id_1843'
Task PID : '3380'
Task hint : '/home/tools/CREST/LCAClassifier/bin/classify; -c /home/tools/CREST/LCAClassifier/parts/etc/lcaclassifier.conf; -d silva132; -t allTab_18S_taxon_assign'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[]'
Output files : '[]'
Script file : '/home1/bilbao/pema_latest.bds.20211019_114721_581/task.taxAssignment.line_59.id_1843.sh'
Exit status : '1'
StdErr (10 lines) :
Traceback (most recent call last):
File "/home/tools/CREST/LCAClassifier/bin/classify", line 16, in
sys.exit(LCAClassifier.classify.main())
File "/home/tools/CREST/LCAClassifier/src/LCAClassifier/classify.py", line 662, in main
otuFile=open(options.otus,"r")
IOError: [Errno 2] No such file or directory: 'allTab_18S_taxon_assign.tsv'
Fatal error: /home/modules/taxAssignment.bds, line 65, pos 13. Task/s failed.
pema_latest.bds, line 151 : if ( paramsForTaxAssign{'custom_ref_db'} != 'Yes'){
pema_latest.bds, line 153 : if ( paramsForTaxAssign{'gene'} == 'gene_16S' || paramsForTaxAssign{'gene'} == 'gene_18S' || paramsForTaxAssign{'gene'} == 'gene_ITS') {
pema_latest.bds, line 165 : if (paramsForTaxAssign{'taxonomyAssignmentMethod'} != 'phylogeny') {
pema_latest.bds, line 167 : crestAssign(paramsForTaxAssign, globalVars)
taxAssignment.bds, line 4 : string crestAssign(string{} params, string{} globalVars) {
taxAssignment.bds, line 6 : if ( params{'custom_ref_db'} != 'Yes') {
taxAssignment.bds, line 9 : if ( (params{'gene'} == 'gene_16S' || params{'gene'} == 'gene_18S') && params{'taxonomyAssignmentMethod'} != 'phylogeny' ) {
taxAssignment.bds, line 24 : if ( params{'silvaVersion'} == 'silva_128' ) {
taxAssignment.bds, line 46 : } else if ( params{'silvaVersion'} == 'silva_132' ) {
taxAssignment.bds, line 65 : wait
`
Consider normalizing 16S results by 16S rRNA gene copy number, using the rrnDB:
https://rrndb.umms.med.umich.edu
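The normalization itself is simple once per-taxon copy numbers (e.g. from an rrnDB download) are available: divide each taxon's read count by its 16S copy number. A minimal sketch, assuming a plain dict of counts and a dict of copy numbers keyed by the same taxon names (both hypothetical inputs, not PEMA data structures):

```python
def normalize_16s(counts, copy_numbers, default=1.0):
    """Divide per-taxon read counts by 16S rRNA gene copy number.

    Taxa missing from `copy_numbers` fall back to `default`
    (i.e. are left unscaled)."""
    return {taxon: n / copy_numbers.get(taxon, default)
            for taxon, n in counts.items()}

# Illustrative values only: E. coli strains typically carry ~7 operons.
counts = {"Escherichia": 700, "Mycobacterium": 100}
copies = {"Escherichia": 7.0, "Mycobacterium": 1.0}
print(normalize_16s(counts, copies))
```

In a real integration the copy numbers would come from the rrnDB flat files and the counts from PEMA's OTU/ASV table.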
Hello,
I've been testing out PEMA with my data for the past few days and have run into an issue. I got it going, and all was fine until my job ran out of memory. It was partway through the clustering step when this happened. I wasn't too worried because I know PEMA has a checkpoint system I can use to restart at this point. I am having issues doing this, however.
I am running the code on a slurm controlled HPC cluster:
singularity exec -B /nesi/nobackup/uoaxxxxx/PEMA/analysis_folder/:/mnt/analysis /nesi/nobackup/uoaxxxxx/PEMA/pema_latest.sif /home/tools/BDS/.bds/bds -r /nesi/nobackup/uoaxxxxx/PEMA/analysis_folder/trimming.chp
and I get the error:
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Exception in thread "main" java.lang.RuntimeException: File not found '/nesi/nobackup/uoa02559/PEMA/analysis_folder/trimming.chp'
	at org.bds.util.Gpr.reader(Gpr.java:553)
	at org.bds.util.Gpr.reader(Gpr.java:534)
	at org.bds.serialize.BdsSerializer.load(BdsSerializer.java:271)
	at org.bds.Bds.runCheckpoint(Bds.java:886)
	at org.bds.Bds.run(Bds.java:853)
	at org.bds.Bds.main(Bds.java:185)
I have tried with other checkpoint files also but with no luck. Am I being dense or should this restart the process at that stage?
Congrats on the pipeline and publication!
Thanks for your help,
Jed
As implemented, keeping only column 2 (line 18) of this file means the species assignment is missed.
BBmap is available here:
https://sourceforge.net/projects/bbmap/
it has been published in PloS One:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657622/
and adopted by a wide community, including the JGI (here is a guide in their website for bbmerge: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmerge-guide/)
Depending on how the suite of tools may be integrated in Pema, merging can be achieved through bbmerge, but additional steps (trimming, adapter removal) may also be handled with the same package in a very fast and efficient way.
Thank you for considering adding the tool; it appears to handle the merging of fully overlapping reads (cases where the insert size equals the read length) better than PANDAseq!
Consider adding PR2 as an alternative to the SILVA database.
Here you may see how you could use PR2.
It would be super useful to return the pema main output (otu/asv table) in a 7-level taxonomy format, meaning all taxonomy assignments are as:
d__Bacteria; p__Abyssubacteria; c__SURF-5; o__SURF-5; f__SURF-5; g__SURF-5; s__SURF-5 sp003598085
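As the example shows, ranks below the deepest assignment repeat the last assigned name (GTDB-style placeholders). A minimal sketch of the padding logic, assuming the input is an ordered list of assigned names from domain downward (the function name and input shape are illustrative, not PEMA's API):

```python
RANKS = ["d", "p", "c", "o", "f", "g", "s"]

def to_seven_levels(lineage):
    """Expand a partial lineage into the 7-level d__...;s__... format.

    `lineage` is an ordered list of assigned names, highest rank first;
    missing lower ranks inherit the last assigned name, mirroring the
    GTDB-style placeholder convention."""
    filled = []
    last = ""
    for i, prefix in enumerate(RANKS):
        name = lineage[i] if i < len(lineage) else last
        last = name
        filled.append(f"{prefix}__{name}")
    return "; ".join(filled)

print(to_seven_levels(["Bacteria", "Abyssubacteria", "SURF-5"]))
# d__Bacteria; p__Abyssubacteria; c__SURF-5; o__SURF-5; f__SURF-5; g__SURF-5; s__SURF-5
```

A species-level suffix such as "sp003598085" would simply arrive as part of the last list element.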
Hi @hariszaf ,
The dev version (2.1.5) produces very long recurrent taxonomy labels like this one:
Main genome;Eukaryota;Excavata;Discoba;Kinetoplastea;Kinetoplastea (class);X (Kinetoplastea (class));Kinetoplastea (X (Kinetoplastea (class)));XX (Kinetoplastea (X (Kinetoplastea (class))));Kinetoplastea (XX (Kinetoplastea (X (Kinetoplastea (class)))));XXX (Kinetoplastea (XX (Kinetoplastea (X (Kinetoplastea (class))))));Kinetoplastea (XXX (Kinetoplastea (XX (Kinetoplastea (X (Kinetoplastea (class)))))));XXX (Kinetoplastea (XXX (Kinetoplastea (XX (Kinetoplastea (X (Kinetoplastea (class))))))));sp. (XXX (Kinetoplastea (XXX (Kinetoplastea (XX (Kinetoplastea (X (Kinetoplastea (class)))))))))
Main genome
may be irrelevant, as well.
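One way to clean such labels is to strip each rank down to its base name, then drop the "Main genome" prefix, the X/XX/XXX placeholders, and consecutive repeats. A rough sketch of that idea (the placeholder list is an assumption inferred from the example above, not from PEMA's code):

```python
def collapse_path(path, sep=";"):
    """Collapse a recurrent CREST-style taxonomy path to unique base names.

    Each rank label is reduced to the text before its first " (",
    placeholder ranks are dropped, and consecutive duplicates removed."""
    placeholders = {"Main genome", "X", "XX", "XXX", ""}
    cleaned = []
    for rank in path.split(sep):
        base = rank.split(" (")[0].strip()
        if base in placeholders:
            continue
        if cleaned and cleaned[-1] == base:
            continue
        cleaned.append(base)
    return sep.join(cleaned)
```

Applied to the Kinetoplastea example, this would yield something like `Eukaryota;Excavata;Discoba;Kinetoplastea;sp.`, though the right set of placeholders to drop is a design decision for the pipeline.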
PEMA tries to unzip folders that have already been unzipped when running using a checkpoint.
Fix that by finding a workaround in line 35
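The workaround boils down to making extraction idempotent: skip the archive if its target directory already exists from the previous (pre-checkpoint) run. A minimal sketch of that guard, using a zip archive for illustration (PEMA's actual unzip step lives in its bds scripts, so names and formats here are assumptions):

```python
import zipfile
from pathlib import Path

def extract_once(archive, dest):
    """Extract `archive` into `dest` only if `dest` does not already exist,
    so a checkpoint re-run skips folders unpacked in a previous run."""
    dest = Path(dest)
    if dest.exists():
        return False  # already unzipped before the checkpoint
    dest.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return True
```

The same existence check translates directly to a `test -d` guard in the bds/shell code at line 35.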
Hi,
I am currently trying to understand what your Dockerfiles do and how a build-from-source installation would work, the purpose of this being support by EasyBuild.
In my understanding, dockerfiles in ./pemabase
folder take care of environment (dependencies) and the ./Dockerfile.pema
does little more than just copying PEMA scripts to correct directories inside docker container.
I'm guessing, then, that it shouldn't be too hard to achieve what I'm trying to do, if I knew all the dependencies.
So let me ask: in case I did understand the PEMA installation process correctly, do you by any chance have a list of dependencies, ideally with their required versions, beyond those in the pemabase Dockerfiles (since those are a tad harder to read and definitely not very specific)?
Thanks a lot for your answer! :-)
Currently, the option remove_singletons is only available in the ASV pipeline; please add it for the OTU one as well. Or better, an option to filter OTUs based on a user-defined threshold :-)
Thanks!
Natassa
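The requested filter is a small post-processing step over the OTU table. A sketch of the generalized version, assuming the table is a mapping from OTU id to per-sample counts (a hypothetical shape, not PEMA's actual output format):

```python
def filter_otus(otu_table, min_total=2):
    """Drop OTUs whose total abundance across samples is below min_total.

    min_total=2 removes singletons (the remove_singletons behaviour);
    larger values give the user-defined abundance filter."""
    return {otu: counts for otu, counts in otu_table.items()
            if sum(counts) >= min_total}

table = {"Otu1": [1, 0, 0], "Otu2": [5, 3, 2]}
print(filter_otus(table))  # Otu1, a singleton, is removed
```

Exposing `min_total` through the parameters file would cover both the singleton case and the user-defined filter in one option.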
Hi!
Just a comment for PEMA v.2.1.4, probably it's a small bug.
When Swarm is chosen as the clustering algorithm for ITS, although the asvs.swarms file is created in the end, the names of the sequences, and subsequently the names in the extendedFinalTable.tsv and finalTable.tsv, are OtuXXX instead of ASVXXX.