One extra tip: I saw in your parameters that you ask for the NCBI ID (getNCBITaxId Yes).
As you will have a vast number of ASVs, I would leave this step out and find another way to get the NCBI IDs once the abundance table is ready.
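A minimal sketch of that idea. Many ASVs share the same assignment, so collecting the distinct taxon names first means each name is looked up once rather than once per ASV. It assumes a hypothetical tab-separated table layout (ASV ID, per-sample counts, and a semicolon-separated classification in the last column); the actual NCBI lookup is left to whatever tool you prefer.

```python
import csv
import io

# Hypothetical table layout: ASV_ID, per-sample counts..., classification (last column).
EXAMPLE_TABLE = """ASV_ID\tsample1\tsample2\tClassification
ASV_1\t120\t30\tEukaryota;Alveolata;Dinophyceae
ASV_2\t15\t0\tEukaryota;Alveolata;Dinophyceae
ASV_3\t0\t44\tEukaryota;Stramenopiles;Bacillariophyta
"""


def unique_taxa(table_text: str) -> list[str]:
    """Return the distinct lowest-rank names found in the classification column."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    next(reader)  # skip the header row
    names = set()
    for row in reader:
        if not row:
            continue
        # The lowest assigned rank is the last non-empty field of the lineage.
        lineage = [field.strip() for field in row[-1].split(";") if field.strip()]
        if lineage:
            names.add(lineage[-1])
    return sorted(names)


if __name__ == "__main__":
    # Three ASVs collapse to two names, i.e. two lookups instead of three.
    print(unique_taxa(EXAMPLE_TABLE))
```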
Here are my parameters:
outputFolderName Ayalon1_18S_pr2
EnaData Yes
sequencerPrefix V
maxInfo Yes
targetLength 100
strictness 0.8
adapters TruSeq2-PE.fa
seedMismatches 0
palindromeClipThreshold 20
simpleClipThreshold 30
leading 20
trailing 2
minlen 100
threadsTrimmomatic 10
pandaseqAlgorithm simple_bayesian
pandaseqThreads 10
pandaseqMinlen
minoverlap 1
threshold 0.6
elimination
vsearchThreads 20
vsearchId 0.97
gene gene_18S
taxonomyAssignmentMethod alignment
numberOfCoresForPapara 20
referenceDb pr2
taxonomyFolderName my_taxon_assign
forwardITSPrimer GATGAAGAACGYAGYRAA
reverseITSPrimer CTBTTVCCKCTTCACTCG
clusteringAlgo algo_Swarm
d 15
boundary 3
swarmThreads 20
removeSingletons Yes
omp_num_threads 20
abskew 2
midori_version midori_1
custom_ref_db No
name_of_custom_db partialCustomdb
getNCBITaxId Yes
phyloseq No
tree Yes
raxmlThreads 20
parsTrees 10
bootstrapTrees 100
emptyRawDataFile Yes
emptyCheckpoints Yes
classifierAlgo CREST
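To check a file like this before a run, here is a small sketch that reads the key/value pairs and lists the keys left empty (here pandaseqMinlen and elimination). It assumes one parameter per line, with the name and value separated by a tab or other whitespace. Whether an empty value is acceptable for a given key is up to PEMA, so this only reports them; check_params.py is a hypothetical script name.

```python
import sys


def parse_parameters(lines) -> dict[str, str]:
    """Map each parameter name to its value; a name with no value maps to ''."""
    params = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace only
        if not parts:
            continue  # skip blank lines
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def empty_keys(params: dict[str, str]) -> list[str]:
    """Return the parameter names whose value is empty, in file order."""
    return [key for key, value in params.items() if value == ""]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_params.py parameters.tsv
    with open(sys.argv[1]) as handle:
        print(empty_keys(parse_parameters(handle)))
```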
Hi @stas-malavin.
Was this sample among the ones you used in your
Sorry, I don't understand your question
Apologies.
My question was whether the filtered_max_ERR0000001 sample, which according to your error message seems to trigger the failure, was also among the samples in your small test that ran fine.
My guess is that something may be off with the BGI format, but I do not have experience with it. If you'd like, you could send me the two paired read files of that sample so I can have a look.
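Before sending the files, one quick way to look at the headers is to check that the forward and reverse reads pair up record by record. The sketch below assumes uncompressed 4-line FASTQ records and BGI-style headers such as @V350194505L1C001R00100000782/1 BH:ok. It strips the /1 or /2 mate suffix and the header comment, then compares the remaining IDs. It only detects mismatched or malformed pairs and says nothing about why pandaseq rejects the headers; check_pairs.py is a hypothetical script name.

```python
import sys
from itertools import islice, zip_longest


def read_ids(lines):
    """Yield each FASTQ record's read ID, minus '@', the header comment and any /1 or /2 suffix."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # a FASTQ record is assumed to span exactly 4 lines
        if not record:
            return
        fields = record[0][1:].split()
        if len(record) < 4 or not record[0].startswith("@") or not fields:
            raise ValueError(f"malformed FASTQ record starting with {record[0]!r}")
        read_id = fields[0]  # drops the comment, e.g. 'BH:ok'
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]  # drop the mate suffix so both files compare equal
        yield read_id


def first_mismatch(forward_lines, reverse_lines):
    """Return (record index, forward ID, reverse ID) of the first unpaired record, or None."""
    pairs = zip_longest(read_ids(forward_lines), read_ids(reverse_lines))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:  # also catches one file running out first (its ID is None)
            return index, fwd, rev
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pairs.py forward.fastq reverse.fastq
    with open(sys.argv[1]) as fwd, open(sys.argv[2]) as rev:
        print(first_mismatch(fwd, rev) or "all records pair up")
```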
How can I run pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok" using a container?
I am not sure I understand your question here, but if you would like to run pema after first starting a container interactively, you can always run
singularity exec -B <>:/mnt/analysis pema_v2.1.5.sif bash
cd /home
Note, however, that in the Singularity world you cannot edit a script, as it's a read-only environment; you could do this more straightforwardly in a Docker environment.
Also, pandaseq is already run with the -B flag, if that is what you were thinking of.
Hope this helps, and I'd be happy to help more if I can.
Hi, filtered_max_ERR0000001 specifically comes from the adjustingSequences step of the pipeline (from hammerBayes, I suppose). ERR0000001, the initial sample, was not among the samples I tested on. It wasn't really a test, though; it was just another sample, and it went smoothly. It was only one sample (in this set I have several), but that probably doesn't matter.
Here are the links to the files (forward and reverse):
https://drive.google.com/file/d/185ttx3WXYqlmm7UAJZ1BO_PaKRtkVZD6/view?usp=sharing
https://drive.google.com/file/d/15i3habgwlyazpZ3zs2r0NYMnSlY3bVe7/view?usp=sharing
Thanks!
I was able to run your sample using pema v.2.1.5, which is about to be released with the fastp option.
It seems that your BGI reads (I guess they come from a non-Illumina platform) come in a format that our initial pipeline cannot deal with for now.
I would suggest you go with the latest version of pema and the fastp option in the preprocessing step.
If you also have Docker on your system, you can pull it by running
docker pull hariszaf/pema:v.2.1.5
If not, please let me know how I could send you the .sif file of the v.2.1.5 version.
Thanks again for bringing up the BGI inconsistency.
Hi @hariszaf,
I've pulled the Docker container.
Could you please specify how I should run it?
The command from the manual,
docker run --rm -it -v .:/mnt/analysis hariszaf/pema:v.2.1.5
just opens a shell inside the container. I also tried without -i (which stands for "interactive"), with the same result.
You may run
cd /home
./pema_latest.bds
This is still a beta version (I am checking some last combinations), but I ran your sample with fastp preprocessing, Swarm and CREST and it went smoothly. 🚀
root@a7faceac8cd9:~# cd /home
root@a7faceac8cd9:/home# ./pema_latest.bds
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Fatal error: modules/initialize.bds, line 34, pos 16. Map 'my_case' does not have key 'preprocess'.
Apologies, I should have mentioned that this version uses an updated parameters file.
Attached is the parameters.tsv I used. GitHub does not support .tsv attachments, so just unzip this and get the file from there.
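The "Map 'my_case' does not have key 'preprocess'" error above is presumably what happens when the new version reads an older parameters file that lacks a key it expects. A small sketch to list such keys before launching a run, assuming both files hold one parameter per line with the name as the first whitespace-separated field; compare_params.py and old_parameters.tsv are hypothetical names.

```python
import sys


def load_keys(lines) -> set[str]:
    """Return the set of parameter names (first whitespace-separated field of each line)."""
    return {line.split()[0] for line in lines if line.strip()}


def compare(old_lines, new_lines) -> tuple[list[str], list[str]]:
    """Return (keys missing from the old file, keys present only in the old file)."""
    old_keys, new_keys = load_keys(old_lines), load_keys(new_lines)
    return sorted(new_keys - old_keys), sorted(old_keys - new_keys)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_params.py old_parameters.tsv parameters.tsv
    with open(sys.argv[1]) as old, open(sys.argv[2]) as new:
        missing, obsolete = compare(old, new)
    print("missing from old file:", missing)
    print("only in old file:", obsolete)
```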
In an old parameters.tsv you had Swarm's d=15, while the current value is 3. Why such a big difference?
Also, I would appreciate it if you could share your experience. I've tried Swarm, and how to select d always seemed a bit arbitrary to me. Did you run any tests on that?
Ah, I see, you're now removing oligotons from the occurrence 5
This value has a lot to do with the taxonomic group you are targeting.
When using COI as your marker gene, for example, it's quite common to go with high values (~12).
If you are working with bacteria, you'd go lower, even 1.
Did you run any tests on that?
If you have a mock community, it would be very helpful for setting your parameters.
In the pema publication, I remember we did some validations with different d values, so maybe you could also have a look there. But the best thing is always to have a mock community to drive your parameters. I know this is hard and maybe not always an option, but to my knowledge it is the best way to go :)
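To get a feel for what d does, here is a deliberately simplified, Swarm-like sketch. It is not Swarm's actual implementation, which also breaks clusters at abundance valleys and offers a fastidious refinement. Starting from the most abundant unclustered amplicon, it keeps adding any amplicon within d edits of a cluster member. The toy sequences and abundances are made up. They show that d is a local, chained threshold: raising it from 1 to 2 lets a cluster grow through intermediate variants and merge more sequences.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def swarm_like(amplicons: dict[str, int], d: int) -> list[list[str]]:
    """Cluster sequences; each cluster is seeded by the most abundant unclustered amplicon."""
    remaining = sorted(amplicons, key=lambda s: (-amplicons[s], s))
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # Neighbours: still-unclustered amplicons within d differences of `current`.
            neighbours = [s for s in remaining if levenshtein(current, s) <= d]
            for s in neighbours:
                remaining.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


# Made-up amplicons with made-up abundances.
TOY = {
    "ACGTACGTAC": 100,
    "ACGTACGTAA": 20,  # 1 edit from the first
    "ACGTTCGTTA": 5,   # 2 edits from the second, 3 from the first
    "TTGTACGTTT": 3,   # at least 3 edits from all others
}

if __name__ == "__main__":
    for d in (1, 2):
        print(d, len(swarm_like(TOY, d)))  # prints "1 3" then "2 2"
```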
good point, thanks
How should I set the boundary parameter? It's empty in the current file.
Hi @hariszaf , the development version works, thank you.