Hi, I'm getting an error from pandaseq when using BGI


pandaseq error "Something is wrong with this ID" about pema HOT 17 CLOSED

stas-malavin commented on July 19, 2024

pandaseq error "Something is wrong with this ID"

from pema.

Comments (17)

hariszaf commented on July 19, 2024

One extra tip: I saw in your parameters that you ask for the NCBI IDs (getNCBITaxId is set to Yes).
As you will have a vast number of ASVs, I would leave this part out and find another way to get the NCBI IDs once the abundance table is ready.


stas-malavin commented on July 19, 2024

Here are my parameters:

outputFolderName	Ayalon1_18S_pr2
EnaData	Yes
sequencerPrefix	V
maxInfo	Yes
targetLength	100
strictness	0.8
adapters	TruSeq2-PE.fa
seedMismatches	0
palindromeClipThreshold	20
simpleClipThreshold	30
leading	20
trailing	2
minlen	100
threadsTrimmomatic	10
pandaseqAlgorithm	simple_bayesian
pandaseqThreads	10
pandaseqMinlen	
minoverlap	1
threshold	0.6
elimination	
vsearchThreads	20
vsearchId	0.97
gene	gene_18S
taxonomyAssignmentMethod	alignment
numberOfCoresForPapara	20
referenceDb	pr2
taxonomyFolderName	my_taxon_assign
forwardITSPrimer	GATGAAGAACGYAGYRAA
reverseITSPrimer	CTBTTVCCKCTTCACTCG
clusteringAlgo	algo_Swarm
d	15
boundary	3
swarmThreads	20
removeSingletons	Yes
omp_num_threads	20
abskew	2
midori_version	midori_1
custom_ref_db	No
name_of_custom_db	partialCustomdb
getNCBITaxId	Yes
phyloseq	No
tree	Yes
raxmlThreads	20
parsTrees	10
bootstrapTrees	100
emptyRawDataFile	Yes
emptyCheckpoints	Yes
classifierAlgo	CREST


hariszaf commented on July 19, 2024

Hi @stas-malavin.
Was this sample among the ones you used in your


stas-malavin commented on July 19, 2024

Sorry, I don't understand your question


hariszaf commented on July 19, 2024

Apologies.
My question was whether the filtered_max_ERR0000001 sample, which according to your error message seems to trigger the problem, was also among the samples in your small test that ran fine.

My guess is that something may be off with the BGI format, but I do not have experience with that. If you'd like to, you could send me the two paired read files of that sample so I can have a look.

How can I run pandaseq-checkid "V350194505L1C001R00100000782/1 BH:ok" using a container?

I am not sure I understand your question here, but if you would like to run pema after first starting a container interactively, you can always run

singularity exec -B <>:/mnt/analysis pema_v2.1.5.sif bash
cd /home

However, in the Singularity world you cannot edit a script, because the container is a read-only environment; this is more straightforward in a Docker environment.
That said, pandaseq is already using the -B flag, if that is what you were thinking.

Hope this helps, and I'd be happy to help more if I can.


stas-malavin commented on July 19, 2024

Hi, filtered_max_ERR0000001 specifically comes from the adjustingSequences step of the pipeline (from BayesHammer, I suppose). ERR0000001, the initial sample, was not among the samples I tested on. It wasn't really a test; it was just another sample, and it went smoothly. It was only one sample (in this set I have several), but that probably doesn't matter.

Here are the links to the files (forward and reverse):
https://drive.google.com/file/d/185ttx3WXYqlmm7UAJZ1BO_PaKRtkVZD6/view?usp=sharing
https://drive.google.com/file/d/15i3habgwlyazpZ3zs2r0NYMnSlY3bVe7/view?usp=sharing

Thanks!


hariszaf commented on July 19, 2024

Hi @stas-malavin

I was able to run your sample using pema v.2.1.5, which is about to be released, with the fastp option.

It seems that your BGI reads (I guess they come from a non-Illumina platform) have a format that our initial pipeline cannot deal with for now.

I would suggest you go with the latest version of pema and the fastp option in the preprocess step.

If you also have Docker on your system, you can pull this version by running

docker pull hariszaf/pema:v.2.1.5

If not, please let me know how I could send you the .sif file of the v.2.1.5 version.

Thanks again and thanks for bringing up the BGI inconsistency.
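
As a rough illustration of the ID mismatch discussed in this thread, here is a minimal Python sketch that tells an Illumina-style read ID from a BGI-style one. It is not pandaseq's actual check. The Illumina pattern follows the usual colon-separated header layout, the BGI pattern and its field names are assumptions inferred from the single ID quoted above, and classify_read_id is a hypothetical helper name.

```python
import re

# Illumina-style ID: instrument:run:flowcell:lane:tile:x:y
ILLUMINA_ID = re.compile(r"^[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+$")

# BGI-style ID (assumed layout, inferred from the one ID quoted in this thread):
# flowcell + L<lane> + C<column> + R<row> + read number + /<mate>
BGI_ID = re.compile(
    r"^(?P<flowcell>[A-Za-z0-9]+?)L(?P<lane>\d)"
    r"C(?P<column>\d{3})R(?P<row>\d{3})(?P<read>\d+)/(?P<mate>[12])$"
)


def classify_read_id(header):
    """Return 'illumina', 'bgi' or 'unknown' for a FASTQ header line."""
    # Drop the leading '@' and anything after the first whitespace
    # (for example the ' BH:ok' suffix seen in the error message).
    parts = header.lstrip("@").split()
    if not parts:
        return "unknown"
    read_id = parts[0]
    if ILLUMINA_ID.match(read_id):
        return "illumina"
    if BGI_ID.match(read_id):
        return "bgi"
    return "unknown"


if __name__ == "__main__":
    print(classify_read_id("V350194505L1C001R00100000782/1 BH:ok"))
```

Running such a check on the first few headers of a FASTQ file before starting the pipeline may show early on whether the reads need the fastp preprocess route.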


stas-malavin commented on July 19, 2024

Hi @hariszaf ,
I've pulled the Docker container.
Could you please specify how to run it?
The command from the manual, docker run --rm -it -v .:/mnt/analysis hariszaf/pema:v.2.1.5, just opens a shell inside the container. I also tried without -i, which stands for 'interactive', with the same result.


hariszaf commented on July 19, 2024

You may run

cd /home
./pema_latest.bds

This is still a beta version and I am checking some last combinations,
but I ran your sample with the fastp preprocess, Swarm and CREST and it went smoothly. 🚀


stas-malavin commented on July 19, 2024

root@a7faceac8cd9:~# cd /home
root@a7faceac8cd9:/home# ./pema_latest.bds 
Picked up JAVA_TOOL_OPTIONS: -XX:+UseContainerSupport
Fatal error: modules/initialize.bds, line 34, pos 16. Map 'my_case' does not have key 'preprocess'.


hariszaf commented on July 19, 2024

Apologies, I should have mentioned that this version uses an updated parameters file.

Attached is the parameters.tsv I used; GitHub does not support .tsv attachments, so just unzip this and get the file from there.

parameters.zip
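
For anyone hitting the same "does not have key 'preprocess'" error, a small pre-flight check of the parameters file can help. The Python sketch below assumes the file is plain tab-separated key/value lines, as in the parameters posted earlier in this thread, and that lines starting with # are comments (an assumption). The only required key known from this thread is preprocess; load_parameters and missing_keys are hypothetical helper names, not part of pema.

```python
def load_parameters(text):
    """Parse tab-separated 'key<TAB>value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        # Skip blank lines and (assumed) comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("\t")
        params[key.strip()] = value.strip()
    return params


def missing_keys(params, required):
    """Return the required keys that are absent from params, in order."""
    return [key for key in required if key not in params]


if __name__ == "__main__":
    # A fragment of the old-style file from this thread (no 'preprocess' key).
    old_style = "outputFolderName\tAyalon1_18S_pr2\npandaseqMinlen\t\nd\t15\n"
    params = load_parameters(old_style)
    print(missing_keys(params, ["preprocess"]))
```

A key with an empty value (such as pandaseqMinlen above) still counts as present; only keys that are entirely absent are reported.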


stas-malavin commented on July 19, 2024

In the old parameters.tsv you had Swarm's d=15, while the current value is 3. Why such a big difference?
Also, I would appreciate it if you could share your experience. I've tried Swarm, and how to select d has always seemed a bit arbitrary to me. Did you run any tests on that?


stas-malavin commented on July 19, 2024

Ah, I see, you're now removing oligotons from the occurrence 5


hariszaf commented on July 19, 2024

This value has a lot to do with the taxonomic group you are targeting.
When using, for example, COI as your marker gene, it's quite common to go with high values of around 12.
If you are working with bacteria you'd go lower, even down to 1.

Did you run any tests on that?

If you have a mock community, it would be super beneficial for setting your parameters.

In the pema publication, I remember we did some validations with different d values, so maybe you could also have a look there, but the best thing is to always have a mock community to drive your parameters. I know this is hard and maybe not always an option, but to my knowledge it is the best way to go :)
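
One simple way to turn the mock-community advice into a procedure is sketched below in Python. It assumes you have already run Swarm on the mock sample with several d values and counted the resulting clusters yourself. The selection rule (closest cluster count to the known number of taxa) is just one possible criterion and not the validation used in the pema publication; pick_d is a hypothetical helper name and the numbers in the example are made up for illustration.

```python
def pick_d(cluster_counts, expected_taxa):
    """Return the d whose cluster count is closest to the expected number
    of taxa in the mock community; ties go to the smaller d."""
    if not cluster_counts:
        raise ValueError("cluster_counts must not be empty")
    # Iterate d values in ascending order so min() keeps the smallest on ties.
    return min(
        sorted(cluster_counts),
        key=lambda d: abs(cluster_counts[d] - expected_taxa),
    )


if __name__ == "__main__":
    # Made-up cluster counts per d value, for illustration only.
    counts = {1: 140, 3: 62, 10: 48, 15: 31}
    print(pick_d(counts, expected_taxa=50))
```

Cluster count alone says nothing about whether the clusters map to the right taxa, so in practice the taxonomic assignments of the mock sample are worth checking as well.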


stas-malavin commented on July 19, 2024

good point, thanks


stas-malavin commented on July 19, 2024

How should I set the boundary parameter? It's empty in the current file.


stas-malavin commented on July 19, 2024

Hi @hariszaf , the development version works, thank you.

