
Comments (20)

SowmyaPulapet commented on August 12, 2024

Hi,

I am also facing the same issue described above. I have 120 GB of plant sequencing data from which I want to assemble the chloroplast and mitochondrial genomes. While assembling the chloroplast genome from this data, I got the same error:

Unable to generate result with single copy vertex percentage < 50%

I would highly appreciate some help in solving this issue.

Thank you!

JianjunJin commented on August 12, 2024

@SowmyaPulapet @sanhuacat
Please provide the assembly graph, either as a FASTG file or as a visualized PNG, for troubleshooting.

SowmyaPulapet commented on August 12, 2024

Assembly graph

This is the assembly graph I got. I am confused about why it shows both embplant_pt and embplant_mt when I am assembling only the chloroplast.

The command I used:

get_organelle_from_reads.py -1 test_1.fq.gz -2 test_2.fq.gz -o PS-plastome -F embplant_pt -t 20

Along with this, I also got another error:

Disentangling failed: 'No new connections.'

This happened when I reran the command with a smaller word size (-w 75) and an increased --max-reads.

Feel free to let me know if you need any other information.

Thank you!

SowmyaPulapet commented on August 12, 2024

Hi @JianjunJin,

I am running out of time. Could you please help me with this?

sanhuacat commented on August 12, 2024

Dr. Jin,

I reran the program and have provided the information from log.txt and the assembly graph PNG.

As mentioned previously, the output files are not normal, so I have provided extended_spades\K105\assembly_graph.fastg instead of the assembly graph in the regular output directory.

The log file seems to indicate that SPAdes did not run successfully. Does this mean that the assembly itself completed but resolving the circular graph failed?

I believe the data were recognized as Sanger because of the sequencing platform, which should not be the cause of the assembly problems. I would like to understand the meaning of "Unable to generate result with single copy vertex percentage < 50%" and find a solution for it.

Thanks again.
Screenshot 2024-04-10 154439

GetOrganelle v1.7.7.0

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
PLATFORM: Linux localhost.localdomain 4.18.0-80.11.2.el8_0.x86_64 #1 SMP Tue Sep 24 11:32:19 UTC 2019 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES: Bowtie2 2.4.1; SPAdes 3.13.1; Blast 2.12.0
GETORG_PATH=/home/lou/.GetOrganelle
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /nfs/Cold_storage/Sbi_data/CAU_SbiReseq/clean_data
/nfs/lou/miniconda3/envs/getorganelle/bin/get_organelle_from_reads.py -1 GM003.final.R1.fq.gz -2 GM003.final.R2.fq.gz -F embplant_pt -o /nfs/lou/cpg30/GM003 -R 15 -t 10 -k 21,45,65,85,105 -s /nfs/lou/Sorghum_bicolor_cp.fasta

2024-04-03 16:28:36,368 - INFO: Pre-reading fastq ...
2024-04-03 16:28:36,398 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2024-04-03 16:28:41,822 - INFO: Tasting 100000+100000 reads ...
2024-04-03 16:36:29,871 - INFO: Estimating reads to use finished.
2024-04-03 16:36:30,056 - INFO: Unzipping reads file: GM003.final.R1.fq.gz (13761484241 bytes)
2024-04-03 16:37:28,923 - INFO: Unzipping reads file: GM003.final.R2.fq.gz (13696338551 bytes)
2024-04-03 16:38:26,289 - INFO: Counting read qualities ...
2024-04-03 16:38:27,521 - INFO: Identified quality encoding format = Sanger
2024-04-03 16:38:27,522 - INFO: Phred offset = 33
2024-04-03 16:38:27,523 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2024-04-03 16:38:27,610 - INFO: Mean error rate = 0.0049
2024-04-03 16:38:27,611 - INFO: Counting read lengths ...
2024-04-03 16:40:21,282 - INFO: Mean = 140.4 bp, maximum = 150 bp.
2024-04-03 16:40:21,313 - INFO: Reads used = 7482673+7482673
2024-04-03 16:40:21,314 - INFO: Pre-reading fastq finished.

2024-04-03 16:40:21,314 - INFO: Making seed reads ...
2024-04-03 16:40:22,373 - INFO: Making seed - bowtie2 index ...
2024-04-03 16:40:31,160 - INFO: Making seed - bowtie2 index finished.
2024-04-03 16:40:31,161 - INFO: Mapping reads to seed bowtie2 index ...
2024-04-03 16:41:38,434 - INFO: Mapping finished.
2024-04-03 16:41:38,573 - INFO: Seed reads made: /nfs/lou/cpg30/GM003/seed/embplant_pt.initial.fq (357841720 bytes)
2024-04-03 16:41:38,632 - INFO: Making seed reads finished.

2024-04-03 16:41:38,632 - INFO: Checking seed reads and parameters ...
2024-04-03 16:41:38,633 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2024-04-03 16:41:38,633 - INFO: If the result graph is not a circular organelle genome,
2024-04-03 16:41:38,633 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2024-04-03 16:42:50,385 - INFO: Pre-assembling mapped reads ...
2024-04-03 16:50:25,529 - INFO: Pre-assembling mapped reads finished.
2024-04-03 16:50:25,560 - INFO: Estimated embplant_pt-hitting base-coverage = 958.33
2024-04-03 16:50:51,432 - INFO: Reads reduced to = 3904016+3904016
2024-04-03 16:50:51,432 - INFO: Adjusting expected embplant_pt base coverage to 500.00
2024-04-03 16:50:51,433 - INFO: Estimated word size(s): 105
2024-04-03 16:50:51,433 - INFO: Setting '-w 105'
2024-04-03 16:50:51,433 - INFO: Setting '--max-extending-len inf'
2024-04-03 16:50:53,103 - INFO: Checking seed reads and parameters finished.

2024-04-03 16:50:53,103 - INFO: Making read index ...
2024-04-03 16:51:10,926 - INFO: For /nfs/lou/cpg30/GM003/1-GM003.final.R1.fq.gz.fastq, only top 3904016 reads are used in downstream analysis.
2024-04-03 16:52:15,306 - INFO: For /nfs/lou/cpg30/GM003/2-GM003.final.R2.fq.gz.fastq, only top 3904016 reads are used in downstream analysis.
2024-04-03 16:52:39,843 - INFO: 6005450 candidates in all 7808032 reads
2024-04-03 16:52:39,878 - INFO: Pre-grouping reads ...
2024-04-03 16:52:39,879 - INFO: Setting '--pre-w 105'
2024-04-03 16:52:40,392 - INFO: 200000/1246567 used/duplicated
2024-04-03 16:52:51,162 - INFO: 5757 groups made.
2024-04-03 16:52:54,792 - INFO: Making read index finished.

2024-04-03 16:52:54,793 - INFO: Extending ...
2024-04-03 16:52:54,793 - INFO: Adding initial words ...
2024-04-03 16:53:23,774 - INFO: AW 11410352
2024-04-03 16:54:12,897 - INFO: Round 1: 6005450/6005450 AI 271161 AW 11476374
2024-04-03 16:54:44,603 - INFO: Round 2: 6005450/6005450 AI 271981 AW 11489456
2024-04-03 16:55:41,275 - INFO: Round 3: 6005450/6005450 AI 272615 AW 11498544
2024-04-03 16:56:15,546 - INFO: Round 4: 6005450/6005450 AI 273294 AW 11507744
2024-04-03 16:56:49,671 - INFO: Round 5: 6005450/6005450 AI 273984 AW 11515272
2024-04-03 16:57:23,559 - INFO: Round 6: 6005450/6005450 AI 274415 AW 11520702
2024-04-03 16:57:58,129 - INFO: Round 7: 6005450/6005450 AI 274861 AW 11525516
2024-04-03 16:58:31,470 - INFO: Round 8: 6005450/6005450 AI 275375 AW 11531414
2024-04-03 16:59:02,604 - INFO: Round 9: 6005450/6005450 AI 275816 AW 11536476
2024-04-03 16:59:35,025 - INFO: Round 10: 6005450/6005450 AI 276242 AW 11541920
2024-04-03 17:00:08,163 - INFO: Round 11: 6005450/6005450 AI 276721 AW 11547248
2024-04-03 17:00:39,730 - INFO: Round 12: 6005450/6005450 AI 277212 AW 11551604
2024-04-03 17:01:11,540 - INFO: Round 13: 6005450/6005450 AI 277522 AW 11555108
2024-04-03 17:01:40,362 - INFO: Round 14: 6005450/6005450 AI 277894 AW 11559594
2024-04-03 17:02:11,451 - INFO: Round 15: 6005450/6005450 AI 278322 AW 11564216
2024-04-03 17:02:11,452 - INFO: Hit the round limit 15 and terminated ...
2024-04-03 17:02:28,882 - INFO: Extending finished.

2024-04-03 17:02:29,446 - INFO: Separating extended fastq file ...
2024-04-03 17:02:38,535 - INFO: Setting '-k 21,45,65,85,105'
2024-04-03 17:02:38,535 - INFO: Assembling using SPAdes ...
2024-04-03 17:02:39,179 - INFO: spades.py -t 10 --phred-offset 33 -1 /nfs/lou/cpg30/GM003/extended_1_paired.fq -2 /nfs/lou/cpg30/GM003/extended_2_paired.fq --s1 /nfs/lou/cpg30/GM003/extended_1_unpaired.fq --s2 /nfs/lou/cpg30/GM003/extended_2_unpaired.fq -k 21,45,65,85,105 -o /nfs/lou/cpg30/GM003/extended_spades
2024-04-03 17:28:50,851 - INFO: Insert size = 162.362, deviation = 36.9732, left quantile = 121, right quantile = 209
2024-04-03 17:28:50,852 - INFO: Assembling finished.

2024-04-03 17:30:59,629 - INFO: Slimming /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg finished!
2024-04-03 17:30:59,630 - INFO: Slimming assembly graphs finished.

2024-04-03 17:30:59,631 - INFO: Extracting embplant_pt from the assemblies ...
2024-04-03 17:30:59,635 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-04-03 17:30:59,778 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-04-03 17:30:59,779 - INFO: Scaffolding disconnected contigs using SPAdes scaffolds ...
2024-04-03 17:30:59,779 - WARNING: Assembly based on scaffolding may not be as accurate as the ones directly exported from the assembly graph.
2024-04-03 17:30:59,779 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-04-03 17:30:59,787 - INFO: Disentangling failed: 'No new connections.'
2024-04-03 17:30:59,787 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a/an embplant_pt-insufficient graph ...
2024-04-03 17:30:59,837 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-04-03 17:30:59,837 - INFO: Please ...
2024-04-03 17:30:59,837 - INFO: load the graph file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg,assembly_graph.fastg' in K105
2024-04-03 17:30:59,837 - INFO: load the CSV file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.csv' in K105
2024-04-03 17:30:59,837 - INFO: visualize and export your result in Bandage.
2024-04-03 17:30:59,837 - INFO: If you have questions for us, please provide us with the get_org.log.txt file and the post-slimming graph in the format you like!
2024-04-03 17:30:59,837 - INFO: Extracting embplant_pt from the assemblies failed.

Total cost 3840.43 s
Thank you!

JianjunJin commented on August 12, 2024

FYI, it's not organelle-sufficient, because a few high-depth embplant_pt contigs have dead ends (i.e. they are terminal contigs). Try to solve this issue first.
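To spot such dead ends without opening Bandage, one could scan the FASTG headers directly. The sketch below is a hypothetical helper (find_dead_ends is not part of GetOrganelle). It assumes the SPAdes FASTG header layout >EDGE_<id>_length_<len>_cov_<depth>[']:<neighbours>;, where a trailing ' marks the reverse-complement copy and a header without neighbours marks a terminal end. The min_cov threshold for "high-depth" is an arbitrary illustrative choice.

```python
import re

# Assumed SPAdes FASTG header: >NAME[']:NEIGHBOUR1,NEIGHBOUR2; or >NAME['];
HEADER = re.compile(r"^>([^:;]+)(?::([^;]*))?;$")
COVERAGE = re.compile(r"_cov_([0-9.]+)")


def find_dead_ends(fastg_lines, min_cov=0.0):
    """Return (edge, depth) pairs whose header lists no outgoing neighbours.

    Each edge appears twice (forward and reverse-complement, the latter ending
    in an apostrophe), so a missing neighbour list on either copy marks one
    terminal end of that contig.
    """
    dead_ends = []
    for raw in fastg_lines:
        line = raw.strip()
        if not line.startswith(">"):
            continue  # sequence lines carry no connection information
        match = HEADER.match(line)
        if match is None:
            continue
        name, neighbours = match.group(1), match.group(2)
        depth_match = COVERAGE.search(name)
        depth = float(depth_match.group(1)) if depth_match else 0.0
        if not neighbours and depth >= min_cov:
            dead_ends.append((name, depth))
    return dead_ends
```

In practice the input would be the open post-slimming graph file named in the log, for example find_dead_ends(handle, min_cov=100) on assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg. The threshold should be adjusted to the observed plastid depth.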

SowmyaPulapet commented on August 12, 2024

Hi @JianjunJin ,

Could you please give some input on how that can be achieved? Is the error "No new connections" also due to the same issue?

JianjunJin commented on August 12, 2024

@sanhuacat

  • Sanger is the quality encoding format (see https://en.wikipedia.org/wiki/FASTQ_format), not the seq tech.
  • SPAdes ran fine.
  • To some extent, it's a good result: an organelle-sufficient and relatively clean graph. It is a little complex because the LSC shares a small repeat with the IR, but GetOrganelle can usually handle that well. I have no idea why GetOrganelle didn't recognize the multiplicities correctly in this simple case (which triggered the <50% issue). It is probably due to uneven coverage, which I can't check because depth is not turned on in the image you provided.
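On the first point, "Sanger" in the log (together with "Phred offset = 33") describes how each quality character maps to a score: Q = ASCII code - 33, with error probability 10^(-Q/10). Below is a minimal Python sketch of that decoding, plus a rough offset check. The function names are hypothetical, and the heuristic is only an illustration, not GetOrganelle's own detection code.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string; offset 33 is the Sanger / Illumina 1.8+ encoding."""
    return [ord(char) - offset for char in quality]


def error_probabilities(quality, offset=33):
    """Convert each Phred score Q into the base-call error probability 10^(-Q/10)."""
    return [10 ** (-score / 10) for score in phred_scores(quality, offset)]


def guess_offset(qualities):
    """Rough heuristic: characters below ';' (ASCII 59) only occur with offset 33.

    If every character is >= 59, the data could still be offset 33 with uniformly
    high qualities, so a result of 64 is only a weak guess.
    """
    lowest = min(ord(char) for quality in qualities for char in quality)
    return 33 if lowest < 59 else 64
```

For example, the quality string "II#5" decodes to scores 40, 40, 2 and 20 under offset 33. This is why the encoding name says nothing about which sequencing platform produced the reads.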

You may either 1) manually curate the graph and then use get_organelle_from_assembly.py to automatically extract the plastome from the curated graph, or 2) try GetOrganelle v1.8.0, which uses an updated disentangling module but has not been formally released yet. You may send the fastg file to me if you want me to test it out.
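The following deliberately simplified Python sketch illustrates how uneven coverage can lead to the "< 50%" message. It is hypothetical and is not GetOrganelle's disentangling algorithm. It estimates each contig's copy number by rounding its depth against an assumed single-copy base coverage, then reports the share of vertices estimated as single-copy. It counts vertices rather than bases, and the depths are made-up numbers.

```python
def estimate_multiplicities(depths, base_coverage):
    """Round each contig depth to a copy number relative to the single-copy depth."""
    return {name: max(1, round(depth / base_coverage)) for name, depth in depths.items()}


def single_copy_percentage(multiplicities):
    """Percentage of vertices (contigs) estimated to occur exactly once."""
    single = sum(1 for copies in multiplicities.values() if copies == 1)
    return 100.0 * single / len(multiplicities)


# Made-up depths for a plastome-like graph: the IR is expected at about 2x.
depths = {"LSC": 500.0, "SSC": 480.0, "IR": 1010.0}

good = estimate_multiplicities(depths, base_coverage=500.0)
print(good, single_copy_percentage(good))

# If the baseline is misjudged (e.g. uneven coverage pulls it down to 250),
# every contig looks multi-copy and the single-copy share collapses.
skewed = estimate_multiplicities(depths, base_coverage=250.0)
print(skewed, single_copy_percentage(skewed))
```

With the correct baseline, two of three vertices are single-copy. With the skewed one, none are, which in this toy model would fall below a 50% requirement.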

JianjunJin commented on August 12, 2024

@SowmyaPulapet
Please see https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#what-should-i-do-with-incomplete-resultbroken-assembly-graph for fine-tuning.

The "No new connections" message was printed because GetOrganelle tried to fix the terminal contigs but failed. It is part of the same problem that leaves the graph organelle-insufficient.

SowmyaPulapet commented on August 12, 2024

Hi @JianjunJin,

I am already aware of this Wiki section, and I appreciate how informative and detailed the Wiki for this tool is.

Among the suggested solutions, I have already tried the following:

  1. Reduced the word size from 89 to 75
  2. Increased the input reads with these options: --reduce-reads-for-coverage or --max-reads
  3. Increased the number of rounds

In all those runs, I got organelle-insufficient graphs with the above-mentioned errors. I will try a run with a related genome as the seed, but I am not sure whether this is also recommended for chloroplast genomes.

Please let me know what you would suggest, given that I have already made all the above modifications.

Thank you

JianjunJin commented on August 12, 2024

@SowmyaPulapet
It's not clear enough from this graph (depth is not displayed), but if you display and filter by depth in Bandage, you would likely get rid of the real embplant_mt contigs. Although the SSC region is not clear, my intuition is that there is only one gap in the LSC.

I haven't seen your complete log; however, further reducing the word size and/or using a related genome as the seed may help.
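One way to organise such reruns is to generate one command per word size, all seeded with the related genome. The Python sketch below only builds the command strings and does not run anything. The function name, the output naming scheme and the word sizes in the example are hypothetical. It uses only options already shown in this thread (-1, -2, -o, -F, -t, -s, -w).

```python
import shlex


def rerun_commands(reads_1, reads_2, seed_fasta, word_sizes, out_prefix, threads=20):
    """Return one get_organelle_from_reads.py command line per word size (-w)."""
    commands = []
    for word_size in word_sizes:
        argv = [
            "get_organelle_from_reads.py",
            "-1", reads_1,
            "-2", reads_2,
            "-o", f"{out_prefix}_w{word_size}",  # separate output dir per run
            "-F", "embplant_pt",
            "-t", str(threads),
            "-s", seed_fasta,  # plastome of a related species as the seed
            "-w", str(word_size),
        ]
        commands.append(shlex.join(argv))  # shlex.join needs Python 3.8+
    return commands


for command in rerun_commands("test_1.fq.gz", "test_2.fq.gz", "related_cp.fasta", [75, 65, 55], "PS-plastome"):
    print(command)
```

Keeping each word size in its own output directory makes it easy to compare the resulting graphs side by side in Bandage.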

SowmyaPulapet commented on August 12, 2024

sanhuacat commented on August 12, 2024

@JianjunJin

Thank you for your answer. I tried to solve it manually. Although I didn't understand why there was an error, I eventually got the usable genome.
Looking forward to new updates!

JianjunJin commented on August 12, 2024

@SowmyaPulapet @sanhuacat
Please note that "Disentangling failed" is not an error indicating abnormal execution, but rather an expected outcome in many runs. It is analogous to obtaining low support or unusual results in a statistical estimation, which can occur because of limitations in the data or an imperfect fit between the model or algorithm and the given problem.
Perhaps the log message sounds too alarming.

SowmyaPulapet commented on August 12, 2024

@JianjunJin

As suggested, I reran it using a closely related species as the seed. This is the command I used:

~/Tools/GetOrganelle/get_organelle_from_reads.py -1 ../Trimming/test_R1_val_1.fq.gz -2 ../Trimming/test_R2_val_2.fq.gz -o plastome -F embplant_pt -t 20 -s ../Reference/CP_Genome.fasta -w 65

Unfortunately, this time the .fastg graph was not generated at all. I am also attaching the log file here.

Please have a look and provide your suggestions.

Thank you.
get_org.log.txt

JianjunJin commented on August 12, 2024

@SowmyaPulapet
According to your log file, the graph is available at plastome/extended_spades/K115/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg.
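The post-slimming graph sits in a K<k> subfolder of extended_spades: K105 in the earlier log and K115 here. A small helper can list these graphs. The Python sketch below is hypothetical and assumes only the naming pattern visible in the logs in this thread. It sorts numerically so that the largest k-mer comes first, because a plain string sort would put K85 after K115.

```python
import re
from pathlib import Path

# Assumed layout, taken from the logs in this thread:
# <output>/extended_spades/K<k>/assembly_graph.fastg.extend-<labels>.fastg
PATTERN = "extended_spades/K*/assembly_graph.fastg.extend-*.fastg"


def kmer_of(path):
    """Read k from the parent folder name, e.g. K115 -> 115 (0 if not K<digits>)."""
    match = re.fullmatch(r"K(\d+)", Path(path).parent.name)
    return int(match.group(1)) if match else 0


def sort_by_kmer(paths):
    """Order graph paths from the largest to the smallest k-mer."""
    return sorted(paths, key=kmer_of, reverse=True)


def find_extended_graphs(output_dir):
    """Glob the post-slimming graphs under one GetOrganelle output directory."""
    return sort_by_kmer(Path(output_dir).glob(PATTERN))
```

For the run above, find_extended_graphs("plastome") would be expected to return the K115 graph first, if the layout matches this assumption.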

SowmyaPulapet commented on August 12, 2024

@JianjunJin

graph

This is the graph I got from the path. What can I do further?

SowmyaPulapet commented on August 12, 2024

@JianjunJin Hi, any input from your side?

JianjunJin commented on August 12, 2024

@SowmyaPulapet It's not entirely clear to me, but the graph is likely organelle-sufficient now. Try loading the CSV file (with the BLAST information) and manually curating the graph in Bandage, e.g. remove the contigs with shallow depth and see what remains.
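As a rough, scriptable counterpart to removing shallow contigs in Bandage, the hypothetical Python sketch below reads the depths embedded in SPAdes-style FASTG edge names (_cov_<depth>). It then splits the edges using a cutoff relative to the deepest edge. The 0.3 fraction is an arbitrary illustration. In a plastome graph the deepest edges are usually the IR copies, so the cutoff should be chosen by inspecting the actual depth distribution.

```python
import re

COVERAGE = re.compile(r"_cov_([0-9.]+)")


def edge_depths(fastg_lines):
    """Map each edge name (orientation stripped) to the depth in its FASTG header."""
    depths = {}
    for raw in fastg_lines:
        line = raw.strip()
        if not line.startswith(">"):
            continue
        # Header is >NAME[']:NEIGHBOURS; or >NAME[']; so keep only NAME.
        name = line[1:].split(":")[0].rstrip(";").rstrip("'")
        match = COVERAGE.search(name)
        if match:
            depths[name] = float(match.group(1))
    return depths


def split_by_depth(depths, min_fraction=0.3):
    """Return (kept, dropped) edge names using a cutoff relative to the deepest edge."""
    cutoff = max(depths.values()) * min_fraction
    kept = sorted(name for name, depth in depths.items() if depth >= cutoff)
    dropped = sorted(name for name, depth in depths.items() if depth < cutoff)
    return kept, dropped
```

The dropped list is only a set of candidates to inspect. The actual removal decisions still belong in Bandage, alongside the BLAST labels from the CSV file.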

SowmyaPulapet commented on August 12, 2024

Yes, I figured it out and got the complete genome.

Thanks for your inputs @JianjunJin !
