Comments (4)

lcoombe avatar lcoombe commented on July 17, 2024

Hi @osilander,

Thanks for reaching out! Those shorter contigs are likely primarily due to the tigmint-long step, which detects and cuts the 'goldtigs' (golden path reads, pre-scaffolding) at putative misassemblies/chimeric regions. Depending on where those cuts are made you can end up with these very short sequences, which can be safely filtered out of the assembly.

It is also possible to have sequences shorter than the read lengths because the initial GoldPath stage performs some trimming on reads while generating the goldtigs/golden path (~1X representation of the underlying genome).

I hope that makes sense - just let us know if you have any other questions!

Thank you for your interest in GoldRush!
Lauren

osilander avatar osilander commented on July 17, 2024

Thanks for the explanation.
I looked into this a little more and found that the contig length distribution seems quite odd. Many contigs are exactly at (or very close to) specific round numbers: 2,000 bp, 3,000 bp, etc.

This becomes very apparent when you look at the histogram or cumulative curves (see below). For example, I have 40,014 contigs in total. Of these, 2,059 are between 1,001 bp and 1,999 bp in length, but 2,673 are exactly 2,000 bp in length. Similarly, 3,025 are between 2,001 bp and 2,999 bp in length; 138 are exactly 3,000 bp, and 316 are between 2,999 bp and 3,001 bp.

My read length distributions are very continuous (ONT 10.4.1, dorado basecalls). This contig length pattern continues up to approximately 20,000 bp: there are unexpected bumps in contig lengths at 4,000, 5,000, 6,000, 7,000 bp, etc.
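One way to check for this pattern is to tally contigs whose lengths are exact multiples of 1,000 bp. The following is a minimal sketch, not part of GoldRush; the function names `fasta_lengths` and `count_round_lengths` are hypothetical, and it assumes a plain FASTA file whose records start with `>`.

```python
from collections import Counter


def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_round_lengths(lengths, step=1000):
    """Count contigs whose length is an exact, non-zero multiple of `step`."""
    counts = Counter(n for n in lengths if n > 0 and n % step == 0)
    return dict(sorted(counts.items()))
```

To run it on an assembly, open the FASTA file and pass the file handle to `fasta_lengths`, then pass the result to `count_round_lengths`. Large counts at 2,000, 3,000, and so on, relative to the neighbouring length ranges, would reproduce the spikes described here.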

There is also a strange drop-off in contigs that are greater than 1,000bp compared to less than 1,000bp (attached).
goldrush-hist.pdf

Is this possibly something specific to my install (Ubuntu 20.04.5, goldrush v1.0.1)? I get no errors or warnings during assembly. Have you ever seen this before?

jwcodee avatar jwcodee commented on July 17, 2024

Hello. You see a lot of contigs at those specific lengths because the GoldPath module within GoldRush evaluates each read as a series of non-overlapping tiles. By default each tile is 1,000 bp long, except the last tile, which holds whatever remains of the read. Part of the GoldPath module involves trimming reads based on overlap, and since GoldPath evaluates reads as collections of tiles, trimming is done by removing whole tiles. The trimmed read will therefore be of length (x remaining tiles * 1,000 bp) or ((x - 1) remaining full tiles * 1,000 bp + length of the last tile).
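The arithmetic above can be illustrated with a small sketch. This is not GoldPath's actual code: the function `trimmed_lengths` is hypothetical, and it assumes that trimming removes whole tiles from either end of a read and keeps one contiguous run of tiles.

```python
def trimmed_lengths(read_length, tile_size=1000):
    """List every length a read can have after whole tiles are trimmed from its ends.

    Assumption (not confirmed by the GoldRush source): trimming keeps a single
    contiguous run of tiles, and only the last tile may be shorter than tile_size.
    """
    full_tiles, remainder = divmod(read_length, tile_size)
    tiles = [tile_size] * full_tiles + ([remainder] if remainder else [])

    lengths = set()
    for start in range(len(tiles)):
        for end in range(start + 1, len(tiles) + 1):
            # Length of the run of tiles kept after trimming both ends.
            lengths.add(sum(tiles[start:end]))
    return sorted(lengths)


print(trimmed_lengths(3400))
# [400, 1000, 1400, 2000, 2400, 3000, 3400]
```

For a 3,400 bp read, every outcome that drops the final 400 bp tile is an exact multiple of 1,000 bp. Under these assumptions, that would account for the excess of contigs at 2,000 bp, 3,000 bp, and so on.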

github-actions avatar github-actions commented on July 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your interest in GoldRush!
