Comments (4)
Hi @osilander,
Thanks for reaching out! Those shorter contigs are likely primarily due to the tigmint-long step, which detects and cuts the 'goldtigs' (golden path reads, pre-scaffolding) at putative misassemblies/chimeric regions. Depending on where those cuts are made, you can end up with these very short sequences, which can be safely filtered out of the assembly.
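As a rough illustration of that filtering step, here is a minimal sketch in plain Python. It is not part of GoldRush: the helper names `read_fasta` and `filter_short` and the length cutoff are assumptions made up for this example, and the right cutoff depends on your data.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short(records: Iterable[Tuple[str, str]], min_len: int) -> List[Tuple[str, str]]:
    """Keep only records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


# Toy input; a real run would iterate over an open assembly FASTA file instead.
toy = ">ctg1\nACGT\nAC\n>ctg2\nAC\n".splitlines()
kept = filter_short(read_fasta(toy), min_len=5)  # hypothetical cutoff
for header, seq in kept:
    print(f">{header}\n{seq}")
```

For a real assembly you would pass the open file handle to `read_fasta` and write the kept records to a new FASTA file.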
It is also possible to have sequences shorter than the read lengths because the initial GoldPath stage performs some trimming on reads while generating the goldtigs/golden path (~1X representation of the underlying genome).
I hope that makes sense - just let us know if you have any other questions!
Thank you for your interest in GoldRush!
Lauren
from goldrush.
Thanks for the explanation.
I was looking a little more into this and found that the contig length distribution seems quite odd. There are many contigs whose lengths are exactly at (or very close to) specific round numbers: 2,000 bp, 3,000 bp, etc.
This becomes very apparent when you look at the histogram or cumulative curves (see below). For example, I have 40,014 total contigs. 2,059 are between 1,001 bp and 1,999 bp in length, but 2,673 are exactly 2,000 bp in length. Similarly, 3,025 are between 2,001 and 2,999 bp in length; 138 are exactly 3,000 bp, and 316 are between 2,999 and 3,001 bp.
My read length distributions are very continuous (ONT 10.4.1, dorado basecalls). This contig length pattern continues up to approximately 20,000 bp: there are unexpected bumps in contig lengths at 4,000, 5,000, 6,000, 7,000 bp, etc.
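For anyone who wants to check their own assembly for the same pattern, a tally like the one above can be sketched as follows. This is only an illustration: the function name `round_length_summary`, the `tolerance` parameter, and the toy length list are assumptions for this example, not output from my data.

```python
from collections import Counter
from typing import Dict, Iterable


def round_length_summary(lengths: Iterable[int], step: int = 1000, tolerance: int = 0) -> Dict[int, int]:
    """Count lengths lying within `tolerance` bp of a positive multiple of `step`."""
    hits = Counter()
    for n in lengths:
        # Nearest multiple of `step`, using integer arithmetic only.
        nearest = ((n + step // 2) // step) * step
        if nearest > 0 and abs(n - nearest) <= tolerance:
            hits[nearest] += 1
    return dict(sorted(hits.items()))


# Toy contig lengths (made up for illustration).
toy_lengths = [2000, 2000, 1999, 3000, 2500, 3001, 950]
exact = round_length_summary(toy_lengths)               # exact multiples only
near = round_length_summary(toy_lengths, tolerance=1)   # within 1 bp of a multiple
print(exact)
print(near)
```

The contig lengths themselves can be taken from any FASTA index or length table for the assembly.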
There is also a strange drop-off in contigs that are greater than 1,000bp compared to less than 1,000bp (attached).
goldrush-hist.pdf
Is this possibly something specific to my install (Ubuntu 20.04.5, goldrush v1.0.1)? I get no errors/warnings during assembly. Have you ever seen this before?
Hello. The reason you see a lot of contigs at those specific lengths is that the GoldPath module within GoldRush evaluates each read as a series of non-overlapping tiles, which by default are 1,000 bp long, with the exception of the last tile. Part of the GoldPath module involves trimming reads based on overlap, and since GoldPath evaluates reads as a collection of tiles, trimming is done by removing whole tiles. The trimmed read will therefore be of length either (x remaining tiles * 1,000 bp) or ((x - 1) remaining tiles * 1,000 bp + length of last tile).
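To make the arithmetic concrete, here is a minimal sketch of tile-based trimming. It is not GoldPath's actual code: the function names and the `drop_front`/`drop_back` parameters are hypothetical, and how GoldPath decides which tiles to remove is not modelled here. It only shows why removing whole tiles gives lengths that are either a multiple of the tile size or a multiple plus the shorter last tile.

```python
from typing import List


def tile_lengths(read_len: int, tile_size: int = 1000) -> List[int]:
    """Split a read length into non-overlapping tiles; the last tile may be shorter."""
    full, rest = divmod(read_len, tile_size)
    return [tile_size] * full + ([rest] if rest else [])


def trimmed_length(read_len: int, drop_front: int = 0, drop_back: int = 0,
                   tile_size: int = 1000) -> int:
    """Length left after removing whole tiles from either end (hypothetical model)."""
    tiles = tile_lengths(read_len, tile_size)
    kept = tiles[drop_front:len(tiles) - drop_back]
    return sum(kept)


# A 5,350 bp read is five 1,000 bp tiles plus a 350 bp last tile.
print(tile_lengths(5350))
print(trimmed_length(5350, drop_back=1))                 # last tile removed
print(trimmed_length(5350, drop_front=2))                # last tile kept
print(trimmed_length(5350, drop_front=1, drop_back=2))   # last tile removed
```

Whenever the last tile is trimmed away, the result is an exact multiple of 1,000 bp, which matches the spikes in the contig length histogram.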
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your interest in GoldRush!