Comments (9)
Hi Irilenia,
Thanks for this -- could you pass on a mini fastq file containing some of these problematic reads so we can use them for debugging?
-Lars
from bio-tradis.
Hello, we are also seeing a set of sample files of differing lengths, each longer than it should be, and this seems to be the likely cause. Thanks for looking into it,
Best wishes,
Lisa
We see the same problem. Looking through the code, it appears that the CIGAR.pm file, which parses the CIGAR string reported by the mapping software, is incorrect. My understanding is that the original parser was incorrect; it was then changed a few months ago to fix the soft-clipping issue, but that change makes the software incorrect for other CIGAR characters, since they were erroneously grouped with the soft-clipping character in the same 'if' statement.
According to page 8 of the SAM format specification https://samtools.github.io/hts-specs/SAMv1.pdf, only the operations {M,D,N,=,X} consume the reference, and therefore we should only change the coordinate upon encountering those characters.
I'm working on the software for the Quadram Institute, and I have changed the 'if' statements in the CIGAR parser to:
if ( $action eq 'M' || $action eq 'D' || $action eq 'N' || $action eq '=' || $action eq 'X' ) {
    $results{start} = $current_coordinate - 1 if ( $results{start} == -1 );
    $current_coordinate += $number;
    $results{end} = $current_coordinate - 1;
}
There are other changes that we are making and this correction should be pushed in alongside other changes in due course.
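As a standalone illustration of the rule from the SAM specification cited above, here is a minimal sketch that computes the reference span of an alignment from its POS and CIGAR fields. The sub name `reference_span` is hypothetical and this is not the Bio-Tradis parser; it only shows which operations advance the reference coordinate under that rule.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio-Tradis: returns the 1-based, inclusive
# reference span of an alignment, given POS and CIGAR from a SAM record.
sub reference_span {
    my ( $pos, $cigar ) = @_;
    my $reference_length = 0;

    while ( $cigar =~ /(\d+)([MIDNSHP=X])/g ) {
        my ( $number, $action ) = ( $1, $2 );

        # Only M, D, N, = and X consume reference bases;
        # I, S, H and P leave the reference coordinate unchanged.
        $reference_length += $number if $action =~ /^[MDN=X]$/;
    }

    return ( $pos, $pos + $reference_length - 1 );
}

my ( $start, $end ) = reference_span( 100, '5S20M2I3D10M' );
print "$start-$end\n";    # prints 100-132 (20M + 3D + 10M = 33 reference bases)
```

Note that under this rule the soft-clipped and inserted bases do not move the span at all, which is exactly the behaviour debated in the replies that follow.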
Hi Martin,
Do you mean that you want to remove any handling of soft-clipped bases? If so, I think this would just reintroduce the bug I fixed in issue #120: soft-clipped bases at the beginning and end of reads need to consume bases in the reference, because TraDIS primers are designed so that the first base corresponds to the insert site. If there's a base-calling error early in the read, this leads to soft-clipping and a miscalled insertion site, as @irilenia showed in that issue report. I have similar data showing that this is a common problem that shouldn't be ignored. It seems to be a bigger problem since the switch to bwa, as the default smalt parameters were such that reads with more than one or two mismatches were generally tossed.
I think to solve the current issue properly, we would need to track the end position of the chromosome, and forbid insertion sites that extend beyond that -- I haven't looked to see how difficult this would be.
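One way the bounds check suggested above could look, as a rough sketch only: reference lengths are read from the `LN` field of the `@SQ` lines in the SAM header, and a candidate site is rejected if it falls outside `1..LN`. The sub names `reference_lengths` and `site_within_reference` are hypothetical and do not exist in Bio-Tradis.

```perl
use strict;
use warnings;

# Hypothetical helper: map each reference name to its length, taken from the
# SN and LN fields of the @SQ lines of a SAM header (lines already chomped).
sub reference_lengths {
    my (@header_lines) = @_;
    my %length_of;
    for my $line (@header_lines) {
        next unless $line =~ /^\@SQ\t/;
        my ($name)   = $line =~ /\tSN:([^\t]+)/;
        my ($length) = $line =~ /\tLN:(\d+)/;
        $length_of{$name} = $length if defined $name && defined $length;
    }
    return %length_of;
}

# Hypothetical helper: true when a 1-based site lies on the reference sequence.
sub site_within_reference {
    my ( $site, $reference_length ) = @_;
    return ( $site >= 1 && $site <= $reference_length ) ? 1 : 0;
}

my %length_of = reference_lengths( "\@HD\tVN:1.6", "\@SQ\tSN:chr1\tLN:5000" );
for my $site ( 4998, 5003 ) {
    my $verdict = site_within_reference( $site, $length_of{chr1} ) ? 'keep' : 'reject';
    print "$site: $verdict\n";
}
```

Whether out-of-range sites should be rejected, as here, or clamped to the last base is a design decision this sketch does not settle.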
-Lars
Hi Martin,
I agree that X and D probably should not be grouped with soft-clipping in terms of extending the coordinates at the ends, but I don't believe this is the problem either here or with issue #120.
If you read #120, you'll see that soft-clipping has to be considered to get accurate insertion sites. Basically, the extra bases from the soft clipping need to be appended to the edge of the read first when calculating the start site; otherwise, mismatches near the 5' end of the read will shift the call away from the true insert site at the read start. Once this has been done, S needs to consume bases in the reference, as you'll otherwise end up with an incorrect alignment stop site.
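The sketch below is one possible reading of the approach just described, not the code from the fix for #120. It assumes the leading soft clip is subtracted from POS to recover the read start, after which S advances the coordinate like the reference-consuming operations. The sub name `read_extent_with_soft_clips` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the Bio-Tradis implementation: estimate where the
# whole read (including soft-clipped bases) would lie on the reference.
sub read_extent_with_soft_clips {
    my ( $pos, $cigar ) = @_;

    # Length of a leading soft clip, allowing for an optional hard clip before it.
    my ($leading_clip) = $cigar =~ /^(?:\d+H)?(\d+)S/;
    $leading_clip = 0 unless defined $leading_clip;

    # Step back so that the first base of the read marks the start site.
    my $current = $pos - $leading_clip;
    my $start   = $current;

    # From here on S consumes the reference, alongside M, D, N, = and X.
    while ( $cigar =~ /(\d+)([MIDNSHP=X])/g ) {
        my ( $number, $action ) = ( $1, $2 );
        $current += $number if $action =~ /^[MDN=XS]$/;
    }

    return ( $start, $current - 1 );
}

my ( $start, $end ) = read_extent_with_soft_clips( 100, '3S20M2S' );
print "$start-$end\n";    # prints 97-121
```

With this arithmetic a read clipped near either end of the reference can yield a start below 1 or an end beyond the last base, which may be one route to the over-long output discussed in this thread.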
Some of the current issue may be related to the current handling of soft-clipping, but I don't think it was entirely created by my updated handling, and I suspect it's really an issue with the padding in InsertSite.pm. For example, this problem seems to predate my fix for soft-clipping; see issue #86. My guess is that the start and end of the genome aren't being tracked properly, and this probably interacts poorly with anything that modifies the alignment coordinates, particularly when you've got split alignments.
-Lars