@mehrankr FYI, I just added a Python script that splits Octopus's new tagged realigned BAM output into separate BAM files.
Thanks for your question. The `--bamout` option takes a different argument depending on whether you're analysing a single sample or multiple samples. For a single sample, you need to specify the full output file path (e.g. `~/octopus/minibams/realigned.bam`). Your first command fails because you're trying to write to an existing directory. For multi-sample analysis, the argument is a directory (e.g. `~/octopus/minibams`). I'll try to make this clearer in the documentation.
Note that Octopus does not produce 'split' realigned BAMs in the same way that my bamsplit script does (and earlier versions of Octopus did). Rather, all realigned reads for a given sample are written to a single BAM, and the supporting haplotype of each read is annotated with the `HI` BAM tag. I think this is a better approach, as it allows clearer visualisation in alignment browsers that support coloured reads, such as IGV. If you need to work with 'split' BAMs, it should be fairly straightforward to write a Python script using pysam that generates them from the `HI` tag in the Octopus realigned BAM.
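Splitting on the `HI` tag with pysam, as suggested above, might look roughly like the sketch below. This is not the bamsplit script mentioned earlier; it is a minimal, hypothetical illustration that assumes pysam is installed and treats each distinct `HI` value as an opaque label (the exact format of the tag value is not described here). The function names and the `<out_prefix>.<label>.bam` naming scheme are made up for illustration.

```python
from typing import Dict


def haplotype_label(read, tag: str = "HI") -> str:
    """Return the read's haplotype tag value as a string, or 'untagged' if absent."""
    return str(read.get_tag(tag)) if read.has_tag(tag) else "untagged"


def split_by_haplotype(in_path: str, out_prefix: str, tag: str = "HI") -> Dict[str, int]:
    """Write one BAM per distinct tag value; return how many reads went to each."""
    import pysam  # imported here so haplotype_label can be used without pysam

    counts: Dict[str, int] = {}
    writers = {}
    with pysam.AlignmentFile(in_path, "rb") as bam:
        try:
            # until_eof=True streams every record and does not require an index
            for read in bam.fetch(until_eof=True):
                label = haplotype_label(read, tag)
                if label not in writers:
                    # Hypothetical naming scheme: <out_prefix>.<label>.bam,
                    # reusing the input header via template=
                    writers[label] = pysam.AlignmentFile(
                        f"{out_prefix}.{label}.bam", "wb", template=bam
                    )
                writers[label].write(read)
                counts[label] = counts.get(label, 0) + 1
        finally:
            for writer in writers.values():
                writer.close()
    return counts
```

For example, `split_by_haplotype("realigned.bam", "sample")` would write one `sample.<label>.bam` per tag value. Reads are written in input order, so a coordinate-sorted input yields coordinate-sorted outputs that can be indexed with `pysam.index(path)`. One writer stays open per distinct label, which is fine for a handful of values but would need rethinking if the tag takes many.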
Thanks Dan for getting back to me so quickly.
That makes sense. However, when I look at the output BAM file, all the tags from my input BAM are gone and only the RG tag remains.
In pysam, every read shows just this:
```
In [1]: read.get_tags()
Out[1]: [('RG', '1')]
```
This, however, is for an unpaired 50 bp experiment. Does it mean that Octopus couldn't do any phasing because of the short reads?
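Because a single read only shows that read's tags, a quick census over every read in the BAM gives a clearer picture of whether any read carries a tag such as `HI`. A minimal sketch, assuming pysam is installed; `count_tags` and `tag_census` are hypothetical helper names.

```python
from collections import Counter


def count_tags(reads) -> Counter:
    """Count how many reads carry each aux tag (e.g. 'RG', 'HI')."""
    counts: Counter = Counter()
    for read in reads:
        # get_tags() returns (tag, value) pairs; only the tag names are counted
        counts.update(tag for tag, _value in read.get_tags())
    return counts


def tag_census(bam_path: str) -> Counter:
    """Run count_tags over every record in a BAM file."""
    import pysam  # imported lazily; count_tags only needs objects with get_tags()

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # count_tags consumes the whole iterator before the file is closed
        return count_tags(bam.fetch(until_eof=True))
```

If `tag_census("realigned.bam")` reports `RG` but no `HI`, none of the reads were tagged with a haplotype.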
Ah... I missed that you are using v0.5.2-beta. This version of Octopus does produce split BAM files: you need to use the `--split-bamout` option rather than the `--bamout` option. In v0.6.0-beta, the `--split-bamout` option was removed and BAM tags were added instead.
Thanks Dan. This worked well for all of my files except one paired-end ATAC-seq library, which ended with the following debug output:
```
[2019-03-30 01:35:09] <DEBG> There are 38 reads in chr3_GL000221v1_random:44103-44304 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:44203-44204 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 19 unfiltered reads from chr3_GL000221v1_random:50521-50722
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 19 reads in chr3_GL000221v1_random:50521-50722 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:50621-50622 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 48 unfiltered reads from chr4_GL000008v2_random:0-106
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 48 reads in chr4_GL000008v2_random:0-106 after filtering
[2019-03-30 01:35:09] <INFO> - 100% 2h 4m -
[2019-03-30 01:35:09] <DEBG> Encountered an error whilst filtering, attempting to cleanup
[2019-03-30 01:35:09] <INFO> Removed 2 temporary files
[2019-03-30 01:35:10] <EROR> A program error has occurred:
[2019-03-30 01:35:10] <EROR>
[2019-03-30 01:35:10] <EROR> Encountered an unknown error during calling. This means there is a
[2019-03-30 01:35:10] <EROR> bug and your results are untrustworthy.
[2019-03-30 01:35:10] <EROR>
[2019-03-30 01:35:10] <EROR> To help resolve this error run in debug mode and send the log file to
[2019-03-30 01:35:10] <EROR> https://github.com/luntergroup/octopus/issues.
[2019-03-30 01:35:10] <INFO> ------------------------------------------------------------------------
```
Any ideas on what might be going wrong?
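One way to narrow down which region was being processed when the error occurred is to take the last region named in a `Fetched N unfiltered reads from REGION` debug line before the first `<EROR>` line. A minimal sketch, assuming the log format shown above; `last_region_before_error` is a hypothetical name. Applied to the excerpt above it would report `chr4_GL000008v2_random:0-106`, one of the small `_random` contigs, although the log alone does not prove that region triggered the bug.

```python
import re
from typing import Iterable, Optional

# Matches debug lines such as:
# "<DEBG> Fetched 48 unfiltered reads from chr4_GL000008v2_random:0-106"
FETCH_RE = re.compile(r"<DEBG> Fetched \d+ unfiltered reads from (\S+)")


def last_region_before_error(log_lines: Iterable[str]) -> Optional[str]:
    """Return the last fetched region before the first <EROR> line.

    Returns None if the log contains no <EROR> line.
    """
    last_region = None
    for line in log_lines:
        if "<EROR>" in line:
            return last_region
        match = FETCH_RE.search(line)
        if match:
            last_region = match.group(1)
    return None  # no error found in the log
```

A file object can be passed directly, e.g. `last_region_before_error(open("octopus_debug.log"))` for a hypothetical log file name; `\S+` stops before the trailing newline.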
Edit:
After switching to a FASTA file that contains only the chromosomes (none of the extra contigs), I no longer get that error. I think the issue might be related to small contigs, multi-mapping reads, or soft-clipped reads.
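A chromosome-only reference like the one described in the edit can be produced with a short streaming filter over the FASTA. A minimal sketch, assuming UCSC-style names where "primary" means chr1 to chr22, chrX, chrY and chrM; `PRIMARY_RE` and `filter_fasta` are hypothetical names, and the pattern would need adjusting for other naming schemes.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: primary chromosomes are chr1..chr22, chrX, chrY, chrM
PRIMARY_RE = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def filter_fasta(
    lines: Iterable[str],
    keep: Callable[[str], object] = PRIMARY_RE.match,
) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name satisfies `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-separated word of the header
            name = (line[1:].split() or [""])[0]
            keeping = bool(keep(name))
        if keeping:
            yield line
```

The output can be written with `open("primary.fa", "w").writelines(filter_fasta(open("genome.fa")))` (hypothetical file names) and then reindexed, for example with `samtools faidx primary.fa`.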
It's difficult to say what caused this error without having access to the data, but bugs that caused this type of error have been fixed in versions since v0.5.2-beta (which is now fairly outdated), so it's possible that whatever caused this error has already been fixed. I'll close this issue now, but please open a new issue if you still see this problem (or any others) with the latest version.
Really appreciate it, thanks.