Comments (7)

dancooke commented on July 26, 2024

@mehrankr FYI: I just added a Python script to split Octopus's new tagged realigned BAM output into separate BAM files.

from octopus.

dancooke commented on July 26, 2024

Thanks for your question. The --bamout option takes different input depending on whether you're analysing a single sample or multiple samples. For a single sample, you need to specify the full output path (e.g. ~/octopus/minibams/realigned.bam). The reason your first command fails is that you're trying to write to an existing directory. For multi-sample analysis the input is a directory (e.g. ~/octopus/minibams). I'll try to make this clearer in the documentation.
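The rule above can be sketched as a small helper that picks the value to pass to --bamout. This is only an illustration: the function name `bamout_argument` and the default file name `realigned.bam` are hypothetical, and the behaviour described applies to Octopus versions where --bamout works as explained here.

```python
from pathlib import Path


def bamout_argument(sample_names, out_dir, single_name="realigned.bam"):
    """Return the value to pass to --bamout for the given samples.

    Hypothetical helper: one sample gets a full BAM file path,
    several samples get the output directory itself.
    """
    if not sample_names:
        raise ValueError("at least one sample is required")
    out_dir = Path(out_dir).expanduser()
    if len(sample_names) == 1:
        # Single sample: a full file path, not an existing directory.
        return str(out_dir / single_name)
    # Multiple samples: the directory that will hold the realigned BAMs.
    return str(out_dir)
```

For example, `bamout_argument(["93-VU147T"], "~/octopus/minibams")` yields a path ending in `minibams/realigned.bam`, while passing two or more sample names yields the directory itself.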

Note that Octopus does not produce 'split' realigned BAMs in the same way that my bamsplit script does (and earlier versions of Octopus did). Rather, all realigned reads for a given sample are written to a single BAM, and the supporting haplotype is annotated with the HI BAM tag. I think this is a better way of doing things as it allows clearer visualisation in alignment browsers that support coloured reads, such as IGV. If you need to work with 'split' BAMs, then it should be fairly straightforward to write a Python script using pysam to generate them by making use of the HI tag in the Octopus realigned BAM.
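A minimal sketch of such a script, assuming pysam is installed and that the HI tag value can be turned into a file-name label with `str()`. The function names and the `<prefix>.<value>.bam` naming scheme are illustrative choices, not part of Octopus, and reads without the tag are collected in a separate "untagged" file.

```python
from collections import defaultdict


def output_name(prefix, tag_value):
    """Build the output BAM path for one tag value (None means untagged)."""
    label = "untagged" if tag_value is None else str(tag_value)
    return f"{prefix}.{label}.bam"


def split_bam_by_tag(in_path, out_prefix, tag="HI"):
    """Write one BAM per distinct value of `tag`; return read counts per file."""
    import pysam  # third-party; imported here so output_name works without it

    counts = defaultdict(int)
    writers = {}
    with pysam.AlignmentFile(in_path, "rb") as bam:
        try:
            # until_eof=True scans the whole file and needs no index.
            for read in bam.fetch(until_eof=True):
                value = read.get_tag(tag) if read.has_tag(tag) else None
                path = output_name(out_prefix, value)
                if path not in writers:
                    # Reuse the input header for every output file.
                    writers[path] = pysam.AlignmentFile(path, "wb", template=bam)
                writers[path].write(read)
                counts[path] += 1
        finally:
            for writer in writers.values():
                writer.close()
    return dict(counts)
```

Reads are written in input order, so if the input is coordinate-sorted the outputs are too and can be indexed afterwards with `pysam.index(path)`.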

mehrankr commented on July 26, 2024

Thanks Dan for getting back to me so quickly.

That makes sense. However, when I look at the output BAM file, all of my prior tags in the input BAM are excluded and there is only the RG tag.

In pysam, I only see the following for all reads:

In [1]: read.get_tags()
Out[1]: [('RG', '1')]

This, however, is for an unpaired 50 bp experiment. Does it mean that Octopus couldn't do any phasing because of the short reads?
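To check whether every read really carries only RG, the tags can be tallied across the file rather than inspected read by read. A minimal sketch, assuming pysam is installed; the function names and the `limit` parameter are illustrative.

```python
from collections import Counter
from itertools import islice


def count_tags(tag_lists):
    """Count how many reads carry each tag.

    `tag_lists` yields one list of (tag, value) pairs per read,
    in the form returned by pysam's read.get_tags().
    """
    counts = Counter()
    for tags in tag_lists:
        # A set ensures each tag is counted once per read.
        counts.update({name for name, _value in tags})
    return counts


def count_bam_tags(path, limit=100_000):
    """Tally tags over the first `limit` reads of a BAM file."""
    import pysam  # third-party; imported here so count_tags works without it

    with pysam.AlignmentFile(path, "rb") as bam:
        reads = islice(bam.fetch(until_eof=True), limit)
        return count_tags(read.get_tags() for read in reads)
```

If the result contains only RG, the file has no haplotype annotation at all, which points to the output mode or version rather than to individual reads that could not be assigned.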

dancooke commented on July 26, 2024

Ah... I missed that you are using v0.5.2-beta. This version of Octopus does produce split BAM files: you need to use the --split-bamout option rather than the --bamout option. In v0.6.0-beta the --split-bamout option was removed and BAM tags were added.

mehrankr commented on July 26, 2024

Thanks Dan. This worked well for all of the files I had, except one paired-end ATAC-seq library, which ended with the following debug message:

[2019-03-30 01:35:09] <DEBG> There are 38 reads in chr3_GL000221v1_random:44103-44304 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:44203-44204 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 19 unfiltered reads from chr3_GL000221v1_random:50521-50722
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 19 reads in chr3_GL000221v1_random:50521-50722 after filtering
[2019-03-30 01:35:09] <DEBG> Measuring block chr3_GL000221v1_random:50621-50622 containing 1 calls
[2019-03-30 01:35:09] <DEBG> Fetched 48 unfiltered reads from chr4_GL000008v2_random:0-106
[2019-03-30 01:35:09] <DEBG> In sample 93-VU147T
[2019-03-30 01:35:09] <DEBG> 0 failed the IsNotMarkedQcFail filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasWellFormedCigar filter
[2019-03-30 01:35:09] <DEBG> 0 failed the IsMapped filter
[2019-03-30 01:35:09] <DEBG> 0 failed the HasValidBaseQualities filter
[2019-03-30 01:35:09] <DEBG> There are 48 reads in chr4_GL000008v2_random:0-106 after filtering
[2019-03-30 01:35:09] <INFO>                           -             100%            2h 4m                 -
[2019-03-30 01:35:09] <DEBG> Encountered an error whilst filtering, attempting to cleanup
[2019-03-30 01:35:09] <INFO> Removed 2 temporary files
[2019-03-30 01:35:10] <EROR> A program error has occurred:
[2019-03-30 01:35:10] <EROR> 
[2019-03-30 01:35:10] <EROR>     Encountered an unknown error during calling. This means there is a
[2019-03-30 01:35:10] <EROR>     bug and your results are untrustworthy.
[2019-03-30 01:35:10] <EROR> 
[2019-03-30 01:35:10] <EROR> To help resolve this error run in debug mode and send the log file to
[2019-03-30 01:35:10] <EROR> https://github.com/luntergroup/octopus/issues.
[2019-03-30 01:35:10] <INFO> ------------------------------------------------------------------------

Any ideas on what might be going wrong?

Edit:
After switching to another FASTA file that contains only the chromosomes and none of the extra contigs, I didn't get that error. I think there might be an issue related to small contigs, multi-mapping reads, or soft-clipped reads.

dancooke commented on July 26, 2024

It's difficult to say what caused this error without having access to the data, but bugs that caused this type of error have been fixed in versions since v0.5.2-beta (which is now fairly outdated), so it's possible that whatever caused this error has already been fixed. I'll close this issue now, but please open a new issue if you still see the problem (or any others) with the latest version.

mehrankr commented on July 26, 2024

Really appreciate it, thanks.
