Giter VIP home page Giter VIP logo

Comments (25)

dancooke avatar dancooke commented on August 30, 2024

Hi Justin!

This looks like a bug in octopus' VCF merge function. Could you please provide your command line prompt that you used, and if possible the temporary directory containing all the BCF files? It should be possible for you to manually merge the BCF files assuming they are not corrupted.

Please be aware that the somatic caller is currently some way off the maturity of the germline caller. I've just uploaded a new release which contains numerous fixes and improvements which drastically improves somatic calling performance from the last release, but there is still some way to go.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Octopus was invoked with:
octopus -R /data/hg38/hg38.fa -I tmp/9be570942bd7cfae6ed5c8c3b44b73bccb8e1c08.Samtools.AddRGLine.bam tmp/0828b0a2897927c2b831c1361f5fa7ec0ea31476.Samtools.AddRGLine.bam -N normal --fast --output tmp/3f9d7c093adadc2aea0e997d8bf405f0e355b993.Octopus.CallSomatic.vcf --threads 8
Unfortunately I can't provide the BCF files, I realise this makes debugging difficult. I wonder if this is actually a HtsLib bug though, as it appears to be stuck in a HtsLib function.

Thanks for the update to to somatic calling! I'll try it out. In the meantime, I've had a look at the temp BCF files and found a lot of genotype calls referencing an alt allele in the germline (e.g., 2|0) that is not present in the ALT column of the VCF file (only 1 alternate present). This seems to only occur with variants tagged with SOMATIC; does 2 mean something different in this context or is this another bug? I'll open up another issue if it is a bug.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Agh, the odd VCF format you're seeing was due to a bug which has been fixed in the new release. So it looks like htslib is getting into a muddle with octopus' (bad) VCF. Would you be able to try with the new release to see if the problem is resolved?

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Sure, I'll get back to you in 10 hours ;).

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Cheers! Apologies about the slow runtime, I'll be focusing on improving that over the next few weeks.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Looks like it still hangs in htslib. Inspecting the BCF files the strange genotyping is now fixed, but the SOMATIC mutations are still a little odd: all of them are tri-allelic. I think it's unlikely the only somatic mutations have copy number 3.

How does htslib handle multi-allelic sites? Maybe that's still causing it to hang. I've linked against htslib 1.3.1, maybe I should try again with htslib 1.3.2. Or should I be using the master branch of htslib?

from octopus.

dancooke avatar dancooke commented on August 30, 2024

The triploid somatic genotypes are not intended to give copy number information per se, rather they are used to distinguish novel somatic haplotypes from the germline background (they are always ploidy + 1). The third column in the phased genotype is always the somatic allele. Octopus does infer allele frequencies (a somatic allele frequency credible interval is given in the SCR sample field), but it doesn't try to infer copy numbers from these, maybe it's something we'll look at in the future.

I've not had this issue with htslib, I am using 1.3.2 so it's possible that a bug in 1.3.1 is causing this. Does the problem persist if you call over a smaller target region (e.g. with just one somatic)?

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Thanks for the clarification, that makes a lot more sense.

I tried calling over a single chromosome and it completed successfully.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Ok after some experiments I can get it to hang if I run octopus on the unknown contigs in hg38, so everything except chr1-22, chrX, and chrY. There doesn't seem to be anything unusal about the VCF output that I dug out of the octopus temporary files for these chromosomes, other than the _ in the chromosome name (e.g., chrUn_KI270438v1).

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Also confirmed bug is still present with htslib 1.3.2.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

I'm not 100% sure, but I think I resolved this at some point over the last year. Will leave open for now.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Looks like some variant of this bug is still around. Octopus seems to create an "unfiltered" file and then hang. The output file I specify remains empty. I'll try and check if it's caused by the contigs as it previously was a year ago.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

This is odd. Octopus shouldn't have any issues with the underscores in the contig names. I was trying to reproduce this bug today and did come across a new issue (#17), but I don't think it is related to this problem. I haven't managed to reproduce this issue yet, have you checked if you get the same behaviour running on the same contigs with a single sample?

I also had a quick look at the htslib method your GDB stack trace indicates where the process gets stuck (starting line 579 here). Although I cannot claim to fully understand this code, that while loop looks highly suspect. Even if octopus is calling this method with bad input, I don't think an infinite loop is a reasonable outcome.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Removing contigs didn't help this time, octopus still freezes after producing the "unfiltered" file. I'm currently using a cluster to execute this so debugging with gdb is difficult. I'll run a single sample and see if it terminates.

As for the while loop, I don't think it's going into any infinite loops as the CPU usage goes to 0%, suggesting some sort of deadlock.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

Looks like I have the same problem with a single sample: the unfiltered file is produced then octopus hangs and the output file remains empty.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Very strange. The temporary file merging occurs in the main thread, once all other worker threads have finished, so I'm struggling to see how this can be a deadlock issue. Could you please confirm which OS, compiler, standard library, Boost, and htslib versions you're using?

from octopus.

jbedo avatar jbedo commented on August 30, 2024

I'm on linux, here are the dependencies being used:

glibc-2.26-75
zlib-1.2.11
bzip2-1.0.6.0.1
xz-5.2.3
openssl-1.0.2m
nghttp2-1.24.0-lib
libssh2-1.8.0
curl-7.56.1
htslib-1.6
bash-4.4-p12
gcc-6.4.0-lib
icu4c-59.1
boost-1.65.1

I compiled octopus with clang 3.9.1.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

I reproduced the problem on my laptop, here's a stack trace:

#0 0x00007fd79ced90d9 in pthread_cond_destroy@@GLIBC_2.3.2 () from /nix/store/dps6gpjd9vmjylqgjhdbw6kyxfbfssn7-glibc-2.26-75/lib/libpthread.so.0 #1 0x0000000000818b03 in octopus::(anonymous namespace)::run_octopus_multi_threaded(octopus::GenomeCallingComponents&) () #2 0x000000000081155f in octopus::run_calling(octopus::GenomeCallingComponents&) () #3 0x000000000081cba6 in octopus::run_octopus(octopus::GenomeCallingComponents&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () #4 0x0000000000477d74 in main ()

HTH.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

That is useful, thanks! Oddly, this stack trace points somewhere quite different to your previous one. From this Stack Overflow question, it looks some thread is waiting on one of the std::condition_variables that terminates before calling notify, then when the condition variable destructor is called (which just calls pthread_cond_destroy), we get undefined behaviour (hanging in this case). I'll try to track down where this could be happening!

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Could you please try checking out the branch fix/hang-after-calling and running your example with the --debug option? This will produce a debug log file which could be quite large (~10GB). If the changes I've made haven't fixed the issue, it would be helpful if you could post the last few hundred lines of the debug log after it gets stuck. Cheers.

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Just some other thoughts. I wonder if there is some issue with using Clang with libstdc++ which is really intended for GCC? If the new branch doesn't fix the issue, would it be possible for you to try building with GCC instead of Clang?

from octopus.

jbedo avatar jbedo commented on August 30, 2024

It doesn't compile for me with GCC for some reason, but I don't think the libstdc++ from gcc will cause any issues (especially a pthread issue). In any case, the branch partially worked! My first octopus run ran to completion, but the second sample crashed. I'll restart and see if it's a repeatable thing.

from octopus.

jbedo avatar jbedo commented on August 30, 2024

OK it doesn't hang anymore but now it just crashes in the call set refinement stage. No unfiltered files are being produced; does that mean octopus is getting further than before or not? Doesn't seem to be anything useful in the debug log:

[2017-12-13 07:40:45] 0 failed the IsNotOctopusDuplicate filter
[2017-12-13 07:40:45] 0 failed the HasValidBaseQualities filter
[2017-12-13 07:40:45] 0 failed the HasWellFormedCigar filter
[2017-12-13 07:40:45] 0 failed the IsMapped filter
[2017-12-13 07:40:45] 2 failed the IsProperTemplate filter
[2017-12-13 07:40:45] 0 failed the IsNotMarkedQcFail filter
[2017-12-13 07:40:45] 0 failed the IsNotMarkedDuplicate filter
[2017-12-13 07:40:45] There are 201 reads in chr1:5687474-5695533 after filtering
[2017-12-13 07:40:45] - 100% 31s -
[2017-12-13 07:40:45] Encountered an error whilst filtering, attempting to cleanup
[2017-12-13 07:40:45] Removed 912 temporary files
[2017-12-13 07:40:46] A program error has occurred:
[2017-12-13 07:40:46]
[2017-12-13 07:40:46] Encountered an unknown error during calling. This means there is a
[2017-12-13 07:40:46] bug and your results are untrustworthy.
[2017-12-13 07:40:46]
[2017-12-13 07:40:46] To help resolve this error run in debug mode and send the log file to
[2017-12-13 07:40:46] https://github.com/luntergroup/octopus/issues.
[2017-12-13 07:40:46] ------------------------------------------------------------------------

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Ok great. The crash in the variant filtering is certainly an independent bug as this only starts once the calling has finished, and filtering isn't threaded yet. Unfortunately the 'cleanup' after the exception in the filtering removes the unfiltered calls - I'll probably change that behaviour. Would you mind opening a separate issue for the filtering crash? Probably if you just call the region chr1:5687474-5695533 you'll hit the same bug - in which case it would be very helpful if you could provide a BAM subset for this region with the issue. I don't envision it will be too difficult to resolve this one! In the meantime, you can just avoid filtering completely by specifying -f off. Many thanks for your help in resolving this issue!

from octopus.

dancooke avatar dancooke commented on August 30, 2024

Closing this issue as I believe it is fixed in v0.3.2-alpha (c27686f).

from octopus.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.