Comments (6)

ewels commented on August 18, 2024

Hi Justin,

Sorry for this problem. We've hit it a bunch of times as well, and it can be annoying. We've already split this task out by itself for this reason, and we subsample the BAM file down to one million reads first to limit the memory usage. I suspect that it's actually this subsampling step that is eating all of the memory. We tried to be clever and randomise the subsampling, but it would perhaps be better to just take the first one million reads instead. I'll take a look into this.

Phil

from rnaseq.
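
To make the trade-off above concrete, here is a minimal Python sketch of the two subsampling strategies. It is not the pipeline's actual subsampling code: the function names `take_first` and `reservoir_sample` are hypothetical, and the reads are modelled as a generic iterable rather than a real BAM reader. Reservoir sampling is used here only as one common way to randomise a subsample; the thread does not say which method the pipeline used.

```python
import itertools
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take_first(reads: Iterable[T], n: int) -> Iterator[T]:
    """Stream the first n reads; only the current read is held at a time."""
    return itertools.islice(reads, n)


def reservoir_sample(reads: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Uniform random sample of n reads in one pass (Algorithm R).

    Memory grows with n, not with the input size, but all n sampled
    reads must be kept in memory until the input is exhausted.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n:
            # Fill the reservoir with the first n reads.
            reservoir.append(read)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = read
    return reservoir
```

Taking the first reads is cheaper and can stop early, but those reads come from the start of a coordinate-sorted file, so they may be less representative than a random sample.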

ewels commented on August 18, 2024

Hi @justinjeyakani,

I just pushed an update in #37 which should hopefully help the memory problem. It would be great if you could pull the latest version of the code and give it a try.

I've tested with a minimal dataset and it seems to work ok. I'd like to see how many reads the subsampled BAM files are ending up with using the current default subsampling target of ~10GB - we may be able to tweak this either up (to get better / more representative gene body coverage profiles) or down (for faster execution if the profiles aren't affected).

I'm trying to test on some larger data now myself, but our HPC is having problems again 😣

Phil

justinjeyakani commented on August 18, 2024

Dear Phil,

Sure, I will pull the latest version and test with large BAM files.
Would there be any concern with generating a bigWig file and running geneBody_coverage2.py on it? This would make use of the entire BAM with less memory, and would add a feature to the pipeline for visualizing the alignment.

ewels commented on August 18, 2024

Hi Justin,

Yes, perhaps you're right. I skipped over this suggestion initially because we'd already half-implemented the subsampling solution. But as you say, doing a bigWig conversion is perhaps the better approach (and would make for simpler pipeline code too).

Phil

apeltzer commented on August 18, 2024

https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html

Should be sufficient to get this going?
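
As a rough illustration of the bigWig route discussed above, the sketch below builds the two command lines in Python. It is an assumption about how the steps could be chained, not the code that went into the pipeline. The helper names and the file names (`sample.bam`, `sample.bw`, `genes.bed`) are hypothetical; the flags are the documented basic options of deepTools `bamCoverage` and RSeQC `geneBody_coverage2.py`.

```python
from typing import List


def bam_coverage_cmd(bam: str, bigwig: str) -> List[str]:
    """Command to convert a BAM file to a bigWig coverage track."""
    return [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", bigwig,
        "--outFileFormat", "bigwig",
    ]


def gene_body_coverage_cmd(bigwig: str, bed: str, prefix: str) -> List[str]:
    """Command to compute gene body coverage from a bigWig file."""
    return [
        "geneBody_coverage2.py",
        "-i", bigwig,   # coverage track produced by bamCoverage
        "-r", bed,      # reference gene model in BED format
        "-o", prefix,   # prefix for the output files
    ]


# Hypothetical file names, for illustration only.
steps = [
    bam_coverage_cmd("sample.bam", "sample.bw"),
    gene_body_coverage_cmd("sample.bw", "genes.bed", "sample"),
]
# To actually run them (requires both tools to be installed):
#   import subprocess
#   for cmd in steps:
#       subprocess.run(cmd, check=True)
```

Note that `bamCoverage` expects the input BAM to be sorted and indexed, so an indexing step would have to come first.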

apeltzer commented on August 18, 2024

This is now in the dev branch of the pipeline :-)