Giter VIP home page Giter VIP logo

Comments (13)

caherzee avatar caherzee commented on August 30, 2024

Hi Matthias,

How are you observing that more threads are used by elprep?

(I double checked the source code, which seems fine, and I cannot reproduce the behaviour).

Thanks!
Charlotte

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

I was monitoring CPU usage through htop and saw the entire node was being used (full cpu, all cores), despite only having asked 35 threads (instead of 80).

I'll rerun the analysis and see of this occurs again.
Matthias

from elprep.

caherzee avatar caherzee commented on August 30, 2024

If possible, can you run your command with /usr/bin/time -v (e.g. /usr/bin/time -v elprep sfm input.bam output.bam) and send me the output?

Thanks!
Charlotte

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

Good thing I'm benchmarking right now!

Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 35 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
User time (seconds): 74572.83
System time (seconds): 2917.64
Percent of CPU this job got: 2323%
Elapsed (wall clock) time (h:mm:ss or m:ss): 55:35.31
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
-> RAM USAGE
Maximum resident set size (kbytes): 40641948 = 40GB
<-
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 352416596
Voluntary context switches: 51149677
Involuntary context switches: 2166519
Swaps: 0
File system inputs: 0
File system outputs: 55015096
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

from elprep.

caherzee avatar caherzee commented on August 30, 2024

Hmm that seems ok. Any chance the extra threads you see are not from the elprep command?

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

Elprep was the only thing running on the node at that moment.
I'm currently running the same sample on 10 threads. At this point the spike hasn't occured.
I'll get back to you in an hour with the results.

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

The spike occurred again, staring with "Phase 2" of the process
Schermafbeelding 2021-10-21 om 14 56 35

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

Elprep log

elprep version 5.1.1 compiled with go1.16.7 - see http://github.com/exascience/elprep for more information.
2021/10/21 12:07:49 Created log file at /projects/demultiplex/development/elprep/logs/elprep/elprep-2021-10-21-12-07-49-573422852-UTC.log
2021/10/21 12:07:49 Command line: [elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed]
2021/10/21 12:07:49 Executing command:
 elprep sfm NA12878.bam NA12878.elprep.bam --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --quantize-levels 0 --max-cycle 500 --known-sites dbsnp-151.elsites --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --haplotypecaller NA12878.elprep.g.vcf.gz --sorting-order coordinate --nr-of-threads 10 --timed --log-path /projects/demultiplex/development/elprep --intermediate-files-output-prefix NA12878 --intermediate-files-output-type bam
2021/10/21 12:07:49 Splitting...
2021/10/21 12:12:55 Filtering (phase 1)...
2021/10/21 12:55:34 Filtering (phase 2) and variant calling...
2021/10/21 13:14:29 Merging sam/bam...
2021/10/21 13:18:42 Merging metrics...
2021/10/21 13:18:42 Merge vcf...

time log

Command being timed: "elprep sfm NA12878.bam NA12878.elprep.bam --nr-of-threads 10 --mark-duplicates --mark-optical-duplicates NA12878_duplicate_metrics.txt --optical-duplicates-pixel-distance 2500 --sorting-order coordinate --haplotypecaller NA12878.elprep.g.vcf.gz --bqsr NA12878.recall --reference /references/Hsapiens/hg38/seq/hg38.elfasta --known-sites dbsnp-151.elsites --reference-confidence GVCF --target-regions /references/Hsapiens/hg38/coverage/capture_regions/CMGG_WES_analysis_ROI_v2.bed --log-path /projects/demultiplex/development/elprep --timed"
        User time (seconds): 67783.27
        System time (seconds): 2824.52
        Percent of CPU this job got: 1659%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:53
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 45212524
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 391376706
        Voluntary context switches: 41522383
        Involuntary context switches: 1930008
        Swaps: 0
        File system inputs: 0
        File system outputs: 55015496
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

from elprep.

caherzee avatar caherzee commented on August 30, 2024

I think the problem might be there are too many system calls blocked in this specific case. It seems that the go runtime continues to launch threads in that case, though only 10 goroutines (threads) should actually ever be running in parallel. So if I understand it correctly, it should not be an issue so many threads are spawned, and it is why we don't see a bigger cpu usage than expected. I am unsure if I can prevent the go runtime to do this. I have to look into this a bit deeper.

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

I'm currently re-running with tmp path set to a faster FS, that should help the blocked calls, no?

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

No dice, same result, but to a lesser extent. A slow(ish) FS seems to be the culprit in this case

from elprep.

caherzee avatar caherzee commented on August 30, 2024

That makes sense. Are you restricting the nr of threads to run multiple jobs on the server?

from elprep.

matthdsm avatar matthdsm commented on August 30, 2024

Currently, yes. I'm running the same sample with different parameters to find the sweet spot between performance and speed specific to our hardware. Atm I've noticed that throwing more resources at elprep does not scale linearly, with only about 2min speed gains with double the resources.

from elprep.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.