Comments (6)
I'm trying to use 'kmcp search' to classify my metagenome data, but it is unacceptably slow (I killed the run after one week), even though I already set -j 40
to use multiple CPUs. My command is: nohup kmcp search -j 40 -d ~/data/Database/KMCP_database/GTDB_rep_genomes_r207/gtdb.r207.minh5.kmcp/ -1 ../../01-Trimming/02_trimmed_reads_1P.fq.gz -2 ../../01-Trimming/02_trimmed_reads_2P.fq.gz -o 150cm_ECS.KMCP.tsv.gz &
@shenwei356 Could you give me some ideas about this?
from kmcp.
Hi, thanks for using KMCP. Please provide more details:
- Is this a local machine or a cluster? How many CPUs and how much RAM?
- Where is the database stored: local disk or NAS? Try adding -w (https://bioinf.shenwei.me/kmcp/faq/#why-are-the-cpu-usages-are-very-low-not-100)
- How was the database built? Was sketching used? Minimizers? What is the size of gtdb.r207.minh5.kmcp? I'd recommend using all k-mers.
- Information about the query reads: read length and number of reads.
- Please rerun and check the instant speed.
- Please use another tool such as screen, instead of nohup, to run the command in the background.
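For anyone gathering the details requested above, here is a minimal sketch that collects the CPU count, the on-disk size of a directory, and the device backing it. The function names are hypothetical helpers, not part of KMCP, and the demo runs on the current directory; point it at the database directory instead.

```shell
# Number of online CPUs (getconf works on Linux and macOS).
cpu_count() {
    getconf _NPROCESSORS_ONLN
}

# Size of a directory in KiB, e.g. the database directory.
dir_size_kib() {
    du -sk "$1" | awk '{ print $1 }'
}

# Device or remote export that backs a path; an entry such as host:/export
# usually indicates network storage rather than a local disk.
backing_device() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Demo on the current directory; replace "." with the database path.
echo "CPUs: $(cpu_count)"
echo "size (KiB): $(dir_size_kib .)"
echo "backed by: $(backing_device .)"
```

RAM is left out because the command to report it differs between systems.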
from kmcp.
Thanks for your quick reply.
- I ran it on our local server with 80 CPUs and 1 TB of RAM.
- I built the database myself from the latest GTDB release; here is the index.log:
16:29:44.659 [INFO] kmcp v0.9.0
16:29:44.695 [INFO] https://github.com/shenwei356/kmcp
16:29:44.696 [INFO]
16:29:44.696 [INFO] loading .unik file infos from file: gtdb-r207-k21-n10/_info.txt
16:29:45.409 [INFO] 657030 cached file infos loaded
16:29:45.554 [INFO]
16:29:45.554 [INFO] -------------------- [main parameters] --------------------
16:29:45.554 [INFO] number of hashes: 1
16:29:45.554 [INFO] false positive rate: 0.200000
16:29:45.554 [INFO] k-mer size(s): 21
16:29:45.554 [INFO] split seqequence size: 0, overlap: 150
16:29:45.554 [INFO] block-sizeX-kmers-t: 10.00 M
16:29:45.555 [INFO] block-sizeX : 256
16:29:45.555 [INFO] block-size8-kmers-t: 20.00 M
16:29:45.555 [INFO] block-size1-kmers-t: 200.00 M
16:29:45.555 [INFO] -------------------- [main parameters] --------------------
16:29:45.555 [INFO]
16:29:45.555 [INFO] building index ...
16:29:46.285 [INFO]
16:29:46.285 [INFO] block size: 16432
16:29:46.285 [INFO] number of index files: 40 (may be more)
16:29:46.285 [INFO]
17:56:35.564 [INFO]
17:56:35.564 [INFO] kmcp database with 213177546931 k-mers saved to gtdb.r207.minh5.kmcp
17:56:35.564 [INFO] total file size: 120.21 GB
17:56:35.564 [INFO] total index files: 40
17:56:35.564 [INFO]
17:56:35.565 [INFO] elapsed time: 1h26m50.906706327s
17:56:35.565 [INFO]
- My input data is a pair of trimmed 150-bp paired-end metagenome read files, 22 GB gzipped in total (11 GB each).
- The instant speed in the log file is very low. I reran it about 2 hours ago, and the end of the log file currently reads:
processed queries: 4608, speed: 0.000 million queries per minute
processed queries: 4672, speed: 0.000 million queries per minute
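The speed field is printed in millions of queries per minute to three decimals, so any rate below about 500 queries per minute shows as 0.000. A rough sketch for getting a usable number from the log instead, assuming the raw log separates progress updates with carriage returns (shown as ^M in some viewers) and uses the line format quoted above. The helper names are hypothetical.

```shell
# Print the most recent "processed queries" count read from stdin.
# Carriage returns are converted to newlines before parsing.
last_processed() {
    tr '\r' '\n' | sed -n 's/^processed queries: \([0-9][0-9]*\),.*/\1/p' | tail -n 1
}

# Average throughput in queries per hour, given a count and elapsed hours.
queries_per_hour() {
    awk -v n="$1" -v h="$2" 'BEGIN { printf "%.0f\n", n / h }'
}

# Example using the two progress updates quoted above (about 2 hours of runtime).
count=$(printf 'processed queries: 4608, speed: 0.000 million queries per minute\rprocessed queries: 4672, speed: 0.000 million queries per minute\r' | last_processed)
echo "processed: $count, approx. $(queries_per_hour "$count" 2) queries/hour"
```

With the figures from this thread that is roughly 2336 queries per hour, which confirms the search is effectively stalled rather than just slow.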
from kmcp.
It's weird, please add my Wechat if you have one: shenwei356
from kmcp.
Are the CPUs ARM?
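A quick way to check, sketched with the standard `uname -m` command. The `is_arm` helper is a hypothetical name, and the patterns cover the common ARM machine strings (aarch64, arm64, armv7l and similar), which may not be exhaustive.

```shell
# Return success if a machine string (as printed by `uname -m`) names an ARM CPU.
is_arm() {
    case "$1" in
        aarch64*|arm*) return 0 ;;
        *) return 1 ;;
    esac
}

machine=$(uname -m)
if is_arm "$machine"; then
    echo "$machine: ARM"
else
    echo "$machine: not ARM"
fi
```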
from kmcp.
Are the CPUs ARM?
If they are, please try the new binaries. I've fixed the search for ARM architectures.
BTW, there's no need to set the false positive rate to 0.2 for kmcp index; 0.3 is OK.
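To put numbers on that, here is a back-of-the-envelope sketch using the textbook Bloom filter approximation for a single hash function (the index.log above reports "number of hashes: 1"). With n elements in m bits, the false positive rate is p ≈ 1 - exp(-n/m), so the filter needs about -1 / ln(1 - p) bits per element. Treating KMCP's index size as following this formula is an assumption, and `bits_per_kmer` is a hypothetical helper.

```shell
# Bits per stored k-mer for a single-hash Bloom filter with false positive
# rate p, from p = 1 - exp(-n/m)  =>  m/n = -1 / ln(1 - p).
# Textbook approximation; applying it to KMCP's index is an assumption.
bits_per_kmer() {
    awk -v p="$1" 'BEGIN { printf "%.2f\n", -1 / log(1 - p) }'
}

for fpr in 0.2 0.3; do
    echo "FPR $fpr: $(bits_per_kmer "$fpr") bits per k-mer"
done
```

Under this approximation, 0.2 costs about 4.48 bits per k-mer and 0.3 about 2.80, so the looser setting would need roughly 37% less space.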
from kmcp.