tlemane / kmtricks
modular k-mer count matrix and Bloom filter construction for large read collections
License: GNU Affero General Public License v3.0
Hello,
I installed kmtricks with conda and tried to run it on about 10000 fastq files stored on an external drive.
Here is the command line used:
kmtricks pipeline --file list_fastq_kmtricks --run-dir kmtricksDir --kmer-size 31 --hard-min 5 --mode kmer:count:bin --recurrence-min 10 -t 12
And here are the messages obtained from kmtricks:
[2022-04-16 20:17:08.096] [info] Run with Kmer<32> - uint64_t implementation
[2022-04-16 20:17:08.320] [info] Compute configuration...
[2022-04-16 20:17:08.320] [info] 3504 samples found (10512 read files).
[2022-04-16 20:51:29.192] [info] Use 113 partitions.
[2022-04-16 20:51:29.287] [info] Compute minimizer repartition...
Compute SuperK [==================================================] [02d:11h:28m:38s]
Count partitions [==================================================] [02d:11h:28m:38s]
Merge partitions [> ] [00:00s]
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
[2022-04-19 08:34:53.972] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
what(): Unable to open /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_1/ERR4147809.kmer
what(): Unable to open /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_10/ERR4147809.kmer
[2022-04-19 08:34:53.990] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-19 08:34:53.990] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
I was not able to find the ./kmtricks_backtrace.log file.
I checked the file /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_1/ERR4147809.kmer, and it exists.
Here is kmtricks infos:
kmtricks v1.2.1
HOST -
build host: Linux-5.4.0-54-generic
run host: Linux 5.4.0-109-generic
BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295
GIT SHA1 / VERSION -
kmtricks: 5c5c0b5
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01
Thanks for your help
Ugo
Hello,
Thank you for developing kmtricks. I'm following the conda instructions to install kmtricks on an HPC running CentOS 7.9.2009, but I'm currently unable to run it:
$ conda create -p kmtricks_env
$ conda activate ./kmtricks_env
$ conda install -c conda-forge -c tlemane kmtricks
$ which kmtricks
~/conda_envs/kmtricks_env/bin/kmtricks
$ conda list kmtricks
# packages in environment at /data/home/***/conda_envs/kmtricks_env:
#
# Name Version Build Channel
kmtricks 1.0.0 hdf3d972_0 tlemane
$ kmtricks --version
[1] 24748 illegal hardware instruction kmtricks --version
$ kmtricks --help
[1] 24776 illegal hardware instruction kmtricks --help
The conda install seems to work fine on macOS (Big Sur, MacBook Air M1, 2020), although it uses v0.0.6 (kmtricks.py instead of kmtricks as in v1.0.0):
$ conda activate ./kmtricks_env
$ which kmtricks.py
/Users/***/conda_envs/kmtricks_env/bin/kmtricks.py
$ kmtricks.py --version
kmtricks v0.0.6, git_sha1 : 8539f16
$ kmtricks.py --help
usage: kmtricks.py [-v] [-d] [--version] [-h] cmd ...
kmtricks cli
Subcommands:
cmd env, run
Global arguments:
-v, --verbose Verbose mode
-d, --debug Debug mode
--version Display kmtricks version
-h, --help Show this message and exit
I would appreciate any help.
Running:
kmtricks pipeline --file fof2 --run-dir ./kmer_pa --kmer-size 31 --mode kmer:pa:text -t 10
I get:
Killed after receive Segmentation fault:SIGSEGV(11) signal
I am running 120 samples, 420 read files (~14GB each), on Ubuntu 22.04, kernel 6.5.0-21-generic, with kmtricks v1.4.0 installed in a conda environment. The machine has 125GB of RAM, 50GB of swap, and 25 CPUs. I monitored CPU and RAM using htop and did not see overuse of either. See the log file:
Backtrace:
1 0x00007fe97f642520 (null) + 140640841377056
2 0x000055a6bf40c531 (null) + 94174661625137
3 0x000055a6bf40c33d (null) + 94174661624637
4 0x000055a6bf3ee1a3 gatb::core::kmer::impl::RepartitorAlgorithm<32ul>::computeRepartition(gatb::core::kmer::impl::Repartitor&) + 563
5 0x000055a6bf3eee6a gatb::core::kmer::impl::RepartitorAlgorithm<32ul>::execute() + 138
6 0x000055a6bf290914 (null) + 94174660069652
7 0x000055a6bf291098 (null) + 94174660071576
8 0x000055a6bf2917d0 (null) + 94174660073424
9 0x000055a6bf044260 main + 3312
10 0x00007fe97f629d90 (null) + 140640841276816
11 0x00007fe97f629e40 __libc_start_main + 128
12 0x000055a6bf0465e5 (null) + 94174657668581
Please advise. Thank you.
Thanks for kmtricks; we have incorporated it into one of our lab pipelines with significant computing time improvement.
We use kmtricks to generate binary presence/absence matrices from x samples, each made of 2-4 fastq files (.fq.gz). These files are large, and one goal is to remove them from storage once the computation is done.
Our usage is fairly simple:
kmtricks pipeline --mode kmer:pa:bin
kmtricks aggregate --pa-matrix kmer --format text
My question is: I want to incorporate z additional samples at a later date and recalculate everything, but without bringing back the reads for the previous x samples, i.e. adding the new samples from fastq files to the counts of the previous run.
Is it possible?
I have looked for ideas in the wiki, but I could not find anything suggesting that this is possible or where to start.
Thanks for your help.
kmtricks uses a non-standard definition of canonical k-mer, because it treats T<G. This is mentioned in the usage of the aggregate command with the sorted option, but it's also relevant when the output is not sorted (because it defines which k-mer is the canonical one). It could save future users some debugging time if this information was included in kmer dump help and kmer aggregate help (for the non-sorted option).
Just to clarify, I think you do have this information there already but it could help avoid user-error if it was featured more prominently in the help/usage.
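To make the difference concrete, here is a minimal Python sketch of how the choice of nucleotide order changes which strand is picked as canonical. The report above only states that T<G; the full order A<C<T<G used below is an assumption, and the function names are illustrative rather than part of kmtricks.

```python
# Two nucleotide orderings. STANDARD is plain lexicographic order.
# T_BEFORE_G assumes A < C < T < G; the report only states T < G.
STANDARD = {"A": 0, "C": 1, "G": 2, "T": 3}
T_BEFORE_G = {"A": 0, "C": 1, "T": 2, "G": 3}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer, order):
    """Return the smaller of a k-mer and its reverse complement under `order`."""
    rc = reverse_complement(kmer)
    return min(kmer, rc, key=lambda s: [order[base] for base in s])


if __name__ == "__main__":
    kmer = "GATTACA"
    print(canonical(kmer, STANDARD))    # forward strand wins when G < T
    print(canonical(kmer, T_BEFORE_G))  # reverse complement wins when T < G
```

A k-mer that starts with G and whose reverse complement starts with T is the simplest case where the two conventions disagree, so comparing output against a tool that uses lexicographic order needs a re-canonicalization step first.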
I made a change and reran make test to see, and it failed. Fine, but I have a doubt.
So I restored the test directory to its clean state, as it was when I cloned the repository, and reverted my change.
I run make test a first time -> pass
I run make test a second time (without changing anything) -> fail
Something odd is going on :-)
Here is the error I get when compiling kmtricks on a fresh EC2 server:
In file included from /mnt/1/kmtricks/include/kmtricks/howde_utils.hpp:32,
from /mnt/1/kmtricks/include/kmtricks/task.hpp:36,
from /mnt/1/kmtricks/include/kmtricks/cmd.hpp:37,
from /mnt/1/kmtricks/src/kmtricks.cpp:24:
/mnt/1/kmtricks/thirdparty/cfrcat/include/cfrcat/cfrcat.hpp: In function ‘uint64_t cfr::concat(int, int)’:
/mnt/1/kmtricks/thirdparty/cfrcat/include/cfrcat/cfrcat.hpp:124:10: error: ‘copy_file_range’ was not declared in this scope
return copy_file_range(in_fd, NULL, out_fd, &offset, size, 0);
system:
$ uname -ar
Linux ip-xxx.us-west-2.compute.internal 5.10.130-118.517.amzn2.x86_64 #1 SMP Wed Jul 13 16:51:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
$ cmake --version
cmake3 version 3.13.1
$ /usr/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.26, by Roland McGrath et al.
By default, kmdiff uses std::thread::hardware_concurrency() in cli.cpp to get the number of available threads.
But, because there's a but ;-)
hardware_concurrency returns, when possible, the underlying hardware capability to run threads, which might not correspond to the actual number of cores available to the process (because of taskset, a batch system like slurm, etc.). The consequence is that kmdiff might run in a non-optimal way.
For example, I have a user who submitted a kmc job on a 96-core HPC node in a single-core slurm allocation: more than 100 threads are now fighting for this one core.
I would suggest switching to sched_getaffinity to get the default thread count.
Something like this:
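#include <sched.h>   // sched_getaffinity, CPU_COUNT (Linux/glibc; g++ defines _GNU_SOURCE)
#include <thread>    // std::thread::hardware_concurrency, used as a fallback

// Number of CPUs the calling process is actually allowed to run on.
int getCPUs()
{
  cpu_set_t cpu_set;
  // If the affinity mask cannot be read, fall back to the hardware thread count.
  if (sched_getaffinity(0, sizeof(cpu_set), &cpu_set) != 0)
    return static_cast<int>(std::thread::hardware_concurrency());
  return CPU_COUNT(&cpu_set);
}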
regards
Eric
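For anyone who wants to check how large the gap is inside their own allocation before launching a job, the same comparison can be sketched in Python with the standard library only. This is an illustration of the idea above, not kmdiff code; the helper name available_cpus is made up, and os.sched_getaffinity is only available on some platforms (Linux in particular), hence the fallback.

```python
import os


def available_cpus():
    """CPUs this process may run on, falling back to the machine-wide count."""
    if hasattr(os, "sched_getaffinity"):
        # Respects taskset, cgroup cpusets and batch-system CPU binding.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("hardware threads:", os.cpu_count())
    print("usable by this process:", available_cpus())
```

Run inside a single-core allocation on a large node, the two numbers would be expected to differ, and the second one is the safer value to pass explicitly as a thread count.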
Hi, thanks for making this great tool! I'm trying to generate kmer presence/absence matrices with these commands:
kmtricks pipeline --file list_five --run-dir output_5 --cpr --mode kmer:pa:bin
kmtricks aggregate --run-dir output_5 --pa-matrix kmer --format text --cpr-in --sorted > output.txt
I've noticed that, in the output, k-mers are reported as present in an assembly, but when I grep for such a k-mer it isn't in the fasta. Are there any settings I can use to prevent this from happening? I've tried --hard-min 1, which didn't help.
Thanks!
Jenny
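One thing worth ruling out here (a possibility, not something confirmed in this thread): a plain grep only finds a k-mer if it appears on the forward strand and on a single line. Another report in this list notes that kmtricks works with canonical k-mers, so the reported k-mer may be the reverse complement of what is in the file, and FASTA sequences are often wrapped across lines. The sketch below checks both cases; all function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fasta_sequences(fasta_text):
    """Yield each record's sequence with line breaks removed."""
    chunks = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        else:
            chunks.append(line.strip())
    if chunks:
        yield "".join(chunks)


def kmer_in_fasta(kmer, fasta_text):
    """True if the k-mer or its reverse complement occurs in any record."""
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    for seq in fasta_sequences(fasta_text):
        seq = seq.upper()
        if kmer in seq or rc in seq:
            return True
    return False


if __name__ == "__main__":
    fasta = ">s1\nACGTT\nGCA\n"
    print(kmer_in_fasta("TTGC", fasta))  # spans the line break
    print(kmer_in_fasta("GCAA", fasta))  # only present as reverse complement
```

If a k-mer is still absent after this check, the cause lies elsewhere and the observation above stands as reported.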
Hello,
I launched this kmtricks command on a file of files listing ~37000 fastq files:
kmtricks pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20
It runs nicely, but at some point it uses all available RAM (I have 128GB) and the script stops:
./analysis_10x_MDAMB468_kmtricks.sh : ligne 30 : 21185 Processus arrêté "$kmtricks" pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20
Is there a way to limit the RAM usage?
Thanks a lot
I installed kmtricks v0.0.2 with conda. Running kmtricks.py with python < 3.6 results in the following error message:
File "/ebio/abt2_projects/ag-swart-loxodes/envs/kmtricks/bin/kmtricks.py", line 109
self.global_parser: argparse.ArgumentParser = None
^
SyntaxError: invalid syntax
It appears that the version check in lines 40-42 of kmtricks.py isn't used because it's not enclosed in the main code block.
Perhaps the python version could be included in the Conda recipe? I've gotten it to work now and look forward to trying out the pipeline.
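A note on why a version check inside the same file cannot help: Python compiles the whole file before running any statement, and a variable annotation such as the one on line 109 is rejected at that stage by interpreters older than 3.6. A guard only fires if it lives in a file that old interpreters can still parse. The sketch below shows such a guard; the names and the 3.6 minimum are assumptions based on this report, not the actual kmtricks.py code.

```python
import sys

MIN_PYTHON = (3, 6)  # assumed minimum, inferred from the error reported above


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def require_python(version_info=None, minimum=MIN_PYTHON):
    """Exit with a readable message instead of a SyntaxError."""
    if not python_is_supported(version_info, minimum):
        # %-formatting keeps this file parseable by old interpreters.
        raise SystemExit("Python %d.%d or newer is required" % tuple(minimum))


if __name__ == "__main__":
    require_python()
    # Only import modules that use newer syntax after the check has passed.
    print("Python version OK")
```

Pinning the Python version in the Conda recipe, as suggested above, addresses the same problem at install time rather than at run time.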
Hi Téo,
Pierre sent me the link to kmtricks so that I could take a look at what you have done.
To see, I profiled the executable (on a single file, so not of much interest for now).
I saw that 86% of the time was spent in the function pInfo.saveInfoFile(name), which in the end (if I understood correctly) is a human-readable dump of what is in the binary version. I suppose it's a dev thing ;-)
Best regards,
Guillaume.
PS: I can open a pull request if you want, I made the change to try it out
Hello,
I installed kmtricks with conda and got this error when I tried to run it:
kmtricks: error while loading shared libraries: liblz4.so.1: cannot open shared object file: No such file or directory
Here is my system:
————————————————
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Thanks for your help!
Hello,
I'm struggling with the following problem:
I'm running kmtricks (installed through conda) in order to index 1051 genomes. This is the command line that produced the error:
kmtricks pipeline --file ./genomes.fof --run-dir ./index --kmer-size 31 --mode hash:bft:bin --hard-min 2 --soft-min 3 --share-min 1 --bloom-size 10000 --bf-format howdesbt --cpr
And this is the error message (the backtrace log file is empty):
[2022-04-28 12:54:22.153] [info] Run with Kmer<32> - uint64_t implementation
[2022-04-28 12:54:22.295] [info] Compute configuration...
[2022-04-28 12:54:22.295] [info] 1051 samples found (1051 read files).
[2022-04-28 12:55:19.370] [info] Use 4 partitions.
[2022-04-28 12:55:19.459] [info] Compute minimizer repartition...
Compute SuperK [==================================================] [01m:40s]
Count partitions [==================================================] [01m:41s]
Merge partitions [> ] [00:00s]
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
terminate called recursively
what(): Unable to open ./index/counts/partition_3/G268.hash.p4
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
terminate called recursively
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
I'm not sure whether this is the same problem reported in issue #15, which is why I'm opening this new issue.
In case this is the same kind of problem, is there something else I could try to overcome it without increasing the maximum number of open files (ulimit)?
In #15 @tlemane also suggested reducing the number of threads, but I didn't specify the -t argument, so this is not really useful in my case.
Thanks in advance for your help
kmtricks dump seems to crash with the following error:
terminate called after throwing an instance of 'km::IOError'
what(): std::exception
[2023-03-31 07:43:20.178] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
kmtricks infos:
kmtricks v1.3.0
- HOST -
build host: Linux-6.1.3
run host: Linux 4.15.0-197-generic
- BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295
- GIT SHA1 / VERSION -
kmtricks: 92d7894
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01
Contact: [email protected]
Backtrace:
1 0x00007f549d83cf10 (null) + 140001396641552
2 0x00007f549d83ce87 gsignal + 199
3 0x00007f549d83e7f1 abort + 321
4 0x00007f549e4ae036 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f549e4ac524 (null) + 140001409680676
6 0x00007f549e4ac576 (null) + 140001409680758
7 0x00007f549e4ac768 __cxa_rethrow + 0
8 0x00000000005411b6 void km::check_fstream_good<std::basic_ifstream<char, std::char_traits<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_ifstream<char, std::char_traits<char> > const&) + 198
9 0x00000000005a8d99 (null) + 5934489
10 0x00000000005aa5cf (null) + 5940687
11 0x00000000005ace4b (null) + 5951051
12 0x00000000004c9132 main + 914
13 0x00007f549d81fc87 __libc_start_main + 231
14 0x00000000004cbdc4 (null) + 5029316
Dear authors, we have indexed 94 RNA-seq files (total fastq.gz: 201Gb) and we obtained a 441Gb kmtricks index.
This looks big compared to your supplementary tables.
A merged Jellyfish index made with DEkupl's joincount for the same dataset was only 23Gb.
We are wondering whether we are doing something wrong. I'm attaching my code below.
Thanks !
kmtricks.txt
whole_matrix.txt
fof_mondor.txt
Hello, I needed a matrix of the k-mer counts for several samples. I followed the instructions given in the example of the documentation, but I don't see the correspondence between samples and columns specified, either in the previous link or in the documentation for the aggregate module. I would guess it follows the order given by the file of files? I hope I didn't miss anything.
Thank you !
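If the columns do follow the order of the file of files (this is the poster's guess, not something confirmed in this thread), the column labels can be recovered by reading the sample IDs in order. The sketch below assumes the "id : file1 ; file2 ! min" line layout that appears in other reports in this list; skipping lines that start with "#" is an extra assumption, and the function name is illustrative.

```python
def fof_sample_ids(fof_text):
    """Return the sample IDs of a file of files, in file order."""
    ids = []
    for line in fof_text.splitlines():
        line = line.strip()
        # Skipping blank and '#' lines is an assumption, not documented behavior.
        if not line or line.startswith("#"):
            continue
        # The sample ID is everything before the first ':'.
        ids.append(line.split(":", 1)[0].strip())
    return ids


if __name__ == "__main__":
    example = "sample1 : sample1.fna ! 1\nsample2 : a.fna ; b.fna ! 1\n"
    for column, sample in enumerate(fof_sample_ids(example), start=1):
        print(column, sample)
```

It is worth checking the resulting labels against a sample with a known, distinctive k-mer before relying on them.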
Hello
Since generating a hash PA matrix is expected to take less disk space, is it possible to generate a hash:pa matrix and then convert it, as a last step, to a kmer:pa matrix?
Thank you
Hi,
I was tempted into trying kmtricks after the nice talk of @pierrepeterlongo at DSB2021.
My use case is the following. I'm trying to find patterns of shared k-mers in a somewhat large genomic project: 204GB of uncompressed fasta files corresponding to ~480 scaffolded assemblies of butterflies, wasps and flies; roughly 199G bp in total. I don't know the unique k-mer count yet, but it should probably be less than for the whole TARA ocean project, so I guess kmtricks can do the job ;)
I have only one file by specimen, and so my file of files looks like:
# fof.txt
sample1 : sample1.fna ! 1
sample2 : sample2.fna ! 1
This triggers the following error:
Traceback (most recent call last):
File "kmtricks.py", line 1080, in <module>
main()
File "kmtricks.py", line 1072, in main
pool.exec()
File "kmtricks.py", line 866, in exec
self.run_ready()
File "kmtricks.py", line 929, in run_ready
cmd.run()
File "kmtricks.py", line 353, in run
self.preprocess()
File "kmtricks.py", line 586, in preprocess
raise FileExistsError(f'{repart_file} doesn\'t exists.')
FileExistsError: kmdir/storage/partition_storage_gatb/minimRepart.minimRepart doesn't exists.
(I tried with both conda installed kmtricks and compiled from source.)
It works if I trick it into parsing twice the same file:
# fof.txt
sample1 : sample1.fna ; sample1.fna ! 1
sample2 : sample2.fna ; sample2.fna ! 1
The command I ran was taken from your benchmarks here:
set -euo pipefail
rm -rf kmdir
kmtricks.py --verbose --debug run \
--file fof.txt \
--run-dir kmdir \
--kmer-size 20 \
--nb-cores 8 \
--nb-partitions 1 \
--count-abundance-min 1 \
--recurrence-min 1 \
--mode bf_trp \
--hasher sabuhash \
--max-hash 1000000 \
--split howde \
--lz4 \
--max-count 256 \
--max-memory 8000 \
--log-files repart,superk,count,merge,split
Thanks for kmtricks anyway, it looks promising!
Hello,
Thanks for developing this nice tool!
I found that the output counts are scaled into 0-255 by default, and I was wondering whether there is a way to get the raw counts as output, without scaling?
Thanks and best wishes.
Hello,
After compiling kmindex from sources, I tried to run the example scripts.
1_build.sh -> OK
2_register.sh -> FAILURE, see:
rpm_maker:examples/data > sh 2_register.sh
[2023-10-11 13:45:08.390] [error] [InvalidParamError] -> Unknown param: --index.
[2023-10-11 13:45:08.394] [error] [InvalidParamError] -> Unknown param: --index.
And when I try the example from the documentation, I also run into a problem:
rpm_maker:examples/data > kmindex build --fof fof1.txt --run-dir D1_index --index ./G --register-as D1 --hard-min --kmer-size 25 --bloom-size 1000000
[2023-10-11 13:49:11.338] [error] [MissingValueError] -> --hard-minneeds a value.
Can you provide runnable instructions, please?
Regards
Eric
Hello,
I have a question about kmtricks on paired-end RNA-seq data: if the two fastq files are in "reverse-forward" mode, i.e. the first fastq contains reverse reads and the second fastq contains forward reads, does kmtricks treat them specially (for example, by first reverse-complementing the first fastq before counting k-mers)?
Thanks and best wishes !
I'm getting the following error when I run kmtricks with 276 samples:
$ kmtricks pipeline --kmer-size 111 --hard-min 0 --share-min 1 --soft-min 2 --recurrence-min 3 --file data/group/xjin/r.proc.kmtricks_input.txt --run-dir <DIR> --mode kmer:count:text --threads 24
[2024-03-22 09:26:07.016] [info] Run with Kmer<128> - uint64_t[4] implementation
[2024-03-22 09:26:07.069] [info] Compute configuration...
[2024-03-22 09:26:07.069] [info] 276 samples found (552 read files).
[2024-03-22 09:26:47.828] [info] Use 156 partitions.
[2024-03-22 09:26:48.117] [info] Compute minimizer repartition...
Compute SuperK [> ] [00m:00s]
Count partitions [> ] [00:00s]
terminate called after throwing an instance of 'std::runtime_error'
what(): Unable to open <DIR>/superkmers/xjin_AB_P0R2c/skp.90
terminate called recursively
[2024-03-22 09:30:47.682] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
./kmtricks_backtrace.log:
Backtrace:
1 0x00007f8d88e41090 (null) + 140245863764112
2 0x00007f8d88e4100b gsignal + 203
3 0x00007f8d88e20859 abort + 299
4 0x00007f8d89213f00 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f8d8921243c (null) + 140245867766844
6 0x00007f8d8921248e (null) + 140245867766926
7 0x000055a817e6c705 (null) + 94180443866885
8 0x00007f8d88e448a7 (null) + 140245863778471
9 0x00007f8d88e44a60 on_exit + 0
10 0x000055a817f2e98a (null) + 94180444662154
11 0x00007f8d88e41090 (null) + 140245863764112
12 0x00007f8d88edb23f clock_nanosleep + 223
13 0x00007f8d88ee0ec7 nanosleep + 23
14 0x000055a817f3f95b (null) + 94180444731739
15 0x000055a8180b14d0 (null) + 94180446246096
16 0x000055a8180b2165 (null) + 94180446249317
17 0x000055a8180b9443 (null) + 94180446278723
18 0x000055a817e52afb main + 1419
19 0x00007f8d88e22083 __libc_start_main + 243
20 0x000055a817e555e5 (null) + 94180443772389
infos:
$ kmtricks infos
kmtricks v1.4.0
- HOST -
build host: Linux-6.1.3
run host: Linux 4.18.0-513.11.1.el8_9.x86_64
- BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295
- GIT SHA1 / VERSION -
kmtricks: 7dc4d18
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01
Contact: [email protected]
I don't get the same error when I use a subset of 8 samples (instead of 276), nor when I use just 12 threads (instead of 24).
This seems pretty clearly related to issue #15, where kmtricks is opening too many files.
Indeed, lsof confirms this, with the crash occurring just as the number of open files ramps up. When I use 12 threads, <1000 files are opened and it doesn't crash.
While using fewer threads works, I'd love a solution that maintains the high parallelization during other steps. Can you suggest a way to run the kmtricks pipeline so that the superkmers computation step doesn't open too many files, but I get as much parallelization as possible?
Thanks for your help, and for building a valuable tool!
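Since the failure is tied to the per-process limit on open files, it can help to look at that limit directly. The sketch below uses only Python's standard resource module (Unix only) to read the soft and hard limits and to raise the soft limit up to the hard one, which an unprivileged user is allowed to do. The function names are illustrative, nothing here is part of kmtricks, and whether the raised limit is enough for a given number of partitions and threads is not established in this thread.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open files for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Raise the soft open-file limit to the hard limit; return the new soft limit."""
    soft, hard = open_file_limits()
    # Leave an unlimited hard limit alone; the kernel may reject that value.
    if hard != resource.RLIM_INFINITY and soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the current limit if the system refuses the change
    return open_file_limits()[0]


if __name__ == "__main__":
    print("before:", open_file_limits())
    print("soft limit now:", raise_soft_limit_to_hard())
```

A program started from this Python process afterwards (for example through subprocess) inherits the raised limit, which is equivalent to running ulimit -n in the launching shell.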
Hi there,
I would like to use kmtricks to run HowDeSBT, as this example suggests that there is a convenient wrapper using the newest index build.
Is the search of kmtricks equivalent to that of HowDeSBT? That is, if I use kmtricks, are the search timings and results the same as if I used the original HowDeSBT index/query?
Another question: how do I determine the Bloom filter size?
In the example, kmtricks pipeline needs this as a command-line argument, but I don't know how to choose an appropriate size for my data set.
Thanks in advance,
Svenja
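As a starting point for the size question, the textbook Bloom filter relation between the number of inserted items n, the number of bits m, the number of hash functions k and the false positive rate p is p ≈ (1 - e^(-kn/m))^k. The sketch below solves it for m. It is generic Bloom filter arithmetic, not kmtricks code: how many hash functions kmtricks uses and how its bloom size option maps onto m are assumptions to verify, and n should be an estimate of the distinct k-mers kept per sample.

```python
import math


def bloom_bits(n_items, target_fpr, n_hashes=1):
    """Bits needed so that a Bloom filter holding n_items stays under target_fpr."""
    if n_items <= 0 or n_hashes < 1 or not 0.0 < target_fpr < 1.0:
        raise ValueError("need n_items > 0, n_hashes >= 1 and 0 < target_fpr < 1")
    # Invert p = (1 - exp(-k*n/m))**k for m.
    per_hash = target_fpr ** (1.0 / n_hashes)
    return math.ceil(-n_hashes * n_items / math.log(1.0 - per_hash))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    for fpr in (0.5, 0.1, 0.01):
        print(fpr, bloom_bits(1_000_000, fpr))
```

With a single hash function the required size grows roughly as n/p for small p, so the sample with the most distinct k-mers drives the choice.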
https://github.com/tlemane/kmtricks/wiki/kmtricks-socks-interface
The -t option stands both for the threshold and for the number of threads.
Hi! Thanks for this excellent tool! When aggregating or dumping on a run-dir that's read-only, I get the following error:
terminate called after throwing an instance of 'gatb::core::system::Exception'
[2024-04-27 20:35:12.548] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
It took me a while to figure this out, and I've replicated this behavior using the supplied test data. Putting this note here in case someone else runs into this issue.
Hi,
Thanks for your amazing tool! Really helpful!
However, I ran into a problem when I ran the command below:
kmtricks.py run --file fof.txt --run-dir ./count_run --kmer-size 31 --nb-cores 8 --nb-partitions 4 --count-abundance-min 0 --recurrence-min 1 --mode ascii --lz4
The content of 'fof.txt' is shown below.
The k-mer size in the output file is still 20. Is there anything wrong with my command or the file 'fof.txt'?