tlemane / kmtricks

modular k-mer count matrix and Bloom filter construction for large read collections

License: GNU Affero General Public License v3.0

CMake 0.93% Shell 1.93% C++ 96.96% Dockerfile 0.10% C 0.08%

kmer count matrix bloom-filters

kmtricks's People

sam217pa pythseq kbseah schaudge lrobidou alixregnier

kmtricks's Issues

kmtricks crash at merge

Hello,

I installed kmtricks with conda and tried to run it on about 10000 fastq files stored on an external drive.

Here is the command line used:
kmtricks pipeline --file list_fastq_kmtricks --run-dir kmtricksDir --kmer-size 31 --hard-min 5 --mode kmer:count:bin --recurrence-min 10 -t 12

And here are the messages from kmtricks:

[2022-04-16 20:17:08.096] [info] Run with Kmer<32> - uint64_t implementation
[2022-04-16 20:17:08.320] [info] Compute configuration...
[2022-04-16 20:17:08.320] [info] 3504 samples found (10512 read files).
[2022-04-16 20:51:29.192] [info] Use 113 partitions.
[2022-04-16 20:51:29.287] [info] Compute minimizer repartition...
Compute SuperK [==================================================] [02d:11h:28m:38s]
Count partitions [==================================================] [02d:11h:28m:38s]
Merge partitions [> ] [00:00s]
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
[2022-04-19 08:34:53.972] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
what(): Unable to open /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_1/ERR4147809.kmer what(): Unable to open /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_10/ERR4147809.kmer

[2022-04-19 08:34:53.990] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-19 08:34:53.990] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

I was not able to find the ./kmtricks_backtrace.log file.
I checked the file /media/ugo/Transcend3/scRNAseq_kmer/EMTAB_9067/kmtricks/counts/partition_1/ERR4147809.kmer, and it exists.

Here is the output of kmtricks infos:
kmtricks v1.2.1

HOST -
build host: Linux-5.4.0-54-generic
run host: Linux 5.4.0-109-generic
BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295
GIT SHA1 / VERSION -
kmtricks: 5c5c0b5
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01

Thanks for your help

Ugo

Unable to use kmtricks: "illegal hardware instruction"

Hello,

Thank you for developing kmtricks. I'm following the conda instructions to install kmtricks on an HPC running CentOS 7.9.2009, but I'm currently unable to run it:

$ conda create -p kmtricks_env
$ conda activate ./kmtricks_env
$ conda install -c conda-forge -c tlemane kmtricks
$ which kmtricks
~/conda_envs/kmtricks_env/bin/kmtricks
$ conda list kmtricks
# packages in environment at /data/home/***/conda_envs/kmtricks_env:
#
# Name                    Version                   Build  Channel
kmtricks                  1.0.0                hdf3d972_0    tlemane
$ kmtricks --version
[1]    24748 illegal hardware instruction  kmtricks --version
$ kmtricks --help
[1]    24776 illegal hardware instruction  kmtricks --help

The conda install seems to work fine on macOS (Big Sur, MacBook Air M1, 2020), although it uses v0.0.6 (kmtricks.py instead of kmtricks as in v1.0.0):

$ conda activate ./kmtricks_env
$ which kmtricks.py
/Users/***/conda_envs/kmtricks_env/bin/kmtricks.py
$ kmtricks.py --version
kmtricks v0.0.6, git_sha1 : 8539f16
$ kmtricks.py --help   
usage: kmtricks.py [-v] [-d] [--version] [-h] cmd ...

kmtricks cli

Subcommands:
  cmd            env, run

Global arguments:
  -v, --verbose  Verbose mode
  -d, --debug    Debug mode
  --version      Display kmtricks version
  -h, --help     Show this message and exit

I would appreciate any help.

Killed after receive Segmentation fault:SIGSEGV(11) signal

Running:
kmtricks pipeline --file fof2 --run-dir ./kmer_pa --kmer-size 31 --mode kmer:pa:text -t 10
I get:

Killed after receive Segmentation fault:SIGSEGV(11) signal

I am running 120 samples, 420 read files (~14 GB each), on Ubuntu 22.04, kernel 6.5.0-21-generic, with kmtricks v1.4.0 installed in a conda environment. The machine has 125 GB of RAM, 50 GB of swap, and 25 CPUs. I monitored CPU and RAM with htop and did not see overuse of either. See the log file:

Backtrace:
1 0x00007fe97f642520 (null) + 140640841377056
2 0x000055a6bf40c531 (null) + 94174661625137
3 0x000055a6bf40c33d (null) + 94174661624637
4 0x000055a6bf3ee1a3 gatb::core::kmer::impl::RepartitorAlgorithm<32ul>::computeRepartition(gatb::core::kmer::impl::Repartitor&) + 563
5 0x000055a6bf3eee6a gatb::core::kmer::impl::RepartitorAlgorithm<32ul>::execute() + 138
6 0x000055a6bf290914 (null) + 94174660069652
7 0x000055a6bf291098 (null) + 94174660071576
8 0x000055a6bf2917d0 (null) + 94174660073424
9 0x000055a6bf044260 main + 3312
10 0x00007fe97f629d90 (null) + 140640841276816
11 0x00007fe97f629e40 __libc_start_main + 128
12 0x000055a6bf0465e5 (null) + 94174657668581

Please advise. Thank you

add samples to a previous run

Thanks for kmtricks; we have incorporated it into one of our lab pipelines with significant computing time improvement.

We use kmtricks to generate binary presence/absence matrices from x samples, each from 2-4 fastq files (.fq.gz). These files are large, and one goal is to remove them from storage after computing.

Our usage is fairly simple:
kmtricks pipeline --mode kmer:pa:bin
kmtricks aggregate --pa-matrix kmer --format text

My question: I want to incorporate z additional samples at a later date and recalculate everything, but without bringing back the reads for the previous x samples, i.e. adding the new samples from their fastq files to the counts from the previous run.

Is it possible?
I have looked through the wiki for ideas, but I have not found anything suggesting that this is possible or where to start.

Thanks for your help.

Clarification suggestion

kmtricks uses a non-standard definition of the canonical k-mer because it treats T<G. This is mentioned in the usage of the aggregate command with the sorted option, but it's also relevant when the output is not sorted (because it defines which k-mer is canonical). It could save future users some debugging time if this information were included in the dump help and the aggregate help (for the non-sorted option).

Just to clarify, I think you do have this information there already but it could help avoid user-error if it was featured more prominently in the help/usage.
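
To make the consequence of the T<G ordering concrete, here is a minimal Python sketch of canonical k-mer selection under a custom nucleotide order. The full order A < C < T < G is an assumption (the issue only states T<G), and the function names are illustrative, not part of kmtricks.

```python
# Assumed nucleotide order: only "T < G" is stated in the issue;
# placing A and C before T is an assumption made for this sketch.
ORDER = {"A": 0, "C": 1, "T": 2, "G": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase DNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical_custom(kmer, order=ORDER):
    """Smaller of the k-mer and its reverse complement under `order`."""
    rc = reverse_complement(kmer)
    return min(kmer, rc, key=lambda s: [order[c] for c in s])


def canonical_lexicographic(kmer):
    """Smaller of the k-mer and its reverse complement under A < C < G < T."""
    return min(kmer, reverse_complement(kmer))


# The two definitions can disagree on the same input.
print(canonical_custom("GA"), canonical_lexicographic("GA"))
```

A tool that expects lexicographic canonical k-mers would therefore need to re-canonicalize k-mers read from such output before comparing them.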

make test failed the second time

I made a change, re-ran make test to check, and it failed. Fair enough, but I had a doubt.

So I restored the test directory to a clean state, as it was when I cloned the repository, and reverted my change.
I run make test a first time -> pass
I run make test a second time (without changing anything) -> fail

Something odd is going on :-)

compilation error on EC2

Error I get when compiling kmtricks on a fresh EC2 server:

In file included from /mnt/1/kmtricks/include/kmtricks/howde_utils.hpp:32,
                 from /mnt/1/kmtricks/include/kmtricks/task.hpp:36,
                 from /mnt/1/kmtricks/include/kmtricks/cmd.hpp:37,
                 from /mnt/1/kmtricks/src/kmtricks.cpp:24:
/mnt/1/kmtricks/thirdparty/cfrcat/include/cfrcat/cfrcat.hpp: In function ‘uint64_t cfr::concat(int, int)’:
/mnt/1/kmtricks/thirdparty/cfrcat/include/cfrcat/cfrcat.hpp:124:10: error: ‘copy_file_range’ was not declared in this scope
   return copy_file_range(in_fd, NULL, out_fd, &offset, size, 0);

system:

$ uname -ar
Linux ip-xxx.us-west-2.compute.internal 5.10.130-118.517.amzn2.x86_64 #1 SMP Wed Jul 13 16:51:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
$ cmake --version
cmake3 version 3.13.1
$ /usr/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.26, by Roland McGrath et al.

hardware_concurrency returns number of CPUs instead of available CPUs on Linux

By default, kmdiff uses std::thread::hardware_concurrency() in cli.cpp to get the number of available threads.

But, because there's a but ;-)

hardware_concurrency returns, when possible, the underlying hardware's capacity to run threads, which might not correspond to the number of cores actually available to the process (because of taskset, a batch system like Slurm, etc.). The consequence is that kmdiff might run suboptimally.

For example, I have a user who submitted a kmc job on a 96-core HPC node in a single-core Slurm allocation: more than 100 threads are now fighting over that one core.

I would suggest switching to sched_getaffinity to get the default thread count.

Something like this:

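#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes CPU_COUNT in C; g++ defines it by default
#endif
#include <sched.h>

// Number of CPUs the calling process is allowed to run on.
int getCPUs()
{
  cpu_set_t cpu_set;
  CPU_ZERO(&cpu_set);
  if (sched_getaffinity(0, sizeof(cpu_set), &cpu_set) != 0)
    return 1;  // on error, fall back to a single thread
  return CPU_COUNT(&cpu_set);
}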

regards
Eric
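
The difference between the hardware CPU count and the CPUs actually available to a process can be checked quickly from Python, which exposes the same affinity call on Linux. This is a small sketch for inspecting an allocation, not kmdiff code; the function name is illustrative.

```python
import os


def available_cpus():
    """CPUs the current process may run on (respects affinity masks)."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # portable fallback


print("hardware CPUs :", os.cpu_count())
print("available CPUs:", available_cpus())
```

Running it under `taskset -c 0 python3 script.py` should report a single available CPU even on a many-core node.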

False positive kmers

Hi, thanks for making this great tool! I'm trying to generate kmer presence/absence matrices with these commands:
kmtricks pipeline --file list_five --run-dir output_5 --cpr --mode kmer:pa:bin
kmtricks aggregate --run-dir output_5 --pa-matrix kmer --format text --cpr-in --sorted > output.txt

I've noticed that, in the output, kmers are being reported as present in an assembly but when I grep for that kmer it isn't in the fasta. Are there any settings I can use to prevent this happening? I've tried --hard-min 1 which didn't help.

Thanks!
Jenny
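
One possible explanation, not confirmed in this thread, is that a plain grep misses k-mers that are reported in canonical form (so only the reverse complement occurs in the assembly) or that span a line break in a wrapped FASTA file. The Python sketch below searches both strands after joining wrapped sequence lines; all function names are illustrative.

```python
import io

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs with wrapped sequence lines joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def kmer_hits(handle, kmer):
    """Headers of records that contain the k-mer on either strand."""
    fwd = kmer.upper()
    rev = reverse_complement(fwd)
    hits = []
    for header, seq in read_fasta(handle):
        seq = seq.upper()
        if fwd in seq or rev in seq:
            hits.append(header)
    return hits


# Tiny in-memory example: the sequence is wrapped over two lines.
example = io.StringIO(">s1\nACGTTG\nCATT\n")
print(kmer_hits(example, "AATG"))
```

With a real file, pass an open handle instead of the in-memory example, e.g. `kmer_hits(open("assembly.fasta"), kmer)` (the file name is hypothetical).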

kmtricks uses all RAM

hello

I launched this kmtricks command for a file of files with ~ 37000 fastq files

kmtricks pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20

It runs nicely but at some point uses all the available RAM (I have 128 GB) and the script stops:

./analysis_10x_MDAMB468_kmtricks.sh : ligne 30 : 21185 Processus arrêté "$kmtricks" pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20

Is there a way to limit the RAM usage ?

thanks a lot

Specify python version in Conda recipe

I installed kmtricks v0.0.2 with conda. Running kmtricks.py with python < 3.6 results in the following error message:

  File "/ebio/abt2_projects/ag-swart-loxodes/envs/kmtricks/bin/kmtricks.py", line 109
    self.global_parser: argparse.ArgumentParser = None
                      ^
SyntaxError: invalid syntax

It appears that the version check in lines 40-42 of kmtricks.py isn't used because it's not enclosed in the main code block.

Perhaps the python version could be included in the Conda recipe? I've gotten it to work now and look forward to trying out the pipeline.
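
A general Python detail may be relevant here: a SyntaxError is raised when a file is compiled, before any statement in it runs, so a version check that lives in the same file as the annotated assignment on line 109 may never get a chance to execute. This is offered as a possible explanation, not something confirmed in this thread. A minimal sketch of a guard kept in a separate launcher that uses only old syntax (the module name `kmtricks_cli` is hypothetical):

```python
import sys

MIN_PYTHON = (3, 6)  # variable annotations (PEP 526) require Python 3.6+


def python_is_supported(current=None, minimum=MIN_PYTHON):
    """True if the (major, minor) version is at least `minimum`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) >= tuple(minimum)


def main():
    if not python_is_supported():
        sys.exit("kmtricks.py requires Python %d.%d or newer" % MIN_PYTHON)
    # Only now import the module that uses newer syntax (hypothetical name):
    # import kmtricks_cli
    # kmtricks_cli.main()


if __name__ == "__main__":
    main()
```

Pinning the interpreter version in the Conda recipe, as suggested above, would address the same problem at install time.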

Is line 85 in km_reads_to_superk.cpp, pInfo.saveInfoFile(name), useful?

Hi Téo,

Pierre passed me the link to kmtricks so that I could take a look at what you have done.
To try it out, I profiled the executable (on a single file, so not very meaningful for now).

I saw that 86% of the time was spent in the function pInfo.saveInfoFile(name), which in the end (if I understood correctly) is a human-readable dump of what is in the binary version. I suppose it is a dev thing ;-)

All the best,

Guillaume.

PS: I can open a pull request if you like; I made the change to try it out

error while loading shared libraries: liblz4.so.1

Hello,

I installed kmtricks with conda and got this error when I tried to run it:
kmtricks: error while loading shared libraries: liblz4.so.1: cannot open shared object file: No such file or directory
Here is my system information:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Thanks for your help!

Killed after receive Segmentation fault:SIGSEGV(11) signal

terminate called recursively / after throwing an instance of 'std::runtime_error'

Hello,
I'm struggling with the following problem:

I'm running kmtricks (installed through conda) in order to index 1051 genomes. This is the command line that produced the error:

kmtricks pipeline --file ./genomes.fof --run-dir ./index --kmer-size 31 --mode hash:bft:bin --hard-min 2 --soft-min 3 --share-min 1 --bloom-size 10000 --bf-format howdesbt --cpr

And this is the error message (the backtrace log file is empty):

[2022-04-28 12:54:22.153] [info] Run with Kmer<32> - uint64_t implementation
[2022-04-28 12:54:22.295] [info] Compute configuration...
[2022-04-28 12:54:22.295] [info] 1051 samples found (1051 read files).
[2022-04-28 12:55:19.370] [info] Use 4 partitions.
[2022-04-28 12:55:19.459] [info] Compute minimizer repartition...
Compute SuperK   [==================================================] [01m:40s]
Count partitions [==================================================] [01m:41s]
Merge partitions [>                                                 ] [00:00s]
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
terminate called recursively
  what():  Unable to open ./index/counts/partition_3/G268.hash.p4
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.228] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log
terminate called recursively
[2022-04-28 13:06:08.229] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

I'm not sure if this is the same problem reported in issue #15, this is why I'm opening this new issue.
In case this is the same kind of problem, is there something else I could try to overcome this issue without increasing the maximum number of open files (ulimit)?

In #15 @tlemane suggested to also reduce the number of threads, but I didn't specify the -t argument, so this is not really useful in my case.

Thanks in advance for your help
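
Before choosing a workaround, it can help to see what the open-file limit actually is for the process that launches kmtricks. The sketch below uses Python's standard `resource` module to read the soft and hard limits and, if desired, to raise the soft limit without exceeding the hard one (which needs no special privileges). It is a generic Unix sketch, not a kmtricks feature, and the function names are illustrative.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, never above the hard limit."""
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return nofile_limits()


print("open-file limits (soft, hard):", nofile_limits())
```

Limits are inherited by child processes, so a command started from the same Python process (for example with `subprocess.run`) would see the raised soft limit.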

crash at dump

kmtricks dump seems to crash with the following error:

terminate called after throwing an instance of 'km::IOError'
  what():  std::exception
[2023-03-31 07:43:20.178] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

kmtricks infos:

kmtricks v1.3.0

- HOST -
build host: Linux-6.1.3
run host: Linux 4.15.0-197-generic

- BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295

- GIT SHA1 / VERSION -
kmtricks: 92d7894
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01

Contact: [email protected]

Backtrace:
1 0x00007f549d83cf10 (null) + 140001396641552
2 0x00007f549d83ce87 gsignal + 199
3 0x00007f549d83e7f1 abort + 321
4 0x00007f549e4ae036 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f549e4ac524 (null) + 140001409680676
6 0x00007f549e4ac576 (null) + 140001409680758
7 0x00007f549e4ac768 __cxa_rethrow + 0
8 0x00000000005411b6 void km::check_fstream_good<std::basic_ifstream<char, std::char_traits<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_ifstream<char, std::char_traits<char> > const&) + 198
9 0x00000000005a8d99 (null) + 5934489
10 0x00000000005aa5cf (null) + 5940687
11 0x00000000005ace4b (null) + 5951051
12 0x00000000004c9132 main + 914
13 0x00007f549d81fc87 __libc_start_main + 231
14 0x00000000004cbdc4 (null) + 5029316

Big kmtricks index obtained

Dear authors, we have indexed 94 RNA-seq files (total fastq.gz: 201 GB) and we obtained a 441 GB kmtricks index.
This looks big compared to your supplementary tables.
A merged Jellyfish index made with DEkupl's joincount for the same dataset was only 23 GB.
We are wondering whether we are doing something wrong. I'm attaching my code below.
Thanks !
kmtricks.txt
whole_matrix.txt
fof_mondor.txt

Clarification needed for kmer matrix columns

Hello, I needed a matrix of the k-mer counts for several samples. I followed the instructions given in the documentation example, but I don't see the correspondence between samples and columns specified anywhere, neither in the previous link nor in the documentation for the aggregate module. I would guess it follows the order given in the file of files? I hope I didn't miss anything.
Thank you !
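
If the poster's guess holds (columns follow the order of the file of files, which is not confirmed in this thread), the column index of each sample can be recovered by parsing that file. The sketch below assumes the `id : path ; path ! min` line format seen in other issues on this page and treats lines starting with `#` as comments; both are assumptions, and the function names are illustrative.

```python
def parse_fof(lines):
    """Parse fof lines into (sample_id, [paths], min_abundance or None)."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' marks a comment
            continue
        sample_id, _, rest = line.partition(":")
        paths_part, _, min_part = rest.partition("!")
        paths = [p.strip() for p in paths_part.split(";") if p.strip()]
        min_abundance = int(min_part) if min_part.strip() else None
        samples.append((sample_id.strip(), paths, min_abundance))
    return samples


def column_index(samples):
    """Map each sample ID to its 0-based position in the fof."""
    return {sample_id: i for i, (sample_id, _, _) in enumerate(samples)}


fof = [
    "sample1 : sample1.fna ! 1",
    "sample2 : sample2_1.fna ; sample2_2.fna ! 1",
]
print(column_index(parse_fof(fof)))
```

In a text matrix whose first field is the k-mer, the count for a sample would then sit at field `column_index[sample] + 1`, again under the assumption that column order matches the fof.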

Hash values to kmer conversion

Hello
Since generating a hash PA matrix is expected to take less disk space, is it possible to generate a hash:pa matrix and then convert it, as a last step, to a kmer:pa matrix?
Thank you

kmtricks fails when only one path by sample is provided

Hi,

I was tempted into trying kmtricks after the nice talk of @pierrepeterlongo at DSB2021.

My use case is the following. I'm trying to find patterns of shared k-mers in a somewhat large genomic project: 204 GB of uncompressed fasta files corresponding to ~480 scaffolded assemblies of butterflies, wasps and flies, roughly 199 Gbp in total. I don't know the unique k-mer count yet, but it should probably be less than the total TARA ocean project, so I guess kmtricks can do the job ;)

I have only one file per specimen, so my file of files looks like:

# fof.txt
sample1 : sample1.fna ! 1
sample2 : sample2.fna ! 1

This triggers the following error:

Traceback (most recent call last):
  File "kmtricks.py", line 1080, in <module>
    main()
  File "kmtricks.py", line 1072, in main
    pool.exec()
  File "kmtricks.py", line 866, in exec
    self.run_ready()
  File "kmtricks.py", line 929, in run_ready
    cmd.run()
  File "kmtricks.py", line 353, in run
    self.preprocess()
  File "kmtricks.py", line 586, in preprocess
    raise FileExistsError(f'{repart_file} doesn\'t exists.')
FileExistsError: kmdir/storage/partition_storage_gatb/minimRepart.minimRepart doesn't exists.

(I tried with both conda installed kmtricks and compiled from source.)

It works if I trick it into parsing twice the same file:

# fof.txt
sample1 : sample1.fna ; sample1.fna ! 1
sample2 : sample2.fna ; sample2.fna ! 1

The command I ran was taken from your benchmarks here:

set -euo pipefail
rm -rf kmdir

kmtricks.py --verbose --debug run \
           --file fof.txt \
           --run-dir kmdir \
           --kmer-size 20 \
           --nb-cores 8 \
           --nb-partitions 1 \
           --count-abundance-min 1 \
           --recurrence-min 1 \
           --mode bf_trp \
           --hasher sabuhash \
           --max-hash 1000000 \
           --split howde \
           --lz4 \
           --max-count 256 \
           --max-memory 8000 \
           --log-files repart,superk,count,merge,split

Thanks for kmtricks anyway, it looks promising!

Possibility to output raw counts ?

Hello,

Thanks for developing this nice tool!
I found that the output counts are scaled to 0-255 by default, and I was wondering whether there is a way to output the raw counts without scaling?

Thanks and best wishes.

documentation, examples problems

Hello

After compiling kmindex from source, I tried to run the example scripts.
1_build.sh -> OK
2_register.sh -> FAILURE, see:

rpm_maker:examples/data > sh 2_register.sh
[2023-10-11 13:45:08.390] [error] [InvalidParamError] -> Unknown param: --index.
[2023-10-11 13:45:08.394] [error] [InvalidParamError] -> Unknown param: --index.

And when I try the example from the documentation, I also hit a problem:

rpm_maker:examples/data > kmindex build --fof fof1.txt --run-dir D1_index --index ./G --register-as D1 --hard-min --kmer-size 25 --bloom-size 1000000
[2023-10-11 13:49:11.338] [error] [MissingValueError] -> --hard-minneeds a value.

Can you provide runnable instructions, please?

regards

Eric

Does kmtricks consider read orientation ?

Hello,

I have a question about kmtricks on paired-end RNA-seq data: if the two fastq files are in "reverse-forward" mode, i.e. the first fastq contains reverse reads and the second fastq contains forward reads, does kmtricks treat them specially (for example by first reverse-complementing the first fastq before counting k-mers)?

Thanks and best wishes !

Terminate called after throwing an instance of 'std::runtime_error'; Unable to open superkmers/xjin_AB_P0R2c/skp.90

I'm getting the following error when I run kmtricks with 276 samples:

$ kmtricks pipeline --kmer-size 111 --hard-min 0 --share-min 1 --soft-min 2 --recurrence-min 3 --file data/group/xjin/r.proc.kmtricks_input.txt --run-dir <DIR> --mode kmer:count:text --threads 24
[2024-03-22 09:26:07.016] [info] Run with Kmer<128> - uint64_t[4] implementation
[2024-03-22 09:26:07.069] [info] Compute configuration...
[2024-03-22 09:26:07.069] [info] 276 samples found (552 read files).
[2024-03-22 09:26:47.828] [info] Use 156 partitions.
[2024-03-22 09:26:48.117] [info] Compute minimizer repartition...
Compute SuperK   [>                                                 ] [00m:00s]                                    
Count partitions [>                                                 ] [00:00s]                                     
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to open <DIR>/superkmers/xjin_AB_P0R2c/skp.90
terminate called recursively
[2024-03-22 09:30:47.682] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

./kmtricks_backtrace.log:

Backtrace:
1 0x00007f8d88e41090 (null) + 140245863764112
2 0x00007f8d88e4100b gsignal + 203
3 0x00007f8d88e20859 abort + 299
4 0x00007f8d89213f00 __gnu_cxx::__verbose_terminate_handler() + 192
5 0x00007f8d8921243c (null) + 140245867766844
6 0x00007f8d8921248e (null) + 140245867766926
7 0x000055a817e6c705 (null) + 94180443866885
8 0x00007f8d88e448a7 (null) + 140245863778471
9 0x00007f8d88e44a60 on_exit + 0
10 0x000055a817f2e98a (null) + 94180444662154
11 0x00007f8d88e41090 (null) + 140245863764112
12 0x00007f8d88edb23f clock_nanosleep + 223
13 0x00007f8d88ee0ec7 nanosleep + 23
14 0x000055a817f3f95b (null) + 94180444731739
15 0x000055a8180b14d0 (null) + 94180446246096
16 0x000055a8180b2165 (null) + 94180446249317
17 0x000055a8180b9443 (null) + 94180446278723
18 0x000055a817e52afb main + 1419
19 0x00007f8d88e22083 __libc_start_main + 243
20 0x000055a817e555e5 (null) + 94180443772389

infos:

$ kmtricks infos
kmtricks v1.4.0

- HOST -
build host: Linux-6.1.3
run host: Linux 4.18.0-513.11.1.el8_9.x86_64

- BUILD -
c compiler: GNU 11.2.0
cxx compiler: GNU 11.2.0
conda: ON
static: OFF
native: OFF
modules: ON
socks: ON
howde: ON
dev: OFF
kmer: 32,64,96,128,160,192,224,256
max_c: 4294967295

- GIT SHA1 / VERSION -
kmtricks: 7dc4d18
sdsl: c32874c
bcli: 3e4f493
fmt: 0544a227
kff: 97d135e
lz4: 4de56b3
spdlog: v1.2.1-1811-g5b4c4f3f
xxhash: 6853ddc
gtest: release-1.8.0-2774-g96f4ce02
croaring: v0.3.3-17-g2d5c927
robin-hood-hasing: 24b3f50
turbop: 4ab9f5b
cfrcat: 2f9da97
indicators: v1.9-36-gcdcff01

Contact: [email protected]

I don't get the same error when I use a subset of 8 samples (instead of 276), nor when I use just 12 threads (instead of 24).
This seems pretty clearly related to Issue #15 , where kmtricks is opening too many files.
Indeed, lsof confirms this, with the crash occurring just as the number of open files ramps up. When I use 12 threads, <1000 files are opened and it doesn't crash.

While using fewer threads works, I'd love a solution that maintains the high parallelization during other steps. Can you suggest a way to run the kmtricks pipeline so that the superkmers computation step doesn't open too many files, but I get as much parallelization as possible?

Thanks for your help, and for building a valuable tool!
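
The lsof observation above can be reproduced with a few lines of Python on Linux by counting the entries in `/proc/<pid>/fd`. This is only a monitoring sketch under the assumption of a Linux `/proc` filesystem; the function name is illustrative, and reading another process's entries requires running as the same user.

```python
import os


def count_open_fds(pid="self"):
    """Count file descriptors currently open in a process (Linux /proc only)."""
    return len(os.listdir("/proc/{}/fd".format(pid)))


if os.path.isdir("/proc/self/fd"):
    print("descriptors open in this process:", count_open_fds())
```

Calling `count_open_fds(pid)` in a loop with the PID of a running job, and comparing the result with the soft limit reported by `ulimit -n`, shows how close a given thread count brings the run to the limit.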

Questions to kmtricks vs HowDeSBT

Hi there,

I would like to use kmtricks to build a HowDeSBT index, as this example suggests that there is a convenient wrapper using the newest index build.
Are searches with kmtricks and with HowDeSBT equivalent? That is, if I use kmtricks, are the search timings and results the same as with the original HowDeSBT index/query?

Another question: how do I determine the Bloom filter size?
In the example, kmtricks pipeline needs this as a command-line argument, but I don't know how to choose an appropriate size for my data set.

Thanks in advance,
Svenja
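
As a starting point for the sizing question, the textbook Bloom filter formulas relate the number of inserted items n, the number of bits m, and the false-positive rate. The sketch below covers the single-hash case, where the rate is approximately 1 - exp(-n/m), and the case with an optimal number of hash functions. It is generic Bloom filter arithmetic, not the sizing procedure of kmtricks or HowDeSBT; that the size is expressed in bits and that n is the distinct k-mer count of the largest sample are assumptions.

```python
import math


def bloom_bits_single_hash(n_items, fpr):
    """Bits needed with one hash function, from fpr ~ 1 - exp(-n/m)."""
    return math.ceil(-n_items / math.log(1.0 - fpr))


def bloom_bits_optimal(n_items, fpr):
    """Bits needed when the number of hash functions is chosen optimally."""
    return math.ceil(-n_items * math.log(fpr) / (math.log(2) ** 2))


def expected_fpr(n_items, n_bits, n_hashes=1):
    """Approximate false-positive rate of a filled Bloom filter."""
    return (1.0 - math.exp(-n_hashes * n_items / n_bits)) ** n_hashes


n = 1_000_000  # hypothetical number of distinct k-mers in the largest sample
m = bloom_bits_single_hash(n, 0.05)
print(m, round(expected_fpr(n, m), 4))
```

In practice n has to be estimated first (for instance with a k-mer counting or cardinality estimation tool), and the acceptable false-positive rate depends on the downstream query.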

socks -t option

https://github.com/tlemane/kmtricks/wiki/kmtricks-socks-interface

The -t option stands for both the threshold and the number of threads.

Crashes when run-dir is not writeable

Hi! Thanks for this excellent tool! When aggregating or dumping on a run-dir that's read-only, I get the following error:

terminate called after throwing an instance of 'gatb::core::system::Exception'
[2024-04-27 20:35:12.548] [error] Killed after receive Aborted:SIGABRT(6) signal. Demangled backtrace dumped at ./kmtricks_backtrace.log. If the problem persists, please open an issue with the return of 'kmtricks infos' and the content of ./kmtricks_backtrace.log

It took me a while to figure this out, and I've replicated this behavior using the supplied test data. Putting this note here in case someone else runs into this issue.
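
Until the error message is more explicit, a wrapper script can test the run directory up front. The Python sketch below checks that a directory exists and is writable by the current user; it is a generic pre-flight check with an illustrative function name, not part of kmtricks.

```python
import os
import tempfile


def is_writable_dir(path):
    """True if `path` is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


# Self-contained demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as run_dir:
    print(run_dir, "writable:", is_writable_dir(run_dir))
```

In a pipeline, the same check on the run directory before calling aggregate or dump gives a clear message instead of an abort.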

"--kmer-size" not work

Hi,
Thanks for your amazing tool! Really helpful!
However, I ran into a problem when I ran the command below:
kmtricks.py run --file fof.txt --run-dir ./count_run --kmer-size 31 --nb-cores 8 --nb-partitions 4 --count-abundance-min 0 --recurrence-min 1 --mode ascii --lz4
The content of 'fof.txt' is shown below.

The k-mer size in the output file is still 20. Is there anything wrong with my command or the file 'fof.txt'?
