
fastk's People

Contributors

thegenemyers

fastk's Issues

Non-deterministic segfault and miscellaneous issues

Hello,

Thanks for FastK, it is truly useful.

I'm looking at the kmer coverage of the contigs of a small assembly. To get a kmer coverage histogram for each individual contig, I need to create one fasta file per contig and run FastK on each of them, which is quite involved.
I parallelise trivially over all files. My pipeline stops at random because of a segfault on seemingly random contigs. Running the offending step by itself, outside the pipeline, does not reproduce the issue. Restarting the pipeline from scratch makes it segfault again, but not on the same files. My intuition is that there might be an issue with multiple FastK instances writing to the same temporary folder?
Here is an example of error message:
/bin/bash: line 1: 193787 Segmentation fault (core dumped) FastK -t1 -T2 seqs/folder43/contig_12.fa -P'seqs/folder43'
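
If the shared temporary folder is indeed the cause (this is only the reporter's hypothesis, not a confirmed diagnosis), one workaround would be to give every job its own scratch directory. Below is a minimal sketch using POSIX mkdtemp; the function name make_private_tmp and the "fastk_job_" prefix are made up for illustration and are not part of FastK.

```c
#define _XOPEN_SOURCE 700   /* expose mkdtemp() under strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Create a private scratch directory below `base` and write its path to `out`.
   Returns 0 on success, -1 on failure (path too long or mkdtemp failed).
   Hypothetical helper for illustration only; not part of FastK. */
int make_private_tmp(const char *base, char *out, size_t outlen)
{ int n = snprintf(out, outlen, "%s/fastk_job_XXXXXX", base);

  if (n < 0 || (size_t) n >= outlen)
    return -1;
  if (mkdtemp(out) == NULL)      /* replaces XXXXXX with a unique suffix */
    return -1;
  return 0;
}
```

The resulting path could then be handed to each FastK instance through the -P option seen in the command line above, so that no two instances share a folder.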

I am having a similar problem with random segfaults, this time with Logex:
Logex -H 'result=A&.B' sample.ktab contigs.ktab
In all the examples I've looked at, the segfault happened when contigs.ktab was empty (a well-formed table with 0 kmers). However, running the same command line outside the pipeline works without issue and does produce a working .hist file (albeit with no kmers).
This second issue is less problematic, since I can just pre-filter for empty contigs.ktab files.

Additionally here are a few miscellaneous issues I encountered. I'm mostly puzzled by the first one:

  • When looking at kmers in small individual genes/contigs, for sequences of 100+ bp and k=40, I get the following error message:
    "FastK: Too much of the data is in reads less than the k-mer size".
    If I append "NN" at the end of the sequence, I obtain the expected results without failure. Some other, smaller sequences do not show the issue. I have attached an example.
  • Fastmerge doesn't handle an empty ktab; it segfaults.
  • Gzipped fastq files (.fastq.gz) are unzipped into the working directory but not removed afterwards.

If it's useful, I'm working on Ubuntu 16.04.7 LTS with gcc version 9.4.0.

Best,
Seb

Understanding how kmers are counted

I want to understand how kmers are counted in FastK and how that affects the totals in merquryFK calculations.

Why do the Total values in the completeness.stats and qv files differ so much? What do they represent, and how do they relate to each other? I ran MerquryFK on a single genome assembled from PacBio HiFi and Hi-C data, against an Illumina kmer dataset.

# mMelMel1_T1.qv 
Assembly	No Support	Total	Error %	QV
GCA_922984935.2.subset	6005	7999890	0.0024	46.2

# mMelMel1_T1.completeness.stats 
Assembly	Region	Found	Total	% Covered
GCA_922984935.2.subset	all	2268391877	2268397787	100.00

From Merqury, marbl/merqury#84

The Total in QV counts the kmers that are 'present' in the assembly. So if one specific kmer is found 3 times in the assembly, but never in the reads, it is counted as 3 error kmers (no support). The 3 error kmers are part of the Total.

The Total in completeness are distinct solid kmers in the reads. In other words, a kmer that is present over a certain frequency in the reads is counted as one kmer. I forgot how exactly the Total is computed in MerquryFK completeness. It's likely that it is only filtering out kmers with frequency of 1, which is the default in FastK? Might be a good question for Gene.

We expected the opposite, because the Total for QV is ~8M whereas the Total for completeness is ~2.2B.
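
To make the two definitions quoted from the Merqury thread concrete, here is a toy sketch. It assumes k=3, ignores canonical (reverse-complement) kmers, and uses made-up function names; it illustrates the description above and is not MerquryFK's actual implementation, whose exact thresholds are left open in the quote.

```c
#include <string.h>

#define K 3        /* toy kmer length */
#define NKMER 64   /* 4^K possible kmers */

/* Map a base to 2 bits, or -1 for anything other than A, C, G, T. */
int base_code(char c)
{ switch (c)
    { case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      case 'T': return 3;
      default:  return -1;
    }
}

/* Add every K-mer occurrence in seq to counts[] (no canonicalisation). */
void count_kmers(const char *seq, long counts[NKMER])
{ size_t n = strlen(seq);

  for (size_t i = 0; i + K <= n; i++)
    { int x = 0, ok = 1;
      for (int j = 0; j < K; j++)
        { int c = base_code(seq[i+j]);
          if (c < 0) { ok = 0; break; }
          x = (x << 2) | c;
        }
      if (ok)
        counts[x] += 1;
    }
}

/* QV-style totals: every occurrence in the assembly counts, so a kmer seen
   3 times in the assembly and never in the reads adds 3 to both numbers. */
void qv_totals(const long asm_cnt[NKMER], const long read_cnt[NKMER],
               long *total, long *no_support)
{ *total = 0;
  *no_support = 0;
  for (int x = 0; x < NKMER; x++)
    { *total += asm_cnt[x];
      if (read_cnt[x] == 0)
        *no_support += asm_cnt[x];
    }
}

/* Completeness-style totals: each distinct read kmer with count >= min_count
   (min_count >= 1) counts once; found = those also present in the assembly. */
void completeness_totals(const long asm_cnt[NKMER], const long read_cnt[NKMER],
                         long min_count, long *total, long *found)
{ *total = 0;
  *found = 0;
  for (int x = 0; x < NKMER; x++)
    if (read_cnt[x] >= min_count)
      { *total += 1;
        if (asm_cnt[x] > 0)
          *found += 1;
      }
}
```

For the assembly ACGTACGG and the read ACGTACGT, the QV-style Total is 6 occurrences with 1 unsupported (CGG), while the completeness-style Total with min_count 2 is 2 distinct kmers (ACG and CGT), both found. The two Totals are computed over different sets, one per occurrence in the assembly and one per distinct kmer in the reads, so they need not be close.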

Won't open a .fa file

Synopsis: I think the suffix[] table at line 138 in io.c has a typo in the last item in the second row.

Details:

Looking to use this package to quickly identify half-covered scaffolds (or scaffold segments) in an assembly, having seen the two presentations in VGP conference calls.

Working with a mar/31 clone of the repo.

For first test I gave it a small random apple.fa file. When I tried 'FastK apple.fa', it reported "FastK: Cannot open apple.fa as a .cram|[bs]am|f{ast}[aq][.gz]|db|dam file."

That report comes from Fetch_File() in io.c. Examining it, it's not completely clear to me what the purpose of the suffix[] and extend[] tables is (maybe they forgive the user for giving an incomplete file extension?). But the loop that tries the different extensions in order records the first success in i, and then assigns ftype as FASTQ or FASTA based on whether i is odd or even (and >=5), so the intent appears to be that the table should alternate between fastq and fasta extensions. Yet at the end of the second line of suffix[] we see ".fq" twice. I infer the second of those should be ".fa", and when I make that change FastK is able to open apple.fa without complaint.
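
The parity scheme described above can be sketched as follows. The table contents, the choice of even index for fastq, and the name classify are all assumptions made for illustration; the real suffix[] table in io.c is longer and is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

enum ftype { UNKNOWN, FASTA, FASTQ };

/* Hypothetical table: fastq and fasta suffixes alternate, so the parity of
   the matching index alone decides the file type. */
static const char *suffix[] = { ".fastq", ".fasta", ".fq", ".fa" };

/* Return the type implied by the first suffix in the table that matches. */
enum ftype classify(const char *name)
{ size_t n = strlen(name);

  for (size_t i = 0; i < sizeof(suffix)/sizeof(suffix[0]); i++)
    { size_t m = strlen(suffix[i]);
      if (n >= m && strcmp(name + n - m, suffix[i]) == 0)
        return (i % 2 == 0) ? FASTQ : FASTA;   /* even = fastq (assumed) */
    }
  return UNKNOWN;
}
```

With this layout, replacing the final ".fa" entry by a second ".fq" would leave apple.fa unmatched, which is consistent with the "Cannot open" report above.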

Created FastK Docker container

As I couldn't find a Docker container for FASTK, I created one myself. Feel free to take the file if you like.

https://gist.github.com/junaruga/dd71378add9fb8e840dad65bb050bf0d
https://quay.io/repository/junaruga/garg-fastk?tab=info

Note that we are using the container at https://github.com/GargGroup/BioDivGenomics/tree/main/FASTK as a temporary workaround. Once you prepare an official Docker container, we would like to use that instead.

As you may know, if you create a bioconda package, the Docker container is automatically created at https://biocontainers.pro .

E.g.
https://anaconda.org/bioconda/hifiasm
https://biocontainers.pro/tools/hifiasm
https://quay.io/repository/biocontainers/hifiasm?tab=info

Logex accept modimizer

Hi Gene,

I wonder if it would be possible for Logex to accept a modimizer for the output? Like say Logex 'reduced=A%51==0' source<.ktab>? I would like to get modimized homozygous kmers to use alongside het kmer pairs.
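
For readers unfamiliar with the term, a modimizer keeps a kmer when a hash of it is divisible by a chosen modulus (51 in the request above). The sketch below is generic and makes several assumptions: the 2-bit packing, the splitmix64-style mixing function, and the names pack_kmer, mix64 and is_modimizer are illustrative, and nothing here reflects how Logex would or does implement such an operator.

```c
#include <stdint.h>

/* Pack a kmer (1 <= k <= 32) into 2 bits per base.
   Returns 0 on success, -1 on a bad k or a non-ACGT character. */
int pack_kmer(const char *s, int k, uint64_t *out)
{ uint64_t x = 0;

  if (k < 1 || k > 32)
    return -1;
  for (int i = 0; i < k; i++)
    { uint64_t c;
      switch (s[i])
        { case 'A': c = 0; break;
          case 'C': c = 1; break;
          case 'G': c = 2; break;
          case 'T': c = 3; break;
          default:  return -1;   /* also catches strings shorter than k */
        }
      x = (x << 2) | c;
    }
  *out = x;
  return 0;
}

/* splitmix64 finalizer, used here only as an example mixing function. */
uint64_t mix64(uint64_t x)
{ x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

/* 1 if the kmer is selected by modulus m, 0 otherwise (or on invalid input). */
int is_modimizer(const char *s, int k, uint64_t m)
{ uint64_t x;

  if (m == 0 || pack_kmer(s, k, &x) != 0)
    return 0;
  return mix64(x) % m == 0;
}
```

A modulus of 51 would keep roughly one kmer in 51, and the same kmer is always selected or rejected regardless of which table it appears in, which is what makes the reduced set usable across samples.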

Best,
Haynes

haplex output ktab

Hi Gene,

Just digging into this. Looks great so far. Can you make Haplex output a ktab file as well as a text file? I'm wanting to get a set of het kmer pairs and then run those with some modimized homozygous kmers run through the profiler so that I can better categorize het kmer pairs. Or if there is a way to go from the text output to a .ktab that would be good as well.

Thanks,
Haynes

Logex and Fastmerge bugs

Hello Gene,

Logex and Fastmerge assume the ibyte field is always 3. I ran into a very small genome (~40 Mb), which gave me 2. The histograms from Logex and Fastmerge are correct. The kmer tables, however, are junk.

(1) Lines 1026 and 1044 should be something like bst+ibyte here instead of bst+3?

FASTK/Logex.c

Line 1026 in d694359

{ fwrite(bst+3,hbyte,1,out[i]);

FASTK/Logex.c

Line 1044 in d694359

{ fwrite(bst+3,hbyte,1,out[i]);

(2) Lines 1028 and 1050 should be something like this?

x = 0;
if (ibyte==3)
    x = ((int) bst[0] << 16) | (bst[1] << 8) | bst[2];
else if(ibyte==2)
    x = ((int) bst[0] << 8) | bst[1];
else if(ibyte==1)
    x = bst[0];

FASTK/Logex.c

Line 1028 in d694359

x = (bst[0] << 16) | (bst[1] << 8) | bst[2];

FASTK/Logex.c

Line 1050 in d694359

x = (bst[0] << 16) | (bst[1] << 8) | bst[2];
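
The per-width branches suggested in (2) can also be written as a single loop that works for any ibyte. This is a sketch of that idea, assuming the prefix is stored most significant byte first, as the 3-byte expression above implies; the name decode_prefix is hypothetical and not taken from Logex.c.

```c
#include <stdint.h>

/* Decode the first `ibyte` bytes at `p` as an unsigned big-endian integer.
   For ibyte == 3 this equals (p[0] << 16) | (p[1] << 8) | p[2]. */
uint32_t decode_prefix(const uint8_t *p, int ibyte)
{ uint32_t x = 0;

  for (int i = 0; i < ibyte; i++)
    x = (x << 8) | p[i];
  return x;
}
```

A call such as decode_prefix(bst, ibyte) would then replace the hard-coded expression, with the following write starting at bst+ibyte as proposed in (1).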

(3) You write the minval as 1. Do you think it is worth updating it after Logex?

FASTK/Logex.c

Line 1391 in d694359

fwrite(&one,sizeof(int),1,f);

(4) Another minor point: do you think it makes sense to grant group read permission on the files generated by Logex?

Fastmerge is pretty much the same as Logex.

Best,
Chenxi

Profiling HiFi readset crashes

Hi @thegenemyers,

Thanks for developing this tool, I like it a lot.

I wanted to profile a ~20 Gb gzipped fastq file using:

FastK -v -p -N../Test/FOO -P./ My_file.fastq.gz

It goes through the first three stages successfully, but then in Phase 4 (Merging Profile Fragments)
it breaks every time and displays:

Phase 4 (-p option): Merging Profile Fragments

    0%
FastK: Cannot open external file /current/directory//FOO.25.P0.0 in /current/directory/

*** Error Exit 1 ***
sh: error while loading shared libraries: libc.so.6: cannot open shared object file: Error 24

   Could not clean up !

EDIT: Simplified path names
So it seems the -N argument is not working correctly in Phase 4 when using the -p profiling mode. I can confirm that the FOO profile files are partially complete in ../Test/ . I also tested FastK -v -p -N/current/directory/FOO -P/current/directory/ My_file.fastq.gz

Now, Error 24 suggests there are too many files open. But I raised my ulimit to the hard limit, which is >1M, and monitoring the files attached to the process does not indicate large numbers at all. Is this a bug, or am I using the command wrong?
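
On Linux, errno 24 is EMFILE ("too many open files"), and it is raised against the soft limit of the process that actually runs, which may differ from what ulimit shows in another shell. A small sketch for checking the limit from inside a process follows; the function names are made up for illustration, and whether FastK itself changes or exhausts this limit is not established here.

```c
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>

/* Soft limit on open file descriptors for the calling process,
   LONG_MAX if unlimited, or -1 if getrlimit() fails. */
long soft_nofile_limit(void)
{ struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    return -1;
  if (rl.rlim_cur == RLIM_INFINITY)
    return LONG_MAX;
  return (long) rl.rlim_cur;
}

/* 1 if err is the "too many open files" error for this process. */
int is_too_many_open_files(int err)
{ return err == EMFILE;
}
```

Printing soft_nofile_limit() at the start of a failing run would show whether the raised ulimit was really inherited by the process that reports Error 24.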

Kind regards,

Ronald

Misnaming a gzipped input causes a segfault

Granted, this is a stupid mistake on my part. I created a file named sorang.telomeric.fasta but in fact it was a gzipped file. The first two bytes of the file are the gzip magic number: 1f 8b.

Running FastK -v -k40 -t1 data/sorang.telomeric.fasta produced this output:

Partitioning 1 .fasta file into 4 parts
Determining minimizer scheme & partition for sorang.telomeric
  Estimate 155.896M 40-mers
  Handling data in a single block
Segmentation fault

Clearly, the fault is mine. But it did take me a while to discover the error of my ways.

I wonder if some simple test could be done early on to validate that what the user claims is fasta is really fasta. The fact that the first character (1f) is outside the printable ascii range, is strong evidence. Looking at the code (but not completely understanding it), it looks like such a test could be added to the FK1 case in fast_nearest without (probably) adding much runtime overhead.
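
The kind of early check suggested above could look like the sketch below, which only inspects the first two bytes for the gzip magic number 1f 8b. The function name looks_gzipped is hypothetical, and where such a test would best sit inside FastK is left open.

```c
#include <stdio.h>

/* Returns 1 if the file starts with the gzip magic bytes 1f 8b,
   0 if it does not, and -1 if the file cannot be opened. */
int looks_gzipped(const char *path)
{ unsigned char magic[2];
  size_t n;
  FILE *f = fopen(path, "rb");

  if (f == NULL)
    return -1;
  n = fread(magic, 1, 2, f);
  fclose(f);
  return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A file claiming to be fasta for which this returns 1 could be rejected with a clear message before any partitioning starts, at the cost of one extra open and a two-byte read.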

gunzip won't unzip a symlink

(This doesn't seem like it needs to be addressed in the short term, if at all).

I tried "FastK orange.fa.gz" where orange.fa.gz was a symlink. The result is

gunzip: ./orange.fa.gz is not a regular file
FastK: Cannot get stats for ./orange.fa

Apparently gunzip doesn't like symlinks, so it fails to create the unzipped file. The downstream code in FastK, I guess, doesn't notice that gunzip failed, but a later sanity check notices the unzipped file doesn't exist.

This would be an issue for the use case where the user has read access to a shared directory of gzipped read data or assemblies, but doesn't have write access. They can't give FastK a path to the original (because, I presume, it would try to write the unzipped file in that directory). Traditionally a symlink would be the 'right' solution, to avoid wasted disk space. But perhaps disk deduplication technology makes this less of an issue?

I suspect this is only an issue for gzip'd files. I assume for the other compressed formats you are able to decompress on-the-fly and don't need to write an uncompressed file.
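
One way a program could cope with a symlinked input is to detect the link and resolve it to the real file before handing the path to an external tool. The sketch below uses POSIX lstat and realpath; the name resolve_if_symlink is made up, and this does not address the separate question of where the uncompressed copy gets written.

```c
#define _XOPEN_SOURCE 700   /* expose lstat() and realpath() under strict compilers */
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Write the fully resolved path of `path` into `out`.
   Returns 1 if `path` itself is a symlink, 0 if it is not,
   and -1 if it does not exist or cannot be resolved. */
int resolve_if_symlink(const char *path, char out[PATH_MAX])
{ struct stat st;

  if (lstat(path, &st) != 0)        /* lstat does not follow the link */
    return -1;
  if (realpath(path, out) == NULL)  /* follows every link in the path */
    return -1;
  return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

When the return value is 1, out holds the path of the regular file behind the link, which is the kind of path gunzip accepts.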

Bob H

Segfault for FastK, sometimes, when k isn't 40

Synopsis: When running FastK with -k set to 32 or less, I'm seeing segfaults.

Specifically, I've seen this with one particular input file (the blue whale assembly) and k in {20,26,31,32}.

Details:

I was trying to count 31-mers in this VGP blue whale assembly:
https://s3.amazonaws.com/genomeark/species/Balaenoptera_musculus/mBalMus1/assembly_curated/mBalMus1.pri.cur.20200528.fasta.gz

My current directory had mBalMus1.pri.cur.20200528.fasta.gz as a symlink to some other directory. Then I ran
FastK -v -k31 mBalMus1.pri.cur.20200528.fasta.gz
and got this output:

  Gzipped file mBalMus1.pri.cur.20200528.fasta.gz being temporarily uncompressed

Partitioning 1 .fasta file into 4 parts

Determining minimizer scheme & partition for mBalMus1.pri.cur.20200528
  Estimate 2.375G 31-mers
  Dividing data into 2 blocks
  Using 5-minimizers with 1024 core prefixes

Phase 1: Partitioning K-mers into 8 Super-mer Files

  There are 105 reads totalling 2,374,852,541 bps

     Part:         31-mer   super-mers  ave. length
        0:  1,151,752,026   74,926,939         15.4
        1:  1,197,133,971  112,483,639         10.6
      Sum:  2,348,885,997  187,410,578         12.5

      Range 1,151,752,026 - 1,197,133,971 (3.86%)

  Resources for phase:  1:27.610u  5.846s  1:01.816w  151.2%

Phase 2: Sorting & Counting K-mers in 2 blocks

  Processing block 2: Sorting super-mers     **Segmentation fault**

I'm using commit 7cebc7d, from a few hours ago.

Fastcat: Target name cannot have a .hist, .ktab, or .prof suffi

I followed example 2 for HPC operations, as I got a not-enough-disk-space error when running FastK as a single instance.
I completed the FastK and Fastmerge commands and ended up with 25 slices. For each slice I have two files, e.g. slice10.hist and slice10.ktab. Now I am trying to apply the Fastcat command.
However, it throws the following error:

Fastcat -vht full slice1 slice2
Fastcat: Target name cannot have a .hist, .ktab, or .prof suffi

I tried renaming the files to remove the suffix, but it doesn't make a difference and the error persists.

I also tried:

Fastcat -vt full slice1.ktab slice2.ktab
Fastcat: Target name cannot have a .hist, .ktab, or .prof suffi

What am I doing wrong?

bc trim option to take as many ints as source files?

For linked reads we want to trim 23 bases from R1 but not from R2. Right now I'm making two calls and then doing a Logex 'result = (A |+ B)' R1 R2.

This would be fine except for the min counts on 2 files vs. one file. Making a trim of 23 for both is okay, except that you lose a fair amount of data. I can make this change and open a pull request if you are willing to take pull requests. Or I can just make this change in my forked version.

*** Error in `../FastK': free(): corrupted unsorted chunks: 0x0000000001687480 *** when profiling sequences

Thanks for creating yet another incredibly useful-looking bit of software :) I have been searching for a program with all this functionality for a while, but I have run into this issue. I have created this as a new issue as I believe it is separate from the file write to /tmp problem raised previously. I am getting corrupted-memory faults when I try to profile the following sequences, downloadable as:

wget https://ebitutorial.s3.climb.ac.uk/graph5.fasta

Running the command:

FastK -k51 graph5.fasta -v -T1 -p

I get:


Partitioning 1 .fasta file into 1 parts

Determining minimizer scheme & partition for graph5
Estimate 1.703M 51-mers
Handling data in a single block
Using 5-minimizers with 1024 core prefixes

Phase 1: Partitioning K-mers into 1 Super-mer Files

There are 38 reads totalling 1,705,389 bps

 Part:     51-mer  super-mers  ave. length
    0:  1,703,489      81,679         20.9
  Sum:  1,703,489      81,679         20.9

  Range 1,703,489 - 1,703,489 (0.00%)

Resources for phase: 0.076u 0.016s 0.108w 84.7%

Phase 2: Sorting & Counting K-mers in 1 blocks

  Part:  wgt'd k-mers  savings             
     0:       988,833      1.7
   All:       988,833      1.7

Resources for phase: 0.312u 0.028s 0.368w 92.1%

Phase 4 (-p option): Merging Profile Fragments

The profiles occupy 160.35 KB
Segmentation fault (core dumped)

That may be the /tmp dir problem. If I point it at a drive with a large amount of free space (1 TB) as the temp dir with -P, I get:

FastK -k51 graph5.fasta -v -P/mnt/chris-native/chris/tmp/ -T1 -p

Partitioning 1 .fasta file into 1 parts

Determining minimizer scheme & partition for graph5
Estimate 1.703M 51-mers
Handling data in a single block
Using 5-minimizers with 1024 core prefixes

Phase 1: Partitioning K-mers into 1 Super-mer Files

There are 38 reads totalling 1,705,389 bps

 Part:     51-mer  super-mers  ave. length
    0:  1,703,489      81,679         20.9
  Sum:  1,703,489      81,679         20.9

  Range 1,703,489 - 1,703,489 (0.00%)

Resources for phase: 0.088u 0.008s 0.356w 26.9%

Phase 2: Sorting & Counting K-mers in 1 blocks

  Part:  wgt'd k-mers  savings             
     0:       988,833      1.7
   All:       988,833      1.7

Resources for phase: 0.400u 0.076s 0.551w 86.4%

Phase 4 (-p option): Merging Profile Fragments

The profiles occupy 160.35 KB
*** Error in `../FastK': free(): corrupted unsorted chunks: 0x0000000001687480 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777f5)[0x7fb467d637f5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8038a)[0x7fb467d6c38a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fb467d7058c]
../FastK[0x418bbb]
../FastK[0x40438a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fb467d0c840]
../FastK[0x404819]
======= Memory map: ========
00400000-004b2000 r-xp 00000000 00:2b 474411334 /mnt/gpfs/chris/repos/FASTK/FastK
006b1000-006b2000 r--p 000b1000 00:2b 474411334 /mnt/gpfs/chris/repos/FASTK/FastK
006b2000-006b3000 rw-p 000b2000 00:2b 474411334 /mnt/gpfs/chris/repos/FASTK/FastK
006b3000-006b6000 rw-p 00000000 00:00 0
01664000-02eac000 rw-p 00000000 00:00 0 [heap]
7fb35c000000-7fb35c021000 rw-p 00000000 00:00 0
7fb35c021000-7fb360000000 ---p 00000000 00:00 0
7fb361e85000-7fb461e86000 rw-p 00000000 00:00 0
7fb461e86000-7fb461e87000 ---p 00000000 00:00 0
7fb461e87000-7fb462687000 rw-p 00000000 00:00 0
7fb462df9000-7fb462e10000 r-xp 00000000 08:02 656120 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb462e10000-7fb46300f000 ---p 00017000 08:02 656120 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb46300f000-7fb463010000 r--p 00016000 08:02 656120 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb463010000-7fb463011000 rw-p 00017000 08:02 656120 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb463011000-7fb46301a000 r-xp 00000000 08:02 654573 /lib/x86_64-linux-gnu/libcrypt-2.23.so
7fb46301a000-7fb463219000 ---p 00009000 08:02 654573 /lib/x86_64-linux-gnu/libcrypt-2.23.so
7fb463219000-7fb46321a000 r--p 00008000 08:02 654573 /lib/x86_64-linux-gnu/libcrypt-2.23.so
7fb46321a000-7fb46321b000 rw-p 00009000 08:02 654573 /lib/x86_64-linux-gnu/libcrypt-2.23.so
7fb46321b000-7fb463249000 rw-p 00000000 00:00 0
7fb463249000-7fb463319000 r-xp 00000000 08:02 134403 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7fb463319000-7fb463518000 ---p 000d0000 08:02 134403 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7fb463518000-7fb46351b000 r--p 000cf000 08:02 134403 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7fb46351b000-7fb46351d000 rw-p 000d2000 08:02 134403 /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
7fb46351d000-7fb46351e000 rw-p 00000000 00:00 0
7fb46351e000-7fb463565000 r-xp 00000000 08:02 137969 /usr/lib/x86_64-linux-gnu/libhx509.so.5.0.0
7fb463565000-7fb463764000 ---p 00047000 08:02 137969 /usr/lib/x86_64-linux-gnu/libhx509.so.5.0.0
7fb463764000-7fb463766000 r--p 00046000 08:02 137969 /usr/lib/x86_64-linux-gnu/libhx509.so.5.0.0
7fb463766000-7fb463768000 rw-p 00048000 08:02 137969 /usr/lib/x86_64-linux-gnu/libhx509.so.5.0.0
7fb463768000-7fb463769000 rw-p 00000000 00:00 0
7fb463769000-7fb463777000 r-xp 00000000 08:02 137959 /usr/lib/x86_64-linux-gnu/libheimbase.so.1.0.0
7fb463777000-7fb463976000 ---p 0000e000 08:02 137959 /usr/lib/x86_64-linux-gnu/libheimbase.so.1.0.0
7fb463976000-7fb463977000 r--p 0000d000 08:02 137959 /usr/lib/x86_64-linux-gnu/libheimbase.so.1.0.0
7fb463977000-7fb463978000 rw-p 0000e000 08:02 137959 /usr/lib/x86_64-linux-gnu/libheimbase.so.1.0.0
7fb463978000-7fb46399f000 r-xp 00000000 08:02 137965 /usr/lib/x86_64-linux-gnu/libwind.so.0.0.0
7fb46399f000-7fb463b9f000 ---p 00027000 08:02 137965 /usr/lib/x86_64-linux-gnu/libwind.so.0.0.0
7fb463b9f000-7fb463ba0000 r--p 00027000 08:02 137965 /usr/lib/x86_64-linux-gnu/libwind.so.0.0.0
7fb463ba0000-7fb463ba1000 rw-p 00028000 08:02 137965 /usr/lib/x86_64-linux-gnu/libwind.so.0.0.0
7fb463ba1000-7fb463ba8000 r-xp 00000000 08:02 133165 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fb463ba8000-7fb463da7000 ---p 00007000 08:02 133165 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fb463da7000-7fb463da8000 r--p 00006000 08:02 133165 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fb463da8000-7fb463da9000 rw-p 00007000 08:02 133165 /usr/lib/x86_64-linux-gnu/libffi.so.6.0.4
7fb463da9000-7fb463dbe000 r-xp 00000000 08:02 145811 /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0
7fb463dbe000-7fb463fbd000 ---p 00015000 08:02 145811 /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0
7fb463fbd000-7fb463fbe000 r--p 00014000 08:02 145811 /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0
7fb463fbe000-7fb463fbf000 rw-p 00015000 08:02 145811 /usr/lib/x86_64-linux-gnu/libroken.so.18.1.0
7fb463fbf000-7fb463fef000 r-xp 00000000 08:02 137955 /usr/lib/x86_64-linux-gnu/libhcrypto.so.4.1.0
7fb463fef000-7fb4641ef000 ---p 00030000 08:02 137955 /usr/lib/x86_64-linux-gnu/libhcrypto.so.4.1.0
7fb4641ef000-7fb4641f0000 r--p 00030000 08:02 137955 /usr/lib/x86_64-linux-gnu/libhcrypto.so.4.1.0
7fb4641f0000-7fb4641f1000 rw-p 00031000 08:02 137955 /usr/lib/x86_64-linux-gnu/libhcrypto.so.4.1.0
7fb4641f1000-7fb4641f2000 rw-p 00000000 00:00 0
7fb4641f2000-7fb464291000 r-xp 00000000 08:02 137952 /usr/lib/x86_64-linux-gnu/libasn1.so.8.0.0
7fb464291000-7fb464490000 ---p 0009f000 08:02 137952 /usr/lib/x86_64-linux-gnu/libasn1.so.8.0.0
7fb464490000-7fb464491000 r--p 0009e000 08:02 137952 /usr/lib/x86_64-linux-gnu/libasn1.so.8.0.0
7fb464491000-7fb464494000 rw-p 0009f000 08:02 137952 /usr/lib/x86_64-linux-gnu/libasn1.so.8.0.0
7fb464494000-7fb464518000 r-xp 00000000 08:02 137974 /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0
7fb464518000-7fb464717000 ---p 00084000 08:02 137974 /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0
7fb464717000-7fb46471a000 r--p 00083000 08:02 137974 /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0
7fb46471a000-7fb46471d000 rw-p 00086000 08:02 137974 /usr/lib/x86_64-linux-gnu/libkrb5.so.26.0.0
7fb46471d000-7fb46471e000 rw-p 00000000 00:00 0
7fb46471e000-7fb464726000 r-xp 00000000 08:02 137979 /usr/lib/x86_64-linux-gnu/libheimntlm.so.0.1.0
7fb464726000-7fb464925000 ---p 00008000 08:02 137979 /usr/lib/x86_64-linux-gnu/libheimntlm.so.0.1.0
7fb464925000-7fb464926000 r--p 00007000 08:02 137979 /usr/lib/x86_64-linux-gnu/libheimntlm.so.0.1.0
7fb464926000-7fb464927000 rw-p 00008000 08:02 137979 /usr/lib/x86_64-linux-gnu/libheimntlm.so.0.1.0
7fb464927000-7fb46492a000 r-xp 00000000 08:02 660056 /lib/x86_64-linux-gnu/libkeyutils.so.1.5
7fb46492a000-7fb464b29000 ---p 00003000 08:02 660056 /lib/x86_64-linux-gnu/libkeyutils.so.1.5
7fb464b29000-7fb464b2a000 r--p 00002000 08:02 660056 /lib/x86_64-linux-gnu/libkeyutils.so.1.5
7fb464b2a000-7fb464b2b000 rw-p 00003000 08:02 660056 /lib/x86_64-linux-gnu/libkeyutils.so.1.5
7fb464b2b000-7fb464b3c000 r-xp 00000000 08:02 131781 /usr/lib/x86_64-linux-gnu/libtasn1.so.6.5.1
7fb464b3c000-7fb464d3c000 ---p 00011000 08:02 131781 /usr/lib/x86_64-linux-gnu/libtasn1.so.6.5.1
7fb464d3c000-7fb464d3d000 r--p 00011000 08:02 131781 /usr/lib/x86_64-linux-gnu/libtasn1.so.6.5.1
7fb464d3d000-7fb464d3e000 rw-p 00012000 08:02 131781 /usr/lib/x86_64-linux-gnu/libtasn1.so.6.5.1
7fb464d3e000-7fb464d97000 r-xp 00000000 08:02 137411 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.1.0
7fb464d97000-7fb464f96000 ---p 00059000 08:02 137411 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.1.0
7fb464f96000-7fb464fa0000 r--p 00058000 08:02 137411 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.1.0
7fb464fa0000-7fb464fa2000 rw-p 00062000 08:02 137411 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.1.0
7fb464fa2000-7fb464fdf000 r-xp 00000000 08:02 137984 /usr/lib/x86_64-linux-gnu/libgssapi.so.3.0.0
7fb464fdf000-7fb4651df000 ---p 0003d000 08:02 137984 /usr/lib/x86_64-linux-gnu/libgssapi.so.3.0.0
7fb4651df000-7fb4651e0000 r--p 0003d000 08:02 137984 /usr/lib/x86_64-linux-gnu/libgssapi.so.3.0.0
7fb4651e0000-7fb4651e2000 rw-p 0003e000 08:02 137984 /usr/lib/x86_64-linux-gnu/libgssapi.so.3.0.0
7fb4651e2000-7fb4651e3000 rw-p 00000000 00:00 0
7fb4651e3000-7fb4651fc000 r-xp 00000000 08:02 132347 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25
7fb4651fc000-7fb4653fc000 ---p 00019000 08:02 132347 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25
7fb4653fc000-7fb4653fd000 r--p 00019000 08:02 132347 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25
7fb4653fd000-7fb4653fe000 rw-p 0001a000 08:02 132347 /usr/lib/x86_64-linux-gnu/libsasl2.so.2.0.25
7fb4653fe000-7fb465415000 r-xp 00000000 08:02 655434 /lib/x86_64-linux-gnu/libresolv-2.23.so
7fb465415000-7fb465615000 ---p 00017000 08:02 655434 /lib/x86_64-linux-gnu/libresolv-2.23.so
7fb465615000-7fb465616000 r--p 00017000 08:02 655434 /lib/x86_64-linux-gnu/libresolv-2.23.so
7fb465616000-7fb465617000 rw-p 00018000 08:02 655434 /lib/x86_64-linux-gnu/libresolv-2.23.so
7fb465617000-7fb465619000 rw-p 00000000 00:00 0
7fb465619000-7fb465623000 r-xp 00000000 08:02 134938 /usr/lib/x86_64-linux-gnu/libkrb5support.so.0.1
7fb465623000-7fb465822000 ---p 0000a000 08:02 134938 /usr/lib/x86_64-linux-gnu/libkrb5support.so.0.1
7fb465822000-7fb465823000 r--p 00009000 08:02 134938 /usr/lib/x86_64-linux-gnu/libkrb5support.so.0.1
7fb465823000-7fb465824000 rw-p 0000a000 08:02 134938 /usr/lib/x86_64-linux-gnu/libkrb5support.so.0.1
7fb465824000-7fb465827000 r-xp 00000000 08:02 654290 /lib/x86_64-linux-gnu/libcom_err.so.2.1
7fb465827000-7fb465a26000 ---p 00003000 08:02 654290 /lib/x86_64-linux-gnu/libcom_err.so.2.1
7fb465a26000-7fb465a27000 r--p 00002000 08:02 654290 /lib/x86_64-linux-gnu/libcom_err.so.2.1
7fb465a27000-7fb465a28000 rw-p 00003000 08:02 654290 /lib/x86_64-linux-gnu/libcom_err.so.2.1
7fb465a28000-7fb465a54000 r-xp 00000000 08:02 132856 /usr/lib/x86_64-linux-gnu/libk5crypto.so.3.1
7fb465a54000-7fb465c53000 ---p 0002c000 08:02 132856 /usr/lib/x86_64-linux-gnu/libk5crypto.so.3.1
7fb465c53000-7fb465c55000 r--p 0002b000 08:02 132856 /usr/lib/x86_64-linux-gnu/libk5crypto.so.3.1
7fb465c55000-7fb465c56000 rw-p 0002d000 08:02 132856 /usr/lib/x86_64-linux-gnu/libk5crypto.so.3.1
7fb465c56000-7fb465c57000 rw-p 00000000 00:00 0
7fb465c57000-7fb465d1a000 r-xp 00000000 08:02 137943 /usr/lib/x86_64-linux-gnu/libkrb5.so.3.3
7fb465d1a000-7fb465f1a000 ---p 000c3000 08:02 137943 /usr/lib/x86_64-linux-gnu/libkrb5.so.3.3
7fb465f1a000-7fb465f27000 r--p 000c3000 08:02 137943 /usr/lib/x86_64-linux-gnu/libkrb5.so.3.3
7fb465f27000-7fb465f29000 rw-p 000d0000 08:02 137943 /usr/lib/x86_64-linux-gnu/libkrb5.so.3.3
7fb465f29000-7fb465fa8000 r-xp 00000000 08:02 133173 /usr/lib/x86_64-linux-gnu/libgmp.so.10.3.0
7fb465fa8000-7fb4661a7000 ---p 0007f000 08:02 133173 /usr/lib/x86_64-linux-gnu/libgmp.so.10.3.0
7fb4661a7000-7fb4661a8000 r--p 0007e000 08:02 133173 /usr/lib/x86_64-linux-gnu/libgmp.so.10.3.0
7fb4661a8000-7fb4661a9000 rw-p 0007f000 08:02 133173 /usr/lib/x86_64-linux-gnu/libgmp.so.10.3.0
7fb4661a9000-7fb4661dd000 r-xp 00000000 08:02 133193 /usr/lib/x86_64-linux-gnu/libnettle.so.6.2
7fb4661dd000-7fb4663dc000 ---p 00034000 08:02 133193 /usr/lib/x86_64-linux-gnu/libnettle.so.6.2
7fb4663dc000-7fb4663de000 r--p 00033000 08:02 133193 /usr/lib/x86_64-linux-gnu/libnettle.so.6.2
7fb4663de000-7fb4663df000 rw-p 00035000 08:02 133193 /usr/lib/x86_64-linux-gnu/libnettle.so.6.2
7fb4663df000-7fb466411000 r-xp 00000000 08:02 133179 /usr/lib/x86_64-linux-gnu/libhogweed.so.4.2
7fb466411000-7fb466610000 ---p 00032000 08:02 133179 /usr/lib/x86_64-linux-gnu/libhogweed.so.4.2
7fb466610000-7fb466611000 r--p 00031000 08:02 133179 /usr/lib/x86_64-linux-gnu/libhogweed.so.4.2
7fb466611000-7fb466612000 rw-p 00032000 08:02 133179 /usr/lib/x86_64-linux-gnu/libhogweed.so.4.2
7fb466612000-7fb466735000 r-xp 00000000 08:02 134875 /usr/lib/x86_64-linux-gnu/libgnutls.so.30.6.2
7fb466735000-7fb466934000 ---p 00123000 08:02 134875 /usr/lib/x86_64-linux-gnu/libgnutls.so.30.6.2
7fb466934000-7fb46693f000 r--p 00122000 08:02 134875 /usr/lib/x86_64-linux-gnu/libgnutls.so.30.6.2
7fb46693f000-7fb466941000 rw-p 0012d000 08:02 134875 /usr/lib/x86_64-linux-gnu/libgnutls.so.30.6.2
7fb466941000-7fb466942000 rw-p 00000000 00:00 0
7fb466942000-7fb46698f000 r-xp 00000000 08:02 131120 /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2.10.5
7fb46698f000-7fb466b8e000 ---p 0004d000 08:02 131120 /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2.10.5
7fb466b8e000-7fb466b90000 r--p 0004c000 08:02 131120 /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2.10.5
7fb466b90000-7fb466b91000 rw-p 0004e000 08:02 131120 /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2.10.5
7fb466b91000-7fb466b93000 rw-p 00000000 00:00 0
7fb466b93000-7fb466ba0000 r-xp 00000000 08:02 131171 /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2.10.5
7fb466ba0000-7fb466da0000 ---p 0000d000 08:02 131171 /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2.10.5
7fb466da0000-7fb466da1000 r--p 0000d000 08:02 131171 /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2.10.5
7fb466da1000-7fb466da2000 rw-p 0000e000 08:02 131171 /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2.10.5
7fb466da2000-7fb466de9000 r-xp 00000000 08:02 137935 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2
7fb466de9000-7fb466fe8000 ---p 00047000 08:02 137935 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2
7fb466fe8000-7fb466fea000 r--p 00046000 08:02 137935 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2
7fb466fea000-7fb466fec000 rw-p 00048000 08:02 137935 /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2.2
7fb466fec000-7fb467207000 r-xp 00000000 08:02 654731 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fb467207000-7fb467406000 ---p 0021b000 08:02 654731 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fb467406000-7fb467422000 r--p 0021a000 08:02 654731 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fb467422000-7fb46742e000 rw-p 00236000 08:02 654731 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
7fb46742e000-7fb467431000 rw-p 00000000 00:00 0
7fb467431000-7fb46748f000 r-xp 00000000 08:02 654734 /lib/x86_64-linux-gnu/libssl.so.1.0.0
7fb46748f000-7fb46768f000 ---p 0005e000 08:02 654734 /lib/x86_64-linux-gnu/libssl.so.1.0.0
7fb46768f000-7fb467693000 r--p 0005e000 08:02 654734 /lib/x86_64-linux-gnu/libssl.so.1.0.0
7fb467693000-7fb467699000 rw-p 00062000 08:02 654734 /lib/x86_64-linux-gnu/libssl.so.1.0.0
7fb467699000-7fb4676b4000 r-xp 00000000 08:02 135071 /usr/lib/x86_64-linux-gnu/librtmp.so.1
7fb4676b4000-7fb4678b3000 ---p 0001b000 08:02 135071 /usr/lib/x86_64-linux-gnu/librtmp.so.1
7fb4678b3000-7fb4678b4000 r--p 0001a000 08:02 135071 /usr/lib/x86_64-linux-gnu/librtmp.so.1
7fb4678b4000-7fb4678b5000 rw-p 0001b000 08:02 135071 /usr/lib/x86_64-linux-gnu/librtmp.so.1
7fb4678b5000-7fb4678e6000 r-xp 00000000 08:02 161247 /usr/lib/x86_64-linux-gnu/libidn.so.11.6.15
7fb4678e6000-7fb467ae6000 ---p 00031000 08:02 161247 /usr/lib/x86_64-linux-gnu/libidn.so.11.6.15
7fb467ae6000-7fb467ae7000 r--p 00031000 08:02 161247 /usr/lib/x86_64-linux-gnu/libidn.so.11.6.15
7fb467ae7000-7fb467ae8000 rw-p 00032000 08:02 161247 /usr/lib/x86_64-linux-gnu/libidn.so.11.6.15
7fb467ae8000-7fb467aeb000 r-xp 00000000 08:02 654528 /lib/x86_64-linux-gnu/libdl-2.23.so
7fb467aeb000-7fb467cea000 ---p 00003000 08:02 654528 /lib/x86_64-linux-gnu/libdl-2.23.so
7fb467cea000-7fb467ceb000 r--p 00002000 08:02 654528 /lib/x86_64-linux-gnu/libdl-2.23.so
7fb467ceb000-7fb467cec000 rw-p 00003000 08:02 654528 /lib/x86_64-linux-gnu/libdl-2.23.so
7fb467cec000-7fb467eac000 r-xp 00000000 08:02 654530 /lib/x86_64-linux-gnu/libc-2.23.so
7fb467eac000-7fb4680ac000 ---p 001c0000 08:02 654530 /lib/x86_64-linux-gnu/libc-2.23.so
7fb4680ac000-7fb4680b0000 r--p 001c0000 08:02 654530 /lib/x86_64-linux-gnu/libc-2.23.so
7fb4680b0000-7fb4680b2000 rw-p 001c4000 08:02 654530 /lib/x86_64-linux-gnu/libc-2.23.so
7fb4680b2000-7fb4680b6000 rw-p 00000000 00:00 0
7fb4680b6000-7fb468124000 r-xp 00000000 08:02 137680 /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0
7fb468124000-7fb468324000 ---p 0006e000 08:02 137680 /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0
7fb468324000-7fb468327000 r--p 0006e000 08:02 137680 /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0
7fb468327000-7fb468328000 rw-p 00071000 08:02 137680 /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0
7fb468328000-7fb468349000 r-xp 00000000 08:02 654615 /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7fb468349000-7fb468548000 ---p 00021000 08:02 654615 /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7fb468548000-7fb468549000 r--p 00020000 08:02 654615 /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7fb468549000-7fb46854a000 rw-p 00021000 08:02 654615 /lib/x86_64-linux-gnu/liblzma.so.5.0.0
7fb46854a000-7fb468559000 r-xp 00000000 08:02 654186 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fb468559000-7fb468758000 ---p 0000f000 08:02 654186 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fb468758000-7fb468759000 r--p 0000e000 08:02 654186 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fb468759000-7fb46875a000 rw-p 0000f000 08:02 654186 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fb46875a000-7fb468773000 r-xp 00000000 08:02 654834 /lib/x86_64-linux-gnu/libz.so.1.2.8
7fb468773000-7fb468972000 ---p 00019000 08:02 654834 /lib/x86_64-linux-gnu/libz.so.1.2.8
7fb468972000-7fb468973000 r--p 00018000 08:02 654834 /lib/x86_64-linux-gnu/libz.so.1.2.8
7fb468973000-7fb468974000 rw-p 00019000 08:02 654834 /lib/x86_64-linux-gnu/libz.so.1.2.8
7fb468974000-7fb46898c000 r-xp 00000000 08:02 654534 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fb46898c000-7fb468b8b000 ---p 00018000 08:02 654534 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fb468b8b000-7fb468b8c000 r--p 00017000 08:02 654534 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fb468b8c000-7fb468b8d000 rw-p 00018000 08:02 654534 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fb468b8d000-7fb468b91000 rw-p 00000000 00:00 0
7fb468b91000-7fb468bb7000 r-xp 00000000 08:02 655423 /lib/x86_64-linux-gnu/ld-2.23.so
7fb468d88000-7fb468d9b000 rw-p 00000000 00:00 0
7fb468db5000-7fb468db6000 rw-p 00000000 00:00 0
7fb468db6000-7fb468db7000 r--p 00025000 08:02 655423 /lib/x86_64-linux-gnu/ld-2.23.so
7fb468db7000-7fb468db8000 rw-p 00026000 08:02 655423 /lib/x86_64-linux-gnu/ld-2.23.so
7fb468db8000-7fb468db9000 rw-p 00000000 00:00 0
7ffd617cb000-7ffd6181f000 rw-p 00000000 00:00 0 [stack]
7ffd61932000-7ffd61934000 r--p 00000000 00:00 0 [vvar]
7ffd61934000-7ffd61936000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-fffffffff

Sorry for the long post, but I thought the above might be helpful. This is compiled with gcc:

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

installing via conda

Dear Gene, thanks for making FASTK!

Would you consider making FASTK available also via conda?

I tried to compile it on our cluster and ran into some errors, and I bet I am not the only one. Having it on conda would make life easier for a lot of people.

Unexpected relative profile results (possible user error)

Synopsis: I have a contrived example with a short genome and some perfect reads sampled from that genome. I'm using the relative profile feature to get counts of read kmers across the genome. The results are mostly 1s with a few short stretches of higher counts. The 1s conflict with the result I expect.

Details:

The example can be found at https://github.com/rsharris/fake_reads_2021. apple.fa is a random genome of length 3K. orange.fa is 80 fake reads, each 300 bp, sampled from apple with no sequencing error, at random positions and with random orientation. Average depth ≈8X.

Running this
FastK -k40 -t1 apple.fa
FastK -k40 -t1 orange.fa
FastK -k40 -t1 -p:orange.ktab apple.fa
Profex apple.prof 1 > orange_on_apple.profile
I expected to get a count profile that roughly matched the coverage profile I would get from piling up orange to apple alignments (realizing that the 40-mers absent at the end of the reads would reduce avg count to ≈7X).
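
The depth figures quoted here can be checked with a few lines of arithmetic. This is a minimal sketch, assuming reads are sampled uniformly and ignoring edge effects at the genome ends; the function name expected_kmer_depth is hypothetical and not part of FastK.

```python
def expected_kmer_depth(n_reads, read_len, genome_len, k):
    """Average number of read k-mers per genome k-mer position.

    Assumes uniform sampling; each read of length L contributes
    max(0, L - k + 1) k-mers, and the genome has G - k + 1 positions.
    """
    kmers_per_read = max(0, read_len - k + 1)
    genome_kmers = max(0, genome_len - k + 1)
    return n_reads * kmers_per_read / genome_kmers


# Numbers from the post: 80 reads of 300 bp on a 3 kbp genome, k = 40.
base_depth = 80 * 300 / 3000
kmer_depth = expected_kmer_depth(80, 300, 3000, 40)
print(f"base depth {base_depth:.2f}X, 40-mer depth {kmer_depth:.2f}X")
# prints: base depth 8.00X, 40-mer depth 7.05X
```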

(orange_on_apple.profile is in that same repo)
Instead, I get mostly 1s with occasional short bursts of higher numbers. I'm at a loss to understand this. Am I missing something in how I am using the tools? I know the motivation for profiles was to profile along reads, not along genomes, so maybe there's a difference I'm not understanding.

I wonder if the fact that the kmer counts in apple itself are all 1 is relevant.

FWIW I'm using a fetch of the repo from about an hour ago.
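
For a genome this small, an independent brute-force count is one way to get a reference profile to compare against the Profex output. The following is a rough sketch only, not FastK's algorithm: it assumes k-mers are counted canonically (a k-mer and its reverse complement are treated as the same), that sequences contain only A, C, G and T, and all function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def count_kmers(seqs, k):
    """Count canonical k-mers over a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def relative_profile(target, counts, k):
    """For each k-mer position in target, look up its count in counts."""
    target = target.upper()
    return [counts.get(canonical(target[i:i + k]), 0)
            for i in range(len(target) - k + 1)]


# Toy data: one forward read and one reverse-strand read from a tiny genome.
genome = "ACGTTGCAAGGCTTAC"
reads = [genome[0:10], revcomp(genome[5:16])]
k = 5
print(relative_profile(genome, count_kmers(reads, k), k))
```

With the real files, the reads from orange.fa would go into count_kmers and the apple sequence into relative_profile with k = 40; positions covered by n reads should then show a count of about n, regardless of read orientation.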

FastK: Segmentation fault with small data

Hello,
On the latest master (f18a4e6), I was trying to test FastK with a small test data set and got the following segmentation fault. Do you have any idea how to fix it? Thank you.

$ pwd
/home/jaruga/git/FASTK

$ gcc --version
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-1)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ make
$ cat fastk_source.fasta 
>FastK input source
ACTG
$ ./FastK -v fastk_source.fasta 

Partitioning 1 .fasta file into 4 parts
  File is so small will use only 1 thread 

Determining minimizer scheme & partition for fastk_source

FastK: Warming Sequences are on average smaller than 1.5x k-mer size!
  Estimate -0.036K 40-mers
  Handling data in a single block
  Using 5-minimizers with 1024 core prefixes

Phase 1: Partitioning K-mers into 1 Super-mer Files

  There are 1 reads totalling 4 bps

     Part:  40-mer  super-mers  ave. length
        0:       0           0           NA
      Sum:       0           0           NA

      Range 0 - 0

  Resources for phase:  0.000u  0.001s  0.001w  127.8%

Phase 2: Sorting & Counting K-mers in 1 blocks

  Processing block 1: Sorting super-mers     Segmentation fault (core dumped)
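
The log above reports zero 40-mers for the 4 bp input, which is expected because a sequence of length L contains max(0, L - k + 1) k-mers. As a possible workaround until the crash itself is addressed, the input could be checked before FastK is run. This is a minimal sketch, assuming plain uncompressed FASTA text; the helper names are hypothetical and not part of FastK, and the check only detects inputs with no k-mers, it does not explain the segfault.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def total_kmers(lengths, k):
    """Number of k-mer positions across all records."""
    return sum(max(0, n - k + 1) for n in lengths)


# The test file from this report, with k = 40 as shown in the log.
example = [">FastK input source", "ACTG"]
lengths = fasta_lengths(example)
print(lengths, total_kmers(lengths, 40))  # prints: [4] 0
```

For a file on disk, the same functions can be applied to an open file handle, for example fasta_lengths(open("fastk_source.fasta")), and the FastK call skipped when total_kmers returns 0.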
