spoa's People

Contributors

adrianbunk, ekg, mbrcic, rvaser, soapza, tbrekalo

spoa's Issues

Conda installation not working

Hi, spoa installed through conda exited with a segmentation fault (core dumped) when run on a very small dataset of 600 bp reads. After installing spoa from GitHub, it worked fine. Could the issue be related to this one from racon?
Thanks,
Simone

Option to build without "-march=native"

The code builds with "-march=native" which is best for performance but reduces portability. There are use cases where portability is more important than performance, and therefore it would be nice to also have the option to build without "-march=native".

Some questions regarding multiple alignment behavior

This is not a description of an issue found with the repository - just some questions about the multiple alignment functionality:

  • The result is generally dependent on the order in which the input sequences are added to the alignment. Is this expected? If so, are there any suggestions on the ordering most likely to result in the best alignment quality? E.g., longest first, shortest first, or anything else?
  • If a sequence is added to an alignment multiple times consecutively, does the number of times affect the result? That is, does the alignment give more weight to input strings that are added many times? If not, it seems that it would be ok to enter each distinct input string only once.
  • I have cases where all of the input strings have a common prefix, but the common prefix is not preserved in the alignment. The same for a common suffix. Is this behavior expected? This also seems to be order dependent.

parameter setting to print out POA

Hi Robert,

Spoa seems like a great tool/library that we're looking to use in our next project. Would it be possible for you to add a parameter setting (or add a setting to the -r option) to print the POA graph? Regarding the format, maybe it could be an adjacency list or similar, with a weight in each edge tuple.

I'm assuming all the necessary information is in the structure graph = spoa::createGraph(); but unfortunately I'm not proficient in C/C++ and am looking to call spoa from Python.
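To make the requested output format concrete, here is a minimal sketch of an adjacency list with a weight in each edge tuple. The types WeightedEdge and PoaNode and the function ToAdjacencyList are hypothetical stand-ins written for illustration; they are not spoa's graph classes, and the weight semantics (for example, the number of supporting sequences) are an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a POA graph; these types are NOT spoa's.
struct WeightedEdge {
  std::uint32_t head;   // index of the node this edge points to
  std::int64_t weight;  // assumed meaning: support for this edge
};

struct PoaNode {
  char base;                      // nucleotide stored in the node
  std::vector<WeightedEdge> out;  // outgoing edges
};

// Serializes the graph as one line per node:
//   <node id> <base>: (<head>, <weight>) (<head>, <weight>) ...
std::string ToAdjacencyList(const std::vector<PoaNode>& nodes) {
  std::ostringstream os;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    os << i << ' ' << nodes[i].base << ':';
    for (const auto& edge : nodes[i].out) {
      os << " (" << edge.head << ", " << edge.weight << ')';
    }
    os << '\n';
  }
  return os.str();
}
```

A line-oriented text format like this is easy to parse from Python without any bindings. The linker output quoted in another issue on this page mentions a spoa::Graph::PrintDot symbol, so a DOT export may already exist in some versions; that is worth checking before writing a custom serializer.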

Minimum coverage

First up, great work!

Is there any chance you could add a minimum coverage requirement before calling a base? The idea is that if I have 10 reads with a large deletion and 2 with a partial deletion, I only want to call bases in the consensus that have a coverage of at least, e.g., 5.

Thank you!
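Until such an option exists, one way to approximate the idea is to post-process the multiple sequence alignment. The sketch below assumes the MSA is available as equal-length rows with '-' marking gaps; ConsensusWithMinCoverage is a hypothetical helper, not a spoa function, and it uses a plain per-column majority vote rather than spoa's graph-based consensus.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Column-wise majority consensus over MSA rows of equal length.
// A column contributes a base only if at least min_coverage rows
// carry a non-gap character in it. Ties go to the lowest character code.
std::string ConsensusWithMinCoverage(const std::vector<std::string>& msa,
                                     std::size_t min_coverage) {
  std::string consensus;
  if (msa.empty()) return consensus;
  for (std::size_t col = 0; col < msa.front().size(); ++col) {
    std::array<std::size_t, 256> counts{};  // zero-initialized
    std::size_t coverage = 0;
    for (const auto& row : msa) {
      const char c = row[col];
      if (c == '-') continue;  // gaps do not count as coverage
      ++counts[static_cast<unsigned char>(c)];
      ++coverage;
    }
    if (coverage == 0 || coverage < min_coverage) continue;
    std::size_t best = 0;
    for (std::size_t b = 1; b < counts.size(); ++b) {
      if (counts[b] > counts[best]) best = b;
    }
    consensus.push_back(static_cast<char>(best));
  }
  return consensus;
}
```

With rows "AC--GT", "AC--GT", and "ACTTGT", a threshold of 2 drops the two columns supported by a single read, while a threshold of 1 keeps them.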

make error

Hello there,
I ran into a make error with error messages like the ones below. Any idea?

In file included from /public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:14,
from /public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp:12:
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:30: error: expected nested-name-specifier before 'Alignment'
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:30: error: 'Alignment' has not been declared
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:30: error: expected ';' before '=' token
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:30: error: expected unqualified-id before '=' token
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:50: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/include/spoa/alignment_engine.hpp:53: error: 'Alignment' does not name a type
In file included from /public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp:12:
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:30: error: expected ';' before 'override'
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:32: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:47: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:50: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:53: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/src/sisd_alignment_engine.hpp:60: error: expected ';' before 'noexcept'
In file included from /public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp:13:
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:43: error: expected ';' before 'override'
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:45: error: 'Alignment' does not name a type
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:61: error: expected constructor, destructor, or type conversion before 'linear'
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:65: error: expected constructor, destructor, or type conversion before 'affine'
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:69: error: expected constructor, destructor, or type conversion before 'convex'
/public/home/yangzhzh/tools_zz/spoa/src/simd_alignment_engine.hpp:78: error: expected initializer before 'noexcept'
/public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp: In function 'std::unique_ptr<spoa::AlignmentEngine, std::default_delete<spoa::AlignmentEngine> > spoa::createAlignmentEngine(spoa::AlignmentType, int8_t, int8_t, int8_t, int8_t, int8_t, int8_t)':
/public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp:64: error: 'nullptr' was not declared in this scope
/public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp: At global scope:
/public/home/yangzhzh/tools_zz/spoa/src/alignment_engine.cpp:77: error: 'Alignment' does not name a type
make[2]: *** [CMakeFiles/spoa.dir/src/alignment_engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2

thanks a lot,
Zhenzhen

adding the library

Hello,

I'm trying to use the library in my C++ code. I copied "/include/spoa" into my project folder and compiled "libspoa.a". When I add your sample code segment to my code and try to compile, I get the error below. This happens on macOS, but it works fine on Ubuntu.

Is there any way to solve this on macOS?

Undefined symbols for architecture arm64: "__ZN4spoa15AlignmentEngine5AlignERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_5GraphEPi", referenced from: __Z3tstv in svarp.cpp.o "__ZN4spoa5Graph12AddAlignmentERKSt6vectorISt4pairIiiESaIS3_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEj", referenced from: __Z3tstv in svarp.cpp.o "__ZN4spoa5Graph17GenerateConsensusB5cxx11Ev", referenced from: __Z3tstv in svarp.cpp.o "__ZN4spoa5Graph33GenerateMultipleSequenceAlignmentB5cxx11Eb", referenced from: __Z3tstv in svarp.cpp.o "__ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv", referenced from: __ZNK4spoa5Graph21IsTopologicallySortedEv in libspoa.a(graph.cpp.o) __ZNK4spoa5Graph15ExtractSubgraphEPKNS0_4NodeES3_ in libspoa.a(graph.cpp.o) __ZNKSt3__113__vector_baseIjNS_9allocatorIjEEE20__throw_length_errorEv in libspoa.a(graph.cpp.o) __ZNKSt3__113__vector_baseIiNS_9allocatorIiEEE20__throw_length_errorEv in libspoa.a(graph.cpp.o) __ZNKSt3__113__vector_baseINS_10unique_ptrIN4spoa5Graph4NodeENS_14default_deleteIS4_EEEENS_9allocatorIS7_EEE20__throw_length_errorEv in libspoa.a(graph.cpp.o) __ZNKSt3__113__vector_baseINS_10unique_ptrIN4spoa5Graph4EdgeENS_14default_deleteIS4_EEEENS_9allocatorIS7_EEE20__throw_length_errorEv in libspoa.a(graph.cpp.o) __ZNKSt3__113__vector_baseIPN4spoa5Graph4EdgeENS_9allocatorIS4_EEE20__throw_length_errorEv in libspoa.a(graph.cpp.o) ... 
"__ZNKSt3__16locale9use_facetERNS0_2idE", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNKSt3__18ios_base6getlocEv", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__112__next_primeEm", referenced from: __ZNSt3__1L4copyINS_11__wrap_iterIPjEENS_15insert_iteratorINS_13unordered_setIjNS_4hashIjEENS_8equal_toIjEENS_9allocatorIjEEEEEEEET0_T_SF_SE_ in libspoa.a(graph.cpp.o) "__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE9push_backEc", referenced from: __ZN4spoa5Graph17GenerateConsensusEv in libspoa.a(graph.cpp.o) __ZN4spoa5Graph17GenerateConsensusEPNSt3__16vectorIjNS1_9allocatorIjEEEEb in libspoa.a(graph.cpp.o) "__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1ERKS5_", referenced from: __ZNSt3__16vectorINS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEENS4_IS6_EEE12emplace_backIJRS6_EEEvDpOT_ in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE4openEPKcj", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE5closeEv", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC1Ev", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_filebufIcNS_11char_traitsIcEEED1Ev", referenced from: 
__ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE3putEc", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE5flushEv", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE6sentryC1ERS3_", referenced from: __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE6sentryD1Ev", referenced from: __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEED2Ev", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEj", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEm", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEx", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__15ctypeIcE2idE", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__16localeD1Ev", 
referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__18ios_base33__set_badbit_and_consider_rethrowEv", referenced from: __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__18ios_base4initEPv", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZNSt3__18ios_base5clearEj", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) __ZNSt3__124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m in libspoa.a(graph.cpp.o) "__ZNSt3__19basic_iosIcNS_11char_traitsIcEEED2Ev", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZTTNSt3__114basic_ofstreamIcNS_11char_traitsIcEEEE", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) "__ZTVNSt3__114basic_ofstreamIcNS_11char_traitsIcEEEE", referenced from: __ZNK4spoa5Graph8PrintDotERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE in libspoa.a(graph.cpp.o) NOTE: a missing vtable usually means the first non-inline virtual member function has no definition. ld: symbol(s) not found for architecture arm64 collect2: error: ld returned 1 exit status

Thank you.

Convex gap penalties

Hi again,

I was wondering if it would be possible to explore convex gap penalties within spoa. To give a concrete example: penalize a gap as log(gap_size), so that extending a gap is cheap, which favors longer insertions over shorter ones; the figure attached below illustrates this.

While this might be nontrivial to implement, it's definitely needed by the part of the community that expects structural gaps in their alignments (transcripts or structural variants). I believe work in this direction would be immensely helpful and publishable. If you don't have time to look into this, I would appreciate your opinion as to whether this is possible to implement within spoa.

This idea has been investigated by regular aligners such as NGMLR, but would be, to my knowledge, novel in a POA alignment strategy.

image (Image taken from https://medium.com/pacbio/visualizing-the-chaos-of-cancer-one-tool-at-a-time-a9e083f8bc31)
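The difference between the gap models mentioned here can be sketched with a few cost functions. All names and parameter values below are illustrative assumptions, and costs are expressed as positive penalties. The two-piece affine function is included because it is a common piecewise-linear approximation of a log-shaped curve in sequence aligners; nothing here is taken from spoa's implementation.

```cpp
#include <algorithm>
#include <cmath>

// Penalty of a gap of length n (n >= 1) under different models.

// Linear: every gap position costs the same.
double LinearGapCost(int n, double extend) {
  return extend * n;
}

// Affine: opening is expensive, each further position costs `extend`.
double AffineGapCost(int n, double open, double extend) {
  return open + extend * (n - 1);
}

// Log-shaped penalty as proposed above: the marginal cost of extending
// a gap shrinks as the gap grows.
double LogGapCost(int n, double open, double scale) {
  return open + scale * std::log(static_cast<double>(n));
}

// Two-piece affine: the minimum of two affine functions, one tuned for
// short gaps and one (higher open, lower extend) for long gaps.
double TwoPieceAffineGapCost(int n, double open1, double extend1,
                             double open2, double extend2) {
  return std::min(AffineGapCost(n, open1, extend1),
                  AffineGapCost(n, open2, extend2));
}
```

For example, with open1 = 4, extend1 = 2, open2 = 24, and extend2 = 1, a gap of length 1 is charged by the first piece and a gap of length 30 by the second, which keeps long structural gaps comparatively cheap while remaining tractable for dynamic programming.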

Inconsistent output directory of libspoa.a

Hi Robert,

I recently incorporated spoa into a project of mine using CMake and the ExternalProject functionality of CMake. I would like to make sure that the lib/libspoa.a is consistently located, so it can be used to initialize the CMake library:

https://github.com/vgteam/GetBlunted/blob/025bf7b7ea2b564495019129c4f22f1902beab68/CMakeLists.txt#L274

However, it appears that this directory is renamed lib64 in certain contexts. The case where this happens for me is on CentOS 7-8.2003.0.el7.centos.x86_64 with the following CPU:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 46
model name  : Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz
stepping    : 6
microcode   : 0xd
cpu MHz     : 2260.911
cache size  : 24576 KB
physical id : 0
siblings    : 16
core id     : 0
cpu cores   : 8
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida spec_ctrl intel_stibp flush_l1d
bogomips    : 4521.82
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual

It does not happen on macOS (unknown CPU) or on Ubuntu 18.04 with the following CPU:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 158
model name	: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
stepping	: 9
microcode	: 0xde
cpu MHz		: 800.007
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds
bogomips	: 8400.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:
processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6

Is there any way to predict when lib will be renamed to lib64?

Memory usage

Hi,
spoa is a fantastic tool, but the memory usage is a bit high. Is there any way to adjust the parameters to make it smaller?

Question: Obtain base qualities for the consensus sequence

I was wondering if @rvaser had any thoughts on how to obtain a base quality value for each base in the returned consensus sequence. I was thinking it could be one of two things, but there surely are more:

  1. In the simplest form, the consensus base quality is the probability of the consensus base being wrong given only the non-N input bases that cover/overlap/align-to that consensus base. The probability of error for each input base could either be a fixed value (FASTA) or the input base quality (FASTQ).

  2. In a more complicated form, the consensus base quality could be produced using base alignment quality (BAQ). This captures both the probability of the base being wrong, as well as the base being misaligned.

My intended use case for spoa and the consensus call is in DNA sequencing, where we know that a set of sequencing reads all observe (originate from) the same source molecule, using molecular tagging (i.e. unique molecular identifiers or molecular barcodes). In this case, all the sequencing reads should be the same, but due to various errors in the sample and library preparation, as well as the act of sequencing, we get random and systematic errors. The former are what consensus calling could squash (e.g. https://github.com/mikessh/mageri). I am working on incorporating MSA (see these development branches in callerpp and fgbio). I'd be happy to discuss further over email if that works.
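The first, simpler option can be sketched as a per-column Bayesian calculation. The model below is an assumption made for illustration: a uniform prior over A, C, G, T, independent observations, and each error equally likely to produce any of the three other bases. Observation, ConsensusErrorProbability, and PhredQuality are hypothetical names, not spoa functions, and the BAQ-style variant is not covered.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// One input base aligned to a consensus position (N bases are skipped
// by the caller before building the column).
struct Observation {
  char base;          // one of A, C, G, T
  double error_prob;  // fixed value for FASTA, or 10^(-Q/10) from FASTQ
};

// Posterior probability that `consensus` is NOT the true base,
// given the observations in one alignment column.
double ConsensusErrorProbability(const std::vector<Observation>& column,
                                 char consensus) {
  const std::string alphabet = "ACGT";
  double right = 0.0;  // likelihood mass for the consensus base
  double wrong = 0.0;  // likelihood mass for all other bases
  for (char candidate : alphabet) {
    double likelihood = 1.0;
    for (const auto& obs : column) {
      likelihood *= (obs.base == candidate) ? 1.0 - obs.error_prob
                                            : obs.error_prob / 3.0;
    }
    (candidate == consensus ? right : wrong) += likelihood;
  }
  const double total = right + wrong;
  return total > 0.0 ? wrong / total : 1.0;
}

// Converts an error probability to a capped Phred-scaled quality.
double PhredQuality(double error_prob, double max_quality = 60.0) {
  if (error_prob <= 0.0) return max_quality;
  return std::min(max_quality, -10.0 * std::log10(error_prob));
}
```

A single A observed with error probability 0.1 yields a consensus error probability of 0.1 (Phred 10); a second agreeing observation lowers the error probability, which is the behaviour expected when reads from the same source molecule agree. For deep columns, summing log-likelihoods instead of multiplying probabilities would avoid underflow.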

SOVERSION is something else than version of the software

Hi,
I have packaged spoa for Debian and intend to upgrade the package to your latest upstream version, 1.1.5. I realised that you are simply reusing the version of the software as the soversion. This is not how the soversion should be handled. It should only be bumped when the ABI changes; if a new version does not change the ABI, the soversion should stay the same. Since I realised that there actually is an ABI change in this release (some symbols are missing), I will simply stick with your soversion so as not to deviate from your code too much. But if you release a new version of spoa without any ABI change (ABI changes should be restricted to those that are really necessary anyway), I'd recommend sticking to soversion 1.1.5, and for the next ABI change I'd recommend soversion 2 (or something like this).
Kind regards, Andreas.

Unexpected alignment results

The attached program uses spoa to compute an MSA of 6 sequences. The sequences only differ in the length of a homopolymer run consisting of a number of T's between 14 and 19. Each sequence is entered in the alignment with a weight. The sequences are entered in order of decreasing weight.

The input sequences and weights are as follows:

    {"GACAACCTGTTTTTTTTTTTTTTTTGAGA", 6},    // T16, weight 6
    {"GACAACCTGTTTTTTTTTTTTTTTGAGA", 5},     // T15, weight 5
    {"GACAACCTGTTTTTTTTTTTTTTGAGA", 4},      // T14, weight 4
    {"GACAACCTGTTTTTTTTTTTTTTTTTTGAGA", 3},  // T18, weight 3
    {"GACAACCTGTTTTTTTTTTTTTTTTTTTGAGA", 2}, // T19, weight 2
    {"GACAACCTGTTTTTTTTTTTTTTTTTGAGA", 2},   // T17, weight 2

The computed consensus is

GACAACCTGTTTTTTTTTTTTTTTTTTGAGA

This has T18. This is unexpected, as the sequence with T18 has weight 3, while sequences shorter than T18 have total weight 17, and sequences longer than T18 have weight just 2. The weighted average of the input lengths of the homopolymer runs is 16.05. So I would have expected the consensus to have T16, or possibly T17, but certainly not T18.

Any suggestion on how to improve on this result? The reason for entering the sequences in order of decreasing weight is the result of some discussion we had in another issue a long time ago (when I was operating as GitHub user @paoloczi).

See the attached code for more details. It is a modified version of the test program on your README page. This was built with spoa from tag 4.0.8 and the following command line:

g++ -std=c++17 spoaTest.cpp -lspoa

spoaTest.cpp.gz

Install error: ‘Alignment’ has not been declared

Hi rvaser,

I tried to install the tool and got these errors. Do you know how to fix them?
Thanks,
Haojing

io2 09:37:11 ~/bin/spoa/build
$ make
[ 33%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
In file included from /public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:3:
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:29: error: expected nested-name-specifier before ‘Alignment’
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:29: error: ‘Alignment’ has not been declared
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:29: error: expected ‘;’ before ‘=’ token
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:29: error: expected unqualified-id before ‘=’ token
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:61: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/include/spoa/alignment_engine.hpp:66: error: ‘Alignment’ does not name a type
In file included from /public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:10:
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:20: error: ‘spoa::SisdAlignmentEngine::SisdAlignmentEngine(spoa::SisdAlignmentEngine&&)’ cannot be defaulted
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:21: error: ‘spoa::SisdAlignmentEngine& spoa::SisdAlignmentEngine::operator=(spoa::SisdAlignmentEngine&&)’ cannot be defaulted
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:37: error: expected ‘;’ before ‘override’
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:39: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:55: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:60: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:65: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/sisd_alignment_engine.hpp:77: error: expected ‘;’ before ‘noexcept’
In file included from /public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:11:
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:34: error: ‘spoa::SimdAlignmentEngine<A>::SimdAlignmentEngine(spoa::SimdAlignmentEngine<A>&&)’ cannot be defaulted
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:51: error: expected ‘;’ before ‘override’
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:53: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:80: error: expected constructor, destructor, or type conversion before ‘Linear’
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:86: error: expected constructor, destructor, or type conversion before ‘Affine’
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:92: error: expected constructor, destructor, or type conversion before ‘Convex’
/public/home/shaohaojing/bin/spoa/src/simd_alignment_engine.hpp:108: error: expected initializer before ‘noexcept’
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:94: error: ‘Alignment’ does not name a type
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp: In member function ‘int64_t spoa::AlignmentEngine::WorstCaseAlignmentScore(int64_t, int64_t) const’:
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:104: error: expected primary-expression before ‘[’ token
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:104: error: expected primary-expression before ‘]’ token
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:104: error: expected primary-expression before ‘len’
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:104: error: unable to deduce ‘auto’ from ‘<expression error>’
/public/home/shaohaojing/bin/spoa/src/alignment_engine.cpp:104: error: expected ‘,’ or ‘;’ before ‘{’ token
make[2]: *** [CMakeFiles/spoa.dir/src/alignment_engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2

Unoptimal consensus

Hi, thank you for the useful tool.

I've noticed that sometimes the consensus (in global alignment mode) tends to be the longest sequence in the set, even when it is clear that this is not the "right answer".
For example, for the test below I receive TTATAGTATATATTATATAATATATAAATATAATATACATTAAT as the consensus sequence, regardless of the scoring function (I tried the default, edit distance, and some others, e.g. -e -1 -g -8 -l 1 -m 10 -n -8) or the read order. The MSA itself looks OK. Switching to local alignment did not help either.

Do you have any recommendations on how to overcome this issue?

It seems that this issue mostly happens when there is an insertion at the beginning of one of the sequences, e.g. in the test below there is an extra T in the first position. Possibly in such a case the "correct" paths through the POA graph are not scored?

>1
TATAGTATATATTATATAATATATAATATAATATACATTAAT
>2
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>3
TATAGTATATATTATATAATATATAAATAAATATACATTAAT
>4
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>5
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>6
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>7
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>8
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>9
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>10
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>11
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>12
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>13
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>14
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>15
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>16
TTATAGTATATATTATATAATATATAAATATAATATACATTAAT
>17
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>18
TATAGTATATATTATATAATATATAAATATAATATACATTAAT
>19
TATAGTATATATTATATAATATATAAATATAATATACATTAAT

spoa: make errors

When installing spoa, I get the following errors while running 'make'. I tried installing the previous release package and it reported the same errors. I really do not know what they mean.

In file included from ~/spoa-v4.0.7/src/alignment_engine.cpp:3:
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:29: error: expected nested-name-specifier before 'Alignment'
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:29: error: 'Alignment' has not been declared
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:29: error: expected ';' before '=' token
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:29: error: expected unqualified-id before '=' token
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:61: error: 'Alignment' does not name a type
~/spoa-v4.0.7/include/spoa/alignment_engine.hpp:66: error: 'Alignment' does not name a type
In file included from ~/spoa-v4.0.7/src/alignment_engine.cpp:10:
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:20: error: 'spoa::SisdAlignmentEngine::SisdAlignmentEngine(spoa::SisdAlignmentEngine&&)' cannot be defaulted
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:21: error: 'spoa::SisdAlignmentEngine& spoa::SisdAlignmentEngine::operator=(spoa::SisdAlignmentEngine&&)' cannot be defaulted
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:37: error: expected ';' before 'override'
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:39: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:55: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:60: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:65: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/sisd_alignment_engine.hpp:77: error: expected ';' before 'noexcept'
In file included from ~/spoa-v4.0.7/src/alignment_engine.cpp:11:
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:34: error: 'spoa::SimdAlignmentEngine<A>::SimdAlignmentEngine(spoa::SimdAlignmentEngine<A>&&)' cannot be defaulted
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:51: error: expected ';' before 'override'
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:53: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:80: error: expected constructor, destructor, or type conversion before 'Linear'
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:86: error: expected constructor, destructor, or type conversion before 'Affine'
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:92: error: expected constructor, destructor, or type conversion before 'Convex'
~/spoa-v4.0.7/src/simd_alignment_engine.hpp:108: error: expected initializer before 'noexcept'
~/spoa-v4.0.7/src/alignment_engine.cpp:94: error: 'Alignment' does not name a type
~/spoa-v4.0.7/src/alignment_engine.cpp: In member function 'int64_t spoa::AlignmentEngine::WorstCaseAlignmentScore(int64_t, int64_t) const':
~/spoa-v4.0.7/src/alignment_engine.cpp:104: error: expected primary-expression before '[' token
~/spoa-v4.0.7/src/alignment_engine.cpp:104: error: expected primary-expression before ']' token
~/spoa-v4.0.7/src/alignment_engine.cpp:104: error: expected primary-expression before 'len'
~/spoa-v4.0.7/src/alignment_engine.cpp:104: error: unable to deduce 'auto' from '<expression error>'
~/spoa-v4.0.7/src/alignment_engine.cpp:104: error: expected ',' or ';' before '{' token
make[2]: *** [CMakeFiles/spoa.dir/build.make:63: CMakeFiles/spoa.dir/src/alignment_engine.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:73: CMakeFiles/spoa.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

Affine gap penalties

Hi again,

[At the risk of asking something that would be obvious after reading the POA paper] Is it possible (or practically feasible) to implement affine gap penalties when aligning a sequence to the graph?

Unable to build shared library for v3.3.0 on Ubuntu 20.04 LTS with gcc v9.3.0

Thanks for putting out v3.3.0. There seems to be an issue with building a shared library.

Linking CXX shared library lib/libspoa.so
/usr/bin/ld: CMakeFiles/spoa_avx2.dir/src/simd_alignment_engine_dispatch.cpp.o: relocation R_X86_64_PC32 against symbol `_ZTVN4spoa19SimdAlignmentEngineILNS_4ArchE0EEE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/spoa.dir/build.make:133: lib/libspoa.so.4.0.0] Error 1
make[1]: *** [CMakeFiles/Makefile2:188: CMakeFiles/spoa.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

gcc v9.3.0 on Ubuntu 20.04 LTS.

I used the following cmake option while building.

-Dspoa_generate_dispatch=ON

determining alignment score

How would you recommend computing the alignment score from an alignment object?

In my application, I'd like to test this score to estimate if the alignment should be made in the forward or reverse orientation.
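As a rough sketch of the orientation test described above: compute the reverse complement of the read, score both orientations, and keep the better one. The scoring function is passed in as a parameter because how to obtain a score from the library is exactly the open question here; `ReverseComplement` and `ForwardIsBetter` are hypothetical helper names, not part of spoa.

```cpp
#include <algorithm>
#include <functional>
#include <string>

// Reverse complement of a DNA sequence.
// Characters other than A/C/G/T are mapped to 'N'.
std::string ReverseComplement(const std::string& s) {
  std::string rc(s.rbegin(), s.rend());
  std::transform(rc.begin(), rc.end(), rc.begin(), [](char c) {
    switch (c) {
      case 'A': return 'T';
      case 'C': return 'G';
      case 'G': return 'C';
      case 'T': return 'A';
      default:  return 'N';
    }
  });
  return rc;
}

// Hypothetical helper: `score` stands for any function that returns an
// alignment score for a sequence (higher is better). Ties favor forward.
bool ForwardIsBetter(const std::string& read,
                     const std::function<long(const std::string&)>& score) {
  return score(read) >= score(ReverseComplement(read));
}
```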

Build fails on ppc64le

Hi, I am working on building this on Linux on the Power little-endian architecture (ppc64le). The build fails because of the -march=native option, which is not supported on ppc64le. Is this option needed? Could you please take a look and see if something can be done?

Below is the exact error message

[  0%] Building CXX object CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
g++-4.8: error: unrecognized command line option ‘-march=native’
make[2]: *** [CMakeFiles/spoa.dir/src/alignment_engine.cpp.o] Error 1
CMakeFiles/spoa.dir/build.make:62: recipe for target 'CMakeFiles/spoa.dir/src/alignment_engine.cpp.o' failed
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/spoa.dir/all' failed
make[1]: *** [CMakeFiles/spoa.dir/all] Error 2
make: *** [all] Error 2
Makefile:129: recipe for target 'all' failed

https://travis-ci.com/github/gururajrkatti/spoa/jobs/461198016

Please set policy CMP0074 to NEW

Our HPC uses zlib 1.2.7, but spoa requires 1.2.8, and due to the above cmake policy it is not straightforward to use another ZLIB in place (no root access, upgrading is not possible). If you set the above we can set ZLIB_ROOT to allow us to compile this on older systems using conda/mamba envs with newer zlib's.

Cheers

'Cleverly' align sets of redundant sequences

In our applications, it can happen that SPOA is given sets that have many duplicate sequences. For example, this multi-FASTA

smoothxg_into_spoa_pad311_621639_in_1884956ms.zip

has 9280 sequences, of which 2416 are unique.

Is there a way to tweak SPOA to only work on the 2416 sequences, but to weigh them properly with respect to their frequencies in the non-deduplicated set? I smell it could be done, at least theoretically. We would need this feature when using SPOA as a submodule in other projects. The aim is to avoid redundant work while keeping consensus sequences that make sense.
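The deduplication half of this request can be sketched independently of the library: collapse identical sequences and keep a count for each. Whether and how the count can then be passed to the aligner as a per-sequence weight is an assumption that depends on the spoa version in use; `CollapseDuplicates` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Collapse duplicate sequences, preserving first-seen order.
// Returns (sequence, number of occurrences) pairs.
std::vector<std::pair<std::string, std::uint32_t>> CollapseDuplicates(
    const std::vector<std::string>& sequences) {
  std::vector<std::pair<std::string, std::uint32_t>> unique;
  std::unordered_map<std::string, std::size_t> index;  // sequence -> slot in `unique`
  for (const auto& s : sequences) {
    auto it = index.find(s);
    if (it == index.end()) {
      index.emplace(s, unique.size());
      unique.emplace_back(s, 1);
    } else {
      ++unique[it->second].second;
    }
  }
  return unique;
}
```

Each unique sequence would then be aligned once, with its count used as the weight of its contribution to the graph.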

How to prevent gaps at the ends of the alignment?

Hello!
I'm trying to produce an alignment with sPOA that never has gaps at the ends of the alignment, i.e.:

GATTACA
GATTA - -
is forbidden, but
GATTACA
GATT - - A
is fine.

This is possible in my use-case, because the strings are guaranteed to end and begin with the same characters. (Explanation of my use case appended to the end of this issue.)

There are a couple ways I could imagine doing this, but I couldn't find a way to implement them. Here are the two ways:

  1. I could replace the start and end of the string with special characters that have an extremely high match score, e.g. the input strings
    GATTACA
    GATTA
    become
    XATTACX
    XATTX
    . But I couldn't find a way to make character-specific match scores.

  2. I could directly penalize gap open/extends that lead to the end of the alignment string. I didn't see a way to do that either.
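For illustration, the string transformation in the first idea could look like the sketch below. `MaskEnds` is a hypothetical helper, and the sentinel only has an effect if the aligner can be given a high match score for that character, which is the missing piece described above.

```cpp
#include <string>

// Replace the first and last character with a sentinel.
// "GATTACA" becomes "XATTACX", "GATTA" becomes "XATTX".
std::string MaskEnds(std::string s, char sentinel = 'X') {
  if (!s.empty()) {
    s.front() = sentinel;
    s.back() = sentinel;
  }
  return s;
}
```

The original end characters would have to be stored separately so that they can be restored after alignment.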

Is something like this possible to do in sPOA? If not, would you be willing to add the feature?


Explanation of my use case:
I'm working on a tool for VG that simplifies poorly-constructed snarls that contain duplicated sequence information through multiple paths in the snarl. My tool extracts the haplotypes from the snarl, realigns them, and converts the alignment into a replacement snarl for the graph.

For this to work, I need to guarantee that each haplotype still stretches from the source to sink inside the snarl. I.e., the first character of each haplotype and the last character of each haplotype must be guaranteed to be aligned together.

Issue when installing spoa

Dear Robert Vaser and the SPOA team,
I tried to install spoa v4.0.0 but it failed at the cmake step.
Here is the log:

[22:34] morands@frrdcim20: spoa-4.0.0 $ module load cmake/3.18.4
[22:35] morands@frrdcim20: spoa-4.0.0 $ mkdir build
[22:35] morands@frrdcim20: spoa-4.0.0 $ cd build/
[22:35] morands@frrdcim20: build $ cmake ..
-- The CXX compiler identification is GNU 4.8.5
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:48 (add_subdirectory):
  The source directory

    /home/morands/LIBRARIES/spoa-4.0.0/vendor/cereal

  does not contain a CMakeLists.txt file.


-- Configuring incomplete, errors occurred!
See also "/home/morands/LIBRARIES/spoa-4.0.0/build/CMakeFiles/CMakeOutput.log".

Would you have any idea what went wrong during this step?
Thanks in advance for your precious help & time.
Stan

Linking spoa as an ExternalProject in CMake

Hi, I have been working on a project for which I am hoping to link most of my external libraries using the built in External Project method in CMake. In order to do that I need to be able to specify the install dir, which currently defaults to /usr/local/lib/libspoa.a.

My solution generally looks like this (spoa substituted for other libraries):

# Need to explicitly enable ExternalProject functionality
include(ExternalProject)

# Download or update library as an external project
ExternalProject_Add(project_spoa
        GIT_REPOSITORY https://github.com/rvaser/spoa.git
        PREFIX ${CMAKE_CURRENT_BINARY_DIR}/external/
        BUILD_IN_SOURCE True
        INSTALL_DIR ${CMAKE_SOURCE_DIR}/external/spoa/
        INSTALL_COMMAND make INSTALL_PREFIX=${CMAKE_SOURCE_DIR}/external/spoa/ install
        )

# Define INSTALL_DIR as the install directory for external library
ExternalProject_Get_Property(project_spoa INSTALL_DIR)

# Create new library for external project (so it can be linked with main library)
add_library(spoa STATIC IMPORTED)
set_property(TARGET spoa
        PROPERTY IMPORTED_LOCATION ${INSTALL_DIR}/lib/libspoa.a)

# Define library as dependent on the downloaded project
add_dependencies(spoa
        project_spoa
        )

# Define main library as dependent on the downloaded project (transitively)
add_dependencies(Bluntifier spoa)

# Ensure that main library has access to primary dependencies' and secondary dependencies' headers
include_directories(external/spoa/include/)

But currently I get this error:

[100%] Built target spoa
[ 18%] Performing install step for 'project_spoa'
[100%] Built target spoa
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/lib/libspoa.a
CMake Error at cmake_install.cmake:41 (file):
  file INSTALL cannot copy file
  "/home/ryan/code/GetBlunted/build/external/src/project_spoa/lib/libspoa.a"
  to "/usr/local/lib/libspoa.a".

Generally other libraries have some variable in the makefile which allows the prefix to be set. Sometimes INSTALL_PREFIX or just PREFIX. Is there some other variable in spoa that already fills this role? In the past this flag has been set within the make command itself, but for projects which do not use CMake.

Thanks

make: *** No targets specified and no makefile found. Stop.

Hello, I tried to build spoa using the following command.

git clone https://github.com/rvaser/spoa && cd spoa && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make

And it says:
make: *** No targets specified and no makefile found. Stop.

How can I solve this? Your response will be appreciated.

Here's the whole log.

cmake -DCMAKE_BUILD_TYPE=Release .. && make
Cloning into 'spoa'...
remote: Enumerating objects: 1247, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (69/69), done.
Receiving objects: 100% (1247/1247), 497.63 KiB | 4.18 MiB/s, done.
Resolving deltas: 100% (835/835), done.
remote: Total 1247 (delta 91), reused 146 (delta 86), pack-reused 1084
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.22621.
-- The CXX compiler identification is MSVC 19.35.32216.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.35.32215/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ZLIB: C:/Program Files/zlib/lib/zlib.lib (found suitable version "1.2.13", minimum required is "1.2.8")
CMake Deprecation Warning at build/_deps/googletest-src/CMakeLists.txt:4 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is MSVC 19.35.32216.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.35.32215/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Deprecation Warning at build/_deps/googletest-src/googlemock/CMakeLists.txt:45 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


CMake Deprecation Warning at build/_deps/googletest-src/googletest/CMakeLists.txt:56 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Found PythonInterp: C:/Program Files/Python311/python.exe (found version "3.11.2")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Configuring done (23.6s)
-- Generating done (0.1s)
-- Build files have been written to: C:/Users/ASUS/Documents/Repositories/Thesis/External/spoa/build
make: *** No targets specified and no makefile found.  Stop.

Adding a Graph::clear() function

I find that doing a large number of alignments in the same process leads to memory fragmentation and eventual memory allocation failure. This problem could be eliminated easily by providing a Graph::clear() function that clears all of the vectors stored in the Graph and allows a Graph object to be reused. std::vector::clear() does not free the memory, it just sets the size to zero. That way, the vectors' capacities would grow to the capacity needed by the largest alignment, but without causing memory fragmentation, because of the small number of vectors involved.
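A minimal sketch of the reuse pattern, using a hypothetical stand-in struct rather than the real Graph class (whose members are not shown here). It relies only on the standard guarantee that std::vector::clear() leaves capacity() unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an object that owns several vectors.
struct ReusableBuffers {
  std::vector<int> nodes;
  std::vector<int> edges;

  // Drop the contents but keep the allocated storage for the next use.
  void Clear() {
    nodes.clear();
    edges.clear();
  }
};
```

After a few iterations the buffers reach the size of the largest problem seen so far, and later iterations of that size or smaller do not reallocate.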

This is easy to implement and I could just create a fork and do it, but I would prefer if this is done in the main repository. I am using the spoa library in the Shasta assembler.

Without this functionality, a large run needs to be split among small separate processes, and multithreaded parallelism results in intolerable memory fragmentation (each thread using separate Graphs, of course).

Could you add the ability to give a negative "offset" to each vertex in the graph?

Hi,

Thanks for making available this fast and easy-to-use POA library. I'm using it to find consensus alignments of noisy PacBio reads, which contain frequent random insertion (and deletion) errors, and finding that the resulting consensus sequence often contains very weakly supported bases resulting from these noise insertions, which I would prefer to get rid of -- if I understand correctly, this is because generate_consensus() looks for the heaviest-weight path in the graph, and all sequence bases contribute a positive weight, so a base will only be excluded if it appears to conflict with bases in one or more other sequences. Ironically, the very simplistic consensus caller I was originally using, which simply performs a heuristic MSA and then reports only bases from columns that have a strict majority, currently seems to give me better consensus sequences on 2x-10x coverage data because it does a better job of getting rid of these insertions.

I think that what I want could be accomplished by adding a negative offset to each graph vertex -- this would effectively create a "barrier to entry" that only lets the heaviest-path algorithm consider a base for the consensus sequence if it appears in sufficiently many input sequences. Do you agree that this would be a helpful approach? And would it be possible to implement something like this? I would certainly appreciate it!
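For comparison, the simple majority caller mentioned above can be sketched as follows. It is one reading of "strict majority": a base is emitted only if it occurs in more than half of all rows of the column, so a column where gaps dominate is dropped. `MajorityConsensus` is a hypothetical name, and the rows are assumed to be equal-length MSA strings with '-' as the gap character.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Column-wise strict-majority consensus over an MSA.
// Assumes all rows have the same length; '-' marks a gap.
std::string MajorityConsensus(const std::vector<std::string>& msa) {
  std::string consensus;
  if (msa.empty()) return consensus;
  const std::size_t gap = static_cast<unsigned char>('-');
  for (std::size_t col = 0; col < msa.front().size(); ++col) {
    std::array<std::size_t, 256> counts{};  // occurrences of each character
    for (const auto& row : msa) {
      ++counts[static_cast<unsigned char>(row[col])];
    }
    // At most one character can exceed half of the rows.
    for (std::size_t c = 0; c < counts.size(); ++c) {
      if (c != gap && 2 * counts[c] > msa.size()) {
        consensus += static_cast<char>(c);
      }
    }
  }
  return consensus;
}
```

With rows "AC-T", "ACGT" and "AC-T", the third column is mostly gaps, so the lone inserted G is left out of the result.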

Unexpected multiple sequence alignment

I am seeing an unexpected MSA and consensus sequence. By eye, I can see a more parsimonious result. See below for details. Any insight would be appreciated.

Actual Output
Consensus (142)
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Multiple sequence alignment
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGG------------------C-----------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGG------------------CCTG--------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAAC----AGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAAC----AGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Expected Output
Consensus (142)
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGGC----TATCCCCAGCCCTTACCGGCGTGT
Multiple sequence alignment
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC--------------------------------
CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGC-------------------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTG----------------------------------------------
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTG-CCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT
Source: example.cpp
#include <cstdio>   // fprintf
#include <cstdlib>  // atoi
#include <string>
#include <vector>

#include "spoa/spoa.hpp"

int main(int argc, char** argv) {

	std::vector<std::string> sequences = {
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCATCCACCAGGCTGCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGCGGGCGCTGTGGACAGCGCTCCTTACCACC",
		"CCCGCCCCTGAAAGCCTTCGCGCCCGCTGCCCCTTCCTCCAGGCTGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGC",
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCGGCCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTG",
		"CCCGCCCCTGAAAGCCTTCGCGCACGCTGCCCCTTCCTCCAGGCTGCCCCCCTGGATGGACGGAGCAGCAGGGCCAGCGAGGGCGCTGTGGCCTGAGCTCCTTACCAACAGCCAGGCTATCCCCAGCCCTTACCGGCGTGT"
	};
	
	auto alignment_engine = spoa::createAlignmentEngine(static_cast<spoa::AlignmentType>(atoi(argv[1])), atoi(argv[2]), atoi(argv[3]), atoi(argv[4]));

	auto graph = spoa::createGraph();

	for (const auto& it: sequences) {
		auto alignment = alignment_engine->align_sequence_with_graph(it, graph);
		graph->add_alignment(alignment, it);
	}

	std::string consensus = graph->generate_consensus();

	fprintf(stderr, "Consensus (%zu)\n", consensus.size());
	fprintf(stderr, "%s\n", consensus.c_str());

	std::vector<std::string> msa;
	graph->generate_multiple_sequence_alignment(msa, true);

	fprintf(stderr, "Multiple sequence alignment\n");
	for (const auto& it: msa) {
		fprintf(stderr, "%s\n", it.c_str());
	}

	return 0;
}

Spoa commit : 783d7b6

Compiled with g++ example.cpp -std=c++11 -Iinclude/ -Lbuild/lib/ -lspoa -o example

Run with ./example 0 5 -4 -8

Potential Meson addition

Hi @rvaser
Meson is a new build system that aims to improve on the imperfections of CMake: it has a nicer syntax, is non-Turing-complete, is faster, and allows trivial composability. One of the big drawbacks of CMake is that it was never designed to be composable, and as such all ways of bundling it are ultimately painful and break in idiosyncratic ways. Would you be open to me contributing a Meson PR? You can always just call it a "best effort" thing, and all problems with it should be dealt with by me. Many projects are switching to Meson now, such as GNOME, the complete X.org stack and many others, hence this is not just a niche side project of some random guy.

Segmentation fault

Hi @rvaser,

Not sure what is wrong with this command. Oddly, it works when the three FASTA sequences are passed in separately.

Unrelated, is there a way to modify the dot output to condense unique paths into a single node?

(base) [zkronenberg@mp0709-sge chr2_q37_1]$ ~/tools/spoa/build/bin/spoa -d test homSap_combined.fasta
Segmentation fault (core dumped)

homSap_combined.fasta.zip

Using quality values from FastQ to generate consensus/graph

Is there any option to feed FastQ data to your algorithm in a way that it will use the quality information? From what I have seen I do not think so, but I am asking anyway; maybe you have something experimental?

Do you think this would make sense in general and if yes, would it be a big effort to implement this in your view (if not already existing)?

Using exceptions in case of errors rather than calling exit

The code currently writes a message to stderr, then calls exit if an error occurs. Throwing an std::exception instead would be a more robust and standard behavior, as it permits proper destruction of data structures owned by the caller, as well as reacting to the error with application-specific messages or with custom behaviors.
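A minimal sketch of the difference from the caller's point of view. `CheckWeights` and `TryCheck` are hypothetical functions written for illustration; they are not the library's code.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Library side (hypothetical): report the problem by throwing instead of
// printing to stderr and calling exit().
void CheckWeights(const std::string& sequence,
                  const std::vector<unsigned>& weights) {
  if (sequence.size() != weights.size()) {
    throw std::invalid_argument("sequence and weights are of unequal size");
  }
}

// Caller side: the application decides what to do with the error, and its
// own objects are destroyed normally during stack unwinding.
bool TryCheck(const std::string& sequence,
              const std::vector<unsigned>& weights) {
  try {
    CheckWeights(sequence, weights);
    return true;
  } catch (const std::invalid_argument&) {
    // Application-specific handling (logging, skipping the input, ...).
    return false;
  }
}
```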

Num modified

Hi Robert, we are using the mito branch of spoa, and would like to know what the "num modified" as given in the output of each run refers to.

Johnathan

Unexpected starting bases for consensus sequence

Hi,
Thank you for developing this great library!
I am using spoa as a library to generate a consensus sequence from similar reads that contain some errors.
In the example below, why does spoa skip the first two bases, 'G' and 'C', in the consensus generation?
Is it because the heaviest bundle traversal traceback stops at the 254 -G node? Is it possible to change this behavior? Sorry if this is an obvious question; I am not proficient at C++ but I did attempt to read the code.

I used these parameters : 2(semi global) 5(match) -4(mismatch) -10(gap open) -8(gap extend)
These were the sequences :
example_sequences.txt.txt

This is a snippet of the graph output :
graph_first_few_bases

Thank You

4.0.5 build -DBUILD_SHARED_LIBS=ON : CMakeFiles/spoa.dir/build.make:144: *** missing separator. Stop.

Excerpt from the cmake generated CMakeFiles/spoa.dir/build.make at line 144

lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/src/alignment_engine.cpp.o
lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/src/graph.cpp.o
lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/src/sisd_alignment_engine.cpp.o
lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/src/dispatcher.cpp.o
lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/build.make
lib/libspoa.so.$(EQUALS);7.0.0: CMakeFiles/spoa.dir/link.txt
        @$(CMAKE_COMMAND) -E cmake_echo_color --switch=$(COLOR) --green --bold --progress-dir=/home/michael/src/spoa/spoa/build/CMakeFiles --progress-num=$(CMAKE_PROGRESS_5) "Linking CXX shared library lib/libspoa.so"
        $(CMAKE_COMMAND) -E cmake_link_script CMakeFiles/spoa.dir/link.txt --verbose=$(VERBOSE)
        $(CMAKE_COMMAND) -E cmake_symlink_library "lib/libspoa.so.=;7.0.0" "lib/libspoa.so.=;7.0.0" lib/libspoa.so

Without -DBUILD_SHARED_LIBS=ON the build succeeds.

Full build log:

$ cmake -Dspoa_build_executable=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON .. && make
-- The CXX compiler identification is GNU 10.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/lib/ccache/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at vendor/cereal/CMakeLists.txt:2 (project):
  Policy CMP0048 is not set: project() command manages VERSION variables.
  Run "cmake --help-policy CMP0048" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.

  The following variable(s) would be set to empty:

    PROJECT_VERSION
    PROJECT_VERSION_MAJOR
    PROJECT_VERSION_MINOR
    PROJECT_VERSION_PATCH
This warning is for project developers.  Use -Wno-dev to suppress it.

-- The C compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/lib/ccache/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake (found version "1.74.0") found components: serialization 
-- boost_variant.cpp
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.20") found components: doxygen dot 
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/michael/src/spoa/spoa/build
CMakeFiles/spoa.dir/build.make:141: *** missing separator.  Stop.
make[1]: *** [CMakeFiles/Makefile2:481: CMakeFiles/spoa.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

Debian was depending on this to make a shared library, so this is preventing us from releasing an updated Debian package of spoa 4.0.5.

Segmentation fault on memory exhaustion. Should error out instead.

Spoa segfaults if it runs out of memory. Instead, it should throw an exception or return an error code. The attached test program segfaulted on my 32 GB laptop. I realize sequences this long are outside of what spoa can reasonably do, but a segfault is not an acceptable termination mode, because client code does not get a chance to write a meaningful message for the end user.

testSpoa.cpp.gz

[bioparser::FastqParser] error: invalid file format!

Hi @rvaser,

I get
[bioparser::FastqParser] error: invalid file format!

when running

spoa ~/tmp/stefan_isonclust/bugfix/fastq_file_not_working.fq -l 0 -r 0 -g -2

My version of spoa

(base)  kxs624$ spoa --version
v1.1.5

I have attached the file fastq_file_not_working.fq (but I renamed it .txt in order to upload it here)
fastq_file_not_working.txt

I looked through the file for formatting problems, but it is automatically generated, so I doubt that is the cause. I note, however, that the accession names are pretty long. Any idea what it could be?

Fails with the latest cereal: ld: error: unable to find library -lcereal

spoa-4.0.7 fails to find cereal rev. 64f50dbd:

===>  Building for spoa-4.0.7_1
[1/3] : && /usr/bin/c++ -fPIC -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer  -isystem /usr/local/include -Wall -Wextra -pedantic -fopenmp-simd -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer  -isystem /usr/local/include  -lz -lcpu_features -fstack-protector-strong -L/usr/local/lib -shared -Wl,-soname,libspoa.so.7.0.0 -o lib/libspoa.so.7.0.0 CMakeFiles/spoa.dir/src/alignment_engine.cpp.o CMakeFiles/spoa.dir/src/graph.cpp.o CMakeFiles/spoa.dir/src/sisd_alignment_engine.cpp.o CMakeFiles/spoa.dir/src/dispatcher.cpp.o  -lcereal && :
FAILED: lib/libspoa.so.7.0.0 
: && /usr/bin/c++ -fPIC -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer  -isystem /usr/local/include -Wall -Wextra -pedantic -fopenmp-simd -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer  -isystem /usr/local/include  -lz -lcpu_features -fstack-protector-strong -L/usr/local/lib -shared -Wl,-soname,libspoa.so.7.0.0 -o lib/libspoa.so.7.0.0 CMakeFiles/spoa.dir/src/alignment_engine.cpp.o CMakeFiles/spoa.dir/src/graph.cpp.o CMakeFiles/spoa.dir/src/sisd_alignment_engine.cpp.o CMakeFiles/spoa.dir/src/dispatcher.cpp.o  -lcereal && :
ld: error: unable to find library -lcereal
c++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

FreeBSD 12.2

Illegal Instruction

Hello,

Thank you very much for your great tool! I used SPOA as part of my tool (by including spoa.hpp as described on GitHub and compiling from source). I am running my tool on a cluster and noticed that SPOA fails with "Illegal instruction" on some nodes with Intel(R) Xeon(R) E5-2680 CPUs. Both SSE4.1 and AVX2 are supported by this processor:

flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts.

It would be great if you could help me understand what is causing this, and if there is something that can be done to fix it.
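One way to compare nodes is to check the flags line programmatically. The sketch below does an exact token match on a whitespace-separated flags string such as the one from /proc/cpuinfo; `HasCpuFlag` is a hypothetical helper. Exact matching matters because "avx" is a prefix of "avx2".

```cpp
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole token in a whitespace-separated
// CPU flags line (for example the "flags" entry of /proc/cpuinfo).
bool HasCpuFlag(const std::string& flags_line, const std::string& flag) {
  std::istringstream in(flags_line);
  std::string token;
  while (in >> token) {
    if (token == flag) return true;
  }
  return false;
}
```

The check is only meaningful if the flags are read on the node where the crash happens, since nodes in the same cluster may have different CPU generations.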

Best,
Helia

Illegal instruction

A Shasta user (@zhenzhenyang-psu) is getting an "Illegal instruction" signal while running the assembly phase that uses the spoa library. The signal occurs immediately on starting that assembly phase. He is using Shasta 0.5.0, which links in Spoa 3.4.0. He is running on an old CentOS 6 system, and it is likely that the processors are missing some of the newer instruction sets. I suspect an error in the recently introduced cpu dispatching capability in Spoa.

I will ask @zhenzhenyang-psu to report the processor flags for his system, so we will know exactly what is the available instruction set.

Depending on how quickly you are able to address this, we may have to downgrade Shasta back to Spoa 0.3.0, or perhaps just provide this user a temporary Shasta build done with Spoa 0.3.0.

For reference, the original Shasta issue is here. The title is misleading because the issue was initially filed for a different problem.

Spoa v4.0.0 doesn't work when input is FASTA

Hi Robert,

Recently I upgraded Spoa from v3.0.1 to v4.0.0 and everything works great when the input file is in FASTQ format. However, when feeding it a FASTA file, it shows:

$ spoa -r 2 test.fa
[spoa::Graph::AddAlignment] error: sequence and weights are of unequal size!

Thanks,
Wen-Wei
