Comments (10)

dancazarin commented on June 5, 2024

Hi @xkzl ,
KFR also supports in-place mode. Just pass the same pointer as both the in and out parameters, as shown here:

dft.execute(data, data, tmp);

The temp parameter is optional in KFR6. Passing nullptr will result in a buffer being allocated during DFT execution.
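
The calling convention described here (same pointer for input and output, optional scratch buffer) can be illustrated with a small stand-alone sketch. It does not use KFR and makes no claim about how KFR implements this internally; `reverse_with_scratch` and its parameters are hypothetical names chosen purely for illustration. The routine stages its input in scratch space so that `out == in` is safe, and allocates that space itself when `temp` is `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical routine (not part of KFR): writes the reverse of in[0..n) to out[0..n).
// `temp` must point to at least n * sizeof(double) bytes, or be nullptr.
// Returns true if the routine had to allocate its own scratch buffer.
inline bool reverse_with_scratch(double* out, const double* in, std::size_t n,
                                 unsigned char* temp)
{
    if (n == 0)
        return false; // nothing to do, no scratch needed
    assert(out != nullptr && in != nullptr);

    std::vector<unsigned char> owned;
    const bool allocated = (temp == nullptr);
    if (allocated)
    {
        owned.resize(n * sizeof(double)); // allocation happens during execution
        temp = owned.data();
    }

    // Stage the input in scratch so that out == in (in-place) is safe.
    std::memcpy(temp, in, n * sizeof(double));
    for (std::size_t i = 0; i < n; ++i)
        std::memcpy(&out[i], temp + (n - 1 - i) * sizeof(double), sizeof(double));
    return allocated;
}
```

Calling `reverse_with_scratch(data, data, n, nullptr)` reverses `data` in place at the cost of one allocation per call. Passing a preallocated buffer of sufficient size avoids that allocation, which is presumably the reason such APIs expose a `temp` argument at all.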

The undefined reference problem is resolved, so I'll close this as completed. If you have further questions about using KFR, please open another topic and start the subject with "Question".

from kfr.

xkzl commented on June 5, 2024

It seems the problem gets fixed when I enable the C API and use clang instead of gcc.
Do you have any clue why this does not happen on macOS? (I don't enable the C API on macOS, but I am using clang by default.)

xkzl commented on June 5, 2024

Hi @dancazarin,
I am not sure this is related, but I ran into some memory issues while preparing my Docker environment, installing KFR with the C API option enabled.

I could trace them back to KFR, but no further.
I am wondering whether the Docker container is messing up the architecture detection.
That might be the reason for my initial issue, the undefined reference to void kfr::sse2::dft_initialize, which was not happening on my ARM macOS machine.

Could you give me your opinion?

Here is the basic code I am using (minimal reproducer, compiled with Clang 14):

#include <kfr/all.hpp>

#include <limits>
#include <vector>

int main() {

        std::vector<double> input = {2, 1.739320897989603321, 1.111585192241478115, 0.4651946789936177717, 0.08686620820818069522, 0.0002655974925263326642, -0.02775467909650402784, -0.2803625394325067188, -0.8450476981473055149, -1.520640638903657083, -1.950949887071400601, -1.884110337648723377, -1.359401810914456554, -0.6783948405585837893, -0.1824768515010283809, -0.007053706735352200004, -1.648065648398194479e-06, 0.1349731116370727346, 0.5828102020663453731, 1.248565342109722032, 1.807957217368342029, 1.940134795198463324, 1.565046483956072709, 0.9035783544276514423, 0.3142685529761814478, 0.03158868811086192052, 0.005847729973192756711, -0.0339825311613773523, -0.3451430443613818233, -0.9470403833054994447, -1.583067018304429308, -1.899767734308779099, 
-1.707105248653190577, -1.120553735280900964, -0.4759834468700876453, -0.08241995593086165472, -0.001843033411668155291, -0.02380995220001747709, 0.1476495484979366002, 0.6409879499056966301, 1.294975547260964532, 1.763524549667581987, 1.768537748147161359, 1.307360744592875212, 0.6557049980836907599, 0.1636171517239771722, 0.0006691861693925252883, 0.04504656884606737799, 4.009608726850157192e-05, -0.3538099960611205685, -0.9671239436514901255, -1.540120599675593338, -1.738616326262960143, -1.442617673474021478, -0.8368828077476384575, -0.2738093095562325607, -0.0135873653540896748, -0.04044781361531966934, -0.09483191091585801979, 0.105142820874979781, 0.625333835075665645, 1.245681739775967634, 1.614243214658051206, 1.507912083596089525, 
0.9999956661224340682, 0.4059336398723785155, 0.04860598279536287175, 0.02298352625783861569, 0.139944074195321233, 0.09089309181000310156, -0.2952469203305729817, -0.9021909541431394342, -1.400489565173038686, -1.489959898003710537, -1.124669105408877767, -0.5477393598695156074, -0.1090929340403639924, -0.005814388456486995968, -0.1439445460039573133, -0.2267643411384250851, -0.0001496804328339733782, 0.5353400686655153118, 1.110286691496064071, 1.382280383200549911, 1.192006447523911383, 0.6830002733063942344, 0.1930122233016666466, 0.0002655075168004892346, 0.1191855088698750842, 0.3018140241168703608, 0.2426430223038470169, -0.1720216412112436644, -0.7632948910358308137, -1.186186731172929498, -1.186863009297784544, -0.7933030053527594383, 
-0.2928886235632580659, -0.01408790495665835328, -0.07983941540372113677, -0.32175818914851817, -0.4203328116083723254, -0.1622646315032800768, 0.3840676702091681549, 0.9109677059972636215, 1.0997953174376629, 0.8602062280336781885, 0.3965152038749057417, 0.05022723960886350814, 0.0397863100425041652, 0.2974704968087940049, 0.5284315085803341638, 0.4458087572456688186, 0.0002915729722856205382, -0.573226470833971713, -0.9284508132084656751, -0.8675184009328303913, -0.4883275828033899479, -0.1062563444335613005, -0.01061700574003038126, -0.2432181864126915505, -0.5692551352464521042, -0.6626326063906402553, -0.362726488689640969, 0.1954378094992745007, 0.6782248283242355846, 0.8034232606099845908, 0.5512864429180411863, 0.1745466132404353277, 
3.750000254026371867e-11, 0.174574500241098135, 0.5514276210715076676, 0.8037730482777486474, 0.678796468659037755, 0.1961264841444000095, -0.3620935798432189934, -0.6621958297192190868, -0.5690459594299979162, -0.2431626230490116503, -0.01061378163628759475, -0.1062677193952378946, -0.4884117673409048566, -0.8677783455873464558, -0.9289393958950559194, -0.57388250175807487, -0.0003748782435482769619, 0.4452956641709209795, 0.5281479970371536492, 0.2973753826725218374, 0.03977595641660955361, 0.05023093234604798785, 0.3965561677698070109, 0.8603807791328722532, 1.10018652447662979, 0.9115611760747557302, 0.3847361702402604111, -0.1616934265338785814, -0.4199748001128623121, -0.3216128341036101346, -0.07981477945678633334, -0.01408925791060520495, 
-0.2929008666464609778, -0.7934026902671815762, -1.187150750346630534, -1.186692897764935806, -0.7639327698073674622, -0.1726263121436744929, 0.2422174343547515429, 0.3016108461399393259, 0.1191374217302022115, 0.0002658332251878870421, 0.1930090393869714482, 0.6830399302061813671, 1.192192783863664118, 1.382681599687336726, 1.110862851948544838, 0.5359489235208109159, 0.0003292931388187352948, -0.2265005691949465161, -0.1438634679802621164, -0.005811096900609521824, -0.1090848211520083211, -0.5477360547020913017, -1.124763378765599287, -1.490246819833502601, -1.400977109464180081, -0.9027724238930221468, -0.2957584341933354177, 0.09057201759866409518, 0.1398219393147618894, 0.0229708955918215274, 0.04859958245892709999, 0.4059045227439800896, 
1.000012956632289285, 1.508084048999398297, 1.614621677191439009, 1.246204626729205511, 0.6258517624661268375, 0.1055112241030421172, -0.09466392670415910149, -0.04041829528812360128, -0.01358504722098169068, -0.2737696651953872018, -0.8368419168514028561, -1.442682239930049182, -1.738873285944682712, -1.540556772698358445, -0.9676188893393139479, -0.3542091984003683525, -0.0001737452054680235336, 0.04499244910407947107, 0.0006693112623704244478, 0.1635788681411923517, 0.6556262001433073028, 1.307332454289188206, 1.768669659872286193, 1.763851396675544025, 1.295417304496725208, 0.6413957587874419275, 0.1479034609939437472, -0.0237251447271462669, -0.001840842993162925033, -0.08239057877232205951, -0.4758862725081852707, -1.12045226567931766, 
-1.707117419203634245, -1.899970116389768426, -1.583427219611244219, -0.9474305420160242797, -0.3454250992832720302, -0.03410080641030275445, 0.00583672780772816896, 0.03157115245176606716, 0.3141698555918571723, 0.903426274219513048, 1.56495221059336842, 1.940206299437091664, 1.80821190267722276, 1.248909684366706907, 0.5831027190560391649, 0.1351229980559817689, 2.471883173358129944e-05, -0.007046752307522965303, -0.1823893677741245667, -0.6782152295543568687, -1.359220791301259945, -1.884053701154217331, -1.951081717343044586, -1.52091158297334883, -0.8453283707555798721, -0.2805367572733507009, -0.02780127474116106043, 0.0002647878664472493948, 0.08679775869842103198, 0.4650088786726502832, 1.111341391684988578, 1.739147565261143447};

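       // Fetch a cached real-to-complex DFT plan for the input length.
       kfr::dft_plan_real_ptr<double> dft = kfr::dft_cache::instance().getreal(kfr::ctype_t<double>(), input.size());
       // Pre-fill the output with NaN so that unwritten bins are easy to spot.
       std::vector<kfr::complex<double>> output(input.size(), std::numeric_limits<double>::quiet_NaN());
       // Scratch buffer passed as the temp argument, sized to input.size() bytes.
       std::vector<kfr::u8> temp(input.size());

       dft->execute(&output[0], &input[0], &temp[0]);

       // Keep only the first N/2 + 1 complex bins of the real transform.
       output.resize(input.size() / 2 + 1);
       kfr::dft_cache::instance().clear();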

       return 0;
}
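
For cross-checking the output of a reproducer like the one above, a naive reference transform can be handy. The following is a sketch in standard C++ only: `naive_real_dft` is a hypothetical helper, not a KFR function, and it assumes the usual unnormalized forward DFT convention, which may or may not match the scaling KFR uses. It also illustrates why only `input.size() / 2 + 1` bins are kept after a real transform: for real input, the remaining bins are complex conjugates of the ones already computed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Reference O(N^2) real-input DFT (hypothetical helper, not KFR).
// Returns only the N/2 + 1 non-redundant bins X[0..N/2];
// for real input the rest follow from X[N-k] = conj(X[k]).
inline std::vector<std::complex<double>> naive_real_dft(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    assert(n > 0);
    const double two_pi = 2.0 * std::acos(-1.0);

    std::vector<std::complex<double>> bins(n / 2 + 1);
    for (std::size_t k = 0; k < bins.size(); ++k)
    {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
        {
            // Reduce k*t modulo n to keep the angle small and accurate.
            const double angle =
                -two_pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        bins[k] = sum;
    }
    return bins;
}
```

With a 4-sample unit impulse as input, the function returns 3 bins, each approximately equal to 1. Comparing such a reference against the library output on a small input is one way to tell a wrong result apart from a memory error.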

Here is the valgrind log output:

==15644== Memcheck, a memory error detector
==15644== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15644== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==15644== Command: ./tests/EnvelopeTest
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91B3: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2b20 is 176 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91BC: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2b10 is 160 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91C5: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2b00 is 144 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91CE: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2af0 is 128 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91D7: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2ae0 is 112 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91E0: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2ad0 is 96 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91E9: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2ac0 is 80 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91F2: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2ab0 is 64 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC91FB: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2aa0 is 48 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC9200: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a90 is 32 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC9205: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a80 is 16 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC920A: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a70 is 0 bytes inside an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC920F: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a60 is 16 bytes before an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC9214: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a50 is 32 bytes before an unallocated block of size 1,336,688 in arena "client"
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC9219: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a40 is 16 bytes after a block of size 256 alloc'd
==15644==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==15644==    by 0x10BAB4: main (in /opt/omu/build/tests/EnvelopeTest)
==15644== 
==15644== Invalid write of size 8
==15644==    at 0x7FC921E: void kfr::sse2::intrinsics::write<false, 32ul, double, (cometa::details::unique_enum_impl<371>::type)371>(cometa::cval_t<bool, false>, double*, kfr::sse2::vec<double, 32ul> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC9120: kfr::sse2::vec<double, 32ul> const& kfr::sse2::vec<double, 32ul>::write<false>(double*, cometa::cval_t<bool, false>) const (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FC5E2A: void kfr::sse2::intrinsics::cwrite<16ul, false, double>(kfr::complex<double>*, kfr::sse2::vec<double, (16ul)*(2)> const&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD94DE: void kfr::sse2::intrinsics::radix4_autosort_pass_first<4ul, false, false, false, false, double>(unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD900E: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
==15644==  Address 0xd8e2a30 is 0 bytes after a block of size 256 alloc'd
==15644==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==15644==    by 0x10BAB4: main (in /opt/omu/build/tests/EnvelopeTest)
==15644== 

valgrind: m_mallocfree.c:303 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 320, hi = 4600849206870085994.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.


host stacktrace:
==15644==    at 0x5804284A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x58042977: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x58042B1B: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x5804C8CF: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x5803AE9A: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x580395B7: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x5803DF3D: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x58038868: ??? (in /usr/libexec/valgrind/memcheck-amd64-linux)
==15644==    by 0x1008D5A766: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 15644)
==15644==    at 0x7F3B94C: _ZN3kfr4sse210intrinsics9simd_readILm8EdEEDvT__NS_16internal_generic10unwrap_bitIT0_E4typeEPKS5_ (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7F3B8DD: kfr::sse2::vec<double, 8ul> kfr::sse2::intrinsics::read<8ul, false, double, (cometa::details::unique_enum_impl<356>::type)356>(cometa::cval_t<bool, false>, cometa::cval_t<unsigned long, 8ul>, double const*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7F5A57A: kfr::sse2::vec<double, 8ul>::vec<false>(double const*, cometa::cval_t<bool, false>) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FDA7B6: kfr::sse2::vec<double, (4ul)*(2)> kfr::sse2::intrinsics::cread_split<4ul, false, false, double>(kfr::complex<double> const*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD963B: void kfr::sse2::intrinsics::radix4_autosort_pass<4ul, false, false, false, false, double>(unsigned long, unsigned long, cometa::cval_t<unsigned long, 4ul>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, kfr::complex<double> const*&) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD9029: void kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute<false>(kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x7FD8D8C: kfr::sse2::intrinsics::fft_specialization<double, 7ul>::do_execute(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) (in /usr/local/lib/libkfr_capi.so)
==15644==    by 0x10D226: void kfr::dft_plan<double>::execute_dft<false>(cometa::cval_t<bool, false>, kfr::complex<double>*, kfr::complex<double> const*, unsigned char*) const (in /opt/omu/build/tests/EnvelopeTest)
==15644==    by 0x10BB24: main (in /opt/omu/build/tests/EnvelopeTest)
client stack range: [0x1FFEFFD000 0x1FFF000FFF] client SP: 0x1FFEFFF2B0
valgrind stack range: [0x1002CAE000 0x1002DADFFF] top usage: 18984 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

from kfr.

dancazarin avatar dancazarin commented on June 5, 2024

Hello @xkzl
The temp array must be allocated with dft->temp_size bytes, not the size of the DFT. If the array is smaller than temp_size (as in your case), KFR writes past the end of the allocated memory block, which is exactly what Valgrind detects.
Example from the KFR repo:

univector<u8> temp(dft.temp_size);

Please take a look at the following doc section about building and linking KFR: https://kfr.dev/docs/latest/installation/#compile-for-multiple-architectures
Note that building the DFT with GCC (and MSVC) is not officially supported. However, Clang-compiled DFT libraries (static or shared) work fine with any compiler, because the ABI is compatible.
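
To make the sizing rule concrete, here is a minimal, library-free sketch. `StubPlan` and `run_with_scratch` are hypothetical names invented for illustration. The stub only mimics a plan that exposes a `temp_size` byte count and an `execute(out, in, temp)` call, as `dft_plan` does in this thread; it is not KFR code and performs no real transform.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a DFT plan. It is NOT KFR code: it only mimics a
// plan with a scratch requirement in bytes and an execute() on raw pointers.
struct StubPlan
{
    std::size_t size;       // transform length, in complex samples
    std::size_t temp_size;  // scratch requirement, in BYTES

    void execute(std::complex<double>* out, const std::complex<double>* in,
                 std::uint8_t* temp) const
    {
        // Touch every scratch byte the plan claims to need. With a buffer
        // smaller than temp_size this loop would write out of bounds.
        for (std::size_t i = 0; i < temp_size; ++i)
            temp[i] = 0;
        // Placeholder for the transform: copy input to output.
        for (std::size_t i = 0; i < size; ++i)
            out[i] = in[i];
    }
};

// Size the scratch buffer from the plan's own temp_size, never from the
// transform length. Returns the number of scratch bytes allocated.
template <typename Plan>
std::size_t run_with_scratch(const Plan& plan, std::complex<double>* out,
                             const std::complex<double>* in)
{
    assert(out != nullptr && in != nullptr);
    std::vector<std::uint8_t> temp(plan.temp_size);  // may legitimately be empty
    plan.execute(out, in, temp.data());
    return temp.size();
}
```

With `StubPlan plan{128, 2048}` (the size and temp_size pair reported further down this thread for KFR 5.2), `run_with_scratch(plan, out.data(), in.data())` allocates 2048 bytes of scratch rather than 128. In real KFR code, the repo example above uses `univector<u8> temp(dft.temp_size)` for the same purpose.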

from kfr.

xkzl avatar xkzl commented on June 5, 2024

Ah, indeed, it's working now! Thank you for your explanation!
This was very helpful.

from kfr.

xkzl avatar xkzl commented on June 5, 2024

Sorry for reopening, I just tried the following:

#include <iostream>
#include <kfr/all.hpp>
int main() {
 
    int n = 128;
    
    kfr::dft_plan<double> dft(n);
    std::cout << dft.temp_size << std::endl;

    return 0;  
}

Is there any reason why this is printing 0?

from kfr.

dancazarin avatar dancazarin commented on June 5, 2024

Please replace the code above with this and rerun:

#include <iostream>
#include <kfr/all.hpp>
int main() {
    int n = 128;
    kfr::println(kfr::library_version()); // print KFR version
    kfr::dft_plan<double> dft(n);
    dft.dump(); // print selected DFT algorithm
    std::cout << dft.temp_size << std::endl;

    return 0;  
}

What does it print now?

from kfr.

xkzl avatar xkzl commented on June 5, 2024

Hi @dancazarin, thank you for replying so quickly.
While running some tests, I noticed a conflict between two of my KFR installations.

I suspect it was caused by a bad configuration on my side.
It now returns a non-zero value:

KFR 5.2.0 neon64 64-bit (clang-15.0.0/macos) +in +ve
fft_specialization<double, 7>(neon64): 0, 128, 2048, 2048, 1, 0, 0, 0, 1
2048

from kfr.

dancazarin avatar dancazarin commented on June 5, 2024

A temp_size of 0 is fine, as is allocating a zero-sized univector (or std::vector). Some DFT sizes require no temp buffer and report 0 in temp_size. But for a double-precision DFT of size 128, KFR 5.2 must return 2048, while previous KFR versions returned zero. That's why I suspected a wrong KFR version.

from kfr.

xkzl avatar xkzl commented on June 5, 2024

Hi @dancazarin

I did some further tests yesterday. Your suggestions made me want to understand the underlying KFR mechanism you mentioned. Thank you for the explanation! I compared many DFT plan declarations:

#include <iostream>
#include <kfr/all.hpp>

int main() {

    kfr::println(kfr::library_version()); // print KFR version

    int n = 16384;
    for(int i = 0; i <= n; i++) {

        kfr::dft_plan<double> dft(i);
        std::cout << ">>>> i = " << i << ": dft.temp_size = " << dft.temp_size << std::endl;
        if(!dft.temp_size) {
            std::cout << ">>>>";
            dft.dump(); // print selected DFT algorithm
        }

    }

    return 0;  
}

I ran the code above and got the following output (full log attached):
dft_plan.log
Here is a summary:

KFR 5.2.0 neon64 64-bit (clang-15.0.0/macos) +in +ve
>>>> i = 0: dft.temp_size = 0
>>>>>>>> i = 1: dft.temp_size = 0
>>>>fft_specialization<double, 0>(neon64): 0, 1, 0, 0, 1, 0, 0, 0, 1
>>>> i = 2: dft.temp_size = 0
>>>>fft_specialization<double, 1>(neon64): 0, 2, 0, 0, 1, 0, 0, 0, 1
>>>> i = 3: dft.temp_size = 64
>>>> i = 4: dft.temp_size = 0
>>>>fft_specialization<double, 2>(neon64): 0, 4, 0, 0, 1, 0, 0, 0, 1
>>>> i = 5: dft.temp_size = 128
>>>> i = 6: dft.temp_size = 128
>>>> i = 7: dft.temp_size = 128
>>>> i = 8: dft.temp_size = 0
>>>>fft_specialization<double, 3>(neon64): 0, 8, 0, 0, 1, 0, 0, 0, 1
>>>> i = 9: dft.temp_size = 192
>>>> i = 10: dft.temp_size = 192
[..]
>>>> i = 15: dft.temp_size = 256
>>>> i = 16: dft.temp_size = 0
>>>>fft_specialization<double, 4>(neon64): 0, 16, 0, 0, 1, 0, 0, 0, 1
>>>> i = 31: dft.temp_size = 1024
>>>> i = 32: dft.temp_size = 0
>>>>fft_specialization<double, 5>(neon64): 0, 32, 0, 0, 1, 0, 0, 0, 1
>>>> i = 64: dft.temp_size = 0
>>>>fft_specialization<double, 6>(neon64): 0, 64, 0, 0, 1, 0, 0, 0, 1
>>>> i = 1024: dft.temp_size = 16384
>>>> i = 1025: dft.temp_size = 17152
[..]
>>>> i = 2048: dft.temp_size = 32768
>>>> i = 4096: dft.temp_size = 0
>>>>fft_stage_impl<double, false, true>(neon64): 4, 4096, 49152, 0, 4, 0, 0, 1, 1
fft_final_stage_impl<double, true, 1024>(neon64): 1024, 1024, 24576, 0, 4, 1024, 0, 1, 1
fft_reorder_stage_impl<double, true>(neon64): 0, 4096, 0, 0, 1, 0, 0, 0, 1
[..]
>>>> i = 8191: dft.temp_size = 131072
>>>> i = 8192: dft.temp_size = 0
>>>>fft_stage_impl<double, false, false>(neon64): 4, 8192, 98304, 0, 4, 0, 0, 1, 1
fft_stage_impl<double, true, false>(neon64): 4, 2048, 24576, 0, 4, 0, 0, 1, 1
fft_final_stage_impl<double, true, 512>(neon64): 512, 512, 12288, 0, 4, 512, 0, 1, 1
fft_reorder_stage_impl<double, false>(neon64): 0, 8192, 0, 0, 1, 0, 0, 0, 1
[..]
>>>> i = 16384: dft.temp_size = 0
>>>>fft_stage_impl<double, false, true>(neon64): 4, 16384, 196608, 0, 4, 0, 0, 1, 1
fft_stage_impl<double, true, true>(neon64): 4, 4096, 49152, 0, 4, 0, 0, 1, 1
fft_final_stage_impl<double, true, 1024>(neon64): 1024, 1024, 24576, 0, 4, 1024, 0, 1, 1
fft_reorder_stage_impl<double, true>(neon64): 0, 16384, 0, 0, 1, 0, 0, 0, 1

I noticed that many plans do indeed return a temp_size of 0.
Is there a way to make the temp argument optional in the dft_plan->execute() method?
Looking at FFTW, I don't think they use such a prototype. They even have an "in-place" option that takes only an input vector, with no need for an output vector.

from kfr.
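
For readers who want the convenience asked about above, one possible approach is a thin wrapper that owns the scratch buffer and passes the same pointer as input and output, which is the in-place usage described in the reply at the top of this thread. The sketch below is library-free: `DoublingPlan` and `InPlaceRunner` are hypothetical names, and the stub plan is not KFR code (it doubles each sample only so that the in-place call has a visible effect). Whether a real plan type fits this template depends on its `execute` signature, which is an assumption here.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a DFT plan (NOT KFR code).
struct DoublingPlan
{
    std::size_t size;       // number of complex samples
    std::size_t temp_size;  // scratch requirement in bytes; may be 0

    void execute(std::complex<double>* out, const std::complex<double>* in,
                 std::uint8_t* /*temp*/) const
    {
        for (std::size_t i = 0; i < size; ++i)
            out[i] = in[i] * 2.0;
    }
};

// Hides the scratch buffer from the caller: it is allocated once from
// plan.temp_size and reused on every call.
template <typename Plan>
class InPlaceRunner
{
public:
    // The plan is held by reference and must outlive the runner.
    explicit InPlaceRunner(const Plan& plan) : plan_(plan), temp_(plan.temp_size) {}

    // Same pointer for output and input, so the data is processed in place.
    void operator()(std::complex<double>* data)
    {
        assert(data != nullptr);
        plan_.execute(data, data, temp_.data());
    }

    std::size_t scratch_bytes() const { return temp_.size(); }

private:
    const Plan& plan_;
    std::vector<std::uint8_t> temp_;
};
```

Given `DoublingPlan plan{4, 0}` and `InPlaceRunner<DoublingPlan> run(plan)`, calling `run(data.data())` overwrites `data` with the result and never asks the caller for a temp buffer, including when `temp_size` is 0.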
