Giter VIP home page Giter VIP logo

Comments (23)

hannesm avatar hannesm commented on June 15, 2024 1

in your foo2_disassemble you ran the C code without any arguments (run) -- which will make strtol segfault -- could you use run aa55aa55 instead on the gdb prompt?
also, it is crucial to compile the C file with exactly the same CFLAGS as used to compile mirage-crypto with.

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

Thanks for your report. I don't have a "AMD Phenom II X4 955" at hand to reproduce / fix the issue.

Since v0.7.0, CPU feature detection is done at run-time (i.e. cfg.ml only adds CFLAGS whether or not the intrinsics are available and the code is to be compiled).

It sounds like a runtime issue that you are encountering. Here, in native/detect_cpu_features.c the function mc_detect_cpu_features() inspects the CPU ID, and enables/disables the CPU instructions to use.

What I would guess that may go wrong is that the "detect cpu features" and "use X instruction" is wrongly aligned. Thus, the output of the cpuid instruction would be appreciated and also which instruction fails to execute (you should be able to check by running the binary in a debugger and once the invalid instruction is triggered type "disassemble").

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

To give more context to that issue: this is causing tezos-node 8.0~rc1 to segfault crash immediately on this CPU:

(gdb) thread apply all bt
Thread 2 (Thread 0x7ffff7233700 (LWP 14043)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x555559fb8d28 <pool_condition+40>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x555559fb8cc0 <pool_mutex>, cond=0x555559fb8d00 <pool_condition>) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x555559fb8d00 <pool_condition>, mutex=mutex@entry=0x555559fb8cc0 <pool_mutex>) at pthread_cond_wait.c:655
#3  0x0000555557fc0315 in lwt_unix_condition_wait (condition=condition@entry=0x555559fb8d00 <pool_condition>, mutex=mutex@entry=0x555559fb8cc0 <pool_mutex>) at lwt_unix_stubs.c:225
#4  0x0000555557fc057e in worker_loop (data=<optimized out>) at lwt_unix_stubs.c:937
#5  0x00007ffff7fadfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007ffff7c924cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7ffff7b98080 (LWP 14038)):
#0  0x0000555557fbd63b in array_swap32 (nb=<optimized out>, s=<optimized out>, d=<optimized out>) at native/bitfn.h:55
#1  sha256_do_chunk (ctx=ctx@entry=0x7ffff7685740, buf=buf@entry=0x7ffff7685748) at native/sha256.c:83
#2  0x0000555557fbdd62 in _mc_sha256_update (ctx=0x7ffff7685740, data=<optimized out>, len=<optimized out>) at native/sha256.c:131
#3  0x0000555557fbb318 in mc_sha256_update (ctx=<optimized out>, src=<optimized out>, off=<optimized out>, len=<optimized out>) at native/hash_stubs.c:39
#4  0x0000555557a29d55 in camlMirage_crypto__Hash__fun_1548 () at src/native.ml:56
#5  0x0000555557a294f2 in camlMirage_crypto__Hash__feedi_545 () at src/hash.ml:64
#6  0x0000555557a22ad8 in camlMirage_crypto_rng__Fortuna__digesti_173 () at src/hash.ml:68
#7  0x0000555557a22d16 in camlMirage_crypto_rng__Fortuna__reseedi_397 () at rng/fortuna.ml:59
#8  0x0000555557a21a5c in camlMirage_crypto_rng__Rng__create_inner_316 () at option.ml:26
#9  0x00005555579a7009 in camlMirage_crypto_rng_unix__initialize_375 () at rng/unix/mirage_crypto_rng_unix.ml:41
#10 0x00005555579a6de3 in camlTls_lwt__entry () at lwt/tls_lwt.ml:267
#11 0x0000555556bba8b9 in caml_program ()
#12 0x00005555580b2834 in caml_start_program ()
#13 0x000055555809600d in caml_startup_common (argv=0x7fffffffe478, pooling=<optimized out>, pooling@entry=0) at startup_nat.c:159
#14 0x000055555809605b in caml_startup_exn (argv=<optimized out>) at startup_nat.c:169
#15 caml_startup (argv=<optimized out>) at startup_nat.c:169
#16 0x0000555556bb73fc in main (argc=<optimized out>, argv=<optimized out>) at main.c:44

Will post the result of disassemble as soon as I get it.

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

please also find out which mirage-crypto version is used.

is it a segmentation fault or invalid instruction?

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

was the binary compiled on the same CPU as it is being executed? array_swap32 uses bitfn_swap32 which uses __builtin_swap32 -- could you find out which C compiler and configuration was used (i.e. what is the implementation of __builtin_swap32, is this resulting in an invalid instruction?)

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

It's compiled and run on the same machine. Here is the output of disassemble:

tezos_node_gdb_disassemble.txt

And yes sorry it's a SIGILL (not a SIGKILL).

It's using mirage-crypto.0.8.0

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

And here gcc -E bitfn.h:
gcc-E_bitfn.h (1).txt

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

ok, so a dune runtest of mirage-crypto 0.8.0 as well fails on that CPU? (since SHA256 seems to trigger the illegal instruction, what about: test_symmetric_runner.exe only, and the commenting out all but the test_hash (and in there the SHA256 test))?

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

since it looks like __builtin_bswap32(x) is miscompiled here, could you try to figure out what the following does:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main (int argc, char ** argv) {
  uint32_t e, f;
  e = strtol(argv[1], NULL, 16);
  f = __builtin_bswap32(e);
  printf("e %X f %X\n", e, f);
  return 0;
}

compiled and executed here, it emits a bswap instruction. I wonder what this does on your machine (please take the CFLAGS from whatever is used in the context of compiling mirage-crypto) -- dune clean && dune build --verbose --release show them -- in my case it is cc -O2 -fno-strict-aliasing -fwrapv -fPIC -D_FILE_OFFSET_BITS=64 -D_REENTRANT -O2 -fno-strict-aliasing -fwrapv -fPIC --std=c99 -Wall -Wextra -Wpedantic -O3 -DENTROPY -mrdrnd -mrdseed -DACCELERATE -mssse3 -maes -mpclmul -g -o test.exe test.c) -- I've read some articles that passing march here may result in illegal instructions.

executing ./test.exe aa55aa55 results in e AA55AA55 f 55AA55AA here -- for you I'd expect it to segfault.

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

Also:

# Cpuid.flags ();;
- : Cpuid.flag list Cpuid.result =
Ok
 [`CLFLUSH; `PSE36; `MMXEXT; `CMP_LEGACY; `F_3DNOW; `F_3DNOWPREFETCH;
  `PDPE1GB; `EXTAPIC; `LAHF_LM; `DE; `HT; `LM; `NX; `ABM; `CX8; `FPU; `IBS;
  `MCA; `MCE; `MMX; `MSR; `PAE; `PAT; `PGE; `PNI; `PSE; `SEP; `SSE; `SVM;
  `TSC; `VME; `WDT; `SSE4A; `SYSCALL; `SKINIT; `FXSR_OPT; `MISALIGNSSE;
  `CR8_LEGACY; `RDTSCP; `APIC; `CMOV; `CX16; `POPCNT; `FXSR; `MONITOR;
  `F_3DNOWEXT; `MTRR; `OSVW; `SSE2]

will try to execute the tests and report back.

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

@hannesm your code snippet segfaults (had to change uint32_t to u_int32_t to compile it with gcc).

foo2_disassemble.txt

gcc-E_c_code.c.txt

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

I don't have direct access to the machine where this fails, but I'll try to get the exact C flags which are used.

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

Hum yes sorry.

Here the C flags used:

/usr/bin/gcc -O2 -fno-strict-aliasing -fwrapv -fPIC -D_FILE_OFFSET_BITS=64 -D_REENTRANT -O2 -fno-strict-aliasing -fwrapv -fPIC --std=c99 -Wall -Wextra -Wpedantic -O3 -DENTROPY -mrdrnd -mrdseed -DACCELERATE -mssse3 -maes -mpclmul -g c_code.c -o foo3

and here is the result of gdb/run with an argument:

gdb) run aa55aa55
Starting program: /home/gonzo/foo3 aa55aa55
e AA55AA55 f 55AA55AA
[Inferior 1 (process 13304) exited normally]

(so it seems to work fine)

from mirage-crypto.

dinosaure avatar dinosaure commented on June 15, 2024

I think (and it's a suposition), __builtin_bswap32 can be implemented by an SSSE3 instruction pshufb (which is the invalid instruction) according to this comment (en). It's hard to believe that in the case of mirage-crypto, the compiler aggressively decide to use such instruction when it does not take such compilation path for the minor example.

We probably should try to compile mirage-crypto without -mssse3 and see what happens.

So I think the bug is related to:

  • An update of the compiler (which decide to aggressively optimize sha256_do_chunk
  • -mssse which is passed to the compiler even if the instruction does not exist (according to this output)

A possible fix should a use of cpuid in the config/cfg.ml to really know what is available or not.

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

Thanks for your comments and investigations. I'd still be interested in (a) dune runtest [with the expectation that the SHA256 test suite should fail / illegal instruction) and (b) (as mentioned by @dinosaure) remove the -mssse3 from cfg.ml and check that a dune runtest succeeds.

The underlying issue is while since 0.7.0 we have run-time CPU feature detection, this is not entirely true for the __builtin_bswap32 (neither __builtin_bswap64) in bitfn.h. There are several ways to fix this:

  • (a) revert to older bitfn.h code (with lots of ifdefs and some inline assembly)
  • (b) move to a pure C implementation of these functions
  • (c) use the run-time feature detection logic (as done in misc_sse.c for example -- the _mc_switch_accel(<feature>, <generic-function>, <optimized-version>))

Now (a) is safe but a bit unpleasant (and may not solve the issue at hand or others), (b) is safe and eventually slow, (c) may have some impact (since now every swap is guarded by an conditional (but branch prediction should help in all cases) -- so I tend towards (c) but would like to investigate the performance drawbacks. TL;DR: I don't think config.ml needs adjustments, but we should have both code paths in the binary and at runtime select the right implementation -- the machinery is already present, but not used for the bswap operations.

EDIT: to be clear, the current hypothesis is that gcc on the platform when getting -mssse3 will output SSE3 instructions, and esp. __builtin_bswap32 will result (in the sha256 usage) to pshufb which is not available and leading to an illegal instruction. To check this hypothesis, whoever has access to the CPU could remove the -mssse3 from config/cfg.ml and recompile / use dune runtest (after verifying that dune runtest hits the same issue).

from mirage-crypto.

samoht avatar samoht commented on June 15, 2024

Here the results of dune runtest before and after editing cfg.ml:

dune_runtest.txt

from mirage-crypto.

pirbo avatar pirbo commented on June 15, 2024

In case it's useful. At least a second user experienced the issue and opened https://gitlab.com/tezos/tezos/-/issues/1038

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

thanks for your input. I prepared #96 which should fix the issue. Could anyone with their hands on such a CPU (a) try that they can reproduce the issue by cloning master and executing dune runtest (which should fail with illegal instruciton), and (b) test whether #96 fixes the issue. When someone reports success, I'm fine to merge and cut a bugfix release.

from mirage-crypto.

RichAyotte avatar RichAyotte commented on June 15, 2024

Thank you. runtest for master and 6a266c0

dune-runtest-master.txt
dune-runtest-6a266c0.txt

from mirage-crypto.

sebeec avatar sebeec commented on June 15, 2024

Many thanks. Dune runtest results on a AMD(tm Phenom II X4 955:

dune-runtest-6a266c0.txt
dune-runtest-master.txt

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

thanks for your tests. since the test still lead to illegal instructions (not in sha256, but in chacha20), I added another commit (and tested with qemu-x86_64-static -cpu=phenom <binary>) which compiles the chacha code (as chacha_generic) without any -m flags, and depending on runtime CPU detection (and availablility of SSSE3) calls chacha_generic or chacha.

since I figured qemu is able to reproduce the earlier illegal instruction as well, I'll merge and release once CI passes (no requirement to wait for an actual hardware test, though @sebeec @RichAyotte if you could test 0f93117 73a04a0 1c6fc96 and report here, that'd be great). expected (as under qemu):

# qemu-x86_64-static  -cpu phenom _build/default/tests/test_symmetric_runner.exe 
accel: 
..............................................................................................................................................
Ran: 142 tests in: 0.38 seconds.
OK

from mirage-crypto.

hannesm avatar hannesm commented on June 15, 2024

fixed and PRed to opam-repository as ocaml/opam-repository#17917

from mirage-crypto.

tezms avatar tezms commented on June 15, 2024

Thanks, @hannesm !

Checked 1c6fc96 on a AMD(tm) Phenom II X4 955:

test_entropy alias tests/runtest                           
no CPU RNG available                                       
test entropy OK                                            
test_random_runner alias tests/runtest                     
.........                                                  
Ran: 9 tests in: 0.13 seconds.                             
OK                                                         
test_symmetric_runner alias tests/runtest                  
accel:                                                     
..........................................................................................
....................................................       
Ran: 142 tests in: 0.16 seconds.                           
OK

from mirage-crypto.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.