Given the following example:
pub fn f(a: &[i32; 4], b: &[i32; 4], c: &mut [i32; 4]) {
    // Four independent element-wise additions, with no loop involved.
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}
The Rust compiler rustc provides two direct flags that control auto-vectorization. rustc does not perform any auto-vectorization itself; it delegates that work to LLVM and supplies the information LLVM needs to decide which instructions get vectorized.
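- no-vectorize-loops: If set, disables LLVM's loop vectorizer, which otherwise tries to rewrite loops so that each iteration processes several elements at once with SIMD instructions. llvm-docs
- no-vectorize-slp: If set, disables LLVM's SLP vectorizer, which otherwise tries to combine similar independent instructions into SIMD instructions. llvm-docs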
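With SLP vectorization enabled, the four additions are combined into a single SIMD addition (paddd):
example::f: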
movdqu xmm0, xmmword ptr [rdi]
movdqu xmm1, xmmword ptr [rsi]
paddd xmm1, xmm0
movdqu xmmword ptr [rdx], xmm1
ret
With no-vectorize-slp set, the same function compiles to four separate scalar additions:
example::f:
mov eax, dword ptr [rsi]
add eax, dword ptr [rdi]
mov dword ptr [rdx], eax
mov eax, dword ptr [rsi + 4]
add eax, dword ptr [rdi + 4]
mov dword ptr [rdx + 4], eax
mov eax, dword ptr [rsi + 8]
add eax, dword ptr [rdi + 8]
mov dword ptr [rdx + 8], eax
mov eax, dword ptr [rsi + 12]
add eax, dword ptr [rdi + 12]
mov dword ptr [rdx + 12], eax
ret
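To make the paddd-based listing easier to read, here is a rough hand-written equivalent using the SSE2 intrinsics from std::arch. This is only a sketch of what the vectorized code does, not what LLVM literally generates from; the name f_sse2 and the scalar fallback for other architectures are assumptions added for illustration. Note that the SIMD addition wraps on overflow, whereas the original `+` panics on overflow in debug builds.

```rust
// Hypothetical hand-vectorized counterpart of `f` (illustration only).
#[cfg(target_arch = "x86_64")]
pub fn f_sse2(a: &[i32; 4], b: &[i32; 4], c: &mut [i32; 4]) {
    use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};

    // SAFETY: SSE2 is part of the x86_64 baseline, each array is exactly
    // 16 bytes long, and the unaligned load/store variants are used.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr().cast::<__m128i>()); // movdqu
        let vb = _mm_loadu_si128(b.as_ptr().cast::<__m128i>()); // movdqu
        let sum = _mm_add_epi32(va, vb); // paddd: four 32-bit adds at once
        _mm_storeu_si128(c.as_mut_ptr().cast::<__m128i>(), sum); // movdqu
    }
}

// Scalar fallback so the sketch also builds on non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn f_sse2(a: &[i32; 4], b: &[i32; 4], c: &mut [i32; 4]) {
    for i in 0..4 {
        // wrapping_add mirrors the wrapping behaviour of paddd.
        c[i] = a[i].wrapping_add(b[i]);
    }
}
```

Calling f_sse2(&a, &b, &mut c) fills c with the element-wise sums, just like f does for inputs that do not overflow.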
Since the function does not include a loop, setting no-vectorize-loops has no impact.
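For contrast, a loop-based variant of the same computation is the kind of code where the loop vectorizer, and therefore no-vectorize-loops, would become relevant. The sketch below is an assumption added for illustration (the name f_loop and the use of slices are not from the original example); whether LLVM actually vectorizes it depends on the optimization level and the target.

```rust
// Hypothetical loop-based counterpart of `f` (illustration only).
pub fn f_loop(a: &[i32], b: &[i32], c: &mut [i32]) {
    // Zipping the iterators avoids per-element bounds checks, which tends to
    // make the loop a simpler candidate for LLVM's loop vectorizer.
    // Only the first min(a.len(), b.len(), c.len()) elements are written.
    for ((x, y), out) in a.iter().zip(b.iter()).zip(c.iter_mut()) {
        *out = *x + *y;
    }
}
```

Because the lengths are only known at run time here, the additions cannot simply be paired up as in f; any SIMD code for this function has to come from transforming the loop itself.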
To make the comparisons fair, we set both no-vectorize-loops and no-vectorize-slp in our Cargo.toml.