Comments (12)
@Kerollmops No x86 intrinsic per se will be "added", so in a strict sense, the answer is simply No.
...but we will probably offer general APIs that do similar things. The result may be less terse: it is quite likely we will offer safe transmutation functions that let you use `to_ne_bytes`, do the byte rotation (and then interleaving) on your own, and then cast back with `from_ne_bytes`, and hopefully LLVM will optimize that correctly. There is not actually a whole lot we can do if it doesn't, honestly, as we have a fairly limited amount of power over codegen on this end.
A generalized byte permutation in a single function seems plausible but that's going to take Some Design, especially given the obstacles we already have w/r/t shuffle APIs.
Also that intrinsic is already supported in core::arch and this sort of request reinforces why we will allow people to cast into hardware types and use such intrinsics if they need that kind of optimization.
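To illustrate the "cast into hardware types and use such intrinsics" route, here is a minimal sketch on stable Rust. It assumes the intrinsic under discussion is `_mm_alignr_epi8`, reached through `std::arch` (which re-exports `core::arch`). The helper names `align_right_4`, `align_right_4_ssse3` and `align_right_4_fallback` are hypothetical, and the fixed shift of 4 bytes is chosen only for illustration. This is not the portable-simd API.

```rust
/// Hypothetical helper: concatenates `hi:lo` (with `lo` in the low 16 bytes),
/// shifts the 32-byte value right by 4 bytes and returns the low 16 bytes.
pub fn align_right_4(hi: [u8; 16], lo: [u8; 16]) -> [u8; 16] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was checked at runtime just above.
            return unsafe { align_right_4_ssse3(hi, lo) };
        }
    }
    // Other targets, or x86 without SSSE3, use plain array code.
    align_right_4_fallback(hi, lo)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "ssse3")]
unsafe fn align_right_4_ssse3(hi: [u8; 16], lo: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut out = [0u8; 16];
    unsafe {
        // Unaligned loads and stores, because byte arrays carry no 16-byte alignment.
        let a = _mm_loadu_si128(hi.as_ptr() as *const __m128i);
        let b = _mm_loadu_si128(lo.as_ptr() as *const __m128i);
        let r = _mm_alignr_epi8::<4>(a, b);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}

/// Portable version of the same byte movement, with no intrinsics.
fn align_right_4_fallback(hi: [u8; 16], lo: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..12].copy_from_slice(&lo[4..]); // bytes 4..16 of the low block
    out[12..].copy_from_slice(&hi[..4]); // followed by bytes 0..4 of the high block
    out
}
```

Whether LLVM turns the fallback into a single instruction is not guaranteed, which is the limitation the comment above describes.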
from portable-simd.
It's not bytewise, it's bitwise. to/from_ne_bytes doesn't really help.
Seems reasonable to put in. I'm not sure how people would want to define it for things other than 128-bit size, but I guess a general byte rotation might be fine.
This is a highly specialized instruction that is only available on x86. This makes it a bad fit for stdsimd. Stdsimd is supposed to be roughly the greatest common denominator of all platforms supported by Rust. Of course, LLVM is allowed to optimize a sequence of functions that behaves identically to that intrinsic into a single instruction.
Naw it's got a very clear semantics though, "rotate the value by N bytes", which makes it at worst a slightly odd shuffle. It's a reasonable helper method to have i think.
@Lokathor It isn't a byte rotate at all as far as I know. It concatenates blocks from both arguments, shifts a given amount and then takes the lower half of each block.
Yeah, they're not really rotate. They're really useful where available though... I called it out a long time ago as the kind of instruction that would be useful to support but might be hard to describe semantically...
ah my mistake, i remember now, it's only a rotate if you pass the same register as both arguments.
the general two-arg form might be weird enough to be very low priority or even out of scope.
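The same-register special case can be sketched in plain Rust. This assumes a 16-byte block stored in little-endian order (index 0 is the lowest byte) and a shift count below 16. The function name `alignr_same_input` is hypothetical.

```rust
/// Models passing the same 16-byte block as both inputs with a shift of
/// `n` bytes, for `n` in 0..16: output byte `i` is input byte `(i + n) % 16`.
fn alignr_same_input(x: [u8; 16], n: usize) -> [u8; 16] {
    assert!(n < 16, "this sketch only covers shift counts below 16");
    let mut out = x;
    // Moving every byte `n` positions toward index 0, with wraparound,
    // is a left rotation of the slice.
    out.rotate_left(n);
    out
}

/// The same operation on a `u128` holding the block in little-endian order:
/// a bit rotation to the right by `8 * n` bits.
fn rotate_bytes_u128(x: u128, n: u32) -> u128 {
    assert!(n < 16, "this sketch only covers shift counts below 16");
    x.rotate_right(8 * n)
}
```

For a shift of 3, byte 3 of the input becomes byte 0 of the output and bytes 0..3 wrap around to the top, so both functions describe the same rotation.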
This kind of thing is why I was hoping we'd land on some generalization of permutation, which would handle a lot of these styles of intrinsics... but I don't really know what that would look like.
the intel guide says
Operation
tmp[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
dst[127:0] := tmp[127:0]
which seems byte-wise to me.
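A scalar reading of that pseudocode, written as a minimal Rust sketch, may make the byte-wise behaviour easier to see. It assumes both 128-bit inputs are given as 16-byte arrays in little-endian order (index 0 is the lowest byte). The name `alignr_model` is hypothetical.

```rust
/// Scalar model of the quoted operation:
/// tmp = (a << 128 | b) >> (imm8 * 8); dst = low 128 bits of tmp.
fn alignr_model(a: [u8; 16], b: [u8; 16], imm8: u8) -> [u8; 16] {
    // Build the 256-bit temporary: `b` is the low half, `a` the high half.
    let mut tmp = [0u8; 32];
    tmp[..16].copy_from_slice(&b);
    tmp[16..].copy_from_slice(&a);

    // Shifting right by `imm8 * 8` bits moves whole bytes toward index 0.
    // Positions past the end of `tmp` read as zero.
    let shift = imm8 as usize;
    let mut dst = [0u8; 16];
    for (i, out) in dst.iter_mut().enumerate() {
        if i + shift < tmp.len() {
            *out = tmp[i + shift];
        }
    }
    dst
}
```

With a shift of 0 the result is `b`, with 16 it is `a`, and with 32 or more it is all zeros. Every intermediate value moves data in whole bytes, never in smaller units.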
Ah, right, hmm, my bad. There are some bitwise permutation operations but I'm mistaken here.
Thank you very much for all your fast answers, I wasn't expecting this amount of interest here 😄
The fact that we will rely on the LLVM codegen suits me and, as you say, I can use the `core::arch` intrinsic on x86.