Comments (16)
Let's chat a bit about the type conversion when you get back to this. I don't think the type conversion should be particularly hard...
from heir.
@AlexanderViand-Intel #504 should show you how to convert to affine loops, though it still runs into the problem of a failed type conversion in `--polynomial-to-standard`, which I suspect is an unrelated bug.
Digging this back up, as I noticed (as part of starting on #559) that `poly.ntt` and `poly.intt` aren't `ElementwiseMappable` and, in fact, can't be, because `ElementwiseMappable` requires that all non-scalar types match, and a "tensorized" version of an NTT might look something like this:
```mlir
#ring = #polynomial.ring<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>
%0 = polynomial.ntt %some_poly : !polynomial.polynomial<#ring> -> tensor<1024xi64, #ring> // normal NTT
%1 = polynomial.ntt %some_poly_tensor : tensor<2x!polynomial.polynomial<#ring>> -> tensor<2x1024xi64, #ring> // tensor NTT
```
and so the `ElementwiseMappable` verifier will complain that "all non-scalar operands/results must have the same shape and base type". In the future, it might be interesting to start a conversation upstream about the possibility of expanding `ElementwiseMappable`, but for now I'll just avoid adding the trait to the op and instead add a TODO for custom patterns for `poly.ntt` and `poly.intt` to our elementwise-to-affine pass.
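To see why the trait's shape check must fail, here's a rough Python model of the shapes involved (the 1024 comes from the ring above; this is purely illustrative, not HEIR code):

```python
# Rough model: an NTT maps one degree-1024 polynomial (a "scalar" as far as
# ElementwiseMappable is concerned) to 1024 coefficients in evaluation form.
def ntt_result_shape(operand_shape):
    # operand_shape: () for a single polynomial, (2,) for a tensor of two.
    return operand_shape + (1024,)

# Scalar case: poly -> tensor<1024xi64>
assert ntt_result_shape(()) == (1024,)
# Tensorized case: tensor<2xpoly> -> tensor<2x1024xi64>
assert ntt_result_shape((2,)) == (2, 1024)
# ElementwiseMappable requires operand and result shapes to match, but
# (2,) != (2, 1024), so the verifier rejects the tensorized form.
```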
@j2kun very kindly reminded me we can just multi-dim the tensors
(1) Flatten the tensor: e.g., if we have a tensor of polynomials, each of which is lowered to `tensor<1024xi64>`, then a size-2 tensor would become `tensor<2048xi64>`. However, this would appear to severely complicate the lowering math. The `tensor<2x1024xi64>` seems more intuitive than `tensor<2048xi64>`.
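For comparison, here's the index bookkeeping that the flattened layout would force into every lowering, as a small Python sketch (sizes taken from the example above):

```python
# Index bookkeeping for the two layouts.
N, NUM_POLYS = 1024, 2

def flat_index(i, j):
    # Coefficient j of polynomial i in the flattened tensor<2048xi64>.
    return i * N + j

# The multi-dim layout tensor<2x1024xi64> indexes directly as [i][j];
# the flattened layout instead threads this affine arithmetic through
# every access in the lowered code.
assert flat_index(0, 5) == 5
assert flat_index(1, 5) == 1029
```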
Yep :) multi-dim it is. Haha, it totally slipped my mind. I'll update with some code; I believe I need to add a type converter for ranked tensors and then handle some tensor ops.
What's really strange is that there's no interface pattern for this :/ which makes it really weird. Even if the type converter exists, I need to add support for converting ops like `tensor.from_elements` - I believe some conversions write a basic catch-all op converter that just changes types for these ops.
In our HEIR meeting today, I suggested we should be able to find a way to use upstream passes to map elementwise operations to loops, perhaps via linalg. @AlexanderViand-Intel could you post your sample IR?
Sure, here's a minimal example; something like this also appears when lowering `tests/bgv/to_polynomial.mlir`:
```mlir
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
  %0 = polynomial.add(%arg0, %arg1) : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
```
Trying to lower this with `-polynomial-to-standard` "as-is" (as of 04f6106) simply crashes on this cast in `ConvertAdd`, because the pass assumes the operands are polynomials, but here they of course aren't.
Since the whole idea of traits like `ElementwiseMappable` is that we shouldn't have to re-invent the wheel, I tried applying the existing `-convert-elementwise-to-linalg` pass to the example, which results in:
```mlir
#map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
  %0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) outs(%arg0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
  ^bb0(%in: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %in_0: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %out: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>):
    %1 = polynomial.add(%in, %in_0) : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    linalg.yield %1 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
  } -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
```
However, from here I'm a bit stuck: none of the passes I tried would further simplify this in a meaningful way (e.g., convert it to loops).
Note that applying `-polynomial-to-standard` on the `linalg.generic` version as-is crashes because of unresolved materializations around the yield, but adding `builtin.unrealized_conversion_cast` as materializers lets you run the pass and see what's going on a bit better:
```mlir
#map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
  %0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) outs(%arg0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
  ^bb0(%in: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %in_0: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>, %out: !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>):
    %1 = builtin.unrealized_conversion_cast %in : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %2 = builtin.unrealized_conversion_cast %in_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %3 = arith.extsi %1 : tensor<1024xi25> to tensor<1024xi26>
    %4 = arith.extsi %2 : tensor<1024xi25> to tensor<1024xi26>
    %5 = arith.addi %3, %4 : tensor<1024xi26>
    %6 = arith.remsi %5, %cst : tensor<1024xi26>
    %7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
    %8 = builtin.unrealized_conversion_cast %7 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    linalg.yield %8 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
  } -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
```
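For readers following along, the widen/add/rem/trunc sequence in the loop body is just a modular add; here's a Python model of it (note that Python's `%` agrees with `arith.remsi` only for non-negative operands, which I'm assuming here):

```python
# Python model of the lowered modular add: sign-extend to i26, add,
# take the signed remainder by cmod, then truncate back to the i25
# storage width.
CMOD = 33538049

def lowered_add(a, b):
    # a, b: coefficient values already reduced into [0, CMOD).
    widened = a + b           # arith.extsi + arith.addi
    reduced = widened % CMOD  # arith.remsi (non-negative operands assumed)
    return reduced            # arith.trunci back to i25

assert lowered_add(33538048, 1) == 0   # wraps around the modulus
assert lowered_add(100, 200) == 300    # stays in range
```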
From here, we could think about how these `builtin.unrealized_conversion_cast`s should be resolved, and might arrive at something like this (manually adapted from the above) with multi-dimensional tensors:
```mlir
#map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x1024xi25>, %arg1: tensor<2x1024xi25>) -> tensor<2x1024xi25> {
  %0 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%arg0, %arg1 : tensor<2x1024xi25>, tensor<2x1024xi25>) outs(%arg0 : tensor<2x1024xi25>) {
  ^bb0(%in: tensor<1024xi25>, %in_0: tensor<1024xi25>, %out: tensor<1024xi25>):
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %3 = arith.extsi %in : tensor<1024xi25> to tensor<1024xi26>
    %4 = arith.extsi %in_0 : tensor<1024xi25> to tensor<1024xi26>
    %5 = arith.addi %3, %4 : tensor<1024xi26>
    %6 = arith.remsi %5, %cst : tensor<1024xi26>
    %7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
    linalg.yield %7 : tensor<1024xi25>
  } -> tensor<2x1024xi25>
  return %0 : tensor<2x1024xi25>
}
```
Unfortunately, this doesn't work, because `linalg.generic` expects to be "looping" (conceptually) over all the dimensions: `error: 'linalg.yield' op type of yield operand 1 ('tensor<1024xi25>') doesn't match the element type of the enclosing linalg.generic op ('i25')`. Of course, we could flatten `tensor<2x1024xi25>` into `tensor<2048xi25>`, but that seems like a lot of work/boilerplate for something pretty messy.
I think if we had a pass that converted elementwise-mappable things to `affine.for` instead of `linalg.generic`, which might look a bit like this:
```mlir
#map = affine_map<(d0) -> (d0)>
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
  %0 = affine.for %i = 0 to 2 iter_args(%a = %arg0) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
    %in = tensor.extract %arg0[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %in_0 = tensor.extract %arg1[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %1 = polynomial.add(%in, %in_0) : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    %t = tensor.insert %1 into %a[%i] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    affine.yield %t : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  }
  return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
```
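As a sanity check on the loop structure, here's a Python model (toy sizes, illustrative names, not HEIR code) of what the `affine.for` version computes:

```python
# Model of the affine.for lowering: iterate over the outer dimension,
# apply the scalar op to each element, and rebuild the tensor.
CMOD = 33538049

def poly_add(p, q):
    # Scalar polynomial.add on coefficient lists.
    return [(a + b) % CMOD for a, b in zip(p, q)]

def elementwise_add(t0, t1):
    out = list(t0)                        # iter_args(%a = %arg0)
    for i in range(len(t0)):              # affine.for %i = 0 to 2
        out[i] = poly_add(t0[i], t1[i])   # extract, add, insert
    return out                            # affine.yield

a = [[1, 2], [CMOD - 1, 3]]
b = [[4, 5], [2, 6]]
assert elementwise_add(a, b) == [[5, 7], [1, 9]]
```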
We could then handle the type conversion gracefully with multi-dimensional tensors. I haven't yet found a pass that would do this (either going through `linalg.generic` or directly), but I've also not spent as much time with the "standard" part of MLIR as @j2kun and @asraa, so maybe you've got an idea here.
> @AlexanderViand-Intel #504 should show you how to convert to affine loops, though it still runs into the problem of a failed type conversion in `--polynomial-to-standard`, which I suspect is an unrelated bug.
Even my "manually converted" affine loop thing above runs into failed type conversion issues, which I think is because the type converter doesn't realize that it can convert from `tensor<2x!poly.poly<...>>` to `tensor<2x1024xi25>` (or whatever the poly in the tensor happens to convert to), so #505 might not be entirely unrelated?
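The missing piece can be modeled as a recursive type conversion; here's a toy Python sketch (shapes and names are illustrative, not the actual HEIR type converter):

```python
# Toy model of the missing rule: a converter that maps poly -> (1024,)
# must also map tensor<2 x poly> -> (2, 1024) by recursing into the
# element type.
def convert_type(t):
    if t == "poly":
        return (1024,)  # stands in for tensor<1024xi25>
    if isinstance(t, tuple) and t[0] == "tensor":
        _, shape, elem = t
        # Recurse into the element type and append its converted shape.
        return shape + convert_type(elem)
    return t  # already-legal types pass through unchanged

assert convert_type("poly") == (1024,)
assert convert_type(("tensor", (2,), "poly")) == (2, 1024)
```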
I see what you're saying, but I would expect it to just type-convert the content of the memref opaquely. Maybe the problem is that you can't have a memref of tensors? Anyhow, if you have time to root-cause that type conversion issue, that'd be helpful. Otherwise, at least we have a clear path to convert elementwise to loops now :)
I don't think it's memref-related, because lowering my example above (no memrefs, just `affine.for`) with the `-polynomial-to-standard` pass gives similar errors, and using the "augmented" version (with unrealized conversion casts) results in this:
```mlir
func.func @test_bin_ops(%arg0: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>, %arg1: tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) -> tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> {
  %0 = affine.for %arg2 = 0 to 2 iter_args(%arg3 = %arg0) -> (tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
    %extracted = tensor.extract %arg0[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %1 = builtin.unrealized_conversion_cast %extracted : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %extracted_0 = tensor.extract %arg1[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %2 = builtin.unrealized_conversion_cast %extracted_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %3 = arith.extsi %1 : tensor<1024xi25> to tensor<1024xi26>
    %4 = arith.extsi %2 : tensor<1024xi25> to tensor<1024xi26>
    %5 = arith.addi %3, %4 : tensor<1024xi26>
    %6 = arith.remsi %5, %cst : tensor<1024xi26>
    %7 = arith.trunci %6 : tensor<1024xi26> to tensor<1024xi25>
    %8 = builtin.unrealized_conversion_cast %7 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    %inserted = tensor.insert %8 into %arg3[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    affine.yield %inserted : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  }
  return %0 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
}
```
If I add an explicit type conversion, we get a bit closer (note that the func has been handled properly):
```mlir
func.func @test_bin_ops(%arg0: tensor<1024x2xi25>, %arg1: tensor<1024x2xi25>) -> tensor<1024x2xi25> {
  %0 = builtin.unrealized_conversion_cast %arg1 : tensor<1024x2xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %1 = builtin.unrealized_conversion_cast %arg0 : tensor<1024x2xi25> to tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  %2 = affine.for %arg2 = 0 to 2 iter_args(%arg3 = %1) -> (tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>) {
    %extracted = tensor.extract %1[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %4 = builtin.unrealized_conversion_cast %extracted : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %extracted_0 = tensor.extract %0[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    %5 = builtin.unrealized_conversion_cast %extracted_0 : !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>> to tensor<1024xi25>
    %cst = arith.constant dense<33538049> : tensor<1024xi26>
    %6 = arith.extsi %4 : tensor<1024xi25> to tensor<1024xi26>
    %7 = arith.extsi %5 : tensor<1024xi25> to tensor<1024xi26>
    %8 = arith.addi %6, %7 : tensor<1024xi26>
    %9 = arith.remsi %8, %cst : tensor<1024xi26>
    %10 = arith.trunci %9 : tensor<1024xi26> to tensor<1024xi25>
    %11 = builtin.unrealized_conversion_cast %10 : tensor<1024xi25> to !polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>
    %inserted = tensor.insert %11 into %arg3[%arg2] : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
    affine.yield %inserted : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>>
  }
  %3 = builtin.unrealized_conversion_cast %2 : tensor<2x!polynomial.polynomial<<cmod=33538049, ideal=#polynomial.polynomial<1 + x**1024>>>> to tensor<1024x2xi25>
  return %3 : tensor<1024x2xi25>
}
```
but it's still not quite there, because the `affine.for` isn't converted. I think this is because it's not part of the operations that `addStructuralConversionPatterns` handles. Do you know if there's an equivalent for affine, or do we need to manually mark it as a dynamically legal op?
I think I have an idea of how to handle this (and also #505), WIP here. It nearly works, but right now only with `--verify-each=false`, because it produces invalid `tensor.extract` and `affine.load` operations.
The reason is that when we go from extracting a poly from `tensor<2x!polynomial.polynomial<..>>` to trying the same from a `tensor<2x1024xi25>`, we need to switch from `tensor.extract` to `tensor.extract_slice` (and ditto for `affine.load` to `affine.vector_load`). I'll put together a PR later today.
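To make the extract-vs-slice switch concrete, here's a toy Python model (plain lists, not HEIR code) of why the converted access must produce a whole row rather than a scalar:

```python
# Toy model: after type conversion, tensor<2x!poly> becomes a 2xN array
# (shrunk to 2x4 here), so "give me polynomial i" is a row/slice access.
t = [[0, 1, 2, 3], [10, 11, 12, 13]]  # stands in for tensor<2x4xi25>

# Pre-conversion, tensor.extract %t[%i] yields one element (a scalar of
# the element type). Post-conversion, the same access must yield a 1-D
# slice of the 2-D tensor, i.e. tensor.extract_slice, not tensor.extract.
poly_i = t[1]
assert poly_i == [10, 11, 12, 13]
```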
PS: this seems like it would pop up nearly as frequently as the `func.func`/control-flow conversion needs, so I wouldn't be surprised if there's already something like `addStructuralConversionPatterns` but for containers - I just couldn't find one when I looked around the upstream examples.
PPS: 🤦 `addStructuralConversionPatterns` is a HEIR thing, not an MLIR thing, so no wonder there's no equivalent (yet!)
The "tensor of tensor" PR (#508) lays the groundwork for supporting poly operations on tensors, but it's not a full end-to-end solution, because it doesn't support the code generated by the `--convert-elementwise-to-linalg` chain (c.f. #504), so even with that we still can't quite lower a `polynomial.add %t1, %t2 : tensor<2x!polynomial.polynomial>`.
Instead of adding "memref of memref" helpers and then trying to figure out how to elegantly handle the raising from `memref` back to `tensor` (c.f. #505), I figured it might actually be easier to just add a `--convert-elementwise-to-affine` pass, which seems like a generally useful thing to have, so I started a draft PR for that: #524
This issue has 2 outstanding TODOs:
- `lib/Conversion/PolynomialToStandard/PolynomialToStandard.cpp:489`: Handle tensor of polys.
- `lib/Conversion/PolynomialToStandard/PolynomialToStandard.cpp:622`: Handle tensor of polys.
This comment was autogenerated by todo-backlinks
Reopening just to clean up the remaining comments