Comments (19)
@titaiwangms Do you have an idea? Looks like it is related to https://github.com/microsoft/onnxscript/blame/0d98619dee85025f8fb110864607f6f477c3d8ae/onnxscript/function_libs/torch_lib/ops/core.py#L5625
from onnxscript.
I can take a look after I finish the work I have on hand.
@xadupre I think optimizer should be already applied in torch-nightly? https://github.com/pytorch/pytorch/blob/d5182bb75bbc109cb327212e7205981fbf72cb5e/torch/onnx/_internal/exporter.py#L1274
Are you writing new optimization? Just trying to understand the usage here.
Specifically, if we try:

```python
import onnx
from onnxscript import optimizer
from onnxscript.rewriter import onnxruntime as ort_rewriter

onx = onnx.load("dump3bug.onnx")
onnx.checker.check_model(onx, full_check=True)
optimized = optimizer.optimize(onx)
```

the same error is spotted by the checker.
It does. I tweaked the torch code in onnxruntime.py to get the model before it gets optimized, to find out whether the error happens before optimization or after. It is after.
Could you update the model to the one before the optimizer?
I'll check again, but it should be the one before the optimizer.
I mean the model in the zip doesn't pass onnx.checker.check_model(model, full_check=True). That's why it gets the error message from
It's not even hitting the constant folding and general rewriter yet, it seems.
I wonder if we should use onnx.checker to guard the models generated from the converter/DORT. Or do we already?
> I mean the model in the zip doesn't pass onnx.checker.check_model(model, full_check=True).

True ... I tried it, and it does fail.
I would not call onnx.checker. The converter may introduce nodes coming from the com.microsoft domain. I created PR #1467 to replicate the issue.
So I think there are two issues here. The first is that if we don't want to guarantee our models pass the checker before feeding them to the optimizer, we should turn off strict_mode in the ONNX shape/type inference inside the optimizer, since the two perform essentially the same check. I will submit a PR for this to unblock this model.
The other issue is that, in torchlib, we follow PyTorch's native_batch_norm CUDA behavior and accept size-0 outputs at indices 1 and 2 (here), which originates from the PyTorch code. That's why the error message says the existing shape is 0, while ONNX shape/type inference infers it as 2. @justinchuby @xiaowuhu @gramalingam any suggestions on this?
Do we know if this model was exported with CUDA or with CPU? Even though the model exported under CUDA differs from the one exported under CPU, each of them should pass shape inference, or is there something I don't remember?
> Do we know if this model was exported with CUDA or with CPU? Even though the model exported under CUDA differs from the one exported under CPU, each of them should pass shape inference, or is there something I don't remember?

@xadupre The tests I executed with CUDA reproduce the error. Could you point to the code that "passes shape inference"? My guess is that it does not invoke strict mode.
I think it should be covered in the torchlib tests, but we don't run them with CUDA regularly.
Hi, is this related to #1256 ?
> Do we know if this model was exported with CUDA or with CPU? Even though the model exported under CUDA differs from the one exported under CPU, each of them should pass shape inference, or is there something I don't remember?

Given the comment in the code that Titai linked above, it appears that CUDA/CPU have different behavior? But the onnxscript encoding chooses one of the two behaviors (it says CUDA) ... now, if the actual shapes are emitted as produced by the runtime, there is going to be a mismatch between the shape inferred by ONNX (the CUDA shape) and the embedded value_info shape (coming from CPU) ... that would explain it, right?
But Titai also says the error is reproduced in a CUDA run, which seems strange (inconsistent with the message here).
I guess we need to find out what happened in ONNX shape/type inference. One can try this out with the #1467 test cases and turn the #1472 strict mode back to True.
Writing down some findings today:

This is only reproducible with DORT. dynamo_export does not support this case because it is decomposed at aot_autograd (functionalization), and ExportedProgram can't repro it because the unused outputs are trimmed.