microsoft / onnxscript

ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
Home Page: https://onnxscript.ai/
License: MIT License
Tracking issue for allowing inputs by name in function definitions.
The queue time on ADO is longer than the actual execution time. Too much wait.
Having pyi files helps both users and type checkers / editors to understand what ops are available and what to expect. We can also add the documentation from the proto files to the generated pyi.
We can refer to https://github.com/microsoft/onnxruntime/blob/main/orttraining/orttraining/eager/opgen/opgen/generator.py and https://github.com/microsoft/onnxruntime/blob/main/orttraining/orttraining/eager/opgen/opgen/onnxops.py
All functions decorated with script()
will be compiled at import time. With a potential library of more than ~1000 functions, this may translate to a long import time (to be measured). Potentially, we can compile each function just in time, when it is first used.
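A minimal sketch of the just-in-time idea; LazyScriptFunction and lazy_script are hypothetical names, not onnxscript APIs:

import onnxscript

class LazyScriptFunction:
    """Hypothetical wrapper: defer onnxscript compilation to the first call."""

    def __init__(self, func):
        self.func = func
        self._compiled = None

    def __call__(self, *args, **kwargs):
        if self._compiled is None:
            # Pay the compilation cost on first use instead of at import time
            self._compiled = onnxscript.script()(self.func)
        return self._compiled(*args, **kwargs)

def lazy_script(func):
    # Drop-in stand-in for @script() that compiles just in time
    return LazyScriptFunction(func)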
Liveness analysis does not yet handle looping constructs, including break statements.
It helps if we set up linters early in the development process (fewer big fix-up PRs in the future). We may consider: mypy, pylint, black, isort, pydocstyle, flake8, bandit and xdoctest.
We would like to annotate the types with typing.Annotated
(https://docs.python.org/3/library/typing.html#typing.Annotated) to include runtime value checking for the ATen lib. Annotated should only be used for attributes.
@onnxscript.script()
@atenop("aten::elu")
def Elu(
    self,
    alpha: float = 1.0,
    scale: Annotated[float, Is[lambda x: x == 1.0]] = 1.0,  # Here, just need to take the float type out
    input_scale: Annotated[float, Is[lambda x: x == 1.0]] = 1.0,
):
    # del scale
    # del input_scale
    return op.Elu(self, alpha=alpha)
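A sketch of how the runtime value check could be driven by the Annotated metadata, assuming plain callables are used as predicates (validate_attrs is a hypothetical helper, not an existing API):

from typing import get_type_hints

def validate_attrs(func, **attrs):
    # Hypothetical helper: treat callable metadata in Annotated[...] as value predicates
    hints = get_type_hints(func, include_extras=True)
    for name, value in attrs.items():
        for meta in getattr(hints.get(name), "__metadata__", ()):
            if callable(meta) and not meta(value):
                raise ValueError(f"{name}={value!r} fails its Annotated predicate")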
Currently in eager mode, an int passed as a positional argument is treated as a tensor, while one passed as a keyword argument is considered a raw scalar. This behavior can create confusion for users. It also makes programmatic calls trickier because the tensors cannot be supplied by name.
Currently @script does not work in Python interactive mode because inspect cannot find the source code in the command prompt. However, there are hacks we can use.
dill implements a hack that reads the input buffer to retrieve the source code: https://github.com/uqfoundation/dill/blob/master/dill/source.py#L326-L415.
An example of its usage is in the taichi project, which wraps the dill code in the sourceinspect library that supports inspecting code in more environments: https://github.com/taichi-dev/sourceinspect
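A minimal sketch of such a fallback, assuming dill is installed (inspect.getsource raises OSError when the source is unavailable):

import inspect
import dill.source

def get_source(func):
    try:
        return inspect.getsource(func)
    except OSError:
        # inspect fails in a REPL; fall back to dill's buffer-reading hack
        return dill.source.getsource(func)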
When an argument is not used, I would like to delete it, as good Python code does. However, the arguments cannot be removed outright if we want to keep a good correspondence between the ATen signatures and the lib. I assume onnxscript can just do nothing for del?
@onnxscript.script()
def Elu(
    self,
    alpha: float = 1.0,
    scale: float = 1.0,
    input_scale: float = 1.0,
):
    del scale  # Right here, it is now ValueError: ERROR None:10 -- line: del input_scale
    del input_scale
    return op.Elu(self, alpha=alpha)
Generics:

alpha_dropout(Tensor(a!) input, float p, bool train) -> Tensor(a!)
def alpha_dropout(input: TensorType[T, ...], p: float, train: bool) -> TensorType[T, ...]

Dims:

reflection_pad3d(Tensor self, SymInt[6] padding) -> Tensor
def reflection_pad3d(self: TensorType[Any, ...], padding: TensorType[INT64, [6]]) -> TensorType[Any, ...]
def aten_transpose(self, dim0: int, dim1: int):
    # transpose.int(Tensor(a) self, int dim0, int dim1) -> Tensor(a)
    # FIXME(justinchuby): onnxscript raises "Unsupported expression type"
    return op.Transpose(self, [dim0, dim1])
cc @fatcat-z
Potentially increase user friendliness by reconciling opset versions within a function at compile time.
For testing purposes, we would like to create a utility to compare two FunctionProtos, and, similarly, two ModelProtos.
For robust checking, it should ideally support the following features, but it is okay to start with something simpler first and add these features later.
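As a naive starting point, a sketch that assumes node order and value names must match exactly (a robust version would be invariant to renaming, attribute order, etc.):

import onnx

def function_protos_equal(a: onnx.FunctionProto, b: onnx.FunctionProto) -> bool:
    # Compare the signatures first
    if (a.name, list(a.input), list(a.output)) != (b.name, list(b.input), list(b.output)):
        return False
    if len(a.node) != len(b.node):
        return False
    # Compare nodes pairwise on op identity and wiring
    return all(
        (m.op_type, m.domain, list(m.input), list(m.output))
        == (n.op_type, n.domain, list(n.input), list(n.output))
        for m, n in zip(a.node, b.node)
    )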
To allow users to choose an implementation based on the implemented subset of an opset, PyTorch implements a decomposition system that lets users pick the one they can support. This idea is potentially useful for onnxscript functions too, where a function can have more than one onnx decomposed implementation.
I created this toy example
@script()
def FuncWithStr(input, negative_slope: float = 0.01, mode: str = "foo"):
    if mode == "foo":
        return input
    else:
        zero = op.CastLike(0, input)
        negative_slope = op.CastLike(negative_slope, input)
        return op.Where(input < zero, negative_slope * input, input)
which I think is a common pattern for pytorch functions. I get:
ValueError: ERROR
thenGraph_3:4 -- line: return input
Return statements are not permitted inside control-flow statements.
Should I break the logic out to a non-script function and conditionally call two different onnx functions for the two cases instead?
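For reference, a sketch of that workaround, assuming it is acceptable: the string attribute is dispatched on in plain Python, and only the tensor logic lives inside @script (function names are illustrative):

@script()
def _leaky_branch(input, negative_slope: float = 0.01):
    zero = op.CastLike(0, input)
    slope = op.CastLike(negative_slope, input)
    return op.Where(input < zero, slope * input, input)

def func_with_str(input, negative_slope: float = 0.01, mode: str = "foo"):
    # Plain Python dispatch: the string comparison never enters an ONNX function
    if mode == "foo":
        return input
    return _leaky_branch(input, negative_slope)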
We should publish the documentation somewhere. GitHub pages?
Keep the docstring style consistent. We should pick one from {google, numpy, pep257} that is checkable by http://www.pydocstyle.org/en/stable/error_codes.html#default-conventions.
I suggest the google style because it is less verbose, easy to read and write, and relatively popular.
Some of the restructuredtext-style docs can be converted using https://github.com/cbillingham/docconvert
When I was browsing the test directory, I realized test_onnx_backend is the only file name that starts with "test". We should rename it for consistency.
When we export/convert a Python function to a ModelProto, we need to identify the set of functions that will be included in the generated ModelProto as model-local functions. Furthermore, the users should be able to control this. For example, I might want to create a model that calls Relu, with or without including the function-definition for Relu (even though we might have a function-definition for Relu available).
(See PR: #41 )
It would be helpful for the error message to show which input types are allowed/expected in eager mode calls and the name of the offending argument, as well as how one can fix it.
The current message is:
Traceback (most recent call last):
File "/home/justinchu/dev/onnx-script/playground/test_func.py", line 13, in <module>
result = LeakyRelu(np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float32), 0.1)
File "/home/justinchu/dev/onnx-script/onnxscript/values.py", line 192, in __call__
return self._usercall(*args, **kwargs)
File "/home/justinchu/dev/onnx-script/onnxscript/values.py", line 203, in _usercall
raise TypeError(f"Unexpected input type {type(a)} for an input {i}.")
TypeError: Unexpected input type <class 'float'> for an input 1.
FloatType = FLOAT16 | FLOAT | DOUBLE

def aten_relu6(self: FloatType) -> FloatType:
    zero = op.CastLike(op.Constant(value_float=0.0), self)
    return op.Max(self, zero)
raises
if fn.returns:
    returntype = self.eval_constant_expr(fn.returns)
    if isinstance(returntype, tuple):
        assert all(ta.is_valid(t) for t in returntype)
        self.returntype = returntype
    else:
>       assert ta.is_valid(returntype)
E       AssertionError
cc @gramalingam
Potentially have a way to specify tensor shapes while authoring scripts.
References:
def aten_elu__int(
    self: IntType,
    alpha: float = 1.0,
    scale: Annotated[float, Is[lambda x: x == 1.0]] = 1.0,
    input_scale: Annotated[float, Is[lambda x: x == 1.0]] = 1.0,
) -> TensorType:
    return op.Elu(op.Cast(self, to=onnxscript.FLOAT), alpha=alpha)
pylance will complain:
Argument of type "BFLOAT16 | BOOL | DOUBLE | FLOAT | FLOAT16 | INT16 | INT32 | INT64 | INT8 | STRING | UINT16 | UINT32 | UINT64 | UINT8" cannot be assigned to parameter "X" of type "DOUBLE | FLOAT | FLOAT16" in function "Elu"
Type "BFLOAT16 | BOOL | DOUBLE | FLOAT | FLOAT16 | INT16 | INT32 | INT64 | INT8 | STRING | UINT16 | UINT32 | UINT64 | UINT8" cannot be assigned to type "DOUBLE | FLOAT | FLOAT16"
Type "BFLOAT16" cannot be assigned to type "DOUBLE | FLOAT | FLOAT16"
"BFLOAT16" is incompatible with "DOUBLE"
"BFLOAT16" is incompatible with "FLOAT"
"BFLOAT16" is incompatible with "FLOAT16"
Making to= take a generic and having it return a TypeGuard may help.
https://mypy.readthedocs.io/en/stable/type_narrowing.html#user-defined-type-guards
cc @abock
Suggested by Justin:
>We should consider limiting the github token's permission for this job:
>https://docs.github.com/en/actions/security-guides/automatic-token-authentication#example-1-passing-the-github_token-as-an-input
>https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions
I created a function in which I forgot to add a return statement:
def aten_lt(self, other):
    # lt.Tensor(Tensor self, Tensor other) -> Tensor
    # TODO(justinchuby): Input spec: non bool tensor
    # Boolean inputs can be pre-casted by policy
    op.Less(self, other)
The message is
onnxscript/main.py:105: in transform
result = script_check(ast, opset, env, src, default_opset=default_opset)
onnxscript/main.py:53: in script_check
return convert.top_level_stmt(f)
onnxscript/converter.py:1337: in top_level_stmt
analysis.do_liveness_analysis(stmt, self.message)
onnxscript/analysis.py:157: in do_liveness_analysis
live = visit(s, live)
onnxscript/analysis.py:98: in visit
live = do_visit(stmt, live_out)
onnxscript/analysis.py:152: in do_visit
raise ValueError(formatter(stmt, f"Unsupported statement type {type(stmt)!r}."))
onnxscript/converter.py:235: in message
return self.source_of(node).msg(error_msg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <onnxscript.converter.Converter object at 0x7f257b1f5e70>, node = <ast.Expr object at 0x7f257b1f5d80>
def source_of(self, node: ast.AST) -> sourceinfo.SourceInfo:
> return sourceinfo.SourceInfo(node, self.source, self.current_fn.name)
E AttributeError: 'NoneType' object has no attribute 'name'
onnxscript/converter.py:231: AttributeError
I used the main branch with pytest onnxscript/test and got a failure on onnxscript/test/eager_test.py::TestOnnxSignal::test_dft_rstft_istft.
Tracking issue for fixing B023 flake8 errors
Traceback (most recent call last):
File "/home/justinchu/dev/onnx-script/onnxscript/poc/os_graph_builder.py", line 145, in <module>
onnx_model = gb.make_model(model_name)
File "/home/justinchu/dev/onnx-script/onnxscript/poc/os_graph_builder.py", line 77, in make_model
checker.check_model(self.onnx_model)
File "/home/justinchu/anaconda3/envs/onnx/lib/python3.10/site-packages/onnx/checker.py", line 106, in check_model
C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Graph must be in single static assignment (SSA) form, however 'other' has been used as output names multiple times.
From this script (note that other is reassigned inside the if branch, so the generated graph uses 'other' as an output name twice):
@script()
def aten_add(self, other, alpha: float = 1) -> TensorType:
    # add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
    if alpha != 1:
        other = op.Mul(other, alpha)  # type: ignore[arg-type]
    return op.Add(self, other)
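A possible workaround sketch, assuming the builder requires each output name to be assigned exactly once: use a fresh name for the scaled value, with op.Identity in the else branch to satisfy liveness across the branches:

@script()
def aten_add(self, other, alpha: float = 1) -> TensorType:
    # add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
    if alpha != 1:
        scaled = op.Mul(other, alpha)
    else:
        scaled = op.Identity(other)
    return op.Add(self, scaled)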
cc @gramalingam
We should use the "import module" form of imports rather than "from module import name" (e.g. in __init__). Two reasons:
Prevent circular import errors (Programming FAQ — Python 3.10.4 documentation): circular imports are fine where both modules use the "import <module>" form of import. They fail when the second module wants to grab a name out of the first ("from module import name") and the import is at the top level. That's because names in the first module are not yet available while the first module is busy importing the second.
Clean namespace: for example, readers don't need to backtrack to see that sleep is a function from time, as opposed to a function defined in the file. https://google.github.io/styleguide/pyguide.html#22-imports
ORT doesn't support float64 and int for many ops, which impacts test coverage.
In eager mode, the tensors currently need to be np arrays. We can leverage numpy's conversion capability (https://stackoverflow.com/questions/40378427/numpy-formal-definition-of-array-like-objects#:~:text=%22Array%2Dlike%22%20is%20more,interpret%20it%20as%20an%20array.). Specifically, it would be nice to support torch.Tensors, which implement the __array__ method. I think wrapping everything with np.array will do, and it shouldn't create extra copies?
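A quick check of the sharing behavior; note that np.asarray, unlike np.array, does not copy when the input's __array__ already yields a compatible buffer (true for CPU torch tensors):

import numpy as np
import torch  # torch.Tensor implements __array__

t = torch.tensor([1.0, 2.0, 3.0])
a = np.asarray(t)  # reuses the CPU tensor's buffer; np.array(t) would copy by default
t[0] = 42.0
print(a[0])  # 42.0, confirming the memory is shared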
(a) We would like to be able to run onnxscript functions by calling ORT to execute each primitive op. This will allow us to debug function definitions more easily.
As an example, consider the following onnxscript code fragment, which defines a gemm followed by gelu in terms of more primitive operations.
import oxs

def gemmgelu(
    A: FLOAT[2048, 16],
    W: FLOAT[16, 4096],
    Bias: FLOAT[4096],
) -> FLOAT[2048, 4096]:
    a = oxs.Constant(value_float=0.5)
    b = oxs.Constant(value_float=0.797885)
    c = oxs.Constant(value_float=0.035677)
    one = oxs.Constant(value_float=1.0)
    P1 = oxs.MatMul(A, W)
    X = oxs.Add(P1, Bias)
    T1 = oxs.Mul(X, X)
    T2 = oxs.Mul(c, T1)
    T3 = oxs.Add(b, T2)
    T4 = oxs.Mul(X, T3)
    T5 = oxs.Tanh(T4)
    T6 = oxs.Add(one, T5)
    T7 = oxs.Mul(X, T6)
    Y = oxs.Mul(a, T7)
    return Y

print(gemmgelu(a, w, b))
We would like to execute this in a standard python debugger, by calling on ORT kernels to execute each op.
Finally, we would also like to be able to use test cases defined for the function-op (e.g., from onnx) to check the correctness of a function definition.
Otherwise there will be cyclic imports, as seen in #204 (comment). We may be able to do this by updating the ImportAdjuster.
Reference: https://github.com/microsoft/onnx-script/wiki/Coding-style
ModuleNotFoundError: No module named 'autopep8' when running pytest onnxscript/test
Creating a pile-on issue for problems with lintrunner. lintrunner does not work appropriately on Windows; this is a barrier to any dev working on Windows. WSL should not be a workaround.
Timed out:
WIP in https://microsoft.sharepoint.com/:w:/t/ONNX2/EcUaSHDlDiBFvGGX5BC49Z0B0mHhO7s_6uLeVrOoDE4n2w?e=ZXXIgy. Will move here when stable.
This design doc reflects our thinking up to Jan 2023. The design may have evolved since then, especially around what we call a "GraphBuilder" in this doc, and some assumptions may no longer be accurate, but it should capture the gist of the torch_lib.
Created: December 2022
Updated: January 2023
Authors: @justinchuby , @fatcat-z , @xiaowuhu
This document aims to provide a design of the ATen Op Library in ONNX
Script and its integration with the new torch.fx exporter.
https://github.com/microsoft/onnx-converters-private/issues/124
We design the onnxscript ATen library and its integration with exporters as three major components: the function library itself, the GraphBuilder, and the exporter-side logic.
The function library is a collection of ONNX functions written in
onnxscript. As such, it only contains logic representable by an ONNX
function proto. Each function matches, as closely as possible, the
signature of a torch.ops.aten operator. The function decomposes the ATen
operator with one or more ONNX operators.
A function has two roles: (1) decompose the ATen op to ONNX IR, and
(2) specify the requirements for inputs (expected input types, etc.).
(2) is necessary because an ONNX function cannot handle dtype-dependent
logic. We will need additional code around the functions to bridge the
inputs and/or dispatch to different function overloads on the exporter
side. Requirements for inputs serve as meta information that the
exporter can leverage to decide which function to use.
Based on the constraints of ONNX functions and the principle of separation
of responsibilities, a function does not verify inputs or
handle/emit any errors. It should be regarded as a description of
the decomposition logic (which gets translated into a proto). Components
that leverage the functions are responsible for verifying inputs.
All functions will be tested against PyTorch's OpInfo database, using
onnxscript's eager mode, to make sure the numerical output matches that
of the ATen operator.
The function library additionally provides a default op-name-to-function
mapping for lookup in the exporter.
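A sketch of what that default mapping might look like; the names follow the aten_* convention used in the examples above, but the table itself is hypothetical:

# Hypothetical default lookup table from ATen op name to onnxscript function(s)
DEFAULT_MAPPING = {
    "aten::add.Tensor": aten_add,
    "aten::elu": (aten_elu, aten_elu__int),  # overloads; the exporter picks one by input dtype
}

def default_function_for(aten_name: str):
    return DEFAULT_MAPPING[aten_name]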
Having all the functions built, we still need a mechanism to synthesize
them into an ONNX model graph. In the current version of the torch.onnx
exporter, we use torchscript as the IR for the graph. As we envision the
new exporter, it makes sense for all onnx-related logic to be handled by
onnx. The proposed GraphBuilder component will replace torchscript to
store the graph being built, and provide limited but necessary graph
manipulation (TBD, e.g. graph stitching) capabilities. Its main
responsibilities are:
- Maintain one, or potentially more, representations of ONNX (sub)graphs and (TBD) the relationship between the framework-native input/output/operator representation and its internal representation.
- Provide an API to acquire the operator information from the exporter, as well as inputs/outputs of the whole graph.
- Define the operator information needed and a protocol for the exporter to supply the information. The protocol should define traits that will be implemented by the exporter.
- Provide a protocol for the exporter to supply stack traces and diagnostics information, and preserve them through the exported model in a sensible representation (most likely SARIF).
- Provide the capability of executing any (sub)graph being built for debuggability, such that users can e.g. examine any intermediate results or run dissect algorithms on the partial graphs and programmatically compare results.
- Serialize graphs into ONNX model proto.
- Build in guards for catching graph errors during graph construction and provide useful error messages, leveraging onnxscript/onnx capabilities and input requirements specified by individual ONNX functions.
- Provide a general mechanism to insert glue nodes to bridge dtype discrepancies between input tensors and what functions expect, based on the function input requirements (see the sketch after this list).
- Ensure the produced onnx model is valid, and produce diagnosable errors as early as it can.
- (TBD) Maintain a Python source representation of the model when a model cannot be represented solely by onnx.
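A rough sketch of the glue-node mechanism named in the list above; builder.op is a hypothetical GraphBuilder method, and the allowed dtypes would come from the function's input requirements:

import onnx

# Hypothetical: the dtypes the target function accepts, from its input requirements
ALLOWED_DTYPES = {onnx.TensorProto.FLOAT, onnx.TensorProto.FLOAT16, onnx.TensorProto.DOUBLE}

def bridge_dtype(builder, value, dtype):
    # Insert a Cast glue node only when the incoming dtype is not accepted
    if dtype not in ALLOWED_DTYPES:
        return builder.op("Cast", value, to=onnx.TensorProto.FLOAT)
    return value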
Graph optimization and transformations are out of scope in this design.
To provide a consistent experience for both exporter developers and
onnxscript authors, we propose to use the same interface as onnxscript
ops to produce the graph, by implementing a graph-capturing Evaluator
(which internally talks to the GraphBuilder).
The following example builds an ONNX graph:
graphbuilder = GraphBuilder()
a = graphbuilder.input(dtype=..., shape=[2, 3])
b = graphbuilder.input(dtype=..., shape=...)
c = op.Mul(a, b)
d = aten_lib.ops.elu(c) # A function from the function library
print(a.rank()) # Python logic that is not possible in onnx functions
graphbuilder.set_output(c, d)
This way exporter developers and other users do not have to use the
lower-level graph manipulation APIs provided by the GraphBuilder, like
make_node, etc.
TODO
TODO
The exporter is responsible for capturing a computation graph and
transforming it into a composition of ATen ops. It should also
We envision the export process roughly as the following.
A core goal of this design is to provide a delightful experience. We
consider the experience from the perspectives of (1) onnx-script ATen
library authors, (2) exporter developers, and (3) exporter users.
A delightful experience should include
TODO
We code-gen the function signatures from native_functions.yaml from
pytorch to ensure the correctness of the function signatures. This will
serve as a one-time tool to get us started. Having a tool that keeps the
implementation updated would be more work than desired and is not planned.
Example generated signatures: #223
To support graph conversion in the exporter, we focus on ensuring that
the model is correct and complete: keeping if nodes in a function where
the logic requires them, and inserting cast nodes when needed to ensure
type correctness. Graph optimization is out of scope and is assumed to
be done by downstream optimization passes and runtimes.
The ATen function library assumes all inputs are non-quantized.
Quantized tensors and operators should be managed by the exporter.
GraphBuilder can provide a procedure to represent a dequant-requant
block.
TBD. Needs input.
TODO: This needs a separate design
See GraphBuilder
To ensure correctness and scale coverage. We can also set breakpoints
and trace code execution easily. TODO
Functions call functions. We need delayed script compilation. TODO
Maintain invariants across components. Avoid raising errors in the
middle of computation logic. TODO
Most of the tests are to make sure the aten functions in onnxscript are
robust to different inputs and match the torch eager mode output. We
will leverage the PyTorch OpInfo database
for generating sample test inputs and the onnxscript eager mode
evaluation for getting the function output.
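A minimal sketch of such a test, assuming aten_relu6 from the function library and PyTorch's internal (and subject-to-change) OpInfo database:

import numpy as np
import torch
from torch.testing._internal.common_methods_invocations import op_db

op_info = next(op for op in op_db if op.name == "nn.functional.relu6")
for sample in op_info.sample_inputs("cpu", torch.float32):
    expected = op_info.op(sample.input, *sample.args, **sample.kwargs)
    actual = aten_relu6(sample.input.numpy())  # onnxscript eager mode
    np.testing.assert_allclose(actual, expected.numpy(), rtol=1e-4)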
Quantized ops tend to not have OpInfos. We can (1) Create simple tests
(2) work with the torch team to define OpInfos for those ops when high
coverage is needed.
PyTorch has a decomp (_decomp,_prim) library that decomposes aten ops
with more primitive aten ops. It is currently used by torch dynamo for
easing backend support. While we should leverage this feature in the
dynamo implementation, we should still aim to implement most of the
decompositions in this library so that code can be reused for other
frameworks and provide enough information for the downstream compiler
optimization.
Response: This includes dtype-dependent logic. While it cannot be
expressed by ONNX functions, we can use the graph building experience to
capture the logic statically into the ONNX model.
Response: We need to take examples and design the graphbuilder
capability to support this kind of subgraphs.
...
Example
We define the ATen add operator in onnxscript as:

def aten_add(self, other, alpha: float = 1) -> TensorType:
    # add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
    if alpha != 1:
        other = op.Mul(other, alpha)
    return op.Add(self, other)
Note that there is an attribute (alpha) that requires a conditional.
Response: The library and the exporter are concerned with the
correctness of the model and preserve as much logic as possible.
Optimization is the responsibility of downstream passes and runtimes.
"Functions"
, "onnx functions"
, "onnx functions written in onnxscript"
are used interchangeably."Function lib"
The ATen op library being described in this document."Proto"
An onnx protobuf sterilization."Input/output"
When not clearly stated, they refer to the input and"Op"
, "operator"
An operator in a computation graph."ATen Op".
A computational operator defined by PyTorch, e.g. one"torch IR"
defined in"Exporter"
, "Converter"
Model converters. Usually the torch.onnx@justinchuby is using this issue to track experiments
__array__
)onnx script is going to be great for the conversion process in pytorch. The torch exporter uses the onnx dialect in TorchScript, before going through a few additional passes and eventually generating an onnx proto. (So we cannot create onnx protos directly)
It would be very helpful to be able to delegate the graph building process to another object/entity. For example, we can create a wrapper around torch's graph.op method so that each graph building call is delegated to graph.op, allowing it to build a torchscript graph.
One way of doing this could be exposing the graph building APIs so we don't need to rely on @script and the source code for constructing the graph.
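A rough sketch of that delegation, with hypothetical class and method names (only graph.op mirrors the actual torchscript API):

class TorchScriptGraphBuilder:
    """Hypothetical adapter: route onnxscript op calls into a torchscript graph."""

    def __init__(self, graph):
        self.graph = graph  # a torch._C.Graph

    def eval_op(self, op_name, inputs, attrs):
        # Each onnxscript op invocation becomes a graph.op call
        return self.graph.op(op_name, *inputs, **attrs)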
cc @BowenBao
Currently, onnx-script depends on ort when creating any ops. It would be valuable to not depend on ort, keeping onnx-script lean when adopted in other projects.
Looks like the default opset is opset 14. It would then be helpful to have a latest_opset
ONNX 1.13 produced some errors in tests: https://github.com/microsoft/onnx-script/actions/runs/3678076190/jobs/6220897226
E onnx.onnx_cpp2py_export.checker.ValidationError: Field 'type' of 'value_info' is required but missing.
It looks like TensorType is not a type. When I did

def LeakyRelu(input, negative_slope: FLOAT | float = 0.01, inplace: BOOL | bool = False):
    ...

I got

TypeError: unsupported operand type(s) for |: 'TensorType' and 'type'
Goal: support beartype-like runtime type checkers for tensor shape and type checking.
Question: do we need it, or does type checking come for free with onnx?
Related: