stochasticai / x-stable-diffusion

Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6

Home Page: https://stochastic.ai

License: Apache License 2.0

Languages: Python 33.51%, Makefile 0.08%, Dockerfile 0.14%, Jupyter Notebook 62.99%, MDX 3.28%
Topics: inference, pytorch, stable-diffusion, tensorrt, aitemplate, nvfuser, cuda, onnx, onnxruntime, notebook

x-stable-diffusion's Introduction

Stochastic.ai


Welcome to x-stable-diffusion by Stochastic!

This project is a compilation of acceleration techniques for the Stable Diffusion model to help you generate images faster and more efficiently, saving you both time and money.

With example images and a comprehensive benchmark, you can easily choose the best technique for your needs. When you're ready to deploy, our CLI called stochasticx makes it easy to get started on your local machine. Try x-stable-diffusion and see the difference it can make for your image generation performance and cost savings.

🚀 Installation

Quickstart

Make sure you have Python and Docker installed on your system.

  1. Install the latest version of the stochasticx library:
pip install stochasticx
  2. Deploy the Stable Diffusion model:
stochasticx stable-diffusion deploy --type aitemplate

Alternatively, you can deploy Stable Diffusion without our CLI by following the steps here.

  3. Perform inference with the deployed model:
stochasticx stable-diffusion inference --prompt "Riding a horse"

Check all the options of the inference command:

stochasticx stable-diffusion inference --help
  4. Get the logs of the deployment by executing the following command:
stochasticx stable-diffusion logs
  5. Stop and remove the deployment with this command:
stochasticx stable-diffusion stop
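
If you prefer to call the deployed model directly instead of going through the CLI, the deployment serves an HTTP API with a /predict route (visible in the server logs). A minimal sketch, assuming the service listens on localhost port 5000 and accepts a JSON body with a `prompt` field — both of these are assumptions, so check `stochasticx stable-diffusion logs` for the actual port and request schema:

```python
import requests

# Hypothetical endpoint: the /predict route appears in the server logs,
# but the port and payload schema below are assumptions -- verify them
# against the deployment logs before relying on this.
url = "http://localhost:5000/predict"
payload = {"prompt": "Riding a horse"}

response = requests.post(url, json=payload, timeout=300)
response.raise_for_status()

# Save the returned image bytes (assuming the server returns raw image data).
with open("output.png", "wb") as f:
    f.write(response.content)
```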

How to get less than 1s latency?

Set num_inference_steps to 30. With this setting, an image can be generated in 0.88 seconds.

{
  'max_seq_length': 64,
  'num_inference_steps': 30, 
  'image_size': (512, 512) 
}

You can also experiment with reducing the image_size.
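
For reference, this is how the same settings map onto a plain diffusers pipeline. This is only an illustrative sketch: the model ID is an assumption, and the optimized AITemplate/TensorRT pipelines in this repo apply these parameters internally rather than through this API.

```python
import torch
from diffusers import StableDiffusionPipeline

# Plain diffusers equivalent of the settings above (illustrative only;
# the optimized pipelines in this repo are what reach ~0.88 s).
# The model ID below is an assumption, not necessarily the one used here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "Riding a horse",
    num_inference_steps=30,   # fewer denoising steps -> lower latency
    height=512,
    width=512,                # reducing image_size cuts latency further
).images[0]
image.save("horse.png")
```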

How to run on Google Colab?

In each folder, we provide a Google Colab notebook with which you can test the full flow and run inference on a T4 GPU.

Manual deployment

Check the README.md of the following directories:

🔥 Optimizations

Benchmarks

Setup

For hardware, we used a single 40 GB A100 GPU with CUDA 11.6; the results are reported as the average of 50 runs.

The following arguments were used for image generation for all the benchmarks:

{
  'max_seq_length': 64,
  'num_inference_steps': 50, 
  'image_size': (512, 512) 
}
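
A minimal sketch of how such a measurement can be taken (an assumed harness for illustration, not necessarily the exact benchmark script used for these numbers): a few warm-up runs, then CUDA-synchronized timing averaged over 50 iterations.

```python
import time
import torch

def benchmark(run_once, warmup=3, iters=50):
    """Average wall-clock latency of `run_once` over `iters` runs.

    `run_once` is any zero-argument callable that generates one image.
    """
    for _ in range(warmup):           # warm-up: CUDA context, kernel caches
        run_once()
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    torch.cuda.synchronize()          # wait for all GPU work to finish
    return (time.perf_counter() - start) / iters

# Example: latency = benchmark(lambda: pipe(prompt, num_inference_steps=50))
```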

Online results

For batch_size 1, these are the latency results:

A100 GPU

[A100 GPU latency graph]

| Project | Latency (s) | GPU VRAM (GB) |
| --- | --- | --- |
| PyTorch fp16 | 5.77 | 10.3 |
| nvFuser fp16 | 3.15 | --- |
| FlashAttention fp16 | 2.80 | 7.5 |
| TensorRT fp16 | 1.68 | 8.1 |
| AITemplate fp16 | 1.38 | 4.83 |
| ONNX (CUDA) | 7.26 | 13.3 |

T4 GPU

Note: AITemplate might not support the T4 GPU yet. Check support here.

[T4 GPU latency graph]

| Project | Latency (s) |
| --- | --- |
| PyTorch fp16 | 16.2 |
| nvFuser fp16 | 19.3 |
| FlashAttention fp16 | 13.7 |
| TensorRT fp16 | 9.3 |

Batched results - A100 GPU

The following results were obtained by varying batch_size from 1 to 24.

[A100 GPU latency vs. batch size graph]

| Project \ batch size | 1 | 4 | 8 | 16 | 24 |
| --- | --- | --- | --- | --- | --- |
| PyTorch fp16 | 5.77s / 10.3GB | 19.2s / 18.5GB | 36s / 26.7GB | OOM | |
| FlashAttention fp16 | 2.80s / 7.5GB | 9.1s / 17GB | 17.7s / 29.5GB | OOM | |
| TensorRT fp16 | 1.68s / 8.1GB | OOM | | | |
| AITemplate fp16 | 1.38s / 4.83GB | 4.25s / 8.5GB | 7.4s / 14.5GB | 15.7s / 25GB | 23.4s / 36GB |
| ONNX (CUDA) | 7.26s / 13.3GB | OOM | OOM | OOM | OOM |

Note: at larger batch sizes, TensorRT fails to convert the UNet model from ONNX to TensorRT due to memory issues.

Sample images generated

Click here to view the complete list of generated images

Prompts used: "Super Mario learning to fly in an airport, Painting by Leonardo Da Vinci", "The Easter bunny riding a motorcycle in New York City", and "Drone flythrough of a tropical jungle covered in snow".

[Sample images for each prompt generated with PyTorch fp16, nvFuser fp16, FlashAttention fp16, TensorRT fp16 and AITemplate fp16]

References

🌎 Join our community

Join our Discord community: https://discord.com/invite/TgHXuSJEk6

🌎 Contributing

As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.


For managed hosting on our cloud or on your private cloud, [Contact us →](https://stochastic.ai/contact)

x-stable-diffusion's People

Contributors

glennko, kamalkraj, marcosriveramartinez, riccardoromagnoli, sarthaklangde, semgrep-bot, stochasticromanageev, subhash-stc, toan-do, tushar2407


x-stable-diffusion's Issues

Can not reproduce the TensorRT result

I have tried the process described in this repo for TensorRT, but I cannot reproduce the reported TensorRT latency on an A100.
My fp16 result is about 3.2 s, which is higher than the number you posted.

UNet ONNX file issue?

Your Colab example works perfectly well. I am able to convert all ONNX models to TensorRT engines with slight modifications. But when I generate an ONNX file for the UNet model using the default diffusers ONNX export script, that ONNX file is not convertible with the Colab notebook you provided. I have also tested on my own PC with an RTX 3090.

So my question is: how did you generate the ONNX file for the UNet (the one used in the Colab tutorial)?
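
For anyone hitting the same question, here is a minimal sketch of exporting the diffusers UNet to ONNX with `torch.onnx.export`, assuming Stable Diffusion 1.x shapes at 512x512 and an assumed model ID; this is not necessarily how the notebook's ONNX file was produced.

```python
import torch
from diffusers import UNet2DConditionModel


class UNetWrapper(torch.nn.Module):
    """Return a plain tensor instead of the diffusers output dataclass,
    which keeps torch.onnx.export happy."""

    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states).sample


# Assumed model ID and shapes for SD 1.x at 512x512 (batch of 2 for
# classifier-free guidance); adjust to match your checkpoint.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).eval()
wrapper = UNetWrapper(unet)

sample = torch.randn(2, 4, 64, 64)                 # latents
timestep = torch.tensor(999)                       # denoising timestep
encoder_hidden_states = torch.randn(2, 77, 768)    # CLIP text embeddings

torch.onnx.export(
    wrapper,
    (sample, timestep, encoder_hidden_states),
    "unet.onnx",
    input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    opset_version=16,
)
```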

AttributeError: module 'numpy' has no attribute 'bool'

After generating unet.engine and then running "python demo.py", I got the following errors. Does anyone know about this issue? Thanks!
(env_py39) [jch@localhost TensorRT]$ python demo.py
[12/27/2022-06:37:00] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
/home/jch/anaconda3/envs/env_py39/lib/python3.9/site-packages/tensorrt/__init__.py:166: FutureWarning: In the future np.bool will be defined as the corresponding NumPy scalar. (This may have returned Python scalars in past versions.
bool: np.bool,
Traceback (most recent call last):
File "/home/jch/work/x-stable-diffusion/TensorRT/demo.py", line 125, in <module>
model = TrtDiffusionModel(args)
File "/home/jch/work/x-stable-diffusion/TensorRT/demo.py", line 48, in __init__
self.unet = TRTModel(args.trt_unet_save_path)
File "/home/jch/work/x-stable-diffusion/TensorRT/trt_model.py", line 41, in __init__
self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers(
File "/home/jch/work/x-stable-diffusion/TensorRT/trt_model.py", line 79, in allocate_buffers
dtype = trt.nptype(engine.get_binding_dtype(binding))
File "/home/jch/anaconda3/envs/env_py39/lib/python3.9/site-packages/tensorrt/__init__.py", line 166, in nptype
bool: np.bool,
File "/home/jch/anaconda3/envs/env_py39/lib/python3.9/site-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'bool'
Segmentation fault
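
For context: NumPy 1.24 removed the deprecated np.bool alias, which the installed TensorRT Python bindings still reference. A common workaround (a suggestion, not an official fix from this repo) is to pin NumPy to an older release before running demo.py:

pip install "numpy<1.24"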

Tensorrt converter

Thank you for your great work. I am trying to follow your instructions to convert the UNet to TensorRT format; however, I get the following errors:

  • When converting the model to ONNX, I got stuck because the att:ScaledDotProductAttention op is not supported. I found a solution on the internet by adding a customized function for scaled dot-product attention. However, I hit the issue again when trying to convert the model from ONNX to TensorRT. I just want to ask whether you have ever encountered this error, and if so, how you managed to fix it. Thank you!

PyTorch Baseline perhaps too weak?

I am wondering if the PyTorch baseline is actually optimized enough? Specifically, could you

  • Remove autocast since the model is already in FP16? Autocast would actually convert some other non-GEMM fp16 kernels to FP32 (or TF32 in the case of Ampere GPUs)
  • Run some warm up iteration before measuring the inference latency (averaged across a few)? Like how you did it with TensorRT
  • Use flags such as torch.backends.cudnn.benchmark = True before running GPU kernels.

On my local machine, just these optimizations (for lack of a better word as they are not really optimizations) would make PyTorch baseline at least 2X faster.
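
For reference, a minimal sketch of a baseline incorporating these suggestions (assuming a stock diffusers fp16 pipeline and an assumed model ID; this is not the repo's benchmark script):

```python
import time
import torch
from diffusers import StableDiffusionPipeline

torch.backends.cudnn.benchmark = True  # let cuDNN autotune conv kernels

# Model weights are already fp16, so no autocast context is used around the call.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Riding a horse"
for _ in range(3):                      # warm-up iterations
    pipe(prompt, num_inference_steps=50)

torch.cuda.synchronize()
start = time.perf_counter()
runs = 10
for _ in range(runs):
    pipe(prompt, num_inference_steps=50)
torch.cuda.synchronize()
print(f"avg latency: {(time.perf_counter() - start) / runs:.2f}s")
```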

ValueError: only one element tensors can be converted to Python scalars--Error in line 88 of demo.py:

Hey friends,

Interested in getting the notebook running, would love to solve this! Thanks in advance!

Traceback (most recent call last):
File "/content/drive/MyDrive/x-stable-diffusion/TensorRT/demo.py", line 114, in
image = model.predict(
File "/content/drive/MyDrive/x-stable-diffusion/TensorRT/demo.py", line 88, in predict
latents = self.scheduler.step(noise_pred.cuda(), i, latents)["prev_sample"]
File "/usr/local/lib/python3.8/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py", line 218, in step
step_index = (self.timesteps == timestep).nonzero().item()
ValueError: only one element tensors can be converted to Python scalars

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc

I've been bouncing around various StableDiffusion optimisations the last couple of weeks, and figured I would link out to some of the ones I remember in hopes that they can be explored/added into the benchmarks/comparisons here:

Frameworks supporting Mac

I would like to know which frameworks support Mac, ideally using Metal Performance Shaders (MPS) or the Apple Neural Engine (ANE).
It would also be nice to see how these benchmark against Windows and Linux.
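
As a reference point while exploring this, plain PyTorch already runs Stable Diffusion on Apple Silicon via the MPS backend. A minimal sketch with an assumed model ID; this is not one of the optimized pipelines in this repo, and ANE support would require Core ML instead.

```python
import torch
from diffusers import StableDiffusionPipeline

# Runs the stock diffusers pipeline on Apple's Metal Performance Shaders.
# This is a plain-PyTorch reference point, not an optimized backend.
device = "mps" if torch.backends.mps.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to(device)

image = pipe("Riding a horse", num_inference_steps=30).images[0]
image.save("horse_mps.png")
```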

stochasticx stable-diffusion infer --prompt "Riding a horse" error

INFO: 172.17.0.1:53344 - "POST /predict/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 404, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
raise exc
File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
response = await func(request)
File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 231, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 160, in run_endpoint_function
return await dependant.call(**values)
File "/code/./server.py", line 37, in predict
images = pipe(
File "/usr/local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/code/./pipeline_stable_diffusion_ait.py", line 310, in __call__
noise_pred = self.unet_inference(
File "/code/./pipeline_stable_diffusion_ait.py", line 105, in unet_inference
exe_module.run_with_tensors(inputs, ys, graph_mode=True)
File "/usr/local/lib/python3.9/site-packages/aitemplate/compiler/model.py", line 483, in run_with_tensors
outputs_ait = self.run(
File "/usr/local/lib/python3.9/site-packages/aitemplate/compiler/model.py", line 438, in run
return self._run_impl(
File "/usr/local/lib/python3.9/site-packages/aitemplate/compiler/model.py", line 377, in _run_impl
self.DLL.AITemplateModelContainerRun(
File "/usr/local/lib/python3.9/site-packages/aitemplate/compiler/model.py", line 192, in _wrapped_func
raise RuntimeError(f"Error in function: {method.__name__}")
RuntimeError: Error in function: AITemplateModelContainerRun
ERROR:uvicorn.error:Exception in ASGI application
(the same traceback as above is repeated)
INFO: 172.17.0.1:53346 - "POST /predict HTTP/1.1" 307 Temporary Redirect
0it [00:00, ?it/s][22:02:37] ./tmp/UNet2DConditionModel/model-generated.h:3235: Got error: no error enum: 1 at ./tmp/UNet2DConditionModel/model-generated.h: 4757
[22:02:37] ./tmp/UNet2DConditionModel/model_interface.cu:102: Error: Got error: no error enum: 1 at ./tmp/UNet2DConditionModel/model-generated.h: 4757
0it [00:00, ?it/s]
INFO: 172.17.0.1:53346 - "POST /predict/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
(the same traceback as above is repeated for the second request)

a bug in /Tensorrt/demo.py

latents = self.scheduler.step(noise_pred.cuda(), t, latents)["prev_sample"]

I think there is a bug in this code; it should be:

latents = self.scheduler.step(noise_pred.cuda(), i, latents)["prev_sample"]

because if I use t as the input, it raises IndexError: index 999 is out of bounds for dimension 0 with size 51, and when I change t to i the code runs correctly.

tensorrt conversion fails

Hi. I am trying to convert my ONNX file to TensorRT using convert_unet_to_tensorrt.py, but I get the error below:

Traceback (most recent call last):
  File "convert_unet_to_tensorrt.py", line 52, in <module>
    convert(args)
  File "convert_unet_to_tensorrt.py", line 47, in convert
    f.write(serialized_engine)
TypeError: a bytes-like object is required, not 'NoneType' 

Except for the ONNX file mentioned in the repo, none of the other ONNX files seem to convert. Any fix? Thanks.
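
For context: the `NoneType` usually means the TensorRT builder failed and returned `None` instead of a serialized engine, so the real error is earlier in the log. A minimal sketch (a suggestion; convert_unet_to_tensorrt.py may differ) that surfaces the parser/builder errors and guards the write:

```python
import tensorrt as trt

# A verbose logger surfaces the parser/builder errors that explain why the
# engine comes back as None (unsupported ops, insufficient memory, ...).
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("unet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)

serialized_engine = builder.build_serialized_network(network, config)
if serialized_engine is None:
    raise SystemExit("TensorRT engine build failed; see log above")

with open("unet.engine", "wb") as f:
    f.write(serialized_engine)
```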
