torchpipe / torchpipe

An Alternative for Triton Inference Server. Boosting DL Service Throughput 1.5-4x by Ensemble Pipeline Serving with Concurrent CUDA Streams for PyTorch/LibTorch Frontend and TensorRT/CVCUDA, etc., Backends

Home Page: https://torchpipe.github.io/

License: Apache License 2.0


torchpipe's Introduction

torchpipe

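A multithreaded pipeline-parallel library inside PyTorch

torchpipe is a multi-instance pipeline-parallel library that sits between low-level acceleration libraries (such as TensorRT, OpenCV, CVCUDA, and ppl.cv) and RPC frameworks (such as Thrift and gRPC), and is strictly decoupled from both. Outward, it exposes a thread-safe function interface for the PyTorch frontend; inward, it offers users fine-grained backend extension points.

torchpipe is an alternative to Triton Inference Server. Its main functionality is similar to Triton's shared GPU memory, Ensemble, and BLS mechanisms.

Production grade: within NetEase's 网易智企, torchpipe supports a massive number of calls every day.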

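Notes

We recommend testing the consistency of results under concurrent requests from multiple clients in one of the following two ways:
- A small number of inputs (for example, 10 images): verify online that each image yields the same output every time.
- A large number of inputs (for example, 10,000 images): save the results offline and verify consistency across multiple runs.

With TensorRT and max_batch_size=4, results often differ between a batch of 1 and a batch of 4; this is expected. Even so, a fixed input should produce only a limited number of distinct results (typically 2).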
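The small-input check described above can be sketched with the standard library alone. This is a minimal illustration, not part of torchpipe: `fake_model` is a hypothetical stand-in for a thread-safe inference call, the byte strings stand in for images, and `max_variants` is an assumed knob for how many distinct results per input you are willing to accept.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fake_model(data: bytes) -> bytes:
    """Hypothetical stand-in for a thread-safe inference call."""
    return hashlib.sha256(data).digest()


def collect_distinct_outputs(model_fn, inputs, num_clients=8, repeats=20):
    """Send every input `repeats` times from `num_clients` threads and
    return, per input index, the set of distinct outputs observed."""
    jobs = [(i, x) for i, x in enumerate(inputs) for _ in range(repeats)]
    distinct = defaultdict(set)
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        # pool.map preserves job order, so outputs pair up with jobs.
        outputs = pool.map(lambda job: model_fn(job[1]), jobs)
        for (index, _), output in zip(jobs, outputs):
            distinct[index].add(output)
    return dict(distinct)


inputs = [bytes([i]) * 16 for i in range(10)]  # stands in for 10 images
distinct = collect_distinct_outputs(fake_model, inputs)

# Allow at most `max_variants` distinct results per input
# (1 means every repetition must match exactly).
max_variants = 1
inconsistent = {i: len(v) for i, v in distinct.items() if len(v) > max_variants}
print("inconsistent inputs:", inconsistent)
```

With real tensor outputs, each result would first need to be converted to a hashable form (for example, its raw bytes) before being added to the set, and `max_variants` could be raised to 2 to tolerate the batch-size-dependent TensorRT differences mentioned above.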

Quick Start

1. Installation

Clone the code:

git clone https://github.com/torchpipe/torchpipe.git
cd torchpipe/ && git submodule update --init --recursive

Prepare the environment:

export img_name=nvcr.io/nvidia/pytorch:22.12-py3
# or build the Docker image yourself (recommended, for TensorRT 9.3):
# docker build --network=host -f ./docker/Dockerfile -t trt-9 thirdparty/
# export img_name=trt-9

Build torchpipe:

docker run --rm --gpus=all --ipc=host  --network=host -v `pwd`:/workspace  --shm-size 1G  --ulimit memlock=-1 --ulimit stack=67108864  --privileged=true  -w/workspace -it $img_name /bin/bash

python setup.py install

cd examples/resnet18 && python resnet18.py

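See the installation documentation for details.

2. Obtain a suitable model file (ONNX and TensorRT engine files are currently supported, among others)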

import os
import tempfile

import torch
import torchvision.models as models

# Load a pretrained ResNet-18 in eval mode on the GPU.
resnet18 = models.resnet18(pretrained=True).eval().cuda()

model_path = os.path.join(tempfile.gettempdir(), "resnet18.onnx")
data_bchw = torch.rand((1, 3, 224, 224)).cuda()
print("export: ", model_path)
# Export with a dynamic batch dimension (axis 0) on both input and output.
torch.onnx.export(resnet18, data_bchw, model_path,
                  opset_version=17,
                  do_constant_folding=True,
                  input_names=["in"], output_names=["out"],
                  dynamic_axes={"in": {0: "x"}, "out": {0: "x"}})

# Optionally simplify the exported model:
# os.system(f"onnxsim {model_path} {model_path}")

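3. You can now call a single model concurrently

import torch, torchpipe
model = torchpipe.pipe({'model': model_path,
                        'backend': "Sequential[cvtColorTensor,TensorrtTensor,SyncTensor]", # backend engine; see the backend API reference
                        'instance_num': 2, 'batching_timeout': '5', # number of instances and batching timeout
                        'max': 4, # upper bound of the model's optimization range; may also be '4x3x224x224'
                        'mean': '123.675, 116.28, 103.53', # 255 * (0.485, 0.456, 0.406)
                        'std': '58.395, 57.120, 57.375', # fused into the TensorRT network
                        'color': 'rgb'}) # parameter of the cvtColorTensor backend: target color channel order
data = torch.zeros((1, 3, 224, 224)) # or torch.from_numpy(...)
input = {"data": data, 'color': 'bgr'}
model(input)  # can be called in parallel from multiple threads
# "result" is the key that identifies the output; other keys can also be written as needed
print(input["result"].shape)  # if the call fails, this key is guaranteed to be absent, even if it was present in the input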
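The calling convention in the comments above (the output is written under "result", and the key is absent after a failure even if the caller had set it) can be exercised from several threads as follows. This is a sketch under stated assumptions: `fake_pipe` is a hypothetical stand-in that only mimics that convention so the example runs without a GPU or torchpipe installed; it is not torchpipe's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_pipe(request: dict) -> None:
    """Hypothetical stand-in mimicking the described convention:
    the output goes under "result"; on failure the key is absent,
    even if the caller had set it beforehand."""
    request.pop("result", None)      # drop any stale value first
    data = request.get("data")
    if data is None:
        return                       # failure: leave "result" absent
    request["result"] = [2 * x for x in data]


def call(request: dict) -> dict:
    fake_pipe(request)               # the request dict is updated in place
    return request


requests = [
    {"data": [1, 2, 3]},
    {"data": None, "result": "stale"},   # fails; the stale value must vanish
    {"data": [4, 5]},
]

# Each thread owns its own request dict, so no locking is needed here.
with ThreadPoolExecutor(max_workers=3) as pool:
    finished = list(pool.map(call, requests))

# Check for the key instead of assuming success.
succeeded = [r["result"] for r in finished if "result" in r]
failed = [i for i, r in enumerate(finished) if "result" not in r]
print(succeeded, failed)
```

Testing for the presence of "result" rather than indexing it directly avoids a KeyError on failed requests, which matters once many client threads share one model object.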

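A pure C++ API is available via [libtorch+cmake] or [pybind11].

4. Our core functionality is a set of pipeline facilities that operate across multiple nodes.

For more information, see the torchpipe documentation.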
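To make the idea of a pipeline across nodes concrete, here is a generic two-stage sketch built from threads and queues. It illustrates pipeline parallelism in general and is not torchpipe's scheduler: the `preprocess` and `infer` functions, the `STOP` sentinel, and the queue layout are all assumptions chosen for illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(worker, inbox, outbox):
    """Apply `worker` to every item from `inbox` and forward it to `outbox`."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)   # propagate shutdown to the next stage
            return
        outbox.put(worker(item))


def preprocess(x):   # hypothetical CPU-side node
    return x + 1


def infer(x):        # hypothetical GPU-side node
    return x * 10


q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(preprocess, q_in, q_mid)),
    threading.Thread(target=stage, args=(infer, q_mid, q_out)),
]
for t in threads:
    t.start()

for x in range(5):
    q_in.put(x)
q_in.put(STOP)

# Drain the final queue until the sentinel arrives.
results = []
while True:
    item = q_out.get()
    if item is STOP:
        break
    results.append(item)
for t in threads:
    t.join()
print(results)
```

Because each stage runs in its own thread, the second stage can work on one item while the first stage is already preparing the next, which is the overlap a pipeline is meant to provide.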

5. RoadMap

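torchpipe is currently iterating rapidly, and we very much need your help. Feedback via issues or merge requests is welcome. See the contribution guide.

Our ultimate goal is to make high-throughput server-side deployment as simple as possible. To get there, we will keep iterating on torchpipe and are also willing to take part in other projects with similar goals.

Near-term RoadMap

- Public base images and PyPI packages (manylinux)
- Optimize the build system and split it into modules such as core, pplcv, model/tensorrt, and opencv
- Optimize the basic infrastructure, including Python/C++ interaction, exceptions, the logging system, and cross-process backends
- Technical report

Potential research directions not yet completed

- Single-node and multi-node scheduling backends. They do not differ in essence from compute backends, but they need to be further decoupled for users; we want to make this part of the user-facing API.
- Debugging tools for multiple nodes. Because multi-node scheduling uses a simulated-stack design, node-level debugging tools are relatively easy to design.
- Load balancing

6. Acknowledgements

Our codebase uses, or uses with modifications, multiple open-source libraries; see the acknowledgements for details.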

torchpipe's Issues

What is the difference between torchpipe and torchgpipe?

What is the difference between torchpipe and torchgpipe? The two libraries have very similar names.

ANNOUNCEMENT: Torchpipe Release v0.4.0

Discussed in #25

Originally posted by tp-nan on January 12, 2024

torchpipe 0.4.0 Release Notes

torchpipe 0.4.0 is a major release of the library providing multiple new features, and fixes to multiple customer-reported issues.

Release Highlights

torchpipe v0.4.0 includes the following key changes:

- Initial support for CVCUDA 0.5.0 (WITH_CVCUDA=1 pip install -e .).
- Improved support for Python backend and Python filters.
- Added support for TensorRT 9.2.
- Enhanced compatibility and stability.

Compatibility

- GPU Compute Capability: 6.1+ (7.x+ for WITH_CVCUDA)
- Ubuntu x86_64: 20.04, 22.04
- CUDA Toolkit: 11.0+ (11.2+ for WITH_CVCUDA)
- Python: >= 3.7

Known Issues/Limitations

When torchpipe is installed from a whl package, the dynamic library path for CVCUDA may need to be set via LD_LIBRARY_PATH.

License

torchpipe is licensed under the Apache 2.0 license.

Acknowledgements

Thanks to everyone who has contributed to torchpipe in various ways.

This discussion was created from the release Torchpipe Release v0.4.0.

Move examples to another repo

The main repo is currently around 100 MB to download, counting the examples and the history files. It would be better to move those unnecessary files to another repo so that this one stays clean and easy to clone.

Call for Co-developers: Join our Open Source Project to Improve and Perfect the Codebase

Discussed in #28

Originally posted by tp-nan on January 30, 2024

Dear Community,

We are excited to announce that we are seeking passionate co-developers/teams who can help us enhance and perfect our open source project. Our team has been working diligently on this initiative, but we believe that collaboration and diverse perspectives will lead to a better end product. By joining forces with like-minded developers, we aim to create an even more robust, efficient, and user-friendly solution while fostering a vibrant community around it.

Project Overview:

TorchPipe is a multi-instance pipeline parallel library that provides a seamless integration between lower-level acceleration libraries (such as TensorRT and CVCUDA) and RPC frameworks. It achieves strict decoupling from these libraries, offering a thread-safe function interface for the PyTorch frontend at a higher level. At the same time, it enables users to extend the backend capabilities at a lower level. TorchPipe is primarily applied in cloud-based computer vision scenes, with hundreds of millions to billions of daily calls. Its design ensures high service throughput while meeting latency requirements.

Why Contribute?

- Receive direct technical support, with priority given to optimizing the scenarios you need.
- Learn new skills and improve existing ones through hands-on practice.
- Become a co-author.

Responsibilities of Co-developers:

- Collaborate effectively with other team members using communication tools.
- Actively contribute ideas and solutions during design discussions.
- Provide timely feedback on pull requests and issues raised by others.
- Adhere to the project's coding standards and guidelines.
- Attend regular meetings (if applicable) and actively participate in decision-making processes.

Requirements:

We do not require extensive programming experience from you.

- Familiarity with version control systems, preferably Git.
- A positive attitude and willingness to learn from constructive criticism.
- Self-motivated and able to work independently when necessary.

How to Apply:

Please reply directly to this issue, or apply via email at [email protected] with the subject line "Application for Co-developer Position." We look forward to hearing from you!

Best Regards,

The torchpipe team

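Is there a multi-node, multi-process implementation?

I only found single-node, single-process, multi-threaded examples. Is there a multi-node, multi-process torchpipe implementation? On a single node, Triton's ensemble can run a multi-process, single-threaded pipeline, which for Python makes fuller use of the CPU, although it is not as lightweight as torchpipe's multithreading. So why does torchpipe perform better than Triton's ensemble? Both split CPU and GPU work into a pipeline.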