
torch_npu's Introduction

PyTorch Backend

  • Bridging and Integration: Construct a device-agnostic layer that exposes a unified interface to the upper layers while accommodating diverse hardware below, shielding PyTorch from direct awareness of individual backends.
  • Low-cost Integration: Provide a device abstraction layer that accelerates new backend integration by requiring only a few interfaces to be implemented; offer comprehensive integration documentation, reference integrations (CUDA/CPU/NPU), general test cases, and contract tests.
  • Quality Assurance: Maintain quality through CI/CD for the integration mechanism of third-party devices based on PrivateUse1.
  • Mainstream Approach: Promote the integration mechanism of third-party devices based on PrivateUse1 as the mainstream approach for integrating new backends into PyTorch in the future.
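
To make the PrivateUse1 mechanism above concrete, here is a minimal sketch of how an out-of-tree backend registers a kernel under the PrivateUse1 dispatch key. It is illustrative only, not this project's code: the choice of aten::abs and the CPU round-trip are placeholders.

    // Minimal sketch: register an out-of-tree kernel under PrivateUse1, the
    // dispatch key PyTorch reserves for third-party backends.
    #include <ATen/ATen.h>
    #include <torch/library.h>

    at::Tensor privateuse1_abs(const at::Tensor& self) {
      // Placeholder: a real backend would launch a device kernel here instead
      // of copying to the CPU, computing there, and copying back.
      return at::abs(self.to(at::kCPU)).to(self.device());
    }

    TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
      m.impl("abs", &privateuse1_abs);
    }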

Current Progress:

  • Runtime: Completed components include Device, Stream, Event, Generator, Guard, and Allocator.
  • AMP: Registration and API have been completed.
  • Operators: Migrated NPU operator list and codegen. The next steps will involve operator simplification and codegen refactoring.
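
As a rough illustration of what the AMP item involves, the sketch below registers an autocast fallthrough for a PrivateUse1 backend. This assumes the AutocastPrivateUse1 dispatch key and is not this project's actual registration code; ops that should run in lower precision would additionally get wrappers registered under the same key.

    // Sketch only: the general shape of autocast (AMP) registration for a
    // PrivateUse1 backend. Ops outside the cast lists fall through to their
    // regular kernels.
    #include <torch/library.h>

    TORCH_LIBRARY_IMPL(_, AutocastPrivateUse1, m) {
      m.fallback(torch::CppFunction::makeFallthrough());
    }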

Next Steps:

  • Device-agnostic: Complete the device-agnostic layer and organize device-specific logic by device type (e.g., backends/cuda, backends/cpu, backends/...); it may be split out as a submodule in the future.
  • CodeGen: Enhance and refactor the codegen module to provide general, reusable code generation that covers official operators, custom operators, routing code, forward and backward binding, etc.
  • Operators: Simplify the operators; implement all factory operators (as a reference for operator implementations) as well as functional operators (for testing the third-party device integration mechanism).
  • Tests & Docs: Complete the general test case suites, the full module integration, and the API documentation.
  • Live Demo: Integrate CPU into PyTorch based on this project and provide an end-to-end integration tutorial.

Getting Started

To start using the PyTorch Backend Project, users can refer to the comprehensive documentation provided. This includes detailed guides on setting up the environment, integrating new devices, and best practices for optimizing performance.

Project Structure

    .
    ├── backends
    │   ├── fake               // Dummy backend: provides weak definitions of every symbol csrc needs, so the demo can run without a real backend implementing them all (see the weak-symbol sketch after this tree)
    │   ├── npu                // A real backend: provides the APIs and structures that are strongly tied to this specific backend
    │   ├── cuda               // A real backend: to be implemented later
    │   └── ...
    ├── cmake
    ├── codegen                // Code generation: registration for forward and backward, backward implementations, backward bindings, custom operator routing, rerouting, etc.
    │   ├── autograd
    │   │   └── templates      // General template
    │   └── templates
    ├── csrc                   // C++ implementations tied to PyTorch itself, independent of any specific backend; in principle it contains only calls into backend interfaces
    │   ├── api                // libtorch functionalities
    │   ├── aten               // Code generation: contains only wrappers and PyTorch operator registration; Tensor, Storage, and Serialization may move here later, as all three relate to Tensor logic
    │   ├── backend            // General Implementation of PyTorch API
    │   ├── core               // Common Utils
    │   │   ├── allocator
    │   │   ├── generator
    │   │   └── guard
    │   └── distributed        // Distributed
    ├── docs                   // All docs: C++ API, Python API and E2E tutorials
    │   ├── cpp
    │   │   └── source
    │   └── source
    ├── test                   // General test case suites: C++ and Python
    │   └── cpp
    │       ├── backend
    │       ├── common
    │       └── core
    ├── third_party
    │   └── googletest
    └── torch_backend          // Python interface implementation for PyTorch
        ├── backend
        ├── csrc               // Python & C++ binding
        │   ├── backend        // Python bindings for all low-level capabilities needed to be exposed to Python
        │   └── core           // General capabilities, only provided for Python
        └── meta               // Meta operator registration
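
The weak-symbol idea behind backends/fake is sketched below under stated assumptions: the symbol name backend_device_count is hypothetical, and the weak attribute shown is GCC/Clang syntax. csrc links against such weak defaults, and a real backend overrides them with strong definitions.

    // Sketch of the weak-symbol mechanism behind backends/fake.
    #include <cstdint>

    // Fake backend: a weak default so csrc-level code can link and run even
    // when no real backend provides this symbol. A real backend defines the
    // same symbol without the weak attribute and wins at link time.
    extern "C" __attribute__((weak)) int32_t backend_device_count() {
      return 1;  // pretend a single virtual device exists
    }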

Documents

API Documents

C++ API

License

PyTorch Backend has a BSD-style license, as found in the LICENSE file.

torch_npu's People

Contributors

fffrog, hippocookie, shink, zhenbin-8, hipudding, dependabot[bot]

Watchers

ChenRui, Yikun Jiang, JQ, wangxiyuan

torch_npu's Issues

Task Assignment

0. Autoload torch.npu & torch_npu.npu
1. NPUEvent.h -> 元昊
2. StreamGuard.h -> 逢春
3. NPUGuardImpl.cpp -> 元昊
4. HasCompatibleShallowCopyType.cpp -> warning, 逢春
5. VariableFallbackKernel.cpp -> 李佳伟 (done)
6. BinaryOps.cpp -> 李佳伟 (done)
7. Remove torch_npu/optim -> 泽升
8. Remove testing and build a new test framework -> 泽升
9. torch_npu/utils -> 李佳伟
10. Split torch_npu/csrc/Module.cpp into separate files (torch_npu/csrc/.../xxx, torch_npu/npu/xxx) -> ALL
11. Remove NPUPluggableAllocator.h -> 贞斌
12. Create a new repository for npu + op-plugin -> 李佳伟
13. Documentation -> ALL
14. Unify namespaces
15. Move DeviceProp down into csrc/npu (the XPU implementation can serve as a reference) -> 元昊
16. NPU logging
17. Refactor AutocastMode.cpp (upstream provides a default op list; compare the differences, and if they match, the macro can be called directly) (too many upstream entries; not feasible for now) -> 逢春

Roadmap

  • [元昊] Autoload torch.npu & torch_npu.npu
  • [元昊] Split the internal registration in NPUGuardImpl.cpp #47
  • [元昊] Move DeviceProp down into csrc/npu (the XPU implementation can serve as a reference)
  • [元昊] Remove AclInterface.h #46
  • [元昊] Add a .pyi stub file for torch_backend._C (code completion) #79
  • [逢春] Refactor StreamGuard.h as needed
  • [逢春] Locate the HasCompatibleShallowCopyType.cpp warning (this can be made device-agnostic and called without going through the dispatcher)
  • [逢春] Refactor AutocastMode.cpp (upstream provides a default op list; compare the differences, and if they match, the macro can be called directly)
  • [逢春] Refactor NPU init & finalize
  • [泽升] Remove torch_npu/optim
  • [泽升] Remove testing and build a new test framework
  • [泽升] Namespace design
  • [贞斌] Remove NPUPluggableAllocator.h
  • [贞斌] Split the memory-related parts out of torch_npu/csrc/Module.cpp
  • [贞斌] Redesign NPU logging; print the error stack on ERROR
  • [佳伟] Refactor aten
  • [佳伟] Refactor torch_npu/utils
  • [佳伟] Create a new repository for npu + op-plugin
  • [佳伟] Move NPUGuard.h to csrc/npu
  • [贞斌] Device-specific autocast APIs, compatible with the latest upstream APIs
  • [TBD] The DeviceGuard API coverage is incomplete; consider aligning with CUDA later (torch.Stream("npu:0"))
  • [泽升] README.md
  • [泽升] LICENSE update
  • [泽升] Remove dead code from torch_npu/utils
  • [ALL] Documentation
  • [贞斌] The contents of torch_npu/csrc/Module.cpp are not fully split out yet
  • [TBD] The contents of torch_npu/npu/utils.py are not split out yet
  • [逢春] Remove maybe_initialize_npu and replace it with device_lazy_init
  • [TBD] Remove the npu prefix from torch_npu
  • [TBD] Add _npu_is_bf16_supported
  • [TBD] Complete the macro PR for CachingAllocator / NPUCachingAllocator / NPUCachingHostAllocator / NPUCachingAllocatorHelper
  • [TBD] Add an API that provides the device name

CI Failure

| Type | Description | Occurrence Count | Example PR |
| --- | --- | --- | --- |
| Refactoring | Turn Allocator::allocate into non-const; the derived class's override is not updated. | 1 | #120969 |
| Refactoring | Use DeviceIndex instead of int in CUDA wrappers; the derived class's override is not updated. | 1 | #119142 |
| Refactoring | Move new trace utils from source to header, so some symbols can no longer be found. | 1 | #114367 |
| Refactoring | Migrate to getCvar* functions for env variable checking, so a function name can no longer be found. | 1 | #113797 |
| New Features | Add support for new data types; a data type assert fails. | 3 | #107586, #116594 |
| New Features | Add a function to materialize COW storages, which adds the pure virtual function Allocator::copy_data; the derived class does not implement it. | 2 | #117053, #113396 |
| Refactoring | Make the macro used with AMP more generic. | 1 | #124050 |
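
To make the Allocator::copy_data row concrete, here is a sketch (not the project's allocator) of the kind of override an out-of-tree allocator had to add once upstream introduced the pure virtual c10::Allocator::copy_data in #117053; exact signatures shift across PyTorch versions, which is precisely what this table tracks.

    // Sketch only: an out-of-tree allocator that compiles again after upstream
    // added the pure virtual c10::Allocator::copy_data.
    #include <c10/core/Allocator.h>
    #include <cstring>

    struct ExampleBackendAllocator : public c10::Allocator {
      c10::DataPtr allocate(size_t nbytes) override {
        (void)nbytes;  // a real backend would call its runtime's malloc here
        return c10::DataPtr(nullptr, c10::Device(c10::DeviceType::PrivateUse1, 0));
      }
      // The override whose absence caused the CI failures listed above.
      void copy_data(void* dest, const void* src, std::size_t count) const override {
        std::memcpy(dest, src, count);  // a real backend would use a device memcpy
      }
    };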

[Device][DeviceGuard] Solutions for refactoring

1. NPUFunctions.h

  1. Refer to the CUDA implementation at https://github.com/pytorch/pytorch/blob/main/c10/cuda/CUDAFunctions.cpp and keep the API and implementation logic as consistent with it as possible.

  2. Extract the differences between the ACL and CUDA APIs into a separate layer, so that NPUFunctions and CUDAFunctions stay as closely aligned as possible.

    For example, aclrtGetDevice() and cudaGetDevice() differ: the latter directly returns the current device, whereas the former may return the error code ACL_ERROR_RT_CONTEXT_NULL, which is also a normal case and simply means that SetDevice has not been called and no Context has been created yet (see the sketch after this list).
    CANN documentation: https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha003/apiref/appdevgapi/aclcppdevg_03_0040.html
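
A minimal sketch of the wrapper this implies is shown below. The helper name GetDevice is ours, and ACL_SUCCESS is assumed to be ACL's success code; only aclrtGetDevice() and ACL_ERROR_RT_CONTEXT_NULL are taken from the CANN documentation cited above.

    // Sketch only: give aclrtGetDevice() the same "always succeeds" feel as
    // cudaGetDevice(). ACL_ERROR_RT_CONTEXT_NULL just means SetDevice has not
    // been called yet, so report device 0 instead of propagating an error.
    #include <acl/acl.h>

    aclError GetDevice(int32_t* device) {
      aclError err = aclrtGetDevice(device);
      if (err == ACL_ERROR_RT_CONTEXT_NULL) {
        *device = 0;          // no context yet: behave like CUDA's default device
        return ACL_SUCCESS;
      }
      return err;             // genuine failures are still surfaced to the caller
    }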

Bug List

  • [TBD] TORCH_SHOW_CPP_STACKTRACES=1: NPU tensor pin_memory core dump
  • [TBD] import torch_npu warning
  • [元昊] CI runner crashed
  • [TBD] torch.npu.FloatStorage creates CPU storage rather than NPU storage (reason: __module__ name)

Roadmap


  • Stream, Event -> zhenbin-8

  • PyTyte -> zhenbin-8

  • Allocator -> zhenbin-8

  • Device, DeviceGuard -> shink

  • CI -> shink

  • doc preview -> shink

  • MACROS, Exception -> zong

  • Generator, lazy_init -> zong

  • generate_code.sh -> fffrog

  • codegen -> fffrog

  • cmake refactor

  • NPUCachingHostAllocator.h

  • Move headers and libs into correct directory when installed.

import torch_backend warning

  1. Locally built torch + locally built torch_backend: the warning appears.
  2. torch installed from pip + locally built torch_backend: no warning.
  3. Regardless of which torch is used + our torch_backend: the warning appears.

Resources not released on exit

Steps to reproduce:

import torch
import torch_npu

s = torch.npu.Stream()
print(s)
x = torch.randn(2, 2).npu()
y = torch.randn(2, 2).npu()

with torch.npu.stream(s):   
    z = x.mm(y)

print(z)

After pressing Ctrl+C, the following error is reported:

/home/hua/miniconda3/envs/pt/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 41 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Running the file directly with python does not produce this error.

Keep up to date

| Type | Description | PR |
| --- | --- | --- |
| Refactoring | Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept. | #126376 |
| Refactoring | Generalize custom_fwd & custom_bwd to be device-agnostic. | #126531 |
| Refactoring | Refactor autocast C++ APIs to be device-agnostic. | #124359 |
| Refactoring | Refactor autocast Python APIs. | #124479 |
| Refactoring | Refactor lazy init to be device-agnostic. | #118846 |
| Refactoring | Generalize the host allocator to be device-agnostic. | #123079 |
