zdevito / pytorch
This project forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home Page: http://pytorch.org
License: Other
$ python _memory_viz.py memory snapshot.py -o memory.svg
from https://zdevito.github.io/2022/08/16/memory-snapshots.html, you didn't explain :(
Currently every operator must be implemented in TH. We should have a way to define a new operator given a cwrap declaration and a templated function that takes the scalar type and processor as template arguments and can do whatever it wants with them.
For methods, this has to be part of the actual library.
For functions, this can live either in TensorLib or in a standalone tool a user can run to produce their own equivalents of Function.h (including optional/output arguments).
Some functions have different declarations for method/function variants:
[[
name: lt
return: argument 0
options:
- cname: ltValue
arguments:
- arg: THBoolTensor* result
output: True
- THTensor* self
- real value
- cname: ltTensor
arguments:
- arg: THBoolTensor* result
output: True
- THTensor* self
- THTensor* other
]]
[[
name: lt_
return: self
options:
- cname: ltValueT
arguments:
- THTensor* self
- THTensor* self
- real value
- cname: ltTensorT
arguments:
- THTensor* self
- THTensor* self
- THTensor* other
]]
[[
name: lt
variants:
- function
return: argument 0
options:
- cname: ltValue
arguments:
- arg: THBoolTensor* result
output: True
- THTensor* tensor
- real value
- cname: ltTensor
arguments:
- arg: THBoolTensor* result
output: True
- THTensor* tensor
- THTensor* other
- cname: ltValueT
arguments:
- arg: THTensor* result
output: True
- THTensor* tensor
- real value
- cname: ltTensorT
arguments:
- arg: THTensor* result
output: True
- THTensor* tensor
- THTensor* other
]]
However, they overlap in name and arguments, causing conflicts. Where a function is method-only we should add a suffix to its name in Type, e.g. "lt_method", and have the Tensor method call that one to avoid the clash.
In addition to moving the cwrap files back to the way they were, we need to rename method-only functions in Type.h/cpp so that they don't conflict with function-only entries of the same name.
THSize and THStride need to be some sort of iterable C++ object. Ideally it should be legal to pass a std::vector, a {1,3,4}-style literal, or a begin()/end() pointer pair in this position.
The best approach may be to implement a very minimal version of LLVM's ArrayRef class, which has implicit constructors for std::vector and std::initializer_list, plus a begin()/end() constructor:
http://llvm.org/docs/doxygen/html/classllvm_1_1ArrayRef.html
Currently it looks like this, but the list is growing as compiler errors are fixed:
('THPDefaultGenerator->cdata','dynamic_cast<${Processor}Generator*>(context->defaultGenerator(processor())->generator'),
('__storage_size.get\\(\\)', 'THStorageView::make(static_cast<int64_t>(storage.size()))')
These can be written directly in Tensor.h and forward to the correct method names.
See pytorch's tensor.py for how these map to function names.
It is the only file that refers to THFloatTensor and THDoubleTensor directly, and it is not clear what is even happening in it.
#define THCudaDoubleTensor_BERNOULLI_TENSOR THCudaDoubleTensor_bernoulli_DoubleTensor
#define THCudaTensor_BERNOULLI_TENSOR THCudaTensor_bernoulli_FloatTensor
[[
name: bernoulli
defined_if: CUDA_FLOAT || CUDA_DOUBLE
types:
- Float
- Double
processors:
- CUDA
return: argument 0
variants:
- method
- function
cname: BERNOULLI_TENSOR
before_call:
THTensor_(resizeAs)(LIBRARY_STATE ((THPTensor*)$arg0)->cdata, ((THPTensor*)$arg1)->cdata);
arguments:
- arg: THTensor* output
output: True
- THTensor* self
]]
#undef THCudaDoubleTensor_BERNOULLI_TENSOR
#undef THCudaTensor_BERNOULLI_TENSOR
[[
name: bernoulli_
defined_if: CUDA_FLOAT || CUDA_DOUBLE || CUDA_HALF
types:
- floating_point
processors:
- CUDA
return: self
options:
- cname: bernoulli
arguments:
- THTensor* self
- arg: double p
default: 0.5
- cname: bernoulli_FloatTensor
arguments:
- THTensor* self
- THCudaTensor* float_p
- cname: bernoulli_DoubleTensor
arguments:
- THTensor* self
- THCudaDoubleTensor* float_p
]]
Is THCState for a single GPU? If so, how should we handle multiple GPUs in TensorLib...?
we need to port filter_unique_options
with_stateless: True === variants: [ method, function ]
only_stateless === variants: [ function ]
nothing specified (i.e. the default): variants: [ method ]
Depends on #19 because we need that interface for implementing the bindings.
Need to unify.
This ensures an argument's dimension matches a dim of a tensor...
At least in Lua Torch, print was implemented in Lua. We need to dig up something that can be used to easily print Tensors.
Right now tensors are returned by pointer and passed by reference. Instead, tensors should be passed around by value and internally use TH's ref-counting mechanisms to know when to delete. The statically-dispatched methods which are now on Tensor will remain there; others will need to forward to a TensorImpl class that will hold the actual pointer.
For instance here:
[[
name: mv
cname: addmv
variants:
- method
- function
return: argument 0
before_call: |
long s = THTensor_(size)(LIBRARY_STATE ((THPTensor*)$arg4)->cdata, 0);
THTensor_(resize1d)(LIBRARY_STATE ((THPTensor*)$arg0)->cdata, s);
#if !IS_CUDA
THTensor_(zero)(LIBRARY_STATE ((THPTensor*)$arg0)->cdata);
#endif
arguments:
- arg: THTensor* result
output: True
- CONSTANT AS_REAL(0)
- argument 0
- CONSTANT AS_REAL(1)
- THTensor* self
- THTensor* vec
]]
TensorRandom.cwrap has some methods that are the same except that on CUDA the Generator argument is removed, because CUDA never takes generator arguments. This is a problem for the C++ wrap because it causes duplicate declarations. We can fix this by unifying them into a single declaration across CUDA/CPU and then having a cwrap plugin split it into the CPU/CUDA versions, removing the THGenerator* from the CUDA version.
[[
name: multinomial
defined_if: defined(TH_REAL_IS_FLOAT) || defined(TH_REAL_IS_DOUBLE)
types:
- floating_point
processors:
- CPU
- CUDA
variants:
- method
- function
return: argument 0
arguments:
- arg: THIndexTensor* result
output: True
- arg: THGenerator* generator
default: THPDefaultGenerator->cdata
kwarg_only: True
- THTensor* self
- long num_samples
- arg: bool replacement
default: "false"
]]
[[
name: multinomial
defined_if: CUDA_FLOAT || CUDA_DOUBLE || CUDA_HALF
types:
- floating_point
processors:
- CUDA
variants:
- method
- function
return: argument 0
arguments:
- arg: THIndexTensor* result
output: True
- THTensor* self
- long num_samples
- arg: bool replacement
default: "false"
]]
Removed defined_if entries that relate only to processors/types and processor_types_pairs. Write a plugin that generates defined_if from the processor/types info. To verify the generated condition matches the original, the emitted code can assert that the old and new conditions agree:
#ifdef ORIGINAL_DEFINED_IF
#ifndef NEW_DEFINED_IF
#error "wrong!"
#endif
#endif
#ifdef NEW_DEFINED_IF
#ifndef ORIGINAL_DEFINED_IF
#error "wrong!"
#endif
#endif
See subject
Actually return multiple arguments using std::pair instead of just returning the first one...
arg = option['arguments'][ret['arguments'][0]]