Comments (23)

XJDKC commented on July 23, 2024
  • Scheme 1
    ![scheme_1](https://user-images.githubusercontent.com/32295829/81771083-a34f7700-9514-11ea-92bd-48f884d760f7.png)
  • Scheme 2
    ![scheme_2](https://user-images.githubusercontent.com/32295829/81771093-a9455800-9514-11ea-9ed6-7f6003c807ff.png)

dcslin commented on July 23, 2024

Agree on point 1, Layer vs Operation:

Currently, Layer lives in autograd.py, but what a Layer does is define the forward connections using building blocks, i.e. Operations and other Layers. Strictly speaking, Layer is therefore not part of autograd; Singa's autograd is mainly about constructing the backward pass.

Thus, I propose separating Layer into its own module. Additionally, stateless functions like relu should also be treated as conceptual layers.

For external users, only Layer and Module are visible. They can build a new Layer from existing Layers, and a new Module from existing Layers, e.g.

from singa.layers import Conv2d, relu

For internal/advanced users, a Layer can be built directly from Operations (see the toy sketch below).

Lastly, if this is finalized, autograd.py should contain only Operations, and we could keep the original naming convention: there would be no need to mark an operation as "private" with an underscore prefix, e.g. _RNN.
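To make the separation concrete, here is a toy, self-contained sketch (these are not the actual SINGA classes; ReLUOp, Dense and MLP are made-up names, and plain Python lists stand in for tensors) of the three levels: Operation, a Layer built from Operations, and a user-facing block built only from Layers.

class Operation:                     # stateless building block
    def __call__(self, *xs):
        return self.forward(*xs)

class ReLUOp(Operation):
    def forward(self, x):
        return [v if v > 0 else 0.0 for v in x]

class Dense:                         # a Layer: owns a parameter, composed of Operations
    def __init__(self, w):
        self.w = w                   # parameter state lives in the Layer
        self.relu = ReLUOp()
    def __call__(self, x):
        return self.relu([v * self.w for v in x])

class MLP:                           # user-facing block: composed only of Layers
    def __init__(self):
        self.fc1, self.fc2 = Dense(0.5), Dense(2.0)
    def __call__(self, x):
        return self.fc2(self.fc1(x))

print(MLP()([1.0, -2.0, 3.0]))       # [1.0, 0.0, 3.0]

In the real API only the Dense/MLP-like classes would be exposed to external users, while the ReLUOp-style Operations stay internal to autograd.py.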

nudles commented on July 23, 2024

Agree on point 1, Layer vs Operation:

Currently, Layer lives in autograd.py, but what a Layer does is define the forward connections using building blocks, i.e. Operations and other Layers. Strictly speaking, Layer is therefore not part of autograd; Singa's autograd is mainly about constructing the backward pass.

Thus, I propose separating Layer into its own module. Additionally, stateless functions like relu should also be treated as conceptual layers.

We can create class ReLU(Layer) in layer.py.

For external users, only Layer and Module are visible. They can build a new Layer from existing Layers, and a new Module from existing Layers, e.g.

from singa.layers import Conv2d, relu

should relu be ReLU here?

For internal/advanced users, a Layer can be built directly from Operations.

Lastly, if this is finalized, autograd.py should contain only Operations, and we could keep the original naming convention: there would be no need to mark an operation as "private" with an underscore prefix, e.g. _RNN.

Yes.

nudles commented on July 23, 2024
* Scheme 1
  ![scheme_1](https://user-images.githubusercontent.com/32295829/81771083-a34f7700-9514-11ea-92bd-48f884d760f7.png)

* Scheme 2
  ![scheme_2](https://user-images.githubusercontent.com/32295829/81771093-a9455800-9514-11ea-9ed6-7f6003c807ff.png)

How about using the following APIs?

class Module:
    def __init__(self, inputs):
        # inputs is a list of input tensors (placeholders)
        # randomly fill each input tensor
        self.forward(*inputs)  # turn off the graph mode completely

    # another option is to define a compile method
    def compile(self, inputs, is_train, use_graph, graph_alg):
        self.forward(*inputs)


class MyModel(Module):
    def __init__(self, inputs):
        # define all layers
        # can we force the next call to be invoked at the end of the __init__ method?
        super().__init__(inputs)


x = Placeholder((2, 3), device=gpu)  # alias of Tensor
m = MyModel([x])
# if using the other option:
# m = MyModel()
# m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')

for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)

y = Tensor((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    y_ = m(x)             # build the graph in the first iteration
    l = m.loss(y, y_)
    # ...


class MyLayer:
    def __init__(self, kernel_initialization="he_uniform", name=None):
        # kernel_initialization is a string naming a predefined initialization method,
        # or a function that accepts a tensor and fills its values in-place;
        # this provides a default initialization method; users can also configure a
        # customized initialization method, or get the params out and fill the values
        # explicitly as shown above with get_params().
        self.init = False
        self.kernel_initialization = ...

    def __call__(self, x):
        if self.init == False:
            self.kernel = Tensor(...)
            self.kernel_initialization(self.kernel)
            self.init = True
        else:
            # do the forward propagation
            pass
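Since kernel_initialization can be either a string or a callable, the layer needs to resolve it once. Here is a hedged, self-contained sketch of that resolution (the registry, names and list-based "tensor" are made up for illustration and are not SINGA APIs):

import random

def he_uniform(t):
    # toy in-place He-uniform fill, treating len(t) as the fan-in
    limit = (6.0 / len(t)) ** 0.5
    for i in range(len(t)):
        t[i] = random.uniform(-limit, limit)

INITIALIZERS = {"he_uniform": he_uniform}   # hypothetical registry of predefined initializers

def resolve_initializer(kernel_initialization):
    # accept either a predefined name or a user-supplied callable
    if callable(kernel_initialization):
        return kernel_initialization
    return INITIALIZERS[kernel_initialization]

def ones_(t):                               # custom in-place initializer
    for i in range(len(t)):
        t[i] = 1.0

kernel = [0.0] * 8
resolve_initializer("he_uniform")(kernel)   # by predefined name
resolve_initializer(ones_)(kernel)          # by user callable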

XJDKC commented on July 23, 2024
* Scheme 1
  ![scheme_1](https://user-images.githubusercontent.com/32295829/81771083-a34f7700-9514-11ea-92bd-48f884d760f7.png)

* Scheme 2
  ![scheme_2](https://user-images.githubusercontent.com/32295829/81771093-a9455800-9514-11ea-9ed6-7f6003c807ff.png)

How about using the following APIs.

class Module:
   def __init__(self, inputs):
       # inputs is a list of input tensors (placeholders)
       # randomly fill each input tensor
       self.forward(*inputs)  # turn off the graph mode completely

    # another option is to define a compile method
    def compile(self, inputs, is_train, use_graph, graph_alg):
        self.forward(*inputs)
       
class MyModel(Module):
    def __init__(self, inputs):
       # define all layers
       # can we force the next call to be invoked at the end of the __init__ method?
       super().__init__(inputs)

x = Placeholder((2, 3), device = gpu)  # alias of Tensor.
m = MyModel([x])
# if use the other option
# m = MyModel()
# m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')

for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)

y = Tensor((2,), device = gpu)
for npx, npy in data:
   x.copy_from(npx)
   y.copy_from(npy)
   y_ = m(x)             # build the graph in the first iteration.
   l = m.loss(y, y_)
   # ...
class MyLayer:
    def __init__(self, kernel_initialization="he_uniform", name=None):
       # kernel_initialization is a string for the predefined initialization method 
       # or a function that accept a tensor as input and fill the values in-place; 
       # this is to provide a default initialization method; 
       #  users can also configure it to use customized initialization method or 
      # get the params out and fill the values explicitly as shown below.
      self.init = False
      self.kernel_initialization = ...

    def __call__(self, x):
       if self.init == False:
           self.kernel = Tensor(...)
           self.kernel_initialization(self.kernel)
       else:
          # do the forward propagation 

Thanks for such a detailed explanation! For the first scheme, we can't call the forward function in the __init__ function, because the inputs variable is just a list of placeholders. So we still need to separate initialization from forward propagation, or it will produce a runtime error.

XJDKC commented on July 23, 2024
class MyLayer:
    def __init__(self, kernel_initialization="he_uniform", name=None):
        # kernel_initialization is a string naming a predefined initialization method,
        # or a function that accepts a tensor and fills its values in-place;
        # this provides a default initialization method; users can also configure a
        # customized initialization method, or get the params out and fill the values
        # explicitly.
        self.initialized = False
        self.kernel_initialization = ...

    def init(self, x):
        # init params and other state data
        pass

    def forward(self, x):
        # do the forward propagation
        pass

    # __call__ is inherited by all subclasses,
    # so this part does not need to be re-implemented in every subclass
    def __call__(self, x):
        if not self.initialized:
            self.init(x)
            self.initialized = True

        return self.forward(x)

In this way, when to initialize is transparent to users; they just need to implement the __init__, init and forward functions when they want to create a new Layer.
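For illustration, here is a self-contained toy (plain Python lists instead of SINGA tensors; ScaleLayer is a made-up name) showing what a contributor would write against such a base class, i.e. only __init__, init and forward:

class Layer:  # minimal stand-in for the proposed base class
    def __init__(self):
        self.initialized = False

    def __call__(self, x):
        if not self.initialized:
            self.init(x)
            self.initialized = True
        return self.forward(x)


class ScaleLayer(Layer):  # what a contributor implements
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def init(self, x):
        # lazily create state once the input "shape" is known
        self.bias = [0.0] * len(x)

    def forward(self, x):
        return [v * self.factor + b for v, b in zip(x, self.bias)]


layer = ScaleLayer(2.0)
print(layer([1.0, 2.0, 3.0]))  # init runs once, then forward: [2.0, 4.0, 6.0]
print(layer([4.0, 5.0, 6.0]))  # init is skipped on later calls: [8.0, 10.0, 12.0]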

nudles commented on July 23, 2024
  1. Before we call Module.forward(), we can randomly fill the placeholder tensors.
  2. We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the __call__ method or in an init() method. It is up to the contributor.

Any comments on the drawbacks?
@dcslin @XJDKC

XJDKC commented on July 23, 2024
  1. Before we call Module.forward(), we can randomly fill the placeholder tensors.
  2. We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the __call__ method or in a init() method. It is up to the contributor.

Any comments on the drawbacks?
@dcslin @XJDKC

  1. If we separate the initialization from the forward propagation, there is no need to fill the placeholder. The data of the tensors will only be accessed during the forward propagation; for initialization, we just access their shapes, types and so on.
  2. That's great. But if users move the initialization into the __call__ function, they have to check whether the layer has been initialized themselves.

joddiy commented on July 23, 2024
  1. Before we call Module.forward(), we can randomly fill the placeholder tensors.
  2. We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the __call__ method or in a init() method. It is up to the contributor.

Any comments on the drawbacks?
@dcslin @XJDKC

For some models, we cannot use random inputs. For example, for BERT imported from ONNX, some nodes compute the indices of a tensor and the next node splits the tensor using these indices; if we randomly generate the inputs, this case always fails.

By the way, I prefer the idea of:

# another option is to define a compile method
    def compile(self, inputs, is_train, use_graph, graph_alg):
        self.forward(*inputs)

However, I'd like to add a method to compute the shape based on the inputs of each node instead of calling the forward function:

def compute_output_shape(self, input_shape):
    # print(input_shape) # [(None, 10), (None, 12)]
    return (None, input_shape[0][1] + input_shape[1][1] + 2)

Let me think about it; I'll post the detailed API later.
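To show how such a compute_output_shape method could let a layer create its parameters without real data, here is a toy sketch (ToyDense is a made-up class; plain tuples stand in for SINGA tensor shapes) where compile-time shape inference replaces the forward pass:

class ToyDense:
    def __init__(self, units):
        self.units = units
        self.W_shape = None

    def compute_output_shape(self, input_shape):
        # input_shape is (batch, features); batch may be None
        return (input_shape[0], self.units)

    def build(self, input_shape):
        # parameters can be allocated from shapes alone, no data needed
        self.W_shape = (input_shape[1], self.units)

layer = ToyDense(4)
layer.build((None, 10))
print(layer.compute_output_shape((None, 10)))  # (None, 4)
print(layer.W_shape)                           # (10, 4)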

dcslin commented on July 23, 2024

Considering the API requirements and constraints below:

API requirements:

  1. [Model] multi-input/output (multiple loss functions)
  2. [Model] load a model from disk; in other words, model param memory allocation should be done in model.__init__

API constraints:

  1. [Model] the graph module buffers the first forward call, or the graph module is turned off for the first forward call
  2. [Layer] layer param memory allocation & initialization require the input x

@XJDKC's scheme 2 and @nudles's Placeholder are close to the current implementation, and the changes required could be small.

For model building:

class MyModel(Model):
    def __init__(self, inputs, configs):
        self.mylayer = MyLayer(configs)
        self.linear1 = Linear(configs, kernel_init=configs.ker_init)
        super().__init__(inputs)  # maybe a bit confusing for the user what this is
    def forward(self, inputs):
        return self.linear1(self.mylayer(inputs[0], inputs[1]))

For model running:

x = Placeholder(shape=(batch, shape1, shape2))
m = MyModel([x], **configs, **graph_configs)
m.on_device(gpu)
m.train()
for e in epochs:
    for x, y in data_gen:
        losses = m.loss(y, m(x))   # may return multiple losses
        m.optim(losses[0])
        m.optim(losses[1])

For Layer building:

class MyLayer(Layer):
    def __init__(self, configs):
        self.configs = configs
        self.init = False
    def __call__(self, inputs):
        if not self.init:
            self.W = Tensor(self.configs, inputs.shape).initializer()
            self.device_check(inputs, self.W)
            self.init = True
        return operator1(inputs[0], inputs[1])

For the Module class implementation:

class Module:
    def __init__(self, placeholder_input, configs):
        turn_off_graph()
        self.forward(*placeholder_input)
        turn_on_graph()
    def __call__(self, inputs):
        return self.forward(*inputs)
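The turn_off_graph()/turn_on_graph() pair is error prone if forward raises; a context manager would restore the mode automatically. A minimal sketch, assuming a module-level flag rather than SINGA's actual device/graph API (graph_enabled and no_graph are made-up names):

from contextlib import contextmanager

graph_enabled = True  # hypothetical global graph-mode flag

@contextmanager
def no_graph():
    global graph_enabled
    previous, graph_enabled = graph_enabled, False
    try:
        yield
    finally:
        graph_enabled = previous  # restored even if forward raises

# usage inside Module.__init__ (sketch):
# with no_graph():
#     self.forward(*placeholder_input)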

nudles commented on July 23, 2024
  1. Before we call Module.forward(), we can randomly fill the placeholder tensors.
  2. We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the __call__ method or in a init() method. It is up to the contributor.

Any comments on the drawbacks?
@dcslin @XJDKC

1. If we separate the initialization from the forward propagation, there is no need to fill the placeholder. The data of the tensors will only be accessed during the forward propagation; for initialization, we just access their shapes, types and so on.

Then we need a method like infer_output_shapes(self, input_shapes); otherwise, we have to call the forward method to get the output shapes from the output tensors. I prefer calling the forward function, to avoid adding a new method to each layer.

2. That's great. But if users move the initialization into the __call__ function, they have to check whether the layer has been initialized themselves.

nudles commented on July 23, 2024
  1. Before we call Module.forward(), we can randomly fill the placeholder tensors.
  2. We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the __call__ method or in a init() method. It is up to the contributor.

Any comments on the drawbacks?
@dcslin @XJDKC

For some models, we cannot use random inputs. For example, for BERT imported from ONNX, some nodes compute the indices of a tensor and the next node splits the tensor using these indices; if we randomly generate the inputs, this case always fails.

Good point. Then we can configure the data type when creating the placeholder and initialize the placeholder according to that data type. But how to initialize: randomly, or set to 0? There could still be some issues. So the better way is to use real data instead of placeholders.

By the way, I prefer the idea of:

# another option is to define a compile method
    def compile(self, inputs, is_train, use_graph, graph_alg):
        self.forward(*inputs)

However, I'd like to add a method to compute the shape based on the inputs of each node instead of calling the forward function:

def compute_output_shape(self, input_shape):
    # print(input_shape) # [(None, 10), (None, 12)]
    return (None, input_shape[0][1] + input_shape[1][1] + 2)

Do you need this one for onnx loading?

Let me think about it, I'll comment the detailed API later.

nudles commented on July 23, 2024

Shall we go with the following APIs?
@joddiy @dcslin @XJDKC
They should be compatible with the current APIs.

class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        # set train, graph and other configs
        # turn off graph
        # if inputs are not filled, print warnings and fill inputs according to the data type
        self.forward(*inputs)

    def load(self, ckp_path, include_state=False):
        # load the onnx model and copy the params to each layer;
        # generate warnings for mismatched layers/params;
        # restore the states and return them as a dict
        pass

    def save(self, ckp_path, state={}):
        # save the model in onnx format
        # save the states
        pass

    def forward(self, x):    # turn on graph if necessary
        pass

    def train_one_batch(self, x, y):  # turn on graph if necessary
        pass

    @deprecated
    def loss(self):
        pass

    @deprecated
    def optim(self):
        pass


class Layer:
    def __init__(self, name=None):
        self.init = False

    def __call__(self, x):
        if self.init == False:
            # init layer states
            pass
        else:
            # do the forward propagation
            pass


class MyLayer(Layer):
    def __init__(self):
        self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
        self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

    def forward(self, x):
        return self.layer2(self.layer1(x))


class MyModule(Module):
    def __init__(self):
        self.blk1 = MyLayer()
        self.blk2 = MyLayer()
        self.optim = SGD()
        self.loss = CrossEntropyLoss()

    def forward(self, x):
        return self.blk2(self.blk1(x))

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        l = self.loss(y_, y)
        self.optim.backward_and_update(l)
        return l


x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
# fill x with values
m = MyModule()

# compatible with existing code, which does not have the following two statements
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    m.train_one_batch(x, y)  # build the graph in the first iter; for the old code, the params are initialized here

m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
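m.get_params() implies that a Module can discover the parameters of nested layers. One plausible way (an assumption for illustration, not the actual SINGA implementation) is to walk the instance attributes recursively:

def get_params(obj, prefix=''):
    # toy sketch: collect (name, tensor) pairs from nested Layer/Module attributes
    params = []
    for name, value in vars(obj).items():
        full_name = prefix + name
        if hasattr(value, 'forward') or hasattr(value, 'get_params'):
            # treat it as a sub-layer/sub-module and recurse
            params.extend(get_params(value, prefix=full_name + '.'))
        elif name in ('kernel', 'W', 'b'):   # assumed parameter attribute names
            params.append((full_name, value))
    return params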

XJDKC commented on July 23, 2024

So we replace the loss and optim with train_one_batch?
Should we make Module a subclass of Layer?

dcslin commented on July 23, 2024

To me this API is clearer. For example, model.compile([x]) makes more sense, since it requires x as an argument, compared to model([x]) (a first call made only for initialization).
Introducing train_one_batch() also gives flexibility in the loss function and the optimizer.
However, train_one_batch() is only an interface left for the user to define, and there is no decorator for this method, so there is no enforcement that the user implements it. Maybe we can make it clear in the documentation that this is the SINGA way to build a model, to encourage implementing train_one_batch().
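If programmatic enforcement were wanted, one option (an assumption, not part of the proposal above) is to declare train_one_batch as an abstract method, so that instantiating a Module that forgot to implement it fails early:

from abc import ABC, abstractmethod

class Module(ABC):
    @abstractmethod
    def train_one_batch(self, x, y):
        ...

class BadModel(Module):      # forgets to implement train_one_batch
    pass

class GoodModel(Module):
    def train_one_batch(self, x, y):
        return 0.0

GoodModel()                  # fine
try:
    BadModel()               # raises TypeError at instantiation time
except TypeError as e:
    print(e)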

joddiy commented on July 23, 2024

Shall we go with the following APIs?
@joddiy @dcslin @XJDKC
They should be compatible with the current APIs.

class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        set train, graph etc config
        turn off graph
        if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)
    
     def load(self, ckp_path, include_state=False):
       load onnx model and copy the params to each layer; 
       generate warnings for mismatched layers/params.
       restore the states and return it as a dict
     
     def save(self, ckp_path, state={}):
       save the model as onnx format
       save the states
    
     def forward(self, x):    # turn on graph if necessary
        pass

     def train_one_batch(self, x, y):  # turn on graph if necessary
        pass   
   
     @deprecated 
     def loss(self, ):
        pass

      @deprecated 
      def optim(self,):
          pass      


class Layer:
    def __init__(self, name=None):
      self.init = False
      
    def __call__(self, x):
       if self.init == False:
           init layer states
       else:
          # do the forward propagation 


class MyLayer(Layer):
     def __init__(self):
          self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0, kernel_init='he_uniform') 
          self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

      def forward(self, x):
          return self.layer2(self.layer1(x))



class MyModule(Module):
     def __init__(self):
           self.blk1 = MyLayer()
           self.blk2 = MyLayer()
           self.optim = SGD()
           self.loss = CrossEntropyLoss()

      def forward(self, x):
           return self.blk2(self.blk1(x))    

      def train_one_batch(self, x, y): 
           y_ = self.forward(x)
           l = self.loss(y_, y)
           self.optim.backward_and_update(l)
           return l

x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
fill x with values
m = MyModule()

# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device = gpu)
for npx, npy in data:
   x.copy_from(npx)
   y.copy_from(npy)
   m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.

m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})

This approach still postpones the operation init until the training phase, right? When the user has a batch of samples, they call train_one_batch, which calls forward, which in turn calls __call__:

def __call__(self, x):
    if self.init == False:
        # init layer states

It still seems strange to delay initializing the graph until the user has the data.

In my opinion, the current problems are:

  1. we don't have the shape of the input, so we use a Placeholder as the input
  2. even if we have the shape of the input data, we cannot compute all the shapes of the intermediate tensors, since we cannot call forward with a Placeholder; we may want to fill in random data, but that may cause errors.

So the key point is that we bind graph construction to the forward function: only when we call forward do we construct the graph, but to call forward we must have real data.

So I'm thinking about separating graph construction from the forward function. We define two classes, Graph and Node: the Graph stores the relationships between Nodes, and each Node stores an Operation as well as its input and output.

In the __call__ function of an Operation, we don't call the forward function; instead, we create a Node, store the operation itself in this Node, set its input and output, and return the newly created Node:

class Operation(object):
    def __init__(self):
        pass

    def __call__(self, previous_node):  # multiple inputs are handled similarly
        # create a Node
        # link the current node with the previous node
        # infer the shape, and set the shape of each input and output for the current and previous nodes
        current_node = Node()
        current_node.input.node = previous_node
        current_node.operation = self
        current_node.output.shape = self.infer_shape()
        previous_node.output.node = current_node
        return current_node

    def forward(self):
        pass

    def backward(self):
        pass

    def infer_shape(self):
        pass

We then actually construct a Graph of linked Nodes with the following code:

class MyModule(Module):
    def __init__(self):
        super(MyModule, self).__init__()

        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)

        self.sgd = opt.SGD(lr=0.01)

    def construct_graph(self, x):
        # x is a placeholder
        # create the Graph of linked Nodes
        y = self.conv1(x)
        y = self.conv2(y)
        self.graph = Graph(x, y)

    def train(self, x, y):
        y_ = self.graph.forward(x)
        l = self.loss(y_, y)
        self.optim(l)
        return l

    def loss(self, out, y):
        return autograd.softmax_cross_entropy(out, y)

    def optim(self, loss):
        self.sgd.backward_and_update(loss)

model = MyModule()
x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
model.construct_graph(x)  # build the graph

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    model.train(x, y)  # directly train

model.save('mymodel', state={'epoch': data.size(), 'sgd': model.sgd})
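To illustrate the idea of building the graph from shapes alone and running it later with real data, here is a compact, self-contained toy (these are not the proposed SINGA classes; Node, ScaleOp and Graph are made up, and plain Python lists stand in for tensors):

class Node:
    def __init__(self, op, prev, shape):
        self.op, self.prev, self.shape = op, prev, shape

class ScaleOp:                      # toy operation with shape inference
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, prev_node):  # graph construction: no data touched
        return Node(self, prev_node, self.infer_shape(prev_node.shape))
    def infer_shape(self, in_shape):
        return in_shape             # elementwise op keeps the shape
    def forward(self, x):
        return [v * self.factor for v in x]

class Graph:
    def __init__(self, in_node, out_node):
        self.in_node, self.out_node = in_node, out_node
    def forward(self, x):           # execute ops from input to output
        chain, node = [], self.out_node
        while node is not self.in_node:
            chain.append(node.op)
            node = node.prev
        for op in reversed(chain):
            x = op.forward(x)
        return x

x_placeholder = Node(None, None, (2, 3))              # only the shape is known
out = ScaleOp(3.0)(ScaleOp(2.0)(x_placeholder))       # graph built without data
print(out.shape)                                      # (2, 3)
print(Graph(x_placeholder, out).forward([1.0, 2.0]))  # [6.0, 12.0] with real data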

dcslin commented on July 23, 2024

I see that @joddiy's proposal differs only in the naming of a method, compile vs construct_graph, from an API perspective. Maybe we can upgrade the backend to graph construction later without breaking the new API?

XJDKC commented on July 23, 2024

I think we don't need to fill the placeholders. If the operations are buffered rather than executed, the operations and intermediate tensors in the forward functions can be generated without executing them, so we can get the output tensors that way.
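A toy illustration of that buffering idea (self-contained; this is not SINGA's scheduler, and matmul_like is a made-up op): in graph mode each call records a closure and returns only a shape, and execution happens later.

buffered_ops = []
graph_mode = True

def matmul_like(shape_a, shape_b):
    # toy op: in graph mode, record the work and return only the output shape
    out_shape = (shape_a[0], shape_b[1])
    if graph_mode:
        buffered_ops.append(lambda: print('executing matmul ->', out_shape))
        return out_shape          # placeholder "tensor": shape only, no data
    # an eager path would compute real values here

y_shape = matmul_like((2, 3), (3, 4))   # nothing executed yet
print(y_shape)                          # (2, 4): shapes are known up front
for op in buffered_ops:                 # later: run the buffered graph
    op()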

joddiy commented on July 23, 2024

I see that @joddiy's proposal differs only in the naming of a method, compile vs construct_graph, from an API perspective. Maybe we can upgrade the backend to graph construction later without breaking the new API?

Hi @dcslin, the key point is the __call__ function of Operation (or Layer, since we want to merge these two). This __call__ doesn't call the forward function; instead it creates a new object we call a Node. So within construct_graph (or, say, compile) we build the graph with a placeholder, which means we don't need to postpone graph construction until we call the forward function with real data.

XJDKC commented on July 23, 2024
class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        set train, graph etc config
        ===turn on graph===
        if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)
        ===turn off graph===
    
     def load(self, ckp_path, include_state=False):
       load onnx model and copy the params to each layer; 
       generate warnings for mismatched layers/params.
       restore the states and return it as a dict
     
     def save(self, ckp_path, state={}):
       save the model as onnx format
       save the states
    
     def forward(self, x):    # turn on graph if necessary
        pass

     def train_one_batch(self, x, y):  # turn on graph if necessary
        pass   
   
     @deprecated 
     def loss(self, ):
        pass

      @deprecated 
      def optim(self,):
          pass      


class Layer:
    def __init__(self, name=None):
        self.init = False

    def do_init(self, x):
        # === turn off graph ===
        # init layer states
        # as the graph is turned off, the initialization operations will be executed eagerly
        # === restore the previous state of the graph ===
        pass

    def forward(self, x):
        # do the forward propagation
        pass

    def __call__(self, x):
        if self.init == False:
            self.do_init(x)
            self.init = True
        return self.forward(x)

class MyLayer(Layer):
     def __init__(self):
          self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0, kernel_init='he_uniform') 
          self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

      def forward(self, x):
          return self.layer2(self.layer1(x))



class MyModule(Module):
     def __init__(self):
           self.blk1 = MyLayer()
           self.blk2 = MyLayer()
           self.optim = SGD()
           self.loss = CrossEntropyLoss()

      def forward(self, x):
           return self.blk2(self.blk1(x))    

      def train_one_batch(self, x, y): 
           y_ = self.forward(x)
           l = self.loss(y_, y)
           self.optim.backward_and_update(l)
           return l

x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
#  === no need to fill x with values===
m = MyModule()

# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device = gpu)
for npx, npy in data:
   x.copy_from(npx)
   y.copy_from(npy)
   m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.

m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})

How about this proposal?

joddiy commented on July 23, 2024
class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        set train, graph etc config
        ===turn on graph===
        if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)
        ===turn off graph===
    
     def load(self, ckp_path, include_state=False):
       load onnx model and copy the params to each layer; 
       generate warnings for mismatched layers/params.
       restore the states and return it as a dict
     
     def save(self, ckp_path, state={}):
       save the model as onnx format
       save the states
    
     def forward(self, x):    # turn on graph if necessary
        pass

     def train_one_batch(self, x, y):  # turn on graph if necessary
        pass   
   
     @deprecated 
     def loss(self, ):
        pass

      @deprecated 
      def optim(self,):
          pass      


class Layer:
    def __init__(self, name=None):
      self.init = False

    def do_init(x):
        ===turn off graph===
           init layer states
           As the graph is turned off, the initialization operations will be executed
        ===restore the state of the graph===
      
    def forward():
        # do the forward propagation 

    def __call__(self, x):
       if self.init == False:
          self.do_init(x)
       self.forward(x)

class MyLayer(Layer):
     def __init__(self):
          self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0, kernel_init='he_uniform') 
          self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

      def forward(self, x):
          return self.layer2(self.layer1(x))



class MyModule(Module):
     def __init__(self):
           self.blk1 = MyLayer()
           self.blk2 = MyLayer()
           self.optim = SGD()
           self.loss = CrossEntropyLoss()

      def forward(self, x):
           return self.blk2(self.blk1(x))    

      def train_one_batch(self, x, y): 
           y_ = self.forward(x)
           l = self.loss(y_, y)
           self.optim.backward_and_update(l)
           return l

x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
#  === no need to fill x with values===
m = MyModule()

# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device = gpu)
for npx, npy in data:
   x.copy_from(npx)
   y.copy_from(npy)
   m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.

m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})

How about this proposal?

Thanks for your comments. I think it's a good idea to add a compile function before training. Based on Ruling's code, if we don't want to run the computation during the init phase, we can add a function to compute the shape:

class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        set train, graph etc config
        turn off graph
        if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)
    
     def load(self, ckp_path, include_state=False):
       load onnx model and copy the params to each layer; 
       generate warnings for mismatched layers/params.
       restore the states and return it as a dict
     
     def save(self, ckp_path, state={}):
       save the model as onnx format
       save the states
    
     def forward(self, x):    # turn on graph if necessary
        pass

     def train_one_batch(self, x, y):  # turn on graph if necessary
        pass   
   
     @deprecated 
     def loss(self, ):
        pass

      @deprecated 
      def optim(self,):
          pass      


class Layer:
    def __init__(self, name=None):
        self.init = False

    def do_init(self, x):
        # compute the output shape
        output_shape = self.infer_shape(x)
        # init weights by the shape
        init_weights()
        # return a new Placeholder to the next operation
        return Placeholder(output_shape, device=gpu, dtype=singa.float)  # alias of Tensor

    def forward(self, x):
        # do the forward propagation
        pass

    def __call__(self, x):
        if self.init == False:
            self.init = True
            return self.do_init(x)  # during compile, pass the new Placeholder to the next layer
        return self.forward(x)

    def infer_shape(self, x):
        # infer the output shape from the input shape
        pass


class MyLayer(Layer):
     def __init__(self):
          self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0, kernel_init='he_uniform') 
          self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

      def forward(self, x):
          return self.layer2(self.layer1(x))



class MyModule(Module):
     def __init__(self):
           self.blk1 = MyLayer()
           self.blk2 = MyLayer()
           self.optim = SGD()
           self.loss = CrossEntropyLoss()

      def forward(self, x):
           return self.blk2(self.blk1(x))    

      def train_one_batch(self, x, y): 
           y_ = self.forward(x)
           l = self.loss(y_, y)
           self.optim.backward_and_update(l)
           return l

x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
#  === no need to fill x with values===
m = MyModule()

# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device = gpu)
for npx, npy in data:
   x.copy_from(npx)
   y.copy_from(npy)
   m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.

m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
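To see how returning a Placeholder lets the next layer initialize in turn, here is a self-contained toy (ToyPlaceholder and ToyLinear are made-up names, the real forward logic is omitted, and plain tuples stand in for tensor shapes) chaining two layers at compile time:

class ToyPlaceholder:                      # carries only a shape, no data
    def __init__(self, shape):
        self.shape = shape

class ToyLinear:
    def __init__(self, units):
        self.units, self.init = units, False
    def do_init(self, x):
        self.W_shape = (x.shape[1], self.units)          # init weights from shapes alone
        return ToyPlaceholder((x.shape[0], self.units))  # placeholder for the next layer
    def __call__(self, x):
        if self.init == False:
            self.init = True
            return self.do_init(x)
        return x  # the real forward pass would compute values here

l1, l2 = ToyLinear(8), ToyLinear(2)
out = l2(l1(ToyPlaceholder((None, 10))))   # compile-time pass, no data involved
print(l1.W_shape, l2.W_shape, out.shape)   # (10, 8) (8, 2) (None, 2)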

nudles commented on July 23, 2024

Here is the latest API proposal
https://gist.github.com/nudles/d7f8043f251872333ec06f2701696cce

nudles commented on July 23, 2024

resolved in #697
