Comments (23)
from singa.
Agree on point 1, `Layer` vs `Operation`:

Currently `Layer` is in `autograd.py`, but what `Layer` does is define forward connections using building blocks including `Operation` and `Layer`. So, strictly speaking, `Layer` is not part of `autograd`. SINGA's `autograd` is more about constructing the backward pass.

Thus, I propose separating `Layer` into its own module. Additionally, a stateless function like `relu` should also be a conceptual `layer`.

For external users, only `Layer` and `Module` are visible. They can build a new `Layer` from existing `Layer`s, and a new `Module` from existing `Layer`s:
from singa.layers import Conv2d, relu
Internal or advanced users can build a `Layer` with `Operation`s.

Lastly, if this is finalized, `autograd.py` should contain only `Operation`s, so we could keep the original naming convention. There is no need to mark operations as "private" by prefixing an underscore, e.g. `_RNN`.
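The split proposed above can be sketched in plain Python. This is only an illustration under stated assumptions: `Operation`, `ReLUOp`, `Layer`, `ReLU`, and `Block` are hypothetical stand-ins written for this example, not SINGA's actual classes. The operation owns both the forward and backward computation, while the layer only wires forward connections.

```python
class Operation:
    """Hypothetical stand-in for an autograd operation: forward plus backward."""

    def forward(self, x):
        raise NotImplementedError

    def backward(self, dy):
        raise NotImplementedError

    def __call__(self, x):
        return self.forward(x)


class ReLUOp(Operation):
    def forward(self, x):
        # Remember which inputs were positive for the backward pass.
        self.mask = [v > 0 for v in x]
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, dy):
        return [g if m else 0.0 for g, m in zip(dy, self.mask)]


class Layer:
    """Hypothetical stand-in for a layer: only defines forward connections."""

    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        return self.forward(x)


class ReLU(Layer):
    """Stateless layer wrapping the operation, so users never touch ReLUOp."""

    def __init__(self):
        self.op = ReLUOp()

    def forward(self, x):
        return self.op(x)


class Block(Layer):
    """A new layer built only from existing layers."""

    def __init__(self):
        self.act1 = ReLU()
        self.act2 = ReLU()

    def forward(self, x):
        return self.act2(self.act1(x))


out = Block()([-1.0, 2.0, -3.0, 4.0])
```

With this layout, external code only ever imports layer classes, and the operation classes need no underscore prefix because they live in a separate module.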
from singa.
> Additionally, a stateless function like `relu` should also be a conceptual `layer`.

We can create `class ReLU(Layer)` in layer.py.

> from singa.layers import Conv2d, relu

Should this be `ReLU`?

> Lastly, if this is finalized, `autograd.py` should contain only `Operation`s, so we could keep the original naming convention. There is no need to mark operations as "private" by prefixing an underscore, e.g. `_RNN`.

Yes.
from singa.
* Scheme 1 ![scheme_1](https://user-images.githubusercontent.com/32295829/81771083-a34f7700-9514-11ea-92bd-48f884d760f7.png)
* Scheme 2 ![scheme_2](https://user-images.githubusercontent.com/32295829/81771093-a9455800-9514-11ea-9ed6-7f6003c807ff.png)
How about using the following APIs?

class Module:
    def __init__(self, inputs):
        # inputs is a list of input tensors (placeholders)
        # randomly fill each input tensor
        self.forward(*inputs)  # turn off the graph mode completely

    # another option is to define a compile method
    def compile(self, inputs, is_train, use_graph, graph_alg):
        self.forward(*inputs)

class MyModel(Module):
    def __init__(self, inputs):
        # define all layers
        # can we force the next call to be invoked at the end of the __init__ method?
        super().__init__(inputs)

x = Placeholder((2, 3), device=gpu)  # alias of Tensor.
m = MyModel([x])
# if we use the other option:
# m = MyModel()
# m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)

y = Tensor((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    y_ = m(x)  # build the graph in the first iteration.
    l = m.loss(y, y_)
    # ...

class MyLayer:
    def __init__(self, kernel_initialization="he_uniform", name=None):
        # kernel_initialization is a string for the predefined initialization method
        # or a function that accepts a tensor as input and fills the values in-place;
        # this is to provide a default initialization method;
        # users can also configure it to use a customized initialization method or
        # get the params out and fill the values explicitly as shown above.
        self.init = False
        self.kernel_initialization = ...

    def __call__(self, x):
        if self.init == False:
            self.kernel = Tensor(...)
            self.kernel_initialization(self.kernel)
        else:
            # do the forward propagation
            pass
from singa.
Thanks for such a detailed explanation! For the first scheme, we can't call the forward function in the `__init__` function because the `inputs` variable is just a list of placeholders. So we still need to separate initialization from forward propagation, or it will produce a runtime error.
from singa.
class MyLayer:
    def __init__(self, kernel_initialization="he_uniform", name=None):
        # kernel_initialization is a string for the predefined initialization method
        # or a function that accepts a tensor as input and fills the values in-place;
        # this is to provide a default initialization method;
        # users can also configure it to use a customized initialization method or
        # get the params out and fill the values explicitly.
        self.initialized = False
        self.kernel_initialization = ...

    def init(self, x):
        # init params and other state data
        pass

    def forward(self, x):
        # do the forward propagation
        pass

    # The __call__ function is inherited by all subclasses.
    # This part of the code does not need to be implemented in every subclass.
    def __call__(self, x):
        if not self.initialized:
            self.init(x)
            self.initialized = True
        return self.forward(x)

In this way, when to initialize is transparent to users. They just need to implement the `__init__`, `init` and `forward` functions if they want to create a new Layer.
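A runnable toy version of this lazy-initialization pattern, using plain Python lists instead of tensors, might look like the following. The `Scale` layer and its `init_calls` counter are hypothetical and exist only to show that `init` runs exactly once, on the first call.

```python
class Layer:
    """Hypothetical base class: runs init() once, on the first call."""

    def __init__(self):
        self.initialized = False

    def init(self, x):
        # Optional hook; subclasses allocate parameters from x here.
        pass

    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        if not self.initialized:
            self.init(x)
            self.initialized = True
        return self.forward(x)


class Scale(Layer):
    """Toy layer: one weight per input element, sized from the first input."""

    def __init__(self, init_value=1.0):
        super().__init__()
        self.init_value = init_value
        self.init_calls = 0

    def init(self, x):
        self.init_calls += 1
        self.weight = [self.init_value] * len(x)

    def forward(self, x):
        return [w * v for w, v in zip(self.weight, x)]


layer = Scale(init_value=2.0)
first = layer([1.0, 2.0, 3.0])   # triggers init(), then forward()
second = layer([4.0, 5.0, 6.0])  # forward() only
```

Because the `initialized` check lives in the base class, a subclass author cannot forget it, which is the concern raised about doing the initialization inside `__call__` by hand.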
from singa.
- Before we call Module.forward(), we can randomly fill the placeholder tensors.
- We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the `__call__` method or in an `init()` method. It is up to the contributor.
Any comments on the drawbacks?
@dcslin @XJDKC
from singa.
> - Before we call Module.forward(), we can randomly fill the placeholder tensors.
> - We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the `__call__` method or in an `init()` method. It is up to the contributor.

1. If we separate initialization from forward propagation, there is no need to fill the placeholders. The data of the tensors is only accessed during forward propagation. For initialization, we only access their shapes, types and so on.
2. That's great. But if users move the initialization into the `__call__` function, they have to track for themselves whether the layer has been initialized.
from singa.
> - Before we call Module.forward(), we can randomly fill the placeholder tensors.
> - We can make Layer.init() optional. To implement a new layer, the parameter initialization can be done within the `__call__` method or in an `init()` method. It is up to the contributor.

Some models cannot use random inputs. For example, with BERT in ONNX, some nodes may compute the indices of a tensor, and the next node may split the tensor using these indices. If we randomly generate the inputs, this case always fails.

By the way, I prefer the idea of:

# another option is to define a compile method
def compile(self, inputs, is_train, use_graph, graph_alg):
    self.forward(*inputs)

However, I'd like to add a method to compute the shape based on the inputs of each node instead of calling the forward function:

def compute_output_shape(self, input_shape):
    # print(input_shape)  # [(None, 10), (None, 12)]
    return (None, input_shape[0][1] + input_shape[1][1] + 2)

Let me think about it. I'll comment with the detailed API later.
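To make the shape-only idea concrete, here is a small self-contained sketch. The `ConcatPad` class is hypothetical and simply gives the `compute_output_shape` snippet above a home: it pretends to join two 2-D inputs along axis 1 and add two extra columns. `None` stands for an unknown batch size.

```python
class ConcatPad:
    """Hypothetical layer: joins two 2-D inputs on axis 1, then adds 2 columns."""

    def compute_output_shape(self, input_shape):
        # input_shape is a list with one (batch, columns) tuple per input.
        (batch_a, cols_a), (batch_b, cols_b) = input_shape
        if batch_a != batch_b:
            raise ValueError("batch dimensions must match")
        return (batch_a, cols_a + cols_b + 2)


shape = ConcatPad().compute_output_shape([(None, 10), (None, 12)])
```

No tensor data is touched here, so a check like this can run on placeholders before any real batch is available.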
from singa.
Considering the API requirements and constraints below:

API requirements:
- [Model] multi-input/output (multiple loss functions)
- [Model] Load model from disk. In other words, Model param memory allocation should be done in `model.__init__`

API constraints:
- [Model] the graph module buffers the first forward call, or the graph module is turned off in the first forward call
- [Layer] layer param memory allocation & initialization requires input x

@XJDKC's scheme 2 and @nudles's `Placeholder` are close to the current implementation, so the changes required could be small.

For model building:

class MyModel(Model):
    def __init__(self, inputs, configs):
        self.mylayer = MyLayer(configs)
        self.linear1 = Linear(configs, kernel_init=configs.ker_init)
        super().__init__(inputs)  # it may be a bit confusing for users what this is

    def forward(self, inputs):
        return self.linear1(self.mylayer(inputs[0], inputs[1]))

For model running:

x = PlaceHolder(shape=(batch, shape1, shape2))
m = MyModel([x], **configs, **graph_configs)
m.on_device(gpu)
m.train()
for e in epochs:
    for x, y in data_gen:
        l1, l2 = m.loss(y, m(x))
        m.optim(l1)
        m.optim(l2)

For Layer building:

class MyLayer(Layer):
    def __init__(self, configs):
        self.configs = configs

    def __call__(self, inputs):
        if not self.init:
            self.W = Tensor(self.configs, inputs.shape).initializer()
            self.device_check(inputs, self.W)
            self.init = True
        return operator1(inputs[0], inputs[1])

For the Module class implementation:

class Module:
    def __init__(self, placeholder_input, configs):
        turn_off_graph()
        self.forward(*placeholder_input)
        turn_on_graph()

    def __call__(self, inputs):
        return self.forward(*inputs)
from singa.
> 1. If we separate the initialization and the forward propagation, there is no need to fill the placeholder. The data of tensors will only be accessed in the forward propagation. For initialization, we just access their shapes, types and so on.

Then we need a method like `infer_output_shapes(self, input_shapes)`. Otherwise, we have to call the forward method to get the output shapes from the output tensor. I prefer to call the forward function to avoid adding a new method to each layer.

> 2. That's great. But if users move the initialization into __call__ function, they should determine whether the layer has been initialized by themselves.
from singa.
> For some models, it cannot use the random inputs, such as BERT within ONNX, some nodes may compute the indices of a tensor, and the next node may split the tensor by using these indices. If we randomly generate the inputs, this case always fails.

Good point. Then we can configure the data type when creating the placeholder and initialize the placeholder according to this data type. But how should we initialize it: randomly, or set to 0? There could still be some issues, so the better way is to use real data instead of placeholders.

> However, I'd like to add a method to compute the shape based on the inputs of each node instead of calling the forward function:
> def compute_output_shape(self, input_shape): ...

Do you need this one for ONNX loading?
from singa.
Shall we go with the following APIs?
@joddiy @dcslin @XJDKC
They should be compatible with the current APIs.
class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        # set train, graph etc. config
        # turn off graph
        # if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)

    def load(self, ckp_path, include_state=False):
        # load onnx model and copy the params to each layer;
        # generate warnings for mismatched layers/params.
        # restore the states and return them as a dict
        pass

    def save(self, ckp_path, state={}):
        # save the model in onnx format
        # save the states
        pass

    def forward(self, x):  # turn on graph if necessary
        pass

    def train_one_batch(self, x, y):  # turn on graph if necessary
        pass

    @deprecated
    def loss(self):
        pass

    @deprecated
    def optim(self):
        pass

class Layer:
    def __init__(self, name=None):
        self.init = False

    def __call__(self, x):
        if self.init == False:
            # init layer states
            pass
        else:
            # do the forward propagation
            pass

class MyLayer(Layer):
    def __init__(self):
        self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
        self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

    def forward(self, x):
        return self.layer2(self.layer1(x))

class MyModule(Module):
    def __init__(self):
        self.blk1 = MyLayer()
        self.blk2 = MyLayer()
        self.optim = SGD()
        self.loss = CrossEntropyLoss()

    def forward(self, x):
        return self.blk2(self.blk1(x))

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        l = self.loss(y_, y)
        self.optim.backward_and_update(l)
        return l

x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
# fill x with values
m = MyModule()
# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)  # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    m.train_one_batch(x, y)  # build the graph in the first iter. For the old code, the params are initialized here.
m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
from singa.
So we replace the loss and optim with train_one_batch?
Should we make Module a subclass of Layer?
from singa.
To me this API is clearer. For example, `model.compile([x])` makes more sense because it requires `x` as an argument, compared to `model([x])` (a first call made only for init purposes).
Also, introducing `train_one_batch()` gives flexibility in the loss and optim functions.
However, `train_one_batch()` is only an interface that the user defines, and there is no decorator for this method, so nothing forces the user to use it. Maybe we can make it clear in the documentation that this is the SINGA way to build a model, to encourage use of the `train_one_batch()` method.
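One possible way to address the enforcement concern, which is an assumption of this sketch and not something proposed in the thread, is to declare `train_one_batch` as an abstract method with Python's standard `abc` module. The `Module`, `Incomplete`, and `Complete` classes below are hypothetical toys, not SINGA classes.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Hypothetical base class; not SINGA's actual Module."""

    @abstractmethod
    def forward(self, x):
        ...

    @abstractmethod
    def train_one_batch(self, x, y):
        ...


class Incomplete(Module):
    # Forgets to define train_one_batch.
    def forward(self, x):
        return x


class Complete(Module):
    def forward(self, x):
        return x

    def train_one_batch(self, x, y):
        # Toy "loss": absolute error between prediction and target.
        return abs(self.forward(x) - y)


try:
    Incomplete()
    rejected = False
except TypeError:
    # Instantiating a class with unimplemented abstract methods raises TypeError.
    rejected = True

loss = Complete().train_one_batch(3.0, 1.0)
```

The trade-off is that every model, including inference-only ones, would then have to define `train_one_batch`, so documentation alone may still be the lighter option.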
from singa.
This approach still postpones the operation init until the training phase, right? When the user has a batch of samples, they call `train_one_batch`, which calls `forward`, which then calls `__call__`:

def __call__(self, x):
    if self.init == False:
        # init layer states
        ...

It's still strange to delay initializing the graph until the user has the data.
In my opinion, the current problem is:
- we don't have the shape of the input, so we use a Placeholder as the input
- even if we have the shape of the input data, we cannot compute the shapes of all intermediate tensors, since we cannot call forward with a Placeholder. We may want to init random data, but that may cause errors.

So the key point is that we bind graph construction to the `forward` function. We construct the graph only when we call forward, but to call forward we must have the real data.

So I'm thinking about separating graph construction from the `forward` function. We define classes called `Graph` and `Node`: the `Graph` stores the relationships between `Node`s, and each `Node` stores an `Operation` as well as its input and output.

In the `__call__` function of an `Operation`, we don't call the `forward` function. Instead, we create a `Node`, store the operation itself within this `Node`, set its input and output, and then return the newly created `Node`, as in the following code:
class Operation(object):
    def __init__(self):
        pass

    def __call__(self, previous_node):  # multiple inputs are handled similarly
        # create a Node
        # link the current node with the previous node
        # run infer_shape and set the shape of each input and output for the current node and previous node
        current_node = Node()
        current_node.input.node = previous_node
        current_node.operation = self
        current_node.output.shape = self.infer_shape()
        previous_node.output.node = current_node
        return current_node

    def forward(self):
        pass

    def backward(self):
        pass

    def infer_shape(self):
        pass
We actually construct a `Graph` linked with `Node`s by using the following code:
class MyModule(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
        self.sgd = opt.SGD(lr=0.01)

    def construt_graph(self, x):
        # x is a placeholder
        # create the Graph linked with Node
        y = self.conv1(x)
        y = self.conv2(y)
        self.graph = Graph(x, y)

    def train(self, x, y):
        y_ = self.graph.forward(x)
        l = self.loss(y_, y)
        self.optim(l)
        return l

    def loss(self, out, y):
        return autograd.softmax_cross_entropy(out, y)

    def optim(self, loss):
        self.sgd.backward_and_update(loss)

model = MyModule()
x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
model.construt_graph(x)  # build the graph
y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    model.train(x, y)  # directly train
model.save('mymodel', state={'epoch': data.size(), 'sgd': model.sgd})
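The deferred-construction idea can be tried out with a small self-contained toy. Everything here is a hypothetical sketch for illustration, not SINGA code: `Node`, `Dense`, and `Graph` are invented names, tensors are nested Python lists, and the graph is assumed to be a simple linear chain with one input per node.

```python
class Node:
    """Hypothetical graph node: records an operation and its wiring, runs nothing."""

    def __init__(self, operation=None, inputs=(), shape=None):
        self.operation = operation
        self.inputs = list(inputs)
        self.shape = shape


class Operation:
    def __call__(self, previous_node):
        # Build a node instead of executing forward().
        return Node(self, [previous_node], self.infer_shape(previous_node.shape))

    def forward(self, x):
        raise NotImplementedError

    def infer_shape(self, input_shape):
        raise NotImplementedError


class Dense(Operation):
    """Toy operation mapping (batch, in_features) to (batch, out_features)."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features
        # Constant weights keep the example deterministic.
        self.weight = [[1.0] * out_features for _ in range(in_features)]

    def infer_shape(self, input_shape):
        batch, features = input_shape
        if features != self.in_features:
            raise ValueError("feature dimension mismatch")
        return (batch, self.out_features)

    def forward(self, x):
        return [[sum(row[i] * self.weight[i][j] for i in range(self.in_features))
                 for j in range(self.out_features)] for row in x]


class Graph:
    def __init__(self, input_node, output_node):
        self.input_node = input_node
        self.output_node = output_node

    def forward(self, x):
        # Walk back from the output to collect operations, then run them in order.
        ops, node = [], self.output_node
        while node is not self.input_node:
            ops.append(node.operation)
            node = node.inputs[0]
        for op in reversed(ops):
            x = op.forward(x)
        return x


placeholder = Node(shape=(2, 3))             # shape only, no data
out = Dense(4, 1)(Dense(3, 4)(placeholder))  # builds nodes, computes nothing
graph = Graph(placeholder, out)
result = graph.forward([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])
```

Here all shapes are known as soon as the two `Dense` calls return, and a shape mismatch would surface at construction time rather than on the first training batch.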
from singa.
From an API perspective, I see that @joddiy's proposal differs only in the naming of one method: `compile` becomes `construt_graph`. Maybe we can upgrade the backend to graph construction later without breaking the new API?
from singa.
I think we don't need to fill placeholders. If we turn off the buffer, the operations and intermediate tensors in the forward functions can be generated without executing them. So we can get the output tensors in this way.
from singa.
> I see the @joddiy proposal is different only on a naming of a method, compile to construt_graph in terms of API perspective. Maybe can upgrade the backend to graph construction later without breaking new API?

Hi @dcslin, the key point is the `__call__` function of Operation (or Layer, since we want to merge these two). This `__call__` doesn't call the forward function. Instead, it creates a new object that we call a `Node`. So within `construt_graph` (or, say, `compile`) we build the graph with a placeholder, which means we don't need to postpone graph construction until we call the forward function after we have the real data.
from singa.
class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        # set train, graph etc. config
        # === turn on graph ===
        # if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)
        # === turn off graph ===

    def load(self, ckp_path, include_state=False):
        # load onnx model and copy the params to each layer;
        # generate warnings for mismatched layers/params.
        # restore the states and return them as a dict
        pass

    def save(self, ckp_path, state={}):
        # save the model in onnx format
        # save the states
        pass

    def forward(self, x):  # turn on graph if necessary
        pass

    def train_one_batch(self, x, y):  # turn on graph if necessary
        pass

    @deprecated
    def loss(self):
        pass

    @deprecated
    def optim(self):
        pass

class Layer:
    def __init__(self, name=None):
        self.init = False

    def do_init(self, x):
        # === turn off graph ===
        # init layer states
        # As the graph is turned off, the initialization operations will be executed
        # === restore the state of the graph ===
        pass

    def forward(self, x):
        # do the forward propagation
        pass

    def __call__(self, x):
        if self.init == False:
            self.do_init(x)
        return self.forward(x)

class MyLayer(Layer):
    def __init__(self):
        self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
        self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

    def forward(self, x):
        return self.layer2(self.layer1(x))

class MyModule(Module):
    def __init__(self):
        self.blk1 = MyLayer()
        self.blk2 = MyLayer()
        self.optim = SGD()
        self.loss = CrossEntropyLoss()

    def forward(self, x):
        return self.blk2(self.blk1(x))

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        l = self.loss(y_, y)
        self.optim.backward_and_update(l)
        return l

x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
# === no need to fill x with values ===
m = MyModule()
# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)  # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    m.train_one_batch(x, y)  # build the graph in the first iter. For the old code, the params are initialized here.
m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})

How about this proposal?
from singa.
Thanks for your comments. I guess it's a good idea to add a compile function before the training. Based on Ruling's code, if we don't want to run the computation during the init phase, we can add a function to compute the shape:

class Module:
    def compile(self, inputs, is_train, use_graph, graph_alg):
        # set train, graph etc. config
        # turn off graph
        # if inputs are not filled, print warnings and fill inputs according to data type.
        self.forward(*inputs)

    def load(self, ckp_path, include_state=False):
        # load onnx model and copy the params to each layer;
        # generate warnings for mismatched layers/params.
        # restore the states and return them as a dict
        pass

    def save(self, ckp_path, state={}):
        # save the model in onnx format
        # save the states
        pass

    def forward(self, x):  # turn on graph if necessary
        pass

    def train_one_batch(self, x, y):  # turn on graph if necessary
        pass

    @deprecated
    def loss(self):
        pass

    @deprecated
    def optim(self):
        pass

class Layer:
    def __init__(self, name=None):
        self.init = False

    def do_init(self, x):
        # compute the output shape
        output_shape = self.infer_shape(x)
        # init weights by the shape
        init_weights()
        # return a new Placeholder to the next operation
        return Placeholder(output_shape, device=gpu, dtype=singa.float)  # alias of Tensor

    def forward(self, x):
        # do the forward propagation
        pass

    def __call__(self, x):
        if self.init == False:
            y = self.do_init(x)
            self.init = True
        else:
            y = self.forward(x)
        return y

    def infer_shape(self, x):
        # infer shape
        pass

class MyLayer(Layer):
    def __init__(self):
        self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
        self.layer2 = layer.MaxPool2d(kernel=3, stride=2)

    def forward(self, x):
        return self.layer2(self.layer1(x))

class MyModule(Module):
    def __init__(self):
        self.blk1 = MyLayer()
        self.blk2 = MyLayer()
        self.optim = SGD()
        self.loss = CrossEntropyLoss()

    def forward(self, x):
        return self.blk2(self.blk1(x))

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        l = self.loss(y_, y)
        self.optim.backward_and_update(l)
        return l

x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
# === no need to fill x with values ===
m = MyModule()
# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
for pname, ptensor in m.get_params():
    ptensor.uniform(-1, 1)  # not necessary if each layer's param init methods are configured.

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    m.train_one_batch(x, y)  # build the graph in the first iter. For the old code, the params are initialized here.
m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
from singa.
Here is the latest API proposal
https://gist.github.com/nudles/d7f8043f251872333ec06f2701696cce
from singa.
resolved in #697
from singa.