
Comments (28)

XJDKC commented on July 23, 2024

I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.

XJDKC commented on July 23, 2024

The Module class just buffers the operations during calls to the __call__ function of the Layer and the forward and backward functions of the Operation.

XJDKC commented on July 23, 2024

I think the Module class is quite similar to Layer; we can think about combining them together. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class is a subclass of Module, this problem will be solved.

chrishkchris commented on July 23, 2024

> I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.

Yes, if the initialization is in the __init__ function, it will not be buffered automatically. If the initialization is in the __call__ function, we can still add a few lines to turn off the buffering. In both cases, the graph functionality won't be affected.

chrishkchris commented on July 23, 2024

> I think the Module class is quite similar to Layer; we can think about combining them together. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class is a subclass of Module, this problem will be solved.

Yes, when we buffer the ops, there won't be any actual run, but all the parameter sizes will be known after the __call__ function.

XJDKC commented on July 23, 2024

> > I think the Module class is quite similar to Layer; we can think about combining them together. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class is a subclass of Module, this problem will be solved.
>
> Yes, when we buffer the ops, there won't be any actual run, but all the parameter sizes will be known after the __call__ function.

After calling the constructor of the Layer class.

chrishkchris commented on July 23, 2024

> > Yes, when we buffer the ops, there won't be any actual run, but all the parameter sizes will be known after the __call__ function.
>
> After calling the constructor of the Layer class.

Constructor might not be enough, because in the case of RNN, CudnnRNNHandle.weights_size is inferred from the input size, so it can only be obtained in the forward function. FYI, this is Shicong's new code in his PR (I guess this issue is meant to solve this PR's problem):

self.W = Tensor(shape=(self.handle.weights_size,),

nudles commented on July 23, 2024

> I think the Module class is quite similar to Layer; we can think about combining them together. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class is a subclass of Module, this problem will be solved.

do you mean making Module a subclass of Layer?

nudles commented on July 23, 2024

> > I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.
>
> Yes, if the initialization is in the __init__ function, it will not be buffered automatically. If the initialization is in the __call__ function, we can still add a few lines to turn off the buffering. In both cases, the graph functionality won't be affected.

how to turn it off?

nudles commented on July 23, 2024

> I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.

but you need to identify the operations for initialization. How do we tell if an operation is for initialization or not?

chrishkchris commented on July 23, 2024

> > Yes, if the initialization is in the __init__ function, it will not be buffered automatically. If the initialization is in the __call__ function, we can still add a few lines to turn off the buffering. In both cases, the graph functionality won't be affected.
>
> how to turn it off?

To turn it off, we just need to add three lines:

  1. Before the statement we want to execute immediately, add two lines:
     flag = param.device.graph_enabled()
     param.device.EnableGraph(False)
  2. After the statement, add one line:
     param.device.EnableGraph(flag)

Note that param is any input tensor that has the "device" attribute for us to use.
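Put together, a minimal sketch of how this looks inside a forward function (the tensor x, the lazy self.W check, and the gaussian initialization are illustrative, not the actual PR code):

    from singa import tensor

    def forward(self, x):
        if self.W is None:
            # remember whether the device is currently buffering ops
            flag = x.device.graph_enabled()
            # stop buffering: the following statements execute immediately
            x.device.EnableGraph(False)
            self.W = tensor.Tensor(shape=(self.handle.weights_size,),
                                   device=x.device)
            self.W.gaussian(0.0, 0.1)  # runs once, never recorded in the graph
            # restore the previous buffering state
            x.device.EnableGraph(flag)
        ...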

chrishkchris commented on July 23, 2024

> > I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.
>
> but you need to identify the operations for initialization. How do we tell if an operation is for initialization or not?

If the initialization statement is in the forward function, to mark it as an initialization operation that should not be buffered, we need to add three lines:

  1. Before the initialization statement that should not be buffered, add two lines:
     flag = param.device.graph_enabled()
     param.device.EnableGraph(False)
  2. After the statement, add one line:
     param.device.EnableGraph(flag)

Note that param is any input tensor that has the "device" attribute for us to use.

nudles commented on July 23, 2024

I think it would be better to define a new function to wrap these lines:

def execute_once(fn):
    get flag
    disable graph
    fn()
    set flag

chrishkchris commented on July 23, 2024

> I think it would be better to define a new function to wrap these lines:
>
> def execute_once(fn):
>     get flag
>     disable graph
>     fn()
>     set flag

@XJDKC Since we have used this structure at least twice before, this wrapper should be useful. Could you help us wrap these lines to turn off the buffering temporarily for tensor initialization (we need to select the most suitable place/class to add the wrapper)?

nudles commented on July 23, 2024

we may need more discussion before the implementation, e.g., which option to go with?

chrishkchris commented on July 23, 2024

> we may need more discussion before the implementation, e.g., which option to go with?

My personal opinion is:

  1. Keep the constructors of the existing layer classes and examples unchanged (otherwise it may not be backward compatible with the current examples/APIs and may require a lot of debugging of the current examples; it is near the release, so better not to take the risk).
  2. For the new RNN PR, the tensor size is inferred from the input, so the initialization statement should be put inside execute_once(fn, dev); then there is no problem.
  3. If we really want to support get_params() when we use the RNN function, we can make use of the Module class to buffer the ops, so there won't be any actual run. In this case, all the parameter sizes will be known after the forward function, and we can use get_params() afterwards. If we don't want to use the graph after getting the parameter sizes, we can use ResetGraph to clear the buffer, then turn off buffering and run without the graph; see the sketch after this list. (In this setting, the Layer class needs the Module class's buffering, so it needs to be a subclass of Module.)
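For reference, a rough sketch of the flow described in option 3 (model, x, and dev are illustrative names, and get_params() on a Module is exactly the capability this option proposes to add, so this is aspirational rather than current API):

    # buffer one forward pass through the Module: ops are recorded and
    # shapes are inferred, but nothing actually runs yet
    y = model.forward(x)

    # parameter sizes are now known, so the params can be collected
    params = model.get_params()

    # if we don't want the graph afterwards, clear the buffered ops
    # and switch back to eager execution
    dev.ResetGraph()
    dev.EnableGraph(False)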

@XJDKC @dcslin @joddiy could you please give your suggestions as well?

nudles commented on July 23, 2024

We can make the code compatible in this way

# in v3.1
def Foo(**kwargs):
   check if the args are for v3.0 or v3.1 and then call FooV300 or FooV310

In V4.0, we change the API completely.

yes. We may need to update Module later to support more functionalities like save and load.
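A minimal sketch of this dispatch idea (the distinguishing keyword argument is hypothetical, and FooV300/FooV310 are placeholder names for the version-specific implementations):

    # placeholder stubs for the version-specific implementations
    def FooV300(**kwargs): ...
    def FooV310(**kwargs): ...

    # in v3.1: keep one public entry point and dispatch on the arguments
    def Foo(**kwargs):
        if "legacy_arg" in kwargs:    # hypothetical argument only v3.0 accepts
            return FooV300(**kwargs)  # old behaviour, kept for compatibility
        return FooV310(**kwargs)      # new v3.1 behaviour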

chrishkchris commented on July 23, 2024

> We can make the code compatible in this way
>
> # in v3.1
> def Foo(**kwargs):
>    check if the args are for v3.0 or v3.1 and then call FooV300 or FooV310
>
> In V4.0, we change the API completely.
>
> yes. We may need to update Module later to support more functionalities like save and load.

Yes, this selector for the function version (say, in v3.1) is very helpful and will keep the code compatible. Then later, in v4.0, we can change the API completely.

XJDKC commented on July 23, 2024

> > After calling the constructor of the Layer class.
>
> Constructor might not be enough, because in the case of RNN, CudnnRNNHandle.weights_size is inferred from the input size, so it can only be obtained in the forward function. FYI, this is Shicong's new code in his PR (I guess this issue is meant to solve this PR's problem):
>
> self.W = Tensor(shape=(self.handle.weights_size,),

Sorry, I haven't checked the newest code. Handles in the past code, like ConvHandle, are initialized in the constructor, so I thought CudnnRNNHandle was the same.

XJDKC commented on July 23, 2024

> > I think the Module class is quite similar to Layer; we can think about combining them together. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class is a subclass of Module, this problem will be solved.
>
> do you mean making Module a subclass of Layer?

Yes, or making Layer a subclass of Module.

XJDKC commented on July 23, 2024

> > I think it's not an issue. When we use the computational graph, initialization operations won't be buffered since we just need to execute them once. For these operations, I just execute them immediately instead of buffering them into the graph at present. So before running the entire graph, all parameter tensors will be initialized.
>
> but you need to identify the operations for initialization. How do we tell if an operation is for initialization or not?

At present, if the operations are for initialization, they are likely to be in the constructor function. So I just turn on the graph before calling the __call__ function of Layer and the forward and backward functions of Operation. So all the initialization operations in the constructor will be executed immediately. For those initialization operations in the functions I mentioned above, I turn off the buffer before calling them.

chrishkchris commented on July 23, 2024

> > Constructor might not be enough, because in the case of RNN, CudnnRNNHandle.weights_size is inferred from the input size, so it can only be obtained in the forward function. FYI, this is Shicong's new code in his PR (I guess this issue is meant to solve this PR's problem):
> >
> > self.W = Tensor(shape=(self.handle.weights_size,),
>
> Sorry, I haven't checked the newest code. Handles in the past code, like ConvHandle, are initialized in the constructor, so I thought CudnnRNNHandle was the same.

@XJDKC FYI, even ConvHandle is not initialized in the constructor of the Layer but in the __call__ function; the RNN is different from conv2d only because there is tensor initialization in the __call__ function: https://github.com/apache/singa/blob/master/python/singa/autograd.py#L1758

chrishkchris commented on July 23, 2024

> > but you need to identify the operations for initialization. How do we tell if an operation is for initialization or not?
>
> At present, if the operations are for initialization, they are likely to be in the constructor function. So I just turn on the graph before calling the __call__ function of Layer and the forward and backward functions of Operation. So all the initialization operations in the constructor will be executed immediately. For those initialization operations in the functions I mentioned above, I turn off the buffer before calling them.

@XJDKC Yes, that's why the prof. suggested adding the wrapper:

> I think it would be better to define a new function to wrap these lines:
>
> def execute_once(fn):
>     get flag
>     disable graph
>     fn()
>     set flag

XJDKC commented on July 23, 2024

@chrishkchris Thanks for the correction, it is my fault. But the initialization of the handle won't submit operations to the Device, so it is not included in the graph either.

XJDKC commented on July 23, 2024

> > At present, if the operations are for initialization, they are likely to be in the constructor function. So I just turn on the graph before calling the __call__ function of Layer and the forward and backward functions of Operation. So all the initialization operations in the constructor will be executed immediately. For those initialization operations in the functions I mentioned above, I turn off the buffer before calling them.
>
> @XJDKC Yes, that's why the prof. suggested adding the wrapper:
>
> > I think it would be better to define a new function to wrap these lines:
> >
> > def execute_once(fn):
> >     get flag
> >     disable graph
> >     fn()
> >     set flag

Yes, we can use a decorator to achieve it.
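For instance, a minimal sketch of such a decorator, assuming the wrapped method's first tensor argument exposes a device attribute:

    def execute_once(fn):
        """Temporarily disable graph buffering so fn runs eagerly, once."""
        def wrapper(self, x, *args, **kwargs):
            flag = x.device.graph_enabled()  # remember the current state
            x.device.EnableGraph(False)      # execute immediately, no buffering
            ret = fn(self, x, *args, **kwargs)
            x.device.EnableGraph(flag)       # restore the previous state
            return ret
        return wrapper

    # usage (illustrative): decorate an initialization helper called
    # from forward, e.g.
    # @execute_once
    # def init_params(self, x): ...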

chrishkchris commented on July 23, 2024

> @chrishkchris Thanks for the correction, it is my fault. But the initialization of the handle won't submit operations to the Device, so it is not included in the graph either.

yes, the constructor of the handle doesn't contain any operations to submit to the Device, whether in conv2d or in rnn.

In Shicong's code, the only thing to modify is to turn off the buffering by wrapping the line below:

self.W = Tensor(shape=(self.handle.weights_size,),

In the long term (say, v4.0), it is possible that all the initialization will be moved to the __init__ function of the Layer, not the __call__ function.

XJDKC commented on July 23, 2024

Got it, thanks.

nudles commented on July 23, 2024

resolved in #697
