- I think I put int32 there because `_srng.binomial` would output int64, and I figured it needed to be 32-bit to be able to live on the GPU. Since floatX cannot be assumed to be 'float32' here, this is not really proper. I guess I figured using floatX as the target dtype wouldn't work, but if it does, we should just do that. Have you tested it?
- As far as I can remember, the main reason to use the symbolic shape in `get_output_for()` as much as possible is that we want to support the case where the batch size is undefined (i.e. `None`) at compile time. But I guess we could just check inside the layer whether the input shape is available and then use that, or fall back to the symbolic shape otherwise. I don't recall if there are any other reasons to do it this way. At any rate, I usually pay attention to using symbolic shapes in `get_output_for`, for the "unknown batch size" use case. If this has performance implications, we should probably add some more checks for this and use the shapes specified at compile time where possible.
from lasagne.
> I guess I figured using floatX as the target dtype wouldn't work, but if it does I see no reason not to do that. Have you tested it?
Yes, my own code used a nonsymbolic shape and `dtype=theano.config.floatX`; that's why I noticed at all. If you specify the dtype to be 'int64', it will compute a uniform distribution in floatX, threshold it, and cast the result to 'int64'. If you specify the dtype to be floatX, it will compute the uniform distribution in floatX, threshold it, and cast the result to floatX, which is what we want.
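As a rough illustration of that behavior (in plain numpy rather than Theano, with `floatX` hard-coded to 'float32' for the example — `binomial_mask` is a hypothetical stand-in, not the actual sampler):

```python
import numpy as np

floatX = 'float32'  # stand-in for theano.config.floatX

def binomial_mask(shape, p=0.5, dtype=floatX, rng=None):
    # What the sampling effectively does: draw uniforms in floatX,
    # threshold at p, then cast the 0/1 result to the requested dtype.
    rng = rng if rng is not None else np.random.RandomState(0)
    uniform = rng.uniform(size=shape).astype(floatX)
    return (uniform < p).astype(dtype)

# dtype='int64' forces an extra cast away from floatX;
# dtype=floatX keeps the mask in the dtype we actually want.
mask_int = binomial_mask((4, 3), dtype='int64')
mask_float = binomial_mask((4, 3), dtype=floatX)
```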
> If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.
I'm not sure if the performance difference would be notable in practice, but at least the uncomfortable `UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256` disappears if you specify a fixed shape at compile time.
Yeah, I guess it would be nice to get rid of that warning :) So feel free to modify it to use the compile-time shape, although it would still be good to keep the "unknown batch size" use case in mind, so then we definitely need to check for it and fall back to the symbolic shape if necessary.
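A minimal sketch of that check (a hypothetical helper, not an existing nntools API): prefer the compile-time shape when it is fully specified, and fall back to the symbolic shape otherwise. The string `'input.shape'` stands in for the symbolic shape expression here:

```python
def choose_mask_shape(compile_time_shape, symbolic_shape):
    # Prefer the fixed shape known at compile time; if it is missing or
    # any dimension (e.g. the batch size) is None, use the symbolic shape.
    if compile_time_shape is not None and \
            all(dim is not None for dim in compile_time_shape):
        return compile_time_shape
    return symbolic_shape

choose_mask_shape((128, 100), 'input.shape')   # -> (128, 100)
choose_mask_shape((None, 100), 'input.shape')  # -> 'input.shape'
```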
Related (maybe) issue I have been debugging today: it seems like there's a memory leak when using `DropoutLayer`. Here's an example:
```python
import numpy as np
import theano
import theano.tensor as T
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_hidden1_dropout = nntools.layers.DropoutLayer(l_hidden1, p=0.5)
    l_out = nntools.layers.DenseLayer(l_hidden1_dropout, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
yields
```
Call 0, before: 3108216832
Call 0, after: 3108216832
Call 1, before: 3108216832
Call 1, after: 3107168256
Call 2, before: 3107168256
Call 2, after: 3107168256
Call 3, before: 3107168256
Call 3, after: 3106119680
```
If I remove the dropout layer:
```python
import numpy as np
import theano
import theano.tensor as T
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_out = nntools.layers.DenseLayer(l_hidden1, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
it yields
```
Call 0, before: 3106119680
Call 0, after: 3106119680
Call 1, before: 3106119680
Call 1, after: 3106119680
Call 2, before: 3106119680
Call 2, after: 3106119680
Call 3, before: 3106119680
Call 3, after: 3106119680
```
The amount of memory leaked is of course larger when the network is bigger. Not sure if this is a Theano bug or an nntools bug, but it came up when I was doing a large hyperparameter search (building and training lots of networks sequentially in a for loop), and the memory leaks accumulated over time until I ran out of memory on my GPU.
@craffel Does it still happen with the new version without the superfluous cast? Also, what Theano version are you using? Maybe they've fixed this in the latest version from git?
Yeah, sorry, should have specified. Using the latest nntools (including the removal of the superfluous cast), and using the latest Theano from github (well, latest as of a few hours ago).
Interesting! That seems more likely to be a Theano problem than an nntools problem though. Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, that could make a difference. But then there's still a bug somewhere.
Yeah, the use of the symbolic shape is my only guess in terms of it being an nntools issue. If it turns out not to be that, I can try to distill it into pure Theano to create a bug report over there.
Even if that fixes it we should probably send a bug report, because it should really just work as it is.
> Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, [...]
I think we probably shouldn't do that unconditionally. Let's instead decide on how to handle switching between the compile-time shape and the runtime shape in the DropoutLayer, the convolutional layers, and anything else that may need it. As I said, the convolutional layers currently have an extra `input_shape` parameter in `get_output_for` -- should we do the same in the dropout layer to allow overriding the shape it would use?
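To make the proposal concrete, here is a rough numpy sketch (not the actual nntools implementation; `DropoutSketch` and its `input_shape` override argument are hypothetical, mirroring the idea under discussion):

```python
import numpy as np

class DropoutSketch(object):
    def __init__(self, compile_time_shape, p=0.5):
        self.compile_time_shape = compile_time_shape
        self.p = p  # probability of dropping a unit

    def get_output_for(self, input, input_shape=None, rng=None):
        # Precedence: explicit override > compile-time shape > runtime shape.
        shape = input_shape if input_shape is not None \
            else self.compile_time_shape
        if shape is None or any(dim is None for dim in shape):
            shape = input.shape  # would be the symbolic shape in Theano
        rng = rng if rng is not None else np.random.RandomState(0)
        mask = rng.uniform(size=shape) < (1.0 - self.p)
        # Rescale so the expected activation is unchanged at train time.
        return input * mask / (1.0 - self.p)

layer = DropoutSketch((None, 4), p=0.5)  # batch size unknown at "compile" time
out = layer.get_output_for(np.ones((10, 4), dtype='float32'))
```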
> [...] that could make a difference.
@craffel: To check for that, just replace `input.shape` by `self.input_layer.get_output_shape()` and try! You could also try `THEANO_FLAGS=allow_gc=1` in case you disabled it.
But in any case, as Sander said, we should write a test case that doesn't use nntools and file a Theano issue.
Right, sorry, should have specified that I have `allow_gc=True` and `linker=cvm` (you can also disable garbage collection via the linker config setting). And, just confirming that changing `DropoutLayer` to use `self.input_layer.get_output_shape()` doesn't fix the memory leak.
OK, do you still observe the leak when you remove all the layers except for the dropout layer? If so, you can copy out the implementation of the dropout layer and have a somewhat minimal example that does not use nntools. If you create an Issue for Theano, Frédéric will probably be able to track down the problem.
I was curious and can confirm the leak. I've been able to minimize it to:
```python
import theano
import theano.tensor as T
import nntools

def build_net():
    l_in = nntools.layers.InputLayer(shape=(10, 4))
    l_out = nntools.layers.dropout(l_in, p=0.5)
    # Merely compiling the function triggers the leak;
    # it is never called.
    fn = theano.function([l_in.input_var], l_out.get_output())

for n in xrange(10):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net()
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
That is, sheer compilation triggers the leak; the function does not need to be run at all.
Reported as Theano/Theano#2335.
I will close this as the spurious cast has been removed, and the other is being taken care of at the Theano level.
The second point of my initial post is still open, though -- we should use a nonsymbolic shape for the dropout mask when possible. I'm not sure what "when possible" should mean, though: "when there is no `None` in the input layer's output shape"? "When the user did not say otherwise in the constructor"? "When the user did not specify a shape in `get_output_for`"?
Good point, I missed that. I guess we can use it whenever the necessary shape info is specified? I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
> I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
But isn't the shape always specified? Won't `self.input_layer.get_output_shape()` always return a fully-specified nonsymbolic shape? Is there a way to create an `InputLayer` with a partially-defined shape?
Well, I figured we want to support at least the case of a variable batch size, so it would be okay to specify `None` for the batch size. I don't know how well this is supported at the moment (the convolution layers might fail, I guess).
For the record (search engines): for those like me who (wrongly) set a fixed batch size in the input layer but were passing batches of variable size into the network, this change in `DropoutLayer` may trigger an error like `ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 128, but the output's size on that axis is <SOME_NUMBER>.` The right thing to do is to set the batch size to `None` when specifying the `shape` of your input layers; this is what's referred to in the previous comment.
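The failure mode is easy to reproduce in plain numpy (an analogy, not the actual GPU op): a dropout mask baked with a fixed batch size cannot be multiplied elementwise with a batch of a different size:

```python
import numpy as np

# Mask shaped for a fixed batch size of 128, as if baked in at compile time.
fixed_mask = np.random.RandomState(0).binomial(
    1, 0.5, size=(128, 4)).astype('float32')
smaller_batch = np.ones((64, 4), dtype='float32')

mismatch = False
try:
    smaller_batch * fixed_mask  # shapes (64, 4) vs (128, 4) cannot broadcast
except ValueError:
    mismatch = True  # same class of error as the GpuElemwise mis-match above
```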
@benanne I tested both `layers.Conv2DCCLayer` and `layers.cuda_convnet.Conv2DCCLayer`, and both seem to be okay with a variable batch size indicated by `None`.
@dnouri do you mean `layers.Conv2DLayer`? Because those two are the same now :) The cuda-convnet wrappers don't require any shape info, so I'm sure they're fine. The regular `Conv2DLayer` might or might not be (it shouldn't be too hard to fix that anyway). We'll need to add tests for this use case, because it seems easy to forget about it.
@benanne Yes, I meant `layers.Conv2DLayer`; it just works. Haven't tried with `dnn_conv` or `GpuCorrMM` yet; these are the defaults in newer versions of Theano.