- I think I put int32 there because `_srng.binomial` would output int64, and I figured it needed to be 32-bit to be able to live on the GPU. Since floatX cannot be assumed to be 'float32' here, this is not really proper. I guess I figured using floatX as the target dtype wouldn't work, but if it does, we should just do that. Have you tested it?
- As far as I can remember, the main reason to use the symbolic shape in `get_output_for()` as much as possible is that we want to support the case where the batch size is undefined (i.e. `None`) at compile time. But I guess we could just check inside the layer whether the input shape is available and then use that, or fall back to the symbolic shape otherwise. I don't recall if there are any other reasons to do it this way. At any rate, I usually pay attention to using symbolic shapes in `get_output_for`, for the "unknown batch size" use case. If this has performance implications, we should probably add some more checks for this and use the shapes specified at compile time where possible.
from lasagne.
> I guess I figured using floatX as the target dtype wouldn't work, but if it does I see no reason not to do that. Have you tested it?
Yes, my own code used a nonsymbolic shape and `dtype=theano.config.floatX`; that's why I noticed at all. If you specify the dtype to be 'int64', it will compute a uniform distribution in floatX, threshold it, and cast the result to 'int64'. If you specify the dtype to be floatX, it will compute the uniform distribution in floatX, threshold it, and cast the result to floatX, which is what we want.
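As a rough illustration of that behavior (in plain numpy rather than Theano, with `floatX` hard-coded to 'float32' for the example — `binomial_mask` is a hypothetical stand-in, not the actual sampler):

```python
import numpy as np

floatX = 'float32'  # stand-in for theano.config.floatX

def binomial_mask(shape, p=0.5, dtype=floatX, rng=None):
    # What the sampling effectively does: draw uniforms in floatX,
    # threshold at p, then cast the 0/1 result to the requested dtype.
    rng = rng if rng is not None else np.random.RandomState(0)
    uniform = rng.uniform(size=shape).astype(floatX)
    return (uniform < p).astype(dtype)

# dtype='int64' forces an extra cast away from floatX;
# dtype=floatX keeps the mask in the dtype we actually want.
mask_int = binomial_mask((4, 3), dtype='int64')
mask_float = binomial_mask((4, 3), dtype=floatX)
```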
> If this has performance implications we should probably add some more checks for this and use the shapes specified at compile time where possible.
I'm not sure if the performance difference would be notable in practice, but at least the uncomfortable `UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256` disappears if you specify a fixed shape at compile time.
Yeah, I guess it would be nice to get rid of that warning :) So feel free to modify it to use the compile-time shape, although it would still be good to keep the "unknown batch size" use case in mind, so then we definitely need to check for it and fall back to the symbolic shape if necessary.
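A minimal sketch of that check (a hypothetical helper, not an existing nntools API): prefer the compile-time shape when it is fully specified, and fall back to the symbolic shape otherwise. The string `'input.shape'` stands in for the symbolic shape expression here:

```python
def choose_mask_shape(compile_time_shape, symbolic_shape):
    # Prefer the fixed shape known at compile time; if it is missing or
    # any dimension (e.g. the batch size) is None, use the symbolic shape.
    if compile_time_shape is not None and \
            all(dim is not None for dim in compile_time_shape):
        return compile_time_shape
    return symbolic_shape

choose_mask_shape((128, 100), 'input.shape')   # -> (128, 100)
choose_mask_shape((None, 100), 'input.shape')  # -> 'input.shape'
```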
Related (maybe) issue I have been debugging today: it seems like there's a memory leak when using `DropoutLayer`. Here's an example:
```python
import numpy as np
import theano
import theano.tensor as T
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_hidden1_dropout = nntools.layers.DropoutLayer(l_hidden1, p=0.5)
    l_out = nntools.layers.DenseLayer(l_hidden1_dropout, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
yields
```
Call 0, before: 3108216832
Call 0, after: 3108216832
Call 1, before: 3108216832
Call 1, after: 3107168256
Call 2, before: 3107168256
Call 2, after: 3107168256
Call 3, before: 3107168256
Call 3, after: 3106119680
```
If I remove the dropout layer:
```python
import numpy as np
import theano
import theano.tensor as T
import nntools

INPUT_DIM = 4
BATCH_SIZE = 10

def build_net(x_in, num_hidden_units=5, output_dim=2):
    l_in = nntools.layers.InputLayer(shape=x_in.shape)
    l_hidden1 = nntools.layers.DenseLayer(l_in, num_units=num_hidden_units)
    l_out = nntools.layers.DenseLayer(l_hidden1, num_units=output_dim)
    net_in = T.matrix()
    output = theano.function([net_in], l_out.get_output(net_in))
    return output(x_in)

x = np.random.randn(BATCH_SIZE, INPUT_DIM).astype(theano.config.floatX)
for n in xrange(4):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net(x)
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
it yields
```
Call 0, before: 3106119680
Call 0, after: 3106119680
Call 1, before: 3106119680
Call 1, after: 3106119680
Call 2, before: 3106119680
Call 2, after: 3106119680
Call 3, before: 3106119680
Call 3, after: 3106119680
```
The amount of memory leaked is of course larger when the network is bigger. Not sure if this is a Theano bug or an nntools bug, but it came up when I was doing a large hyperparameter search (building and training lots of networks sequentially in a for loop), and the memory leaks accumulated over time until I ran out of memory on my GPU.
@craffel Does it still happen with the new version without the superfluous cast? Also, what Theano version are you using? Maybe they've fixed this in the latest version from git?
Yeah, sorry, should have specified. Using the latest nntools (including the removal of the superfluous cast), and using the latest Theano from github (well, latest as of a few hours ago).
Interesting! That seems more likely to be a Theano problem than an nntools problem though. Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, that could make a difference. But then there's still a bug somewhere.
Yeah, the use of the symbolic shape is my only guess in terms of it being an nntools issue. If it turns out not to be that, I can try to distill it into pure Theano to create a bug report over there.
Even if that fixes it we should probably send a bug report, because it should really just work as it is.
> Maybe if Jan also makes the change to use the compile-time shape instead of the symbolic shape, [...]
I think we probably shouldn't do that unconditionally. Let's instead decide on how to handle switching between the compile-time shape and the runtime shape in the DropoutLayer, the convolutional layers, and anything else that may need it. As I said, the convolutional layers currently have an extra `input_shape` parameter in `get_output_for` -- should we do the same in the dropout layer to allow overriding the shape it would use?
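To make the proposal concrete, here is a rough numpy sketch (not the actual nntools implementation; `DropoutSketch` and its `input_shape` override argument are hypothetical, mirroring the idea under discussion):

```python
import numpy as np

class DropoutSketch(object):
    def __init__(self, compile_time_shape, p=0.5):
        self.compile_time_shape = compile_time_shape
        self.p = p  # probability of dropping a unit

    def get_output_for(self, input, input_shape=None, rng=None):
        # Precedence: explicit override > compile-time shape > runtime shape.
        shape = input_shape if input_shape is not None \
            else self.compile_time_shape
        if shape is None or any(dim is None for dim in shape):
            shape = input.shape  # would be the symbolic shape in Theano
        rng = rng if rng is not None else np.random.RandomState(0)
        mask = rng.uniform(size=shape) < (1.0 - self.p)
        # Rescale so the expected activation is unchanged at train time.
        return input * mask / (1.0 - self.p)

layer = DropoutSketch((None, 4), p=0.5)  # batch size unknown at "compile" time
out = layer.get_output_for(np.ones((10, 4), dtype='float32'))
```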
> [...] that could make a difference.
@craffel: To check for that, just replace `input.shape` by `self.input_layer.get_output_shape()` and try! You could also try `THEANO_FLAGS=allow_gc=1` in case you disabled it.
But in any case, as Sander said, we should write a test case that doesn't use nntools and file a Theano issue.
Right, sorry, should have specified that I have `allow_gc=True` and `linker=cvm` (you can also disable garbage collection via the linker config setting). And, just confirming that changing `DropoutLayer` to use `self.input_layer.get_output_shape()` doesn't fix the memory leak.
OK, do you still observe the leak when you remove all the layers except for the dropout layer? If so, you can copy out the implementation of the dropout layer and have a somewhat minimal example that does not use nntools. If you create an Issue for Theano, Frédéric will probably be able to track down the problem.
I was curious and can confirm the leak. I've been able to minimize it to:
```python
import theano
import theano.tensor as T
import nntools

def build_net():
    l_in = nntools.layers.InputLayer(shape=(10, 4))
    l_out = nntools.layers.dropout(l_in, p=0.5)
    # Merely compiling the function triggers the leak;
    # it is never called.
    fn = theano.function([l_in.input_var], l_out.get_output())

for n in xrange(10):
    print 'Call {}, before: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
    build_net()
    print 'Call {}, after: {}'.format(
        n, theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info()[0])
```
That is, sheer compilation triggers the leak; the function does not need to be run at all.
Reported as Theano/Theano#2335.
I will close this as the spurious cast has been removed, and the other is being taken care of at the Theano level.
The second point of my initial post is still open, though -- we should use a nonsymbolic shape for the dropout mask when possible. I'm not sure what "when possible" should mean, though: "when there is no `None` in the input layer's output shape"? "When the user did not say otherwise in the constructor"? "When the user did not specify a shape in `get_output_for`"?
Good point, I missed that. I guess we can use it whenever the necessary shape info is specified? I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
> I think it's relatively safe to assume that if a shape is specified, the layer only has to deal with inputs of that shape.
But isn't the shape always specified? Won't `self.input_layer.get_output_shape()` always return a fully-specified nonsymbolic shape? Is there a way to create an `InputLayer` with a partially-defined shape?
Well, I figured we want to support at least the case of a variable batch size, so it would be okay to specify `None` for the batch size. I don't know how well this is supported at the moment (the convolution layers might fail, I guess).
For the record (search engines): for those like me who (wrongly) set a fixed batch size in the input layer but were passing batches of variable size into the network, this change in `DropoutLayer` may trigger an error like `ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[0] == 128, but the output's size on that axis is <SOME_NUMBER>.` The right thing to do is to set the batch size to `None` when specifying the `shape` of your input layers; this is what's referred to in the previous comment.
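The failure mode is easy to reproduce in plain numpy (an analogy, not the actual GPU op): a dropout mask baked with a fixed batch size cannot be multiplied elementwise with a batch of a different size:

```python
import numpy as np

# Mask shaped for a fixed batch size of 128, as if baked in at compile time.
fixed_mask = np.random.RandomState(0).binomial(
    1, 0.5, size=(128, 4)).astype('float32')
smaller_batch = np.ones((64, 4), dtype='float32')

mismatch = False
try:
    smaller_batch * fixed_mask  # shapes (64, 4) vs (128, 4) cannot broadcast
except ValueError:
    mismatch = True  # same class of error as the GpuElemwise mis-match above
```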
@benanne I tested both `layers.Conv2DCCLayer` and `layers.cuda_convnet.Conv2DCCLayer`, and both seem to be okay with a variable batch size indicated by `None`.
@dnouri do you mean `layers.Conv2DLayer`? Because those two are the same now :) The cuda-convnet wrappers don't require any shape info, so I'm sure they're fine. The regular `Conv2DLayer` might or might not be (it shouldn't be too hard to fix that anyway). We'll need to add tests for this use case, because it seems easy to forget about it.
@benanne Yes, I meant `layers.Conv2DLayer`; it just works. Haven't tried with `dnn_conv` or `GpuCorrMM` yet; these are the defaults in newer versions of Theano.