torch / cunn
Hi,
Can someone explain to me why this fails:
require 'cunn'
mytester = torch.Tester()
tests = {}
function tests.test()
   local input = torch.randn(3,32,32)
   local cnn = nn.Sequential()
   cnn:add(nn.SpatialConvolution(3,8,5,5))
   cnn:add(nn.ReLU())
   cnn:add(nn.SpatialAveragePooling(2,2,2,2))
   cnn:add(nn.SpatialConvolution(8,12,5,5))
   cnn:add(nn.ReLU())
   cnn:add(nn.SpatialAveragePooling(2,2,2,2))
   local outsize = cnn:forward(input):size()
   cnn:add(nn.Reshape(outsize[1]*outsize[2]*outsize[3]))
   cnn:add(nn.Linear(outsize[1]*outsize[2]*outsize[3],20))
   cnn:add(nn.ReLU())
   cnn:add(nn.Linear(20,10))
   local output = cnn:forward(input):clone()
   local gradOutput = output:clone()
   local gradInput = cnn:backward(input, gradOutput):clone()
   cnn:float()
   local input3 = input:float()
   local output3 = cnn:forward(input3):clone()
   local gradOutput3 = output3:clone()
   local gradInput3 = cnn:backward(input3, gradOutput3):clone()
   mytester:assertTensorEq(output3:float(), output:float(), 0.000001, "type float fwd err")
   mytester:assertTensorEq(gradInput3:float(), gradInput:float(), 0.00001, "type float bwd err")
   cnn:cuda()
   local input2 = input3:cuda()
   local gradOutput2 = gradOutput3:cuda()
   local output2 = cnn:forward(input2)
   local gradInput2 = cnn:backward(input2, gradOutput2)
   print(gradInput2[1][1], gradInput[1][1])
   mytester:assertTensorEq(output2:float(), output3, 0.000001, "type cuda fwd err")
   mytester:assertTensorEq(gradInput2:float(), gradInput3, 0.00001, "type cuda bwd err")
end
mytester:add(tests)
mytester:run()
Yet when I comment out the SpatialConvolution lines, it passes. What explains the difference in behavior?
Hi,
does anyone have a VolumetricConvolutionCuda module?
thanks
Getting unstable results with nn.testcuda(). Sometimes passes, sometimes fails, sometimes segfaults. Ran tests due to cpu->gpu results discrepancy for identical scripts and data.
Running Macbook Pro Retina 10,1 (mid 2012).
Yosemite 10.10.1, CUDA 6.5 - latest drivers and libs as of 12/1/14.
Re-installed today - as part of ongoing effort to solve cpu/gpu discrepancies - described at end.
Latest Torch7 install - using '2 line' scripts from Torch.ch. Used Clang 6.0 as CUDA 6.5 is incompatible with gcc49. Am not clear how scripts deal with libstdc++ issues.
Ran dependencies script as normal admin user.
Ran luajit-torch script using sudo -s.
This fails to build a loadable cunn properly. Local build fix did not work, due to cmake 3.0.2 changes in rpath handling. Edited FindCUDA.cmake as recommended - produced loadable libcunn.so.
Attached terminal session shows a common failure mode. Repeated testing shows passing, passing with significant delays, and failures ranging from failing a single test, to segfault, to out of memory.
Some background: have been struggling for 2-3 weeks trying to get cpu and gpu results to match. Have reinstalled all Torch components as well as CUDA numerous times. Did experiments with setting manualSeed(). Found that each platform produced repeatable results, but none of them matched. This is cpu and gpu on OSX, Ubuntu 14.04, and CentOS 6.6. Timing differences with and without gpu are also inconsistent. Feels to me that this could be some kind of an install issue - but after having built the environment from scratch numerous times, am in the dark as to what it might be.
Following https://groups.google.com/forum/#!topic/torch7/cw2hetc_YjQ , there seems to be an inconsistency between the nn and cunn versions of ClassNLLCriterion when the target has 2 dimensions (or more).
nn simply throws an error, while cunn doesn't. Reading the source code of the CUDA version, I have the impression that if the target is 2D, it must be all zeros except at the corresponding target position, which should hold the index. It's a bit confusing to explain, so maybe an example is easier:
require 'cunn'
m = nn.ClassNLLCriterion()
a = torch.rand(2,3)
t1 = torch.Tensor({2,1})
t2 = torch.zeros(2,3); t2[1][2] = 2; t2[2][1] = 1;
t3 = torch.zeros(2,3); t3[1][1] = 2; t3[2][1] = 1;
-- t2 and t3 are similar to a one-hot encoding, but each
-- non-zero element should hold the proper target index,
-- in whatever position it wants
print(m:forward(a,t1))
print(m:forward(a,t2)) -- error in nn
-- cuda
m:cuda()
a = a:cuda(); t1 = t1:cuda(); t2= t2:cuda(); t3 = t3:cuda()
print(m:forward(a,t1))
print(m:forward(a,t2)) -- works in cunn, exactly as the previous
print(m:forward(a,t3)) -- works in cunn, exactly as the previous
They all produce the same result whenever they work.
Is there a reason/logic behind this implementation? Is it to support multi-target multi-class learning?
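To make the described 2D target convention concrete, here is a minimal sketch in plain Lua (tables stand in for tensors). It assumes the poster's reading of the CUDA code is right, which the thread does not confirm: each row is all zeros except for one entry that holds the class index. The helper name flattenTargets is hypothetical and not part of nn or cunn.

```lua
-- Hypothetical helper: collapse a 2D target (rows of zeros with a single
-- class index somewhere in the row) into a 1D list of class indices.
local function flattenTargets(target2d)
  local flat = {}
  for i, row in ipairs(target2d) do
    local idx = 0
    for _, v in ipairs(row) do
      if v > idx then idx = v end
    end
    assert(idx > 0, "row " .. i .. " has no class index")
    flat[i] = idx
  end
  return flat
end

-- Same contents as t2 and t3 in the example above.
local t2 = { {0, 2, 0}, {1, 0, 0} }
local t3 = { {2, 0, 0}, {1, 0, 0} }
print(table.concat(flattenTargets(t2), ","))
print(table.concat(flattenTargets(t3), ","))
```

Both calls reduce to the same indices as t1, which would be consistent with all three targets giving the same loss on the GPU. With real tensors the same reduction would presumably be a row-wise max before calling the criterion.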
The convolution network modules part:
3 :
{
padding : 3
kW : 7
nInputPlane : 8
gradBias : CudaTensor - size: 8
dW : 1
gradWeight : CudaTensor - size: 8x392
output : CudaTensor - size: 1x8x61x61
fgradInput : CudaTensor - size: 61x61
finput : CudaTensor - size: 392x3721
bias : CudaTensor - size: 8
weight : CudaTensor - size: 8x392
nOutputPlane : 8
gradInput : CudaTensor - empty
kH : 7
dH : 1
}
4 :
{
padding : 2
kW : 7
nInputPlane : 8
gradBias : CudaTensor - size: 12
dW : 1
gradWeight : CudaTensor - size: 12x392
output : CudaTensor - size: 1x12x59x59
fgradInput : CudaTensor - size: 59x59
finput : CudaTensor - size: 392x3481
bias : CudaTensor - size: 12
weight : CudaTensor - size: 12x392
nOutputPlane : 12
gradInput : CudaTensor - empty
kH : 7
dH : 1
}
As listed above, the output size of layer 3 is 8x61x61.
Layer 4 is defined as nn.SpatialConvolutionMM(8,12,7,7,1,1,3),
from which I expected an output size of
(61+3-7)/1+1 = 58;
however, the module info shows that the output is 12x59x59.
Can anyone help me figure this out?
Thanks~
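The usual output-size rule for a padded, strided convolution can be checked with a few lines of plain Lua. This is a sketch, assuming the padding is applied on both sides of each dimension, so it enters the formula as 2*pad rather than pad. The helper name convOutputSize is illustrative and not part of nn or cunn.

```lua
-- Hypothetical helper (not part of nn/cunn): output width or height of a
-- convolution, assuming `pad` zeros are added on BOTH sides of the input.
local function convOutputSize(inSize, kernel, stride, pad)
  return math.floor((inSize + 2 * pad - kernel) / stride) + 1
end

print(convOutputSize(61, 7, 1, 3)) -- padding 3, as in the constructor call
print(convOutputSize(61, 7, 1, 2)) -- padding 2, as in the dump of layer 4
```

Under that assumption a padding of 2 gives 59, which matches the dump (the dump of layer 4 also lists padding : 2), while a padding of 3 would give 61. It may be worth checking which padding value the module actually holds.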
Installation of cunn on Mac OS X (Yosemite) fails with the following error:
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Cloning into 'cunn'...
remote: Counting objects: 50, done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 50 (delta 19), reused 35 (delta 12)
Receiving objects: 100% (50/50), 75.57 KiB | 0 bytes/s, done.
Resolving deltas: 100% (19/19), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/usr/local/bin/.." -DCMAKE_INSTALL_PREFIX="/usr/local/lib/luarocks/rocks/cunn/scm-1" && make
-- The C compiler identification is GNU 4.9.1
-- The CXX compiler identification is GNU 4.9.1
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-4.9
-- Check for working C compiler: /usr/local/bin/gcc-4.9 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-4.9
-- Check for working CXX compiler: /usr/local/bin/g++-4.9 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /usr/local
-- Found CUDA: /Developer/NVIDIA/CUDA-6.5 (Required is at least version "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cunn-scm-1-6051/cunn/build
[100%] Building NVCC (Device) object CMakeFiles/cunn.dir//./cunn_generated_init.cu.o
Scanning dependencies of target cunn
Linking CXX shared module libcunn.so
[100%] Built target cunn
cd build && make install
[100%] Built target cunn
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/luarocks/rocks/cunn/scm-1/lib/libcunn.so
/opt/local/bin/install_name_tool: object: /usr/local/lib/luarocks/rocks/cunn/scm-1/lib/libcunn.so malformed object (load command 23 cmdsize is zero)
/opt/local/bin/install_name_tool: object: /usr/local/lib/luarocks/rocks/cunn/scm-1/lib/libcunn.so malformed object (load command 23 cmdsize is zero)
-- Installing: /usr/local/lib/luarocks/rocks/cunn/scm-1/lua/cunn/init.lua
-- Installing: /usr/local/lib/luarocks/rocks/cunn/scm-1/lua/cunn/test.lua
Updating manifest for /usr/local/lib/luarocks/rocks
cunn scm-1 is now built and installed in /usr/local/ (license: BSD)
Using cunn also fails with a similar error. Another issue (torch/cutorch#66) reported the same problem, but it was closed without any resolution or guidance on how to resolve it.
I managed to install cutorch and cunn with OS X 10.9.4, CUDA 6.5 and cmake 3.0.1, with the small changes described in ticket 27 of the cutorch repository.
However, I now have some problems running the forward method when my neural network includes a nn.Tanh layer. It works with a nn.Linear layer, but if I add the hyperbolic tangent I then get a C++ Exception.
Is there a way to see more details about the root cause of the exception? In the "th" interactive tool I don't get further details, just a "C++ Exception" error message.
Did this happen to anyone else?
Thanks
[joonazan@arkkikaari char-rnn]$ luarocks install cunn
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Cloning into 'cunn'...
remote: Counting objects: 49, done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 49 (delta 15), reused 21 (delta 7), pack-reused 0
Receiving objects: 100% (49/49), 56.13 KiB | 0 bytes/s, done.
Resolving deltas: 100% (15/15), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/joonazan/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/joonazan/torch/install/lib/luarocks/rocks/cunn/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
-- The C compiler identification is GNU 5.1.0
-- The CXX compiler identification is GNU 5.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /home/joonazan/torch/install
-- Found CUDA: /opt/cuda (found suitable version "7.0", minimum required is "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cunn-scm-1-2819/cunn/build
[100%] Building NVCC (Device) object CMakeFiles/cunn.dir/cunn_generated_init.cu.o
In file included from /home/joonazan/torch/install/include/THC/THCApply.cuh:5:0,
from /tmp/luarocks_cunn-scm-1-2819/cunn/HardTanh.cu:2,
from /tmp/luarocks_cunn-scm-1-2819/cunn/init.cu:14:
/home/joonazan/torch/install/include/THC/THCReduceApplyUtils.cuh:9:30: fatal error: THCDeviceUtils.cuh: No such file or directory
compilation terminated.
CMake Error at cunn_generated_init.cu.o.cmake:206 (message):
Error generating
/tmp/luarocks_cunn-scm-1-2819/cunn/build/CMakeFiles/cunn.dir//./cunn_generated_init.cu.o
CMakeFiles/cunn.dir/build.make:55: recipe for target 'CMakeFiles/cunn.dir/cunn_generated_init.cu.o' failed
make[2]: *** [CMakeFiles/cunn.dir/cunn_generated_init.cu.o] Error 1
CMakeFiles/Makefile2:60: recipe for target 'CMakeFiles/cunn.dir/all' failed
make[1]: *** [CMakeFiles/cunn.dir/all] Error 2
Makefile:116: recipe for target 'all' failed
make: *** [all] Error 2
Error: Build error: Failed building.
Hi guys,
In the "koraykv/unsup" package there is a conv-psd autoencoder. I was trying to do something similar using cunn, but the SpatialConvolutionCUDA module only allows a number of output feature maps that is a multiple of 16, so I wasn't able to build a decoder. Does anybody know how to work around this?
Thanks a lot
I'm playing with float and cuda networks, and now I get this:
th> model.modules[1].padding = 0
[0.0001s]
th> model.modules[1].padding
0
[0.0001s]
th> model.modules[1]:forward(torch.CudaTensor(3,224,244))
[string "_RESULT={model.modules[1]:forward(torch.CudaT..."]:1: bad argument #1 (field padding does not exist)
stack traceback:
[C]: in function 'forward'
[string "_RESULT={model.modules[1]:forward(torch.CudaT..."]:1: in main chunk
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/trepl/init.lua:630: in function 'repl'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x00406260
[0.0008s]
th> model.modules[1].padding
[0.0000s]
th>
How can I get around this?
cmake should be able to determine the compute capability of the card at compile time. We should use this to set the nvcc flags, rather than using sm_20 by default.
Will fix #144
Hi guys,
I noticed there is no cunn implementation of this criterion.
Do you plan to implement that any time soon?
We could try implementing it ourselves otherwise or just keep the criterion on CPU.
Best,
D.
I have encountered nondeterministic behavior in the backpropagation step of SpatialMaxPooling when kW~=dW or kH~=dH (i.e. when the atomicmaxgradinput kernel is run). I suspect the issue is caused by atomicAdd() and the general non-associativity of floating-point addition. Thus, I'm fairly sure that this is a feature of parallelism, but I wanted to make you aware of it (it's scary)...
The following code reproduces it on my machine (GTX Titan, CUDA 346.46) when the GPU is under some load. If I run the computation on the CPU with FloatTensors, it's deterministic.
local inp = torch.CudaTensor(1,18,18):zero()
inp[1][3][3] = 1 --will force adding up 3 numbers
fw = torch.CudaTensor(1,8,8):fill(0)
fw[1][2][1] = -0.00055786536540836
fw[1][2][2] = 0.00075417151674628
fw[1][1][2] = -0.00029314149287529
local model = nn.SpatialMaxPooling(3, 3, 2, 2):cuda()
model:forward(inp)
local bw = model:backward(inp, fw):clone()
for i=1,100 do
   local diff = bw - model:backward(inp, fw)
   print(diff:sum()) --sometimes, the difference is not zero!
end
Note further that
local a = (-0.00055786536540836 + 0.00075417151674628) -0.00029314149287529
local b = 0.00075417151674628 + (-0.00029314149287529 -0.00055786536540836)
print(a-b) -- not zero
On a slightly related note, is there any reason for calling atomicmaxgradinput instead of maxgradinput on SpatialMaxPooling.cu:320?
The nn version works
model = nn.SoftMax()
x = torch.randn(1, 2, 65536, 1)
y = model:forward(x)
however the cunn version fails
model = model:cuda()
x = x:cuda()
y = model:forward(x)
with error
nngraph/gmodule.lua:281: invalid configuration argument at [...]/torch/extra/cunn/SoftMax.cu:153
This error does not occur when the image contains 65535 elements.
I encountered this issue while working on a word-embedding extraction project. If we look up an embedding at an out-of-range index while running on the CPU, an exception is thrown. When running on the GPU, however, garbage values are returned. See the code below:
On CPU:
ll = nn.LookupTable(5,6)
ll:forward(torch.Tensor(1):fill(100))
result:
index out of range
On GPU:
ll = nn.LookupTable(5,6):cuda()
ll:forward(torch.Tensor(1):fill(100))
result:
nan -2417556347208890920730624.000000 nan 41205700250658325873491584824761122816.000000 nan -0.000000
[torch.CudaTensor of size 1x6]
Any insights about what this value might mean?
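Until the GPU path validates indices itself, one option is to check them on the caller's side before the lookup. The following is a minimal sketch in plain Lua, with a table standing in for the index tensor. The helper name checkIndices is hypothetical and not part of nn or cunn.

```lua
-- Hypothetical guard: verify that every index is an integer in [1, nIndex]
-- before it is handed to a lookup table.
local function checkIndices(indices, nIndex)
  for pos, idx in ipairs(indices) do
    if idx < 1 or idx > nIndex or idx ~= math.floor(idx) then
      return false, ("index %s at position %d is outside [1, %d]"):format(
        tostring(idx), pos, nIndex)
    end
  end
  return true
end

print(checkIndices({1, 5, 3}, 5)) -- all indices valid for a 5-row table
print(checkIndices({100}, 5))     -- same case as the example above
```

The second call reports the offending index instead of letting the lookup read past the end of the weight matrix, which is a plausible source of the nan and huge values shown above.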
Hi,
There seems to be a small bug in the cuda version of nn.PReLU. Basically the batch and data axes seem to be confused. CPU version works well. Here is an isolated example:
th> prelu = nn.PReLU(3)
th> prelu.weight = torch.FloatTensor({2, 4, 6})
th> test = torch.ones(2, 3):mul(-1)
th> test
-1 -1 -1
-1 -1 -1
[torch.FloatTensor of size 2x3]
th> prelu:forward(test) -- OK
-2 -4 -6
-2 -4 -6
[torch.FloatTensor of size 2x3]
th> prelu:cuda():forward(test:cuda()) -- bug
-2 -2 -4
-4 -6 -6
[torch.CudaTensor of size 2x3]
Should I fix it by submitting a patch to PReLU.cu?
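The expected and the observed outputs can both be reproduced in plain Lua. The first function below is a reference PReLU for a batch stored as rows. The second is only a guess at what the CUDA kernel might be doing (the kernel source is not shown in this thread): it picks the weight from the flat offset divided by the batch size, as if the data were laid out features x batch. Both function names are hypothetical.

```lua
-- Reference PReLU on a batch stored as rows (batch x features):
-- negative entries of column j are scaled by weight[j].
local function prelu(batch, weight)
  local out = {}
  for i, row in ipairs(batch) do
    out[i] = {}
    for j, x in ipairs(row) do
      out[i][j] = (x > 0) and x or weight[j] * x
    end
  end
  return out
end

-- Hypothetical faulty variant: the weight is chosen from the flat
-- (row-major) offset divided by the batch size, i.e. with the batch and
-- feature axes swapped. This is an assumption, not the actual kernel.
local function preluSwappedAxes(batch, weight)
  local nBatch = #batch
  local out, offset = {}, 0
  for i, row in ipairs(batch) do
    out[i] = {}
    for j, x in ipairs(row) do
      local w = weight[math.floor(offset / nBatch) + 1]
      out[i][j] = (x > 0) and x or w * x
      offset = offset + 1
    end
  end
  return out
end

local weight = {2, 4, 6}
local test = { {-1, -1, -1}, {-1, -1, -1} }
for _, f in ipairs({prelu, preluSwappedAxes}) do
  for _, row in ipairs(f(test, weight)) do
    print(table.concat(row, " "))
  end
end
```

The reference version yields -2 -4 -6 in both rows, and the swapped-axes variant yields -2 -2 -4 and -4 -6 -6, the same pattern as the CudaTensor result above.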
Trying to install cunn, I get the following problem (error at the bottom). Any ideas how to fix it?
root@yitzhak-VirtualBox:/media/sf_Dima/Projects_Torch/examples_from_github/eladhoffer/ConvNet-torch-master# luarocks install cunn
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Missing dependencies for cunn:
cutorch >= 1.0
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
Cloning into 'cutorch'...
remote: Counting objects: 75, done.
remote: Compressing objects: 100% (71/71), done.
remote: Total 75 (delta 8), reused 22 (delta 1), pack-reused 0
Receiving objects: 100% (75/75), 106.22 KiB | 181.00 KiB/s, done.
Resolving deltas: 100% (8/8), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/yitzhak/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/yitzhak/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
-- The C compiler identification is GNU 4.8.2
-- The CXX compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/yitzhak/torch/install
CMake Error at /usr/share/cmake-2.8/Modules/FindCUDA.cmake:548 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
CMakeLists.txt:7 (FIND_PACKAGE)
-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-1323/cutorch/build/CMakeFiles/CMakeOutput.log".
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
in spatialmaxpooling, gradOutput freed twice:
VolumetricConvolution has a cunn implementation, but VolumetricMaxPooling doesn't.
If the batch size is large and you include per-class weights, then ClassNLLCriterion is SUPER slow. On every tensor dereference it pulls data back to the CPU. In my case I have 3 classes and a batch size of 10,000 (yes, I know this is high), and forward + backward on the criterion is more than 100x slower than the network.
Now that I'm at Google I need to get permission to fix this, so it might take a while. But if someone has any spare cycles I would be extremely grateful if they can fix it :-)
Ubuntu, all VolumetricConvolution tests fail with different outputs.
examples:
1)
> nn.testcuda()
Running 81 tests
___|_____________________________________________________________________________ ==> VolumetricConvolution_forward_batch*** Error in `/usr/local/bin/luajit': double free or corruption (!prev): 0x0000000011762b50 ***
Aborted (core dumped)
> nn.testcuda()
Running 81 tests
___|_____________________________________________________________________________ ==> VolumetricConvolution_forward_batch$ Invalid argument 4: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
Segmentation fault (core dumped)
> nn.testcuda()
Running 81 tests
___|_____________________________________________________________________________ ==> VolumetricConvolution_forward_batch$ Invalid argument 2: invalid number of input planes
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
$ Invalid argument 2: out of range
Segmentation fault (core dumped)
require 'cunn'
values = {5.4334, 3.5017, 1.1042, -4.9523, 1.7309, 2.4248, 3.0710, -2.9635, 2.5460, -1.7663, 0.5645, 12.3819, -3.1641, -3.6385, 0.5742, -1.3508, -2.2765}
floatInput = torch.FloatTensor(values)
floatSoftMax = nn.SoftMax():float()
cudaInput = torch.CudaTensor(values)
cudaSoftMax = nn.SoftMax():cuda()
print(floatSoftMax:forward(floatInput))
print(cudaSoftMax:forward(cudaInput))
returns
atcold@elab-GPU1 ~ $ th bug.lua
0.0010
0.0001
0.0000
0.0000
0.0000
0.0001
0.0001
0.0000
0.0001
0.0000
0.0000
0.9986
0.0000
0.0000
0.0000
0.0000
0.0000
[torch.FloatTensor of dimension 17]
9.5879e-04
1.3893e-04
1.2635e-05
2.9598e-08
2.3645e-05
4.7326e-05
9.0312e-05
2.1627e-07
5.3424e-05
7.1603e-07
7.3652e-06
9.9866e-01
1.7696e-07
1.1011e-07
7.4370e-06
1.0849e-06
4.2989e-07
[torch.CudaTensor of dimension 17]
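The two printouts appear to show the same numbers in different display formats (fixed-point with four decimals versus scientific notation). As a cross-check, here is a plain-Lua softmax over the same values, printed in both formats. It runs in double precision, so the last digits may differ slightly from the float and CUDA results.

```lua
-- Numerically stable softmax on a plain Lua array (double precision).
local function softmax(xs)
  local maxVal = -math.huge
  for _, x in ipairs(xs) do
    if x > maxVal then maxVal = x end
  end
  local exps, sum = {}, 0
  for i, x in ipairs(xs) do
    exps[i] = math.exp(x - maxVal) -- subtract the max to avoid overflow
    sum = sum + exps[i]
  end
  for i = 1, #exps do
    exps[i] = exps[i] / sum
  end
  return exps
end

local values = {5.4334, 3.5017, 1.1042, -4.9523, 1.7309, 2.4248, 3.0710,
                -2.9635, 2.5460, -1.7663, 0.5645, 12.3819, -3.1641, -3.6385,
                0.5742, -1.3508, -2.2765}
local probs = softmax(values)
for i, p in ipairs(probs) do
  -- same value twice: fixed-point, then scientific notation
  print(string.format("%2d  %.4f  %.4e", i, p, p))
end
```

Entries that print as 0.0000 in the fixed-point column are small but non-zero, which is what the scientific column makes visible.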
The SpatialFullConvolution tests sometimes fail due to invalid configurations:
SpatialFullConvolution_forward_batch
Function call failed
.../cunn/test.lua:785: bad argument #1 to 'forward' (3D or 4D (batch mode) tensor is expected)
It can be reproduced when running
torch -lcunn -e "nn.testcuda({'SpatialFullConvolution_forward_batch','SpatialFullConvolution_backward_batch'}, false, 1, 1449228794)"
The input sizes don't make sense: 16,14, 0.33333333333333, 2.3333333333333
I've only observed this in cunn, but I suspect the same could happen in nn.
It seems to me that the file init.lua has changed recently and this change causes an error.
Line 9: "nn.Module._flattenTensorBuffer['torch.CudaTensor'] = torch.FloatTensor.new"
throws error if
"require 'cunn'"
is used.
In particular I get the following error (if I don't remove the line):
.../torch/install/share/lua/5.1/cunn/init.lua:9: attempt to index field '_flattenTensorBuffer' (a nil value)
stack traceback:
.../torch/install/share/lua/5.1/cunn/init.lua:9: in main chunk
[C]: in function 'require'
stdin:1: in main chunk
[C]: at 0x00406670
Thanks for your help!
right now the kernels we wrote only support the last dimension.
p.s.: I'm treating any inconsistency with the CPU versions as bugs (as they rightly should be treated)
Fails for the case:
m=nn.SpatialMaxPooling(2,2,2,2):cuda()
input=torch.randn(128,1024,12,12):cuda()
m:forward(input)
error in SpatialMaxsampling.updateOutput: invalid argument
luajit: ...ch-distro/install/share/lua/5.1/nn/SpatialMaxPooling.lua:18: aborting
Code to reproduce the error:
require 'cunn'
require 'cutorch'
cutorch.setDevice(1)
model=nn.Sequential():add(nn.Linear(300, 500)):add(nn.LogSoftMax()):cuda()
batch_size=90000
output=model:forward(torch.rand(batch_size,300):float():cuda())
The error:
/home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: invalid argument at /tmp/luarocks_cunn-scm-1-144/cunn/LogSoftMax.cu:249
stack traceback:
[C]: in function 'updateOutput'
/home/tushar/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
[string "_RESULT={model:forward(torch.rand(batch_size,..."]:1: in main chunk
[C]: in function 'xpcall'
/home/tushar/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
...shar/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x00406670
[0.2761s]
Large batch sizes work fine on CPU. Smaller batch sizes (<80k) work fine on GPU.
smth chntla on the issue:
It must be that the cuda launch parameters (number of blocks/threads) were configured without such large batch sizes in mind
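A possible workaround, until the launch configuration is fixed, is to feed the batch through the model in smaller pieces. The sketch below only computes the row ranges, in plain Lua. The helper name chunkRanges is hypothetical, and the choice of 65535 (CUDA's limit on the y and z grid dimensions) as the chunk size is an assumption; the thread does not confirm which limit is being hit.

```lua
-- Hypothetical workaround: split a batch of `total` rows into consecutive
-- ranges of at most `maxChunk` rows, so each forward call stays small.
local function chunkRanges(total, maxChunk)
  local ranges = {}
  local first = 1
  while first <= total do
    local last = math.min(first + maxChunk - 1, total)
    ranges[#ranges + 1] = {first = first, last = last}
    first = last + 1
  end
  return ranges
end

-- 65535 is an assumed safe chunk size, see the note above.
for _, r in ipairs(chunkRanges(90000, 65535)) do
  print(r.first, r.last)
end
```

With tensors, each range could be selected with input:narrow(1, r.first, r.last - r.first + 1). The output of each forward call would need to be copied before the next call, since modules reuse their output buffer.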
Fails when installing cunn
On: OSX 10.10.5, cmake 3.3.2, CUDA 7.5.21, fresh build of Torch7
Output:
luarocks install cunn
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Missing dependencies for cunn:
cutorch >= 1.0
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
Cloning into 'cutorch'...
remote: Counting objects: 82, done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 82 (delta 7), reused 26 (delta 0), pack-reused 0
Receiving objects: 100% (82/82), 126.43 KiB, done.
Resolving deltas: 100% (7/7), done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/Users/jrbaldwin/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/Users/jrbaldwin/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
-- The C compiler identification is AppleClang 6.1.0.6020049
-- The CXX compiler identification is AppleClang 6.1.0.6020049
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /Users/jrbaldwin/torch/install
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "7.5", minimum required is "5.5")
-- Compiling for CUDA architecture: 3.0
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cutorch-scm-1-4865/cutorch/build
[ 6%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o
[ 6%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorSort.cu.o
[ 6%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o
[ 8%] Generating TensorMath.c
[ 10%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorageCopy.cu.o
[ 13%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorage.cu.o
[ 15%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensor.cu.o
[ 17%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
In file included from :326:
In file included from :13:
In file included from /usr/local/cuda/include/cuda_runtime.h:112:
/usr/local/cuda/include/common_functions.h:65:10: fatal error: 'string.h' file not found
^^
^
^
^
^
^
Scanning dependencies of target cutorch_static
1 error generated.
1 error generated.
1 error generated.
1 error generated.
CMake Error at THC_generated_THCTensor.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensor.cu.o
CMake Error at THC_generated_THCStorageCopy.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCStorageCopy.cu.o
CMake Error at THC_generated_THCBlas.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCBlas.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensor.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCBlas.cu.o] Error 1
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorageCopy.cu.o] Error 1
CMake Error at THC_generated_THCTensorCopy.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorCopy.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o] Error 1
1 error generated.
CMake Error at THC_generated_THCReduceApplyUtils.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCReduceApplyUtils.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCReduceApplyUtils.cu.o] Error 1
[ 19%] Building C object CMakeFiles/cutorch_static.dir/Storage.c.o
[ 21%] Building C object CMakeFiles/cutorch_static.dir/init.c.o
[ 26%] Building C object CMakeFiles/cutorch_static.dir/Tensor.c.o
[ 26%] Building C object CMakeFiles/cutorch_static.dir/TensorMath.c.o
[ 30%] Building C object CMakeFiles/cutorch_static.dir/torch/utils.c.o
[ 30%] Building C object CMakeFiles/cutorch_static.dir/TensorOperator.c.o
[ 32%] Linking C static library libcutorch.a
[ 32%] Built target cutorch_static
1 error generated.
CMake Error at THC_generated_THCTensorSort.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorSort.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorSort.cu.o] Error 1
1 error generated.
CMake Error at THC_generated_THCStorage.cu.o.cmake:207 (message):
Error generating
/tmp/luarocks_cutorch-scm-1-4865/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCStorage.cu.o
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCStorage.cu.o] Error 1
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Error 2
make: *** [all] Error 2
Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
gcc -v:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
Hi, all~
Currently, I'm looking for a Torch API with a stride mechanism for convolution and pooling, and I found nn.SpatialConvolutionCUDA and nn.SpatialMaxPoolingCUDA, ported from cuda-convnet.
SpatialConvolutionCUDA takes a BHWD input format, while SpatialMaxPoolingCUDA does not.
I get an error when I add max-pooling after the conv function.
static int cunn_SpatialMaxPoolingCUDA_updateOutput(lua_State *L)
{
THCudaTensor *input = (THCudaTensor *)luaT_checkudata(L, 2, "torch.CudaTensor");
int kW = luaT_getfieldcheckint(L, 1, "kW");
int kH = luaT_getfieldcheckint(L, 1, "kH");
int dW = luaT_getfieldcheckint(L, 1, "dW");
int dH = luaT_getfieldcheckint(L, 1, "dH");
THCudaTensor *output = (THCudaTensor *)luaT_getfieldcheckudata(L, 1, "output", "torch.CudaTensor");
luaL_argcheck(L, input->nDimension == 4, 2, "4D (batch) tensor expected");
long nInputCols = input->size[2];
long nInputRows = input->size[1];
long nInputPlane = input->size[0];
long batchSize = input->size[3];
long nOutputCols = (nInputCols - kW) / dW + 1;
long nOutputRows = (nInputRows - kH) / dH + 1;
This code snippet should be revised like this:
long batchSize = input->size[0];
long nInputRows = input->size[1];
long nInputCols = input->size[2];
long nInputPlane = input->size[3];
At this moment, SpatialMaxPoolingCUDA will take effect only if we change the output format from BHWD to DHWB, OMG.....
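To make the layout mismatch concrete, here is a minimal Python sketch (not cunn code). It assumes the usual meaning of the letters (B = batch, H = rows, W = columns, D = planes) and uses a hypothetical tensor size; `read_dims` and `pooled_size` are illustrative helpers, with `pooled_size` mirroring the `(n - k) / d + 1` arithmetic in the C snippet above.

```python
def pooled_size(n_in, k, d):
    """Output length for a pooling window of size k and stride d (no padding)."""
    return (n_in - k) // d + 1


def read_dims(size, layout):
    """Name the four entries of `size` according to a layout string such as 'BHWD'."""
    if len(size) != 4 or sorted(layout) != sorted("BHWD"):
        raise ValueError("expected a 4-D size and a permutation of 'BHWD'")
    return dict(zip(layout, size))


# Hypothetical size: 128 images of 24x24 with 16 planes, stored as BHWD.
size = (128, 24, 24, 16)

as_bhwd = read_dims(size, "BHWD")  # what the convolution is said to produce
as_dhwb = read_dims(size, "DHWB")  # how the pooling code above reads size[0..3]

print("read as BHWD:", as_bhwd)
print("read as DHWB:", as_dhwb)
print("pooled rows/cols for a 2x2 window, stride 2:",
      pooled_size(as_bhwd["H"], 2, 2), pooled_size(as_bhwd["W"], 2, 2))
```

Read as DHWB, the same size tuple reports 128 planes and a batch of 16, so the batch and plane counts are swapped even though the spatial arithmetic is unchanged.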
When I run test.sh, th -lcunn -e "nn.testcuda()" is unstable and occasionally fails. The error message is different every time; some examples are:
SpatialSubSampling_backward
error on state (backward)
LT(<) violation val=1.2421855926514, condition=0.01
/root/torch/install/share/lua/5.1/torch/Tester.lua:26: in function 'assertlt'
/root/torch/install/share/lua/5.1/cunn/test.lua:1391: in function 'v'
LogSoftMax_forward_batch
error on state (forward)
LT(<) violation val=0.0010080337524414, condition=0.001
/root/torch/install/share/lua/5.1/torch/Tester.lua:26: in function 'assertlt'
/root/torch/install/share/lua/5.1/cunn/test.lua:2364: in function 'v'
I guess a similar issue was raised in #50 and solved (maybe?).
I updated to the latest torch and packages.
I use an Ubuntu 14.04 Docker image and CUDA 7.0.
Thanks.
The backward pass of the LookupTable seems to be non-deterministic on GPU, and different to the CPU implementation. Is this expected?
require 'torch'
require 'cutorch'
require 'nn'
require 'cunn'
do
local lt = nn.LookupTable(4096, 256):cuda()
lt.weight:fill(1)
lt.gradWeight:fill(0)
local input = torch.CudaTensor(4000):fill(1)
lt:forward(input)
print(lt.output:sum())
lt:backward(input, lt.output)
print(lt.gradWeight:sum())
end
do
local lt = nn.LookupTable(4096, 256)
lt.weight:fill(1)
lt.gradWeight:fill(0)
local input = torch.DoubleTensor(4000):fill(1)
lt:forward(input)
print(lt.output:sum())
lt:backward(input, lt.output)
print(lt.gradWeight:sum())
end
Output:
1024000
641312 -- this varies
1024000
1024000
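For reference, the expected value is 4000 x 256 = 1024000, because all 4000 inputs point at the same row. The Python sketch below is a toy model, not the cunn kernel. It only shows that one plausible cause, a non-atomic read-modify-write on a shared row, would produce a sum below the expected value; the source does not confirm this cause. The `wave` parameter is hypothetical and stands for the number of updates that read the same stale value.

```python
def accumulate_serial(n_updates, row_width):
    """Add a row of ones into the same gradient row n_updates times, one at a time."""
    row = [0.0] * row_width
    for _ in range(n_updates):
        for j in range(row_width):
            row[j] += 1.0
    return sum(row)


def accumulate_lost_updates(n_updates, row_width, wave):
    """Toy race: `wave` updates read the same stale value before any of them writes,
    so only one increment per wave survives."""
    row = [0.0] * row_width
    done = 0
    while done < n_updates:
        in_flight = min(wave, n_updates - done)
        for j in range(row_width):
            stale = row[j]        # every in-flight update reads this value
            row[j] = stale + 1.0  # the last writer wins, the other increments are lost
        done += in_flight
    return sum(row)


print("serial:", accumulate_serial(4000, 256))
print("lost updates (wave=2):", accumulate_lost_updates(4000, 256, 2))
```

The toy model is deterministic and does not reproduce the reported number; on real hardware the number of colliding updates would vary from run to run, which would be consistent with a result that "varies".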
@nouiz pointed out that they have excellent Volumetric convolutions in theano, recently implemented by @stencilman and others.
The code is Caffe-style, 500 lines of portability.
https://github.com/ballasn/Theano/blob/Corr3DMM/theano/sandbox/cuda/corr3d_gemm.cu
We can get these into cunn as well.
If anyone wants to get their hands on this, ping here and claim the task. (You can pretty much do copy-pasta programming here, starting with SpatialConvolutionMM.) The work involved is about 1-2 hours.
If no one does it, I will probably get around to it this weekend (or when I find time).
Just noticed that these tests are constantly failing. We have a regression somewhere. Working on bisect.
When running luarocks install cunn, the build fails with THCGenerateAllTypes.h: No such file or directory.
Copying the file manually helped: cp ./extra/cutorch/lib/THC/THCGenerateAllTypes.h ~/torch/install/include/THC/
OS: Ubuntu 15.10
CC=gcc-4.9
CXX=g++-4.9
Running VolumetricConvolution in CUDA mode seems to swap the dimensions of the output tensor.
For example, I have a 5D input batch as follows:
input = torch.Tensor(20, 2, 32, 64, 64)
I only have one layer as follows:
model:add(nn.VolumetricConvolution(2, 16, 1, 5, 5))
In CPU mode, doing a forward pass produces the correct output of dimension (20, 16, 32, 60, 60).
But in CUDA mode the dimensions get swapped and I get an output of dimension (20, 16, 60, 32, 60), which shouldn't be the case.
Ideally, both should produce the same output.
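The expected shape can be checked by hand. The Python sketch below assumes the documented argument order nn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH, [dT, dW, dH]), no padding, and a batch layout of (batch, planes, time, height, width); the helper name is illustrative.

```python
def volumetric_conv_output(input_size, n_output_plane, kT, kW, kH, dT=1, dW=1, dH=1):
    """Output size of an unpadded 3-D convolution over a (batch, planes, T, H, W) input."""
    batch, _n_input_plane, t, h, w = input_size
    out_t = (t - kT) // dT + 1
    out_h = (h - kH) // dH + 1
    out_w = (w - kW) // dW + 1
    return (batch, n_output_plane, out_t, out_h, out_w)


# The layer and input from the report above.
shape = volumetric_conv_output((20, 2, 32, 64, 64), 16, kT=1, kW=5, kH=5)
print(shape)  # (20, 16, 32, 60, 60)
```

This matches the CPU result, so the shape reported in CUDA mode is a permutation of the correct sizes and not a different amount of output.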
I'm using nn.MSECriterion, and I've noticed that the following code returns a negative value in very rare cases:
local err = criterion:forward(output, targets)
It has happened only 3-4 times in the last year, and in every case I was training two networks at the same time on the same GPU.
Note that it happened after thousands of epochs and is very hard to reproduce.
If someone knows why this can happen or how to fix or avoid it, please let me know.
Thanks in advance,
Itzik.
Let me start by saying I am new to Torch7 and Lua. I have Torch7 installed under Ubuntu 14.04. I have successfully installed the nn package. Now I am trying to install the cunn package using:
luarocks install cunn
The following is the output of my install:
kzachery@DELL:~$ luarocks install cunn
Installing https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cunn-scm-1.rockspec... switching to 'build' mode
Cloning into 'cunn'...
remote: Counting objects: 47, done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 47 (delta 18), reused 30 (delta 16), pack-reused 0
Receiving objects: 100% (47/47), 49.04 KiB | 0 bytes/s, done.
Resolving deltas: 100% (18/18), done.
Checking connectivity... done.
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/kzachery/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/kzachery/torch/install/lib/luarocks/rocks/cunn/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install
-- The C compiler identification is GNU 4.8.2
-- The CXX compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /home/kzachery/torch/install
-- Found CUDA: /usr/local/cuda (found suitable version "7.0", minimum required is "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_cunn-scm-1-9119/cunn/build
[100%] Building NVCC (Device) object CMakeFiles/cunn.dir//./cunn_generated_init.cu.o
In file included from /tmp/luarocks_cunn-scm-1-9119/cunn/init.cu:14:0:
/tmp/luarocks_cunn-scm-1-9119/cunn/HardTanh.cu:2:24: fatal error: THCApply.cuh: No such file or directory
#include "THCApply.cuh"
^
compilation terminated.
CMake Error at cunn_generated_init.cu.o.cmake:206 (message):
Error generating
/tmp/luarocks_cunn-scm-1-9119/cunn/build/CMakeFiles/cunn.dir//./cunn_generated_init.cu.o
make[2]: *** [CMakeFiles/cunn.dir/./cunn_generated_init.cu.o] Error 1
make[1]: *** [CMakeFiles/cunn.dir/all] Error 2
make: *** [all] Error 2
Error: Build error: Failed building.
What am I missing in order to get this package installed? Any help you could provide would be greatly appreciated.
What can we borrow from Alex's new code?
https://code.google.com/p/cuda-convnet2/
Hi,
Working with @soumith we discovered a bug in SpatialMaxPooling when the gradOutput is a non-contiguous tensor. It shouldn't be too hard to fix (copy gradOutput into a contiguous tensor if it is not contiguous).
Even if input contains index 0, it doesn't assert that situation.
None of the cunn modules that use THCudaTensor_pointwiseApply2 or THCudaTensor_pointwiseApply3 check the retvals, and so size inconsistency errors are silently ignored.
The functions return false if the tensor has too many dimensions, or if the sizes of the tensors mismatch.
I think there was a reason at the time why the pointwise functions did not throw Lua errors (I think it was because the caller had more contextual information about what the error should be, instead of a non-descriptive size mismatch). Either the retvals should be checked in cunn, or we should add Lua errors in cutorch's pointwiseApply*.
https://github.com/torch/cunn/search?utf8=%E2%9C%93&q=THCudaTensor_pointwiseApply2&type=Code
https://github.com/torch/cunn/search?utf8=%E2%9C%93&q=THCudaTensor_pointwiseApply3&type=Code
> require 'cunn'
true
[2.1727s]
> cpuModule = nn.ReLU(false)
[0.0000s]
> gpuModule = nn.ReLU(false):cuda()
[0.0002s]
> cpuInput = torch.DoubleTensor(8):fill(-1)
[0.0000s]
> cpuGradOutput = torch.DoubleTensor(9):uniform()
[0.0000s]
> gpuInput = cpuInput:cuda()
[0.1500s]
> gpuGradOutput = cpuGradOutput:cuda()
[0.0003s]
> cpuModule:updateOutput(cpuInput)
0
0
0
0
0
0
0
0
[torch.DoubleTensor of size 8]
> cpuModule:updateGradInput(cpuInput, cpuGradOutput)
...plearning/torch/cuth.llar.linktree/_lua/nn/Threshold.lua:26: inconsistent tensor size at torch/oss/nn/generic/Threshold.c:48
stack traceback:
...learning/torch/cuth.llar.linktree/_lua/fb/util/error.lua:76: in function <...learning/torch/cuth.llar.linktree/_lua/fb/util/error
.lua:72>
[C]: in function 'Threshold_updateGradInput'
...plearning/torch/cuth.llar.linktree/_lua/nn/Threshold.lua:26: in function 'updateGradInput'
[string "_RESULT={cpuModule:updateGradInput(cpuInput, ..."]:1: in function 'inner_func'
...learning/torch/cuth.llar.linktree/_lua/fb/trepl/init.lua:492: in function <...learning/torch/cuth.llar.linktree/_lua/fb/trepl/ini
t.lua:492>
...
> gpuModule:updateOutput(gpuInput)
0
0
0
0
0
0
0
0
[torch.CudaTensor of size 8]
[0.0006s]
> gpuModule:updateGradInput(gpuInput, gpuGradOutput)
0
0
0
0
0
0
0
0
[torch.CudaTensor of size 8]
[0.0004s]
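The two options discussed above (check the return value in the caller, or raise inside the pointwise function) can be sketched in Python. This is a toy model over plain lists, not the cutorch API: `pointwise_apply2` and `checked_apply2` are hypothetical names, and the sizes 8 and 9 are taken from the session above.

```python
def pointwise_apply2(dst, src, op):
    """Apply op elementwise into dst. Returns False on a size mismatch instead of raising."""
    if len(dst) != len(src):
        return False
    for i in range(len(dst)):
        dst[i] = op(dst[i], src[i])
    return True


def checked_apply2(dst, src, op, what):
    """Caller-side check: turn a False return value into a descriptive error."""
    if not pointwise_apply2(dst, src, op):
        raise ValueError(f"{what}: inconsistent tensor size ({len(dst)} vs {len(src)})")


grad_input = [0.0] * 8
grad_output = [0.5] * 9  # one element too many, as in the session above

# Unchecked call: the mismatch is silently ignored and grad_input is left untouched.
pointwise_apply2(grad_input, grad_output, lambda a, b: b)
print(grad_input)

# Checked call: the same mismatch is reported with context from the caller.
try:
    checked_apply2(grad_input, grad_output, lambda a, b: b, "updateGradInput")
except ValueError as err:
    print(err)
```

Checking in the caller keeps the contextual error message mentioned above, at the cost of having to remember the check at every call site.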
A month ago the same code ran perfectly fine, but when I checked again today I got this error. Any ideas or hints?
==> switching to CUDA
/usr/local/bin/luajit: unable to initialize cublas
stack traceback:
[C]: at 0x7f19499a5f50
[C]: in function 'require'
/usr/local/share/lua/5.1/cutorch/init.lua:2: in main chunk
[C]: in function 'require'
/usr/local/share/lua/5.1/cunn/init.lua:1: in main chunk
[C]: in function 'require'
cifardoall.lua:60: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:109: in main chunk
[C]: at 0x00404480
the code logic was like:
float *data;
THCudaCheck(cudaMalloc((void**)(&data), size * sizeof(float)));
THCudaCheck(cudaMemcpyAsync(data, self->data, THMin(self->size, size) * sizeof(float), cudaMemcpyDeviceToDevice));
THCudaCheck(cudaFree(self->data));
Consider this scenario:
The GPU has 4 GB of RAM with 3 GB in use. If Lua calls resize() to grow the storage from 3 GB to 3.5 GB, the code above crashes with an out-of-memory error, because the new buffer is allocated before the old one is freed, even though the GPU still has 1 GB of spare RAM.
The optimized logic should be like this:
If there is not enough device RAM for the malloc, first copy the current data from device to host, then release the device RAM, then malloc the new device RAM, and finally copy the content back from host to device.
Please consider the importance of this request, because device RAM is always very tight. It would be better to release before the malloc.
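The proposed fallback can be sketched with a toy allocator in Python. Nothing here is cutorch code: `ToyDevice` and the three resize helpers are hypothetical, sizes are in MB, and the host staging copy is only described in comments.

```python
class ToyDevice:
    """Tracks used device memory and refuses allocations beyond capacity."""

    def __init__(self, capacity, used=0):
        self.capacity = capacity
        self.used = used

    def malloc(self, n):
        if self.used + n > self.capacity:
            raise MemoryError(f"cannot allocate {n} MB: {self.capacity - self.used} MB free")
        self.used += n

    def free(self, n):
        self.used -= n


def resize_alloc_first(dev, old, new):
    """Current logic: allocate the new buffer, copy device to device, free the old one."""
    dev.malloc(new)  # old and new buffers are both live at this point
    dev.free(old)
    return new


def resize_free_first(dev, old, new):
    """Proposed fallback: stage the data on the host, free, allocate, copy back."""
    dev.free(old)    # the contents are assumed to be safe in host memory here
    dev.malloc(new)
    return new


def resize_with_fallback(dev, old, new):
    try:
        return resize_alloc_first(dev, old, new)
    except MemoryError:
        return resize_free_first(dev, old, new)


dev = ToyDevice(capacity=4096, used=3072)  # 4 GB device, 3 GB storage
print(resize_with_fallback(dev, 3072, 3584), dev.used)
```

The fast path is kept because a device-to-device copy avoids two transfers over the bus; the host round trip is used only when the first allocation fails. If the second malloc also failed, the data would exist only on the host, which a real implementation would have to handle.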
/home/nicholas14/cunn/ClassNLLCriterion.cu(11): error: identifier "assert" is undefined
I'm running a very tiny network (a 3-layer ConvNet with a few tens of filters) on an Nvidia Jetson TK1 with Ubuntu 14.04, CUDA 6.0 and the latest drivers (R19.3.0_armhf).
If I use the CPU, everything works fine and the feedforward step is completed in tens of ms.
If I try to use the GPU (using nn.SpatialConvolutionMM and nn.SpatialMaxPooling), I get the following error message, which I do not understand:
error in SpatialMaxsampling.updateOutput: too many resources requested for launch
/usr/local/bin/luajit: /usr/local/share/lua/5.1/nn/SpatialMaxPooling.lua:18: aborting
stack traceback:
[C]: in function 'SpatialMaxPooling_updateOutput'
/usr/local/share/lua/5.1/nn/SpatialMaxPooling.lua:18: in function 'updateOutput'
/usr/local/share/lua/5.1/nn/Sequential.lua:37: in function 'forward'
./src/profileNet.lua:32: in function 'time'
general-profiler.lua:33: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:129: in main chunk
[C]: at 0x0000cf89
Running the same code on an Intel machine with Ubuntu 14.04 and a GeForce GTX 780, this doesn't happen. A similar error, though, happened on another Intel machine with Ubuntu 14.04 and a Tesla K40c while trying to run OverFeat.
Moreover, this error does not happen if I load the model and process the SpatialMaxPooling layer only. Instead, if I run all the previous layers, one by one, then it fails at the SpatialMaxPooling one with the "too many resources requested for launch" CUDA error message.
I believe it has something to do with the registers available per block. Perhaps the TK1 has more limited resources and the SpatialMaxsampling kernel is trying to allocate too many registers.
If the output frame size of a cunn TemporalMaxPooling layer is greater than 1024, the output is no longer correct. Compare the CPU and GPU output of the code below:
require('torch')
require('cutorch')
require('cunn')
-- ConvNet parameters
kW = 20; dW = 1; maxPool = 4;
numFeatures1 = 20
print("Loading data...")
docData = torch.randn(4200, 530) --this data is transposed from the usual data format
nSamples = docData:size()[1]
nExamples = docData:size()[2]
n = nSamples
print("Creating neural network...")
-- ConvNet
n1 = (n-kW)/dW+1
n2 = torch.floor(n1/maxPool)
cpuFeatsNet = nn.Sequential()
tConvLayer = nn.TemporalConvolution(1, numFeatures1, kW)
cpuFeatsNet:add(tConvLayer)
cpuFeatsNet:add(nn.TemporalMaxPooling(maxPool))
gpuFeatsNet = nn.Sequential()
tConvLayerGpu = nn.TemporalConvolution(1, numFeatures1, kW)
gpuFeatsNet:add(tConvLayerGpu)
gpuFeatsNet:add(nn.TemporalMaxPooling(maxPool))
gpuFeatsNet:cuda()
batchSize = 530
numProcSamples = 530
tConvLayer.bias:zero()
tConvLayerGpu.bias:zero()
tConvLayerGpu.weight[{}] = tConvLayer.weight
function saveNewFeatures(krnToSave, procUnit)
procUnit = procUnit or "cpu"
if procUnit == "cpu" then
featsNet = cpuFeatsNet
print("Processing on the CPU...")
elseif procUnit == "gpu" then
featsNet = gpuFeatsNet
cudaInp = torch.CudaTensor(batchSize,nSamples,1)
print("Processing on the GPU...")
end
local feats = torch.Tensor(numProcSamples, n2)
for i=1,numProcSamples,batchSize do
tIdx = i
inp = docData[{{1,nSamples},{tIdx,tIdx+batchSize-1}}]:t():reshape(batchSize,nSamples,1):double()
if procUnit == "cpu" then
feats[{{i,i+batchSize-1},{}}] = featsNet:forward(inp)[{{},{},krnToSave}]
elseif procUnit == "gpu" then
cudaInp[{}] = inp --use same cuda memory instead of allocating new
cudaOut = featsNet:forward(cudaInp)[{{},{},krnToSave}]
feats[{{i,i+batchSize-1},{}}] = cudaOut:double()
end
end
return feats:t()
end
cpuFeats = saveNewFeatures(15, 'cpu')
gpuFeats = saveNewFeatures(15, 'gpu')
diff = cpuFeats-gpuFeats
print("Here is the difference between the output of a CPU net and a GPU net, for an arbitrary feature (15 in this case) from index 1020 to 1030. Note the errors beginning at index 1025.")
print(diff[{{1020,1030},1}])
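The frame counts in the script above can be checked by hand. The Python sketch below assumes unpadded layers and that nn.TemporalMaxPooling(kW) defaults its stride to kW. The 1024 constant is the CUDA limit on threads per block for compute capability 2.0 and later; linking it to this bug is an assumption suggested by the index where the errors begin, not something the report confirms.

```python
def temporal_conv_frames(n_in, kW, dW=1):
    """Number of output frames of an unpadded temporal convolution."""
    return (n_in - kW) // dW + 1


def temporal_pool_frames(n_in, kW, dW=None):
    """Number of output frames of temporal max pooling; the stride defaults to kW."""
    dW = kW if dW is None else dW
    return (n_in - kW) // dW + 1


MAX_THREADS_PER_BLOCK = 1024  # assumption: relevant only if one thread handles one frame

n1 = temporal_conv_frames(4200, 20, 1)  # frames after the convolution
n2 = temporal_pool_frames(n1, 4)        # frames after max pooling
print(n1, n2, n2 > MAX_THREADS_PER_BLOCK)
```

With 1045 output frames, the last 21 frames fall beyond 1024, which is consistent with the errors starting at index 1025.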
Hi,
can you point me to a example application of the CUDA convolution modules? A repo with code that makes use of these would work. If not, that's ok.
--Nick
Hey all,
I've been testing SpatialConvolutionMM quite a bit, and as for the CPU version at the time, it's replacing all other conv modules for me at this point. Questions:
1. SpatialConvolution, for which the perf is ridiculously low
2. SpatialConvolutionMap, which is half-implemented (and not sure anybody uses this type of module)
3. SpatialConvolutionCUDA and SpatialMaxPoolingCUDA modules, which were only here temporarily (Alex's kernels)
I vote yes for the first 2 questions, ok to keep the 3rd alive for a bit more since people might depend on them.