Comments (11)
@reedscot How is the speed for imagenet?
from caffe.
On Nvidia Titan GPU it finishes 1000 iterations in around 10 minutes. By 'iterations' I am not sure whether it is passing through the entire training set or just a subset, but I just mean the 'iteration' that is displayed as output during training. However, on my machine it slows down quite a bit as the memory consumption inexorably grows to almost 100%. By ~5000 iterations it is basically stuck, possibly thrashing. So, I am wondering if there is a memory leak or some memory that should be freed each iteration that is not being freed. I observe the same thing when I set solver_mode to 0 or 1 (CPU or GPU). Other than this everything seems to work (I can complete MNIST training for example).
I ran into the same error as you did, but the issue isn't with compute 3+ functionality, but rather architecture limitations prior to compute 3.0. In particular, for large networks, you're running out of blocks per grid dim (compute 2.0 had only 65535 blocks per dim, while 3.0 bumped it to 2^31-1). This is easily remedied by making the grid 2D, which gives you 65535^2 total available blocks (or even 3D if so desired), and changing all the thread index computations to `int index = threadIdx.x + (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x;`
After making this change, I ran into another error with insufficient number of registers for some max pooling layers, so I also had to reduce the num_threads_per_block to 512 from 1024.
For reference, I am running the imagenet architecture (with a few small tweaks) on a tesla m2090
The problem was also encountered on an NVIDIA GeForce GTX 560 Ti with compute capability 2.1. The error message "Cuda kernel failed. Error: invalid configuration argument" confirms that the original problem was indeed caused by not generating a PTX back-end target for GPUs with compute capability below 3.0.
It has been solved by commit b5badf7 "Add CUDA gencode for all 2x & 3x arch compute capability combinations".
Even after generating code for the CUDA 2.x arch through the compiler switch, the problem remains that with large networks, e.g. the included imagenet sample, the issues above still prevent the code from running, since you run out of block indices and registers.
Hi
I have been trying to run caffe on imagenet with a GTX 660Ti graphics card that has 3 GB of RAM, and I am getting a cudaMalloc error while allocating memory for layer params. Does this mean the imagenet configuration cannot be supported on this hardware and I need to upgrade to 6 GB?
Alternatively, what would be the minimum GPU spec (RAM etc.) for running the imagenet configurations as provided in the package?
You can reduce the batch sizes in the prototxt training and test
files to reduce the memory requirements.
Sergio
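To make the suggestion concrete: the batch size is a field of the data layer in the imagenet train/test prototxt files. A sketch of the relevant fragment (the exact surrounding layer syntax varies between Caffe versions, and the old-style field name `batchsize` used here is illustrative, not copied from the shipped files):

```protobuf
layers {
  layer {
    name: "data"
    type: "data"
    batchsize: 64  # e.g. reduced from 256 to lower GPU memory use
  }
}
```

Note that halving the batch size roughly halves the activation memory, but the layer parameters themselves are unaffected, so if the params alone don't fit in GPU RAM, a smaller batch won't help.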
Just tried reducing the batch size in the prototxt file, but I'm still getting the following error. Any thoughts?
F0126 21:48:38.671995 6452 syncedmem.cpp:48] Check failed: (cudaMalloc(&gpu_ptr_, size_)) == cudaSuccess (38 vs. 0)
*** Check failure stack trace: ***
@ 0x7f0b88046b7d google::LogMessage::Fail()
@ 0x7f0b88048c7f google::LogMessage::SendToLog()
@ 0x7f0b8804676c google::LogMessage::Flush()
@ 0x7f0b8804951d google::LogMessageFatal::~LogMessageFatal()
@ 0x4335fc caffe::SyncedMemory::mutable_gpu_data()
@ 0x423512 caffe::Blob<>::mutable_gpu_data()
@ 0x460b91 caffe::DataLayer<>::Forward_gpu()
@ 0x42a3c2 caffe::Net<>::ForwardPrefilled()
@ 0x422380 caffe::Solver<>::Solve()
@ 0x40d265 main
@ 0x7f0b8670676d (unknown)
@ 0x40e51d (unknown)
6452 Aborted (core dumped) GLOG_logtostderr=1
from caffe.
cudaError_t value 38 means no CUDA-capable device is available, so maybe
double-check your hardware / driver installation.
(For error codes, check driver_types.h)
Yangqing
It runs the mnist demo in GPU mode fine, so could this be due to the large imagenet network needing a GPU with more RAM (currently 3 GB on my GTX 660Ti)?
This seems to still be a problem. I would like to help make caffe work well on CUDA compute capability 2.x devices for ImageNet scale configurations.
@SWu's workaround solves the block indexing problem, but there are some questions about how to implement it in practice, since it would require any kernel to account for the fact that the grid may be 2D.
The most straightforward way would be something like this:
- Modify CAFFE_GET_BLOCKS to potentially return a 2D dim3, computing the 2D block dimensions as: `int n = (N + CAFFE_CUDA_NUM_THREADS - 1) / CAFFE_CUDA_NUM_THREADS; dim3 blocks(ceil(sqrt(n)), ceil(sqrt(n)));`
- Modify the expressions in the kernels for getting a 1D index, perhaps by making a macro like `CAFFE_GET_1D_INDEX()`.
However, there is probably a more principled way to account for the 2D structure in the first place, which would require more drastic rewriting of the kernels.