torontodeeplearning / convnet

503.0 74.0 229.0 19.59 MB

A GPU implementation of Convolutional Neural Nets in C++

License: BSD 2-Clause "Simplified" License

Makefile 0.36% C++ 23.50% Python 9.52% Cuda 56.22% C 9.72% Protocol Buffer 0.68%

convnet's Introduction

Welcome to ConvNet.

ConvNet is a fast C++ based GPU implementation of Convolutional Neural Nets.

Supports Multi-GPU architectures (Multiple GPUs, Single machine).
Provides a fast CPU-only feature extractor.

Installation

[Install guide](https://github.com/torontodeeplearning/convnet/blob/master/INSTALL)

Pre-trained Models

Pre-trained models and examples for training and feature extraction are provided for

Tutorials

Coming soon.

Documentation

here

convnet's People

Contributors

Stargazers

Watchers

Forkers

research2010 wqren nashjojo linshijie openhero brb-chen yjxiong wangdongfrank phecy rootlessweed zengqiang2006 easyfmxu mblankq vlsi1217 garftalk yiiwood sucre nipengmath jwyang yanweifu amos-zq ty01csbaidu wycg1984 chenglongchen michaelbbtiger liufengcse liyanghua king1991wbs dongh11 qimiaoguo bruce2008github claudemit wenhuizhang changguanghua woooha feidong1991 jmliu88 dachylong 10sun tongming godson1024 yx-skywell robert0812 chaos-dd fancyspeed chagge maydaygmail mulinfro arshak a-b geraldstanje williamtang cteckwee dangermangls guolisen liujie8 dreadlord1984 jiangdong123 sunkaianna linshifei pkvprakash anteagle mudelin guker avdmitry markovcat putaozhuose amoliu xiaozhuka yaolubrain zoucan520 kelvinxu yzli yanshanjing alexkouts junliangxing lqshixinlei fairymane relationbuilder lgen monkeytang xi-qian mtao deercoderresearch tjusxh weilongye chengchengowen zhangry868 nerei pfshawn jamesjohnson92 ericeiffel xiyuanhou jjyycchh kracwarlock noelds invinciblejha jethrotan sleepsophia yigenliang

convnet's Issues

How to get performance gain using multigpu training

Hi, Nitish,

How can I get a speedup from multi-GPU training? Screenshots of the training output are attached for reference. In my runs, the time per training batch is nearly the same with one GPU and with two.

Training with one GPU board, time per training step around 0.890:

Training with two GPU boards, time per training step around 0.884:

Assertion failed when running py/run_convnet.py

I tried to run py/run_convnet.py in the following way:

In the directory py/:
Add "sys.path.insert(0, '.../convnet')" to run_convnet.py before "import convnet as cn".
Then run "python run_convnet.py ../examples/imagenet/CLS_net_20140621074703.pbtxt ../examples/imagenet/CLS_net_20140621074703.h5 ../examples/imagenet/pixel_mean.h5"
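
For reference, sys.path.insert takes an index as well as a path, so the edit described above might look like the following minimal sketch. The directory /path/to/convnet is a placeholder (the real path is elided in the post), and the helper add_repo_to_path is hypothetical, not part of the repository.

```python
import sys

def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of the module search path, if not already listed."""
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)  # index 0: searched before all other entries

# Placeholder location; substitute the directory that contains the convnet package.
add_repo_to_path('/path/to/convnet')
# "import convnet as cn" would follow here in run_convnet.py.
```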

The output is as follows:
Using board 0
[u'input', u'hidden1_conv', u'hidden1_maxpool', u'hidden1_rnorm', u'hidden2_conv', u'hidden2_maxpool', u'hidden2_rnorm', u'hidden3_conv', u'hidden4_conv', u'hidden5_conv', u'hidden5_maxpool', u'hidden6', u'hidden7', u'output']
python: cudamat_conv_gemm.cu:477: void convUpGemm(cudamat*, cudamat*, cudamat*, Shape4D, Shape4D, Shape4D, ConvDesc, float, float, bool): Assertion `output_channel_end - output_channel_begin == num_output_channels3' failed.
Aborted (core dumped)

How can I fix it? Thank you!

Bug in LoadChunk(DataIterator& it, Matrix& mat, vector<int>& random_rows)

There is a bug in lines 296-312:

void DataHandler::LoadChunk(DataIterator& it, Matrix& mat, vector<int>& random_rows) {
  float* data_ptr = mat.GetHostData();
  int num_dims = it.GetDims();
  int num_rand = (chunk_size_ + random_access_chunk_size_ - 1) / random_access_chunk_size_;
  int row, end;
  for (int i = 0; i < num_rand; i++) {
    row = random_rows[i];
    end = (row + random_access_chunk_size_) % dataset_size_;
    if (end < row) {
      it.Get(data_ptr, row, dataset_size_);
      it.Get(data_ptr + num_dims * (dataset_size_ - row), 0, end);
    } else {
      it.Get(data_ptr, row, end);
    }
    data_ptr += num_dims * random_access_chunk_size_;
  }
}

One possible way to fix it could be:

void DataHandler::LoadChunk(DataIterator& it, Matrix& mat, vector<int>& random_rows) {
  float* data_ptr = mat.GetHostData();
  int num_dims = it.GetDims();
  int num_rand = (chunk_size_ + random_access_chunk_size_ - 1) / random_access_chunk_size_;
  int row, end;
  for (int i = 0; i < num_rand; i++) {
    row = random_rows[i];
    end = (row + random_access_chunk_size_) % dataset_size_;
    if (end < row) {
      it.Get(data_ptr, row, dataset_size_);
      int remain_size = random_access_chunk_size_ - (dataset_size_ - row);
      if (remain_size > 0) {
        it.Get(data_ptr + num_dims * (dataset_size_ - row), 0, remain_size);
      }
    } else {
      it.Get(data_ptr, row, end);
    }
    data_ptr += num_dims * random_access_chunk_size_;
  }
}
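
The wrap-around arithmetic can be checked in isolation. The sketch below is a Python illustration, not the repository's code: it assumes that it.Get(ptr, start, end) reads the half-open row range [start, end), and the function name chunk_ranges is hypothetical. It is a variant of the proposed fix that avoids the modulo, so a chunk ending exactly at the end of the dataset yields a single read and never an empty (0, 0) read.

```python
def chunk_ranges(row, chunk_size, dataset_size):
    """Return the half-open (start, end) row ranges for one random-access chunk.

    Assumes 0 <= row < dataset_size and 0 < chunk_size <= dataset_size.
    """
    end = row + chunk_size
    if end <= dataset_size:
        # No wrap-around: a single contiguous read.
        return [(row, end)]
    # Wrap-around: read to the end of the dataset, then the remainder from row 0.
    return [(row, dataset_size), (0, end - dataset_size)]

print(chunk_ranges(95, 10, 100))  # chunk that wraps past the end
print(chunk_ranges(90, 10, 100))  # chunk that ends exactly at the end
```

In the second call the original C++ code computes end = 0, takes the wrap-around branch, and issues a second read over rows 0 to 0; the guard on remain_size in the proposed fix skips that read.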

Wrong post

attempting to build cpu-only on OSX 10.10

Been playing around with this a little bit and wanted to open an issue in case anyone else is doing the same. This assumes you're using homebrew.

current progress:

OpenCV must be version 3.0. By default homebrew installs 2.4.9; this can be fixed with brew uninstall opencv && brew install --devel opencv.
You apparently need to run make in convnet/eigenmat before convnet/apps/cpu.
You need to add the following two arguments to your makefile's CPPFlags: -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE
As part of the transition away from gcc, OSX now ships with both /usr/bin/gcc and /usr/bin/g++ symlinked to clang. You need to download gcc (homebrew, again) and then either specify the correct compiler in your makefile (probably best, but I'm not sure quite how this works) or else modify your path so that you're pointing at the correct gcc/g++.
You may need to explicitly add #include <stdlib.h> to the top of CPUMatrix.h.
You need to manually link in the X11 headers (assumes you have xcode + the command line tools installed): ln -s /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers/X11/ /usr/local/include/X11

Now, however, I'm getting a bunch of errors in CImg:

../..//deps/CImg/CImg.h: In constructor 'cimg_library::cimg::X11_info::X11_info()':
../..//deps/CImg/CImg.h:2527:22: error: 'XInitThreads' was not declared in this scope
  XInitThreads();
  ^
../..//deps/CImg/CImg.h: In static member function 'static int cimg_library::CImgDisplay::screen_width()':
../..//deps/CImg/CImg.h:7308:45: error: 'XOpenDisplay' was not declared in this scope
  Display *const _dpy = XOpenDisplay(0);
  ^
../..//deps/CImg/CImg.h:7312:27: error: 'XCloseDisplay' was not declared in this scope
  XCloseDisplay(_dpy);

I'm going to step away for a bit, but if anyone else has gotten this working I'd be happy to hear from you.

Python interface problem (fixed)

When I run:
python run_convnet.py

I get the following error:
Traceback (most recent call last):
  File "run_convnet.py", line 57, in <module>
    main()
  File "run_convnet.py", line 28, in main
    board = cn.LockGPU()
  File "/home/fs/ylu/Code/convnet/py/util.py", line 19, in LockGPU
    board = gpu_lock.obtain_lock_id()
  File "/home/fs/ylu/Code/convnet/cudamat/gpu_lock2.py", line 83, in obtain_lock_id
    id = obtain_lock_id_to_hog()
  File "/home/fs/ylu/Code/convnet/cudamat/gpu_lock2.py", line 100, in obtain_lock_id_to_hog
    for id in board_ids():
  File "/home/fs/ylu/Code/convnet/cudamat/gpu_lock2.py", line 30, in board_ids
    p = Popen(['/u/tang/bin/get_num_gpu_boards'], stdout=PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I fixed it by modifying lines 27-32 of convnet/cudamat/gpu_lock2.py. From:

# from glob import glob
# board_devs = glob(_dev_prefix + '[0-9]*')
# return range(len(board_devs))
p = Popen(['/u/tang/bin/get_num_gpu_boards'], stdout=PIPE)
nBoards = int(p.stdout.read())
return range(nBoards)

to:

from glob import glob
board_devs = glob(_dev_prefix + '[0-9]*')
return range(len(board_devs))
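
As a standalone illustration of the glob-based approach, here is a minimal sketch. The value of _dev_prefix is not shown in the post; the default '/dev/nvidia' used here is an assumption (the NVIDIA Linux driver creates device nodes named /dev/nvidia0, /dev/nvidia1, and so on), and the parameter name dev_prefix is hypothetical.

```python
from glob import glob

def board_ids(dev_prefix='/dev/nvidia'):
    """Return board ids 0..N-1, where N is the number of paths that consist of
    dev_prefix followed by a name starting with a digit (e.g. /dev/nvidia0)."""
    board_devs = glob(dev_prefix + '[0-9]*')
    # list(...) so the result is a list under both Python 2 and Python 3.
    return list(range(len(board_devs)))

print(board_ids())  # an empty list on a machine without matching device nodes
```

The pattern '[0-9]*' requires a digit right after the prefix, so control nodes such as /dev/nvidiactl are not counted.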

Display localization

Hi,

How can we configure convnet to display localization?

Thanks
Atiqur

GetName method undefined in pbtxt2dot.py

In pbtxt2dot.py, the Sort method seems to be calling an undefined GetName method. I had to add the method to get it working.

def Sort(model):
  def GetName(edge):
    return '%s:%s' % (edge.source, edge.dest)

  S = []
  L = []

Please correct me if I am wrong. Thanks.
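
For context, the S and L lists suggest that Sort performs a topological sort in the style of Kahn's algorithm; that reading is an assumption, since the rest of the function is not shown here. The sketch below is a generic version under that assumption. The Edge tuple and the topo_sort function are hypothetical stand-ins, and only the 'source:dest' naming comes from the snippet above.

```python
from collections import namedtuple

# Hypothetical stand-in for the model's edge objects (fields source and dest).
Edge = namedtuple('Edge', ['source', 'dest'])

def get_name(edge):
    return '%s:%s' % (edge.source, edge.dest)

def topo_sort(layer_names, edges):
    """Kahn's algorithm: return layer names so that every edge goes forward."""
    remaining = {get_name(e): e for e in edges}
    # S holds layers with no unprocessed incoming edges; L is the sorted output.
    S = [n for n in layer_names if not any(e.dest == n for e in edges)]
    L = []
    while S:
        n = S.pop()
        L.append(n)
        for e in [e for e in remaining.values() if e.source == n]:
            del remaining[get_name(e)]
            if not any(r.dest == e.dest for r in remaining.values()):
                S.append(e.dest)
    if remaining:
        raise ValueError('graph has a cycle')
    return L

edges = [Edge('hidden', 'output'), Edge('input', 'hidden')]
print(topo_sort(['output', 'hidden', 'input'], edges))
```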

/usr/bin/ld: cannot find -leigenmat

video2hdf5.o error

make error:

g++ -L/home/xijing/local/lib -L/usr/local/cuda/lib64 -L/home/xijing/Work/convnet/cudamat -I/home/xijing/local/include -I/usr/local/cuda/include -Isrc -Ideps -DUSE_GEMM obj/image_iterators.o obj/image2hdf5.o obj/util.o -o bin/image2hdf5 -lopencv_core -lopencv_imgcodecs -lopencv_imgproc -lopencv_videoio -lhdf5 -ljpeg -lX11 -lpthread -lprotobuf -lcublas -ldl -lgomp -lcudamat -lcudart -Wl,-rpath=/home/xijing/Work/convnet/cudamat -Wl,-rpath=/home/xijing/local/lib -Wl,-rpath=/usr/local/cuda/lib64 -lcudamat_conv_gemm
make: *** No rule to make target `obj/video2hdf5.o', needed by `bin/video2hdf5'.  Stop.

osx

hi,

Does it run on OS X with an NVIDIA GeForce GT 750M (2 GB GDDR5 memory)?

Thanks,
Gerald

imagenet case

Hi Nitish,

Could you describe in the README how to create a model file from raw JPEG images for ImageNet (Alex's net)?
Despite my efforts, I ran into some problems and could not get it to work.

How to cite it?

I am happily using this awesome beautifully designed CNN tool.

If I publish a paper based on it, how should I cite it?

Unable to compile: undefined references

I'm using g++ 4.6 and CUDA 5.5. I successfully built the libcudamat* files, but the main build fails to find several of the references within them:

g++46 -L/usr/gapps/brain/installs/opencv3beta/lib -L/usr/gapps/brain/installs/generic/lib -L/opt/cudatoolkit-5.5/lib64 -L/g/g13/boakye1/workspace/convnet/cudamat -I/usr/gapps/brain/installs/opencv3beta/include -I/usr/gapps/brain/installs/generic/include -I/opt/cudatoolkit-5.5/include -Isrc -Ideps -I/g/g13/uname/workspace/convnet/cudamat -DUSE_GEMM obj/convnet_config.pb.o obj/util.o obj/matrix.o obj/loss_functions.o obj/layer.o obj/image_iterators.o obj/video_iterators.o obj/datahandler.o obj/datawriter.o obj/optimizer.o obj/edge.o obj/edge_with_weight.o obj/avgpool_edge.o obj/conv_edge.o obj/conv_onetoone_edge.o obj/downsample_edge.o obj/fc_edge.o obj/local_edge.o obj/maxpool_edge.o obj/response_norm_edge.o obj/rgb_to_yuv_edge.o obj/upsample_edge.o obj/convnet.o obj/train_convnet.o obj/multigpu_convnet.o -o bin/train_convnet -lopencv_core -lopencv_imgcodecs -lopencv_imgproc -lopencv_videoio -lhdf5 -ljpeg -lX11 -lpthread -lprotobuf -lcublas -ldl -lgomp -lcudamat -lcudart -Wl,-rpath=/g/g13/uname/workspace/convnet/cudamat -Wl,-rpath=/usr/gapps/brain/installs/opencv3beta/lib -Wl,-rpath=/usr/gapps/brain/installs/generic/lib -Wl,-rpath=/opt/cudatoolkit-5.5/lib64 -lcudamat_conv_gemm
obj/matrix.o: In function `Matrix::~Matrix()':
matrix.cc:(.text+0x9): undefined reference to `destroy_tex'
obj/matrix.o: In function `Matrix::WriteValue(int, int, float)':
matrix.cc:(.text+0x5f5): undefined reference to `write_at'
obj/matrix.o: In function `Matrix::ReadValue(int, int)':
matrix.cc:(.text+0x69a): undefined reference to `read_from'
obj/matrix.o: In function `Matrix::CopyP2PAsync(Matrix&)':
matrix.cc:(.text+0x744): undefined reference to `copy_on_device_p2p_async'

....and so on. I've tried with and without the GEMM kernels but am still unsuccessful. Also, I had to change "-Wno-unused-result" to "-Wno-unused-value" in the Makefile. Any idea what the issue might be?

about 'partial_sum'

Hi Nitish,

(First, I should mention that I have no knowledge of GEMM.)
I don't understand the role of the variable 'partial_sum' in the Edge group.
I understood that the default value '0' works in all cases, but it is very slow.
However, I also ran into the error.
I have tried to work out how this value should be chosen, but I could not find the rule.
What is the rule for it?

Compiling the CPU-only feature extractor

The install instructions say: "Set the paths to the dependencies in convnet/cpu/Makefile". I can't find that file in this version of convnet. Please help me.

make cpu error

This happens when I build the CPU feature tools.
I downloaded the exact versions of protobuf and hdf5 listed in INSTALL, but make gave errors like this:
convnet_config.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
convnet_config.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
convnet_config.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.

Please help.

Assertion failed: (mat->tex_obj != 0), function getTextureObject, file cudamat_conv_util.cu, line 38.

I'm trying to run the code on the GPU of my macbook pro. I got it working over a month ago. But now I'm getting this error:

Extracting features for dataset of size 10 # batches 1 # left overs 0
Writing to output.h5
Adding Dataspace output of size 10 1000
Adding Dataspace hidden7 of size 10 4096
Batch 1
Assertion failed: (mat->tex_obj != 0), function getTextureObject, file cudamat_conv_util.cu, line 38.

The GPU is a GeForce GT 650M, and the docs say it supports textures (CUDA compute capability 3.0).

Any ideas what the issue could be?

Step 1Invalid start / end 0 0

when running
train_convnet --model=net.pbtxt --train=train_data.pbtxt --val=val_data.pbtxt --board 0
or
train_convnet --model=net.pbtxt --train=train_plus_val_data.pbtxt --val=val_data.pbtxt --board 0
this is what we get:

Warning : Could not set up P2P, GPU-to-GPU communication will be slow. CUDA error 
Using board 0
Using board 0
Using board 0
Using board 0
Max for Used for computing average length of incoming weight vectors. 48 on gpu 0
Max for Used for computing average length of incoming weight vectors. 128 on gpu 0
Layer input: 28x28
Layer hidden1_conv: 25x25
Layer hidden1_maxpool: 11x11
Layer hidden2_conv: 8x8
Layer hidden2_maxpool: 3x3
Layer output: 1x1
Training data set size 60000
Validation data set size 10000
Max for shared bias 30000 on gpu 0
Allocating new temp memory of size 30000 on gpu 0
Initialized weight: Dense Uniform. Initial scale 0.981715
Initialized weight: Dense Uniform. Initial scale 0.999742
Initialized weight: Dense Uniform. Initial scale 0.993758
input:hidden1_conv Convolutional Kernel: 4-4-1 : 48 Layer: 28-28 : 25-25
hidden1_conv:hidden1_maxpool MaxPool Kernel: 4-4-48 : 48 Layer: 25-25 : 11-11
hidden1_maxpool:hidden2_conv Convolutional Kernel: 4-4-48 : 128 Layer: 11-11 : 8-8
hidden2_conv:hidden2_maxpool MaxPool Kernel: 4-4-128 : 128 Layer: 8-8 : 3-3
hidden2_maxpool:output Fully Connected :3-3-128:10
Checkpointing at ./checkpoint_dir/mnist_conv_20141127173240
----- SUMMARY ------
databuffer  209.618 MB
edge    0.422585 MB
layer   35.0544 MB
misc    1.07556 MB
ones    2 MB
temp    0.114441 MB
TOTAL   248.285 MB
Step 1Invalid start / end 0 0

Step 1Invalid start / end 0 0 ?

the size of response norm?

In Caffe and cuda-convnet, there is "size" parameter in the local response norm layer.
In this code, there is "frac_of_filters_response_norm" in the RESPONSE_NORM Edge.

How are the two parameters, "size" and "frac_of_filters_response_norm", related? I think
size = number of total neurons * frac_of_filters_response_norm

Is that correct?

make error

/usr/local/CImg-1.5.9/CImg.h:269:21: error: jpeglib.h: No such file or directory
It seems CImg depends on jpeglib. Where can I download it? It isn't mentioned in your INSTALL file.

Warning : Could not set up P2P, GPU-to-GPU communication will be slow. CUDA error

Since we only have one GPU, is it safe to ignore this warning?

How to train the convnet on multiGPUs?

Excuse me, I have some questions about convnet.
I tried to train the convnet on multiple GPUs using "boards=01"; however, training ran on only one board rather than two. I checked with "nvidia-smi" and found that there are two processes but only one of the boards is used.
So I would like to know how to train the model with data parallelism and model parallelism on multiple GPUs. Do I have to modify something?
My mailbox is [email protected] and I look forward to your response.
Thank you very much.

Assertion error about targets.shape[3] during Backprop.

Hi Nitish,

I got the following error when running train_convnet:
"train_convnet: cudamat_conv_gemm.cu:691: void convOutpGemm(cudamat*, cudamat*, cudamat*, Shape4D, Shape4D, Shape4D, ConvDesc, float, float, bool): Assertion `num_input_channels * filterModuleMult == num_input_channels3Mult' failed."

After some analysis, I found that the error occurs in the part below.

void ConvEdge::ComputeOuter(Matrix& input, Matrix& deriv_output) {
  ...
  if (partial_sum_locs > 1) {
    ...
  } else {
    ...
  }
}

The shape[3] element of dt_temp (the targets argument) looks strange...
When I change
'if (partial_sum_locs > 1) { --> if (0) {'
the problem no longer occurs.

Can we get some walk through on how to train CNN?

Hi,

I want to train using my own dataset. Can you please point me to a walkthrough on how to train and (later) validate the CNN?

Windows support

I have been using Nitish's deepnet toolbox (https://github.com/nitishsrivastava/deepnet) for almost a year and installed it on my Windows machine without much difficulty. I have now noticed ConvNet, a very nice toolbox for CNNs that is similar to deepnet, so I am wondering: is it easy to install ConvNet on Windows?

meaning of variables in RESPONSE_NORM type edge

Hi.
How should I set {k, n} from the equation in [1] using the variables in your net-model file?
(I understand that alpha is 'add_scale' and beta is 'pow_scale'.)

[1] '3.3 Local Response Normalization' in Krizhevsky et al.'s paper.

I have also found that the assertion error I mentioned before occurs in the RESPONSE_NORM edge.
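
For reference, the equation in [1] can be written out as the short sketch below, which normalizes the activations of N kernels at one spatial position. It illustrates the paper's formula only and is not this repository's implementation: the function name local_response_norm is hypothetical, the defaults are the values reported in the paper, and taking n/2 as integer division is an assumption. How k and n map onto the net-model variables is the open question of this issue and is not answered here.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across kernels at one spatial position.

    b[i] = a[i] / (k + alpha * sum_j a[j]**2) ** beta,
    where j runs over max(0, i - n//2) .. min(N - 1, i + n//2).
    """
    N = len(a)
    half = n // 2  # assumption: n/2 in the paper is taken as integer division
    out = []
    for i in range(N):
        lo = max(0, i - half)
        hi = min(N - 1, i + half)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out

# Small example with easy-to-check constants (not the paper's defaults).
print(local_response_norm([1.0, 2.0, 3.0], k=2.0, n=3, alpha=1.0, beta=1.0))
```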
