
Comments (4)

hmmlillian avatar hmmlillian commented on July 17, 2024

@dorpxam Hello, do you mean the gradient did not update in the deconv function with your CPU-based L-BFGS?
In the previous GPU-based implementation (deconv.cpp), diff (float* diff = m_classifier->net_->blob_by_name(m_layer1)->mutable_gpu_diff();) is a GPU pointer to the gradient blob, whose data is updated by the subsequent subtraction (caffe_gpu_sub(m_num1, src, m_dy, diff);).
In your code, before calling m_classifier->net_->BackwardFromTo(m_id1, m_id2 + 1), have you updated the gradient values diff on the GPU?

from deep-exemplar-based-colorization.

dorpxam avatar commented on July 17, 2024

With my change, the cost function becomes:

void my_cost_function::cpu_f_gradf(const floatdouble *h_x, floatdouble *h_f, floatdouble *h_gradf)
{
	// Forward pass up to the target layer.
	m_classifier->net_->ForwardFromTo(m_id2 + 1, m_id1);

	// diff = src - m_dy: the gradient of the L2 loss at m_layer1.
	const float* src = m_classifier->net_->blob_by_name(m_layer1)->cpu_data();
	float* diff = m_classifier->net_->blob_by_name(m_layer1)->mutable_cpu_diff();
	caffe_sub(m_num1, src, m_dy, diff);

	// Loss value: sum of the squared differences.
	float* diff2 = (float*)malloc(m_num1 * sizeof(float));
	caffe_mul(m_num1, diff, diff, diff2);
	float total = caffe_cpu_asum(m_num1, diff2);

	// Backward pass to propagate the gradient down to m_layer2.
	m_classifier->net_->BackwardFromTo(m_id1, m_id2 + 1);

	const float* diff3 = m_classifier->net_->blob_by_name(m_layer2)->cpu_diff();
	memcpy(h_gradf, diff3, m_num2 * sizeof(float));
	memcpy(h_f, &total, sizeof(float));

	free(diff2);
}
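As a sanity check on the math in cpu_f_gradf: the three BLAS calls compute diff = src - dy and f = Σ (src - dy)², since caffe_cpu_asum over the squared (hence non-negative) vector is simply its sum. A minimal stand-alone sketch of that computation, with the Caffe wrappers replaced by plain loops (the names src, dy, diff here are illustrative, not Caffe API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-loop equivalent of caffe_sub + caffe_mul + caffe_cpu_asum:
// writes diff = src - dy and returns the sum of squared differences.
float sum_squared_diff(const std::vector<float>& src,
                       const std::vector<float>& dy,
                       std::vector<float>& diff) {
    float total = 0.0f;
    for (std::size_t i = 0; i < src.size(); ++i) {
        diff[i] = src[i] - dy[i];     // caffe_sub: diff = src - dy
        total  += diff[i] * diff[i];  // caffe_mul then caffe_cpu_asum
    }
    return total;
}
```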

It is launched in the deconv method by:

my_cost_function func (classifier, m_layer1, d_y, num1, m_layer2, num2, id1, id2);

lbfgs solver (func);

lbfgs::status s = solver.cpu_lbfgs(d_x);

std::cout << solver.statusToString(s) << std::endl;

diff is correctly modified, but (as you can see in the log) the Forward/Backward pass seems broken somewhere and does not propagate the change. I don't think it is a model-loading problem, because apart from the hack in the math function files (for cuBLAS v1), Caffe is the version cloned from your hub with all the NuGet dependencies.

Thank you for your help.


hmmlillian avatar hmmlillian commented on July 17, 2024

@dorpxam Thanks for sharing your code.
I think the major problem in your code is caused by unsynchronized CPU and GPU memory. Caffe tracks one of four states for each blob's memory: UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, and SYNCED. In your situation the state should be SYNCED to ensure data consistency between CPU and GPU, but calling mutable_cpu_diff() switches it to HEAD_AT_CPU. As a result, only the CPU data is updated and the GPU copy becomes stale. You may check the state before and after. Maybe async_gpu_push can be used to copy data from CPU to GPU before calling BackwardFromTo(m_id1, m_id2 + 1). More details can be found in the Caffe source code (https://github.com/BVLC/caffe/blob/master/src/caffe/syncedmem.cpp).
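For intuition, the head-state bookkeeping can be mocked in a few lines. This is a toy model of the state machine only, not Caffe's actual SyncedMemory class; the names and methods below are illustrative:

```cpp
#include <cassert>

// Toy model of Caffe's SyncedMemory head states (see syncedmem.cpp).
// Only the state transitions are modeled; there are no real buffers.
enum Head { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

struct ToySyncedMem {
    Head head = UNINITIALIZED;

    // mutable_cpu_diff()-style access: the caller may write the CPU side,
    // so the GPU copy must be treated as stale.
    void mutable_cpu() { head = HEAD_AT_CPU; }

    // gpu_data()-style read access: if the CPU side is newer (or nothing
    // is initialized), a CPU->GPU copy would run and the state syncs.
    void to_gpu() {
        if (head == HEAD_AT_CPU || head == UNINITIALIZED) head = SYNCED;
    }
};
```

The failure mode in the thread is visible here: after mutable_cpu() the state is HEAD_AT_CPU, and unless something triggers the CPU-to-GPU copy before the backward pass, the GPU kernels keep reading the stale copy.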
BTW, the input data type should be float instead of floatdouble.
Hope this could be helpful.


dorpxam avatar commented on July 17, 2024

@hmmlillian Thanks for your answer.
I understand. As I said before, I initialize Caffe in CPU mode, so the trained data is loaded into cpu_ptr memory blobs. The classifiers were modified to load the different memory vectors (data_A / data_AP / data_B / data_BP) in CPU memory, mutable or copied. Because of my limited GPU memory, I only move the CPU memory to the GPU when the block/thread CUDA methods in the deep_image_analogy functions need it. The DeepAnalogy.cu file, in pseudo code:

classifier_A -> load all layers on CPU from trained data into vectors of A / AP
classifier_B -> load all layers on CPU from trained data into vectors of B / BP

No change to ANN/ANND -> host on CPU / device on GPU

for each layer // from 32 to 512
{
    memcpy current layer (A/AP/B/BP) to GPU

    no change in the GPU process for:
    init/upsample -> norm -> blend -> norm -> patchmatch

    process avg_vote on GPU too and move the target result to CPU

    copy A/AP/B/BP back to CPU

    launch deconv on CPU twice, for data_AP[next_layer] and data_B[next_layer]

    free CUDA memory for the current layer of A/AP/B/BP
}
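The staging loop above can be sketched host-only as follows; a std::vector stands in for cudaMalloc'd device memory, plain copies stand in for cudaMemcpy, and process() stands in for the GPU pipeline (norm -> blend -> patchmatch -> avg_vote). All names here are illustrative, not from the project:

```cpp
#include <cassert>
#include <vector>

// Host-only sketch of the per-layer upload / process / download pattern.
struct LayerStager {
    std::vector<float> device;  // stand-in for GPU memory

    void stage_up(const std::vector<float>& host) {
        device = host;          // stand-in for cudaMemcpyHostToDevice
    }
    void process() {
        // GPU kernels would run here; placeholder transform instead.
        for (float& v : device) v *= 2.0f;
    }
    void stage_down(std::vector<float>& host) {
        host = device;          // stand-in for cudaMemcpyDeviceToHost
        device.clear();         // stand-in for cudaFree on this layer
    }
};
```

The point of the pattern is that device memory only ever holds one layer at a time, which is what keeps the footprint within a small GPU.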

So the memory only goes to the GPU temporarily, just to process the different layers and run the code in "GeneralizedPatchMatch.cu" without any change.

Note that floatdouble is not typedef'd as double but as float, of course ;)

I will investigate the question of async memory and the states as you said.

In any case, thanks a lot for your help.

