megvii-research / iccv2019-learningtopaint

ICCV2019 - Learning to Paint With Model-based Deep Reinforcement Learning

License: MIT License

Python 89.95% Jupyter Notebook 10.05%

deep-learning reinforcement-learning computer-vision pytorch painting

iccv2019-learningtopaint's Introduction

ICCV2019-Learning to Paint

arXiv | YouTube | Reddit | Slides (Chinese) | Replicate

Zhewei Huang, Wen Heng, Shuchang Zhou

Abstract

We show how to teach machines to paint like human painters, who can use a small number of strokes to create fantastic paintings. By employing a neural renderer in model-based Deep Reinforcement Learning (DRL), our agents learn to determine the position and color of each stroke and make long-term plans to decompose texture-rich images into strokes. Experiments demonstrate that excellent visual effects can be achieved using hundreds of strokes. The training process does not require the experience of human painters or stroke tracking data.

You can easily try it out on Google Colaboratory.

Dependencies

pip3 install torch==1.1.0
pip3 install tensorboardX
pip3 install opencv-python

Testing

Make sure renderer.pkl and actor.pkl are present before testing.

You can download a trained neural renderer and a CelebA actor for testing: renderer.pkl and actor.pkl

$ wget "https://drive.google.com/uc?export=download&id=1-7dVdjCIZIxh8hHJnGTK-RA1-jL1tor4" -O renderer.pkl
$ wget "https://drive.google.com/uc?export=download&id=1a3vpKgjCVXHON4P7wodqhCgCMPgg1KeR" -O actor.pkl
$ python3 baseline/test.py --max_step=100 --actor=actor.pkl --renderer=renderer.pkl --img=image/test.png --divide=4
$ ffmpeg -r 10 -f image2 -i output/generated%d.png -s 512x512 -c:v libx264 -pix_fmt yuv420p video.mp4 -q:v 0 -q:a 0
(makes a video of the painting process)

We also provide some other neural renderers and agents; you can use them instead of renderer.pkl to train the agent:

triangle.pkl --- actor_triangle.pkl;

round.pkl --- actor_round.pkl;

bezierwotrans.pkl --- actor_notrans.pkl

We also provide a Baidu Netdisk (百度网盘) mirror. Link: https://pan.baidu.com/s/1GELBQCeYojPOBZIwGOKNmA Extraction code: aq8n

Training

Datasets

Download the CelebA dataset and put the aligned images in data/img_align_celeba/******.jpg

Neural Renderer

To create a differentiable painting environment, we first need to train the neural renderer.

$ python3 baseline/train_renderer.py
$ tensorboard --logdir train_log --port=6006
(The training process will be shown at http://127.0.0.1:6006)

Paint Agent

After the neural renderer looks good enough, we can begin training the agent.

$ cd baseline
$ python3 train.py --max_step=40 --debug --batch_size=96
(By default, a step contains 5 strokes.)
$ tensorboard --logdir train_log --port=6006

Resources

QbitAI (量子位) coverage

Learning to Paint: A Painting AI

Megvii Research Introduces a Painting Agent Based on Deep Reinforcement Learning

Our ICCV poster
Our ICCV rebuttal for reviewers

Contributors

hzwer
ak9250

Also many thanks to ctmakro for inspiring this work. He also explored using a greedy algorithm to generate paintings - opencv_playground.

If you find this repository useful for your research, please cite the following paper:

@inproceedings{huang2019learning,
  title={Learning to paint with model-based deep reinforcement learning},
  author={Huang, Zhewei and Heng, Wen and Zhou, Shuchang},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}

iccv2019-learningtopaint's Issues

notrans renderer weights loading error

Hello @hzwer
I have a problem loading the notrans renderer weights when using the test.py file.
Here is a screenshot of the error.

Do I need to change anything in the code?

confused by the update policy.

In update_policy():

cur_q, step_reward = self.evaluate(state, action)
target_q += step_reward.detach()
value_loss = criterion(cur_q, target_q)

It's quite confusing. So is value_loss = discount * (self.critic(S_t+1) + reward(S_t+1)) - self.critic(S_t)?

Shouldn't it be: value_loss = discount * self.critic(S_t+1) + reward(S_t) - self.critic(S_t)?
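To make the two readings concrete, the toy sketch below pushes scalars through the same three statements quoted above. It assumes (our interpretation, not confirmed by the authors) that the critic's output for (s_t, a_t) plays the role of V(s_{t+1}), the value of the canvas after the strokes are rendered, so `evaluate()` returns r_t + V(s_{t+1}) as described in the "Question about Q value" issue. The terminal handling is also assumed.

```python
def critic_loss(r_t, v_next, r_next, v_next_next, discount, terminal=0.0):
    """Squared TD error in the form used by update_policy (scalars for clarity).

    Assumption: v_next is the critic's output for the canvas after step t,
    v_next_next is the target critic's output one step later.
    """
    # cur_q, step_reward = self.evaluate(state, action)  ->  Q = r_t + V(s_{t+1})
    cur_q = r_t + v_next
    # target_q from evaluate(next_state, next_action, True), then discounted
    target_q = discount * (1.0 - terminal) * (r_next + v_next_next)
    # target_q += step_reward.detach()
    target_q += r_t
    return (cur_q - target_q) ** 2


# r_t appears on both sides, so it cancels out of the critic loss:
loss = critic_loss(r_t=0.3, v_next=1.0, r_next=0.2, v_next_next=0.5, discount=0.9)
same = (1.0 - 0.9 * (0.2 + 0.5)) ** 2
print(abs(loss - same) < 1e-12)  # True
```

Under this reading the critic is pushed towards V(s_{t+1}) ≈ γ(r_{t+1} + V(s_{t+2})), which is the second formula above shifted by one step. Adding the reward to Q presumably still matters for the actor update, where the reward is differentiable through the neural renderer.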

Critic and discriminator

Hi!
I am trying to understand the deep reinforcement learning part. I know that the actor outputs a set of stroke parameters based on the canvas state and the target image, and that the discriminator gives the actor a reward at each step. But what about the critic? What are its input and output? I am reading the paper but I do not understand this part.
thank you so much
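The sketch below shows how we read the actor and critic inputs in ddpg.py; the channel order and the 9/12 channel counts are our assumptions, based on the 7-channel observation in the training log of the "Run out of memory" issue (canvas RGB, target RGB, step counter) plus two constant coordinate channels. Only array shapes are built here; no networks are run.

```python
import numpy as np

B, H, W = 4, 128, 128
max_step = 40
canvas0 = np.zeros((B, 3, H, W), dtype=np.float32)   # canvas before the strokes
gt = np.zeros((B, 3, H, W), dtype=np.float32)        # target image
step = np.full((B, 1, H, W), 3, dtype=np.float32)    # current step index T

# Two constant coordinate channels (row and column position, scaled to [0, 1]).
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
coord = np.broadcast_to(np.stack([ys, xs])[None], (B, 2, H, W))

# Actor input: current canvas, target, progress T / max_step, coordinates.
# Its output would be the stroke parameters for this step (not computed here).
actor_in = np.concatenate([canvas0, gt, step / max_step, coord], axis=1)

# Critic input (model-based version, as we read it): also the canvas *after* the
# action was rendered by the neural renderer; its output is a single scalar.
canvas1 = canvas0.copy()  # placeholder for decode(action, canvas0)
critic_in = np.concatenate([canvas0, canvas1, gt, (step + 1) / max_step, coord], axis=1)
print(actor_in.shape, critic_in.shape)  # (4, 9, 128, 128) (4, 12, 128, 128)
```

The reward itself comes from the discriminator comparing canvas0 and canvas1 with the target, separately from the critic.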

Run out of memory

When I ran train.py on a GPU, the machine seems to have run out of RAM. My computer has 46 GB of memory in total, including 30 GB of virtual memory (swap).

$ python3 baseline/train.py --max_step=200 --debug --batch_size=96
mkdir: cannot create directory ‘./model’: File exists
loaded 10000 images
loaded 20000 images
loaded 30000 images
loaded 40000 images
loaded 50000 images
loaded 60000 images
loaded 70000 images
loaded 80000 images
loaded 90000 images
loaded 100000 images
loaded 110000 images
loaded 120000 images
loaded 130000 images
loaded 140000 images
loaded 150000 images
loaded 160000 images
loaded 170000 images
loaded 180000 images
loaded 190000 images
loaded 200000 images
finish loading data, 197999 training images, 2001 testing images
observation_space (96, 128, 128, 7) action_space 13
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:157: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s0 =torch.tensor(self.state, device='cpu')
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:163: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s1 =torch.tensor(state, device='cpu')
 #0: steps:200 interval_time:9.08 train_time:0.00
 #1: steps:400 interval_time:22.40 train_time:0.00
 #2: steps:600 interval_time:19.66 train_time:6.90
 #3: steps:800 interval_time:20.01 train_time:5.28
 #4: steps:1000 interval_time:20.89 train_time:6.01
 #5: steps:1200 interval_time:20.52 train_time:6.34
 #6: steps:1400 interval_time:18.20 train_time:7.01
Killed

Here's the memory usage (in MB):

              total        used        free      shared  buff/cache   available
Mem:          15892       15627         139          11         125          81
Swap:         30273       30273           0
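A back-of-the-envelope estimate helps explain the numbers above. Assuming each CelebA image is kept in memory as a resized 128x128x3 uint8 array (as load_data() does), the 200,000 loaded images alone need roughly 9 GiB, more than half of the ~15.9 GB of physical RAM reported, before the replay buffer and PyTorch are counted.

```python
# RAM needed just to hold the loaded images as uint8 arrays.
# Assumption: every image is stored resized to 128x128 with 3 channels.
num_images = 200_000
bytes_per_image = 128 * 128 * 3          # 49,152 bytes
total_gib = num_images * bytes_per_image / 2**30
print(f"{total_gib:.2f} GiB")            # 9.16 GiB
```

Loading fewer images (for example, changing the range in load_data()) is one straightforward way to reduce this, at the cost of a smaller training set.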

What is the difference between `baseline` and `baseline_modelfree`?

Hi, I have a question about your directory structure.

What is the difference between baseline and baseline_modelfree?

They look the same to me.

Could you explain the difference?

Renderer input features

Hi @hzwer ,
Could you clarify whether the input feature of the neural renderer is a 10-value vector or a 13-value vector (+RGB)?
If it is trained with a 10-value vector, how can the painter generate color pictures?

Bests,

cleanup

While I was reviewing your code to better understand your paper, I found some dead code. Would you mind if I cleaned up some code, added some instructive comments (for people like me), and sent a PR?

The default setting only supports L2 loss as reward

Hi, it seems the default training setting only supports L2 loss as the reward; switching it to the WGAN loss is somewhat non-trivial.

Any other reward function

Very cool project! It seems using GAN loss here is a natural choice to compare the drawing and images. Have you ever tried other losses like the perceptual loss? Thank you!

How to change strokes?

README.md says that by default a step contains 5 strokes. When I train another model, where can I change the number of strokes?
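As we read the code (not confirmed by the authors), each stroke has 13 parameters (the `b * (10 + 3)` comment on decode: 10 shape values plus RGB; the training log also reports action_space 13), and the actor emits all strokes of one step as one flat vector of 5 × 13 = 65 values. Changing the number of strokes per step would then mean changing the actor's output size and the per-stroke reshape in decode consistently. The NumPy sketch below only illustrates that reshape; `split_strokes` is a hypothetical helper.

```python
import numpy as np

PARAMS_PER_STROKE = 13  # 10 shape parameters + 3 colour values (assumed layout)

def split_strokes(action, strokes_per_step):
    """Hypothetical helper: reshape (batch, k * 13) into (batch, k, 13)."""
    batch = action.shape[0]
    if action.shape[1] != strokes_per_step * PARAMS_PER_STROKE:
        raise ValueError("action size does not match strokes_per_step * 13")
    return action.reshape(batch, strokes_per_step, PARAMS_PER_STROKE)

k = 5                                               # strokes per step
action = np.random.rand(2, k * PARAMS_PER_STROKE)   # e.g. actor output for 2 canvases
strokes = split_strokes(action, k)
print(strokes.shape)  # (2, 5, 13)
```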

Question about Q value

I love this amazing project. I'm surprised that neural networks can do such incredible things.
I have a small question about the Q value. In the paper, cur_q = reward + γ * target_q, so normally evaluate() should "return Q, gan_reward". This is indeed the case in the model-free method. But in the model-based method it is "return (Q+gan_reward), gan_reward", which confuses me. Why does the Q value need to be added to the reward of the same step?

training

When running train.py with CelebA, it automatically gets interrupted:

loaded 200000 images
finish loading data, 197999 training images, 2001 testing images
observation_space (96, 128, 128, 7) action_space 13
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/content/LearningToPaint/baseline/DRL/ddpg.py:158: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s0 = torch.tensor(self.state, device='cpu')
/content/LearningToPaint/baseline/DRL/ddpg.py:161: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s1 = torch.tensor(state, device='cpu')
^C

Also, I am on a GPU.

How is the WGAN trained?

Thank you for sharing your awesome work!

In your paper, you mentioned using the WGAN discriminator loss to define the reward.

But how is the WGAN trained in your work? (Is it pre-trained to some extent beforehand and then used in cal_dis?)

Output strokes per iteration only.

Hi,
thank you for your work.

I am struggling to modify the code so that running python3 baseline/test.py --max_step=80 --actor=actor.pkl --renderer=renderer.pkl --img=path_to_image --divide=5 generates images containing only the strokes added during the latest iteration, instead of the sum of all strokes.

Do you have any ideas on this?

Thank you

Neural Renderer

Hello !

I want to understand the neural renderer (DL network) part. How did you train this neural renderer?
If there is a dataset, please provide a link to it.
Did you use a traditional rendering algorithm here? (If so, how?)

Thank you

`python3 baseline/train_renderer.py` call fails

Does your code include action batching?

Does your code include the action batching mentioned in your paper? From reading your code, I don't think action batching is implemented.

Role of constant coordinates in merged state?

Hi,

Can you please clarify why the merged state for the actor and the critic takes the constant coordinate images as input?

Thanks!

Renderer Training Doubts

Hi, I have a question about how the neural renderer is trained.
Consider a generic ML/DL training procedure: we fix train and validation sets of fixed size, backpropagate on the training set, and evaluate on the validation set. Here, however, you randomly generate batches of 64 for both training and validation (validation after every 1000th iteration, if I recall correctly) and train for 500,000 epochs. I find this confusing: the randomly generated samples could vary drastically across epochs, so how do you ensure that the model improves? Are you simply trying to overfit the model to all possible combinations of coordinates on the canvas? I want to understand why you chose this approach.

Thanks
Niharika
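One way to see the approach described in this question: when the ground truth comes from a procedural rasterizer (stroke_gen.py in the repository), the "dataset" is effectively the whole parameter space, and every batch is a fresh sample from it. Error on freshly sampled validation batches then estimates error over that whole distribution, so memorizing a fixed set is not the goal. The NumPy sketch below illustrates only the sampling pattern; `draw_disk` is a hypothetical stand-in rasterizer, not the repository's stroke generator, and the model update is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_disk(params, size=32):
    """Hypothetical stand-in rasterizer: a disk with centre (x, y) and radius r in [0, 1]."""
    x, y, r = params
    yy, xx = np.mgrid[0:size, 0:size] / (size - 1)
    return ((xx - x) ** 2 + (yy - y) ** 2 <= (0.05 + 0.3 * r) ** 2).astype(np.float32)

def sample_batch(batch_size, n_params=3):
    # Fresh random parameters on every call: no fixed training set is ever stored.
    params = rng.random((batch_size, n_params))
    targets = np.stack([draw_disk(p) for p in params])
    return params, targets

for it in range(3):
    params, targets = sample_batch(64)
    # A real loop would update the renderer here, minimising
    # ((renderer(params) - targets) ** 2).mean() on this batch.
    print(it, params.shape, targets.shape)
```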

About stroke generation

In stroke_gen.py you use a quadratic Bezier curve to generate strokes. I wonder why (x1, y1) is calculated from (x0, y0) and (x2, y2):

x1 = x0 + (x2 - x0) * x1
y1 = y0 + (y2 - y0) * y1

What would happen if I commented out these 2 lines?
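One reading of the two lines (ours, not the authors'): they turn (x1, y1) from an absolute position into a fraction of the way from the start point to the end point on each axis. Because a quadratic Bezier curve lies inside the convex hull of its three control points, the stroke then never leaves the box spanned by its endpoints. The NumPy sketch below compares the two cases for illustrative parameter values.

```python
import numpy as np

def quad_bezier(p0, p1, p2, n=100):
    """Sample n points of B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 for t in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

# Raw parameters in [0, 1], as an agent might output them (illustrative values).
x0, y0, x1, y1, x2, y2 = 0.1, 0.2, 0.9, 0.0, 0.5, 0.8
p0, p2 = np.array([x0, y0]), np.array([x2, y2])

# With the two quoted lines: the control point stays inside the endpoint box.
ctrl = np.array([x0 + (x2 - x0) * x1, y0 + (y2 - y0) * y1])
curve = quad_bezier(p0, ctrl, p2)

# Without them: (x1, y1) is used as an absolute position and can pull the
# curve outside that box.
raw_curve = quad_bezier(p0, np.array([x1, y1]), p2)
print(curve[:, 1].min(), raw_curve[:, 1].min())
```

So commenting the lines out would presumably allow more strongly bent strokes; a renderer trained on the original parameterization would then likely need retraining, since its inputs would mean something different.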

Divide parameter and k=5

Hello :)

I have a few questions...

I have seen that the algorithm defines a "divide" parameter, which divides the canvas into smaller canvases in order to improve the agent's accuracy. But I would like to understand when this happens during training (what the steps are). When the actor is about to make a stroke, is the canvas divided and then reconstructed?

I have also seen that for each state the actor performs 5 actions (brush strokes), and I understand that the discriminator gives the reward to the actor. But what about the critic? Is Q updated for each of the five actions?

Thank you very much in advance
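As far as we can tell from test.py (please verify against the source), `divide` is used only at test time, not during training: the whole image is painted first, then the canvas is upscaled, split into divide × divide patches that are painted as one batch, and the patches are stitched back together. The NumPy sketch below shows only the split/stitch step; the helper names mirror functions we believe exist in test.py, but treat them as illustrative.

```python
import numpy as np

def large2small(img, divide, width):
    """(divide*width, divide*width, C) -> (divide*divide, width, width, C) patches."""
    c = img.shape[-1]
    x = img.reshape(divide, width, divide, width, c)   # (row, h, col, w, C)
    x = x.transpose(0, 2, 1, 3, 4)                     # (row, col, h, w, C)
    return x.reshape(divide * divide, width, width, c)

def small2large(patches, divide, width):
    """Inverse of large2small: stitch the painted patches back into one image."""
    c = patches.shape[-1]
    x = patches.reshape(divide, divide, width, width, c)  # (row, col, h, w, C)
    x = x.transpose(0, 2, 1, 3, 4)                        # (row, h, col, w, C)
    return x.reshape(divide * width, divide * width, c)

img = np.arange(4 * 4 * 1).reshape(4, 4, 1)
patches = large2small(img, divide=2, width=2)   # 4 patches of 2x2
restored = small2large(patches, divide=2, width=2)
print(np.array_equal(restored, img))  # True
```

On the second question, our reading of evaluate() is that all five strokes of a step are rendered together and the critic scores the resulting canvas, so there is one Q estimate per step rather than one per stroke.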

What is the purpose of training the neural renderer?

Decoding of strokes

In the decode function, the strokes are rendered from their parameters and added to the canvas:

ICCV2019-LearningToPaint/baseline/DRL/ddpg.py

Line 26 in 5f62ffc

def decode(x, canvas): # b * (10 + 3)

I've got a couple of questions regarding the procedure. Why does the decoder return

ICCV2019-LearningToPaint/baseline/Renderer/model.py

Line 34 in 5f62ffc

return 1 - x.view(-1, 128, 128)

? It is trained by comparing against the ground truth, so why should it learn the inverse instead of the actual image?

Why is the stroke then

ICCV2019-LearningToPaint/baseline/DRL/ddpg.py

Line 28 in 5f62ffc

stroke = 1 - Decoder(x[:, :10])

And why is it added to the canvas via

ICCV2019-LearningToPaint/baseline/DRL/ddpg.py

Line 36 in 5f62ffc

canvas = canvas * (1 - stroke[:, i]) + color_stroke[:, i]

I don't understand why you apply 1 - stroke at every step in this chain. Also, the canvas is initialized to all zeros. Is the canvas * (1 - stroke[:, k]) term in canvas = canvas * (1 - stroke[:, k]) + color_stroke[:, k] really necessary? The stroke is already included in color_stroke.

Am I missing something? Thanks for any help!
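One plausible reading (ours, not confirmed by the authors): the renderer's training targets are stored inverted, so `1 - Decoder(...)` yields a coverage mask alpha in [0, 1], and the update canvas * (1 - alpha) + alpha * color is standard "over" alpha compositing. The (1 - alpha) factor has no effect only on the first stroke, while the canvas is still zero; from the second stroke on, it lets new paint cover old paint instead of adding to it. The NumPy sketch below uses a hypothetical `composite` helper on a tiny canvas.

```python
import numpy as np

def composite(canvas, alpha, color):
    """'Over' compositing of one stroke (hypothetical helper).

    canvas: (H, W, 3) current canvas
    alpha:  (H, W) stroke coverage in [0, 1], playing the role of 1 - Decoder(params)
    color:  (3,) RGB colour of the stroke
    """
    color_stroke = alpha[..., None] * color          # corresponds to color_stroke
    return canvas * (1 - alpha[..., None]) + color_stroke

canvas = np.zeros((4, 4, 3))
alpha = np.zeros((4, 4))
alpha[1:3, 1:3] = 1.0                                # opaque 2x2 square
red, blue = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])

canvas = composite(canvas, alpha, red)               # first stroke
canvas = composite(canvas, alpha, blue)              # second stroke on top
print(canvas[1, 1])  # [0. 0. 1.] -> blue covers red
```

Without the (1 - alpha) factor, the second stroke would give [1. 1. 0.]-style sums (here red + blue, i.e. [1. 0. 1.]) instead of occluding the first.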

How to make L2 rewards work?

I tried using the L2 reward at ddpg.py line 102 and disabled the WGAN optimization, but after the same number of iterations this painter is not as good as with the WGAN reward.
How do you make the L2 reward work?
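For reference, an L2 step reward of the kind described here can be written as the drop in mean squared error caused by one step's strokes. The NumPy sketch below assumes canvases in the (B, 3, H, W) layout with values in [0, 1]; it is an illustration, not the repository's code.

```python
import numpy as np

def l2_distance(canvas, target):
    """Mean squared error per image; arrays have shape (B, 3, H, W)."""
    return ((canvas - target) ** 2).mean(axis=(1, 2, 3))

def l2_step_reward(canvas0, canvas1, target):
    # Positive when the strokes moved the canvas closer to the target.
    return l2_distance(canvas0, target) - l2_distance(canvas1, target)

gt = np.ones((1, 3, 2, 2))
before = np.zeros_like(gt)          # blank canvas
after = np.full_like(gt, 0.5)       # canvas after one step
print(l2_step_reward(before, after, gt))  # [0.75]
```

One factor worth checking, which we have not verified, is scale: per-step L2 improvements late in an episode can be very small numbers, so reward scaling or learning rates may need retuning compared with the discriminator-based reward.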

some typos

I noticed some typos in your paper:

Equation 3 has a dangling parenthesis at the far right:

$V(s_t) = r(s_t, a_t) + \gamma V(s_{t+1}))$

Suggested fix:

$V(s_t) = r(s_t, a_t) + \gamma V(s_{t+1})$

On page 5, the first sentence of the last paragraph:

The neural renderer network is consisting of several fully connect layers and convolution layers

Suggested fix:

The neural renderer network is consisting of several fully connected layers and convolution layers

Hope it helps :)

Undefined name 'init' in actor.py

flake8 testing of https://github.com/hzwer/LearningToPaint on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./baseline/DRL/actor.py:17:9: F821 undefined name 'init'
        init.xavier_uniform(m.weight, gain=np.sqrt(2))
        ^
./baseline/DRL/actor.py:18:9: F821 undefined name 'init'
        init.constant(m.bias, 0)
        ^
2     F821 undefined name 'init'
2

E901, E999, F821, F822, and F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations": useful for readability, but they do not affect runtime safety.

F821: undefined name name
F822: undefined name name in __all__
F823: local variable name referenced before assignment
E901: SyntaxError or IndentationError
E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
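A minimal sketch of a fix, assuming the function around lines 17-18 of actor.py is a standard `weights_init` helper (only those two lines appear in the report): import `torch.nn.init` and use the in-place, underscore-suffixed initializers, since `init.xavier_uniform` and `init.constant` are deprecated aliases in PyTorch. The `isinstance` check and the toy network are our own illustration.

```python
import numpy as np
import torch.nn as nn
from torch.nn import init  # the import the flake8 report says is missing

def weights_init(m):
    """Xavier-initialise conv layers (sketch; the layer check is an assumption)."""
    if isinstance(m, nn.Conv2d):
        init.xavier_uniform_(m.weight, gain=np.sqrt(2))
        if m.bias is not None:
            init.constant_(m.bias, 0)

# Usage: apply recursively to every submodule of a network.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 1, 3))
net.apply(weights_init)
```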

stroke sequence learning

@hzwer
Can this method be trained not only to paint, but also to paint in a certain sequence?
I am interested in training a network to learn the sequence and order of the drawing and its strokes.
Any suggestions?

Question about img_test in load_data

def load_data(self):
    # CelebA
    global train_num, test_num
    for i in range(200000):
        img_id = '%06d' % (i + 1)
        try:
            img = cv2.imread('/data/CelebA/celeba/img_align_celeba/' + img_id + '.jpg', cv2.IMREAD_UNCHANGED)
            img = cv2.resize(img, (width, width))
            if i > 2000:
                train_num += 1
                img_train.append(img)
            else:
                test_num += 1
                img_test.append(img)
        finally:
            if (i + 1) % 10000 == 0:
                print('loaded {} images'.format(i + 1))
    print('finish loading data, {} training images, {} testing images'.format(str(train_num), str(test_num)))

In the load_data function in env.py, the first images (indices 0 to 2000) are appended to the img_test list. Where are these test images used? I would like to use these roughly 2,000 images to evaluate the model quantitatively; how should I do that? test.py only tests a single image.
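One possible approach, sketched under assumptions: wrap the painting loop from test.py in a function (`paint_fn` below is hypothetical and must return the final canvas as an image of the same shape and dtype) and average a distance metric over the held-out img_test list. Per-pixel mean squared error is used here only as an example metric.

```python
import numpy as np

def evaluate_l2(images, paint_fn):
    """Average per-pixel squared error between painted results and targets.

    images:   iterable of (H, W, 3) uint8 arrays, e.g. the img_test list
    paint_fn: hypothetical function running the agent on one image and
              returning the final canvas with the same shape and dtype
    """
    dists = []
    for img in images:
        target = img.astype(np.float32) / 255
        canvas = paint_fn(img).astype(np.float32) / 255
        dists.append(((canvas - target) ** 2).mean())
    return float(np.mean(dists))

# Sanity check with trivial "painters": a perfect copy scores 0.
imgs = [np.full((4, 4, 3), 255, np.uint8), np.zeros((4, 4, 3), np.uint8)]
print(evaluate_l2(imgs, lambda im: im))  # 0.0
```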

The differences between `env_batch` and `batch_size`

Hi, I dove into the code of your paper and I'm confused by the two variables env_batch and batch_size, which seem to be the same in your implementation.

Could you give me some hints to help me figure it out? Thank you very much.

Parameter Doubts

A few questions about the parameters:

Q1. What is the difference between max_steps, train_times, and episode_train_times? Can you please define them?

Q2. What happens during the warmup stage? (Is there any issue if we set the warmup steps to 0?)

Different Neural Renderer

Hello @hzwer,
I have 2 questions:

I noticed you provided extra renderers in the README file. What modifications did you make to the stroke_gen file to train those renderers?
What do the file names bezierwotrans.pkl and actor_notrans.pkl stand for?

Thanks in advance

hard_update

Hello :)

Could you tell me why this function is necessary and what exactly it does?

def hard_update(target, source):
    for m1, m2 in zip(target.modules(), source.modules()):
        m1._buffers = m2._buffers.copy()
    for target_param, param in zip(target.parameters(), source.parameters()):
        target_param.data.copy_(param.data)

I do not understand! Thanks so much!
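Some background that may help: in DDPG, separate target networks compute target_q. A hard update makes the target an exact copy of the online network (copying `_buffers` as well covers non-parameter state such as BatchNorm running statistics); this is typically done once at initialization, after which a soft update lets the target track the online network slowly. The framework-free sketch below uses dicts of NumPy arrays in place of modules; it is not the repository's code, and tau=0.001 is an illustrative value.

```python
import numpy as np

def hard_update(target, source):
    """Copy every parameter exactly: target <- source."""
    for name in target:
        target[name] = source[name].copy()

def soft_update(target, source, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * source."""
    for name in target:
        target[name] = (1 - tau) * target[name] + tau * source[name]

critic = {"w": np.array([1.0, 2.0])}
critic_target = {"w": np.zeros(2)}

hard_update(critic_target, critic)             # start from identical weights
critic["w"] += 1.0                             # the online critic is trained...
soft_update(critic_target, critic, tau=0.001)  # ...and the target follows slowly
print(critic_target["w"])
```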

Stroke opacity

Hi,
I noticed that each stroke is transparent, so layer upon layer of color adds up over time to form the target picture.
Is it possible to adjust the opacity to simulate painting a picture with an opaque palette? I guess that would require training a new model.

Thanks in advance.

How were straight strokes, circles and triangles generated?

Thanks for your nice work,

I am just wondering: for simple strokes like (flat) circles, triangles, and rectangles, do we really need the renderer, given that we already have a simpler state representation? For example, a circle only needs a center and a radius instead of a 10-value state vector.

Trained model and an example of how to use it?

Will the pretrained model be provided, and how do I use it?

Question about reward

I find that the reward saved to the replay buffer https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/env.py#L105 differs from the reward calculated in the training process https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/DRL/ddpg.py#L102: one is divided by the initial distance and the other is not. Is this a bug, or is it fine?
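Whether the mismatch is intended cannot be told from the issue alone. The sketch below only illustrates what dividing by the initial distance changes, assuming the replay-buffer reward is (previous distance - current distance) / initial distance: it measures relative progress, so images that start far from a blank canvas and images that start close to it yield rewards on a similar scale. The function is illustrative, not the repository's code.

```python
def step_reward(last_dist, dist, initial_dist=None, eps=1e-8):
    """Decrease in distance to the target over one step.

    If initial_dist (blank canvas vs. target) is given, the reward is expressed
    as a fraction of it, as we read the linked env.py line.
    """
    reward = last_dist - dist
    if initial_dist is not None:
        reward /= initial_dist + eps
    return reward

# The same relative progress (20%) on an "easy" and a "hard" target image:
easy_raw, hard_raw = step_reward(0.1, 0.08), step_reward(1.0, 0.8)
easy_norm = step_reward(0.1, 0.08, initial_dist=0.1)
hard_norm = step_reward(1.0, 0.8, initial_dist=1.0)
print(easy_raw, hard_raw)    # raw rewards differ by roughly 10x
print(easy_norm, hard_norm)  # normalized rewards are both about 0.2
```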

Possibility to output SVG instead of PNG

Can you point me in the right direction what I have to modify in order to generate an SVG with the generated strokes?
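A rough starting point, built on several assumptions: each stroke is 13 values in [0, 1], laid out as x0, y0, x1, y1, x2, y2, z0, z2, w0, w2 followed by 3 colour values (z* read as start/end radii, w* as start/end opacities); this layout, the axis order, and the radius scaling should all be checked against stroke_gen.py and decode before relying on them. If the output looks transposed, swap x and y; if colours look wrong, the channel order may be BGR, since images are read with OpenCV. SVG paths have a single width and opacity, so start and end values are averaged, and the background is black because the canvas starts at zero. The per-step actor outputs would still need to be collected in test.py.

```python
def stroke_to_svg(params, size=128):
    """Approximate one 13-value stroke as an SVG quadratic Bezier <path>."""
    x0, y0, x1, y1, x2, y2, z0, z2, w0, w2, r, g, b = params
    # Control point relative to the endpoints, as in stroke_gen.py.
    cx = x0 + (x2 - x0) * x1
    cy = y0 + (y2 - y0) * y1
    sx, sy, qx, qy, ex, ey = (v * size for v in (x0, y0, cx, cy, x2, y2))
    width = max(1.0, (z0 + z2) / 2 * size / 4)   # hypothetical radius scaling
    opacity = (w0 + w2) / 2                        # one opacity per path
    color = "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b)))
    return (f'<path d="M {sx:.1f} {sy:.1f} Q {qx:.1f} {qy:.1f} {ex:.1f} {ey:.1f}" '
            f'fill="none" stroke="{color}" stroke-width="{width:.1f}" '
            f'stroke-opacity="{opacity:.2f}" stroke-linecap="round"/>')

def strokes_to_svg(strokes, size=128):
    """Later strokes are drawn on top, matching the painting order."""
    body = "\n".join(stroke_to_svg(s, size) for s in strokes)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
            f'viewBox="0 0 {size} {size}">\n'
            f'<rect width="100%" height="100%" fill="black"/>\n{body}\n</svg>')

# Usage: a single red diagonal stroke.
print(strokes_to_svg([[0, 0, 0.5, 0.5, 1, 1, 0.5, 0.5, 1, 1, 1, 0, 0]]))
```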

Does it only support pictures of size 128x128?

Would you please provide download link in Baidu netdisk?

Hi, Hzwer,
Would you please provide download links for renderer.pkl and actor.pkl via a Baidu Netdisk share? As you know, Google is not easy to access from China. Thanks.

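Questions about other datasets

Hello! When I train on the CUB Birds and Stanford Cars datasets, the painted images show only a single color, and nothing changes as training proceeds. My only modification to the code is in load_data(). Why would this happen?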

Why using weight norm rather than batch norm in discriminator?

As the title says.

What is the effect that add reward or not on Q value?

Hi hzwer, I have a question about the update_policy function. In the model-based code, the returned Q value has gan_reward added, but in the model-free code it does not. Does this have any effect?

stroke

I want to get the final stroke parameters. What should I do? Thank you!

So glad to see that I'm credited.

because I'm a simple man.

the rl-painter idea was from another earlier adventure:
https://github.com/ctmakro/opencv_playground
which does not use RL but local gradient descent for greedy optimality.

hope you could put a link to that too :)

Stroke gen

Hi, looking at the draw() function, it seems the generator creates grayscale brushstrokes. Where are the color parameters applied?

Question about brush stroke texture

This is a terrific project. Being able to generate so few strokes is quite an achievement.

Is it possible to use a textured brush stroke?

What I mean is using a grayscale picture of a real brush stroke. The grayscale value in the picture gives the transparency.

See for example this blog post
http://3dstereophoto.blogspot.com/2018/07/non-photorealistic-rendering-software.html
that describes a "classic" (not AI) Stroke Based Renderer

spectral normalization GAN

Have you tried a spectral normalization GAN, or adding L1 distance to the WGAN loss? I wonder how these two changes could affect performance:

1. Replacing WGAN-GP with spectral normalization

Spectral normalization has two main advantages:

Slight performance improvement relative to WGAN-GP on ResNet: spectral normalization had a slightly higher inception score (by approximately 0.16), with less deviation than WGAN-GP.
Spectral normalization is ~30% more computationally efficient.
Since both actors and critics use ResNet as the backbone, replacing WGAN-GP with spectral normalization can potentially yield meaningful results.

2. Combining WGAN-GP with spectral normalization

The authors of the spectral normalization paper suggest that combining WGAN-GP with spectral normalization can further improve the results compared to the baseline WGAN-GP and the spectral normalization GAN.
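For readers unfamiliar with the technique: spectral normalization divides each weight matrix by an estimate of its largest singular value, obtained by power iteration. In training frameworks one iteration per step with a persistent vector is typical (PyTorch provides `torch.nn.utils.spectral_norm`); the standalone NumPy sketch below runs many iterations so it can be checked against a full SVD. It is a generic illustration, not tied to this repository.

```python
import numpy as np

def spectral_norm(W, n_iter=1000, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)

W = np.random.default_rng(1).standard_normal((64, 32))
sigma = spectral_norm(W)
W_sn = W / sigma   # spectrally normalised weight: largest singular value ~1
```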

can't work out what versions I need of the dependencies

For example, PyTorch: is it 0.4.1 or 1.1.0? What version of tensorboardX do I need? And do you have a requirements.txt file I could look at?

Training parameters

Hi !

I am trying to train the paint agent on my GPU. In the paper, I read that training took about 2 days in your case.

Can you tell me which parameters you used to train the paint agent? In my case, training takes more than 1 week (I am also training the agent on a GPU, so the difference seems large).

Thanks so much!
