
iccv2019-learningtopaint's Introduction

ICCV2019-Learning to Paint

Zhewei Huang, Wen Heng, Shuchang Zhou

Abstract

We show how to teach machines to paint like human painters, who can use a small number of strokes to create fantastic paintings. By employing a neural renderer in model-based Deep Reinforcement Learning (DRL), our agents learn to determine the position and color of each stroke and make long-term plans to decompose texture-rich images into strokes. Experiments demonstrate that excellent visual effects can be achieved using hundreds of strokes. The training process does not require the experience of human painters or stroke tracking data.

You can try it out easily in Colaboratory.

Dependencies

pip3 install torch==1.1.0
pip3 install tensorboardX
pip3 install opencv-python

Testing

Make sure renderer.pkl and actor.pkl are present before testing.

You can download a trained neural renderer and a CelebA actor for testing: renderer.pkl and actor.pkl

$ wget "https://drive.google.com/uc?export=download&id=1-7dVdjCIZIxh8hHJnGTK-RA1-jL1tor4" -O renderer.pkl
$ wget "https://drive.google.com/uc?export=download&id=1a3vpKgjCVXHON4P7wodqhCgCMPgg1KeR" -O actor.pkl
$ python3 baseline/test.py --max_step=100 --actor=actor.pkl --renderer=renderer.pkl --img=image/test.png --divide=4
$ ffmpeg -r 10 -f image2 -i output/generated%d.png -s 512x512 -c:v libx264 -pix_fmt yuv420p video.mp4 -q:v 0 -q:a 0
(make a painting process video)

We also provide some other neural renderers and agents; you can use them in place of renderer.pkl to train the agent:

triangle.pkl --- actor_triangle.pkl;

round.pkl --- actor_round.pkl;

bezierwotrans.pkl --- actor_notrans.pkl

We also provide a Baidu Netdisk (百度网盘) source. Link: https://pan.baidu.com/s/1GELBQCeYojPOBZIwGOKNmA Extraction code: aq8n

Training

Datasets

Download the CelebA dataset and put the aligned images in data/img_align_celeba/******.jpg

Neural Renderer

To create a differentiable painting environment, we first need to train the neural renderer.

$ python3 baseline/train_renderer.py
$ tensorboard --logdir train_log --port=6006
(The training process will be shown at http://127.0.0.1:6006)

Paint Agent

After the neural renderer looks good enough, we can begin training the agent.

$ cd baseline
$ python3 train.py --max_step=40 --debug --batch_size=96
(A step contains 5 strokes by default.)
$ tensorboard --logdir train_log --port=6006
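As a quick sanity check on the training command above, the stroke budget per episode is the number of steps times the strokes per step. The sketch below assumes the default of 5 strokes per step mentioned above; the function name is hypothetical and not part of the repository.

```python
def strokes_per_episode(max_step, strokes_per_step=5):
    """Total number of strokes the agent paints in one episode."""
    return max_step * strokes_per_step


# --max_step=40 with the default 5 strokes per step
total = strokes_per_episode(40)
print(total)  # 200
```

Under that assumption, --max_step=40 corresponds to 200 strokes per image, which is in line with the "hundreds of strokes" mentioned in the abstract.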

Resources

QbitAI (量子位) report

Learning to Paint: a painting AI

Megvii Research (旷视研究院) releases a painting agent based on deep reinforcement learning

Contributors

Also many thanks to ctmakro for inspiring this work. He also explored using a greedy algorithm to generate paintings: opencv_playground.

If you find this repository useful for your research, please cite the following paper:

@inproceedings{huang2019learning,
  title={Learning to paint with model-based deep reinforcement learning},
  author={Huang, Zhewei and Heng, Wen and Zhou, Shuchang},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}

iccv2019-learningtopaint's People

Contributors

bfirsh, cclauss, hzwer, sungyk, unrahul


iccv2019-learningtopaint's Issues

confused by the update policy.

in the update_policy() :
cur_q, step_reward = self.evaluate(state, action)
target_q += step_reward.detach()
value_loss = criterion(cur_q, target_q)

it's quite confusing .. so the value_loss = discount*(self.critic(St+1)+reward(St+1)) -self.critic(St) ??

shouldn't it be : Value_loss = discount*(self.critic(St+1)) + reward(St) - self.critic(St) ?
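For reference, the two expressions in this question can be written out in a common notation. This is only a restatement of the poster's two readings, with V standing for self.critic, r for the reward and γ for the discount; whether the first line matches what the repository code actually computes is the poster's interpretation and is not confirmed here. The second line is the standard one-step temporal-difference error.

```latex
\begin{align}
\delta_{\text{as read from the code}} &= \gamma \bigl( V(s_{t+1}) + r(s_{t+1}) \bigr) - V(s_t) \\
\delta_{\text{one-step TD}} &= r(s_t) + \gamma V(s_{t+1}) - V(s_t)
\end{align}
```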

Critic and discriminator

Hi!
I am trying to understand the Deep Reinforcement Learning part. I know that the actor outputs a set of stroke parameters based on the canvas state and the target image, and that the discriminator gives the actor a reward at each step. But what about the critic? What is the input and the output for the actor? I am reading the paper but I do not understand this part.
thank you so much

Run out of memory

When I ran train.py with a GPU, it seems that RAM ran out. My computer has 46G of memory, including 30G of virtual memory.

$ python3 baseline/train.py --max_step=200 --debug --batch_size=96
mkdir: cannot create directory ‘./model’: File exists
loaded 10000 images
loaded 20000 images
loaded 30000 images
loaded 40000 images
loaded 50000 images
loaded 60000 images
loaded 70000 images
loaded 80000 images
loaded 90000 images
loaded 100000 images
loaded 110000 images
loaded 120000 images
loaded 130000 images
loaded 140000 images
loaded 150000 images
loaded 160000 images
loaded 170000 images
loaded 180000 images
loaded 190000 images
loaded 200000 images
finish loading data, 197999 training images, 2001 testing images
observation_space (96, 128, 128, 7) action_space 13
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:157: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s0 =torch.tensor(self.state, device='cpu')
/home/rody/xu/npaint/LearningToPaint/baseline/DRL/ddpg.py:163: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s1 =torch.tensor(state, device='cpu')
 #0: steps:200 interval_time:9.08 train_time:0.00
 #1: steps:400 interval_time:22.40 train_time:0.00
 #2: steps:600 interval_time:19.66 train_time:6.90
 #3: steps:800 interval_time:20.01 train_time:5.28
 #4: steps:1000 interval_time:20.89 train_time:6.01
 #5: steps:1200 interval_time:20.52 train_time:6.34
 #6: steps:1400 interval_time:18.20 train_time:7.01
Killed

Here's the memory footprint

              total        used        free      shared  buff/cache   available
Mem:          15892       15627         139          11         125          81
Swap:         30273       30273           0
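A rough back-of-the-envelope estimate may help explain the numbers above. The sketch assumes that all 200000 images are kept in memory as 128x128, 3-channel, 8-bit arrays; the width of 128 is read off the observation_space line in the log, while the channel count and dtype are assumptions about how the images are stored. The function name is hypothetical.

```python
def dataset_bytes(num_images, width=128, channels=3, bytes_per_value=1):
    """Approximate size of an in-memory image set stored as uint8 arrays."""
    return num_images * width * width * channels * bytes_per_value


total = dataset_bytes(200_000)
print(total)          # 9830400000 bytes
print(total / 2**30)  # roughly 9.16 GiB
```

Under those assumptions the raw image data alone takes a bit over 9 GiB, before counting the replay buffer, the Python process itself, or anything else on the machine. If the Mem row above is in MB, that leaves little headroom in about 16 GB of physical RAM.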

Renderer input features

Hi @hzwer ,
Could you clarify whether the input feature of the neural renderer is a 10-value vector or a 13-value vector (+RGB)?
If it is trained with a 10-value vector, how can the painter generate color pictures?

Bests,

cleanup

while I was reviewing your code to better understand your paper, I found some dead code. Would you mind if I clean up some code, add some instructive comments ( for people like me ), and send a PR?

Any other reward function

Very cool project! It seems using GAN loss here is a natural choice to compare the drawing and images. Have you ever tried other losses like the perceptual loss? Thank you!

How to change strokes?

The readme.md says that a step contains 5 strokes by default. When I train another model, where can I change the number of strokes?

Question about Q value

I love this amazing project. I'm surprised that neural networks can do such incredible things.
I have a small question about the Q value. In the paper, cur_q = reward + γ * target_q, so normally evaluate() should "return Q, gan_reward". This is actually the case in the model-free method. But in the model-based method it is "return (Q+gan_reward), gan_reward", which confuses me. Why does the Q value need to be added to the reward of the same step?

training

when running train.py with celebA it automatically gets interrupted

loaded 200000 images
finish loading data, 197999 training images, 2001 testing images
observation_space (96, 128, 128, 7) action_space 13
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1332: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/content/LearningToPaint/baseline/DRL/ddpg.py:158: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s0 = torch.tensor(self.state, device='cpu')
/content/LearningToPaint/baseline/DRL/ddpg.py:161: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  s1 = torch.tensor(state, device='cpu')
^C

also I am on gpu

how wgan is trained??

Thank you for sharing your awesome work!

In your paper, you mention using the WGAN discriminator loss to define the reward.

But how is the WGAN trained in your work? (Is it pre-trained to some extent beforehand and then used in cal_dis?)

Output strokes per iteration only.

Hi,
thank you for your work.

I am struggling to modify the code so that, when running python3 baseline/test.py --max_step=80 --actor=actor.pkl --renderer=renderer.pkl --img=path_to_image --divide=5, it generates images with only the strokes added during the latest iteration - instead of the sum of all strokes.

Do you have an idea on this?

Thank you

Neural Renderer

Hello !

I want to understand the Neural renderer (DL network) part. How did you train this neural renderer?
If there is a dataset, please provide a link for it.
Have you used a traditional rendering algorithm in this case? (If so how ?)

Thank you

Renderer Training Doubts

Hi, I have a question about the way the Neural Renderer is trained.
Let's consider a generic ML/DL training procedure: we fix train and validation sets of fixed size, do backpropagation on the train set, and then evaluate on the validation set. But here, we randomly generate batches of size 64 for both the train and validation parts (validation after every 1000th iteration, as far as I remember) and train for 500,000 epochs. I find this confusing: the randomly generated samples could vary drastically across the epochs, so how are you ensuring model improvement? Are you simply trying to overfit the model to all possible combinations of coordinates in the canvas? I want to understand why you have taken this approach.

Thanks
Niharika

About stroke generation

In stroke_gen.py you use a quadratic Bezier curve to generate strokes. I wonder why (x1, y1) is calculated from (x0, y0) and (x2, y2):

x1 = x0 + (x2 - x0) * x1
y1 = y0 + (y2 - y0) * y1

What would happen if I comment out these 2 lines?
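One way to read the two quoted lines is sketched below. It assumes the raw x1 and y1 are fractions in [0, 1], so the remapping places the control point inside the axis-aligned box spanned by the two endpoints; without the remapping, the raw values would be used directly as absolute coordinates. The function names and sample coordinates are hypothetical and not taken from stroke_gen.py.

```python
def control_point(x0, y0, x1, y1, x2, y2):
    """Remap a relative control point (x1, y1) into the endpoints' bounding box."""
    return x0 + (x2 - x0) * x1, y0 + (y2 - y0) * y1


def bezier_point(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])


p0, p2 = (0.2, 0.2), (0.8, 0.6)
# A relative control point of (0.5, 0.5) lands midway between the endpoints.
p1 = control_point(p0[0], p0[1], 0.5, 0.5, p2[0], p2[1])
mid = bezier_point(p0, p1, p2, 0.5)
print(p1, mid)
```

With the control point midway between the endpoints the curve degenerates to a straight segment. Other fractions bend the stroke while keeping the control point between the endpoints on each axis.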

Divide parameter and k=5

Hello :)

I have some doubts...

I have seen that the algorithm defines a "divide" parameter, which splits the canvas into mini canvases in order to improve the agent's accuracy. But I would like to understand when this is performed during training (what are the steps?). When the actor is about to make a stroke, is the canvas divided and then reconstructed?

Also, I have seen that for each state the actor performs 5 actions (brush strokes). I understand that the discriminator gives the reward to the actor. But what about the critic? Is Q updated for each of the five actions?

Thank you very much in advance

Decoding of strokes

The strokes are rendered from parameters to strokes and added to a canvas in the decode function

def decode(x, canvas): # b * (10 + 3)

I've got a couple of questions regarding the procedure. Why does the decoder return

return 1 - x.view(-1, 128, 128)
? It is trained by comparing to the ground truth, why should it learn the inverse, instead of the actual image?

Why is the stroke then

stroke = 1 - Decoder(x[:, :10])
?

And why is it added to the canvas via

canvas = canvas * (1 - stroke[:, i]) + color_stroke[:, i]
?

I don't understand why you would do the 1 - stroke at every step in this chain. Also the canvas is initialized to all zeros. Is the canvas * (1 - stroke[:, k]) in canvas = canvas * (1 - stroke[:, k]) + color_stroke[:, k] really necessary? stroke is included in color_stroke anyway.

Am I missing something? Thanks for any help!
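The blending rule quoted above can be tried out on a few pixels. The sketch below is a minimal single-channel illustration in plain Python, assuming that color_stroke equals color * stroke and that stroke is a coverage mask in [0, 1]; the function names and pixel values are hypothetical.

```python
def composite(canvas, stroke, color):
    """Blend with canvas * (1 - stroke) + color * stroke, pixel by pixel."""
    return [c * (1 - a) + color * a for c, a in zip(canvas, stroke)]


def additive(canvas, stroke, color):
    """Same update with the canvas * (1 - stroke) factor dropped."""
    return [c + color * a for c, a in zip(canvas, stroke)]


canvas = [0.0, 0.5, 1.0]  # already painted pixel values
stroke = [0.0, 1.0, 0.5]  # coverage of the new stroke
blended = composite(canvas, stroke, 0.2)
summed = additive(canvas, stroke, 0.2)
print(blended)  # the fully covered pixel takes the stroke color
print(summed)   # values accumulate and can exceed 1.0
```

On an all-zero canvas the two variants agree, so the (1 - stroke) factor only matters from the second stroke onward, where it lets a new stroke cover what is already there instead of adding to it.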

How to make L2 rewards work?

I have tried to use the L2 reward in ddpg.py line 102 and disable the WGAN optimization, but after the same number of iterations the painter is not as good as with the WGAN reward.
How do you make L2 rewards work?

some typos

noticed some typos in your paper:

  1. equation 3 has a hanging parenthesis at the far right

V(s_t) = r(s_t, a_t) + γV(s_{t+1}))

suggested fix:

V(s_t) = r(s_t, a_t) + γV(s_{t+1})

  2. on page 5, the first sentence of the last paragraph,

The neural renderer network is consisting of several fully connect layers and convolution layers

suggested fix:

The neural renderer network is consisting of several fully connected layers and convolution layers

Hope it helps :)

Undefined name 'init' in actor.py

flake8 testing of https://github.com/hzwer/LearningToPaint on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./baseline/DRL/actor.py:17:9: F821 undefined name 'init'
        init.xavier_uniform(m.weight, gain=np.sqrt(2))
        ^
./baseline/DRL/actor.py:18:9: F821 undefined name 'init'
        init.constant(m.bias, 0)
        ^
2     F821 undefined name 'init'
2

E901, E999, F821, F822, F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations": useful for readability, but they do not affect runtime safety.

  • F821: undefined name name
  • F822: undefined name name in __all__
  • F823: local variable name referenced before assignment
  • E901: SyntaxError or IndentationError
  • E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
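The following minimal sketch shows why an F821 finding is a runtime hazard: defining a function that refers to an undefined name succeeds, and the NameError only appears when the function is called. The function name is hypothetical, and the call inside it is a stand-in for the flagged lines rather than a copy of the repository code. In PyTorch the initializers live in torch.nn.init; whether that import is what the repository intended is an assumption.

```python
def weights_init():
    # 'init' is never imported or defined here, mirroring the F821 report.
    init.constant(None, 0)  # noqa: F821


# Defining the function raised nothing; the name is only looked up on call.
defined_ok = callable(weights_init)

try:
    weights_init()
    raised = False
except NameError as exc:
    raised = True
    message = str(exc)

print(defined_ok, raised)
```

This is why such errors can go unnoticed until the affected code path actually runs.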

stroke sequence learning

@hzwer
Can this method be trained not only to paint, but also to paint in a certain sequence?
I am interested in training a network to learn the sequence and order of the drawing and strokes.
Any suggestions?

Question about img_test in load_data?

def load_data(self):
    # CelebA
    global train_num, test_num
    for i in range(200000):
        img_id = '%06d' % (i + 1)
        try:
            img = cv2.imread('/data/CelebA/celeba/img_align_celeba/' + img_id + '.jpg', cv2.IMREAD_UNCHANGED)
            img = cv2.resize(img, (width, width))
            if i > 2000:
                train_num += 1
                img_train.append(img)
            else:
                test_num += 1
                img_test.append(img)
        finally:
            if (i + 1) % 10000 == 0:
                print('loaded {} images'.format(i + 1))
    print('finish loading data, {} training images, {} testing images'.format(str(train_num), str(test_num)))

In the load_data function in env.py, images 0~1999 are appended to the img_test list. Where are these test images actually used? I would like to use these 2000 images for a quantitative evaluation of the model; how should I do that? test.py only tests a single image.
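The split sizes implied by the condition i > 2000 can be checked with a few lines. The sketch assumes every one of the 200000 images loads successfully; the function name is hypothetical.

```python
def split_counts(total=200000, threshold=2000):
    """Count indices sent to the train list (i > threshold) and to the test list."""
    train = sum(1 for i in range(total) if i > threshold)
    return train, total - train


counts = split_counts()
print(counts)  # (197999, 2001)
```

Indices 0 through 2000 go to the test list, which gives 2001 test images rather than 2000 and matches the "197999 training images, 2001 testing images" line printed by the training logs.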

The differences between `env_batch` and `batch_size`

Hi, I dived into the code of your paper and I'm confused by the two variables env_batch and batch_size, which seem to be the same according to your implementation.

Could you give me some hints to help me figure it out? Thank you very much

Parameter Doubts

A few doubts about the parameters:

Q1. What is the difference between max_steps, train_times, and episode_train_times? Can you please define them?

Q2. What happens during the warmup stage? (Is there any issue if we set the warmup step to 0?)

Different Neural Renderer

Hello @hzwer,
Kindly, I have 2 questions:-

  1. I noticed you provided extra renderers in the README file. What modifications did you apply to the stroke_gen file so that you could train those renderers?
  2. What do the file names bezierwotrans.pkl --- actor_notrans.pkl stand for?

Thanks in advance

hard_update

Hello :)

Could you tell me why this function is necessary and what exactly it does?

def hard_update(target, source):
    for m1, m2 in zip(target.modules(), source.modules()):
        m1._buffers = m2._buffers.copy()
    for target_param, param in zip(target.parameters(), source.parameters()):
        target_param.data.copy_(param.data)

I do not understand! Thanks so much!
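As a plain-Python analogy, a hard update simply overwrites every value of the target with the corresponding value of the source. The sketch below uses dicts of floats as stand-ins for network parameters. The soft_update function and its tau argument do not appear in this post; they are included only for contrast, as the gradual variant commonly used for target networks in DDPG-style training.

```python
def hard_update(target, source):
    """Overwrite every target value with the source value."""
    for name, value in source.items():
        target[name] = value


def soft_update(target, source, tau):
    """Move each target value a fraction tau of the way toward the source."""
    for name, value in source.items():
        target[name] = (1 - tau) * target[name] + tau * value


source = {"w": 1.0, "b": -2.0}
hard_target = {"w": 0.0, "b": 0.0}
soft_target = {"w": 0.0, "b": 0.0}

hard_update(hard_target, source)
soft_update(soft_target, source, tau=0.5)
print(hard_target)  # identical to source
print(soft_target)  # halfway between the old target and source
```

In this reading, a hard update is what one would use to start a target network from exactly the same weights as the network it tracks.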

Stroke opacity

Hi,
I noticed that each stroke is transparent, so that layers over layers of color will add up over time to form the target picture.
Is it possible to adjust the opacity to simulate painting a picture with an opaque palette? I guess training a new model would be necessary for that.

Thanks in advance.

How were straight strokes, circles and triangles generated?

Thanks for your nice work,

I am just wondering, for simple strokes like (flat) circles, triangles, rectangles, do we really need the renderer since we already have simpler state representation? For example, the circle only needs a center and a radius instead of a 10-value state vector.

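Question about other datasets

Hello! When I train on the CUB Birds and Stanford Cars datasets, the image shows only a single color and does not change as training proceeds. The only change I made to the code is in load_data(). What could cause this?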

stroke

I want to get the final stroke parameters! What should I do? please! Thank you!

Stroke gen

Hi, looking at the draw() function it seems like the generator creates greyscale brushstrokes. Where do the colour parameters get inputted?

spectral normalization GAN

Have you tried spectral normalization GAN & adding L1 distance to WGAN loss? I wonder how these two changes could impact the performance:

1. Replacing WGAN-GP with spectral normalization

Spectral normalization has two main advantages:

  1. Slight performance improvement relative to WGAN-GP on ResNet. The inception score of spectral normalization had a slight upper hand — approximately 0.16 — with less deviation compared to WGAN-GP.

  2. Spectral normalization is ~30% more computationally efficient.
    Since both actors and critics use ResNet as the backbone, replacing WGAN-GP with spectral normalization can potentially yield meaningful results.

2. Combining WGAN-GP with spectral normalization

The authors of the spectral normalization paper suggest that combining WGAN-GP with spectral normalization can further improve the results compared to the baseline WGAN-GP and spectral normalization GAN.

Training parameters

Hi !

I am trying to train the paint agent on my GPU. In the paper I read that the training time was about 2 days in your case.

Can you tell me what parameters you used to train the paint agent? In my case the training time is more than 1 week (I am also training the agent on a GPU, but the time difference seems very large).

Thanks so much!
