
sg2im's Introduction

sg2im

This is the code for the paper

Image Generation from Scene Graphs
Justin Johnson, Agrim Gupta, Li Fei-Fei
Presented at CVPR 2018

Please note that this is not an officially supported Google product.

A scene graph is a structured representation of a visual scene where nodes represent objects in the scene and edges represent relationships between objects. In this paper we present an end-to-end neural network model that takes a scene graph as input and outputs an image.

Below we show some example scene graphs along with images generated from those scene graphs using our model. By modifying the input scene graph we can exercise fine-grained control over the objects in the generated image.


If you find this code useful in your research then please cite

@inproceedings{johnson2018image,
  title={Image Generation from Scene Graphs},
  author={Johnson, Justin and Gupta, Agrim and Fei-Fei, Li},
  booktitle={CVPR},
  year={2018}
}

Model

The input scene graph is processed with a graph convolution network which passes information along edges to compute embedding vectors for all objects. These vectors are used to predict bounding boxes and segmentation masks for all objects, which are combined to form a coarse scene layout. The layout is passed to a cascaded refinement network (Chen and Koltun, ICCV 2017) which generates an output image at increasing spatial scales. The model is trained adversarially against a pair of discriminator networks which ensure that output images look realistic.
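As a rough illustration of the message-passing step, here is a minimal sketch of a single graph-convolution layer in PyTorch. It is not the repository's implementation: the layer name, the single-layer MLP, and the average pooling over edges are simplifying assumptions.

import torch
import torch.nn as nn

class GraphConvSketch(nn.Module):
    """One message-passing step: each (subject, predicate, object) triple
    goes through an MLP, and each object's new vector is the average of
    the candidate vectors from every edge it participates in."""
    def __init__(self, dim=128):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(3 * dim, 3 * dim), nn.ReLU())

    def forward(self, obj_vecs, pred_vecs, edges):
        # obj_vecs: (O, dim); pred_vecs: (T, dim);
        # edges: (T, 2) LongTensor of (subject_idx, object_idx) per triple.
        s_idx, o_idx = edges[:, 0], edges[:, 1]
        h = torch.cat([obj_vecs[s_idx], pred_vecs, obj_vecs[o_idx]], dim=1)
        new_s, new_p, new_o = self.g(h).chunk(3, dim=1)

        # Scatter the per-edge candidate vectors back onto the objects
        # and average over the number of edges touching each object.
        pooled = torch.zeros_like(obj_vecs)
        counts = obj_vecs.new_zeros(obj_vecs.size(0), 1)
        pooled.index_add_(0, s_idx, new_s)
        pooled.index_add_(0, o_idx, new_o)
        counts.index_add_(0, s_idx, new_s.new_ones(s_idx.size(0), 1))
        counts.index_add_(0, o_idx, new_o.new_ones(o_idx.size(0), 1))
        return pooled / counts.clamp(min=1), new_p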

Setup

All code was developed and tested on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4.

You can set up a virtual environment to run the code like this:

python3 -m venv env               # Create a virtual environment
source env/bin/activate           # Activate virtual environment
pip install -r requirements.txt   # Install dependencies
echo $PWD > env/lib/python3.5/site-packages/sg2im.pth  # Add current directory to python path
# Work for a while ...
deactivate  # Exit virtual environment

Pretrained Models

You can download pretrained models by running the script bash scripts/download_models.sh. This will download the following models, and will require about 355 MB of disk space:

  • sg2im-models/coco64.pt: Trained to generate 64 x 64 images on the COCO-Stuff dataset. This model was used to generate the COCO images in Figure 5 from the paper.
  • sg2im-models/vg64.pt: Trained to generate 64 x 64 images on the Visual Genome dataset. This model was used to generate the Visual Genome images in Figure 5 from the paper.
  • sg2im-models/vg128.pt: Trained to generate 128 x 128 images on the Visual Genome dataset. This model was used to generate the images in Figure 6 from the paper.

Table 1 in the paper presents an ablation study where we disable various components of the full model. You can download the additional models used in this ablation study by running the script bash scripts/download_ablated_models.sh. This will download 12 additional models, requiring an additional 1.25 GB of disk space.

Running Models

You can use the script scripts/run_model.py to easily run any of the pretrained models on new scene graphs using a simple human-readable JSON format. For example you can replicate the sheep images above like this:

python scripts/run_model.py \
  --checkpoint sg2im-models/vg128.pt \
  --scene_graphs scene_graphs/figure_6_sheep.json \
  --output_dir outputs
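Each scene graph in the JSON file is an object with an "objects" list and a "relationships" list of [subject_index, predicate, object_index] triples (see the JSON examples later on this page). As an illustration, here is a hypothetical snippet that writes a minimal scene graph file; the object and predicate names are assumed to be in the model's vocabulary, and the file is assumed to hold a list of scene graphs:

import json

# Hypothetical minimal scene graph in the run_model.py JSON format.
scene_graphs = [{
    "objects": ["sky", "grass", "sheep"],
    "relationships": [
        [0, "above", 1],        # sky above grass
        [2, "standing on", 1]   # sheep standing on grass
    ]
}]

with open("scene_graphs/my_graph.json", "w") as f:
    json.dump(scene_graphs, f, indent=2)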

The generated images will be saved to the directory specified by the --output_dir flag. You can control whether the model runs on CPU or GPU by passing the flag --device cpu or --device gpu.

We provide JSON files and pretrained models allowing you to recreate all images from Figures 5 and 6 from the paper.

(Optional): GraphViz

This script can also draw images for the scene graphs themselves using GraphViz; to enable this option just add the flag --draw_scene_graphs 1 and the scene graph images will also be saved in the output directory. For this option to work you must install GraphViz; on Ubuntu 16.04 you can simply run sudo apt-get install graphviz.

Training new models

Instructions for training new models can be found here.

sg2im's People

Contributors

faviovazquez, jcjohnson


sg2im's Issues

Image quality

Have tried generating scenes using the COCO and VG models. This is a very nice idea; however, the image quality is very poor, and it seems the available models are not enough.

Also, noise is being added in the code. May I know the reason?

Is there any extension of this work? The images currently produced would suit old Atari games, but not business use.

No module named 'sg2im'

Hello there,

I was trying to run the pretrained model using the method described in the README, but was getting the following error.

Traceback (most recent call last):
  File "scripts/run_model.py", line 22, in <module>
    from sg2im.model import Sg2ImModel
ModuleNotFoundError: No module named 'sg2im'

Any help would be greatly appreciated.
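Given the setup instructions above, a likely cause is that the repository root is not on the Python path. One workaround, sketched here under the assumption that the script is run from the repository root, is to add the working directory to sys.path before the import:

import os, sys

sys.path.insert(0, os.getcwd())  # make the sg2im package importable
from sg2im.model import Sg2ImModel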

Question about Inception Scores

Hello,

I'm trying to replicate the Inception Scores reported in the sg2im paper. I have been using a version of this: https://github.com/sbarratt/inception-score-pytorch/blob/master/inception_score.py, giving the predicted and ground-truth image tensors as input. When I do this, my results are significantly lower than the Inception Scores in the paper. Can someone provide guidance on how I should calculate the Inception Scores so that I can replicate the numbers in the sg2im paper?

Some questions about GraphConvNet

Hi, I read your paper recently and I have some questions about the "Graph Convolution Network" part, which I am not familiar with:

  1. I noticed this sentence: "Updating object vectors is more complex, since an object may participate in many relationships". I don't quite understand the scope of "many relationships": relationships within a batch, or within the whole dataset? I think your code suggests "within a batch". Is that correct?
  2. I think the graph convolution network is co-trained with the whole model. Is it possible to train the graph embeddings as a separate part, or as a pre-processing step?

This is not actually a bug report but some personal confusion. I am not a native English speaker, so if you find any of my phrasing vague, please feel free to ask.

Your work is great. :)

Vocab mismatch between checkpoint and paper

Was just running the checkpoints for COCO & VG.

For VG there are indeed 45 relationships plus an "in_image" relationship, which matches the paper on arXiv. However, for COCO there are additional "touching" relationships, which brings the total of non-"in_image" relationships to 10.

@jcjohnson could you potentially help clarify this question?

Dataloader on COCO Validation dataset

Hello Johnson, I found a mistake in the dataloader for the COCO validation dataset, shown in the screenshot below. There are 172 classes in total in the COCO dataset; however, the returned tensor "objs" in the code includes object No. 177. There may be some mistake here. I'm looking forward to your reply, thank you!
[screenshot of the dataloader output]

I found a mistake in your code

mask = imresize(255 * mask, (self.mask_size, self.mask_size), mode='constant')
In coco.py, line 281: 255.0 should be used here, because when an integer is used, the resulting mask becomes a zero matrix.
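The fixed line would then read:

mask = imresize(255.0 * mask, (self.mask_size, self.mask_size), mode='constant')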

Some confusions at coco 2017 stuff dataset

I've downloaded the COCO 2017 stuff_trainval annotations at this link, and found that the number of image IDs containing stuff annotations is 118,280 in total, rather than the 40k mentioned in the paper.
The corresponding code is at line 151 of coco.py.
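For reference, a minimal way to reproduce this count, assuming the standard COCO annotation layout (a JSON file with an "annotations" list whose entries carry an "image_id"; the filename below is an assumption):

import json

with open("stuff_train2017.json") as f:
    data = json.load(f)

# Count distinct images that have at least one stuff annotation.
image_ids = {ann["image_id"] for ann in data["annotations"]}
print(len(image_ids))  # the report above counts 118280 in total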

Has anyone else met this problem? Thanks.

In the coco.py, some doubts about the objects' relationships

In the coco.py, line 337-348

if sx0 < ox0 and sx1 > ox1 and sy0 < oy0 and sy1 > oy1:
    p = 'surrounding'
elif sx0 > ox0 and sx1 < ox1 and sy0 > oy0 and sy1 < oy1:
    p = 'inside'
elif theta >= 3 * math.pi / 4 or theta <= -3 * math.pi / 4:
    p = 'left of'
elif -3 * math.pi / 4 <= theta < -math.pi / 4:
    p = 'above'
elif -math.pi / 4 <= theta < math.pi / 4:
    p = 'right of'
elif math.pi / 4 <= theta < 3 * math.pi / 4:
    p = 'below'

I have some doubts about 'above' and 'below': since the y-axis points downward in image coordinates, I think the two labels should be exchanged, and the code would become

elif -3 * math.pi / 4 <= theta < -math.pi / 4:
    p = 'below'
elif math.pi / 4 <= theta < 3 * math.pi / 4:
    p = 'above'

A better way to calculate object area

According to your paper, objects smaller than 2% of the image are ignored.

_, _, w, h = object_data['bbox']

but this code only considers the width and height, which does not reflect the true area of the object.

For example, here is a "clothes" stuff region where most of the bounding box is blank (original image: http://cocodataset.org/#explore?id=245764):
[mask image of the "clothes" region]

The area calculated from w and h is 10692, but if you use code like `np.sum(mask)`, you get a more accurate area of 1115.
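A minimal sketch of the suggested check, assuming mask is a binary NumPy array (the helper name and signature here are illustrative):

import numpy as np

def keep_object(mask, image_area, min_frac=0.02):
    # Keep an object only if its mask covers at least min_frac of the image.
    return np.sum(mask) / float(image_area) >= min_frac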

No module named 'tkinter'

When I run run_model.py, the console tells me: No module named 'tkinter'.
pip install tkinter returns a 404.

scene graphs

Could you provide the code for generating scene graphs in this JSON format?

Segmentation fault: 11

Hello,
I am getting the following output:

Segmentation fault: 11

After using the following command:

>> python run_model.py   --checkpoint sg2im-models/vg128.pt   --scene_graphs scene_graphs/figure_6_sheep.json   --output_dir outputs --device cpu

Environment:

  • OSX Monterey 12.0.1
  • MacBook Air 2018
  • Conda virtual environment
  • Python 3.5.6
  • Matplotlib installed via Conda
  • Rest of requirements installed via pip

Switching to eval mode after 100k iterations

Hi @jcjohnson,

Thanks for sharing the code and for the great work on this.

While looking at the training code, I came across the following line: https://github.com/google/sg2im/blob/master/scripts/train.py#L510

It seems that on this line you switch the model to eval mode (despite the presence of batchnorm/dropout layers) and create a new optimizer instance.

I'm wondering what the justification for this is, and whether you have found it to be useful for a particular reason?

Thank you so much for your time.

Best,
Amir

[Paper Confusion] Figure 3

[screenshot of Figure 3 from the paper]

I have read this many times, and one thing I cannot make sense of is that, given an edge (o3, r2, o2), the output vector v'3 should be calculated by the function gs, but in this figure it is calculated by go. Is there a mistake, or am I misreading something?

Undefined names: 'conv' and 'Variable()'

Undefined names have the potential to raise NameError at runtime...

flake8 testing of https://github.com/google/sg2im on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./sg2im/layers.py:57:28: F821 undefined name 'conv'
    nn.init.kaiming_normal(conv.weight)
                           ^
./sg2im/layers.py:59:29: F821 undefined name 'conv'
    nn.init.kaiming_uniform(conv.weight)
                            ^
./sg2im/losses.py:152:11: F821 undefined name 'Variable'
  x_hat = Variable(x_hat, requires_grad=True)
          ^
2     F821 undefined name 'conv'
1     F821 undefined name 'Variable'
3

Is this project still being worked upon?

I can see that this project was last updated 11 months ago. Is anyone still working on it?
I was planning to use it to generate and alter images while someone describes them.

Bug report: the model does not use any relationship in training.

Hi,
I may have found a major bug in the data loading code sg2im/data/vg.py, which hides all the relationships so that the model only sees the types and number of objects.

At line 66 of sg2im/data/vg.py, the __getitem__ function uses a Python set to store the indices of objects that are involved in relationships, keeps a certain number of objects, and then keeps the relationships whose subjects and objects were both kept. But the following simple example shows that this doesn't work:

import torch

s = set()
x = torch.LongTensor([1, 2, 3])

s.add(x[0])
x[0] in s  # False: tensors hash by object identity, so the lookup misses

When adding

assert len(triples) == 0

to line 134 of vg.py, the training goes through, which shows that the model does not see any relationship except for in_image.

When generating images with the pre-trained model vg128.pt, the following two scene graphs generate almost the same images.

{
      "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
      "relationships": [
        [0, "above", 1],
        [2, "standing on", 1],
        [3, "by", 2],
        [4, "behind", 2],
        [5, "by", 4],
        [6, "on", 1]
      ]   
}
{
      "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
      "relationships": [
        [0, "standing on", 1],
        [2, "standing on", 1],
        [3, "standing on", 2],
        [4, "standing on", 2],
        [5, "standing on", 4],
        [6, "standing on", 1]
      ] 
}

I recommend converting the PyTorch scalar tensor to a Python int before putting it into the Python set; the pre-trained models may also need to be updated.
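A sketch of that fix (the variable names here are illustrative, not the ones in vg.py):

# Store Python ints in the set rather than 0-dim tensors, so that
# hashing and equality behave by value instead of by tensor identity.
obj_idxs_with_rels = set()
for s, p, o in triples:
    obj_idxs_with_rels.add(int(s))
    obj_idxs_with_rels.add(int(o))

# Any later membership test must also convert to int:
# int(obj_idx) in obj_idxs_with_rels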

Thanks
