
nri's Introduction

Neural relational inference for interacting systems

This repository contains the official PyTorch implementation of:

Neural relational inference for interacting systems.
Thomas Kipf*, Ethan Fetaya*, Kuan-Chieh Wang, Max Welling, Richard Zemel.
https://arxiv.org/abs/1802.04687 (*: equal contribution)

Neural Relational Inference (NRI)

Abstract: Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.

Requirements

  • PyTorch 0.2 (0.3 breaks the simulation decoder)
  • Python 2.7 or 3.6

Data generation

To replicate the experiments on simulated physical data, first generate training, validation and test data by running:

cd data
python generate_dataset.py

This generates the springs dataset; use --simulation charged for charged particles.
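
For example, to generate the charged-particle dataset instead:

python generate_dataset.py --simulation charged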

Note: Make sure to use the same preprocessing and evaluation scripts (check the loss function as well) as in our code release to get comparable results.

Run experiments

From the project's root folder, simply run

python train.py

to train a Neural Relational Inference (NRI) model on the springs dataset. You can specify a different dataset by modifying the suffix argument: --suffix charged5 will run the model on the charged particle simulation with 5 particles (if it has been generated).
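
For example, after generating the corresponding data:

python train.py --suffix charged5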

To train the encoder or decoder separately, run

python train_enc.py

or

python train_dec.py

respectively. We provide a number of training options which are documented in the respective training files.
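
The options are defined as standard argparse flags, so they can also be listed directly from the command line:

python train.py --help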

Additionally, we provide code for an LSTM baseline (denoted LSTM (joint) in the paper), which you can run as follows:

python lstm_baseline.py

Cite

If you make use of this code in your own work, please cite our paper:

@article{kipf2018neural,
  title={Neural Relational Inference for Interacting Systems},
  author={Kipf, Thomas and Fetaya, Ethan and Wang, Kuan-Chieh and Welling, Max and Zemel, Richard},
  journal={arXiv preprint arXiv:1802.04687},
  year={2018}
}

nri's People

Contributors

ethanfetaya, loewex, tkipf


nri's Issues

How long does it take to generate data?

It took me nearly 8 hours to generate the data. Is that normal?
The CPU utilization is very low during the generation process, so I suppose the program could be further optimized.
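
Since each simulation is independent, one possible speed-up is to parallelize across CPU cores. A hypothetical sketch (not part of the repo; simulate_one is a stand-in for a single springs run):

    from multiprocessing import Pool

    import numpy as np

    def simulate_one(seed):
        # stand-in for one independent springs simulation
        rng = np.random.RandomState(seed)
        return rng.randn(49, 2, 5)  # [timesteps, (x, y), num_particles]

    if __name__ == "__main__":
        with Pool() as pool:  # defaults to all available CPU cores
            trajectories = pool.map(simulate_one, range(10000))
        data = np.stack(trajectories)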

What does the logits shape mean?

What does the shape of logits mean?
logits = encoder(pts, rel_rec, rel_send)

My pts: torch.Size([32, 14, 30, 3])

logits: torch.Size([32, 182, 3])

(For 14 atoms, 182 = 14 × 13 is the number of directed edges, and the last dimension indexes the edge types.)

prior

Hi, why is the prior uniformly distributed?

Support for large graphs?

Many thanks for the interesting work.
Indeed, I am trying to use your model on large biological graphs (more than 10K nodes) but I am facing memory limits.
Basically, you are using the one-hot encoding for all the edges in a fully connected graph to exchange the messages and to facilitate the optimization of the ELBO. For very large graphs such encoding is not an option.
I tried using sparse tensors but the missing strides for torch.matmul (requires contiguous representation for the data) and the unsupported broadcasting for matrix multiplication with torch.mm limited my efforts to patch your implementation.
Do you have any idea how we could extend the application of your model to large graphs?
Thank you very much in advance.
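
One possible direction (a minimal sketch under my own assumptions, not part of this repo) is to replace the dense one-hot rel_rec/rel_send matmuls with index-based gathers and scatters, which never materialize the one-hot matrices. Here edge_index is a hypothetical [2, num_edges] LongTensor of (sender, receiver) node indices:

    import torch

    def node2edge(x, edge_index):
        # x: [num_nodes, num_dims]; equivalent to the rel_send/rel_rec matmuls
        senders = x.index_select(0, edge_index[0])
        receivers = x.index_select(0, edge_index[1])
        return torch.cat([senders, receivers], dim=-1)

    def edge2node(messages, edge_index, num_nodes):
        # sum incoming messages per receiver, like rel_rec.t() @ messages
        out = torch.zeros(num_nodes, messages.size(-1), dtype=messages.dtype)
        return out.index_add(0, edge_index[1], messages)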

Results of charge experiment

Hi, I cannot reproduce the experimental results on the charged simulation dataset. The accuracy is only 50+% and I didn't modify the code (I only updated Variable() to fit newer PyTorch versions). Also, when I try to reproduce the results on the spring simulation dataset, the accuracy is poor when I do not apply --skip_first (only about 70%). Can you help me out? Thank you very much!

Error in running the simulation

Hi,

I could generate the data using this command:
python generate_dataset.py

But when I want to run this command:

--simulation charged

It gives me this error:

error: '--simulation' is not recognized as an internal or external command, operable program or batch file.

Sport UV dataset

Hi,

Thanks for your great work,

Can you provide the link or the sport basketball dataset you used in your paper?

You also mentioned that you focused on the PnR instances of the game. How to find these instances?

Best,

An important issue.

In the test phase, the encoder sees ground-truth data that it should not see, resulting in higher accuracy. May I ask for an explanation?

Unsupervised learning

In the Appendix, A.2., unsupervised learning was done:

To test whether our model can infer an empty graph, we create a test set of 1000 simulations with 5 non-interacting particles and test an unsupervised NRI model which was trained on the spring simulation dataset with 5 particles as before. We find that it achieves an accuracy of 98.4% in identifying "no interaction" edges (i.e. the empty graph).

Can someone point out how to do this unsupervised learning with the code in this repo?

About edge_accuracy() in utils.py

First, thanks a lot for sharing this great repo.
I have two questions with the computation of relation prediction accuracy:

  1. Suppose the model is trained and we only want to evaluate it. The reported accuracy can differ for different values of the batch-size parameter, even though the model itself does not change, especially when the number of test examples is not very large. The reason is that not all batches contain batch-size examples (when num_test_examples % batch_size != 0), yet each batch's accuracy is weighted equally. It would be better for edge_accuracy() in utils.py to return the accuracy and the number of examples in the batch, and to compute the overall average in the main script by dividing the accumulated totals.
  2. (If I understand correctly,) we (or you) do not care about the "absolute" class label; the task is more like clustering than classification. So, for the two-relation case, shouldn't the accuracy be max(acc, 1.0 - acc)? Also, do you have ideas on how to compute the accuracy with multiple (>2) relation types? (The current edge_accuracy() function seems suitable only for the two-relation case; see the sketch below.)
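
Regarding point 2, here is a sketch of a label-permutation-invariant accuracy (my own assumptions: preds and target are integer NumPy arrays of shape [num_examples, num_edges]); for more than two relation types, the best label assignment can be found with the Hungarian algorithm:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def permutation_invariant_accuracy(preds, target, num_types):
        # counts[i, j] = number of edges predicted as type i whose true type is j
        counts = np.zeros((num_types, num_types), dtype=np.int64)
        for i in range(num_types):
            for j in range(num_types):
                counts[i, j] = np.sum((preds == i) & (target == j))
        row, col = linear_sum_assignment(-counts)  # maximize matched counts
        return counts[row, col].sum() / float(target.size)

For two relation types this reduces to max(acc, 1.0 - acc), and accumulating the counts over all batches before dividing also removes the batch-size dependence raised in point 1.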

Is there a plan to release motion capture data in generate_dataset.py?

Hi, thanks for your really great code!
It seems you only implement the physics simulation datasets in your code. I want to apply it to reasoning in video/images, but I don't understand the meaning of the .npy files:

edges_valid_springs5.npy has shape (10000, 5, 5); what does the last (5, 5) mean for video?
loc_valid_springs5.npy has shape (10000, 49, 2, 5); what does the last (2, 5) mean for video?
vel_valid_springs5.npy has shape (10000, 49, 2, 5); what does the last (2, 5) mean for video?

Also, can those nodes be the output of a region proposal method like ROIAlign?
Looking forward to your reply.

Undirected latent graph

I was wondering if we can fix the latent graph to be an undirected graph. The schematic in Figure 1 suggests this would be possible, but I can't see an option for it in the code.

Request for Kuramoto dataset

According to Section 5.1 of the original paper, I use the code by Laszuk (https://github.com/laszukdawid/Dynamical-systems/blob/master/kuramoto.py) to simulate the Kuramoto model. The settings are listed as follows.

N = 5 # number of particles
intrinsic frequencies \omega uniformly sampled from [1, 10)
initial phases \phi uniformly sampled from [0, 2\pi)
coupling constants k_{ij} = 1 with probability 0.5
subsample factor = 10
length of trajectories T = 50
particle states x = (d\phi / dt, sin \phi, \omega)

For normalization, I use the function load_kuramoto_data from utils.py.

Some important settings of NRI are listed as follows.

encoder: CNN
decoder: MLP
skip_first = True
lr = 5e-4
prediction_step = 10 # teacher forcing in every 10-th time step

It seems I've strictly followed the settings of the original paper, but the accuracy gets stuck at around 54% and the MSE gets stuck at the level of 1e-1. There must be some mistake in the simulation or training. Do you have any advice? Would you mind providing a copy of the Kuramoto dataset to help me out?
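
For reference, a minimal sketch (not the authors' generator; the Euler integrator and the symmetrized coupling matrix are my own assumptions) of the Kuramoto dynamics described above, d\phi_i/dt = \omega_i + \sum_j k_{ij} sin(\phi_j - \phi_i):

    import numpy as np

    N, steps, dt = 5, 500, 0.01
    rng = np.random.RandomState(0)
    omega = rng.uniform(1, 10, size=N)       # intrinsic frequencies in [1, 10)
    phi = rng.uniform(0, 2 * np.pi, size=N)  # initial phases in [0, 2*pi)
    k = np.triu(rng.rand(N, N) < 0.5, 1).astype(float)
    k = k + k.T                              # k_ij = 1 with probability 0.5

    for _ in range(steps):
        coupling = (k * np.sin(phi[None, :] - phi[:, None])).sum(axis=1)
        phi = phi + dt * (omega + coupling)

Comparing trajectories from such a sketch against the normalization in load_kuramoto_data may help localize whether the mismatch is in simulation or training.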

non-interaction edge type

For a system in which two particles either interact or not, such as the spring experiments, suppose we use z_{ij} = [0, 1] to denote interaction and z_{ij} = [1, 0] to denote non-interaction (no message between nodes i and j). In the decoder, should we then only consider the interaction edge type, i.e. h^t_{(i,j)} = z_{ij,1} f_e([x^t_i, x^t_j]), since no messages pass along non-interaction edges? (This appears to be what the --skip_first option implements: in the MLPDecoder snippet quoted in a later issue below, the message-passing loop then starts at start_idx = 1, so the first edge type contributes no messages.)

relational inference in dynamic systems between different attributes

Hello,

I have read the paper and the code and I'm fascinated by this tool and its possible applications.

In my biological set-up I have different objects from which I want to create an interaction graph. Unfortunately, not all biological objects have the same number of attributes: e.g. fibrins have a defined morphometry but not a defined phenotype, and cells have a defined phenotype but not a defined morphology. I would like to know if there is any relation between them.

I have thought about creating an attribute vector containing all the features that are available. Following the example: fibrins would have an attribute vector containing their morphometry, leaving their phenotype undefined (filled with zeros or random numbers), and cells would have their phenotype defined, leaving their morphometry undefined.

Can you give me any suggestions about this approach based on your experience?

Thank you,
Daniel Jiménez.

For the type of edge in the experimental setup.

Training is unsupervised, so how do we know that the first edge type is the "edge exists" type and the second is the "no edge" type?

    def edge_accuracy(preds, target):
        _, preds = preds.max(-1)  # preds: torch.Size([32, 20, 2]) -> torch.Size([32, 20])
        correct = preds.float().data.eq(
            target.float().data.view_as(preds)).cpu().sum()
        return np.float(correct) / (target.size(0) * target.size(1))

Where can I find the code of Eq. 12 in the paper??

Below is the code snippet of MLPDecoder.
I think the prediction ends with Eq. 11 in the paper.
I can't find the code for Eq. 12.
Am I missing something in this code?

Thanks in advance.

    def single_step_forward(self, single_timestep_inputs, rel_rec, rel_send,
                            single_timestep_rel_type):

        # single_timestep_inputs has shape
        # [batch_size, num_timesteps, num_atoms, num_dims]

        # single_timestep_rel_type has shape:
        # [batch_size, num_timesteps, num_atoms*(num_atoms-1), num_edge_types]

        # Node2edge 
        receivers = torch.matmul(rel_rec, single_timestep_inputs)
        senders = torch.matmul(rel_send, single_timestep_inputs)
        # Eq 10 [x_i^t, x_j^t] [#sims(batch_size), #tsteps_indexed, #edges, #dims*2]
        pre_msg = torch.cat([senders, receivers], dim=-1)
        # self.msg_out_shape = #node_features
        all_msgs = Variable(torch.zeros(pre_msg.size(0), pre_msg.size(1),
                                        pre_msg.size(2), self.msg_out_shape))
        if single_timestep_inputs.is_cuda:
            all_msgs = all_msgs.cuda()

        if self.skip_first_edge_type:
            start_idx = 1
        else:
            start_idx = 0

        # Run separate MLP for every edge type
        # NOTE: To exclude one edge type, simply offset range by 1
        # Eq 10 MLP
        for i in range(start_idx, len(self.msg_fc2)):
            msg = F.relu(self.msg_fc1[i](pre_msg))
            msg = F.dropout(msg, p=self.dropout_prob)
            msg = F.relu(self.msg_fc2[i](msg))
            msg = msg * single_timestep_rel_type[:, :, :, i:i + 1] #element-wise product with broadcast
            all_msgs += msg

        # Aggregate all msgs to receiver
        # Eq 11 / rel_rec [#edges, #nodes]
        agg_msgs = all_msgs.transpose(-2, -1).matmul(rel_rec).transpose(-2, -1)
        agg_msgs = agg_msgs.contiguous()

        # Skip connection
        aug_inputs = torch.cat([single_timestep_inputs, agg_msgs], dim=-1)

        # Output MLP
        pred = F.dropout(F.relu(self.out_fc1(aug_inputs)), p=self.dropout_prob)
        pred = F.dropout(F.relu(self.out_fc2(pred)), p=self.dropout_prob)
        pred = self.out_fc3(pred)

        # Predict position/velocity difference / Eq 11 >> Where is Eq 12??
        return single_timestep_inputs + pred

    def forward(self, inputs, rel_type, rel_rec, rel_send, pred_steps=1):
        # NOTE: Assumes that we have the same graph across all samples.
        # Input shape: [num_sims, num_atoms, num_timesteps, num_dims] > [#sims, #tsteps, #nodes, #dims]
        inputs = inputs.transpose(1, 2).contiguous()

        sizes = [rel_type.size(0), inputs.size(1), rel_type.size(1),
                 rel_type.size(2)]
        rel_type = rel_type.unsqueeze(1).expand(sizes)

        time_steps = inputs.size(1)
        assert (pred_steps <= time_steps)
        preds = []

        # Only take n-th timesteps as starting points (n: pred_steps)
        last_pred = inputs[:, 0::pred_steps, :, :]
        curr_rel_type = rel_type[:, 0::pred_steps, :, :]
        # NOTE: Assumes rel_type is constant (i.e. same across all time steps).

        # Run n prediction steps / Eq 10~11
        for step in range(0, pred_steps):
            last_pred = self.single_step_forward(last_pred, rel_rec, rel_send,
                                                 curr_rel_type)
            preds.append(last_pred)

        sizes = [preds[0].size(0), preds[0].size(1) * pred_steps,
                 preds[0].size(2), preds[0].size(3)]

        output = Variable(torch.zeros(sizes))
        if inputs.is_cuda:
            output = output.cuda()

        # Re-assemble correct timeline
        for i in range(len(preds)):
            output[:, i::pred_steps, :, :] = preds[i]
        # last prediction is one step beyond input
        pred_all = output[:, :(inputs.size(1) - 1), :, :]

        return pred_all.transpose(1, 2).contiguous()
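
For what it's worth, Eq. 12 in the paper is the fixed-variance Gaussian output distribution; the decoder above only returns its mean (the single_timestep_inputs + pred line), so the Gaussian enters through the training loss rather than the decoder. A minimal sketch of such a fixed-variance Gaussian negative log-likelihood (names and normalization are my assumptions; compare the NLL loss used in train.py):

    def nll_gaussian(preds, target, variance):
        # NLL of target under N(preds, variance * I), dropping the constant
        # term; normalized per example and per atom
        neg_log_p = (preds - target) ** 2 / (2 * variance)
        return neg_log_p.sum() / (target.size(0) * target.size(1))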

How to reproduce some paper results

Hi, thanks for the code release.

To make sure that I am running the code properly, I am trying to reproduce some of the paper results. What's the correspondence between the results returned by the code and those reported in the paper? My understanding is as follows:

  • The values reported in Table 1 of the paper should be similar to np.mean(acc_test).
  • The values reported in Table 2 of the paper correspond to what in the code is called "mse". More precisely, in the code there are two similar variables referring to "mse" for test: mse_test and mean_mse. My understanding is that np.mean(mse_test) should be similar to the first column of Table 2 (because a prediction step of 1 is being used, see line 323 of train.py), and np.mean(mean_mse) should be similar to the third column of Table 2 (because a prediction step of 20 is being used, see line 351 of train.py).

Is this correct? Thank you!

Error in class MLP def forward

The step x = F.elu(self.fc1(inputs)) raises an error. When using forward in the MLP class, the error says "mat1 and mat2 shapes cannot be multiplied (640x16 and 196x512)".

Is it possible to learn more than 2 edge types in an unsupervised manner?

Hello, thank you for your great work and nice code.

I saw the supplementary material, and it says that NRI can learn 3 "known" edge types (no interaction, weak spring, strong spring).
In this sentence, does "known" mean that NRI can learn the relations only in a supervised manner, not in an unsupervised manner?
In the source code, is it right that relation-supervised training is not implemented?

Again, thank you for your great work!

Some difference from the paper

Dear ethanfetaya,

I have studied the code of RNNDecoder and found some differences from Equations (14)-(16) in your paper. In your code, you do not concatenate the MSG and x as the input of the GRU, and there is no additional hidden state. Why? Which is right?

my_softmax

Why does the my_softmax function seem to normalize along the batch dimension instead of the class dimension?
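
For comparison, a dimension-explicit softmax in current PyTorch makes the normalization axis unambiguous (the shapes below are my assumption):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(32, 20, 2)    # [batch, num_edges, num_edge_types]
    probs = F.softmax(logits, dim=-1)  # normalizes over edge types, not the batch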

some errors in train.py

Line 93: os.mkdir ----> os.makedirs
Line 46: default='logs' ----> default='./logs'

Not a big problem; just mentioning it here for others' convenience.

dynamic_graph

Hi, thanks for your outstanding work and contribution. I have a question: can we use a dynamic graph in the training step? If yes, can you give me some implementation guidance? Thank you very much!
