
nri's Introduction

Neural relational inference for interacting systems

This repository contains the official PyTorch implementation of:

Neural relational inference for interacting systems.
Thomas Kipf*, Ethan Fetaya*, Kuan-Chieh Wang, Max Welling, Richard Zemel.
https://arxiv.org/abs/1802.04687 (*: equal contribution)

Neural Relational Inference (NRI)

Abstract: Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.

Requirements

  • PyTorch 0.2 (0.3 breaks the simulation decoder)
  • Python 2.7 or 3.6

Data generation

To replicate the experiments on simulated physical data, first generate training, validation and test data by running:

cd data
python generate_dataset.py

This generates the springs dataset; use --simulation charged for charged particles.
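
For example, to generate the charged-particle dataset instead:

python generate_dataset.py --simulation charged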

Note: Make sure to use the same preprocessing and evaluation scripts (check the loss function as well) as in our code release to get comparable results.

Run experiments

From the project's root folder, simply run

python train.py

to train a Neural Relational Inference (NRI) model on the springs dataset. You can specify a different dataset by modifying the suffix argument: --suffix charged5 will run the model on the charged particle simulation with 5 particles (if it has been generated).
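
For example, after generating the corresponding data:

python train.py --suffix charged5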

To train the encoder or decoder separately, run

python train_enc.py

or

python train_dec.py

respectively. We provide a number of training options which are documented in the respective training files.
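
The options are defined as standard argparse flags, so they can also be listed directly from the command line:

python train.py --help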

Additionally, we provide code for an LSTM baseline (denoted LSTM (joint) in the paper), which you can run as follows:

python lstm_baseline.py

Cite

If you make use of this code in your own work, please cite our paper:

@article{kipf2018neural,
  title={Neural Relational Inference for Interacting Systems},
  author={Kipf, Thomas and Fetaya, Ethan and Wang, Kuan-Chieh and Welling, Max and Zemel, Richard},
  journal={arXiv preprint arXiv:1802.04687},
  year={2018}
}

nri's People

Contributors

ethanfetaya, loewex, tkipf


nri's Issues

How long does it take to generate data?

It took me nearly 8 hours to generate the data. Is that normal?
The CPU utilization is very low during the generation process, so I suppose the program could be further optimized.
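
Since each simulation is independent, one possible speed-up is to parallelize across CPU cores. A hypothetical sketch (not part of the repo; simulate_one is a stand-in for a single springs run):

    from multiprocessing import Pool

    import numpy as np

    def simulate_one(seed):
        # stand-in for one independent springs simulation
        rng = np.random.RandomState(seed)
        return rng.randn(49, 2, 5)  # [timesteps, (x, y), num_particles]

    if __name__ == "__main__":
        with Pool() as pool:  # defaults to all available CPU cores
            trajectories = pool.map(simulate_one, range(10000))
        data = np.stack(trajectories)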

What does the logits shape mean?

What does the shape of logits mean?
logits = encoder(pts, rel_rec, rel_send)

My pts: torch.Size([32, 14, 30, 3])

logits: torch.Size([32, 182, 3])

(For 14 atoms, 182 = 14 × 13 is the number of directed edges, and the last dimension indexes the edge types.)

prior

Hi, why is the prior uniformly distributed?

Support for large graphs?

Many thanks for the interesting work.
Indeed, I am trying to use your model on large biological graphs (more than 10K nodes) but I am facing memory limits.
Basically, you are using the one-hot encoding for all the edges in a fully connected graph to exchange the messages and to facilitate the optimization of the ELBO. For very large graphs such encoding is not an option.
I tried using sparse tensors but the missing strides for torch.matmul (requires contiguous representation for the data) and the unsupported broadcasting for matrix multiplication with torch.mm limited my efforts to patch your implementation.
Do you have any idea how we could extend the application of your model to large graphs?
Thank you very much in advance.
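
One possible direction (a minimal sketch under my own assumptions, not part of this repo) is to replace the dense one-hot rel_rec/rel_send matmuls with index-based gathers and scatters, which never materialize the one-hot matrices. Here edge_index is a hypothetical [2, num_edges] LongTensor of (sender, receiver) node indices:

    import torch

    def node2edge(x, edge_index):
        # x: [num_nodes, num_dims]; equivalent to the rel_send/rel_rec matmuls
        senders = x.index_select(0, edge_index[0])
        receivers = x.index_select(0, edge_index[1])
        return torch.cat([senders, receivers], dim=-1)

    def edge2node(messages, edge_index, num_nodes):
        # sum incoming messages per receiver, like rel_rec.t() @ messages
        out = torch.zeros(num_nodes, messages.size(-1), dtype=messages.dtype)
        return out.index_add(0, edge_index[1], messages)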

Results of charge experiment

Hi, I cannot reproduce the experimental results on the charged simulation dataset. The accuracy is only 50+% and I didn't modify the code (I only updated Variable() to fit newer PyTorch versions). Also, when I try to reproduce the results on the spring simulation dataset, the accuracy is poor when I do not apply --skip_first (only about 70%). Can you help me out? Thank you very much!

Error in running the simulation

Hi,

I could generate the data using this command:
python generate_dataset.py

But when I want to run this command:

--simulation charged

It gives me this error:

error: '--simulation' is not recognized as an internal or external command, operable program or batch file.

Sport UV dataset

Hi,

Thanks for your great work,

Can you provide the link or the sport basketball dataset you used in your paper?

You also mentioned that you focused on the PnR instances of the game. How to find these instances?

Best,

An important issue.

In the test phase, the encoder sees ground-truth data that it should not see, resulting in higher accuracy. May I ask for an explanation?

Unsupervised learning

In the Appendix, A.2., unsupervised learning was done:

To test whether our model can infer an empty graph, we create a test set of 1000 simulations with 5 non-interacting particles and test an unsupervised NRI model which was trained on the spring simulation dataset with 5 particles as before. We find that it achieves an accuracy of 98.4% in identifying "no interaction" edges (i.e. the empty graph).

Can someone point out how to do this unsupervised learning with the code in this repo?

About edge_accuracy() in utils.py

First, thanks a lot for sharing this great repo.
I have two questions with the computation of relation prediction accuracy:

  1. Suppose the model is trained and we only want to evaluate it. The reported accuracy can differ for different values of the batch-size parameter, even though the model itself does not change, especially when the number of test examples is not very large. The reason is that not all batches contain batch-size examples (when num_test_examples % batch_size != 0), yet each batch's accuracy is weighted equally. It would be better for edge_accuracy() in utils.py to return the accuracy and the number of examples in the batch, and to compute the overall average in the main script by dividing the accumulated totals.
  2. (If I understand correctly,) we (or you) do not care about the "absolute" class label; the task is more like clustering than classification. So, for the two-relation case, shouldn't the accuracy be max(acc, 1.0 - acc)? Also, do you have ideas on how to compute the accuracy with multiple (>2) relation types? (The current edge_accuracy() function seems suitable only for the two-relation case; see the sketch below.)
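
Regarding point 2, here is a sketch of a label-permutation-invariant accuracy (my own assumptions: preds and target are integer NumPy arrays of shape [num_examples, num_edges]); for more than two relation types, the best label assignment can be found with the Hungarian algorithm:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def permutation_invariant_accuracy(preds, target, num_types):
        # counts[i, j] = number of edges predicted as type i whose true type is j
        counts = np.zeros((num_types, num_types), dtype=np.int64)
        for i in range(num_types):
            for j in range(num_types):
                counts[i, j] = np.sum((preds == i) & (target == j))
        row, col = linear_sum_assignment(-counts)  # maximize matched counts
        return counts[row, col].sum() / float(target.size)

For two relation types this reduces to max(acc, 1.0 - acc), and accumulating the counts over all batches before dividing also removes the batch-size dependence raised in point 1.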

Is there a plan to release motion capture data in generate_dataset.py?

Hi, thanks for your really great code!
It seems you only implement the physics simulation datasets in your code. I want to apply it to reasoning in video/images, but I don't understand the meaning of the .npy files:

edges_valid_springs5.npy has shape (10000, 5, 5); what does the last (5, 5) mean for video?
loc_valid_springs5.npy has shape (10000, 49, 2, 5); what does the last (2, 5) mean for video?
vel_valid_springs5.npy has shape (10000, 49, 2, 5); what does the last (2, 5) mean for video?

Also, can those nodes be the output of a region proposal method like ROIAlign?
Looking forward to your reply.

Undirected latent graph

I was wondering if we can fix the latent graph to be an undirected graph. The schematic in Figure 1 suggests this would be possible, but I can't see an option for it in the code.

Request for Kuramoto dataset

According to Section 5.1 of the original paper, I use the code by Laszuk (https://github.com/laszukdawid/Dynamical-systems/blob/master/kuramoto.py) to simulate the Kuramoto model. The settings are listed as follows.

N = 5 # number of particles
intrinsic frequencies \omega uniformly sampled from [1, 10)
initial phases \phi uniformly sampled from [0, 2\pi)
coupling constants k_{ij} = 1 with probability 0.5
subsample factor = 10
length of trajectories T = 50
particle states x = (d\phi / dt, sin \phi, \omega)

For normalization, I use the function load_kuramoto_data from utils.py.

Some important settings of NRI are listed as follows.

encoder: CNN
decoder: MLP
skip_first = True
lr = 5e-4
prediction_step = 10 # teacher forcing in every 10-th time step

It seems I've strictly followed the settings of the original paper, but the accuracy gets stuck at around 54% and the MSE gets stuck at the level of 1e-1. There must be some mistake in the simulation or training. Do you have any advice? Would you mind providing a copy of the Kuramoto dataset to help me out?
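
For reference, a minimal sketch (not the authors' generator; the Euler integrator and the symmetrized coupling matrix are my own assumptions) of the Kuramoto dynamics described above, d\phi_i/dt = \omega_i + \sum_j k_{ij} sin(\phi_j - \phi_i):

    import numpy as np

    N, steps, dt = 5, 500, 0.01
    rng = np.random.RandomState(0)
    omega = rng.uniform(1, 10, size=N)       # intrinsic frequencies in [1, 10)
    phi = rng.uniform(0, 2 * np.pi, size=N)  # initial phases in [0, 2*pi)
    k = np.triu(rng.rand(N, N) < 0.5, 1).astype(float)
    k = k + k.T                              # k_ij = 1 with probability 0.5

    for _ in range(steps):
        coupling = (k * np.sin(phi[None, :] - phi[:, None])).sum(axis=1)
        phi = phi + dt * (omega + coupling)

Comparing trajectories from such a sketch against the normalization in load_kuramoto_data may help localize whether the mismatch is in simulation or training.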

non-interaction edge type

For a system in which two particles either interact or not, such as the spring experiments, suppose we use z_{ij} = [0, 1] to denote interaction and z_{ij} = [1, 0] to denote non-interaction (no message between nodes i and j). In the decoder, should we then only consider the interaction edge type, i.e. h^t_{(i,j)} = z_{ij,1} f_e([x^t_i, x^t_j]), since no messages pass along non-interaction edges? (This appears to be what the --skip_first option implements: in the MLPDecoder snippet quoted in a later issue below, the message-passing loop then starts at start_idx = 1, so the first edge type contributes no messages.)

relational inference in dynamic systems between different attributes

Hello,

I have read the paper and the code and I'm fascinated by this tool and its possible applications.

In my biological set-up I have different objects from which I want to create an interaction graph. Unfortunately, not all biological objects have the same number of attributes: e.g. fibrins have a defined morphometry but not a defined phenotype, and cells have a defined phenotype but not a defined morphology. I would like to know if there is any relation between them.

I have thought about creating an attribute vector containing all the features that are available. Following the example: fibrins would have an attribute vector containing their morphometry, leaving their phenotype undefined (filled with zeros or random numbers), and cells would have their phenotype defined, leaving their morphometry undefined.

Can you give me any suggestions about this approach based on your experience?

Thank you,
Daniel Jiménez.

For the type of edge in the experimental setup.

Training is unsupervised, so how do we know that the first edge type is the "edge exists" type and the second is the "no edge" type?

    def edge_accuracy(preds, target):
        _, preds = preds.max(-1)  # preds: torch.Size([32, 20, 2]) -> torch.Size([32, 20])
        correct = preds.float().data.eq(
            target.float().data.view_as(preds)).cpu().sum()
        return np.float(correct) / (target.size(0) * target.size(1))

Where can I find the code of Eq. 12 in the paper??

Below is the code snippet of MLPDecoder.
I think the prediction ends with Eq. 11 in the paper.
I can't find the code for Eq. 12.
Am I missing something in this code?

Thanks in advance.

    def single_step_forward(self, single_timestep_inputs, rel_rec, rel_send,
                            single_timestep_rel_type):

        # single_timestep_inputs has shape
        # [batch_size, num_timesteps, num_atoms, num_dims]

        # single_timestep_rel_type has shape:
        # [batch_size, num_timesteps, num_atoms*(num_atoms-1), num_edge_types]

        # Node2edge 
        receivers = torch.matmul(rel_rec, single_timestep_inputs)
        senders = torch.matmul(rel_send, single_timestep_inputs)
        # Eq 10 [x_i^t, x_j^t] [#sims(batch_size), #tsteps_indexed, #edges, #dims*2]
        pre_msg = torch.cat([senders, receivers], dim=-1)
        # self.msg_out_shape = #node_features
        all_msgs = Variable(torch.zeros(pre_msg.size(0), pre_msg.size(1),
                                        pre_msg.size(2), self.msg_out_shape))
        if single_timestep_inputs.is_cuda:
            all_msgs = all_msgs.cuda()

        if self.skip_first_edge_type:
            start_idx = 1
        else:
            start_idx = 0

        # Run separate MLP for every edge type
        # NOTE: To exclude one edge type, simply offset range by 1
        # Eq 10 MLP
        for i in range(start_idx, len(self.msg_fc2)):
            msg = F.relu(self.msg_fc1[i](pre_msg))
            msg = F.dropout(msg, p=self.dropout_prob)
            msg = F.relu(self.msg_fc2[i](msg))
            msg = msg * single_timestep_rel_type[:, :, :, i:i + 1] #element-wise product with broadcast
            all_msgs += msg

        # Aggregate all msgs to receiver
        # Eq 11 / rel_rec [#edges, #nodes]
        agg_msgs = all_msgs.transpose(-2, -1).matmul(rel_rec).transpose(-2, -1)
        agg_msgs = agg_msgs.contiguous()

        # Skip connection
        aug_inputs = torch.cat([single_timestep_inputs, agg_msgs], dim=-1)

        # Output MLP
        pred = F.dropout(F.relu(self.out_fc1(aug_inputs)), p=self.dropout_prob)
        pred = F.dropout(F.relu(self.out_fc2(pred)), p=self.dropout_prob)
        pred = self.out_fc3(pred)

        # Predict position/velocity difference / Eq 11 >> Where is Eq 12??
        return single_timestep_inputs + pred

    def forward(self, inputs, rel_type, rel_rec, rel_send, pred_steps=1):
        # NOTE: Assumes that we have the same graph across all samples.
        # Input shape: [num_sims, num_atoms, num_timesteps, num_dims] > [#sims, #tsteps, #nodes, #dims]
        inputs = inputs.transpose(1, 2).contiguous()

        sizes = [rel_type.size(0), inputs.size(1), rel_type.size(1),
                 rel_type.size(2)]
        rel_type = rel_type.unsqueeze(1).expand(sizes)

        time_steps = inputs.size(1)
        assert (pred_steps <= time_steps)
        preds = []

        # Only take n-th timesteps as starting points (n: pred_steps)
        last_pred = inputs[:, 0::pred_steps, :, :]
        curr_rel_type = rel_type[:, 0::pred_steps, :, :]
        # NOTE: Assumes rel_type is constant (i.e. same across all time steps).

        # Run n prediction steps / Eq 10~11
        for step in range(0, pred_steps):
            last_pred = self.single_step_forward(last_pred, rel_rec, rel_send,
                                                 curr_rel_type)
            preds.append(last_pred)

        sizes = [preds[0].size(0), preds[0].size(1) * pred_steps,
                 preds[0].size(2), preds[0].size(3)]

        output = Variable(torch.zeros(sizes))
        if inputs.is_cuda:
            output = output.cuda()

        # Re-assemble correct timeline
        for i in range(len(preds)):
            output[:, i::pred_steps, :, :] = preds[i]
        # last prediction is one step beyond input
        pred_all = output[:, :(inputs.size(1) - 1), :, :]

        return pred_all.transpose(1, 2).contiguous()
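
For what it's worth, Eq. 12 in the paper is the fixed-variance Gaussian output distribution; the decoder above only returns its mean (the single_timestep_inputs + pred line), so the Gaussian enters through the training loss rather than the decoder. A minimal sketch of such a fixed-variance Gaussian negative log-likelihood (names and normalization are my assumptions; compare the NLL loss used in train.py):

    def nll_gaussian(preds, target, variance):
        # NLL of target under N(preds, variance * I), dropping the constant
        # term; normalized per example and per atom
        neg_log_p = (preds - target) ** 2 / (2 * variance)
        return neg_log_p.sum() / (target.size(0) * target.size(1))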

How to reproduce some paper results

Hi, thanks for the code release.

To make sure that I am running the code properly, I am trying to reproduce some of the paper results. What's the correspondence between the results returned by the code and those reported in the paper? My understanding is as follows:

  • The values reported in Table 1 of the paper should be similar to np.mean(acc_test).
  • The values reported in Table 2 of the paper correspond to what in the code is called "mse". More precisely, in the code there are two similar variables referring to "mse" for test: mse_test and mean_mse. My understanding is that np.mean(mse_test) should be similar to the first column of Table 2 (because a prediction step of 1 is being used, see line 323 of train.py), and np.mean(mean_mse) should be similar to the third column of Table 2 (because a prediction step of 20 is being used, see line 351 of train.py).

Is this correct? Thank you!

Error in class MLP def forward

The step x = F.elu(self.fc1(inputs)) raises an error. When using forward in the MLP class, the error says "mat1 and mat2 shapes cannot be multiplied (640x16 and 196x512)".

Is it possible to learn more than 2 edge types in an unsupervised manner?

Hello, thank you for your great work and nice code.

I saw the supplementary material, and it says that NRI can learn 3 "known" edge types (no interaction, weak spring, strong spring).
In this sentence, does "known" mean that NRI can learn the relations only in a supervised manner, not in an unsupervised manner?
In the source code, is it right that relation-supervised training is not implemented?

Again, thank you for your great work!

Some difference from the paper

Dear ethanfetaya,

I have studied the code of RNNDecoder and found some differences from Equations (14)-(16) in your paper. In your code, you do not concatenate the MSG and x as the input of the GRU, and there is no additional hidden state. Why? Which is right?

my_softmax

Why does the my_softmax function seem to normalize along the batch dimension instead of the class dimension?
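
For comparison, a dimension-explicit softmax in current PyTorch makes the normalization axis unambiguous (the shapes below are my assumption):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(32, 20, 2)    # [batch, num_edges, num_edge_types]
    probs = F.softmax(logits, dim=-1)  # normalizes over edge types, not the batch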

some errors in train.py

Line 93: os.mkdir ----> os.makedirs
Line 46: default='logs' ----> default='./logs'

Not a big problem; just mentioning it here for others' convenience.

dynamic_graph

Hi, thanks for your outstanding work and contribution. I have a question: can we use a dynamic graph in the training step? If yes, can you give me some implementation guidance? Thank you very much!
