
rmn's Introduction

Related Memory Network (RMN)

  • End-to-end neural network architecture exploiting both memory network and relation network structures
  • State-of-the-art results on the jointly trained bAbI-10k story-based question answering tasks

Results

Error rate (%) per task on the jointly trained bAbI-10k story-based QA dataset (a task counts as failed when its error exceeds 5%):

Task          MemN2N   DMN+   RN     RMN
1             0.0      0.0    0.0    0.0
2             0.3      0.3    6.5    0.5
3             9.3      1.1    12.9   14.7
4             0.0      0.0    0.0    0.0
5             0.6      0.5    0.5    0.4
6             0.0      0.0    0.0    0.0
7             3.7      2.4    0.2    0.5
8             0.8      0.0    0.1    0.3
9             0.8      0.0    0.0    0.0
10            2.4      0.0    0.0    0.0
11            0.0      0.0    0.4    0.5
12            0.0      0.0    0.0    0.0
13            0.0      0.0    0.0    0.0
14            0.0      0.0    0.0    0.0
15            0.0      0.0    0.0    0.0
16            0.4      45.3   50.3   0.9
17            40.7     4.2    0.9    0.3
18            6.7      2.1    0.6    2.3
19            66.5     0.0    2.1    2.9
20            0.0      0.0    0.0    0.0
Mean error    6.6      2.8    3.7    1.2
Failed tasks  4        1      3      1

Prerequisites

  • Python 3.6
  • TensorFlow 1.3.0
  • Dependencies:
    • pip install tqdm colorlog

Usage

1. Prepare data

To preprocess the bAbI story-based QA dataset, run:

$ python preprocessor.py --data story

To preprocess the bAbI dialog dataset, run:

$ python preprocessor.py --data dialog

2. Train model

To train RMN on the bAbI story-based QA dataset, run:

$ python ./babi_story/train.py  

To train RMN on task 4 of the bAbI dialog dataset, run:

$ python ./babi_dialog/train.py --task 4 --embedding concat --word_embed_dim 50

To use match, the use_match flag is required:

$ python ./babi_dialog/train.py --task 4 --use_match True --embedding concat --word_embed_dim 50

To test on the OOV dataset, the is_oov flag is required:

$ python ./babi_dialog/train.py --task 4 --is_oov True --embedding concat --word_embed_dim 50

rmn's People

Contributors: goldenaem, inmoonlight


rmn's Issues

Some more work which might be relevant

Hi,

In your paper, one of the important points you make is that your method is O(n) whereas Relation nets are O(n^2).

In fact, if we write both architectures in a (very) simplified way (ignoring certain complications such as the softmax), we can see that RMNs are indeed a slight variation on relation nets, one which avoids this problem:

Relation net:
r(X) = sum_i(sum_j(f_1(f_2(x_i), f_3(x_j))))
For some f_1, f_2, f_3.
(In the case of the paper, f_2 and f_3 are the identity, and f_1 is an MLP on the concatenation of the inputs, which is additionally parameterized by p)

RMN (2-stage):
r(X) = sum_i(g_1(g_2(x_i), sum_j(g_3(x_j)))) i.e. we move the sum inside the bracket so we can re-use it for all i
For some g_1, g_2, g_3.
(In this case g_3 is attention with parameter p, g_1 is another attention, and g_2 is identity)
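To make the cost difference concrete, here is a minimal NumPy sketch of the two forms. The functions f and g are arbitrary stand-in nonlinearities, not the actual networks from either paper, and the pooling uses g_3 = identity:

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
X = rng.standard_normal((n, d))   # a set of n objects, each d-dimensional

def f(a, b):                      # stand-in for the RN pair function f_1
    return np.tanh(a + b)

def g(a, b):                      # stand-in for g_1 in the factored form
    return np.tanh(a + b)

# Relation-net form: f is evaluated on all n^2 ordered pairs.
r_rn = sum(f(X[i], X[j]) for i in range(n) for j in range(n))

# Factored form: the inner sum over j is computed once and reused for
# every i, so only O(n) evaluations of g are needed.
pooled = X.sum(axis=0)            # sum_j g_3(x_j), with g_3 = identity here
r_fac = sum(g(X[i], pooled) for i in range(n))

The two are not the same function in general (g only ever sees the pooled sum, never individual pairs); the point is that the factored form trades full pairwise interaction for linear cost.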

Note: in both cases we can turn the function r(X) into a transformation of the individual elements, rather than a representation of the whole set, simply by not taking the outer sum. This allows us to stack multiple layers on top of each other, as you do with your n-stage RMN.

While this similarity might seem superficial, since very different functions are used for f_1,f_2,f_3 and g_1,g_2,g_3, it turns out a lot of other architectures exist which also fall into this pattern. In particular, the self-attention used in https://arxiv.org/abs/1706.03762 is a better comparison to your architecture, as it uses attention mechanisms.

There have also been a few pieces of work that fall into this second category, in particular this paper: https://arxiv.org/abs/1703.06114 . I've also written a bit about these variants (as well as relation nets) here. Some of the notes in these might be useful, and it may be interesting to compare your architecture to theirs on their problems (or try theirs on yours), since ultimately they solve the same problem with very different architectures.
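For illustration, here is a stripped-down single-head dot-product self-attention in NumPy (no learned projections, masking, or multiple heads, so this is only a sketch of the shared pattern, not the full Transformer layer):

import numpy as np

def self_attention(X):
    # Pairwise scores, then a row-wise softmax over j: each output row is
    # sum_j(softmax_j(x_i . x_j / sqrt(d)) * x_j), i.e. the element-wise
    # version of the pattern above, with attention as the pairing function.
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X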

I hope that all made sense. I have some block diagrams of various architectures that might make things clearer, although they don't really show much on the conceptual level.

Best wishes,

Will

babi_story accuracy is incorrect for path-finding task

From babi_story/module.py:
final_pred = (
      tf.one_hot(tf.argmax(pred[0], axis=1), depth=self.answer_vocab_size) * answer_bool[0]
    + tf.one_hot(tf.argmax(pred[1], axis=1), depth=self.answer_vocab_size) * answer_bool[1]
    + tf.one_hot(tf.argmax(pred[2], axis=1), depth=self.answer_vocab_size) * answer_bool[2]
)

final_answer = a_s[0] * answer_bool[0] + a_s[1] * answer_bool[1] + a_s[2] * answer_bool[2]

The path-finding task depends on the order of the answers, but the code above makes the accuracy measurement independent of the order of pred[0], pred[1], pred[2] and a_s[0], a_s[1], a_s[2]: summing the masked one-hot vectors discards which slot each answer came from, so answers predicted in the wrong order can still be marked correct.
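A hedged sketch of an order-sensitive alternative, reusing the names from the snippet above and assuming pred[k] are logits, a_s[k] are one-hot answers, and answer_bool[k] is a 0/1 mask of shape [batch_size] (exact shapes in the repo may differ):

import tensorflow as tf

# Per-slot, order-sensitive match: the k-th prediction must equal the
# k-th gold answer, in position k.
slot_match = [
    tf.cast(tf.equal(tf.argmax(pred[k], axis=1),
                     tf.argmax(a_s[k], axis=1)), tf.float32)
    for k in range(3)
]
# An inactive slot (answer_bool[k] == 0) counts as trivially correct.
slot_ok = [m * answer_bool[k] + (1.0 - answer_bool[k])
           for k, m in enumerate(slot_match)]
# An example is correct only if every active slot matches in order.
accuracy = tf.reduce_mean(slot_ok[0] * slot_ok[1] * slot_ok[2])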
