thu-coai / ccm
This project is a tensorflow implementation of our work, CCM (Commonsense Conversational Model).
License: Apache License 2.0
I have tried this link (http://coai.cs.tsinghua.edu.cn/file/commonsense_conversation_dataset.tar.gz), but it returned a 404 error. Where can I download your dataset?
Hi tuxcow, I can't get the dataset from the download link. Could you provide a new download address, or a sample dataset? Thanks anyway!
Hi, I am very interested in your work, but I can't download the dataset from http://coai.cs.tsinghua.edu.cn/hml/dataset/#commonsense
Hi, thanks for your great work! I have one question about the data-preparation stage: how did you obtain the ConceptNet data and its TransE representations? Could you put that code on GitHub?
For example, if the post is "I like oranges", then more than one triple is extracted for the word "oranges". Which triple should be chosen when building the KG embeddings?
Hi, thanks for your great work! I'm a little confused when reading your code: which tensorflow versions are attention_decoder.py and dynamic_decoder.py based on?
Hi there,
Could you explain a little bit about the input data?
For example, what are "match_triples", "match_index", "post_triples", "all_entities", "response_triples", "all_triples"?
Thanks,
Serena
I have tried both the normal and FTP versions of the download listed here: http://coai.cs.tsinghua.edu.cn/hml/dataset/#commonsense
The download doesn't proceed past 274KB.
Thanks
Hi, I am very interested in your work, but I have some doubts about where the input data is and how you use ConceptNet. Thanks for sharing!
Hi tuxcow, I can't get the dataset from the download link. Any help would be appreciated, thanks.
Not the author, but I'm working on extending this paper for my master's thesis, so I've done some work decoding and recreating the input data; I think I can provide some insight.
match_triples are the triples in which an entity from the post and an entity from the response appear together in the same commonsense-knowledge (csk) triple.
match_index is a list with one [i, j] pair per response word: i is the number (as assigned in post_triples) of the post entity that matches the current word, and j is the position of the matching entity within that post entity's sub-list in all_entities. [-1, -1] is appended when the response word is not an entity or doesn't match anything in the post.
post_triples is a list with one value per post word: 0 means the word is not an entity, and values greater than 0 number the post's entities in order of first appearance, starting at 1.
all_entities is a list with one sub-list per post entity, giving all the entities on the other end of a csk triple from that post entity.
response_triples has one value per response word: -1 if the word is not an entity or doesn't match a csk triple, otherwise the index of the matched triple that links a post entity to this response word.
all_triples is a list with one sub-list per post entity, giving the indices of every csk triple that entity appears in.
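To make these fields concrete, here is a hypothetical miniature example I put together (not taken from the dataset; the numeric indices are invented for illustration, where in real data they would index into the dictionaries in resource.txt):

```python
# Hypothetical miniature example of the preprocessed fields.
# All indices below are made up for illustration.
sample = {
    "post": ["i", "like", "oranges"],             # "oranges" is post entity #1
    "post_triples": [0, 0, 1],                    # 0 = not an entity
    "all_triples": [[17, 42]],                    # triples containing "oranges"
    "all_entities": [[305, 880]],                 # other ends of those triples
    "response": ["fruit", "is", "healthy"],       # suppose "fruit" has entity index 305
    "response_triples": [17, -1, -1],             # triple 17 links oranges and fruit
    "match_index": [[1, 0], [-1, -1], [-1, -1]],  # [post entity #1, position 0]
    "match_triples": [17],                        # all triples matched post<->response
}
```

Note how the per-word lists (post_triples, response_triples, match_index) have the same length as the post or response, while all_triples/all_entities have one sub-list per post entity.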
Below is the script I've written to recreate the training data. It seems to output more entities in all_entities than the authors' data does, but that shouldn't break anything as far as I can tell. If you find a bug in the script, please let me know so I can update it on my end.
import json

test = {"post": ["you", "mean", "the", "occupation", "that", "did", "happen", "?"], "response": ["no", "i", "mean", "the", "fighting", "invasion", "that", "the", "military", "made", "so", "many", "purple", "hearts", "for", "in", "anticipation", "for", "that", "we", "have", "n't", "used", "up", "to", "this", "day", "."]}

# resource.txt ships with the dataset; it maps words to entity indices
# (dict_csk_entities), triples to indices (dict_csk_triples), and lists
# the raw triples (csk_triples) as "head, relation, tail" strings.
with open('resource.txt') as f:
    data = json.load(f)

# Build two lookups over the commonsense-knowledge (csk) triples:
#   postEntityToCSKTripleIndex: entity -> indices of every triple it appears in
#   postEntityToOtherCSKTripleEntities: entity -> entity index of the other
#   end of each of those triples
data['postEntityToCSKTripleIndex'] = {}
data['postEntityToOtherCSKTripleEntities'] = {}
for index, triple in enumerate(data['csk_triples']):
    firstEntity = triple.split(',')[0]
    secondEntity = triple.split(',')[2].strip()
    data['postEntityToCSKTripleIndex'].setdefault(firstEntity, []).append(index)
    data['postEntityToCSKTripleIndex'].setdefault(secondEntity, []).append(index)
    data['postEntityToOtherCSKTripleEntities'].setdefault(firstEntity, []).append(
        data['dict_csk_entities'][secondEntity])
    data['postEntityToOtherCSKTripleEntities'].setdefault(secondEntity, []).append(
        data['dict_csk_entities'][firstEntity])

# Reverse lookup from triple index back to the triple string.
data['indexToCSKTriple'] = {v: k for k, v in data['dict_csk_triples'].items()}

# Number the post's entities 1, 2, ... and collect, for each one, the csk
# triples it appears in and the entities on the other end of those triples.
post_triples = []
all_triples = []
all_entities = []
index = 0
for word in test['post']:
    try:
        data['dict_csk_entities'][word]  # raises KeyError if not an entity
        index += 1
        post_triples.append(index)
        all_triples.append(data['postEntityToCSKTripleIndex'][word])
        all_entities.append(data['postEntityToOtherCSKTripleEntities'][word])
    except KeyError:
        post_triples.append(0)  # 0 marks a non-entity word
test['post_triples'] = post_triples
test['all_triples'] = all_triples
test['all_entities'] = all_entities

# For each response word, find the first post entity it shares a triple with.
response_triples = []
match_index = []
match_triples = []
for word in test['response']:
    found = False
    try:
        entityIndex = data['dict_csk_entities'][word]
        for index, entitiesList in enumerate(test['all_entities']):
            for subindex, entity in enumerate(entitiesList):
                if entity == entityIndex:
                    match_index.append([index + 1, subindex])
                    response_triples.append(test['all_triples'][index][subindex])
                    match_triples.append(test['all_triples'][index][subindex])
                    found = True
                    break
            if found:
                break  # record only the first match for this word
    except KeyError:
        pass  # the word is not an entity
    if not found:
        response_triples.append(-1)
        match_index.append([-1, -1])
test['response_triples'] = response_triples
test['match_index'] = match_index
test['match_triples'] = match_triples
print(str(test))
Originally posted by @andrewtackett in https://github.com/tuxchow/ccm/issues/3#issuecomment-461907771