stranse's Introduction

STransE: a novel embedding model of entities and relationships in knowledge bases

This STransE program provides the implementation of the embedding model STransE for knowledge base completion, as described in my NAACL-HLT 2016 paper:

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2016, pp. 460-466. [.bib]

Please cite my NAACL-HLT 2016 paper whenever STransE is used to produce published results or incorporated into other software.

The program also provides the implementation of the embedding model TransE. See an overview of embedding models of entities and relationships for knowledge base completion at HERE.

I would highly appreciate to have your bug reports, comments and suggestions about STransE. As a free open-source implementation, STransE is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Usage

Compile the program

Suppose that g++ is already set to run in command line or terminal. After you clone or download (and then unzip) the program, you have to compile the program by executing:

SOURCE_DIR$ g++ -I ../SOURCE_DIR/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Note that the actual command starts from g++. Here SOURCE_DIR is simply used to denote the source code directory. Examples:

STransE$ g++ -I ../STransE/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

STransE-master$ g++ -I ../STransE-master/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Run the program

To run the program, we perform:

$./STransE -model 1_OR_0 -data CORPUS_DIR_PATH -size <int> -l1 1_OR_0 -margin <double> -lrate <double> [-init 1_OR_0] [-nepoch <int>] [-evalStep <int>] [-nthreads <int>]

//For Windows OS: use ./STransE.exe instead of ./STransE

where hyper-parameters in [ ] are optional!

Required parameters:

-model: Specify the embedding model STransE or TransE. It gets value 1 or 0, where 1 denotes STransE while 0 denotes TransE.

-data: Specify path to the dataset directory. Find the dataset format instructions in the Datasets folder inside the source code directory.

-size: Specify the number of vector dimensions.

-l1: Specify the L1 or L2 norm. It gets value 1 or 0, where 1 denotes L1-norm while 0 denotes L2-norm.

-margin: Specify the margin hyper-parameter.

-lrate: Specify the SGD learning rate.

Optional parameters:

-init: Use when -model gets value 1 (i.e. for STransE). It gets value 1 or 0 in which the default value is 1. The value 1 means that the entity and relation vectors are initialized from external files (e.g. entity2vec.init and relation2vec.init in the Datasets folder inside the source code directory), while the value 0 means that the entity and relation vectors are randomly initialized.

-nepoch: Specify the number of training epochs. The default value is 2000.

-evalStep: Specify a step to save and evaluate the model, e.g., evaluating the model after each step of 500 training epochs. The default value is 2000.

-nthreads: Specify the number of multiple threads used for evaluation. The default value is 1. Note that evaluating link/entity prediction in knowledge bases is slow. If you can afford to run the program with many threads, the evaluation process will be much faster, thus you can even evaluate the model after each training epoch.

Evaluation metrics

For evaluating link/entity prediction, the program provides ranking-based scores as evaluation metrics, including the mean rank, the mean reciprocal rank, Hits@1, Hits@5 and Hits@10 in two setting protocols "Raw" and "Filtered".

Reproduce the STransE results

To reproduce the STransE results published in my NAACL-HLT 2016 paper, execute:

$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005

$ ./STransE -model 1 -data Datasets/FB15k/ -size 100 -margin 1 -l1 1 -lrate 0.0001

stranse's People

Contributors

Stargazers

Watchers

stranse's Issues

Expected input format

What are the expected input formats of STransE? I am trying to recreate the paper results with the WN18RR dataset, but I keep getting a segmentation fault.

STransE -model 1 -data ../relationPrediction/data/WN18RR -size 50 -margin 5 -l1 1 -lrate 0.0005
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Model: STransE
Dataset: ../relationPrediction/data/WN18RR
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005

Program received signal SIGSEGV, Segmentation fault.
_IO_vfscanf_internal (s=0x0, format=0x412f36 "%s%d", argptr=argptr@entry=0x7fffffffdf78, errp=errp@entry=0x0) at vfscanf.c:347
347     vfscanf.c: No such file or directory.

I am attempting to init embeddings for this project:
[1] https://deepakn97.github.io/blog/2019/Knowledge-Base-Relation-Prediction/
[2] https://github.com/deepakn97/relationPrediction/

The file formats are identical, save for the .init files:

entity2id.txt
relation2id.txt
train.txt
valid.txt
test.txt

# Seg fault with identically formatted train.txt, valid.txt, test.txt, relation2id.txt, entity2id.txt
$ ./STransE -model 1 -data ../relationPrediction/data/WN18RR/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: ../relationPrediction/data/WN18RR/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 11
#entities = 40943
Segmentation fault (core dumped)


# Working
$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: Datasets/WN18/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 18
#entities = 40943
Optimize entity vectors, relation vectors and relation matrices:

datquocnguyen / stranse Goto Github PK

stranse's Introduction

STransE: a novel embedding model of entities and relationships in knowledge bases

Usage

Compile the program

Run the program

Evaluation metrics

Reproduce the STransE results

stranse's People

Contributors

Stargazers

Watchers

Forkers

stranse's Issues

Expected input format

Could you provide the hyper-parameters for training the TransE model?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent