stranse's Introduction

STransE: a novel embedding model of entities and relationships in knowledge bases

This STransE program provides the implementation of the embedding model STransE for knowledge base completion, as described in my NAACL-HLT 2016 paper:

Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson. 2016. STransE: a novel embedding model of entities and relationships in knowledge bases. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2016, pp. 460-466. [.bib]

Please cite my NAACL-HLT 2016 paper whenever STransE is used to produce published results or incorporated into other software.

The program also provides an implementation of the embedding model TransE. An overview of embedding models of entities and relationships for knowledge base completion is available HERE.

I would greatly appreciate your bug reports, comments and suggestions about STransE. As a free open-source implementation, STransE is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Usage

Compile the program

These instructions assume that g++ is available on the command line or terminal. After you clone or download (and then unzip) the program, compile it by executing:

SOURCE_DIR$ g++ -I ../SOURCE_DIR/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Note that the actual command starts from g++. Here SOURCE_DIR is simply used to denote the source code directory. Examples:

STransE$ g++ -I ../STransE/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

STransE-master$ g++ -I ../STransE-master/ STransE.cpp -o STransE -O2 -fopenmp -lpthread

Run the program

To run the program, execute:

$ ./STransE -model 1_OR_0 -data CORPUS_DIR_PATH -size <int> -l1 1_OR_0 -margin <double> -lrate <double> [-init 1_OR_0] [-nepoch <int>] [-evalStep <int>] [-nthreads <int>]

//For Windows OS: use ./STransE.exe instead of ./STransE

Hyper-parameters in [ ] are optional.

Required parameters:

-model: Specify the embedding model, STransE or TransE. It takes the value 1 or 0, where 1 selects STransE and 0 selects TransE.

-data: Specify the path to the dataset directory. The dataset format instructions are in the Datasets folder inside the source code directory.

-size: Specify the number of vector dimensions.

-l1: Specify the L1 or L2 norm. It takes the value 1 or 0, where 1 selects the L1-norm and 0 selects the L2-norm.

-margin: Specify the margin hyper-parameter.

-lrate: Specify the SGD learning rate.
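
To make the roles of -model, -l1 and -margin more concrete, here is a minimal C++ sketch of the score functions as defined in the TransE and STransE papers, together with the usual margin-based ranking loss for one correct/corrupted triple pair. It is an illustration only and is not taken from STransE.cpp; the names Vec, Mat, matVec, norm, scoreTransE, scoreSTransE and marginLoss are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: W[i] is row i

// Multiply a matrix by a vector (sizes are assumed to match).
Vec matVec(const Mat& W, const Vec& x) {
    Vec y(W.size(), 0.0);
    for (std::size_t i = 0; i < W.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j];
    return y;
}

// L1 norm if useL1 is true, otherwise L2 norm (the choice made by -l1).
double norm(const Vec& v, bool useL1) {
    double s = 0.0;
    for (double x : v) s += useL1 ? std::fabs(x) : x * x;
    return useL1 ? s : std::sqrt(s);
}

// TransE score: || h + r - t ||. Lower means more plausible.
double scoreTransE(const Vec& h, const Vec& r, const Vec& t, bool useL1) {
    Vec d(h.size());
    for (std::size_t i = 0; i < h.size(); ++i) d[i] = h[i] + r[i] - t[i];
    return norm(d, useL1);
}

// STransE score: || W1 h + r - W2 t ||, with two relation-specific matrices.
double scoreSTransE(const Mat& W1, const Mat& W2, const Vec& h, const Vec& r,
                    const Vec& t, bool useL1) {
    const Vec a = matVec(W1, h);
    const Vec b = matVec(W2, t);
    Vec d(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) d[i] = a[i] + r[i] - b[i];
    return norm(d, useL1);
}

// Margin-based ranking loss for one (correct, corrupted) pair:
// zero once the corrupted triple scores at least `margin` worse.
double marginLoss(double margin, double scoreCorrect, double scoreCorrupted) {
    return std::max(0.0, margin + scoreCorrect - scoreCorrupted);
}
```

With identity matrices for W1 and W2, scoreSTransE returns the same value as scoreTransE, which is why TransE can be viewed as a special case of STransE.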

Optional parameters:

-init: Used only when -model is 1 (i.e. for STransE). It takes the value 1 or 0, and the default value is 1. The value 1 means that the entity and relation vectors are initialized from external files (e.g. entity2vec.init and relation2vec.init in the Datasets folder inside the source code directory), while the value 0 means that the entity and relation vectors are randomly initialized.

-nepoch: Specify the number of training epochs. The default value is 2000.

-evalStep: Specify the interval, in training epochs, at which the model is saved and evaluated, e.g., a value of 500 evaluates the model after every 500 training epochs. The default value is 2000.

-nthreads: Specify the number of threads used for evaluation. The default value is 1. Note that evaluating link/entity prediction in knowledge bases is slow. If you can afford to run the program with many threads, evaluation will be much faster, so you can even evaluate the model after each training epoch.

Evaluation metrics

For evaluating link/entity prediction, the program reports ranking-based scores: the mean rank, the mean reciprocal rank, Hits@1, Hits@5 and Hits@10, each under two evaluation settings, "Raw" and "Filtered".

Reproduce the STransE results

To reproduce the STransE results published in my NAACL-HLT 2016 paper, execute:

$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005

$ ./STransE -model 1 -data Datasets/FB15k/ -size 100 -margin 1 -l1 1 -lrate 0.0001

stranse's Issues

Expected input format

What are the expected input formats of STransE? I am trying to recreate the paper results with the WN18RR dataset, but I keep getting a segmentation fault.

STransE -model 1 -data ../relationPrediction/data/WN18RR -size 50 -margin 5 -l1 1 -lrate 0.0005
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Model: STransE
Dataset: ../relationPrediction/data/WN18RR
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005

Program received signal SIGSEGV, Segmentation fault.
_IO_vfscanf_internal (s=0x0, format=0x412f36 "%s%d", argptr=argptr@entry=0x7fffffffdf78, errp=errp@entry=0x0) at vfscanf.c:347
347     vfscanf.c: No such file or directory.

I am attempting to init embeddings for this project:
[1] https://deepakn97.github.io/blog/2019/Knowledge-Base-Relation-Prediction/
[2] https://github.com/deepakn97/relationPrediction/

The file formats are identical, save for the .init files:

  • entity2id.txt
  • relation2id.txt
  • train.txt
  • valid.txt
  • test.txt

# Seg fault with identically formatted train.txt, valid.txt, test.txt, relation2id.txt, entity2id.txt
$ ./STransE -model 1 -data ../relationPrediction/data/WN18RR/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: ../relationPrediction/data/WN18RR/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 11
#entities = 40943
Segmentation fault (core dumped)


# Working
$ ./STransE -model 1 -data Datasets/WN18/ -size 50 -margin 5 -l1 1 -lrate 0.0005
Model: STransE
Dataset: Datasets/WN18/
Number of epoches: 2000
Vector size: 50
Margin: 5
L1-norm: 1
SGD learing rate: 0.0005
#relations = 18
#entities = 40943
Optimize entity vectors, relation vectors and relation matrices:
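
One possible reading of the gdb trace above: `s=0x0` in `_IO_vfscanf_internal` means a null stream reached fscanf, which usually happens when a file could not be opened. Since -init defaults to 1 and the WN18RR directory has no .init files, a missing file may be the cause here, though the source does not confirm this. The C++ sketch below is a hypothetical pre-flight check, not part of STransE; the function name missingDatasetFiles is made up, and the file names are those listed in this issue plus the two .init files named in the -init description.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Return the expected dataset files that cannot be opened under dataDir.
// needInitFiles should be true when running with -model 1 and -init 1.
std::vector<std::string> missingDatasetFiles(const std::string& dataDir,
                                             bool needInitFiles) {
    std::vector<std::string> names = {"entity2id.txt", "relation2id.txt",
                                      "train.txt", "valid.txt", "test.txt"};
    if (needInitFiles) {
        names.push_back("entity2vec.init");
        names.push_back("relation2vec.init");
    }

    // Add a trailing slash if the caller left it out.
    std::string prefix = dataDir;
    if (!prefix.empty() && prefix.back() != '/') prefix += '/';

    std::vector<std::string> missing;
    for (const std::string& name : names) {
        std::ifstream in(prefix + name);
        if (!in.good()) missing.push_back(name);
    }
    return missing;
}
```

If the returned list contains only the two .init files, rerunning with -init 0 (random initialization) would be one thing to try.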
