

DetIE: Multilingual Open Information Extraction Inspired by Object Detection

This repository contains the code for the paper DetIE: Multilingual Open Information Extraction Inspired by Object Detection by Michael Vasilkovsky, Anton Alekseev, Valentin Malykh, Ilya Shenbin, Elena Tutubalina, Dmitriy Salikhov, Mikhail Stepnov, Andrei Chertok and Sergey Nikolenko.

Disclaimers

All results were obtained on an NVIDIA V100 GPU with CUDA 10.1.

Preparations

Download the file bundle from here. Each item should be placed into the corresponding directory:

  1. folder version_243 (DetIE_LSOIE) should be copied to: results/logs/default/version_243;
  2. folder version_263 (DetIE_IMoJIE) should be copied to: results/logs/default/version_263;
  3. files imojie_train_pattern.json, lsoie_test10.json and lsoie_train10.json should be copied to data/wikidata.
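The placement steps above can be sketched as a small helper script. The destination paths are the ones listed above; the assumption that the bundle unpacks into a single flat directory containing these items is ours.

```python
import shutil
from pathlib import Path

# Destination mapping taken from the list above; the flat layout of the
# downloaded bundle is an assumption.
BUNDLE_LAYOUT = {
    "version_243": "results/logs/default/version_243",
    "version_263": "results/logs/default/version_263",
    "imojie_train_pattern.json": "data/wikidata/imojie_train_pattern.json",
    "lsoie_test10.json": "data/wikidata/lsoie_test10.json",
    "lsoie_train10.json": "data/wikidata/lsoie_train10.json",
}

def place_bundle(bundle_dir: str, repo_root: str = ".") -> None:
    """Copy each bundle item into its expected location under the repo root."""
    for name, target in BUNDLE_LAYOUT.items():
        src = Path(bundle_dir) / name
        dst = Path(repo_root) / target
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
```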

We suggest using the provided Dockerfile to handle all of this project's dependencies.

For example, clone this repository, then run:

cd DetIE/
docker build -t detie .
nvidia-docker run  -p 8808:8808 -it detie:latest bash

Once the Docker container starts, we are ready to work.

Taking a minute to read the configs

This project uses the Hydra library for storing and changing the system's metadata. The entry point for the arguments used when running the scripts is the config/config.yaml file.

defaults:
  - model: detie-cut
  - opt: adam
  - benchmark: carb

model points to the config/model/... subdirectory; please see detie-cut.yaml for descriptions of the parameters.

opt/adam.yaml and benchmark/carb.yaml are example configurations for the optimizer and the benchmark, respectively.

If you want to change some parameters (e.g. max_epochs) without modifying the *.yaml files, just run e.g.

PYTHONPATH=. python some_..._script.py model.max_epochs=2
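Under the hood, Hydra parses each `key=value` argument as a dotted path into the merged config. A minimal pure-Python sketch of that override mechanism (illustrative only, not Hydra's actual implementation):

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply a Hydra-style 'a.b.c=value' override to a nested dict in place."""
    path, _, raw = override.partition("=")
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Hydra infers value types; here we only handle ints as an illustration.
    node[keys[-1]] = int(raw) if raw.lstrip("-").isdigit() else raw

cfg = {"model": {"max_epochs": 100, "name": "detie-cut"}}
apply_override(cfg, "model.max_epochs=2")
# cfg["model"]["max_epochs"] is now 2; other keys are untouched
```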

Training

PYTHONPATH=. python3 modules/model/train.py

Inference time

PYTHONPATH=. python3 modules/model/test.py model.best_version=243

This reports the time in seconds for running inference on modules/model/evaluation/oie-benchmark-stanovsky/raw_sentences/all.txt with a batch size of 32.

Expect about 708.6 sentences/sec on an NVIDIA Tesla V100 GPU.
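The reported figure is a throughput, i.e. the number of input sentences divided by the wall-clock inference time. A trivial sketch of the computation (the sentence count and timing below are hypothetical, not taken from the script):

```python
def throughput(n_sentences: int, elapsed_seconds: float) -> float:
    """Sentences processed per second of wall-clock time."""
    return n_sentences / elapsed_seconds

# Hypothetical numbers: 3200 sentences processed in 4.52 s
rate = throughput(3200, 4.52)  # roughly 708 sentences/sec
```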

Evaluation

English sentences

To apply the model to CaRB sentences, run

cd modules/model/evaluation/carb-openie6/
PYTHONPATH=<repo root> python3 detie_predict.py
head -5 systems_output/detie243_output.txt

This will save the predictions into the modules/model/evaluation/carb-openie6/systems_output/ directory. Do the same with modules/model/evaluation/carb-openie6/detie_conj_predictions.py.

To reproduce the DetIE numbers from Table 3 in the paper, run

cd modules/model/evaluation/carb-openie6/
./eval.sh
  • detie243 is a codename for DetIE_{LSOIE}
  • detie243conj is a codename for DetIE_{LSOIE} + IGL-CA
  • detie263 is a codename for DetIE_{IMoJIE}
  • detie263conj is a codename for DetIE_{IMoJIE} + IGL-CA

Synthetic data

To generate sentences from Wikidata triplets, first download the triplets, then run the generation script:

PYTHONPATH=. python3 modules/scripts/data/download_wikidata_triplets.py wikidata.lang=<lang>
PYTHONPATH=. python3 modules/scripts/data/generate_sentences_from_triplets.py wikidata.lang=<lang>
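As an illustration of the triplet-to-sentence idea behind these scripts, a simplified sketch is below. The language-keyed templates are our assumption; the actual generation logic in generate_sentences_from_triplets.py may differ.

```python
# Hypothetical, simplified templates keyed by language code.
TEMPLATES = {
    "en": "{subj} {rel} {obj}.",
    "de": "{subj} {rel} {obj}.",
}

def triplet_to_sentence(subj: str, rel: str, obj: str, lang: str = "en") -> str:
    """Render a (subject, relation, object) triplet as a plain sentence."""
    return TEMPLATES[lang].format(subj=subj, rel=rel, obj=obj)

sentence = triplet_to_sentence("The Boeing Company", "produces", "747")
# -> "The Boeing Company produces 747."
```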

Cite

Please cite the original paper if you use this code.

@inproceedings{Vasilkovsky2022detie,
   author    = {Michael Vasilkovsky and Anton Alekseev and Valentin Malykh and Ilya Shenbin and
                Elena Tutubalina and Dmitriy Salikhov and Mikhail Stepnov and Andrei Chertok and
                Sergey Nikolenko},
   title     = {{DetIE: Multilingual Open Information Extraction Inspired by Object Detection}},
   booktitle = {Proceedings of the 36th {AAAI} Conference on Artificial Intelligence},
   year      = {2022}
}

Contact

Michael Vasilkovsky waytobehigh (at) gmail (dot) com


Issues

Error(s) in loading state_dict for TripletsExtractorBERTOnly

Hi,

I tried to run a demo without training. Here is demo.py (placed in the repo root, DetIE/):

from modules.model.models import TripletsExtractorBERTOnly
from modules.model.apply import pprint_triplets

TEST_TEXTS = [
    "This is how Agent Archer received a medal from the US.",
    "Alexander Pushkin was shot at a duel in St. Petersburg.",
    "The assault rifle is one of the very popular weapons in the Sahara desert.",
    "Bill Gates owns Microsoft Corporation.",
    "Lindon fictional universe where entity is from legendarium, The Boeing Company produces 747, "
    "population of Ukraine has census Ukrainian Census (2020), Bellecour is on Lyon Metro Line D, "
    "Vierwaldstättersee uses transport by boat, Wang Mang follows religion Confucianism.",
    "Фоллаут есть отличная постапокалиптическая игра",
    "Roses are red. Violets are blue. Cleaning automatic rifles is the only thing I ever do.",
]


def main():
    best_ckpt_path = "results/logs/default/version_243/checkpoints/best.ckpt"
    best_hparams_path = "results/logs/default/version_243/hparams.yaml"
    model = TripletsExtractorBERTOnly.load_from_checkpoint(checkpoint_path=best_ckpt_path, hparams_file=best_hparams_path)
    # texts = list(cfg.model.viz_sentences)
    triplets = model.predict(list(TEST_TEXTS))
    pprint_triplets(TEST_TEXTS, triplets)

if __name__ == "__main__":
    main()

However, I got the following error from TripletsExtractorBERTOnly:

Traceback (most recent call last):
  File "E:/github/DetIE/demo.py", line 44, in <module>
    main()
  File "E:/github/DetIE/demo.py", line 26, in main
    model = TripletsExtractorBERTOnly.load_from_checkpoint(checkpoint_path=best_ckpt_path, hparams_file=best_hparams_path)
  File "D:\anaconda3\lib\site-packages\pytorch_lightning\core\saving.py", line 156, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "D:\anaconda3\lib\site-packages\pytorch_lightning\core\saving.py", line 204, in _load_model_state
    keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "C:\Users\hp\AppData\Roaming\Python\Python36\site-packages\torch\nn\modules\module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TripletsExtractorBERTOnly:
	Missing key(s) in state_dict: "pretrained_encoder.pooler.dense.weight", "pretrained_encoder.pooler.dense.bias". 
	Unexpected key(s) in state_dict: "pretrained_encoder.embeddings.position_ids". 

Could you help me with the error? Thank you very much.
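Not an official fix, but mismatches like this are often worked around by loading with `strict=False` (both `torch.nn.Module.load_state_dict` and Lightning's `load_from_checkpoint` accept this argument), which tolerates missing and unexpected keys; this particular key pair often indicates a `transformers` version mismatch. The filtering that `strict=False` effectively performs can be sketched with plain dicts:

```python
def filter_state_dict(checkpoint_state: dict, model_keys: set) -> dict:
    """Keep only the checkpoint entries whose keys the model actually expects,
    mimicking what non-strict loading tolerates."""
    return {k: v for k, v in checkpoint_state.items() if k in model_keys}

# Toy illustration using the key names from the traceback above:
ckpt = {"pretrained_encoder.embeddings.position_ids": 0, "layer.weight": 1}
model_keys = {"layer.weight", "pretrained_encoder.pooler.dense.weight"}
usable = filter_state_dict(ckpt, model_keys)  # drops the unexpected key
```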

Triple atomicity?

First of all, thank you for creating and publishing this system. I did some quick tests: it is very fast and indeed very promising.
I was wondering whether ensuring triple atomicity (or minimality) was one of the goals of the project. For example, given the sentence:
ERBB2 is known to activate both the PIK3CA gene and the JAK1 gene.
I expect to extract two triples:
ERBB2 activate the PIK3CA gene
ERBB2 activate the JAK1 gene

However, the system extracts only one (compound) triple: ERBB2 activate the PIK3CA gene and the JAK1 gene

Is this by design? Or are there some configuration settings that I missed?
Thanks a lot.
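For context: the paper's IGL-CA variant is the one aimed at splitting coordinated arguments. As a toy illustration of the kind of post-processing involved (a naive sketch of ours, not the paper's method), a coordinated object could be split by hand:

```python
import re

def split_coordination(subj: str, rel: str, obj: str) -> list:
    """Naively split 'both X and Y' / 'X and Y' objects into atomic triples."""
    parts = re.split(r"\s+and\s+", re.sub(r"^both\s+", "", obj))
    return [(subj, rel, p.strip()) for p in parts]

triples = split_coordination("ERBB2", "activate",
                             "both the PIK3CA gene and the JAK1 gene")
# -> [("ERBB2", "activate", "the PIK3CA gene"),
#     ("ERBB2", "activate", "the JAK1 gene")]
```

A real coordination analyzer must of course handle nested conjunctions and "and" inside entity names, which this sketch does not.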

DetIE on BenchIE Dataset

Hey,
we evaluated both DetIE models on BenchIE. Here are our results:

| Model | Language | Precision | Recall | F1 |
|-------|----------|-----------|--------|----|
| detie243 | English | 0.14823008849557523 | 0.04962962962962963 | 0.07436182019977802 |
| detie263 | English | 0.3140794223826715 | 0.1288888888888889 | 0.1827731092436975 |
| detie243_zh | Chinese | 0.09333333333333334 | 0.035211267605633804 | 0.051132213294375464 |
| detie263_zh | Chinese | 0.21568627450980393 | 0.09959758551307847 | 0.13626978664831382 |
| detie243_de | German | 0.10285714285714286 | 0.03314917127071823 | 0.05013927576601671 |
| detie263_de | German | 0.1494949494949495 | 0.06813996316758748 | 0.09361163820366857 |
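The F1 values above are the harmonic mean of precision and recall, so they can be double-checked directly:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# e.g. the English detie243 numbers above:
score = f1(0.14823008849557523, 0.04962962962962963)
# score is approximately 0.0744, matching the reported F1
```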

Open Source License

Thanks for making the code public. Any chance you could add an open-source-friendly license to the code, so that one can use it?

How to reproduce the results on the LSOIE test sets and MultiOIE2016 (i.e., Table 4 and Table 5)?

Thanks for releasing the code of this great paper.

But when reproducing your code, I found that you seem to have released only the code and data relevant to the CaRB test set.

So where can I find the relevant parts (the LSOIE test sets and MultiOIE2016)? For example, where is the LSOIE test set with IGL-CA from Table 4? How do I process the MultiOIE2016 dataset into your format? How do I combine the Synth data in Table 5?

Thanks!
