K-RET: Knowledgeable Biomedical Relation Extraction System

K-RET is a flexible biomedical RE system that works with any pre-trained BERT-based model (e.g., SciBERT or BioBERT) and injects knowledge in the form of knowledge graphs, from a single source or from multiple sources simultaneously. This knowledge can be applied to all contextualizing tokens or only to the tokens of the candidate relation, for both single- and multi-token entities.
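
Each knowledge graph is read from a .spo file of tab-separated subject-predicate-object triples, one per line (the convention K-RET inherits from K-BERT; see Getting Started below). The triples here are illustrative only, not taken from the shipped ChEBI file:

caffeine	is_a	trimethylxanthine
aspirin	has_role	analgesic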

Our academic paper, which describes K-RET in detail, can be found here.

The uer folder contains an updated version of the toolkit developed by Zhao et al. (2019), available here.

Downloading Pre-Trained Models

To make predictions on new data, you need both a baseline model and one of our pre-trained models. To train a new model on your own data, you only need a baseline model, which can be either of the models referenced in our academic paper.

Baseline Models

After downloading a baseline model, for instance SciBERT, the model needs to be converted using the uer toolkit. To do so, run the following example, adapting the paths as needed for a different baseline model:

cd K-RET/uer/
python3 convert_bert_from_huggingface_to_uer.py --input_model_path ../models/pre_trained_model_scibert/scibert_scivocab_uncased/pytorch_model.bin --output_model_path ../models/pre_trained_model_scibert/output_model.bin
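
As a hedged sketch of what the conversion does (inferred from the fragment of convert_bert_from_huggingface_to_uer.py quoted in the issues below; only the word-embedding mapping is visible there, and the remaining layer mappings are analogous):

import torch

# Load the Hugging Face checkpoint (a plain PyTorch state dict).
input_model = torch.load("pytorch_model.bin", map_location="cpu")

output_model = {}
# Rename each tensor to the key the uer toolkit expects; this particular
# mapping appears verbatim in the conversion script.
output_model["embedding.word_embedding.weight"] = \
    input_model["bert.embeddings.word_embeddings.weight"]
# ... analogous mappings for position embeddings, encoder layers, etc.

torch.save(output_model, "output_model.bin")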

Our Models

Versions of the best-performing pre-trained models are available for download; the training details are described in our academic paper.

Getting Started

Our project includes a code adaptation of the K-BERT model available here. Use the K-RET image available on Docker Hub to set up the rest of the experimental environment.
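
For example, assuming the dpavot/kret:update tag that appears in the issue reports below (check Docker Hub for the current tag; --gpus all requires the NVIDIA Container Toolkit):

docker pull dpavot/kret:update
docker run --gpus all -it -v "$(pwd)":/workspaces/K-RET dpavot/kret:update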

Usage Example

CUDA_VISIBLE_DEVICES='1,2,3' python3 -u run_classification.py \
    --pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
    --config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
    --vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
    --train_path ./datasets/ddi_corpus/train.tsv \
    --dev_path ./datasets/ddi_corpus/dev.tsv \
    --test_path ./datasets/ddi_corpus/test.tsv \
    --class_weights True \
    --weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
    --epochs_num 30 \
    --batch_size 32 \
    --kg_name "['ChEBI']" \
    --output_model_path ./outputs/scibert_ddi.bin | tee ./outputs/scibert_ddi.log &

For more options, check run.sh; for additional configuration settings (e.g., max_number_entities and contextual_knowledge), check brain/config.py.
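
The exact layout of the train/dev/test TSV files is defined in run_classification.py; as a working assumption (K-RET adapts the K-BERT classification code, which reads tab-separated files with a header row of label and text_a), each line pairs an integer class label with one candidate-relation sentence. The entity-marking convention and labels below are purely illustrative; note that the five class weights in the example above match the five DDI relation classes:

label	text_a
0	@DRUG$ did not alter the pharmacokinetics of @DRUG$ in healthy subjects .
2	Coadministration of @DRUG$ may increase plasma concentrations of @DRUG$ .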

Predict New Data Example

CUDA_VISIBLE_DEVICES='0' python3 -u run_classification.py \
    --pretrained_model_path ./models/pre_trained_model_scibert/output_model.bin \
    --config_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
    --vocab_path ./models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
    --train_path ./datasets/ddi_corpus/train.tsv \
    --dev_path ./datasets/ddi_corpus/dev.tsv \
    --class_weights True \
    --weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
    --test_path ./datasets/ddi_corpus/test.tsv \
    --epochs_num 30 \
    --batch_size 32 \
    --kg_name "[]" \
    --testing True \
    --to_test_model ./outputs/scibert_ddi.bin \
    | tee ./outputs/ddi_results.log &

Process Results Example

python3 src/process_results.py ./outputs/ddi_results.log ./datasets/ddi_corpus/test.tsv ddi_results.tsv
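
The actual parsing lives in src/process_results.py; the sketch below only illustrates the kind of post-processing this step performs, under the hypothetical assumption that the log carries one predicted label id per test instance:

import csv
import sys

log_path, test_path, out_path = sys.argv[1:4]

# Hypothetical parsing: collect log lines that carry bare label ids.
preds = []
with open(log_path) as f:
    for line in f:
        line = line.strip()
        if line.isdigit():
            preds.append(line)

# Pair each prediction with its test sentence and write a results TSV.
with open(test_path) as f, open(out_path, "w", newline="") as out:
    reader = csv.reader(f, delimiter="\t")
    writer = csv.writer(out, delimiter="\t")
    next(reader)  # skip header row (assumed, per the format sketch above)
    for row, pred in zip(reader, preds):
        writer.writerow(row + [pred])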

Reference

  • Diana Sousa and Francisco M. Couto. 2022. K-RET: Knowledgeable Biomedical Relation Extraction System. Bioinformatics.

Contributors

dpavot, sirconceicao

Issues

Cannot use the code you provide to convert roberta

When converting a RoBERTa model with the provided script, the following error occurs:

Traceback (most recent call last):
  File "convert_bert_from_huggingface_to_uer.py", line 73, in <module>
    main()
  File "convert_bert_from_huggingface_to_uer.py", line 49, in main
    output_model["embedding.word_embedding.weight"] = input_model["bert.embeddings.word_embeddings.weight"]
KeyError: 'bert.embeddings.word_embeddings.weight'
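
A likely explanation (not confirmed in this thread): the converter hard-codes BERT key names, while RoBERTa checkpoints prefix their tensors with roberta. rather than bert., so the lookup raises KeyError. Inspecting the checkpoint's keys shows which prefix a given model uses:

import torch

# Print the first few state-dict keys to reveal the naming scheme.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
for key in list(state_dict)[:5]:
    # A RoBERTa checkpoint prints e.g. roberta.embeddings.word_embeddings.weight
    print(key)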

Don't know dataset format

Thank you for providing this code, but I don't know the expected data format. Could you provide the datasets directly? I also don't know how to split the data, and among the dataset URLs you provide I am not sure which one to use (e.g., for DDI).

CUDA error

Hello. Using your Docker image dpavot/kret:update, I get an error:

root@jm-Z490-AORUS-ULTRA:/workspaces/K-RET#  CUDA_VISIBLE_DEVICES='0,1' python3 -u run_classification.py \
>     --pretrained_model_path /workspaces/K-RET/models/pre_trained_model_scibert/output_model.bin \
>     --config_path /workspaces/K-RET/models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json \
>     --vocab_path /workspaces/K-RET/models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt \
>     --train_path /workspaces/K-RET/datasets/pgr_corpus/train.tsv \
>     --dev_path /workspaces/K-RET/datasets/pgr_corpus/dev.tsv \
>     --test_path /workspaces/K-RET/datasets/pgr_corpus/test.tsv \
>     --class_weights True \
>     --weights "[0.234, 3.377, 4.234, 6.535, 24.613]" \
>     --epochs_num 30 \
>     --batch_size 32 \
>     --kg_name "['ChEBI']" \
>     --output_model_path /workspaces/K-RET/outputs/scibert_ddi.bin | tee /workspaces/K-RET/outputs/scibert_ddi.log &
[1] 421
root@jm-Z490-AORUS-ULTRA:/workspaces/K-RET# 
root@jm-Z490-AORUS-ULTRA:/workspaces/K-RET# [nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Vocabulary file line 30107 has bad format token
Vocabulary Size:  31090
Namespace(batch_size=32, bidirectional=False, block_size=2, class_weights='True', config_path='/workspaces/K-RET/models/pre_trained_model_scibert/scibert_scivocab_uncased/config.json', dev_path='/workspaces/K-RET/datasets/pgr_corpus/dev.tsv', dropout=0.1, emb_size=768, encoder='bert', epochs_num=30, feedforward_size=3072, heads_num=12, hidden_size=768, kernel_size=3, kg_name="['ChEBI']", labels_num=2, layers_num=12, learning_rate=2e-05, mean_reciprocal_rank=False, no_vm=False, output_model_path='/workspaces/K-RET/outputs/scibert_ddi.bin', pooling='first', pretrained_model_path='/workspaces/K-RET/models/pre_trained_model_scibert/output_model.bin', report_steps=100, seed=7, seq_length=256, sub_layers_num=2, sub_vocab_path='models/sub_vocab.txt', subencoder='avg', subword_type='none', target='bert', test_path='/workspaces/K-RET/datasets/pgr_corpus/test.tsv', testing=False, to_test_model=None, tokenizer='bert', train_path='/workspaces/K-RET/datasets/pgr_corpus/train.tsv', vocab=<uer.utils.vocab.Vocab object at 0x7f001c4a5160>, vocab_path='/workspaces/K-RET/models/pre_trained_model_scibert/scibert_scivocab_uncased/vocab.txt', warmup=0.1, weights='[0.234, 3.377, 4.234, 6.535, 24.613]', workers_num=1)
[BertClassifier] use visible_matrix: True
2 GPUs are available. Let's use them.
[KnowledgeGraph] Loading spo from /workspaces/K-RET/brain/kgs/chebi.spo
Start training.
Loading sentences from /workspaces/K-RET/datasets/pgr_corpus/train.tsv
There are 4050 sentence in total. We use 1 processes to inject knowledge into sentences.
Progress of process 0: 0/4050
Shuffling dataset
Trans data to tensor.
input_ids
label_ids
mask_ids
pos_ids
vms
Batch size:  32
The number of training instances: 4050
terminate called after throwing an instance of 'std::runtime_error'
  what():  NCCL Error 1: unhandled cuda error
Fatal Python error: Aborted

Current thread 0x00007f0108ce5740 (most recent call first):
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/comm.py", line 40 in broadcast_coalesced
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 21 in forward
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/replicate.py", line 13 in replicate
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 147 in replicate
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 142 in forward
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489 in __call__
  File "run_classification.py", line 581 in main
  File "run_classification.py", line 622 in <module>
