clinicaltransformerner's People

Contributors

bugface, dparedespardo, yonghuiwuf

clinicaltransformerner's Issues

preprocessing code for i2b2 dataset

Hello, thanks for creating this library. I am trying to reproduce the BERT results on i2b2 2010, i2b2 2012, and n2c2 2018. However, I am having trouble converting these datasets into the CoNLL-2003-style txt files shown in test_data. I assume the preprocessing scripts differ per dataset, since i2b2 2010 (txt, con) and i2b2 2012 (txt, extent, tlink) use different file extensions.

Is it possible to release the preprocessing scripts for easier reproducibility?
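Until official scripts are released, here is a minimal sketch of a 2010-style .con → BIO conversion. The regex and the 1-based line / token-offset reading of the .con format are our assumptions, so treat this as a starting point rather than the project's actual preprocessing:

```python
import re

# Assumed i2b2 2010 .con line shape:  c="pain in leg" 12:3 12:5||t="problem"
CON_RE = re.compile(
    r'c="(?P<text>.*?)" (?P<line>\d+):(?P<start>\d+) \d+:(?P<end>\d+)\|\|t="(?P<type>\w+)"'
)

def con_to_bio(txt_lines, con_lines):
    """Tag whitespace tokens of txt_lines with BIO labels taken from con_lines."""
    tags = [["O"] * len(line.split()) for line in txt_lines]
    for raw in con_lines:
        m = CON_RE.match(raw.strip())
        if not m:
            continue  # skip lines that do not parse
        ln = int(m.group("line")) - 1                # i2b2 line numbers are 1-based
        s, e = int(m.group("start")), int(m.group("end"))
        tags[ln][s] = "B-" + m.group("type")
        for i in range(s + 1, e + 1):
            tags[ln][i] = "I-" + m.group("type")
    out = []
    for line, line_tags in zip(txt_lines, tags):
        out.extend(f"{tok} {tag}" for tok, tag in zip(line.split(), line_tags))
        out.append("")                               # blank line = sentence break
    return out
```

The output is the simple "token tag" two-column BIO layout; if the repo's test_data uses extra CoNLL-2003 columns, pad them accordingly.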

BERT-large (MIMIC)?

Hi, thank you for releasing this excellent resource. I'm wondering if you have released BERT-Large (MIMIC)? The model here only has 12 layers so must be BERT-base? Am I missing something?
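For what it's worth, you can tell which size a checkpoint is from its config.json: BERT-base has num_hidden_layers = 12, BERT-large has 24. A small sketch (the helper name is ours):

```python
import json

def model_size(config_path):
    """Heuristic check: BERT-base has 12 transformer layers, BERT-large has 24."""
    with open(config_path) as f:
        cfg = json.load(f)
    layers = cfg.get("num_hidden_layers")
    return {12: "base", 24: "large"}.get(layers, f"unknown ({layers} layers)")
```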

fp16 training using pytorch amp

Since PyTorch 1.6.0, a native amp package has been available for fp16 training. We will update the code to use PyTorch amp instead of Apex when possible.
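A sketch of what the switch to native amp could look like (model, optimizer, and step names are illustrative, not the repo's actual training loop; it falls back cleanly to fp32 when no GPU is present):

```python
import torch
from torch import nn

# Toy model standing in for the transformer; the pattern is what matters.
model = nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
use_amp = torch.cuda.is_available()                  # autocast needs CUDA for fp16
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(batch, labels):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):   # fp16 forward pass under amp
        loss = nn.functional.cross_entropy(model(batch), labels)
    scaler.scale(loss).backward()                    # no-op scaling when amp is off
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```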

No such file or directory: label2idx.json

Hi,

Trying to run a batch prediction as such:

python ./src/run_transformer_batch_prediction.py \
      --model_type bert \
      --pretrained_model models/mimiciii_bert_10e_128b/ \
      --raw_text_dir ./raw-mimic/ \
      --preprocessed_text_dir ./iob-mimic/ \
      --output_dir ./prediction-results \
      --max_seq_length 512 \
      --do_lower_case \
      --eval_batch_size 8 \
      --log_file ./log.txt \
      --do_format 0 \
      --do_copy

Running into this error:

Traceback (most recent call last):
  File "./src/run_transformer_batch_prediction.py", line 123, in <module>
    main(global_args)
  File "./src/run_transformer_batch_prediction.py", line 31, in main
    label2idx = json_load(os.path.join(args.pretrained_model, "label2idx.json"))
  File "/home/ubuntu/mimic2iob/ClinicalTransformerNER/src/common_utils/common_io.py", line 32, in json_load
    with open(ifn, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'models/mimiciii_bert_10e_128b/label2idx.json'

I've downloaded the pre-trained BERT base + MIMIC model from here:
https://transformer-models.s3.amazonaws.com/mimiciii_bert_10e_128b.zip

I don't see label2idx.json present after extracting the archive:

$ ls -ltr models/mimiciii_bert_10e_128b/
total 430396
-rw-r--r-- 1 ubuntu ubuntu    231508 Dec 11  2019 vocab.txt
-rw-r--r-- 1 ubuntu ubuntu       170 Dec 11  2019 tokenizer_config.json
-rw-r--r-- 1 ubuntu ubuntu       112 Dec 11  2019 special_tokens_map.json
-rw-r--r-- 1 ubuntu ubuntu         2 Dec 11  2019 added_tokens.json
-rw-r--r-- 1 ubuntu ubuntu 440470760 Dec 11  2019 pytorch_model.bin
-rw-r--r-- 1 ubuntu ubuntu       566 Dec 11  2019 config.json

Any help would be much appreciated. Thanks for your project!
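One possible explanation: label2idx.json appears to be written during NER fine-tuning, so a download that is only a pretrained language model would not contain it. If you have BIO-formatted training data with the label set you expect, a sketch for regenerating the file (helper name is ours, and the mapping it produces is an assumption, not the one the authors trained with) could be:

```python
import json

def build_label2idx(bio_files, out_path):
    """Collect every tag seen in BIO files and assign stable sorted indices."""
    labels = set()
    for fn in bio_files:
        with open(fn) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 2:          # skip blank sentence-separator lines
                    labels.add(parts[-1])
    label2idx = {lab: i for i, lab in enumerate(sorted(labels))}
    with open(out_path, "w") as f:
        json.dump(label2idx, f, indent=2)
    return label2idx
```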

Compatible with Transformers >= 2.11.0

Since 2.11.0, Transformers has renamed several APIs, which can break the current package. We will work on this issue to make the package compatible with a wider range of Transformers versions.
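One lightweight way to gate old-vs-new code paths is a version check at import time; a sketch (the helper names and the 2.11.0 boundary applied here are ours, not part of the package):

```python
def version_tuple(v):
    """Parse a '2.11.0'-style version string into a comparable tuple."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def requires_compat_shim(installed, changed_in=(2, 11, 0)):
    """True when the installed Transformers predates the API renames,
    i.e. the old names should be used."""
    return version_tuple(installed) < changed_in
```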

XLNet doesn't support use_biaffine

python src/run_transformer_ner.py \
      --model_type xlnet \
      --pretrained_model xlnet-base-cased \
      --data_dir ./test_data/conll-2003 \
      --new_model_dir ./new_bert_ner_model \
      --overwrite_model_dir \
      --predict_output_file ./bert_pred.txt \
      --max_seq_length 256 \
      --save_model_core \
      --do_train \
      --do_predict \
      --model_selection_scoring strict-f_score-1 \
      --do_lower_case \
      --train_batch_size 8 \
      --eval_batch_size 8 \
      --train_steps 500 \
      --learning_rate 1e-5 \
      --num_train_epochs 1 \
      --gradient_accumulation_steps 1 \
      --do_warmup \
      --seed 13 \
      --warmup_ratio 0.1 \
      --max_num_checkpoints 3 \
      --log_file ./log.txt \
      --progress_bar \
      --early_stop 3

Traceback (most recent call last):
  File "/data/datasets/yonghui/project/ClinicalTransformerNER/src/run_transformer_ner.py", line 169, in main
    run_task(global_args)
  File "/data/datasets/yonghui/project/ClinicalTransformerNER/src/transformer_ner/task.py", line 604, in run_task
    model = model_model.from_pretrained(args.pretrained_model, config=config)
  File "/home/yonghui.wu/.pyenv/versions/anaconda3-2021.11/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2024, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/data/datasets/yonghui/project/ClinicalTransformerNER/src/transformer_ner/model.py", line 308, in __init__
    if config.use_biaffine:
  File "/home/yonghui.wu/.pyenv/versions/anaconda3-2021.11/lib/python3.9/site-packages/transformers/configuration_utils.py", line 253, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'XLNetConfig' object has no attribute 'use_biaffine'
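The usual guard against this kind of failure is getattr with a default, so config classes that never define the flag fall back to the non-biaffine path. A minimal sketch (the stub class below stands in for XLNetConfig; the helper name is ours):

```python
class XLNetConfigStub:
    """Stand-in for a config object that never had use_biaffine set."""
    pass

def wants_biaffine(config):
    # getattr with a default avoids the AttributeError when a config
    # class does not define the attribute; defaulting to False keeps
    # the plain (non-biaffine) classification head.
    return bool(getattr(config, "use_biaffine", False))
```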

Performance of XLNet and Longformer?

Thanks again for providing this repository and actively maintaining it. Do you have performance of XLNet and Longformer on the 2010 i2b2 test set, 2012 i2b2 test set, and/or 2018 n2c2 test set readily available and shareable?

add DeBERTa support

Add DeBERTa alongside the existing BERT support:

  1. implementation
  2. test on CONLL-2003
  3. test on 2010-i2b2

continue train the NER model on new dataset

Currently, we do not have a resume-from-checkpoint function: every training run starts from a fresh model (at least a fresh linear classification layer).

We need to implement a continued-training function to support use cases such as training more epochs on the same data, or training on new data with exactly the same label set (no new labels allowed).
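Whenever resuming lands, a cheap precondition check could enforce the "no new labels" rule before touching the model. A sketch (helper name is ours):

```python
def can_resume(saved_label2idx, new_labels):
    """Resuming is only safe when the new data introduces no labels outside
    the classifier head the saved model was trained with.
    Returns (ok, sorted list of offending labels)."""
    unseen = set(new_labels) - set(saved_label2idx)
    return (not unseen), sorted(unseen)
```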
