The cmtn from sxthunder

This repository contains the source code for the paper "A combined recall and rank framework with online negative sampling for Chinese procedure terminology normalization".

folders and files:

/data contains the raw data file used in the paper, which can be downloaded from http://openkg.cn/dataset/yidu-n7k.

/dict contains the keyword dictionaries used by the keywords attentive mechanism: body.txt lists procedure site words and ot.txt lists procedure type words.

rerank_k_fold_data contains the k-fold training data for the keywords attentive ranker, which is generated by the candidate generator.

/output contains the output results (such as saved models, intermediate output, and caches for prediction). Pass an output_name parameter each time you run an experiment.
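The dictionaries in /dict can be inspected with a few lines of Python. This is a minimal sketch and not code from this repository: it assumes each file holds one keyword per line in UTF-8, and the helper names `load_keywords` and `match_keywords` are hypothetical. The substring matching shown here only illustrates how site and type words could be located in a term; the ranker's actual matching logic may differ.

```python
def load_keywords(path):
    """Read a keyword file, assuming one keyword per line (UTF-8)."""
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:  # skip blank lines
                words.add(word)
    return words


def match_keywords(term, keywords):
    """Return the keywords that occur as substrings of `term`, longest first."""
    hits = [w for w in keywords if w in term]
    # Sort by descending length, then alphabetically for a stable order.
    return sorted(hits, key=lambda w: (-len(w), w))
```

For example, `match_keywords(term, load_keywords("./dict/body.txt"))` would list the procedure site words found in `term`, under the file-format assumption above.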

We use the BERT model pretrained on a Chinese corpus provided by Google. You can change the default path by editing the pretrained_model_path argument in train.py and rerank_keywords.py, or set it in the run command.

train and evaluate

candidate generator

Train:

# k_fold_id ranges over [0, 4]
# device_id selects the GPU when you have more than one; it starts from 0
python train.py -output_name={your_output_name} -k_fold={k_fold_id} -device={device_id} -pretrained_model_path={your_pretrained_model_path}


# if you don't want to use k-fold, just run:
python train.py -output_name={your_output_name} -device={device_id} -pretrained_model_path={your_pretrained_model_path}
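To train all five folds in sequence, the k-fold command can be wrapped in a small driver script. The following is a minimal sketch and not part of this repository: the helper names `build_train_command` and `train_all_folds` are hypothetical, it uses only the flags shown above, and it assumes the same output_name is reused for every fold.

```python
import subprocess


def build_train_command(output_name, device_id, pretrained_model_path, k_fold_id=None):
    """Build the argument list for one train.py run (hypothetical helper)."""
    cmd = ["python", "train.py", f"-output_name={output_name}"]
    if k_fold_id is not None:
        # Omit -k_fold entirely to train without k-fold.
        cmd.append(f"-k_fold={k_fold_id}")
    cmd.append(f"-device={device_id}")
    cmd.append(f"-pretrained_model_path={pretrained_model_path}")
    return cmd


def train_all_folds(output_name, device_id, pretrained_model_path,
                    num_folds=5, dry_run=True):
    """Run (or, with dry_run=True, only print) the command for folds 0..num_folds-1."""
    commands = [
        build_train_command(output_name, device_id, pretrained_model_path, k)
        for k in range(num_folds)
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first fold that fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `train_all_folds("my_run", 0, "./my_bert_path")` prints the five commands without running them; pass `dry_run=False` to execute them one after another. The values `my_run` and `./my_bert_path` are placeholders.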

After you run the commands above, there should be a folder /output/mto_output/{your_output_name}. If you use k-fold, there should be 5 folders, one for each fold. To evaluate, run:

python train.py -output_name={your_output_name}_test -type=evaluate -k_fold=0 -saved_model_path=./output/mto_output/{your_output_name} -generate_candidates=test -device={device_id} -pretrained_model_path={your_pretrained_model_path}

keywords attentive ranker

Train

python rerank_keywords.py -k_fold={k_fold_id} -output_name={your_output_name} -device={device_id} -pretrained_model_path={your_pretrained_model_path}

Evaluate

python rerank_keywords.py -type=evaluate -output_name={your_output_name}_test -saved_model_path=./output/rerank_keywords_output/{your_output_name} -generate_candidates=test -device=0 -k_fold=0
