This repo contains the code for the ICADL 2022 paper [Distantly Supervised Named Entity Recognition with Category-Oriented Confidence Calibration].
In this work, we study noisy-labeled named entity recognition under the distant supervision setting. We propose a category-oriented confidence calibration (Coca) strategy with an automatic confidence threshold calculation module. We integrate our method into BOND, a teacher-student self-training framework, to improve model performance.
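To make the idea of category-specific thresholds concrete, here is a minimal sketch of confidence-based pseudo-label filtering in a teacher-student loop. It is an illustration under stated assumptions, not the implementation in this repo: the function names `category_thresholds` and `select_pseudo_labels`, the `floor` parameter, and the rule of using the mean teacher confidence per category as that category's threshold are all hypothetical stand-ins for the threshold calculation described in the paper.

```python
from collections import defaultdict


def category_thresholds(pred_labels, confidences, floor=0.5):
    """Compute one confidence threshold per predicted category.

    ASSUMPTION: the threshold is the mean teacher confidence over tokens
    predicted as that category, bounded below by `floor`. The paper's
    actual calculation may differ.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, conf in zip(pred_labels, confidences):
        sums[label] += conf
        counts[label] += 1
    return {label: max(floor, sums[label] / counts[label]) for label in sums}


def select_pseudo_labels(pred_labels, confidences, thresholds):
    """Keep a teacher prediction only if it reaches its category's threshold.

    Rejected tokens are marked None so that a student loss can skip them.
    """
    return [
        label if conf >= thresholds[label] else None
        for label, conf in zip(pred_labels, confidences)
    ]


# Toy teacher output for five tokens (made-up values for illustration).
labels = ["PER", "PER", "O", "O", "LOC"]
confs = [0.9, 0.6, 0.99, 0.95, 0.7]

thresholds = category_thresholds(labels, confs)
selected = select_pseudo_labels(labels, confs, thresholds)
print(thresholds)
print(selected)
```

The point of the per-category dictionary is that a frequent, easy category such as `O` can demand a higher confidence than a rare entity type, whereas a single global threshold would treat them identically.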
Python 3.7, PyTorch 1.11, Hugging Face Transformers v2.6.0.
We used four weakly labeled NER datasets provided by BOND (conll2003, wikigold, webpage, and twitter) and one distantly labeled dataset, BC5CDR, generated with the dictionary provided by AutoNER. The training scripts for all five open-domain distantly/weakly labeled NER datasets are in Scripts.