The NLP-TAB corpus is a collection of 120 synthetic clinical notes in UTF-8 plain text. The notes are sourced from the MTSamples corpus, which is available in its entirety at www.mtsamples.com. The 120 documents are located in the Documents folder.
Do not preprocess or otherwise modify the document text anywhere in your processing pipeline. NLP-TAB matches documents between systems using checksums, so any modification, however small (for example, trimming whitespace or converting line endings), changes a document's checksum and prevents comparisons between systems.
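The effect of a small modification on a checksum can be illustrated with a minimal Python sketch. The hash algorithm NLP-TAB actually uses is not stated here, so SHA-256 is an assumption chosen only for illustration, and the helper names checksum and corpus_checksums are hypothetical, not part of NLP-TAB.

```python
import hashlib
from pathlib import Path


def checksum(data: bytes) -> str:
    """Hex digest of the raw bytes. SHA-256 is an assumption for illustration."""
    return hashlib.sha256(data).hexdigest()


def corpus_checksums(folder: str) -> dict:
    """Map each file name in `folder` to the checksum of its unmodified bytes."""
    return {
        # Binary read: no newline or encoding translation is applied.
        path.name: checksum(path.read_bytes())
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }


# A seemingly harmless cleanup step is enough to change the checksum.
original = "BP 120/80.  Patient stable.\r\n".encode("utf-8")
cleaned = original.decode("utf-8").strip().encode("utf-8")  # drops the trailing "\r\n"

print(checksum(original) == checksum(cleaned))  # prints False
```

Calling corpus_checksums("Documents") before and after a pipeline step that writes documents back out is one way to check that the step left every file byte-for-byte unchanged. Reading files in binary mode matters in such a check, since text-mode reads can silently translate line endings.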
NLP-TAB is developed by the University of Minnesota Institute for Health Informatics NLP/IE Group and the Open Health NLP Consortium.
Funding for this work was provided by:
- 1 R01 LM011364-01 NIH-NLM
- 1 R01 GM102282-01A1 NIH-NIGMS
- U54 RR026066-01A2 NIH-NCRR