This is the home repository of the NerKor corpus, a Hungarian gold-standard named-entity-annotated corpus containing about 1 million tokens.
Corpus files are under the data folder. They are grouped by genre: fiction, legal, news, web, wikipedia.

| genre | morph/no-morph | token count |
|---|---|---|
| fiction | morph | 0 |
| | no-morph | 203216 |
| | sum | 203216 |
| legal | morph | 0 |
| | no-morph | 202195 |
| | sum | 202195 |
| news | morph | 9178 |
| | no-morph | 204478 |
| | sum | 213656 |
| web | morph | 187232 |
| | no-morph | 0 |
| | sum | 187232 |
| wikipedia | morph | 26764 |
| | no-morph | 194033 |
| | sum | 220797 |
| altogether | morph | 223174 |
| | no-morph | 803922 |
| | sum | 1027096 |

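
The per-genre sums and the overall totals in the table can be re-derived from the morph and no-morph counts. The following is a minimal sketch of that arithmetic; the numbers are copied by hand from the table, and the variable names (`counts`, `genre_sums`, and so on) are illustrative only and not part of the repository.

```python
# Token counts copied from the table: genre -> (morph, no-morph).
counts = {
    "fiction": (0, 203216),
    "legal": (0, 202195),
    "news": (9178, 204478),
    "web": (187232, 0),
    "wikipedia": (26764, 194033),
}

# Per-genre sum: morph + no-morph.
genre_sums = {genre: morph + no_morph for genre, (morph, no_morph) in counts.items()}

# Column totals over all genres.
total_morph = sum(morph for morph, _ in counts.values())
total_no_morph = sum(no_morph for _, no_morph in counts.values())
total = total_morph + total_no_morph

for genre, genre_sum in genre_sums.items():
    print(f"{genre}: {genre_sum}")
print(f"altogether: morph={total_morph}, no-morph={total_no_morph}, sum={total}")
```
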
The annotation guidelines and the annotation scheme are available in the Guidelines folder (in Hungarian only).