prithivirajdamodaran / c4_200m-synthetic-dataset-for-grammatical-error-correction
This project is forked from google-research-datasets/c4_200m-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021): https://www.aclweb.org/anthology/2021.bea-1.4/
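To make the idea of tag-conditioned corruption concrete, here is a minimal toy sketch in Python. It is not the method used to build this dataset: the corpus is described as produced by a tagged corruption model, while the sketch below swaps in two hand-written rules. The tag names `DET` and `SVA` and the helper functions are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple


def drop_articles(sentence: str) -> str:
    """Toy 'DET' error: remove every article from the sentence."""
    words = sentence.split()
    kept = [w for w in words if w.lower() not in {"a", "an", "the"}]
    return " ".join(kept)


def break_agreement(sentence: str) -> str:
    """Toy 'SVA' error: turn the first ' is ' into ' are '."""
    return sentence.replace(" is ", " are ", 1)


# Hypothetical error tags mapped to toy corruption rules.
# Assumption: a real corruption model would generate the error from the tag
# rather than apply a fixed rule like these.
CORRUPTIONS: Dict[str, Callable[[str], str]] = {
    "DET": drop_articles,
    "SVA": break_agreement,
}


def corrupt(clean: str, tag: str) -> Tuple[str, str]:
    """Return a (corrupted source, clean target) training pair."""
    return CORRUPTIONS[tag](clean), clean


if __name__ == "__main__":
    for tag in ("DET", "SVA"):
        source, target = corrupt("The cat is on the mat.", tag)
        print(f"{tag}: {source!r} -> {target!r}")
```

Each pair puts the corrupted sentence on the source side and the original clean sentence on the target side, which is the shape a grammatical error correction model is trained on.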
License: Creative Commons Attribution 4.0 International