Subword-level grammatical error correction: a universal approach
Abstract
In this study, we propose a fully automatic methodology for data generation, correction rule vocabulary construction, and training of Sequence Tagging models for Grammatical Error Correction. Our approach operates at the SentencePiece subword level and uses four basic transformations (keep, append, replace, and delete) that apply to any language, eliminating the need for grammar-specific operations. Because ground-truth corrections and the corresponding edit instructions are generated with the Levenshtein algorithm, the dataset generation process is entirely language-independent. Applying our method to the Sequence Tagging model GECToR, we achieved comparable quality for English, with F0.5 scores of 62.4 on the CoNLL-2014 test set and 61.9 on the BEA-2019 test set, without manual rule design or manual annotation of error spans or types. The results indicate that universal subword-level edits can serve as a practical alternative to grammar-specific operations while requiring only parallel correction data.
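The abstract does not include the tag-extraction code, so the following Python sketch only illustrates the general idea of deriving keep/append/replace/delete tags from a Levenshtein alignment, under our own assumptions: both sentences are already split into SentencePiece-style subwords (the `▁` word-boundary marker is used for illustration; the sentencepiece library itself is not called), a `$START` token is prepended so that insertions at the start of a sentence have a token to attach to, and tag names follow the `$KEEP`/`$DELETE`/`$REPLACE_x`/`$APPEND_x` style used by GECToR. The functions `levenshtein_ops`, `ops_to_tags`, and `apply_tags` are hypothetical names, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, int, Optional[str]]  # (operation, source index, target token)


def levenshtein_ops(src: List[str], tgt: List[str]) -> List[Op]:
    """Align two subword sequences with a Levenshtein DP table and backtrace."""
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)

    # Backtrace from the bottom-right corner; ties prefer the diagonal (keep/replace).
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = src[i - 1] == tgt[j - 1]
            if dp[i][j] == dp[i - 1][j - 1] + (0 if same else 1):
                ops.append(("keep" if same else "replace", i - 1, tgt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            # tgt[j-1] is inserted right after source token i-1
            ops.append(("insert", i - 1, tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def ops_to_tags(src: List[str], ops: List[Op]) -> List[List[str]]:
    """Attach every edit to a source subword; untouched subwords get $KEEP."""
    tags: List[List[str]] = [[] for _ in src]
    for op, idx, tok in ops:
        if op == "delete":
            tags[idx].append("$DELETE")
        elif op == "replace":
            tags[idx].append(f"$REPLACE_{tok}")
        elif op == "insert":
            tags[idx].append(f"$APPEND_{tok}")
    return [t if t else ["$KEEP"] for t in tags]


def apply_tags(src: List[str], tags: List[List[str]]) -> List[str]:
    """Rebuild the corrected sequence from source subwords and their tags."""
    out: List[str] = []
    for tok, tok_tags in zip(src, tags):
        emitted: Optional[str] = tok
        appended: List[str] = []
        for tag in tok_tags:
            if tag == "$DELETE":
                emitted = None
            elif tag.startswith("$REPLACE_"):
                emitted = tag[len("$REPLACE_"):]
            elif tag.startswith("$APPEND_"):
                appended.append(tag[len("$APPEND_"):])
        if emitted is not None:
            out.append(emitted)
        out.extend(appended)
    return out


if __name__ == "__main__":
    src = ["$START", "▁She", "▁go", "▁to", "▁school"]
    tgt = ["$START", "▁She", "▁go", "es", "▁to", "▁school"]
    tags = ops_to_tags(src, levenshtein_ops(src, tgt))
    print(tags)  # [['$KEEP'], ['$KEEP'], ['$APPEND_es'], ['$KEEP'], ['$KEEP']]
```

In the example, the correction "go" → "goes" becomes a single subword-level append of `es` rather than a grammar-specific verb-form tag, which is the kind of universal edit the abstract describes. When one source subword receives several edits, this sketch simply keeps them all in a list; how such cases are split into training labels is left open here.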
Edition
Proceedings of the Institute for System Programming, vol. 38, issue 3, part 1, 2026, pp. 187-196
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2026-38(3)-11