Subword-level grammatical error correction: a universal approach


Khabutdinov I.A. (NRU MIPT, Dolgoprudny, Moscow Region, Russia)
Grabovoy A.V. (NRU MIPT, Dolgoprudny, Moscow Region, Russia; ICS RAS, Moscow, Russia)
Chekhovich Yu.V. (ICS RAS, Moscow, Russia)
Kildyakov A.S. (ICS RAS, Moscow, Russia)
Ivakhnenko A.A. (ICS RAS, Moscow, Russia)

Abstract

We propose a fully automatic methodology for data generation, correction-rule vocabulary construction, and sequence tagging model training for grammatical error correction (GEC). The approach operates at the SentencePiece subword level and uses four basic transformations (keep, append, replace, and delete) that apply to any language, eliminating the need for grammar-specific operations. Ground-truth corrections and edit operations are derived with the Levenshtein algorithm, which makes dataset generation fully automatic and language-independent. Applied to the sequence tagging model GECToR, the method achieves comparable quality for English, with F0.5 scores of 62.4 on the CoNLL-2014 test set and 61.9 on the BEA-2019 test set, without manual rule design or manual annotation of error spans or types. The results indicate that subword-level universal edits can be a practical alternative to grammar-specific operations while requiring only parallel correction data.
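
As an illustration of how edit tags can be derived from a parallel sentence pair, the following is a minimal sketch, not the authors' implementation. It assumes both sentences are already split into subword pieces (for example by SentencePiece), aligns them with a standard Levenshtein dynamic program, and reads off keep, append, replace and delete operations from the backtrace. The function names `edit_tags` and `apply_tags`, the tag strings, and the extra leading start slot (used to hold insertions before the first token) are all assumptions made for this example.

```python
from typing import List


def edit_tags(src: List[str], tgt: List[str]) -> List[List[str]]:
    """Derive per-slot edit operations turning src into tgt.

    Slot 0 is a hypothetical start slot; slot i (i >= 1) belongs to src[i-1].
    Each slot holds one base operation (KEEP, DELETE or REPLACE_x),
    optionally followed by APPEND_x operations.
    """
    n, m = len(src), len(tgt)
    # dist[i][j] = Levenshtein distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete src[i-1]
                dist[i][j - 1] + 1,        # insert tgt[j-1]
                dist[i - 1][j - 1] + sub,  # keep or replace
            )

    base = ["KEEP"] * (n + 1)
    appends: List[List[str]] = [[] for _ in range(n + 1)]
    i, j = n, m
    # Walk back from the bottom-right corner along one optimal path.
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and src[i - 1] == tgt[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            base[i] = "KEEP" if same else "REPLACE_" + tgt[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            base[i] = "DELETE"
            i -= 1
        else:
            # Insertion: attach the new piece to the slot on its left.
            appends[i].insert(0, "APPEND_" + tgt[j - 1])
            j -= 1
    return [[base[k]] + appends[k] for k in range(n + 1)]


def apply_tags(src: List[str], tags: List[List[str]]) -> List[str]:
    """Apply the operations from edit_tags to reconstruct the target."""
    out: List[str] = []
    for slot, ops in enumerate(tags):
        for op in ops:
            if op == "KEEP":
                if slot > 0:  # the start slot emits nothing
                    out.append(src[slot - 1])
            elif op.startswith("REPLACE_"):
                out.append(op[len("REPLACE_"):])
            elif op.startswith("APPEND_"):
                out.append(op[len("APPEND_"):])
            # DELETE emits nothing
    return out


source = ["▁She", "▁go", "▁to", "▁school"]
target = ["▁She", "▁go", "es", "▁to", "▁the", "▁school"]
tags = edit_tags(source, target)
print(tags)
```

Because the operations are defined over subword pieces rather than grammatical categories, nothing in this sketch is specific to English. A tagger in the style of GECToR predicts a single tag per token, so a slot that carries several operations here would have to be merged into one tag or resolved over several correction passes; that choice is left open in this example.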

Keywords

grammatical error correction; natural language processing; transformers; machine learning.

Edition

Proceedings of the Institute for System Programming, vol. 38, issue 3, part 1, 2026, pp. 187-196

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2026-38(3)-11

For citation

Khabutdinov I.A., Grabovoy A.V., Chekhovich Yu.V., Kildyakov A.S., Ivakhnenko A.A. Subword-level grammatical error correction: a universal approach. Proceedings of the Institute for System Programming, vol. 38, issue 3, part 1, 2026, pp. 187-196. DOI: 10.15514/ISPRAS-2026-38(3)-11.
