Detection of Human Edits in Russian Scientific Machine-Generated Texts
Abstract
Large language models (LLMs) are evolving rapidly and are increasingly integrated into many aspects of life. The texts they generate are becoming ever harder to distinguish from human writing, which poses significant challenges for identifying synthetic content. In this work, we explore methods for detecting human edits and corrections in abstracts of scientific papers written in Russian and originally generated by various LLMs. In addition to building a strong encoder-based detection model leveraging BERT- and RoBERTa-based architectures together with current state-of-the-art techniques, we analyze robustness under domain shift, aiming for generalization to LLMs not seen during training. We demonstrate that our approach outperforms LLM few-shot learning baselines even on small datasets, and we investigate in which scenarios adding a CRF layer improves metrics and in which it does not.
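To illustrate what a CRF layer contributes on top of an encoder's per-token scores, below is a minimal sketch of linear-chain Viterbi decoding. The emission scores, transition scores, and the two labels (0 = machine-generated, 1 = human-edited) are illustrative assumptions, not values from the paper; the point is that transition scores can smooth an isolated noisy token inside an edited span, where independent per-token argmax would flip it.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions: (T, K) array of per-token scores from the encoder.
    transitions: (K, K) array, score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag at t=0
    backptr = np.zeros((T, K), dtype=int)  # best previous tag for each (t, tag)
    for t in range(1, T):
        # candidate[i, j] = best path ending in i, then transitioning to j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # follow back-pointers from the best final tag
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Illustrative scores: a human-edited span in the middle of a
# machine-generated abstract, with one ambiguous token (t=2).
em = np.array([[3.0, 0.0],
               [0.0, 3.0],
               [1.5, 1.0],   # per-token argmax would say "machine" here
               [0.0, 3.0],
               [3.0, 0.0]])
trans = np.array([[0.5, -0.5],
                  [-0.5, 0.5]])  # staying in the same tag is rewarded

print([int(row.argmax()) for row in em])  # independent decisions: [0, 1, 0, 1, 0]
print(viterbi_decode(em, trans))          # CRF decoding:          [0, 1, 1, 1, 0]
```

Whether this smoothing helps depends on how contiguous the human edits actually are, which is consistent with the paper's finding that a CRF layer improves metrics only in some scenarios.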
Keywords
Edition
Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 149-160
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2026-38(3)-26
For citation