Detection of Human Edits in Russian Scientific Machine-Generated Texts
Abstract
Large language models (LLMs) are evolving rapidly and are increasingly integrated into many aspects of life. The texts they generate are becoming ever harder to distinguish from human writing, which poses significant challenges for identifying synthetic content. In this work, we explore methods for detecting human edits and corrections in abstracts of scientific papers written in Russian and originally generated by various LLMs. In addition to building a strong encoder-based detection model leveraging BERT- and RoBERTa-based architectures together with current state-of-the-art techniques, we analyze robustness under domain shift, aiming for generalization to LLMs not seen during training. We demonstrate that our approach outperforms LLM few-shot learning baselines even on small datasets, and we investigate in which scenarios adding a CRF layer improves metrics and in which it does not.
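To illustrate what a CRF layer contributes on top of an encoder's per-token scores, below is a minimal sketch of linear-chain Viterbi decoding. The emission scores, transition scores, and the two labels (0 = machine-generated, 1 = human-edited) are illustrative assumptions, not values from the paper; the point is that transition scores can smooth an isolated noisy token inside an edited span, where independent per-token argmax would flip it.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions: (T, K) array of per-token scores from the encoder.
    transitions: (K, K) array, score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag at t=0
    backptr = np.zeros((T, K), dtype=int)  # best previous tag for each (t, tag)
    for t in range(1, T):
        # candidate[i, j] = best path ending in i, then transitioning to j
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # follow back-pointers from the best final tag
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Illustrative scores: a human-edited span in the middle of a
# machine-generated abstract, with one ambiguous token (t=2).
em = np.array([[3.0, 0.0],
               [0.0, 3.0],
               [1.5, 1.0],   # per-token argmax would say "machine" here
               [0.0, 3.0],
               [3.0, 0.0]])
trans = np.array([[0.5, -0.5],
                  [-0.5, 0.5]])  # staying in the same tag is rewarded

print([int(row.argmax()) for row in em])  # independent decisions: [0, 1, 0, 1, 0]
print(viterbi_decode(em, trans))          # CRF decoding:          [0, 1, 1, 1, 0]
```

Whether this smoothing helps depends on how contiguous the human edits actually are, which is consistent with the paper's finding that a CRF layer improves metrics only in some scenarios.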
Keywords
Edition
Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 149-160
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2026-38(3)-26
For citation