Applying language models to automatically check students’ open-ended answers


Kopnin V.N. (YSU, Yaroslavl, Russia)
Lagutina K.V. (YSU, Yaroslavl, Russia)
Poletaev A.Y. (YSU, Yaroslavl, Russia)
Lagutina N.S. (YSU, Yaroslavl, Russia)

Abstract

Automatic grading of short open-ended student answers simplifies teachers’ work and allows for quick and effective assessment. The goal of this study is to compare methods for classifying Russian-language short answers according to the grade they receive. The authors analyzed the application of neural network language models and machine learning methods. Evaluation is based on a reference answer: the student’s answer is categorized into one of two classes (correct/incorrect) or one of three (correct/partially correct/incorrect). For the experiments, the authors collected four corpora of answers to questions from various disciplines and subject areas: a corpus of general questions on IT disciplines and higher mathematics, a corpus of questions on databases, a corpus of questions on history, and a corpus of questions on Qt development. In experiments with these texts, 11 pre-trained language models, 2 training methods, 2 methods of splitting training and test sets, and 7 classifiers were compared to analyze various methods of vector representation and classification of Russian-language texts. Analysis of the binary classification results revealed no dominant model + classifier pair that consistently outperforms the others across all corpora. BERT models combined with a centroid classifier, logistic regression, or a multilayer perceptron achieved an F-measure greater than 0.9. For ternary classification, the best combinations were the rugpt3m, MiniLM-L12, and rubert-tiny2 models paired with categorical boosting and a centroid classifier, with an F-measure of 0.58. Rule-based augmentation that recombines real data improved the F-measure to 0.96 for binary classification and to 0.91 for ternary classification. Error analysis revealed that the main difficulty is separating completely correct answers from partially correct ones. Based on the experimental results, a software system for conducting assessments among students was developed and published.

Keywords

natural language processing; assessing students' answers; text classification; neural network language models; artificial intelligence in education.

Edition

Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 197-214

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2026-38(3)-30

For citation

Kopnin V.N., Lagutina K.V., Poletaev A.Y., Lagutina N.S. Applying language models to automatically check students’ open-ended answers. Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 197-214. DOI: 10.15514/ISPRAS-2026-38(3)-30.
