Anomaly detection in computer system logs using semi-supervised learning and natural language processing

Kiriachek V.A. (RUDN, Moscow, Russia)
Salpagarov S.I. (RUDN, Moscow, Russia)

Abstract

Detecting anomalies in computer system logs is crucial for maintaining reliable technological infrastructure. This study introduces a novel approach that combines semi-supervised learning with natural language processing to analyze log files and identify potential system failures early. The methodology uses a specialized log parser based on semantic graphs together with context-independent embedding models for text vectorization, and it targets collective rather than point anomalies. Experiments were conducted on the public HDFS dataset and on a proprietary Vertica database dataset containing over 830 million log entries. The results show that the resulting solution, based on autoencoders with convolutional layers, can effectively detect system anomalies when paired with appropriate preprocessing techniques. The approach performed well on the HDFS dataset, particularly with TF-IDF token weighting, reaching a Fault Detection Rate of 0.982 and a ROC AUC of 0.811. Testing on the Vertica dataset also identified anomalous periods preceding system failures. These findings indicate that predictive maintenance approaches traditionally applied to technical equipment can be adapted to computer systems, enabling proactive intervention before critical failures occur and potentially reducing the significant costs of system downtime.
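
The TF-IDF token weighting mentioned above can be illustrated with a minimal sketch. It assumes that each log message is represented as a TF-IDF-weighted average of context-independent token embeddings. The abstract does not give the exact formula, so the smoothed IDF variant, the function names `idf_weights` and `tfidf_embed`, the toy log messages, and the two-dimensional embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed inverse document frequency per token (an assumed variant)."""
    n = len(docs)
    # Document frequency: in how many messages each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_embed(doc, idf, emb, dim):
    """TF-IDF-weighted average of token embeddings for one tokenized message."""
    tf = Counter(doc)
    vec = [0.0] * dim
    total = 0.0
    for tok, count in tf.items():
        if tok not in emb or tok not in idf:
            continue  # skip tokens without an embedding or an IDF weight
        w = (count / len(doc)) * idf[tok]
        total += w
        for i in range(dim):
            vec[i] += w * emb[tok][i]
    # Normalize by the total weight; an all-unknown message stays a zero vector.
    return [v / total for v in vec] if total > 0 else vec


# Hypothetical tokenized log messages and toy embeddings (not from the paper).
logs = [
    ["receiving", "block"],
    ["receiving", "block"],
    ["exception", "block"],
]
emb = {
    "receiving": [1.0, 0.0],
    "block": [0.0, 1.0],
    "exception": [1.0, 1.0],
}

idf = idf_weights(logs)
vectors = [tfidf_embed(doc, idf, emb, 2) for doc in logs]
```

In this toy setup the rare token `exception` receives a larger weight than the ubiquitous token `block`, so rare events contribute more to the message vector. Vectors of this kind, grouped into windows, could then be passed to a reconstruction-based model such as the autoencoder described in the abstract.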

Keywords

anomaly detection; log analysis; semi-supervised learning; natural language processing; predictive maintenance; TF-IDF vectorization.

Edition

Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 133-148

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2026-38(3)-25

For citation

Kiriachek V.A., Salpagarov S.I. Anomaly detection in computer system logs using semi-supervised learning and natural language processing. Proceedings of the Institute for System Programming, vol. 38, issue 3, part 2, 2026, pp. 133-148. DOI: 10.15514/ISPRAS-2026-38(3)-25.
