The scientific journal Modeling, Optimization and Information Technology
ISSN 2310-6018

Feature selection methods for authorship attribution in cybersecurity context

Romanov A.S.

UDC 004.89
DOI: 10.26102/2310-6018/2024.44.1.001

This paper considers methods for attributing authorship of natural-language and artificially generated texts, a task that matters in cybersecurity and intellectual property protection as a means of preventing misinformation and fraud. The choice of attribution methods is justified by findings from previous studies on the effectiveness of fastText and the support vector machine (SVM). The feature selection algorithm is chosen by comparing five methods: a genetic algorithm, forward and backward sequential selection, regularization-based selection, and a Shapley value-based method. Together these cover heuristic methods, elements of game theory, and iterative algorithms. The regularization-based algorithm is found to be the most efficient, while methods based on complete brute-force selection are found to be inefficient for any set of authors. Selection based on regularization and SVM achieved an average accuracy of 77 %, outperforming the other methods by 3 to 10 % for an identical number of features. For the same tasks, the average accuracy of fastText is 84 %. The robustness of the developed approach to generated samples was also examined. SVM proved more robust to attempts to confuse the model: the maximum loss of accuracy was 16 % for fastText and 12 % for SVM.
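
As an illustration of one of the five compared approaches, forward sequential selection can be sketched as a greedy loop that repeatedly adds the feature giving the best score. The sketch below is not the paper's implementation: the nearest-centroid scorer stands in for the SVM, and the function names (`centroid_accuracy`, `forward_selection`) and the toy feature matrix are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[str],
                      features: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`.

    Assumption: this simple scorer replaces the SVM used in the paper.
    """
    correct = 0
    for i in range(len(X)):
        # Build per-author centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label: str) -> float:
            # Squared Euclidean distance from sample i to the author's centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def forward_selection(X: Sequence[Sequence[float]], y: Sequence[str],
                      n_features: int,
                      score: Callable = centroid_accuracy) -> List[int]:
    """Greedily add the feature that maximizes `score` until n_features are chosen."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: three features per text, two authors.
# Only feature 1 separates the authors; features 0 and 2 are noise.
X = [
    [0.5, 0.10, 0.9],
    [0.4, 0.20, 0.1],
    [0.6, 0.15, 0.5],
    [0.5, 0.90, 0.8],
    [0.4, 0.80, 0.2],
    [0.6, 0.85, 0.5],
]
y = ["A", "A", "A", "B", "B", "B"]

print(forward_selection(X, y, 1))  # prints [1]
```

Backward sequential selection follows the same pattern in reverse, starting from the full feature set and removing the least useful feature at each step. Both variants call the scorer once per candidate feature per step, so the cost grows quickly with the size of the feature set.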


Romanov Aleksandr Sergeevich
Candidate of Engineering Sciences, Associate Professor


Tomsk State University of Control Systems and Radioelectronics

Tomsk, the Russian Federation

Keywords: feature selection, authorship attribution, machine learning, neural networks, text analysis, information security

For citation: Romanov A.S. Feature selection methods for authorship attribution in cybersecurity context. Modeling, Optimization and Information Technology. 2024;12(1). Available from: https://moitvivt.ru/ru/journal/pdf?id=1489 DOI: 10.26102/2310-6018/2024.44.1.001 (In Russ.).



Received 06.12.2023

Revised 20.12.2023

Accepted 16.01.2024

Published 18.01.2024