Keywords: malware, machine learning, anti-virus neural network, neural network training, keras, ember, dropout
Malware detection system based on machine learning technology
UDC 004.85:004.056.57
DOI: 10.26102/2310-6018/2020.30.3.042
The continuous growth in the number of malicious programs makes their detection, that is, the classification of programs as malicious or safe, an urgent task. This study is devoted to the development of a malware detection system based on machine learning, namely supervised training of an artificial neural network. In the course of the study, we analyzed the structure of Portable Executable (PE) files of the Windows operating system, selected features of PE files to form a training set, and chose and justified the topology (a four-layer perceptron) and the parameters of the anti-virus neural network. The Keras library was used to create and train the model, and the Ember dataset of safe and malicious software was used to form the training set. We trained the malicious code recognition model and verified the adequacy of its training. The training results showed high malware detection accuracy and no overfitting, which indicates good prospects for using the model. Although the experimental neural network model cannot fully replace anti-virus scanners, the materials of the article are of practical value for classifying programs as malicious or safe.
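As an illustration of the kind of model the abstract describes, the following is a minimal, dependency-free sketch of the forward pass of a small perceptron with dropout and a binary cross-entropy loss. It is not the authors' Keras model: the layer sizes (8, 16, 8, 1), the dropout rate of 0.5, the ReLU and sigmoid activations, and all function names are assumptions chosen for illustration, and "four layers" is read here as input, two hidden layers, and output.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dropout(values, rate, rng, training):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and rescale the survivors; at inference, pass values through."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def build_model(sizes, rng):
    """Random weights and zero biases for consecutive layer sizes.
    The sizes are illustrative assumptions, not values from the article."""
    model = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        model.append((weights, [0.0] * n_out))
    return model


def predict(model, features, rng=None, training=False, rate=0.5):
    """Return the estimated probability that a sample is malicious.
    `rng` is only needed when `training` is True."""
    activations = list(features)
    for weights, biases in model[:-1]:
        activations = relu(dense(activations, weights, biases))
        activations = dropout(activations, rate, rng, training)
    weights, biases = model[-1]
    return sigmoid(dense(activations, weights, biases)[0])


def binary_crossentropy(label, prob, eps=1e-7):
    """Loss for one sample; `label` is 1 (malicious) or 0 (safe)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


if __name__ == "__main__":
    rng = random.Random(0)
    model = build_model([8, 16, 8, 1], rng)      # hypothetical topology
    sample = [0.1 * i for i in range(8)]         # stand-in for PE features
    prob = predict(model, sample)
    print("P(malicious) =", prob)
    print("loss if truly malicious =", binary_crossentropy(1, prob))
```

In this sketch dropout is active only when `training=True`, so inference is deterministic. The weights are random and untrained, so the printed probability has no diagnostic meaning.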
For citation: Vybornova O.N., Pidchenko I.A. Malware detection system based on machine learning technology. Modeling, Optimization and Information Technology. 2020;8(3). URL: https://moit.vivt.ru/wp-content/uploads/2020/08/VybornovaPidchenko_3_20_1.pdf DOI: 10.26102/2310-6018/2020.30.3.042 (In Russ).
Published 30.09.2020