References


Моделирование, оптимизация и информационные технологии

Modeling, Optimization and Information Technology

2310-6018


10.26102/2310-6018/2020.30.3.018

816

Малоранговые аппроксимации нейросетевых алгоритмов

Low-rank approximations for neural networks

0000-0003-2792-7693

Шапошникова

Нина Владимировна

Shaposhnikova

Nina V.

shapninel@gmail.com

ФГБОУ ВО «Сибирский государственный университет науки и технологии имени академика М.Ф. Решетнева»

Federal State Budgetary Educational Institution of Higher Education «Reshetnev Siberian State University of Science and Technology»

01 01 2026

1 1

2026

This work is licensed under a Creative Commons Attribution 4.0 International License

Artificial neural networks (ANNs) and deep learning have become almost indispensable in applications such as machine vision, machine translation, speech-to-text conversion, text categorization, and video processing. However, although a number of classical theorems establish the approximation capabilities of neural network structures, current successes with ANNs mostly rely on heuristically constructed network architectures that apply only to the specific problem under consideration. Moreover, deep ANNs have millions of parameters and require powerful computing hardware, which limits their use, for example, on mobile devices. Significant progress on both problems can be achieved by applying modern low-rank approximation algorithms to the parameters of ANN layers, which both simplifies the development of a neural network architecture and yields substantial compression and faster training of deep ANNs. For example, treating the kernel of a convolutional ANN as a four-dimensional array (tensor), we can construct a low-rank approximation of it that admits an efficient implementation of its convolution with a vector (forward propagation, when the network forms a prediction) and of differentiation with respect to the parameters (backpropagation, during training). In this paper, we review the modern paradigm of machine learning and low-rank tensor approximations and, using a model numerical example corresponding to automatic recognition of handwritten digits, demonstrate the prospects for the tensorization of deep ANNs.
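The compression idea in the abstract can be illustrated with a minimal sketch. It assumes a canonical polyadic (CP) format for the four-dimensional kernel, which is only one of several low-rank formats and is not necessarily the one used in the paper. All function names, the factor names `A`, `B`, `C`, `D`, and the sizes below are hypothetical and chosen for illustration. The sketch compares parameter counts and checks that applying the factors one mode at a time gives the same result as applying the reconstructed full kernel to a single input patch.

```python
import random


def full_param_count(d, s, t):
    """Parameters of a full d x d kernel with s input and t output channels."""
    return d * d * s * t


def cp_param_count(d, s, t, rank):
    """Parameters of a rank-`rank` CP format: one factor matrix per mode."""
    return rank * (d + d + s + t)


def reconstruct_kernel(A, B, C, D):
    """K[i][j][c][o] = sum_r A[i][r] * B[j][r] * C[c][r] * D[o][r]."""
    d, s, t, rank = len(A), len(C), len(D), len(A[0])
    return [[[[sum(A[i][r] * B[j][r] * C[c][r] * D[o][r] for r in range(rank))
               for o in range(t)]
              for c in range(s)]
             for j in range(d)]
            for i in range(d)]


def apply_full(K, X):
    """y[o] = sum over i, j, c of K[i][j][c][o] * X[i][j][c] for one patch X."""
    d, s, t = len(K), len(K[0][0]), len(K[0][0][0])
    return [sum(K[i][j][c][o] * X[i][j][c]
                for i in range(d) for j in range(d) for c in range(s))
            for o in range(t)]


def apply_cp(A, B, C, D, X):
    """Same result as apply_full, computed from the factors without building K."""
    d, s, t, rank = len(A), len(C), len(D), len(A[0])
    z = []
    for r in range(rank):
        acc = 0
        for i in range(d):
            for j in range(d):
                # Contract the channel mode first, then the two spatial modes.
                chan = sum(C[c][r] * X[i][j][c] for c in range(s))
                acc += A[i][r] * B[j][r] * chan
        z.append(acc)
    # Finally expand the rank index into the output channels.
    return [sum(D[o][r] * z[r] for r in range(rank)) for o in range(t)]


# Illustrative sizes (assumptions): 3 x 3 kernel, 4 input and 5 output channels, rank 2.
d, s, t, rank = 3, 4, 5, 2
rng = random.Random(0)


def random_factor(rows):
    # Small integer entries keep the equality check below exact.
    return [[rng.randint(-2, 2) for _ in range(rank)] for _ in range(rows)]


A, B, C, D = random_factor(d), random_factor(d), random_factor(s), random_factor(t)
X = [[[rng.randint(-2, 2) for _ in range(s)] for _ in range(d)] for _ in range(d)]

assert apply_full(reconstruct_kernel(A, B, C, D), X) == apply_cp(A, B, C, D, X)
print(full_param_count(d, s, t), cp_param_count(d, s, t, rank))  # 180 30
```

In this toy setting the factored form stores 30 numbers instead of 180. For a real trained kernel the low-rank format is only an approximation, so the achievable rank, and therefore the compression, depends on how much accuracy loss is acceptable.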

machine learning, neural network, deep convolutional network, low-rank approximation

The study was performed without external funding.

References

LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521(7553):436-444.

Zhang C., Patras P., Haddadi H. Deep learning in mobile and wireless networking: A survey. IEEE Communications Surveys & Tutorials. 2019;21(3):2224-2287.

Zhao Z., Zheng P., Xu S. Object detection with deep learning: A review. IEEE transactions on neural networks and learning systems. 2019;30(11):3212-3232.

Cybenko G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems. 1989;2(4):303–314.

Hornik K. Approximation capabilities of multilayer feedforward networks. Neural Networks. 1991;4(2):251–257.

Cohen N., Sharir O., Shashua A. On the expressive power of deep learning: a tensor analysis. arXiv preprint. 2015;arXiv:1509.05009.

Cichocki A. Tensor networks for dimensionality reduction and large-scale optimization. Foundations and Trends in Machine Learning. 2016;9(4-5).

Lebedev V. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint. 2014;arXiv:1412.6553.

Novikov A., Podoprikhin D., Osokin A., Vetrov D. Tensorizing neural networks. In Advances in neural information processing systems. 2015;442-450.

Deng L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine. 2012;29(6):141-142.

Bottou L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010. 2010;177–186.

Rumelhart D., Hinton G., Williams R. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–536.

Grasedyck L., Kressner D., Tobler C. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen. 2013;36(1):53-78.

Harshman R. Foundations of the Parafac procedure: Models and conditions for an explanatory multimodal factor analysis. UCLA Working Papers in Phonetics. 1970;1–84.

Tucker L. Some mathematical notes on three-mode factor analysis. Psychometrika. 1966;31:279–311.

PyTorch, a machine learning framework [Electronic resource]. Available at: https://pytorch.org (accessed 10.06.2020).

Colab, an interactive cloud environment [Electronic resource]. Available at: https://colab.research.google.com (accessed 10.06.2020).

The authors declare that there are no conflicts of interest.