
SUBSET SELECTION IN REGRESSION MODELS CONSIDERING MULTICOLLINEARITY AS A MIXED 0-1 INTEGER LINEAR PROGRAMMING PROBLEM

Bazilevsky M.P.  

UDC 519.862.6

The article is devoted to the problem of subset selection in linear regression models, an exact solution of which is guaranteed either by an exhaustive search over all possible regressions or by solving a specially formulated mathematical programming problem with Boolean variables. The subset selection problem is often solved using only one adequacy criterion, for example, by minimizing the model errors alone. However, when the regression is estimated by ordinary least squares, one should strive not only to improve the quality of the approximation but also to satisfy the conditions of the Gauss-Markov theorem, one of which is the absence of a linear dependence between the explanatory variables. If this condition is violated, multicollinearity is said to be present. Therefore, when selecting informative regressors it is expedient to solve a two-criterion problem: to maximize the approximation quality while minimizing the multicollinearity between the explanatory variables. Since there are no exact quantitative criteria for establishing the presence or absence of multicollinearity, this paper formulates a criterion for an upper bound on multicollinearity based on a well-known recommendation. Using this criterion, four possible statements of the two-criterion subset selection problem are proposed, each of which is reduced to a mixed 0-1 integer linear programming problem. To demonstrate the proposed mathematical apparatus, a trial version of a specialized software package was developed and used to solve the problem of modeling the freight turnover of the Krasnoyarsk Railway.
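
To make the kind of reduction described above more concrete, the sketch below states one possible variant of regressor selection as a mixed 0-1 linear program in Python with the PuLP modeller. It is only an illustration of the general technique, not the formulation used in the article: the fit criterion is taken to be the sum of absolute residuals (so that the objective stays linear, whereas the article works with ordinary least squares), the multicollinearity upper bound is taken to be a limit r_max on the pairwise correlations of the selected regressors, and the function name select_regressors, the big-M constant and the CBC solver call are assumptions of this sketch.

    import numpy as np
    import pulp

    # Sketch: select at most k regressors for y, forbidding pairs of regressors
    # whose sample correlation exceeds r_max (the assumed multicollinearity bound).
    def select_regressors(X, y, k, r_max=0.7, big_m=1000.0):
        n, p = X.shape
        prob = pulp.LpProblem("subset_selection", pulp.LpMinimize)
        beta = [pulp.LpVariable(f"beta_{j}", -big_m, big_m) for j in range(p)]
        b0 = pulp.LpVariable("intercept")
        z = [pulp.LpVariable(f"z_{j}", cat="Binary") for j in range(p)]    # regressor j selected
        u = [pulp.LpVariable(f"u_{i}", lowBound=0) for i in range(n)]      # |residual i|

        # Fit criterion: minimize the sum of absolute residuals (keeps the problem linear).
        prob += pulp.lpSum(u)
        for i in range(n):
            resid = float(y[i]) - b0 - pulp.lpSum(float(X[i, j]) * beta[j] for j in range(p))
            prob += resid <= u[i]
            prob += -resid <= u[i]

        # A coefficient may be nonzero only if its regressor is selected.
        for j in range(p):
            prob += beta[j] <= big_m * z[j]
            prob += -beta[j] <= big_m * z[j]
        prob += pulp.lpSum(z) <= k          # at most k informative regressors

        # Multicollinearity upper bound: strongly correlated regressors
        # must not enter the model together.
        r = np.corrcoef(X, rowvar=False)
        for i in range(p):
            for j in range(i + 1, p):
                if abs(r[i, j]) > r_max:
                    prob += z[i] + z[j] <= 1

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        chosen = [j for j in range(p) if z[j].value() > 0.5]
        return chosen, b0.value(), [beta[j].value() for j in range(p)]

Scanning k or r_max and comparing the resulting residual sums gives a simple way to trade the two criteria off against each other; the article itself proposes four different statements of the two-criterion problem, each reducible to a mixed 0-1 integer linear program.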

1. Eliseeva I.I., Kurysheva S.V., Kosteeva T.V. Jekonometrika. Moscow: Finansy i statistika, 2007. 576 p. (in Russian)

2. Miller A.J. Subset Selection in Regression. Chapman & Hall/CRC, 2002. 247 p.

3. Noskov S.I. Tehnologija modelirovanija ob’ektov s nestabil'nym funkcionirovaniem i neopredelennost'ju v dannyh. Irkutsk: RIC GP «Oblinformpechat'», 1996. 321 p. (in Russian)

4. Ajvazjan S.A. Metody jekonometriki. Moscow: Magistr: INFRA-M, 2010. 512 p. (in Russian)

5. Kremer N.Sh., Putko B.A. Jekonometrika. Moscow: JuNITI-DANA, 2002. 311 p. (in Russian)

6. Konno H., Yamamoto R. Choosing the best set of variables in regression analysis using integer programming. Journal of Global Optimization. 2009;44(2):272-282.

7. Park Y.W., Klabjan D. Subset selection for multiple linear regression via optimization. Technical report, 2013. Available from: http://www.klabjan.dynresmanagement.com

8. Chung S., Park Y.W., Cheong T. A mathematical programming approach for integrated multiple linear regression subset selection and validation. arXiv.org, 2017. Available from: https://arxiv.org/abs/1712.04543

9. Tamura R., Kobayashi K., Takano Y., Miyashiro R., Nakata K., Matsui T. Best subset selection for eliminating multicollinearity. Journal of the Operations Research Society of Japan. 2017;60(3):321-336.

10. Tamura R., Kobayashi K., Takano Y., Miyashiro R., Nakata K., Matsui T. Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. Optimization Online, 2016. Available from: http://www.optimization-online.org/DB_HTML/2016/09/5655.html

11. Bazilevskij M.P. Svedenie zadachi otbora informativnyh regressorov pri ocenivanii linejnoj regressionnoj modeli po metodu naimen'shih kvadratov k zadache chastichno-bulevogo linejnogo programmirovanija. Modelirovanie, optimizacija i informacionnye tehnologii. 2018;6(1). Available from: https://moit.vivt.ru/wp-content/uploads/2018/01/Bazilevskiy_1_1_18.pdf (in Russian)

12. Professional'nyj informacionno-analiticheskij resurs, posvjashhennyj mashinnomu obucheniju, raspoznavaniju obrazov i intellektual'nomu analizu dannyh. Available from: http://www.machinelearning.ru/wiki/index.php?title=Фактор_инфляции_регрессии (in Russian)

13. Bazilevskij M.P., Vrublevskij I.P., Noskov S.I., Jakovchuk I.S. Srednesrochnoe prognozirovanie jekspluatacionnyh pokazatelej funkcionirovanija Krasnojarskoj zheleznoj dorogi. Fundamental'nye issledovanija. 2016;10(3):471-476. (in Russian)

Bazilevsky Mikhail Pavlovich
Candidate of Technical Sciences
Email: mik2178@yandex.ru

Irkutsk State Transport University

Irkutsk, Russian Federation

Keywords: regression model, ordinary least squares, multicollinearity, subset selection in regression, mixed 0-1 integer linear programming problem

For citation: Bazilevsky M.P. SUBSET SELECTION IN REGRESSION MODELS CONSIDERING MULTICOLLINEARITY AS A MIXED 0-1 INTEGER LINEAR PROGRAMMING PROBLEM. Modeling, Optimization and Information Technology. 2018;6(2). Available from: https://moit.vivt.ru/wp-content/uploads/2018/04/Bazilevskiy_2_18_1.pdf (In Russ.)

