Keywords: regression model, ordinary least squares, multicollinearity, subset selection in regression, task of mixed 0-1 integer linear programming
SUBSET SELECTION IN REGRESSION MODELS WITH CONSIDERING MULTICOLLINEARITY AS A TASK OF MIXED 0-1 INTEGER LINEAR PROGRAMMING
UDC 519.862.6
DOI:
The article addresses the problem of subset selection in a linear regression model. An exact solution is guaranteed only by an exhaustive search over all possible regressions or by solving a specially formulated mathematical programming problem with Boolean variables. Subset selection is often carried out using a single adequacy criterion; for example, only the model errors are minimized. However, when a regression is estimated by ordinary least squares, it is necessary not only to improve the quality of the approximation but also to satisfy the conditions of the Gauss-Markov theorem, one of which is the absence of linear dependence between the explanatory variables. When this condition is violated, multicollinearity is said to be present. Thus, when selecting informative regressors, it is expedient to solve a two-criteria problem: maximize the quality of approximation while minimizing multicollinearity among the explanatory variables. Since there are no exact quantitative criteria for deciding whether multicollinearity is present, this paper formulates a criterion for the upper bound of multicollinearity based on a well-known recommendation. Using this criterion, four possible formulations of the two-criteria subset selection problem are proposed, each of which reduces to a mixed 0-1 integer linear programming problem. To demonstrate the proposed mathematical apparatus, a trial version of a specialized software package was developed and used to model the freight turnover of the Krasnoyarsk railroad.
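The exhaustive-search alternative mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's mixed 0-1 integer programming formulation, and the abstract does not state the article's multicollinearity criterion; the sketch assumes, purely for illustration, an upper bound `max_abs_corr` on the absolute pairwise correlation between selected regressors. The function names `r_squared` and `best_subset` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss


def best_subset(X, y, k, max_abs_corr=0.8):
    """Exhaustive search over all k-column subsets of X.

    Assumption (not from the article): a subset is admissible only if every
    pairwise |correlation| among its columns is at most max_abs_corr.
    Returns (columns, R^2) of the best admissible subset,
    or (None, -inf) if no subset is admissible.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    best_cols, best_r2 = None, -np.inf
    for cols in combinations(range(X.shape[1]), k):
        if any(corr[i, j] > max_abs_corr for i, j in combinations(cols, 2)):
            continue  # violates the assumed multicollinearity bound
        r2 = r_squared(X[:, list(cols)], y)
        if r2 > best_r2:
            best_cols, best_r2 = cols, r2
    return best_cols, best_r2


# Synthetic data: column 1 is almost a copy of column 0.
t = np.arange(50, dtype=float)
x0 = np.sin(t)
x1 = x0 + 0.1 * np.cos(3.1 * t)
x2 = np.cos(1.7 * t)
X = np.column_stack([x0, x1, x2])
y = x0 + x1  # an exact linear function of the two collinear columns

free_cols, free_r2 = best_subset(X, y, k=2, max_abs_corr=1.0)   # bound inactive
bound_cols, bound_r2 = best_subset(X, y, k=2, max_abs_corr=0.8)  # bound active
```

With the bound inactive, the search picks the collinear pair because it fits best; with the bound active, that pair is excluded and the search trades a little approximation quality for weaker dependence between regressors. The number of subsets grows combinatorially with the number of candidate regressors, which is the motivation for a mathematical programming formulation.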
For citation: Bazilevsky M.P. SUBSET SELECTION IN REGRESSION MODELS WITH CONSIDERING MULTICOLLINEARITY AS A TASK OF MIXED 0-1 INTEGER LINEAR PROGRAMMING. Modeling, Optimization and Information Technology. 2018;6(2). URL: https://moit.vivt.ru/wp-content/uploads/2018/04/Bazilevskiy_2_18_1.pdf DOI: (In Russ).
Published 30.06.2018