REDUCTION OF THE PROBLEM OF SELECTING INFORMATIVE REGRESSORS WHEN ESTIMATING A LINEAR REGRESSION MODEL BY THE LEAST SQUARES METHOD TO A PARTIAL BOOLEAN LINEAR PROGRAMMING PROBLEM
UDC 519.862.6
DOI:
One of the main problems in regression analysis is the choice of the structural specification of the regression model, that is, of the set of variables and the mathematical form of the relationship between them. For a linear regression model this task reduces to the selection of the most informative regressors. An exact solution of the problem of selecting informative regressors when estimating a linear regression by the least squares method can be obtained either by exhaustive search or by introducing Boolean variables and then solving a computationally difficult partial Boolean quadratic programming problem. In this paper, the problem of selecting informative regressors in a linear regression estimated by the least squares method is reduced to a partial Boolean linear programming problem, which is solved without difficulty by the corresponding software packages. The new formulation assumes that all variables are standardized beforehand, so that estimating the unknown parameters of the linear regression model amounts to finding the beta coefficients of the standardized regression. The beta coefficients are determined from the known intercorrelation matrix and the vector of correlations between the dependent variable and the independent factors. The adequacy of the linear regression is assessed with the coefficient of determination.
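For reference, the relations below sketch how standardized variables enter such a formulation. The first line gives the standard expression of the beta coefficients through the intercorrelation matrix $R$ and the correlation vector $r$, together with the coefficient of determination. The second group is only an illustrative big-M selection scheme with Boolean variables $z_j$ and an assumed bound $k$ on the number of regressors; it is a sketch of how such mixed Boolean formulations are typically written, not necessarily the exact reduction proposed in the paper.

\[
R\beta = r, \qquad \beta = R^{-1}r, \qquad R^{2} = r^{\top}\beta = \sum_{j=1}^{m} r_{y x_{j}}\,\beta_{j}.
\]

\[
\begin{aligned}
\max_{\beta,\,z}\ & \sum_{j=1}^{m} r_{y x_{j}}\,\beta_{j} \\
\text{s.t.}\ & -M z_{j} \le \beta_{j} \le M z_{j}, \qquad z_{j} \in \{0,1\}, \qquad \sum_{j=1}^{m} z_{j} \le k, \\
& -M(1-z_{i}) \le \sum_{j=1}^{m} r_{x_{i} x_{j}}\,\beta_{j} - r_{y x_{i}} \le M(1-z_{i}), \qquad i = 1,\dots,m,
\end{aligned}
\]

where $M$ is a sufficiently large constant. A selected regressor ($z_{j}=1$) must satisfy its normal equation, an excluded one ($z_{j}=0$) has $\beta_{j}=0$, so the linear objective equals the coefficient of determination of the selected subset.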
Keywords: regression model, least squares method, selection of informative regressors, partial Boolean linear programming problem, standardized regression, correlation coefficient, coefficient of determination
For citation: Bazilevsky M.P. REDUCTION OF THE PROBLEM OF SELECTING INFORMATIVE REGRESSORS WHEN ESTIMATING A LINEAR REGRESSION MODEL BY THE LEAST SQUARES METHOD TO A PARTIAL BOOLEAN LINEAR PROGRAMMING PROBLEM. Modeling, Optimization and Information Technology. 2018;6(1). URL: https://moit.vivt.ru/wp-content/uploads/2018/01/Bazilevskiy_1_1_18.pdf DOI: (In Russ.)
Published 31.03.2018