The scientific journal Modeling, Optimization and Information Technology
Online media
issn 2310-6018

REDUCTION OF THE PROBLEM OF SELECTING INFORMATIVE REGRESSORS WHEN ESTIMATING A LINEAR REGRESSION MODEL BY THE LEAST SQUARES METHOD TO A PARTIAL BOOLEAN LINEAR PROGRAMMING PROBLEM

Bazilevsky M.P.  

UDC 519.862.6
DOI:


One of the central problems of regression analysis is choosing the structural specification of a regression model, i.e. selecting the set of variables and the mathematical form of the relationship between them. For a linear regression model this problem reduces to selecting the most informative regressors. An exact solution of the selection problem for a linear regression estimated by the least squares method can be obtained either by an exhaustive search algorithm or by introducing Boolean variables and then solving a computationally demanding partial Boolean quadratic programming problem. In this paper, the problem of selecting informative regressors in a linear regression estimated by least squares is reduced to a partial Boolean linear programming problem, which can be solved without difficulty by standard optimization packages. The new formulation assumes that all variables are normalized beforehand, so that estimating the unknown parameters of the linear regression model amounts to finding the beta coefficients of the standardized regression. The beta coefficients are determined from the known intercorrelation matrix of the regressors and the vector of correlations between the dependent variable and the independent factors. The coefficient of determination is used to assess the adequacy of the linear regression.
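The quantities the abstract relies on can be computed directly: the beta coefficients of the standardized regression satisfy the normal equations in correlation form, R_xx β = r_xy, and the coefficient of determination is R² = r_xyᵀ β. The sketch below illustrates these formulas together with the exhaustive-search baseline the abstract mentions; the function names are illustrative, and the paper's own partial Boolean linear programming reformulation is not reproduced here.

```python
import itertools
import numpy as np

def standardized_betas(Rxx, rxy):
    """Beta coefficients of the standardized regression:
    solve the normal equations in correlation form, Rxx @ beta = rxy."""
    return np.linalg.solve(Rxx, rxy)

def r_squared(Rxx, rxy, subset):
    """Coefficient of determination R^2 = rxy' @ beta for the model
    restricted to the given subset of regressors."""
    idx = list(subset)
    beta = standardized_betas(Rxx[np.ix_(idx, idx)], rxy[idx])
    return float(rxy[idx] @ beta)

def best_subset_exhaustive(Rxx, rxy, k):
    """Complete search over all k-regressor subsets: the exact but
    combinatorially expensive baseline that motivates reformulating
    the selection problem as a mathematical program."""
    m = len(rxy)
    return max(itertools.combinations(range(m), k),
               key=lambda s: r_squared(Rxx, rxy, s))
```

For example, with an identity intercorrelation matrix (uncorrelated regressors) and correlation vector (0.6, 0.1, 0.5), the best pair of regressors is the first and the third, with R² = 0.6² + 0.5² = 0.61. The cost of `best_subset_exhaustive` grows as C(m, k), which is exactly why a linear programming reformulation is attractive for large m.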


Bazilevsky Mikhail Pavlovich
Candidate of Technical Sciences
Email: mik2178@yandex.ru

Irkutsk State Transport University

Irkutsk, Russian Federation

Keywords: regression model, least squares method, selection of informative regressors, partial Boolean linear programming problem, standardized regression, correlation coefficient, coefficient of determination

For citation: Bazilevsky M.P. REDUCTION OF THE PROBLEM OF SELECTING INFORMATIVE REGRESSORS WHEN ESTIMATING A LINEAR REGRESSION MODEL BY THE LEAST SQUARES METHOD TO A PARTIAL BOOLEAN LINEAR PROGRAMMING PROBLEM. Modeling, Optimization and Information Technology. 2018;6(1). Available from: https://moit.vivt.ru/wp-content/uploads/2018/01/Bazilevskiy_1_1_18.pdf (In Russ.)


Full text in PDF