Keywords: data preparation, decision-making support, data mining, peer reviewer, scientific publication
Method of preparing data on scientific publications for intelligent decision-making support in evaluating expertise of peer reviewers
UDC 004.622, 519.816
DOI: 10.26102/2310-6018/2024.47.4.026
One of the main factors in assigning a peer reviewer is the reviewer's expertise in the manuscript topic, as evidenced by relevant publications. Decision-making support based on mining publication data from scientometric databases speeds up the evaluation of peer reviewers' expertise and makes it less labor-intensive. The critical point, however, is the correctness of the publication data being mined. Researchers are currently studying how to define the correctness of scientometric database data and how to ensure it, applying various cleaning procedures during data preparation. Yet existing works do not take into account the specifics of the task for which the publication data are gathered. To address this problem, the paper proposes a method of preparing data on scientific publications for intelligent decision-making support in evaluating the expertise of peer reviewers; the method considers features associated with the need to determine the semantic similarity of publication text data. The method was successfully tested in preparing data on the scientific publications of the editorial board members of the academic journal “Systems Engineering and Information Technologies”, using the content of their profiles in the scientometric databases “RISC” and “Google Scholar”.
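As a rough illustration of why cleaning matters before text similarity is computed, the sketch below normalizes publication titles, drops empty and duplicate records, and scores similarity with a plain bag-of-words cosine. It is not the method proposed in the paper: the function names (`normalize_title`, `prepare_titles`, `cosine_similarity`), the sample titles, and the use of bag-of-words cosine as a stand-in for semantic similarity are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def prepare_titles(raw_titles):
    """Drop missing/empty entries and duplicates (after normalization), keeping order."""
    seen = set()
    prepared = []
    for title in raw_titles:
        norm = normalize_title(title or "")
        if norm and norm not in seen:
            seen.add(norm)
            prepared.append(norm)
    return prepared


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity (hypothetical stand-in for semantic similarity)."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical profile export: a duplicate differing only in case and
    # punctuation, plus missing and empty records.
    profile = ["Data Mining for Missing Values", "data mining for missing values.", None, ""]
    titles = prepare_titles(profile)
    manuscript = normalize_title("Filling missing values using data mining")
    for t in titles:
        print(t, round(cosine_similarity(manuscript, t), 3))
```

Without the normalization step, the two spellings of the same title would be counted as separate publications, which would overstate a reviewer's apparent match to a manuscript topic.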
For citation: Latypova V.A. Method of preparing data on scientific publications for intelligent decision-making support in evaluating expertise of peer reviewers. Modeling, Optimization and Information Technology. 2024;12(4). URL: https://moitvivt.ru/ru/journal/pdf?id=1748 DOI: 10.26102/2310-6018/2024.47.4.026 (In Russ.).
Received 18.11.2024
Revised 29.11.2024
Accepted 03.12.2024