Scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Concept and architecture of parsing and storing a unified database of patents and scientific journal publications

Kozina S., Kulinchenko I., Korobkin D., Fomenkov S.

UDC 004.853
DOI: 10.26102/2310-6018/2024.47.4.024

Existing methods of automated data collection simplify the collection process but often suffer from low reliability, efficiency, and speed. Unstable connections, IP address blocking, and changes in site structure cause data loss and require constant monitoring of the parsing process, which raises the cost of maintaining and operating such systems. Developing new approaches and tools for parsing the required information is therefore a pressing task with the potential to advance the field of data mining. The article describes the development of a module that parses information from patent systems and from the websites of physics and technology journals using modern technologies and approaches, and presents the results of testing its operability. The tool may be useful to patent offices, researchers, students, engineers, and scientists working in this subject area. Such a module opens up new opportunities for data mining and strategic decision-making in innovative development, including in-depth analysis of technological trends, identification of promising developments, and the building of innovation strategies.

Kozina Svetlana

Volgograd State Technical University

Volgograd, Russia

Kulinchenko Inna

Volgograd State Technical University

Volgograd, Russia

Korobkin Dmitriy
Candidate of Technical Sciences

Volgograd State Technical University

Volgograd, Russia

Fomenkov Sergey
Doctor of Technical Sciences

Volgograd State Technical University

Volgograd, Russia

Keywords: patents, physics and technology journals, parsing, scalability, fault tolerance

For citation: Kozina S., Kulinchenko I., Korobkin D., Fomenkov S. Concept and architecture of parsing and storing a unified database of patents and scientific journal publications. Modeling, Optimization and Information Technology. 2024;12(4). (In Russ.). URL: https://moitvivt.ru/ru/journal/pdf?id=1740 DOI: 10.26102/2310-6018/2024.47.4.024

Received 13.11.2024

Revised 25.11.2024

Accepted 27.11.2024