Keywords: patents, physics and technology journals, parsing, scalability, fault tolerance
Concept and architecture of parsing and storing a unified database of patents and scientific journal publications
UDC 004.853
DOI: 10.26102/2310-6018/2024.47.4.024
Existing methods of automated data collection simplify the process but often suffer from low reliability, efficiency, and speed. Unstable connections, IP address blocking, and changes in site structure lead to data loss and require constant monitoring of the parsing process, which raises the cost of maintaining and operating such systems. Developing new approaches and tools for parsing the required information is therefore a pressing task that could substantially advance the field of data mining. The article describes the development of a module that parses information from patent systems and the websites of physics and technology journals using modern technologies and approaches, and presents the results of testing its operability. The tool may be useful to patent offices, researchers, students, engineers, and scientists working in this subject area. Such a module opens up new opportunities for data mining and strategic decision-making in innovative development, including in-depth analysis of technological trends, identification of promising developments, and the construction of innovation strategies.
For citation: Kozina S., Kulinchenko I., Korobkin D., Fomenkov S. Concept and architecture of parsing and storing a unified database of patents and scientific journal publications. Modeling, Optimization and Information Technology. 2024;12(4). (In Russ.). URL: https://moitvivt.ru/ru/journal/pdf?id=1740 DOI: 10.26102/2310-6018/2024.47.4.024
Received 13.11.2024
Revised 25.11.2024
Accepted 27.11.2024