Keywords: patents, patent search, keyword extraction, full-text search, HDFS, Apache Solr, Django, keyT5
The development of the information retrieval system for state of art assessment
UDC 004.853
DOI: 10.26102/2310-6018/2023.42.3.023
The relevance of this study stems from the need to extract key phrases and words from the Russian-language patent array more efficiently. At present, patent office examiners have to analyze the texts of patent applications manually to identify the key phrases and words that are then used to search for patent analogues. This process is time-consuming and can be error-prone. A further problem is that no system comparable to Google Patents exists for Russian-language patents: there is currently no reliable and effective tool for automatically identifying key phrases and words in them. This limits examiners' ability to search for and analyze patent analogues and to make patenting decisions. Improving the efficiency of key phrase and keyword extraction from the Russian-language patent array is therefore of great practical importance. It will reduce the time spent analyzing patent applications, improve the accuracy of searches for similar patents, and support more reliable patenting decisions. Such a tool will be useful to patent offices, legal consultants, engineers, and researchers who work with Russian-language patents. Overall, the study addresses the need to improve and automate the analysis of patent applications, which will make management of the Russian-language patent array more efficient and accurate and make the array more accessible and user-friendly.
For citation: Bobunov A.V., Korobkin D.M., Fomenkov S.A. The development of the information retrieval system for state of art assessment. Modeling, Optimization and Information Technology. 2023;11(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1413 DOI: 10.26102/2310-6018/2023.42.3.023 (In Russ.).
Received 20.06.2023
Revised 29.07.2023
Accepted 20.09.2023
Published 30.09.2023