Keywords: chemical formula, SMILES, InChI, MDL Molfile
THE AUTOMATIZATION OF CHEMICAL FORMULAS COMPARISON
UDC 004.89
DOI: 10.26102/2310-6018/2018.23.4.014
To establish the uniqueness of a patented technology, a patent office expert must compare the patent application with existing patents and make sure that the invention has no analogues. Analyzing patents in chemical classes requires comparing chemical formulas, which can be given in different formats: MOL, InChI, SMILES, structural formula, molecular fingerprint. This paper describes the development of software that automates the following procedures: conversion between the different representations of a chemical formula, comparison of chemical formulas from the patent application with those from existing patents, and identification of analogous patents based on the results of that comparison. Chemical formulas are compared by calculating the similarity of their molecular fingerprints with the Tanimoto coefficient. The similarity coefficient of two patents is calculated from the maximum values of the Tanimoto coefficient over the set of compared chemical compounds from the patents. The software is developed in Java using the Spring Framework, the H2 database, and the Chemistry Development Kit (CDK). The software showed high recall and precision in patent search based on chemical formulas, with low information loss and noise.
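As an illustration of the comparison step, the following is a minimal sketch in Java, not the authors' implementation. It uses java.util.BitSet as a stand-in for a molecular fingerprint instead of CDK, so that it runs without external dependencies. The class name FingerprintSimilarity, the helper bits, and the method patentSimilarity are hypothetical. The abstract only states that patent similarity is based on the maximum Tanimoto values; averaging those maxima over the compounds of the application is an assumption made here for illustration.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper class; names are illustrative and not taken from the paper.
final class FingerprintSimilarity {

    /** Tanimoto coefficient of two bit fingerprints: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                       // bits set in both fingerprints
        BitSet union = (BitSet) a.clone();
        union.or(b);                         // bits set in at least one fingerprint
        int unionCount = union.cardinality();
        // Assumption: two empty fingerprints are treated as having similarity 0.
        return unionCount == 0 ? 0.0 : (double) common.cardinality() / unionCount;
    }

    /**
     * Similarity of a patent application to one patent.
     * For each compound of the application, take the maximum Tanimoto value
     * against all compounds of the patent, then average those maxima.
     * The averaging step is an assumption; the abstract does not specify it.
     */
    static double patentSimilarity(List<BitSet> application, List<BitSet> patent) {
        if (application.isEmpty() || patent.isEmpty()) {
            return 0.0;
        }
        double sumOfMaxima = 0.0;
        for (BitSet query : application) {
            double best = 0.0;
            for (BitSet candidate : patent) {
                best = Math.max(best, tanimoto(query, candidate));
            }
            sumOfMaxima += best;
        }
        return sumOfMaxima / application.size();
    }

    /** Builds a toy fingerprint from the indices of its set bits. */
    static BitSet bits(int... setBits) {
        BitSet fingerprint = new BitSet();
        for (int index : setBits) {
            fingerprint.set(index);
        }
        return fingerprint;
    }
}
```

For example, the fingerprints bits(0, 1, 2, 3) and bits(2, 3, 4, 5) share two of six distinct set bits, so their Tanimoto coefficient is 1/3. In a real pipeline the fingerprints would be computed from parsed structures (for instance with a CDK fingerprinter) rather than built by hand.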
For citation: Vayngolts N.A., Vereshchak G.A., Korobkin D.M., Fomenkov S.A. THE AUTOMATIZATION OF CHEMICAL FORMULAS COMPARISON. Modeling, Optimization and Information Technology. 2018;6(4). URL: https://moit.vivt.ru/wp-content/uploads/2018/10/VayngoltsSoatori_4_18_1.pdf DOI: 10.26102/2310-6018/2018.23.4.014 (In Russ).
Published 31.12.2018