The scientific journal Modeling, Optimization and Information Technology
Online edition
ISSN 2310-6018

THE AUTOMATIZATION OF CHEMICAL FORMULAS COMPARISON

Vayngolts N.A., Vereshchak G.A., Korobkin D.M., Fomenkov S.A.

UDC 004.89
DOI: 10.26102/2310-6018/2018.23.4.014

To establish the uniqueness of a technology submitted for patenting, a patent office expert must compare the patent application with existing patents and make sure that the invention has no analogues. For patents in chemical classes, this requires comparing chemical formulas, which can be given in different formats: MOL, InChI, SMILES, structural formula, or molecular fingerprint. This paper describes the development of software that automates three procedures: conversion between the various representations of a chemical formula, comparison of the chemical formulas in a patent application with those in existing patents, and identification of analogue patents from the results of that comparison. Chemical formulas are compared by calculating the similarity of their molecular fingerprints with the Tanimoto coefficient. The similarity coefficient of two patents is calculated from the maximum values of the Tanimoto coefficient over the set of compared chemical compounds from the patents. The software is written in Java using the Spring Framework, the H2 database, and the Chemistry Development Kit (CDK). In testing, the software performed well: high recall and precision of patent search by chemical formula, with the lowest values of information loss and noise.
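The comparison step described in the abstract can be illustrated with a minimal sketch. For two bit fingerprints A and B, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in either. The sketch below uses only `java.util.BitSet` and hand-made fingerprints so that it runs without CDK; in the described software the fingerprints would come from CDK. The class and method names (`TanimotoSketch`, `bestMatch`, `patentSimilarity`, `bits`) are hypothetical. The abstract says only that patent similarity is based on the maximum Tanimoto values, so averaging those maxima is an assumption made here for illustration, not the authors' stated formula.

```java
import java.util.BitSet;
import java.util.List;

final class TanimotoSketch {

    /** Tanimoto coefficient of two bit fingerprints: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Assumption: two empty fingerprints are treated as dissimilar (0.0).
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    /** Highest Tanimoto value of one application compound against a patent's compounds. */
    static double bestMatch(BitSet query, List<BitSet> patentCompounds) {
        double best = 0.0;
        for (BitSet candidate : patentCompounds) {
            best = Math.max(best, tanimoto(query, candidate));
        }
        return best;
    }

    /** ASSUMPTION: patent similarity is the mean of the per-compound maxima. */
    static double patentSimilarity(List<BitSet> application, List<BitSet> patent) {
        if (application.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (BitSet query : application) {
            sum += bestMatch(query, patent);
        }
        return sum / application.size();
    }

    /** Hypothetical helper: builds a fingerprint from the indices of its set bits. */
    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int i : indices) {
            set.set(i);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet a = bits(0, 1, 2, 3);
        BitSet b = bits(2, 3, 4, 5);
        // 2 bits are shared and 6 bits are set in at least one fingerprint.
        System.out.println("Tanimoto(a, b) = " + tanimoto(a, b));
        System.out.println("Patent similarity = "
                + patentSimilarity(List.of(a, b), List.of(a)));
    }
}
```

Taking the maximum per compound means that a single close structural match in an existing patent is enough to flag that compound, regardless of how many unrelated compounds the patent also lists.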

Vayngolts Natalia Alexandrovna

Email: natalia.vayngolts@gmail.com

Volgograd State Technical University

Volgograd, Russian Federation

Vereshchak Grigory Alekseevich

Email: grigoryg37@gmail.com

Volgograd State Technical University

Volgograd, Russian Federation

Korobkin Dmitry Mikhailovich
Candidate of Technical Sciences
Email: dkorobkin80@mail.ru

Volgograd State Technical University

Volgograd, Russian Federation

Fomenkov Sergey Alekseevich
Doctor of Technical Sciences, Professor
Email: saf550@yandex.ru

Volgograd State Technical University

Volgograd, Russian Federation

Keywords: chemical formula, SMILES, InChI, MDL Molfile

For citation: Vayngolts N.A., Vereshchak G.A., Korobkin D.M., Fomenkov S.A. The automatization of chemical formulas comparison. Modeling, Optimization and Information Technology. 2018;6(4). Available from: https://moit.vivt.ru/wp-content/uploads/2018/10/VayngoltsSoatori_4_18_1.pdf DOI: 10.26102/2310-6018/2018.23.4.014 (In Russ).
