Keywords: adjective ordering, natural language processing, word vector representation, GloVe, classification methods, hypernyms
Application of machine learning for adjective ordering in English sentences
UDC 004.891.2
DOI: 10.26102/2310-6018/2023.40.1.028
The article presents a methodology for solving the adjective ordering problem in English sentences by determining the hypernyms of the adjectives. Determining a hypernym can be framed as a classification task; therefore, the most popular machine-learning classification methods were compared: the k-nearest neighbors method, logistic regression, a decision tree classifier, the support vector machine, and the naive Bayes method. The models were trained on a sample of adjectives and their hypernyms. For each adjective, similar adjectives were selected from the training sample, and the most semantically appropriate hypernym was determined from them. Word similarity information from GloVe embeddings is proposed for this purpose. The optimal hyperparameter values for the k-nearest neighbors method were selected by grid search. Classification quality was evaluated for each method using precision, recall, and F1-measure. Since no ready-made datasets of classified adjectives were available, 300 adjectives were classified manually to create the necessary samples.
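The nearest-neighbors idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-dimensional vectors below are hypothetical stand-ins for GloVe embeddings, the hypernym labels ("color", "size", "material") and the function names are invented for illustration, cosine similarity is an assumed similarity measure, and macro-averaging of the metrics is likewise an assumption.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional stand-ins for GloVe vectors. Real GloVe vectors
# have many more dimensions and are loaded from a pretrained file.
TOY_VECTORS = {
    "red": (0.9, 0.1, 0.0), "blue": (0.8, 0.2, 0.1), "green": (0.85, 0.15, 0.05),
    "huge": (0.1, 0.9, 0.0), "tiny": (0.0, 0.8, 0.2), "small": (0.1, 0.85, 0.1),
    "wooden": (0.1, 0.1, 0.9), "metal": (0.2, 0.0, 0.8), "plastic": (0.15, 0.1, 0.85),
}

# Hypothetical training sample: adjective -> hypernym (labels are assumptions).
TRAIN = {
    "red": "color", "blue": "color",
    "huge": "size", "tiny": "size",
    "wooden": "material", "metal": "material",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_hypernym(adjective, k=1):
    """Majority vote over the k training adjectives most similar to `adjective`."""
    query = TOY_VECTORS[adjective]
    ranked = sorted(TRAIN, key=lambda w: cosine(query, TOY_VECTORS[w]), reverse=True)
    votes = Counter(TRAIN[w] for w in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical held-out adjectives with their expected hypernyms.
HELD_OUT = {"green": "color", "small": "size", "plastic": "material"}
predictions = [predict_hypernym(word, k=1) for word in HELD_OUT]
scores = macro_scores(list(HELD_OUT.values()), predictions)
print(predictions, scores)
```

Here `k` is the hyperparameter that a grid search would tune, for example by repeating the evaluation for several values of `k` and keeping the one with the best F1-measure on validation data.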
For citation: Terekhova A.D., Terekhov G.V., Sychev O.A. Application of machine learning for adjective ordering in English sentences. Modeling, Optimization and Information Technology. 2023;11(1). URL: https://moitvivt.ru/ru/journal/pdf?id=1301 DOI: 10.26102/2310-6018/2023.40.1.028 (In Russ.).
Received 11.01.2023
Revised 09.03.2023
Accepted 20.03.2023
Published 31.03.2023