The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Detecting machine-generated texts with adaptive quantile regression

Tyurin A.S., Saraev P.V.

UDC 519.6
DOI: 10.26102/2310-6018/2024.44.1.033

Abstract

This paper considers the problem of detecting machine-generated texts with several regression modeling tools: classical linear regression, logistic regression, and quantile regression. Advances in machine learning make generated texts increasingly realistic, which opens the door to misuse. As text generation algorithms grow more sophisticated, detecting such texts becomes harder, which in turn calls for more sophisticated mathematical modeling methods and more efficient numerical methods. The proposed adaptive quantile regression algorithm builds models that emphasize different quantiles, which makes it particularly useful for detecting atypical values that may indicate that a text is artificial. The paper also describes in detail the open dataset used for model training, which consists of texts generated with the ChatGPT 3 model and random texts from various forums, and analyzes the computational experiments performed. The results show that the proposed method is highly effective in this application area.
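
As a rough illustration of what fitting a model "with emphasis on a quantile" means, the sketch below fits an ordinary linear quantile regression by minimizing the mean pinball (check) loss rho_tau(r) = r(tau - 1[r < 0]) with plain subgradient descent. This is a generic textbook method, not the authors' adaptive algorithm, whose details are not given on this page. The function names, the step schedule `lr / sqrt(t + 1)`, and the rule of flagging points above an upper-quantile fit as atypical are assumptions made for illustration.

```python
import numpy as np


def pinball_loss(residuals: np.ndarray, tau: float) -> float:
    """Mean check (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    indicator = (residuals < 0).astype(float)
    return float(np.mean(residuals * (tau - indicator)))


def fit_linear_quantile(X: np.ndarray, y: np.ndarray, tau: float,
                        lr: float = 0.5, n_iter: int = 5000):
    """Fit y ~ X @ w + b at quantile level tau (0 < tau < 1).

    Plain subgradient descent on the mean pinball loss with a decaying
    step lr / sqrt(t + 1); a generic method, not the paper's algorithm.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(n_iter):
        r = y - (X @ w + b)  # residuals of the current fit
        # Subgradient of rho_tau(r) with respect to the prediction: 1[r < 0] - tau
        g = (r < 0).astype(float) - tau
        step = lr / np.sqrt(t + 1)
        w -= step * (X.T @ g) / n
        b -= step * g.mean()
    return w, float(b)


def flag_atypical(X: np.ndarray, y: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Boolean mask of samples lying above the fitted quantile line (hypothetical rule)."""
    return y > X @ w + b
```

For example, fitting with `tau=0.9` on synthetic data and then calling `flag_atypical` marks roughly the top 10% of responses as atypical, because at the optimum about 90% of the points lie on or below the fitted line. Whether such a rule works on real text features (for instance, a per-text detector score) is an assumption here, not a result reported in the paper.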

About authors

Tyurin Aleksey Sergeevich

Lipetsk State Technical University

Lipetsk, the Russian Federation

Saraev Pavel Viktorovich
Doctor of Engineering Sciences, Associate Professor

Lipetsk State Technical University

Lipetsk, the Russian Federation

Keywords: text classification, quantile regression, adaptive algorithm, gradient descent, mathematical modeling, numerical methods

For citation: Tyurin A.S., Saraev P.V. Detecting machine-generated texts with adaptive quantile regression. Modeling, Optimization and Information Technology. 2024;12(1). Available from: https://moitvivt.ru/ru/journal/pdf?id=1536 DOI: 10.26102/2310-6018/2024.44.1.033 (In Russ.).

Received 10.03.2024

Revised 21.03.2024

Accepted 29.03.2024

Published 13.04.2024