Keywords: text classification, quantile regression, adaptive algorithm, gradient descent, mathematical modeling, numerical methods
Detecting machine-generated texts with adaptive quantile regression
UDC 519.6
DOI: 10.26102/2310-6018/2024.44.1.033
This paper addresses the problem of detecting machine-generated texts using several regression modeling tools: classical linear regression, logistic regression, and quantile regression. Advances in machine learning make it possible to generate increasingly realistic texts, which opens the door to misuse. As text generation algorithms become more sophisticated, detecting such texts becomes harder and calls for more advanced mathematical modeling and more efficient numerical methods. The proposed adaptive quantile regression algorithm builds models that emphasize different quantiles, which makes it particularly useful for detecting atypical values that may indicate the artificial nature of a text. The paper also describes in detail the open dataset used for model training, which consists of texts generated by the ChatGPT 3 model and random texts from various forums, and analyzes the computational experiments performed. The results show that the proposed method is highly effective in this application area.
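To illustrate what "emphasis on different quantiles" means in practice, the following is a minimal sketch of linear quantile regression fitted by plain subgradient descent on the pinball (quantile) loss. It is not the adaptive algorithm proposed in the paper. The function names, the step size, the number of epochs, and the toy data are all assumptions made for illustration only.

```python
from typing import List, Tuple


def pinball_loss(residual: float, tau: float) -> float:
    """Quantile (pinball) loss for one residual = y - prediction."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_linear(
    X: List[List[float]],
    y: List[float],
    tau: float,
    lr: float = 0.1,      # hypothetical step size, not from the paper
    epochs: int = 1000,   # hypothetical iteration budget
) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b for quantile level tau by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            pred = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Subgradient of the pinball loss with respect to the prediction:
            # -tau when the point lies above the fit, (1 - tau) otherwise.
            g = -tau if yi - pred > 0 else 1.0 - tau
            grad_b += g / n
            for j in range(d):
                grad_w[j] += g * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (assumption): targets 0..99 with an uninformative feature,
# so only the intercept b is learned.
y = [float(v) for v in range(100)]
X = [[0.0] for _ in y]
w, b = fit_quantile_linear(X, y, tau=0.9, lr=5.0, epochs=500)
```

On this toy sample the intercept should settle close to the empirical 0.9 quantile of the targets. Changing `tau` shifts the fit toward lower or upper parts of the distribution, which is the property the abstract refers to when it speaks of detecting atypical values.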
For citation: Tyurin A.S., Saraev P.V. Detecting machine-generated texts with adaptive quantile regression. Modeling, Optimization and Information Technology. 2024;12(1). URL: https://moitvivt.ru/ru/journal/pdf?id=1536. DOI: 10.26102/2310-6018/2024.44.1.033 (In Russ.).
Received 10.03.2024
Revised 21.03.2024
Accepted 29.03.2024
Published 31.03.2024