Keywords: time series, annotation generation, LLM, multi-agent system, dashboards
Research and evaluation of the quality of natural language annotations generated by the multi-agent system
UDC 004.89
DOI: 10.26102/2310-6018/2025.50.3.009
This study assesses the quality of Russian-language annotations generated by a multi-agent system for time series analysis. The system comprises four specialized agents: a dashboard analyst, a time series analyst, a domain-specific agent, and a user-interaction agent. Annotations are produced by analyzing dashboard and time series data with the GPT-4o-mini model and a task graph implemented in LangGraph. Annotation quality was assessed with the metrics of clarity, readability, contextual relevance, and literacy, as well as with a Flesch readability formula adapted for the Russian language. Testing was designed and conducted with 21 users on 10 dashboards, yielding 210 ratings on a ten-point scale for each metric. The results indicate that the annotations are effective: clarity 8.486, readability 8.705, contextual relevance 8.890, literacy 8.724. The readability index was 33.6, corresponding to medium text complexity. This value reflects the domain-specific vocabulary of the research area: the formula accounts only for static length statistics (sentence and word length), not word order or context. The other ratings suggest that an adult non-specialist can still comprehend the complex terms in the annotations. All comments left by users will be taken into account in further research to improve the format and interactivity of the system.
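The abstract's readability score is based on a Flesch reading-ease formula adapted for Russian, which depends only on average sentence length (ASL, words per sentence) and average word length (ASW, syllables per word). The sketch below illustrates one way such a score can be computed; the coefficients shown are one commonly cited Russian adaptation and are an assumption here, since the paper does not state which coefficient set it used, and the vowel-counting syllable heuristic is likewise an approximation.

```python
import re

# In Russian, the number of syllables in a word equals its vowel count.
RU_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")

def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of Russian vowels."""
    return sum(1 for ch in word if ch in RU_VOWELS)

def flesch_ru(text: str) -> float:
    """Flesch reading-ease for Russian text (higher = easier).

    Coefficients follow one published Russian adaptation of the
    original Flesch formula; the paper under discussion may use
    slightly different constants.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[А-Яа-яЁё]+", text)
    if not sentences or not words:
        return 0.0
    asl = len(words) / len(sentences)                       # avg sentence length
    asw = sum(count_syllables(w) for w in words) / len(words)  # avg syllables/word
    return 206.835 - 1.3 * asl - 60.1 * asw
```

Because the formula uses only these two static length statistics, longer domain-specific terms depress the score even when the surrounding context makes them easy to understand, which is consistent with the moderate index of 33.6 reported alongside high clarity ratings.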
For citation: Kuznetsova A.I., Noskin V.V. Research and evaluation of the quality of natural language annotations generated by the multi-agent system. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1967 DOI: 10.26102/2310-6018/2025.50.3.009 (In Russ.)
Received 28.05.2025
Revised 26.06.2025
Accepted 01.07.2025