Quick search for anomalies in number series using the modified Hampel method
The scientific journal Modeling, Optimization and Information Technology
Online media
ISSN 2310-6018

Quick search for anomalies in number series using the modified Hampel method

Gilmullin M.F., Gilmullin T.M.

UDC 004.942 + 519.246.8
DOI: 10.26102/2310-6018/2023.43.4.030


The article discusses and formally introduces the concepts of a number series anomaly and an anomaly filter function. The research is relevant because there is no unified approach to defining an anomaly, even though anomalies play a key role in solving many practical problems. The study measures the robustness of the chosen statistical outlier estimator using breakdown points and sliding windows. The method of filtering a number series for outliers is based on a combination of the median and the median absolute deviation (MAD). A modification of the Hampel method for detecting outliers in a sample is proposed, aimed at a wide range of IT automation tasks. Functions for filtering a number series for anomalies and for finding the index of the first anomalous element are implemented in Python. As an example, a Jupyter Notebook script was developed that uses the modified Hampel method to quickly find anomalies in stock prices. To obtain a sample with outliers, test stock data are generated with the authors' own library. The experimental results confirm that the proposed algorithms filter anomalies clearly for different values of the adjustable parameters. The advantages and disadvantages of the method are noted. The Hampel filter is easy to optimize and parallelize. The article is of practical use for automating the detection of anomalies in number series.
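The abstract describes a sliding-window filter built on the median and the median absolute deviation, plus a function that returns the index of the first anomalous element. The sketch below illustrates that idea with the classic centered-window Hampel identifier; it is not the authors' modified method, whose details are not given on this page. The names `hampel_flags`, `first_anomaly_index`, `half_window` and `threshold`, the default values, and the sample data are all assumptions made for illustration.

```python
from statistics import median

# Scale factor that makes the MAD a consistent estimate of the
# standard deviation for normally distributed data.
MAD_SCALE = 1.4826


def hampel_flags(series, half_window=5, threshold=3.0):
    """Flag suspected outliers with a classic centered Hampel identifier.

    Returns a list of booleans, True where the element deviates from the
    median of its window by more than `threshold` scaled MADs.
    Names and defaults are illustrative assumptions.
    """
    values = list(series)
    flags = []
    for i, x in enumerate(values):
        # The window is truncated at the edges of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = MAD_SCALE * median([abs(v - med) for v in window])
        # If the MAD is zero, any element that differs from the
        # window median is flagged.
        flags.append(abs(x - med) > threshold * mad)
    return flags


def first_anomaly_index(series, half_window=5, threshold=3.0):
    """Return the index of the first flagged element, or None if none."""
    for i, flagged in enumerate(hampel_flags(series, half_window, threshold)):
        if flagged:
            return i
    return None


# Made-up price series with a single spike at index 4.
prices = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.3, 10.1]
print(first_anomaly_index(prices, half_window=3))  # 4
```

Each window is processed independently of the others, which is consistent with the abstract's remark that the Hampel filter is easy to parallelize.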

1. Laxman S., Sastry P.S. A survey of temporal data mining. Sadhana. 2006;31:173–198. DOI: 10.1007/BF02719780.

2. Chesnokov M.Ju. Poisk anomalij vo vremennyh rjadah na osnove ansamblej algoritmov DBSCAN. Moscow; 2018. URL: http://www.isa.ru/aidt/images/documents/2018-01/99-107.pdf (accessed on 01.10.2023).

3. Mastickij S.Je. Analiz vremennyh rjadov s pomoshh'ju R; 2020. URL: https://ranalytics.github.io/tsa-with-r/ch-anomaly-detection.html (accessed on 01.10.2023).

4. Ardelean V. Outliers in Time Series. Department of Statistics and Econometrics, University of Erlangen-Nuremberg; 2011. URL: https://www.statistik.rw.fau.de/files/2016/03/v01-2011.pdf (accessed on 01.10.2023).

5. Chandola V., Banerjee A., Kumar V. Anomaly detection: a survey. ACM Computing Surveys; 2009. URL: http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf (accessed on 01.10.2023).

6. Hampel F.R. The Influence curve and its role in robust estimation. Journal of the American Statistical Association. 1974;69:383–393. DOI: 10.2307/2285666.

7. Liu H., Shah S., Jiang W. On-line outlier detection and data cleaning. Computers & Chemical Engineering. 2004;28(9):1635–1647. URL: https://sites.ualberta.ca/~slshah/files/on_line_outlier_det.pdf (accessed on 01.10.2023).

8. Lewinson E. Python for Finance Cookbook. 2nd ed. Birmingham, Packt; 2022. 740 p.

9. Hampel F.R. A general qualitative definition of robustness. The Annals of Mathematical Statistics. 1971;42:1887–1896.

10. Hampel F.R., Ronchetti E.M., Rousseeuw P.J., Stahel W.A. Robust Statistics: The Approach Based on Influence Functions. New York, Wiley & Sons; 1986. 536 p.

Gilmullin Mansur Fajzrakhmanovich
Candidate of Pedagogical Sciences, Associate Professor


Freelancer

Kazan, the Russian Federation

Gilmullin Timur Mansurovich
Candidate of Technical Sciences

Freelancer

Moscow, the Russian Federation

Keywords: number series, anomalies, outliers, filtering, Hampel filter

For citation: Gilmullin M.F., Gilmullin T.M. Quick search for anomalies in number series using the modified Hampel method. Modeling, Optimization and Information Technology. 2023;11(4). URL: https://moitvivt.ru/ru/journal/pdf?id=1482 DOI: 10.26102/2310-6018/2023.43.4.030 (In Russ.).



Received 04.12.2023

Revised 08.12.2023

Accepted 20.12.2023

Published 31.12.2023