Keywords: number series, anomalies, outliers, filtering, hampel
Quick search for anomalies in number series using the modified Hampel method
UDC 004.942 + 519.246.8
DOI: 10.26102/2310-6018/2023.43.4.030
The article discusses and formally introduces the concepts of a number series anomaly and an anomaly filter function. The research is relevant because there is no unified approach to defining an anomaly, even though anomalies play a key role in many practical problems. The robustness of the chosen statistical outlier estimator is assessed using breakdown points and sliding windows. The method of filtering a number series for outliers is based on a combination of the median and the median absolute deviation. A modification of the Hampel method for detecting outliers in a sample is proposed for a wide range of IT automation tasks. Functions for filtering a number series for anomalies and for determining the index of the first anomalous element are implemented in Python. As an example, a script was developed in Jupyter Notebook to solve the problem of quick search for anomalies in stock prices by means of the modified Hampel method. A sample with outliers is obtained by generating test stock data with the authors' own library. The experimental results confirm that the proposed algorithms filter anomalies reliably for different values of the adjustable parameters. The advantages and disadvantages of the method are noted. The Hampel filter is easy to optimize and parallelize. The results are applicable in practice to automating the detection of anomalies in number series.
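The two functions mentioned in the abstract can be illustrated with a minimal sketch of the standard (unmodified) Hampel filter. The authors' modification is not described here, so the function names `hampel_flags` and `first_anomaly_index`, the parameters `window` and `n_sigmas`, and the scale factor 1.4826 are assumptions chosen for illustration and may not match the authors' implementation.

```python
from statistics import median

# Assumption: 1.4826 rescales the median absolute deviation (MAD) so that it
# estimates the standard deviation of normally distributed data.
MAD_SCALE = 1.4826


def hampel_flags(series, window=3, n_sigmas=3.0):
    """Mark each point that a standard Hampel filter would treat as an outlier.

    Hypothetical helper: `window` is the half-width of the sliding window,
    which is truncated at the edges of the series.
    """
    flags = []
    for i, value in enumerate(series):
        lo = max(0, i - window)
        hi = min(len(series), i + window + 1)
        neighbourhood = series[lo:hi]
        center = median(neighbourhood)
        mad = median([abs(x - center) for x in neighbourhood])
        # A point is anomalous if it lies further than n_sigmas scaled MADs
        # from the local median.
        flags.append(abs(value - center) > n_sigmas * MAD_SCALE * mad)
    return flags


def first_anomaly_index(series, window=3, n_sigmas=3.0):
    """Return the index of the first flagged point, or None if none is flagged."""
    flags = hampel_flags(series, window, n_sigmas)
    return flags.index(True) if True in flags else None


# Illustrative data only: a flat price series with one injected spike.
prices = [10.0, 10.1, 9.9, 10.2, 10.0, 25.0, 10.1, 9.8, 10.0, 10.2, 9.9]
print(first_anomaly_index(prices))
```

Because each window is processed independently of the others, the loop body can be vectorized or distributed across workers, which is consistent with the remark that the filter is easy to optimize and parallelize.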
1. Laxman S., Sastry P.S. A survey of temporal data mining. Sadhana. 2006;31:173–198. DOI: 10.1007/BF02719780.
2. Chesnokov M.Ju. Poisk anomalij vo vremennyh rjadah na osnove ansamblej algoritmov DBSCAN. Moscow; 2018. URL: http://www.isa.ru/aidt/images/documents/2018-01/99-107.pdf (accessed on 01.10.2023).
3. Mastickij S.Je. Analiz vremennyh rjadov s pomoshh'ju R; 2020. URL: https://ranalytics.github.io/tsa-with-r/ch-anomaly-detection.html (accessed on 01.10.2023).
4. Ardelean V. Outliers in Time Series. Department of Statistics and Econometrics, University of Erlangen-Nuremberg; 2011. URL: https://www.statistik.rw.fau.de/files/2016/03/v01-2011.pdf (accessed on 01.10.2023).
5. Chandola V., Banerjee A., Kumar V. Anomaly detection: a survey. ACM Computing Surveys; 2009. URL: http://cucis.ece.northwestern.edu/projects/DMS/publications/AnomalyDetection.pdf (accessed on 01.10.2023).
6. Hampel F.R. The influence curve and its role in robust estimation. Journal of the American Statistical Association. 1974;69:383–393. DOI: 10.2307/2285666.
7. Liu H., Shah S., Jiang W. On-line outlier detection and data cleaning. Computers & Chemical Engineering. 2004;28(9):1635–1647. URL: https://sites.ualberta.ca/~slshah/files/on_line_outlier_det.pdf (accessed on 01.10.2023).
8. Lewinson E. Python for Finance Cookbook — Second Edition. Birmingham, Packt; 2022. 740 p.
9. Hampel F.R. A general qualitative definition of robustness. The Annals of Mathematical Statistics. 1971;42:1887–1896.
10. Hampel F.R., Rousseeuw P.J., Ronchetti E.M., Stahel W.A. Robust Statistics: The Approach Based on Influence Functions. New York, Wiley & Sons; 1986. 536 p.
For citation: Gilmullin M.F., Gilmullin T.M. Quick search for anomalies in number series using the modified Hampel method. Modeling, Optimization and Information Technology. 2023;11(4). URL: https://moitvivt.ru/ru/journal/pdf?id=1482 DOI: 10.26102/2310-6018/2023.43.4.030 (In Russ.).
Received 04.12.2023
Revised 08.12.2023
Accepted 20.12.2023
Published 31.12.2023