Methods for detecting atypical objects in a musical sequence
UDC 004.85
DOI: 10.26102/2310-6018/2025.50.3.035
The article explores modern methods for the automatic detection of atypical (anomalous) musical events within a musical sequence, such as unexpected harmonic shifts, uncharacteristic intervals, rhythmic disruptions, and deviations from musical style, with the aim of automating this analysis and reducing the time specialists spend on it. The task of anomaly detection is highly relevant in music analytics, digital restoration, generative music, and adaptive recommendation systems. The study employs both traditional features (chroma features, MFCC, tempogram, RMS energy, spectral contrast) and advanced sequence analysis techniques (self-similarity matrices, latent-space embeddings). The source data consisted of diverse MIDI corpora and audio recordings from various genres, normalized to a unified frequency and temporal scale. Both supervised and unsupervised learning methods were tested, including clustering, autoencoders, neural network classifiers, and anomaly isolation algorithms (isolation forests). The results demonstrate that the most effective approach is a hybrid one that combines structural musical features with deep learning methods. The novelty of this research lies in a comprehensive comparison of traditional and neural network approaches across different types of anomalies on a unified dataset. Practical testing has shown the proposed method's potential for automatic music content monitoring systems and for improving the quality of music recommendations. Future work will extend the research to multimodal musical data and real-time processing.
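To make the described pipeline concrete, the following is a minimal sketch of one of the tested configurations: traditional frame-level features combined with an isolation forest. It assumes the librosa and scikit-learn libraries; the file name, hop length, and contamination rate are illustrative assumptions rather than parameters taken from the study.

```python
# Illustrative sketch, not the authors' exact pipeline: frame-level feature
# extraction with librosa and unsupervised anomaly scoring with an isolation
# forest. All parameter values below are assumptions chosen for the example.
import numpy as np
import librosa
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

HOP = 512  # analysis hop length in samples (assumed)

y, sr = librosa.load("recording.wav", sr=22050, mono=True)  # hypothetical input file

# Per-frame descriptors named in the abstract: chroma, MFCC, tempogram,
# RMS energy, spectral contrast.
feats = [
    librosa.feature.chroma_stft(y=y, sr=sr, hop_length=HOP),
    librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=HOP),
    librosa.feature.tempogram(y=y, sr=sr, hop_length=HOP),
    librosa.feature.rms(y=y, hop_length=HOP),
    librosa.feature.spectral_contrast(y=y, sr=sr, hop_length=HOP),
]

# Align frame counts (extractors can differ by a frame or two) and stack
# into a (n_frames, n_features) matrix.
n = min(f.shape[1] for f in feats)
X = np.vstack([f[:, :n] for f in feats]).T
X = StandardScaler().fit_transform(X)

# Isolation forest: frames that are easy to isolate are labeled -1 (anomalous).
detector = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
labels = detector.fit_predict(X)

anomalous_frames = np.flatnonzero(labels == -1)
times = librosa.frames_to_time(anomalous_frames, sr=sr, hop_length=HOP)
print(f"flagged {len(times)} of {n} frames as anomalous")
```

A hybrid variant of the kind the abstract identifies as most effective would augment or replace this feature matrix with latent-space embeddings from an autoencoder trained on typical material, scoring frames by reconstruction error in addition to (or instead of) the isolation-forest decision function.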
Keywords: musical sequence, anomaly, tempogram, musical style, MFCC, chroma, autoencoder, music anomaly detection
For citation: Kotelnikov V.V., Ahlestin A.I., Parinova E.V. Methods for detecting atypical objects in a musical sequence. Modeling, Optimization and Information Technology. 2025;13(3). URL: https://moitvivt.ru/ru/journal/pdf?id=1993 DOI: 10.26102/2310-6018/2025.50.3.035 (In Russ.).
Received 24.06.2025
Revised 29.07.2025
Accepted 07.08.2025