DSDC: A Large-scale KPI Clustering Model Based on DWSBD Improved DBSCAN

Journal: Modern Economics & Management Forum
DOI: 10.32629/memf.v5i2.1973

Haiyan Feng, Mingwei Li

Department of Statistics, Northeastern University, Shenyang 066004, Liaoning, China

Abstract

KPI time series contain substantial noise, anomalies, and phase shifts, which make it difficult to determine the correct number of clusters and to achieve good clustering accuracy. To address this, this paper proposes DSDC, a clustering algorithm based on an improved DBSCAN. First, because KPI data contain a large amount of noise and anomalies, a technique for extracting the underlying shape of KPI data is proposed. Second, the DBSCAN clustering algorithm is used to address the phase shift problem in KPI time series. Finally, because DBSCAN is parameter-sensitive and cannot handle multi-density data, a new similarity measure, the density-weighted shape-based distance (DWSBD), is proposed. Experimental results show that DSDC achieves higher ACC, NMI, and F-score and a shorter clustering time than K-Medoids and spectral clustering algorithms that also use the underlying shape extraction technique and the DWSBD distance.
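
To illustrate why a shape-based distance helps with phase shifts, the sketch below implements the plain shape-based distance (SBD) as it is commonly defined: one minus the maximum normalized cross-correlation over all shifts. This is an assumption-laden illustration only. The abstract does not specify the density weighting used in DWSBD, so that part is not shown, and the function name `sbd` and the sample series are hypothetical.

```python
import math


def sbd(x, y):
    """Plain shape-based distance between two equal-length series.

    Computed as 1 - max_s NCC_s(x, y), where NCC_s is the cross-correlation
    at shift s divided by ||x|| * ||y||. This is NOT the paper's DWSBD:
    the density weighting is not defined in the abstract and is omitted.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        # Assumption: an all-zero series is treated as maximally distant.
        return 1.0
    best = -math.inf
    for shift in range(-(n - 1), n):
        # Dot product of x with y moved by `shift`; samples that fall
        # outside the series are treated as zero.
        lo, hi = max(0, shift), min(n, n + shift)
        cc = sum(x[i] * y[i - shift] for i in range(lo, hi))
        best = max(best, cc)
    return 1.0 - best / norm


# Hypothetical sample data: the same pulse at two positions, and a ramp.
pulse = [0, 0, 1, 3, 1, 0, 0, 0]
shifted_pulse = [0, 0, 0, 0, 1, 3, 1, 0]
ramp = [0, 1, 2, 3, 4, 5, 6, 7]

print(sbd(pulse, shifted_pulse))  # phase-shifted copy of the same shape
print(sbd(pulse, ramp))           # genuinely different shape
```

A pairwise matrix of such distances could then be passed to a density-based clusterer that accepts precomputed distances. How DSDC combines the distance with DBSCAN is described in the paper itself, not in this sketch.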

Keywords

DBSCAN, DWSBD, KPI, clustering, density-weighted

Copyright © 2024 Haiyan Feng, Mingwei Li

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License