Distribution-Wise Symbolic Aggregate ApproXimation (dwSAX)

Kloska, M., Rozinajova, V.

The Symbolic Aggregate approXimation (SAX) algorithm is one of the most popular symbolic mapping techniques for time series. It is used extensively in sequence classification, pattern mining, anomaly detection, and many other data mining tasks, largely because of its data adaptability. However, the approach relies heavily on the assumption that the processed time series follow a Gaussian distribution. When the distribution is non-Gaussian or becomes skewed over time, SAX does not provide an adequate symbolic representation. This paper proposes a new method of symbolic time series representation, distribution-wise SAX (dwSAX), which handles both Gaussian and non-Gaussian data distributions, whereas the original SAX handles only the former. Our method employs a more general approach to selecting symbol breakpoints and thus makes more efficient use of the available alphabet symbols. The goal is to cover the information space optimally. The method was evaluated on several data mining tasks and showed promising improvements over SAX.
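To make the role of the breakpoints concrete, the following is a minimal sketch of the classic SAX pipeline (z-normalization, piecewise aggregate approximation, Gaussian equiprobable breakpoints). The abstract does not describe how dwSAX selects its breakpoints, so the `empirical_breakpoints` function is only an illustrative assumption (quantiles of the observed values), not the authors' method. All function names here are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, mean, pstdev, quantiles

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znorm(series):
    """Z-normalize a series to zero mean and unit (population) variance."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of equal-length segments."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    step = n // segments
    return [mean(series[i:i + step]) for i in range(0, n, step)]


def gaussian_breakpoints(alphabet_size):
    """Classic SAX breakpoints: equiprobable regions under N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def empirical_breakpoints(values, alphabet_size):
    """ASSUMPTION, not the paper's method: equiprobable regions taken
    from the empirical quantiles of the observed values."""
    return quantiles(values, n=alphabet_size)


def symbolize(values, breakpoints):
    """Map each value to the symbol of the region it falls into."""
    return "".join(ALPHABET[bisect_right(breakpoints, v)] for v in values)


def demo(alphabet_size=4, seed=0):
    """Compare symbol usage on skewed (exponential) data."""
    rng = random.Random(seed)
    series = [rng.expovariate(1.0) for _ in range(256)]
    reduced = paa(znorm(series), segments=64)
    sax_word = symbolize(reduced, gaussian_breakpoints(alphabet_size))
    alt_word = symbolize(reduced, empirical_breakpoints(reduced, alphabet_size))
    return Counter(sax_word), Counter(alt_word)


if __name__ == "__main__":
    sax_counts, alt_counts = demo()
    print("Gaussian breakpoints: ", sorted(sax_counts.items()))
    print("Empirical breakpoints:", sorted(alt_counts.items()))
```

On skewed input, the Gaussian breakpoints tend to assign symbols unevenly, while data-driven breakpoints spread the values more evenly across the alphabet. This is the kind of symbol underutilization the abstract refers to.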

Cite: Kloska, M., Rozinajova, V. Distribution-Wise Symbolic Aggregate ApproXimation (dwSAX). Intelligent Data Engineering and Automated Learning – IDEAL (2020). DOI: 10.1007/978-3-030-62362-3_27


Matej Kloska
Research Engineer
Viera Rozinajová
Lead and Researcher