
Symbolic Aggregate ApproXimation (SAX) under Interval Uncertainty



  1. Symbolic Aggregate ApproXimation (SAX) under Interval Uncertainty
     Chrysostomos D. Stylios (1) and Vladik Kreinovich (2)
     (1) Laboratory of Knowledge and Intelligent Computing, Department of Computer Engineering, Technological Educational Institute of Epirus, 47100 Kostakioi, Arta, Greece, stylios@teiep.gr
     (2) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, Texas 79968, USA, vladik@utep.edu

  2. Formulation of the Problem
     • Need for diagnostics: often, we are monitoring a certain process for possible problems; e.g.:
       – we check whether the observed vibrations of a mechanical system indicate an abnormality;
       – we check the vital signs of a patient to see if an urgent medical intervention is needed.
     • Sometimes, we have an algorithm that, based on the observations, decides whether intervention is needed.
     • However, in most practical applications, especially in medicine, no such algorithm is readily available.
     • What we have instead is numerous past data series corresponding both:
       – to cases when the situation turned out to be normal,
       – and to cases with abnormality.

  3. Formulation of the Problem (cont-d)
     • We have numerous past data series corresponding both:
       – to cases when the situation turned out to be normal,
       – and to cases with abnormality.
     • We thus need to extract such an algorithm from all these examples, i.e., use machine learning.
     • Most machine learning algorithms work well if we have up to dozens of inputs.
     • However, as a result of monitoring, we get values $x(t)$ corresponding to hundreds of moments of time $t$.
     • So, to efficiently apply machine learning algorithms, we first need to compress the input data.

  4. Symbolic Aggregate approXimation (SAX): Main Idea
     • The main objective of monitoring is to catch deviations from the normal regimes as early as possible.
     • As a result, monitoring is performed at a high rate, to catch a deviation while this deviation is small.
     • Thus, when the monitoring is arranged properly, values change very little from one moment to the next.
     • So, we can safely replace the original function $x(t)$ with a piecewise constant approximation.
     • On each interval, we store only its endpoints and the value of the function on this interval.
     • This representation indeed leads to a drastic reduction in data size.
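The piecewise constant step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes segments of nearly equal length and uses the segment mean as the stored value, and the names `piecewise_constant`, `signal`, and `segments` are hypothetical.

```python
from statistics import mean


def piecewise_constant(values, n_segments):
    """Split `values` into n_segments runs of nearly equal length.

    Returns (start, end, level) triples, where `end` is exclusive and
    `level` is the mean of the samples in that run (an assumed choice).
    """
    n = len(values)
    segments = []
    for s in range(n_segments):
        start = s * n // n_segments
        end = (s + 1) * n // n_segments
        if start < end:  # skip empty runs when n_segments > n
            segments.append((start, end, mean(values[start:end])))
    return segments


# Eight samples compressed into two (start, end, level) triples.
signal = [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0]
segments = piecewise_constant(signal, 2)
```

Storing two triples instead of eight samples is the kind of reduction the slide refers to; the gain grows with the sampling rate.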

  5. Symbolic Aggregate approXimation (cont-d)
     • A further compression is possible since:
       – a computer-represented real number requires dozens of bits to store, corresponding to ten decimal digits,
       – but measurement accuracy is usually 1–10%, so two decimal digits are enough.
     • Symbolic Aggregate approXimation (SAX) is a technique for such a reduction.
     • In the interval $[\underline{x}, \overline{x}]$ of possible values of $x(t)$, we select thresholds $x_0 = \underline{x}, x_1, x_2, \ldots, x_m$.
     • Then, for each moment of time $t$, instead of storing $x(t)$, we store the index $i$ for which $x(t) \in [x_i, x_{i+1}]$.
     • At present, SAX is the most efficient data compression technique.
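The replacement of each value by an interval index can be sketched as below. This is an illustrative sketch: the function name `to_symbols` and the sample cut points are hypothetical, only the interior thresholds are passed in, and a value lying exactly on a threshold is assigned to the upper interval (a convention the slides do not specify).

```python
from bisect import bisect_right


def to_symbols(values, cuts):
    """Map each value to the index of the interval that contains it.

    `cuts` holds the sorted interior thresholds; index 0 is the interval
    below the first cut, index len(cuts) is the one above the last cut.
    """
    return [bisect_right(cuts, v) for v in values]


cuts = [-0.5, 0.5]                # two cuts, hence three intervals
values = [-1.0, 0.0, 0.7, 0.5]
symbols = to_symbols(values, cuts)
```

With three intervals, each stored index fits into two bits rather than the dozens of bits of a floating-point number.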

  6. SAX: Details and Successes
     • To maximize the amount of information after compression, SAX takes into account that:
       – the maximum amount of Shannon's information $-\sum_{i=0}^{m} p_i \cdot \log_2(p_i)$, where $p_i = \mathrm{Prob}(x(t) \in [x_i, x_{i+1}])$,
       – is attained when all the probabilities $p_i$ are equal to each other and are, thus, equal to $p_i = \frac{1}{m+1}$.
     • Thus, SAX selects the thresholds $x_i$ for which $p_i = \mathrm{Prob}(x(t) \in [x_i, x_{i+1}]) = \frac{1}{m+1}$.
     • SAX techniques led to many practical applications ranging from engineering to medicine.
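The equal-probability rule can be illustrated with a short sketch. It assumes that the values follow a Gaussian distribution, so that the thresholds are Gaussian quantiles; the helper names `equiprobable_cuts` and `entropy_bits` are hypothetical, and `n_intervals` plays the role of $m + 1$.

```python
from math import log2
from statistics import NormalDist


def equiprobable_cuts(n_intervals, mu=0.0, sigma=1.0):
    """Interior thresholds that give each interval probability 1/n_intervals
    under an assumed Gaussian distribution with mean mu and deviation sigma."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(k / n_intervals) for k in range(1, n_intervals)]


def entropy_bits(probs):
    """Shannon entropy -sum(p * log2(p)) in bits; zero terms are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)


cuts = equiprobable_cuts(4)                       # quartiles of N(0, 1)
uniform_entropy = entropy_bits([0.25] * 4)        # equal probabilities
skewed_entropy = entropy_bits([0.7, 0.1, 0.1, 0.1])
```

Comparing `uniform_entropy` with `skewed_entropy` shows why equal probabilities are preferred: any unequal split of the same four intervals carries less information per symbol.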

  7. SAX: Problem
     • Measurement errors were a motivation for SAX techniques.
     • However, SAX does not take measurement errors into account.
     • So, we often get thresholds $x_i$ and $x_{i+1}$ which are much closer to each other than the measurement accuracy.
     • Sometimes, $x_i$ and $x_{i+1}$ differ by 5% while the measurement accuracy is 10%.
     • In this case, we cannot tell whether the actual value $x(t)$ was in the $i$-th interval or in the next interval.
     • It is therefore desirable to explicitly take measurement uncertainty into account in SAX techniques.
     • This is what we do in this paper.
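The ambiguity described above can be made concrete with a small sketch. It assumes an absolute error bound `delta`, so that the actual value lies somewhere in the interval from `measured - delta` to `measured + delta`; the function `possible_indices` and the sample numbers are hypothetical, not taken from the slides.

```python
from bisect import bisect_right


def possible_indices(measured, delta, cuts):
    """All interval indices consistent with a measurement whose actual value
    lies in [measured - delta, measured + delta]; `cuts` are sorted thresholds."""
    lowest = bisect_right(cuts, measured - delta)
    highest = bisect_right(cuts, measured + delta)
    return list(range(lowest, highest + 1))


cuts = [1.00, 1.05, 1.10]                          # thresholds about 5% apart
ambiguous = possible_indices(1.04, 0.10, cuts)     # error bound wider than the gaps
clear = possible_indices(1.04, 0.005, cuts)        # error bound narrower than the gaps
```

When the error bound exceeds the spacing between thresholds, several indices are consistent with the same reading, so the stored symbol carries less information than its bit count suggests.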

  8. Case When Measurement Inaccuracy Can Be Ignored (Reminder)
     • Based on the observed values $x(t)$, we can find the probabilities with which different values of $x$ occur.
     • These probabilities can be naturally described by a probability density function $\rho(x)$, with $\int \rho(x)\,dx = 1$.
     • In many practical situations, the observed signal is a joint effect of many different independent processes.
     • In such situations, the Central Limit Theorem implies that the resulting distribution is Gaussian.
     • We want to select the thresholds $x_1, x_2, \ldots$
     • We can describe, for every value $x$, the number $\rho_t(x)$ of thresholds per unit length; the total is $\int \rho_t(x)\,dx = m$.
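As an aside, the threshold density for plain SAX can be sketched from the equal-probability rule. The following is an approximation for large $m$ and is not a formula stated on the slides: near a point $x$, neighboring thresholds are about $\Delta x \approx 1/\rho_t(x)$ apart, so the probability of one interval is about $\rho(x) \cdot \Delta x$, and requiring this probability to be the same for all intervals gives

$$
p_i \approx \rho(x) \cdot \Delta x \approx \frac{\rho(x)}{\rho_t(x)} = \text{const}
\quad\Longrightarrow\quad
\rho_t(x) \approx m \cdot \rho(x),
$$

where the factor $m$ follows from the normalizations $\int \rho(x)\,dx = 1$ and $\int \rho_t(x)\,dx = m$.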

  9. Case of No Measurement Inaccuracy (cont-d)
     • After the data compression, the only information that we have about each value $x(t)$ is the index $i$.
     • So, to reconstruct the value $x(t)$ based on this information, we select the midpoint $\widetilde{x}(t)$ of the $i$-th subinterval.
     • This reconstruction is approximate: there is an approximation error $\varepsilon(t) \stackrel{\text{def}}{=} \widetilde{x}(t) - x(t) \ne 0$.
     • Ideally, we would like to have all these errors as close to 0 as possible.
     • The vector $\varepsilon = (\varepsilon(t_1), \varepsilon(t_2), \ldots)$ of these errors should be close to the zero vector $\vec{0} = (0, 0, \ldots)$: $d(\varepsilon, \vec{0}) = \sqrt{\sum_k (\varepsilon(t_k))^2} \to \min$.
     • In the continuous approximation, this is equivalent to minimizing $\int (\varepsilon(t))^2\,dt$.
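The midpoint reconstruction and the Euclidean error can be sketched as follows. This is a minimal illustration under assumed data: `edges` lists all interval endpoints $x_0 < x_1 < \ldots$, and the names `reconstruct` and `l2_error` are hypothetical.

```python
from math import sqrt


def reconstruct(indices, edges):
    """Replace each stored index i by the midpoint of [edges[i], edges[i + 1]]."""
    return [(edges[i] + edges[i + 1]) / 2 for i in indices]


def l2_error(original, approx):
    """Euclidean distance between the error vector and the zero vector."""
    return sqrt(sum((a - x) ** 2 for x, a in zip(original, approx)))


edges = [0.0, 1.0, 2.0, 4.0]      # three intervals of unequal width
original = [0.25, 1.5, 3.5]
indices = [0, 1, 2]               # interval index stored for each value
approx = reconstruct(indices, edges)
err = l2_error(original, approx)
```

Wider intervals allow larger reconstruction errors, which is why the placement of thresholds matters for this criterion.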

  10. Alternative Ideas
     • The least-squares approach is vulnerable to outliers.
     • The second idea is to avoid this sensitivity by using $\ell^p$-estimates: $\int |\varepsilon(t)|^p\,dt \to \min$.
     • The third idea is to explicitly minimize the number of bits needed to describe all the thresholds.
     • If $x_{i+1} - x_i \approx 2^{-b}$, then it is sufficient to describe the first $b$ binary digits of the corresponding interval.
     • Thus, the number of bits needed to store each threshold is approximately equal to $b \approx -\log_2(x_{i+1} - x_i)$.
     • So, we minimize the average number of bits, i.e., the sum $-\sum_i \log_2(x_{i+1} - x_i)$ or the corresponding integral.
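The two alternative criteria can be compared numerically with a short sketch. It uses a discrete sum in place of the integral, and the helper names `lp_cost` and `bit_cost` as well as the sample numbers are hypothetical.

```python
from math import log2


def lp_cost(errors, p):
    """Discrete analogue of the integral of |eps(t)|^p."""
    return sum(abs(e) ** p for e in errors)


def bit_cost(edges):
    """Approximate total bits for the thresholds: -sum(log2(x_{i+1} - x_i))."""
    return -sum(log2(b - a) for a, b in zip(edges, edges[1:]))


errors = [0.1, -0.1, 2.0]                  # the last error is an outlier
l2 = lp_cost(errors, 2)
l1 = lp_cost(errors, 1)
outlier_share_l2 = 2.0 ** 2 / l2           # fraction of the cost due to the outlier
outlier_share_l1 = 2.0 / l1
bits = bit_cost([0.0, 0.25, 0.5, 1.0])     # gaps 1/4, 1/4, 1/2
```

The outlier accounts for a larger fraction of the cost at $p = 2$ than at $p = 1$, which illustrates the sensitivity that motivates smaller $p$; the bit cost, in turn, grows as the gaps between thresholds shrink.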
