CS145: INTRODUCTION TO DATA MINING Sequence Data: Similarity Search

CS145: INTRODUCTION TO DATA MINING Sequence Data: Similarity Search Instructor: Yizhou Sun yzsun@cs.ucla.edu November 27, 2017

Methods to be Learnt
• Classification: Logistic Regression; Decision Tree; KNN; SVM; NN (vector data); Naïve Bayes for Text (text data)
• Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models (vector data); PLSA (text data)
• Prediction: Linear Regression; GLM* (vector data)
• Frequent Pattern Mining: Apriori; FP growth (set data); GSP; PrefixSpan (sequence data)
• Similarity Search: DTW (sequence data)

Similarity Search on Time Series Data • Basic Concepts • Time Series Similarity Search • *Time Series Prediction and Forecasting • Summary

Example: Inflation Rate Time Series

Example: Unemployment Rate Time Series

Example: Stock

Example: Product Sale

Time Series • A time series is a sequence of numerical data points, typically measured at successive times spaced at (often uniform) time intervals • The random variables of a time series are represented as: • $Y = \{Y_1, Y_2, \ldots\}$, or • $Y = \{Y_t : t \in T\}$, where $T$ is the index set • An observation of a time series of length $N$ is represented as: • $y = \{y_1, y_2, \ldots, y_N\}$

Why Similarity Search? • Wide applications • Find a time period with similar inflation rate and unemployment time series? • Find a stock similar to Facebook? • Find a product similar to a query product according to its sales time series? • …

Example: VanEck International Fund and Fidelity Selective Precious Metal and Mineral Fund: two similar mutual funds in different fund groups

Similarity Search for Time Series Data • Time Series Similarity Search • Euclidean distances and $L_p$ norms • Dynamic Time Warping (DTW) • Time Domain vs. Frequency Domain

Euclidean Distance and $L_p$ Norms • Given two time series with equal length n: • $C = \{c_1, c_2, \ldots, c_n\}$ • $Q = \{q_1, q_2, \ldots, q_n\}$ • $d(C, Q) = \left(\sum_{i=1}^{n} |c_i - q_i|^p\right)^{1/p}$ • When p = 2, it is the Euclidean distance
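A minimal Python sketch of the $L_p$ distance, assuming the two series are plain lists of numbers of equal length n. The function name `lp_distance` and the toy series are illustrative, not from the lecture.

```python
# L_p distance between two equal-length time series C and Q (illustrative sketch).

def lp_distance(c, q, p=2):
    """d(C, Q) = (sum_i |c_i - q_i|^p)^(1/p); p = 2 gives the Euclidean distance."""
    if len(c) != len(q):
        raise ValueError("the L_p distance needs series of equal length n")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(ci - qi) ** p for ci, qi in zip(c, q)) ** (1.0 / p)


C = [1.0, 2.0, 3.0, 4.0]
Q = [2.0, 2.0, 5.0, 4.0]
print(lp_distance(C, Q, p=1))  # 3.0  (1 + 0 + 2 + 0)
print(lp_distance(C, Q, p=2))  # about 2.236  (sqrt(1 + 4))
```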

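Enhanced $L_p$ Norm-based Distance • Issue with $L_p$ norms: they cannot deal with offset and scaling along the Y-axis • Solution: normalize the time series: • $c_i' = \frac{c_i - \mu(C)}{\sigma(C)}$, where $\mu(C)$ and $\sigma(C)$ are the mean and standard deviation of C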
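A minimal sketch of the normalization step, assuming $\mu(C)$ is the mean and $\sigma(C)$ the population standard deviation of the series (some texts use the sample standard deviation instead). The name `znormalize` and the rule of mapping a constant series to zeros are illustrative choices.

```python
import statistics

# Normalize a series to remove offset and scaling (illustrative sketch).
# Assumption: sigma(C) is the population standard deviation.

def znormalize(c):
    """Return c'_i = (c_i - mu(C)) / sigma(C)."""
    mu = statistics.mean(c)
    sigma = statistics.pstdev(c)
    if sigma == 0:
        # Assumed convention: a constant series has no shape, map it to zeros.
        return [0.0] * len(c)
    return [(ci - mu) / sigma for ci in c]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0 * v + 100.0 for v in a]  # same shape, scaled by 10 and offset by 100
print(znormalize(a))
print(znormalize(b))  # matches znormalize(a) up to floating-point rounding
```

After normalization, series that differ only by offset and scale map to the same values, so an $L_p$ distance between them becomes (numerically) zero.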

Dynamic Time Warping (DTW) • For two sequences that do not line up well along the X-axis but share a roughly similar shape • We need to warp the time axis to align them better

Goal of DTW • Given: • Two sequences (possibly of different lengths): • $X = \{x_1, x_2, \ldots, x_N\}$ • $Y = \{y_1, y_2, \ldots, y_M\}$ • A local distance (cost) measure between $x_n$ and $y_m$: $c(x_n, y_m)$ • Goal: • Find an alignment between X and Y such that the overall cost is minimized

Cost Matrix of Two Time Series: $c(x_n, y_m)$
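For illustration, the cost matrix can be filled in directly. This sketch assumes the local cost is the absolute difference $c(x_n, y_m) = |x_n - y_m|$, which is only one possible choice; the series `X` and `Y` are made-up examples.

```python
# N x M cost matrix with entry (n, m) = c(x_n, y_m) (illustrative sketch).
# Assumption: default local cost is |x_n - y_m|; rows index X, columns index Y.

def cost_matrix(x, y, cost=lambda a, b: abs(a - b)):
    return [[cost(xn, ym) for ym in y] for xn in x]


X = [1, 3, 4, 9]
Y = [1, 3, 7, 8, 9]
for row in cost_matrix(X, Y):
    print(row)
# [0, 2, 6, 7, 8]
# [2, 0, 4, 5, 6]
# [3, 1, 3, 4, 5]
# [8, 6, 2, 1, 0]
```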

Represent an Alignment by Warping Path • An $(N, M)$-warping path is a sequence $p = (p_1, p_2, \ldots, p_L)$ with $p_l = (n_l, m_l)$, satisfying three conditions: • Boundary condition: $p_1 = (1, 1)$, $p_L = (N, M)$ • Starting from the first point and ending at the last point • Monotonicity condition: $n_l$ and $m_l$ are non-decreasing in $l$ • Step size condition: • $p_{l+1} - p_l \in \{(0, 1), (1, 0), (1, 1)\}$ • Move one step right, up, or up-right
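One way to make the three conditions concrete is a small checker, sketched under the assumption that a path is a Python list of 1-based `(n, m)` tuples. Because every allowed step has non-negative components, checking the step size condition also enforces monotonicity. The function name is hypothetical.

```python
# Check whether a list of 1-based (n, m) pairs is an (N, M)-warping path (sketch).

ALLOWED_STEPS = {(0, 1), (1, 0), (1, 1)}


def is_warping_path(path, N, M):
    if not path:
        return False
    # Boundary condition: start at (1, 1), end at (N, M).
    if path[0] != (1, 1) or path[-1] != (N, M):
        return False
    # Step size condition; since every allowed step is non-negative,
    # this also guarantees the monotonicity condition.
    for (n1, m1), (n2, m2) in zip(path, path[1:]):
        if (n2 - n1, m2 - m1) not in ALLOWED_STEPS:
            return False
    return True


print(is_warping_path([(1, 1), (2, 2), (3, 2), (3, 3)], 3, 3))  # True
print(is_warping_path([(1, 1), (3, 3)], 3, 3))                  # False: step (2, 2)
```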

Q: Which Path is a Warping Path?

Optimal Warping Path • The total cost of a warping path $p$: • $c_p(X, Y) = \sum_{l=1}^{L} c(x_{n_l}, y_{m_l})$ • The optimal warping path $p^*$: • $c_{p^*}(X, Y) = \min\{c_p(X, Y) \mid p \text{ is an } (N, M)\text{-warping path}\}$ • The DTW distance between X and Y is defined as the optimal cost: • $DTW(X, Y) = c_{p^*}(X, Y)$

How to Find p*? • Naïve solution: • Enumerate all possible warping paths • Exponential in N and M!
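To see why enumeration blows up, here is a brute-force sketch that lists every $(N, M)$-warping path, evaluates $c_p(X, Y)$ for each, and keeps the minimum. It assumes the absolute-difference local cost; the function names and toy series are illustrative. It is only usable for tiny inputs, since the number of paths grows exponentially (already 13 paths for N = M = 3).

```python
# Brute-force DTW: enumerate every (N, M)-warping path (exponential; tiny inputs only).
# Assumption: local cost c(x_n, y_m) = |x_n - y_m|; names are illustrative.

def all_warping_paths(N, M):
    """Yield every (N, M)-warping path as a list of 1-based (n, m) pairs."""
    def extend(path):
        n, m = path[-1]
        if (n, m) == (N, M):
            yield list(path)                      # copy, because `path` is reused
            return
        for dn, dm in ((0, 1), (1, 0), (1, 1)):   # allowed step sizes
            nn, mm = n + dn, m + dm
            if nn <= N and mm <= M:
                path.append((nn, mm))
                yield from extend(path)
                path.pop()
    yield from extend([(1, 1)])                   # boundary condition: start at (1, 1)


def path_cost(path, x, y):
    """c_p(X, Y): sum of local costs along the path (1-based indices)."""
    return sum(abs(x[n - 1] - y[m - 1]) for n, m in path)


def dtw_brute_force(x, y):
    return min(path_cost(p, x, y) for p in all_warping_paths(len(x), len(y)))


X = [1, 3, 4, 9]
Y = [1, 3, 7, 8, 9]
print(len(list(all_warping_paths(3, 3))))  # 13
print(dtw_brute_force(X, Y))               # 4
```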

Dynamic Programming for DTW • Dynamic programming: • Let $D(n, m)$ denote the DTW distance between $X(1, \ldots, n)$ and $Y(1, \ldots, m)$ • D is called the accumulated cost matrix • Note $D(N, M) = DTW(X, Y)$ • Recursively calculate $D(n, m)$: • $D(n, m) = \min\{D(n-1, m),\ D(n, m-1),\ D(n-1, m-1)\} + c(x_n, y_m)$ • When $m = 1$ or $n = 1$: • $D(n, 1) = \sum_{k=1}^{n} c(x_k, y_1)$ • $D(1, m) = \sum_{k=1}^{m} c(x_1, y_k)$ • Time complexity: $O(MN)$
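A minimal sketch of the recurrence, assuming absolute difference as the default local cost and 0-based Python indices (so `D[n][m]` stores the slide's $D(n+1, m+1)$). Each cell is filled once from at most three neighbours, which gives the $O(MN)$ running time. The function name `dtw` and the toy series are illustrative.

```python
# Dynamic programming for DTW, O(N*M) time and space (illustrative sketch).
# Assumptions: default local cost |x_n - y_m|; 0-based lists, so D[n][m]
# holds the slide's D(n+1, m+1).

def dtw(x, y, cost=lambda a, b: abs(a - b)):
    """Return (DTW(X, Y), D) where D is the accumulated cost matrix."""
    N, M = len(x), len(y)
    if N == 0 or M == 0:
        raise ValueError("both sequences must be non-empty")
    D = [[0] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            c = cost(x[n], y[m])
            if n == 0 and m == 0:
                D[n][m] = c
            elif n == 0:   # D(1, m) = sum_{k=1..m} c(x_1, y_k)
                D[n][m] = D[n][m - 1] + c
            elif m == 0:   # D(n, 1) = sum_{k=1..n} c(x_k, y_1)
                D[n][m] = D[n - 1][m] + c
            else:          # D(n, m) = min{D(n-1, m), D(n, m-1), D(n-1, m-1)} + c(x_n, y_m)
                D[n][m] = min(D[n - 1][m], D[n][m - 1], D[n - 1][m - 1]) + c
    return D[N - 1][M - 1], D


dist, D = dtw([1, 3, 4, 9], [1, 3, 7, 8, 9])
print(dist)  # 4
```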

Trace Back to Get p* from D
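The trace-back step can be sketched as follows, assuming the absolute-difference cost and 0-based storage of D. Starting from $(N, M)$, it repeatedly moves to the predecessor cell with the smallest accumulated cost; ties are broken toward the diagonal step, which is an arbitrary choice (another tie-break can return a different path with the same optimal cost). Function names and data are illustrative.

```python
# Recover an optimal warping path p* from the accumulated cost matrix (sketch).
# Assumptions: local cost |x_n - y_m|; D stored 0-based; ties prefer the diagonal.

def accumulated_cost(x, y, cost=lambda a, b: abs(a - b)):
    N, M = len(x), len(y)
    D = [[0] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            c = cost(x[n], y[m])
            if n == 0 and m == 0:
                D[n][m] = c
            elif n == 0:
                D[n][m] = D[n][m - 1] + c
            elif m == 0:
                D[n][m] = D[n - 1][m] + c
            else:
                D[n][m] = min(D[n - 1][m], D[n][m - 1], D[n - 1][m - 1]) + c
    return D


def trace_back(D):
    """Walk back from (N, M) to (1, 1); returns 1-based (n, m) pairs."""
    n, m = len(D) - 1, len(D[0]) - 1
    path = [(n, m)]
    while (n, m) != (0, 0):
        if n == 0:        # first row: only m can decrease
            m -= 1
        elif m == 0:      # first column: only n can decrease
            n -= 1
        else:             # cheapest predecessor; min() keeps the first (diagonal) on ties
            candidates = [(n - 1, m - 1), (n - 1, m), (n, m - 1)]
            n, m = min(candidates, key=lambda nm: D[nm[0]][nm[1]])
        path.append((n, m))
    path.reverse()
    return [(i + 1, j + 1) for i, j in path]


X = [1, 3, 4, 9]
Y = [1, 3, 7, 8, 9]
print(trace_back(accumulated_cost(X, Y)))  # [(1, 1), (2, 2), (3, 3), (4, 4), (4, 5)]
```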

Example

Time Domain vs. Frequency Domain • Many techniques for signal analysis require the data to be in the frequency domain • Usually data-independent transformations are used • The transformation matrix is determined a priori • discrete Fourier transform (DFT) • discrete wavelet transform (DWT) • For orthonormal transforms such as these, the Euclidean distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain

Example of DFT

Example of DWT (with Haar Wavelet)
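The slide's figure is not recoverable here, so as a stand-in this sketch computes a full multi-level Haar transform. It assumes the orthonormal scaling by $1/\sqrt{2}$ (textbook examples often use plain pairwise averages and half-differences instead) and an input length that is a power of 2. With this scaling the transform preserves Euclidean norms and distances. The name `haar_dwt` is illustrative.

```python
import math

# Multi-level Haar DWT (illustrative sketch).
# Assumptions: orthonormal 1/sqrt(2) scaling; input length is a power of 2.
# Output layout: [overall approximation, coarsest detail, ..., finest details].

def haar_dwt(signal):
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    coeffs = [float(v) for v in signal]
    length = n
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        # Pairwise scaled sums (approximation) and differences (detail).
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs


c = haar_dwt([4, 2, 5, 5])
print(c)  # approximately [8.0, -2.0, 1.414..., 0.0]
```

The energy check is a quick sanity test: $4^2 + 2^2 + 5^2 + 5^2 = 70 = 8^2 + (-2)^2 + (\sqrt{2})^2 + 0^2$.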

*Discrete Fourier Transformation • For many real-world signals, the DFT does a good job of concentrating energy in the first few coefficients • If we keep only the first few DFT coefficients, we can compute a lower bound of the actual distance • Feature extraction: keep the first few coefficients (F-index) as a representation of the sequence

*DFT (Cont.) • Parseval's Theorem (for the orthonormal DFT): $\sum_{t=0}^{n-1} |x_t|^2 = \sum_{f=0}^{n-1} |X_f|^2$ • The Euclidean distance between two signals in the time domain is the same as their distance in the frequency domain • Keeping only the first few (say, 3) coefficients underestimates the distance, so there will be no false dismissals: $\sum_{t=0}^{n-1} |S[t] - Q[t]|^2 \ge \sum_{f=0}^{2} |F(S)[f] - F(Q)[f]|^2$
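A pure-Python sketch, assuming the orthonormal DFT scaling $1/\sqrt{n}$ (without it, Parseval's identity picks up a factor of n). It checks that the time-domain and frequency-domain distances agree and that keeping only the first k = 3 coefficients yields a lower bound. The quadratic-time `dft` helper is for illustration only; an FFT would be used in practice. The series `S` and `Q` are made up.

```python
import cmath
import math

# Assumption: orthonormal DFT, X_f = (1/sqrt(n)) * sum_t x_t * exp(-2*pi*i*f*t/n),
# so Parseval's theorem holds without an extra factor of n.
# O(n^2) on purpose, for clarity; use an FFT in practice.

def dft(x):
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def euclidean(a, b):
    # Works for real or complex sequences of equal length.
    return math.sqrt(sum(abs(ai - bi) ** 2 for ai, bi in zip(a, b)))


def lower_bound(s, q, k=3):
    """Distance using only the first k DFT coefficients (F-index-style features)."""
    fs, fq = dft(s), dft(q)
    return euclidean(fs[:k], fq[:k])


S = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0]
Q = [2.0, 2.0, 2.0, 3.0, 1.0, 1.0, 0.0, 1.0]
print(euclidean(S, Q), euclidean(dft(S), dft(Q)))  # equal up to rounding (Parseval)
print(lower_bound(S, Q) <= euclidean(S, Q))        # True: no false dismissals
```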

Categories of Time-Series Movements • Categories of Time-Series Movements (T, C, S, I) • Long-term or trend movements (trend curve): the general direction in which a time series is moving over a long interval of time • Cyclic movements or cycle variations: long-term oscillations about a trend line or curve • e.g., business cycles, which may or may not be periodic • Seasonal movements or seasonal variations • e.g., almost identical patterns that a time series appears to follow during corresponding months of successive years • Irregular or random movements

Lag, Difference • The first lag of $Y_t$ is $Y_{t-1}$; the $j$th lag of $Y_t$ is $Y_{t-j}$ • The first difference of a time series: $\Delta Y_t = Y_t - Y_{t-1}$ • Sometimes the difference in logarithms is used: $\Delta \ln(Y_t) = \ln(Y_t) - \ln(Y_{t-1})$
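A small sketch of lags and differences on a Python list, assuming `None` marks periods where the lag or difference is undefined (a pandas user would typically reach for `Series.shift` and `Series.diff` instead). Names and data are illustrative.

```python
import math

# Lags and differences on a plain list (illustrative sketch).
# Assumption: None marks periods where the lag or difference is undefined.

def lag(y, j=1):
    """jth lag aligned with y: position t holds y[t - j]."""
    if not 0 <= j <= len(y):
        raise ValueError("need 0 <= j <= len(y)")
    return [None] * j + list(y[:len(y) - j])


def first_difference(y):
    """Delta Y_t = Y_t - Y_{t-1}."""
    return [None] + [y[t] - y[t - 1] for t in range(1, len(y))]


def log_difference(y):
    """Delta ln(Y_t) = ln(Y_t) - ln(Y_{t-1}); requires positive values."""
    return [None] + [math.log(y[t]) - math.log(y[t - 1]) for t in range(1, len(y))]


y = [100.0, 102.0, 101.0, 105.0]
print(lag(y))               # [None, 100.0, 102.0, 101.0]
print(first_difference(y))  # [None, 2.0, -1.0, 4.0]
```

The log difference is roughly the percentage change for small changes: $\ln(102) - \ln(100) \approx 0.0198$, close to 2%.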

Example: First Lag and First Difference

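Autocorrelation • Autocorrelation: the correlation between a time series and its lagged values • The first autocorrelation: $\rho_1 = \mathrm{corr}(Y_t, Y_{t-1})$ • The $j$th autocorrelation: $\rho_j = \mathrm{corr}(Y_t, Y_{t-j}) = \frac{\mathrm{cov}(Y_t, Y_{t-j})}{\sqrt{\mathrm{var}(Y_t)\,\mathrm{var}(Y_{t-j})}}$, where $\mathrm{cov}(Y_t, Y_{t-j})$ is the $j$th autocovariance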

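Sample Autocorrelation Calculation • The $j$th sample autocorrelation: $\hat\rho_j = \frac{\widehat{\mathrm{cov}}(Y_t, Y_{t-j})}{\widehat{\mathrm{var}}(Y_t)}$ • where $\widehat{\mathrm{cov}}(Y_t, Y_{t-j})$ is the sample covariance computed over the aligned pairs $(y_t, y_{t-j})$: $(y_{j+1}, y_1), (y_{j+2}, y_2), \ldots, (y_{T-1}, y_{T-j-1}), (y_T, y_{T-j})$ • i.e., considering two time series: $Y(1, \ldots, T-j)$ and $Y(j+1, \ldots, T)$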
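A sketch of $\hat\rho_j$ following the pairing of $Y(j+1, \ldots, T)$ with $Y(1, \ldots, T-j)$. It assumes one common convention (covariance and variance both divided by T, with the means of the two subseries taken separately); other textbooks normalize slightly differently, so values can differ in later decimal places for short series. The function name is illustrative.

```python
# Sample autocorrelation rho_hat_j (illustrative sketch).
# Assumption: covariance and variance both divided by T, with separate means for
# the subseries Y(j+1..T) and Y(1..T-j); other texts normalize slightly differently.

def sample_autocorrelation(y, j):
    T = len(y)
    if not 0 < j < T:
        raise ValueError("need 0 < j < T")
    lead = y[j:]        # Y(j+1, ..., T), i.e. Y_t
    lagged = y[:T - j]  # Y(1, ..., T-j), i.e. Y_{t-j}
    mean_lead = sum(lead) / len(lead)
    mean_lagged = sum(lagged) / len(lagged)
    cov = sum((a - mean_lead) * (b - mean_lagged) for a, b in zip(lead, lagged)) / T
    mean_all = sum(y) / T
    var = sum((v - mean_all) ** 2 for v in y) / T
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return cov / var


print(sample_autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], 1))  # about -0.857 (= -6/7)
print(sample_autocorrelation([1, 2, 3, 4, 5, 6, 7, 8], 1))      # about 0.667 (= 2/3)
```

An alternating series gives a strongly negative first autocorrelation, while a steadily trending one gives a positive value, in line with the idea that a high $\hat\rho_1$ means last period carries information about this period.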

Example of Autocorrelation • For inflation and its change • For inflation, $\rho_1 = 0.85$, which is very high: last quarter's inflation rate contains much information about this quarter's inflation rate

Focus on Stationary Time Series • Stationarity is key for time series regression: the future is similar to the past in terms of distribution

Autoregression • Use past values $Y_{t-1}, Y_{t-2}, \ldots$ to predict $Y_t$ • An autoregression is a regression model in which $Y_t$ is regressed against its own lagged values • The number of lags used as regressors is called the order of the autoregression • In a first-order autoregression, $Y_t$ is regressed against $Y_{t-1}$ • In a $p$th-order autoregression, $Y_t$ is regressed against $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$
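To make the regression setup concrete, this sketch builds the response values and regressor rows for a $p$th-order autoregression, assuming an intercept column is included. The name `ar_design` is hypothetical; any OLS routine could then be applied to its output.

```python
# Build the regression data for a pth-order autoregression (illustrative sketch).
# Assumption: an intercept column of 1.0 is included, matching a model with beta_0.

def ar_design(y, p):
    """Return (targets, rows): targets[i] = Y_t, rows[i] = [1, Y_{t-1}, ..., Y_{t-p}]."""
    if not 1 <= p < len(y):
        raise ValueError("need 1 <= p < len(y)")
    targets, rows = [], []
    for t in range(p, len(y)):    # the first usable t has p earlier observations
        targets.append(y[t])
        rows.append([1.0] + [y[t - k] for k in range(1, p + 1)])
    return targets, rows


targets, rows = ar_design([5.0, 6.0, 7.0, 8.0, 9.0], p=2)
print(targets)  # [7.0, 8.0, 9.0]
print(rows)     # [[1.0, 6.0, 5.0], [1.0, 7.0, 6.0], [1.0, 8.0, 7.0]]
```

Note that a $p$th-order model loses the first p observations as responses, since their lags fall before the start of the series.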

The First-Order Autoregression Model AR(1) • AR(1) model: $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$ • The AR(1) model can be estimated by OLS regression of $Y_t$ against $Y_{t-1}$ • Testing $\beta_1 = 0$ vs. $\beta_1 \neq 0$ provides a test of the hypothesis that $Y_{t-1}$ is not useful for forecasting $Y_t$
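A minimal sketch of estimating AR(1) by OLS with closed-form formulas, plus the t-statistic for $\beta_1 = 0$. It assumes homoskedasticity-only standard errors for simplicity; in practice heteroskedasticity-robust standard errors are often preferred. The name `fit_ar1` and the data are illustrative.

```python
import math

# OLS estimation of AR(1): Y_t = beta_0 + beta_1 * Y_{t-1} + u_t (illustrative sketch).
# Assumption: homoskedasticity-only standard error for the t-statistic.

def fit_ar1(y):
    """Return (beta0, beta1, t-statistic for H0: beta1 = 0)."""
    x = y[:-1]  # regressor Y_{t-1}
    z = y[1:]   # response  Y_t
    n = len(z)
    if n < 3:
        raise ValueError("need at least 4 observations")
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the lagged series has no variation")
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    beta1 = sxz / sxx
    beta0 = zbar - beta1 * xbar
    resid = [b - beta0 - beta1 * a for a, b in zip(x, z)]
    s2 = sum(r * r for r in resid) / (n - 2)   # residual variance, n - 2 degrees of freedom
    se_beta1 = math.sqrt(s2 / sxx)
    if se_beta1 > 0:
        t_stat = beta1 / se_beta1
    else:  # perfect fit: t is infinite (or undefined if beta1 == 0)
        t_stat = math.copysign(math.inf, beta1) if beta1 != 0 else math.nan
    return beta0, beta1, t_stat


b0, b1, t = fit_ar1([1.0, 3.0, 2.0, 4.0, 3.0, 5.0])
print(b0, b1, t)
```

A |t| well below the usual critical value (about 1.96 at the 5% level in large samples) means the test does not reject $\beta_1 = 0$, i.e., there is no evidence that $Y_{t-1}$ helps forecast $Y_t$.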
