
Time series data mining


SLIDE 1

Time series data mining

SLIDE 2

Outline

Basic knowledge
Multivariate association
States association

SLIDE 3

What is time series data

Formally, a time series is defined as a sequence of pairs T = [(p1, t1), (p2, t2), . . . , (pi, ti), . . . , (pn, tn)] (t1 < t2 < · · · < ti < · · · < tn), where each pi is a data point in a d-dimensional data space and each ti is the timestamp at which the corresponding pi occurs.

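The definition above can be sketched directly in code; the sketch below uses illustrative values and treats each data point as a 1-dimensional tuple:

```python
# A minimal sketch of the formal definition: a time series as a
# time-ordered sequence of (data point, timestamp) pairs; each data
# point may be d-dimensional. The values here are illustrative.
T = [((20.5,), 1), ((21.0,), 2), ((19.8,), 4), ((22.3,), 7)]

timestamps = [t for _, t in T]
# The ordering constraint t1 < t2 < ... < tn must hold.
assert all(a < b for a, b in zip(timestamps, timestamps[1:]))

print(len(T))  # → 4
```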

SLIDE 4

1. High dimensionality.
2. Hierarchical nature: a time series can be analyzed at the levels of its underlying time hierarchy, such as hourly, weekly, monthly, and yearly.
3. Multivariate: time series analysis often studies a single variable, but sometimes deals with time series data consisting of multiple related variables. For example, weather data consists of well-known measurements such as temperature, dew point, and humidity.

Time Series Data Characteristics
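The hierarchical nature can be illustrated by rolling a finer granularity up to a coarser one; the sketch below aggregates hourly readings into daily means, with made-up values:

```python
from collections import defaultdict

# Illustrating the hierarchical nature of time series: roll hourly
# temperature readings up to daily averages. The data are made up.
hourly = [("2024-01-01", 0, 10.0), ("2024-01-01", 12, 14.0),
          ("2024-01-02", 0, 8.0),  ("2024-01-02", 12, 12.0)]

daily = defaultdict(list)
for day, hour, temp in hourly:
    daily[day].append(temp)

daily_mean = {day: sum(v) / len(v) for day, v in daily.items()}
print(daily_mean["2024-01-01"])  # → 12.0
```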

SLIDE 5

Our work:
Multivariate association: A and B are highly correlated.
States association: A = 2 → B = 3.

SLIDE 6

Multivariate association:
Extract features
↓
Cluster the features
↓
Analyze the clustering result

SLIDE 7

Time series are essentially high-dimensional data, and working directly with such data in its raw format is very expensive in terms of both processing and storage cost. It is thus highly desirable to develop representation techniques that reduce the dimensionality of a time series while still preserving its fundamental characteristics.

Why to extract feature

SLIDE 8

How to extract feature

Principle: reduce the dimensionality while preserving the fundamental characteristics.
Split the data into fixed-size windows
↓
Extract a feature vector for each window: [relative time, standard deviation]
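The windowed [relative time, standard deviation] feature described here can be sketched as follows; the window width and series values are illustrative:

```python
import statistics

def window_features(series, width):
    """Split a series into fixed-size windows and describe each window
    by [relative time, standard deviation], as on this slide."""
    feats = []
    for start in range(0, len(series) - width + 1, width):
        window = series[start:start + width]
        rel_time = start / len(series)  # window position in [0, 1)
        feats.append([rel_time, statistics.pstdev(window)])
    return feats

series = [1.0, 1.0, 5.0, 9.0, 2.0, 2.0, 2.0, 2.0]
print(window_features(series, 4))
```

A constant window yields zero standard deviation, so flat regions and volatile regions map to well-separated feature vectors.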

SLIDE 9

Clustering

The objective is to find the most homogeneous clusters that are as distinct as possible from the other clusters. More formally, the grouping should maximize inter-cluster variance while minimizing intra-cluster variance.

SLIDE 10

Clustering

Pass 1: take the first data item as a cluster on its own, with itself as the cluster center. For each following item, compare it with all cluster centers: if at least one similarity exceeds the similarity threshold (Y), put the item into the most similar cluster and update that cluster's center; if all similarities are below the threshold (N), the item forms a new cluster. Repeat until all data have been processed.

Pass 2: starting again from the first item, compare each item with all cluster centers and either put it into the most similar cluster (without updating the center) or let it form a new cluster, until all data have been processed.

Merging: select the two most similar clusters from all clusters; if their similarity exceeds the given threshold (Y), merge the two clusters and update the cluster centers. Repeat while the cluster centers keep changing; exit once they no longer change.
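The first pass of this threshold-based clustering can be sketched as below; the center-update rule (cluster mean) and the similarity function are assumptions, since the slide does not specify them:

```python
def leader_cluster(points, threshold, sim):
    """One-pass clustering as in the flowchart: the first point starts
    a cluster with itself as center; each later point joins the most
    similar cluster if that similarity exceeds the threshold, else it
    starts a new cluster. Center update uses the mean (an assumption)."""
    centers, clusters = [], []
    for p in points:
        if not centers:
            centers.append(p)
            clusters.append([p])
            continue
        sims = [sim(p, c) for c in centers]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] > threshold:
            clusters[best].append(p)
            centers[best] = sum(clusters[best]) / len(clusters[best])
        else:
            centers.append(p)
            clusters.append([p])
    return clusters

similarity = lambda a, b: -abs(a - b)  # similarity = negative distance
out = leader_cluster([1.0, 1.1, 9.0, 9.2], threshold=-1.0, sim=similarity)
print(len(out))  # → 2  (one cluster near 1, one near 9)
```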

SLIDE 11

The analysis of clustering result

[Scatter plot of the clustering result; isolated points are labeled as noise.]

$$\mathrm{support} = \frac{\sum_{i=1}^{n} L_i}{\sum_{j=1}^{k} \sum_{i=1}^{n} L_{ij}}$$
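The support measure on this slide can be computed as below; the cluster lengths are illustrative, with L_ij read as the length of the i-th segment in cluster j:

```python
# Illustrative numbers for the support measure: L_ij is the length of
# the i-th series segment in cluster j; a cluster's support is its
# total length divided by the total length over all k clusters.
cluster_lengths = [[4, 6], [10, 5], [3, 2]]  # L_ij for k = 3 clusters

total = sum(sum(lengths) for lengths in cluster_lengths)
support = [sum(lengths) / total for lengths in cluster_lengths]
print([round(s, 2) for s in support])  # → [0.33, 0.5, 0.17]
```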

SLIDE 12

Experiments result

SLIDE 13

States Association

Our goal

Not only to find that variables A, B, C, and D are related, but also to discover the association between their values, e.g. A = 2, D = 4 → B = 3, C = 7.

SLIDE 14

States Association

  • Transform the data into symbols
  • Apriori algorithm
SLIDE 15

Data preprocessing

Split the data into fixed-length windows
↓
Extract the features of each window
↓
Cluster the windows
↓
Symbolize each cluster

SLIDE 16

Feature extraction

Feature extraction methods:

1. DFT
2. DWT
3. APCA
4. PAA
5. Statistics
6. Others

Multiple views: time domain, frequency domain, statistics, …
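Of the methods listed, PAA is the simplest to sketch: each equal-length segment is replaced by its mean. The example below assumes the series length divides evenly into the segment count:

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation (PAA), one of the feature
    extraction methods listed: replace each of the equal-length
    segments by its mean. Assumes len(series) % segments == 0."""
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]

print(paa([1.0, 3.0, 2.0, 4.0, 10.0, 10.0, 0.0, 0.0], 4))
# → [2.0, 3.0, 10.0, 0.0]
```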

SLIDE 17

Clustering and Symbolization

Cluster the features extracted from each window; each cluster is then represented by a character, so the windows within a cluster are marked with the corresponding character.
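The cluster-to-character mapping can be sketched as below; the cluster labels are illustrative inputs, and the choice of lowercase letters is an assumption:

```python
import string

def symbolize(labels):
    """Map cluster labels to characters, so every window in the same
    cluster gets the same symbol (a sketch of the slide's idea)."""
    mapping = {}
    symbols = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = string.ascii_lowercase[len(mapping)]
        symbols.append(mapping[lab])
    return "".join(symbols)

print(symbolize([0, 0, 2, 1, 2]))  # → "aabcb"
```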

SLIDE 18


SLIDE 19

data

time | A1 | A2 | A3 | A4 | A5 | A6
t1   | a  | d  | a  | b  | c  | a
t2   | 1  | a  | a  | a  | d  | a
t3   | 2  | b  | b  | b  | c  | b
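Each time-window row of the symbol table becomes one transaction of (variable, symbol) items, which is the input format Apriori expects; the sketch below uses the table's values:

```python
# Turn each time-window row of the symbol table into a transaction of
# "variable=symbol" items, ready for frequent-itemset mining.
header = ["A1", "A2", "A3", "A4", "A5", "A6"]
rows = {"t1": ["a", "d", "a", "b", "c", "a"],
        "t2": ["1", "a", "a", "a", "d", "a"],
        "t3": ["2", "b", "b", "b", "c", "b"]}

transactions = {t: {f"{v}={s}" for v, s in zip(header, syms)}
                for t, syms in rows.items()}
print(sorted(transactions["t1"]))
# → ['A1=a', 'A2=d', 'A3=a', 'A4=b', 'A5=c', 'A6=a']
```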

SLIDE 20

Association mining

The Apriori algorithm is an influential algorithm for mining frequent itemsets and association rules. Association rule generation is usually split into two separate steps:

  • 1. First, minimum support is applied to find all frequent itemsets in the database.
  • 2. Second, these frequent itemsets and the minimum confidence constraint are used to form rules.

While the second step is straightforward, the first step needs more attention.
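The first step (level-wise frequent-itemset search) can be sketched as below; this is a minimal illustration of the Apriori idea, not the full algorithm with candidate pruning:

```python
def apriori(transactions, min_support):
    """Minimal Apriori sketch: generate frequent itemsets level by
    level, keeping only sets whose support count meets min_support."""
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    freq, k = [], 1
    while level:
        freq.extend(level)
        k += 1
        # Candidates of size k, built by joining frequent (k-1)-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_support}
    return freq

ts = [frozenset("AB"), frozenset("ABC"), frozenset("AC")]
result = apriori(ts, min_support=2)
print(frozenset("AC") in result)  # → True
```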

SLIDE 21

Association mining

SLIDE 22

Experiment result

A1 = {0,1,2}

$$A_3 = \begin{cases} 1000 \times (t - t_1)/T, & A_1 = 1 \\ -1000 \times (t - t_1)/T, & A_1 = 2 \end{cases}$$

$$A_4 = \begin{cases} \sin(t) + \delta, & A_1 = 1 \\ -\sin(t) + \delta, & A_1 = 2 \end{cases}$$
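The synthetic signal A3 used in this experiment (a ramp whose sign depends on the state of A1) can be sketched as below; the values of T, t1, and the evaluation point are assumptions, only the piecewise form comes from the slide:

```python
# Sketch of the synthetic data: A3 is a linear ramp whose sign is
# determined by the state of A1. T and t1 are assumed values.
T, t1 = 100.0, 0.0

def a3(t, a1_state):
    slope = 1000 * (t - t1) / T
    return slope if a1_state == 1 else -slope

print(a3(50.0, 1), a3(50.0, 2))  # → 500.0 -500.0
```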

SLIDE 23

Experiment result

A2 = {6,7,8,9}

$$A_5 = \begin{cases} 2000 \times (t - t_2)/T, & A_2 = 6 \\ 1500 \times (t - t_2)/T, & A_2 = 7 \\ 1, & A_2 = 8 \\ -1000 \times (t - t_2)/T, & A_2 = 9 \end{cases}$$

$$A_6 = \begin{cases} \sin(t) + \delta, & A_2 = 6 \\ 1, & A_2 = 7 \\ -\sin(t) + \delta, & A_2 = 8 \\ \cos(t) + \delta, & A_2 = 9 \end{cases}$$

SLIDE 24

Experiment result

SLIDE 25

Experiment result

SLIDE 26

Thanks