1 Relational Database Domains SNOOPYFAMILY - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Database Systems Laboratory

Advisor: Professor 張玉盈

slide-2
SLIDE 2

2

Relational Database

Relation SNOOPYFAMILY:

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
3   SALLY BROWN       Female
4   LUCY VAN PELT     Female
5   LINUS VAN PELT    Male
6   PEPPERMINT PATTY  Female
7   MARCIE            Female
8   SCHROEDER         Male
9   WOODSTOCK

Terms labeled on the figure:

  • Degree
  • Cardinality
  • Attributes
  • Tuples
  • Domains (Male, Female)
  • Primary Key

Querying with SQL:

Select NAME From SNOOPYFAMILY Where SEX = 'Male';

Result (matching rows):

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
5   LINUS VAN PELT    Male
8   SCHROEDER         Male
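The query on this slide can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column types and the choice of ID as primary key are assumptions, and WOODSTOCK's SEX is stored as NULL because the slide does not show it.

```python
import sqlite3

# Rows as listed on the slide; WOODSTOCK's SEX is not given there, so it is None (NULL).
ROWS = [
    (1, "SNOOPY", "Male"), (2, "CHARLIE BROWN", "Male"), (3, "SALLY BROWN", "Female"),
    (4, "LUCY VAN PELT", "Female"), (5, "LINUS VAN PELT", "Male"),
    (6, "PEPPERMINT PATTY", "Female"), (7, "MARCIE", "Female"),
    (8, "SCHROEDER", "Male"), (9, "WOODSTOCK", None),
]

def male_names():
    conn = sqlite3.connect(":memory:")
    # Assumption: ID is the primary key; the column types are illustrative.
    conn.execute("CREATE TABLE SNOOPYFAMILY (ID INTEGER PRIMARY KEY, NAME TEXT, SEX TEXT)")
    conn.executemany("INSERT INTO SNOOPYFAMILY VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male' ORDER BY ID"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

print(male_names())
```

Because the query selects only NAME, the result set holds one column; the degree of the result is 1 even though the base relation has three attributes.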

slide-3
SLIDE 3

3

Image Databases

Figure: (a) an image picture; (b) the corresponding symbolic representation, with objects S, H, T, and M.

2D String :
x : M<H<T=S
y : H=T<M<S
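As a rough sketch of how such a 2D string can be derived, the code below sorts the objects along each axis and joins them with "<" (strictly before) or "=" (same position). The grid coordinates are hypothetical, chosen only to be consistent with the string shown on the slide, and the helper names are illustrative.

```python
from itertools import groupby

def axis_string(objects, axis):
    """Project objects onto one axis: '<' separates distinct coordinates, '=' joins ties."""
    ordered = sorted(objects.items(), key=lambda item: item[1][axis])  # stable sort
    groups = groupby(ordered, key=lambda item: item[1][axis])
    return "<".join("=".join(name for name, _ in group) for _, group in groups)

def two_d_string(objects):
    """Return the (x, y) pair of strings for a symbolic picture."""
    return axis_string(objects, 0), axis_string(objects, 1)

# Hypothetical (x, y) grid positions consistent with the slide's 2D string.
picture = {"M": (0, 1), "H": (1, 0), "T": (2, 0), "S": (2, 2)}
x, y = two_d_string(picture)
print("x :", x)
print("y :", y)
```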

slide-4
SLIDE 4

4

Image Database

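Applications: office automation, computer-aided design, medical image retrieval, and so on.

Queries in image databases:

  • Spatial Reasoning : inferring the spatial relationship between each pair of objects in an image.

  • Pictorial Query : letting the user specify particular spatial relationships and retrieving the images that match them.

  • Similarity Retrieval : finding the most similar pictures in the image database from the information the user provides.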

Figure: (a) an image picture; (b) symbolic picture. Object labels as extracted (truncated): Marc, Lucy, Pe, Fr, Linu, Char

slide-5
SLIDE 5

5

  • Uids of 13 spatial operators
slide-6
SLIDE 6

6

Another View of 169 relations

[Table: a 13 × 13 grid of spatial-operator pairs (169 relations); the individual cells are not recoverable from the extracted text.]

Relations per category: D (48), J (40), P (50), C (16), B (16)

slide-7
SLIDE 7

7

  • 5 Category Relationships (CAB), each illustrated with two objects A and B:

Disjoin
Meet
Partly Overlap
Contain
Inside

slide-8
SLIDE 8

8

  • Decision tree of the CATEGORY function

Conditions tested at the internal nodes (each with T and F branches):

oidx, oidy > 4
oidx, oidy > 2
7 ≦ oidx, oidy ≦ 10
10 ≦ oidx, oidy ≦ 13

Leaves: Contain, Belong, Part_Overlap, Join, Disjoin

slide-9
SLIDE 9

9

  • UID Matrix representation(cont.)

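Figure: picture f1 with objects a, b, c, and d, followed by its spatial-operator matrix and the corresponding UID matrix (rows and columns labeled a, b, c, d). The cell layout is not recoverable from the extracted text; the values appear in this order:

Operators: * % * * % % /* /* * /*
UIDs: 1 13 1 1 2 13 9 6 1 6 2 6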

slide-10
SLIDE 10

10

  • Similarity Retrieval based on the UID Matrix(1)

Definition 1. Picture f' is a type-i unit picture of f if
(1) f' is a picture containing the two objects A and B, represented as x: A rx'A,B B, y: A ry'A,B B;
(2) A and B are also contained in f;
(3) the relationships between A and B in f are represented as x: A rxA,B B and y: A ryA,B B.
Then:
(Type-0): Category(rx'A,B , ry'A,B)
(Type-1): (Type-0) and (rx'A,B = rxA,B or ry'A,B = ryA,B)
(Type-2): rx'A,B = rxA,B and ry'A,B = ryA,B

slide-11
SLIDE 11

11

  • 3 type-i similarities

Each picture shows two objects A and B:

f : (A/B, A/*B)
type-0 : (A/*B, A%*B)
type-1 : (A/B, A[*B)
type-2 : (A/B, A/*B)

slide-12
SLIDE 12

Image Mining: Finding Frequent Patterns in Image Databases

Charlie Brown often appears to the right of Snoopy. Setting the minimum support to ½.

12

slide-13
SLIDE 13

13

Video : Image + Time

Example: frame after frame of Snoopy images, woven together into a wonderful Snoopy video.

Image 1 + Image 2 + Image 3 + Image 4 + …… + Image N (ordered along the Time axis)

slide-14
SLIDE 14

14

Multimedia Database

- Voice
- Video
- Pictures
- Flow Chart
- Pictures with the depicted texts

Flow chart example: "Do you like Snoopy?" Yes: "You can join our laboratory." No: "Go take a look at other laboratories!"

slide-15
SLIDE 15

15

Spatial Database : Nearest Neighbor Query

Where is the nearest restaurant to our location ?

slide-16
SLIDE 16

16

Query Types

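  • 1. Exact match query:

Which city is located at latitude 43° N and longitude 88° W?

  • 2. Partial match query:

Which cities lie at latitude 39°43' N?

  • 3. Range query:

Which cities lie between latitude 39°43' N and 43° N and between longitude 53° W and 58° W?

  • 4. Approximate match query:

Which city is closest to Dongshi Township (東勢鎮)?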
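The four query types can be illustrated with a small in-memory sketch. The city names and coordinates below are hypothetical (39.72 stands in for 39°43'), every query is a linear scan rather than an index lookup, and the nearest-city query uses plain planar distance on the degree values as a simplification.

```python
import math

# Hypothetical cities: (name, latitude in degrees N, longitude in degrees W).
CITIES = [
    ("Alpha", 43.0, 88.0),
    ("Beta", 39.72, 75.0),
    ("Gamma", 41.0, 55.0),
    ("Delta", 39.72, 57.0),
]

def exact_match(lat, lon):
    """Exact match: both coordinates are specified."""
    return [n for n, la, lo in CITIES if la == lat and lo == lon]

def partial_match(lat):
    """Partial match: only one coordinate is specified."""
    return [n for n, la, _ in CITIES if la == lat]

def range_query(lat_min, lat_max, lon_min, lon_max):
    """Range query: both coordinates must fall inside the given intervals."""
    return [n for n, la, lo in CITIES
            if lat_min <= la <= lat_max and lon_min <= lo <= lon_max]

def nearest(lat, lon):
    """Approximate match: the city closest to the given position."""
    return min(CITIES, key=lambda c: math.hypot(c[1] - lat, c[2] - lon))[0]

print(exact_match(43.0, 88.0))
print(partial_match(39.72))
print(range_query(39.72, 43.0, 53.0, 58.0))
print(nearest(41.5, 55.0))
```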

slide-17
SLIDE 17

17

Difficulty

  • No total ordering of spatial data objects preserves spatial proximity.

Figure: four objects a, b, c, and d in the plane; should they be ordered a b c d, or a c b d?

slide-18
SLIDE 18

18

Space Decomposition and DZ expression

slide-19
SLIDE 19

19

The Bucket-Numbering Scheme

Figure, panels (a) to (c): the N-order Peano curve; the uptrend of the bucket numbers of an object (from smaller to bigger). Bucket numbers as extracted: 5 7 1 4 2 3 6 8 9 12 13 10 11 14 15

slide-20
SLIDE 20

20

Example

O(l,u) = (12,26)

The total number of buckets depends on the expected number of data objects.

Maximum bucket number: Max_bucket = 63
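One way to reproduce these numbers is bit interleaving, sketched below. It assumes that the N-order bucket number takes the y bit as the lower bit of each pair, that the grid is 8 × 8 (so the largest number is 63), and that l and u are the buckets of an object's lower-left and upper-right corners. The cells (2, 2) and (3, 4) are hypothetical, picked because they yield (12, 26) under those assumptions.

```python
def bucket_number(x, y, bits=3):
    """Interleave the bits of (x, y); y supplies the lower bit of each pair (assumed N-order)."""
    number = 0
    for i in range(bits):
        number |= ((x >> i) & 1) << (2 * i + 1)
        number |= ((y >> i) & 1) << (2 * i)
    return number

def object_code(lower_left, upper_right, bits=3):
    """Represent a rectangle as O(l, u): the bucket numbers of its two corner cells."""
    return bucket_number(*lower_left, bits), bucket_number(*upper_right, bits)

print([bucket_number(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # traces an "N"
print(object_code((2, 2), (3, 4)))
print(bucket_number(7, 7))
```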

slide-21
SLIDE 21

21

Example

(a) The data; (b) the corresponding NA-tree structure (bucket_capacity = 2)

slide-22
SLIDE 22

22

The basic structure of the revised version of the NA-tree

slide-23
SLIDE 23

23

NN (Nearest Neighbor)

  • The NN problem is to find the nearest neighbor of q (the query point).

Figure legend: Query point (q); Nearest neighbor of q; Managed by a Peer
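A minimal brute-force sketch of the NN problem follows. It assumes points are coordinate tuples compared by Euclidean distance; a spatial index such as the NA-tree would avoid scanning every point, which this sketch does not attempt.

```python
import math

def nearest_neighbor(q, points):
    """Return the point closest to the query point q (linear scan)."""
    return min(points, key=lambda p: math.dist(q, p))

print(nearest_neighbor((0, 0), [(3, 4), (1, 1), (-2, 0)]))
```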

slide-24
SLIDE 24

Spatial Databases: KNN Keyword Query

Where are the 2 nearest points with keywords B and C?
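The question above can be sketched as a filter followed by a k-nearest selection. This assumes that a point qualifies only if it carries both keywords B and C, and that distance is Euclidean; the sample points are hypothetical.

```python
import heapq
import math

def knn_keyword(q, objects, keywords, k):
    """Return the positions of the k nearest objects that contain all given keywords."""
    wanted = set(keywords)
    matching = [(pos, kws) for pos, kws in objects if wanted <= set(kws)]
    nearest = heapq.nsmallest(k, matching, key=lambda obj: math.dist(q, obj[0]))
    return [pos for pos, _ in nearest]

# Hypothetical points: (position, keyword set).
POINTS = [
    ((1, 1), {"A", "B"}),
    ((2, 2), {"B", "C"}),
    ((0, 1), {"C"}),
    ((5, 5), {"B", "C", "D"}),
    ((9, 9), {"B", "C"}),
]
print(knn_keyword((0, 0), POINTS, {"B", "C"}, k=2))
```

The closest points overall, (0, 1) and (1, 1), are skipped because each lacks one of the two keywords.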

24

slide-25
SLIDE 25

Road Network Databases: K Nearest Neighbor Query

Where are the 3 nearest restaurants?

25

slide-26
SLIDE 26

Spatial Databases: Top-k Spatial Keyword Query

Where is the top-1 'Snoopy hotel' near Kaohsiung?

26

slide-27
SLIDE 27

27

RNN (Reverse NN)

  • q is the nearest neighbor of the blue points.

  • RNN is a complement of the NN problem.

Figure legend: Query point (q); Reverse nearest neighbors of q; Managed by a Peer

slide-28
SLIDE 28
  • A Reverse Nearest Neighbor (RNN) query obtains the objects that treat the query as their nearest neighbor.

  • Application: Business strategy

Figure legend: Query q; Residents; Location A; Location B

Five residents treat Location B as their NN. Three residents treat Location A as their NN.

Location B is a better place to run the store.
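The store-location comparison can be sketched as follows: for each candidate site, collect the residents whose nearest site is that candidate. The coordinates are hypothetical, arranged so that three residents are nearest to Location A and five to Location B as on the slide, and distance is assumed to be Euclidean.

```python
import math

def reverse_nearest_neighbors(q, candidates, objects):
    """Return the objects whose nearest candidate site is q."""
    return [o for o in objects
            if min(candidates, key=lambda c: math.dist(o, c)) == q]

# Hypothetical coordinates for the two candidate sites and the residents.
LOCATION_A, LOCATION_B = (0, 0), (10, 0)
RESIDENTS = [(1, 0), (0, 2), (-1, -1), (9, 0), (10, 1), (11, -1), (8, 2), (12, 0)]

sites = [LOCATION_A, LOCATION_B]
rnn_a = reverse_nearest_neighbors(LOCATION_A, sites, RESIDENTS)
rnn_b = reverse_nearest_neighbors(LOCATION_B, sites, RESIDENTS)
print(len(rnn_a), len(rnn_b))
```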

28

slide-29
SLIDE 29
  • A Reverse Nearest Neighbor (RNN) query obtains the objects that treat the query as their nearest neighbor.

  • Application: Traffic police

Five cars treat Location A as their NN. Three cars treat Location B as their NN.

Location A is a better place for the police to patrol.

Figure legend: Query q; Query move; Cars; Traffic jam; Traffic smooth; locations A and B

29

slide-30
SLIDE 30

Spatial Database : Continuous Nearest Neighbor Query

Find the nearest gas stations along the route from the starting point (S) to the ending point (E).

30

slide-31
SLIDE 31

31

Spatio-temporal Database

Where is the available gas station around my location after 20 minutes? What is the traffic condition ahead of me during the next 30 minutes?

slide-32
SLIDE 32

32

P2P System

I want to eat a pumpkin. Who has it? I have it and let’s share it.

slide-33
SLIDE 33

33

Client-server vs. Peer-to-Peer network

  • Example: how to find an object in the network

  • Client-server approach: use a big server to store the objects and provide a directory for lookup.

  • Peer-to-Peer approach: data are fully distributed, each peer acts as both a client and a server, and objects are found by asking other peers.

slide-34
SLIDE 34

Data Grids

I want File-A. I want File-X.

34

slide-35
SLIDE 35

Protein Database

Find the patterns from the protein database.

Sequence 1  KGGAKRHRKIL
Sequence 2  KVGAKRHSKRS
Sequence 3  KVGAKRHSRKS
Sequence 4  KGGAKRHRKVL

Goals: determine the family a protein belongs to; determine a protein's function.

35

slide-36
SLIDE 36

36

Data Mining

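Customers usually buy milk as well when they buy bread.

Checkout counter: everyone lines up to check out.

Data mining techniques are used to analyze everyone's purchase records.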

PC

Peanuts Supermarket

slide-37
SLIDE 37

37

Database D:

TID   Items
100   A C D
200   B C E
300   A B C E
400   B E

Scan D, C1 (itemset: support): {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
L1: {A}: 2, {B}: 3, {C}: 3, {E}: 3

C2: {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
Scan D, C2 with support: {A B}: 1, {A C}: 2, {A E}: 1, {B C}: 2, {B E}: 3, {C E}: 2
L2: {A C}: 2, {B C}: 2, {B E}: 3, {C E}: 2

C3: {B C E}
Scan D, C3 with support: {B C E}: 2
L3: {B C E}: 2
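The C1, L1, C2, L2, C3, L3 sequence on this slide follows the Apriori pattern of generating candidates and then scanning the database. Below is a compact sketch of that loop, assuming a minimum support count of 2 (the threshold is not printed on the slide but is consistent with {D} and {A B} being dropped). The function and variable names are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {k: {itemset: support}} for the frequent itemsets L1, L2, ..."""
    def support(itemset):
        return sum(1 for items in transactions.values() if itemset <= items)

    candidates = {frozenset([i]) for items in transactions.values() for i in items}
    levels, k = {}, 1
    while candidates:
        counted = {c: support(c) for c in candidates}  # one scan of D
        frequent = {c: s for c, s in counted.items() if s >= min_support}
        if not frequent:
            break
        levels[k] = frequent
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return levels

D = {100: {"A", "C", "D"}, 200: {"B", "C", "E"}, 300: {"A", "B", "C", "E"}, 400: {"B", "E"}}
levels = apriori(D, min_support=2)
for k, itemsets in levels.items():
    print(f"L{k}:", {"".join(sorted(s)): n for s, n in itemsets.items()})
```

The prune step is what reduces the size-3 candidates to {B C E} alone: {A B C} and {A C E} are discarded before any scan because {A B} and {A E} are not in L2.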

slide-38
SLIDE 38

38

Data Clustering

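A very messy set of data is hard to analyze.
Find the characteristics that the data items have in common.
Produce three clusters of similar items.
Once the data form three simpler clusters, analysis becomes easier.

Clusters: Animal, Boy, Girl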

slide-39
SLIDE 39

39

Example

Figure: objects plotted on income and age axes and grouped into clusters.
slide-40
SLIDE 40

40

Classification

Learn from the current data.

GIRLS

Accurately predict the class of new data.

slide-41
SLIDE 41

Classification of Uroflowmetry Curves

41

Uroflow patterns: (a) Bell-shaped; (b) Tower-shaped; (c) Staccato-shaped; (d) Interrupted-shaped; (e) Plateau-shaped; (f) Obstructive-shaped.

slide-42
SLIDE 42

42

Sample Training Data

Attributes: Location, Age, Marriage status, Gender

No  Location  Age                Marriage status  Gender  Class
1   Urban     Below 21           Married          Female  Low
2   Urban     Below 21           Married          Male    Low
3   Suburban  Below 21           Married          Female  High
4   Rural     Between 21 and 30  Married          Female  High
5   Rural     Above 30           Single           Female  High
6   Rural     Above 30           Single           Male    Low
7   Suburban  Above 30           Single           Male    High
8   Urban     Between 21 and 30  Married          Female  Low
9   Urban     Above 30           Single           Female  High
10  Rural     Between 21 and 30  Single           Female  High
11  Urban     Between 21 and 30  Single           Male    High
12  Suburban  Between 21 and 30  Married          Male    High
13  Suburban  Below 21           Single           Female  High
14  Rural     Between 21 and 30  Married          Male    Low
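One common way to decide which attribute to test first is information gain, as in ID3. The slides do not say which criterion they use, so the sketch below is an assumption. It reads each row's class as the value that follows its attributes in the table, and computes the gain of every attribute over the 14 rows.

```python
import math
from collections import Counter

ATTRS = ("Location", "Age", "Marriage status", "Gender")
# Rows as read from the sample training data: attributes in ATTRS order, then the class.
DATA = [
    ("Urban", "Below 21", "Married", "Female", "Low"),
    ("Urban", "Below 21", "Married", "Male", "Low"),
    ("Suburban", "Below 21", "Married", "Female", "High"),
    ("Rural", "Between 21 and 30", "Married", "Female", "High"),
    ("Rural", "Above 30", "Single", "Female", "High"),
    ("Rural", "Above 30", "Single", "Male", "Low"),
    ("Suburban", "Above 30", "Single", "Male", "High"),
    ("Urban", "Between 21 and 30", "Married", "Female", "Low"),
    ("Urban", "Above 30", "Single", "Female", "High"),
    ("Rural", "Between 21 and 30", "Single", "Female", "High"),
    ("Urban", "Between 21 and 30", "Single", "Male", "High"),
    ("Suburban", "Between 21 and 30", "Married", "Male", "High"),
    ("Suburban", "Below 21", "Single", "Female", "High"),
    ("Rural", "Between 21 and 30", "Married", "Male", "Low"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRS)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")
```

Under this reading of the table, Location has the largest gain, which would make it the natural root of a small tree.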

slide-43
SLIDE 43

43

A Complex Decision Tree

Figure: a complex decision tree. Its internal nodes test Age (Below 21, Between 21 and 30, Above 30), Location (Urban, Suburban, Rural), Gender (Female, Male), and Marriage Status (Married, Single), some of them more than once on different branches. The leaves are labeled High or Low, and one leaf is undetermined (?).

Its predictive power is low.

slide-44
SLIDE 44

44

A Compact Decision Tree

Location
  Rural: test Gender
    Female: High
    Male: Low
  Suburban: High
  Urban: test Marriage Status
    Married: Low
    Single: High

Its predictive power is often higher than that of a complex decision tree.

slide-45
SLIDE 45

Subspace Clustering

Subspace Clusters :
{gene1, gene2, gene3} x {b, c, h, j, e}
{gene1, gene3} x {c, e, g, b}

Figure: three plots with a vertical axis from 10 to 90. One shows Gene 1, Gene 2, and Gene 3 over columns a to j; the other two show only the genes and columns of each subspace cluster.

45

slide-46
SLIDE 46

46

Figure components: Profile, Interests, Web DB, Profile index, Matching process, Filtered result, Web Pages

Recommend pages that introduce "basketball" to people whose interest is "basketball".

slide-47
SLIDE 47

47

Web Mining

Figure: an illustrative example for traversal patterns (nodes A, B, C, D, E, O, U, V, W, G, H; traversal steps numbered 1 to 15).

slide-48
SLIDE 48

48

Data Stream Mining

Find the IP addresses behind DoS attacks in the packet stream data.

slide-49
SLIDE 49

49

Traditional vs. Stream Data

  • Traditional Databases: data stored in finite, persistent data sets.

  • Stream Data (Big data in cloud): data as ordered, continuous, rapid, huge-volume, time-varying data streams. (In-Memory Databases)

slide-50
SLIDE 50

50

Landmark Window Model

Figure 1. Landmark Window (time axis t0, t1, t2, …, ti, …, tj, tj+1, tj+2; windows W1, W2, W3)

slide-51
SLIDE 51

51

Tilted-Time Window Model

Figure 3. Tilted-Time Window (granularities shown along the time axis: 31 days, 24 hours, 4 qtrs)

slide-52
SLIDE 52

52

Sliding Window Model

Figure 2. Sliding Window (time axis t0, t1, t2, …, ti, …, tj, tj+1, tj+2; windows W1, W2, W3)
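The sliding window can be contrasted with the landmark window in a short sketch. It assumes the usual stream-mining definitions: a landmark window covers everything from a fixed landmark time up to the current item, while a sliding window keeps only the most recent items. The window size and the one-item-per-time-step granularity are illustrative choices.

```python
from collections import deque

def landmark_windows(stream, landmark=0):
    """Yield the window from a fixed landmark position up to each new arrival."""
    seen = []
    for t, item in enumerate(stream):
        if t >= landmark:
            seen.append(item)
            yield list(seen)

def sliding_windows(stream, size):
    """Yield the most recent `size` items after each new arrival."""
    window = deque(maxlen=size)  # the oldest item is evicted automatically
    for item in stream:
        window.append(item)
        yield list(window)

print(list(landmark_windows("abcd")))
print(list(sliding_windows("abcd", size=2)))
```

The landmark window grows without bound, whereas the sliding window uses constant memory, which matters for unbounded streams.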

slide-53
SLIDE 53

53

False-Positive answer

Exactly Real Answer False-Positive Answer

slide-54
SLIDE 54

54

False-Negative answer

False-Negative Answer Exactly Real Answer

slide-55
SLIDE 55

55

Periodicity Mining in Time Series Databases

  • Three types of periodic patterns:

  • Symbol periodicity
      T = abd acb aba abc
      Symbol a, p = 3, stPos = 0

  • Sequence periodicity (partial periodic patterns)
      T = bbaa abbd abca abbc abcd
      Sequence ab, p = 4, stPos = 4

  • Segment periodicity (full-cycle periodicity)
      T = abcab abcab abcab
      Segment abcab, p = 5, stPos = 0
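All three examples can be checked with a single helper that tests whether a pattern recurs every p positions starting at stPos. The sketch assumes perfect periodicity (no missed occurrences) and treats the spaces in T as visual grouping only; the function name is illustrative.

```python
def is_periodic(series, pattern, period, start):
    """True if `pattern` occurs at start, start + period, start + 2 * period, ... to the end."""
    positions = range(start, len(series) - len(pattern) + 1, period)
    return len(positions) > 0 and all(series.startswith(pattern, pos) for pos in positions)

# The three series from the slide, with the grouping spaces removed.
print(is_periodic("abdacbabaabc", "a", 3, 0))           # symbol periodicity
print(is_periodic("bbaaabbdabcaabbcabcd", "ab", 4, 4))  # sequence periodicity
print(is_periodic("abcababcababcab", "abcab", 5, 0))    # segment periodicity
```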

slide-56
SLIDE 56

The user wants to know whether a pattern in the time-series database is periodic.

Mining Frequent Periodic Patterns

Use a computer to analyze the time-series database, find frequent periodic patterns, and predict the future trend of the time-series database. How can we earn money?

56

slide-57
SLIDE 57

Customers buy things; the items and the time intervals between purchases are stored.

Mining Time-Interval Sequential Patterns

Use a computer to analyze the database and find time-interval patterns, which reveal not only the order of items but also the time intervals between successive items.

57

slide-58
SLIDE 58

Mining Weight Maximal Frequent Patterns

The user wants to know which pattern makes money and contains the most items.

58

slide-59
SLIDE 59

Mining High Utility Patterns

Which itemset can contribute the most profit value of all the transactions?

59

slide-60
SLIDE 60

60

Mining Repeating Patterns in Music Databases

slide-61
SLIDE 61

61

Co-Location Patterns

slide-62
SLIDE 62

62

Mining Spatial Co-Location Patterns

  • Ex.

{A,C}
─────────
{(3,1),(4,1)}
{(2,3),(1,2)}
{(2,3),(3,3)}

slide-63
SLIDE 63

Co-Location Patterns

A = Auto dealers
R = auto Repair shops
D = Department stores
G = Gift stores
H = Hotels

Co-location patterns: {A, R}, {D, G}

63

Where is a good location for retailers to open an after-market?

slide-64
SLIDE 64

64

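Knowledge representation; efficiency analysis; processing

Database models, data structures, and maintenance of the data as a whole; query processing, simplicity, response time, and space requirements; query languages and ease of use

Figure. Research areas of database systems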