Database Systems Laboratory
Advisor: Professor 張玉盈
Relational Database

Table: SNOOPYFAMILY

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
3   SALLY BROWN       Female
4   LUCY VAN PELT     Female
5   LINUS VAN PELT    Male
6   PEPPERMINT PATTY  Female
7   MARCIE            Female
8   SCHROEDER         Male
9   WOODSTOCK

Labels on the table: Attributes (the columns), Tuples (the rows), Cardinality (the number of tuples), Primary Key, Domains (Male, Female).

❖ Query using SQL:

Select NAME
From SNOOPYFAMILY
Where SEX = 'Male';

❖ Result (the matching rows of SNOOPYFAMILY):

ID  NAME            SEX
1   SNOOPY          Male
2   CHARLIE BROWN   Male
5   LINUS VAN PELT  Male
8   SCHROEDER       Male
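The query on this slide can be tried out with a minimal Python sketch using the standard `sqlite3` module. The in-memory database, the column types, the choice of ID as primary key, and the `ORDER BY ID` clause are assumptions made for illustration. WOODSTOCK's SEX value is not shown in the table, so it is stored as NULL here.

```python
import sqlite3

# Rows copied from the SNOOPYFAMILY table. WOODSTOCK's SEX is not shown
# in the source, so it is stored as NULL (an assumption).
ROWS = [
    (1, "SNOOPY", "Male"),
    (2, "CHARLIE BROWN", "Male"),
    (3, "SALLY BROWN", "Female"),
    (4, "LUCY VAN PELT", "Female"),
    (5, "LINUS VAN PELT", "Male"),
    (6, "PEPPERMINT PATTY", "Female"),
    (7, "MARCIE", "Female"),
    (8, "SCHROEDER", "Male"),
    (9, "WOODSTOCK", None),
]


def male_names():
    """Run the slide's query against an in-memory copy of the table."""
    conn = sqlite3.connect(":memory:")
    # Column types and the primary key on ID are illustrative assumptions.
    conn.execute(
        "CREATE TABLE SNOOPYFAMILY (ID INTEGER PRIMARY KEY, NAME TEXT, SEX TEXT)"
    )
    conn.executemany("INSERT INTO SNOOPYFAMILY VALUES (?, ?, ?)", ROWS)
    # ORDER BY is added only to make the output order deterministic.
    cursor = conn.execute(
        "SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male' ORDER BY ID"
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


print(male_names())
```

Note that the query selects only the NAME attribute, so the sketch returns names alone and not whole tuples.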
previously unknown, hidden, and potentially useful information from the large database.
pattern mining.
supports (or frequency) than a given minimum support threshold.
Database D

TID  Items
100  A C D
200  B C E
300  A B C E
400  B E

Scan D: C1

Itemset  Sup.
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1

Itemset  Sup.
{A}      2
{B}      3
{C}      3
{E}      3

C2

Itemset
{A B}
{A C}
{A E}
{B C}
{B E}
{C E}

Scan D: C2

Itemset  Sup.
{A B}    1
{A C}    2
{A E}    1
{B C}    2
{B E}    3
{C E}    2

L2

Itemset  Sup.
{A C}    2
{B C}    2
{B E}    3
{C E}    2

C3

Itemset
{B C E}

Scan D: C3

Itemset  Sup.
{B C E}  2

L3

Itemset  Sup.
{B C E}  2
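The level-wise trace above (candidates C1, C2, C3 and frequent sets L1, L2, L3) follows the pattern commonly known as Apriori. The Python sketch below reproduces it on Database D. The minimum support count of 2 is not stated on the slide; it is inferred from {D} (support 1) being dropped while {A} (support 2) is kept. The function and variable names are illustrative.

```python
from itertools import combinations

# Database D from the slide: TID -> items.
DATABASE = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan D: count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # L_k: candidates that reach the minimum support.
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# min_sup = 2 is an assumption inferred from the slide's pruning.
result = apriori(DATABASE.values(), min_sup=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With these inputs the only candidate that survives the prune step at size 3 is {B C E}, which matches C3 on the slide.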
not exist in the binary form.
importance.
from the large databases.
cheap.
items at the same time.
Mining Weighted Maximal Frequent Patterns
The user wants to know which patterns make the most money and contain the most items.
Item  Weight
A     0.6
B     0.8
C     0.4

TID  Transaction
1    A, C
2    B, C
3    A, B, C
4    A, B
5    B, C
patterns.
{item set}:count, Min_Sup: 2

{A,B,C}:1
{A,B}:2   {A,C}:2   {B,C}:3
{A}:3   {B}:4   {C}:4

{item set}:WSup, Min_Sup: 1.8

{A,B,C}:0.6
{A,B}:1.4   {A,C}:1.0   {B,C}:1.8
{A}:1.8   {B}:3.2   {C}:1.6

WSup(PS) = sup(PS) \times \frac{\sum_{i=1}^{length(PS)} Weight(P_i)}{length(PS)}
have any weighted frequent super pattern.
{B,C}.
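The weighted support values on this slide can be checked with a short Python sketch: WSup of a pattern is its support count multiplied by the average weight of its items. The exhaustive enumeration of all itemsets, the `>=` comparison against the threshold, the rounding to six decimals (to avoid floating-point noise at the 1.8 boundary), and all function names are assumptions for illustration, not the mining algorithm of the slides.

```python
from itertools import combinations

# Item weights and transactions from the slides.
WEIGHTS = {"A": 0.6, "B": 0.8, "C": 0.4}
TRANSACTIONS = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}]


def wsup(pattern, transactions, weights):
    """WSup(PS) = sup(PS) * (sum of item weights in PS) / length(PS)."""
    sup = sum(1 for t in transactions if pattern <= t)
    average_weight = sum(weights[item] for item in pattern) / len(pattern)
    return sup * average_weight


def weighted_maximal(transactions, weights, min_wsup):
    """Brute-force sketch: score every itemset, then keep the maximal ones."""
    items = sorted(weights)
    scores = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            # Rounding guards against floating-point noise (assumption).
            scores[pattern] = round(wsup(pattern, transactions, weights), 6)
    # Weighted frequent: WSup at least the threshold (">=" is an assumption).
    frequent = {p for p, s in scores.items() if s >= min_wsup}
    # Maximal: no proper superset is also weighted frequent.
    maximal = {p for p in frequent if not any(p < q for q in frequent)}
    return scores, maximal


scores, maximal = weighted_maximal(TRANSACTIONS, WEIGHTS, min_wsup=1.8)
for pattern in sorted(scores, key=lambda p: (len(p), sorted(p))):
    print(sorted(pattern), scores[pattern])
print("maximal:", [sorted(p) for p in maximal])
```

Under these assumptions {A}, {B}, and {B,C} reach the threshold of 1.8, and {B} is not maximal because its superset {B,C} is also weighted frequent.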
the number of items purchased by consumers may be more than one.
that represents the quantity of the item in each transaction, and external utility value such as profit
Which itemset contributes the most profit across all the transactions?
stream data such as transactions of retail markets.
usually coming with high speed.
(In-Memory Databases)
Figure 2. Sliding Window (time axis marked t0, t1, t2, ti, tj, tj+1, tj+2; windows W1, W2, W3)
fixed size window are employed to discover meaningful patterns over data streams.
bounded memory resources.
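A fixed-size sliding window over a stream can be sketched in Python with `collections.deque`, which keeps memory bounded by discarding the oldest entry once the window is full. The function name, the window size, and the choice to slide by one arrival at a time are assumptions for illustration; the slides do not specify the slide step.

```python
from collections import deque


def sliding_windows(stream, window_size):
    """Yield the contents of a fixed-size window after each arrival, once full."""
    # maxlen bounds memory: appending to a full deque drops the oldest entry.
    window = deque(maxlen=window_size)
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            yield list(window)


# Hypothetical stream of arrival labels, used only to show the sliding.
for contents in sliding_windows(["t0", "t1", "t2", "t3", "t4"], window_size=3):
    print(contents)
```

Each yielded list plays the role of one window (W1, W2, W3, and so on), and a pattern miner would be run on its contents.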
Problem Statement
utility threshold, mining high utility patterns in a window over the data stream is equivalent to discovering the set of patterns in this window whose utilities are no smaller than the minimum utility threshold.
TID  Transaction           TU
T1   (A, 2) (B, 3) (C, 1)  1550
T2   (A, 1) (B, 2)         300
T3   (A, 2)                400

Item  Profit
A     200
B     50
C     1000

uT(AB, T1) = uT(A, T1) + uT(B, T1) = 200 × 2 + 50 × 3 = 550
u(AB) = uT(AB, T1) + uT(AB, T2) = 550 + 200 × 1 + 50 × 2 = 850
TU(T1) = 200 × 2 + 50 × 3 + 1000 × 1 = 1550
TWU(AB) = TU(T1) + TU(T2) = 1550 + 300 = 1850
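The four quantities worked out above can be reproduced with a small Python sketch. The data comes from the slide; the function names `u_t`, `u`, `tu`, and `twu` and the dictionary layout are illustrative assumptions.

```python
# Profit per unit (external utility) and purchased quantities per transaction.
PROFIT = {"A": 200, "B": 50, "C": 1000}
TRANSACTIONS = {
    "T1": {"A": 2, "B": 3, "C": 1},
    "T2": {"A": 1, "B": 2},
    "T3": {"A": 2},
}


def u_t(itemset, tid):
    """Utility of an itemset in one transaction (0 if it is not contained)."""
    quantities = TRANSACTIONS[tid]
    if not set(itemset) <= set(quantities):
        return 0
    return sum(PROFIT[item] * quantities[item] for item in itemset)


def u(itemset):
    """Utility of an itemset over all transactions that contain it."""
    return sum(u_t(itemset, tid) for tid in TRANSACTIONS)


def tu(tid):
    """Transaction utility: total utility of every item in the transaction."""
    return sum(PROFIT[item] * qty for item, qty in TRANSACTIONS[tid].items())


def twu(itemset):
    """Transaction-weighted utility: sum of TU over transactions containing the itemset."""
    return sum(
        tu(tid) for tid, quantities in TRANSACTIONS.items()
        if set(itemset) <= set(quantities)
    )


print(u_t("AB", "T1"), u("AB"), tu("T1"), twu("AB"))
```

Because TWU(AB) sums whole-transaction utilities, it is never smaller than u(AB), which is why it can serve as an upper bound when pruning candidates.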
The user wants to know whether a pattern in the time-series database is periodic.
Mining Frequent Periodic Patterns
Use a computer to analyze the time-series database: find frequent periodic patterns and predict the future trend of the time-series database. How can money be earned?
Customers buy something; the items and the time intervals are stored.
Mining Time-Interval Sequential Patterns
Use a computer to analyze the database. Time-interval patterns reveal not only the order of items but also the time intervals between successive items.
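Knowledge representation; efficiency analysis; processing
Database models, data structures, overall data maintenance; query processing, simplicity, response time, space requirements; query languages, ease of use
Figure. Research areas of database systems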