1 Relational Database Domains SNOOPYFAMILY - - PowerPoint PPT Presentation



SLIDE 1

Database Systems Laboratory

Advisor: Prof. 張玉盈

SLIDE 2

Relational Database

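SNOOPYFAMILY

ID  NAME              SEX
1   SNOOPY            Male
2   CHARLIE BROWN     Male
3   SALLY BROWN       Female
4   LUCY VAN PELT     Female
5   LINUS VAN PELT    Male
6   PEPPERMINT PATTY  Female
7   MARCIE            Female
8   SCHROEDER         Male
9   WOODSTOCK

  • Degree (number of attributes): 3 (ID, NAME, SEX)
  • Cardinality (number of tuples): 9
  • Attributes are the columns; tuples are the rows.
  • Domains: SEX takes its values from {Male, Female}.
  • Primary Key: ID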

❖ Query with SQL:

SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male';

❖ Result:

NAME
SNOOPY
CHARLIE BROWN
LINUS VAN PELT
SCHROEDER
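To try the query from this slide, the sketch below loads the SNOOPYFAMILY rows into an in-memory SQLite database using Python's standard `sqlite3` module. The column types, the `CHECK` constraint on SEX, and storing WOODSTOCK's unshown SEX value as NULL are assumptions made for illustration; `ORDER BY ID` is added only so the output order is deterministic.

```python
import sqlite3

# Rows of SNOOPYFAMILY from the slide. WOODSTOCK's SEX is not shown in the
# source, so this sketch stores it as NULL (an assumption).
rows = [
    (1, "SNOOPY", "Male"),
    (2, "CHARLIE BROWN", "Male"),
    (3, "SALLY BROWN", "Female"),
    (4, "LUCY VAN PELT", "Female"),
    (5, "LINUS VAN PELT", "Male"),
    (6, "PEPPERMINT PATTY", "Female"),
    (7, "MARCIE", "Female"),
    (8, "SCHROEDER", "Male"),
    (9, "WOODSTOCK", None),
]

conn = sqlite3.connect(":memory:")
# ID is the primary key; the CHECK clause models the SEX domain {Male, Female}.
conn.execute(
    "CREATE TABLE SNOOPYFAMILY ("
    "ID INTEGER PRIMARY KEY, "
    "NAME TEXT NOT NULL, "
    "SEX TEXT CHECK (SEX IN ('Male', 'Female')))"
)
conn.executemany("INSERT INTO SNOOPYFAMILY VALUES (?, ?, ?)", rows)

# The slide's query, with straight quotes around the string literal.
male_names = [
    name
    for (name,) in conn.execute(
        "SELECT NAME FROM SNOOPYFAMILY WHERE SEX = 'Male' ORDER BY ID"
    )
]
print(male_names)  # ['SNOOPY', 'CHARLIE BROWN', 'LINUS VAN PELT', 'SCHROEDER']
```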

SLIDE 3

Introduction

  • Data mining is widely used to extract previously unknown, hidden, and potentially useful information from large databases.

  • Frequent pattern mining is a fundamental research topic in data mining.

  • It finds all patterns whose support (frequency) is no less than a given minimum support threshold.


SLIDE 4

Apriori example (itemsets with support below 2 are dropped at each level)

Database D
  TID  Items
  100  A C D
  200  B C E
  300  A B C E
  400  B E

Scan D → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3
C2 (generated from L1): {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
Scan D → C2: {A B}:1, {A C}:2, {A E}:1, {B C}:2, {B E}:3, {C E}:2
L2: {A C}:2, {B C}:2, {B E}:3, {C E}:2
C3 (generated from L2): {B C E}
Scan D → C3: {B C E}:2
L3: {B C E}:2
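The level-wise procedure in the figure (scan D, keep the frequent itemsets, generate the next candidates) can be sketched in Python as follows. This is a minimal illustration that assumes a minimum support count of 2, which matches the itemsets the figure keeps; the function name `apriori` and the exact join/prune code are this sketch's own, not a specific published implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the database per level; keep candidates meeting min_count.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    level = count({frozenset([i]) for t in transactions for i in t})  # L1
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)  # Lk
        frequent.update(level)
        k += 1
    return frequent


# Database D from the figure; minimum support count 2 is assumed.
D = {100: "ACD", 200: "BCE", 300: "ABCE", 400: "BE"}
result = apriori(D.values(), min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

With these inputs the prune step removes {A B C} and {A C E} from the joined 3-itemsets, leaving C3 = {B C E}, as in the figure.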

SLIDE 5

Frequent Pattern Mining

  • This technique has two limitations.
  • First, it treats each item in binary form: an item either occurs in a transaction or it does not.

  • Second, it assumes that all items have the same value and the same importance.


SLIDE 6

Data Mining

  • Data mining is the process of finding hidden and useful knowledge from large databases.

  • However, items have different importance in the real world.
  • For example, an iPhone (a cellphone) is expensive, while a telephone is cheap.

  • Therefore, we have to consider both the importance and the count of items at the same time.


SLIDE 7

Mining Weighted Maximal Frequent Patterns

The user wants to know which patterns are profitable and contain the most items.


SLIDE 8

Example


Item  Weight
A     0.6
B     0.8
C     0.4

TID  Transaction
1    A, C
2    B, C
3    A, B, C
4    A, B
5    B, C

SLIDE 9

Frequent Itemsets

  • We can use the Apriori algorithm to find frequent patterns.
  • $L_1 = \{A, B, C\}$
  • $L_1 \Rightarrow C_2$
  • $C_2 = \{AB, AC, BC\}$
  • $L_2 = \{AB, AC, BC\}$
  • $L_2 \Rightarrow C_3$
  • $C_3 = \{ABC\}$
  • $L_3 = \emptyset$


Itemset lattice ({itemset}: count, Min_Sup = 2): {A}:3, {B}:4, {C}:4, {A,B}:2, {A,C}:2, {B,C}:3, {A,B,C}:1

SLIDE 10

Weighted Frequent Patterns

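Item  Weight
A     0.6
B     0.8
C     0.4

Itemset lattice ({itemset}: WSup, Min_Sup = 1.8): {A}:1.8, {B}:3.2, {C}:1.6, {A,B}:1.4, {A,C}:1.0, {B,C}:1.8, {A,B,C}:0.6

$$\mathrm{WSup}(PS) = \mathrm{sup}(PS) \times \frac{\sum_{i=1}^{\mathrm{length}(PS)} \mathrm{weight}(P_i)}{\mathrm{length}(PS)}$$

where $P_i$ is the $i$-th item of pattern $PS$.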
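The WSup formula can be checked against the lattice values with a short Python sketch. It assumes the five transactions and item weights from the example and treats Min_Sup = 1.8 as a threshold on WSup. It uses `fractions.Fraction` (a choice of this sketch) so that boundary cases such as WSup({A}) = 1.8 are compared exactly instead of with floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Item weights and transactions from the example. Fraction keeps the
# arithmetic exact; plain floats may not represent values like 0.6 exactly.
weights = {"A": Fraction("0.6"), "B": Fraction("0.8"), "C": Fraction("0.4")}
transactions = [{"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"A", "B"}, {"B", "C"}]
MIN_WSUP = Fraction("1.8")


def sup(pattern):
    """Number of transactions that contain every item of `pattern`."""
    return sum(1 for t in transactions if pattern <= t)


def wsup(pattern):
    """sup(PS) multiplied by the average weight of the items in PS."""
    return sup(pattern) * sum(weights[i] for i in pattern) / len(pattern)


items = sorted(weights)
patterns = [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]
weighted_frequent = {p for p in patterns if wsup(p) >= MIN_WSUP}

for p in patterns:
    print("".join(sorted(p)), float(wsup(p)))
```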

SLIDE 11
  • In this case, the weighted frequent patterns are {A}, {B}, {B,C}.
  • A weighted maximal frequent pattern is a weighted frequent pattern that has no weighted frequent superset.

  • So, the weighted maximal frequent patterns in this case are {A} and {B,C}; {B} is excluded because its superset {B,C} is weighted frequent.
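The maximality condition can be written as a filter over the weighted frequent patterns. The sketch below hard-codes the three weighted frequent patterns listed on this slide; the helper name `maximal` is hypothetical.

```python
# Weighted frequent patterns from this slide (WSup >= Min_Sup = 1.8).
weighted_frequent = [frozenset("A"), frozenset("B"), frozenset("BC")]


def maximal(patterns):
    """Keep each pattern that has no proper superset among `patterns`."""
    return [p for p in patterns if not any(p < q for q in patterns)]


result = sorted("".join(sorted(p)) for p in maximal(weighted_frequent))
print(result)  # ['A', 'BC']
```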


SLIDE 12
  • In the real world, each item has a different profit, and a consumer may buy more than one unit of an item.

  • In utility mining, each item has an internal utility value, which is its quantity in each transaction, and an external utility value, such as its profit or price.


SLIDE 13

Mining High Utility Patterns

Which itemsets contribute the most profit across all transactions?


SLIDE 14
  • In recent years, many applications, such as retail-market transaction systems, have generated stream data.

  • These data are continuous, unbounded, and usually arrive at high speed.


SLIDE 15

Traditional vs. Stream Data

  • Traditional Databases
      • Data stored in finite, persistent data sets.
  • Stream Data (Big data in cloud)
      • Data arrive as ordered, continuous, rapid, high-volume, time-varying data streams.

(In-Memory Databases)

SLIDE 16

Sliding Window Model

[Figure 2. Sliding Window: windows W1, W2, W3 move along the time axis over time points t0, t1, t2, …, ti, …, tj, tj+1, tj+2.]

SLIDE 17
  • In the sliding window model, only the recent data inside a fixed-size window are used to discover meaningful patterns over data streams.

  • This model is widely used for stream mining because it emphasizes recent data and requires only bounded memory.
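A sliding window can be kept in bounded memory with a double-ended queue: each new transaction is appended, and once the window is full the oldest transaction is expired and its counts are subtracted. The class below is a hypothetical sketch that tracks only single-item supports; the window size of 3 and the sample stream (the five transactions from the weighted-pattern example) are chosen for illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: item supports over the latest `size` transactions."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, transaction):
        items = set(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.size:
            # Expire the oldest transaction so memory use stays bounded.
            old = self.window.popleft()
            self.counts.subtract(old)
            for item in old:
                if self.counts[item] == 0:
                    del self.counts[item]

    def frequent_items(self, min_count):
        return {i for i, n in self.counts.items() if n >= min_count}


window = SlidingWindowCounter(size=3)
for t in ["AC", "BC", "ABC", "AB", "BC"]:
    window.add(t)
print(dict(window.counts), window.frequent_items(3))
```

After the five transactions, only ABC, AB, and BC remain in the window, so the supports are A: 2, B: 3, C: 2.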


SLIDE 18

Mining High Utility Patterns

Problem Statement

  • Given a data stream and a user-specified minimum utility threshold, mining high utility patterns in a window over the data stream means discovering, from that window, the set of all patterns whose utilities are no less than the minimum utility threshold.


SLIDE 19

Simple Example

TID  Transaction (item, quantity)  TU
T1   (A, 2) (B, 3) (C, 1)          1550
T2   (A, 1) (B, 2)                 300
T3   (A, 2)                        400

Item  Profit
A     200
B     50
C     1000

$$u_T(AB, T_1) = u_T(A, T_1) + u_T(B, T_1) = 200 \times 2 + 50 \times 3 = 550$$

$$u(AB) = u_T(AB, T_1) + u_T(AB, T_2) = 550 + 200 \times 1 + 50 \times 2 = 850$$

$$\mathrm{TU}(T_1) = 200 \times 2 + 50 \times 3 + 1000 \times 1 = 1550$$

$$\mathrm{TWU}(AB) = \mathrm{TU}(T_1) + \mathrm{TU}(T_2) = 1550 + 300 = 1850$$
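The utility definitions in this example can be computed directly. The sketch below encodes the three transactions as item-to-quantity maps and the profit table as a dictionary; the function names are hypothetical. It also treats the three transactions as one window and enumerates every pattern over {A, B, C} against a minimum utility threshold of 1000, a value chosen only for illustration because the slides give no threshold.

```python
from itertools import combinations

profit = {"A": 200, "B": 50, "C": 1000}  # external utility per unit
db = {                                   # internal utility (quantity)
    "T1": {"A": 2, "B": 3, "C": 1},
    "T2": {"A": 1, "B": 2},
    "T3": {"A": 2},
}


def contains(tid, pattern):
    return all(item in db[tid] for item in pattern)


def u_t(pattern, tid):
    """Utility of `pattern` in one transaction; 0 if not fully contained."""
    if not contains(tid, pattern):
        return 0
    return sum(profit[i] * db[tid][i] for i in pattern)


def u(pattern):
    """Utility of `pattern` summed over all transactions."""
    return sum(u_t(pattern, tid) for tid in db)


def tu(tid):
    """Transaction utility: total utility of every item in the transaction."""
    return sum(profit[i] * q for i, q in db[tid].items())


def twu(pattern):
    """Transaction-weighted utility: sum of TU over transactions containing it."""
    return sum(tu(tid) for tid in db if contains(tid, pattern))


print(u_t("AB", "T1"), u("AB"), tu("T1"), twu("AB"))  # 550 850 1550 1850

MIN_UTIL = 1000  # hypothetical threshold, not taken from the slides
patterns = ["".join(c) for k in range(1, 4) for c in combinations("ABC", k)]
high_utility = [p for p in patterns if u(p) >= MIN_UTIL]
print(high_utility)  # ['A', 'C', 'AC', 'BC', 'ABC']
```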


SLIDE 20

Periodicity Mining in Time Series Databases

  • Three types of periodic patterns:
      • Symbol periodicity
          • T = abd acb aba abc
          • Symbol a, p = 3, stPos = 0
      • Sequence periodicity (partial periodic patterns)
          • T = bbaa abbd abca abbc abcd
          • Sequence ab, p = 4, stPos = 4
      • Segment periodicity (full-cycle periodicity)
          • T = abcab abcab abcab
          • Segment abcab, p = 5, stPos = 0
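For the three examples above, a simplified strict check can be written as follows: the pattern must occur at every position stPos, stPos + p, stPos + 2p, … where it still fits in T. Practical periodicity-mining algorithms usually relax this with a confidence or support threshold; the function name `is_periodic` and the all-occurrences rule are assumptions of this sketch.

```python
def is_periodic(series, pattern, period, st_pos):
    """Strict check: `pattern` occurs at st_pos, st_pos + period, ... for every
    such position where the whole pattern still fits inside `series`."""
    starts = range(st_pos, len(series) - len(pattern) + 1, period)
    return len(starts) > 0 and all(series.startswith(pattern, i) for i in starts)


# Series from the slide, with the grouping spaces removed.
symbol_T = "abd acb aba abc".replace(" ", "")
sequence_T = "bbaa abbd abca abbc abcd".replace(" ", "")
segment_T = "abcab abcab abcab".replace(" ", "")

print(is_periodic(symbol_T, "a", 3, 0))       # True
print(is_periodic(sequence_T, "ab", 4, 4))    # True
print(is_periodic(segment_T, "abcab", 5, 0))  # True
print(is_periodic(sequence_T, "ab", 4, 0))    # False: T starts with "bb"
```

A symbol is the special case of a length-1 pattern, and a segment is the case where the pattern length equals the period.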
SLIDE 21

The user wants to know whether a pattern in a time-series database is periodic.

Mining Frequent Periodic Patterns

Use a computer to analyze the time-series database, find frequent periodic patterns, and predict the future trend of the time series. How can this be used to earn money?


SLIDE 22

Customers buy items; the database stores each purchased item and the time intervals between purchases.

Mining Time-Interval Sequential Patterns

Use a computer to analyze the database and find time-interval sequential patterns, which reveal not only the order of items but also the time intervals between successive items.
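Time-interval sequential patterns are mined from sequences that record when each item was bought. As an illustration of that data representation only (not a mining algorithm), the hypothetical helper below rewrites one customer's time-ordered purchases as items interleaved with the intervals between successive purchases; the sample items and day numbers are invented for the example.

```python
def with_intervals(purchases):
    """Turn a time-ordered list of (item, time) pairs into
    [item1, gap1, item2, gap2, ...], where each gap is the time elapsed
    between two successive purchases."""
    if not purchases:
        return []
    out = [purchases[0][0]]
    for (_, prev_t), (item, t) in zip(purchases, purchases[1:]):
        out.extend([t - prev_t, item])
    return out


# Hypothetical purchase history: (item, day of purchase).
history = [("a", 1), ("b", 3), ("c", 10)]
print(with_intervals(history))  # ['a', 2, 'b', 7, 'c']
```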


SLIDE 23

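Knowledge representation; Efficiency analysis; Processing

Database models, data structures, maintenance of data integrity; Query processing, simplicity, response time, space requirements; Query languages, ease of use

Figure. Research areas of database systems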