CMPT741/459 Assignments Tutorial Zicun Cong Yu Yang October 30, - - PowerPoint PPT Presentation



SLIDE 1

CMPT741/459 Assignments Tutorial

Zicun Cong Yu Yang October 30, 2015

SLIDE 2

A1Q3

Generator: an itemset I such that

(1) sup(I) ≥ min_sup

(2) ∄ I′ such that I′ ⊂ I ∧ sup(I′) = sup(I)

(a) Relation between closed patterns and generators

(b) Algorithm for finding all generators

SLIDE 3

A1Q3: Relation

◮ Equivalence class: a set of itemsets contained in the same set of transactions

◮ Each class has exactly one closed pattern (proof by contradiction)

◮ A class may have multiple generators

“Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns”, AAAI 2006
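The relation above can be checked by brute force on a small database. The sketch below is illustrative only: the four transactions are made up, and the helper names (`cover`, `closed_pattern`, `generators`) are hypothetical. It ignores the empty itemset, which is safe here because no item occurs in every transaction.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c"},
]
items = sorted(set().union(*transactions))


def cover(itemset):
    """Return the ids of the transactions that contain every item of itemset."""
    return frozenset(
        tid for tid, t in enumerate(transactions) if set(itemset) <= t
    )


# Group all non-empty itemsets that occur at least once by their cover.
# Each group is one equivalence class.
classes = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        tids = cover(combo)
        if tids:
            classes.setdefault(tids, []).append(frozenset(combo))


def closed_pattern(members):
    """The unique maximal member of a class: the union of all its members."""
    return frozenset().union(*members)


def generators(members):
    """The minimal members: no proper subset lies in the same class."""
    return [m for m in members if not any(other < m for other in members)]


for tids, members in classes.items():
    print(sorted(tids), sorted(closed_pattern(members)),
          [sorted(g) for g in generators(members)])
```

On this data the class covered by transactions 0 and 1 contains ac, bc and abc: one closed pattern (abc) and two generators (ac and bc).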

SLIDE 4

A1Q3: Algorithm

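◮ Apriori property: every subset of a generator is also a generator

◮ fD(I): the set of transactions containing I

◮ Suppose I1 ⊂ I is not a generator

◮ Then there exists I2 ⊂ I1 with fD(I2) = fD(I1)

◮ I2 ∪ (I\I1) ⊂ I

◮ fD(I) = fD(I1) ∩ fD(I\I1) = fD(I2) ∩ fD(I\I1) = fD(I2 ∪ (I\I1))

◮ So I has a proper subset with the same support, and I is not a generator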
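The Apriori property suggests a level-wise search in which a candidate is kept only if all of its immediate subsets are frequent generators. The following is a minimal sketch, not the official assignment solution. The function name `mine_generators` is hypothetical, and the code assumes the empty itemset counts as a proper subset, so an item that occurs in every transaction is not reported as a generator.

```python
from itertools import combinations


def mine_generators(transactions, min_sup):
    """Level-wise search; returns {generator: support} for frequent generators."""

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t)

    n = len(transactions)  # support of the empty itemset
    items = sorted(set().union(*transactions))

    # Level 1: frequent single items whose support differs from sup(empty set).
    level = {}
    for x in items:
        s = sup(frozenset([x]))
        if min_sup <= s < n:
            level[frozenset([x])] = s

    result = dict(level)
    k = 1
    while level:
        # Candidate (k+1)-itemsets: unions of two k-generators.
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == k + 1
        }
        next_level = {}
        for cand in candidates:
            subsets = [cand - {x} for x in cand]
            # Apriori pruning: every k-subset must be a frequent generator.
            if any(s not in level for s in subsets):
                continue
            s_cand = sup(cand)
            # Generator test: the support must be strictly smaller than that
            # of every immediate subset (this covers all proper subsets).
            if s_cand >= min_sup and all(s_cand < level[s] for s in subsets):
                next_level[cand] = s_cand
        result.update(next_level)
        level = next_level
        k += 1
    return result


db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
found = mine_generators(db, min_sup=2)
```

Comparing against immediate subsets is enough: if any proper subset had the same support as the candidate, the immediate subset containing it would have the same support too.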

SLIDE 5

A1Q4

Misleading Rule

An association rule (X → Y) is a misleading rule if sup(X ∪ Y) ≥ α and β ≤ conf(X → Y) < sup(Y) / |TD|, where |TD| is the total number of transactions.
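As an illustration of the definition, the sketch below tests a single rule directly against a transaction list. It assumes sup(·) is an absolute count, so α is a count threshold, and the tea/coffee data are made up for the example. The function name `is_misleading` is hypothetical.

```python
def is_misleading(transactions, X, Y, alpha, beta):
    """Check sup(X ∪ Y) >= alpha and beta <= conf(X → Y) < sup(Y) / |TD|."""
    X, Y = set(X), set(Y)
    n = len(transactions)
    sup_x = sum(1 for t in transactions if X <= t)
    sup_y = sum(1 for t in transactions if Y <= t)
    sup_xy = sum(1 for t in transactions if (X | Y) <= t)
    if sup_x == 0:
        return False  # confidence is undefined when X never occurs
    conf = sup_xy / sup_x
    return sup_xy >= alpha and beta <= conf < sup_y / n


# Hypothetical data: 3 baskets with tea and coffee, 1 with tea only,
# 6 with coffee only.
baskets = (
    [{"tea", "coffee"}] * 3
    + [{"tea"}]
    + [{"coffee"}] * 6
)
print(is_misleading(baskets, {"tea"}, {"coffee"}, alpha=2, beta=0.7))
```

Here conf(tea → coffee) = 3/4 passes the confidence threshold, but it is below the overall coffee frequency 9/10, so knowing that a basket contains tea actually lowers the chance of coffee.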

SLIDE 6

A2Q2

(a) Give a small example with one fact table and two dimension tables.

(b) Compute an iceberg cube on the universal table, where the aggregate function is monotonic.

(c) Identify and reduce redundancy.

SLIDE 7

A2Q2(a)

Figure: A Star Schema

SLIDE 8

A2Q2(b)

Figure: Universal Table

SLIDE 9

A2Q2(b)

Figure: BUC, min sup = 3
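Since the figure itself is not available here, the following sketch shows one way a BUC-style bottom-up computation with count as the aggregate might look. The universal table, the dimension names C, D, E and the function names are made up for illustration and are not the table from the slide.

```python
def buc(rows, dims, min_sup, start=0, cell=None, out=None):
    """Bottom-up cubing: record the current cell, then refine it one
    dimension at a time, pruning partitions with fewer than min_sup rows."""
    if cell is None:
        cell = {}
    if out is None:
        out = {}
    # The caller guarantees len(rows) >= min_sup for the current cell.
    out[tuple(cell.get(d, "*") for d in dims)] = len(rows)
    for i in range(start, len(dims)):
        d = dims[i]
        partitions = {}
        for r in rows:
            partitions.setdefault(r[d], []).append(r)
        for value, part in partitions.items():
            if len(part) >= min_sup:  # count can only shrink further down
                cell[d] = value
                buc(part, dims, min_sup, i + 1, cell, out)
                del cell[d]
    return out


def iceberg_cube(rows, dims, min_sup):
    """Return {cell: count} for all cells with count >= min_sup."""
    return buc(rows, dims, min_sup) if len(rows) >= min_sup else {}


# Hypothetical universal table.
table = [
    {"C": "c1", "D": "d1", "E": "e1"},
    {"C": "c1", "D": "d1", "E": "e1"},
    {"C": "c1", "D": "d1", "E": "e2"},
    {"C": "c1", "D": "d2", "E": "e1"},
    {"C": "c2", "D": "d1", "E": "e1"},
]
cube = iceberg_cube(table, ["C", "D", "E"], min_sup=3)
```

The pruning step relies on count never increasing when a cell is specialized, which is why a partition below min_sup is not expanded further.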

SLIDE 10

A2Q2(c)

◮ Storage redundancy: the values of the non-key attributes in a dimension table are determined solely by the value of the primary key in the same dimension table

◮ Computation redundancy: the same portion of the universal table is searched repeatedly (e.g. c1d1e1 and c1d1f1)

SLIDE 11

A2Q2(c): Reduce Redundancy

◮ sup(c): determined only by searching the fact table

◮ Dimension table: only the primary key is useful when searching the fact table

◮ Idea: find local iceberg cells on the dimension tables first, then index the local icebergs by their corresponding primary keys (signatures).

SLIDE 12

A2Q2(c): Algorithm

◮ Propagate information to dimension tables

◮ Local icebergs: sum(Count) ≥ 3

Figure: A Star Schema

SLIDE 13

A2Q2(c): Algorithm

◮ Equivalence class: the set of local icebergs with the same signature (e.g. e1, f1 and e1f1)

◮ Join signatures from different dimension tables to obtain global iceberg cells

“Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses”, SDM 2005
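The signature idea can be sketched on a toy star schema. This is a simplified illustration under stated assumptions, not the algorithm from the cited paper: the fact table holds only two foreign keys, every fact row counts as 1, and all table contents and function names are hypothetical. A cell is a tuple of (attribute, value) pairs, and the empty tuple stands for "all".

```python
from itertools import combinations, product


def local_icebergs(dim, key_count, min_sup):
    """Map each signature (frozenset of primary keys) to its local iceberg cells."""
    attrs = sorted(next(iter(dim.values())))
    classes = {}
    for size in range(len(attrs) + 1):
        for chosen in combinations(attrs, size):
            # Group the primary keys by their values on the chosen attributes.
            groups = {}
            for key, row in dim.items():
                cell = tuple((a, row[a]) for a in chosen)
                groups.setdefault(cell, set()).add(key)
            for cell, keys in groups.items():
                # Local support: sum of the counts propagated from the fact table.
                if sum(key_count.get(k, 0) for k in keys) >= min_sup:
                    classes.setdefault(frozenset(keys), []).append(cell)
    return classes


def cross_table_icebergs(fact, dim1, dim2, min_sup):
    # Step 1: propagate counts from the fact table to the dimension tables.
    count1, count2 = {}, {}
    for k1, k2 in fact:
        count1[k1] = count1.get(k1, 0) + 1
        count2[k2] = count2.get(k2, 0) + 1
    # Step 2: local icebergs, grouped by signature.
    local1 = local_icebergs(dim1, count1, min_sup)
    local2 = local_icebergs(dim2, count2, min_sup)
    # Step 3: join signatures; one fact-table scan serves a whole class pair.
    result = {}
    for (sig1, cells1), (sig2, cells2) in product(local1.items(), local2.items()):
        sup = sum(1 for k1, k2 in fact if k1 in sig1 and k2 in sig2)
        if sup >= min_sup:
            for c1, c2 in product(cells1, cells2):
                result[c1 + c2] = sup
    return result


# Hypothetical star schema.
dim_a = {"a1": {"C": "c1", "D": "d1"}, "a2": {"C": "c1", "D": "d2"}}
dim_b = {"b1": {"E": "e1", "F": "f1"}, "b2": {"E": "e2", "F": "f2"}}
fact = [("a1", "b1")] * 3 + [("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
cells = cross_table_icebergs(fact, dim_a, dim_b, min_sup=3)
```

In this data e1, f1 and e1f1 share the signature {b1}, so their supports against any signature of the other dimension are computed once rather than three times.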

SLIDE 14

A3Q2

Give a counterexample showing that KMeans may fail to find the optimal clustering w.r.t. Σ_{o∈D} dist(o, c_o), where c_o is the center of the cluster that o is assigned to.

◮ Find an example that has two stable clusterings with different losses

SLIDE 15

A3Q2

Figure: A Counter Example
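One possible counterexample (not necessarily the one in the figure) uses the four corners of a 4 × 1 rectangle with k = 2. The sketch below runs plain Lloyd iterations from two hypothetical initializations and reports the loss as the sum of distances, as in the question. The function name `lloyd` and the point coordinates are assumptions made for the example.

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's iterations until the assignment stops changing."""
    centers = list(centers)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assign == assign:
            break  # stable clustering reached
        assign = new_assign
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    loss = sum(math.dist(p, centers[a]) for p, a in zip(points, assign))
    return assign, loss


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
left_right = lloyd(points, [(0, 0.5), (4, 0.5)])  # splits along the long side
top_bottom = lloyd(points, [(2, 0), (2, 1)])      # splits along the short side
print(left_right, top_bottom)
```

Both runs stop at a fixed point, but the left/right split has every point at distance 0.5 from its center while the top/bottom split has every point at distance 2, so the second stable clustering is not optimal.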

SLIDE 16

A3Q5

Show that I × J is a bicluster with coherent values iff for any i1, i2 ∈ I and j1, j2 ∈ J, e_{i1j1} − e_{i2j1} = e_{i1j2} − e_{i2j2}.

SLIDE 17

A3Q5: Necessity

◮ Bicluster: for any i ∈ I and any j ∈ J, e_{ij} = c + α_i + β_j

◮ e_{i1j1} − e_{i2j1} = (c + α_{i1} + β_{j1}) − (c + α_{i2} + β_{j1}) = α_{i1} − α_{i2}

◮ e_{i1j2} − e_{i2j2} = (c + α_{i1} + β_{j2}) − (c + α_{i2} + β_{j2}) = α_{i1} − α_{i2} = e_{i1j1} − e_{i2j1}

SLIDE 18

A3Q5: Sufficiency

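◮ e_{iJ} = (1/|J|) Σ_{j∈J} e_{ij}, e_{Ij} = (1/|I|) Σ_{i∈I} e_{ij}, e_{IJ} = (1/(|I||J|)) Σ_{i∈I} Σ_{j∈J} e_{ij}

◮ Claim: e_{ij} − e_{iJ} − e_{Ij} + e_{IJ} = 0

◮ This follows from e_{i1j1} − e_{i2j1} = e_{i1j2} − e_{i2j2}: averaging both sides over all i2 ∈ I and j2 ∈ J gives e_{i1j1} − e_{Ij1} = e_{i1J} − e_{IJ}

◮ Hence e_{ij} = e_{iJ} + e_{Ij} − e_{IJ}, i.e. e_{ij} = c + α_i + β_j with α_i = e_{iJ}, β_j = e_{Ij} and c = −e_{IJ}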
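Both directions can be sanity-checked numerically. The sketch below is illustrative only: the values of c, α and β are made up, the function names are hypothetical, and the matrix entries are assumed to be integers so that exact fractions can be used for the means.

```python
from fractions import Fraction


def is_coherent(e):
    """Check e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2] for all indices."""
    rows, cols = range(len(e)), range(len(e[0]))
    return all(
        e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2]
        for i1 in rows for i2 in rows for j1 in cols for j2 in cols
    )


def max_abs_residue(e):
    """Largest |e_ij - e_iJ - e_Ij + e_IJ| over the submatrix (integer entries)."""
    n, m = len(e), len(e[0])
    row_mean = [Fraction(sum(r), m) for r in e]
    col_mean = [Fraction(sum(e[i][j] for i in range(n)), n) for j in range(m)]
    all_mean = Fraction(sum(map(sum, e)), n * m)
    return max(
        abs(e[i][j] - row_mean[i] - col_mean[j] + all_mean)
        for i in range(n) for j in range(m)
    )


# Hypothetical additive model e_ij = c + alpha_i + beta_j.
c, alpha, beta = 10, [0, 2, 5], [1, 4, -3, 7]
e = [[c + a + b for b in beta] for a in alpha]

# Break the model in a single entry.
broken = [row[:] for row in e]
broken[0][0] += 1

print(is_coherent(e), max_abs_residue(e))
print(is_coherent(broken), max_abs_residue(broken))
```

The additive matrix satisfies the difference condition and has zero residue everywhere, while changing one entry breaks both at once.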