CMPT741/459 Assignments Tutorial
Zicun Cong, Yu Yang
October 30, 2015
A1Q3
Generator: an itemset $I$ is a generator if (1) $\mathrm{sup}(I) \ge \mathit{min\_sup}$ and (2) $\nexists I' \subset I$ such that $\mathrm{sup}(I') = \mathrm{sup}(I)$.
(a) Relation between closed patterns and generators
(b) Algorithm for finding all generators
A1Q3: Relation
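◮ Equivalence class: a set of itemsets contained in exactly the same set of transactions (so all members have the same support)
◮ Each equivalence class has exactly one closed pattern, its maximal member (proof by contradiction)
◮ An equivalence class may have multiple generators, its minimal members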
“Minimum Description Length (MDL) Principle: Generators Are Preferable to Closed Patterns”, AAAI 2006
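A brute-force sketch of this relation, for intuition only: it enumerates every itemset of a tiny, hypothetical transaction database, groups the frequent ones by the set of transactions containing them, and reads off each class's maximal member (the closed pattern) and minimal members (the generators). The function names are illustrative, and the empty itemset is included so that the generator definition is applied literally.

```python
from itertools import combinations

def tidset(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(t for t, items in enumerate(transactions) if itemset <= items)

def equivalence_classes(transactions, min_sup):
    """Group all frequent itemsets (including the empty one) by their tidset."""
    universe = sorted(set().union(*transactions))
    classes = {}
    for k in range(len(universe) + 1):
        for combo in combinations(universe, k):
            itemset = frozenset(combo)
            tids = tidset(itemset, transactions)
            if len(tids) >= min_sup:
                classes.setdefault(tids, []).append(itemset)
    return classes

def closed_and_generators(members):
    """Closed pattern: maximal member of a class; generators: its minimal members."""
    closed = [s for s in members if not any(s < t for t in members)]
    generators = [s for s in members if not any(t < s for t in members)]
    return closed, generators

# Hypothetical toy database with four transactions.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("cd")]
for tids, members in equivalence_classes(transactions, min_sup=2).items():
    closed, gens = closed_and_generators(members)
    print(sorted(tids), ["".join(sorted(s)) for s in closed],
          ["".join(sorted(s)) for s in gens])
```

On this data the class of transactions {0, 1, 2} contains a, b, and ab: its closed pattern is ab and it has two generators, a and b.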
A1Q3: Algorithm
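◮ Apriori property: every subset of a generator is also a generator. Proof (contrapositive), where $f_D(I)$ is the set of transactions containing $I$:
◮ Suppose $I_1 \subset I$ is not a generator because some $I_2 \subset I_1$ has $f_D(I_2) = f_D(I_1)$ (if $I_1$ is infrequent, so is $I$)
◮ Then $I_2 \cup (I \setminus I_1) \subset I$, a proper subset because $I_2 \subset I_1$
◮ $f_D(I) = f_D(I_1) \cap f_D(I \setminus I_1) = f_D(I_2) \cap f_D(I \setminus I_1) = f_D(I_2 \cup (I \setminus I_1))$
◮ So $I$ has a proper subset with the same support and is not a generator either; generators can therefore be mined level-wise, as in Apriori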
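The property suggests an Apriori-style level-wise search. The sketch below is one possible implementation under stated assumptions (toy data, absolute support, hypothetical helper names): a k-itemset becomes a candidate only if all of its (k-1)-subsets are generators, and it is kept if it is frequent and its support differs from that of every immediate subset. Checking immediate subsets is enough because support is anti-monotone: if some proper subset $I'$ has the same support as $I$, so does every (k-1)-subset of $I$ that contains $I'$.

```python
def support(itemset, transactions):
    """Absolute support: the number of transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def mine_generators(transactions, min_sup):
    """Level-wise search for all generators; returns {generator: support}."""
    empty = frozenset()
    if len(transactions) < min_sup:
        return {}
    generators = {empty: len(transactions)}  # the empty itemset is taken as level 0
    items = sorted(set().union(*transactions))
    level = [empty]
    while level:
        # Candidate generation: extend each generator by one item, keeping only
        # candidates whose immediate subsets are all generators (Apriori pruning).
        candidates = {g | {x} for g in level for x in items if x not in g}
        candidates = {cand for cand in candidates
                      if all(cand - {y} in generators for y in cand)}
        level = []
        for cand in candidates:
            s = support(cand, transactions)
            # Support is anti-monotone, so comparing with immediate subsets suffices.
            if s >= min_sup and all(generators[cand - {y}] != s for y in cand):
                generators[cand] = s
                level.append(cand)
    return generators

# Hypothetical toy database, min_sup = 2.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("cd")]
gens = mine_generators(transactions, min_sup=2)
print(sorted("".join(sorted(g)) for g in gens))  # ['', 'a', 'ac', 'b', 'bc', 'c', 'd']
```

On this toy data, ab is rejected because $\mathrm{sup}(ab) = \mathrm{sup}(a) = 3$, and abc is pruned without being counted because its subset ab is not a generator.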
A1Q4
Misleading Rule
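An association rule $X \to Y$ is a misleading rule if $\mathrm{sup}(X \cup Y) \ge \alpha$ and $\beta \le \mathrm{conf}(X \to Y) < \frac{\mathrm{sup}(Y)}{|T_D|}$, where $|T_D|$ is the total number of transactions. In other words, the rule passes both thresholds, yet $P(Y \mid X) < P(Y)$: the presence of $X$ actually lowers the probability of $Y$.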
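A direct transcription of the definition, using a hypothetical toy database in which Y is very common: the rule X → Y clears both thresholds but still has confidence below the base rate of Y. The thresholds α = 3 and β = 0.7 are example values, not ones given in the assignment, and the function names are illustrative.

```python
def rule_stats(x, y, transactions):
    """Return (sup(X ∪ Y), sup(Y), conf(X -> Y)) using absolute supports.

    Assumes X occurs in at least one transaction.
    """
    sup_x = sum(1 for t in transactions if x <= t)
    sup_y = sum(1 for t in transactions if y <= t)
    sup_xy = sum(1 for t in transactions if (x | y) <= t)
    return sup_xy, sup_y, sup_xy / sup_x

def is_misleading(x, y, transactions, alpha, beta):
    """sup(X ∪ Y) >= alpha and beta <= conf(X -> Y) < sup(Y) / |T_D|."""
    sup_xy, sup_y, conf = rule_stats(x, y, transactions)
    return sup_xy >= alpha and beta <= conf < sup_y / len(transactions)

# Hypothetical database of 10 transactions: Y appears in 9, X in 5, X ∪ Y in 4.
T = [{"x", "y"}] * 4 + [{"x"}] + [{"y"}] * 5
print(rule_stats({"x"}, {"y"}, T))                         # (4, 9, 0.8)
print(is_misleading({"x"}, {"y"}, T, alpha=3, beta=0.7))  # True
```

Here the confidence 0.8 looks high, but 9 of the 10 transactions contain Y anyway, so $P(Y) = 0.9 > P(Y \mid X)$.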
A2Q2
(a) Give a small example with one fact table and two dimension tables.
(b) Compute an iceberg cube using the universal table, where the aggregate function is monotonic.
(c) Identify and reduce redundancy.
A2Q2(a)
Figure: A Star Schema
A2Q2(b)
Figure: Universal Table
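The universal table is the join of the fact table with its dimension tables. A minimal sketch on a hypothetical star schema; the table contents and the attribute names C, D (dimension A) and E, F (dimension B) are invented for illustration and are not the ones in the figures.

```python
# Hypothetical star schema: one fact table and two dimension tables.
dim_a = {  # a_id -> (C, D)
    1: ("c1", "d1"),
    2: ("c1", "d2"),
    3: ("c2", "d1"),
}
dim_b = {  # b_id -> (E, F)
    1: ("e1", "f1"),
    2: ("e2", "f1"),
}
fact = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 1), (3, 2)]  # rows of (a_id, b_id)

def universal_table(fact, dim_a, dim_b):
    """Join every fact row with its two dimension rows (a star join)."""
    return [dim_a[a_id] + dim_b[b_id] for a_id, b_id in fact]

for row in universal_table(fact, dim_a, dim_b):
    print(row)
```

Note how (c1, d1) is copied into every row whose fact tuple references a_id = 1; this repetition is the storage redundancy discussed in A2Q2(c).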
A2Q2(b)
Figure: BUC, $\mathit{min\_sup} = 3$
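A compact sketch of BUC with COUNT as the aggregate and min_sup = 3, run on a hypothetical universal table with attributes C, D, E, F (the rows are invented; they are not the figure's data). Cells are represented as tuples of (column index, value) pairs, which is an illustrative choice.

```python
def buc(rows, n_dims, min_sup, cell=(), start=0, out=None):
    """Bottom-Up Computation of an iceberg cube with COUNT(*) >= min_sup.

    `rows` are the tuples of the current partition, which already satisfies
    the threshold; `cell` is its group-by cell as (column, value) pairs.
    """
    if out is None:
        out = {}
    out[cell] = len(rows)
    for d in range(start, n_dims):
        partitions = {}
        for r in rows:
            partitions.setdefault(r[d], []).append(r)
        for value in sorted(partitions):
            part = partitions[value]
            # COUNT is anti-monotone: a partition below min_sup cannot contain
            # any iceberg cell, so the whole subtree is pruned.
            if len(part) >= min_sup:
                buc(part, n_dims, min_sup, cell + ((d, value),), d + 1, out)
    return out

def iceberg_cube(rows, n_dims, min_sup):
    """Start BUC from the apex cell (all dimensions aggregated)."""
    return buc(rows, n_dims, min_sup) if len(rows) >= min_sup else {}

universal = [
    ("c1", "d1", "e1", "f1"), ("c1", "d1", "e1", "f1"), ("c1", "d1", "e2", "f1"),
    ("c1", "d2", "e1", "f1"), ("c2", "d1", "e1", "f1"), ("c2", "d1", "e2", "f1"),
]
names = "CDEF"
for cell, count in iceberg_cube(universal, 4, 3).items():
    print({names[d]: v for d, v in cell}, count)
```

On these rows, c1d1e1 (count 2) is pruned while c1d1f1 (count 3) survives; note that both cells are found by scanning the same c1d1 partition.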
A2Q2(c)
◮ Storage redundancy: the values of the non-key attributes in a dimension table are determined solely by the value of that table's primary key, yet the universal table repeats them in every row that references the key
◮ Computation redundancy: the same portion of the universal table is searched repeatedly (e.g., for c1d1e1 and c1d1f1)
A2Q2(c): Reduce Redundancy
◮ The support of a cell $c$, $\mathrm{sup}(c)$, is determined solely by searching the fact table
◮ Of a dimension table, only the primary key is needed when searching the fact table
◮ Idea: first find local iceberg cells on each dimension table, then index each local iceberg by the set of corresponding primary keys (its signature)
A2Q2(c): Algorithm
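◮ Propagate information from the fact table to the dimension tables: each dimension row receives a Count of the fact rows that reference its primary key
◮ Local icebergs: cells of a dimension table with sum(Count) ≥ 3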
Figure: A Star Schema
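One way to realize these two steps, sketched on a hypothetical dimension table A(a_id, C, D) and a fact table of (a_id, b_id) rows: count the fact rows per primary key, then enumerate cells over the dimension's own attributes, keeping those whose summed Count reaches the threshold together with their signature (the set of matching primary keys). The enumeration over attribute subsets is a simplification for small tables, not necessarily how the paper computes local icebergs, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical star schema: fact rows are (a_id, b_id); dimension A maps a_id -> attributes.
fact = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 1), (3, 2)]
dim_a = {
    1: {"C": "c1", "D": "d1"},
    2: {"C": "c1", "D": "d2"},
    3: {"C": "c2", "D": "d1"},
}

def propagate_counts(fact, key_pos):
    """Count the fact rows that reference each primary key of one dimension."""
    return Counter(row[key_pos] for row in fact)

def local_icebergs(dim, counts, min_sup):
    """Return {cell: (sum of Count, signature)} for cells with sum(Count) >= min_sup.

    A cell fixes values for a non-empty subset of the dimension's attributes;
    its signature is the frozenset of primary keys whose rows match the cell.
    """
    attrs = sorted(next(iter(dim.values())))
    result = {}
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            groups = {}
            for key, row in dim.items():
                cell = tuple((a, row[a]) for a in subset)
                groups.setdefault(cell, set()).add(key)
            for cell, keys in groups.items():
                total = sum(counts[key] for key in keys)
                if total >= min_sup:
                    result[cell] = (total, frozenset(keys))
    return result

counts_a = propagate_counts(fact, 0)  # Count per a_id: 1 -> 3, 2 -> 1, 3 -> 2
for cell, (total, sig) in local_icebergs(dim_a, counts_a, 3).items():
    print(cell, total, sorted(sig))
```

Only the dimension table and the per-key counts are scanned here; the fact table is read once to propagate the counts.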
A2Q2(c): Algorithm
◮ Equivalence class: a set of local icebergs with the same signature (e.g., e1, f1, and e1f1)
◮ Join the signatures from different dimension tables to obtain the global iceberg cells
“Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses”, SDM 2005
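A simplified sketch of the join step for two dimensions, A and B, on hypothetical data where the local icebergs (min_sup = 3) are given directly as {cell: (count, signature)}. Local icebergs are grouped into equivalence classes by signature, the fact table is scanned once per pair of classes, and the resulting count is shared by every combination of cells from the two classes. Only local icebergs need to be joined, because a combined cell's count cannot exceed the count of either part. This illustrates the idea under those assumptions; it is not the published Cross Table Cubing algorithm, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: fact rows (a_id, b_id) and local icebergs (min_sup = 3)
# of dimension A (attributes C, D) and dimension B (attributes E, F).
fact = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 1), (3, 2)]
local_a = {
    (("C", "c1"),): (4, frozenset({1, 2})),
    (("D", "d1"),): (5, frozenset({1, 3})),
    (("C", "c1"), ("D", "d1")): (3, frozenset({1})),
}
local_b = {
    (("E", "e1"),): (4, frozenset({1})),
    (("F", "f1"),): (6, frozenset({1, 2})),
    (("E", "e1"), ("F", "f1")): (4, frozenset({1})),  # same signature as e1
}

def group_by_signature(local):
    """Equivalence classes: local icebergs that share the same signature."""
    classes = defaultdict(list)
    for cell, (_count, sig) in local.items():
        classes[sig].append(cell)
    return classes

def join_signatures(fact, local_a, local_b, min_sup):
    """Global iceberg cells that combine one local iceberg from A and one from B.

    A combined cell's count is the number of fact rows (a_id, b_id) with a_id in
    the A-signature and b_id in the B-signature. It is computed once per pair of
    equivalence classes and shared by every pair of cells drawn from them.
    """
    classes_a = group_by_signature(local_a)
    classes_b = group_by_signature(local_b)
    result = {}
    for sig_a, cells_a in classes_a.items():
        for sig_b, cells_b in classes_b.items():
            count = sum(1 for a_id, b_id in fact if a_id in sig_a and b_id in sig_b)
            if count >= min_sup:
                for ca in cells_a:
                    for cb in cells_b:
                        result[ca + cb] = count
    return result

for cell, count in join_signatures(fact, local_a, local_b, 3).items():
    print(cell, count)
```

Here e1 and e1f1 form one equivalence class, so their joins with each class of A are counted only once. Single-dimension cells need no join: they are the local icebergs themselves.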
A3Q2
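Give a counterexample showing that k-means does not always find the optimal clustering w.r.t. $\sum_{o \in D} \mathrm{dist}(o, c_o)$, where $c_o$ is the center of the cluster containing $o$.
◮ Find an example that has two stable clusterings (fixed points of the k-means iterations) with different losses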
A3Q2
Figure: A Counter Example
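One concrete counterexample, which may differ from the one in the figure: four points at the corners of a 4 × 1 rectangle with k = 2. Lloyd's algorithm started from the left/right split stays there, and started from centers (2, 0) and (2, 1) it keeps the top/bottom split; both are fixed points, but their losses differ. The sketch assumes Euclidean distance for dist, and the function names are illustrative.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from the given initial centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center in place
        if new_centers == centers:  # stable: the assignment can no longer change
            break
        centers = new_centers
    return centers, labels

def loss(points, centers, labels):
    """Sum over points o of dist(o, c_o), with Euclidean distance."""
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))

points = [(0, 0), (0, 1), (4, 0), (4, 1)]
for init in ([(0, 0.5), (4, 0.5)], [(2, 0), (2, 1)]):
    centers, labels = kmeans(points, init)
    print(init, "->", centers, labels, loss(points, centers, labels))
```

The first run ends with loss 2.0 (every point is 0.5 from its center) and the second with loss 8.0 (every point is 2 from its center), even though neither run can make further progress.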
A3Q5
Show that $I \times J$ is a bicluster with coherent values iff for any $i_1, i_2 \in I$ and $j_1, j_2 \in J$, $e_{i_1 j_1} - e_{i_2 j_1} = e_{i_1 j_2} - e_{i_2 j_2}$.
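The condition is easy to test directly. A minimal sketch, assuming the matrix is a list of rows and $I$, $J$ are given as collections of row and column indices (the function name and example values are hypothetical): a matrix built as $c + \alpha_i + \beta_j$ passes, and perturbing one entry breaks the condition for any submatrix that contains it.

```python
from itertools import combinations

def has_coherent_values(e, rows, cols):
    """True iff e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2] for all i1, i2 in rows, j1, j2 in cols.

    Pairs with i1 == i2 or j1 == j2 hold trivially; swapping i1 and i2 negates
    both sides and swapping j1 and j2 exchanges them, so unordered pairs suffice.
    """
    return all(
        e[i1][j1] - e[i2][j1] == e[i1][j2] - e[i2][j2]
        for i1, i2 in combinations(rows, 2)
        for j1, j2 in combinations(cols, 2)
    )

# Example values (hypothetical): e_ij = c + alpha_i + beta_j.
c, alpha, beta = 1, [0, 2, 5], [1, 3, 4, 10]
coherent = [[c + a + b for b in beta] for a in alpha]
noisy = [row[:] for row in coherent]
noisy[1][2] += 1  # perturb a single entry

print(has_coherent_values(coherent, range(3), range(4)))  # True
print(has_coherent_values(noisy, range(3), range(4)))     # False
print(has_coherent_values(noisy, [0, 2], range(4)))       # True (row 1 excluded)
```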
A3Q5: Necessity
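◮ Bicluster with coherent values: for any $i \in I$ and any $j \in J$, $e_{ij} = c + \alpha_i + \beta_j$
◮ $e_{i_1 j_1} - e_{i_2 j_1} = (c + \alpha_{i_1} + \beta_{j_1}) - (c + \alpha_{i_2} + \beta_{j_1}) = \alpha_{i_1} - \alpha_{i_2}$
◮ $e_{i_1 j_2} - e_{i_2 j_2} = (c + \alpha_{i_1} + \beta_{j_2}) - (c + \alpha_{i_2} + \beta_{j_2}) = \alpha_{i_1} - \alpha_{i_2} = e_{i_1 j_1} - e_{i_2 j_1}$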
A3Q5: Sufficiency
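◮ Define the row, column, and overall means: $e_{iJ} = \frac{1}{|J|}\sum_{j \in J} e_{ij}$, $e_{Ij} = \frac{1}{|I|}\sum_{i \in I} e_{ij}$, $e_{IJ} = \frac{1}{|I||J|}\sum_{i \in I}\sum_{j \in J} e_{ij}$
◮ For any $i, i' \in I$ and $j, j' \in J$, the condition gives $e_{ij} - e_{i'j} = e_{ij'} - e_{i'j'}$; averaging over $j' \in J$ gives $e_{ij} - e_{i'j} = e_{iJ} - e_{i'J}$, and then averaging over $i' \in I$ gives $e_{ij} - e_{Ij} = e_{iJ} - e_{IJ}$
◮ Hence $e_{ij} - e_{iJ} - e_{Ij} + e_{IJ} = 0$
◮ So $e_{ij} = e_{iJ} + e_{Ij} - e_{IJ}$, which has the required form with $\alpha_i = e_{iJ}$, $\beta_j = e_{Ij}$, and $c = -e_{IJ}$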
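The sufficiency construction can be checked numerically. A sketch under the same list-of-rows assumption (function names and matrices are hypothetical), using exact fractions so the means introduce no rounding error: $\alpha_i$, $\beta_j$, and $c$ are read off as the row means, the column means, and the negated overall mean, and the reconstruction is exact only when the condition holds.

```python
from fractions import Fraction

def decompose(e):
    """Return (c, alpha, beta) with alpha_i = e_iJ, beta_j = e_Ij, c = -e_IJ."""
    n_rows, n_cols = len(e), len(e[0])
    alpha = [Fraction(sum(row), n_cols) for row in e]  # row means e_iJ
    beta = [Fraction(sum(e[i][j] for i in range(n_rows)), n_rows)
            for j in range(n_cols)]  # column means e_Ij
    overall = Fraction(sum(map(sum, e)), n_rows * n_cols)  # overall mean e_IJ
    return -overall, alpha, beta

def reconstructs(e):
    """Check e_ij == c + alpha_i + beta_j for every entry."""
    c, alpha, beta = decompose(e)
    return all(e[i][j] == c + alpha[i] + beta[j]
               for i in range(len(e)) for j in range(len(e[0])))

coherent = [[1, 4, 2], [3, 6, 4], [0, 3, 1]]  # rows differ by constants: condition holds
broken = [[1, 4, 2], [3, 6, 5], [0, 3, 1]]    # one entry changed: condition fails
print(decompose(coherent))
print(reconstructs(coherent), reconstructs(broken))  # True False
```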