BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS
Pauli Miettinen TML 2013 27 September 2013
Boolean decompositions are like normal decompositions, except that the input is a binary matrix or tensor
BOOLEAN ARITHMETIC
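The Boolean matrix product is the matrix product under Boolean arithmetic, where 1 + 1 = 1:
$(A \circ B)_{ij} = \bigvee_{r=1}^{R} a_{ir} b_{rj}$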
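A minimal sketch of the Boolean matrix product, assuming NumPy and small dense binary matrices. The function name `boolean_product` and the example matrices are illustrative and not taken from the slides.

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is the OR over r of (a_ir AND b_rj)."""
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    # The integer product counts the matching r; any positive count means the OR is 1.
    return (A.astype(int) @ B.astype(int)) > 0

# Hypothetical example factors.
A = np.array([[1, 1],
              [0, 1],
              [1, 0]])
B = np.array([[1, 1, 0],
              [0, 1, 1]])
P = boolean_product(A, B)
```

Under normal arithmetic, entry (0, 1) of the product would be 2; under Boolean arithmetic it stays 1.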
WHY BOOLEAN ARITHMETIC?
decompositions under normal arithmetic
components
component with 1 in this location
WHY BOOLEAN CONT’D
[Figure: example with elements 1, 2, 3 and A, B, C]
Boolean arithmetic can be interpreted as set operations
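One way to make the set-operation reading concrete: if every row of the right-hand factor is read as a set of column indices, then row i of the Boolean product is the union of the sets selected by row i of the left-hand factor. The sketch below checks this on small example matrices, which are assumptions and not the matrices on the slide.

```python
import numpy as np

# Hypothetical binary factors.
A = np.array([[1, 1], [0, 1], [1, 0]], dtype=bool)
B = np.array([[1, 1, 0], [0, 1, 1]], dtype=bool)

# Each row of B read as a set of column indices.
row_sets = [set(np.flatnonzero(row).tolist()) for row in B]

# Row i of the Boolean product is the union of the sets picked by row i of A.
unions = [set().union(*(row_sets[r] for r in np.flatnonzero(a_row))) for a_row in A]

# The same rows computed via the Boolean matrix product.
product = (A.astype(int) @ B.astype(int)) > 0
product_sets = [set(np.flatnonzero(row).tolist()) for row in product]
```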
EXAMPLE
[Figure: a binary matrix written as the Boolean product (○) of two binary factor matrices. Labels: A, B, C; Real analysis, Programming, E-mail, Internet, Contacts]
RESULTS ON BOOLEAN MATRIX FACTORIZATION
approximate
Alternating updates are hard!
SOME MORE RESULTS
using the MDL principle
(under certain assumptions)
SOME ALGORITHMS
in early 1980’s
correlation matrix, and greedily selects them
patterns (tiles) based on MDL-esque rule
in factorization (false positives)
SOME APPLICATIONS
detection
RANK-1 (BOOLEAN) TENSORS
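[Figure: a rank-1 matrix X as the outer product of vectors a and b]
$X = a \times b$
[Figure: a rank-1 3-way tensor X as the outer product of vectors a, b, and c]
$X = a \times_1 b \times_2 c$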
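A short sketch of rank-1 binary matrices and tensors as outer products, assuming NumPy broadcasting. The vectors are made-up examples. Entry (i, j, k) of the rank-1 tensor is 1 exactly when a_i, b_j, and c_k are all 1.

```python
import numpy as np

# Hypothetical binary vectors.
a = np.array([1, 0, 1], dtype=bool)
b = np.array([1, 1], dtype=bool)
c = np.array([0, 1, 1, 0], dtype=bool)

# Rank-1 matrix: x_ij = a_i AND b_j.
X2 = a[:, None] & b[None, :]

# Rank-1 3-way tensor: x_ijk = a_i AND b_j AND c_k.
X3 = a[:, None, None] & b[None, :, None] & c[None, None, :]
```

The non-zeros of a rank-1 binary tensor always form a combinatorial block: here 2 x 2 x 2 = 8 ones.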
THE BOOLEAN CP TENSOR DECOMPOSITION
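[Figure: X approximated by the Boolean sum (∨) of R rank-1 tensors built from the vector triples (a1, b1, c1), (a2, b2, c2), ..., (aR, bR, cR)]
$x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$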
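The element-wise formula above translates almost directly into NumPy. The sketch below reconstructs a tensor from Boolean CP factors. The function name `boolean_cp` and the factor matrices are illustrative assumptions.

```python
import numpy as np

def boolean_cp(A, B, C):
    """Reconstruct x_ijk = OR over r of (a_ir AND b_jr AND c_kr)."""
    A, B, C = (np.asarray(M, dtype=int) for M in (A, B, C))
    # Count how many rank-1 components cover each cell, then threshold.
    counts = np.einsum('ir,jr,kr->ijk', A, B, C)
    return counts > 0

# Hypothetical factors with R = 2; the two rank-1 blocks overlap at cell (0, 1, 0).
A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])
C = np.array([[1, 1], [0, 1]])
X = boolean_cp(A, B, C)
```

The overlapping cell is covered twice, yet it contributes a single 1 to the reconstruction. That is the difference from the normal CP decomposition.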
THE BOOLEAN CP TENSOR DECOMPOSITION
[Figure: X approximated by the factor matrices A, B, and C]
$x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
FREQUENT TRI-ITEMSET MINING
BOOLEAN TENSOR RANK
The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic.
[Figure: X equals the Boolean sum (∨) of R rank-1 tensors built from the vector triples (a1, b1, c1), (a2, b2, c2), ..., (aR, bR, cR)]
SOME RESULTS ON RANKS
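Normal tensor rank is NP-hard to compute
The normal rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}
Boolean tensor rank is NP-hard to compute
The Boolean rank of an n-by-m-by-k tensor can be more than min{n, m, k}, but no more than min{nm, nk, mk}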
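One way to see why min{nm, nk, mk} rank-1 tensors always suffice in the Boolean case: every non-empty mode-3 fiber x_ij: gives one rank-1 tensor (unit vector e_i, unit vector e_j, the fiber itself), and there are at most nm such fibers. The same argument works along the other two modes. The sketch below builds this exact decomposition. The function name and the random test tensor are illustrative assumptions.

```python
import numpy as np

def fiber_decomposition(X):
    """Exact Boolean CP decomposition using one rank-1 tensor per non-empty mode-3 fiber."""
    X = np.asarray(X, dtype=bool)
    n, m, k = X.shape
    a_cols, b_cols, c_cols = [], [], []
    for i in range(n):
        for j in range(m):
            if X[i, j].any():
                e_i = np.zeros(n, dtype=bool)
                e_i[i] = True
                e_j = np.zeros(m, dtype=bool)
                e_j[j] = True
                a_cols.append(e_i)
                b_cols.append(e_j)
                c_cols.append(X[i, j].copy())
    # Stack the collected vectors as columns; an all-zero tensor yields zero columns.
    A = np.array(a_cols, dtype=bool).reshape(-1, n).T
    B = np.array(b_cols, dtype=bool).reshape(-1, m).T
    C = np.array(c_cols, dtype=bool).reshape(-1, k).T
    return A, B, C

rng = np.random.default_rng(0)
X = rng.random((3, 4, 5)) < 0.3
A, B, C = fiber_decomposition(X)
recon = np.einsum('ir,jr,kr->ijk', A.astype(int), B.astype(int), C.astype(int)) > 0
```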
SPARSITY
rank-R CP-decomposition with factor matrices A1, A2, …, AN such that ∑i |Ai| ≤ N|X|
rank-R decomposition A ∘ B such that |A| + |B| ≤ 2|X|
decompositions
SIMPLE ALGORITHM
algorithm with Boolean algebra
NP-hard even to approximate
to multiple local minima
factorization to matricizations
$X_{(1)} = A \circ (C \odot B)^T$
$X_{(2)} = B \circ (C \odot A)^T$
$X_{(3)} = C \circ (B \odot A)^T$
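The matricization identities above can be checked numerically. The sketch below assumes the standard reading of the lost operator symbols: the product with the factor matrix is the Boolean matrix product, and the product inside the parentheses is the Khatri-Rao (column-wise Kronecker) product. The helper names, the random factors, and the unfolding order (earlier mode indices vary fastest along the columns) are choices made for this illustration.

```python
import numpy as np

def bool_prod(A, B):
    """Boolean matrix product."""
    return (A.astype(int) @ B.astype(int)) > 0

def khatri_rao(U, V):
    """Column-wise Kronecker product of binary matrices; row index is u * V.shape[0] + v."""
    R = U.shape[1]
    return (U[:, None, :] & V[None, :, :]).reshape(-1, R)

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 5, 2
A = rng.random((I, R)) < 0.5
B = rng.random((J, R)) < 0.5
C = rng.random((K, R)) < 0.5

# Boolean CP reconstruction of the full tensor.
X = np.einsum('ir,jr,kr->ijk', A.astype(int), B.astype(int), C.astype(int)) > 0

# Mode-n unfoldings with column order matching the Khatri-Rao row order.
X1 = X.transpose(0, 2, 1).reshape(I, K * J)
X2 = X.transpose(1, 2, 0).reshape(J, K * I)
X3 = X.transpose(2, 1, 0).reshape(K, J * I)

ok1 = np.array_equal(X1, bool_prod(A, khatri_rao(C, B).T))
ok2 = np.array_equal(X2, bool_prod(B, khatri_rao(C, A).T))
ok3 = np.array_equal(X3, bool_prod(C, khatri_rao(B, A).T))
```

With these identities, updating one factor matrix while the other two are fixed becomes a Boolean matrix factorization problem on the corresponding unfolding.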
THE BOOLEAN TUCKER TENSOR DECOMPOSITION
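[Figure: X approximated by a core tensor G and factor matrices A, B, and C]
$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$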
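A minimal sketch of the Boolean Tucker reconstruction, following the element-wise formula above. The function name `boolean_tucker` and the random factors are assumptions. The example also checks a well-known special case: with a superdiagonal core, the Tucker reconstruction coincides with the Boolean CP reconstruction.

```python
import numpy as np

def boolean_tucker(G, A, B, C):
    """Reconstruct x_ijk = OR over p, q, r of (g_pqr AND a_ip AND b_jq AND c_kr)."""
    G, A, B, C = (np.asarray(M, dtype=int) for M in (G, A, B, C))
    # Count the (p, q, r) triples covering each cell, then threshold.
    counts = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
    return counts > 0

rng = np.random.default_rng(1)
A = rng.random((4, 2)) < 0.5
B = rng.random((3, 2)) < 0.5
C = rng.random((5, 2)) < 0.5

# Superdiagonal 2-by-2-by-2 core: only g_111 and g_222 are set.
G = np.zeros((2, 2, 2), dtype=bool)
G[0, 0, 0] = G[1, 1, 1] = True

X_tucker = boolean_tucker(G, A, B, C)
X_cp = np.einsum('ir,jr,kr->ijk', A.astype(int), B.astype(int), C.astype(int)) > 0
```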
THE SIMPLE ALGORITHM WITH TUCKER
element
make no difference
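[Figure: X approximated by a core tensor G and factor matrices A, B, and C]
$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$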
WALK’N’MERGE: MORE SCALABLE ALGORITHM
be any big tiles
subtensors
TENSORS AS GRAPHS
coordinate
they are at most N steps from each other
EXAMPLE
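[Figure: graph whose nodes are the 13 non-zero entries of the example tensor]
Non-zeros: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (2,1,1), (2,1,2), (2,2,1), (2,2,2), (2,3,2), (1,4,1), (1,4,2), (3,1,2), (3,3,1)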
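A sketch of the graph view on the example coordinates, under an assumption that the surviving slide text does not spell out: nodes are the non-zero entries, and two nodes are adjacent when they differ in exactly one coordinate. With that reading, entries inside an all-ones subtensor of a 3-way tensor are at most 3 steps apart, which the code checks for the 2-by-2-by-2 block in the example.

```python
from collections import deque
from itertools import combinations

# Non-zero coordinates from the example.
nonzeros = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2),
            (2, 2, 1), (2, 2, 2), (2, 3, 2), (1, 4, 1), (1, 4, 2), (3, 1, 2),
            (3, 3, 1)]

def differs_in_one(u, v):
    """Assumed adjacency rule: exactly one coordinate differs."""
    return sum(x != y for x, y in zip(u, v)) == 1

adj = {u: set() for u in nonzeros}
for u, v in combinations(nonzeros, 2):
    if differs_in_one(u, v):
        adj[u].add(v)
        adj[v].add(u)

def bfs_distances(source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# The all-ones 2-by-2-by-2 block inside the example.
block = [(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)]
max_block_distance = max(bfs_distances(u)[v] for u in block for v in block)
```

In this example the entry (3,3,1) shares two coordinates with no other non-zero, so it is an isolated node.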
RANDOM WALKS
walks
they’re on a small-diameter subgraph
dense areas and miss the smallest rank-1 decompositions
MERGE
sub-tensors outside the already-found dense subtensors
if the resulting subtensor is dense enough
the exhaustive search
MDL STRIKES AGAIN
a good rank?
HOW DO YOU COUNT THE BITS?
, we represent the tensor with
the description length X E
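To make the bit counting tangible, here is a rough sketch of a two-part description length: the bits for the factor matrices plus the bits for the error tensor E, taken as the XOR of X and its Boolean reconstruction. The encoding used here (the number of ones, followed by an index into all placements of that many ones) is a naive choice assumed for illustration and is not necessarily the encoding used in the talk. All function names are hypothetical.

```python
import math
import numpy as np

def bits_for_binary_array(M):
    """Naive code length: the number of ones, then which cells hold them."""
    M = np.asarray(M, dtype=bool)
    n, ones = M.size, int(M.sum())
    return math.log2(n + 1) + math.log2(math.comb(n, ones))

def description_length(X, A, B, C):
    """Bits for the CP factors plus bits for the error tensor E = X XOR reconstruction."""
    X = np.asarray(X, dtype=bool)
    Ai, Bi, Ci = (np.asarray(M, dtype=int) for M in (A, B, C))
    recon = np.einsum('ir,jr,kr->ijk', Ai, Bi, Ci) > 0
    E = X ^ recon
    return sum(bits_for_binary_array(M) for M in (A, B, C)) + bits_for_binary_array(E)

# Hypothetical data: a 6-by-6-by-6 tensor holding one 3-by-3-by-3 block of ones.
a = np.zeros((6, 1), dtype=bool)
a[:3] = True
X = a[:, None, None, 0] & a[None, :, None, 0] & a[None, None, :, 0]

dl_rank1 = description_length(X, a, a, a)
empty = np.zeros((6, 0), dtype=bool)
dl_rank0 = description_length(X, empty, empty, empty)
```

With the rank-1 factors the error tensor is empty and the total is far smaller than encoding X as pure error. A comparison of this kind is what lets a description-length criterion choose a rank.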
WHY MDL AND TUCKER DECOMPOSITION
multiple times
FROM CP TO TUCKER WITH MDL
adjust the core accordingly, our encoding is more efficient
the encoding length
APPLICATION: FACT DISCOVERY
the observed data and mappings from surface forms to entities and relations
CONNECTION TO BOOLEAN TENSORS
surface forms
$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr}$
PROS & CONS
EXAMPLE RESULT
Subject: claude de lorimier, de lorimier, louis, jean-baptiste
Relation: was born, [[det]] born in
Object: borough of lachine, villa st. pierre, lachine quebec
39,500-by-8,000-by-21,000 tensor with 804,000 non-zeros
CONCLUSIONS
most normal factorizations
FUTURE DIRECTIONS
Thank You