SLIDE 1

BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS

Pauli Miettinen, TML 2013, 27 September 2013

SLIDE 2

BOOLEAN MATRIX AND TENSOR DECOMPOSITIONS

  • Boolean decompositions are like “normal” decompositions, except that
  • Input is a binary matrix or tensor
  • Factors are binary
  • Arithmetic is Boolean (so reconstructions are binary)
  • Error measure is (usually) Hamming distance (L1)
SLIDE 3

BOOLEAN ARITHMETIC

  • Idempotent, anti-negative semiring ({0,1}, ∨, ∧)
  • Like normal arithmetic, but addition is defined as 1 + 1 = 1
  • A Boolean matrix is a binary (0/1) matrix endowed with Boolean arithmetic

  • The Boolean matrix product is defined as

$(A \circ B)_{ij} = \bigvee_{r=1}^{R} a_{ir} b_{rj}$
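As a concrete sketch (my own, not from the slides), the Boolean product and the Hamming error can be computed in a few lines of NumPy; an ordinary integer product followed by thresholding at 1 reproduces the OR-of-ANDs:

```python
import numpy as np

def boolean_product(A, B):
    """Boolean matrix product of binary matrices A (n x R) and B (R x m).

    (A o B)_ij = 1 iff a_ir = b_rj = 1 for some r; the integer product
    followed by thresholding computes exactly this OR-of-ANDs.
    """
    return (A.astype(int) @ B.astype(int) > 0).astype(int)

def hamming_error(X, A, B):
    """Hamming (L1) distance between the data and its Boolean reconstruction."""
    return int(np.sum(X != boolean_product(A, B)))
```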

SLIDE 4

WHY BOOLEAN ARITHMETIC?

  • Boolean decompositions find a different type of structure than decompositions under normal arithmetic
  • Not better, not worse, just different
  • Normal decomposition: value is a sum of values from rank-1 components
  • Boolean decomposition: value is 1 if there is any rank-1 component with a 1 in this location
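A minimal numeric illustration (mine, not from the slides) of the difference: under normal arithmetic overlapping rank-1 components add up, while under Boolean arithmetic the overlap saturates at 1:

```python
import numpy as np

# Two rank-1 binary components that overlap in the middle cell
u1, v1 = np.array([1, 1, 0]), np.array([1, 1, 0])
u2, v2 = np.array([0, 1, 1]), np.array([0, 1, 1])

normal = np.outer(u1, v1) + np.outer(u2, v2)   # overlap cell sums to 2
boolean = np.outer(u1, v1) | np.outer(u2, v2)  # overlap cell stays 1
```

The Boolean reconstruction remains a binary matrix, whereas the normal sum does not.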

SLIDE 5

WHY BOOLEAN CONT’D

[Figure: a Boolean matrix product illustrated with overlapping labelled blocks 1, 2, 3 and A, B, C]

Boolean arithmetic can be interpreted as set operations.

SLIDE 6

EXAMPLE

[Figure: an example binary matrix written as the Boolean product A ∘ B of two binary factor matrices; the rows and columns carry the labels Real analysis, Discr. math., Programming, E-mail, Internet, Contacts]

SLIDE 7

RESULTS ON BOOLEAN MATRIX FACTORIZATION

  • Computing the Boolean rank is NP-hard
  • As hard to approximate as the minimum chromatic number
  • Minimum-error decomposition is NP-hard
  • And hard to approximate in both the additive and multiplicative sense
  • Given A and B, finding C such that B ∘ C is close to A is hard even to approximate

Alternating updates are hard!

SLIDE 8

SOME MORE RESULTS

  • The Boolean rank can be logarithmic in the real rank
  • Sparse matrices have sparse (exact) factorizations
  • The rank of the decomposition can be chosen automatically using the MDL principle
  • A planted rank-1 matrix can be recovered under XOR noise (under certain assumptions)

SLIDE 9

SOME ALGORITHMS

  • Alternating least-squares
  • Proposed in the psychometric literature in the early 1980s
  • Asso [M. et al. 2006 & 2008]
  • Builds candidate factors based on a correlation matrix, and greedily selects among them
  • Panda [Lucchese et al. 2010]
  • Expands monochromatic core patterns (tiles) based on an MDL-esque rule
  • Various tiling algorithms
  • Do not allow expressing a 0 in the data as a 1 in the factorization (no false positives)
  • Binary factorizations
  • Normal algebra but binary factors
SLIDE 10

SOME APPLICATIONS

  • Explorative data analysis
  • Psychometrics
  • Role mining
  • Pattern mining
  • Co-clustering-style applications
  • Bipartite community detection
  • Binary matrix completion
  • But requires {0, 1, ?} data
SLIDE 11

RANK-1 (BOOLEAN) TENSORS

A rank-1 matrix is an outer product of two vectors:

$X = a \otimes b$ (i.e. $x_{ij} = a_i b_j$)

A rank-1 3-way tensor is an outer product of three vectors:

$X = a \otimes b \otimes c$ (i.e. $x_{ijk} = a_i b_j c_k$)

SLIDE 12

THE BOOLEAN CP TENSOR DECOMPOSITION

[Figure: X ≈ (a₁ ⊗ b₁ ⊗ c₁) ∨ (a₂ ⊗ b₂ ⊗ c₂) ∨ · · · ∨ (a_R ⊗ b_R ⊗ c_R)]

$x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$
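The formula above can be evaluated directly; a small sketch (my own, assuming NumPy) that ORs together the R rank-1 outer products:

```python
import numpy as np

def boolean_cp(A, B, C):
    """Boolean CP reconstruction: x_ijk = OR over r of a_ir * b_jr * c_kr."""
    X = np.zeros((A.shape[0], B.shape[0], C.shape[0]), dtype=int)
    for r in range(A.shape[1]):
        # outer product of the r-th columns, OR-ed into the reconstruction
        X |= np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])
    return X
```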

SLIDE 13

THE BOOLEAN CP TENSOR DECOMPOSITION

[Figure: X ≈ the Boolean CP decomposition written with factor matrices A, B, C]

$x_{ijk} \approx \bigvee_{r=1}^{R} a_{ir} b_{jr} c_{kr}$

SLIDE 14

FREQUENT TRI-ITEMSET MINING

  • Rank-1 N-way binary tensors define an N-way itemset
  • In particular, rank-1 binary matrices define an itemset
  • In itemset mining the induced sub-tensor must be full of 1s
  • Here, the items can have holes
  • Boolean CP decomposition = lossy N-way tiling
SLIDE 15

BOOLEAN TENSOR RANK

The Boolean rank of a binary tensor is the minimum number of binary rank-1 tensors needed to represent the tensor exactly using Boolean arithmetic.

[Figure: X = (a₁ ⊗ b₁ ⊗ c₁) ∨ (a₂ ⊗ b₂ ⊗ c₂) ∨ · · · ∨ (a_R ⊗ b_R ⊗ c_R)]

SLIDE 16

SOME RESULTS ON RANKS

Normal tensor rank:

  • NP-hard to compute
  • The rank of an n-by-m-by-k tensor can be more than min{n, m, k}
  • But no more than min{nm, nk, mk}

Boolean tensor rank:

  • NP-hard to compute
  • The rank of an n-by-m-by-k tensor can be more than min{n, m, k}
  • But no more than min{nm, nk, mk}

SLIDE 17

SPARSITY

  • A binary N-way tensor X of Boolean tensor rank R has a Boolean rank-R CP decomposition with factor matrices A₁, A₂, …, A_N such that $\sum_i |A_i| \le N|X|$
  • A binary matrix X of Boolean rank R with |X| 1s has a Boolean rank-R decomposition A ∘ B such that $|A| + |B| \le 2|X|$
  • Both results are existential only, and extend to approximate decompositions

SLIDE 18

SIMPLE ALGORITHM

  • We can use the typical alternating algorithm with Boolean algebra
  • Finding the optimal projection is NP-hard even to approximate
  • Good initial values are needed due to multiple local minima
  • Obtained by applying Boolean matrix factorization to the matricizations

$X_{(1)} = A (C \odot B)^T \qquad X_{(2)} = B (C \odot A)^T \qquad X_{(3)} = C (B \odot A)^T$
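The initialization step can be sketched as follows (my own code, assuming NumPy and a Kolda-style column-major unfolding convention): the mode-1 matricization of X equals the Boolean product of A with the transposed Khatri–Rao product C ⊙ B.

```python
import numpy as np

def bool_prod(A, B):
    """Boolean matrix product via integer product plus thresholding."""
    return (A @ B > 0).astype(int)

def khatri_rao(C, B):
    """Columnwise Kronecker (Khatri-Rao) product, shape (K*J, R)."""
    return np.stack([np.kron(C[:, r], B[:, r]) for r in range(B.shape[1])],
                    axis=1)

def unfold(X, mode):
    """Mode-n matricization with Fortran (column-major) column ordering."""
    return np.moveaxis(X, mode, 0).reshape((X.shape[mode], -1), order='F')
```

With these conventions, `unfold(X, 0)` equals `bool_prod(A, khatri_rao(C, B).T)` whenever X is the Boolean CP reconstruction of (A, B, C), so Boolean matrix factorization of the unfolding yields initial factors.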

SLIDE 19

THE BOOLEAN TUCKER TENSOR DECOMPOSITION

[Figure: X ≈ the Boolean Tucker decomposition with core tensor G and factor matrices A, B, C]

$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr}\, a_{ip} b_{jq} c_{kr}$
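A direct evaluation of this formula (my own sketch, assuming NumPy):

```python
import numpy as np

def boolean_tucker(G, A, B, C):
    """Boolean Tucker reconstruction:
    x_ijk = OR over p,q,r of g_pqr * a_ip * b_jq * c_kr."""
    # sum over the core with normal arithmetic, then threshold at 1
    return (np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C) > 0).astype(int)
```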

SLIDE 20

THE SIMPLE ALGORITHM WITH TUCKER

  • The core tensor has global effects
  • Updates are hard
  • Factors are not orthogonal
  • Assume the core tensor is small
  • We can afford more time per element
  • In the Boolean case many changes make no difference

[Figure: X ≈ the Boolean Tucker decomposition with core tensor G and factor matrices A, B, C]

$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr}\, a_{ip} b_{jq} c_{kr}$

SLIDE 21

WALK’N’MERGE: MORE SCALABLE ALGORITHM

  • Idea: For exact decompositions, we could find all N-way tiles
  • Then we “only” need to select the ones we need among them
  • Problem: For approximate decompositions, there might not be any big tiles
  • We need to find tiles with holes, i.e. dense rank-1 subtensors

SLIDE 22

TENSORS AS GRAPHS

  • Create a graph from the tensor
  • Each 1 in the tensor: one vertex in the graph
  • Edge between two vertices if they differ in at most one coordinate
  • Idea: If two vertices are in the same all-1s rank-1 N-way subtensor, they are at most N steps from each other
  • Small-diameter subgraphs ⇔ dense rank-1 subtensors
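The construction can be sketched in a few lines (my own code, not the Walk’n’Merge implementation itself):

```python
import numpy as np
from itertools import combinations

def tensor_graph(X):
    """Vertices are the coordinates of the 1s in X; two vertices are
    adjacent when they differ in exactly one coordinate."""
    verts = [tuple(v) for v in np.argwhere(X)]
    edges = [(u, v) for u, v in combinations(verts, 2)
             if sum(a != b for a, b in zip(u, v)) == 1]
    return verts, edges
```

For an all-1s 2-by-2-by-2 subtensor this graph is the 3-cube: 8 vertices, each with 3 neighbours, so any two of its vertices are at most 3 steps apart.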
SLIDE 23

EXAMPLE

  1 1 1 1 1 1     1 1 1 1 1 1 1  

1,1,1 1,1,2 1,2,1 1,2,2 2,1,1 2,1,2 2,2,1 2,2,2 2,3,2 1,4,1 1,4,2 3,1,2 3,3,1

SLIDE 24

RANDOM WALKS

  • We can identify the small-diameter subgraphs by random walks
  • If many (short) random walks re-visit the same nodes often, they’re on a small-diameter subgraph
  • Problem: The random walks might return many overlapping dense areas and miss the smallest rank-1 decompositions

SLIDE 25

MERGE

  • We can exhaustively look for all small (e.g. 2-by-2-by-2) all-1s sub-tensors outside the already-found dense subtensors
  • We can now merge partially overlapping rank-1 subtensors if the resulting subtensor is dense enough
  • Result: a Boolean CP decomposition of some rank
  • The false positive rate is controlled by the density threshold, the false negative rate by the exhaustive search

SLIDE 26

MDL STRIKES AGAIN

  • We have a decomposition of some rank, but what would be a good rank?
  • Normally: pre-defined by the user (but how does she know?)
  • MDL principle: the best model to describe your data is the one that does it with the least number of bits
  • We can use MDL to choose the rank
SLIDE 27

HOW DO YOU COUNT THE BITS?

  • MDL asks for an exact representation of the data
  • In the case of Boolean CP, we represent the tensor X with
  • Factor matrices
  • Error tensor E
  • The bit-strings representing these are encoded to compute the description length
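One crude way to count the bits (my own sketch; the actual encodings used in the literature are more refined) is an entropy-based code for each binary object, here shown for the matrix case:

```python
import numpy as np

def bits(M):
    """Entropy-style code length n*H(p) for a binary array with 1s-density p."""
    p = float(M.mean())
    if p in (0.0, 1.0):
        return 0.0
    return -M.size * (p * np.log2(p) + (1 - p) * np.log2(1 - p))

def description_length(X, A, B):
    """Total cost of factors plus error matrix; X = reconstruction XOR E,
    so the pair (factors, E) represents X exactly, as MDL requires."""
    recon = (A.astype(int) @ B.astype(int) > 0).astype(int)
    E = (X != recon).astype(int)
    return bits(A) + bits(B) + bits(E)
```

When the factorization is exact, the error term contributes nothing and the cost is just that of the factors; as the fit degrades, bits shift from the factors to the error matrix.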

SLIDE 28

WHY MDL AND TUCKER DECOMPOSITION

  • Balance between accuracy and complexity
  • High rank: more bits in the factor matrices, fewer in the error tensor
  • Low rank: fewer bits in the factor matrices, more in the error tensor
  • If one mode uses the same factor multiple times, CP contains it multiple times
  • The Tucker decomposition needs to have that factor only once
SLIDE 29

FROM CP TO TUCKER WITH MDL

  • CP is Tucker with a hyper-diagonal core tensor
  • If we can remove a repeated column from a factor matrix and adjust the core accordingly, our encoding becomes more efficient
  • Algorithm: try merging similar factors and check whether that reduces the encoding length

SLIDE 30

APPLICATION: FACT DISCOVERY

  • Input: noun phrase–verbal phrase–noun phrase triples
  • Non-disambiguated
  • E.g. from OpenIE
  • Goal: find the facts (entity–relation–entity triples) underlying the observed data, and the mappings from surface forms to entities and relations
SLIDE 31

CONNECTION TO BOOLEAN TENSORS

  • We should see an np1–vp–np2 triple if
  • there exists at least one fact e1–r–e2 such that
  • np1 is the surface form of e1
  • vp is the surface form of r
  • np2 is the surface form of e2
SLIDE 32

CONNECTION TO BOOLEAN TENSORS

  • What we want is a Boolean Tucker3 decomposition
  • The core tensor contains the facts
  • The factors contain the mappings from entities and relations to surface forms

$x_{ijk} \approx \bigvee_{p=1}^{P} \bigvee_{q=1}^{Q} \bigvee_{r=1}^{R} g_{pqr}\, a_{ip} b_{jq} c_{kr}$

SLIDE 33

PROS & CONS

  • Pros: naturally sparse core tensor
  • The core will be huge ⇒ it must be sparse
  • Natural interpretation
  • Cons: no levels of certainty
  • A fact either is or is not there
  • Can only handle binary data
SLIDE 34

EXAMPLE RESULT

Subject: claude de lorimier, de lorimier, louis, jean-baptiste
Relation: was born, [[det]] born in
Object: borough of lachine, villa st. pierre, lachine quebec

A 39,500-by-8,000-by-21,000 tensor with 804,000 non-zeros

SLIDE 35

CONCLUSIONS

  • Boolean factorizations are more combinatorial in flavour
  • Interpretations as sets or graphs
  • Boolean matrix factorization is computationally harder than most normal factorizations
  • With tensors the difference is not as big
SLIDE 36

FUTURE DIRECTIONS

  • When should one apply Boolean factorizations?
  • More education is needed
  • Better algorithms & implementations
  • I’ve been asking for this for 7 years now…

Thank You!