Vegan fleas, movie ratings, and the EM algorithm (Carlos Cotrini)

SLIDE 1

Vegan fleas, movie ratings, and the EM algorithm

Carlos Cotrini

Department of Computer Science, ETH Zürich, ccarlos@inf.ethz.ch

March 25, 2019

Carlos Cotrini (ETH Zürich) The EM algorithm March 25, 2019 1 / 36

SLIDE 2

Overview

1. The vegan-flea optimization problem
2. Building a movie recommendation system
3. The EM algorithm

SLIDE 3

The vegan-flea optimization problem

SLIDE 4

A two-dimensional dog

SLIDE 5

The dog’s cardiovascular system

SLIDE 6

The dog’s cardiovascular system

SLIDE 7

The flea, the dog’s skin, and the vessel’s upper border

SLIDE 8

Animation

SLIDE 9

Formalization

SLIDE 10

Assumptions

We assume the following:

1. For any x ∈ [0, 1] and any two time points t1, t2 ∈ [0, ∞), skin(x, t1) − vessel(x, t1) = skin(x, t2) − vessel(x, t2).

2. For any x ∈ [0, 1] and any t ∈ [0, ∞), there is t′ ≥ t such that vessel(x, t′) is a maximum of vessel(·, t′).

3. For any t ∈ [0, ∞), the flea can efficiently compute a point x∗ that maximizes skin(·, t).

4. For any x ∈ [0, 1] and any t ∈ [0, ∞), the flea can efficiently compute t̂ ≥ t such that vessel(x, t̂) is a maximum of vessel(·, t̂).

SLIDE 11

Objective

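Can the flea compute x∗ such that d(x∗) ≥ d(x0), where x0 is the flea’s current position and d(x) = skin(x, t) − vessel(x, t) is the distance between the skin and the vessel at x (the same for every t)?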

SLIDE 12

Optimization algorithm

SLIDE 13

Optimization algorithm

SLIDE 14

Optimization algorithm
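The algorithm on these slides was shown graphically and its text is not available here. One reading that is consistent with the stated assumptions is "wait until the vessel peaks under the flea, then jump to the highest point of the skin". The sketch below simulates that reading on a toy dog. Everything in it is an assumption made for illustration: the functions `d`, `peak_position`, `vessel`, `skin`, `next_peak_time`, and `flea_walk`, the grid of candidate positions, and the shapes of the curves.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 1001)  # candidate positions x in [0, 1]

def d(x):
    """Hypothetical skin-to-vessel distance; it does not depend on time."""
    return 1.0 + 0.5 * np.sin(3.0 * x) + 0.3 * np.cos(11.0 * x)

def peak_position(t):
    """Hypothetical position of the vessel's peak: sweeps 1 -> 0 -> 1 with period 2."""
    return abs((t % 2.0) - 1.0)

def vessel(x, t):
    """Toy vessel border: a bump whose maximum over x sits at peak_position(t)."""
    return -(x - peak_position(t)) ** 2

def skin(x, t):
    """Skin = vessel + time-independent distance (first assumption)."""
    return vessel(x, t) + d(x)

def next_peak_time(x, t):
    """Smallest t_hat >= t at which vessel(., t_hat) peaks at x (fourth assumption)."""
    base = 2.0 * np.floor(t / 2.0)
    candidates = [base + 1.0 - x, base + 1.0 + x, base + 3.0 - x]
    return min(c for c in candidates if c >= t)

def flea_walk(x0, t0=0.0, steps=20):
    """Alternate between waiting and jumping; return the visited positions."""
    x, t, path = x0, t0, [x0]
    for _ in range(steps):
        t = next_peak_time(x, t)             # wait until the vessel peaks under the flea
        x = GRID[np.argmax(skin(GRID, t))]   # jump to the highest point of the skin
        path.append(x)
    return path

path = flea_walk(GRID[100])
distances = [d(x) for x in path]
```

In this toy setting the values in `distances` never decrease from one step to the next, although nothing guarantees that the flea ends at the global maximum of `d`.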

SLIDE 15

Why does this work?
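A short argument that uses only the stated assumptions is sketched below. It is a reconstruction, not a transcription of the slide. Here d(x) denotes the time-independent difference skin(x, t) − vessel(x, t), x_0 is the flea's current position, t̂ is a time at which vessel(·, t̂) is maximal at x_0, and x^* is a maximizer of skin(·, t̂).

```latex
\begin{align*}
d(x^*) &= \mathrm{skin}(x^*, \hat t) - \mathrm{vessel}(x^*, \hat t) \\
       &\ge \mathrm{skin}(x_0, \hat t) - \mathrm{vessel}(x^*, \hat t)
          && \text{($x^*$ maximizes $\mathrm{skin}(\cdot, \hat t)$)} \\
       &\ge \mathrm{skin}(x_0, \hat t) - \mathrm{vessel}(x_0, \hat t)
          && \text{($x_0$ maximizes $\mathrm{vessel}(\cdot, \hat t)$)} \\
       &= d(x_0).
\end{align*}
```

So a jump taken at time t̂ can only keep or increase the distance.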

SLIDE 16

A movie recommendation system

SLIDE 17

A simple dataset of movie ratings

SLIDE 18

A simple dataset of movie ratings

SLIDE 19

A simple dataset of movie ratings

SLIDE 20

A simple dataset of movie ratings

SLIDE 21

A probability model for movie ratings

SLIDE 22

A probability model for movie ratings

SLIDE 23

A probability model for movie ratings

SLIDE 24

A probability model for movie ratings

SLIDE 25

A probability model for movie ratings

SLIDE 26

A probability model for movie ratings

SLIDE 27

A probability model for movie ratings

SLIDE 28

Notation

$X = (x_{i,j})_{i \le N,\, j \le D}$. Here, $x_{i,j} \in \{0, 1\}$ indicates whether person $i$ liked movie $j$ or not.

$\bar\mu = (\mu_{k,j})_{k \le K,\, j \le D}$. Here, $\mu_{k,j} \in [0, 1]$ denotes the probability that someone in category $k$ likes movie $j$.

$\bar\nu = (\nu_k)_{k \le K}$. Here, $\nu_k \in [0, 1]$ denotes the probability that a person belongs to category $k$.

$\bar z = (z(i))_{i \le N}$. Here, $z(i) \in \{1, \dots, K\}$ indicates person $i$’s category.
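To make the notation concrete, here is a minimal sketch that draws a small dataset from this model: each person first gets a category according to ν̄, then likes each movie independently with the probability given by the matching row of µ̄. The function name `sample_ratings` and the parameter values are assumptions chosen for illustration, and categories are indexed from 0 in the code.

```python
import numpy as np

def sample_ratings(nu, mu, n_people, rng):
    """Draw (X, z): z[i] ~ Categorical(nu), X[i, j] ~ Bernoulli(mu[z[i], j])."""
    K, D = mu.shape
    z = rng.choice(K, size=n_people, p=nu)                # one category per person
    X = (rng.random((n_people, D)) < mu[z]).astype(int)   # 1 = liked, 0 = not liked
    return X, z

rng = np.random.default_rng(0)
nu = np.array([0.6, 0.4])                 # K = 2 categories
mu = np.array([[0.9, 0.8, 0.1],           # category 0 mostly likes movies 0 and 1
               [0.2, 0.1, 0.9]])          # category 1 mostly likes movie 2
X, z = sample_ratings(nu, mu, n_people=5, rng=rng)
```

In the recommendation setting only `X` is observed; `z` is hidden, which is what makes the estimation problem hard.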

SLIDE 29

How to mine a probability model from X?

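Maximum-likelihood approach: Solve the following problem.

\[
\arg\max_{\bar\mu, \bar\nu} \; \log p(X \mid \bar\mu, \bar\nu)
\quad \text{s.t.} \quad \sum_{k \le K} \nu_k = 1.
\]

The objective $\log p(X \mid \bar\mu, \bar\nu)$ is the incomplete-data log likelihood. The complete-data log likelihood is

\[
\log p(X, \bar z \mid \bar\mu, \bar\nu).
\]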

SLIDE 30

How to mine a probability model from X?

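Maximum-likelihood approach: Solve the following problem.

\[
\arg\max_{\bar\mu, \bar\nu} \; \sum_{i \le N} \log \left( \sum_{z(i)} \nu_{z(i)} \prod_{j \le D} \mu_{z(i),j}^{x_{i,j}} \left(1 - \mu_{z(i),j}\right)^{1 - x_{i,j}} \right)
\quad \text{s.t.} \quad \sum_{k \le K} \nu_k = 1.
\]

The objective above is the incomplete-data log likelihood. The complete-data log likelihood is

\[
\sum_{i \le N} \left( \log \nu_{z(i)} + \sum_{j \le D} \left[ x_{i,j} \log \mu_{z(i),j} + (1 - x_{i,j}) \log\left(1 - \mu_{z(i),j}\right) \right] \right).
\]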
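Both log likelihoods can be evaluated directly for given parameters. The sketch below does this with NumPy; the function names are assumptions made for illustration, categories are indexed from 0, and every entry of µ̄ is assumed to lie strictly between 0 and 1 so that the logarithms are finite.

```python
import numpy as np

def log_joint_terms(X, mu, nu):
    """(N, K) array with entry (i, k) = log nu_k + sum_j log p(x[i, j] | mu[k, j])."""
    return np.log(nu) + X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T

def complete_log_likelihood(X, z, mu, nu):
    """log p(X, z | mu, nu): pick the term of each person's known category."""
    terms = log_joint_terms(X, mu, nu)
    return float(terms[np.arange(len(z)), z].sum())

def incomplete_log_likelihood(X, mu, nu):
    """log p(X | mu, nu): sum over categories inside the log (log-sum-exp)."""
    terms = log_joint_terms(X, mu, nu)
    m = terms.max(axis=1)
    return float((m + np.log(np.exp(terms - m[:, None]).sum(axis=1))).sum())

X = np.array([[1, 1, 0], [0, 0, 1], [1, 0, 0]])
z = np.array([0, 1, 0])
mu = np.array([[0.9, 0.8, 0.1], [0.2, 0.1, 0.9]])
nu = np.array([0.6, 0.4])

ll_complete = complete_log_likelihood(X, z, mu, nu)
ll_incomplete = incomplete_log_likelihood(X, mu, nu)
```

The complete-data version is a plain sum of logarithms, while the incomplete-data version has a sum inside the logarithm, which is what makes it awkward to maximize directly.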

SLIDE 31

The dilemma

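We are caught between a problem that we want to solve but do not know how to solve (maximizing the incomplete-data log likelihood) and a problem that we know how to solve but do not want to solve (maximizing the complete-data log likelihood). Let’s try to connect them.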

SLIDE 34

Connecting incomplete-data and complete-data log likelihoods

Let $\theta = (\bar\mu, \bar\nu)$. How can we connect $\log p(X \mid \theta)$ and $\log p(X, \bar z \mid \theta)$? We can start with the following:

\[
p(\bar z \mid X, \theta) = \frac{p(X, \bar z \mid \theta)}{p(X \mid \theta)}.
\]

From here, we can derive that

\[
\log p(X \mid \theta) = \log p(X, \bar z \mid \theta) - \log p(\bar z \mid X, \theta).
\]

But we don’t know the value of $\bar z$.

SLIDE 35

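Take expectations on both sides with respect to $\bar z$, using some pdf $\tilde p(\bar z)$ for $\bar z$.

\[
\int \tilde p(\bar z) \log p(X \mid \theta) \, d\bar z
= \int \tilde p(\bar z) \log p(X, \bar z \mid \theta) \, d\bar z
- \int \tilde p(\bar z) \log p(\bar z \mid X, \theta) \, d\bar z.
\]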

SLIDE 36

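Since $\log p(X \mid \theta)$ does not depend on $\bar z$ and $\tilde p$ integrates to 1, we get

\[
\log p(X \mid \theta)
= \int \tilde p(\bar z) \log p(X, \bar z \mid \theta) \, d\bar z
- \int \tilde p(\bar z) \log p(\bar z \mid X, \theta) \, d\bar z.
\]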
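The identity holds for any choice of the distribution over z̄, which can be checked numerically on a tiny instance. The sketch below enumerates all category assignments for three people and two categories, so the integrals become finite sums. The helper `joint`, the data, and the parameter values are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def joint(X, z, mu, nu):
    """p(X, z | theta) for one full assignment z of categories to people."""
    p = 1.0
    for i, k in enumerate(z):
        p *= nu[k] * np.prod(mu[k] ** X[i] * (1 - mu[k]) ** (1 - X[i]))
    return p

X = np.array([[1, 0], [0, 1], [1, 1]])
mu = np.array([[0.8, 0.3], [0.2, 0.6]])
nu = np.array([0.5, 0.5])

# All 2^3 possible values of z, with their joint and posterior probabilities.
assignments = list(itertools.product(range(2), repeat=len(X)))
p_joint = np.array([joint(X, z, mu, nu) for z in assignments])
p_X = p_joint.sum()            # p(X | theta)
p_post = p_joint / p_X         # p(z | X, theta)

# An arbitrary distribution over z, unrelated to the model.
rng = np.random.default_rng(1)
p_tilde = rng.dirichlet(np.ones(len(assignments)))

lhs = np.log(p_X)
rhs = p_tilde @ np.log(p_joint) - p_tilde @ np.log(p_post)
```

Here `lhs` and `rhs` agree up to floating-point error, whatever `p_tilde` is drawn.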

SLIDE 41

In other words,

\[
\log p(X \mid \theta) = E_{\tilde p(\bar z)} \log p(X, \bar z \mid \theta) - E_{\tilde p(\bar z)} \log p(\bar z \mid X, \theta).
\]

Does this look familiar?

\[
d(\theta) = \mathrm{skin}(\theta, \tilde p) - \mathrm{vessel}(\theta, \tilde p).
\]

Like a vegan flea, we want to find the value of $\theta$ that maximizes the distance between $E_{\tilde p(\bar z)} \log p(X, \bar z \mid \theta)$ and $E_{\tilde p(\bar z)} \log p(\bar z \mid X, \theta)$! It turns out that all assumptions hold! We can apply our optimization algorithm to approximately maximize $\log p(X \mid \theta)$ with respect to $\theta$.
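As a sketch of why the vessel assumption carries over, take the distribution over z̄ to be the posterior at the current parameters θ_o, so that it plays the role of the waiting time t̂. The symbol θ_o and the use of the Kullback-Leibler divergence are notation introduced here for illustration. With vessel(θ, p̃) defined as the second expectation above:

```latex
\begin{align*}
\mathrm{vessel}(\theta_o, \tilde p) - \mathrm{vessel}(\theta, \tilde p)
  &= E_{\tilde p(\bar z)} \left[ \log \frac{p(\bar z \mid X, \theta_o)}{p(\bar z \mid X, \theta)} \right] \\
  &= \mathrm{KL}\left( p(\cdot \mid X, \theta_o) \,\|\, p(\cdot \mid X, \theta) \right) \;\ge\; 0
  \qquad \text{for } \tilde p = p(\cdot \mid X, \theta_o).
\end{align*}
```

So for this choice the vessel is highest exactly where the flea stands, at θ_o. The difference skin − vessel equals log p(X | θ) for every choice of distribution, which matches the assumption that the distance does not change over time.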

SLIDE 42

The EM-algorithm

Prerequisites

A1 It is efficient to calculate $p(\bar z \mid X, \theta)$, for any $\theta$.

A2 It is efficient to compute $\arg\max_{\theta} E_{p(\bar z \mid X, \theta_o)} [\log p(X, \bar z \mid \theta)]$, for any $\theta_o$.

EM-algorithm

Init Initialize $\theta_o$ with random values.

E-step Compute $p(\bar z \mid X, \theta_o)$.

M-step $\theta \leftarrow \arg\max_{\theta} E_{p(\bar z \mid X, \theta_o)} [\log p(X, \bar z \mid \theta)]$.

Repeat If $\theta_o$ and $\theta$ are close enough, finish; otherwise, set $\theta_o \leftarrow \theta$ and go to [E-step].
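The steps above can be sketched for the movie-rating model. In that model people are independent, so the posterior over z̄ factorizes into one row of "responsibilities" per person, and the M-step has a closed form (responsibility-weighted averages). The function names, the clipping constant `eps`, the tolerance, and the toy data below are assumptions made for illustration, not part of the slides.

```python
import numpy as np

def log_joint_terms(X, mu, nu):
    """(N, K) array: log nu_k + sum_j log Bernoulli(x[i, j]; mu[k, j])."""
    return np.log(nu) + X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T

def e_step(X, mu, nu):
    """Responsibilities r[i, k] = p(z(i) = k | x_i, theta_o)."""
    terms = log_joint_terms(X, mu, nu)
    r = np.exp(terms - terms.max(axis=1, keepdims=True))
    return r / r.sum(axis=1, keepdims=True)

def m_step(X, r, eps=1e-6):
    """Maximize the expected complete-data log likelihood in closed form."""
    nk = r.sum(axis=0)
    nu = nk / len(X)
    mu = (r.T @ X) / np.maximum(nk, 1e-12)[:, None]
    return np.clip(mu, eps, 1 - eps), nu   # keep log(mu), log(1 - mu) finite

def log_likelihood(X, mu, nu):
    """Incomplete-data log likelihood log p(X | theta)."""
    terms = log_joint_terms(X, mu, nu)
    m = terms.max(axis=1)
    return float((m + np.log(np.exp(terms - m[:, None]).sum(axis=1))).sum())

def em(X, K, rng, tol=1e-6, max_iter=200):
    mu = rng.uniform(0.25, 0.75, size=(K, X.shape[1]))   # Init: random theta_o
    nu = np.full(K, 1.0 / K)
    history = []
    for _ in range(max_iter):
        r = e_step(X, mu, nu)                            # E-step
        new_mu, new_nu = m_step(X, r)                    # M-step
        history.append(log_likelihood(X, new_mu, new_nu))
        close = (np.abs(new_mu - mu).max() < tol
                 and np.abs(new_nu - nu).max() < tol)
        mu, nu = new_mu, new_nu                          # theta_o <- theta
        if close:                                        # Repeat until close enough
            break
    return mu, nu, history

X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])
mu, nu, history = em(X, K=2, rng=np.random.default_rng(0))
```

The list `history` records the incomplete-data log likelihood after each M-step; it should never decrease, which is the flea's guarantee in EM form. Different random initializations may end at different local maxima.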
