Online Collective Inference: Jay Pujara, U. Maryland, College Park (PowerPoint presentation)




SLIDE 1

Online Collective Inference

Jay Pujara

  • U. Maryland, College Park

Ben London

  • U. Maryland, College Park

Lise Getoor

  • U. California, Santa Cruz

BIRS Workshop: New Perspectives for Relational Learning, 4/23/2015

SLIDE 2

Real-world problems…

SLIDE 3

…benefit from relational models

SLIDE 4

Genre(M1, G) ∧ Genre(M2, G) ∧ Likes(U1, M1) → Likes(U1, M2)

Collaborative Filtering

Likes(U1, M1) ∧ Friends(U1, U2) → Likes(U2, M1)

[Diagram: Users and Items, with Friends and Genre edges]
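In PSL-style soft logic, rules like these are relaxed into hinge-loss potentials over truth values in [0, 1]. A minimal sketch (not the PSL implementation; the function name is illustrative) of the distance to satisfaction for the friendship rule, using the Łukasiewicz conjunction:

```python
# Lukasiewicz relaxation of: Likes(U1, M1) ∧ Friends(U1, U2) → Likes(U2, M1)
# over truth values in [0, 1]. A rule's distance to satisfaction is
# max(0, body - head), with soft conjunction AND(a, b) = max(0, a + b - 1).

def rule_distance(likes_u1_m1, friends_u1_u2, likes_u2_m1):
    """Hinge-loss distance to satisfaction for one grounding of the rule."""
    body = max(0.0, likes_u1_m1 + friends_u1_u2 - 1.0)  # soft AND of the body
    return max(0.0, body - likes_u2_m1)                 # unsatisfied amount

print(rule_distance(1.0, 1.0, 1.0))  # fully satisfied grounding -> 0.0
print(rule_distance(0.9, 0.8, 0.2))  # confident body, unlikely head -> ~0.5
```

MAP inference then minimizes the weighted sum of these distances over all groundings, which is what makes the objective convex.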

SLIDE 5

Coworkers(U1, C) ∧ Coworkers(U2, C) → Coworkers(U1, U2)

Link Prediction

SLIDE 6

Knowledge Graph Identification

MutEx(L1, L2) ∧ Label(E, L1) → ¬Label(E, L2)
Rel(R, E1, T) ∧ Rel(R, E2, T) ∧ Label(E1, L) → Label(E2, L)

[Diagram: Genre and Artist labels]

(Jiang et al., ICDM12; Pujara et al., ISWC13)

SLIDE 7

…benefit from relational models

SLIDE 8

Real-world problems are big!

  • Millions of users, thousands of movies
  • Millions of facts, thousands of ontological constraints
  • Millions of users
  • Millions of users, thousands of genes

SLIDE 9

What happens when?

  • A user rates a new movie?
  • New facts are extracted from the Web?
  • New user links form?
  • A new genetic similarity is discovered?

SLIDE 10

What happens when?

  • A user rates a new movie?
  • New facts are extracted from the Web?
  • New user links form?
  • A new genetic similarity is discovered?

Repeat Inference!

SLIDE 11

Why can't we repeat inference?

  • We want rich, collective models!
  • But: 10M-1B factors = 1-100s of hours*
  • Ideal: inference time balances the update cycle
  • Insanity is doing the same thing over and over…
SLIDE 12

PROBLEM SETTING

Online Collective Inference

SLIDE 13

Key Problem

  • Real-world problems → large graphical models
  • Changing evidence → repeat inference
SLIDE 14

Key Problem

  • Real-world problems → large graphical models
  • Changing evidence → repeat inference
  • What happens when we only partially update inference?
  • Can we scalably approximate the MAP state without recomputing inference?
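To make the question concrete, here is a toy sketch (not the talk's model; the objective, evidence values, and couplings are illustrative) of partial inference on a small strongly convex objective: full inference re-optimizes every variable, while a partial update clamps some variables to stale values and re-optimizes only the rest.

```python
# Toy strongly convex inference objective over y in R^n:
#   F(y) = 0.5 * sum_i (y_i - b_i)^2  +  0.5 * lam * sum_(i,j) (y_i - y_j)^2
# "Full inference" minimizes over all variables; a "partial update" clamps a
# fixed set S to stale values and re-optimizes only the free variables.

def infer(b, edges, lam=1.0, fixed=None, iters=300):
    """Coordinate descent on F; `fixed` maps variable index -> clamped value."""
    fixed = fixed or {}
    y = [fixed.get(i, v) for i, v in enumerate(b)]
    nbrs = {i: [] for i in range(len(b))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        for i in range(len(b)):
            if i not in fixed:
                # Exact coordinate-wise minimizer of F, others held fixed.
                y[i] = (b[i] + lam * sum(y[j] for j in nbrs[i])) / (1 + lam * len(nbrs[i]))
    return y

b = [0.9, 0.1, 0.5]            # local evidence
edges = [(0, 1), (1, 2)]       # relational couplings
full = infer(b, edges)                     # re-infer everything
partial = infer(b, edges, fixed={0: 0.9})  # clamp y_0 to a stale value
```

Because the variables are coupled, clamping y_0 shifts the free variables: the partial run settles at different values for y_1 and y_2 than full inference, and that gap is exactly what the talk's notion of regret measures.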

SLIDE 15

Generic Answer: NO!

  • Nodes can be … or …
  • Model has prob. mass only when nodes are the same
  • Fix some nodes to …, then observe evidence for …
SLIDE 16

Generic Answer: NO!

  • Nodes can be … or …
  • Model has prob. mass only when nodes are the same
  • Fix some nodes to …, then observe evidence for …

Full Inference

SLIDE 17

Generic Answer: NO!

  • Nodes can be … or …
  • Model has prob. mass only when nodes are the same
  • Fix some nodes to …, then observe evidence for …

Full Inference

SLIDE 18

Previous Work

  • Belief Revision
    – e.g. Gärdenfors, 1992
  • Bayesian Network Updates
    – e.g. Buntine, 1991; Friedman & Goldszmidt, 1997
  • Dynamic / Sequential Models
    – e.g. Murphy, 2002 / Fine et al., 1998
  • Adaptive Inference
    – e.g. Acar et al., 2008
  • BP Message Passing
    – e.g. Nath & Domingos, 2010
  • Collective Stability
    – e.g. London et al., 2013

SLIDE 19

Problem Setting

  • Fixed model: dependencies & weights known
  • Online: changing evidence or observations
  • Closed world: all variables identified
  • Budget: infer only m variables in each epoch
  • Strongly convex inference objective (e.g. PSL)

Questions:

  • What guarantees can we offer?
  • Which m variables should we infer?
SLIDE 20

Approach

  • Define "regret" for online collective inference
  • Introduce regret bounds for strongly convex inference objectives (like PSL!)
  • Develop algorithms to activate a subset of the variables during inference, given a budget

SLIDE 21

REGRET BOUNDS

Online Collective Inference

SLIDE 22

Inference Regret

  • General inference problem: estimate P(Y | X)
  • In online collective inference: fix Y_S, infer the rest
  • Regret (learning): captures distance to optimal
  • Regret (inference): the distance between the full inference result and the partial inference update (when conditioning on Y_S)

SLIDE 23

Defining Regret

  • Regret: distance between full & approximate inference,

    R_n(x, y_S; w) ≜ (1/n) ‖h(x; w) − h(x, y_S; w)‖₁

    where

    h(x; w) = argmin_y  w · f(x, y) + (w_p / 2) ‖y‖₂²

    and w_p is the prior weight.
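The regret definition above can be computed directly from two assignment vectors; a small sketch (the variable values below are illustrative):

```python
def inference_regret(y_full, y_partial):
    """R_n = (1/n) * ||h(x; w) - h(x, y_S; w)||_1 over the n target variables."""
    n = len(y_full)
    return sum(abs(a - b) for a, b in zip(y_full, y_partial)) / n

# Full-inference MAP state vs. a partial update with some variables fixed:
print(inference_regret([1.0, 0.0, 0.5, 0.25], [1.0, 0.5, 0.5, 0.75]))  # -> 0.25
```

The 1/n normalization makes regret an average per-variable disagreement, so it is comparable across models of different sizes.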

SLIDE 24

Regret Bound

Regret Ingredients: Lipschitz constant B; 2-norm of the model weights ‖w‖₂; weight of the L2 prior, w_p; L1 distance between the fixed variables and their values in full inference

Key Takeaway: Regret depends on the L1 distance between the fixed variables & their "true" values in the MAP state

R_n(x, y_S; w) ≤ O( √( (B‖w‖₂ / (n · w_p)) · ‖y_S − ŷ_S‖₁ ) )

SLIDE 25

Validating Regret Bounds

[Plot: inference regret vs. # epochs, showing the scaled regret bound against the HighLocal, Balanced, and HighRelational settings]

Measure regret of no updates versus full inference, varying the importance of relational features

SLIDE 26

ACTIVATION ALGORITHMS

Online Collective Inference

SLIDE 27

Which variables to fix?

  • Knapsack: combinatorial, regrets/costs, budget
  • Theory: fix variables that won't change
  • Practice: how can we know what will change?
  • Idea: Can we use features of past inferences?
  • Explore optimization (case study: ADMM & PSL)
SLIDE 28

ADMM Inference in PSL
(Boyd et al., 2011; Bach et al. 2012)

[Diagram: potentials f1-f4 connected to variables y1-y3]

SLIDE 29

ADMM Inference in PSL

[Diagram: potentials f1-f4 connected to variable copies y11, y12, y22, y23, y33, y34]

SLIDE 30

ADMM Inference in PSL

[Diagram: potentials, variable copies, and consensus estimates y1-y3]

SLIDE 31

ADMM Inference in PSL

[Diagram: potentials, variable copies, consensus estimates y1-y3, and Lagrange multipliers α11-α34]

SLIDE 32

ADMM Features

  • Weight: how important is the potential?
  • Potential: what loss do we incur?
  • Consensus: what is the variable's value?
  • Lagrange Multiplier: how much disagreement is there across potentials?

min_{ỹ_g}  w_g · f_g(x, ỹ_g) + (ρ/2) ‖ỹ_g − y_g + (1/ρ)α_g‖₂²
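A minimal sketch of one consensus-ADMM round for a single shared variable, assuming squared potentials f_g(ỹ) = (ỹ − t_g)² so the local step has a closed form (PSL's actual potentials are hinge losses; ρ, the weights, and targets below are illustrative):

```python
# One round of consensus ADMM for a single shared variable y with G potentials.
# Assumed squared potentials f_g(yg) = (yg - t_g)^2, so the local step
#   min_yg  w_g*(yg - t_g)^2 + (rho/2)*(yg - y + alpha_g/rho)^2
# has the closed form yg = (2*w_g*t_g + rho*y - alpha_g) / (2*w_g + rho).

def admm_round(y, alphas, weights, targets, rho=1.0):
    # Local step: each potential updates its private copy of the variable.
    local = [(2 * w * t + rho * y - a) / (2 * w + rho)
             for a, w, t in zip(alphas, weights, targets)]
    # Consensus step: average the copies (with scaled dual variables).
    y_new = sum(yg + a / rho for yg, a in zip(local, alphas)) / len(local)
    # Dual step: the multipliers accumulate disagreement with the consensus.
    alphas_new = [a + rho * (yg - y_new) for a, yg in zip(alphas, local)]
    return y_new, alphas_new

y, alphas = 0.5, [0.0, 0.0]
for _ in range(200):
    y, alphas = admm_round(y, alphas, weights=[1.0, 3.0], targets=[0.0, 1.0])
print(round(y, 4))  # ≈ 0.75, the weighted least-squares consensus
```

The multipliers α_g are exactly the "disagreement" signal named in the last bullet: at convergence they record how hard each potential pulls its copy away from the consensus.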
SLIDE 33

Two heuristics for activation

  • Truth-Value: variable value near 0.5
  • Weighted Lagrangian: rule weight × Lagrange multipliers high
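The two heuristics can be sketched as scoring functions over byproducts of the previous ADMM run (the values and the `activate` helper are illustrative, not the talk's implementation):

```python
# Score variables for activation from byproducts of the previous ADMM run.
# The numbers below are illustrative, not taken from a real model.

def truth_value_score(consensus):
    """Higher when the consensus value is near 0.5 (most uncertain)."""
    return 1.0 - 2.0 * abs(consensus - 0.5)

def weighted_lagrangian_score(rule_weights, multipliers):
    """Sum of rule weight x |Lagrange multiplier| over a variable's potentials."""
    return sum(w * abs(a) for w, a in zip(rule_weights, multipliers))

def activate(variables, scores, m):
    """Re-infer the m highest-scoring variables; fix the rest."""
    return set(sorted(variables, key=lambda v: scores[v], reverse=True)[:m])

scores = {"y1": truth_value_score(0.51),  # near 0.5 -> high score
          "y2": truth_value_score(0.95),  # confident value -> low score
          "y3": truth_value_score(0.30)}
print(activate(["y1", "y2", "y3"], scores, m=2))  # selects y1 and y3
```

Both heuristics are free to compute, since the consensus values and multipliers are already stored by ADMM at the end of the previous epoch.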

SLIDE 34

Using Model Structure

  • Variable dependencies matter!
  • Perform BFS, starting with new evidence
  • Use heuristics + decay to prioritize exploration
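A sketch of the BFS-with-decay idea, assuming a simple adjacency structure over variables and a uniform default heuristic score (the `decay` value and the toy graph are illustrative):

```python
from collections import deque

# Prioritize activation by BFS from newly changed evidence, decaying each
# variable's heuristic score by its distance from the change.

def bfs_activation(adjacency, seeds, base_score, budget, decay=0.5):
    """Score = base heuristic * decay^depth; return the top-`budget` variables."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for u in adjacency.get(v, []):
            if u not in depth:
                depth[u] = depth[v] + 1
                queue.append(u)
    scored = {v: base_score.get(v, 1.0) * decay ** d for v, d in depth.items()}
    return sorted(scored, key=scored.get, reverse=True)[:budget]

adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(bfs_activation(adjacency, seeds=["a"], base_score={}, budget=2))
# new evidence at "a": activates "a" first, then its neighborhood
```

The decay encodes the intuition from the regret bound: variables far from the new evidence are unlikely to change much, so they are safer to fix under a budget.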
SLIDE 35

EXPERIMENTAL EVALUATION

SLIDE 36

Two Online Inference Tasks

  • Collective Classification (Synthetic)
    – Infer attributes of users in a social network as progressively more information is shared
  • Collaborative Filtering (Jester; Goldberg et al. 2001)
    – Infer user ratings of jokes as users provide ratings for an increasing number of jokes

SLIDE 37

Two Online Inference Tasks

  • Collective Classification (Synthetic)
    – 100 total trials (10 networks × 10 series)
    – Network evolves from 10% to 60% observed
    – Fix 50% of variables at each epoch
  • Collaborative Filtering (Jester)
    – 10 trials, 100 users, 100 jokes
    – Evolve from 25% to 75% revealed ratings
    – Fix {25,50,75}% of variables at each epoch
SLIDE 38

Collective Classification: Approximate Inference

[Plots: inference regret vs. # epochs, and MAE vs. # epochs, comparing Do Nothing / Full Inference, Random 50%, Value 50%, WLM 50%, Relational 50%]

  • Regret diminishes over time
  • Error decreases, approaching full inference
  • 69% reduction in inference time
SLIDE 39

Collaborative Filtering

[Plots: inference regret and RMSE vs. % observed, at 50% and 25% activated, comparing Do Nothing / Full Inference, Random 25%, Value 25%, WLM 25%, Relational 25%]

SLIDE 40

Collaborative Filtering

[Plots: RMSE vs. % observed, at 50% and 25% activated, comparing Full Inference, Random 25%, Value 25%, WLM 25%, Relational 25%]

  • Value: high regret, but lower error than full inference
  • Preserves polarized ratings
  • 66% reduction in time for approximate inference

SLIDE 41

CONCLUSION

Online Collective Inference

SLIDE 42

Summary

  • Extremely relevant to modern problems
  • Necessity: approximate MAP state in PGMs
  • Inference regret: bound approximation error
  • Approx. algos: use optimization features
  • Results: low regret, low error, faster
  • New possibilities: rich models, fast inference
SLIDE 43

Future Work

  • Better bounds for approximate inference?
  • Dealing with changing models/weights
  • Explicitly modeling change in models
  • Applications:
    – Drug targeting
    – Knowledge Graph construction
    – Context-aware mobile devices