Online Collective Inference


  1. Online Collective Inference. Jay Pujara (U. Maryland, College Park), Ben London (U. Maryland, College Park), Lise Getoor (U. California, Santa Cruz). BIRS Workshop: New Perspectives for Relational Learning, 4/23/2015

  2. Real-world problems…

  3. …benefit from relational models

  4. Collaborative Filtering [Diagram: Users, Items, Genres linked by Likes, Friends, and Genre relations]
      Likes(U1, M1) ∧ Friends(U1, U2) → Likes(U2, M1)
      Genre(M1, G) ∧ Genre(M2, G) ∧ Likes(U1, M1) → Likes(U1, M2)
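
To make the rule semantics concrete: in PSL, which this talk builds on, rules like these are relaxed over [0,1]-valued atoms using Łukasiewicz logic, so each grounded rule contributes a hinge-loss "distance to satisfaction". A minimal Python sketch; the truth values in the example are made up for illustration:

```python
def distance_to_satisfaction(body_values, head_value):
    """Lukasiewicz relaxation of a rule body -> head: the rule is
    satisfied when sum(body) - (len(body) - 1) <= head, and otherwise
    incurs a linear penalty."""
    body_truth = sum(body_values) - (len(body_values) - 1)
    return max(0.0, body_truth - head_value)

# Likes(U1, M1) ∧ Friends(U1, U2) → Likes(U2, M1), with hypothetical
# truth values 0.9 and 0.8 for the body atoms and 0.3 for the head:
penalty = distance_to_satisfaction([0.9, 0.8], 0.3)  # max(0, 1.7 - 1 - 0.3) = 0.4
```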

  5. Link Prediction
      Coworkers(U1, C) ∧ Coworkers(U2, C) → Coworkers(U1, U2)

  6. Knowledge Graph Identification
      Rel(R, E1, T) ∧ Rel(R, E2, T) ∧ Label(E1, L) → Label(E2, L)
      MutEx(L1, L2) ∧ Label(E, L1) → ¬Label(E, L2)
      (Jiang et al., ICDM12; Pujara et al., ISWC13)

  7. …benefit from relational models

  8. Real-world problems are big! Millions of users, thousands of movies. Millions of users, thousands of genes. Millions of users. Millions of facts, thousands of ontological constraints.

  9. What happens when? A user rates a new movie? A new genetic similarity is discovered? New user links form? New facts are extracted from the Web?

  10. What happens when? A user rates a new movie? A new genetic similarity is discovered? New user links form? New facts are extracted from the Web? Repeat Inference!

  11. Why can’t we repeat inference? • We want rich, collective models! • But, 10M–1B factors = 1–100s of hours* • Ideal: inference time balances the update cycle • Insanity is doing the same thing over and over…

  12. Online Collective Inference: PROBLEM SETTING

  13. Key Problem • Real-world problems → large graphical models • Changing evidence → repeat inference

  14. Key Problem • Real-world problems → large graphical models • Changing evidence → repeat inference • What happens when partially updating inference? • Can we scalably approximate the MAP state without recomputing inference?

  15. Generic Answer: NO! • Nodes can take one of two values (shown as two colors on the slide) • The model has probability mass only when all nodes take the same value • Fix some nodes to one value, then observe evidence for the other

  16. Generic Answer: NO! Full Inference [animation build of the example above, comparing the fixed-node update against full inference]

  17. Generic Answer: NO! Full Inference [animation build continued]

  18. Previous Work • Belief Revision – e.g. Gardenfors, 1992 • Bayesian Network Updates – e.g. Buntine, 1991; Friedman & Goldszmidt, 1997 • Dynamic / Sequential Models – e.g. Murphy, 2002 / Fine et al., 1998 • Adaptive Inference – e.g. Acar et al., 2008 • BP Message Passing – e.g. Nath & Domingos, 2010 • Collective Stability – e.g. London et al., 2013

  19. Problem Setting • Fixed model: dependencies & weights known • Online: changing evidence or observations • Closed world: all variables identified • Budget: infer only m variables in each epoch • Strongly-convex inference objective (e.g. PSL) Questions: • What guarantees can we offer? • Which m variables should we infer?

  20. Approach • Define “regret” for online collective inference • Introduce regret bounds for strongly convex inference objectives (like PSL!) • Develop algorithms to activate a subset of the variables during inference, given a budget

  21. Online Collective Inference: REGRET BOUNDS

  22. Inference Regret • General inference problem: estimate P(Y | X) • In online collective inference: fix $Y_S$, infer the remaining variables $Y \setminus Y_S$ • Regret (learning): captures distance to the optimal hypothesis • Regret (inference): the distance between the full inference result and the partial inference update (when conditioning on $Y_S$)

  23. Defining Regret • Regret: distance between full & approximate inference:
      $R_n(x, y_S; \hat{w}) \triangleq \frac{1}{n} \left\| h(x; \hat{w}) - h(x, y_S; \hat{w}) \right\|_1$
      where $h(x; \hat{w}) = \arg\min_{y} \; \hat{w} \cdot f(x, y) + \frac{w_p}{2} \|y\|_2^2$ and $w_p$ is the prior weight.
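
Measured empirically, this regret is just a mean absolute difference between two MAP states. A minimal sketch (NumPy; the variable names are ours):

```python
import numpy as np

def inference_regret(y_full, y_partial):
    """Empirical inference regret: (1/n) * ||h(x; w) - h(x, y_S; w)||_1,
    where y_full is the full-inference MAP state and y_partial is the
    MAP state of a partial update that conditions on y_S."""
    return np.mean(np.abs(np.asarray(y_full) - np.asarray(y_partial)))
```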

  24. Regret Bound
      $R_n(x, y_S; \hat{w}) \le O\!\left( \sqrt{ \frac{B \, \|\hat{w}\|_2}{n \cdot w_p} \, \left\| y_S - \hat{y}_S \right\|_1 } \right)$
      Regret ingredients: $B$ is the Lipschitz constant; $\|\hat{w}\|_2$ is the 2-norm of the model weights; $w_p$ is the weight of the $L_2$ prior; $\|y_S - \hat{y}_S\|_1$ is the $L_1$ distance between the fixed variables’ values and their values in full inference. Key takeaway: regret depends on the $L_1$ distance between the fixed variables and their “true” values in the MAP state.
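
Up to the constant hidden by the O(·), the bound can be evaluated directly from its ingredients; a small sketch, with arguments matching the list above:

```python
import math

def regret_bound(lipschitz_B, weight_norm2, prior_weight, n, l1_dist_fixed):
    """sqrt( B * ||w||_2 / (n * w_p) * ||y_S - y_hat_S||_1 ), omitting
    the multiplicative constant absorbed by the O(.) notation."""
    return math.sqrt((lipschitz_B * weight_norm2 * l1_dist_fixed)
                     / (n * prior_weight))
```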

  25. Validating Regret Bounds [Plot: inference regret vs. # epochs (0–50) for HighLocal, Balanced, and HighRelational models, against the scaled regret bound] Measure regret of no updates versus full inference, varying the importance of relational features.

  26. Online Collective Inference: ACTIVATION ALGORITHMS

  27. Which variables to fix? • Knapsack: combinatorial, regrets/costs, budget • Theory: fix variables that won’t change • Practice: how can we know what will change? • Idea: can we use features of past inferences? • Explore optimization (case study: ADMM & PSL)
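
Under a uniform-cost budget the knapsack degenerates to top-m selection; a greedy sketch, where the scoring function comes from one of the heuristics on the later slides:

```python
def activate(scores, m):
    """Activate the m variables with the highest scores (estimated
    benefit of re-inferring them); all others stay fixed at their
    previous MAP values."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:m]), set(ranked[m:])  # (active, fixed)
```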

  28. ADMM Inference in PSL (Boyd et al., 2011; Bach et al., 2012) [Diagram: variables y1–y3 connected to potentials f1–f4]

  29. ADMM Inference in PSL [Diagram: each potential gets local variable copies y11, y12, y22, y23, y33, y34]

  30. ADMM Inference in PSL [Diagram: consensus estimates y1–y3 tie the local copies together]

  31. ADMM Inference in PSL [Diagram: Lagrange multipliers α11–α34 attached to each variable copy]
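
Putting the four diagram builds together: consensus ADMM gives each potential its own copies of the variables it touches, plus Lagrange multipliers, and alternates local, consensus, and dual updates. A minimal sketch under the assumption that each potential’s local subproblem has a cheap solver (true for PSL’s hinge potentials, which admit closed-form minimizers); the interfaces here are ours, not PSL’s API:

```python
import numpy as np

def consensus_admm(potentials, n_vars, rho=1.0, iters=100):
    """potentials: list of (ids, local_argmin) pairs, where ids lists the
    variable indices the potential touches and local_argmin(target)
    returns argmin_z  w_g * f_g(z) + (rho/2) * ||z - target||^2."""
    y = np.zeros(n_vars)                               # consensus estimates
    copies = [np.zeros(len(ids)) for ids, _ in potentials]  # local copies
    alphas = [np.zeros(len(ids)) for ids, _ in potentials]  # multipliers

    for _ in range(iters):
        # Local step: each potential updates its variable copies.
        for g, (ids, local_argmin) in enumerate(potentials):
            copies[g] = local_argmin(y[ids] - alphas[g] / rho)
        # Consensus step: average the local copies per variable,
        # projecting onto PSL's [0, 1] box constraints.
        totals, counts = np.zeros(n_vars), np.zeros(n_vars)
        for g, (ids, _) in enumerate(potentials):
            np.add.at(totals, ids, copies[g])
            np.add.at(counts, ids, 1.0)
        y = np.clip(totals / np.maximum(counts, 1.0), 0.0, 1.0)
        # Dual step: penalize disagreement with the consensus.
        for g, (ids, _) in enumerate(potentials):
            alphas[g] += rho * (copies[g] - y[ids])
    return y
```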

  32. ADMM Features
      $\min_{\tilde{y}_g} \; w_g f_g(x, \tilde{y}_g) + \frac{\rho}{2} \left\| \tilde{y}_g - y_g + \frac{1}{\rho} \alpha_g \right\|_2^2$
      • Weight: how important is the potential? • Potential: what loss do we incur? • Consensus: what is the variable’s value? • Lagrange multiplier: how much disagreement is there across potentials?
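
These four quantities fall directly out of the per-potential objective above and are free byproducts of the previous inference run. A sketch of collecting them for one (potential, variable-copy) pair; the names are illustrative, not from the paper’s code:

```python
def admm_features(weight, potential_value, consensus_value, multiplier):
    """The four activation features from this slide."""
    return {
        "weight": weight,                 # w_g: importance of the potential
        "potential": potential_value,     # f_g: loss currently incurred
        "consensus": consensus_value,     # y: the variable's current value
        "disagreement": abs(multiplier),  # |alpha|: cross-potential tension
    }
```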

  33. Two heuristics for activation • Truth-Value: variable value near 0.5 • Weighted Lagrangian: rule weight × Lagrange multipliers is high
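
Both heuristics turn the ADMM features into a per-variable activation score, higher meaning activate first. A sketch; summing over incident potentials in the second heuristic is our assumption:

```python
def truth_value_score(consensus_value):
    """Truth-Value heuristic: values near 0.5 are uncertain and most
    likely to change, so they score highest."""
    return 0.5 - abs(consensus_value - 0.5)

def weighted_lagrangian_score(weights, multipliers):
    """Weighted-Lagrangian heuristic: aggregate rule weight times
    |Lagrange multiplier| over the potentials touching the variable."""
    return sum(w * abs(a) for w, a in zip(weights, multipliers))
```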

  34. Using Model Structure • Variable dependencies matter! • Perform BFS, starting with new evidence • Use heuristics + decay to prioritize exploration
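
A sketch of this structure-aware variant: breadth-first search outward from the newly observed atoms, discounting a heuristic score by depth, then activating the top m. The decay constant is illustrative, not a value from the talk:

```python
from collections import deque

def bfs_activation(neighbors, new_evidence, base_score, m, decay=0.9):
    """Score each variable reachable from the new evidence by
    base_score(v) * decay**depth, then activate the m best."""
    scores, seen = {}, set(new_evidence)
    queue = deque((v, 0) for v in new_evidence)
    while queue:
        v, depth = queue.popleft()
        scores[v] = base_score(v) * (decay ** depth)
        for u in neighbors(v):
            if u not in seen:
                seen.add(u)
                queue.append((u, depth + 1))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:m])
```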

  35. EXPERIMENTAL EVALUATION

  36. Two Online Inference Tasks • Collective Classification (Synthetic) – Infer attributes of users in a social network as progressively more information is shared • Collaborative Filtering (Jester; Goldberg et al., 2001) – Infer user ratings of jokes as users provide ratings for an increasing number of jokes

  37. Two Online Inference Tasks • Collective Classification (Synthetic): 100 total trials (10 networks × 10 series); network evolves from 10% to 60% observed; fix 50% of variables at each epoch • Collaborative Filtering (Jester): 10 trials, 100 users, 100 jokes; evolve from 25% to 75% revealed ratings; fix {25, 50, 75}% of variables at each epoch

  38. Collective Classification: Approximate Inference [Plots: MAE vs. # epochs and inference regret vs. # epochs for Do Nothing, Random 50%, Value 50%, WLM 50%, and Relational 50%, against Full Inference] • Regret diminishes over time • Error decreases, approaching full inference • 69% reduction in inference time

  39. Collaborative Filtering [Plots: inference regret and RMSE vs. % observed ratings, for 25% and 50% of variables activated per epoch, comparing Do Nothing, Random, Value, WLM, and Relational to Full Inference]

  40. Collaborative Filtering • Value: high regret, but lower error than full inference • Preserves polarized ratings • 66% reduction in time for approximate inference

  41. Online Collective Inference: CONCLUSION

  42. Summary • Extremely relevant to modern problems • Necessity: approximate the MAP state in PGMs • Inference regret: bound the approximation error • Approximation algorithms: use optimization features • Results: low regret, low error, faster • New possibilities: rich models, fast inference

  43. Future Work • Better bounds for approximate inference? • Dealing with changing models/weights • Explicitly modeling change in models • Applications: – Drug targeting – Knowledge Graph construction – Context-aware mobile devices
