Wolfe Practical Machine Learning Using Probabilistic Programming - PowerPoint PPT Presentation

Wolfe: Practical Machine Learning Using Probabilistic Programming and Optimization
Sameer Singh (University of Washington), Sebastian Riedel, Tim Rocktäschel, Luke Hewitt (University College London)

Large-scale NLP: Information Extraction

Limitations of existing PPLs (why NLP people don't really use them)
• "Differently expressive", or "I also want to get the awesome results from ICML 2014!"
• Inefficient, or lacking support for existing (or the latest and greatest) inference approaches
• Black-boxness, or "Fine, the results suck. Should I (can I) change anything?"

Practical Probabilistic Programming
Probabilistic Program → Inference and Learning → Inference Results

Practical Probabilistic Programming: Expressive Models Used in Machine Learning
• Bayesian networks, Markov random fields, conditional random fields, matrix factorization, word embeddings, deep neural networks

Practical Probabilistic Programming: Interface for Existing (and Future) Approaches
• Gibbs, adaptive MCMC, variational, lifted inference, convex optimization, linear programming, stochastic gradient descent, submodular optimization, …

Practical Probabilistic Programming: Comprehend Results and Debug Models
• Visualizing distributions, plate notation, inference progress, feature engineering, hyper-parameter optimization, …

Factor Graph Models
• Models where the distribution is specified by an undirected graphical model over variables and "factors":
  $$P(Y) = \frac{1}{Z} \exp \sum_c \phi_c(Y_c)$$
• Often conditional and parameterized, with $\theta \in \mathbb{R}^d$:
  $$P_\theta(Y \mid X) = \frac{1}{Z_\theta(X)} \exp \sum_c \theta \cdot \psi_c(Y_c, X), \qquad \psi_c : Y_c \times X \to \mathbb{R}^d$$
• Partial support: Figaro, Church, MLNs, ProbLog, …
• Factorie: orders of magnitude faster MCMC on big graphs
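To make the factor-graph definition concrete, here is a minimal plain-Scala sketch (not Wolfe or Factorie code) for binary variables: each hypothetical `Factor` scores the variables in its scope, the total score is $\sum_c \phi_c(Y_c)$, and $Z$ is computed by brute-force enumeration, which is only feasible for a handful of variables.

```scala
// Brute-force factor graph over n binary variables; illustrative sketch only.
object FactorGraph {
  type Assignment = Vector[Boolean]

  // A factor phi_c scores the variables in its scope Y_c
  case class Factor(scope: Seq[Int], phi: Seq[Boolean] => Double)

  // Total log-score: sum_c phi_c(Y_c)
  def score(factors: Seq[Factor])(y: Assignment): Double =
    factors.map(f => f.phi(f.scope.map(i => y(i)))).sum

  // All 2^n joint assignments of n binary variables
  def assignments(n: Int): Seq[Assignment] =
    (0 until (1 << n)).map(bits => Vector.tabulate(n)(i => ((bits >> i) & 1) == 1))

  // P(Y) = exp(score(Y)) / Z, with Z summed over every assignment
  def prob(factors: Seq[Factor], n: Int)(y: Assignment): Double = {
    val z = assignments(n).map(a => math.exp(score(factors)(a))).sum
    math.exp(score(factors)(y)) / z
  }
}
```

With "agreement" factors on (0, 1) and (1, 2) that score 1.0 when neighbours are equal, a three-variable chain puts more mass on all-equal assignments than on alternating ones.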


Wolfe: Functional Probabilistic Programming for Declarative Machine Learning

Wolfe
• Philip Wolfe: a founder of convex optimization and mathematical programming
• Sriram: "Optimization is more important than PPLs"
• "We want to give users what they use!"

Wolfe Overview
1) Functional programs
• scalar functions for density, loss/objective, …
• special operators for inference/learning: argmax, logZ, expect
2) Wolfe Interpreter
• finds factorizations in expression trees
• replaces calls with efficient code
3) Native Language Compiler
• compiles to efficient code

User Code → Wolfe Interpreter → Actual Code → Scala Compiler

User code:
    import wolfe._
    def domain = …
    def model = prod …
    def mu = expect …

Actual code:
    import wolfe._
    def domain = …
    def model = factorGraph
    def map = beliefProp …

Wolfe: Language
• Inspired by "math" (for example, Jason's Dyna)
  • make programs look like the equations in the paper
  • universal: allow impossible things (but fail at compile time)
• But not a DSL!
  • integrate with existing tools, codebases, and libraries
  • don't expect users to learn another language
  • make use of existing compiler optimizations

Wolfe: Universe
• Space of elements of the universe: Set[T]
• Booleans/categories: bools = Set(true, false)
• Infinite and uncountable sets: ints, doubles
• Iterables and functions: seqs(bools), maps(bools, ints)
• Abstract data types (structures)
  • All possible tuples: all(bools, bools)
  • Person(name: String, age: Int) → cases[Person](strings, ints)
• Conditioning: space where cond (same as "filter")
  • persons where _.name == "Andy Gordon"
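For finite spaces, these constructs map directly onto ordinary Scala sets. The sketch below is plain Scala, not Wolfe's API: `all` builds the Cartesian product, the hypothetical `persons` stands in for `cases[Person](strings, ints)` over finite name and age sets, and `where` is just `filter`. Infinite spaces such as `ints` or `doubles` cannot be enumerated this way.

```scala
// Finite "universes" as plain Scala sets; illustrative sketch only.
object Universe {
  val bools: Set[Boolean] = Set(true, false)

  // all(a, b): every tuple drawn from the two spaces (Cartesian product)
  def all[A, B](a: Set[A], b: Set[B]): Set[(A, B)] =
    for (x <- a; y <- b) yield (x, y)

  case class Person(name: String, age: Int)

  // Hypothetical finite counterpart of cases[Person](strings, ints)
  def persons(names: Set[String], ages: Set[Int]): Set[Person] =
    for (n <- names; a <- ages) yield Person(n, a)

  // Conditioning: `space where cond` is a filter on the space
  def where[T](space: Set[T])(cond: T => Boolean): Set[T] = space.filter(cond)
}
```

For example, `Universe.where(Universe.persons(Set("Andy Gordon", "Ada"), Set(30, 40)))(_.name == "Andy Gordon")` keeps the two persons named Andy Gordon, one per age.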

Wolfe: Functions
• Define the density function: T => Double
  • def flip(x: Boolean) = 0.5
• Easier: unnormalized log-probability
  • def uniform(x: Double) = 0.0 // or 1.0
• Parameterized distributions
  • def bernoulli(p: Double)(x: Boolean) = if (x) log(p) else log(1 - p)
  • def coin(x: Boolean) = bernoulli(0.5)(x)
• Model composition: def f(x)(z) = g(x) + h(x)(z)
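A minimal plain-Scala sketch of the same idea, assuming finite domains so that normalization can be done by enumeration. The components `g` and `h` are hypothetical, chosen only to show that adding log-densities multiplies the underlying densities; `normalize` is a helper introduced here, not part of Wolfe.

```scala
// Model functions as plain Scala log-densities; illustrative sketch only.
object Functions {
  // Unnormalized log-density: any constant yields a uniform distribution once normalized
  def uniform(x: Boolean): Double = 0.0

  // Parameterized log-density, curried as on the slide: bernoulli(p)(x)
  def bernoulli(p: Double)(x: Boolean): Double =
    if (x) math.log(p) else math.log(1 - p)

  def coin(x: Boolean): Double = bernoulli(0.5)(x)

  // Hypothetical components g and h for the composition f(x)(z) = g(x) + h(x)(z)
  def g(x: Boolean): Double = bernoulli(0.3)(x)
  def h(x: Boolean)(z: Boolean): Double = bernoulli(if (x) 0.9 else 0.1)(z)
  def f(x: Boolean)(z: Boolean): Double = g(x) + h(x)(z)

  // Turn an unnormalized log-density over a finite space into probabilities
  def normalize[T](space: Seq[T])(logp: T => Double): Map[T, Double] = {
    val weights = space.map(x => x -> math.exp(logp(x)))
    val z = weights.map(_._2).sum
    weights.map { case (x, w) => x -> w / z }.toMap
  }
}
```

Normalizing `f` over all (x, z) pairs gives P(x = true, z = true) = 0.3 × 0.9 = 0.27, and normalizing `uniform` gives 0.5 for each boolean.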

Wolfe: Operators
• sample: (Set[T])(T => Double) => T
  • sample(bools)(bernoulli(0.2))
• argmax: (Set[T])(T => Double) => T
• expect: (Set[T])(T => Double)(T => Vec) => Vec
  • expect(doubles st _ > 0)(norm)(x => x * x)
• logZ: (Set[T])(T => Double) => Double
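Over a finite space, all four operators can be implemented by brute-force enumeration. The sketch below is a plain-Scala stand-in, not Wolfe's implementation (which replaces such calls with efficient inference code); for simplicity `expect` takes a scalar statistic rather than a `Vec`, and `sample` inverts the cumulative distribution.

```scala
import scala.util.Random

// Enumeration-based stand-ins for sample, argmax, expect and logZ; illustrative sketch only.
object Operators {
  def bernoulli(p: Double)(x: Boolean): Double =
    if (x) math.log(p) else math.log(1 - p)

  // Numerically stable log(sum(exp(xs))); xs must be non-empty
  def logSumExp(xs: Seq[Double]): Double = {
    val m = xs.max
    m + math.log(xs.map(x => math.exp(x - m)).sum)
  }

  // argmax: the element with the highest (unnormalized log) score
  def argmax[T](space: Set[T])(score: T => Double): T = space.maxBy(score)

  // logZ: log sum_x exp(score(x))
  def logZ[T](space: Set[T])(score: T => Double): Double =
    logSumExp(space.toSeq.map(score))

  // expect: E_p[stat(x)] where p(x) = exp(score(x) - logZ)
  def expect[T](space: Set[T])(score: T => Double)(stat: T => Double): Double = {
    val lz = logZ(space)(score)
    space.toSeq.map(x => math.exp(score(x) - lz) * stat(x)).sum
  }

  // sample: draw x with probability p(x) by inverting the cumulative distribution
  def sample[T](space: Set[T], rng: Random = new Random())(score: T => Double): T = {
    val xs = space.toSeq
    val lz = logZ(space)(score)
    val cdf = xs.map(x => math.exp(score(x) - lz)).scanLeft(0.0)(_ + _).tail
    val u = rng.nextDouble()
    val i = cdf.indexWhere(u < _)
    if (i >= 0) xs(i) else xs.last // guards against rounding in the last CDF entry
  }
}
```

Because `bernoulli(0.2)` is already normalized, `logZ(Set(true, false))(bernoulli(0.2))` is 0, the expectation of the indicator of `true` is 0.2, and `argmax` returns `false`.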

Wolfe: Inference
• Sampling and MAP inference are straightforward
• Marginal inference: T = Seq[(Int, Double)]
  • expect(seqs)(model) { seq => oneHot('0 -> seq(0)) }
• Discriminative learning: model(w)(xy)
  • Conditional likelihood:
        def cll(data)(w) = sum(data) { d => model(w)(d) - logZ(_.x == d.x)(model(w)) }
  • Maximize: argmax(doubles) { w => cll(data)(w) }
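The conditional-likelihood objective can be sketched in plain Scala for a hypothetical one-feature log-linear classifier with score `w * x` when the label is true and 0 otherwise. Since `argmax(doubles)` ranges over an infinite set, this sketch swaps in plain gradient ascent; the gradient is $\sum_d x_d \left(\mathbb{1}[y_d] - P_w(y = \text{true} \mid x_d)\right)$. None of these names are Wolfe API.

```scala
// Conditional log-likelihood and gradient ascent for a toy log-linear model; illustrative sketch only.
object Learning {
  type Datum = (Double, Boolean) // (feature x, label y)

  // Hypothetical model(w)(d): score w * x if the label is true, else 0
  def model(w: Double)(d: Datum): Double = if (d._2) w * d._1 else 0.0

  def logSumExp(a: Double, b: Double): Double = {
    val m = math.max(a, b)
    m + math.log(math.exp(a - m) + math.exp(b - m))
  }

  // logZ restricted to the outputs compatible with x, i.e. y in {true, false}
  def logZ(w: Double)(x: Double): Double =
    logSumExp(model(w)((x, true)), model(w)((x, false)))

  // cll(data)(w) = sum_d [model(w)(d) - logZ over outputs with the same x]
  def cll(data: Seq[Datum])(w: Double): Double =
    data.map(d => model(w)(d) - logZ(w)(d._1)).sum

  // d cll / dw = sum_d x_d * (I(y_d) - P_w(y = true | x_d))
  def gradient(data: Seq[Datum])(w: Double): Double =
    data.map { case (x, y) =>
      val pTrue = math.exp(model(w)((x, true)) - logZ(w)(x))
      x * ((if (y) 1.0 else 0.0) - pTrue)
    }.sum

  // Gradient ascent as a stand-in for argmax(doubles) { w => cll(data)(w) }
  def fit(data: Seq[Datum], steps: Int = 500, rate: Double = 0.1): Double =
    (1 to steps).foldLeft(0.0)((w, _) => w + rate * gradient(data)(w))
}
```

The objective is concave in `w`, so a small fixed step size is enough here; on non-separable data the optimum is finite.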

Topic Models
$$P_{\alpha,\beta}(W, Z, \theta, \phi) = \prod_{i=1}^{K} \mathrm{Dir}_\beta(\phi_i) \prod_{d=1}^{M} \mathrm{Dir}_\alpha(\theta_d) \prod_{t=1}^{N_d} \mathrm{Cat}(Z_{d,t} \mid \theta_d)\, \mathrm{Cat}(W_{d,t} \mid \phi_{Z_{d,t}})$$

    case class Token(word: String, topic: Int)
    case class Doc(tokens: Seq[Token], theta: Map[Int, Double])
    case class World(docs: Seq[Doc], phi: Seq[Map[String, Double]])
    val alpha = 50.0
    val beta = 0.01
    def lda(world: World) = {
      import world._
      prod(phi) { dir(_, beta) } *
      prod(docs) { d =>
        dir(d.theta, alpha) *
        prod(d.tokens) { t =>
          cat(t.topic, d.theta) *
          cat(t.word, phi(t.topic)) }}}
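The same LDA density can be written as a plain-Scala log-density, where each product becomes a sum. This sketch assumes symmetric Dirichlet priors and, in keeping with the "unnormalized log-probability" convention, drops the Dirichlet normalizing constants; `logDir` and `logCat` are hypothetical helpers, not Wolfe's `dir` and `cat`.

```scala
// Unnormalized LDA log-density in plain Scala; illustrative sketch only.
object Lda {
  case class Token(word: String, topic: Int)
  case class Doc(tokens: Seq[Token], theta: Map[Int, Double])
  case class World(docs: Seq[Doc], phi: Seq[Map[String, Double]])

  val alpha = 50.0
  val beta = 0.01

  // Symmetric Dirichlet with concentration c, without its normalizing constant
  def logDir[K](p: Map[K, Double], c: Double): Double =
    p.values.map(v => (c - 1) * math.log(v)).sum

  // Log-probability of outcome k under the categorical distribution p
  def logCat[K](k: K, p: Map[K, Double]): Double = math.log(p(k))

  // log P(W, Z, theta, phi) up to a constant
  def lda(world: World): Double = {
    import world._
    phi.map(p => logDir(p, beta)).sum +
      docs.map { d =>
        logDir(d.theta, alpha) +
          d.tokens.map(t => logCat(t.topic, d.theta) + logCat(t.word, phi(t.topic))).sum
      }.sum
  }
}
```

Working in log space avoids underflow from multiplying many small probabilities in long documents.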

Relational Model
Markov logic network over Smokes(person), Cancer(person), and Friends(person, person), with weighted formulas:
• 1.5  Smokes(x) => Cancer(x)
• 1.1  Friends(x,y) => (Smokes(x) <=> Smokes(y))

    case class World(smokes: Pred[Symbol], cancer: Pred[Symbol],
                     friends: Pred[(Symbol, Symbol)])
    def persons = List('anna, 'bob)
    def worlds =
      cross[World](preds(persons), preds(persons), preds(friends))
    def mln(world: World) = {
      sum(persons) { p => 1.5 * I(smokes(p) --> cancer(p)) } +
      sum(persons) { p1 => sum(persons) { p2 =>
        1.1 * I(friends(p1, p2) --> (smokes(p1) == smokes(p2)))
      }}
    }
    def evidence(world: World) =
      world.smokes('anna) && world.friends('anna, 'bob)
    def query(world: World) = oneHot(world.cancer('bob))
    val mu = expect(worlds where evidence) { mln } { query }
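For two persons there are only 2^8 = 256 possible worlds, so the query can be answered by brute force. This plain-Scala sketch is not Wolfe: it uses `String` instead of `Symbol`, represents each predicate as the set of its true ground atoms, and computes P(Cancer(bob) | evidence) as a weighted average over the worlds consistent with the evidence.

```scala
// Brute-force MLN inference for two persons; illustrative sketch only.
object Mln {
  val persons = List("anna", "bob")
  val pairs: List[(String, String)] = for (a <- persons; b <- persons) yield (a, b)

  // A possible world: the sets of true ground atoms for each predicate
  case class World(smokes: Set[String], cancer: Set[String], friends: Set[(String, String)])

  def subsets[A](xs: List[A]): List[Set[A]] = xs.toSet.subsets().toList

  val worlds: List[World] =
    for (s <- subsets(persons); c <- subsets(persons); f <- subsets(pairs))
      yield World(s, c, f)

  def I(b: Boolean): Double = if (b) 1.0 else 0.0
  def implies(a: Boolean, b: Boolean): Boolean = !a || b

  // Weighted sum of satisfied ground formulas
  def mln(w: World): Double =
    persons.map(p => 1.5 * I(implies(w.smokes(p), w.cancer(p)))).sum +
      pairs.map { case (p1, p2) =>
        1.1 * I(implies(w.friends((p1, p2)), w.smokes(p1) == w.smokes(p2)))
      }.sum

  def evidence(w: World): Boolean = w.smokes("anna") && w.friends(("anna", "bob"))

  // P(Cancer(bob) | evidence): sum of exp(mln) over matching worlds, normalized
  val mu: Double = {
    val ws = worlds.filter(evidence)
    val weights = ws.map(w => math.exp(mln(w)))
    ws.zip(weights).collect { case (w, wt) if w.cancer("bob") => wt }.sum / weights.sum
  }
}
```

Because Anna smokes and is Bob's friend, Bob smoking is favoured, which in turn favours Cancer(bob), so `mu` ends up above 0.5.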

Combined Relational+Factorization
    def mln(world: World) =
      sum(persons) { p => 1.5 * I(smokes(p) --> cancer(p)) }
    case class A(smokes: Seq[Double], cancer: Seq[Double])
    case class V(ents: Map[Symbol, Seq[Double]])
    def mf(w: World)(a: A)(v: V) =
      sum(persons) { p => I(w.smokes(p)) * (a.smokes dot v.ents(p)) } +
      sum(persons) { p => I(w.cancer(p)) * (a.cancer dot v.ents(p)) }
    def joint(w: World)(a: A)(v: V) = mln(w) + mf(w)(a)(v)
• Easily combine with existing models (or relearn parameters for them)
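A plain-Scala rendering of the combined score, under the assumption that relation embeddings (`A`) and entity embeddings (`V`) are plain `Seq[Double]` vectors and `dot` is an ordinary inner product. It is a sketch, not Wolfe code; it only shows that the joint model is the sum of the two component scores.

```scala
// MLN score plus matrix-factorization score; illustrative sketch only.
object Joint {
  val persons = List("anna", "bob")

  case class World(smokes: Set[String], cancer: Set[String])
  case class A(smokes: Seq[Double], cancer: Seq[Double]) // relation embeddings
  case class V(ents: Map[String, Seq[Double]])            // entity embeddings

  def dot(x: Seq[Double], y: Seq[Double]): Double =
    x.zip(y).map { case (p, q) => p * q }.sum

  def I(b: Boolean): Double = if (b) 1.0 else 0.0

  def mln(w: World): Double =
    persons.map(p => 1.5 * I(!w.smokes(p) || w.cancer(p))).sum

  // Each true atom contributes the dot product of its relation and entity embeddings
  def mf(w: World)(a: A)(v: V): Double =
    persons.map(p => I(w.smokes(p)) * dot(a.smokes, v.ents(p))).sum +
      persons.map(p => I(w.cancer(p)) * dot(a.cancer, v.ents(p))).sum

  def joint(w: World)(a: A)(v: V): Double = mln(w) + mf(w)(a)(v)
}
```

Because `joint` is an ordinary sum, the parameters of either component can be kept fixed or relearned with the same objective.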
