

SLIDE 1

The Automatic Statistician and Future Directions in Probabilistic Machine Learning

Zoubin Ghahramani Department of Engineering University of Cambridge

zoubin@eng.cam.ac.uk http://mlg.eng.cam.ac.uk/ http://www.automaticstatistician.com/ MLSS 2015, Tübingen

SLIDE 2

MACHINE LEARNING AS PROBABILISTIC MODELLING

◮ A model describes data that one could observe from a system

◮ If we use the mathematics of probability theory to express all forms of uncertainty and noise associated with our model...

◮ ...then inverse probability (i.e. Bayes rule) allows us to infer unknown quantities, adapt our models, make predictions and learn from data.

Zoubin Ghahramani 2 / 24

SLIDE 3

BAYES RULE

P(hypothesis|data) = P(data|hypothesis) P(hypothesis) / P(data)
                   = P(data|hypothesis) P(hypothesis) / Σ_h P(data|h) P(h)
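As a concrete illustration (the coin-flip hypotheses and all numbers below are invented for this sketch, not from the slides), Bayes rule over a discrete hypothesis space can be computed directly:

```python
from math import comb

# Hypothetical example: a coin is either fair (P(heads) = 0.5) or
# biased (P(heads) = 0.8), with equal prior probability.
prior = {"fair": 0.5, "biased": 0.5}
heads, flips = 8, 10  # observed data: 8 heads in 10 flips

def likelihood(p):
    # P(data | hypothesis): binomial probability of the observed counts.
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

lik = {"fair": likelihood(0.5), "biased": likelihood(0.8)}
evidence = sum(lik[h] * prior[h] for h in prior)  # P(data) = sum_h P(data|h) P(h)
posterior = {h: lik[h] * prior[h] / evidence for h in prior}
print(posterior)  # most of the posterior mass moves to "biased"
```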

Zoubin Ghahramani 3 / 24

SLIDE 4

BAYESIAN MACHINE LEARNING

Everything follows from two simple rules:

Sum rule:     P(x) = Σ_y P(x, y)
Product rule: P(x, y) = P(x) P(y|x)

Learning:

P(θ|D, m) = P(D|θ, m) P(θ|m) / P(D|m)

P(D|θ, m)  likelihood of parameters θ in model m
P(θ|m)     prior probability of θ
P(θ|D, m)  posterior of θ given data D

Prediction:

P(x|D, m) = ∫ P(x|θ, D, m) P(θ|D, m) dθ

Model Comparison:

P(m|D) = P(D|m) P(m) / P(D)
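Discretising θ makes all of these rules directly computable. The following sketch (a made-up coin-flip model, not from the slides) implements learning and prediction on a grid, so the prediction integral becomes a sum:

```python
import numpy as np

# Coin-flip model m: theta = P(heads), discretised on a grid.
theta = np.linspace(0.01, 0.99, 99)            # grid of parameter values
prior = np.full_like(theta, 1.0 / len(theta))  # uniform P(theta | m)

data = [1, 1, 0, 1, 1, 1, 0, 1]                # 1 = heads, 0 = tails
lik = np.prod([theta if x == 1 else 1 - theta for x in data], axis=0)

evidence = np.sum(lik * prior)    # P(D | m), usable for model comparison
posterior = lik * prior / evidence              # P(theta | D, m)
p_heads = np.sum(theta * posterior)             # P(x = heads | D, m)
print(p_heads)                    # pulled toward the empirical frequency 6/8
```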

Zoubin Ghahramani 4 / 24

SLIDE 5

WHEN IS THE PROBABILISTIC APPROACH ESSENTIAL?

Many aspects of learning and intelligence depend crucially on the careful probabilistic representation of uncertainty:

◮ Forecasting
◮ Decision making
◮ Learning from limited, noisy, and missing data
◮ Learning complex personalised models
◮ Data compression
◮ Automating scientific modelling, discovery, and experiment design

Zoubin Ghahramani 5 / 24

SLIDE 6

CURRENT AND FUTURE DIRECTIONS

◮ Probabilistic programming
◮ Bayesian optimisation
◮ Rational allocation of computational resources
◮ Probabilistic models for efficient data compression
◮ The automatic statistician

Zoubin Ghahramani 6 / 24

SLIDE 7

PROBABILISTIC PROGRAMMING

Problem: Probabilistic model development and the derivation of inference algorithms is time-consuming and error-prone.

Zoubin Ghahramani 7 / 24

SLIDE 8

PROBABILISTIC PROGRAMMING

Problem: Probabilistic model development and the derivation of inference algorithms is time-consuming and error-prone.

Solution:

◮ Develop Turing-complete Probabilistic Programming Languages for expressing probabilistic models as computer programs that generate data (i.e. simulators).

◮ Derive Universal Inference Engines for these languages that sample over program traces given observed data.

Example languages: Church, Venture, Anglican, Stochastic Python*, ones based on Haskell*, Julia*

Example inference algorithms: Metropolis-Hastings MCMC, variational inference, particle filtering, slice sampling*, particle MCMC, nested particle inference*, austerity MCMC*
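The essence of a universal inference engine can be sketched in a few lines. This toy uses likelihood weighting rather than the trace-MCMC methods listed above, and the generative model is invented: run the program many times and weight each trace by how well it explains the observation.

```python
import math
import random

def program():
    # A tiny generative program: choose a latent mean at random.
    return random.choice([-1.0, 1.0])

def likelihood(mu, obs, sigma=0.5):
    # Gaussian observation density (unnormalised) around the latent mean.
    return math.exp(-(obs - mu) ** 2 / (2 * sigma**2))

random.seed(0)
obs = 0.8
traces = []
for _ in range(10_000):
    mu = program()                    # run the simulator, recording its choice
    traces.append((mu, likelihood(mu, obs)))

total = sum(w for _, w in traces)
p_mu_pos = sum(w for mu, w in traces if mu > 0) / total
print(round(p_mu_pos, 3))             # posterior P(mu = +1 | obs), close to 1
```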

Zoubin Ghahramani 7 / 24

SLIDE 9

PROBABILISTIC PROGRAMMING

Example probabilistic programs implementing a 3-state hidden Markov model (HMM).

Julia:

    statesmean = [-1, 1, 0]                       # Emission parameters.
    initial = Categorical([1.0/3, 1.0/3, 1.0/3])  # Prob distr of state[1].
    trans = [Categorical([0.1, 0.5, 0.4]),
             Categorical([0.2, 0.2, 0.6]),
             Categorical([0.15, 0.15, 0.7])]      # Trans distr for each state.
    data = [Nil, 0.9, 0.8, 0.7, 0, -0.025, -5, -2, -0.1, 0, 0.13]

    @model hmm begin                              # Define a model hmm.
        states = Array(Int, length(data))
        @assume(states[1] ~ initial)
        for i = 2:length(data)
            @assume(states[i] ~ trans[states[i-1]])
            @observe(data[i] ~ Normal(statesmean[states[i]], 0.4))
        end
        @predict states
    end

Haskell:

    anglicanHMM :: Dist [n]
    anglicanHMM = fmap (take (length values) . fst) $
                  score (length values - 1) (hmm init trans gen)
      where
        states = [0, 1, 2]
        init = uniform states
        trans 0 = fromList $ zip states [0.1, 0.5, 0.4]
        trans 1 = fromList $ zip states [0.2, 0.2, 0.6]
        trans 2 = fromList $ zip states [0.15, 0.15, 0.7]
        gen 0 = certainly (-1)
        gen 1 = certainly 1
        gen 2 = certainly 0
        values = [0.9, 0.8, 0.7] :: [Double]
        addNoise = flip Normal 1
        score 0 d = d
        score n d = score (n - 1) $
                    condition d (prob . (`pdf` (values !! n)) . addNoise . (!! n) . snd)

[Figure: graphical model for the HMM, with parameters initial, trans, and statesmean generating states[1], states[2], states[3], ... and data[1], data[2], data[3], ...]

Probabilistic programming could revolutionise scientific modelling.

Zoubin Ghahramani 8 / 24

SLIDE 10

BAYESIAN OPTIMISATION

[Figure: GP posterior and acquisition function at t=3, with the acquisition maximum marking the next point; posterior and acquisition function at t=4 after the new observation.]

Problem: Global optimisation of black-box functions that are expensive to evaluate.

Zoubin Ghahramani 9 / 24

SLIDE 11

BAYESIAN OPTIMISATION

[Figure: GP posterior and acquisition function at t=3, with the acquisition maximum marking the next point; posterior and acquisition function at t=4 after the new observation.]

Problem: Global optimisation of black-box functions that are expensive to evaluate.

Solution: Treat this as a problem of sequential decision-making and model uncertainty in the function. This has myriad applications, from robotics to drug design, to learning neural networks, and speeding up model search in the automatic statistician.
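A single step of this loop can be sketched with a Gaussian-process surrogate and an upper-confidence-bound acquisition function. This is an illustrative recipe, not the predictive entropy search of the cited work; the kernel length-scale and test function are invented.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression equations on a grid of candidate points Xs.
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 0.0))

f = lambda x: -(x - 0.7) ** 2        # stand-in for the expensive black box
X = np.array([0.1, 0.4, 0.9])        # points evaluated so far
y = f(X)
Xs = np.linspace(0.0, 1.0, 200)      # candidate grid

mu, sd = gp_posterior(X, y, Xs)
ucb = mu + 2.0 * sd                  # acquisition: optimism under uncertainty
x_next = Xs[np.argmax(ucb)]          # where to spend the next evaluation
print(x_next)
```

The acquisition function trades off exploiting regions where the posterior mean is high against exploring regions where the posterior is uncertain.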

Zoubin Ghahramani 9 / 24

SLIDE 12

BAYESIAN OPTIMISATION

Figure 4. Classification error of a 3-hidden-layer neural network constrained to make predictions in under 2 ms.

(work with J.M. Hernández-Lobato, M.A. Gelbart, M.W. Hoffman, & R.P. Adams)

Zoubin Ghahramani 10 / 24

SLIDE 13

RATIONAL ALLOCATION OF COMPUTATIONAL RESOURCES

Problem: Many problems in machine learning and AI require the evaluation of a large number of alternative models on potentially large datasets. A rational agent needs to consider the tradeoff between statistical and computational efficiency.

Zoubin Ghahramani 11 / 24

SLIDE 14

RATIONAL ALLOCATION OF COMPUTATIONAL RESOURCES

Problem: Many problems in machine learning and AI require the evaluation of a large number of alternative models on potentially large datasets. A rational agent needs to consider the tradeoff between statistical and computational efficiency.

Solution: Treat the allocation of computational resources as a problem in sequential decision-making under uncertainty.
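One simple instantiation (illustrative only, not the method of the work cited on the next slide) treats candidate models as bandit arms and allocates evaluation rounds by Thompson sampling:

```python
import random

random.seed(0)
true_quality = [0.3, 0.5, 0.8]   # hidden per-model success rates (made up)
wins = [1, 1, 1]                 # Beta(1, 1) prior pseudo-counts per model
losses = [1, 1, 1]
counts = [0, 0, 0]               # evaluation rounds spent on each model

for _ in range(2000):
    # Sample a plausible quality for each model from its Beta posterior,
    # then spend this round's compute on the best-looking model.
    samples = [random.betavariate(w, l) for w, l in zip(wins, losses)]
    m = samples.index(max(samples))
    counts[m] += 1
    if random.random() < true_quality[m]:   # outcome of the cheap evaluation
        wins[m] += 1
    else:
        losses[m] += 1

print(counts)   # the budget concentrates on the best model over time
```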

Zoubin Ghahramani 11 / 24

SLIDE 15

RATIONAL ALLOCATION OF COMPUTATIONAL RESOURCES

Movie Link

(work with James R. Lloyd)

Zoubin Ghahramani 12 / 24

SLIDE 16

PROBABILISTIC DATA COMPRESSION

Problem: We often produce more data than we can store or transmit. (E.g. CERN → data centres, or Mars Rover → Earth.)

Zoubin Ghahramani 13 / 24

SLIDE 17

PROBABILISTIC DATA COMPRESSION

Problem: We often produce more data than we can store or transmit. (E.g. CERN → data centres, or Mars Rover → Earth.)

Solution:

◮ Use the same resources more effectively by predicting the data with a probabilistic model.

◮ Produce a description of the data that is (on average) cheaper to store or transmit.

Example: "PPM-DP" is based on a probabilistic model that learns and predicts symbol occurrences in sequences. It works on arbitrary files, but delivers cutting-edge compression results for human text.

Probabilistic models for human text also have many other applications aside from data compression, e.g. smart text entry methods, anomaly detection, and sequence synthesis.

(work with Christian Steinruecken and David J. C. MacKay)
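The underlying principle can be sketched directly (this is not PPM-DP; the adaptive model and text below are invented): a symbol predicted with probability p needs about -log2(p) bits, so a model that learns the data's statistics yields a shorter code.

```python
from math import log2

def code_length_bits(text, alphabet):
    counts = {c: 1 for c in alphabet}       # Laplace-smoothed adaptive counts
    bits = 0.0
    for c in text:
        total = sum(counts.values())
        bits += -log2(counts[c] / total)    # ideal code length for this symbol
        counts[c] += 1                      # update the model after coding it
    return bits

text = "abababababababab"
adaptive = code_length_bits(text, "ab")
uniform = len(text) * log2(256)             # naive 8 bits per byte
print(adaptive, uniform)                    # the adaptive code is far shorter
```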

Zoubin Ghahramani 13 / 24

SLIDE 18

PROBABILISTIC DATA COMPRESSION

Zoubin Ghahramani 14 / 24

SLIDE 19

THE AUTOMATIC STATISTICIAN

[Figure: system flowchart with components Data, Search, Language of models, Evaluation, Model, Prediction, Translation, Checking, and Report.]

Problem: Data are now ubiquitous, and there is great value in understanding these data, building models, and making predictions... however, there aren't enough data scientists, statisticians, and machine learning experts.

Solution: Develop a system that automates model discovery from data:

◮ processing data, searching over models, discovering a good model, and explaining what has been discovered to the user.

Zoubin Ghahramani 15 / 24

SLIDE 20

THE AUTOMATIC STATISTICIAN

[Figure: system flowchart with components Data, Search, Language of models, Evaluation, Model, Prediction, Translation, Checking, and Report.]

◮ An open-ended language of models
  ◮ Expressive enough to capture real-world phenomena...
  ◮ ...and the techniques used by human statisticians

◮ A search procedure
  ◮ To efficiently explore the language of models

◮ A principled method of evaluating models
  ◮ Trading off complexity and fit to data

◮ A procedure to automatically explain the models
  ◮ Making the assumptions of the models explicit...
  ◮ ...in a way that is intelligible to non-experts
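The complexity/fit tradeoff in the evaluation step can be illustrated with a toy model search (this is not the ABCD kernel search; the data and candidate models are invented): candidate polynomial models are scored with the Bayesian information criterion, which rewards fit but charges for extra parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.1 * rng.standard_normal(60)  # quadratic truth

def bic(degree):
    # n*log(MSE) rewards fit; k*log(n) penalises model complexity.
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n, k = len(x), degree + 1
    return n * np.log(np.mean(resid**2)) + k * np.log(n)

scores = {d: bic(d) for d in range(4)}   # candidate models: degrees 0..3
best = min(scores, key=scores.get)
print(best)   # degrees 0-1 underfit; degree 3 pays a penalty it rarely earns
```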

(work with J. R. Lloyd, D. Duvenaud, R. Grosse, and J. B. Tenenbaum)

Zoubin Ghahramani 16 / 24

SLIDE 21

EXAMPLE: AN ENTIRELY AUTOMATIC ANALYSIS

[Figure: raw data (1950–1962) and full model posterior with extrapolations.]

Four additive components have been identified in the data:

◮ A linearly increasing function.
◮ An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
◮ A smooth function.
◮ Uncorrelated noise with linearly increasing standard deviation.

Zoubin Ghahramani 17 / 24

SLIDE 22

EXAMPLE REPORTS

An automatic report for the dataset: 02-solar

The Automatic Statistician

Abstract

This report was produced by the Automatic Bayesian Covariance Discovery (ABCD) algorithm.

1 Executive summary

The raw data and full model posterior with extrapolations are shown in figure 1.


Figure 1: Raw data (left) and model posterior with extrapolation (right)

The structure search algorithm has identified eight additive components in the data. The first 4 additive components explain 92.3% of the variation in the data as shown by the coefficient of determination (R²) values in table 1. The first 6 additive components explain 99.7% of the variation in the data. After the first 5 components the cross validated mean absolute error (MAE) does not decrease by more than 0.1%. This suggests that subsequent terms are modelling very short term trends, uncorrelated noise or are artefacts of the model or search procedure. Short summaries of the additive components are as follows:

  • A constant.
  • A constant. This function applies from 1643 until 1716.
  • A smooth function. This function applies until 1643 and from 1716 onwards.
  • An approximately periodic function with a period of 10.8 years. This function applies until 1643 and from 1716 onwards.
  • A rapidly varying smooth function. This function applies until 1643 and from 1716 onwards.
  • Uncorrelated noise with standard deviation increasing linearly away from 1837. This function applies until 1643 and from 1716 onwards.
  • Uncorrelated noise with standard deviation increasing linearly away from 1952. This function applies until 1643 and from 1716 onwards.
  • Uncorrelated noise. This function applies from 1643 until 1716.

Model checking statistics are summarised in table 2 in section 4. These statistics have revealed statistically significant discrepancies between the data and model in component 8.

An automatic report for the dataset: 07-call-centre

The Automatic Statistician

Abstract

This report was produced by the Automatic Bayesian Covariance Discovery (ABCD) algorithm.

1 Executive summary

The raw data and full model posterior with extrapolations are shown in figure 1.


Figure 1: Raw data (left) and model posterior with extrapolation (right)

The structure search algorithm has identified six additive components in the data. The first 2 additive components explain 94.5% of the variation in the data as shown by the coefficient of determination (R²) values in table 1. The first 3 additive components explain 99.1% of the variation in the data. After the first 4 components the cross validated mean absolute error (MAE) does not decrease by more than 0.1%. This suggests that subsequent terms are modelling very short term trends, uncorrelated noise or are artefacts of the model or search procedure. Short summaries of the additive components are as follows:

  • A linearly increasing function. This function applies until Feb 1974.
  • A very smooth monotonically increasing function. This function applies from Feb 1974 onwards.
  • A smooth function with marginal standard deviation increasing linearly away from Feb 1964. This function applies until Feb 1974.
  • An exactly periodic function with a period of 1.0 years. This function applies until Feb 1974.
  • Uncorrelated noise. This function applies until May 1973 and from Oct 1973 onwards.
  • Uncorrelated noise. This function applies from May 1973 until Oct 1973.

Model checking statistics are summarised in table 2 in section 4. These statistics have not revealed any inconsistencies between the model and observed data.

The rest of the document is structured as follows. In section 2 the forms of the additive components are described and their posterior distributions are displayed. In section 3 the modelling assumptions of each component are discussed with reference to how this affects the extrapolations made by the model.

See http://www.automaticstatistician.com

Zoubin Ghahramani 18 / 24

SLIDE 23

GOOD PREDICTIVE PERFORMANCE AS WELL

Standardised RMSE over 13 data sets

[Figure: box plot of standardised RMSE for ABCD accuracy, ABCD interpretability, spectral kernels, trend/cyclical/irregular, Bayesian MKL, Eureqa, changepoints, squared exponential, and linear regression.]

◮ Tweaks can be made to the algorithm to improve accuracy or interpretability of the models produced...

◮ ...but both methods are highly competitive at extrapolation (shown above) and interpolation.

Zoubin Ghahramani 19 / 24

SLIDE 24

SUMMARY: THE AUTOMATIC STATISTICIAN

◮ We have presented the beginnings of an automatic statistician

◮ Our system
  ◮ Defines an open-ended language of models
  ◮ Searches greedily through this space
  ◮ Produces detailed reports describing patterns in data
  ◮ Performs automatic model criticism

◮ Extrapolation and interpolation performance highly competitive

◮ We believe this line of research has the potential to make powerful statistical model-building techniques accessible to non-experts

Zoubin Ghahramani 20 / 24

SLIDE 25

CONCLUSIONS

Probabilistic modelling offers a framework for building systems that reason about uncertainty and learn from data, going beyond traditional pattern recognition problems. I have reviewed some of the frontiers of research, including:

◮ Probabilistic programming
◮ Bayesian optimisation
◮ Rational allocation of computational resources
◮ Probabilistic models for efficient data compression
◮ The automatic statistician

Thanks!

Zoubin Ghahramani 21 / 24

SLIDE 26

APPENDIX: MODEL CHECKING AND CRITICISM

◮ Good statistical modelling should include model criticism:
  ◮ Does the data match the assumptions of the model?
  ◮ For example, if the model assumed Gaussian noise, does a Q-Q plot reveal non-Gaussian residuals?

◮ Our automatic statistician does posterior predictive checks, dependence tests and residual tests

◮ We have also been developing more systematic nonparametric approaches to model criticism using kernel two-sample testing with MMD.
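The kernel two-sample statistic at the heart of this approach is short to write down. The sketch below (toy data, biased MMD² estimator; not the test procedure of the cited paper) flags a misspecified model by comparing its samples with the observed data:

```python
import numpy as np

def mmd2(x, y, ell=1.0):
    # Biased estimate of squared maximum mean discrepancy with an RBF kernel.
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)
    m, n = len(x), len(y)
    return k(x, x).sum() / m**2 + k(y, y).sum() / n**2 - 2 * k(x, y).sum() / (m * n)

rng = np.random.default_rng(0)
data = rng.standard_normal(200)        # "observed" data
good = rng.standard_normal(200)        # samples from a well-specified model
bad = rng.exponential(1.0, 200)        # samples from a misspecified model

print(mmd2(data, good), mmd2(data, bad))  # the misspecified model scores higher
```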

Lloyd, J. R., and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests. http://mlg.eng.cam.ac.uk/Lloyd/papers/kernel-model-checking.pdf

Zoubin Ghahramani 22 / 24

SLIDE 27

PAPERS

General:

Ghahramani, Z. (2013) Bayesian nonparametrics and the probabilistic approach to modelling. Philosophical Trans. Royal Society A 371: 20110553.

Ghahramani, Z. (2015) Probabilistic machine learning and artificial intelligence. Nature 521:452–459. http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html

Automatic Statistician:

Website: http://www.automaticstatistician.com

Duvenaud, D., Lloyd, J. R., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. ICML 2013.

Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2014) Automatic Construction and Natural-language Description of Nonparametric Regression Models. AAAI 2014. http://arxiv.org/pdf/1402.4304v2.pdf

Lloyd, J. R., and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests. http://mlg.eng.cam.ac.uk/Lloyd/papers/kernel-model-checking.pdf

Zoubin Ghahramani 23 / 24

SLIDE 28

PAPERS II

Bayesian Optimisation:

Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. (2014) Predictive entropy search for efficient global optimization of black-box functions. NIPS 2014.

Hernández-Lobato, J. M., Gelbart, M. A., Hoffman, M. W., Adams, R. P., and Ghahramani, Z. (2015) Predictive Entropy Search for Bayesian Optimization with Unknown Constraints. arXiv:1502.05312.

Data Compression:

Steinruecken, C., Ghahramani, Z. and MacKay, D. J. C. (2015) Improving PPM with dynamic parameter updates. Data Compression Conference (DCC 2015), Snowbird, Utah.

Probabilistic Programming:

Chen, Y., Mansinghka, V., Ghahramani, Z. (2014) Sublinear-Time Approximate MCMC Transitions for Probabilistic Programs. arXiv:1411.1690.

Zoubin Ghahramani 24 / 24