

SLIDE 1

The Automatic Statistician

an AI for Data Science

Zoubin Ghahramani Department of Engineering University of Cambridge

zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Intelligent Machines, Nijmegen, 2015

James Robert Lloyd (Cambridge), David Duvenaud (Cambridge → Harvard), Roger Grosse (MIT → Toronto), Josh Tenenbaum (MIT)

SLIDE 2

THERE IS A GROWING NEED FOR DATA ANALYSIS

◮ We live in an era of abundant data
◮ The McKinsey Global Institute claims: “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”
◮ Diverse fields increasingly rely on expert statisticians, machine learning researchers and data scientists, e.g.
  ◮ Computational sciences (e.g. biology, astronomy, . . . )
  ◮ Online advertising
  ◮ Quantitative finance
  ◮ . . .

James Robert Lloyd and Zoubin Ghahramani 2 / 43

SLIDE 3

WHAT WOULD AN AUTOMATIC STATISTICIAN DO?

[Pipeline: Data → Search (through the language of models) → Evaluation → Model → Prediction / Translation / Checking → Report]

SLIDE 4

GOALS OF THE AUTOMATIC STATISTICIAN PROJECT

◮ Provide a set of tools for understanding data that require minimal expert input
◮ Uncover challenging research problems in e.g.
  ◮ Automated inference
  ◮ Model construction and comparison
  ◮ Data visualisation and interpretation
◮ Advance the field of machine learning in general

SLIDE 5

INGREDIENTS OF AN AUTOMATIC STATISTICIAN


◮ An open-ended language of models
  ◮ Expressive enough to capture real-world phenomena. . .
  ◮ . . . and the techniques used by human statisticians
◮ A search procedure
  ◮ To efficiently explore the language of models
◮ A principled method of evaluating models
  ◮ Trading off complexity and fit to data
◮ A procedure to automatically explain the models
  ◮ Making the assumptions of the models explicit. . .
  ◮ . . . in a way that is intelligible to non-experts

SLIDE 6

PREVIEW: AN ENTIRELY AUTOMATIC ANALYSIS

[Figure: raw data and full model posterior with extrapolations, 1950–1962]

Four additive components have been identified in the data

◮ A linearly increasing function.
◮ An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
◮ A smooth function.
◮ Uncorrelated noise with linearly increasing standard deviation.

SLIDE 7

DEFINING A LANGUAGE OF MODELS


SLIDE 8

DEFINING A LANGUAGE OF REGRESSION MODELS

◮ Regression consists of learning a function f : X → Y from inputs to outputs, from example input / output pairs
◮ The language should include simple parametric forms. . .
  ◮ e.g. linear functions, polynomials, exponential functions
◮ . . . as well as functions specified by high-level properties
  ◮ e.g. smoothness, periodicity
◮ Inference should be tractable for all models in the language

SLIDE 9

WE CAN BUILD REGRESSION MODELS WITH GAUSSIAN PROCESSES

◮ GPs are distributions over functions such that any finite subset of function evaluations, (f(x1), f(x2), . . . , f(xN)), has a joint Gaussian distribution
◮ A GP is completely specified by
  ◮ a mean function, µ(x) = E(f(x))
  ◮ a covariance / kernel function, k(x, x′) = Cov(f(x), f(x′))
◮ Denoted f ∼ GP(µ, k)

[Figure: four panels showing a GP posterior mean and uncertainty band being updated as successive data points are observed]
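The posterior shown in the figure can be written down directly. Below is a minimal numpy sketch (not the project's code) of GP posterior inference with an SE kernel; the data points and hyperparameter values are illustrative assumptions:

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP with an SE kernel."""
    K = se_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)
    K_ss = se_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Toy data: three noisy observations of sin(x).
x = np.array([-1.0, 0.0, 1.0])
y = np.sin(x)
# Query at an observed location (0.0) and an unobserved one (2.0):
mean, var = gp_posterior(x, y, np.array([0.0, 2.0]))
```

As in the figure, the posterior uncertainty is small near the observed points and grows away from them.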

SLIDE 10

THE ATOMS OF OUR LANGUAGE

Five base kernels, and the types of functions they encode:

  Squared exp. (SE)  →  smooth functions
  Periodic (PER)     →  periodic functions
  Linear (LIN)       →  linear functions
  Constant (C)       →  constant functions
  White noise (WN)   →  Gaussian noise
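As a hypothetical illustration (standard textbook kernel forms, with all hyperparameters fixed to 1; not the system's parameterisation), the five base kernels can be written as plain functions of scalar inputs:

```python
import numpy as np

def SE(x, y):                     # smooth functions
    return np.exp(-0.5 * (x - y) ** 2)

def PER(x, y):                    # periodic functions (period 1)
    return np.exp(-2 * np.sin(np.pi * (x - y)) ** 2)

def LIN(x, y):                    # linear functions
    return x * y

def C(x, y):                      # constant functions
    return 1.0

def WN(x, y):                     # Gaussian (white) noise
    return 1.0 if x == y else 0.0
```

For example, PER(0.0, 1.0) is (numerically) 1: inputs one period apart are perfectly correlated.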

SLIDE 11

THE COMPOSITION RULES OF OUR LANGUAGE

◮ Two main operations: addition, multiplication

  LIN × LIN  →  quadratic functions
  SE × PER   →  locally periodic
  LIN + PER  →  periodic plus linear trend
  SE + PER   →  periodic plus smooth trend
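Because sums and products of kernels are again kernels, the two composition rules can be sketched as higher-order functions; the kernel forms below are simplified illustrations with hyperparameters fixed to 1:

```python
import numpy as np

def SE(x, y):  return np.exp(-0.5 * (x - y) ** 2)
def PER(x, y): return np.exp(-2 * np.sin(np.pi * (x - y)) ** 2)
def LIN(x, y): return x * y

def add(k1, k2):
    # A sum of kernels is a kernel.
    return lambda x, y: k1(x, y) + k2(x, y)

def mul(k1, k2):
    # A product of kernels is a kernel.
    return lambda x, y: k1(x, y) * k2(x, y)

quadratic = mul(LIN, LIN)            # LIN × LIN: quadratic functions
locally_periodic = mul(SE, PER)      # SE × PER: locally periodic
periodic_plus_trend = add(LIN, PER)  # LIN + PER: periodic plus linear trend
```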

SLIDE 12

AN EXPRESSIVE LANGUAGE OF MODELS

  Regression model            Kernel
  GP smoothing                SE + WN
  Linear regression           C + LIN + WN
  Multiple kernel learning    ∑ SE + WN
  Trend, cyclical, irregular  ∑ SE + ∑ PER + WN
  Fourier decomposition       C + ∑ cos + WN
  Sparse spectrum GPs         ∑ cos + WN
  Spectral mixture            ∑ SE × cos + WN
  Changepoints                e.g. CP(SE, SE) + WN
  Heteroscedasticity          e.g. SE + LIN × WN

Note: cos is a special case of our version of PER

SLIDE 13

DISCOVERING A GOOD MODEL VIA SEARCH


SLIDE 14

DISCOVERING A GOOD MODEL VIA SEARCH

◮ The language is defined as the arbitrary composition of five base kernels (WN, C, LIN, SE, PER) via three operators (+, ×, CP)
◮ The space spanned by this language is open-ended and can have a high branching factor, requiring a judicious search
◮ We propose a greedy search for its simplicity and similarity to human model-building
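The greedy search can be sketched as follows. The string-based expression syntax and the toy scoring function are illustrative stand-ins; the real system scores candidates with the BIC-penalised marginal likelihood covered under model evaluation:

```python
BASE = ["WN", "C", "LIN", "SE", "PER"]

def expand(expr):
    """One-step expansions: combine expr with each base kernel via +, *, CP."""
    out = []
    for b in BASE:
        out.append(f"({expr} + {b})")
        out.append(f"({expr} * {b})")
        out.append(f"CP({expr}, {b})")
    return out

def greedy_search(score, start="WN", depth=3):
    """Keep only the single best-scoring expansion at each level."""
    best = start
    for _ in range(depth):
        challenger = max(expand(best), key=score)
        if score(challenger) <= score(best):
            break  # no improvement: stop early
        best = challenger
    return best

# A toy score standing in for the penalised marginal likelihood:
# reward SE and PER components, penalise expression length.
toy_score = lambda e: 5 * e.count("SE") + 5 * e.count("PER") - 0.1 * len(e)
result = greedy_search(toy_score, start="LIN", depth=1)
# → "(LIN + SE)"
```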

SLIDE 15

EXAMPLE: MAUNA LOA KEELING CURVE

[Figure: RQ kernel fit to the data, 2000–2010, with the search tree: Start → {SE, RQ, LIN, PER} → {SE + RQ, PER + RQ, PER × RQ, . . . } → {SE + PER + RQ, SE × (PER + RQ), . . . }]

SLIDE 16

EXAMPLE: MAUNA LOA KEELING CURVE

[Figure: (PER + RQ) fit to the data, 2000–2010, with the same search tree]

SLIDE 17

EXAMPLE: MAUNA LOA KEELING CURVE

[Figure: SE × (PER + RQ) fit to the data, 2000–2010, with the same search tree]

SLIDE 18

EXAMPLE: MAUNA LOA KEELING CURVE

[Figure: (SE + SE × (PER + RQ)) fit to the data, 2000–2010, with the same search tree]

SLIDE 19

MODEL EVALUATION


SLIDE 20

MODEL EVALUATION

◮ After proposing a new model, its kernel parameters are optimised by conjugate gradients
◮ We evaluate each optimised model, M, using the model evidence (marginal likelihood), which can be computed analytically for GPs
◮ We penalise the marginal likelihood for the optimised kernel parameters using the Bayesian Information Criterion (BIC):

  −0.5 × BIC(M) = log p(D | M) − (p/2) log n

  where p is the number of kernel parameters, D represents the data, and n is the number of data points.
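The evaluation criterion is easy to state in code. This small helper is an illustration, not the system's implementation, and the evidence values below are made up:

```python
import math

def neg_half_bic(log_evidence, num_params, num_data):
    """-0.5 * BIC(M) = log p(D | M) - (p / 2) * log n, as on the slide."""
    return log_evidence - 0.5 * num_params * math.log(num_data)

# A model with slightly higher evidence but many more parameters can
# still lose to a simpler one under this criterion:
score_simple = neg_half_bic(-100.0, num_params=2, num_data=1000)
score_complex = neg_half_bic(-98.0, num_params=10, num_data=1000)
```

Here the complex model's 8 extra parameters cost more than its 2 nats of extra evidence, so the simple model wins.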

SLIDE 21

AUTOMATIC TRANSLATION OF MODELS


SLIDE 22

AUTOMATIC TRANSLATION OF MODELS

◮ Search can produce arbitrarily complicated models from an open-ended language, but two main properties allow description to be automated
◮ Kernels can be decomposed into a sum of products
  ◮ A sum of kernels corresponds to a sum of functions
  ◮ Therefore, we can describe each product of kernels separately
◮ Each kernel in a product modifies a model in a consistent way
  ◮ Each kernel roughly corresponds to an adjective
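The kernel-to-adjective idea can be sketched with lookup tables. The phrase tables and the heuristic for choosing the head noun below are hypothetical, simplified from the behaviour shown on the worked slides:

```python
# Hypothetical phrase tables: one kernel in a product supplies the noun
# phrase, the others act as modifiers (adjectives / qualifier phrases).
NOUN = {
    "PER": "periodic function",
    "LIN": "linear function",
    "SE": "smooth function",
    "WN": "uncorrelated noise",
    "C": "constant function",
}
PRE = {"SE": "approximately"}                      # modifiers placed before
POST = {"LIN": "with linearly growing amplitude"}  # modifiers placed after

def describe_product(kernels):
    """Describe a product of base kernels as an English noun phrase."""
    kernels = list(kernels)
    noun = "PER" if "PER" in kernels else kernels[0]  # heuristic head choice
    kernels.remove(noun)
    words = [PRE[k] for k in kernels if k in PRE]
    words.append(NOUN[noun])
    words += [POST[k] for k in kernels if k in POST]
    return " ".join(words)
```

With these tables, describe_product(["SE", "PER", "LIN"]) yields "approximately periodic function with linearly growing amplitude", mirroring the build-up on the following slides.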

SLIDE 23

SUM OF PRODUCTS NORMAL FORM

Suppose the search finds the following kernel

  SE × (WN × LIN + CP(C, PER))

SLIDE 24

SUM OF PRODUCTS NORMAL FORM

Suppose the search finds the following kernel

  SE × (WN × LIN + CP(C, PER))

The changepoint can be converted into a sum of products

  SE × (WN × LIN + C × σ + PER × σ̄)

SLIDE 25

SUM OF PRODUCTS NORMAL FORM

Suppose the search finds the following kernel

  SE × (WN × LIN + CP(C, PER))

The changepoint can be converted into a sum of products

  SE × (WN × LIN + C × σ + PER × σ̄)

Multiplication can be distributed over addition

  SE × WN × LIN + SE × C × σ + SE × PER × σ̄

SLIDE 26

SUM OF PRODUCTS NORMAL FORM

Suppose the search finds the following kernel

  SE × (WN × LIN + CP(C, PER))

The changepoint can be converted into a sum of products

  SE × (WN × LIN + C × σ + PER × σ̄)

Multiplication can be distributed over addition

  SE × WN × LIN + SE × C × σ + SE × PER × σ̄

Simplification rules are applied

  WN × LIN + SE × σ + SE × PER × σ̄
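The distribution step can be automated symbolically. In this sketch, kernel expressions are nested tuples, and changepoints are assumed to have already been rewritten into sigmoids (σ written as "sigma", σ̄ as "sigma_bar"); the representation is an illustration, not the system's:

```python
from itertools import product

def to_sum_of_products(expr):
    """Flatten a kernel expression into a list of products of base kernels.

    Expressions are either a base-kernel name (a string) or a tuple
    ("+", [args]) / ("*", [args])."""
    if isinstance(expr, str):
        return [[expr]]
    op, args = expr
    parts = [to_sum_of_products(a) for a in args]
    if op == "+":
        return [term for part in parts for term in part]
    if op == "*":
        # Distribute: (a1 + a2) * (b1 + b2) = a1*b1 + a1*b2 + a2*b1 + a2*b2
        return [sum(combo, []) for combo in product(*parts)]
    raise ValueError(f"unknown operator {op!r}")

# SE × (WN × LIN + C × σ + PER × σ̄):
expr = ("*", ["SE", ("+", [("*", ["WN", "LIN"]),
                           ("*", ["C", "sigma"]),
                           ("*", ["PER", "sigma_bar"])])])
terms = to_sum_of_products(expr)
# → [['SE', 'WN', 'LIN'], ['SE', 'C', 'sigma'], ['SE', 'PER', 'sigma_bar']]
```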

SLIDE 27

SUMS OF KERNELS ARE SUMS OF FUNCTIONS

If f1 ∼ GP(0, k1) and independently f2 ∼ GP(0, k2), then

  f1 + f2 ∼ GP(0, k1 + k2)

e.g.

[Figure: two examples, each showing a posterior decomposed into the sum of three additive component functions]

We can therefore describe each component separately
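The additivity property is easy to exercise numerically. A small sketch with two SE components whose lengthscales are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 5)

def se(x, ell):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

K1 = se(x, ell=0.1)   # fast-varying component
K2 = se(x, ell=2.0)   # slow trend component
jitter = 1e-9 * np.eye(len(x))  # for numerical stability

# Independent draws f1 ~ GP(0, k1) and f2 ~ GP(0, k2); because they are
# independent, Cov(f1 + f2) = K1 + K2, i.e. f1 + f2 ~ GP(0, k1 + k2).
f1 = rng.multivariate_normal(np.zeros(len(x)), K1 + jitter)
f2 = rng.multivariate_normal(np.zeros(len(x)), K2 + jitter)
f = f1 + f2
K_sum = K1 + K2
```

Each SE kernel has unit variance on the diagonal, so the summed process has variance 2 at every input.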

SLIDE 28

PRODUCTS OF KERNELS

PER: a periodic function

On its own, each kernel is described by a standard noun phrase

SLIDE 29

PRODUCTS OF KERNELS - SE

SE × PER: an approximately periodic function

Multiplication by SE removes long-range correlations from a model, since SE(x, x′) decreases monotonically to 0 as |x − x′| increases.

SLIDE 30

PRODUCTS OF KERNELS - LIN

SE × PER × LIN: an approximately periodic function with linearly growing amplitude

Multiplication by LIN is equivalent to multiplying the function being modeled by a linear function. If f(x) ∼ GP(0, k), then x f(x) ∼ GP(0, k × LIN). This causes the standard deviation of the model to vary linearly without affecting the correlation.
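The effect of multiplying by LIN can be checked directly on kernel matrices; a sketch with illustrative positive inputs (so that correlations are unchanged):

```python
import numpy as np

xs = np.array([0.5, 1.0, 2.0])
d = xs[:, None] - xs[None, :]
K = np.exp(-0.5 * d ** 2)            # SE kernel matrix, k(x, x')

# If f ~ GP(0, k), then g(x) = x * f(x) has covariance
# Cov(g(x), g(x')) = x * x' * k(x, x'), i.e. the kernel k × LIN
# with LIN(x, x') = x * x'.
K_g = np.outer(xs, xs) * K

# The marginal standard deviation grows linearly with |x| ...
std_g = np.sqrt(np.diag(K_g))        # equals |x| here since k(x, x) = 1
# ... while the correlation structure is unchanged (for positive x):
corr_g = K_g / np.outer(std_g, std_g)
```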

SLIDE 31

PRODUCTS OF KERNELS - CHANGEPOINTS

SE × PER × LIN × σ: an approximately periodic function with linearly growing amplitude, until 1700

Multiplication by σ is equivalent to multiplying the function being modeled by a sigmoid.

SLIDE 32

AUTOMATICALLY GENERATED REPORTS


SLIDE 33

EXAMPLE: AIRLINE PASSENGER VOLUME

[Figure: raw data and full model posterior with extrapolations, 1950–1962]

Four additive components have been identified in the data

◮ A linearly increasing function.
◮ An approximately periodic function with a period of 1.0 years and with linearly increasing amplitude.
◮ A smooth function.
◮ Uncorrelated noise with linearly increasing standard deviation.

SLIDE 34

EXAMPLE: AIRLINE PASSENGER VOLUME

This component is linearly increasing.

[Figure: posterior of component 1 and sum of components up to component 1, 1950–1960]

SLIDE 35

EXAMPLE: AIRLINE PASSENGER VOLUME

This component is approximately periodic with a period of 1.0 years and varying amplitude. Across periods the shape of this function varies very smoothly. The amplitude of the function increases linearly. The shape of this function within each period has a typical lengthscale of 6.0 weeks.

[Figure: posterior of component 2 and sum of components up to component 2, 1950–1960]

SLIDE 36

EXAMPLE: AIRLINE PASSENGER VOLUME

This component is a smooth function with a typical lengthscale of 8.1 months.

[Figure: posterior of component 3 and sum of components up to component 3, 1950–1960]

SLIDE 37

EXAMPLE: AIRLINE PASSENGER VOLUME

This component models uncorrelated noise. The standard deviation of the noise increases linearly.

[Figure: posterior of component 4 and sum of components up to component 4, 1950–1960]

SLIDE 38

OTHER EXAMPLES

See http://www.automaticstatistician.com


SLIDE 39

GOOD PREDICTIVE PERFORMANCE AS WELL

Standardised RMSE over 13 data sets

[Bar chart: standardised RMSE for ABCD accuracy, ABCD interpretability, spectral kernels, trend/cyclical/irregular, Bayesian MKL, Eureqa, changepoints, squared exponential and linear regression]

◮ Tweaks can be made to the algorithm to improve accuracy or interpretability of the models produced. . .
◮ . . . but both methods are highly competitive at extrapolation (shown above) and interpolation

SLIDE 40

MODEL CHECKING AND CRITICISM

◮ Good statistical modelling should include model criticism:
  ◮ Does the data match the assumptions of the model?
  ◮ For example, if the model assumed Gaussian noise, does a Q-Q plot reveal non-Gaussian residuals?
◮ Our automatic statistician does posterior predictive checks, dependence tests and residual tests
◮ We have also been developing more systematic nonparametric approaches to model criticism using kernel two-sample testing with MMD

Lloyd, J. R., and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests. http://mlg.eng.cam.ac.uk/Lloyd/papers/kernel-model-checking.pdf

SLIDE 41

CHALLENGES

◮ Interpretability / accuracy
◮ Increasing the expressivity of the language
  ◮ e.g. monotonicity, positive functions, symmetries
◮ Computational complexity of searching through a huge space of models
◮ Extending the automatic reports to multidimensional datasets
  ◮ Search and descriptions naturally extend to multiple dimensions, but automatically generating relevant visual summaries is harder

SLIDE 42

CURRENT AND FUTURE DIRECTIONS

◮ Automatic statistician for:
  * One-dimensional time series
  * Linear regression (classical)
◮ Multivariate nonlinear regression (c.f. Duvenaud, Lloyd et al., ICML 2013)
◮ Multivariate classification (w/ Mrksic), clustering and learning graphical models (w/ Smith, Lloyd)
◮ Single-cell transcriptomics (gene expression) data?? (w/ Lloyd, Stegle, Buettner)
◮ Probabilistic programming
◮ Bayesian optimisation, and the rational allocation of computational resources

SLIDE 43

PROBABILISTIC PROGRAMMING

Problem: Probabilistic model development and the derivation of inference algorithms is time-consuming and error-prone.

Solution:
◮ Develop Turing-complete Probabilistic Programming Languages for expressing probabilistic models as computer programs that generate data (i.e. simulators)
◮ Derive Universal Inference Engines for these languages that sample over program traces given observed data
◮ This can be used to implement search over the model space in an automatic statistician

Example languages: Church, Venture, Anglican, Stochastic Python*, ones based on Haskell*, Julia*

Example inference algorithms: Metropolis-Hastings MCMC, variational inference, particle filtering, slice sampling*, particle MCMC, nested particle inference*, austerity MCMC*

SLIDE 44

PROBABILISTIC PROGRAMMING

    sProbs = (1.0/3, 1.0/3, 1.0/3)
    tProbs = {0: (0.1, 0.5, 0.4),
              1: (0.2, 0.2, 0.6),
              2: (0.15, 0.15, 0.7)}
    eMeans = (-1, 1, 0)

    def hmm():
        states = []
        states.append(stocPy.categorical(sProbs, obs=True))
        for ind, ob in stocPy.readCSV("hmm-data.csv"):
            states.append(stocPy.categorical(tProbs[states[ind]], obs=True))
            stocPy.normal(eMeans[states[ind]], 1, cond=ob)

An example probabilistic program in StocPy implementing a 3-state hidden Markov model (HMM).

SLIDE 45

BAYESIAN OPTIMISATION

[Figure: GP posterior and acquisition function at t = 3; the maximum of the acquisition function selects the next point, and the posterior is updated with the new observation at t = 4]

Problem: Global optimisation of black-box functions that are expensive to evaluate.

Solution: Treat this as a problem of sequential decision-making, and model uncertainty in the function. This can speed up model search in the automatic statistician.
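A toy sketch of the loop in the figure: a GP surrogate with an SE kernel and an upper-confidence-bound acquisition. The objective, acquisition rule and all constants are illustrative assumptions, not the talk's method:

```python
import numpy as np

def f(x):                                  # toy expensive black-box function
    return -(x - 0.3) ** 2

def se(a, b, ell=0.1):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_post(X, y, grid, noise=1e-6):
    """GP posterior mean and variance on a grid (zero mean, SE kernel)."""
    K = se(X, X) + noise * np.eye(len(X))
    Ks = se(X, grid)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.maximum(var, 0.0)

grid = np.linspace(0.0, 1.0, 101)
X = np.array([0.0, 1.0])                   # initial observations
y = f(X)
for _ in range(5):
    mu, var = gp_post(X, y, grid)
    acq = mu + 2.0 * np.sqrt(var)          # upper confidence bound
    x_next = grid[np.argmax(acq)]          # next point: acquisition maximum
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))

best = X[np.argmax(y)]                     # best input found so far
```

Each round the acquisition trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), exactly the behaviour the two panels of the figure depict.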

SLIDE 46

SUMMARY

◮ We have presented the beginnings of an automatic statistician
◮ Our system
  ◮ Defines an open-ended language of models
  ◮ Searches greedily through this space
  ◮ Produces detailed reports describing patterns in data
  ◮ Performs automatic model criticism
◮ Extrapolation and interpolation performance is highly competitive
◮ We believe this line of research has the potential to make powerful statistical model-building techniques accessible to non-experts

SLIDE 47

REFERENCES

Website: http://www.automaticstatistician.com

Duvenaud, D., Lloyd, J. R., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2013) Structure Discovery in Nonparametric Regression through Compositional Kernel Search. ICML 2013.

Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. (2014) Predictive entropy search for efficient global optimization of black-box functions. NIPS 2014.

Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J. B. and Ghahramani, Z. (2014) Automatic Construction and Natural-language Description of Nonparametric Regression Models. AAAI 2014. http://arxiv.org/pdf/1402.4304v2.pdf

Lloyd, J. R., and Ghahramani, Z. (2014) Statistical Model Criticism using Kernel Two Sample Tests. http://mlg.eng.cam.ac.uk/Lloyd/papers/kernel-model-checking.pdf

Ghahramani, Z. (2013) Bayesian nonparametrics and the probabilistic approach to modelling. Philosophical Trans. Royal Society A 371: 20110553.

Ranca, R. (2015) StocPy. https://github.com/RazvanRanca/StocPy

Ranca, R. and Ghahramani, Z. (2015) Slice sampling for probabilistic programming. http://arxiv.org/abs/1501.04684

Valera, I. and Ghahramani, Z. (2014) General Table Completion using a Bayesian Nonparametric Model. NIPS 2014.