SLIDE 1

5/22/2016 1

Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces

April 18, 2016
William Herlands
Committee: Daniel Neill, Alex Smola, Wilbert van Panhuis
Chair: Dave Choi

Outline

  • 1. Motivation
  • 2. Gaussian process introduction
  • 3. Change surface model
  • 4. Analysis of measles in the United States

SLIDE 2

Complex Changes

  • In human systems, changes are often complex
    – Policy interventions take time to trickle through government bureaucracy
    – Environmental hazards affect populations differentially
  • Simple changepoint models are not sufficiently expressive

Why do we care?

  • Understand past changes
    – Explore spatio-temporal heterogeneity
    – Model the rate of change in different areas
  • Enable more accurate or equitable policies
  • Applications
    – Measles incidence in the U.S.
    – Concerns about lead-tainted water in NYC

SLIDE 3

Our objectives

  • Model complex changes in real-world data
    – Non-discrete changes
    – Non-monotonic changes
    – Heterogeneous changes over space, time, etc.
    – Multiple, flexible function regimes
  • Gaussian processes for flexible functions
  • “Change surfaces” for complex changes

Gaussian Processes (GP)

  • Non-parametric prior over smooth functions
  • The covariance function is a kernel; it defines the covariance of function values

$$ f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big), \qquad m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \operatorname{cov}\big(f(x), f(x')\big) $$
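To make the definition concrete, here is a minimal sketch of drawing sample functions from a zero-mean GP prior. It assumes a squared exponential kernel (the kernel family used in the numerical experiments later); the function names, lengthscale, grid, and jitter value are illustrative choices, not taken from the thesis.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel k(x, x') = v * exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_prior(x, n_samples=3, lengthscale=1.0, jitter=1e-6, seed=0):
    """Draw n_samples functions f ~ GP(0, k) evaluated at the points x."""
    rng = np.random.default_rng(seed)
    K = se_kernel(x, x, lengthscale)
    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    # If z ~ N(0, I) then L z ~ N(0, L L^T) = N(0, K + jitter * I).
    return L @ z

x = np.linspace(-5, 5, 200)
F = sample_gp_prior(x)  # shape (200, 3): one column per sampled function
```

Because every finite set of function values is jointly Gaussian, sampling a GP on a grid reduces to sampling one multivariate normal with covariance K.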

SLIDE 4

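Gaussian Processes (GP)

  • Any finite set of f(x) is Normally distributed

$$ \big[f(x_1), \dots, f(x_m)\big]^\top \sim \mathcal{N}(\mathbf{m}, K), \qquad \mathbf{m}_j = m(x_j), \quad K_{jk} = k(x_j, x_k) $$

  • Observation model

$$ y(x) = f(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) $$

  • Marginal log likelihood optimization (zero-mean GP, n observations; θ collects the kernel hyperparameters and σ²)

$$ \log p(\mathbf{y} \mid \theta) = -\frac{1}{2}\Big[\log\lvert K + \sigma^2 I\rvert + \mathbf{y}^\top (K + \sigma^2 I)^{-1}\mathbf{y} + n \log 2\pi\Big] $$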
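A minimal sketch of evaluating this marginal log likelihood with a Cholesky factorization, assuming a zero-mean GP and a precomputed kernel matrix K; the function name is hypothetical. Both the log determinant and the linear solve come from the same factor.

```python
import numpy as np

def gp_log_marginal_likelihood(K, y, noise_var):
    """log p(y | theta) for a zero-mean GP with kernel matrix K and noise variance sigma^2.

    The Cholesky factorization costs O(n^3) time and O(n^2) memory.
    """
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))  # K + sigma^2 I = L L^T
    # alpha = (K + sigma^2 I)^{-1} y via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|K + sigma^2 I| = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (log_det + y @ alpha + n * np.log(2.0 * np.pi))
```

In practice θ would be chosen by maximizing this quantity, for example with a gradient-based optimizer; the O(n³) factorization is the cost that the scalable inference slides set out to avoid.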

Full Model

  • Our model is a convex combination of the functional regimes f_i, weighted by switching functions s_i(x)

$$ y(x) = s_1(x) f_1(x) + \dots + s_r(x) f_r(x) + \epsilon, \qquad s_i(x) \ge 0, \quad \sum_{i=1}^{r} s_i(x) = 1 $$

SLIDE 5

Model part 1: Functional Regimes

  • GP prior for each functional regime
    – Use flexible stationary kernels

$$ f_i \sim \mathcal{GP}(0, K_i), \qquad i = 1, \dots, r $$

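Model part 2: Change Surfaces

  • Changepoint

$$ s_i = \mathbb{I}(t \ge T_i) $$

  • Non-discrete changepoint

$$ s_i = \operatorname{softmax}(t - T_i) $$

  • Change surface

$$ s_i = \operatorname{softmax}\big(w_i(t)\big) \qquad\text{or}\qquad s_i = \sigma\big(w_i(t)\big) $$

[Figure: the three switching functions plotted for t from −10 to 10]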
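A minimal sketch of the three kinds of switching function, assuming two regimes so the second regime's weight is 1 − s(t); with two regimes, a softmax over (w(t), 0) equals the logistic sigmoid σ(w(t)). The function names and the non-monotonic w(t) in the last line are hypothetical examples.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid; equals softmax([z, 0])[0]."""
    return 1.0 / (1.0 + np.exp(-z))

def changepoint(t, T):
    """Discrete changepoint: s(t) = I(t >= T), an abrupt switch at T."""
    return (t >= T).astype(float)

def soft_changepoint(t, T):
    """Non-discrete changepoint: a smooth transition centered at T."""
    return sigmoid(t - T)

def change_surface(t, w):
    """Change surface: s(t) = sigma(w(t)) for a flexible function w."""
    return sigmoid(w(t))

t = np.linspace(-10, 10, 5)          # [-10, -5, 0, 5, 10]
print(changepoint(t, 0.0))           # [0. 0. 1. 1. 1.]
print(soft_changepoint(t, 0.0)[2])   # 0.5
# A non-monotonic w lets the second regime switch on and then off again.
s = change_surface(t, lambda u: 4.0 - 0.2 * u**2)
```

Only the change surface can express a regime that turns on and then off; the first two forms are monotone in t by construction.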

SLIDE 6

Model part 2: Change Surfaces

  • Random Kitchen Sink (RKS) features for w_i(x)
    – Variable rate of change
    – Non-monotonic
    – Heterogeneous over input

$$ w_i(x) = \sum_{j=1}^{q} a_j \cos\big(\omega_j^\top x + b_j\big) $$
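A sketch of how RKS features give w_i(x) a variable, non-monotonic, input-dependent shape. The sampling distributions (Gaussian frequencies ω_j, uniform phases b_j) follow the standard random Fourier feature recipe and are an assumption here, as are the names; the slide does not say which of a_j, ω_j, b_j are learned. The default q = 5 mirrors the 5-feature setting used later for measles.

```python
import numpy as np

def make_rks_w(input_dim, q=5, lengthscale=1.0, seed=0):
    """Return w(X) = sum_j a_j cos(omega_j^T x + b_j) with q random features.

    omega_j ~ N(0, I / lengthscale^2) and b_j ~ Uniform[0, 2*pi); the amplitudes
    a_j are random placeholders here and would be fit to data in practice.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0 / lengthscale, size=(q, input_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=q)
    a = rng.normal(0.0, 1.0, size=q)

    def w(X):
        # X has shape (n, input_dim); the result has shape (n,)
        return np.cos(X @ omega.T + b) @ a

    return w, a

# Hypothetical inputs: 10 points with rescaled (longitude, latitude, time).
X = np.random.default_rng(1).uniform(size=(10, 3))
w, a = make_rks_w(input_dim=3, q=5)
s = 1.0 / (1.0 + np.exp(-w(X)))  # switching function sigma(w(x)) in (0, 1)
```

With only q features, w is cheap to evaluate and has few parameters, yet a sum of cosines can rise, fall, and vary across space and time.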

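Full Model

  • Gaussian process change surface model

$$ y(x) = \sum_{i=1}^{r} \sigma\big(w_i(x)\big) f_i(x) + \epsilon, \qquad f_i(x) \sim \mathcal{GP}(0, K_i) $$

  • Because each σ(w_i(x)) f_i(x) is a GP scaled by a deterministic function, and the regimes are independent, the model can be written as a single Gaussian process with covariance function

$$ k_{\text{all}}(x, x') = \sum_{i=1}^{r} \sigma\big(w_i(x)\big)\, k_i(x, x')\, \sigma\big(w_i(x')\big) $$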
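A minimal sketch of building the k_all matrix, assuming squared exponential kernels for each regime (as in the numerical experiments); the two switching functions w1 and w2 and the lengthscales are hypothetical. Each term scales rows and columns of a valid kernel matrix by σ(w_i), so the sum stays symmetric positive semi-definite.

```python
import numpy as np

def se_kernel(X1, X2, lengthscale):
    """Squared exponential kernel between the rows of X1 and X2."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def change_surface_kernel(X1, X2, w_funcs, lengthscales):
    """k_all(x, x') = sum_i sigma(w_i(x)) k_i(x, x') sigma(w_i(x'))."""
    K = np.zeros((len(X1), len(X2)))
    for w, ell in zip(w_funcs, lengthscales):
        s1 = sigmoid(w(X1))  # shape (n1,)
        s2 = sigmoid(w(X2))  # shape (n2,)
        K += s1[:, None] * se_kernel(X1, X2, ell) * s2[None, :]
    return K

# Two hypothetical regimes with complementary switches: sigma(-w) = 1 - sigma(w).
def w1(X):
    return 10.0 * (X[:, 0] - 0.5)

def w2(X):
    return -w1(X)

X = np.linspace(0.0, 1.0, 50)[:, None]
K_all = change_surface_kernel(X, X, [w1, w2], lengthscales=[0.05, 0.5])
```

The resulting matrix can be passed to any standard GP routine, which is what makes the "single Gaussian process" view convenient; the cost, however, is still O(n³) without the structure exploited on the following slides.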

SLIDE 7

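Scalable Inference

  • Computing the log likelihood naively costs O(n³)
  • We develop scalable Kronecker inference using the Weyl bound: O(D n^{(D+1)/D}) for D input dimensions

$$ \log p(\mathbf{y} \mid \theta) = -\frac{1}{2}\Big[\log\lvert K + \sigma^2 I\rvert + \mathbf{y}^\top (K + \sigma^2 I)^{-1}\mathbf{y} + n \log 2\pi\Big] $$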

Measles in the United States

  • Data
    – Monthly incidence rates, 1935–2003
    – Continental United States and D.C.
    – x ∈ ℝ³: 2D space and 1D time
    – Measles vaccine introduced in 1963

SLIDE 8

Measles in 3 states

[Figure: incidence (1000s) and the fitted switching function s(w(x)), 1940–2010, for California (CA), Maine (ME), and Michigan (MI)]

SLIDE 9

Measles in 3 states

  • GP change surface
    – 2 functional regimes
    – w_i(x) as RKS with 5 features

  • Not a causal model!

SLIDE 10

Measles in 3 states

[Figure: incidence (1000s) and s(w(x)) for Michigan (MI), 1940–2010]

  • “Change date” per state: the date where σ(w(x)) = 0.5
  • “Change slope”: the slope of σ(w(x)) between σ(w(x)) = 0.25 and σ(w(x)) = 0.75

SLIDE 11

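Change date for measles in U.S.

For each state, the date where σ(w(x)) = 0.5

[Map: change date by state; color scale from 1961.5 to 1967.2]

Change slope for measles in U.S.

For each state, the slope of σ(w(x)) between σ(w(x)) = 0.25 and σ(w(x)) = 0.75

[Map: change slope by state; color scale from 0.156 to 0.297]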
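A sketch of extracting both summaries from a fitted switching function sampled on a time grid. The slides do not spell out the slope formula; this sketch assumes it is the rise from 0.25 to 0.75 (that is, 0.5) divided by the time between those two crossings, in units per year. The function names and the example curve are hypothetical.

```python
import numpy as np

def first_crossing(t, s, level):
    """Linearly interpolated time at which s(t) first crosses `level` (None if never)."""
    d = s - level
    idx = np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    if idx.size == 0:
        return None
    k = idx[0]
    # Interpolate between grid points k and k + 1.
    return t[k] + (level - s[k]) * (t[k + 1] - t[k]) / (s[k + 1] - s[k])

def change_date_and_slope(t, s):
    """Change date: first time sigma(w) = 0.5.
    Change slope (assumed definition): 0.5 / (time between the 0.25 and 0.75 crossings)."""
    t50 = first_crossing(t, s, 0.5)
    t25 = first_crossing(t, s, 0.25)
    t75 = first_crossing(t, s, 0.75)
    return t50, (0.75 - 0.25) / (t75 - t25)

# Hypothetical fitted switching function for one state, on a monthly grid.
t = np.arange(1950.0, 1980.0, 1.0 / 12.0)
s = 1.0 / (1.0 + np.exp(-0.6 * (t - 1965.0)))
date, slope = change_date_and_slope(t, s)
```

Using the first crossing keeps the summary well defined even when σ(w(x)) is non-monotonic, although a state whose curve crosses 0.5 several times would deserve a closer look.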
SLIDE 12

Regression Analysis

  • Explore factors that affect the change date
    – Birth and death rates
    – Population numbers per age segment
    – Income information
    – Government hospital and health workers
    – Slope of change surface
    – Average temperature

Demographic Analysis

SLIDE 13

Regression Analysis

  • Gini of family income
    – Economically depressed communities
    – Rural regions
  • Slope of change surface
    – Fewer cases nationwide enable more effective immunization later

Conclusions

  • Introduced a model for “change surfaces” in real-world data
  • Developed scalable inference for additive, non-stationary Gaussian processes
  • Identified heterogeneity in the first years of the measles vaccine
  • Used the results of the change surface model to draw policy-relevant conclusions

SLIDE 14

Acknowledgements

  • Committee
    – Daniel Neill, Alex Smola, Wilbert van Panhuis
  • Chair
    – Dave Choi
  • Collaborators*
    – Andrew Wilson
    – Seth Flaxman
    – Hannes Nickisch

*Subset of paper accepted to AISTATS 2016

Fin.

Questions?

SLIDE 15

Backup slides

SLIDE 16

Spectral Mixture Kernels

Inference

  • Compute the log marginal likelihood
  • General Kronecker methods for scalability
    – Assume:
    – Assume: multiplicative kernel across D dimensions
    – Then we can decompose the kernel matrix as a Kronecker product

$$ K = K_1 \otimes K_2 \otimes \cdots \otimes K_D $$
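A minimal sketch of why the Kronecker decomposition helps, assuming gridded inputs, a single product kernel, and i.i.d. noise: the eigenvalues of K_1 ⊗ K_2 are all pairwise products of the factors' eigenvalues, and adding σ²I shifts each one by σ². The grid sizes, lengthscale, and function names are hypothetical.

```python
import numpy as np
from functools import reduce

def se_kernel_1d(x, lengthscale):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kron_logdet(factors, noise_var):
    """log|K_1 ⊗ ... ⊗ K_D + sigma^2 I| from per-dimension eigendecompositions.

    Only the small D factor matrices are decomposed, never the full n x n matrix.
    """
    eigs = [np.linalg.eigvalsh(Kd) for Kd in factors]
    # Eigenvalues of a Kronecker product: every product of one eigenvalue per factor.
    lam = reduce(lambda a, b: np.outer(a, b).ravel(), eigs)
    return np.sum(np.log(lam + noise_var))

grids = [np.linspace(0.0, 1.0, 8), np.linspace(0.0, 1.0, 6)]
factors = [se_kernel_1d(g, 0.3) for g in grids]
ld_fast = kron_logdet(factors, noise_var=0.1)

# Dense check on the full 48 x 48 matrix (only feasible for small grids).
K_full = np.kron(factors[0], factors[1])
ld_exact = np.linalg.slogdet(K_full + 0.1 * np.eye(K_full.shape[0]))[1]
```

This shortcut relies on the whole matrix sharing one Kronecker eigenbasis. A sum of such matrices, such as the additive, σ(w)-scaled change surface kernel, breaks that property, which is why the following slides turn to conjugate gradients and the Weyl bound.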

SLIDE 17

Inference

  • For additive kernels, K⁻¹y can be computed efficiently using linear conjugate gradients (LCG)*
  • But how can we compute log|K|?

*See Flaxman et al. (2015)
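A sketch of the LCG step using SciPy's `cg` with a matrix-free `LinearOperator`. The additive kernel here, k(x, x') = k_1(x_1, x_1') + k_2(x_2, x_2') on an n1 × n2 grid, is a hypothetical example (it gives K = K_1 ⊗ J + J ⊗ K_2 with J an all-ones matrix), not the change surface kernel itself; the details in Flaxman et al. may differ. The point is that CG needs only matrix-vector products, and each Kronecker term can be applied without forming the full matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def se_kernel_1d(x, lengthscale):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Hypothetical additive kernel on a 30 x 20 grid: K = K1 ⊗ J2 + J1 ⊗ K2.
x1, x2 = np.linspace(0.0, 1.0, 30), np.linspace(0.0, 1.0, 20)
n1, n2 = len(x1), len(x2)
K1, K2 = se_kernel_1d(x1, 0.2), se_kernel_1d(x2, 0.3)
J1, J2 = np.ones((n1, n1)), np.ones((n2, n2))
noise_var = 0.1

def matvec(v):
    """(K + sigma^2 I) v without forming the (n1*n2) x (n1*n2) matrix."""
    V = v.reshape(n1, n2)
    # With row-major vec: (A ⊗ B) vec(V) = vec(A V B^T); J2 is symmetric.
    return (K1 @ V @ J2 + J1 @ V @ K2.T + noise_var * V).ravel()

A = LinearOperator((n1 * n2, n1 * n2), matvec=matvec, dtype=float)
y = np.random.default_rng(0).standard_normal(n1 * n2)
alpha, info = cg(A, y)  # alpha approximates (K + sigma^2 I)^{-1} y; info == 0 on success
```

CG gives the quadratic term yᵀ(K + σ²I)⁻¹y cheaply, but it does not give log|K + σ²I|, which is the gap the slide's question points to.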

Inference

SLIDE 18

Inference

  • Choosing indices i, j (method: complexity)
    – Minimization for best pair: O(n²)
    – “Middle” heuristic, i = j or i = j + 1: O(n)
    – Greedy search of s pairs below and above the previous pair: O(2sn)
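A sketch of the first two index-choice methods, assuming they refer to Weyl's inequality λ_{i+j−1}(A + B) ≤ λ_i(A) + λ_j(B) for symmetric A and B with eigenvalues sorted in decreasing order. In the thesis setting the eigenvalues of A and B would come cheaply from Kronecker structure; here they are computed densely for illustration, the matrices are random, and the greedy variant is omitted. Since each bound is at least the true eigenvalue and log is increasing, summing log(bound + σ²) upper-bounds log|A + B + σ²I|.

```python
import numpy as np

def weyl_upper_bounds(lam_a, lam_b, method="exact"):
    """Upper bounds on the eigenvalues of A + B from Weyl's inequality.

    lambda_{i+j-1}(A + B) <= lambda_i(A) + lambda_j(B), with eigenvalues sorted
    in decreasing order and 1-based indices i, j.
    """
    lam_a = np.sort(lam_a)[::-1]
    lam_b = np.sort(lam_b)[::-1]
    n = len(lam_a)
    bounds = np.empty(n)
    for k in range(1, n + 1):
        if method == "exact":
            # Minimize over every pair with i + j - 1 = k: O(n) per k, O(n^2) total.
            i = np.arange(1, k + 1)
        else:
            # "Middle" heuristic: the single pair with i = j or i = j + 1, O(n) total.
            i = np.array([k + 1 - (k + 1) // 2])
        j = k + 1 - i
        bounds[k - 1] = np.min(lam_a[i - 1] + lam_b[j - 1])
    return bounds

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((40, 5)), rng.standard_normal((40, 40))
A, B = X @ X.T, Y @ Y.T / 40
lam_a, lam_b = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
true = np.sort(np.linalg.eigvalsh(A + B))[::-1]
exact = weyl_upper_bounds(lam_a, lam_b, "exact")
middle = weyl_upper_bounds(lam_a, lam_b, "middle")
noise_var = 0.1
logdet_bound = np.sum(np.log(exact + noise_var))  # >= log|A + B + noise_var * I|
```

The middle heuristic trades tightness for speed: it evaluates one candidate pair per eigenvalue instead of all of them, so its bounds are never tighter than the exact minimization.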

Inference

  • Scaling functions, σ(w(x))

SLIDE 19

Inference

[Figure: for a sum of 3 kernels, the log determinant approximation ratio versus number of observations (10² to 10⁴), and runtime in seconds versus number of observations (10² to 10⁶); methods: Weyl exact, Weyl middle, Weyl greedy, Cheb-Hutch, and the true log det]

Inference – so what?!

  • Sub-cubic complexity for additive kernels
    – O(D n^{(D+1)/D}), approaching linear as D grows
  • Scalable inference for kernels that are non-separable in space and time
  • Scalable inference for non-stationary kernels

SLIDE 20

Numerical Experiments

  • 2500 points of synthetic data
  • 2 functional regimes defined by squared exponential kernels
  • Change surface defined by

Results - Numerical

SLIDE 21

Demographic Analysis