SLIDE 1

Personalized Health and Africa

Neil D. Lawrence

University of Sheffield

18th June 2015

SLIDE 2

Outline

Diversity of Data
Massively Missing Data

SLIDE 3

Not the Scale, it’s the Diversity

SLIDE 4

Outline

Diversity of Data
Massively Missing Data

SLIDE 5

Massive Missing Data

◮ If missing at random it can be marginalized.
◮ As data sets become very large (39 million patients in EMIS), data becomes extremely sparse.

◮ Imputation becomes impractical.

SLIDE 6

Imputation

◮ Expectation Maximization (EM) is the gold standard imputation algorithm.
◮ Exact EM optimizes the log likelihood.
◮ Approximate EM optimizes a lower bound on the log likelihood, e.g. variational approximations (VIBES, Infer.net).
◮ Convergence to a local maximum of the log likelihood is guaranteed.

SLIDE 11

Expectation Maximization

Require: An initial guess for missing data
repeat
    Update model parameters (M-step)
    Update guess of missing data (E-step)
until convergence
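
A minimal numpy sketch of this loop, under assumptions of our own: a single multivariate Gaussian model, NaNs marking the missing entries, every test observed for at least one patient, and an E-step simplified to a conditional-mean update of the missing values (exact EM would also carry the conditional covariance into the M-step).

```python
import numpy as np

def em_impute(Y, n_iter=50):
    """Simplified EM for a Gaussian model with missing entries (NaNs)."""
    Y = Y.copy()
    miss = np.isnan(Y)
    # Require: an initial guess for the missing data (column means).
    Y[miss] = np.take(np.nanmean(Y, axis=0), np.nonzero(miss)[1])
    for _ in range(n_iter):  # fixed iteration budget in place of a convergence test
        # M-step: update model parameters from the completed data.
        mu = Y.mean(axis=0)
        Sigma = np.cov(Y, rowvar=False, bias=True)
        # E-step: update the guess of the missing data.
        for n in range(Y.shape[0]):
            m, o = miss[n], ~miss[n]
            if m.any() and o.any():
                # E[y_m | y_o] = mu_m + S_mo S_oo^{-1} (y_o - mu_o)
                S_oo = Sigma[np.ix_(o, o)]
                S_mo = Sigma[np.ix_(m, o)]
                Y[n, m] = mu[m] + S_mo @ np.linalg.solve(S_oo, Y[n, o] - mu[o])
    return Y, mu, Sigma
```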

SLIDE 12

Imputation is Impractical

◮ In very sparse data imputation is impractical.
◮ EMIS: 39 million patients, thousands of tests.
◮ For most people, most tests are missing.
◮ The M-step becomes confused by poor imputation.

SLIDE 13

Direct Marginalization is the Answer

◮ Perhaps we need the joint distribution of two test outcomes,

p(y1, y2)

◮ This is obtained by marginalizing over all the missing data,

p(y1, y2) = ∫ p(y1, y2, y3, . . . , yp) dy3 . . . dyp

◮ where y3, . . . , yp contains:
    1. all tests not applied to this patient,
    2. all tests not yet invented!!
SLIDE 14

Magical Marginalization in Gaussians

Multi-variate Gaussians

◮ Given a 10 dimensional multivariate Gaussian, y ∼ N(0, C).
◮ Generate a single correlated sample y = [y1, y2, . . . , y10].
◮ How do we find the marginal distribution of y1, y2?
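
The answer needs no integral at all; for a Gaussian, marginalization is just sub-selection of the mean and covariance. A minimal numpy check (the covariance C below is an assumed RBF-based example, not the one behind the figures that follow):

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed valid 10x10 covariance (RBF over evenly spaced inputs,
# plus jitter); any positive definite matrix would do here.
t = np.linspace(0, 1, 10)
C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.25 ** 2) + 1e-8 * np.eye(10)

# A single correlated sample y ~ N(0, C).
y = rng.multivariate_normal(np.zeros(10), C)

# The marginal of (y1, y2) is N(0, C[:2, :2]): simply drop the
# other rows and columns; the integral is done analytically.
print(C[:2, :2])

# Empirical check: many samples, keep only the first two dimensions.
Y = rng.multivariate_normal(np.zeros(10), C, size=200_000)
print(np.cov(Y[:, :2], rowvar=False))
```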

SLIDE 15

Gaussian Marginalization Property

Figure: (a) a sample from a 10 dimensional correlated Gaussian distribution; (b) colormap showing the covariance between dimensions.


SLIDE 22

Gaussian Marginalization Property

Figure: (a) a sample from the 10 dimensional correlated Gaussian distribution; (b) covariance between y1 and y2:

    [ 4.1     3.1111 ]
    [ 3.1111  2.5198 ]

SLIDE 23

Gaussian Marginalization Property

Figure: (a) a sample from the 10 dimensional correlated Gaussian distribution; (b) correlation between y1 and y2:

    [ 1        0.96793 ]
    [ 0.96793  1       ]
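
The correlation matrix here is the covariance from the previous slide normalized by the standard deviations, as a couple of lines of numpy confirm:

```python
import numpy as np

# Covariance of (y1, y2) read off the previous slide.
C = np.array([[4.1,    3.1111],
              [3.1111, 2.5198]])

# Correlation: divide each entry by the product of standard deviations.
s = np.sqrt(np.diag(C))
print(C / np.outer(s, s))  # off-diagonal entries come out as ~0.96793
```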

SLIDE 24

Avoid Imputation: Marginalize Directly

◮ Our approach: avoid imputation, marginalize directly.
◮ Explored in the context of collaborative filtering.
◮ Similar challenges:
    ◮ many users (patients),
    ◮ many items (tests),
    ◮ sparse data.
◮ Implicitly marginalizes over all future tests too.

Work with Raquel Urtasun (Lawrence and Urtasun, 2009) and ongoing work with Max Zwießele and Nicolò Fusi.
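
Computationally, "marginalize directly" can be as simple as scoring each patient against the sub-blocks of the model for the tests they actually took. A sketch under a Gaussian model (function and variable names are ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_lik_marginalized(Y, mu, Sigma):
    """Log likelihood of rows with missing entries (NaNs), marginalizing
    the missing dimensions exactly by sub-selecting mu and Sigma;
    no imputation step anywhere."""
    total = 0.0
    for y in Y:
        o = ~np.isnan(y)  # this patient's observed tests
        if o.any():
            total += multivariate_normal.logpdf(
                y[o], mean=mu[o], cov=Sigma[np.ix_(o, o)])
    return total
```

Tests never taken (and tests not yet invented) simply never enter the sub-selection, which is the sense in which future tests are marginalized implicitly.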
SLIDE 25

Marginalization in Bipartite Undirected Graph

[Graph: a bipartite undirected graph. Latent variables x1, . . . , x5 connect to observed variables y1, . . . , y10.]

SLIDE 29

Marginalization in Bipartite Undirected Graph

[Graph: the same bipartite graph with an additional layer of latent variables.]

For massive missing data, how many additional latent variables?

SLIDE 30

Methods that Interrelate Covariates

◮ Need a class of models that interrelates data, but allows for variable p.
◮ Common assumption: high dimensional data lies on a low dimensional manifold.
◮ Want to retain the marginalization property of Gaussians but deal with non-Gaussian data!
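
The Gaussian process latent variable model family has both properties. As a sketch using GPy, the Sheffield GP library (the interface details here are from memory and should be treated as assumptions, with stand-in data):

```python
import numpy as np
import GPy

Y = np.random.randn(100, 12)  # stand-in: 100 patients x 12 test outcomes

# Map the 12 dimensional observations onto a 2 dimensional manifold.
m = GPy.models.BayesianGPLVM(Y, input_dim=2, num_inducing=20)
m.optimize(messages=False)
print(m.X)  # variational posterior over the latent positions
```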

SLIDE 31

Example: Prediction of Malaria Incidence in Uganda

◮ Work with John Quinn and Martin Mubangizi (Makerere University, Uganda).
◮ See http://air.ug/research.html.

SLIDE 32

Malaria Prediction in Uganda

[Map of the region (29°E to 35°E, 2°S to 4°N). Elevation data: SRTM/NASA, from http://dds.cr.usgs.gov/srtm/version2_1]

SLIDE 33

Malaria Prediction in Uganda

[Figure: multiple output model at Nagongera / Tororo. Panels: Sentinel (all patients), Sentinel (patients with malaria), HMIS (all patients), Satellite (rain) and weather station (temperature).]
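
The coupling behind a multiple output model can be sketched without any library: under the intrinsic coregionalization model, the joint covariance of two streams is the Kronecker product of an output covariance B with a covariance over time. The values below are assumptions for the demo, not the fitted model behind these plots.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared time inputs for two streams (e.g. incidence and rainfall).
t = np.linspace(0, 10, 100)
Kt = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)  # RBF covariance over time

# Coregionalization matrix B: assumed covariance *between outputs*.
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# Joint covariance of the stacked streams: Kronecker product of B and Kt.
K = np.kron(B, Kt) + 1e-8 * np.eye(2 * len(t))
f = np.linalg.cholesky(K) @ rng.standard_normal(2 * len(t))
f1, f2 = f[:len(t)], f[len(t):]  # two coupled series: each informs the other
```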

SLIDE 34

Malaria Prediction in Uganda

[Figure: malaria incidence at Mubende over time (days), comparing sparse regression against the multiple output model.]

SLIDE 35

GP School at Makerere

SLIDE 36

Early Warning Systems


SLIDE 38

Deep Models

[Graph: a deep model. Data space (y1, . . . , y8) connects up through Latent layer 1 (x1_1, . . . , x1_6), Latent layer 2 (x2_1, . . . , x2_6), Latent layer 3 (x3_1, . . . , x3_4) and Latent layer 4 (x4_1, . . . , x4_4).]

SLIDE 39

Deep Models

[Graph: the same model collapsed to a chain of single nodes y, x1, x2, x3, x4, from data space through latent layers 1 to 4.]

SLIDE 40

Deep Models

[Graph: the chain y, x1, x2, x3, x4 annotated from the bottom up: data space, low level features, combination of low level features, more combination, abstract features.]

SLIDE 41

Deep Gaussian Processes

Damianou and Lawrence (2013)

◮ Deep architectures allow abstraction of features (Bengio, 2009; Hinton and Osindero, 2006; Salakhutdinov and Murray, 2008).
◮ We use a variational approach to stack GP models.
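
A toy numpy illustration of the stacking: draw from a two layer deep GP prior by feeding one GP's output in as the next GP's input. This shows only the generative structure, not the variational training of Damianou and Lawrence (2013); the kernels and lengthscales are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_sample(X, lengthscale=0.3, jitter=1e-6):
    """One function draw f ~ GP(0, k_RBF) evaluated at inputs X (n x 1)."""
    K = np.exp(-0.5 * (X - X.T) ** 2 / lengthscale**2) + jitter * np.eye(len(X))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(X))

# Two layer deep GP prior: y = f2(f1(t)), each f an independent GP draw.
t = np.linspace(0, 1, 200).reshape(-1, 1)  # top layer inputs
x = gp_sample(t)                           # hidden layer: x = f1(t)
y = gp_sample(x.reshape(-1, 1))            # data layer:   y = f2(x)
```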

SLIDE 42

Deep Health

[Graph: a deep health model. Observed layers include survival analysis, gene expression, clinical measurements and treatment, clinical notes, social network and music data, X-ray and biopsy, together with genotype (G), environment (E) and epigenotype (EG); intermediate latent layers form a latent representation for disease stratification.]

SLIDE 43

Summary

◮ Intention is to deploy probabilistic machine learning for assimilating a wide range of data types in personalized health:
    ◮ social networking, text (clinical notes), survival times, medical imaging, phenotype, genotype, mobile phone records, music tastes, Tesco club card.
◮ Requires population scale models with millions of features.
◮ May be necessary for early detection of dementia or other diseases with a high noise-to-signal ratio.
◮ Major issues in privacy and in interfacing with the patient.
◮ But: the revolution is coming. We need to steer it.

SLIDE 44

References I

Y. Bengio. Learning Deep Architectures for AI. Found. Trends Mach. Learn., 2(1):1–127, Jan. 2009. ISSN 1935-8237. [DOI]

A. Damianou and N. D. Lawrence. Deep Gaussian processes. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Workshop on Artificial Intelligence and Statistics, volume 31, AZ, USA, 2013. JMLR W&CP 31. [PDF]

G. E. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

N. D. Lawrence and R. Urtasun. Non-linear matrix factorization with Gaussian processes. In L. Bottou and M. Littman, editors, Proceedings of the International Conference in Machine Learning, volume 26, San Francisco, CA, 2009. Morgan Kaufmann. [PDF]

R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. In S. Roweis and A. McCallum, editors, Proceedings of the International Conference in Machine Learning, volume 25, pages 872–879. Omnipress, 2008.