Personalized Health and Africa
Neil D. Lawrence
University of Sheffield
18th June 2015
Outline
◮ Diversity of Data
◮ Massively Missing Data

Not the Scale, it's the Diversity

Massive Missing Data
◮ If missing at random it can be marginalized.
◮ As data sets become very large (39 million patients in EMIS), data becomes extremely sparse.
◮ Imputation becomes impractical.
Imputation
◮ Expectation Maximization (EM) is the gold standard imputation algorithm.
◮ Exact EM optimizes the log likelihood.
◮ Approximate EM optimizes a lower bound on the log likelihood,
◮ e.g. variational approximations (VIBES, Infer.NET).
◮ Convergence is guaranteed to a local maximum of the log likelihood.
Expectation Maximization
Require: An initial guess for missing data
repeat
    Update model parameters (M-step)
    Update guess of missing data (E-step)
until convergence
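As a concrete sketch of the loop above, here is a simplified "hard" EM imputation under a multivariate Gaussian model, using NumPy only (our illustration, not code from the talk; it imputes conditional means rather than tracking the full conditional covariances of exact EM):

    import numpy as np

    def em_impute(Y, n_iter=50):
        """Impute missing entries of Y (n x p, np.nan = missing)
        under a multivariate Gaussian model."""
        mask = np.isnan(Y)
        X = np.where(mask, np.nanmean(Y, axis=0), Y)  # initial guess: column means
        for _ in range(n_iter):
            # M-step: refit model parameters to the completed data
            mu = X.mean(axis=0)
            C = np.cov(X, rowvar=False)
            # E-step: update the guess of each row's missing values by
            # conditioning the Gaussian on that row's observed values
            for i in range(X.shape[0]):
                m, o = mask[i], ~mask[i]
                if m.any() and o.any():
                    Coo = C[np.ix_(o, o)] + 1e-6 * np.eye(o.sum())  # jitter
                    X[i, m] = mu[m] + C[np.ix_(m, o)] @ np.linalg.solve(
                        Coo, X[i, o] - mu[o])
        return X

Note the inner loop over all n rows inside every iteration: this per-patient conditioning is exactly the step that becomes impractical at population scale.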
Imputation is Impractical
◮ In very sparse data imputation is impractical.
◮ EMIS: 39 million patients, thousands of tests.
◮ For most people, most tests are missing.
◮ The M-step becomes confused by poor imputation.
Direct Marginalization is the Answer
◮ Perhaps we need the joint distribution of two test outcomes,
p(y1, y2)
◮ Obtained through marginalizing over all missing data,
p(y1, y2) = ∫ p(y1, y2, y3, . . . , yp) dy3 . . . dyp
◮ Where y3, . . . , yp contains:
1. all tests not applied to this patient,
2. all tests not yet invented!!
Magical Marginalization in Gaussians
Multi-variate Gaussians
◮ Given a 10 dimensional multivariate Gaussian, y ∼ N(0, C).
◮ Generate a single correlated sample y = [y1, y2, . . . , y10].
◮ How do we find the marginal distribution of y1, y2?
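For a Gaussian the answer is immediate: the marginal of (y1, y2) is itself Gaussian, with covariance given by the top-left 2 x 2 sub-block of C. A minimal NumPy sketch (the covariance construction here is illustrative, not the one behind the figures below):

    import numpy as np

    rng = np.random.default_rng(0)
    # Build a smooth 10 x 10 covariance (RBF-style, purely illustrative)
    t = np.arange(10)
    C = 4.0 * np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 4.0)
    C += 1e-8 * np.eye(10)  # jitter for numerical stability

    y = rng.multivariate_normal(np.zeros(10), C)  # one correlated sample

    # Marginal distribution of (y1, y2): N(0, C[:2, :2]) -- no integration needed
    print(C[:2, :2])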
Gaussian Marginalization Property
[Figure: A sample from a 10 dimensional correlated Gaussian distribution. (a) the 10 dimensional sample, yi plotted against index i; (b) colormap showing the covariance between dimensions.]
Gaussian Marginalization Property
[Figure: (a) the 10 dimensional sample; (b) covariance between y1 and y2:]
( 4.1     3.1111 )
( 3.1111  2.5198 )
Gaussian Marginalization Property
[Figure: (a) the 10 dimensional sample; (b) correlation between y1 and y2:]
( 1        0.96793 )
( 0.96793  1       )
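The correlation follows directly from the covariance on the previous slide:

corr(y1, y2) = c12 / √(c11 c22) = 3.1111 / √(4.1 × 2.5198) ≈ 0.96793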
Avoid Imputation: Marginalize Directly
◮ Our approach: Avoid Imputation, Marginalize Directly. ◮ Explored in context of Collaborative Filtering. ◮ Similar challenges:
◮ many users (patients), ◮ many items (tests), ◮ sparse data
◮ Implicitly marginalizes over all future tests too.
Work with Raquel Urtasun (Lawrence and Urtasun, 2009) and ongoing work with Max Zwießele and Nicol´
- Fusi.
Marginalization in Bipartite Undirected Graph
[Figure: a bipartite undirected graph; latent variables x1, . . . , x5 connected to observations y1, . . . , y10, then extended with an additional layer of latent variables.]
For massive missing data, how many additional latent variables?
Methods that Interrelate Covariates
◮ Need Class of models that interrelates data, but allows for
variable p.
◮ Common assumption: high dimensional data lies on low
dimensional manifold.
◮ Want to retain the marginalization property of Gaussians
but deal with non-Gaussian data!
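One model class with these properties is the (Bayesian) Gaussian process latent variable model. A minimal sketch with the GPy library (we assume its BayesianGPLVM interface; the data matrix Y and all settings here are placeholders):

    import numpy as np
    import GPy

    Y = np.random.randn(100, 20)  # stand-in for an n x p matrix of test outcomes
    # Assume the data lie near a 2-dimensional manifold
    m = GPy.models.BayesianGPLVM(Y, input_dim=2, num_inducing=30,
                                 kernel=GPy.kern.RBF(2, ARD=True))
    m.optimize(messages=True, max_iters=1000)
    latent = m.X.mean  # posterior mean of the latent positions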
Example: Prediction of Malaria Incidence in Uganda
◮ Work with John Quinn and Martin Mubangizi (Makerere University, Uganda).
◮ See http://air.ug/research.html.
Malaria Prediction in Uganda
[Figure: elevation map of Uganda, 29°E to 35°E, 2°S to 4°N. Data SRTM/NASA from http://dds.cr.usgs.gov/srtm/version2_1]
Malaria Prediction in Uganda
[Figure: multiple output model for Nagongera / Tororo; panels show Sentinel (all patients), Sentinel (patients with malaria), HMIS (all patients), Satellite (rain) and weather station (temperature) signals over time.]
Malaria Prediction in Uganda
[Figure: incidence at Mubende against time (days); sparse regression versus multiple output predictions.]
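A sketch of the multiple output idea using GPy's coregionalized regression (toy stand-in data; the project's real pipeline is not shown in the slides):

    import numpy as np
    import GPy

    # Toy stand-ins: two correlated signals observed at different times,
    # e.g. incidence at a sentinel site and satellite rainfall
    X1, Y1 = np.sort(np.random.rand(50, 1) * 1800, 0), np.random.randn(50, 1)
    X2, Y2 = np.sort(np.random.rand(80, 1) * 1800, 0), np.random.randn(80, 1)

    # Intrinsic coregionalization model: one RBF kernel over time,
    # shared across the two outputs through a learned mixing matrix
    kern = GPy.util.multioutput.ICM(input_dim=1, num_outputs=2,
                                    kernel=GPy.kern.RBF(1))
    m = GPy.models.GPCoregionalizedRegression([X1, X2], [Y1, Y2], kernel=kern)
    m.optimize()

Because the outputs share a kernel, a well sampled signal (rainfall) can fill in predictions where the sparse signal (incidence) is missing, which is the point of the figure above.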
GP School at Makerere
Early Warning Systems
Deep Models
[Figure: a deep graphical model; the data space y1, . . . , y8 is generated from latent layer 1 (six nodes x1), which is generated from latent layer 2 (six nodes x2), then latent layer 3 (four nodes x3) and latent layer 4 (four nodes x4).]
Deep Models
[Figure: the same architecture drawn compactly, with data y and latent layers x1 to x4. Interpretation of the layers, from bottom to top: data space; low level features; combination of low level features; more combination; abstract features.]
Deep Gaussian Processes
Damianou and Lawrence (2013)
◮ Deep architectures allow abstraction of features (Bengio, 2009; Hinton and Osindero, 2006; Salakhutdinov and Murray, 2008).
◮ We use a variational approach to stack GP models.
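A sketch of stacking GPs with the PyDeepGP library from the same group (we assume its DeepGP constructor; layer widths and kernels are illustrative, not the paper's experiments):

    import numpy as np
    import GPy
    import deepgp

    Y = np.random.randn(100, 8)  # stand-in data, n x p
    # Two hidden layers: 8-d data <- 5-d latent <- 2-d latent
    m = deepgp.DeepGP([Y.shape[1], 5, 2], Y,
                      kernels=[GPy.kern.RBF(5, ARD=True),
                               GPy.kern.RBF(2, ARD=True)],
                      num_inducing=30, back_constraint=False)
    m.optimize(messages=True, max_iters=1000)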
Deep Health
[Figure: a deep model for health. Observed layers include survival analysis, gene expression, clinical measurements and treatment, clinical notes, social network and music data, X-ray, biopsy, environment, epigenotype (E) and genotype (G); these connect through layers of latent variables to a latent representation of disease stratification.]
Summary
◮ Intention is to deploy probabilistic machine learning for assimilating a wide range of data types in personalized health:
◮ social networking, text (clinical notes), survival times, medical imaging, phenotype, genotype, mobile phone records, music tastes, Tesco club card.
◮ Requires population scale models with millions of features.
◮ May be necessary for early detection of dementia or other diseases with a high noise-to-signal ratio.
◮ Major issues in privacy and in interfacing with the patient.
◮ But: the revolution is coming. We need to steer it.
References
Y. Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, Jan. 2009.
A. Damianou and N. D. Lawrence. Deep Gaussian processes. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Workshop on Artificial Intelligence and Statistics, volume 31 of JMLR W&CP, AZ, USA, 2013.
G. E. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, 2006.
N. D. Lawrence and R. Urtasun. Non-linear matrix factorization with Gaussian processes. In L. Bottou and M. Littman, editors, Proceedings of the International Conference in Machine Learning, volume 26, San Francisco, CA, 2009. Morgan Kaufmann.
R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. In Proceedings of the International Conference in Machine Learning, 2008.