bayesian nonparametric models for data exploration

Bayesian Nonparametric Models for Data Exploration Melanie F. - PowerPoint PPT Presentation

Bayesian Nonparametric Models for Data Exploration Melanie F. Pradier Friday 15th September, 2017 Intro BNPs ADDP for Marathon Modeling C-IBP for Clinical Trial PFAs for International Trade Conclusions Outline 1 Introduction 2 Bayesian


  2. Contributions
     Goal: build useful BNP models for specific data exploration tasks.

     Atom-dependent DP mixture model
     • estimates density in stratified data
     • suitable for fairness requirements
     • link to mixture of experts
     • Application: marathon

     Case-control IBP feature model
     • infers latent features in heterogeneous structured data
     • suitable to separate global and group-specific effects
     • combined with statistical testing
     • Application: clinical trial

     Poisson factor analysis (PFA) models → flexible feature models for count data
     1 Hierarchical PFA: deals with stratified data
     2 Three-parameter Restricted PFA: imposes structured sparsity in latent space
     3 Dynamic PFA: allows for time-varying activation of latent factors
     • Application: international trade

     Melanie F. Pradier (UC3M) Bayesian Nonparametric Models for Data Exploration 2017-09-15 6/50

  3. Outline
     1 Introduction
     2 Bayesian nonparametrics
     3 ADDP mixture model for marathon modeling
     4 C-IBP feature model for clinical trials
     5 PFA models for international trade
     6 Conclusions

  5. Bayesian nonparametrics (BNPs)
     • Bayesian framework for model selection
     • Nonparametric: number of parameters grows with the amount of data
       • Prior over infinite-dimensional parameter space
       • Only a finite subset of parameters is used for any finite dataset
     • Rely on stochastic processes:
       • Dirichlet process
       • Beta process
       • Gaussian process
       • ...

  9. Dirichlet process (DP)
     $G \sim \mathrm{DP}(\alpha, H)$, with $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$
     • central block for infinite mixture models

     Stick-breaking representation (Ishwaran et al., 2001)
     For $k = 1, \dots, \infty$:
       $v_k \sim \mathrm{Beta}(1, \alpha)$, $\pi_k = v_k \prod_{\ell=1}^{k-1} (1 - v_\ell)$, that is, $\pi \sim \mathrm{GEM}(\alpha)$
       $\phi_k \sim H$
     [Figure: a unit-length stick broken successively into pieces $\pi_1, \pi_2, \pi_3, \dots$ for $k = 1, 2, 3, \dots$]
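The stick-breaking construction on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the presentation: the infinite sum is truncated at a hypothetical level `truncation`, the weights follow the standard GEM(α) convention $v_k \sim \mathrm{Beta}(1, \alpha)$, and a standard normal stands in for the base measure $H$. All function names are made up for this example.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Return the first `truncation` weights of pi ~ GEM(alpha) and the leftover mass."""
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(v * remaining)    # pi_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - v
    return weights, remaining


def sample_truncated_dp(alpha, base_sampler, truncation=100, seed=0):
    """Truncated draw G ~ DP(alpha, H): weights pi_k and atoms phi_k ~ H."""
    rng = random.Random(seed)
    weights, leftover = stick_breaking_weights(alpha, truncation, rng)
    atoms = [base_sampler(rng) for _ in range(truncation)]
    return weights, atoms, leftover


# Assumption for illustration only: H is a standard normal distribution.
weights, atoms, leftover = sample_truncated_dp(
    alpha=2.0, base_sampler=lambda rng: rng.gauss(0.0, 1.0)
)
```

The value `leftover` is the probability mass that the truncation ignores; it shrinks quickly as `truncation` grows, which is why a finite approximation is usually adequate in practice.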

  13. Indian buffet process (IBP)
      • central block for infinite latent feature models
      • hierarchy of a Beta process (BP) with multiple Bernoulli processes (BeP)
      $G = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k} \sim \mathrm{BP}(c, \alpha, H)$
      For $n = 1, \dots, \infty$: $\zeta_n = \sum_{k=1}^{\infty} z_{nk} \delta_{\phi_k} \sim \mathrm{BeP}(G)$
      Binary matrix $Z = (z_{nk})$: $Z \sim \mathrm{IBP}(\alpha)$
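A finite binary matrix can be drawn from the one-parameter IBP(α) with the sequential "buffet" scheme: customer $n$ takes an existing dish $k$ with probability $m_k / n$, where $m_k$ counts earlier customers who took it, and then tries Poisson($\alpha / n$) new dishes. The sketch below assumes this one-parameter case; the function names are hypothetical, and the Poisson sampler is Knuth's simple method, included only to keep the example free of third-party dependencies.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(alpha, num_rows, seed=0):
    """Draw a binary feature matrix Z (list of rows) from IBP(alpha)."""
    rng = random.Random(seed)
    rows = []
    dish_counts = []  # m_k: how many earlier customers took dish k
    for n in range(1, num_rows + 1):
        row = []
        for k, m_k in enumerate(dish_counts):
            take = 1 if rng.random() < m_k / n else 0
            row.append(take)
            dish_counts[k] += take
        new_dishes = sample_poisson(alpha / n, rng)
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    num_features = len(dish_counts)
    # Pad earlier rows with zeros for features that appeared later.
    return [row + [0] * (num_features - len(row)) for row in rows]


Z = sample_ibp(alpha=2.0, num_rows=20, seed=1)
```

Each column of `Z` is a latent feature and each row an observation; the number of columns is itself random, which is what makes the feature model "infinite".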

  16. Motivation
      1 What is the impact of age and gender on runners' performance?
      2 Can we compare different runners in a fair manner?
        • entry requirements
        • rewards
      Our approach
      • dependent density estimation model
      • delivers scientific knowledge in sport sciences
      • constitutes a fair age-gender grading system
      • relies on dependent Dirichlet process

  18. Dependent Dirichlet process (DDP) (MacEachern, 2000)
      J: number of groups
      $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_{jk}}$
      • hierarchical DP (Teh et al., 2005): $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}$, with $G_0 \sim \mathrm{DP}(\alpha, H)$ and $G_j \sim \mathrm{DP}(\gamma, G_0)$
      • single-p DDP (MacEachern, 2000): $G_j = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_{jk}}$, with $G_0 \sim \mathrm{DP}(\alpha, H)$ and $G_j = T_j[G_0]$

  19. Atom-dependent DP mixture model
      Generative model ($x_i \equiv$ marathon finishing time for runner $i$)
      $\pi \mid \alpha \sim \mathrm{GEM}(\alpha)$
      $c_i \mid \pi \sim \mathrm{Cat}(\pi)$
      $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$
      $\sigma_x^2 \sim \mathrm{IG}(a, b)$
      $x_i \mid \text{other vars} \sim \mathcal{N}(x_i \mid \mu_{c_i}, \sigma_x^2)$

  20. Atom-dependent DP mixture model
      Generative model ($x_{ji} \equiv$ marathon finishing time for runner $i$ in age group $j$)
      $\pi \mid \alpha \sim \mathrm{GEM}(\alpha)$
      $c_{ji} \mid \pi \sim \mathrm{Cat}(\pi)$
      $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$
      $\sigma_x^2 \sim \mathrm{IG}(a, b)$
      $\theta \sim \mathcal{N}(\mathbf{0}, \Sigma_\theta)$
      $x_{ji} \mid \text{other vars} \sim \mathcal{N}(x_{ji} \mid \mu_{c_{ji}} + \theta_j, \sigma_x^2)$
      $(\Sigma_\theta)_{\ell q} = \sigma_\theta^2 \exp\left(-\frac{(\ell - q)^2}{2\nu^2}\right) + \kappa\, \delta(\ell - q)$
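The covariance $\Sigma_\theta$ on this slide is a squared-exponential kernel over age-group indices plus a diagonal term $\kappa$, so neighbouring age groups receive similar offsets $\theta_j$. A minimal sketch of building that matrix follows; the function name and the numeric hyperparameter values are assumptions chosen for illustration, not values from the presentation.

```python
import math


def age_covariance(num_groups, sigma_theta, nu, kappa):
    """Build (Sigma_theta)_{lq} = sigma_theta^2 exp(-(l-q)^2 / (2 nu^2)) + kappa * [l == q]."""
    cov = []
    for l in range(num_groups):
        row = []
        for q in range(num_groups):
            value = sigma_theta ** 2 * math.exp(-((l - q) ** 2) / (2.0 * nu ** 2))
            if l == q:
                value += kappa  # delta(l - q): extra variance on the diagonal only
            row.append(value)
        cov.append(row)
    return cov


# Hypothetical hyperparameters, for illustration only.
cov = age_covariance(num_groups=5, sigma_theta=1.0, nu=2.0, kappa=0.1)
```

A larger length scale `nu` makes the age effect smoother across groups, while `kappa` allows each group a small independent deviation.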

  21. Atom-dependent DP mixture model
      Generative model ($x_{ji} \equiv$ marathon finishing time for runner $i$ in age group $j$; $g_{ji} \equiv$ gender)
      $\pi \mid \alpha \sim \mathrm{GEM}(\alpha)$
      $c_{ji} \mid \pi \sim \mathrm{Cat}(\pi)$
      $\mu_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$
      $\sigma_x^2 \sim \mathrm{IG}(a, b)$
      $\theta \sim \mathcal{N}(\mathbf{0}, \Sigma_\theta)$
      $\delta \sim \mathcal{N}(0, \sigma_\omega^2)$
      $\omega \sim \mathcal{N}(\mathbf{0}, \Sigma_\omega)$
      $x_{ji} \mid \text{other vars} \sim \mathcal{N}(x_{ji} \mid \mu_{c_{ji}} + \theta_j + \mathbb{1}[g_{ji} = 1](\delta + \omega_j), \sigma_x^2)$
      $(\Sigma_\theta)_{\ell q} = \sigma_\theta^2 \exp\left(-\frac{(\ell - q)^2}{2\nu^2}\right) + \kappa\, \delta(\ell - q)$
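The likelihood on this slide shifts a cluster mean by an age offset and, for runners with gender indicator 1, by a further global-plus-age-specific offset. The sketch below evaluates that mean and draws one finishing time from it. It is an illustration only: the function names and every numeric value are hypothetical, and the parameters are passed in directly rather than sampled from their priors.

```python
import random


def finishing_time_mean(mu, theta, omega, delta, cluster, age_group, gender):
    """Mean of x_ji: mu_c + theta_j + 1[g = 1] * (delta + omega_j)."""
    mean = mu[cluster] + theta[age_group]
    if gender == 1:
        mean += delta + omega[age_group]
    return mean


def sample_finishing_time(mu, theta, omega, delta, sigma_x,
                          cluster, age_group, gender, rng):
    """Draw x_ji ~ N(mean, sigma_x^2) for one runner."""
    mean = finishing_time_mean(mu, theta, omega, delta, cluster, age_group, gender)
    return rng.gauss(mean, sigma_x)


# Hypothetical parameter values (hours), for illustration only.
mu = [3.5, 4.5]        # cluster means
theta = [0.0, 0.2]     # age-group offsets
omega = [0.0, 0.05]    # age-specific part of the gender offset
delta = 0.5            # global gender offset

mean_g0 = finishing_time_mean(mu, theta, omega, delta, cluster=1, age_group=1, gender=0)
mean_g1 = finishing_time_mean(mu, theta, omega, delta, cluster=1, age_group=1, gender=1)
draw = sample_finishing_time(mu, theta, omega, delta, 0.1, 1, 1, 1, random.Random(0))
```

Because the offsets act on the atoms (the cluster means) while the mixture weights $\pi$ are shared, all strata reuse the same clusters, shifted by age and gender.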

  24. Results: impact of age
      • MCMC approach
      • conditional conjugacy
      • block Gibbs sampler
      • 1/4 M runners
      [Figure: histogram of finishing time (hours) with the pdf estimated by ADDP and the individual clusters.]
      [Figure: finishing time (hours) versus age, showing $\mu_1 + \theta_j$ and $\mu_2 + \theta_j$ for New York City, Boston and London, together with WMA.]

  26. Results: impact of gender
      [Figure: $\delta + \omega_j$ (mins) versus age (years).]
      Other results
      • Speed-dependent cluster means
      • Link to mixture of experts
      • Analysis of running patterns
      • Prediction of finishing time
      [Figure: speed (km/h) and elevation (m) versus distance (km, up to 42.2) for each cluster. Legend: Cluster 0 (7.2%, T=3.80h); Cluster 1 (24.4%, T=3.93h); Cluster 1− (14.9%, T=4.03h); Cluster 1−− (3.6%, T=4.16h); Cluster 2A (13.4%, T=4.17h); Cluster 2A− (11.3%, T=4.27h); Cluster 2A−− (3.2%, T=4.43h); Cluster 2B (1.1%, T=4.32h); Cluster 2B− (1.6%, T=4.47h); Cluster 3 (3.4%, T=4.56h); Cluster 3− (4.4%, T=4.59h); Cluster 3−− (1.4%, T=4.88h).]

  32. Motivation: biomarker discovery in clinical trials
      Def: "any variable that can be used as an indicator of a particular disease state".
      We want to discover:
      1 Indicators of disease progression: prognostic biomarkers
      2 Indicators of (positive) drug response: predictive biomarkers

  35. General latent feature model (GLFM) (Valera et al., 2017)
      Latent feature model for heterogeneous datasets
      • Link functions $T_d$ depend on the type of data for each dimension $d$
      $x_{nd} = T_d(y_{nd}; \phi_d)$
      $y_{nd} \mid Z, B \sim \mathcal{N}(Z_{n\bullet} B_{\bullet d}, \sigma_y^2)$
      $B_{kd} \sim \mathcal{N}(0, \sigma_B^2)$
      $Z \sim \mathrm{IBP}(\alpha)$
      [Figure: graphical model with nodes $\alpha$, $Z$, $\sigma_B^2$, $B_{\bullet d}$, $\phi_d$, $Y_{\bullet d}$ and $X$, with a plate over $d = 1 \dots D$.]
      Our contribution to GLFM project
      • Open-source Python code
      • Simulations for data exploration
      https://github.com/ivaleraM/GLFM

  36. Our contribution: Case-control IBP (C-IBP)
      $R_n$: drug indicator for patient $n$
      $x_{nd} = T_d(y_{nd}; \phi_d)$
      $y_{nd} \mid Z, W, B, C, R \sim \mathcal{N}(Z_{n\bullet} B_{\bullet d} + \mathbb{1}[R_n = 1]\, W_{n\bullet} C_{\bullet d}, \sigma_y^2)$
      $B_{kd} \sim \mathcal{N}(0, \sigma_B^2)$
      $C_{kd} \sim \mathcal{N}(0, \sigma_C^2)$
      $Z \sim \mathrm{IBP}(\alpha)$
      $W \sim \mathrm{IBP}(\alpha)$
      [Figure: graphical model with nodes $\alpha$, $Z$, $\alpha_W$, $W$, $R$, $\sigma_B^2$, $B_{\bullet d}$, $\sigma_C^2$, $C_{\bullet d}$, $\phi_d$, $Y_{\bullet d}$ and $X$, with a plate over $d = 1 \dots D$.]
      • Inference: MCMC approach with accelerated Gibbs sampling
      • Biomarker discovery: statistical multiple hypothesis testing
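The C-IBP likelihood adds a second, treatment-specific set of latent features that only contributes for patients who received the drug. The sketch below computes the Gaussian mean of $y_{nd}$ for a given patient; it is a plain illustration of that formula with hypothetical names and toy numbers, and it does not reproduce the inference procedure or the GLFM link functions.

```python
def cibp_mean(z_row, w_row, B, C, took_drug, d):
    """Mean of y_nd: Z_n. B_.d + 1[R_n = 1] * W_n. C_.d"""
    # Shared (global) features, active for every patient.
    mean = sum(z * B[k][d] for k, z in enumerate(z_row))
    # Treatment-specific features, only counted when R_n = 1.
    if took_drug:
        mean += sum(w * C[k][d] for k, w in enumerate(w_row))
    return mean


# Toy values, for illustration only: 2 shared features, 1 drug feature, 2 dimensions.
B = [[1.0, 2.0], [0.5, -1.0]]
C = [[3.0, 0.0]]
placebo_mean = cibp_mean([1, 1], [1], B, C, took_drug=False, d=0)
drug_mean = cibp_mean([1, 1], [1], B, C, took_drug=True, d=0)
```

Separating `B` from `C` is what lets the model distinguish effects common to all patients from effects that appear only under treatment.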

  40. Results: subpopulations
      GPC3 Antibody Treatment against Liver Cancer (Abou-Alfa et al., J. Hepatology, Apr 2016)
      • 180 patients: 60 took a placebo, 120 took the drug
      • PFS: Progression Free Survival

      Sub-population  Drug  F1  F2  F3  Size (number of patients)  Mean PFS (months)  Median PFS (months)
      1               0     0   0   0   33.37                      3.06               1.65
      2               0     0   1   0   4.07                       2.29               2.24
      3               0     1   0   0   17.84                      2.72               1.81
      4               0     1   1   0   4.72                       7.05               7.18
      5               1     0   0   0   51.52                      3.22               2.55
      6               1     0   0   1   16.77                      4.17               3.65
      7               1     0   1   0   8.38                       1.74               1.33
      8               1     0   1   1   2.07                       2.69               2.65
      9               1     1   0   0   29.88                      3.36               2.03
      10              1     1   0   1   4.90                       4.44               4.34
      11              1     1   1   0   4.53                       6.31               5.31
      12              1     1   1   1   1.94                       10.04              10.01

      [Figure: PFS (months) for each sub-population 1 to 12.]

  41. Results: biomarker discovery
      Treatment-specific feature F3
      [Figure: $\Delta_d$ for each candidate biomarker $d$; the individual biomarker labels on the horizontal axis are not recoverable from the transcript.]

  47. Motivation: wealth of nations
      What makes some countries wealthier than others?
      Classical view
      • Division of labor (A. Smith, 1776; Ricardo, 1817)
      • Specialization leads to economic efficiency
      • Export portfolios → block-structure

  52. Motivation: wealth of nations
      The reality:
      [Figure: binary country-by-product matrix, with countries on the vertical axis and products on the horizontal axis.]
      $\mathrm{RCA}_{nd} = \dfrac{E_{nd} / \sum_{d'} E_{nd'}}{\sum_{n'} E_{n'd} / \sum_{n',d'} E_{n'd'}}$
      $x_{nd} = 1$ if $\mathrm{RCA}_{nd} \geq 1$, and $x_{nd} = 0$ otherwise
      Properties:
      1 Triangularity
      2 $D \gg N$
      Our approach
      1 Develop an infinite Poisson factor analysis model
        • flexible prior
        • feature sparsity
      2 Design a time-varying extension
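The binarization rule on this slide can be made concrete with a small worked example. The sketch below assumes $E_{nd}$ is the export value of product $d$ from country $n$ (the slide does not spell this out), computes the ratio of a product's share in a country's exports to its share in world exports, and thresholds at 1. Function names and the toy export table are hypothetical, and the code assumes every country and every product has a non-zero total.

```python
def rca_matrix(exports):
    """RCA_nd = (E_nd / sum_d E_nd) / (sum_n E_nd / sum_{n,d} E_nd)."""
    total = sum(sum(row) for row in exports)
    country_totals = [sum(row) for row in exports]
    product_totals = [sum(col) for col in zip(*exports)]
    return [
        [
            (value / country_totals[n]) / (product_totals[d] / total)
            for d, value in enumerate(row)
        ]
        for n, row in enumerate(exports)
    ]


def binarize(rca):
    """x_nd = 1 if RCA_nd >= 1, else 0."""
    return [[1 if value >= 1.0 else 0 for value in row] for row in rca]


# Toy table: 2 countries (rows) by 2 products (columns), illustration only.
exports = [[8.0, 2.0], [2.0, 8.0]]
rca = rca_matrix(exports)
x = binarize(rca)
```

In the toy table each country devotes 80% of its exports to one product that makes up only 50% of world trade, so that entry has RCA 1.6 and is marked 1, while the other entry (RCA 0.4) is marked 0.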

  54. Bernoulli process Poisson factor analysis (BeP-PFA)
      Generative model
      $x_{nd} \sim \mathrm{Poisson}(Z_{n\bullet} B_{\bullet d})$
      $B_{kd} \sim \mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$
      $Z \sim \mathrm{IBP}(\alpha)$
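Given a binary matrix $Z$, the rest of the BeP-PFA generative model is straightforward to simulate. The sketch below is an illustration under assumptions: it reads $\mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$ as a shape-scale parameterization (so each $B_{kd}$ has mean $\mu_B$), takes $Z$ as a fixed input instead of drawing it from the IBP, and uses Knuth's simple Poisson sampler to stay within the standard library. All names and values are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_bep_pfa(Z, num_columns, alpha_b, mu_b, seed=0):
    """Draw B_kd ~ Gamma(alpha_b, mu_b / alpha_b) and x_nd ~ Poisson(Z_n. B_.d)."""
    rng = random.Random(seed)
    num_features = len(Z[0])
    # random.gammavariate takes (shape, scale).
    B = [
        [rng.gammavariate(alpha_b, mu_b / alpha_b) for _ in range(num_columns)]
        for _ in range(num_features)
    ]
    X = []
    for z_row in Z:
        rates = [
            sum(z * B[k][d] for k, z in enumerate(z_row))
            for d in range(num_columns)
        ]
        X.append([sample_poisson(rate, rng) for rate in rates])
    return B, X


# Toy feature assignments: the second row has no active feature.
Z = [[1, 0], [0, 0], [1, 1]]
B, X = sample_bep_pfa(Z, num_columns=3, alpha_b=1.0, mu_b=1.0)
```

A row of `Z` with no active features has Poisson rate zero and therefore produces only zero counts, which is how sparsity in `Z` carries over to the observations.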

  55. Limitation of the IBP
      • Number of ones per row: $J_n \sim \mathrm{Poisson}(\alpha)$
      • Number of non-empty features: $K \sim \mathrm{Poisson}\left(\alpha \sum_{j=1}^{N} \frac{1}{j}\right)$
      • Mass parameter $\alpha$ couples both $J_n$ and $K$
      [Figure: sample binary matrices for $\alpha = 1$ (nz = 97) and $\alpha = 10$ (nz = 339).]
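The coupling described here follows directly from the two Poisson means: the expected number of ones per row is $\alpha$, and the expected number of non-empty features is $\alpha H_N$, where $H_N = \sum_{j=1}^{N} 1/j$ is the $N$-th harmonic number. A short worked computation, with a hypothetical function name, makes the point that changing $\alpha$ scales both quantities together.

```python
def expected_ibp_sizes(alpha, num_rows):
    """Return (E[J_n], E[K]) = (alpha, alpha * H_N) for the one-parameter IBP."""
    harmonic = sum(1.0 / j for j in range(1, num_rows + 1))
    return alpha, alpha * harmonic


ones_small, features_small = expected_ibp_sizes(alpha=1.0, num_rows=50)
ones_large, features_large = expected_ibp_sizes(alpha=10.0, num_rows=50)
```

With a single mass parameter there is no way to ask for many features overall while keeping each row sparse, which motivates the richer priors discussed next in the deck.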

  59. Beyond the standard IBP
      Three-parameter IBP (Teh et al., 2007)
      • More flexible distribution for feature weights
      $Z_{n\bullet} \sim \mathrm{BeP}(\mu)$ (5.1)
      $\mu \sim \mathrm{SBP}(1, \alpha, H, c, \sigma)$ (5.2)
      $J_{\text{new}} \sim \mathrm{Poisson}\left(\alpha\, \dfrac{\Gamma(1 + c)\,\Gamma(n + c + \sigma - 1)}{\Gamma(n + c)\,\Gamma(c + \sigma)}\right)$
      Restricted IBP (Doshi-Velez et al., 2015)
      • Arbitrary prior $f$ over $J_n$
      $Z_{n\bullet} \sim \text{R-BeP}(\mu, f)$ (5.3)
      $\mu \sim \mathrm{BP}(1, \alpha, H)$ (5.4)
      Our proposal
      • Combination of both
      • Flexible prior
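The Poisson rate for the number of new features introduced by the $n$-th observation under the three-parameter IBP can be evaluated numerically with log-gamma functions, which avoids overflow for large $n$. The sketch below implements the expression as written on the slide; the function name is hypothetical, and it assumes the usual parameter ranges $\alpha > 0$, $0 \le \sigma < 1$ and $c > -\sigma$.

```python
import math


def new_feature_rate(alpha, c, sigma, n):
    """alpha * Gamma(1+c) Gamma(n+c+sigma-1) / (Gamma(n+c) Gamma(c+sigma))."""
    log_rate = (
        math.log(alpha)
        + math.lgamma(1.0 + c)
        + math.lgamma(n + c + sigma - 1.0)
        - math.lgamma(n + c)
        - math.lgamma(c + sigma)
    )
    return math.exp(log_rate)


standard_rate = new_feature_rate(alpha=2.0, c=1.0, sigma=0.0, n=4)
heavy_tail_rate = new_feature_rate(alpha=1.0, c=1.0, sigma=0.5, n=100)
```

Setting $c = 1$ and $\sigma = 0$ recovers the standard IBP rate $\alpha / n$ (here $2/4 = 0.5$), while $\sigma > 0$ makes the rate decay more slowly in $n$, so new features keep appearing as more data arrive.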

  61. Our contributions
      [Figure: sample binary matrices for $\alpha = 1$ (nz = 97) and $\alpha = 10$ (nz = 339).]
      3RBeP-PFA for static scenario
        $x_{nd} \sim \mathrm{Poisson}(Z_{n\bullet} B_{\bullet d})$
        $B_{kd} \sim \mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$
        $Z \sim \text{3R-IBP}(\alpha, c, \sigma, f)$
        • Inference: auxiliary variables + dynamic programming (Doshi-Velez et al., 2015)
      dBeP-PFA for dynamic scenario
        $x_{nd}^{(t)} \sim \mathrm{Poisson}(Z_{n\bullet}^{(t)} B_{\bullet d})$
        $B_{kd} \sim \mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$
        $Z_{n\bullet}^{(\bullet)} \sim \mathrm{mIBP}(\alpha, \gamma, \delta)$
        • Inference: forward-filtering backward-sampling (Gael et al., 2009)

  62. Results in static scenario
      Quantitative analysis: accuracy vs. interpretability

      (a) 2010 SITC database (N = 126, D = 744)
      Metric          PMF             NNMF            BeP-PFA          sBeP-PFA         3RBeP-PFA
      Log perplexity  1.68 ± 0.01     1.61 ± 0.01     1.59 ± 0.04      3.26 ± 0.17      1.62 ± 0.01
      Coherence       −264.60 ± 4.74  −263.27 ± 7.45  −149.36 ± 7.56   −178.44 ± 4.50   −140.51 ± 2.73

      (b) 2010 HS database (N = 123, D = 4890)
      Metric          PMF             NNMF            BeP-PFA          sBeP-PFA         3RBeP-PFA
      Log perplexity  1.48 ± 0.01     1.47 ± 0.01     1.58 ± 0.01      2.56 ± 0.12      1.57 ± 0.02
      Coherence       −264.73 ± 3.11  −264.67 ± 6.22  −148.91 ± 10.57  −168.39 ± 13.16  −134.51 ± 4.43

      • PMF: Probabilistic matrix factorization (Mnih et al., 2008)
      • NNMF: Non-negative matrix factorization (Schmidt et al., 2009)
      • BeP-PFA: Bernoulli process Poisson factor analysis
      • sBeP-PFA: sparse Bernoulli process Poisson factor analysis
      • 3RBeP-PFA: Three-parameter Restricted Bernoulli process Poisson factor analysis

  63. Results in static scenario: Capturing input sparsity structure
      [Figure: four panels with axes labeled Empirical and Inferred, each from 0 to 400: (a) Baseline, (b) BeP-PFA, (c) sBeP-PFA, (d) 3RBeP-PFA]

  64. Results in static scenario: Interpretability

  65. Temporal Dynamics
      [Figure: feature activations (0 to 1) over 1965 to 2005. Indonesia panel: F4, F5, F9, F2, F15. Egypt panel: F4, F11, F1, F2]
      Capabilities
        F0  Bias
        F1  Agriculture
        F2  Clothing I
        F3  Farming
        F4  Clothing II
        F5  Electronics I
        F6  Processed Materials
        F7  Electronics II
        F8  Materials I
        F9  Machinery I
        F10 Materials II
        F11 Automobile
        F12 Chemicals I
        F13 Chemicals II
        F14 Machinery II
        F15 Miscellaneous

  68. Model extension: Dynamic PFA
      Model extension
        $x_{nd}^{(t)} \sim \mathrm{Poisson}(Z_{n\bullet}^{(t)} B_{\bullet d})$
        $B_{kd} \sim \mathrm{Gamma}(\alpha_B, \mu_B / \alpha_B)$
        $Z_{n\bullet}^{(\bullet)} \sim \mathrm{mIBP}(\alpha, \gamma, \delta)$
      mIBP: Markov Indian buffet process (Gael et al., 2009)
      [Figure: feature activations (0 to 1) over 1965 to 2005. Indonesia panel: F1, F4, F7, F10, F13. Egypt panel: F3, F5, F7, F10]
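      The time-varying activations can be illustrated with a finite approximation of the Markov Indian buffet process: each latent feature follows its own two-state Markov chain. The parameterization below (off-to-on probability drawn from Beta(α/K, 1), stay-on probability drawn from Beta(γ, δ), chains starting inactive) is an assumption based on the usual finite construction, and the names are hypothetical. It sketches the prior only, not the inference procedure.

```python
import random


def sample_markov_activations(n_steps, n_features, alpha, gamma, delta, seed=0):
    """Sample binary activations Z[t][k] for one observation over n_steps time steps."""
    rng = random.Random(seed)
    # a[k]: probability that feature k switches on (0 -> 1).
    a = [rng.betavariate(alpha / n_features, 1.0) for _ in range(n_features)]
    # b[k]: probability that feature k stays on (1 -> 1).
    b = [rng.betavariate(gamma, delta) for _ in range(n_features)]

    Z = []
    previous = [0] * n_features  # every chain starts from a dummy inactive state
    for _ in range(n_steps):
        current = []
        for k in range(n_features):
            p_on = b[k] if previous[k] == 1 else a[k]
            current.append(1 if rng.random() < p_on else 0)
        Z.append(current)
        previous = current
    return a, b, Z


if __name__ == "__main__":
    a, b, Z = sample_markov_activations(n_steps=10, n_features=4,
                                        alpha=2.0, gamma=5.0, delta=1.0)
    for t, row in enumerate(Z):
        print(t, row)
```

      At each time step the sampled row plays the role of $Z_{n\bullet}^{(t)}$ in the Poisson rate, so a feature that stays on for many consecutive steps yields a persistent activation of the kind plotted per country.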

  69. Outline
      1 Introduction
      2 Bayesian nonparametrics
      3 ADDP mixture model for marathon model
      4 C-IBP feature model for clinical trials
      5 PFA models for international trade
      6 Conclusions

  72. Conclusions
      BNPs
        • useful BNP models for specific data exploration tasks
        • Fair density estimation model
        • Structured general latent feature model (global and group-specific factors)
        • Flexible Poisson factor analysis models in static/dynamic scenarios
      Sports science
        • age-gender curves
        • fair grading system
        • running patterns over time
      Cancer research
        • subpopulation learning
        • biomarker discovery
        • clinico-genetic associations
