Regression Analysis for Probabilistic Cause-of-Disease Assignment

“Nested” pLCM: Relax the LI and Non-interference Assumption
• Direct evidence against LI: control measurements $(M_{i1}, \ldots, M_{iJ})'$
  • test cross-reactions (prevented in PERCH assays)
  • lab technicians effect
  • heterogeneity in subjects’ immunity level
• Deviations from independence impact inference (cf. Pepe and Janes, 2007, Biostatistics; Albert et al., 2001, Biometrics)
• Modeling deviation from LI: modeling a cross-classified probability contingency table $P[M_{i1} = m_1, \ldots, M_{iJ} = m_J \mid I_i]$, $\forall\, m = (m_1, \ldots, m_J)'$
  • Log-linear parameterization
  • Generalized linear mixed-effect models (GLMM)
  • Simplex factor model; similar to mixed-membership model (cf. Bhattacharya and Dunson, 2012, JASA)
  • PARAFAC decomposition (cf. Dunson and Xing, 2009, JASA)
Zhenke Wu (zhenkewu@umich.edu), 2019 TAMU, slide 18 / 55

Nested Partially-Latent Class Models (npLCM; Wu and Zeger, 2016)
Example: 5 Pathogens, 2 Subclasses; BrS Data Only
Slide 19 / 55

Nested Partially-Latent Class Models (npLCM; Wu and Zeger, 2016)
Example: 5 Pathogens, 3 Subclasses; BrS Data Only
Slide 20 / 55

Encourage Few Subclasses: Stick-Breaking Prior
• $V_j \sim \mathrm{Beta}(1, \alpha)$; example: $K = 10$, $\alpha = 1$
• On average, the first several segments receive most of the weight
Slide 21 / 55
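
The stick-breaking construction can be illustrated with a short simulation. This is a minimal sketch, not the `baker` implementation: it assumes a truncation at $K = 10$ segments (the last segment takes whatever stick is left), uses the slide's $\alpha = 1$, and the function name `stick_breaking_weights`, the seed, and the number of draws are illustrative choices.

```python
import numpy as np

def stick_breaking_weights(v):
    """Map stick proportions v_1..v_K (last axis) to weights
    w_k = v_k * prod_{l<k} (1 - v_l)."""
    v = np.asarray(v, dtype=float)
    lead = np.ones(v.shape[:-1] + (1,))
    # Length of stick remaining before each break.
    remaining = np.concatenate(
        [lead, np.cumprod(1.0 - v[..., :-1], axis=-1)], axis=-1
    )
    return v * remaining

rng = np.random.default_rng(0)      # arbitrary seed (assumption)
K, alpha = 10, 1.0                  # values from the slide's example
n_draws = 20000                     # illustrative Monte Carlo size

V = rng.beta(1.0, alpha, size=(n_draws, K))
V[:, -1] = 1.0                      # truncation: last segment takes the rest
W = stick_breaking_weights(V)       # each row is one draw of K weights

mean_weights = W.mean(axis=0)
print(np.round(mean_weights, 3))
```

Because the $V_k$ are independent, the expected weight of segment $k < K$ is $\frac{1}{1+\alpha}\left(\frac{\alpha}{1+\alpha}\right)^{k-1}$, which equals $(1/2)^k$ at $\alpha = 1$; the printed averages should decay roughly geometrically, so the first three segments carry most of the mass.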

npLCM: Likelihood and Prior (BrS Data Only)
• Likelihood:
$$P_0(M_i = m) = \sum_{k=1}^{K} \nu_k \prod_{j=1}^{J} \{\psi_k^{(j)}\}^{m_j} \{1 - \psi_k^{(j)}\}^{1 - m_j},$$
$$P_1(M_i = m) = \sum_{j=1}^{J} \pi_j \left[ \sum_{k=1}^{K} \eta_k \{\theta_k^{(j)}\}^{m_j} \{1 - \theta_k^{(j)}\}^{1 - m_j} \prod_{\ell \neq j} \{\psi_k^{(\ell)}\}^{m_\ell} \{1 - \psi_k^{(\ell)}\}^{1 - m_\ell} \right],$$
• Prior:
$$\pi \sim \mathrm{Dirichlet}(0.5, \ldots, 0.5),$$
$$\psi_k^{(j)} \sim \mathrm{Beta}(1, 1), \quad \theta_k^{(j)} \sim \mathrm{Beta}(c_{1kj}, c_{2kj}), \quad j = 1, \ldots, J;\ k = 1, \ldots, \infty,$$
$$Z_{i'} \mid I^{L}_{i'} = j \sim \sum_{k=1}^{\infty} U_k \prod_{\ell < k} [1 - U_\ell]\, \delta_k, \quad U_k \sim \mathrm{Beta}(1, \alpha_0), \quad \text{for all cases},$$
$$Z_i \sim \sum_{k=1}^{\infty} V_k \prod_{\ell < k} [1 - V_\ell]\, \delta_k, \quad V_k \sim \mathrm{Beta}(1, \alpha_0), \quad \text{for all controls},$$
$$\alpha_0 \sim \mathrm{Gamma}(0.25, 0.25).$$
Slide 22 / 55
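
To make the two mixture likelihoods concrete, here is a small sketch that evaluates $P_0$ and $P_1$ for every binary pattern and checks that each sums to one. It also computes a pairwise odds ratio among controls, to show that a single subclass gives an odds ratio of 1 (local independence) while two subclasses do not. All parameter values ($J = 3$, $K = 2$, the rates and weights) and all function names are hypothetical, chosen only for illustration; subclass-specific TPRs are stored as `theta[k, j]` and FPRs as `psi[k, j]`.

```python
import itertools
import numpy as np

def product_bernoulli(m, s):
    """Pi(m; s) = prod_j s_j^{m_j} (1 - s_j)^{1 - m_j}."""
    m = np.asarray(m)
    s = np.asarray(s, dtype=float)
    return float(np.prod(np.where(m == 1, s, 1.0 - s)))

def p0(m, nu, psi):
    """Control pmf: sum_k nu_k * Pi(m; psi[k])."""
    return sum(nu[k] * product_bernoulli(m, psi[k]) for k in range(len(nu)))

def p1(m, pi, eta, theta, psi):
    """Case pmf: for cause j and subclass k, pathogen j uses the TPR
    theta[k, j]; every other pathogen uses the FPR psi[k, l]."""
    total = 0.0
    for j in range(len(pi)):
        for k in range(len(eta)):
            rates = psi[k].copy()
            rates[j] = theta[k, j]
            total += pi[j] * eta[k] * product_bernoulli(m, rates)
    return total

def pairwise_odds_ratio(pmf, J, a, b):
    """Odds ratio between measurements a and b implied by a pmf on {0,1}^J."""
    cell = np.zeros((2, 2))
    for m in itertools.product([0, 1], repeat=J):
        cell[m[a], m[b]] += pmf(m)
    return cell[1, 1] * cell[0, 0] / (cell[1, 0] * cell[0, 1])

# Hypothetical parameter values (not from the talk).
J = 3
psi = np.array([[0.1, 0.2, 0.3], [0.6, 0.5, 0.4]])       # FPRs, K x J
theta = np.array([[0.9, 0.8, 0.95], [0.85, 0.9, 0.7]])   # TPRs, K x J
nu = [0.7, 0.3]          # control subclass weights
eta = [0.4, 0.6]         # case subclass weights
pi = [0.5, 0.3, 0.2]     # cause-specific case fractions

patterns = list(itertools.product([0, 1], repeat=J))
total_p0 = sum(p0(m, nu, psi) for m in patterns)
total_p1 = sum(p1(m, pi, eta, theta, psi) for m in patterns)

or_li = pairwise_odds_ratio(lambda m: p0(m, [1.0], psi[:1]), J, 0, 1)
or_ld = pairwise_odds_ratio(lambda m: p0(m, nu, psi), J, 0, 1)
print(total_p0, total_p1, or_li, or_ld)
```

With one subclass the control model reduces to a product Bernoulli, so the odds ratio is 1 up to floating-point error; with two subclasses the shared latent subclass induces positive dependence between the first two measurements for these particular values.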

Estimation Bias if Ignoring Local Dependence (LD)
Simulation: LD truth (npLCM) estimated by working LI models (pLCM)
[Figure (a): pairwise odds ratios (log scale) among measurements A to E, for cases and controls, under (I) weak LD and (II) strong LD. Lower panels: percent relative asymptotic bias (PRAB) plotted against the cases' first subclass weight ($\eta_1$), with curves for the marginal and for classes A to E; controls: marginal.]
Slide 23 / 55

So Far: A General Framework
Nested Partially Latent Class Models (npLCM)
For simplicity, we assume “single-pathogen causes”, or a single relevant feature per cluster, or more visually, “one row of green boxes per disease class”.

npLCM Framework (no Covariates)
Three components of a likelihood function:
a. Cause-specific case fractions (CSCF): $\pi = (\pi_1, \ldots, \pi_L)^\top = \{\pi_\ell = P(I = \ell \mid Y = 1),\ \ell = 1, \ldots, L\} \in \mathcal{S}^{L-1}$;
b. $P_{1\ell} = \{P_{1\ell}(m)\} = \{P(M = m \mid I = \ell, Y = 1)\}$: a table of probabilities of making $J$ binary observations $M = m$ in a case class $\ell \neq 0$;
c. $P_0 = \{P_0(m)\} = \{P(M = m \mid I = 0, Y = 0)\}$: the same probability table as above but for controls.
Cases’ disease classes are unobserved, so the distribution of their measurements is a weighted finite-mixture model: $P_1 = \sum_{\ell=1}^{L} \pi_\ell P_{1\ell}$.
The likelihood:
$$L = L_1 \cdot L_0 = \prod_{i: Y_i = 1} \left[ \sum_{\ell=1}^{L} \pi_\ell \cdot P_{1\ell}(M_i; \Theta, \Psi, \eta) \right] \times \prod_{i': Y_{i'} = 0} P_0(M_{i'}; \Psi, \nu)$$
Slide 25 / 55

Special Case: pLCM (Wu et al., 2016)
Setting $\eta_1 = 1$ and $\nu_1 = 1$.
Control model for multivariate binary data $\{M_i : Y_i = 0\}$:
1. $P_0(m) = \prod_{j=1}^{J} \{\psi_j\}^{m_j} \{1 - \psi_j\}^{1 - m_j} = \Pi(m; \psi)$
  1a. $\Pi(m; s) = \prod_{j=1}^{J} \{s_j\}^{m_j} \{1 - s_j\}^{1 - m_j}$ is the probability mass function of a product Bernoulli distribution given the success probabilities $s = (s_1, \ldots, s_J)^\top$, $0 \leq s_j \leq 1$.
  1b. Parameters $\psi = (\psi_1, \ldots, \psi_J)^\top$ represent the positive rates absent disease, referred to as “false positive rates” (FPRs).
Local Independence: $M_{ij} \perp M_{ij'} \mid I = 0$
Slide 26 / 55

Special Case: pLCM (Wu et al., 2016)
Model for the multivariate binary data in case class $\ell \neq 0$
2. $P_{1\ell}(m)$ is a product of the probabilities of measurements made
  2a. on the causative pathogen $\ell$: $P(M_\ell \mid I = \ell, Y = 1, \theta) = \{\theta_\ell\}^{M_\ell} \{1 - \theta_\ell\}^{1 - M_\ell}$, where $\theta = (\theta_1, \ldots, \theta_J)^\top$ are “true positive rates” (TPRs), larger than FPRs.
  2b. on the non-causative pathogens: $P(M_{i[-\ell]} \mid I_i = \ell, Y_i = 1, \psi_{[-\ell]}) = \Pi(M_{[-\ell]}; \psi_{[-\ell]})$, where $a_{[-\ell]}$ represents all but the $\ell$-th element in a vector $a$.
  2c. Under the single-pathogen-cause assumption, pLCM uses $J$ TPRs $\theta$ for $L = J$ causes and $J$ FPRs $\psi$.
2a-2b. Local Independence (LI): $M_{ij} \perp M_{ij'} \mid I = \ell \neq 0$
2a-2b. Non-interference: disease-causing pathogen(s) are more frequently detected among cases than controls ($\theta_\ell > \psi_\ell$), and the non-causative pathogens are observed with the same rates among cases as in controls.
Slide 27 / 55
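
Items 2a to 2c, combined with the mixture over causes, imply a simple Bayes-rule calculation for the probability that a given case belongs to each disease class. The sketch below is a plug-in illustration under the pLCM with fixed, hypothetical parameter values ($J = L = 3$); a full Bayesian analysis would average over the posterior of $(\pi, \theta, \psi)$ rather than fix them. The function names are illustrative, not from any package.

```python
import numpy as np

def plcm_case_class_pmf(m, cause, theta, psi):
    """P_{1l}(m) under the pLCM: the causative pathogen is measured with
    its TPR, every other pathogen with its FPR (LI + non-interference)."""
    rates = np.array(psi, dtype=float)      # copy of the FPRs
    rates[cause] = theta[cause]             # swap in the TPR for the cause
    m = np.asarray(m)
    return float(np.prod(np.where(m == 1, rates, 1.0 - rates)))

def cause_probabilities(m, pi, theta, psi):
    """P(I = l | M = m, Y = 1), proportional to pi_l * P_{1l}(m)."""
    joint = np.array(
        [pi[l] * plcm_case_class_pmf(m, l, theta, psi) for l in range(len(pi))]
    )
    return joint / joint.sum()

# Hypothetical values (not from the talk); note theta_l > psi_l for all l.
theta = np.array([0.90, 0.80, 0.85])   # TPRs
psi = np.array([0.10, 0.30, 0.20])     # FPRs
pi = np.array([0.5, 0.3, 0.2])         # cause-specific case fractions

# A case who tests positive only for the first pathogen.
post = cause_probabilities([1, 0, 0], pi, theta, psi)
print(np.round(post, 4))
```

For this pattern the first pathogen receives nearly all of the probability, because a lone positive is far more likely under its TPR than as a false positive under the other two causes.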

Regression Analysis in Nested pLCM
In large-scale disease etiology studies:
• Data: case-control diagnostic tests, multivariate binary observations
• Scientific problem: estimate cause-specific case fractions (CSCF); think “pie chart” for cases
• Statistical problem: using the nested pLCM to estimate the mixing distribution among the cases
• Motivation for regression analyses: CSCFs may vary by season, a child’s age, HIV status, disease severity
Slide 28 / 55

Data (with Covariates)
• $\mathcal{D} = \{(M_i, Y_i, X_i Y_i, W_i),\ i = 1, \ldots, N\}$
• $M_i = (M_{i1}, \ldots, M_{iJ})^\top$: binary measurements; indicate the presence or absence of $J$ pathogens for subject $i = 1, \ldots, N$.
• $Y_i$: case (1) or control (0).
• $X_i = (X_{i1}, \ldots, X_{ip})^\top$: covariates that may influence case $i$’s etiologic fractions.
• $W_i = (W_{i1}, \ldots, W_{iq})^\top$: shared by cases and controls; possibly different from $X_i$; may influence the control distribution $[M_i \mid W_i, Y_i = 0]$. For example, healthy controls do not have disease severity information (which can be included in $X_i$).
• Continuous covariates: the first $p_1$ and $q_1$ elements of $X_i$ and $W_i$, respectively.
Slide 29 / 55

Motivating Application Again: PERCH Study
Data: 494 cases and 944 controls from one site
Goal a.: Estimate CSCFs at all covariate values, and assign cause-specific probabilities for each case
Goal b.: Quantify overall cause-specific disease burdens in a population, i.e., overall CSCFs $\pi^* = (\pi_1^*, \ldots, \pi_L^*)^\top$ as an empirical average of the stratum-specific CSCFs (by $X$); of policy interest (vaccine/antibiotics development and manufacture)
Model:
• $J = 7$: noisy presence/absence of 2 bacteria and 5 viruses in the nose
• Causes: seven single-pathogen causes plus a “Not Specified” (NoS) cause; so $L = J + 1$
• $X_i$: enrollment date, age (< or > 1 year), disease severity for cases (severe or very severe), HIV status (+/-)
• $W_i$: $X_i$ minus “disease severity”
Slide 30 / 55

PERCH Data: Sparsely-Populated Strata
Table: The observed count (frequency) of cases and controls by age, disease severity and HIV status (1: yes; 0: no). The marginal fractions among cases and controls for each covariate are shown at the bottom. Regression results will be shown for the first two strata.

age ≥ 1   very severe (VS)   HIV positive   # cases (%)        # controls (%)
          (case-only)                       total: 524 (100)   total: 964 (100)
0         0                  0              208 (39.7)         545 (56.5)
1         0                  0              72 (13.7)          278 (28.8)
0         1                  0              116 (22.1)         -
1         1                  0              33 (6.3)           -
0         0                  1              37 (7.1)           85 (8.8)
1         0                  1              24 (4.5)           51 (5.3)
0         1                  1              25 (4.8)           -
1         1                  1              3 (0.6)            -

Marginal fraction   age ≥ 1   very severe (VS)   HIV positive
case                25.2%     34.5%              17.0%
control             34.3%     -                  14.1%
Slide 31 / 55

Current Methods Fall Short
• Fully-stratified analysis: fit an npLCM to the case-control data in each covariate stratum. Like pLCM, the npLCM is partially identified in each stratum, necessitating multiple sets of independent informative priors across multiple strata. Two primary issues:
  Gap 1a: Unstable CSCF estimates due to sparsely-populated strata.
  Gap 1b: Informative TPR priors are often elicited for a case population and rarely for each stratum; reusing independent prior distributions of the TPRs across all the strata will lead to overly-optimistic posterior uncertainty in $\pi^*$, hampering policy decisions.
Slide 32 / 55

The Rest of the Talk
More focus on model formulation; inference is done by `baker`.
Extend the npLCM to perform regression analysis in case-control disease etiology studies that
(a) incorporates controls to estimate the CSCFs ($\pi$),
(b) specifies parsimonious functional dependence of $\pi$ upon covariates, such as additivity, and
(c) correctly assesses the posterior uncertainty of the CSCF functions and the overall CSCFs $\pi^*$ by applying the TPR priors just once.
Slide 33 / 55

Now: how do we incorporate covariates, and into which quantities?
Regression extension for $P_0$ and $P_1$: let $\pi_\ell$, $\nu_k$, $\eta_k$ depend on covariates.

Roadmap
Let three sets of parameters in an npLCM (pg. 17) depend on the observed covariates:
1x. Etiology regression function among cases, $\{\pi_\ell(x),\ \ell \neq 0\}$, which is of primary scientific interest;
2x. Conditional probability of measurements $m$ given covariates $w$ in controls: $P_0(m; w) = [M = m \mid W = w, I = 0]$;
3x. As in 2x, but in the case class $\ell$: $P_{1\ell}(m; w) = [M = m \mid W = w, I = \ell]$, $\ell = 1, \ldots, L$.
Note: keep the specifications for the TPRs and FPRs ($\Theta$, $\Psi$) as in the original npLCM.
Slide 35 / 55

Etiology Regression $\pi_\ell(X)$
$\pi_\ell(X)$ is the primary target of inference.
1. Recall that $I_i = \ell$ represents case $i$’s disease being caused by pathogen $\ell$.
2. Occurs with probability $\pi_{i\ell}$ that depends upon covariates.
3. Over-parameterized multinomial logistic regression: $\pi_{i\ell} = \pi_\ell(X_i) = \exp\{\phi_\ell(X_i)\} / \sum_{\ell'=1}^{L} \exp\{\phi_{\ell'}(X_i)\}$, $\ell = 1, \ldots, L$, where $\phi_\ell(X_i) - \phi_L(X_i)$ is the log odds of case $i$ in disease class $\ell$ relative to $L$: $\log(\pi_{i\ell} / \pi_{iL})$.
4. Without specifying a baseline category, we treat all the disease classes symmetrically, which simplifies prior specification.
5. Additive models for $\phi_\ell(x; \Gamma^\pi_\ell) = \sum_{j=1}^{p_1} f^\pi_{\ell j}(x_j; \beta^\pi_{\ell j}) + \tilde{x}^\top \gamma^\pi_\ell$
  5a. Use B-spline basis expansion to approximate $f^\pi_{\ell j}(\cdot)$ and use P-splines for estimating smooth functions.
  5b. $\tilde{x}$ is the subvector of the predictors $x$; $\Gamma^\pi_\ell = (\beta^\pi_{\ell j}, \gamma^\pi_\ell)$.
Slide 36 / 55
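
The over-parameterized multinomial logistic form in item 3 can be checked numerically. The sketch below assumes purely linear predictors $\phi_\ell(X_i) = X_i^\top \gamma_\ell$ (no spline terms), with a hypothetical design matrix and randomly drawn coefficients; it verifies that adding the same constant to every $\phi_\ell$ leaves the CSCFs unchanged, that $\phi_\ell - \phi_L$ equals $\log(\pi_{i\ell}/\pi_{iL})$, and it forms an overall CSCF vector as the empirical average over cases.

```python
import numpy as np

def cscf(phi):
    """Softmax over the last axis: pi_l = exp(phi_l) / sum_l' exp(phi_l')."""
    phi = np.asarray(phi, dtype=float)
    z = phi - phi.max(axis=-1, keepdims=True)   # stabilise; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)   # arbitrary seed (assumption)
n_cases, p, L = 6, 2, 3          # hypothetical sizes

# Hypothetical design: intercept plus one binary covariate.
X = np.column_stack([np.ones(n_cases), rng.integers(0, 2, size=n_cases)])
gamma = rng.normal(size=(p, L))  # one coefficient column per disease class
phi = X @ gamma                  # linear predictors, n_cases x L

pi_i = cscf(phi)                 # case-specific CSCFs
shifted = cscf(phi + 3.7)        # same constant added to every class

# Log odds of class l relative to the last class L.
log_odds_vs_last = phi[:, :-1] - phi[:, [-1]]

# Overall CSCFs as the empirical average over the cases' covariate values.
pi_star = pi_i.mean(axis=0)
print(np.round(pi_star, 3))
```

The invariance to a common shift is what "over-parameterized" refers to: only differences between the $\phi_\ell$ are identified by the likelihood, and no class has to be singled out as a baseline.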

$P_0$: Multivariate Binary Regression for Controls
Desirable properties
Model specification:
• Model space large enough for complex conditional dependence of $M$ given covariates $W$
• Upward compatibility, or reproducibility (invariant parameter interpretation with increasing dimensions or complex patterns of missing responses)
Estimation:
• Adaptivity: regularization to adapt to the difficulty of the problem, e.g., model residual dependence $[M \mid W, I = 0]$ only if necessary; model the effect of covariates only if necessary
Slide 37 / 55

Let $P_0$ depend on $W_i$
Regression model for controls

• The pmf for controls' measurements:
$$\Pr(M_i = m \mid W_i, I_i = 0) = \sum_{k=1}^{K} \nu_k(W_i)\, \Pi(m; \Psi_k), \quad \Psi_k = (\psi_k^{(1)}, \ldots, \psi_k^{(J)})'$$
• The vector $(\nu_1(W_i), \ldots, \nu_K(W_i))$ lies in a $(K-1)$-simplex
• $\Pi(m; s) = \prod_{j=1}^{J} \{s_j\}^{m_j} (1 - s_j)^{1 - m_j}$
• An equivalent generative process:
  sample subclass indicator: $Z_i \mid W_i \sim \text{Categorical}_K(\nu(W_i))$
  generate measurements: $M_{ij} \mid Z_i = k \sim \text{Bernoulli}(\psi_k^{(j)})$, independently for $j = 1, \ldots, J$.
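
The mixture pmf and its equivalent generative process can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the numeric values of $\nu_k$ and $\psi_k^{(j)}$ are hypothetical, and the weights are held fixed at a single covariate value $W_i$.

```python
import itertools
import random


def product_bernoulli(m, s):
    """Pi(m; s) = prod_j s_j^{m_j} (1 - s_j)^{1 - m_j}."""
    out = 1.0
    for m_j, s_j in zip(m, s):
        out *= s_j if m_j == 1 else 1.0 - s_j
    return out


def control_pmf(m, nu, psi):
    """Pr(M = m | W, I = 0) = sum_k nu_k(W) * Pi(m; Psi_k)."""
    return sum(nu_k * product_bernoulli(m, psi_k) for nu_k, psi_k in zip(nu, psi))


def sample_control(nu, psi, rng):
    """Draw the subclass Z first, then J independent Bernoulli measurements."""
    k = rng.choices(range(len(nu)), weights=nu)[0]
    return [1 if rng.random() < p else 0 for p in psi[k]]


# Hypothetical values: K = 2 subclasses, J = 3 measurements.
nu = [0.7, 0.3]                             # nu_k(W_i) at one fixed W_i
psi = [[0.1, 0.2, 0.05], [0.6, 0.5, 0.4]]   # psi[k][j]: positive rate in subclass k

# The pmf sums to one over all 2^J binary patterns.
total = sum(control_pmf(m, nu, psi) for m in itertools.product([0, 1], repeat=3))
p_all_negative = control_pmf((0, 0, 0), nu, psi)

rng = random.Random(0)
draw = sample_control(nu, psi, rng)
```

Measurements are independent within a subclass, but mixing over subclasses induces marginal dependence among the $J$ measurements, which is how the model relaxes local independence.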

Let $P_0$ depend on $W_i$
Regression model for controls

Stick-breaking parametrization of weight functions $\nu_k(W_i) = P(Z_i = k \mid W_i)$ by
$$h_k(W_i; \Gamma^\nu_k) = \begin{cases} g(\alpha^\nu_{ik}) \prod_{s<k} \{1 - g(\alpha^\nu_{is})\}, & \text{if } k < K, \\ \prod_{s<k} \{1 - g(\alpha^\nu_{is})\}, & \text{if } k = K, \end{cases}$$
(labeled "stick $k$" on the slide), where $g(\cdot) = 1/(1 + \exp\{-(\cdot)\})$.

We specify $\alpha^\nu_{ik}$ via additive models:
$$\alpha^\nu_{ik} = \mu_{k0} + \sum_{j=1}^{q_1} f_{kj}(W_{ij}; \beta^\nu_{kj}) + \tilde{W}_i^\top \gamma^\nu_k, \quad k = 1, \ldots, K-1.$$
Expand the smooth functions by B-spline bases with coefficients $\beta^\nu_{kj}$; $\tilde{w}$ is a subvector of covariates $w$.
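
The stick-breaking map from $K-1$ linear predictors to $K$ weights can be sketched as below. The function names and the values of $\alpha^\nu_{ik}$ are hypothetical; in the model each $\alpha^\nu_{ik}$ would come from the additive regression above rather than being fixed numbers.

```python
import math


def expit(a):
    """g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def stick_breaking_weights(alpha):
    """Turn alpha_1..alpha_{K-1} into K weights that sum to one."""
    weights = []
    remaining = 1.0  # stick left so far: prod_{s<k} {1 - g(alpha_s)}
    for a in alpha:
        piece = expit(a)
        weights.append(piece * remaining)  # case k < K
        remaining *= 1.0 - piece
    weights.append(remaining)              # case k = K takes what is left
    return weights


# Hypothetical alpha_{ik} for one subject, so K = 4.
alpha = [0.0, 1.0, -0.5]
nu = stick_breaking_weights(alpha)
```

A large $\alpha^\nu_{ik}$ makes $g(\alpha^\nu_{ik})$ close to 1, so subclass $k$ takes nearly all of the remaining stick and later subclasses receive weights near zero.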

Adaptivity Considerations
Proposed Model

• Prevent overfitting when the regression is easy, and improve interpretability
• We a priori place substantial probabilities on models with the following two features:
  a) Few subclasses with effective weights (in the sense that $\nu_k(\cdot)$ is bounded away from 0 and 1): a novel additive half-Cauchy prior for $\mu_{k0}$.
  b) Smooth weight regression curves $\nu_k(\cdot)$: by Bayesian Penalized-Splines (P-Splines) combined with mixture priors on spline coefficients to sensitively distinguish constant $\alpha^\nu_k(\cdot)$ from flexible smooth curves

On Consideration a) "Uniform Shrinkage over Simplex" for $\nu_k(W)$
Proposed Model

• We let $\mu_{k0} = \sum_{j=1}^{k} \mu^*_{j0}$, $\mu^*_{j0} > 0$. A large $\mu_{k0}$ for a large $k$.
• $\mu_{k0}$ increases with $k$: making the stick-breaking a priori more likely to stop for a large $k$
• We specify the prior distributions for $\mu^*_{j0}$ to be heavy-tailed: $\mu^*_{j0} \sim \text{Cauchy}^+(0, s_j)$, $j = 1, \ldots, K$
• A large $s_k$ produces a large $\mu^*_{k0}$ and helps stop the stick-breaking at class $k$.
• Encourages using a small number of effective classes ($< K$) to approximate the observed $2^J$ probability contingency table in finite samples
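
A prior draw of the cumulative intercepts and the weights they imply can be sketched as follows. This is an illustration under stated assumptions: covariate effects are set to zero so that $\alpha^\nu_{ik} = \mu_{k0}$, the scales $s_j$ are hypothetical, only $K-1$ intercepts are drawn because the stick-breaking uses $k = 1, \ldots, K-1$, and the half-Cauchy draw uses the inverse CDF $s \tan(\pi u / 2)$.

```python
import math
import random


def sample_half_cauchy(scale, rng):
    """Inverse-CDF draw from Cauchy+(0, scale): scale * tan(pi * u / 2)."""
    return scale * math.tan(math.pi * rng.random() / 2.0)


def sample_intercepts(scales, rng):
    """mu_k0 = sum_{j<=k} mu*_j0 with mu*_j0 ~ Cauchy+(0, s_j)."""
    mu, running = [], 0.0
    for s in scales:
        running += sample_half_cauchy(s, rng)
        mu.append(running)
    return mu


def weights_from_intercepts(mu):
    """Stick-breaking weights with alpha_k = mu_k0 (no covariate effects)."""
    weights, remaining = [], 1.0
    for m in mu:
        piece = 1.0 / (1.0 + math.exp(-m))
        weights.append(piece * remaining)
        remaining *= 1.0 - piece
    weights.append(remaining)
    return weights


rng = random.Random(1)
scales = [1.0, 1.0, 1.0, 1.0]   # hypothetical s_j, giving K = 5 subclasses
mu = sample_intercepts(scales, rng)
nu = weights_from_intercepts(mu)
```

Since every increment is nonnegative, $\mu_{k0}$ is nondecreasing in $k$, and one heavy-tailed draw is enough to push $g(\mu_{k0})$ near 1 and leave almost no stick for later subclasses.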

[Figure: Inference of $\nu_k(x)$ at three hyperparameter values $s_j$. Simulation with a single continuous covariate; lines show the truth and posterior samples. X-axis: covariate values. Y-axis: weight, from 0 to 1.]

Let $P_1$ depend on $X$ and $W$
Subclass Weight Regression: For Cases

The pmf for cases' measurements:
$$\Pr(M_i = m) = \sum_{\ell=1}^{L} \pi_{i\ell} \sum_{k=1}^{K} \eta_{ik}\, \Pi(m; p_{k\ell})$$
• $p_{k\ell} = \{p^{(j)}_{k\ell},\ j = 1, \ldots, J\}$ are positive rates for $J$ measurements in subclass $k$ of disease class $\ell$:
$$p^{(j)}_{k\ell} = \left\{\theta^{(j)}_k\right\}^{I\{j = \ell\}} \cdot \left\{\psi^{(j)}_k\right\}^{1 - I\{j = \ell\}}$$
• Equals the TPR $\theta^{(j)}_k$ for a causative pathogen and the FPR $\psi^{(j)}_k$ otherwise
• Subclass weight regression $\eta_k(W)$ is also specified via stick-breaking: $\eta_{ik} = h_k(W_i; \Gamma^\eta_k)$, $k = 1, \ldots, K-1$
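
The case pmf combines the etiology probabilities, the subclass weights, and the positive rates. A minimal sketch follows, assuming $L = J$ (each disease class corresponds to one of the $J$ measured pathogens) and 0-based indices; all function names and numeric values are hypothetical.

```python
import itertools


def product_bernoulli(m, s):
    """Pi(m; s) = prod_j s_j^{m_j} (1 - s_j)^{1 - m_j}."""
    out = 1.0
    for m_j, s_j in zip(m, s):
        out *= s_j if m_j == 1 else 1.0 - s_j
    return out


def positive_rates(ell, theta_k, psi_k):
    """p_kl^(j): TPR theta_k^(j) if j is the causative pathogen, else FPR psi_k^(j)."""
    return [theta_k[j] if j == ell else psi_k[j] for j in range(len(psi_k))]


def case_pmf(m, pi, eta, theta, psi):
    """Pr(M = m) = sum_l pi_l * sum_k eta_k * Pi(m; p_kl)."""
    total = 0.0
    for ell, pi_l in enumerate(pi):
        for k, eta_k in enumerate(eta):
            rates = positive_rates(ell, theta[k], psi[k])
            total += pi_l * eta_k * product_bernoulli(m, rates)
    return total


# Hypothetical values: L = J = 3 pathogens, K = 2 subclasses.
pi = [0.5, 0.3, 0.2]                            # pi_{il} at one fixed X_i
eta = [0.6, 0.4]                                # eta_{ik} at one fixed W_i
theta = [[0.9, 0.8, 0.85], [0.95, 0.7, 0.9]]    # theta[k][j]: TPRs
psi = [[0.1, 0.2, 0.05], [0.6, 0.5, 0.4]]       # psi[k][j]: FPRs

total = sum(
    case_pmf(m, pi, eta, theta, psi) for m in itertools.product([0, 1], repeat=3)
)
```

In this sketch the FPRs $\psi_k^{(j)}$ are the same objects used in the control model, so control data inform the false positive rates while cases contribute information about $\pi_{i\ell}$ and the TPRs.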

Regression Analysis for Probabilistic Cause-of-Disease Assignment Using Case-Control Diagnostic Tests
Zhenke Wu, Assistant Professor of Biostatistics, Research Assistant Professor
