Genomics, Transcriptomics and Proteomics in Clinical Research - PowerPoint PPT Presentation

Statistical Learning for Analyzing Functional Genomic Data
Axel Benner
German Cancer Research Center, Heidelberg, Germany
June 16, 2006

Genomics, Transcriptomics and Proteomics in Clinical Research
- Diagnostics: signatures, single biomarkers
- Discovery of Therapeutic Targets: candidate targets
- Prognostic Factor Studies: response to treatment, toxicity, survival
- Insight into Pharmacological Mechanisms: pathway analysis
- Custom Drug Selection: predictive factors for response/resistance to certain therapy, indicators of adverse events

Explanation vs. Prediction
- Target: Explanation
  - Implies that there is some likelihood of a "true" model
  - Model selection: few input variables are relevant
  - Occam's razor: "do not make more assumptions than needed"
- Target: Prediction
  - Statistical learning
  - Model selection: quality of prediction
- Topic: Large scale problems

Large scale problems
- New biomolecular techniques:
  - Number of input variables (genes, clones, etc.): 1000s to 10,000s
  - Number of observations: 10s to 100s
  - → number of observations << number of input variables
  - → more unknown parameters than estimation equations
  - → infinitely many solutions
- Models can be fit perfectly to the data → no bias but high variance
- Use statistical learning methods to handle these problems!
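The "infinitely many solutions" point can be illustrated with a toy example. The sketch below is not from the slides: the design matrix `X`, the response `y`, and the helper names are made up for illustration. With 2 observations and 3 input variables, every coefficient vector along one direction fits the data exactly.

```python
def predict(X, beta):
    """Linear predictor X @ beta, written out for plain Python lists."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def rss(X, y, beta):
    """Residual sum of squares of the linear fit."""
    return sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, predict(X, beta)))

def shifted(beta, direction, t):
    """Move beta by t units along a given direction."""
    return [b + t * d for b, d in zip(beta, direction)]

# Hypothetical toy data: n = 2 observations, d = 3 input variables.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

beta_a = [1.0, 2.0, 0.0]      # one exact solution
null_dir = [1.0, 1.0, -1.0]   # X maps this direction to zero

# Every beta_a + t * null_dir reproduces y exactly, for any t.
for t in (-1.0, 0.0, 3.5):
    print(t, rss(X, y, shifted(beta_a, null_dir, t)))
```

Each of these coefficient vectors has zero training error, so the data alone cannot choose among them. This is the situation the restriction, selection, and regularization methods are meant to resolve.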

Statistical Learning: Control of Model Complexity
- Restriction methods
  - the class of functions of the input vectors is limited
- Selection methods
  - include only those basis functions of the input vectors that contribute "significantly" to the fit of the model
  - examples are variable selection methods and stepwise greedy approaches like boosting
- Regularization methods
  - restrict the coefficients of the model, e.g. ridge regression

Penalized maximum likelihood estimation
- Maximizing the log likelihood can result in fitting noise in the data.
- A shrinkage approach will often result in estimates of the regression coefficients that, while biased, are lower in mean squared error and closer to the true parameters.
- A good approach to shrinkage is penalized maximum likelihood estimation (le Cessie & van Houwelingen, 1990).
- A general form of penalized log likelihood is

$$\sum_{i=1}^{n} \log L\bigl(y_i;\, g(x_i^T \beta)\bigr) - \sum_{j=1}^{d} p_\lambda(|\beta_j|)$$

- A so-called "penalty" is subtracted from the log-likelihood; it discourages the regression coefficients from becoming large.
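A minimal sketch of the penalized log likelihood above. Two choices are assumptions, not taken from the slides: the likelihood is that of logistic regression (so g is the inverse logit), and the default penalty is the ridge form λθ². The data and function names are hypothetical.

```python
import math

def logistic_loglik(X, y, beta):
    """Sum over i of log L(y_i; g(x_i^T beta)) for a Bernoulli response."""
    total = 0.0
    for row, y_i in zip(X, y):
        eta = sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        total += y_i * eta - math.log1p(math.exp(eta))
    return total

def ridge_penalty(theta_abs, lam):
    """L2 penalty p_lambda(|theta|) = lambda * |theta|^2."""
    return lam * theta_abs ** 2

def penalized_loglik(X, y, beta, lam, penalty=ridge_penalty):
    """Log likelihood minus the sum of coefficient-wise penalties."""
    return logistic_loglik(X, y, beta) - sum(penalty(abs(b), lam) for b in beta)

# Hypothetical toy data: 4 observations, intercept column plus one covariate.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]]
y = [1, 0, 1, 0]
beta = [0.2, 1.5]

print(logistic_loglik(X, y, beta))        # unpenalized value
print(penalized_loglik(X, y, beta, 1.0))  # lower by 0.2^2 + 1.5^2 = 2.29
```

For simplicity this sketch also penalizes the intercept. In practice the intercept is often left unpenalized.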
Penalty functions
- Well-known penalty functions are $L_q$-norm penalties:

$$p_\lambda(|\theta|) = \lambda |\theta|^q$$

- $L_2$ (ridge regression) with thresholding rule

$$\hat\theta(z) = \frac{1}{1+\lambda}\, z$$

  → continuous, but biased and no sparse solutions
- $L_1$ (LASSO) with thresholding rule

$$\hat\theta(z) = \operatorname{sgn}(z)\,(|z| - \lambda)_+$$

  → continuous and sparse, but no unbiased solutions

Penalty functions
A good penalty function should result in an estimator with the following three properties (Fan & Li, 2001):
- Unbiasedness: The resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid excessive estimation bias
- Sparsity: Estimating a small coefficient as zero, to reduce model complexity
- Continuity: The resulting estimator is continuous in the data, to avoid instability in model prediction
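The two thresholding rules can be written down directly. This is a small sketch with hypothetical function names; `z` stands for the unpenalized estimate of a single coefficient.

```python
import math

def ridge_threshold(z, lam):
    """L2 rule: proportional shrinkage, never exactly zero for z != 0."""
    return z / (1.0 + lam)

def soft_threshold(z, lam):
    """L1 (LASSO) rule: sgn(z) * (|z| - lambda)_+ ."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

for z in (-3.0, -0.5, 0.5, 3.0):
    print(z, ridge_threshold(z, 1.0), soft_threshold(z, 1.0))
```

The ridge rule shrinks every coefficient but sets none to zero. The LASSO rule sets small coefficients to zero but also shifts large ones by λ, which is the bias that the unbiasedness requirement refers to.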

Penalty functions
- Convex penalties (e.g. quadratic penalties)
  - make trade-offs between bias and variance
  - can create unnecessary biases when the true parameters are large
  - parsimonious models cannot be produced
- Nonconcave penalties
  - select variables and estimate coefficients of variables simultaneously
  - e.g. hard thresholding penalty (HARD, Antoniadis 1997)

$$p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2\, I(|\theta| < \lambda)$$

    with thresholding rule

$$\hat\theta = z \cdot I(|z| > \lambda)$$

Penalty functions: Related approaches
- Bridge regression (Frank & Friedman, 1993), which minimizes

$$\sum_i \Bigl(y_i - \beta_0 - \sum_j \beta_j x_{ij}\Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} |\beta_j|^\gamma \le t, \quad \gamma \ge 0$$

- Nonnegative garotte (Breiman, 1995), which minimizes

$$\sum_i \Bigl(y_i - \beta_0 - \sum_j c_j \hat\beta_j x_{ij}\Bigr)^2 \quad \text{under the constraint} \quad \sum_j c_j \le s$$

  where $\{\hat\beta_j\}$ are the full-model OLS coefficients.
- Elastic net (Zou & Hastie, 2005), where the penalty is a convex combination of the lasso and ridge penalty.
- Relaxed Lasso (Meinshausen, 2005).

SCAD penalty
Smoothly Clipped Absolute Deviation (SCAD; Fan, 1997)
- satisfies all three requirements (unbiasedness, sparsity, continuity)
- is defined by

$$p'_\lambda(|\theta|) = \lambda \left\{ I(|\theta| \le \lambda) + \frac{(a\lambda - |\theta|)_+}{(a-1)\lambda}\, I(|\theta| > \lambda) \right\}, \quad a > 2$$

- with thresholding rule

$$\hat\theta(z) = \begin{cases} \operatorname{sgn}(z)\,(|z| - \lambda)_+, & |z| \le 2\lambda \\ \{(a-1)z - \operatorname{sgn}(z)\,a\lambda\}/(a-2), & 2\lambda < |z| \le a\lambda \\ z, & |z| > a\lambda \end{cases}$$

Selected penalty and thresholding functions
[Figure: penalty functions and thresholding rules (plotted against z) in three panels: HARD penalty (λ = 1.5), LASSO penalty (λ = 1.5), SCAD penalty (λ = 1).]
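The hard and SCAD thresholding rules can be sketched as follows. The function names are hypothetical, and the default a = 3.7 is an assumption: it is a commonly used value for SCAD, while the slides only require a > 2.

```python
import math

def hard_threshold(z, lam):
    """HARD rule: keep z unchanged if |z| > lambda, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0

def scad_threshold(z, lam, a=3.7):
    """SCAD rule: soft thresholding near zero, identity for large |z|."""
    if a <= 2.0:
        raise ValueError("SCAD requires a > 2")
    sign = math.copysign(1.0, z)
    if abs(z) <= 2.0 * lam:
        return sign * max(abs(z) - lam, 0.0)
    if abs(z) <= a * lam:
        return ((a - 1.0) * z - sign * a * lam) / (a - 2.0)
    return z

for z in (0.5, 1.5, 2.0, 3.0, 3.7, 10.0):
    print(z, hard_threshold(z, 1.0), scad_threshold(z, 1.0))
```

With λ = 1 the SCAD rule returns 0 for |z| ≤ 1, joins the middle branch continuously at |z| = 2, and leaves values beyond aλ = 3.7 untouched. The hard rule jumps from 0 to z at |z| = λ, which is why it fails the continuity requirement.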

SCAD Penalty
- SCAD improves on the LASSO by reducing estimation bias.
- SCAD possesses an oracle property: the true regression coefficients that are zero are automatically estimated as zero, and the remaining coefficients are estimated as well as if the correct submodel were known in advance.
- Hence, SCAD is an ideal procedure for variable selection, at least from a theoretical point of view.

Penalized proportional hazards regression
Penalized partial likelihood

$$l(\beta) - n \sum_{j=1}^{d} p_\lambda(|\beta_j|) \to \max_\beta$$

with

$$l(\beta) = \sum_{k=1}^{N} \Bigl[ x_{(k)}^T \beta - \log \Bigl\{ \sum_{i \in R_k} \exp(x_i^T \beta) \Bigr\} \Bigr]$$

where n = number of observations, N = number of events, $R_k$ = risk set for event k, k = 1, ..., N.

SCAD Regression (Fan & Li, 2002)
- Use "LQA", local quadratic approximation, for $\beta$ close to $\beta_0$:

$$l(\beta_0) + \nabla l(\beta_0)^T (\beta - \beta_0) + \tfrac{1}{2} (\beta - \beta_0)^T \nabla^2 l(\beta_0) (\beta - \beta_0) - \tfrac{n}{2}\, \beta^T \Sigma_\lambda(\beta_0)\, \beta$$

  with

$$\Sigma_\lambda(\beta_0) = \operatorname{diag}\{ p'_\lambda(|\beta_{10}|)/|\beta_{10}|, \ldots, p'_\lambda(|\beta_{d0}|)/|\beta_{d0}| \}$$

- Solve the quadratic maximization problem by the Newton-Raphson algorithm:

$$\beta_1 = \beta_0 - [\nabla^2 l(\beta_0) - n \Sigma_\lambda(\beta_0)]^{-1} [\nabla l(\beta_0) - n \Sigma_\lambda(\beta_0) \beta_0]$$

- Estimate the covariance matrix by the sandwich formula:

$$\widehat{\operatorname{cov}}(\hat\beta_1) = [\nabla^2 l(\hat\beta_1) - n \Sigma_\lambda(\hat\beta_1)]^{-1}\, \widehat{\operatorname{cov}}(\nabla l(\hat\beta_1))\, [\nabla^2 l(\hat\beta_1) - n \Sigma_\lambda(\hat\beta_1)]^{-1}$$

SCAD Regression: Local quadratic approximation for $p_\lambda(\beta)$ (Fan & Li, 2002)
[Figure: the penalty $p_\lambda(\beta)$ and its local quadratic approximation, plotted against β.]

$$p_\lambda(|\beta_j|) \approx p_\lambda(|\beta_{j0}|) + \tfrac{1}{2} \{ p'_\lambda(|\beta_{j0}|)/|\beta_{j0}| \} (\beta_j^2 - \beta_{j0}^2) \quad \text{for } \beta_j \approx \beta_{j0}$$
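A minimal sketch of one LQA Newton-Raphson step for a Cox model, restricted to a single covariate (d = 1) so that the matrices reduce to scalars. Everything beyond the formulas above is an assumption made for illustration: the toy data, the function names, the default a = 3.7, the absence of tied event times, and the tolerance below which a coefficient is treated as zero.

```python
import math

def scad_derivative(theta_abs, lam, a=3.7):
    """p'_lambda(|theta|) of the SCAD penalty, for theta_abs >= 0."""
    if theta_abs <= lam:
        return lam
    return max(a * lam - theta_abs, 0.0) / (a - 1.0)

def cox_derivatives(beta, times, events, x):
    """Partial log likelihood l, gradient and Hessian (one covariate, no ties)."""
    loglik = grad = hess = 0.0
    for k, (t_k, d_k) in enumerate(zip(times, events)):
        if not d_k:
            continue  # censored observations contribute only through risk sets
        risk = [x_i for t_i, x_i in zip(times, x) if t_i >= t_k]
        w = [math.exp(x_i * beta) for x_i in risk]
        s0 = sum(w)
        s1 = sum(w_i * x_i for w_i, x_i in zip(w, risk))
        s2 = sum(w_i * x_i * x_i for w_i, x_i in zip(w, risk))
        loglik += x[k] * beta - math.log(s0)
        grad += x[k] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return loglik, grad, hess

def lqa_step(beta0, times, events, x, lam, a=3.7, tol=1e-8):
    """One update beta1 = beta0 - [l'' - n*Sigma]^(-1) [l' - n*Sigma*beta0]."""
    if abs(beta0) < tol:
        return 0.0  # assumption: tiny coefficients are fixed at zero
    n = len(times)
    _, grad, hess = cox_derivatives(beta0, times, events, x)
    sigma = scad_derivative(abs(beta0), lam, a) / abs(beta0)
    return beta0 - (grad - n * sigma * beta0) / (hess - n * sigma)

# Hypothetical toy data: 5 subjects, one censored (event indicator 0).
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 0, 1, 1]
x = [2.0, 1.0, 1.5, 0.5, 0.0]

print(lqa_step(0.1, times, events, x, lam=0.0))  # plain Newton step
print(lqa_step(0.1, times, events, x, lam=1.0))  # shrunk toward zero
```

With λ = 0 the SCAD derivative vanishes and the update is an ordinary Newton-Raphson step on the partial likelihood. In practice the step would be iterated to convergence, and the sandwich formula would then give the covariance estimate. Neither is shown here.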
