Multivariate Data Analysis in Omics Research Diverging Alternative - PowerPoint PPT Presentation

Multivariate Data Analysis in Omics Research
Diverging Alternative Splicing Fingerprints Identified in Thoracic Aortic Aneurysm
Sanela Kjellqvist, PhD
WABI RNAseq course 2017-11-08

Outline • Why multivariate data analysis? • Multivariate statistics – Different analyses – Data preprocessing • Alternative splicing in thoracic aortic aneurysm – Thoracic aortic aneurysm – Study setup – Aim of the study – Results – Summary • Today’s exercise

WHY MULTIVARIATE DATA ANALYSIS?

Development of Classical Statistics – 1930s
Methods:
• Multiple regression
• Canonical correlation
• Linear discriminant analysis
• Analysis of variance
Assumptions:
• Independent X variables
• Many more observations than variables
• Regression analysis one Y at a time
• No missing data
Tables are long and lean (N observations, K variables)

Today’s data
• RNASeq, Array, LC-MS/MS, GC/MS or NMR data
• Problems
– Many variables
– Few observations
– Noisy data
– Missing data
– Multiple responses
• Implications
– High degree of correlation
– Difficult to analyse with conventional methods
• Data ≠ Information
– Need ways to extract information from the data
– Need reliable, predictive information
– Ignore random variation (noise)

Poor Methods of Data Analysis
• Plot pairs of variables
– Tedious, impractical
– Risk of spurious correlations
– Risk of missing information
• Select a few variables and use MLR (multiple linear regression)
– Throwing away information
– Assumes no ‘noise’ in X
– One Y at a time

A Better Way... • Multivariate analysis by Projection – Looks at ALL the variables together – Avoids loss of information – Finds underlying trends = “latent variables” – More stable models

Fundamental Data Analysis Objectives
• Overview
– Trends
– Outliers
– Quality Control
– Biological Diversity
– Patient Monitoring
• Discrimination
– Discriminating between groups
– Biomarker candidates
– Comparing studies or instrumentation
• Regression
– Comparing blocks of omics data
– Metab vs Proteomic vs Genomic
– Omic vs medical
– Prediction

MULTIVARIATE STATISTICS

Different methods • Principal component analysis (PCA) • Partial least squares to latent structures analysis (PLS) • Orthogonal partial least squares to latent structures analysis (OPLS) • PLS-DA • OPLS-DA • K-means clustering • Hierarchical clustering • Biplot analysis • Canonical correlation analysis

What is a projection? Principal component analysis (PCA)
• Algebraically
– Summarizes the information in the observations as a few new (latent) variables
• Geometrically
– The swarm of points in a K dimensional space (K = number of variables) is approximated by a (hyper)plane and the points are projected on that plane.

PCA - Geometric Interpretation
• Fit first principal component t1 (line describing maximum variation)
• Add second component t2 (accounts for next largest amount of variation), at right angles to the first (orthogonal)
• Each component goes through origin
[Figure: components t1 and t2 drawn in the space of the variables x1, x2, x3]

PCA - Geometric Interpretation
• Points are projected down onto a plane with co-ordinates t1, t2
[Figure: the observations of X (N × K) in the space of x1, x2, x3, the plane spanned by Comp 1 and Comp 2, and each point's “Distance to Model”]

Loadings
• How do the principal components relate to the original variables?
• Look at the angles between PCs and variable axes
[Figure: angles α1, α2, α3 between Comp 1 and the axes x1, x2, x3]

Loadings
• Take cos(α) for each axis
• Loadings vector p’ - one for each principal component
• One value per variable
[Figure: p’1 = (cos(α1), cos(α2), cos(α3)) for Comp 1]
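The relation between loadings and angles can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up data table (the values are hypothetical, not taken from the study): it fits the first component with an SVD and shows that each loading is the cosine of the angle between the component and one variable axis.

```python
import numpy as np

# Hypothetical toy data: 6 observations (rows) x 3 variables (columns).
X = np.array([
    [2.0, 1.9, 0.5],
    [1.0, 1.2, 0.4],
    [0.0, 0.1, 0.6],
    [-1.0, -0.8, 0.5],
    [-2.0, -2.1, 0.4],
    [0.5, 0.4, 0.6],
])
Xc = X - X.mean(axis=0)  # mean-center so the component passes through the origin

# Rows of Vt are the unit-length directions of the principal components.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p1 = Vt[0]  # loading vector of the first component

# Angle between the first component and each original variable axis.
alpha = np.degrees(np.arccos(p1))
for k, (loading, angle) in enumerate(zip(p1, alpha), start=1):
    print(f"x{k}: loading = cos(alpha) = {loading:+.3f}, alpha = {angle:6.1f} deg")

# The direction cosines of a unit vector have squares that sum to 1.
print("sum of squared loadings:", np.sum(p1 ** 2))
```

The sign of a loading vector is arbitrary, so the angles may come out mirrored (α versus 180° − α) between software packages.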

Principal component analysis (PCA)
• PCA compresses the X data block into A orthogonal components
• Variation seen in the score vector t can be interpreted from the corresponding loading vector p
PCA Model:
$$X = t_1 p_1^T + t_2 p_2^T + \dots + t_A p_A^T + E = TP^T + E$$
[Figure: PCA decomposes X into a score matrix T (columns 1…A) and a loading matrix P^T (rows 1…A)]
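The PCA model can be sketched in a few lines. This is an illustration under stated assumptions, not the course's own implementation: it uses NumPy's SVD on random, mean-centered data (hypothetical values) and A = 2 components, and verifies that the sum of rank-one terms equals the matrix product of scores and loadings, with E holding what is left.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # hypothetical data: N = 8 observations, K = 5 variables
Xc = X - X.mean(axis=0)       # mean-centered X block

A = 2                         # number of components to keep (chosen for illustration)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :A] * s[:A]          # scores, N x A
P = Vt[:A].T                  # loadings, K x A
E = Xc - T @ P.T              # residual matrix

# The model written as a sum of rank-one terms t_a p_a^T.
X_sum = sum(np.outer(T[:, a], P[:, a]) for a in range(A))

print("sum of t_a p_a^T equals T P^T:", np.allclose(X_sum, T @ P.T))
print("T P^T + E reproduces X:", np.allclose(T @ P.T + E, Xc))
print("loadings are orthonormal:", np.allclose(P.T @ P, np.eye(A)))
```

Dedicated packages often fit the components one at a time with NIPALS rather than an SVD, which also tolerates missing values; the SVD is used here only because it is compact.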

Recognition of molecular quasi-species (evolving units) in enzyme evolution by PCA
Emrén, L., Kurtovic, S., Runarsdottir, A., Larsson, A-K., & Mannervik, B. (2006) Proc Natl Acad Sci U S A, 103, 10866-10870
Kurtovic, S., & Mannervik, B. (2009) Biochemistry, 48, 9330-9339

Orthogonal partial least squares to latent structure – Discriminant analysis (OPLS-DA)

Orthogonal partial least squares to latent structure – Discriminant analysis (OPLS-DA)
[Figure: OPLS relates the data matrix X to a class-membership vector Y (Class 1, Class 2)]

OPLS with single Y / modelling and prediction
OPLS Model:
$$X = t_1 p_1^T + T_O P_O^T + E$$
$$Y = t_1 q_1^T + F$$
• t_1 p_1^T is the ’Y-predictive’ part of X
• T_O P_O^T is the ’Y-orthogonal’ part of X
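To make the split into a Y-predictive and a Y-orthogonal part concrete, here is a minimal sketch with one predictive and one orthogonal component, assuming NumPy and a NIPALS-style procedure as it is commonly described for single-y OPLS. The function name `opls_single_y` and the simulated two-class data are hypothetical, and this is not SIMCA's implementation.

```python
import numpy as np

def opls_single_y(X, y):
    """One Y-predictive and one Y-orthogonal component for centered X (N x K), y (N,)."""
    w = X.T @ y / (y @ y)            # weights: covariance direction between X and y
    w /= np.linalg.norm(w)
    t = X @ w                        # provisional predictive scores
    p = X.T @ t / (t @ t)            # provisional loadings
    w_o = p - (w @ p) * w            # part of the loadings orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                    # Y-orthogonal scores
    p_o = X.T @ t_o / (t_o @ t_o)    # Y-orthogonal loadings
    X_f = X - np.outer(t_o, p_o)     # X with the orthogonal variation removed
    t1 = X_f @ w                     # Y-predictive scores
    p1 = X_f.T @ t1 / (t1 @ t1)      # Y-predictive loadings
    q1 = (y @ t1) / (t1 @ t1)        # y loading
    E = X_f - np.outer(t1, p1)       # X residuals
    F = y - t1 * q1                  # y residuals
    return t1, p1, q1, t_o, p_o, E, F

# Hypothetical two-class data: 12 observations, 5 variables.
rng = np.random.default_rng(3)
classes = np.array([0] * 6 + [1] * 6)
X = rng.normal(size=(12, 5))
X[:, 0] += 2.0 * classes             # class-related variation in the first variable
X = X - X.mean(axis=0)
y = classes - classes.mean()         # centered class dummy, as in a discriminant setup

t1, p1, q1, t_o, p_o, E, F = opls_single_y(X, y)
print("orthogonal scores are uncorrelated with y:", abs(t_o @ y) < 1e-8)
print("X = t1 p1^T + t_o p_o^T + E:",
      np.allclose(np.outer(t1, p1) + np.outer(t_o, p_o) + E, X))
```

With more than one orthogonal component the same filtering step is repeated on the deflated X before the predictive component is fitted.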

Data Preprocessing – Scaling
• PCA and other methods are scale dependent
– Is the size of a variable important?
• Scaling weight is 1/SD for each variable, i.e. divide each variable by its standard deviation
– Unit Variance (UV) Scaling
• Variance of scaled variables = 1
• Many other kinds of scaling exist
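Unit variance scaling is short enough to write out directly. The sketch below assumes NumPy and a made-up table whose three variables differ in magnitude by several orders; mean centering is included as an assumption, since it is commonly paired with UV scaling.

```python
import numpy as np

# Hypothetical table: 4 observations x 3 variables on very different scales.
X = np.array([
    [1.0, 100.0, 0.010],
    [2.0, 300.0, 0.030],
    [3.0, 200.0, 0.020],
    [4.0, 400.0, 0.050],
])

mean = X.mean(axis=0)
sd = X.std(axis=0, ddof=1)        # sample standard deviation of each variable
X_uv = (X - mean) / sd            # mean-center, then apply the scaling weight 1/SD

print("variance before scaling:", X.var(axis=0, ddof=1))
print("variance after scaling: ", X_uv.var(axis=0, ddof=1))
```

Without scaling, the second variable would dominate a PCA purely because of its units; after scaling, every variable contributes the same variance.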

Cross-Validation
• Data are divided into G groups (default in SIMCA-P is 7) and a model is generated for the data devoid of one group
• The deleted group is predicted by the model ⇒ partial PRESS (Predictive Residual Sum of Squares)
• This is repeated G times and then all partial PRESS values are summed to form overall PRESS
• If a new component enhances the predictive power compared with the previous PRESS value then the new component is retained
• PCA cross-validation is done in two phases and several deletion rounds:
– first removal of observations (rows)
– then removal of variables (columns)
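The deletion-group procedure can be sketched as follows. This is an illustration under assumptions, not SIMCA-P's algorithm: an ordinary least-squares model stands in for the projection model, adding a predictor column stands in for adding a component, observations are assigned to the G = 7 groups round-robin, and the data and the function name `cross_validated_press` are hypothetical.

```python
import numpy as np

def cross_validated_press(X, y, n_terms, G=7):
    """Sum the partial PRESS values over G deletion groups for a least-squares model."""
    groups = np.arange(len(y)) % G               # round-robin group assignment (assumption)
    press = 0.0
    for g in range(G):
        left_out = groups == g
        kept = ~left_out
        # Fit on the data devoid of one group (intercept plus the first n_terms columns).
        design_kept = np.column_stack([np.ones(kept.sum()), X[kept, :n_terms]])
        coef, *_ = np.linalg.lstsq(design_kept, y[kept], rcond=None)
        # Predict the deleted group and accumulate its partial PRESS.
        design_out = np.column_stack([np.ones(left_out.sum()), X[left_out, :n_terms]])
        press += np.sum((y[left_out] - design_out @ coef) ** 2)
    return press

# Hypothetical data: only the first column of X carries information about y.
rng = np.random.default_rng(1)
X = rng.normal(size=(28, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=28)

press_by_terms = [cross_validated_press(X, y, n_terms) for n_terms in range(4)]
for n_terms, press in enumerate(press_by_terms):
    print(f"{n_terms} model terms: PRESS = {press:.3f}")
```

A term is worth keeping when it lowers PRESS relative to the previous model; terms that only fit noise tend to leave PRESS unchanged or raise it.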

Model Diagnostics
• Fit or R²
– Residuals of matrix E pooled column-wise
– Explained variation
– For whole model or individual variables
– RSS = Σ (observed - fitted)²
– R² = 1 - RSS / SSX
• Predictive Ability or Q²
– Leave out 1/7th data in turn
– ‘Cross Validation’
– Predict each missing block of data in turn
– Sum the results
– PRESS = Σ (observed - predicted)²
– Q² = 1 – PRESS / SSX
• Stop when Q² starts to drop
[Figure: Fit (R²) and Prediction (Q²) plotted per component]
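The fit statistic can be computed directly from the residual matrix. The sketch below assumes NumPy, an SVD-based PCA and random hypothetical data; it reports R² for the whole model and for individual variables as components are added. Q² uses the same formula with a cross-validated PRESS in place of RSS, which is not computed here.

```python
import numpy as np

def explained_fraction(residual_ss, total_ss):
    """1 - residual/total: gives R2 with RSS, or Q2 with a cross-validated PRESS."""
    return 1.0 - residual_ss / total_ss

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))          # hypothetical data: N = 10, K = 4
Xc = X - X.mean(axis=0)
ssx = np.sum(Xc ** 2)                 # total sum of squares of the centered X block
ssx_per_variable = np.sum(Xc ** 2, axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
r2_values = []
for A in range(1, 5):
    fitted = (U[:, :A] * s[:A]) @ Vt[:A]      # T P^T with A components
    E = Xc - fitted                           # residual matrix
    r2 = explained_fraction(np.sum(E ** 2), ssx)
    r2_per_variable = explained_fraction(np.sum(E ** 2, axis=0), ssx_per_variable)
    r2_values.append(r2)
    print(f"A = {A}: R2 = {r2:.3f}, per variable = {np.round(r2_per_variable, 3)}")
```

R² can only rise as components are added, which is why the stopping rule relies on Q² rather than on the fit.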

ALTERNATIVE SPLICING IN THORACIC AORTIC ANEURYSM
Kurtovic, Paloschi, Folkersen, Gottfries, Franco-Cereceda, Eriksson (2011) Molecular Medicine, 17, 665-675

Thoracic aortic aneurysm (TAA) • Monogenic – Marfan syndrome – Loeys Dietz • Aneurysm associated with bicuspid aortic valve (BAV) • Idiopathic thoracic aortic aneurysm

Outline of the study
• Biopsies are collected from both non-dilated and dilated aorta during valve replacement surgery and reconstruction of the dilated aorta, respectively
• Media from ascending aorta
• RNA
– Affymetrix human exon 1.0 ST microarrays (in this study 81 patients)
– RNAseq (30 patients)
• Protein
– HiRiEF iTRAQ LC-MS/MS
– 2D gel electrophoresis followed by iTRAQ LC-MS/MS
[Figure: non-dilated and dilated aorta]

Aim of the study • Alternative splicing in transforming growth factor-β (TGFβ) signaling pathway • TGFβ pathway is known to be important in aortic aneurysm • Are there any alternatively spliced genes in the TGFβ pathway? • Is alternative splicing an important mechanism in thoracic aortic aneurysm (TAA)? • How do we analyze alternative splicing?

Affymetrix exon array design
[Figure: exons, introns and PSRs (PSR – probe selection region)]

Preprocessing of data
• Probe set core level
• Unique hybridization target
• Robust multichip average (RMA) normalized
• Splice Index calculated (in case of exon level analysis)
$$n_{i,j,k} = \frac{e_{i,j,k}}{g_{j,k}}$$
where i = exon, j = sample, k = gene, e = exon signal, g = gene signal
• Unit variance scaled and mean centered data prior to MVA
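A small numerical example may help with the splice index. The sketch assumes the index is the exon signal divided by the gene signal of the same sample, as the slide's symbol definitions suggest, and that signals are on a linear scale; all values are made up for a single gene k.

```python
import numpy as np

# Hypothetical exon signals e[i, j] for one gene k: 3 exons (i) x 4 samples (j).
e = np.array([
    [100.0, 120.0, 80.0, 110.0],
    [50.0, 60.0, 40.0, 55.0],
    [20.0, 24.0, 48.0, 66.0],
])
# Hypothetical gene-level signal g[j] of the same gene in each sample.
g = np.array([200.0, 240.0, 160.0, 220.0])

# Splice index: exon signal normalized by the gene signal of the same sample.
n = e / g   # broadcasting divides every exon row by the per-sample gene signal

for i, row in enumerate(n, start=1):
    print(f"exon {i}: {np.round(row, 3)}")
```

Exons 1 and 2 keep a constant index across samples, so they simply follow the expression of the gene, while exon 3 rises from 0.1 to 0.3 in the last two samples, the pattern expected for differential exon usage. For log2-scale signals the same normalization is a subtraction rather than a division.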

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
[Figure panels: Non-supervised PCA; Supervised OPLS-DA]
• TAV and BAV together
• 81 patients included
• 614 exons included
• Good model
• Good separation between the two groups

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
[Figure panels: Non-supervised PCA; Supervised OPLS-DA]
• Only TAV patients
• 29 patients included
• 614 exons included
• Good model
• Good separation between the two groups

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
[Figure panels: Non-supervised PCA; Supervised OPLS-DA]
• Only BAV patients
• 52 patients included
• 614 exons included
• Good model
• Good separation between the two groups

Alternatively spliced exons are present in both TAV and BAV groups of patients

Alternative splicing analysis of all exons in the human genome reveals the importance of TGFβ pathway exons

Gene expression patterns of differentially spliced genes

Summary
• TGFβ pathway exons are clearly important according to an overall exon level analysis
• Dilated and non-dilated aortas show different alternative splicing patterns in the TGFβ pathway, in both TAV and BAV patients
• Exons responsible for the diverging alternative splicing fingerprints in the TGFβ pathway were identified
• Implies that dilatation in TAV patients has different underlying molecular mechanisms compared to BAV patients
• New methods for analyzing array data
