

SLIDE 1

A19 Research Internship Results

Charlie Cloutier-Langevin & Julien Corriveau-Trudel

Université de Sherbrooke

Tuesday, December 10th 2019

Supervisors: Félix Camirand Lemyre, Alan A. Cohen, Nancy Presse. Collaborators: Véronique Legault, Valérie Turcot, Alistair Senior.

SLIDE 2

Introduction

Context: a new approach to studying aging through Physiological Dysregulation (Phys. Dys.), measured with the Mahalanobis distance [4][5][9], and the advent of the NuAge dataset.
Task: study the potential relationship between an individual's nutrient intake and the deviation of their biological profile from a reference population.

SLIDE 3

NuAge Dataset

The NuAge dataset in numbers:
- 1 754 elderly women and men, aged 68 to 81;
- 6 586 visits, between 1 and 4 visits per person;
- 23 186 24h recalls, 1 to 3 recalls per timepoint, for 5 timepoints;
- 188 medical variables and 43 nutritional variables;
- 364 421 missing values out of 1 238 168 entries (29.4%).
Each year, a set of biological, nutritional, functional, medical, and social traits is measured for each participant.

SLIDE 4

Physiological systems considered

Different physiological systems considered:

1. Oxygen transport
2. Liver/kidney functions
3. Hematopoiesis
4. Micronutrients
5. Lipids

The system groupings come from a previous study on how to regroup these biomarkers and on the effects of using different subsets of biomarkers [4].

A global Phys. Dys. score has also been computed, defined as the sum of the Phys. Dys. values of all systems.
SLIDE 5

Table of contents

1. Transformation to normality
2. Longitudinal imputation: interpolation, extrapolation, results
3. Clustering
4. Measurement error and regression: additive error model, CoCoLasso, deconvolution nonparametric regression
5. Conclusion

SLIDE 6

Transformation to normality

SLIDE 7

Transformation to normality

Statistical methods + normality = better performance.

Classic transformations ("best transform", as provided by Véronique Legault). Examples: sqrt(), log(), exp().

Parametric transformation methods provide an accurate and simplified transformation process: Box-Cox [1], Yeo-Johnson [12], Manly [8].

SLIDE 8

Box-Cox transformation

Best transformation? ⇒ the Box-Cox transformation!

Box-Cox transformation (Box & Cox, 1964): a parametric power transformation for strictly positive observation values, with λ ∈ [-5, 5], defined as:

y^(λ) := (y^λ - 1)/λ,  if λ ≠ 0,
         ln(y),        if λ = 0.
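As a hedged illustration of how λ can be chosen in practice (a sketch on simulated data, not the internship code; `biomarker` is a stand-in variable), MASS::boxcox profiles the likelihood over a λ grid:

```r
# Sketch: choosing the Box-Cox lambda by profile likelihood.
# 'biomarker' is simulated stand-in data, not a NuAge variable.
library(MASS)

set.seed(1)
biomarker <- rexp(500, rate = 0.2) + 0.1   # strictly positive, right skewed

bc <- boxcox(biomarker ~ 1, lambda = seq(-5, 5, 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]        # lambda maximizing the log-likelihood

transformed <- if (abs(lambda_hat) < 1e-8) log(biomarker) else
  (biomarker^lambda_hat - 1) / lambda_hat
```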

SLIDE 9

Box-Cox transformation

Since the Box-Cox transformation requires strictly positive data, Box & Cox proposed a shifted modification.

Shifted Box-Cox transformation: a parametric power transformation with λ1 ∈ [-5, 5] and a new shifting parameter λ2, defined as:

y_i^(λ) := ((y_i + λ2)^λ1 - 1)/λ1,  if λ1 ≠ 0,
           ln(y_i + λ2),            if λ1 = 0.

Remark: shifting the data by λ2 = 1 does not impact the result, because it does not change the shape of the variable's distribution.

SLIDE 10

Transformation examples and comparison

Let's compare best transform and Box-Cox on two examples.

EXAMPLE 1
Biomarker name: Creatinine
Name in data set: CREAT
Best transform applied: log(x)
Box-Cox λ applied: λ = -0.5 (equivalent to 1/√x)

SLIDE 11

Transformation examples and comparison

Creatinine (CREAT) histogram, no transformation: right skewed (figure).

SLIDE 12

Transformation examples and comparison

Creatinine (CREAT) histogram, best transform: approaching normality (figure).

SLIDE 13

Transformation examples and comparison

Creatinine (CREAT) histogram, Box-Cox transformation: almost normal (figure).

SLIDE 14

Transformation examples and comparison

Creatinine (CREAT) Q-Q plot, no transformation: no line shape (figure).

SLIDE 15

Transformation examples and comparison

Creatinine (CREAT) Q-Q plot, best transform: the tails do not fit the line (figure).

SLIDE 16

Transformation examples and comparison

Creatinine (CREAT) Q-Q plot, Box-Cox transformation: approaching a line shape (figure).

SLIDE 17

Transformation examples and comparison

Creatinine (CREAT), Shapiro-Wilk [11] normality test comparison (H0: data come from a normally distributed population):
No transformation: p-value < 2.2e-16 ⇒ reject H0
Best transform: p-value = 3.844e-15 ⇒ reject H0
Box-Cox: p-value = 0.004968 ⇒ reject H0
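For reference, a minimal sketch of how such p-values are obtained in R (on simulated stand-in data; the real comparison was run on the NuAge variables):

```r
# Sketch: Shapiro-Wilk test before and after a transformation.
set.seed(1)
x <- rgamma(300, shape = 2)      # right-skewed stand-in for a raw biomarker
shapiro.test(x)$p.value          # very small => reject H0
shapiro.test(log(x))$p.value     # larger, though normality may still be rejected
```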

SLIDE 18

Transformation examples and comparison

EXAMPLE 2
Biomarker name: Weight
Name in data set: weight
Best transform applied: x (no transformation)
Box-Cox λ applied: λ = 0.1

SLIDE 19

Transformation examples and comparison

Weight histogram, no transformation (& best transform): right skewed (figure).

SLIDE 20

Transformation examples and comparison

Weight histogram, Box-Cox transformation: almost normal (figure).

SLIDE 21

Transformation examples and comparison

Weight Q-Q plot, no transformation (& best transform): no line shape (figure).

SLIDE 22

Transformation examples and comparison

Weight Q-Q plot, Box-Cox transformation: almost a line shape (figure).

SLIDE 23

Transformation examples and comparison

Weight, Shapiro-Wilk [11] normality test comparison (H0: data come from a normally distributed population):
No transformation: p-value < 2.2e-16 ⇒ reject H0
Box-Cox: p-value = 0.005793 ⇒ reject H0

SLIDE 24

Transformation examples and comparison

General results:
119 continuous biomarker variables transformed.
Average difference in λ1 between best transform and Box-Cox: 0.7831933.
The only biomarker left untransformed by Box-Cox is lipids tot, compared to 75 left untransformed by best transform.

Limitations:
In most cases, we still have to reject normality.
Box-Cox searches for the best power transformation on the NuAge data set; results could vary on other data sets.
Not necessarily the best result for every variable, but the best overall.

SLIDE 25

Imputation

SLIDE 26

What is imputation and why impute data?

What is imputation? Imputation is the act of substituting missing values, i.e., replacing NAs with plausible values.

Why impute? Having more data means a stronger statistical model, since data_imputed > data_initial, and a consistent statistical model is one that gets "stronger" as n → ∞.

SLIDE 27

Beware naive imputation

Risk: introducing bias into subsequent statistical estimations.
Mean imputation: imputing variable X with the mean of its non-missing values attenuates correlations in the data.
Linear regression imputation: conversely, imputing with a linear regression fit on the non-missing values of X artificially strengthens correlations.
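A small sketch of the attenuation effect on simulated data (illustrative only):

```r
# Sketch: mean imputation attenuates correlation.
set.seed(42)
x <- rnorm(1000)
y <- 0.8 * x + rnorm(1000, sd = 0.6)
cor(x, y)                                    # ~0.8 on complete data

y_miss <- y
y_miss[sample(1000, 300)] <- NA              # 30% missing completely at random
y_imp <- ifelse(is.na(y_miss), mean(y_miss, na.rm = TRUE), y_miss)
cor(x, y_imp)                                # noticeably pulled toward 0
```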

SLIDE 28

Imputation of NuAge biomarker dataset

In the NuAge biomarkers: 1 238 168 entries, 364 421 of them missing. In light of our objectives, a conservative approach ⇒ impute only what is necessary, without negatively impacting the computation of the Mahalanobis distance (MHBD).

SLIDE 29

Impacting MHBD (part 1)

Definition: let X be a set of n observations in a p-dimensional space, with mean µ and covariance matrix Σ, and let x_i be an observation from this set. The Mahalanobis distance is defined as:

D_M(x_i) = sqrt( (x_i - µ)^T Σ^(-1) (x_i - µ) ).

It generalizes measuring "how many standard deviations the observation is from the mean".
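A minimal sketch in R on simulated stand-in data (stats::mahalanobis is base R and returns the squared distance, hence the square root):

```r
# Sketch: a Phys. Dys.-style score as a Mahalanobis distance
# from a simulated reference population.
set.seed(1)
ref <- matrix(rnorm(200 * 5), ncol = 5)   # stand-in reference biomarker matrix
mu    <- colMeans(ref)
Sigma <- cov(ref)

x_i <- rnorm(5, mean = 1)                 # one (dysregulated) observation
D_M <- sqrt(mahalanobis(x_i, center = mu, cov = Sigma))  # sqrt of squared distance
```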
SLIDE 30

Impacting independence between individuals

Imputing from the population ⇒ introducing correlation between individuals (something we don't want). To avoid this effect, we relied on longitudinal data instead. Time reference: the first visit.

SLIDE 31

Longitudinal imputation

Use all variables to impute the data? We rejected this approach, since the correlations between variables were too close to 0 and the small number of timepoints (up to 4) introduced too much variability.
Using only the few data points of a single variable ⇒ think interpolation vs. extrapolation.

SLIDE 32

Interpolation

Interpolation is predicting data points that lie within the bounds of the data.

In our data, interpolating Phys. Dys. values will never generate negative values.

SLIDES 33-34

Interpolation (illustrative figures)

SLIDE 35

Extrapolation

Extrapolation is predicting data points that lie OUTSIDE the bounds of the data. Extrapolating with too few points becomes a slippery slope; the next slides include some illustrations.

SLIDES 36-37

Extrapolation with 2 points (illustrative figures)

SLIDE 38

Extrapolation with 2 points

Extrapolating with only 2 points, as we have seen, can produce outputs far from what we would expect.

Our solution was to extrapolate with the mean, justified by the fact that the slopes computed across all subjects, for each variable, were not significantly non-zero.
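The rule described above can be sketched as follows (illustrative, not the internship code; `impute_subject` is a hypothetical helper operating on one subject's timepoints):

```r
# Sketch: interpolate linearly inside a subject's observed time range,
# extrapolate with the subject's mean outside of it.
impute_subject <- function(t, y) {
  obs <- !is.na(y)
  if (sum(obs) == 0) return(y)                 # nothing to rely on: leave NAs
  if (sum(obs) == 1) { y[!obs] <- y[obs]; return(y) }
  inside  <- is.na(y) & t >= min(t[obs]) & t <= max(t[obs])
  outside <- is.na(y) & !inside
  if (any(inside))
    y[inside] <- approx(t[obs], y[obs], xout = t[inside])$y  # interpolation
  if (any(outside))
    y[outside] <- mean(y[obs])                               # mean extrapolation
  y
}

impute_subject(t = 1:4, y = c(2, NA, 3, NA))   # -> 2.0 2.5 3.0 2.5
```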

SLIDE 39

Extrapolation with 3 points (figure)

SLIDE 40

Results

Out of 1 238 168 entries, 364 421 were missing (29.4%). Our imputation method substituted 128 367 NAs, i.e., 35.2% of the missing values. The remaining missing values are either of non-continuous, non-categorical type, or belong to subjects with no observed values to rely on for imputation.

SLIDE 41

Clustering

SLIDE 42

Clustering goals

Clustering - general goals:
Get a better understanding of the data.
Discover and describe known or hidden patterns in the data.

Clustering - our goals:
Search for nutritional tendencies and describe them.
See whether these patterns can provide knowledge about our models and about physiological dysregulation.

SLIDE 43

What is clustering?

Unsupervised learning technique ⇒ observation data only. Partition observations into groups (clusters), grouping according to the similarity or dissimilarity among the data.

Remarks:
(a) More than one new method is published on arXiv.org per day.
(b) There is no single best technique or algorithm.
(c) The choice of method ⇒ considerable impact on the results.

SLIDE 44

Clustering method selection

For our research, we decided to perform clustering on the nutritional data. The typical clustering questions then arise:
Are there "real" patterns in our data?
Is it possible to subdivide the observations into groups?
Into how many groups can we subdivide our observations?
Which clustering algorithm (and which parameters) should we use?

An approach that can bring possible answers to these questions is CONSENSUS CLUSTERING.

SLIDE 45

Consensus clustering

Consensus clustering (Monti et al., 2003) [10]: a clustering method that seeks a consensus between several runs of a given clustering algorithm.
Observations + consensus clustering ⇒ best number of clusters & quantification of cluster stability.

SLIDE 46

Consensus clustering

Consensus clustering procedure (Monti et al., 2003) [10]
Given a set of observations D, a clustering algorithm C, a range K of candidate numbers of clusters, and a resampling scheme with H resamplings:

1. For each number of clusters k in K:
   for each resampling iteration h in 1..H, resample D, cluster the resample using C, and compute a connectivity matrix M(h) (defined below);
   then compute a consensus matrix M (defined below).
2. Find the best number of clusters k* in K, based on the consensus distribution of M (one per k).
3. Partition D into k* clusters, based on the consensus matrix for k*.

SLIDE 47

A further look at consensus clustering

Connectivity matrix
Given a subsample h and two observations i & j, the connectivity matrix is defined as
M(h)(i, j) = 1 if items i and j belong to the same cluster, 0 otherwise.

Consensus matrix
Given H subsamples and a connectivity matrix for each, the consensus matrix is defined as
M(i, j) = Σ_h M(h)(i, j) / Σ_h I(h)(i, j),
where I(h)(i, j) = 1 if both i and j appear in subsample h, 0 otherwise.
M(i, j) = 0.5 ⇒ no consensus.
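Written out in base R, the two matrices look as follows (a sketch for one k, with k-means and an 80% subsampling scheme; these are illustrative choices, not the study's settings):

```r
# Sketch: connectivity and consensus matrices (Monti et al.) for k = 2.
set.seed(7)
D <- rbind(matrix(rnorm(40, 0), ncol = 2), matrix(rnorm(40, 3), ncol = 2))
n <- nrow(D); H <- 50; k <- 2
M_sum <- matrix(0, n, n)   # running sum of connectivity matrices
I_sum <- matrix(0, n, n)   # running sum of "both items subsampled" indicators

for (h in seq_len(H)) {
  idx <- sample(n, size = floor(0.8 * n))       # resampling scheme
  cl  <- kmeans(D[idx, ], centers = k)$cluster
  Mh  <- outer(cl, cl, "==") * 1                # 1 if same cluster, else 0
  M_sum[idx, idx] <- M_sum[idx, idx] + Mh
  I_sum[idx, idx] <- I_sum[idx, idx] + 1
}
M <- M_sum / pmax(I_sum, 1)    # consensus matrix; entries near 0.5 = no consensus
```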

SLIDE 48

A further look at consensus clustering

Example: let's do an example on the board, with three observations, k = 2 clusters, and two subsamples (each containing all three observations). Suppose clustering yields these connectivity matrices:

M(1) =
| 1 1 0 |
| 1 1 0 |
| 0 0 1 |

and M(2) =
| 1 0 0 |
| 0 1 1 |
| 0 1 1 |

Applying the consensus matrix equation, we obtain

M =
| 1    1/2  0   |
| 1/2  1    1/2 |
| 0    1/2  1   |

SLIDE 49

diceR: an R package for consensus clustering

diceR [3]:
Computes consensus clustering for several algorithms.
Compares the results and returns the most relevant and stable algorithm and k.
To choose the best algorithm, diceR assesses the validity of the algorithms on three criteria:
(i) compactness,
(ii) connectivity,
(iii) separation.
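A hedged sketch of a diceR call (the argument names follow our reading of the package documentation and may differ between versions; `nutr` is a stand-in matrix and the settings are illustrative):

```r
# Sketch: consensus clustering across several algorithms with diceR.
library(diceR)

set.seed(9)
nutr <- scale(matrix(rexp(200 * 5), ncol = 5))  # stand-in for T1 mean intakes
res  <- dice(nutr, nk = 2:6, reps = 30,
             algorithms = c("hc", "diana", "km", "pam"),
             cons.funs = "majority")
```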

SLIDE 50

Clustering experimentation

Some details on the clustering experiments:

Taking each individual's average intake at T1.

Selection of variables to cluster on:
1. Selection 1: CHOL, TRANS, SATS, MONO, POLY ⇒ DONE
2. Selection 2: Selection 1 / ENERGY ⇒ UNDER ANALYSIS
3. Selection 3: all nutrients ⇒ ONGOING

Using diceR with these algorithms:
(i) agglomerative hierarchical clustering (H CLUST)
(ii) divisive hierarchical clustering (DIANA)
(iii) k-means (KM)
(iv) partitioning around medoids (PAM)

Remark: since these algorithms are distance-based, we also have to decide which distance to run the algorithms on, and where the distance between clusters is measured (linkage).

SLIDE 51

Clustering results - Selecting K - selection 1

Using the "proportion of ambiguous clustering" (PAC) measure [10], the algorithms give k = 2 as the number of clusters.

Table 1: Selection 1 - PAC measure (the lower, the better)

K    H clust     PAM        DIANA        KM
2    0.1609881   0.2841789  0.001832891  0.03700218
3    0.1896322   0.2702275  0.002043847  0.29046358
4    0.2580985   0.2692805  0.007719303  0.30591599
5    0.1172091   0.3833785  0.008524355  0.29816908
6    0.1856891   0.2404978  0.014866197  0.22395310
...  ...         ...        ...          ...
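A sketch of the PAC computation on a consensus matrix M (illustrative; the thresholds 0.1 and 0.9 are common defaults, not necessarily the ones used here):

```r
# Sketch: PAC = proportion of pairwise consensus values in the
# "ambiguous" interval (lower, upper); the lower the PAC, the better.
pac <- function(M, lower = 0.1, upper = 0.9) {
  v <- M[lower.tri(M)]            # each pair of observations counted once
  mean(v > lower & v < upper)
}
```

Applied to the consensus matrix M from the earlier sketch, a pac(M) close to 0 would indicate a stable 2-cluster structure.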

SLIDE 52

Clustering results - selecting algorithms - selection 1

According to the clustering validation criteria of the consensus clustering, the best algorithms are:
1. agglomerative hierarchical (H clust)
2. divisive hierarchical (DIANA)
3. k-means (KM)
4. partitioning around medoids (PAM)

HOWEVER:

Table 2: Selection 1 - Observations per cluster

Clusters    H clust   PAM    DIANA   KM
Cluster 1   1744      1128   1461    1135
Cluster 2   1         617    284     610

SLIDE 53

PAM & DIANA clustering algorithms

Partitioning around medoids (PAM):
Must know the number of clusters a priori.
Choose k medoids (randomly or semi-randomly) ⇒ results depend on initialization.
Compute every medoid-observation distance.
Place each observation in the cluster of its closest medoid.
Recompute the medoids and repeat until convergence (stable).

Divisive hierarchical clustering (DIANA):
Start with all observations in a single cluster.
Split it in two.
Repeat with the largest cluster (highest within-cluster variance).
Can continue until every observation is its own cluster.
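Both algorithms are available in the cluster package; a minimal sketch on stand-in data:

```r
# Sketch: PAM and DIANA via the 'cluster' package.
library(cluster)

set.seed(2)
X <- scale(matrix(rnorm(100 * 5), ncol = 5))
fit_pam   <- pam(X, k = 2)            # k-medoids; labels in fit_pam$clustering
fit_diana <- diana(X)                 # divisive hierarchy
groups    <- cutree(as.hclust(fit_diana), k = 2)  # cut the hierarchy at 2 groups
```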

SLIDE 54

K-medoids (PAM) iterations (figures; the original slides 54-61 step through the iterations)


SLIDE 62

DIANA iterations (dendrogram)

SLIDE 63

Clustering results - DIANA & PAM - selection 1

Describing the clusters:
Nutrient medians & means ⇒ higher in cluster 2.
T-tests and C.I.s on the mean differences ⇒ significant differences.
DIANA & PAM have each found 2 significantly different clusters!

Example: box plot of the SATS mean for each cluster (figures on slides 64-65).

SLIDES 64-65

Clustering results - DIANA & PAM - selection 1 (figures)

SLIDE 66

Clustering results - DIANA & PAM - selection 1

Impact on Phys. Dys.:
Phys. Dys. medians & means ⇒ low variation between clusters.
T-tests and C.I.s on the mean differences ⇒ no significant differences.

Example: box plot of the PAM lipid physiological dysregulation for each cluster (figure on slide 67).

SLIDE 67

Clustering results - DIANA - selection 1 (figure)

SLIDE 68

Measurement Error and Regression

SLIDE 69

Measurement error model in 24H recalls

Félix made a good case for a measurement error model for 24h recall (24HR) data in a presentation at the CRCHUS [2]. Thus, we dug into this avenue.

SLIDE 70

Additive Measurement Error Model

Additive measurement error model with repeated values: let there be n observations with m repeated values each, written W_ij, following the model

W_ij = T_i + U_ij,

where T_i is the variable of interest and U_ij is an error variable with mean E[U_ij] = 0 and variance Var[U_ij] = σ²_U.
⇒ W_ij is a repeated, contaminated version of T_i, and T_i is empirically unavailable.
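A simulation sketch of this model, including a natural estimate of σ²_U from the within-subject variability of the repeats (all values illustrative):

```r
# Sketch: W_ij = T_i + U_ij with m repeated measurements per subject.
set.seed(3)
n <- 500; m <- 3
T_i <- rgamma(n, shape = 10)                     # long-term mean intakes
W   <- T_i + matrix(rnorm(n * m, sd = 2), n, m)  # m noisy "recalls" per subject

sigma2_U <- mean(apply(W, 1, var))   # within-subject variance estimates sigma^2_U
T_hat    <- rowMeans(W)              # natural (still noisy) estimate of T_i
sigma2_U / var(as.vector(W))         # error-to-total variance ratio (cf. slide 75)
```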

SLIDE 71

The additive error model on food intake

Nutrient measurement error ↔ deviation from the long-term mean intake (LTMI):
W_ij = T_i + U_ij,
W_ij: observed values in the 24HR;
T_i: long-term mean intake (LTMI);
U_ij: deviation from the LTMI (day-to-day variability).

SLIDE 72

The additive error model on food intake

(Image courtesy of Félix CL, from his talk at the CRCHUS [2].)

SLIDE 73

Transformation to approach additive error model

Generalization: the data are one transformation away from an additive model,
h(W_ij) = T_i + U_ij,
where h(·) is a real-valued, bijective, monotone function from a family of functions H.

Example: if the data follow a multiplicative error model (W_ij = m_ij · T_i), then log(W_ij) = log(T_i) + log(m_ij). (Additive!)

SLIDE 74

Transformation to approach additive error model

One transformation we already know: the Box-Cox power transformation. Given a criterion to validate whether the data are subject to additive error, we search for the best λ parameter of

h(y | λ) = y^(λ) = (y^λ - 1)/λ,  if λ ≠ 0,
                   ln(y),        if λ = 0,

where y > 0.

SLIDE 75

Estimation of error variance

Ratio of measurement error variance to total variance for some nutritional values (figure).

SLIDE 76

Models designed for additive error model

Convex conditioned lasso regression (CoCoLasso): a penalized linear regression conditioned for measurement error.
Deconvolution nonparametric regression: a generalization of the Nadaraya-Watson kernel nonparametric regression.

SLIDE 77

Convex Conditioned Lasso Regression

CoCoLasso in a sentence: fit a multidimensional linear function with weight decay on the parameters, while correcting the model for the additive error model [6]. Weight decay is a way to encode our preference for simpler models; in the case of the lasso, this means selecting the few variables that affect the linear function.
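The "convex conditioning" part can be illustrated on the corrected covariance: subtracting σ²_U·I from the naive Gram matrix gives an unbiased but possibly non-positive-semidefinite estimate, which CoCoLasso projects back to the PSD cone before running the lasso. The eigenvalue truncation below is a simplification of the paper's nearest-PSD projection, shown on simulated data:

```r
# Sketch: error-corrected covariance and a simple PSD repair.
set.seed(4)
n <- 40; p <- 50; sigma2_U <- 0.5                 # p > n: correction breaks PSD
X <- matrix(rnorm(n * p), n, p)                   # unobserved true covariates
W <- X + matrix(rnorm(n * p, sd = sqrt(sigma2_U)), n, p)  # contaminated version

S_corr <- crossprod(W) / n - sigma2_U * diag(p)   # unbiased for cov(X), not PSD
eig    <- eigen(S_corr, symmetric = TRUE)
min(eig$values)                                   # negative: not PSD
S_psd  <- eig$vectors %*% diag(pmax(eig$values, 0)) %*% t(eig$vectors)
```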

SLIDE 78

CoCoLasso results

For every physiological system, the resulting optimal linear function was the mean. Since the effect seems nonlinear ⇒ a nonparametric model might capture the effects.

SLIDE 79

Deconvolution Nonparametric Regression

Deconvolution nonparametric (NP) regression: a generalization of the Nadaraya-Watson nonparametric regression. The initial model relies on the distribution of the predictor variable; the deconvolution NP regression corrects that distribution information for additive error using Deconvolution Kernel Density Estimation (DKDE) [7].

Downsides:
We did not find a multidimensional equivalent.
Intuitively, it will suffer from the curse of dimensionality.
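For intuition, here is the plain Nadaraya-Watson estimator in base R; the deconvolution version [7] keeps the same form but replaces the Gaussian kernel with a deconvoluting kernel built from the error's characteristic function (that replacement is omitted in this sketch):

```r
# Sketch: Nadaraya-Watson kernel regression at a point x0.
nw <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)    # Gaussian kernel weights around x0
  sum(w * y) / sum(w)
}

set.seed(5)
x <- runif(300, 0, 10)
y <- sin(x) + rnorm(300, sd = 0.3)
grid <- seq(0, 10, length.out = 100)
fit  <- vapply(grid, nw, numeric(1), x = x, y = y, h = 0.5)
```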

SLIDE 80

Univariate NP Regression

Figure: nonparametric regression between mean daily PROTEIN intake and the global dysregulation score (data points removed for confidentiality).

SLIDE 81

Univariate NP Regression

Figure: nonparametric regression between mean daily CARBO intake and the global dysregulation score (data points removed for confidentiality).

SLIDE 82

Univariate NP Regression

Figure: nonparametric regression between mean daily LIPID intake and the global dysregulation score (data points removed for confidentiality).

SLIDE 83

Conclusion

SLIDE 84

Summary

Our work includes:
- transforming biomarker data to approach normality,
- imputing biomarker data longitudinally,
- exploratory clustering of nutritional data,
- transformation to approach the additive measurement error model,
- exploration of a multivariate penalized linear regression model designed for additive error,
- exploration of a univariate nonparametric regression model, also designed for additive error.

SLIDE 85

Next steps...

To explore:
- more clustering! (on other data subsets),
- a discretized version of the difference in Phys. Dys. relative to time T1,
- correction of Phys. Dys. or nutritional values for confounders such as BMI or energy (kJ) intake,
- other nonparametric regression models designed for the additive error model.

SLIDE 86

Any questions?

SLIDE 87

References

[1] Box, G. E. P., and Cox, D. R. An analysis of transformations. Journal of the Royal Statistical Society, Series B (Methodological) 26, 2 (1964), 211-252.
[2] Camirand Lemyre, F. Analyse de données liées à des apports alimentaires : défis, problèmes et pistes de solutions. Presentation at the CRCHUS, September 2019.
[3] Chiu, D., and Talhouk, A. diceR: Diverse Cluster Ensemble in R, 2019. R package version 0.6.0.
[4] Cohen, A. A., Milot, E., Li, Q., Legault, V., Fried, L. P., and Ferrucci, L. Cross-population validation of statistical distance as a measure of physiological dysregulation during aging. Experimental Gerontology 57 (Sep 2014), 203-210.

SLIDE 88

[5] Cohen, A. A., Milot, E., Yong, J., Seplaki, C. L., Fülöp, T., Bandeen-Roche, K., and Fried, L. P. A novel statistical approach shows evidence for multi-system physiological dysregulation during aging. Mechanisms of Ageing and Development 134, 3-4 (Mar 2013), 110-117.
[6] Datta, A., and Zou, H. CoCoLasso for high-dimensional error-in-variables regression. The Annals of Statistics 45, 6 (Dec 2017), 2400-2426.
[7] Fan, J., and Truong, Y. K. Nonparametric regression with errors in variables. The Annals of Statistics 21, 4 (Dec 1993), 1900-1925.
[8] Manly, B. F. J. Exponential data transformations. Journal of the Royal Statistical Society, Series D (The Statistician) 25, 1 (1976), 37-42.

SLIDE 89

[9] Milot, E., Morissette-Thomas, V., Li, Q., Fried, L. P., Ferrucci, L., and Cohen, A. A. Trajectories of physiological dysregulation predicts mortality and health outcomes in a consistent manner across three populations. Mechanisms of Ageing and Development (2014), 56-63.
[10] Monti, S., Tamayo, P., Mesirov, J., and Golub, T. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 52, 1 (Jul 2003), 91-118.
[11] Shapiro, S. S., and Wilk, M. B. An analysis of variance test for normality (complete samples). Biometrika 52, 3/4 (1965), 591-611.

SLIDE 90

[12] Yeo, I.-K., and Johnson, R. A. A new family of power transformations to improve normality or symmetry. Biometrika 87, 4 (2000), 954-959.