Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for - PowerPoint PPT Presentation

Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for Feature Selection from Forward Models Martin Genzel Technische Universit¨ at Berlin Winter School on Compressed Sensing December 5, 2015

Outline Biological Background 1 Sparse Proteomics Analysis (SPA) 2 Theoretical Foundation by High-dimensional Estimation Theory 3 Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 2 / 19

Biological Background 1 Sparse Proteomics Analysis (SPA) 2 Theoretical Foundation by High-dimensional Estimation Theory 3

What is Proteomics? The pathological mechanisms of many diseases, such as cancer, are manifested on the level of protein activities. Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 3 / 19

What is Proteomics? The pathological mechanisms of many diseases, such as cancer, are manifested on the level of protein activities. To improve clinical treatment options and early diagnostics, we need to understand protein structures and their interactions! Proteins are long chains of amino acids, controlling many biological and chemical processes in the human body. http://www.topsan.org/Proteins/JCSG/3qxb Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 3 / 19

What is Proteomics? The pathological mechanisms of many diseases, such as cancer, are manifested on the level of protein activities. To improve clinical treatment options and early diagnostics, we need to understand protein structures and their interactions! Proteins are long chains of amino acids, controlling many biological and chemical processes in the human body. The entire set of proteins at a certain point of time is called a proteome. Proteomics is the large-scale study of the human proteome. http://www.topsan.org/Proteins/JCSG/3qxb Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 3 / 19

What is Mass Spectrometry? How to “capture” a proteome? Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 4 / 19

What is Mass Spectrometry? How to “capture” a proteome? Mass spectrometry (MS) is a popular technique to detect the abundance of proteins in samples (blood, urine, etc.). Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 4 / 19

What is Mass Spectrometry? How to “capture” a proteome? Mass spectrometry (MS) is a popular technique to detect the abundance of proteins in samples (blood, urine, etc.). Schematic Work-Flow Mass spectrum Laser Intensity (cts) Detector + + + - - + + Sample Mass (m/z) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 4 / 19

Real-World MS-Data Intensity (cts) Mass (m/z) d ≈ 10 4 . . . 10 6 MS-vector: x = ( x 1 , . . . , x d ) ∈ R d , Index ˆ = Mass/Feature, Entry ˆ = Intensity/Amplitude Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 5 / 19

Feature Selection from MS-Data Goal: Detect a small set of features (disease fingerprint) that allows for an appropriate distinction between the diseased and healthy group. Schematic Work-Flow Samples Blood from healthy individual Blood from diseased individual Mass (m/z) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 6 / 19

Feature Selection from MS-Data Goal: Detect a small set of features (disease fingerprint) that allows for an appropriate distinction between the diseased and healthy group. Schematic Work-Flow Samples Mass Spectra Blood from healthy individual Intensity (cts) MS ¡ Mass (m/z) Blood from diseased individual Intensity (cts) MS ¡ Mass (m/z) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 6 / 19

Feature Selection from MS-Data Goal: Detect a small set of features (disease fingerprint) that allows for an appropriate distinction between the diseased and healthy group. Schematic Work-Flow Samples Mass Spectra Feature Selection Blood from healthy individual Intensity (cts) MS ¡ Disease Fingerprint Mass (m/z) Blood from diseased individual Comparing ¡ Intensity (cts) MS ¡ Mass (m/z) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 6 / 19

Mathematical Problem Formulation Supervised Learning: We are given n samples ( x 1 , y 1 ) , . . . , ( x n , y n ). x k ∈ R d : Mass spectrum of the k -th patient y k ∈ {− 1 , +1 } : Health status of the k -th patient (healthy = +1, diseased = − 1) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 7 / 19

Mathematical Problem Formulation Supervised Learning: We are given n samples ( x 1 , y 1 ) , . . . , ( x n , y n ). x k ∈ R d : Mass spectrum of the k -th patient y k ∈ {− 1 , +1 } : Health status of the k -th patient (healthy = +1, diseased = − 1) Goal: Learn a feature vector ω ∈ R d which is sparse, i.e., few non-zero entries, ( ⇒ stability, avoid overfitting) and its entries correspond to peaks that are highly correlated with the disease. ( ⇒ interpretability, biological relevance) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 7 / 19

How to learn a fingerprint ω ?

Biological Background 1 Sparse Proteomics Analysis (SPA) 2 Theoretical Foundation by High-dimensional Estimation Theory 3

Sparse Proteomics Analysis (SPA) Sparse Proteomics Analysis is a generic framework to meet this challenge. Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 8 / 19

Sparse Proteomics Analysis (SPA) Sparse Proteomics Analysis is a generic framework to meet this challenge. Input: Sample pairs ( x 1 , y 1 ) , . . . , ( x n , y n ) ∈ R d × {− 1 , +1 } Compute: 1 Preprocessing (Smoothing, Standardization) 2 Feature Selection (LASSO, ℓ 1 -SVM, Robust 1-Bit CS) 3 Postprocessing (Sparsification) Output: Sparse feature vector ω ∈ R d ⇒ Biomarker extraction, dimension reduction Blood ¡Sample Intensity (cts) Biomarker Identification Mass (m/z) Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 8 / 19

Sparse Proteomics Analysis (SPA) Sparse Proteomics Analysis is a generic framework to meet this challenge. Input: Sample pairs ( x 1 , y 1 ) , . . . , ( x n , y n ) ∈ R d × {− 1 , +1 } Compute: 1 Preprocessing (Smoothing, Standardization) 2 Feature Selection (LASSO, ℓ 1 -SVM, Robust 1-Bit CS) 3 Postprocessing (Sparsification) Output: Sparse feature vector ω ∈ R d Rest of this talk Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 8 / 19

Feature Selection (Geometric Intuition) Linear Separation Model: Find a feature vector ω ∈ R d such that y k = sign( � x k , ω � ) for “many” k ∈ { 1 , . . . , n } . Moreover, ω should be sparse and interpretable. Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 9 / 19

Feature Selection via the LASSO The LASSO (Tibshirani ’96) n � ( y k − � x k , ω � ) 2 min subject to � ω � 1 ≤ R ω ∈ R d k =1 Multivariate approach, originally designed for linear regression models: y k ≈ � x k , ω � , k = 1 , . . . , n . But also applicable to non-linear models → Next part Later: R ≈ √ s to allow for s -sparse solutions (with unit norm). Martin Genzel Sparse Proteomics Analysis (SPA) WiCoS 2015 10 / 19

Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for - PowerPoint PPT Presentation

Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for Feature Selection from Forward Models Martin Genzel Technische Universit at Berlin Winter School on Compressed Sensing December 5, 2015 Outline Biological Background 1

5 Star Spa hotel Strimon spa club PROPERTY DESCRIPTION 5 Star Spa hotel Strimon spa club

Luxury Spa Management Company www.softouchspa.com About Softouch Spa Softouch Spa is a Luxury

1 Genome Transcriptome Proteome Metabolome Genome: the complete set of hereditary material

A luxurious bush spa escape for some me time Sediko Bush Spa Welcome to Sediko Bush Spa Hidden

Proteomics databases and protein characterization tools Marie-Claude.Blatter@ISB-SIB.ch EMBnet

Quality control of proteomics data IBIP19: Integrative Biological Interpretation using Proteomics

What is proteomics good for? IBIP19: Integrative Biological Interpretation using Proteomics with

Pathways analysis in proteomics Angela Bachi Dibit-San Raffaele Scientific Institute, Milano

Proteomics pathway Proteomics pathway Sample Data Analysis Separation Selection of spot(s) G

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Principles and Applications of Proteomics Overview Why Proteomics? 2-DE Sample

Proteomics and Protein Mass Proteomics and Protein Mass Spectrometry 2004 Spectrometry 2004

Proteomics Informatics (BMSC-GA 4437) Instructor David Feny Contact information

Universals and Varia-on in Spa-al Referencing: Are Spa-al Mental Models Universal? Marco Ragni

PRADA spa PRESS RELEASE PRADA SPA APPROVES GROUP RESULTS FOR QUARTER ENDED APRIL 30, 2014 A

PRADA spa PRESS RELEASE PRADA SPA APPROVES GROUP RESULTS FOR THE NINE MONTHS ENDED OCTOBER 31,

Joins, and more plotting Joins, and more plotting Abhijit Dasgupta Abhijit Dasgupta Fall, 2019

Reminders Please rename yourself with your name and practice location in the Manage

Low protein intake, muscle strength and physical performance in the very old September 21 st ,

COMPUTATIONAL PROTEOMICS AND METABOLOMICS Oliver Kohlbacher, Sven

Building and Documenting Bioinformatics Workflows with Python-based Snakemake Johannes K

Provenance, End-User Trust and Reuse: An Empirical Investigation Devan Ray Donaldson and Kathleen

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture20: Multiple phenotypes

Little bioinformaticians pragmatic guide to internships in Debian By Tatiana Malygina &

Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for - PowerPoint PPT Presentation

Sparse Proteomics Analysis (SPA) Toward a Mathematical Theory for Feature Selection from Forward Models Martin Genzel Technische Universit at Berlin Winter School on Compressed Sensing December 5, 2015 Outline Biological Background 1

5 Star Spa hotel Strimon spa club PROPERTY DESCRIPTION 5 Star Spa hotel Strimon spa club

Luxury Spa Management Company www.softouchspa.com About Softouch Spa Softouch Spa is a Luxury

1 Genome Transcriptome Proteome Metabolome Genome: the complete set of hereditary material

A luxurious bush spa escape for some me time Sediko Bush Spa Welcome to Sediko Bush Spa Hidden

Proteomics databases and protein characterization tools Marie-Claude.Blatter@ISB-SIB.ch EMBnet

Quality control of proteomics data IBIP19: Integrative Biological Interpretation using Proteomics

What is proteomics good for? IBIP19: Integrative Biological Interpretation using Proteomics with

Pathways analysis in proteomics Angela Bachi Dibit-San Raffaele Scientific Institute, Milano

Proteomics pathway Proteomics pathway Sample Data Analysis Separation Selection of spot(s) G

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Principles and Applications of Proteomics Overview Why Proteomics? 2-DE Sample

Proteomics and Protein Mass Proteomics and Protein Mass Spectrometry 2004 Spectrometry 2004

Proteomics Informatics (BMSC-GA 4437) Instructor David Feny Contact information

Universals and Varia-on in Spa-al Referencing: Are Spa-al Mental Models Universal? Marco Ragni

PRADA spa PRESS RELEASE PRADA SPA APPROVES GROUP RESULTS FOR QUARTER ENDED APRIL 30, 2014 A

PRADA spa PRESS RELEASE PRADA SPA APPROVES GROUP RESULTS FOR THE NINE MONTHS ENDED OCTOBER 31,

Joins, and more plotting Joins, and more plotting Abhijit Dasgupta Abhijit Dasgupta Fall, 2019

Reminders Please rename yourself with your name and practice location in the Manage

Low protein intake, muscle strength and physical performance in the very old September 21 st ,

COMPUTATIONAL PROTEOMICS AND METABOLOMICS Oliver Kohlbacher, Sven

Building and Documenting Bioinformatics Workflows with Python-based Snakemake Johannes K

Provenance, End-User Trust and Reuse: An Empirical Investigation Devan Ray Donaldson and Kathleen

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture20: Multiple phenotypes

Little bioinformaticians pragmatic guide to internships in Debian By Tatiana Malygina &amp;

Little bioinformaticians pragmatic guide to internships in Debian By Tatiana Malygina &