Advancing clinical proteomics via analysis based on biological - PowerPoint PPT Presentation

Advancing clinical proteomics via analysis based on biological complexes: A tale of five paradigms
Wilson Wen Bin Goh
Joint work with Limsoon Wong
GIW2016

Some background
[Figure: (A) the traditional network utilisations and (B) the new network utilisations, built on the DNA → RNA → protein flow. Labelled uses: correlating phenotype to network (static projection), describing network rewiring, network building, feature selection, class prediction, coverage expansion.]
Complexes work much better than predicted clusters from reference networks.
Goh & Wong. Integrating networks and proteomics: Moving forward. Trends in Biotechnology, 2016

The problem
• No formal classification of the methods for complex-based analysis
• No comprehensive means of evaluation/benchmarking is available

Network-Paired approach: ESSNet
• Newest addition to complex-based methods
• Let $g_i$ be a protein in a given protein complex C
• Let $p_j$ be a patient
• Let $q_k$ be a normal
• Let $\Delta_{i,j,k} = \mathrm{Expr}(g_i, p_j) - \mathrm{Expr}(g_i, q_k)$
• Null hypothesis: "Complex C is irrelevant to the difference between patients and normals, and the proteins in C behave similarly in patients and normals"
• Test whether the $\Delta_{i,j,k}$ values are distributed with mean 0
• No need to restrict to the most abundant proteins ⇒ potential to reliably detect low-abundance but differential proteins
Lim et al. A quantum leap in the reproducibility, precision, and sensitivity of gene expression profile analysis even when sample size is extremely small. JBCB, 13(4):1550018, 2015
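A minimal Python sketch of the paired-difference test above. Assumptions, not from the slides: `expr` is a hypothetical dict mapping protein to sample to abundance; the statistic is a plain one-sample t statistic on all Δ values; and because those Δ values share proteins and samples (so they are not independent), significance is assessed by permuting class labels rather than by reading off the t distribution. The published ESSNet may compute its null differently.

```python
import math
import random
import statistics
from itertools import product


def deltas(expr, complex_proteins, patients, normals):
    """All differences Delta[i,j,k] = Expr(g_i, p_j) - Expr(g_i, q_k)."""
    return [expr[g][p] - expr[g][q]
            for g, p, q in product(complex_proteins, patients, normals)]


def t_statistic(values):
    """One-sample t statistic for H0: mean(values) == 0."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # Identical deltas: no evidence if they are all 0, otherwise unbounded.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(values)))


def essnet_like_test(expr, complex_proteins, patients, normals,
                     n_perm=1000, seed=0):
    """Observed t and a class-label permutation p-value (an assumption of this sketch)."""
    observed = t_statistic(deltas(expr, complex_proteins, patients, normals))
    pooled = list(patients) + list(normals)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_p, perm_q = pooled[:len(patients)], pooled[len(patients):]
        t = t_statistic(deltas(expr, complex_proteins, perm_p, perm_q))
        if abs(t) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

For example, `essnet_like_test(expr, ["g1", "g2", "g3"], ["P1", "P2", "P3", "P4"], ["N1", "N2", "N3", "N4"])` returns the observed statistic and a permutation p-value. With few samples per class, the number of distinct label splits is small (70 for 4 vs 4), so the smallest attainable p-value is correspondingly coarse.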

Five methods to compare with
• Network-based methods
  – Over-Representation Analysis (hypergeometric enrichment, HE)
  – Direct group (GSEA)
  – Hit-Rate (qPSP), Goh et al., Biology Direct, 10:71, 2015
  – Rank-Based Network Analysis (PFSNET), Goh & Wong, JBCB, 14(5):16500293, 2016
• Standard t-test on individual proteins (SP)
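Two of the network-based scores listed above can be sketched in a few lines of Python. Over-representation analysis (HE) asks whether a complex contains more individually significant proteins than chance would predict, using the hypergeometric upper tail. For qPSP's hit rate, this sketch assumes that a sample's "hits" are its top-`alpha` fraction of proteins by abundance, and that a complex's hit rate is the share of its members among those hits. `alpha=0.1` is an illustrative default, not a published setting.

```python
from math import comb


def hypergeometric_enrichment_p(total, complex_size, n_selected, overlap):
    """HE: P(X >= overlap) for X ~ Hypergeometric(total, complex_size, n_selected).

    total        -- number of proteins measured
    complex_size -- members of the complex among them
    n_selected   -- proteins called significant (e.g. by SP)
    overlap      -- significant proteins that fall in the complex
    """
    upper = min(complex_size, n_selected)
    tail = sum(comb(complex_size, x) * comb(total - complex_size, n_selected - x)
               for x in range(overlap, upper + 1))
    return tail / comb(total, n_selected)


def hit_rate(sample_abundance, complex_proteins, alpha=0.1):
    """Assumed qPSP-style hit rate: share of complex members in the sample's top-alpha proteins."""
    ranked = sorted(sample_abundance, key=sample_abundance.get, reverse=True)
    top = set(ranked[:max(1, int(len(ranked) * alpha))])
    return sum(1 for g in complex_proteins if g in top) / len(complex_proteins)
```

In a qPSP-style analysis, the per-sample hit rates of a complex would then be compared between the two classes, for instance with a t-test. That comparison step is omitted here.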

Simulated data
• Simulated datasets from Langley and Mayr
  – D1.2 is from a study of proteomic changes resulting from the addition of exogenous matrix metallopeptidase (3 control, 3 test)
  – D2.2 is from a study of hibernating arctic squirrels (4 control, 4 test)
• D1.2 and D2.2 each have 100 simulated datasets, each with 20% significant features
  – Effect sizes of these differential features are sampled from one of five possibilities (20%, 50%, 80%, 100% and 200%), increased in one class and not in the other
• Significant artificial complexes are constructed with various levels of purity (i.e. the proportion of significant proteins in the complex)
  – An equal number of non-significant complexes is constructed as well
Langley & Mayr, J. Proteomics, 129:83-92, 2015
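The simulation design above can be sketched as follows. This is a simplified Python illustration, not Langley and Mayr's generator. It assumes the test class starts as a copy of real control data (`test_expr`, a hypothetical dict of protein to list of abundances), marks 20% of proteins as significant, raises each by one effect size drawn from the five listed, and builds an artificial complex with a chosen purity. The helper names `spike_effects` and `make_complex` are hypothetical.

```python
import random

# The five effect sizes on the slide: 20%, 50%, 80%, 100% and 200% increases.
EFFECT_SIZES = (0.2, 0.5, 0.8, 1.0, 2.0)


def spike_effects(test_expr, proteins, frac_significant=0.2, rng=None):
    """Raise a random fraction of proteins in the test class by one sampled effect size.

    test_expr maps protein -> list of abundances (one per test sample).
    Returns the modified copy and the set of truly significant proteins.
    """
    rng = rng or random.Random(0)
    n_sig = int(len(proteins) * frac_significant)
    significant = set(rng.sample(proteins, n_sig))
    # One effect size per protein, applied to every test sample.
    effects = {g: rng.choice(EFFECT_SIZES) for g in sorted(significant)}
    spiked = {g: [v * (1 + effects.get(g, 0.0)) for v in values]
              for g, values in test_expr.items()}
    return spiked, significant


def make_complex(significant, non_significant, size, purity, rng):
    """Artificial complex in which round(size * purity) members are significant."""
    n_sig = round(size * purity)
    members = rng.sample(sorted(significant), n_sig)
    members += rng.sample(sorted(non_significant), size - n_sig)
    return members
```

Non-significant complexes for the comparison set would be built the same way with `purity=0`.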

Precision, Recall and the F-score
Elements = features
Precision: of the selected features, how many are correct?
Recall: of all the correct features, what proportion did we select?
Precision and recall can be combined into the F-score, their harmonic mean:
$F = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
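Given the set of features a method reports and the set of truly significant features (known by construction in simulated data), the three quantities can be computed as below. This sketch assumes the F-score is the balanced F1, that is, the harmonic mean of precision and recall. The function name is hypothetical.

```python
def precision_recall_f(selected, truth):
    """Precision, recall and F-score (harmonic mean) for sets of features."""
    selected, truth = set(selected), set(truth)
    true_pos = len(selected & truth)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, reporting {a, b, c, d} when the truth is {a, b, e} gives precision 1/2, recall 2/3 and F = 4/7.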

SP shows poor performance on simulated data. Can network-based methods do better?

ESSNET shows excellent recall/precision on simulated data

Renal cancer control data (RCC)
• 12 runs originating from a human kidney tissue, digested in quadruplicate and analyzed in triplicate
• Excellent for evaluating the false-positive rates of feature-selection methods
  – Randomly split the 12 runs into two groups; any significant features reported between the groups must be false positives
Guo et al. Nature Medicine, 21(4):407-413, 2015
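The random-split check described above can be sketched in Python as follows. `select_features` is a hypothetical callback standing in for any of the compared methods: it takes two groups of runs and returns the set of features it calls significant. Using 100 splits is an illustrative choice, not the number used in the study.

```python
import random


def random_split(runs, rng):
    """Shuffle the replicate runs and cut them into two equal halves."""
    shuffled = list(runs)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def false_positive_counts(runs, select_features, n_splits=100, seed=0):
    """Number of features called significant between two halves of identical replicates.

    Every reported feature is a false positive, because both groups come from
    the same tissue. select_features(group_a, group_b) is a hypothetical hook
    for any of the compared methods and must return a set of features.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_splits):
        group_a, group_b = random_split(runs, rng)
        counts.append(len(select_features(group_a, group_b)))
    return counts
```

The mean of the returned counts can then be compared with the number of false positives expected at the chosen significance level.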

All methods control false positives well
The dashed line corresponds to the expected number of false positives at α = 0.05 (~30 complexes).

Renal cancer data (RC)
• 12 samples (6 normal and 6 cancer tissues) are each run twice, giving technical replicates
• Excellent opportunity for testing the reproducibility of feature-selection methods
  – A good method should report similar feature sets between replicates
• Can also test feature-selection stability
  – Apply the feature-selection method to subsamples and see whether the same features get selected
Guo et al. Nature Medicine, 21(4):407-413, 2015
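One simple way to score how similar the feature sets reported on two technical replicates are is the Jaccard index: the size of the intersection divided by the size of the union. This measure is an assumption chosen for illustration. The slides do not state which similarity measure underlies the reproducibility comparison, and the complex names below are hypothetical.

```python
def jaccard(features_a, features_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two reported feature sets.

    Returns 1.0 when both sets are empty (nothing reported, perfect agreement).
    """
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical complexes reported on replicate 1 and replicate 2 of the same samples.
replicate_1 = {"complex_1", "complex_2", "complex_3"}
replicate_2 = {"complex_2", "complex_3", "complex_4"}
print(jaccard(replicate_1, replicate_2))  # 2 shared / 4 total = 0.5
```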

ESSNET & PFSNET show excellent cross-replicate reproducibility
This table is computed by applying the methods to the full RC dataset.

Feature-selection stability
[Figure: example binary matrix with rows Sampling 1 to 6 and columns forming the complex vector; row sums 3, 2, 3, 3, 2, 3; column sums 1, 1, 3, 6, 2, 0, 1, 1, 1. Legend: significant, non-significant.]
The binary matrix is useful for comparing the stability and consistency of significant features produced by a feature-selection method. The rows represent each simulation; the columns are a nominal feature vector. Red represents features reported as significant, while pink represents non-significant ones. The row sums give the number of significant features, while the column sums give the relative stability of each feature (i.e., out of N simulations, how many times the feature is reported as significant).
Goh and Wong, Design principles for clinical network-based proteomics. Drug Discovery Today, 2016
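The binary matrix described above can be built by repeatedly subsampling the data, running a feature-selection method on each subsample, and recording which features it reports. A minimal Python sketch: `select_features` is a hypothetical callback, and sampling half of a flat sample list is an illustrative assumption (a real analysis might subsample within each class).

```python
import random


def stability_matrix(samples, features, select_features,
                     n_samplings=6, frac=0.5, seed=0):
    """Binary matrix: entry [r][c] is 1 if feature c is significant in sampling r.

    select_features is a hypothetical callback that runs a feature-selection
    method on a subsample and returns the set of features it calls significant.
    """
    rng = random.Random(seed)
    k = max(1, int(len(samples) * frac))
    matrix = []
    for _ in range(n_samplings):
        subsample = rng.sample(samples, k)
        selected = select_features(subsample)
        matrix.append([1 if f in selected else 0 for f in features])
    return matrix


def row_sums(matrix):
    """Number of significant features reported in each sampling."""
    return [sum(row) for row in matrix]


def col_sums(matrix):
    """How many of the N samplings report each feature as significant."""
    return [sum(col) for col in zip(*matrix)]
```

A column sum equal to the number of samplings marks a feature that is selected every time, the most stable case.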

ESSNET & PFSNET show excellent feature-selection stability

ESSNET & PFSNET show excellent stability

ESSNET can assay low-abundance complexes that qPSP cannot
A: Overlap between qPSP and ESSNET significant complexes.
B: P-value distribution for overlapping and non-overlapping qPSP complexes.
C: Sampling abundance distribution; the left panel is a zoom-in of the right. The y-axis is protein abundance, and the four categories are the abundance distributions of complexes found by qPSP, complexes found by ESSNET, ESSNET-unique complexes (complement), and all proteins in RC.

ESSNET can assay low-abundance complexes that PFSNET cannot
Of the 5 ESSNET-unique complexes, PFSNET can detect 4; the missed complex consists entirely of low-abundance proteins. If the p-value threshold is adjusted by Benjamini-Hochberg at 5% FDR, PFSNET detects only 3 of the 5 ESSNET-unique complexes, while ESSNET continues to detect them all.
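The Benjamini-Hochberg adjustment mentioned above is the standard step-up procedure. Sort the m p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and reject the k hypotheses with the smallest p-values. A minimal Python sketch, with q = 0.05 matching the 5% FDR used on this slide:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank whose sorted p-value lies under the BH line rank * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

Because the procedure is step-up, a p-value can be rejected even if it lies above its own threshold, as long as a larger-ranked p-value falls under the line.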

What have we learnt?
• We've seen how five statistical methods can be used in conjunction with complex-based analysis
• ESSNET, adapted for proteomics, is a powerful approach that can sensitively detect low-abundance complexes

References
• Goh & Wong. Design principles for clinical network-based proteomics. Drug Discovery Today, 21(7), 2016
• Goh & Wong. Integrating networks and proteomics: Moving forward. Trends in Biotechnology, in press
• [qPSP/HE] Goh et al. Quantitative proteomics signature profiling based on network contextualization. Biology Direct, 10:71, 2015
• [SNET/FSNET/PFSNET] Goh & Wong. Evaluating feature-selection stability in next-generation proteomics. Journal of Bioinformatics and Computational Biology, 14(5):16500293, 2016
• [ESSNET/GSEA] Goh & Wong. Advancing clinical proteomics via analysis based on biological complexes: A tale of five paradigms. Journal of Proteome Research, in press

Acknowledgements
Professor Limsoon Wong, National University of Singapore
