Microarray Data Analysis of Adenocarcinoma Patients’ Survival Using ADC and K-Medians Clustering
Wenting Zhou, Weichen Wu, Nathan Palmer, Emily Mower, Noah Daniels, Lenore Cowen, Anselm Blumer
Tufts University
http://camda.cs.tufts.edu

Overview
• Goals
• Introduction
• Explanation of ADC and NSM
• Explanation of MVR, K-Medians, and Hierarchical Clustering
• Results
• Conclusions

Goals
• Start with a classification of patients into high-risk and low-risk clusters
• Obtain a small subset of genes that still leads to good clusters
• These genes may be biologically significant
• One can use statistical or machine learning techniques on the reduced set that would have led to overfitting on the original set

Introduction
• We applied clustering and dimension-reduction techniques to gene expression values and survival times of patients with lung adenocarcinomas
• Harvard Data (n = 84): 12,600 genes
• Michigan Data (n = 86): 7129 genes
[Slide shows excerpts of the two expression tables: one row per gene or probe (for example interleukin 2, GABRA3, and control probes such as J04423 E coli bioB), one column per patient sample (for example AD-043T2-A7-1, AD10, L01).]

ADC and NSM Overview
• We use Approximate Distance Clustering maps (Cowen, 1997) to project the data into one or two dimensions so we can use very simple clustering techniques.
• Then we use Nearest Shrunken Mean (Tibshirani, 1999) to reduce the number of genes used to predict the clusters.
• We evaluate using leave-one-out cross-validation and log-rank tests.
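
The slides name the gene-reduction step but do not spell it out here. The sketch below illustrates the general shrinkage idea, assuming NSM behaves like a nearest-shrunken-centroid procedure: each class mean is pulled toward the overall mean by a threshold, and genes whose class means all collapse onto the overall mean no longer help separate the clusters. It is a deliberate simplification (raw differences are thresholded, with no standardization by within-class variability), and the names `soft_threshold`, `shrunken_centroids`, `surviving_genes`, and `delta` are hypothetical.

```python
def soft_threshold(value, delta):
    """Move value toward zero by delta, stopping at zero."""
    magnitude = abs(value) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if value > 0 else -magnitude


def shrunken_centroids(samples, labels, delta):
    """Return the overall per-gene mean and one shrunken centroid per class.

    Simplification (assumption): raw differences between class mean and
    overall mean are thresholded, without any per-gene standardization.
    """
    n_genes = len(samples[0])
    overall = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    centroids = {}
    for label in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == label]
        centroid = []
        for g in range(n_genes):
            class_mean = sum(m[g] for m in members) / len(members)
            centroid.append(overall[g] + soft_threshold(class_mean - overall[g], delta))
        centroids[label] = centroid
    return overall, centroids


def surviving_genes(overall, centroids):
    """Indices of genes whose centroid still differs from the overall mean in some class."""
    return [
        g for g in range(len(overall))
        if any(c[g] != overall[g] for c in centroids.values())
    ]


# Toy data (made up): 4 patients x 3 genes, two clusters labelled 0 and 1.
samples = [
    [1.0, 5.0, 2.0],
    [1.0, 5.2, 4.0],
    [9.0, 5.1, 3.0],
    [9.0, 4.9, 3.0],
]
labels = [0, 0, 1, 1]
overall, centroids = shrunken_centroids(samples, labels, delta=1.0)
kept = surviving_genes(overall, centroids)
print(kept)
```

With this toy input only the first gene differs enough between the two clusters to survive the threshold; raising `delta` removes more genes.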

Approximate Distance Clustering (ADC, Cowen 1997)
• Approximate Distance Clustering is a method that reduces the dimensionality of the data.
• This is done by calculating the distance from each datapoint to a subset of the data, which is called a witness set.
• A different witness set is used for each desired dimension.
• A simple clustering technique is then applied to the projected data.

ADC map in one dimension

1-d ADC map with cutoff
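
A one-dimensional ADC map followed by a cutoff can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a hand-picked cutoff; the slides do not say how the witness set or the cutoff is chosen, so `witness_idx`, `cutoff`, and the toy points are hypothetical.

```python
import math


def adc_1d(data, witness_idx):
    """Project each point to its distance from the nearest witness point."""
    witnesses = [data[i] for i in witness_idx]
    return [min(math.dist(x, w) for w in witnesses) for x in data]


def split_by_cutoff(values, cutoff):
    """Label a projected value 0 if it is at or below the cutoff, else 1."""
    return [0 if v <= cutoff else 1 for v in values]


# Toy data (made up): two well-separated groups in two dimensions.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
witness_idx = [0]   # witness set containing only the first point
cutoff = 5.0        # hand-picked threshold on the projected values

projected = adc_1d(data, witness_idx)
clusters = split_by_cutoff(projected, cutoff)
print(clusters)
```

A witness point always projects to 0, since its nearest witness is itself.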

General ADC Definition
• Choose witness sets $D_1, D_2, \ldots, D_q$ to be subsets of the data of sizes $k_1, k_2, \ldots, k_q$
• The associated ADC map $f_{(D_1, D_2, \ldots, D_q)} : \mathbb{R}^p \to \mathbb{R}^q$ maps a datapoint $x$ to $(y_1, y_2, \ldots, y_q)$
• where $y_i = \min\{ \lVert x_j - x \rVert : x_j \in D_i \}$ is the distance from $x$ to the closest point in $D_i$
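
The general definition translates almost directly into code. The sketch below assumes the Euclidean norm and represents each witness set as a list of row indices into the data; drawing witness sets uniformly at random is one possible choice made here for illustration, and `adc_map` and `random_witness_sets` are hypothetical names.

```python
import math
import random


def adc_map(data, witness_sets):
    """Map each point x to (y_1, ..., y_q), where y_i is the distance
    from x to the closest point of witness set D_i."""
    return [
        tuple(min(math.dist(x, data[j]) for j in D) for D in witness_sets)
        for x in data
    ]


def random_witness_sets(n_points, sizes, seed=0):
    """Draw one witness set (a list of row indices) per requested size k_i."""
    rng = random.Random(seed)
    return [rng.sample(range(n_points), k) for k in sizes]


# Toy data (made up): four points in R^3, projected to R^2 (p = 3, q = 2).
data = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (6.0, 8.0, 0.0)]
witness_sets = [[0], [1, 2]]   # D_1 has size 1, D_2 has size 2
projected = adc_map(data, witness_sets)
for point, image in zip(data, projected):
    print(point, "->", image)

# Witness sets can also be drawn at random for given sizes k_1, ..., k_q.
drawn = random_witness_sets(len(data), sizes=[1, 2])
```

Each output coordinate depends on only one witness set, so adding a dimension means adding one more witness set.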
