
Microarray Data Analysis of Adenocarcinoma Patients’ Survival Using ADC and K-Medians Clustering

Microarray Data Analysis of Adenocarcinoma Patients’ Survival Using ADC and K-Medians Clustering
Wenting Zhou, Weichen Wu, Nathan Palmer, Emily Mower, Noah Daniels, Lenore Cowen, Anselm Blumer
Tufts University
http://camda.cs.tufts.edu


  1. Microarray Data Analysis of Adenocarcinoma Patients’ Survival Using ADC and K-Medians Clustering
     Wenting Zhou, Weichen Wu, Nathan Palmer, Emily Mower, Noah Daniels, Lenore Cowen, Anselm Blumer
     Tufts University
     http://camda.cs.tufts.edu

  2. Overview
     • Goals
     • Introduction
     • Explanation of ADC and NSM
     • Explanation of MVR, K-Medians, and Hierarchical Clustering
     • Results
     • Conclusions

  3. Goals
     • Start with a classification of patients into high-risk and low-risk clusters
     • Obtain a small subset of genes that still leads to good clusters
     • These genes may be biologically significant
     • One can use statistical or machine learning techniques on the reduced set that would have led to overfitting on the original set

  4. Introduction
     • We applied clustering and dimension-reduction techniques to gene expression values and survival times of patients with lung adenocarcinomas
     • Harvard Data (n = 84): 12,600 genes
     • Michigan Data (n = 86): 7129 genes
     [Slide shows excerpts of the two expression tables. Harvard: rows labeled by gene description (for example interleukin 2, interleukin 10, interleukin 4) with one column per sample (AD-043T2-A7-1, AD-111T2-A8-1, AD-114T1-A9-1, and others). Michigan: rows labeled by gene symbol (for example GABRA3, OMD, SEMA3C) with one column per sample (AD10, AD2, AD3, and others).]

  5. Overview
     • Goals
     • Introduction
     • Explanation of ADC and NSM
     • Explanation of MVR, K-Medians, and Hierarchical Clustering
     • Results
     • Conclusions

  6. ADC and NSM Overview
     • We use Approximate Distance Clustering maps (Cowen, 1997) to project the data into one or two dimensions so we can use very simple clustering techniques.
     • Then we use Nearest Shrunken Mean (Tibshirani, 1999) to reduce the number of genes used to predict the clusters.
     • We evaluate using leave-one-out cross-validation and log-rank tests.

  7. Approximate Distance Clustering (ADC, Cowen 1997)
     • Approximate Distance Clustering is a method that reduces the dimensionality of the data.
     • This is done by calculating the distance from each datapoint to a subset of the data, which is called a witness set.
     • A different witness set is used for each desired dimension.
     • A simple clustering technique is used on the projected data.

  8. ADC map in one dimension

  9. 1-d ADC map with cutoff
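
A 1-d ADC map with a cutoff could be sketched roughly as follows. This is only an illustration, not the authors' code: it assumes Euclidean distance, and the toy points, the single-point witness set, and the cutoff value of 5.0 are all made up here, since the slides do not say how the witness set or the cutoff were chosen.

```python
import math

def adc_1d(points, witness_set):
    """Project each point to its distance from the nearest witness point."""
    return [min(math.dist(p, w) for w in witness_set) for p in points]

def cluster_by_cutoff(projected, cutoff):
    """Simple 1-d clustering: label 0 at or below the cutoff, 1 above it."""
    return [0 if y <= cutoff else 1 for y in projected]

# Toy data (hypothetical): two loose groups of points in 3 dimensions.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (10.0, 10.0, 10.0), (11.0, 10.0, 10.0)]
witness_set = [points[0]]  # assumption: a witness set with one data point

projected = adc_1d(points, witness_set)
labels = cluster_by_cutoff(projected, cutoff=5.0)  # assumption: hand-picked cutoff
print(projected)
print(labels)
```

Points close to the witness set land near zero on the projected line and points far from it land further out, so a single threshold is enough to separate the two toy groups.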

  10. General ADC Definition
      • Choose witness sets $D_1, D_2, \ldots, D_q$ to be subsets of the data of sizes $k_1, k_2, \ldots, k_q$
      • The associated ADC map $f_{(D_1, D_2, \ldots, D_q)} : \mathbb{R}^p \to \mathbb{R}^q$ maps a datapoint $x$ to $(y_1, y_2, \ldots, y_q)$
      • where $y_i = \min\{\, \lVert x_j - x \rVert : x_j \in D_i \,\}$ is the distance to the closest point in $D_i$
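
The general definition translates almost directly into code. The sketch below is a minimal illustration that assumes the norm is Euclidean; the function name `adc_map` and the toy witness sets are hypothetical and do not come from the presentation.

```python
import math

def adc_map(x, witness_sets):
    """Map x in R^p to (y_1, ..., y_q), where y_i is the distance
    from x to the closest point of the i-th witness set D_i."""
    return tuple(min(math.dist(x, xj) for xj in D) for D in witness_sets)

# Toy witness sets (hypothetical): q = 2, with sizes k_1 = 2 and k_2 = 1, in R^2.
D1 = [(0.0, 0.0), (4.0, 0.0)]
D2 = [(0.0, 3.0)]

y = adc_map((3.0, 0.0), [D1, D2])
print(y)
```

Here the first coordinate is the distance to (4, 0), the nearer of the two points in the first witness set, and the second coordinate is the distance to the only point in the second witness set.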
