A Combinatorial Approach to the Analysis of Differential Gene Expression Data


  1. A Combinatorial Approach to the Analysis of Differential Gene Expression Data: The Use of Graph Algorithms for Disease Prediction and Screening

  2. The Goal • To classify patients based on expression profiles – Presence of cancer – Type of cancer – Response to treatment • To identify the genes required for accurate classification – Too many = unnecessary noise – Too few = insufficient information

  3. Classic Clustering Problem • Current techniques: – Hierarchical Clustering – K-Means Clustering – Self-Organizing Maps – Others • Drawbacks: – Determining cluster boundaries difficult with diffuse data – Objects can only belong to one group

  4. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes (Gene Scoring) → Eliminate Poorly Covering Genes (Dominating Set) → Calculate Sample Similarities → Apply Threshold → Verify by Classification (Maximal Cliques) → Set of Discriminatory Genes, Scores

  5. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes

  6. The Gene Scoring Function: Identifying Discriminators • $\mathrm{score}(\mathrm{gene}_i) = |\mu_{\mathrm{classA}} - \mu_{\mathrm{classB}}| - (\sigma_{\mathrm{classA}} + \sigma_{\mathrm{classB}})$
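A minimal Python sketch of a score of this form, reading the slide's formula as the absolute difference of the class means minus the sum of the class standard deviations. The function name `gene_score`, the use of the population standard deviation, and the sample values are assumptions made for illustration, not details from the slides.

```python
from statistics import mean, pstdev


def gene_score(class_a, class_b):
    """Score one gene from its expression values in two classes.

    Assumed reading of the slide: |mean_A - mean_B| - (std_A + std_B).
    A high score means the class means are far apart relative to the
    spread within each class.
    """
    separation = abs(mean(class_a) - mean(class_b))
    spread = pstdev(class_a) + pstdev(class_b)  # population std dev (assumption)
    return separation - spread


# Hypothetical expression values for two genes
well_separated = gene_score([1.0, 1.2, 0.8], [5.0, 5.2, 4.8])
overlapping = gene_score([1.0, 4.0, 7.0], [2.0, 5.0, 8.0])
```

Under this reading, a gene whose classes barely overlap scores positive, while a gene whose classes overlap heavily scores negative and would be a candidate for elimination.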

  7. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes

  8. Eliminate Poorly Covering Genes [Figure: genes and samples, with samples grouped into Class 1 and Class 2]

  9. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold

  10. Create Unweighted Graph • Complete, edge-weighted graph – Vertices = samples – Edge weight = similarity metric • Remove edge weights – If edge weight < threshold, remove edge from graph – Otherwise, keep edge, ignore weight • Result: incomplete unweighted graph
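The thresholding step can be sketched in a few lines of Python. The dictionary-based graph representation, the function name `threshold_graph`, and the example weights are assumptions chosen for illustration.

```python
def threshold_graph(samples, weights, threshold):
    """Turn a complete edge-weighted graph into an unweighted one.

    samples:   iterable of sample identifiers (the vertices)
    weights:   dict mapping a pair (u, v) to its similarity
    threshold: edges with weight below this value are removed
    Returns an adjacency dict: sample -> set of neighboring samples.
    """
    adjacency = {s: set() for s in samples}
    for (u, v), w in weights.items():
        if w >= threshold:  # keep the edge and ignore its weight from here on
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Hypothetical similarities among three samples
samples = ["s1", "s2", "s3"]
weights = {("s1", "s2"): 0.9, ("s1", "s3"): 0.2, ("s2", "s3"): 0.7}
graph = threshold_graph(samples, weights, threshold=0.5)
```

Here the edge between `s1` and `s3` falls below the threshold and is dropped, so the result is an incomplete unweighted graph.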

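  11. The Edge Weight Function • $\mathrm{weight}(j, k) = \sum_i \mathrm{score}(\mathrm{gene}_i) \cdot \left[ 1 - |\mathrm{expression\_value}_{ij} - \mathrm{expression\_value}_{ik}| \right]$, where $\mathrm{expression\_value}_{ij}$ = expression value of gene i for sample j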
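As a rough illustration, the edge weight between two samples might be computed as below. The sketch reads the formula as a score-weighted sum of one minus the absolute expression difference, and it assumes expression values have been scaled to the range 0 to 1 so that each term stays between 0 and 1. The name `edge_weight` and the numbers are hypothetical.

```python
def edge_weight(scores, expr_j, expr_k):
    """Similarity of samples j and k over the retained genes.

    scores: score of each gene
    expr_j: expression value of each gene in sample j (assumed in [0, 1])
    expr_k: expression value of each gene in sample k (assumed in [0, 1])
    """
    return sum(
        s * (1.0 - abs(a - b))
        for s, a, b in zip(scores, expr_j, expr_k)
    )


# Two genes: the samples agree on the high-scoring gene and differ on the other
w = edge_weight([2.0, 1.0], [0.1, 0.9], [0.1, 0.4])
```

Genes with high scores dominate the sum, so two samples that agree on the strongest discriminators end up with a high similarity.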

  12. Algorithmic Training: Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold → Verify by Classification → Set of Discriminatory Genes, Scores

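  13. What is a Clique? • A completely connected subset of vertices in a graph • Maximal clique = a clique that cannot be extended by adding another vertex (a local optimum) • Deciding whether a graph contains a clique of a given size is NP-complete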
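To make the definition concrete, here is a short Python sketch that enumerates maximal cliques with the basic Bron-Kerbosch recursion. The slides do not say which clique algorithm the group uses, so this is only one standard choice, and the small example graph is made up.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph.

    adjacency: dict mapping each vertex to the set of its neighbors.
    Uses the basic Bron-Kerbosch recursion (no pivoting), which is
    exponential in the worst case.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield set(clique)  # nothing can extend it, so it is maximal
            return
        for v in list(candidates):
            yield from expand(
                clique | {v},
                candidates & adjacency[v],
                excluded & adjacency[v],
            )
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


# Triangle a-b-c plus a pendant edge c-d
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = sorted(sorted(c) for c in maximal_cliques(graph))
```

In this example vertex `c` belongs to both maximal cliques, which illustrates how clique-based grouping lets an object sit in more than one group.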

  14. Classification Using Clique [Figure: graph with cliques labeled Class 1, Class 2, and Class 3]
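The slides do not spell out how clique membership is turned into a class label. One possible rule, shown here purely as an assumption, is a majority vote over the labeled samples that share a maximal clique with the unknown sample. The function name `classify_by_cliques` and the labels `tumor` and `normal` are hypothetical.

```python
from collections import Counter


def classify_by_cliques(sample, cliques, labels):
    """Assign a label by majority vote (an assumed rule, not from the slides).

    sample:  identifier of the unknown sample
    cliques: iterable of sets of sample identifiers (maximal cliques)
    labels:  dict mapping known samples to their class label
    Returns the most common label among labeled clique-mates, or None
    if the sample shares no clique with a labeled sample.
    """
    votes = Counter(
        labels[member]
        for clique in cliques
        if sample in clique
        for member in clique
        if member in labels
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical cliques and labels
cliques = [{"u", "a", "b"}, {"u", "c"}, {"d", "e"}]
labels = {"a": "tumor", "b": "tumor", "c": "normal", "d": "normal", "e": "normal"}
predicted = classify_by_cliques("u", cliques, labels)
```

A sample that appears in no clique with labeled members is left unclassified rather than forced into a group.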

  15. A Selection of Discriminators • ADH1B: alcohol dehydrogenase IB; alcohol dehydrogenase activity • FHL1: four and a half LIM domains 1; cell growth, cell differentiation • HBB: hemoglobin, beta; oxygen transport • CYP4B1: cytochrome P450 4B1; electron transport • TNA: tetranectin (plasminogen binding protein) • TGFBR2: transforming growth factor, beta receptor II; transmembrane receptor protein serine/threonine kinase signaling pathway

  16. The Algorithm - Unsupervised: Raw Data and Set of Discriminatory Genes, Scores → Calculate Sample Similarities → Apply Threshold → Classify Unknown Samples

  17. Summary • Intersection of clique and dominating set techniques improves results • Combined orthogonal scoring identifies limited number of discriminatory genes • Clique offers means of validating obtained scores and weights • Our technique identifies differing set of discriminatory genes from original paper • Clique-based classification a viable complement to present clustering methods

  18. Ongoing and Future Research • Reverse Training • Train to distinguish among types of cancer • Experiment with different weight functions (e.g., Pearson's correlation coefficient) • Investigate using less stringent techniques – Near-cliques – Neighborhood search – K-dense subgraphs • Port code to an SGI Altix supercomputer

  19. Our Research Group • Mike Langston, Ph.D. • Lan Lin • Chris Symons • Xinxia Peng • Bing Zhang, Ph.D.
