A Combinatorial Approach to the Analysis of Differential Gene Expression Data



slide-1
SLIDE 1

A Combinatorial Approach to the Analysis of Differential Gene Expression Data

The Use of Graph Algorithms for Disease Prediction and Screening

slide-2
SLIDE 2

The Goal

  • To classify patients based on expression profiles
    – Presence of cancer
    – Type of cancer
    – Response to treatment
  • To identify the genes required for accurate classification
    – Too many = unnecessary noise
    – Too few = insufficient information

slide-3
SLIDE 3

Classic Clustering Problem

  • Current techniques:
    – Hierarchical Clustering
    – K-Means Clustering
    – Self-Organizing Maps
    – Others
  • Drawbacks:
    – Determining cluster boundaries is difficult with diffuse data
    – Objects can only belong to one group

slide-4
SLIDE 4

Algorithmic Training

Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold → Verify by Classification → Set of Discriminatory Genes, Gene Scores

Techniques: Gene Scoring, Dominating Set, Maximal Cliques

slide-5
SLIDE 5

Algorithmic Training

Raw Data → Eliminate Poorly Discriminating Genes

slide-6
SLIDE 6

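The Gene Scoring Function: Identifying Discriminators

$$\mathrm{score}(gene_i) = \left|\mu_{classA} - \mu_{classB}\right| - \left(\sigma_{classA} + \sigma_{classB}\right)$$

[Figure: two plots compared side by side ("vs."), axes ticked from 2 to 10]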
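
A minimal sketch of how such a score could be computed for one gene, assuming the function is the absolute gap between the two class means minus the sum of the two class standard deviations. The function name `gene_score`, the use of the population standard deviation, and the expression values are illustrative assumptions, not taken from the slides.

```python
from statistics import mean, pstdev

def gene_score(class_a, class_b):
    """Gap between class means minus the summed class spreads (assumed form)."""
    gap = abs(mean(class_a) - mean(class_b))
    spread = pstdev(class_a) + pstdev(class_b)  # population std dev is an assumption
    return gap - spread

# Hypothetical expression values for two genes.
separating = gene_score([2.0, 2.5, 3.0], [8.0, 8.5, 9.0])    # classes far apart
overlapping = gene_score([2.0, 6.0, 10.0], [3.0, 6.0, 9.0])  # classes overlap
```

Under this reading, a gene whose classes are well separated and tightly grouped scores high, while a gene with overlapping classes scores below zero.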

slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9

Algorithmic Training

Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes

slide-10
SLIDE 10

Eliminate Poorly Covering Genes

[Figure: diagram labeled Samples and Genes, with samples marked Class 1 and Class 2]

slide-11
SLIDE 11

Algorithmic Training

Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold

slide-12
SLIDE 12

Create Unweighted Graph

  • Complete, edge-weighted graph
    – Vertices = samples
    – Edge weight = similarity metric
  • Remove edge weights
    – If edge weight < threshold, remove edge from graph
    – Otherwise, keep edge, ignore weight
  • Result: incomplete unweighted graph
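
The thresholding step above can be sketched as follows. The function name `threshold_graph`, the dictionary-based graph representation, and the sample weights are illustrative assumptions.

```python
def threshold_graph(vertices, weights, threshold):
    """Turn a complete edge-weighted graph into an unweighted adjacency map.

    weights maps an unordered sample pair (u, v) to its similarity weight.
    """
    adjacency = {v: set() for v in vertices}
    for (u, v), w in weights.items():
        if w >= threshold:  # edges with weight < threshold are removed
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency

# Hypothetical similarities among three samples.
samples = ["s1", "s2", "s3"]
weights = {("s1", "s2"): 0.9, ("s1", "s3"): 0.2, ("s2", "s3"): 0.6}
graph = threshold_graph(samples, weights, 0.5)
```

Here the s1-s3 edge falls below the threshold and is dropped, so the resulting graph is no longer complete.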
slide-13
SLIDE 13

The Edge Weight Function

$$w_{jk} = \sum_i \left[\, \mathrm{score}(gene_i) \cdot \left(1 - \left|\text{expression\_value}_{ij} - \text{expression\_value}_{ik}\right|\right) \right]$$

where expression_value_ij = expression value of gene_i for sample_j
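
A minimal sketch of the edge weight between two samples, under several assumptions not confirmed by the slide: that the per-gene terms are summed over all retained genes, that the difference between expression values is taken in absolute value, and that expression values are scaled to the range [0, 1]. The name `edge_weight` and the numbers are hypothetical.

```python
def edge_weight(scores, sample_j, sample_k):
    """Sum score(gene_i) * (1 - |value_ij - value_ik|) over genes (assumed form)."""
    return sum(
        score * (1 - abs(x_j - x_k))
        for score, x_j, x_k in zip(scores, sample_j, sample_k)
    )

# Hypothetical gene scores and normalized expression values for two samples.
scores = [2.0, 1.0]
sample_j = [0.1, 0.8]
sample_k = [0.2, 0.3]
weight = edge_weight(scores, sample_j, sample_k)  # 2.0 * 0.9 + 1.0 * 0.5
```

With this form, two samples that agree on high-scoring genes receive a large weight, and identical samples receive the sum of all gene scores.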

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17

Algorithmic Training

Raw Data → Eliminate Poorly Discriminating Genes → Eliminate Poorly Covering Genes → Calculate Sample Similarities → Apply Threshold → Verify by Classification → Set of Discriminatory Genes, Gene Scores

slide-18
SLIDE 18
What is a Clique?

  • A completely connected subset of vertices in a graph
  • Maximal clique = local optimization
  • NP-complete
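
To make the definition concrete, here is a sketch that enumerates maximal cliques with the basic Bron-Kerbosch recursion (no pivoting). This is a standard textbook technique swapped in for illustration; the slides do not say which clique algorithm the authors used, and the example graph is hypothetical.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(clique)  # nothing can extend this clique
            return
        for v in list(candidates):
            yield from expand(
                clique | {v},
                candidates & adjacency[v],
                excluded & adjacency[v],
            )
            candidates = candidates - {v}  # v is fully explored
            excluded = excluded | {v}
    yield from expand(set(), set(adjacency), set())

# Hypothetical graph: a triangle a-b-c plus a pendant edge c-d.
adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = set(maximal_cliques(adjacency))
```

The running time of this enumeration can grow exponentially with graph size, which is consistent with the hardness noted above.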

slide-19
SLIDE 19

Classification Using Clique

[Figure: graph with groups of vertices labeled Class 1, Class 2, and Class 3]

slide-20
SLIDE 20

A Selection of Discriminators

Function | Gene | Symbol
electron transport | cytochrome P450 4B1 | CYP4B1
cell growth, cell differentiation | four and a half LIM domains 1 | FHL1
alcohol dehydrogenase activity | alcohol dehydrogenase IB | ADH1B
oxygen transport | hemoglobin, beta | HBB
transmembrane receptor protein serine/threonine kinase signaling pathway | transforming growth factor, beta receptor II | TGFBR2
plasminogen binding protein | tetranectin | TNA

slide-21
SLIDE 21

The Algorithm - Unsupervised

Raw Data → Calculate Sample Similarities → Apply Threshold → Classify Unknown Samples

Input from training: Set of Discriminatory Genes, Scores

slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25

Summary

  • Intersection of clique and dominating set techniques improves results
  • Combined orthogonal scoring identifies limited number of discriminatory genes
  • Clique offers means of validating obtained scores and weights
  • Our technique identifies differing set of discriminatory genes from original paper
  • Clique-based classification a viable complement to present clustering methods

slide-26
SLIDE 26

Ongoing and Future Research

  • Reverse Training
  • Train to distinguish among types of cancer
  • Experiment with different weight functions (e.g., Pearson’s coefficient)
  • Investigate using less stringent techniques
    – Near-cliques
    – Neighborhood search
    – K-dense subgraphs
  • Port codes to SGI Altix supercomputer
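
As an illustration of the alternative weight function mentioned in the list above, the following sketch computes Pearson's correlation coefficient between two expression profiles. The function name `pearson` and the sample profiles are hypothetical, and the slides do not specify how the coefficient would be turned into an edge weight.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)  # undefined for constant profiles

# Hypothetical expression profiles.
same_trend = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_trend = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Unlike a difference-based weight, the coefficient ranges from -1 to 1 and ignores differences in scale, so a threshold would need to be chosen accordingly.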
slide-27
SLIDE 27

Our Research Group

Mike Langston, Ph.D.
Lan Lin
Chris Symons
Xinxia Peng
Bing Zhang, Ph.D.

slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33