U-curve Search for Biological States Characterization and Genetic - - PowerPoint PPT Presentation

SLIDE 1

U-curve Search for Biological States Characterization and Genetic Network Design

Marcelo Ris – Universidade de São Paulo – Instituto de Matemática e Estatística
Junior Barrera – Universidade de São Paulo – Instituto de Matemática e Estatística
Helena Brentani – Hospital do Câncer, Fundação Antônio Prudente

SLIDE 2

Outline

  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 3
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 4
  • Biological problems
    – P1. Biological states characterization
    – P2. Genetic network design
  • Gene expression data
    – P1. State samples
    – P2. Time-course samples
  • Mathematical approach
    – Feature selection problem
    – State of the art: heuristic optimizations
    – U-curve algorithm

SLIDE 5
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 6

Feature Selection

$x[t] = (x_1[t], x_2[t], \dots, x_n[t])$, with $x_i[t] \in R = \{r_1, r_2, \dots, r_l\}$ and $y[t] \in K = \{1, \dots, c\}$

Samples: $\{(x[1], y[1]), (x[2], y[2]), \dots, (x[m], y[m])\}$

$A \subseteq \{1, 2, \dots, n\}$, $\psi : R^{|A|} \to K$

Joint distribution: $P(X, Y)$

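The setup above can be made concrete with a short Python sketch that restricts each sample to the coordinates in A and estimates P(X, Y) by relative frequencies. It assumes each sample is a pair `(x, y)` where `x` is a tuple of already-quantized values and `y` is a state label; indices in `A` are 0-based here (the slide is 1-based), and the names `project` and `empirical_joint` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Tuple[int, ...], Hashable]  # (x[t], y[t])

def project(x: Sequence[int], A: Sequence[int]) -> Tuple[int, ...]:
    """Restrict the vector x to the coordinates listed in A (0-based)."""
    return tuple(x[i] for i in A)

def empirical_joint(samples: List[Sample], A: Sequence[int]) -> Dict[tuple, float]:
    """Estimate P(X, Y) by relative frequencies, with X = x[t] projected onto A."""
    counts = Counter((project(x, A), y) for x, y in samples)
    m = len(samples)
    return {key: c / m for key, c in counts.items()}

# Four hypothetical samples of n = 3 quantized genes, labelled with states in K = {1, 2}.
samples = [((1, 0, -1), 2), ((1, 1, -1), 2), ((0, 0, 1), 1), ((1, 0, 0), 2)]
print(empirical_joint(samples, A=[0]))
```

With A = [0] only the first gene is kept, so the four samples collapse into two observed (pattern, state) pairs.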
SLIDE 7

Distribution of Y:
$P : \{-1, \dots, 1\} \to [0, 1]$, with $\sum_{y \in \{-1, \dots, 1\}} P(y) = 1$

Entropy of Y:
$H(Y) = -\sum_{y \in \{-1, \dots, 1\}} P(y) \log P(y)$

Mutual information:
$I(X, Y) = H(Y) - H(Y \mid X) \ge 0$

Figure: three example distributions, P(Y), P(Y') and P(Y''), over the values -1, 0, 1, used to compare the entropies H(Y), H(Y') and H(Y'').

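A minimal sketch of the two definitions, computing entropy and mutual information from probability tables stored as Python dicts. The slide does not state the logarithm base; base 2 (bits) is an assumption here, and the function names are illustrative.

```python
import math
from typing import Dict, Hashable, Tuple

def entropy(p: Dict[Hashable, float], base: float = 2.0) -> float:
    """H(Y) = -sum_y P(y) log P(y); values with P(y) = 0 contribute nothing."""
    return -sum(q * math.log(q, base) for q in p.values() if q > 0)

def mutual_information(joint: Dict[Tuple[Hashable, Hashable], float],
                       base: float = 2.0) -> float:
    """I(X, Y) = H(Y) - H(Y|X), from a joint table {(x, y): P(x, y)}."""
    p_x: Dict[Hashable, float] = {}
    p_y: Dict[Hashable, float] = {}
    for (x, y), q in joint.items():
        p_x[x] = p_x.get(x, 0.0) + q
        p_y[y] = p_y.get(y, 0.0) + q
    # H(Y|X) = -sum_{x,y} P(x, y) log P(y|x), where P(y|x) = P(x, y) / P(x)
    h_y_given_x = -sum(q * math.log(q / p_x[x], base)
                       for (x, y), q in joint.items() if q > 0)
    return entropy(p_y, base) - h_y_given_x

uniform = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}  # maximal uncertainty over {-1, 0, 1}
peaked = {-1: 0.05, 0: 0.9, 1: 0.05}       # most of the mass on a single value
print(entropy(uniform), entropy(peaked))
```

The peaked distribution has lower entropy than the uniform one; I(X, Y) is 1 bit when Y copies a uniform binary X and 0 when X and Y are independent.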
SLIDE 8

Mean conditional entropy:
$$E[H(Y \mid X)] = -\sum_{x \in \{-1, \dots, 1\}^{|X|}} P_X(x) \sum_{y \in \{-1, \dots, 1\}} P_{Y|X}(y \mid x) \log P_{Y|X}(y \mid x)$$

Mean mutual information:
$$E[I(X, Y)] = H(Y) - E[H(Y \mid X)]$$

Estimation (probabilities replaced by their estimates $\hat{P}$):
$$\hat{E}[H(Y \mid X)] = -\sum_{x \in \{-1, \dots, 1\}^{|X|}} \hat{P}_X(x) \sum_{y \in \{-1, \dots, 1\}} \hat{P}_{Y|X}(y \mid x) \log \hat{P}_{Y|X}(y \mid x)$$
$$\hat{E}[I(X, Y)] = H(Y) - \hat{E}[H(Y \mid X)]$$

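The estimate Ê[H(Y|X)] can be sketched directly from the samples: group them by the observed pattern of the features in A, compute the entropy of Y inside each group, and weight it by the group's relative frequency P̂(x). This assumes samples are `(x, y)` pairs with `x` a tuple of quantized values, 0-based indices in `A`, relative frequencies as the estimates, and base-2 logarithms; `mean_conditional_entropy` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple

def mean_conditional_entropy(samples: List[Tuple[Tuple[int, ...], Hashable]],
                             A: Sequence[int], base: float = 2.0) -> float:
    """Plug-in estimate of E[H(Y|X)] = -sum_x P(x) sum_y P(y|x) log P(y|x),
    with probabilities replaced by relative frequencies and X = x restricted to A."""
    m = len(samples)
    groups = defaultdict(Counter)  # observed pattern of X -> counts of each y
    for x, y in samples:
        groups[tuple(x[i] for i in A)][y] += 1
    total = 0.0
    for y_counts in groups.values():
        n_x = sum(y_counts.values())
        h = -sum((c / n_x) * math.log(c / n_x, base) for c in y_counts.values())
        total += (n_x / m) * h  # weight H(Y | X = x) by P_hat(x) = n_x / m
    return total

samples = [((0, 1), 0), ((0, 0), 0), ((1, 1), 1), ((1, 0), 1)]
print(mean_conditional_entropy(samples, [0]), mean_conditional_entropy(samples, [1]))
```

Here the first feature determines Y (estimate 0) while the second carries no information (estimate 1 bit). This plain plug-in estimate can never increase when a feature is added to A, because conditioning on more variables cannot raise the entropy of the empirical distribution; some correction for rarely observed patterns would be needed to obtain a U-shaped estimate, and its form is not given here.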
SLIDE 9
  • Problem
    – Find the subset A that optimizes the cost function
    – Ex.: minimization of the mean conditional entropy (the cost function)
    – Exponential: there are 2^n candidate subsets
  • Search space
    – Complete Boolean lattice of order n
    – Each node represents a candidate subset A
    – Cost function: estimated for each node
    – Goal: find the node with minimum cost

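To make the size of the search space concrete, here is a brute-force sketch that visits every node of the Boolean lattice, that is, all 2^n subsets. It assumes the cost is any function of a subset (for instance an estimate of the mean conditional entropy); `exhaustive_search` and the toy cost are hypothetical. This full enumeration is the kind of exponential work that the heuristics and the U-curve algorithm try to avoid.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Tuple

def exhaustive_search(n: int, cost: Callable[[FrozenSet[int]], float]) -> Tuple[FrozenSet[int], float]:
    """Evaluate every node of the Boolean lattice of order n, i.e. every subset
    A of {0, ..., n-1}, and return the one with minimum cost (first one on ties)."""
    best: FrozenSet[int] = frozenset()
    best_cost = cost(best)
    for k in range(1, n + 1):                  # subsets grouped by size |A| = k
        for combo in combinations(range(n), k):
            A = frozenset(combo)
            c = cost(A)
            if c < best_cost:
                best, best_cost = A, c
    return best, best_cost

# Toy cost (hypothetical): distance from the "ideal" subset {1, 2}.
print(exhaustive_search(4, lambda A: len(A ^ {1, 2})))
```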
SLIDE 10

Figure: Boolean lattice of order 4, with a 4-element chain emphasized.

  • Heuristics: SFS (sequential forward selection), SFFS (sequential floating forward selection)
    – Incremental
    – Do not search the whole candidate space
    – May not find the best subset
  • Example: two features that each make the result worse when added alone may improve it substantially when added together

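The failure mode in the example can be reproduced with a small sketch of SFS, which adds the single best feature at each step and stops when no single addition helps. The data are hypothetical: Y is the XOR of two binary features, and the cost is the plug-in mean conditional entropy plus an assumed penalty of 0.1 per selected feature, so each feature alone makes the cost worse while the pair brings it close to zero.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, FrozenSet

# Hypothetical samples: y = x0 XOR x1.
SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def cost(A: FrozenSet[int]) -> float:
    """Plug-in mean conditional entropy H(Y|X_A) plus an assumed 0.1 penalty per feature."""
    groups = defaultdict(Counter)
    for x, y in SAMPLES:
        groups[tuple(x[i] for i in sorted(A))][y] += 1
    m = len(SAMPLES)
    h = 0.0
    for y_counts in groups.values():
        n_x = sum(y_counts.values())
        # -P(x, y) log2 P(y|x), summed over the observed (x, y) pairs
        h -= sum((c / m) * math.log2(c / n_x) for c in y_counts.values())
    return h + 0.1 * len(A)

def sfs(n: int, cost: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
    """Sequential forward selection: greedily add the feature that lowers the cost most."""
    selected: FrozenSet[int] = frozenset()
    current = cost(selected)
    while len(selected) < n:
        candidates = [selected | {i} for i in range(n) if i not in selected]
        best = min(candidates, key=cost)
        if cost(best) >= current:
            break  # no single feature helps, even though a pair might
        selected, current = best, cost(best)
    return selected

print(sfs(2, cost), cost(frozenset()), cost(frozenset({0, 1})))
```

SFS stops at the empty set (cost 1.0, while each single feature costs 1.1), whereas checking all four subsets would find {0, 1} with cost 0.2.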
SLIDE 11
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 12
  • U-curve property of Ê[H(Y|X)]

Figure: Ê[H(Y|X)] plotted against |A|.

    – Why? The estimate combines two components:
      • Real measure: decreases from H(Y) (no attributes) to the real value E[H(Y|X)]
      • Estimation error: increases as more attributes are added to X
    – Consequently, for a fixed number of samples, Ê[H(Y|X)] forms a U-curve along any chain of the search space

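Along a single chain (nested subsets, each adding one element), the U-curve shape can be checked in a few lines. This sketch assumes the costs are listed in chain order from the smallest to the largest subset, and it treats flat stretches and purely monotone sequences as degenerate U-curves; `is_u_shaped` is a hypothetical helper.

```python
from typing import Sequence

def is_u_shaped(costs: Sequence[float]) -> bool:
    """True if the costs first never increase and then never decrease."""
    i = 0
    while i + 1 < len(costs) and costs[i + 1] <= costs[i]:
        i += 1  # descending (or flat) part
    while i + 1 < len(costs) and costs[i + 1] >= costs[i]:
        i += 1  # ascending (or flat) part
    return i + 1 >= len(costs)  # True only if the whole chain was consumed

print(is_u_shaped([1.0, 0.6, 0.4, 0.5, 0.9]), is_u_shaped([1.0, 0.6, 0.8, 0.5]))
```

A sequence such as 1.0, 0.6, 0.8, 0.5 descends again after rising, so it is not U-shaped; ruling that out is what makes it reasonable to stop exploring a chain once its cost starts to increase.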
SLIDE 13
  • Features of the algorithm
    – Branch-and-bound: explores the whole search space without visiting every candidate
    – Stochastic
    – Some definitions:
  • U-cost Boolean Lattice
  • Local minimum
  • Exhausted minimum
  • Global minimum
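The slide lists these definitions without stating them. As a sketch, assuming subsets are encoded as bitmasks (as in 0000, 0100 on the next slide) and that two elements are adjacent when they differ by exactly one feature, a local minimum can be tested as below, and a global minimum is the element of smallest cost. Both definitions are assumptions here, and the names and costs are hypothetical.

```python
from typing import Callable, Iterator

def neighbors(a: int, n: int) -> Iterator[int]:
    """Elements adjacent to bitmask a in the Boolean lattice of order n:
    flip one bit, i.e. add or remove a single feature."""
    for i in range(n):
        yield a ^ (1 << i)

def is_local_minimum(a: int, n: int, cost: Callable[[int], float]) -> bool:
    """Assumed definition: no adjacent element has a strictly smaller cost."""
    return all(cost(b) >= cost(a) for b in neighbors(a, n))

def global_minimum(n: int, cost: Callable[[int], float]) -> int:
    """Element of smallest cost over all 2**n subsets (exhaustive, for reference)."""
    return min(range(2 ** n), key=cost)

costs = {0b00: 3.0, 0b01: 1.0, 0b10: 2.0, 0b11: 2.5}  # hypothetical costs, n = 2
print([a for a in range(4) if is_local_minimum(a, 2, costs.__getitem__)])
```

Here both 01 and 10 are local minima, but only 01 is the global minimum, which is why a purely local search can stop too early.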
SLIDE 14
  • Search space characterized by:
    – Upper bound list
    – Lower bound list
  • An element is reachable if there is a chain to it from an element of the upper or lower list
  • At each step:
    – Select a starting list with some probability
    – Select a random element from this list
    – Build a chain iteratively:
      • Add to the chain a random reachable element adjacent to the last one
      • Stop when the cost of the last element is greater than that of the previous one

Figure (prune procedure): example chain 0000, 0100, 0110, 1110 with costs 10, 7, 9, 6.

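The chain construction can be sketched as follows. Assumptions: subsets are bitmasks; a chain started from a lower-list element grows by adding one random feature at a time, and one started from an upper-list element shrinks by removing one; the reachability restrictions imposed by the lists are ignored; and "greater than the last one" is read as a cost increase with respect to the previous element. `build_chain` and `u_cost` are hypothetical names.

```python
import random
from typing import Callable, List, Optional

def build_chain(start: int, n: int, cost: Callable[[int], float],
                upward: bool = True, rng: Optional[random.Random] = None) -> List[int]:
    """Grow a random chain from `start`, stopping at the first element whose
    cost is greater than that of the previous element (or at the lattice boundary)."""
    rng = rng or random.Random()
    chain = [start]
    while True:
        last = chain[-1]
        if upward:   # adjacent supersets: add one absent feature
            options = [last | (1 << i) for i in range(n) if not (last >> i) & 1]
        else:        # adjacent subsets: remove one present feature
            options = [last & ~(1 << i) for i in range(n) if (last >> i) & 1]
        if not options:
            return chain
        nxt = rng.choice(options)
        chain.append(nxt)
        if cost(nxt) > cost(last):
            return chain

def u_cost(a: int) -> float:
    """Hypothetical U-shaped cost: the best subsets have exactly two features."""
    return abs(bin(a).count("1") - 2)

print([format(a, "04b") for a in build_chain(0b0000, 4, u_cost, rng=random.Random(1))])
```

In this reading, the last element of the returned chain is the one whose cost went up, and the element just before it is the candidate minimum from which pruning can start.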
SLIDE 15
  • Additional procedures
    – Minimum exhausting
      • Avoids visiting the same candidate more than once
      • Uses a stack
    – Pruning from an element E
      • Upper bound list: remove every element U that contains E, and insert the elements reachable from U that do not contain E
      • Lower bound list: remove every element L that is contained in E, and insert the elements reachable from L that are not contained in E

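A sketch of the pruning step on bitmask-encoded lists. It assumes that the "reachable" elements inserted for an upper-list element U are U minus one feature of E (the largest subsets of U that do not contain E), and that those inserted for a lower-list element L are L plus one feature outside E (the smallest supersets of L not contained in E). The function name is hypothetical, and this is not presented as the authors' exact procedure.

```python
from typing import List, Tuple

def prune(E: int, n: int, upper: List[int], lower: List[int]) -> Tuple[List[int], List[int]]:
    """Cut from the search space every element that contains E (through the upper
    bound list) and every element contained in E (through the lower bound list)."""
    new_upper: List[int] = []
    for U in upper:
        if U & E == E:  # U contains E: replace it by U minus one feature of E
            new_upper += [U & ~(1 << i) for i in range(n) if (E >> i) & 1]
        else:
            new_upper.append(U)
    new_lower: List[int] = []
    for L in lower:
        if L & E == L:  # L is contained in E: replace it by L plus one feature outside E
            new_lower += [L | (1 << i) for i in range(n) if not (E >> i) & 1]
        else:
            new_lower.append(L)
    return sorted(set(new_upper)), sorted(set(new_lower))

upper, lower = prune(0b0110, 4, upper=[0b1111], lower=[0b0000])
print([format(v, "04b") for v in upper], [format(v, "04b") for v in lower])
```

This sketch does not drop redundant entries (for example, an upper-list element contained in another one), which a complete implementation would likely do.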
SLIDE 16
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 17

] [ ] [ ] [ ] [

2 1

  • =

t x t x t x t x

n

  • ])}

[ ], [ ( , ]), 1 [ ], 1 [ ( ]), [ ], [ {( m y m x y x y x

  • }

, , 2 , 1 { n A

U-curve algorithm

} , , 1 { ] [ }, , , , { ] [

2 1

c K t y r r r r R t x

i l i

  • =

∈ ∈ = ∈

  • K

R A →

| |

: ψ

) , ( Y X P

Quantized Microarray Quantized Values Biological States

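Once a subset A is chosen, one simple way to realize ψ from the estimated P(X, Y) is to predict, for each observed pattern of the selected genes, its most frequent biological state. This majority rule is an assumption, not necessarily the classifier used by the authors; `train_psi` is a hypothetical name, unseen patterns return `None`, and ties go to the smallest label.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

def train_psi(samples: List[Tuple[Tuple[int, ...], Hashable]],
              A: Sequence[int]) -> Callable[[Sequence[int]], Optional[Hashable]]:
    """Build psi: R^|A| -> K as a lookup table mapping each observed pattern of the
    selected genes to its most frequent state (ties broken by the smallest label)."""
    votes = defaultdict(Counter)
    for x, y in samples:
        votes[tuple(x[i] for i in A)][y] += 1
    table = {pattern: min(c, key=lambda y: (-c[y], y)) for pattern, c in votes.items()}
    # Patterns never seen in training have no estimate: return None for them.
    return lambda x: table.get(tuple(x[i] for i in A))

samples = [((1, 0, -1), 2), ((1, 1, -1), 2), ((0, 0, 1), 1), ((1, 0, 0), 1), ((1, 0, 1), 2)]
psi = train_psi(samples, A=[0])
print(psi((1, -1, -1)), psi((0, 1, 1)), psi((-1, 0, 0)))
```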
SLIDE 18
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 19
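  • Dynamical systems
    – State: vector x
    – Transition function: x[t+1] = φ(x[t])
  • Stochastic process
    – Stochastic transition function
      • Next state: realization of a random vector
    – Ex.: Markov chain $(\pi_{Y|X}, \pi_0)$
      • Discrete time, finite-size vector, finite domain
      • Random state sequence

$$\pi_{Y|X} = \begin{pmatrix} p_{1|1} & p_{2|1} & p_{3|1} & \cdots & p_{|R|^n|1} \\ p_{1|2} & p_{2|2} & p_{3|2} & \cdots & p_{|R|^n|2} \\ p_{1|3} & p_{2|3} & p_{3|3} & \cdots & p_{|R|^n|3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{1|\,|R|^n} & p_{2|\,|R|^n} & p_{3|\,|R|^n} & \cdots & p_{|R|^n|\,|R|^n} \end{pmatrix}, \qquad \pi_0 = (p_1, p_2, p_3, \dots, p_{|R|^n})$$

where $p_{y|x} = P(x[t+1] = y \mid x[t] = x)$, the $|R|^n$ states are numbered $1, \dots, |R|^n$, and $\pi_0$ is the initial state distribution.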

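Assuming the chain is homogeneous (its transition probabilities do not change over time), the entries p_{y|x} can be estimated from one observed state sequence by counting consecutive pairs. A minimal sketch, assuming states are tuples of quantized values; only rows for states that were actually observed are returned, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List

def estimate_transitions(states: List[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate p_{y|x} = P(x[t+1] = y | x[t] = x) by the relative frequency of each
    transition x -> y among the consecutive pairs of one observed sequence."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in zip(states, states[1:]):
        counts[x][y] += 1
    return {x: {y: c / sum(ys.values()) for y, c in ys.items()}
            for x, ys in counts.items()}

seq = [(0, 1), (1, 1), (0, 1), (1, 0), (0, 1)]  # hypothetical 2-gene time course
print(estimate_transitions(seq))
```

With n genes and |R| levels there are |R|^n possible states, so a short time course leaves most rows of π_{Y|X} unobserved.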
SLIDE 20

Probabilistic Genetic Networks (PGN)

A PGN is a Markov chain $(\pi_{Y|X}, \pi_0)$ with the following axioms:
a. it is homogeneous, that is, $\pi_{Y|X}$ does not depend on $t$;
b. $p_{y|x} > 0$ for all $x \in R^n$ and all $y \in R^n$;
c. it is conditionally independent, that is, $p_{y|x} = \prod_{i=1}^{n} p_{y_i|x}$ for all $x, y \in R^n$;
d. it is almost deterministic, that is, for all $x \in R^n$ and all $i \in \{1, \dots, n\}$ there is $r \in R$ such that $p_{y_i|x} \approx 1$ when $y_i = r$;
e. for all $i \in \{1, \dots, n\}$ there is a subspace of dimension $n_i$, with $n_i \ll n$, such that $p_{y_i|x} = p_{y_i|x'}$ for all $x \in R^n$, where $x'$ is the projection of $x$ on this subspace.

SLIDE 21

Markov chain: $\pi_{Y|X} = (p_{y|x})$, an $|R|^n \times |R|^n$ matrix, with $x, y \in \{1, \dots, |R|^n\}$.

Probabilistic Genetic Networks (PGN): one matrix per gene, $P_{X_i|X}$ for $i = 1, \dots, n$, each of size $|R|^n \times |R|$, where $R = \{r_1, r_2, \dots, r_l\}$:
$$P_{X_i|X} = \begin{pmatrix} p_{r_1|1} & p_{r_2|1} & \cdots & p_{r_l|1} \\ p_{r_1|2} & p_{r_2|2} & \cdots & p_{r_l|2} \\ p_{r_1|3} & p_{r_2|3} & \cdots & p_{r_l|3} \\ \vdots & \vdots & \ddots & \vdots \\ p_{r_1|\,|R|^n} & p_{r_2|\,|R|^n} & \cdots & p_{r_l|\,|R|^n} \end{pmatrix}$$

Almost deterministic: in every row, one entry is close to 1.

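Axioms c and d can be illustrated with a small sketch. Each gene i is described by a hypothetical function returning the distribution of its next value given the current state x (one row of P_{X_i|X}); the full transition probability p_{y|x} is their product, and almost determinism is checked with an assumed tolerance for "≈ 1". The gene models and names are illustrative only, not the authors' network; `math.prod` requires Python 3.8 or later.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence

GeneModel = Callable[[Sequence[int]], Dict[int, float]]  # x -> {r: p_{r|x}} for one gene
R = (-1, 0, 1)

def noisy_copy(src: int, p: float = 0.98) -> GeneModel:
    """Hypothetical gene: next value copies gene `src` with probability p; the rest of
    the mass is split over the other levels (so every p_{r|x} > 0, as in axiom b)."""
    def model(x: Sequence[int]) -> Dict[int, float]:
        return {r: p if r == x[src] else (1 - p) / (len(R) - 1) for r in R}
    return model

def transition_prob(y: Sequence[int], x: Sequence[int], genes: List[GeneModel]) -> float:
    """Axiom c: p_{y|x} = prod_i p_{y_i|x}."""
    return math.prod(genes[i](x)[y_i] for i, y_i in enumerate(y))

def is_almost_deterministic(x: Sequence[int], genes: List[GeneModel], tol: float = 0.05) -> bool:
    """Axiom d with an assumed tolerance: each gene has some r with p_{r|x} >= 1 - tol."""
    return all(max(g(x).values()) >= 1 - tol for g in genes)

genes = [noisy_copy(1), noisy_copy(0)]  # gene 0 follows gene 1 and vice versa
x = (1, -1)
total = sum(transition_prob(y, x, genes) for y in product(R, repeat=len(genes)))
print(transition_prob((-1, 1), x, genes), total, is_almost_deterministic(x, genes))
```

This mirrors the comparison above: n tables of size |R|^n × |R| replace a single |R|^n × |R|^n matrix, and axiom e would shrink each table further because only a few genes enter its conditioning.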
SLIDE 22

Time-course gene expression data

Figure: expression of genes 1, 2, 3, ..., n plotted against time; expression measurement techniques sample the time course as the state vectors x[1], x[2], ..., x[m].

SLIDE 23

For each target gene $j = 1, \dots, n$:

$x[t] = (x_1[t], x_2[t], \dots, x_n[t])$: quantized microarray at time $t$
$x_i[t] \in R = \{r_1, r_2, \dots, r_l\}$: quantized values
$y_j[t] = x_j[t+1]$: quantized expression of gene $j$ at time $t+1$

Samples: $\{(x[1], y_j[1]), (x[2], y_j[2]), \dots, (x[m], y_j[m])\}$

Subset selected by the U-curve algorithm: $A_j \subseteq \{1, 2, \dots, n\}$, $\psi_j : R^{|A_j|} \to R$

Joint distributions: $P(X, Y_j)$, $j = 1, \dots, n$

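The time shift y_j[t] = x_j[t+1] turns one time course into n feature-selection problems, one per target gene. A minimal sketch with hypothetical names, assuming the series is a list of quantized state tuples with 0-based gene indices; a series of length m yields m - 1 pairs here.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Tuple[int, ...], int]

def target_samples(series: Sequence[Tuple[int, ...]], j: int) -> List[Pair]:
    """Pairs (x[t], y_j[t]) with y_j[t] = x_j[t+1]: the whole quantized state at time t
    is paired with gene j's quantized value one step later."""
    return [(series[t], series[t + 1][j]) for t in range(len(series) - 1)]

def all_targets(series: Sequence[Tuple[int, ...]]) -> Dict[int, List[Pair]]:
    """One feature-selection problem per gene j = 0, ..., n-1."""
    n = len(series[0])
    return {j: target_samples(series, j) for j in range(n)}

series = [(0, 1, -1), (1, 1, 0), (1, 0, 0)]  # hypothetical x[1], x[2], x[3] for n = 3 genes
print(all_targets(series)[2])
```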
SLIDE 24
  • Introduction
  • Feature selection problem
  • U-curve search algorithm
  • Characterization of biological states
  • Genetic network design
  • Application
SLIDE 25
  • Application: design of an estrogen-regulated network
    – 21 estrogen-regulated target genes
    – Time-course gene expression (Ed Liu et al.)
      • MCF-7 cells treated with estrogen
      • 16 microarray experiments over 24 hours:
        – hourly during the first 8 hours
        – every 2 hours during the last 16 hours
    – Quantization (Barrera et al.)
      • 3 levels: {-1, 0, 1}
    – For each target gene, a 15 × 21 matrix:
      • First 20 columns: gene expressions in the first 15 experiments
      • Last column: expression of the target gene in the last 15 experiments
    – Among the 20 genes, the algorithm returns the subset that best predicts the target

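The per-target matrix described above can be assembled as in the sketch below. It assumes the 16 quantized experiments are rows of 21 values in {-1, 0, 1} and that the 20 predictor columns are the genes other than the target, which the slide does not spell out. The data in the example are synthetic, and `design_matrix` is a hypothetical name.

```python
from typing import List, Sequence

def design_matrix(experiments: Sequence[Sequence[int]], target: int) -> List[List[int]]:
    """Row t: expression of every gene except `target` in experiment t, followed by
    the target's expression in experiment t + 1 (last column)."""
    rows = []
    for t in range(len(experiments) - 1):
        predictors = [v for g, v in enumerate(experiments[t]) if g != target]
        rows.append(predictors + [experiments[t + 1][target]])
    return rows

# Synthetic stand-in for the real data: 16 experiments x 21 genes, values in {-1, 0, 1}.
experiments = [[(t + g) % 3 - 1 for g in range(21)] for t in range(16)]
M = design_matrix(experiments, target=5)
print(len(M), len(M[0]))  # 15 21
```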
SLIDE 26
SLIDE 27

Figure: the 21 estrogen-regulated genes: THBS1, PGR, CXCL12, STC2, SERPINE1, NRIP1, FZD8, CRABP2, BMP7, IL6ST, REA, EPHA4, CD7, JAK1, CTSD, JDP1, CCNG2, ABCA3, GATA3, NMA, IGFBP4.

SLIDE 28

Thanks!!!