U-curve Search for Biological States Characterization and Genetic - PowerPoint PPT Presentation

U-curve Search for Biological States Characterization and Genetic Network Design

Marcelo Ris – Universidade de São Paulo – Instituto de Matemática e Estatística
Junior Barrera – Universidade de São Paulo – Instituto de Matemática e Estatística
Helena Brentani – Hospital do Câncer, Fundação Antônio Prudente

Outline
• Introduction
• Feature selection problem
• U-curve search algorithm
• Characterization of biological states
• Genetic network design
• Application


• Biological problems
  – P1. Biological states characterization
  – P2. Genetic network design
• Gene expression data
  – P1. State samples
  – P2. Time-course samples
• Mathematical approach
  – Feature selection problem
  – State of the art: heuristic optimizations
  – U-curve algorithm

Feature Selection
• Samples from $P(X, Y)$: $\{(x[0], y[0]), (x[1], y[1]), \ldots, (x[m], y[m])\}$
• $x[t] = (x_1[t], x_2[t], \ldots, x_n[t])^T$, with $x_i[t] \in R = \{r_1, r_2, \ldots, r_l\}$
• $y[t] \in K = \{1, \ldots, c\}$
• $A \subset \{1, 2, \ldots, n\}$
• $\psi : R^{|A|} \to K$

Y Distribution
$$P : \{-1, 0, 1\} \to [0, 1], \qquad \sum_{y \in \{-1, 0, 1\}} P(y) = 1$$

Entropy
$$H(Y) = -\sum_{y \in \{-1, 0, 1\}} P(y) \log P(y)$$

[Figure: histograms of $P(Y)$, $P(Y')$ and $P(Y'')$ over $\{-1, 0, 1\}$, with $H(Y) > H(Y')$ and $H(Y') = H(Y'')$]

Mutual Information
$$I(X, Y) = H(Y) - H(Y \mid X) \ge 0$$
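
The entropy comparison on this slide can be checked numerically. The sketch below is only illustrative: the three distributions are assumed values (the histogram heights did not survive in the slide text), and base-2 logarithms are an assumption as well.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return sum(-q * math.log2(q) for q in p.values() if q > 0)

# Assumed example distributions over {-1, 0, 1} (not taken from the slide).
p_y = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}    # flat
p_y1 = {-1: 0.1, 0: 0.8, 1: 0.1}         # concentrated on 0
p_y2 = {-1: 0.8, 0: 0.1, 1: 0.1}         # same masses, moved to -1

h_y, h_y1, h_y2 = entropy(p_y), entropy(p_y1), entropy(p_y2)
```

With these values `h_y` exceeds `h_y1`, while `h_y1` and `h_y2` coincide, because entropy depends only on the probability masses and not on which value carries them.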

Mean Conditional Entropy
$$E[H(Y \mid X)] = -\sum_{x \in \{-1, 0, 1\}} P_X(x) \sum_{y \in \{-1, 0, 1\}} P_{Y|X}(y \mid x) \log P_{Y|X}(y \mid x)$$

Estimation
$$\hat{E}[H(Y \mid X)] = -\sum_{x \in \{-1, 0, 1\}} \hat{P}_X(x) \sum_{y \in \{-1, 0, 1\}} \hat{P}_{Y|X}(y \mid x) \log \hat{P}_{Y|X}(y \mid x)$$

Mean Mutual Information
$$E[I(X, Y)] = H(Y) - E[H(Y \mid X)]$$

Estimation
$$\hat{E}[I(X, Y)] = \hat{H}(Y) - \hat{E}[H(Y \mid X)]$$
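
A minimal sketch of the two estimators, assuming the plain plug-in (relative frequency) estimates of $\hat{P}_X$ and $\hat{P}_{Y|X}$ and base-2 logarithms. The sample lists are made up for illustration, and the slides do not say whether the authors use exactly this estimator.

```python
import math
from collections import Counter

def entropy(ys):
    """Plug-in estimate of H(Y) in bits from a list of observed labels."""
    m = len(ys)
    return -sum((c / m) * math.log2(c / m) for c in Counter(ys).values())

def mean_conditional_entropy(xs, ys):
    """Plug-in estimate of E[H(Y|X)] from paired samples xs[t], ys[t]."""
    m = len(xs)
    count_x = Counter(xs)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, _), n_xy in count_xy.items():
        p_x = count_x[x] / m              # estimated P(x)
        p_y_given_x = n_xy / count_x[x]   # estimated P(y|x)
        total -= p_x * p_y_given_x * math.log2(p_y_given_x)
    return total

# Hypothetical quantized samples: Y is ambiguous only when X = 0.
xs = [-1, -1, 0, 0, 1, 1, 1, 1]
ys = [-1, -1, 0, 1, 1, 1, 1, 1]
h_cond = mean_conditional_entropy(xs, ys)   # 0.25 bits
info = entropy(ys) - h_cond                 # estimated mean mutual information
```

Only the two samples with X = 0 contribute uncertainty (one bit, weighted by 2/8), which gives the 0.25 bits noted in the comment.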

• Problem
  – Find the subset A that optimizes the cost function
  – Ex: minimization of the mean conditional entropy (the cost function)
  – Exponential: n features give 2^n candidate subsets
• Search space
  – Complete Boolean lattice of order n
  – Each node represents a possible candidate A
  – The cost function is estimated for each node
  – Goal: find the node with the minimum cost
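
To make the size of the search space concrete, here is a brute-force baseline that visits every node of the lattice. `toy_cost` is a hypothetical stand-in for the estimated cost function, chosen only so the example runs.

```python
from itertools import combinations

def all_subsets(n):
    """Yield every node of the Boolean lattice of order n as a frozenset."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)

def exhaustive_search(n, cost):
    """Return a minimum-cost subset after visiting all 2**n nodes."""
    return min(all_subsets(n), key=cost)

def toy_cost(a):
    # Hypothetical cost: size of the symmetric difference with the target {0, 2}.
    return len(a ^ frozenset({0, 2}))

nodes = list(all_subsets(4))          # 16 candidates for n = 4
best = exhaustive_search(4, toy_cost)
```

The number of nodes doubles with each added feature, which is why exhaustive search is impractical beyond small n.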

[Figure: Boolean lattice of order 4, with a 4-element chain emphasized]
• Heuristics: SFS, SFFS
  – Incremental
  – Do not search the whole candidate space
  – May fail to obtain the “best” result
    • Ex: two elements that each make the result worse when added alone, but improve it a lot when added together
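
The failure case in the example can be reproduced on a toy dataset. In the sketch below the data are invented: `y` is the XOR of `x0` and `x1`, so neither helps alone, while `x2` is a noisy copy of `y`. The cost is the plug-in mean conditional entropy, and SFS is run to a fixed subset size of 2. These are illustrative assumptions, not the setup used by the authors.

```python
import math
from collections import Counter
from itertools import combinations

# Toy samples (x0, x1, x2, y): y = x0 XOR x1, x2 agrees with y in 6 of 8 rows.
DATA = [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 1, 1, 1), (0, 1, 1, 1),
    (1, 0, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 0),
]

def cost(subset):
    """Plug-in mean conditional entropy of y given the features in subset."""
    keys = [tuple(row[i] for i in sorted(subset)) for row in DATA]
    ys = [row[3] for row in DATA]
    n_x = Counter(keys)
    n_xy = Counter(zip(keys, ys))
    m = len(DATA)
    # P(x) * P(y|x) equals n_xy / m, so each term is -(n_xy / m) * log2 P(y|x).
    return sum(-(n / m) * math.log2(n / n_x[x]) for (x, _), n in n_xy.items())

def sfs(n, k):
    """Sequential forward selection: greedily add the best single feature k times."""
    chosen = frozenset()
    for _ in range(k):
        chosen = min((chosen | {i} for i in range(n) if i not in chosen), key=cost)
    return chosen

greedy = sfs(3, 2)
best = min((frozenset(c) for c in combinations(range(3), 2)), key=cost)
```

SFS first picks `x2`, the only feature that is informative on its own, and can then only extend that choice. It ends at {0, 2} with cost 0.5, whereas the pair {0, 1} has cost 0.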

• U-curve property of Ê[H(Y|X)]
  – For a fixed number of samples
  – For any chain of the search space
  – Ê[H(Y|X)] forms a U-curve
  [Figure: Ê[H(Y|X)] plotted against |A|]
  – Why?
  – The estimate is composed of:
    • Real measure: decreases from H(Y) to the real value E[H(Y|X)]
    • Estimation error: increases as more attributes are added to X

• Features of the algorithm
  – Branch-and-bound: covers the whole space without having to visit all the candidates
  – Stochastic
  – Some definitions:
    • U-cost Boolean lattice
    • Local minimum
    • Exhausted minimum
    • Global minimum

• Search space characterized by:
  – Upper bound list
  – Lower bound list
• An element is reachable if there is a chain to it from an upper or lower list element
• At each step:
  – Select, with some probability, the list to begin from
  – Select a random element from this list
  – Build a chain iteratively:
    • Insert into the chain a random reachable element adjacent to the last one
    • Stop when the cost of the last element is greater than that of the previous one
[Figure: chain 0000, 0100, 0110, 1110 in the lattice, with node labels 9, 7, 6 and 10, and the prune procedure marked]
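
The chain-building step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it always starts from a given element and climbs upward, it ignores the upper and lower bound lists and the reachability test, and `toy_cost` is a hypothetical U-shaped cost that depends only on the subset size. Subsets are encoded as bitmasks.

```python
import random

def walk_chain(start, n, cost, rng):
    """Climb from `start` by adding one random feature at a time.

    Stops as soon as the cost rises and returns (minimum, visited chain).
    Subsets are bitmasks over n features.
    """
    chain = [start]
    current = start
    while True:
        absent = [i for i in range(n) if not ((current >> i) & 1)]
        if not absent:                       # reached the top of the lattice
            return current, chain
        nxt = current | (1 << rng.choice(absent))
        chain.append(nxt)
        if cost(nxt) > cost(current):        # cost went up: stop here
            return current, chain
        current = nxt

def toy_cost(a):
    # Hypothetical U-shaped cost along every chain, minimal at size 2.
    return (bin(a).count("1") - 2) ** 2

minimum, chain = walk_chain(0b0000, 4, toy_cost, random.Random(0))
```

Whatever random chain is drawn, the walk visits subsets of size 0, 1, 2 and 3, then stops and reports the size-2 element as the minimum of that chain.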

• Additional procedures
  – Minimum exhausting
    • Avoids more than one visit to the same candidate
    • Uses a stack
  – Pruning from an element E
    • Upper bound list: remove the elements U that contain E, and insert the elements reachable from U that do not contain E
    • Lower bound list: remove the elements L that are contained in E, and insert the elements reachable from L that are not contained in E
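
One possible reading of the two list updates, using bitmask subsets, is sketched below. It assumes that the inserted elements are the maximal subsets of U that drop one feature of E, and the minimal supersets of L that add one feature outside E. The slides do not spell this out, so treat it as an interpretation. Dominated entries are not filtered from the resulting lists.

```python
def prune_upper(upper, e):
    """Replace each upper-list element containing e by its subsets missing one bit of e."""
    kept = [u for u in upper if (u & e) != e]            # u does not contain e
    bits = [1 << i for i in range(e.bit_length()) if (e >> i) & 1]
    for u in upper:
        if (u & e) == e:                                 # u contains e
            kept.extend(u & ~b for b in bits)
    return sorted(set(kept))

def prune_lower(lower, e, n):
    """Replace each lower-list element contained in e by its supersets with one bit outside e."""
    kept = [low for low in lower if (low | e) != e]      # low is not inside e
    outside = [1 << i for i in range(n) if not ((e >> i) & 1)]
    for low in lower:
        if (low | e) == e:                               # low is contained in e
            kept.extend(low | b for b in outside)
    return sorted(set(kept))

new_upper = prune_upper([0b111], 0b011)      # [0b101, 0b110]
new_lower = prune_lower([0b000], 0b011, 3)   # [0b100]
```

After the update no element of the upper list contains E and no element of the lower list is contained in E, so chains started from either list cannot reach the pruned region.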

• Quantized microarray: samples $\{(x[0], y[0]), (x[1], y[1]), \ldots, (x[m], y[m])\}$ from $P(X, Y)$
• Quantized values: $x[t] = (x_1[t], x_2[t], \ldots, x_n[t])^T$, with $x_i[t] \in R = \{r_1, r_2, \ldots, r_l\}$
• Biological states: $y[t] \in K = \{1, \ldots, c\}$
• U-curve algorithm: $A \subset \{1, 2, \ldots, n\}$, $\psi : R^{|A|} \to K$

• Dynamical systems
  – State: vector $x$
  – Transition function $\phi$
  – $x[t+1] = \phi(x[t])$
• Stochastic process
  – Stochastic transition function
    • Next state: realization of a random vector
  – Ex: Markov chain $(\pi_{Y|X}, \pi_0)$
    • Discrete time, finite-size vector, finite domain
    • Random state sequence

$$\pi_{Y|X} = \begin{pmatrix} p_{1|1} & p_{2|1} & p_{3|1} & \cdots & p_{|R|^n|1} \\ p_{1|2} & p_{2|2} & p_{3|2} & \cdots & p_{|R|^n|2} \\ p_{1|3} & p_{2|3} & p_{3|3} & \cdots & p_{|R|^n|3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{1||R|^n} & p_{2||R|^n} & p_{3||R|^n} & \cdots & p_{|R|^n||R|^n} \end{pmatrix}, \qquad \pi_0 = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_{|R|^n} \end{pmatrix}$$

• Probabilistic Genetic Networks (PGN)
  – A Markov chain $(\pi_{Y|X}, \pi_0)$ with the following axioms:
    a. $\pi_{Y|X}$ is homogeneous, i.e. $p_{y|x}$ does not depend on $t$;
    b. $p_{y|x} > 0$, $\forall x, y \in R^n$;
    c. $\pi_{Y|X}$ is conditionally independent, that is, $\forall x, y \in R^n$, $p_{y|x} = \prod_{i=1}^{n} p(y_i \mid x)$;
    d. $\pi_{Y|X}$ is almost deterministic, that is, $\forall x \in R^n$ and $i \in N = \{1, \ldots, n\}$, there is $r \in R$ such that $p(y_i = r \mid x) \approx 1$;
    e. $\forall x \in R^n$, $\forall i \in N$, there is a sub-space of dimension $j$, $j \ll n$, such that $p(y_i \mid x) = p(y_i \mid \tilde{x})$, where $\tilde{x}$ is the projection of $x$ on this sub-space.
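
Axioms b to e can be illustrated on a tiny two-gene network. Everything below is a made-up example: the value set `R`, the `almost_deterministic` helper, the noise level `eps` and the two gene rules are assumptions chosen so that each gene depends on a single predictor and puts nearly all probability on one value.

```python
from itertools import product
from math import prod

R = (-1, 0, 1)   # assumed quantized expression levels

def almost_deterministic(target, eps=0.02):
    """Distribution on R with mass 1 - eps on `target` and the rest spread evenly."""
    return {r: (1 - eps if r == target else eps / (len(R) - 1)) for r in R}

def joint_transition(per_gene, x, y):
    """p(y|x) as the product over genes i of p(y_i|x), as in axiom c."""
    return prod(per_gene[i](x)[y_i] for i, y_i in enumerate(y))

# Hypothetical per-gene rules; each looks at only one coordinate of x (axiom e).
per_gene = [
    lambda x: almost_deterministic(x[1]),     # gene 0 follows gene 1
    lambda x: almost_deterministic(-x[0]),    # gene 1 opposes gene 0
]

x = (1, 0)
row = {y: joint_transition(per_gene, x, y) for y in product(R, repeat=2)}
```

`row` is one line of the transition matrix $\pi_{Y|X}$: its nine entries are all positive, they sum to 1, and the state (0, -1) receives 0.98 × 0.98 = 0.9604 of the mass.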

• Markov chain

$$\pi_{Y|X} = \begin{pmatrix} p_{1|1} & p_{2|1} & p_{3|1} & \cdots & p_{3^n|1} \\ p_{1|2} & p_{2|2} & p_{3|2} & \cdots & p_{3^n|2} \\ p_{1|3} & p_{2|3} & p_{3|3} & \cdots & p_{3^n|3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{1|3^n} & p_{2|3^n} & p_{3|3^n} & \cdots & p_{3^n|3^n} \end{pmatrix}$$

• Probabilistic Genetic Networks (PGN): $P_{X_1|X}, P_{X_2|X}, \ldots, P_{X_n|X}$, each almost deterministic

$$P_{X_i|X} = \begin{pmatrix} p_{r_1|1} & p_{r_2|1} & p_{r_3|1} & \cdots & p_{r_l|1} \\ p_{r_1|2} & p_{r_2|2} & p_{r_3|2} & \cdots & p_{r_l|2} \\ p_{r_1|3} & p_{r_2|3} & p_{r_3|3} & \cdots & p_{r_l|3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{r_1||R|^n} & p_{r_2||R|^n} & p_{r_3||R|^n} & \cdots & p_{r_l||R|^n} \end{pmatrix}$$

Time-Course Gene Expression Data
[Figure: expression versus time for Gene 1, Gene 2, Gene 3, ..., Gene n]
Expression Measurement Techniques: $x[1], x[2], x[3], \ldots, x[m-1], x[m]$

• $P(X, Y_j)$, $j = 1, \ldots, n$
• Samples: $\{(x[0], y_j[0]), (x[1], y_j[1]), \ldots, (x[m], y_j[m])\}$
• Quantized microarray at $t$: $x[t] = (x_1[t], x_2[t], \ldots, x_n[t])^T$, with quantized values $x_i[t] \in R = \{r_1, r_2, \ldots, r_l\}$
• Gene $j$ quantized expression at $t+1$: $y_j[t] = x_j[t+1]$
• U-curve algorithm: $A_j \subset \{1, 2, \ldots, n\}$, $\psi_j : R^{|A_j|} \to K$
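
The pairing $y_j[t] = x_j[t+1]$ turns a time course into one feature selection problem per gene. A minimal sketch, with a made-up three-gene series and a hypothetical helper name `training_pairs`:

```python
def training_pairs(series, j):
    """Pair each quantized state x[t] with gene j's value at time t + 1."""
    return [(series[t], series[t + 1][j]) for t in range(len(series) - 1)]

# Hypothetical quantized time course for n = 3 genes (one tuple per time point).
series = [(0, 1, -1), (1, 1, 0), (1, 0, 0), (0, 0, 1)]
pairs = training_pairs(series, 2)
```

A series of length T gives T - 1 pairs, since the last time point has no successor. Each gene's pairs can then be passed to the feature selection search to pick its predictor set $A_j$.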
