
SLIDE 1

CSCI1950-Z Computational Methods for Biology, Lecture 20

Ben Raphael, April 15, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline

  • Finish aCGH + HMM.
  • Introduction to networks.
SLIDE 2

CGH Analysis (1)

Divide genome into segments of equal copy number

[Figure: Log2(R/G) vs. genomic position; segments below zero (around -0.5) mark deletions, segments above zero (around +0.5) mark amplifications]

A model for CGH data

[Figure: an HMM with K states corresponding to copy numbers, each with Gaussian emissions (µk, σk):
  S1 = homozygous deletion (copy = 0), S2 = heterozygous deletion (copy = 1), S3 = normal (copy = 2), S4 = duplication (copy > 2).
  Lower panel: copy number vs. genome coordinate]
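To make the model concrete, here is a minimal Python sketch of such a copy-number HMM. All numeric values (means, standard deviations, stay-probability) are illustrative placeholders, not values from the lecture.

```python
import numpy as np

# Four copy-number states with Gaussian emissions on log2(R/G).
# All numeric values below are illustrative assumptions.
states = ["hom_del", "het_del", "normal", "dup"]  # copy = 0, 1, 2, >2
means  = np.array([-3.0, -1.0, 0.0, 0.58])        # example emission means
sds    = np.array([0.3, 0.3, 0.3, 0.3])           # example emission spreads

stay = 0.99                                       # favor long segments
A = np.full((4, 4), (1 - stay) / 3)               # transition matrix a_kl
np.fill_diagonal(A, stay)

def emission_logpdf(x, k):
    """log e_k(x): Gaussian density of log-ratio x in state k."""
    return (-0.5 * np.log(2 * np.pi * sds[k] ** 2)
            - (x - means[k]) ** 2 / (2 * sds[k] ** 2))
```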

SLIDE 3

Hidden Markov Models

[Figure: HMM trellis – states 1…K at each position, emitting x1, x2, x3, …; one state path (2, 1, …, K, 2) is highlighted]

Definition of a hidden Markov model

Definition: A hidden Markov model (HMM) consists of:

  • An alphabet Σ = { b1, b2, …, bM }
  • A set of states Q = { 1, ..., K }
  • Transition probabilities between any two states:
      aij = transition probability from state i to state j
      ai1 + … + aiK = 1, for all states i = 1…K
  • Start probabilities a0i, with a01 + … + a0K = 1
  • Emission probabilities within each state:
      ek(b) = P( xi = b | πi = k )
      ek(b1) + … + ek(bM) = 1, for all states k = 1…K

SLIDE 4

An HMM is memoryless

At each time step t, the only thing that affects future states is the current state πt:

  P(πt+1 = k | "whatever happened so far")
    = P(πt+1 = k | π1, π2, …, πt, x1, x2, …, xt)
    = P(πt+1 = k | πt)

A parse of a sequence

Given a sequence x = x1……xN, a parse of x is a sequence of states π = π1, ……, πN

[Figure: HMM trellis with one state path, e.g. (2, 1, …, K, 2), highlighted]

SLIDE 5

Likelihood of a Parse

Simply multiply all the orange arrows! (transition probs and emission probs)

  P(x, π) = a0π1 eπ1(x1) aπ1π2 eπ2(x2) ⋯ aπN-1πN eπN(xN)

[Figure: HMM trellis with the arrows along one path highlighted in orange]

The dishonest casino model

[Figure: two states, FAIR and LOADED, each with self-loop probability 0.95 and switch probability 0.05]

Emission probabilities:

  Fair die:   P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
  Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
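As a concrete reference, the casino model's parameters can be written down directly. This is a sketch in Python/NumPy; the uniform start probabilities are an assumption, since the slide does not give them.

```python
import numpy as np

states = ["F", "L"]                  # fair, loaded
a0 = np.array([0.5, 0.5])            # start probs a_0k (assumed uniform)
A  = np.array([[0.95, 0.05],         # a_FF, a_FL
               [0.05, 0.95]])        # a_LF, a_LL
E  = np.array([[1/6] * 6,            # e_F(1..6) = 1/6
               [1/10] * 5 + [1/2]])  # e_L(1..5) = 1/10, e_L(6) = 1/2
```

With these arrays, the algorithm sketches on the following slides can be run directly on a sequence of 0-based roll indices.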


SLIDE 6

Question #1 – Evaluation

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem in HMMs.

Prob = 1.3 x 10^-35

Question #2 – Decoding

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die? This is the DECODING question in HMMs. This is what we want to solve for CGH analysis.

FAIR LOADED FAIR

SLIDE 7

Question #3 – Learning

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: How "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back? This is the LEARNING question in HMMs.

Prob(6) = 64%

The three main questions on HMMs

  • 1. Decoding
      GIVEN: HMM M, and a sequence x
      FIND: sequence π of states that maximizes P[ x, π | M ]
  • 2. Evaluation
      GIVEN: HMM M, and a sequence x
      FIND: Prob[ x | M ]
  • 3. Learning
      GIVEN: HMM M, with unspecified transition/emission probs., and a sequence x
      FIND: parameters θ = (ei(.), aij) that maximize P[ x | θ ]

SLIDE 8

Problem 1: Decoding

Find the most likely parse of a sequence

Decoding

GIVEN x = x1x2……xN, find π = π1, ……, πN to maximize P[ x, π ]:

  π* = argmaxπ P[ x, π ]

This maximizes a0π1 eπ1(x1) aπ1π2 ⋯ aπN-1πN eπN(xN)

Dynamic programming!

  Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
        = probability of the most likely sequence of states ending at state πi = k

[Figure: HMM trellis, as before]

Given that we end up in state k at step i, maximize the product to the left and right.

SLIDE 9

The Viterbi Algorithm

Input: x = x1……xN

Initialization:
  V0(0) = 1   (0 is the imaginary first position)
  Vk(0) = 0, for all k > 0

Iteration:
  Vj(i) = ej(xi) × maxk akj Vk(i-1)
  Ptrj(i) = argmaxk akj Vk(i-1)

Termination:
  P(x, π*) = maxk Vk(N)

Traceback:
  πN* = argmaxk Vk(N)
  πi-1* = Ptrπi*(i)
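A minimal NumPy sketch of Viterbi, written in log space (the "sum of logs" underflow fix mentioned on a later slide). It folds the start probabilities a0 into position 1 rather than using the imaginary position 0, and assumes x is a sequence of 0-based symbol indices.

```python
import numpy as np

def viterbi(x, a0, A, E):
    """Most likely state path pi* and log P(x, pi*)."""
    K, N = A.shape[0], len(x)
    logV = np.full((K, N), -np.inf)
    ptr = np.zeros((K, N), dtype=int)
    logV[:, 0] = np.log(a0) + np.log(E[:, x[0]])      # start probs folded in
    for i in range(1, N):
        for j in range(K):
            scores = logV[:, i - 1] + np.log(A[:, j])  # V_k(i-1) + log a_kj
            ptr[j, i] = np.argmax(scores)
            logV[j, i] = np.log(E[j, x[i]]) + scores[ptr[j, i]]
    path = [int(np.argmax(logV[:, -1]))]               # best final state
    for i in range(N - 1, 0, -1):                      # traceback via ptr
        path.append(int(ptr[path[-1], i]))
    return path[::-1], float(logV[:, -1].max())
```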

Problem 2: Evaluation

Find the likelihood a sequence is generated by the model

SLIDE 10

Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows (see the sampling sketch below):

  • 1. Start at state π1 with probability a0π1
  • 2. Emit letter x1 with probability eπ1(x1)
  • 3. Go to state π2 with probability aπ1π2
  • 4. … until emitting xn

[Figure: HMM trellis emitting x1, x2, x3, …, xn; the start transition a02 and first emission e2(x1) are highlighted]
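The generation procedure above translates directly into a short sampling sketch, assuming the a0, A, E arrays from the casino example (states and symbols are 0-based indices).

```python
import numpy as np

def sample(n, a0, A, E, rng=None):
    """Generate a state path pi and emitted sequence x of length n."""
    if rng is None:
        rng = np.random.default_rng()
    K, M = E.shape
    pi, x = [], []
    state = rng.choice(K, p=a0)              # 1. start at pi_1 with prob a0
    for _ in range(n):
        pi.append(state)
        x.append(rng.choice(M, p=E[state]))  # 2. emit a symbol from e_state
        state = rng.choice(K, p=A[state])    # 3. move to the next state
    return pi, x                             # 4. ...until n symbols emitted
```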


A couple of questions

Given a sequence x,

  • What is the probability that x was generated by the model?
  • Given a position i, what is the most likely state that emitted xi?

Example: the dishonest casino

Say x = 12341…23162616364616234112…21341

Most likely path: π = FF……F
However: the marked (boxed) letters are more likely to be L than the unmarked letters.

P(box: FFFFFFFFFFF) = (1/6)^11 * 0.95^12 = 2.76x10^-9 * 0.54 = 1.49x10^-9
P(box: LLLLLLLLLLL) = [ (1/2)^6 * (1/10)^5 ] * 0.95^10 * 0.05^2 = 1.56x10^-7 * 1.5x10^-3 = 0.23x10^-9

SLIDE 11

Evaluation

We will develop algorithms that allow us to compute:

  P(x)           Probability of x given the model
  P(xi…xj)       Probability of a substring of x given the model
  P(πi = k | x)  Probability that the ith state is k, given x
                 (a more refined measure of which states x may be in)

The Forward Algorithm

We want to calculate P(x) = probability of x, given the HMM.

Sum over all possible ways of generating x:

  P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)

To avoid summing over an exponential number of paths π, define

  fk(i) = P(x1…xi, πi = k)    (the forward probability)

SLIDE 12

The Forward Algorithm – derivation

Define the forward probability:

  fk(i) = P(x1…xi, πi = k)
        = Σπ1…πi-1 P(x1…xi-1, π1, …, πi-1, πi = k) ek(xi)
        = Σl Σπ1…πi-2 P(x1…xi-1, π1, …, πi-2, πi-1 = l) alk ek(xi)
        = Σl P(x1…xi-1, πi-1 = l) alk ek(xi)
        = ek(xi) Σl fl(i-1) alk

The Forward Algorithm

We can compute fk(i) for all k, i, using dynamic programming!

Initialization:
  f0(0) = 1
  fk(0) = 0, for all k > 0

Iteration:
  fk(i) = ek(xi) Σl fl(i-1) alk

Termination:
  P(x) = Σk fk(N) ak0

where ak0 is the probability that the terminating state is k (usually = a0k)
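A sketch of the forward recursion with per-position rescaling (the underflow fix mentioned two slides ahead). For simplicity it ends without an explicit terminating state, i.e. it takes ak0 = 1, which differs slightly from the slide's termination step.

```python
import numpy as np

def forward(x, a0, A, E):
    """Rescaled forward table f and per-position scaling constants."""
    K, N = A.shape[0], len(x)
    f = np.zeros((K, N))
    scale = np.zeros(N)
    f[:, 0] = a0 * E[:, x[0]]                       # initialization
    scale[0] = f[:, 0].sum()
    f[:, 0] /= scale[0]
    for i in range(1, N):
        f[:, i] = E[:, x[i]] * (A.T @ f[:, i - 1])  # e_k(x_i) * sum_l f_l(i-1) a_lk
        scale[i] = f[:, i].sum()
        f[:, i] /= scale[i]
    return f, scale                                 # log P(x) = sum(log(scale))
```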

SLIDE 13

Relation between Forward and Viterbi

VITERBI
  Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
  Iteration:      Vj(i) = ej(xi) maxk Vk(i-1) akj
  Termination:    P(x, π*) = maxk Vk(N)

FORWARD
  Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
  Iteration:      fl(i) = el(xi) Σk fk(i-1) akl
  Termination:    P(x) = Σk fk(N) ak0

Motivation for the Backward Algorithm

We want to compute P(πi = k | x), the probability distribution of the ith position, given x.

We start by computing:

  P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
               = P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
               = P(x1…xi, πi = k) P(xi+1…xN | πi = k)

The first factor is the forward probability, fk(i); the second is the backward probability, bk(i).

Then, P(πi = k | x) = P(πi = k, x) / P(x)



SLIDE 14

The Backward Algorithm – derivation

Define the backward probability:

  bk(i) = P(xi+1…xN | πi = k)
        = Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
        = Σl Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1 = l, πi+2, …, πN | πi = k)
        = Σl el(xi+1) akl Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = l)
        = Σl el(xi+1) akl bl(i+1)

The Backward Algorithm

We can compute bk(i) for all k, i, using dynamic programming.

Initialization:
  bk(N) = ak0, for all k

Iteration:
  bk(i) = Σl el(xi+1) akl bl(i+1)

Termination:
  P(x) = Σl a0l el(x1) bl(1)
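A matching backward sketch, rescaled with the same constants as the forward sketch above so the two tables combine cleanly for posterior decoding (again assuming no explicit end state, i.e. bk(N) = 1).

```python
import numpy as np

def backward(x, A, E, scale):
    """Backward table b, rescaled with the forward pass's constants."""
    K, N = A.shape[0], len(x)
    b = np.zeros((K, N))
    b[:, -1] = 1.0                                    # b_k(N): no end state assumed
    for i in range(N - 2, -1, -1):
        b[:, i] = A @ (E[:, x[i + 1]] * b[:, i + 1])  # sum_l a_kl e_l(x_i+1) b_l(i+1)
        b[:, i] /= scale[i + 1]
    return b
```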

SLIDE 15

Computational Complexity

What is the running time and space required for Forward and Backward?

  Time: O(K^2 N)
  Space: O(KN)

where K = number of states and N = length of the sequence.

Useful implementation techniques to avoid underflow:
  Viterbi: sum of logs
  Forward/Backward: rescaling at each position by multiplying by a constant

Posterior Decoding

We can now calculate:

  P(πi = k | x) = fk(i) bk(i) / P(x)

Then we can ask: what is the most likely state at position i of sequence x?

Define π^ by posterior decoding:

  π^i = argmaxk P(πi = k | x)
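Combining the two sketches gives posterior decoding in a few lines. With the rescaling used above, fk(i)·bk(i) is already the posterior, so no division by P(x) is needed.

```python
import numpy as np

def posterior_decode(x, a0, A, E):
    """Most likely state at each position, plus the full posterior table."""
    f, scale = forward(x, a0, A, E)   # rescaled forward sketch above
    b = backward(x, A, E, scale)      # rescaled backward sketch above
    post = f * b                      # post[k, i] = P(pi_i = k | x)
    return post.argmax(axis=0), post  # pi_hat_i = argmax_k P(pi_i = k | x)
```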

SLIDE 16

Posterior Decoding

  • For each state,
    – posterior decoding gives us a curve of the likelihood of that state at each position
    – that is sometimes more informative than the Viterbi path π*
  • Posterior decoding may give an invalid sequence of states
    – Why?

Posterior Decoding

  • P(πi = k | x) = Σπ P(π | x) 1(πi = k) = Σ{π : πi = k} P(π | x)

where 1(ψ) = 1 if ψ is true, 0 otherwise.

[Figure: positions x1, x2, x3, ……, xN; for each state l, the posterior curve P(πi = l | x) across positions]

SLIDE 17

Viterbi, Forward, Backward

VITERBI
  Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
  Iteration:      Vl(i) = el(xi) maxk Vk(i-1) akl
  Termination:    P(x, π*) = maxk Vk(N)

FORWARD
  Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
  Iteration:      fl(i) = el(xi) Σk fk(i-1) akl
  Termination:    P(x) = Σk fk(N) ak0

BACKWARD
  Initialization: bk(N) = ak0, for all k
  Iteration:      bl(i) = Σk ek(xi+1) alk bk(i+1)
  Termination:    P(x) = Σk a0k ek(x1) bk(1)

Problem 3: Learning

Re-estimate the parameters of the model based on training data

SLIDE 18

Two learning scenarios

  • 1. Estimation when the "right answer" is known

    Examples:
    GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the copy numbers
    GIVEN: the casino player allows us to observe him one evening as he changes dice, producing 10,000 rolls

  • 2. Estimation when the "right answer" is unknown

    Examples:
    GIVEN: a tumor sample; we don't know the frequency and length of the segments
    GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

    QUESTION: Update the parameters θ of the model to maximize P(x|θ)

1. When the right answer is known

Given x = x1…xN for which the true π = π1…πN is known, define:

  Akl = # of times the k→l transition occurs in π
  Ek(b) = # of times state k in π emits b in x

We can show that the maximum likelihood parameters θ (maximizing P(x|θ)) are:

  akl = Akl / Σi Aki        ek(b) = Ek(b) / Σc Ek(c)
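A counting sketch of this estimator; the optional pseudocount arguments anticipate the fix on the slides after next.

```python
import numpy as np

def estimate_known_path(x, pi, K, M, r_trans=0.0, r_emit=0.0):
    """ML parameters from a known path, with optional pseudocounts."""
    A_counts = np.full((K, K), float(r_trans))  # A_kl, seeded with pseudocounts
    E_counts = np.full((K, M), float(r_emit))   # E_k(b), seeded with pseudocounts
    for i in range(len(x) - 1):
        A_counts[pi[i], pi[i + 1]] += 1         # count k -> l transitions in pi
    for i in range(len(x)):
        E_counts[pi[i], x[i]] += 1              # count emissions of b from state k
    a = A_counts / A_counts.sum(axis=1, keepdims=True)  # rows of unvisited states
    e = E_counts / E_counts.sum(axis=1, keepdims=True)  # need pseudocounts > 0
    return a, e
```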

SLIDE 19

1. When the right answer is known

Intuition: when we know the underlying states, the best estimate is the average frequency of transitions & emissions that occur in the training data.

Drawback: given little data, there may be overfitting: P(x|θ) is maximized, but θ is unreasonable. Zero probabilities – VERY BAD.

Example: given 10 casino rolls, we observe
  x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
  π = F, F, F, F, F, F, F, F, F, F
Then:
  aFF = 1; aFL = 0
  eF(1) = eF(3) = .2; eF(2) = .3; eF(4) = 0; eF(5) = eF(6) = .1

Pseudocounts

Solution for small training sets: add pseudocounts.

  Akl = # of times the k→l transition occurs in π, plus rkl
  Ek(b) = # of times state k in π emits b in x, plus rk(b)

rkl and rk(b) are pseudocounts representing our prior belief:
  larger pseudocounts ⇒ strong prior belief;
  small pseudocounts (ε < 1): just to avoid 0 probabilities.

SLIDE 20

Pseudocounts

Example: the dishonest casino. We will observe the player for one day, 600 rolls. Reasonable pseudocounts:

  r0F = r0L = rF0 = rL0 = 1
  rFL = rLF = rFF = rLL = 1
  rF(1) = rF(2) = … = rF(6) = 20   (strong belief fair is fair)
  rL(1) = rL(2) = … = rL(6) = 5    (wait and see for loaded)

The numbers above are pretty arbitrary – assigning priors is an art.

2. When the right answer is unknown

We don't know the true Akl, Ek(b).

Idea:
  • We estimate our "best guess" of what Akl and Ek(b) are
  • We update the parameters of the model based on our guess
  • We repeat
SLIDE 21

2. When the right answer is unknown

Starting with our best guess of a model M with parameters θ:

Given x = x1…xN for which the true π = π1…πN is unknown, we can get a provably more likely parameter set θ, i.e., a θ that increases the probability P(x | θ).

Principle: EXPECTATION MAXIMIZATION

  • 1. Estimate Akl, Ek(b) from the training data
  • 2. Update θ according to Akl, Ek(b)
  • 3. Repeat 1 & 2 until convergence

Estimating new parameters

To estimate Akl (conditioning on θCURRENT in all formulas below):

At each position i of sequence x, find the probability that transition k→l is used:

  P(πi = k, πi+1 = l | x) = [1/P(x)] × P(πi = k, πi+1 = l, x1…xN) = Q/P(x)

where
  Q = P(x1…xi, πi = k, πi+1 = l, xi+1…xN)
    = P(πi+1 = l, xi+1…xN | πi = k) P(x1…xi, πi = k)
    = P(πi+1 = l, xi+1 xi+2…xN | πi = k) fk(i)
    = P(xi+2…xN | πi+1 = l) P(xi+1 | πi+1 = l) P(πi+1 = l | πi = k) fk(i)
    = bl(i+1) el(xi+1) akl fk(i)

So:
  P(πi = k, πi+1 = l | x, θ) = fk(i) akl el(xi+1) bl(i+1) / P(x | θCURRENT)

SLIDE 22

Estimating new parameters

  • So Akl is E[# of times transition k→l is used, given current θ]:

      Akl = Σi P(πi = k, πi+1 = l | x, θ) = Σi fk(i) akl el(xi+1) bl(i+1) / P(x | θ)

  • Similarly,

      Ek(b) = [1/P(x | θ)] Σ{i : xi = b} fk(i) bk(i)

[Figure: sequence x1………xi-1 | xi | xi+1 | xi+2………xN, with state k at position i and state l at position i+1; fk(i), akl, el(xi+1), and bl(i+1) label the corresponding pieces]

The Baum‐Welch Algorithm

Initialization: pick the best-guess model parameters (or arbitrary ones)

Iteration:
  1. Forward
  2. Backward
  3. Calculate Akl, Ek(b), given θCURRENT
  4. Calculate the new model parameters θNEW: akl, ek(b)
  5. Calculate the new log-likelihood P(x | θNEW)
     (GUARANTEED TO BE HIGHER BY EXPECTATION MAXIMIZATION)

Until P(x | θ) does not change much.
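A compact Baum-Welch sketch built on the rescaled forward/backward sketches above (single training sequence; convergence tested on the change in log-likelihood).

```python
import numpy as np

def baum_welch(x, a0, A, E, n_iter=50, tol=1e-6):
    """EM re-estimation of A and E from one observed sequence x."""
    prev_ll = -np.inf
    for _ in range(n_iter):
        f, scale = forward(x, a0, A, E)      # 1. forward (rescaled sketch above)
        b = backward(x, A, E, scale)         # 2. backward
        ll = np.log(scale).sum()             # current log P(x | theta)
        # 3. expected counts A_kl and E_k(b) given theta_CURRENT
        A_counts = np.zeros_like(A)
        for i in range(len(x) - 1):
            xi = f[:, i, None] * A * (E[:, x[i + 1]] * b[:, i + 1])[None, :]
            A_counts += xi / scale[i + 1]    # P(pi_i = k, pi_i+1 = l | x)
        gamma = f * b                        # P(pi_i = k | x)
        E_counts = np.zeros_like(E)
        for i in range(len(x)):
            E_counts[:, x[i]] += gamma[:, i]
        # 4. new parameters theta_NEW
        A = A_counts / A_counts.sum(axis=1, keepdims=True)
        E = E_counts / E_counts.sum(axis=1, keepdims=True)
        if ll - prev_ll < tol:               # 5. stop when P(x | theta) stabilizes
            break
        prev_ll = ll
    return A, E, ll
```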

SLIDE 23

The Baum‐Welch Algorithm

Time complexity: # iterations × O(K^2 N)

  • Guaranteed to increase the log-likelihood P(x | θ)
  • Not guaranteed to find the globally best parameters:
    converges to a local optimum, depending on initial conditions
  • Too many parameters / too large a model: overtraining

Alternative: Viterbi Training

Initialization: same

Iteration:
  1. Perform Viterbi to find π*
  2. Calculate Akl, Ek(b) according to π* + pseudocounts
  3. Calculate the new parameters akl, ek(b)

Until convergence.

Notes:
  – Not guaranteed to increase P(x | θ)
  – Guaranteed to increase P(x, π* | θ)
  – In general, worse performance than Baum-Welch

SLIDE 24

A model for CGH data

[Figure: the same copy-number HMM as before – K states with Gaussian emissions (µk, σk): homozygous deletion (copy = 0), heterozygous deletion (copy = 1), normal (copy = 2), duplication (copy > 2); below, copy number vs. genome coordinate]

Use the Viterbi algorithm to derive the segmentation. Use forward/backward to compute the state of a single probe.

CGH Segmentation: Model Selection

How many copy-number states K? Larger K means:
  1. a better fit to the observed data, but
  2. more parameters to estimate.
Avoid overfitting by model selection.

Let θ = (A, B, π) be the parameters of the HMM. Try each K = 1, …, Kmax:
  Compute L(θ | O) by dynamic programming (forward-backward algorithm)
  Calculate: ψ(K) = -log( L(θ | O) ) + qK D(N)/N
where
  N = number of probes (data points)
  qK = number of parameters
  D(N) = 2 (AIC) or D(N) = log(N) (BIC)
Choose K = argminK ψ(K)
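A sketch of this model-selection loop, scoring each K with ψ(K) exactly as defined above. Here `fit_hmm` is a hypothetical helper (e.g. Baum-Welch on a K-state model) that returns the maximized log-likelihood and the parameter count qK.

```python
import numpy as np

def choose_K(O, K_max, fit_hmm, criterion="BIC"):
    """Score K = 1..K_max with psi(K) = -log L + q_K * D(N) / N."""
    N = len(O)                                    # number of probes
    D = np.log(N) if criterion == "BIC" else 2.0  # BIC or AIC penalty
    psi = {}
    for K in range(1, K_max + 1):
        loglik, q_K = fit_hmm(O, K)               # hypothetical training helper
        psi[K] = -loglik + q_K * D / N            # penalized neg. log-likelihood
    return min(psi, key=psi.get), psi             # argmin_K psi(K)
```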

SLIDE 25

Problems with HMM model

The length of a sequence emitted from a fixed state is geometrically distributed:

  P(n consecutive steps in state j) = P(πt+1 = j | πt = j)^n

For CGH this means that:
  1) the length of aberrant intervals, and
  2) the separation between two intervals of the same copy number
will be geometrically distributed.

CGH Segmentation: Transitions

[Figure: two states X and Y; X has self-transition probability p and moves to Y with probability 1-p; Y has self-transition probability q and moves to X with probability 1-q]

Let lX = length of a run of states in X.

  • P[lX = 1] = 1-p
  • P[lX = 2] = p(1-p)
  • P[lX = k] = p^(k-1) (1-p)
  • E[lX] = 1/(1-p)
  • Geometric distribution, with mean 1/(1-p)
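A quick simulation confirming the claim: NumPy's geometric(1-p) counts the trials up to and including the first "exit", which matches P[lX = k] = p^(k-1)(1-p).

```python
import numpy as np

p = 0.95                                       # probability of staying in state X
rng = np.random.default_rng(0)
lengths = rng.geometric(1 - p, size=100_000)   # sampled run lengths l_X
print(lengths.mean(), 1 / (1 - p))             # both come out near 20
```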

SLIDE 26

Networks

Biological Interaction Networks

Many types:
  • Protein-DNA (regulatory)
  • Protein-metabolite (metabolic)
  • Protein-protein (signaling)
  • RNA-RNA (regulatory)
  • Genetic interactions (gene knockouts)

SLIDE 27

Remaining Lectures

  • Biology of cellular interaction networks
  • Network Alignment
  • Network Motifs
  • Network Integration

Metabolic Networks

Nodes = reactants
Edges = reactions, labeled by the enzyme (protein) that catalyzes the reaction

SLIDE 28

Regulatory Networks

[Figure: three genes A, B, C; A "activates" B and A "represses" C]

Nodes = genes
Edges = regulatory interactions

Protein-DNA interaction network

Regulatory Networks

SLIDE 29

Signaling Networks

Sources

  • Fridlyand et al. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
  • http://ai.stanford.edu/~serafim/CS262_2006/ (HMM slides)