
SLIDE 1

CSCI1950-Z Computational Methods for Biology, Lecture 20

Ben Raphael, April 15, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline

  • Finish aCGH + HMM.
  • Introduction to networks.
SLIDE 2

CGH Analysis (1)

Divide genome into segments of equal copy number

[Figure: Log2(R/G) vs. genomic position; segments below zero (around -0.5) mark deletions, segments above zero (around +0.5) mark amplifications]

A model for CGH data

[Figure: an HMM with K states corresponding to copy numbers, each with Gaussian emissions (µk, σk):
  S1 = homozygous deletion (copy = 0), S2 = heterozygous deletion (copy = 1), S3 = normal (copy = 2), S4 = duplication (copy > 2).
  Lower panel: copy number vs. genome coordinate]
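To make the model concrete, here is a minimal Python sketch of such a copy-number HMM. All numeric values (means, standard deviations, stay-probability) are illustrative placeholders, not values from the lecture.

```python
import numpy as np

# Four copy-number states with Gaussian emissions on log2(R/G).
# All numeric values below are illustrative assumptions.
states = ["hom_del", "het_del", "normal", "dup"]  # copy = 0, 1, 2, >2
means  = np.array([-3.0, -1.0, 0.0, 0.58])        # example emission means
sds    = np.array([0.3, 0.3, 0.3, 0.3])           # example emission spreads

stay = 0.99                                       # favor long segments
A = np.full((4, 4), (1 - stay) / 3)               # transition matrix a_kl
np.fill_diagonal(A, stay)

def emission_logpdf(x, k):
    """log e_k(x): Gaussian density of log-ratio x in state k."""
    return (-0.5 * np.log(2 * np.pi * sds[k] ** 2)
            - (x - means[k]) ** 2 / (2 * sds[k] ** 2))
```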

SLIDE 3

Hidden Markov Models

[Figure: HMM trellis – states 1…K at each position, emitting x1, x2, x3, …; one state path (2, 1, …, K, 2) is highlighted]

Definition of a hidden Markov model

Definition: A hidden Markov model (HMM) consists of:

  • An alphabet Σ = { b1, b2, …, bM }
  • A set of states Q = { 1, ..., K }
  • Transition probabilities between any two states:
      aij = transition probability from state i to state j
      ai1 + … + aiK = 1, for all states i = 1…K
  • Start probabilities a0i, with a01 + … + a0K = 1
  • Emission probabilities within each state:
      ek(b) = P( xi = b | πi = k )
      ek(b1) + … + ek(bM) = 1, for all states k = 1…K

SLIDE 4

An HMM is memoryless

At each time step t, the only thing that affects future states is the current state πt:

  P(πt+1 = k | "whatever happened so far")
    = P(πt+1 = k | π1, π2, …, πt, x1, x2, …, xt)
    = P(πt+1 = k | πt)

A parse of a sequence

Given a sequence x = x1……xN, a parse of x is a sequence of states π = π1, ……, πN

[Figure: HMM trellis with one state path, e.g. (2, 1, …, K, 2), highlighted]

SLIDE 5

Likelihood of a Parse

Simply multiply all the orange arrows! (transition probs and emission probs)

  P(x, π) = a0π1 eπ1(x1) aπ1π2 eπ2(x2) ⋯ aπN-1πN eπN(xN)

[Figure: HMM trellis with the arrows along one path highlighted in orange]

The dishonest casino model

[Figure: two states, FAIR and LOADED, each with self-loop probability 0.95 and switch probability 0.05]

Emission probabilities:

  Fair die:   P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
  Loaded die: P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
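As a concrete reference, the casino model's parameters can be written down directly. This is a sketch in Python/NumPy; the uniform start probabilities are an assumption, since the slide does not give them.

```python
import numpy as np

states = ["F", "L"]                  # fair, loaded
a0 = np.array([0.5, 0.5])            # start probs a_0k (assumed uniform)
A  = np.array([[0.95, 0.05],         # a_FF, a_FL
               [0.05, 0.95]])        # a_LF, a_LL
E  = np.array([[1/6] * 6,            # e_F(1..6) = 1/6
               [1/10] * 5 + [1/2]])  # e_L(1..5) = 1/10, e_L(6) = 1/2
```

With these arrays, the algorithm sketches on the following slides can be run directly on a sequence of 0-based roll indices.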


SLIDE 6

Question #1 – Evaluation

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: How likely is this sequence, given our model of how the casino works? This is the EVALUATION problem in HMMs.

Prob = 1.3 x 10^-35

Question #2 – Decoding

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: What portion of the sequence was generated with the fair die, and what portion with the loaded die? This is the DECODING question in HMMs. This is what we want to solve for CGH analysis.

FAIR LOADED FAIR

SLIDE 7

Question #3 – Learning

GIVEN A sequence of rolls by the casino player

1245526462146146136136661664661636616366163616515615115146123562344

QUESTION: How "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back? This is the LEARNING question in HMMs.

Prob(6) = 64%

The three main questions on HMMs

  • 1. Decoding
      GIVEN: HMM M, and a sequence x
      FIND: sequence π of states that maximizes P[ x, π | M ]
  • 2. Evaluation
      GIVEN: HMM M, and a sequence x
      FIND: Prob[ x | M ]
  • 3. Learning
      GIVEN: HMM M, with unspecified transition/emission probs., and a sequence x
      FIND: parameters θ = (ei(.), aij) that maximize P[ x | θ ]

SLIDE 8

Problem 1: Decoding

Find the most likely parse of a sequence

Decoding

GIVEN x = x1x2……xN, find π = π1, ……, πN to maximize P[ x, π ]:

  π* = argmaxπ P[ x, π ]

This maximizes a0π1 eπ1(x1) aπ1π2 ⋯ aπN-1πN eπN(xN)

Dynamic programming!

  Vk(i) = max{π1,…,πi-1} P[x1…xi-1, π1, …, πi-1, xi, πi = k]
        = probability of the most likely sequence of states ending at state πi = k

[Figure: HMM trellis, as before]

Given that we end up in state k at step i, maximize the product to the left and right.

SLIDE 9

The Viterbi Algorithm

Input: x = x1……xN

Initialization:
  V0(0) = 1   (0 is the imaginary first position)
  Vk(0) = 0, for all k > 0

Iteration:
  Vj(i) = ej(xi) × maxk akj Vk(i-1)
  Ptrj(i) = argmaxk akj Vk(i-1)

Termination:
  P(x, π*) = maxk Vk(N)

Traceback:
  πN* = argmaxk Vk(N)
  πi-1* = Ptrπi*(i)
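A minimal NumPy sketch of Viterbi, written in log space (the "sum of logs" underflow fix mentioned on a later slide). It folds the start probabilities a0 into position 1 rather than using the imaginary position 0, and assumes x is a sequence of 0-based symbol indices.

```python
import numpy as np

def viterbi(x, a0, A, E):
    """Most likely state path pi* and log P(x, pi*)."""
    K, N = A.shape[0], len(x)
    logV = np.full((K, N), -np.inf)
    ptr = np.zeros((K, N), dtype=int)
    logV[:, 0] = np.log(a0) + np.log(E[:, x[0]])      # start probs folded in
    for i in range(1, N):
        for j in range(K):
            scores = logV[:, i - 1] + np.log(A[:, j])  # V_k(i-1) + log a_kj
            ptr[j, i] = np.argmax(scores)
            logV[j, i] = np.log(E[j, x[i]]) + scores[ptr[j, i]]
    path = [int(np.argmax(logV[:, -1]))]               # best final state
    for i in range(N - 1, 0, -1):                      # traceback via ptr
        path.append(int(ptr[path[-1], i]))
    return path[::-1], float(logV[:, -1].max())
```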

Problem 2: Evaluation

Find the likelihood a sequence is generated by the model

SLIDE 10

Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows (see the sampling sketch below):

  • 1. Start at state π1 with probability a0π1
  • 2. Emit letter x1 with probability eπ1(x1)
  • 3. Go to state π2 with probability aπ1π2
  • 4. … until emitting xn

[Figure: HMM trellis emitting x1, x2, x3, …, xn; the start transition a02 and first emission e2(x1) are highlighted]
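The generation procedure above translates directly into a short sampling sketch, assuming the a0, A, E arrays from the casino example (states and symbols are 0-based indices).

```python
import numpy as np

def sample(n, a0, A, E, rng=None):
    """Generate a state path pi and emitted sequence x of length n."""
    if rng is None:
        rng = np.random.default_rng()
    K, M = E.shape
    pi, x = [], []
    state = rng.choice(K, p=a0)              # 1. start at pi_1 with prob a0
    for _ in range(n):
        pi.append(state)
        x.append(rng.choice(M, p=E[state]))  # 2. emit a symbol from e_state
        state = rng.choice(K, p=A[state])    # 3. move to the next state
    return pi, x                             # 4. ...until n symbols emitted
```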


A couple of questions

Given a sequence x,

  • What is the probability that x was generated by the model?
  • Given a position i, what is the most likely state that emitted xi?

Example: the dishonest casino

Say x = 12341…23162616364616234112…21341

Most likely path: π = FF……F
However: the marked (boxed) letters are more likely to be L than the unmarked letters.

P(box: FFFFFFFFFFF) = (1/6)^11 * 0.95^12 = 2.76x10^-9 * 0.54 = 1.49x10^-9
P(box: LLLLLLLLLLL) = [ (1/2)^6 * (1/10)^5 ] * 0.95^10 * 0.05^2 = 1.56x10^-7 * 1.5x10^-3 = 0.23x10^-9

SLIDE 11

Evaluation

We will develop algorithms that allow us to compute:

  P(x)           Probability of x given the model
  P(xi…xj)       Probability of a substring of x given the model
  P(πi = k | x)  Probability that the ith state is k, given x
                 (a more refined measure of which states x may be in)

The Forward Algorithm

We want to calculate P(x) = probability of x, given the HMM.

Sum over all possible ways of generating x:

  P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)

To avoid summing over an exponential number of paths π, define

  fk(i) = P(x1…xi, πi = k)    (the forward probability)

SLIDE 12

The Forward Algorithm – derivation

Define the forward probability:

  fk(i) = P(x1…xi, πi = k)
        = Σπ1…πi-1 P(x1…xi-1, π1, …, πi-1, πi = k) ek(xi)
        = Σl Σπ1…πi-2 P(x1…xi-1, π1, …, πi-2, πi-1 = l) alk ek(xi)
        = Σl P(x1…xi-1, πi-1 = l) alk ek(xi)
        = ek(xi) Σl fl(i-1) alk

The Forward Algorithm

We can compute fk(i) for all k, i, using dynamic programming!

Initialization:
  f0(0) = 1
  fk(0) = 0, for all k > 0

Iteration:
  fk(i) = ek(xi) Σl fl(i-1) alk

Termination:
  P(x) = Σk fk(N) ak0

where ak0 is the probability that the terminating state is k (usually = a0k)
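A sketch of the forward recursion with per-position rescaling (the underflow fix mentioned two slides ahead). For simplicity it ends without an explicit terminating state, i.e. it takes ak0 = 1, which differs slightly from the slide's termination step.

```python
import numpy as np

def forward(x, a0, A, E):
    """Rescaled forward table f and per-position scaling constants."""
    K, N = A.shape[0], len(x)
    f = np.zeros((K, N))
    scale = np.zeros(N)
    f[:, 0] = a0 * E[:, x[0]]                       # initialization
    scale[0] = f[:, 0].sum()
    f[:, 0] /= scale[0]
    for i in range(1, N):
        f[:, i] = E[:, x[i]] * (A.T @ f[:, i - 1])  # e_k(x_i) * sum_l f_l(i-1) a_lk
        scale[i] = f[:, i].sum()
        f[:, i] /= scale[i]
    return f, scale                                 # log P(x) = sum(log(scale))
```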

SLIDE 13

Relation between Forward and Viterbi

VITERBI
  Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
  Iteration:      Vj(i) = ej(xi) maxk Vk(i-1) akj
  Termination:    P(x, π*) = maxk Vk(N)

FORWARD
  Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
  Iteration:      fl(i) = el(xi) Σk fk(i-1) akl
  Termination:    P(x) = Σk fk(N) ak0

Motivation for the Backward Algorithm

We want to compute P(πi = k | x), the probability distribution of the ith position, given x.

We start by computing:

  P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
               = P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
               = P(x1…xi, πi = k) P(xi+1…xN | πi = k)

The first factor is the forward probability, fk(i); the second is the backward probability, bk(i).

Then, P(πi = k | x) = P(πi = k, x) / P(x)



SLIDE 14

The Backward Algorithm – derivation

Define the backward probability:

  bk(i) = P(xi+1…xN | πi = k)
        = Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
        = Σl Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1 = l, πi+2, …, πN | πi = k)
        = Σl el(xi+1) akl Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = l)
        = Σl el(xi+1) akl bl(i+1)

The Backward Algorithm

We can compute bk(i) for all k, i, using dynamic programming.

Initialization:
  bk(N) = ak0, for all k

Iteration:
  bk(i) = Σl el(xi+1) akl bl(i+1)

Termination:
  P(x) = Σl a0l el(x1) bl(1)
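A matching backward sketch, rescaled with the same constants as the forward sketch above so the two tables combine cleanly for posterior decoding (again assuming no explicit end state, i.e. bk(N) = 1).

```python
import numpy as np

def backward(x, A, E, scale):
    """Backward table b, rescaled with the forward pass's constants."""
    K, N = A.shape[0], len(x)
    b = np.zeros((K, N))
    b[:, -1] = 1.0                                    # b_k(N): no end state assumed
    for i in range(N - 2, -1, -1):
        b[:, i] = A @ (E[:, x[i + 1]] * b[:, i + 1])  # sum_l a_kl e_l(x_i+1) b_l(i+1)
        b[:, i] /= scale[i + 1]
    return b
```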

SLIDE 15

Computational Complexity

What is the running time and space required for Forward and Backward?

  Time: O(K^2 N)
  Space: O(KN)

where K = number of states and N = length of the sequence.

Useful implementation techniques to avoid underflow:
  Viterbi: sum of logs
  Forward/Backward: rescaling at each position by multiplying by a constant

Posterior Decoding

We can now calculate:

  P(πi = k | x) = fk(i) bk(i) / P(x)

Then we can ask: what is the most likely state at position i of sequence x?

Define π^ by posterior decoding:

  π^i = argmaxk P(πi = k | x)
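Combining the two sketches gives posterior decoding in a few lines. With the rescaling used above, fk(i)·bk(i) is already the posterior, so no division by P(x) is needed.

```python
import numpy as np

def posterior_decode(x, a0, A, E):
    """Most likely state at each position, plus the full posterior table."""
    f, scale = forward(x, a0, A, E)   # rescaled forward sketch above
    b = backward(x, A, E, scale)      # rescaled backward sketch above
    post = f * b                      # post[k, i] = P(pi_i = k | x)
    return post.argmax(axis=0), post  # pi_hat_i = argmax_k P(pi_i = k | x)
```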

SLIDE 16

Posterior Decoding

  • For each state,
    – posterior decoding gives us a curve of the likelihood of that state at each position
    – that is sometimes more informative than the Viterbi path π*
  • Posterior decoding may give an invalid sequence of states
    – Why?

Posterior Decoding

  • P(πi = k | x) = Σπ P(π | x) 1(πi = k) = Σ{π : πi = k} P(π | x)

where 1(ψ) = 1 if ψ is true, 0 otherwise.

[Figure: positions x1, x2, x3, ……, xN; for each state l, the posterior curve P(πi = l | x) across positions]

SLIDE 17

Viterbi, Forward, Backward

VITERBI
  Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
  Iteration:      Vl(i) = el(xi) maxk Vk(i-1) akl
  Termination:    P(x, π*) = maxk Vk(N)

FORWARD
  Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
  Iteration:      fl(i) = el(xi) Σk fk(i-1) akl
  Termination:    P(x) = Σk fk(N) ak0

BACKWARD
  Initialization: bk(N) = ak0, for all k
  Iteration:      bl(i) = Σk ek(xi+1) alk bk(i+1)
  Termination:    P(x) = Σk a0k ek(x1) bk(1)

Problem 3: Learning

Re-estimate the parameters of the model based on training data

SLIDE 18

Two learning scenarios

  • 1. Estimation when the "right answer" is known

    Examples:
    GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the copy numbers
    GIVEN: the casino player allows us to observe him one evening as he changes dice, producing 10,000 rolls

  • 2. Estimation when the "right answer" is unknown

    Examples:
    GIVEN: a tumor sample; we don't know the frequency and length of the segments
    GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

    QUESTION: Update the parameters θ of the model to maximize P(x|θ)

1. When the right answer is known

Given x = x1…xN for which the true π = π1…πN is known, define:

  Akl = # of times the k→l transition occurs in π
  Ek(b) = # of times state k in π emits b in x

We can show that the maximum likelihood parameters θ (maximizing P(x|θ)) are:

  akl = Akl / Σi Aki        ek(b) = Ek(b) / Σc Ek(c)
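A counting sketch of this estimator; the optional pseudocount arguments anticipate the fix on the slides after next.

```python
import numpy as np

def estimate_known_path(x, pi, K, M, r_trans=0.0, r_emit=0.0):
    """ML parameters from a known path, with optional pseudocounts."""
    A_counts = np.full((K, K), float(r_trans))  # A_kl, seeded with pseudocounts
    E_counts = np.full((K, M), float(r_emit))   # E_k(b), seeded with pseudocounts
    for i in range(len(x) - 1):
        A_counts[pi[i], pi[i + 1]] += 1         # count k -> l transitions in pi
    for i in range(len(x)):
        E_counts[pi[i], x[i]] += 1              # count emissions of b from state k
    a = A_counts / A_counts.sum(axis=1, keepdims=True)  # rows of unvisited states
    e = E_counts / E_counts.sum(axis=1, keepdims=True)  # need pseudocounts > 0
    return a, e
```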

SLIDE 19

1. When the right answer is known

Intuition: when we know the underlying states, the best estimate is the average frequency of transitions & emissions that occur in the training data.

Drawback: given little data, there may be overfitting: P(x|θ) is maximized, but θ is unreasonable. Zero probabilities – VERY BAD.

Example: given 10 casino rolls, we observe
  x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
  π = F, F, F, F, F, F, F, F, F, F
Then:
  aFF = 1; aFL = 0
  eF(1) = eF(3) = .2; eF(2) = .3; eF(4) = 0; eF(5) = eF(6) = .1

Pseudocounts

Solution for small training sets: add pseudocounts.

  Akl = # of times the k→l transition occurs in π, plus rkl
  Ek(b) = # of times state k in π emits b in x, plus rk(b)

rkl and rk(b) are pseudocounts representing our prior belief:
  larger pseudocounts ⇒ strong prior belief;
  small pseudocounts (ε < 1): just to avoid 0 probabilities.

SLIDE 20

Pseudocounts

Example: the dishonest casino. We will observe the player for one day, 600 rolls. Reasonable pseudocounts:

  r0F = r0L = rF0 = rL0 = 1
  rFL = rLF = rFF = rLL = 1
  rF(1) = rF(2) = … = rF(6) = 20   (strong belief fair is fair)
  rL(1) = rL(2) = … = rL(6) = 5    (wait and see for loaded)

The numbers above are pretty arbitrary – assigning priors is an art.

2. When the right answer is unknown

We don't know the true Akl, Ek(b).

Idea:
  • We estimate our "best guess" of what Akl and Ek(b) are
  • We update the parameters of the model based on our guess
  • We repeat
SLIDE 21

2. When the right answer is unknown

Starting with our best guess of a model M with parameters θ:

Given x = x1…xN for which the true π = π1…πN is unknown, we can get a provably more likely parameter set θ, i.e., a θ that increases the probability P(x | θ).

Principle: EXPECTATION MAXIMIZATION

  • 1. Estimate Akl, Ek(b) from the training data
  • 2. Update θ according to Akl, Ek(b)
  • 3. Repeat 1 & 2 until convergence

Estimating new parameters

To estimate Akl (conditioning on θCURRENT in all formulas below):

At each position i of sequence x, find the probability that transition k→l is used:

  P(πi = k, πi+1 = l | x) = [1/P(x)] × P(πi = k, πi+1 = l, x1…xN) = Q/P(x)

where
  Q = P(x1…xi, πi = k, πi+1 = l, xi+1…xN)
    = P(πi+1 = l, xi+1…xN | πi = k) P(x1…xi, πi = k)
    = P(πi+1 = l, xi+1 xi+2…xN | πi = k) fk(i)
    = P(xi+2…xN | πi+1 = l) P(xi+1 | πi+1 = l) P(πi+1 = l | πi = k) fk(i)
    = bl(i+1) el(xi+1) akl fk(i)

So:
  P(πi = k, πi+1 = l | x, θ) = fk(i) akl el(xi+1) bl(i+1) / P(x | θCURRENT)

SLIDE 22

Estimating new parameters

  • So Akl is E[# of times transition k→l is used, given current θ]:

      Akl = Σi P(πi = k, πi+1 = l | x, θ) = Σi fk(i) akl el(xi+1) bl(i+1) / P(x | θ)

  • Similarly,

      Ek(b) = [1/P(x | θ)] Σ{i : xi = b} fk(i) bk(i)

[Figure: sequence x1………xi-1 | xi | xi+1 | xi+2………xN, with state k at position i and state l at position i+1; fk(i), akl, el(xi+1), and bl(i+1) label the corresponding pieces]

The Baum‐Welch Algorithm

Initialization: pick the best-guess model parameters (or arbitrary ones)

Iteration:
  1. Forward
  2. Backward
  3. Calculate Akl, Ek(b), given θCURRENT
  4. Calculate the new model parameters θNEW: akl, ek(b)
  5. Calculate the new log-likelihood P(x | θNEW)
     (GUARANTEED TO BE HIGHER BY EXPECTATION MAXIMIZATION)

Until P(x | θ) does not change much.
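A compact Baum-Welch sketch built on the rescaled forward/backward sketches above (single training sequence; convergence tested on the change in log-likelihood).

```python
import numpy as np

def baum_welch(x, a0, A, E, n_iter=50, tol=1e-6):
    """EM re-estimation of A and E from one observed sequence x."""
    prev_ll = -np.inf
    for _ in range(n_iter):
        f, scale = forward(x, a0, A, E)      # 1. forward (rescaled sketch above)
        b = backward(x, A, E, scale)         # 2. backward
        ll = np.log(scale).sum()             # current log P(x | theta)
        # 3. expected counts A_kl and E_k(b) given theta_CURRENT
        A_counts = np.zeros_like(A)
        for i in range(len(x) - 1):
            xi = f[:, i, None] * A * (E[:, x[i + 1]] * b[:, i + 1])[None, :]
            A_counts += xi / scale[i + 1]    # P(pi_i = k, pi_i+1 = l | x)
        gamma = f * b                        # P(pi_i = k | x)
        E_counts = np.zeros_like(E)
        for i in range(len(x)):
            E_counts[:, x[i]] += gamma[:, i]
        # 4. new parameters theta_NEW
        A = A_counts / A_counts.sum(axis=1, keepdims=True)
        E = E_counts / E_counts.sum(axis=1, keepdims=True)
        if ll - prev_ll < tol:               # 5. stop when P(x | theta) stabilizes
            break
        prev_ll = ll
    return A, E, ll
```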

SLIDE 23

The Baum‐Welch Algorithm

Time complexity: # iterations × O(K^2 N)

  • Guaranteed to increase the log-likelihood P(x | θ)
  • Not guaranteed to find the globally best parameters:
    converges to a local optimum, depending on initial conditions
  • Too many parameters / too large a model: overtraining

Alternative: Viterbi Training

Initialization: same

Iteration:
  1. Perform Viterbi to find π*
  2. Calculate Akl, Ek(b) according to π* + pseudocounts
  3. Calculate the new parameters akl, ek(b)

Until convergence.

Notes:
  – Not guaranteed to increase P(x | θ)
  – Guaranteed to increase P(x, π* | θ)
  – In general, worse performance than Baum-Welch

SLIDE 24

A model for CGH data

[Figure: the same copy-number HMM as before – K states with Gaussian emissions (µk, σk): homozygous deletion (copy = 0), heterozygous deletion (copy = 1), normal (copy = 2), duplication (copy > 2); below, copy number vs. genome coordinate]

Use the Viterbi algorithm to derive the segmentation. Use forward/backward to compute the state of a single probe.

CGH Segmentation: Model Selection

How many copy-number states K? Larger K means:
  1. a better fit to the observed data, but
  2. more parameters to estimate.
Avoid overfitting by model selection.

Let θ = (A, B, π) be the parameters of the HMM. Try each K = 1, …, Kmax:
  Compute L(θ | O) by dynamic programming (forward-backward algorithm)
  Calculate: ψ(K) = -log( L(θ | O) ) + qK D(N)/N
where
  N = number of probes (data points)
  qK = number of parameters
  D(N) = 2 (AIC) or D(N) = log(N) (BIC)
Choose K = argminK ψ(K)
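A sketch of this model-selection loop, scoring each K with ψ(K) exactly as defined above. Here `fit_hmm` is a hypothetical helper (e.g. Baum-Welch on a K-state model) that returns the maximized log-likelihood and the parameter count qK.

```python
import numpy as np

def choose_K(O, K_max, fit_hmm, criterion="BIC"):
    """Score K = 1..K_max with psi(K) = -log L + q_K * D(N) / N."""
    N = len(O)                                    # number of probes
    D = np.log(N) if criterion == "BIC" else 2.0  # BIC or AIC penalty
    psi = {}
    for K in range(1, K_max + 1):
        loglik, q_K = fit_hmm(O, K)               # hypothetical training helper
        psi[K] = -loglik + q_K * D / N            # penalized neg. log-likelihood
    return min(psi, key=psi.get), psi             # argmin_K psi(K)
```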

SLIDE 25

Problems with HMM model

The length of a sequence emitted from a fixed state is geometrically distributed:

  P(n consecutive steps in state j) = P(πt+1 = j | πt = j)^n

For CGH this means that:
  1) the length of aberrant intervals, and
  2) the separation between two intervals of the same copy number
will be geometrically distributed.

CGH Segmentation: Transitions

[Figure: two states X and Y; X has self-transition probability p and moves to Y with probability 1-p; Y has self-transition probability q and moves to X with probability 1-q]

Let lX = length of a run of states in X.

  • P[lX = 1] = 1-p
  • P[lX = 2] = p(1-p)
  • P[lX = k] = p^(k-1) (1-p)
  • E[lX] = 1/(1-p)
  • Geometric distribution, with mean 1/(1-p)
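A quick simulation confirming the claim: NumPy's geometric(1-p) counts the trials up to and including the first "exit", which matches P[lX = k] = p^(k-1)(1-p).

```python
import numpy as np

p = 0.95                                       # probability of staying in state X
rng = np.random.default_rng(0)
lengths = rng.geometric(1 - p, size=100_000)   # sampled run lengths l_X
print(lengths.mean(), 1 / (1 - p))             # both come out near 20
```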

SLIDE 26

Networks

Biological Interaction Networks

Many types:
  • Protein-DNA (regulatory)
  • Protein-metabolite (metabolic)
  • Protein-protein (signaling)
  • RNA-RNA (regulatory)
  • Genetic interactions (gene knockouts)

SLIDE 27

Remaining Lectures

  • Biology of cellular interaction networks
  • Network Alignment
  • Network Motifs
  • Network Integration

Metabolic Networks

Nodes = reactants
Edges = reactions, labeled by the enzyme (protein) that catalyzes the reaction

SLIDE 28

Regulatory Networks

[Figure: three genes A, B, C; A "activates" B and A "represses" C]

Nodes = genes
Edges = regulatory interactions

Protein-DNA interaction network

Regulatory Networks

SLIDE 29

Signaling Networks

Sources

  • Fridlyand et al. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
  • http://ai.stanford.edu/~serafim/CS262_2006/ (HMM slides)