CSE 527 Lecture 7: Relative entropy, Convergence of EM, Weight matrix motif models



  1. CSE 527 Lecture 7: Relative entropy, Convergence of EM, Weight matrix motif models

  2. Talk Today: COMBI Seminar, Dr. David Baker, “Progress in High-Resolution Modeling of Protein Structure and Interactions”, October 19, 2005, 1:30-2:30, HSB K-069

  3. Relative Entropy
     • AKA Kullback-Leibler distance/divergence, AKA information content
     • Given distributions $P$, $Q$ on $\Omega$: $$H(P \| Q) = \sum_{x \in \Omega} P(x) \log \frac{P(x)}{Q(x)}$$
     • Notes: let $P(x) \log \frac{P(x)}{Q(x)} = 0$ if $P(x) = 0$ [since $\lim_{y \to 0} y \log y = 0$]; $H(P \| Q)$ is undefined if $0 = Q(x) < P(x)$ for some $x$.
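
The definition and its two edge-case conventions translate directly into code. This is a minimal sketch, assuming distributions are given as dictionaries from outcomes to probabilities; the function name `relative_entropy` and the default base 2 (bits) are illustrative choices, since the slide does not fix a log base.

```python
import math
from collections.abc import Hashable, Mapping


def relative_entropy(p: Mapping[Hashable, float],
                     q: Mapping[Hashable, float],
                     base: float = 2.0) -> float:
    """H(P||Q) = sum_x P(x) log(P(x)/Q(x)), following the slide's conventions.

    Hypothetical helper; base 2 is an assumed default (result in bits).
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue  # convention: 0 * log(0 / Q(x)) = 0
        qx = q.get(x, 0.0)
        if qx == 0:
            # 0 = Q(x) < P(x): the divergence is undefined
            raise ValueError(f"H(P||Q) undefined: Q({x!r}) = 0 < P({x!r})")
        total += px * math.log(px / qx, base)
    return total
```

For example, comparing a distribution uniform on {A, C} with one uniform on {A, C, G, T} gives 1 bit, and comparing any distribution with itself gives 0.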

  4. [Figure: graphs of $\ln x$ and $x - 1$]
     $\ln x \le x - 1$, so $-\ln x \ge 1 - x$, i.e. $\ln(1/x) \ge 1 - x$; substituting $1/x$ for $x$ gives $\ln x \ge 1 - 1/x$.

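  5. Theorem: $H(P \| Q) \ge 0$
     Proof, applying $\ln y \ge 1 - 1/y$ with $y = P(x)/Q(x)$ to each term with $P(x) > 0$:
     $$H(P \| Q) = \sum_x P(x) \ln \frac{P(x)}{Q(x)} \ge \sum_x P(x)\left(1 - \frac{Q(x)}{P(x)}\right) = \sum_x \big(P(x) - Q(x)\big) = \sum_x P(x) - \sum_x Q(x) = 1 - 1 = 0$$
     (If $Q(x) > 0$ for some $x$ with $P(x) = 0$, the sum of $Q$ over those $x$ with $P(x) > 0$ is less than 1, which only strengthens the bound. Changing the base of the logarithm multiplies $H$ by a positive constant, so the sign is unaffected.)
     Furthermore: $H(P \| Q) = 0$ if and only if $P = Q$.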
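
The proof rests on the per-term bound $P(x)\ln\frac{P(x)}{Q(x)} \ge P(x) - Q(x)$. A quick numerical sanity check (not a proof) is to draw random distributions and confirm both the per-term bound and the nonnegativity of the sum. This sketch assumes natural logs and strictly positive random probabilities; the helper names and the seed are arbitrary.

```python
import math
import random


def kl_nats(p: list[float], q: list[float]) -> float:
    """H(P||Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def per_term_bounds(p: list[float], q: list[float]) -> list[tuple[float, float]]:
    """Pairs (P ln(P/Q), P - Q): the first should never be below the second."""
    return [(pi * math.log(pi / qi), pi - qi) for pi, qi in zip(p, q) if pi > 0]


def random_distribution(k: int, rng: random.Random) -> list[float]:
    # Strictly positive weights, normalized to sum to 1.
    weights = [rng.uniform(0.01, 1.0) for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def check_nonnegativity(trials: int = 1000, k: int = 4, seed: int = 527) -> bool:
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    for _ in range(trials):
        p = random_distribution(k, rng)
        q = random_distribution(k, rng)
        for term, bound in per_term_bounds(p, q):
            if term < bound - 1e-12:  # small tolerance for rounding
                return False
        if kl_nats(p, q) < -1e-12:
            return False
    return True
```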

  6. EM Convergence

  7. [Figure: plot against θ] Choose $\theta_{t+1} = \arg\max_\theta Q(\theta \mid \theta_t)$
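
The standard convergence argument for EM (which uses $H(P \| Q) \ge 0$) shows that choosing $\theta_{t+1}$ this way never decreases the observed-data log-likelihood. A minimal sketch to watch that happen numerically, assuming a hypothetical two-coin mixture: each trial flips one of two coins (which one is hidden) a known number of times. The model, starting values, and toy counts are illustrative and not from the lecture; binomial coefficients are dropped since they do not depend on θ.

```python
import math


def _joint(h: int, n: int, pi: float, t1: float, t2: float) -> tuple[float, float]:
    """Joint weight of trial (h heads in n flips) with coin 1 and with coin 2."""
    a = pi * t1 ** h * (1 - t1) ** (n - h)
    b = (1 - pi) * t2 ** h * (1 - t2) ** (n - h)
    return a, b


def log_likelihood(heads, flips, pi, t1, t2) -> float:
    """Observed-data log-likelihood (up to a constant)."""
    return sum(math.log(sum(_joint(h, n, pi, t1, t2))) for h, n in zip(heads, flips))


def em_step(heads, flips, pi, t1, t2):
    # E-step: r_i = posterior probability that trial i used coin 1.
    r = []
    for h, n in zip(heads, flips):
        a, b = _joint(h, n, pi, t1, t2)
        r.append(a / (a + b))
    # M-step: closed-form argmax of Q(theta | theta_t) for this model.
    pi_new = sum(r) / len(r)
    t1_new = sum(ri * h for ri, h in zip(r, heads)) / sum(ri * n for ri, n in zip(r, flips))
    t2_new = (sum((1 - ri) * h for ri, h in zip(r, heads))
              / sum((1 - ri) * n for ri, n in zip(r, flips)))
    return pi_new, t1_new, t2_new


def run_em(heads, flips, pi, t1, t2, iters: int = 50):
    """Run EM and record the log-likelihood before and after every step."""
    history = [log_likelihood(heads, flips, pi, t1, t2)]
    for _ in range(iters):
        pi, t1, t2 = em_step(heads, flips, pi, t1, t2)
        history.append(log_likelihood(heads, flips, pi, t1, t2))
    return (pi, t1, t2), history
```

For instance, `run_em([5, 9, 8, 4, 7], [10] * 5, 0.5, 0.6, 0.5)` returns a `history` list that should be non-decreasing up to floating-point rounding.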

  8. Sequence Motifs

  9. E. coli Promoters
     • “TATA Box”: consensus TATAAT, ~10 bp upstream of the transcription start
     • Not exact: of 168 promoters studied,
       – nearly all matched at least 2 of the 3 fixed letters (T, A, T) of TAxyzT
       – 80-90% matched all 3
       – about 50% agreed with the consensus at each of x, y, z
       – none matched TATAAT perfectly
     • Other common features at -35, etc.
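
The match statistics above count agreement separately at the fixed letters of TAxyzT and at the variable positions x, y, z. A small sketch of that bookkeeping, assuming a candidate site is a 6-letter string aligned to TATAAT (so x, y, z are compared against T, A, A); `tata_matches` is a hypothetical helper and the example sites are made up, not taken from the 168 promoters.

```python
CONSENSUS = "TATAAT"
FIXED = (0, 1, 5)     # the T, A, T letters of TAxyzT (0-based positions)
VARIABLE = (2, 3, 4)  # the x, y, z positions


def tata_matches(site: str) -> tuple[int, int]:
    """Return (matches at fixed letters, matches at x/y/z) against TATAAT."""
    site = site.upper()
    if len(site) != len(CONSENSUS):
        raise ValueError("site must be exactly 6 bases long")
    fixed = sum(site[i] == CONSENSUS[i] for i in FIXED)
    variable = sum(site[i] == CONSENSUS[i] for i in VARIABLE)
    return fixed, variable
```

For example, the consensus itself gives (3, 3), while a site like GATACT matches 2 of the 3 fixed letters and 2 of x, y, z.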

  10. TATA Box Frequencies (each column sums to about 100, so entries read as approximate percentages)

      base   pos 1   pos 2   pos 3   pos 4   pos 5   pos 6
      A          2      95      26      59      51       1
      C          9       2      14      13      20       3
      G         10       1      16      15      13       0
      T         79       3      44      13      17      96
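
Normalizing each column of the table gives the per-position frequencies $f_{b,i}$, and taking the most frequent base in each column recovers the consensus. A minimal sketch, assuming the table values are transcribed as counts per base and position; the names `TATA_COUNTS`, `column_frequencies`, and `consensus` are illustrative.

```python
# Rows are bases, columns are positions 1-6 (values from the TATA Box Frequencies table).
TATA_COUNTS = {
    "A": [2, 95, 26, 59, 51, 1],
    "C": [9, 2, 14, 13, 20, 3],
    "G": [10, 1, 16, 15, 13, 0],
    "T": [79, 3, 44, 13, 17, 96],
}


def column_frequencies(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """Rescale each position so the four base frequencies sum to exactly 1."""
    n_pos = len(next(iter(counts.values())))
    freqs = {b: [0.0] * n_pos for b in counts}
    for i in range(n_pos):
        total = sum(counts[b][i] for b in counts)
        for b in counts:
            freqs[b][i] = counts[b][i] / total
    return freqs


def consensus(counts: dict[str, list[int]]) -> str:
    """Most frequent base at each position."""
    n_pos = len(next(iter(counts.values())))
    return "".join(max(counts, key=lambda b: counts[b][i]) for i in range(n_pos))
```

Here `consensus(TATA_COUNTS)` yields TATAAT, matching the consensus quoted on the promoter slide.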

  11. Scanning for TATA (Stormo, Ann. Rev. Biophys. Biophys. Chem. 17, 1988, 241-263)

  12. Weight Matrices: Statistics
      • Assume: $f_{b,i}$ = frequency of base $b$ in position $i$; $f_b$ = frequency of base $b$ in all sequences
      • Log likelihood ratio, given $S = B_1 B_2 \ldots B_6$ (positions treated as independent):
      $$\log \frac{P(S \mid \text{promoter})}{P(S \mid \text{nonpromoter})} = \log \frac{\prod_{i=1}^{6} f_{B_i,i}}{\prod_{i=1}^{6} f_{B_i}} = \log \prod_{i=1}^{6} \frac{f_{B_i,i}}{f_{B_i}} = \sum_{i=1}^{6} \log \frac{f_{B_i,i}}{f_{B_i}}$$
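
Because the log ratio is a sum over positions, a weight matrix entry $\log(f_{b,i}/f_b)$ can be precomputed for each base and position, and scoring a site is six lookups. Sliding that score along a longer sequence is one way to scan for TATA-like sites. This is a sketch under stated assumptions: a uniform background $f_b = 0.25$ (the slide instead estimates $f_b$ from all sequences), a pseudocount of 1 (needed because G never occurs at position 6, which would give $\log 0$), and log base 2; `weight_matrix`, `score`, and `scan` are hypothetical names.

```python
import math

# Counts per base and position, from the TATA Box Frequencies table.
TATA_COUNTS = {
    "A": [2, 95, 26, 59, 51, 1],
    "C": [9, 2, 14, 13, 20, 3],
    "G": [10, 1, 16, 15, 13, 0],
    "T": [79, 3, 44, 13, 17, 96],
}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed f_b
PSEUDOCOUNT = 1.0  # assumed; avoids log(0) for G at position 6


def weight_matrix(counts, background, pseudocount=PSEUDOCOUNT):
    """Entry [b][i] = log2(f_{b,i} / f_b), with f_{b,i} smoothed by a pseudocount."""
    bases = list(counts)
    n_pos = len(counts[bases[0]])
    w = {b: [0.0] * n_pos for b in bases}
    for i in range(n_pos):
        total = sum(counts[b][i] for b in bases) + pseudocount * len(bases)
        for b in bases:
            f_bi = (counts[b][i] + pseudocount) / total
            w[b][i] = math.log2(f_bi / background[b])
    return w


def score(w, site: str) -> float:
    """Log likelihood ratio of a site: sum of per-position weights (A/C/G/T only)."""
    return sum(w[b][i] for i, b in enumerate(site.upper()))


def scan(w, seq: str, threshold: float) -> list[tuple[int, float]]:
    """Return (0-based start, score) for every window scoring at least threshold."""
    width = len(next(iter(w.values())))
    hits = []
    for start in range(len(seq) - width + 1):
        s = score(w, seq[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits
```

With these assumptions, TATAAT is the highest-scoring 6-mer, and scanning `GGGTATAATGGG` with a threshold just below that score reports a single hit at offset 3. Characters other than A/C/G/T (such as N) would raise a `KeyError` in this sketch.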

  13. Weight Matrices: Chemistry • Experiments show ~80% correlation between log likelihood weight matrix scores and the measured binding energy of RNA polymerase to variations on the TATAAT consensus [Stormo & Fields]
