A reversible infinite HMM using normalised random measures


  1. A reversible infinite HMM using normalised random measures Konstantina Palla, David A. Knowles, Zoubin Ghahramani 23rd of June 2014 Konstantina Palla 1 / 24

  2. MOTIVATION
  Assume a Markov chain X_1, ..., X_t, ..., X_T which is reversible:
  P(X_1, ..., X_t, ..., X_T) = P(X_T, ..., X_t, ..., X_1)
  Applications
  • Modelling physical systems, e.g. transitions of a macromolecule conformation at fixed temperature.
  • Chemical dynamics of protein folding.
  Tasks
  • Find the transition operator (transition matrix) of the reversible Markov chain.
  • Put a prior on the reversible Markov chain.
  This work proposes a Bayesian non-parametric prior for reversible Markov chains.
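The reversibility condition above can be checked numerically on a toy example. The sketch below (hypothetical weights, pure Python) builds a chain from symmetric edge weights J, starts it from its stationary distribution, and confirms that a path and its reversal have the same probability:

```python
def path_prob(path, J):
    """Probability of a path under the random walk on weights J,
    started from the walk's stationary distribution."""
    total = sum(sum(row) for row in J)
    row_sums = [sum(row) for row in J]
    p = row_sums[path[0]] / total          # stationary: pi_i = sum_k J_ik / sum_jk J_jk
    for a, b in zip(path, path[1:]):
        p *= J[a][b] / row_sums[a]         # transition: P(a, b) = J_ab / sum_k J_ak
    return p

J = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 3.0],
     [1.0, 3.0, 0.0]]                      # symmetric weights => reversible chain

path = [0, 1, 2, 1, 0, 2]
assert abs(path_prob(path, J) - path_prob(path[::-1], J)) < 1e-12
```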

  3. REVERSIBLE MARKOV CHAINS
  Problem: put a prior on reversible Markov chains. What does that mean?
  Reversible chains as random walks on a weighted graph
  G(V, E, W) weighted undirected graph
  • vertex set V = {i, r, q, ...}
  • edge set E = {e_ir, e_iq, e_rq, ...}
  • weight set W = {J_ir, J_rq, J_iq, ...}
  A discrete-time random walk on G is a Markov chain with X_t ∈ V and transition matrix
  P(i, j) := J_ij / Σ_k J_ik
  Put a prior on the transition matrix P (or, equivalently, on the weights J).
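As a concrete sketch (toy weights, not from the talk), the transition matrix normalises each vertex's edge weights, P(i, j) = J_ij / Σ_k J_ik, and the walk can then be simulated directly:

```python
import random

def transition_matrix(J):
    # P(i, j) = J_ij / sum_k J_ik: each row of weights becomes a distribution
    return [[Jij / sum(row) for Jij in row] for row in J]

def simulate(P, start, steps, rng):
    x, path = start, [start]
    for _ in range(steps):
        x = rng.choices(range(len(P)), weights=P[x])[0]
        path.append(x)
    return path

J = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 3.0],
     [1.0, 3.0, 0.0]]
P = transition_matrix(J)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)   # each row sums to 1
```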

  4. BASIC THEORY
  Seminal work by Diaconis, Freedman and Coppersmith.
  Markov Exchangeability: A process on a countable space S is Markov exchangeable if the probability of observing a path X_1, ..., X_t, ..., X_T is a function only of X_1 and the transition counts C(i, j) := |{t : X_t = i, X_{t+1} = j, 1 ≤ t < T}| for all i, j ∈ S.
  Representation Theorem (Diaconis and Freedman, 1980): A process is Markov exchangeable and returns to every state visited infinitely often (recurrent) if and only if it is a mixture of recurrent Markov chains:
  P(X_2, ..., X_T | X_1) = ∫_𝒫 ∏_{t=1}^{T−1} P(X_t, X_{t+1}) μ(dP | X_1)
  where 𝒫 is the set of stochastic matrices on S × S and the mixing measure μ(· | X_1) on 𝒫 is uniquely determined.
  Problem: determine the prior μ. Not always easy.
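The transition counts C(i, j) in the definition above are simple to compute, and they make Markov exchangeability concrete: two paths with the same start state and the same counts receive the same probability under any Markov chain. A small sketch with hypothetical state labels:

```python
from collections import Counter

def transition_counts(path):
    # C(i, j) = number of times state i is immediately followed by state j
    return Counter(zip(path, path[1:]))

p1 = ['a', 'b', 'a', 'c', 'a']
p2 = ['a', 'c', 'a', 'b', 'a']   # same start state, same counts, different order
assert transition_counts(p1) == transition_counts(p2)
```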

  5. RELATED WORK
  Random walk with reinforcement
  • Idea: simulate from the prior μ.
  • Increase an edge's weight by +1 each time the edge is crossed.
  As T → ∞, the normalised weights converge: (1/T)[J_ir, J_rq, J_iq] → [L_ir, L_rq, L_iq] ∼ μ, where T is the total number of steps and μ, the measure over edge weights, is the underlying prior.
  • The process is Markov exchangeable and recurrent → a mixture of recurrent MCs.
  Examples
  • Edge Reinforced Random Walk (ERRW), Diaconis and Freedman [1980], Diaconis and Rolles [2006]: a conjugate prior for the transition matrix of reversible MCs.
  • The edge-reinforcement scheme of Bacallado et al. [2013] extends ERRW to a countably infinite space; the process is reversible, but the prior is difficult to characterise.
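The reinforcement idea can be sketched on a toy triangle graph (a simplification for illustration, not the exact scheme of the cited papers): each time an edge is crossed its weight grows by 1, and the normalised weights converge to a draw from the prior μ.

```python
import random

def errw(J, start, steps, seed=0):
    """Edge-reinforced random walk; mutates J in place and
    returns the normalised weights after `steps` moves."""
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        nxt = rng.choices(range(len(J)), weights=J[x])[0]
        J[x][nxt] += 1.0
        J[nxt][x] += 1.0          # reinforce symmetrically: the walk stays reversible
        x = nxt
    total = sum(sum(row) for row in J)
    return [[w / total for w in row] for row in J]

weights = [[0.0, 1.0, 1.0],
           [1.0, 0.0, 1.0],
           [1.0, 1.0, 0.0]]
L = errw(weights, start=0, steps=10000)
assert abs(sum(sum(row) for row in L) - 1.0) < 1e-9
```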

  6. RELATED WORK
  Two routes to a prior over reversible Markov chains:
  1. Explicitly characterise the measure μ over transition matrices.
  2. Define an edge-reinforcement scheme.
  Proposed work: explicitly construct the prior μ over the weights (or, equivalently, over the transition matrix).

  7. A MODEL FOR REVERSIBLE MARKOV CHAINS
  General idea: define the prior over the weights hierarchically using the Gamma process.
  Gamma process ΓP(α_0 H): a completely random measure on X with Lévy measure
  ν(dw, dx) = ρ(dw) H(dx) = α_0 w^{−1} e^{−α_0 w} dw H(dx)
  on the space X × [0, ∞). H is the base measure and α_0 the concentration parameter.
  G_0 := Σ_{i=1}^∞ w_i δ_{X_i} ∼ ΓP(α_0 H)
  A countably infinite collection of pairs {X_i, w_i}_{i=1}^∞ is sampled from a Poisson process with intensity ν.
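A draw from the Gamma process can be approximated with a finite truncation, as the talk does later for the HMM: with the base mass μ_0 split over K atoms, each weight is w_i ∼ Gamma(α_0 μ_0(x_i), α_0). The hyperparameter values below are illustrative.

```python
import random

def sample_G0(K, alpha0, seed=0):
    """Finite-K approximation to G_0 ~ GammaP(alpha_0, mu_0)
    with a uniform base measure over K atoms."""
    rng = random.Random(seed)
    mu0 = 1.0 / K
    # gammavariate takes (shape, scale); a rate of alpha0 means scale 1/alpha0
    return [rng.gammavariate(alpha0 * mu0, 1.0 / alpha0) for _ in range(K)]

w = sample_G0(K=30, alpha0=2.0)
assert len(w) == 30 and all(wi > 0 for wi in w)
```

Each weight has mean μ_0(x_i) = 1/K, so the total mass of the truncated measure has mean 1 under this choice of base measure.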

  8. A MODEL FOR REVERSIBLE MARKOV CHAINS
  Define the prior over the weights hierarchically using the Gamma process.
  Model
  1. First level: ΓP over the space X
  G_0 = Σ_{i=1}^∞ w_i δ_{x_i} ∼ ΓP(α_0, μ_0)
  Set of states S := {x_i ; x_i ∈ X, i ∈ ℕ}, countably infinite.
  2. Second level: ΓP over the space S × S
  G = Σ_{i=1}^∞ Σ_{j=1}^∞ J_ij δ_{(x_i, x_j)} ∼ ΓP(α, μ),
  J_ij | α, w_i, w_j ∼ Gamma(α w_i w_j, α)
  with base measure atomic on S × S: μ(x_i, x_j) = G_0(x_i) G_0(x_j).
  Non-reversible: directed edges, J_ij ≠ J_ji.

  9. A MODEL FOR REVERSIBLE MARKOV CHAINS
  Reversibility: impose symmetry, J_ij = J_ji ∼ Gamma(α w_i w_j, α).
  Proof: it is sufficient to prove detailed balance,
  π_i P(i, j) = π_j P(j, i), where π_i = Σ_k J_ik / Σ_j Σ_k J_jk and 0 < Σ_j Σ_k J_jk < ∞.
  Corollary: π is the invariant measure of the chain.
  We call the model the Symmetric Hierarchical Gamma Process (SHGP).
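A finite-K sketch of the SHGP construction (illustrative hyperparameters) with a numerical detailed-balance check: sample top-level weights w, then symmetric edge weights J_ij = J_ji ∼ Gamma(α w_i w_j, α), and verify π_i P(i, j) = π_j P(j, i).

```python
import random

def sample_shgp(K, alpha0=5.0, alpha=1.0, seed=0):
    rng = random.Random(seed)
    w = [rng.gammavariate(alpha0 / K, 1.0 / alpha0) for _ in range(K)]
    J = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i, K):                       # symmetry: sample once per pair
            J[i][j] = J[j][i] = rng.gammavariate(alpha * w[i] * w[j], 1.0 / alpha)
    return J

J = sample_shgp(K=5)
total = sum(sum(row) for row in J)
pi = [sum(row) / total for row in J]                # invariant distribution pi_i
P = [[v / sum(row) for v in row] for row in J]      # transitions P(i, j)
assert all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
           for i in range(5) for j in range(5))     # detailed balance holds
```

The check works because π_i P(i, j) = J_ij / Σ_jk J_jk, which is symmetric in (i, j) whenever J is.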

  10. A MODEL FOR REVERSIBLE MARKOV CHAINS
  Properties
  • Irreducibility: a MC is irreducible if ∃ t ∈ ℕ s.t. P^t_ij > 0 ∀ i, j ∈ S.
  The SHGP is irreducible: since J_ij, Σ_k J_ik ∈ (0, ∞), we have P_ij = J_ij / Σ_k J_ik > 0 a.s. ∀ i, j ∈ S.
  • Recurrence: a state i is positive recurrent if E(τ_ii) < ∞, where τ_ij := min{t > 1 : X_t = j | X_1 = i}.
  The SHGP is positive recurrent since the following applies:
  Theorem (Levin et al. [2006]): an irreducible Markov chain is positive recurrent iff there exists a probability distribution π such that π = πP.
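Irreducibility is easy to verify numerically for a dense weight matrix (toy values): with every J_ij in (0, ∞), every entry of P, and hence of P², is positive.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

J = [[0.5, 2.0, 1.0],
     [2.0, 0.5, 3.0],
     [1.0, 3.0, 0.5]]
P = [[Jij / sum(row) for Jij in row] for row in J]   # P_ij = J_ij / sum_k J_ik
P2 = matmul(P, P)
assert all(p > 0 for row in P2 for p in row)          # P^2_ij > 0 for all i, j
```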

  11. A MODEL FOR REVERSIBLE MARKOV CHAINS
  Representation Theorem: A process is Markov exchangeable and returns to every state visited infinitely often (recurrent) if and only if it is a mixture of recurrent Markov chains:
  P(X_2, ..., X_T | X_1) = ∫_𝒫 ∏_{t=1}^{T−1} P(X_t, X_{t+1}) μ(dP | X_1)
  where 𝒫 is the set of stochastic matrices on S × S and μ(· | X_1) on 𝒫 is the mixing measure.
  SHGP
  • Explicitly defined prior μ; hierarchical construction of the weights.
  • The SHGP is a mixture of recurrent, reversible Markov chains.
  • The SHGP is recurrent, Markov exchangeable and reversible.

  12. THE SHGP HIDDEN MARKOV MODEL
  [Graphical model: α_0, μ_0 → G_0; α, G_0 → G; G → X_1, X_2, ..., X_T; E → Y_1, Y_2, ..., Y_T]
  Finite number of states K; the countably infinite model is recovered as K → ∞.
  X_t ∈ {1, ..., K}: hidden state sequence.
  Y_t, t = 1, ..., T: observed sequence with observation model F(· | E), Y_t | X_t, E ∼ F(· | E_{X_t}) iid.
  E: emission matrix with state emission parameters {E_k, k = 1, ..., K}; F can be a multinomial, Poisson or Gaussian observation model.
  Truncated prior:
  G_0 = Σ_{i=1}^K w_i δ_{x_i}, with w_i ∼ Gamma(α_0 μ_0(x_i), α_0)
  G = Σ_{i=1}^K Σ_{j=1}^K J_ij δ_{(x_i, x_j)}, with J_ij = J_ji ∼ Gamma(α w_i w_j, α)
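A generative sketch of the truncated SHGP-HMM with a Poisson observation model (all hyperparameters and the per-state rates are illustrative, not values from the talk): draw w and symmetric J, run the hidden chain X_t, and emit Y_t ∼ Poisson(E_{X_t}).

```python
import math
import random

def sample_poisson(rate, rng):
    # Knuth's multiplicative method; fine for the small rates used here
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def shgp_hmm(K, T, alpha0=5.0, alpha=1.0, seed=0):
    rng = random.Random(seed)
    # truncated SHGP prior: top-level weights, then symmetric edge weights
    w = [rng.gammavariate(alpha0 / K, 1.0 / alpha0) for _ in range(K)]
    J = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i, K):
            J[i][j] = J[j][i] = rng.gammavariate(alpha * w[i] * w[j], 1.0 / alpha)
    E = [rng.uniform(0.5, 5.0) for _ in range(K)]   # hypothetical per-state Poisson rates
    x, X, Y = rng.randrange(K), [], []
    for _ in range(T):
        X.append(x)
        Y.append(sample_poisson(E[x], rng))
        x = rng.choices(range(K), weights=J[x])[0]  # P(x, j) = J_xj / sum_k J_xk
    return X, Y

X, Y = shgp_hmm(K=4, T=50)
assert len(X) == len(Y) == 50 and min(Y) >= 0
```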

  13. EXPERIMENTS
  We ran the SHGP hidden Markov model on two real-world datasets with reversible underlying systems.
  Comparison against:
  • non-reversible SHGP HMM
  • infinite HMM (HDP)

  14. CHIP-SEQ DATA FROM NEURAL STEM CELLS
  • ChIP-seq allows us to measure which proteins, with which chemical modifications, are bound to DNA along the genome.
  • Y matrix T × L, with T = 2 · 10^4 and L = 6: counts of how many reads for the protein of interest l map to bin t.
  • Poisson (multivariate) likelihood model F.
  [Figure: ChIP-seq data for a small section (length 300, in 100bp bins) of the whole chromosome region, plotting read counts against genomic location for the L = 6 proteins of interest: H3K27ac, H3K27me3, H3K4me1, H3K4me3, p300, Pol2.]

  15. CHIP-SEQ DATA FROM NEURAL STEM CELLS
  Task: predict held-out values in Y.
  Table: ChIP-seq results for 10 runs using different hold-out patterns (20%), a truncation level of K = 30, 1000 iterations and a burn-in of 700.

  Model      | Algorithm    | Train error     | Test error      | Train log likelihood | Test log likelihood
  Reversible | HMC          | 0.9122 ± 0.0032 | 1.1158 ± 0.0097 | −1.0488 ± 0.0009     | −3.2422 ± 0.0023
  Non-rev    | HMC          | 0.9127 ± 0.0033 | 1.1167 ± 0.0095 | −1.0494 ± 0.0009     | −3.2478 ± 0.0022
  iHMM       | Beam Sampler | 0.9383 ± 0.0061 | 1.1365 ± 0.0107 | −1.0727 ± 0.0041     | −3.3047 ± 0.0027

  16. CHIP-SEQ DATA FROM NEURAL STEM CELLS
  The SHGP recovers known types of regulatory regions:
  • promoters
  • enhancers
  Figure: learnt L × K emission matrix for the ChIP-seq dataset. Element E_lk is the Poisson rate parameter for protein l in state k; brighter indicates higher values.

  17. SINGLE ION CHANNEL RECORDINGS DATASET
  • Patch clamp recording is a method for measuring conformational changes in ion channels; these changes are accompanied by changes in electrical potential (the measurements).
  • Y matrix 1 × T, with T = 10^4: a 10 kHz recording of electrical potential measurements of a single alamethicin channel.
  • Gaussian likelihood model F: Y_t | X_t, E ∼ N(Y_t; μ, σ), where μ = E(X_t, 1) and σ = E(X_t, 2) with K × 2 emission matrix E.
  Table: ion channel results across 10 different random hold-out patterns, a truncation of K = 15, 1000 iterations and a burn-in of 700.

  Model          | Algorithm    | Train error   | Test error    | Train log likelihood | Test log likelihood
  Reversible     | HMC          | 0.023 ± 0.001 | 0.030 ± 0.002 | 2.204 ± 0.055        | 2.034 ± 0.058
  Non-reversible | HMC          | 0.027 ± 0.007 | 0.033 ± 0.007 | 2.108 ± 0.084        | 1.970 ± 0.078
  iHMM           | Beam sampler | 0.038 ± 0.005 | 0.045 ± 0.004 | 2.134 ± 0.070        | 2.008 ± 0.058
