Learning from Irregularly-Sampled Time Series: A Missing Data Perspective (PowerPoint PPT Presentation)


  1. Learning from Irregularly-Sampled Time Series: A Missing Data Perspective
     Steven Cheng-Xian Li, Benjamin M. Marlin
     University of Massachusetts Amherst

  2. Irregularly-Sampled Time Series
     • Irregularly-sampled time series: time series with non-uniform time intervals between successive measurements.

  3. Problem and Challenges
     Problem: learning from a collection of irregularly-sampled time series within a common time interval.
     Challenges:
     • Each time series is observed at arbitrary time points.
     • Different data cases may have different numbers of observations.
     • Observed samples may not be aligned in time.
     • Many real-world time series datasets are extremely sparse.
     • Most machine learning algorithms require data lying in a fixed-dimensional feature space.

  4. Problem and Challenges
     Problem: learning from a collection of irregularly-sampled time series within a common time interval.
     Tasks:
     • Learning the distribution of latent temporal processes
     • Inferring the latent process associated with a time series
     • Classification of time series
     This can be transformed into a missing data problem.

  5–6. Index Representation of Incomplete Data
     Data defined on an index set I:
     • Complete data as a mapping I → R.
     Index representation of an incomplete data case (x, t):
     • t ≡ {t_i}_{i=1}^{|t|} ⊂ I are the indices of observed entries.
     • x_i is the corresponding value observed at t_i.
     Examples:
     • Image: pixel coordinates
     • Time series: timestamps
     Applicable to both finite and continuous index sets.
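
A minimal Python sketch of this index representation; the container and field names (IncompleteCase, t, x) are illustrative, not from the slides:

```python
from typing import NamedTuple
import numpy as np

class IncompleteCase(NamedTuple):
    t: np.ndarray  # observed indices t_i (timestamps in [0, T] for a time series)
    x: np.ndarray  # observed values x_i = f(t_i), one per index in t

# An irregularly-sampled series observed at four arbitrary time points in [0, 10]:
case = IncompleteCase(
    t=np.array([0.3, 1.7, 4.2, 9.8]),
    x=np.array([0.12, -0.54, 0.33, 1.05]),
)
assert case.t.shape == case.x.shape  # one value per observed index
```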

  7. Generative Process of Incomplete Data
     Generative process for an incomplete case (x, t):
     • f ∼ p_θ(f)   (complete data f : I → R)
     • t ∼ p_I(t | f),  with t ∈ 2^I (a subset of I)
     • x = [f(t_i)]_{i=1}^{|t|}   (values of f indexed at t)
     Goal: learning the complete-data distribution p_θ given the incomplete dataset D = {(x_i, t_i)}_{i=1}^{n}.

  8–9. Generative Process of Incomplete Data
     Generative process for an incomplete case (x, t):
     • f ∼ p_θ(f)   (complete data f : I → R)
     • t ∼ p_I(t)   (independence between f and t)
     • x = [f(t_i)]_{i=1}^{|t|}   (values of f indexed at t)
     Goal: learning the complete-data distribution p_θ given the incomplete dataset D = {(x_i, t_i)}_{i=1}^{n}.
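
A small simulation sketch of this generative process on I = [0, T], assuming independence between f and t; the random-sinusoid prior over f and the uniform placement of t are illustrative stand-ins for p_θ(f) and p_I(t):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10.0

def sample_f():
    # f ~ p_theta(f): a random sinusoid stands in for the latent temporal process
    a, w, phi = rng.normal(), rng.uniform(0.5, 2.0), rng.uniform(0, 2 * np.pi)
    return lambda t: a * np.sin(w * t + phi)

def sample_t():
    # t ~ p_I(t): a random number of uniformly placed observation times
    n_obs = rng.integers(3, 8)
    return np.sort(rng.uniform(0.0, T, size=n_obs))

f = sample_f()
t = sample_t()
x = f(t)          # x = [f(t_1), ..., f(t_|t|)]
print(len(t), x)  # different draws give different numbers of observations
```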

  10. Encoder-Decoder Framework for Incomplete Data
     Probabilistic latent variable model. Decoder:
     • Model the data generating process: z ∼ p_z(z), f = g_θ(z).
     • Given t ∼ p_I, the corresponding values are g_θ(z, t) ≡ [f(t_i)]_{i=1}^{|t|}.
     • Note: our goal is to model g_θ, not p_I.

  11. Encoder-Decoder Framework for Incomplete Data
     • Different incomplete cases carry different levels of uncertainty.
     Encoder (stochastic):
     • Model the posterior distribution q_φ(z | x, t).
     • Functional form: q_φ(z | x, t) = q_φ(z | m(x, t)).
     • Example: q_φ(z | x, t) = N(z | µ_φ(v), Σ_φ(v)) with v = m(x, t).
     Masking function m(x, t): replaces all missing entries by zero.
     [Figure: a time series observed on [0, T] and its zero-filled counterpart m(x, t), value vs. time]
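
A minimal sketch of the zero-filling masking function for a finite index set I = {0, ..., d−1} (e.g., image pixels); the returned binary observation mask is an extra convenience, not something the slides require:

```python
import numpy as np

def mask_fn(x, t, d):
    """Embed observed values x at integer indices t into a length-d vector,
    replacing all missing entries by zero."""
    v = np.zeros(d)
    v[t] = x
    observed = np.zeros(d)
    observed[t] = 1.0
    return v, observed

x = np.array([0.7, -1.2, 0.4])
t = np.array([1, 4, 5])
v, observed = mask_fn(x, t, d=8)
# v = [0, 0.7, 0, 0, -1.2, 0.4, 0, 0]; the encoder then computes q_phi(z | v)
```

For a continuous index set such as [0, T], m(x, t) is instead handled by the continuous convolutional layer on slides 27–29.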

  12. Partial Variational Autoencoder (P-VAE)
     Generative process (i.i.d. noise):
     • t ∼ p_I(t)
     • z ∼ p(z),  f = g_θ(z)
     • x_i ∼ p(x_i | f(t_i)),  e.g., p(x_i | f(t_i)) = N(x_i | f(t_i), σ²)
     Joint distribution:
     p(x, t) = p_I(t) ∫ p(z) ∏_{i=1}^{|t|} p_θ(x_i | z, t_i) dz
     where p_θ(x_i | z, t_i) is shorthand for p(x_i | f(t_i)) with f = g_θ(z).
     [Diagram: (x, t) ∼ p_D; encoder q_φ maps (x, t) to z; decoder g_θ maps z to x̂]

  13. Partial Variational Autoencoder (P-VAE)
     Variational lower bound for log p(x, t):
     ∫ q_φ(z | x, t) log [ p_z(z) p_I(t) ∏_{i=1}^{|t|} p_θ(x_i | z, t_i) / q_φ(z | x, t) ] dz
     Learning with gradients, without p_I(t) involved (log p_I(t) does not depend on φ or θ, so it drops out):
     ∇_{φ,θ} E_{z ∼ q_φ(z | x, t)} [ log ( p_z(z) ∏_{i=1}^{|t|} p_θ(x_i | z, t_i) / q_φ(z | x, t) ) ]
     References: Kingma & Welling (2014), Auto-encoding variational Bayes; Ma et al. (2018), Partial VAE for hybrid recommender system.
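
A single-sample Monte Carlo sketch of this lower bound (in its conditional form, without p_I(t)), assuming a Gaussian encoder q_φ(z | m(x, t)) and a Gaussian decoder p(x_i | f(t_i)) = N(x_i | f(t_i), σ²); `encode` (returning the mean and log-variance of q_φ) and `decode` (returning [f(t_i)] for the given t) are placeholders for the model's networks:

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def pvae_lower_bound(x, t, encode, decode, sigma2=0.1, rng=np.random.default_rng()):
    mu, logvar = encode(x, t)                                      # parameters of q_phi(z | x, t)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterized sample
    f_t = decode(z, t)                                             # [f(t_1), ..., f(t_|t|)], f = g_theta(z)
    log_lik = gaussian_logpdf(x, f_t, sigma2).sum()                # sum_i log p_theta(x_i | z, t_i)
    log_prior = gaussian_logpdf(z, 0.0, 1.0).sum()                 # log p_z(z), standard normal prior
    log_q = gaussian_logpdf(z, mu, np.exp(logvar)).sum()           # log q_phi(z | x, t)
    return log_lik + log_prior - log_q                             # note: p_I(t) never appears
```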

  14. Partial Variational Autoencoder (P-VAE)
     Related work:
     • MIWAE [Mattei & Frellsen, 2019]
     • Partial VAE [Ma et al., 2018]
     • Neural processes [Garnelo et al., 2018]
     Conditional objective (lower bound for log p(x | t)):
     E_{z ∼ q_φ(z | x, t)} [ log ( p_z(z) ∏_{i=1}^{|t|} p_θ(x_i | z, t_i) / q_φ(z | x, t) ) ]

  15. Partial Bidirectional GAN (P-BiGAN)
     [Diagram: a discriminator D compares encoding-branch tuples {(x, t, z)} against decoding-branch tuples {(x′, t′, z′)}; encoding branch: (x, t) → q_φ → z; decoding branch: (z′, t′) → g_θ → x′]
     References: Li, Jiang, Marlin (2019), MisGAN: Learning from Incomplete Data with GANs; Donahue et al. (2016), Adversarial feature learning (BiGAN).

  16. Partial Bidirectional GAN (P-BiGAN)
     Encoding branch: a real incomplete case (x, t) ∼ p_D is encoded as z ∼ q_φ(z | x, t), producing tuples {(x, t, z)} for the discriminator D.

  17. Partial Bidirectional GAN (P-BiGAN)
     Decoding branch: observation times from the data, (·, t′) ∼ p_D, and a latent code z′ ∼ p_z are decoded as x′ = g_θ(z′, t′), producing tuples {(x′, t′, z′)} for the discriminator D.

  18. Partial Bidirectional GAN (P-BiGAN)
     The discriminator distinguishes encoding-branch tuples {(x, t, z)}, with (x, t) ∼ p_D, from decoding-branch tuples {(x′, t′, z′)}, with (·, t′) ∼ p_D and z′ ∼ p_z.
     Discriminator input: D(m(x, t), z).
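
A sketch of the two P-BiGAN branches feeding the discriminator D(m(x, t), z); `encode`, `decode`, `mask`, and `discriminate` are placeholders for the model's networks, and the cross-entropy form of the adversarial loss is one common choice (the slides do not pin down a specific GAN loss):

```python
import numpy as np

def pbigan_discriminator_loss(x, t, t_prime, encode, decode, mask, discriminate,
                              latent_dim, rng=np.random.default_rng()):
    # Encoding branch: real incomplete case (x, t) ~ p_D with z ~ q_phi(z | x, t)
    z = encode(x, t, rng)
    real_score = discriminate(mask(x, t), z)                    # D(m(x, t), z)

    # Decoding branch: z' ~ p_z, observation times t' ~ p_D, x' = g_theta(z', t')
    z_prime = rng.standard_normal(latent_dim)
    x_prime = decode(z_prime, t_prime)
    fake_score = discriminate(mask(x_prime, t_prime), z_prime)  # D(m(x', t'), z')

    # Cross-entropy objective: score the encoding branch as real, the decoding branch as fake
    return -(np.log(real_score) + np.log(1.0 - fake_score))
```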

  19. Invertibility Property of P-BiGAN
     Theorem: For (x, t) with non-zero probability, if z ∼ q_φ(z | x, t) then g_θ(z, t) = x.
     Here g_θ(z, t) is shorthand for [f(t_i)]_{i=1}^{|t|} with f = g_θ(z).

  20. Autoencoding Regularization for P-BiGAN
     [Diagram: the encoding-branch code z ∼ q_φ(z | x, t) is also decoded by g_θ to a reconstruction x̂, and an autoencoding loss ℓ(x, x̂) is added to the P-BiGAN objective]
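
A minimal sketch of such an autoencoding penalty; the squared-error choice of ℓ and the weight `lam` are illustrative assumptions, and `encode`/`decode` are placeholders for q_φ and g_θ:

```python
import numpy as np

def autoencoding_penalty(x, t, encode, decode, lam=1.0, rng=np.random.default_rng()):
    z = encode(x, t, rng)                    # z ~ q_phi(z | x, t), as in the encoding branch
    x_hat = decode(z, t)                     # x_hat = g_theta(z, t), decoded at the observed times
    return lam * np.mean((x - x_hat) ** 2)   # lam * ell(x, x_hat)
```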

  21. Missing Data Imputation
     Imputation: p(x′ | t′, x, t) = E_{z ∼ q_φ(z | x, t)} [ p_θ(x′ | z, t′) ]
     Sampling:
     • z ∼ q_φ(z | x, t)
     • f = g_θ(z)
     • x′ = [f(t′_i)]_{i=1}^{|t′|}
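
A sketch of imputation at query times t′ following this sampling procedure; `encode` (drawing z ∼ q_φ(z | x, t)) and `decode` (evaluating g_θ(z) at given times) are placeholders:

```python
import numpy as np

def impute(x, t, t_query, encode, decode, rng=np.random.default_rng()):
    z = encode(x, t, rng)        # z ~ q_phi(z | x, t)
    return decode(z, t_query)    # x' = [f(t'_1), ..., f(t'_|t'|)] with f = g_theta(z)

# Averaging several such draws approximates the mean of E_{z ~ q_phi}[p_theta(x' | z, t')].
```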

  22. Supervised Learning: Classification
     Adding a classification term to the objective:
     E_{z ∼ q_φ(z | x, t)} [ log ( p_z(z) p_θ(x | z, t) p(y | z) / q_φ(z | x, t) ) ]
       = E_{q_φ(z | x, t)} [ log ( p_z(z) p_θ(x | z, t) / q_φ(z | x, t) ) ]  +  E_{q_φ(z | x, t)} [ log p(y | z) ]
     where the first term is the unsupervised lower bound and the second is the classification regularization.
     Prediction: ŷ = argmax_y E_{q_φ(z | x, t)} [ log p(y | z) ]
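
A sketch of the classification regularization term and the prediction rule, estimated with Monte Carlo samples from the encoder; `sample_z` (drawing z ∼ q_φ(z | x, t)) and `class_log_probs` (returning log p(y | z) for every class) are placeholders:

```python
import numpy as np

def classification_term(x, t, y, sample_z, class_log_probs, n_samples=8,
                        rng=np.random.default_rng()):
    zs = [sample_z(x, t, rng) for _ in range(n_samples)]
    return np.mean([class_log_probs(z)[y] for z in zs])      # E_q[log p(y | z)]

def predict(x, t, sample_z, class_log_probs, n_samples=8,
            rng=np.random.default_rng()):
    zs = [sample_z(x, t, rng) for _ in range(n_samples)]
    avg = np.mean([class_log_probs(z) for z in zs], axis=0)  # per-class E_q[log p(y | z)]
    return int(np.argmax(avg))                               # y_hat = argmax_y E_q[log p(y | z)]
```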

  23. MNIST Completion
     [Figure: completions by P-VAE and P-BiGAN under square observation with 90% missing and under independent dropout with 90% missing]

  24. CelebA Completion
     [Figure: completions by P-VAE and P-BiGAN under square observation with 90% missing and under independent dropout with 90% missing]

  25. Architecture for Irregularly-Sampled Time Series
     How do we construct the decoder, encoder, and discriminator for a continuous index set, e.g., time series with I = [0, T]?
     [Diagram: the P-VAE and P-BiGAN architectures from earlier slides]

  26. Decoder for Continuous Time Series
     Generative process for time series:
     • z ∼ p_z(z)
     • v = CNN_θ(z)   (values on evenly-spaced times u)
     • Kernel smoother:  f(t) = Σ_{i=1}^{L} K(u_i, t) v_i / Σ_{i=1}^{L} K(u_i, t)
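
A sketch of the kernel-smoother read-out: a network produces values v on an evenly spaced grid u, and f(t) is a normalized kernel-weighted average of v. The squared-exponential kernel and its bandwidth are illustrative choices, and the random v stands in for CNN_θ(z):

```python
import numpy as np

def kernel_smoother(u, v, t, bandwidth=0.2):
    """Evaluate f(t) = sum_i K(u_i, t) v_i / sum_i K(u_i, t) at query times t."""
    K = np.exp(-((t[:, None] - u[None, :]) ** 2) / (2 * bandwidth ** 2))  # shape (|t|, L)
    return (K * v[None, :]).sum(axis=1) / K.sum(axis=1)

L, T = 32, 10.0
u = np.linspace(0.0, T, L)                        # evenly spaced times
v = np.random.default_rng(0).standard_normal(L)   # stand-in for v = CNN_theta(z)
t = np.array([0.3, 1.7, 4.2, 9.8])                # arbitrary (irregular) query times
f_t = kernel_smoother(u, v, t)                    # continuous-time values f(t_i)
```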

  27. Continuous Convolutional Layer
     CNN encoder/discriminator. Cross-correlation between:
     • the continuous kernel w(t), and
     • the masked function m(x, t)(t) = Σ_{i=1}^{|t|} x_i δ(t − t_i), where δ(·) is the Dirac delta function.
     [Diagram: irregular observations at indices t mapped onto an evenly-spaced grid u by the continuous kernel w(t)]

  28–29. Continuous Convolutional Layer
     Cross-correlation between w and m(x, t):
     (w ⋆ m(x, t))(u) = Σ_{i : t_i ∈ neighbor(u)} w(t_i − u) x_i
     Construct the kernel w(t) using a degree-1 B-spline.
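
A sketch of one channel of such a layer: the kernel w is a linear combination of degree-1 (triangular) B-spline basis functions with compact support, and the layer evaluates (w ⋆ m(x, t))(u) on an evenly spaced grid u by summing over the observations that fall inside the kernel's support. The knot spacing, support radius, and coefficient values are illustrative (the coefficients would be learned in the real layer):

```python
import numpy as np

def hat(d, center, width):
    """Degree-1 B-spline (triangular) basis function evaluated at relative positions d."""
    return np.maximum(0.0, 1.0 - np.abs(d - center) / width)

def continuous_conv(x, t, u, knots, coeffs):
    """Compute (w * m(x, t))(u_j) = sum_{i : t_i in neighbor(u_j)} w(t_i - u_j) x_i."""
    width = knots[1] - knots[0]          # spacing between B-spline knots
    radius = knots[-1] + width           # w is zero outside [-radius, radius]
    out = np.zeros_like(u)
    for j, uj in enumerate(u):
        d = t - uj                       # relative positions t_i - u_j
        near = np.abs(d) <= radius       # the neighbor(u_j) set
        w = sum(c * hat(d[near], k, width) for c, k in zip(coeffs, knots))
        out[j] = np.sum(w * x[near])
    return out

t = np.array([0.3, 1.7, 4.2, 9.8])               # irregular observation times
x = np.array([0.12, -0.54, 0.33, 1.05])          # observed values
u = np.linspace(0.0, 10.0, 16)                   # evenly spaced output grid
knots = np.linspace(-1.0, 1.0, 5)                # symmetric knots defining w(t)
coeffs = np.array([0.1, 0.4, 1.0, 0.4, 0.1])     # kernel coefficients (learnable in practice)
y = continuous_conv(x, t, u, knots, coeffs)
```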
