
SLIDE 1

Learning from Irregularly-Sampled Time Series

A Missing Data Perspective

Steven Cheng-Xian Li, Benjamin M. Marlin

University of Massachusetts Amherst

SLIDE 2

Irregularly-Sampled Time Series

Irregularly-sampled time series: Time series with non-uniform time intervals between successive measurements


SLIDE 3

Problem and Challenges


Problem: learning from a collection of irregularly-sampled time series within a common time interval

[Figure: an example collection of irregularly sampled series, value vs. time.]

Challenges:

  • Each time series is observed at arbitrary time points.
  • Different data cases may have different numbers of observations.
  • Observed samples may not be aligned in time.
  • Many real-world time series are extremely sparse.
  • Most machine learning algorithms require data lying in a fixed-dimensional feature space.

SLIDE 4

Problem and Challenges


Problem: learning from a collection of irregularly-sampled time series within a common time interval


Tasks:

  • Learning the distribution of latent temporal processes
  • Inferring the latent process associated with a time series
  • Classification of time series

This can be transformed into a missing data problem.

SLIDE 5

Index Representation of Incomplete Data

Data defined on an index set I:

  • Examples:
    • Image: pixel coordinates
    • Time series: timestamps
  • Complete data as a mapping: I → R.

Index representation of an incomplete data case (x, t):

  • t ≡ {ti}_{i=1}^{|t|} ⊂ I are the indices of observed entries.
  • xi is the corresponding value observed at ti.
  • Applicable to both finite and continuous index sets.
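To make the representation concrete, here is a minimal Python sketch; the IncompleteCase container and the example numbers are illustrative assumptions, not the paper's code.

```python
# A minimal sketch of the index representation (x, t) of one
# incomplete data case; container and values are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class IncompleteCase:
    t: np.ndarray  # observed indices ti, a finite subset of the index set I
    x: np.ndarray  # observed values xi = f(ti), same length as t

# An irregularly sampled series on the continuous index set I = [0, 1]:
case = IncompleteCase(
    t=np.array([0.03, 0.41, 0.42, 0.97]),  # non-uniform timestamps
    x=np.array([1.2, -0.5, -0.4, 0.8]),    # values observed at those times
)
assert len(case.t) == len(case.x)
```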

SLIDE 7

Generative Process of Incomplete Data

Generative process for an incomplete case (x, t):

  • f ∼ pθ(f)  (complete data, f : I → R)
  • t ∼ pI(t|f)  (t ∈ 2^I, a subset of I)
  • x = [f(ti)]_{i=1}^{|t|}  (values of f indexed at t)

Goal: learning the complete data distribution pθ given the incomplete dataset D = {(xi, ti)}_{i=1}^{n}.

SLIDE 8

Generative Process of Incomplete Data

Generative process for an incomplete case (x, t):

  • f ∼ pθ(f)  (complete data, f : I → R)
  • t ∼ pI(t)  (independence between f and t)
  • x = [f(ti)]_{i=1}^{|t|}  (values of f indexed at t)

Goal: learning the complete data distribution pθ given the incomplete dataset D = {(xi, ti)}_{i=1}^{n}.
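A hedged sketch of this process, with a random sinusoid standing in for pθ(f) and a homogeneous Poisson process standing in for pI(t); both concrete distributions are illustrative assumptions, not the paper's models.

```python
# Sketch of the generative process under the independence assumption:
# f ~ p_theta(f), t ~ p_I(t), x = [f(t_i)].
import numpy as np

rng = np.random.default_rng(0)
T = 1.0

def sample_f():
    a, b = rng.normal(), rng.uniform(0, 2 * np.pi)
    return lambda t: a + np.sin(2 * np.pi * t + b)  # one draw f: [0, T] -> R

def sample_t(rate=30.0):
    n = rng.poisson(rate * T)                    # homogeneous Poisson process
    return np.sort(rng.uniform(0.0, T, size=n))  # observation times t

f = sample_f()
t = sample_t()
x = f(t)  # x = [f(t_1), ..., f(t_|t|)]: values of f indexed at t
```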

SLIDE 10

Encoder-Decoder Framework for Incomplete Data

Probabilistic latent variable model. Decoder:

  • Model the data generating process: z ∼ pz(z), f = gθ(z).
  • Given t ∼ pI, the corresponding values are gθ(z, t) ≡ [f(ti)]_{i=1}^{|t|}.
  • Note: our goal is to model gθ, not pI.

SLIDE 11

Encoder-Decoder Framework for Incomplete Data

Encoder (stochastic):

  • Model the posterior distribution qφ(z|x, t)
  • Functional form: qφ(z|x, t) = qφ(z | m(x, t))
  • Example: qφ(z|x, t) = N(z|µφ(v), Σφ(v)) with v = m(x, t).
  • Different incomplete cases carry different levels of uncertainty.

Masking function m(x, t):

  • Replacing all missing entries by zero.
  • Replacing all missing entries by zero.

[Figure: two examples of the masking function, each mapping an irregularly sampled series (value x over time t ∈ [0, T]) to its zero-filled representation.]
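A minimal sketch of m for a finite index set (e.g. flattened pixel coordinates); the function name and sizes are illustrative assumptions.

```python
# Zero-filling masking function m(x, t) on a finite index set
# {0, ..., size-1}.
import numpy as np

def mask(x, t, size):
    v = np.zeros(size)
    v[t] = x  # keep observed values; missing entries stay at zero
    return v

# The encoder then conditions on v: q_phi(z | x, t) = q_phi(z | m(x, t)).
v = mask(x=np.array([1.2, -0.5]), t=np.array([3, 7]), size=10)
```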

SLIDE 12

Partial Variational Autoencoder (P-VAE)

Generative process:

  • t ∼ pI(t)
  • z ∼ p(z)
  • f = gθ(z)
  • xi ∼ p(xi|f(ti))  (i.i.d. noise), e.g. p(xi|f(ti)) = N(xi|f(ti), σ²)

Joint distribution:

p(x, t) = ∫ p(z) pI(t) ∏_{i=1}^{|t|} pθ(xi|z, ti) dz

pθ(xi|z, ti) is the shorthand for p(xi|f(ti)) with f = gθ(z).

[Diagram: P-VAE. A data case (x, t) ∼ pD is encoded by qφ to z and decoded by gθ back to x.]

SLIDE 13

Partial Variational Autoencoder (P-VAE)

Variational lower bound for log p(x, t):

E_{z∼qφ(z|x,t)} [ log ( pz(z) pI(t) ∏_{i=1}^{|t|} pθ(xi|z, ti) / qφ(z|x, t) ) ]

Learning with gradients, without pI(t) involved (the log pI(t) term does not depend on φ or θ, so it cancels from the gradient):

∇_{φ,θ} E_{z∼qφ(z|x,t)} [ log ( pz(z) ∏_{i=1}^{|t|} pθ(xi|z, ti) / qφ(z|x, t) ) ]

Kingma & Welling (2014). Auto-Encoding Variational Bayes.
Ma, et al. (2018). Partial VAE for Hybrid Recommender System.
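A hedged PyTorch sketch of this objective as a training loss. The `encoder` and `decoder` arguments are assumed placeholder modules, and the Gaussian likelihood with fixed σ follows the example on slide 12; only the observed entries (x, t) enter the loss.

```python
# Negative ELBO for the P-VAE, computed on observed entries only;
# note that p_I(t) never appears in the gradient.
import torch

def pvae_loss(encoder, decoder, x, t, sigma=0.1):
    mu, logvar = encoder(x, t)                 # q_phi(z | m(x, t))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    f_t = decoder(z, t)                        # g_theta(z) evaluated at times t
    log_lik = torch.distributions.Normal(f_t, sigma).log_prob(x).sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)  # KL(q || p_z)
    return (kl - log_lik).mean()
```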

SLIDE 14

Partial Variational Autoencoder (P-VAE)

Conditional objective (lower bound for log p(x|t)):

E_{z∼qφ(z|x,t)} [ log ( pz(z) ∏_{i=1}^{|t|} pθ(xi|z, ti) / qφ(z|x, t) ) ]

Related work:

  • Partial VAE [Ma, et al., 2018]
  • Neural processes [Garnelo, et al., 2018]
  • MIWAE [Mattei & Frellsen, 2019]

SLIDES 15–18

Partial Bidirectional GAN (P-BiGAN)

  • Encoding path: a real case (x, t) ∼ pD is encoded to z ∼ qφ(z|x, t), yielding tuples {(x, t, z)}.
  • Decoding path: a latent code z′ ∼ pz and observed times (·, t′) ∼ pD are decoded to x′ = gθ(z′, t′), yielding tuples {(x′, t′, z′)}.
  • Discriminator: D(m(x, t), z) distinguishes encoded tuples from decoded ones.

Li, Jiang, Marlin (2019). MisGAN: Learning from Incomplete Data with GANs.
Donahue, et al. (2016). Adversarial Feature Learning (BiGAN).

SLIDE 19

Invertibility Property of P-BiGAN

gθ(z, t) is the shorthand notation for [f(ti)]_{i=1}^{|t|} with f = gθ(z).

Theorem: For (x, t) with non-zero probability, if z ∼ qφ(z|x, t), then gθ(z, t) = x.

SLIDE 20

Autoencoding Regularization for P-BiGAN

[Diagram: P-BiGAN with autoencoding regularization. In addition to the adversarial game against D, the encoder qφ and decoder gθ are trained with a reconstruction loss ℓ(x, x̂) between x and its reconstruction x̂ = gθ(z, t) with z ∼ qφ(z|x, t).]

SLIDE 21

Missing Data Imputation

Imputation: p(x′|t′, x, t) = E_{z∼qφ(z|x,t)} [pθ(x′|z, t′)]

Sampling: z ∼ qφ(z|x, t), f = gθ(z), x′ = [f(t′i)]_{i=1}^{|t′|}
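A sketch of this sampling procedure, reusing the placeholder `encoder` and `decoder` modules assumed in the P-VAE sketch above.

```python
# Monte Carlo imputation: draw z from the encoder posterior, decode
# the latent function, and read it out at the query times t'.
import torch

def impute(encoder, decoder, x, t, t_query, n_samples=10):
    mu, logvar = encoder(x, t)
    draws = []
    for _ in range(n_samples):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # z ~ q_phi(z|x,t)
        draws.append(decoder(z, t_query))  # x' = [f(t'_i)] with f = g_theta(z)
    return torch.stack(draws)              # samples from p(x' | t', x, t)
```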

SLIDE 22

Supervised Learning: Classification

Adding a classification term to the objective:

E_{z∼qφ(z|x,t)} [ log ( pz(z) pθ(x|z, t) p(y|z) / qφ(z|x, t) ) ]
  = E_{qφ(z|x,t)} [ log ( pz(z) pθ(x|z, t) / qφ(z|x, t) ) ]   (regularization)
  + E_{qφ(z|x,t)} [ log p(y|z) ]   (classification)

Prediction:

ŷ = argmax_y E_{qφ(z|x,t)} [ log p(y|z) ]
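A sketch of the combined loss; `classifier`, mapping z to class logits, is an assumed module, and the negative cross-entropy equals log p(y|z).

```python
# ELBO regularization plus the classification term E_q[log p(y|z)].
import torch
import torch.nn.functional as F

def supervised_loss(encoder, decoder, classifier, x, t, y, sigma=0.1):
    mu, logvar = encoder(x, t)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    f_t = decoder(z, t)
    log_lik = torch.distributions.Normal(f_t, sigma).log_prob(x).sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)
    log_py = -F.cross_entropy(classifier(z), y, reduction="none")  # log p(y|z)
    return (kl - log_lik - log_py).mean()
# Prediction: average classifier log-probabilities over z ~ q_phi, then argmax.
```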

SLIDE 23

MNIST Completion

[Figure: MNIST completions by P-VAE and P-BiGAN, under square observation with 90% missing and under independent dropout with 90% missing.]

SLIDE 24

CelebA Completion

[Figure: CelebA completions by P-VAE and P-BiGAN, under square observation with 90% missing and under independent dropout with 90% missing.]

SLIDE 25

Architecture for Irregularly-Sampled Time Series

How to construct the decoder, encoder, and discriminator for a continuous index set, e.g., time series with I = [0, T]?

[Diagrams: the generic P-VAE (encoder qφ, decoder gθ) and P-BiGAN (with discriminator D) from the previous slides; each component must now operate on a continuous index set.]

SLIDE 26

Decoder for Continuous Time Series

Generative process for time series:

  • z ∼ pz(z)
  • v = CNNθ(z)  (values on evenly-spaced times u)
  • f(t) = Σ_{i=1}^{L} K(ui, t) vi / Σ_{i=1}^{L} K(ui, t)  (kernel smoother)
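A hedged sketch of the kernel smoother; the slide does not fix K, so the RBF kernel and bandwidth below are illustrative choices, and `v` stands in for the CNN output.

```python
# Kernel-smoothing decoder: f(t) = sum_i K(u_i, t) v_i / sum_i K(u_i, t).
import torch

def smooth(v, u, t, bandwidth=0.05):
    K = torch.exp(-(t[:, None] - u[None, :]) ** 2 / (2 * bandwidth ** 2))
    return (K * v[None, :]).sum(-1) / K.sum(-1)

u = torch.linspace(0, 1, 64)  # evenly-spaced times u
v = torch.randn(64)           # stands in for v = CNN_theta(z)
t = torch.rand(20)            # arbitrary query times
f_t = smooth(v, u, t)         # f evaluated at t
```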

SLIDE 27

Continuous Convolutional Layer

δ(·) is the Dirac delta function.

[Figure: a CNN encoder/discriminator on an evenly-spaced grid u over the continuous index t, using a continuous kernel w(t).]

Cross-correlation between:

  • the continuous kernel w(t)
  • the masked function m(x, t)(t) = Σ_{i=1}^{|t|} xi δ(t − ti)

SLIDE 28

Continuous Convolutional Layer

Cross-correlation between w and m(x, t):

(w ⋆ m(x, t))(u) = Σ_{i: ti ∈ neighbor(u)} w(ti − u) xi

Construct the kernel w(t) using a degree-1 B-spline.
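A sketch of one channel of this layer. With a degree-1 B-spline (hat) kernel whose support equals the grid spacing, the cross-correlation at each grid point reduces to linearly binning every observation onto its two neighboring grid nodes; the kernel choice is from the slide, the implementation details are assumptions.

```python
# Continuous convolution of the Dirac train m(x, t) with a degree-1
# B-spline (hat) kernel, evaluated on the evenly-spaced grid u.
import torch

def cont_conv(x, t, u):
    L, du = len(u), u[1] - u[0]
    out = torch.zeros(L)
    idx = torch.clamp(((t - u[0]) / du).long(), 0, L - 2)  # left grid neighbor
    frac = (t - u[idx]) / du                   # position within the grid cell
    out.scatter_add_(0, idx, (1 - frac) * x)   # hat weight on left node
    out.scatter_add_(0, idx + 1, frac * x)     # hat weight on right node
    return out

u = torch.linspace(0, 1, 32)
out = cont_conv(x=torch.randn(5), t=torch.rand(5), u=u)
```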

SLIDE 30

Architecture Overview for Continuous Time Series

[Diagram: the generic P-VAE beside the P-VAE for continuous time series, where qφ is a CNN encoder over (x, t) and gθ is a CNN decoder followed by the kernel smoother.]

SLIDE 31

MIMIC-III Mortality Prediction

  • about 53,000 labeled examples
  • 12 irregularly-sampled physiological time series
  • average mortality rate: 8.10%

[Figure: the 12 physiological variables (SpO2, HR, RR, SBP, DBP, Temp, TGCS, CRR, UO, FiO2, Glucose, pH) over a 48-hour window.]

SLIDE 32

MIMIC-III Mortality Prediction

method            AUC (%)        time (hr)   params
GRU-D†            83.88 ± 0.65   0.11        2.6K
Latent ODE‡       85.71 ± 0.38   2.62        154.7K
Cont classifier   84.87 ± 0.18   0.03        30.5K
Cont P-VAE        85.13 ± 0.43   0.04        64.8K
Cont P-BiGAN      86.02 ± 0.38   0.22        73.2K

†Che, et al. (2018). RNNs for Multivariate Time Series with Missing Values.
‡Rubanova, et al. (2019). Latent ODEs for Irregularly-Sampled Time Series.

SLIDE 33

Summary

  • Transforming the modeling of irregularly-sampled time series into a missing data problem
  • An encoder-decoder framework for the missing data problem
    • Partial VAE
    • Partial BiGAN
  • Scalable architecture for modeling continuous time series
    • Kernel smoothing decoder
    • Continuous convolutional layer

SLIDE 34

Appendix

SLIDE 35

Why Stochastic Encoders?

[Figure: imputation by a model trained with a 2-D latent code.]

SLIDE 36

Why Stochastic Encoders?


Different incomplete cases carry different levels of uncertainty

SLIDE 37

Synthetic Multivariate Time Series

Generative process:

  a ∼ N(0, 10²)
  b ∼ Uniform(0, 10)
  f1(t) = 0.8 sin(20(t + a) + sin(20(t + a)))
  f2(t) = −0.5 sin(20(t + a + 20) + sin(20(t + a + 20)))
  f3(t) = sin(12(t + b))

Observation time points are drawn from a homogeneous Poisson process with rate λ = 30 within [d, d + 0.25], where d ∼ Uniform(0, 0.75).
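The process above translates directly into NumPy; only the random seed is an added assumption.

```python
# Simulate one synthetic multivariate case exactly as specified above.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(0, 10)        # a ~ N(0, 10^2)
b = rng.uniform(0, 10)       # b ~ Uniform(0, 10)

f = [lambda t: 0.8 * np.sin(20 * (t + a) + np.sin(20 * (t + a))),
     lambda t: -0.5 * np.sin(20 * (t + a + 20) + np.sin(20 * (t + a + 20))),
     lambda t: np.sin(12 * (t + b))]

d = rng.uniform(0, 0.75)                      # window start
n = rng.poisson(30 * 0.25)                    # Poisson process, rate 30
t_obs = np.sort(rng.uniform(d, d + 0.25, n))  # times in [d, d + 0.25]
x_obs = np.stack([fi(t_obs) for fi in f])     # the 3 channels at t_obs
```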

SLIDE 38

Synthetic Multivariate Time Series

[Figure: imputations on the synthetic data, comparing Cont P-VAE and Cont P-BiGAN against the data.]

SLIDE 39

Synthetic Multivariate Time Series

Random time series generation:

[Figure: randomly generated time series from Cont P-VAE and Cont P-BiGAN.]

SLIDE 40

MisGAN: GAN for Missing Data

[Diagram: MisGAN. The generator G produces complete data, which is masked out before the discriminator D compares it against the incomplete training data.]

Li, Jiang, Marlin (2019). MisGAN: Learning from Incomplete Data with GANs.

SLIDE 41

On the Independence Assumption

For the most general case, without the independence assumption, we use the following generative process for an incomplete case (x, t):

z ∼ pz(z),  t ∼ pI(t|z),  x = gθ(z, t)

This encodes a dependency between t and x when z is unobserved.

SLIDE 42

On the Independence Assumption

Generative process for an incomplete case (x, t): z ∼ pz(z), t ∼ pI(t|z), x = gθ(z, t).

P-VAE:

max_{φ,θ,τ} E_{(x,t)∼pD} E_{qφ(z|x,t)} [ log ( pz(z) pI(t|z) ∏_{i=1}^{|t|} pθ(xi|z, ti) / qφ(z|x, t) ) ]

P-BiGAN:

min_{θ,φ,τ} max_D  E_{(x,t)∼pD} E_{z∼qφ(z|x,t)} [ log D(x, t, z) ]
                 + E_{z∼pz(z)} E_{t∼pI(t|z)} [ log(1 − D(gθ(z, t), t, z)) ]

τ denotes the parameters of pI(t|z). For P-BiGAN, pI(t|z) can be stochastic or deterministic.