
Connectivity-Optimized Representation Learning via Persistent Homology



  1. ICML | 2019, Long Beach. Connectivity-Optimized Representation Learning via Persistent Homology.
     Christoph D. Hofer, Roland Kwitt (University of Salzburg), Mandar Dixit (Microsoft), Marc Niethammer (UNC Chapel Hill)

  2. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.

  3. Unsupervised representation learning
     Q: What makes a good representation?
     ◮ Ability to reconstruct (→ prevalence of autoencoders)
     ◮ Robustness to perturbations of the input
     ◮ Usefulness for downstream tasks (e.g., clustering or classification)
     ◮ etc.
     Common idea: control (or enforce) properties of (or on) the latent representations in Z.
     [Diagram: encoder f_θ : X → Z, latent space Z, decoder g_φ : Z → X, reconstruction loss Rec[x, x̂]]
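
For concreteness, below is a minimal sketch of this generic encoder/decoder setup in PyTorch; the architecture, dimensions, and the plain MSE reconstruction loss are illustrative assumptions, not the models used in the paper.

    # Minimal autoencoder sketch (illustrative architecture; not the paper's).
    import torch
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, in_dim=784, latent_dim=2):
            super().__init__()
            # f_theta : X -> Z
            self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                         nn.Linear(128, latent_dim))
            # g_phi : Z -> X
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                         nn.Linear(128, in_dim))

        def forward(self, x):
            z = self.encoder(x)          # latent code in Z
            x_hat = self.decoder(z)      # reconstruction
            return z, x_hat

    x = torch.randn(32, 784)                 # a batch of (flattened) inputs
    z, x_hat = AutoEncoder()(x)
    rec_loss = ((x - x_hat) ** 2).mean()     # Rec[x, x_hat] as plain MSE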

  4. Unsupervised representation learning
     Q: What makes a good representation? (same bullets as on the previous slide)
     Example family, adding a regularizer to the objective: Contractive AEs [Rifai et al., ICML '11]
     [Diagram: as before, with a regularization term Reg added to Rec[x, x̂]]

  5. Unsupervised representation learning
     Q: What makes a good representation? (same bullets as before)
     Example for robustness to perturbations of the input: Denoising AEs [Vincent et al., JMLR '10]
     [Diagram: as before, but the input is perturbed or partially zeroed-out before encoding; see the small sketch below]
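
A tiny illustration of the denoising idea, under the assumption of a random zero-out corruption and reusing the AutoEncoder sketch above (Vincent et al. consider their own corruption processes):

    # Tiny sketch of input corruption for a denoising AE (illustrative only).
    import torch

    def corrupt(x, drop_prob=0.3):
        mask = (torch.rand_like(x) > drop_prob).float()   # randomly zero-out entries
        return x * mask

    x = torch.randn(32, 784)
    x_noisy = corrupt(x)
    # z, x_hat = AutoEncoder()(x_noisy)      # encode/decode the corrupted input
    # rec_loss = ((x - x_hat) ** 2).mean()   # but reconstruct the clean x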

  6. Unsupervised representation learning
     Q: What makes a good representation? (same bullets as before)
     Example family, again adding a regularizer: Sparse AEs [Makhzani & Frey, ICLR '14]
     [Diagram: as before, with a regularization term Reg added to Rec[x, x̂]]

  7. Unsupervised representation learning
     Q: What makes a good representation? (same bullets as before; the list is by far not exhaustive)
     Example for usefulness in downstream tasks: Adversarial AEs [Makhzani et al., ICLR '16]
     [Diagram: as before; distributional properties of the latent space are enforced through adversarial training]

  8. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!

  9. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!
     Assume we want to do Kernel Density Estimation (KDE) in the latent space Z.
     [Figure: data (z_i) and the corresponding Gaussian KDE; bandwidth selection via Scott's rule [Scott, 1992]]

  10. Motivating (toy) example
     We aim to control properties of the latent space, but from a topological point of view!
     Assume we want to do Kernel Density Estimation (KDE) in the latent space Z.
     [Figure: two data sets (z_i) at very different scales, each with its Gaussian KDE under Scott's rule]
     Bandwidth selection can be challenging, as the scaling greatly differs!
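
A small sketch of this setup with scipy's Gaussian KDE; the latent samples here are synthetic stand-ins, not codes from a trained encoder.

    # Gaussian KDE with Scott's rule bandwidth on (synthetic) latent codes.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    z = rng.normal(size=(2, 500))               # latent codes, shape (dim, n_samples)

    kde = gaussian_kde(z, bw_method="scott")    # Scott's rule [Scott, 1992]
    density = kde(z)                            # density evaluated at the samples

    # The appropriate bandwidth depends heavily on the scale of the latent codes,
    # which the encoder is otherwise free to choose -- the difficulty the slide
    # points out.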

  11. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     [Figure: point cloud in the latent space Z]

  12. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_1 around the latent points]

  13. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_2 around the latent points]

  14. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: balls of radius r = r_3 around the latent points]
     ◮ PH tracks topological changes as the ball radius r increases
     ◮ Connectivity information is captured by 0-dimensional persistent homology
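
For the 0-dimensional case the slide refers to, the persistence information has a simple combinatorial description: every point is born at r = 0, and connected components merge at radii equal to the edge lengths of a minimum spanning tree of the pairwise-distance graph. A small sketch of that computation:

    # 0-dim. Vietoris-Rips persistence of a point cloud via a minimum spanning tree.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def zero_dim_death_times(points):
        d = squareform(pdist(points))                # pairwise distances
        mst = minimum_spanning_tree(d).toarray()     # nonzero entries = MST edges
        return np.sort(mst[mst > 0])                 # merge radii = death times

    pts = np.random.default_rng(1).normal(size=(30, 2))
    print(zero_dim_death_times(pts))                 # n-1 finite death times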

  15. Controlling connectivity
     Q: How do we capture topological properties, and what do we want to control?
     Vietoris-Rips persistent homology (PH)
     [Figure: a homogeneous arrangement of latent points, covered by balls of radius η/2]
     ◮ PH tracks topological changes as the ball radius r increases
     ◮ Connectivity information is captured by 0-dimensional persistent homology
     What if the encoder f_θ mapped the data to such a homogeneous arrangement (at scale η/2)? This would be beneficial for KDE.

  16. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     [Diagram: encoder f_θ : X → R^n, decoder g_φ : R^n → X, reconstruction loss Rec[·, ·]]

  17. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before; PH is computed on the encoded batch and a connectivity loss is added to Rec[·, ·]]

  18. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before; PH is computed on the encoded batch and a connectivity loss is added to Rec[·, ·]]
     L_η: penalize deviation from a homogeneous arrangement (with scale η)
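
One plausible concrete form of L_η, stated here as an assumption consistent with this slide rather than as the paper's exact definition: let d_1, ..., d_{B-1} be the 0-dimensional death times (merge radii) of the Vietoris-Rips filtration built on the encoded batch f_θ(x_1), ..., f_θ(x_B); then

    \mathcal{L}_{\eta} \;=\; \sum_{i=1}^{B-1} \bigl| d_i - \eta \bigr|,

which vanishes exactly when every connected-component merge happens at radius η, i.e., when the latent codes are homogeneously arranged at scale η.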

  19. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before; PH is computed on the encoded batch and a connectivity loss is added to Rec[·, ·]]
     L_η: penalize deviation from a homogeneous arrangement (with scale η)
     The connectivity loss provides a gradient signal back to the encoder.

  20. Connectivity loss
     Q: How can we control topological properties (connectivity properties in particular)?
     Consider batches (x_1, ..., x_B).
     [Diagram: as before; PH is computed on the encoded batch and a connectivity loss is added to Rec[·, ·]]
     L_η: penalize deviation from a homogeneous arrangement (with scale η)
     The connectivity loss provides a gradient signal back to the encoder.
     Until now, we could not backpropagate through PH.
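
To make the backpropagation point concrete, here is a hedged sketch of a differentiable connectivity loss (my own illustration, not the authors' implementation): the combinatorial choice of which pairs merge is made on detached distances, while the death values are read off the differentiable pairwise-distance matrix, so gradients reach the encoder through those distances.

    # Hedged sketch of a differentiable connectivity loss (illustrative only).
    import torch
    from scipy.sparse.csgraph import minimum_spanning_tree

    def connectivity_loss(z, eta):
        """z: (B, n) batch of latent codes; eta: target merge scale."""
        dist = torch.cdist(z, z)                    # differentiable (B, B) distances
        # Discrete part: which pairs merge (MST edges), computed without gradients.
        mst = minimum_spanning_tree(dist.detach().cpu().numpy()).tocoo()
        rows = torch.as_tensor(mst.row, dtype=torch.long, device=z.device)
        cols = torch.as_tensor(mst.col, dtype=torch.long, device=z.device)
        # Differentiable part: 0-dim death times are the lengths of those edges.
        deaths = dist[rows, cols]
        return (deaths - eta).abs().sum()

    z = torch.randn(100, 16, requires_grad=True)    # B = 100 encoded samples
    loss = connectivity_loss(z, eta=2.0)
    loss.backward()                                 # gradient signal for the encoder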

  21. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: encoder, decoder, reconstruction loss + connectivity loss via PH]
     (1) ... that, under mild conditions, the connectivity loss is differentiable.

  22. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: a batch x_1, ..., x_B passes through encoder and decoder; reconstruction loss + connectivity loss via PH]
     (1) ... that, under mild conditions, the connectivity loss is differentiable.
     (2) ... metric-entropy based guidelines for choosing the training batch size B.

  23. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: as before, but evaluated on x_1, ..., x_N with N ≫ B]
     (1) ... that, under mild conditions, the connectivity loss is differentiable.
     (2) ... metric-entropy based guidelines for choosing the training batch size B.
     (3) ... that "densification" effects occur for sample sizes N larger than the training batch size B.

  24. Connectivity loss
     From a theoretical perspective, we show ...
     [Diagram: as before, but evaluated on x_1, ..., x_N with N ≫ B]
     (1) ... that, under mild conditions, the connectivity loss is differentiable.
     (2) ... metric-entropy based guidelines for choosing the training batch size B.
     (3) ... that "densification" effects occur for sample sizes N larger than the training batch size B.
     Intuitively, during training ...
     ... the reconstruction loss controls what is worth capturing,
     ... the connectivity loss controls how to topologically organize the latent space.

  25. Experiments. Task: one-class learning
     [Diagram: an autoencoder f_θ, g_φ trained on auxiliary unlabeled data with Rec[·, ·] + connectivity loss (with fixed scale η, computed via PH)]
     Trained only once (e.g., on CIFAR-10 without labels).

  26. Experiments. Task: one-class learning
     [Diagram: as on the previous slide; trained only once (e.g., on CIFAR-10 without labels)]
     KDE-inspired one-class "learning": map the one-class samples through f_θ and place balls of radius r = η/2 around them.

  27. Experiments. Task: one-class learning
     [Diagram: as on the previous slide; trained only once (e.g., on CIFAR-10 without labels)]
     KDE-inspired one-class "learning": map the one-class samples through f_θ and place balls of radius r = η/2 around them.
     Computation of a one-class score: map in-class and out-of-class test samples through f_θ and count the number of samples falling into balls of radius η, anchored at the one-class instances.
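
A sketch of this counting score as I read the slide; note the slides mention both η/2 (for the arrangement) and η (for the score), so the radius used below is an assumption, and the latent codes are synthetic stand-ins.

    # Sketch of the ball-counting one-class score (illustrative only).
    import numpy as np

    def one_class_score(z_test, z_one_class, eta):
        """z_test: (m, n) test codes; z_one_class: (k, n) one-class codes."""
        # distances between each test code and each one-class anchor
        d = np.linalg.norm(z_test[:, None, :] - z_one_class[None, :, :], axis=-1)
        return (d <= eta).sum(axis=1)          # higher count -> more "in-class"

    rng = np.random.default_rng(0)
    z_one = rng.normal(size=(200, 16))             # encoded one-class samples
    z_in = rng.normal(size=(10, 16))               # in-class-like test codes
    z_out = rng.normal(loc=5.0, size=(10, 16))     # out-of-class test codes
    print(one_class_score(z_in, z_one, eta=4.0).mean(),
          one_class_score(z_out, z_one, eta=4.0).mean())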

  28. Results. Task: one-class learning
     CIFAR-10 (AE trained on CIFAR-100); training batch size: B = 100.
     [Bar chart: mean AUROC (roughly 0.5 to 0.8) for DAGMM, DSEBM, OC-SVM (CAE), Deep-SVDD, ADT, and Ours-120]
     References: ADT [Golan & El-Yaniv, NIPS '18], DAGMM [Zong et al., ICLR '18], DSEBM [Zhai et al., ICML '16], Deep-SVDD [Ruff et al., ICML '18]

  29. Results. Task: one-class learning
     CIFAR-10 (AE trained on CIFAR-100); training batch size: B = 100.
     [Bar chart: mean AUROC as before, with a "+7 points" annotation for Ours-120; low-sample-size comparison of ADT-1,000, ADT-500, ADT-120, and Ours-120]
     References: ADT [Golan & El-Yaniv, NIPS '18], DAGMM [Zong et al., ICLR '18], DSEBM [Zhai et al., ICML '16], Deep-SVDD [Ruff et al., ICML '18]

