
Unsupervised learning in medical imaging: discovering phenotypes and detecting anomalies
Johannes Hofmanninger (many slides by Georg Langs)
Medical University of Vienna, Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research


1. Maximum likelihood
• Given training data $x^{(i)}$ and a model $p_{model}(x; \theta)$ with parameters $\theta$, we choose the parameters that maximize the likelihood of the training examples:
$\theta^{*} = \arg\max_{\theta} \prod_{i=1}^{m} p_{model}(x^{(i)}; \theta)$
• In practice, we work in log-space:
$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{m} \log p_{model}(x^{(i)}; \theta)$
• The model approximates a distribution in the data/observation space: $p_{data}$
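
To make the log-space estimate concrete, here is a minimal sketch (not from the slides) that fits a univariate Gaussian by maximum likelihood; the names and toy data are hypothetical:

```python
import numpy as np

def log_likelihood(x, mu, sigma):
    # sum_i log p_model(x^(i); mu, sigma) for a univariate Gaussian model
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

x = 5.0 + 2.0 * np.random.randn(1000)     # toy training data from p_data
# For a Gaussian, the argmax of the log-likelihood has a closed form:
mu_star, sigma_star = x.mean(), x.std()   # the maximum-likelihood estimate
print(log_likelihood(x, mu_star, sigma_star))
```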

2. Explicit density models
$p_{model}(x; \theta) \approx p_{data}(x)$
• Explicit representation of the model density
• Examples: Gaussian mixture models, variational autoencoders
(Taxonomy: maximum likelihood → explicit density vs. implicit density)

3. Implicit density models
$p_{model}(x; \theta) \approx p_{data}(x)$
• An implicit density model can generate samples from its density without representing it explicitly: a generator maps a latent $z$ to samples
• Examples: GANs, GSN
(Taxonomy: maximum likelihood → explicit density vs. implicit density)

4. Example: clustering / Gaussian mixture model (GMM)
[Figure: observations vs. fitted model distribution]
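
A minimal sketch of this kind of example (assuming scikit-learn is available; the observations here are synthetic):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),    # synthetic observations
               rng.normal(4.0, 1.0, (200, 2))])   # from two clusters

gmm = GaussianMixture(n_components=2).fit(X)      # explicit model density
labels = gmm.predict(X)          # cluster assignment per observation
log_p = gmm.score_samples(X)     # log p_model(x) under the fitted mixture
```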

5. Taxonomy of generative models
Maximum likelihood
• Explicit density: tractable density, or approximate density (variational, Markov chain)
• Implicit density: direct, or Markov chain
Taxonomy following I. Goodfellow 2016

6. Taxonomy of generative models
Same taxonomy as above; the variational autoencoder sits in the explicit density → approximate density → variational branch.
Taxonomy following I. Goodfellow 2016

7. Taxonomy of generative models
The GAN sits in the implicit density → direct branch.
Taxonomy following I. Goodfellow 2016

8. Autoencoders
Ian Goodfellow 2016, GAN Tutorial, https://arxiv.org/abs/1701.00160

9. Convolutional neural network

10. Autoencoder
Figure from [Guo et al. 2017]

11. Autoencoder
A high-dimensional image representation $x$ is mapped by the encoder to a low-dimensional representation (bottleneck neurons) and reconstructed by the decoder as $\hat{x}$; the loss function is $\mathrm{MSE}(x, \hat{x})$.
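
A minimal PyTorch sketch of this encoder/bottleneck/decoder structure with an MSE loss; the layer sizes are assumptions, not the architecture on the slide:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim_in=784, dim_code=30):
        super().__init__()
        # Encoder: high-dimensional image -> low-dimensional code (bottleneck)
        self.encoder = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(),
                                     nn.Linear(256, dim_code))
        # Decoder: code -> reconstruction x_hat
        self.decoder = nn.Sequential(nn.Linear(dim_code, 256), nn.ReLU(),
                                     nn.Linear(256, dim_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(8, 784)                # toy minibatch of flattened images
loss = nn.MSELoss()(model(x), x)      # MSE(x, x_hat)
loss.backward()
```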

12. Stacked autoencoder
Layer-wise pretraining, then fine-tuning

13. #TBT ... Lung pattern classification [Schlegl et al. MICCAI-MCV 2014]

14. Example: faces
Input, output of an autoencoder (30-dimensional code), and output of PCA (30 dimensions)
Figure from [Hinton & Salakhutdinov, Science 2006]

15. The code layer represents structure
• Autoencoder with layer sizes 784-1000-500-250-2
• Look at the code layer
Figure from [Hinton & Salakhutdinov, Science 2006]

16. Variational autoencoder
Loss function: $\mathrm{MSE}(x, \hat{x}) + \beta \cdot \mathrm{KL}(q(z \mid x) \,\|\, \mathcal{N}(0, 1))$
(a reconstruction term plus a term enforcing properties of the latent space)
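
A sketch of this loss under the usual diagonal-Gaussian posterior assumption, where the KL term against $\mathcal{N}(0, 1)$ has a closed form; the names are hypothetical:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term: MSE(x, x_hat)
    rec = F.mse_loss(x_hat, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian q(z|x)
    kl = -0.5 * torch.sum(1.0 + log_var - mu.pow(2) - log_var.exp())
    return rec + beta * kl
```

Here `mu` and `log_var` would be the encoder outputs parameterizing $q(z \mid x)$.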

17. Variational autoencoder
A generative model: we can sample new cases

18. Generative adversarial networks
Goodfellow et al. 2014, NIPS, arXiv:1406.2661; Ian Goodfellow 2016, GAN Tutorial, arXiv:1701.00160

19. A generative model generates observations from a latent variable
Generator: $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$
• $z$ has a prior in the latent space: $z \sim p_z(z)$, $z \in \mathcal{Z}$
• $G(\cdot; \theta^{(G)})$ implicitly defines a model distribution $p_{model}(x; \theta^{(G)})$

20. A generative model generates observations from a latent variable
Generator: $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$
How do we train it to become good at sampling? A game:
• The generator generates fakes
• The discriminator has to tell fakes and real examples apart

21. A generative model generates observations from a latent variable
Generator: $G: \mathcal{Z} \to \mathcal{X}$; discriminator: $D: \mathcal{X} \to \mathbb{R}$
$D(\cdot; \theta^{(D)})$ scores how real or fake a sample looks

22. Adversarial learning
Generator $G$: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$
Discriminator $D$: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
• $z$: latent variable; $x$: observed variable (e.g., an image); $d$: decision whether the input image is real or fake
• Both the generator and the discriminator are differentiable

23. Adversarial learning
The discriminator learns to discriminate between real examples and generated samples: minimize $J^{(D)}$ by changing $\theta^{(D)}$

24. Adversarial learning
The discriminator's primary purpose is to provide the generator's cost function with a reward signal for evaluating the quality of the generated samples

25. Adversarial learning
The generator learns to generate samples that are hard to discern from real examples; its cost function is penalized by the discriminator

26. Training: simultaneous stochastic gradient descent (SGD)
Two minibatches per iteration:
• $z$ values drawn from the model prior in $z$-space, from which $x$ is generated
• $x$ drawn from the training example set
Two gradient steps:
• Update $\theta^{(D)}$ to get better at discriminating generated from real data
• Update $\theta^{(G)}$ to minimize $J^{(G)}$, which can be, e.g., $J^{(G)} = -J^{(D)}$
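
A toy PyTorch sketch of this simultaneous-SGD loop (the networks and data are placeholders; the generator step uses the common non-saturating variant rather than the literal $J^{(G)} = -J^{(D)}$):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    x_real = torch.randn(32, 2) + 3.0   # minibatch from the training set
    z = torch.randn(32, 16)             # minibatch from the prior in z-space

    # Step 1: update theta^(D) to better discriminate real from generated
    d_loss = (bce(D(x_real), torch.ones(32, 1))
              + bce(D(G(z).detach()), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Step 2: update theta^(G) so that generated samples fool D
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```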

27. Training
Random samples in $z$-space (drawn from the prior) → $G$ → generated "fake" observations $x$; these and real examples $x$ are fed to $D$, which decides real/fake.
A minimax game using a value function $V(\theta^{(D)}, \theta^{(G)}) = -J^{(D)}(\theta^{(D)}, \theta^{(G)})$:
$\arg\min_{\theta^{(G)}} \max_{\theta^{(D)}} V(\theta^{(D)}, \theta^{(G)})$

28. Training to reach equilibrium
This is a game in which each player wishes to minimize a cost function that depends on the parameters of both players, while only having control over its own parameters. The solution to this game is a Nash equilibrium: a tuple $(\theta^{(D)}, \theta^{(G)})$ such that $J^{(G)}$ is at a local minimum w.r.t. $\theta^{(G)}$ and $J^{(D)}$ is at a local minimum w.r.t. $\theta^{(D)}$.

29. Example: adversarial learning in 1D
$p_{data}$ vs. $p_{model}$: initialization, updated $D$, updated $G$, equilibrium
Figure from Goodfellow et al. 2014, Generative Adversarial Nets, arXiv:1406.2661

30. Deep convolutional GANs (DCGAN)
Generator: $z$ → DeConv 4×4×1024 → DeConv 8×8×512 → DeConv 16×16×256 → DeConv 32×32×128 → $x$ (64×64×3); the discriminator maps $x$ to a decision $d$.
Goodfellow et al. 2014; Radford et al. 2015
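
A sketch of this generator in PyTorch, following the slide's resolution progression; the 100-dimensional prior and the kernel/stride settings are the usual DCGAN choices, assumed here:

```python
import torch
import torch.nn as nn

# z (100-D) -> 4x4x1024 -> 8x8x512 -> 16x16x256 -> 32x32x128 -> 64x64x3
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 1024, 4, 1, 0), nn.BatchNorm2d(1024), nn.ReLU(),
    nn.ConvTranspose2d(1024, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh(),
)
x = generator(torch.randn(1, 100, 1, 1))   # -> torch.Size([1, 3, 64, 64])
```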

31. Learning the distribution of data
• We learn a manifold of plausible data
• We can produce plausible data
[Goodfellow et al. 2014, Generative Adversarial Nets]; figure from [Karras et al. 2017]

32. Disentangling concepts: vector arithmetic in $z$-space
In $z$-space, vector arithmetic is feasible to some extent
Radford et al. 2015
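
A sketch of the idea, reusing the `generator` from the DCGAN sketch above; with a trained model the $z$ codes would be averaged over samples showing each concept (here they are random, purely for illustration):

```python
import torch

z_smiling_woman = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)
z_neutral_woman = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)
z_neutral_man = torch.randn(3, 100, 1, 1).mean(0, keepdim=True)

z = z_smiling_woman - z_neutral_woman + z_neutral_man
x = generator(z)   # with trained codes, tends to depict a smiling man
```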

33. Conditional GANs (cGANs)
• A condition $c$ is an additional input to both the generator and the discriminator
• Unconditional: $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$
• Conditional: $G: (\mathcal{Z}, \mathcal{C}) \to \mathcal{X}$, $G: (z, c) \mapsto x$ and $D: (x, c) \mapsto d$, modeling $p(x \mid c)$

34. Conditional GAN
Generator: $(z, c) \mapsto x$; discriminator: $(x, c) \mapsto d$
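
A minimal sketch of one common way to implement the conditioning, concatenating $c$ to the inputs of both networks (the dimensions are toy assumptions):

```python
import torch
import torch.nn as nn

n_z, n_c, n_x = 16, 10, 2   # latent, condition (e.g. one-hot label), data dims

G = nn.Sequential(nn.Linear(n_z + n_c, 64), nn.ReLU(), nn.Linear(64, n_x))
D = nn.Sequential(nn.Linear(n_x + n_c, 64), nn.ReLU(),
                  nn.Linear(64, 1), nn.Sigmoid())

z = torch.randn(8, n_z)
c = torch.eye(n_c)[torch.randint(n_c, (8,))]   # one-hot conditions
x_fake = G(torch.cat([z, c], dim=1))           # sample from p(x | c)
d = D(torch.cat([x_fake, c], dim=1))           # real/fake score given c
```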

35. Conditional GANs: image generation from labels
Odena et al. 2016, Conditional Image Synthesis, arXiv:1610.09585

36. Image-to-image translation
• Map from image $c$ to image $x$
• Use an image $c$ as the condition for the generator and the discriminator
• Generator: encoder-decoder with U-Net-style skip connections; discriminator scores the pair $(x, c)$
Isola et al. 2016, Image-to-Image Translation, https://arxiv.org/abs/1611.07004

37. Image-to-image translation: label map to image
https://phillipi.github.io/pix2pix/
Isola et al. 2016, Image-to-Image Translation, https://arxiv.org/abs/1611.07004

38. Problem: mode collapse
Instead of covering the entire data distribution, the generator has extremely reduced output diversity, hopping from one narrow mode to the next while the discriminator catches up.
Arjovsky et al. 2017; Metz et al. 2016

39. Wasserstein GANs (WGANs)
• A critic instead of a discriminator: instead of a divergence, we use an approximation of the earth mover's (EM) distance
• If the data actually lies on a low-dimensional manifold, the divergence can saturate and gradients can vanish
• The Wasserstein distance, as an EM-distance approximation, does not suffer from this
• Less prone to mode collapse
Arjovsky et al. 2017, theory: arXiv:1701.04862; figure from Arjovsky et al. 2017, Wasserstein GAN, arXiv:1701.07875v3
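
A sketch of the critic update with the weight clipping used in the original WGAN paper (the critic and optimizer settings are toy placeholders):

```python
import torch
import torch.nn as nn

# The critic outputs an unbounded score: no sigmoid at the end.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(x_real, x_fake, clip=0.01):
    # Maximize E[f(real)] - E[f(fake)]: an approximation of the EM distance
    loss = -(critic(x_real).mean() - critic(x_fake.detach()).mean())
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    # Weight clipping crudely enforces the Lipschitz constraint
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)

# The generator step then minimizes -E[f(fake)].
```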

40. Detecting anomalies with GANs
Work by Thomas Schlegl et al., https://www.cir.meduniwien.ac.at/team/thomas-schlegl
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

41. Detect anomalies by having a good model of normal
Fit a model to normal data; apply it to unseen data and look at the residual: anomalies stand out.

42. Normality mapping of a query image
A query image $x$ is mapped to the latent space by backpropagation: starting from a random $z$, optimize $z$ via a loss between the generated image $G(z)$ and the query image. Normal query images map well into the latent space; anomalous ones do not.
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

43. Normality mapping: ingredient 1, residual loss
$\mathcal{L}_R(z_\gamma) = \sum | x - G(z_\gamma) |$
(the pixelwise difference between the query image $x$ and the generated image $G(z_\gamma)$)
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

44. Normality mapping: ingredient 2, discrimination loss
$\mathcal{L}_D(z_\gamma) = -\log D(G(z_\gamma))$
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

45. Normality mapping: ingredient 2 (revised), feature matching [Salimans et al., 2016]
$\mathcal{L}_D(z_\gamma) = \sum | f(x) - f(G(z_\gamma)) |$, where $f(\cdot)$ is an intermediate feature representation of the discriminator
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

46. Normality mapping: combined loss function
$\mathcal{L}(z_\gamma) = (1 - \lambda) \cdot \mathcal{L}_R(z_\gamma) + \lambda \cdot \mathcal{L}_D(z_\gamma)$
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921
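
A sketch of the whole mapping: with a trained GAN, only $z$ is optimized by backpropagation under this combined loss. `G` and the discriminator feature map `f` are assumed to come from training (e.g., the DCGAN sketch above); the hyperparameters are illustrative:

```python
import torch

def map_to_latent(x_query, G, f, n_steps=500, lam=0.1, lr=0.1):
    # Start from a random point in z-space and refine it by backpropagation
    z = torch.randn(1, 100, 1, 1, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        x_gen = G(z)
        loss_r = (x_query - x_gen).abs().sum()        # residual loss L_R
        loss_d = (f(x_query) - f(x_gen)).abs().sum()  # feature-matching L_D
        loss = (1 - lam) * loss_r + lam * loss_d      # combined loss
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()
```

The final loss value then serves as the image-level anomaly score, and $|x - G(z)|$ as the residual image (next slide).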

47. Anomaly detection
1. Anomaly score for detecting anomalous images: $A(x) = (1 - \lambda) \cdot R(x) + \lambda \cdot D(x)$, combining a residual score $R(x)$ and a discrimination score $D(x)$ to decide 'normal' vs. 'anomalous'
2. Residual image for detecting anomalous regions within images: $x_R = | x - G(z_\Gamma) |$
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

48. Experiments: data
• Unsupervised GAN training: 270 OCT scans of 'healthy' subjects (non-fluid); 1,000,000 2D image patches
• Testing (detecting anomalies): 10 'healthy' and 10 'pathological' (macular fluid) OCT scans; in total 8,192 image patches
• Preprocessing → input data
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

49. Training process
Generated samples over epochs 1, 3, 5, 10, 20 (iterations 1, 1,000, 16,000)
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

50. Can the model generate realistic images?
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

51. Can the model generate similar images?
Query image vs. generated image, for the training set (normal), test set (normal), and test set (diseased)
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

52. Pixel-level detection of anomalies
Training set (normal), test set (normal), test set (diseased)
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

53. Pixel-level detection of anomalies: anomalous cases
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

54. Pixel-level detection of anomalies: normal cases
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

55. Image-level detection of anomalies: anomaly score components
ROC for the residual score and the discrimination score
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

56. Image-level detection of anomalies: ROC
Model comparison: CAE/aCAE [Pathak et al., 2016] and GAN-based variants vs. AnoGAN
Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

57. Identifying phenotypes
Routine radiology imaging data

58. The reason for big-data analytics
Why real-world datasets?
• Learn from a representative sample to identify robust marker patterns
• Capture natural variability
• Reality is highly variable, but training sets are limited: nobody has time to annotate thousands of cases, and inter-rater concordance may be low

59. Typical study data
• 100 cases (10 MB/case)
• Carefully selected, evaluated, annotated
• Homogeneous cohorts

60. Collected within one month: >4 TB of CT/MR data

61. Handling heterogeneity in real-life data: correspondence
• Algorithmic localization of anatomical structures
• Mapping and comparison of positions across individuals
• Tracking of positions over time
Hofmanninger et al. 2017

62. Multi-template normalization
Hofmanninger et al. 2017

63. Rich but unstructured information
... in clinical data: imaging + semantic information

64. Reports as weak annotations [Thomas Schlegl et al.]

65. Linking semantics and imaging to map reported markers to new images
• Machine learning can extract structured information from unstructured reports
• Link this to imaging data
• Algorithms can learn maps of findings based only on imaging data and reports
Hofmanninger & Langs 2015

66. Mapping report terms to imaging data
Image information can be used to capture variability in the data (algorithm)
Hofmanninger & Langs 2015

67. Mapping report terms to imaging data
Image information can be used to capture variability in the data (expert)
Hofmanninger & Langs 2015
