

  1. Tensor Decomposition for Healthcare Analytics
     Matteo Ruffini
     Laboratory for Relational Algorithmics, Complexity and Learning (UPC)
     matteo.ruffini@estudiant.upc.edu
     November 5, 2017

  2. Overview
     1 Clustering
     2 Mixture Model Clustering
       - Tensor Decomposition
       - Mixture of independent Bernoulli
     3 Applications to Healthcare Analytics
       - Data and objectives
       - Results

  3. Overview
     Task: segment patients into groups with similar clinical profiles.
     1 Similar patients → similar care.
     2 Find recurrent comorbidities.
     3 Assign and plan resources: drugs and doctors.
     Data: Electronic Health Records (EHR).
     Objective: use these data to create clusters of patients.

  5. Example: ICD-9 EHR
     In the ICD code, each disease is associated with a number:
     278 → Obesity, 401 → Hypertension
     Records: a list of patients with their diseases → a patient-disease matrix.

     Records                        Diseases    820  401  278  560
     Patient 1: 820, 401            Patient 1    1    1    0    0
     Patient 2: 401, 278            Patient 2    0    1    1    0
     Patient 3: 560, 820, 278       Patient 3    1    0    1    1

     Objective: cluster the rows of the patient-disease matrix.
     Sparse and high-dimensional data.
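The records-to-matrix step above can be sketched in a few lines of Python (the variable names and the small example records are illustrative, not from the presentation):

```python
import numpy as np

# Each record lists the ICD-9 codes observed for one patient.
records = {
    "Patient 1": [820, 401],
    "Patient 2": [401, 278],
    "Patient 3": [560, 820, 278],
}

# Fix a column order over all codes appearing in the records.
codes = sorted({c for cs in records.values() for c in cs})

# Binary patient-disease matrix: entry (i, j) = 1 iff patient i has code j.
X = np.array([[int(c in pcodes) for c in codes]
              for pcodes in records.values()])
```

For real EHR data the number of distinct codes is large while each patient has few of them, which is exactly the sparse, high-dimensional setting the slide mentions.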

  7. Clustering
     Clustering: one of the fundamental tasks of Machine Learning.
     Objective: partition a dataset of N samples into coherent subsets.
     Dataset: a matrix X ∈ R^{N×n} with rows X^(i) = (x_1^(i), ..., x_n^(i)).
     Group together similar rows.
     Standard methods: k-means, k-medoids, single linkage, ...
     Distance-based: poor performance on high-dimensional sparse data.
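A minimal distance-based baseline of the kind listed above can be sketched in pure numpy (an illustration, not the presentation's method; the deterministic farthest-point initialization is a choice made here, plain k-means usually initializes randomly):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm: assign each row to its nearest centroid,
    then recompute each centroid as the mean of its assigned rows."""
    X = X.astype(float)
    # Farthest-point initialization: start from row 0, then repeatedly
    # add the row farthest from the centroids chosen so far.
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Distance of every row to every centroid, then nearest assignment.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated groups of points: easy for k-means. On sparse,
# high-dimensional binary matrices such distances are far less informative.
X = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
labels, _ = kmeans(X, k=2)
```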

  8. Mixture Models
     Definition (Mixture Model)
     Y ∈ {1, ..., k}: a latent discrete variable.
     X = (x_1, ..., x_n): observable, depends on Y.
     P(X) = Σ_{i=1}^k P(Y = i) P(X | Y = i)
     The x_i are called features.
     Generative process for one sample:
     1 Draw Y, obtaining Y = i ∈ {1, ..., k}.
     2 Draw X ∈ R^n ∼ P(X | Y = i).
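The two-step generative process can be sketched numerically as follows (a sketch with made-up parameter values; independent Bernoulli conditionals are chosen here purely for illustration, anticipating the Bernoulli mixture of a later slide):

```python
import numpy as np

rng = np.random.default_rng(0)

k, n, N = 3, 5, 1000
omega = np.array([0.5, 0.3, 0.2])   # mixing weights P(Y = i)
M = rng.uniform(size=(n, k))        # column j holds E[X | Y = j]

# Step 1: draw the latent component Y for every sample.
Y = rng.choice(k, size=N, p=omega)

# Step 2: draw X from the conditional distribution P(X | Y).
# Here each feature is an independent Bernoulli(mu_{i,Y}).
X = (rng.uniform(size=(N, n)) < M[:, Y].T).astype(int)
```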

  10. Mixture Model Clustering
     Clustering: from an outcome of X (observed), infer the outcome of Y (unknown): k clusters.
     Parameters characterizing a mixture model:
     ω_h := P(Y = h),   ω := (ω_1, ..., ω_k)^⊤,   Ω := diag(ω)
     M = (μ_{i,j})_{i,j} = [μ_1 | ... | μ_k] ∈ R^{n×k},   μ_{i,j} = E(x_i | Y = j)
     If the conditional distributions and the model parameters are known:
     P(Y = j | X, M, ω) ∝ P(X | Y = j, M) ω_j
     Cluster(X) = argmax_{j=1,...,k} P(Y = j | X, M, ω)
     It is crucial to know the parameters of the model (M, ω).

  12. Mixture of Independent Bernoulli
     Observables are binary and conditionally independent: x_i ∈ {0, 1}.
     The expectations coincide with the probability of a positive outcome:
     μ_{i,j} = P(x_i = 1 | Y = j)
     P(Y = j | X) ∝ ω_j ∏_{i=1}^n μ_{i,j}^{x_i} (1 − μ_{i,j})^{1−x_i}
     Clustering rule:
     Cluster(X) = argmax_{j=1,...,k} ω_j ∏_{i=1}^n μ_{i,j}^{x_i} (1 − μ_{i,j})^{1−x_i}
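The clustering rule above can be implemented directly; working in log space avoids underflow when n is large (a sketch with made-up toy parameters):

```python
import numpy as np

def cluster(X, M, omega, eps=1e-12):
    """Assign each binary row of X to
    argmax_j omega_j * prod_i mu_ij^x_i (1 - mu_ij)^(1 - x_i),
    computed via logarithms for numerical stability.
    M has shape (n, k) with M[i, j] = P(x_i = 1 | Y = j)."""
    logpost = (X @ np.log(M + eps)
               + (1 - X) @ np.log(1 - M + eps)
               + np.log(omega + eps))          # (N, k) up to a constant
    return logpost.argmax(axis=1)

# Toy example: component 0 favours the first two features,
# component 1 the last two.
M = np.array([[0.9, 0.1],
              [0.9, 0.1],
              [0.1, 0.9],
              [0.1, 0.9]])
omega = np.array([0.5, 0.5])
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
print(cluster(X, M, omega))  # → [0 1]
```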

  13. Mixture Model Clustering: sum up
     Advantages:
     - Robust to irrelevant features: P(x_i) = P(x_i | Y = j).
     - Algorithms with provable guarantees of optimality.
     Disadvantages:
     - Relies on a model assumption about reality.
     To sum up, two steps:
     1 Estimate the parameters of the mixture.
     2 Group similar elements together, using Bayes' theorem.

  16. Learning mixture parameters

  17. Maximum Likelihood Estimate
     Standard method: Maximum Likelihood.
     Find parameters Θ = (M, ω) maximizing the likelihood on X ∈ R^{N×n}:
     max_Θ P(X | Θ) = max_Θ ∏_{i=1}^N Σ_{j=1}^k P(X^(i) | Y = j, M) ω_j
     Maximizing this is hard: in general there are no closed-form solutions.
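For the Bernoulli mixture of the earlier slide, the objective above can at least be evaluated, even though maximizing it is hard. A sketch (the stable log-sum-exp trick over components is a standard implementation choice, not from the slides):

```python
import numpy as np

def log_likelihood(X, M, omega, eps=1e-12):
    """log prod_i sum_j P(X^(i) | Y = j, M) omega_j for a Bernoulli mixture.
    M has shape (n, k); X has binary rows of length n."""
    # Per-sample, per-component log joint probability: shape (N, k).
    logj = (X @ np.log(M + eps)
            + (1 - X) @ np.log(1 - M + eps)
            + np.log(omega + eps))
    # Stable log-sum-exp over components, then sum over samples.
    m = logj.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(logj - m).sum(axis=1))).sum())

# Tiny check by hand: P(X) = 0.5*0.9*0.9 + 0.5*0.1*0.1 = 0.41
M = np.array([[0.9, 0.1],
              [0.1, 0.9]])
omega = np.array([0.5, 0.5])
X = np.array([[1, 0]])
```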

  18. Expectation Maximization (EM)
     Iterative algorithm from [Dempster et al. (1977)]:
     1 Randomly initialize (M, ω).
     2 Cluster the samples.
     3 Use the clusters to recalculate (M, ω).
     4 Iterate steps 2 and 3 until convergence.
     Pros and cons:
     - Iteratively increases the likelihood.
     - No guarantee of reaching the global optimum.
     - EM is slow.
     - The quality of the results depends on the initialization:
       good starting points → good outputs.
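Steps 1-4 can be sketched for the Bernoulli mixture as follows (a sketch, not the presentation's code; the E-step uses soft responsibilities rather than the hard clustering of step 2, which is the standard EM variant):

```python
import numpy as np

def em_bernoulli(X, k, iters=100, seed=0, eps=1e-9):
    """EM for a mixture of independent Bernoullis with soft assignments."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    # Step 1: random initialization of (M, omega).
    M = rng.uniform(0.25, 0.75, size=(n, k))
    omega = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step (step 2): responsibilities r[i, j] = P(Y = j | X^(i)).
        logr = (X @ np.log(M + eps)
                + (1 - X) @ np.log(1 - M + eps)
                + np.log(omega + eps))
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step (step 3): recompute (M, omega) from the responsibilities.
        Nk = r.sum(axis=0)
        M = (X.T @ r) / (Nk + eps)
        omega = Nk / N
    return M, omega

# Toy data with two clear-cut binary profiles.
X = np.array([[1, 1, 0, 0]] * 20 + [[0, 0, 1, 1]] * 20)
M, omega = em_bernoulli(X, k=2)
```

The dependence on initialization mentioned above is visible in practice: different seeds can converge to different local optima of the likelihood.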

  20. Alternative Approach: Tensor Decomposition
     A general approach, outlined in [Anandkumar et al., 2014].
     1 Estimate (recall: M = [μ_1 | ... | μ_k], μ_i = E[X | Y = i] ∈ R^n):
       M_1 := Mω ∈ R^n
       M_2 := M diag(ω) M^⊤ ∈ R^{n×n}
       M_3 := Σ_{i=1}^k ω_i μ_i ⊗ μ_i ⊗ μ_i ∈ R^{n×n×n}
     2 Retrieve (M, ω) with a tensor decomposition algorithm A:
       A(M_1, M_2, M_3) → (M, ω)
     Step 1 depends on the specific properties of the mixture.
     Step 2 is general (it needs assumptions on M).
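The three moments of step 1 can be written down directly from (M, ω) (a numpy sketch with made-up parameters; in practice M_1, M_2, M_3 are of course estimated empirically from the data, which is the mixture-specific part):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5
omega = np.array([0.5, 0.3, 0.2])
M = rng.uniform(size=(n, k))       # columns mu_1, ..., mu_k

M1 = M @ omega                     # vector in R^n
M2 = M @ np.diag(omega) @ M.T      # matrix in R^{n x n}
# M3 = sum_i omega_i mu_i (x) mu_i (x) mu_i : an n x n x n tensor.
M3 = np.einsum('i,ai,bi,ci->abc', omega, M, M, M)
```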

  24. Example: Mixture of Independent Gaussians
     Dataset X ∈ R^{N×n} with i.i.d. rows X^(i) = (x_1^(i), ..., x_n^(i)).
     Model settings:
     - x_h^(i) and x_l^(i) are conditionally independent ∀ h ≠ l.
     - x_h^(i) conditioned on Y is Gaussian, with known stdev σ:
       P(x_h | Y = i) ∼ N(μ_{h,i}, σ)
