Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 7 - PowerPoint PPT Presentation

Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 7
Jan-Willem van de Meent (credit: David Blei)

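Review: K-means Clustering
Objective: Sum of Squares
$\mathrm{SSE} = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \, \lVert x_n - \mu_k \rVert^2$
• $z_n$: one-hot assignment
• $\mu_k$: center for cluster $k$
Alternate between two steps
1. Minimize SSE w.r.t. $z_n$
2. Minimize SSE w.r.t. $\mu_k$
[Figure: data points with cluster centers $\mu_1$, $\mu_2$, $\mu_3$]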
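The two alternating steps can be sketched in NumPy as follows. This is a minimal illustration rather than the lecture's own code: the function names, the random initialization from data points, and the stopping rule are assumptions.

```python
import numpy as np

def assign(X, mu):
    """Step 1: minimize SSE w.r.t. z_n by picking the nearest center."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
    return d2.argmin(axis=1)

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers at K distinct data points.
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        z = assign(X, mu)
        # Step 2: minimize SSE w.r.t. mu_k by averaging the assigned points.
        new_mu = mu.copy()
        for k in range(K):
            if np.any(z == k):  # keep the old center if a cluster is empty
                new_mu[k] = X[z == k].mean(axis=0)
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    z = assign(X, mu)
    sse = float(((X - mu[z]) ** 2).sum())
    return z, mu, sse
```

Each step can only lower (or keep) the SSE, which is why the alternation converges, although possibly to a local minimum that depends on the initialization.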

Review: Probabilistic K-means
Generative Model
$z_n \sim \mathrm{Discrete}(\pi)$
$x_n \mid z_n = k \sim \mathrm{Norm}(\mu_k, \Sigma_k)$
Questions
1. What is $\log p(X, z \mid \mu, \Sigma, \pi)$?
2. For what choice of $\pi$ and $\Sigma$ do we recover K-means?
Same as K-means when: $\Sigma_k = \sigma^2 I$, $\pi_k = 1/K$

Review: Probabilistic K-means
Assignment Update: hard (one-hot) assignments $z_{nk}$
Parameter Updates
$N_k := \sum_{n=1}^{N} z_{nk}$
$\pi = (N_1/N, \ldots, N_K/N)$
$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} z_{nk} \, x_n$
$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} z_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^\top$
Idea: Replace hard assignments with soft assignments

Review: Soft K-means
Soft Assignment Update: soft assignments $\gamma_{nk}$
Parameter Updates
$N_k := \sum_{n=1}^{N} \gamma_{nk}$
$\pi = (N_1/N, \ldots, N_K/N)$
$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x_n$
$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^\top$
Idea: Replace hard assignments with soft assignments

Review: Lower Bound on Log Likelihood
Derivation steps, in order:
1. (multiplication by 1)
2. (multiplication by 1)
3. (Bayes rule)
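One standard way to arrive at the bound is sketched below for an arbitrary distribution $q(z)$ over the latent assignments. The slide equations themselves are not preserved, so the ordering of the steps and the symbols $q$, $\mathcal{L}$, and $\mathrm{KL}$ are assumptions chosen to match the slide annotations.

```latex
\begin{align*}
\log p(x \mid \theta)
  &= \sum_z q(z) \log p(x \mid \theta)
     && \text{(multiplication by 1: } \textstyle\sum_z q(z) = 1\text{)} \\
  &= \sum_z q(z) \log \frac{p(x, z \mid \theta)}{p(z \mid x, \theta)}
     && \text{(Bayes rule)} \\
  &= \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)}
   + \sum_z q(z) \log \frac{q(z)}{p(z \mid x, \theta)}
     && \text{(multiplication by 1: } q(z)/q(z)\text{)} \\
  &= \mathcal{L}(q, \theta)
   + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big)
  \;\ge\; \mathcal{L}(q, \theta)
\end{align*}
```

Because the KL term is non-negative, $\mathcal{L}(q, \theta)$ is a lower bound on the log likelihood, and it is tight when $q(z)$ equals the posterior $p(z \mid x, \theta)$.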

Review: EM for Gaussian Mixtures
Generative Model
$z_n \sim \mathrm{Discrete}(\pi)$
$x_n \mid z_n = k \sim \mathrm{Norm}(\mu_k, \Sigma_k)$
Expectation Maximization
Initialize $\theta$
Repeat until convergence
1. Expectation Step
2. Maximization Step
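A minimal NumPy sketch of this loop, assuming the standard responsibilities $\gamma_{nk} \propto \pi_k \, \mathrm{Norm}(x_n; \mu_k, \Sigma_k)$ in the expectation step and the soft K-means parameter updates in the maximization step. The function names, the initialization, and the small ridge term `reg` added to each covariance are assumptions made for numerical stability, not part of the slides.

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    """Log density of Norm(mu, Sigma) evaluated at each row of X."""
    D = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, (X - mu).T)          # (D, N)
    maha = (sol ** 2).sum(axis=0)                 # squared Mahalanobis distances
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)

def em_gmm(X, K, n_iters=50, seed=0, reg=1e-6):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # Assumed initialization: random data points as means, shared data covariance.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.atleast_2d(np.cov(X.T)) + reg * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)
    log_liks = []
    for _ in range(n_iters):
        # Expectation step: responsibilities gamma_nk, computed in log space.
        log_p = np.stack(
            [np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k]) for k in range(K)],
            axis=1,
        )                                           # (N, K)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        gamma = np.exp(log_p - log_norm[:, None])
        log_liks.append(float(log_norm.sum()))      # log likelihood before the update
        # Maximization step: soft-count parameter updates.
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(D)
    return pi, mu, Sigma, gamma, log_liks
```

Tracking `log_liks` is a convenient convergence check: up to the small effect of `reg`, the sequence should not decrease.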

TOPIC MODELS
Borrowing from: David Blei (Columbia)

Word Mixtures
Idea: Model text as a mixture over words (ignore order)
Topics (word, probability):
• gene 0.04, dna 0.02, genetic 0.01, ...
• life 0.02, evolve 0.01, organism 0.01, ...
• brain 0.04, neuron 0.02, nerve 0.01, ...
• data 0.02, number 0.02, computer 0.01, ...

EM for Word Mixtures
Generative Model
Expectation Maximization
Initialize $\theta$
Repeat until convergence
1. Expectation Step
2. Maximization Step

EM for Word Mixtures
Generative Model
E-step: Update assignments
M-step: Update parameters

Topic Modeling
[Figure with three panels: Topics, Documents, Topic proportions and assignments]
Topics (word, probability):
• gene 0.04, dna 0.02, genetic 0.01, ...
• life 0.02, evolve 0.01, organism 0.01, ...
• brain 0.04, neuron 0.02, nerve 0.01, ...
• data 0.02, number 0.02, computer 0.01, ...
Key points:
• Each topic is a distribution over words
• Each document is a mixture over topics
• Each word is drawn from one topic distribution

Topic Modeling
[Same figure: Topics, Documents, Topic proportions and assignments, with the four example topics above, annotated with the model's distributions for words and topics]

EM for Topic Models (PLSI/PLSA*)
Generative Model
E-step: Update assignments
M-step: Update parameters
*(Probabilistic Latent Semantic Indexing, a.k.a. Probabilistic Latent Semantic Analysis)

Topic Models with Priors
Generative Model (with priors)
Maximum a Posteriori
E-step: Update assignments
M-step: Update parameters

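Latent Dirichlet Allocation (a.k.a. PLSI/PLSA with priors)
[Plate diagram]
• $\alpha$: proportions parameter
• $\theta_d$: per-document topic proportions
• $Z_{d,n}$: per-word topic assignment
• $W_{d,n}$: observed word
• $\beta_k$: topics
• $\eta$: topic parameter
• Plates: $N$ (words), $D$ (documents), $K$ (topics)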
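Reading the plate diagram as a generative process gives the sampling sketch below. It assumes symmetric Dirichlet priors with scalar `alpha` and `eta` and a fixed document length `N` for every document; these simplifications and the function name are illustrative assumptions.

```python
import numpy as np

def sample_lda_corpus(D, N, K, V, alpha=0.5, eta=0.5, seed=0):
    """Sample D documents of N words each from the LDA generative model."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)      # topics beta_k ~ Dir(eta)
    theta = rng.dirichlet(np.full(K, alpha), size=D)   # proportions theta_d ~ Dir(alpha)
    Z = np.empty((D, N), dtype=int)
    W = np.empty((D, N), dtype=int)
    for d in range(D):
        # Per-word topic assignments Z_{d,n} ~ Discrete(theta_d).
        Z[d] = rng.choice(K, size=N, p=theta[d])
        for n in range(N):
            # Observed words W_{d,n} ~ Discrete(beta_{Z_{d,n}}).
            W[d, n] = rng.choice(V, p=beta[Z[d, n]])
    return beta, theta, Z, W
```

Inference runs this process in reverse: only `W` is observed, and `beta`, `theta`, and `Z` must be estimated.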

Intermezzo: Dirichlet Distribution

Intermezzo: Conjugacy
Likelihood (discrete)
Prior (Dirichlet)
Question: What distribution is the posterior?
More examples: https://en.wikipedia.org/wiki/Conjugate_prior
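For a discrete likelihood with a Dirichlet prior, the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. A small sketch, with hypothetical helper names:

```python
import numpy as np

def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    return np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()
```

For example, a uniform prior `[1, 1, 1]` combined with counts `[3, 0, 1]` gives posterior parameters `[4, 1, 2]`, so the unseen second outcome keeps nonzero probability under the posterior mean.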

MAP estimation for LDA
Generative Model (with priors)
Maximum a Posteriori
E-step: Update assignments
M-step: Update parameters

Variational Inference
Idea: Maximize Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the KL divergence between the approximation $q$ and the posterior

Variational EM
Use Factorized Approximation for $q(z, \beta, \theta)$
Factors: Discrete (for $z$), Dirichlet (for $\beta$), Dirichlet (for $\theta$)
Variational E-step: Maximize w.r.t. $\phi$ (expectations closed form for Dirichlet distributions)
Variational M-step: Maximize w.r.t. $\lambda$ and $\gamma$ (analogous to MAP estimation)

Example Inference
[Figure: bar chart of probability (y-axis, 0.0 to 0.4) against topics (x-axis, 1 to 96)]

Example Inference
Top words for four inferred topics:
• human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
• evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
• disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
• computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations

Example Inference

Example Inference
Top words for four inferred topics:
• problem, problems, mathematical, number, new, mathematics, university, two, first, numbers, work, time, mathematicians, chaos, chaotic
• model, rate, constant, distribution, time, number, size, values, value, average, rates, data, density, measured, models
• selection, male, males, females, sex, species, female, evolution, populations, population, sexual, behavior, evolutionary, genetic, reproductive
• species, forest, ecology, fish, ecological, conservation, diversity, population, natural, ecosystems, populations, endangered, tropical, forests, ecosystem

Performance Metric: Perplexity
[Figure: two panels, "Nematode abstracts" and "Associated Press", plotting perplexity against number of topics for Smoothed Unigram, Smoothed Mixt. Unigrams, LDA, and Fold in pLSI]
$\mathrm{perplexity} = \exp\left\{ - \frac{\sum_d \log p(w_d)}{\sum_d N_d} \right\}$
$p(w_d)$: marginal likelihood (evidence) of held out documents
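The perplexity formula translates directly into code. In this sketch `log_probs` holds $\log p(w_d)$ for each held-out document and `doc_lengths` holds the word counts $N_d$; both argument names are assumptions, and computing $\log p(w_d)$ itself depends on the model and is not shown.

```python
import numpy as np

def perplexity(log_probs, doc_lengths):
    """exp of the negative per-word average log likelihood on held-out documents."""
    return float(np.exp(-np.sum(log_probs) / np.sum(doc_lengths)))
```

As a sanity check, a model that assigns uniform probability over a vocabulary of 100 words has perplexity 100 regardless of document length, so lower perplexity means better predictions than uniform guessing over that many words.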

Extensions of LDA
• EM inference (PLSA/PLSI) yields similar results to variational inference or MAP inference (LDA) on most data
• Reason for popularity of LDA: can be embedded in more complicated models

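Extensions: Supervised LDA
[Plate diagram: $\alpha$, $\theta_d$, $Z_{d,n}$, $W_{d,n}$, $\beta_k$, $Y_d$, and $\eta, \sigma^2$; plates $N$, $D$, $K$]
1. Draw topic proportions $\theta \mid \alpha \sim \mathrm{Dir}(\alpha)$.
2. For each word
   • Draw topic assignment $z_n \mid \theta \sim \mathrm{Mult}(\theta)$.
   • Draw word $w_n \mid z_n, \beta_{1:K} \sim \mathrm{Mult}(\beta_{z_n})$.
3. Draw response variable $y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^\top \bar{z}, \sigma^2)$, where $\bar{z} = (1/N) \sum_{n=1}^{N} z_n$.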
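The three steps can be sketched for a single document as follows. The function name and argument layout are hypothetical; `beta` is assumed to be a `(K, V)` matrix of topic-word probabilities, and $\bar{z}$ is computed as the empirical topic frequencies, which equals the average of the one-hot assignments.

```python
import numpy as np

def sample_slda_document(alpha, beta, eta, sigma2, N, rng):
    """Sample words and a response for one document under supervised LDA."""
    K, V = beta.shape
    theta = rng.dirichlet(alpha)                          # 1. topic proportions
    z = rng.choice(K, size=N, p=theta)                    # 2a. topic assignments
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # 2b. words
    zbar = np.bincount(z, minlength=K) / N                # average of one-hot z_n
    y = rng.normal(loc=eta @ zbar, scale=np.sqrt(sigma2)) # 3. response variable
    return w, z, zbar, y
```

The response depends on the document only through $\bar{z}$, so each entry of `eta` acts as a regression coefficient for how strongly a topic pushes the response up or down.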

Extensions: Supervised LDA
[Figure: ten topics placed along a horizontal axis with ticks from -30 to 20]
Top words per topic:
• least, problem, unfortunately, supposed, worse, flat, dull
• bad, guys, watchable, its, not, one, movie
• more, has, than, films, director, will, characters
• awful, featuring, routine, dry, offered, charlie, paris
• his, their, character, many, while, performance, between
• both, motion, simple, perfect, fascinating, power, complex
• have, like, you, was, just, some, out
• not, about, movie, all, would, they, its
• one, from, there, which, who, much, what
• however, cinematography, screenplay, performances, pictures, effective, picture

Extensions: Correlated Topic Model
[Plate diagram: $\mu$, $\Sigma$, $\eta_d$, $Z_{d,n}$, $W_{d,n}$, $\beta_k$; plates $N$, $D$, $K$]
• Nonconjugate prior on topic proportions
• Estimate a covariance matrix $\Sigma$ that parameterizes correlations between topics in a document
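A sketch of how such a prior can produce correlated topic proportions, assuming the logistic-normal construction usually associated with the correlated topic model: draw $\eta_d \sim \mathcal{N}(\mu, \Sigma)$ and map it to the simplex with a softmax. The function name and the softmax parameterization are assumptions, since the slide does not spell out the mapping.

```python
import numpy as np

def sample_ctm_proportions(mu, Sigma, size, seed=0):
    """Draw topic proportions by pushing Gaussian draws through a softmax."""
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(mu, Sigma, size=size)   # (size, K)
    eta = eta - eta.max(axis=1, keepdims=True)            # stabilize the exponentials
    theta = np.exp(eta)
    return theta / theta.sum(axis=1, keepdims=True)
```

Off-diagonal entries of `Sigma` let pairs of topics rise and fall together across documents, which a Dirichlet prior cannot express.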

Extensions: Dynamic Topic Models
Inaugural addresses
1789: "AMONG the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order..."
2009: "My fellow citizens: I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors..."
Track changes in word distributions associated with a topic over time.

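Extensions: Dynamic Topic Models
[Plate diagram: the LDA model repeated for time slices $1, 2, \ldots, T$; each slice has $\alpha$, $\theta_d$, $Z_{d,n}$, $W_{d,n}$ with plates $N$ and $D$, and topics $\beta_{k,1}, \beta_{k,2}, \ldots, \beta_{k,T}$ in plate $K$]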
