SLIDE 1

Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan
Lecture 19: EM algorithm, Topic modeling
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Administrivia

  • HW4 due, HW5 out
  • Remember that we only count the highest 4 homework scores
  • Final project midpoint presentation
  • For the final project, each person will be asked to summarize what everyone in the team did

  • Contact information for printing

SLIDE 3

Second Month Survey


[Chart: second-month survey results shown alongside the first survey's results]

SLIDE 4

Second Month Survey

  • Conflicting opinions
  • wide variety of models, good explanations, good homeworks
  • Clarity of HW grading is the worst I have ever had for a class.
  • Depth of content covered
  • Course is too theory heavy
  • I liked that the instructor not only requested feedback often, but also acted upon the feedback, changing a few things about how the class and slides are presented.

SLIDE 5

Second Month Survey

  • Increase exam duration
  • The professor needs to slow down, and sacrifice some of the math subtleties and complexities in favor of concrete understanding of the topics.

  • Go into the weeds of the math less

SLIDE 6

Learning Objectives

  • Learn about Expectation-Maximization algorithm
  • Learn about latent Dirichlet allocation

SLIDES 7–17

Gaussian Mixture Models

[Figure: data points plotted in the x1–x2 plane, with axes running from −4 to 4]

SLIDE 18

Latent Variables

  • z’s correspond to the latent structure that we try to learn in unsupervised learning
  • From a modeling perspective, they are usually referred to as latent variables
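
One way to make this concrete (standard mixture-model notation, assumed here rather than taken from the extracted slides): in a Gaussian mixture, each point x is generated by first drawing a component z ∈ {1, …, K} with probability πz, then drawing x ∼ N(μz, Σz), so that p(x) = Σk πk N(x | μk, Σk). The z’s are never observed, which is what makes them latent.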

SLIDES 19–22

EM Algorithm

SLIDE 23

EM Algorithm

  • EM stands for Expectation-Maximization
  • A classic algorithm introduced by Dempster, Laird, and Rubin (1977)
  • An iterative method that alternates between an expectation (E) step and a maximization (M) step; a minimal sketch for a Gaussian mixture follows below
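
Below is a minimal sketch of EM for a one-dimensional Gaussian mixture, written to make the E-step/M-step loop concrete. The 1-D setting, variable names, and initialization are my assumptions for illustration, not the lecture's notation:

```python
# A minimal sketch of EM for a 1-D Gaussian mixture (illustrative; the
# 1-D setting and variable names are assumptions, not the lecture's notation).
import numpy as np

def em_gmm(x, K=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)   # initialize means from the data
    var = np.full(K, x.var())                   # shared initial variance
    pi = np.full(K, 1.0 / K)                    # uniform mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i, current params)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft counts
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

# Toy data: two well-separated Gaussians
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
print(em_gmm(x))
```

The E-step computes soft assignments (responsibilities) under the current parameters; the M-step re-estimates the parameters from those soft counts; iterating never decreases the data log-likelihood.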

SLIDES 24–35

EM Algorithm

SLIDE 36

GMM and K-means

SLIDE 37

GMMs and the EM algorithm

  • GMMs with the EM algorithm suffer from some of the same problems as K-Means:
  • Don’t really work with categorical data
  • Usually converge only to a local optimum of the likelihood
  • Have to determine the number of clusters in advance
  • Only generate convex clusters
  • But they also have certain advantages:
  • The clusters are allowed different shapes
  • We get a soft partitioning of the data (see the sketch below)
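
As a concrete illustration of the soft-vs-hard contrast, here is a hypothetical comparison using scikit-learn's KMeans and GaussianMixture (the library choice and toy data are mine, not the lecture's):

```python
# Hypothetical comparison (library choice is mine, not the lecture's):
# KMeans returns hard labels; GaussianMixture returns soft responsibilities.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], [1.0, 0.3], size=(200, 2)),  # elongated cluster
               rng.normal([3.0, 3.0], [0.5, 1.5], size=(200, 2))])

hard = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # one label per point
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft = gmm.predict_proba(X)                             # responsibilities in [0, 1]
print(hard[:3], soft[:3].round(2))
```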

SLIDE 38

Topic models

  • Discrete count data

SLIDE 39

Topic models

  • Suppose you have a huge number of documents
  • Want to know what’s going on
  • Can’t read them all (e.g., every New York Times article from the 90’s)
  • Topic models offer a way to get a corpus-level view of major themes
  • Unsupervised

SLIDE 40

Conceptual approach

  • Input: a text corpus and number of topics K
  • Output (a code sketch follows the corpus example below):
  • K topics, each topic is a list of words
  • Topic assignment for each document

Corpus (example New York Times headlines):
  • Forget the Bootleg, Just Download the Movie Legally
  • Multiplex Heralded As Linchpin To Growth
  • The Shape of Cinema, Transformed At the Click of a Mouse
  • A Peaceful Crew Puts Muppets Where Its Mouth Is
  • Stock Trades: A Better Deal For Investors Isn't Simple
  • The three big Internet portals begin to distinguish among themselves as shopping malls
  • Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens
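
To make this input/output contract concrete, here is a hypothetical sketch using scikit-learn's CountVectorizer and LatentDirichletAllocation (the library, the toy documents, and K=2 are my assumptions; the lecture does not prescribe an implementation):

```python
# A hypothetical sketch of the input/output contract with scikit-learn
# (the lecture does not prescribe a library; the toy documents are mine).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["download the movie legally", "stock trades for investors",
        "internet portals as shopping malls", "the shape of cinema"]
K = 2                                              # number of topics

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # document-word count matrix
lda = LatentDirichletAllocation(n_components=K, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):        # each topic: weights over words
    top = topic.argsort()[::-1][:5]
    print(f"TOPIC {k + 1}:", [vocab[i] for i in top])
print(lda.transform(X).round(2))                   # topic mixture per document
```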

SLIDE 41

Conceptual approach

  • K topics, each topic is a list of words

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 42

Conceptual approach

  • Topic assignment for each document

[Illustration: the corpus headlines shown earlier, each assigned to a topic; the topics are given human labels "TECHNOLOGY", "BUSINESS", and "ENTERTAINMENT"]

SLIDE 43

Topics from Science

SLIDE 44

Why should you care?

  • Neat way to explore/understand corpus collections
  • E-discovery
  • Social media
  • Scientific data
  • NLP Applications
  • Word sense disambiguation
  • Discourse segmentation
  • Psychology: word meaning, polysemy
  • A general way to model count data and a general inference algorithm

SLIDE 45

Topic models

  • Discrete count data
  • Gaussian distributions are not appropriate
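
Concretely (an illustrative example, not from the slides): a document’s word counts are non-negative integers that sum to the document length, e.g. {movie: 3, internet: 1, stock: 0}; a continuous Gaussian density puts mass on negative and fractional counts, while a multinomial is supported on exactly this kind of data.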

SLIDE 46

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

SLIDE 47

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]

SLIDE 48

Generative model: Latent Dirichlet Allocation

  • Generate a document, or a bag of words
  • Multinomial distribution
  • Distribution over discrete outcomes
  • Represented by a non-negative vector that sums to one
  • Picture representation
  • Come from a Dirichlet distribution

[Figure: points on the probability simplex, e.g. (1,0,0), (0,1,0), (0,0,1), (1/2,1/2,0), (1/3,1/3,1/3), (1/4,1/4,1/2)]
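
A minimal numpy sketch of this picture (an assumed illustration, not from the slides): a multinomial parameter vector drawn from a Dirichlet always lands on the simplex shown above.

```python
# A minimal numpy sketch (assumed illustration): a multinomial parameter
# vector drawn from a Dirichlet is non-negative and sums to one.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])      # Dirichlet parameter over 3 outcomes
phi = rng.dirichlet(alpha)             # a point on the simplex, like (1/4, 1/4, 1/2)
print(phi, phi.sum())                  # phi.sum() == 1.0

counts = rng.multinomial(10, phi)      # a bag of 10 discrete outcomes ("words")
print(counts)
```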

SLIDE 49

Generative story

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer

SLIDE 50

Generative story

[Illustration: the corpus headlines shown earlier, each labeled with its assigned topic (TOPIC 1, TOPIC 2, or TOPIC 3)]

SLIDE 51

Generative story

Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...

TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: play, film, movie, theater, production, star, director, stage
TOPIC 3: sell, sale, store, product, business, advertising, market, consumer


SLIDE 55

Missing component: how to generate a multinomial distribution

SLIDE 58

Conjugacy of Dirichlet and Multinomial

  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

    p(φ | α, w) ∝ p(w | φ) p(φ | α)            (1)

SLIDE 59

Conjugacy of Dirichlet and Multinomial

  • If φ ∼ Dir(α), w ∼ Mult(φ), and nk = |{wi : wi = k}|, then

    p(φ | α, w) ∝ p(w | φ) p(φ | α)            (1)
               ∝ ∏k φk^nk · ∏k φk^(αk − 1)     (2)
               ∝ ∏k φk^(αk + nk − 1)           (3)

  • Conjugacy: this posterior has the same form as the prior (a Dirichlet with parameter α + n)
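
A quick numerical sanity check of equation (3) (a hypothetical sketch; the numbers and names are mine): after observing counts n, the posterior is Dir(α + n), so its mean should track the true parameter.

```python
# A hypothetical numerical check of the conjugate update in eq. (3):
# with prior Dir(alpha) and observed counts n, the posterior is Dir(alpha + n).
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])           # Dirichlet prior parameter
phi_true = rng.dirichlet(alpha)             # "true" multinomial parameter
n = rng.multinomial(1000, phi_true)         # counts n_k from 1000 draws

posterior = alpha + n                       # posterior is Dir(alpha + n)
phi_mean = posterior / posterior.sum()      # posterior mean of phi
print(phi_true.round(3), phi_mean.round(3)) # the two should be close
```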
SLIDES 60–63

Making the generative story formal

[Plate diagram: hyperparameters λ and α; topic-word distributions βk on a plate over K topics; per-document topic proportions θd, topic assignments zn, and observed words wn on plates over the N word positions within each of M documents]

  • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution βk from a Dirichlet distribution with parameter λ
  • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θd from a Dirichlet distribution with parameter α
  • For each word position n ∈ {1, . . . , N}, select a hidden topic zn from the multinomial distribution parameterized by θd
  • Choose the observed word wn from the distribution βzn
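
Putting the four steps together, here is a minimal numpy sketch of this generative story (an assumed illustration with made-up sizes and hyperparameters, not code from the lecture):

```python
# A minimal numpy sketch of LDA's generative story (an assumed illustration,
# not code from the lecture); vocabulary indices stand in for words.
import numpy as np

rng = np.random.default_rng(0)
K, M, N, V = 3, 5, 20, 50            # topics, documents, words per doc, vocab size
lam, alpha = 0.1, 0.5                # Dirichlet hyperparameters lambda and alpha

beta = rng.dirichlet(np.full(V, lam), size=K)     # topic-word distributions beta_k
docs = []
for d in range(M):
    theta = rng.dirichlet(np.full(K, alpha))      # document's topic proportions theta_d
    z = rng.choice(K, size=N, p=theta)            # hidden topic z_n per word position
    w = [rng.choice(V, p=beta[zn]) for zn in z]   # word w_n drawn from beta[z_n]
    docs.append(w)
print(docs[0])                                    # one synthetic document
```

Inference runs this story in reverse: given only the observed words w, it fills in β, θ, and z (next lecture).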
SLIDE 64

Topic models: What’s important

  • Topic models (latent variables)
  • Topics to word types: a multinomial distribution
  • Documents to topics: a multinomial distribution
  • Modeling & Algorithm
  • Model: a story of how your data came to be
  • Latent variables: the missing pieces of your story
  • Statistical inference: filling in those missing pieces (next lecture)
  • We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, which is itself a probabilistic version of LSA

SLIDE 65

Recap

  • Expectation-Maximization: a general algorithm for mixture models
  • Topic models: a neat way to model discrete count data
