SLIDE 1

ECE 6504: Advanced Topics in Machine Learning

Probabilistic Graphical Models and Large-Scale Learning

Dhruv Batra, Virginia Tech

SLIDE 2

What is this class about?

Some of the most exciting developments in Machine Learning, AI, Statistics & related fields in the last 3 decades

(C) Dhruv Batra 2

SLIDE 3

First Caveat

  • This is an ADVANCED Machine Learning class

– This should not be your first introduction to ML
– You will need a formal class, not just self-reading/Coursera
– If you took ECE 4984/5984, you’re in the right place
– If you took ECE 5524 or equivalent, see the list of topics taught in ECE 4984/5984

SLIDE 4

Topics Covered in Intro to ML&P

  • Basics of Statistical Learning

– Loss function, MLE, MAP, Bayesian estimation, bias-variance tradeoff, overfitting, regularization, cross-validation

  • Supervised Learning

– Naïve Bayes, Logistic Regression, Nearest Neighbour, Neural Networks, Support Vector Machines, Kernels
– Ensemble Methods: Bagging, Boosting

  • Unsupervised Learning

– Clustering: k-means, Gaussian mixture models, EM
– Dimensionality reduction: PCA, SVD, LDA

  • Perception

– Applications to Vision, Natural Language Processing

SLIDE 5

What is this class about?

  • Making global predictions from local observations
  • Learning such models from large quantities of data

SLIDE 6

Exciting Developments

  • Probabilistic Graphical Models

– Directed: Bayesian Networks (Bayes Nets)
– Undirected: Markov/Conditional Random Fields
– Structured Prediction

  • Large-Scale Learning

– Online learning
– Distributed learning

  • Deep Learning

– Convolutional Nets
– Distributed backprop
– Dropout

Not covered in this class
SLIDE 7

What is Machine Learning?

  • What is learning?
  • [Kevin Murphy] algorithms that

– automatically detect patterns in data
– use the uncovered patterns to predict future data or other outcomes of interest

  • [Tom Mitchell] algorithms that

– improve their performance (P)
– at some task (T)
– with experience (E)

SLIDE 8

Tasks

Supervised Learning

  • Classification: x → y, where y is discrete
  • Regression: x → y, where y is continuous

Unsupervised Learning

  • Clustering: x → c, where c is a discrete cluster ID
  • Dimensionality Reduction: x → z, where z is continuous
SLIDE 9

Classification

Classification: x → y, where y is discrete
SLIDE 10

Speech Recognition

Slide Credit: Carlos Guestrin
SLIDE 11

Machine Translation

Figure Credit: Kevin Gimpel
SLIDE 12

Object/Face detection

  • Many new digital cameras now detect faces

– Canon, Sony, Fuji, …

Slide Credit: Noah Snavely, Steve Seitz, Pedro Felzenszwalb
SLIDE 13

Reading a noun (vs verb)

[Rustandi et al., 2005]

Slide Credit: Carlos Guestrin
SLIDE 14

Regression

Regression: x → y, where y is continuous
SLIDE 15

Stock market

SLIDE 16

Weather Prediction

Temperature

Slide Credit: Carlos Guestrin
SLIDE 17

Tasks

Supervised Learning

  • Classification: x → y, where y is discrete
  • Regression: x → y, where y is continuous

Unsupervised Learning

  • Clustering: x → c, where c is a discrete cluster ID
  • Dimensionality Reduction: x → z, where z is continuous
SLIDE 18

Need for Joint Prediction

SLIDE 19

Handwriting recognition

Character recognition, e.g., kernel SVMs

SLIDE 20

Handwriting recognition 2

SLIDE 21

Local Ambiguity

[Smyth et al., 1994]
SLIDE 22

Local Ambiguity

Slide Credit: Fei-Fei Li, Rob Fergus & Antonio Torralba
SLIDE 23

Joint Prediction

  • Classification: x1, x2, …, xn → y1, y2, …, yn (discrete outputs)
  • Regression: x1, x2, …, xn → y1, y2, …, yn (continuous outputs)
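The Python sketch below makes the joint-versus-local contrast concrete for a chain of labels. It is not from the slides. `UNARY` holds hypothetical per-position classifier scores, and `PAIRWISE` is a hypothetical compatibility table that favors neighboring labels agreeing. All names and numbers are illustrative only. Picking each label independently gives `[0, 1, 0]`. Exact joint decoding with the Viterbi dynamic program lets the neighbors resolve the ambiguous middle position and returns `[0, 0, 0]`.

```python
import math

# Hypothetical local scores: UNARY[i][label] is how much a per-position
# classifier likes each of two labels at position i.
UNARY = [[0.6, 0.4], [0.45, 0.55], [0.6, 0.4]]
# Hypothetical pairwise compatibility: neighboring labels prefer to agree.
PAIRWISE = [[0.9, 0.1], [0.1, 0.9]]


def independent_argmax(unary):
    """Unstructured prediction: pick the best label at each position on its own."""
    return [max(range(len(u)), key=lambda s: u[s]) for u in unary]


def viterbi(unary, pairwise):
    """Structured prediction on a chain: exact
    argmax_y prod_i unary[i][y_i] * prod_{i>0} pairwise[y_{i-1}][y_i],
    computed by dynamic programming over log-scores."""
    n, k = len(unary), len(unary[0])
    score = [math.log(unary[0][s]) for s in range(k)]
    back = []  # back[i-1][s] = best label at position i-1 given label s at position i
    for i in range(1, n):
        new_score, ptr = [], []
        for s in range(k):
            best = max(range(k), key=lambda r: score[r] + math.log(pairwise[r][s]))
            new_score.append(score[best] + math.log(pairwise[best][s]) + math.log(unary[i][s]))
            ptr.append(best)
        score, back = new_score, back + [ptr]
    # Trace the best path backwards, starting from the best final label.
    path = [max(range(k), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print("independent:", independent_argmax(UNARY))
    print("joint (Viterbi):", viterbi(UNARY, PAIRWISE))
```

The dynamic program costs O(n·k²) instead of the O(kⁿ) of enumerating every labeling. This saving comes directly from the chain structure.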

SLIDE 24

How many parameters?

  • P(X1, X2, …, Xn)
  • Each Xi takes k states
  • What if all Xi are independent?
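Here is one way to answer the question on this slide, sketched in Python. An unrestricted table over n variables with k states each has k^n entries that must sum to one, so it has k^n - 1 free parameters. Full independence, P(X1, …, Xn) = P(X1) · … · P(Xn), needs only n(k - 1). The function names are illustrative and do not come from the course materials.

```python
def full_joint_params(n: int, k: int) -> int:
    """Free parameters of an unrestricted table P(X1, ..., Xn):
    k**n entries, minus one because the entries must sum to 1."""
    return k ** n - 1


def independent_params(n: int, k: int) -> int:
    """Free parameters if P(X1, ..., Xn) = P(X1) * ... * P(Xn):
    each marginal P(Xi) has k entries summing to 1, so k - 1 are free."""
    return n * (k - 1)


if __name__ == "__main__":
    for n, k in [(10, 2), (30, 2), (10, 26)]:
        print(f"n={n:2d}, k={k:2d}: full joint = {full_joint_params(n, k):,}, "
              f"independent = {independent_params(n, k)}")
```

The full table grows exponentially in n, while the independent model grows only linearly. Full independence, however, cannot express any interaction between the variables.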

SLIDE 25

Probabilistic Graphical Models

  • One of the most exciting advancements in statistical AI in the last 10-20 years
  • Marriage

– Graph Theory + Probability

  • Compact representation for exponentially large probability distributions

– Exploit conditional independencies

  • Generalize

– naïve Bayes
– logistic regression
– Many more …
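The Python sketch below shows how conditional independencies buy a compact representation. It builds a hypothetical Markov chain X1 → X2 → … → Xn, in which each variable depends only on its predecessor, so P(X1, …, Xn) = P(X1) ∏ P(Xi | Xi-1). The random conditional probability tables are illustrative placeholders, not course data. The script checks that the factorized product is still a valid distribution, summing to one over all k^n joint states. It then compares the chain's (k - 1) + (n - 1)·k·(k - 1) parameters with the k^n - 1 of a full table.

```python
import itertools
import random


def random_distribution(k, rng):
    """A random probability vector of length k (illustrative numbers only)."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def make_chain(n, k, seed=0):
    """Hypothetical chain X1 -> X2 -> ... -> Xn with k states per variable.
    Returns the prior P(X1) and one transition table per edge, where
    trans[i][a][b] = P(X_{i+2} = b | X_{i+1} = a) (variables 1-indexed)."""
    rng = random.Random(seed)
    prior = random_distribution(k, rng)
    trans = [[random_distribution(k, rng) for _ in range(k)] for _ in range(n - 1)]
    return prior, trans


def chain_prob(x, prior, trans):
    """P(x1, ..., xn) = P(x1) * prod_i P(x_i | x_{i-1})."""
    p = prior[x[0]]
    for i in range(1, len(x)):
        p *= trans[i - 1][x[i - 1]][x[i]]
    return p


def chain_params(n, k):
    """(k - 1) free numbers for P(X1), plus k rows of (k - 1) for each of the n - 1 CPTs."""
    return (k - 1) + (n - 1) * k * (k - 1)


if __name__ == "__main__":
    n, k = 6, 3
    prior, trans = make_chain(n, k)
    total = sum(chain_prob(x, prior, trans) for x in itertools.product(range(k), repeat=n))
    print(f"sum over all {k ** n} joint states = {total:.6f}")
    print(f"chain parameters = {chain_params(n, k)}, full joint = {k ** n - 1}")
```

A chain is the simplest such structure. The same idea, one small table per variable given its parents in a graph, is what Bayesian networks generalize.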

SLIDE 26

Types of PGMs

Graphical Models

[Figure: taxonomy of graphical models, covering directed models (Bayesian networks, directed factor graphs), undirected models (Markov networks), chain graphs, clique graphs, and factor graphs, with examples including Markov chains, HMMs, LDS, latent variable and mixture models, influence diagrams, CRFs, Boltzmann machines, Gaussian processes, junction trees, and clique trees]

Image Credit: David Barber

SLIDE 27

Main Issues in PGMs

  • Representation

– How do we store P(X1, X2, …, Xn)?
– What does my model mean/imply/assume? (Semantics)

  • Inference

– How do I answer questions/queries with my model? For example:
– Marginal Estimation: P(X5 | X1, X4)
– Most Probable Explanation: argmax P(X1, X2, …, Xn)

  • Learning

– How do we learn the parameters and structure of P(X1, X2, …, Xn) from data?
– Which model is right for my data?
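Both inference queries above can be answered naively by enumerating a full joint table. The Python sketch below does this for a hypothetical joint over five binary variables with random entries. Indices are 0-based, so X1 is index 0 and X5 is index 4. The approach is only practical for tiny models, because the table has k^n entries. Avoiding that blow-up is what the inference algorithms listed later in the syllabus (e.g. variable elimination, belief propagation) are for.

```python
import itertools
import random

N_VARS, K = 5, 2  # hypothetical: five binary variables X1..X5


def random_joint(seed=0):
    """An explicit table P(x1, ..., x5) with random (illustrative) entries."""
    rng = random.Random(seed)
    states = list(itertools.product(range(K), repeat=N_VARS))
    weights = [rng.random() for _ in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}


def conditional_marginal(joint, query, evidence):
    """P(X_query | evidence) by summing out every other variable.
    evidence maps a 0-based variable index to its observed value."""
    scores = [0.0] * K
    for state, p in joint.items():
        if all(state[i] == v for i, v in evidence.items()):
            scores[state[query]] += p
    z = sum(scores)  # = P(evidence)
    return [s / z for s in scores]


def most_probable_explanation(joint):
    """argmax over full assignments of P(x1, ..., xn)."""
    return max(joint, key=joint.get)


if __name__ == "__main__":
    joint = random_joint()
    # P(X5 | X1 = 1, X4 = 0): query index 4, evidence on indices 0 and 3.
    print("P(X5 | X1=1, X4=0) =", conditional_marginal(joint, query=4, evidence={0: 1, 3: 0}))
    print("MPE assignment:", most_probable_explanation(joint))
```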

SLIDE 28

Key Ingredient

  • Exploit independence assumptions

– Encoded in the graph structure

  • Structured Prediction vs Unstructured Prediction

SLIDE 29

Application: Evolutionary Biology

[Friedman et al.]
SLIDE 30

Application: Computer Vision

Chain model (hidden Markov model): interpreting sign language sequences

Image Credit: Simon JD Prince
SLIDE 31

Application: Speech

SLIDE 32

Application: Sensor Network

Image Credit: Carlos Guestrin & Erik Sudderth
SLIDE 33

Application: Medical Diagnosis

Image Credit: Erik Sudderth
SLIDE 34

Application: Coding

[Figure: observed bits, true bits, and parity constraints]
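The Python sketch below treats decoding as MAP inference, under assumptions that are not stated on the slide:

– the parity constraints are those of the (7,4) Hamming code;
– the channel is binary symmetric, flipping each bit independently with probability eps;
– all codewords are equally likely a priori.

Under these assumptions, a brute-force search over the 2^7 assignments that satisfy every parity factor returns the most probable true bits given the observed bits. Practical decoders for long codes (e.g. LDPC codes) instead run message passing on the factor graph, because enumeration does not scale.

```python
import itertools

# Parity-check matrix of the (7,4) Hamming code, chosen here only as an
# illustrative example; column j (1-indexed) is the binary representation of j.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def satisfies_parity(x):
    """True if every parity constraint (row of H) sums to 0 mod 2."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)


def likelihood(observed, true_bits, eps):
    """P(observed | true) for a binary symmetric channel with flip probability eps."""
    flips = sum(o != t for o, t in zip(observed, true_bits))
    return eps ** flips * (1 - eps) ** (len(observed) - flips)


def map_decode(observed, eps=0.1):
    """Most probable true bits given the observed bits, assuming a uniform prior
    over codewords: enumerate all assignments, keep those satisfying the
    parity factors, and return the one with the highest likelihood."""
    codewords = [x for x in itertools.product((0, 1), repeat=len(observed))
                 if satisfies_parity(x)]
    return max(codewords, key=lambda x: likelihood(observed, x, eps))


if __name__ == "__main__":
    sent = (1, 1, 1, 0, 0, 0, 0)      # a valid codeword
    received = (1, 1, 1, 0, 0, 1, 0)  # the channel flipped the 6th bit
    print("sent:   ", sent)
    print("decoded:", map_decode(received))
```

With eps < 0.5, the likelihood decreases with the number of flipped bits. MAP decoding therefore reduces to picking the nearest valid codeword, which for this code corrects any single flipped bit.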

SLIDE 35

Application: Protein Folding

  • Foldit

– http://youtu.be/bTlNNFQxs_A?t=175
– http://www.youtube.com/watch?v=lGYJyur4FUA

SLIDE 37

Application: Computer Vision

Tree model: parsing the human body

Image Credit: Simon JD Prince
SLIDE 38

Application: Computer Vision

Grid model: Markov random field (blue nodes) for semantic segmentation

Image Credit: Simon JD Prince
SLIDE 39

Application: Computer Vision

  • Geometric Labelling

– [Hoiem et al. IJCV ’07], [Hoiem et al. CVPR ’08], [Saxena PAMI ’08], [Ramalingam et al. CVPR ’08].

SLIDE 40
Application: Computer Vision

  • Name-Face Association

– [Berg et al. CVPR ’04, PhD Thesis ’07], [Gallagher et al. CVPR ’08].

Mildred and Lisa

[Figure: photo with faces labeled Lisa and Mildred, and a plot of Probability of Birth Year vs. Birth Year (1900 to 2000) for the names Mildred, Lisa, Nora, Peyton, and Linda]
SLIDE 41
Application: Computer Vision

  • Name-Face Association

– [Berg et al. CVPR ’04, PhD Thesis ’07], [Gallagher et al. CVPR ’08].

[Figure: captioned news photos]

President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters

British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of ’The Road to Perdition’, September 18, 2002. The films stars Tom Hanks as a Chicago hit man who has a separate family life and co-stars Paul Newman and Jude Law. REUTERS/Dan Chung
SLIDE 42

And many, many, many, many, many more…
SLIDE 43

Course Information

  • Instructor: Dhruv Batra

– dbatra@vt
– Office Hours: Fri 1-2pm
– Location: 468 Whittemore

SLIDE 44

Syllabus

  • Directed Graphical Models (Bayes Nets)

– Representation: Directed Acyclic Graphs (DAGs), Conditional Probability Tables (CPTs), d-Separation, v-structures, Markov Blanket, I-Maps
– Parameter Learning: MLE, MAP, EM
– Structure Learning: Chow-Liu, Decomposable scores, hill climbing
– Inference: Marginals, MAP/MPE, Variable Elimination

  • Undirected Graphical Models (MRFs/CRFs)

– Representation: Junction trees, Factor graphs, treewidth, Local Markov Assumptions, Moralization, Triangulation
– Inference: Belief Propagation, Message Passing, Linear Programming Relaxations, Dual-Decomposition, Variational Inference, Mean Field
– Parameter Learning: MLE, gradient descent
– Structured Prediction: Structured SVMs, Cutting-Plane training

  • Large-Scale Learning

– Online learning: perceptrons, stochastic (sub-)gradients
– Distributed Learning: Dual Decomposition, Alternating Direction Method of Multipliers (ADMM)

SLIDE 45

Syllabus

  • You will learn about the methods you heard about, and then some.
  • You will understand algorithms, theory, applications, and implementations.
  • It’s going to be FUN and HARD WORK :)

SLIDE 46

Prerequisites

  • Intro Machine Learning

– Classifiers, regressors, loss functions, MLE, MAP

  • Linear Algebra

– Matrix multiplication, eigenvalues, positive semi-definiteness…

  • Graph Concepts

– Nodes, edges, trees, cycles, depth-first search

  • Algorithms

– Dynamic programming, basic data structures, complexity…

  • Programming

– Matlab for HWs; your language of choice for the project

  • Ability to deal with “abstract mathematical concepts”
  • This will be an in-depth class.

SLIDE 47

Textbook

  • No required book.

– We will assign readings from online/free books, papers, etc.

  • Reference Books:

– [On Library Reserve] Probabilistic Graphical Models: Principles and Techniques. Daphne Koller and Nir Friedman.
– [Free PDF from author] Bayesian Reasoning and Machine Learning. David Barber. http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
– [Free PDF from authors] Graphical Models, Exponential Families, and Variational Inference. Martin J. Wainwright and Michael I. Jordan.

SLIDE 48

Grading

  • 5 homeworks (50%)

– First one goes out Jan 30

  • Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early

  • Final project (25%)

– Projects done individually or in groups of two students

  • Final (20%)

– Take-home
– 3-5 days

  • Class Participation / Paper Reading (5%)

– Contribute to class discussions on Scholar
– Ask questions, answer questions
– Read assigned papers

SLIDE 49

Re-grading Policy

  • Homework assignments and midterm

– Within 3 days of receiving grades: see me

  • This is an advanced grad class.

– The goal is understanding the material and making progress towards our research.

SLIDE 50

Homeworks

  • Homeworks are hard, start early!

– Due in 2 weeks via Scholar (Assignments tool)
– Theory + Implementation (similar format to 4984/5984)
– HW1 out 1/30

  • “Free” Late Days

– 5 late days for the semester

  • Use for HW, project proposal/report
  • Cannot use for midterm or final

– After late days are used up:

  • Half credit within 48 hours
  • Zero credit after 48 hours
  • All homeworks must be submitted even for zero credit

SLIDE 51

Project

  • Goal

– Chance to try Graphical Models
– Encouraged to apply to your research (computer vision, communication, UAVs, computational biology…)
– Must be done this semester. No double counting.
– Extra credit for shooting for a publication

  • Main categories

– Application/Survey

  • Compare a bunch of existing algorithms on a new application domain of your interest

– Formulation/Development

  • Formulate a new model or algorithm for a new or old problem

– Theory

  • Theoretically analyze an existing algorithm

  • Support

– We will give a list of ideas and pointers to datasets/algorithms/code
– Mentor teams and give feedback.

SLIDE 52

Spring 2013 Projects

  • Gesture Activated Interactive Assistant

– Gordon Christie & Ujwal Krothpalli, Grad Students
– http://youtu.be/VFPAHY7th9A

SLIDE 53

Spring 2013 Projects

  • American Sign Language Detection

– Vireshwar Kumar & Dhiraj Amuru, Grad Students

SLIDE 54

Collaboration Policy

  • Collaboration

– Only on HW and project (not allowed in exams).
– You may discuss the questions
– Each student writes their own answers
– List on your homework everyone with whom you collaborate
– Each student must write their own code for the programming part

  • Zero tolerance on plagiarism

– Neither ethical nor in your best interest
– Always credit your sources
– Don’t cheat. We will find out.

SLIDE 55

Audit / Sit in

  • Audit

– ECE Audit Request form

  • http://www.ece.vt.edu/graduate/forms/index.html

– Deadline: Jan 27

  • Sitting in

– Talk to instructor.

SLIDE 56

Communication Channels

  • Primary means of communication -- Scholar Forum

– No direct emails to the instructor unless about private information
– Instructor can mark/provide answers to everyone
– Class participation credit for answering questions!
– No posting hints/answers. We will monitor.

  • Class websites:

– https://scholar.vt.edu/portal/site/s14ece6504
– https://filebox.ece.vt.edu/~s14ece6504/

  • Office Hours

SLIDE 57

Other Relevant Classes

  • Data Analytics (CS 5526)

– Instructor: X Deng
– Offered: Spring

  • Optimization (ISE 5406)

– Instructor: BM Fraticelli
– Offered: Spring

  • Convex Optimization (ECE 5734)

– Instructor: MH Farhood
– Offered: Spring

  • Advanced Computer Vision (ECE 6504)

– Instructor: Devi Parikh
– Offered: Spring

SLIDE 58

Guest Lectures

  • Rosalyn Moran, VT CRI

– Graphical Models for Neuroscience
– Variational Inference

SLIDE 59

Misc Notes

  • Mix of power-point + writing on board

– Slides + notes available on scholar

  • Difficulty level of this class

– On par with Spring 2013 4984/5984
– Significantly more than Fall 2013 4984
– More than Fall 2013 5984

  • Exciting topic; Advanced Class

– Focus on depth, not breadth
– We will go as slow as necessary and bearable :)

SLIDE 60

Plan for Today

  • Nothing!

SLIDE 61

Todo

  • Readings

– Probability Refresher: Barber Chap 1
– Graph Theory Refresher: Barber Chap 2
