CMSC 678 Introduction to Machine Learning Spring 2019



slide-1
SLIDE 1

CMSC 678 Introduction to Machine Learning Spring 2019

https://www.csee.umbc.edu/courses/graduate/678/spring19/

Some slides adapted from Hamed Pirsiavash

slide-2
SLIDE 2

Outline

Welcome! Administrivia Basics of Learning Examples of Machine Learning

slide-3
SLIDE 3

Frank Ferraro

ITE 358
ferraro@umbc.edu
Monday: 3:45-4:30
Tuesday: 11-11:30
by appointment

Natural language processing: Semantics
Vision & language processing
Generative & neural modeling
Learning with low-to-no supervision

slide-4
SLIDE 4

TA: Caroline Kery

Location TBA
ckery1@umbc.edu
TBD

Multilingual language learning
Semantic parsing
Active learning
Data visualization
Analysis of educational data

slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8

https://cdn.arstechnica.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-02-at-9.11.40-PM-640x543.png

slide-9
SLIDE 9

http://www.adweek.com/wp-content/uploads/sites/2/2016/02/NewsFeedTeaser640.jpg

slide-10
SLIDE 10

http://graphics.wsj.com/blue-feed-red-feed/

slide-11
SLIDE 11

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)

slide-12
SLIDE 12

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)

This is not a survey course. We will go deep into the topics.

slide-13
SLIDE 13
slide-14
SLIDE 14

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML

slide-15
SLIDE 15

keras torch

slide-16
SLIDE 16

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML
Relate to statistics, AI [671], and specialized areas (e.g., NLP [673] and CV [691])
Implement ML programs

slide-17
SLIDE 17

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML
Relate to statistics, AI [671], and specialized areas (e.g., NLP [673] and CV [691])
Implement ML programs

Assignments will require your own implementation.

slide-18
SLIDE 18

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML
Relate to statistics, AI [671], and specialized areas (e.g., NLP [673] and CV [691])
Implement ML programs
Read and analyze research papers
Practice your (written) communication skills

slide-19
SLIDE 19

Outline

Welcome! Administrivia Basics of Learning Examples of Machine Learning

slide-20
SLIDE 20

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

slide-21
SLIDE 21

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

Each component is max(micro-average, macro-average)

slide-22
SLIDE 22

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

max(micro-average, macro-average)

65/90 95/100 95/110 100/110

slide-23
SLIDE 23

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

max(micro-average, macro-average)

65/90 95/100 95/110 100/110

microaverage = \frac{65 + 95 + 95 + 100}{90 + 100 + 110 + 110} \approx 86.59\%

slide-24
SLIDE 24

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

max(micro-average, macro-average)

65/90 95/100 95/110 100/110

microaverage = \frac{65 + 95 + 95 + 100}{90 + 100 + 110 + 110} \approx 86.59\%

macroaverage = \frac{1}{4}\left(\frac{65}{90} + \frac{95}{100} + \frac{95}{110} + \frac{100}{110}\right) \approx 86.12\%

slide-25
SLIDE 25

Grading

Component        678
Assignments      40%
Course Project   40%
Two Exams        20%

max(micro-average, macro-average)

65/90 95/100 95/110 100/110

microaverage = \frac{65 + 95 + 95 + 100}{90 + 100 + 110 + 110} \approx 86.59\%

macroaverage = \frac{1}{4}\left(\frac{65}{90} + \frac{95}{100} + \frac{95}{110} + \frac{100}{110}\right) \approx 86.12\%
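
The two averages can be checked with a few lines of Python. This is only a sketch of the arithmetic on this slide: the four scores come from the slide, while the function names are made up for illustration.

```python
# Scores from the slide as (points earned, points possible).
scores = [(65, 90), (95, 100), (95, 110), (100, 110)]

def micro_average(scores):
    """Total points earned divided by total points possible."""
    earned = sum(e for e, _ in scores)
    possible = sum(p for _, p in scores)
    return earned / possible

def macro_average(scores):
    """Mean of the per-item fractions, so every item counts equally."""
    return sum(e / p for e, p in scores) / len(scores)

def component_grade(scores):
    """Each component is max(micro-average, macro-average)."""
    return max(micro_average(scores), macro_average(scores))

print(f"micro = {micro_average(scores):.2%}")  # micro = 86.59%
print(f"macro = {macro_average(scores):.2%}")  # macro = 86.12%
```

The micro-average weights items by their point value, while the macro-average weights every item equally; taking the max means the low 65/90 score hurts as little as possible.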

slide-26
SLIDE 26

Final Grades

If you get ≥    You get at least a/an
90              A-
80              B-
70              C-
65              D
                F
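
The cutoffs can be read as a lookup from a percentage to the lowest letter grade it guarantees. The sketch below assumes that anything under 65 maps to F (the table's last row) and uses a hypothetical function name; since the slide says "at least", the real grade may be higher.

```python
# (minimum percentage, guaranteed letter) pairs from the slide, highest first.
CUTOFFS = [(90, "A-"), (80, "B-"), (70, "C-"), (65, "D")]

def minimum_letter(percent):
    """Return the lowest letter grade guaranteed by a percentage."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"  # assumption: below the last cutoff

print(minimum_letter(86.59))  # B-
```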

slide-27
SLIDE 27

https://www.csee.umbc.edu/courses/graduate/678/spring19/

slide-28
SLIDE 28

Online Discussions

https://piazza.com/umbc/spring2019/cmsc678

slide-29
SLIDE 29

Submitting Your Work

https://www.csee.umbc.edu/courses/graduate/678/spring19/submit

slide-30
SLIDE 30

Running the Assignments

A "standard" x86-64 Linux machine, like gl
A passable amount of memory (2GB-4GB)
Modern but not necessarily cutting edge software
Don’t assume a GPU (if you want to write CUDA yourself, talk to me)

If in doubt, ask first

slide-31
SLIDE 31

Running the Project

An x86-64 Linux machine
Memory and hardware constraints lifted (somewhat)

If in doubt, ask first

slide-32
SLIDE 32

Programming Languages for Assignments

Use the tools you feel comfortable with
Python+numpy, C, C++, Java, Matlab, …: OK (straight Python may not cut it)
Libraries: Generally OK, as long as you don’t use their implementation of what you need to implement
Math accelerators (blas, numpy, etc.): OK

If in doubt, ask first

slide-33
SLIDE 33

Programming Languages for the Project

Use the tools you feel comfortable with
Python+numpy, C, C++, Java, Matlab, …: OK (straight Python may not cut it)
Libraries: Use what you want
Math accelerators (blas, numpy, etc.): OK

slide-34
SLIDE 34

Important Dates

Date               Due
Friday, 2/8        Assignment 1
Wednesday, 3/6     Project Proposal
Wednesday, 3/13    Exam 1 (In-class)
Wednesday, 4/17    Project Update
Friday, 5/17       Exam 2 (Final exam block)
Wednesday, 5/22    Course Project

All items due 11:59 AM UMBC time (unless specified otherwise)

Future assignment dates will be announced on Piazza, the website, and in class.
slide-35
SLIDE 35

Late Policy

Everyone has a budget of 10 late days

slide-36
SLIDE 36

Late Policy

Everyone has a budget of 10 late days
If you have them left: assignments turned in after the deadline will be graded and recorded, no questions asked

slide-37
SLIDE 37

Late Policy

Everyone has a budget of 10 late days
If you have them left: assignments turned in after the deadline will be graded and recorded, no questions asked
If you don’t have any left: still turn assignments in. They could count in your favor in borderline cases

slide-38
SLIDE 38

Late Policy

Everyone has a budget of 10 late days
Use them as needed throughout the course
They’re meant for personal reasons and emergencies
Do not procrastinate

slide-39
SLIDE 39

Late Policy

Everyone has a budget of 10 late days
Contact me privately if an extended absence will occur

You must know how many you’ve used

slide-40
SLIDE 40

Main Resource: CIML

“A Course in Machine Learning”, v0.99
Hal Daumé III
http://ciml.info/
Full book: http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf

Official

slide-41
SLIDE 41

Optional Advanced Resource: ESL

“Elements of Statistical Learning”
Hastie, Tibshirani, Friedman
https://web.stanford.edu/~hastie/ElemStatLearn/
Full book: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

Unofficial: Recommended

slide-42
SLIDE 42

Optional Advanced Resource: ITILA

“Information Theory, Inference and Learning Algorithms”
MacKay
http://www.inference.org.uk/mackay/itprnn/ps/
Full book: http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Unofficial: Recommended

slide-43
SLIDE 43

Optional Advanced Resource: UML

“Understanding Machine Learning: From Theory to Algorithms”
Shalev-Shwartz, Ben-David
http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/
Full book: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf

Unofficial: Recommended

slide-44
SLIDE 44

Resources #5… ∞

Peer-reviewed articles (journals, conferences & workshops)

ICML

slide-45
SLIDE 45

Is this the right course for you?

good math and programming background?
diligent and determined?
willing to implement & write up your results?

Unsure? Let’s talk after class

Who should take this course?

(thank you to everyone who filled out the survey! :) ) https://goo.gl/forms/yqVH8QnwzggpRQJr1

slide-46
SLIDE 46

Calculus and linear algebra

Techniques for finding maxima/minima of functions
Convenient language for high dimensional data analysis

Probability

The study of the outcomes of repeated experiments
The study of the plausibility of some event

Statistics

The analysis and interpretation of data

Why do we care about math?!

slide-47
SLIDE 47

Course Announcement 1: Assignment 1

Due Friday, 2/8 (~11 days)
Math & programming review
Discuss with others, but write, implement and complete on your own

slide-48
SLIDE 48

Outline

Welcome! Administrivia Basics of Learning Examples of Machine Learning

slide-49
SLIDE 49

Chris has just begun taking a machine learning course.
Pat, the instructor, has to ascertain at the end of the course whether Chris has “learned” the topics covered.
What is a “reasonable” exam?

(Bad) Choice 1: History of pottery

Chris’s performance is not indicative of what was learned in ML

(Bad) Choice 2: Questions answered during lectures

Open book?

A good test should test ability to answer “related” but “new” questions on the exam

What does it mean to learn?

Generalization
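
A toy illustration of why the exam must contain new questions, written as a sketch: the task (deciding whether an integer is even), the "memorizer" learner, and all names are assumptions made up for this example. The memorizer is perfect on questions it has already seen and no better than a fixed guess on new ones.

```python
def accuracy(predict, questions, answers):
    """Fraction of questions the predictor answers correctly."""
    correct = sum(predict(q) == a for q, a in zip(questions, answers))
    return correct / len(questions)

def truth(n):
    """Toy task: is the integer even?"""
    return n % 2 == 0

seen = list(range(10))       # questions answered during lectures
new = list(range(10, 20))    # related but new exam questions

# A learner that only memorizes lecture answers and guesses False otherwise.
memory = {q: truth(q) for q in seen}

def memorizer(q):
    return memory.get(q, False)

seen_acc = accuracy(memorizer, seen, [truth(q) for q in seen])
new_acc = accuracy(memorizer, new, [truth(q) for q in new])
print(seen_acc, new_acc)  # 1.0 0.5
```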

slide-50
SLIDE 50

Model, parameters and hyperparameters

Model: mathematical formulation of system (e.g., classifier)
Parameters: primary “knobs” of the model that are set by a learning algorithm
Hyperparameter: secondary “knobs”
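
To make the three terms concrete, here is a small sketch. The linear model, the toy data, and the simple gradient-based fitting loop are all assumptions chosen for illustration, not something the slide prescribes. The learning algorithm sets the parameters theta; the step size and number of steps are hyperparameters picked by hand.

```python
def predict(theta, x):
    """Model: a line with intercept theta[0] and slope theta[1]."""
    return theta[0] + theta[1] * x

def fit(data, step_size, num_steps):
    """Learning algorithm: repeatedly nudge theta to reduce squared error."""
    theta = [0.0, 0.0]                      # parameters, set by learning
    for _ in range(num_steps):
        g0 = g1 = 0.0
        for x, y in data:
            err = y - predict(theta, x)
            g0 += err
            g1 += err * x
        theta[0] += step_size * g0 / len(data)
        theta[1] += step_size * g1 / len(data)
    return theta

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy points on y = 1 + 2x
# Hyperparameters: chosen by the practitioner, not learned from the data.
theta = fit(data, step_size=0.1, num_steps=2000)
print([round(t, 3) for t in theta])  # [1.0, 2.0]
```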

http://www.uiparade.com/wp-content/uploads/2012/01/ui-design-pure-css.jpg

slide-51
SLIDE 51
slide-52
SLIDE 52

score( )

slide-53
SLIDE 53

scoreθ( )

scoring model

objective

F(θ)

slide-54
SLIDE 54

scoring model

objective

F(θ)

(implicitly) dependent on the observed data X=

scoreθ( )

slide-55
SLIDE 55

Machine Learning Framework: Learning

instance 1 instance 2 instance 3 instance 4 Machine Learning Predictor

slide-56
SLIDE 56

Machine Learning Framework: Learning

instance 1 instance 2 instance 3 instance 4 Machine Learning Predictor Extra-knowledge

instances are typically examined independently

slide-57
SLIDE 57

Machine Learning Framework: Learning

instance 1 instance 2 instance 3 instance 4 Machine Learning Predictor Extra-knowledge

Evaluator

score

instances are typically examined independently Gold/correct labels

slide-58
SLIDE 58

Machine Learning Framework: Learning

instance 1 instance 2 instance 3 instance 4 Machine Learning Predictor Extra-knowledge

Evaluator

score

instances are typically examined independently Gold/correct labels

give feedback to the predictor

slide-59
SLIDE 59

How do we optimize? Follow the derivative

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ]

slide-60
SLIDE 60

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0]

slide-61
SLIDE 61

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get derivative g_t = F’(θ_t)

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0, g_0]

slide-62
SLIDE 62

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get derivative g_t = F’(θ_t)
  3. Get scaling factor ρ_t
  4. Set θ_{t+1} = θ_t + ρ_t * g_t
  5. Set t += 1

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0, g_0, θ_1]

slide-63
SLIDE 63

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get derivative g_t = F’(θ_t)
  3. Get scaling factor ρ_t
  4. Set θ_{t+1} = θ_t + ρ_t * g_t
  5. Set t += 1

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0, g_0, θ_1, y_1, g_1, θ_2]

slide-64
SLIDE 64

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get derivative g_t = F’(θ_t)
  3. Get scaling factor ρ_t
  4. Set θ_{t+1} = θ_t + ρ_t * g_t
  5. Set t += 1

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0, g_0, θ_1, y_1, g_1, θ_2, y_2, g_2, θ_3, y_3]

slide-65
SLIDE 65

How do we optimize? Follow the derivative

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get derivative g_t = F’(θ_t)
  3. Get scaling factor ρ_t
  4. Set θ_{t+1} = θ_t + ρ_t * g_t
  5. Set t += 1

[Figure: F(θ) plotted against θ, marking the optimum θ* and F’(θ), the derivative of F wrt θ; points shown: θ_0, y_0, g_0, θ_1, y_1, g_1, θ_2, y_2, g_2, θ_3, y_3]
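
The five steps can be sketched in Python for a single parameter. The objective F(θ) = -(θ - 3)^2, the fixed scaling factor ρ = 0.1, and the convergence test on the size of the derivative are assumptions chosen for illustration; the slide leaves all three open.

```python
def gradient_ascent_1d(F, dF, theta0, rho=0.1, tol=1e-8, max_steps=10_000):
    """Follow the derivative uphill from theta0; return (theta, F(theta))."""
    theta = theta0                     # pick a starting value
    for t in range(max_steps):         # "until converged", with a safety cap
        y = F(theta)                   # 1. value y_t
        g = dF(theta)                  # 2. derivative g_t
        if abs(g) < tol:               # assumed convergence test
            break
        theta = theta + rho * g        # 3-4. fixed scaling factor, update
    return theta, F(theta)

def F(theta):
    return -(theta - 3.0) ** 2         # toy objective, maximized at 3

def dF(theta):
    return -2.0 * (theta - 3.0)

theta_star, y_star = gradient_ascent_1d(F, dF, theta0=0.0)
print(round(theta_star, 4))  # 3.0
```

Adding ρ_t * g_t moves θ in the direction in which F increases, which is why this is ascent; flipping the sign would minimize F instead.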

slide-66
SLIDE 66

Gradient = Multi-variable derivative

K-dimensional input
K-dimensional output
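
One way to see "K numbers in, K numbers out" is to approximate each partial derivative numerically. The sketch below uses central finite differences, which is an illustrative technique not taken from the slides; the toy objective and the step h are likewise assumptions.

```python
def numerical_gradient(F, theta, h=1e-6):
    """Approximate the gradient of F at theta, one coordinate at a time."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += h
        down = list(theta)
        down[k] -= h
        grad.append((F(up) - F(down)) / (2 * h))  # partial derivative k
    return grad

def F(theta):
    # Toy objective with its maximum at (1, -2).
    return -(theta[0] - 1.0) ** 2 - 2.0 * (theta[1] + 2.0) ** 2

g = numerical_gradient(F, [0.0, 0.0])
print(len(g))  # 2
```

At θ = (0, 0) the exact partial derivatives are 2 and -8, and the approximation should agree closely.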

slide-67
SLIDE 67

Gradient Ascent

slide-68
SLIDE 68

Gradient Ascent

slide-69
SLIDE 69

Gradient Ascent

slide-70
SLIDE 70

Gradient Ascent

slide-71
SLIDE 71

Gradient Ascent

Set t = 0
Pick a starting value θ_t
Until converged:
  1. Get value y_t = F(θ_t)
  2. Get gradient g_t = F’(θ_t)
  3. Get scaling factor(s) ρ_t
  4. Set θ_{t+1} = θ_t + ρ_t * g_t
  5. Set t += 1

vector Vector of partial derivatives
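
A minimal sketch of the vector version, assuming a two-parameter toy objective with a known gradient, one shared fixed scaling factor ρ, and a convergence test on the largest partial derivative. None of these choices come from the slide.

```python
def gradient_ascent(grad_F, theta0, rho=0.1, tol=1e-8, max_steps=10_000):
    """Gradient ascent where theta and the gradient are lists of floats."""
    theta = list(theta0)
    for t in range(max_steps):
        g = grad_F(theta)                       # vector of partial derivatives
        if max(abs(gk) for gk in g) < tol:      # assumed convergence test
            break
        # Update every coordinate with the same scaling factor.
        theta = [th + rho * gk for th, gk in zip(theta, g)]
    return theta

def grad_F(theta):
    # Gradient of F(theta) = -(theta[0] - 1)^2 - 2 * (theta[1] + 2)^2.
    return [-2.0 * (theta[0] - 1.0), -4.0 * (theta[1] + 2.0)]

theta_star = gradient_ascent(grad_F, [0.0, 0.0])
print([round(th, 4) for th in theta_star])  # [1.0, -2.0]
```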

slide-72
SLIDE 72

Outline

Welcome! Administrivia Basics of Learning Examples of Machine Learning

slide-73
SLIDE 73

A Terminology Buffet

Classification
Regression
Clustering

the task: what kind of problem are you solving?

slide-74
SLIDE 74

A Terminology Buffet

Classification
Regression
Clustering

Fully-supervised
Semi-supervised
Un-supervised

the task: what kind of problem are you solving?
the data: amount of human input/number of labeled examples
slide-75
SLIDE 75

A Terminology Buffet

Classification
Regression
Clustering

Fully-supervised
Semi-supervised
Un-supervised

Probabilistic
Generative
Conditional
Spectral
Neural
Memory-based
Exemplar
…

the task: what kind of problem are you solving?
the data: amount of human input/number of labeled examples
the approach: how any data are being used

slide-76
SLIDE 76

Classification Examples

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

slide-77
SLIDE 77

Classification Examples

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance
a fixed set of classes C = {c_1, c_2, …, c_J}

Output: a predicted class c from C

slide-78
SLIDE 78

Classification: Hand-coded Rules?

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Rules based on combinations of words or other features

spam: black-list-address OR (“dollars” AND “have been selected”)

Accuracy can be high if the rules are carefully refined by an expert

Building and maintaining these rules is expensive
Can humans faithfully assign uncertainty?
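
The spam rule on this slide translates almost directly into code. In this sketch the blacklist contents and the function name are hypothetical; only the rule's structure comes from the slide.

```python
# Hypothetical blacklist; a real one would be maintained by hand.
BLACKLIST = {"scammer@example.com"}

def is_spam(sender, body):
    """spam: black-list-address OR ("dollars" AND "have been selected")."""
    text = body.lower()
    return sender in BLACKLIST or (
        "dollars" in text and "have been selected" in text
    )

print(is_spam("friend@example.com",
              "You have been selected to win a million dollars"))  # True
```

The rule returns a hard yes or no with no notion of confidence, and every new spam pattern needs another hand-written clause, which is the maintenance cost the slide points to.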

slide-79
SLIDE 79

Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance d
a fixed set of classes C = {c_1, c_2, …, c_J}
a training set of m hand-labeled instances (d_1, c_1), …, (d_m, c_m)

Output:

a learned classifier γ that maps instances to classes

slide-80
SLIDE 80

Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance d
a fixed set of classes C = {c_1, c_2, …, c_J}
a training set of m hand-labeled instances (d_1, c_1), …, (d_m, c_m)

Output:

a learned classifier γ that maps instances to classes

γ learns to associate certain features of instances with their labels

slide-81
SLIDE 81

Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance d
a fixed set of classes C = {c_1, c_2, …, c_J}
a training set of m hand-labeled instances (d_1, c_1), …, (d_m, c_m)

Output:

a learned classifier γ that maps instances to classes

Naïve Bayes
Logistic regression
Support-vector machines
k-Nearest Neighbors
…
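
As a sketch of the input/output contract, here is k-Nearest Neighbors with k = 1, the simplest of the methods listed. The two-number feature vectors, the labels, and the function names are made up for illustration. Training takes the labeled pairs (d_i, c_i) and returns a classifier γ.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_1nn(training_set):
    """Return a classifier gamma that maps an instance to a class."""
    def gamma(d):
        # Predict the class of the closest hand-labeled instance.
        nearest = min(training_set, key=lambda pair: sq_dist(pair[0], d))
        return nearest[1]
    return gamma

# Hypothetical hand-labeled instances (d_i, c_i).
training_set = [
    ((0.0, 0.0), "ham"), ((0.2, 0.1), "ham"),
    ((1.0, 1.0), "spam"), ((0.9, 1.2), "spam"),
]
gamma = train_1nn(training_set)
print(gamma((0.1, 0.0)), gamma((1.1, 0.9)))  # ham spam
```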

slide-82
SLIDE 82

Classify with Goodness

best label = \arg\max_{\text{label}} \text{score}(\text{example}, \text{label})
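
In code, the arg max is a loop over the candidate labels that keeps the highest-scoring one. The keyword-counting score function below is a stand-in invented for this sketch; any function of (example, label) would do.

```python
# Hypothetical keyword lists used only by the toy scoring function.
KEYWORDS = {
    "spam": {"dollars", "selected"},
    "ham": {"meeting", "lecture"},
}

def score(example, label):
    """Toy score: how many of the label's keywords appear in the example."""
    return sum(word in KEYWORDS[label] for word in example.lower().split())

def best_label(score, example, labels):
    """Return the label with the highest score(example, label)."""
    return max(labels, key=lambda label: score(example, label))

prediction = best_label(
    score, "You have been selected to receive dollars", list(KEYWORDS)
)
print(prediction)  # spam
```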

slide-83
SLIDE 83

Classification Example: Face Recognition

What is a good representation for images?

Pixel values? Edges?

Courtesy Hamed Pirsiavash

slide-84
SLIDE 84

Classification Example: Sequence & Structured Prediction

Courtesy Hamed Pirsiavash

slide-85
SLIDE 85

Ingredients for classification

Inject your knowledge into a learning system

Feature representation
Training data: labeled examples
Model

Courtesy Hamed Pirsiavash

slide-86
SLIDE 86

Ingredients for classification

Inject your knowledge into a learning system

Feature representation
  Problem specific
  Difficult to learn from bad ones
Training data: labeled examples
Model

Courtesy Hamed Pirsiavash

slide-87
SLIDE 87

Ingredients for classification

Inject your knowledge into a learning system

Feature representation
  Problem specific
  Difficult to learn from bad ones
Training data: labeled examples
  Labeling data == $$$
  Sometimes data is available for “free”
Model

Courtesy Hamed Pirsiavash

slide-88
SLIDE 88

Ingredients for classification

Inject your knowledge into a learning system

Feature representation
  Problem specific
  Difficult to learn from bad ones
Training data: labeled examples
  Labeling data == $$$
  Sometimes data is available for “free”
Model
  No single learning algorithm is always good (“no free lunch”)
  Different learning algorithms work differently

Courtesy Hamed Pirsiavash

slide-89
SLIDE 89

Regression

Like classification, but real-valued

slide-90
SLIDE 90

Regression Example: Stock Market Prediction

Courtesy Hamed Pirsiavash

slide-91
SLIDE 91

Unsupervised learning: Clustering

Courtesy Hamed Pirsiavash

slide-92
SLIDE 92

Outline

Welcome! Administrivia Basics of Learning Examples of Machine Learning