CMSC 678 Introduction to Machine Learning Spring 2018


slide-1
SLIDE 1

CMSC 678 Introduction to Machine Learning Spring 2018

https://www.csee.umbc.edu/courses/graduate/678/spring18/

Some slides adapted from Hamed Pirsiavash

slide-2
SLIDE 2

Frank Ferraro

ITE 358
ferraro@umbc.edu
Monday: 3:45-4:30
Tuesday: 11-11:30
by appointment

Natural language processing: Semantics
Vision & language processing
Generative & neural modeling
Learning with low-to-no supervision

slide-3
SLIDE 3

TA: Vamshi Nagabandi

Location TBA
nvamshi1@umbc.edu
Wednesday 1-2
Thursday 2:30-3:30

Machine learning
Data analytics

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

https://cdn.arstechnica.net/wp-content/uploads/2015/11/Screen-Shot-2015-11-02-at-9.11.40-PM-640x543.png

slide-8
SLIDE 8

http://www.adweek.com/wp-content/uploads/sites/2/2016/02/NewsFeedTeaser640.jpg

slide-9
SLIDE 9

http://graphics.wsj.com/blue-feed-red-feed/

slide-11
SLIDE 11

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)

This is not a survey course. We will go deep into the topics.

slide-12
SLIDE 12
slide-14
SLIDE 14

keras

slide-16
SLIDE 16

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML
Relate to statistics, AI [671], and specialized areas (e.g., NLP [673] and CV [691])
Implement ML programs

Assignments will require your own implementation.

slide-17
SLIDE 17

Course Goals

Be introduced to some of the core problems and solutions of ML (big picture)
Learn different ways that success and progress can be measured in ML
Relate to statistics, AI [671], and specialized areas (e.g., NLP [673] and CV [691])
Implement ML programs
Read and analyze research papers
Practice your (written) communication skills

slide-18
SLIDE 18

Administrivia

slide-20
SLIDE 20

Grading

Component          678
Four Assignments   40%
Course Project     40%
Two Exams          20%

Each component is max(micro-average, macro-average)

slide-23
SLIDE 23

Grading

Component          678
Four Assignments   40%
Course Project     40%
Two Exams          20%

max(micro-average, macro-average)

65/90 95/100 95/110 100/110

$$\text{microaverage} = \frac{65 + 95 + 95 + 100}{90 + 100 + 110 + 110} \approx 86.59\%$$

$$\text{macroaverage} = \frac{1}{4}\left(\frac{65}{90} + \frac{95}{100} + \frac{95}{110} + \frac{100}{110}\right) \approx 86.12\%$$
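The two averages can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function names (`micro_average`, `macro_average`, `component_grade`) are made up for illustration and are not part of any course tooling.

```python
# Scores from the slide as (points earned, points possible).
scores = [(65, 90), (95, 100), (95, 110), (100, 110)]

def micro_average(scores):
    """Pool all points: total earned divided by total possible."""
    earned = sum(e for e, _ in scores)
    possible = sum(p for _, p in scores)
    return earned / possible

def macro_average(scores):
    """Average the per-item fractions, weighting every item equally."""
    return sum(e / p for e, p in scores) / len(scores)

def component_grade(scores):
    """The component grade is the larger of the two averages."""
    return max(micro_average(scores), macro_average(scores))

print(f"micro: {micro_average(scores):.2%}")    # micro: 86.59%
print(f"macro: {macro_average(scores):.2%}")    # macro: 86.12%
print(f"grade: {component_grade(scores):.2%}")  # grade: 86.59%
```

Taking the max means a single low score on a small assignment (which hurts the macro-average) or on a large one (which hurts the micro-average) is counted in whichever way is more favorable.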

slide-25
SLIDE 25

Final Grades

If you get ≥   You get at least a/an
90             A-
80             B-
70             C-
65             D
(otherwise)    F

slide-26
SLIDE 26

https://www.csee.umbc.edu/courses/graduate/678/spring18/

slide-27
SLIDE 27

Submitting Your Work

https://www.csee.umbc.edu/courses/graduate/678/spring18/submit

slide-28
SLIDE 28

Running the Assignments

A "standard" x86-64 Linux machine, like gl
A passable amount of memory (2GB-4GB)
Modern but not necessarily cutting edge software
Don’t assume a GPU (if you want to write CUDA yourself, talk to me)

If in doubt, ask first

slide-29
SLIDE 29

Running the Project

An x86-64 Linux machine
Memory and hardware constraints lifted (somewhat)

If in doubt, ask first

slide-30
SLIDE 30

Programming Languages for Assignments

Use the tools you feel comfortable with
Python+numpy, C, C++, Java, Matlab, …: OK (straight Python may not cut it)
Libraries: Generally OK, as long as you don’t use their implementation of what you need to implement
Math accelerators (blas, numpy, etc.): OK

If in doubt, ask first

slide-31
SLIDE 31

Programming Languages for the Project

Use the tools you feel comfortable with
Python+numpy, C, C++, Java, Matlab, …: OK (straight Python may not cut it)
Libraries: Use what you want
Math accelerators (blas, numpy, etc.): OK

slide-32
SLIDE 32

Online Discussions

https://piazza.com/umbc/spring2018/cmsc678

slide-33
SLIDE 33

Important Dates

Date               Due
Wednesday, 2/7     Assignment 1
Monday, 3/5        Assignment 2
Monday, 3/12       Project Proposal
Wednesday, 3/14    Exam 1 (In-class)
Monday, 4/2        Assignment 3
Monday, 4/9        Project Update
Monday, 5/14       Assignment 4
Friday, 5/18       Exam 2 (Final exam block)
Wednesday, 5/23    Course Project

All items due 11:59 AM UMBC time (unless specified otherwise)

slide-36
SLIDE 36

Late Policy

Everyone has a budget of 10 late days
If you have them left: assignments turned in after the deadline will be graded and recorded, no questions asked
If you don’t have any left: still turn assignments in. They could count in your favor in borderline cases

slide-37
SLIDE 37

Late Policy

Everyone has a budget of 10 late days
Use them as needed throughout the course
They’re meant for personal reasons and emergencies
Do not procrastinate

slide-38
SLIDE 38

Late Policy

Everyone has a budget of 10 late days
Contact me privately if an extended absence will occur

You must know how many you’ve used

slide-39
SLIDE 39

Resource #1: ESL

“Elements of Statistical Learning”
Hastie, Tibshirani, Friedman
https://web.stanford.edu/~hastie/ElemStatLearn/
Full book: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

Official: Recommended

slide-40
SLIDE 40

Resource #2: ITILA

“Information Theory, Inference and Learning Algorithms”
MacKay
http://www.inference.org.uk/mackay/itprnn/ps/
Full book: http://www.inference.phy.cam.ac.uk/itprnn/book.pdf

Official: Recommended

slide-41
SLIDE 41

Resource #3: UML

“Understanding Machine Learning: From Theory to Algorithms”
Shalev-Shwartz, Ben-David
http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/
Full book: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf

Official: Recommended

slide-42
SLIDE 42

Resource #4: CIML

“A Course in Machine Learning”, v0.99
Hal Daumé III
http://ciml.info/
Full book: http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf

Unofficial

slide-43
SLIDE 43

Resources #5… ∞

Peer-reviewed articles (journals, conferences & workshops)

ICML

slide-44
SLIDE 44

Is this the right course for you?

good math and programming background?
diligent and determined?
willing to implement & write up your results?

Unsure? Let’s talk after class

Who should take this course?

(thank you to everyone who filled out the survey! :) ) https://goo.gl/forms/yqVH8QnwzggpRQJr1

slide-45
SLIDE 45

Why do we care about math?!

Calculus and linear algebra

Techniques for finding maxima/minima of functions
Convenient language for high dimensional data analysis

Probability

The study of the outcomes of repeated experiments
The study of the plausibility of some event

Statistics

The analysis and interpretation of data

slide-46
SLIDE 46

Course Announcement 1: Assignment 1

Due Wednesday, 2/7 (~9 days)
Math & programming review
Discuss with others, but write, implement and complete on your own

slide-47
SLIDE 47

What does it mean to learn?

Chris has just begun taking a machine learning course.
Pat, the instructor, has to ascertain at the end of the course whether Chris has “learned” the topics covered.
What is a “reasonable” exam?

(Bad) Choice 1: History of pottery

Chris’s performance is not indicative of what was learned in ML

(Bad) Choice 2: Questions answered during lectures

Open book?

A good test should test the ability to answer “related” but “new” questions on the exam

Generalization

slide-51
SLIDE 51

Machine Learning Framework: Learning

[Diagram: instance 1, instance 2, instance 3, instance 4 → Machine Learning Predictor (with Extra-knowledge) → Evaluator (with Gold/correct labels) → score → give feedback to the predictor]

instances are typically examined independently

slide-52
SLIDE 52

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

slide-53
SLIDE 53

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

score( )

slide-55
SLIDE 55

score_θ(X)

scoring model

objective

F(θ)

(implicitly) dependent on the observed data X
slide-56
SLIDE 56

Gradient Ascent
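The slide's figures did not survive extraction, so here is a minimal sketch of the idea: to maximize an objective F(θ), repeatedly nudge θ in the direction of the gradient. The objective F(θ) = -(θ - 3)², the step size, and the number of steps are all assumptions chosen for illustration; they do not come from the slides.

```python
def gradient_ascent(grad, theta, step_size=0.1, num_steps=100):
    """Repeatedly move theta a small step in the direction of the gradient."""
    for _ in range(num_steps):
        theta = theta + step_size * grad(theta)
    return theta

# Hypothetical objective, for illustration only: F(theta) = -(theta - 3)^2,
# which has a single maximum at theta = 3.
def F(theta):
    return -(theta - 3.0) ** 2

def grad_F(theta):
    """Derivative of F with respect to theta."""
    return -2.0 * (theta - 3.0)

theta_hat = gradient_ascent(grad_F, theta=0.0)
print(theta_hat, F(theta_hat))
```

With this step size the distance to the maximum shrinks by a factor of 0.8 per step, so θ ends up very close to 3. A step size that is too large would overshoot instead, which is why it is usually treated as something to tune.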

slide-63
SLIDE 63

Underfitting and overfitting

Images courtesy Hamed Pirsiavash

underfitting

Q: What’s one way you can get underfitting?
A: A model that is too simple

slide-66
SLIDE 66

Underfitting and overfitting

Images courtesy Hamed Pirsiavash

underfitting

overfitting

Q: What’s one way you can get overfitting?
A: A model that is too complex (too many parameters)
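One way to see both effects is to fit polynomials of increasing degree to a small noisy sample and compare training error with error on fresh data. The sketch below is an assumption-laden illustration, not the experiment behind the slide images: the target function sin(2πx), the noise level, the sample sizes, and the degrees 1, 3, and 9 are all invented here.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a hypothetical target function, sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(10)   # small training set
x_test, y_test = make_data(200)    # fresh data from the same source

def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model(x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; more coefficients = more parameters.
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = mse(model, x_train, y_train)
    test_mse[degree] = mse(model, x_test, y_test)
    print(f"degree {degree}: train MSE {train_mse[degree]:.4f}, "
          f"test MSE {test_mse[degree]:.4f}")
```

The degree-1 fit is too simple to follow the curve, so its error is high even on the training data. The degree-9 fit has as many coefficients as there are training points, so its training error is essentially zero, while its error on fresh data is typically much worse.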

slide-67
SLIDE 67

Model, parameters and hyperparameters

Model: mathematical formulation of system (e.g., classifier)
Parameters: primary “knobs” of the model that are set by a learning algorithm
Hyperparameter: secondary “knobs”

http://www.uiparade.com/wp-content/uploads/2012/01/ui-design-pure-css.jpg

slide-70
SLIDE 70

A Terminology Buffet

the task: what kind of problem are you solving?
Classification, Regression, Clustering

the data: amount of human input/number of labeled examples
Fully-supervised, Semi-supervised, Un-supervised

the approach: how any data are being used
Probabilistic, Generative, Conditional, Spectral, Neural, Memory-based, Exemplar, …

slide-71
SLIDE 71

Classification

POLITICS TERRORISM SPORTS TECH HEALTH FINANCE …

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region.

Task: Topic Id/ Document Classification

slide-73
SLIDE 73

Classification

POLITICS TERRORISM SPORTS TECH HEALTH FINANCE …

Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport.

slide-75
SLIDE 75

Classify with Goodness

$$\text{best label} = \arg\max_{\text{label}} \; \text{score}(\text{example}, \text{label})$$

slide-76
SLIDE 76

Classify with (Low) Regret/Loss

$$\text{best label} = \arg\min_{\text{label}} \; \text{loss}(\text{example}, \text{label})$$

slide-77
SLIDE 77

Classification

POLITICS    .05
TERRORISM   .48
SPORTS      .0001
TECH        .39
HEALTH      .0001
FINANCE     .0002
…

Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport.
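Given per-label scores like these, choosing a label is a single arg max (or an arg min over losses). The sketch below uses the scores from the slide; the loss of `1 - score` is an assumption made only to show the arg min form and is not defined in the slides.

```python
# Scores copied from the slide; the trailing "…" (other labels) is omitted.
scores = {
    "POLITICS": 0.05,
    "TERRORISM": 0.48,
    "SPORTS": 0.0001,
    "TECH": 0.39,
    "HEALTH": 0.0001,
    "FINANCE": 0.0002,
}

# Classify with goodness: pick the label with the highest score.
best_by_score = max(scores, key=scores.get)

# Classify with low loss: an assumed loss of (1 - score), for illustration only.
losses = {label: 1.0 - s for label, s in scores.items()}
best_by_loss = min(losses, key=losses.get)

print(best_by_score, best_by_loss)  # TERRORISM TERRORISM
```

Because the assumed loss decreases as the score increases, the two rules pick the same label here; with a differently shaped loss they could disagree.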

slide-79
SLIDE 79

Classification Examples

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance
a fixed set of classes C = {c_1, c_2, …, c_J}

Output: a predicted class c from C

slide-80
SLIDE 80

Classification: Hand-coded Rules?

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Rules based on combinations of words or other features

spam: black-list-address OR (“dollars” AND “have been selected”)

Accuracy can be high

If rules carefully refined by expert

Building and maintaining these rules is expensive
Can humans faithfully assign uncertainty?
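The spam rule above translates almost directly into code. This is a sketch only: the `BLACKLIST` contents and the `is_spam` function name are hypothetical, since the slide gives the rule but no addresses or implementation.

```python
# Hypothetical blacklist; the slide does not list any addresses.
BLACKLIST = {"promo@example.com"}

def is_spam(sender: str, body: str) -> bool:
    """spam: black-list-address OR ("dollars" AND "have been selected")"""
    text = body.lower()
    return sender.lower() in BLACKLIST or (
        "dollars" in text and "have been selected" in text
    )

print(is_spam("friend@example.org",
              "You have been selected to win a million dollars!"))  # True
print(is_spam("friend@example.org", "Lunch tomorrow?"))             # False
```

The rule returns only yes or no, which illustrates the slide's last question: there is no natural place for a hand-written rule to say how confident it is.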

slide-82
SLIDE 82

Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance d
a fixed set of classes C = {c_1, c_2, …, c_J}
a training set of m hand-labeled instances (d_1, c_1), …, (d_m, c_m)

Output:

a learned classifier γ that maps instances to classes

γ learns to associate certain features of instances with their labels

slide-83
SLIDE 83

Classification: Supervised Machine Learning

Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…

Input:

an instance d
a fixed set of classes C = {c_1, c_2, …, c_J}
a training set of m hand-labeled instances (d_1, c_1), …, (d_m, c_m)

Output:

a learned classifier γ that maps instances to classes

Naïve Bayes
Logistic regression
Support-vector machines
k-Nearest Neighbors
…
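To make the input/output description concrete, here is a minimal sketch of the first method in the list, Naïve Bayes, written from scratch for text. It assumes a multinomial word model with add-one smoothing and whitespace tokenization; those choices, the function names, and the toy training set are illustrative assumptions, not the course's reference implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(training_set):
    """Count label and word frequencies from (document, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in training_set:
        words = doc.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(doc, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in doc.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set (d_1, c_1), ..., (d_m, c_m), for illustration only.
training_set = [
    ("the team won the match", "SPORTS"),
    ("the striker scored a late goal", "SPORTS"),
    ("the new phone has a faster chip", "TECH"),
    ("the startup released a new app", "TECH"),
]
model = train_naive_bayes(training_set)
print(classify("a late goal won the match", model))  # SPORTS
print(classify("a faster phone app", model))         # TECH
```

Here the learned classifier γ is `classify` with the trained `model` fixed: it maps a new instance d to one of the classes seen in training by associating words with the labels they co-occurred with.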