Lecture 1: Course outline and logistics / What is Machine Learning



slide-1
SLIDE 1

Aykut Erdem

October 2016 Hacettepe University

Lecture 1:

−Course outline and logistics −What is Machine Learning

slide-2
SLIDE 2

Today’s Schedule

  • Course outline and logistics
  • An overview of Machine Learning

2

slide-3
SLIDE 3

Course outline and logistics

slide-4
SLIDE 4

Logistics

  • Instructor: Aykut Erdem (aykut@cs.hacettepe.edu.tr)
  • Teaching Assistants: Aysun Kocak (aysunkocak@cs.hacettepe.edu.tr), Burcak Asal (basal@cs.hacettepe.edu.tr)

  • Lectures: Mon 09:00-10:50, Thu 11:00-11:50 (Room D8)
  • Tutorials: Wed 15:00-16:50 (Room D8)

4

slide-5
SLIDE 5

About this course

  • This is an undergraduate-level introductory course in machine learning (ML)

⎯ A broad overview of many concepts and algorithms in ML.

  • Requirements

⎯ Basic algorithms, data structures.
⎯ Basic probability and statistics (common distributions, Bayes rule, mean/median/mode).
⎯ Basic linear algebra and calculus (vector/matrix manipulations, partial derivatives).
⎯ Good programming skills.

  • BBM 409 Introduction to Machine Learning Practicum

⎯ Students will gain skills to apply the concepts to real-world problems.

5

slide-6
SLIDE 6

Communication

  • The course webpage will be updated regularly throughout the semester with lecture notes, programming and reading assignments, and important deadlines.
 http://web.cs.hacettepe.edu.tr/~aykut/classes/fall2016/bbm406/

  • We will be using Piazza for course-related discussions and announcements. Please enroll in the class on Piazza by following the link:
 http://piazza.com/class#fall2016/bbm406

6

slide-7
SLIDE 7

Reference Books

  • Artificial Intelligence: A Modern Approach (3rd Edition), Russell and Norvig, Prentice Hall, 2009
  • Bayesian Reasoning and Machine Learning, Barber, Cambridge University Press, 2012 (online version available)
  • Introduction to Machine Learning (2nd Edition), Alpaydin, MIT Press, 2010
  • Pattern Recognition and Machine Learning, Bishop, Springer, 2006
  • Machine Learning: A Probabilistic Perspective, Murphy, MIT Press, 2012

7

slide-8
SLIDE 8

Grading Policy

  • Grading for BBM 406 will be based on

⎯ a course project (done in pairs) (25%),
⎯ a midterm exam (30%),
⎯ a final exam (40%), and
⎯ class participation (5%)

  • In BBM 409, the grading will be based on

⎯ a set of quizzes (20%), and
⎯ 3 assignments (done individually)

8

slide-9
SLIDE 9

Assignments

  • 3 assignments; the first one is worth 20%, the last two 30% each
  • Theoretical: pencil-and-paper derivations
  • Programming: implementing Python code to solve a given real-world problem
  • A quick Python tutorial in this week’s tutorial session

9

slide-10
SLIDE 10

10

slide-11
SLIDE 11

Course Project

  • Done individually, or in teams of two students.
  • Choose your own topic and explore ways to solve the problem
  • Proposal: 1 page (Oct 31) (10%)
  • Progress Report: 4-5 pages (Dec 12) (25%)
  • Poster Presentation: (last week of classes) (20%)
  • Final Report: (due at the beginning of the poster session) (45%)

11

slide-12
SLIDE 12

Collaboration Policy

  • All work on assignments has to be done individually. The course project, however, can be done in pairs.

  • You are encouraged to discuss the assignments with your classmates, but these discussions should be carried out in an abstract way.

  • In short, turning in someone else’s work, in whole or in part, as your own will be considered a violation of academic integrity.

  • Please note that this also holds for material found on the web, as everything on the web has been written by someone else.

12

http://www.plagiarism.org/plagiarism-101/prevention/

slide-13
SLIDE 13

Course Outline

  • Week 1: Overview of Machine Learning, Nearest Neighbor Classifier
  • Week 2: Linear Regression, Least Squares
  • Week 3: Machine Learning Methodology
  • Week 4: Statistical Estimation: MLE, MAP, Naïve Bayes Classifier
  • Week 5: Linear Classification Models: Logistic Regression, Linear Discriminant Functions, Perceptron
  • Week 6: Neural Networks
  • Week 7: Midterm Exam

13

Deadlines over these weeks: Assg1 out; Assg1 due, Assg2 out; course project proposal due; Assg2 due; Assg3 out

slide-14
SLIDE 14

Course Outline (cont’d.)

  • Week 8: Deep Learning
  • Week 9: Support Vector Machines (SVMs)
  • Week 10: Multi-class SVM
  • Week 11: Decision Tree Learning
  • Week 12: Ensemble Methods: Bagging, Random Forests, Boosting
  • Week 13: Clustering
  • Week 14: Principal Component Analysis, Autoencoders

14

Deadlines over these weeks: project progress report due; Assg3 due

slide-15
SLIDE 15

Machine Learning: 
 An Overview

slide-16
SLIDE 16

Quotes

  • “If you were a current computer science student what area would you start studying heavily?”

⎯ Answer: Machine Learning. “The ultimate is computers that learn.”
⎯ Bill Gates, Reddit AMA

  • “Machine learning is the next Internet”

⎯ Tony Tether, Director, DARPA

  • “Machine learning is today’s discontinuity”

⎯ Jerry Yang, CEO, Yahoo

16

slide by David Sontag

slide-17
SLIDE 17

Google Trends

17

[Google Trends chart: search interest in “machine learning” vs “deep learning”]

slide-18
SLIDE 18

2015 Edition

slide-19
SLIDE 19

2016 Edition

slide-20
SLIDE 20

Two definitions of learning

(1) Learning is the acquisition of knowledge about the world. – Kupfermann (1985)

(2) Learning is an adaptive change in behavior caused by experience. – Shepherd (1988)

20

slide by Bernhard Schölkopf

slide-21
SLIDE 21

Empirical Inference

  • Drawing conclusions from empirical data

(observations, measurements)

21

slide by Bernhard Schölkopf

slide-22
SLIDE 22

Empirical Inference

  • Drawing conclusions from empirical data

(observations, measurements)

  • Example 1: scientific inference

22

slide by Bernhard Schölkopf

[figure: scatter of observations in the (x, y) plane]

slide-23
SLIDE 23

Empirical Inference

  • Drawing conclusions from empirical data

(observations, measurements)

  • Example 1: scientific inference

23

slide by Bernhard Schölkopf

[figure: the same data fit by the linear model y = a·x]

slide-24
SLIDE 24

Empirical Inference

  • Drawing conclusions from empirical data

(observations, measurements)

  • Example 1: scientific inference

24

slide by Bernhard Schölkopf

[figure: linear fit y = a·x and additional data points; annotation: Leibniz, Weyl, Chaitin]

slide-25
SLIDE 25

Empirical Inference

  • Drawing conclusions from empirical data

(observations, measurements)

  • Example 1: scientific inference

25

slide by Bernhard Schölkopf

[figure: kernel fit y = ∑i ai k(x, xi) + b; annotation: Leibniz, Weyl, Chaitin]
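The progression on slides 23-25, from the linear model y = a·x to the kernel expansion, starts with a one-line estimator. A minimal least-squares sketch in Python with NumPy (the synthetic data and the true slope are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations from the model y = a * x, plus noise
# (illustrative choice: true slope a = 2.0).
true_a = 2.0
x = rng.uniform(0.0, 5.0, size=50)
y = true_a * x + rng.normal(scale=0.1, size=50)

# Closed-form least-squares slope: a = sum(x*y) / sum(x*x).
a_hat = np.sum(x * y) / np.sum(x * x)  # close to the true slope 2.0
```

The kernel fit y = ∑i ai k(x, xi) + b on slide 25 generalizes the same idea: the model is still linear in the parameters ai, so it can still be estimated by least squares.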

slide-26
SLIDE 26

Empirical Inference

  • Example 2: perception

26

slide by Bernhard Schölkopf

SLIDES 27-46: [image-only slides by Bernhard Schölkopf]
slide-47
SLIDE 47

Empirical Inference

  • Example 2: perception

47

“The brain is nothing but a statistical decision organ” – H. Barlow

slide by Bernhard Schölkopf

slide-48
SLIDE 48

48

SLIDES 49-52: [image-only slides by Bernhard Schölkopf]

slide-53
SLIDE 53

reflected light = illumination * reflectance

53

slide by Bernhard Schölkopf

slide-54
SLIDE 54

54

Hard Inference Problems

  • High dimensionality: consider many factors simultaneously to find regularity
  • Complex regularities: nonlinear, nonstationary, etc.
  • Little prior knowledge: e.g. no mechanistic models for the data
  • Need large data sets: processing requires computers and automatic inference methods

slide by Bernhard Schölkopf

slide-55
SLIDE 55

What is machine learning?

slide-56
SLIDE 56

Example: Netflix Challenge

  • Goal: Predict how a viewer will rate a movie
  • 10% improvement = 1 million dollars

56

slide by Yaser Abu-Mostafa

slide-57
SLIDE 57

Example: Netflix Challenge

  • Goal: Predict how a viewer will rate a movie
  • 10% improvement = 1 million dollars
  • Essence of Machine Learning:
  • A pattern exists
  • We cannot pin it down mathematically
  • We have data on it

57

slide by Yaser Abu-Mostafa

slide-58
SLIDE 58

AlphaGo vs Lee Sedol

58

slide-59
SLIDE 59

NVIDIA BB8 AI Car

59

slide-60
SLIDE 60

Comparison

  • Traditional Programming: Data + Program → Computer → Output
  • Machine Learning: Data + Output → Computer → Program

60

slide by Pedro Domingos, Tom Mitchell, Tom Dietterich

slide-61
SLIDE 61

What is Machine Learning?

  • [Arthur Samuel, 1959] Field of study that gives computers the ability to learn without being explicitly programmed
  • [Kevin Murphy] algorithms that

⎯ automatically detect patterns in data
⎯ use the uncovered patterns to predict future data or other outcomes of interest

  • [Tom Mitchell] algorithms that

⎯ improve their performance (P)
⎯ at some task (T)
⎯ with experience (E)

61

slide by Dhruv Batra
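Mitchell's (T, P, E) framing can be made concrete. A minimal sketch (the task, the 1-nearest-neighbor learner, and the data are all illustrative assumptions) in which performance P on a task T improves with experience E:

```python
import numpy as np

rng = np.random.default_rng(0)

# Task T: classify x as +1 iff x > 3.0 (the learner never sees this rule).
def true_label(x):
    return np.where(x > 3.0, 1, -1)

def accuracy_with_experience(n_train):
    """Train a 1-nearest-neighbor learner on n_train examples (experience E)
    and measure accuracy (performance P) on a fixed grid of test points."""
    train_x = rng.uniform(0.0, 6.0, size=n_train)
    train_y = true_label(train_x)
    test_x = np.linspace(0.0, 6.0, 601)
    # Predict each test point with the label of its nearest training point.
    nearest = np.abs(test_x[:, None] - train_x[None, :]).argmin(axis=1)
    preds = train_y[nearest]
    return (preds == true_label(test_x)).mean()

# More experience E generally yields better performance P.
small, large = accuracy_with_experience(5), accuracy_with_experience(500)
```

With 500 examples the nearest training point is almost always on the correct side of the decision boundary, so accuracy approaches 1.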

slide-62
SLIDE 62

What is Machine Learning?

  • If you are a Scientist: Data → Machine Learning → Understanding
  • If you are an Engineer / Entrepreneur: Get lots of data → Machine Learning → ??? → Profit!

62

slide by Dhruv Batra

slide-63
SLIDE 63

Why Study Machine Learning?


Engineering Better Computing Systems

  • Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills/knowledge (the knowledge engineering bottleneck)
  • Develop systems that adapt and customize themselves to individual users

⎯ Personalized news or mail filter
⎯ Personalized tutoring

  • Discover new knowledge from large databases (data mining)

⎯ Medical text mining (e.g. migraines to calcium channel blockers to magnesium)

63

slide by Dhruv Batra

slide-64
SLIDE 64

Why Study Machine Learning?


Cognitive Science

  • Computational studies of learning may help us understand learning in humans and other biological organisms.
  • Hebbian neural learning: “Neurons that fire together, wire together.”

64

slide by Dhruv Batra

slide-65
SLIDE 65

Why Study Machine Learning?


The Time is Ripe

  • Algorithms: many basic, effective, and efficient algorithms available.
  • Data: large amounts of online data available.
  • Computing: large amounts of computational resources available.

65

slide by Ray Mooney

slide-66
SLIDE 66

Where does ML fit in?

66

slide by Fei Sha

slide-67
SLIDE 67

A Brief History of AI

67

slide by Dhruv Batra

slide-68
SLIDE 68

68

adopted from Dhruv Batra

slide-69
SLIDE 69

AI Predictions: Experts

69

Image Credit: http://intelligence.org/files/PredictingAI.pdf
slide by Dhruv Batra

slide-70
SLIDE 70

AI Predictions: Non-Experts

70

Image Credit: http://intelligence.org/files/PredictingAI.pdf
slide by Dhruv Batra

slide-71
SLIDE 71

AI Predictions: Failed

71

Image Credit: http://intelligence.org/files/PredictingAI.pdf
slide by Dhruv Batra

slide-72
SLIDE 72

Why is AI hard?

72

Image Credit: http://karpathy.github.io/2012/10/22/state-of-computer-vision/

slide by Dhruv Batra

slide-73
SLIDE 73

What humans see

73

slide by Larry Zitnick

slide-74
SLIDE 74

What computers see

74

243 239 240 225 206 185 188 218 211 206 216 225 242 239 218 110 67 31 34 152 213 206 208 221 243 242 123 58 94 82 132 77 108 208 208 215 235 217 115 212 243 236 247 139 91 209 208 211 233 208 131 222 219 226 196 114 74 208 213 214 232 217 131 116 77 150 69 56 52 201 228 223 232 232 182 186 184 179 159 123 93 232 235 235 232 236 201 154 216 133 129 81 175 252 241 240 235 238 230 128 172 138 65 63 234 249 241 245 237 236 247 143 59 78 10 94 255 248 247 251 234 237 245 193 55 33 115 144 213 255 253 251 248 245 161 128 149 109 138 65 47 156 239 255 190 107 39 102 94 73 114 58 17 7 51 137 23 32 33 148 168 203 179 43 27 17 12 8 17 26 12 160 255 255 109 22 26 19 35 24

slide by Larry Zitnick

slide-75
SLIDE 75

“I saw her duck”

75

Image Credit: Liang Huang

slide by Liang Huang

slide-76
SLIDE 76

“I saw her duck”

76

Image Credit: Liang Huang

slide by Liang Huang

slide-77
SLIDE 77

“I saw her duck”

77

Image Credit: Liang Huang

slide by Liang Huang

slide-78
SLIDE 78

We’ve come a long way… IBM Watson

  • What is Jeopardy?
  • http://youtu.be/Xqb66bdsQlw?t=53s
  • Challenge:
  • http://youtu.be/_429UIzN1JM
  • Watson Demo:
  • http://youtu.be/WFR3lOm_xhE?t=22s
  • Explanation
  • http://youtu.be/d_yXV22O6n4?t=4s
  • Future: automated operator, doctor assistant, finance
  • IBM Watson wins on Jeopardy (February 2011)
  • Watson provides cancer treatment options to doctors in seconds (February 2013)

78

slide by Liang Huang

slide-79
SLIDE 79

Why are things working today?

  • More compute power
  • More data
  • Better algorithms/models

79

Figure Credit: Banko & Brill, 2001
[figure: accuracy improves with the amount of training data]

slide by Dhruv Batra

slide-80
SLIDE 80

Machine Learning 
 (by examples)

slide-81
SLIDE 81

Pose Estimation

81

slide by Alex Smola

slide-82
SLIDE 82

Collaborative Filtering

82

Amazon books. Don’t mix preferences on Netflix!

slide by Alex Smola

slide-83
SLIDE 83

Imitation Learning in Games

83

Avatar learns from your behavior

Black & White, Lionhead Studios

slide by Alex Smola

slide-84
SLIDE 84

Reinforcement Learning

84

https://www.youtube.com/watch?v=lleRKHsJBJ0

slide by Alex Smola

slide-85
SLIDE 85

Spam Filtering

85

ham spam

slide by Alex Smola

slide-86
SLIDE 86

Cheque Reading

86

segment image recognize handwriting

slide by Alex Smola

slide-87
SLIDE 87

Image Layout

  • Raw set of images from several cameras
  • Joint layout based on image similarity

87

slide by Alex Smola

slide-88
SLIDE 88

Search Ads

88

why these ads?

slide by Alex Smola

slide-89
SLIDE 89

Google Self-Driving Cars

  • Google’s self-driving car passes 300,000 miles (Forbes, 8/15/2012)

89

slide by Alex Smola

slide-90
SLIDE 90

Speech Recognition

90

Given an audio waveform, robustly extract & recognize any spoken words.

  • Statistical models can be used to

⎯ provide greater robustness to noise
⎯ adapt to the accents of different speakers
⎯ learn from training data
slide-91
SLIDE 91

Natural Language Processing

91

“I need to hide a body” → noun, verb, preposition, …

slide-92
SLIDE 92

Face Detection

92

Sudhakar et al., Multi-view Face Detection Using Deep Convolutional Neural Networks, 2015

slide-93
SLIDE 93

Face Detection

93

Yang et al., From Facial Parts Responses to Face Detection: A Deep Learning Approach, ICCV 2015

slide-94
SLIDE 94

Topic Models of Text Documents

94


slide by Eric Sudderth

slide-95
SLIDE 95

Visual Scene Understanding

95

[scene labels: trees, skyscraper, sky, bell dome, temple, buildings]

slide by Eric Sudderth

slide-96
SLIDE 96

Learning - revisited

96

[diagram: data + prior knowledge → Learning → knowledge]
slide by Stuart Russell


slide-98
SLIDE 98

Programming with Data

  • Want adaptive, robust, and fault-tolerant systems
  • Rule-based implementation (IF x THEN DO y) is (often)

⎯ difficult (for the programmer)
⎯ brittle (can miss many edge cases)
⎯ a nightmare to maintain explicitly
⎯ often not very effective (e.g. OCR)

  • Usually easy to obtain examples of what we want:

⎯ Collect many pairs (xi, yi)
⎯ Estimate a function f such that f(xi) = yi (supervised learning)
⎯ Detect patterns in data (unsupervised learning)

98

slide by Mehryar Mohri
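The last three bullets are the whole recipe. A minimal supervised-learning sketch (the hidden rule and all names are illustrative assumptions), estimating f from example pairs (xi, yi) with a 1-nearest-neighbor lookup instead of hand-coding the rule:

```python
import numpy as np

# Collect many pairs (x_i, y_i): points labeled by a rule
# we pretend not to know (y = +1 iff x > 3.0).
rng = np.random.default_rng(1)
train_x = rng.uniform(0.0, 6.0, size=200)
train_y = np.where(train_x > 3.0, 1, -1)

def f(x):
    """Estimated function: return the label of the nearest training example."""
    i = np.argmin(np.abs(train_x - x))
    return train_y[i]
```

No IF-THEN rule was ever written; the behavior of f is determined entirely by the collected pairs.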

slide-99
SLIDE 99

Objectives of Machine Learning

99

  • Algorithms: design of efficient, accurate, and general learning algorithms to

⎯ deal with large-scale problems
⎯ make accurate predictions (unseen examples)
⎯ handle a variety of different learning problems

  • Theoretical questions:

⎯ what can be learned? under what conditions?
⎯ what learning guarantees can be given?
⎯ what is the algorithmic complexity?

slide by Mehryar Mohri

slide-100
SLIDE 100

Definitions and Terminology

  • Example: an object, an instance of the data used.
  • Features: the set of attributes, often represented as a vector, associated to an example (e.g., height and weight for gender prediction).
  • Labels: in classification, the category associated to an object (e.g., positive or negative in binary classification); in regression, a real value.
  • Training data: data used for training the learning algorithm (often labeled data).

100

slide by Mehryar Mohri

slide-101
SLIDE 101

Definitions and Terminology (cont’d.)

  • Test data: data used for testing the learning algorithm (unlabeled data).
  • Unsupervised learning: no labeled data.
  • Supervised learning: uses labeled data.
  • Weakly or semi-supervised learning: intermediate scenarios.
  • Reinforcement learning: rewards from a sequence of actions.

101

slide by Mehryar Mohri

slide-102
SLIDE 102

Supervised Learning

slide by Alex Smola

slide-103
SLIDE 103

Supervised Learning

  • Binary classification: given x, find y in {-1, 1}
  • Multicategory classification: given x, find y in {1, ..., k}
  • Regression: given x, find y in R (or R^d)
  • Sequence annotation: given a sequence x1 ... xl, find y1 ... yl
  • Hierarchical categorization (ontology): given x, find a point in the hierarchy of y (e.g. a tree)
  • Prediction: given xt and yt-1 ... y1, find yt

In all cases we learn a function y = f(x), often with a loss l(y, f(x)).

103

slide by Alex Smola
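All of these tasks share the template "learn y = f(x) under a loss l(y, f(x))". A minimal sketch of two standard losses (the example values are illustrative, not from the slides):

```python
def zero_one_loss(y, y_hat):
    """Classification: pay 1 for a wrong label, 0 for a correct one."""
    return 0 if y == y_hat else 1

def squared_loss(y, y_hat):
    """Regression: pay the squared difference between target and prediction."""
    return (y - y_hat) ** 2

# Binary classification, y in {-1, 1}: a wrong prediction costs 1.
c = zero_one_loss(1, -1)
# Regression, y in R: (2.0 - 1.5)^2 = 0.25.
r = squared_loss(2.0, 1.5)
```

Training then amounts to choosing f to make the average loss over the example pairs small.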

slide-104
SLIDE 104

Binary Classification

104

slide by Alex Smola

slide-105
SLIDE 105

Multiclass Classification + Annotation

105

slide by Alex Smola

slide-106
SLIDE 106

Regression

106

[figures: linear and nonlinear regression fits]

slide by Alex Smola

slide-107
SLIDE 107

Sequence Annotation

107

Given a sequence: gene finding, speech recognition, activity segmentation, named entities

slide by Alex Smola

slide-108
SLIDE 108

Ontology

108

[examples: webpages, genes]

slide by Alex Smola

slide-109
SLIDE 109

Prediction

109

tomorrow’s stock price

slide by Alex Smola

slide-110
SLIDE 110

Unsupervised Learning

slide by Alex Smola

slide-111
SLIDE 111

Unsupervised Learning

  • Given data x, ask a good question about x or about a model for x
  • Clustering: find a set of prototypes representing the data
  • Principal Components: find a subspace representing the data
  • Sequence Analysis: find a latent causal sequence for observations

⎯ Sequence segmentation
⎯ Hidden Markov Model (discrete state)
⎯ Kalman Filter (continuous state)
⎯ Hierarchical representations

  • Independent components / dictionary learning: find a (small) set of factors for observations
  • Novelty detection: find the odd one out

111

slide by Alex Smola
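"Find a set of prototypes representing the data" is exactly what k-means does. A minimal sketch in NumPy (the two synthetic blobs are an assumption for illustration):

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: alternate between assigning points to the nearest
    prototype and moving each prototype to the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs; the prototypes should land near (0,0) and (10,10).
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                 rng.normal(10.0, 0.5, (50, 2))])
centers, labels = kmeans(pts, k=2)
```

The returned prototypes summarize 100 points with just two vectors, which is the sense in which clustering "represents" the data.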

slide-112
SLIDE 112

Clustering

  • Documents
  • Users
  • Webpages
  • Diseases
  • Pictures
  • Vehicles


...

112

slide by Alex Smola

slide-113
SLIDE 113

Principal Components

113

Variance component model to account for sample structure in genome-wide association studies, Nature Genetics 2010

slide by Alex Smola
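"Find a subspace representing the data" (slide 111) can be sketched with the SVD. A minimal PCA example (the synthetic data varying mostly along the direction (1, 1) is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-D data that mostly varies along the direction (1, 1), plus small noise.
t = rng.normal(size=200)
X = np.column_stack([t, t]) + rng.normal(scale=0.1, size=(200, 2))

# Center the data, then take the top right-singular vector:
# it is the direction of maximum variance (the first principal component).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]
```

Projecting onto pc1 reduces the 2-D data to one coordinate while keeping almost all of its variance, which is the subspace representation the slide refers to.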

slide-114
SLIDE 114

Sequence Analysis

114

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 2007

slide by Alex Smola

slide-115
SLIDE 115

Hierarchical Grouping

115

slide by Alex Smola

slide-116
SLIDE 116

Independent Components

116

find them automatically

slide by Alex Smola

slide-117
SLIDE 117

Novelty detection

117

[figure: typical vs. atypical examples]

slide by Alex Smola
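"Find the odd one out" can be approximated with a simple distance-to-mean rule. A minimal sketch (the threshold k and the data are illustrative assumptions):

```python
import numpy as np

def novelty_flags(points, k=3.0):
    """Flag points whose distance from the sample mean exceeds the
    average distance by more than k standard deviations."""
    mean = points.mean(axis=0)
    d = np.linalg.norm(points - mean, axis=1)
    return d > d.mean() + k * d.std()

rng = np.random.default_rng(0)
typical = rng.normal(0.0, 1.0, (100, 2))
atypical = np.array([[8.0, 8.0]])  # the odd one out
flags = novelty_flags(np.vstack([typical, atypical]))
```

This is the crudest possible novelty detector; it illustrates the task, not a method you would use on hard data.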

slide-118
SLIDE 118

Important challenges in ML

  • How important are the actual learning algorithm and its tuning?
  • Simple versus complex algorithms
  • Overfitting
  • Model selection
  • Regularization

118