CS446 Introduction to Machine Learning (Spring 2015) University of Illinois at Urbana-Champaign
http://courses.engr.illinois.edu/cs446
- Prof. Julia Hockenmaier
juliahmr@illinois.edu
LECTURE 2: SUPERVISED LEARNING
Class admin
Are you on Piazza? Is everybody registered for the class?
HW0 is out (not graded):
http://courses.engr.illinois.edu/cs446/Homework/HW0/HW0.pdf
Email alias for CS446 staff: cs446-staff@mx.uillinois.edu
The focus of CS446
Supervised learning:
Learning to predict labels from correctly labeled data
Unsupervised learning:
Learning to find hidden structure (e.g. clusters) in input data
Semi-supervised learning:
Learning to predict labels from (a little) labeled and (a lot of) unlabeled data
Reinforcement learning:
Learning to act through feedback for actions (rewards/punishments) from the environment
Attendees at the 1994 Machine Learning conference were given name badges labeled with + or −. What function was used to assign these labels?
Given a labeled training data set
Dtrain = {(x1, y1), …, (xN, yN)}
(yn is determined by some unknown target function f(x))
Return a model g: X ⟼ Y that is a good approximation of f(x)
(g should assign correct labels y to unseen x ∉ Dtrain)
Input items/data points xn ∈ X (e.g. emails) are drawn from an instance space X.
Output labels yn ∈ Y (e.g. ‘spam’/‘nospam’) are drawn from a label space Y.
Every data point xn ∈ X has a single correct label yn ∈ Y, defined by an (unknown) target function f(x) = y.
[Figure: an item x drawn from an instance space X is mapped by the learned model y = g(x), which approximates the target function y = f(x), to an item y drawn from a label space Y]
You often see f̂(x) instead of g(x), but PowerPoint can’t really typeset that, so g(x) will have to do.
Labeled Training Data Dtrain: (x1, y1), (x2, y2), …, (xN, yN)
Give the learner the examples in Dtrain; the learner returns a model g(x).
+ Naoki Abe
+ David W. Aha
+ Kamal M. Ali
+ Dana Angluin
+ Minoru Asada
+ Lars Asker
+ Javed Aslam
+ Jose L. Balcazar
+ Peter Bartlett
+ Welton Becket
+ George Berg
+ Neil Berkman
+ Malini Bhandaru
+ Bir Bhanu
+ Reinhard Blasig
+ Justin Boyan
+ Carla E. Brodley
+ Nader Bshouty
+ Tom Bylander
+ Bill Byrne
+ John Case
+ Jason Catlett
Reserve some labeled data for testing: labeled test data Dtest = (x’1, y’1), (x’2, y’2), …, (x’M, y’M).
Split the test data into test labels Ytest = y’1, y’2, … and raw test data Xtest = x’1, x’2, ….
Apply the learned model g(x) to the raw test data to obtain predicted labels g(Xtest) = g(x’1), g(x’2), …, g(x’M).
Gerald F. DeJong Chris Drummond Yolanda Gil Attilio Giordana Jiarong Hong
Priscilla Rasmussen Dan Roth Yoram Singer Lyle H. Ungar
Evaluate the model by comparing the predicted labels g(Xtest) against the test labels Ytest.
+ Gerald F. DeJong
+ Yolanda Gil
+ Jiarong Hong
+ Dan Roth + Yoram Singer
Use a test data set that is disjoint from Dtrain: Dtest = {(x’1, y’1), …, (x’M, y’M)}
The learner has not seen the test items during learning. Split your labeled data into two parts: test and training.
Take all items x’i in Dtest and compare the predicted g(x’i) with the correct y’i.
This requires an evaluation metric (e.g. accuracy).
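The evaluation step described above can be sketched in a few lines of Python. The model `g` and the toy data here are hypothetical stand-ins, not the course's actual code; accuracy is just one possible evaluation metric.

```python
# Compare the model's predictions g(x'_i) against the correct test
# labels y'_i and report the fraction that match (accuracy).

def accuracy(g, test_data):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_data if g(x) == y)
    return correct / len(test_data)

# Toy example: a model that labels a number 1 iff it is positive.
g = lambda x: 1 if x > 0 else 0
test_data = [(3, 1), (-1, 0), (2, 1), (-5, 1)]  # last label disagrees with g
print(accuracy(g, test_data))  # 3 of 4 predictions correct -> 0.75
```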
– What is our instance space?
Gloss: What kind of features are we using?
– What is our label space?
Gloss: What kind of learning task are we dealing with?
– What is our hypothesis space?
Gloss: What kind of model are we learning?
– What learning algorithm do we use?
Gloss: How do we learn the model from the labeled data?
(What is our loss function/evaluation metric?)
Gloss: How do we measure success?
[Figure: an item x drawn from an instance space X is mapped by the learned model y = g(x) to an item y drawn from a label space Y]
Designing an appropriate instance space X is crucial for how well we can predict y.
When we apply machine learning to a task, we first need to define the instance space X. Instances x ∈ X are defined by features:
– Boolean features:
Does this email contain the word ‘money’?
– Numerical features:
How often does ‘money’ occur in this email? What is the width/height of this bounding box?
Possible features for the name badge task: e.g., does the name contain a particular letter?
X is an N-dimensional vector space (e.g. ℝN). Each dimension = one feature. Each x is a feature vector (hence the boldface x).
Think of x = [x1 … xN] as a point in X (e.g. plotted along axes x1, x2, …).
When designing features, we often think in terms of templates, not individual features:
What is the 2nd letter?
  Naoki → [1 0 0 0 …]
  Abe → [0 1 0 0 …]
  Scrooge → [0 0 1 0 …]
What is the i-th letter?
  Abe → [1 0 0 0 0 … 0 1 0 0 0 0 … 0 0 0 0 1 …]
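The "what is the i-th letter?" template above can be sketched as code: each position expands into one block of 26 indicator features. The function name and the number of positions are illustrative choices, not from the course materials.

```python
import string

def letter_features(name, positions=3):
    """One-hot encode the first `positions` letters of a name:
    one block of 26 indicators (a-z) per letter position."""
    vec = []
    for i in range(positions):
        block = [0] * 26
        if i < len(name):
            block[string.ascii_lowercase.index(name[i].lower())] = 1
        vec.extend(block)
    return vec

v = letter_features("Abe")
# 'a' is letter 0, so the first 26-dim block has its 1 in position 0;
# 'b' (letter 1) puts a 1 at position 26 + 1, 'e' (letter 4) at 52 + 4.
print(len(v))  # -> 78
```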
The choice of features is crucial for how well a task can be learned.
In many application areas (language, vision, etc.), a lot of work goes into designing suitable features. This requires domain expertise.
CS446 can’t teach you what specific features to use for your task.
But we will touch on some general principles
[Figure: an item x drawn from an instance space X is mapped by the learned model y = g(x) to an item y drawn from a label space Y]
The label space Y determines what kind of supervised learning task we are dealing with.
The focus of CS446
Output labels y ∈ Y are categorical:
– Binary classification: two possible labels
– Multiclass classification: k possible labels
Output labels y ∈ Y are structured objects (sequences of labels, parse trees, etc.):
– Structure learning (e.g. CS546)
Output labels y ∈ Y are numerical:
– Regression (linear/polynomial): labels are continuous-valued; learn a linear/polynomial function f(x)
– Ranking: labels are ordinal; learn an ordering f(x1) > f(x2) over input pairs
[Figure: an item x drawn from an instance space X is mapped by the learned model y = g(x) to an item y drawn from a label space Y]
We need to choose what kind of model we want to learn.
For classification tasks (Y is categorical, e.g. {0, 1} or {0, 1, …, k}), the model is called a classifier. For binary classification tasks (Y = {0, 1}), we often think of the two values of Y as Boolean (0 = false, 1 = true), and call the target function f(x) to be learned a concept.
    x1 x2 x3 x4 | y
1    0  0  1  0 | 0
2    0  1  0  0 | 0
3    0  0  1  1 | 1
4    1  0  0  1 | 1
5    0  1  1  0 | 0
6    1  1  0  0 | 0
7    0  1  0  1 | 0
Each x has 4 bits: |X| = 2⁴ = 16. Since Y = {0, 1}, each f(x) defines one subset of X. X has 2¹⁶ = 65536 subsets, so there are 2¹⁶ possible f(x) (2⁹ are consistent with our data). We would need to see all of X to learn f(x).
We would need to see all of X to learn f(x):
– Easy with |X| = 16
– Not feasible in general (for any real-world problem)
– Learning = generalization, not memorization of the training data
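The counting argument above can be checked directly: with 4 Boolean features there are 2⁴ = 16 instances, the table fixes the label of 7 of them, and any labeling of the remaining 9 is consistent with the data. A small sketch (the data is the table from the slides):

```python
from itertools import product

train = {  # x -> y, the 7 rows of the table
    (0, 0, 1, 0): 0, (0, 1, 0, 0): 0, (0, 0, 1, 1): 1, (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0, (1, 1, 0, 0): 0, (0, 1, 0, 1): 0,
}
instances = list(product([0, 1], repeat=4))        # all 16 instances
unseen = [x for x in instances if x not in train]  # instances with no label yet
# Each unseen instance can be labeled 0 or 1 independently,
# so 2^(number of unseen instances) functions fit the training data.
print(2 ** len(unseen))  # -> 512, i.e. 2^9
```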
There are |Y|^|X| possible functions f(x) from the instance space X to the label space Y. Learners typically consider only a subset of the functions from X to Y. This subset is called the hypothesis space H: |H| ≤ |Y|^|X|.
Conjunctive clauses: 16 different conjunctions
f(x) = x1
…
f(x) = x1 ∧ x2 ∧ x3 ∧ x4
None is consistent with the data.
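The claim that none of the 16 conjunctions fits the table can be verified by brute force. The sketch below enumerates every subset of {x1, x2, x3, x4} as a conjunction (including the empty conjunction, which always predicts 1) and tests it against the training rows:

```python
from itertools import combinations

train = [  # the 7 rows of the table above
    ((0, 0, 1, 0), 0), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 1), ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0), ((1, 1, 0, 0), 0), ((0, 1, 0, 1), 0),
]

def conjunction(variables):
    """y = 1 iff every variable in the subset is 1 (empty subset: always 1)."""
    return lambda x: int(all(x[i] for i in variables))

total, consistent = 0, 0
for size in range(5):
    for subset in combinations(range(4), size):
        total += 1
        f = conjunction(subset)
        if all(f(x) == y for x, y in train):
            consistent += 1
print(total, consistent)  # -> 16 0: no conjunction fits the data
```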
n-of-m clauses: 20 rules of the form “y = 1 iff at least m of the following n xi are 1”
Consistent hypothesis: “y = 1 if and only if at least 2 of {x1, x3, x4} are 1”
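The consistent n-of-m hypothesis can be checked against the table row by row; a minimal sketch:

```python
train = [  # the 7 rows of the table above
    ((0, 0, 1, 0), 0), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 1), ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0), ((1, 1, 0, 0), 0), ((0, 1, 0, 1), 0),
]

def at_least_2_of_x1_x3_x4(x):
    """The consistent hypothesis: y = 1 iff >= 2 of {x1, x3, x4} are 1."""
    return int(x[0] + x[2] + x[3] >= 2)

# The hypothesis labels every training row correctly:
print(all(at_least_2_of_x1_x3_x4(x) == y for x, y in train))  # -> True
```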
Binary classification: We assume f separates the positive and negative examples: – Assign y = 1 to all x where f(x) > 0 – Assign y = 0 to all x where f(x) < 0
[Figure: the decision boundary f(x) = 0 in the (x1, x2) plane, separating the region f(x) > 0 from the region f(x) < 0]
The learning task: Find a function f(x) that best separates the (training) data – What kind of function is f? – How do we define best? – How do we find f?
Accuracy: prefer models that make fewer mistakes
– We only have access to the training data
– But we care about accuracy on unseen (test) examples
Simplicity (Occam’s razor): prefer simpler models (e.g. fewer parameters)
– These (often) generalize better, and need less data for training
Many learning algorithms restrict the hypothesis space to linear classifiers: f(x) = w0 + w·x
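A linear classifier is simple enough to sketch directly. The weights below are hand-picked for illustration (learning them from data is the topic of later lectures):

```python
def linear_classify(w0, w, x):
    """Predict 1 if w0 + w.x > 0, else 0."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0

# Hand-picked weights: this boundary separates points with x1 + x2 > 1.5
w0, w = -1.5, [1.0, 1.0]
print(linear_classify(w0, w, [1, 1]))  # -> 1 (above the line)
print(linear_classify(w0, w, [0, 1]))  # -> 0 (below the line)
```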
Not all data sets are linearly separable. Sometimes, feature transformations help:
[Figure: a data set that is not linearly separable in (x1, x2) becomes linearly separable after transforming the features, e.g. to x1² or |x2 − x1|]
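A small worked example of such a transformation, using the |x2 − x1| feature mentioned on the slide. The data set is made up for illustration: positives lie near the diagonal x1 = x2, negatives away from it, on both sides, so no single line in (x1, x2) separates them, but a threshold on the transformed feature does:

```python
# Positives near the diagonal, negatives on both sides of it.
data = [((0, 0), 1), ((2, 2), 1), ((0, 3), 0), ((3, 0), 0)]

def transform(x):
    """Map (x1, x2) to the single feature |x2 - x1|."""
    return abs(x[1] - x[0])

# In the transformed 1-d space, a simple threshold (a linear
# classifier!) separates the classes: y = 1 iff |x2 - x1| < 1.
print(all((transform(x) < 1) == bool(y) for x, y in data))  # -> True
```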
The learning task: given a labeled training data set Dtrain = {(x1, y1), …, (xN, yN)}, return a model (classifier) g: X ⟼ Y from the hypothesis space H (|H| ≤ |Y|^|X|). The learning algorithm performs a search in the hypothesis space H for the model g.
Batch learning: the learner sees the complete training data, and only changes its hypothesis when it has seen the entire training data set.
Online learning: the learner sees the training data one example at a time, and can change its hypothesis with every new example.
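The online regime can be sketched with a perceptron-style mistake-driven update (the specific update rule here is just one example of an online learner, chosen for illustration; it is not introduced until later in the course):

```python
def online_train(data, epochs=10, lr=1.0):
    """Online learning: update the hypothesis after every single example."""
    w0, w = 0.0, [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if w0 + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if pred != y:  # change the hypothesis immediately on a mistake
                w0 += lr * (y - pred)
                w = [wi + lr * (y - pred) * xi for wi, xi in zip(w, x)]
    return w0, w

# Linearly separable toy data: y = 1 iff x1 + x2 > 1
data = [([0, 0], 0), ([1, 1], 1), ([0, 1], 0), ([2, 0], 1)]
w0, w = online_train(data)
preds = [1 if w0 + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
         for x, _ in data]
print(preds)  # on this separable data the learner converges to the labels
```

A batch learner would instead look at all of `data` before making any change to (w0, w).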
Split your data into two (or three) sets:
– Training data (often 70–90%)
– Test data (often 10–20%)
– Development data (10–20%)
You need to report performance on the test data, but you are not allowed to look at it. You are allowed to look at the development data (and use it to tweak parameters).
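A three-way split can be sketched as below. The 70/10/20 proportions are one common choice from the ranges above, not a fixed rule, and the shuffle guards against ordered data:

```python
import random

def split_data(data, train_frac=0.7, dev_frac=0.1, seed=0):
    """Shuffle, then split into train / dev / test portions."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = int(train_frac * len(items))
    n_dev = int(dev_frac * len(items))
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]          # remainder goes to test
    return train, dev, test

train, dev, test = split_data(range(100))
print(len(train), len(dev), len(test))  # -> 70 10 20
```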
How difficult is your task? You need to compare against a (reasonable) baseline (e.g. assign the majority class)
How important are the different features (feature templates) you have designed? An ablation study compares models that use different subsets of the features/feature templates.
How much training data do you need? Has your model converged? How does your performance change with the amount of training data?
[Figure: a learning curve plotting accuracy against the size of the training data]
– Instance space (typically a vector space): each instance = one feature vector x = (x1, …, xn)
– Hypothesis space (supervised learning): subset of functions from instances to labels
– Linear classifiers: only consider linear functions g(x) = w0 + w·x
– Learning algorithms: online vs. batch learning
– Training vs. test vs. development data