LECTURE 1: INTRODUCTION, Prof. Julia Hockenmaier


slide-1
SLIDE 1

CS446 Introduction to Machine Learning (Fall 2013) University of Illinois at Urbana-Champaign

http://courses.engr.illinois.edu/cs446

  • Prof. Julia Hockenmaier

juliahmr@illinois.edu

LECTURE 1: INTRODUCTION

slide-2
SLIDE 2

Welcome to CS 446!

Professor: Julia Hockenmaier
Teaching assistants: Arun Mallya, Micah Hodosh, Mingjie Qian, Ryan Musa

slide-3
SLIDE 3

What is machine learning?

slide-4
SLIDE 4

Machine learning is everywhere

slide-5
SLIDE 5

Applications: Spam Detection

This is a binary classification task: Assign one of two labels (e.g. yes/no) to the input (here, an email message).

slide-6
SLIDE 6

Applications: Spam Detection

Classification requires a model (a classifier) to determine which label to assign to items.
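To make "a model that determines which label to assign" concrete, here is a minimal sketch of a hand-written classifier. The keyword list and the function name `classify` are assumptions made for illustration; they do not come from the lecture, and a learned model would replace the hand-picked rule.

```python
# Hypothetical keyword list, chosen for illustration only.
SPAM_WORDS = {"lottery", "winner", "prize"}

def classify(email_text: str) -> str:
    """Binary classifier: map an email message to one of two labels."""
    words = set(email_text.lower().split())
    # Label the message as spam if it shares any word with the keyword list.
    return "spam" if words & SPAM_WORDS else "not spam"

print(classify("You are a lottery winner"))
print(classify("Meeting moved to 3pm"))
```

A rule like this is easy to write but hard to maintain, which is one reason to learn the model from data instead.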

slide-7
SLIDE 7

Applications: Spam Detection

In this class, we study algorithms and techniques to learn such models from data.

slide-8
SLIDE 8

Learning = Generalization

Mail thinks this message is junk mail.

Not junk

The learner has to be able to classify items it has never seen before.

slide-9
SLIDE 9

Learning = Adaptation

Mail thinks this message is junk mail.

Not junk

The learner should adapt its model to feedback (supervision) it receives.
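One possible way to adapt a model to feedback is a mistake-driven update in the style of the perceptron. The sketch below is an assumption about how such adaptation could look, not the method used by any particular mail client; the word-weight representation and the names `predict` and `feedback` are hypothetical.

```python
from collections import defaultdict

weights = defaultdict(float)  # one weight per word, all starting at 0

def score(text):
    return sum(weights[w] for w in text.lower().split())

def predict(text):
    # +1 = junk, -1 = not junk; a score of exactly 0 defaults to "not junk".
    return 1 if score(text) > 0 else -1

def feedback(text, true_label):
    """Update the weights only when the user corrects a wrong prediction."""
    if predict(text) != true_label:
        for w in text.lower().split():
            weights[w] += true_label

feedback("win a free prize", +1)      # user marks this message as junk
msg = "free conference registration"
before = predict(msg)                 # the model calls this junk
feedback(msg, -1)                     # user clicks "Not junk"
after = predict(msg)                  # the model has adapted
print(before, after)
```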

slide-10
SLIDE 10

Applications: Text classification

This is a multiclass classification task: Assign one of k labels to the input, e.g. {Spam, Conferences, Vacations, …}
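A common way to choose among k labels is to compute one score per label and return the label with the highest score. The sketch below assumes hand-picked keyword sets (hypothetical, for illustration only) in place of a learned scoring function.

```python
# Hypothetical keyword sets per label; a learned model would replace these.
KEYWORDS = {
    "Spam": {"prize", "winner", "free"},
    "Conferences": {"deadline", "submission", "workshop"},
    "Vacations": {"flight", "hotel", "beach"},
}

def classify_topic(text):
    """Multiclass classifier: return the label with the highest score."""
    words = set(text.lower().split())
    # Score each of the k labels by its keyword overlap with the message.
    scores = {label: len(words & kws) for label, kws in KEYWORDS.items()}
    return max(scores, key=scores.get)

print(classify_topic("workshop submission deadline extended"))
print(classify_topic("hotel near the beach"))
```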

slide-11
SLIDE 11

Applications: Face recognition

This is also a binary classification task: Decide for each rectangular image region whether it shows a face or not.
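Deciding "for each rectangular image region" can be sketched as a sliding window that hands every region to a binary classifier. The code below assumes square windows and a toy stand-in `is_face` function; both are simplifications for illustration, not a real face detector.

```python
def windows(height, width, size, stride):
    """Yield (top, left, bottom, right) for each square region of the image."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield (top, left, top + size, left + size)

def detect(image, is_face, size=2, stride=1):
    """Run a binary classifier on each region; keep regions labeled as faces."""
    hits = []
    for (t, l, b, r) in windows(len(image), len(image[0]), size, stride):
        region = [row[l:r] for row in image[t:b]]
        if is_face(region):
            hits.append((t, l, b, r))
    return hits

# Toy 3x3 "image" and a stand-in classifier (hypothetical): a region counts
# as a face if all four of its pixels are 1.
image = [[0, 0, 0],
         [0, 1, 1],
         [0, 1, 1]]
found = detect(image, lambda region: sum(map(sum, region)) == 4)
print(found)
```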

slide-12
SLIDE 12

What will we cover in this class?

slide-13
SLIDE 13

CS446: Key questions

– What kind of tasks can we learn models for?
– What kind of models can we learn?
– What algorithms can we use to learn?
– How do we evaluate how well we have learned to perform a particular task?
– How much data do we need to learn models for a particular task?

slide-14
SLIDE 14

Learning scenarios

Supervised learning (the focus of CS446):

Learning to predict labels from correctly labeled data

Unsupervised learning:

Learning to find hidden structure (e.g. clusters) in input data

Semi-supervised learning:

Learning to predict labels from (a little) labeled and (a lot of) unlabeled data

Reinforcement learning:

Learning to act through feedback for actions (rewards/punishments) from the environment

slide-15
SLIDE 15

Supervised learning

slide-16
SLIDE 16

Input x ∈ X: An item x drawn from an input space X

System: y = f(x)

Output y ∈ Y: An item y drawn from an output space Y

We consider systems that apply a function f() to input items x and return an output y = f(x).

slide-17
SLIDE 17

Input x ∈ X: An item x drawn from an input space X

System: y = f(x)

Output y ∈ Y: An item y drawn from an output space Y

In (supervised) machine learning, we deal with systems whose f(x) is learned from examples.

slide-18
SLIDE 18

Why use learning?

We typically use machine learning when the function f(x) we want the system to apply is too complex to program by hand.

slide-19
SLIDE 19

Supervised learning

Input x ∈ X: An item x drawn from an instance space X

Target function: y = f(x)

Learned model: y = g(x)

Output y ∈ Y: An item y drawn from a label space Y

You often see f̂(x) instead of g(x), but PowerPoint can’t really typeset that, so g(x) will have to do.

slide-20
SLIDE 20

Supervised learning: Training

Labeled Training Data D_train: (x_1, y_1), (x_2, y_2), …, (x_N, y_N)

Learning Algorithm

Learned model g(x)

Give the learner examples in D_train.
The learner returns a model g(x).
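The training step can be read as a function that takes D_train and returns another function g. The sketch below uses a 1-nearest-neighbor rule as the learning algorithm; that choice, the name `learn`, and the toy data are assumptions for illustration, since the slide does not name a specific algorithm.

```python
def learn(d_train):
    """Learning algorithm: takes labeled examples (x, y), returns a model g."""
    def g(x):
        # Predict the label of the closest training item (squared distance).
        nearest = min(
            d_train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return nearest[1]
    return g

# Toy training data (hypothetical): 2-dimensional inputs with labels "+" / "-".
d_train = [((0.0, 0.0), "-"), ((0.1, 0.3), "-"),
           ((1.0, 1.0), "+"), ((0.9, 0.8), "+")]
g = learn(d_train)
print(g((0.8, 0.9)), g((0.2, 0.1)))
```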

slide-21
SLIDE 21

Supervised learning: Testing

Labeled Test Data D_test: (x’_1, y’_1), (x’_2, y’_2), …, (x’_M, y’_M)

Reserve some labeled data for testing.
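Reserving labeled data for testing is often done by shuffling the labeled items and holding out a fraction of them. The sketch below is one simple way to do this; the 20% default, the fixed seed, and the function name `train_test_split` are assumptions, not course requirements.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the labeled data and hold out a fraction of it for testing."""
    items = list(data)
    random.Random(seed).shuffle(items)   # fixed seed makes the split repeatable
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

# Toy labeled data (hypothetical): ten (x, y) pairs.
labeled = [(i, i % 2) for i in range(10)]
d_train, d_test = train_test_split(labeled)
print(len(d_train), len(d_test))
```

The learner only ever sees d_train; d_test stays untouched until evaluation.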

slide-22
SLIDE 22

Supervised learning: Testing

Labeled Test Data D_test: (x’_1, y’_1), (x’_2, y’_2), …, (x’_M, y’_M)

Raw Test Data X_test: x’_1, x’_2, …, x’_M

Test Labels Y_test: y’_1, y’_2, …, y’_M

slide-23
SLIDE 23

Supervised learning: Testing

Raw Test Data X_test: x’_1, x’_2, …, x’_M

Learned model g(x)

Predicted Labels g(X_test): g(x’_1), g(x’_2), …, g(x’_M)

Test Labels Y_test: y’_1, y’_2, …, y’_M

Apply the model to the raw test data.

slide-24
SLIDE 24

Supervised learning: Testing

Raw Test Data X_test: x’_1, x’_2, …, x’_M

Learned model g(x)

Predicted Labels g(X_test): g(x’_1), g(x’_2), …, g(x’_M)

Test Labels Y_test: y’_1, y’_2, …, y’_M

Evaluate the model by comparing the predicted labels against the test labels.
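Comparing predicted labels against test labels can be summarized as accuracy, the fraction of test items the model labels correctly. Accuracy is one common choice of metric, not one the slide prescribes; the threshold model and the test data below are made up for illustration.

```python
def accuracy(g, x_test, y_test):
    """Fraction of test items whose predicted label equals the test label."""
    predicted = [g(x) for x in x_test]
    correct = sum(p == y for p, y in zip(predicted, y_test))
    return correct / len(y_test)

# Toy model and test data (hypothetical).
g = lambda x: "+" if x > 0 else "-"
x_test = [-2, -1, 1, 3]
y_test = ["-", "+", "+", "+"]
print(accuracy(g, x_test, y_test))  # 3 of 4 predictions match: 0.75
```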

slide-25
SLIDE 25

The Badges game

slide-26
SLIDE 26

The Badges game

Attendees of the 1994 Machine Learning conference were given name badges labeled with + or −. What function was used to assign these labels?

+ Naoki Abe
− Eric Baum
slide-27
SLIDE 27

Training data

+ Naoki Abe
− Myriam Abramson
+ David W. Aha
+ Kamal M. Ali
− Eric Allender
+ Dana Angluin
− Chidanand Apte
+ Minoru Asada
+ Lars Asker
+ Javed Aslam
+ Jose L. Balcazar
− Cristina Baroglio
+ Peter Bartlett
− Eric Baum
+ Welton Becket
− Shai Ben-David
+ George Berg
+ Neil Berkman
+ Malini Bhandaru
+ Bir Bhanu
+ Reinhard Blasig
− Avrim Blum
− Anselm Blumer
+ Justin Boyan
+ Carla E. Brodley
+ Nader Bshouty
− Wray Buntine
− Andrey Burago
+ Tom Bylander
+ Bill Byrne
− Claire Cardie
+ John Case
+ Jason Catlett
− Philip Chan
− Zhixiang Chen
− Chris Darken
slide-28
SLIDE 28

Raw test data

Gerald F. DeJong
Chris Drummond
Yolanda Gil
Attilio Giordana
Jiarong Hong
J. R. Quinlan
Priscilla Rasmussen
Dan Roth
Yoram Singer
Lyle H. Ungar

slide-29
SLIDE 29

Labeled test data

+ Gerald F. DeJong
− Chris Drummond
+ Yolanda Gil
− Attilio Giordana
+ Jiarong Hong
− J. R. Quinlan
− Priscilla Rasmussen
+ Dan Roth
+ Yoram Singer
− Lyle H. Ungar
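One way to play the game is to write candidate functions and check them against the labeled names. The sketch below scores two candidate hypotheses; both are guesses introduced here for illustration, and the slides do not state which rule was actually used. The second candidate agrees with every label in this training subset and in the labeled test data as listed above.

```python
# Labels use ASCII "+" and "-" in place of the slide's "+" and "−".
TRAIN = [  # a subset of the training slide
    ("+", "Naoki Abe"), ("-", "Myriam Abramson"), ("+", "David W. Aha"),
    ("-", "Eric Allender"), ("-", "Shai Ben-David"), ("+", "Dana Angluin"),
]
TEST = [
    ("+", "Gerald F. DeJong"), ("-", "Chris Drummond"), ("+", "Yolanda Gil"),
    ("-", "Attilio Giordana"), ("+", "Jiarong Hong"), ("-", "J. R. Quinlan"),
    ("-", "Priscilla Rasmussen"), ("+", "Dan Roth"), ("+", "Yoram Singer"),
    ("-", "Lyle H. Ungar"),
]

def even_first_name(name):
    """Candidate 1: '+' if the first name has an even number of characters."""
    return "+" if len(name.split()[0]) % 2 == 0 else "-"

def second_letter_vowel(name):
    """Candidate 2: '+' if the second character of the name is a vowel."""
    return "+" if name[1].lower() in "aeiou" else "-"

def accuracy(g, labeled):
    """Fraction of names for which hypothesis g reproduces the badge label."""
    return sum(g(name) == label for label, name in labeled) / len(labeled)

for g in (even_first_name, second_letter_vowel):
    print(g.__name__, accuracy(g, TRAIN), accuracy(g, TEST))
```

A hypothesis that fits the training names still has to be checked on the held-out test names, which is the point of the train/test protocol from the previous slides.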
slide-30
SLIDE 30

How will we teach this class?

slide-31
SLIDE 31

Lectures

Tuesdays and Thursdays, 3:30 PM – 4:45 PM
Digital Computer Lab (Room 1320)

Slides will be on the website before class.
Lecture videos will be uploaded after class.

slide-32
SLIDE 32

Contacting the CS446 staff

Professor:

Julia Hockenmaier (juliahmr@illinois.edu)

Teaching assistants:

Arun Mallya (amallya2@illinois.edu)
Micah Hodosh (mhodosh2@illinois.edu)
Mingjie Qian (mqian2@illinois.edu)
Ryan Musa (ramusa2@illinois.edu)

Preferred email (to reach us all): cs446-staff@mx.uillinois.edu

slide-33
SLIDE 33

Office Hours (starting next week)

Julia Hockenmaier (3324 Siebel)

Thu, 5:00 PM – 6:00 PM

TAs on-campus (4407 Siebel)

Mon, 4:00 PM – 6:00 PM (Mingjie Qian)
Tue, 5:00 PM – 6:00 PM (Ryan Musa)
Wed, 3:00 PM – 5:00 PM (Arun Mallya)
Wed, 5:00 PM – 7:00 PM (Micah Hodosh)

TA for on-line students:

Tue, 6:00 PM – 7:00 PM (Ryan Musa)

slide-34
SLIDE 34

CS446 on the web

Check our class website:

Schedule, slides, videos, policies, anonymous feedback http://courses.engr.illinois.edu/cs446/

Sign up, participate in our Piazza forum:

Announcements and discussions https://piazza.com/illinois/fall2013/cs446/

Log on to Compass:

Submit assignments, get your grades https://compass2g.illinois.edu

slide-35
SLIDE 35

Assessment and Grading

If you take this class for 3 hours credit, your grade will be determined by your performance on
– Homework (33.3% of your grade)
– Midterm exam (33.3% of your grade)
– Final exam (33.3% of your grade)

slide-36
SLIDE 36

Assessment and Grading

If you take this class for 4 hours credit, your grade will be determined by your performance on
– Homework (25% of your grade)
– Midterm exam (25% of your grade)
– Final exam (25% of your grade)
– Research project (25% of your grade)
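The two grading schemes amount to a weighted average of the component scores. The sketch below assumes each component is scored on a 0 to 100 scale; that scale, the function name `course_score`, and the example scores are assumptions, and the slides do not say how the result maps to a letter grade.

```python
# Component weights per credit-hour option, as listed on the two slides.
WEIGHTS = {
    3: {"homework": 1 / 3, "midterm": 1 / 3, "final": 1 / 3},
    4: {"homework": 0.25, "midterm": 0.25, "final": 0.25, "project": 0.25},
}

def course_score(credit_hours, scores):
    """Weighted average of component scores (each assumed to be 0 to 100)."""
    weights = WEIGHTS[credit_hours]
    return sum(weights[part] * scores[part] for part in weights)

three = course_score(3, {"homework": 90, "midterm": 80, "final": 70})
four = course_score(4, {"homework": 90, "midterm": 80, "final": 70, "project": 100})
print(round(three, 1), round(four, 1))
```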

slide-37
SLIDE 37

Homework

There will be 6 assignments.
– We plan to release them on Thursdays in Weeks 2, 4, 6, 8, 10, and 12.
– Some, but not all, require programming (probably some Matlab, some Java, some with a language of your choice).
– You will have two weeks to complete them.

slide-38
SLIDE 38

Homework: Submission

You need to use Compass to submit your solutions (http://compass2g.illinois.edu).

We do not accept any handwritten solutions.
– Reports have to be submitted as PDFs, typeset in LaTeX (templates provided)

slide-39
SLIDE 39

Homework: Late Policy

Everybody is allowed a total of two late days for the semester.

If you have exhausted your contingent of late days, we will subtract 20% per late day.

We don’t accept assignments more than three days after their due date.

Let us know if there are any special circumstances (family, health, etc.)
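The late policy can be read as a small calculation. The sketch below is one possible interpretation and rests on assumptions the slide does not spell out: late days are whole days, free late days are used up automatically before any penalty, and the 20% is taken off the assignment's earned score for each penalized day. The function name `apply_late_policy` is hypothetical.

```python
def apply_late_policy(raw_score, days_late, free_days_left):
    """Return (adjusted_score, free_days_left) for one assignment."""
    if days_late > 3:
        # More than three days late: the assignment is not accepted.
        return 0.0, free_days_left
    used = min(days_late, free_days_left)   # spend free late days first
    penalized = days_late - used            # remaining days cost 20% each
    adjusted = raw_score * max(0.0, 1 - 0.20 * penalized)
    return adjusted, free_days_left - used

print(apply_late_policy(100, 1, 2))  # one free late day used, no penalty
print(apply_late_policy(90, 3, 1))   # one free day, two penalized days
print(apply_late_policy(80, 4, 2))   # too late to be accepted
```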

slide-40
SLIDE 40

Homework: Collaboration

We encourage collaboration and discussion, but you need to submit your own work.

If you are asked to write your own code, do so.

Piazza: Use it to discuss problems and give (reasonable) hints. But if you post complete solutions, you may fail the assignment.

slide-41
SLIDE 41

Homework: Plagiarism

We don’t tolerate plagiarism.
– Cite all external sources (including external code) you have used.
– We may compare your source code if we suspect plagiarism.
– Don’t reuse old solutions from previous years.

slide-42
SLIDE 42

Exams

Midterm exam: Thursday, Oct 10 in class
Final exam: Tuesday, Dec 10 in class

Let us know ASAP if you have a conflict on those days.

Closed-book exams:

No books/cheat sheets/calculators/computers/phones

Previous exams will be posted to the web.

slide-43
SLIDE 43

4th Credit Hour Projects

Perform an experimental research project that uses machine learning.

We encourage you to work in pairs. (We don’t allow larger groups.)

Write a paper that describes your task, relevant background, and experiments.

slide-44
SLIDE 44

4th Credit Hour Projects

Milestone 1 (Week 5)

Have a partner, agree on a task, submit a proposal

Milestone 2 (Week 9)

Submit preliminary results and task description (including relevant background)

Milestone 3 (Week 13)

Submit more fleshed-out results and report

Milestone 4 (After the final exam)

Submit the complete report, give a brief oral presentation

slide-45
SLIDE 45

Questions?

cs446-staff@mx.uillinois.edu

http://courses.engr.illinois.edu/cs446/