Learning From Data Lecture 1 The Learning Problem

Introduction

Motivation

Credit Default - A Running Example

Summary of the Learning Problem

  • M. Magdon-Ismail

CSCI 4100/6100

Resources

  • 1. Web Page: www.cs.rpi.edu/~magdon/courses/learn.php

– course info: www.cs.rpi.edu/~magdon/courses/learn/info.pdf
– slides: www.cs.rpi.edu/~magdon/courses/learn/slides.html
– assignments: www.cs.rpi.edu/~magdon/courses/learn/assign.html

  • 2. Text Book:

Learning From Data

Abu-Mostafa, Magdon-Ismail, Lin

  • 3. Book Forum: book.caltech.edu/bookforum

– discussion about any material in the book, including problems and exercises.
– additional material.

  • 4. TA.
  • 5. Professor.
  • 6. Prerequisites? assignment #0

© AML Creator: Malik Magdon-Ismail

The Storyline

  • 1. What is Learning?
  • 2. Can We do it?
  • 3. How to do it?
  • 4. How to do it well?
  • 5. General principles?
  • 6. Advanced techniques.
  • 7. Other Learning Paradigms.

concepts theory practice

our language will be mathematics . . .

. . . our sword will be computer algorithms

Let’s Define a Tree?

A brown trunk moving upwards and branching with leaves . . .

Are These Trees?

Learning “What are Trees” is ‘Easy’

Defining is Hard; Recognizing is Easy

It is hard to give a complete mathematical definition of a tree.

Even a 3-year-old can tell a tree from a non-tree.

The 3-year-old has learned from data.

(Other tasks like graphics or GAN?)

Learning to Rate Movies

  • Can we predict how a viewer would rate a movie?
  • Why? So that Netflix can make better movie recommendations, and get more rentals.
  • $1 million prize for a mere 10% improvement in their recommendation system.

Previous Ratings Reflect Future Ratings

[Figure: a viewer and a movie are each described by a set of factors.
viewer: likes comedy? likes action? prefers blockbusters? likes Tom Cruise?
movie: comedy content, action content, blockbuster? Tom Cruise in it?
Match corresponding factors, then add their contributions, to get the predicted rating.]

  • Viewer taste & movie content imply viewer rating.
  • No magical formula to predict viewer rating.
  • Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.

Class Motto: A pattern exists. We don’t know it. We have data to learn it.
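The "match corresponding factors, then add their contributions" idea can be sketched as a sum of products. This is only an illustration: the factor names and every number below are invented, and in practice the factors would be learned from the rating data rather than written by hand.

```python
# Hypothetical factor values, invented for illustration only.
# Viewer entries say how much the viewer likes each factor;
# movie entries say how much of each factor the movie contains.
viewer = {"comedy": 0.9, "action": -0.2, "blockbuster": 0.5, "tom_cruise": 0.1}
movie = {"comedy": 0.8, "action": 0.1, "blockbuster": 1.0, "tom_cruise": 0.0}


def predicted_rating(viewer_factors, movie_factors):
    """Match corresponding factors, then add their contributions."""
    return sum(viewer_factors[name] * movie_factors[name] for name in viewer_factors)


score = predicted_rating(viewer, movie)
print(score)
```

A larger score means the viewer's tastes line up with the movie's content. Mapping the score onto an actual rating scale is left out of this sketch.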

Credit Approval

Let’s use a conceptual example to crystallize the issues.

age              32 years
gender           male
salary           40,000
debt             26,000
years in job     1 year
years at home    3 years
. . .            . . .

Approve for credit?

  • Using salary, debt, years in residence, etc., approve for credit or not.
  • No magic credit approval formula.
  • Banks have lots of data.

– customer information: salary, debt, etc.
– whether or not they defaulted on their credit.

A pattern exists. We don’t know it. We have data to learn it.

The Key Players

  • Salary, debt, years in residence, . . .

input $x \in \mathbb{R}^d = \mathcal{X}$.

  • Approve credit or not

output $y \in \{-1, +1\} = \mathcal{Y}$.

  • True relationship between $x$ and $y$

target function $f : \mathcal{X} \to \mathcal{Y}$.

(The target $f$ is unknown.)

  • Data on customers

data set $\mathcal{D} = (x_1, y_1), \ldots, (x_N, y_N)$.

($y_n = f(x_n)$.)

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.

We learn the function f from the data D.
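As a rough sketch of these objects in code, each input can be stored as a tuple of d numbers and each output as -1 or +1. The customer values, the choice d = 3, and the convention that +1 marks a good credit outcome are all assumptions made for illustration.

```python
# Each input x is a point in R^d; here d = 3 (salary, debt, years in residence).
# All values are invented for illustration.
inputs = [
    (40000.0, 26000.0, 3.0),
    (85000.0, 5000.0, 10.0),
    (22000.0, 30000.0, 1.0),
]

# Each output y is in {-1, +1}. Assumed convention: +1 = good credit outcome.
outputs = [-1, +1, -1]

# The data set D is the list of (x_n, y_n) pairs, n = 1..N.
D = list(zip(inputs, outputs))

N = len(D)        # number of examples
d = len(D[0][0])  # dimension of the input space
print(N, d)
```

The target function itself never appears in this code. Only its values on the N recorded customers are available, which is the sense in which the target is fixed but unknown.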

Learning

  • Start with a set of candidate hypotheses H which you think are likely to represent f.

$\mathcal{H} = \{h_1, h_2, \ldots\}$ is called the hypothesis set or model.

  • Select a hypothesis g from H. The way we do this is called a learning algorithm.
  • Use g for new customers. We hope g ≈ f.

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown. We choose $\mathcal{H}$ and the learning algorithm.

This is a very general setup (e.g., choose H to be all possible hypotheses).
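The steps above can be sketched with a small finite hypothesis set. Everything specific here is an assumption for illustration: the data values, the family of threshold rules on salary minus debt, and the selection rule (pick the hypothesis with the fewest mistakes on the data), which is only one simple choice of learning algorithm and not one prescribed by the slides.

```python
# Invented training data: ((salary, debt), y), with y in {-1, +1}.
D = [
    ((40000.0, 26000.0), -1),
    ((85000.0, 5000.0), +1),
    ((22000.0, 30000.0), -1),
    ((60000.0, 10000.0), +1),
]


def make_threshold_rule(threshold):
    """Build a hypothesis h: approve (+1) if salary - debt exceeds the threshold."""
    def h(x):
        salary, debt = x
        return +1 if salary - debt > threshold else -1
    return h


# Hypothesis set H: a few hand-picked candidate rules (hypothetical thresholds).
thresholds = [0.0, 20000.0, 40000.0, 60000.0]
H = [make_threshold_rule(t) for t in thresholds]


def training_errors(h, data):
    """Count the examples on which h disagrees with the recorded output."""
    return sum(1 for x, y in data if h(x) != y)


def learn(hypotheses, data):
    """A very simple learning algorithm: return the hypothesis with fewest mistakes."""
    return min(hypotheses, key=lambda h: training_errors(h, data))


g = learn(H, D)               # final hypothesis
print(training_errors(g, D))  # mistakes of g on the training data
print(g((30000.0, 5000.0)))   # decision for a new customer
```

Agreement between g and the data does not by itself guarantee that g is close to f on new customers; that is the "Can we do it?" question in the storyline.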

Summary of the Learning Setup

UNKNOWN TARGET FUNCTION $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)

TRAINING EXAMPLES $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, with $y_n = f(x_n)$ (historical records of credit customers)

HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)

FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)

LEARNING ALGORITHM $\mathcal{A}$
