Learning From Data, Lecture 1: The Learning Problem




SLIDE 1

Learning From Data
Lecture 1: The Learning Problem

  • Introduction
  • Motivation
  • Credit Default: A Running Example
  • Summary of the Learning Problem

M. Magdon-Ismail
CSCI 4100/6100

SLIDE 2

Resources

  • 1. Web Page: www.cs.rpi.edu/~magdon/courses/learn.php
      – course info: www.cs.rpi.edu/~magdon/courses/learn/info.pdf
      – slides: www.cs.rpi.edu/~magdon/courses/learn/slides.html
      – assignments: www.cs.rpi.edu/~magdon/courses/learn/assign.html
  • 2. Text Book: Learning From Data (Abu-Mostafa, Magdon-Ismail, Lin)
  • 3. Book Forum: book.caltech.edu/bookforum
      – discussion about any material in the book, including problems and exercises.
      – additional material
  • 4. TA.
  • 5. Professor.
  • 6. Prerequisites? assignment #0

© AML Creator: Malik Magdon-Ismail

SLIDE 3

The Storyline

  • 1. What is Learning?
  • 2. Can We do it?
  • 3. How to do it?
  • 4. How to do it well?
  • 5. General principles?
  • 6. Advanced techniques.
  • 7. Other Learning Paradigms.

concepts theory practice

Our language will be mathematics . . .
. . . our sword will be computer algorithms


SLIDE 4

(Image-only slide: the applications.)

SLIDE 5

Let’s Define a Tree?

SLIDE 6

Let’s Define a Tree?

A brown trunk moving upwards and branching with leaves . . .

SLIDE 7

Are These Trees?

SLIDE 8

Learning “What are Trees” is ‘Easy’

SLIDE 9

Defining is Hard; Recognizing is Easy

It is hard to give a complete mathematical definition of a tree.
Even a 3-year-old can tell a tree from a non-tree.
The 3-year-old has learned from data.

(Other tasks like graphics or GAN?)

SLIDE 10

Learning to Rate Movies

  • Can we predict how a viewer would rate a movie?
  • Why? So that Netflix can make better movie recommendations, and get more rentals.
  • $1 million prize for a mere 10% improvement in their recommendation system.

SLIDE 11

Previous Ratings Reflect Future Ratings

[Figure: viewer factors (likes Tom Cruise? likes comedy? likes action? prefers blockbusters?) are paired with movie factors (Tom Cruise in it? comedy content, action content, blockbuster?). Match corresponding factors, then add their contributions to get the predicted rating.]

  • Viewer taste & movie content imply viewer rating.
  • No magical formula to predict viewer rating.
  • Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.

Class Motto: A pattern exists. We don’t know it. We have data to learn it.
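The idea of matching corresponding viewer and movie factors and adding their contributions can be sketched in a few lines of Python. This is only an illustration: the factor names follow the slide's figure, but the numeric values are made up, and modeling each contribution as a product of the two matched factors is an assumption rather than something the slide states.

```python
# Hypothetical factors and values, chosen only for illustration.
FACTORS = ["comedy", "action", "blockbuster", "Tom Cruise"]


def predicted_rating(viewer, movie):
    """Match corresponding factors, then add their contributions.

    Assumption: each contribution is the product of the viewer's
    preference and the movie's content for that factor.
    """
    return sum(viewer[name] * movie[name] for name in FACTORS)


# How much this (made-up) viewer likes each factor.
viewer = {"comedy": 1.0, "action": -0.5, "blockbuster": 0.5, "Tom Cruise": 2.0}
# How much of each factor this (made-up) movie contains.
movie = {"comedy": 0.0, "action": 1.0, "blockbuster": 1.0, "Tom Cruise": 1.0}

rating = predicted_rating(viewer, movie)
print(rating)
```

In the learning setting, neither the viewer preferences nor the movie contents are given; the point of the slide is that both would have to be learned from the rating data.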

SLIDE 12

Credit Approval

Let’s use a conceptual example to crystallize the issues.

age             32 years
gender          male
salary          40,000
debt            26,000
years in job    1 year
years at home   3 years
. . .           . . .

Approve for credit?

SLIDE 13

Credit Approval

Let’s use a conceptual example to crystallize the issues.

  • Using salary, debt, years in residence, etc., approve for credit or not.
  • No magic credit approval formula.
  • Banks have lots of data.

      – customer information: salary, debt, etc.
      – whether or not they defaulted on their credit.

age             32 years
gender          male
salary          40,000
debt            26,000
years in job    1 year
years at home   3 years
. . .           . . .

Approve for credit?

A pattern exists. We don’t know it. We have data to learn it.

SLIDE 14

The Key Players

  • Salary, debt, years in residence, . . .
      input $\mathbf{x} \in \mathbb{R}^d = \mathcal{X}$.
  • Approve credit or not
      output $y \in \{-1, +1\} = \mathcal{Y}$.
  • True relationship between $\mathbf{x}$ and $y$
      target function $f : \mathcal{X} \to \mathcal{Y}$. (The target $f$ is unknown.)
  • Data on customers
      data set $\mathcal{D} = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$. ($y_n = f(\mathbf{x}_n)$.)

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown.

We learn the function $f$ from the data $\mathcal{D}$.
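The key players can be written down as Python types to make their roles concrete. This is a minimal sketch under stated assumptions: the names `Input`, `Label`, `make_dataset` and `toy_target` are hypothetical, and `toy_target` is a made-up stand-in used only to generate labels. In a real learning problem the target is unknown and only the data set is available.

```python
from typing import Callable, List, Tuple

Input = Tuple[float, ...]  # x in R^d, e.g. (salary, debt)
Label = int                # y in {-1, +1}: deny or approve


def make_dataset(f: Callable[[Input], Label],
                 xs: List[Input]) -> List[Tuple[Input, Label]]:
    """Build D = (x_1, y_1), ..., (x_N, y_N) with y_n = f(x_n)."""
    return [(x, f(x)) for x in xs]


def toy_target(x: Input) -> Label:
    """Made-up stand-in for the unknown target f (illustration only)."""
    salary, debt = x
    return +1 if salary > 2 * debt else -1


# Hypothetical customers described by (salary, debt).
xs = [(40_000.0, 26_000.0), (90_000.0, 10_000.0), (30_000.0, 5_000.0)]
D = make_dataset(toy_target, xs)

for x, y in D:
    print(x, y)
```

A learner would only ever see `D`, never `toy_target` itself.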

SLIDE 15

Learning

  • Start with a set of candidate hypotheses $\mathcal{H}$ which you think are likely to represent $f$.
      $\mathcal{H} = \{h_1, h_2, \ldots\}$ is called the hypothesis set or model.
  • Select a hypothesis $g$ from $\mathcal{H}$. The way we do this is called a learning algorithm.
  • Use $g$ for new customers. We hope $g \approx f$.

$\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{D}$ are given by the learning problem; the target $f$ is fixed but unknown. We choose $\mathcal{H}$ and the learning algorithm.

This is a very general setup (e.g., choose $\mathcal{H}$ to be all possible hypotheses).
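The three steps above (choose a hypothesis set, select a hypothesis, use it on new inputs) might look like the following Python sketch. It rests on several assumptions not in the slide: the hypothesis set is a small family of threshold rules on a single number, the data are made up, and the learning algorithm shown (pick the hypothesis with the fewest errors on the data) is just one simple possibility.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (x, y) with scalar x and y in {-1, +1}
Hypothesis = Callable[[float], int]


def make_threshold(t: float) -> Hypothesis:
    """Hypothesis h_t: approve (+1) when x >= t, otherwise deny (-1)."""
    return lambda x: +1 if x >= t else -1


def training_errors(h: Hypothesis, D: List[Example]) -> int:
    """Count the examples in D that h labels incorrectly."""
    return sum(1 for x, y in D if h(x) != y)


def learn(H: List[Hypothesis], D: List[Example]) -> Hypothesis:
    """One possible learning algorithm: fewest errors on D wins."""
    return min(H, key=lambda h: training_errors(h, D))


# Made-up data: x could be, say, a single credit score.
D = [(1.0, -1), (2.0, -1), (3.0, +1), (4.0, +1)]

# Hypothesis set H = {h_1, h_2, ...}: a few candidate thresholds.
H = [make_threshold(t) for t in (0.5, 1.5, 2.5, 3.5)]

g = learn(H, D)          # final hypothesis
print(g(2.9), g(2.0))    # use g on new inputs
```

Agreeing with the data does not by itself guarantee that `g` is close to the target on new customers; the slide only says that we hope so.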

SLIDE 16

Summary of the Learning Setup

  • UNKNOWN TARGET FUNCTION $f : \mathcal{X} \to \mathcal{Y}$ (ideal credit approval formula)
  • TRAINING EXAMPLES $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$, with $y_n = f(\mathbf{x}_n)$ (historical records of credit customers)
  • HYPOTHESIS SET $\mathcal{H}$ (set of candidate formulas)
  • LEARNING ALGORITHM $\mathcal{A}$
  • FINAL HYPOTHESIS $g \approx f$ (learned credit approval formula)
