Learning From Data Lecture 1 The Learning Problem Introduction


Dec 16, 2022


SLIDE 1

Learning From Data Lecture 1 The Learning Problem

Introduction
Motivation
Credit Default - A Running Example
Summary of the Learning Problem

M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

Resources

1. Web Page: www.cs.rpi.edu/~magdon/courses/learn.php

– course info: www.cs.rpi.edu/~magdon/courses/learn/info.pdf
– slides: www.cs.rpi.edu/~magdon/courses/learn/slides.html
– assignments: www.cs.rpi.edu/~magdon/courses/learn/assign.html

2. Text Book: Learning From Data, by Abu-Mostafa, Magdon-Ismail, Lin

3. Book Forum: book.caltech.edu/bookforum

– discussion about any material in the book, including problems and exercises.
– additional material

4. TA.
5. Professor.
6. Prerequisites? assignment #0

© AML Creator: Malik Magdon-Ismail

The Learning Problem: 2 /16


SLIDE 3

The Storyline

1. What is Learning?
2. Can We do it?
3. How to do it?
4. How to do it well?
5. General principles?
6. Advanced techniques.
7. Other Learning Paradigms.

concepts theory practice

Our language will be mathematics . . .

. . . our sword will be computer algorithms


SLIDE 4

[Image-only slide: the applications.]

SLIDE 5

Let’s Define a Tree?


SLIDE 6

Let’s Define a Tree?

A brown trunk moving upwards and branching with leaves . . .


SLIDE 7

Are These Trees?


SLIDE 8

Learning “What are Trees” is ‘Easy’


SLIDE 9

Defining is Hard; Recognizing is Easy

Hard to give a complete mathematical definition of a tree.
Even a 3-year-old can tell a tree from a non-tree.
The 3-year-old has learned from data.

(Other tasks like graphics or GAN?)


SLIDE 10

Learning to Rate Movies

Can we predict how a viewer would rate a movie?
Why? So that Netflix can make better movie recommendations, and get more rentals.
$1 million prize for a mere 10% improvement in their recommendation system.


SLIDE 11

Previous Ratings Reflect Future Ratings

[Figure: a viewer's tastes (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) are matched against a movie's content (comedy content, action content, blockbuster? Tom Cruise in it?). Match corresponding factors, then add their contributions to get the predicted rating.]

Viewer taste & movie content imply viewer rating.
No magical formula to predict viewer rating.
Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.

Class Motto: A pattern exists. We don’t know it. We have data to learn it.
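The "match corresponding factors, then add their contributions" idea can be sketched in a few lines. This is only an illustration: the factor names and all numeric values below are hypothetical and set by hand, whereas the point of the slide is that such values would be learned from rating data.

```python
# Hypothetical factor names, invented for illustration only.
FACTORS = ["comedy", "action", "blockbuster", "tom_cruise"]


def predicted_rating(viewer, movie):
    """Match corresponding factors, then add their contributions."""
    return sum(viewer[name] * movie[name] for name in FACTORS)


# Hand-set example values (assumptions, not learned from any data).
viewer = {"comedy": 1.0, "action": -0.5, "blockbuster": 0.5, "tom_cruise": 1.0}
movie = {"comedy": 1.0, "action": 0.0, "blockbuster": 1.0, "tom_cruise": 1.0}

score = predicted_rating(viewer, movie)
print(score)
```

A larger score means the viewer's tastes line up better with the movie's content. How the score maps to a star rating is left open here.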


SLIDE 12

Credit Approval

Let’s use a conceptual example to crystallize the issues.

age             32 years
gender          male
salary          40,000
debt            26,000
years in job    1 year
years at home   3 years
. . .           . . .

Approve for credit?


SLIDE 13

Credit Approval

Let’s use a conceptual example to crystallize the issues.

Using salary, debt, years in residence, etc., approve for credit or not.
No magic credit approval formula.
Banks have lots of data.

– customer information: salary, debt, etc.
– whether or not they defaulted on their credit.

age             32 years
gender          male
salary          40,000
debt            26,000
years in job    1 year
years at home   3 years
. . .           . . .

Approve for credit?

A pattern exists. We don’t know it. We have data to learn it.


SLIDE 14

The Key Players

Salary, debt, years in residence, . . .
    input x ∈ ℝ^d = X.

Approve credit or not
    output y ∈ {−1, +1} = Y.

True relationship between x and y
    target function f : X → Y.
    (The target f is unknown.)

Data on customers
    data set D = (x_1, y_1), . . . , (x_N, y_N).
    (y_n = f(x_n).)

X, Y and D are given by the learning problem; the target f is fixed but unknown.

We learn the function f from the data D.
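To make the key players concrete, here is a minimal sketch of x, y, f and D in code. One caveat dominates: in a real problem f is unknown, so the rule inside `f` below is an invented stand-in used only to generate example labels. The feature ranges and the helper name `make_dataset` are also assumptions.

```python
import random


def f(x):
    """Stand-in target function f : X -> Y.

    In a real problem f is unknown. This rule is invented purely so
    that the example data set has labels.
    """
    salary, debt, years_in_residence = x
    return +1 if salary - 1.5 * debt + 2000 * years_in_residence > 0 else -1


def make_dataset(n, seed=0):
    """Build D = (x_1, y_1), ..., (x_N, y_N) with y_n = f(x_n)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Input x in R^d with d = 3: salary, debt, years in residence.
        x = (
            rng.uniform(20_000, 100_000),
            rng.uniform(0, 60_000),
            rng.randint(0, 20),
        )
        data.append((x, f(x)))
    return data


D = make_dataset(5)
for x, y in D:
    print(x, y)
```

The learner only ever sees `D`. It never gets to call `f` directly, which is what "fixed but unknown" means in practice.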


SLIDE 15

Learning

Start with a set of candidate hypotheses H which you think are likely to represent f.

H = {h_1, h_2, . . .} is called the hypothesis set or model.

Select a hypothesis g from H. The way we do this is called a learning algorithm.
Use g for new customers. We hope g ≈ f.

X, Y and D are given by the learning problem; the target f is fixed but unknown. We choose H and the learning algorithm.

This is a very general setup (e.g., choose H to be all possible hypotheses).
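As a rough sketch of this setup, the code below builds a small finite H and a learning algorithm that picks g from it. Several things here are assumptions rather than anything the lecture prescribes: the toy data and its labels, the choice of threshold rules on salary minus debt as hypotheses, and the selection rule (fewest mistakes on D, ties broken by the smallest threshold).

```python
# Hypothetical data: x = (salary, debt), y = +1 (approve) or -1 (deny).
# The labels are invented for illustration.
D = [
    ((40_000, 26_000), -1),
    ((85_000, 10_000), +1),
    ((30_000, 5_000), +1),
    ((50_000, 45_000), -1),
    ((70_000, 20_000), +1),
    ((25_000, 20_000), -1),
]


def make_hypothesis(threshold):
    """Return h with h(x) = +1 if salary - debt > threshold, else -1."""
    def h(x):
        salary, debt = x
        return +1 if salary - debt > threshold else -1
    return h


# Hypothesis set H: a finite family of threshold rules, keyed by threshold.
H = {t: make_hypothesis(t) for t in range(0, 60_001, 5_000)}


def in_sample_error(h, data):
    """Fraction of examples in data that h gets wrong."""
    return sum(h(x) != y for x, y in data) / len(data)


def learning_algorithm(H, data):
    """Select the hypothesis with the fewest mistakes on data.

    Ties are broken in favour of the smallest threshold.
    """
    best_t = min(H, key=lambda t: (in_sample_error(H[t], data), t))
    return best_t, H[best_t]


best_t, g = learning_algorithm(H, D)
print(best_t, in_sample_error(g, D))

# Use g for a new customer.
print(g((60_000, 15_000)))
```

Agreeing with D is not the same as g ≈ f: the selected g fits the six examples, but nothing in this sketch says how it behaves on customers outside D.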


SLIDE 16

Summary of the Learning Setup

UNKNOWN TARGET FUNCTION f : X → Y
    (ideal credit approval formula)

TRAINING EXAMPLES (x_1, y_1), (x_2, y_2), . . . , (x_N, y_N), with y_n = f(x_n)
    (historical records of credit customers)

HYPOTHESIS SET H
    (set of candidate formulas)

LEARNING ALGORITHM A

FINAL HYPOTHESIS g ≈ f
    (learned credit approval formula)
