SLIDE 1

Foundations of Machine Learning and Data Science

Maria-Florina (Nina) Balcan Lecture 1, September 9, 2015

SLIDE 2

Course Staff

Instructors:

  • Nina Balcan http://www.cs.cmu.edu/~ninamf
  • Avrim Blum http://www.cs.cmu.edu/~avrim

TAs:

  • Nika Haghtalab http://www.cs.cmu.edu/~nhaghtal
  • Sarah Allen http://www.cs.cmu.edu/~srallen

SLIDE 3

Lectures in general

On the board; occasionally, slides will be used.

SLIDE 4

Machine Learning

  • Image Classification
  • Document Categorization
  • Speech Recognition
  • Branch Prediction
  • Protein Classification
  • Spam Detection
  • Fraud Detection
  • Playing Games
  • Computational Advertising

SLIDE 5

Machine Learning is Changing the World

“A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Microsoft)

“Machine learning is the hot new thing” (John Hennessy, President, Stanford)

“Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, VP Engineering at Google)

SLIDE 6

The COOLEST TOPIC IN SCIENCE

  • “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
  • “Machine learning is the next Internet” (Tony Tether, Director, DARPA)
  • “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
  • “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
  • “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
  • “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)

SLIDE 7

This course: foundations of Machine Learning and Data Science

SLIDE 8
Goals of Machine Learning Theory

Develop and analyze models to understand:

  • what kinds of tasks we can hope to learn, and from what kind of data
  • what types of guarantees might we hope to achieve
  • prove guarantees for practically successful algs (when will they succeed, how long will they take?)
  • develop new algs that provably meet desired criteria (potentially within new learning paradigms)

Interesting connections to other areas including:

  • Algorithms
  • Optimization
  • Probability & Statistics
  • Game Theory
  • Information Theory
  • Complexity Theory

SLIDE 9

Example: Supervised Classification

Supervised classification: decide which emails are spam and which are important.

Goal: use emails seen so far to produce a good prediction rule for future data.

Labels: spam / not spam.

SLIDE 10

Example: Supervised Classification

Represent each message by features (e.g., keywords, spelling, etc.); each example comes with a label.

Reasonable RULES:

  • Predict SPAM if unknown AND (money OR pills)
  • Predict SPAM if 2money + 3pills – 5known > 0

[Figure: labeled “+”/“–” examples that are linearly separable]
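The second rule above is a linear classifier over keyword features. A minimal sketch (the feature names come from the slide; the message counts below are illustrative, not from the deck):

```python
# Linear spam rule from the slide: predict SPAM if 2*money + 3*pills - 5*known > 0.

def predict_spam(features):
    """Return True (spam) if the weighted feature sum crosses the threshold 0."""
    weights = {"money": 2.0, "pills": 3.0, "known": -5.0}
    score = sum(w * features.get(name, 0) for name, w in weights.items())
    return score > 0

# A message mentioning "money" twice from an unknown sender:
print(predict_spam({"money": 2, "pills": 0, "known": 0}))  # True
# The same message from a known sender:
print(predict_spam({"money": 2, "pills": 0, "known": 1}))  # False
```

Learning then amounts to choosing the weights from the labeled emails rather than fixing them by hand.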
SLIDE 11

Two Main Aspects of Supervised Learning

Algorithm Design. How to optimize? Automatically generate rules that do well on observed data.

Confidence Bounds, Generalization Guarantees, Sample Complexity. Confidence for rule effectiveness on future data.

Well understood for passive supervised learning.
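A classical guarantee of this kind (not stated on the slide; the standard realizable-case PAC bound for a finite hypothesis class H) quantifies how many examples suffice:

```latex
% With probability at least 1-\delta, any hypothesis in H that is consistent
% with m i.i.d. labeled examples has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).
```

The course's VC-dimension and Rademacher-complexity results extend this idea to infinite hypothesis classes.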

SLIDE 12

Using Unlabeled Data and Interaction for Learning

Application Areas: Computer Vision, Search/Information Retrieval, Computational Biology, Spam Detection, Medical Diagnosis, Robotics.

SLIDE 13

Massive Amounts of Raw Data

Billions of webpages, images, protein sequences. Only a tiny fraction can be annotated by human experts.

SLIDE 14

Semi-Supervised Learning

[Figure: raw data → expert labeler → labeled data (face / not face) → classifier]
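One common semi-supervised scheme (self-training; a sketch, not necessarily the algorithm the course covers) fits a classifier on the few labeled examples, then absorbs its own confident predictions on unlabeled data as pseudo-labels. The 1-D threshold "classifier" and the data below are purely illustrative:

```python
# Self-training sketch: fit on labeled data, pseudo-label confident
# unlabeled points, then refit on the enlarged labeled set.

def fit_threshold(points, labels):
    """Toy classifier: midpoint between the two class means."""
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def self_train(labeled, unlabeled, margin=1.0):
    points, labels = list(labeled[0]), list(labeled[1])
    threshold = fit_threshold(points, labels)
    for x in unlabeled:
        if abs(x - threshold) >= margin:          # confident prediction only
            points.append(x)
            labels.append(1 if x > threshold else 0)
    return fit_threshold(points, labels)          # refit with pseudo-labels

threshold = self_train(([0.0, 4.0], [0, 1]), [0.5, 3.5, 2.1])
```

The point near the boundary (2.1) is left out; the two confident points are absorbed, sharpening the estimate of each class's region.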

SLIDE 15

Active Learning

[Figure: raw data → classifier selects informative examples → expert labeler provides labels (face / not face)]
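The most common selection rule is uncertainty sampling: query the expert on the unlabeled point the current classifier is least sure about. A sketch with the same toy 1-D threshold classifier (all names and values illustrative):

```python
# Pool-based active learning via uncertainty sampling: the most informative
# query is the unlabeled point closest to the current decision boundary.

def most_uncertain(pool, threshold):
    """Index of the unlabeled point nearest the decision boundary."""
    return min(range(len(pool)), key=lambda i: abs(pool[i] - threshold))

pool = [0.2, 1.9, 3.8, 2.4]          # unlabeled raw data
threshold = 2.0                       # current classifier's boundary
query = most_uncertain(pool, threshold)
print(pool[query])                    # 1.9 -- sent to the expert labeler
```

Points far from the boundary (0.2, 3.8) would be labeled confidently anyway; querying near the boundary is where a label changes the learned rule most.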

SLIDE 16

Other Protocols for Supervised Learning

  • Semi-Supervised Learning: using cheap unlabeled data in addition to labeled data.
  • Active Learning: the algorithm interactively asks for labels of informative examples.

Theoretical understanding was entirely lacking 10 years ago; lots of progress recently. We will cover some of these.

SLIDE 17

Distributed Learning

Many ML problems today involve massive amounts of data distributed across multiple locations. Often we would like a low-error hypothesis with respect to the overall distribution.

SLIDE 18

Distributed Learning

Data distributed across multiple locations (e.g., medical data).

SLIDE 19

Distributed Learning

Data distributed across multiple locations (e.g., scientific data).

SLIDE 20

Distributed Learning

  • Data distributed across multiple locations.
  • Each has a piece of the overall data pie.
  • To learn over the combined distribution D, must communicate.

Important question: how much communication? Plus, privacy & incentives.
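A simple low-communication baseline (one-shot parameter averaging; a sketch, not a method the slides specify) has each site fit a local model and send only its parameters, one round of communication instead of shipping raw data. Here the "model" is just a mean estimator:

```python
# One-shot averaging: each site communicates only its local parameters
# (and sample count), never its raw data.

def local_model(data):
    """Local 'model': the sample mean at one site."""
    return sum(data) / len(data)

def averaged_model(sites):
    """Average per-site parameters, weighted by local sample counts."""
    total = sum(len(s) for s in sites)
    return sum(local_model(s) * len(s) for s in sites) / total

sites = [[1.0, 3.0], [5.0, 7.0, 9.0]]     # data split across two locations
print(averaged_model(sites))               # 5.0, same as pooling all the data
```

For a mean estimator the weighted average exactly matches the centralized answer; for general learners it is only an approximation, which is where the communication-complexity questions on this slide come in.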
SLIDE 21

The World is Changing Machine Learning

Many competing resources & constraints. E.g.,

  • Computational efficiency (noise-tolerant algos)
  • Statistical efficiency
  • Communication
  • Human labeling effort
  • Privacy/Incentives

New approaches. E.g.,

  • Semi-supervised learning
  • Distributed learning
  • Interactive learning
  • Multi-task/transfer learning
  • Never ending learning
  • Deep Learning
SLIDE 22

Structure of the Class

Basic Learning Paradigm: Passive Supervised Learning

  • Simple algos and hardness results for supervised learning.
  • Basic models: PAC, SLT.
  • Standard Sample Complexity Results (VC dimension)
  • Weak-learning vs. Strong-learning
  • Classic, state-of-the-art algorithms: AdaBoost and SVM (kernel-based methods).
  • Modern Sample Complexity Results
  • Rademacher Complexity; localization
  • Margin analysis of Boosting and SVM
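AdaBoost, one of the algorithms listed above, turns a weak learner into a strong one by reweighting the data each round. A self-contained sketch on 1-D data with threshold "decision stumps" as the weak learners (the data and round count are illustrative):

```python
import math

# AdaBoost sketch: each round, fit the best weighted stump, upweight its
# mistakes, and add it to a weighted vote.

def best_stump(xs, ys, w):
    """Weak learner: (error, threshold, sign) stump minimizing weighted error."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (s if xi > t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, w)
        err = max(err, 1e-10)                    # guard against zero error
        alpha = 0.5 * math.log((1 - err) / err)  # stump's vote weight
        ensemble.append((alpha, t, s))
        # Reweight: mistakes go up, correct examples go down; renormalize.
        w = [wi * math.exp(-alpha * yi * (s if xi > t else -s))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]

    def predict(x):
        vote = sum(a * (s if x > t else -s) for a, t, s in ensemble)
        return 1 if vote > 0 else -1
    return predict

predict = adaboost([0.0, 1.0, 2.0, 3.0], [-1, -1, 1, 1])
```

The margin analysis mentioned on the slide explains why the weighted vote keeps improving on test data even after training error hits zero.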
SLIDE 23

Structure of the Class

  • Incorporating Unlabeled Data in the Learning Process.
  • Incorporating Interaction in the Learning Process:
  • Active Learning
  • More general types of Interaction

Other Learning Paradigms

  • Distributed Learning.
  • Transfer learning/Multi-task learning/Life-long learning.
  • Deep Learning.
  • Foundations and algorithms for constraints/externalities, e.g., privacy, limited memory, and communication.

SLIDE 24

Structure of the Class

  • Online Learning, Optimization, and Game Theory; connections to Boosting.

Other Topics:

  • Methods for summarizing and making sense of massive datasets, including:
  • unsupervised learning.
  • spectral, combinatorial techniques.
  • streaming algorithms.
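A canonical online-learning algorithm with game-theoretic flavor is weighted majority (multiplicative weights): predict with a weighted vote of experts and halve the weight of each expert that errs, so the learner's mistakes stay within a logarithmic factor of the best expert's. A sketch (the experts and outcomes below are illustrative):

```python
# Weighted-majority sketch: online binary prediction with expert advice.
# Every expert that is wrong in a round has its weight multiplied by `penalty`.

def weighted_majority(expert_advice, outcomes, penalty=0.5):
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0
    for advice, outcome in zip(expert_advice, outcomes):
        yes = sum(w for w, a in zip(weights, advice) if a == 1)
        no = sum(w for w, a in zip(weights, advice) if a == 0)
        prediction = 1 if yes >= no else 0
        if prediction != outcome:
            mistakes += 1
        weights = [w * (penalty if a != outcome else 1.0)
                   for w, a in zip(weights, advice)]
    return mistakes, weights

# Three experts; the third is always right, so its weight stays at 1.0.
advice = [(1, 0, 1), (0, 1, 0), (1, 1, 0)]
outcomes = [1, 0, 0]
mistakes, weights = weighted_majority(advice, outcomes)
```

After a few rounds the reliable expert dominates the vote; the connection to Boosting mentioned above is that AdaBoost's example reweighting is the same multiplicative-update idea run in the other direction.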
SLIDE 25

Admin

  • Course web page:

http://www.cs.cmu.edu/~ninamf/courses/806/10-806-index.html

Two grading schemes: 1) Project Oriented.

  • Project [60%]
  • Take-home final [10%]
  • Hwks + grading [30%]

2) Homework Oriented.

  • Hwks + grading [60%]
  • Take-home final [10%]
  • Project [30%]
SLIDE 26

Admin

1) Project Oriented.

  • Project [60%]: explore a theoretical or empirical question; write-up, ideally aiming for a conference submission! Small groups OK.
  • Take-home final [10%]
  • Hwks + grading [30%]
  • Course web page:

http://www.cs.cmu.edu/~ninamf/courses/806/10-806-index.html

SLIDE 27

Admin

2) Homework Oriented.

  • Hwks + grading [60%]
  • Take-home final [10%]
  • Project [30%]: read a couple of papers and explain the ideas.
  • Course web page:

http://www.cs.cmu.edu/~ninamf/courses/806/10-806-index.html

SLIDE 28

Lectures in general

On the board; occasionally, slides will be used.