SLIDE 1

Introduction to machine learning

COMS 4721

SLIDE 2

Learning from data

  • Machine learning: the study of computational mechanisms that “learn” from data in order to make predictions and decisions.

SLIDE 3

Example 1: image classification

  • A birdwatcher takes pictures of birds and sorts the photos by species.
  • Goal: automatically recognize bird species in new photos.

Indigo bunting

SLIDE 4

Example 2: matchmaking

  • An online matchmaking service introduces thousands of pairs of students to each other, and receives feedback about whether the pair actually goes on a date or not.
  • Goal: predict how likely any pair of students will go on a date if introduced to each other.

[Table: feedback matrix with rows and columns Alice, Bob, Charlie, Daisy; a 1 marks a pair that went on a date]

SLIDE 5

Example 3: machine translation

  • Linguists provide translations of all English language books into French, sentence-by-sentence.
  • Goal: automatically translate any English sentence into French.
SLIDE 6

Example 4: personalized medicine

  • A physician attends to patients at a hospital and prescribes treatments on the basis of the patients’ symptoms, medical histories, genetic profiles, etc. The health outcome (e.g., recovery, death) for each patient is observed a day or so after the treatment.
  • Goal: prescribe a personalized treatment for any patient that delivers the best possible health outcome for that patient.

SLIDE 7

Basic setting

  • Data: labeled examples

(y₁, z₁), (y₂, z₂), …, (yₙ, zₙ) ∈ 𝒴 × 𝒵

  • Goal: “learn” a function

𝑔 : 𝒴 → 𝒝

from the data that is ultimately used for prediction/decision-making.

  • yⱼ ∈ 𝒴: representation of the j-th object (𝒴 = input/feature space)
  • zⱼ ∈ 𝒵: label pertinent to the j-th object (𝒵 = output/label space)
  • 𝒝: action space (usually 𝒵 = 𝒝 for prediction problems)
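To make the notation concrete, here is a minimal sketch in Python. The toy data, the feature encoding, and the simple rule inside 𝑔 are illustrative assumptions, not anything prescribed by the slides.

    # Each labeled example is a pair (y, z): y is an input (feature vector),
    # z is its label. The data below is made up purely for illustration.
    examples = [
        ((0.9, 0.1), 1),
        ((0.2, 0.8), 0),
        ((0.7, 0.3), 1),
    ]

    # A function g maps inputs to actions; for prediction problems the action
    # space equals the label space, so g returns a label.
    def g(y):
        return 1 if y[0] > 0.5 else 0

    predictions = [g(y) for y, _ in examples]
    print(predictions)   # [1, 0, 1]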
SLIDE 8

Prediction problems

  • Goal: learn a prediction function (predictor) that provides “correct” labels to inputs that may be encountered in the future (i.e., new unlabeled examples).

[Diagram] Collection of labeled examples → Learning algorithm → Learned predictor; New unlabeled example → Learned predictor → Prediction

Why should this be possible?
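To make the diagram’s pipeline concrete, here is a minimal sketch. The 1-nearest-neighbor rule (one of the non-parametric methods listed later in the course outline) is just one illustrative choice of learning algorithm, and the toy data is an assumption for illustration only.

    import math

    # Pipeline: labeled examples -> learning algorithm -> learned predictor;
    #           new unlabeled example -> learned predictor -> prediction.
    labeled_examples = [
        ((1.0, 0.9), "indigo bunting"),
        ((0.1, 0.2), "other"),
        ((0.8, 0.7), "indigo bunting"),
    ]

    def learning_algorithm(data):
        """Return a learned predictor; here it memorizes the data and predicts
        the label of the nearest stored example."""
        def predictor(y_new):
            nearest_y, nearest_z = min(data, key=lambda yz: math.dist(yz[0], y_new))
            return nearest_z
        return predictor

    predictor = learning_algorithm(labeled_examples)
    print(predictor((0.9, 0.8)))   # prediction for a new unlabeled example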

SLIDE 9

Some basic issues

  • 1. How should we represent the input objects?
  • 2. What types of prediction functions should we consider?
  • 3. How should data be used to select a predictor?
  • 4. How can we evaluate whether learning was successful?
SLIDE 10

Special case: binary classification

𝒵 = {0, 1} (e.g., is it an indigo bunting or not?)

Why is this hard?

  • 1. Only have labels for y₁, …, yₙ, which together comprise a minuscule fraction of the input space 𝒴.

  • 2. Relationship between an input 𝑦 ∈ 𝒴 and its correct label 𝑧 ∈ 𝒵 may be complicated, possibly ambiguous/non-deterministic!

  • 3. Can be many functions that perfectly match the input/output relationship on (y₁, z₁), …, (yₙ, zₙ). How should we pick one among these?
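Issue 3 can be illustrated with a tiny example. The setup below (scalar inputs, two hand-written predictors) is an assumption made only to show the point: two different functions can agree on every training example yet disagree on new inputs.

    # Toy training set of (y, z) pairs with scalar inputs.
    training = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]

    def g1(y):
        """Threshold rule: label 1 iff y > 0.5."""
        return 1 if y > 0.5 else 0

    def g2(y):
        """Memorization rule: recall the label if y was seen, else predict 0."""
        lookup = dict(training)
        return lookup.get(y, 0)

    # Both predictors fit the training data perfectly...
    assert all(g1(y) == z and g2(y) == z for y, z in training)
    # ...but disagree on a new input.
    print(g1(0.7), g2(0.7))   # 1 vs. 0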

SLIDE 11

Topics for this course (tentative)

  • 1. Non-parametric models (e.g., nearest neighbor, decision trees)
  • 2. Parametric models (e.g., generative models, linear models)
  • 3. Ensemble methods (e.g., boosting, hedging)
  • 4. Regression (e.g., least squares, Lasso)
  • 5. Representation learning (e.g., mixture models, PCA, auto-encoders)
  • 6. Other topics as time permits (e.g., sequence models, partial feedback)
SLIDE 12

A small sample of other topics in ML…

  • Advanced issues:
      • Distributed learning
      • Incomplete data
      • Causal inference
      • Privacy and fairness
  • Other models of learning:
      • Semi-supervised learning
      • Active learning
      • Online learning
      • Reinforcement learning
  • Application areas:
      • Natural language processing
      • Speech recognition
      • Computer vision
      • Computational advertising
  • Modes of study:
      • Mathematical analysis
      • Cross-domain evaluations
      • End-to-end application study
SLIDE 13

This course

  • Mathematical prerequisites:
      • Multivariable calculus
      • Linear algebra
      • Probability (and some basic statistics would be helpful)
      • Basic data structures and algorithms
  • Computational prerequisites:
      • You should have regular access to, and be able to program in, MATLAB, Python, or R.
  • Course requirements:
      • Around four homework assignments (theoretical & empirical exercises): 24%
      • Two in-class exams (March 3, April 28): 25% each
      • Practical modeling project: 26%
      • No late assignments accepted, no make-up exams
SLIDE 14

Resources

  • Course website: http://www.cs.columbia.edu/~djhsu/coms4721-s16
  • Course staff:
      • Instructor: Daniel Hsu
      • Instructional assistants: Edward Li, Siddharth Varshney, Robert Ying
      • Office hours, contact information, online forum: see course website
  • Course materials:
      • Course policies: posted on the course website
      • Lecture slides, notes, etc.: posted on the course website
      • Readings: “A Course in Machine Learning” and “The Elements of Statistical Learning” (both available free online), as well as other materials posted on the course website