Introduction to Machine Learning
- 1. Overview
Alex Smola Carnegie Mellon University
http://alex.smola.org/teaching/cmu2013-10-701 10-701
Administrative Stuff
Mid-project report due after the midterm.
Exams are taken without technology; you may bring a paper notebook.
Best 4 out of 5 homeworks count. To receive points you must submit on the due date, in class. No exceptions.
(questions, discussions, announcements)
(videos, problems, slides, timing, extra resources)
Can we beat the Stanford class? http://cs229.stanford.edu/projects2012.html
If you got lost, now is a good time to catch up.
Let us know if you have comments, concerns, or suggestions!
This is our FIRST class at CMU.
Problems, Statistics, Applications
Naive Bayes, Nearest Neighbors, Decision Trees, Neural Networks, Perceptron
Support Vector Classification, Regression, Novelty Detection, Kernel PCA
Risk Minimization, Convergence Bounds, Information Theory
Exponential Families, Graphical Models, Dynamic Programming, Latent Variables, Sampling
Online Learning, Bandits, Reinforcement Learning
For the internet, for a startup (all you need), for your PhD, for Wall Street, for biology, for energy.
Amazon books
Don’t mix preferences
Avatar learns from your behavior
Black & White Lionsgate Studios
Drivatar in Forza
ham spam
[Figure: topic proportions over time (x-axis: Day, y-axis: Proportion) for topics discovered in user data: Baseball, Finance, Jobs, Dating, Celebrity, Health. Each topic is characterized by its top words, e.g. Baseball: league, baseball, basketball, doubleheader, Bergesen, Griffey, bullpen, Greinke; Dating: dating, singles, personals, seeking, match; Celebrity: Snooki, Tom Cruise, Katie Holmes, Pinkett, Kudrow, Hollywood; Health: skin, body, fingers, cells, toes, wrinkle, layers, women, men; Jobs: job, career, business, assistant, hiring, part-time, receptionist; Finance: financial, Thomson, chart, real, stock, trading, currency.]
determine automatically
segment images, recognize handwriting
http://heli.stanford.edu
why these ads?
(system improves)
(system sort-of improves, ruleset is a mess)
(lots of rules, but they work better)
(combining many trees, works even better)
(machine learning system is replaced entirely)
IF x THEN DO y
Binary classification: given x, find y in {-1, 1}
Multiclass classification: given x, find y in {1, ..., k}
Regression: given x, find y in R (or R^d)
Sequence annotation: given a sequence x1 ... xl, find y1 ... yl
Hierarchical estimation: given x, find a point in the hierarchy of y (e.g. a tree)
Sequence prediction: given xt and yt-1 ... y1, find yt
Quality is measured by a loss function l(y, f(x)).
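Two common instances of l(y, f(x)), sketched in Python (the function names are mine, not from the slides):

```python
import numpy as np

def squared_loss(y, f):
    """Squared loss for regression: l(y, f(x)) = 0.5 * (f(x) - y)^2."""
    return 0.5 * (f - y) ** 2

def zero_one_loss(y, f):
    """Zero-one loss for binary classification with y in {-1, 1}."""
    return float(np.sign(f) != y)

print(squared_loss(2.0, 2.5))   # 0.125
print(zero_one_loss(1, -0.3))   # 1.0 (sign mismatch)
print(zero_one_loss(-1, -0.3))  # 0.0 (signs agree)
```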
map image x to digit y
linear nonlinear
Given a sequence: gene finding, speech recognition, activity segmentation, named entities
webpages genes
tomorrow’s stock price
Find a set of prototypes representing the data
Find a subspace representing the data
Find a latent causal sequence for observations
Find (small) set of factors for observation
Find the odd one out
...
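"Find a subspace representing the data" can be illustrated with PCA via the SVD; a minimal sketch on synthetic data (all numbers assumed for illustration):

```python
import numpy as np

# Toy data: 200 points lying close to a 1-dimensional subspace of R^2,
# stretched along the (hypothetical) direction (2, 1).
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0]]) + 0.01 * rng.normal(size=(200, 2))

# Center the data, then take the top right-singular vector
# as the principal direction.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
direction = Vt[0]

# The recovered direction is (up to sign) proportional to (2, 1),
# so the component ratio is close to 2.
ratio = direction[0] / direction[1]
print(ratio)
```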
Variance component model to account for sample structure in genome-wide association studies, Nature Genetics 2010
find them automatically
typical atypical
iid = independently and identically distributed
(not available at training time)
Test data x available at training time (you see the exam questions early)
Lots of unlabeled data available at training time (past exam questions)
Observe a number of similar problems at once
We only have training set. Do the best with it.
We have lots more problems that need to be solved with the same method.
Use correlation between tasks for better result
For many cases both sets of covariates are available
behavior
Batch: observe training data (x1,y1) ... (xl,yl), then deploy
Online: observe x, predict f(x), observe y (stock market, homework)
Active learning: query y for x, improve model, pick new x
Bandits: pick arm, get reward, pick new arm (also with context)
Reinforcement learning: take action, environment responds, take new action
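The bandit protocol ("pick arm, get reward, pick new arm") can be sketched with an epsilon-greedy strategy; the arm win rates below are invented for illustration:

```python
import numpy as np

# Epsilon-greedy on a 3-armed Bernoulli bandit (hypothetical win rates).
rng = np.random.default_rng(42)
true_means = np.array([0.2, 0.5, 0.8])  # assumed, for illustration
counts = np.zeros(3)
values = np.zeros(3)  # running average reward per arm
eps = 0.1

for t in range(5000):
    if rng.random() < eps:
        arm = int(rng.integers(3))       # explore: random arm
    else:
        arm = int(np.argmax(values))     # exploit: best estimate so far
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

best = int(np.argmax(counts))
print(best, values[best])  # the best arm dominates the pulls
```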
training data
build model
test
[Figure: handwritten digit examples labeled 4, 8, 3, 5]
environment (cooperative, adversarial, indifferent)
memory (goldfish, elephant)
state space (tic-tac-toe, chess, car)
Discriminative vs. Generative (mainly relevant for supervised models)
probabilities
really complicated (e.g. texts, images, movies)
For a previously seen instance, remember its label. This only works for small amounts of data. Why?
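"Remember the label" generalizes to nearest neighbors: label a new point by its closest stored instance. A minimal 1-NN sketch (toy data assumed):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """1-nearest-neighbor: return the label of the closest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Toy training set: two clusters with labels -1 and +1.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y = np.array([-1, -1, 1])

print(nn_predict(X, y, np.array([0.2, 0.1])))  # -1
print(nn_predict(X, y, np.array([4.0, 4.5])))  # 1
```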
$$f(x) = ax + b, \qquad \underset{a,b}{\text{minimize}} \; \sum_{i=1}^{m} \frac{1}{2}\left(a x_i + b - y_i\right)^2$$

$$\partial_a[\ldots] = 0 = \sum_{i=1}^{m} x_i\left(a x_i + b - y_i\right), \qquad \partial_b[\ldots] = 0 = \sum_{i=1}^{m} \left(a x_i + b - y_i\right)$$
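Setting both partial derivatives to zero yields two linear equations in a and b; a minimal NumPy check on synthetic data (slope 3 and intercept 2 are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 3.0 * x + 2.0 + 0.01 * rng.normal(size=100)

m = len(x)
# Normal equations from the derivation:
#   sum_i x_i (a x_i + b - y_i) = 0
#   sum_i     (a x_i + b - y_i) = 0
A = np.array([[np.sum(x * x), np.sum(x)],
              [np.sum(x),     float(m)]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a, b = np.linalg.solve(A, rhs)
print(a, b)  # close to the generating values 3.0 and 2.0
```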
$$f(x) = \langle a, x \rangle + b = \langle w, (x, 1) \rangle, \qquad \underset{w}{\text{minimize}} \; \sum_{i=1}^{m} \frac{1}{2}\left(\langle w, \bar{x}_i \rangle - y_i\right)^2$$

$$0 = \sum_{i=1}^{m} \bar{x}_i\left(\langle w, \bar{x}_i \rangle - y_i\right) \iff \left[\sum_{i=1}^{m} \bar{x}_i \bar{x}_i^\top\right] w = \sum_{i=1}^{m} y_i \bar{x}_i \qquad \text{where } \bar{x}_i = (x_i, 1)$$
$$f(x) = \langle w, (1, x) \rangle \qquad f(x) = \langle w, (1, x, x^2) \rangle \qquad f(x) = \langle w, (1, x, x^2, x^3) \rangle \qquad f(x) = \langle w, \phi(x) \rangle$$
$$0 = \sum_{i=1}^{m} \phi(x_i)\left(\langle w, \phi(x_i) \rangle - y_i\right) \iff \left[\sum_{i=1}^{m} \phi(x_i)\phi(x_i)^\top\right] w = \sum_{i=1}^{m} y_i \phi(x_i)$$
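The feature-map normal equations can be solved directly; a sketch assuming synthetic data generated from a quadratic, with φ(x) = (1, x, x²):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.01 * rng.normal(size=200)

# Design matrix with rows phi(x_i) = (1, x_i, x_i^2).
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)

# [sum_i phi(x_i) phi(x_i)^T] w = sum_i y_i phi(x_i)
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)  # close to the generating coefficients (1.0, -2.0, 0.5)
```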
$$f(x) = \langle w, \phi(x) \rangle, \qquad \underset{w}{\text{minimize}} \; \sum_{i=1}^{m} \frac{1}{2}\left(\langle w, \phi(x_i) \rangle - y_i\right)^2$$
Training:
    phi_xx = [xx.^4, xx.^3, xx.^2, xx, 1.0 + 0.0 * xx];
    w = (yy' * phi_xx) / (phi_xx' * phi_xx);
Testing:
    phi_x = [x.^4, x.^3, x.^2, x, 1.0 + 0.0 * x];
    y = phi_x * w';
warning: matrix singular to machine precision, rcond = 5.8676e-19
warning: attempting to find minimum norm solution
warning: matrix singular to machine precision, rcond = 5.86761e-19
warning: attempting to find minimum norm solution
warning: dgelsd: rank deficient 8x8 matrix, rank = 7
warning: matrix singular to machine precision, rcond = 1.10156e-21
warning: attempting to find minimum norm solution
warning: matrix singular to machine precision, rcond = 1.10145e-21
warning: attempting to find minimum norm solution
warning: dgelsd: rank deficient 9x9 matrix, rank = 6
warning: matrix singular to machine precision, rcond = 2.16217e-26
warning: attempting to find minimum norm solution
warning: matrix singular to machine precision, rcond = 1.66008e-26
warning: attempting to find minimum norm solution
warning: dgelsd: rank deficient 10x10 matrix, rank = 5
(model is too simple to explain data)
(model is too complicated to learn from data)
(failed matrix inverse)
Need to quantify model complexity vs. data
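The warnings above come from ill-conditioning; a small NumPy probe (hypothetical 10-point grid) shows how the condition number of the normal-equation matrix explodes with polynomial degree:

```python
import numpy as np

# Condition number of the normal-equation matrix Phi^T Phi grows rapidly
# with polynomial degree; this is what triggers the "matrix singular to
# machine precision" warnings.
x = np.linspace(0.0, 1.0, 10)
conds = {}
for degree in (2, 5, 9):
    Phi = np.vander(x, degree + 1)          # columns x^d, ..., x, 1
    conds[degree] = np.linalg.cond(Phi.T @ Phi)
    print(degree, conds[degree])
```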
>1B images, 40h video/minute
crawl it
we need Big Learning
>10B useful webpages
(this is a small subset, maybe 10%) 10 KB/page = 100 TB ($10k for disks, or EBS for 1 month)
10 ms/page = 1 day; afford 1-10 MIPS/page ($20k on EC2 at $0.68/h)
($10k/month via ISP or EC2)
($70k on EC2 at $0.085/h)
100M-1B vertices
(crawl it at 10 queries/s)
>1B texts
impossible without NDA
alex.smola.org
>1B ‘identities’
expensive! Do not use!
computer vision bioinformatics
personalized sensors
ubiquitous control
in the cloud
http://alex.smola.org/teaching/cmu2013-10-701/papers/intro_chapter.pdf
http://mlss.cc (lots of videos there)
https://www.coursera.org/course/ml