 
              Spring 2017 CIS 493, EEC 492, EEC 592: Autonomous Intelligent Robotics Instructor: Shiqi Zhang http://eecs.csuohio.edu/~szhang/teaching/17spring/
Machine learning: an introduction Slides adapted from Ray Mooney, Pedro Domingos, James Hays, and Yi-Fan Chang
A Few Quotes ● “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft) ● “Machine learning is the next Internet” (Tony Tether, Director, DARPA) ● “Machine learning is the hot new thing” (John Hennessy, President, Stanford) ● “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo) ● “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun) ● “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)
So What Is Machine Learning? ● Automating automation ● Getting computers to program themselves ● Writing software is the bottleneck ● Let the data do the work instead!
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Magic? No, more like gardening • Seeds = Algorithms • Nutrients = Data • Gardener = You • Plants = Programs
Why Study Machine Learning? Engineering Better Computing Systems
● Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (knowledge engineering bottleneck).
● Develop systems that can automatically adapt and customize themselves to individual users.
– Personalized news or mail filter
– Personalized tutoring
● Discover new knowledge from large databases (data mining).
– Market basket analysis (e.g. diapers and beer)
– Medical text mining (e.g. migraines to calcium channel blockers to magnesium)
Why Study Machine Learning? The Time is Ripe ● Many basic effective and efficient algorithms available. ● Large amounts of on-line data available. ● Large amounts of computational resources available.
Related Disciplines ● Artificial Intelligence ● Data Mining ● Probability and Statistics ● Information theory ● Numerical optimization ● Computational complexity theory ● Control theory (adaptive) ● Psychology (developmental, cognitive) ● Neurobiology ● Linguistics ● Philosophy
Defining the Learning Task
Improve on task, T, with respect to performance metric, P, based on experience, E.
● T: Playing checkers
– P: Percentage of games won against an arbitrary opponent
– E: Playing practice games against itself
● T: Recognizing hand-written words
– P: Percentage of words correctly classified
– E: Database of human-labeled images of handwritten words
● T: Driving on four-lane highways using vision sensors
– P: Average distance traveled before a human-judged error
– E: A sequence of images and steering commands recorded while observing a human driver
● T: Categorize email messages as spam or legitimate
– P: Percentage of email messages correctly classified
– E: Database of emails, some with human-given labels
ML in a Nutshell ● Tens of thousands of machine learning algorithms ● Hundreds new every year ● Every machine learning algorithm has three components: – Representation – Evaluation – Optimization
Representation ● Decision trees ● Sets of rules / Logic programs ● Instances ● Graphical models (Bayes/Markov nets) ● Neural networks ● Support vector machines ● Model ensembles ● Etc.
Evaluation ● Accuracy ● Precision and recall ● Squared error ● Likelihood ● Posterior probability ● Cost / Utility ● Margin ● Entropy ● K-L divergence ● Etc.
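A minimal sketch of three of the measures listed above (accuracy, precision and recall, squared error), assuming binary labels with 1 as the positive class. The function names and the toy labels are illustrative assumptions, not part of the original slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(targets, predictions):
    """Average squared difference, for continuous outputs."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Toy labels (made up for illustration): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))          # 0.75
print(precision_recall(y_true, y_pred))  # precision and recall are both 2/3
```

Accuracy alone can look good on imbalanced data, which is one reason precision and recall are reported alongside it.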
Optimization ● Combinatorial optimization – E.g.: Greedy search ● Convex optimization – E.g.: Gradient descent ● Constrained optimization – E.g.: Linear programming
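As a hedged illustration of the gradient descent example above, the sketch below minimizes a one-dimensional convex function, f(w) = (w - 3)^2. The objective, the learning rate, and the step count are assumptions chosen for the example, not values from the slides.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad_f(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_star = gradient_descent(grad_f, w0=0.0)
print(w_star)  # very close to 3.0, the minimizer of f
```

With this learning rate each step shrinks the distance to the minimizer by a constant factor of 0.8, so convergence is fast. A learning rate that is too large would make the iterates diverge instead.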
Types of Learning ● Supervised (inductive) learning – Training data includes desired outputs ● Unsupervised learning – Training data does not include desired outputs ● Semi-supervised learning – Training data includes a few desired outputs ● Reinforcement learning – Rewards from sequence of actions
Inductive Learning ● Given examples of a function (X, F(X)) ● Predict function F(X) for new examples X – Discrete F(X) : Classification – Continuous F(X) : Regression – F(X) = Probability( X ): Probability estimation
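For the continuous case (regression), a minimal sketch: given examples (x, F(x)), fit a straight line by ordinary least squares and predict F for a new x. The linear model and the toy data, generated from F(x) = 2x + 1, are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training examples (x, F(x)) drawn from the assumed target F(x) = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 10.0 + intercept  # predict F(X) for a new example X = 10
print(slope, intercept, prediction)    # 2.0 1.0 21.0
```

A classifier follows the same pattern with a discrete output, for example by thresholding the fitted value.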
Supervised vs. Unsupervised [Figure: panels labeled Unsupervised learning, Supervised learning, and Semi-supervised learning]
Supervised learning
Supervised learning: Steps
Training: Training Images → Image Features (+ Training Labels) → Training → Learned model
Testing: Test Image → Image Features → Learned model → Prediction
Slide credit: D. Hoiem and L. Lazebnik
Unsupervised learning
Fully supervised / “Weakly” supervised / Unsupervised: the definition depends on the task. Slide credit: L. Lazebnik
Generalization [Figure: training set (labels known) and test set (labels unknown)] ● How well does a learned model generalize from the data it was trained on to a new test set? Slide credit: L. Lazebnik
Generalization
● Components of generalization error
– Bias: how much does the average model over all training sets differ from the true model?
  ● Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much do models estimated from different training sets differ from each other?
● Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
● Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
Overfitting Thriller! https://youtu.be/DQWI1kvmwRg
Slide credit: L. Lazebnik
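The gap between training and test error can be seen in a small experiment. The sketch below is an illustration under assumed settings (a linear target with Gaussian noise, 50 training and 200 test points, a fixed seed); none of these come from the slides. It compares a very simple model, which always predicts the training mean, with a very flexible one, a 1-nearest-neighbor regressor that memorizes the training set.

```python
import random


def true_f(x):
    """Assumed ground-truth function for the toy experiment."""
    return 2.0 * x


def make_data(n, rng):
    """Sample n noisy examples (x, true_f(x) + noise)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """Too-simple model: always predict the mean training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    """Too-flexible model: return the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(200, rng)

constant_model = fit_constant(train_x, train_y)
nn_model = fit_1nn(train_x, train_y)

for name, model in [("constant", constant_model), ("1-NN", nn_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The 1-NN model reproduces every training target exactly, so its training error is zero while its test error is not: it has fit the noise. The constant model has similar, nonzero error on both sets because it cannot represent the trend at all.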
Generative vs. Discriminative Classifiers
Generative Models
● Represent both the data and the labels
● Often make use of conditional independence and priors
● Examples
– Naïve Bayes classifier
– Bayesian network
● Models of data may apply to future prediction problems
Discriminative Models
● Learn to directly predict the labels from the data
● Often assume a simple boundary (e.g., linear)
● Examples
– Logistic regression
– SVM
– Boosted decision trees
● Often easier to predict a label from the data than to model the data
Slide credit: D. Hoiem
Nearest Neighbor Classifier • Assign label of nearest training data point to each test data point [Figure, from Duda et al.: Voronoi partitioning of feature space for two-category 2D and 3D data] Source: D. Lowe
K-nearest neighbor [Figure: scatter plot of two classes (x and o) with two query points (+); axes x1 and x2]
1-nearest neighbor [Figure: the same scatter plot, classified with k = 1]
3-nearest neighbor [Figure: the same scatter plot, classified with k = 3]
5-nearest neighbor [Figure: the same scatter plot, classified with k = 5]
Using K-NN • Simple, a good one to try first • With infinite examples, 1-NN provably has error that is at most twice Bayes optimal error
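A minimal k-NN classifier sketch, assuming Euclidean distance and a majority vote among the k closest training points. The toy points, the labels "x" and "o", and the function name are illustrative assumptions. The example places one "o" outlier inside the "x" cluster to show how the choice of k changes the prediction.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2D data: an "x" cluster, an "o" cluster, and one "o" outlier at (2, 2).
train_points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6), (2, 2)]
train_labels = ["x", "x", "x", "o", "o", "o", "o"]

query = (2.1, 2.1)
print(knn_predict(train_points, train_labels, query, k=1))  # o (follows the outlier)
print(knn_predict(train_points, train_labels, query, k=3))  # x (outlier is outvoted)
```

A larger k smooths the decision boundary and makes the classifier less sensitive to individual noisy points, at the cost of blurring small genuine structures. Using an odd k avoids tied votes in two-class problems.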
Sample Applications ● Web search ● Computational biology ● Finance ● E-commerce ● Space exploration ● Robotics (reasoning, planning, control, etc) ● Information extraction ● Social networks ● Debugging ● [Your favorite area]