Machine Learning: Basic Principles
Teaching demonstration
Kalle Palomäki
Department of Signal Processing and Acoustics
Aalto University
Content
- 1. Goal
- 2. Machine learning: definition
- 3. Classification: an important machine learning approach
- 4. A machine learning problem: hands-on problem solving and demonstration
- 5. Summary
Goal
Part of the introductory sessions, adjusted to 20 minutes
4th-year students with no background in machine learning
Start building an understanding of machine learning through:
- Concrete examples
- Solving simple hands-on problems
Machine learning - definition
Wikipedia: “Machine learning deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions”
http://oldentech.files.wordpress.com/2010/07/1028528_29880053.jpg http://www.paranormalpeopleonline.com/boskop-man-big-brains-and-increased-intelligence/
Common-sense definition: machines that learn a little like the brain does
The Internet and machine learning: far beyond the capacity of a single brain
http://www.slate.com/blogs/future_tense/2014/10/24/internet_sleep_new_research_from_usc_shows_internet_activity_changes_in.html
Machine learning categories
Supervised learning
- Classification
Unsupervised learning
- Clustering
Reinforcement learning
Classifier
http://upload.wikimedia.org/wikipedia/commons/3/39/Leonardo_da_Vinci_043-mod.jpg
http://ecx.images-amazon.com/images/I/51f9cnKx90L._SY300_.jpg
Problem
Lisa is a tailor...
Lisa makes uniforms
Salvation Army uniforms: men wear trousers, women wear skirts
http://www.bilerico.com/2009/03/Army%20Uniforms.jpg
Sometimes she makes mistakes
These should be skirts.
Once she made a skirt for Prince Charles!
http://i.dailymail.co.uk/i/pix/2009/05/21/article-1186234-050B9CB2000005DC-834_224x423.jpg
[Figure labels: hip, waist]
Here is Lisa's data
waist (cm)  hip (cm)  gender
29.6        34.4      Female
28.9        34.4      Female
31.3        34.5      ???
30.8        33.7      Male
29.8        34.5      ???
32.5        33.6      Male
30.6        34.4      ???
.....       .....     .......
[Figure: Lisa's data. Female samples: red; male samples: blue; samples with missing gender information: *]
Some help to Lisa?
Discuss in pairs (2 min):
- How would you approach this problem?
- What kind of algorithm would you design?
- Try to come up with some ideas, please!
- Use the picture provided to assist your discussion.
K-nearest neighbours algorithm
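- 1. Determine K, the number of nearest neighbours
- 2. Calculate the distance between the test sample and all the training samples
  Use the Euclidean distance measure: $d_i = \sqrt{\sum_{m=1}^{M} (x_m - y_{i,m})^2}$, where $x$ is the test sample, $y_i$ is the $i$-th training sample, and $M$ is the data dimension
- 3. Sort the distances and determine the nearest neighbours
- 4. Gather the categories of the nearest neighbours
- 5. Use majority voting to predict the class of the test sample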
http://people.revoledu.com/kardi/tutorial/KNN/
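The five steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in the presentation: the function name `knn_predict`, the variable names, and the choice of K = 3 are assumptions. The training rows are only the four labelled rows from the excerpt of Lisa's data, so the predictions are illustrative rather than results for her full data set.

```python
import math
from collections import Counter


def knn_predict(test_sample, training_samples, training_labels, k):
    """Predict the class of test_sample by a majority vote of its k nearest neighbours."""
    # Step 2: Euclidean distance from the test sample to every training sample.
    distances = [math.dist(test_sample, sample) for sample in training_samples]
    # Step 3: sort the training indices by distance and keep the k nearest.
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Step 4: gather the categories of the nearest neighbours.
    neighbour_labels = [training_labels[i] for i in nearest]
    # Step 5: majority vote.
    return Counter(neighbour_labels).most_common(1)[0][0]


# Labelled rows (waist, hip) from the excerpt of Lisa's data.
training_samples = [(29.6, 34.4), (28.9, 34.4), (30.8, 33.7), (32.5, 33.6)]
training_labels = ["Female", "Female", "Male", "Male"]

# Step 1: choose K (an assumed value for this sketch).
K = 3

# Two of the rows whose gender is missing.
prediction_a = knn_predict((29.8, 34.5), training_samples, training_labels, K)
prediction_b = knn_predict((31.3, 34.5), training_samples, training_labels, K)
print(prediction_a, prediction_b)
```

Sorting indices rather than the distances themselves keeps the link between each distance and the label of its training sample, which step 4 needs.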
[Figure: Lisa's data with K = 3. Female samples: red; male samples: blue; samples with missing gender information: *]
Euclidean distance
$d_i = \sqrt{\sum_{m=1}^{M} (x_m - y_{i,m})^2}$
- $x$: test sample
- $y_i$: training samples
- $d_i$: Euclidean distance
- $i$: training sample index
- $m$: dimension index
- $M$: data dimension, here M = 2
http://people.revoledu.com/kardi/tutorial/KNN/
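The Euclidean distance sums the squared differences between the test sample and a training sample over all data dimensions and then takes the square root. A minimal Python sketch of that computation follows; the function name `euclidean_distance` and the example points are hypothetical and chosen only so that the result is easy to check by hand.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two samples with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("samples must have the same dimension")
    # Sum the squared differences over the dimension index, then take the root.
    return math.sqrt(sum((x_m - y_m) ** 2 for x_m, y_m in zip(x, y)))


# Hypothetical two-dimensional points (M = 2), not taken from Lisa's data.
test_sample = (0.0, 0.0)
training_samples = [(3.0, 4.0), (6.0, 8.0)]

# One distance per training sample index.
distances = [euclidean_distance(test_sample, y_i) for y_i in training_samples]
print(distances)
```

In Python 3.8 and later, the standard library function `math.dist` computes the same quantity.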
[Figure sequence: the Euclidean distances d1, d2, d3, d4, d5 and d6 from the test sample (*) to the training samples, drawn one at a time. Legend: test sample, female samples of training data, male samples of training data]
[Figure: the test sample (*) and its 3 nearest neighbours. Legend: test sample, female samples of training data, male samples of training data]
[Figure: a test sample (*) and its 3 nearest neighbours. Legend: test sample, female samples of training data, male samples of training data]
All 3 neighbours are Male, so the class is Male.
[Figure: another test sample (*) and its 3 nearest neighbours. Legend: test sample, female samples of training data, male samples of training data]
2 neighbours are Female and 1 neighbour is Male. There are more Females than Males, so the class is Female.
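The majority vote in the two cases above can be sketched with a simple counter. The function name `majority_vote` is hypothetical, and the label lists simply restate the neighbour classes given for the two test samples.

```python
from collections import Counter


def majority_vote(neighbour_labels):
    """Return the most common label and the per-class vote counts."""
    counts = Counter(neighbour_labels)
    winner, _ = counts.most_common(1)[0]
    return winner, dict(counts)


# The two cases shown above, with K = 3.
print(majority_vote(["Male", "Male", "Male"]))
print(majority_vote(["Female", "Female", "Male"]))
```

With two classes, an odd K guarantees that the vote cannot end in a tie.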
Classification problem
Lisa has lost the gender information of one of her customers and does not know whether to make a skirt or trousers. She is planning to toss a coin. Can you help her make a better decision?
The customer who is missing gender information: Gender ------, Waist 28, Hip 34
gender  waist (cm)  hip (cm)
Male    28          32
Male    33          35
Female  27          33
Female  31          36
http://www.dcs.gla.ac.uk/~srogers/firstcourseml/matlab/chapter5/knnexample.html#1
Molarius A, Seidell JC, Sans S, Tuomilehto J, Kuulasmaa K. (1999) "Waist and hip circumferences, and waist-hip ratio in 19 populations of the WHO MONICA Project", International Journal of Obesity and Related Metabolic Disorders: J. Internat. Association Study Obesity, 23:116-125.
Solution
Test sample: waist 28, hip 34
The distance column lists squared Euclidean distances. Omitting the square root does not change the ranking.
Gender  waist (cm)  hip (cm)  squared distance            rank  in neighbourhood (Yes or No)  gender if in neighbourhood
Male    28          32        $(28-28)^2+(34-32)^2=4$     2     Yes                           Male
Male    33          35        $(28-33)^2+(34-35)^2=26$    4     No                            -----
Female  27          33        $(28-27)^2+(34-33)^2=2$     1     Yes                           Female
Female  31          36        $(28-31)^2+(34-36)^2=13$    3     Yes                           Female
Votes: Male 1, Female 2
Number of Female > number of Male, so the class is Female.
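The solution can be checked with a short Python script. This is a sketch, not the presenter's code: the variable names are assumptions, and K = 3 is inferred from the three samples marked as belonging to the neighbourhood. It uses squared distances, as the solution table does, and also confirms that taking the square root gives the same ordering.

```python
import math

# Training data from the classification problem: (gender, waist in cm, hip in cm).
training = [
    ("Male", 28, 32),
    ("Male", 33, 35),
    ("Female", 27, 33),
    ("Female", 31, 36),
]
test_waist, test_hip = 28, 34
K = 3  # three samples belong to the neighbourhood in the solution

# Squared Euclidean distance to each training sample.
squared = [(test_waist - w) ** 2 + (test_hip - h) ** 2 for _, w, h in training]

# Rank 1 is the closest sample.
order = sorted(range(len(training)), key=lambda i: squared[i])
rank = {i: position + 1 for position, i in enumerate(order)}

# Votes of the K nearest neighbours.
votes = {}
for i in order[:K]:
    gender = training[i][0]
    votes[gender] = votes.get(gender, 0) + 1
predicted = max(votes, key=votes.get)

# Taking the square root keeps the same ordering.
order_with_root = sorted(range(len(training)), key=lambda i: math.sqrt(squared[i]))

for i, (gender, w, h) in enumerate(training):
    print(gender, w, h, squared[i], rank[i])
print(votes, predicted)
```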
Summary
- We briefly covered the basic principles of machine learning:
  1. First, we defined machine learning
  2. We introduced classification as an important machine learning task
  3. We solved a hands-on classification problem using the K-nearest neighbour algorithm
- Check out my website for:
  - These slides
  - Exercise
  - The code for the decision border calculations in the previous slides
http://users.spa.aalto.fi/kpalomak/demonstration_session
What next
Supervised learning
- Classification
Unsupervised learning
- Clustering
Reinforcement learning
http://cs.nyu.edu/~roweis/data.html
Face recognition
Speech recognition
Spectrum over time for “cat”