Applied Machine Learning: Introduction


SLIDE 1

APPLIED MACHINE LEARNING

Applied Machine Learning Introduction

SLIDE 2

Practicalities

Contact Information of the Instructors

SLIDE 3

Practicalities

Slides and exercises will be posted on the website of the class the day before class:

http://lasa.epfl.ch/teaching/lectures/ML_Msc/index.php

http://lasa.epfl.ch/ → Teaching → Lectures → Applied Machine Learning

Solutions to the exercises will be posted a week after the exercise session.

SLIDE 4

Class Format

  • Lectures: 9h15-11h00
  • Exercises (in class): 11h15-12h00
  • Lectures alternate with practice sessions held in INN 218; see the class schedule.
  • Attendance at the practice sessions is compulsory.
  • Attendance at the exercise sessions is highly recommended.

SLIDE 5

TP / Practicals: Group formation

https://epfl.doodle.com/poll/ggwdf99a3nwk3ad8 (the link will be sent to you by email)

The first practical starts on week 3!

SLIDE 6

Class Syllabus

SLIDE 7

Grading Scheme

Practicals (25% of the grade)
  • Performed in teams of 2 or 3 people
  • 2 reports, due October 16 and November 20
  • 1 oral presentation on December 11

Written exam (75% of the grade)
  • 3 hours long
  • Closed book
  • Allowed: 1 A4 page of handwritten notes

SLIDE 8

Pre-requisites

Linear Algebra
  • Vector / matrix notation
  • Eigenvalue decomposition
  • Linear dependency

Probability / Statistics
  • Probability distribution function
  • Covariance, expectation
  • Joint, conditional probability
  • Correlation / statistical independence

Optimization
  • Global versus local optima
  • Gradient descent
  • Method of Lagrange multipliers

Brief recap of main algorithms in class

SLIDE 9

Class Objectives

  • To understand the basics of some key algorithms of Machine Learning
  • To apply some of these algorithms to real data and, by so doing, to understand the limitations of the algorithms for real-time systems
  • To raise in you enough interest in the field that you will later try to learn more about it (advanced class at the doctoral school, on-line search, …)
  • To have more engineers apply these techniques for robust control, signal processing, prediction, learning, etc.

SLIDE 10

Main learning outcomes: By the end of the course, the student must be able to:

  • Choose an appropriate ML method
  • Assess / Evaluate an appropriate ML method
  • Apply an appropriate ML method

Transversal skills

  • Write a scientific or technical report.
  • Make an oral presentation.

Learning Outcomes

SLIDE 11

Today’s class format

  • Taxonomy and basic concepts in ML
  • Examples of ML applications
  • Introduction to best practice in ML

SLIDE 12

What is Machine Learning?

Machine Learning encompasses a large set of algorithms that aim at inferring information from what is hidden. This process is often referred to as Data Mining.

SLIDE 13

What is Machine Learning?

Machine Learning encompasses a large set of algorithms that aim at inferring information from what is hidden.

  • A. M. Bronstein, M. M. Bronstein, M. Zibulevsky, "On separation of semitransparent dynamic images from static background", Proc. Intl. Conf. on Independent Component Analysis and Blind Signal Separation, pp. 934-940, 2006.

Independent Component Analysis (ICA) can decompose a mixture of signals.

SLIDE 14

What is Machine Learning?

Recognizing human speech: this is the waveform produced when uttering the word “allright”.

The strength of ML algorithms is that they can be applied to arbitrary data. They can recognize patterns from various sources of data.

Modeling time series: Hidden Markov Models can be used to recognize complex sounds, including human speech.

SLIDE 15

What is Machine Learning?

Piano note (C5 – do)

Same note played by an oboe (hautbois)

Classification: two patterns that look different (here, the same note played on two instruments) should still be grouped in the same class.

SLIDE 16

What is Machine Learning?

Three samples of 2-dimensional trajectories of a robot’s hand over time.

Same trajectories displayed in space (without time).

ML algorithms are often used to find the representation in which patterns that originally looked different become similar.

SLIDE 17

What is Machine Learning?

Three samples of 2-dimensional trajectories of a robot’s hand over time.

Same trajectories displayed in space (without time).

ML algorithms are often used to find the representation in which patterns that originally looked different become similar.

Principal Component Analysis can discover this projection → the first algorithm we will see in class.
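As an illustration of what "discovering a projection" means, here is a minimal sketch of Principal Component Analysis restricted to 2-dimensional points, using only the Python standard library. The function name `pca_2d` and the sample points are hypothetical and are not taken from the course material; the closed-form eigenvalue expression only holds for a 2x2 covariance matrix.

```python
import math

def pca_2d(points):
    """Return (mean, principal_axis, projections) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    largest = half_trace + root

    # Eigenvector associated with the largest eigenvalue.
    if abs(sxy) > 1e-12:
        axis = (largest - syy, sxy)
    elif sxx >= syy:
        axis = (1.0, 0.0)
    else:
        axis = (0.0, 1.0)
    norm = math.hypot(axis[0], axis[1])
    axis = (axis[0] / norm, axis[1] / norm)

    # 1-D coordinates of each point along the principal axis.
    projections = [cx * axis[0] + cy * axis[1] for cx, cy in centered]
    return (mean_x, mean_y), axis, projections

# Hypothetical points lying on the diagonal y = x.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mean, axis, projections = pca_2d(points)
print("principal axis:", axis)
print("1-D projections:", projections)
```

For these points the principal axis is the diagonal direction, so the 2-dimensional data can be described by a single coordinate along that axis without losing information.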

SLIDE 18

What is Machine Learning?

ML helps compute automatically information that would take days to extract by hand.

Noris, B., Nadel, J, Barker, M., Hadjikhani, N. and Billard, A. (2012) Investigating gaze of children with ASD in naturalistic settings. PLOS ONE.

The mapping can be done through support vector regression → an algorithm we will see in class.

SLIDE 19

Machine Learning, History

  • 1940s – The formal neuron, precursor of the Perceptron (McCulloch & Pitts, 1943)
  • 1960s – “Perceptrons” (Minsky & Papert, 1969)
  • 1960 – Bellman’s “curse of dimensionality”
  • 1980 – Bounds on statistical estimators (C. Stone)
  • 1990 – Beginning of high-dimensional data (hundreds of variables)
  • 2000 – High-dimensional data (thousands of variables)

SLIDE 20

Machine Learning, History

The history traces back to the very first steps in Artificial Intelligence (AI). Machine Learning is a new way of thinking in AI that builds strongly on statistics:

  • a probabilistic approach to modeling the decision function
  • modeling of uncertainty in the input, the output and the parameters of the model
  • a data-driven approach, as opposed to a “knowledge-driven” one

SLIDE 21

Machine Learning, History

  • 1980 – The First Machine Learning Workshop was held at Carnegie-Mellon University in Pittsburgh.
  • 1980 – Three consecutive issues of the International Journal of Policy Analysis and Information Systems were specially devoted to machine learning.
  • 1981 – Hinton, Jordan, Sejnowski, Rumelhart, McClelland at UCSD: back-propagation algorithm, PDP book.
  • 1986 – The establishment of the Machine Learning journal.
  • 1987 – The beginning of annual international conferences on machine learning (ICML). Snowbird ML conference.
  • 1988 – The beginning of regular workshops on computational learning theory (COLT).
  • 1990s – Explosive growth in the field of data mining, which involves the application of machine learning techniques.

SLIDE 22

Why and when do we need learning in Robotics?

SLIDE 23

Peg and Hole Problem

A typical problem of Robotics

SLIDE 24

A typical problem of Robotics

Peg and Hole Problem

SLIDE 25

A: Engineer the environment

A typical problem of Robotics

SLIDE 26

A: Engineer the environment
B: Engineer the body

A typical problem of Robotics

SLIDE 27

A: Engineer the environment
B: Engineer the body
C: Engineer the controller

Systematic search → Adaptive control → Learning machine!

A typical problem of Robotics

SLIDE 28

Machine Learning: definitions

Machine Learning is the field of scientific study that concentrates on induction algorithms and on other algorithms that can be said to “learn”.

Machine Learning Journal, Kluwer Academic

Machine Learning is an area of artificial intelligence involving developing techniques to allow computers to “learn”. More specifically, machine learning is a method for creating computer programs by the analysis of data sets, rather than the intuition of engineers. Machine learning overlaps heavily with statistics, since both fields study the analysis of data.

Webster Dictionary

Machine learning is a branch of statistics and computer science, which studies algorithms and architectures that learn from data sets.

WordIQ

SLIDE 29

Engineering the environment is not always desirable. Rather, it is desirable to have a system that:

  • is adaptable to different environments
  • can generalize across tasks

Kronander, Burdet and Billard, Learning Peg-In-Hole Insertion from Human Demonstrations, 2013

SLIDE 30

Machines that learn

Engineering the environment is not always desirable. Rather, it is desirable to have a system that:

  • is adaptable to different environments
  • can generalize across tasks

SLIDE 31

Problem I

A: Engineer the environment
B: Engineer the body
C: Engineer the controller

Rows 1-3: Design an autonomous robot that distributes graded assignments to a class of students.
Rows 4-6: Make an autonomous robot car that can drive people from Lausanne to Geneva.

SLIDE 32

Key features for a good learning system

Generalization versus memorization

An important feature of a learning system that differentiates it from a pure “memory” is its ability to generalize.

SLIDE 33

Extracting features and generalizing

Calinon, D’halluin & Billard, Handling of multiple constraints and motion alternatives in a robot programming by demonstration framework, Humanoids’09.

SLIDE 34

Learning versus Memorization

Learning implies generalizing. Generalizing consists of extracting key features from the data, matching those across data (to find resemblances) and storing a generalized representation of the data features that best accounts (according to a given metric) for all the small differences across data.

Generalizing is the opposite of memorizing, and one often has to find a tradeoff between over-generalizing, hence losing information on the data, and overfitting, i.e. keeping more information than required.

Generalization is particularly important in order to reduce the influence of noise, which shows up as variability in the data.
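The contrast between memorizing and generalizing can be made concrete with a small sketch. Everything below is an assumption made for illustration and does not come from the course material: the data (a line y = 2x with alternating noise of ±0.5), the "memorizer" (a nearest-neighbour lookup table) and the generalizing model (a least-squares line).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def memorizer(xs, ys):
    """Return a predictor that recalls the stored output of the nearest stored input."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

def mse(predict, xs, ys):
    """Mean-square error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data: true relation y = 2x, corrupted by alternating noise.
train_x = [float(i) for i in range(10)]
train_y = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]

# Test inputs slightly off the training inputs, with noise-free ground truth.
test_x = [x + 0.1 for x in train_x]
test_y = [2.0 * x for x in test_x]

slope, intercept = fit_line(train_x, train_y)
line = lambda x: slope * x + intercept
recall = memorizer(train_x, train_y)

print("memorizer train/test MSE:", mse(recall, train_x, train_y), mse(recall, test_x, test_y))
print("line      train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer reproduces the training set perfectly, noise included, and therefore carries that noise over to new inputs. The fitted line gives up a perfect fit on the training set in exchange for a smaller error on the unseen inputs.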

SLIDE 35

Imagine that you have been a witness to a robbery. How would you describe the thief to the police?

Problem II

SLIDE 36

Features extraction

Feature extraction consists of extracting only a subset, the key components, of a situation, image, scene, conversation, etc. → Dimensionality reduction

The key components depend on the context.

Feature extraction presupposes the existence of a mechanism and of background knowledge to restore the complete situation, image, scene, conversation.

SLIDE 37

Features extraction

Bio-Inspiration

Extracting the relevant information

Preattentive processing of orientation

Sensitive to asymmetries

Nothdurft, H.-C. (1991a). Ophthalmology and Visual Science, 32(4), 714.

SLIDE 38

Features extraction

Bio-Inspiration

Our eyes rapidly pick up what stands out: asymmetries or uncoordinated patterns of motion.

SLIDE 39

Features extraction

Bio-Inspiration

A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, Itti et al., 1998

SLIDE 40

Features extraction

Bio-Inspired IT Systems

Multimodal saliency-based bottom-up attention: a framework for the humanoid robot iCub, Ruesch et al., ICRA 2008

SLIDE 41

Problem III

You must solve one of the two problems below. You are provided with the following data: age, gender, nationality, body weight, hair color, favorite hobbies, and weekly timetable of all EPFL students and EPFL professors.

Which data do you remove from the input to your system?

Rows 1-3: Predict the number of meals sold by each EPFL cafeteria at lunch.
Rows 4-6: Predict the number of people in the metro between 8h00 and 8h15.

SLIDE 42

Taxonomy in ML

Machine learning algorithms are often classified according to the type of computation they can perform on a given dataset. Common types of computation include:

  • Classification: learn to put instances into pre-defined classes
  • Association: learn relationships between the attributes
  • Clustering: discover classes of instances that belong together
  • Time series prediction: learn to predict a numeric quantity instead of a class

Other terms: On-line Learning, Connectionist Models

SLIDE 43

Taxonomy in ML

  • Supervised learning – where the algorithm learns a function or model that maps a set of inputs to a set of desired outputs.
  • Unsupervised learning – where the algorithm learns a model that represents a set of inputs without any feedback (no desired output, no external reinforcement).
  • Reinforcement learning – where the algorithm learns a mechanism that generates a set of outputs from one input in order to maximize a reward value (external and delayed feedback).
  • Learning to learn – where the algorithm learns its own inductive bias based on previous experiences.
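The difference between the first two families in this taxonomy can be sketched on the same toy data. The example below is only an illustration under stated assumptions: the 1-dimensional samples, the labels "low" and "high", and the helper names are hypothetical, the supervised method is a nearest-centroid classifier, and the unsupervised method is k-means with two clusters.

```python
def centroids_from_labels(xs, labels):
    """Supervised: average the samples of each labelled class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(members) / len(members) for label, members in groups.items()}

def classify(x, centroids):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k = 2, started from the extreme samples."""
    centers = [min(xs), max(xs)]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the previous center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers

# Hypothetical 1-D measurements forming two groups.
samples = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Supervised: the desired output (a label) is given for every sample.
labels = ["low", "low", "low", "high", "high", "high"]
centroids = centroids_from_labels(samples, labels)
print(classify(1.5, centroids), classify(4.0, centroids))

# Unsupervised: the same samples without labels; the groups must be discovered.
centers = two_means(samples)
print(centers)
```

Both methods end up with the same two group centers here, but only the supervised one can name the groups, because the names were part of its training data.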

SLIDE 44

Supervised learning

  • Supervised learning relates to a vast group of methods by which one estimates a model from a set of examples → the system is given the desired output.
  • When these examples are provided by a human expert, this is referred to as robot learning from demonstration, or robot programming by demonstration.

SLIDE 45

Supervised learning: example

Goal: estimate the function of state evolution

$\dot{x} = f(x), \quad x, \dot{x} \in \mathbb{R}^N$

Make observations of the state of the system:

$\{x^i, \dot{x}^i\}, \quad i = 1 \ldots N$

Estimate $f$ through the mean-square error:

$\min_f \sum_i \left\| f(x^i) - \dot{x}^i \right\|^2$

[Figure: state space with axes $x_1$ and $x_2$; $x^*$: target]
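A minimal sketch of this estimation problem follows, under two assumptions that the slide does not make: the state is a scalar, and $f$ is restricted to the linear form $f(x) = a x + b$, for which the mean-square minimizer has a closed form. The observations are hypothetical and generated from a system whose target is $x^* = 3$.

```python
def fit_linear_dynamics(states, velocities):
    """Least-squares fit of f(x) = a * x + b to pairs (x^i, xdot^i)."""
    n = len(states)
    mean_x = sum(states) / n
    mean_v = sum(velocities) / n
    sxx = sum((x - mean_x) ** 2 for x in states)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in zip(states, velocities))
    a = sxv / sxx
    b = mean_v - a * mean_x
    return a, b

# Hypothetical observations generated from xdot = -2 * (x - 3), i.e. target x* = 3.
states = [0.0, 1.0, 2.0, 4.0, 5.0]
velocities = [-2.0 * (x - 3.0) for x in states]

a, b = fit_linear_dynamics(states, velocities)
target = -b / a  # state at which the fitted velocity vanishes

print("fitted f(x) = %.3f * x + %.3f" % (a, b))
print("estimated target:", target)
```

Because the fitted slope is negative, the estimated dynamics pull the state toward the estimated target from either side, which is the behaviour expected of a system converging to $x^*$.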

SLIDE 46

Supervised learning

Khansari Zadeh, S. M., Kronander, K. and Billard, A. (2012) Learning to Play Minigolf. Advanced Robotics.

SLIDE 47

Noris, B., Nadel, J, Barker, M., Hadjikhani, N. and Billard, A. (2012) Investigating gaze of children with ASD in naturalistic settings. PLOS ONE.

Supervised learning

Where do the eyes look? Map the image of the eyes to a point in the camera image.

SLIDE 48

What is Machine Learning?

What is sometimes impossible for humans to see is easy for ML to pick up.

Noris, B., Keller, J-B. and Billard, A. (2011) A Wearable Gaze Tracking System for Children in Unconstrained Environments. Computer Vision and Image Understanding.

Exploit information not only on the pupil and cornea, but also on wrinkles, eyelids and eyelash patterns, to infer gaze direction.

Support Vector Regression can be used to learn this mapping.

SLIDE 49

What is Machine Learning?

Noris, B., Keller, J-B. and Billard, A. (2011) A Wearable Gaze Tracking System for Children in Unconstrained Environments. Computer Vision and Image Understanding.

Support Vector Regression can be used to learn this mapping

Input: 50 images of the eyes, in greyscale, 20x20 pixels

$x^i \in \mathbb{R}^{20 \times 20}, \quad i = 1 \ldots 50$

Output: 50 images of the scene, in greyscale, 240x320 pixels

$y^i \in \mathbb{R}^{240 \times 320}, \quad i = 1 \ldots 50$

Learn a function $f$:

$y = f(x)$

SLIDE 50

Unsupervised Learning

Unsupervised learning refers to a variety of methods by which a pair of signals y and x are associated but there is no explicit labeling as to which y should be associated to which x. This is often done through association, i.e. through associative learning.

SLIDE 51

Associative Learning

Let us play a little game.

  • When I say “Satellite”, what do you think of?
  • When I say “EPFL”, what do you think of?
  • When I say “exam”, what do you think of?

SLIDE 52

Associative Learning

Associative learning is multi-modal: visual, auditory, proprioceptive, etc. It can represent:

  • A concatenation of features: Satellite → Comics, Beer
  • Events bearing a close temporality: EPFL → TSOL, Classes
  • A cause-to-effect relationship: Exam → Grade, Vacation…

SLIDE 53

Associative Learning for Learning Word-Objects Relations

  • H. Kozima, H. Yano, A robot that learns to communicate with human caregivers, in: Proceedings of the International Workshop on Epigenetic Robotics, 2001

SLIDE 54

Reinforcement Learning (RL)

RL tries to infer the optimal path to the goal through a process of trial and error, so as to maximize the reward.

Reinforcement learning is a tedious learning method: it is slow and works only on well-defined problems with a small search space.
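The trial-and-error process can be sketched with tabular Q-learning, one common reinforcement learning algorithm; the slides do not say which algorithm is meant, so this choice is an assumption. The environment is hypothetical as well: a chain of five states in which only reaching the last state is rewarded. It is deliberately a well-defined problem with a tiny search space.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor
ALPHA = 1.0           # learning rate; 1.0 is adequate because transitions are deterministic

def step(state, action):
    """Move along the chain; the reward is 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, max_steps=100, seed=0):
    """Learn action values from randomly chosen trials."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # pure exploration: try an action at random
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy: in each non-goal state, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Even on this five-state chain the learner needs many random trials before the delayed reward has propagated back to the start state, which gives a feel for why the method becomes slow when the search space grows.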

SLIDE 55

Reinforcement Learning

Robotics Applications

Atkeson & Schaal, Learning how to swing an inverted pendulum, ICRA 1997.

The robot tracks the position of the two colored balls. Using a model of the inverted pendulum dynamics, it learns which joint angle displacement to produce to ensure that the balls remain in equilibrium.

SLIDE 56

Reinforcement Learning

Robotics Applications

Kober, J.; Peters, J. (2011). Policy Search for Motor Primitives in Robotics, Machine Learning, 84, 1-2, pp.171-203.

Learning the “ball in a cup” task: a couple of successful examples are provided by a human; the robot then performs guided trials so as to optimize the reward (putting the ball in the cup).

SLIDE 57

Problem IV

Supervised learning, unsupervised learning or reinforcement learning?

  1: Learning to ride a bicycle
  2: Predicting the length of the queue at the cafeteria between 11h30 and 12h30
  3: Relating demographics (age/weight/nationality) with chocolate consumption
  4: Estimating monthly returns of stocks in the financial market

SLIDE 58

Summary

Machine Learning encompasses a large body of work, which cannot all be covered here. We will focus on a subset of algorithms that form the foundation of most current advances in machine learning. We omit several topics, however, which are covered in other courses on machine learning at EPFL.

SLIDE 59

Some Machine Learning Resources

On-line resources:

  • http://www.machinelearning.org/index.html
  • http://www.pascal-network.org/ Network of excellence on Pattern Recognition, Statistical Modelling and Computational Learning (summer schools and workshops)

Journals:

  • Machine Learning Journal, Kluwer Publisher
  • IEEE Transactions on Signal processing
  • IEEE Transactions on Pattern Analysis
  • IEEE Transactions on Pattern Recognition
  • The Journal of Machine Learning Research

Conferences:

  • ICML: int. conf. on machine learning
  • Neural Information Processing Conference: on-line repository of all research papers, www.nips.org

SLIDE 60

LASA

SLIDE 61

Software for the class