MA2823: Introduction to Machine Learning, CentraleSupélec, Fall 2017

Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech
chloe-agathe.azencott@mines-paristech.fr


slide-1
SLIDE 1

MA2823: Introduction to Machine Learning

CentraleSupélec, Fall 2017
Chloé-Agathe Azencott

Centre for Computational Biology, Mines ParisTech

chloe-agathe.azencott@mines-paristech.fr

slide-2
SLIDE 2

2

  • Course material & contact

http://tinyurl.com/ma2823-2017 chloe-agathe.azencott@mines-paristech.fr

Slides thanks to Ethem Alpaydın, Matthew Blaschko, Trevor Hastie, Rob Tibshirani and Jean-Philippe Vert.

slide-3
SLIDE 3

3

What is (Machine) Learning?

slide-4
SLIDE 4

4

Why Learn?

  • Learning:

Modifying a behavior based on experience [F. Benureau]

  • Machine learning: Programming computers to

– model phenomena
– by means of optimizing an objective function
– using example data.

slide-5
SLIDE 5

5

Why Learn?

  • There is no need to “learn” to calculate payroll.
  • Learning is used when

– Human expertise does not exist (bioinformatics);
– Humans are unable to explain their expertise (speech recognition, computer vision);
– Complex solutions change over time (routing in computer networks).

Classical program: Rules + Data → Answers
Machine learning program: Data + Answers → Rules
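
The contrast between the two kinds of program can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the temperature data, the function names, and the hand-chosen threshold of 25 are all hypothetical.

```python
# Classical program: a human writes the rule, the program applies it to data.
def classical_is_hot(temperature_c):
    return temperature_c >= 25.0  # hand-chosen rule (hypothetical)


# Machine learning program: the rule (here, a threshold) is derived from
# example data paired with the expected answers.
def learn_threshold(values, answers):
    """Return the candidate threshold that makes the fewest mistakes."""
    def mistakes(threshold):
        return sum((v >= threshold) != a for v, a in zip(values, answers))

    # Candidate thresholds are the observed values themselves.
    return min(sorted(values), key=mistakes)


temperatures = [12.0, 18.0, 21.0, 26.0, 30.0, 34.0]  # data
is_hot = [False, False, False, True, True, True]     # answers
learned = learn_threshold(temperatures, is_hot)      # rule


def learned_is_hot(temperature_c):
    return temperature_c >= learned
```

The learned rule depends entirely on the examples: with different data or different answers, `learn_threshold` would return a different threshold, whereas `classical_is_hot` never changes.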

slide-6
SLIDE 6

6

What about AI?

slide-7
SLIDE 7

7

Artificial Intelligence

ML is a subfield of Artificial Intelligence

– A system that lives in a changing environment must have the ability to learn in order to adapt.
– ML algorithms are building blocks that make computers behave more intelligently by generalizing rather than merely storing and retrieving data (like a database system would do).

slide-8
SLIDE 8

8

Learning objectives

  • Define machine learning
  • Given a problem

– Decide whether it can be solved with machine learning
– Decide as what type of machine learning problem you can formalize it (unsupervised: clustering, dimension reduction; supervised: classification, regression?)
– Describe it formally in terms of design matrix, features, samples, and possibly target.

  • Define a loss function (supervised setting)
  • Define generalization.
slide-9
SLIDE 9

9

What is machine learning?

  • Learning general models from particular examples (data)

– Data is (mostly) cheap and abundant;
– Knowledge is expensive and scarce.

  • Example in retail:

From customer transactions to consumer behavior
People who bought “Game of Thrones” also bought “Lord of the Rings” [amazon.com]

  • Goal: Build a model that is a good and useful approximation to the data.

slide-10
SLIDE 10

10

What is machine learning?

  • Optimizing a performance criterion using example data or past experience.

  • Role of Statistics:

Build mathematical models to make inference from a sample.

  • Role of Computer Science: Efficient algorithms to

– Solve the optimization problem;
– Represent and evaluate the model for inference.

slide-11
SLIDE 11

11

Zoo of ML Problems

slide-12
SLIDE 12

12

Unsupervised learning

Data (images, text, measurements, omics data...) → ML algo → Data!

Learn a new representation of the data

X: data matrix with n rows (samples) and p columns (features)

slide-13
SLIDE 13

13

Dimensionality reduction

Data X (n × p: images, text, measurements, omics data...) → ML algo → Data X (n × m)

Find a lower-dimensional representation

slide-14
SLIDE 14

14

Dimensionality reduction

Data → ML algo → Data

Find a lower-dimensional representation

– Reduce storage space & computational time
– Remove redundancies
– Visualization (in 2 or 3 dimensions) and interpretability.
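
One classical way to find a lower-dimensional representation is principal component analysis, which the course covers in a later lab. The sketch below is a simplified, standard-library-only version that keeps a single component (m = 1) and finds it by power iteration. The data and all function names are invented for illustration; in practice a library routine would be used.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(rows):
    """p x p covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(p)]
            for i in range(p)]


def leading_direction(cov, iterations=100):
    """Approximate the top eigenvector of cov by power iteration."""
    p = len(cov)
    w = [1.0] + [0.0] * (p - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


# Hypothetical data: n = 5 samples, p = 2 strongly correlated features.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]
Xc = center(X)
w = leading_direction(covariance(Xc))

# Reduced representation: n = 5 samples, m = 1 feature.
Z = [sum(xj * wj for xj, wj in zip(row, w)) for row in Xc]
```

Because the two features carry nearly the same information, the single coordinate in `Z` preserves almost all of the variation in `X`, which is the "remove redundancies" benefit listed above.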

slide-15
SLIDE 15

15

Clustering

Data → ML algo

Group similar data points together

– Understand general characteristics of the data;
– Infer some properties of an object based on how it relates to other objects.

slide-16
SLIDE 16

16

Clustering: applications

– Customer segmentation

Find groups of customers with similar buying behaviors.

– Topic modeling

Group documents based on the words they contain to identify common topics.

– Image compression

Find groups of similar pixels that can be easily summarized.

– Disease subtyping (cancer, mental health)

Find groups of patients with similar pathologies (at the molecular or symptom level).
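
To make "group similar data points together" concrete, here is a minimal sketch of k-means, one common clustering algorithm (chosen here for illustration; the slides do not name a specific method at this point). The points and the initial centroids are hypothetical.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical customers described by two numeric features.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, [points[0], points[3]])
```

No labels are given to the algorithm: the two groups emerge only from the distances between points, which is what makes this an unsupervised method.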

slide-17
SLIDE 17

17

Supervised learning

Data X (n × p) + Labels y (length n) → ML algo → Predictor (decision function)

Make predictions

slide-20
SLIDE 20

20

Classification

Make discrete predictions

– Binary classification
– Multi-class classification

Data + Labels → ML algo → Predictor

slide-21
SLIDE 21

21

Classification

Figure: cats and dogs described by two features, “human contact” and “good eater”.

slide-22
SLIDE 22

22

Training set D

Figure: the training set D, cats and dogs described by the features “human contact” and “good eater”.

slide-23
SLIDE 23

23

Classification: Applications

– Face recognition

Identify faces independently of pose, lighting, occlusion (glasses, beard), make-up, hair style.

– Vehicle identification (self-driving cars)
– Character recognition

Read letters or digits independently of different handwriting styles.

– Sound recognition

Which language is spoken? Who wrote this music? What type of bird is this?

– Spam detection
– Precision medicine

Does this sample come from a sick or healthy person? Will this drug work on this patient?

slide-24
SLIDE 24

24

Regression

Make continuous predictions

Data + Labels → ML algo → Predictor

slide-25
SLIDE 25

25

Regression

Figure: train occupancy as a function of time of day.
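
A regression predictor for this kind of problem can be sketched with a least-squares line. The numbers below are invented, and the straight-line model is an assumption made for brevity: real occupancy over a full day is unlikely to be linear.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical morning observations: hour of day -> passengers on board.
hours = [6.0, 7.0, 8.0, 9.0]
passengers = [20.0, 45.0, 70.0, 95.0]
slope, intercept = fit_line(hours, passengers)


def predict(hour):
    """Continuous prediction for an hour not seen in the data."""
    return slope * hour + intercept
```

Unlike a classifier, `predict` returns a real number, so it can answer for any time of day, including times between the observed ones.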

slide-27
SLIDE 27

27

Regression: Applications

– Click prediction

How many people will click on this ad? Comment on this post? Share this article on social media?

– Load prediction

How many users will my service have at a given time?

– Algorithmic trading

What will the price of this share be?

– Drug development

What is the binding affinity between this drug candidate and its target? What is the sensitivity of the tumor to this drug?

slide-28
SLIDE 28

28

Supervised learning setting

X (n × p): data matrix, design matrix
– n rows: observations, samples, data points
– p columns: features, variables, descriptors, attributes

y (length n): outcome, target, label

The label space depends on the task: binary classification, multi-class classification, or regression.
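
A common way to write this setting formally is shown below. The specific encodings ({0, 1} for binary labels, {1, ..., C} for C classes) are conventional choices assumed here; other encodings such as {-1, +1} are equally common.

```latex
\begin{aligned}
& X \in \mathbb{R}^{n \times p}, \qquad y = (y_1, \dots, y_n) \\
& \text{Binary classification:} && y_i \in \{0, 1\} \\
& \text{Multi-class classification:} && y_i \in \{1, \dots, C\} \\
& \text{Regression:} && y_i \in \mathbb{R}
\end{aligned}
```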

slide-31
SLIDE 31

31

Hypothesis class

  • Hypothesis class

– The space of possible decision functions we are considering
– Chosen based on our beliefs about the problem

Figure: family cars and non-family cars plotted by x1 (price) and x2 (engine power).

What shape do you think the discriminant should take?

slide-32
SLIDE 32

32

Hypothesis class

  • Hypothesis class

– Belief: the decision function is a rectangle

Figure: family cars and non-family cars plotted by x1 (price) and x2 (engine power); the rectangle spans p1 to p2 in price and e1 to e2 in engine power.
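
Under the rectangle belief, each hypothesis is fully described by the four numbers p1, p2, e1, e2, and choosing a hypothesis means choosing those four values. A minimal sketch follows; the function name and the numeric bounds are hypothetical.

```python
def make_rectangle_classifier(p1, p2, e1, e2):
    """Return a decision function that is true inside the rectangle."""
    def is_family_car(price, engine_power):
        return p1 <= price <= p2 and e1 <= engine_power <= e2
    return is_family_car


# One hypothesis from the class, with made-up bounds
# (price in euros, engine power in horsepower).
is_family_car = make_rectangle_classifier(p1=15000, p2=30000, e1=70, e2=150)

is_family_car(20000, 100)   # inside the rectangle
is_family_car(50000, 100)   # too expensive
is_family_car(20000, 300)   # too powerful
```

Every call to `make_rectangle_classifier` with different bounds yields a different member of the same hypothesis class; learning then amounts to searching for the bounds that best fit the training set.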

slide-33
SLIDE 33

33

Loss function

  • Loss function (or cost function, or risk):

Quantifies how far the decision function is from the truth (= oracle).

  • E.g.

?

slide-35
SLIDE 35

35

Loss function

  • Loss function (or cost function, or risk):

Quantifies how far the decision function is from the truth (= oracle).

  • Empirical risk on dataset D
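
As an illustration, two widely used loss functions and the empirical risk (the average loss over a dataset D) can be written as follows. The losses shown are common textbook choices assumed here, and the toy predictor and dataset are hypothetical.

```python
def zero_one_loss(prediction, truth):
    """Classification: 1 for a mistake, 0 for a correct prediction."""
    return 0.0 if prediction == truth else 1.0


def squared_loss(prediction, truth):
    """Regression: squared difference between prediction and truth."""
    return (prediction - truth) ** 2


def empirical_risk(f, dataset, loss):
    """Average loss of decision function f over (x, y) pairs in dataset."""
    return sum(loss(f(x), y) for x, y in dataset) / len(dataset)


# Hypothetical dataset D of (feature, label) pairs.
D = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (2.6, 0)]


def threshold_rule(x):
    """A toy decision function: predict 1 when x is at least 2.5."""
    return 1 if x >= 2.5 else 0


risk = empirical_risk(threshold_rule, D, zero_one_loss)
```

Here the rule misclassifies only the point (2.6, 0), so one of the five examples contributes a loss of 1 and the rest contribute 0.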
slide-36
SLIDE 36

36

Supervised learning: 3 ingredients

A good and useful approximation

  • Choose a hypothesis class

– Parametric methods, e.g.
– Non-parametric methods, e.g. f(x) is the label of the point closest to x.

  • Choose a loss function L

Empirical error:

  • Choose an optimization procedure
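
The non-parametric example, where f(x) is the label of the point closest to x, is the 1-nearest-neighbor rule. A minimal sketch, assuming squared Euclidean distance and a small invented training set:

```python
def nearest_neighbor_predictor(training_set):
    """Build f(x) = label of the training point closest to x."""
    def f(x):
        _, label = min(
            training_set,
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )
        return label
    return f


# Hypothetical training set of (features, label) pairs.
train = [
    ((1.0, 1.0), "cat"),
    ((1.5, 0.5), "cat"),
    ((4.0, 4.5), "dog"),
    ((5.0, 4.0), "dog"),
]
f = nearest_neighbor_predictor(train)
```

The predictor has no fixed set of parameters to fit: it stores the training set itself, and its complexity grows with the number of stored examples, which is what distinguishes it from a parametric method.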

slide-37
SLIDE 37

37

Generalization

A good and useful approximation

  • It’s easy to build a model that performs well on the training data
  • But how well will it perform on new data?
  • “Predictions are hard, especially about the future” (Niels Bohr).

– Learn models that generalize well.
– Evaluate whether models generalize well.
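
The gap between training performance and performance on new data can be seen with a model that memorizes its training set. The sketch below uses a 1-nearest-neighbor rule on invented one-dimensional data in which one training label, (4.0, 1), is deliberately noisy; the held-out points are also hypothetical.

```python
def nearest_label(train, x):
    """Label of the training point closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def error_rate(train, data):
    """Fraction of points in data that the 1-NN rule gets wrong."""
    mistakes = sum(nearest_label(train, x) != y for x, y in data)
    return mistakes / len(data)


# Underlying pattern: label 1 when x >= 5, with one noisy example at x = 4.
train = [(1.0, 0), (2.0, 0), (4.0, 1), (6.0, 1), (8.0, 1), (9.0, 1)]
held_out = [(0.5, 0), (3.6, 0), (5.5, 1), (7.0, 1)]

train_error = error_rate(train, train)     # evaluated on data it has seen
test_error = error_rate(train, held_out)   # evaluated on new data
```

The model is perfect on its own training data because every training point is its own nearest neighbor, yet the noisy example misleads it on the nearby held-out point at 3.6. Evaluating on data not used for training is what reveals this.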

slide-38
SLIDE 38

38

Pattern recognition, data mining, knowledge discovery in databases, signal processing, optimization, artificial intelligence, induction, discriminant analysis, inference, data science, computer science, electrical engineering, statistics, engineering, business, big data

http://www.kdnuggets.com/2016/11/machine-learning-vs-statistics.html

slide-39
SLIDE 39

39

Learning objectives

After this course, you should be able to

– Identify problems that can be solved by machine learning;
– Formulate your problem in machine learning terms;
– Given such a problem, identify and apply the most appropriate classical algorithm(s);
– Implement some of these algorithms yourself;
– Evaluate and compare machine learning algorithms for a particular task.

slide-40
SLIDE 40

40

Course Syllabus

  • Sep 29
– 1. Introduction
– 2. Convex optimization
  • Oct 2
– 3. Dimensionality reduction
– Lab: Principal component analysis + Jupyter, pandas, and scikit-learn
  • Oct 6
– 4. Model selection
– Lab: Convex optimization with scipy.optimize
  • Oct 13
– 5. Bayesian decision theory
– Lab: Intro to Kaggle challenge
  • Oct 20
– 6. Linear regression
– Lab: Linear regression

slide-41
SLIDE 41

41

  • Nov 10
– 7. Regularized linear regression
– Lab: Regularized linear regression
  • Nov 17
– 8. Nearest-neighbor approaches
– Lab: Nearest-neighbor approaches
  • Nov 24
– 9. Tree-based approaches
– Lab: Tree-based approaches
  • Dec 01
– 10. Support vector machines
– Lab: Support vector machines
  • Dec 08
– 11. Neural networks
– Deep learning (Joseph Boyd) + Bioimage informatics applications (Peter Naylor)
  • Dec 15
– 12. Clustering
– Lab: Clustering

slide-42
SLIDE 42

42

Labs

Bring your laptop! There will be power plugs and wifi.
Instructions on how to set up your computer: https://github.com/chagaz/ma2823_2017 and in the syllabus

  • TAs:

– Joseph Boyd joseph.boyd@mines-paristech.fr
– Benoît Playe benoit.playe@mines-paristech.fr
– Mihir Sahasrabudhe mihir.sahasrabudhe@centralesupelec.fr

slide-43
SLIDE 43

43

challenge project

How Many Shares? Challenge

https://www.kaggle.com/c/how-many-shares

  • Predict the number of shares on social media for articles from the same media site

– Regression
– From article length, topics, subjectivity and much more.

  • Evaluation on

– Insights learned
– Prediction performance.

slide-44
SLIDE 44

44

Evaluation

  • Final exam (60 pts)

December 22, 2016

– Pen and paper
– Closed book

  • Kaggle project (30 pts)

December 22, 2016

– Written report (25 pts)
– Position in the leaderboard (5 pts)
– Introduction: October 13, 2017

  • Homework (10 pts), 1 assignment each week

– To get the points: turn it in!

slide-45
SLIDE 45

45

Homework

  • One assignment per week

– Similar to the questions you'll be asked at the exam
– Turn it in online: http://tinyurl.com/ma2823-2017-hw
– Solution will be posted the day after the due date
– Worth 1 pt if you turn it in.

slide-46
SLIDE 46

46

Resources

  • Course website

http://tinyurl.com/ma2823-2017

– Syllabus
– 2 days before the lecture: printable lecture handout
– Shortly after the lecture: HW Problem n+1, lecture slides, HW Solution n.

  • GitHub repository (labs)

https://github.com/chagaz/ma2823_2017

slide-47
SLIDE 47

47

Textbooks

  • A Course in Machine Learning

Hal Daumé III

http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf

  • The Elements of Statistical Learning

Trevor Hastie, Robert Tibshirani and Jerome Friedman

http://web.stanford.edu/~hastie/ElemStatLearn/

  • Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond

Bernhard Schölkopf and Alex Smola

http://agbs.kyb.tuebingen.mpg.de/lwk/

  • Convex Optimization

Stephen Boyd and Lieven Vandenberghe

https://web.stanford.edu/~boyd/cvxbook/

slide-48
SLIDE 48

48

Resources: Datasets

  • UCI Repository:

http://www.ics.uci.edu/~mlearn/MLRepository.html

  • KDnuggets Datasets:

http://www.kdnuggets.com/datasets/index.html

  • ImageNet: http://www.image-net.org/
  • Enron Email Dataset: http://www.cs.cmu.edu/~enron/
  • Million Song Dataset:

http://labrosa.ee.columbia.edu/millionsong/

  • IMDB Data: http://www.imdb.com/interfaces
  • French public data: https://www.data.gouv.fr/
  • TunedIT: http://www.tunedit.org/
  • Knoema: https://knoema.com/
slide-49
SLIDE 49

49

Resources: Journals

  • Journal of Machine Learning Research

http://jmlr.csail.mit.edu/

  • IEEE Transactions on Pattern Analysis and Machine Intelligence

https://www.computer.org/portal/web/tpami

  • Annals of Statistics http://imstat.org/aos/
  • Journal of the American Statistical Association

http://www.tandfonline.com/toc/uasa20/current

  • Machine Learning http://link.springer.com/journal/10994
  • Neural Computation http://www.mitpressjournals.org/loi/neco
  • Neural Networks

http://www.journals.elsevier.com/neural-networks

  • IEEE Transactions on Neural Networks and Learning Systems

http://cis.ieee.org/ieee-transactions-on-neural-networks-and-learning-systems.html

slide-50
SLIDE 50

50

Resources: Conferences

  • International Conference on Machine Learning (ICML)

http://www.icml.cc/

  • Neural Information Processing Systems (NIPS) http://www.nips.cc/
  • International Conference on Learning Representations (ICLR)

http://www.iclr.cc/

  • European Conference on Machine Learning (ECML)

http://www.ecmlpkdd.org/

  • International Conference on AI & Statistics (AISTATS)

http://www.aistats.org/

  • Uncertainty in Artificial Intelligence (UAI) http://www.auai.org/
  • Computational Learning Theory (COLT)

http://www.learningtheory.org/past-conferences-2/

  • Knowledge Discovery and Data Mining (KDD) http://www.kdd.org/
  • International Conference on Pattern Recognition (ICPR)

http://www.icpr2017.org/