Introduction to Machine Learning Introduction Prof. Andreas Krause - - PowerPoint PPT Presentation




SLIDE 1

Introduction to Machine Learning

Introduction

  • Prof. Andreas Krause

Institute for Machine Learning

(las.ethz.ch)

SLIDE 2

What is Machine Learning I: An example

Classify email messages as “spam” or “non-spam”

Classical approach: manual rules

IF text body contains “Please login here” THEN classify as “spam” ELSE “non-spam”

Machine Learning: Automatic discovery of rules from training data (examples)
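To make the contrast concrete, here is a minimal sketch (with made-up toy mails, not from the slides) of a rule being discovered from labeled training examples rather than written by hand: each word simply votes for the class in which it was seen more often.

```python
# Toy illustration: instead of hand-coding "IF body contains 'Please login
# here' THEN spam", count how often each word appears in spam vs. non-spam
# training mails and let the words of a new mail vote.
from collections import Counter

train = [
    ("please login here to verify your account", "spam"),
    ("win a free prize now", "spam"),
    ("meeting notes attached see you tomorrow", "non-spam"),
    ("lunch tomorrow at noon", "non-spam"),
]

counts = {"spam": Counter(), "non-spam": Counter()}
for text, label in train:
    counts[label].update(text.split())

def classify(text):
    # Each word votes for the class where it occurred more often in training.
    score = sum(counts["spam"][w] - counts["non-spam"][w] for w in text.split())
    return "spam" if score > 0 else "non-spam"

print(classify("please verify your prize"))      # words seen mostly in spam
print(classify("see you at lunch tomorrow"))     # words seen mostly in ham
```

Nothing here was written as an explicit rule; the "rule" is whatever the word counts imply, which is the essential shift from the classical approach.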

SLIDE 3

What is ML II: One Definition [Tom Mitchell]

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

SLIDE 4

Our Digital Society and the Information Technology value chain


Machine Learning plays a core role in this value chain:

Data → Information → Knowledge → Value

[Figure: Activation of the mTOR Signaling Pathway in Renal Clear Cell Carcinoma. Robb et al., J Urology 177:346 (2007)]

SLIDE 5

Related disciplines


neuroinformatics, algorithms & optimization, information theory, statistics, philosophy, epistemology, causality

machine learning

SLIDE 6

Overview

Introductory course; preparation for M.Sc.-level ML courses

Two main topics:

Supervised learning
Unsupervised learning

Algorithms, models & applications

Handouts etc. on the course webpage:

https://las.ethz.ch/teaching/introml-s20
Old slides available at …/introml-s19
The password can be retrieved from within the ETH network

Textbooks listed on the course webpage (some available online)

SLIDE 7

Prerequisites

Basic knowledge of linear algebra, calculus, and probability

If you need a refresher: Part I of “Mathematics for Machine Learning” by Deisenroth, Faisal, Ong, available online at https://mml-book.com/

Basic programming (in Python)

Links to tutorials on website

If you plan not to complete the course, please deregister!

SLIDE 8

Syllabus

Linear regression
Linear classification
Kernels and the kernel trick
Neural networks & deep learning
Unsupervised learning
The statistical perspective
Statistical decision theory
Discriminative vs. generative modeling
Bayes' classifiers
Bayesian approaches to unsupervised learning
Generative modeling with neural networks

SLIDE 9

After participating in this course you will

Understand basic machine learning ideas & concepts
Be able to apply basic machine learning algorithms
Know how to validate the output of a learning method
Have some experience using machine learning on real data
Know what role machine learning plays in decision making under uncertainty

SLIDE 10

Relation to other ML Courses @ ETHZ

Advanced Machine Learning (Fall)

Continuation and advanced topics

Deep Learning (Fall)

Deep neural networks and their applications

Probabilistic Artificial Intelligence (Fall)

Reasoning and decision making under uncertainty

Computational Intelligence Lab (Spring)

Matrix Factorization, Recommender Systems, projects

Statistical Learning Theory (Spring)

Theoretical foundations; model validation

Guarantees for Machine Learning (Spring)

Computational Statistics (D-MATH, Spring)

SLIDE 11

People

Instructor: Andreas Krause (krausea@ethz.ch)

Teaching assistants:

Head TA: Philippe Wenk (wenkph@ethz.ch)

Andisheh Amrollahi, Nemanja Bartolovic, Ilija Bogunovic, Zalán Borsos, Charlotte Bunne, Sebastian Curi, Radek Danecek, Gideon Dresdner, Joanna Ficek, Vincent Fortuin, Carl Johann Simon Gabriel, Shubhangi Gosh, Nezihe Merve Gürel, Matthias Hüser, Jakob Jakob, Mikhail Karasikov, Kjong Lehmann, Julian Mäder, Mojmír Mutný, Harun Mustafa, Anastasia Makarova, Gabriela Malenova, Mohammad Reza Karimi, Max Paulus, Laurie Prelot, Jonas Rothfuss, Stefan Stark, Jingwei Tang, Xianyao Zhang

SLIDE 12

Video-recording

Lectures are video-recorded and will be available at https://video.ethz.ch/lectures/d-infk.html

Videos, slides etc. from last year are still available at https://video.ethz.ch/lectures/d-infk/2019/spring/252-0220-00L.html

SLIDE 13

Waitlist situation

We are currently trying to create extra capacity and allow more students to register for the course.

If you are on the waitlist, please keep following the course – there will be more information next week.

SLIDE 14

Exercises

Take them seriously if you want to pass the exam…
Published and partially corrected in Moodle
More involved solutions on the website
This week: optional refresher on basic linear algebra, calculus, and probability

SLIDE 15

Online tutorials

Every Wednesday, 15:00–18:00
1–2 hours of presentation, 1–2 hours of open Q&A
Participate actively via the Q&A feature
The presentation will be recorded
Public viewing at CAB G61 (no TAs present; limited capacity)

SLIDE 16

Zoom client: https://ethz.zoom.us/j/869018193

SLIDE 17

Meeting ID: 869-018-193

SLIDE 18

Use your real ethz.ch email address when registering


SLIDE 21

Questions

Main resource: Piazza, https://www.piazza.com/ethz.ch/spring2020/252022000l/home
During tutorials via the Q&A feature (live) – limited capacity
Office hours: Fridays, ML D28, 13:00–15:00 – very limited capacity

SLIDE 22

Course Project

In the course project, you will apply basic learning methods to make predictions on real data

Submit predictions on test data

To do now:

Team up in groups of (up to) three students
We will send instructions on how to register by the end of the week

More details to follow in the tutorials

Contributes 30% of the final grade
The project must be passed on its own and has a bonus/penalty function

SLIDE 23

Project server: https://project.las.ethz.ch

SLIDE 24

Some FAQs

Distance exams are possible (as an exception), but must be officially requested with the study administration.

Doctoral students for whom a “Testat” or 2 ECTS credits suffice can take the unit “Introduction to Machine Learning (only project)”.

Repeating the exam requires repeating the project.

We will maintain an FAQ list on the webpage.

SLIDE 25

Introduction to Machine Learning

A brief tour of supervised and unsupervised learning

  • Prof. Andreas Krause

Institute for Machine Learning (las.ethz.ch)

SLIDE 26

Machine Learning Tasks

Supervised Learning

Classification
Regression
Structured prediction, …

Unsupervised Learning

Clustering
Dimension reduction
Anomaly detection, …

Many other specialized tasks

SLIDE 27

Supervised Learning


Goal: learn a function f : X → Y

SLIDE 28

Example: E-Mail Classification

X: e-mail messages
Y: label “spam” or “non-spam”

SLIDE 29

Example: Improving Hearing Aids

[Buhmann et al]

X: acoustic waveforms
Y: label: speech, speech in noise, music, or noise

SLIDE 30

Example: Improving Hearing Aids

SLIDE 31

Example: Image Classification


Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks ‘12

[Figure: example images (X) with predicted class labels (Y)]

SLIDE 32

Regression

Goal: predict real-valued labels (possibly vectors)

Examples:


X → Y
Flight route → Delay (minutes)
Real estate objects → Price
Patient & drug → Treatment effectiveness
… → …
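As a minimal regression sketch (invented numbers, assuming a single numeric feature), a line can be fit to training pairs by least squares and then used to predict the real-valued label of a new input:

```python
# Hypothetical illustration: fit y ≈ w*x + b by least squares, e.g.
# predicting a delay (minutes) from one numeric feature of a flight.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up feature values
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])   # made-up real-valued labels

# Design matrix with a constant column for the intercept b.
A = np.stack([x, np.ones_like(x)], axis=1)
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

print(round(w, 2), round(b, 2))   # slope close to 2, intercept close to 0
print(w * 6.0 + b)                # prediction for a new, unseen input x = 6
```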

SLIDE 33

Example: Recommender systems


X: user & article / product features
Y: ranking of articles / products to display

SLIDE 34

Example: Image captioning


Vinyals et al. Show and Tell: A Neural Image Caption Generator ‘14

[Figure: input images (X) and generated captions (Y)]

SLIDE 35

Example: Translation

[Figure: source-language text (X) and its translation (Y)]

SLIDE 36

Example: Predicting program properties

[Raychev, Vechev, Krause POPL ’15] jsnice.org

[Figure: program code (X) and predicted program properties (Y)]

SLIDE 37

Example: Computational Pathology

[Buhmann, Fuchs et al.]


[Figure: inputs (X) include human tissue (TMA), proteomics, transcriptomics, and metabolomics data; Y are the predicted labels]

SLIDE 38

Basic Supervised Learning Pipeline


[Figure: supervised learning pipeline]

Training data (“spam”, “ham”, “spam”) → Learning method → Model (classifier, …), f : X → Y → Predictions (? ? ?) on test data

Representation → Model fitting → Prediction and generalization

SLIDE 39

Representing Data

Learning methods expect a standardized representation of data (e.g., points in vector spaces, nodes in a graph, similarity matrices, …)

The concrete choice of representation (“features”) is crucial for successful learning

This class (typically): feature vectors in R^d

Examples: [.3 .01 .1 2.3 0 0 1.1 …]; “The quick brown fox jumps over the lazy dog …” → [0 1 0 0 0 3 2 0 1 0 0 0]

SLIDE 40

Example: Bag-of-words

Suppose the language contains at most d = 100,000 words

Represent each document as a vector x in R^d: the i-th component x_i counts the occurrences of the i-th word

Word → Index
a → 1
abandon → 2
ability → 3
…
is → 578
…
test → 2512
…
this → 2809
…
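A minimal sketch of this representation (tiny made-up corpus; a real vocabulary would be far larger):

```python
# Minimal bag-of-words: map each document to a count vector in R^d,
# where component i counts occurrences of the i-th vocabulary word.

docs = ["the quick brown fox jumps over the lazy dog",
        "the lazy dog sleeps"]

# Build the vocabulary (word -> index) from the corpus itself.
vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.split()}))}

def bag_of_words(doc):
    vec = [0] * len(vocab)
    for w in doc.split():
        vec[vocab[w]] += 1
    return vec

v = bag_of_words(docs[0])
print(v[vocab["the"]])   # "the" occurs twice in the first document
```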

SLIDE 41

Bag-of-words: Improvements

The length of the document should not matter
Replace counts by a binary indicator (yes/no)
Normalize to unit length

Some words are more “important” than others
Remove “stopwords” (the, a, is, …)
Stemming (learning, learner, learns → learn)
Discount frequent words (tf-idf)

Bag-of-words ignores word order
Consider pairs (n-grams) of consecutive words

It does not differentiate between similar and dissimilar words (ignores semantics)
Word embeddings (e.g., word2vec, GloVe)
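The tf-idf discounting mentioned above can be sketched as follows (toy pre-tokenized documents; one common variant of the weighting, log(n / df)):

```python
# Sketch of tf-idf: words that occur in many documents get a small idf
# weight, so frequent-but-uninformative words count less than rare ones.
import math

docs = [["machine", "learning", "is", "fun"],
        ["learning", "is", "hard"],
        ["machine", "translation"]]
n = len(docs)

def idf(word):
    df = sum(word in d for d in docs)   # document frequency
    return math.log(n / df)

def tfidf(word, doc):
    tf = doc.count(word)                # raw term frequency
    return tf * idf(word)

# "is" appears in 2 of 3 documents, "translation" in only 1:
print(idf("is") < idf("translation"))   # the rarer word gets a larger weight
```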

SLIDE 42

Basic Supervised Learning Pipeline

[Figure: supervised learning pipeline]

Training data (“spam”, “ham”, “spam”) → Learning method → Model (classifier, …), f : X → Y → Predictions (? ? ?) on test data

Representation → Model fitting → Prediction and generalization

SLIDE 43

Example: Classifying Documents

Input: training examples (e.g., “bag-of-words” vectors) with positive (+) and negative (–) labels

Goal: a decision rule (aka hypothesis; e.g., linear, decision tree, random forest, deep neural network, …)

[Figure: labeled points (+/–) in feature space, a decision rule separating “Spam” from “Non-spam”, and an unlabeled query point (?)]

SLIDE 44

Basic Supervised Learning Pipeline

[Figure: supervised learning pipeline]

Training data (“spam”, “ham”, “spam”) → Learning method → Model (classifier, …), f : X → Y → Predictions (? ? ?) on test data

Representation → Model fitting → Prediction and generalization

SLIDE 45

Model selection and validation

Automatic model selection and validation are of crucial importance (→ statistical learning theory)

Goal: balance “goodness of fit” and complexity

Ideal models are simultaneously statistically and computationally efficient

[Figure: three fits of the same labeled (+/–) data: underfitting (too simple), overfitting (too complex), and a good fit]
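The fit-versus-complexity trade-off can be illustrated with polynomials of increasing degree (an assumed toy setup, not from the slides): training error alone keeps improving with complexity, so a held-out validation set is needed to detect over- and underfitting.

```python
# Fit polynomials of increasing degree to noisy samples of a smooth
# function and compare training error with error on held-out points.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
x_tr, y_tr = x[::2], y[::2]          # training half
x_va, y_va = x[1::2], y[1::2]        # validation half

def errors(degree):
    coef = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coef, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_va, y_va)

for d in (1, 3, 9):
    tr, va = errors(d)
    print(d, round(tr, 4), round(va, 4))
# Training error shrinks as the degree grows, but a good validation
# error requires balance: degree 1 underfits the sine, while a very
# high degree can chase the noise in the training points.
```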

SLIDE 46

Machine Learning Tasks

Supervised Learning

Classification
Regression
Structured prediction, …

Unsupervised Learning

Clustering
Dimension reduction
Anomaly detection, …

Many other specialized tasks

SLIDE 47

Basic Unsupervised Learning Pipeline

[Figure: unsupervised learning pipeline]

Training data → Learning method → Model → Predictions (? ? ?) on test data

Representation → Model fitting → Prediction

SLIDE 48

Unsupervised learning

“Learning without labels”

Examples:

Clustering (e.g., unsupervised classification)
Dimension reduction (e.g., unsupervised regression)
Generative modeling (topic models, autoencoders, GANs, etc.)

Common goals:

Compact representation / compression of data sets
Identification of latent variables

Use cases:

Exploratory data analysis
Feature learning / embedding
Anomaly detection of “unusual” data points

SLIDE 49

Example: Clustering

Input: data set without labels
Goal: assignment to clusters (infer labels)
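A minimal sketch of one clustering algorithm, k-means (assumed example with k = 2, made-up 2-D points, and Lloyd-style iterations):

```python
# k-means sketch: alternate between assigning points to the nearest
# center and moving each center to the mean of its assigned points.
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],     # group near the origin
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.1]])    # group near (2, 2)
centers = X[[0, 3]].astype(float)                     # crude initialization

for _ in range(10):
    # Assignment step: index of the nearest center for every point.
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    # Update step: each center becomes the mean of its cluster.
    centers = np.stack([X[labels == k].mean(axis=0) for k in (0, 1)])

print(labels)   # the first three points share one label, the rest the other
```

No labels were given anywhere; the cluster assignments are inferred purely from the geometry of the data.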

SLIDE 50

Example: Dimension Reduction

[Roweis & Saul, Nonlinear dimensionality reduction by locally linear embedding, Science ‘00]

SLIDE 51

Example: Dimension reduction

Often, high-dimensional data can be well approximated in low dimensions – very useful for visualization!

Many methods available, e.g.:

Linear (principal component analysis, linear discriminant analysis, …)
Non-linear (ISOMAP, kernel PCA, maximum variance unfolding, t-SNE, autoencoders based on neural networks, …)
Sparse modeling / inference

[Figure: “Eigenfaces” – AT&T Labs Cambridge]
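A sketch of the linear case, PCA, computed via the SVD on made-up, nearly one-dimensional data:

```python
# PCA sketch: project centered data onto the top principal component
# (the direction of maximal variance), obtained here from the SVD.
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal(100)
# Made-up 2-D data that is essentially 1-D: points near the line y = 2x.
X = np.stack([t, 2 * t + 0.05 * rng.standard_normal(100)], axis=1)

Xc = X - X.mean(axis=0)                  # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                              # first principal direction
Z = Xc @ pc1                             # 1-D representation of each point

var_explained = S[0] ** 2 / (S ** 2).sum()
print(round(float(var_explained), 3))    # close to 1: one dimension suffices
```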

SLIDE 52

Example: Anomaly detection

Applications: quality control, fraud detection, …
Fit a statistical model of “normal” data
Declare “unusual” (low-probability) data as anomalies

[Figure: density of “normal” data with an anomaly threshold; low-probability points are flagged as anomalies]
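The recipe above can be sketched with an assumed one-dimensional Gaussian model of the “normal” data (made-up measurements; the 10%-of-peak threshold is a hypothetical choice):

```python
# Fit mean and standard deviation to "normal" data, then flag points
# whose density under the fitted Gaussian falls below a threshold.
import math

normal_data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu = sum(normal_data) / len(normal_data)
sigma = (sum((x - mu) ** 2 for x in normal_data) / len(normal_data)) ** 0.5

def density(x):
    # Gaussian probability density with the fitted parameters.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

threshold = 0.1 * density(mu)      # hypothetical anomaly threshold

def is_anomaly(x):
    return density(x) < threshold

print(is_anomaly(10.1), is_anomaly(25.0))   # a typical point vs. an outlier
```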

SLIDE 53

Example: Network inference

[Gomez Rodriguez, Leskovec, Krause ACM TKDE 2012]


Estimate the flow of information and influence in the “blogosphere” (the ecosystem of blogs and social media)

SLIDE 54

Example: Never Ending Language Learning

[Mitchell et al.]

(Mostly) unsupervised acquisition of facts by “reading” the internet


[rtw.ml.cmu.edu]

SLIDE 55

Example: GANs

[Goodfellow et al’14, Salimans et al’16]

SLIDE 56

BigGAN

[Brock, Donahue, Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis ICLR ‘19]

SLIDE 57

Machine Learning Tasks

Supervised Learning

Classification
Regression
Structured prediction, …

Unsupervised Learning

Clustering
Dimension reduction
Anomaly detection, …

Many other specialized tasks

SLIDE 58

Other models of learning

Semi-supervised learning

Learning from both labeled and unlabeled data

Transfer & meta learning

Learn on one domain and test on another

Active learning

Acquiring most informative data for learning

Online / lifelong / continual learning

Learning from examples as they arrive over time

Reinforcement learning

Learning by interacting with an unknown environment

...

SLIDE 59

Summary so far

Two basic forms of learning:

Supervised vs. Unsupervised learning

Key challenge in ML

Trading off goodness of fit against model complexity

Representation of data is of key importance
