

SLIDE 1

Introduction to Large-Scale ML

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning

Shan-Hung Wu (CS, NTHU) Introduction Machine Learning 1 / 20

SLIDE 2

Outline

1. What’s Machine Learning?
2. About this Course...
3. FAQ

SLIDE 4

Prior vs. Posteriori Knowledge

To solve a problem, we need an algorithm
- E.g., sorting, where a priori knowledge is enough

For some problems, however, we do not have the a priori knowledge
- E.g., telling whether an email is spam or not; the correct answer varies over time and from person to person

Machine learning algorithms use a posteriori knowledge to solve such problems
- Learned from examples (as extra input)

SLIDE 5

Example Data X as Extra Input

Unsupervised: $X = \{x^{(i)}\}_{i=1}^N$, where $x^{(i)} \in \mathbb{R}^D$
- E.g., $x^{(i)}$ an email

Supervised: $X = \{(x^{(i)}, y^{(i)})\}_{i=1}^N$, where $x^{(i)} \in \mathbb{R}^D$ and $y^{(i)} \in \mathbb{R}^K$
- E.g., $y^{(i)} \in \{0, 1\}$ a spam label
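In code, these datasets are just arrays. A minimal NumPy sketch of the two settings (the sizes $N = 5$, $D = 3$ and the label values are made up for illustration):

```python
import numpy as np

N, D = 5, 3  # N examples, each a D-dimensional feature vector (made-up sizes)

# Unsupervised: only the inputs, stacked into an N x D matrix X
X = np.arange(N * D, dtype=float).reshape(N, D)

# Supervised: each x^(i) additionally carries a label y^(i), e.g., spam = 1 / not spam = 0
y = np.array([0, 1, 0, 0, 1])

print(X.shape, y.shape)  # (5, 3) (5,)
```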

SLIDE 6

General Types of Learning (1/2)

Supervised learning: learn to predict the labels of future data points
- Training inputs $X \in \mathbb{R}^{N \times D}$ with labels $Y \in \mathbb{R}^{N \times K}$, e.g., the one-hot labels $[e^{(6)}, e^{(1)}, e^{(9)}, e^{(4)}, e^{(2)}]$
- For a new point $x'$, the label $y' \in \mathbb{R}^K$ is unknown

Unsupervised learning: learn patterns or latent factors in $X$

SLIDE 7

General Types of Learning (2/2)

Reinforcement learning: learn from “good”/“bad” feedback of actions (instead of correct labels) to maximize the goal

AlphaGo [1] is a hybrid of reinforcement learning and supervised learning
- The latter is used to tell how good a “move” performed by an agent is
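To make the feedback-driven loop concrete, here is a sketch of the simplest reinforcement-learning setting, a multi-armed bandit with an epsilon-greedy agent. The reward probabilities and $\varepsilon = 0.1$ are invented for illustration and are not from the slides; the point is that the agent improves from reward feedback alone, never seeing a correct label:

```python
import random

random.seed(0)
true_reward = [0.2, 0.8, 0.5]   # hidden payoff probability of each action (made up)
estimate = [0.0, 0.0, 0.0]      # agent's running estimate of each action's value
count = [0, 0, 0]
epsilon = 0.1                   # probability of exploring a random action

for step in range(2000):
    if random.random() < epsilon:
        a = random.randrange(3)                       # explore
    else:
        a = max(range(3), key=lambda i: estimate[i])  # exploit the best-looking action
    r = 1.0 if random.random() < true_reward[a] else 0.0  # "good"/"bad" feedback
    count[a] += 1
    estimate[a] += (r - estimate[a]) / count[a]       # incremental average update

best = max(range(3), key=lambda i: estimate[i])
print(best)  # the agent should discover action 1, the highest-payoff arm
```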

SLIDE 8

General Machine Learning Steps

1. Data collection, preprocessing (e.g., integration, cleaning, etc.), and exploration
   - Split the dataset into training and testing datasets
2. Model development
   - Assume a model $\{f\}$, a collection of candidate functions $f$ (representing posteriori knowledge) we want to discover; $f$ may be parametrized by $\mathbf{w}$
   - Define a cost function $C(\mathbf{w})$ (or functional $C[f]$) that measures how well a particular $f$ explains the training data
3. Training: employ an algorithm that finds the best (or good enough) function $f^*$ in the model that minimizes the cost function over the training dataset
4. Testing: evaluate the performance of the learned $f^*$ using the testing dataset
5. Apply the model to the real world
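The five steps can be sketched end to end in Python. This toy version invents its own 1-D data (label 1 iff $x > 3$), uses a simple threshold model $f(x; w) = \mathbb{1}(x > w)$, and trains by grid search; none of these specific choices come from the slides:

```python
import random

random.seed(1)

# Step 1: data collection (synthetic here), then a random train/test split
data = [(x, 1 if x > 3.0 else 0) for x in (random.uniform(0, 6) for _ in range(100))]
random.shuffle(data)
train, test = data[:80], data[80:]

# Step 2: model {f : f(x; w) = 1 if x > w else 0} and a 0/1 cost over a dataset
def f(x, w):
    return 1 if x > w else 0

def cost(w, dataset):
    return sum(1 for x, y in dataset if f(x, w) != y)

# Step 3: training -- pick the w* minimizing the cost over the training set
w_star = min((i * 0.1 for i in range(61)), key=lambda w: cost(w, train))

# Step 4: testing -- accuracy of the learned f* on the held-out data
accuracy = sum(1 for x, y in test if f(x, w_star) == y) / len(test)

# Step 5: apply the model to a new point from the "real world"
print(f(5.0, w_star), round(accuracy, 2))
```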

SLIDE 9

Example for Spam Detection

1. Randomly split your past emails and labels
   - Training dataset: $X = \{(x^{(i)}, y^{(i)})\}_i$
   - Testing dataset: $X' = \{(x'^{(i)}, y'^{(i)})\}_i$
2. Model development
   - Model: $\{f : f(x; \mathbf{w}) = \mathbf{w}^\top x\}$
   - Cost function: $C(\mathbf{w}) = \sum_i \mathbb{1}(f(x^{(i)}; \mathbf{w}) \neq y^{(i)})$
3. Training: solve $\mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_i \mathbb{1}(f(x^{(i)}; \mathbf{w}) \neq y^{(i)})$
4. Testing: accuracy $\frac{1}{|X'|} \sum_i \mathbb{1}(f(x'^{(i)}; \mathbf{w}^*) = y'^{(i)})$
5. Use $f^*$ to predict the labels of your future emails

See Notation.
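The 0/1 cost above is piecewise constant, so it cannot be minimized by gradient methods directly; a classic workaround (my choice here, not prescribed by the slides) is the perceptron rule, which adjusts the same linear model $\mathbf{w}^\top x$ whenever it misclassifies a training example. The two "email features" and the added bias term $b$ below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: x = (suspicious-word count, exclamation count); label 1 = spam
X_train = rng.normal(0, 1, size=(40, 2)) + np.array([[2.0, 2.0]] * 20 + [[-2.0, -2.0]] * 20)
y_train = np.array([1] * 20 + [0] * 20)

w = np.zeros(2)
b = 0.0
for _ in range(20):                       # perceptron training epochs
    for x, y in zip(X_train, y_train):
        pred = 1 if x @ w + b > 0 else 0
        if pred != y:                     # on a mistake, nudge w toward the example
            w += (y - pred) * x
            b += (y - pred)

# Testing: accuracy = (1/|X'|) * sum of 1(f(x') == y') on held-out data
X_test = rng.normal(0, 1, size=(10, 2)) + np.array([[2.0, 2.0]] * 5 + [[-2.0, -2.0]] * 5)
y_test = np.array([1] * 5 + [0] * 5)
pred_test = (X_test @ w + b > 0).astype(int)
accuracy = (pred_test == y_test).mean()
print(accuracy)
```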


SLIDE 11

Target Audience

Senior undergraduate and junior graduate students
- Easy-to-moderate level of theory
- Coding and engineering (in Python)
- Clean datasets (small & large)

For those who are already familiar with ML:
- Get advanced materials from other courses
- Or join us starting from week 10

SLIDE 12

Topics Covered

Supervised and unsupervised learning only, with structured outputs

SLIDE 13

Syllabus

Part 1: math review (4 weeks)
- Linear algebra
- Probability & information theory
- Numerical optimization

Part 2: machine learning basics (4 weeks)
- Learning theory
- Parametric/non-parametric models
- Experiment design

Part 3: large-scale & deep learning (6 weeks)
- Approximate inference
- Neural networks (NNs), CNNs, RNNs

SLIDE 14

Grading (Tentative)

Contests: 40%
- 1st at the end of Part 2; 2nd at the end of the semester

Midterm: 20%
- During Part 3

Assignments: 40%
- Once every 1-3 weeks

SLIDE 15

Classes Info

Lectures
- Tue 10:10am-12:00pm
- Concepts & theories

Labs
- Thu 9:00am-11:00am
- Implementation (in Python) & engineering topics

More info can be found on the course website


SLIDE 17

FAQ (1/2)

Q: Should we team up for the contests?
A: Yes, 3 students per team

Q: Do we need to attend the classes?
A: No, as long as you can pass. But there is an attendance bonus...

Q: Is this a light-loading course or a heavy-loading one?
A: It should be heavy for most students. Please reserve your time

SLIDE 18

FAQ (2/2)

Q: What’s the textbook?
A: No formal textbook. But if you need one, read the Deep Learning book

Q: Why are some sections marked with “*” or “**” in the slides?
A: The mark “*” means “can be skipped by the first-time reader,” and “**” means “materials for reference only”

Q: Can I be enrolled?
A: We take up to 60 students (20 teams, 2 teams per GPU-equipped machine). Juniors have priority

SLIDE 19

TODO

Assigned reading:
- Calculus
- Scientific Python 101

SLIDE 20

Reference I

[1] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.