

SLIDE 1

Applied Machine Learning

Introduction

Siamak Ravanbakhsh

COMP 551 (Fall 2020)

SLIDE 2

Objectives

- a short history of ML
- understanding the scope of machine learning and its relation to other areas
- understanding the major families of machine learning tasks

SLIDE 3

What is Machine Learning?

ML is the set of "algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions"

ML is the "study of computer algorithms that improve automatically through experience"

While there are some unifying principles, machine learning may still look like a toolbox, with different tools suitable for different tasks.

SLIDE 4


Placing ML

- Artificial Intelligence: a broader domain (includes search, planning, multi-agent systems, robotics, etc.)
- Statistics: historically precedes ML; ML is more focused on algorithmic, practical, and powerful models (e.g., neural networks) and is built around AI
- Vision & Natural Language Processing: use many ML algorithms and ideas
- Optimization: extensively used in ML
- Data mining: scalability and performance come before theoretical foundations; more space for using heuristics, exploratory analysis, and unsupervised algorithms
- Data science: an umbrella term for the above, mostly used in industry when the output is knowledge/information to be used for decision making

SLIDE 5

A short history of ML

1950: Turing test

- participants in the imitation game: (A) a machine, (B) a human, (C) an interrogator
- test: if the machine can imitate a human such that the interrogator (C), after some time, cannot reliably tell which of A or B is the human, then machine A passes the Turing test
- there have been extensive debates about the validity of the test and what it actually proves

SLIDE 6

A short history of ML

1950: Turing test
1956: checker player that learned as it played (Arthur Samuel)

- coined the term "machine learning"
- uses a (min-max) search method
- learning happens in estimating the value of a state
- many important ideas appear in his work: self-play, alpha-beta pruning, temporal difference learning, function approximation, ...

figure from Samuel's paper (1959)

SLIDE 7

A short history of ML

1950: Turing test
1956: checker player that learned as it played (Arthur Samuel)
1958: first artificial neural networks: Perceptron (Rosenblatt) and ADALINE (1959, Widrow and Hoff)

- we will discuss the Perceptron's learning algorithm, which was in turn based on Hebbian learning: connecting neural wiring with firing patterns
- the Perceptron Mark I could process a 20x20 pixel image
- based on the McCulloch-Pitts mathematical model of neurons:

f(x) = σ(∑_i w_i x_i)

where x_i are the input features, w_i the weights, and σ the activation function
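As a preview of the learning algorithm mentioned above, here is a minimal sketch of a perceptron in Python; the toy data, learning rate, and epoch count are illustrative assumptions, not from the slides.

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Classic perceptron rule for f(x) = sign(sum_i w_i * x_i + b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):       # yi in {-1, +1}
            if yi * (xi @ w + b) <= 0:  # misclassified example
                w += lr * yi * xi       # nudge weights toward the correct side
                b += lr * yi
    return w, b

# toy linearly separable data (illustrative)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # should recover y
```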

SLIDE 8

A short history of ML

1950: Turing test
1956: checker player that learned as it played (Arthur Samuel)
1958: first artificial neural networks: Perceptron and ADALINE (1959)
1963: support vector machines (Vapnik & Chervonenkis)

- we will discuss the SVM's idea later in the course
- meanwhile, neural networks are finding lots of applications: speech recognition, weather forecasting, telephones

1969: Minsky and Papert show the limitations of the single-layer Perceptron

- for example, it cannot learn a simple XOR function (see the sketch after this list)
- the limitation does not extend to a multilayer perceptron (which was known back then)
- one of the factors in the so-called AI winter

1970-1980: rule-based and symbolic AI dominates

- in contrast to connectionist AI, as in neural networks
- expert systems find applications in industry; these are rule-based systems with their specialized hardware
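To make the XOR limitation concrete, the following sketch brute-forces a grid of weights for a single linear threshold unit and finds none that reproduces the XOR labels; the grid range and resolution are illustrative choices (a demonstration, not a proof).

```python
import numpy as np
from itertools import product

# the four XOR points and their labels (in {-1, +1})
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])

# search a grid of weights/biases for a single linear threshold unit
# sign(w1*x1 + w2*x2 + b) that reproduces the XOR labels
found = False
grid = np.linspace(-2, 2, 41)
for w1, w2, b in product(grid, grid, grid):
    if np.all(np.sign(X @ np.array([w1, w2]) + b) == y):
        found = True
        break
print("linear separator found:", found)  # False: XOR is not linearly separable
```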

SLIDE 9

A short history of ML

1950: Turing test
1956: checker player that learned as it played (Arthur Samuel)
1958: first artificial neural networks: Perceptron and ADALINE (1959)
1963: support vector machines (Vapnik & Chervonenkis)
1969: Minsky and Papert show the limitations of the single-layer Perceptron
1970-1980: rule-based and symbolic AI dominates
1980s: Bayesian networks (Judea Pearl)

- combine graph structure with probabilistic (and causal) reasoning
- related to both the symbolic and the connectionist approach

1986: backpropagation rediscovered (Rumelhart, Hinton & Williams)

- an efficient method for learning the weights in neural networks using gradient descent
- it has been rediscovered many times since the 1960s
- we discuss it later in the course

1980-1990s: expert systems are being replaced with general-purpose computers

SLIDE 10


A short history of ML

1950: Turing test
1956: checker player that learned as it played (Arthur Samuel)
1958: first artificial neural networks: Perceptron and ADALINE (1959)
1963: support vector machines (Vapnik & Chervonenkis)
1969: Minsky and Papert show the limitations of the single-layer Perceptron
1970-1980: rule-based and symbolic AI dominates
1980s: Bayesian networks (Judea Pearl)
1986: backpropagation rediscovered (Rumelhart, Hinton & Williams)
1980-1990s: expert systems are being replaced with general-purpose computers
...
2012: AlexNet wins ImageNet by a large margin
2012-now: a new AI spring around deep learning ... super-human performance in many tasks

Future: what is next?

- in the short term, AI will impact the domain sciences
- historically, hypes have been followed by disappointments; is it the same this time?

SLIDE 11

Some terminology

x (input): also called features, predictors, independent variable, covariate
y (output): also called targets, labels, predictions, dependent variable, response variable

x → ML algorithm (hypothesis) → y

example: <tumorsize, texture, perimeter> = <18.2, 27.6, 117.5>, cancer = No

SLIDE 12


Some terminology

(labelled) datasets consist of many training examples or instances x^(1), x^(2), x^(3), ..., x^(N); each row below is one instance:

<tumorsize, texture, perimeter> , <cancer, size change>
<18.2, 27.6, 117.5> , < No , +2 >
<17.9, 10.3, 122.8> , < No , -4 >
<20.2, 14.3, 111.2> , < Yes , +3 >
<15.5, 15.2, 135.5> , < No , 0 >
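A minimal sketch of how such a labelled dataset might be represented in Python with NumPy; the variable names and the 0/1 encoding of the cancer label are illustrative assumptions.

```python
import numpy as np

# N = 4 training instances, each x^(n) has 3 features:
# <tumorsize, texture, perimeter>
X = np.array([
    [18.2, 27.6, 117.5],   # x^(1)
    [17.9, 10.3, 122.8],   # x^(2)
    [20.2, 14.3, 111.2],   # x^(3)
    [15.5, 15.2, 135.5],   # x^(N)
])

# two targets per instance: <cancer, size change>
y_cancer = np.array([0, 0, 1, 0])      # No/Yes encoded as 0/1
y_change = np.array([+2, -4, +3, 0])

print(X.shape)  # (4, 3): N instances by 3 features
```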
SLIDE 13

Families of ML methods

1. Supervised learning: we have labeled data

- classification
- regression
- structured prediction

e.g., x^(n) = (x_1^(n), x_2^(n)), y^(n) = −1

SLIDE 14

Supervised learning

Regression: continuous output (target)

<tumorsize, texture, perimeter> , <size change>
<18.2, 27.6, 117.5> , < +2 >
<17.9, 10.3, 122.8> , < -4 >
<20.2, 14.3, 111.2> , < +3 >
<15.5, 15.2, 135.5> , < 0 >

Classification: categorical/discrete output (target)

<tumorsize, texture, perimeter> , <cancer>
<18.2, 27.6, 117.5> , < No >
<17.9, 10.3, 122.8> , < No >
<20.2, 14.3, 111.2> , < Yes >
<15.5, 15.2, 135.5> , < No >
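A sketch of the two task types using scikit-learn on the toy rows above; the specific model choices (linear and logistic regression) are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[18.2, 27.6, 117.5],
              [17.9, 10.3, 122.8],
              [20.2, 14.3, 111.2],
              [15.5, 15.2, 135.5]])

# regression: continuous target (size change)
y_reg = np.array([2.0, -4.0, 3.0, 0.0])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:1]))   # continuous prediction

# classification: discrete target (cancer: No=0, Yes=1)
y_clf = np.array([0, 0, 1, 0])
clf = LogisticRegression().fit(X, y_clf)
print(clf.predict(X[:1]))   # class-label prediction
```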

SLIDE 15

Supervised learning: example

SLIDE 16

Supervised learning: example

Machine translation: data consists of input-output sentence pairs (x, y)

- more recently, end-to-end speech translation
- similarly, we may consider text-to-speech, with text and voice as input and target (x, y)
- in speech recognition, the input and output above are swapped
- a variety of language processing tasks are in this category

translation example from CNET

SLIDE 17

Supervised learning: example

Image captioning

- input: image
- output: text

image: COCO dataset

SLIDE 18


Supervised learning: example

Object detection

- input: image
- output: a set of bounding box coordinates

image: https://bitmovin.com/object-detection/

SLIDE 19

Families of ML methods

2. Unsupervised learning: only unlabeled data

- clustering
- dimensionality reduction
- density estimation / generative modeling
- anomaly detection
- discovering latent factors and structures
- ...

Unsupervised learning helps explore and understand the data; it is closer to data mining. We have much more unlabeled data than labeled data, and more open challenges.

SLIDE 20

Unsupervised learning: example

Clustering: similar to classification, but the labels/classes are not given to the algorithm and must be inferred

<tumorsize, texture, perimeter> , <cancer>
<18.2, 27.6, 117.5> , < No >
<17.9, 10.3, 122.8> , < No >
<20.2, 14.3, 111.2> , < Yes >
<15.5, 15.2, 135.5> , < No >
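A minimal clustering sketch with scikit-learn's KMeans on the same features, now without labels; the choice of two clusters is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# same feature matrix as before, but no labels are provided
X = np.array([[18.2, 27.6, 117.5],
              [17.9, 10.3, 122.8],
              [20.2, 14.3, 111.2],
              [15.5, 15.2, 135.5]])

# the algorithm must infer the groupings on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # inferred cluster index for each instance
```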

SLIDE 21


Unsupervised learning: example

Generative modeling (density estimation): learn the data distribution p(x)
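A hedged density-estimation sketch using SciPy's gaussian_kde on toy 1-D samples; the data and the default bandwidth are illustrative, and this is just one of many ways to estimate p(x).

```python
import numpy as np
from scipy.stats import gaussian_kde

# fit a simple kernel density estimate of p(x) on toy 1-D samples
x = np.random.normal(loc=0.0, scale=1.0, size=500)
kde = gaussian_kde(x)

# evaluate the learned density at a few points
print(kde([-1.0, 0.0, 1.0]))

# a generative model can also sample new points from p(x)
print(kde.resample(3))
```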

SLIDE 22

Families of ML methods

Semisupervised learning: a few labeled examples

- we can include structured problems such as matrix completion (a few entries are observed) and link prediction


SLIDE 24

Families of ML methods

Reinforcement learning: weak supervision through the reward signal

- sequential decision making
- biologically motivated

also related:

- imitation learning: learning from demonstrations
- behavior cloning (which is supervised learning!)
- inverse reinforcement learning (learning the reward function)
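A minimal sketch of the reward-driven interaction loop, here as tabular Q-learning on a hypothetical two-state toy environment; every name and constant below is an illustrative assumption, not from the slides.

```python
import random

# tabular Q-learning on a toy 2-state, 2-action environment (illustrative)
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Hypothetical environment: only action 1 in state 1 pays off."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    next_state = random.randrange(n_states)
    return next_state, reward

state = 0
for _ in range(5000):
    # epsilon-greedy: mostly exploit the current Q-values, sometimes explore
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # temporal-difference update toward reward + discounted future value
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # Q[1][1] should be the largest entry
```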

SLIDE 25

Reinforcement learning: example

SLIDE 26


Reinforcement learning: example

SLIDE 27

Summary

One way to classify ML methods is based on the availability of labeled data:

- Supervised learning: we have labeled data (classification, regression, structured prediction)
- Unsupervised learning: only unlabeled data (clustering, dimensionality reduction, density estimation / generative modeling, anomaly detection, discovering latent factors and structures)
- Semisupervised learning: a few labeled examples
- Reinforcement learning: reward signal

Note that there are other ways to classify ML methods, and other learning paradigms not covered here.