COMP24111: Machine Learning and Optimisation, Chapter 1: Machine Learning Basics

SLIDE 1

COMP24111: Machine Learning and Optimisation

Chapter 1: Machine Learning Basics

  • Dr. Tingting Mu

Email: tingting.mu@manchester.ac.uk

SLIDE 2

Outline

  • We are going to learn the following concepts:

– Machine learning.
– Unsupervised, supervised, reinforcement learning.
– Classification.
– Regression.

SLIDE 3
Machine Learning

  • “The goal of machine learning is to make a computer learn just like a baby — it should get better at tasks with experience.”

  • A machine learning system can be used to

– Automate a process.
– Automate decision making.
– Extract knowledge from data.
– Predict future events.
– Adapt systems dynamically to enable better user experiences.
– …

  • How do we build a machine learning system?

SLIDE 4
Machine Learning

  • Basic idea:

– To represent experiences with data.
– To convert a task to a parametric model.
– To convert the learning quality to an objective function.
– To determine the model through optimising the objective function.

  • Machine learning research builds on optimisation theory, linear algebra, probability theory…

SLIDE 5
Optimisation

  • Goal: to find the minimum (or maximum) of a real-valued function by

– systematically choosing the values of the function input from an allowed set.
– computing the value of the function using the chosen values.

  • We look at the example function f(x, y) = (x + 1)² sin(y), where the input x is allowed to be chosen from the set of real numbers between 0 and 3, and the input y from the set of real numbers between 0 and 5.

– Its minimum is min_{x∈[0,3], y∈[0,5]} f(x, y), which can also be written as: min f(x, y) subject to 0 ≤ x ≤ 3, 0 ≤ y ≤ 5.
– The chosen input that gives the minimum is [x*, y*] = argmin_{x∈[0,3], y∈[0,5]} f(x, y).
– For the maximum case, the corresponding quantities are max_{x∈[0,3], y∈[0,5]} f(x, y) and argmax_{x∈[0,3], y∈[0,5]} f(x, y).
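This example can be approximated numerically with a brute-force grid search over the allowed set (a minimal sketch using NumPy; the grid resolution is an arbitrary choice). For f(x, y) = (x + 1)² sin(y), the minimum −16 sits at x = 3, y = 3π/2 where sin(y) = −1, and the maximum 16 at x = 3, y = π/2.

```python
import numpy as np

def f(x, y):
    """The example function f(x, y) = (x + 1)^2 * sin(y)."""
    return (x + 1) ** 2 * np.sin(y)

# Discretise the allowed sets 0 <= x <= 3 and 0 <= y <= 5.
xs = np.linspace(0, 3, 301)   # step 0.01
ys = np.linspace(0, 5, 501)   # step 0.01
X, Y = np.meshgrid(xs, ys)
Z = f(X, Y)

# Grid versions of argmin / argmax.
i_min = np.unravel_index(np.argmin(Z), Z.shape)
i_max = np.unravel_index(np.argmax(Z), Z.shape)
x_star, y_star = X[i_min], Y[i_min]   # approx. (3, 3*pi/2), minimum ~ -16
x_best, y_best = X[i_max], Y[i_max]   # approx. (3, pi/2), maximum ~ 16
```

A finer grid gives a closer approximation; real optimisers replace this exhaustive search with something far cheaper, such as gradient-based updates.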

SLIDE 6

Machine Learning in Data Science

  • Data is recorded on real-world phenomena. The world is driven by data.

– Germany’s climate research centre generates 10 petabytes per year.
– Google processes 24 petabytes per day.
– PC users viewed over 300 billion videos in August 2014 alone, with an average of 202 videos and 952 minutes per viewer.
– There were 223 million credit card purchases in March 2016, with a total value of £12.6 billion, in the UK.
– Photo uploads on Facebook are around 300 million per day.
– Approximately 2.5 million new scientific papers are published each year.
– …

  • What might we want to do with that data?

– Prediction: what can we predict about this phenomenon?
– Description: how can we describe/understand this phenomenon in a new way?

  • Humans can no longer manually handle data at such a scale. A machine learning system can learn from data and offer insights.

SLIDE 7

Machine learning in A.I.

[Diagram: subfields of A.I. - Machine Learning, Speech Recognition, Speech Synthesis, Natural Language Processing, Text Mining, Computer Vision, Data Mining/Analysis/Engineering, Robotics]

  • All of these are subfields of Artificial Intelligence (A.I.).

  • Machine learning plays a significant role in A.I.

SLIDE 8

School Courses

[Diagram mapping the A.I. subfields to School courses:]

  • COMP14112, Fundamentals of A.I.
  • COMP37212, Computer Vision
  • COMP38120, Documents, Services and Data on the Web
  • COMP61332, Text Mining
  • COMP60711, Data Engineering
  • COMP34120, AI and Games
  • COMP24111, Machine Learning and Optimisation
  • COMP61011, Foundations of Machine Learning
  • COMP61021, Modelling and visualization of high-dimensional data

SLIDE 9
Example: Wine Classification

  • Wine experts identify the grape type by smelling and tasting the wine.

  • The chemist says that wines derived from different grape types are different in terms of alcohol, malic acid, alcalinity of ash, magnesium, colour intensity, etc.

  • We get the measurements. But, too many numbers…

Can build a machine learning system to automate grape type identification!

SLIDE 10
Example: Wine Classification

  • Task: to identify the grape type of a wine sample based on the measured chemical quantities!

– Collecting wine samples for each grape type.
– Characterising each wine sample with 13 chemical features.

Feature extraction: 30 bottles in total, 10 bottles for each grape type, each bottle characterised by 13 features.

x1 = [x1,1, x1,2, x1,3, …, x1,12, x1,13], y1 = grape type 1
x2 = [x2,1, x2,2, x2,3, …, x2,12, x2,13], y2 = grape type 2
x3 = [x3,1, x3,2, x3,3, …, x3,12, x3,13], y3 = grape type 2
⋮
x30 = [x30,1, x30,2, x30,3, …, x30,12, x30,13], y30 = grape type 1

The x's are feature vectors and the y's are class labels; together they form the experiences.

SLIDE 11

Example: Wine Classification

– Design a mathematical model to predict the grape type. The model below is controlled by 14 parameters [w1, w2, …, w13, b]:

ŷ = g(x) = type 1, if w1x1 + w2x2 + … + w13x13 + b ≥ 0
           type 2, if w1x1 + w2x2 + … + w13x13 + b < 0

Wine features in, predicted grape type out, compared with the real grape type:

bottle 1: x1 ⇒ ŷ1 = g(x1), real type y1 ✔
bottle 2: x2 ⇒ ŷ2 = g(x2), real type y2 ✗
⋮
bottle 30: x30 ⇒ ŷ30 = g(x30), real type y30 ✔

– System training is the process of finding the best model parameters by minimising a loss function, here the predictive error:

[w1*, w2*, …, w13*, b*] = argmin_{w1, w2, …, w13, b} O_loss(w1, w2, …, w13, b)
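The 14-parameter decision rule is easy to sketch in code (the weight and bias values below are placeholders invented for illustration, not trained parameters; training would pick them by minimising the loss over the 30 labelled bottles):

```python
import numpy as np

def g(x, w, b):
    """Decision rule: grape type 1 if sum_i w_i * x_i + b >= 0, else type 2."""
    return 1 if np.dot(w, x) + b >= 0 else 2

w = np.zeros(13)   # 13 weights, one per chemical feature (placeholder values)
w[0] = 1.0         # hypothetical: only the first feature matters here
b = -13.0          # hypothetical bias

bottle = np.full(13, 12.0)    # a made-up 13-feature measurement
prediction = g(bottle, w, b)  # w.x + b = 12.0 - 13.0 = -1 < 0, so type 2
```

The model is deliberately simple: a weighted sum of the features plus a bias, thresholded at zero; everything interesting lives in how [w*, b*] are chosen.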

SLIDE 12
Example: Wine Classification

  • Now, given an unseen bottle of wine:

13 features: x1 = 12.25, x2 = 3.88, x3 = 2.2, x4 = 18.5, x5 = 112, x6 = 1.38, x7 = 0.78, x8 = 0.29, x9 = 1.14, x10 = 8.21, x11 = 0.65, x12 = 2, x13 = 855

The trained model predicts its grape type:

ŷ = g(x) = type 1, if w1*x1 + w2*x2 + … + w13*x13 + b* ≥ 0
           type 2, if w1*x1 + w2*x2 + … + w13*x13 + b* < 0

SLIDE 13

Three Ingredients in Machine Learning

  • “Model” (final product):

The thing you have to package up and send to a customer. A piece of code with some parameters that need to be optimised.

  • “Error function” (performance criterion):

The function you use to judge how well the parameters of the model are set.

  • “Learning algorithm” (training):

The algorithm that optimises the model parameters, using the error function to judge how well it is doing.
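The three ingredients line up with three pieces of code. This is only a sketch: the 1-D linear model, squared error, and gradient descent are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

# Ingredient 1 - the model: code with parameters (w, b) to be optimised.
def model(x, w, b):
    return w * x + b

# Ingredient 2 - the error function: judges how well w and b are set.
def error(w, b, xs, ys):
    return np.mean((model(xs, w, b) - ys) ** 2)

# Ingredient 3 - the learning algorithm: optimises the parameters,
# using the error function's gradient to judge how it is doing.
def train(xs, ys, lr=0.02, steps=5000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        residual = model(xs, w, b) - ys
        w -= lr * np.mean(2 * residual * xs)  # d(error)/dw
        b -= lr * np.mean(2 * residual)       # d(error)/db
    return w, b

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1          # toy data generated with w = 2, b = 1
w, b = train(xs, ys)     # recovers w close to 2, b close to 1
```

Swapping any single ingredient (a deeper model, a different loss, a different optimiser) leaves the other two unchanged, which is why the course treats them separately.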

SLIDE 14

Learning Type: Supervised

  • In supervised learning, there is a “teacher” who provides a target output for each data pattern. This guides the computer to build a predictive relationship between the data pattern and the target output.

  • The target output can be a real-valued number, an integer, a symbol, a set of real-valued numbers, a set of integers, or a set of symbols.

  • A training example (also called a sample) is a pair consisting of an input data pattern (also called an object) and a target output.

  • A test example is used to assess the strength and utility of a predictive relationship. Its target output is only used for evaluation purposes, and never contributes to the learning process.

  • Typical supervised learning tasks include classification and regression.
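The roles of training and test examples can be sketched with a toy 1-nearest-neighbour learner (the data, labels, and the choice of learner are all invented for illustration; the slide does not prescribe any particular method):

```python
# Toy supervised data: each example is a (pattern, target) pair.
train_set = [([5.1, 3.5], "A"), ([6.7, 3.1], "B"), ([5.0, 3.6], "A")]
test_set  = [([5.2, 3.4], "A"), ([6.6, 3.0], "B")]

def predict(x, examples):
    # 1-nearest-neighbour: copy the target of the closest training pattern.
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(examples, key=lambda e: sq_dist(e[0], x))[1]

# Test targets are used only to score the predictions, never to learn.
accuracy = sum(predict(x, train_set) == y for x, y in test_set) / len(test_set)
```

Note that `predict` sees only `train_set`; the targets in `test_set` appear solely in the accuracy computation, matching the evaluation-only role described above.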

SLIDE 15

Classification Examples:

The target output is a category label.

  • Medical diagnosis: x = patient data, y = positive/negative for some pathology.

  • Optical character recognition: x = pixel values and writing curves, y = ‘A’, ‘B’, ‘C’, …

  • Image analysis: x = image pixel features, y = scene/objects contained in the image.

  • Weather: x = current & previous conditions per location, y = tomorrow’s weather.

… this list can never end; applications of classification are vast and extremely active!

SLIDE 16

Regression Examples:

The target output is a continuous number (or a set of such numbers).

  • Finance: x = current market conditions and other possible side information, y = tomorrow’s stock market price.

  • Social media: x = videos the viewer is watching on YouTube, y = viewer’s age.

  • Robotics: x = control signals sent to motors, y = the 3D location of a robot arm end effector.

  • Medical health: x = a number of clinical measurements, y = the amount of prostate specific antigen in the body.

  • Environment: x = weather data, time, door sensors, etc., y = the temperature at any location inside a building.

… this list can never end; applications of regression are vast and extremely active!

SLIDE 17

Successful Applications

  • Convert speech to text, translate from one language to another.
SLIDE 18

Successful Applications

  • Face recognition
SLIDE 19

Successful Applications

  • Object recognition, speech synthesis, information retrieval.

SLIDE 20

Learning Type: Unsupervised

  • In unsupervised learning, there is no explicit “teacher”.

  • The system forms a natural “understanding” of the hidden structure from unlabelled data.

  • Typical unsupervised learning tasks include

– Clustering: group similar data patterns together.
– Generative modelling: estimate the distribution of the observed data patterns.
– Unsupervised representation learning: remove noise, capture data statistics, capture inherent data structure.

[Figures: clustering examples from MATLAB and from https://cambridge-intelligence.com/keylines-network-clustering/]
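Clustering, the first task listed above, can be sketched with a minimal k-means-style loop. Everything here is an illustrative assumption: made-up 1-D data with two hidden groups, two clusters, and a crude initialisation; note that no labels are used anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.5, 50),   # one hidden group
                       rng.normal(5.0, 0.5, 50)])  # another hidden group

centres = np.array([data.min(), data.max()])  # crude initialisation
for _ in range(10):
    # Assign each point to its nearest centre, then move each centre
    # to the mean of the points assigned to it.
    assign = np.abs(data[:, None] - centres[None, :]).argmin(axis=1)
    centres = np.array([data[assign == k].mean() for k in range(2)])
```

After a few iterations the centres settle near the two group means (about 0 and 5), recovered purely from the structure of the unlabelled data.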

SLIDE 21

Successful Applications


  • Document clustering and visualisation
SLIDE 22

Learning Type: Reinforcement

  • In reinforcement learning, there is a “teacher” who provides feedback on the actions of an agent, in terms of reward and punishment.

  • Examples:

– Helicopter manoeuvres: reward for following the desired trajectory, punishment for crashing.
– Manage an investment portfolio: reward for each $ in the bank.
– Control a power station: reward for producing power, punishment for exceeding safety thresholds.
– Make a humanoid robot walk: reward for forward motion, punishment for falling over.
– Play many different Atari games better than humans: reward for increasing the score, punishment for decreasing it.

These examples are from the UCL course on RL.

SLIDE 23

Successful Applications

  • Game playing, self-driving cars, trading strategies.
SLIDE 24

History

  • 1940s: Human reasoning / logic first studied as a formal subject within mathematics (Claude Shannon, Kurt Gödel et al.).

  • 1950s: The Turing Test is proposed: a test for true machine intelligence, expected to be passed by the year 2000. Various game-playing programs built. 1956: the Dartmouth conference coins the phrase “artificial intelligence”. 1959: Arthur Samuel wrote a program that learnt to play draughts (checkers if you are American).

  • 1960s: A.I. funding increased (mainly military). Famous quote: “Within a generation ... the problem of creating ‘artificial intelligence’ will substantially be solved.”

  • 1970s: A.I. winter. Funding dries up as people realise it is hard. Limited computing power and dead-end frameworks.

  • 1980s: Revival through bio-inspired algorithms: neural networks, genetic algorithms. A.I. promises the world; lots of commercial investment; mostly fails. Rule-based expert systems used in medical / legal professions.

SLIDE 25

History

  • 1990s: A.I. diverges into separate fields: machine learning, computer vision, automated reasoning, planning systems, natural language processing… Machine learning begins to overlap with statistics / probability theory.

  • 2000s: The merging of ML with statistics continues. Other subfields continue in parallel. First commercial-strength applications: Google, Amazon, computer games, route-finding, credit card fraud detection, etc. Tools adopted as standard by other fields, e.g. biology.

  • 2010s: Deep neural networks have led to significant performance improvements in speech recognition, reinforcement learning, image classification, machine translation, etc.

  • Future?

Some links on machine learning history:

https://en.wikipedia.org/wiki/Timeline_of_machine_learning
https://cloud.withgoogle.com/build/data-analytics/explore-history-machine-learning/

SLIDE 26

Maths Knowledge Overview

  • Linear Algebra:

– Concepts: vector, matrix, etc.
– Operations: transpose, sum, multiplication, trace, inverse, etc.

  • Calculus:

– Derivative, partial derivative, gradient, etc.

  • Notes: “Maths Knowledge Overview - for Part 1, COMP24111”

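The listed operations can all be tried out in a few lines of NumPy (a minimal sketch with toy matrices invented for illustration; the gradient is approximated numerically with a central difference):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

At    = A.T                 # transpose
S     = A + B               # sum
P     = A @ B               # matrix multiplication
t     = np.trace(A)         # trace: 1 + 4 = 5
A_inv = np.linalg.inv(A)    # inverse: A @ A_inv is the identity

# Calculus: a numerical partial derivative of f(x, y) = x**2 * y w.r.t. x.
def f(x, y):
    return x ** 2 * y

h = 1e-6
df_dx = (f(2 + h, 3) - f(2 - h, 3)) / (2 * h)  # analytic value: 2*x*y = 12
```

These are exactly the operations that appear throughout the course, e.g. the wine model's weighted sum is a vector dot product, and training follows gradients like `df_dx`.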