COMP24111: Machine Learning and Optimisation, Chapter 1A: Machine Learning Basics



SLIDE 1

COMP24111: Machine Learning and Optimisation
Chapter 1A: Machine Learning Basics

  • Dr. Tingting Mu
  • Email: tingting.mu@manchester.ac.uk

SLIDE 2

Machine Learning

  • A machine learning system is a magic box that can be used to
    – Automate a process
    – Automate decision making
    – Extract knowledge from data
    – Predict future events
    – Adapt systems dynamically to enable better user experiences
    – …

  • How do we build a machine learning system?


SLIDE 3
Machine Learning

  • “The goal of machine learning is to make a computer learn just like a baby — it should get better at tasks with experience.”

  • Basic idea:
    – To represent experiences with data.
    – To convert a task to a parametric model.
    – To convert the learning quality to an objective function.
    – To determine the model through optimising an objective function.

  • Machine learning research builds on optimisation theory, linear algebra, statistics…
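The four-step recipe above can be sketched end-to-end on a toy problem. This is a minimal illustration, not from the lecture: the 1-D data, the linear model ŷ = wx + b, the squared-error objective and the gradient-descent optimiser are all assumed choices.

```python
# A minimal sketch of the four-step recipe on a toy task (all choices
# below -- data, model, loss, optimiser -- are illustrative assumptions).
import numpy as np

# 1. Represent experiences with data: input patterns x and targets y.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])      # generated by y = 2x + 1

# 2. Convert the task to a parametric model: y_hat = w*x + b.
def predict(w, b, x):
    return w * x + b

# 3. Convert the learning quality to an objective function (mean squared error).
def loss(w, b):
    return np.mean((predict(w, b, x) - y) ** 2)

# 4. Determine the model by optimising the objective (gradient descent).
w, b = 0.0, 0.0
for _ in range(2000):
    err = predict(w, b, x) - y
    w -= 0.05 * np.mean(2 * err * x)    # d(loss)/dw
    b -= 0.05 * np.mean(2 * err)        # d(loss)/db
# w and b end up close to the generating values 2 and 1.
```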

SLIDE 4
Example: Wine Classification

  • Wine experts identify the grape type by smelling and tasting the wine.
  • The chemist says that wines derived from different grape types are different in terms of alcohol, malic acid, alcalinity of ash, magnesium, color intensity, etc.
  • We get the measurements. But, too many numbers…
  • We can build a machine learning system to automate grape type identification!

SLIDE 5
Example: Wine Classification

  • Task: To identify the grape type of a wine sample based on its measured chemical quantities!

  • Feature extraction:
    – Collecting wine samples for each grape type.
    – Characterising each wine sample with 13 chemical features.

30 bottles in total, 10 bottles for each grape type; each bottle is characterised with 13 features:

x1 = [x1,1, x1,2, x1,3, …, x1,12, x1,13], y1 = grape type 1
x2 = [x2,1, x2,2, x2,3, …, x2,12, x2,13], y2 = grape type 2
x3 = [x3,1, x3,2, x3,3, …, x3,12, x3,13], y3 = grape type 2
⋮
x30 = [x30,1, x30,2, x30,3, …, x30,12, x30,13], y30 = grape type 1

The feature vectors x and class labels y together form the experiences.

SLIDE 6

Example: Wine Classification

  • Design a mathematical model to predict the grape type. The model below is controlled by 14 parameters [w1, w2, …, w13, b]:

ŷ = g(x) = type 1, if w1x1 + w2x2 + … + w13x13 + b ≥ 0
           type 2, if w1x1 + w2x2 + … + w13x13 + b < 0

Feeding each bottle’s features into the model gives a predicted grape type, which is compared against the real grape type:

bottle 1:  x1  ⇒ ŷ1 = g(x1)  vs  y1  ✔
bottle 2:  x2  ⇒ ŷ2 = g(x2)  vs  y2  ✗
⋮
bottle 30: x30 ⇒ ŷ30 = g(x30)  vs  y30  ✔

  • System training is the process of finding the best model parameters by minimising a loss function (the loss measures predictive inaccuracy):

[w1*, w2*, …, w13*, b*] = argmin over (w1, w2, …, w13, b) of O_loss(w1, w2, …, w13, b)
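As a concrete (if simplified) illustration, the 14-parameter model and its training loop might be sketched as below. The wine data here is synthetic stand-in data, and the perceptron-style update is just one possible choice of loss/optimiser, not necessarily the one used later in the course.

```python
# Sketch of the 14-parameter wine classifier; the data is synthetic and
# the perceptron-style training rule is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 13))            # 30 bottles, 13 chemical features
true_w = rng.normal(size=13)             # hidden rule that generates labels
y = np.where(X @ true_w >= 0, 1, 2)      # grape type 1 or grape type 2

def g(x, w, b):
    """The model: type 1 if sum_i w_i x_i + b >= 0, else type 2."""
    return 1 if x @ w + b >= 0 else 2

# Training: nudge the parameters on every misclassified bottle so that
# the decision score moves towards the correct side of the threshold.
w, b = np.zeros(13), 0.0
for _ in range(500):                     # passes over the training set
    for xi, yi in zip(X, y):
        if g(xi, w, b) != yi:
            direction = 1.0 if yi == 1 else -1.0
            w += 0.1 * direction * xi
            b += 0.1 * direction

train_accuracy = np.mean([g(xi, w, b) == yi for xi, yi in zip(X, y)])
```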

SLIDE 7
Example: Wine Classification

  • Now, given an unseen bottle of wine with 13 measured features:

x1 = 12.25, x2 = 3.88, x3 = 2.2, x4 = 18.5, x5 = 112, x6 = 1.38, x7 = 0.78, x8 = 0.29, x9 = 1.14, x10 = 8.21, x11 = 0.65, x12 = 2, x13 = 855

  • Apply the trained model to predict its grape type:

ŷ = g(x) = type 1, if w1*x1 + w2*x2 + … + w13*x13 + b* ≥ 0
           type 2, if w1*x1 + w2*x2 + … + w13*x13 + b* < 0
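Plugging this bottle into the trained decision rule could look like the sketch below. The 13 feature values come from the slide, but the trained parameters w* and b* are hypothetical stand-ins, since their learned values are not given.

```python
# The 13 features of the unseen bottle (values from the slide).
x = [12.25, 3.88, 2.2, 18.5, 112, 1.38, 0.78, 0.29, 1.14, 8.21, 0.65, 2, 855]

# Hypothetical trained parameters (stand-ins; not the real learned values).
w_star = [0.01] * 13
b_star = -5.0

# Decision rule: type 1 if the weighted score clears the threshold.
score = sum(wi * xi for wi, xi in zip(w_star, x)) + b_star
y_hat = "type 1" if score >= 0 else "type 2"
```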

SLIDE 8

The World Generates Data!

  • Data is recorded on real-world phenomena. The world is driven by data.
    – Germany’s climate research centre generates 10 petabytes per year.
    – Google processes 24 petabytes per day.
    – PC users viewed over 300 billion videos in August 2014 alone, with an average of 202 videos and 952 minutes per viewer.
    – There were 223 million credit card purchases in March 2016, with a total value of £12.6 billion in the UK.
    – Around 300 million photos are uploaded to Facebook per day.
    – Approximately 2.5 million new scientific papers are published each year.
    – …

  • What might we want to do with that data?
    – Prediction: what can we predict about this phenomenon?
    – Description: how can we describe/understand this phenomenon in a new way?

  • Humans can no longer manually handle data at such a scale. A machine learning system can learn from data and offer insights.

SLIDE 9

Machine learning is important! All of these are subfields of Artificial Intelligence (A.I.):

  • Machine Learning
  • Speech Recognition
  • Speech Synthesis
  • Natural Language Processing
  • Text Mining
  • Computer Vision
  • Data Mining, Analysis, Engineering
  • Robotics

SLIDE 10

School Courses

The subfields of A.I. (Machine Learning, Speech Recognition, Speech Synthesis, Natural Language Processing, Text Mining, Computer Vision, Data Mining/Analysis/Engineering, Robotics) are covered by School courses:

  • COMP14112, Fundamentals of A.I.
  • COMP37212, Computer Vision
  • COMP38120, Documents, Services and Data on the Web
  • COMP61332, Text Mining
  • COMP60711, Data Engineering
  • COMP34120, AI and Games
  • COMP24111, Machine Learning and Optimisation
  • COMP61011, Foundations of Machine Learning
  • COMP61021, Modelling and visualization of high-dimensional data

SLIDE 11

Learning Type: Supervised


  • In supervised learning, there is a “teacher” who provides a target output for each data pattern. This guides the computer to build a predictive relationship between the data pattern and the target output.

  • The target output can be a real-valued number, an integer, a symbol, a set of real-valued numbers, a set of integers, or a set of symbols.

  • A training example (also called a sample) is a pair consisting of an input data pattern (also called an object) and a target output.

  • A test example is used to assess the strength and utility of a predictive relationship. Its target output is only used for evaluation purposes, and never contributes to the learning process.

  • Typical supervised learning tasks include classification and regression.
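The setup above (labelled training pairs used for learning; a test example whose label is used only for evaluation) can be sketched with a toy dataset. The tiny 2-D patterns and the 1-nearest-neighbour predictor are illustrative assumptions.

```python
# Supervised learning sketch: the "teacher" supplies a target output for
# each training pattern; the test label is used only to score the system.
# (Toy data and a 1-nearest-neighbour rule, chosen for illustration.)

# Training examples: (input data pattern, target output) pairs.
train = [([1.0, 1.0], "A"), ([1.2, 0.9], "A"),
         ([5.0, 5.1], "B"), ([4.8, 5.3], "B")]

def predict(x):
    """Copy the label of the closest training pattern (1-nearest-neighbour)."""
    sq_dist = lambda pair: sum((a - b) ** 2 for a, b in zip(x, pair[0]))
    return min(train, key=sq_dist)[1]

# Test example: its target output never contributes to learning; it only
# tells us whether the prediction was right.
test_x, test_y = [4.9, 5.0], "B"
correct = (predict(test_x) == test_y)
```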

SLIDE 12

Classification Examples:

The target output is a category label.

  • Medical diagnosis: x = patient data, y = positive/negative for some pathology
  • Optical character recognition: x = pixel values and writing curves, y = ‘A’, ‘B’, ‘C’, …
  • Image analysis: x = image pixel features, y = scene/objects contained in the image
  • Weather: x = current & previous conditions per location, y = tomorrow’s weather
  • … this list can never end; applications of classification are vast and extremely active!

SLIDE 13

Regression Examples:

The target output is a continuous number (or a set of such numbers).

  • Finance: x = current market conditions and other possible side information, y = tomorrow’s stock market price
  • Social media: x = videos the viewer is watching on YouTube, y = viewer’s age
  • Robotics: x = control signals sent to motors, y = the 3D location of a robot arm end effector
  • Medical health: x = a number of clinical measurements, y = the amount of prostate specific antigen in the body
  • Environment: x = weather data, time, door sensors, etc., y = the temperature at any location inside a building
  • … this list can never end; applications of regression are vast and extremely active!

SLIDE 14

Successful Applications


  • Convert speech to text, translate from one language to another.
SLIDE 15

Successful Applications


  • Face recognition
SLIDE 16

Successful Applications


  • Object recognition, speech synthesis, information retrieval.

SLIDE 17

Learning Type: Unsupervised


  • In unsupervised learning, there is no explicit “teacher”.
  • The system forms a natural “understanding” of the hidden structure from unlabelled data.
  • Typical unsupervised learning tasks include:
    – Clustering: group similar data patterns together.
    – Generative modelling: estimate the distribution of the observed data patterns.
    – Unsupervised representation learning: remove noise, capture data statistics, capture inherent data structure.

[Figures: a clustering example from MATLAB, and a network clustering example from https://cambridge-intelligence.com/keylines-network-clustering/]
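The clustering task described above can be sketched with a minimal k-means loop (a standard clustering algorithm; the synthetic 2-D data and k = 2 are illustrative choices).

```python
# Minimal k-means sketch: group similar patterns with no labels at all.
# The two synthetic 2-D blobs and k = 2 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),    # 20 unlabelled points near (0, 0)
               rng.normal(5.0, 0.5, (20, 2))])   # 20 unlabelled points near (5, 5)

k = 2
centres = X[[0, 20]].copy()                      # simple deterministic start
for _ in range(10):
    # Assign every point to its nearest centre...
    labels = np.argmin(((X[:, None] - centres) ** 2).sum(axis=2), axis=1)
    # ...then move each centre to the mean of its assigned points.
    centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

The loop recovers the two groups purely from the geometry of the points, with no target labels involved.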

SLIDE 18

Successful Applications


  • Document clustering and visualisation
SLIDE 19

Learning Type: Reinforcement


  • In reinforcement learning, there is a “teacher” who provides feedback on the actions of an agent, in terms of reward and punishment.

  • Examples (from the UCL course on RL):
    – Helicopter manoeuvres: reward for following the desired trajectory, punishment for crashing.
    – Manage an investment portfolio: reward for each $ in the bank.
    – Control a power station: reward for producing power, punishment for exceeding safety thresholds.
    – Make a humanoid robot walk: reward for forward motion, punishment for falling over.
    – Play many different Atari games better than humans: reward for increasing the score, punishment for decreasing it.
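The reward-and-punishment idea can be sketched with a tiny tabular Q-learning agent. The 5-cell corridor environment, the reward scheme, and all the numeric settings below are invented for illustration.

```python
# Reinforcement learning sketch: the "teacher" gives reward feedback on the
# agent's actions. Toy 5-cell corridor, invented for illustration: the agent
# starts at cell 0 and earns reward 1 for reaching cell 4.
import random

random.seed(0)
n_states = 5
actions = [-1, +1]                                 # step left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

for episode in range(200):
    s = 0
    while s != n_states - 1:
        if random.random() < 0.2:                  # explore occasionally
            a = random.choice(actions)
        else:                                      # otherwise act greedily
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)      # walls at both ends
        r = 1.0 if s2 == n_states - 1 else 0.0     # the teacher's feedback
        # Q-learning update: move the estimate towards reward + discounted value.
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, act)] for act in actions) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-terminal state is "go right".
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
```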

SLIDE 20

Successful Applications


  • Game playing, self-driving cars, trading strategies.
SLIDE 21

History


  • 1940s: Human reasoning / logic first studied as a formal subject within mathematics (Claude Shannon, Kurt Gödel et al.).

  • 1950s: The Turing Test is proposed: a test for true machine intelligence, expected to be passed by the year 2000. Various game-playing programs built. 1956: the Dartmouth conference coins the phrase "artificial intelligence". 1959: Arthur Samuel wrote a program that learnt to play draughts (checkers if you are American).

  • 1960s: A.I. funding increased (mainly military). Famous quote: "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved."

  • 1970s: A.I. winter. Funding dries up as people realise it is hard. Limited computing power and dead-end frameworks.

SLIDE 22

History


  • 1980s: Revival through bio-inspired algorithms: neural networks, genetic algorithms. A.I. promises the world – lots of commercial investment – mostly fails. Rule-based expert systems used in the medical / legal professions.

  • 1990s: A.I. diverges into separate fields: Machine Learning, Computer Vision, Automated Reasoning, Planning Systems, Natural Language Processing… Machine Learning begins to overlap with statistics / probability theory.

  • 2000s: ML merging with statistics continues. Other subfields continue in parallel. First commercial-strength applications: Google, Amazon, computer games, route-finding, credit card fraud detection, etc. Tools adopted as standard by other fields, e.g. biology.

  • 2010s: Deep neural networks have led to significant performance improvements in speech recognition, reinforcement learning, image classification, machine translation, etc.

  • Future?