Machine Learning NEIL LAWRENCE UNIVERSITY OF SHEFFIELD @lawrennd


slide-1
SLIDE 1

Gaussian Processes for Machine Learning

NEIL LAWRENCE UNIVERSITY OF SHEFFIELD @lawrennd

slide-2
SLIDE 2

GLOBAL INFORMATION STORAGE CAPACITY

IN OPTIMALLY COMPRESSED BYTES

Annotations: ConvNets Developed; SVMs dominate NIPS

slide-3
SLIDE 3
slide-4
SLIDE 4

Coal Tin Google Facebook Amazon Startups

slide-5
SLIDE 5
slide-6
SLIDE 6

The Data are Not Enough

  • Four pillars:
  • Deterministic/Stochastic
  • Mechanistic/Empirical
  • Goal: model complex phenomena over time
  • Problem:
  • Mechanistic models are often inaccurate
  • Data are often not rich enough for an empirical approach
  • Question 1: How do we combine an inaccurate physical model with machine learning?

slide-7
SLIDE 7

Central Dogma

DNA → (Transcription) → mRNA → (Translation) → Protein

slide-8
SLIDE 8

Decision: Transcription Factors

mRNA → (Translation) → TF Protein → (Transcription) → Other mRNAs

Figure annotations: Measured using Microarray since 1998 (appears twice); Difficult to measure

slide-9
SLIDE 9

Mechanistic Model

$\frac{\mathrm{d}p_{TF}(t)}{\mathrm{d}t} = c_f m_{TF}(t) - d_f p_{TF}(t)$

$\frac{\mathrm{d}m_i(t)}{\mathrm{d}t} = c_i p_{TF}(t) - d_i m_i(t)$

mRNA $m_{TF}(t)$ → (Translation) → TF Protein $p_{TF}(t)$ → (Transcription) → Other mRNAs $m_i(t)$
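
The two rate equations on this slide can be simulated directly. What follows is a minimal Python sketch, not the MATLAB demo used in the talk: it integrates both equations with forward Euler. The rate constants (`c_f`, `d_f`, `c_i`, `d_i`), the zero initial conditions, and the pulse-shaped TF mRNA profile `m_tf` are all illustrative assumptions, not values from the slides.

```python
import math


def simulate(c_f=1.0, d_f=0.5, c_i=2.0, d_i=0.8, t_end=10.0, dt=0.001):
    """Forward-Euler integration of the two linear rate equations.

    All parameter values are hypothetical and chosen only for illustration.
    """

    def m_tf(t):
        # Assumed TF mRNA profile: a single smooth pulse centred at t = 3.
        return math.exp(-0.5 * (t - 3.0) ** 2)

    n_steps = int(round(t_end / dt))
    p, m = 0.0, 0.0  # TF protein and target mRNA both start at zero
    ts, ps, ms = [0.0], [p], [m]
    for k in range(n_steps):
        t = k * dt
        dp = c_f * m_tf(t) - d_f * p  # translation minus protein decay
        dm = c_i * p - d_i * m        # transcription minus mRNA decay
        p += dt * dp
        m += dt * dm
        ts.append(t + dt)
        ps.append(p)
        ms.append(m)
    return ts, ps, ms


ts, ps, ms = simulate()
peak_p = ts[ps.index(max(ps))]  # time at which TF protein is highest
peak_m = ts[ms.index(max(ms))]  # time at which target mRNA is highest
print(f"TF protein peaks at t={peak_p:.2f}, target mRNA at t={peak_m:.2f}")
```

Each equation acts as a linear filter with decay, so in this sketch the TF protein peak lags the mRNA pulse and the target mRNA peak lags the protein.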

slide-10
SLIDE 10

Need to Model $p_{TF}(t)$

  • Gaussian process: a probabilistic model for functions.
  • Formally known as a stochastic process.
  • Multivariate Gaussian is normally defined by a mean vector, $\boldsymbol{\mu}$, and a covariance matrix, $\mathbf{C}$: $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$
  • Gaussian process defined by a mean function, $\mu(t)$, and a covariance function, $c(t, t')$: $y(t) \sim \mathcal{N}(\mu(t), c(t, t'))$

slide-11
SLIDE 11

Zero Mean Gaussian Sample

[Figure: samples from a Gaussian, y plotted against index, alongside the covariance matrix C (axes: index, index)]

Zero Mean Gaussian Process Sample

[Figure: samples $y(t)$ from a Gaussian process, alongside the covariance function $c(t, t')$ (axes: $t$, $t'$)]
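
Drawing zero-mean Gaussian process samples like those in the figure reduces to sampling from a multivariate Gaussian whose covariance matrix is the covariance function evaluated on a grid. The sketch below assumes an exponentiated quadratic covariance with an arbitrary lengthscale; the slides do not say which covariance function was used, so `exp_quad_cov` and its parameters are illustrative choices.

```python
import numpy as np


def exp_quad_cov(t, t_prime, variance=1.0, lengthscale=2.0):
    """Exponentiated quadratic covariance c(t, t') (an assumed choice)."""
    diff = t[:, None] - t_prime[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 25.0, 100)      # input grid
C = exp_quad_cov(t, t)               # covariance matrix on the grid
jitter = 1e-6 * np.eye(len(t))       # small diagonal term for numerical stability
L = np.linalg.cholesky(C + jitter)   # C + jitter = L @ L.T

# Each column is one zero-mean sample: y = L z with z ~ N(0, I).
samples = L @ rng.standard_normal((len(t), 5))
```

Shortening `lengthscale` gives more rapidly varying samples, while `variance` scales their amplitude.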

slide-12
SLIDE 12

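Gaussian Processes

[Figure labels: observations $(x_1, y_1)$ and $(x_2, y_2)$; likelihoods $p(y_1 \mid f_1)$ and $p(y_2 \mid f_2)$; prior $p(\mathbf{f} \mid \mathbf{x})$; posterior $p(\mathbf{f} \mid \mathbf{y}, \mathbf{x})$]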

slide-13
SLIDE 13

Results

$\frac{\mathrm{d}p_{TF}(t)}{\mathrm{d}t} = c_f m_{TF}(t) - d_f p_{TF}(t)$

$\frac{\mathrm{d}m_i(t)}{\mathrm{d}t} = c_i p_{TF}(t) - d_i m_i(t)$

[Figure panels: $p_{TF}(t)$, $m_{TF}(t)$, $m_i(t)$]

TPAMI, 2 PNAS papers, 2 Comp Bio

slide-14
SLIDE 14
slide-15
SLIDE 15

MATLAB Demo

  • demo_2016_04_28_amazon.m
slide-16
SLIDE 16

Further Challenge

  • This model inter-relates different functions with mechanistic understanding.
  • What if you need to inter-relate across different modalities of data at different scales?
  • E.g. biopsy images + genetic test + mammogram for breast cancer diagnostics.

slide-17
SLIDE 17

The Data are Not Enough

  • Four pillars:
  • Deterministic/Stochastic
  • Mechanistic/Empirical
  • Goal: model complex phenomena over time
  • Problem:
  • Mechanistic models are often inaccurate
  • Data are often not rich enough for an empirical approach
  • Question 2: How do we formulate the right representations to integrate different data modalities?

slide-18
SLIDE 18

Classical Latent Variables

x y

slide-19
SLIDE 19

Classical Treatment

  • Assume a priori that $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
  • Relate linearly to y: $\mathbf{y} = \mathbf{W}\mathbf{x} + \boldsymbol{\epsilon}$
  • Framework covers many classical models: PCA, Factor Analysis, ICA
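
A short numerical check can make the linear latent variable model concrete. The sketch below samples latent points from a standard Gaussian, maps them linearly, adds noise, and compares the empirical covariance of y with the implied covariance W Wᵀ + σ²I. The dimensions, the random `W`, and the isotropic Gaussian noise (the probabilistic PCA case) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
q, d, n = 2, 5, 100_000          # latent dim, data dim, sample count (illustrative)
W = rng.standard_normal((d, q))  # hypothetical mapping matrix
noise_var = 0.1                  # assumed isotropic noise variance

X = rng.standard_normal((n, q))                         # x ~ N(0, I)
eps = np.sqrt(noise_var) * rng.standard_normal((n, d))  # Gaussian noise
Y = X @ W.T + eps                                       # y = W x + eps, one row per sample

# Marginalising x gives y ~ N(0, W W^T + noise_var * I).
model_cov = W @ W.T + noise_var * np.eye(d)
sample_cov = np.cov(Y, rowvar=False)
max_err = np.abs(sample_cov - model_cov).max()
print(f"largest covariance discrepancy: {max_err:.4f}")
```

Swapping the isotropic noise for a diagonal covariance with a separate variance per output dimension would correspond to factor analysis instead.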

slide-20
SLIDE 20

Render Gaussian Non-Gaussian

$y = f(x)$ [Figure axes: $x$, $y$]

slide-21
SLIDE 21

Use Abstraction for Complex Systems

High Level Ideas Stratification of Concepts Low Level Mechanisms

slide-22
SLIDE 22
slide-23
SLIDE 23

Biology and Health

Health ? ? ? Molecular Biology

slide-24
SLIDE 24

Neuroscience

Behaviour ? ? ? Neuron Firing

slide-25
SLIDE 25

$g(x) = f_9(f_8(f_7(f_6(\cdots f_1(x)))))$

[Figure labels: $f_1(x)$, $f_2(\cdot)$, $f_3(\cdot)$, $f_4(\cdot)$, $f_5(\cdot)$, $f_6(\cdot)$, $f_7(\cdot)$, $f_8(\cdot)$, $f_9(\cdot)$]

slide-26
SLIDE 26

Stochastic Process Composition

  • A new approach to forming stochastic processes
  • Mathematical composition: $y(x) = f_1(f_2(f_3(x)))$
  • Properties of the resulting process are highly non-Gaussian
  • Allows for a hierarchical structured form of model.
  • Learning in models of this type has become known as deep learning.
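
To illustrate composition, one can draw a sample from a three-layer stack by feeding each Gaussian process sample in as the input locations of the next. This is only a sketch of sampling from the prior, not of learning in such models, and the covariance function, lengthscale, and jitter are assumptions chosen for illustration.

```python
import numpy as np


def exp_quad_cov(a, b, lengthscale=1.0):
    """Exponentiated quadratic covariance (an assumed choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_gp_at(inputs, rng, jitter=1e-6):
    """Draw one zero-mean GP sample evaluated at the given 1-D inputs."""
    K = exp_quad_cov(inputs, inputs) + jitter * np.eye(len(inputs))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(inputs))


rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 200)

h3 = sample_gp_at(x, rng)   # innermost function evaluated at x
h2 = sample_gp_at(h3, rng)  # middle function evaluated at the inner outputs
y = sample_gp_at(h2, rng)   # outermost function: the composed sample
```

Each layer on its own is Gaussian, but `y` viewed as a function of `x` is not, which is the non-Gaussian behaviour the slide refers to.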

slide-27
SLIDE 27

(200 iterations)

slide-28
SLIDE 28

(converged)

slide-29
SLIDE 29
slide-30
SLIDE 30

2

slide-31
SLIDE 31

3

slide-32
SLIDE 32

model             MSE (train)   MSE (test)
mlp (200 iters)         108.5       1185.1
mlp (converged)          24.0       1338.2
gp                       59.2       1095.4
deep gp (2)             146.2        833.7
deep gp (3)             182.5        843.6

One hundred hidden nodes, one hundred inducing points

slide-33
SLIDE 33

Regression

data set    n     p   GP          Sparse GP   Deep GP
housing     506   13  2.78±0.54   2.77±0.60   2.69±0.49
redwine     588   11  0.72±0.06   0.62±0.04   0.62±0.04
energy1     768   8   0.48±0.07   0.50±0.07   0.49±0.07
energy2     768   8   0.59±0.08   1.66±0.21   1.39±0.49
concrete    1030  8   5.26±0.67   5.81±0.62   5.66±0.62

slide-34
SLIDE 34

Bayesian Optimization

  • Check http://sheffieldml.github.io/GPyOpt/

slide-35
SLIDE 35
slide-36
SLIDE 36

Use Abstraction for Complex Systems

High Level Ideas Stratification of Concepts Low Level Mechanisms

slide-37
SLIDE 37

Example: Motion Capture Modelling

slide-38
SLIDE 38

MATLAB Demo

  • demo_2016_04_28_amazon.m
slide-39
SLIDE 39

Modelling Digits

slide-40
SLIDE 40

MATLAB Demo

  • demo_2016_04_28_amazon.m
slide-41
SLIDE 41
slide-42
SLIDE 42

Numerical Issues

slide-43
SLIDE 43

Health

  • Complex system
  • Scarce data
  • Different modalities
  • Poor understanding of mechanism
  • Large scale

[Figure labels: gene expression, clinical notes, biopsy, X-ray, genotype, epigenotype, environment]

State of health, Organ states, Cell states

PLoS Comp Bio, Nature Communications

[Figure labels: survival analysis, clinical tests, treatment, biopsy, X-ray]

slide-44
SLIDE 44

To Find Out More

  • Gaussian Process Summer School
  • 12th-15th September 2016 in Sheffield
  • This year in parallel with/themed as a UQ orientated school (co-organisation with Rich Wilkinson)
  • Occurring alongside ENBIS Meeting
  • http://gpss.cc/
slide-45
SLIDE 45

Future

  • Methodology
  • Deep GPs (also current)
  • Latent Force Models (current but dormant)
  • Latent Action Models and Stochastic Optimal Control (new)
  • Probabilistic Geometries (starting)
  • Exemplar Applications
  • Health and Biology (existing)
  • Developing world (existing)
  • Robotics at different scales (starting)
  • Perception: vision (dormant) haptic (new)
slide-46
SLIDE 46

Summary

  • Complex systems:
  • ‘big data’ is too ‘small’.
  • The data are not enough.
  • Need data efficient methods
  • http://www.theguardian.com/media-network/2016/jan/28/google-ai-go-grandmaster-real-winner-deepmind

  • Solutions:
  • Hybrid mechanistic-empirical models
  • Structured models for automated data assimilation
slide-47
SLIDE 47

Thank you

Neil Lawrence http://inverseprobability.com @lawrennd

slide-48
SLIDE 48

The Digital Oligarchy

  • Response to concentration of power with data
  • CitizenMe
  • London-based start-up
  • User-centric data modelling
  • New challenges in ML
  • Integration of ML, systems, cryptography.

slide-49
SLIDE 49

Open Data Science and Africa

Challenge

  • “Whole pipeline challenge”
  • Make software available
  • Teach summer schools
  • Support local meetings
  • Publicity in the Guardian
  • Opportunities to deploy pipeline solution

slide-50
SLIDE 50

Disease Incidence for Malaria

slide-51
SLIDE 51

Uganda

  • Spatial models of disease
slide-52
SLIDE 52

http://pulselabkampala.ug/hmis/

Deployed with UN Global Pulse Lab