Machine Learning for Environmental Grand Challenges, Shakir Mohamed



slide-1
SLIDE 1

Machine Learning for Environmental Grand Challenges

Shakir Mohamed

Research Scientist, DeepMind

@shakir_za #AI4ER Cambridge

slide-2
SLIDE 2 Shakir Mohamed

Principles to Products

Principles: Probability Theory, Bayesian Analysis, Hypothesis Testing, Estimation Theory, Asymptotics

Reasoning: Uncertainty, Information Gain, Causality, Information, Prediction, Planning, Explanation, Rapid Learning, World Simulation, Objects and Relations

Applications: Advancing Science, Assistive Technology, Climate and Energy, Healthcare, Fairness and Safety, Autonomous systems
slide-3
SLIDE 3 Shakir Mohamed

Statistical Operations

  • Modelling
  • Estimation and Learning
  • Hypothesis Testing
  • Experimental Design

Data, Enumeration, Summarisation, Comparison, Inference
slide-4
SLIDE 4 Shakir Mohamed

Statistical Operations

Inference: What we can know about our data.

Decision-making: What we can do with our data.

Data, Enumeration, Summarisation, Comparison, Inference
slide-5
SLIDE 5

Part I: Pathways in Machine Learning


slide-6
SLIDE 6 Shakir Mohamed

On Models

Model: Description of the world, of data, of potential scenarios, of processes.

Most models in machine learning are probabilistic. Probabilistic models let you learn probability distributions of data.

Example variables: Peak hour, Bad Weather, Accident, Traffic Jam, Sirens

A probabilistic model writes out these descriptions using the language of probability.
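The traffic example can be written out as a small discrete probabilistic model. The sketch below is illustrative only: the dependency structure (peak hour, bad weather and accident as causes of a traffic jam, and accident as the cause of sirens) and every probability value are assumptions, not taken from the slide.

```python
from itertools import product

# Assumed prior probabilities of the root causes (hypothetical values).
P_PEAK = 0.3
P_WEATHER = 0.2
P_ACCIDENT = 0.05

def p_jam(peak, weather, accident):
    """Assumed P(traffic jam | causes): a noisy-OR over the active causes."""
    p_no_jam = 0.95  # small chance of a jam even when no cause is active
    if peak:
        p_no_jam *= 0.4
    if weather:
        p_no_jam *= 0.6
    if accident:
        p_no_jam *= 0.1
    return 1.0 - p_no_jam

def p_sirens(accident):
    """Assumed P(sirens | accident)."""
    return 0.9 if accident else 0.02

def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(peak, weather, accident, jam, sirens):
    """Joint probability as a product of the local conditional distributions."""
    return (bernoulli(P_PEAK, peak)
            * bernoulli(P_WEATHER, weather)
            * bernoulli(P_ACCIDENT, accident)
            * bernoulli(p_jam(peak, weather, accident), jam)
            * bernoulli(p_sirens(accident), sirens))

def probability(query, evidence=None):
    """P(query | evidence) by brute-force enumeration of all 2^5 states."""
    evidence = evidence or {}
    names = ["peak", "weather", "accident", "jam", "sirens"]
    num = den = 0.0
    for values in product([False, True], repeat=5):
        state = dict(zip(names, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(**state)
        den += p
        if all(state[k] == v for k, v in query.items()):
            num += p
    return num / den

p_accident_given_sirens = probability({"accident": True}, {"sirens": True})
p_jam_marginal = probability({"jam": True})
```

Enumeration is only feasible here because the model has five binary variables; larger models need the approximate inference methods discussed in the rest of the deck.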
slide-7
SLIDE 7

Learning Principles

Statistical Inference (Direct and Indirect):

  • Laplace approximation
  • Maximum Likelihood
  • Maximum a posteriori
  • Cavity Methods
  • Integrated Nested Laplace Approximation
  • Expectation Maximisation
  • Markov chain Monte Carlo
  • Variational Inference
  • Sequential Monte Carlo
  • Noise Contrastive
  • Two Sample Comparison
  • Transportation methods
  • Approximate Bayesian Computation
  • Method of Moments
  • Max Mean Discrepancy

slide-8
SLIDE 8 Shakir Mohamed

Model Evidence

Model evidence (or marginal likelihood, partition function): Integrating out any global and local variables enables model scoring, comparison, selection, moment estimation, normalisation, posterior computation and prediction. The integral is intractable in general and requires approximation.

[Figure: latent variable z, observation x, function f(z), and densities p(z) and q(z)]

Learning principle: Model Evidence

$p(x) = \int p(x, z) \, dz$

Basic idea: Transform the integral into an expectation over a simple, known distribution.
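One way to make the basic idea concrete is importance sampling, which rewrites the evidence integral as an expectation under a sampling distribution q(z). The sketch below assumes a toy model chosen only because its evidence has a closed form (p(z) = N(0, 1), p(x | z) = N(z, 1), so p(x) = N(0, 2)); the model, the choice of q and all function names are illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def evidence_estimate(x, num_samples=50_000, q_mean=0.0, q_std=1.5, seed=0):
    """Importance-sampling estimate of p(x) = integral of p(x | z) p(z) dz.

    Assumed toy model: p(z) = N(0, 1) and p(x | z) = N(z, 1).
    q(z) = N(q_mean, q_std^2) is the simple, known sampling distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(q_mean, q_std)
        # Importance weight p(z) / q(z) corrects for sampling from q.
        weight = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, q_mean, q_std)
        total += normal_pdf(x, z, 1.0) * weight
    return total / num_samples

x_obs = 1.0
estimate = evidence_estimate(x_obs)
exact = normal_pdf(x_obs, 0.0, math.sqrt(2.0))  # closed form for this toy model
```

The sampling distribution is deliberately wider than the prior so that the importance weights stay bounded; a poorly matched q would give a much noisier estimate.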
slide-9
SLIDE 9 Shakir Mohamed

Variational Methods

Deterministic approximation procedures with bounds on probabilities of interest. Fit the variational parameters.

[Figure: the approximation class $q_\phi(z)$ and the true posterior, separated by $\mathrm{KL}[q(z|y) \,\|\, p(z|y)]$]
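The bound in question is commonly written as the evidence lower bound. The identity below is the standard decomposition, stated in the slide's notation with y as the observed data; writing the variational distribution as $q_\phi(z|y)$ and naming the bound $\mathcal{F}$ are assumptions made here for illustration.

```latex
\log p(y)
  = \underbrace{\mathbb{E}_{q_\phi(z|y)}\left[ \log p(y, z) - \log q_\phi(z|y) \right]}_{\mathcal{F}(y, \phi)}
  + \mathrm{KL}\left[ q_\phi(z|y) \,\|\, p(z|y) \right]
```

Because the KL term is non-negative, $\mathcal{F}(y, \phi) \le \log p(y)$, and fitting the variational parameters $\phi$ to maximise $\mathcal{F}$ is equivalent to minimising the KL divergence to the true posterior.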
slide-10
SLIDE 10 Shakir Mohamed

Learning by Comparison

Interest is not in estimating the marginal probabilities, only in how they are related.

We compare the estimated distribution q(x) to the true distribution p*(x) using samples. Basic idea: Transform into learning a model of the density ratio.

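[Figure: latent z mapped through f(z) to x, comparing the true distribution p*(x) with the model distribution q(x)]

Learning principle: Two-sample tests

Ratios: $p(x^{(1)}) / p(x^{(2)})$

$\frac{p^*(x)}{q(x)} = 1 \iff p^*(x) = q(x)$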
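One common way to learn a density ratio from samples alone is class-probability estimation: train a classifier to tell samples of p*(x) from samples of q(x), then read the ratio off its output. The sketch below assumes two unit-variance Gaussians so that the exact log ratio (x - 0.5) is known; the distributions, the linear form of the log ratio and the function names are illustrative assumptions.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_log_ratio(real, fake, steps=300, lr=0.5):
    """Fit log r(x) = w * x + b by logistic regression (real = 1, fake = 0).

    With equally many samples from each distribution, the optimal
    classifier D(x) satisfies D / (1 - D) = p*(x) / q(x), so the logit
    of the classifier is an estimate of the log density ratio.
    """
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, label in data:
            err = sigmoid(w * x + b) - label  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b

rng = random.Random(0)
n = 2000
real = [rng.gauss(1.0, 1.0) for _ in range(n)]  # samples from p*(x) = N(1, 1)
fake = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples from q(x) = N(0, 1)

w, b = fit_log_ratio(real, fake)
# For these two Gaussians the exact log ratio is log p*(x) / q(x) = x - 0.5,
# so the estimated ratio should be close to 1 at x = 0.5, where p* = q.
ratio_at_half = math.exp(w * 0.5 + b)
```

Neither density is ever evaluated during fitting; only samples are used, which is what makes this approach usable when the model is defined implicitly by a sampling procedure.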
slide-11
SLIDE 11 Shakir Mohamed

Estimation by Comparison

Density Estimation by Comparison

$H_0 : p^* = q_\theta$ vs. $p^* \neq q_\theta$

  • Density Difference: $r_\phi = p^* - q_\theta$
  • Density Ratio: $r_\phi = p^* / q_\theta$

Approaches: f-Divergence, Class Probability Estimation, Bregman Divergence, Moment Matching, Max Mean Discrepancy

$B_f[r^* \,\|\, r]$

$f(u) = u \log u - (u + 1) \log(u + 1)$

Mixtures with identical moments

$\mathcal{L}(\theta, \phi)$

Mohamed and Lakshminarayanan (2016)

Two steps

  • 1. Use a hypothesis test or comparison to obtain some model that tells how data from our model differs from observed data.
  • 2. Adjust the model to better match the data distribution using the comparison model from step 1.

slide-12
SLIDE 12 Shakir Mohamed

Algorithms for Generative Models

  • Fully-observed auto-regressive models: PixelCNN and Wavenet
  • Variational Autoencoders: Prescribed latent variable models and variational inference
  • Generative Adversarial Networks: Implicit latent variable models and estimation-by-comparison

[Figure, Variational Autoencoder: Data x; Inference Network q(z | x), z ~ q(z | x); Model p(x | z), x ~ p(x | z)]

[Figure, Generative Adversarial Network: Generator, z, x_real, x_gen]

slide-13
SLIDE 13 Shakir Mohamed

Stochastic Optimisation

Common gradient problem

$\nabla_\phi \mathbb{E}_{q_\phi(z)}[f_\theta(z)] = \nabla_\phi \int q_\phi(z) f_\theta(z) \, dz$

  • 1. Pathwise estimator: Differentiate the function f(z)
  • 2. Score-function estimator: Differentiate the density q(z|x)

Typical problem areas

  • Sensitivity analysis
  • Generative models and inference
  • Reinforcement learning and control
  • Operations research and inventory control
  • Monte Carlo simulation
  • Finance and asset pricing

Mohamed et al. (2019)
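The two estimators on this slide can be compared on a problem with a known answer. The sketch below assumes q(z) = N(mu, 1) and f(z) = z^2, for which the exact gradient with respect to mu is 2 * mu; the test function, the Gaussian and the function names are illustrative assumptions.

```python
import random

def f(z):
    return z * z

def df_dz(z):
    return 2.0 * z

def pathwise_gradient(mu, num_samples, rng):
    """Differentiate f through the sampling path z = mu + eps, eps ~ N(0, 1)."""
    total = 0.0
    for _ in range(num_samples):
        z = mu + rng.gauss(0.0, 1.0)
        total += df_dz(z)  # chain rule with dz/dmu = 1
    return total / num_samples

def score_function_gradient(mu, num_samples, rng):
    """Differentiate the density: average f(z) * d/dmu log N(z; mu, 1) = f(z) * (z - mu)."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mu, 1.0)
        total += f(z) * (z - mu)
    return total / num_samples

mu = 1.5
rng = random.Random(0)
grad_pathwise = pathwise_gradient(mu, 100_000, rng)
grad_score = score_function_gradient(mu, 100_000, rng)
exact = 2.0 * mu  # because E[z^2] = mu^2 + 1
```

Both estimates are unbiased, but in this example the score-function estimate has a much larger variance, which is why it typically needs more samples or a variance-reduction technique. Its advantage is that it does not require f to be differentiable.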
slide-14
SLIDE 14 Shakir Mohamed

Progress in Generative Models

[Figure: Visual Quality of Independent Samples on ImageNet, for VAE, DRAW, Conv-DRAW, Pixel RNN, and Conv. Generative Adversarial Network]
slide-15
SLIDE 15 Shakir Mohamed

Perception-Action Loops

[Figure panels: Computational perception-action loop; Biological perception-action loop]

slide-16
SLIDE 16 Shakir Mohamed

Environment Simulation

[Figure: environment simulator unrolled over time, with data x_{t-1}, x_t, x_{t+1}; states s_{t-1}, s_t, s_{t+1}; actions a_{t-1}, a_t, a_{t+1}; and m_{t-1}, m_t, m_{t+1}]

Action-conditional and latent-only transitions. Grounded representations in actions and observations, using simulation to support grounding.

Chiappa et al. (2017)
slide-17
SLIDE 17

Shakir Mohamed

slide-18
SLIDE 18 Shakir Mohamed

Intrinsic Motivation

Equip agents with mechanisms to produce and learn from internal rewards that can guide behaviour, when external rewards are absent.

[Figure: Escaping a Predator; panel label: True MI]

Mohamed and Rezende (2015)
slide-19
SLIDE 19 Mnih et al. (2015)
slide-20
SLIDE 20 Shakir Mohamed

AlphaZero

  • Fully general
  • No opening book
  • No endgame database
  • No heuristics
  • Starts from random

All learned without any reference to past human games.

Generalising AlphaGo to any 2-player game

Silver et al. (2018)
slide-21
SLIDE 21 Shakir Mohamed

Applications in Healthcare

  • 1. Better clinical outcomes
  • 2. Enhance patient and clinician experience
  • 3. Reduce costs

[Figure: model timeline with labels: Outpatient events, Admission, Optional longer history, 48h history, Data used by the model, New entry (6h), Model, AKI Predicted at 24h / 48h / 72h / Time unknown]
slide-22
SLIDE 22 Shakir Mohamed

Predicting Organ Failure

Make predictions of acute kidney injury (AKI) up to 48 hours ahead, providing a window for meaningful action. For the most severe cases, the model can detect up to 90% of cases.

Tomasev et al. (2019)
slide-23
SLIDE 23 Shakir Mohamed

Critical Practice for ML

Consider the uses of our models.

What are the dual uses of generative models? How do we think critically about these uses, and educate, regulate and co-design these tools?

Bansal et al. (2018)
slide-24
SLIDE 24

Dual-uses and Value Alignment

Future of Life Institute, Value alignment map

slide-25
SLIDE 25 Shakir Mohamed

Neutrality and Universality

Neutrality Traps

  • The Portability Trap: Failure to understand how repurposing algorithmic solutions designed for one social context may be inaccurate or do harm when applied to a different context.
  • The Formalism Trap: Failure to account for the full meaning of social concepts such as fairness, which cannot be resolved through mathematical formalisms.
  • The Ripple Effect Trap: Failure to understand how the insertion of technology into an existing social system changes the behaviours and embedded values of the pre-existing system.
  • The Solutionism Trap: Failure to recognise the possibility that the best solution to a problem may not involve technology.

Universality

‘A mono-cultural view of ethics conceives itself as the only valid one. In order to avoid this kind of ethical chauvinism and colonialism it is necessary that transcultural ethics arise from an intercultural dialogue instead of thinking of itself as universal without noticing its own cultural bias.’ Capurro, 2004
slide-26
SLIDE 26

Part II: AI for Environmental Risk


slide-27
SLIDE 27
slide-28
SLIDE 28 Shakir Mohamed

Extreme Weather Events

Given CAM5 outputs of a tropical cyclone and its initial position, track its trajectory.

Segment Tropical Cyclones, Atmospheric Rivers from background

Tools for data assimilation, analysis of NWP simulations, and new types of decision support.

Mudigonda et al. (2017)
slide-29
SLIDE 29 Shakir Mohamed

Hybrid Physical Process Modelling

Predict future sea surface temperature (SST) from previous synthetic SST data from NEMO (Nucleus for European Modeling of the Ocean)

Physical Model: Advection-Diffusion Equation

Solution

Key Idea: Predict w

de Bézenac et al. (2017)
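For reference, the advection-diffusion equation is commonly written as below. The notation is an assumption, since the slide's own equation is not preserved in this transcript: $I(x, t)$ denotes the temperature field, $w$ the motion (velocity) field that the model predicts, and $D$ a diffusion coefficient.

```latex
\frac{\partial I}{\partial t} + (w \cdot \nabla) I = D \nabla^2 I
```

Under this reading, a network that predicts $w$ from past SST frames can be combined with the known physical transport model to produce the next frame, rather than predicting the frame directly.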
slide-30
SLIDE 30 Shakir Mohamed

Solar Nowcasting

Predict solar irradiance, accounting for clouds.

  • Numerical weather models become out of date with respect to the most recent observations.
  • Solar irradiance is greatly affected by clouds; operational numerical weather models can’t resolve clouds.
  • Radiative transfer codes are some of the most computationally expensive parts of numerical weather models.

Kelly (2019), wikipedia.
slide-31
SLIDE 31 Shakir Mohamed

Energy Consumption

Dramatically increase efficiency of existing systems. Application to Google Data Centres.

slide-32
SLIDE 32 Shakir Mohamed

Data Centre Energy Usage

Data centres across the world use around 3% of the world’s electricity. Cooling energy is the largest non-server load (up to 40% of total energy usage).

State

  • Incoming IT load
  • Power meters
  • Pressure sensors
  • Temperature sensors
  • Water flow meters
  • Pump and fan speeds
  • Fault alarms
  • Weather conditions

Actions

  • Number of cooling towers
  • Number of chillers
  • Number of pumps
  • Temperature setpoints
  • Pressure setpoints
  • Flow setpoints
  • Valve positions

Over 1,200 state variables and 20 actions

slide-33
SLIDE 33 Shakir Mohamed

General Learning Framework for DC Operations

Inputs: State and Actions

Outputs: Power Usage Effectiveness, Temperature, Pressure

State inputs

  • Current IT load
  • Power meters
  • Pressure sensors
  • Temp sensors
  • Weather
  • Fan speeds

Actions

  • # active coolers
  • # chillers
  • Pumps on/off
  • Temp setpoints
  • Valve setpoints
  • Pressure setpoints
Gamble and Gao (2018)
slide-34
SLIDE 34

Every five minutes: generate recommendations, send to a human operator for implementation

[Diagram: Data Center Sensors, Processing, Human operator]

slide-35
SLIDE 35

ML Control On: 40% reduction in data center cooling energy
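As a rough worked example of what this figure could mean for a whole facility, the sketch below combines it with the cooling share quoted earlier in the deck (up to 40% of total energy usage). Treating cooling as exactly 40% of total energy is an assumption, so the result is an upper bound and not a reported number.

```python
def total_energy_saving(cooling_share, cooling_reduction):
    """Fraction of total facility energy saved when only cooling energy is reduced."""
    return cooling_share * cooling_reduction

# Assumed: cooling is 40% of total energy (the "up to" figure from the deck).
saving = total_energy_saving(cooling_share=0.40, cooling_reduction=0.40)
print(f"Total energy saving: {saving:.0%}")  # prints "Total energy saving: 16%"
```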

slide-36
SLIDE 36 Shakir Mohamed

System Insights

  • Spread the load across more equipment.
  • Local v. Global trade-offs. Higher flow is not always better. Reduced water flow to chillers in some weather conditions.
  • Shifting the loads. Learned to shift cooling load to components that were more or less efficient at different times of year.

slide-37
SLIDE 37

[Diagram: Data Center Sensors, Processing, Local data center system]

Recommendations are sent directly to the data centre, to be verified by the local controls system for safety before implementation.

After three quarters of operation, scaling it up and getting it into production using a safety-first automation approach

slide-38
SLIDE 38

Safety-first for direct AI control

  • Continuous monitoring
  • Automatic failover
  • Smooth transfer
  • Two-layer verification
  • Constant communication
  • Uncertainty estimation
  • Rules and heuristics as backup
  • Human in the loop

Gamble and Gao (2018)
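A few of these mechanisms (uncertainty estimation, two-layer verification, and rules as backup) can be sketched as a simple gate in front of the control system. This is a hypothetical illustration, not the deployed system: the class, the function names, the limit values and the uncertainty threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    setpoints: dict     # action name -> recommended value
    uncertainty: float  # the model's own uncertainty estimate for this recommendation

def verify(rec, limits, max_uncertainty):
    """Return True only if the recommendation is confident and within the given limits."""
    if rec.uncertainty > max_uncertainty:
        return False
    for name, value in rec.setpoints.items():
        if name not in limits:
            return False  # unknown actions are rejected rather than passed through
        low, high = limits[name]
        if not low <= value <= high:
            return False
    return True

def select_actions(rec, first_layer_limits, local_limits, fallback, max_uncertainty=0.2):
    """Two-layer check: use the ML recommendation only if both layers accept it."""
    if (verify(rec, first_layer_limits, max_uncertainty)
            and verify(rec, local_limits, max_uncertainty)):
        return rec.setpoints, "ml"
    return fallback, "rules"  # fall back to rule-based setpoints

# Hypothetical limits (degrees Celsius) and fallback setpoints.
first_layer_limits = {"temperature_setpoint": (16.0, 27.0)}
local_limits = {"temperature_setpoint": (18.0, 25.0)}
fallback = {"temperature_setpoint": 21.0}

safe = Recommendation({"temperature_setpoint": 22.5}, uncertainty=0.05)
risky = Recommendation({"temperature_setpoint": 26.0}, uncertainty=0.05)

safe_actions, safe_source = select_actions(safe, first_layer_limits, local_limits, fallback)
risky_actions, risky_source = select_actions(risky, first_layer_limits, local_limits, fallback)
```

In this example the second recommendation passes the first layer but violates the stricter local limits, so the rule-based fallback is used instead.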
slide-39
SLIDE 39 Shakir Mohamed

Managing Energy Generation

Improving the economics of wind energy to accelerate adoption.

  • The cost of turbines has plummeted, but wind is unpredictable and intermittent.
  • The unpredictability of renewable energy makes it less valuable than fossil fuel energy.
  • One strategy: train a system for predicting and scheduling wind energy.

slide-40
SLIDE 40

Applying ML algorithms to 700MW of Google’s wind farm portfolio.

slide-41
SLIDE 41

Inputs: Global numerical weather forecasts; Local weather observations

Outputs: Future wind power output (36 hours in advance)

[Figure: Wind Power: Predicted Output v Ground Truth]

Elkin and Witherspoon (2019)
slide-42
SLIDE 42

20% increase in economic value, compared to baseline of no time-based commitments to grid

slide-43
SLIDE 43
slide-44
SLIDE 44

Data, Enumeration, Summarisation, Comparison, Inference

Statistical Inference (Direct and Indirect): Laplace approximation, Maximum Likelihood, Maximum a posteriori, Cavity Methods, Integrated Nested Laplace Approximation, Expectation Maximisation, Markov chain Monte Carlo, Variational Inference, Sequential Monte Carlo, Noise Contrastive, Two Sample Comparison, Transportation methods, Approximate Bayesian Computation, Method of Moments, Max Mean Discrepancy
slide-45
SLIDE 45
References

  • Mohamed, S., & Lakshminarayanan, B. (2016). Learning in implicit generative models. arXiv preprint arXiv:1610.03483.
  • Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. Advances in Neural Information Processing Systems.
  • Mohamed, S., Rosca, M., Figurnov, M., & Mnih, A. (2019). Monte Carlo Gradient Estimation in Machine Learning. arXiv preprint arXiv:1906.10652.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
  • Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ... & Lillicrap, T. (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
  • Tomašev, N., et al. (2019). A clinically applicable approach to continuous prediction of future acute kidney injury. Nature, 572(7767), 116.
  • Bansal, A., Ma, S., Ramanan, D., & Sheikh, Y. (2018). Recycle-GAN: Unsupervised video retargeting. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 119-135).
  • Mudigonda, M., Kim, S., Mahesh, A., Kahou, S., Kashinath, K., Williams, D., ... & Prabhat, M. (2017). Segmenting and tracking extreme climate events using neural networks. In Deep Learning for Physical Sciences (DLPS) Workshop, held with NIPS Conference.
  • de Bézenac, E., Pajot, A., & Gallinari, P. Towards a Hybrid Approach to Physical Process Modeling.
  • Kelly, J. (2019). Open Climate Fix. https://openclimatefix.github.io/
  • Gamble, C., & Gao, J. Safety-first AI for autonomous data centre cooling and industrial control. https://deepmind.com/blog/article/safety-first-ai-autonomous-data-centre-cooling-and-industrial-control
  • Elkin, C., & Witherspoon, S. (2019). Machine learning can boost the value of wind energy. https://deepmind.com/blog/article/machine-learning-can-boost-value-wind-energy

slide-46
SLIDE 46

Machine Learning for Environmental Grand Challenges

With thanks to colleagues and the work of many others referenced here from our ML community.