Data-driven COVID-19 modeling Data-driven COVID-19 modeling 1 - - PowerPoint PPT Presentation

data driven covid 19 modeling data driven covid 19
SMART_READER_LITE
LIVE PREVIEW

Data-driven COVID-19 modeling Data-driven COVID-19 modeling 1 - - PowerPoint PPT Presentation

Data-driven COVID-19 modeling Data-driven COVID-19 modeling 1 Cyprien Neverov th August 28 , 2020 1 Student at IMT Mines Ales and intern at FAU Erlangen-Nurnberg under the supervision of Prof. Enrique Zuazua. Table of Contents Table of


slide-1
SLIDE 1

Data-driven COVID-19 modeling Data-driven COVID-19 modeling

Cyprien Neverov August 28 , 2020

Student at IMT Mines Ales and intern at FAU Erlangen-Nurnberg under the supervision of Prof. Enrique Zuazua. 1

th

1

slide-2
SLIDE 2

Table of Contents Table of Contents

  • 1. Data-driven system identication
  • 2. Modeling COVID-19
  • 3. Conclusion
slide-3
SLIDE 3
  • 1. Data-driven system identication
  • 1. Data-driven system identication

Identifying the dynamics from data is becoming a key challenge because: Data acquisition is getting cheaper Problems are getting more complex Computational power is cheaper

slide-4
SLIDE 4

Sparse identication of nonlinear dynamical systems Sparse identication of nonlinear dynamical systems

Approach proposed by S. Brunton in [1]. Uses sparse regression Relies on a set of candidate functions Expresses the dynamics as a function which is a linear combination of the candidate functions: where is the state of the system.

f (t) = f(x(t)) dx dt x

slide-5
SLIDE 5
  • 1. Make two matrices
  • 1. Make two matrices

Let's say that we have observed the system at and either observed or numerically computed its time derivative at those time points, then we can construct the two following matrices:

, , . . . , t1 t2 tm = and X = X ˙ ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ( )

dx dt t1

( )

dx dt t2

⋮ ( )

dx dt tm

⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ x( ) t1 x( ) t2 ⋮ x( ) tm ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥

slide-6
SLIDE 6
  • 2. Augment the state matrix
  • 2. Augment the state matrix

And then we can augment the matrix with the candidate functions this will yield :

X , , … , f1 f2 fp θ(X) θ(X) = ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ (x( )) f1 t1 (x( )) f1 t2 ⋮ (x( )) f1 tm (x( )) f2 t1 (x( )) f2 t2 ⋮ (x( )) f2 tm ⋯ ⋯ ⋱ ⋯ (x( )) fp t1 (x( )) fp t2 ⋮ (x( )) fp tm ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥

slide-7
SLIDE 7
  • 3. Solve the linear least squares
  • 3. Solve the linear least squares

Now we want to nd a matrix that is a solution to: in the least squares sense. The sparsity is achieved by running the optimization several times and gradually zeroing out the values that are under a cut-off value.

ξ = θ(X)ξ X ˙

slide-8
SLIDE 8

Discretized formulation Discretized formulation This algorithm also works in an iterative manner, when instead of we have : and then we seek to solve

X ˙ X2 = and X = X2 ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ x( ) t2 x( ) t3 ⋮ x( ) tm ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥ ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ x( ) t1 x( ) t2 ⋮ x( ) tm−1 ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥ = θ(X)ξ X2

slide-9
SLIDE 9

Simple example Simple example

Oscillator with a cubic nonlinearity: From now on we will consider only polynomial terms of the variables as candidate functions: .

x ˙ y ˙ = −0.1 + 2 x3 y3 = −2 − 0.1 x3 y3 1, x, y, , xy, , , … , x2 y2 x3 yp

slide-10
SLIDE 10

In [15]: plt.figure(dpi=100) plt.plot(cubic_oscillator[:, 0], label='x') plt.plot(cubic_oscillator[:, 1], label='y') plt.legend() plt.xlabel('time') plt.ylabel('state') plt.title('Cubic oscillator') plt.show()

slide-11
SLIDE 11

Simple example Simple example

In [27]: X_dot, X = make_targets(cubic_oscillator, derivative=(derivative:=False)) theta_X, _ = make_polynomials(X, max_degree=3) weights, _ = sparse_regression(theta_X, X_dot, cutoff=1e-3) if derivative: weights /= t_ode[1] show_weights(weights, derivative=derivative)

function 1.00577

  • 0.0013371

1.00572

  • 0.00816922
  • 0.0645619
  • 0.00932543
  • 0.00826491

0.0654499

  • 0.00740647

xk+1 yk+1 1 x y x2 xy y2 x3 y x2 xy2 y3

slide-12
SLIDE 12

2.

  • 2. Modeling COVID-19

Modeling COVID-19

The trajectories of cumulative cases:

slide-13
SLIDE 13

In [15]: countries_to_display = ['China', 'South Korea', 'France', 'Spain', 'Germany', 'Uruguay'] plt.figure(dpi=130) for country in countries_to_display: values = ds.cumulative(country) plt.plot(values, label=country) plt.legend() plt.xlabel('days') plt.ylabel('cumulative cases') plt.show()

slide-14
SLIDE 14

Single country trajectory Single country trajectory

slide-15
SLIDE 15

In [31]: country = 'France' data = ds.cumulative(country, rescaling=100000)[:100] X_dot, X = make_targets(data[..., np.newaxis], derivative=(derivative:=True)) theta_X, _ = make_polynomials(X, max_degree=5) weights, _ = sparse_regression(theta_X, X_dot, cutoff=1e-15) show_weights(weights, derivative=derivative) show_trajectory(data, weights)

function

  • 0.000900268

0.372269

  • 1.44715

2.3851

  • 1.59452

0.36289

x ˙ 1 x x2 x3 x4 x5

slide-16
SLIDE 16

Observations Observations

A single trajectory can be described by only two or three parameters. The algorithm is not robust. We did not have the complete evolutions for most of the countries at the time of writing. How can we use the information from countries that are more advanced into the epidemic to make predictions for countries at a more early stage ?

slide-17
SLIDE 17

Multi-country model Multi-country model

What if the evolution of the number of cases in several countries could be governed by a single formula? This would require additional information: Information about the countries, the cultures; Information about the measures taken by the governments to tackle the spread of the disease. We would like to nd a function such that for any country , at any day , we have: Where is this additional information.

f c t = f( , ) xt+1,c xt,c it,c i

slide-18
SLIDE 18

SINDy vs ARIMA SINDy vs ARIMA

seems to be the preferred tool of statisticians to make COVID forecasting: [6, 7]. We compared the performance of our techniques to an ARIMA model for forecasting purposes: ARIMA (https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)

slide-19
SLIDE 19

SINDy vs ARIMA SINDy vs ARIMA

But it doesn't always work as expected.

slide-20
SLIDE 20

SINDy vs ARIMA SINDy vs ARIMA

Percentage of best guesses Error on 1 week forecast horizon Error on 2 weeks forecast horizon

slide-21
SLIDE 21

Other approaches/experiments Other approaches/experiments

N-Beats and NNs SIR tting and recovering More precedent states ( , ) - autoregression. Rational basis functions (https://kipre.github.io/les/internship/reports/non- linear/nonlinear.html) Time dependent dynamics (https://kipre.github.io/les/internship/reports/covid_time/index.html) Control measures (https://kipre.github.io/les/internship/reports/covid_control/report.html)

xn−1 xn−2

slide-22
SLIDE 22

Conclusion Conclusion

System identication is challenging. In simple settings the SINDy algorithm learns well. In complex situations not so much. It was quite early to work on data-driven approaches. It is really sensitive to small hyperparameters changes. Sensitive to small data changes. Easily overts. Random events had a big impact on the evolutions.

slide-23
SLIDE 23

Other tasks Other tasks

  • a small program to transform

GASLIB .net les to a MATLAB .mat. Written in C++.

  • using the

implemented algorithms to deploy a service through a REST API for identifying systems from data. net2mat (https://github.com/Kipre/net2mat) System identication as a service (https://github.com/Kipre/les/tree/master/internship/siaas)

slide-24
SLIDE 24

References References

[1] Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. 2016. “Discovering Governing Equations from Data by Sparse Identication of Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 113 (15): 3932–7. . [2] Hale, Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira. 2020. “Oxford COVID-19 Government Response Tracker.” Blavatnik School of Government. . [3] “Understanding the Coronavirus (COVID-19) Pandemic Through Data. World Bank.” n.d. . [4] Kermack, William Ogilvy, A. G. McKendrick, and Gilbert Thomas Walker. 1997. “A Contribution to the Mathematical Theory of Epidemics.” Proc. R. Soc. Lond. 12: 700–721. . [5] Andreas Kergassner, Christian Burkhardt, Dorothee Lippold, Sarah Nistler, Matthias Kergassner, Paul Steinmann, Dominik Budday, Silvia Budday. 2020. “Meso-scale modeling

  • f COVID-19 spatio-temporal outbreak dynamics in Germany” medRxiv

2020.06.10.20126771; doi https://doi.org/10.1073/pnas.1517384113 (https://doi.org/10.1073/pnas.1517384113) https://github.com/OxCGRT/covid-policy-tracker/ (https://github.com/OxCGRT/covid- policy-tracker/) http://datatopics.worldbank.org/universal-health-coverage/covid19/ (http://datatopics.worldbank.org/universal-health-coverage/covid19/) https://doi.org/10.1098/rspa.1927.0118 (https://doi.org/10.1098/rspa.1927.0118) https://doi.org/10.1101/2020.06.10.20126771

slide-25
SLIDE 25

[6] Guorong Ding, Xinru Li, Yang Shen, Jiao Fan. 2020. “Brief Analysis of the ARIMA model

  • n the COVID-19 in Italy” medRxiv 2020.04.08.20058636; doi:

[7] Lut Bayyurt, Burcu Bayyurt. 2020. “Forecasting of COVID-19 Cases and Deaths Using ARIMA Models” medRxiv 2020.04.17.20069237; doi: [8] Keimer, Alexander & Pug, Lukas. (2020). “Modeling infectious diseases using integro- differential equations: Optimal control strategies for policy decisions and Applications in COVID-19”. 10.13140/RG.2.2.10845.44000. https://doi.org/10.1101/2020.04.08.20058636 https://doi.org/10.1101/2020.04.17.20069237 link (https://www.researchgate.net/publication/341265820_Modeling_infectious_diseases_usin differential_equations_Optimal_control_strategies_for_policy_decisions_and_Applications_in 19?channel=doi&linkId=5eb6577f299bf1287f77ed58&showFulltext=true)

slide-26
SLIDE 26

Appendix Appendix

slide-27
SLIDE 27

Quick introduction to epidemiology Quick introduction to epidemiology

The most basic compartmental model is the SIR (Susceptible, Infected, Recovered) introduced in [4] and is governed by the following dynamics: Where is the total population, and are the parameters of the disease, is called .

S ˙ I ˙ R ˙ = − βIS N = − γI βIS N = γI N β γ

β γ

r0

slide-28
SLIDE 28

Examples for dierent Examples for dierent s

constant_r0 : piece_r0 : exp_r0 :

r0

x ↦ 2 x ↦ ⎧ ⎩ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 2.5, 1, 0.8, 1.1, 1, if x < 10 if x < 55 if x < 90 if x < 130 else x ↦ 20 × + ϵ e−0.1x

slide-29
SLIDE 29

In [9]: integrate_sir(constant_r0)

slide-30
SLIDE 30

In practice In practice

Proposed by [5] With

= S ˙k = I ˙k = Q ˙k = R ˙k = D ˙ k − βIkSk + βIkSk − ∑

l nc

β ~

klSkIl

+ ∑

l nc

β ~

klSkIl −

γ1 ω − 1 Ik + γ1 ω − 1 Ik −γ1Ik + I γ1 − γ2Qk + γ2Qk −γ2 μω 1 − μω Q +γ2 μω 1 − μω Q = β ~

kl

βCC βkβl − − − − √

N ξ

k N λ l

ax N ξ+λ

m

e−

rkl r

slide-31
SLIDE 31

Inuence of the cuto value Inuence of the cuto value

Choosing a cutoff value is not trivial.

slide-32
SLIDE 32

Information about the countries Information about the countries List of country indicators relevant to the pandemic by the World Bank [2], divided into 3 categories: Health: general information about health infrastructure, economics and mortality causes. Water & Sanitation: information about hygiene. Age & Population: demographic indicators.

slide-33
SLIDE 33

Information about the government measures Information about the government measures Social distancing measures had a huge impact on the development of the disease and we want to take them into account. We use a "Stringency Index" provided by a group of researchers from Oxford [3]. An aggregation of policy indicators: information on containment and closure policies, such as school closures and restrictions in movement. Should not be interpreted as effectiveness of the measures.

slide-34
SLIDE 34

In [14]: def plot_cases_and_stringency(country): country_data = ds.cumulative(country) fig, ax1 = plt.subplots(dpi=80) color = 'tab:red' ax1.set_xlabel('time') ax1.set_ylabel('Cases') ax1.plot(country_data[:-1], 'k.', label='cases') ax1.tick_params(axis='y') ax2 = ax1.twinx() color = 'tab:blue' ax2.set_ylabel('Stringency', color=color) ax2.plot(ds.stringency(country)[:-1], color=color) ax2.tick_params(axis='y', labelcolor=color) fig.tight_layout() plt.title(country) plt.show() widgets.interact(plot_cases_and_stringency, country=ds.all_ox_countries()[1:]);

slide-35
SLIDE 35

Fitting Fitting

Experimental setting: 4857 training examples (rows in the and matrices) and 1028 test examples for 105 and 26 countries respectively. 42 variables (state + stringency + indicators). Polynomial terms with a maximum degree of 3. We neglect the terms that are constant with respect to the state. This yields more than 800 polynomial terms. The number of polynomial terms is equal to where is the number of variables and the maximum degree.

X X2

(n+r−1)! r!(n−1)!

n r

slide-36
SLIDE 36

Putting it all together Putting it all together

Here is a description of the information that was used for the model:

Variable Explanation Nature The state of the system: the number of cases in the country The state, time-dependent Stringency index: how severe are the containment measures. (the control) Time-dependent Human development index Constant Total population Constant Population ages 65 and above (% of total) Constant Hospital beds (per 1,000 people) Constant ... ... ... Cause of death, by injury (% of total) Constant

We want to nd a general formula of the following form :

The country indexes were omitted for better readability.

xt i1,t i2 i3 i4 i5 i40

1

= f( , , , , , . . . , ) xt+1 xt i1,t i2 i3 i4 i40

1

slide-37
SLIDE 37

Study of the inuence of the delay Study of the inuence of the delay

It is obvious that the inuence of the delay in the dynamics of the pandemic is very important [8]. But we couldn't nd any delay that would give signicantly better results than no delay.

slide-38
SLIDE 38

Indicators Indicators

Hospital beds (per 1,000 people), Physicians (per 1,000 people), Nurses and midwives (per 1,000 people), UHC service coverage index, Human development index, Current health expenditure (% of GDP), Current health expenditure per capita (current US$), Current health expenditure per capita, PPP (current international $), Out-of-pocket expenditure (% of current health expenditure), Out-of-pocket expenditure per capita (current US$), Out-of-pocket expenditure per capita, PPP (current international $), Diabetes prevalence (% of population ages 20 to 79), Life expectancy at birth, female (years), Life expectancy at birth, total (years), Life expectancy at birth, male (years), Mortality rate, adult, female (per 1,000 female adults), Mortality rate, adult, male (per 1,000 male adults), Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%), Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70, female (%), Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70, male

slide-39
SLIDE 39

Importance of the indicators Importance of the indicators

Let the importance of a variable be the normalized sum of the absolute value of the coefcients of the polynomial terms where this variable appears.

slide-40
SLIDE 40

For a cuto value of For a cuto value of and sparsity of and sparsity of %

10−4

slide-41
SLIDE 41

For a cuto value of For a cuto value of and sparsity of and sparsity of %

1 98

slide-42
SLIDE 42

Logistic trajectories Logistic trajectories

Most of the trajectories have an evolution close to a logistic function: Where and are the parameters. Since the derivative of a logistic function is: we should be able to have some baseline results with polynomials of a maximum degree of 2.

x(t) = a 1 + e−b(t−c) a, b c (t) = x(t) × (1 − x(t))

dx dt

slide-43
SLIDE 43

Integrating the identied system Integrating the identied system

We can easily generate a trajectory with

  • r using the differential

formulation and solving the ODE. But synchronizing is difcult.

= θ( )ξ xk+1 xk = θ(x)ξ x ˙