Mining online data for public health surveillance


Mining online data for public health surveillance
Vasileios Lampos (a.k.a. Bill), Computer Science, University College London, @lampos

Structure:
Using online data for health applications
From web searches to syndromic surveillance
Google …


  1-3. GFT v.2 — Linear multivariate regression

Least squares: $\operatorname{argmin}_{\mathbf{w},\,\beta} \sum_{i=1}^{n} \left( \mathbf{x}_i \mathbf{w} + \beta - y_i \right)^2$

$\mathbf{X} \in \mathbb{R}^{n \times m}$: frequency of $m$ search queries for $n$ weeks
$\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, \dots, n\}$: … for week $i$
$\mathbf{y} \in \mathbb{R}^n$: ILI rates from CDC for $n$ weeks
$y_i \in \mathbb{R}$: … for week $i$
$\mathbf{w} \in \mathbb{R}^m$: weights for the $m$ search queries
$\beta \in \mathbb{R}$: intercept term

⚠ Least squares regression is not applicable here because we have very few training samples ($n$) but many features (search queries; $m$). Models derived from least squares will tend to overfit the data, resulting in bad solutions.
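To make the overfitting warning concrete, here is a minimal NumPy sketch of the least squares fit above on placeholder random data (all sizes and values are hypothetical). With far more queries than weeks ($m \gg n$) the system is underdetermined, which is exactly the failure mode the slide describes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 5_000            # n weeks of training data, m candidate queries
X = rng.random((n, m))       # placeholder query-frequency matrix
y = rng.random(n)            # placeholder weekly ILI rates

# Ordinary least squares; a column of ones models the intercept beta.
X1 = np.hstack([X, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, beta = coef[:-1], coef[-1]

# With m >> n the fit interpolates the training weeks exactly (zero
# residual) -- a symptom of the overfitting described on the slide.
print(np.allclose(X1 @ coef, y))   # True
```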

  4-8. GFT v.2 — Regularisation with elastic net

$\operatorname{argmin}_{\mathbf{w},\,\beta} \left( \sum_{i=1}^{n} \left( \mathbf{x}_i \mathbf{w} + \beta - y_i \right)^2 + \lambda_1 \sum_{j=1}^{m} |w_j| + \lambda_2 \sum_{j=1}^{m} w_j^2 \right)$

The first term is least squares; $\lambda_1, \lambda_2 \in \mathbb{R}^+$ are L1- and L2-norm regularisers for the weights. Under the L1 penalty, many weights will be set to zero!

Encourages sparse models (feature selection)
Handles collinear features (search queries)
Number of selected features is not limited to the number of samples ($n$)
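A minimal scikit-learn sketch of this step on placeholder data; note that scikit-learn's ElasticNet uses a rescaled (alpha, l1_ratio) parametrisation of the $(\lambda_1, \lambda_2)$ objective above, and the hyperparameter values here are arbitrary:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, m = 100, 5_000                 # few weeks, many candidate queries
X = rng.random((n, m))            # placeholder query-frequency matrix
y = rng.random(n)                 # placeholder ILI rates

# alpha scales the total penalty, l1_ratio splits it between L1 and L2.
model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000).fit(X, y)

selected = np.flatnonzero(model.coef_)   # the L1 part zeroes many weights
print(f"{selected.size} of {m} queries kept")
```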

  9. GFT v.2 — Feature selection

1st layer: Keep search queries whose frequency time series has a ≥ 0.5 Pearson correlation with the CDC ILI rates (in the training data)
2nd layer: Elastic net will assign weights equal to 0 to features (search queries) that are identified as statistically irrelevant to our task

μ (σ) number of queries selected across all training data sets:
  # queries: 49,708 | r ≥ 0.5: 937 (334) | GFT: 46 (39) | Elastic net: 278 (64)
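The first selection layer can be vectorised in a few lines. A sketch: the 0.5 threshold is from the slide, while the data shapes and variable names are assumptions:

```python
import numpy as np

def correlation_filter(X, y, threshold=0.5):
    """Keep queries whose frequency time series has Pearson correlation
    >= threshold with the ILI rates. X: (n weeks, m queries); y: (n,)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Column-wise Pearson correlation in one vectorised pass.
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.flatnonzero(r >= threshold)

# Usage sketch: kept = correlation_filter(X, y); X_filtered = X[:, kept]
```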

  10. GFT v.2 — Evaluation (1/2)

Target variable: $\mathbf{y} = y_1, \dots, y_N$; estimates: $\hat{\mathbf{y}} = \hat{y}_1, \dots, \hat{y}_N$. Performance:

Mean Squared Error: $\mathrm{MSE}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{N} \sum_{t=1}^{N} (\hat{y}_t - y_t)^2$
Mean Absolute Error: $\mathrm{MAE}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{N} \sum_{t=1}^{N} |\hat{y}_t - y_t|$
Mean Absolute Percentage of Error: $\mathrm{MAPE}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{\hat{y}_t - y_t}{y_t} \right|$
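The three metrics are a direct NumPy transcription of the formulas above:

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return np.mean((y_hat - y) ** 2)

def mae(y_hat, y):
    """Mean absolute error."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return np.mean(np.abs(y_hat - y))

def mape(y_hat, y):
    """Mean absolute percentage of error (assumes y_t != 0)."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return np.mean(np.abs((y_hat - y) / y))
```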

  11. GFT v.2 — Evaluation (2/2)

[Figure: US ILI rate over time (weekly, 2009-2013); CDC ILI rates vs elastic net estimates]

  12. GFT v.2 — Evaluation (2/2)

[Figure: US ILI rate over time (weekly, 2009-2013); CDC ILI rates vs elastic net and GFT estimates]

GFT: r = .89, MAE = 3.81·10⁻³, MAPE = 20.4%
Elastic net: r = .92, MAE = 2.60·10⁻³, MAPE = 11.9%

  13-17. GFT v.2 — Nonlinearities in the data

[Figures: scatter plots of US ILI rates (CDC) against the frequencies of the queries 'flu', 'flu medicine', 'how long is flu contagious', 'how to break a fever' and 'sore throat treatment', each with a linear fit; the raw data show a clearly nonlinear relationship]

  18-20. GFT v.2 — Gaussian Processes (1/4)

$f(\mathbf{x}) \sim \mathcal{GP}\left( m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}') \right)$, with $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^m$ and $f: \mathbb{R}^m \to \mathbb{R}$

A Gaussian Process (GP) learns a distribution over functions that can explain the data
Fully specified by a mean ($m$) and a covariance (kernel) function ($k$); we set $m(\mathbf{x}) = 0$ in our experiments
Collection of random variables, any finite number of which have a multivariate Gaussian distribution:

$\mathcal{N}(\mathbf{x} \,|\, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$

Inference: $\mathbf{f}_* \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$, $(\mathbf{K})_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$
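The "any finite collection is jointly Gaussian" definition translates directly into code: build the kernel matrix $\mathbf{K}$ on a grid of inputs and draw $\mathbf{f}_* \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$. A 1-D sketch with a squared-exponential kernel (hyperparameter values hypothetical):

```python
import numpy as np

def k_se(a, b, sigma=1.0, ell=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0, 10, 200)
K = k_se(x, x) + 1e-8 * np.eye(x.size)   # small jitter for stability

# Three function draws from the zero-mean GP prior: f* ~ N(0, K).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(x.size), K, size=3)
```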

  21. GFT v.2 — Gaussian Processes (2/4)

Common GP kernels (covariance functions):

Squared-exp (SE): $k(x, x') = \sigma_f^2 \exp\left( -\frac{(x - x')^2}{2\ell^2} \right)$ (local variation)
Periodic (Per): $k(x, x') = \sigma_f^2 \exp\left( -\frac{2 \sin^2\!\left( \pi (x - x') / p \right)}{\ell^2} \right)$ (repeating structure)
Linear (Lin): $k(x, x') = \sigma_f^2 (x - c)(x' - c)$ (linear functions)

[Figure: plot of each $k(x, x')$ and functions $f(x)$ sampled from the corresponding GP prior]

  22. GFT v.2 — Gaussian Processes (3/4)

Adding or multiplying GP kernels produces a new valid GP kernel:

Lin × Lin: quadratic functions
SE × Per: locally periodic
Lin × SE: increasing variation
Lin × Per: growing amplitude

[Figure: plots of the composed kernels and functions sampled from them]
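scikit-learn's kernel objects overload + and *, so the compositions on this slide can be written down directly (the length scales and periods below are arbitrary placeholders):

```python
from sklearn.gaussian_process.kernels import (
    RBF, ExpSineSquared, DotProduct, ConstantKernel as C)

lin = DotProduct()                      # Lin
se  = C(1.0) * RBF(length_scale=1.0)    # SE
per = ExpSineSquared(periodicity=3.0)   # Per

quadratic        = lin * lin    # Lin x Lin -> quadratic functions
locally_periodic = se * per     # SE  x Per -> locally periodic
growing_var      = lin * se     # Lin x SE  -> increasing variation
growing_amp      = lin * per    # Lin x Per -> growing amplitude
```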

  23-25. GFT v.2 — Gaussian Processes (4/4)

[Figures: (x, y) pairs with an obvious nonlinear relationship; a least squares (OLS) regression gives a poor solution; a GP with a sum of 2 kernels (periodic + squared exponential) fits the data well]
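A sketch of this experiment with scikit-learn: toy nonlinear (x, y) pairs, then a GP whose kernel is the sum of a periodic and a squared-exponential component plus a noise term. The toy data and hyperparameters are stand-ins, not the slide's actual data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(0)
x = np.linspace(0, 60, 120)[:, None]
y = 0.25 * x.ravel() + 3 * np.sin(x.ravel()) + rng.normal(0, 0.5, 120)

# Periodic + squared-exponential (+ noise), as on the slide.
kernel = (ExpSineSquared(length_scale=1.0, periodicity=6.0)
          + RBF(length_scale=20.0)
          + WhiteKernel(noise_level=0.25))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x, y)
y_mean, y_std = gp.predict(x, return_std=True)  # posterior mean + uncertainty
```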

  26. GFT v.2 — k-means and GP regression

Clustering queries selected by elastic net into C clusters with k-means
Clusters are determined by using cosine similarity as the distance metric (on query frequency time series)
Groups queries with similar topicality & usage patterns

$k(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{C} k_{\text{SE}}(\mathbf{x}_{c_i}, \mathbf{x}'_{c_i}) + \sigma^2_{\text{noise}} \cdot \delta(\mathbf{x}, \mathbf{x}')$

$k_{\text{SE}}(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left( -\frac{\| \mathbf{x} - \mathbf{x}' \|^2}{2\ell^2} \right)$, with $\mathbf{x} = \{\mathbf{x}_{c_1}, \dots, \mathbf{x}_{c_{10}}\}$ (10 clusters)
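One common way to run cosine-distance k-means (a plausible reading of this slide, though not necessarily the exact implementation used) is to L2-normalise each query's time series first, since Euclidean k-means on unit vectors orders points by cosine similarity:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
Q = rng.random((278, 200))   # placeholder: (queries kept by elastic net, weeks)

# On unit-norm rows, Euclidean distance is monotone in cosine similarity,
# so standard k-means approximates cosine-based clustering.
labels = KMeans(n_clusters=10, n_init=10,
                random_state=0).fit_predict(normalize(Q))
# Each cluster c_i then gets its own SE kernel term k_SE(x_{c_i}, x'_{c_i}).
```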

  27. GFT v.2 — Performance

[Figure: US ILI rate over time (weekly, 2009-2013); CDC ILI rates vs GP estimates]

  28. GFT v.2 — Performance

[Figure: US ILI rate over time (weekly, 2009-2013); CDC ILI rates vs GP and elastic net estimates]

Elastic net: r = .92, MAE = 2.60·10⁻³, MAPE = 11.9%
GP: r = .95, MAE = 2.21·10⁻³, MAPE = 10.8%

  29. GFT v.2 — Queries’ added value

$y_t = \underbrace{\sum_{i=1}^{p} \phi_i\, y_{t-i} + \sum_{i=1}^{J} \omega_i\, y_{t-52-i}}_{\text{AR and seasonal AR}} + \underbrace{\sum_{i=1}^{q} \theta_i\, \epsilon_{t-i} + \sum_{i=1}^{K} \nu_i\, \epsilon_{t-52-i}}_{\text{MA and seasonal MA}} + \underbrace{\sum_{i=1}^{D} w_i\, h_{t,i}}_{\text{regression}} + \epsilon_t$

Autoregression: combine CDC ILI rates from the previous week(s) with the ILI rate estimate from search queries for the current week
Various week lags explored (1, 2, …, 6 weeks)
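A sketch of this autoregressive combination using statsmodels' SARIMAX, with the query-based (GP) estimate as the exogenous regressor $h_t$; the data, model orders, and 1-week-lag setup here are placeholders:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
y = rng.random(300)                    # placeholder weekly CDC ILI rates
h = y + rng.normal(0, 0.02, 300)       # placeholder query-based estimates

# AR / seasonal-AR and MA / seasonal-MA terms (52-week season) plus the
# regression on h, mirroring the slide's model structure.
fit = SARIMAX(y[:-1], exog=h[:-1], order=(1, 0, 1),
              seasonal_order=(1, 0, 1, 52)).fit(disp=False)

# Nowcast: last week's CDC rate enters through the AR part, this week's
# search-query estimate through the exogenous term.
y_now = fit.forecast(steps=1, exog=h[-1:].reshape(1, 1))
```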

  30. GFT v.2 — Performance

  31. GFT v.2 — Performance

1-week lag for the CDC data:
  AR(CDC): r = .97, MAE = 1.87·10⁻³, MAPE = 8.2%
  AR(CDC,GP): r = .99, MAE = 1.05·10⁻³, MAPE = 5.7%
2-week lag for the CDC data:
  AR(CDC): r = .87, MAE = 3.36·10⁻³, MAPE = 14.3%
  AR(CDC,GP): r = .99, MAE = 1.35·10⁻³, MAPE = 7.3%
Search queries only:
  GP: r = .95, MAE = 2.21·10⁻³, MAPE = 10.8%

  32. GFT v.2 — Non-optimal feature selection

Queries irrelevant to flu are still retained, e.g. “nba injury report” or “muscle building supplements”
Feature selection is primarily based on correlation, then on a linear relationship
Introduce a semantic feature selection
 — enhance causal connections (implicitly)
 — circumvent the painful training of a classifier

  33. GFT v.3 — Word embeddings

Word embeddings are vectors of a certain dimensionality (usually from 50 to 1024) that represent words in a corpus
Derive these vectors by predicting contextual word occurrence in large corpora (word2vec) using a shallow neural network approach:
 — Continuous Bag-Of-Words (CBOW): predict the centre word from the surrounding ones
 — skip-gram: predict the surrounding words from the centre one
Other methods available: GloVe, fastText
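A minimal gensim sketch of training such embeddings; the corpus below is a toy stand-in (the talk uses hundreds of millions of tweets and 512 dimensions, per the next slide):

```python
from gensim.models import Word2Vec

sentences = [["feeling", "rough", "with", "flu", "symptoms"],
             ["cold", "and", "flu", "season", "again"]]   # toy corpus

# sg=0 -> CBOW (predict centre word from context);
# sg=1 -> skip-gram (predict context from centre word).
model = Word2Vec(sentences, vector_size=512, window=5,
                 min_count=1, sg=1, epochs=10)
vec = model.wv["flu"]   # the 512-dimensional embedding of 'flu'
```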

  34. GFT v.3 — Word embedding data sets

Use tweets geolocated in the UK to learn word embeddings that may capture
 — informal language used in searches
 — British English language / expressions
 — cultural biases

(a) 215 million tweets (February 2014 to March 2016), CBOW, 512 dimensions, 137,421 words covered
 https://doi.org/10.6084/m9.figshare.4052331.v1
(b) 1.1 billion tweets (2012 to 2016), skip-gram, 512 dimensions, 470,194 words covered
 https://doi.org/10.6084/m9.figshare.5791650.v1

  35-38. GFT v.3 — Cosine similarity

For word embeddings $\mathbf{v}, \mathbf{u}$: $\cos(\mathbf{v}, \mathbf{u}) = \frac{\mathbf{v} \cdot \mathbf{u}}{\|\mathbf{v}\| \|\mathbf{u}\|} = \frac{\sum_{i=1}^{n} v_i u_i}{\sqrt{\sum_{i=1}^{n} v_i^2}\, \sqrt{\sum_{i=1}^{n} u_i^2}}$

$\max_{\mathbf{v}} \left( \cos(\mathbf{v}, \text{'king'}) + \cos(\mathbf{v}, \text{'woman'}) - \cos(\mathbf{v}, \text{'man'}) \right) \Rightarrow \mathbf{v} = \text{'queen'}$

or, with 'king' and 'woman' as the positive context and 'man' as the negative context:

$\max_{\mathbf{v}} \left( \frac{\cos'(\mathbf{v}, \text{'king'}) \times \cos'(\mathbf{v}, \text{'woman'})}{\cos'(\mathbf{v}, \text{'man'})} \right) \Rightarrow \mathbf{v} = \text{'queen'}$, where $\cos'(\cdot, \cdot) = (\cos(\cdot, \cdot) + 1) / 2$
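The additive analogy objective above in plain NumPy, as a brute-force sketch over a word-to-vector dictionary `emb` (which is assumed given, e.g. from the embeddings trained earlier):

```python
import numpy as np

def cos(v, u):
    """Cosine similarity of two embedding vectors."""
    return float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))

def analogy(emb, pos_a, pos_b, neg):
    """argmax_v cos(v, pos_a) + cos(v, pos_b) - cos(v, neg);
    e.g. analogy(emb, 'king', 'woman', 'man') should return 'queen'."""
    best, best_score = None, -np.inf
    for word, v in emb.items():
        if word in (pos_a, pos_b, neg):
            continue                      # exclude the query words
        s = cos(v, emb[pos_a]) + cos(v, emb[pos_b]) - cos(v, emb[neg])
        if s > best_score:
            best, best_score = word, s
    return best
```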

  39-42. GFT v.3 — Analogies in Twitter embeddings

The … | for … | not the … | is …?
woman | king | man | queen
him | she | he | her
better | bad | good | worse
England | Rome | London | Italy
Messi | basketball | football | Lebron
Guardian | Conservatives | Labour | Telegraph
Trump | Europe | USA | Farage
flu | rsv | fever | skin

  43-45. GFT v.3 — Better query selection (1/3)

1. Query embedding = average token embedding
2. Derive a concept by specifying a positive (P) and a negative (N) context (sets of n-grams)
3. Rank all queries using their similarity score with this concept:

$S(Q, C) = \frac{\sum_{i=1}^{k} \cos\left( e_Q, e_{P_i} \right)}{\sum_{j=1}^{z} \cos\left( e_Q, e_{N_j} \right) + \gamma}$

where $e_Q$ is the query embedding, $e_{N_j}$ the embedding of a negative concept n-gram, and $\gamma$ a constant to avoid division by 0
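Steps 1-3 as a sketch; the shifted cosine keeps similarities positive (as on the earlier slide), while the default value of gamma and the dict-based embedding lookup are assumptions:

```python
import numpy as np

def cos_shifted(v, u):
    """Cosine similarity rescaled to [0, 1]: (cos + 1) / 2."""
    c = v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
    return (c + 1) / 2

def embed(ngram, emb):
    """Step 1: embedding of a query / n-gram = average token embedding."""
    return np.mean([emb[tok] for tok in ngram.split()], axis=0)

def concept_score(query, P, N, emb, gamma=1e-3):
    """Steps 2-3: similarity S(Q, C) of a query to a concept given by
    positive n-grams P and negative n-grams N (gamma avoids division by 0)."""
    e_q = embed(query, emb)
    pos = sum(cos_shifted(e_q, embed(p, emb)) for p in P)
    neg = sum(cos_shifted(e_q, embed(n, emb)) for n in N)
    return pos / (neg + gamma)
```

Ranking all candidate queries by this score, then thresholding (as on the slide after next), yields the selected set.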

  46. GFT v.3 — Better query selection (2/3)

[Table: example concepts and retrieved queries; column alignment lost in extraction. Positive context terms include flu, fever, flu medicine, gp, hospital; negative context terms include bieber, ebola, wikipedia; the most similar queries include #flu, flu, flu aches, cold and flu, colds and flu, cold flu symptoms, cold flu medicine, flu medicine, flu gp, flu hospital]

  47. GFT v.3 — Better query selection (3/3)

[Figure: histogram of the similarity score S over all search queries]

Given that the distribution of concept similarity scores appears to be unimodal, we use a threshold of θ standard deviations above the mean (μ_S + θσ_S) to determine the selected queries

  48. GFT v.3 — Hybrid feature selection

Embedding-based feature selection is an unsupervised technique, thus non-optimal
If we combine it with the previous ways of selecting features, will we obtain better inference accuracy?
We test 7 feature selection approaches:
  similarity → elastic net (1)
  correlation → elastic net (2) → GP (3)
  similarity → correlation → elastic net (4) → GP (5)
  similarity → correlation → GP (6)
  correlation → GP (7)

  49. GFT v.3 — GP model details

Matérn kernel: $k_M^{(\nu)}(\mathbf{x}, \mathbf{x}') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, r}{\ell} \right)^{\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\, r}{\ell} \right)$

$k_{\text{SE}}(\mathbf{x}, \mathbf{x}') = \sigma^2 \exp\left( \frac{-r^2}{2\ell^2} \right)$

$k(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{2} k_M^{(\nu = 3/2)}(\mathbf{x}, \mathbf{x}'; \sigma_i, \ell_i) + k_{\text{SE}}(\mathbf{x}, \mathbf{x}'; \sigma_3, \ell_3) + \sigma_4^2\, \delta(\mathbf{x}, \mathbf{x}')$

Skipped in the interest of time! If you’re interested, check Section 3.1 of https://doi.org/10.1145/3038912.3052622
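For reference, this composite kernel maps onto scikit-learn's kernel objects roughly as follows; a sketch with untuned placeholder variances and length scales (the linked paper has the exact construction):

```python
from sklearn.gaussian_process.kernels import (
    Matern, RBF, WhiteKernel, ConstantKernel as C)

# Two Matern(nu=3/2) components with separate variances / length scales,
# a squared-exponential component, and white noise for sigma_4^2 * delta.
kernel = (C(1.0) * Matern(length_scale=1.0, nu=1.5)
          + C(1.0) * Matern(length_scale=10.0, nu=1.5)
          + C(1.0) * RBF(length_scale=5.0)
          + WhiteKernel(noise_level=0.1))
```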

  50. GFT v.3 — Data & evaluation

weekly frequency of 35,572 search queries (UK) from 1/1/2007 to 9/8/2015 (449 weeks)
access to a private Google Health Trends API for health-oriented research
corresponding ILI rates for England (Royal College of General Practitioners and Public Health England)
test on the last 3 flu seasons in the data (2012-2015)
train on increasing data sets starting from 2007, using all data prior to a test period

  51-52. GFT v.3 — Performance (1/3)

(a) similarity → elastic net
(b) correlation → elastic net
(c) similarity → correlation → elastic net

[Bar chart over (a)-(c); values shown: r = 0.91, 0.88, 0.87; MAE × 0.1 = 0.30, 0.21, 0.19; MAPE = 61.05%, 47.15%, 36.23%]

  53-54. GFT v.3 — Performance (2/3)

Elastic net with and without word embeddings filtering

[Figure: ILI rate per 100,000 people over time (weeks, 2013-2015); RCGP/PHE ILI rates vs elastic net with correlation-based feature selection and elastic net with hybrid feature selection]

Ratio over highest weight: prof. surname (70.3%), name surname (27.2%), heal the world (21.9%), heating oil (21.2%), name surname recipes (21%), tlc diet (13.3%), blood game (12.3%), swine flu vaccine side effects (7.2%)

  55-56. GFT v.3 — Performance (3/3)

(a) correlation → GP
(b) correlation → elastic net → GP
(c) similarity → correlation → elastic net → GP
(d) similarity → correlation → GP

[Bar chart over (a)-(d); values shown: r = 0.94, 0.93, 0.92, 0.89; MAE × 0.1 = 0.23, 0.22, 0.17, 0.16; MAPE = 35.88%, 34.17%, 30.30%, 25.81%]

  57. Multi-task learning

m tasks (problems) $t_1, \dots, t_m$
observations $X_{t_1}, y_{t_1}, \dots, X_{t_m}, y_{t_m}$
learn models $f_{t_i}: X_{t_i} \to y_{t_i}$ jointly (and not independently)
Why? When tasks are related, multi-task learning is expected to perform better than learning each task independently
Model learning is possible even with a few training samples

  58. Multi-task learning for disease modelling

m tasks (problems) $t_1, \dots, t_m$; observations $X_{t_1}, y_{t_1}, \dots, X_{t_m}, y_{t_m}$; learn models $f_{t_i}: X_{t_i} \to y_{t_i}$ jointly (and not independently)
Can we improve disease models (flu) from online search:
 — when sporadic training data are available?
 — across the geographical regions of a country?
 — across two different countries?
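As a concrete, simplified illustration of coupling tasks, scikit-learn's MultiTaskElasticNet fits all regions jointly under an L2,1 penalty, so each query is selected for all tasks or for none. This is a generic stand-in for the idea, not necessarily the multi-task formulation used in the talk, and the data below are placeholders:

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(0)
X = rng.random((100, 500))   # placeholder: 100 weeks x 500 queries (shared)
Y = rng.random((100, 10))    # placeholder: ILI rates for 10 regions (tasks)

# The L2,1 penalty zeroes a feature's coefficients across all tasks
# simultaneously, coupling feature selection between the regions.
model = MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, Y)
shared = np.flatnonzero(np.any(model.coef_ != 0, axis=0))
print(f"{shared.size} queries selected jointly for the 10 regional models")
```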

  59. Multi-task learning GFT (1/5)

[Figure: map of the 10 US HHS regions]

Can multi-task learning across the 10 US regions help us improve the national ILI model?

  60. Multi-task learning GFT (1/5)

Can multi-task learning across the 10 US regions help us improve the national ILI model?

5 years of training data
[Bar chart for Elastic Net, MT Elastic Net, GP, MT GP; values shown: r = 0.97, 0.97, 0.96, 0.96; MAE = 0.35, 0.35, 0.25, 0.25]

  61. Multi-task learning GFT (1/5)

Can multi-task learning across the 10 US regions help us improve the national ILI model?

1 year of training data
[Bar chart for Elastic Net, MT Elastic Net, GP, MT GP; values shown: r = 0.88, 0.87, 0.85, 0.85; MAE = 0.53, 0.51, 0.46, 0.44]

  62. Multi-task learning GFT (2/5)

Can multi-task learning across the 10 US regions help us improve the regional ILI models?

1 year of training data
[Bar chart for Elastic Net, MT Elastic Net, GP, MT GP; values shown: r = 0.87, 0.86, 0.85, 0.84; MAE = 0.54, 0.53, 0.49, 0.47]

  63-64. Multi-task learning GFT (3/5)

Can multi-task learning across the 10 US regions help us improve regional models under sporadic health reporting?
Split US regions into two groups, one including the 2 regions with the highest population (4 and 9 on the map), and the other having the remaining 8 regions
Train and evaluate models for the 8 regions under the hypothesis that there might exist sporadic health reports
Start downsampling the data from the 8 regions using burst error sampling (random data blocks removed) with rate γ (γ = 1: no sampling; γ = 0.1: 10% sample)

[Figure: MAE as a function of the sampling rate γ (1.0 down to 0.1) for EN, MTEN, GP and MTGP]

  65. Multi-task learning GFT (4/5)

Correlations between the US regions induced by the covariance matrix of the MT GP model
The multi-task learning model seems to be capturing existing geographical relations

[Figure: heatmap of correlations between regions R1-R10 and the US national model; values shown range from about 0.64 to 0.96]

  66. Multi-task learning GFT (5/5)

Can multi-task learning across countries (US, England) help us improve the ILI model for England?

5 years of training data
[Bar chart for Elastic Net, MT Elastic Net, GP, MT GP; values shown: r = 0.90, 0.90, 0.89, 0.89; MAE = 0.70, 0.60, 0.49, 0.47]
