SLIDE 1

Mining online data for public health surveillance

Vasileios Lampos (a.k.a. Bill)

Computer Science University College London

@lampos

SLIDE 2

Structure

➡ Using online data for health applications
➡ From web searches to syndromic surveillance
   i. Google Flu Trends: original failure and correction
   ii. Better feature selection using semantic concepts
   iii. Snapshot: multi-task learning for disease models
SLIDE 3

Online data

[Figure: logos of online data sources, including Wikipedia]
SLIDE 4

Online data for health (1/3)

When & why? — coverage — speed — cost
How? — collaborate with experts — access to user activity data — machine learning — natural language processing
Evaluation? — (partial) ground truth — model interpretation — real-time
SLIDE 8

Online data for health (2/3)

Google Flu Trends (discontinued)

SLIDE 9

Online data for health (2/3)

Flu Detector, fludetector.cs.ucl.ac.uk

SLIDE 10

Online data for health (2/3)

HealthMap, healthmap.org

SLIDE 11

Online data for health (3/3)

Health intervention — impact?

SLIDE 13

Online data for health (3/3)

Vaccinations against flu — impact?

Lampos, Yom-Tov, Pebody, Cox (2015) doi:10.1007/s10618-015-0427-9
Wagner, Lampos, Yom-Tov, Pebody, Cox (2017) doi:10.2196/jmir.8184

SLIDE 14

Google Flu Trends (GFT)

[Figure: weekly US ILI percentage, 2004–2008]

Goal: find a function f that maps online search activity to these ILI rates.
SLIDE 16

GFT — Supervised learning

Regression
— Observations (X): frequencies of n search queries for a location L and m contiguous time intervals of length τ
— Targets (y): rates of influenza-like illness (ILI) for L over the same m time intervals, obtained from a health agency
— Learn a function f such that f: X ∈ ℝ^{n×m} ⟶ y ∈ ℝ^m

frequency of q_i = count of q_i / total count of all queries
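As a concrete illustration of these shapes, here is a minimal numpy sketch; all counts, sizes and rates are synthetic stand-ins, not the actual GFT data (rows are weeks, columns are queries):

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks, m_queries = 128, 45           # weekly intervals x candidate search queries

raw_counts = rng.poisson(50, (n_weeks, m_queries))    # count of q_i per week
total_counts = rng.poisson(1_000_000, (n_weeks, 1))   # total count of all queries

X = raw_counts / total_counts          # frequency of q_i = count of q_i / total count
y = rng.uniform(0.01, 0.08, n_weeks)   # stand-in ILI rates from a health agency
```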
SLIDE 18

GFT v.1 — Model

logit(P) = β0 + β1 × logit(Q) + ε

— P: percentage (probability) of doctor visits
— Q: aggregate frequency of a set of search queries
— β0: regression bias term
— β1: regression weight (one weight only)
— ε: independent, zero-centred noise
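A hedged sketch of fitting this univariate model on synthetic stand-in series (the coefficients below are illustrative, not the published ones):

```python
import numpy as np

def logit(a):
    return np.log(a / (1.0 - a))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
Q = np.clip(rng.beta(2, 50, 170), 1e-6, 1 - 1e-6)   # aggregate query frequency, in (0,1)
P = inv_logit(-2.0 + 1.1 * logit(Q) + rng.normal(0, 0.1, 170))  # ILI visit percentage

b1, b0 = np.polyfit(logit(Q[:128]), logit(P[:128]), 1)  # least squares fit on the logits
P_hat = inv_logit(b0 + b1 * logit(Q[128:]))             # nowcasts for the held-out weeks
```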
SLIDE 22

GFT v.1 — “Logit”, why?

the logit function: logit(α) = log(α / (1 − α)), α ∈ (0, 1)

[Figure: (x, y) pairs with y ∈ (0, 1), plotted raw and after a z-scored logit transform]

values close to 0.5 are “squashed”; border values (close to 0 or 1) are “emphasised”
SLIDE 24

GFT v.1 — Data

— 9 US regions considered
— 50 million search queries (the most frequent), geolocated in these 9 US regions
— Weekly ILI rates from CDC: 170 weeks, 28/9/2003 to 11/5/2008, with ILI rate > 0
— First 128 weeks: training, 9 × 128 = 1,152 samples
— Last 42 weeks: testing (per region)
SLIDE 25

GFT v.1 — Feature selection (1/2)

1. Single-query flu models are trained for each US region:
   50 million queries × 9 US regions = 450 million models
2. Inference accuracy is estimated for each query, using linear (Pearson) correlation as the metric
3. Starting from the best-performing query and adding one query at a time, a new model is trained and evaluated (see the sketch after this list)

[Figure: mean correlation vs. number of queries; performance peaks at 45 queries]
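The sketch below imitates this greedy procedure at toy scale; the way queries are combined (summing frequencies) is a simplifying assumption, not necessarily the exact GFT aggregation:

```python
import numpy as np

def forward_select(X, y, max_queries=100):
    """X: weeks x queries frequency matrix; y: ILI rates over the same weeks."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    ranked = np.argsort(corrs)[::-1]          # best single queries first
    scores = []
    for k in range(1, max_queries + 1):
        agg = X[:, ranked[:k]].sum(axis=1)    # aggregate the top-k query frequencies
        scores.append(np.corrcoef(agg, y)[0, 1])
    best_k = int(np.argmax(scores)) + 1       # the original study settled on 45 queries
    return ranked[:best_k], scores
```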
SLIDE 27

GFT v.1 — Feature selection (2/2)

Search query topic                       Top 45 queries (n)   Weighted
Influenza complication                   11                   18.15
Cold/flu remedy                           8                    5.05
General influenza symptoms                5                    2.60
Term for influenza                        4                    3.74
Specific influenza symptom                4                    2.54
Symptoms of an influenza complication     4                    2.21
Antibiotic medication                     3                    6.23
General influenza remedies                2                    0.18
Symptoms of a related disease             2                    1.66
Antiviral medication                      1                    0.39
Related disease                           1                    6.66
Unrelated to influenza                    0                    0.00
Total                                    45                   49.40
SLIDE 28

GFT v.1 — Performance (1/2)

Evaluated on 42 weeks (per region) from 2007–2008
Evaluation metric: Pearson correlation; μ(r) = .97, with min(r) = .92 and max(r) = .99
Performance looked great at the time, but this is not a proper performance evaluation!

Why? Potentially misleading metric (not the loss function here) and a rather small testing time span (< 1 flu season)
SLIDE 29

GFT v.1 — Performance (2/2)

[Figure: GFT v.1 estimates vs. CDC ILI percentage, Mid-Atlantic US region, 2004–2008; Pearson correlation r = .96]
SLIDE 30

GFT v.2 — Data & evaluation

— weekly frequency of 49,708 search queries (US), filtered by a relaxed health-topic classifier; intersection of frequent queries across all US regions
— from 4/1/2004 to 28/12/2013 (521 weeks)
— corresponding weekly US ILI rates from CDC
— test on 5 flu seasons: 5 year-long test sets (2008–13)
— train on increasing data sets starting from 2004, using all data prior to a test period
SLIDE 31

GFT v.1 was simple to a (significant) fault

[Figure: GFT estimates vs. CDC ILI rates (US), 2009–2013]

Queries with the largest weights in the model:
“rsv” — 25%, “flu symptoms” — 18%, “benzonatate” — 6%, “symptoms of pneumonia” — 6%, “upper respiratory infection” — 4%
SLIDE 33

GFT v.2 — Linear multivariate regression

Least squares:

argmin_{w,β} Σ_{i=1..n} (x_i w + β − y_i)²

— y ∈ ℝⁿ: ILI rates from CDC for n weeks; y_i ∈ ℝ for week i
— X ∈ ℝ^{n×m}: frequencies of m search queries for n weeks; x_i ∈ ℝᵐ for week i, i ∈ {1, …, n}
— w ∈ ℝᵐ: weights for the m search queries
— β ∈ ℝ: intercept term

⚠ Least squares regression is not applicable here, because we have very few training samples (n) but many features (search queries; m). Models derived from least squares will tend to overfit the data, resulting in bad solutions.
SLIDE 38

GFT v.2 — Regularisation with elastic net

argmin_{w,β} ( Σ_{i=1..n} (x_i w + β − y_i)² + λ1 Σ_{j=1..m} |w_j| + λ2 Σ_{j=1..m} w_j² )

— the first term is the least squares objective
— λ1, λ2 ∈ ℝ⁺: L1- & L2-norm regularisers for the weights
— Encourages sparse models (feature selection): many weights will be set to zero!
— Handles collinear features (search queries)
— The number of selected features is not limited to the number of samples (n)
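A minimal scikit-learn sketch of this regulariser on synthetic data; ElasticNetCV's (alpha, l1_ratio) parametrisation maps onto λ1 and λ2, and every name and number below is a stand-in:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(2)
n_weeks, m_queries = 260, 2000
X = rng.lognormal(mean=-12, size=(n_weeks, m_queries))   # stand-in query frequencies
w_true = np.zeros(m_queries)
w_true[:40] = rng.uniform(10, 50, 40)                    # only 40 queries truly matter
y = X @ w_true + rng.normal(0, 1e-4, n_weeks)            # stand-in ILI rates

model = ElasticNetCV(l1_ratio=[0.5, 0.9, 0.99], cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)                   # sparse: most weights are zero
print(f"{len(selected)} of {m_queries} queries selected")
```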
SLIDE 43

GFT v.2 — Feature selection

1st layer: keep search queries whose frequency time series has a Pearson correlation ≥ 0.5 with the CDC ILI rates (in the training data)
2nd layer: the elastic net assigns weights equal to 0 to features (search queries) that are identified as statistically irrelevant to our task

Number of queries selected across all training data sets, μ (σ):

# queries   r ≥ 0.5     GFT       Elastic net
49,708      937 (334)   46 (39)   278 (64)
SLIDE 44

GFT v.2 — Evaluation (1/2)

Target variable: y = y_1, …, y_N. Estimates: ŷ = ŷ_1, …, ŷ_N.

Mean Absolute Error: MAE(ŷ, y) = (1/N) Σ_{t=1..N} |ŷ_t − y_t|

Mean Absolute Percentage of Error: MAPE(ŷ, y) = (1/N) Σ_{t=1..N} |(ŷ_t − y_t) / y_t|

Mean Squared Error: MSE(ŷ, y) = (1/N) Σ_{t=1..N} (ŷ_t − y_t)²
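The same three metrics as plain numpy functions:

```python
import numpy as np

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))

def mape(y_hat, y):
    # assumes y > 0, true for ILI rates; often reported multiplied by 100 (%)
    return np.mean(np.abs((y_hat - y) / y))

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)
```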
SLIDE 45

GFT v.2 — Evaluation (2/2)

[Figure: elastic net estimates vs. GFT and CDC ILI rates (US), 2009–2013]

GFT: r = .89, MAE = 3.81·10⁻³, MAPE = 20.4%
Elastic net: r = .92, MAE = 2.60·10⁻³, MAPE = 11.9%
SLIDE 47

GFT v.2 — Nonlinearities in the data

[Scatter plots: US ILI rates (CDC) vs. the frequencies of the queries ‘flu’, ‘flu medicine’, ‘how long is flu contagious’, ‘how to break a fever’ and ‘sore throat treatment’; in each case a linear fit struggles to capture the raw data]
SLIDE 52

GFT v.2 — Gaussian Processes (1/4)

— A Gaussian Process (GP) learns a distribution over functions that can explain the data
— Fully specified by a mean (m) and a covariance (kernel) function (k); we set m(x) = 0 in our experiments
— A collection of random variables, any finite number of which have a multivariate Gaussian distribution

f(x) ~ GP(m(x), k(x, x′)),  x, x′ ∈ ℝᵐ,  f: ℝᵐ → ℝ

N(x | µ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp(−(1/2) (x − µ)^T Σ^{−1} (x − µ))

Inference: f* ~ N(0, K), with (K)_{ij} = k(x_i, x_j)
SLIDE 55

GFT v.2 — Gaussian Processes (2/4)

Common GP kernels (covariance functions):

Squared-exp (SE): k(x, x′) = σ_f² exp(−(x − x′)² / (2ℓ²)) — local variation
Periodic (Per): k(x, x′) = σ_f² exp(−(2/ℓ²) sin²(π (x − x′) / p)) — repeating structure
Linear (Lin): k(x, x′) = σ_f² (x − c)(x′ − c) — linear functions

[Figure: each kernel plotted, with functions f(x) sampled from the corresponding GP prior]
SLIDE 56

GFT v.2 — Gaussian Processes (3/4)

Adding or multiplying GP kernels produces a new valid GP kernel:

Lin × Lin → quadratic functions
SE × Per → locally periodic structure
Lin × SE → increasing variation
Lin × Per → growing amplitude

[Figure: sample functions drawn from GP priors with each composite kernel]
SLIDE 57

GFT v.2 — Gaussian Processes (4/4)

[Figures: (x, y) pairs with an obvious nonlinear relationship; least squares regression gives a poor fit, while a GP with a sum of two kernels (periodic + squared exponential) captures the structure]
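A hedged scikit-learn sketch of the toy example above, using a sum of a periodic (ExpSineSquared) and a squared-exponential (RBF) kernel plus a noise term; the data are synthetic stand-ins:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

rng = np.random.default_rng(3)
x = np.linspace(0, 60, 120)[:, None]                    # predictor variable
y = 10 + 0.15 * x.ravel() + 3 * np.sin(x.ravel()) + rng.normal(0, 0.5, 120)

kernel = ExpSineSquared() + RBF(length_scale=10.0) + WhiteKernel()  # Per + SE + noise
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x, y)
y_mean, y_std = gp.predict(x, return_std=True)          # posterior mean and uncertainty
```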
SLIDE 60

GFT v.2 — k-means and GP regression

— Cluster the queries selected by the elastic net into C clusters with k-means
— Clusters are determined using cosine similarity as the distance metric (on query frequency time series)
— This groups queries with similar topicality & usage patterns

k(x, x′) = Σ_{i=1..C} k_SE(x_{c_i}, x′_{c_i}) + σ² · δ(x, x′),  x = {x_{c_1}, …, x_{c_10}}  (here C = 10 clusters)

k_SE(x, x′) = σ² exp(−‖x − x′‖² / (2ℓ²))

(one SE kernel per query cluster, plus a noise term)
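A sketch of the clustering step under one simplifying assumption: scikit-learn's k-means is Euclidean, so the rows are L2-normalised first, which makes the clustering behave like one based on cosine similarity; the frequency matrix is a stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

rng = np.random.default_rng(4)
Q = np.abs(rng.normal(size=(278, 260)))          # queries x weekly frequencies (stand-in)

# Euclidean k-means on unit-norm rows approximates clustering by cosine similarity
labels = KMeans(n_clusters=10, n_init=10).fit_predict(normalize(Q))
clusters = [np.flatnonzero(labels == c) for c in range(10)]  # query indices per cluster
```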
SLIDE 61

GFT v.2 — Performance

[Figure: GP estimates vs. elastic net and CDC ILI rates (US), 2009–2013]

Elastic net: r = .92, MAE = 2.60·10⁻³, MAPE = 11.9%
GP: r = .95, MAE = 2.21·10⁻³, MAPE = 10.8%
SLIDE 63

GFT v.2 — Queries’ added value

y_t = Σ_{i=1..p} φ_i y_{t−i} + Σ_{i=1..J} ω_i y_{t−52−i}   (AR and seasonal AR)
    + Σ_{i=1..q} θ_i ε_{t−i} + Σ_{i=1..K} ν_i ε_{t−52−i}   (MA and seasonal MA)
    + Σ_{i=1..D} w_i h_{t,i}                               (regression)
    + ε_t

Autoregression: combine CDC ILI rates from the previous week(s) with the ILI rate estimate from search queries for the current week
Various week lags explored (1, 2, …, 6 weeks)
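One way to realise such a seasonal AR/MA model with an exogenous search-based regressor is statsmodels' SARIMAX; this is a sketch under that assumption, with synthetic series and example lag orders, not the paper's exact specification:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 260
ili = 0.03 + 0.02 * np.sin(2 * np.pi * np.arange(n) / 52) + rng.normal(0, 0.003, n)
search_est = ili + rng.normal(0, 0.004, n)   # stand-in for the search-based estimate h_t

model = sm.tsa.SARIMAX(ili, exog=search_est,
                       order=(1, 0, 1),                # AR(p = 1) and MA(q = 1) terms
                       seasonal_order=(1, 0, 1, 52))   # seasonal AR/MA at a 52-week lag
fit = model.fit(disp=False)
nowcasts = fit.fittedvalues                  # one-step-ahead in-sample estimates
```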
SLIDE 64

GFT v.2 — Performance

1-week lag for the CDC data:
AR(CDC): r = .97, MAE = 1.87·10⁻³, MAPE = 8.2%
AR(CDC, GP): r = .99, MAE = 1.05·10⁻³, MAPE = 5.7%

2-week lag for the CDC data:
AR(CDC): r = .87, MAE = 3.36·10⁻³, MAPE = 14.3%
AR(CDC, GP): r = .99, MAE = 1.35·10⁻³, MAPE = 7.3%

For reference, GP alone: r = .95, MAE = 2.21·10⁻³, MAPE = 10.8%
SLIDE 66

GFT v.2 — Non-optimal feature selection

— Queries irrelevant to flu are still retained, e.g. “nba injury report” or “muscle building supplements”
— Feature selection is primarily based on correlation, and then on a linear relationship
— Introduce a semantic feature selection to
   — enhance causal connections (implicitly)
   — circumvent the painful training of a classifier
SLIDE 67

GFT v.3 — Word embeddings

— Word embeddings are vectors of a certain dimensionality (usually from 50 to 1024) that represent words in a corpus
— Derive these vectors by predicting contextual word co-occurrence in large corpora (word2vec) using a shallow neural network approach:
   — Continuous Bag-Of-Words (CBOW): predict the centre word from the surrounding ones
   — skip-gram: predict the surrounding words from the centre one
— Other methods available: GloVe, fastText
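A minimal gensim sketch of training such embeddings; the two-tweet corpus is obviously a stand-in, and the parameters (CBOW, 512 dimensions) simply mirror the setup on the next slide:

```python
from gensim.models import Word2Vec

# tokenised tweets; the real corpora contain hundreds of millions of tweets
tweets = [["flu", "medicine", "for", "a", "sore", "throat"],
          ["feeling", "feverish", "and", "staying", "in", "bed"]]

model = Word2Vec(sentences=tweets, vector_size=512, sg=0,   # sg=0 selects CBOW
                 window=5, min_count=1, workers=4)
vec = model.wv["flu"]                                       # a 512-dimensional embedding
```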
SLIDE 68

GFT v.3 — Word embedding data sets

Use tweets geolocated in the UK to learn word embeddings that may capture
— informal language used in searches
— British English language / expressions
— cultural biases

(a) 215 million tweets (February 2014 to March 2016), CBOW, 512 dimensions, 137,421 words covered
https://doi.org/10.6084/m9.figshare.4052331.v1

(b) 1.1 billion tweets (2012 to 2016), skip-gram, 512 dimensions, 470,194 words covered
https://doi.org/10.6084/m9.figshare.5791650.v1
SLIDE 69

GFT v.3 — Cosine similarity

cos(v, u) = (v · u) / (‖v‖ ‖u‖) = Σ_{i=1..n} v_i u_i / (√(Σ_{i=1..n} v_i²) √(Σ_{i=1..n} u_i²)),  for word embeddings v, u

Additive analogy (positive context: ‘king’, ‘woman’; negative context: ‘man’):
max_v (cos(v, ‘king’) + cos(v, ‘woman’) − cos(v, ‘man’)) ⟹ v = ‘queen’

Multiplicative analogy:
max_v (cos′(v, ‘king’) × cos′(v, ‘woman’) / cos′(v, ‘man’)) ⟹ v = ‘queen’,
where cos′(·, ·) = (cos(·, ·) + 1) / 2
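Both the cosine formula and the additive analogy fit in a few lines of numpy; `emb` is a hypothetical word-to-vector dictionary standing in for the Twitter embeddings:

```python
import numpy as np

def cos(v, u):
    return v @ u / (np.linalg.norm(v) * np.linalg.norm(u))

def analogy(emb, positive, negative):
    """emb: dict word -> vector. Maximise summed cosine to positive minus negative."""
    def score(w):
        return (sum(cos(emb[w], emb[p]) for p in positive)
                - sum(cos(emb[w], emb[n]) for n in negative))
    exclude = set(positive) | set(negative)
    return max((w for w in emb if w not in exclude), key=score)

# analogy(emb, positive=["king", "woman"], negative=["man"])  ->  "queen"
```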
SLIDE 73

GFT v.3 — Analogies in Twitter embd.

The … for …, not the …, is …

woman      king            man        queen
him        she             he         her
better     bad             good       worse
England    Rome            London     Italy
Messi      basketball      football   Lebron
Guardian   Conservatives   Labour     Telegraph
Trump      Europe          USA        Farage
rsv        fever           skin       flu
SLIDE 77

GFT v.3 — Better query selection (1/3)

1. Query embedding = average token embedding
2. Derive a concept (C) by specifying a positive (P) and a negative (N) context (sets of n-grams)
3. Rank all queries (Q) using their similarity score with this concept:

S(Q, C) = Σ_{i=1..k} cos(e_Q, e_{P_i}) / (Σ_{j=1..z} cos(e_Q, e_{N_j}) + γ)

e_Q: query embedding; e_{N_j}: embedding of a negative concept n-gram; γ: constant to avoid division by 0
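A sketch of the ranking score S(Q, C); `emb` is again a hypothetical word-to-vector dictionary, and gamma defaults to an arbitrary small constant:

```python
import numpy as np

def cos(v, u):
    return v @ u / (np.linalg.norm(v) * np.linalg.norm(u))

def embed(ngram, emb):
    """Query/n-gram embedding = average of its token embeddings."""
    return np.mean([emb[tok] for tok in ngram.split() if tok in emb], axis=0)

def concept_score(query, positive, negative, emb, gamma=0.001):
    e_q = embed(query, emb)
    num = sum(cos(e_q, embed(p, emb)) for p in positive)
    den = sum(cos(e_q, embed(n, emb)) for n in negative) + gamma
    return num / den   # rank all candidate queries by this score
```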
SLIDE 80

GFT v.3 — Better query selection (2/3)

Positive context                               Negative context                                      Most similar queries
#flu, fever, flu, flu medicine, gp, hospital   bieber, ebola, wikipedia                              cold flu medicine; flu aches; cold and flu; cold flu symptoms; colds and flu
flu                                            flu gp, flu hospital, flu medicine, ebola, wikipedia  flu aches; flu; colds and flu; cold and flu; cold flu medicine
SLIDE 81

GFT v.3 — Better query selection (3/3)

[Figure: histogram of concept similarity scores (S) across search queries]

Given that the distribution of concept similarity scores appears to be unimodal, we use a threshold of θ standard deviations from the mean (μ_S + θσ_S) to determine the selected queries
SLIDE 82

GFT v.3 — Hybrid feature selection

— Embedding-based feature selection is an unsupervised technique, thus not optimal
— If we combine it with the previous ways of selecting features, will we obtain better inference accuracy?
— We test 7 feature selection approaches:
   similarity → elastic net (1)
   correlation → elastic net (2) → GP (3)
   similarity → correlation → elastic net (4) → GP (5)
   similarity → correlation → GP (6)
   correlation → GP (7)
SLIDE 83

GFT v.3 — GP model details

Matérn kernel: k_M^{(ν)}(x, x′) = σ² (2^{1−ν} / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ),
where r = ‖x − x′‖ and K_ν is a modified Bessel function

k_SE(x, x′) = σ² exp(−r² / (2ℓ²))

Full kernel: k(x, x′) = Σ_{i=1..2} k_M^{(ν=3/2)}(x, x′; σ_i, ℓ_i) + k_SE(x, x′; σ_3, ℓ_3) + σ_4² δ(x, x′)

Skipped in the interest of time! If you’re interested, check Section 3.1 of https://doi.org/10.1145/3038912.3052622
SLIDE 84

GFT v.3 — Data & evaluation

— weekly frequency of 35,572 search queries (UK) from 1/1/2007 to 9/8/2015 (449 weeks)
— access to a private Google Health Trends API for health-oriented research
— corresponding ILI rates for England (Royal College of General Practitioners and Public Health England)
— test on the last 3 flu seasons in the data (2012–2015)
— train on increasing data sets starting from 2007, using all data prior to a test period
SLIDE 85

GFT v.3 — Performance (1/3)

(a) similarity → elastic net
(b) correlation → elastic net
(c) similarity → correlation → elastic net

            (a)      (b)      (c)
MAPE        36.23%   47.15%   61.05%
MAE × 0.1   0.19     0.21     0.30
r           0.91     0.88     0.87
SLIDE 87

GFT v.3 — Performance (2/3)

Elastic net with and without word-embeddings filtering

[Figure: ILI rate per 100,000 people, 2013–2015; RCGP/PHE ILI rates vs. elastic net with correlation-based feature selection and with hybrid feature selection]

Queries retained by correlation-based selection alone, with their weight as a ratio over the highest weight:
prof. surname (70.3%), name surname (27.2%), heal the world (21.9%), heating oil (21.2%), name surname recipes (21%), tlc diet (13.3%), blood game (12.3%), swine flu vaccine side effects (7.2%)
SLIDE 89

GFT v.3 — Performance (3/3)

(a) correlation → GP
(b) correlation → elastic net → GP
(c) similarity → correlation → elastic net → GP
(d) similarity → correlation → GP

            (a)      (b)      (c)      (d)
MAPE        25.81%   30.30%   35.88%   34.17%
MAE × 0.1   0.16     0.17     0.23     0.22
r           0.94     0.93     0.92     0.89
SLIDE 91

Multi-task learning

— m tasks (problems) t_1, …, t_m
— observations X_{t_1}, y_{t_1}, …, X_{t_m}, y_{t_m}
— learn models f_{t_i}: X_{t_i} → y_{t_i} jointly (and not independently)

Why? When tasks are related, multi-task learning is expected to perform better than learning each task independently. Model learning is possible even with few training samples.
SLIDE 92

Multi-task learning for disease modelling

— m tasks (problems) t_1, …, t_m
— observations X_{t_1}, y_{t_1}, …, X_{t_m}, y_{t_m}
— learn models f_{t_i}: X_{t_i} → y_{t_i} jointly (and not independently)

Can we improve disease models (flu) from online search:
— when sporadic training data are available?
— across the geographical regions of a country?
— across two different countries?
SLIDE 93

Multi-task learning GFT (1/5)

[Figure: map of the 10 US HHS regions]

Can multi-task learning across the 10 US regions help us improve the national ILI model?
SLIDE 94

Multi-task learning GFT (1/5)

Can multi-task learning across the 10 US regions help us improve the national ILI model?

5 years of training data:
      Elastic Net   MT Elastic Net   GP     MT GP
MAE   0.25          0.25             0.35   0.35
r     0.97          0.97             0.96   0.96
SLIDE 95

Multi-task learning GFT (1/5)

1 year of training data:
      Elastic Net   MT Elastic Net   GP     MT GP
MAE   0.44          0.51             0.46   0.53
r     0.88          0.85             0.87   0.85
SLIDE 96

Multi-task learning GFT (2/5)

Can multi-task learning across the 10 US regions help us improve the regional ILI models?

1 year of training data:
      Elastic Net   MT Elastic Net   GP     MT GP
MAE   0.47          0.54             0.49   0.53
r     0.87          0.84             0.86   0.85
SLIDE 97

Multi-task learning GFT (3/5)

Can multi-task learning across the 10 US regions help us improve regional models under sporadic health reporting?
— Split the US regions into two groups: one with the 2 most populous regions (4 and 9 on the map), the other with the remaining 8 regions
— Train and evaluate models for the 8 regions under the hypothesis that only sporadic health reports exist
— Downsample the data from the 8 regions using burst error sampling (random data blocks removed) with rate γ (γ = 1: no sampling; γ = 0.1: a 10% sample)

[Figure: MAE vs. sampling rate γ for EN, MTEN, GP and MTGP]
SLIDE 99

Multi-task learning GFT (4/5)

— Correlations between US regions induced by the covariance matrix of the MT GP model
— The multi-task learning model seems to be capturing existing geographical relations

[Figure: heatmap of the learned correlations between regions R1–R10 and the US national task]
SLIDE 100

Multi-task learning GFT (5/5)

Can multi-task learning across countries (US, England) help us improve the ILI model for England?

5 years of training data:
      Elastic Net   MT Elastic Net   GP     MT GP
MAE   0.47          0.60             0.49   0.70
r     0.90          0.89             0.90   0.89
SLIDE 101

Multi-task learning GFT (5/5)

1 year of training data:
      Elastic Net   MT Elastic Net   GP     MT GP
MAE   0.59          0.98             0.60   1.00
r     0.86          0.85             0.86   0.84
SLIDE 102

Conclusions

— Online (user-generated) data can help us improve our current understanding of public health matters
— The original Google Flu Trends was based on a good idea, but on a very limited modelling effort, resulting in major errors
— Subsequent models improved the statistical modelling as well as the semantic disambiguation between possible features, and delivered better / more robust performance
— Multi-task learning improves disease models further
— Future direction: models without strong supervision
SLIDE 103

Acknowledgements

Industrial partners
— Microsoft Research (Elad Yom-Tov)
— Google
Public health organisations
— Public Health England
— Royal College of General Practitioners
Funding: EPSRC (“i-sense”)
Collaborators: Andrew Miller, Bin Zou, Ingemar J. Cox
SLIDE 104

Thank you.

Vasileios Lampos (a.k.a. Bill)

Computer Science University College London @lampos

SLIDE 105

References

GFT v.1 — Ginsberg et al. Detecting influenza epidemics using search engine query data. Nature 457, pp. 1012–1014 (2009).
GFT v.2 — Lampos, Miller, Crossan and Stefansen. Advances in nowcasting influenza-like illness rates using search query logs. Scientific Reports 5, 12760 (2015).
GFT v.3 — Lampos, Zou and Cox. Enhancing feature selection using word embeddings: The case of flu surveillance. WWW ’17, pp. 695–704 (2017).
MTL — Zou, Lampos and Cox. Multi-task learning improves disease models from Web search. WWW ’18, In Press (2018).