Use of Social Media to Monitor and Predict Outbreaks and Public - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Use of Social Media to Monitor and Predict Outbreaks and Public Opinion on Health Topics

Alessio Signorini

December 3rd, 2014

Department of Computer Science University of Iowa

slide-2
SLIDE 2

“Measurement is the first step that leads to control and eventually to improvement.”

  • James Harrington
slide-3
SLIDE 3
Data Analytics

  • NASCAR / Formula One
  • Sports
  • Insurance
  • Sales / Marketing
  • Online Advertising
  • Logistics

slide-4
SLIDE 4

in Public Health we have

Disease Surveillance

slide-5
SLIDE 5

Surveillance Systems

  • Vital Statistics & Registries (e.g., births, deaths, defects)
  • Population Surveys (e.g., substance abuse)
  • Disease Reporting (e.g., salmonellosis, measles)
  • Sentinel Surveillance (e.g., Influenza-Like Illnesses)
  • Adverse Events Surveillance (e.g., issues with drugs)
  • Laboratory Data
slide-6
SLIDE 6

Surveillance data should be a byproduct of any healthcare operation
slide-7
SLIDE 7

Syndromic Surveillance

  • Focuses on Early Detection
  • Based on disease signs or symptoms, not diagnosis
  • Novel sources: emergency room data, drug sales
  • Uses well-known data mining techniques
  • Reduced delay in results
slide-8
SLIDE 8

aggregate and analyze

Social Media Data

to monitor and predict health trends

slide-9
SLIDE 9

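[Figure: daily activity volumes for four online services: ~5B/day, ~10M/day, ~7M/day, ~500M/day (service names not recoverable)]

[Figure: time spent online (~27 h/month) and on mobile (~34 h/month), split by activity: Social, Search, Content, Email/IM, Video, Shopping. Values shown: 5%, 13%, 19%, 20%, 21%, 22% (pairing with activities not recoverable)]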

slide-10
SLIDE 10

Monitor Public Opinion

Google Searches vs. Positive Tweets

Comprehensive Exam, Alessio Signorini
University of Iowa, May 2010

slide-11
SLIDE 11

The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic

Alessio Signorini, Alberto Segre, Philip Polgreen

PLoS ONE, May 2011

slide-12
SLIDE 12

Estimate ILI%

Using Twitter to Estimate H1N1 Activity

Alessio Signorini, Alberto Segre, Philip Polgreen

ISDS 2010, 9th Annual Conference of the International Society for Disease Surveillance

[Figure: two ILI% estimate plots, labeled "error ~0.28%" and "error ~0.37%"]

slide-13
SLIDE 13

Monitor Travel

Inferring Travel from Social Media

Alessio Signorini, Alberto Segre, Philip Polgreen

ISDS 2011, 10th Annual Conference of the International Society for Disease Surveillance

[Figure panels: National; Local]

slide-14
SLIDE 14

can we use

“Social Travel Models”

to improve local flu trends prediction?

slide-15
SLIDE 15

City-Level Flu Trends

  • CDC’s MMWR - Flu & Pneumonia Deaths for 122 cities
  • Each week smoothed with the values of the previous and next 2 weeks

[Figure panels: Philadelphia, PA, deaths for 2012; New York City, NY, deaths for 2012]
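
The smoothing step on this slide can be sketched as a centered moving average. This is a minimal sketch, assuming that "values of prev/next 2 weeks" means an unweighted mean over a 5-week window and that the window is truncated at the ends of the series; the slides state neither choice. The name `smooth_centered` and the sample counts are hypothetical.

```python
def smooth_centered(values, k=2):
    """Average each week with up to k weeks before and after it.

    Assumption: plain unweighted mean, window truncated at the series edges.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - k)
        hi = min(len(values), i + k + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical weekly death counts for one city (not taken from MMWR).
weekly_deaths = [10, 12, 30, 11, 9, 14, 13]
print(smooth_centered(weekly_deaths))
```

A centered window uses future weeks, so it suits retrospective trend building rather than real-time use.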

slide-16
SLIDE 16

Social Travel Data

  • 240 million geolocated tweets posted by 4 million users
  • Mapped onto MMWR cities; overlapping ones discarded
  • Geo-mapping run on a Spark cluster of 8 machines

[Figure: volume of trips among MMWR cities, 2012; labels TKG, COL, FAT, SPK]
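
One way to picture the geo-mapping step is to treat each city as a circle around its center and assign a tweet to the circle that contains it. This is only a sketch under stated assumptions: the slides do not say how city areas were represented, the fixed radius is hypothetical, and reading "discarded overlapping ones" as dropping cities whose areas overlap is an interpretation. The original work ran this step on a Spark cluster; the sketch is single-process, and the coordinates are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def drop_overlapping(cities, radius_miles):
    """Keep only cities whose circle does not intersect another city's circle."""
    kept = {}
    for name, (lat, lon) in cities.items():
        overlaps = any(
            other != name
            and haversine_miles(lat, lon, olat, olon) < 2 * radius_miles
            for other, (olat, olon) in cities.items()
        )
        if not overlaps:
            kept[name] = (lat, lon)
    return kept


def map_to_city(lat, lon, cities, radius_miles):
    """Return the city whose circle contains the point, or None."""
    for name, (clat, clon) in cities.items():
        if haversine_miles(lat, lon, clat, clon) <= radius_miles:
            return name
    return None


# Approximate city centers, for illustration only.
cities = {
    "Dallas": (32.78, -96.80),
    "Fort Worth": (32.76, -97.33),
    "Denver": (39.74, -104.99),
}
kept = drop_overlapping(cities, radius_miles=20)
print(sorted(kept))                                # Dallas and Fort Worth overlap
print(map_to_city(39.70, -105.00, kept, radius_miles=20))
```

Once overlapping cities are removed the circles are disjoint, so each tweet matches at most one city.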

slide-17
SLIDE 17

Social Travel Model

  • Final dataset: 78 cities, 124M tweets, 2.2M users
  • Each user's “home” assumed to be their most common location
  • A “trip” is a post at home followed by one elsewhere
  • Trip volumes between cities scaled by population
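
The home and trip definitions above can be sketched as follows. This is a minimal illustration, assuming each user's posts are already mapped to cities and sorted by time. The slides do not say how population was used for scaling; dividing by the origin city's population (trips per 100,000 residents) is one hypothetical choice. All function names and data are illustrative.

```python
from collections import Counter, defaultdict


def infer_home(city_sequence):
    """Home = the city a user posts from most often (ties: first seen)."""
    return Counter(city_sequence).most_common(1)[0][0]


def count_trips(posts_by_user):
    """Count home -> elsewhere transitions over each user's ordered posts."""
    trips = defaultdict(int)
    for cities in posts_by_user.values():
        home = infer_home(cities)
        for prev, curr in zip(cities, cities[1:]):
            if prev == home and curr != home:
                trips[(home, curr)] += 1
    return dict(trips)


def scale_by_population(trips, population):
    """Trips per 100,000 residents of the origin city (assumed scaling)."""
    return {
        (origin, dest): n * 100_000 / population[origin]
        for (origin, dest), n in trips.items()
    }


# Hypothetical users and populations.
posts_by_user = {
    "u1": ["A", "A", "B", "A", "C"],
    "u2": ["B", "B", "A"],
}
trips = count_trips(posts_by_user)
print(trips)
print(scale_by_population(trips, {"A": 200_000, "B": 50_000}))
```

Under this definition a post away from home followed by another post away from home does not count as a new trip.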
slide-18
SLIDE 18

Correlation between Cities

[Figure panels: San Jose, CA; Atlanta, GA; Philadelphia, PA]

slide-19
SLIDE 19

Predicting Flu Trends

  • Flu Trends of 78 cities generated from MMWR data
  • Used 2011 for training and 2012 for testing
  • Support Vector Regression with polynomial kernel
  • Target: value of the local flu trend for that week
  • Features: values of the top 20 correlated cities 2 weeks earlier
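
The target and feature layout described above can be sketched as a lagged dataset. This is only an illustration of the data layout, not the author's pipeline: `build_lagged_dataset`, the toy trends, and the neighbor list are hypothetical, and model fitting is left out. With scikit-learn available, the resulting `X` and `y` could presumably be passed to `sklearn.svm.SVR(kernel="poly")`.

```python
def build_lagged_dataset(trends, target, neighbors, lag=2):
    """Build (X, y): X[i] holds the neighbors' values `lag` weeks before y[i].

    trends:    dict mapping city -> list of weekly flu-trend values
    target:    city whose trend is predicted
    neighbors: ordered list of feature cities (e.g. the top 20)
    """
    X, y = [], []
    n_weeks = len(trends[target])
    for week in range(lag, n_weeks):
        X.append([trends[city][week - lag] for city in neighbors])
        y.append(trends[target][week])
    return X, y


# Toy trends for three hypothetical cities over five weeks.
trends = {
    "T": [1, 2, 3, 4, 5],
    "A": [10, 20, 30, 40, 50],
    "B": [5, 4, 3, 2, 1],
}
X, y = build_lagged_dataset(trends, target="T", neighbors=["A", "B"])
print(X)  # one row per predicted week, one column per neighbor city
print(y)
```

Following the slide, the same function would be applied once to the 2011 trends to build the training set and once to the 2012 trends to build the test set.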
slide-20
SLIDE 20

Measures Compared

  • Distance: closest 20 cities
  • Similarity: most similar 20 cities on 2011 flu trends
  • Flow: top 20 cities by number of visitors
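
The three measures differ only in how candidate cities are ranked, which the sketch below makes explicit. It rests on assumptions the slides do not confirm: similarity is computed here as Pearson correlation of the 2011 trends, and flow counts visitors arriving in the target city. All names and numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k(target, scores, k=20):
    """Cities with the highest score relative to `target`, best first."""
    candidates = [c for c in scores if c != target]
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]


def by_distance(target, distances, k=20):
    # Closest cities first: negate distances so that smaller ranks higher.
    return top_k(target, {c: -d for c, d in distances[target].items()}, k)


def by_similarity(target, trends_2011, k=20):
    # Assumed similarity measure: Pearson correlation of 2011 trends.
    scores = {c: pearson(trends_2011[target], s) for c, s in trends_2011.items()}
    return top_k(target, scores, k)


def by_flow(target, visitors, k=20):
    # visitors[target][c]: visitors to `target` arriving from city c (assumed).
    return top_k(target, visitors[target], k)


trends_2011 = {"T": [1, 2, 3, 4], "A": [2, 4, 6, 8],
               "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
distances = {"T": {"A": 300, "B": 50, "C": 120}}
visitors = {"T": {"A": 10, "B": 500, "C": 40}}

print(by_distance("T", distances, k=2))
print(by_similarity("T", trends_2011, k=2))
print(by_flow("T", visitors, k=2))
```

Keeping the ranking separate from the scoring means the regression step can stay identical across the three measures, so any difference in results comes from the choice of neighbor cities.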
slide-21
SLIDE 21

Prediction Results

[Figure panels: Dallas, TX; San Jose, CA]

slide-22
SLIDE 22

Failure Hypotheses

  • Ports of entry: influenced by international travel
  • Noisy data: Waterbury, CT had only 43 deaths in 2011
  • Sparse data: Fort Wayne has 1/50th of Las Vegas’ users

Washington, DC - Flu Deaths 2012

slide-23
SLIDE 23

Conclusions

  • Social Media can be an important source for surveillance
  • Can predict American Idol’s winner ;)
  • Allows monitoring of public sentiment about health topics
  • Can effectively be used to monitor ILI% in real time
  • Geolocated posts can be used to create travel models
  • Social Travel Data provides additional predictive power for flu trends

slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26

Checkins Distributions

[Figure: % of trips and cumulative % by distance. Cumulative values: 0 to < 1 mile: 50%; 1 to < 10 miles: 85%; 10 to < 100 miles: 97%; 100 to < 1,000 miles: 99%; 1,000 to < 10,000 miles: 100%. Per-bucket values not recoverable.]

[Figure: % of trips and cumulative % by time interval. Cumulative values: 10s: 0%; 30s: 1%; 1m: 4%; 2m: 8%; 5m: 15%; 10m: 21%; 15m: 24%; 30m: 31%; 1h: 38%; 2h: 46%; 6h: 59%; 12h: 69%; 1d: 81%; 2d: 89%; 1w: 97%. Per-bucket values not recoverable.]

slide-27
SLIDE 27

Denver, CO

[Figure panels: Distance; Similarity; Flow]

slide-28
SLIDE 28

Smoothing Methods

[Figure panels: 5 weeks ahead; 1 week around; 2 weeks around]