Members: Raghuram Krishnamachari Manish Maheshwari Maryam El - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Members: Raghuram Krishnamachari, Manish Maheshwari, Maryam El Kherba

Guided by: Prof. Alan Mislove
slide-2
SLIDE 2

Flu Prediction / Activity

  • CDC Flu Activity
      • Reports influenza-like illness (ILI) for each region
  • Google Flu Trends
      • Aggregates search data to estimate flu activity
  • Our experiment (Twitter)
      • Analyze Twitter data (tweets) to estimate flu activity

slide-3
SLIDE 3

Google Flu Trends

  • CDC's ILI data vs. Google Flu Trends

slide-4
SLIDE 4

Google Flu Trends Vs Twitter

[Figure: two charts by HHS region. The first has an axis from 2000 to 12000 and series for HHS Regions 1 to 10 and the United States; the second has an axis from 0.001 to 0.009 and series for Regions 1 to 10.]

HHS regions:

  • Region 1: CT, ME, MA, NH, RI, VT
  • Region 2: NJ, NY
  • Region 3: DE, DC, MD, PA, VA, WV
  • Region 4: AL, FL, GA, KY, MS, NC, SC, TN
  • Region 5: IL, IN, MI, MN, OH, WI
  • Region 6: AR, LA, NM, OK, TX
  • Region 7: IA, KS, MO, NE
  • Region 8: CO, MT, ND, SD, UT, WY
  • Region 9: AZ, CA, HI, NV
  • Region 10: AK, ID, OR, WA

slide-5
SLIDE 5

Google Flu Trends Vs Twitter

[Figure: two charts comparing Google Flu Trends (G) and Twitter (T) series: G-R3 and T-R3 on an axis from 1000 to 7000, and G-R9 and T-R9 on an axis from 1000 to 8000.]

slide-6
SLIDE 6

Tweets, Phrases

  • "having a cold"  4
  • "have a cold"  7
  • "feel feverish" "flu"  5
  • "headache" "flu"  8
  • "sick" "flu"  9
  • "flu" "fever"  5
  • "came down with the flu"  7
  • "chills" "flu"  7
  • "catching the flu"  6
  • "cough" "flu"  6
  • "fatigue" "flu"  8
  • "weakness" "flu"  6
  • "flu like symptoms"  4
  • "runny nose" "flu"  5
  • "sore throat" "flu"  7
  • "stomach ache" "flu"  6
  • "stuffy nose" "flu"  6
  • "tiredness" "flu"  4
  • "vomiting" "flu"  4
  • "watery eyes" "flu"  6
  • "body hurts" "flu"  7
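
One plausible reading of the quoted groups above is that a tweet counts as flu-related when it contains every quoted term in a group. That reading is an assumption: the slide does not say how the terms combine or what the numbers mean. Under that assumption, a minimal bash sketch (the helper name matches_all and the sample tweets are made up for illustration) could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the slides: succeeds only when the
# tweet text contains every given term (case-insensitive, literal match).
matches_all() {
  local text=$1
  shift
  local term
  for term in "$@"; do
    # -q: no output, -i: ignore case, -F: treat the term as a fixed string
    printf '%s\n' "$text" | grep -qiF -- "$term" || return 1
  done
  return 0
}

# Made-up sample tweets checked against the group "headache" "flu".
matches_all "Stuck in bed with a headache, pretty sure it is the flu" "headache" "flu" \
  && echo "flu-related"
matches_all "Huge headache after that meeting" "headache" "flu" \
  || echo "not flu-related"
```

Plain substring matching is crude (for example, "flu" also matches inside "influenza"), so a real filter would likely need word boundaries or a curated phrase list.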

slide-7
SLIDE 7

Process

Filter

  • Filter flu tweets from Twitter data
  • Store data for each state (FIPS)

Count

  • Count flu tweets (weekly)
  • Count total tweets (weekly)

Plot

  • Ratio of flu-related to total tweets
  • Compare against Google/CDC
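
The ratio step above can be sketched in awk. This is an illustration under assumptions, not the authors' script: the two input files, their "<week> <count>" layout, the function name flu_ratio, and the sample numbers are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: each line is "<week> <count>".
# $1: weekly flu-tweet counts, $2: weekly total-tweet counts.
flu_ratio() {
  awk 'NR == FNR { flu[$1] = $2; next }   # first file: remember flu counts per week
       $2 > 0    { printf "%s %.6f\n", $1, flu[$1] / $2 }   # second file: flu / total
      ' "$1" "$2"
}

# Made-up sample counts for two weeks.
tmp=$(mktemp -d)
printf '%s\n' '2012-W01 12' '2012-W02 30' > "$tmp/flu.txt"
printf '%s\n' '2012-W01 4000' '2012-W02 5000' > "$tmp/total.txt"
flu_ratio "$tmp/flu.txt" "$tmp/total.txt"
rm -r "$tmp"
```

Weeks with no flu tweets come out as 0.000000, and weeks with a zero total are skipped to avoid dividing by zero.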

slide-8
SLIDE 8

Implementation

Linux bash shell script

  • Filtering
      • find fips -name "*.gz" -exec zcat {} \; | grep "$1"
  • Counting
      • find … -exec zcat {} \; | awk '{ print $3 }' | awk '{ print $3 " " $2 " " $6 }'
      • sort -k 3n -k 2M -k 1n | uniq -c
  • Plotting
      • pr -mft -s, dates.txt NJ.tot NY.tot > RE2.tot
      • Microsoft Excel
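
The second awk call and the sort keys in the counting step appear to reorder a timestamp into "day month year" and then sort chronologically before counting. The sketch below illustrates that stage only. It assumes each input line is a timestamp laid out like "Wed Aug 27 13:08:45 +0000 2008", which the slides do not confirm; the function name count_per_day and the sample lines are made up, and the first awk '{ print $3 }' is left out because the record layout it reads is not shown.

```shell
#!/usr/bin/env bash
# Assumption: one timestamp per line, e.g. "Wed Aug 27 13:08:45 +0000 2008"
# (weekday, month name, day, time, UTC offset, year).
count_per_day() {
  awk '{ print $3 " " $2 " " $6 }' |       # keep day, month name, year
    LC_ALL=C sort -k 3n -k 2M -k 1n |      # order by year, then month name, then day
    uniq -c                                # count identical adjacent dates
}

# Made-up sample timestamps, deliberately out of order.
printf '%s\n' \
  'Thu Jan 01 09:15:00 +0000 2009' \
  'Wed Dec 31 23:59:10 +0000 2008' \
  'Mon Feb 02 07:30:45 +0000 2009' \
  'Thu Jan 01 18:40:22 +0000 2009' |
  count_per_day
```

LC_ALL=C is set here so that sort -M recognizes English month abbreviations regardless of the system locale. Each output line is a count followed by the date, so the process slide's weekly totals would need one more grouping step on top of these daily counts.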

slide-9
SLIDE 9

Challenges

  • Filtering
      • Phrases that express flu symptoms
      • Processing time
      • Segregation based on location
  • Counting
      • Processing time
      • Storage format
  • Plotting
      • Lack of consistent CDC data
      • Handling of large numeric data

slide-10
SLIDE 10

Future

  • Better prediction algorithm
  • Live tweet monitoring
  • Flu propagation
  • Facebook application