  1. Members: Raghuram Krishnamachari, Manish Maheshwari, Maryam El Kherba. Guided by: Prof. Alan Mislove

  2. Flu Prediction / Activity
       • CDC Flu Activity: reports influenza-like illness (ILI) for each region
       • Google Flu Trends: aggregates search data to estimate flu activity
       • Our experiment (Twitter): analyzes Twitter data (tweets) to estimate flu activity

  3. Google Flu Trends: CDC's ILI data vs. Google Flu Trends

  4. Google Flu Trends vs. Twitter
       [Figure: two charts. One has a vertical axis from 0 to 12000 and a series for each HHS region plus the United States; the other has a vertical axis from 0 to 0.009 and a series for each of Regions 1 to 10.]
       HHS regions listed in the legend:
       • Region 1: CT, ME, MA, NH, RI, VT
       • Region 2: NJ, NY
       • Region 3: DE, DC, MD, PA, VA, WV
       • Region 4: AL, FL, GA, KY, MS, NC, SC, TN
       • Region 5: IL, IN, MI, MN, OH, WI
       • Region 6: AR, LA, NM, OK, TX
       • Region 7: IA, KS, MO, NE
       • Region 8: CO, MT, ND, SD, UT, WY
       • Region 9: AZ, CA, HI, NV
       • Region 10: AK, ID, OR, WA

  5. Google Flu Trends vs. Twitter
       [Figure: two charts. The first plots the series G-R3 and T-R3 on a vertical axis from 0 to 7000; the second plots G-R9 and T-R9 on a vertical axis from 0 to 8000.]

  6. Tweets, Phrases
       "having a cold"             4
       "have a cold"               7
       "feel feverish" "flu"       5
       "headache" "flu"            8
       "sick" "flu"                9
       "flu" "fever"               5
       "came down with the flu"    7
       "chills" "flu"              7
       "catching the flu"          6
       "cough" "flu"               6
       "fatigue" "flu"             8
       "weakness" "flu"            6
       "flu like symptoms"         4
       "runny nose" "flu"          5
       "sore throat" "flu"         7
       "stomach ache" "flu"        6
       "stuffy nose" "flu"         6
       "tiredness" "flu"           4
       "vomiting" "flu"            4
       "watery eyes" "flu"         6
       "body hurts" "flu"          7
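A phrase list like this one can be applied as a text filter. The following is a minimal sketch, not the authors' script. It assumes that an entry with two quoted terms (for example "sore throat" "flu") means both terms must appear in the same tweet, and that matching is case-insensitive; the slide states neither. The helper name `match_all_terms` is hypothetical.

```shell
# Hypothetical helper: read tweets on stdin (one per line) and keep only the
# lines that contain every term passed as an argument.
# Assumption: matching is case-insensitive and terms are fixed strings.
match_all_terms() {
    if [ "$#" -eq 0 ]; then
        # No terms left to check: pass the remaining lines through.
        cat
        return
    fi
    first_term=$1
    shift
    # Keep lines containing the first term, then check the remaining terms.
    grep -iF -- "$first_term" | match_all_terms "$@"
}

# Example: only the first sample line mentions both terms.
printf '%s\n' \
    'I have a sore throat, must be the flu' \
    'sore throat from singing all night' \
    'Flu season again' |
    match_all_terms "sore throat" "flu"
```

Chaining one `grep` per term keeps each term a fixed string, so phrases containing characters that are special in regular expressions need no escaping.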

  7. Process
       • Filter
           • Filter flu tweets from Twitter data
           • Store data for each state (FIPS)
       • Count
           • Count flu tweets (weekly)
           • Count total tweets (weekly)
       • Plot
           • Ratio of flu-related to total tweets
           • Compare against Google/CDC
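The ratio step of this process can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the weekly counts are already available as two whitespace-separated files of `<week> <count>` lines, and the week labels, file layout, and function name `weekly_ratio` are all hypothetical.

```shell
# Hypothetical helper: print "<week> <flu tweets / total tweets>" per week.
#   $1: file of "<week> <flu tweet count>" lines
#   $2: file of "<week> <total tweet count>" lines
# Weeks with a total of zero are skipped to avoid division by zero;
# weeks missing from the flu file count as zero flu tweets.
weekly_ratio() {
    awk 'NR == FNR { flu[$1] = $2; next }
         $2 > 0    { printf "%s %.6f\n", $1, flu[$1] / $2 }' "$1" "$2"
}

# Usage (file names are placeholders):
#   weekly_ratio flu_counts.txt total_counts.txt
```

Dividing by the total tweet volume for the same week normalizes for growth in overall Twitter activity, so the resulting series can be compared in shape against the Google and CDC series.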

  8. Implementation: Linux bash shell script
       • Filtering
           find fips -name "*.gz" -exec zcat {} \; | grep "$1"
       • Counting
           find … -exec zcat {} \; | awk '{ print $3 }' | awk '{ print $3 " " $2 " " $6 }'
           sort -k 3n -k 2M -k 1n | uniq -c
       • Plotting
           pr -mft -s, dates.txt NJ.tot NY.tot > RE2.tot
           Microsoft Excel
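The `pr` command above merges the per-state files for a region side by side, separated by commas. The slides use Microsoft Excel for the remaining work; as an alternative sketch, the per-state columns could be summed into one regional total on the command line. This assumes each merged line has the form `<date>,<count>,<count>,...`, which the slide does not confirm, and `sum_region` is a hypothetical name.

```shell
# Hypothetical helper: read "<date>,<count>,<count>,..." lines on stdin and
# print "<date>,<sum of the counts>" for each line.
sum_region() {
    awk -F, '{
        total = 0
        for (i = 2; i <= NF; i++) total += $i   # add up every state column
        print $1 "," total
    }'
}

# Usage (assuming RE2.tot was produced by the pr command shown on the slide):
#   sum_region < RE2.tot
```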

  9. Challenges
       • Filtering
           • Phrases that express flu symptoms
           • Processing time
           • Segregation based on location
       • Counting
           • Processing time
           • Storage format
       • Plotting
           • Lack of consistent CDC data
           • Handling of large numeric data

  10. Future
       • Better prediction algorithm
       • Live tweet monitoring
       • Flu propagation
       • Facebook application
