Members: Raghuram Krishnamachari, Manish Maheshwari, Maryam El Kherba
Guided by: Prof. Alan Mislove
Flu Prediction / Activity
CDC Flu Activity
Reports influenza-like illness (ILI) for each region
Google Flu Trends
Aggregates search data to estimate flu activity
Our experiment (Twitter)
Analyze Twitter data (tweets) to estimate flu activity
CDC's ILI data vs. Google Flu Trends
[Figure: chart by HHS region and for the United States as a whole]
HHS Region 1: CT, ME, MA, NH, RI, VT
HHS Region 2: NJ, NY
HHS Region 3: DE, DC, MD, PA, VA, WV
HHS Region 4: AL, FL, GA, KY, MS, NC, SC, TN
HHS Region 5: IL, IN, MI, MN, OH, WI
HHS Region 6: AR, LA, NM, OK, TX
HHS Region 7: IA, KS, MO, NE
HHS Region 8: CO, MT, ND, SD, UT, WY
HHS Region 9: AZ, CA, HI, NV
HHS Region 10: AK, ID, OR, WA
[Figure: per-region chart, Region 1 through Region 10]
[Figure: series G-R3 vs. T-R3, and G-R9 vs. T-R9]
"having a cold" 4
"have a cold" 7
"feel feverish" "flu" 5
"headache" "flu" 8
"sick" "flu" 9
"flu" "fever" 5
"came down with the flu" 7
"chills" "flu" 7
"catching the flu" 6
"cough" "flu" 6
"fatigue" "flu" 8
"weakness" "flu" 6
"flu like symptoms" 4
"runny nose" "flu" 5
"sore throat" "flu" 7
"stomach ache" "flu" 6
"stuffy nose" "flu" 6
"tiredness" "flu" 4
"vomiting" "flu" 4
"watery eyes" "flu" 6
"body hurts" "flu" 7
Filtering
find fips -name "*.gz" -exec zcat {} \; | grep "$1"
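The filter above matches a single phrase passed as "$1". Several entries in the phrase list pair a symptom with "flu", so a minimal sketch of a filter that requires every keyword to appear on a line might look like the following. The helper name filter_all is hypothetical, and case-insensitive fixed-string matching is an assumption rather than something the slides specify.

```shell
# Hypothetical helper: read lines on stdin and keep only those that
# contain every keyword given as an argument.
# Assumption: matching is case-insensitive and literal (no regex).
filter_all() {
    if [ "$#" -eq 0 ]; then
        # No keywords left to check: pass the remaining lines through.
        cat
        return
    fi
    kw=$1
    shift
    # Keep lines containing this keyword, then check the remaining ones.
    grep -i -F -e "$kw" | filter_all "$@"
}
```

It can sit at the end of the same pipeline, for example: find fips -name "*.gz" -exec zcat {} \; | filter_all headache flu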
Counting
find … -exec zcat {} \; | awk '{ print $3 }' | awk '{ print $3 " " $2 " " $6 }' | sort -k 3n -k 2M -k 1n | uniq -c
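As an illustration of the counting step, here is a small sketch under stated assumptions. It assumes each input line is tab-separated and that the third field holds a timestamp of the form "Mon Sep 24 03:35:21 +0000 2012"; the slides do not show the actual storage format, and the function name count_by_day is hypothetical.

```shell
# Hypothetical helper: count input lines per calendar day.
# Assumption: fields are tab-separated and field 3 is a timestamp
# such as "Mon Sep 24 03:35:21 +0000 2012".
count_by_day() {
    # Pull out the timestamp field.
    awk -F '\t' '{ print $3 }' |
        # Reduce it to "day month year".
        awk '{ print $3 " " $2 " " $6 }' |
        # Order by year, then month name, then day of month.
        sort -k 3n -k 2M -k 1n |
        # Collapse identical dates into "count day month year".
        uniq -c
}
```

Sorting before uniq -c matters because uniq only merges adjacent identical lines.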
Plotting
pr -mft -s, dates.txt NJ.tot NY.tot > RE2.tot
Microsoft Excel
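The pr command merges the per-state files side by side into one comma-separated file for the region. If a single regional total per date is wanted before plotting, a sketch such as the following could add the state columns. It assumes each merged row has the form date,NJ count,NY count with plain integer counts, which the slides do not confirm; the function name region_total is hypothetical.

```shell
# Hypothetical helper: sum the two state columns of a merged file.
# Assumption: each row is "date,count1,count2" with integer counts.
# Reads the files named as arguments, or stdin if none are given.
region_total() {
    awk -F, -v OFS=, '{ print $1, ($2 + $3) }' "$@"
}
```

For example, region_total RE2.tot would print one date,total row per input row.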
Filtering
- Phrases that express flu symptoms
- Processing time
- Segregation based on location
Counting
- Processing time
- Storage format
Plotting
- Lack of consistent CDC data
- Handling of large numeric data
- Better prediction algorithm
- Live Tweet monitoring
- Flu propagation
- Facebook application