Data Exploration
Tyler Moore
CSE 7338
Computer Science & Engineering Department, SMU, Dallas, TX
Lecture 5
Outline
1. Data exploration
Data exploration
Guide to analyzing data
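For each type of data: how to explore it, which statistics to use, and the relevant "R by Example" (RByEx) section.
- 1 numerical variable
  Exploration: ECDF plot, ecdf(br$logbreach), with log(#records breached) on the x-axis and Fn(x) on the y-axis
  Statistics: one-sample t-test, Wilcoxon signed-rank test
  RByEx: 6.3
- 1 categorical variable
  Exploration: bar chart of breach counts by breach type (e.g. CARD, HACK, PHYS, STAT)
  Statistics: # categories = 2: prop.test
  RByEx: 3.1, 6.2
- 1 categorical, 1 numerical variable
  Exploration: box plots of log(#records breached) by organization type (BSF, ..., EDU) and by a two-level breach type (FALSE/TRUE)
  Statistics: # categories > 2: ANOVA, permutation test; # categories = 2: two-sample t-test, Wilcoxon rank-sum test, permutation test
  RByEx: 10, 6.4
- 2 categorical variables
  Exploration: plot of organization type (BSF, BSO, BSR, EDU, GOV, MED, NGO) against breach type (CARD, DISC, HACK, INSD, PHYS, PORT, STAT, UNKN)
  Statistics: χ2 test
  RByEx: 3.2–3.5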
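The ECDF used to explore one numerical variable is what R's ecdf() returns: Fn(x) is the fraction of observations less than or equal to x. A minimal Python sketch of the same idea, assuming the data is a plain list of numbers; the function name and the sample values are hypothetical, not the breach data from the slide:

```python
from bisect import bisect_right


def ecdf(values):
    """Return Fn(x) = fraction of observations <= x (same idea as R's ecdf())."""
    if not values:
        raise ValueError("need at least one observation")
    data = sorted(values)
    n = len(data)

    def Fn(x):
        # bisect_right counts how many sorted observations are <= x
        return bisect_right(data, x) / n

    return Fn


# Hypothetical log(#records breached) values, for illustration only
log_breach = [2.3, 4.1, 4.1, 5.7, 8.0]
Fn = ecdf(log_breach)
print(Fn(4.1))  # 0.6
```

Sorting once and binary-searching keeps each evaluation of Fn(x) at O(log n), which matters when the curve is evaluated at many points for plotting.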
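For one numerical variable, the one-sample t-test compares the sample mean with a hypothesized mean mu0 using t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom, where s is the sample standard deviation. The sketch below computes only the statistic: Python's standard library has no t distribution, so the p-value is left to a tool such as R's t.test(x, mu = mu0). The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # stdev() is the sample standard deviation (divides by n - 1)
    t = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t, n - 1


# Hypothetical sample tested against mu0 = 5
t, df = one_sample_t([4, 5, 6, 7, 8], mu0=5)
print(round(t, 4), df)  # 1.4142 4
```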
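For a categorical variable with two categories, prop.test asks whether the observed proportion differs from a hypothesized p0. A rough Python sketch of the underlying one-sample test, assuming the normal approximation without continuity correction. R's prop.test applies Yates' continuity correction by default (correct = TRUE), so its p-value will differ slightly; with correct = FALSE, the X-squared value it reports equals z squared. The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def one_proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of H0: true proportion == p0 (no continuity correction)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error of p_hat under H0
    z = (p_hat - p0) / se
    # erfc(|z| / sqrt(2)) equals 2 * P(Z > |z|) for a standard normal Z
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical: 60 of 100 breaches fall in one category; is that different from 50%?
z, p = one_proportion_z_test(60, 100)
print(round(z, 3), round(p, 4))  # 2.0 0.0455
```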
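The permutation test listed for one categorical and one numerical variable asks how often relabelling the observations at random yields a difference in group means at least as large as the one observed. The sketch below enumerates every relabelling, which gives an exact p-value but is only feasible for small groups; with larger samples one would draw random permutations instead. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def perm_test_diff_means(x, y):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    count = 0
    total = 0
    # Every way of choosing which pooled observations get the "x" label
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so floating-point ties with the observed value still count
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


# Hypothetical log(#records breached) values for two breach types
print(perm_test_diff_means([1, 2, 3], [4, 5, 6]))  # 0.1
```

Here only 2 of the 20 possible relabellings are as extreme as the observed split, hence 0.1.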
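When the categorical variable has more than two categories, one-way ANOVA compares the spread between group means with the spread within groups through the F statistic F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n observations. A minimal sketch computing F and its degrees of freedom; the p-value needs the F distribution, which in R comes from aov() or anova() on a fitted lm(). The function name and groups are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic and (df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    means = [mean(g) for g in groups]
    k = len(groups)
    n = len(all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values for three organization types
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df_b, df_w)  # 27.0 2 6
```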
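For two categorical variables, the χ2 test compares the observed counts in each cell of the contingency table with the counts expected under independence, (row total × column total) / grand total, and sums (observed - expected)^2 / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. A minimal sketch of the Pearson statistic, assuming the table is a list of rows of counts; the counts are hypothetical. Note that R's chisq.test applies Yates' continuity correction to 2×2 tables by default, so its statistic differs from this uncorrected one in that case.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: 2 organization types (rows) x 3 breach types (columns)
stat, df = chi_square_statistic([[10, 20, 30], [20, 20, 20]])
print(round(stat, 3), df)  # 5.333 2
```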
R resources
- “Beginner’s Guide to R” introduces programming in R, making plots, and similar basics. Chapters are available at http://lyle.smu.edu/~tylerm/courses/econsec/rbegin.html
- “R by Example” introduces data exploration and statistics in R, so it is useful for this part of the course. Chapters are available at http://lyle.smu.edu/~tylerm/courses/econsec/rbyex.html