Unsupervised Data Discretization of Mixed Data Types
Jee Vang
Outline
– Introduction
– Background
– Objective
– Experimental Design
– Results
– Future Work
Introduction
Many algorithms in data mining, machine learning, and
– Few discretization algorithms address interdependence
– Even fewer address such concerns in the absence of class
– Static vs dynamic
– Supervised vs unsupervised
– Local vs global
– Top-down vs bottom-up
– Direct vs incremental
– Based on principal component analysis (PCA) and frequent
– Binary: system type, technical violation, race, gender
– Continuous: arrest, drug test, employment, homeless
– 1. measure the pair-wise correlations in the continuous
– 2. input data set into discretization algorithms
– 3. measure the pair-wise correlations in the categorical
– 4. use Spearman or Kendall rank-based correlation tests
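The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes Spearman correlation is used for the pair-wise measurements in steps 1 and 3, and that step 4 compares the two resulting lists of correlations with a Spearman test. The names `pairwise`, `preservation_score`, and the `discretize` argument are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise(columns):
    """Spearman correlation for every pair of columns, in a fixed order."""
    names = sorted(columns)
    return [spearman(columns[a], columns[b]) for a, b in combinations(names, 2)]


def preservation_score(continuous, discretize):
    """Hypothetical helper: how well `discretize` preserves correlations."""
    before = pairwise(continuous)                          # step 1
    binned = {n: discretize(c) for n, c in continuous.items()}  # step 2
    after = pairwise(binned)                               # step 3
    return spearman(before, after)                         # step 4 (assumed)
```

A score near 1 would suggest that the ordering of pair-wise correlations survives discretization. At least three columns are needed for the comparison in step 4 to be meaningful, since two columns yield a single pair.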
PCA+FIM (Java, BLAS/LAPACK)
EW (Data PreProcessor)
EF (Data PreProcessor)
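For readers unfamiliar with the two baselines, the following is a minimal sketch, assuming EW and EF stand for equal-width and equal-frequency binning. It is not the Data PreProcessor implementation; the function names and the parameter `k` (number of bins) are illustrative.

```python
def equal_width(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything in one bin
    width = (hi - lo) / k
    # The maximum lands on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Assign bins so that each holds roughly len(values) / k points.

    Note: this simple version may split tied values across adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for position, index in enumerate(order):
        bins[index] = position * k // len(values)
    return bins


data = [1, 2, 3, 4, 100]
print(equal_width(data, 2))      # [0, 0, 0, 0, 1]
print(equal_frequency(data, 2))  # [0, 0, 0, 1, 1]
```

The example shows the usual trade-off: with an outlier present, equal-width binning pushes almost every point into one bin, while equal-frequency binning keeps the bins balanced but ignores the spacing between values. Neither method looks at other variables, which is the interdependence concern raised in the introduction.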
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J.,
Cheng, J. “Data PreProcessor.”
Mehta, S., Parthasarathy, S., and Yang, H., “Toward