Classification of High Dimensional Data By Two-way Mixture Models
Jia Li Statistics Department The Pennsylvania State University
Outline
Goals
Two-way mixture model approach
Background: mixture discriminant analysis
Model
Mixture of multivariate independent Poisson distributions
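As a minimal sketch of what a mixture of multivariate independent Poisson distributions evaluates, the Python below computes the log-density of one count vector. It assumes each mixture component treats the variables as independent Poisson counts with their own rates. The function names, the `weights` argument, and the `rates[m][j]` layout are hypothetical choices for illustration, not taken from the slides.

```python
import math


def poisson_log_pmf(x, lam):
    """Log of the Poisson pmf at count x with rate lam (lam > 0)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def mixture_log_density(x, weights, rates):
    """Log-density of count vector x under a mixture of independent Poissons.

    x        -- list of non-negative integer counts, one per variable
    weights  -- mixture weights, positive and summing to 1 (assumed)
    rates    -- rates[m][j] is the Poisson rate of variable j in component m
    """
    # Per-component log of (weight * product over variables of Poisson pmf).
    comp = [
        math.log(w) + sum(poisson_log_pmf(xj, lj) for xj, lj in zip(x, lam))
        for w, lam in zip(weights, rates)
    ]
    # Log-sum-exp over components for numerical stability.
    top = max(comp)
    return top + math.log(sum(math.exp(c - top) for c in comp))
```

Under these assumptions, a classifier could evaluate one such mixture per class and pick the class with the largest log prior plus log-density; that decision rule is a standard choice and is not spelled out in the text here.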
[Figure: classification error rate (%) versus number of mixture components per class]
[Figure: classification error rate (%) versus number of word clusters]
[Figure: number of words in each cluster versus word cluster index]
The corresponding weighted average of ...
[Figure: average λ versus word cluster index]
[Figure: classification error rate (%) versus number of mixture components]
The minimum error rate of 10.26% is achieved at M = 6. Due to the small sample size, classification performance ...
[Figure: classification error rate (%) versus number of variable clusters, for 4, 18, and 36 mixture components]
Gene clustering improves classification.
[Figure: classification error rate (%) versus number of mixture components]
When M ...