A Cluster‐Target Similarity Based Principal Component Analysis for Interval‐Valued Data
University of Tsukuba School of Systems and Information Engineering Mika Sato‐Ilic
Energy Evaluation Data
Rows: energy sources; columns: JP1, JP2, …, UK1, UK2

JP1        JP2       …   UK1       UK2
[60,90]    [60,70]   …   [81,91]   [51,71]
[60,100]   [20,40]   …   [50,60]   [65,75]
[90,120]   [80,95]   …   [80,91]   [80,91]
[60,80]    [30,45]   …   [0,20]    [0,20]
[70,120]   [50,85]   …   [45,65]   [60,80]
[40,100]   [20,35]   …   [60,70]   [60,70]
[30,70]    [30,40]   …   [0,10]    [10,40]
[70,100]   [40,60]   …   [65,75]   [60,70]
[83,111]   [50,65]   …   [60,80]   [80,90]
[70,100]   [50,60]   …   [60,72]   [50,70]
[80,120]   [65,85]   …   [87,97]   [87,97]
Joint Research: ESRC‐funded Sussex Energy Group at SPRU (Science and Technology Policy Research, University of Sussex) Sustainable Energy/Environment & Public Policy (SEPP), University of Tokyo (ESRC: Economic and Social Research Council)
Principal Component Analysis based on Classification Structure by Fuzzy Clustering
Multi‐dimensional space ($p$-dimensional space; $p$: number of variables)
[Figure: objects 1 to 11 in the $p$-dimensional space, grouped by an adaptable classification structure with an adaptable number of clusters]
Metric Projection
$X$: inner product space
$L$: nonempty subset of $X$
$P_L: X \to L$: metric projection onto $L$
If $L$ is a convex Chebyshev set, $P_L$ is nonexpansive:
$$\|P_L(x) - P_L(y)\| \le \|x - y\|, \quad x, y \in X$$
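A linear subspace of a finite-dimensional inner product space is a convex Chebyshev set, and its metric projection is the orthogonal projection. The following is a minimal numpy sketch, assuming $L$ is spanned by the columns of a full-column-rank matrix `A` (a hypothetical name), that builds $P_L = A(A'A)^{-1}A'$ and checks the nonexpansive inequality on random points:

```python
import numpy as np

def projection_matrix(A: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the column space of A (A assumed full column rank)."""
    # np.linalg.solve returns (A'A)^{-1} A' without forming the inverse explicitly
    return A @ np.linalg.solve(A.T @ A, A.T)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))   # L = span of two random vectors in R^5
P = projection_matrix(A)

x = rng.normal(size=5)
y = rng.normal(size=5)
dist_projected = np.linalg.norm(P @ x - P @ y)  # ||P_L(x) - P_L(y)||
dist_original = np.linalg.norm(x - y)           # ||x - y||
print(dist_projected <= dist_original)          # nonexpansive property
```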
Objects → principal component analysis → projected space
Dissimilarity of objects: $d(x, y) = \|x - y\|$
Dissimilarity in the projected space: $\|P_L(x) - P_L(y)\| \le \|x - y\|$
$\tilde X = (\tilde x_1, \ldots, \tilde x_n)'$, $\tilde x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$
$X = (x_1, \ldots, x_p)$, $x_a = (x_{1a}, \ldots, x_{na})'$, $a = 1, \ldots, p$
n: number of objects, p: number of variables

Minimize
$$F = \sum_{a=1}^{p} (x_a - X l_a)'(x_a - X l_a) \quad\Rightarrow\quad \hat l_a = (X'X)^{-1}X'x_a$$
$$F^* = \sum_{a=1}^{p} (x_a - P_X x_a)'(x_a - P_X x_a) = \sum_{a=1}^{p} x_a'x_a - \sum_{a=1}^{p} x_a'P_X x_a$$
$P_X = X(X'X)^{-1}X'$: projection onto the subspace spanned by the columns of $X$
$P_X' = P_X$ (symmetry), $P_X P_X = P_X$ (idempotence)
$$\sum_{a=1}^{p}\sum_{b=1}^{p}\|P_X x_a - P_X x_b\|^2 = \sum_{a=1}^{p}\sum_{b=1}^{p}(x_a - x_b)'P_X(x_a - x_b) = 2p\sum_{a=1}^{p} x_a'P_X x_a - 2\sum_{a=1}^{p}\sum_{b=1}^{p} x_a'P_X x_b$$
Covariance between variables
$F^*$:
Adaptable classification structure based on an appropriate number of clusters
Dissimilarity structure of objects in higher-dimensional space
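The last two identities use only the symmetry and idempotence of $P_X$, so they can be checked numerically for any projector. A minimal numpy sketch, assuming a hypothetical matrix `B` whose columns span the projection subspace and a random data matrix `Xdata` whose columns play the role of $x_1, \ldots, x_p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 8, 4, 2
Xdata = rng.normal(size=(n, p))        # columns are the variables x_1, ..., x_p
B = rng.normal(size=(n, q))            # hypothetical spanning set of the subspace
P = B @ np.linalg.solve(B.T @ B, B.T)  # symmetric, idempotent projector

# F* = sum_a (x_a - P x_a)'(x_a - P x_a)  versus  sum_a x_a'x_a - sum_a x_a'P x_a
R = Xdata - P @ Xdata
F_star = np.sum(R * R)
F_star_alt = np.sum(Xdata * Xdata) - np.trace(Xdata.T @ P @ Xdata)

# sum_{a,b} ||P x_a - P x_b||^2  versus  2p sum_a x_a'P x_a - 2 sum_{a,b} x_a'P x_b
G = Xdata.T @ P @ Xdata                # G[a, b] = x_a' P x_b
PX = P @ Xdata
pairwise = sum(np.sum((PX[:, a] - PX[:, b]) ** 2)
               for a in range(p) for b in range(p))
pairwise_alt = 2 * p * np.trace(G) - 2 * G.sum()

print(np.isclose(F_star, F_star_alt), np.isclose(pairwise, pairwise_alt))
```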
$S$: observed similarity data
$U^{(l)}$: classification structure when the number of clusters is $l$; a result of fuzzy clustering, $l = 1, \ldots, K$
$\tilde S^{(l)}$: similarity restored by using $U^{(l)}$
$\tilde S^{(1)}, \tilde S^{(2)}, \ldots, \tilde S^{(K)}$
, , 1 , , , 1 ]), , ([ ) ( p a n i y y y Y
ia ia ia
| ) ( inf ) ( | ) ( sup ) , , ( ) , , (
1 1
y y y x d y x d y x y x d d y y and y y between ity Dissimilar
p jp j j ip i i
y y
| ) ( inf ) ( | ) ( sup | ) , ( inf ) , ( , | ) , ( sup
1
y x y x d y y d y y y y d d y y y x d y x d y x y x d d
p ja ja a ia ja ij
| ) , ( inf ) , ( , | ) , ( sup
1
y x y x d y y d y y y y d d
ia ia a ja ia ji
) ( j i d d
ji ij
) ( , , 1 , }, { max / 1
,
j i s s n j i d d s
ji ij ij j i ij ij
) ( j
ji ij
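For a single variable with $d(x, y) = |x - y|$ (an assumption; the slides write a generic $d$), the term $\sup_{x \in y_{ia}} \inf_{y \in y_{ja}} |x - y|$ reduces to $\max(0, \underline y_{ja} - \underline y_{ia}, \overline y_{ia} - \overline y_{ja})$: the distance from a point to an interval is convex in the point, so its supremum over an interval is attained at an endpoint. A minimal sketch of $d_{ij}$ and $s_{ij}$, with hypothetical function names:

```python
import numpy as np

def directed_interval_distance(lo_i, hi_i, lo_j, hi_j):
    """sup over x in [lo_i, hi_i] of inf over y in [lo_j, hi_j] of |x - y|."""
    return max(0.0, lo_j - lo_i, hi_i - hi_j)

def asymmetric_dissimilarity(lo, hi):
    """d[i, j] = sum over variables a of the directed distance from y_ia to y_ja."""
    n, p = lo.shape
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d[i, j] = sum(directed_interval_distance(lo[i, a], hi[i, a], lo[j, a], hi[j, a])
                          for a in range(p))
    return d

def asymmetric_similarity(d):
    """s_ij = 1 - d_ij / max_{i,j} d_ij (assumes at least one d_ij > 0)."""
    return 1.0 - d / d.max()

# Object 0 = [0, 10], object 1 = [2, 4] on one variable
lo = np.array([[0.0], [2.0]])
hi = np.array([[10.0], [4.0]])
d = asymmetric_dissimilarity(lo, hi)   # d[0, 1] = 6, d[1, 0] = 0
s = asymmetric_similarity(d)           # s[0, 1] = 0, s[1, 0] = 1
```

The wide interval is far from the narrow one in the "sup" direction but the narrow one lies entirely inside the wide one, which is exactly the asymmetry $d_{ij} \neq d_{ji}$.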
Asymmetric Similarity Data (Sato and Sato, 1995)
$S = (s_{ij})$, $s_{ij} \neq s_{ji}$, $i, j = 1, \ldots, n$
$$s_{ij} = \sum_{k=1}^{K}\sum_{l=1}^{K} w_{kl} u_{ik} u_{jl} + \varepsilon_{ij}, \quad i, j = 1, \ldots, n$$
$s_{ij}$: asymmetric similarity between objects $i$ and $j$
$w_{kl}$: asymmetric similarity between clusters $k$ and $l$, $w_{kl} \neq w_{lk}$
$u_{ik}$: degree of belongingness of an object $i$ to a cluster $k$
$\varepsilon_{ij}$: error
$$u_{ik} \in [0, 1], \quad \sum_{k=1}^{K} u_{ik} = 1, \quad m \in (1, \infty)$$
n: number of objects, K: number of clusters
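Without the error term, the double sum is the $(i, j)$ element of $UWU'$, where $U = (u_{ik})$ is $n \times K$ and $W = (w_{kl})$ is $K \times K$; an asymmetric $W$ is what makes the modelled similarity asymmetric. A minimal numpy sketch with small made-up matrices (illustrative values, not taken from the slides):

```python
import numpy as np

def model_similarity(U: np.ndarray, W: np.ndarray) -> np.ndarray:
    """sum_k sum_l w_kl u_ik u_jl for all (i, j), i.e. U W U'."""
    return U @ W @ U.T

U = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])   # memberships in [0, 1], each row sums to 1
W = np.array([[1.0, 0.3],
              [0.6, 1.0]])   # w_12 = 0.3 differs from w_21 = 0.6
S_model = model_similarity(U, W)

# Same quantity with the double sum written out, for the pair (i, j) = (0, 1)
s01 = sum(W[k, l] * U[0, k] * U[1, l] for k in range(2) for l in range(2))
```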
$$\tilde w_{kl}^{(K)} = \frac{1}{2}\left[\log\frac{|\Sigma_{(l,K)}|}{|\Sigma_{(k,K)}|} + \operatorname{tr}\left(\Sigma_{(l,K)}^{-1}\Sigma_{(k,K)} - I\right) + (\mu_{(l,K)} - \mu_{(k,K)})'\Sigma_{(l,K)}^{-1}(\mu_{(l,K)} - \mu_{(k,K)})\right]$$
$\mu_{(k,K)}$: expected value of data in cluster $k$
$\Sigma_{(k,K)}$: variance-covariance matrix for cluster $k$
$w_{kl}^{(K)} \neq w_{lk}^{(K)}$, $k \neq l$
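The bracketed expression has the form of the Kullback-Leibler divergence $\mathrm{KL}\big(N(\mu_{(k,K)}, \Sigma_{(k,K)}) \,\|\, N(\mu_{(l,K)}, \Sigma_{(l,K)})\big)$, which is why $\tilde w_{kl}^{(K)}$ is not symmetric in $k$ and $l$. A minimal numpy sketch, assuming the Gaussian cluster parameters are already estimated (the function name is hypothetical):

```python
import numpy as np

def gaussian_kl(mu_k, Sigma_k, mu_l, Sigma_l):
    """KL(N(mu_k, Sigma_k) || N(mu_l, Sigma_l)), written term by term."""
    p = len(mu_k)
    Sigma_l_inv = np.linalg.inv(Sigma_l)
    diff = mu_l - mu_k
    _, logdet_l = np.linalg.slogdet(Sigma_l)   # log |Sigma_(l,K)|
    _, logdet_k = np.linalg.slogdet(Sigma_k)   # log |Sigma_(k,K)|
    return 0.5 * ((logdet_l - logdet_k)
                  + np.trace(Sigma_l_inv @ Sigma_k) - p   # tr(Sigma_l^{-1} Sigma_k - I)
                  + diff @ Sigma_l_inv @ diff)            # Mahalanobis-type term

mu = np.array([0.0])
w_kl = gaussian_kl(mu, np.array([[1.0]]), mu, np.array([[4.0]]))
w_lk = gaussian_kl(mu, np.array([[4.0]]), mu, np.array([[1.0]]))
```

Swapping the roles of the two clusters changes the value, here roughly 0.32 versus 0.81.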
$$\tilde s_{ij}^{(K)} = \sum_{k=1}^{K}\sum_{l=1}^{K} w_{kl}^{(K)} u_{ik}^{(K)} u_{jl}^{(K)}, \quad i, j = 1, \ldots, n$$
$$C(K) = \frac{\sum_{i \neq j} \left(s_{ij} - \tilde s_{ij}^{(K)}\right)^2}{\sum_{i \neq j} s_{ij}^2}$$
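A minimal numpy sketch of the criterion in the off-diagonal form written above, assuming `S` holds the observed $s_{ij}$ and `S_tilde` the restored $\tilde s_{ij}^{(K)}$ for one value of $K$ (both names are hypothetical):

```python
import numpy as np

def criterion_C(S: np.ndarray, S_tilde: np.ndarray) -> float:
    """Sum over i != j of (s_ij - s~_ij)^2, divided by the sum over i != j of s_ij^2."""
    off_diagonal = ~np.eye(S.shape[0], dtype=bool)
    residual = S[off_diagonal] - S_tilde[off_diagonal]
    return float(np.sum(residual ** 2) / np.sum(S[off_diagonal] ** 2))

S = np.array([[1.0, 0.5],
              [0.25, 1.0]])
S_tilde = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
value = criterion_C(S, S_tilde)   # 0.0625 / 0.3125 = 0.2
```

Evaluating it with $U^{(K)}$ and $w_{kl}^{(K)}$ for each candidate $K$ gives one value per number of clusters.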
$$|\hat C(s, \hat s, \hat w) - \hat C(s, \tilde s, \tilde w)| \le \frac{6}{m a^2}, \qquad |C(s, s, w) - C(s, s', w)| \le \frac{2}{m a^2}, \qquad m = n(n - 1)$$
$$P\left(|\hat C(s, \hat s, \hat w) - E[\hat C(s, \hat s, \hat w)]| \ge c\right) \le 2\exp\left(-\frac{c^2 a^4 m}{2}\right)$$
ˆ ˆ ) ˆ , ~ ~ , ( ˆ ) ~ , ~ , ( ˆ w s d s s C w s s C , ~ ~ ) ~ ~ ~ ( ~ ~ ) ~ ~ ~ ( ~ ~ , ~ 2 ) ˆ , ~ , ( ˆ ) ˆ , ~ ~ , ( ˆ
2
s s s s d s d s s s s d s d s s s s d s w s s C w s d s s C ) 1 ( , 2 ~ , ~ n n m m s s d
) ~ , ~ , ( ˆ ) ~ , ~ , ( ˆ ), ˆ , ˆ , ( ˆ ) ˆ , ˆ , ( ˆ w s s C w s s C w s s C w s s C
kl ij ij kl ij ij
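Theorem (C. McDiarmid)
Let $X_1, \ldots, X_n$ be independent random variables taking values in a set $A$, and let $f: A^n \to \mathbb{R}$ satisfy, for $1 \le i \le n$,
$$\sup_{x_1, \ldots, x_n, \hat x_i \in A} |f(x_1, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, \hat x_i, x_{i+1}, \ldots, x_n)| \le c_i.$$
Then, for all $\varepsilon > 0$,
$$P\left(|f(X_1, \ldots, X_n) - Ef(X_1, \ldots, X_n)| \ge \varepsilon\right) \le 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^{n} c_i^2}\right),$$
and, when $c_i = c$ for all $i$,
$$P\left(|f(X_1, \ldots, X_n) - Ef(X_1, \ldots, X_n)| \ge \varepsilon\right) \le 2\exp\left(-\frac{2\varepsilon^2}{n c^2}\right).$$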
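The bound is easy to evaluate. As a sanity check, the sample mean of $n$ independent variables with values in $[0, 1]$ has $c_i = 1/n$, so the bound becomes $2\exp(-2n\varepsilon^2)$. A minimal Python sketch (function name hypothetical) comparing it with a simulated tail probability for fair coin flips:

```python
import math
import random

def mcdiarmid_bound(eps, c):
    """2 exp(-2 eps^2 / sum_i c_i^2) for bounded-difference constants c_1, ..., c_n."""
    return 2.0 * math.exp(-2.0 * eps ** 2 / sum(ci ** 2 for ci in c))

n, eps, trials = 100, 0.1, 2000
bound = mcdiarmid_bound(eps, [1.0 / n] * n)   # equals 2 exp(-2) here

random.seed(0)
hits = 0
for _ in range(trials):
    mean = sum(random.random() < 0.5 for _ in range(n)) / n   # fair coin flips
    if abs(mean - 0.5) >= eps:
        hits += 1
empirical = hits / trials   # simulated P(|mean - 0.5| >= eps)
```

The simulated frequency is far below the bound, which is typical: McDiarmid's inequality trades tightness for requiring only bounded differences.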
$$C = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}^{m}(x_i - \bar x)(x_i - \bar x)', \quad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$u_{ik} \in [0, 1], \quad \sum_{k=1}^{K} u_{ik} = 1, \quad m \in (1, \infty)$$
Fuzzy clustering

Special case:
$$C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)(x_i - \bar x)'$$
$$u_{ik} \in \{0, 1\}, \quad \sum_{k=1}^{K} u_{ik} = 1$$
Hard clustering
n: number of objects, K: number of clusters
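A minimal numpy sketch of $C$, assuming `U` is an $n \times K$ membership matrix and taking $m = 2$ as an illustrative choice. With hard (one-hot) memberships $\sum_k u_{ik}^m = 1$, so the result coincides with the ordinary covariance matrix with divisor $n$:

```python
import numpy as np

def fuzzy_covariance(X: np.ndarray, U: np.ndarray, m: float = 2.0) -> np.ndarray:
    """C = (1/n) sum_i sum_k u_ik^m (x_i - xbar)(x_i - xbar)'; X is n x p, U is n x K."""
    n = X.shape[0]
    D = X - X.mean(axis=0)               # rows are x_i - xbar
    g = (U ** m).sum(axis=1)             # sum_k u_ik^m for each object i
    return (D * g[:, None]).T @ D / n

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
U_hard = np.eye(2)[[0, 0, 1, 1, 0, 1]]       # one-hot memberships: hard clustering
U_fuzzy = rng.dirichlet(np.ones(2), size=6)  # rows in [0, 1] summing to 1

C_hard = fuzzy_covariance(X, U_hard)
C_fuzzy = fuzzy_covariance(X, U_fuzzy)
```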
$$C = (c_{ab}), \quad c_{ab} = \sum_{i=1}^{n} w_i (x_{ia} - \bar x_a)(x_{ib} - \bar x_b), \quad a, b = 1, \ldots, p$$
$$\bar x_a = \frac{1}{n}\sum_{i=1}^{n} x_{ia}, \quad w_i = \frac{\sum_{k=1}^{K} u_{ik}^{m}}{\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}^{m}}, \quad m \in (1, \infty)$$
Crisp classification of an object $i$ → $w_i$ becomes larger
Uncertain classification of an object $i$ → $w_i$ becomes smaller
Fuzzy clustering: $u_{ik} \in [0, 1]$, $\sum_{k=1}^{K} u_{ik} = 1$, $\sum_{i=1}^{n} w_i = 1$
Hard clustering: $u_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} u_{ik} = 1$, $w_i = 1/n$
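For $m > 1$, $\sum_k u_{ik}^m$ equals 1 for a crisp membership row and is smallest for a uniform row, so crisply classified objects receive larger weights. A minimal numpy sketch of $w_i$ and $c_{ab}$, again with $m = 2$ assumed for illustration (function names are hypothetical):

```python
import numpy as np

def object_weights(U: np.ndarray, m: float = 2.0) -> np.ndarray:
    """w_i = sum_k u_ik^m / sum_i sum_k u_ik^m, so the weights sum to 1."""
    g = (U ** m).sum(axis=1)
    return g / g.sum()

def weighted_covariance(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """c_ab = sum_i w_i (x_ia - xbar_a)(x_ib - xbar_b), with the unweighted mean xbar_a."""
    D = X - X.mean(axis=0)
    return (D * w[:, None]).T @ D

U = np.array([[1.0, 0.0],    # crisp object
              [0.5, 0.5],    # most uncertain object
              [0.8, 0.2]])
w = object_weights(U)                # approximately [0.459, 0.229, 0.312]
w_hard = object_weights(np.eye(3))   # hard clustering: every w_i = 1/n
```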
Empirical Joint Function for Interval‐Valued Data
(Bertrand and Goupil, 2000)
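$$f(y_a, y_b) = \frac{1}{n}\sum_{i=1}^{n}\frac{I_i(y_a, y_b)}{\|Z(i)\|}$$
$$I_i(y_a, y_b) = \begin{cases} 1, & (y_a, y_b) \in Z(i) = [\underline y_{ia}, \overline y_{ia}] \times [\underline y_{ib}, \overline y_{ib}] \\ 0, & \text{otherwise} \end{cases}$$
Uniform distribution for each interval $[\underline y_{ia}, \overline y_{ia}]$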
[Figure: rectangles of objects $i$, $j$, $k$ in the $(y_a, y_b)$ plane, each spanned by $[\underline y_{ia}, \overline y_{ia}]$ and $[\underline y_{ib}, \overline y_{ib}]$]
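A minimal numpy sketch of evaluating the empirical joint function at one point $(y_a, y_b)$, assuming the interval bounds are stored in hypothetical $n \times p$ arrays `lo` and `hi` with non-degenerate intervals. Passing weights $w_i$ that sum to 1 in place of the default $1/n$ gives a weighted variant $\sum_i w_i I_i(y_a, y_b)/\|Z(i)\|$:

```python
import numpy as np

def empirical_joint_function(ya, yb, lo, hi, a, b, w=None):
    """sum_i w_i I_i(ya, yb) / ||Z(i)||, with w_i = 1/n unless weights are given."""
    n = lo.shape[0]
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    # I_i = 1 when (ya, yb) lies in the rectangle Z(i) = [lo_ia, hi_ia] x [lo_ib, hi_ib]
    inside = ((lo[:, a] <= ya) & (ya <= hi[:, a]) &
              (lo[:, b] <= yb) & (yb <= hi[:, b]))
    area = (hi[:, a] - lo[:, a]) * (hi[:, b] - lo[:, b])   # ||Z(i)||, assumed > 0
    return float(np.sum(w * inside / area))

lo = np.array([[0.0, 0.0], [1.0, 1.0]])
hi = np.array([[2.0, 2.0], [3.0, 3.0]])
f_both = empirical_joint_function(1.5, 1.5, lo, hi, 0, 1)   # inside both rectangles
f_one = empirical_joint_function(0.5, 0.5, lo, hi, 0, 1)    # inside the first only
```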
Fuzzy Cluster based Covariance Matrix for Interval‐Valued Data
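$$C = (c_{ab}), \quad c_{ab} = \sum_{i=1}^{n} w_i (x_{ia} - \bar x_a)(x_{ib} - \bar x_b), \quad a, b = 1, \ldots, p, \quad \bar x_a = \frac{1}{n}\sum_{i=1}^{n} x_{ia}, \quad w_i = \frac{\sum_{k=1}^{K} u_{ik}^{m}}{\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}^{m}}, \quad m \in (1, \infty)$$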
What if $x_{ia}$ are interval‐valued data?
$$\hat C = (\hat c_{ab}), \quad \hat c_{ab} = \iint (y_a - \bar y_a)(y_b - \bar y_b)\,\tilde f(y_a, y_b)\,dy_a\,dy_b$$
$\bar y_a = \frac{1}{2n}\sum_{i=1}^{n}(\underline y_{ia} + \overline y_{ia})$: empirical symbolic mean
$\tilde f(y_a, y_b) = \sum_{i=1}^{n} w_i \frac{I_i(y_a, y_b)}{\|Z(i)\|}$: weighted empirical joint function
$$\hat c_{ab} = \frac{1}{4}\sum_{i=1}^{n} w_i(\underline y_{ia} + \overline y_{ia})(\underline y_{ib} + \overline y_{ib}) - \frac{\bar y_b}{2}\sum_{i=1}^{n} w_i(\underline y_{ia} + \overline y_{ia}) - \frac{\bar y_a}{2}\sum_{i=1}^{n} w_i(\underline y_{ib} + \overline y_{ib}) + \bar y_a \bar y_b$$
$$w_i = \frac{\sum_{k=1}^{K} u_{ik}^{m}}{\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}^{m}}, \quad m \in (1, \infty)$$
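For $a \neq b$, the uniform distribution on each rectangle $Z(i)$ makes $y_a$ and $y_b$ independent within $Z(i)$, so $\hat c_{ab} = \sum_i w_i (y^c_{ia} - \bar y_a)(y^c_{ib} - \bar y_b)$ with centers $y^c_{ia} = (\underline y_{ia} + \overline y_{ia})/2$, which expands to the closed form above when $\sum_i w_i = 1$. On the diagonal $a = b$ the second moment of a uniform interval is $(\underline y^2 + \underline y\,\overline y + \overline y^2)/3$ rather than the squared center, so this sketch covers $a \neq b$ only. A minimal numpy sketch with hypothetical names:

```python
import numpy as np

def weighted_symbolic_cov(lo, hi, w, a, b):
    """Closed form of c^_ab (a != b) for weights w_i summing to 1."""
    assert a != b, "the closed form covers off-diagonal entries only"
    s_a, s_b = lo[:, a] + hi[:, a], lo[:, b] + hi[:, b]        # lower + upper bounds
    n = lo.shape[0]
    ybar_a, ybar_b = s_a.sum() / (2 * n), s_b.sum() / (2 * n)  # symbolic means
    return (np.sum(w * s_a * s_b) / 4
            - ybar_b * np.sum(w * s_a) / 2
            - ybar_a * np.sum(w * s_b) / 2
            + ybar_a * ybar_b)

rng = np.random.default_rng(3)
lo = rng.uniform(0, 5, size=(5, 2))
hi = lo + rng.uniform(0.5, 2, size=(5, 2))
w = rng.dirichlet(np.ones(5))                  # weights summing to 1

# Same value via the centers y^c_ia = (lo_ia + hi_ia) / 2
yc = (lo + hi) / 2
ybar = yc.mean(axis=0)
direct = np.sum(w * (yc[:, 0] - ybar[0]) * (yc[:, 1] - ybar[1]))
closed = weighted_symbolic_cov(lo, hi, w, 0, 1)
```

With $w_i = 1/n$ the same function reproduces $\frac{1}{4n}\sum_i(\underline y_{ia} + \overline y_{ia})(\underline y_{ib} + \overline y_{ib}) - \bar y_a \bar y_b$.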
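$l_1 = (l_{11}, l_{12}, \ldots, l_{1p})'$: corresponding eigenvector for the maximum eigenvalue $\lambda_1$ of the covariance matrix
First principal component of object $i$: $l_{11} y_{i1}^{c} + \cdots + l_{1p} y_{ip}^{c}$, $i = 1, \ldots, n$
$y_{ia}^{c} = (\underline y_{ia} + \overline y_{ia})/2$: center of $y_{ia} = [\underline y_{ia}, \overline y_{ia}]$, $Y^{c} = (y_{ia}^{c})$, $i = 1, \ldots, n$, $a = 1, \ldots, p$
$\operatorname{cov}\{\tilde Y\} = \tilde C$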
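Covariance Matrix for Interval‐Valued Data (Billard and Diday, 2000)
$$\tilde C = (\tilde c_{ab}), \quad \tilde c_{ab} = \frac{1}{4n}\sum_{i=1}^{n}(\underline y_{ia} + \overline y_{ia})(\underline y_{ib} + \overline y_{ib}) - \bar y_a \bar y_b$$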
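A minimal numpy sketch of the centers method: replace each interval by its center $y^c_{ia}$, take the eigenvector $l_1$ of the largest eigenvalue of a $p \times p$ covariance matrix, and score each object as $l_{11} y^c_{i1} + \cdots + l_{1p} y^c_{ip}$. Using the covariance of the centers (divisor $n$) as that matrix is an assumption made here for illustration; any symmetric covariance matrix could be passed instead, and the sign of an eigenvector is arbitrary.

```python
import numpy as np

def first_principal_component(C: np.ndarray):
    """Largest eigenvalue lambda_1 and its eigenvector l_1 of a symmetric matrix C."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigenvalues in ascending order
    return eigenvalues[-1], eigenvectors[:, -1]

rng = np.random.default_rng(4)
lo = rng.uniform(0, 10, size=(7, 3))
hi = lo + rng.uniform(1, 3, size=(7, 3))
centers = (lo + hi) / 2                       # y^c_ia
C = np.cov(centers, rowvar=False, bias=True)  # illustrative choice of covariance matrix
lambda_1, l_1 = first_principal_component(C)
scores = centers @ l_1                        # l_11 y^c_i1 + ... + l_1p y^c_ip per object
```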
Principal Component Analysis: Centers Method
$$\hat C = (\hat c_{ab}), \quad \hat c_{ab} = \iint (y_a - \bar y_a)(y_b - \bar y_b)\,\tilde f(y_a, y_b)\,dy_a\,dy_b$$
$\tilde f(y_a, y_b) = \sum_{i=1}^{n} w_i \frac{I_i(y_a, y_b)}{\|Z(i)\|}$: weighted empirical joint function, $w_i = \frac{\sum_{k=1}^{K} u_{ik}^{m}}{\sum_{i=1}^{n}\sum_{k=1}^{K} u_{ik}^{m}}$, $m \in (1, \infty)$
$$\tilde C = (\tilde c_{ab}), \quad \tilde c_{ab} = \iint (y_a - \bar y_a)(y_b - \bar y_b)\,f(y_a, y_b)\,dy_a\,dy_b$$
$f(y_a, y_b) = \frac{1}{n}\sum_{i=1}^{n} \frac{I_i(y_a, y_b)}{\|Z(i)\|}$: empirical joint function
Objects
      1     2     3     4     5     6     7     8     9     10    11
1   1.00  0.70  0.78  0.83  0.38  0.36  0.66  0.78  0.78  0.70  0.77
2   0.67  1.00  0.51  0.81  0.05  0.03  0.37  0.50  0.63  0.43  0.80
3   0.73  0.49  1.00  0.77  0.45  0.50  0.77  0.73  0.75  0.70  0.57
4   0.70  0.70  0.70  1.00  0.22  0.22  0.51  0.62  0.67  0.54  0.67
5   0.44  0.14  0.58  0.42  1.00  0.78  0.64  0.59  0.46  0.62  0.26
6   0.29  0.00  0.49  0.30  0.64  1.00  0.43  0.36  0.30  0.41  0.12
7   0.67  0.39  0.81  0.63  0.63  0.50  1.00  0.82  0.66  0.81  0.51
8   0.80  0.55  0.81  0.78  0.55  0.46  0.85  1.00  0.80  0.81  0.66
9   0.80  0.69  0.84  0.82  0.42  0.40  0.70  0.81  1.00  0.79  0.76
10  0.76  0.53  0.84  0.75  0.62  0.55  0.86  0.86  0.83  1.00  0.64
11  0.75  0.82  0.61  0.80  0.19  0.17  0.50  0.64  0.73  0.56  1.00

1: Oil, 2: Coal, 3: Coal with CCS, 4: Nuclear, 5: Geothermal, 6: Solar PV, 7: Biomass, 8: … large, 9: Mun/Ind Waste, 10: Hydro, 11: Gas
[Figure: horizontal axis "Number of Clusters"; scatter plots of objects 1 to 11 with the first principal component on the horizontal axis and the second principal component on the vertical axis]
Landsat data: 1024 × 1024 pixels, 7 kinds of light, July to October 1993