SLIDE 1

A clustering view on ESS measures of political interest:

An EM-MML approach

Cláudia Silvestre, Margarida Cardoso, Mário Figueiredo

Escola Superior de Comunicação Social, IPL; BRU_UNIDE, ISCTE-IUL; Instituto de Telecomunicações, Instituto Superior Técnico; Portugal

SLIDE 2

Outline

Objective Model

 Finite Mixture Models

Selection Criterion

 Minimum Message Length

Algorithm

EM-MML

Results Conclusions

SLIDE 3

Objective

• Cluster the regions in the European Social Survey (ESS) based on attitudes towards politics, using the following items:
  • Voted last national election (Yes; No; Not eligible)
  • Contacted politician or government official in last 12 months
  • Worked in political party or action group in last 12 months
  • Worked in another organisation or association in last 12 months
  • Worn or displayed campaign badge/sticker in last 12 months
  • Signed petition in last 12 months
  • Taken part in lawful public demonstration in last 12 months
  • Boycotted certain products in last 12 months
  • Feel closer to a particular party than all other parties (Y/N)

SLIDE 4

Model: Finite Mixture Models

• K is the number of segments
• $y_i$ is regarded as "incomplete data", the allocation to segments ($z_i$) being missing
• Complete data: $(y_i, z_i)$

$$f(y_i \mid \theta) = \sum_{k=1}^{K} \alpha_k \, f(y_i \mid \theta_k)$$

SLIDE 5

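Model: Finite Mixture Models

• The complete-data log-likelihood, in which the allocations $z_{ik}$ are unknown ($z_{ik} = 1$ if $y_i$ belongs to segment k, 0 otherwise):

$$\log f(y, z \mid \theta) = \sum_{i} \sum_{k=1}^{K} z_{ik} \log\left[\alpha_k \, f(y_i \mid \theta_k)\right]$$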
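To make the formula concrete, here is a minimal Python sketch. It assumes the allocations are passed as a matrix `z` and that each segment's log-density comes from a caller-supplied function. `complete_data_loglik` and the Bernoulli example are hypothetical illustrations, not the authors' code. If E-step probabilities are passed in place of the 0/1 allocations, the same sum becomes the expected complete-data log-likelihood that EM maximizes.

```python
import math

def complete_data_loglik(z, alpha, component_logpdf, y):
    """sum_i sum_k z_ik * log(alpha_k * f(y_i | theta_k)).

    z[i][k]: allocation of observation i to segment k (0/1, or EM probabilities);
    component_logpdf(k, y_i): returns log f(y_i | theta_k).
    """
    total = 0.0
    for i, y_i in enumerate(y):
        for k, a_k in enumerate(alpha):
            if z[i][k] > 0:  # terms with z_ik = 0 contribute nothing
                total += z[i][k] * (math.log(a_k) + component_logpdf(k, y_i))
    return total

# Usage with two hypothetical Bernoulli segments (success probabilities 0.8 and 0.3)
P_SUCCESS = [0.8, 0.3]

def bernoulli_logpdf(k, y_i):
    return math.log(P_SUCCESS[k] if y_i == 1 else 1 - P_SUCCESS[k])

# observation 0 in segment 1, observation 1 in segment 2:
# log(0.5 * 0.8) + log(0.5 * 0.7)
example = complete_data_loglik([[1, 0], [0, 1]], [0.5, 0.5], bernoulli_logpdf, [1, 0])
```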

SLIDE 6

Selection Criterion

How to select the number of segments? Information criteria such as BIC, AIC, CAIC, AIC3 or ICL can be used. We adopt the Minimum Message Length (MML) criterion, embedded in the model estimation (Figueiredo and Jain, 2002), which:

• provides estimates of all the model parameters, including the number of segments;
• is less sensitive to initialization than standard EM;
• avoids the boundary of the parameter space.
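For comparison, the competing criteria can be computed from a fitted model's maximized log-likelihood. The sketch below uses the common "−2 log L + penalty" forms, where lower is better. The ICL entry uses one common entropy-based approximation (BIC plus twice the classification entropy), which may differ from the exact variant used in this study. `information_criteria` is a hypothetical helper, and the log-likelihood in the usage line is made up. For a mixture of K multinomial segments with M parameters each, we assume the number of free parameters is (K − 1) + K·M.

```python
import math

def information_criteria(loglik, n_params, n_obs, responsibilities=None):
    """Penalized-likelihood criteria in the -2 log L + penalty form (lower is better)."""
    dev = -2.0 * loglik
    crit = {
        "AIC": dev + 2 * n_params,
        "AIC3": dev + 3 * n_params,
        "BIC": dev + n_params * math.log(n_obs),
        "CAIC": dev + n_params * (math.log(n_obs) + 1),
    }
    if responsibilities is not None:
        # entropy-based ICL approximation: BIC + 2 * classification entropy
        entropy = -sum(t * math.log(t) for row in responsibilities for t in row if t > 0)
        crit["ICL"] = crit["BIC"] + 2 * entropy
    return crit

# Hypothetical fit: 240 regions, K = 2 segments with M = 10 parameters each
K, M = 2, 10
crit = information_criteria(loglik=-1234.5, n_params=(K - 1) + K * M, n_obs=240)
```

Each criterion has to be evaluated for every candidate K, which requires a separate EM fit per K. MML instead folds the choice of K into a single estimation run.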

SLIDE 7

Selection Criterion: MML

• Shannon's information theory: optimally transmitting a random variable Y with probability $f(y)$ requires about $-\log_2 f(y)$ bits of information.
• To encode y: $L(y \mid \theta) = -\log_2 f(y \mid \theta)$
• To encode y and θ, the total message length is $L(y, \theta) = L(y \mid \theta) + L(\theta)$
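A minimal sketch of the two-part code length follows. It assumes L(θ) is handed in directly as a number of bits. How the parameters themselves should be encoded is exactly what the MML criterion approximates. Both helpers are hypothetical, as are the two models compared in the usage lines: the second fits the data better, but costs more bits to describe.

```python
import math

def code_length_bits(p):
    """Shannon code length, in bits, of an outcome with probability p."""
    return -math.log2(p)

def message_length_bits(p_data_given_theta, param_bits):
    """Two-part message: L(y, theta) = L(y | theta) + L(theta)."""
    return code_length_bits(p_data_given_theta) + param_bits

# Hypothetical comparison: the "complex" model fits better but is costlier to describe
simple = message_length_bits(2.0 ** -40, param_bits=5.0)     # 40 + 5 = 45 bits
complex_ = message_length_bits(2.0 ** -36, param_bits=12.0)  # 36 + 12 = 48 bits
preferred = "simple" if simple < complex_ else "complex"
```

The shorter total message wins, so extra parameters are accepted only when they save more bits on the data than they cost to transmit.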

SLIDE 8

Algorithm: EM-MML

• EM is a popular algorithm for finding maximum likelihood (ML) parameter estimates when the model includes unobserved (missing) data.
• EM-MML: a mixture of multinomials is adopted, and the MML estimates are obtained via an EM-type algorithm.

SLIDE 9

Algorithm: EM-MML

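Categorical variables: $Y = (Y_1, \ldots, Y_i, \ldots, Y_n)$, with $Y_i = (Y_{i1}, \ldots, Y_{iD})$, where variable d ($d = 1, \ldots, D$) has $C_d$ categories.

$\theta = (\theta_1, \ldots, \theta_K, \alpha_1, \ldots, \alpha_K)$, where the $\alpha_k$ are the clusters' weights or mixing probabilities and the $\theta_k$ are the multinomials' parameters.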
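As a worked illustration, consider the number of parameters M specifying each segment, which enters the MML criterion. A variable with $C_d$ categories contributes $C_d - 1$ free probabilities, because they must sum to 1. The coding of the nine ESS items below is an assumption, since the slides do not state it: "Voted" has 3 categories and the other eight are Yes/No. The helper names are hypothetical.

```python
def params_per_segment(categories):
    """M: free parameters per segment; a variable with C_d categories has C_d - 1
    free probabilities because they must sum to 1."""
    return sum(c - 1 for c in categories)

def free_params(categories, K):
    """All free parameters: K segments of M each, plus K - 1 mixing probabilities."""
    return K * params_per_segment(categories) + (K - 1)

# Assumed coding of the nine items: "Voted" has 3 categories, the other eight are Yes/No
ess_items = [3] + [2] * 8
M = params_per_segment(ess_items)            # 2 + 8 * 1 = 10
p_two_segments = free_params(ess_items, 2)   # 2 * 10 + 1 = 21
```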

SLIDE 10

Algorithm: EM-MML

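$$\log f(y \mid \theta) = \sum_{i} \log f(y_i \mid \theta)$$

Mixture of multinomials:

$$f(y_i \mid \theta) = \sum_{k=1}^{K} \alpha_k \prod_{d=1}^{D} n! \prod_{c=1}^{C_d} \frac{\theta_{kdc}^{\,y_{idc}}}{y_{idc}!}$$

where $y_{idc}$ is the count in category c of variable d for observation i, and $\theta_{kdc}$ is the corresponding category probability in segment k.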
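The log-likelihood above can be evaluated with a short standard-library sketch. It assumes a hypothetical data layout: `y[i][d]` is the list of $C_d$ counts and `theta[k][d]` the list of $C_d$ probabilities. The multinomial's n is taken as the sum of each variable's counts, which equals the n in the formula when every variable's counts sum to n. The per-observation sum over segments is done with log-sum-exp, because the individual densities can underflow.

```python
import math

def log_multinomial(counts, probs):
    """log[n! * prod_c p_c^{y_c} / y_c!] for one variable, with n = sum(counts)."""
    out = math.lgamma(sum(counts) + 1)
    for y_c, p_c in zip(counts, probs):
        out -= math.lgamma(y_c + 1)
        if y_c > 0:
            if p_c == 0.0:
                return -math.inf  # an observed category with zero probability
            out += y_c * math.log(p_c)
    return out

def log_segment_density(y_i, theta_k):
    """log f(y_i | theta_k): sum over the D variables of the multinomial log-pmfs."""
    return sum(log_multinomial(y_id, t_kd) for y_id, t_kd in zip(y_i, theta_k))

def mixture_loglik(y, alpha, theta):
    """log f(y | theta) = sum_i log sum_k alpha_k f(y_i | theta_k), via log-sum-exp."""
    total = 0.0
    for y_i in y:
        terms = [math.log(a) + log_segment_density(y_i, t)
                 for a, t in zip(alpha, theta) if a > 0]
        m = max(terms)
        if m == -math.inf:
            return -math.inf
        total += m + math.log(sum(math.exp(v - m) for v in terms))
    return total
```

For example, a single observation with counts [2, 0] under two equally weighted segments with probabilities [0.5, 0.5] and [0.9, 0.1] has density 0.5 · 0.25 + 0.5 · 0.81 = 0.53.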

SLIDE 11

Algorithm: EM-MML

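Assuming that:

• the segments' parameters have independent priors,
• … which are also independent of the mixing probabilities,
• a non-informative Jeffreys prior is used for θ,

the message length becomes

$$L(y, \theta) = L(y \mid \theta) + L(\theta) \approx \frac{M}{2} \sum_{k:\,\alpha_k > 0} \log\frac{n\,\alpha_k}{12} + \frac{k_{nz}}{2} \log\frac{n}{12} + \frac{k_{nz}(M+1)}{2} - \log f(y \mid \theta)$$

where M is the number of parameters specifying each segment and $k_{nz}$ is the number of segments with non-zero probability.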
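The criterion is cheap to evaluate once the log-likelihood is known. The minimal sketch below uses natural logarithms, so the length is in nats rather than bits. `message_length` is a hypothetical helper, and the numbers in the usage line are made up. Segments with zero weight are skipped, exactly as in the sum over $k: \alpha_k > 0$.

```python
import math

def message_length(loglik, alpha, n, M):
    """Approximate MML message length (natural logs, so in nats).

    loglik: log f(y | theta); alpha: mixing probabilities (zeros allowed);
    n: number of observations; M: number of parameters per segment.
    """
    nonzero = [a for a in alpha if a > 0]
    k_nz = len(nonzero)  # segments with non-zero probability
    return ((M / 2) * sum(math.log(n * a / 12) for a in nonzero)
            + (k_nz / 2) * math.log(n / 12)
            + k_nz * (M + 1) / 2
            - loglik)

# Hypothetical solution on the same data: a smaller message length is preferred
two_segments = message_length(loglik=-500.0, alpha=[0.5, 0.5, 0.0], n=120, M=10)
```

The first three terms penalize model complexity, and the last term rewards fit. This trade-off lets the criterion select the number of segments.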

SLIDE 12

Algorithm: EM-MML

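E-step:

$$E\left[z_{ik} \mid y_i; \theta^{(t)}\right] = P\left(z_{ik} = 1 \mid y_i; \theta^{(t)}\right) = \frac{\alpha_k^{(t)} f\left(y_i; \theta_k^{(t)}\right)}{\sum_{j=1}^{K} \alpha_j^{(t)} f\left(y_i; \theta_j^{(t)}\right)}$$

where

$$f\left(y_i; \theta_k^{(t)}\right) = \prod_{d=1}^{D} n! \prod_{c=1}^{C_d} \frac{\left(\theta_{kdc}^{(t)}\right)^{y_{idc}}}{y_{idc}!}$$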
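To keep the following sketch short, it takes the log segment densities $\log f(y_i; \theta_k^{(t)})$ as a precomputed matrix, evaluated with the multinomial formula above. It then applies the E-step ratio in log space. The function name `e_step` is hypothetical. The max-shift (log-sum-exp) step is a standard numerical safeguard rather than part of the slide's formula.

```python
import math

def e_step(log_dens, alpha):
    """Posterior segment probabilities w[i][k] = P(z_ik = 1 | y_i; theta^(t)).

    log_dens[i][k] holds log f(y_i; theta_k^(t)), computed beforehand. Subtracting each
    row's maximum before exponentiating avoids underflow: multinomial likelihoods over
    many respondents can be far smaller than the smallest float.
    """
    w = []
    for row in log_dens:
        logs = [math.log(a) + ld if a > 0 else -math.inf for a, ld in zip(alpha, row)]
        m = max(logs)
        exps = [math.exp(v - m) for v in logs]
        total = sum(exps)
        w.append([e / total for e in exps])
    return w

# exp(-1000) underflows to 0.0, so the naive ratio would be 0/0; the shifted version works
w = e_step([[-1000.0, -1000.0], [-1000.0, -1001.0]], [0.5, 0.5])
```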

SLIDE 13

Algorithm: EM-MML

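M-step:

• Update the estimates of the mixing probabilities:

$$\alpha_k^{(t+1)} = \frac{\max\left\{0,\ \sum_{i} P\left(z_{ik} = 1 \mid y_i; \theta^{(t)}\right) - \frac{M}{2}\right\}}{\sum_{j=1}^{K} \max\left\{0,\ \sum_{i} P\left(z_{ij} = 1 \mid y_i; \theta^{(t)}\right) - \frac{M}{2}\right\}}$$

• Update the estimates of the multinomial parameters:

$$\theta_{kdc}^{(t+1)} = \frac{\sum_{i} P\left(z_{ik} = 1 \mid y_i; \theta^{(t)}\right) y_{idc}}{n \sum_{i} P\left(z_{ik} = 1 \mid y_i; \theta^{(t)}\right)}$$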
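The two updates can be sketched as follows, using the same hypothetical layout as before: `y[i][d]` holds category counts and `w[i][k]` the E-step probabilities. The θ update divides by the total weighted count of each variable. This equals $n \sum_i P(z_{ik} = 1 \mid \cdot)$ when every variable's counts sum to n, and under that assumption the result is a proper probability vector. A segment whose clipped support is zero gets $\alpha_k = 0$ and is marked as removed with `None`.

```python
def m_step(y, w, M):
    """One MML M-step for a mixture of multinomials.

    y[i][d]: list of C_d counts; w[i][k]: P(z_ik = 1 | y_i; theta^(t)); M: parameters
    per segment. Returns (alpha, theta); annihilated segments get alpha_k = 0 and
    theta_k = None. Assumes at least one segment keeps positive support.
    """
    K = len(w[0])
    support = [sum(row[k] for row in w) for k in range(K)]  # sum_i P(z_ik = 1 | ...)
    clipped = [max(0.0, s - M / 2.0) for s in support]     # MML penalty of M/2
    total = sum(clipped)
    alpha = [c / total for c in clipped]
    theta = []
    for k in range(K):
        if alpha[k] == 0.0:
            theta.append(None)
            continue
        theta_k = []
        for d in range(len(y[0])):
            # weighted counts per category; their sum equals n * sum_i w_ik here
            weighted = [sum(w[i][k] * y[i][d][c] for i in range(len(y)))
                        for c in range(len(y[0][d]))]
            denom = sum(weighted)
            theta_k.append([v / denom for v in weighted])
        theta.append(theta_k)
    return alpha, theta

# Three hypothetical regions, one Yes/No item, 4 respondents each, hard allocations
y = [[[3, 1]], [[1, 3]], [[2, 2]]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
alpha, theta = m_step(y, w, M=1)      # supports 2 and 1, minus M/2 = 0.5
alpha3, theta3 = m_step(y, w, M=3)    # penalty 1.5 exceeds segment 2's support of 1
```

The `max{0, ·}` clipping is what lets EM-MML drop segments during estimation rather than comparing separately fitted models.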

SLIDE 14

Algorithm: EM-MML

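Component-wise flow, for each segment k: compute $P(z_{ik} = 1 \mid Y_i, \theta^{(t)})$ → compute $\alpha_k$ → if $\alpha_k = 0$, the segment is removed ($K := K - 1$); if $\alpha_k > 0$, compute $\theta_k$.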
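Putting the pieces together, the sketch below follows our reading of the component-wise flow above and of Figueiredo and Jain (2002). Segments are visited one at a time, with the E-step refreshed before each one. A segment whose $\alpha_k$ hits zero is deleted, and otherwise its θ is updated. The outer loop comes from Figueiredo and Jain's algorithm rather than from these slides. It forces out the least probable segment after each fit and keeps the solution with the shortest message length. The function names, the smoothed random initialization and the `eps` floor are our own assumptions. The code also assumes that the number of observations exceeds M/2.

```python
import math
import random

def log_mult(counts, probs):
    """log[n! * prod_c p_c^{y_c} / y_c!] for one variable, n = sum(counts), probs > 0."""
    out = math.lgamma(sum(counts) + 1)
    for y_c, p_c in zip(counts, probs):
        out += y_c * math.log(p_c) - math.lgamma(y_c + 1)
    return out

def log_table(y, theta):
    """L[i][k] = log f(y_i | theta_k), a product of multinomials over the D variables."""
    return [[sum(log_mult(y_id, t_kd) for y_id, t_kd in zip(y_i, t_k)) for t_k in theta]
            for y_i in y]

def weights_and_loglik(L, alpha):
    """Return W[i][k] = P(z_ik = 1 | y_i, theta) and log f(y | theta), via log-sum-exp."""
    W, ll = [], 0.0
    for row in L:
        logs = [math.log(a) + ld for a, ld in zip(alpha, row)]
        m = max(logs)
        e = [math.exp(v - m) for v in logs]
        s = sum(e)
        W.append([v / s for v in e])
        ll += m + math.log(s)
    return W, ll

def msg_len(ll, alpha, n, M):
    """MML message length in nats; removed segments are deleted, so every alpha_k > 0."""
    k_nz = len(alpha)
    return ((M / 2) * sum(math.log(n * a / 12) for a in alpha)
            + (k_nz / 2) * math.log(n / 12) + k_nz * (M + 1) / 2 - ll)

def fit_from(y, alpha, theta, M, tol, max_iter, eps=1e-12):
    """Component-wise EM-MML sweeps from a given start; alpha and theta are updated in place."""
    n, prev, cur = len(y), math.inf, math.inf
    for _ in range(max_iter):
        k = 0
        while k < len(alpha):
            W, _ = weights_and_loglik(log_table(y, theta), alpha)
            clipped = [max(0.0, sum(r[j] for r in W) - M / 2) for j in range(len(alpha))]
            if clipped[k] == 0.0:            # alpha_k = 0: remove segment k, K := K - 1
                del alpha[k], theta[k]
            else:                            # alpha_k > 0: update alpha_k, then theta_k
                alpha[k] = clipped[k] / sum(clipped)
                theta[k] = []
                for d in range(len(y[0])):
                    # weighted category counts; eps is a numerical floor we added
                    wc = [sum(W[i][k] * y[i][d][c] for i in range(n)) + eps
                          for c in range(len(y[0][d]))]
                    tot = sum(wc)
                    theta[k].append([v / tot for v in wc])
                k += 1
            s = sum(alpha)
            alpha[:] = [a / s for a in alpha]
        _, ll = weights_and_loglik(log_table(y, theta), alpha)
        cur = msg_len(ll, alpha, n, M)
        if abs(prev - cur) <= tol * abs(cur):
            break
        prev = cur
    return alpha, theta, cur

def em_mml(y, k_max, M, k_min=1, tol=1e-7, max_iter=300, seed=0):
    """Fit from k_max segments, then repeatedly drop the least probable segment and refit,
    returning (length, alpha, theta) with the shortest message length."""
    rng = random.Random(seed)
    # each segment starts from a random observation's proportions (+1 smoothing)
    theta = [[[(c + 1) / (sum(y_d) + len(y_d)) for c in y_d] for y_d in rng.choice(y)]
             for _ in range(k_max)]
    alpha = [1.0 / k_max] * k_max
    best = None
    while True:
        alpha, theta, length = fit_from(y, alpha, theta, M, tol, max_iter)
        if best is None or length < best[0]:
            best = (length, list(alpha), list(theta))
        if len(alpha) <= k_min:
            return best
        j = alpha.index(min(alpha))          # force out the least probable segment
        del alpha[j], theta[j]
        s = sum(alpha)
        alpha = [a / s for a in alpha]

# Hypothetical data: 30 regions, one Yes/No item, 20 respondents each, two clear groups
y = [[[18, 2]]] * 15 + [[[2, 18]]] * 15
length, alpha, theta = em_mml(y, k_max=5, M=1)
```

On these toy data, the two-segment solution has a shorter message length than any fit that splits a group across several segments. Such a split leaves the likelihood unchanged while adding penalty terms.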

SLIDE 15

Results

• Clustering the ESS regions based on attitudes towards politics with EM-MML yields 2 clusters.

SLIDE 16

Results:

Columns: BIC; CAIC; ICL | AIC; AIC3 | EM-MML

2012
Number of clusters: 7 | 7 | 2
Silhouette index (cohesion-separation): 0.213 | 0.191 | 0.361
Calinski-Harabasz index (cohesion-separation): 83.327 | 74.977 | 190.825
Computation time (seconds): 109 | 109 | 2

2014
Number of clusters: 7 | 8 | 2
Silhouette index (cohesion-separation): 0.152 | 0.164 | 0.367
Calinski-Harabasz index (cohesion-separation): 80.766 | 78.477 | 189.552
Computation time (seconds): 91 | 91 | 2

2012 vs 2014 (stability)
Adjusted Rand index: 0.377 | 0.499 | 0.707
Normalized mutual information: 0.523 | 0.591 | 0.598
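The stability rows compare the 2012 and 2014 partitions. Assuming each region carries one cluster label per round, the Adjusted Rand index can be computed from the contingency table of the two labelings. Below is a minimal standard-library sketch of the Hubert-Arabie formula. The study does not describe its own implementation, so `adjusted_rand_index` is a hypothetical helper.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items (Hubert-Arabie form)."""
    n = len(labels_a)
    pairs = comb(n, 2)
    # numbers of item pairs placed together in both partitions, in A, and in B
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings put everything together
        return 1.0
    return (together_both - expected) / (max_index - expected)

# Relabeling clusters does not matter; only the grouping of items does
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

A value of 1 means identical groupings, while values near 0 mean the agreement is no better than chance.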

SLIDE 17

Results:

Round 6 vs round 7: number of regions per cluster

ESS6: cluster 1, 114 regions; cluster 2, 126 regions
ESS7: cluster 1, 93 regions; cluster 2, 147 regions

SLIDE 18

Share of "Yes" answers, ESS6 cluster 1 vs. ESS6 cluster 2:

• Contacted politician or government official last 12 months: 10% vs. 16%
• Worked in political party or action group last 12 months: 2% vs. 5%
• Worked in another organisation or association last 12 months: 6% vs. 26%
• Worn or displayed campaign badge/sticker last 12 months: 4% vs. 12%
• Signed petition last 12 months: 13% vs. 32%
• Taken part in lawful public demonstration last 12 months: 6% vs. 9%
• Boycotted certain products last 12 months: 9% vs. 28%
• Feel closer to a particular party than all other parties: 37% vs. 57%
• Voted last national election: 65% vs. 76%
• Not eligible to vote: 8% vs. 9%

Regions in cluster 2 share a more active role in politics.

SLIDE 19

Share of "Yes" answers, ESS7 cluster 1 vs. ESS7 cluster 2:

• Contacted politician or government official last 12 months: 13% vs. 18%
• Worked in political party or action group last 12 months: 3% vs. 5%
• Worked in another organisation or association last 12 months: 6% vs. 25%
• Worn or displayed campaign badge/sticker last 12 months: 5% vs. 12%
• Signed petition last 12 months: 12% vs. 34%
• Taken part in lawful public demonstration last 12 months: 5% vs. 9%
• Boycotted certain products last 12 months: 7% vs. 28%
• Feel closer to a particular party than all other parties: 41% vs. 58%
• Voted last national election: 65% vs. 73%
• Not eligible to vote: 8% vs. 10%

Regions in cluster 1 share a more passive role in politics.

SLIDE 20

Results

How interested in politics (4 categories, from "very interested" to "not at all interested"):

• ESS6 cluster 1: 7%, 30%, 37%, 26%
• ESS6 cluster 2: 14%, 41%, 32%, 12%
• ESS7 cluster 1: 8%, 29%, 36%, 27%
• ESS7 cluster 2: 15%, 42%, 30%, 12%

Regions in cluster 2 are clearly more interested in politics (as expected…)

SLIDE 21

Results

Trust in politicians (11-point scale, from "Not at all" to "Completely"):

• ESS6 cluster 1: 21%, 11%, 15%, 13%, 11%, 13%, 7%, 5%, 2%, 1%, 0%
• ESS6 cluster 2: 10%, 5%, 9%, 13%, 13%, 19%, 14%, 11%, 4%, 1%, 0%

Most respondents in cluster 1 do not trust politicians…

SLIDE 22

Results

Trust in political parties (11-point scale, from "Not at all" to "Completely"):

• ESS6 cluster 1: 20%, 12%, 15%, 14%, 11%, 14%, 7%, 4%, 2%, 1%, 0%
• ESS6 cluster 2: 10%, 5%, 9%, 13%, 13%, 20%, 14%, 11%, 4%, 1%, 0%

…or political parties.

SLIDE 23

View of other people (11-point scale, from "Mostly looking out for themselves" to "Most of the time people helpful"):

• ESS6 cluster 1: 5%, 4%, 9%, 12%, 11%, 21%, 12%, 13%, 9%, 3%, 2%
• ESS6 cluster 2: 2%, 2%, 5%, 10%, 11%, 21%, 15%, 18%, 12%, 3%, 2%

Regions in cluster 2 share a more positive view of other people; the ESS6 and ESS7 results are very similar.

SLIDE 24

Share of each cluster's regions by country, ESS6 (cluster 2 | cluster 1):

• Belgium: 5% | 3.9%
• Switzerland: 6% | 1.8%
• Czech Republic: 0% | 9.2%
• Germany: 16% | 0.0%
• Denmark: 9% | 0.0%
• Estonia: 0% | 10.9%
• Spain: 10% | 0.5%
• Finland: 12% | 0.0%
• France: 8% | 1.9%
• United Kingdom: 4% | 6.8%
• Hungary: 0% | 9.2%
• Ireland: 3% | 9.4%
• Israel: 0% | 11.4%
• Lithuania: 0% | 9.6%
• Netherlands: 9% | 1.2%
• Norway: 9% | 0.0%
• Poland: 0% | 8.7%
• Portugal: 0% | 9.8%
• Sweden: 10% | 0.0%
• Slovenia: 0% | 5.7%

All regions in Sweden, Norway, Finland, Denmark and Germany belong to cluster 2.

SLIDE 25

Share of each cluster's regions by country, ESS7 (cluster 2 | cluster 1):

• Belgium: 6% | 2.3%
• Switzerland: 7% | 0.0%
• Czech Republic: 1% | 12.2%
• Germany: 14% | 0.0%
• Denmark: 7% | 0.0%
• Estonia: 0% | 12.4%
• Spain: 8% | 0.4%
• Finland: 10% | 0.0%
• France: 9% | 0.0%
• United Kingdom: 10% | 0.4%
• Hungary: 0% | 10.3%
• Ireland: 4% | 9.5%
• Israel: 0% | 15.5%
• Lithuania: 0% | 13.6%
• Netherlands: 8% | 0.4%
• Norway: 7% | 0.0%
• Poland: 0% | 9.8%
• Portugal: 1% | 5.8%
• Sweden: 8% | 0.0%
• Slovenia: 0% | 7.4%

25 regions change to cluster 2, e.g. Lisbon (Portugal) and Jihočeský kraj (Czech Republic). All regions in Sweden, Norway, Finland, Denmark and Germany belong to cluster 2.
4 regions change to cluster 1: Prov. West-Vlaanderen (Belgium), Principado de Asturias and La Rioja (Spain), and Drenthe (Netherlands).

SLIDE 26

Conclusions

• A new EM variant, EM-MML, was used to cluster categorical aggregated data and to estimate the number of clusters simultaneously.
• It estimates the parameters of a finite mixture of multinomials using a Minimum Message Length criterion.
• EM-MML performs better than traditional EM-ML combined with BIC, CAIC, AIC, AIC3 and ICL. Its solutions are more parsimonious and robust (2 clusters instead of 7 or 8), with better cohesion-separation and stability and a much shorter computation time.
• A brief profiling of the segments describes the main changes that occurred between rounds 6 and 7.

SLIDE 27

References

Biernacki, C., Celeux, G. and Govaert, G., 2000. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22: 719-725.

Figueiredo, M. A. T. and Jain, A. K., 2002. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24: 381-396.

Fonseca, J. R. and Cardoso, M. G., 2007. Mixture-model cluster analysis using information theoretical criteria. Intelligent Data Analysis, 11(2): 155-173.

Silvestre, C., Cardoso, M. and Figueiredo, M., 2008. Clustering with finite mixture models and categorical variables. Contributed Papers to the International Conference on Computational Statistics, Porto, Portugal, pp. 109-116.

Silvestre, C., Cardoso, M. and Figueiredo, M., 2012. Categorical data clustering using a minimum message length criterion. IDA 2012, The Eleventh International Symposium on Intelligent Data Analysis, Helsinki, Finland, 25-27 October 2012.

Silvestre, C., Cardoso, M. and Figueiredo, M., 2013. Determining the number of groups while clustering categorical data. IFCS 2013, The International Federation of Classification Societies, Tilburg, the Netherlands, 14-17 July (Book of Abstracts, p. 158).