Discovering Correlation, Jilles Vreeken, 5 June 2015 - PowerPoint PPT Presentation



SLIDE 1

Discovering Correlation

Jilles Vreeken

5 June 2015

SLIDE 2

Questions of the day

What is correlation, how can we measure it, and how can we discover it?

SLIDE 3

Correlation

'the relationship between things that happen or change together'

(Merriam-Webster)

SLIDE 4

ρ = 0.947

SLIDE 5

Correlation

'a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone'

(Merriam-Webster)

SLIDE 6

Correlation

'a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone'

(Merriam-Webster)

SLIDE 7

Good Ol' Pearson

Pearson product-moment correlation coefficient

 one of the most well-known measures for correlation

ρ_{X,Y} = corr(X, Y) = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y)

That is, covariance divided by the standard deviations. Pearson detects only linear correlations.

SLIDE 8

Pearson in action

(Wikipedia, yes really)

SLIDE 9

ρ = 0.998

SLIDE 10

Chance alone…

Last week, we discussed Shannon entropy and mutual information. Can we use these to measure correlation? Yes, we can! Shannon entropy works very well for discrete data, e.g. low-entropy sets. For continuous-valued data: …

SLIDE 11

Shannon entropy for continuous

As discussed last week, to compute

h(X) = −∫_X f(x) log f(x) dx

We need to estimate the probability density function, choose a step-size, and then hope for the best. If we don’t know the distribution, we can use kernel density estimation – which requires choosing a kernel and a bandwidth. KDE is well-behaved for univariate, but estimating multivariate densities is very difficult, especially for high dimensionalities.
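To make "choose a step-size and hope for the best" concrete, here is a hedged sketch of the univariate recipe: fit a Gaussian KDE with SciPy, evaluate the density on a grid, and approximate the integral with a Riemann sum. The helper name, grid width, and step-size choice are illustrative assumptions, not the lecture's prescription.

```python
import numpy as np
from scipy.stats import gaussian_kde

def differential_entropy_kde(samples, grid_size=1000):
    """Estimate h(X) = -∫ f(x) log f(x) dx via KDE plus a Riemann sum."""
    kde = gaussian_kde(samples)                  # bandwidth: Scott's rule
    lo = samples.min() - 3 * samples.std()
    hi = samples.max() + 3 * samples.std()
    xs = np.linspace(lo, hi, grid_size)          # the "step-size" choice
    f = kde(xs)
    return float(-(f * np.log(f)).sum() * (xs[1] - xs[0]))

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
print(differential_entropy_kde(z))  # N(0,1): true value 0.5*log(2*pi*e) ≈ 1.42
```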

SLIDE 12

MIC: Maximal Information Coefficient

A few years back, there was a big stir about MIC, a measure for non-linear correlations between pairs of variables.

The main idea in a nutshell:

If we want to measure the correlation of real-valued X and Y, why not discretize the data and compute mutual information!? That is, just find those discretized versions X_G and Y_G such that I(X_G; Y_G) is maximal, and treat that value as the correlation measure.

(Reshef et al, 2011)

SLIDE 13

MIC in a pic

Given D ⊂ ℝ² and integers x and y,

I*(D, x, y) = max I(D|G)

with G ranging over all grids of x columns and y rows. Normalise this score by independence:

M(D)_{x,y} = I*(D, x, y) / log min(x, y)

And return the maximum:

MIC(D) = max_{x·y < B(n)} M(D)_{x,y}

SLIDE 14

Mining with MIC

MIC is strictly defined for pairs of variables, which means… 'Mining' is easy! We measure MIC for every pair of attributes in our data, and then order the pairs by their MIC score.

SLIDE 15

BAD MIC BAD

MIC is a nice idea, but… it is strictly for pairs, uses heuristic optimization, doesn't like linear, and doesn't like noise at all. And those are just a few of its drawbacks… Can we salvage the nice part?

(Simon and Tibshirani, 2011)

SLIDE 16

Cumulative Distributions

𝐺(𝑦) = 𝑄(π‘Œ ≀ 𝑦) cdf df can be computed directly from data no no assumptions necessary

SLIDE 17

Identifying Interacting Subspaces

SLIDE 18

Entropy has been defined for cumulative distribution functions!

h_CE(X) = −∫_{dom(X)} P(X ≤ x) log P(X ≤ x) dx

As 0 ≤ P(X ≤ x) ≤ 1, we obtain h_CE(X) ≥ 0 (!)

Cumulative Entropy

(Rao et al, 2004, 2005)

SLIDE 19

How do we compute h_CE(X) in practice? Easy. Let X_1 ≤ ⋯ ≤ X_n be i.i.d. random samples of continuous random variable X. Then

h_CE(X) = −Σ_{i=1}^{n−1} (X_{i+1} − X_i) (i/n) log(i/n)

Cumulative Entropy

(Rao et al, 2004, 2005, Crescenzo & Longobardi 2009)
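The estimator translates almost one-to-one into code. A minimal sketch, with names of my choosing; as a sanity check, the cumulative entropy of Uniform(0, 1) is −∫ x log x dx = 1/4.

```python
import numpy as np

def cumulative_entropy(x):
    """h_CE(X) = -Σ_{i=1}^{n-1} (X_{i+1} - X_i) (i/n) log(i/n),
    computed from the sorted sample."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    gaps = np.diff(x)                  # X_{i+1} - X_i
    p = np.arange(1, n) / n            # i/n for i = 1 .. n-1
    return float(-(gaps * p * np.log(p)).sum())

rng = np.random.default_rng(0)
print(cumulative_entropy(rng.uniform(size=10000)))  # ≈ 0.25 for Uniform(0, 1)
```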

SLIDE 20

First things first. We need

h_CE(X | Y) = ∫ h_CE(X | y) p(y) dy

which, in practice, means

h_CE(X | Y) = Σ_{y∈Y} h_CE(X | y) p(y)

with y a discrete bin of data points over Y, and p(y) = |y| / n.

How do we bin Y into y? We can simply cluster Y.

Multivariate Cumulative Entropy (1)

(Nguyen et al, 2013)
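A hedged sketch of the practical version, substituting k equal-frequency bins over Y for the clustering step (that substitution is my simplification); it reuses `cumulative_entropy` from the sketch above.

```python
import numpy as np

# assumes cumulative_entropy() from the earlier sketch

def conditional_ce(x, y, k=5):
    """h_CE(X | Y) ≈ Σ_y h_CE(X | y) p(y), with Y split into k
    equal-frequency bins and p(y) = |y| / n."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    edges = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])
    labels = np.searchsorted(edges, y)           # bin index per point
    total = 0.0
    for b in range(k):
        xb = x[labels == b]
        if len(xb) > 1:
            total += cumulative_entropy(xb) * len(xb) / n
    return total
```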

SLIDE 21

First things first. We need

h_CE(X | Y) = ∫ h_CE(X | y) p(y) dy

which, in practice, means

h_CE(X | Y) = Σ_{y∈Y} h_CE(X | y) p(y)

with y a discrete bin of data points over Y, and p(y) = |y| / n.

How do we bin Y into y? Find the discretisation of Y such that h_CE(X | Y) is minimal.

Multivariate Cumulative Entropy (2)

(Nguyen et al, 2014)

SLIDE 22

We cannot (realistically) calculate h_CE(X_1, …, X_d) in one go. Yet… entropy has a factorization property, so what we can do is

Σ_{i=2}^d h_CE(X_i) − Σ_{i=2}^d h_CE(X_i | X_1, …, X_{i−1})

Cumulative Mutual Information

(Nguyen et al, 2013)
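Chaining the two pieces gives a CMI-style score for a whole attribute set. The product-of-equal-frequency-bins conditioning below is my stand-in for the paper's optimized discretization; `cumulative_entropy` is again reused from the earlier sketch.

```python
import numpy as np

# assumes cumulative_entropy() from the earlier sketch

def cmi_score(D, k=3):
    """Σ_{i=2..d} h_CE(X_i) - Σ_{i=2..d} h_CE(X_i | X_1..X_{i-1}),
    conditioning on the product of k equal-frequency bins per attribute."""
    n, d = D.shape
    score = 0.0
    for i in range(1, d):
        label = np.zeros(n, dtype=int)          # joint bin id over X_1..X_i
        for j in range(i):
            edges = np.quantile(D[:, j], np.linspace(0, 1, k + 1)[1:-1])
            label = label * k + np.searchsorted(edges, D[:, j])
        h_cond = 0.0
        for b in np.unique(label):
            xb = D[label == b, i]
            if len(xb) > 1:
                h_cond += cumulative_entropy(xb) * len(xb) / n
        score += cumulative_entropy(D[:, i]) - h_cond
    return score
```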

SLIDE 23

Mining for Interaction

super simple: a priori-style

SLIDE 24

CMI: use the apriori principle, mine all attribute sets with h_CE ≤ τ

Mining interacting attributes

(Nguyen et al, 2013ab)
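A sketch of what that levelwise search looks like, with a hypothetical `score(D, attrs)` callback standing in for h_CE over the attribute set. Note the pruning step silently assumes the score is (anti-)monotonic, which a later slide will question.

```python
from itertools import combinations

def mine_levelwise(D, score, tau):
    """Apriori-style levelwise mining: keep attribute sets whose score
    is at most tau; extend only candidates all of whose subsets survived."""
    d = D.shape[1]
    current = [frozenset([i]) for i in range(d)]
    result = []
    while current:
        survivors = [s for s in current if score(D, s) <= tau]
        result.extend(survivors)
        alive = set(survivors)
        size = len(current[0]) + 1
        # candidate generation: unions of two survivors, one attribute larger
        candidates = {a | b for a in alive for b in alive if len(a | b) == size}
        # apriori pruning: every (size-1)-subset must itself have survived
        current = [c for c in candidates
                   if all(frozenset(s) in alive
                          for s in combinations(c, size - 1))]
    return result
```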

SLIDE 25

Measuring Multivariate Correlations

MIC is exclusively defined for pairs

 score and approach do not scale up to higher dimensions

Enter MAC

 Multivariate Maximal Correlation Analysis

(Nguyen et al, 2014)

SLIDE 26

Maximal Correlation Analysis

The maximal correlation of a set of real-valued random variables {X_i}_{i=1}^d is defined as

ρ*(X_1, …, X_d) = max_{f_1, …, f_d} ρ(f_1(X_1), …, f_d(X_d))

where ρ is a correlation measure, each f_i : dom(X_i) → A_i is drawn from a pre-specified class of functions, and A_i ⊆ ℝ.
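For intuition, a brute-force pairwise sketch: maximize the correlation over a tiny hand-picked function class. The class itself and the use of absolute Pearson correlation as the inner measure ρ are my illustrative choices; `pearson` is reused from the Pearson sketch above.

```python
import numpy as np
from scipy.stats import rankdata

# assumes pearson() from the earlier Pearson sketch

TRANSFORMS = [lambda v: v, np.square, np.abs,
              lambda v: rankdata(v),
              lambda v: np.log(v - v.min() + 1.0)]  # a tiny, hand-picked class

def maximal_correlation(x, y):
    """rho*(X, Y): best |pearson(f(X), g(Y))| over the function class."""
    best = 0.0
    for f in TRANSFORMS:
        for g in TRANSFORMS:
            fx, gy = f(x), g(y)
            if fx.std() > 0 and gy.std() > 0:    # skip degenerate transforms
                best = max(best, abs(pearson(fx, gy)))
    return best

# e.g. y = x**2 on symmetric x: raw Pearson is near 0, but f = np.square
# or np.abs on x lifts the maximal correlation close to 1.
```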
SLIDE 27

Total Correlation

Finds the chain of pairwise grids that minimizes the entropy, that maximizes correlation. The total correlation of a dataset D is

I(D) = Σ_{i=1}^d H(X_i) − H(X_1, …, X_d)

(Nguyen et al, 2014)

SLIDE 28

Maximal Discretized Correlation

Let's say our data is real-valued, but that we have a discretization grid G = {g_1, …, g_d}; then we have

I(D_G) = Σ_{i=1}^d H(X_i^{g_i}) − H(X_1^{g_1}, …, X_d^{g_d})

To find the maximal correlation, we hence need to find that grid G for D such that I(D_G) is maximized.

(Nguyen et al, 2014)

SLIDE 29

Normalizing the Score

However, I(D_G) strongly depends on the number of bins n_i for attribute i. So, we should normalize by an upper bound:

I(D_G) ≤ Σ_{i=1}^d log n_i − max_i {log n_i}

(Nguyen et al, 2014)

SLIDE 30

Normalizing the Score

However, I(D_G) depends on the number of bins n_i for attribute i. So, we should normalize. We know

I(D_G) ≤ Σ_{i=1}^d log n_i − max_i {log n_i}

by which we define

I_n(D_G) = I(D_G) / (Σ_{i=1}^d log n_i − max_i {log n_i})

as the normalized total correlation.

(Nguyen et al, 2014)
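A small sketch of the score: given per-attribute bin labels (one grid dimension each), compute the total correlation from Shannon entropies and divide by the upper bound. Helper names are mine.

```python
import numpy as np

def entropy_discrete(labels):
    """Shannon entropy (in nats) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def normalized_total_correlation(bins):
    """I_n(D_G): total correlation of the bin-label arrays in `bins`,
    divided by its upper bound Σ_i log n_i - max_i log n_i."""
    joint = np.zeros(len(bins[0]), dtype=int)
    for b in bins:
        joint = joint * (int(b.max()) + 1) + b   # unique joint grid-cell id
    tc = sum(entropy_discrete(b) for b in bins) - entropy_discrete(joint)
    logs = [np.log(len(np.unique(b))) for b in bins]
    denom = sum(logs) - max(logs)
    return tc / denom if denom > 0 else 0.0
```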

SLIDE 31

MAC

After all that, we can now finally introduce MAC:

MAC(D) = max_{G = {g_1, …, g_d}, ∀i≠k: n_i · n_k < N^{1−ϵ}} I_n(D_G)

How do we compute MAC? How do we choose G? Through cumulative entropy!

(Nguyen et al, 2014)
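And a correspondingly crude end-to-end sketch: instead of choosing G via cumulative entropy as the paper does, it exhaustively tries equal-frequency bin counts per attribute under the n_i · n_k < N^{1−ϵ} constraint, reusing `normalized_total_correlation` from the previous sketch. Exponential in the number of attributes, so toy data only.

```python
import numpy as np
from itertools import product

# assumes normalized_total_correlation() from the previous sketch

def equal_freq_bins(v, k):
    """Labels 0..k-1 from k (roughly) equal-frequency bins."""
    edges = np.quantile(v, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, v)

def mac_approx(D, eps=0.4, max_k=8):
    """Brute-force MAC stand-in over equal-frequency grids."""
    N, d = D.shape
    limit = N ** (1 - eps)
    best = 0.0
    for ks in product(range(2, max_k + 1), repeat=d):
        # the MAC grid constraint: n_i * n_k < N**(1 - eps) for all pairs
        if any(ks[i] * ks[j] >= limit
               for i in range(d) for j in range(i + 1, d)):
            continue
        bins = [equal_freq_bins(D[:, j], k) for j, k in enumerate(ks)]
        best = max(best, normalized_total_correlation(bins))
    return best
```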

SLIDE 32

GOOD MAC GOOD

[scatterplot panels: linear, circle]

SLIDE 33

NICE MAC NICE

[scatterplot panels: 20% noise, 80% noise]

SLIDE 34

Mining with MAC

super simple: a priori-style

SLIDE 35

PRETTY MAC PRETTY

[scatterplot panels: 20% noise, 80% noise]

SLIDE 36

Comparability of Scores

So, we use apriori… but… are CMI, MIC, MAC, etc. (anti-)monotonic? Is any meaningful correlation score monotonic?

SLIDE 37

ρ = 0.985

Spurious Correlations

SLIDE 38

Correlation does not imply…

Correlation means a co-relation is observed, which does not imply a causal relation.

SLIDE 39

Correlation does not imply…

If π‘Œ and 𝑍 are strongly correlated, this may have many reasons. Besides spurious, it may be that π‘Œ and 𝑍 are the result of an unobserved process π‘Ž. Next week we’ll investigate whether we can somehow tell if π‘Œ causes 𝑍 or vice versa.

SLIDE 40

Correlation does not imply…

SLIDE 41

Conclusions

Correlation is almost anything deviating from chance. Measuring multivariate correlation is difficult

 especially if you want to be non-parametric
 even more so if you want to measure non-linear interactions

Entropy and Mutual Information are powerful tools

 Shannon entropy for nominal data
 cumulative entropy for ordinal data
 discretise smartly for multivariate CE

SLIDE 42

ρ = 0.870

Thank you!