Information & Correlation
Jilles Vreeken
11 June 2014 (TADA)
Questions of the day: What is information? How can we measure correlation? And what do talking drums have to do with it?
information, a bit, entropy, information theory, compression, …
Shannon showed that uncertainty can be quantified, linking physical entropy to messages. Shannon defined the entropy of a discrete random variable $Y$ as
$H(Y) = -\sum_j p(y_j) \log p(y_j)$
Shannon showed that uncertainty can be quantified, linking physical entropy to messages. A side result of Shannon entropy is that
$-\log_2 p(y_j)$
gives the length in bits of the optimal prefix code for a message $y_j$:
a code $C$ where no code word $c \in C$ is the prefix of another $c' \in C$ with $c \neq c'$. Essentially, a prefix code defines a tree, where each code word corresponds to a path from the root to a leaf.
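As a concrete illustration (a sketch of my own, not from the slides), here is minimal Python that computes the Shannon entropy of a distribution and the optimal per-symbol code lengths $-\log_2 p(y)$:

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution: H = -sum p * log2 p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def code_lengths(probs):
    """Optimal prefix-code length in bits per symbol: -log2 p."""
    return [-math.log2(p) for p in probs if p > 0]

# example: symbol probabilities of a small string
msg = "aaab"
probs = [c / len(msg) for c in Counter(msg).values()]
print(entropy(probs))       # ~0.811 bits per symbol
print(code_lengths(probs))  # [~0.415, 2.0]
```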
Binary digit
the smallest and most fundamental piece of information: yes or no. Invented by Claude Shannon in 1948, named by John Tukey.
Bits have been in use for a long, long time, though:
punch cards (1725, 1804), Morse code (1844), and African 'talking drums',
which mimic vocalized sounds (a tonal language) and are a very reliable means of communication.
How much information does a given string carry?
how many bits?
Say we have a binary string of 100000 'messages':
1) 00010001000100010001…000100010001000100010001000100010001
2) 01110100110100100110…101011101011101100010110001011011100
3) 00011000001010100000…001000100001000000100011000000100110
4) 0000000000000000000000000000100000000000000000000…0000000
But are they worth those 100000 bits?
Depends on the encoding! What is the best encoding?
one that takes the entropy of the data into account: things that occur often should get a short code, things that occur seldom should get a long code
An encoding matching Shannon Entropy is optimal
In our simplest example we have $p(1) = 1/100000$ and $p(0) = 99999/100000$, so $|code(1)| = -\log_2(1/100000) = 16.61$ bits and $|code(0)| = -\log_2(99999/100000) = 0.0000144$ bits. So, knowing $p$, our string contains $1 \times 16.61 + 99999 \times 0.0000144 \approx 18.05$ bits.
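A quick check of this arithmetic in Python (a small sketch using the example's own numbers):

```python
import math

n = 100_000
p1, p0 = 1 / n, (n - 1) / n

len1 = -math.log2(p1)             # ~16.61 bits for the rare symbol '1'
len0 = -math.log2(p0)             # ~0.0000144 bits for the common symbol '0'
print(1 * len1 + (n - 1) * len0)  # ~18.05 bits for the whole string
```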
Shannon lets us calculate optimal code lengths
What about actual codes, though? We cannot have a code word of 0.0000144 bits. Shannon and Fano invented a near-optimal encoding in 1948:
within one bit of the optimum, but not with the lowest expected length.
Fano gave students an option: regular exam, or invent a better encoding
David Huffman didn't like exams; he invented Huffman codes (1952), optimal for symbol-by-symbol encoding with fixed probabilities.
(arithmetic coding is overall optimal, Rissanen 1976)
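For concreteness, a minimal sketch of the standard greedy Huffman construction in Python (my own code, not from the slides): repeatedly merge the two least frequent nodes, then read the code words off the resulting tree.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    # heap of (weight, tiebreaker, tree); a tree is either a symbol or a (left, right) pair
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # edge case: only one distinct symbol
    _, _, root = heap[0]
    walk(root)
    return codes

print(huffman_codes(Counter("abracadabra")))
# {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}
```

For 'abracadabra', the frequent symbol 'a' gets a one-bit code and the rare symbols get three-bit codes: exactly the short-for-frequent, long-for-rare behaviour described above.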
To encode optimally, we need the true probabilities. What happens if we don't have them?
The Kullback-Leibler divergence, $D(p \| q)$, measures the bits we 'waste' when we encode with $q$ while $p$ is the 'true' distribution:
$D(p \| q) = \sum_j p(j) \log \frac{p(j)}{q(j)}$
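A small sketch (the distributions are made-up examples) showing that $D(p \| q)$ is exactly the expected number of extra bits per symbol when we encode with a code built for $q$ instead of $p$:

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits: sum_j p(j) * log2( p(j) / q(j) )."""
    return sum(pj * math.log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)

p = [0.5, 0.25, 0.25]   # the 'true' distribution
q = [1/3, 1/3, 1/3]     # the distribution we (wrongly) encode with

expected_cost = -sum(pj * math.log2(qj) for pj, qj in zip(p, q))  # bits/symbol with q's code
entropy_p = -sum(pj * math.log2(pj) for pj in p)                  # optimal bits/symbol
print(kl_divergence(p, q))         # ~0.085 bits wasted per symbol
print(expected_cost - entropy_p)   # the same number, by construction
```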
Mutual information: the amount of information shared between two variables $Y$ and $Z$,
$I(Y;Z) = H(Y) - H(Y \mid Z) = H(Z) - H(Z \mid Y)$
$= \sum_{y \in Y} \sum_{z \in Z} p(y,z) \log \frac{p(y,z)}{p(y)\, p(z)}$
high ↔ correlation low ↔ independence
(small aside)
Entropy and KL are used in decision trees. What is the best split in a tree?
The one that leaves as little uncertainty in the sub-nodes as possible: minimal entropy.
How do we compare over multiple options?
Information gain: $IG(Y, A) = H(Y) - H(Y \mid A)$
Theory of Computation   Probability Theory 1   count
No                      No                     1887
Yes                     No                      156
No                      Yes                     143
Yes                     Yes                     219
(Heikinheimo et al. 2007)
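As a rough sketch, the mutual information between these two courses can be computed straight from the contingency table above; the counts are from the table, the code is my own:

```python
import math

# counts from the table: (Theory of Computation, Probability Theory 1) -> students
counts = {("No", "No"): 1887, ("Yes", "No"): 156,
          ("No", "Yes"): 143, ("Yes", "Yes"): 219}
n = sum(counts.values())

def marginal(axis, value):
    """p(Y = value) or p(Z = value), depending on the axis (0 or 1)."""
    return sum(c for key, c in counts.items() if key[axis] == value) / n

# I(Y;Z) = sum_{y,z} p(y,z) * log2( p(y,z) / (p(y) * p(z)) )
mi = sum((c / n) * math.log2((c / n) / (marginal(0, y) * marginal(1, z)))
         for (y, z), c in counts.items())
print(mi)   # > 0 bits: taking one course tells us something about taking the other
```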
Maturity Test   Software Engineering   Theory of Computation   count
No              No                     No                      1570
Yes             No                     No                        79
No              Yes                    No                        99
Yes             Yes                    No                       282
No              No                     Yes                       28
Yes             No                     Yes                      164
No              Yes                    Yes                       13
Yes             Yes                    Yes                      170
(Heikinheimo et al. 2007)
[Figure: courses Scientific Writing, Maturity Test, Software Engineering Project, Theory of Computation, Probability Theory 1]
(Heikinheimo et al. 2007)
Differential entropy: $h(Y) = -\int_Y f(y) \log f(y)\, dy$, where $f$ is the density of $Y$
(Shannon, 1948)
How about… the entropy of Uniform(0,1/2) ?
$h(Y) = -\int_0^{1/2} 2 \log 2\, dy = -\log 2$
Hm, negative?
We can define entropy for cumulative distribution functions!
$h_{CE}(Y) = -\int_{dom(Y)} P(Y \le y) \log P(Y \le y)\, dy$
As $0 \le P(Y \le y) \le 1$, we obtain $h_{CE}(Y) \ge 0$ (!)
(Rao et al, 2004, 2005)
How do we compute it in practice? Easy. Let $Y_1 \le \dots \le Y_n$ be the sorted i.i.d. samples of a continuous random variable $Y$:
$h_{CE}(Y) = -\sum_{j=1}^{n-1} (Y_{j+1} - Y_j)\, \frac{j}{n} \log \frac{j}{n}$
(Rao et al, 2004, 2005)
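A minimal Python sketch of this estimator (my own implementation, using natural logarithms, so the result is in nats); for Uniform(0, 1/2) the exact cumulative entropy is 1/8, which the estimate should approach:

```python
import math
import random

def cumulative_entropy(samples):
    """Empirical cumulative entropy: sort the samples, then
    h_CE = -sum_{j=1}^{n-1} (Y_{j+1} - Y_j) * (j/n) * log(j/n)."""
    ys = sorted(samples)
    n = len(ys)
    return -sum((ys[j] - ys[j - 1]) * (j / n) * math.log(j / n) for j in range(1, n))

data = [random.uniform(0, 0.5) for _ in range(100_000)]
print(cumulative_entropy(data))   # should be close to 1/8 for Uniform(0, 1/2)
```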
Tricky. Very tricky. Too tricky for now.
(Nguyen et al, 2013, 2014)
Given continuous-valued data,
we want to find $Z \subset Y$ such that $Z$ has high mutual information. Can we do this with cumulative entropy?
First things first. We need
$h_{CE}(Y \mid Z) = \int h_{CE}(Y \mid z)\, p(z)\, dz$
which, in practice, means
$h_{CE}(Y \mid Z) = \sum_{z \in Z} h_{CE}(Y \mid z)\, p(z)$
with $z$ just groups of data points, and $p(z) = |z| / n$
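In code, the 'in practice' formula might look like the following sketch (names are hypothetical); the single-variable estimator from the previous sketch is repeated so the snippet is self-contained:

```python
import math

def cumulative_entropy(ys):
    """Empirical h_CE of a single continuous variable (the estimator above)."""
    ys, n = sorted(ys), len(ys)
    return -sum((ys[j] - ys[j - 1]) * (j / n) * math.log(j / n) for j in range(1, n))

def conditional_cumulative_entropy(groups):
    """h_CE(Y | Z): weight the h_CE of each group z of y-values by p(z) = |z| / n."""
    n = sum(len(g) for g in groups)
    return sum((len(g) / n) * cumulative_entropy(g) for g in groups)

# hypothetical example: the y-values split into groups by the (binned) value of Z
groups = [[0.10, 0.15, 0.20, 0.30], [0.80, 0.85, 0.90, 0.95]]
print(conditional_cumulative_entropy(groups))
```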
How do we choose $z$? Such that $h_{CE}(Y \mid Z)$ is minimal.
We cannot (realistically) calculate $h_{CE}(Y_1, \dots, Y_m)$ in one go, but… mutual information has this nice factorization property… so, what we can do is compute
$\sum_{j=2}^{m} h_{CE}(Y_j) - \sum_{j=2}^{m} h_{CE}(Y_j \mid Y_1, \dots, Y_{j-1})$
super simple: Apriori-style
Information is about uncertainty of what you could say. Entropy is a core concept of information theory
with lots of nice properties: optimal prefix-code lengths, mutual information, etc.
Entropy for continuous data is… more tricky:
differential entropy is a bit problematic; cumulative distributions provide a way out,
but are mostly uncharted territory.