
Revised December 5, 2019 Information Theoretic Security

Lecture 1

Matthieu Bloch

1 Notation and basic definitions

We start by briefly introducing the notation used throughout this book. The sets of real numbers and integers are denoted by ℝ and ℕ, respectively. For a < b ∈ ℝ, the open (resp. closed) interval between a and b is denoted by ]a; b[ (resp. [a; b]). For n < m ∈ ℕ, we set ⟦n, m⟧ ≜ {i ∈ ℕ : n ⩽ i ⩽ m}. Scalar real-valued random variables are denoted by uppercase letters, e.g., X, with realizations denoted in lowercase, e.g., x. Vector real-valued random variables and realizations are shown in bold, e.g., X and x. Sets are denoted by calligraphic letters, e.g., 𝒳. Given a vector x ∈ 𝒳ⁿ, the components of x are denoted {x_i}_{i=1}^n. For 1 ⩽ i ⩽ j ⩽ n, we define x_{i:j} ≜ {x_k : i ⩽ k ⩽ j}.

Matrices are also denoted by capital bold letters, e.g., A ∈ ℝ^{n×m}, and we make sure no confusion arises with random vectors. Unless otherwise specified, sets are assumed to be finite, e.g., |𝒳| < ∞, and random variables are assumed to be discrete. The probability simplex over 𝒳 is denoted ∆(𝒳) and we always denote the Probability Mass Function (PMF) of X by pX. The support of a PMF p ∈ ∆(𝒳) is supp(p) ≜ {x ∈ 𝒳 : p(x) > 0}. If X is a continuous random variable, we will abuse notation and also denote its Probability Density Function (PDF) by pX; whether we manipulate PDFs or PMFs will always be clear from context and no confusion should arise. Given two jointly distributed random variables X ∈ 𝒳 and Y ∈ 𝒴, their joint distribution is denoted by pXY and the conditional distribution of Y given X is denoted by pY|X. It is sometimes convenient to define a conditional distribution without specific random variables, in which case we use the notion of Markov kernel as defined below.

Definition 1.1. W is a Markov kernel from 𝒳 to 𝒴 if W(y|x) ⩾ 0 for all (x, y) ∈ 𝒳 × 𝒴 and Σ_{y∈𝒴} W(y|x) = 1 for all x ∈ 𝒳. For any p ∈ ∆(𝒳), we define W · p ∈ ∆(𝒳 × 𝒴) and W ◦ p ∈ ∆(𝒴) as

\[
(W \cdot p)(x, y) \triangleq W(y|x)\,p(x) \quad\text{and}\quad (W \circ p)(y) \triangleq \sum_{x\in\mathcal{X}} (W \cdot p)(x, y). \tag{1}
\]

The notion of Markov kernel is mainly used to simplify notation and does not bear any profound meaning.
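Concretely, a Markov kernel between finite sets can be stored as a row-stochastic matrix, and the two operations in (1) become elementary array manipulations. The following minimal Python sketch (the function names and the example kernel are ours, not from the notes) illustrates this:

```python
import numpy as np

# A Markov kernel W from X to Y stored as a |X| x |Y| row-stochastic matrix;
# row x is the conditional PMF W(.|x), so each row sums to 1.

def kernel_dot(W, p):
    """Joint distribution (W . p)(x, y) = W(y|x) p(x), as a |X| x |Y| array."""
    return p[:, None] * W

def kernel_circ(W, p):
    """Output distribution (W o p)(y) = sum_x W(y|x) p(x)."""
    return p @ W

# Example: a binary symmetric kernel with crossover probability 0.1.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
p = np.array([0.3, 0.7])
joint = kernel_dot(W, p)
assert np.allclose(joint.sum(axis=1), p)                  # x-marginal of W.p is p
assert np.allclose(kernel_circ(W, p), joint.sum(axis=0))  # W o p is the y-marginal of W.p
```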

Two jointly distributed random variables X and Y are called independent if pXY = pX pY. The expected value, or average, of a random variable X is defined as E(X) ≜ Σ_{x∈𝒳} x pX(x). The m-th centered moment of X is defined as E((X − E(X))^m), and in particular the variance of X, denoted Var(X), is the second centered moment.

Definition 1.2 (Markov chain and conditional independence). Let X, Y, Z be real-valued random variables with joint PMF pXYZ. Then X, Y, Z form a Markov chain in that order, denoted X − Y − Z, if X and Z are conditionally independent given Y, i.e., ∀(x, y, z) ∈ 𝒳 × 𝒴 × 𝒵, we have pXYZ(x, y, z) = pZ|Y(z|y) pY|X(y|x) pX(x).


Finally, we define the notion of absolute continuity, which will prove useful in later sections.

Definition 1.3 (Absolute continuity). Let p, q ∈ ∆(𝒳). We say that p is absolutely continuous with respect to (w.r.t.) q, denoted by p ≪ q, if supp(p) ⊆ supp(q). If p is not absolutely continuous w.r.t. q, we write p ≪̸ q.

2 Convexity and Jensen's inequality

Convexity plays a central role in many of our proofs, largely because information-theoretic metrics possess convenient convexity properties. We shall extensively use Jensen's inequality and the related log-sum inequality in subsequent chapters, and we recall these results here for completeness.

Definition 2.1 (Convex and concave functions). A function f : [a, b] → ℝ is convex if for all x, y ∈ [a, b] and all λ ∈ [0, 1],

f(λx + (1 − λ)y) ⩽ λf(x) + (1 − λ)f(y).

A function f is strictly convex if the inequality above is strict whenever x ≠ y and λ ∈ ]0; 1[. A function f is (strictly) concave if −f is (strictly) convex.

Tieorem 2.2 (Jensen’s inequality). Jensen’s inequality Let X be a real-valued random variable defined

  • n some interval [a, b] and with PDF pX. Let f : [a, b] → R be a real valued function that is convex

in [a, b]. Tien, f(E(X)) ⩽ E(f(X)). For any strictly convex function, equality holds if and only if X is a constant. Tie results also holds more generally for continuous random variables.

  • Proof. Let hL : [a, b] → R be a line such that ∀x ∈ a, b hL(x) ⩽ f(x). Such a line always exists

as a result of convexity. Tien, E(hL(X)) ⩽ E(f(X)), but since hL is a line, we have hL(E(X)) = E(hL(X)) ⩽ E(f(X)). In particular, we can choose hL such that hL(E(X)) = f(E(X)) because f is convex. Hence, f(E(X)) ⩽ E(f(X)) and if f is strictly convex, we have equality if and only if X = cst.
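As a quick numerical illustration (our toy example, not from the notes), one can check f(E(X)) ⩽ E(f(X)) for the strictly convex function f(x) = x² on a small PMF:

```python
import numpy as np

# Sanity check of Jensen's inequality for f(x) = x^2 (strictly convex).
x = np.array([0.0, 1.0, 2.0, 4.0])   # support of X
p = np.array([0.1, 0.4, 0.3, 0.2])   # PMF of X

f = lambda t: t ** 2
lhs = f(np.dot(p, x))                # f(E(X)) = 1.8^2 = 3.24
rhs = np.dot(p, f(x))                # E(f(X)) = 4.8
assert lhs <= rhs                    # strict here, since X is not constant
```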

Corollary 2.3 (Log-sum inequality). Let {a_i}_{i=1}^n ∈ ℝ₊ⁿ and {b_i}_{i=1}^n ∈ ℝ₊ⁿ. Then,

\[
\sum_{i=1}^n a_i \ln\frac{a_i}{b_i} \geqslant \left(\sum_{i=1}^n a_i\right) \ln\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}. \tag{2}
\]

Proof. Note that if b_j = 0 and a_j ≠ 0 for some j ∈ ⟦1, n⟧, then the result holds since the left-hand side of (2) is infinite. If not, we introduce the function f : ℝ₊ → ℝ : x ↦ x ln x with the convention that f(0) = 0, which is infinitely differentiable on its domain. Since f″(x) = 1/x ⩾ 0, f is convex. Set a ≜ Σ_{i=1}^n a_i and b ≜ Σ_{i=1}^n b_i. Then, note that

\[
\sum_{i=1}^n a_i \ln\frac{a_i}{b_i} = \sum_{i=1}^n b_i f\!\left(\frac{a_i}{b_i}\right) = b\sum_{i=1}^n \frac{b_i}{b}\, f\!\left(\frac{a_i}{b_i}\right) \geqslant b\, f\!\left(\sum_{i=1}^n \frac{b_i}{b}\,\frac{a_i}{b_i}\right) = b\, f\!\left(\frac{a}{b}\right) = a\ln\frac{a}{b}, \tag{3}
\]

where we have used Jensen's inequality.
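A direct numerical check of (2) on arbitrary non-negative sequences (our toy values, not from the notes); note that the sequences need not be normalized:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 1.0, 4.0])

lhs = np.sum(a * np.log(a / b))             # sum_i a_i ln(a_i / b_i)
rhs = a.sum() * np.log(a.sum() / b.sum())   # (sum a_i) ln(sum a_i / sum b_i)
assert lhs >= rhs                           # here lhs ~= -0.170, rhs ~= -0.925
```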

Proposition 2.4. Let X be a real-valued random variable defined on some interval [a, b] and with distribution pX. Let f : [a, b] → ℝ be a real-valued function that is convex on [a, b]. Then,

\[
E(f(X)) \leqslant f(a) + \frac{f(b) - f(a)}{b - a}\left(E(X) - a\right).
\]


For any strictly convex function, equality holds if and only if X is supported on the endpoints of the interval.

Proof. Let hU : [a, b] → ℝ be a line such that ∀x ∈ [a, b], f(x) ⩽ hU(x). Then, E(f(X)) ⩽ E(hU(X)) = hU(E(X)). In particular, we may choose hU : x ↦ f(a) + ((f(b) − f(a))/(b − a))(x − a), which lies above f on [a, b] by convexity. Hence, E(f(X)) ⩽ f(a) + ((f(b) − f(a))/(b − a))(E(X) − a), and if f is strictly convex, equality holds if X is such that pX(x) = 0 for x ∈ ]a; b[.

3 Distances between distributions

As we will see in subsequent chapters, many information-theoretic security metrics can be expressed in terms of how close or distinct probability distributions are. We develop here the properties of two distances, the total variation distance and the relative entropy, which we shall use extensively.

Definition 3.1 (Total variation distance). The total variation between two distributions p, q ∈ ∆(𝒳) is

\[
V(p, q) \triangleq \frac{1}{2}\lVert p - q\rVert_1 \triangleq \frac{1}{2}\sum_{x\in\mathcal{X}} |p(x) - q(x)|. \tag{4}
\]

For all practical purposes, the total variation distance is an ℓ1 norm on the probability simplex ∆(𝒳) and inherits all its properties (symmetry, positivity, triangle inequality). The normalization by 1/2 is for convenience, as we shall see from the properties derived next. The total variation can be expressed more generally, as shown in the next proposition.

Proposition 3.2. The total variation between two distributions p, q ∈ ∆(𝒳) is

\[
V(p, q) = \sup_{\mathcal{E}\subset\mathcal{X}} \left(P_p(\mathcal{E}) - P_q(\mathcal{E})\right) = \sup_{\mathcal{E}\subset\mathcal{X}} \left(P_q(\mathcal{E}) - P_p(\mathcal{E})\right). \tag{5}
\]

The supremum is attained for 𝓔 ≜ {x ∈ 𝒳 : p(x) > q(x)}. Consequently, 0 ⩽ V(p, q) ⩽ 1.

Proof. From the definition, upon setting 𝓔₀ ≜ {x ∈ 𝒳 : p(x) > q(x)}, we have

\[
\begin{aligned}
V(p, q) &= \frac{1}{2}\sum_{x\in\mathcal{E}_0}(p(x) - q(x)) + \frac{1}{2}\sum_{x\in\mathcal{E}_0^c}(q(x) - p(x)) && (6)\\
&= \frac{1}{2}\left(P_p(\mathcal{E}_0) - P_q(\mathcal{E}_0) + P_q(\mathcal{E}_0^c) - P_p(\mathcal{E}_0^c)\right) && (7)\\
&= P_p(\mathcal{E}_0) - P_q(\mathcal{E}_0) && (8)\\
&\leqslant \sup_{\mathcal{E}\subset\mathcal{X}}\left(P_p(\mathcal{E}) - P_q(\mathcal{E})\right). && (9)
\end{aligned}
\]

Conversely, note that for every 𝓔,

\[
\begin{aligned}
P_p(\mathcal{E}) - P_q(\mathcal{E}) &= \frac{1}{2}\left(P_p(\mathcal{E}) - P_q(\mathcal{E}) + P_q(\mathcal{E}^c) - P_p(\mathcal{E}^c)\right) && (10)\\
&= \frac{1}{2}\sum_{x\in\mathcal{E}}(p(x) - q(x)) + \frac{1}{2}\sum_{x\in\mathcal{E}^c}(q(x) - p(x)) && (11)\\
&\leqslant V(p, q), && (12)
\end{aligned}
\]

so that sup_{𝓔⊂𝒳} (P_p(𝓔) − P_q(𝓔)) ⩽ V(p, q).
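The two expressions for the total variation are easy to compare numerically; the sketch below (our example, not from the notes) computes V(p, q) from the ℓ1 formula (4) and from the maximizing event of Proposition 3.2:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.5])

tv_l1 = 0.5 * np.abs(p - q).sum()     # definition (4)
E0 = p > q                            # event attaining the supremum in (5)
tv_event = p[E0].sum() - q[E0].sum()  # P_p(E0) - P_q(E0)
assert np.isclose(tv_l1, tv_event)    # both equal 0.3 here
```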


Proposition 3.2 is convenient to lower bound the total variation, since it suffices to choose any 𝓔 ⊂ 𝒳 for which the evaluation of (5) is easy to carry out.

Proposition 3.3. Let W be a Markov kernel from 𝒳 to 𝒴 and let p, q ∈ ∆(𝒳). Then,

V(W ◦ p, W ◦ q) ⩽ V(W · p, W · q) = V(p, q).

Proof. The results follow from the definition of W · p and W ◦ p and the properties of |·| as a distance on ℝ. Note that

\[
V(W \circ p, W \circ q) = \frac{1}{2}\sum_{y}\left|\sum_{x}\left((W\cdot p)(x, y) - (W\cdot q)(x, y)\right)\right| \leqslant V(W\cdot p, W\cdot q),
\]

\[
V(W\cdot p, W\cdot q) = \frac{1}{2}\sum_{x}\sum_{y} W(y|x)\left|p(x) - q(x)\right| = V(p, q).
\]
The inequality V(W ◦ p, W ◦ q) ⩽ V(p, q) can be understood as a data-processing inequality, stating that distributions only become more indistinguishable when passed through the same Markov kernel.

Definition 3.4 (Relative entropy). The relative entropy, also called Kullback-Leibler divergence, between two distributions p, q ∈ ∆(𝒳) is

\[
D(p\|q) \triangleq \sum_{x\in\mathcal{X}} p(x)\ln\frac{p(x)}{q(x)}, \tag{13}
\]

with the convention that D(p‖q) = ∞ if p ≪̸ q.

Proposition 3.5 (Positivity of relative entropy). For any p, q ∈ ∆(𝒳), D(p‖q) ⩾ 0 with equality if and only if p = q.

Proof. Using the concavity of the logarithm and Jensen's inequality,

\[
-D(p\|q) = \sum_{x\in\mathcal{X}} p(x)\ln\frac{q(x)}{p(x)} \leqslant \ln\sum_{x\in\mathcal{X}} q(x) = \ln(1) = 0.
\]

Since the logarithm is strictly concave, equality happens if and only if there exists c ∈ ℝ such that ∀x ∈ 𝒳, p(x) = c q(x). Since 1 = Σ_x p(x) = c Σ_x q(x) = c, equality happens if and only if p(x) = q(x) for all x.
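A minimal implementation of the relative entropy in nats (our helper, not from the notes), with the conventions 0 ln 0 = 0 and D(p‖q) = ∞ when p is not absolutely continuous w.r.t. q:

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p||q) in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((p > 0) & (q == 0)):    # p not absolutely continuous w.r.t. q
        return np.inf
    mask = p > 0                      # convention 0 ln 0 = 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p, q = [0.5, 0.3, 0.2], [0.25, 0.25, 0.5]
assert kl(p, q) >= 0.0                # Proposition 3.5
assert kl(p, p) == 0.0                # equality if and only if p = q
```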

Proposition 3.6. Let W be a Markov kernel from 𝒳 to 𝒴 and let p, q ∈ ∆(𝒳). Then,

D(W ◦ p‖W ◦ q) ⩽ D(W · p‖W · q) = D(p‖q).

Proof. If p ≪̸ q, then D(p‖q) = ∞ and the inequality holds. If p ≪ q, then W ◦ p ≪ W ◦ q as well. Using the log-sum inequality, we then have

\[
D(W\circ p\|W\circ q) = \sum_{y}\left(\sum_{x}(W\cdot p)(x, y)\right)\ln\frac{\sum_{x}(W\cdot p)(x, y)}{\sum_{x}(W\cdot q)(x, y)} \leqslant D(W\cdot p\|W\cdot q),
\]

\[
D(W\cdot p\|W\cdot q) = \sum_{x}\sum_{y}(W\cdot p)(x, y)\ln\frac{p(x)}{q(x)} = D(p\|q).
\]


Proposition 3.6 can again be interpreted as a data-processing inequality, stating that relative entropy can only decrease through a Markov kernel. The relative entropy is unfortunately not a distance on ∆(𝒳), since it is not symmetric and does not satisfy the triangle inequality. It is, however, often convenient to use because it involves the ratio of probabilities as opposed to their difference. To get the benefits of both total variation and relative entropy, it can be convenient to relate the two metrics through the following inequalities.

Proposition 3.7 (Pinsker's inequality). For any p, q ∈ ∆(𝒳),

\[
V(p, q) \leqslant \sqrt{\frac{1}{2}D(p\|q)}.
\]

Proof. We start by considering the special case in which p, q ∈ ∆({0, 1}) with p(1) = α ∈ [0; 1] and q(1) = β ∈ [0; 1]. Without loss of generality, we assume α ⩾ β. Then, V(p, q) = α − β and

\[
D(p\|q) = \alpha\ln\frac{\alpha}{\beta} + (1-\alpha)\ln\frac{1-\alpha}{1-\beta}.
\]

If β = 0, D(p‖q) is infinite and the inequality holds. If β = α, V(p, q) = D(p‖q) = 0, and the inequality holds as well. Else, consider the differentiable function

\[
f_\alpha : \,]0;\alpha[\, \to \mathbb{R} : \beta \mapsto \alpha\ln\frac{\alpha}{\beta} + (1-\alpha)\ln\frac{1-\alpha}{1-\beta} - 2(\alpha-\beta)^2. \tag{14}
\]

Since f′_α(β) = (α − β)(4 − 1/(β(1 − β))) ⩽ 0, f_α is non-increasing on ]0; α[; as f_α(β) → 0 when β → α, we deduce f_α(β) ⩾ 0, i.e., D(p‖q) ⩾ 2(α − β)² = 2V(p, q)², so the inequality holds in this case too.

We consider next arbitrary p, q ∈ ∆(𝒳) and set 𝓔 ≜ {x ∈ 𝒳 : p(x) > q(x)}. From Proposition 3.2, we have V(p, q) = P_p(𝓔) − P_q(𝓔). Define a Markov kernel W from 𝒳 to {0, 1} such that W(1|x) ≜ 1{x ∈ 𝓔}. Notice that (W ◦ p)(1) = P_p(𝓔) and (W ◦ q)(1) = P_q(𝓔), so that V(p, q) = V(W ◦ p, W ◦ q). Since W ◦ p, W ◦ q ∈ ∆({0, 1}), the result of the special case applies and V(W ◦ p, W ◦ q) ⩽ √(½ D(W ◦ p‖W ◦ q)). Finally, the result follows because D(W ◦ p‖W ◦ q) ⩽ D(p‖q) by Proposition 3.6.
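Pinsker's inequality is easy to probe empirically; the following sketch (ours, not from the notes) checks V(p, q) ⩽ √(D(p‖q)/2) on random pairs of distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(4))     # random distributions on a 4-letter alphabet
    q = rng.dirichlet(np.ones(4))
    tv = 0.5 * np.abs(p - q).sum()
    kl = np.sum(p * np.log(p / q))    # entries are positive almost surely
    assert tv <= np.sqrt(kl / 2) + 1e-12
```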

Very often, we will loosen Pinsker’s inequality and only use V(p, q) ⩽

  • D(pq), which will be

sufficient for our purpose. Proposition 3.8 (Reverse Pinsker’s inequality). For any p, q ∈ ∆(X) with supp (q) = X and p ≪ q, we have D(pq) ⩽ 2V(p, q) ln

1 qmin where qmin ≜ minx∈supp(q) q(x).

Proof. Define the differentiable function ϕ : [0; ∞[ → ℝ : x ↦ (x ln x)/(x − 1), with ϕ(0) = 0 and ϕ(1) = 1 by continuity. Note that ϕ′(x) = (x − 1 − ln x)/(x − 1)² ⩾ 0 on the domain, so ϕ(·) is increasing and non-negative. In addition, ϕ(x) ⩽ 2 ln x for x ⩾ 2. Consequently,

\[
D(p\|q) = \sum_{x\in\mathcal{X}: p(x)\neq q(x)} p(x)\ln\frac{p(x)}{q(x)} = \sum_{x\in\mathcal{X}: p(x)\neq q(x)} \phi\!\left(\frac{p(x)}{q(x)}\right)(p(x) - q(x)) \leqslant \sum_{x\in\mathcal{X}: p(x)>q(x)} \phi\!\left(\frac{p(x)}{q(x)}\right)(p(x) - q(x)),
\]

where the inequality follows because the terms with p(x) < q(x) are non-positive.


Upper bounding ϕ(p(x)/q(x)) by ϕ(1/q_min) and recalling the characterization of V(·, ·) from Proposition 3.2, we have

\[
\sum_{x\in\mathcal{X}: p(x)>q(x)} \phi\!\left(\frac{p(x)}{q(x)}\right)(p(x) - q(x)) \leqslant \phi\!\left(\frac{1}{q_{\min}}\right)V(p, q).
\]

The result follows because q_min ⩽ 1/2, so that ϕ(1/q_min) ⩽ 2 ln(1/q_min).

Tie reverse Pinkser’s inequality stated here should be used with care, as the bound involves

  • qmin. For some distributions, qmin may be very small so that the bound might not be tight at all.

Fortunately, the presence of a logarithm mitigates this, as we shall see later in the textbook when we manipulate product distributions. Definition 3.9. Tie χ2 distance, between two distributions p, q ∈ ∆(X) is χ2 (pq) ≜

  • x∈X

(p(x) − q(x))2 q(x) =

  • x∈X

p(x)2 q(x) − 1. with the convention that χ2 (pq) = ∞ if p ≪ / q.

4 Tail and concentration inequalities

For a random variable X, a tail inequality is a bound of the form P(X ⩾ t) ⩽ f(t) for some t ∈ ℝ and some function f : ℝ → ℝ₊. The idea is to capture in the bound the fact that the random variable cannot take large values with too high a probability. Most if not all tail inequalities are derived from Markov's inequality.

Lemma 4.1 (Markov's inequality). Let X be a non-negative real-valued random variable. Then for all t > 0,

\[
P(X \geqslant t) \leqslant \frac{E(X)}{t}. \tag{15}
\]

Proof. For t > 0, let 1{X ⩾ t} be the indicator function of the event {X ⩾ t}. Then,

\[
E[X] \geqslant E[X\,\mathbf{1}\{X \geqslant t\}] \geqslant t\,P[X \geqslant t], \tag{16}
\]

where the first inequality follows because the indicator function is {0, 1}-valued and X is non-negative, and the second because X ⩾ t whenever 1{X ⩾ t} = 1 and X 1{X ⩾ t} = 0 otherwise.

By choosing t = sE(X) for s > 0 in (15), we obtain P(X ⩾ sE(X)) ⩽ 1/s, which is consistent with the intuition that it is unlikely that a random variable takes a value very far away from its mean. Despite its relative simplicity, Markov's inequality is a powerful tool because it can be "boosted." For X ∈ 𝒳 ⊂ ℝ, consider ϕ : 𝒳 → ℝ₊ increasing on 𝒳 such that E(|ϕ(X)|) < ∞. Then,

\[
P[X \geqslant t] = E[\mathbf{1}\{X \geqslant t\}] = E[\mathbf{1}\{X \geqslant t\}\,\mathbf{1}\{\phi(X) \geqslant \phi(t)\}] \leqslant P[\phi(X) \geqslant \phi(t)], \tag{17}
\]

where we have used the definition of ϕ and the fact that an indicator function is upper bounded by one. Applying Markov's inequality, we obtain

\[
P[X \geqslant t] \leqslant \frac{E[\phi(X)]}{\phi(t)}, \tag{18}
\]


which is potentially a better bound than (15). Of course, the difficulty is in choosing the appropriate function ϕ to make the result meaningful. The most well-known application of this concept leads to Chebyshev's inequality.

Lemma 4.2 (Chebyshev's inequality). Let X ∈ ℝ. Then,

\[
P[|X - E(X)| \geqslant t] \leqslant \frac{\mathrm{Var}(X)}{t^2}. \tag{19}
\]

Proof. Define Y ≜ |X − E(X)| and ϕ : ℝ₊ → ℝ₊ : t ↦ t². Then, by the boosted Markov's inequality, we obtain

\[
P[|X - E(X)| \geqslant t] = P[Y \geqslant t] \leqslant \frac{E[Y^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}. \tag{20}
\]
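The sketch below (our example, not from the notes) compares the empirical tail of an exponential random variable with the Markov bound (15), and applies Chebyshev's bound (19) to the centered variable:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)  # E(X) = 1, Var(X) = 1
t = 3.0
empirical = np.mean(x >= t)                   # true tail = exp(-3) ~= 0.050
markov = 1.0 / t                              # E(X)/t ~= 0.333
chebyshev = 1.0 / (t - 1.0) ** 2              # P(|X - E(X)| >= t - 1) <= Var/(t-1)^2 = 0.25
assert empirical <= markov and empirical <= chebyshev
```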

Chebyshev’s inequality is an example of a concentration inequality that quantifies how the vari- able X concentrates its probability mass around the average E(X). Not surprisingly, Chebyshev’s inequality states that most of the probability mass is in a neighborhood around the average E(X) but the size of that neighborhood is a function of the variance of X. As an application of Chebyshev’s inequality, we derive the weak law of large numbers. Lemma 4.3 (Weak law of large numbers). Let Xi ∼ pXi be independent with E[|Xi|] < ∞ and Var(Xi) < σ2 for some σ2 ∈ R+. Tien the random variable 1

n

n

i=1(Xi − E(Xi)) converges in

probability to 0.

Proof. Set Z ≜ (1/n) Σ_{i=1}^n (X_i − E(X_i)) and note that

\[
E[Z] = \frac{1}{n}\sum_{i=1}^n \left(E[X_i] - E(X_i)\right) = 0 \quad\text{and}\quad \mathrm{Var}(Z) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i). \tag{21}
\]

Therefore, by Chebyshev's inequality,

\[
P\!\left[\left|\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n E[X_i]\right| \geqslant \epsilon\right] \leqslant \frac{\sum_{i=1}^n \mathrm{Var}(X_i)}{n^2\epsilon^2} < \frac{\sigma^2}{n\epsilon^2}. \tag{22}
\]
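To see the 1/n decay in (22), the following sketch (ours, not from the notes) estimates the deviation probability of the empirical mean of uniform variables and compares it to the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.1
for n in (10, 100, 1000):
    x = rng.uniform(0, 1, size=(20_000, n))   # E(X_i) = 1/2, Var(X_i) = 1/12
    dev = np.abs(x.mean(axis=1) - 0.5)
    empirical = np.mean(dev >= eps)
    bound = 1 / (12 * n * eps**2)             # sigma^2/(n eps^2) from (22)
    print(n, empirical, bound)                # the empirical tail decays much faster
```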

The weak law of large numbers essentially states that (1/n) Σ_{i=1}^n X_i concentrates its probability mass in a small neighborhood around its average. Note, however, that the convergence proved in (22) is rather slow, on the order of 1/n. Under some mild conditions, it is possible to considerably improve the statement of the weak law of large numbers and obtain a fast convergence, exponential in n. We state without proof two such inequalities that will be particularly useful later on.

Proposition 4.4 (Hoeffding's inequality). Consider independent random variables X_i with E[X_i] = 0 and X_i ∈ [a_i, b_i]. Let Y ≜ Σ_{i=1}^n X_i. Then,

\[
P\!\left[Y \geqslant t\right] \leqslant \exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right). \tag{23}
\]
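The exponential decay promised by (23) can be seen on centered ±1/2 variables; in the sketch below (ours, not from the notes), the empirical tail sits well under the Hoeffding bound:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 100, 10.0
x = rng.choice([-0.5, 0.5], size=(200_000, n))  # X_i in [-1/2, 1/2], E(X_i) = 0
empirical = np.mean(x.sum(axis=1) >= t)         # ~ 0.02 by the Gaussian approximation
bound = np.exp(-2 * t**2 / (n * 1.0**2))        # (b_i - a_i)^2 = 1, bound = exp(-2)
assert empirical <= bound
```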


Proposition 4.5 (McDiarmid’s inequality). Consider independent real-valued random variables {Xi}n

i=1 ∈

X n and a function f : X n → R. If for all j ∈ 1, n and for all {xi}n

i=1 ∈ X n and x′ j ∈ X there

exists cj ⩾ 0 such that

  • f(x1, · · · , xj−1, xj, xj+1, · · · , xn) − f(x1, · · · , xj−1, x′

j, xj+1, · · · , xn)

  • ⩽ cj

then P(|f(X1, · · · , Xn) − E(f(X1, · · · , Xn))| ⩾ t) ⩽ 2 exp

2t2 n

i=1 c2 i

  • .

McDiarmid’s inequality is more general that Hoeffding’s inequality and allows us to bound the concentration of a function of n random variables around its average. 5 Shannon entropy and mutual information Definition 5.1 (Shannon entropy). entropy!Shannon entropy Let X ∈ X be a discrete random variable with |X| < ∞. Tie Shannon entropy of X is defined as H(X) ≜ EX(− ln pX(X)) = −

  • x∈X

pX(x) ln pX(x), (24) with the convention that 0 ln 0 = 0. Unless the context requires clarification, we refer to the Shannon entropy as the entropy for

  • short. Note that H(X) only depends on the PMF pX and not on the exact choice of X, and we

sometimes use the notation H(pX) ≜ H(X) to emphasize this. Tie base of the log determines the unit of entropy and, by convention, we always use the natural log in the textbook to measure entropy in nats. One can convert results bits by merely scaling the entropy by

1 ln 2. In the special

case of a binary random variable X ∈ {0, 1}, the entropy takes on a particularly simple form. Since pX is fully specified by the parameter p ∈ [0, 1] such that pX(1) = p, we define the binary entropy function as Hb (p) ≜ −p ln p − (1 − p) ln(1 − p). (25) When viewed as a function of a PMF, the following properties hold. Proposition 5.2. Tie entropy H : ∆(X) → R : p → H(p) is a concave function.
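A direct implementation of (24) and (25) in nats (our helpers, not from the notes):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    p = np.asarray(p, float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def h_b(p):
    """Binary entropy function (25)."""
    return entropy([p, 1.0 - p])

print(entropy([0.25] * 4))              # ln 4 ~= 1.386 nats, maximal for |X| = 4
print(h_b(0.5), h_b(0.5) / np.log(2))   # ln 2 nats, i.e., exactly 1 bit
```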

When viewed as a function of a PMF, the following properties hold.

Proposition 5.2. The entropy H : ∆(𝒳) → ℝ : p ↦ H(p) is a concave function.

Proof. TBD

Proposition 5.3 (Csiszár's inequality). For p, q ∈ ∆(𝒳), we have

\[
|H(p) - H(q)| \leqslant V(p, q)\ln\frac{|\mathcal{X}|}{V(p, q)}.
\]

Proof. TBD

The entropy of a random variable X ∈ 𝒳 can be intuitively understood as a means to measure the uncertainty in the outcome of an experiment modeled by the sampling of the random variable. This intuition is justified by several properties that we develop next.


Proposition 5.4 (Positivity of entropy and condition for equality). Let X ∈ 𝒳 be a discrete random variable. Then H(X) ⩾ 0 with equality if and only if X is a constant, i.e., there exists x* ∈ 𝒳 such that P(X = x*) = 1.

Proof. Since X is a discrete random variable, we have ∀x ∈ 𝒳, pX(x) ∈ [0, 1]. Hence, −ln pX(x) ⩾ 0 and H(X) ⩾ 0 as a convex combination of non-negative terms. Assume now that H(X) = 0. Since ∀x ∈ 𝒳, −pX(x) ln pX(x) ⩾ 0, it must be that ∀x ∈ 𝒳, pX(x) ln pX(x) = 0. Necessarily, ∀x ∈ 𝒳, pX(x) ∈ {0, 1}. Since Σ_{x∈𝒳} pX(x) = 1, there exists x* ∈ 𝒳 such that pX(x*) = 1.

Theorem 5.5 (Maximum entropy). Let X ∈ 𝒳 be a discrete random variable. Then H(X) ⩽ ln |𝒳| with equality if and only if X is uniform over 𝒳, i.e., ∀x ∈ 𝒳, pX(x) = 1/|𝒳|.

Proof. Using Jensen's inequality and the concavity of ln, we obtain

\[
H(X) = E_X\!\left(\ln\frac{1}{p_X(X)}\right) \leqslant \ln E_X\!\left(\frac{1}{p_X(X)}\right) = \ln|\mathcal{X}|.
\]

Equality is achieved when pX(X) is constant, i.e., pX(x) = c for all x ∈ 𝒳 and some c ⩾ 0. Since Σ_{x∈𝒳} pX(x) = 1, we must have c = 1/|𝒳|, and X must be uniform over 𝒳.

The entropy can be extended to jointly distributed random variables.

Definition 5.6 (Joint and conditional entropy). The joint entropy of two discrete random variables X ∈ 𝒳 and Y ∈ 𝒴 with joint PMF pXY is

\[
H(X, Y) \triangleq E_{XY}(-\ln p_{XY}(X, Y)) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_{XY}(x, y)\ln p_{XY}(x, y).
\]

Furthermore, the conditional entropy of Y given X is

\[
H(Y|X) \triangleq E_{XY}(-\ln p_{Y|X}(Y|X)) = -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} p_{XY}(x, y)\ln p_{Y|X}(y|x).
\]

Note that H(X, Y) = H(Y, X) by definition but, in general, H(Y|X) ≠ H(X|Y). Upon defining the entropy of the random variable Y|X = x for x ∈ 𝒳 as

\[
H(Y|X = x) = -\sum_{y\in\mathcal{Y}} p_{Y|X}(y|x)\ln p_{Y|X}(y|x), \tag{26}
\]

note that H(Y|X) = Σ_{x∈𝒳} pX(x) H(Y|X = x). This allows us to intuitively interpret the conditional entropy as the average uncertainty about Y after observing X, and we obtain the following properties.

Proposition 5.7 (Positivity of conditional entropy and condition for equality). Let X, Y be discrete random variables with joint PMF pXY. Then H(Y|X) ⩾ 0 with equality if and only if Y is a function of X.

Proof. By definition, H(Y|X = x) ⩾ 0 for all x ∈ 𝒳. Therefore, H(Y|X) ⩾ 0 as a convex combination of non-negative terms, and H(Y|X) = 0 if and only if ∀x ∈ 𝒳, H(Y|X = x) = 0. From Proposition 5.4, this happens if and only if for all x ∈ 𝒳 there exists y_x ∈ 𝒴 such that pY|X(y_x|x) = 1, i.e., Y is a function of X.


Proposition 5.8 (Chain rule of entropy). Let X, Y, and Z be discrete random variables with joint PMF pXYZ. Then

H(XY|Z) = H(X|Z) + H(Y|XZ) = H(Y|Z) + H(X|YZ).

More generally, if X ≜ {X_i}_{i=1}^n and Z are jointly distributed random variables, we have

\[
H(\mathbf{X}|Z) = \sum_{i=1}^n H(X_i|X_{1:i-1}Z),
\]

with the convention X_{1:0} ≜ ∅.

Proof. The first result follows from Bayes' rule and basic manipulations of the definition of joint entropy. The generalization follows by induction.
For two jointly distributed random variables X and Y, the quantity H(X) − H(X|Y) intuitively represents the uncertainty of X minus the uncertainty of X after observing Y or, in other words, the reduction of uncertainty about X resulting from observing Y. This reduction of uncertainty is what Shannon defined as the mutual information between X and Y.

Definition 5.9 (Mutual information). Let X, Y be two random variables with joint PMF pXY. The mutual information between X and Y is

I(X; Y) ≜ H(X) − H(X|Y).

Using Bayes' rule, one can directly check that the mutual information is expressed in terms of relative entropy as

I(X; Y) = D(pXY‖pX pY).

By Proposition 3.5, this alternative formulation proves that I(X; Y) ⩾ 0 and that equality happens if and only if pXY = pX pY, i.e., X is independent of Y. Additionally, the symmetry of the expression in X and Y also shows that I(X; Y) = H(Y) − H(Y|X). Combining these two observations, we obtain the following important result.

Corollary 5.10 (Monotonicity of entropy). Let X and Y be discrete random variables with joint PMF pXY. Then H(X|Y) ⩽ H(X) with equality if and only if X is independent of Y.

Proof. H(X) − H(X|Y) = I(X; Y) = D(pXY‖pX pY) and the result follows by Proposition 3.5.
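The identity I(X; Y) = H(X) + H(Y) − H(X, Y) = D(pXY‖pX pY) is easy to verify numerically; the sketch below (with an example joint PMF of our choosing, not from the notes) does so:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

pXY = np.array([[0.3, 0.1],
                [0.1, 0.5]])                             # a joint PMF on {0,1} x {0,1}
pX, pY = pXY.sum(axis=1), pXY.sum(axis=0)
mi = entropy(pX) + entropy(pY) - entropy(pXY.ravel())    # H(X) - H(X|Y)
mi_kl = np.sum(pXY * np.log(pXY / np.outer(pX, pY)))     # D(p_XY || p_X p_Y)
assert np.isclose(mi, mi_kl) and mi >= 0
```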

The monotonicity of entropy w.r.t. conditioning is colloquially known as the property that "conditioning reduces entropy." The mutual information is a function of the joint distribution pXY, which itself consists of a PMF pX and a kernel pY|X. We sometimes make this explicit using the notation I(pX, pY|X) ≜ I(X; Y).

Proposition 5.11. If ∆(𝒳 → 𝒴) denotes the set of kernels from 𝒳 to 𝒴, then the function I : ∆(𝒳) × ∆(𝒳 → 𝒴) → ℝ : (p, W) ↦ I(p, W) is a concave function of p and a convex function of W.

Proof. TBD

Definition 5.12 (Conditional mutual information). Let X, Y, Z be discrete random variables with joint distribution pXYZ. The conditional mutual information between X and Y given Z is

I(X; Y|Z) ≜ H(X|Z) − H(X|YZ).


Using again Baye’s rule, we can express the conditional mutual information in terms of relative entropy as I(X; Y |Z) = EZ

  • D
  • pXY |ZpX|ZpY |Z
  • .

(27) Tie symmetry of (27) directly shows that I(X; Y |Z) = I(Y ; X|Z). In addition, (27) together with Proposition 3.5 shows that I(X; Y |Z) ⩾ 0 if and only if X and Y a re conditionally independent given Z. Proposition 5.13 (Chain rule of mutual information). Let X ≜ {Xi}n

i=1, Y , and Z be jointly

distributed random variables. Tien, I(X; Y |Z) =

n

  • i=1

I

  • Xi; Y |ZX1:i−1 with the convention X1:0 = ∅.
  • Proof. Tie result follows from the chain rule of entropy by writing I(X; Y |Z) = H(X|Z)−H(X|Y Z).

Proposition 5.14 (Data-processing inequality). data processing inequality Let X, Y ,and Z be discrete random variables such that X → Y → Z. Tien I(X; Y ) ⩾ I(X; Z) or, equivalently, H(X|Z) ⩾ H(X|Y ).

  • Proof. Tie result follows by using the chain rule to write I(X; Y Z) = I(X; Y ) + I(X; Z|Y ) =

I(X; Z) + I(X; Y |Z). Note that I(X; Z|Y ) = 0 since X − Y − Z and I(X; Y |X) ⩾ 0, the result follows.

We conclude this section with the celebrated Fano's inequality, which relates a probability of error to a conditional entropy.

Theorem 5.15 (Fano's inequality). Let X be a discrete random variable with alphabet 𝒳. Let X̂ ∈ 𝒳 be an estimate of X, with joint distribution p_{XX̂}. We define the probability of estimation error P_e ≜ P[X ≠ X̂]. Then,

\[
H(X|\hat{X}) \leqslant H_b(P_e) + P_e \ln(|\mathcal{X}| - 1).
\]

Proof. Let us introduce the binary random variable E ≜ 1{X ≠ X̂}, i.e., E is a function of X and X̂ that indicates whether an error occurs. Note that P[E = 1] = P[X ≠ X̂] = P_e. Then,

\[
\begin{aligned}
H(X|\hat{X}) &= H(XE|\hat{X}) - H(E|X\hat{X}) && \text{(chain rule)}\\
&= H(XE|\hat{X}) && \text{($H(E|X\hat{X}) = 0$ since $E$ is a function of $X$ and $\hat{X}$)}\\
&= H(E|\hat{X}) + H(X|E\hat{X}) \\
&\leqslant H(E) + H(X|E\hat{X}) && \text{(conditioning reduces entropy)}\\
&= H_b(P_e) + H(X|E\hat{X}).
\end{aligned}
\]

Note that H(X|EX̂) = H(X|X̂, E = 0) P(E = 0) + H(X|X̂, E = 1) P(E = 1), with H(X|X̂, E = 0) = 0, P(E = 1) = P_e, and H(X|X̂, E = 1) ⩽ ln(|𝒳| − 1), which yields the result.

Despite it’s apparent simplicity, Fano’s inequality plays a crucial role in information theory because it allows one to relate an operational quantity, the probability of error, to an information- theoretic measure, a conditional entropy. 11