
Course: Data mining
Lecture: Basic concepts on discrete probability
Aristides Gionis, Department of Computer Science, Aalto University
visiting at Sapienza University of Rome, fall 2016

reading assignment
• your favorite book on probability, computing, and randomized algorithms, e.g.,
  • Randomized algorithms, Motwani and Raghavan (chapters 3 and 4), or
  • Probability and computing, Mitzenmacher and Upfal (chapters 2, 3 and 4)

events and probability
• consider a random process (e.g., throw a die, pick a card from a deck)
• each possible outcome is a simple event (or sample point)
• the sample space is the set of all possible simple events
• an event is a set of simple events (a subset of the sample space)
• with each simple event $E$ we associate a real number $0 \le \Pr[E] \le 1$, which is the probability of $E$

probability spaces and probability functions
• sample space $\Omega$: the set of all possible outcomes of the random process
• family of sets $\mathcal{F}$ representing the allowable events: each set in $\mathcal{F}$ is a subset of the sample space $\Omega$
• a probability function $\Pr : \mathcal{F} \to \mathbb{R}$ satisfies the following conditions
  1. for any event $E$, $0 \le \Pr[E] \le 1$
  2. $\Pr[\Omega] = 1$
  3. for any finite (or countably infinite) sequence of pairwise mutually disjoint events $E_1, E_2, \ldots$
     $$\Pr\left[\bigcup_{i \ge 1} E_i\right] = \sum_{i \ge 1} \Pr[E_i]$$

the union bound
• for any events $E_1, E_2, \ldots, E_n$
  $$\Pr\left[\bigcup_{i=1}^{n} E_i\right] \le \sum_{i=1}^{n} \Pr[E_i]$$
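A minimal sketch that checks the union bound on a small example. The sample space (one roll of a fair die) and the three overlapping events are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
omega = set(range(1, 7))

def pr(event):
    """Probability of an event (a subset of omega) under the uniform distribution."""
    return Fraction(len(event & omega), len(omega))

# Three overlapping events, chosen only for illustration.
events = [{2, 4, 6}, {4, 5, 6}, {1, 6}]

union = set().union(*events)
lhs = pr(union)                   # Pr[E1 ∪ E2 ∪ E3]
rhs = sum(pr(e) for e in events)  # Pr[E1] + Pr[E2] + Pr[E3]
print(f"Pr[union] = {lhs}, sum of probabilities = {rhs}")
```

Because the events overlap, the sum counts some outcomes more than once, so the right-hand side can exceed 1 while the bound still holds.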

conditional probability
• the conditional probability that event $E$ occurs given that event $F$ occurs is
  $$\Pr[E \mid F] = \frac{\Pr[E \cap F]}{\Pr[F]}$$
• well-defined only if $\Pr[F] > 0$
• we restrict the sample space to the set $F$
• thus we are interested in $\Pr[E \cap F]$ "normalized" by $\Pr[F]$
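A small sketch of the definition, again assuming a fair die as the sample space. The helper names `pr` and `pr_given` and the two events are hypothetical, introduced only for this example.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die (illustrative assumption)

def pr(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def pr_given(e, f):
    """Pr[E | F] = Pr[E ∩ F] / Pr[F]; undefined when Pr[F] = 0."""
    if pr(f) == 0:
        raise ValueError("conditioning event has probability zero")
    return pr(e & f) / pr(f)

E = {2, 4, 6}  # the roll is even
F = {4, 5, 6}  # the roll is at least 4
p = pr_given(E, F)
print(f"Pr[E | F] = {p}")
```

Restricting attention to F leaves three equally likely outcomes, two of which are even, which matches the computed value.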

independent events
• two events $E$ and $F$ are independent if and only if
  $$\Pr[E \cap F] = \Pr[E]\,\Pr[F]$$
• equivalently (when $\Pr[F] > 0$), if and only if
  $$\Pr[E \mid F] = \Pr[E]$$
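The product condition can be checked exhaustively on a finite sample space. The sketch below assumes two fair dice; the event names are hypothetical and chosen to show one independent pair and one dependent pair.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (illustrative assumption).
omega = set(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def independent(e, f):
    """True iff Pr[E ∩ F] = Pr[E] * Pr[F]."""
    return pr(e & f) == pr(e) * pr(f)

first_even = {(a, b) for (a, b) in omega if a % 2 == 0}
second_six = {(a, b) for (a, b) in omega if b == 6}
sum_at_least_10 = {(a, b) for (a, b) in omega if a + b >= 10}

print(independent(first_even, second_six))       # events about different dice
print(independent(second_six, sum_at_least_10))  # a six makes a large sum likelier
```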

conditional probability
$$\Pr[E_1 \cap E_2] = \Pr[E_1]\,\Pr[E_2 \mid E_1]$$
generalization for $k$ events $E_1, E_2, \ldots, E_k$
$$\Pr\left[\bigcap_{i=1}^{k} E_i\right] = \Pr[E_1]\,\Pr[E_2 \mid E_1]\,\Pr[E_3 \mid E_1 \cap E_2] \cdots \Pr\left[E_k \mid \bigcap_{i=1}^{k-1} E_i\right]$$
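As an illustration of the product rule (the card example is an assumption, not from the slides), let E_i be the event that the i-th card drawn without replacement from a standard 52-card deck is an ace. The sketch multiplies the conditional probabilities and compares the result with a direct count.

```python
from fractions import Fraction
from math import comb

def pr_all_aces(k, aces=4, deck=52):
    """Pr[first k cards are all aces] as a product of conditional probabilities."""
    p = Fraction(1)
    for i in range(k):
        # Given that the first i cards were aces, aces - i aces remain among deck - i cards.
        p *= Fraction(aces - i, deck - i)
    return p

chain = pr_all_aces(3)
direct = Fraction(comb(4, 3), comb(52, 3))  # choose 3 of the 4 aces among all 3-card hands
print(chain, direct)
```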

birthday paradox
$E_i$: the $i$-th person has a different birthday than all of persons $1, \ldots, i-1$ (consider an $n$-day year)
$$\Pr\left[\bigcap_{i=1}^{k} E_i\right] = \Pr[E_1]\,\Pr[E_2 \mid E_1] \cdots \Pr\left[E_k \mid \bigcap_{i=1}^{k-1} E_i\right] \le \prod_{i=1}^{k}\left(1 - \frac{i-1}{n}\right) \le \prod_{i=1}^{k} e^{-(i-1)/n} = e^{-k(k-1)/(2n)}$$
for $k$ equal to about $\sqrt{2n} + 1$ the probability is at most $1/e$
as $k$ increases the probability drops rapidly
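A short numerical sketch of the birthday computation, assuming n = 365 and uniformly random birthdays. The function names are hypothetical. It evaluates the product of the conditional probabilities and the exponential upper bound side by side.

```python
import math

def pr_all_distinct(k, n=365):
    """Product over i = 1..k of (1 - (i - 1) / n): k people, all birthdays distinct."""
    p = 1.0
    for i in range(1, k + 1):
        p *= 1 - (i - 1) / n
    return p

def exp_bound(k, n=365):
    """Upper bound exp(-k (k - 1) / (2 n)) obtained from 1 - x <= exp(-x)."""
    return math.exp(-k * (k - 1) / (2 * n))

n = 365
k = math.ceil(math.sqrt(2 * n) + 1)  # about sqrt(2n) + 1, rounded up
for people in (10, 23, k, 50):
    print(people, round(pr_all_distinct(people, n), 4), round(exp_bound(people, n), 4))
```

With 23 people the probability that all birthdays differ is already below one half, and at k around sqrt(2n) + 1 the bound is below 1/e.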

random variable
• a random variable $X$ on a sample space $\Omega$ is a function $X : \Omega \to \mathbb{R}$
• a discrete random variable takes only a finite (or countably infinite) number of values

random variable: example
• from the birthday paradox setting:
• $E_i$: the $i$-th person has a different birthday than all of persons $1, \ldots, i-1$
• define the random variable
  $$X_i = \begin{cases} 1 & \text{if the } i\text{-th person has a different birthday than all of persons } 1, \ldots, i-1 \\ 0 & \text{otherwise} \end{cases}$$

expectation and variance of a random variable
• the expectation of a discrete random variable $X$, denoted by $E[X]$, is given by
  $$E[X] = \sum_{x} x \Pr[X = x],$$
  where the summation is over all values in the range of $X$
• variance
  $$\mathrm{Var}[X] = \sigma_X^2 = E[(X - E[X])^2] = E[(X - \mu_X)^2]$$
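A minimal sketch of both definitions for a random variable given by its probability mass function. The dictionary representation and the fair-die example are assumptions made for illustration.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum over x of x * Pr[X = x]; pmf maps each value x to Pr[X = x]."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
print(expectation(die), variance(die))
```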

linearity of expectation
• for any two random variables $X$ and $Y$
  $$E[X + Y] = E[X] + E[Y]$$
• for a constant $c$ and a random variable $X$
  $$E[cX] = c\,E[X]$$
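Linearity does not require independence. The sketch below assumes a fair die and takes Y to be the square of X, so that Y is completely determined by X, and still finds E[X + Y] = E[X] + E[Y]. All names are hypothetical.

```python
from fractions import Fraction

omega = range(1, 7)        # outcomes of a fair die (illustrative assumption)
weight = Fraction(1, 6)    # probability of each outcome

def expect(f):
    """E[f] for a random variable f defined on omega."""
    return sum(f(w) * weight for w in omega)

def X(w):
    return w

def Y(w):
    return w * w  # a function of X, hence strongly dependent on it

lhs = expect(lambda w: X(w) + Y(w))   # E[X + Y]
rhs = expect(X) + expect(Y)           # E[X] + E[Y]
scaled = expect(lambda w: 3 * X(w))   # E[3X]
print(lhs, rhs, scaled)
```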

coupon collector's problem
• $n$ types of coupons
• a collector picks coupons
• in each trial a coupon type is chosen uniformly at random
• how many trials are needed, in expectation, until the collector gets all the coupon types?

coupon collector's problem: analysis
• let $c_1, c_2, \ldots, c_X$ be the sequence of coupons picked
• $c_i \in \{1, \ldots, n\}$
• call $c_i$ a success if a new coupon type is picked
• ($c_1$ and $c_X$ are always successes)
• divide the sequence into epochs: the $i$-th epoch starts after the $i$-th success and ends with the $(i+1)$-th success
• define the random variable $X_i$ = length of the $i$-th epoch
• easy to see that
  $$X = \sum_{i=0}^{n-1} X_i$$

coupon collector's problem: analysis (cont'd)
probability of success in the $i$-th epoch
$$p_i = \frac{n - i}{n}$$
($X_i$ is geometrically distributed with parameter $p_i$)
$$E[X_i] = \frac{1}{p_i} = \frac{n}{n - i}$$
from linearity of expectation
$$E[X] = E\left[\sum_{i=0}^{n-1} X_i\right] = \sum_{i=0}^{n-1} E[X_i] = \sum_{i=0}^{n-1} \frac{n}{n - i} = n \sum_{i=1}^{n} \frac{1}{i} = n H_n$$
where $H_n$ is the harmonic number, asymptotically equal to $\ln n$
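A simulation sketch that compares the empirical average number of trials with n H_n. The value n = 10, the number of runs, and the fixed seed are arbitrary choices made for reproducibility, and the function names are hypothetical.

```python
import random
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def expected_trials(n):
    """E[X] = n * H_n from the epoch analysis."""
    return n * harmonic(n)

def simulate(n, rng):
    """Draw coupon types uniformly at random until all n types have been seen."""
    seen = set()
    trials = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        trials += 1
    return trials

rng = random.Random(0)  # fixed seed so the run is reproducible
n, runs = 10, 5000
average = sum(simulate(n, rng) for _ in range(runs)) / runs
print(f"simulated average: {average:.2f}, n * H_n: {float(expected_trials(n)):.2f}")
```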

deviations
• inequalities on tail probabilities
• estimate the probability that a random variable deviates from its expectation

Markov inequality
• let $X$ be a random variable taking non-negative values
• for all $t > 0$
  $$\Pr[X \ge t] \le \frac{E[X]}{t}$$
  or equivalently, for all $k > 0$,
  $$\Pr[X \ge k\,E[X]] \le \frac{1}{k}$$

Markov inequality: proof
• recall that $E[f(X)] = \sum_{x} f(x) \Pr[X = x]$
• define $f(x) = 1$ if $x \ge t$ and $0$ otherwise
• then $E[f(X)] = \Pr[X \ge t]$
• notice that $f(x) \le x/t$ for all $x \ge 0$, implying that
  $$E[f(X)] \le E\left[\frac{X}{t}\right]$$
• putting everything together
  $$\Pr[X \ge t] = E[f(X)] \le E\left[\frac{X}{t}\right] = \frac{E[X]}{t}$$
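A quick check of the Markov inequality on a concrete non-negative random variable. The fair die is an illustrative assumption; the sketch lists the exact tail probability next to the bound E[X] / t for each threshold t.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, a non-negative variable
mean = sum(x * p for x, p in pmf.items())       # E[X]

def tail(t):
    """Exact Pr[X >= t]."""
    return sum(p for x, p in pmf.items() if x >= t)

rows = [(t, tail(t), mean / t) for t in range(1, 7)]
for t, exact, bound in rows:
    print(f"t={t}: Pr[X >= t] = {exact} <= E[X]/t = {bound}")
```

The bound is loose here, and for small t it exceeds 1, which reflects that Markov uses only the expectation of X.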

Chebyshev inequality
• let $X$ be a random variable with expectation $\mu_X$ and standard deviation $\sigma_X$
• then for all $t > 0$
  $$\Pr[\,|X - \mu_X| \ge t\,\sigma_X\,] \le \frac{1}{t^2}$$

Chebyshev inequality: proof
• notice that $\Pr[\,|X - \mu_X| \ge t\,\sigma_X\,] = \Pr[(X - \mu_X)^2 \ge t^2 \sigma_X^2]$
• the random variable $Y = (X - \mu_X)^2$ is non-negative and has expectation $\sigma_X^2$
• apply the Markov inequality to $Y$ with threshold $t^2 \sigma_X^2$, which gives the bound $\sigma_X^2 / (t^2 \sigma_X^2) = 1/t^2$
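A numerical sketch comparing the exact two-sided deviation probability with the Chebyshev bound 1 / t^2. The choice of X as the number of heads in 100 fair coin flips is an assumption made for illustration, and the function names are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(n, k, p=0.5):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def deviation_prob(n, t, p=0.5):
    """Exact Pr[|X - mu| >= t * sigma] for X ~ Binomial(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return sum(binomial_pmf(n, k, p) for k in range(n + 1) if abs(k - mu) >= t * sigma)

n = 100  # mu = 50, sigma = 5
for t in (1, 2, 3):
    print(f"t={t}: exact {deviation_prob(n, t):.4f} <= Chebyshev bound {1 / t**2:.4f}")
```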

Chernoff bounds
• let $X_1, \ldots, X_n$ be independent Poisson trials
• $\Pr[X_i = 1] = p_i$ (and $\Pr[X_i = 0] = 1 - p_i$)
• define $X = \sum_i X_i$, so $\mu = E[X] = \sum_i E[X_i] = \sum_i p_i$
• for any $0 < \delta \le 1$ (these simplified forms do not hold for arbitrarily large $\delta$)
  $$\Pr[X > (1 + \delta)\mu] \le e^{-\delta^2 \mu / 3}$$
  and
  $$\Pr[X < (1 - \delta)\mu] \le e^{-\delta^2 \mu / 2}$$

Chernoff bound: proof idea
• consider the random variable $e^{tX}$ instead of $X$ (where $t$ is a parameter to be chosen later)
• apply the Markov inequality to $e^{tX}$ and work with $E[e^{tX}]$
• $E[e^{tX}]$ turns into $E[\prod_i e^{tX_i}]$, which turns into $\prod_i E[e^{tX_i}]$, due to independence
• carry out the calculations, and pick the $t$ that yields the tightest bound
optional homework: study the proof by yourself
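As a hedged sketch of the first steps for the upper tail (the standard moment-generating-function route; the threshold $a$ and the inequality $1 + y \le e^{y}$ are supplied here and do not appear on the slide), for any $t > 0$:

```latex
\begin{align*}
\Pr[X \ge a] &= \Pr\left[e^{tX} \ge e^{ta}\right] \le \frac{E[e^{tX}]}{e^{ta}}
  && \text{(Markov inequality)} \\
E[e^{tX}] &= \prod_i E[e^{tX_i}] = \prod_i \left(1 + p_i (e^{t} - 1)\right)
  && \text{(independence)} \\
&\le \prod_i e^{p_i (e^{t} - 1)} = e^{(e^{t} - 1)\mu}
  && (1 + y \le e^{y})
\end{align*}
```

Setting $a = (1 + \delta)\mu$ and minimizing $e^{(e^{t} - 1)\mu - t(1 + \delta)\mu}$ over $t$ gives $t = \ln(1 + \delta)$; the remaining calculations simplify the resulting expression into the exponential forms.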

Chernoff bound: example
• $n$ coin flips
• $X_i = 1$ if the $i$-th coin flip is H and $0$ if T
• $\mu = n/2$
• pick $\delta = 2c/\sqrt{n}$
• then
  $$e^{-\delta^2 \mu / 2} = e^{-\frac{4c^2}{n} \cdot \frac{n}{2} \cdot \frac{1}{2}} = e^{-c^2},$$
  which drops very fast with $c$
• so
  $$\Pr\left[X < \frac{n}{2} - c\sqrt{n}\right] = \Pr[X < (1 - \delta)\mu] \le e^{-\delta^2 \mu / 2} = e^{-c^2}$$
• and similarly with $e^{-\delta^2 \mu / 3} = e^{-2c^2/3}$
• so, the probability that the number of H's falls outside the range $\left[\frac{n}{2} - c\sqrt{n},\ \frac{n}{2} + c\sqrt{n}\right]$ is very small
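A sketch that compares the exact lower-tail probability for fair coin flips with the bound exp(-c^2) from the example. The value n = 1000 and the values of c are illustrative choices, and the function name is hypothetical.

```python
from math import comb, exp, sqrt

def lower_tail(n, threshold):
    """Exact Pr[X < threshold] for X = number of heads in n fair coin flips."""
    favorable = sum(comb(n, k) for k in range(n + 1) if k < threshold)
    return favorable / 2**n

n = 1000  # illustrative choice, not from the slides
for c in (1, 2, 3):
    exact = lower_tail(n, n / 2 - c * sqrt(n))
    bound = exp(-(c**2))
    print(f"c={c}: exact tail {exact:.3e} <= Chernoff bound {bound:.3e}")
```

The exact tail is considerably smaller than the bound in each case; the bound trades tightness for a simple closed form that holds for every n.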
