 
Statistical Natural Language Processing
A refresher on probability theory

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
Summer Semester 2017

Outline: Probability theory, Some probability distributions, Summary

Why probability theory?

"But it must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." (Chomsky, 1968)

Short answer: practice proved otherwise.
Slightly long answer:
• Many linguistic phenomena are better explained as tendencies, rather than fixed rules
• Probability theory captures many characteristics of (human) cognition, and language is not an exception

What is probability?

• Probability is a measure of (un)certainty
• We quantify the probability of an event with a number between 0 and 1
  – 0: the event is impossible
  – 0.5: the event is as likely to happen as it is not
  – 1: the event is certain
• The set of all possible outcomes of a trial is called the sample space ($\Omega$)
• An event ($E$) is a set of outcomes

Axioms of probability state that
1. $P(E) \in \mathbb{R}$, $P(E) \geq 0$
2. $P(\Omega) = 1$
3. For disjoint events $E_1$ and $E_2$, $P(E_1 \cup E_2) = P(E_1) + P(E_2)$

What you should already know

[Figure: exercise asking for the probabilities of three pictured outcomes, of a set of two outcomes, and of a set of three outcomes]

Where do probabilities come from

Axioms of probability do not specify how to assign probabilities to events. Two major (rival) ways of assigning probabilities to events are
• Frequentist (objective) probabilities: the probability of an event is its relative frequency (in the limit)
• Bayesian (subjective) probabilities: probabilities are degrees of belief

Random variables

• A random variable is a variable whose value is subject to uncertainties
• A random variable is always a number
• Think of a random variable as a mapping from the outcomes of a trial to (a vector of) real numbers (a real-valued function on the sample space)
• Example outcomes of uncertain experiments
  – height or weight of a person
  – length of a word randomly chosen from a corpus
  – whether an email is spam or not
  – the first word of a book, or the first word uttered by a baby
Note: not all of these are numbers

Random variables: mapping outcomes to real numbers

• Continuous
  – frequency of a sound signal: 100.5, 220.3, 4321.3 …
• Discrete
  – Number of words in a sentence: 2, 5, 10, …
  – Whether a review is negative or positive:
      Outcome   Negative   Positive
      Value     0          1
  – The POS tag of a word:
      Outcome   Noun    Verb    Adj     Adv     …
      Value     1       2       3       4       …
      …or       10000   01000   00100   00010   …

Probability mass function

Example: probabilities for sentence length in words
• The probability mass function (PMF) of a discrete random variable ($X$) maps every possible value ($x$) to its probability ($P(X = x)$).

x          1      2      3      4      5      6      7      8      9      10     11
P(X = x)   0.155  0.185  0.210  0.194  0.102  0.066  0.039  0.023  0.012  0.005  0.004

[Figure: bar chart of probability against sentence length, for lengths 1 to 11]
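
The axioms and the PMF example can be tried out in a few lines of Python. The sketch below is only an illustration: it stores the sentence-length PMF as a dictionary and checks the three axioms on it. The names `pmf` and `prob` are hypothetical, and the renormalization step is an assumption of this sketch (the printed probabilities are rounded and sum to slightly less than 1).

```python
import math

# Sentence-length probabilities P(X = x) for x = 1..11, as printed in the table.
raw = [0.155, 0.185, 0.210, 0.194, 0.102, 0.066,
       0.039, 0.023, 0.012, 0.005, 0.004]

# Assumption of this sketch: the printed values are rounded and do not sum
# exactly to 1, so we renormalize them to obtain a proper PMF.
total = sum(raw)
pmf = {x: p / total for x, p in enumerate(raw, start=1)}


def prob(event, pmf):
    """Probability of an event, given as a set of outcomes.

    The outcomes are disjoint, so by axiom 3 their probabilities add up.
    """
    return sum(pmf[x] for x in event)


# Axiom 1: probabilities are non-negative real numbers.
assert all(p >= 0 for p in pmf.values())

# Axiom 2: the whole sample space has probability 1.
omega = set(pmf)
assert math.isclose(prob(omega, pmf), 1.0)

# Axiom 3: for disjoint events, P(E1 or E2) = P(E1) + P(E2).
short = {1, 2, 3}
long_ = {9, 10, 11}
assert short.isdisjoint(long_)
assert math.isclose(prob(short | long_, pmf),
                    prob(short, pmf) + prob(long_, pmf))

print(f"P(length <= 3) = {prob(short, pmf):.3f}")
```

Representing an event as a Python set mirrors the definition of an event as a set of outcomes, so set union corresponds directly to the union in axiom 3.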
Probability density function (PDF)

• Continuous variables have probability density functions
• $p(x)$ is not a probability (note the notation: we use lowercase $p$ for PDF)
• The area under $p(x)$ is 1
• $P(X = x) = 0$
• Non-zero probabilities are possible for ranges:
  $P(a \leq x \leq b) = \int_a^b p(x)\,dx$

[Figure: a density curve $p(x)$ plotted for $x$ between 0 and 2]

Cumulative distribution function

• $F_X(x) = P(X \leq x)$

Length   Prob.   C. Prob.
1        0.16    0.16
2        0.18    0.34
3        0.21    0.55
4        0.19    0.74
5        0.10    0.85
6        0.07    0.91
7        0.04    0.95
8        0.02    0.97
9        0.01    0.99
10       0.01    0.99
11       0.00    1.00

[Figure: cumulative probability against sentence length, for lengths 1 to 11]

Expected value

• Expected value (mean) of a random variable $X$ is
  $E[X] = \mu = \sum_{i=1}^{n} P(x_i)\,x_i = P(x_1)x_1 + P(x_2)x_2 + \ldots + P(x_n)x_n$
• More generally, the expected value of a function of $X$ is
  $E[f(X)] = \sum_{x} P(x) f(x)$
• Expected value is an important measure of central tendency
• Note: it is not the 'most likely' value
• Expected value is linear:
  $E[aX + bY] = aE[X] + bE[Y]$

Variance and standard deviation

• Variance of a random variable $X$ is
  $\mathrm{Var}(X) = \sigma^2 = \sum_{i=1}^{n} P(x_i)(x_i - \mu)^2 = E[X^2] - (E[X])^2$
• It is a measure of spread, divergence from the central tendency
• The square root of variance is called standard deviation
  $\sigma = \sqrt{\left(\sum_{i=1}^{n} P(x_i)\,x_i^2\right) - \mu^2}$
• Standard deviation is in the same units as the values of the random variable
• Variance is not linear: in general $\sigma^2_{X+Y} \neq \sigma^2_X + \sigma^2_Y$ (and neither is $\sigma$)

Example: two distributions with different variances

[Figure: two distributions plotted for values between −6 and 6, one with $\sigma = 0.7$ and one with $\sigma = 1.3$]

Short divergence: Chebyshev's inequality

For any probability distribution, and $k > 1$,
  $P(|x - \mu| > k\sigma) \leq \frac{1}{k^2}$

Distance from µ             2σ     3σ     5σ     10σ    100σ
Probability (upper bound)   0.25   0.11   0.04   0.01   0.0001

This also shows why standardizing values of random variables,
  $z = \frac{x - \mu}{\sigma}$
makes sense (the normalized quantity is often called the z-score).

Median and mode of a random variable

Median is the mid-point of a distribution. The median of a random variable is defined as the number $m$ that satisfies
  $P(X \leq m) \geq \frac{1}{2}$ and $P(X \geq m) \geq \frac{1}{2}$
• Median of 1, 4, 5, 8, 10 is 5
• Median of 1, 4, 5, 7, 8, 10 is 6

Mode is the value that occurs most often in the data.
• Modes appear as peaks in probability mass (or density) functions
• Mode of 1, 4, 4, 8, 10 is 4
• Modes of 1, 4, 4, 8, 9, 9 are 4 and 9

Mode, median, mean, standard deviation

Visualization on sentence length example

[Figure: probability against sentence length for lengths 1 to 11, annotated with mode = median = 3.0, $\mu = 3.56$ and $\sigma = 2.09$]
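
The summary statistics defined above can be computed directly from a PMF. The following sketch does so for the sentence-length example; the helper names (`expectation`, `tail_mass`) are hypothetical, and the renormalization of the rounded table values is an assumption of this sketch. Because the table is rounded and stops at length 11, the computed mean and standard deviation come out close to, but need not exactly match, the $\mu = 3.56$ and $\sigma = 2.09$ reported for the example.

```python
import math
from itertools import accumulate

# Sentence-length probabilities P(X = x) for x = 1..11, as printed in the PMF table.
raw = [0.155, 0.185, 0.210, 0.194, 0.102, 0.066,
       0.039, 0.023, 0.012, 0.005, 0.004]
xs = list(range(1, len(raw) + 1))

# Assumption of this sketch: renormalize the rounded values so they sum to 1.
total = sum(raw)
ps = [p / total for p in raw]


def expectation(f, xs, ps):
    """E[f(X)] = sum over x of P(x) * f(x)."""
    return sum(p * f(x) for x, p in zip(xs, ps))


# Expected value and the two equivalent forms of the variance.
mu = expectation(lambda x: x, xs, ps)
var = expectation(lambda x: (x - mu) ** 2, xs, ps)
var_alt = expectation(lambda x: x * x, xs, ps) - mu ** 2
sigma = math.sqrt(var)
assert math.isclose(var, var_alt)

# Cumulative distribution function: F(x) = P(X <= x), as running sums of the PMF.
cdf = list(accumulate(ps))

# Median: the smallest value m with F(m) >= 1/2; for a discrete variable this
# value also satisfies P(X >= m) >= 1/2.
median = next(x for x, c in zip(xs, cdf) if c >= 0.5)

# Mode: the value with the highest probability (the peak of the PMF).
mode = xs[ps.index(max(ps))]


def tail_mass(k):
    """P(|X - mu| > k * sigma) under this PMF."""
    return sum(p for x, p in zip(xs, ps) if abs(x - mu) > k * sigma)


# Chebyshev's inequality must hold for every k > 1.
for k in (2, 3, 5):
    assert tail_mass(k) <= 1 / k ** 2

# z-scores: distance of each value from the mean, in units of sigma.
z_scores = [(x - mu) / sigma for x in xs]

print(f"mean = {mu:.2f}, sd = {sigma:.2f}, median = {median}, mode = {mode}")
```

For this distribution the observed tail mass is far below the Chebyshev bound, which is expected: the bound holds for any distribution, so it is usually loose for a particular one.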