Artificial Intelligence: Methods and Applications
Lecture 6: Probability theory
Henrik Björklund, Umeå University
4 December 2012

What is probability theory?

Probability theory deals with mathematical models of random phenomena. We often use models of randomness to model uncertainty. Uncertainty can have different causes:
◮ Laziness: it is too difficult or computationally expensive to arrive at a certain answer
◮ Theoretical ignorance: we don’t know all the rules that influence the processes we are studying
◮ Practical ignorance: we know the rules in principle, but we don’t have all the data needed to apply them

Random experiments

Mathematical models of randomness are based on the concept of random experiments. Such experiments should have two important properties:
1. The experiment must be repeatable
2. Future outcomes cannot be exactly predicted from previous outcomes, even if we can control all aspects of the experiment

Examples:
◮ Coin tossing
◮ Quality control
◮ Genetics

Deterministic vs. random models

Deterministic models often give a macroscopic view of random phenomena. They describe an average behavior but ignore local random variations.

Examples:
◮ Water molecules in a river
◮ Gas molecules in a heated container

Lesson to be learned: Model on the right level of detail!

Key observation

Consider a random experiment for which outcome $A$ sometimes occurs and sometimes doesn’t occur.
◮ Repeat the experiment a large number of times and note, for each repetition, whether $A$ occurs or not
◮ Let $f_n(A)$ be the number of times $A$ occurred in the first $n$ experiments
◮ Let $r_n(A) = f_n(A) / n$ be the relative frequency of $A$ in the first $n$ experiments

Key observation: As $n \to \infty$, the relative frequency $r_n(A)$ converges to a real number.

Kolmogorov

The consequences of the key observation were axiomatized by the Russian mathematician Andrey Kolmogorov (1903-1987) in his book Grundbegriffe der Wahrscheinlichkeitsrechnung (1933).
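The key observation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it uses Python's pseudo-random generator in place of a real random experiment, and the function name `relative_frequencies`, the occurrence probability 0.5, and the seed are hypothetical choices, not part of the lecture.

```python
import random


def relative_frequencies(p_event, n_max, seed=0):
    """Return [r_1(A), ..., r_n_max(A)] for a simulated experiment.

    Assumption: A occurs in each repetition with probability p_event,
    independently of the other repetitions.
    """
    rng = random.Random(seed)
    f_n = 0  # f_n(A): number of times A has occurred so far
    result = []
    for n in range(1, n_max + 1):
        if rng.random() < p_event:
            f_n += 1
        result.append(f_n / n)  # r_n(A) = f_n(A) / n
    return result


freqs = relative_frequencies(0.5, 100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, freqs[n - 1])
```

For small $n$ the printed relative frequencies fluctuate noticeably; for large $n$ they settle close to the value used to generate the data.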

Intuitions about probability

i. Since $0 \le f_n(A) \le n$ we have $0 \le r_n(A) \le 1$. Thus the probability of $A$ should be in $[0, 1]$.
ii. $f_n(\emptyset) = 0$ and $f_n(\text{Everything}) = n$. Thus the probability of $\emptyset$ should be 0 and the probability of Everything should be 1.
iii. Let $B$ be Everything except $A$. Then $f_n(A) + f_n(B) = n$ and $r_n(A) + r_n(B) = 1$. Thus the probability of $A$ plus the probability of $B$ should be 1.
iv. Let $A \subseteq B$. Then $r_n(A) \le r_n(B)$ and thus the probability of $A$ should be no bigger than that of $B$.
v. Let $A \cap B = \emptyset$ and $C = A \cup B$. Then $r_n(C) = r_n(A) + r_n(B)$. Thus the probability of $C$ should be the probability of $A$ plus the probability of $B$.
vi. Let $C = A \cup B$. Then $f_n(C) \le f_n(A) + f_n(B)$ and $r_n(C) \le r_n(A) + r_n(B)$. Thus the probability of $C$ should be at most the sum of the probabilities of $A$ and $B$.
vii. Let $C = A \cup B$ and $D = A \cap B$. Then $f_n(C) = f_n(A) + f_n(B) - f_n(D)$ and thus the probability of $C$ should be the probability of $A$ plus the probability of $B$ minus the probability of $D$.

The probability space

Definition: A probability space is a tuple $(\Omega, \mathcal{F}, P)$ where
◮ $\Omega$ is the sample space, or set of all elementary events
◮ $\mathcal{F}$ is the set of events (for our purposes, we can consider $\mathcal{F} = \mathcal{P}(\Omega)$, the power set of $\Omega$)
◮ $P : \mathcal{F} \to \mathbb{R}$ is the probability function

Note: We often use logical formulas to describe events: $\text{Sunny} \land \lnot \text{Freezing}$
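For a finite sample space, the definition of a probability space can be written down directly. The following is a minimal sketch, not a general implementation: the weather outcomes and their probabilities are invented for illustration, and the helper names `power_set` and `make_P` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """All subsets of omega as frozensets, i.e. the event set F = P(Omega)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def make_P(weights):
    """Build the probability function P from elementary-event probabilities."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P


# Assumed (made-up) probabilities of the four elementary events.
weights = {
    ("sunny", "freezing"): Fraction(1, 10),
    ("sunny", "mild"): Fraction(4, 10),
    ("cloudy", "freezing"): Fraction(2, 10),
    ("cloudy", "mild"): Fraction(3, 10),
}
omega = frozenset(weights)
F = power_set(omega)
P = make_P(weights)

# The event described by the formula "Sunny and not Freezing".
sunny_not_freezing = frozenset(
    w for w in omega if w[0] == "sunny" and w[1] != "freezing"
)
print(len(F), P(sunny_not_freezing), P(omega))  # 16 2/5 1
```

Using `Fraction` keeps all probabilities exact, so equalities such as $P(\Omega) = 1$ can be tested without rounding error.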

Kolmogorov’s axioms

Kolmogorov formulated three axioms that the probability function $P$ must satisfy. The rest of probability theory can be built from these axioms.
A1. For any $A \in \mathcal{F}$, there is a nonnegative real number $P(A)$
A2. $P(\Omega) = 1$
A3. Let $\{A_n \mid n \ge 1\}$ be a collection of pairwise disjoint events and let $A = \bigcup_{n=1}^{\infty} A_n$ be their union. Then $P(A) = \sum_{n=1}^{\infty} P(A_n)$

Intuition i: $\forall A : P(A) \in [0, 1]$
$P(A) \ge 0$ (A1)
Let $B = \Omega \setminus A$
$P(B) \ge 0$ (A1)
$P(A \cup B) = P(A) + P(B)$ (A3, since $A$ and $B$ are disjoint)
$P(A \cup B) = P(\Omega) = 1$ (A2)
$P(A) = 1 - P(B) \le 1$

Intuition ii: $P(\emptyset) = 0$ and $P(\Omega) = 1$
$P(\Omega) = 1$ (A2)
$P(\emptyset \cup \Omega) = P(\emptyset) + P(\Omega)$ (A3)
$0 \le P(\emptyset \cup \Omega) \le 1$ (i)
Since $\emptyset \cup \Omega = \Omega$, this gives $1 = P(\emptyset) + 1$, so $P(\emptyset) = 0$

Intuition vi: Let $C = A \cup B$. Then $P(C) \le P(A) + P(B)$.
Let $A' = A \setminus B$, $B' = B \setminus A$, $D = A \cap B$
$0 \le P(A'), P(B'), P(D) \le 1$ (i)
$P(A) = P(A' \cup D) = P(A') + P(D)$ (A3)
$P(B) = P(B' \cup D) = P(B') + P(D)$ (A3)
$P(A) + P(B) = P(A') + P(B') + 2 \cdot P(D)$
$P(C) = P(A' \cup B' \cup D) = P(A') + P(B') + P(D)$ (A3)
$P(C) \le P(A) + P(B)$, since $P(D) \ge 0$
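On a small finite probability space, the intuitions i and iii to vii can be checked exhaustively over all events. This is a sanity check rather than a proof, and the three-element sample space with its probabilities is an assumption made up for the example.

```python
from fractions import Fraction
from itertools import combinations

# Assumed elementary-event probabilities (they sum to 1).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = frozenset(weights)


def P(event):
    """Probability of an event as the sum of its elementary-event probabilities."""
    return sum((weights[w] for w in event), Fraction(0))


# All events: every subset of omega.
events = [
    frozenset(c)
    for k in range(len(omega) + 1)
    for c in combinations(sorted(omega), k)
]


def check_derived_properties():
    """Check intuitions i and iii-vii for every event or pair of events."""
    for A in events:
        assert 0 <= P(A) <= 1                          # i
        assert P(A) + P(omega - A) == 1                # iii
        for B in events:
            if A <= B:
                assert P(A) <= P(B)                    # iv
            if not (A & B):
                assert P(A | B) == P(A) + P(B)         # v
            assert P(A | B) <= P(A) + P(B)             # vi
            assert P(A | B) == P(A) + P(B) - P(A & B)  # vii
    return True


print(check_derived_properties())  # True
```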

Flipping coins

Example: Consider the random experiment of flipping a coin two times, one after the other. We have $\Omega = \{HH, HT, TH, TT\}$ and $P(\{HH\}) = P(\{HT\}) = P(\{TH\}) = P(\{TT\}) = 1/4$.
◮ Let $H_1 = \{HH, HT\}$ = the first flip results in a head
◮ Let $H_2 = \{HH, TH\}$ = the second flip results in a head

We have
◮ $P(H_1) = P(\{HH\}) + P(\{HT\}) = 1/2$
◮ $P(H_2) = P(\{HH\}) + P(\{TH\}) = 1/2$
◮ $P(H_1 \cap H_2) = P(\{HH\}) = 1/4 = P(H_1) \cdot P(H_2)$

Drawing from an urn

Example: Consider the random experiment of drawing two balls, one after the other, from an urn that contains a red, a blue, and a green ball. We have $\Omega = \{RB, RG, BR, BG, GR, GB\}$ and $P(\{RB\}) = P(\{RG\}) = P(\{BR\}) = P(\{BG\}) = P(\{GR\}) = P(\{GB\}) = 1/6$.
◮ Let $R_1 = \{RB, RG\}$ = the first ball is red
◮ Let $B_2 = \{RB, GB\}$ = the second ball is blue

We have
◮ $P(R_1) = P(\{RB\}) + P(\{RG\}) = 1/3$
◮ $P(B_2) = P(\{RB\}) + P(\{GB\}) = 1/3$
◮ $P(R_1 \cap B_2) = P(\{RB\}) = 1/6 \ne P(R_1) \cdot P(B_2) = 1/9$
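Both examples can be reproduced by enumerating the sample spaces. The sketch below assumes, as in the examples, that all elementary events are equally probable; the helper names `uniform_P` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def uniform_P(omega):
    """Probability function for a sample space with equally likely outcomes."""
    def P(event):
        return Fraction(len(event & omega), len(omega))
    return P


def independent(P, A, B):
    """True if P(A and B) equals P(A) * P(B)."""
    return P(A & B) == P(A) * P(B)


# Two coin flips: all ordered pairs over {H, T}.
coins = frozenset(product("HT", repeat=2))
P_coins = uniform_P(coins)
H1 = frozenset(w for w in coins if w[0] == "H")
H2 = frozenset(w for w in coins if w[1] == "H")

# Two draws without replacement from an urn with a red, blue and green ball.
urn = frozenset(permutations("RBG", 2))
P_urn = uniform_P(urn)
R1 = frozenset(w for w in urn if w[0] == "R")
B2 = frozenset(w for w in urn if w[1] == "B")

print(P_coins(H1 & H2), P_coins(H1) * P_coins(H2))  # 1/4 1/4
print(P_urn(R1 & B2), P_urn(R1) * P_urn(B2))        # 1/6 1/9
print(independent(P_coins, H1, H2), independent(P_urn, R1, B2))  # True False
```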

Independent events

The difference between the two examples is that in the first one, the two events are independent while in the second they are not.

Definition: Events $A$ and $B$ are independent if $P(A \cap B) = P(A) \cdot P(B)$.

Conditional probability

Definition: Let $A$ and $B$ be events, with $P(B) > 0$. The conditional probability $P(A \mid B)$ of $A$ given $B$ is given by

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.

Notice: If $B = \Omega$, then

$P(A \mid B) = \frac{P(A \cap \Omega)}{P(\Omega)} = \frac{P(A)}{P(\Omega)} = \frac{P(A)}{1} = P(A)$.

Notice: If $A$ and $B$ are independent, then

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A)$.

Flipping coins

Example: What is the probability of the second throw resulting in a head given that the first one results in a head?

$P(H_2 \mid H_1) = \frac{P(H_2 \cap H_1)}{P(H_1)} = \frac{P(\{HH\})}{P(\{HH, HT\})} = \frac{1/4}{1/2} = 1/2 = P(H_2)$

Drawing from an urn

Example: What is the probability of the second ball being blue given that the first one is red?

$P(B_2 \mid R_1) = \frac{P(B_2 \cap R_1)}{P(R_1)} = \frac{P(\{RB\})}{P(\{RB, RG\})} = \frac{1/6}{1/3} = 1/2 \ne P(B_2) = 1/3$
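The two conditional probabilities above can be computed from the definition $P(A \mid B) = P(A \cap B) / P(B)$. This is a minimal sketch assuming equally likely elementary events; the function names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def prob(event, omega):
    """P(event) when all elementary events in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))


def conditional(a, b, omega):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) > 0."""
    p_b = prob(b, omega)
    if p_b == 0:
        raise ValueError("P(A | B) is undefined when P(B) = 0")
    return prob(a & b, omega) / p_b


coins = frozenset(product("HT", repeat=2))
H1 = frozenset(w for w in coins if w[0] == "H")
H2 = frozenset(w for w in coins if w[1] == "H")

urn = frozenset(permutations("RBG", 2))
R1 = frozenset(w for w in urn if w[0] == "R")
B2 = frozenset(w for w in urn if w[1] == "B")

print(conditional(H2, H1, coins), prob(H2, coins))  # 1/2 1/2
print(conditional(B2, R1, urn), prob(B2, urn))      # 1/2 1/3
```

In the coin example conditioning on $H_1$ leaves the probability of $H_2$ unchanged, while in the urn example conditioning on $R_1$ raises the probability of $B_2$ from 1/3 to 1/2.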

The product rule

If we rewrite the definition of conditional probability, we get the product rule.

Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

Product rule: $P(A \cap B) = P(A \mid B) \cdot P(B)$

Bayes’ rule

The product rule can be written in two different ways:

$P(A \cap B) = P(A \mid B) \cdot P(B)$
$P(A \cap B) = P(B \mid A) \cdot P(A)$

Equating the two right-hand sides we get

$P(B \mid A) \cdot P(A) = P(A \mid B) \cdot P(B)$.

Dividing by $P(A)$, which requires $P(A) > 0$, we get Bayes’ rule:

$P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A)}$
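As a worked check of Bayes' rule, the urn example can be turned around: given that the second ball is blue, what is the probability that the first was red? The sketch below plugs the values from the urn example into the rule and compares the result with the direct definition; the function name `bayes` is a hypothetical helper.

```python
from fractions import Fraction


def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A), for P(A) > 0."""
    if p_a == 0:
        raise ValueError("Bayes' rule requires P(A) > 0")
    return p_a_given_b * p_b / p_a


# Values from the urn example, with A = B2 (second ball blue)
# and B = R1 (first ball red).
p_b2_given_r1 = Fraction(1, 2)
p_r1 = Fraction(1, 3)
p_b2 = Fraction(1, 3)

p_r1_given_b2 = bayes(p_b2_given_r1, p_r1, p_b2)

# Direct computation from the definition: P(R1 and B2) / P(B2).
p_direct = Fraction(1, 6) / p_b2

print(p_r1_given_b2, p_direct)  # 1/2 1/2
```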

Thomas Bayes

Thomas Bayes (1701-1761) was an English mathematician and Presbyterian minister. His most important results were published after his death.

Example

Consider the random experiment of throwing three dice. We have $6^3 = 216$ elementary events:

$\Omega = \{1, \ldots, 6\} \times \{1, \ldots, 6\} \times \{1, \ldots, 6\}$

One benefit of this sample space is that the elementary events all have the same probability:

$P(\{(1, 3, 5)\}) = P(\{(2, 2, 6)\}) = 1/216$

We may, however, be more interested in the total number of eyes (pips) showing than in the precise result of each die. There are 16 such outcomes (3 being the lowest and 18 being the highest). These outcomes are not, however, equally probable.
◮ The probability of having 3 eyes showing is $P(\{(1, 1, 1)\}) = 1/216$
◮ The probability of having 4 eyes showing is $P(\{(1, 1, 2), (1, 2, 1), (2, 1, 1)\}) = 3/216 = 1/72$

Example

For the number of eyes, we introduce the random variable $\mathit{Eyes}$. $\mathit{Eyes}$ can take values from the domain $\{3, \ldots, 18\}$. We can now talk about the probability of $\mathit{Eyes}$ taking on certain values:
◮ $P(\mathit{Eyes} = 3) = P(\{(1, 1, 1)\}) = 1/216$
◮ $P(\mathit{Eyes} = 4) = P(\{(1, 1, 2), (1, 2, 1), (2, 1, 1)\}) = 1/72$
◮ $P(\mathit{Eyes} = 5) = 1/36$

Example

Consider our urn example again.
◮ Let $B_1$ be a random variable that takes values from $\{\text{red}, \text{blue}, \text{green}\}$ and represents the color of the first ball
◮ Let $B_2$ represent the color of the second ball

$P(B_2 = \text{blue} \mid B_1 = \text{red}) = P(\{RB, GB\} \mid \{RB, RG\}) = \frac{P(\{RB\})}{P(\{RB, RG\})} = \frac{1/6}{1/3} = 1/2$
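The probabilities of the random variable $\mathit{Eyes}$ can be obtained by enumerating all 216 elementary events of the three-dice experiment and counting how many give each total. A minimal sketch follows; the variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered results of throwing three dice.
omega = list(product(range(1, 7), repeat=3))

# Count the elementary events that map to each value of Eyes.
counts = Counter(sum(w) for w in omega)

# P(Eyes = k) for k = 3, ..., 18, as exact fractions.
eyes_dist = {
    total: Fraction(count, len(omega))
    for total, count in sorted(counts.items())
}

for total, p in eyes_dist.items():
    print(total, p)
```

The first three lines of output are `3 1/216`, `4 1/72` and `5 1/36`, matching the values in the example, and the distribution is symmetric around 10.5.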

Probability distributions

For random variables with finite domains, the probability distribution simply defines the probability of the variable taking on each of the different values:

$P(\mathit{Eyes}) = \langle 1/216, 1/72, 1/36, \ldots, 1/36, 1/72, 1/216 \rangle$

We can also talk about the probability distribution of conditional probabilities:

$P(B_2 \mid B_1) = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}$

Continuous distributions

For random variables with continuous domains, we cannot simply list the probabilities of all outcomes. Instead, we use probability density functions:

$P(x) = \lim_{dx \to 0} P(x \le X \le x + dx) / dx$
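The conditional distribution $P(B_2 \mid B_1)$ for the urn example can be rebuilt by counting elementary events. In this sketch the rows are indexed by the value of $B_1$ and the columns by the value of $B_2$, both in the order red, blue, green; that ordering is an assumption, since the table in the text is symmetric and does not fix it.

```python
from fractions import Fraction
from itertools import permutations

colors = ["red", "blue", "green"]

# Ordered draws of two balls without replacement: 6 equally likely outcomes.
omega = list(permutations(colors, 2))


def cond(second, first):
    """P(B2 = second | B1 = first), computed by counting outcomes."""
    joint = sum(1 for w in omega if w == (first, second))
    given = sum(1 for w in omega if w[0] == first)
    return Fraction(joint, given)


# table[i][j] = P(B2 = colors[j] | B1 = colors[i])
table = [[cond(second, first) for second in colors] for first in colors]

for first, row in zip(colors, table):
    print(first, [str(p) for p in row])
```

Each row sums to 1, because for a fixed first ball the second ball must take exactly one of the three colors.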
