SLIDE 1

Lecture 2: Probability Theory and Linear Algebra Review

  • Dr. Chengjiang Long

Computer Vision Researcher at Kitware Inc. Adjunct Professor at RPI. Email: longc3@rpi.edu

SLIDE 2
  • C. Long

Lecture 2 January 28, 2018 2

Recap Previous Lecture

SLIDE 3

Outline

  • Probability Theory Review
  • Linear Algebra Review
SLIDE 5

Discrete Random Variables

  • A random variable is a measurement on an outcome of a random experiment.
  • Discrete versus continuous random variables: a random variable x is discrete if it can assume a finite or countably infinite number of values; x is continuous if it can assume all values in an interval.

SLIDE 6

Example

  • Which of the following random variables are discrete and which are continuous?
  • x = Number of houses sold by a real estate developer per week?
  • x = Number of heads in ten tosses of a coin?
  • x = Weight of a child at birth?
  • x = Time required to run 100 yards?
SLIDE 7

Examples

X is the Sum of Two Dice. What is the probability distribution of X?

SLIDE 8

Probability Distribution Example: X is the Sum of Two Dice

This sequence provides an example of a discrete random variable. Suppose that you have a red die which, when thrown, takes the numbers from 1 to 6 with equal probability.

SLIDE 9

Probability Distribution Example: X is the Sum of Two Dice

Suppose that you also have a green die that can take the numbers from 1 to 6 with equal probability.

SLIDE 10

Probability Distribution Example: X is the Sum of Two Dice

We will define a random variable X as the sum of the numbers when the dice are thrown.

SLIDE 11

Probability Distribution Example: X is the Sum of Two Dice

For example, if the red die is 4 and the green one is 6, X is equal to 10.

SLIDE 12

Probability Distribution Example: X is the Sum of Two Dice

Similarly, if the red die is 2 and the green one is 5, X is equal to 7.

SLIDE 13

Probability Distribution Example: X is the Sum of Two Dice

The table shows all the possible outcomes.

SLIDE 14

Probability Distribution Example: X is the Sum of Two Dice

We will now define f, the frequencies associated with the possible values of X.

SLIDE 15

Probability Distribution Example: X is the Sum of Two Dice

For example, there are four outcomes which make X equal to 5.

SLIDE 16

Probability Distribution Example: X is the Sum of Two Dice

Similarly you can work out the frequencies for all the other values of X.

SLIDE 17

Probability Distribution Example: X is the Sum of Two Dice

Finally we will derive the probability of obtaining each value of X.

SLIDE 18

Probability Distribution Example: X is the Sum of Two Dice

If there is 1/6 probability of obtaining each number on the red die, and the same on the green die, each outcome in the table will occur with 1/36 probability.

SLIDE 19

Probability Distribution Example: X is the Sum of Two Dice

Hence, to obtain the probabilities associated with the different values of X, we divide the frequencies by 36.

SLIDE 20

Probability Distribution Example: X is the Sum of Two Dice

The distribution is shown graphically. In this example it is symmetrical, highest for X equal to 7 and declining on either side.
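
The steps of the two-dice example (enumerate the 36 outcomes, count the frequencies, divide by 36) can be sketched in a few lines of Python. This is only an illustration; the variable names `frequencies` and `distribution` are chosen here and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Frequency of each sum over the 36 equally likely (red, green) outcomes.
frequencies = Counter(red + green for red, green in product(faces, faces))

# Divide each frequency by 36 to obtain the probability of each value of X.
distribution = {x: Fraction(f, 36) for x, f in sorted(frequencies.items())}

for x, p in distribution.items():
    print(f"X = {x:2d}  f = {frequencies[x]}  p = {p}")
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1.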

SLIDE 21

Expected Value

  • Definition of E(X), the expected value of X:
  • The expected value of a random variable, also known as its population mean, is the weighted average of its possible values, the weights being the probabilities attached to the values.
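
As a small illustration of this definition, the sketch below computes E(X) for X equal to the sum of two fair dice. The choice of example and the name `pmf` are assumptions made here, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Probability of each value of X, the sum of two fair dice (illustrative example).
pmf = {}
for red, green in product(range(1, 7), repeat=2):
    pmf[red + green] = pmf.get(red + green, 0) + Fraction(1, 36)

# E(X) is the sum over all values x of x times its probability.
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 7
```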

SLIDE 22

Expected Value Example

SLIDE 23

Expected Value Properties

  • Linear: E(aX + bY) = a E(X) + b E(Y) for constants a and b
  • Also denoted by μ
SLIDE 24

Variance
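
As a reference point, the standard definition of the variance of a discrete random variable X with values $x_i$ and probabilities $p_i$ can be written as follows. The symbols $\mu$, $x_i$ and $p_i$ are notation assumed here.

```latex
\operatorname{Var}(X) = E\big[(X - \mu)^2\big]
                      = \sum_i (x_i - \mu)^2 \, p_i
                      = E(X^2) - \mu^2,
\qquad \mu = E(X)
```

For the sum of two fair dice, this gives a variance of 35/6.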

SLIDE 25

Pairs of Discrete Random Variables

  • Let x and y be two discrete r.v.
  • For each possible pair of values, we can define a joint probability P(x, y).
  • We can also define a joint probability mass function P(x, y), which offers a complete characterization of the pair of r.v.

SLIDE 26

Statistical Independence

  • Two random variables x and y are said to be independent if and only if P(x, y) = P(x) P(y), that is, when knowing the value of x does not give us additional information about the value of y.
  • Or, equivalently, E[f(x) g(y)] = E[f(x)] E[g(y)] for any functions f(x) and g(y).

SLIDE 27

Conditional Probability

  • When two r.v. are not independent, knowing one allows a better estimate of the other (e.g. outside temperature and season).
  • If x and y are independent, P(x|y) = P(x).
SLIDE 28

Sum and Product Rules

  • Example:
  • We have two boxes: one red and one blue
  • Red box: 2 apples and 6 oranges
  • Blue box: 3 apples and 1 orange

[C.M. Bishop, “Pattern Recognition and Machine Learning”, 2006]

SLIDE 29

Sum and Product Rules

Define:

  • B: random variable for the box picked (r or b)
  • F: identity of the fruit (a or o)

p(B=r) = 4/10 and p(B=b) = 6/10

  • The events are mutually exclusive and include all possible outcomes, so their probabilities must sum to 1.
SLIDE 30

Sum and Product Rules
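
For reference, the two rules can be stated for generic random variables X and Y as follows. The symbols are assumed here, following the usual notation of the textbook cited above.

```latex
\text{Sum rule:} \quad p(X) = \sum_{Y} p(X, Y)
\qquad\qquad
\text{Product rule:} \quad p(X, Y) = p(Y \mid X)\, p(X)
```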

SLIDE 31

Sum and Product Rules

SLIDE 32

Sum and Product Rules

SLIDE 33

Law of Total Probability

  • If an event A can occur in m different ways, and these m different ways are mutually exclusive, then the probability of A occurring is the sum of the probabilities of the sub-events.

SLIDE 34

Sum and Product Rules

  • Back to the fruit boxes

– p(B=r) = 4/10 and p(B=b) = 6/10
– p(B=r) + p(B=b) = 1

  • Conditional probabilities

– p(F=a | B=r) = 1/4
– p(F=o | B=r) = 3/4
– p(F=a | B=b) = 3/4
– p(F=o | B=b) = 1/4

SLIDE 35

Sum and Product Rules

  • Note:

p(F=a | B=r) + p(F=o | B=r) = 1
p(F=a) = p(F=a | B=r) p(B=r) + p(F=a | B=b) p(B=b) = 1/4 * 4/10 + 3/4 * 6/10 = 11/20

  • Sum rule: p(F=o) = ?

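
The marginal probabilities for the fruit example can be checked with a short sketch that applies the sum rule to the numbers given above. The dictionary layout and the helper name `marginal` are illustrative choices made here.

```python
from fractions import Fraction as Fr

# Priors and conditionals as given on the slides.
p_box = {"r": Fr(4, 10), "b": Fr(6, 10)}
p_fruit_given_box = {
    "r": {"a": Fr(1, 4), "o": Fr(3, 4)},
    "b": {"a": Fr(3, 4), "o": Fr(1, 4)},
}

def marginal(fruit):
    """Sum rule: p(F) = sum over boxes of p(F | B) p(B)."""
    return sum(p_fruit_given_box[box][fruit] * p_box[box] for box in p_box)

p_apple = marginal("a")
p_orange = marginal("o")
print(p_apple, p_orange)  # 11/20 9/20
```

The two marginals sum to 1, as they must, since a picked fruit is either an apple or an orange.
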
SLIDE 36

Conditional Probability Example

  • A jar contains black and white marbles.
  • Two marbles are chosen without replacement.
  • The probability of selecting a black marble and then a white marble is 0.34.
  • The probability of selecting a black marble on the first draw is 0.47.
  • What is the probability of selecting a white marble on the second draw, given that the first marble drawn was black?
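
One way to work this out is to apply the definition of conditional probability directly. Here $B_1$ denotes "black on the first draw" and $W_2$ denotes "white on the second draw"; both symbols are introduced here for illustration.

```latex
P(W_2 \mid B_1) = \frac{P(B_1, W_2)}{P(B_1)} = \frac{0.34}{0.47} \approx 0.72
```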

SLIDE 37

Law of Total Probability

SLIDE 38

Bayes Rule

  • x is the unknown cause
  • y is the observed evidence
  • Bayes rule shows how the probability of x changes after we have observed y
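
For reference, with x the unknown cause and y the observed evidence, Bayes rule in its standard form for discrete variables reads as follows. The summation index $x'$ is notation assumed here.

```latex
P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)},
\qquad
P(y) = \sum_{x'} P(y \mid x')\, P(x')
```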

SLIDE 39

Bayes Rule on the Fruit Example

  • Suppose we have selected an orange. Which box did it come from?
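
A minimal sketch of this calculation, using the priors and conditionals from the fruit example, is given below. The variable names are illustrative and the code is only a check of the arithmetic.

```python
from fractions import Fraction as Fr

p_box = {"r": Fr(4, 10), "b": Fr(6, 10)}             # prior p(B)
p_orange_given_box = {"r": Fr(3, 4), "b": Fr(1, 4)}  # likelihood p(F=o | B)

# Evidence p(F=o), obtained with the sum rule.
p_orange = sum(p_orange_given_box[box] * p_box[box] for box in p_box)

# Posterior p(B | F=o), obtained with Bayes rule.
posterior = {box: p_orange_given_box[box] * p_box[box] / p_orange for box in p_box}
print(posterior["r"], posterior["b"])  # 2/3 1/3
```

Although the blue box is more likely a priori (6/10), observing an orange makes the red box the more probable source.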

SLIDE 40

Continuous Random Variables

  • Examples: room temperature, time to run 100 m, weight of a child at birth…
  • We cannot talk about the probability that x has a particular value.
  • Instead, we talk about the probability that x falls in an interval, which is given by a probability density function.

SLIDE 41

Expected Value

SLIDE 42

Normal (Gaussian) Distribution

  • Central Limit Theorem: under various conditions, the distribution of the sum of d independent random variables approaches, as d grows, a limiting form known as the normal distribution.
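
The theorem can be illustrated with a small simulation that sums d independent Uniform(0, 1) variables. The values of `d`, `n` and the seed are arbitrary choices made here, and the check is only a rough sanity test, not a proof.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
d = 12          # number of independent Uniform(0, 1) terms per sum (arbitrary choice)
n = 20000       # number of simulated sums (arbitrary choice)

sums = [sum(random.random() for _ in range(d)) for _ in range(n)]

mean = statistics.fmean(sums)
std = statistics.pstdev(sums)
# For a normal distribution, about 68% of the mass lies within one standard deviation.
within_one_std = sum(abs(s - mean) <= std for s in sums) / n

print(f"mean ~ {mean:.3f} (theory: {d / 2})")
print(f"std ~ {std:.3f} (theory: {(d / 12) ** 0.5:.3f})")
print(f"fraction within one std ~ {within_one_std:.3f} (normal: about 0.683)")
```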

SLIDE 43

Normal (Gaussian) Distribution

SLIDE 44

Uniform Distribution

SLIDE 45

Outline

  • Probability Theory Review
  • Linear Algebra Review
  • Summary
SLIDE 46

Linear Algebra

SLIDE 47

Vector space

  • Informal definition:
  • Formal definition includes axioms about associativity and distributivity of the + and · operators.

  • Always!!
SLIDE 48

Example: Linear subspace of and

Line Plane

SLIDE 49

Linear independence

  • The vectors v_1, ..., v_k are a linearly independent set if: a_1 v_1 + ... + a_k v_k = 0 implies a_1 = ... = a_k = 0.
  • It means that none of the vectors can be obtained as a linear combination of the others.

SLIDE 50

Example

  • Parallel vectors are always linearly dependent:
  • Nonzero orthogonal vectors are always linearly independent:

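
Both statements can be checked for two vectors in the plane with a small sketch. It relies on the fact that two 2D vectors are linearly independent exactly when the determinant of the matrix they form is nonzero; the example vectors and function names are chosen here for illustration.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]

def linearly_independent(u, v):
    # Two vectors in the plane are independent exactly when this determinant is nonzero.
    return det2(u, v) != 0

parallel = ((1, 2), (2, 4))     # second vector is 2 times the first
orthogonal = ((1, 2), (-2, 1))  # dot product is 1*(-2) + 2*1 = 0

print(linearly_independent(*parallel))    # False
print(linearly_independent(*orthogonal))  # True
```
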
SLIDE 51

Basis of a Vector Space

  • The basis vectors are linearly independent.
  • They span the whole vector space V.
  • Any vector in V is a unique linear combination of the basis.
  • The number of basis vectors is called the dimension of V.
SLIDE 52

Example

Note that a matrix can be interpreted as a vector.

SLIDE 53

Matrix representation

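  • Let v_1, ..., v_n be a basis of V.
  • Every v in V has a unique representation v = a_1 v_1 + ... + a_n v_n.
  • Denote v by the column vector: (a_1, ..., a_n)^T
  • The basis vectors are therefore denoted: (1, 0, ..., 0)^T, (0, 1, ..., 0)^T, ..., (0, 0, ..., 1)^T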
SLIDE 54

Linear Operation

  • A: V → W is called a linear operator if: A(v + w) = A(v) + A(w) and A(αv) = α A(v)
  • In particular, A(0) = 0
  • Linear operators we know:

– Scaling
– Rotation, reflection
– Translation is not linear: it moves the origin
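
The difference between rotation and translation can be checked numerically. The sketch below tests the additivity condition A(u + v) = A(u) + A(v) for one pair of 2D vectors; the angle, the offset and the helper names are arbitrary choices made here, and passing the check for one pair is an illustration rather than a proof of linearity.

```python
import math

def rotate(v, theta=math.pi / 6):
    """Rotate a 2D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def translate(v, offset=(1.0, 0.0)):
    """Shift a 2D vector by a fixed offset."""
    return (v[0] + offset[0], v[1] + offset[1])

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def close(u, v, tol=1e-12):
    return all(abs(a - b) <= tol for a, b in zip(u, v))

def is_additive(op, u, v):
    """Check op(u + v) == op(u) + op(v) for one pair of vectors."""
    return close(op(add(u, v)), add(op(u), op(v)))

u, v = (1.0, 2.0), (3.0, -1.0)
print(is_additive(rotate, u, v))     # True
print(is_additive(translate, u, v))  # False
print(translate((0.0, 0.0)))         # (1.0, 0.0): the origin moves
```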

SLIDE 55

Example: Rotation is a linear operator

SLIDE 56

Example: Rotation is a linear operator

SLIDE 57

Matrix operations

  • Addition, subtraction, scalar multiplication – simple…
  • Multiplication of matrix by column vector:
SLIDE 58

Matrix by vector multiplication

  • Sometimes a better way to look at it:
  • Ab is a linear combination of A’s columns
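
The two ways of looking at the product Ab can be compared on a small example. The matrix `A` and vector `b` below are made up for illustration.

```python
A = [[1, 2],
     [3, 4],
     [5, 6]]
b = [10, -1]

# Row view: entry i of Ab is the dot product of row i of A with b.
row_view = [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]

# Column view: Ab = b[0] * (column 0 of A) + b[1] * (column 1 of A).
columns = list(zip(*A))
column_view = [sum(b_j * col[i] for b_j, col in zip(b, columns)) for i in range(len(A))]

print(row_view)     # [8, 26, 44]
print(column_view)  # [8, 26, 44]
```
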
SLIDE 59

How to Understand Matrix Product

SLIDE 60

Transposition: turn the rows into the columns

SLIDE 61

Matrix properties

  • A square matrix A is non-singular if there exists a matrix B such that AB = BA = I.
  • B = A^(-1) is called the inverse of A.
  • A is non-singular ⇔ det(A) ≠ 0
  • If A is non-singular then the equation Ax = b has one unique solution for each b.
  • A is non-singular ⇔ the rows of A are linearly independent (and so are the columns)

SLIDE 62

Orthogonal matrices

  • A matrix A is orthogonal if A^(-1) = A^T.
  • Follows: A A^T = A^T A = I
  • The rows of A are orthonormal vectors!
  • Proof:
SLIDE 63

The Trace

  • The trace of a square matrix A, denoted by tr(A), is the sum of the diagonal elements.

  • Some properties
SLIDE 64

Example

SLIDE 65

The Determinant

SLIDE 66

The Determinant

  • For a square matrix A, the determinant is denoted by |A| or det(A)

SLIDE 67

The Determinant

For proof, see https://math.stackexchange.com/questions/427528/why-determinant-is-volume-of-parallelepiped-in-any-dimensions

SLIDE 68

Covariance

  • Covariance is a numerical measure that shows how much two random variables change together.
  • Positive covariance: if one increases, the other is likely to increase.
  • Negative covariance: …
  • More precisely: the covariance is a measure of the linear dependence between the two variables.
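
The sign of the covariance can be seen on a toy data set. The sketch below uses the population form of the covariance (dividing by n); the data and the function name are made up for illustration.

```python
def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # increases with x
y_down = [10, 8, 6, 4, 2]  # decreases as x increases

print(covariance(x, y_up))    # 4.0
print(covariance(x, y_down))  # -4.0
```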

SLIDE 69

Covariance Matrix

  • For a vector of repeated measures, we define the covariance matrix:

  • It is a symmetric, square matrix.
SLIDE 70

Normal Distribution

SLIDE 71

Multivariate Normal Distribution

  • The multivariate normal density in d dimensions is:
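
For reference, the standard form of this density for a d-dimensional vector $\mathbf{x}$ is given below. The symbols $\boldsymbol{\mu}$ (mean vector) and $\Sigma$ (covariance matrix) are notation assumed here.

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\, (\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```
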
SLIDE 72

Multivariate Normal Distribution

SLIDE 73

Multivariate Normal Distribution

SLIDE 74

Eigenvalues and Eigenvectors

  • For an n×n square matrix A, e is an eigenvector with eigenvalue λ if Ae = λe, or equivalently (A - λI)e = 0
  • If (A - λI) is invertible, the only solution is e = 0 (trivial)
  • For non-trivial solutions: det(A - λI) = 0
  • The equation above is called the characteristic equation; det(A - λI) is the characteristic polynomial of A
  • Solutions are not unique

– If e is an eigenvector, αe is also an eigenvector for any α ≠ 0
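
For a 2×2 matrix the characteristic equation is a quadratic, so the procedure above can be sketched directly. The matrix below is a hypothetical symmetric example chosen here, not the one from the slides, and the eigenvector formula assumes the off-diagonal entry `A[0][1]` is nonzero.

```python
import math

# Hypothetical symmetric 2x2 example, not the matrix from the slides.
A = [[2.0, 1.0],
     [1.0, 2.0]]

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# For a 2x2 matrix, det(A - lambda I) = lambda^2 - trace * lambda + det.
disc = math.sqrt(trace ** 2 - 4 * det)  # real here because A is symmetric
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]

def eigenvector(lam):
    """One non-trivial solution of (A - lam I) e = 0, valid when A[0][1] != 0."""
    return (A[0][1], lam - A[0][0])

for lam in eigenvalues:
    e = eigenvector(lam)
    Ae = (A[0][0] * e[0] + A[0][1] * e[1], A[1][0] * e[0] + A[1][1] * e[1])
    print(lam, e, Ae)  # Ae equals lam * e
```

Any nonzero multiple of a returned vector is an equally valid eigenvector, which is why the function returns just one representative.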

SLIDE 75

Example: a 2×2 matrix

  • -

x=2, y=-1

SLIDE 76

Example: a 2×2 matrix

  • -

x=1, y=2

SLIDE 77

Properties

  • The product of the eigenvalues = |A|
  • The sum of the eigenvalues = trace(A)
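  • A real symmetric matrix has real eigenvalues.
  • A real symmetric matrix can be written as: A = U Λ U^T, where the columns of U are orthonormal eigenvectors of A and Λ is the diagonal matrix of eigenvalues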