SLIDE 1

BTRY 4830/6830: Quantitative Genomics and Genetics

Lecture 3: Random variables, random vectors, and probability distribution functions

Jason Mezey jgm45@cornell.edu

  • Sept. 2, 2014 (Th) 8:40-9:55
SLIDE 2

Announcements I

  • Reminder: please make sure you received an email from the class listserv (!!): MEZEY-QUANTGENOME-L
  • We sent out a test message, so if you did not receive it, let Amanda know ASAP
  • If you are not yet on the listserv, also email Amanda: yg246@cornell.edu
  • Official office hours will start this week
  • Jason: Thurs. 3-5PM in 101 Biotech AND Genetic Med. Conference
  • Amanda: Tues. 3-5PM in 343 Weill Hall
SLIDE 3

Announcements II

  • We will be posting materials on the class website later today (!!): http://mezeylab.cb.bscb.cornell.edu
  • We will be posting videos of the first two lectures (at least)
  • We will be posting a supplemental reading (#1) concerning R today (+ other materials)
  • Homework #1 will be posted tomorrow (!!) on the class website
  • You must email your answers to Amanda by 11:59PM the Monday following when the homework is assigned (Sept. 8 in this case!); otherwise, it is late - no excuses!
  • Problems will be divided into “easy”, “medium”, and “hard”
  • Homeworks are “open book” and you may work together, but you MUST hand in your own work!
SLIDE 4

Conceptual Overview

[Diagram: System, Question, Experiment, Sample, Assumptions, Prob. Models, Inference, Statistics]

SLIDE 5

Summary of lecture 3:

  • Last lecture, we introduced critical concepts for modeling genetic systems, including rigorous definitions of experiments, sample spaces, sigma algebras, probability functions, conditional probabilities, and independence
  • In this lecture, we will add another critical building block: random variables and random vectors, and we will begin discussing probability distributions
SLIDE 6

Random Variables

[Diagram: an experiment defines a sample space Ω with a sigma algebra F and probability function Pr(F); a random variable X maps Ω to the reals, X(Ω) : Ω → R, taking values X = x with induced probability Pr(X)]

SLIDE 7

Experiments and samples

  • Experiment - a manipulation or measurement of a system that produces an outcome we can observe
  • Experimental trial - one instance of an experiment
  • Sample outcome - a possible outcome of the experiment
  • Sample - the results of one or more experimental trials
  • Example (Experiment / Sample outcomes):
  • Coin flip / “Heads” or “Tails”
  • Two coin flips / HH, HT, TH, TT
  • Measure heights in this class / 5’, 5’3’’, 5’3.5’’, ...
SLIDE 8

Sample Spaces / Sigma Algebra

  • Sample Space (Ω) - set comprising all possible outcomes associated with an experiment
  • Examples (Experiment / Sample Space):
  • “Single coin flip” / {H, T}
  • “Two coin flips” / {HH, HT, TH, TT}
  • “Measure Heights” / {5’, 5’3’’, 5’3.5’’, ... }
  • Events - a subset of the sample space
  • Sigma Algebra (F) - a collection of events (subsets) of Ω of interest with the following three properties: 1. ∅ ∈ F, 2. if A ∈ F then A^c ∈ F, 3. if A1, A2, ... ∈ F then ∪_{i=1}^∞ Ai ∈ F. Note that we are interested in a particular Sigma Algebra for each sample space...
  • Examples (Sample Space / Sigma Algebra):
  • {H, T} / ∅, {H}, {T}, {H, T}
  • {HH, HT, TH, TT} / see last lecture
  • {5’, 5’3’’, 5’3.5’’, ... } / see last lecture

SLIDE 9
Random variables I

  • Recall that, as a model that captures the “random” (see last lecture!) outcomes of our experiment given our system and all conditions, we defined the probability function (also called a probability measure) to be a function from the sigma algebra to the reals that satisfies the axioms of probability: Pr(F) : F → [0, 1]
  • When we define a probability function, this is an assumption (!!), i.e. what we believe is an appropriate probabilistic description of the outcomes of our experiment
  • We would like to have a concept that connects the actual outcomes of our experiment to this probability model
  • We are often in situations where we are interested in outcomes that are a function of the original sample space
  • For example, “Heads” and “Tails” accurately represent the outcomes of a coin flip example, but they are not numbers (and therefore have no intrinsic ordering, etc.)
  • We will define a random variable for this purpose
  • In general, the concept of a random variable is a “bridging” concept between the actual experiment and the probability model; it provides a numeric description of sample outcomes that can be defined many ways (i.e. this provides great versatility), and it provides conceptual conveniences for the mathematical formalism

SLIDE 10
Random variables II

  • Random variable - a real valued function on the sample space: X(Ω) : Ω → R
  • Intuitively: a random variable assigns a number to each sample outcome
  • Note that these functions are not constrained by the axioms of probability, e.g. not constrained to be between zero and one (although they must be measurable functions and admit a probability distribution on the random variable!!)
  • We generally define them in a manner that captures information that is of interest
  • As an example, let’s define a random variable for the sample space of the “two coin flip” experiment that maps each sample outcome to the “number of Tails” of the outcome:

X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2
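The “number of Tails” mapping is easy to sketch in code (an illustrative Python sketch, not lecture material; the names `omega` and `X` are ours):

```python
# Sample space of the "two coin flip" experiment
omega = ["HH", "HT", "TH", "TT"]

# Random variable X: map each sample outcome to its number of Tails
def X(outcome):
    return outcome.count("T")

# X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2
print({o: X(o) for o in omega})
```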

SLIDE 11

Random variables III

  • Examples of why we might start with Ω instead of X?
  • There is a “true” sample space of the experiment, even if we can’t (or don’t) measure the basic elements
  • We often want to define several random variables X1 and X2 on Ω, and this provides the same starting point for each: X1(Ω) : Ω → R, X2(Ω) : Ω → R
  • There is no loss of information if we start with the most basic elements of the sample space and define random variables on this space
  • This approach allows us to handle non-numeric and numeric sample spaces (sets) in the same framework

SLIDE 12

Random variables III

  • A critical point to note: because we have defined a probability function Pr(F) on the sigma algebra, this “induces” a probability function Pr(X = x) on the random variable X
  • We often use an “upper” case letter (X) to represent the function and a “lower” case letter (x) to represent the values the function takes, but (unfortunately) we will refer to both as “the random variable” (!!)
  • We will divide our discussion of random variables (which we will abbreviate r.v.) and the induced probability distributions into cases that are discrete (taking individual point values) or continuous (taking on values within an interval of the reals), since these have slightly different properties (but the same foundation is used to define both!!)

SLIDE 13

Discrete random variables / probability mass functions (pmf)

  • If we define a random variable on a discrete sample space, we produce a discrete random variable. For example, our two coin flip / number of Tails example: X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2
  • The probability function in this case will induce a probability distribution that we call a probability mass function, which we will abbreviate as pmf
  • For our example, if we consider a fair coin probability model (assumption!) for our two coin flip experiment and define a “number of Tails” r.v., we induce the following pmf:

Pr(HH) = Pr(HT) = Pr(TH) = Pr(TT) = 0.25

PX(x) = Pr(X = x): Pr(X = 0) = 0.25, Pr(X = 1) = 0.5, Pr(X = 2) = 0.25
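The induced pmf can be computed mechanically by summing the probabilities of the sample outcomes that map to each value of X (an illustrative Python sketch under the fair-coin assumption; the names are ours):

```python
from collections import defaultdict

# Fair-coin probability model on the sample space (an assumption!)
pr = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

# Random variable X: number of Tails
def X(outcome):
    return outcome.count("T")

# Induced pmf: P_X(x) = sum of Pr(outcome) over outcomes with X(outcome) = x
pmf = defaultdict(float)
for outcome, p in pr.items():
    pmf[X(outcome)] += p

print(dict(pmf))  # {0: 0.25, 1: 0.5, 2: 0.25}
```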

SLIDE 14

Discrete random variables / cumulative mass functions (cmf)

  • An alternative (and important - stay tuned!) representation of a discrete probability model is a cumulative mass function, which we will abbreviate as cmf: FX(x) = Pr(X ≤ x), where we define this function for X from −∞ to +∞
  • This definition is not particularly intuitive, so it is often helpful to consider a graph or illustration. For example, for our two coin flip / fair coin / number of Tails example, the cmf is a step function: FX(x) = 0 for x < 0, 0.25 for 0 ≤ x < 1, 0.75 for 1 ≤ x < 2, and 1 for x ≥ 2
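The step function above is just an accumulation of the pmf (an illustrative Python sketch for the fair-coin / number-of-Tails example):

```python
# pmf of the "number of Tails" r.v. under the fair-coin assumption
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    """cmf: F_X(x) = Pr(X <= x), defined for all real x."""
    return sum(p for value, p in pmf.items() if value <= x)

# Step function: 0 below x = 0, then jumps of 0.25, 0.5, 0.25
print([F(x) for x in (-1, 0, 0.5, 1, 1.5, 2, 10)])
# [0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]
```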

SLIDE 15

Continuous random variables / probability density functions (pdf)

  • For a continuous sample space, we can define a discrete random variable or a continuous random variable (or a mixture!)
  • For continuous random variables, we will define analogous “probability” and “cumulative” functions, although these will have different properties
  • For this class, we are considering only one continuous sample space: the reals (or more generally the multidimensional Euclidean space)
  • Recall that we will use the reals as a convenient approximation to the true sample space
  • For the reals, we define a probability density function (pdf): fX(x)
  • A pdf defines the probability of an interval of a random variable, i.e. the probability that the random variable will take a value in that interval:

Pr(a ≤ X ≤ b) = ∫_a^b fX(x)dx

SLIDE 16

Probability density functions (pdf): normal example

  • To illustrate the concept of a pdf, let’s consider the reals as the (approximate!) sample space of human heights, the normal (also called Gaussian) probability function as a probability model for human heights, and the random variable X that takes the value “height” (what kind of function is this!?)
  • In this case, the pdf of X has the following form:

fX(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²))
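The formula translates directly into code (an illustrative Python sketch; the default µ = 0, σ = 1 standard-normal parameters are our choice, not from the slides):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f_X(x) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Density at the mean of a standard normal: 1/sqrt(2*pi) ~ 0.3989
print(normal_pdf(0.0))
```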

SLIDE 17

Continuous random variables / cumulative density functions (cdf)

  • For continuous random variables, we also have an analog to the cmf, which is the cumulative density function, abbreviated as cdf:

FX(x) = Pr(X ≤ x) = ∫_{−∞}^{x} fX(x)dx

  • Again, a graph illustration is instructive
  • Note the cdf runs from zero to one (why is this?)
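For the normal pdf this integral has no closed form, so the cdf is evaluated numerically; a sketch (illustrative, our own construction) that approximates the integral with the trapezoid rule, truncating the lower limit at a far-left point (an approximation):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_cdf(x, mu=0.0, sigma=1.0, lower=-10.0, n=100_000):
    """Approximate F_X(x) = integral of f_X from -infinity to x.

    Trapezoid rule; the lower limit is truncated at `lower` standard
    deviations below the mean, where the density is negligible.
    """
    a = mu + lower * sigma
    if x <= a:
        return 0.0
    h = (x - a) / n
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(x, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, n))
    return total * h

print(normal_cdf(0.0))  # ~ 0.5 for a standard normal (half the mass below the mean)
```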

SLIDE 18

Mathematical properties of continuous r.v.’s

  • The pdf of X, a continuous r.v., does not represent the probability of a specific value of X; rather, we can use it to find the probability that a value of X falls in an interval [a, b]:

Pr(a ≤ X ≤ b) = ∫_a^b fX(x)dx

  • Related to this concept, for a continuous random variable, the probability of a specific value (or point) is zero (why is this!?)
  • For a specific continuous distribution the cdf is unique but the pdf is not, since we can change the values a pdf assigns on sets of measure zero without changing any probabilities
  • If this is the case, how would we ever get a specific value when performing an experiment!?
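Both points can be seen by computing interval probabilities as cdf differences, Pr(a ≤ X ≤ b) = FX(b) − FX(a), and shrinking the interval around a point (an illustrative sketch for a standard normal, using the exact identity Phi(x) = (1 + erf(x/√2))/2 from Python's math module; the setup is ours, not from the slides):

```python
import math

def Phi(x):
    """Standard normal cdf via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_prob(a, b):
    """Pr(a <= X <= b) = F_X(b) - F_X(a) for X ~ N(0, 1)."""
    return Phi(b) - Phi(a)

print(interval_prob(-1.96, 1.96))  # ~ 0.95

# Shrinking an interval around the point x = 1 drives the probability to 0
for eps in (0.1, 0.01, 0.001):
    print(interval_prob(1 - eps, 1 + eps))
```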

SLIDE 19

Random vectors

  • We are often in situations where we are interested in defining more than one r.v. on the same sample space
  • When we do this, we define a random vector
  • Note that a vector, in its simplest form, may be considered a set of numbers (e.g. [1.2, 2.0, 3.3] is a vector with three elements)
  • Also note that vectors (when a vector space is defined) have similar properties to numbers in the sense that we can define operations for them (e.g. addition, multiplication), which we will use later in this course
  • Beyond keeping track of multiple r.v.’s, a random vector works just like a r.v., i.e. a probability function induces a probability function on the random vector, and we may consider discrete or continuous (or mixed!) random vectors
  • Finally, note that while we can define several r.v.’s on the same sample space, we can only define one true probability function (why!?)
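As a sketch, a discrete random vector on the two coin flip sample space can pair “number of Tails” (X1) with an indicator of “first flip is Heads” (X2); the second r.v. is our illustrative choice, not from the lecture. The single probability function on the sample space induces the joint pmf:

```python
from collections import defaultdict

# Fair-coin probability model on the sample space (an assumption!)
pr = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def X1(outcome):  # number of Tails
    return outcome.count("T")

def X2(outcome):  # indicator: first flip is Heads
    return 1 if outcome[0] == "H" else 0

# Induced joint pmf on the random vector [X1, X2]
joint = defaultdict(float)
for outcome, p in pr.items():
    joint[(X1(outcome), X2(outcome))] += p

print(dict(joint))
# {(0, 1): 0.25, (1, 1): 0.25, (1, 0): 0.25, (2, 0): 0.25}
```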

SLIDE 20

That’s it for today

  • Remember your first homework will be posted tomorrow!!
  • Next lecture, we will introduce random vectors, expectations, variances, covariances, and begin our discussion of parameterized probability models