BTRY 4830/6830: Quantitative Genomics and Genetics
Lecture 3: Random variables, random vectors, and probability distribution functions
Jason Mezey jgm45@cornell.edu
- Sept. 2, 2014 (Th) 8:40-9:55
Announcements
class listserv (!!): MEZEY-QUANTGENOME-L
Amanda know ASAP
yg246@cornell.edu
Conference
http://mezeylab.cb.bscb.cornell.edu
(+ other materials)
(otherwise, it is late - no excuses!)
MUST hand in your own work!
[Diagram labels: Experiment; Assumptions; Models; Statistics]
genetic systems, including rigorous definitions of experiments, sample spaces, sigma algebras, probability functions, conditional probabilities, and independence
random variables, random vectors, and we will begin discussing probability distributions
[Diagram labels: Experiment; (Sample Space) Ω; (Sigma Algebra) F; Pr(F); Random Variable X = x, Pr(X)]
that produces an outcome we can observe
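three properties: 1. $\emptyset \in \mathcal{F}$, 2. if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, 3. if $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$. Note that we are interested in a particular Sigma Algebra for each sample space...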
F Ω
∅, {H}, {T}, {H, T}
Ω
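The three sigma algebra properties can be checked mechanically for a finite sample space. The sketch below is only an illustration for the one coin flip example; the helper names `power_set` and `is_sigma_algebra` are made up for this example, and on a finite family closure under pairwise unions stands in for closure under countable unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(family, omega):
    """Check the three properties for a finite family of subsets of omega."""
    omega = frozenset(omega)
    # Property 1: the empty set is in the family.
    if frozenset() not in family:
        return False
    # Property 2: closed under complements.
    if any(omega - a not in family for a in family):
        return False
    # Property 3: closed under unions (pairwise is enough when finite).
    return all(a | b in family for a in family for b in family)

omega = {"H", "T"}
F = power_set(omega)  # the sets: empty, {H}, {T}, {H, T}
print(is_sigma_algebra(F, omega))

# A family that is missing complements fails the check.
too_small = {frozenset(), frozenset({"H"})}
print(is_sigma_algebra(too_small, omega))
```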
system and all conditions under which the probability function (also called a probability measure) to be a function on the sigma algebra to the reals that satisfies the axioms of probability:
appropriate probabilistic description of the outcomes of our experiment
probability model
sample space
they are not numbers (and therefore have no intrinsic ordering, etc.)
experiment and the probability model, this provides a numeric description of sample outcomes that can be defined many ways (i.e. this provides great versatility), and this provides conceptual conveniences for the mathematical formalism
Pr(F) : F → [0, 1]
e.g. not constrained to be between zero and one (although they must be measurable functions and admit a probability distribution on the random variable!!)
interest
“two coin flip” experiment that maps each sample outcome to the “number of Tails”
X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2
X(Ω) : Ω → R
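Since a random variable is just a function from the sample space to the reals, it can be written down directly as a function. A minimal sketch for the two coin flip example, assuming outcomes are encoded as two-letter strings (an encoding chosen only for this illustration):

```python
# Sample space of the two coin flip experiment.
sample_space = ["HH", "HT", "TH", "TT"]

def X(outcome):
    """Random variable: number of Tails in a two-flip outcome."""
    return outcome.count("T")

# Tabulate the value X assigns to every sample outcome.
values = {outcome: X(outcome) for outcome in sample_space}
for outcome, value in values.items():
    print(f"X({outcome}) = {value}")
```

Note that X itself involves no probabilities; it only relabels outcomes with numbers.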
E X(Ω)
don’t) measure the basic elements
the same starting point for each
spaces (sets) in the same framework
X1(Ω) : Ω → R, X2(Ω) : Ω → R
the sigma algebra, this “induces” a probability function on the random variable X:
“lower” case letter to represent the values the function takes but (unfortunately) we will refer to both as “the random variable” (!!)
r.v.) and the induced probability distributions into cases that are discrete (taking individual point values) or continuous (taking on values within an interval of the reals), since these have slightly different properties (but the same foundation is used to define both!!)
e Pr(X),
Pr(X = x), variable X.
Pr(F) E X(Ω)
discrete random variable. For example, our two coin flip / number of Tails example:
we call a probability mass function which we will abbreviate as pmf
following pmf: X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2
Pr(HH) = Pr(HT) = Pr(TH) = Pr(TT) = 0.25
$$P_X(x) = Pr(X = x) = \begin{cases} Pr(X = 0) = 0.25 \\ Pr(X = 1) = 0.5 \\ Pr(X = 2) = 0.25 \end{cases}$$
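The pmf above can be recovered by summing the probabilities of all sample outcomes that map to the same value of X. The following sketch assumes the fair-coin probabilities of 0.25 per outcome given above and uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Probability of each sample outcome (fair coin, as in the example).
pr_outcome = {
    "HH": Fraction(1, 4),
    "HT": Fraction(1, 4),
    "TH": Fraction(1, 4),
    "TT": Fraction(1, 4),
}

def X(outcome):
    """Random variable: number of Tails."""
    return outcome.count("T")

# Induced pmf: add up Pr(outcome) over all outcomes with X(outcome) = x.
pmf = defaultdict(Fraction)
for outcome, p in pr_outcome.items():
    pmf[X(outcome)] += p

for x in sorted(pmf):
    print(f"Pr(X = {x}) = {float(pmf[x])}")
```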
model is a cumulative mass function which we will abbreviate (cmf):
FX(x) = Pr(X ≤ x) where we define this function for X from −∞ to +∞.
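For a discrete random variable the cmf is a running sum of the pmf, so it is a step function defined for every real x, not only for the values X can take. A small sketch using the number of Tails pmf (the function name `cmf` is just for this illustration):

```python
# pmf of the number of Tails in two fair coin flips.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def cmf(x):
    """Pr(X <= x): sum the pmf over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)

# Evaluate below, between, at, and above the values X can take.
for x in (-1, 0, 0.5, 1, 2, 10):
    print(f"F_X({x}) = {cmf(x)}")
```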
a continuous random variable (or a mixture!)
“cumulative” functions, although these will have different properties
reals (or more generally the multidimensional Euclidean space)
sample space
probability that the random variable will take a value in that interval
$f_X(x)$
$Pr(a \leq X \leq b) = \int_a^b f_X(x)\,dx$
(approximate!) sample space of human heights, the normal (also called Gaussian) probability function as a probability model for human heights, and the random variable X that takes the value “height” (what kind of function is this!?)
$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
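To make the normal pdf concrete, the sketch below codes the formula and checks numerically that the density integrates to (approximately) one. The mean of 170 and standard deviation of 10 are hypothetical values, loosely thinking of heights in centimeters, and the midpoint rule is just a simple stand-in for the integral.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal (Gaussian) probability density at x."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# Hypothetical parameters for illustration only.
MU, SIGMA = 170.0, 10.0

def height_density(x):
    return normal_pdf(x, MU, SIGMA)

# Integrating over mean +/- 8 standard deviations captures almost all mass.
total = integrate(height_density, MU - 8 * SIGMA, MU + 8 * SIGMA)
print(total)

# The density value at a point is not a probability by itself.
print(height_density(MU))
```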
we also have an analog to the cmf, which is the cumulative density function abbreviated as cdf:
instructive
$F_X(x) = \int_{-\infty}^{x} f_X(x)\,dx$
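For the normal distribution the cdf has no elementary closed form, but it can be written with the error function. The sketch below uses `math.erf` from the Python standard library and compares the result with a direct numerical integral of the pdf; the standard normal (mean 0, standard deviation 1) is assumed purely for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(
        2 * math.pi * sigma ** 2
    )

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# The lower limit -10 stands in for minus infinity (the tail beyond is negligible).
x = 1.0
numeric = integrate(normal_pdf, -10.0, x)
print(normal_cdf(x), numeric)
```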
probability of a specific value of X, rather we can use it to find the probability that a value of X falls in an interval [a,b]:
probability of specific value (or point) is zero (why is this!?)
pdf is not, since we can assign values to non-measurable sets
performing an experiment!?
$Pr(a \leq X \leq b) = \int_a^b f_X(x)\,dx$
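One way to see why a single point has probability zero is to compute the interval probability as a difference of cdf values and shrink the interval. The sketch below assumes a standard normal random variable (an illustrative choice) and a hypothetical helper `interval_prob`.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_prob(a, b):
    """Pr(a <= X <= b) = F_X(b) - F_X(a) for a continuous random variable."""
    return normal_cdf(b) - normal_cdf(a)

# Probability of falling within one standard deviation of the mean.
one_sd = interval_prob(-1.0, 1.0)
print(one_sd)

# Shrinking an interval around the point 0.5 drives the probability to zero.
shrinking = [interval_prob(0.5 - w / 2, 0.5 + w / 2) for w in (1.0, 0.1, 0.01, 0.001)]
print(shrinking)

# The degenerate interval [0.5, 0.5] has probability exactly zero.
print(interval_prob(0.5, 0.5))
```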
(e.g. [1.2, 2.0, 3.3] is a vector with three elements)
properties to numbers in the sense that we can define operations for them (e.g. addition, multiplication), which we will use later in this course
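As a quick illustration of operations on vectors, the sketch below implements element-wise addition and multiplication by a scalar with plain Python lists. The vector [1.2, 2.0, 3.3] is the example above; the second vector and the function names are hypothetical.

```python
def add(u, v):
    """Element-wise sum of two vectors of the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Multiply every element of v by the scalar c."""
    return [c * a for a in v]

x = [1.2, 2.0, 3.3]   # the three-element vector from the example
y = [0.8, 1.0, 0.7]   # hypothetical second vector

print(add(x, y))
print(scale(2, x))
```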
i.e. a probability function induces a probability function on the random vector and we may consider discrete or continuous (or mixed!) random vectors
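A discrete random vector can be sketched by pairing two random variables defined on the same sample space and letting the probability function on outcomes induce a joint pmf. Here X1 is the number of Tails from the earlier example, while X2 (an indicator that the first flip is Heads) is a hypothetical second random variable introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Fair two coin flip experiment: each outcome has probability 1/4.
pr_outcome = {o: Fraction(1, 4) for o in ("HH", "HT", "TH", "TT")}

def X1(outcome):
    """Number of Tails."""
    return outcome.count("T")

def X2(outcome):
    """Hypothetical second random variable: 1 if the first flip is Heads."""
    return 1 if outcome[0] == "H" else 0

# Induced joint pmf of the random vector [X1, X2].
joint_pmf = defaultdict(Fraction)
for outcome, p in pr_outcome.items():
    joint_pmf[(X1(outcome), X2(outcome))] += p

# Summing the joint pmf over X2 recovers the pmf of X1 alone.
marginal_x1 = defaultdict(Fraction)
for (x1, _x2), p in joint_pmf.items():
    marginal_x1[x1] += p

for vector, p in sorted(joint_pmf.items()):
    print(f"Pr(X = {list(vector)}) = {float(p)}")
```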
space, we can only define one true probability function (why!?)
expectations, variances, covariances, and begin our discussion