Principles of Data Reduction Introduction to BIOSTAT602 Lecture 01 - - PowerPoint PPT Presentation


Biostatistics 602 - Statistical Inference Lecture 01 Introduction to BIOSTAT602 Principles of Data Reduction

Hyun Min Kang January 10th, 2013

Hyun Min Kang Biostatistics 602 - Lecture 01 January 10th, 2013 1 / 39


Today’s Outline

  • Course Syllabus
  • Overview of BIOSTAT602
  • Sufficient Statistics


Basic Polls : Home Department


What is your home department?


  • Biostatistics
  • Statistics
  • Bioinformatics
  • Survey Methodology
  • Other Departments


Basic Polls : Official Roster


Are you taking the class, or just sitting in?


  • Taking for credit
  • Sitting in
  • Plan to take, but need permission


Basic Polls : 601 History


Have you taken BIOSTAT601 or an equivalent class?


  • I took BIOSTAT601.
  • I took a BIOSTAT601-equivalent class.
  • I do not have a BIOSTAT601-equivalent background.


BIOSTAT602 - Course Information

Instructor

  • Name: Hyun Min Kang
  • Office: M4531, SPH II
  • E-mail: hmkang@umich.edu
  • Office hours: Thursday 4:30-5:30pm

Course Web Page

  • See http://genome.sph.umich.edu/wiki/602
  • No C-Tools site will be available in 2013.


BIOSTAT602 - Basic Information

Class Time and Location

  • Time: Tuesday and Thursday 1:00-3:00pm
  • Location: USB 2260

Prerequisites

  • BIOSTAT601 or equivalent knowledge (Chapters 1-5.5 of Casella and Berger)
  • Basic calculus and matrix algebra


BIOSTAT602 - Textbooks

Required Textbook

  • Statistical Inference, 2nd Edition, by Casella and Berger

Recommended Textbooks

  • Statistical Inference, by Garthwaite, Jolliffe and Jones
  • All of Statistics: A Concise Course in Statistical Inference, by Wasserman
  • Mathematical Statistics: Basic Ideas and Selected Topics, by Bickel and Doksum


Grading

  • Homework 20%
  • Midterm 40%
  • Final 40%


Important Dates

  • First Lecture : Thursday January 10th, 2013
  • Midterm : 1:00pm - 3:00pm, Thursday February 21st, 2013
  • No lectures on March 5th and 7th (Vacation)
  • No lecture on April 2nd (Instructor out of town)
  • Last Lecture : Tuesday April 23rd, 2013 (Total of 26 lectures)
  • Final : 4:00pm - 6:00pm, Thursday April 25th, 2013 (University-wide schedule)


Honor code

  • The honor code is STRONGLY enforced throughout the course.
  • The key principle is that all your homework and exams must be your own.
  • See http://www.sph.umich.edu/academics/policies/conduct.html for details.
  • You are encouraged to discuss the homework with your colleagues.
  • You are NOT allowed to share any part of your homework with your colleagues, either electronically or as a hard copy.
  • If a violation of the honor code is identified, the entire homework (or exam) will be graded as zero, whereas an incomplete homework submission will still be considered for partial credit.


About the style of the class

  • In previous years, the instructors wrote the notes on the whiteboard or projected them onto a screen during class.
  • In this class, we will use prepared slides for the sake of clarity.
  • For this reason, this class runs the risk of becoming a slot for an after-lunch nap.
  • The instructor strongly encourages you to copy the slides by hand during class to digest the material, although all slides will be available online.
  • Staying focused during class will help a lot.
  • Feedback on the class, especially on the lecture style, would be very much appreciated.


“Statistical Inference”

Probability in BIOSTAT601

Given some specified probability mass function (pmf) or probability density function (pdf), we can make probabilistic statements about data that could be generated from the model.

Statistical Inference in BIOSTAT602

A process of drawing conclusions or making statements about a population of data based on a random sample of data from the population.


Notations in BIOSTAT602

  • X1, · · · , Xn : Random variables identically and independently distributed (iid) with probability density (or mass) function fX(x|θ).
  • x1, · · · , xn : Realization of random variables X1, · · · , Xn.
  • X = (X1, · · · , Xn) is a random sample of a population (typically iid), and the characteristics of this population are described by fX(x|θ).
  • The joint pdf (or pmf) of X = (X1, · · · , Xn) (assuming iid) is

$$f_{\mathbf{X}}(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)$$
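To make the product formula concrete, here is a minimal Python sketch for the Bernoulli case, assuming fX(x|θ) = θ^x (1 − θ)^(1−x) for x ∈ {0, 1}. The function names are hypothetical and chosen for illustration only.

```python
from math import prod


def bernoulli_pmf(x: int, theta: float) -> float:
    """pmf of one Bernoulli(theta) observation: theta**x * (1 - theta)**(1 - x)."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli observation must be 0 or 1")
    return theta if x == 1 else 1.0 - theta


def joint_pmf(xs, theta: float) -> float:
    """Joint pmf of an iid sample: the product of the marginal pmfs."""
    return prod(bernoulli_pmf(x, theta) for x in xs)


# Five tosses with three heads: the product equals 0.3**3 * 0.7**2
# (up to floating-point rounding).
print(joint_pmf([1, 0, 0, 1, 1], 0.3))
```

Because the sample is iid, the joint pmf depends on the data only through how many 1s and 0s appear, which foreshadows the sufficiency discussion later in the lecture.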


BIOSTAT601 vs BIOSTAT602

BIOSTAT601

In BIOSTAT601, we assume that θ is known when making probabilistic statements about X1, · · · , Xn.

BIOSTAT602

In BIOSTAT602, we do not know the true value of the parameter θ; instead, we try to learn about this true parameter value through the observed data x1, · · · , xn.

Example of BIOSTAT601 Questions

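For a sample size n, let X1, · · · , Xn be i.i.d. Bernoulli(p0). What is the probability that $\sum_{i=1}^{n} X_i \le m$?

$$\sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p_0)$$

$$\Pr\left(\sum_{i=1}^{n} X_i \le m\right) = \sum_{k=0}^{m} \binom{n}{k} p_0^{k} (1 - p_0)^{n-k}$$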
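The binomial sum above can be checked against a brute-force enumeration of all 2ⁿ toss sequences. A minimal Python sketch (function names are hypothetical), assuming p0 is known, as in a BIOSTAT601-style question:

```python
from itertools import product
from math import comb


def binom_cdf(m: int, n: int, p0: float) -> float:
    """Pr(X_1 + ... + X_n <= m) for X_i iid Bernoulli(p0), via the Binomial(n, p0) pmf."""
    # Terms with k > n vanish, so the sum is truncated at min(m, n).
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(min(m, n) + 1))


def brute_force_cdf(m: int, n: int, p0: float) -> float:
    """The same probability, by summing the joint pmf over all 2**n toss sequences."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        t = sum(xs)
        if t <= m:
            total += p0**t * (1 - p0) ** (n - t)
    return total


print(binom_cdf(3, 10, 0.4), brute_force_cdf(3, 10, 0.4))
```

The two functions should agree up to floating-point rounding; the enumeration is exponential in n and is only meant as a sanity check of the closed form.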


Example of BIOSTAT602 Questions

We assume that the data were generated by a pdf (or pmf) that belongs to a class of pdfs (or pmfs):

$$\mathcal{P} = \{ f_X(x \mid \theta) : \theta \in \Omega \subset \mathbb{R}^p \}$$

For example, X ∼ Bernoulli(θ), θ ∈ (0, 1) = Ω ⊂ R. We collect data in order to

  1. Estimate θ (point estimation).
  2. Perform tests of hypotheses about θ.
  3. Estimate confidence intervals for θ (interval estimation).
  4. Make predictions of future data.


BIOSTAT602: Examples of informal questions

  1. Estimate θ (point estimation).
     • What is the estimated probability of heads, given a series of coin tosses?
  2. Perform tests of hypotheses about θ.
     • Given a series of coin tosses, can you tell whether or not the coin is biased?
  3. Estimate confidence intervals for θ (interval estimation).
     • What is the plausible range of the true probability of heads, given a series of coin tosses?
  4. Make predictions of future data.
     • Given the series of coin tosses, can you predict the outcome of the next coin toss?
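As a preview only, the coin-toss questions can be answered with common textbook choices: the sample proportion as the point estimate, a Wald interval p̂ ± z·√(p̂(1 − p̂)/n), and a plug-in prediction of the more likely outcome. These particular methods, the default z = 1.96, and the function name are assumptions for illustration, not necessarily the procedures developed in this course; the Wald interval in particular behaves poorly for small n or θ near 0 or 1.

```python
from math import sqrt


def summarize_tosses(tosses, z: float = 1.96):
    """Return (p_hat, (lower, upper), predicted_next) for a list of 0/1 tosses.

    p_hat          : sample proportion of heads (point estimate of theta)
    (lower, upper) : Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]
    predicted_next : 1 (heads) if p_hat > 0.5, 0 (tails) if p_hat < 0.5, None on a tie
    """
    n = len(tosses)
    if n == 0:
        raise ValueError("need at least one toss")
    p_hat = sum(tosses) / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    interval = (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))
    if p_hat > 0.5:
        predicted_next = 1
    elif p_hat < 0.5:
        predicted_next = 0
    else:
        predicted_next = None
    return p_hat, interval, predicted_next


print(summarize_tosses([1, 1, 1, 0, 1, 0, 1, 1, 0, 1]))
```

For the hypothesis question, a Wald test of θ = 0.5 at the 5% level that uses the same estimated standard error rejects exactly when 0.5 falls outside this interval.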


Data Reduction

Data

x1, · · · , xn : Realization of random variables X1, · · · , Xn.

Data Reduction

Define a function of the data

$$T(x_1, \cdots, x_n) : \mathbb{R}^n \to \mathbb{R}^d$$

We wish this summary of the data to

  1. Be simpler than the original data, e.g. d ≤ n.
  2. Keep all the information about θ that is contained in the original data x1, · · · , xn.


Statistic

T(X1, · · · , Xn) = T(X)

  • It is a function of random variables X1, · · · , Xn.
  • T(X) itself is also a random variable.
  • T(X) defines a form of data reduction or data summary.


Data Reduction

Data reduction in terms of a statistic T(X) is a partition of the sample space 𝒳.

Example

Suppose Xi ∼ Bernoulli(p), i.i.d. for i = 1, 2, 3, and 0 < p < 1. Define T(X1, X2, X3) = X1 + X2 + X3; then T : {0, 1}³ → {0, 1, 2, 3}.


Example of Data Reduction

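Partition   (X1, X2, X3)                        T(X) = X1 + X2 + X3
A0          (0, 0, 0)                           0
A1          (1, 0, 0), (0, 1, 0), (0, 0, 1)     1
A2          (1, 1, 0), (1, 0, 1), (0, 1, 1)     2
A3          (1, 1, 1)                           3

$$\mathcal{T} = \{ t : t = T(\mathbf{x}) \text{ for some } \mathbf{x} \in \mathcal{X} \}, \qquad A_t = \{ \mathbf{x} : T(\mathbf{x}) = t \}, \quad t \in \mathcal{T}$$

Instead of reporting x = (x1, x2, x3)ᵀ, we report only T(X) = t, or equivalently x ∈ At.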
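The partition above can be reproduced by brute force. A minimal Python sketch (the helper name partition_by_statistic is hypothetical) that enumerates the sample space {0, 1}³ and groups each point x by the value of T(x):

```python
from collections import defaultdict
from itertools import product


def partition_by_statistic(n: int, statistic):
    """Group every point of the sample space {0, 1}^n by the value of a statistic.

    Returns a dict mapping t -> A_t, the list of sample points x with statistic(x) == t.
    """
    parts = defaultdict(list)
    for x in product((0, 1), repeat=n):
        parts[statistic(x)].append(x)
    return dict(parts)


A = partition_by_statistic(3, sum)  # T(x) = x1 + x2 + x3
for t in sorted(A):
    print(f"A{t}: {A[t]}")
# A0: [(0, 0, 0)]
# A1: [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
# A2: [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
# A3: [(1, 1, 1)]
```

The 8 sample points fall into 4 blocks, one per value t ∈ {0, 1, 2, 3}; passing a different statistic yields a different (possibly finer or coarser) partition.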


Example of Data Reduction

The partition of the sample space based on T(X) is “coarser” than the original sample space.

  • There are 8 elements in the sample space 𝒳.
  • They are partitioned into 4 subsets.
  • Thus, T(X) is simpler (or coarser) than X.


Sufficient Statistics

Definition 6.2.1

A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.

In other words, the conditional pdf or pmf of X given T = t,

$$f_{\mathbf{X}}(\mathbf{x} \mid T(\mathbf{X}) = t) = h(\mathbf{x}),$$

does not depend on θ.


Sufficient Statistics: Example

  • Suppose X1, · · · , Xn are i.i.d. Bernoulli(p), 0 < p < 1.
  • Claim: T(X1, · · · , Xn) = $\sum_{i=1}^{n} X_i$ is a sufficient statistic for p.

Proof: Overview

  • $T(X) = \sum_{i=1}^n X_i \sim \text{Binomial}(n, p)$.
  • We need to find the conditional pmf of X given T = t,
  • and show that this distribution does not depend on p.

Detailed Proof

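$$
\Pr\left(X_1 = x_1, \dots, X_n = x_n \,\middle|\, \sum_{i=1}^n X_i = t\right)
= \frac{\Pr\left(X_1 = x_1, \dots, X_n = x_n, \sum_{i=1}^n X_i = t\right)}{\Pr\left(\sum_{i=1}^n X_i = t\right)}
= \begin{cases}
\dfrac{\Pr(X_1 = x_1, \dots, X_n = x_n)}{\Pr\left(\sum_{i=1}^n X_i = t\right)} & \text{if } \sum_{i=1}^n x_i = t \\
0 & \text{otherwise}
\end{cases}
$$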

Detailed Proof (cont’d)

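If $\sum_{i=1}^n x_i = t$, then, using $T = \sum_{i=1}^n X_i \sim \text{Binomial}(n, p)$:

$$
\Pr(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^n \Pr(X_i = x_i) = p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n} = p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}
$$

$$
\Pr\left(\sum_{i=1}^n X_i = t\right) = \binom{n}{t} p^t (1-p)^{n-t}
$$

$$
\Pr\left(X = x \,\middle|\, \sum_{i=1}^n X_i = t\right) = \frac{p^t (1-p)^{n-t}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}}
$$

where the numerator uses $\sum_{i=1}^n x_i = t$.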

Detailed Proof (cont’d)

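Therefore, the conditional distribution is

$$
\Pr\left(X = x \,\middle|\, \sum_{i=1}^n X_i = t\right) =
\begin{cases}
\dfrac{1}{\binom{n}{t}} & \text{if } \sum_{i=1}^n x_i = t \\
0 & \text{otherwise}
\end{cases}
$$

Because $\Pr(X = x \mid T(X) = t)$ does not depend on p, $T(X) = \sum_{i=1}^n X_i$ is, by definition, a sufficient statistic for p.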
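A Monte Carlo check of the same result, as a rough sketch: simulate Bernoulli(p) samples, keep only those with ∑xi = t, and compare the empirical conditional frequencies with 1/(n choose t) for two different values of p. The function name `simulate_conditional`, the choice n = 4, t = 2, and the number of draws are illustrative assumptions.

```python
import random
from collections import Counter
from math import comb

def simulate_conditional(n, t, p, draws, seed=0):
    """Monte Carlo estimate of Pr(X = x | sum(X) = t) for i.i.d. Bernoulli(p)."""
    rng = random.Random(seed)
    counts = Counter()
    kept = 0
    for _ in range(draws):
        x = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(x) == t:  # keep only samples that fall in A_t
            counts[x] += 1
            kept += 1
    return {x: c / kept for x, c in counts.items()}

n, t = 4, 2
for p in (0.3, 0.7):
    freqs = simulate_conditional(n, t, p, draws=200_000)
    # Largest gap between an empirical frequency and the exact value 1 / C(n, t).
    print(p, len(freqs), max(abs(f - 1 / comb(n, t)) for f in freqs.values()))
```

For both values of p, all 6 points of A_2 should appear with frequency close to 1/6, up to simulation noise.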

Note from the proof

If x is a sample point such that T(x) ≠ t, then Pr(X = x | T(X) = t) = 0 always, so from now on we only need to consider sample points with T(x) = t.

A Theorem for Sufficient Statistics

Theorem 6.2.2

Let $f_X(x \mid \theta)$ be the joint pdf or pmf of X, and let $q(t \mid \theta)$ be the pdf or pmf of T(X). Then T(X) is a sufficient statistic for θ if, for every $x \in \mathcal{X}$, the ratio $f_X(x \mid \theta) / q(T(x) \mid \theta)$ is constant as a function of θ.

Proof of Theorem 6.2.2 - discrete case

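$$
\Pr(X = x \mid T(X) = t) = \frac{\Pr(X = x, T(X) = t)}{\Pr(T(X) = t)}
= \begin{cases}
\dfrac{\Pr(X = x)}{\Pr(T(X) = t)} & \text{if } T(x) = t \\
0 & \text{otherwise}
\end{cases}
= \begin{cases}
\dfrac{f_X(x \mid \theta)}{q(T(x) \mid \theta)} & \text{if } T(x) = t \\
0 & \text{otherwise}
\end{cases}
$$

The second equality holds because, when T(x) = t, the event {X = x} is contained in {T(X) = t}. The final expression does not depend on θ by assumption. Therefore, T(X) is a sufficient statistic for θ.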

Example 6.2.3 - Binomial Sufficient Statistic

Problem

  • $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p)$, $0 < p < 1$.
  • Show that $T(X) = \sum_{i=1}^n X_i$ is a sufficient statistic for p.

This is the same problem as in the earlier example, but this time we solve it using Theorem 6.2.2.

Example 6.2.3 - Binomial Sufficient Statistic

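Proof

$$
f_X(x \mid p) = p^{x_1}(1-p)^{1-x_1} \cdots p^{x_n}(1-p)^{1-x_n} = p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}
$$

Since $T(X) \sim \text{Binomial}(n, p)$, we have $q(t \mid p) = \binom{n}{t} p^t (1-p)^{n-t}$, and

$$
\frac{f_X(x \mid p)}{q(T(x) \mid p)}
= \frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{\binom{n}{\sum_{i=1}^n x_i} p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}
= \frac{1}{\binom{n}{\sum_{i=1}^n x_i}}
= \frac{1}{\binom{n}{T(x)}}
$$

which does not depend on p. By Theorem 6.2.2, T(X) is a sufficient statistic for p.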
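Theorem 6.2.2 suggests a direct numerical check: evaluate the ratio f_X(x | p) / q(T(x) | p) at several values of p and confirm that it does not move. The sketch below assumes n = 5 and uses exact `fractions.Fraction` arithmetic so that equality tests are exact; the helper names are hypothetical. A finite grid of p values illustrates the result but does not replace the proof above.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(x, p):
    """f_X(x | p) = p^{sum x_i} (1 - p)^{n - sum x_i} for i.i.d. Bernoulli(p)."""
    s = sum(x)
    return p**s * (1 - p) ** (len(x) - s)

def binomial_pmf(t, n, p):
    """q(t | p) for T(X) = sum X_i ~ Binomial(n, p)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)

def ratio_is_constant_in_p(n, ps):
    """Theorem 6.2.2 check: f_X(x | p) / q(T(x) | p) must not change with p."""
    for x in product((0, 1), repeat=n):
        ratios = {joint_pmf(x, p) / binomial_pmf(sum(x), n, p) for p in ps}
        # All ratios should collapse to the single value 1 / C(n, T(x)).
        if ratios != {Fraction(1, comb(n, sum(x)))}:
            return False
    return True

print(ratio_is_constant_in_p(5, [Fraction(1, 10), Fraction(1, 3), Fraction(9, 10)]))
```

With these inputs the call prints True.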

Example 6.2.4 - Normal Sufficient Statistic

Problem

  • $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$.
  • Assume that $\sigma^2$ is known.
  • Show that the sample mean $T(X) = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is a sufficient statistic for µ.

Example 6.2.4 - Proof

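$f_X(x \mid \mu)$

$$
\begin{aligned}
f_X(x \mid \mu) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) \\
&= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^2}{2\sigma^2}\right)
= (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)
\end{aligned}
$$

The last step expands the square; the cross term $2(\bar{x} - \mu)\sum_{i=1}^n (x_i - \bar{x})$ vanishes because $\sum_{i=1}^n (x_i - \bar{x}) = 0$.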

Example 6.2.4 - Proof (cont’d)

$q(T(x) \mid \mu)$

Remember from BIOSTAT601 that $T(X) = \bar{X} \sim N(\mu, \sigma^2/n)$, so

$$
q(T(x) \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)
$$
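The BIOSTAT601 fact used here can be sanity-checked by simulation. This sketch assumes illustrative values n = 10, µ = 1 and σ² = 4 (so σ²/n = 0.4); `simulate_sample_means` is a hypothetical helper, and the printed values should only be close to, not equal to, 1.0 and 0.4.

```python
import random
import statistics

def simulate_sample_means(n, mu, sigma2, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma2) and return their sample means."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5  # random.gauss takes the standard deviation, not the variance
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

means = simulate_sample_means(n=10, mu=1.0, sigma2=4.0, reps=20_000)
print(statistics.fmean(means), statistics.variance(means))  # roughly 1.0 and 0.4
```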

Example 6.2.4 - Proof

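Putting things together

$$
\frac{f_X(x \mid \mu)}{q(T(x) \mid \mu)}
= \frac{(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)}{(2\pi\sigma^2/n)^{-1/2} \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)}
= n^{-1/2}(2\pi\sigma^2)^{-(n-1)/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{2\sigma^2}\right)
$$

which does not depend on µ. By Theorem 6.2.2, the sample mean is a sufficient statistic for µ.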
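The same ratio can be evaluated numerically on the log scale. This is a sketch assuming arbitrary illustrative data and a known σ² = 2; the helper names are hypothetical. For every µ tried, log f_X(x | µ) minus log q(x̄ | µ) should agree with the closed form above up to floating-point rounding.

```python
import math

def log_joint_normal(x, mu, sigma2):
    """log f_X(x | mu) for i.i.d. N(mu, sigma2) observations."""
    n = len(x)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)

def log_mean_density(xbar, n, mu, sigma2):
    """log q(xbar | mu), using Xbar ~ N(mu, sigma2 / n)."""
    return -0.5 * math.log(2 * math.pi * sigma2 / n) - n * (xbar - mu) ** 2 / (2 * sigma2)

def log_ratio(x, mu, sigma2):
    """log of f_X(x | mu) / q(T(x) | mu) with T(x) = sample mean."""
    xbar = sum(x) / len(x)
    return log_joint_normal(x, mu, sigma2) - log_mean_density(xbar, len(x), mu, sigma2)

x = [1.2, -0.4, 2.5, 0.9, 1.7]  # arbitrary illustrative data
sigma2 = 2.0                     # sigma^2 treated as known
xbar = sum(x) / len(x)
# log of n^{-1/2} (2 pi sigma^2)^{-(n-1)/2} exp(-sum (x_i - xbar)^2 / (2 sigma^2))
closed_form = (-0.5 * math.log(len(x))
               - 0.5 * (len(x) - 1) * math.log(2 * math.pi * sigma2)
               - sum((xi - xbar) ** 2 for xi in x) / (2 * sigma2))
for mu in (-3.0, 0.0, 1.5, 10.0):
    print(mu, log_ratio(x, mu, sigma2), closed_form)
```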

Summary

Today

  • Overview of BIOSTAT602
  • Key differences between BIOSTAT601 and BIOSTAT602
  • Sufficient Statistics
    • The conditional distribution of X given T(X) does not depend on θ.
    • If the ratio of the pdfs (or pmfs) of X and T(X) does not depend on θ, T(X) is a sufficient statistic.

Next Lecture

  • More on Sufficient Statistics
  • Factorization Theorem
