Welcome back.

Project comments available on Glookup! Turn in homework! I am away April 15–20. Midterm out when I get back: a few-day take-home, shiftable. Get a handle on your projects before then. Progress report due Monday.

Two populations.

DNA data:
human1: A ··· C ··· T ··· A
human2: C ··· C ··· A ··· T
human3: A ··· G ··· T ··· T
Each position where the letter varies across people is a Single Nucleotide Polymorphism (SNP). Same population? Model: the same population interbreeds.
Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn. Which population?
Comment: SNPs could be movie preferences, populations could be types. E.g., republican/democrat, shopper/saver.

Which population?

Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn.
Population 1: snp i: Pr[xi = 1] = p_i^(1)
Population 2: snp i: Pr[xi = 1] = p_i^(2)
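
For the discrete SNP model above (before the Gaussian simplification that follows), the natural classifier is a log-likelihood-ratio test. A minimal sketch, assuming NumPy; `classify_snps` and the toy probabilities are illustrative names and values, not the lecture's, with A encoded as 1 and T as 0:

```python
import numpy as np

def classify_snps(x, p1, p2):
    """Log-likelihood-ratio test for the Bernoulli SNP model.

    x  : 0/1 array of an individual's SNP values x_i.
    p1 : p_i^(1) = Pr[x_i = 1] under population 1.
    p2 : p_i^(2) = Pr[x_i = 1] under population 2.
    Returns 1 or 2, whichever population is more likely.
    """
    # log Pr[x | pop] = sum_i x_i log p_i + (1 - x_i) log(1 - p_i)
    ll1 = np.sum(x * np.log(p1) + (1 - x) * np.log(1 - p1))
    ll2 = np.sum(x * np.log(p2) + (1 - x) * np.log(1 - p2))
    return 1 if ll1 >= ll2 else 2

# Tiny example in the spirit of snp 843 (A = 1, T = 0).
p1 = np.array([0.4, 0.4, 0.4])   # Pr[x_i = 1] in population 1
p2 = np.array([0.6, 0.6, 0.6])   # Pr[x_i = 1] in population 2
print(classify_snps(np.array([1, 1, 0]), p1, p2))  # prints 2: leans toward population 2
```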

Simpler Calculation:
Population 1: Gaussian with mean µ1 ∈ R^d, standard deviation σ in each dimension.
Population 2: Gaussian with mean µ2 ∈ R^d, standard deviation σ in each dimension.

Gaussians

Population 1: Gaussian with mean µ1 ∈ R^d, standard deviation σ in each dimension.
Population 2: Gaussian with mean µ2 ∈ R^d, standard deviation σ in each dimension.
Difference between humans: σ per SNP. Difference between populations: ε per SNP.
How many SNPs must we collect to determine the population of an individual x?
Say x is in population 1. Then
E[(x − µ1)²] = dσ²
E[(x − µ2)²] ≥ (d − 1)σ² + (µ1 − µ2)².
If (µ1 − µ2)² = dε² >> σ², the expectations differ. → Take d >> σ²/ε².
Variance of the estimator? Roughly dσ⁴. The signal is the difference between the expectations: roughly dε².
Signal >> Noise ↔ dε² >> √d σ². Need d >> σ⁴/ε⁴.
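
A quick simulation (not from the slides) confirms the scales. A minimal NumPy sketch, with σ = 1 and ε = 0.1 as assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, eps = 1.0, 0.1
d = 10 * int(sigma**4 / eps**4)        # the d >> sigma^4/eps^4 regime

mu1 = np.zeros(d)
mu2 = mu1 + eps                        # means differ by eps in each coordinate
x = rng.normal(mu1, sigma)             # x drawn from population 1

# E||x - mu1||^2 = d*sigma^2 and E||x - mu2||^2 = d*sigma^2 + d*eps^2.
# Here the gap d*eps^2 (about 1000) dwarfs the ~sqrt(d)*sigma^2 noise (about 450).
print(np.sum((x - mu1) ** 2), np.sum((x - mu2) ** 2))
```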

Projection

Population 1: Gaussian with mean µ1 ∈ R^d, standard deviation σ in each dimension.
Population 2: Gaussian with mean µ2 ∈ R^d, standard deviation σ in each dimension.
Difference between humans: σ per SNP. Difference between populations: ε per SNP.
Project x onto the unit vector v in the direction of µ2 − µ1.
E[(x − µ1)·v] = 0 if x is from population 1.
E[((x − µ1)·v)²] ≥ (µ1 − µ2)² if x is from population 2.
The noise (std deviation) is now ~σ², versus √d σ² before! No loss in signal! dε² >> σ² → d >> σ²/ε², versus d >> σ⁴/ε⁴. A quadratic difference in the amount of data!
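
A sketch of the projection classifier under the same assumed parameters; `classify_by_projection` and the midpoint threshold are illustrative choices, not the lecture's notation. Projecting onto v reduces the noise to one dimension's worth, so d on the order of σ²/ε² already works:

```python
import numpy as np

def classify_by_projection(x, mu1, mu2):
    # Unit vector in the direction mu2 - mu1; threshold at the midpoint.
    v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)
    return 1 if (x - (mu1 + mu2) / 2.0) @ v < 0 else 2

rng = np.random.default_rng(1)
sigma, eps = 1.0, 0.1
d = 10 * int(sigma**2 / eps**2)        # only d >> sigma^2/eps^2 now
mu1, mu2 = np.zeros(d), np.full(d, eps)
x = rng.normal(mu1, sigma)             # drawn from population 1

# Projection noise is ~sigma; the separation is |mu1 - mu2| = sqrt(d)*eps ≈ 3.1*sigma.
print(classify_by_projection(x, mu1, mu2))   # usually 1
```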


Don’t know much about...

Don’t know µ1 or µ2?

Without the means?

Sample of n people: some (say half) from population 1, some from population 2. Which are which?
Near-Neighbors Approach: compute squared Euclidean distances, then cluster using a threshold.
The signal E[d(x1,y1)] − E[d(x1,x2)] should be larger than the noise in d(x,y), where the x's are from one population and the y's from the other.
Signal is proportional to dε². Noise is proportional to √d σ².
d >> σ⁴/ε⁴ → people of the same type are closer to each other.
d >> (σ⁴/ε⁴) log n suffices for threshold clustering; the log n factor pays for a union bound over the (n choose 2) pairs.
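
A runnable sketch of the near-neighbors idea, assuming NumPy; `threshold_cluster` and the threshold `tau` are hypothetical names and choices. Within-population squared distances concentrate near 2dσ² and cross-population ones near 2dσ² + dε², so a threshold between the two scales usually recovers the partition when d >> (σ⁴/ε⁴) log n:

```python
import numpy as np

def threshold_cluster(X, tau):
    # Greedy single-linkage with a hard threshold on squared distance.
    n = len(X)
    labels = -np.ones(n, dtype=int)
    for i in range(n):
        if labels[i] == -1:
            labels[i] = labels.max() + 1          # open a new cluster
        for j in range(i + 1, n):
            if np.sum((X[i] - X[j]) ** 2) < tau:
                labels[j] = labels[i]
    return labels

rng = np.random.default_rng(2)
sigma, eps, d, n = 1.0, 0.5, 8000, 20
who = rng.integers(0, 2, n)                       # hidden true populations
centers = np.stack([np.zeros(d), np.full(d, eps)])
X = centers[who] + rng.normal(0, sigma, (n, d))

# Within pairs ~ 2*d*sigma^2 = 16000; across pairs ~ 2*d*sigma^2 + d*eps^2 = 18000.
tau = 2 * d * sigma**2 + 0.5 * d * eps**2
print(who)
print(threshold_cluster(X, tau))                  # usually matches `who` up to relabeling
```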

Best one can do?

Principal components analysis.

Remember Projection! Don't know µ1 or µ2?
Principal component analysis: find the direction v of maximum variance, i.e., maximize ∑(x·v)² (zero-center the points first).
Recall: (x·v)² could determine the population.
Variance in a typical direction: nσ². In the direction along µ1 − µ2: ∝ n(µ1 − µ2)² ∝ ndε².
Need d >> σ²/ε² at least.
When will PCA pick the correct direction with good probability? Union bound over directions. How many directions? Infinity... and beyond!

Nets

“δ-net”: a set D of directions such that every direction v is close to some x ∈ D: x·v ≥ 1 − δ.
A δ-net: the vectors [···, iδ/d, ···] with integers i ∈ [−d/δ, d/δ].
Total of N ∝ (d/δ)^O(d) vectors in the net.
Signal >> Noise times log N = O(d log(d/δ)) isolates the direction; the log N comes from the union bound over the vectors in the net.
Signal (expected projection): ∝ ndε². Noise (std dev): √n σ².
nd >> (σ⁴/ε⁴) log d and d >> σ²/ε² works.
Nearest neighbor works with very high d >> σ⁴/ε⁴. PCA can reduce to the “knowing the centers” case with a reasonable number of sample points.

PCA calculation.

Matrix A whose rows are the (centered) points. The first eigenvector of B = AᵀA is the maximum-variance direction: Av gives the projections of the points onto v, so vᵀBv = (Av)ᵀ(Av) = ∑x (x·v)².
The first eigenvector v of B maximizes vᵀBv over unit vectors: Bv = λv for the maximum eigenvalue λ → vᵀBv = λ for unit v. The eigenvectors form an orthonormal basis, so any other unit vector is av + w with w·v = 0, where w is composed of eigenvectors with (possibly) smaller eigenvalues → vᵀBv ≥ (av + w)ᵀB(av + w).
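
A minimal NumPy sketch of this calculation: form B = AᵀA from zero-centered rows and take the top eigenvector, here via numpy.linalg.eigh (the data-generation parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, eps, d, n = 1.0, 0.5, 200, 400
centers = np.stack([np.zeros(d), np.full(d, eps)])
A = centers[rng.integers(0, 2, n)] + rng.normal(0, sigma, (n, d))
A = A - A.mean(axis=0)                  # zero-center the points (rows)

B = A.T @ A
eigvals, eigvecs = np.linalg.eigh(B)    # eigenvalues in ascending order
v = eigvecs[:, -1]                      # first (top) eigenvector of B

# v should align with mu2 - mu1 (here proportional to the all-ones direction).
u = np.ones(d) / np.sqrt(d)
print(abs(v @ u))                       # close to 1 when the signal dominates
```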

Computing eigenvalues.

Power method: choose a random x. Repeat: let x = Bx; scale x to a unit vector.
Write x = a1v1 + a2v2 + ···. Then xt ∝ Bᵗx = a1λ1ᵗv1 + a2λ2ᵗv2 + ···, which is mostly v1 after a while, since λ1ᵗ >> λ2ᵗ.
Cluster Algorithm: choose a random partition. Repeat: compute the means of the partition; project; cluster. That is: choose a random +1/−1 vector; multiply by Aᵀ (the direction between the means); multiply by A (project the points); cluster (round back to a +1/−1 vector). Sort of repeatedly multiplying by AAᵀ: the power method.
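
A sketch of both halves, assuming NumPy; `power_method` is plain power iteration on B, and `cluster_power_method` is the rounded version described above (the names and parameter choices are illustrative):

```python
import numpy as np

def power_method(B, iters=100, seed=0):
    # Power iteration: x <- Bx, rescaled; converges to the top eigenvector of B.
    x = np.random.default_rng(seed).normal(size=B.shape[0])
    for _ in range(iters):
        x = B @ x
        x /= np.linalg.norm(x)
    return x

def cluster_power_method(A, iters=50, seed=0):
    # Rounded power method: alternately form the direction between the means
    # (A^T u) and project the points (A v), rounding to a +/-1 partition vector.
    u = np.random.default_rng(seed).choice([-1.0, 1.0], size=A.shape[0])
    for _ in range(iters):
        v = A.T @ u                     # direction between the two means
        u = np.sign(A @ v)              # project the points, then round
    return u

rng = np.random.default_rng(4)
sigma, eps, d, n = 1.0, 0.5, 400, 100
who = rng.integers(0, 2, n)
A = np.where(who[:, None] == 1, eps, 0.0) + rng.normal(0, sigma, (n, d))
A = A - A.mean(axis=0)

print(abs(power_method(A.T @ A) @ (np.ones(d) / np.sqrt(d))))  # ~1: top eigvec ≈ mean direction
print(who)
print(cluster_power_method(A))          # usually the partition `who`, up to sign/relabeling
```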

Sum up.

Clustering a mixture of Gaussians. Near neighbors works with sufficient data. Projection onto the subspace of the means is better. Principal component analysis can find the subspace of the means. The power method computes the principal component. A generic clustering algorithm is a rounded version of the power method.

See you on Thursday.