SLIDE 1
Welcome back.
Projects comments available on Glookup! Turn in homework! I am away April 15-20. Midterm out when I get back. Few days takehome. Shiftable. Have a handle on projects before that. Progress report due Monday.
SLIDE 8
Two populations.
DNA data:
human1: A ··· C ··· T ··· A
human2: C ··· C ··· A ··· T
human3: A ··· G ··· T ··· T
Single Nucleotide Polymorphism.
Same population?
Model: same population breeds.
Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn.
Which population?
Comment: snps could be movie preferences, populations could be types. E.g., republican/democrat, shopper/saver.
SLIDE 20
Which population?
Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn.
Population 1: snp i: Pr[xi = 1] = p_i^(1)
Population 2: snp i: Pr[xi = 1] = p_i^(2)
Simpler Calculation:
Population 1: Gaussian with mean µ1 ∈ R^d, variance σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, variance σ in each dim.
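The per-snp model above yields a maximum-likelihood classifier: compare the likelihood of the observed snps under each population's frequencies. A minimal sketch, with made-up frequencies (`p1`, `p2` and the 0.2 gap are illustrative, not real SNP data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-snp frequencies (p1, p2 and the 0.2 gap are illustrative).
n_snps = 400
p1 = rng.uniform(0.35, 0.45, n_snps)   # Population 1: Pr[x_i = 1]
p2 = p1 + 0.2                          # Population 2: Pr[x_i = 1]

def log_likelihood(x, p):
    """Log of Pr[x | p] for independent Bernoulli snps."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Draw one individual from population 1, classify by higher likelihood.
x = (rng.random(n_snps) < p1).astype(float)
guess = 1 if log_likelihood(x, p1) > log_likelihood(x, p2) else 2
```

With hundreds of snps and a constant per-snp gap, the likelihood ratio concentrates sharply on the true population.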
SLIDE 26
Gaussians
Population 1: Gaussian with mean µ1 ∈ R^d, std deviation σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, std deviation σ in each dim.
Difference between humans: σ per snp. Difference between populations: ε per snp.
How many snps to collect to determine population for individual x?
x in population 1.
E[(x − µ1)²] = dσ²
E[(x − µ2)²] ≥ (d − 1)σ² + (µ1 − µ2)².
If (µ1 − µ2)² = dε² >> σ², then different. → take d >> σ²/ε².
Variance of estimator? Roughly dσ⁴.
Signal is difference between expectations: roughly dε².
Signal >> Noise ↔ dε² >> √d σ².
Need d >> σ⁴/ε⁴.
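The squared-distance comparison above can be checked numerically. A sketch with illustrative parameters (σ, ε, and the factor 10 standing in for "much greater than" are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

sigma, eps = 1.0, 0.2
d = 10 * int(sigma**4 / eps**4)        # d >> sigma^4/eps^4, per the slide
mu1 = np.zeros(d)
mu2 = mu1 + eps                        # means differ by eps per snp

# Individual drawn from population 1, std deviation sigma per coordinate.
x = mu1 + sigma * rng.standard_normal(d)

# Expected gap between the two squared distances is d*eps^2;
# the noise in each is on the order of sqrt(d)*sigma^2.
closer_to_1 = np.sum((x - mu1) ** 2) < np.sum((x - mu2) ** 2)
```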
SLIDE 40
Projection
Population 1: Gaussian with mean µ1 ∈ R^d, variance σ in each dim.
Population 2: Gaussian with mean µ2 ∈ R^d, variance σ in each dim.
Difference between humans: σ per snp. Difference between populations: ε per snp.
Project x onto unit vector v in direction µ2 − µ1.
E[(x − µ1)·v] = 0 if x is population 1.
E[((x − µ1)·v)²] ≥ (µ1 − µ2)² if x is population 2.
Std deviation is σ²! versus √d σ²!
No loss in signal! dε² >> σ². → d >> σ²/ε².
Versus d >> σ⁴/ε⁴. A quadratic difference in amount of data!
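The gain from projecting can be sketched the same way: only d ≫ σ²/ε² snps are needed once we project onto the direction between the means (the factor 100 below is an arbitrary "much greater than"):

```python
import numpy as np

rng = np.random.default_rng(2)

sigma, eps = 1.0, 0.2
d = 100 * int(sigma**2 / eps**2)       # only d >> sigma^2/eps^2 now
mu1 = np.zeros(d)
mu2 = mu1 + eps

v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)   # unit vector between means

x = mu1 + sigma * rng.standard_normal(d)      # drawn from population 1

# Projected signal is |mu2 - mu1| = sqrt(d)*eps; projected noise is only sigma.
closer_to_1 = abs((x - mu1) @ v) < abs((x - mu2) @ v)
```

Compare with the previous sketch: the full-space test needed d on the order of σ⁴/ε⁴, while the projected test gets by with σ²/ε².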
SLIDE 52
Don’t know much about...
Don’t know µ1 or µ2?
SLIDE 53
Without the means?
Sample of n people. Some (say half) from population 1, some from population 2. Which are which?
Near Neighbors Approach:
Compute Euclidean distance squared. Cluster using threshold.
Signal E[d(x1,x2)] − E[d(x1,y1)] should be larger than noise in d(x,y), where x's are from one population, y's from the other.
Signal is proportional to dε². Noise is proportional to √d σ².
d >> σ⁴/ε⁴ → same type people closer to each other.
d >> (σ⁴/ε⁴) log n suffices for threshold clustering.
log n factor for union bound over (n choose 2) pairs.
Best one can do?
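The threshold-clustering step above can be sketched directly: same-population pairs concentrate near squared distance 2dσ², cross pairs near 2dσ² + dε², so a single threshold between them recovers the partition (σ, ε, n, and the factor 1000 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

sigma, eps, n = 1.0, 0.5, 10
d = 1000 * int(sigma**4 / eps**4)      # d >> (sigma^4/eps^4) log n
labels = np.array([1] * (n // 2) + [2] * (n // 2))
means = np.where((labels == 1)[:, None], 0.0, eps)   # per-population means
pts = means + sigma * rng.standard_normal((n, d))

# Pairwise squared distances: same-population pairs concentrate near
# 2*d*sigma^2, cross pairs near 2*d*sigma^2 + d*eps^2.
D2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
threshold = 2 * d * sigma**2 + d * eps**2 / 2
same_cluster = D2 < threshold
```

The union bound over all (n choose 2) pairs is what forces the log n factor in d.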
SLIDE 69
Principal components analysis.
Remember Projection! Don't know µ1 or µ2?
Principal component analysis: Find direction, v, of maximum variance.
Maximize ∑(x·v)² (zero center the points).
Recall: (x·v)² could determine population.
Typical direction variance: nσ².
Direction along µ1 − µ2: ∝ n(µ1 − µ2)² ∝ ndε².
Need d >> σ²/ε² at least.
When will PCA pick correct direction with good probability?
Union bound over directions. How many directions? Infinity and beyond!
SLIDE 86
Nets
"δ-Net": a set D of directions where all others, v, are close to some x ∈ D: x·v ≥ 1 − δ.
δ-Net: [··· , iδ/d, ···] for integers i ∈ [−d/δ, d/δ].
Total of N ∝ (d/δ)^O(d) vectors in net.
Signal >> Noise times log N = O(d log(d/δ)) to isolate direction.
log N is due to union bound over vectors in net.
Signal (exp. projection): ∝ ndε². Noise (std dev.): √n σ².
nd >> (σ⁴/ε⁴) log d and d >> σ²/ε² works.
Nearest neighbor works with very high d > σ⁴/ε⁴.
PCA can reduce d to the "knowing centers" case, with a reasonable number of sample points.
SLIDE 100
PCA calculation.
Matrix A where rows are points.
First eigenvector of B = AᵀA is the maximum variance direction.
Av are the projections onto v.
vᵀBv = (Av)ᵀ(Av) is ∑x (x·v)².
First eigenvector, v, of B maximizes xᵀBx over unit vectors x.
Bv = λv for maximum λ. → vᵀBv = λ for unit v.
Eigenvectors form an orthonormal basis.
Any other vector av + x with x·v = 0: x is composed of possibly smaller eigenvalue vectors.
→ vᵀBv ≥ (av + x)ᵀB(av + x) for unit v, av + x.
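These identities can be checked numerically on a synthetic, zero-centered two-population mixture (the sizes n, d and the gap ε are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)

# Rows of A are points: a synthetic mixture of two populations, means +/- eps
# along the all-ones direction, unit noise per coordinate.
n, d, eps = 200, 50, 1.0
signs = np.where(rng.random(n) < 0.5, 1.0, -1.0)
A = signs[:, None] * eps + rng.standard_normal((n, d))
A = A - A.mean(axis=0)                 # zero center the points

B = A.T @ A
eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
v = eigvecs[:, -1]                     # first (top) eigenvector
lam = eigvals[-1]

# v^T B v = (Av)^T (Av) = sum_x (x . v)^2, and equals lambda for unit v.
check1 = np.isclose(v @ B @ v, lam)
check2 = np.isclose(np.sum((A @ v) ** 2), lam)
# The top eigenvector aligns with the between-means direction (all-ones here).
u = np.ones(d) / np.sqrt(d)
aligned = abs(v @ u) > 0.9
```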
SLIDE 118
Computing eigenvalues.
Power method: Choose a random x. Repeat: let x = Bx; scale x to a unit vector.
Write x = a_1 v_1 + a_2 v_2 + ··· in the eigenvector basis. Then x_t ∝ B^t x = a_1 λ_1^t v_1 + a_2 λ_2^t v_2 + ···, which is mostly v_1 after a while, since λ_1^t >> λ_2^t.
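The power method loop can be sketched in numpy (a sketch with illustrative names; assumes B is symmetric, as it is for B = A^T A):

```python
# Power method: repeatedly multiply by B and renormalize; the iterate
# converges to the top eigenvector v1 because lambda_1^t >> lambda_2^t.
import numpy as np

def power_method(B, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=B.shape[0])    # choose random x
    for _ in range(iters):
        x = B @ x                      # let x = Bx
        x /= np.linalg.norm(x)         # scale x to a unit vector
    return x                           # ~ v1, up to sign

# Check against numpy's eigendecomposition on a random symmetric B.
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 6))
B = M.T @ M                            # symmetric PSD, like A^T A
v = power_method(B)
w, V = np.linalg.eigh(B)
v1 = V[:, -1]                          # exact top eigenvector
print(abs(v @ v1))                     # near 1: same direction up to sign
```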
SLIDE 127
Computing eigenvalues.
Cluster Algorithm: Choose a random partition. Repeat: compute the means of the partition; project; cluster. Concretely: choose a random +1/−1 vector; multiply by A^T (the direction between the means); multiply by A (project the points); cluster (round back to a +1/−1 vector). This is, roughly, repeated multiplication by A A^T: the power method.
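A minimal sketch of this rounded power iteration, under assumed details (two well-separated spherical Gaussians; all names illustrative):

```python
# Cluster algorithm as a rounded power method: alternate multiplying a
# +1/-1 labeling by A^T and A, then round back to +1/-1.
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 10
mu = np.zeros(d); mu[0] = 6.0
# Rows of A: n points near -mu, then n points near +mu.
A = np.vstack([rng.normal(size=(n, d)) - mu,
               rng.normal(size=(n, d)) + mu])
A = A - A.mean(axis=0)                 # center the data

labels = rng.choice([-1.0, 1.0], size=2 * n)   # random +1/-1 vector
for _ in range(20):
    direction = A.T @ labels           # ~ direction between the two means
    proj = A @ direction               # project the points onto it
    labels = np.sign(proj)             # cluster: round to a +1/-1 vector

# The first n points should share one sign, the last n the other.
print(abs(labels[:n].sum()), abs(labels[n:].sum()))
```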
SLIDE 133
Sum up.
Clustering mixtures of Gaussians. Nearest neighbor works with sufficient data. Projection onto the subspace of the means is better. Principal component analysis can find the subspace of the means. The power method computes the principal component. The generic clustering algorithm is a rounded version of the power method.
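Tying the summary together, an end-to-end sketch (assumed synthetic data, illustrative names): cluster a two-Gaussian mixture by projecting onto the top principal component and splitting at zero.

```python
# PCA-based clustering of a two-Gaussian mixture: the top eigenvector of
# X^T X approximately spans the line between the means, so the sign of the
# projection recovers the clusters.
import numpy as np

rng = np.random.default_rng(3)
d, n = 20, 300
mu = np.zeros(d); mu[0] = 5.0
pts = np.vstack([rng.normal(size=(n, d)) - mu,
                 rng.normal(size=(n, d)) + mu])
truth = np.r_[np.zeros(n), np.ones(n)]

X = pts - pts.mean(axis=0)             # center
_, vecs = np.linalg.eigh(X.T @ X)
v1 = vecs[:, -1]                       # top principal component

pred = (X @ v1 > 0).astype(float)
# Eigenvector sign is arbitrary, so score both labelings.
acc = max((pred == truth).mean(), ((1 - pred) == truth).mean())
print(acc)                             # close to 1.0 for well-separated means
```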
SLIDE 134