

SLIDE 1

Today.

Modelling. An Analysis of the Power of PCA. Musing (rant?) about algorithms in the real world.

Two populations.

DNA data:
human1: A ··· C ··· T ··· A
human2: C ··· C ··· A ··· T
human3: A ··· G ··· T ··· T
Each position is a Single Nucleotide Polymorphism (snp). Same population?
Model: the same population interbreeds, so each population has its own allele frequencies.
Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn. Which population?
Comment: snps could be movie preferences and populations could be types of people, e.g., republican/democrat, shopper/saver.
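The two-population snp model above can be sketched directly. A minimal simulation, with made-up allele frequencies (every snp reuses the snp-843 numbers) and a made-up number of snps:

```python
import random

random.seed(0)

def sample_individual(freqs):
    """Draw one individual: snp i is 'A' with probability freqs[i], else 'T'."""
    return ['A' if random.random() < p else 'T' for p in freqs]

d = 1000                  # hypothetical number of snps collected
pop1 = [0.4] * d          # Pr[A] = .4 at each snp in population 1
pop2 = [0.6] * d          # Pr[A] = .6 at each snp in population 2

human = sample_individual(pop1)
frac_A = human.count('A') / d
print(f"fraction of A alleles: {frac_A:.2f}")   # near .4 for a population-1 individual
```

A single snp says almost nothing; the fraction over many snps concentrates near the population's frequency, which is the whole question below: how many snps are enough?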

Which population?

Population 1: snp 843: Pr[A] = .4, Pr[T] = .6
Population 2: snp 843: Pr[A] = .6, Pr[T] = .4
Individual: x1, x2, x3, ..., xn. Which population?
Population 1: snp i: Pr[xi = 1] = pi(1)
Population 2: snp i: Pr[xi = 1] = pi(2)

Simpler Calculation: Population 1: Gaussian with mean µ1 ∈ Rd, std dev. σ in each dim. Population 2: Gaussian with mean µ2 ∈ Rd, std dev. σ in each dim.

Gaussians

Population 1: Gaussian with mean µ1 ∈ Rd, std dev. σ in each dim.
Population 2: Gaussian with mean µ2 ∈ Rd, std dev. σ in each dim.
Difference between humans: σ per snp. Difference between populations: ε per snp.
How many snps must we collect to determine the population of an individual x?
Say x is in population 1. Then
E[(x − µ1)2] = dσ2
E[(x − µ2)2] ≥ (d − 1)σ2 + (µ1 − µ2)2.
If (µ1 − µ2)2 = dε2 >> σ2, the two distances differ in expectation. → take d >> σ2/ε2.
But what is the variance of the estimator? Roughly dσ4, so its std dev. is roughly √d σ2.
The signal is the difference between the expectations: roughly dε2.
Need Signal >> Noise, i.e., dε2 >> √d σ2. → Need d >> σ4/ε4.
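The distance calculation above can be checked numerically. A sketch with illustrative values of d, σ, ε (chosen so d >> σ4/ε4 = 625):

```python
import numpy as np

rng = np.random.default_rng(0)

d, sigma, eps = 10_000, 1.0, 0.2
mu1 = np.zeros(d)
mu2 = mu1 + eps                  # means differ by eps per coordinate: (mu1 - mu2)^2 = d*eps^2

x = rng.normal(mu1, sigma)       # an individual drawn from population 1

d1 = np.sum((x - mu1) ** 2)      # ~ d*sigma^2
d2 = np.sum((x - mu2) ** 2)      # ~ d*sigma^2 + d*eps^2

signal = d * eps**2              # expected gap between the two squared distances
noise = np.sqrt(d) * sigma**2    # order of the std dev. of the squared-distance estimator
print(d2 - d1, signal, noise)
```

With these numbers the gap d2 − d1 concentrates around the signal dε2 = 400, well above the noise scale √d σ2 = 100, so the squared distances alone identify the population.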

Projection

Population 1: Gaussian with mean µ1 ∈ Rd, std dev. σ in each dim.
Population 2: Gaussian with mean µ2 ∈ Rd, std dev. σ in each dim.
Difference between humans: σ per snp. Difference between populations: ε per snp.
Project x onto the unit vector v in the direction µ2 − µ1.
E[((x − µ1)·v)2] = σ2 if x is in population 1.
E[((x − µ1)·v)2] ≥ (µ1 − µ2)2 if x is in population 2.
The std dev. of this estimator is σ2, versus √d σ2 before. And no loss in signal!
Need dε2 >> σ2. → d >> σ2/ε2, versus d >> σ4/ε4 before. A quadratic difference in the amount of data!
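The projection argument in code. A sketch with illustrative values, now with d only modestly larger than σ2/ε2 = 4, where the raw-distance test would be hopeless:

```python
import numpy as np

rng = np.random.default_rng(1)

d, sigma, eps = 100, 1.0, 0.5
mu1 = np.zeros(d)
mu2 = mu1 + eps
v = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)   # unit vector along the mean difference

x1 = rng.normal(mu1, sigma)    # from population 1
x2 = rng.normal(mu2, sigma)    # from population 2

p1 = (x1 - mu1) @ v            # mean 0, std sigma
p2 = (x2 - mu1) @ v            # mean |mu2 - mu1| = sqrt(d)*eps = 5, std sigma
print(p1, p2)
```

The two projections are separated by √d ε = 5 while each fluctuates only by σ = 1, so a simple threshold on the projection classifies the individual.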

Don’t know much about...

Don’t know µ1 or µ2?

SLIDE 2

Without the means?

Sample of n people, some (say half) from population 1 and the rest from population 2. Which are which?
Near Neighbors Approach: compute squared Euclidean distances, then cluster using a threshold.
The signal E[d(x1,y1)] − E[d(x1,x2)] should be larger than the noise in d(x,y), where the x's are from one population and the y's from the other.
The signal is proportional to dε2. The noise is proportional to √d σ2.
d >> σ4/ε4 → people of the same type are closer to each other in expectation.
d >> (σ4/ε4) log n suffices for threshold clustering; the log n factor is for a union bound over the n2 pairs.
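The threshold-clustering step can be sketched as follows; n, d, σ, ε are illustrative (d is taken comfortably above (σ4/ε4) log n), and the threshold sits halfway into the dε2 gap between same-population and cross-population pair distances:

```python
import numpy as np

rng = np.random.default_rng(0)

n, sigma, eps = 20, 1.0, 0.5
d = 8000
mu1 = np.zeros(d)
mu2 = mu1 + eps

# Half the sample from each population.
X = np.vstack([rng.normal(mu1, sigma, (n // 2, d)),
               rng.normal(mu2, sigma, (n // 2, d))])

# Pairwise squared distances: same-population pairs ~ 2*d*sigma^2,
# cross-population pairs ~ 2*d*sigma^2 + d*eps^2.
D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
threshold = 2 * d * sigma**2 + d * eps**2 / 2

# Cluster: person j joins person 0's group iff d(0, j) < threshold.
labels = (D[0] >= threshold).astype(int)
print(labels)
```

With d this large every same-population pair lands below the threshold and every cross pair above it, recovering the hidden partition exactly.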

Best one can do?

Principal components analysis.

Remember Projection! But what if we don't know µ1 or µ2?
Principal component analysis: find the direction, v, of maximum variance.
Maximize ∑x (x·v)2 (after zero-centering the points).
Recall: (x·v)2 could determine the population.
Variance along a typical direction: nσ2.
Variance along the direction µ1 − µ2: ∝ n(µ1 − µ2)2 ∝ ndε2.
Need d >> σ2/ε2 at least.
When will PCA pick the correct direction with good probability? Union bound over directions. How many directions are there? Infinity and beyond!
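A sketch of PCA recovering the unknown mean direction, with illustrative parameters (here the top right singular vector of the centered data matrix is the maximum-variance direction):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, sigma, eps = 200, 50, 1.0, 0.5
mu = np.ones(d) * eps            # mu2 - mu1 = eps per coordinate (taking mu1 = 0)
X = np.vstack([rng.normal(0.0, sigma, (n // 2, d)),
               rng.normal(mu, sigma, (n // 2, d))])
X = X - X.mean(axis=0)           # zero-center the points

# Direction of maximum variance = top right singular vector.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]

u = mu / np.linalg.norm(mu)      # the true (unknown to PCA) direction mu2 - mu1
print(abs(v @ u))                # close to 1: PCA found the mean direction
```

Once v ≈ (µ2 − µ1)/|µ2 − µ1| is in hand, the projection argument from the previous slide applies even though the means were never given.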

Nets

“δ-Net”: a set D of directions such that every other direction v is close to some x ∈ D: x·v ≥ 1 − δ.
One δ-net: vectors [···, iδ/d, ···] with integers i ∈ [−d/δ, d/δ].
Total of N ∝ (d/δ)^O(d) vectors in the net.
Need Signal >> Noise times log N = O(d log(d/δ)) to isolate the direction.

The log N is due to the union bound over the vectors in the net.
Signal (expected projection): ∝ ndε2. Noise (std dev.): √n σ2.
nd >> (σ4/ε4) log d and d >> σ2/ε2 works.
Nearest neighbor needs very high dimension, d >> σ4/ε4. PCA can reduce to the “knowing the centers” case with a reasonable number of sample points.

PCA calculation.

Matrix A whose rows are the points. The first eigenvector of B = ATA is the maximum-variance direction.
Av gives the projections of the points onto v, so vTBv = (Av)T(Av) = ∑x (x·v)2.
The first eigenvector, v, of B maximizes vTBv. Why?
Bv = λv for the maximum λ → vTBv = λ for unit v.
The eigenvectors form an orthonormal basis, so any other vector can be written av + x with x·v = 0, where x is composed of eigenvectors with (possibly) smaller eigenvalues.
→ vTBv ≥ (av + x)TB(av + x) for unit vectors v and av + x.
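The two identities above are easy to sanity-check numerically on random data:

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(30, 5))     # rows are points
B = A.T @ A

eigvals, eigvecs = np.linalg.eigh(B)   # B is symmetric: eigh returns sorted eigenvalues
v = eigvecs[:, -1]                      # top (unit) eigenvector

# v^T B v equals the sum over points of (x . v)^2 ...
assert np.isclose(v @ B @ v, np.sum((A @ v) ** 2))

# ... and no other unit vector achieves more projected variance.
for _ in range(100):
    w = rng.normal(size=5)
    w /= np.linalg.norm(w)
    assert w @ B @ w <= v @ B @ v + 1e-9
print("top eigenvector maximizes the projected variance")
```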

Computing eigenvalues.

Power method: choose a random x. Repeat: let x = Bx; scale x to a unit vector.
Write x = a1v1 + a2v2 + ···. Then
xt ∝ B^t x = a1λ1^t v1 + a2λ2^t v2 + ···,
which is mostly v1 after a while, since λ1^t >> λ2^t.
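A minimal power-method sketch, checked against an exact eigendecomposition (the iteration count is an arbitrary safe choice, not tuned to the spectral gap):

```python
import numpy as np

rng = np.random.default_rng(0)

def power_method(B, iters=500):
    """Repeat x <- Bx, rescaling to a unit vector; converges to the top eigenvector."""
    x = rng.normal(size=B.shape[0])
    for _ in range(iters):
        x = B @ x
        x /= np.linalg.norm(x)
    return x

A = rng.normal(size=(40, 8))
B = A.T @ A                       # symmetric positive semi-definite
x = power_method(B)

_, eigvecs = np.linalg.eigh(B)
v1 = eigvecs[:, -1]
print(abs(x @ v1))                # ~1: power iteration found v1 up to sign
```

The random start matters only in that its v1-component a1 is nonzero, which holds almost surely.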

Cluster Algorithm: choose a random partition. Repeat: compute the means of the partition; project and cluster.
Equivalently: choose a random +1/−1 vector. Multiply by AT (gives the direction between the means), multiply by A (projects the points), then cluster (round back to a +1/−1 vector).
This is, roughly, repeatedly multiplying by AAT: the power method.
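A sketch of this rounded power method on a planted two-population mixture; the parameters are illustrative, and convergence from a random partition is only heuristic here (no guarantee is claimed):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, sigma, eps = 100, 50, 1.0, 0.8
mu = np.ones(d) * eps
A = np.vstack([rng.normal(0.0, sigma, (n // 2, d)),
               rng.normal(mu, sigma, (n // 2, d))])
A = A - A.mean(axis=0)                   # center the points

s = rng.choice([-1.0, 1.0], size=n)      # random +-1 partition
for _ in range(20):
    v = A.T @ s                          # multiply by A^T: direction between cluster means
    s = np.sign(A @ v)                   # multiply by A and round: project, then re-cluster

print(s[:5], s[-5:])
```

Each iteration is a multiply by AAT followed by rounding to ±1, so the partition vector is driven toward the top eigenvector's sign pattern, which here encodes the two populations.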

Sum up.

Clustering a mixture of Gaussians. Near Neighbor works with sufficient data. Projection onto the subspace of the means is better. Principal component analysis can find the subspace of the means. The power method computes the principal component. A generic clustering algorithm is a rounded version of the power method.

SLIDE 3

See you on Tuesday.