SLIDE 1

Selection Detection and Two-Sample-Testing: Generalized Greenwood Statistics and their Applications

Ðan Daniel Erdmann-Pham, Jonathan Terhorst & Yun S. Song

University of California, Berkeley

July 9, 2019 SPA 2019

SLIDE 2

Motivation Framework Application

Two Problems

Generalized Greenwood Statistics

SLIDE 5

Population Genetics: Detecting Selective Pressure

Neutral Tree
◮ At each depth, leaf set sizes are approximately equidistributed

Tree with Selection
◮ Leaf set sizes are highly unbalanced close to the root

◮ Given a tree, how can we tell whether it was generated under selection or not?
◮ The data allow us to compute the sum of squares of the leaf set sizes
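To make the last bullet concrete, here is a minimal Python sketch. It assumes (our modeling choice, not stated on the slide) that a neutral tree is a Kingman coalescent topology: lineages merge in uniformly random pairs, and when k lineages remain we record how many leaves descend from each. The function names are hypothetical. A classical property of the Kingman coalescent is that these k sizes, in random order, are uniformly distributed over compositions of n into k positive parts, which is one way to read "approximately equidistributed" and links the tree picture to the balls-and-bins model later in the deck.

```python
from __future__ import annotations

import random


def kingman_leaf_set_sizes(n: int, k: int, rng: random.Random) -> list[int]:
    """Leaf set sizes of the k lineages present when a neutral (Kingman)
    coalescent topology on n leaves has merged down to k lineages."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    sizes = [1] * n  # every lineage starts as a single leaf
    while len(sizes) > k:
        # Under neutrality, every pair of current lineages is equally likely to merge.
        i, j = rng.sample(range(len(sizes)), 2)
        merged = sizes[i] + sizes[j]
        # Pop the larger index first so the smaller index stays valid.
        for idx in sorted((i, j), reverse=True):
            sizes.pop(idx)
        sizes.append(merged)
    return sizes


def sum_of_squares(sizes: list[int]) -> int:
    """Summary statistic from the slide: sum of squared leaf set sizes."""
    return sum(s * s for s in sizes)


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = kingman_leaf_set_sizes(20, 4, rng)
    print(sizes, sum_of_squares(sizes))
```

The sketch does not model selection; per the slide, unbalanced leaf sets near the root would inflate the sum of squares relative to this neutral baseline.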

SLIDE 13

Two-Sample Tests: Comparing $\{X_k\}_{k \in [n]}$ and $\{Y_k\}_{k \in [m]}$

◮ $X_k \sim Y_k$ (null)
◮ $\mathbb{E}[X_k] \neq \mathbb{E}[Y_k]$ (alternative)
◮ $\operatorname{Var}[X_k] \neq \operatorname{Var}[Y_k]$ (alternative)

How can we test the hypothesis that $\{X_k\}$ and $\{Y_k\}$ are identically distributed?

SLIDE 15

Sampling uniformly from the $(k-1)$-dimensional simplex $\Delta^{k-1} \subset \mathbb{R}^k$
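Uniform sampling from $\Delta^{k-1}$ is the continuous version of the balls-and-bins model that follows. A minimal Python sketch, using the standard construction of a uniform point on the simplex as the k spacings of k − 1 sorted Uniform(0, 1) points (our choice; the slide does not name a method). The squared $L^2$ norm of such a point is Greenwood's statistic, whose exact mean is $2/(k+1)$; the function names are hypothetical.

```python
from __future__ import annotations

import random


def uniform_simplex(k: int, rng: random.Random) -> list[float]:
    """Draw a point uniformly from {x in R^k : x_i >= 0, sum x_i = 1}
    as the k spacings of k - 1 sorted Uniform(0, 1) points."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def greenwood(x: list[float]) -> float:
    """Greenwood statistic: squared L2 norm of the spacings."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    rng = random.Random(0)
    k, reps = 5, 50_000
    mean = sum(greenwood(uniform_simplex(k, rng)) for _ in range(reps)) / reps
    print(mean, 2 / (k + 1))  # Monte Carlo mean next to the exact mean 2/(k+1)
```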

SLIDE 22

Balls and bins

[Figure: n balls in k bins with occupancies $S^{n,k}_1, S^{n,k}_2, \dots, S^{n,k}_k$]

◮ $S^{n,k} \sim \mathcal{U}\left(n \cdot \Delta^{k-1} \cap \mathbb{Z}_+^k\right)$ (Bose–Einstein distribution)
◮ Can we perform hypothesis testing based on $\|S^{n,k}\|_2^2$?
◮ What is the distribution of $\|S^{n,k}\|_2^2$?

Limit as $n \to \infty$ for fixed $k$:
◮ Greenwood statistic (Greenwood ’46)
◮ Some moments, CLT, statistical efficiency (Moran ’47, ’51, ’53)
◮ Geometry: intersection of $L^1$ and $L^2$ balls
  ◮ Up to $k = 3$ (Gardner ’52)
  ◮ Large deviations (Schechtman, Zinn ’00)
◮ Tabulation of z-scores up to $k = 20$ (Burrows ’79, Currie ’81, Stephens ’81)
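For small n and k, the distribution asked about above can be obtained by brute force. A minimal Python sketch, assuming $\mathbb{Z}_+$ denotes the positive integers (consistent with the normalizing constant $\binom{n-1}{k-1}$ on the Results slide) and using an upper-tail p-value as one possible rejection rule; all names are hypothetical. Enumeration visits all $\binom{n-1}{k-1}$ compositions, so it only scales to small inputs, which is why faster approximations matter.

```python
from __future__ import annotations

import random
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def compositions(n: int, k: int):
    """All compositions of n into k positive parts (stars and bars)."""
    for bars in combinations(range(1, n), k - 1):
        edges = (0,) + bars + (n,)
        yield tuple(b - a for a, b in zip(edges, edges[1:]))


def sample_composition(n: int, k: int, rng: random.Random) -> tuple[int, ...]:
    """Uniform draw of S^{n,k}: choose k - 1 distinct cut points in {1, ..., n-1}."""
    bars = sorted(rng.sample(range(1, n), k - 1))
    edges = [0] + bars + [n]
    return tuple(b - a for a, b in zip(edges, edges[1:]))


def exact_null_distribution(n: int, k: int) -> dict[int, Fraction]:
    """Exact law of ||S^{n,k}||_2^2 by enumerating all C(n-1, k-1) compositions."""
    counts = Counter(sum(s * s for s in c) for c in compositions(n, k))
    total = comb(n - 1, k - 1)
    return {v: Fraction(c, total) for v, c in sorted(counts.items())}


def upper_tail_p_value(observed: int, dist: dict[int, Fraction]) -> Fraction:
    """P(||S||_2^2 >= observed) under the uniform (null) model."""
    return sum((p for v, p in dist.items() if v >= observed), Fraction(0))


if __name__ == "__main__":
    dist = exact_null_distribution(10, 3)
    s = sample_composition(10, 3, random.Random(0))
    print(s, upper_tail_p_value(sum(v * v for v in s), dist))
```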

SLIDE 25

Balls and bins

[Figure: under the null $X_k \sim Y_k$, bins with occupancies $S^{n,m+1}_1, \dots, S^{n,m+1}_j, \dots, S^{n,m+1}_m$]

◮ Can we perform hypothesis testing based on $\|S^{n,k}\|_{p,w}^p$?
◮ What is the distribution of $\|S^{n,k}\|_{p,w}^p$?
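The null picture on this slide can be sketched as follows. With continuous data and no ties, the m sorted $Y$ values cut the line into m + 1 bins; under the null every interleaving of the pooled sample is equally likely, so the counts of $X$'s per bin are uniform over compositions of n into m + 1 nonnegative parts (adding 1 to each count gives a uniform composition of n + m + 1 into m + 1 positive parts). The Python sketch below assumes $\|s\|_{p,w}^p = \sum_i w_i s_i^p$, since the slide does not spell out the weighted norm, and uses a Monte Carlo upper-tail p-value; all names are hypothetical.

```python
from __future__ import annotations

import bisect
import random


def bin_counts(xs: list[float], ys: list[float]) -> list[int]:
    """Number of X's in each of the len(ys) + 1 gaps between sorted Y's (no ties assumed)."""
    cuts = sorted(ys)
    counts = [0] * (len(cuts) + 1)
    for x in xs:
        counts[bisect.bisect_left(cuts, x)] += 1
    return counts


def weighted_power_stat(s: list[int], p: float = 2.0, w: list[float] | None = None) -> float:
    """Assumed form of ||s||_{p,w}^p: sum_i w_i * s_i**p (unit weights by default)."""
    w = w if w is not None else [1.0] * len(s)
    return sum(wi * si ** p for wi, si in zip(w, s))


def null_counts(n: int, bins: int, rng: random.Random) -> list[int]:
    """Uniform composition of n into `bins` nonnegative parts (stars and bars)."""
    bars = sorted(rng.sample(range(n + bins - 1), bins - 1))
    edges = [-1] + bars + [n + bins - 1]
    return [b - a - 1 for a, b in zip(edges, edges[1:])]


def mc_p_value(xs, ys, p=2.0, w=None, draws=10_000, seed=0) -> float:
    """Monte Carlo upper-tail p-value of the statistic under the null."""
    rng = random.Random(seed)
    obs = weighted_power_stat(bin_counts(xs, ys), p, w)
    bins = len(ys) + 1
    hits = sum(
        weighted_power_stat(null_counts(len(xs), bins, rng), p, w) >= obs
        for _ in range(draws)
    )
    return (1 + hits) / (1 + draws)


if __name__ == "__main__":
    print(mc_p_value([0.1, 0.2, 0.3], [0.5, 0.6, 0.7]))
```

With p = 2 and unit weights the statistic reduces to $\|S\|_2^2$ from the previous slide.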

SLIDE 32

Results

Observation (Recursion). Let $G(x) = \sum_{m=0}^{\infty} \mathrm{Li}_{-2m}(x)/m!$; then
$$\mathbb{E}\,\|S^{n,k}\|_2^{2m} = \frac{m!}{\binom{n-1}{k-1}}\,[x^n]\,\big(S_m(x)\big)^{\star(k)}.$$

Corollaries (Discrete)
  1. $\varepsilon$-approximation in $O\!\left(\frac{n}{\varepsilon}\log\frac{n}{\varepsilon} + \frac{n}{\varepsilon}\log k\right)$ time
  2. Conservative hypothesis tests
  3. Alternative scaling limits: CLT, LLN, large deviations

Corollaries (Continuous)
  1. Continuum approximation: $\|F_{n,k} - F_k\|_\infty \in O(n^{-1})$
  2. Monotonicity: $F_{n,k} - F_k \geq 0$
  3. Regularity: $F_k \in C^{k-3}([0, 1])$
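One way to read the recursion, sketched under our own assumptions: add a bookkeeping variable $t$, so that $G(x,t) = \sum_m \mathrm{Li}_{-2m}(x)\,t^m/m! = \sum_{j \ge 1} e^{t j^2} x^j$ (using $\mathrm{Li}_{-s}(x) = \sum_{j \ge 1} j^s x^j$). Then $[x^n]\,G(x,t)^k = \sum_s e^{t\|s\|_2^2}$ over compositions $s$ of n into k positive parts, and $m!\,[t^m]$ of that, divided by $\binom{n-1}{k-1}$, is $\mathbb{E}\,\|S^{n,k}\|_2^{2m}$. The slide's $S_m$ and $\star(k)$ notation is not expanded here; the Python check below extracts the coefficient from $G(x,t)^k$ directly and compares it with enumeration in exact arithmetic. It is a brute-force sanity check, not the fast ε-approximation algorithm; the names are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def poly_mul(a, b, n, m):
    """Multiply bivariate series a[i][r] (coefficient of x^i t^r), truncated at x^n, t^m."""
    out = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for r in range(m + 1):
            if a[i][r] == 0:
                continue
            for j in range(n + 1 - i):
                for q in range(m + 1 - r):
                    out[i + j][r + q] += a[i][r] * b[j][q]
    return out


def moment_via_gf(n: int, k: int, m: int) -> Fraction:
    """E||S^{n,k}||_2^{2m} as m! [t^m][x^n] G(x,t)^k / C(n-1, k-1)."""
    # Truncated G(x, t) = sum_{j>=1} e^{t j^2} x^j: coefficient of x^j t^r is j^{2r} / r!.
    g = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for r in range(m + 1):
            g[j][r] = Fraction(j ** (2 * r), factorial(r))
    power = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    power[0][0] = Fraction(1)  # start from the constant series 1
    for _ in range(k):
        power = poly_mul(power, g, n, m)
    return factorial(m) * power[n][m] / comb(n - 1, k - 1)


def moment_via_enumeration(n: int, k: int, m: int) -> Fraction:
    """Same moment, averaging ||s||_2^{2m} over all compositions of n into k positive parts."""
    total = Fraction(0)
    for bars in combinations(range(1, n), k - 1):
        edges = (0,) + bars + (n,)
        total += sum((b - a) ** 2 for a, b in zip(edges, edges[1:])) ** m
    return total / comb(n - 1, k - 1)


if __name__ == "__main__":
    print(moment_via_gf(10, 3, 2), moment_via_enumeration(10, 3, 2))
```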

SLIDE 41


Application to Two-Sample Testing

SLIDE 42

Comparing Non-Parametric Two-Sample Tests

[Figure panels: $\|S^{n,k}\|_2^2$ test vs. Kolmogorov–Smirnov test]

Figure: Hypothesis testing based on $\|S^{n,k}\|_2^2$ is more sensitive to variance changes than other common two-sample tests.
SLIDE 43

Comparing Non-Parametric Two-Sample Tests

[Figure panels: $\|S^{n,k}\|_2^2$ test vs. Kolmogorov–Smirnov test; panel labels: Null, Alternative]

Figure: Hypothesis testing based on $\|S^{n,k}\|_2^2$ is more sensitive to compound mean and variance changes than other common two-sample tests, for randomly generated null and alternative distributions with common support.
SLIDE 44

Comparing Non-Parametric Two-Sample Tests

[Figure panels: $\|S^{n,k}\|_2^2$ test vs. Kolmogorov–Smirnov test; panel labels: Null, Alternative]

Figure: Hypothesis testing based on $\|S^{n,k}\|_2^2$ is more sensitive to compound mean and variance changes than other common two-sample tests, for randomly generated null and alternative distributions with distinct support.
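The figures compare the $\|S^{n,k}\|_2^2$ test against Kolmogorov–Smirnov. A small simulation of that kind of comparison might look like the Python sketch below. The normal distributions, sample sizes, permutation p-values and 5% level are our assumptions, not the settings behind the figures, so its output should not be read as reproducing them; the names are hypothetical.

```python
from __future__ import annotations

import bisect
import random


def ks_stat(xs: list[float], ys: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic sup |F_X - F_Y| (no ties assumed)."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def greenwood_stat(xs: list[float], ys: list[float]) -> int:
    """||S||_2^2 for the counts of X's in the gaps between sorted Y's."""
    cuts = sorted(ys)
    counts = [0] * (len(cuts) + 1)
    for x in xs:
        counts[bisect.bisect_left(cuts, x)] += 1
    return sum(c * c for c in counts)


def perm_p_value(stat, xs, ys, rng, perms=200) -> float:
    """Upper-tail permutation p-value: randomly relabel the pooled sample."""
    obs = stat(xs, ys)
    pooled = xs + ys  # new list, so shuffling leaves xs and ys untouched
    hits = 0
    for _ in range(perms):
        rng.shuffle(pooled)
        if stat(pooled[: len(xs)], pooled[len(xs):]) >= obs:
            hits += 1
    return (1 + hits) / (1 + perms)


def estimate_power(sigma_y: float, n: int = 30, m: int = 30, reps: int = 50,
                   perms: int = 200, alpha: float = 0.05, seed: int = 0) -> dict[str, float]:
    """Rejection rates of both tests when X ~ N(0, 1) and Y ~ N(0, sigma_y^2)."""
    rng = random.Random(seed)
    rejections = {"greenwood": 0, "ks": 0}
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, sigma_y) for _ in range(m)]
        if perm_p_value(greenwood_stat, xs, ys, rng, perms) <= alpha:
            rejections["greenwood"] += 1
        if perm_p_value(ks_stat, xs, ys, rng, perms) <= alpha:
            rejections["ks"] += 1
    return {name: r / reps for name, r in rejections.items()}


if __name__ == "__main__":
    print(estimate_power(2.0))
```

Both statistics reject in the upper tail: a variance or mean change tends to pile the $X$'s into a few gaps between the $Y$'s, which inflates $\|S\|_2^2$.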

SLIDE 45

New Perspectives on Old Questions

What happened?
  1. Discretized the continuous Greenwood statistic
  2. Understood the discretized problem through generating functions of moments
  3. CDF reconstruction from moments, CLT, and transfer to the continuous problem
  4. Application to two-sample testing

What happens now?
  1. Apply the hypothesis test to real data
  2. Quantify more precisely the power against given classes of alternatives

SLIDE 52

Acknowledgements

Jonathan Terhorst, Yun Song, Jonathan Fischer

Funding: German Academic Scholarship Foundation