
Selection Detection and Two-Sample-Testing: Generalized Greenwood Statistics and their Applications


  1. Selection Detection and Two-Sample-Testing: Generalized Greenwood Statistics and their Applications. Đan Daniel Erdmann-Pham, Jonathan Terhorst & Yun S. Song, University of California, Berkeley. July 9, 2019, SPA 2019.

  2. Motivation Framework Application Two Problems Generalized Greenwood Statistics

  3. Population Genetics: Detecting Selective Pressure
     Neutral Tree: at each depth, leaf set sizes are approximately equidistributed.
     Tree with Selection: leaf set sizes are highly unbalanced close to the root.
     ◮ Given a tree, how can we tell whether it was generated under selection or not?
     ◮ Data allows computation of sum of squares of leaf set sizes
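
The statistic named on this slide can be illustrated with a minimal sketch. The function name and the two example partitions of 12 leaves are hypothetical and not taken from the talk; they only show that unbalanced leaf set sizes produce a larger sum of squares than balanced ones.

```python
def sum_of_squares(leaf_set_sizes):
    """Return the sum of squared leaf set sizes at one depth of a tree."""
    return sum(size * size for size in leaf_set_sizes)


# Hypothetical leaf set sizes at a fixed depth (both sum to 12 leaves).
balanced = [3, 3, 3, 3]    # roughly equidistributed, as under neutrality
unbalanced = [9, 1, 1, 1]  # one dominant lineage, as under selection

balanced_stat = sum_of_squares(balanced)      # 9 + 9 + 9 + 9 = 36
unbalanced_stat = sum_of_squares(unbalanced)  # 81 + 1 + 1 + 1 = 84
```

A large value therefore points toward imbalance, but deciding how large is "too large" requires the distribution of the statistic under the neutral model.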

  10. Two-Sample Tests: Comparing $\{X_k\}_{k \in [n]}$ and $\{Y_k\}_{k \in [m]}$
      How to test whether $\{X_k\}$ and $\{Y_k\}$ are identically distributed?
      ◮ $X_k \sim Y_k$ (Null)
      ◮ $\mathbb{E}[X_k] \neq \mathbb{E}[Y_k]$ (Alternative)
      ◮ $\operatorname{Var}[X_k] \neq \operatorname{Var}[Y_k]$ (Alternative)

  15. Sampling uniformly from the k-dimensional simplex $\Delta^{k-1}$
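
One standard way to draw such a sample, sketched here under the assumption that the simplex is the set of k non-negative coordinates summing to 1, is to normalize k independent Exp(1) draws. The function name `sample_simplex` is illustrative; the slide does not say which sampler the authors use.

```python
import random


def sample_simplex(k, rng=random):
    """Draw one point uniformly from the simplex {x >= 0, sum(x) = 1} in R^k.

    Normalizing k i.i.d. Exp(1) variables gives a Dirichlet(1, ..., 1)
    vector, which is the uniform distribution on the simplex.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


# Seeded generator so the draw is reproducible.
point = sample_simplex(5, random.Random(0))
```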

  16. Balls and bins
      [Figure: bins labelled 1, 2, ..., k, annotated with $S_{n,k}$]
      ◮ $S_{n,k} \sim U\left(n \cdot \Delta^{k-1} \cap \mathbb{Z}_+\right)$ (Bose-Einstein distribution)
      ◮ Can we perform hypothesis testing based on $\|S_{n,k}\|_2^2$? What is the distribution of $\|S_{n,k}\|_2^2$?
      Limit as $n \to \infty$ for fixed $k$
      ◮ Greenwood Statistic (Greenwood ’46)
      ◮ Some moments, CLT, statistical efficiency (Moran ’47, ’51, ’53)
      ◮ Geometry: intersection of $L_1$ and $L_2$ balls
      ◮ Up to $k = 3$ (Gardner ’52)
      ◮ Large deviations (Schechtner, Zinn ’00)
      ◮ Tabulation of z-scores up to $k = 20$ (Burrows ’79, Currie ’81, Stephens ’81)
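
The balls-and-bins question can be explored numerically. The sketch below assumes the Bose-Einstein null means that every way of placing n indistinguishable balls into k bins is equally likely, samples it with the stars-and-bars construction, and estimates a tail probability for the squared norm by simulation. All function names are hypothetical, and the Monte Carlo estimate is a stand-in for illustration, not the method of the talk.

```python
import random


def sample_bose_einstein(n, k, rng):
    """Uniform draw from {s in Z_{>=0}^k : sum(s) = n} via stars and bars."""
    # Choose k - 1 bar positions among n + k - 1 slots; the gaps are the counts.
    bars = sorted(rng.sample(range(n + k - 1), k - 1))
    counts, previous = [], -1
    for bar in bars + [n + k - 1]:  # sentinel closes the last bin
        counts.append(bar - previous - 1)
        previous = bar
    return counts


def squared_norm(counts):
    """Sum of squared bin counts, i.e. the squared L2 norm."""
    return sum(c * c for c in counts)


def monte_carlo_p_value(observed, n, k, trials=2000, seed=0):
    """Estimate P(squared norm >= observed) under the uniform null."""
    rng = random.Random(seed)
    hits = sum(
        squared_norm(sample_bose_einstein(n, k, rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical observation: 12 balls in 4 bins, heavily concentrated.
p_value = monte_carlo_p_value(squared_norm([9, 1, 1, 1]), n=12, k=4)
```

Simulation of this kind becomes costly when small p-values are needed, which is one reason to ask for the distribution of the squared norm itself.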
