SLIDE 1
Jerry in the Age of Trees
Werner Stuetzle
Department of Statistics, UW
May 15, 2019
SLIDE 2
At a time long long ago
SLIDE 3
The Car
SLIDE 4
The Fashion
SLIDE 5
The Band
SLIDE 6
The Stanford Stat Computing Facility
SLIDE 7
The Frontier
SLIDE 8
Virtually Unlimited Storage
SLIDE 9
The Message Should be Clear
SLIDE 10
Who’s this Cool Dude??
SLIDE 11
Gems - Not so Hidden
SLIDE 12
PRIM-9: An interactive multidimensional data display and analysis system (with Mary Anne Fisherkeller and John Tukey, 1974, 208 citations)
A Projection Pursuit algorithm for exploratory data analysis (with John Tukey, 1974, 2245 citations)
SLIDE 13
An algorithm for finding best matches in logarithmic time (with Jon Bentley and Ari Finkel, 1976, 3150 citations)
Data structures for range searching (with Jon Bentley, 1979, 814 citations)
A recursive partitioning decision rule for nonparametric classification (1977, 507 citations)
A tree-structured approach to nonparametric multiple regression (acknowledges Leo Breiman, Charles Stone, Larry Rafsky, 1979, 67 citations)
SLIDE 14
Fast algorithms for constructing minimal spanning trees in coordinate spaces (with Jon Bentley, 1978, 137 citations)
Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests (with Larry Rafsky, 1979, 606 citations)
SLIDE 15
Hidden Gems
SLIDE 16
A nonparametric procedure for comparing multivariate point sets (with Sam Steppel, 1973, 13 citations)
Given: Two samples S1 and S2.
Question: Are they from the same population?
Idea:
◮ For each obs i in S1 ∪ S2, count the number of S1 obs among its k nearest neighbors ⇒ m_i
◮ If they are from the same population, then the distribution of m_i for obs in S1 and obs in S2 should be the same.
◮ Comparison of univariate distributions can be calibrated using permutations.
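A minimal sketch of the nearest-neighbor idea on this slide, in plain Python. The slide does not say which statistic is used to compare the two univariate distributions of m_i; the absolute difference of their means is an assumption made here for illustration, as are Euclidean distance and the function names.

```python
import random

def knn_indices(points, k):
    """For each point, indices of its k nearest other points (Euclidean; an assumption)."""
    n = len(points)
    result = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        result.append([j for _, j in dists[:k]])
    return result

def statistic(labels, neighbors):
    # m_i = number of S1 observations among the k nearest neighbors of obs i
    m = [sum(labels[j] for j in nb) for nb in neighbors]
    m1 = [mi for mi, lab in zip(m, labels) if lab == 1]
    m2 = [mi for mi, lab in zip(m, labels) if lab == 0]
    # Assumed comparison of the two distributions: difference of mean m_i
    return abs(sum(m1) / len(m1) - sum(m2) / len(m2))

def knn_two_sample_test(s1, s2, k=5, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    pooled = list(s1) + list(s2)
    labels = [1] * len(s1) + [0] * len(s2)
    # Neighbors depend only on positions, so they are computed once
    neighbors = knn_indices(pooled, k)
    observed = statistic(labels, neighbors)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # reassign sample membership at random
        if statistic(perm, neighbors) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because permuting the labels leaves the neighbor lists unchanged, each permutation only costs a recount of m_i.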
SLIDE 17
Data analysis techniques for high energy particle physics (1974, 45 citations)
Given: Two sets of features X and Y observed for the same collection of objects.
Question: Are they independent?
◮ For each obs i, find k nearest neighbors in X-space and k nearest neighbors in Y-space.
◮ Find m_i, the number of shared nearest neighbors
◮ Compare to permutation distribution
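The shared-neighbor idea can be sketched in a few lines of plain Python. The slide does not say how the m_i are combined into one statistic; summing them over all observations is an assumption here, as are Euclidean distance and the function names. The permutation step breaks the pairing between X and Y by shuffling the Y rows.

```python
import random

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points (Euclidean)."""
    n = len(points)
    out = []
    for i in range(n):
        d = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        out.append({j for _, j in d[:k]})
    return out

def shared_count(x_nb, y_nb):
    # Sum over observations of m_i = number of neighbors shared between X-space and Y-space
    return sum(len(a & b) for a, b in zip(x_nb, y_nb))

def shared_neighbor_test(x, y, k=5, n_perm=200, seed=0):
    """Return (observed total of shared neighbors, permutation p-value)."""
    rng = random.Random(seed)
    x_nb = knn_sets(x, k)
    observed = shared_count(x_nb, knn_sets(y, k))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the X-Y pairing
        if shared_count(x_nb, knn_sets(y_perm, k)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A large observed count relative to the permutation distribution suggests that X and Y are not independent.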
SLIDE 18
A nested partitioning procedure for numerical multiple integration and adaptive importance sampling (with Margaret Wright (?), 1978, 51 citations)
Goal: Compute integral of multivariate function f over a box.
Idea:
◮ There may be small regions that dominate the integral ⇒ need to stratify.
◮ Strata consist of axis-parallel boxes
◮ Optimal strata depend on sd of f, but sd is as hard to estimate as mean
◮ Use numerical optimization to find max and min of f in box
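A rough sketch of stratified integration over axis-parallel boxes, loosely following the bullets above. It is not the paper's algorithm, and several choices are assumptions made for illustration: the range (max minus min) of f times the box volume serves as the splitting score, random probing stands in for a real numerical optimizer, boxes are halved along their longest side, and all function names are hypothetical.

```python
import math
import random

def f_range(f, lo, hi, rng, n_probe=64):
    """Crude stand-in for numerical optimization: probe random points, return (min, max)."""
    vals = [f([rng.uniform(a, b) for a, b in zip(lo, hi)]) for _ in range(n_probe)]
    return min(vals), max(vals)

def volume(lo, hi):
    v = 1.0
    for a, b in zip(lo, hi):
        v *= (b - a)
    return v

def score(f, lo, hi, rng):
    # Assumed surrogate for the spread of f in the box: (max - min) * volume
    fmin, fmax = f_range(f, lo, hi, rng)
    return (fmax - fmin) * volume(lo, hi)

def partition(f, lo, hi, n_boxes, rng):
    """Repeatedly halve the box with the largest score along its longest side."""
    boxes = [(score(f, lo, hi, rng), lo, hi)]
    while len(boxes) < n_boxes:
        boxes.sort(key=lambda b: b[0])
        _, blo, bhi = boxes.pop()  # box with the largest score
        d = max(range(len(blo)), key=lambda j: bhi[j] - blo[j])
        mid = 0.5 * (blo[d] + bhi[d])
        left_hi = list(bhi)
        left_hi[d] = mid
        right_lo = list(blo)
        right_lo[d] = mid
        boxes.append((score(f, blo, left_hi, rng), blo, left_hi))
        boxes.append((score(f, right_lo, bhi, rng), right_lo, bhi))
    return boxes

def integrate(f, lo, hi, n_boxes=64, n_per_box=20, seed=0):
    """Stratified Monte Carlo: sum over boxes of (mean of f in box) * (box volume)."""
    rng = random.Random(seed)
    total = 0.0
    for _, blo, bhi in partition(f, list(lo), list(hi), n_boxes, rng):
        samples = [f([rng.uniform(a, b) for a, b in zip(blo, bhi)])
                   for _ in range(n_per_box)]
        total += sum(samples) / n_per_box * volume(blo, bhi)
    return total

def peak(x):
    # Example integrand: a narrow bump that dominates the integral over the unit square
    return math.exp(-50 * ((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2))

estimate = integrate(peak, [0.0, 0.0], [1.0, 1.0])
```

The stratified estimate is unbiased for any partition; the partition only affects its variance, which is why a cheap score such as the range of f can still be useful.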
SLIDE 19