Jerry in the Age of Trees. Werner Stuetzle, Department of Statistics, UW



SLIDE 1

Jerry in the Age of Trees

Werner Stuetzle

Department of Statistics, UW

May 15, 2019

SLIDE 2

At a time long long ago

SLIDE 3

The Car

SLIDE 4

The Fashion

SLIDE 5

The Band

SLIDE 6

The Stanford Stat Computing Facility

SLIDE 7

The Frontier

SLIDE 8

Virtually Unlimited Storage

SLIDE 9

The Message Should be Clear

SLIDE 10

Who’s this Cool Dude??

SLIDE 11

Gems - Not so Hidden

SLIDE 12

PRIM-9: An interactive multidimensional data display and analysis system (with Mary Anne Fisherkeller and John Tukey, 1974, 208 citations)

A Projection Pursuit algorithm for exploratory data analysis (with John Tukey, 1974, 2245 citations)

SLIDE 13

An algorithm for finding best matches in logarithmic time (with Jon Bentley and Ari Finkel, 1976, 3150 citations)

Data structures for range searching (with Jon Bentley, 1979, 814 citations)

A recursive partitioning decision rule for nonparametric classification (1977, 507 citations)

A tree-structured approach to nonparametric multiple regression (acknowledges Leo Breiman, Charles Stone, Larry Rafsky, 1979, 67 citations)

SLIDE 14

Fast algorithms for constructing minimal spanning trees in coordinate spaces (with Jon Bentley, 1978, 137 citations)

Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests (with Larry Rafsky, 1979, 606 citations)

SLIDE 15

Hidden Gems

SLIDE 16

A nonparametric procedure for comparing multivariate point sets (with Sam Steppel, 1973, 13 citations)

Given: Two samples S1 and S2.
Question: Are they from the same population?
Idea:
◮ For each obs i in S1 ∪ S2, count the number of S1 obs among its k nearest neighbors ⇒ m_i
◮ If they are from the same population, then the distribution of m_i for obs in S1 and for obs in S2 should be the same.
◮ Comparison of univariate distributions can be calibrated using permutations.
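The idea above can be sketched in a few lines of Python. This is only an illustration, not the procedure from the 1973 paper: the slide does not say how the two univariate distributions of m_i are compared, so the sketch assumes a simple difference of means. The function names (`knn_indices`, `mean_gap`, `two_sample_pvalue`), the brute-force neighbor search, and the defaults for `k` and `n_perm` are all hypothetical.

```python
import math
import random

def knn_indices(points, k):
    """For each point, indices of its k nearest other points (brute force, Euclidean)."""
    n = len(points)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        result.append(others[:k])
    return result

def mean_gap(labels, neighbors):
    """Mean of m_i over S1 minus mean of m_i over S2 (assumed comparison statistic).

    m_i is the number of S1 observations among the k nearest neighbors of i.
    """
    m = [sum(labels[j] for j in nb) for nb in neighbors]
    s1 = [m[i] for i, lab in enumerate(labels) if lab == 1]
    s2 = [m[i] for i, lab in enumerate(labels) if lab == 0]
    return sum(s1) / len(s1) - sum(s2) / len(s2)

def two_sample_pvalue(s1, s2, k=5, n_perm=500, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    points = list(s1) + list(s2)
    labels = [1] * len(s1) + [0] * len(s2)
    # Neighbors depend only on geometry, so they are computed once;
    # each permutation only reshuffles the sample labels.
    neighbors = knn_indices(points, k)
    observed = mean_gap(labels, neighbors)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if mean_gap(shuffled, neighbors) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `two_sample_pvalue(s1, s2)` with two lists of coordinate tuples returns the observed gap and a permutation p-value; a large gap with a small p-value suggests that the samples come from different populations.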

SLIDE 17

Data analysis techniques for high energy particle physics (1974, 45 citations)

Given: Two sets of features X and Y observed for the same collection of objects.
Question: Are they independent?
◮ For each obs i, find its k nearest neighbors in X-space and its k nearest neighbors in Y-space.
◮ Count m_i, the number of nearest neighbors shared between the two spaces.
◮ Compare to the permutation distribution.
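A minimal Python sketch of the shared-neighbor idea follows. It rests on assumptions the slide does not state: the test statistic is taken to be the sum of the m_i, and the permutation distribution is generated by randomly re-pairing the Y observations with the X observations. The names `knn_sets`, `shared_total`, and `independence_pvalue` are hypothetical, and the neighbor search is brute force.

```python
import math
import random

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest other points."""
    n = len(points)
    out = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        out.append(set(order[:k]))
    return out

def shared_total(nb_x, nb_y, pairing):
    """Sum of m_i when X observation i is paired with Y observation pairing[i]."""
    total = 0
    for i, nbrs in enumerate(nb_x):
        # Map the X-neighbors of i to the Y observations they are paired with,
        # then count how many of those are Y-neighbors of i's own partner.
        mapped = {pairing[j] for j in nbrs}
        total += len(mapped & nb_y[pairing[i]])
    return total

def independence_pvalue(xs, ys, k=5, n_perm=500, seed=0):
    """Return (observed shared-neighbor total, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(xs)
    nb_x, nb_y = knn_sets(xs, k), knn_sets(ys, k)
    identity = list(range(n))
    observed = shared_total(nb_x, nb_y, identity)
    hits = 0
    for _ in range(n_perm):
        pairing = identity[:]
        rng.shuffle(pairing)  # break any X-Y association
        if shared_total(nb_x, nb_y, pairing) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Under independence, the observed total should look like a typical draw from the re-paired totals; many more shared neighbors than the permutations produce points to dependence between X and Y.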

SLIDE 18

A nested partitioning procedure for numerical multiple integration and adaptive importance sampling (with Margaret Wright (?), 1978, 51 citations)

Goal: Compute integral of multivariate function f over a box.
Idea:
◮ There may be small regions that dominate the integral ⇒ need to stratify.
◮ Strata consist of axis-parallel boxes.
◮ Optimal strata depend on the sd of f, but the sd is as hard to estimate as the mean.
◮ Use numerical optimization to find the max and min of f in a box.
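The stratification idea can be illustrated with a rough Python sketch. It is not the algorithm from the paper. Several choices are assumptions made only for illustration: the range of f in a box is estimated by random probing rather than by numerical optimization, the box with the largest (range × volume) is bisected along its longest axis, and every final box receives the same number of sample points. All names and default values are hypothetical.

```python
import math
import random

def uniform_point(lo, hi, rng):
    """Draw one point uniformly from the axis-parallel box [lo, hi]."""
    return [rng.uniform(a, b) for a, b in zip(lo, hi)]

def volume(lo, hi):
    return math.prod(b - a for a, b in zip(lo, hi))

def box_score(f, lo, hi, rng, probes):
    """(max - min of f over random probes) * volume.

    Random probing is a crude stand-in for the numerical optimization
    mentioned on the slide; the range serves as a surrogate for the sd of f.
    """
    vals = [f(uniform_point(lo, hi, rng)) for _ in range(probes)]
    return (max(vals) - min(vals)) * volume(lo, hi)

def stratified_integral(f, lo, hi, n_splits=30, probes=50,
                        samples_per_box=200, seed=0):
    rng = random.Random(seed)
    lo, hi = list(lo), list(hi)
    boxes = [(box_score(f, lo, hi, rng, probes), lo, hi)]
    for _ in range(n_splits):
        # Split the box that currently looks most variable.
        worst = max(range(len(boxes)), key=lambda i: boxes[i][0])
        _, b_lo, b_hi = boxes.pop(worst)
        axis = max(range(len(b_lo)), key=lambda d: b_hi[d] - b_lo[d])
        mid = 0.5 * (b_lo[axis] + b_hi[axis])
        left_hi, right_lo = b_hi[:], b_lo[:]
        left_hi[axis], right_lo[axis] = mid, mid
        for c_lo, c_hi in ((b_lo, left_hi), (right_lo, b_hi)):
            boxes.append((box_score(f, c_lo, c_hi, rng, probes), c_lo, c_hi))
    # Stratified estimate: volume-weighted mean of f within each box.
    total = 0.0
    for _, b_lo, b_hi in boxes:
        vals = [f(uniform_point(b_lo, b_hi, rng)) for _ in range(samples_per_box)]
        total += volume(b_lo, b_hi) * sum(vals) / samples_per_box
    return total
```

Because each box is sampled uniformly and weighted by its volume, the estimate stays unbiased however the boxes are chosen; the partitioning only affects its variance, by concentrating small boxes where f varies most.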

SLIDE 19

Looking forward to Jerry @ 100