SLIDE 1

Computing the Depth of a Flat

Marshall Bern

Xerox PARC and

David Eppstein

UC Irvine

SLIDE 2

Robust Regression

  • Given data with dependent and independent variables
  • Describe dependent variables as a function of independent ones
  • Should be robust against arbitrary outliers
  • Prefer distance-free methods for robustness against skewed and data-dependent noise

SLIDE 3

Example: Data Depth

(no variables independent)

  • Fit a point to a cloud of data points
  • Depth of a fit x = min # data points in a halfspace containing x
  • Tukey median = point with max possible depth
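To make the definition concrete, here is a brute-force sketch for points in the plane (d = 2). It illustrates the definition only and is not one of the algorithms cited on these slides. `tukey_depth_2d` is a hypothetical name, and the sketch assumes points are (x, y) tuples. It relies on two facts. First, a minimizing halfplane can be taken with x on its boundary. Second, the count inside that halfplane changes only when the boundary line passes through a data point.

```python
import math


def tukey_depth_2d(x, points):
    """Halfspace (Tukey) depth of x among planar points, by brute force.

    Hypothetical helper; assumes d = 2 and points given as (x, y) tuples.
    Depth = min over closed halfplanes {p : (p - x) . u >= 0} of the number
    of data points inside. The count only changes at directions u that are
    perpendicular to some (p - x), so one direction per open arc between
    those critical angles suffices. O(n^2) overall.
    """
    cx, cy = x
    critical = []
    for px, py in points:
        dx, dy = px - cx, py - cy
        if dx == 0 and dy == 0:
            continue  # a point equal to x lies in every such halfplane
        a = math.atan2(dy, dx)
        critical.append((a + math.pi / 2) % (2 * math.pi))
        critical.append((a - math.pi / 2) % (2 * math.pi))
    if not critical:
        return len(points)
    critical.sort()
    best = len(points)
    for i, a in enumerate(critical):
        # Next critical angle, wrapping around the circle for the last arc.
        b = critical[i + 1] if i + 1 < len(critical) else critical[0] + 2 * math.pi
        t = (a + b) / 2  # a direction inside the arc between a and b
        ux, uy = math.cos(t), math.sin(t)
        inside = sum(1 for px, py in points
                     if (px - cx) * ux + (py - cy) * uy >= 0)
        best = min(best, inside)
    return best


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(tukey_depth_2d((0.5, 0.5), square))  # 2
```

Each of the O(n) candidate directions costs O(n) to evaluate. That is far slower than the $O(n^{d-1} + n\log n)$ fixed-dimension bound for d = 2, but the sketch is easy to check against the definition.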

SLIDE 4

Known Results for Data Depth

  • Tukey median has depth ≥ $\lceil n/(d+1) \rceil$ [Radon 1946]
  • Deep (but not optimally deep) point can be found in time polynomial in n and d [Clarkson, Eppstein, Miller, Sturtivant, Teng 1996]
  • Deepest point can be found in time $O(n^d)$ (linear program with that many constraints)
  • Computing the depth of a point is NP-complete for variable d [Johnson & Preparata 1978], and $O(n^{d-1} + n\log n)$ for fixed d [Rousseeuw & Struyf 1998]

SLIDE 5

Example: Regression Depth

(all but one variable independent)

[Hubert & Rousseeuw 1998]

  • Fit a hyperplane to a cloud of data points
  • Nonfit = vertical hyperplane (doesn’t predict the dependent variable)
  • Depth of a fit = min # data points crossed while moving to a nonfit
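For simple regression (one independent variable, d = 2), the "move to a nonfit" picture can be checked by brute force. Tilt the fit about its point above x = v until it becomes the vertical nonfit x = v. A counterclockwise tilt crosses the points to the right on or above the fit and the points to the left on or below it. A clockwise tilt crosses the opposite pair. The sketch below illustrates this and is not the authors' method. `regression_depth_line` is a hypothetical name. It assumes that points lying exactly on the fit count as crossed, and it only tries pivots strictly between distinct x-values and beyond both ends.

```python
def regression_depth_line(slope, intercept, points):
    """Regression depth of the fit y = slope*x + intercept (brute force, O(n^2)).

    Hypothetical helper; assumes a nonempty list of (x, y) tuples.
    For a pivot v, tilting the fit into the vertical line x = v crosses either
      (left points with r >= 0) + (right points with r <= 0), or
      (left points with r <= 0) + (right points with r >= 0),
    where r = y - fit(x). Points on the fit (r == 0) count as crossed.
    """
    xs = sorted({x for x, _ in points})
    # One pivot per open interval between distinct x-values, plus both ends.
    pivots = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    resid = [(x, y - (slope * x + intercept)) for x, y in points]
    best = len(points)
    for v in pivots:
        left_pos = sum(1 for x, r in resid if x < v and r >= 0)
        left_neg = sum(1 for x, r in resid if x < v and r <= 0)
        right_pos = sum(1 for x, r in resid if x > v and r >= 0)
        right_neg = sum(1 for x, r in resid if x > v and r <= 0)
        best = min(best, left_pos + right_neg, left_neg + right_pos)
    return best


zigzag = [(0, 1), (1, -1), (2, 1), (3, -1)]
print(regression_depth_line(0, 0, zigzag))  # 1
```

A line lying above every data point gets depth 0, because tilting it toward vertical at the far left crosses nothing. That matches the intuition that such a line is no better than a nonfit.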

SLIDE 6

Known Results for Regression Depth

  • Deepest hyperplane has depth ≥ $\lceil n/(d+1) \rceil$ [Amenta, Bern, Eppstein, Teng 1998; Mizera 1998]
  • Deepest hyperplane can be found in time $O(n^d)$ (breadth-first search in arrangement)
  • Planar deepest line can be found in $O(n \log n)$ [van Kreveld et al. 1999; Langerman & Steiger 2000]
  • Computing the depth of a hyperplane is NP-complete for variable d [Amenta et al. 1998], and $O(n^{d-1} + n\log n)$ for fixed d [Rousseeuw & Struyf 1998]

SLIDE 7

Multivariate Regression Depth

(any number k of independent variables)

[Bern & Eppstein 2000]

  • Definition of depth for a k-flat
  • Equals data depth for k = 0
  • Equals regression depth for k = d − 1
  • Deepest flat has depth Ω(n)
  • Conjecture: depth ≥ $\lceil n / ((k+1)(d-k)+1) \rceil$ (true for k = 0, k = 1, and k = d − 1)
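As a quick consistency check, write m(k) (notation introduced here) for the denominator in the conjectured bound. At both ends of the range of k, m(k) reduces to the d + 1 in the $\lceil n/(d+1) \rceil$ bounds quoted for data depth and regression depth:

```latex
m(k) = (k+1)(d-k) + 1, \qquad
m(0) = 1 \cdot d + 1 = d + 1, \qquad
m(d-1) = d \cdot 1 + 1 = d + 1 .
```

In between, the denominator is larger. For example, lines in space (k = 1, d = 3) give m(1) = 2 · 2 + 1 = 5, so the conjectured bound there is $\lceil n/5 \rceil$.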

SLIDE 8

New Results

  • Computing the depth of a k-flat takes $O(n^{d-2} + n\log n)$ time when 0 < k < d − 1
  • Saves a factor of n compared to similar results for regression depth and data depth
  • Deterministic $O(n \log n)$ for lines in space (k = 1, d = 3)
  • Randomized $O(n^{d-2})$ for all other cases
  • Likely can be derandomized using ε-net techniques

SLIDE 9

Projective Geometry

  • Augment Euclidean geometry by “points at infinity”
  • One infinite point per family of parallel lines
  • Set of infinite points forms the “hyperplane at infinity”
  • Equivalently: view hyperplanes and points as equators and pairs of poles on a sphere
  • Nonfit = k-flat touching some particular (d − k − 1)-flat at infinity
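A standard way to compute with points at infinity in the plane is homogeneous coordinates. The point (x, y) becomes (x, y, 1), the line ax + by + c = 0 becomes (a, b, c), and vectors with last coordinate 0 are the points at infinity. Normalizing these vectors to unit length gives the sphere picture: a point is a pair of antipodal poles ±v, and a line is the equator whose poles are ±(a, b, c). The sketch below is only an illustration, and its helper names are hypothetical. It uses the fact that the cross product gives both the line through two points and the meet of two lines.

```python
def cross(u, v):
    """Cross product of two 3-vectors in homogeneous coordinates."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Line (a, b, c) through two points given as (x, y, w)."""
    return cross(p, q)


def meet(l, m):
    """Intersection point (x, y, w) of lines given as (a, b, c) for ax + by + c = 0."""
    return cross(l, m)


# y = 2x + 1 and y = 2x + 5, written as 2x - y + c = 0: parallel lines.
print(meet((2, -1, 1), (2, -1, 5)))  # (-4, -8, 0): w = 0, a point at infinity
# Two points at infinity span the line at infinity, w = 0 (up to scale).
print(line_through((1, 2, 0), (1, 0, 0)))  # (0, 0, -2)
```

Every line of slope 2 meets the others at a multiple of (1, 2, 0), so each family of parallel lines contributes exactly one infinite point.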

SLIDE 10

Projective Duality

  • Incidence-preserving correspondence between k-flats and (d − k − 1)-flats
  • Cloud of data points becomes an arrangement of hyperplanes
  • In coordinates (two-dimensional case): point (a, b) → line y = ax + b, and line y = mx + c → point (−m, c)
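The two-dimensional formulas can be checked directly. The point (a, b) lies on y = mx + c exactly when b = ma + c. The dual point (−m, c) lies on the dual line y = ax + b exactly when c = −am + b, which is the same equation. The sketch below (hypothetical helper names) represents a nonvertical line by its (slope, intercept) pair. Vertical lines have no slope, so these formulas do not cover them.

```python
def point_to_line(a, b):
    """Dual of the point (a, b): the line y = a*x + b, as (slope, intercept)."""
    return (a, b)


def line_to_point(m, c):
    """Dual of the line y = m*x + c: the point (-m, c)."""
    return (-m, c)


def on_line(point, line):
    """Is the point (x, y) on the line y = slope*x + intercept?"""
    x, y = point
    slope, intercept = line
    return y == slope * x + intercept


# Exhaustive check on a small integer grid: incidence holds in the primal
# exactly when it holds between the duals.
assert all(
    on_line((a, b), (m, c)) == on_line(line_to_point(m, c), point_to_line(a, b))
    for a in range(-3, 4) for b in range(-3, 4)
    for m in range(-3, 4) for c in range(-3, 4)
)
```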

SLIDE 11

Crossing Distance

Crossing distance between a j-flat and a k-flat in a hyperplane arrangement = minimum number of hyperplanes crossed by any line segment connecting the two flats (including line segments “through infinity”)
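In the simplest case, two points (0-flats) in an arrangement of lines in the plane, the definition has a short closed form. The line through p and q is a closed curve in the projective plane. Every arrangement line meets it exactly once, so each arrangement line crosses exactly one of the two segments from p to q: the ordinary one or the one through infinity. If s lines separate p from q, the crossing distance is therefore min(s, n − s). The sketch below assumes general position (no arrangement line passes through p or q), and its names are hypothetical.

```python
def side(line, p):
    """Sign-carrying value of a*x + b*y + c for line (a, b, c) at point p."""
    a, b, c = line
    x, y = p
    return a * x + b * y + c


def crossing_distance_points(lines, p, q):
    """Crossing distance between points p and q in an arrangement of lines.

    Assumes general position: no line passes through p or q. Each line
    crosses either the ordinary segment pq (it separates p and q) or the
    complementary segment through infinity, never both.
    """
    separating = sum(1 for l in lines if side(l, p) * side(l, q) < 0)
    return min(separating, len(lines) - separating)


axes = [(1, 0, 0), (0, 1, 0)]  # x = 0 and y = 0
print(crossing_distance_points(axes, (1, 1), (-1, -1)))  # 0, via infinity
```

In the example, the ordinary segment from (1, 1) to (−1, −1) crosses both axes. The segment through infinity leaves along direction (1, 1) and returns from the opposite end, so it crosses neither axis.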

SLIDE 12

Definition of Depth

  • Depth of a k-flat F = crossing distance between dual(F) and dual((d − k − 1)-flat at infinity)
  • In primal space: minimum # data points in a double wedge bounded by F and by the (d − k − 1)-flat at infinity
  • Nonfit always has depth zero (zero-length line segment, empty wedge)

SLIDE 13

Parametrizing Line Segments

  • Let F1, F2 be flats (unoriented projective spaces)
  • If F1 ∩ F2 = ∅, any pair (p1 ∈ F1, p2 ∈ F2) determines a unique line through them
  • Need one more bit of information to specify which of the two line segments: double cover (oriented projective spaces) O1, O2
  • Two-to-one correspondence O1 × O2 → line segments

SLIDE 14

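When does a segment cross a hyperplane?

  • Set of line segments crossing hyperplane H is h1 ⊕ h2, where the hi are halfspaces in Oi with boundary(hi) = H ∩ Oi
  • Or, more simply, the disjoint union of two halfspace × halfspace sets: $(h_1 \times \overline{h_2}) \cup (\overline{h_1} \times h_2)$, where $\overline{h_i}$ is the opposite halfspace of Oi

[Figure: O1, with F1+, F1– and ∞ marked; O2, with F2+, F2– and ∞ marked]

  • Line segment with fewest crossings = point covered the fewest times by such sets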
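For two points in the projective plane written in homogeneous coordinates, the double cover Oi of each point is just the pair ±pi. A choice of signs (s1, s2) picks the oriented segment {α·s1·p1 + β·s2·p2 : α, β ≥ 0}. Because h · (αu + βv) is linear, the segment crosses the line h · w = 0 exactly when its endpoints lie on opposite sides: membership in h1 XOR membership in h2. The sketch below (hypothetical names; assumes no line passes through p1 or p2) enumerates all four sign pairs. It shows that opposite pairs give the same segment, which is the two-to-one correspondence.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def crosses(h, u, v):
    """Does the segment {alpha*u + beta*v : alpha, beta >= 0} cross h . w = 0?"""
    # Linear in (alpha, beta), so the sign changes iff the endpoints differ:
    # (u in h1) XOR (v in h2), with h_i the positive side of h.
    return (dot(h, u) > 0) != (dot(h, v) > 0)


def crossing_distance_oriented(hyperplanes, p, q):
    """Min crossings over the segments joining projective points p and q.

    Returns (minimum, counts), where counts maps each sign pair (s1, s2)
    to the number of hyperplanes crossed by that oriented segment.
    """
    counts = {}
    for s1, s2 in product((1, -1), repeat=2):
        u = tuple(s1 * x for x in p)
        v = tuple(s2 * x for x in q)
        counts[(s1, s2)] = sum(1 for h in hyperplanes if crosses(h, u, v))
    return min(counts.values()), counts


best, counts = crossing_distance_oriented([(1, 0, 0), (0, 1, 0)], (1, 1, 1), (-1, -1, 1))
print(best, counts[(1, 1)], counts[(1, -1)])  # 0 2 0
```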

SLIDE 15

Algorithm for k = 1, d = 3:

  • Want point in torus O1 × O2 covered by fewest rectangles h1 × h2
  • Sweep left to right (i.e., by O1-coordinate), using a segment tree to keep track of the shallowest point on the sweep line
  • Time: $O(n \log n)$
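The sweep can be sketched on a discretized torus. The sketch assumes the arcs on each circle have already been mapped to integer cell ranges, for example by sorting and compressing arc endpoints. It does not show that mapping or how the two rectangles per hyperplane are built from h1 ⊕ h2. All names are hypothetical. Columns are the O1 coordinate. The segment tree over the O2 coordinate supports "add to a range" and "global minimum", which is all the sweep needs.

```python
class MinAddTree:
    """Segment tree over cells 0..n-1: add to a range, query the global min."""

    def __init__(self, n):
        self.n = n
        self.mn = [0] * (4 * n)       # subtree min, including this node's pending add
        self.pending = [0] * (4 * n)  # amount added to this node's whole range

    def add(self, lo, hi, val, node=1, l=0, r=None):
        """Add val to every cell in [lo, hi)."""
        if r is None:
            r = self.n
        if hi <= l or r <= lo:
            return
        if lo <= l and r <= hi:
            self.mn[node] += val
            self.pending[node] += val
            return
        mid = (l + r) // 2
        self.add(lo, hi, val, 2 * node, l, mid)
        self.add(lo, hi, val, 2 * node + 1, mid, r)
        self.mn[node] = self.pending[node] + min(self.mn[2 * node], self.mn[2 * node + 1])

    def global_min(self):
        return self.mn[1]


def split_cyclic(a, b, n):
    """Cyclic half-open interval a..b on 0..n-1 as ordinary intervals (a == b: whole circle)."""
    parts = [(a, b)] if a < b else [(a, n), (0, b)]
    return [(lo, hi) for lo, hi in parts if lo < hi]


def shallowest_cell(width, height, rects):
    """Least-covered cell of a width x height torus grid, as (coverage, column).

    Each rect (x1, x2, y1, y2) is the product of cyclic intervals [x1, x2), [y1, y2).
    """
    events = [[] for _ in range(width + 1)]
    for x1, x2, y1, y2 in rects:
        for xa, xb in split_cyclic(x1, x2, width):
            for ya, yb in split_cyclic(y1, y2, height):
                events[xa].append((ya, yb, 1))   # rectangle starts at column xa
                events[xb].append((ya, yb, -1))  # and ends before column xb
    tree = MinAddTree(height)
    best = None
    for col in range(width):  # sweep left to right
        for ya, yb, delta in events[col]:
            tree.add(ya, yb, delta)
        depth = tree.global_min()
        if best is None or depth < best[0]:
            best = (depth, col)
    return best
```

Each rectangle edge is one $O(\log n)$ tree update, which is where the $O(n \log n)$ bound comes from once the arc endpoints have been sorted.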

Algorithm for Higher Dimensions:

  • Replace segment tree by history tree of randomized incremental arrangement
  • Replace sweep by traversal of history tree
  • $O(n^{j+k-1})$ for crossing distance between j-flat and k-flat ⇒ $O(n^{d-2})$ for flat depth
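The step from the crossing-distance bound to $O(n^{d-2})$ follows from the duality between k-flats and (d − k − 1)-flats together with the definition of depth. To avoid clashing with the k of the flat F, write j and j′ for the dimensions of the two flats whose crossing distance is computed:

```latex
\begin{aligned}
j  &= \dim \mathrm{dual}(F) = d - k - 1, \\
j' &= \dim \mathrm{dual}\bigl((d-k-1)\text{-flat at infinity}\bigr) = d - (d-k-1) - 1 = k, \\
j + j' - 1 &= (d - k - 1) + k - 1 = d - 2 .
\end{aligned}
```

So the $O(n^{j+j'-1})$ crossing-distance bound becomes $O(n^{d-2})$, independent of k.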

SLIDE 16

Conclusions

  • Presented an efficient algorithm for testing depth
  • Many remaining open problems in algorithms, combinatorics, & statistics:
    • How to find the deepest flat efficiently? What is its depth?
    • Can we find deep flats efficiently when d is variable?
    • Do local optimization heuristics work?
    • Are similar ideas of depth useful for nonlinear regression?
