SLIDE 1
Computing the Depth of a Flat
Marshall Bern, Xerox PARC
and
David Eppstein, UC Irvine
SLIDE 2
Robust Regression
Given data with dependent and independent vars
Describe dependent vars as function of indep. ones
Should be robust again
SLIDE 3
Example: Data Depth
(no variables independent)
Fit a point to a cloud of data points
Depth of a fit x = min # data points in halfspace containing x
[Figure: point x]
Tukey median = point with max possible depth
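The depth definition above can be made concrete in the plane. The following is a minimal brute-force sketch, not the algorithm from any of the cited papers: it assumes two-dimensional points given as tuples, counts closed halfplanes, and runs in O(n^2) time. The function name `halfspace_depth` is hypothetical.

```python
def halfspace_depth(x, points):
    """Brute-force data depth of x among planar points (closed halfplanes)."""
    x = tuple(x)
    # Points equal to x lie in every closed halfplane containing x.
    coincident = sum(1 for p in points if tuple(p) == x)
    # Work with offsets from x for all remaining points.
    others = [(p[0] - x[0], p[1] - x[1]) for p in points if tuple(p) != x]
    if not others:
        return coincident
    best = len(others)
    for vx, vy in others:
        # Classify every point relative to the line through x with direction v.
        left = right = same = opp = 0
        for wx, wy in others:
            cross = vx * wy - vy * wx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif vx * wx + vy * wy > 0:
                same += 1   # on the line, same ray as v
            else:
                opp += 1    # on the line, opposite ray
        # Rotating the line slightly about x sends one ray to each side,
        # so the cheapest nearby halfplane takes the smaller side and
        # the smaller ray.
        best = min(best, min(left, right) + min(same, opp))
    return coincident + best
```

With integer coordinates the arithmetic is exact. For example, the center of a square has depth 2 with respect to its four corners, while a corner itself has depth 1.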
SLIDE 4
Known Results for Data Depth
Tukey median has depth ≥ ⌈n/(d + 1)⌉
[Radon 1946]
Deep (but not optimally deep) point can be found in time polynomial in n and d
[Clarkson, Eppstein, Miller, Sturtivant, Teng 1996]
Deepest point can be found in time O(n^d) (linear program with that many constraints)
Computing the depth of a point is NP-complete for variable d
[Johnson & Preparata 1978]
O(n^(d−1) + n log n) for fixed d
[Rousseeuw & Struyf 1998]
SLIDE 5
Example: Regression Depth
(all but one variable independent)
[Hubert & Rousseeuw 1998]
Fit a hyperplane to a cloud of data points
Nonfit = vertical hyperplane (doesn't predict dependent variable)
Depth of a fit = min # data points crossed while moving to a nonfit
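For a line in the plane, "moving to a nonfit" can be read as rotating the line to vertical about a pivot on the line. The sketch below is a brute-force illustration under that reading, not the cited authors' algorithm: it tries a pivot in every gap between distinct x-coordinates and both rotation directions, and counts points on the line as crossed. The name `regression_depth_line` and the (slope, intercept) parametrization are assumptions made for illustration.

```python
def regression_depth_line(a, b, points):
    """Regression depth of the line y = a*x + b among planar points."""
    # Pair each x-coordinate with its residual and sort by x.
    res = sorted((px, py - (a * px + b)) for px, py in points)
    n = len(res)
    total_nonneg = sum(1 for _, r in res if r >= 0)
    total_nonpos = sum(1 for _, r in res if r <= 0)
    # Pivot to the left of every point: one rotation direction crosses the
    # points on or above the line, the other those on or below it.
    best = min(total_nonneg, total_nonpos)
    left_nonneg = left_nonpos = 0
    i = 0
    while i < n:
        xv = res[i][0]
        # Move all points sharing this x-coordinate to the left of the pivot.
        while i < n and res[i][0] == xv:
            r = res[i][1]
            left_nonneg += r >= 0
            left_nonpos += r <= 0
            i += 1
        # Counterclockwise: right part sweeps upward, left part downward.
        ccw = (total_nonneg - left_nonneg) + left_nonpos
        # Clockwise: right part sweeps downward, left part upward.
        cw = (total_nonpos - left_nonpos) + left_nonneg
        best = min(best, ccw, cw)
    return best
```

A line passing through every data point has depth n, and a line lying entirely above the data has depth 0. Residual signs are compared directly, so floating-point inputs close to the line may need a tolerance.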
SLIDE 6
Known Results for Regression Depth
Deepest hyperplane has depth ≥ ⌈n/(d + 1)⌉
[Amenta, Bern, Eppstein, Teng 1998; Mizera 1998]
Deepest hyperplane can be found in time O(n^d) (breadth first search in arrangement)
Planar deepest line can be found in O(n log n)
[van Kreveld et al. 1999; Langerman & Steiger 2000]
Computing the depth of a hyperplane is NP-complete for variable d
[Amenta et al. 1998]
O(n^(d−1) + n log n) for fixed d
[Rousseeuw & Struyf 1998]
SLIDE 7
Multivariate Regression Depth
(any number k of independent variables)
[Bern & Eppstein 2000]
Definition of depth for k-flat
Equals data depth for k = 0
Equals regression depth for k = d − 1
Deepest flat has depth Ω(n)
Conjecture: depth ≥ ⌈n/((k + 1)(d − k) + 1)⌉
true for k = 0, k = 1, k = d − 1
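As a quick arithmetic check, the conjectured bound reduces to ⌈n/(d + 1)⌉ at both ends of the range, since (k + 1)(d − k) + 1 = d + 1 when k = 0 or k = d − 1. A small sketch, with the hypothetical helper name `conjectured_depth_bound`:

```python
def conjectured_depth_bound(n, d, k):
    """Conjectured lower bound ceil(n / ((k + 1)(d - k) + 1)) on flat depth."""
    if not 0 <= k <= d - 1:
        raise ValueError("k must satisfy 0 <= k <= d - 1")
    denom = (k + 1) * (d - k) + 1
    # Integer ceiling division without floating point.
    return -(-n // denom)
```

For n = 100 and d = 3 this gives 25 for k = 0 and k = 2 (matching the data depth and regression depth bounds) and 20 for k = 1.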
SLIDE 8
New Results
Computing the depth of a k-flat is O(n^(d−2) + n log n) when 0 < k < d − 1
Saves a factor of n compared to similar results for regression depth, data depth
Deterministic O(n log n) for lines in space (k = 1, d = 3)
Randomized O(n^(d−2)) for all other cases
Likely can be derandomized using ε-net techniques
SLIDE 9
Projective Geometry
Augment Euclidean geom. by "points at infinity"
One infinite point per family of parallel lines
Set of infinite points forms "hyperplane at infinity"
Equivalently: view hyperplanes and points as equators and pairs of poles on a sphere
Nonfit = k-flat touching some particular (d − k − 1)-flat at infinity
SLIDE 10
Projective Duality
Incidence-preserving correspondence between k-flats and (d − k − 1)-flats
Cloud of data points becomes arrangement of hyperplanes
In coordinates (two dimensional case):
(a, b) → y = ax + b
y = mx + c → (−m, c)
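The two-dimensional coordinate map above can be checked directly: a point lies on a line exactly when the dual point lies on the dual line. The sketch below represents a non-vertical line y = mx + c as the pair (m, c); the function names are hypothetical.

```python
def point_to_line(a, b):
    """Dual of the point (a, b): the line y = a*x + b, stored as (slope, intercept)."""
    return (a, b)

def line_to_point(m, c):
    """Dual of the line y = m*x + c: the point (-m, c)."""
    return (-m, c)

def incident(point, line):
    """True if the point lies on the line y = m*x + c."""
    px, py = point
    m, c = line
    return py == m * px + c

# The point (2, 7) lies on y = 3x + 1, and the incidence survives dualization.
p, ell = (2, 7), (3, 1)
dual_line = point_to_line(*p)    # the line y = 2x + 7
dual_point = line_to_point(*ell)  # the point (-3, 1)
```

Here `incident(p, ell)` and `incident(dual_point, dual_line)` agree because both reduce to the same equation b = ma + c.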
SLIDE 11
Crossing Distance
Crossing distance between a j-flat and a k-flat in a hyperplane arrangement = minimum number of hyperplanes crossed by any line segment connecting the two flats
(incl. line segments "through infinity")
SLIDE 12
Definition of Depth
Depth of a k-flat F = crossing distance between dual(F) and dual((d − k − 1)-flat at infinity)
In primal space, minimum # data points in double wedge bounded by F and by (d − k − 1)-flat at infinity
Nonfit always has depth zero (zero-length line seg, empty wedge)
SLIDE 13
Parametrizing Line Segments
Let F_1, F_2 be flats (unoriented projective spaces)
If F_1 ∩ F_2 = ∅, any pair (p_1 ∈ F_1, p_2 ∈ F_2) determines unique line through them
Need one more bit of information to specify which of two line segments: double cover (oriented proj. spaces) O_1, O_2
Two-to-one correspondence O_1 × O_2 → line segments
SLIDE 14
When does a segment cross a hyperplane?
Set of line segments crossing hyperplane H is h_1 ⊕ h_2, where h_i are halfspaces in O_i with boundary(h_i) = H ∩ O_i
Or more simply, disjoint union of two sets halfspace × halfspace
[Figure: double covers O_1 and O_2, with labels F_1+, F_1–, F_2+, F_2– and ∞]
Line seg w/ fewest crossings = point covered fewest times by such sets
SLIDE 15
Algorithm for k = 1, d = 3:
Want point in torus O_1 × O_2 covered by fewest rectangles h_1 × h_2
Sweep left-right (i.e., by O_1-coordinate), use segment tree to keep track of shallowest point in sweep line
Time: O(n log n)
Algorithm for Higher Dimensions:
Replace segment tree by history tree of randomized incremental arrangement
Replace sweep by traversal of history tree
O(n^(j+k−1)) for crossing distance between j-flat and k-flat ⇒ O(n^(d−2)) for flat depth
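The sweep for k = 1, d = 3 can be illustrated with a simplified stand-in: finding the least-covered point among axis-aligned rectangles. The sketch below is not the authors' implementation. It assumes a flat box [0, width) × [0, height) instead of the torus (a rectangle that wraps around would first be split into non-wrapping pieces), half-open rectangles contained in the box, and a segment tree with range add and global minimum. The name `min_coverage` is hypothetical.

```python
from bisect import bisect_left

def min_coverage(rects, width, height):
    """Fewest rectangles covering any point of [0, width) x [0, height).

    rects: (x1, y1, x2, y2) tuples, read as [x1, x2) x [y1, y2).
    """
    # Compress y-coordinates into elementary intervals.
    ys = sorted({0, height, *(y for r in rects for y in (r[1], r[3]))})
    m = len(ys) - 1
    add = [0] * (4 * m)  # pending increment stored at each node
    mn = [0] * (4 * m)   # minimum coverage within each node's range

    def update(node, lo, hi, l, r, delta):
        # Add delta to the coverage of elementary intervals [l, r).
        if r <= lo or hi <= l:
            return
        if l <= lo and hi <= r:
            add[node] += delta
            mn[node] += delta
            return
        mid = (lo + hi) // 2
        update(2 * node, lo, mid, l, r, delta)
        update(2 * node + 1, mid, hi, l, r, delta)
        mn[node] = add[node] + min(mn[2 * node], mn[2 * node + 1])

    # Each rectangle enters the sweep line at x1 and leaves it at x2.
    events = {0: [], width: []}
    for x1, y1, x2, y2 in rects:
        l, r = bisect_left(ys, y1), bisect_left(ys, y2)
        events.setdefault(x1, []).append((l, r, +1))
        events.setdefault(x2, []).append((l, r, -1))

    best = len(rects)
    for x in sorted(events):
        for l, r, delta in events[x]:
            update(1, 0, m, l, r, delta)
        if x < width:
            # mn[1] is the shallowest point on the sweep line for the
            # slab starting at this x.
            best = min(best, mn[1])
    return best
```

Sorting the events and performing one O(log n) tree update per event gives O(n log n) overall, matching the time bound stated for the sweep.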
SLIDE 16