Computing the Depth of a Flat

  1. Computing the Depth of a Flat. Marshall Bern, Xerox PARC, and David Eppstein, UC Irvine

  2. Robust Regression: Given data with dependent and independent variables, describe the dependent variables as a function of the independent ones. The fit should be robust against arbitrary outliers. Distance-free methods are preferred for robustness against skewed and data-dependent noise.

  3. Example: Data Depth (no independent variables). Fit a point to a cloud of data points. The depth of a fit x is the minimum number of data points in any halfspace containing x. The Tukey median is the point with the maximum possible depth.
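To make the definition concrete, here is a brute-force computation of data depth in the plane. It is a hedged sketch, not taken from the slides. It assumes d = 2, closed halfplanes, and floating-point angles with points in general position, and `tukey_depth` is a hypothetical name. It relies on two facts. First, only halfplanes whose boundary passes through x need to be checked. Second, the count changes only at directions perpendicular to some p − x.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def tukey_depth(x: Point, data: Sequence[Point]) -> int:
    """Fewest data points in any closed halfplane containing x (planar sketch)."""
    two_pi = 2 * math.pi

    def count(theta: float) -> int:
        # Points in the closed halfplane {p : u . (p - x) >= 0}, u = (cos, sin).
        ux, uy = math.cos(theta), math.sin(theta)
        return sum(1 for px, py in data
                   if ux * (px - x[0]) + uy * (py - x[1]) >= 0)

    # Translating a halfplane until its boundary passes through x never adds
    # points, so only boundaries through x matter. The count changes only
    # when u is perpendicular to some p - x.
    critical = set()
    for px, py in data:
        dx, dy = px - x[0], py - x[1]
        if dx == 0 and dy == 0:
            continue  # a copy of x lies in every such halfplane
        a = math.atan2(dy, dx)
        critical.add((a + math.pi / 2) % two_pi)
        critical.add((a - math.pi / 2) % two_pi)
    if not critical:
        return count(0.0)
    angles = sorted(critical)
    # Boundary points only add to a closed halfplane, so the minimum is
    # attained inside an arc between critical directions: test each midpoint.
    mids = [(angles[i] + angles[i + 1]) / 2 for i in range(len(angles) - 1)]
    mids.append((angles[-1] + angles[0] + two_pi) / 2)  # wrap-around arc
    return min(count(t) for t in mids)
```

For the four corners of a unit square, the center has depth 2, each corner has depth 1, and a far-away point has depth 0. The running time is O(n²) after an O(n log n) sort. That is much slower than the published bounds, but it is easy to check by hand.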

  4. Known Results for Data Depth: The Tukey median has depth $\ge \lceil n/(d+1) \rceil$ [Radon 1946]. A deep (but not optimally deep) point can be found in time polynomial in n and d [Clarkson, Eppstein, Miller, Sturtivant, Teng 1996]. The deepest point can be found in time $O(n^d)$ (a linear program with that many constraints). Computing the depth of a point is NP-complete for variable d [Johnson & Preparata 1978]; it takes $O(n^{d-1} + n \log n)$ time for fixed d [Rousseeuw & Struyf 1998].

  5. Example: Regression Depth (all but one variable independent) [Hubert & Rousseeuw 1998]. Fit a hyperplane to a cloud of data points. A nonfit is a vertical hyperplane (it does not predict the dependent variable). The depth of a fit is the minimum number of data points crossed while moving it to a nonfit.
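For simple regression (d = 2), the depth of a line can be counted directly. The sketch below is an illustration under stated assumptions, not the authors' method. It uses the usual counting characterization: the fit is tilted to vertical about a pivot above some split abscissa t, and every point whose residual has the wrong sign for its side of t must be crossed. A point with a residual of exactly zero is always crossed. `regression_depth` is a hypothetical name, and the brute force runs in O(n²).

```python
from typing import Sequence, Tuple


def regression_depth(slope: float, intercept: float,
                     data: Sequence[Tuple[float, float]]) -> int:
    """Depth of the fit y = slope * x + intercept among planar data points."""
    res = [(x, y - (slope * x + intercept)) for x, y in data]
    best = len(res)
    # Candidate split abscissas: one below every point, then each distinct x.
    # Points with x <= t are left of the vertical line, points with x > t right.
    for t in [float("-inf")] + sorted({x for x, _ in res}):
        left_pos = sum(1 for x, r in res if x <= t and r >= 0)
        left_neg = sum(1 for x, r in res if x <= t and r <= 0)
        right_pos = sum(1 for x, r in res if x > t and r >= 0)
        right_neg = sum(1 for x, r in res if x > t and r <= 0)
        # Tilting one way crosses the left points on or above the fit and the
        # right points on or below it; tilting the other way is symmetric.
        best = min(best, left_pos + right_neg, left_neg + right_pos)
    return best
```

On four collinear points, the line through them has depth 4. A horizontal line through their middle has depth 0, because it can be rotated to vertical without crossing any point.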

  6. Known Results for Regression Depth: The deepest hyperplane has depth $\ge \lceil n/(d+1) \rceil$ [Amenta, Bern, Eppstein, Teng 1998; Mizera 1998]. The deepest hyperplane can be found in time $O(n^d)$ (breadth-first search in the arrangement). The planar deepest line can be found in $O(n \log n)$ time [van Kreveld et al. 1999; Langerman & Steiger 2000]. Computing the depth of a hyperplane is NP-complete for variable d [Amenta et al. 1998]; it takes $O(n^{d-1} + n \log n)$ time for fixed d [Rousseeuw & Struyf 1998].

  7. Multivariate Regression Depth (any number k of independent variables) [Bern & Eppstein 2000]: Defines depth for a k-flat; this equals data depth for k = 0 and regression depth for k = d − 1. The deepest flat has depth $\Omega(n)$. Conjecture: the deepest flat has depth $\ge \lceil n/((k+1)(d-k)+1) \rceil$; true for k = 0, k = 1, and k = d − 1.
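The conjectured bound can be evaluated with plain integer arithmetic. This small check (with the hypothetical name `conjectured_depth_bound`) shows that at both extremes, k = 0 and k = d − 1, the denominator reduces to d + 1. The bound then matches the $\lceil n/(d+1) \rceil$ results quoted for data depth and regression depth.

```python
def conjectured_depth_bound(n: int, d: int, k: int) -> int:
    """ceil(n / ((k + 1)(d - k) + 1)) for a k-flat fit to n points in R^d."""
    if not 0 <= k <= d - 1:
        raise ValueError("need 0 <= k <= d - 1")
    denom = (k + 1) * (d - k) + 1
    return -(-n // denom)  # integer ceiling division
```

For n = 100 in three dimensions, the bound is 25 for a point (k = 0) or a plane (k = 2), but only 20 for a line (k = 1).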

  8. New Results: Computing the depth of a k-flat takes $O(n^{d-2} + n \log n)$ time when 0 < k < d − 1, saving a factor of n compared to similar results for regression depth and data depth. Deterministic $O(n \log n)$ for lines in space (k = 1, d = 3); randomized $O(n^{d-2})$ for all other cases. The algorithm can likely be derandomized using ε-net techniques.

  9. Projective Geometry: Augment Euclidean geometry by “points at infinity”, one infinite point per family of parallel lines. The set of infinite points forms the “hyperplane at infinity”. Equivalently, view hyperplanes and points as equators and pairs of poles on a sphere. A nonfit is a k-flat touching some particular (d − k − 1)-flat at infinity.
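One standard way to realize points at infinity is with homogeneous coordinates. An affine point (x, y) becomes (x, y, 1), a line ax + by + c = 0 becomes (a, b, c), and two lines meet at the cross product of their coefficient vectors. This planar sketch (helper names are hypothetical) shows two parallel lines meeting at a point with w = 0, that is, on the line at infinity. Normalizing homogeneous vectors to unit length gives the sphere picture, where each projective point appears as a pair of antipodal poles.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def meet(l1: Vec3, l2: Vec3) -> Vec3:
    """Intersection of lines a*x + b*y + c*w = 0, given as (a, b, c):
    the cross product of their coefficient vectors."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    return (b1 * c2 - c1 * b2, c1 * a2 - a1 * c2, a1 * b2 - b1 * a2)


def is_at_infinity(p: Vec3) -> bool:
    return p[2] == 0  # w = 0: a direction, not an affine point


def to_affine(p: Vec3) -> Tuple[float, float]:
    return (p[0] / p[2], p[1] / p[2])
```

The parallel lines y = 2x + 1 and y = 2x + 5 are (2, −1, 1) and (2, −1, 5). They meet at (−4, −8, 0), which is the point at infinity in direction (1, 2). The lines y = x and y = −x + 2 meet at (2, 2, 2), the affine point (1, 1).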

  10. Projective Duality: An incidence-preserving correspondence between k-flats and (d − k − 1)-flats. A cloud of data points becomes an arrangement of hyperplanes. In coordinates (two-dimensional case): $(a, b) \mapsto y = ax + b$ and $y = mx + c \mapsto (-m, c)$.
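The two coordinate formulas can be checked mechanically. Algebraically, b = ma + c holds if and only if c = a(−m) + b. The sketch below (helper names are hypothetical) stores a nonvertical line y = mx + c as the pair (m, c). It then checks, over a small grid of integer inputs, that a point lies on a line exactly when the dual line contains the dual point.

```python
from itertools import product


def dual_of_point(a, b):
    """Point (a, b) -> line y = a*x + b, stored as (slope, intercept)."""
    return (a, b)


def dual_of_line(m, c):
    """Nonvertical line y = m*x + c -> point (-m, c)."""
    return (-m, c)


def on_line(point, line):
    (x, y), (m, c) = point, line
    return y == m * x + c


def incidence_preserved(r: int = 3) -> bool:
    """Check every point (a, b) and line (m, c) with integer entries in [-r, r]."""
    return all(
        on_line((a, b), (m, c)) == on_line(dual_of_line(m, c), dual_of_point(a, b))
        for a, b, m, c in product(range(-r, r + 1), repeat=4)
    )
```

Integer inputs keep the equality tests exact. With floats, the same check would need a tolerance.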

  11. Crossing Distance: The crossing distance between a j-flat and a k-flat in a hyperplane arrangement is the minimum number of hyperplanes crossed by any line segment connecting the two flats (including line segments “through infinity”).
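In the simplest planar case, j = k = 0, the crossing distance between two points has a closed form. The projective line through p and q meets each arrangement line exactly once, assuming none of them is the line pq itself. So each arrangement line crosses exactly one of the two segments: the bounded segment if the line separates p and q, and the segment through infinity otherwise. This sketch assumes general position (neither point lies on an arrangement line), and the names are hypothetical.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float, float]  # a*x + b*y + c = 0
Pt = Tuple[float, float]


def side(line: Line, p: Pt) -> float:
    a, b, c = line
    return a * p[0] + b * p[1] + c


def point_crossing_distance(p: Pt, q: Pt, lines: Sequence[Line]) -> int:
    # Lines separating p and q cross the bounded segment; all other lines
    # cross the segment through infinity. Take the cheaper of the two.
    separating = sum(1 for L in lines if (side(L, p) > 0) != (side(L, q) > 0))
    return min(separating, len(lines) - separating)
```

With the vertical lines x = 1, 2, 3, the points (0, 0) and (4, 0) have crossing distance 0. Leaving (0, 0) to the left and wrapping around through infinity reaches (4, 0) without crossing anything.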

  12. Definition of Depth: The depth of a k-flat F is the crossing distance between dual(F) and dual((d − k − 1)-flat at infinity). In primal space, it is the minimum number of data points in a double wedge bounded by F and by the (d − k − 1)-flat at infinity. A nonfit always has depth zero (zero-length line segment, empty wedge).

  13. Parametrizing Line Segments: Let $F_1, F_2$ be flats (unoriented projective spaces). If $F_1 \cap F_2 = \emptyset$, any pair $(p_1 \in F_1, p_2 \in F_2)$ determines a unique line through them. One more bit of information is needed to specify which of the two line segments is meant, so use the double covers (oriented projective spaces) $O_1, O_2$. This gives a two-to-one correspondence $O_1 \times O_2 \to$ line segments.
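In oriented projective coordinates, a point is a homogeneous vector up to *positive* scaling. The nonnegative combinations αp + βq of two oriented endpoints trace out one of the two segments between them. Replacing (p, q) by (−p, −q) gives the same segment, which is the two-to-one correspondence on the slide. Negating only one endpoint gives the complementary segment through infinity. The planar sketch below is an illustration with hypothetical names, not code from the talk.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]  # oriented homogeneous point (x, y, w)


def neg(p: Vec3) -> Vec3:
    return (-p[0], -p[1], -p[2])


def point_on_segment(p: Vec3, q: Vec3, t: float) -> Vec3:
    """(1 - t) p + t q for 0 <= t <= 1: a point of the segment selected by
    the orientations of p and q."""
    return tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))


def through_infinity(p: Vec3, q: Vec3) -> bool:
    """The segment reaches the line at infinity (w = 0) iff w changes sign
    or vanishes along it."""
    return p[2] * q[2] <= 0


def affine(v: Vec3) -> Tuple[float, float]:
    return (v[0] / v[2], v[1] / v[2])
```

With p = (0, 0, 1) and q = (4, 0, 1), the pairs (p, q) and (−p, −q) both give the bounded segment from (0, 0) to (4, 0). The pair (p, −q) passes through infinity and contains affine points such as (−2, 0).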

  14. When does a segment cross a hyperplane? The set of line segments crossing a hyperplane H is $h_1 \oplus h_2$, where each $h_i$ is a halfspace in $O_i$ with $\partial h_i = H \cap O_i$. More simply, it is the disjoint union of two sets of the form halfspace × halfspace. (Figure: $O_1$ and $O_2$ with halves labelled $F_1^+$, $F_1^-$, $F_2^+$, $F_2^-$ and their points at infinity marked.) The line segment with the fewest crossings corresponds to the point covered the fewest times by such sets.
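For the planar case of a point and a line (j = 0, k = 1), the slide's recipe can be sketched directly. A segment from oriented p to oriented q crosses H exactly when p ∈ h₁ xor q ∈ h₂, which is a pair of sign tests. O₁ is {p, −p}. O₂ is the circle of oriented points cos θ·a + sin θ·b on the line spanned by the homogeneous points a and b. One sample per arc between critical angles finds the least-covered point. This is a hypothetical O(n²) brute force, not the talk's algorithm. It assumes general position: no arrangement line passes through p or contains the target line.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(h: Vec3, v: Vec3) -> float:
    return h[0] * v[0] + h[1] * v[1] + h[2] * v[2]


def crossings(p: Vec3, q: Vec3, hyperplanes: Sequence[Vec3]) -> int:
    """Lines crossed by the segment from oriented p to oriented q:
    H is crossed iff p is in {H.v > 0} xor q is in {H.v > 0}."""
    return sum(1 for h in hyperplanes if (dot(h, p) > 0) != (dot(h, q) > 0))


def point_to_line_distance(p: Vec3, a: Vec3, b: Vec3,
                           hyperplanes: Sequence[Vec3]) -> int:
    """Crossing distance from point p to the line spanned by a and b."""
    two_pi = 2 * math.pi
    critical = set()
    for h in hyperplanes:
        ca, cb = dot(h, a), dot(h, b)
        if ca == 0 and cb == 0:
            continue  # h contains the target line: excluded by assumption
        t0 = math.atan2(-ca, cb)  # h . (cos t0 * a + sin t0 * b) == 0
        critical.add(t0 % two_pi)
        critical.add((t0 + math.pi) % two_pi)
    angles = sorted(critical) or [0.0]
    # One sample of O2 per arc between consecutive critical angles.
    samples = [(angles[i] + angles[i + 1]) / 2 for i in range(len(angles) - 1)]
    samples.append((angles[-1] + angles[0] + two_pi) / 2)
    best = len(hyperplanes)
    for t in samples:
        q = tuple(math.cos(t) * ai + math.sin(t) * bi for ai, bi in zip(a, b))
        for p_oriented in (p, tuple(-c for c in p)):  # O1 = {p, -p}
            best = min(best, crossings(p_oriented, q, hyperplanes))
    return best
```

If the target is the line at infinity (a = (1, 0, 0), b = (0, 1, 0)), the candidate segments are rays from p. For example, from (2.5, 0), the fewest of the lines x = 1, 2, 3 that any ray must cross is 1.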

  15. Algorithm for k = 1, d = 3: We want the point in the torus $O_1 \times O_2$ covered by the fewest rectangles $h_1 \times h_2$. Sweep left to right (i.e., by $O_1$-coordinate), using a segment tree to keep track of the shallowest point on the sweep line. Time: $O(n \log n)$. Algorithm for higher dimensions: Replace the segment tree by the history tree of the randomized incremental arrangement, and replace the sweep by a traversal of the history tree. This gives $O(n^{j+k-1})$ time for the crossing distance between a j-flat and a k-flat, and hence $O(n^{d-2})$ for flat depth.
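The k = 1, d = 3 sweep can be sketched as follows, under the assumption that each circle $O_i$ is parametrized by [0, 1). With that parametrization, each rectangle $h_1 \times h_2$ becomes a product of two arcs, and an arc with start > end wraps past 0. The sweep moves along the $O_1$ coordinate. A segment tree over the $O_2$ coordinate supports "add ±1 to a range" and "report the global minimum", so each slab between events is answered in O(1) after O(log n) updates. Names and the input encoding are hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Arc = Tuple[float, float]  # (start, end) on [0, 1); start >= end wraps past 0


class MinAddTree:
    """Segment tree over m elementary intervals: range add, global minimum."""

    def __init__(self, m: int):
        self.m = m
        self.mn = [0] * (4 * m)  # subtree minimum, including this node's lazy add
        self.lz = [0] * (4 * m)  # value added to the whole subtree

    def add(self, lo: int, hi: int, v: int) -> None:
        self._add(1, 0, self.m, lo, hi, v)  # half-open index range [lo, hi)

    def _add(self, node, nl, nr, lo, hi, v):
        if hi <= nl or nr <= lo:
            return
        if lo <= nl and nr <= hi:
            self.mn[node] += v
            self.lz[node] += v
            return
        mid = (nl + nr) // 2
        self._add(2 * node, nl, mid, lo, hi, v)
        self._add(2 * node + 1, mid, nr, lo, hi, v)
        self.mn[node] = min(self.mn[2 * node], self.mn[2 * node + 1]) + self.lz[node]


def split(arc: Arc) -> List[Tuple[float, float]]:
    """Non-wrapping half-open pieces of an arc; start == end is the full circle."""
    a, b = arc
    pieces = [(a, b)] if a < b else [(a, 1.0), (0.0, b)]
    return [(lo, hi) for lo, hi in pieces if lo < hi]


def min_coverage(rects: Sequence[Tuple[Arc, Arc]]) -> int:
    """Fewest arc-x-arc rectangles covering any point of the torus."""
    boxes = [(x, y) for xa, ya in rects for x in split(xa) for y in split(ya)]
    ys = sorted({0.0, 1.0} | {v for _, y in boxes for v in y})
    index = {v: i for i, v in enumerate(ys)}
    tree = MinAddTree(len(ys) - 1)
    events = defaultdict(list)
    for (x0, x1), (y0, y1) in boxes:
        events[x0].append((index[y0], index[y1], +1))
        events[x1].append((index[y0], index[y1], -1))
    xs = sorted({0.0} | set(events))
    best = len(rects)  # a point lies in at most one piece of each rectangle
    for i, x in enumerate(xs):
        for lo, hi, v in events[x]:
            tree.add(lo, hi, v)
        nxt = xs[i + 1] if i + 1 < len(xs) else 1.0
        if x < nxt:  # the slab [x, nxt) has constant coverage along x
            best = min(best, tree.mn[1])
    return best
```

Sorting the events costs O(n log n), and each of the O(n) updates costs O(log n), which matches the bound on the slide. The sketch returns only the minimum count. Recovering the shallowest point itself would mean also storing an argmin in each tree node.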

  16. Conclusions: We presented an efficient algorithm for testing depth. Many open problems remain in algorithms, combinatorics, and statistics. How can the deepest flat be found efficiently? What is its depth? Can deep flats be found efficiently when d is variable? Do local optimization heuristics work? Are similar ideas of depth useful for nonlinear regression?
