Computational geometry and statistical depth measures


  1. Computational geometry and statistical depth measures. Eynat Rafalin, Computer Science Department, Tufts University, www.cs.tufts.edu/research/geometry. Joint work with Prof. Diane Souvaine. Interface 04

  2. Outline of talk • Data analysis, computational geometry and depth-based statistics • Applications – A basic technique: the duality transform – Least Median of Squares (LMS) regression in optimal time – Half-space depth contours in optimal time – Depth contours – Simplicial depth • Future research

  3. Computational Geometry • Deals with problems that require geometric algorithms for their solutions. • Systematic study of algorithms and data structures for geometric objects, with a focus on exact algorithms that are asymptotically fast. • At the outset: once exact algorithms have been obtained, refined, and are still slow, then move to approximation algorithms. • Computational geometry is everywhere!

  4. Computational geometry & Statistics – data analysis

  5. Multivariate analysis by Data depth • Data depth: a way of measuring how deep a given point x in R^d is relative to F, a probability distribution, or relative to a given data cloud. • Examples: – Halfspace (Location, Tukey) depth (Hodges 55, Tukey 75) – Simplicial depth (Liu 90) – Convex Hull Peeling depth (Barnett 76, Eddy 82) – Regression depth (Rousseeuw & Hubert 99) – Mahalanobis depth (Mahalanobis 36) – Oja depth (Oja 83)

  6. Multivariate analysis by Data depth • Data depth: a way of measuring how deep a given point x in R^d is relative to F, a probability distribution, or relative to a given data cloud. • The concept provides a center-outward ordering of points. • Nonparametric, multivariate statistics. • Robust. • Affine invariance: for many depth functions the choice of axes does not affect the depth values.

  7. Outliers and Robustness • Observations that deviate from the main part of the data (outliers) can have an undesirable influence on the analysis of the data. • A robust depth function yields reasonable results even if several unannounced outliers occur in the data [Handbook of statistics 15, Rao & Maddala]. • For example: – Depth contours are nested contours that enclose regions with increasing depth – For half-space depth contours: in the presence of m outliers only the m outermost depth contours may be corrupted by the outliers, but the inner set of depth contours will maintain its shape [Donoho & Gasko 92].

  8. Data depth: a characterization, visualization and quantification tool • Deepest point • Outliers • Depth contours • Bag-plot (Box-plot) [Rousseeuw, Ruts, Tukey 99] • Scale curve as a measure of scale [Liu, Parelius, Singh 99] • Fan plot as a measure of tailedness [Liu, Parelius, Singh 99] • Robustified classification and cluster analysis [Rousseeuw, Ruts 96]

  9. Fan plots [Liu, Parelius & Singh 99] [Figure: fan plot; axes: relative area (CH of p% / CH) and percentile of points.] 50 data points, created from a random distribution, with covariance matrix 4 times the identity. The fans are created for data sets containing the 1/6, 2/6, ... central regions. For each region the area of the convex hull (CH) of 2, 4, 6, …% of the points is computed.

  10. The continuous and finite sample case • Most depth functions are defined with respect to a probability distribution F, considering {X_1, ..., X_n} random observations from F. • The finite sample version of the depth function is obtained by replacing F by F_n, the empirical distribution of the sample {X_1, ..., X_n}. • In general, computational geometers study the finite sample case!

  11. Applications

  12. Applications • History – Shamos, Geometry and statistics: problems at the interface, 1976 – Bentley & Shamos, A problem in multivariate statistics: algorithm, data structure and applications, 1977

  13. Data set of the stellar cluster CYGOB1 (Leroy & Rousseeuw 87). [Figure: scatter plot; axes: star spectrum and logarithm of light intensity.] Given a set of points, find a line such that the sum of the squares of the residuals is minimized.

  14. Data set of the stellar cluster CYGOB1 (Leroy & Rousseeuw 87). [Figure: scatter plot; axes: star spectrum and logarithm of light intensity.] Given a set of points, find a line such that the median of the squares of the residuals is minimized.

  15. Least Median of Squares Regression • Ordinary least sum of squares – low breakdown point • Least median of squares – high breakdown point • Given a set of points, find a line such that the median of the squares of the residuals is minimized • Equivalently, find two parallel lines at minimum vertical distance from each other with half of the data points in the slab they define • Naive approach: O(n^3) • O(n^2 log n) time algorithm for computing the LMS line in R^2 [Souvaine, Steele 87] • An O(n^2) algorithm using duality and topological sweep [Edelsbrunner, Souvaine 90] [Figure: data points A, B, C and the line l.]
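The slab formulation above can be sketched as a brute-force search. This is only an illustration of the naive approach, not either of the cited algorithms. It relies on the characterization that the optimal slab is parallel to a line through two data points, so it tries every such slope and, for each one, finds the narrowest vertical slab covering h points. The function name `lms_line_bruteforce` and the default h = floor(n/2) + 1 are assumptions made for this sketch; because each slope sorts the intercepts, it runs in O(n^3 log n) as written.

```python
from itertools import combinations

def lms_line_bruteforce(points, h=None):
    """Return (slope, intercept, half_width) of the narrowest vertical slab
    that covers h of the points; the LMS line is the slab's mid-line.
    Hypothetical helper: h defaults to n // 2 + 1 (an assumption)."""
    n = len(points)
    if h is None:
        h = n // 2 + 1
    best = None  # (half_width, slope, intercept)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical direction defines no slope
        m = (y2 - y1) / (x2 - x1)
        # Intercept of the line of slope m through each data point.
        intercepts = sorted(y - m * x for x, y in points)
        # Any h consecutive intercepts bound a slab that holds h points.
        for i in range(n - h + 1):
            lo, hi = intercepts[i], intercepts[i + h - 1]
            half_width = (hi - lo) / 2
            if best is None or half_width < best[0]:
                best = (half_width, m, (lo + hi) / 2)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    half_width, slope, intercept = best
    return slope, intercept, half_width
```

With five points on y = 2x + 1 and two gross outliers, the slab of width zero around y = 2x + 1 already covers four of the seven points, so the outliers do not move the fit.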

  16. Points and lines • It is hard to find an order in a set of points. • An arrangement of lines is easier. • A set of points can be transformed into an arrangement of lines, preserving important properties, using duality T: a point (a,b) → a line y = ax + b

  17. Duality • Primal: points A (1,2), B (2,1), C (3,0), D (4,-1); lines l: y = -x + 3 and m: y = -2x + 2 • Dual: lines TA: y = x + 2, TB: y = 2x + 1, TC: y = 3x, TD: y = 4x - 1; points T(l) = (1,3) and T(m) = (2,2) • T: a point (a,b) → a line y = ax + b; a line y = cx + d → a point (-c,d) • T preserves slope, vertical distance and the above/below relationship
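A small sketch of the transform T on the example from this slide. The function names (`point_to_line`, `line_to_point`, `point_minus_line`, `duality_holds`) are made up for illustration, and lines are stored as (slope, intercept) pairs. The check confirms that the signed vertical distance from a line to a point equals the signed vertical distance from the dual point to the dual line, which is what "a point above a line becomes a line above a point" means.

```python
def point_to_line(point):
    """T maps the point (a, b) to the line y = a*x + b, stored as (a, b)."""
    a, b = point
    return (a, b)

def line_to_point(line):
    """T maps the line y = c*x + d, stored as (c, d), to the point (-c, d)."""
    c, d = line
    return (-c, d)

def point_minus_line(point, line):
    """Signed vertical distance: positive when the point is above the line."""
    x, y = point
    slope, intercept = line
    return y - (slope * x + intercept)

def duality_holds(point, line):
    """Point above line by h in the primal <=> dual line above dual point by h."""
    primal = point_minus_line(point, line)
    dual = -point_minus_line(line_to_point(line), point_to_line(point))
    return primal == dual

# Example data from the slide.
A, B, C, D = (1, 2), (2, 1), (3, 0), (4, -1)
l = (-1, 3)   # y = -x + 3, passes through A, B, C and D
m = (-2, 2)   # y = -2x + 2
```

Because A, B, C and D all lie on l, the dual lines TA, TB, TC and TD all pass through the single dual point T(l) = (1, 3).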

  18. LMS: primal and dual. [Figure: primal view with labels A, B, C, x, y, z and l; dual view with labels TA, TB, TC, Tx, Ty, Tz and Tl.] The LMS line bisects a slab bounded by 2 parallel lines, one of which goes through 2 data points and the other goes through one data point (provable characteristics of LMS).

  19. Least Median of Squares (LMS) Regression – The LMS line can be computed in 2D in O(n^2) [Edelsbrunner, Souvaine 90]. Earlier result: [Souvaine, Steele 87] – Practical approximation algorithm [Mount, Netanyahu, Romanik, Silverman, Yu 97], [Mount, Erickson, Har-Peled 04]

  20. Half-space depth

  21. The half-space depth of a point p is the minimum number of points of a given set S lying in any closed half-plane bounded by a line through p. [Figure: the point p among data points A to G.] Question: how to compute the half-space depth contours efficiently? (Naive cost per point: O(n^2).)
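The naive O(n^2) cost per point can be illustrated with a direct count. This is a sketch under stated assumptions: the name `halfspace_depth` is hypothetical, coordinates are integers so the orientation tests are exact, and a query point that belongs to S is counted (it lies on every line through p). The count of points in a closed half-plane only changes when the bounding line passes through a data point, and it is smallest just after the line rotates off that point, so the sketch tries each line through p and a data point and evaluates both infinitesimal rotations on both sides.

```python
def halfspace_depth(p, points):
    """Minimum number of points in any closed half-plane whose boundary
    passes through p. Brute force: O(n^2) for one query point."""
    px, py = p
    rel = [(x - px, y - py) for x, y in points]
    coincident = sum(1 for r in rel if r == (0, 0))  # always on the boundary
    others = [r for r in rel if r != (0, 0)]
    if not others:
        return coincident
    best = len(others)
    for dx, dy in others:
        left = right = ahead = behind = 0
        for rx, ry in others:
            cross = dx * ry - dy * rx
            if cross > 0:
                left += 1
            elif cross < 0:
                right += 1
            elif dx * rx + dy * ry > 0:
                ahead += 1    # on the line, same side of p as the pivot point
            else:
                behind += 1   # on the line, opposite side of p
        # A tiny rotation pushes 'ahead' points to one side and 'behind'
        # points to the other; try every combination.
        best = min(best, left + ahead, left + behind,
                   right + ahead, right + behind)
    return best + coincident
```

For the four corners of a square plus its center, the center has depth 3 (itself plus two corners), each corner has depth 1, and any point outside the square has depth 0.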

  22. The depth of a point p: the minimum number of points of S lying in any closed half-space determined by a line through p • A line l through p ↔ a point T(l) on the line T(p) • k points in the half-plane above the line l through p ↔ k lines above the point T(l) • To count how many lines lie above a point, look at its level in the arrangement. [Figure: primal points A to G with p and a line l through p; dual lines TA to TG with T(p) and the point T(l).]

  23. All the half-space depth contours in R^2 can be computed in O(n^2) time using topological sweep [Miller, Ramaswami, Rousseeuw, Sellares, Souvaine, Streinu, Struyf, 01]. [Figure: primal points A to G and p with the depth 1 and depth 2 contours; dual lines TA to TG and Tp.]

  24. Half-space depth contours • The minimum number of points lying in any closed half-space determined by a line through p corresponds to the minimum level of a dual point T(l) on the line T(p) • To compute the k-th half-space depth contour (all points of depth at least k), find the k-th level in the dual
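The dual reading of depth can be sketched for a single query point. This is an illustrative brute-force version, not the topological-sweep algorithm: `halfspace_depth_dual` is a hypothetical name, integer coordinates are assumed so that `fractions.Fraction` gives exact crossing positions, and a query point that belongs to S is counted. Under T, p = (a, b) becomes the line y = ax + b; walking along that line, the number of other dual lines above (or below) the current position only changes at crossings, so one sample between each pair of consecutive crossings is enough.

```python
from fractions import Fraction

def halfspace_depth_dual(p, points):
    """Half-space depth of p computed in the dual plane, where the point
    (a, b) becomes the line y = a*x + b. Brute force: O(n^2) per query."""
    a, b = p
    others = [q for q in points if q != p]
    coincident = len(points) - len(others)
    # x-coordinates where T(p) crosses another dual line T(q).
    crossings = sorted({Fraction(bq - b, a - aq)
                        for aq, bq in others if aq != a})
    if crossings:
        samples = [crossings[0] - 1]
        samples += [(u + v) / 2 for u, v in zip(crossings, crossings[1:])]
        samples.append(crossings[-1] + 1)
    else:
        samples = [Fraction(0)]  # every other dual line is parallel to T(p)
    best = len(others)
    for x in samples:
        y = a * x + b
        # Level information at the dual point (x, y) on T(p).
        above = sum(1 for aq, bq in others if aq * x + bq > y)
        below = len(others) - above  # no ties: samples avoid all crossings
        best = min(best, above, below)
    return best + coincident
```

Vertical primal lines have no dual point, but they do not need separate treatment here: the count is constant between crossings, so a nearby non-vertical line gives the same value.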

  25. Sweeping an arrangement of lines • Vertical line sweep – Report all intersection pairs – sorted in order of x coordinate – O(n^2 log n) time and O(n) space • Topological line sweep – Report all intersection pairs – according to a partial order related to the levels of the arrangement – O(n^2) time and O(n) space
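A minimal sketch of the vertical line sweep only (the topological sweep is more involved and is not attempted here). The name `sweep_intersections` is hypothetical. The sketch assumes integer (slope, intercept) pairs, pairwise distinct slopes and no three lines through one point. It keeps the bottom-to-top order of the lines along the sweep line and a heap of crossings between adjacent lines. Stale heap entries are skipped rather than deleted, so the heap can grow beyond the O(n) space of a careful implementation; the running time stays O(n^2 log n).

```python
import heapq
from fractions import Fraction

def sweep_intersections(lines):
    """Report (x, i, j) for every crossing of lines i and j, in increasing x.
    lines: list of integer (slope, intercept); general position assumed."""
    n = len(lines)
    # At x = -infinity the line with the largest slope is the lowest.
    order = sorted(range(n), key=lambda i: lines[i][0], reverse=True)
    pos = {line: k for k, line in enumerate(order)}
    heap = []

    def schedule(k):
        """Queue the future crossing of the lines at positions k and k + 1."""
        if 0 <= k < n - 1:
            lower, upper = order[k], order[k + 1]
            (ml, cl), (mu, cu) = lines[lower], lines[upper]
            if ml > mu:  # the lower line rises faster, so they will cross
                heapq.heappush(heap, (Fraction(cu - cl, ml - mu), lower, upper))

    for k in range(n - 1):
        schedule(k)

    found = []
    while heap:
        x, lower, upper = heapq.heappop(heap)
        k = pos[lower]
        if k + 1 >= n or order[k + 1] != upper:
            continue  # stale duplicate: this pair has already been swapped
        found.append((x, lower, upper))
        order[k], order[k + 1] = upper, lower
        pos[lower], pos[upper] = k + 1, k
        schedule(k - 1)  # new neighbours below the swapped pair
        schedule(k + 1)  # new neighbours above the swapped pair
    return found
```

For the four lines y = 0, y = x, y = -x + 2 and y = 2x - 3 the sweep reports all six crossings, at x = 0, 1, 3/2, 5/3, 2 and 3.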

  26. Duality in 3D. T: a point (a,b,c) → a plane z = ax + by + c. [Figure: primal and dual views.]

  27. Half-space depth in R^d • The depth of a point p is the minimum number of points of a given set S lying in any closed half-space bounded by a hyperplane through p

  28. Collaboration – half-space depth • The depth of a single point can be computed in O(n log n) [Rousseeuw & Ruts 1996]. The lower bound is Ω(n log n) [Aloupis, Cortes, Gomez, Soss, Toussaint 02] • Computing the 2D Tukey median can be done in O(n log^5 n) [Matousek 1991], and was improved to O(n log^3 n) [Langerman, Steiger 03] • Computing all 2D depth contours can be done in O(n^2) time using duality & topological sweep [Miller, Ramaswami, Rousseeuw, Sellares, Souvaine, Streinu, Struyf, 01] • Another approach for computing depth contours uses parallel arrangement construction [Fukuda & Rosta, 02] • Half-space depth contours can be computed for display in 2D using hardware-assisted computation [Krishnan, Mustafa, Venkatasubramanian 02]

  29. Depth Contours

  30. Depth Contours • Nested contours that enclose regions with increasing depth. • First introduced by Tukey as a data visualization tool for two-dimensional data (half-space depth contours) [Tukey 75] • Provide powerful tools to visualize and compare data sets.
