Principles of Database Systems: Fractals and Databases (V. Megalooikonomou)

  1. Principles of Database Systems. V. Megalooikonomou. Fractals and Databases (based on notes by C. Faloutsos at CMU)

  2. Indexing - Detailed outline: fractals; intro; applications

  3. Intro to fractals - outline: Motivation (3 problems / case studies); Definition of fractals and power laws; Solutions to posed problems; More examples and tools; Discussion - putting fractals to work!; Conclusions (practitioner’s guide); Appendix: gory details - boxcounting plots

  4. Problem #1: GIS - points. Road end-points of Montgomery county. Q1: how many disk accesses (d.a.) for an R-tree? Q2: what distribution? Not uniform; not Gaussian; no rules??

  5. Problem #2: spatial data mining (d.m.). Galaxies (Sloan Digital Sky Survey, B. Nichol): ‘spiral’ and ‘elliptical’ galaxies (stores and households ...). Patterns? Attraction/repulsion? How many ‘spi’ within r from an ‘ell’?

  6. Problem #3: traffic. Disk trace (from HP, J. Wilkes); Web traffic. Fit a model: how many explosions to expect? Queue length distribution? [Plot: # bytes vs. time; label: Poisson]

  7. Common answer: fractals / self-similarities / power laws. Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot (Hausdorff, Lyapunov, Ken Wilson, …)

  8. Road map: Motivation (3 problems / case studies); Definition of fractals and power laws; Solutions to posed problems; More examples and tools; Discussion - putting fractals to work!; Conclusions (practitioner’s guide); Appendix: gory details - boxcounting plots

  9. What is a fractal? A self-similar point set, e.g., the Sierpinski triangle: zero area; ... infinite perimeter!

  10. Definitions (cont’d). Paradox: infinite perimeter; zero area! ‘Dimensionality’: between 1 and 2; actually log(3)/log(2) = 1.58...

  11. Definition of fractal dimension (fd), ONLY for a perfectly self-similar point set: for a perfectly self-similar object with n similar pieces, each scaled down by a factor f, fd = log(n)/log(f). Sierpinski triangle (zero area; ... infinite length!): fd = log(3)/log(2) = 1.58

  12. Intrinsic (‘fractal’) dimension. Q: fractal dimension of a line? A: 1 (= log(2)/log(2)!)
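
The definition on slides 11 and 12 can be checked numerically. The following is a minimal Python sketch of fd = log(n)/log(f); the function name `similarity_dimension` is an illustrative choice and does not come from the slides.

```python
import math

def similarity_dimension(n_pieces, scale_factor):
    """fd = log(n) / log(f) for a perfectly self-similar object made of
    n_pieces copies, each scaled down by scale_factor."""
    return math.log(n_pieces) / math.log(scale_factor)

# Sierpinski triangle: 3 copies, each scaled down by a factor of 2.
sierpinski_fd = similarity_dimension(3, 2)
# Line segment: 2 copies, each scaled down by a factor of 2.
line_fd = similarity_dimension(2, 2)

print(f"Sierpinski triangle: {sierpinski_fd:.2f}, line: {line_fd:.2f}")
```

The two values agree with the slides: about 1.58 for the Sierpinski triangle and 1 for a line.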

  13. Intrinsic (‘fractal’) dimension. Q: definition for a given set of points, e.g., (x, y) = (5, 1), (4, 2), (3, 3), (2, 4)?

  14. Intrinsic (‘fractal’) dimension. Q: fractal dimension of a line? A: nn(<= r) ~ r^1, where nn(<= r) is the number of neighbors within distance r (‘power law’: y = x^a). Q: fd of a plane? A: nn(<= r) ~ r^2. fd = slope of (log(nn) vs. log(r))

  15. Intrinsic (‘fractal’) dimension. Algorithm, to estimate it? Notice: avg nn(<= r) is exactly tot # pairs(<= r) / (2*N), including ‘mirror’ pairs

  16. Sierpinski triangle: ‘correlation integral’ = plot of log(# pairs within <= r) vs. log(r); slope = 1.58
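
The pair-count estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the points are generated with the "chaos game" (not mentioned in the slides), the sample size and radii are arbitrary choices, and all function names are hypothetical. Constant factors such as the normalization by N shift the log-log line but do not change its slope, so raw pair counts are used.

```python
import bisect
import math
import random

def sierpinski_points(n, seed=0):
    """Generate n points on the Sierpinski triangle via the 'chaos game'."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.0, 0.0  # start at a corner, which lies on the triangle
    points = []
    for _ in range(n):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2  # jump halfway to a random corner
        points.append((x, y))
    return points

def pair_counts(points, radii):
    """Count unordered pairs within distance <= r, for each r in radii."""
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    return [bisect.bisect_right(dists, r) for r in radii]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

points = sierpinski_points(1500)
radii = [0.02 * 1.5 ** k for k in range(6)]  # roughly 0.02 to 0.15
counts = pair_counts(points, radii)
slope = loglog_slope(radii, counts)
print(f"estimated fractal dimension: {slope:.2f}")
```

With a finite sample the fitted slope only approximates log(3)/log(2); it depends on the range of radii chosen, which is why the slides read the slope off the straight part of the plot.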

  17. Observations: Euclidean objects have integer fractal dimensions (point: 0; lines and smooth curves: 1; smooth surfaces: 2); fractal dimension -> roughness of the periphery

  18. Important properties: fd = embedding dimension -> uniform point set; a point set may have several fd, depending on scale

  19. Road map: Motivation (3 problems / case studies); Definition of fractals and power laws; Solutions to posed problems; More examples and tools; Discussion - putting fractals to work!; Conclusions (practitioner’s guide); Appendix: gory details - boxcounting plots

  20. Problem #1: GIS points. Cross-roads of Montgomery county: any rules?

  21. Solution #1. A: self-similarity <=> fractals <=> scale-free <=> power laws (y = x^a, F = C * r^(-2)); avg # neighbors(<= r) = r^D. [Plot: log(# pairs within <= r) vs. log(r); slope 1.51]

  22. Solution #1. A: self-similarity; avg # neighbors(<= r) ~ r^(1.51). [Plot: log(# pairs within <= r) vs. log(r); slope 1.51]

  23. Examples: MG county. Montgomery County of MD (road end-points)

  24. Examples: LB county. Long Beach county of CA (road end-points)

  25. Solution #2: spatial d.m. Galaxies (‘BOPS’ plot - [sigmod2000]). [Plot: log(# pairs) vs. log(r)]

  26. Solution #2: spatial d.m. [Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] Annotations: 1.8 slope; plateau!; repulsion!

  27. Spatial d.m. [Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] Annotations: 1.8 slope; plateau!; repulsion!

  28. Spatial d.m.: heuristic on choosing # of clusters. [Figure labels: r1, r2]

  29. Spatial d.m. [Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] Annotations: 1.8 slope; plateau!; repulsion!

  30. Spatial d.m. [Plot: log(# pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] Annotations: 1.8 slope; plateau!; repulsion; duplicates!!

  31. Solution #3: traffic. Disk traces: self-similar. [Plot: # bytes vs. time]

  32. Solution #3: traffic. Disk traces (80-20 ‘law’ = ‘multifractal’). [Plot: # bytes vs. time, split 20% / 80%]

  33. Solution #3: traffic. Clarification: a fractal is a set of points that is self-similar; a multifractal is a probability density function that is self-similar. Many other time sequences are bursty/clustered (such as?)

  34. Tape accesses: # tapes needed to retrieve n records? (Also: # days down, due to failures / hurricanes / communication noise...) [Figure: Tape #1 ... Tape #N, over time]

  35. Tape accesses. [Plot: # tapes retrieved vs. # qualifying records; curves: 50-50 = Poisson, and real] [Figure: Tape #1 ... Tape #N, over time]

  36. Road map: Motivation (3 problems / case studies); Definition of fractals and power laws; Solutions to posed problems; More tools and examples; Discussion - putting fractals to work!; Conclusions (practitioner’s guide); Appendix: gory details - boxcounting plots

  37. More tools: Zipf’s law; Korcak’s law / “fat fractals”

  38. A famous power law: Zipf’s law. Q: vocabulary word frequency in a document - any pattern? [Plot: freq. per word, from “aaron” to “zoo”]

  39. A famous power law: Zipf’s law. Bible - rank vs. frequency (log-log). [Plot: log(freq) vs. log(rank); top-ranked words: “the”, “a”]

  40. A famous power law: Zipf’s law. Bible - rank vs. frequency (log-log); similarly in many other languages, for customers and sales volume, city populations, etc. [Plot: log(freq) vs. log(rank)]

  41. A famous power law: Zipf’s law. Zipf distribution: freq = 1/rank; generalized Zipf: freq = 1/(rank)^a. [Plot: log(freq) vs. log(rank)]
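
A rank/frequency plot and its slope can be computed directly from a word list. The Python sketch below is illustrative only: the corpus is synthetic (word i is made to occur about 1000/i times, an assumption chosen so that the expected slope is known), and the function names are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(freqs):
    """Least-squares slope of log(freq) vs. log(rank).
    For a generalized Zipf distribution freq = 1/rank^a the slope is about -a."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic corpus (illustration only): word "w<i>" occurs about 1000/i times.
corpus = [f"w{i}" for i in range(1, 51) for _ in range(round(1000 / i))]
freqs = rank_frequency(corpus)
slope = zipf_slope(freqs)
print(f"fitted log-log slope: {slope:.2f}")
```

On real text the same two functions apply after splitting the document into words; the fitted slope then estimates the exponent a of the generalized Zipf distribution.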

  42. Olympic medals (Sydney). [Plot: log(# medals) vs. rank; linear fit y = -0.9676x + 2.3054, R^2 = 0.9458]

  43. More power laws: areas - Korcak’s law. Scandinavian lakes: any pattern?

  44. More power laws: areas - Korcak’s law. Scandinavian lakes: area vs. complementary cumulative count (log-log axes). [Plot: log(count(>= area)) vs. log(area)]

  45. More power laws: Korcak. Japan islands

  46. More power laws: Korcak. Japan islands: area vs. cumulative count (log-log axes). [Plot: log(count(>= area)) vs. log(area)]

  47. (Korcak’s law: Aegean islands)
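
The quantity plotted for Korcak’s law, count(>= area), is a complementary cumulative count. A minimal Python sketch follows; the function name `ccdf_counts` and the sample areas are hypothetical and serve only to show the computation.

```python
import bisect

def ccdf_counts(values):
    """For each distinct value v (ascending), count how many items are >= v."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for v in sorted(set(ordered)):
        below = bisect.bisect_left(ordered, v)  # number of items strictly below v
        result.append((v, n - below))
    return result

areas = [1, 2, 2, 4, 8]  # made-up areas, illustration only
ccdf = ccdf_counts(areas)
print(ccdf)
```

Plotting the logarithm of both columns against each other gives the log-log CCDF plot described on the slides; a straight line there indicates a power law.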

  48. Korcak’s law & “fat fractals”: how to generate such regions?

  49. Korcak’s law & “fat fractals”. Q: How to generate such regions? A: recursively, from a single region

  50. So far we’ve seen: concepts: fractals, multifractals and fat fractals; tools: correlation integral (= pair-count plot), rank/frequency plot (Zipf’s law), CCDF (Korcak’s law)

  51. Road map: Motivation (3 problems / case studies); Definition of fractals and power laws; Solutions to posed problems; More tools and examples; Discussion - putting fractals to work!; Conclusions (practitioner’s guide); Appendix: gory details - boxcounting plots

  52. Other applications: Internet. What does the Internet look like? [Figure label: CMU]

  53. Other applications: Internet. What does the Internet look like? Internet routers: how many neighbors within h hops? [Figure label: CMU]

  54. (Reminder: our tool-box.) Concepts: fractals, multifractals and fat fractals; tools: correlation integral (= pair-count plot), rank/frequency plot (Zipf’s law), CCDF (Korcak’s law)

  55. Internet topology. Internet routers: how many neighbors within h hops? Reachability function: number of neighbors within r hops, vs. r (log-log). Mbone routers, 1995. [Plot: log(# pairs) vs. log(hops); slope 2.8]
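
The reachability function can be computed with one breadth-first search per node. The Python sketch below assumes an undirected graph stored as an adjacency dictionary; the function name `hop_plot` and the tiny path graph are hypothetical and only illustrate the counting.

```python
from collections import deque

def hop_plot(adj, max_hops):
    """Return, for h = 1..max_hops, the number of ordered node pairs (u, v),
    u != v, whose shortest-path distance is at most h hops."""
    at_distance = [0] * (max_hops + 1)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue  # do not expand beyond the hop limit
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    at_distance[dist[v]] += 1
                    queue.append(v)
    # Turn per-distance counts into cumulative "within h hops" counts.
    totals, running = [], 0
    for h in range(1, max_hops + 1):
        running += at_distance[h]
        totals.append(running)
    return totals

# A path graph 0 - 1 - 2 - 3, for illustration only.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pairs = hop_plot(path, 3)
print(pairs)
```

Plotting log(pairs) against log(h) gives the hop plot; on a real router graph the slope of its straight part plays the role that the fractal dimension plays for point sets.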

  56. More power laws on the Internet: degree vs. rank, for Internet domains (log-log) [sigcomm99]. [Plot: log(degree) vs. log(rank); slope -0.82]

  57. More power laws - Internet: pdf of degrees (slope: -2.2). [Plot: log(count) vs. log(degree)]

  58. Even more power laws on the Internet: scree plot for Internet domains (log-log) [sigcomm99]. [Plot: log(i-th eigenvalue) vs. log(i); annotation: 0.47]

  59. More apps: brain scans. Oct-trees; brain scans. [Plot: log(# octants) vs. octree levels; slope 2.63 = fd]
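
Counting occupied octants per octree level is a form of box counting: at level L the unit cube is cut into cells of side 2^(-L), and the slope of log2(# occupied cells) against L estimates the fractal dimension. The Python sketch below is an assumption-laden illustration, not the procedure used for the brain scans: the function names are hypothetical and the test data is a straight diagonal line, whose dimension should come out as 1.

```python
import math

def box_counts(points, levels):
    """For each level L, count the occupied grid cells of side 2**-L.
    Points are assumed to lie in the unit cube [0, 1)^d."""
    counts = []
    for level in levels:
        cells_per_side = 2 ** level
        occupied = {
            tuple(int(coord * cells_per_side) for coord in p) for p in points
        }
        counts.append(len(occupied))
    return counts

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

levels = list(range(1, 7))
# 4096 points along the main diagonal of the unit cube (a 1-dimensional object).
diagonal = [((i + 0.5) / 4096,) * 3 for i in range(4096)]
counts = box_counts(diagonal, levels)
fd = fit_slope(levels, [math.log2(c) for c in counts])
print(counts, fd)
```

For three-dimensional data each level corresponds to one level of an octree, so the counts are the numbers of non-empty octants per level.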

  60. More apps: medical images [Burdett et al, SPIE ‘93]: benign tumors: fd ~ 2.37; malignant: fd ~ 2.56

  61. More fractals: cardiovascular system: 3 (!); stock prices (LYCOS) - random walks: 1.5 [Plot spans: 1 year, 2 years]; coastlines: 1.2-1.58 (Norway!)
