



  1. Multidimensional (Spatial) Indexing
CS5208 – Spatial Indexing

Motivation
• Many applications of databases manipulate geographical (2-d) data. Others involve large number of dimensions
• Examples:
  – location of restaurants in a city.
  – Map data: zones, county lines, rivers, lakes, etc. (Data has spatial extent)
  – Sales information described by store, day, item, color, size, etc. Sale = point in multidimensional space.
  – Student described by age, zipcode, marital status.

Applications with Multi-Dimensional Data

Types of Queries
• Point queries
• Range Query: “find all McDonald restaurants within a given region”.
• Nearest Neighbor Query: Find the nearest McDonald to my house
• Partial match queries
• Spatial join (“all pairs” queries)
[Figure panels: Point Query, Range Query, NN Query, Spatial Join Query]

Multi-attribute Indexes
• Composite Search Keys: Search on a combination of fields.
  – Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
    • age=12 & sal=75
  – Range query: Some field value is not a constant. E.g.:
    • age=12 & sal > 10 (use <age, sal>)
    • age < 12 & sal = 10 (use <age,sal>; may fetch more records than desired)
• Data entries in index sorted by search key to support range queries.
  – Lexicographic order, or
  – Spatial order.

Examples of composite key indexes using lexicographic order

  Data records sorted by name:
    name  age  sal
    bob   12   10
    cal   11   80
    joe   12   20
    sue   13   75

  Data entries in index, sorted by search key:
    <age, sal>   <age>   <sal, age>   <sal>
    11,80        11      10,12        10
    12,10        12      20,12        20
    12,20        12      75,13        75
    13,75        13      80,11        80

Bitmap Indexes
• Bitmap indices are a special type of index designed for efficient querying on multiple keys
• Records in a relation are assumed to be numbered sequentially
  – Given a number n it must be easy to retrieve record n (Particularly easy if records are of fixed size)
• Applicable on attributes that take on a relatively small number of distinct values
  – E.g. gender, country, state, …
  – E.g. income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity)
• A bitmap is simply an array of bits
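The composite-key queries from the Multi-attribute Indexes slide can be sketched in a few lines. This is only an illustration under stated assumptions: the index is modelled as a sorted Python list of `(age, sal)` tuples (Python compares tuples lexicographically), and the function names `equality`, `age_eq_sal_gt` and `age_lt_sal_eq` are hypothetical, not part of the slides.

```python
from bisect import bisect_left, bisect_right

# Data records from the slide: (name, age, sal).
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# <age, sal> index: data entries sorted lexicographically on (age, sal).
entries = sorted((age, sal, name) for name, age, sal in records)
keys = [(age, sal) for age, sal, _ in entries]

def equality(age, sal):
    """age = a AND sal = s: both fields are constants."""
    lo = bisect_left(keys, (age, sal))
    hi = bisect_right(keys, (age, sal))
    return [name for _, _, name in entries[lo:hi]]

def age_eq_sal_gt(age, sal):
    """age = a AND sal > s: one contiguous slice of the <age, sal> index."""
    lo = bisect_right(keys, (age, sal))
    hi = bisect_right(keys, (age, float("inf")))
    return [name for _, _, name in entries[lo:hi]]

def age_lt_sal_eq(age, sal):
    """age < a AND sal = s: scan every entry with a smaller age, then filter.

    Returns (fetched, matching) to show that the index may fetch more
    entries than the query finally keeps.
    """
    hi = bisect_left(keys, (age, float("-inf")))
    fetched = entries[:hi]
    matching = [name for _, s, name in fetched if s == sal]
    return fetched, matching

print(age_eq_sal_gt(12, 10))
print(age_lt_sal_eq(12, 10))
```

The second query reads a single contiguous range because `age` is the leading field; the third has to read all entries with `age < 12` and discard those whose `sal` differs, which is the "may fetch more records than desired" case.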

  2. Bitmap Indexes (Cont.)
• In its simplest form, a bitmap index on an attribute has a bitmap for each value of the attribute
  – Bitmap has as many bits as records
  – In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise
  – Size = nm bits, where n is the #records and m is the #distinct values

Use of Bitmap Indexes: Example
• Queries are answered using logical (bitwise) operations
  – Intersection (and)
  – Union (or)
  – Complementation (not)
• Intersection and union take two bitmaps of the same size and apply the operation on corresponding bits to get the result bitmap; complementation flips every bit of one bitmap
  – Males with income level L1
    • 10010 AND 10100 = 10000
  • Can then retrieve required tuples
  • Counting number of matching tuples is even faster
• Range queries?
  – Age IN [30,40] AND Salary IN [10k,20k]

Compressed Bitmaps
• If n and m are large, then nm bits may incur high I/O
• Compress the bitmap – run-length encoding
  – A sequence of i 0’s followed by a 1 (run) is represented by some binary encoding of the integer i
  – A number i whose binary value has j bits (j = ⌊log2 i⌋ + 1) is represented by j − 1 1-bits (indicating the number of bits required to represent i) and a single 0, followed by its binary value
    • E.g., 13 = 1101 (binary) is represented as 111 0 1101
    • Exceptions: i = 0 is 00; i = 1 is 01
  – Every run incurs roughly 2 log2 i bits

Compressed Bitmap (Cont.)
• Consider 0000000000000110001
  – The encoded sequence is …
• Now consider 000000010000 (i.e., n = 12)
  • What is the compressed bitmap?
• Decode 110111
  – What about the (missing) 0’s?

Operating on Compressed Bitmap
• Need to decode first, then perform the bitwise operations
• But can be done incrementally
• Suppose we ORed encodings:
  [Animation frames: each encoding is decoded one run at a time (a run of 7 decodes to 00000001) and the OR is emitted as the runs are consumed. Bit strings as extracted: 0 0 1 1 0 1 1 1 1 1 0 1 1 1. OUTPUT so far: 0 0 0 0 0 0 0 1]
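The run-length scheme on the Compressed Bitmaps slides can be sketched as follows. This is an illustration, not the course's reference code: bitmaps are plain strings of '0' and '1', all function names are hypothetical, and the prefix length is assumed to be one less than the number of binary digits of i, which agrees with the slide's 13 → 111 0 1101 example and the two exceptions. Trailing 0's are not encoded, so decompression needs the bitmap length n.

```python
from heapq import merge

def encode_run(i):
    """Encode run length i as (j - 1) ones, a zero, then i in j binary digits."""
    binary = format(i, "b")  # "0" for i = 0 and "1" for i = 1, so j = 1
    return "1" * (len(binary) - 1) + "0" + binary

def compress(bitmap):
    """Encode each run of 0's ended by a 1; trailing 0's are dropped."""
    out, run = [], 0
    for bit in bitmap:
        if bit == "0":
            run += 1
        else:
            out.append(encode_run(run))
            run = 0
    return "".join(out)

def decode_runs(code):
    """Yield run lengths one at a time, without decoding the whole bitmap."""
    pos = 0
    while pos < len(code):
        j = 1
        while code[pos] == "1":  # count the unary prefix
            j += 1
            pos += 1
        pos += 1                 # skip the 0 that ends the prefix
        yield int(code[pos:pos + j], 2)
        pos += j

def decompress(code, n):
    """Rebuild the bitmap and restore the missing trailing 0's up to length n."""
    bits = "".join("0" * i + "1" for i in decode_runs(code))
    return bits.ljust(n, "0")

def one_positions(code):
    """Yield the positions of the 1 bits, in increasing order."""
    pos = -1
    for i in decode_runs(code):
        pos += i + 1
        yield pos

def or_compressed(a, b, n):
    """OR two compressed bitmaps by merging their streams of 1 positions."""
    out = ["0"] * n
    for p in merge(one_positions(a), one_positions(b)):
        out[p] = "1"
    return "".join(out)

print(encode_run(13))  # 11101101, as on the slide
print(compress("000000010000"))
print(or_compressed("00110111", "110111", 12))
```

`or_compressed` consumes both encodings one run at a time, which is one way to read "can be done incrementally"; the two input encodings in the last line are an assumed example and are not claimed to be the ones in the slide's animation.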

  3. Operating on Compressed Bitmap
• Suppose we ORed encodings:
  [Final animation frame: both encodings fully decoded; OUTPUT: 0 0 0 0 0 0 0 1 1]

Why spatial index methods (SAMs)?
• B-tree & hash tables
  – Guarantee the number of I/O operations is respectively logarithmic and constant with respect to the collection’s size
  – Index a collection on a key
  – Rely on a total order on the key domain, the order of natural numbers, or the lexicographic order on strings
• There is no such total order for multidimensional objects and geometric objects with spatial extent
• SAMs were designed to try as much as possible to preserve spatial object proximity

Multidimensional Indexing Structures
• Space-Based structures:
  – Partition the embedding space into rectangular cells
  – Independent from the distribution of the objects
  – Objects are mapped to the cells based on some geometric criterion
  – E.g., Grid file, Buddy-tree, KDB-tree
• Data-Based structures:
  – Organize by partitioning the set of objects based on spatial proximity such that each group can fit into a page
  – Adapt to the objects’ distribution
  – E.g., R-tree, R* tree, R+ tree
• Mapping
  – Transform the data into lower dimensional space
  – E.g., space filling curve

Grid File: A Space-based Approach
• Start with one bucket for the whole space.
• Select dividers along each dimension. Partition space into cells
• Dividers cut all the way

Grid File
• Each cell corresponds to 1 disk page.
• Many cells can point to the same page.
• Cell directory potentially exponential in the number of dimensions

Grid File Implementation
• Dynamic structure using a grid directory
  – Grid array: a 2 dimensional array with pointers to buckets (this array can be large, disk resident) G(0, …, nx-1, 0, …, ny-1)
  – Linear scales: Two 1 dimensional arrays that are used to access the grid array (main memory) X(0, …, nx-1), Y(0, …, ny-1)
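The grid directory described above can be sketched as a small in-memory structure. This is a minimal sketch under several assumptions: the class name `GridFile` and its methods are hypothetical, the linear scales are stored as sorted lists of divider positions, buckets are plain Python lists standing in for disk pages, and the dynamic part (splitting a bucket or adding a divider on overflow) is left out.

```python
from bisect import bisect_right

class GridFile:
    """Static two-dimensional grid file: linear scales plus a grid array."""

    def __init__(self, x_dividers, y_dividers, grid):
        # Linear scales X and Y: sorted divider positions along each dimension.
        self.x_scale = list(x_dividers)
        self.y_scale = list(y_dividers)
        # Grid array G: grid[ix][iy] holds a bucket id; cells may share a bucket.
        self.grid = grid
        self.buckets = {bid: [] for row in grid for bid in row}

    def cell(self, x, y):
        """Map a point to its cell using only the linear scales."""
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def insert(self, x, y, payload):
        ix, iy = self.cell(x, y)
        self.buckets[self.grid[ix][iy]].append((x, y, payload))

    def point_query(self, x, y):
        ix, iy = self.cell(x, y)
        bucket = self.buckets[self.grid[ix][iy]]
        return [p for (px, py, p) in bucket if (px, py) == (x, y)]

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        ix_lo, iy_lo = self.cell(x_lo, y_lo)
        ix_hi, iy_hi = self.cell(x_hi, y_hi)
        seen, result = set(), []
        for ix in range(ix_lo, ix_hi + 1):
            for iy in range(iy_lo, iy_hi + 1):
                bid = self.grid[ix][iy]
                if bid in seen:  # several cells can point to the same page
                    continue
                seen.add(bid)
                result.extend(
                    p for (px, py, p) in self.buckets[bid]
                    if x_lo <= px <= x_hi and y_lo <= py <= y_hi
                )
        return result

# Dividers at x = 10, 20 and y = 5 give a 3 x 2 grid; buckets 0 and 3 are shared.
gf = GridFile([10, 20], [5], [[0, 0], [1, 2], [3, 3]])
for x, y, name in [(3, 2, "a"), (4, 8, "b"), (12, 7, "c"), (25, 1, "d")]:
    gf.insert(x, y, name)
print(gf.point_query(4, 8))
print(gf.range_query(0, 15, 0, 10))
```

A point lookup touches the two scales and exactly one bucket, while a range query visits every overlapping cell but reads each shared bucket only once.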
