2. Management of large objects


  1. Management of large objects
     - LOB = Large OBject
     - A 'normal' DBMS regards a LOB as one field with no internal structure.
     - Traditional business-oriented relational DBMSs: maximum field length e.g. 255 or 32767 bytes.
     - Media objects are usually considerably larger.
     - Today's relational DBMSs support field lengths of several Gbytes, but:
       - It is wasteful to access the whole object if only a piece is needed.
       - The long object may not fit in main memory.
       - Piecewise processing should be supported.
       - The logical structure is handled by higher-level software.
       - A log file is needed for recovery from errors. Logging a whole object is very inefficient if only a small part of it is affected.
       - Secondary storage management should be more flexible: multiple page sizes or multiple-size clusters of pages would improve I/O for variable-length objects.

     MMDB-2, J. Teuhola, 2012

  2. SQL and long fields
     - Long data types:
       - Character large object (CLOB), content e.g. HTML, XML
       - Binary large object (BLOB), a sequence of 8-bit octets, content e.g. MP3 or JPEG
       - External, read-only file (BFILE), content e.g. AVI, MPEG
     - Operations:
       - Concatenation
       - Substring (from a start position for a given length)
       - Overlay (substring replacement)
       - Trim (remove given leading/trailing characters)
       - Length (function returning the number of characters)
       - Position (start position of searched substring)
     - But: not GROUP BY, ORDER BY, join, set operations, etc.
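
The operations listed above can be tried out on ordinary text values. The following is a minimal sketch using Python's built-in `sqlite3` module as a stand-in; SQLite is not mentioned in the slides, its spellings (`substr`, `instr`, `||`) differ from the SQL-standard ones, and overlay is emulated here with `substr` plus concatenation. The table `doc`, its column `body` and the function `lob_ops` are hypothetical names.

```python
import sqlite3

def lob_ops(text, start, length, patch, needle):
    """Apply the listed operations to `text` inside SQLite (positions are 1-based)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE doc (body TEXT)")   # TEXT plays the role of a CLOB here
    con.execute("INSERT INTO doc VALUES (?)", (text,))
    row = con.execute(
        """
        SELECT body || :patch,
               substr(body, :start, :len),
               substr(body, 1, :start - 1) || :patch || substr(body, :start + :len),
               trim(body),
               length(body),
               instr(body, :needle)
        FROM doc
        """,
        {"patch": patch, "start": start, "len": length, "needle": needle},
    ).fetchone()
    con.close()
    names = ("concat", "substring", "overlay", "trim", "length", "position")
    return dict(zip(names, row))

result = lob_ops("<p>Hello</p>", 4, 5, "World", "Hello")
```

Here `result["overlay"]` replaces the five characters starting at position 4 with the patch text, which is the substring-replacement behaviour the slide calls overlay.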

  3. Tree-structured representation
     - B-tree-type multi-level directory: used e.g. in SQL Server, Oracle, …
     - Example architecture: EXODUS storage system (extensible OODBMS)
     - Very flexible management of large objects that can grow and shrink at arbitrary positions.
     - Not optimized for sequential processing speed (best for long text documents).
     - Each object has a unique OID = <page no, slot no>.
     - Two kinds of objects:
       1. Small objects fit in one page.
       2. Large objects occupy multiple pages; the OID points to the header.
     - Two kinds of pages:
       1. Slotted pages contain small objects and headers of large objects.
       2. Other pages contain parts of large objects, each page being private to one object only.
     - When a small object grows larger than a page, it is automatically converted into a large object.

  4. Page allocation schematically
     [Figure: slotted pages hold small objects, the headers of LOB x and LOB y, and free space; the LOB pages are shown separately as "Pages of LOB x" and "Pages of LOB y".]

  5. Tree-structured representation (cont.)
     - Physical representation: B+-tree, indexed on byte positions within the object.
     - The root is a header for the large object.
     - Internal nodes: a <count, pointer> pair for each child.
       - Count is the highest byte number covered by that child, relative to the start of the subtree rooted at the current node (counts are cumulative offsets within the subtree).
       - Pointer is a page id (address).
       - The count of the rightmost child is the size of the (sub)tree rooted at the current node.
       - The number of <count, pointer> pairs in a node is between k and 2k+1 (i.e. nodes are at least about half-full), where degree k is the B+-tree parameter.
       - Internal nodes occupy one page each.
     - Leaves are blocks of one or more pages (system parameter). Leaf blocks contain nothing but actual data. Leaves, too, can vary from half-full to full.

  6. Tree-structured representation: Example
     [Figure: example tree. The root, reached via the OID, holds counts 421 and 786. Its left child holds counts 120, 282, 421 and points to leaves of 120, 162 and 139 bytes; its right child holds counts 192, 365 and points to leaves of 192 and 173 bytes.]
     - Maximal object sizes for 4-Kbyte pages, 4-byte pointers, 4-byte counts and 4-page leaf blocks:
       - 2-level tree: 8 Mbytes
       - 3-level tree: 4 Gbytes
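
The maximal sizes quoted above can be checked with a little arithmetic. The sketch below assumes that a page holds nothing but <count, pointer> pairs (no page header overhead) and that "levels" counts the leaf level as well; the function name and parameters are hypothetical.

```python
def max_object_size(levels, page_size=4096, pointer_size=4, count_size=4, leaf_pages=4):
    """Upper bound in bytes for a tree with `levels` levels, the leaf level included."""
    fanout = page_size // (pointer_size + count_size)   # pairs per internal node: 512
    leaf_bytes = leaf_pages * page_size                 # bytes per leaf block: 16 Kbytes
    return fanout ** (levels - 1) * leaf_bytes

two_level = max_object_size(2)     # 512 * 16 Kbytes
three_level = max_object_size(3)   # 512 * 512 * 16 Kbytes
```

With these assumptions the 2-level value is 2^23 bytes (8 Mbytes) and the 3-level value is 2^32 bytes (4 Gbytes), matching the figures on the slide.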

  7. Tree-structured representation (cont.)
     Notations:
     - Counts c[i], pointers p[i], 1 ≤ i ≤ 2k+1.
     - For convenience, c[0] = 0.

     Retrieval algorithm: get a sequence of N bytes, starting at S.

         begin
           Read the root page P. Let start = S.
           while P is a non-leaf node do
             Save P to a stack.
             Find the smallest c[i] such that start ≤ c[i].    // e.g. binary search
             Set start := start − c[i−1].                       // relative start index
             Read p[i] as the new page P.
           The first desired byte is at location start in P.    // now in a leaf
           For the rest of the bytes, walk the tree in depth-first order using the stack.
         end
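
The retrieval algorithm can be sketched in Python as follows. This is an in-memory illustration under assumptions of my own, not the EXODUS implementation: leaves are plain `bytes` objects, an internal node is a hypothetical `Node` class holding cumulative counts, and "reading a page" is just following a Python reference. Byte positions are 1-based, as in the pseudocode.

```python
from bisect import bisect_left

def size(node):
    """Number of bytes stored under a leaf (bytes) or an internal node."""
    return len(node) if isinstance(node, bytes) else node.counts[-1]

class Node:
    """Internal node: one cumulative count per child pointer."""
    def __init__(self, children):
        self.children = children
        self.counts, total = [], 0
        for child in children:
            total += size(child)
            self.counts.append(total)

def read(root, s, n):
    """Return up to n bytes starting at 1-based position s (s must lie inside the object)."""
    stack, node, start = [], root, s
    while not isinstance(node, bytes):
        i = bisect_left(node.counts, start)        # smallest i with start <= c[i]
        start -= node.counts[i - 1] if i else 0    # relative start; c[0] = 0 in the slides
        stack.append((node, i))
        node = node.children[i]
    out = bytearray(node[start - 1:start - 1 + n])
    # Depth-first walk to the following leaves, driven by the saved path.
    while len(out) < n and stack:
        parent, i = stack.pop()
        if i + 1 < len(parent.children):
            stack.append((parent, i + 1))
            node = parent.children[i + 1]
            while not isinstance(node, bytes):     # descend to the leftmost leaf
                stack.append((node, 0))
                node = node.children[0]
            out += node[:n - len(out)]
    return bytes(out)

def build_example(data):
    """Tree with leaf sizes 120, 162, 139 | 192, 173, as in the slide example."""
    cuts = [0, 120, 282, 421, 613, 786]
    leaves = [data[a:b] for a, b in zip(cuts, cuts[1:])]
    return Node([Node(leaves[:3]), Node(leaves[3:])])
```

Building the example tree from 786 bytes of data gives root counts 421 and 786, and `read(root, 400, 50)` crosses from the third leaf into the fourth via the stack.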

  8. Tree-structured representation (cont.)
     Insert algorithm: add a sequence of N bytes after position S.

         begin
           Search for byte position S, as above, but on the path down, update the byte counts
           to reflect the insertion and save the path in a stack.
           Denote the reached leaf by L.
           if N bytes fit in L then
             do the insert within L
           else
             Allocate a sufficient number of new leaves, and distribute L's old bytes and
             the N new bytes evenly among the leaves.
             Propagate the new counts and pointers upwards (use the stack).
             If an internal node overflows, it is handled in a similar way as the leaf overflow.
         end

     Note: Space utilization can be improved by inspecting the left and right neighbours of the found leaf and using the available free space.
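
A simplified sketch of the insert algorithm follows. It departs from the pseudocode in several labelled ways: it recurses instead of keeping an explicit stack, it recomputes counts on the way back up rather than updating them on the way down, and it does not inspect neighbouring leaves. `LEAF_CAP` and `MAX_PAIRS` are tiny hypothetical capacities chosen so that overflows are easy to trigger; all names are assumptions.

```python
LEAF_CAP = 8    # bytes per leaf block (hypothetical, tiny for illustration)
MAX_PAIRS = 4   # max <count, pointer> pairs per internal node (hypothetical)

def size(node):
    return len(node) if isinstance(node, bytes) else node.counts[-1]

class Node:
    def __init__(self, children):
        self.children = children
        self.recount()

    def recount(self):
        """Rebuild the cumulative counts from the children."""
        self.counts, total = [], 0
        for child in self.children:
            total += size(child)
            self.counts.append(total)

def split_evenly(items, cap):
    """Distribute items over the fewest pieces of at most `cap` items, evenly."""
    pieces = max(1, -(-len(items) // cap))          # ceiling division
    base, extra = divmod(len(items), pieces)
    out, pos = [], 0
    for k in range(pieces):
        step = base + (1 if k < extra else 0)
        out.append(items[pos:pos + step])
        pos += step
    return out

def _insert(node, s, data):
    """Insert data after position s; return the node(s) that replace `node`."""
    if isinstance(node, bytes):                     # leaf: insert in place or split
        return split_evenly(node[:s] + data + node[s:], LEAF_CAP)
    i = next(k for k, c in enumerate(node.counts) if s <= c)
    rel = s - (node.counts[i - 1] if i else 0)
    node.children[i:i + 1] = _insert(node.children[i], rel, data)
    if len(node.children) <= MAX_PAIRS:
        node.recount()
        return [node]
    # Internal overflow is handled like leaf overflow: split the pairs evenly.
    return [Node(group) for group in split_evenly(node.children, MAX_PAIRS)]

def insert(root, s, data):
    """Insert data after position s (0 = front); return the possibly new root."""
    parts = _insert(root, s, data)
    while len(parts) > 1:                           # grow the tree by one level
        parts = [Node(g) for g in split_evenly(parts, MAX_PAIRS)]
    return parts[0]

def content(node):
    """Concatenate all leaf bytes (for checking only)."""
    return node if isinstance(node, bytes) else b"".join(content(c) for c in node.children)
```

Starting from an empty leaf (`b""`) and inserting a 35-byte string produces five 7-byte leaves under two internal nodes and a new root; a later small insert splits only the affected leaf.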

  9. Tree-structured representation (cont.)
     Append algorithm: add N bytes to the end of an object (a special case of insert).

         begin
           Walk the rightmost path of the tree, add N to the counts, and save the path in a stack.
           if the rightmost leaf R has N free bytes then
             do the appending there, and stop
           else
             Access R's left neighbour L.
             Allocate as many new leaves as required to accommodate L's and R's bytes
             plus the N new ones.
             Fill all but the last two leaves completely, and the last two evenly
             (both become at least half-full).
             Propagate the counts and pointers upwards, using the stack.
             Handle internal node overflows as in insert.
         end

     Note: The advantage of this special insert is that it allows large objects to be built in pieces. The next piece fills the last two non-full leaves.
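
The leaf-filling rule of the append algorithm (all but the last two leaves full, the last two even) can be sketched on its own. This covers only the redistribution step in the `else` branch; walking the rightmost path and propagating counts are left out. The function names and the `cap` parameter (leaf capacity in bytes) are hypothetical.

```python
def append_layout(total_bytes, cap):
    """Leaf sizes when total_bytes are spread over leaves of capacity cap."""
    if total_bytes <= cap:
        return [total_bytes]
    leaves = -(-total_bytes // cap)            # ceiling division, at least 2 here
    full = leaves - 2                          # these leaves are filled completely
    rest = total_bytes - full * cap            # cap < rest <= 2 * cap
    return [cap] * full + [rest - rest // 2, rest // 2]

def append_leaves(left, right, new, cap):
    """Redistribute L's and R's bytes plus the new bytes into fresh leaves."""
    data = left + right + new
    out, pos = [], 0
    for n in append_layout(len(data), cap):
        out.append(data[pos:pos + n])
        pos += n
    return out
```

For example, 35 bytes with a leaf capacity of 8 give leaves of 8, 8, 8, 6 and 5 bytes, so the two trailing leaves stay at least half-full and leave room for the next appended piece.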

  10. Tree-structured representation: Observations
      The organization is quite effective in practice:
      - Storage utilization is 70% for simple and 80% for advanced insertion.
      - The complexity of locating the correct position is theoretically O(log N), in practice almost constant.
      - Access speed is some tens of milliseconds, depending on disk speed and buffering. Not the best choice for streaming media.

      Extension: versioning of large objects
      - Common parts of different versions can be shared.
      - Updates must not invalidate old versions: nodes on the update path must be copied before they are changed.
      - Old versions are not updated, but deletion should be allowed: avoid deleting nodes shared by other versions. An expensive way: mark the nodes of all other versions, and then discard the unmarked ones.
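
The versioning idea, copying only the nodes on the update path and sharing everything else, can be illustrated with a small path-copying sketch. It is a simplification under my own assumptions: nodes carry no counts, leaves are `bytes`, and the update path is given as a list of child indexes. `Node` and `update_leaf` are hypothetical names.

```python
class Node:
    """Immutable internal node; children are never modified after creation."""
    def __init__(self, children):
        self.children = tuple(children)

def update_leaf(node, path, new_leaf):
    """Return the root of a new version in which the leaf at `path` is replaced.

    Only the nodes along `path` are copied; all other subtrees are shared
    with the old version, which stays valid.
    """
    if not path:
        return new_leaf
    i = path[0]
    kids = list(node.children)
    kids[i] = update_leaf(kids[i], path[1:], new_leaf)
    return Node(kids)

v1 = Node([Node([b"a", b"b"]), Node([b"c"])])
v2 = update_leaf(v1, [0, 1], b"B")
```

After the update, `v1` still sees `b"b"`, while `v2` sees `b"B"` and shares the untouched right subtree with `v1`. This sharing is also why deleting a version must not free nodes that another version still references.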

  11. Advanced 2-level representation
      - Example architecture: Starburst long field manager (experimental DBMS, developed at an IBM research center).
      - Suggests an elegant and extremely fast 2-level scheme for long fields.
      - Key idea: build the field by allocating variable-size (with size units on an exponential scale), physically contiguous disk extents.
      - Not arbitrary sizes, nor arbitrary starting points.

      Buddy system:
      - In a buddy space of $2^n$ pages, buddy segments can be allocated so that a segment of size $2^k$ can start at address $0, 2^k, 2 \times 2^k, 3 \times 2^k, \ldots$
      - Two same-sized ($2^k$) consecutive segments are buddies if their concatenation is a legal buddy segment of size $2^{k+1}$.
      - The address of a segment XORed with its size gives the address of its buddy.
      - Advantage: shorter pointers, because the repertoire of segment sizes is restricted.
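
The XOR rule for finding a buddy is easy to check in code. The sketch below measures addresses and sizes in pages; the function names are hypothetical and the validation rules simply restate the alignment constraints above.

```python
def buddy_of(addr, size):
    """Address of the buddy of the segment of `size` pages starting at `addr`."""
    if size <= 0 or size & (size - 1):
        raise ValueError("segment size must be a power of two")
    if addr % size:
        raise ValueError("a segment must start at a multiple of its size")
    return addr ^ size

def merge_with_buddy(addr, size):
    """Start address and size of the segment formed by a segment and its buddy."""
    return min(addr, buddy_of(addr, size)), 2 * size
```

For instance, the 4-page segment at address 8 has its buddy at address 12 and vice versa; merging them yields the legal 8-page segment at address 8. The segments at 4 and 8 are adjacent but not buddies, since an 8-page segment cannot start at address 4.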

  12. Memory architecture in Starburst
      - The whole external memory is divided into database spaces, which may correspond to e.g. separate disks.
      - Each database space contains an array of buddy spaces.
      - A buddy space consists of:
        - an allocation page (a specially coded segment index)
        - $2^n$ data pages with buddies marked; the figure labels the page addresses $2^{n-2}$, $5 \cdot 2^{n-4}$, $3 \cdot 2^{n-3}$, $2^{n-1}$ and $2^n$.
      - Fragmentation (the normal problem of a buddy system) is partially avoided because the long field can be built from several segments.
      - For any long field, less than one disk page is lost due to fragmentation.

  13. Long field descriptor in Starburst
      - The descriptor is a directory to the field components.
      - The descriptor size is at most 255 bytes, and it is stored in the record to which the long field logically belongs.
      - The descriptor components:
        - Database space id
        - Field size
        - Number of buddy segments
        - Sizes of the first and last segment
        - Pointers (= offsets) to the buddy segments
      - The key to keeping the field descriptor small is to have exponentially growing segment sizes.
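
Why exponentially growing segments keep the descriptor small can be seen from a quick calculation. The sketch below assumes a concrete growth pattern that the slides do not spell out: each segment is twice as large as the previous one, up to a hypothetical maximum segment size, and the last segment is trimmed to the pages actually needed. The function name and the default values are illustrative only.

```python
def segment_sizes(field_pages, first=1, max_segment=2048):
    """Segment sizes in pages for a long field of field_pages pages."""
    sizes, next_size, left = [], first, field_pages
    while left > 0:
        take = min(next_size, left)     # the last segment is trimmed
        sizes.append(take)
        left -= take
        next_size = min(2 * next_size, max_segment)   # assumed doubling, capped
    return sizes

sizes_100 = segment_sizes(100)
```

Under these assumptions a 100-page field needs only seven segments, so seven pointers suffice. With the first and last sizes stored and the intermediate sizes implied by the doubling, the pointer list grows roughly logarithmically with the field size until the maximum segment size is reached.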

  14. Descriptor usage: schematic example
      'Person' table:

          PID    Name   Addr    Photo
          12345  Smith  NYC
          23456  Jones  Dallas
          11223  Blake  Miami
          33211  Brown  LA
          54321  Clark  Denver

      [Figure: the Photo field of a record holds the LOB descriptor (max 255 bytes), which points to the segments storing the photo.]
