Storage and Indexing
11/19/2018 1
Overview
We covered storage of unstructured files in HDFS:
- Partition into blocks
- Replicate to data nodes
This lecture will cover the storage of structured and semi-structured data:
- Row vs. column formats
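As a refresher on the HDFS recap, here is a minimal sketch of its two steps: partitioning a file into fixed-size blocks and replicating each block to several data nodes. The names `Block`, `split_into_blocks` and `place_replicas` are hypothetical. The 4-byte block size only keeps the output small (HDFS uses far larger blocks, 128 MB by default in recent versions). The round-robin placement is a simplification that stands in for HDFS's rack-aware placement policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    data: bytes
    replicas: list[str] = field(default_factory=list)  # data nodes holding a copy


def split_into_blocks(data: bytes, block_size: int) -> list[Block]:
    """Partition a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [
        Block(i, data[offset:offset + block_size])
        for i, offset in enumerate(range(0, len(data), block_size))
    ]


def place_replicas(blocks: list[Block], data_nodes: list[str], replication: int = 3) -> list[Block]:
    """Assign each block to `replication` distinct data nodes, round-robin (a simplification)."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    for b in blocks:
        b.replicas = [data_nodes[(b.block_id + k) % len(data_nodes)] for k in range(replication)]
    return blocks


blocks = place_replicas(split_into_blocks(b"abcdefghij", 4), ["dn1", "dn2", "dn3", "dn4"])
for b in blocks:
    print(b.block_id, b.data, b.replicas)
```

Because consecutive blocks start at consecutive nodes, the copies spread evenly, and losing any single node still leaves two replicas of every block.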
Field 1 | Field 2 | Field 3 | …
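In a row format, the fields of one record are stored next to each other, and records follow one another. Below is a minimal sketch using Python's `struct` module. It assumes fixed-width fields, a hypothetical layout chosen because it makes record offsets computable; real row formats usually also support variable-length fields.

```python
import struct

# Hypothetical fixed-width layout: int32 ID, 8-byte Name, 16-byte Email
# ("<" = little-endian, no alignment padding). Longer strings are silently truncated.
RECORD = struct.Struct("<i8s16s")


def write_rows(records):
    """Row layout: every field of record 0, then every field of record 1, and so on."""
    return b"".join(RECORD.pack(rid, name.encode(), email.encode()) for rid, name, email in records)


def read_row(buf, k):
    """Fixed-width records let us seek straight to record k at byte k * RECORD.size."""
    rid, name, email = RECORD.unpack_from(buf, k * RECORD.size)
    # struct pads short strings with NUL bytes; strip them when decoding.
    return rid, name.rstrip(b"\0").decode(), email.rstrip(b"\0").decode()


buf = write_rows([(1564, "Paul", "paul@gmail.com"), (1567, "Xu", "xu@163.com")])
print(RECORD.size, len(buf), read_row(buf, 1))
```

Reading a whole record is one contiguous read. Reading only the Email of every record, however, still has to step over the ID and Name bytes of each row.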
Schema: Name:type | Name:type | Name:type
Record: Value | Value | Value
Self-describing record: Name:type:value | Name:type:value | Name:type:value
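The two encodings differ in where field names and types live. With a schema, they are stored once and each record carries only values. In a self-describing record, every value carries its own Name:type tag. Below is a minimal sketch with hypothetical function names. The colon-separated strings are chosen only for readability; real formats use compact binary or text encodings.

```python
# Hypothetical schema matching the ID/Name/Email example: (field name, type name) pairs.
SCHEMA = [("ID", "int"), ("Name", "string"), ("Email", "string")]
TYPE_NAMES = {int: "int", str: "string"}


def encode_with_schema(record):
    """Schema stored once, separately; the record holds only values, in schema order.
    A missing field still needs a placeholder (None) to keep positions aligned."""
    return [record.get(name) for name, _ in SCHEMA]


def decode_with_schema(values):
    """Field names come from the shared schema, not from the record itself."""
    return {name: value for (name, _), value in zip(SCHEMA, values)}


def encode_self_describing(record):
    """Each value carries its own name and type (Name:type:value), so no shared
    schema is needed and absent fields can simply be omitted."""
    return [f"{name}:{TYPE_NAMES[type(value)]}:{value}" for name, value in record.items()]


print(encode_with_schema({"ID": 1568, "Name": "Jyeshta"}))
print(encode_self_describing({"ID": 1568, "Name": "Jyeshta"}))
```

The trade-off is space versus flexibility. The self-describing form repeats names and types in every record, but it tolerates records with different fields, which suits semi-structured data.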
ID:int | 1564, 1567, 1568, 1569, 1572, …
Name:string | Paul, Xu, Jyeshta, Nora, Alex, …
Email:string | paul@gmail.com, xu@163.com, nil, alex@live.com, nil
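The listing stores the table column by column: one sequence per attribute, with position i in each sequence belonging to the same record. Below is a minimal in-memory sketch that converts rows into this layout. It uses the first three IDs, names and emails from the example, paired by position, and Python's `None` plays the role of nil. The names `to_columns` and `row_at` are hypothetical.

```python
def to_columns(rows, names):
    """Column layout: one list per attribute; position i in every list belongs to row i."""
    return {name: [row[j] for row in rows] for j, name in enumerate(names)}


def row_at(columns, i):
    """Reassembling one row touches every column at the same position."""
    return {name: values[i] for name, values in columns.items()}


rows = [(1564, "Paul", "paul@gmail.com"), (1567, "Xu", "xu@163.com"), (1568, "Jyeshta", None)]
cols = to_columns(rows, ["ID", "Name", "Email"])

# A query over one attribute scans only that column.
known_emails = sum(email is not None for email in cols["Email"])
print(cols["Email"], known_emails, row_at(cols, 1))
```

The two functions show the design trade-off. Scanning one attribute reads one contiguous list, while rebuilding a full row needs a lookup in every column.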
ID | Name and ID | Email
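The ID | Name and ID | Email pairs read as one table split into two column groups, both of which repeat the ID so that the original rows can be recovered by joining on it. Below is a minimal sketch under that reading. The function names are hypothetical, and a real system would perform the join over stored files rather than in a Python dictionary.

```python
def vertical_partition(rows, groups, key="ID"):
    """Split each row into column groups; every group keeps the key so parts can be rejoined."""
    parts = []
    for group in groups:
        cols = [key] + [c for c in group if c != key]
        parts.append([{c: row[c] for c in cols} for row in rows])
    return parts


def rejoin(parts, key="ID"):
    """Reconstruct full rows by matching records on the shared key."""
    merged = {}
    for part in parts:
        for rec in part:
            merged.setdefault(rec[key], {}).update(rec)
    return list(merged.values())


rows = [
    {"ID": 1564, "Name": "Paul", "Email": "paul@gmail.com"},
    {"ID": 1567, "Name": "Xu", "Email": "xu@163.com"},
]
names, emails = vertical_partition(rows, [["Name"], ["ID", "Email"]])
print(names)
print(emails)
print(rejoin([names, emails]) == rows)
```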
Big Data → Global Index (a.k.a. Partitioning) → Local Index | Local Index | Local Index | Local Index | Local Index
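In the two-level scheme, the global index decides which partition a record belongs to, and each partition keeps its own local index. Below is a minimal sketch that assumes range partitioning on a numeric key, with hypothetical boundary values, and uses a sorted key list (searched with `bisect`) as each local index. Hash partitioning, or a B-tree or R-tree per partition, would fit the same two-level shape. The class name is hypothetical.

```python
import bisect


class TwoLevelIndex:
    """Global index: range partitioning on the key. Local index: sorted keys per partition."""

    def __init__(self, records, boundaries):
        # boundaries[i] is the smallest key that goes to partition i + 1 (hypothetical scheme).
        self.boundaries = sorted(boundaries)
        self.partitions = [[] for _ in range(len(self.boundaries) + 1)]
        for key, value in records:
            self.partitions[self._route(key)].append((key, value))
        # Build each local index by sorting its partition on the key.
        for part in self.partitions:
            part.sort(key=lambda kv: kv[0])
        self.local_keys = [[k for k, _ in part] for part in self.partitions]

    def _route(self, key):
        """Global index lookup: which partition may contain `key`?"""
        return bisect.bisect_right(self.boundaries, key)

    def lookup(self, key):
        pid = self._route(key)
        keys = self.local_keys[pid]
        i = bisect.bisect_left(keys, key)  # local index lookup inside one partition
        if i < len(keys) and keys[i] == key:
            return self.partitions[pid][i][1]
        return None

    def range_query(self, lo, hi):
        """Visit only partitions whose key range can overlap [lo, hi]."""
        out = []
        for pid in range(self._route(lo), self._route(hi) + 1):
            keys = self.local_keys[pid]
            start, end = bisect.bisect_left(keys, lo), bisect.bisect_right(keys, hi)
            out.extend(self.partitions[pid][start:end])
        return out


people = [(1564, "Paul"), (1567, "Xu"), (1568, "Jyeshta"), (1569, "Nora"), (1572, "Alex")]
idx = TwoLevelIndex(people, boundaries=[1567, 1569])
print(idx.lookup(1568), idx.range_query(1566, 1569))
```

A point lookup touches one partition, and a range query skips every partition whose range falls outside the query. This pruning is why the global level is worth having even when each partition is small.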
New records → Memory component (Master Node) → Flushed → Disk components (Slave Node) | Disk components (Slave Node) | Disk components (Slave Node) | …
Disk components → Compact and merge (e.g., external merge sort)
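The figure shows new records buffered in a memory component, flushed to disk components, and periodically compacted and merged. This is the pattern usually called a log-structured merge (LSM) design. Below is a minimal single-process sketch under stated assumptions: the master/slave distribution is collapsed into one object, the class name and flush threshold are hypothetical, deletes are not modeled, and `heapq.merge` performs the k-way merge step that external merge sort would run over on-disk files.

```python
import bisect
import heapq


class LSMStore:
    """Single-process sketch: one in-memory component plus immutable sorted disk components."""

    def __init__(self, memory_limit=4):
        self.memory_limit = memory_limit  # hypothetical flush threshold (number of keys)
        self.memory = {}                  # memory component: key -> value
        self.disk = []                    # disk components: sorted (key, value) lists, oldest first

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.memory_limit:
            self.flush()

    def flush(self):
        """Write the memory component out as a new sorted disk component."""
        if self.memory:
            self.disk.append(sorted(self.memory.items()))
            self.memory = {}

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        for run in reversed(self.disk):  # newest component wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all disk components into one, like the merge phase of external merge sort."""
        # Tag entries with -age so that, for equal keys, the newest component sorts first.
        tagged = [[(k, -age, v) for k, v in run] for age, run in enumerate(self.disk)]
        merged = []
        for k, _, v in heapq.merge(*tagged):
            if not merged or merged[-1][0] != k:  # keep only the newest version of each key
                merged.append((k, v))
        self.disk = [merged] if merged else []


store = LSMStore(memory_limit=2)
for k, v in [(1, "a"), (2, "b"), (1, "a2"), (3, "c"), (4, "d")]:
    store.put(k, v)
print(len(store.disk), store.get(1))
store.compact()
print(store.disk)
```

Writes only ever append sorted, immutable components, so they are cheap sequential I/O. The cost moves to reads, which may check several components, and to compaction, which restores a single sorted run.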