
Dremel: Interactive Analysis of Web-Scale Datasets (presented by Jian Fang)



  1. Dremel: Interactive Analysis of Web-Scale Datasets
     Presented by Jian Fang
     Most parts of these slides are adapted from: http://bit.ly/HIPzeG

  2. What is Dremel
     - Trillion-record, multi-terabyte datasets at interactive speed
     - Scales to thousands of nodes
     - Fault- and straggler-tolerant execution
     - Nested data model
     - Complex datasets; normalization is prohibitive
     - Columnar storage and processing
     - Tree architecture (as in web search)
     - Interoperates with Google's data management tools
     - In situ data access (e.g., GFS, Bigtable)
     - MapReduce pipelines

  3. Widely used inside Google
     - Analysis of crawled web documents
     - Tracking install data for applications on Android Market
     - Spam analysis
     - Results of tests run on Google's distributed build system
     - Disk I/O statistics for hundreds of thousands of disks
     - ……

  4. Outline
     - Nested columnar storage
     - Query processing
     - Experiments
     - Observations

  5. Common Storage Layer
     - Google File System
     - Fault tolerance
     - Fast response time
     - Data can be manipulated easily

  6. Rows vs Columns
     [Figure: the nested record r1 (DocId, Links.Forward, Name.Language.Code/Country,
      Name.Url) shown stored row-wise (records r1, r2, ... kept whole) and column-wise
      (each field stored contiguously across records).]
     - Read less, cheaper decompression
     - Challenge: preserve structure, reconstruct from a subset of fields
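
   Aside: a minimal Python sketch of the row/column trade-off above. The field names
   and values are chosen for illustration and are not from the slides; the point is
   that a query over one field touches only that field's column in the columnar layout.

     # The same three records stored row-wise and column-wise.
     rows = [
         {"DocId": 10, "Url": "http://A", "Code": "en-us"},
         {"DocId": 20, "Url": "http://B", "Code": "en"},
         {"DocId": 30, "Url": "http://C", "Code": "en-gb"},
     ]
     columns = {
         "DocId": [10, 20, 30],
         "Url": ["http://A", "http://B", "http://C"],
         "Code": ["en-us", "en", "en-gb"],
     }

     # Row store: every record is read (and decompressed) even when one field is needed.
     print(sum(r["DocId"] for r in rows))

     # Column store: only the requested column is read; similar values also compress well.
     print(sum(columns["DocId"]))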

  7. Nested Data Model
     Schema:
       message Document {
         required int64 DocId;
         optional group Links {
           repeated int64 Backward;
           repeated int64 Forward; }
         repeated group Name {
           repeated group Language {
             required string Code;
             optional string Country; }
           optional string Url; }}
     Record r1:
       DocId: 10
       Links
         Forward: 20
         Forward: 40
         Forward: 60
       Name
         Language
           Code: 'en-us'
           Country: 'us'
         Language
           Code: 'en'
         Url: 'http://A'
       Name
         Url: 'http://B'
       Name
         Language
           Code: 'en-gb'
           Country: 'gb'
     Record r2:
       DocId: 20
       Links
         Backward: 10
         Backward: 30
         Forward: 80
       Name
         Url: 'http://C'

  8. Repetition and Definition Levels
     - Values alone do not convey the structure of a record
     - Repetition level: tells us at what repeated field in the field's path the value has repeated
     - Example: r1, Name.Language.Code
       Repetition levels: [0, 2, 1, 1]
     [Figure: record r1 annotated with R = 0 for 'en-us' (first value in the record),
      R = 2 for 'en' (Language repeated), R = 1 for the Name with no Code (NULL), and
      R = 1 for 'en-gb' (Name repeated).]

  9. Repetition and Definition Levels
     - Definition level: specifies how many fields in the path that could be undefined
       (optional or repeated) are actually present in the record
     - Example: Name.Language.Country in r1
       Language with Code: 'en' has no Country: missing Country, d = 2
       Name with Url: 'http://B' has no Language: missing Language, d = 1

  10. Column-striped representation

     DocId            Name.Url             Links.Forward     Links.Backward
     value  r  d      value      r  d      value  r  d       value  r  d
     10     0  0      http://A   0  2      20     0  2       NULL   0  1
     20     0  0      http://B   1  2      40     1  2       10     0  2
                      NULL       1  1      60     1  2       30     1  2
                      http://C   0  2      80     0  2

     Name.Language.Code      Name.Language.Country
     value  r  d             value  r  d
     en-us  0  2             us     0  3
     en     2  2             NULL   2  2
     NULL   1  1             NULL   1  1
     en-gb  1  2             gb     1  3
     NULL   0  1             NULL   0  1
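
   Aside: a rough Python sketch (not the paper's general algorithm) that stripes the
   Name.Language.Code column from records r1 and r2 and reproduces the (value, r, d)
   triples in the table above. Representing records as plain dicts/lists is an
   assumption made here for illustration; the path is hard-coded.

     # r1 and r2 follow the Document schema on slide 7; repeated groups are lists,
     # missing optional groups are simply absent.
     r1 = {"DocId": 10,
           "Links": {"Forward": [20, 40, 60]},
           "Name": [
               {"Language": [{"Code": "en-us", "Country": "us"},
                             {"Code": "en"}],
                "Url": "http://A"},
               {"Url": "http://B"},
               {"Language": [{"Code": "en-gb", "Country": "gb"}]},
           ]}
     r2 = {"DocId": 20,
           "Links": {"Backward": [10, 30], "Forward": [80]},
           "Name": [{"Url": "http://C"}]}

     def stripe_name_language_code(record):
         """Yield (value, repetition level, definition level) for Name.Language.Code."""
         triples = []
         for i, name in enumerate(record.get("Name", [])):
             r_name = 0 if i == 0 else 1        # Name is the most recently repeated field
             languages = name.get("Language", [])
             if not languages:                   # Language missing: only Name is defined
                 triples.append((None, r_name, 1))
                 continue
             for j, lang in enumerate(languages):
                 r = r_name if j == 0 else 2     # repeating Language means level 2
                 triples.append((lang["Code"], r, 2))
         return triples

     print(stripe_name_language_code(r1))  # [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
     print(stripe_name_language_code(r2))  # [(None, 0, 1)]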

  11. Record Assembly FSM
     [Figure: finite state machine for assembling full records from the striped columns
      of the Document schema on slide 7. States are the column readers (DocId,
      Links.Backward, Links.Forward, Name.Language.Code, Name.Language.Country,
      Name.Url); edges between states select the next field to read.]
     - Transitions labeled with repetition levels
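
   Aside: a rough Python sketch of the assembly direction for a single column, using the
   same dict representation as above. This is a simplification of the FSM on the slide
   (one hard-coded path, not Dremel's generated automaton): the repetition level says
   where to "rewind", and the definition level says whether the value exists at full depth.

     # Striped Name.Language.Code column for r1 and r2, as (value, r, d) triples.
     column = [("en-us", 0, 2), ("en", 2, 2), (None, 1, 1), ("en-gb", 1, 2), (None, 0, 1)]

     records = []
     for value, r, d in column:
         if r == 0:                         # r = 0: start a new record
             records.append({"Name": []})
         if r <= 1:                         # r = 0 or 1: start a new Name group
             records[-1]["Name"].append({"Language": []})
         current_name = records[-1]["Name"][-1]
         if d == 2:                         # Code present at full depth
             current_name["Language"].append({"Code": value})
         # d < 2: Language was missing; the empty Language list stands in for it here.

     for rec in records:
         print(rec)
     # {'Name': [{'Language': [{'Code': 'en-us'}, {'Code': 'en'}]},
     #           {'Language': []},
     #           {'Language': [{'Code': 'en-gb'}]}]}
     # {'Name': [{'Language': []}]}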

  12. Reading two fields
     [Figure: reduced FSM over just DocId and Name.Language.Country. DocId moves to
      Name.Language.Country on repetition level 0; Name.Language.Country loops on
      levels 1 and 2 and returns on 0. Assembled partial records:
       s1: DocId: 10; Name { Language { Country: 'us' } }; Name { }; Name { Language { Country: 'gb' } }
       s2: DocId: 20; Name { } ]

  13. Query Processing
     - Optimized for select-project-aggregate
     - Very common class of interactive queries
     - Single scan
     - Within-record and cross-record aggregation
     - Approximations: count(distinct), top-k
     - Joins, temp tables, UDFs/TVFs, etc.

  14. Serving Tree
     [Figure: a client sends a query to the root server, which fans out through
      intermediate servers to leaf servers (with local storage) that read from the
      storage layer (e.g., GFS).]

  15. Example: count()
     Query sent to the root over T = {/gfs/1, /gfs/2, …, /gfs/100000}:
       SELECT A, COUNT(B) FROM T GROUP BY A
     The root (level 0) rewrites it as a union over the level-1 results R1,1 ... R1,10:
       SELECT A, SUM(c) FROM (R1,1 UNION ALL ... UNION ALL R1,10) GROUP BY A
     Level-1 servers compute partial aggregates over their partitions:
       R1,1 = SELECT A, COUNT(B) AS c FROM T1,1 GROUP BY A,  T1,1 = {/gfs/1, …, /gfs/10000}
       R1,2 = SELECT A, COUNT(B) AS c FROM T1,2 GROUP BY A,  T1,2 = {/gfs/10001, …, /gfs/20000}
     Leaf servers (level 3) run the data access ops on individual tablets:
       SELECT A, COUNT(B) AS c FROM T3,1 GROUP BY A,  T3,1 = {/gfs/1}
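
   Aside: a minimal Python sketch of the rewriting above (toy tablets and illustrative
   names, not Dremel code): leaves run the per-tablet GROUP BY, and every level above
   merges its children's partial results by summing the counts per group.

     from collections import Counter

     def leaf_query(tablet):
         """SELECT A, COUNT(B) AS c FROM tablet GROUP BY A."""
         c = Counter()
         for row in tablet:
             if row.get("B") is not None:   # COUNT(B) ignores NULLs
                 c[row["A"]] += 1
         return c

     def merge(partials):
         """SELECT A, SUM(c) FROM (R_1 UNION ALL ... UNION ALL R_n) GROUP BY A."""
         total = Counter()
         for p in partials:
             total.update(p)
         return total

     # Two toy tablets standing in for {/gfs/1, ..., /gfs/100000}.
     tablets = [
         [{"A": "x", "B": 1}, {"A": "y", "B": 2}],
         [{"A": "x", "B": 3}, {"A": "x", "B": None}],
     ]

     leaf_results = [leaf_query(t) for t in tablets]                   # leaf level
     intermediate = merge(leaf_results[:1]), merge(leaf_results[1:])   # intermediate level
     print(merge(intermediate))                                        # root: Counter({'x': 2, 'y': 1})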

  16. Experiments
     - 1 PB of real data (uncompressed, non-replicated)
     - 100K-800K tablets per table
     - Experiments run during business hours

     Table   Number of     Size (unrepl.,   Number      Data     Repl.
     name    records       compressed)      of fields   center   factor
     T1      85 billion     87 TB            270        A        3×
     T2      24 billion     13 TB            530        A        3×
     T3       4 billion     70 TB           1200        A        3×
     T4      1+ trillion   105 TB             50        B        3×
     T5      1+ trillion    20 TB             30        B        2×

  17. Read from disk
     [Chart: "cold" time on local disk, averaged over 30 runs; time (sec) vs. number of
      fields read. From records: (e) parse as C++ objects, (d) read + decompress.
      From columns: (c) parse as C++ objects, (b) assemble records, (a) read + decompress.
      About 10x speedup using columnar storage; 2-4x overhead of using records.
      Table partition: 375 MB (compressed), 300K rows, 125 columns.]

  18. MapReduce and Dremel Execution
     Task: average number of terms in txtField of the 85 billion record table T1.
     Sawzall program run on MR:
       num_recs: table sum of int;
       num_words: table sum of int;
       emit num_recs <- 1;
       emit num_words <- count_words(input.txtField);
     Q1 in Dremel:
       SELECT SUM(count_words(txtField)) / COUNT(*) FROM T1
     [Chart: execution time (sec) on 3000 nodes; MR on records reads 87 TB, MR on
      columns reads 0.5 TB, Dremel reads 0.5 TB.]
     MR overheads: launch jobs, schedule 0.5M tasks, assemble records
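
   Aside: a toy Python sketch of what Q1 computes, assuming txtField is available as a
   single striped column (the sample strings and count_words stand-in are illustrative,
   not Sawzall or Dremel code). The point is that only this one column needs to be read
   (about 0.5 TB) rather than the 87 TB of full records.

     # One entry per record of T1, reading only the txtField column.
     txt_field_column = ["the quick brown fox", "hello world", ""]

     def count_words(s):
         return len(s.split())

     num_words = sum(count_words(s) for s in txt_field_column)
     num_recs = len(txt_field_column)
     print(num_words / num_recs)   # average number of terms per record: 2.0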

  19. Impact of serving tree depth
     [Chart: execution time (sec) of Q2 and Q3 on T2 (40 billion nested items) for
      serving trees of different depths.]
     Q2 (returns 100s of records):
       SELECT country, SUM(item.amount) FROM T2 GROUP BY country
     Q3 (returns 1M records):
       SELECT domain, SUM(item.amount) FROM T2 WHERE domain CONTAINS '.net' GROUP BY domain

  20. Observations
     - Possible to analyze large disk-resident datasets interactively on commodity hardware
     - 1T records, 1000s of nodes
     - MR can benefit from columnar storage just like a parallel DBMS
     - But record assembly is expensive
     - Interactive SQL and MR can be complementary
     - Parallel DBMSes may benefit from serving tree architecture just like search engines

  21. More Information
     - BigQuery: http://code.google.com/apis/bigquery/
     - Apache Drill: http://incubator.apache.org/drill/index.html
     Thank You
