spanner
play

Spanner A distributed database system Presented by Yue Xia - PowerPoint PPT Presentation

Spanner A distributed database system Presented by Yue Xia Background - Developed by Google initially as a key-value storage system - Developers want traditional database features like query language - Evolved to a full featured SQL


  1. Spanner A distributed database system Presented by Yue Xia

  2. Background - Developed by Google initially as a key-value storage system - Developers want traditional database features like query language - Evolved to a full featured SQL system - Now used by teams across many parts of Google and Alphabet

  3. Overview - Distributed computing - Architecture - Replication - Partitioning - Interleaving - Range Extraction - Distributed Union - Distributed Join - Data storage - PAX - LSM Tree - Conclusion and Discussion

  4. Replication - Multiple datacenters in different geographic locations - Data replicated in each datacenter - Run query on nearest datacenter

  5. Partitioning Id Name Department - Each datacenter has multiple servers 3 Alice ‘A’ Shard 1: Department 2 Eve ‘A’ = ‘A’ - Data row-range sharded (partitioned) Shard 2: 1 Carol ‘B’ Department - Shards distributed across servers in each 4 Bob ‘C’ = ‘B’ to ‘C‘ datacenter Shard 3: 0 George ‘D’ Department 5 Fred ‘D’ = ‘D’

  6. Interleaving Parent & child tables interleaved Customer Id Customer Name and co-located Order # Price customer 1 Alice order 1 $2 Customer Join Order only needs one scan of the interleaved table order 2 $4 customer 2 Bob order 1 $6

  7. Query Execution 1. Go to the nearest datacenter 2. Extract key range 3. Run query only on shards covering the key range

  8. Range Extraction - Goal Given a query, we want to know: - What shards to access - What fragments of shards to access (seek into smaller key ranges instead of scanning the full shard)

  9. Range Extraction - Filter Tree AND Example: scan Table filter A=1 && ((B=’a’ && C=1) A=1 OR || (B>’a’ && C=2) AND AND Construct a tree according to the filter condition. B=’a’ C=1 B>’a’ C=2

  10. Range Extraction - Filter Tree AND Example: scan Table filter A=1 && ((B=’a’ && C=1) A=1 OR || (B>’a’ && C=2) A:[1,1] AND AND First find the range for A. Assign leaf node an initial B=’a’ C=1 B>’a’ C=2 interval A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ]

  11. Range Extraction - Filter Tree A:[1,1] ∩ [- ∞ , ∞ ] = [1,1] AND Find the interval for each node from bottom to top. A:[- ∞ , ∞ ] AND is intersection and OR A=1 OR is union. A:[1,1] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] AND AND The range for A is [1,1] B=’a’ C=1 B>’a’ C=2 A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ]

  12. Range Extraction - Filter Tree A:[1,1] AND Then find the range for B B:[- ∞ , ∞ ] ∩ [‘a’, ∞ ] = [‘a’, ∞ ] A:[- ∞ , ∞ ] A=1 OR B:[‘a’,’a’]U(‘a’, ∞ ]=[‘a’, ∞ ] A:[1,1] B:[- ∞ , ∞ ] A:[- ∞ , ∞ ] AND A:[- ∞ , ∞ ] AND B:[‘a’,’a’] ∩[ - ∞ , ∞ ] = [‘a’,’a’] B:(’a’, ∞ ] ∩[ - ∞ , ∞ ] = (’a’, ∞ ] B=’a’ C=1 B>’a’ C=2 A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] B:[‘a’,’a’] B:[- ∞ , ∞ ] B:(‘a’, ∞ ] B:[- ∞ , ∞ ]

  13. Range Extraction - Filter Tree A:[1,1] AND Then C B:[‘a’, ∞ ] C:[1,1] if B=’a’, [2,2] if B>’a’ Note that C’s range A:[- ∞ , ∞ ] B:[‘a’, ∞ ] depends on B A=1 OR C:[1,1] if B=’a’, [2,2] if B>’a’ A:[1,1] B:[- ∞ , ∞ ] A:[- ∞ , ∞ ] C:[- ∞ , ∞ ] AND A:[- ∞ , ∞ ] AND B:[‘a’,’a’] B:(’a’, ∞ ] C:[1,1] C:[2,2] B=’a’ C=1 B>’a’ C=2 A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] A:[- ∞ , ∞ ] B:[‘a’,’a’] B:[- ∞ , ∞ ] B:(‘a’, ∞ ] B:[- ∞ , ∞ ] C:[- ∞ , ∞ ] C:[1,1] C:[- ∞ , ∞ ] C:[2, 2 ]

  14. Range Extraction - Query Rewrite Rewrite filtered scan to self-join JOIN scan Table filter A=1 && ((B=’a’ && C=1) || JOIN (B>’a’ && C=2) becomes: Scan3(Table) Filter A=@A And B=@B Where the conditions Scan2(Table) And C = 1 if B=‘a’, C=2 if B>’a’ Scan1(Table) Filter A=@A output columns of interest in red are from the filter Filter A in [1,1] AND B in [‘a’,∞] output @A tree output @A,@B

  15. Range Extraction - Execution Scan1(Table) Filter A in [1,1] record id A B C output @A 1 1 ‘9’ 0 2 1 ‘a’ 0 Output: 1 3 1 ‘a’ 1 4 1 ‘ab’ 0 5 1 ‘b’ 1 6 2 ‘c’ 2 …….

  16. Range Extraction - Execution Scan2(Table) JOIN Filter A=@A AND B in [‘a’,∞] output @A, @B A record id A B C Output: 1 1 1 ‘9’ 0 1, ‘a’ 2 1 ‘a’ 0 1, ’ab’, 3 1 ‘a’ 1 1, ‘b’, 4 1 ‘ab’ 0 5 1 ‘b’ 1 6 2 ‘c’ 2 …….

  17. Range Extraction - Execution Scan3(Table) JOIN Filter A=@A And B=@B record id A B C A B And C = 1 if B=‘a’, C=2 if B>’a’ output columns of interest 1 ‘a’ 1 1 ‘9’ 0 (output record_id if the table is sharded 1 ‘ab’ 2 1 ‘a’ 0 according to record_id) 1 ‘b’ 3 1 ‘a’ 1 Output: 4 1 ‘ab’ 0 3 5 1 ‘b’ 1 6 2 ‘c’ 2 …….

  18. Range Extraction - Query Rewrite Join 3 tables. JOIN Too slow? JOIN Scan3(Table) Filter A=@A And B=@B Scan2(Table) And C = 1 if B=‘a’, C=2 if B>’a’ Scan1(Table) Filter A=@A output columns of interest Filter A in [1,1] AND B in [‘a’,∞] output @A output @A, @B

  19. Range Extraction - Actual Execution Scan1(Table) Filter A in [1,1] record id A B C output @A 1 1 ‘9’ 0 2 1 ‘a’ 0 Output 1 without accessing the data as A is fixed to 1. 3 1 ‘a’ 1 4 1 ‘ab’ 0 5 1 ‘b’ 1 6 2 ‘c’ 2 …….

  20. Range Extraction - Execution Scan2(Table) JOIN Filter A=@A AND B in [‘a’,∞] output @B A record id A B C Seek the first record with A = 1 and B = ‘a’ 1 1 1 ‘9’ 0 instead of scanning the whole table. 2 1 ‘a’ 0 3 1 ‘a’ 1 Record 2 is found. Output 1, ‘a’ 4 1 ‘ab’ 0 Then seek the next record with A = 1 and B 5 1 ‘b’ 1 != ‘a’, which is record 4. Output 1, ‘ab’ (skip records with B=’a’) 6 2 ‘c’ 2 …….

  21. Range Extraction - Execution Scan3(Table) JOIN Filter A=@A And B=@B record id A B C A B And C = 1 if B=‘a’, C=2 if B>’a’ output columns of interest 1 ‘a’ 1 1 ‘9’ 0 1 ‘ab’ 2 1 ‘a’ 0 Similar to Scan2, seek instead of scan. 1 ‘b’ 3 1 ‘a’ 1 Can finger 4 1 ‘ab’ 0 5 1 ‘b’ 1 6 2 ‘c’ 2 …….

  22. Range Extraction - Conclusion - Filter Tree: find range for each column values. - Rewrite filter to multiple self joins. - Execute with seek instead of scan.

  23. Distributed Union - A new relational algebra operator - Send subquery to each shard and concatenate results

  24. Distributed Union - Replace scan with distributed union of scan Scan(Table) -> DistributedUnion[shard ⊆ T](Scan(shard)) - Pull distributed union above as many operations as possible. (push computation to each server)

  25. Distributed Union Some operations can be directly rewritten by: Op(DistributedUnion[shard ⊆ T](Scan(shard))) -> DistributedUnion[shard ⊆ T](Op(Scan(shard))) Example: - Basic operations like projection, filtering… - Group by K or Ordered by K if sharded according to K - Join of interleaved table

  26. Distributed Union Some need extra processing: Op(DistributedUnion[shard ⊆ T](Scan(shard))) -> Op_Final(DistributedUnion[shard ⊆ T](Op_Local(Scan(shard)))) Example: - Top(5) can be done by finding the top 5 in each shard and then finding the top 5 among the results from all shards

  27. Distributed Union - Optimization - Multiple levels of distributed union Distributed Union - On large shards, further parallelize between subshards Distributed Distributed Union Union - Detect locally hosted shards and avoid remote call Shard Shard Shard Shard 1 2 3 4

  28. Distributed Union - Optimization - Range extraction: Extract key range Map key range to shards Send to min # of servers such that they contain all the required shards Only run query on required shards

  29. Batched Distributed Join - Join can also be distributed - Send batches of left table to each shards - Join batch with local shards - Union

  30. Batched Distributed Join - Optimization - Select left table to fit in a batch - Range extraction for each batch - Construct the minimum batch to be sent to each shard

  31. Data Storage - PAX - Data stored in Partition Attributes Across (PAX) layout - Records are horizontally partitioned in pages - In each page all values of each attribute are grouped together - Greatly improves cache performance

  32. Data Storage - PAX Table PAX Page Id Name Age 0,1,2,... 0 Alice 15 Alice,Bob,Carol,... 1 Bob 20 2 Carol 25 15,20,25,... ... ... ...

  33. Data Storage - PAX Not Used N-ary Storage Model (NSM) Page Cache Table 0,Alice,15|1,Bob,20| Bob,20,2 2,Carol,25|... Id Name Age 0 Alice 15 1 Bob 20 PAX Page Cache 2 Carol 25 0,1,2,... 15,20,25 ... ... ... Alice,Bob,Carol,... 15,20,25,... - SELECT age WHERE age> 20 - Cache miss will cache the asked value and the values next to it

  34. Data Storage - LSM Tree - Insert, update or delete would require rewriting the whole file. Id Age Id Age 0 5 0 5 3 10 insert 1 15 3 10 2 20 1 15 2 20

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend