Smooth Scan: Statistics-Oblivious Access Paths
Renata Borovica-Gajic Stratos Idreos Anastasia Ailamaki Marcin Zukowski Campbell Fraser
Access Paths Renata Borovica-Gajic Stratos Idreos Anastasia - - PowerPoint PPT Presentation
Smooth Scan: Statistics-Oblivious Access Paths Renata Borovica-Gajic Stratos Idreos Anastasia Ailamaki Marcin Zukowski Campbell Fraser Optimizers sensitivity to statistics Setting : TPC-H, SF10, DBMS-X, Tuning tool 5GB space 1000 400
Renata Borovica-Gajic Stratos Idreos Anastasia Ailamaki Marcin Zukowski Campbell Fraser
2
0.1 1 10 100 1000
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q16 Q18 Q19 Q21 Q22
Normalized exec. time (log) TPC-H Query
Tuned Original
Setting: TPC-H, SF10, DBMS-X, Tuning tool 5GB space 400
Selectivity Execution time
100% Full Scan
RISK
Index Scan Performance cliff Estimated Actual Full Scan
3
Re-optimization
[MID’98, POP’04, RIO’05, BOU’14]
4
...
HEAP PAGES INDEX
Sequential Access + (fast) sequential I/O
Index Access + read what you need
5
100% Index Scan
Robust Execution
Selectivity Execution time
Full Scan Full Scan
RISK
6
7
1. Index Access: Traditional index access 2. Entire Page Probe: Index access probes entire page 3. Gradual Flattening Access: Probe adjacent region(s)
8
...
HEAP PAGES INDEX
Mode 1 Mode 2 Mode 3
– Greedy – Selectivity Increase Driven – Elastic
Selectivity increase -> Mode Increase SEL_region > SEL_global Selectivity decrease -> Mode Decrease SEL_region < SEL_global
INDEX
SR:1 SR:1 SR:0.5 SR:0.75 SR:1 SR:1 SR:0.5 SG: 0
X: Page with result SR: Region selectivity SG: Global selectivity
1 0.81 0.66 0.7 0.75
HEAP PAGES
9
Index Scan Full Scan Sort Scan Smooth Scan Avoid repeated accesses
Fast sequential I/O
Avoid full table read
Tuples pipelining
10
11
2 Intel Xeon 6-core CPU @2.8 GHz, 48GB RAM HDD: I/O transfer rate 120 MB/s, Random vs. Sequential ratio = 10
PostgreSQL 9.2.1: Index Scan, Full Scan, Sort (Bitmap) Scan, Smooth Scan
TPC-H: SF 10 Micro-benchmark: 400M tuples, 10 columns random (1 – 105), 25GB Q1: select * from relation where c2 >= 0 and c2< X% [order by c2];
Cold file system cache
Setting: TPC-H, SF10, PostgreSQL with Smooth Scan
200 400 600 800 1000 1200 1400
Q1 (Sort Scan) Q4 (Full Scan) Q6 (Index Scan) Q7 (Index Scan) Q14 (Index Scan)
Execution time (sec)
PostgreSQL PostgreSQL with Smooth Scan
15% 10x
12
High selectivity Low selectivity
200 400 600 800 1000 1200
pSQL pSQL w. Smooth Scan pSQL pSQL w. Smooth Scan pSQL pSQL w. Smooth Scan pSQL pSQL w. Smooth Scan pSQL pSQL w. Smooth Scan Q1 (Sort Scan) Q4 (Full Scan) Q6 (Index Scan) Q7 (Index Scan) Q14 (Index Scan)
Execution time (sec) CPU Utilization I/O Wait time
Q1 Q4 Q6 Q7 Q14 pSQL Smooth S. pSQL Smooth S. pSQL Smooth S. pSQL Smooth S. pSQL Smooth S. # I/O Requests (K) 70 77 224 235 566 95 745 124 416 87
13
Time (sec)
Index Scan
Time (sec) Time (sec) Block address
Sequential Scan Smooth Scan Setting: TPC-H, Q1, Lineitem table, iosnoop tool
14
0.1 1 10 100 1000 10000 100000 0.001 0.01 0.1 1 20 50 75 100
Execution time (sec)
Selectivity(%) Full Scan Index Scan Sort Scan Smooth Scan 0.1 1 10 100 1000 10000 100000 0.001 0.01 0.1 1 20 50 75 100
Execution time (sec)
Selectivity(%) Full Scan Index Scan Sort Scan Smooth Scan
NO ORDER BY ORDER BY
Setting: Micro-benchmark, Q1 (w. and w/o. order), Selectivity 0-100%
115x
Robust Execution Robust Execution
15
16
renata.borovica@epfl.ch stratos@seas.harvard.edu natassa@epfl.ch marcin@snowflake.net campbellf@google.com