SLIDE 1
Bitmap Index Design and Evaluation
Chee-Yong Chan, University of Wisconsin-Madison
Yannis Ioannidis, University of Wisconsin-Madison and University of Athens
SLIDE 2 Introduction
• Tremendous growth in Decision Support Systems (DSS).
• DSS queries: read-mostly, complex, ad hoc, with large foundsets (i.e., high selectivity factors).
• Growing interest in bitmap indexing.
• Not much known about space-time tradeoffs.
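SLIDE 3 Example of a Bitmap Index
   A | B^10 B^9 B^8 B^7 B^6 B^5 B^4 B^3 B^2 B^1 B^0
   4 |    0   0   0   0   0   0   1   0   0   0   0
   9 |    0   1   0   0   0   0   0   0   0   0   0
   1 |    0   0   0   0   0   0   0   0   0   1   0
   3 |    0   0   0   0   0   0   0   1   0   0   0
   8 |    0   0   1   0   0   0   0   0   0   0   0
   2 |    0   0   0   0   0   0   0   0   1   0   0
  10 |    1   0   0   0   0   0   0   0   0   0   0
   0 |    0   0   0   0   0   0   0   0   0   0   1
   7 |    0   0   0   1   0   0   0   0   0   0   0
   5 |    0   0   0   0   0   1   0   0   0   0   0
   6 |    0   0   0   0   1   0   0   0   0   0   0
   3 |    0   0   0   0   0   0   0   1   0   0   0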
SLIDE 4 Bitmap Index (cont.)
• Also known as a Value-List Index [O'Neil & Quass, SIGMOD'97].
• Advantages:
  – Compact representation of the index (especially for attributes with low cardinality) ⇒ space- and I/O-efficient.
  – Bitmap operations (AND, OR, XOR, NOT) are efficiently supported by hardware.
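The slide's two claims can be tried on the example table of Slide 3. The sketch below is a minimal illustration, assuming each bitmap is held as a Python integer whose bit r belongs to record r (Python ints stand in for machine words here), and assuming cardinality C = 11 because the example shows bitmaps B^10 through B^0. The helper names `value_list_index` and `records` are hypothetical.

```python
# Column A of the example index (12 records); bitmaps B^0 .. B^10 are assumed,
# i.e. cardinality C = 11.
A = [4, 9, 1, 3, 8, 2, 10, 0, 7, 5, 6, 3]
C = 11
N = len(A)
ALL = (1 << N) - 1  # N one-bits: keeps NOT inside the valid record positions


def value_list_index(column, cardinality):
    """bitmaps[v] is an int whose bit r is set iff record r has value v."""
    bitmaps = [0] * cardinality
    for r, v in enumerate(column):
        bitmaps[v] |= 1 << r
    return bitmaps


def records(bitmap):
    """Record ids whose bit is set in the bitmap."""
    return [r for r in range(N) if (bitmap >> r) & 1]


B = value_list_index(A, C)

# A <= 3: OR of B^0 .. B^3
le3 = B[0] | B[1] | B[2] | B[3]
# A != 4: NOT B^4, masked to the N records
ne4 = ~B[4] & ALL

print(records(le3))  # [2, 3, 5, 7, 11]
print(records(ne4))  # every record except record 0
```

A range predicate on this index costs one OR per covered value, which is why the encoding and decomposition choices discussed next matter.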
SLIDE 5 Scope of Talk
• Bitmap index design for selection queries of the form $(A \;\mathit{op}\; c)$, where $\mathit{op} \in \{\le, \ge, <, >, =, \ne\}$:
  – Range query: $\mathit{op} \in \{\le, \ge, <, >\}$.
  – Equality query: $\mathit{op} \in \{=, \ne\}$.
• Attribute values are in $\{0, 1, 2, \ldots, C-1\}$, where $C$ is the attribute cardinality.
• Framework for the design space.
• Tradeoff study.
SLIDE 6 Example of a Value-List Index
The index of Slide 3 again: one bitmap $B^v$ per attribute value $v$ ($B^{10}, \ldots, B^0$) over the same 12 records.
SLIDE 7 Design Space of Bitmap Indexes for Selection Queries
• The design space consists of 2 dimensions (inspired by [Wong et al, VLDB'85]):
  1. Attribute Value Decomposition: determines the number and size of index components.
  2. Bitmap Encoding Scheme: determines the encoding of the bitmaps in each component.
• Index → Component → Bitmap
SLIDE 8 1st Dimension: Attribute Value Decomposition
• Given a sequence of $n$ numbers $\langle b_n, b_{n-1}, \ldots, b_1 \rangle$, each attribute value $A$ is decomposed into $n$ digits $A_n A_{n-1} \ldots A_1$, where $A_i$ is a base-$b_i$ digit.
• Example: $C = 1000$ and attribute value $A = 256$.

  <b_n, ..., b_1>   Decomposition of A
  <1000>            256
  <50, 20>          12(20) + 16
  <32, 32>          8(32) + 0
  <5, 20, 10>       1(20)(10) + 5(10) + 6

• Each $\langle b_n, b_{n-1}, \ldots, b_1 \rangle$ (the base of the index) defines an $n$-component index.
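The decomposition is a mixed-radix representation: $A = A_1 + b_1(A_2 + b_2(A_3 + \cdots))$. A base can represent every value in $\{0, \ldots, C-1\}$ only if $\prod_i b_i \ge C$, which all four example bases satisfy. The sketch below reproduces the example table; `decompose`, `recompose`, and `covers` are hypothetical helper names.

```python
from math import prod


def decompose(value, base):
    """Digits [A_n, ..., A_1] of value for base [b_n, ..., b_1] (mixed radix)."""
    digits = []
    for b in reversed(base):        # A_1 (base b_1) is the least significant digit
        value, digit = divmod(value, b)
        digits.append(digit)
    if value:
        raise ValueError("value too large for this base")
    return digits[::-1]             # most significant digit A_n first


def recompose(digits, base):
    """Inverse of decompose (Horner's rule over the digits)."""
    value = 0
    for d, b in zip(digits, base):
        value = value * b + d
    return value


def covers(base, C):
    """A base can represent all values 0 .. C-1 iff b_n * ... * b_1 >= C."""
    return prod(base) >= C


for base in ([1000], [50, 20], [32, 32], [5, 20, 10]):
    print(base, decompose(256, base), covers(base, 1000))
```

For instance, `decompose(256, [5, 20, 10])` returns `[1, 5, 6]`, matching $1(20)(10) + 5(10) + 6$.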
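SLIDE 9 Attribute Value Decomposition with Base <3, 4>
   A | A_2 A_1 | A = A_2·4 + A_1
   4 |  1   0  | 1·4 + 0
   9 |  2   1  | 2·4 + 1
   1 |  0   1  | 0·4 + 1
   3 |  0   3  | 0·4 + 3
   8 |  2   0  | 2·4 + 0
   2 |  0   2  | 0·4 + 2
  10 |  2   2  | 2·4 + 2
   0 |  0   0  | 0·4 + 0
   7 |  1   3  | 1·4 + 3
   5 |  1   1  | 1·4 + 1
   6 |  1   2  | 1·4 + 2
   3 |  0   3  | 0·4 + 3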
SLIDE 10 2nd Dimension: Bitmap Encoding Schemes
• Consider the $i$-th index component, with base $b_i$.
• Two ways to encode a value $x$ ($0 \le x < b_i$):

  Encoding scheme   Bitmaps   Representation of value x
  Equality          b_i       B_i^x = 1; all other bitmaps 0
  Range             b_i - 1   B_i^j = 1 for x <= j <= b_i - 2; B_i^j = 0 for j < x

• Equality-encoded bitmap: $B_i^x = \{\text{records with } A_i = x\}$.
• Range-encoded bitmap: $B_i^x = \{\text{records with } A_i \le x\}$. $B_i^{b_i-1}$ is not materialized since all its bits are set to 1.
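The two definitions translate directly into per-digit bit patterns. This sketch lists the stored bitmaps highest first, as on the slide; the function name `encode_digit` and the string scheme labels are assumptions for illustration.

```python
def encode_digit(x, b, scheme):
    """Bits of the stored bitmaps of a base-b component for digit x.

    Returned highest bitmap first, as on the slide:
      equality: B^{b-1} .. B^0, with B^j = 1 iff x == j      (b bitmaps)
      range:    B^{b-2} .. B^0, with B^j = 1 iff x <= j      (b - 1 bitmaps;
                B^{b-1} would be all ones, so it is not stored)
    """
    if not 0 <= x < b:
        raise ValueError("digit must satisfy 0 <= x < b")
    if scheme == "equality":
        return [int(x == j) for j in range(b - 1, -1, -1)]
    if scheme == "range":
        return [int(x <= j) for j in range(b - 2, -1, -1)]
    raise ValueError(f"unknown scheme: {scheme!r}")


for x in range(4):
    print(x, encode_digit(x, 4, "equality"), encode_digit(x, 4, "range"))
```

Range encoding always saves one bitmap per component, at the price of needing two bitmaps for most equality predicates.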
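SLIDE 11 An Equality-Encoded Base-<3, 4> Index
   A | A_2 A_1 | B_2^2 B_2^1 B_2^0 | B_1^3 B_1^2 B_1^1 B_1^0
   4 |  1   0  |   0     1     0   |   0     0     0     1
   9 |  2   1  |   1     0     0   |   0     0     1     0
   1 |  0   1  |   0     0     1   |   0     0     1     0
   3 |  0   3  |   0     0     1   |   1     0     0     0
   8 |  2   0  |   1     0     0   |   0     0     0     1
   2 |  0   2  |   0     0     1   |   0     1     0     0
  10 |  2   2  |   1     0     0   |   0     1     0     0
   0 |  0   0  |   0     0     1   |   0     0     0     1
   7 |  1   3  |   0     1     0   |   1     0     0     0
   5 |  1   1  |   0     1     0   |   0     0     1     0
   6 |  1   2  |   0     1     0   |   0     1     0     0
   3 |  0   3  |   0     0     1   |   1     0     0     0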
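SLIDE 12 A Range-Encoded Base-<3, 4> Index
   A | A_2 A_1 | B_2^1 B_2^0 | B_1^2 B_1^1 B_1^0
   4 |  1   0  |   1     0   |   1     1     1
   9 |  2   1  |   0     0   |   1     1     0
   1 |  0   1  |   1     1   |   1     1     0
   3 |  0   3  |   1     1   |   0     0     0
   8 |  2   0  |   0     0   |   1     1     1
   2 |  0   2  |   1     1   |   1     0     0
  10 |  2   2  |   0     0   |   1     0     0
   0 |  0   0  |   1     1   |   1     1     1
   7 |  1   3  |   1     0   |   0     0     0
   5 |  1   1  |   1     0   |   1     1     0
   6 |  1   2  |   1     0   |   1     0     0
   3 |  0   3  |   1     1   |   0     0     0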
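Putting both dimensions together, the index for a column is one set of bitmaps per component. The sketch below rebuilds the equality- and range-encoded base-<3, 4> indexes for the 12 example records, storing each bitmap as an int with bit r for record r; `build_index`, `digits`, and `column_bits` are hypothetical names. It also evaluates $A = 9$ on the equality-encoded index as a single AND of one bitmap per component.

```python
# The 12 records of the running example and the base <b_2, b_1> = <3, 4>.
A = [4, 9, 1, 3, 8, 2, 10, 0, 7, 5, 6, 3]
BASE = [3, 4]


def digits(value, base):
    """[A_n, ..., A_1] for base [b_n, ..., b_1]."""
    out = []
    for b in reversed(base):
        value, d = divmod(value, b)
        out.append(d)
    return out[::-1]


def build_index(column, base, scheme):
    """Map (i, x) -> bitmap B_i^x, stored as an int with bit r set for record r.

    Component numbers i run from n (leftmost base) down to 1, as on the slides.
    Equality stores B_i^0 .. B_i^{b_i-1}; range stores B_i^0 .. B_i^{b_i-2}.
    """
    n = len(base)
    all_digits = [digits(v, base) for v in column]
    index = {}
    for pos, b in enumerate(base):
        top = b - 1 if scheme == "equality" else b - 2
        for x in range(top + 1):
            bitmap = 0
            for r, ds in enumerate(all_digits):
                d = ds[pos]
                hit = d == x if scheme == "equality" else d <= x
                if hit:
                    bitmap |= 1 << r
            index[(n - pos, x)] = bitmap
    return index


def column_bits(bitmap, n_records):
    """A bitmap as the 0/1 column of the slide table (record 0 first)."""
    return [(bitmap >> r) & 1 for r in range(n_records)]


eq = build_index(A, BASE, "equality")
rng = build_index(A, BASE, "range")
print(len(eq), len(rng))                        # 7 5
print(column_bits(rng[(2, 1)], len(A)))         # the B_2^1 column of the range table
# Equality query A = 9 (digits 2, 1) on the equality-encoded index: one AND.
print(column_bits(eq[(2, 2)] & eq[(1, 1)], len(A)))
```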
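SLIDE 13 Design Space of Bitmap Indexes
[Figure: the two dimensions of the design space. One axis is the attribute value decomposition, from the single-component base $\langle C \rangle$ through $\langle b_2, b_1 \rangle$, $\langle b_3, b_2, b_1 \rangle$, ..., to $\langle b, b, \ldots, b \rangle$ ($\lceil \log_b C \rceil$ times); the other axis is the bitmap encoding scheme (Equality, Range). The Value-List Index and the Bit-Sliced Index appear as points in this space.]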
SLIDE 14 Space-Time Tradeoff Issues
[Figure: time versus space for bitmap indexes, marking the Space-Optimal index, the Time-Optimal index, the optimal space-time tradeoff (knee), the Time-Optimal index under a space constraint S, and the infeasible region.]
SLIDE 15 Analytical Cost Model
• Cost metrics:
  – Space: number of bitmaps.
  – Time: expected number of bitmap scans to evaluate a selection query.
• Query distribution assumption: query space $= \{A \;\mathit{op}\; v : \mathit{op} \in \{\le, \ge, <, >, =, \ne\},\ 0 \le v < C\}$, where $C$ is the attribute cardinality.
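To make the two metrics concrete, here is a deliberately simplified model for single-component indexes only. It is my own counting, not the cost formulas behind the plots that follow: it assumes every (op, v) pair in the query space is equally likely, that each needed bitmap is scanned once, and that an equality-encoded index answers a range predicate by ORing the bitmaps it covers or complementing the OR of the rest, whichever reads fewer. Space is C bitmaps for equality encoding and C − 1 for range encoding. The function names are hypothetical.

```python
from fractions import Fraction

# The six comparison operators of the query space.
OPS = ("<=", ">=", "<", ">", "=", "!=")


def scans_equality(op, v, C):
    """Bitmap scans for A op v on a single-component equality-encoded index <C>.

    Assumes all C bitmaps B^0 .. B^{C-1} are stored and a range predicate is
    answered by ORing the bitmaps it covers or by complementing the OR of the
    others, whichever reads fewer bitmaps.
    """
    if op in ("=", "!="):
        return 1                        # B^v, or NOT B^v
    if op in ("<=", ">"):               # A > v is NOT (A <= v)
        return min(v + 1, C - 1 - v)
    return min(v, C - v)                # "<" and ">=": A >= v is NOT (A < v)


def scans_range(op, v, C):
    """Bitmap scans for A op v on a single-component range-encoded index <C>.

    Stored bitmaps: B^x = {A <= x} for x = 0 .. C-2 (B^{C-1} is all ones).
    """
    if op in ("<=", ">"):               # B^v or NOT B^v; nothing to read if v = C-1
        return 1 if v <= C - 2 else 0
    if op in ("<", ">="):               # B^{v-1} or NOT B^{v-1}; nothing to read if v = 0
        return 1 if v >= 1 else 0
    # "=" and "!=": B^v AND NOT B^{v-1} (or its complement); the end values need one bitmap.
    return 1 if v in (0, C - 1) else 2


def expected_scans(scans, C):
    """Mean scans over the query space, assuming every (op, v) pair is equally likely."""
    total = sum(scans(op, v, C) for op in OPS for v in range(C))
    return Fraction(total, len(OPS) * C)


print(expected_scans(scans_equality, 100), expected_scans(scans_range, 100))  # 17 33/25
```

Even under this rough model, for C = 100 the range-encoded index stores one bitmap fewer and needs far fewer scans on average, which is the direction of the comparison on the next slide.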
SLIDE 16 Comparison of Encoding Schemes
[Figure: time (expected number of bitmap scans) versus space (number of bitmaps) for range-encoded and equality-encoded indexes; panel (a) C = 100, panel (b) C = 1000.]
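SLIDE 17 Space-Time Tradeoff Results
• n-Component Indexes
  – Time-Optimal Index $= \langle \underbrace{2, 2, \ldots, 2}_{n-1}, \lceil C / 2^{n-1} \rceil \rangle$
  – Space-Optimal Index $= \langle \underbrace{b-1, b-1, \ldots, b-1}_{n-r}, \underbrace{b, b, \ldots, b}_{r} \rangle$, where $b = \lceil \sqrt[n]{C} \rceil$ and $r$ satisfies $b^{r-1}(b-1)^{n-r+1} < C \le b^{r}(b-1)^{n-r}$.
• Time-Optimal Index = Single-component index.
• Space-Optimal Index = Maximal-component index.
• Knee Index = 2-component space-optimal index.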
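The two closed forms can be evaluated directly. In this sketch $b = \lceil \sqrt[n]{C} \rceil$ is computed with integer arithmetic to avoid floating-point error, and $r$ is taken as the smallest value with $b^r (b-1)^{n-r} \ge C$, which is the same as the slide's two-sided condition. The restriction $1 \le n \le \lceil \log_2 C \rceil$ and all function names are my assumptions; space is counted as $\sum_i (b_i - 1)$, the range-encoded bitmap count from Slide 10.

```python
def nth_root_ceil(C, n):
    """Smallest integer b with b**n >= C, i.e. ceil(C ** (1/n)), without float error."""
    b = max(1, round(C ** (1.0 / n)))
    while b ** n < C:
        b += 1
    while b > 1 and (b - 1) ** n >= C:
        b -= 1
    return b


def check(C, n):
    # Assumed valid range: 1 <= n <= ceil(log2 C); more components than that
    # would force a useless base-1 component.
    if C < 2 or n < 1 or 2 ** (n - 1) >= C:
        raise ValueError("need C >= 2 and 1 <= n <= ceil(log2 C)")


def time_optimal(C, n):
    """<2, ..., 2, ceil(C / 2^(n-1))>: n-1 twos, then one large base."""
    check(C, n)
    return [2] * (n - 1) + [-(-C // 2 ** (n - 1))]   # -(-a // b) is ceiling division


def space_optimal(C, n):
    """<b-1 (n-r times), b (r times)> with b = ceil(C^(1/n)) and the smallest r
    such that b^r (b-1)^(n-r) >= C."""
    check(C, n)
    b = nth_root_ceil(C, n)
    r = next(r for r in range(1, n + 1) if b ** r * (b - 1) ** (n - r) >= C)
    return [b - 1] * (n - r) + [b] * r


def range_space(base):
    """Bitmaps stored by a range-encoded index with this base: sum of (b_i - 1)."""
    return sum(b - 1 for b in base)


for n in (1, 2, 3):
    print(n, time_optimal(100, n), space_optimal(100, n), range_space(space_optimal(100, n)))
```

As a cross-check, `space_optimal(1000, 2)` returns `[32, 32]`, one of the example bases on Slide 8.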
SLIDE 18 Time-Optimal and Space-Optimal Indexes, C = 100
[Figure: time (expected number of bitmap scans) versus space (number of bitmaps) for the n-component time-optimal and n-component space-optimal indexes, with points labeled by n.]
SLIDE 19 Knee Index, C = 100
[Figure: time (expected number of bitmap scans) versus space (number of bitmaps) for all indexes and for the n-component space-optimal indexes (points labeled by n).]
SLIDE 20 Space-Time Tradeoff Results (cont.)
Time-Optimal Index under Space Constraint
• The search space for the solution is large!
• 2-step heuristic approach:
  1. Select an initial index that satisfies the space constraint.
  2. Iteratively adjust the base of the index to improve its time-efficiency.
• The approach is near-optimal.
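The two steps can be sketched as a greedy local search. This is a hypothetical illustration of the general shape only, not the authors' heuristic: the initial-index rule (fewest components whose space-optimal base fits in S bitmaps), the neighbour move (lower one base by 1, then raise another just enough to keep $\prod b_i \ge C$), and the `time_cost` callable are all my assumptions. Space is counted as for range encoding, $\sum_i (b_i - 1)$.

```python
from math import prod


def space(base):
    """Bitmaps stored by a range-encoded index: sum of (b_i - 1)."""
    return sum(b - 1 for b in base)


def nth_root_ceil(C, n):
    """Smallest integer b with b**n >= C."""
    b = max(1, round(C ** (1.0 / n)))
    while b ** n < C:
        b += 1
    while b > 1 and (b - 1) ** n >= C:
        b -= 1
    return b


def space_optimal(C, n):
    """n-component space-optimal base <b-1, ..., b-1, b, ..., b>."""
    b = nth_root_ceil(C, n)
    r = next(r for r in range(1, n + 1) if b ** r * (b - 1) ** (n - r) >= C)
    return [b - 1] * (n - r) + [b] * r


def initial_index(C, S):
    """Step 1 (assumed rule): the fewest-component space-optimal index within S bitmaps."""
    n = 1
    while 2 ** (n - 1) < C:
        base = space_optimal(C, n)
        if space(base) <= S:
            return base
        n += 1
    raise ValueError("no index fits within the space constraint")


def neighbours(base, C):
    """Candidate adjustments: lower one base by 1, then raise another just enough
    that the product of the bases still covers C."""
    for j, bj in enumerate(base):
        if bj <= 2:
            continue
        for i in range(len(base)):
            if i != j:
                cand = list(base)
                cand[j] -= 1
                others = prod(cand) // cand[i]
                cand[i] = max(2, -(-C // others))     # ceiling division
                yield cand


def tune(C, S, time_cost):
    """Step 2 (assumed rule): greedy descent on a caller-supplied time estimate,
    never leaving the space constraint."""
    best = initial_index(C, S)
    while True:
        feasible = [c for c in neighbours(best, C) if space(c) <= S]
        cand = min(feasible, key=time_cost, default=None)
        if cand is None or time_cost(cand) >= time_cost(best):
            return best
        best = cand
```

For example, with C = 100 and S = 20, `initial_index` starts from <10, 10> (18 bitmaps), and `tune` only ever moves to bases that stay within 20 bitmaps; how good the result is depends entirely on the `time_cost` estimator supplied.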
SLIDE 21 Storage Schemes for Bitmap Compression
(Example: an index of 6 bitmaps in 2 components; N = number of tuples.)
• Bitmap-Level Storage (BS): 6 files of N bits each.
• Component-Level Storage (CS): 2 files of 3N bits each.
• Index-Level Storage (IS): 1 file of 6N bits.
SLIDE 22 Bitmap Compression
• Experimental data (from the TPC-D benchmark):
  – Attribute: Lineitem.Qty with C = 50 and 6M tuples.
  – Indexes: 6 n-component space-optimal indexes (n = 1, ..., 6).
  – Compression code: zlib library (an LZ77 variant).
• cBS, cCS, and cIS denote the compressed versions of the three storage schemes.
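The three storage schemes differ only in how packed bitmaps are grouped into files before compression. The sketch below uses Python's standard `zlib` module and synthetic random data (not the TPC-D column), compressing each file on its own at level 9. The bit-packing order, the synthetic column, and the helper names are my assumptions; base <7, 8> is the 2-component space-optimal base for C = 50, giving 6 + 7 = 13 range-encoded bitmaps.

```python
import random
import zlib
from math import prod


def range_encoded(column, base):
    """Bitmaps of a range-encoded index, grouped by component.

    Component with base b yields bitmaps B^0 .. B^{b-2}, each a list of 0/1 per record.
    """
    components = []
    for pos, b in enumerate(base):
        weight = prod(base[pos + 1:])            # place value of this digit
        digit = [(v // weight) % b for v in column]
        components.append([[int(d <= x) for d in digit] for x in range(b - 1)])
    return components


def pack(bits):
    """Pack 0/1 values into bytes, record 0 in the high bit of the first byte."""
    out = bytearray((len(bits) + 7) // 8)
    for r, bit in enumerate(bits):
        if bit:
            out[r // 8] |= 0x80 >> (r % 8)
    return bytes(out)


def storage_files(components, scheme):
    """BS: one file per bitmap; CS: one file per component; IS: one file per index."""
    if scheme == "BS":
        return [pack(bm) for comp in components for bm in comp]
    if scheme == "CS":
        return [b"".join(pack(bm) for bm in comp) for comp in components]
    if scheme == "IS":
        return [b"".join(pack(bm) for comp in components for bm in comp)]
    raise ValueError(f"unknown storage scheme: {scheme!r}")


def compressed_size(files):
    """Total bytes when each file is compressed on its own with zlib (level 9)."""
    return sum(len(zlib.compress(f, 9)) for f in files)


random.seed(0)
column = [random.randrange(50) for _ in range(10_000)]   # synthetic data, C = 50
index = range_encoded(column, [7, 8])                       # 6 + 7 = 13 bitmaps
for scheme in ("BS", "CS", "IS"):
    files = storage_files(index, scheme)
    print(scheme, len(files), sum(map(len, files)), compressed_size(files))
```

Uncompressed, all three schemes hold the same bytes; only the compressed sizes can differ, because zlib sees longer or shorter inputs.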
SLIDE 23 Compressibility of Storage Schemes (relative to the 1-component index under BS)
[Figure: compressibility versus number of components n = 1, ..., 6 for BS/CS/IS (uncompressed) and for cBS, cCS, cIS.]
SLIDE 24 Space-Time Tradeoff of Compressed Indexes
[Figure: time (secs) versus space (MB) for BS, cBS, and cCS.]
SLIDE 25 Conclusions
• A framework to explore the design space of bitmap indexes for selection queries.
• Study of space-time tradeoff issues.
• Guidelines for physical database design using bitmap indexes.
• Future work:
  – Hybrid-encoded bitmap indexes.
  – A more general class of selection queries; e.g., $A \in \{v_1, v_2, \ldots, v_n\}$.