db 2
play

DB 2 08 Predicate Evaluation Summer 2018 Torsten Grust Universitt - PowerPoint PPT Presentation

01 DB 2 08 Predicate Evaluation Summer 2018 Torsten Grust Universitt Tbingen, Germany 02 1 Q Predicate (or Filter) Evaluation SQL's WHERE / HAVING / FILTER clauses use expressions of type Boolean ( predicates ) to filter


  1. 01 DB 2 08 – Predicate Evaluation Summer 2018 Torsten Grust Universität Tübingen, Germany

  2. 02 1 ┆ Q ₇ — Predicate (or Filter) Evaluation SQL's WHERE / HAVING / FILTER clauses use expressions of type Boolean ( predicates ) to filter rows. Predicates may use Boolean connectives ( AND , OR , NOT ) to build complex filters from simple predicate building blocks: SELECT t.a, t.b FROM ternary AS t WHERE t.a % 2 = 0 AND [ OR ] t.c < 1 -- either AND or OR Evaluate predicate for every row t scanned. Here: assume that evaluation of the predicate is not supported by a specific index. ( ⚠ Index support for predicates is essential → see upcoming chapters.)

  3. 03 Using EXPLAIN on Q ₇ !"#$%&' (!)*+,! ,!$!-. t.a, t.b /)+0 ternary %, t -- 1000 rows 12!)! t.a % 2 = 0 %'3 t.c < 1; ┌───────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────┤ | " " | │ Seq Scan on ternary t (cost=… rows=1 …) (actual time=… rows=5 …) │ │ Filter: ((c < '1'::double precision) AND ((a % 2) = 0)) # │ │ Rows Removed by Filter: 995 │ │ Planning time: 2.125 ms $ │ │ Execution time: 1.894 ms │ └───────────────────────────────────────────────────────────────────┘ Filter predicate evaluated during Seq Scan . Estimated selectivity of predicate ¹ / ₁₀₀₀ (real: ⁵ / ₁₀₀₀ ).

  4. 04 t.a % 2 = 0 AND t.c < 1 : An Expression of Type bool In the absence of index support, use the regular expression interpreter to evaluate predicates: SCAN_FETCHSOME(t, [a, c]) SCAN_VAR(c) ───────╮ CONST(1) ────────────╮ FUNCEXPR_STRICT(<, •, •) ─╮ BOOL_AND_STEP_FIRST( •) # if • = >?@AB , immediately yield >?@AB SCAN_VAR(a) ───────╮ # ( ∧ semantics: >?@AB ∧ D = >?@AB ) CONST(2) ────────────╮ FUNCEXPR_STRICT(%, •, •) ─╮ CONST(0) ──────────╮ │ FUNCEXPR_STRICT(=, •, •) ─╮ BOOL_AND_STEP_LAST( •) # yield • ( ∧ semantics: EFGB ∧ D = D ) Uses “jumps” in program to implement Boolean shortcut .

  5. 05 Heuristic Predicate Simplification Predicate evaluation effort is multiplied by the number of rows processed. Even small simplifcations add up. PostgreSQL performs basic predicate simplifications: Reduce constant expressions to true / false . Apply basic identities (e.g., NOT(NOT( D )) ≡ D and ( D AND I ) OR ( D AND J ) ≡ D AND ( I OR J ) ). Remove duplicate clauses (e.g., D AND D ≡ D ) Apply De Morgan's laws. ⚠ These are heuristics (expected to improve evaluation time): selectivity is not yet taken into account.

  6. 06 Machine-Generated Queries and Predicate Simplification Automatically generated SQL text may differ significantly from human-authored queries. Consider a web search form: ┌─────────────────────────────┒ 1. User enters search keys for │ ⮾ Search ternary... ┃ columns M and/or N . ├─────────────────────────────┨ │ ┌─────────────┐ ┃ 2. Web form maps missing keys to │ M : │⌕ 42 ⠄⠄⠄⠄⠄⠄ │ ┃ 'R$$ (interpret as wildcard). │ └─────────────┘ ┃ │ ┌─────────────┐ ┃ 3. DBMS executes parameterized query: │ N : │⌕ ⠄⠄⠄⠄⠄⠄⠄⠄⠄⠄ │ ┃ │ └─────────────┘ ┃ ,!$!-. t.* │ ╭──────╮ ┃ /)+0 ternary AS t *↗ ┃ │ │ SUBMIT 12!)! (t.a = : T +) : T &, 'R$$ ) │ ╰──────╯ ┃ %'3 (t.c = : W +) : W &, 'R$$ ) ┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

  7. 07 Heuristics May Not Be Enough Heuristics only go so far. The (estimated) cost of evaluation may suggest better predicate rewrites: SELECT t.* (expected) N[\] FROM ternary_10m AS t WHERE length (btrim(t.b, '0…9')) < length (t.b) D₁ ┝━━━━━ OR t.a % 1000 <> 0 D₂ ┝━ With Boolean shortcut it makes a difference which disjunct is evaluated first. (Both predicates not selective, *₁ : 85.9%, *₂ : 99.9% of 10 ⁷ rows pass.) ⇒ Many optimizer decisions indeed are cost-based .

  8. 08 2 ┆ Q ₇ — Predicate (or Filter) Evaluation " SELECT t.a, t.b FROM ternary AS t WHERE t.a % 2 = 0 AND [ OR ] t.c < 1 -- either AND or OR MonetDB can evaluate basic predicates on individual column BATs (here: a and c ) ➊ but then needs to 1. derive the result of composite predicates ➋ and 2. propagate the filter effect to all output columns (here: a , b ) ➌ to form the final selection result.

  9. 09 Using EXPLAIN on Q ₇ (Boolean Connective: OR ) sql> !"#$%&' ,!$!-. t.a, t.b /)+0 ternary %, t 12!)! t.a % 2 = 0 +) t.c < 1; ⋮ ternary :bat[:oid] := sql.tid(sql, "sys", "ternary"); a0 :bat[:int] := sql.bind(sql, "sys", "ternary", "a", 0:int); a :bat[:int] := algebra.projection(ternary, a0); e1 :bat[:int] := batcalc.%(a, 2:int); # a % 2 ➊ p1 :bat[:oid] := algebra.thetaselect(e1, 0:int, "=="); # D₁ ≡ a % 2 = 0 c0 :bat[:dbl] := sql.bind(sql, "sys", "ternary", "c", 0:int); c :bat[:dbl] := algebra.projection(ternary, c0); ➊ p2 :bat[:oid] := algebra.thetaselect(c, 1:dbl, "<"); # D₂ ≡ c < 1 ➋ or :bat[:oid] := bat.mergecand(p1, p2); # D₁ ∨ D₂ b0 :bat[:str] := sql.bind(sql, "sys", "ternary", "b", 0:int); ➌ bres :bat[:str] := algebra.projectionpath(or, ternary, b0); # result col b ➌ ares :bat[:int] := algebra.projection(or, a); # result col a ⋮

  10. 10 Result of a Predicate ≡ Selection Vectors ┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈ \hihN]j[k lhN][m\ ┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈    ┊ ┌┄┄┄┄┬────┐ ┊ ┊ ┊ head │ tail │ ┊ ┊ ┝━━━━┽────┤ ┊  | ⋯⋯⋯⋯⋯┊ 0@0 │ 0@0 │  ┌┄┄┄┄┬────┐ │ ┊ 1@0 │ 1@0 │ ┌┄┄┄┄┬────┐ ┊ head │ tail │ mergecand(•,•) ┊ 2@0| 3@0 │ ┊ head │ tail │ ┝━━━━┽────┤ ╱ ╲ └┄┄┄┄┴────┘ ┝━━━━┽────┤ ┊ 0@0 │ 1@0 │   ┊ 0@0 │ 0@0 │ ┊ 1@0 │ 3@0 │⋯⋯ thetaselect(•,0:int,==) thetaselect(•,1:dbl,<) ⋯⋯┊ 1@0 │ 1@0 │ └┄┄┄┄┴────┘ │ │ └┄┄┄┄┴────┘ ┌┄┄┄┄┬────┐ │ │ ┌┄┄┄┄┬────┐ ┊ head │ tail │⋯⋯⟦ a % 2 ⟧ ⟦ c ⟧⋯⋯┊ head │ tail │ ┝━━━━┽────┤ ┝━━━━┽────┤ ┊ 0@0 │ 1 │ ┊ 0@0 │ 0.8 │ ┊ 1@0 │ 0 │ ┊ 1@0 │ 0.9 │ ┊ 2@0 │ 1 │ ┊ 2@0 │ 1.0 │ ┊ 3@0 │ 0 │ ┊ 3@0 │ 1.1 │ └┄┄┄┄┴────┘ └┄┄┄┄┴────┘

  11. 11 Selection Vectors (also: Candidate Lists) Selection vector ~ : BAT of type bat[:oid] . Ä @0 ∈ ~ ⇔ Ä th input row satisfies filter predicate. Use algebra.projection( ~ , WÅÇ ) to propagate filter effect to column WÅÇ . Implement Boolean connectives for predicate Dᵢ with ~ᵢ : D₁ OR D₂ : bat.mergecand( ~₁ , ~₂ ) D₁ AND D₂ : algebra.projectionpath( ~₂ , ~₁ ,•) with algebra.projectionpath( ~₂ , ~₁ ,•) ≡ algebra.projection( ~₂ , algebra.projection( ~₁ ,•)) .

  12. 12 3 ┆ Implementing Selection in Tight Loops Under a layer of C macros, the core of MonetDB's filtering routine A := thetaselect( WÅÇ :bat[:int],  :int, Ñ ) resembles: int thetaselect(int * A , int * WÅÇ , int  , Ñ ) { int SIZE = ‹ ÖGÜáBF à> FàâA äÖ WÅÇ ›; /* input cardinality */ int out = 0; ã[m (int i = 0; i < SIZE; i += 1) { ⎜ jã ( WÅÇ [i] Ñ  ) { /* test filter condition */ ⎜ ⎜ ~ [out] = i; /* build selection vector */ ⎜ ⎜ out += 1; ⎜ ⎣ } ⎣ } return out; /* output cardinality */ }

  13. 13 Instruction Pipelining in Modern CPUs Control flow branches ( for , but particularly if ) are a challenge for modern pipelining CPUs: 1* (write back to register) ┊ 0!0 (read/write memory) ┊ ┊ !" (execute) ┊ ┊ ┊ &3 (instruction decode, branch flag test) ┊ ┊ ┊ ┊ &/ (instruction fetch) ┊ ┊ ┊ ┊ ┊ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ # Ä instruction ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ # Ä+ 1 ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ # Ä+ 2 AE?@@ ( ⬚⬚ ): ins # Ä +3 ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬚⬚ # Ä+ 3 cannot fetch its opcode ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ ⬛⬛ # Ä+ 4 ( &3 ) while ins # Ä time accesses memory ( 0!0 )

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend