CREATE STATISTICS What is it for? Tomas Vondra - - PowerPoint PPT Presentation

create statistics
SMART_READER_LITE
LIVE PREVIEW

CREATE STATISTICS What is it for? Tomas Vondra - - PowerPoint PPT Presentation

pgconf.eu 2018 Lisbon, October 23-26, 2018 CREATE STATISTICS What is it for? Tomas Vondra <tomas.vondra@2ndquadrant.com> https://github.com/tvondra/create-statistics-talk https://www. 2ndQuadrant.com pgconf.eu 2018 Lisbon, October


slide-1
SLIDE 1

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Tomas Vondra <tomas.vondra@2ndquadrant.com>

CREATE STATISTICS

What is it for?

slide-2
SLIDE 2

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Agenda

  • Quick intro into planning and estimates.
  • Estimates with correlated columns.
  • CREATE STATISTICS to the rescue!

– functional dependencies – ndistinct

  • Future improvements.
slide-3
SLIDE 3

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

ZIP_CODES

C R E A T E T A B L E z i p _ c

  • d

e s ( p

  • s

t a l _ c

  • d

e V A R C H A R ( 2 ) , p l a c e _ n a m e V A R C H A R ( 1 8 ) , s t a t e _ n a m e V A R C H A R ( 1 ) , p r

  • v

i n c e _ n a m e V A R C H A R ( 1 ) , c

  • m

m u n i t y _ n a m e V A R C H A R ( 1 ) , l a t i t u d e R E A L , l

  • n

g i t u d e R E A L ) ; c a t c r e a t e

  • t

a b l e . s q l | p s q l t e s t c a t z i p

  • c
  • d

e s

  • p
  • r

t u g a l . c s v | p s q l t e s t

  • c

" c

  • p

y z i p _ c

  • d

e s f r

  • m

s t d i n "

  • h

t t p : / / d

  • w

n l

  • a

d . g e

  • n

a m e s .

  • r

g / e x p

  • r

t / z i p /

slide-4
SLIDE 4

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

EXPLAIN

E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 4 8 6 . 7 6 r

  • w

s = 8 5 8 8 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 9 1 6 6 l

  • p

s = 1 ) F i l t e r : ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) R

  • w

s R e m

  • v

e d b y F i l t e r : 1 9 7 7 7 5

slide-5
SLIDE 5

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

reltuples , relpages

S E L E C T r e l t u p l e s , r e l p a g e s F R O M p g _ c l a s s W H E R E r e l n a m e = ' z i p _ c

  • d

e s ' ; r e l t u p l e s | r e l p a g e s

  • +
  • 2

6 9 4 1 | 2 2 7 4

slide-6
SLIDE 6

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 S E L E C T * F R O M p g _ s t a t s W H E R E t a b l e n a m e = ' z i p _ c

  • d

e s ' A N D a t t n a m e = ' p l a c e _ n a m e ' ;

  • +
  • s

c h e m a n a m e | p u b l i c t a b l e n a m e | z i p _ c

  • d

e s a t t n a m e | p l a c e _ n a m e . . . | . . . m

  • s

t _ c

  • m

m

  • n

_ v a l s | { L i s b

  • a

, P

  • r

t

  • ,

" V i l a N

  • v

a d e G a i a " , M a i a , . . . } m

  • s

t _ c

  • m

m

  • n

_ f r e q s | { . 4 1 5 , . 2 6 3 3 3 , . 8 9 6 6 6 7 , . 8 9 3 3 3 3 , . . . } . . . | . . .

slide-7
SLIDE 7

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 4 8 6 . 7 6 r

  • w

s = 8 5 8 8 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 9 1 6 6 l

  • p

s = 1 ) r e l t u p l e s | 2 6 9 4 1 m

  • s

t _ c

  • m

m

  • n

_ v a l s | { L i s b

  • a

, … } m

  • s

t _ c

  • m

m

  • n

_ f r e q s | { . 4 1 5 , … } 2 6 9 4 1 * . 4 1 5 = 8 5 8 8 . 5 1 5

slide-8
SLIDE 8

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E s t a t e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 4 8 6 . 7 6 r

  • w

s = 3 4 2 6 3 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 3 5 2 3 l

  • p

s = 1 ) r e l t u p l e s | 2 6 9 4 1 m

  • s

t _ c

  • m

m

  • n

_ v a l s | { L i s b

  • a

, … } m

  • s

t _ c

  • m

m

  • n

_ f r e q s | { . 1 6 5 5 6 7 , … } 2 6 9 4 1 * . 1 6 5 5 6 7 = 3 4 2 6 2 . 6

slide-9
SLIDE 9

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 1 4 2 2 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 9 1 6 5 l

  • p

s = 1 ) F i l t e r : ( ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) A N D ( ( s t a t e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 1 9 7 7 7 6

Underestimate

slide-10
SLIDE 10

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

P ( A & B ) = P ( A ) * P ( B )

slide-11
SLIDE 11

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e = ' L i s b

  • a

' ; P ( p l a c e _ n a m e = ' L i s b

  • a

' & c

  • u

n t y _ n a m e = ' L i s b

  • a

' ) = P ( p l a c e _ n a m e = ' L i s b

  • a

' ) * P ( s t a t e _ n a m e = ' L i s b

  • a

' ) = . 4 1 5 * . 1 6 5 5 6 7 = . 6 8 7 1 3 5 2 6 9 4 1 * . 6 8 7 1 3 5 = 1 4 2 1 . 8 9 8

slide-12
SLIDE 12

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Underestimate

E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 1 4 2 2 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 9 1 6 5 l

  • p

s = 1 ) F i l t e r : ( ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) A N D ( ( s t a t e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 1 9 7 7 7 6

slide-13
SLIDE 13

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e ! = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 7 1 6 6 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 1 l

  • p

s = 1 ) F i l t e r : ( ( ( s t a t e _ n a m e ) : : t e x t < > ' L i s b

  • a

' : : t e x t ) A N D ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 2 6 9 4

Overestimate

slide-14
SLIDE 14

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Correlated columns

  • Attribute Value Independence Assumption (AVIA)

– may result in wildly inaccurate estimates – both underestimates and overestimates

  • consequences

– poor scan choices (Seq Scan vs. Index Scan) – poor join choices (Nested Loop)

slide-15
SLIDE 15

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 I n d e x S c a n u s i n g

  • r

d e r s _ c i t y _ i d x

  • n
  • r

d e r s ( c

  • s

t = . 2 8 . . 1 8 5 . 1 r

  • w

s = 9 w i d t h = 3 6 ) ( a c t u a l r

  • w

s = 1 2 2 4 8 2 3 7 l

  • p

s = 1 ) S e q S c a n u s i n g

  • n
  • r

d e r s ( c

  • s

t = . 1 3 . . 1 2 9 3 8 5 . 1 r

  • w

s = 1 2 2 4 8 2 3 7 w i d t h = 3 6 ) ( a c t u a l r

  • w

s = 9 l

  • p

s = 1 )

Poor scan choices

slide-16
SLIDE 16

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

  • >

N e s t e d L

  • p

( … r

  • w

s = 9 … ) ( … r

  • w

s = 1 2 2 4 8 2 3 7 … )

  • >

I n d e x S c a n u s i n g

  • r

d e r s _ c i t y _ i d x

  • n
  • r

d e r s ( c

  • s

t = . 2 8 . . 1 8 5 . 1 r

  • w

s = 9 w i d t h = 3 6 ) ( a c t u a l r

  • w

s = 1 2 2 4 8 2 3 7 l

  • p

s = 1 ) . . .

  • >

I n d e x S c a n … ( … l

  • p

s = 1 2 2 4 8 2 3 7 )

Poor join choices

slide-17
SLIDE 17

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

  • >

N e s t e d L

  • p

( … r

  • w

s = 9 … ) ( … r

  • w

s = 1 2 2 4 8 2 3 7 … )

  • >

N e s t e d L

  • p

( … r

  • w

s = 9 … ) ( … r

  • w

s = 1 2 2 4 8 2 3 7 … )

  • >

N e s t e d L

  • p

( … r

  • w

s = 9 … ) ( … r

  • w

s = 1 2 2 4 8 2 3 7 … )

  • >

I n d e x S c a n u s i n g

  • r

d e r s _ c i t y _ i d x

  • n
  • r

d e r s ( c

  • s

t = . 2 8 . . 1 8 5 . 1 r

  • w

s = 9 w i d t h = 3 6 ) ( a c t u a l r

  • w

s = 1 2 2 4 8 2 3 7 l

  • p

s = 1 ) . . .

  • >

I n d e x S c a n … ( … l

  • p

s = 1 2 2 4 8 2 3 7 )

  • >

I n d e x S c a n … ( … l

  • p

s = 1 2 2 4 8 2 3 7 )

  • >

I n d e x S c a n … ( … l

  • p

s = 1 2 2 4 8 2 3 7 )

  • >

I n d e x S c a n … ( … l

  • p

s = 1 2 2 4 8 2 3 7 )

Poor join choices

slide-18
SLIDE 18

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

functional dependencies (WHERE)

slide-19
SLIDE 19

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Functional Dependencies

  • value in column A determines value in column B
  • trivial example: primary key determines everything

– zip code

{place, state, province, community → }

– 4625-113

{Favões, Porto, Marco de Canaveses, Favões → }

  • other dependencies:

– place

community →

– community

province →

– province

state →

slide-20
SLIDE 20

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

CREATE STATISTICS

C R E A T E S T A T I S T I C S s ( d e p e n d e n c i e s ) O N p l a c e _ n a m e , s t a t e _ n a m e , p r

  • v

i n c e _ n a m e F R O M z i p _ c

  • d

e s ; 2 3 4 A N A L Y Z E z i p _ c

  • d

e s ; S E L E C T s t x d e p e n d e n c i e s F R O M p g _ s t a t i s t i c _ e x t W H E R E s t x n a m e = ' s ' ; s t x d e p e n d e n c i e s

  • {

" 2 = > 3 " : . 7 8 9 4 6 7 , " 2 = > 4 " : . 7 7 4 3 3 3 , " 4 = > 2 " : . 9 3 3 , " 4 = > 3 " : . 9 9 3 1 6 7 , " 2 , 3 = > 4 " : . 9 5 1 6 6 7 , " 2 , 4 = > 3 " : . 9 9 8 3 3 3 , " 3 , 4 = > 2 " : . 9 3 3 }

slide-21
SLIDE 21

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

p l a c e → s t a t e : . 7 8 9 = d P ( p l a c e = ' L i s b

  • a

' & s t a t e = ' L i s b

  • a

' ) = P ( p l a c e = ' L i s b

  • a

' ) * [ d + ( 1

  • d

) * P ( s t a t e = ' L i s b

  • a

' ) ] 2 6 9 4 1 * . 4 1 5 * ( . 7 8 9 + ( 1

  • .

7 8 9 ) * . 1 6 5 6 ) = 7 7 6 . 5

slide-22
SLIDE 22

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 7 7 6 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 9 1 6 5 l

  • p

s = 1 ) F i l t e r : ( ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) A N D ( ( s t a t e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 1 9 7 7 7 6

Underestimate : fied

slide-23
SLIDE 23

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e ! = ' L i s b

  • a

' ; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 7 1 6 6 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = 1 l

  • p

s = 1 ) F i l t e r : ( ( ( s t a t e _ n a m e ) : : t e x t < > ' L i s b

  • a

' : : t e x t ) A N D ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 2 6 9 4 Functional dependencies only work with equalities.

Overestimate #1: not fied :-(

slide-24
SLIDE 24

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T * F R O M z i p _ c

  • d

e s W H E R E p l a c e _ n a m e = ' L i s b

  • a

' A N D s t a t e _ n a m e = ' P

  • r

t

  • '

; Q U E R Y P L A N

  • S

e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 5 3 7 8 . 1 1 r

  • w

s = 1 3 7 4 w i d t h = 5 6 ) ( a c t u a l r

  • w

s = l

  • p

s = 1 ) F i l t e r : ( ( ( p l a c e _ n a m e ) : : t e x t = ' L i s b

  • a

' : : t e x t ) A N D ( ( s t a t e _ n a m e ) : : t e x t = ' P

  • r

t

  • '

: : t e x t ) ) R

  • w

s R e m

  • v

e d b y F i l t e r : 2 6 9 4 1 The queries need to respect the functional dependencies.

Overestimate #2: not fied :-(

slide-25
SLIDE 25

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

ndistinct (GROUP BY)

slide-26
SLIDE 26

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T c

  • u

n t ( * ) F R O M z i p _ c

  • d

e s G R O U P B Y c

  • m

m u n i t y _ n a m e ; Q U E R Y P L A N

  • H

a s h A g g r e g a t e ( c

  • s

t = 3 4 4 1 2 6 . 9 2 . . 3 4 4 1 5 5 . 4 r

  • w

s = 2 8 6 w i d t h = 1 9 ) ( a c t u a l r

  • w

s = 3 8 4 5 l

  • p

s = 1 ) G r

  • u

p K e y : c

  • m

m u n i t y _ n a m e

  • >

S e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 2 7 7 9 1 . 9 5 r

  • w

s = 1 3 2 4 4 9 9 5 w i d t h = 1 1 ) ( a c t u a l r

  • w

s = 1 3 2 4 4 2 2 4 l

  • p

s = 1 ) P l a n n i n g T i m e : . 2 1 9 m s E x e c u t i

  • n

T i m e : 6 6 6 4 . 7 5 2 m s

slide-27
SLIDE 27

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 S E L E C T a t t n a m e , n _ d i s t i n c t F R O M p g _ s t a t s W H E R E t a b l e n a m e = ' z i p _ c

  • d

e s ' ; a t t n a m e | n _ d i s t i n c t

  • +
  • p

l a c e _ n a m e | 6 2 3 9 s t a t e _ n a m e | 2 l a t i t u d e | 5 5 3 2 l

  • n

g i t u d e | 5 1 3 5 p r

  • v

i n c e _ n a m e | 3 6 p

  • s

t a l _ c

  • d

e | 1 7 1 1 9 9 c

  • m

m u n i t y _ n a m e | 2 8 6 ( 7 r

  • w

s )

slide-28
SLIDE 28

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T c

  • u

n t ( * ) F R O M z i p _ c

  • d

e s G R O U P B Y p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e ; Q U E R Y P L A N

  • G

r

  • u

p A g g r e g a t e ( c

  • s

t = 2 3 8 7 9 7 . 9 2 . . 2 5 2 9 1 3 5 . 7 5 r

  • w

s = 8 7 5 1 6 r

  • w

s = 8 7 5 1 6 w i d t h = 2 9 ) ( a c t u a l r

  • w

s = 3 8 4 5 r

  • w

s = 3 8 4 5 l

  • p

s = 1 ) G r

  • u

p K e y : p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e

  • >

S

  • r

t ( c

  • s

t = 2 3 8 7 9 7 . 9 2 . . 2 4 2 1 8 3 . 4 1 r

  • w

s = 1 3 2 4 4 9 9 5 w i d t h = 2 1 ) ( a c t u a l r

  • w

s = 1 3 2 4 4 2 2 4 l

  • p

s = 1 ) S

  • r

t K e y : p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e S

  • r

t M e t h

  • d

: e x t e r n a l m e r g e D i s k : 4 1 5 6 2 4 k B

  • >

S e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 2 7 7 9 1 . 9 5 r

  • w

s = 1 3 2 4 4 9 9 5 w i d t h = 2 1 ) ( a c t u a l r

  • w

s = 1 3 2 4 4 2 2 4 l

  • p

s = 1 ) P l a n n i n g T i m e : 1 . 1 1 6 m s E x e c u t i

  • n

T i m e : 4 8 5 9 1 . 3 2 6 m s

slide-29
SLIDE 29

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T c

  • u

n t ( * ) F R O M z i p _ c

  • d

e s G R O U P B Y p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e ; Q U E R Y P L A N

  • G

r

  • u

p A g g r e g a t e G r

  • u

p A g g r e g a t e ( c

  • s

t = 2 3 8 7 9 7 . 9 2 . . 2 5 2 9 1 3 5 . 7 5 r

  • w

s = 8 7 5 1 6 r

  • w

s = 8 7 5 1 6 w i d t h = 2 9 ) ( a c t u a l r

  • w

s = 3 8 4 5 r

  • w

s = 3 8 4 5 l

  • p

s = 1 ) G r

  • u

p K e y : p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e

  • >

S

  • r

t S

  • r

t ( c

  • s

t = 2 3 8 7 9 7 . 9 2 . . 2 4 2 1 8 3 . 4 1 r

  • w

s = 1 3 2 4 4 9 9 5 w i d t h = 2 1 ) ( a c t u a l r

  • w

s = 1 3 2 4 4 2 2 4 l

  • p

s = 1 ) S

  • r

t K e y : p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e S

  • r

t M e t h

  • d

: e x t e r n a l m e r g e D i s k : 4 1 5 6 2 4 k B

  • >

S e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 2 7 7 9 1 . 9 5 r

  • w

s = 1 3 2 4 4 9 9 5 w i d t h = 2 1 ) ( a c t u a l r

  • w

s = 1 3 2 4 4 2 2 4 l

  • p

s = 1 ) P l a n n i n g T i m e : 1 . 1 1 6 m s E x e c u t i

  • n

T i m e : 4 8 5 9 1 . 3 2 6 m s

slide-30
SLIDE 30

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

n d i s t i n c t ( p r

  • v

i n c e , c

  • m

m u n i t y ) = n d i s t i n c t ( p r

  • v

i n c e ) * n d i s t i n c t ( c

  • m

m u n i t y ) 3 6 * 2 8 6 = 8 7 5 1 6

slide-31
SLIDE 31

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 C R E A T E S T A T I S T I C S s ( n d i s t i n c t ) O N s t a t e _ n a m e , p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e F R O M z i p _ c

  • d

e s ; A N A L Y Z E z i p _ c

  • d

e s ; S E L E C T s t x n d i s t i n c t F R O M p g _ s t a t i s t i c _ e x t ; s t x n d i s t i n c t

  • {

" 3 , 4 " : 3 8 , " 3 , 5 " : 2 8 5 8 , " 4 , 5 " : 2 8 5 8 , " 3 , 4 , 5 " : 2 8 5 8 }

slide-32
SLIDE 32

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018 E X P L A I N ( A N A L Y Z E , T I M I N G

  • f

f ) S E L E C T c

  • u

n t ( * ) F R O M z i p _ c

  • d

e s G R O U P B Y p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e ; Q U E R Y P L A N

  • H

a s h A g g r e g a t e ( c

  • s

t = 1 2 5 6 9 . 2 6 . . 1 2 5 9 7 . 8 4 r

  • w

s = 2 8 5 8 w i d t h = 3 6 ) ( a c t u a l r

  • w

s = 3 8 4 5 l

  • p

s = 1 ) G r

  • u

p K e y : s t a t e _ n a m e , p r

  • v

i n c e _ n a m e , c

  • m

m u n i t y _ n a m e

  • >

S e q S c a n

  • n

z i p _ c

  • d

e s ( c

  • s

t = . . . 6 9 4 6 7 . 1 3 r

  • w

s = 3 3 1 2 1 3 w i d t h = 2 8 ) ( a c t u a l r

  • w

s = 3 3 1 1 5 6 l

  • p

s = 1 ) P l a n n i n g T i m e : 1 . 3 6 7 m s E x e c u t i

  • n

T i m e : 2 3 4 3 . 8 4 6 m s

slide-33
SLIDE 33

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

ndistinct

  • the “old behavior” was defensive

– unreliable estimates with multiple columns – HashAggregate can’t spill to disk (OOM) – rather than crash do Sort+GroupAggregate (slow)

  • ndistinct coefcients

– make multi-column ndistinct estimates more reliable – reduced danger of OOM – large tables + GROUP BY multiple columns

slide-34
SLIDE 34

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Future Improvements

  • additional types of statistics

– MCV lists, histograms, …

  • statistics on expressions

– currently only simple column references – alternative to functional indexes

  • improving join estimates

– using MCV lists – special multi-table statistics (syntax already supports it)

slide-35
SLIDE 35

https://github.com/tvondra/create-statistics-talk

https://www.2ndQuadrant.com

pgconf.eu 2018

Lisbon, October 23-26, 2018

Questions?

Tomas Vondra

tomas.vondra@2ndquadrant.com tomas@pgaddict.com

@fuzzycz