
Efficient Time Series with PostgreSQL, Steve Simpson, FOSDEM

FOSDEM PGDay 2018. steve@smpsn.net. Overview: Background, Complexity, Time Series, Schema, Indexing.


  1. Relational Model – Single Series Query
      ● Find measurements
        – Specify time range
        – Specify metric name
        – Specify dimension value
      ● Aggregate data points
        – Round to desired interval
        – Group by that interval
        – Take average of all data points in that interval

        SELECT
          TIME_ROUND(timestamp, 60),
          AVG(value)
        FROM measurements
        WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00'
          AND name = 'cpu.percent'
          AND dimensions @> '{"host": "dev-01"}'::JSONB
        GROUP BY 1
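TIME_ROUND is not a built-in PostgreSQL function, and the slides only show calls to it. A minimal sketch of what such a helper might look like, assuming it rounds a timestamp down to a multiple of the given number of seconds (the parameter names and the round-down behaviour are assumptions, not taken from the talk):

```sql
-- Hypothetical definition; the talk only shows calls to TIME_ROUND.
-- Assumes "round" means truncating down to a multiple of period_secs.
CREATE FUNCTION time_round(ts TIMESTAMPTZ, period_secs INT)
RETURNS TIMESTAMPTZ
LANGUAGE sql
STABLE
AS $_$
    SELECT to_timestamp(
        (floor(extract(epoch FROM ts) / period_secs) * period_secs)::double precision
    );
$_$;

-- Example: truncate a timestamp to the start of its minute.
SELECT time_round('2015-01-01 00:01:37+00', 60);
```

Because unquoted identifiers are case-folded, this definition also answers calls written as TIME_ROUND(...).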

  2. Relational Model – Performance Analysis
      [Plots: query duration (seconds) against data volume (M rows) and against time range (seconds).]
      Target: <100ms (query duration)

  3. Relational Model – Single Series Query (vs Time Range)
      [Plot: query duration (seconds, 0 to 0.40) against query time range (seconds, 0 to 9000); series for 1M, 2M and 3M rows.]

  4. Relational Model – Single Series Query (vs Data Volume)
      [Plot: query duration (seconds, 0 to 1.40) against data volume (M rows, 0 to 10).]

  5. Relational Model - Analysis
      ✔ Query time fixed regardless of time range
      ✔ On target for < ~1M rows
      ✗ Query time scales linearly with data volume
      ✗ Every query reads every row
      ✗ Full table scan

  6. Indexing

  7. Indexing
      ● Timestamps are essentially integers
      ● PostgreSQL has many index types: BTREE, HASH, BRIN, GIN, GIST
      ● BTREE is excellent for equality and BETWEEN

  8. Indexing – BTREE
      [Diagram: a B-tree index over the keys 1 to 8, pointing at table rows "one" to "eight".]

  9. Indexing – BTREE
      [Diagram: the same index and table, showing an equality lookup (= 7).]

  10. Indexing – BTREE
      [Diagram: the same index and table, showing a range lookup (>= 6, <= 8).]

  11. Indexing – Single Series Query
      ● BETWEEN predicate
      ● Eliminates huge portion of table contents
        – High selectivity
      ● Excellent candidate for index creation

        SELECT
          TIME_ROUND(timestamp, 60),
          AVG(value)
        FROM measurements
        WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00'
          AND name = 'cpu.percent'
          AND dimensions @> '{"host": "dev-01"}'::JSONB
        GROUP BY 1

  12. Indexing - Timestamp
      ● Specify table to index
      ● Specify index type
        – Optional: BTREE is default
      ● Specify column to index

        CREATE INDEX ON measurements
          USING BTREE (timestamp);
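One way to check whether the planner actually uses the new index is to run a time-range query under EXPLAIN. A minimal sketch, assuming the measurements table and the timestamp index from the slide exist (the simplified count query is illustrative and not from the talk):

```sql
-- Assumes the measurements table and its timestamp index already exist.
-- ANALYZE executes the query and reports real timings;
-- BUFFERS adds page-access counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM measurements
WHERE timestamp BETWEEN '2015-01-01 00:00:00+00' AND '2015-01-01 01:00:00+00';
```

An index scan or bitmap index scan node in the output indicates the index is in use; a "Seq Scan" node indicates the full table is still being read.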

  13. Indexing – Single Series Query (vs Data Volume)
      [Plot: query duration (seconds, 0 to 1.40) against data volume (millions of rows, 0 to 10); series: No Index, With Index.]

  14. Indexing – Single Series Query (vs Data Volume)
      [Plot: query duration (seconds, 0 to 0.025) against data volume (millions of rows, 0 to 10); series labelled 7000, 8000 and 9000.]

  15. Indexing – Single Series Query (vs Time Range, 10M Rows)
      [Plot: query duration (seconds, 0 to 0.10) against query time range (seconds, 0 to 10000); series: 1 Metric, 10 Metrics.]

  16. Indexing – Single Series Query (vs Time Range, 10M Rows)
      [Plot: query duration (seconds, 0 to 0.25) against query time range (seconds, 0 to 10000); series: 1 Metric, 10 Metrics, 100 Metrics.]

  17. Indexing - Analysis
      ✔ Data Volume
        ✔ To 10M
      ✔ Time Range
        ✔ To 9000s (10 Metrics)
      ✔ Query time stable as data volume increases
      ✗ Time Range
        ✗ Over 4000s (100 Metrics)
      ✗ Now apparent that query duration increases as time range grows
      ✗ Increasing number of metrics drastically affects query duration
      ✗ Data for each uninteresting series must be filtered out

  18. Indexing – Single Series Query
      ● More indexing?
        – name
        – dimensions

        SELECT
          TIME_ROUND(timestamp, 60),
          AVG(value)
        FROM measurements
        WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00'
          AND name = 'cpu.percent'
          AND dimensions @> '{"host": "dev-01"}'::JSONB
        GROUP BY 1

  19. Indexing – Additional
      ● Create new indexes on measurements table
      ● Specify name
        – Equality: Use BTREE
      ● Specify dimensions
        – Containment: Use GIN
        – Find contents of JSON

        CREATE INDEX ON measurements
          USING BTREE (name);

        CREATE INDEX ON measurements
          USING GIN (dimensions);

  20. Indexing – Series Query (vs Time Range, 10M Rows, 100 Metrics)
      [Plot: query duration (seconds, 0 to 0.25) against query time range (seconds, 0 to 10000); series: Time & Metric, Time Index.]

  21. Indexing
      ✗ Oh dear
      ✗ Too much indexing can be harmful

  22. Normalisation

  23. Normalisation
      Before:

        CREATE TABLE measurements (
          timestamp TIMESTAMPTZ,
          value FLOAT8,
          name VARCHAR,
          dimensions JSONB,
          value_meta JSON
        );

      After:

        CREATE TABLE values (
          timestamp TIMESTAMPTZ,
          value FLOAT8,
          metric_id INT,
          value_meta JSON
        );

        CREATE TABLE metrics (
          id SERIAL,
          name VARCHAR,
          dimensions JSONB,
          UNIQUE (name, dimensions)
        );

  24. Normalisation
      ● Values stored by integer id
        – References entry in metric table
        – The name/dimensions for each metric are only stored once
        – Eliminates repeated bulky data in measurements table
      ● Metric table defines id
        – SERIAL produces incrementing integers to allot id values
        – UNIQUE constraint is useful during normalisation
      ● Implicitly creates suitable index

        CREATE TABLE values (
          timestamp TIMESTAMPTZ,
          value FLOAT8,
          metric_id INT,
          value_meta JSON
        );

        CREATE TABLE metrics (
          id SERIAL,
          name VARCHAR,
          dimensions JSONB,
          UNIQUE (name, dimensions)
        );

  25. Normalisation – View
      ● Mimic measurements
        – Views can be queried in the same way as tables
      ● Defined with SELECT
        – Query to run which produces contents of view
      ● Join normalised tables
      ● Can re-use same queries as before

        CREATE VIEW measurements AS
          SELECT
            timestamp,
            value,
            name,
            dimensions,
            value_meta
          FROM values
          INNER JOIN metrics ON (metric_id = id);

  26. Normalisation – View Insert
      ● Can't insert data into views by default
      ● Can specify an action to perform on INSERT
      ● Insert into values
      ● Helper procedure to allocate metric_id
      ● Normalisation is transparent for user

        CREATE RULE measurements_insert AS
          ON INSERT TO measurements
          DO INSTEAD
          INSERT INTO values (
            timestamp,
            value,
            metric_id,
            value_meta
          ) VALUES (
            NEW.timestamp,
            NEW.value,
            create_metric (NEW.name, NEW.dimensions),
            NEW.value_meta
          );

  27. Normalisation – Metric Lookup
      ● Stored procedure
        – Take name/dimensions
        – Returns metric_id
      ● Find existing metric
        – Return existing id
      ● If new, then INSERT
        – Allocates new id
        – Return the new id

        CREATE FUNCTION create_metric (
          in_name VARCHAR,
          in_dims JSONB
        ) RETURNS INT
        LANGUAGE plpgsql AS $_$
        DECLARE
          out_id INT;
        BEGIN
          SELECT id INTO out_id
          FROM metrics AS m
          WHERE m.name = in_name
            AND m.dimensions = in_dims;

          IF NOT FOUND THEN
            INSERT INTO metrics ("name", "dimensions")
            VALUES (in_name, in_dims)
            RETURNING id INTO out_id;
          END IF;

          RETURN out_id;
        END; $_$;
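With the view, rule and lookup function in place, a client can keep writing to measurements as if it were still a table. A small usage sketch, assuming all the objects from the preceding slides have been created (the sample values are made up for illustration):

```sql
-- Assumes the values and metrics tables, the measurements view, the
-- measurements_insert rule and create_metric() already exist.
INSERT INTO measurements (timestamp, value, name, dimensions, value_meta)
VALUES ('2015-01-01 00:00:00+00', 12.5, 'cpu.percent',
        '{"host": "dev-01"}'::JSONB, NULL);

INSERT INTO measurements (timestamp, value, name, dimensions, value_meta)
VALUES ('2015-01-01 00:00:01+00', 13.0, 'cpu.percent',
        '{"host": "dev-01"}'::JSONB, NULL);

-- Both inserts should resolve to a single metrics entry.
SELECT id, name, dimensions FROM metrics;

-- The raw rows carry only the integer metric_id.
-- The table name is quoted defensively because VALUES is also an SQL keyword.
SELECT timestamp, value, metric_id FROM "values";
```

Note that the lookup-then-insert in create_metric is not protected against two sessions creating the same new metric at the same moment; in that case the UNIQUE constraint would reject the second insert with an error.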

  28. Normalisation - Indexing
      ● Timestamp index
        – Same as before
      ● New index on metric_id
        – Allow efficient filtering of metrics during JOIN
        – Serves similar purpose to existing metric indexing

        CREATE INDEX ON values
          USING BTREE (timestamp);

        CREATE INDEX ON values
          USING BTREE (metric_id);

  29. Normalisation – Series Query (vs Time, 10M Rows, 100 Metrics)
      [Plot: query duration (seconds, 0 to 0.25) against query time range (seconds, 0 to 9000); series: Normalised, Denormalised (Time Index), Denormalised (Extra Index).]

  30. Normalisation
      ✔ Normalisation eliminated overhead of additional metric indexing
      ✗ The metric indexing still doesn't have a positive effect

  31. Normalisation – Bitmap Index Scan
      [Diagram: a table with time, value and metric columns (rows A to H, 10:01 to 10:04, metrics 1 and 2), with a separate time index and metric index.]

  32. Normalisation – Multi-Column Indexing
      [Diagram: the same table with time, value and metric columns, with a single combined time and metric index.]

  33. Normalisation – Multi-Column Indexing

        CREATE INDEX ON values
          USING BTREE (timestamp);

        CREATE INDEX ON values
          USING BTREE (metric_id);

        CREATE INDEX ON values
          USING BTREE (timestamp, metric_id);

        CREATE INDEX ON values
          USING BTREE (metric_id, timestamp);
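A sketch of the kind of query a (metric_id, timestamp) index is aimed at: an equality test on the leading column and a range on the second. The metric id 42 is made up, and the slide itself does not say which column order performed best, so treat this as a way to experiment rather than a recommendation:

```sql
-- Assumes the values table and one of the multi-column indexes exist.
-- The table name is quoted defensively because VALUES is also an SQL keyword.
EXPLAIN
SELECT timestamp, value
FROM "values"
WHERE metric_id = 42  -- hypothetical metric id
  AND timestamp BETWEEN '2015-01-01 00:00:00+00' AND '2015-01-01 01:00:00+00';
```

Comparing the plan with each index in place shows which one the planner prefers for a given data set.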

  34. Normalisation – Series Query (vs Range, 10M Rows, 100 Metrics)
      [Plot: query duration (seconds, 0 to 0.25) against query time range (seconds, 0 to 9000); series: Normalised (Single Index), Normalised, Denormalised (Time Index), Denormalised (Extra Index).]

  35. Normalisation
      ● Increase volume 10M to 100M
        – @ 1Hz / 100 Metrics: 1M seconds
        – Before: ~1.15 days
        – Now: ~11.5 days
      ● Increase max time ranges from 9000s to 90,000s
        – Before: 2.5 hours
        – Now: 1.04 days

  36. Normalisation – Series Query (vs Volume, 10 Metrics)
      [Plot: query duration (seconds, 0 to 0.12) against data volume (M rows, 0 to 100); series labelled 10000, 20000 and 30000.]

  37. Normalisation – Series Query (vs Range, 100M Rows)
      [Plot: query duration (seconds, 0 to 0.16) against query time range (seconds, 0 to 90000); series: 1 Metric, 10 Metrics.]

  38. Normalisation – Series Query (vs Range, 100M Rows)
      [Plot: query duration (seconds, 0 to 0.90) against query time range (seconds, 0 to 90000); series: 1 Metric, 10 Metrics, 100 Metrics.]

  39. Normalisation – Series Query (vs Range, 100M Rows) (+Config)
      [Plot: query duration (seconds, 0 to 0.90) against query time range (seconds, 0 to 90000); series: 1 Metric, 10 Metrics, 100 Metrics.]

  40. Normalisation - Analysis
      ✔ Data Volume
        ✔ To 100M
      ✔ Time Range
        ✔ To 90,000s (10 Metrics)
      ✗ Time Range
        ✗ Over 30,000s (100 Metrics)
        ✗ Over 90,000s
      ✗ Need a better strategy for servicing larger time ranges

  41. Summarising

  42. Summarising - Problem
      ● For 100 metrics, some queries miss the 100ms target
        – Over ~40Ks (~11 hours)
      ● Query might be returning up to 40K points
      ● Is this actually necessary?
        – Especially if data is simply used for visualisation
        – An average 1080p monitor is only ~2000 pixels wide
      ● Let's say 4000 points are enough, or even 400

  43. Summarising - Example

        values                      values_2
        time   value  metric        time   sum  metric
        10:00  10     1             10:00  30   1
        10:00  2      2             10:00  8    2
        10:01  20     1             10:02  20   1
        10:01  6      2             10:02  5    2
        10:02  5      1
        10:02  4      2
        10:03  15     1
        10:03  1      2

      ✔ Summary table just a fraction of the size

  44. Summarising
      ● Create values table
        – Use for 10:1 summary
      ● One entry/time period
        – Per metric
        – UNIQUE provides indexing
      ● Multiple aggregates
        – SUM
        – COUNT
        – MIN
        – MAX

        CREATE TABLE values_10 (
          timestamp TIMESTAMPTZ,
          metric_id INT,
          sum FLOAT8,
          count FLOAT8,
          min FLOAT8,
          max FLOAT8,
          UNIQUE (metric_id, timestamp)
        );
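The slides do not cover how rows that already exist in the raw table would reach the summary. One possible approach is a one-off aggregate insert, sketched here on the assumptions that values_10 is still empty and that a time_round helper is available (the slides call TIME_ROUND without showing its definition):

```sql
-- One-off backfill sketch, not from the talk.
-- Assumes values_10 is empty and time_round(timestamptz, int) exists.
-- The source table name is quoted defensively because VALUES is also
-- an SQL keyword.
INSERT INTO values_10 (timestamp, metric_id, sum, count, min, max)
SELECT
    time_round(timestamp, 10) AS period_start,
    metric_id,
    SUM(value),
    COUNT(*),
    MIN(value),
    MAX(value)
FROM "values"
GROUP BY 1, 2;
```

Grouping by the rounded timestamp and metric_id yields exactly one row per period per metric, which is what the UNIQUE constraint on values_10 requires.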

  45. Summarising
      ● Create a view as before
        – Only storing metric_id
      ● Simplifies queries
      ● Joins metric definitions

        CREATE VIEW summary_10 AS
          SELECT *
          FROM values_10
          INNER JOIN metrics ON (metric_id = id);

  46. Summarising – Trigger Definition
      ● Boilerplate
      ● Define trigger function
        – Stored procedure
        – Contents omitted
      ● Trigger to execute…
        – On INSERT
        – To values table
        – Data passed to procedure

        CREATE FUNCTION summarise_10 ()
        RETURNS TRIGGER
        LANGUAGE plpgsql AS $_$
        BEGIN
          :
        END; $_$;

        CREATE TRIGGER summarise_10_t
          AFTER INSERT ON values
          FOR EACH ROW
          EXECUTE PROCEDURE summarise_10 ();

  47. Summarising – Trigger Action
      ● Insert into summary
        – NEW is inserted data
      ● Round time to period
        – 10 seconds
      ● Initial aggregate values
      ● If entry exists already
      ● Update instead
        – EXCLUDED is the row proposed for insertion
        – Combine new value with existing aggregate value
        – Existing columns are qualified with the table name, because an unqualified name is ambiguous next to EXCLUDED

        INSERT INTO values_10 VALUES (
          TIME_ROUND(NEW.timestamp, 10),
          NEW.metric_id,
          NEW.value,
          1,
          NEW.value,
          NEW.value
        )
        ON CONFLICT (metric_id, timestamp)
        DO UPDATE SET
          sum = values_10.sum + EXCLUDED.sum,
          count = values_10.count + EXCLUDED.count,
          min = LEAST (values_10.min, EXCLUDED.min),
          max = GREATEST (values_10.max, EXCLUDED.max)
        ;
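The trigger boilerplate and the upsert action are shown on separate slides. A sketch of how the two might fit together, assuming values_10 and a TIME_ROUND helper already exist; the explicit column list, the table-qualified column references and the final RETURN are additions that the slides do not show:

```sql
-- Sketch combining the trigger boilerplate with the upsert action.
-- Assumes values_10 and a TIME_ROUND(timestamptz, int) helper exist.
CREATE FUNCTION summarise_10 ()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $_$
BEGIN
    INSERT INTO values_10 (timestamp, metric_id, sum, count, min, max)
    VALUES (
        TIME_ROUND(NEW.timestamp, 10),  -- start of the 10 second period
        NEW.metric_id,
        NEW.value,                      -- initial sum
        1,                              -- initial count
        NEW.value,                      -- initial min
        NEW.value                       -- initial max
    )
    ON CONFLICT (metric_id, timestamp) DO UPDATE SET
        -- values_10.* is the stored row, EXCLUDED.* the row just proposed.
        sum   = values_10.sum   + EXCLUDED.sum,
        count = values_10.count + EXCLUDED.count,
        min   = LEAST(values_10.min, EXCLUDED.min),
        max   = GREATEST(values_10.max, EXCLUDED.max);

    -- A trigger function must return something; for an AFTER ROW
    -- trigger the return value is ignored.
    RETURN NULL;
END;
$_$;
```

Qualifying the existing columns with the table name lets PostgreSQL distinguish the stored row from EXCLUDED; a bare `sum` or `count` on the right-hand side would be rejected as ambiguous.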

  48. Summarising – Single Series Query
      ● Mostly unchanged
      ● Query summary table, not raw measurements
      ● Have to aggregate the partial aggregations
        – MIN: MIN(min)
        – MAX: MAX(max)
        – SUM: SUM(sum)
        – COUNT: SUM(count)
        – AVG: SUM(sum)/SUM(count)

        SELECT
          TIME_ROUND(timestamp, 60),
          (SUM(sum) / SUM(count)) AS avg
        FROM summary_10
        WHERE timestamp BETWEEN '2015-01-01Z00:00:00' AND '2015-01-01Z01:00:00'
          AND name = 'cpu.percent'
          AND dimensions @> '{"host": "dev-01"}'::JSONB
        GROUP BY 1

  49. Summarising – Series Query (vs Range, 100M Rows)
      [Plot: query duration (seconds, 0 to 0.12) against query time range (seconds, 0 to 90000); series: 1 Metric, 10 Metrics, 100 Metrics.]

  50. Summarising – Series Query (vs Range, 100M Rows)
      [Plot: query duration (seconds, 0 to 0.12) against query time range (seconds, 0 to 90000); two sets of series, each for 1 Metric, 10 Metrics and 100 Metrics.]

  51. Summarising
      ● Increase volume to 1BN
        – @ 1Hz / 100 Metrics: 10M seconds
        – Before: 11.5 days
        – Now: ~115 days (16½ weeks)
      ● Increase max time ranges from 90Ks to 900Ks
        – Before: ~1.04 days
        – Now: ~10.4 days
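The coverage figures follow from dividing the row count by the number of metrics (one sample per metric per second) and converting seconds to days. A small query that reproduces the arithmetic for the three data volumes used in the talk:

```sql
-- Time span covered by a given row count at 1 Hz across 100 metrics.
SELECT
    total_rows,
    total_rows / (100 * 1) AS seconds_covered,
    round(total_rows / (100 * 1) / 86400.0, 2) AS days_covered
FROM (VALUES (10000000::bigint), (100000000), (1000000000)) AS v(total_rows);
-- 10M rows  -> 100,000 s    -> 1.16 days
-- 100M rows -> 1,000,000 s  -> 11.57 days
-- 1BN rows  -> 10,000,000 s -> 115.74 days
```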

  52. Summarising – Series Query (vs Volume; 100M-1BN)
      [Plot: query duration (seconds, 0 to 0.120) against data volume (M rows, 0 to 1000); series labelled 100000, 200000 and 300000.]

  53. Summarising – Series Query (vs Range, 1BN Rows)
      [Plot: query duration (seconds, 0 to 0.12) against query time range (seconds, 0 to 900000).]

  54. Summarising
      ✔ Data Volume
        ✔ To 1BN – ~16 weeks
      ✔ Time Range
        ✔ To ~10 days
      ✔ To scale further? Try 100:1 summary

  55. Closing Notes
