JDBC Performance from the Inside



SLIDE 1

JDBC Performance from the Inside

July 2017

SLIDE 2

  • Dave Cramer
  • Work for OpenSCG supporting PostgreSQL
  • Maintainer for the JDBC driver since 1999
  • There are many options for connecting
  • Many of them I didn't totally understand
  • This talk hopes to unveil some of the more interesting ones
  • And explain how they work.

Introduction

SLIDE 3

  • History of the driver
  • Connecting to the driver
  • Under-utilized features
  • Performance tips
  • Latest release major features

Overview

SLIDE 4

  • Originally written by Peter Mount in 1997
  • Supported JDBC 1.2
  • 1997 JDBC 1.2 Java 1.1
  • 1999 JDBC 2.1 Java 1.2
  • 2001 JDBC 3.0 Java 1.4
  • 2006 JDBC 4.0 Java 6
  • 2011 JDBC 4.1 Java 7
  • 2014 JDBC 4.2 Java 8
  • 2017 JDBC 4.3 Java 9 (maybe?)
  • Each of these was an incremental addition to the interface, requiring additional concrete implementations of the spec

History

SLIDE 5

Connecting to the server

SLIDE 6

  • jdbc:postgresql: connects to localhost, port 5432, database named after the user
  • jdbc:postgresql:database
  • jdbc:postgresql://host/ connects to <host>, port 5432, database named after the user
  • jdbc:postgresql://host:port/ connects to <host>:<port>, database named after the user
  • jdbc:postgresql://host:port/database
  • jdbc:postgresql://host1:port,host2:port/database

URL options
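The URL forms above can be made concrete with a small string-building sketch. `buildUrl` is a hypothetical helper written for this illustration, not part of the driver:

```java
// Hypothetical helper illustrating the jdbc:postgresql URL forms above.
// Not driver code; it only assembles the URL strings the driver accepts.
public class PgUrls {
    // No host entries: "jdbc:postgresql:database" (localhost:5432).
    // One or more "host:port" entries produce the (multi-host) network form.
    static String buildUrl(String database, String... hostPorts) {
        if (hostPorts.length == 0) {
            return "jdbc:postgresql:" + database;
        }
        return "jdbc:postgresql://" + String.join(",", hostPorts) + "/" + database;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("test"));
        System.out.println(buildUrl("test", "localhost:5432"));
        System.out.println(buildUrl("test", "host1:5432", "host2:5433"));
    }
}
```

The last form is the failover syntax covered later in the deck.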

slide-7
SLIDE 7

7

  • PG_DBNAME
  • PG_HOST
  • PG_PORT

These can be used in the following manner:

Properties props = new Properties();
props.setProperty(PGProperty.PG_DBNAME.getName(), "test");
props.setProperty(PGProperty.PG_HOST.getName(), "localhost");
props.setProperty(PGProperty.PG_PORT.getName(), "5432");
props.setProperty("user", "davec");
props.setProperty("password", "");
Connection connection = DriverManager.getConnection("jdbc:postgresql:", props);

Connection Properties

SLIDE 8

  • loggerLevel=OFF|DEBUG|TRACE
  • Enables java.util.logging.Logger; DEBUG=FINE, TRACE=FINEST
  • Not intended for SQL logging but rather to debug the driver
  • loggerFile=<filename>: the file to output the log to. If this is not set then the output will be written to the console.

Logging

SLIDE 9

  • We will honour DriverManager.setLogStream or DriverManager.setLogWriter
  • Parent logger is org.postgresql
  • Since we are using java.util.logging, we can use a properties file to configure logging
  • handlers=java.util.logging.FileHandler
  • org.postgresql.level=FINEST
  • java -Djava.util.logging.config.file=…

Logging continued
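A minimal `logging.properties` matching the settings above might look like this (the file name `pgjdbc.log` and the use of SimpleFormatter are arbitrary choices for the sketch):

```properties
# Route all log records to a file
handlers=java.util.logging.FileHandler
java.util.logging.FileHandler.pattern=pgjdbc.log
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter

# Turn the driver's parent logger up to FINEST (TRACE)
org.postgresql.level=FINEST
```

It would then be activated with java -Djava.util.logging.config.file=logging.properties.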

SLIDE 10

  • logUnclosedConnections=boolean
  • Provides an easy way to find connection leaks
  • If this is turned on we track connection opening. If the finalizer is reached and the connection is still open, the stacktrace message is printed out.

Logging continued

SLIDE 11

  • autosave=never|always|conservative
  • PostgreSQL transaction semantics are all or nothing. This is not always desirable.
  • autosave=always will create a savepoint for every statement in a transaction.
  • The effect is that if you do:
  • Insert into invoice_header …
  • Insert into invoice_lineitem …
  • If the insert into invoice_lineitem fails, the header will still be valid.
  • In conservative mode, if the driver determines that reparsing the query will work then it will be reparsed and retried.

Autosave

SLIDE 12

  • binaryTransferEnable=comma-separated list of OIDs or names
  • binaryTransferDisable
  • Currently the driver will use binary mode for most built-in types.

Binary Transfer

SLIDE 13

  • simple
  • Fewer round trips to the db: no bind, no parse
  • Required for replication connections
  • extended
  • Default; creates a server prepared statement, uses parse, bind and execute.
  • Protects against SQL injection
  • Possible to re-use the statement

preferQueryMode

SLIDE 14

  • extendedForPrepared
  • Does not use extended mode for plain statements, only for prepared statements
  • Potentially faster execution of statements
  • extendedCacheEverything
  • Uses extended mode and caches even simple statements such as 'select a from tbl', which are normally not cached

preferQueryMode

SLIDE 15

  • Default is 0, which means fetch all rows
  • This is sometimes surprising and can result in out of memory errors
  • If set *AND* autocommit=false, THEN it will limit the number of rows per fetch
  • Potentially significant performance boost

defaultRowFetchSize=int

SLIDE 16

  • The default is varchar, which tells the server that strings are actually strings!
  • You can use stringtype=unspecified
  • Useful if you have an existing application that uses setString("1234") to set an integer column.
  • The server will attempt to cast the "string" to the appropriate type.

stringtype=varchar|unspecified

SLIDE 17

  • Sets the application name
  • Server versions 9.0 and greater
  • Useful for logging and for seeing which connections are yours in pg_stat_activity, etc.

ApplicationName=String

SLIDE 18

  • The default is false
  • True sends SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY to the server.
  • This blocks any writes to persistent tables; interestingly, you can still write to a temporary table.

readOnly=boolean

SLIDE 19

  • The column sanitizer folds column names to lower case.
  • Column names like FirstName become firstname.
  • ResultSet.getInt("firstname")
  • The default is to sanitize names.

disableColumnSanitizer=boolean

SLIDE 20

  • Currently there are only 2 use cases
  • 9.0, which will enable:
  • ApplicationName (defaults to PostgreSQL JDBC Driver)
  • sets extra_float_digits to 3
  • 9.4, necessary for replication connections

assumeMinServerVersion=String

SLIDE 21

  • By default the current schema will be "public"
  • If you want to refer to a table in a different schema, it has to be specified as schema.table
  • If you set this connection property to "audit", for example, instead of "select * from audit.log" you could use "select * from log;"

currentSchema=String

SLIDE 22

  • Enables the driver to optimize batch inserts by rewriting multiple insert statements into one insert statement.
  • Multiple statements such as "insert into tab1 values (1,2,3);"
  • Rewritten as "insert into tab1 values (1,2,3), (4,5,6)"

reWriteBatchedInserts=true
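The rewrite can be illustrated with plain string handling. `rewrite` below is a toy model of the transformation the driver performs internally, not the driver's actual code:

```java
public class RewriteSketch {
    // Toy model of reWriteBatchedInserts: collapse a batch of N identical
    // single-row inserts into one multi-row VALUES statement.
    static String rewrite(String insertPrefix, String valuesRow, int batchSize) {
        StringBuilder sb = new StringBuilder(insertPrefix).append(" values ");
        for (int i = 0; i < batchSize; i++) {
            if (i > 0) sb.append(", ");
            sb.append(valuesRow);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two batched rows become one statement:
        System.out.println(rewrite("insert into tab1", "(?,?,?)", 2));
        // -> insert into tab1 values (?,?,?), (?,?,?)
    }
}
```

One statement per batch means one round trip, which is where the later 2x-3x numbers come from.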

SLIDE 23

  • Specify multiple hosts in the connection string
  • "jdbc:postgresql://host1:port1,host2:port2/database"
  • By default this will attempt to make connections to each host until one succeeds

Connection Failover

SLIDE 24

  • targetServerType=master, slave, preferSlave
  • Observes whether the server allows writes in order to pick a candidate
  • preferSlave will try slaves first, then fall back to master
  • loadBalanceHosts=boolean will randomly pick from suitable candidates
  • hostRecheckSeconds=number of seconds between checking the status (read or write) of hosts; default is 10 seconds

Connection Failover tuning

SLIDE 25

  • Tells the backend to go into walsender mode
  • Simple query mode, subset of commands
  • Setting to database enables logical replication for that database
  • Must be accompanied by assumeMinServerVersion="9.4" and preferQueryMode="simple"

replication=database, true

SLIDE 26

  • setFetchSize
  • reWriteBatchedInserts

Performance tricks

SLIDE 27

  • Fetch a large amount of data with different fetch sizes

Set FetchSize performance

public static final String QUERY = "SELECT t FROM number";

@Benchmark
public void test(Blackhole blackhole, PgStatStatements pgStatStatements) throws SQLException {
    pgStatStatements.setTestName(QueryBenchmarks.JMHTestNameFromClass(_6_String_NoAutocommit.class));
    QueryUtil.executeProcessQueryNoAutocommit(QUERY, resultSet -> {
        while (resultSet.next()) {
            blackhole.consume(resultSet.getString(1));
        }
    });
}

// Used to fetch rows in batches from the db.
// Will only work if the connection does not use autocommit.
PGProperty.DEFAULT_ROW_FETCH_SIZE.set(properties, FETCH_SIZE);

SLIDE 28

Time it takes to fetch 1M rows

[Chart: time (s) vs. fetch size: 10, 100, 1000, 10000]

SLIDE 29

  • For each row: insert, execute. This is the slowest.
  • For each row: insert, addBatch. This would be ideal.
  • Insert into foo (i,j) values (1,'one'), (2,'two') …. (n,'n') as hand-rolled code
  • Copy into foo from stdin…

What are the options for inserting lots of data

SLIDE 30

  • Java 1.8_60
  • Core i7 2.8GHz
  • PostgreSQL 9.6
  • https://github.com/pgjdbc/pgjdbc/tree/master/ubenchmark
  • create table batch_perf_test(a int4, b varchar(100), c int4)

Table "public.batch_perf_test"
 Column |          Type
--------+------------------------
 a      | integer
 b      | character varying(100)
 c      | integer

JDBC micro benchmark suite

SLIDE 31

  • For each row: Insert into perf (a,b,c) values (?,?,?)
  • After N rows: executeBatch
  • In normal mode this executes N inserts, which is no faster than looping over N inserts without batch mode

INSERT Batch 1 row at a time

SLIDE 32

  • For each row: Insert into perf (a,b,c) values (?,?,?), (?,?,?), (?,?,?), (?,?,?)
  • After N / rows_at_a_time rows: executeBatch
  • Given 1000 (N) rows, if we insert them 100 (rows_at_a_time) at a time, we end up inserting 10 rows, each 100 wide
  • More data inserted per statement, fewer statements

INSERT Batch N rows_at_a_time

SLIDE 33

  • For each row: Insert into perf (a,b,c) values (?,?,?)
  • After N rows: executeBatch
  • Same as the last slide except we set the connection parameter reWriteBatchedInserts=true
  • As of version 1209 this has been enabled
  • Same as insert into foo (i,j) values (1,'one'), (2,'two') …. (n,'n') except the driver does it for you.

INSERT Batch with insertRewrite

SLIDE 34

  • Loop over the rows, creating the input string in memory
  • Build a string in memory which looks like 0\ts0\t0\n1\ts1\t1\n….
  • The string will end up being nrows / rows_at_a_time rows long
  • Use the copy API to copy this into the table

Copy
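Building the in-memory buffer described above is plain string work; the layout below (int, text, int, tab-separated, newline-terminated rows) matches the slide's `0\ts0\t0\n` example:

```java
public class CopyBufferSketch {
    // Build a tab-separated, newline-terminated buffer in the text format
    // that COPY ... FROM STDIN expects. A sketch of the hand-rolled step,
    // not the copy API itself; row values follow the slide's i / s<i> / i shape.
    static String buildCopyBuffer(int nRows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nRows; i++) {
            sb.append(i).append('\t')
              .append('s').append(i).append('\t')
              .append(i).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildCopyBuffer(2)); // "0\ts0\t0\n1\ts1\t1\n"
    }
}
```

The resulting string would then be handed to the driver's copy API, e.g. `copyManager.copyIn("COPY batch_perf_test FROM STDIN", new StringReader(buffer))`.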

SLIDE 35

Hand rolled insert struct N structs at a time

  • Insert into batch_perf_test select * from unnest (?::batch_perf_test[])
  • For N rows: setString to '{"(1,s1,1)","(2,s2,2)","(3,s3,3)"}'
  • addBatch
  • executeBatch
  • The query that gets executed looks like: Insert into batch_perf_test select * from unnest ('{"(1,s1,1)","(2,s2,2)","(3,s3,3)"}'::batch_perf_test[])
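The array literal passed to setString can be assembled like this. A sketch of the hand-rolled approach, with the row values hard-coded to the slide's (i, s<i>, i) pattern for illustration:

```java
public class StructLiteralSketch {
    // Build '{"(1,s1,1)","(2,s2,2)",...}', a PostgreSQL array-of-composite
    // literal suitable for:
    //   insert into batch_perf_test select * from unnest (?::batch_perf_test[])
    static String arrayLiteral(int nRows) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 1; i <= nRows; i++) {
            if (i > 1) sb.append(",");
            sb.append("\"(").append(i).append(",s").append(i)
              .append(",").append(i).append(")\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(arrayLiteral(3));
        // -> {"(1,s1,1)","(2,s2,2)","(3,s3,3)"}
    }
}
```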

SLIDE 36

Results

[Chart: time (ms) vs. number of rows: 16, 128, 1024]

Batch size of 128

InsertBatch Copy Insert

SLIDE 37

Conclusion

  • Compared to batch inserts, plain inserts are very slow for large amounts of data

SLIDE 38

Results

[Chart: time (ms) vs. batch size: 1, 4, 8, 16, 128]

1024 rows different batch sizes

Insert Batch Insert Rewrite Copy Insert Struct

SLIDE 39

  • Open connection
  • Prepare statement 'select * from foo where id=?'
  • preparedStatement.executeQuery()
  • preparedStatement.close()
  • Close connection
  • Without a pool, connection creation is a heavyweight operation. PostgreSQL uses processes, so each connection is a process.
  • Does not take advantage of caching

How not to use JDBC (unfortunately typical)

SLIDE 40

  • Open connection
  • Prepare statement 'select * from foo where id=?'
  • By default, after 5 executions the driver will create a named statement: PARSE S_1 as 'select * from foo where id=?'
  • Multiple preparedStatement.executeQuery() calls are then BIND/EXEC instead of PARSE/BIND/EXEC
  • Never close the statement if possible

Better solution

SLIDE 41

  • The client-side query cache only works in 9.4.1203 and up
  • Do not use generated queries, as they generate new server-side prepared statements
  • Things like executeUpdate('insert into foo (i,l,f,d) values (1,2,3,4)') will never use a named statement
  • Do not change the type of a parameter, as this leads to DEALLOCATE/PREPARE
  • pstmt.setInt(1,1)
  • pstmt.setNull(1,Types.VARCHAR) will cause the prepared statement to be deallocated

Query cache best practices

SLIDE 42

  • Server prepare is activated after 5 executions
  • There is a configuration parameter called prepareThreshold (default 5)
  • PGStatement.isUseServerPrepare() can be used to check
  • After 5 executions of the same prepared statement we change from unnamed statements to named
  • Named statements will use binary mode where possible; binary mode is faster when we have to parse things like timestamps
  • Named statements are only parsed once on the server, then bind/execute operations on the server

Less obvious issues

SLIDE 43

  • If we don't use a fetch size, we will read the entire response into memory, then process it
  • Optimizing the data sent at one time reduces memory usage and GC
  • Only works within a transaction
  • Make sure the fetch size is above 100
  • If you have a lot of data, this is really the only way to read it in without an OutOfMemoryError

setFetchSize

SLIDE 44

  • Cache parsed statements across PreparedStatement calls: we no longer have to parse the statement in Java each time
  • executeBatch changed to not execute statement by statement: a bug in the code had disabled batching
  • Rewrite batched inserts: rewrites multiple "insert into foo (a,b,c) values (1,2,3)" statements into "insert into foo (a,b,c) values (1,2,3), (4,5,6)". This provides a 2x-3x speed-up.
  • Avoid Calendar cloning: provides a 4x speed increase for setTimestamp (PR 376)

Performance enhancements review

SLIDE 45

Conclusions

  • Using insert rewrite gives us a 2-3x performance increase for batch inserts
  • Makes sense, as it is one round trip
  • Use setFetchSize(100) or greater, and use transactions
  • Don't close prepared statements.

SLIDE 46

New Release Numbering 42.0.0

  • Wanted to divorce ourselves from the server release schedule
  • Wanted to reduce confusion as to which version to use. Previously the numbers 9.x were in the version number.
  • Introduce semantic versioning
  • 42 was chosen more or less at random, but it is also the answer to the question.

SLIDE 47

Notable changes

  • Support dropped for server versions before 8.2
  • Replaced the hand-written logger with java.util.logging
  • Replication protocol API was added.

SLIDE 48

Logical Replication Overview

  • Reads the WAL logs and outputs them in any format you want
  • Read changes
  • Send confirmation of changes read
  • GOTO read more changes

SLIDE 49

Logical Replication High level Steps

  • Create a replication connection
  • Create a logical replication slot
  • Read changes
  • Send confirmation of changes read
  • GOTO read more changes

SLIDE 50

Create a Replication Connection

  • PGProperty.REPLICATION set to "database" instructs the walsender to connect to the database in the URL and allows the connection to be used for logical replication.
  • PREFER_QUERY_MODE needs to be set to simple, as replication does not allow the use of the extended query mode.

String url = "jdbc:postgresql://localhost:5432/postgres";
Properties props = new Properties();
PGProperty.USER.set(props, "postgres");
PGProperty.PASSWORD.set(props, "postgres");
PGProperty.ASSUME_MIN_SERVER_VERSION.set(props, "9.4");
PGProperty.REPLICATION.set(props, "database");
PGProperty.PREFER_QUERY_MODE.set(props, "simple");

Connection con = DriverManager.getConnection(url, props);
PGConnection replConnection = con.unwrap(PGConnection.class);

SLIDE 51

Create a Logical Replication Slot

  • Slots require a name and an output plugin
  • Any unique name will work
  • The output plugin is a previously compiled C library which formats the logical WAL

String outputPlugin = "test_decode";

try (PreparedStatement preparedStatement =
        connection.prepareStatement("SELECT * FROM pg_create_logical_replication_slot(?, ?)")) {
    preparedStatement.setString(1, slotName);
    preparedStatement.setString(2, outputPlugin);
    preparedStatement.executeQuery();
}

SLIDE 52

Create a replication stream

  • Open a PGReplicationStream with the same slot name
  • Start position can be an existing LSN or InvalidLSN
  • Slot options are sent to the logical decoder and are decoder specific

PGReplicationStream stream = pgConnection
    .getReplicationAPI()
    .replicationStream()
    .logical()
    .withSlotName(SLOT_NAME)
    .withStartPosition(lsn)
    .withSlotOption("include-xids", true)
    .start();

SLIDE 53

Read Changes from database

  • Read from the stream; data will be in a ByteBuffer
  • After reading the data, send confirmation messages

  • github.com:davecramer/LogicalDecode.git

while (true) {
    // non-blocking receive message
    ByteBuffer msg = stream.readPending();
    if (msg == null) {
        TimeUnit.MILLISECONDS.sleep(10L);
        continue;
    }
    int offset = msg.arrayOffset();
    byte[] source = msg.array();
    int length = source.length - offset;
    System.out.println(new String(source, offset, length));

    // feedback
    stream.setAppliedLSN(stream.getLastReceiveLSN());
    stream.setFlushedLSN(stream.getLastReceiveLSN());
}

SLIDE 54

  • Credit where credit is due:
  • Much of the optimization work on the driver was done by Vladimir Sitnikov
  • Much (if not all) of the work to convert the build to Maven was done by Stephen Nelson
  • Rewriting batch statements: thanks to Jeremy Whiting
  • Replication support was provided by Vladimir Gordiychuk
  • Questions?

https://github.com/pgjdbc/pgjdbc