Analysis of Derby Performance Staff Engineer Olav Sandst Senior - - PowerPoint PPT Presentation

analysis of derby performance
SMART_READER_LITE
LIVE PREVIEW

Analysis of Derby Performance Staff Engineer Olav Sandst Senior - - PowerPoint PPT Presentation

Analysis of Derby Performance Staff Engineer Olav Sandst Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or


slide-1
SLIDE 1

1

Analysis of Derby Performance

Staff Engineer Olav Sandstå Senior Engineer Dyre Tjeldvoll

Sun Microsystems Database Technology Group

This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or Dyre.Tjeldvoll@sun.com

slide-2
SLIDE 2

2

  • Introduction

– What is a database (DBMS)? – Derby Architecture – What is performance?

  • Performance Evaluation of Derby
  • Performance Tips
  • Comparison with MySQL and PostgreSQL

Overview

slide-3
SLIDE 3

3

Introduction

slide-4
SLIDE 4

4

Derby Architecture

Storage JDBC SQL Access Appl. Storage JDBC SQL Access Appl. Network server JDBC

  • Embedded
  • Network Server
slide-5
SLIDE 5

5

What Is “Performance”?

  • How do you measure database performance?

– Throughput – Response time

  • Average?
  • Max?
  • Median?
  • Out-of-the-box or carefully tuned?
  • How to compare database systems with different

tuning possibilities?

  • Zero administration?
slide-6
SLIDE 6

6

Performance Evaluation of Derby

slide-7
SLIDE 7

7

Performance Evaluation of Derby

  • Evaluation of disk and file system configurations
  • Comparing Embedded and Network Server

– Throughput and response times – Resource usage (CPU, network)

  • The effects of adjusting the database buffer size

(page cache size)

  • The effect of keeping the log on a separate disk
slide-8
SLIDE 8

8

Derby Performance Evaluation:

Derby, OS and Hardware Configuration

  • Derby configuration:

– “Out of the box” – “Main memory” database – “Disk” database

  • Log device:

– Same as data device – Separate log disk

  • Disk write cache:

– Enabled – Disabled

slide-9
SLIDE 9

9

Disk and File System Configurations

  • Hard disk:

– Write cache

  • File system:

– Logging/journaling in UFS – Direct I/O (partially evaluated) – File meta-data update (not evaluated)

slide-10
SLIDE 10

10

Disk Write Cache: Throughput and Response Time

slide-11
SLIDE 11

11

Disk and File System Configurations:

Conclusions

  • Disk Write Cache: Do users want high throughput

and low response times, or durability/recoverability? Who are we optimizing for?

  • Without write cache on disks a higher number of clients are

needed to get maximum throughput

  • Things to evaluate:

– UFS logging – Java NIO – Direct I/O (not available from Java) – Avoid update of file meta-data

slide-12
SLIDE 12

12

Comparing Embedded with Network Server:

TPC-B: Throughput and Response Time

slide-13
SLIDE 13

13

Comparing Embedded with Network Server:

Single-record SELECT

slide-14
SLIDE 14

14

Comparing Embedded with Network Server:

Resource Usage

Derby Embedded: Derby Network Server:

Storage JDBC SQL Access Appl. Storage JDBC SQL Access Appl. Network server JDBC

CPU? Network? CPU? Network?

slide-15
SLIDE 15

15

Comparing Embedded and Network Server:

CPU Usage

TPC- B Emb TPC- B CS Insert Emb Insert CS Se- lect Emb Se- lect Cs Up- date Emb Up- date CS 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

CPU usage per transaction [ms]

User CPU System CPU

slide-16
SLIDE 16

16

Comparing Embedded and Network Server:

CPU Usage, cont.

System CPU User CPU Total TPC-B 0.09 64% 0.25 32% 36% Insert 0.04 100% 0.05 21% 33% Select 0.04 133% 0.06 52% 65% Update 0.03 75% 0.04 21% 30%

Increase in CPU usage compared to Embedded:

Message handling (TCP/IP) The Network Server Code

slide-17
SLIDE 17

17

Comparing Embedded with Network Server:

Conclusions

  • Reduction in throughput of Network Server

compared to Embedded:

– Update operations: 5% – Select operations: 30-40%

  • Response time:

– Small increase of about 0.5 ms per SQL operation

  • Utilizing the write cache on the disks increases the

difference for update operations

  • What can be done?

– Profile to find CPU intensive parts in the Network

Server code

– Optimize

slide-18
SLIDE 18

18

Comparing Embedded with Network Server:

Conclusions

  • The Network Server adds 30%-60% to the CPU

usage compared to the Embedded version

  • The main causes for increased CPU usage:

– System CPU usage:

  • Message sending and receiving

– User CPU usage:

  • Message parsing (Strings)
  • Character set conversions
slide-19
SLIDE 19

19

Performance Tips

slide-20
SLIDE 20

20

Performance Tips

  • Programming:

– Prepared statements

  • Database schema design:

– Indices

  • Derby configuration:

– durability=test, if recovery is not important

slide-21
SLIDE 21

21

Prepared Statements

  • Compilation is expensive, especially in Derby

which uses Java byte code generation

  • Prepared statements virtually eliminates this cost
  • Easy to fall into this trap for beginners (string

concatenation is easy)

  • <Example>
slide-22
SLIDE 22

22

Indices

  • Use indices to optimize much used access paths
  • <Example>
slide-23
SLIDE 23

23

Relaxed Durability

  • Durability is expensive

– Log must be written to disk prior to commit – This becomes a major bottleneck (cf. write cache, log

  • n separate device)
  • Durability can be disabled

– Setting durability=test disables log flushing at commit – Dramatically increases throughput... – ... but database may be corrupt after a crash – Would be better if Derby could recover to a consistent

state (with some transaction loss)

– An alternative to a main memory database

  • <Example>
slide-24
SLIDE 24

24

Comparing the Performance of MySQL, PostgreSQL and Derby

slide-25
SLIDE 25

25

Performance evaluation:

MySQL, PostgreSQL and Derby

Evaluated performance of:

  • MySQL/InnoDB
  • PostgreSQL
  • Derby Embedded
  • Derby Client/Server
slide-26
SLIDE 26

26

What is a Database (DBMS)?

  • Transactions
  • ACID

– Atomicity – Consistency – Isolation – Durability Database Atomicity Consistency Isolation Durability Derby Yes Yes Yes Yes PostgreSQL Yes Yes Yes Yes MySQL, Innodb Yes Yes Yes Yes MySQL, MyIsam No ? Table lock Yes HSQLDB No ? ? No

slide-27
SLIDE 27

27

Performance Experiments

  • Two configurations:

– “Small” main-memory database: 10 MB data/50MB

database buffer

– “Big” disk-based database: 50 MB data/10MB database

buffer

  • Tests:

– TPC-B – Single-tuple operations: select

  • Load:

– 1-100 concurrent clients

slide-28
SLIDE 28

28

Throughput: TPC-B

200 400 600 800 1000 1200 1400 1600 20 40 60 80 100 Transactions per second Number of clients Derby embedded Derby client/server MySQL (InnoDB) PostgreSQL 50 100 150 200 250 300 350 400 450 500 20 40 60 80 100 Transactions per second Number of clients Derby embedded Derby client/server MySQL (InnoDB) PostgreSQL

“Small” db “Big” db

slide-29
SLIDE 29

29

Throughput: Select (“small” db)

5000 10000 15000 20000 25000 20 40 60 80 100 Transactions per second Number of clients Derby embedded Derby client/server MySQL (InnoDB) PostgreSQL

slide-30
SLIDE 30

30

Conclusions

  • Derby outperforms MySQL on large databases
  • MySQL performs better on small main-memory

databases

  • No significant performance loss with client/server,

except for SELECT operations

  • Need to identify, and reduce the CPU overhead that

limits throughput in CPU-bound configurations

slide-31
SLIDE 31

31

Disk IO: “Small” Databases

Derby em- bedded Derby cli- ent/server MySQL PostgreSQL 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5

Db writes per transaction Log writes per transaction

slide-32
SLIDE 32

32

Derby client/server MySQL 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

Db writes per transaction Log writes per transaction

Disk IO: “Big” Databases

Why does Derby out-perform MySQL for disk-based databases?

  • TPC-B like load
  • 50 MB database
  • 10 MB buffer
slide-33
SLIDE 33

33

JDBC Driver: CPU Usage

Derby MySQL PostgreSQL 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16

User System

CPU usage per TPC-B transaction [ms]:

slide-34
SLIDE 34

34

Conclusions: Resource Usage

  • MySQL performs better than Derby when:

– The database is small and fits in the database buffer – Throughput becomes CPU-bound

  • Derby performs better than MySQL when:

– The database is large and does not fit in the database

buffer

– Throughput becomes IO-bound

  • Derby has focused on maintaining a low footprint