analysis of derby performance
play

Analysis of Derby Performance Staff Engineer Olav Sandst Senior - PowerPoint PPT Presentation

Analysis of Derby Performance Staff Engineer Olav Sandst Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or


  1. Analysis of Derby Performance Staff Engineer Olav Sandstå Senior Engineer Dyre Tjeldvoll Sun Microsystems Database Technology Group This is a draft version that is subject to change. The authors can be contacted at Olav.Sandstaa@sun.com or Dyre.Tjeldvoll@sun.com 1

  2. Overview ● Introduction – What is a database (DBMS)? – Derby Architecture – What is performance? ● Performance Evaluation of Derby ● Performance Tips ● Comparison with MySQL and PostgreSQL 2

  3. Introduction 3

  4. Derby Architecture ● Embedded ● Network Server Appl. Network Appl. JDBC server JDBC JDBC SQL SQL Access Access Storage Storage 4

  5. What Is “Performance”? ● How do you measure database performance? – Throughput – Response time ● Average? ● Max? ● Median? ● Out-of-the-box or carefully tuned? ● How to compare database systems with different tuning possibilities? ● Zero administration? 5

  6. Performance Evaluation of Derby 6

  7. Performance Evaluation of Derby ● Evaluation of disk and file system configurations ● Comparing Embedded and Network Server – Throughput and response times – Resource usage (CPU, network) ● The effects of adjusting the database buffer size (page cache size) ● The effect of keeping the log on a separate disk 7

  8. Derby Performance Evaluation: Derby, OS and Hardware Configuration ● Derby configuration: – “Out of the box” – “Main memory” database – “Disk” database ● Log device: – Same as data device – Separate log disk ● Disk write cache: – Enabled – Disabled 8

  9. Disk and File System Configurations ● Hard disk: – Write cache ● File system: – Logging/journaling in UFS – Direct I/O (partially evaluated) – File meta-data update (not evaluated) 9

  10. Disk Write Cache: Throughput and Response Time 10

  11. Disk and File System Configurations: Conclusions ● Disk Write Cache: Do users want high throughput and low response times, or durability/recoverability? Who are we optimizing for? ● Without write cache on disks a higher number of clients are needed to get maximum throughput ● Things to evaluate: – UFS logging – Java NIO – Direct I/O (not available from Java) – Avoid update of file meta-data 11

  12. Comparing Embedded with Network Server: TPC-B: Throughput and Response Time 12

  13. Comparing Embedded with Network Server: Single-record SELECT 13

  14. Comparing Embedded with Network Server: Resource Usage Derby Embedded: Derby Network Server: CPU? Network? Appl. Network Appl. JDBC server JDBC JDBC CPU? SQL SQL Network? Access Access Storage Storage 14

  15. Comparing Embedded and Network Server: CPU Usage CPU usage per transaction [ms] 1.4 1.3 1.2 1.1 1 0.9 0.8 User CPU 0.7 System CPU 0.6 0.5 0.4 0.3 0.2 0.1 0 TPC- TPC- Insert Insert Se- Se- Up- Up- B B CS Emb CS lect lect date date Emb Emb Cs Emb CS 15

  16. Comparing Embedded and Network Server: CPU Usage, cont. Increase in CPU usage compared to Embedded: System CPU User CPU Total TPC-B 0.09 64% 0.25 32% 36% Insert 0.04 100% 0.05 21% 33% Select 0.04 133% 0.06 52% 65% Update 0.03 75% 0.04 21% 30% Message handling The Network (TCP/IP) Server Code 16

  17. Comparing Embedded with Network Server: Conclusions ● Reduction in throughput of Network Server compared to Embedded: – Update operations: 5% – Select operations: 30-40% ● Response time: – Small increase of about 0.5 ms per SQL operation ● Utilizing the write cache on the disks increases the difference for update operations ● What can be done? – Profile to find CPU intensive parts in the Network Server code – Optimize 17

  18. Comparing Embedded with Network Server: Conclusions ● The Network Server adds 30%-60% to the CPU usage compared to the Embedded version ● The main causes for increased CPU usage: – System CPU usage: ● Message sending and receiving – User CPU usage: ● Message parsing (Strings) ● Character set conversions 18

  19. Performance Tips 19

  20. Performance Tips ● Programming: – Prepared statements ● Database schema design: – Indices ● Derby configuration: – durability=test , if recovery is not important 20

  21. Prepared Statements ● Compilation is expensive, especially in Derby which uses Java byte code generation ● Prepared statements virtually eliminates this cost ● Easy to fall into this trap for beginners (string concatenation is easy) ● <Example> 21

  22. Indices ● Use indices to optimize much used access paths ● <Example> 22

  23. Relaxed Durability ● Durability is expensive – Log must be written to disk prior to commit – This becomes a major bottleneck (cf. write cache, log on separate device) ● Durability can be disabled – Setting durability=test disables log flushing at commit – Dramatically increases throughput... – ... but database may be corrupt after a crash – Would be better if Derby could recover to a consistent state (with some transaction loss) – An alternative to a main memory database ● <Example> 23

  24. Comparing the Performance of MySQL, PostgreSQL and Derby 24

  25. Performance evaluation: MySQL, PostgreSQL and Derby Evaluated performance of: ● MySQL/InnoDB ● PostgreSQL ● Derby Embedded ● Derby Client/Server 25

  26. What is a Database (DBMS)? ● Transactions ● ACID – Atomicity – Consistency – Isolation – Durability Database Atomicity Consistency Isolation Durability Derby Yes Yes Yes Yes PostgreSQL Yes Yes Yes Yes MySQL, Innodb Yes Yes Yes Yes MySQL, MyIsam No ? Table lock Yes HSQLDB No ? ? No 26

  27. Performance Experiments ● Two configurations: – “Small” main-memory database: 10 MB data/50MB database buffer – “Big” disk-based database: 50 MB data/10MB database buffer ● Tests: – TPC-B – Single-tuple operations: select ● Load: – 1-100 concurrent clients 27

  28. Throughput: TPC-B “Small” db “Big” db 1600 500 Derby embedded Derby embedded Derby client/server Derby client/server MySQL (InnoDB) MySQL (InnoDB) 450 1400 PostgreSQL PostgreSQL 400 1200 350 Transactions per second Transactions per second 1000 300 800 250 200 600 150 400 100 200 50 0 0 0 20 40 60 80 100 0 20 40 60 80 100 Number of clients Number of clients 28

  29. Throughput: Select (“small” db) 25000 Derby embedded Derby client/server MySQL (InnoDB) PostgreSQL 20000 Transactions per second 15000 10000 5000 0 0 20 40 60 80 100 29 Number of clients

  30. Conclusions ● Derby outperforms MySQL on large databases ● MySQL performs better on small main-memory databases ● No significant performance loss with client/server, except for SELECT operations ● Need to identify, and reduce the CPU overhead that limits throughput in CPU-bound configurations 30

  31. Disk IO: “Small” Databases 2.5 2.25 2 1.75 1.5 Db writes per transaction 1.25 Log writes per transaction 1 0.75 0.5 0.25 0 Derby em- Derby cli- MySQL PostgreSQL bedded ent/server 31

  32. Disk IO: “Big” Databases Why does Derby out-perform MySQL for disk-based databases? 2 1.8 1.6 - TPC-B like load - 50 MB database 1.4 - 10 MB buffer 1.2 Db writes per 1 transaction Log writes per transaction 0.8 0.6 0.4 0.2 0 Derby client/server MySQL 32

  33. JDBC Driver: CPU Usage CPU usage per TPC-B transaction [ms]: 0.16 0.15 0.14 0.13 0.12 0.11 0.1 0.09 0.08 User System 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 33 Derby MySQL PostgreSQL

  34. Conclusions: Resource Usage ● MySQL performs better than Derby when: – The database is small and fits in the database buffer – Throughput becomes CPU-bound ● Derby performs better than MySQL when: – The database is large and does not fit in the database buffer – Throughput becomes IO-bound ● Derby has focused on maintaining a low footprint 34

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend