Oracle Advanced Compression Tests, Svetozar Kapusta, 15th of October 2009



SLIDE 1

Oracle Advanced Compression Tests

Svetozar Kapusta

15th of October 2009

SLIDE 2

What is CERN?

  • CERN is the world’s largest particle physics laboratory, located in Geneva, Switzerland
  • CERN hosts the Large Hadron Collider (LHC), which is the biggest man-made accelerator
  • LHC will start its operation in November 2009 and will form, together with its experiments, the biggest sub-nuclear microscope in the world.

CERN is:

  • ≈2500 staff scientists (physicists, engineers, etc.)
  • ≈6500 visiting scientists (half of the world's particle physicists), coming from ≈500 universities or institutes representing ≈80 nationalities.

Courtesy of M. Girone

SLIDE 3

LHC: a Very Large Scientific Instrument

[Figure: the LHC ring (27 km long, 100 m underground), with labels for Mont Blanc (4810 m), downtown Geneva, and the ATLAS, ALICE and CMS + TOTEM experiments]

Courtesy of M. Girone

SLIDE 4

… Based on Advanced Technology

27 km of superconducting magnets cooled in superfluid helium at 1.9 K

Courtesy of M. Girone

SLIDE 5

Experiments are ready for collisions

Courtesy of M. Girone

SLIDE 6

The Data Acquisition

Ian.Bird@cern.ch

Courtesy of M. Girone

SLIDE 7

Data Acquisition, First pass processing


1.25 GB/sec (ions)

Courtesy of M. Girone

SLIDE 8

CERN Openlab

  • Collaboration between CERN and industrial Openlab partners: HP, Intel, Oracle and Siemens
  • Framework for evaluating and integrating cutting-edge IT technologies
  • CERN acquires early access to technology
  • CERN offers expertise and a demanding computing environment to push new technologies to their limits
  • CERN provides a neutral ground for carrying out advanced R&D
  • Excellent collaboration with Oracle
SLIDE 9

Databases for physics at CERN

  • Relational databases play a key role in the experiments’ production dataflow chains
  • Listed among the critical services for the LHC experiments
  • Bulk of physics data stored in files, a fraction of it in databases
  • Most applications are OLTP
  • Some data warehouse applications are also emerging

SLIDE 10

Data Growth

  • Expected data growth is ≈20-30 TB per year per experiment
  • Experiments need to have all data available at any time
    • During the experiments' lifetimes (10-15 years)
    • For a few extra years, as the data analysis will continue
  • We have to provide an efficient way of storing and accessing the few petabytes of mostly read-only data
  • Answer to our challenge is the compression available in Oracle 11g Release 2 and Exadata2
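The growth figures on this slide can be turned into a rough storage projection. The following is a minimal sketch, not part of the original talk: the number of experiments (`N_EXPERIMENTS = 4`) is an assumption, and the compression factor of 10 is only an illustrative value taken from the lower end of the EHCC archive high range quoted in the conclusions.

```python
# Rough storage projection from the growth figures on the slide.
GROWTH_TB_PER_YEAR = (20, 30)  # per experiment, from the slide
LIFETIME_YEARS = (10, 15)      # experiment lifetime, from the slide
N_EXPERIMENTS = 4              # ASSUMPTION: not stated on the slide


def projected_volume_tb(growth_tb_per_year, years, n_experiments):
    """Total uncompressed volume in TB accumulated over the given period."""
    return growth_tb_per_year * years * n_experiments


def compressed_volume_tb(raw_tb, compression_factor):
    """Volume in TB left after applying a compression factor (e.g. 10 for 10X)."""
    if compression_factor <= 0:
        raise ValueError("compression factor must be positive")
    return raw_tb / compression_factor


low = projected_volume_tb(GROWTH_TB_PER_YEAR[0], LIFETIME_YEARS[0], N_EXPERIMENTS)
high = projected_volume_tb(GROWTH_TB_PER_YEAR[1], LIFETIME_YEARS[1], N_EXPERIMENTS)
print(f"uncompressed: {low} to {high} TB")

# Illustrative factor only; real factors depend on the data and compression type.
ILLUSTRATIVE_FACTOR = 10
print(f"at {ILLUSTRATIVE_FACTOR}X: "
      f"{compressed_volume_tb(low, ILLUSTRATIVE_FACTOR):.0f} to "
      f"{compressed_volume_tb(high, ILLUSTRATIVE_FACTOR):.0f} TB")
```

Under these assumptions the uncompressed volume lands between 800 TB and 1800 TB, in line with the "few petabytes" mentioned on the slide.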

SLIDE 11

Advanced Compression Tests

  • Exadata2 located in Reading, UK
    • Half rack with 7 storage cells of 12 disks each
  • Accessed remotely from Geneva for 2 weeks
  • Data used
    • The largest and representative production and test tables
    • Exported compressed using Datapump
    • Imported into Exadata2 using Datapump
  • Applications
    • PVSS (slow control system used by the experiments)
    • GRID monitoring application
    • GRID test data
    • File transfer applications (PANDA)
    • Logging application for ATLAS
  • First results the same day
SLIDE 12

Compression factors for various compression types of various physics applications

[Figure: bar chart of compression factor (axis up to 70) per application; series: NO COMPRESSION, BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, ARCHIVE HIGH]

  • PVSS columns: 6 number, 4 TS(9), 5 varchar2, 3 binary_double
  • LCG GRID monitoring columns: 5 number
  • LCG TESTDATA columns: 6 number(38), 1 varchar2, 1 CLOB
  • ATLAS PANDA FILESTABLE columns: 3 number, 12 varchar2, 2 date, 2 char
  • ATLAS LOG MESSAGES columns: 5 number, 7 varchar2, 1 TS

SLIDE 13

Table creation times for various compression types of various physics applications. Normalized to no compression.

[Figure: bar chart of normalized table creation time (axis up to 45) per application; series: NO COMPRESSION, BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, ARCHIVE HIGH]

  • PVSS columns: 6 number, 4 TS(9), 5 varchar2, 3 binary_double
  • LCG GRID monitoring columns: 5 number
  • LCG TESTDATA columns: 6 number(38), 1 varchar2, 1 CLOB
  • ATLAS PANDA FILESTABLE columns: 3 number, 12 varchar2, 2 date, 2 char
  • ATLAS LOG MESSAGES columns: 5 number, 7 varchar2, 1 TS
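The slides do not show the DDL behind these measurements. As a hedged sketch, the series names in the chart legend map onto the table compression clauses of Oracle Database 11g Release 2, and one way to build a compressed copy of a table is `CREATE TABLE ... AS SELECT`. The table name `PVSS_EVENTS` and the helper `ctas_statement` are hypothetical, and the use of CTAS is an assumption, not the author's documented method.

```python
# Legend label -> Oracle 11g Release 2 table compression clause.
# QUERY and ARCHIVE levels are hybrid columnar compression (EHCC)
# and require Exadata storage.
COMPRESSION_CLAUSES = {
    "NO COMPRESSION": "NOCOMPRESS",
    "BASIC": "COMPRESS BASIC",
    "OLTP": "COMPRESS FOR OLTP",
    "QUERY LOW": "COMPRESS FOR QUERY LOW",
    "QUERY HIGH": "COMPRESS FOR QUERY HIGH",
    "ARCHIVE LOW": "COMPRESS FOR ARCHIVE LOW",
    "ARCHIVE HIGH": "COMPRESS FOR ARCHIVE HIGH",
}


def ctas_statement(source_table, label):
    """Build a CREATE TABLE ... AS SELECT statement for one compression type.

    The target name is the source name plus the label, e.g. PVSS_EVENTS_OLTP.
    Hypothetical helper: it only builds the SQL text and does not execute it.
    """
    clause = COMPRESSION_CLAUSES[label]
    target = f"{source_table}_{label.replace(' ', '_')}"
    return f"CREATE TABLE {target} {clause} AS SELECT * FROM {source_table}"


# PVSS_EVENTS is a hypothetical table name used for illustration only.
for label in COMPRESSION_CLAUSES:
    print(ctas_statement("PVSS_EVENTS", label) + ";")
```

Timing each generated statement and dividing by the time of the `NOCOMPRESS` variant would give numbers of the kind plotted on this slide.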

SLIDE 14

Full table scan performance for various compression types of various physics applications. Normalized to no compression.

[Figure: bar chart of normalized full table scan performance (axis up to 3.5) per application; series: NO COMPRESSION, BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, ARCHIVE HIGH]

  • PVSS columns: 6 number, 4 TS(9), 5 varchar2, 3 binary_double
  • LCG GRID monitoring columns: 5 number
  • LCG TESTDATA columns: 6 number(38), 1 varchar2, 1 CLOB
  • ATLAS PANDA FILESTABLE columns: 3 number, 12 varchar2, 2 date, 2 char
  • ATLAS LOG MESSAGES columns: 5 number, 7 varchar2, 1 TS
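"Normalized to no compression" means that each measurement is divided by the value obtained for the uncompressed table, so the uncompressed case is always 1. A minimal sketch of that step follows; the function name is hypothetical and the numbers are placeholders, not measurements from the talk.

```python
def normalize_to_baseline(measurements, baseline="NO COMPRESSION"):
    """Divide every measurement by the baseline measurement.

    `measurements` maps a compression type to a value (a time, a size, ...).
    The baseline entry itself becomes 1.0.
    """
    base = measurements[baseline]
    if base == 0:
        raise ValueError("baseline measurement must be non-zero")
    return {label: value / base for label, value in measurements.items()}


# PLACEHOLDER values for illustration only; these are NOT results from the talk.
placeholder_scan_times_s = {
    "NO COMPRESSION": 120.0,
    "OLTP": 60.0,
    "ARCHIVE HIGH": 30.0,
}

for label, ratio in normalize_to_baseline(placeholder_scan_times_s).items():
    print(f"{label}: {ratio:.2f}")
```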

SLIDE 15

Full table scan performance for various compression types of various physics applications. Normalized to no compression. Exadata offloading set to false.

[Figure: bar chart of normalized full table scan performance without offloading (axis up to 30) per application; series: NO COMPRESSION, BASIC, OLTP, QUERY LOW, QUERY HIGH, ARCHIVE LOW, ARCHIVE HIGH]

  • PVSS columns: 6 number, 4 TS(9), 5 varchar2, 3 binary_double
  • LCG GRID monitoring columns: 5 number
  • LCG TESTDATA columns: 6 number(38), 1 varchar2, 1 CLOB
  • ATLAS PANDA FILESTABLE columns: 3 number, 12 varchar2, 2 date, 2 char
  • ATLAS LOG MESSAGES columns: 5 number, 7 varchar2, 1 TS

SLIDE 16

Exadata2 offloading

Full table scans performance for various compression types of ATLAS logging application with and without Exadata offloading

[Figure: full table scan time [s] per compression type, with and without Exadata offloading; axis from 1 to 1000]

Please note the logarithmic scale

SLIDE 17

Export Datapump Compression

  • Compression factor for PVSS data
    • Export Datapump ≈9X
    • tar bzip2 utility
      • ≈11X on non-compressed exported PVSS data
      • ≈1.2X on the compressed exported PVSS data
  • Compression factor for LCG application
    • Export Datapump ≈13X
    • tar bzip2 utility
      • ≈9X on non-compressed exported LCG data
      • ≈1.2X on the compressed exported LCG data
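The factors above can be combined to compare the two routes to a small export file. This sketch assumes that the Datapump factor is measured relative to the uncompressed export and that running bzip2 on the compressed export multiplies the two factors; neither assumption is stated on the slide. The dictionary keys are hypothetical names.

```python
# Compression factors quoted on the slide (approximate values).
FACTORS = {
    "PVSS": {"datapump": 9.0, "bzip2_on_raw": 11.0, "bzip2_on_datapump": 1.2},
    "LCG": {"datapump": 13.0, "bzip2_on_raw": 9.0, "bzip2_on_datapump": 1.2},
}


def combined_factor(datapump, bzip2_on_datapump):
    """Overall factor when bzip2 is applied on top of a Datapump-compressed export.

    ASSUMPTION: both factors are relative sizes, so they multiply.
    """
    return datapump * bzip2_on_datapump


for app, f in FACTORS.items():
    total = combined_factor(f["datapump"], f["bzip2_on_datapump"])
    print(f"{app}: Datapump + bzip2 {total:.1f}X, "
          f"bzip2 alone {f['bzip2_on_raw']:.1f}X")
```

Under these assumptions, Datapump followed by bzip2 gives about 10.8X for PVSS, close to the ≈11X of bzip2 alone, and about 15.6X for LCG, well above the ≈9X of bzip2 alone. In both cases bzip2 adds little once the export is already compressed.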
SLIDE 18

Conclusions

  • Tested basic, OLTP and hybrid columnar compression and Datapump compression
  • The results for data from physics applications are rather impressive (2-6X OLTP, 10-70X EHCC archive high)
  • EHCC can achieve up to ≈3X better compression than tar bzip2 compression of the same data exported uncompressed
  • Oracle Compression offers a win-win solution, especially for OLTP
    • Shrinks used storage volume
    • Improves performance
SLIDE 19

Thank you for your attention

SLIDE 20

Backup

[Figure: bar chart (axis up to 18); series: CPU consumed vs. no compression, logical reads vs. no compression]