GoldenGate GoldenGate Outline What is GoldenGate? Architecture - - PDF document

goldengate goldengate outline
SMART_READER_LITE
LIVE PREVIEW

GoldenGate GoldenGate Outline What is GoldenGate? Architecture - - PDF document

Zbigniew Baranowski GoldenGate GoldenGate Outline What is GoldenGate? Architecture Architecture Performance GoldenGate vs Streams GoldenGate vs. Streams Monitoring Summary Summary What is GoldenGate?


slide-1
SLIDE 1

GoldenGate GoldenGate

Zbigniew Baranowski

slide-2
SLIDE 2

Outline

  • What is GoldenGate?
  • Architecture

Architecture

  • Performance
  • GoldenGate vs Streams

GoldenGate vs. Streams

  • Monitoring
  • Summary
  • Summary
slide-3
SLIDE 3

What is GoldenGate?

  • Real-time data integration solutions
  • Continuous data synchronization across

Continuous data synchronization across heterogeneous environments

– Oracle, DB2, SQL Server, MySQL and more

  • Project started in early 90s
  • Purchased by Oracle in late 2009

y

slide-4
SLIDE 4

Motivation for testing

  • Need of stable and reliable replicarion service
  • Streams require frequent interventions (at least

Streams require frequent interventions (at least

  • nce per week)

– Blocking sessions – Memory pools shortage – Human errors

  • Streams administration is difficult

– T1s administrators need our help

  • Streams project has been abandoned by Oracle
slide-5
SLIDE 5

Data integration using GG

slide-6
SLIDE 6

GoldenGate architecture

slide-7
SLIDE 7

GoldenGate components

  • Manager

– Runs on both databases – Starts and monitors GG processes – Manages trail files – Reporting

slide-8
SLIDE 8

GoldenGate components

  • Extract (capture process)

– Runs on the source database – Extracts data changes from redo logs – Writes transactional and DDL changes (in common GG f ) il fil format) to trail files

slide-9
SLIDE 9

GoldenGate components

  • Collector

– Background process which runs on the target system g p g y – Two modes: static and dynamic (managed by manager) – Receives data changes from TCP/IP and writes to remote il fil trail files

slide-10
SLIDE 10

GoldenGate components

  • Replicat (apply process)

– Runs on the target system g y – Reads transactional data from trail files and applies them

  • n target database

C b ll li d – Can by parallelized

slide-11
SLIDE 11

GoldenGate components

  • Trail files

– Contains data changes written in GG common format g – Series of files that GoldenGate temporarily stores on disks T – Two types

  • Extract Trail (located on source - optional)
  • Remote Trail (located on destination)

( )

slide-12
SLIDE 12

GoldenGate components

  • Data pump

– Sends data changes from extract trail via TCP/IP to g remote trail on the target – Optional component

E t t d h di tl t t t il

  • Extract process can sends changes directly to remote trail

– Optional encryption and compression over TCP/IP

slide-13
SLIDE 13

GoldenGate architecture (overview)

EXTRACT PROCESS

LCs

TRAIL FILES

Source Source

PROCESS

capture changes

TRAIL FILES

File streaming

Destination Destination

DELIVERY PROCESS

log changes (parsing logs) File streaming (data pumping) LCs apply changes

PROCESS REDO LOG TRAIL FILES TARGET DATABASE (replica) SOURCE DATABASE MANAGER MANAGER PROCESS MANAGER PROCESS

slide-14
SLIDE 14

Streams architecture (overview)

CAPTURE PROCESS

LCRs

Source Source

PROCESS

capture changes

SOURCE QUEUE

propagate

Destination Destination

APPLY PROCESS

log changes (using Logminer) propagate events LCRs apply changes

PROCESS REDO LOG DESTINATION QUEUE TARGET DATABASE (replica) SOURCE DATABASE

slide-15
SLIDE 15

Discovered problems

  • Supplemental logging groups are created to late to

replicate the following DML operations g

– Workaround: add global supplemental logging for primary and foreign keys

  • Cannot use create as select – select statement is

executed on the destination database as GGADMIN user user

  • Delivery parallelism: lack of DMLs and DDLs

synchronization – problem with temporal tables synchronization – problem with temporal tables

slide-16
SLIDE 16

Performance

  • Simple workload (inserts, updates, deletes)

– With basic configuration (no parallelism): 7K LCRs /s g ( p ) – With BatchSQL optimization 15K LCRs /s and can be more Wi h ll li l 15K LCR / ( h ll l – With parallelism: at least 15K LCRs /s (each parallel process increases throughput linearly)

  • COOL workload
  • COOL workload

– ~3K LCRs /s – BatchSQL does not improve performance BatchSQL does not improve performance – Unable to use parallelism

  • Delivery process is the bottlenecks due to limits

y p

  • f resource utilization by single session.

– Possible solution: per schema delivery parallelism

slide-17
SLIDE 17

Where are we now?

slide-18
SLIDE 18

GoldenGate vs Streams (setting up)

Streams GoldenGate Installation

  • Embedded in Oracle DBMS
  • Unzipping binaries

Management

  • Using SQL or EM
  • Mainly with GGSCI command tool
  • Director application

Replication setup

  • TNS + database links config
  • Executions of few procedures
  • Editing parameter files + executions of

few command

  • Additional port required

RAC

N dditi l t i d Sh d t CRS fi ti

RAC environment

  • No additional steps required
  • Shared storage, CRS configuration,

additional parameters

Schemas selection

  • Setting rules for process – SQL

procedure

  • Editing mapping files

selection

procedure

Supervisor account

  • STRMADMIN – owner of processes,

jobs, queues and links

  • Executes transactions on target
  • GGADMIN – keeps metadata about

replicated schema

  • Executes transactions on target

g g

Watch

  • None
  • Manager process
slide-19
SLIDE 19

GoldenGate vs Streams (performance)

Streams GoldenGate Bottleneck

  • Capture process – log mining
  • Delivery process – limitation of single session

resources resources

Parallelism of apply / delivery

  • By setting process’s parameter
  • Apply coordinator takes care about

serialization

  • By manual addition of processes and

specification of filters

  • No coordinator process – no serialization

p guaranteed

Big parallel transactions

  • Serialized capture
  • Parallel capture

Replication processes impact

  • n the system
  • Quite significant
  • Minimal

Potential improvements

  • None
  • Parallelism of schemas + BatchSQL +

compression

Stability

  • Not stable. A lot of aborts due to
  • Very stable so far

y

database memory issues

  • Hangs due to session blocking
slide-20
SLIDE 20

GoldenGate vs Streams (functionality and maintenance)

Streams GoldenGate Initial load

  • Data pump
  • Initial load configuration

Monitoring

  • SQL + EM + Custom tools
  • Director

Administration

  • All task can be done through SQLPlus
  • No direct access to machines required
  • Using Director (most of tasks)
  • Using GGSCI requires direct access to

node

Error handling

  • Errors well documented
  • Errors not well documented and not fully

Error handling

  • Errors expressed in database language
  • All error handling procedures fully

understood y understandable

  • Handling procedures not recognized yet

Data replication

  • Replication of data definitions and
  • Special handling of data definitions.

Data replication (DDL+DML)

p modifications p g

  • Only delivery without parallelism can

guarantee smooth replication

Specific data handling

  • Apply handlers
  • None

handling

slide-21
SLIDE 21

GoldenGate vs Streams (functionality and maintenance)

Streams GoldenGate Schemas versioning

  • Multi Version Object Dictionary
  • Single Version Object Dictionary
  • Potential problems with recapturing of older

versioning

  • Potential problems with recapturing of older

changes

Resuming of replication

  • Slow - capture process needs to re-init

dictionary from the last checkpoint

  • Very fast
  • Using checkpoint files

Failing transactions

  • Mechanism for re-execution of failed

transaction

  • No extra mechanism.
  • Rollback the sequence change number

(SCN) and restart the delivery process

slide-22
SLIDE 22

Monitoring

  • 3 applications

– Admin tool (defines managers location) ( g ) – Director web application (status, logs, start/stop, email notifications) Di d k li i ( l / – Director desktop application ( status, logs, start/stop, email notifications, replication configuration and management, setup topology ) g , p p gy )

  • Written in JAVA
  • Requires Java 1.6 SDK and Weblogic server

Requires Java 1.6 SDK and Weblogic server

  • Wraps GGSCI command line interface
slide-23
SLIDE 23

Admin tool

slide-24
SLIDE 24

Director - web app

slide-25
SLIDE 25

Director – desktop app

slide-26
SLIDE 26

Monitoring

  • Intuitive interface layout (fast information display – no nested

views)

  • Access to process statistics and reports
  • Email notification

Lag (floods mail box sends every minute) – Lag (floods mail box – sends every minute) – Event (errors, warnings)

  • Difficult to monitor many replication setups
  • From time to time problems with refreshing
  • Not all operations works after first attempt (retries are

d d) needed)

  • No history data
  • No plots except lags
  • No plots except lags
slide-27
SLIDE 27

Summary

  • DBMS independent software

– Deployment requires more afford (‘attaching’ to DB) p y q ( g ) – Minimal machine resource utilization – Minimal impact on the database

  • Golden Gate is stable and efficient.
  • Technology focused on data modification
  • perations

– CERN’s data profile is definitions + modifications

slide-28
SLIDE 28

Summary

  • Performance (safe mode) better than Streams10g

– There is big potential of improvements but we need to g p p avoid ‘driving without breaks’

  • Errors not well documented
  • Lack of experience (administration)

– Errors handling

  • Fair monitoring (in multiple setups context)
  • Strategic replication solution for Oracle
slide-29
SLIDE 29

Future plans

  • More performance tests

– Per schema delivery parallelism y p – Real PVSS’s and Condition's data replication tests

  • Establishing contact with GG development team

– Arranged meeting at Oracle Open World next week

  • Evaluation new features on GG 11g (new version

just has been released)