HIGH AVAILABILITY AND DISASTER RECOVERY FOR IMDG
VLADIMIR KOMAROV, MIKHAIL GORELOV SBERBANK OF RUSSIA
ABOUT SPEAKERS
Vladimir Komarov, in Sberbank since 2010. He implemented the concepts of the operational data store (ODS) and the retail risk data mart as part of the enterprise data warehouse. In 2015 he tested more than 10 distributed in-memory platforms for transaction processing. He is now responsible for the architecture of the grid-based core banking infrastructure, including high availability and disaster recovery.
Mikhail Gorelov, in Sberbank since 2012. He is responsible for building the infrastructure landscape for major mission-critical applications such as core banking and card processing, including the new grid-based banking platform. He now acts as both expert and project manager in the "18+" core banking transformation program.
Datacenter loss. DC interconnect loss.
Application servers: compute
In-memory data grid: caching & temporary storage
Relational DBMS: persistence & compute
- conversion of the data representation to the relational model
- data is changed directly in the database
Application servers: compute
In-memory data grid: compute & data persistence
- no conversion required
Continuity threats
Local failures:
- Hardware/OS/JVM failures
- Network failures
- Errors: data corruption due to user/admin action
Disasters:
- Datacenter loss
- Datacenter interconnect loss
- Errors: cluster breakdown due to application errors and/or admin action
Service jobs:
- Cluster topology change
- Software update: firmware/OS/JVM upgrade, platform upgrade, application upgrade
                 API                          SPI
Defined by       Platform                     Platform
Implemented by   Platform                     System software (custom code)
Called by        Application (custom code)    Platform
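To make the API/SPI distinction concrete, here is a minimal Python sketch. All names (`Platform`, `PartitionMapperSpi`, `ModuloMapper`, `put`) are hypothetical and not taken from any real IMDG product: the platform defines both interfaces, application code calls the API (`put`), and custom system software implements the SPI, which only the platform itself calls.

```python
from abc import ABC, abstractmethod


class PartitionMapperSpi(ABC):
    """SPI: defined by the platform, implemented by custom system software."""

    @abstractmethod
    def partition(self, key: int) -> int:
        """Return the partition number for a key."""


class Platform:
    """Hypothetical platform; put() is its API (defined and implemented here)."""

    def __init__(self, mapper: PartitionMapperSpi) -> None:
        self._mapper = mapper  # custom SPI implementation plugged into the platform
        self._partitions: dict[int, dict[int, object]] = {}

    def put(self, key: int, value: object) -> int:
        # The platform, not the application, calls the SPI.
        part = self._mapper.partition(key)
        self._partitions.setdefault(part, {})[key] = value
        return part


class ModuloMapper(PartitionMapperSpi):
    """Custom system software implementing the SPI."""

    def __init__(self, parts: int) -> None:
        self.parts = parts

    def partition(self, key: int) -> int:
        return key % self.parts


# Application code calls only the API.
platform = Platform(ModuloMapper(8))
print(platform.put(42, "client record"))  # prints 2
```

The application never sees `ModuloMapper`; swapping in another SPI implementation changes platform behaviour without touching application code.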
Data/compute grid
Data area 1 (e.g. clients)
Data area 2 (e.g. accounting)
nodeFilter: the cache property that defines the set of nodes where the cache's data can reside.
partition(): a fast, simple and deterministic function that maps keys to partitions, usually by division remainder.
assignPartitions(): the function that distributes partitions (chunks) across the nodes.
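The three functions above can be sketched in Python. This is an illustrative model, not a specific product's implementation: `partition`, `assign_partitions` and the node names are hypothetical, `zlib.crc32` is used because Python's built-in `hash()` of strings is randomized per process and would not be deterministic across nodes, and the simple rotating (round-robin) placement stands in for whatever distribution a real `assignPartitions()` uses.

```python
import zlib
from typing import Callable


def partition(key: str, parts: int) -> int:
    """Map a key to a partition: stable hash, then division remainder."""
    return zlib.crc32(key.encode("utf-8")) % parts


def assign_partitions(
    parts: int,
    nodes: list[str],
    backups: int,
    node_filter: Callable[[str], bool],
) -> dict[int, list[str]]:
    """Spread partitions over the nodes accepted by node_filter.

    Returns {partition: [primary, backup_1, ...]}; copies of one partition
    always land on distinct nodes.
    """
    eligible = sorted(n for n in nodes if node_filter(n))
    if not eligible:
        raise ValueError("node filter rejected every node")
    copies = min(backups + 1, len(eligible))
    return {
        p: [eligible[(p + i) % len(eligible)] for i in range(copies)]
        for p in range(parts)
    }


nodes = ["clients-1", "clients-2", "clients-3", "accounting-1", "accounting-2"]
clients_layout = assign_partitions(
    parts=4, nodes=nodes, backups=1,
    node_filter=lambda n: n.startswith("clients-"),
)
print(clients_layout)
print(clients_layout[partition("client:42", 4)])
```

Keeping the node filter separate from the placement function lets each data area (for example, clients and accounting) be pinned to its own set of nodes while sharing the same placement logic.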
Cell
Diagram (Datacenter 1, Datacenter 2): 1 2 3 4 | 2 3 4 5 | 3 4 5 6 | 4 5 6 7 | 5 6 7 8 | 6 7 8 1 | 7 8 1 2 | 8 1 2 3
- more nodes in the cluster → faster recovery
- more linked nodes → stronger performance impact
8 nodes (a cell).
3 backups.
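The cell layout can be reproduced with a short Python sketch. Two assumptions are not stated in the extracted text: each group of four numbers is read as the partitions held by one node (node n holds partitions n to n+3, wrapping modulo 8), and nodes alternate between the datacenters (odd nodes in DC1, even in DC2). Under these assumptions every partition keeps two of its four copies (primary plus 3 backups) in each datacenter, and a failed node's data is restored only from nodes inside its own cell, which is presumably how the cell caps the number of "linked nodes" regardless of total cluster size. The helper names are hypothetical.

```python
CELL_SIZE = 8   # nodes per cell (from the slide)
BACKUPS = 3     # backups per partition (from the slide)


def cell_layout(size: int = CELL_SIZE, backups: int = BACKUPS) -> dict[int, list[int]]:
    """Node n (1-based) holds partitions n, n+1, ..., n+backups, wrapping mod size."""
    return {
        n: [(n - 1 + i) % size + 1 for i in range(backups + 1)]
        for n in range(1, size + 1)
    }


def holders(layout: dict[int, list[int]], part: int) -> list[int]:
    """Nodes that hold a copy of the given partition."""
    return sorted(n for n, parts in layout.items() if part in parts)


def linked_nodes(layout: dict[int, list[int]], node: int) -> list[int]:
    """Nodes sharing at least one partition with `node` (involved in its recovery)."""
    mine = set(layout[node])
    return sorted(n for n, parts in layout.items() if n != node and mine & set(parts))


def datacenter(node: int) -> str:
    # ASSUMPTION: odd nodes in DC1, even nodes in DC2.
    return "DC1" if node % 2 else "DC2"


layout = cell_layout()
print(layout[1], layout[8])      # [1, 2, 3, 4] [8, 1, 2, 3]
print(holders(layout, 1))        # [1, 6, 7, 8]
print(linked_nodes(layout, 1))   # [2, 3, 4, 6, 7, 8]

# Under the alternating-DC assumption each partition has 2 copies per DC.
for p in range(1, CELL_SIZE + 1):
    dcs = [datacenter(n) for n in holders(layout, p)]
    assert dcs.count("DC1") == dcs.count("DC2") == 2
```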
… different racks.
… network provides stable high-speed connectivity.
… interconnect reduces split-brain probability.
… NVMe flash and HDDs.
Failure scenarios (diagrams of DC1 and DC2): regular operation; datacenter loss; DC interconnect loss; fragmentation type 1; fragmentation type 2; fragmentation type 3.
true (default) false
Legend: ✓ = yes, ✗ = no
DC1       DC2       Data   Decision
All       Partial   ✓      RW
All       Partial   ✗
All       None      ✓      RW
All       None      ✗
Partial   All       ✓      AW
… and wait for admin interaction
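One plausible reading of the "Data" column is whether the surviving nodes still hold at least one copy of every partition; this interpretation, the function name `data_complete` and the example node sets are assumptions. The sketch checks that condition for an 8-node cell in which partition p has copies on nodes p-3 to p (wrapping modulo 8), assuming odd nodes sit in DC1 and even nodes in DC2. The table's decision also depends on which datacenter is intact (the last row yields AW rather than RW although data is complete), so completeness alone does not determine the decision.

```python
CELL_SIZE = 8

# Partition p (1-based) has copies on nodes p-3 .. p, wrapping modulo CELL_SIZE.
assignment = {
    p: sorted((p - 4 + i) % CELL_SIZE + 1 for i in range(4))
    for p in range(1, CELL_SIZE + 1)
}


def data_complete(assignment: dict[int, list[int]], alive: set[int]) -> bool:
    """True if every partition has at least one copy on a reachable node."""
    return all(any(n in alive for n in owners) for owners in assignment.values())


dc1 = {1, 3, 5, 7}  # ASSUMPTION: odd nodes in DC1

print(data_complete(assignment, dc1 | {2, 4}))  # DC1 all, DC2 partial: True
print(data_complete(assignment, dc1))           # DC1 all, DC2 none: True
print(data_complete(assignment, {1, 3}))        # only part of DC1 left: False
```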
Quorum node (diagram labels: STOP, STOP)
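A quorum node is commonly used as a tie-breaker between two datacenters: after a split, a fragment keeps running only if it can reach a strict majority of the voters. The Python sketch below illustrates this generic majority rule; the voter names and the `may_continue` function are hypothetical, and the slide does not specify the exact protocol. With three voters at most one fragment can hold a majority, which rules out split-brain; if neither datacenter can reach the quorum node, both fragments stop.

```python
VOTERS = frozenset({"DC1", "DC2", "QUORUM"})  # hypothetical voter names


def may_continue(reachable: set[str]) -> bool:
    """A fragment may keep serving only with a strict majority of voters.

    `reachable` holds the voters this fragment can see, including itself.
    """
    return 2 * len(reachable & VOTERS) > len(VOTERS)


# DC interconnect lost, quorum node still reachable from DC1 only.
print(may_continue({"DC1", "QUORUM"}), may_continue({"DC2"}))  # True False

# Interconnect and quorum node both lost: neither side has a majority.
print(may_continue({"DC1"}), may_continue({"DC2"}))            # False False
```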
Diagram labels: Trx processing, Paged memory, Paged disk storage (files), Write-ahead log, Async
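The persistence components named on the slide (transaction processing, paged memory, paged disk storage, write-ahead log) fit the classic write-ahead logging scheme, sketched below as an in-memory Python toy. The class and method names are hypothetical, lists and dicts stand in for durable files, and reading "Async" as the background flush of dirty pages to disk (a checkpoint) is an assumption: a commit only appends to the log and updates memory, a checkpoint later copies dirty pages to disk, and recovery replays the log tail on top of the last checkpoint.

```python
import threading


class PagedStore:
    """Toy write-ahead-logging store (lists/dicts stand in for durable files)."""

    def __init__(self) -> None:
        self.wal: list[tuple[str, str]] = []  # write-ahead log (append-only)
        self.memory: dict[str, str] = {}      # paged memory
        self.disk: dict[str, str] = {}        # paged disk storage
        self.dirty: set[str] = set()          # pages changed since last checkpoint
        self.checkpoint_lsn = 0               # WAL length covered by the disk image
        self._lock = threading.Lock()

    def commit(self, page_id: str, value: str) -> None:
        with self._lock:
            self.wal.append((page_id, value))  # log first (fsync in a real system)...
            self.memory[page_id] = value       # ...then change the in-memory page
            self.dirty.add(page_id)

    def checkpoint(self) -> None:
        """Flush dirty pages to disk; meant to run in a background thread.

        Simplified: holds the lock for the whole flush.
        """
        with self._lock:
            for page_id in sorted(self.dirty):
                self.disk[page_id] = self.memory[page_id]
            self.dirty.clear()
            self.checkpoint_lsn = len(self.wal)

    def recover(self) -> dict[str, str]:
        """Rebuild pages after a crash: disk image + WAL records after the checkpoint."""
        pages = dict(self.disk)
        for page_id, value in self.wal[self.checkpoint_lsn:]:
            pages[page_id] = value
        return pages


store = PagedStore()
store.commit("p1", "a")
store.commit("p2", "b")
flusher = threading.Thread(target=store.checkpoint)  # asynchronous flush
flusher.start()
flusher.join()
store.commit("p1", "c")   # logged and in memory, not yet on disk
print(store.disk)         # {'p1': 'a', 'p2': 'b'}
print(store.recover())    # {'p1': 'c', 'p2': 'b'}
```

Because every change reaches the log before the page is flushed, the disk image may lag behind memory without losing committed data.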