Monolithic Batch Goes Microservice Streaming
A story about one transformation
Charles Tye & Anton Polyakov


SLIDE 1

Monolithic Batch Goes Microservice Streaming

A story about one transformation

Charles Tye & Anton Polyakov

SLIDE 2

SLIDE 3

Who are We?

Anton Polyakov: Head of Application Development, 2 years in Nordea
Charles Tye: Head of Core Services & Risk IT, 17 years in Nordea

What We Do

We develop solutions for Market Risk, Credit Risk, Liquidity Risk, Stress Testing, and Messaging, together with around 70 other people from all over the world.

SLIDE 4

Market Risk

The high-level view: quantify potential losses and exposures. Do many small risks add up to one big risk? Can risks combine in unusual and unexpected ways?

SLIDE 5

Market Risk

Line of Defence: protect Nordea and our customers.

  • Daily internal reporting and external reporting to regulators
  • Independent function
  • Analysis and insight into the sources of risk
  • Control of risk
  • Management of capital

SLIDE 6

Examples of Risk Analysis

Value at Risk: look at the last 2 years of market history and take the average of the worst 1% of outcomes. Simulate what would happen if the same history repeated today. Highly non-linear, but there is a requirement to drill in and find the drivers.
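The recipe above fits in a few lines. This is a toy sketch with simulated P&L standing in for real market history; all names and numbers are illustrative, not the production implementation:

```python
import random

def worst_tail_average(pnl_outcomes, worst_fraction=0.01):
    """Average of the worst `worst_fraction` of P&L outcomes,
    as described on the slide."""
    ranked = sorted(pnl_outcomes)                # worst (most negative) first
    n_tail = max(1, int(len(ranked) * worst_fraction))
    return sum(ranked[:n_tail]) / n_tail

# Roughly 2 years of daily P&L (about 500 trading days), simulated here
random.seed(7)
history = [random.gauss(0.0, 1_000_000.0) for _ in range(500)]
print(f"risk estimate: {worst_tail_average(history):,.0f}")
```

Because the measure looks only at the tail of the full scenario distribution, it cannot be maintained incrementally the way a sum or a count can, which is exactly the non-linearity the deck keeps returning to.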

SLIDE 7

Examples of Risk Analysis

Stress Scenarios: “Black Swan” worst-case scenarios and unexpected outcomes from future events. Example: Brexit. Simulate what would happen if it occurred.

SLIDE 8

An Interesting Technology Problem

Risk analysis demands consistency, non-linearity, volume, and speed:

  • Consistent – everything has to be included, so you must know when you are complete
  • Non-linear – risk does not sum over hierarchies; drill-down is non-trivial, and the traditional OLAP aggregate-and-increment approach doesn’t work
  • Volume – on the order of 10,000,000,000,000 data points
  • Speed – reactive near real-time calculations, streaming data, fast corrections and “what-if”, interactive sub-second queries on huge data sets
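Why risk does not sum over hierarchies can be shown in a few lines; the two desks and the worst-case measure below are purely illustrative:

```python
# Two desks whose losses land in different scenarios:
desk_a = [-10.0, 1.0, 1.0, 1.0]       # per-scenario P&L, loses in scenario 1
desk_b = [1.0, 1.0, 1.0, -10.0]       # loses in scenario 4

def worst_case(pnl):
    """Toy non-linear risk measure: the single worst scenario outcome."""
    return min(pnl)

# Naive OLAP-style roll-up: sum the children's risk numbers.
naive = worst_case(desk_a) + worst_case(desk_b)        # -20.0

# Correct roll-up: sum the scenario vectors first, then measure.
parent = [a + b for a, b in zip(desk_a, desk_b)]       # [-9, 2, 2, -9]
true_risk = worst_case(parent)                         # -9.0

print(naive, true_risk)   # the two disagree, so aggregate-and-increment fails
```

The desks hedge each other across scenarios, so the parent's risk is far smaller than the sum of the children's: any engine that only stores per-node risk numbers and increments them upward gets the wrong answer.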

SLIDE 9

Challenge No. 1: Spaghetti

  • Find the seams
  • Break it up
  • Reusable components
  • Replace a piece at a time

SLIDE 10

Challenge No. 2

  • Develop a new service
  • Integrate it into the legacy system
  • Reconcile the output
  • Find and fix legacy bugs
  • Fight complification

SLIDE 11

Challenge No. 3: Consistency is seriously hard to combine with streaming

Batch is synchronous state transfer. Is it the only way to achieve consistency? An event-sourced and streaming approach is more robust, scalable, and faster, especially for recovery, but it comes with a cost.

SLIDE 12

Challenge No. 4: Legacy SQL was slow

Replace it with in-memory aggregation:

  • Aggregate billions of scenarios in-memory and pre-compute total vectors over hierarchies (linear)
  • Non-linear measures computed lazily
  • Reactive and continuous queries
  • Partitions and scales out horizontally across commodity hardware; tougher challenges on terabyte-scale hardware due to NUMA limitations
  • Some cubes are already > 200 GB, and larger ones are planned
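The linear/lazy split described above can be sketched as follows. The hierarchy, node names, and tail measure are invented for illustration, not Nordea's actual model:

```python
def total_vector(node, hierarchy, leaf_vectors, cache=None):
    """Pre-compute the total scenario P&L vector for `node`.
    Summing vectors is linear, so parent = elementwise sum of children,
    and the result can be cached once and reused for every query."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    if node in leaf_vectors:                      # leaf: its own scenario P&Ls
        vec = leaf_vectors[node]
    else:                                         # internal node: sum children
        child_vecs = [total_vector(c, hierarchy, leaf_vectors, cache)
                      for c in hierarchy[node]]
        vec = [sum(vals) for vals in zip(*child_vecs)]
    cache[node] = vec
    return vec

def tail_measure(vector, worst_fraction=0.01):
    """Non-linear measure (VaR-style tail average), computed lazily from
    the cached total vector only when a user actually drills in."""
    ranked = sorted(vector)
    n_tail = max(1, int(len(ranked) * worst_fraction))
    return sum(ranked[:n_tail]) / n_tail

# Illustrative hierarchy: a bank with two desks and 4 scenarios each
hierarchy = {"bank": ["rates_desk", "fx_desk"]}
leaves = {"rates_desk": [-3.0, 1.0, 2.0, -1.0],
          "fx_desk":    [2.0, -2.0, 1.0, 1.0]}
bank = total_vector("bank", hierarchy, leaves)
print(bank)                        # elementwise sum: [-1.0, -1.0, 3.0, 0.0]
print(tail_measure(bank, 0.25))    # worst 25% of 4 scenarios -> -1.0
```

The expensive linear part (vector aggregation) is done once and kept in memory; the cheap non-linear part (the tail measure) runs on demand, which is what makes sub-second interactive drill-down feasible.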

SLIDE 13

Solution: Microservices!

Well, almost…

  • Single responsibility – replace pieces of legacy from the inside out
  • Self-contained, with business-functional boundaries
  • Independent and rapid development – the team owns the whole stack
  • Organisationally scalable – horizontally scale your teams
  • Flexible and maintainable – evolve the architecture
  • Smart endpoints and dumb pipes
  • Innovation and short lifecycles

SLIDE 14

The problem

  • Business:
      • Multi-model Market Risk calculator for the Nordea portfolio
      • VaR at different organisational levels, with 5-6 different models in parallel
  • IT:
      • 7,000 CPU hours of grid calculation
      • More than 4,000 SQL jobs
      • A graph with more than 10,000 edges
      • A nightly batch flow

SLIDE 15

What did it look like?

  • Well, you know: 10 years of development
  • In SQL
  • No refactoring (who needs it?)

SLIDE 16

Precisely, how did it look?


SLIDE 17

Logical architecture

Monolith staged app


SLIDE 18

Now, a little complication

  • Sloo-o-o-ow
  • Fat, so it breaks
  • Can it be made parallel?

SLIDE 19

So what to do?

We probably all know the answer (since we are at this section ☺):

  • Find logically isolated blocks
  • Keep an eye on the non-functional aspects
  • Think about how they communicate
  • Think about what happens if something dies

SLIDE 20

Not quite “classical” microservices… or is it?

produce → enrich → aggregate

  • Request/response is not feasible
  • Synchronous interaction takes too long
  • Some results are expensive to reproduce

SLIDE 21

So we need…

A middleware which

  • “Glues” services together
  • Caches important results
  • Serves as a coordinator and work distributor


SLIDE 22

What Redis gives us:

  • Scale-out
  • Fast pub/sub
  • Queues and sets for pull and dedup
  • Distributed locks

SLIDE 23

Locks? Who needs locks?

SLIDE 24

Pub/sub messaging as a notifier

Producer → Enricher → Aggregator → Consumer, each stage writing to its own store, with Redis pub/sub carrying the notifications between them.

SLIDE 25

But…

There are two main problems in distributed messaging:
2) Guarantee that each message is only delivered once
1) Guarantee message order
2) Guarantee that each message is only delivered once

SLIDE 26

Enricher

The Producer stores its output and notifies via Redis pub/sub; the Enricher atomically moves each message from the incoming queue to its own processing queue with BRPOPLPUSH, so work in flight survives a crash.
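The reliable-queue pattern above can be sketched without a live Redis server; this in-process stand-in mimics BRPOPLPUSH's atomic pop-and-push (class and method names are illustrative):

```python
from collections import deque

class ReliableQueue:
    """In-process stand-in for the Redis BRPOPLPUSH pattern: atomically
    move an item from the incoming queue to a processing queue, so a
    crashed worker's in-flight item is never lost."""
    def __init__(self):
        self.incoming = deque()
        self.processing = deque()

    def push(self, item):
        self.incoming.appendleft(item)          # LPUSH onto incoming

    def claim(self):
        """BRPOPLPUSH: pop from the incoming tail, park in processing."""
        if not self.incoming:
            return None
        item = self.incoming.pop()              # RPOP side
        self.processing.appendleft(item)        # LPUSH side
        return item

    def ack(self, item):
        """Work finished: drop the item from the processing queue (LREM)."""
        self.processing.remove(item)

    def requeue_orphans(self):
        """Recovery: anything still in processing goes back to incoming."""
        while self.processing:
            self.push(self.processing.pop())

q = ReliableQueue()
q.push("trade-1"); q.push("trade-2")
item = q.claim()          # "trade-1", now safely parked in processing
q.ack(item)               # enricher finished; remove it from processing
print(q.claim())          # -> trade-2
```

If the worker dies between `claim` and `ack`, the item still sits in the processing queue and `requeue_orphans` puts it back, which is exactly the guarantee the atomic Redis operation provides.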

SLIDE 27

Sets and hashmaps – all good for dedup

In an eventually consistent world, dedup is your best friend. The Enricher may write the same result multiple times due to recovery, but storing with HSET makes the write idempotent, so the state stays consistent.
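The idempotent-write idea can be shown with a plain dict standing in for a Redis hash (names are illustrative):

```python
class DedupStore:
    """In-process stand-in for HSET-based dedup: writes are keyed, so
    replaying the same message after a recovery is a harmless overwrite
    (idempotent) instead of a double-count."""
    def __init__(self):
        self.h = {}                          # field -> value, like a Redis hash

    def hset(self, field, value):
        """Returns True if this was a new field (first delivery)."""
        is_new = field not in self.h
        self.h[field] = value
        return is_new

store = DedupStore()
# First delivery, then a recovery-driven redelivery of the same result:
store.hset("trade-1", {"pnl": 42.0})
store.hset("trade-1", {"pnl": 42.0})     # duplicate: overwrite, no double-count
print(len(store.h))                      # -> 1
```

This is why at-least-once delivery is good enough: the store, not the messaging layer, enforces exactly-once semantics.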

SLIDE 28

So how do we scale out?

  • Logically: Enrichers are split by type (type A, type B, … type X), each filtering its own events from Redis pub/sub
  • Concurrently: Aggregators are split by business day (day 1, day 2, day 3) and can steal work from each other, coordinated with RedLock + TTL
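The RedLock + TTL coordination can be sketched with a single in-process lock; the real thing runs against Redis, but the expiry-then-steal behaviour is the same (worker names and TTLs here are invented):

```python
import time

class TtlLock:
    """In-process stand-in for a RedLock-style lock with a TTL: a worker
    that dies simply stops renewing its lease, and once the TTL lapses
    another worker can steal the partition."""
    def __init__(self):
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, worker, ttl_seconds, now=None):
        """Try to take (or renew) the lock; returns True on success."""
        now = time.monotonic() if now is None else now
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = worker, now + ttl_seconds
            return True
        if self.owner == worker:
            self.expires_at = now + ttl_seconds   # owner renews its lease
            return True
        return False

lock = TtlLock()
assert lock.acquire("aggregator-day1-a", ttl_seconds=5, now=0.0)
# A second worker cannot grab the partition while the lock is live:
assert not lock.acquire("aggregator-day1-b", ttl_seconds=5, now=2.0)
# The first worker dies; once the TTL lapses, the partition is stolen:
assert lock.acquire("aggregator-day1-b", ttl_seconds=5, now=6.0)
print("partition stolen by aggregator-day1-b")
```

Combined with the dedup store on the previous slide, a stolen partition can safely be reprocessed from scratch: any rows the dead worker already wrote are simply overwritten.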

SLIDE 29

Demo

The full pipeline: Producer → Enricher → Aggregator → Consumer over Redis pub/sub, with incoming and processing queues, a store per stage, and RedLock + TTL for coordination.

SLIDE 30

The Result and What We Learned

Success!

  • Aggregate and produce risk: 5 hours → 30 mins
  • Corrections: 40 mins → 1 second
  • Earlier deliveries – more time to manage the risks
  • Faster recovery from problems
  • Happy risk managers

It was important (and painful) to integrate new services into the existing system. Consistency is hard to combine with streaming (the subject of another talk, maybe). And when distributing, remember the First Law of Distributed Object Design…

(do you remember it?)

SLIDE 31

The Result and What We Learned

First Law of Distributed Object Design: "don't distribute your objects"


SLIDE 32

And of course…


https://dk.linkedin.com/in/charles-tye-a8aa88b

https://github.com/parallelstream/

SLIDE 33