Unpredictable & interactive analysis of terabytes of data (Amadeus)


Unpredictable & interactive analysis of terabytes of data. Amadeus Revenue Accounting Metadata Search. Big Data Paris, 11 March 2015. Laurent Dollé, ldolle@amadeus.com


SLIDE 1

Unpredictable & interactive analysis of terabytes of data

Amadeus Revenue Accounting Metadata Search

Big Data Paris, 11 March 2015

Laurent Dollé

ldolle@amadeus.com

SLIDE 2

1. Amadeus today

SLIDE 3

Amadeus

In a few words

Amadeus is a technology company dedicated to the global travel industry.

We are present in 195 countries with a worldwide team of more than 11,000 people.

Our solutions help improve the business performance of travel agencies, corporations, airlines, airports, hotels, railways and more.

SLIDE 4

Connecting

The travel industry

Cruise lines, hotels, car rental, ground handlers, ferry operators, ground transportation, airports, travel agencies, insurance companies, airlines

SLIDE 5

Supporting

The traveler life cycle

Inspire, Search, Buy/Purchase, Pre-trip, On trip, Post-trip

SLIDE 6

Robust

Global operations

We designed & own our Data Processing Centres
_ Central DC @ Erding, Germany
_ Remote DCs all over the globe
_ Recovery DC on standby in case of natural disasters

1.6+ billion transactions processed per day

502+ million travel agency bookings processed in 2013

615+ million Passengers Boarded in 2013

95% of the world’s scheduled network airline seats

SLIDE 7

Close

To our customers

SLIDE 8

Our commitment

To innovation

_ Amadeus has invested €2.9bn in Research & Development since 2004.
_ Nominated within “top 3” software companies in the 2013 European Union Industrial R&D Investment Scorecard.

SLIDE 9

Global air travel

Is a growth industry

Amadeus growth is powered by a sustainable transaction-based business model

_ 2012: 2.98 billion air passengers
_ 2017: 3.91 billion air passengers
_ 31% growth

Source: IATA. Airline Industry forecast 2013-2017

SLIDE 10

2. Amadeus Revenue Accounting

SLIDE 11

Passenger Revenue Accounting

What for?

Revenue of a flight ticket is shared
_ Travel agent
_ Governments
_ Airlines: many can be involved (marketing & operating)

Amadeus Revenue Accounting handles cash flows on behalf of airlines
_ Tracking
_ Error handling & optimisation
_ Reporting: analysis & audit

SLIDE 12

Increasing accuracy

By leveraging our GDS position

Distribution & IT: in common
  • Data centres
  • Platforms and applications
  • Sales & marketing infrastructure
  • Customers

Real-time tracking of airline’s passenger sales revenue
_ at usage time: effective revenue
_ at sale time, weeks before: expected revenue

SLIDE 13

Amadeus Revenue Accounting

Key benefits & features

_ Facilitate strategic decisions
_ Optimise revenue accounting processes

Web apps, APIs & feeds hosted in the Amadeus cloud (SaaS)

SLIDE 14

3. Metadata Search business needs

SLIDE 15

Business needs

Gathered from a launch partner

One of our launch partners is a large European airline
_ transporting 35m+ passengers a year
_ key player in the revenue accounting industry

They requested a user-friendly way to query any data in our main operational database
_ Unpredictable ad-hoc search
_ Many advanced reporting requirements

Migrating
_ from their in-house data warehouse
_ to our cloud-based solution

SLIDE 16

Metadata Search

The main promises

_ Graphical user interface: edit, import, save & share queries
_ Data warehouse: fed in real time, 4 years history (140m+ documents, versioned)
_ Interactive response times
_ Search further using chained queries (patent pending)

SLIDE 17

Project milestones

And possible impacts

_ November 2013: User acceptance testing
_ December 2014: Migration & parallel running validation on production
_ Summer 2015: Production cut-over
_ Post cut-over: SLA & optimisation based on usage statistics

Any delay or functional gap may impact the whole project, as the application is used to validate the migration and parallel running phases.

SLIDE 18

4. User-friendly SQL graphical user interface

SLIDE 19

SQL paradigm

Split into 2 functional areas

2 functional areas can be defined
_ Search criteria: predicates filtering the results
_ Displayed data: projections and related functions

SELECT A, SUM(B) WHERE A > C AND B > D GROUP BY A ORDER BY A
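
As a rough illustration of this split, the sketch below maps the two functional areas onto a MongoDB-style aggregation pipeline written as plain Python dictionaries: the search criteria become a `$match` stage, and the displayed data become `$group` and `$sort` stages. The helper name `build_pipeline`, the output field `total`, and the treatment of C and D as constants are assumptions made for the example; the deck does not show how the GUI actually generates its queries.

```python
# Hypothetical helper: translates the two functional areas of the SQL
# paradigm into aggregation stages. Not the deck's actual query generator.
COMPARATORS = {">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte", "=": "$eq"}


def build_pipeline(search_criteria, group_by, sum_field):
    """Build a pipeline from search criteria and displayed data.

    search_criteria: list of (field, comparator, constant) tuples (WHERE part)
    group_by:        field used for GROUP BY and ORDER BY
    sum_field:       field aggregated with SUM
    """
    match = {}
    for field, comparator, constant in search_criteria:
        # Several criteria on the same field are merged into one sub-document.
        match.setdefault(field, {})[COMPARATORS[comparator]] = constant
    return [
        {"$match": match},                                   # search criteria
        {"$group": {"_id": "$" + group_by,                   # displayed data
                    "total": {"$sum": "$" + sum_field}}},
        {"$sort": {"_id": 1}},                               # ORDER BY A
    ]


# SELECT A, SUM(B) WHERE A > 10 AND B > 0 GROUP BY A ORDER BY A
pipeline = build_pipeline([("A", ">", 10), ("B", ">", 0)],
                          group_by="A", sum_field="B")
```

The resulting list is the kind of structure a driver's aggregate call accepts; comparing two columns with each other (rather than a column with a constant) would need a different construction and is left out of this sketch.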

SLIDE 20

Graphical user interface

Query editor

SLIDE 21

Graphical user interface

Query editor

SLIDE 22

5. Technical constraints

SLIDE 23

Expecting fast answers

To unpredictable queries

No index, no hint (almost)
_ Fields to be scanned unknown
_ Main-memory full scans to decrease response time

Need to scale out for sustainable performance

Support mainstream SQL DML statements
_ Aggregation
_ Cross-column comparison, Boolean logic
_ Sort

SLIDE 24

Resilient & user-friendly versioning

Featuring a document timeline

Document timeline implemented to efficiently retrieve a particular version of a document, based on arbitrary date, event name, flags

Efficient upserts & transactions needed to replace or update multiple versions at each write

Timeline
_ 1.0 Issuance
_ 1.1 Issuance confirmation
_ 2.0 Exchange
_ 3.0 Usage
_ 3.1 Usage (replay)
_ 3.2 Usage (replay)

Events out of timeline
_ 2.1 Exchange (replay)
_ 4.0 Exchange
_ conflict: 3.2 bumped out of timeline

Flags
_ last issuance, last issuance confirmation, last exchange, last usage
_ last 1.x, last 2.x, last 3.x, last 4.x
_ final event
_ conflict
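
The lookups this slide describes (by arbitrary date, by event name, by flag) can be sketched in a few lines of Python. The record layout, the dates, and the function names below are all hypothetical and only serve the illustration; the deck does not describe how the timeline is actually stored or queried.

```python
from datetime import date

# Hypothetical timeline for a single ticket document. Versions and event
# names follow the slide; the dates are invented for the example.
TIMELINE = [
    {"version": (1, 0), "event": "Issuance", "date": date(2015, 1, 5)},
    {"version": (1, 1), "event": "Issuance confirmation", "date": date(2015, 1, 6)},
    {"version": (2, 0), "event": "Exchange", "date": date(2015, 1, 20)},
    {"version": (3, 0), "event": "Usage", "date": date(2015, 2, 10)},
    {"version": (3, 1), "event": "Usage (replay)", "date": date(2015, 2, 11)},
]


def version_as_of(timeline, when):
    """Return the latest version dated on or before `when`, or None."""
    candidates = [v for v in timeline if v["date"] <= when]
    return max(candidates, key=lambda v: v["date"], default=None)


def last_of_event(timeline, event):
    """Return the highest version carrying the given event name, or None."""
    matches = [v for v in timeline if v["event"] == event]
    return max(matches, key=lambda v: v["version"], default=None)


def last_of_major(timeline, major):
    """Flag-style lookup such as 'last 3.x': highest minor of a major version."""
    matches = [v for v in timeline if v["version"][0] == major]
    return max(matches, key=lambda v: v["version"], default=None)
```

For instance, `version_as_of(TIMELINE, date(2015, 1, 25))` picks version 2.0 in this made-up timeline, and `last_of_major(TIMELINE, 3)` picks 3.1. Precomputed flags such as "last usage" would avoid recomputing these maxima at query time, which is presumably why several versions must be updated at each write.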

SLIDE 25

Schema-less document store

For agile integration

Our main operational database is an Oracle document store containing Protocol Buffers documents (4000+ fields)

A schema-less document store would ease
_ the ETL transformation process (400+ metadata fields to load)
_ the data model maintenance & synchronization between both databases

SLIDE 26

Consistency & availability

And their impacts

Consistency favoured over availability (CAP)
_ Expecting accuracy since data used by auditors
_ However: no operational impact, application is not MCA

No contractual SLA
_ To be agreed after benchmarking on production
_ Interactive response times expected with very few parallel users
_ Full outages out of business hours accepted

SLIDE 27

Integration

In the Amadeus standards

Runs on standard x86 architecture

C++, Python & Java drivers

Enterprise-grade security
_ SSL encryption
_ Kerberos authentication
_ Data-at-rest encryption

SLIDE 28

Considered alternatives

To MongoDB

_ Oracle
Mounting all data in memory is ruled out for cost & hardware reasons: 90TB for our biggest prospect.

_ MySQL cluster
Technical & functional limitations, complex to implement & maintain.

_ Impala
Still young, with a steep learning curve.
Distributed data analysis not exactly matching our use-case.

_ Couchbase
Slightly behind MongoDB for document search (index mandatory).
N1QL not finalized.
Key-value store not exactly matching our use-case.

_ Crescando
Amadeus in-house R&D database engine (index-less, main-memory only, partitioning data at CPU core level).
Project terminated.

SLIDE 29

6. Technical architecture

SLIDE 30

Enforcing parallel processing

To speed up aggregation queries

A MongoDB daemon (mongod) processes any incoming query on a single thread.

Modern hardware architectures feature many sockets (2-4) and many cores (8-16), meaning wasted computing power if we do not enforce parallel processing.

Our online analytical processing use-case implies an intense workload (full scans) with limited concurrency, as queries are queued and run sequentially.

Microsharding solves this issue. The database is highly sharded (as many shards as cores), so that each shard spawns its own thread, thus efficiently sharing the workload across the whole CPU power.
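
The scatter-gather pattern behind microsharding can be sketched as follows: documents are partitioned into as many shards as workers, each shard is fully scanned independently, and the partial aggregates are merged. This is only a toy model, not the deck's implementation: Python threads stand in for separate mongod processes (and, because of the interpreter lock, give no real CPU speedup here), and the field names `airline` and `amount` are invented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(docs, n_shards):
    """Distribute documents across shards using a simple modulo on _id."""
    shards = [[] for _ in range(n_shards)]
    for doc in docs:
        shards[doc["_id"] % n_shards].append(doc)
    return shards


def scan_shard(shard, predicate):
    """Full scan of one shard: filter, then aggregate amounts per airline."""
    partial = Counter()
    for doc in shard:
        if predicate(doc):
            partial[doc["airline"]] += doc["amount"]
    return partial


def scatter_gather(shards, predicate):
    """Scan every shard with its own worker, then merge the partial sums."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(lambda s: scan_shard(s, predicate), shards):
            total.update(partial)  # Counter.update adds the partial counts
    return total


# Toy data set: 100 documents spread over two hypothetical airlines.
docs = [{"_id": i, "airline": "XX" if i % 2 else "YY", "amount": i}
        for i in range(100)]
result = scatter_gather(partition(docs, 8), lambda d: d["amount"] >= 50)
```

The merged result does not depend on the number of shards, which is what lets the shard count be tuned to the number of cores without changing query semantics.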

SLIDE 31

Benchmarking CPU usage

Through in-memory microsharding

_ Microsharding validated, from 6 to 48 shards on 6 physical servers
Performance increases almost linearly with the number of shards

_ On-the-fly rebalancing validated
Cleaning step is mandatory (12 shards and more)

[Figure: two plots of time against number of shards, "Full scan" and "Full scan with aggregation"]

SLIDE 32

Benchmarking scalability

Through data ramp-up

_ Response times scale linearly with the amount of scanned data
_ Positive impact of caching (light blue dots) validated on full scans only

[Figure: two plots of time against data size, "Full scan" and "Full scan with aggregation"]

Behaviour reproduced for 2 shard distributions: 24 & 48 shards on 6 physical servers, 100% in-memory

SLIDE 33

Benchmarking scalability

Through generated search criteria

[Figure: two plots of time against number of search criteria, "Full scan: OR & AND" (search criteria pairs, A and B) and "Full scan: IN"]

_ Response times scale linearly with the number of search criteria

SLIDE 34

Production cluster setup

Facts & figures

6 physical data servers
_ Server: HP ProLiant DL580 Gen8 (4 sockets, x86, rack)
_ 4x CPU: Intel Xeon E7-4850 v2 (2.30 GHz, 12 physical cores)
_ RAM 512GB (40GB/s scanning speed)
_ 2x flash cards: Fusion-io ioScale 3.2TB (1.5GB/s read)

3 virtual config servers
_ RAM 8GB

Overall cluster
_ 288 cores, 288 sharded replica sets (2x+1)
_ 3TB RAM, 38.4TB flash card storage

Currently 1 year of production data (4 expected)
_ 250m+ docs (1bn)
_ Data size 2.8TB (11TB): docs with padding
_ Average object size 11.9KB
_ File size 3.97TB (16TB): data & index extents
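
The overall cluster figures follow from the per-server figures, and the bracketed 4-year values from multiplying the 1-year values by four. The short check below reproduces that arithmetic; it assumes 1TB = 1024GB for RAM and linear growth over the 4 years, neither of which is stated on the slide.

```python
# Per-server figures taken from the slide.
DATA_SERVERS = 6
SOCKETS_PER_SERVER = 4
CORES_PER_SOCKET = 12
RAM_GB_PER_SERVER = 512
FLASH_CARDS_PER_SERVER = 2
FLASH_TB_PER_CARD = 3.2

total_cores = DATA_SERVERS * SOCKETS_PER_SERVER * CORES_PER_SOCKET
total_ram_tb = DATA_SERVERS * RAM_GB_PER_SERVER / 1024  # assumes 1TB = 1024GB
total_flash_tb = round(
    DATA_SERVERS * FLASH_CARDS_PER_SERVER * FLASH_TB_PER_CARD, 1)

# Projection from 1 year of data to the 4 years expected (assumed linear).
YEARS_EXPECTED = 4
projected_docs_millions = 250 * YEARS_EXPECTED        # slide: 1bn
projected_data_tb = round(2.8 * YEARS_EXPECTED, 1)    # slide: 11TB
projected_file_tb = round(3.97 * YEARS_EXPECTED, 2)   # slide: 16TB

print(total_cores, total_ram_tb, total_flash_tb)
print(projected_docs_millions, projected_data_tb, projected_file_tb)
```

Under these assumptions the projected file size (about 16TB) exceeds the 3TB of RAM but fits within the 38.4TB of flash storage.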

SLIDE 35

Technical architecture

[Architecture diagram. Recoverable labels:]
_ Clusters: REV OBE OLTP CLUSTER - SLES, REV OBE BATCH CLUSTER - SLES, MONGODB CLUSTER - RHEL, ORACLE CLUSTER
_ Oracle side: Revenue Accounting operational database (REV), RA workflow, input queue, error queue, write/read
_ MongoDB side: sharded replica sets (1st, 2nd, x: Mongo daemons & arbiter), config servers, shard routers (service, applicative)
_ Gateways: MSG live gateway, MSG batch gateway, AQG lib C++ driver
_ Feeds: live feed (live trigger), corrective feed, initial/massive feed (mongoimport, JSON files), edifact
_ Front end: Browser, https, SI, MSF front-end
_ Shell & drivers (C++, Python, Java): on-call, debugging & ad-hoc investigation

SLIDE 36

Database customisation

And its results

Microsharding is a powerful way to reduce response times; what else can bring value?

Cgroups

Prevent shards from competing for memory when data does not fit into RAM, especially with microsharding. Low-memory Cgroups may be compressed with zRAM/WiredTiger.

Kernel tuning

Optimize Linux in case of CPU-bound effort (vs. IO-bound): small readahead, THP off, increase task scheduler.

NUMA

Restrict access to CPU & memory for secondary daemons.

Striped replica set

Span shards on all the available hardware, with secondary daemons replicated on different nodes for smooth failover.
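
One possible way to picture a striped replica set is a round-robin placement in which the primary, secondary and arbiter of each shard land on three different servers. The rule below is an assumption chosen for illustration; the deck does not state the placement rule actually used, and the server names are made up.

```python
def striped_layout(n_shards, servers):
    """Place each shard's primary, secondary and arbiter on distinct servers.

    Hypothetical round-robin rule: shard i puts its primary on server i,
    its secondary on the next server and its arbiter on the one after.
    """
    n = len(servers)
    if n < 3:
        raise ValueError("need at least 3 servers to separate the 3 members")
    return [
        {
            "shard": shard,
            "primary": servers[shard % n],
            "secondary": servers[(shard + 1) % n],
            "arbiter": servers[(shard + 2) % n],
        }
        for shard in range(n_shards)
    ]


# Example: 48 shards striped over 6 servers (names are invented).
servers = ["srv1", "srv2", "srv3", "srv4", "srv5", "srv6"]
layout = striped_layout(48, servers)
primaries_per_server = {
    s: sum(1 for entry in layout if entry["primary"] == s) for s in servers
}
```

With this rule every server carries the same number of primaries, and losing one server never removes more than one member of any replica set.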

SLIDE 37

High availability & fault tolerance

Shard, replicate & stripe

[Diagram: topologies going from an unsharded database, to shards, to sharded replica sets, to striped & sharded replica sets, with Mongo daemons labelled 1st, 2nd and x (arbiter) spread over servers A, B and C]

_ Horizontal scaling through sharding
_ High availability through replication (primary & secondary shards)
_ Cheaper, relaxed high-availability through arbiters (empty shards)
_ Hardware fault-tolerance through physical servers

_ Many options & combinations possible
_ Updates performed on-the-fly

SLIDE 38

7. Production benchmarks

SLIDE 39

Production response times

And their lessons learnt

The promise of interactive response times is met on basic use-cases.

In the absence of concurrency, response times are consistent across all tests.

Full scan aggregation is CPU-bound, with a fixed entry cost for unwinds.
_ no unwind: 3s
_ unwinds on 1, 2 or 3 levels: 70s

Indexes have a linear impact on response times.

Complex query with 4 match criteria
_ full scan: 100s
_ index, 40% selectivity: 40s

Complex query with 4 match criteria, including field-on-field comparison
_ full scan: 190s
_ index, 40% selectivity: 70s
_ index, 75% selectivity: 145s
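
The "linear impact" can be checked against the figures above with a one-line model: indexed response time is roughly the full scan time multiplied by the selectivity. The function name and the model itself are an interpretation of the slide, not something it states explicitly; the measurements are the ones quoted on the slide.

```python
# (full scan seconds, index selectivity, measured seconds with the index)
MEASUREMENTS = [
    (100, 0.40, 40),
    (190, 0.40, 70),
    (190, 0.75, 145),
]


def estimate_indexed_time(full_scan_seconds, selectivity):
    """Linear model: only the selected fraction of documents is processed."""
    return full_scan_seconds * selectivity


for full_scan, selectivity, measured in MEASUREMENTS:
    estimate = estimate_indexed_time(full_scan, selectivity)
    relative_error = abs(estimate - measured) / measured
    print(f"selectivity {selectivity:.0%}: estimated {estimate:.1f}s, "
          f"measured {measured}s, relative error {relative_error:.1%}")
```

The model gives 40s, 76s and 142.5s against the measured 40s, 70s and 145s, so the linear reading holds to within about 10% on these three data points.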

Position of the match operator in the aggregation pipeline can impact index usage.
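
To illustrate the point about the match operator, the sketch below contrasts two pipelines that differ only in stage order. In MongoDB a `$match` stage can generally use an index only when it runs at the start of the pipeline, before stages such as `$unwind` reshape the documents. The field names (`airline`, `coupons`) are invented, and `starts_with_match` is a simple heuristic written for this example, not a MongoDB API; the server's optimizer may also reorder some stages on its own.

```python
def starts_with_match(pipeline):
    """Heuristic: is the first stage a $match, where an index can be used?"""
    return bool(pipeline) and "$match" in pipeline[0]


# Filter first: the $match sees the stored documents and can use an index.
match_first = [
    {"$match": {"airline": "XX"}},
    {"$unwind": "$coupons"},
    {"$group": {"_id": "$coupons.status", "count": {"$sum": 1}}},
]

# Filter late: every document is unwound before any filtering happens.
match_late = [
    {"$unwind": "$coupons"},
    {"$match": {"airline": "XX"}},
    {"$group": {"_id": "$coupons.status", "count": {"$sum": 1}}},
]

for name, pipeline in (("match_first", match_first), ("match_late", match_late)):
    print(name, "leading $match:", starts_with_match(pipeline))
```

A query generator can run such a check before submitting a pipeline and move filters on stored fields to the front whenever the semantics allow it.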

SLIDE 40

8. Integrated monitoring

SLIDE 41

Ops Manager

Flavours

MongoDB Ops Manager can be run
_ in the cloud
_ on premise

On-prem version features
_ an admin GUI
_ a monitoring API

SLIDE 42

Ops Manager API

Integrated in topology explorer

SLIDE 43

Ops Manager API

Integrated in ping watchdog

SLIDE 44

Ops Manager API

Integrated in real-time monitoring

SLIDE 45

Ops Manager API

Integrated in Ops workbench

SLIDE 46

9. Feedback on 1 year of Open Source

SLIDE 47

Services & empowerment

Can help you go the extra mile

Need some basic help? Some expert advice? Or the source code? Google can definitely help, but so can MongoDB.
_ Turn Pre-sales Engineers & Solutions Architects into Trainers & Evangelists
_ Everybody can open tickets in MongoDB’s JIRA, but Commercial Support can process them even faster for you (premium)
_ A dedicated Technical Account Manager can follow your project, provide ad-hoc support and chase tickets internally

Turn your employees into smart creatives
_ Empower small teams, embrace agility, set broad objectives & watch the magic
_ Even internal use-cases might be addressed by accident

SLIDE 48

You can follow us on:

AmadeusITGroup, amadeus.com/blog, amadeus.com

Thank you