Unpredictable & interactive analysis of terabytes of data
Amadeus Revenue Accounting Metadata Search
Big Data Paris, 11 March 2015 Laurent Dollé
ldolle@amadeus.com
In a few words
Amadeus is a technology company dedicated to the
We are present in 195 countries
with a worldwide team of more than 11,000 people. Our solutions help improve the
airports, hotels, railways and more.
The travel industry
Cruiselines Hotels Car rental Ground handlers Ferry operators Ground transportation Airports Travel agencies Insurance companies Airlines
The traveler life cycle
Inspire, Search, Buy/Purchase, Pre-trip, On trip, Post-trip
Global operations
We designed & own our Data Processing Centres
transactions
processed per day
travel agency bookings
processed in 2013
Passengers Boarded
in 2013
scheduled network
airline seats
To our customers
To innovation
since 2004.
software companies in 2013 European Union Industrial R&D Investment Scorecard.
sustainable
Is a growth industry
Source: IATA. Airline Industry forecast 2013-2017
2.98 billion air passengers
2012
air passengers
is shared
(marketing & operating)
What for?
handles cash flows
Distribution
applications
infrastructure
In common
By leveraging our GDS position
passenger sales revenue
_ at usage time: effective revenue
_ at sale time, weeks before: expected revenue
revenue accounting
Key benefits & features
One of our launch partners is a large European airline
_ transporting 35m+
revenue accounting industry
Gathered from a launch partner: they requested a user-friendly way to query any data in our main operational database
_ Many advanced reporting requirements
edit, import, save & share queries
fed in real time, 4 years of history (140m+ documents, versioned)
_ Search further using
The main promises
_ November 2013: User acceptance testing
_ December 2014: Migration & parallel running validation on production
_ Summer 2015: Production cut-over
_ Post cut-over: SLA & optimisation based on usage statistics
And possible impacts
Any delay or functional gap may
as the application is used to validate the migration and parallel running phases.
Split into 2 functional areas:
_ predicates filtering the results
_ projections and related functions
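The two functional areas map naturally onto the two arguments of a MongoDB `find`: a filter document (the predicates) and a projection document (the projected fields). The sketch below assumes a hypothetical editor output format (a list of `(field, operator, value)` triples plus a list of field names); `build_find_args`, the operator table and the field names are illustrative, not the product's actual code.

```python
from typing import Any

# Hypothetical mapping from query-editor operators to MongoDB query operators.
OPERATORS = {"=": "$eq", "!=": "$ne", "<": "$lt", "<=": "$lte",
             ">": "$gt", ">=": "$gte", "in": "$in"}


def build_find_args(predicates: list[tuple[str, str, Any]],
                    fields: list[str]) -> tuple[dict, dict]:
    """Map editor predicates and projections to (filter, projection) documents."""
    # Predicates: one clause per condition, combined with $and.
    clauses = [{field: {OPERATORS[op]: value}} for field, op, value in predicates]
    query_filter = {"$and": clauses} if clauses else {}
    # Projections: return only the requested fields, without the internal _id.
    projection = {field: 1 for field in fields}
    projection["_id"] = 0
    return query_filter, projection


query_filter, projection = build_find_args(
    [("coupon.status", "=", "USED"), ("amount", ">", 100)],
    ["ticket.number", "amount"],
)
print(query_filter)
print(projection)
```

With PyMongo, the pair could then be passed as `collection.find(query_filter, projection)`.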
Query editor
to unpredictable queries
_ Fields to be scanned unknown
_ Need to scale out for sustainable performance
Support mainstream SQL DML statements:
_ Aggregation
_ Cross-column comparison, Boolean logic
_ Sort
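As an illustration of how those SQL constructs can map onto MongoDB's aggregation framework, here is a pipeline for a query such as `SELECT carrier, SUM(amount) AS total ... WHERE (status = 'USED' OR status = 'EXCHANGED') AND amount > tax GROUP BY carrier ORDER BY total DESC`. Field names are hypothetical. The cross-column comparison is computed in a `$project` stage and filtered afterwards, which works on the 3.0-era aggregation framework; later releases (3.6+) also accept `$expr` inside `$match`.

```python
def revenue_by_carrier_pipeline(statuses: list[str]) -> list[dict]:
    """Aggregation pipeline mirroring SELECT ... WHERE ... GROUP BY ... ORDER BY."""
    return [
        # Boolean logic: WHERE status = ... OR status = ...
        {"$match": {"$or": [{"status": s} for s in statuses]}},
        # Cross-column comparison: compute amount > tax for every document...
        {"$project": {"carrier": 1, "amount": 1,
                      "above_tax": {"$gt": ["$amount", "$tax"]}}},
        # ...then keep only the documents where it holds (AND with the OR above).
        {"$match": {"above_tax": True}},
        # Aggregation: GROUP BY carrier, SUM(amount).
        {"$group": {"_id": "$carrier", "total": {"$sum": "$amount"}}},
        # Sort: ORDER BY total DESC.
        {"$sort": {"total": -1}},
    ]


pipeline = revenue_by_carrier_pipeline(["USED", "EXCHANGED"])
print([next(iter(stage)) for stage in pipeline])
```

With PyMongo the list would be passed to `collection.aggregate(pipeline)`.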
Document timeline implemented to efficiently retrieve a particular version of a document based on an arbitrary date, event name or flags.
Efficient upserts & transactions needed to replace or update multiple versions at each write.
Featuring a document timeline
[Figure: document timeline. Versions shown: 1.0 Issuance, 1.1 Issuance confirmation, 2.0 Exchange, 3.0 Usage, 3.1 Usage (replay), 3.2 Usage (replay), 2.1 Exchange (replay), 4.0 Exchange. Also labelled: events out of timeline; conflict: 3.2 bumped out of timeline. Flags: last issuance, last issuance confirmation, last exchange, last usage, last 1.x, last 2.x, last 3.x, last 4.x, final event.]
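A minimal in-memory sketch of the timeline idea, assuming each version carries a `major.minor` number, an event name and an effective date: a point-in-time lookup returns the highest-numbered version effective at a given date, and flags such as `last 3.x`, `last usage` or `final event` point at the last version of each kind in version order. The `Version` class, the function names and these flag semantics are assumptions for illustration; replays and conflict handling from the diagram are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    major: int        # e.g. 3 in "3.1"
    minor: int        # e.g. 1 in "3.1"
    event: str        # e.g. "Usage"
    effective: date   # hypothetical effective date of the event


def number(v: Version) -> tuple[int, int]:
    return (v.major, v.minor)


def as_of(timeline: list[Version], when: date) -> Optional[Version]:
    """Point-in-time lookup: highest-numbered version effective on or before `when`."""
    candidates = [v for v in timeline if v.effective <= when]
    return max(candidates, key=number, default=None)


def flags(timeline: list[Version]) -> dict[str, Version]:
    """Assumed flags: 'last N.x', 'last <event>' and 'final event'."""
    result: dict[str, Version] = {}
    for v in sorted(timeline, key=number):
        # Iterating in version order, later versions overwrite earlier ones.
        result[f"last {v.major}.x"] = v
        result[f"last {v.event.lower()}"] = v
        result["final event"] = v
    return result


timeline = [
    Version(1, 0, "Issuance", date(2014, 1, 10)),
    Version(1, 1, "Issuance confirmation", date(2014, 1, 10)),
    Version(2, 0, "Exchange", date(2014, 2, 3)),
    Version(3, 0, "Usage", date(2014, 3, 1)),
    Version(3, 1, "Usage", date(2014, 3, 5)),
]
print(as_of(timeline, date(2014, 2, 15)))
```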
Our main operational database is an Oracle document store containing
(4000+ fields)
A schema-less document store would ease
(400+ metadata fields to load)
_ the data model maintenance
& synchronization between both databases
For agile integration
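To illustrate why a schema-less target eases integration: the loader only needs a list of dotted paths to copy from the source document, so adding one of the 400+ metadata fields means adding a path rather than altering a table, and absent fields are simply not stored. The paths, document shape and `extract_metadata` helper below are hypothetical.

```python
from typing import Any

_MISSING = object()  # sentinel distinguishing "absent" from a stored None


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'fare.amount' through nested dicts."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def extract_metadata(doc: dict, paths: list[str]) -> dict:
    """Copy only the listed (hypothetical) metadata paths into a new document.

    Assumes no listed path is a prefix of another one.
    """
    out: dict = {}
    for path in paths:
        value = get_path(doc, path)
        if value is _MISSING:
            continue  # absent field: nothing to store, no NULL column needed
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            node = node.setdefault(key, {})  # rebuild the nesting
        node[leaf] = value
    return out


source = {
    "ticket": {"number": "1252400000001", "issue_date": "2014-01-10"},
    "fare": {"amount": 100},
    "audit": {"internal_flag": True},
}
metadata = extract_metadata(source, ["ticket.number", "fare.amount", "fare.currency"])
print(metadata)
```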
_ Expecting accuracy, since data is used by auditors
_ However: no operational impact, application is not MCA
_ To be agreed after benchmarking on production, with very few parallel users
And their impacts
Runs on standard x86 architecture
Enterprise-grade security
In the Amadeus standards
Mounting all data in memory is irrelevant for cost & hardware reasons: 90TB for our biggest prospect.
Technical & functional limitations, complex to implement & maintain.
Still young, with a steep learning curve.
Distributed data analysis not exactly matching our use-case.
To MongoDB
Slightly behind MongoDB for document search (index mandatory).
N1QL not finalized.
Key-value store not exactly matching our use-case.
Amadeus in-house R&D database engine (index-less, main-memory only, partitioning data at CPU core level).
Project terminated.
The database is highly sharded (as many shards as cores), so that each shard runs its own thread, efficiently spreading the workload over the whole CPU power.
To speed up aggregation queries
A MongoDB daemon (mongod) processes any incoming query on a single thread. Modern hardware architectures feature many sockets (2-4) and many cores (8-16), meaning wasted computing power if we do not enforce parallel processing. Our online analytical processing use-case implies an intense workload (full scans) with limited concurrency, as queries are queued and run sequentially.
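A minimal scatter-gather sketch of the microsharding idea, assuming documents are hash-partitioned on `_id` into one micro-shard per core, each shard computes a partial aggregate, and a router merges the partials (roughly what `mongos` does for a sharded `$group`). Python threads are used only to show the structure: under CPython's GIL they would not deliver the per-core speed-up that separate `mongod` processes do. All names and fields are illustrative.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(docs: list[dict], n_shards: int) -> list[list[dict]]:
    """Hash-partition documents on their _id into n_shards micro-shards."""
    shards: list[list[dict]] = [[] for _ in range(n_shards)]
    for doc in docs:
        shards[hash(doc["_id"]) % n_shards].append(doc)
    return shards


def partial_totals(shard: list[dict]) -> Counter:
    """Per-shard work: full scan computing SUM(amount) GROUP BY carrier."""
    totals: Counter = Counter()
    for doc in shard:
        totals[doc["carrier"]] += doc["amount"]
    return totals


def scatter_gather(docs: list[dict], n_shards: int = 0) -> dict:
    """Scatter the aggregation to every shard, then merge the partial results."""
    n_shards = n_shards or os.cpu_count() or 1  # default: one shard per core
    shards = partition(docs, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(partial_totals, shards))
    merged: Counter = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds counts, it does not replace
    return dict(merged)


docs = [{"_id": i, "carrier": "AA" if i % 3 else "BB", "amount": i} for i in range(100)]
print(scatter_gather(docs, n_shards=12))
```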
Performance increases almost linearly with respect to the number of shards
Cleaning step is mandatory (12 shards and more)
Through in-memory microsharding
[Figure: "Full scan" and "Full scan with aggregation": time vs. number of shards (10 to 60)]
Through data ramp-up
[Figure: "Full scan" and "Full scan with aggregation": time vs. data size]
Behaviour reproduced for 2 shard distributions: 24 & 48 shards on 6 physical servers, 100% in-memory
Through generated search criteria
[Figure: "Full scan: OR & AND" (time vs. number of search criteria pairs A and B, up to 50,000) and "Full scan: IN" (time vs. number of search criteria, up to 50,000)]
Facts & figures
_ Server: HP ProLiant DL580 Gen8, 4 sockets, x86, rack
_ 4x CPU: Intel Xeon E7-4850 v2, 2.30 GHz, 12 physical cores
_ RAM: 512GB, 40GB/s scanning speed
_ 2x flash cards: Fusion-io ioScale 3.2TB, 1.5GB/s read
3 virtual config servers
_ RAM: 8GB
Overall cluster
Currently 1 year of production data (4 expected)
docs with padding
data & index extents
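A back-of-the-envelope check of what these figures imply per server, assuming ideal sequential throughput and that the 1.5GB/s read rate is per flash card (the slide does not say): 48 physical cores, a full pass over 512GB of RAM in about 13 s, versus roughly 170 s when reading the same volume from flash.

```python
SOCKETS = 4
CORES_PER_CPU = 12
RAM_GB = 512
RAM_SCAN_GB_S = 40
FLASH_CARDS = 2
FLASH_READ_GB_S = 1.5      # assumption: read rate of each card
FLASH_CAPACITY_TB = 3.2    # per card

cores = SOCKETS * CORES_PER_CPU                   # 48 physical cores
ram_scan_s = RAM_GB / RAM_SCAN_GB_S               # 12.8 s for a full RAM pass
flash_read_gb_s = FLASH_CARDS * FLASH_READ_GB_S   # 3.0 GB/s aggregate
flash_scan_s = RAM_GB / flash_read_gb_s           # ~170.7 s for the same volume
flash_tb = FLASH_CARDS * FLASH_CAPACITY_TB        # 6.4 TB of flash

print(f"{cores} cores, RAM scan {ram_scan_s:.1f}s, "
      f"flash scan {flash_scan_s:.1f}s, flash {flash_tb:.1f}TB")
```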
[Figure: architecture overview. Clusters: ORACLE CLUSTER; REV OBE BATCH CLUSTER (SLES); MONGODB CLUSTER (RHEL); REV OBE OLTP CLUSTER (SLES). Components shown: RA workflow (Revenue Accounting) with input and error queues, write and read paths; REV; sharded replica sets (1st & 2nd mongo daemons & arbiter) and config servers; shard routers (service, applicative); mongoimport (initial/massive feed); live feed, live trigger and MSG live gateway; MSG batch gateway; corrective feed; EDIFACT; JSON files; MSF front-end; SI; browser over https; AQG lib C++ driver; shell & drivers (C++, Python, Java) for ad-hoc investigation.]
Microsharding is a powerful way to reduce response times; what else can bring value?
And its results
Cgroups
Prevent shards from competing for memory when data does not fit into RAM, especially with microsharding. Low-memory Cgroups may be compressed with zRAM/WiredTiger.
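A sketch of how per-shard memory caps could be derived before being written into each shard's memory cgroup, assuming one `mongod` per core and a fixed share of RAM kept for the OS. The 10% reserve and the helper name are hypothetical; with cgroup v1 such a value would typically go into the group's `memory.limit_in_bytes`, with cgroup v2 into `memory.max`.

```python
GIB = 1024 ** 3


def per_shard_memory_limit(ram_bytes: int, shards: int,
                           os_reserve: float = 0.10) -> int:
    """Even split of RAM across shards after a (hypothetical) OS reserve."""
    if shards < 1:
        raise ValueError("need at least one shard")
    if not 0.0 <= os_reserve < 1.0:
        raise ValueError("os_reserve must be in [0, 1)")
    usable = int(ram_bytes * (1.0 - os_reserve))
    return usable // shards


limit = per_shard_memory_limit(512 * GIB, 48)
print(f"{limit / GIB:.2f} GiB per shard")  # value for each shard's memory cgroup
```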
Kernel tuning
Optimize Linux for a CPU-bound workload (vs. IO-bound): small readahead, THP off, increase task scheduler.
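A small checker for two of these settings, reading standard sysfs files: the transparent huge page mode in `/sys/kernel/mm/transparent_hugepage/enabled` (the active value is the bracketed one, e.g. `always madvise [never]`) and a block device's `/sys/block/<dev>/queue/read_ahead_kb`. The 64KB threshold for a "small" readahead is an assumption, not a figure from this talk.

```python
from pathlib import Path

THP_FILE = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_choice(sysfs_text: str) -> str:
    """Return the bracketed (active) value of a sysfs multiple-choice file."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active value in {sysfs_text!r}")


def thp_is_off(sysfs_text: str) -> bool:
    return active_choice(sysfs_text) == "never"


def readahead_is_small(read_ahead_kb: str, max_kb: int = 64) -> bool:
    """max_kb is an assumed threshold for a 'small' readahead."""
    return int(read_ahead_kb.strip()) <= max_kb


if THP_FILE.exists():  # only on Linux hosts exposing THP settings
    print("THP off:", thp_is_off(THP_FILE.read_text()))
```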
NUMA
Restrict access to CPU & memory for secondary daemons.
Striped replica set
Span shards on all the available hardware, with secondary daemons replicated on different nodes for smooth failover.
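A sketch of one possible striping rule, assuming one primary, one secondary and one arbiter per shard: shard i's primary goes on server i mod n, its secondary on the next server and its arbiter on the one after, so a single server failure never takes out both data-bearing members of a shard and primaries are spread evenly. The rule, server names and NUMA node number are illustrative assumptions; `--cpunodebind` and `--membind` are real `numactl` options, and the other required `mongod` options are omitted.

```python
def stripe(n_shards: int, servers: list[str]) -> list[dict[str, str]]:
    """Hypothetical striping rule: rotate primary, secondary and arbiter over servers."""
    n = len(servers)
    if n < 3:
        raise ValueError("need at least 3 servers to separate the 3 members")
    return [
        {
            "shard": f"shard{i:02d}",
            "primary": servers[i % n],
            "secondary": servers[(i + 1) % n],  # always a different server
            "arbiter": servers[(i + 2) % n],    # data-less voting member
        }
        for i in range(n_shards)
    ]


def secondary_command(numa_node: int, port: int) -> list[str]:
    """Confine a secondary mongod to one NUMA node's CPUs and memory.

    Other required mongod options (--replSet, --dbpath, ...) are omitted.
    """
    return ["numactl", f"--cpunodebind={numa_node}", f"--membind={numa_node}",
            "mongod", "--port", str(port)]


servers = ["srv1", "srv2", "srv3", "srv4", "srv5", "srv6"]
placement = stripe(12, servers)
for member in placement[:3]:
    print(member)
```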
[Figure: topology options: unsharded database; shards; sharded replica sets (1st & 2nd mongo daemons & arbiter); striped & sharded replica sets]
_ Many options & combinations possible
_ Updates performed on-the-fly
_ Horizontal scaling through sharding
_ High availability through replication (primary & secondary shards)
_ Cheaper, relaxed high-availability through arbiters (empty shards)
_ Hardware fault-tolerance through physical servers
Shard, replicate & stripe
Full scan aggregation is CPU-bound, with a fixed entry cost for unwinds:
_ no unwind: 3s
_ unwinds on 1, 2 or 3 levels: 70s
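For readers less familiar with the stage, here is a pure-Python emulation of what `$unwind` does: it emits one document per array element, so the documents flowing through the rest of the pipeline are new, rewritten documents rather than the stored ones. The sketch assumes a top-level field and MongoDB's default behaviour (missing, null or empty arrays produce no output; non-array values act as a one-element array since 3.2); it does not attempt to explain the measured timings. Field names are hypothetical.

```python
def unwind(docs: list[dict], field: str) -> list[dict]:
    """Emulate {"$unwind": "$<field>"} for a top-level field."""
    out = []
    for doc in docs:
        value = doc.get(field)
        # Default behaviour: missing, null or empty arrays produce no output.
        if value is None or value == []:
            continue
        # Non-array values act as a one-element array (MongoDB 3.2+ behaviour).
        values = value if isinstance(value, list) else [value]
        for item in values:
            out.append({**doc, field: item})  # one output document per element
    return out


docs = [
    {"_id": 1, "coupons": [1, 2, 3, 4], "taxes": ["YQ", "FR"]},
    {"_id": 2, "coupons": [], "taxes": ["YQ"]},
]
one_level = unwind(docs, "coupons")
two_levels = unwind(one_level, "taxes")
print(len(docs), len(one_level), len(two_levels))
```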
In the absence of concurrency, the promise is met across all tests.
And their lessons learnt
Indexes have a linear impact
Complex query with 4 match criteria:
_ full scan: 100s
_ index, 40% selectivity: 40s
Complex query with 4 match criteria, including field-on-field comparison:
_ full scan: 190s
_ index, 40% selectivity: 70s
_ index, 75% selectivity: 145s
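A quick check of the "linear impact" reading, assuming indexed time is roughly full-scan time multiplied by selectivity (read here as the fraction of documents the index selects): the linear prediction lands within 10% of each measured value (0%, about 8.6% and about 1.7% off).

```python
# (full scan seconds, selectivity, measured indexed seconds), from the slide
MEASUREMENTS = [
    (100, 0.40, 40),   # 4 match criteria
    (190, 0.40, 70),   # 4 criteria incl. field-on-field comparison
    (190, 0.75, 145),  # same query, less selective index
]


def predicted(full_scan_s: float, selectivity: float) -> float:
    """Assumed linear model: time scales with the fraction of documents selected."""
    return full_scan_s * selectivity


for full, sel, measured in MEASUREMENTS:
    pred = predicted(full, sel)
    error = abs(pred - measured) / measured
    print(f"full={full}s sel={sel:.0%} predicted={pred:.1f}s "
          f"measured={measured}s error={error:.1%}")
```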
Position of the match operator in the aggregation pipeline can impact index usage.
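The effect shows on two otherwise equivalent pipelines: MongoDB can use an index for a `$match` that opens the pipeline, while the same filter placed after an `$unwind` applies to transformed documents and generally cannot. The helper below is a simplistic heuristic that only inspects the first stage; it does not model the query optimizer, which in later versions can move some `$match` stages earlier on its own, so running `explain` against a real cluster is the reliable check. Field names are hypothetical.

```python
def leading_match_fields(pipeline: list[dict]) -> set[str]:
    """Heuristic: fields filtered by a $match in first position (index-eligible)."""
    if pipeline and "$match" in pipeline[0]:
        return {key for key in pipeline[0]["$match"] if not key.startswith("$")}
    return set()


# Same filter, two positions (hypothetical field names).
match_first = [
    {"$match": {"carrier": "XX", "status": "USED"}},
    {"$unwind": "$coupons"},
    {"$group": {"_id": "$coupons.origin", "count": {"$sum": 1}}},
]
match_after_unwind = [
    {"$unwind": "$coupons"},
    {"$match": {"carrier": "XX", "status": "USED"}},
    {"$group": {"_id": "$coupons.origin", "count": {"$sum": 1}}},
]
print(leading_match_fields(match_first), leading_match_fields(match_after_unwind))
```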
Flavours
MongoDB Ops Manager can be run
_ in the cloud
On-prem version features
_ an admin GUI _ a monitoring API
Integrated in topology explorer
Integrated in ping watchdog
Integrated in real-time monitoring
Integrated in Ops workbench
Need some basic help? Some expert advice? Or the source code? Google can definitely help, but MongoDB too.
_ Turn Pre-sales Engineers & Solutions Architects into Trainers & Evangelists
_ Everybody can open tickets in MongoDB’s JIRA, but Commercial Support can process them even faster for you (premium)
_ A dedicated Technical Account Manager can follow your project, provide ad-hoc support and chase tickets internally
Turn your employees into smart creatives
_ Empower small teams, embrace agility, set broad objectives & watch the magic
_ Even internal use-cases might be addressed by accident
Can help you go the extra mile
You can follow us on:
AmadeusITGroup amadeus.com/blog amadeus.com
Thank you