On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform



SLIDE 1

On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform

Thomas Marshall

SLIDE 2

Motivation

  • Better performance and horizontal scalability than traditional RDBMS.
  • Better consistency, transactions, and schema support than NoSQL.
  • Integration into LinkedIn’s data ecosystem.
SLIDE 3

Data Model

  • Nested entities and independent entities.
  • Relational
      ○ Documents - the equivalent of rows.
  • Hierarchical
      ○ Document groups - documents that share the same partitioning key; a group can span tables and is the largest unit of transactions.
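The grouping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Document` class, the table names, and the key values are hypothetical and are not Espresso's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    table: str           # hypothetical table name
    partition_key: str   # leading key shared by the whole document group
    sub_key: tuple       # remaining key parts within the table
    body: dict           # the document content (the "row")


def document_groups(docs):
    """Group documents from any table by their shared partitioning key."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc.partition_key].append(doc)
    return dict(groups)


docs = [
    Document("Artist", "artist-1", (), {"name": "A"}),
    Document("Album", "artist-1", ("album-1",), {"title": "X"}),
    Document("Song", "artist-1", ("album-1", "song-1"), {"title": "Y"}),
    Document("Album", "artist-2", ("album-9",), {"title": "Z"}),
]
groups = document_groups(docs)
```

Here the three `artist-1` documents come from three different tables but land in one group, which is the scope within which a transaction could apply.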

SLIDE 4

Secondary Indexes

  • Allow for efficient lookup based on values other than the primary key.
  • Local secondary indexes - apply to one document group.
  • Global secondary indexes - apply across document groups, implemented as derived tables.

SLIDE 5

Secondary Indexes

  • Lucene
      ○ Inverted index.
      ○ Log structured.
  • Prefix
      ○ Inverted index, prefixed by the partition key.
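A minimal sketch of the prefix idea, assuming a plain in-memory dictionary: each posting list is keyed by the pair (partition key, term), so a lookup only ever touches one document group. The `PrefixIndex` class and its whitespace tokenizer are hypothetical and do not reflect Espresso's or Lucene's real implementation.

```python
from collections import defaultdict


class PrefixIndex:
    """Toy inverted index whose entries are prefixed by the partition key."""

    def __init__(self):
        # (partition_key, term) -> set of document ids
        self._postings = defaultdict(set)

    def add(self, partition_key, doc_id, text):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            self._postings[(partition_key, term)].add(doc_id)

    def lookup(self, partition_key, term):
        # Only postings under this partition key are consulted.
        hits = self._postings.get((partition_key, term.lower()), set())
        return sorted(hits)


index = PrefixIndex()
index.add("artist-1", "song-1", "Blue Moon")
index.add("artist-1", "song-2", "Blue Sky")
index.add("artist-2", "song-3", "Blue Train")
```

A query for `blue` under `artist-1` never sees `song-3`, because that posting lives under a different prefix.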

SLIDE 6

Architecture

  • Client - submits requests via REST API.
  • Router - sends each request to the appropriate node based on the partitioning protocol.
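The routing step can be sketched as follows. Everything specific here is an assumption made for illustration: the path layout `/<database>/<table>/<partition_key>/...`, the MD5-based hash, the partition count, and the static routing table are not taken from the slides.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed partition count
# Assumed partition -> node map; in practice this would come from cluster state.
ROUTING_TABLE = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}


def partition_for(partition_key, num_partitions=NUM_PARTITIONS):
    """Hash the partition key to a partition number (stable across runs)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(path):
    """Return (partition, node) for a path like /db/table/partition_key/..."""
    parts = path.strip("/").split("/")
    if len(parts) < 3:
        raise ValueError("path must contain database, table and partition key")
    partition = partition_for(parts[2])
    return partition, ROUTING_TABLE[partition]
```

Because only the partition key is hashed, `route("/Music/Album/artist-1/album-1")` and `route("/Music/Song/artist-1/album-1/song-1")` resolve to the same node, which keeps a document group on one machine.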
SLIDE 7

Architecture

  • Helix
      ○ Cluster management system.
      ○ Assigns partitions.

SLIDE 8

Architecture

  • Fault tolerance
      ○ When a master partition fails, a slave is promoted by Helix.
      ○ ZooKeeper heartbeats and performance metrics determine failure.
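The promotion step can be illustrated with a toy model. This is not Helix's API: the `fail_node` function, the dictionary layout, and the "first surviving replica by name" rule are assumptions chosen to keep the sketch short.

```python
def fail_node(assignment, failed):
    """Return a new assignment with `failed` removed and slaves promoted.

    `assignment` maps partition -> {node: "MASTER" or "SLAVE"}.
    """
    updated = {}
    for partition, replicas in assignment.items():
        survivors = {n: s for n, s in replicas.items() if n != failed}
        # Promote a slave only if the failed node held the master replica.
        if replicas.get(failed) == "MASTER" and survivors:
            promoted = sorted(survivors)[0]  # assumed tie-break: lowest name
            survivors[promoted] = "MASTER"
        updated[partition] = survivors
    return updated


assignment = {
    0: {"n1": "MASTER", "n2": "SLAVE"},
    1: {"n2": "MASTER", "n3": "SLAVE"},
}
after = fail_node(assignment, "n1")
```

Partition 0 loses its master on `n1`, so its slave on `n2` is promoted; partition 1 is untouched because `n1` held no replica of it.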

SLIDE 9

Overpartitioning

  • Shard data into many more partitions than there are nodes.
  • Eases failover/cluster expansion.
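Why overpartitioning eases expansion can be shown with a small rebalancing sketch: with many partitions per node, a new node takes over a few whole partitions and nothing is re-sharded. The round-robin placement and the greedy `add_node` rule are assumptions for illustration, not the assignment algorithm Helix uses.

```python
def assign(num_partitions, nodes):
    """Initial placement: spread partitions round-robin over the nodes."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


def add_node(assignment, new_node):
    """Move whole partitions to `new_node` until it holds its fair share."""
    result = dict(assignment)
    nodes = sorted(set(result.values())) + [new_node]
    target = len(result) // len(nodes)  # fair share per node
    load = {n: 0 for n in nodes}
    for owner in result.values():
        load[owner] += 1
    moved = []
    for p in sorted(result):
        if load[new_node] >= target:
            break
        owner = result[p]
        if load[owner] > target:  # take only from overloaded nodes
            result[p] = new_node
            load[owner] -= 1
            load[new_node] += 1
            moved.append(p)
    return result, moved


before = assign(12, ["n1", "n2", "n3"])
after, moved = add_node(before, "n4")
```

With 12 partitions on 3 nodes, adding a fourth node moves 3 partitions and leaves the other 9 where they were.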

SLIDE 10

Architecture

  • Storage node
      ○ Stores partitions.
      ○ Performs queries.
      ○ Maintains log.
      ○ Performs background tasks.

SLIDE 11

Architecture

  • Databus
      ○ Achieves replication via pub/sub.
      ○ Ensures timeline consistency.
      ○ Replicated for fault tolerance.
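Timeline consistency means every replica applies changes in the same order in which they were committed. A minimal sketch, assuming an in-memory log with increasing sequence numbers; the `ChangeLog` and `Replica` classes are hypothetical and are not the Databus API.

```python
class ChangeLog:
    """Ordered stream of committed changes (the 'pub' side)."""

    def __init__(self):
        self._events = []  # list of (sequence_number, key, value)

    def publish(self, key, value):
        seq = len(self._events) + 1  # sequence numbers start at 1
        self._events.append((seq, key, value))
        return seq

    def events_since(self, seq):
        # The event with sequence number i sits at index i - 1,
        # so slicing from `seq` yields every event numbered above it.
        return self._events[seq:]


class Replica:
    """Subscriber that replays changes strictly in commit order."""

    def __init__(self):
        self.data = {}
        self.last_seq = 0

    def catch_up(self, log):
        for seq, key, value in log.events_since(self.last_seq):
            self.data[key] = value
            self.last_seq = seq


log = ChangeLog()
log.publish("a", 1)
early = Replica()
early.catch_up(log)
log.publish("a", 2)
log.publish("b", 3)
late = Replica()
early.catch_up(log)
late.catch_up(log)
```

The two replicas subscribe at different times but converge on the same state, because each resumes from its own last applied sequence number and never applies changes out of order.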

SLIDE 12

Future Work

  • Transactions across document groups.
  • OLAP workloads.
  • Multiple data center deployment.
SLIDE 13

Conclusion

  • Espresso aims for a middle ground between traditional RDBMS and NoSQL.
  • LinkedIn particularly emphasized operability: ease of schema changes, horizontal scalability, etc.