On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform


Mar 01, 2023


SLIDE 1

On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform

Thomas Marshall

SLIDE 2

Motivation

Better performance and horizontal scalability than traditional RDBMS.

Better consistency, transactions, and schema support than NoSQL.

Integration into LinkedIn’s data ecosystem.

SLIDE 3

Data Model

Nested entities and independent entities.
Relational

○ Documents - the equivalent of rows

Hierarchical

○ Document groups - share same partitioning key, span tables, largest unit of transactions
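
As a rough illustration of the document group idea above, the sketch below buckets documents from different tables by their shared partitioning key. The `DocKey` type, the table names, and the sample records are hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class DocKey(NamedTuple):
    table: str          # hypothetical table name
    partition_key: str  # shared by every document in a group
    sub_key: str        # distinguishes documents inside the group


def group_documents(docs):
    """Bucket documents by partitioning key, regardless of table."""
    groups = defaultdict(dict)
    for key, body in docs.items():
        groups[key.partition_key][key] = body
    return dict(groups)


docs = {
    DocKey("Album", "artist-1", "album-a"): {"year": 1999},
    DocKey("Song", "artist-1", "song-x"): {"length_s": 215},
    DocKey("Album", "artist-2", "album-b"): {"year": 2004},
}
groups = group_documents(docs)
```

Here the group for `artist-1` holds one `Album` document and one `Song` document, which is the sense in which a group spans tables.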

SLIDE 4

Secondary Indexes

Allow for efficient lookup based on values other than the primary key.

Local secondary indexes - apply to one document group.

Global secondary indexes - apply across document groups, implemented as derived tables.
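
The difference between the two index scopes can be sketched as follows. This is a minimal in-memory illustration, assuming each group is a plain dictionary of documents; the function names and sample fields are made up for the example and do not reflect how Espresso builds its indexes.

```python
from collections import defaultdict


def build_local_index(group_docs, field):
    """Index a single document group: field value -> document ids."""
    index = defaultdict(list)
    for doc_id, doc in group_docs.items():
        index[doc[field]].append(doc_id)
    return dict(index)


def build_global_index(all_groups, field):
    """Derived table over all groups: field value -> (partition key, doc id)."""
    derived = defaultdict(list)
    for partition_key, group_docs in all_groups.items():
        for doc_id, doc in group_docs.items():
            derived[doc[field]].append((partition_key, doc_id))
    return dict(derived)


all_groups = {
    "artist-1": {"album-a": {"genre": "pop"}, "album-b": {"genre": "rock"}},
    "artist-2": {"album-c": {"genre": "pop"}},
}
local = build_local_index(all_groups["artist-1"], "genre")
global_index = build_global_index(all_groups, "genre")
```

A lookup in `local` can only ever return documents from `artist-1`, while `global_index` answers the same question across every group.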

SLIDE 5

Secondary Indexes

Lucene

○ Inverted index.
○ Log structured.

Prefix

○ Inverted index, prefixed by the partition key.
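
One way to picture the prefix index is an inverted index whose posting lists are keyed by (partition key, term), so a lookup only touches entries for one partition key. The class below is a toy sketch under that assumption; `PrefixIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class PrefixIndex:
    """Toy inverted index with every term prefixed by the partition key."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, partition_key, doc_id, text):
        # Each whitespace-separated term gets its own posting list.
        for term in text.lower().split():
            self._postings[(partition_key, term)].add(doc_id)

    def lookup(self, partition_key, term):
        # .get avoids creating empty posting lists on misses.
        return sorted(self._postings.get((partition_key, term.lower()), set()))


index = PrefixIndex()
index.add("member-1", "msg-1", "Hello world")
index.add("member-1", "msg-2", "hello again")
index.add("member-2", "msg-9", "hello there")
```

Because the partition key is part of every index key, a search for "hello" under `member-1` never sees the `member-2` message.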

SLIDE 6

Architecture

Client - submits requests via REST API.

Router - sends each request to the appropriate node based on the partitioning protocol.
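
The routing step can be sketched as two lookups: partition key to partition, then partition to node. The slides do not say which partitioning function is used, so hash-based partitioning is an assumption here, and `NUM_PARTITIONS`, the node names, and the routing table are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed value, for illustration only


def partition_for(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a partition id with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(partition_key, routing_table):
    """Return the node that currently serves the key's partition."""
    return routing_table[partition_for(partition_key)]


# Hypothetical routing table: partition id -> node name.
routing_table = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}
target = route("member-42", routing_table)
```

A stable hash is used instead of Python's built-in `hash` so that every router process maps the same key to the same partition.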

SLIDE 7

Architecture

Helix

○ Cluster management system
○ Assigns partitions

SLIDE 8

Architecture

Fault tolerance

○ When a master partition fails, a slave is promoted by Helix.
○ ZooKeeper heartbeat and performance metrics determine failure.
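
The promotion step can be sketched as a pure function over a partition assignment. This is not the Helix API; the assignment layout, the node names, and the rule of promoting the first surviving slave are assumptions made for the example.

```python
def promote_on_failure(assignment, failed_node):
    """Return a new assignment with failed_node removed.

    assignment maps partition id -> {"master": node, "slaves": [nodes]}.
    Where the failed node was master, the first surviving slave is promoted.
    """
    updated = {}
    for partition, replicas in assignment.items():
        master = replicas["master"]
        slaves = [n for n in replicas["slaves"] if n != failed_node]
        if master == failed_node:
            if not slaves:
                raise RuntimeError(f"partition {partition} has no replica left")
            master, slaves = slaves[0], slaves[1:]
        updated[partition] = {"master": master, "slaves": slaves}
    return updated


assignment = {
    0: {"master": "node-a", "slaves": ["node-b"]},
    1: {"master": "node-b", "slaves": ["node-a"]},
}
after_failure = promote_on_failure(assignment, "node-a")
```

After `node-a` fails, `node-b` becomes master of partition 0 and remains master of partition 1, with no slaves left for either.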

SLIDE 9

Overpartitioning

Shard data into many more partitions than there are nodes.

Eases failover/cluster expansion.
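
To see why overpartitioning eases expansion, consider the sketch below: when a node joins, whole partitions are handed over from the most loaded nodes and nothing is re-sharded. The balancing rule, the partition count, and the node names are assumptions for illustration, not the policy described in the slides.

```python
from collections import Counter


def add_node(assignment, new_node):
    """Move whole partitions to new_node until it holds its fair share.

    assignment maps partition id -> node name; a new dict is returned.
    """
    result = dict(assignment)
    nodes = set(result.values()) | {new_node}
    fair_share = len(result) // len(nodes)
    load = {n: [] for n in nodes}
    for partition, node in sorted(result.items()):
        load[node].append(partition)
    while len(load[new_node]) < fair_share:
        # Take from the most loaded existing node (name breaks ties).
        donor = max((n for n in nodes if n != new_node),
                    key=lambda n: (len(load[n]), n))
        partition = load[donor].pop()
        load[new_node].append(partition)
        result[partition] = new_node
    return result


before = {p: f"node-{p % 3}" for p in range(12)}  # 12 partitions, 3 nodes
after = add_node(before, "node-3")
moved = [p for p in before if before[p] != after[p]]
```

With 12 partitions on 3 nodes, adding a fourth node moves only 3 partitions and leaves the other 9 where they were.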

SLIDE 10

Architecture

Storage node

○ Stores partitions.
○ Performs queries.
○ Maintains log.
○ Performs background tasks.

SLIDE 11

Architecture

Databus

○ Achieves replication via pub/sub
○ Ensures timeline consistency
○ Replicated for fault tolerance
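
Timeline consistency through pub/sub can be sketched with an ordered change log that replicas consume in commit order. This is not the Databus API; the `ChangeLog` and `Replica` classes and the use of a simple integer sequence number are assumptions for the example.

```python
class ChangeLog:
    """Append-only log; every change gets the next sequence number."""

    def __init__(self):
        self._events = []

    def publish(self, key, value):
        seq = len(self._events) + 1
        self._events.append((seq, key, value))
        return seq

    def read_since(self, seq):
        # The event with sequence number n sits at index n - 1.
        return self._events[seq:]


class Replica:
    """Subscriber that applies changes strictly in log order."""

    def __init__(self):
        self.data = {}
        self.last_seq = 0

    def pull(self, log):
        for seq, key, value in log.read_since(self.last_seq):
            self.data[key] = value
            self.last_seq = seq


log = ChangeLog()
replica = Replica()
log.publish("a", 1)
replica.pull(log)
snapshot = dict(replica.data)
log.publish("a", 2)
log.publish("b", 3)
replica.pull(log)
```

The replica may lag behind the log, but it only ever sees states that the source actually passed through, in the same order.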

SLIDE 12

Future Work

Transactions across document groups.
OLAP workloads.
Multiple data center deployment.

SLIDE 13

Conclusion

Espresso attempts to find a happy medium between traditional RDBMS and NoSQL.

LinkedIn particularly emphasized operability, ease of schema changes, horizontal scalability, etc.