Why Actors Rock: Designing a Distributed Database with libcppa
Matthias Vallentin
matthias@bro.org
University of California, Berkeley
C++Now May 15, 2014
Outline
1. System Overview: VAST
2. Architecture: Ingestion, Indexing, and Query
VAST
Distributed database built with libcppa
Goals
◮ Scalability
  ◮ Sustain high & continuous input rates
  ◮ Linear scaling with number of nodes
◮ Interactivity
  ◮ Sub-second response times
  ◮ Iterative query refinement
◮ Strong and rich typing
  ◮ High-level types and operations
  ◮ Type safety in query language
[Figure: VAST architecture. Ingestors and Clients connect to the Receiver, Archive, Index, and Search components.]
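To make the actor model behind this architecture concrete: each component (Ingestor, Receiver, Archive, Index, Search) owns its state privately and communicates only through asynchronous messages. The following is a minimal, hypothetical actor built from standard C++ threads: one mailbox drained by one thread. It is not libcppa's API (libcppa can schedule many lightweight actors on a shared thread pool rather than dedicating a thread to each), and the class name Actor is illustrative.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical minimal actor: private state plus a mailbox drained by one thread.
// Illustrative only; this is not libcppa's API.
template <typename Msg>
class Actor {
 public:
  explicit Actor(std::function<void(const Msg&)> handler)
      : handler_(std::move(handler)), worker_([this] { run(); }) {}

  Actor(const Actor&) = delete;
  Actor& operator=(const Actor&) = delete;

  // Stops the actor after all queued messages have been handled.
  ~Actor() {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Asynchronous send: enqueue and return at once; never waits for the handler.
  void send(Msg m) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      mailbox_.push(std::move(m));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mtx_);
    for (;;) {
      cv_.wait(lock, [this] { return stopping_ || !mailbox_.empty(); });
      if (mailbox_.empty()) return;  // stopping, and nothing left to handle
      Msg m = std::move(mailbox_.front());
      mailbox_.pop();
      lock.unlock();
      handler_(m);  // runs outside the lock, on the actor's own thread
      lock.lock();
    }
  }

  std::function<void(const Msg&)> handler_;
  std::mutex mtx_;
  std::condition_variable cv_;
  std::queue<Msg> mailbox_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members exist
};
```

Note that send() never blocks: the mailbox grows without bound whenever the handler is slower than the sender. That property is what makes the bufferbloat problem below possible.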
Network Forensics & Incident Response
◮ Scenario: security breach discovered
◮ Analysts tasked with determining scope and impact
Analyst questions
◮ How did the attacker(s) get in?
◮ How long did they stay under the radar?
◮ What is the damage ($$$, reputation, data loss, etc.)?
◮ How to detect similar attacks in the future?
Ingestion
[Figure: Ingestors and Clients connected to the Core.]
[Figure: ingestion data flow. An ingestor consists of a Source and a Segmentizer. The Source turns raw input such as "10.0.0.1 10.0.0.254 53/udp" and "10.0.0.2 10.0.0.254 80/tcp" into events with a timestamp (e.g., 2013-08-12 12:08:32) and type info; the Segmentizer packs events into segments (chunk + meta data). In the Core, the Receiver obtains IDs from a space of 2^64 via the Tracker and passes the segments on to the Archive and Index; Search and the index Partitions also reside in the Core.]
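A minimal sketch of the two stateful ingestion steps: packing events into segments, and stamping them with unique IDs. All names (Event, Segment, Tracker, Segmentizer, assign_ids) are hypothetical, events are plain strings rather than VAST's typed values, and the byte limit stands in for VAST's 128 MB segments. In VAST these steps run in separate actors; here they are plain function calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event: an ID plus one raw log line (VAST's real events are typed).
struct Event {
  std::uint64_t id = 0;
  std::string data;
};

// Hypothetical segment: a chunk of events plus a little meta data.
struct Segment {
  std::vector<Event> chunk;
  std::uint64_t first_id = 0;  // meta data: ID range covered by the chunk
  std::uint64_t last_id = 0;
  std::size_t bytes = 0;       // approximate payload size
};

// Illustrative stand-in for the Tracker: hands out contiguous ranges of the 2^64 ID space.
class Tracker {
 public:
  std::uint64_t allocate(std::uint64_t n) {
    assert(n <= UINT64_MAX - next_ && "ID space exhausted");
    std::uint64_t first = next_;
    next_ += n;
    return first;
  }

 private:
  std::uint64_t next_ = 0;
};

// Illustrative Segmentizer: packs events into segments of roughly max_bytes each.
class Segmentizer {
 public:
  explicit Segmentizer(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  // Appends one line; returns true if this sealed the current segment.
  bool add(std::string line) {
    current_.bytes += line.size();
    Event e;
    e.data = std::move(line);
    current_.chunk.push_back(std::move(e));
    if (current_.bytes < max_bytes_) return false;
    seal();
    return true;
  }

  // Seals a partially filled segment, e.g., at the end of the input.
  void flush() {
    if (!current_.chunk.empty()) seal();
  }

  std::vector<Segment> sealed;  // ready to ship to the Receiver

 private:
  void seal() {
    sealed.push_back(std::move(current_));
    current_ = Segment{};
  }

  std::size_t max_bytes_;
  Segment current_;
};

// Receiver side: stamp every event in a segment with a unique ID from the Tracker.
void assign_ids(Segment& s, Tracker& t) {
  if (s.chunk.empty()) return;
  std::uint64_t id = t.allocate(s.chunk.size());
  s.first_id = id;
  for (Event& e : s.chunk) e.id = id++;
  s.last_id = id - 1;
}
```

Allocating one ID range per segment means the Receiver consults the Tracker once per segment rather than once per event; that is a design choice of this sketch, not necessarily VAST's.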
Indexing
[Figure: the Index forwards incoming data to the relevant partition among its Partitions; inside a partition, the Unpacker turns segments into events and sends the event values to the Bitmap Indexers.]
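Each Bitmap Indexer can be pictured as an equality-coded bitmap index: one bitmap per distinct value, with bit i set when event i carries that value. The sketch below is a hypothetical, uncompressed version (std::vector<bool>, class name BitmapIndexer), not VAST's implementation; production bitmap indexes usually compress their bitmaps and may use other encodings (e.g., range or binned) for numeric types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Bitmap = std::vector<bool>;  // bit i set <=> event i matches

// Hypothetical equality-coded bitmap indexer for one event field:
// one bitmap per distinct value; appending an event sets exactly one bit.
template <typename T>
class BitmapIndexer {
 public:
  // Indexes the value of the next event (events are numbered 0, 1, 2, ...).
  void append(const T& value) {
    Bitmap& bm = bitmaps_[value];
    bm.resize(size_, false);  // zero-pad for the events this value did not occur in
    bm.push_back(true);
    ++size_;
  }

  // Returns the bitmap of events whose value equals x, padded to the full length.
  Bitmap lookup_equal(const T& x) const {
    Bitmap result(size_, false);
    auto it = bitmaps_.find(x);
    if (it != bitmaps_.end())
      for (std::size_t i = 0; i < it->second.size(); ++i) result[i] = it->second[i];
    return result;
  }

  std::size_t size() const { return size_; }

 private:
  std::map<T, Bitmap> bitmaps_;
  std::size_t size_ = 0;  // number of events indexed so far
};
```

For example, indexing the ports 53/udp, 80/tcp, 53/udp, 443/tcp and looking up 53/udp yields the bitmap 1010.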
Query
[Figure: query flow. A Client sends the query src == 10.0.0.1 && port == 53/udp to Search, which splits it into the predicates port == 53/udp and src == 10.0.0.1. The Indexers in the Index Partitions answer with bitmaps (1 = “mass”, 0 = empty; e.g., 10100010011100). A Query combines the hits, retrieves the corresponding segments from the Archive, and returns the matching events to the Client.]
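Once the Indexers have answered each predicate with a bitmap, a conjunction such as src == 10.0.0.1 && port == 53/udp reduces to a bitwise AND, and the positions of the remaining 1-bits are the IDs of the events to fetch from the Archive. The helpers below (from_string, bitwise_and, hit_ids) are hypothetical and assume that bit i corresponds to event ID i, with both bitmaps covering the same ID range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bitmap = std::vector<bool>;  // bit i set <=> event with ID i is a hit

// Parses a bitmap written as a string of '0'/'1', e.g. "10100010011100".
Bitmap from_string(const std::string& bits) {
  Bitmap bm;
  bm.reserve(bits.size());
  for (char c : bits) bm.push_back(c == '1');
  return bm;
}

// Conjunction (&&): an event is a hit only if it is a hit for both predicates.
Bitmap bitwise_and(const Bitmap& a, const Bitmap& b) {
  assert(a.size() == b.size());
  Bitmap r(a.size(), false);
  for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] && b[i];
  return r;
}

// Positions of the 1-bits: the event IDs the Query must fetch from the Archive.
std::vector<std::uint64_t> hit_ids(const Bitmap& bm) {
  std::vector<std::uint64_t> ids;
  for (std::size_t i = 0; i < bm.size(); ++i)
    if (bm[i]) ids.push_back(i);
  return ids;
}
```

For example, ANDing the figure's bitmap 10100010011100 with a hypothetical 11100000010001 for the other predicate leaves 10100000010000, i.e. event IDs 0, 2, and 9.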
Bufferbloat
Large buffers cause high latency and jitter
Aside: Go
Goroutines execute concurrently and exchange messages via channels
◮ Sender blocks when channel is full
◮ Receiver blocks when channel is empty
→ Explicit notion of buffer
◮ libcppa: sending never blocks, so nothing signals overload
Bufferbloat in VAST
◮ Large segments (128 MB)
◮ Data flow rates
  ◮ Ingestion: 80k–100k events/sec
  ◮ Indexing: 20k–200k events/sec
→ Sender overloads receiver: system runs out of memory
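To illustrate the Go channel semantics the slide contrasts with libcppa, here is a hypothetical Go-style bounded channel in C++: send() blocks while the buffer is full and receive() blocks while it is empty, so a fast sender is slowed down automatically. The class and method names are illustrative, and unlike Go, this sketch does not model unbuffered (capacity 0) channels.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical Go-style bounded channel: send() blocks while the buffer is full,
// receive() blocks while it is empty. The fixed capacity is the explicit buffer,
// and a blocked sender is the overload signal.
template <typename T>
class BoundedChannel {
 public:
  explicit BoundedChannel(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0 && "unbuffered channels are not modeled here");
  }

  void send(T value) {
    std::unique_lock<std::mutex> lock(mtx_);
    not_full_.wait(lock, [this] { return q_.size() < capacity_; });
    q_.push(std::move(value));
    not_empty_.notify_one();
  }

  T receive() {
    std::unique_lock<std::mutex> lock(mtx_);
    not_empty_.wait(lock, [this] { return !q_.empty(); });
    T value = std::move(q_.front());
    q_.pop();
    not_full_.notify_one();
    return value;
  }

  // Non-blocking probe: would send() block right now?
  bool full() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return q_.size() >= capacity_;
  }

 private:
  const std::size_t capacity_;
  mutable std::mutex mtx_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::queue<T> q_;
};
```

With an unbounded mailbox instead, the backlog grows by the difference between the two rates: in the slide's worst case (100k events/sec ingested, 20k events/sec indexed) that is 80k events every second, or 4.8 million events per minute.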
Flow Control
Propagate feedback on capacity from the overloaded resource back up to the sender
Revised indexing process
◮ Queue length: number of events sent to the indexer
◮ Decrease the queue length by the number of events processed
◮ If the watermark is reached, tell ingestors to stop
◮ If the watermark is cleared, tell ingestors to go
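The revised indexing process can be sketched as a small counter with watermarks. This version is hypothetical: it assumes separate high and low watermarks (the slide mentions a single watermark), so that ingestors are told to stop at the high mark and to go again only once the queue has drained to the low mark, which avoids rapid stop/go oscillation. The class name FlowController and its Signal values are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flow-control monitor for the indexing path.
// queue_length = events sent to indexers but not yet processed.
// Assumes a high and a low watermark (hysteresis) to avoid stop/go flapping.
class FlowController {
 public:
  enum class Signal { none, stop, go };

  FlowController(std::uint64_t high, std::uint64_t low) : high_(high), low_(low) {
    assert(low < high);
  }

  // Called when n events are forwarded to an indexer.
  Signal sent(std::uint64_t n) {
    queue_length_ += n;
    if (!overloaded_ && queue_length_ >= high_) {
      overloaded_ = true;
      return Signal::stop;  // tell ingestors to stop
    }
    return Signal::none;
  }

  // Called when an indexer reports that it processed n events.
  Signal processed(std::uint64_t n) {
    assert(n <= queue_length_);
    queue_length_ -= n;
    if (overloaded_ && queue_length_ <= low_) {
      overloaded_ = false;
      return Signal::go;  // tell ingestors to go
    }
    return Signal::none;
  }

  std::uint64_t queue_length() const { return queue_length_; }

 private:
  std::uint64_t high_;
  std::uint64_t low_;
  std::uint64_t queue_length_ = 0;
  bool overloaded_ = false;
};
```

Only transitions produce a signal, so ingestors receive one stop and one go per overload episode rather than a message for every batch.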
Initial indexing process
Issues
Intra-Process Performance
Share data intelligently instead of partitioning it beforehand
Revised indexing process
◮ Do not “inflate” data just to partition it for workers
◮ GPGPU-style: make data available “globally” to all workers
  ◮ Relieves the CPU: no time spent transforming data
  ◮ Reduces memory footprint: data exists exactly once
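The GPGPU-style idea can be sketched with a shared, immutable batch: every worker receives a std::shared_ptr to the same events and pulls out only the field it is responsible for, instead of the sender splitting each event into per-worker messages. The types and the function index_in_parallel are hypothetical, and the extractors stand in for whatever each Bitmap Indexer needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical flat event with three fields.
struct Event {
  std::string src;
  std::string dst;
  std::uint16_t port;
};

// One immutable batch shared by all workers: the data exists exactly once.
using Batch = std::shared_ptr<const std::vector<Event>>;
using Extractor = std::function<std::string(const Event&)>;

// Runs one worker thread per extractor. Every worker reads the whole shared batch
// and pulls out only its own field; nothing is copied or pre-partitioned.
std::vector<std::vector<std::string>> index_in_parallel(
    const Batch& batch, const std::vector<Extractor>& extractors) {
  std::vector<std::vector<std::string>> columns(extractors.size());
  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < extractors.size(); ++w) {
    // Copying the shared_ptr bumps a reference count; the events are not copied.
    workers.emplace_back([batch, w, &columns, &extractors] {
      for (const Event& e : *batch) columns[w].push_back(extractors[w](e));
    });
  }
  for (std::thread& t : workers) t.join();
  return columns;
}
```

Because the batch is const, the workers can read it concurrently without locking, and each worker writes only to its own output column, so no synchronization is needed there either.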
Complex Query Processing
A query actor receives messages from the archive, the index, and the client
◮ The query acts as an “iterator” over the archive for index hits
◮ Maintains lots of state for incremental extraction of matches
◮ Difficult to implement correctly when messages arrive in any order
◮ Many if-then-else constructs clutter the main logic
Finite State Machine
Implement stateful logic with a finite state machine
Revised query process
◮ Each state defines a set of message handlers
◮ Explicit transitions make for readable and clear code
◮ libcppa primitive: become/unbecome
[Figure: query state machine with states idle, waiting, ready, extracting, done, and failed. Transition labels: hits arrive; segment arrives; user triggers extraction; finishes segment, more segments available; finishes segment, need more hits; finishes segment, no more segments; experiences unrecoverable error; extract fewer events than available in segment; all hits arrived, no hits to process.]
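A table-driven state machine makes the transitions explicit in one place, in the spirit of libcppa's become/unbecome, which switch an actor's current behavior (its set of message handlers). The sketch below is hypothetical: the state and input names follow the figure labels, but the transition table is an assumed reading of the diagram, not VAST's actual query logic, and it uses a lookup table instead of libcppa behaviors.

```cpp
#include <cassert>
#include <map>
#include <utility>

enum class State { idle, waiting, ready, extracting, done, failed };

enum class Input {
  hits_arrive,
  all_hits_arrived_none_to_process,
  segment_arrives,
  user_triggers_extraction,
  extracted_fewer_than_available,
  finished_segment_more_segments,
  finished_segment_need_more_hits,
  finished_segment_no_more_segments,
  unrecoverable_error
};

// Hypothetical query state machine. The transition table is an assumed reading
// of the figure, not VAST's actual logic.
class QueryFsm {
 public:
  State state() const { return state_; }

  // Applies an input. Returns false and keeps the current state if the input
  // has no transition from here (like a message that no handler matches).
  bool handle(Input in) {
    if (in == Input::unrecoverable_error && state_ != State::done) {
      state_ = State::failed;  // errors are accepted in every unfinished state
      return true;
    }
    const auto& t = table();
    auto it = t.find({state_, in});
    if (it == t.end()) return false;
    state_ = it->second;
    return true;
  }

 private:
  using Table = std::map<std::pair<State, Input>, State>;

  static const Table& table() {
    static const Table t = {
        {{State::idle, Input::hits_arrive}, State::waiting},
        {{State::idle, Input::all_hits_arrived_none_to_process}, State::done},
        {{State::waiting, Input::segment_arrives}, State::ready},
        {{State::ready, Input::user_triggers_extraction}, State::extracting},
        {{State::extracting, Input::extracted_fewer_than_available}, State::ready},
        {{State::extracting, Input::finished_segment_more_segments}, State::extracting},
        {{State::extracting, Input::finished_segment_need_more_hits}, State::idle},
        {{State::extracting, Input::finished_segment_no_more_segments}, State::done},
    };
    return t;
  }

  State state_ = State::idle;
};
```

Inputs that have no entry for the current state are rejected in one place rather than guarded by scattered if-then-else checks, which is the property the slide is after.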
Lesson #1
Programming distributed systems feels like “networking”
◮ Flow control compensates for imbalanced sender/receiver speeds
◮ Bufferbloat increases latency and causes processing spikes
◮ Explicit state machines keep asynchronous messaging manageable
Lesson #2
GPGPU programming style fits well for intra-process concurrency
◮ Make full data available to all workers
◮ Each worker is responsible for extracting its relevant data
https://github.com/mavam/vast https://github.com/Neverlord/libcppa IRC at Freenode: #vast, #libcppa