Why Actors Rock: Designing a Distributed Database with libcppa

SLIDE 1

Why Actors Rock: Designing a Distributed Database with libcppa

Matthias Vallentin

matthias@bro.org

University of California, Berkeley

C++Now May 15, 2014

SLIDE 2

Outline

  • 1. System Overview: VAST
  • 2. Architecture: Ingestion, Indexing, and Query
  • 3. Experience
  • 4. Demo

SLIDE 3

VAST: Visibility Across Space and Time

VAST

Distributed database built with libcppa

Goals

◮ Scalability
    ◮ Sustain high and continuous input rates
    ◮ Linear scaling with the number of nodes
◮ Interactivity
    ◮ Sub-second response times
    ◮ Iterative query refinement
◮ Strong and rich typing
    ◮ High-level types and operations
    ◮ Type safety in the query language

[Figure: VAST components: ingestors, receiver, archive, index, search, and clients]

SLIDE 4

Example Use Case: Network Security Analysis

Network Forensics & Incident Response

◮ Scenario: security breach discovered
◮ Analysts tasked with determining scope and impact

Analyst questions

◮ How did the attacker(s) get in?
◮ How long did they stay under the radar?
◮ What is the damage ($$$, reputation, data loss, etc.)?
◮ How can similar attacks be detected in the future?

SLIDE 11

Ingestion

[Figure: VAST overview with ingestors, core, and clients; each ingestor consists of a source and a segmentizer]

ingestor

  • 1. Parse input into events

Example input:

10.0.0.1 10.0.0.254 53/udp
10.0.0.2 10.0.0.254 80/tcp

Parsed event: the values 10.0.0.1, 10.0.0.254, and 53/udp, plus a timestamp (2013-08-12 12:08:32) and type information
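A minimal sketch of step 1 in standard C++, assuming the whitespace-separated input format shown above (`<src> <dst> <port>/<proto>`). The `event` struct, its field names, and the `"conn"` type name are hypothetical; VAST's actual event representation and parsers differ, and this sketch does not use libcppa.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical event layout, for illustration only.
struct event {
  std::string type;                                 // type information
  std::chrono::system_clock::time_point timestamp;  // when the event occurred
  std::string src;                                  // e.g. "10.0.0.1"
  std::string dst;                                  // e.g. "10.0.0.254"
  std::uint16_t port = 0;                           // e.g. 53
  std::string proto;                                // e.g. "udp"
};

// Parses "<src> <dst> <port>/<proto>"; returns std::nullopt on malformed input.
std::optional<event> parse_line(const std::string& line,
                                std::chrono::system_clock::time_point ts) {
  std::istringstream in{line};
  event e;
  std::string port_proto;
  if (!(in >> e.src >> e.dst >> port_proto))
    return std::nullopt;
  auto slash = port_proto.find('/');
  if (slash == std::string::npos || slash == 0 || slash + 1 == port_proto.size())
    return std::nullopt;
  auto digits = port_proto.substr(0, slash);
  if (digits.find_first_not_of("0123456789") != std::string::npos)
    return std::nullopt;  // reject signs, letters, etc.
  unsigned long port = 0;
  try {
    port = std::stoul(digits);
  } catch (...) {
    return std::nullopt;  // too large for unsigned long
  }
  if (port > 65535)
    return std::nullopt;
  e.port = static_cast<std::uint16_t>(port);
  e.proto = port_proto.substr(slash + 1);
  e.type = "conn";  // hypothetical type name
  e.timestamp = ts;
  return e;
}
```

Returning `std::optional` instead of throwing lets the caller skip a malformed line without aborting the rest of the input.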

SLIDE 14

Ingestion

ingestor

  • 1. Parse input into events
  • 2. Compress & chunk into segments

[Figure: the segmentizer packs events into chunks; a segment consists of chunks plus meta data]
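To make step 2 concrete, here is a hypothetical segmentizer: events (already serialized to strings) are grouped into chunks of a fixed event count, and a segment is cut once its sealed chunks reach a byte budget. The parameters and meta data fields are assumptions, and compression is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A chunk groups serialized events. Compression is omitted in this sketch.
struct chunk {
  std::vector<std::string> events;
  std::size_t bytes = 0;
};

// A segment bundles chunks together with meta data about its contents.
struct segment {
  std::vector<chunk> chunks;
  std::size_t num_events = 0;  // meta data: number of events
  std::size_t bytes = 0;       // meta data: payload size in bytes
};

// Hypothetical segmentizer: seals a chunk every `events_per_chunk` events
// and emits a segment once its sealed chunks reach `max_segment_bytes`.
class segmentizer {
public:
  segmentizer(std::size_t events_per_chunk, std::size_t max_segment_bytes)
    : events_per_chunk_{events_per_chunk},
      max_segment_bytes_{max_segment_bytes} {}

  // Adds one serialized event; returns a segment when one is complete.
  std::optional<segment> add(std::string ev) {
    chunk_.bytes += ev.size();
    chunk_.events.push_back(std::move(ev));
    if (chunk_.events.size() >= events_per_chunk_)
      seal_chunk();
    if (segment_.bytes >= max_segment_bytes_ && segment_.num_events > 0)
      return take_segment();
    return std::nullopt;
  }

  // Emits whatever is buffered, e.g. when the source runs dry.
  std::optional<segment> flush() {
    seal_chunk();
    if (segment_.num_events == 0)
      return std::nullopt;
    return take_segment();
  }

private:
  void seal_chunk() {
    if (chunk_.events.empty())
      return;
    segment_.num_events += chunk_.events.size();
    segment_.bytes += chunk_.bytes;
    segment_.chunks.push_back(std::move(chunk_));
    chunk_ = chunk{};
  }

  segment take_segment() {
    segment s = std::move(segment_);
    segment_ = segment{};
    return s;
  }

  std::size_t events_per_chunk_;
  std::size_t max_segment_bytes_;
  chunk chunk_;
  segment segment_;
};
```

The tiny budgets in a quick test (for example `segmentizer s{2, 10}`) only exercise the logic; a real deployment would use far larger segments, on the order of the 128 MB segments mentioned later in this talk.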

SLIDE 21

Ingestion

ingestor

  • 1. Parse input into events
  • 2. Compress & chunk into segments
  • 3. Send segments to receiver

receiver

  • 1. Accept and ACK segment
  • 2. Assign segment an ID range from the 2^64 ID space
  • 3. Record segment schema
  • 4. Forward segment to archive and index

[Figure: ingestor (source, segmentizer) sending segments to the core: receiver, tracker, archive, index with partitions, and search]
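Receiver step 2 hands each segment a contiguous range of IDs from a 64-bit space. A minimal allocator sketch, assuming one half-open range `[first, last)` per segment sized by its event count; the names are hypothetical, and the diagram suggests that a tracker component performs this bookkeeping in VAST.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Half-open range of event IDs: [first, last).
struct id_range {
  std::uint64_t first;
  std::uint64_t last;
};

// Hypothetical allocator: hands out contiguous, non-overlapping ranges
// from the 64-bit ID space, one per segment.
class id_allocator {
public:
  std::optional<id_range> allocate(std::uint64_t num_events) {
    constexpr auto max = std::numeric_limits<std::uint64_t>::max();
    if (num_events > max - next_)
      return std::nullopt;  // ID space exhausted
    id_range r{next_, next_ + num_events};
    next_ += num_events;
    return r;
  }

private:
  // Next unused ID. Because ranges are half-open, ID 2^64 - 1 is never
  // handed out, which keeps every range's `last` representable.
  std::uint64_t next_ = 0;
};
```

Since IDs are never reused, a segment's range alone identifies which events it holds, which later lets an index hit (an event ID) be mapped back to the segment containing it.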

SLIDE 27

Indexing

index

  • 1. Forward segment to relevant partition
  • 2. Spawn indexers for event values
  • 3. Unpack segment back into events

[Figure: index with partitions; each partition contains an unpacker and bitmap indexers]
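The index-side steps can be pictured with plain objects standing in for actors. In this hypothetical sketch, a partition creates one indexer per field the first time it sees that field (the analogue of spawning an indexer) and forwards each value of each unpacked event to it. `event`, `indexer`, and `partition` are illustrative names, not VAST or libcppa types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical unpacked event: its ID plus named field values.
struct event {
  std::uint64_t id;
  std::map<std::string, std::string> fields;
};

// Stand-in for an indexer actor; it only buffers the values it receives.
struct indexer {
  std::vector<std::pair<std::uint64_t, std::string>> inbox;
};

// Hypothetical partition: creates one indexer per field on first sight
// and forwards every value of every unpacked event to the matching indexer.
class partition {
public:
  void dispatch(const std::vector<event>& unpacked) {
    for (const auto& e : unpacked)
      for (const auto& [field, value] : e.fields)
        indexers_[field].inbox.emplace_back(e.id, value);  // operator[] creates on demand
  }

  const indexer* find(const std::string& field) const {
    auto i = indexers_.find(field);
    return i == indexers_.end() ? nullptr : &i->second;
  }

  std::size_t num_indexers() const { return indexers_.size(); }

private:
  std::map<std::string, indexer> indexers_;
};
```

Each indexer here receives its own copy of the values; this per-indexer copying is the kind of data inflation that the Problem #2 slide later identifies as a memory issue.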

SLIDE 30

Indexing

index

  • 1. Forward segment to relevant partition
  • 2. Spawn indexers for event values
  • 3. Unpack segment back into events

indexer

  • 1. Receive event
  • 2. Select value to index
  • 3. Report statistics back to partition

[Figure: index with partitions; each partition contains an unpacker and bitmap indexers]
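The three indexer steps map naturally onto an equality bitmap index, which fits the "bitmap indexers" label in the diagram. A sketch under assumptions: one uncompressed `std::vector<bool>` per distinct value, small dense event IDs, and a returned counter standing in for the statistics report to the partition. Production bitmap indexes use compressed bit vectors; nothing here is VAST's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using event_fields = std::map<std::string, std::string>;

// Hypothetical equality bitmap index for one field: one bit vector per
// distinct value; bit i is set if the event with ID i carries that value.
// Uncompressed std::vector<bool> assumes small, dense IDs.
class bitmap_indexer {
public:
  explicit bitmap_indexer(std::string field) : field_{std::move(field)} {}

  // 1. Receive an event, 2. select the value of this indexer's field,
  // 3. return the number of events processed so far (the statistic to report).
  std::size_t receive(std::uint64_t id, const event_fields& ev) {
    auto v = ev.find(field_);
    if (v != ev.end()) {
      auto& bits = bitmaps_[v->second];
      if (bits.size() <= id)
        bits.resize(id + 1, false);
      bits[id] = true;
    }
    return ++processed_;  // counts every event, even those lacking the field
  }

  // Returns a bit vector over IDs [0, num_ids) marking events equal to `value`.
  std::vector<bool> lookup(const std::string& value, std::size_t num_ids) const {
    std::vector<bool> result(num_ids, false);
    auto i = bitmaps_.find(value);
    if (i != bitmaps_.end())
      for (std::size_t id = 0; id < num_ids && id < i->second.size(); ++id)
        result[id] = i->second[id];
    return result;
  }

private:
  std::string field_;
  std::map<std::string, std::vector<bool>> bitmaps_;
  std::size_t processed_ = 0;
};
```

Counting every processed event, not only those that carry the field, is what a partition would need in order to decrease its per-indexer queue length accurately.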

SLIDE 39

Query

[Figure: VAST overview with ingestors, core, and clients]

client

  • 1. Send query string to search
  • 2. Receive query actor

search

  • 1. Parse and validate query string
  • 2. Spawn a dedicated query actor
  • 3. Forward query to index

Example query: src == 10.0.0.1 && port == 53/udp

[Figure: client, search, query, and index (partitions, indexers); the index splits the example query into its predicates port == 53/udp and src == 10.0.0.1]
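Search step 1 parses and validates the query string before anything else happens. The following toy parser only handles conjunctions of equality predicates such as `src == 10.0.0.1 && port == 53/udp` and validates field names against a hypothetical schema; VAST's actual query language is richer and typed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

struct predicate {
  std::string field;
  std::string value;
};

// Removes leading and trailing spaces.
std::string trim(const std::string& s) {
  auto b = s.find_first_not_of(' ');
  if (b == std::string::npos)
    return "";
  auto e = s.find_last_not_of(' ');
  return s.substr(b, e - b + 1);
}

// Parses "a == x && b == y" into predicates. Returns std::nullopt for
// malformed terms, empty values, or fields not in the (hypothetical) schema.
std::optional<std::vector<predicate>> parse_query(const std::string& q,
                                                  const std::set<std::string>& schema) {
  std::vector<predicate> result;
  std::size_t pos = 0;
  while (true) {
    auto amp = q.find("&&", pos);
    auto term = q.substr(pos, amp == std::string::npos ? std::string::npos : amp - pos);
    auto eq = term.find("==");
    if (eq == std::string::npos)
      return std::nullopt;
    predicate p{trim(term.substr(0, eq)), trim(term.substr(eq + 2))};
    if (p.value.empty() || schema.count(p.field) == 0)
      return std::nullopt;
    result.push_back(p);
    if (amp == std::string::npos)
      break;
    pos = amp + 2;
  }
  return result;
}
```

Splitting the conjunction into independent predicates is what allows `src == 10.0.0.1` and `port == 53/udp` to be routed to different indexers, as the slide depicts.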

SLIDE 47

Query

client

  • 1. Send query string to search
  • 2. Receive query actor
  • 3. Extract results from query

search

  • 1. Parse and validate query string
  • 2. Spawn a dedicated query actor
  • 3. Forward query to index

query

  • 1. Receive hits from index
  • 2. Ask archive for segments
  • 3. Extract events, check candidates
  • 4. Send results to client

[Figure: client, search, index (partitions, indexers), query, and archive; the index returns a hit bitmap such as 10100010011100, where 1 = “mass” and 0 = empty]
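Query steps 1 to 3 can be sketched as intersecting the per-predicate hit bitmaps and then verifying each candidate against the actual event. Assumptions: the archive is modeled as a plain vector of events indexed by ID, and predicates are equality tests. Re-checking candidates guards against indexes that may return a superset of the true matches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using bits = std::vector<bool>;
using event_fields = std::map<std::string, std::string>;

// Intersects per-predicate hit bitmaps: a conjunction hits only where all do.
bits intersect(const std::vector<bits>& hits) {
  if (hits.empty())
    return {};
  bits result = hits.front();
  for (const auto& h : hits)
    for (std::size_t i = 0; i < result.size(); ++i)
      result[i] = result[i] && i < h.size() && h[i];
  return result;
}

// Candidate check: look up each hit in the (hypothetical) archive, indexed by
// event ID, and keep only events that really satisfy every predicate.
std::vector<std::uint64_t> check_candidates(const bits& candidates,
                                            const std::vector<event_fields>& archive,
                                            const event_fields& predicates) {
  std::vector<std::uint64_t> results;
  for (std::size_t id = 0; id < candidates.size() && id < archive.size(); ++id) {
    if (!candidates[id])
      continue;
    bool match = true;
    for (const auto& [field, value] : predicates) {
      auto f = archive[id].find(field);
      if (f == archive[id].end() || f->second != value) {
        match = false;
        break;
      }
    }
    if (match)
      results.push_back(id);
  }
  return results;
}
```

With `src` hits `1101` and `port` hits `1011` over four events, the intersection is `1001`, so only events 0 and 3 need to be fetched from the archive and checked.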

SLIDE 49

Issue #1: Bufferbloat

Bufferbloat

Large buffers cause high latency and jitter

Aside: Go

goroutines execute concurrently and exchange messages via channels

◮ Sender blocks when channel is full
◮ Receiver blocks when channel is empty

→ Explicit notion of buffer

◮ libcppa: senders never block, so overload is not signaled

Bufferbloat in VAST

◮ Large segments (128 MB)
◮ Data flow rates
    ◮ Ingestion: 80k–100k events/sec
    ◮ Indexing: 20k–200k events/sec

→ Whenever indexing runs slower than ingestion, the sender overloads the receiver and the system runs out of memory
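For contrast with libcppa's non-blocking sends, here is a minimal Go-style bounded channel in standard C++ (`std::mutex` plus two `std::condition_variable`s): `send` blocks while the buffer is full and `receive` blocks while it is empty, so the capacity caps how far a producer can run ahead. The class is illustrative and not part of libcppa; the capacity must be at least 1, since this sketch has no equivalent of Go's unbuffered rendezvous.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Go-style bounded channel: send blocks while full, receive blocks while empty.
template <class T>
class bounded_channel {
public:
  explicit bounded_channel(std::size_t capacity) : capacity_{capacity} {
    assert(capacity > 0);  // capacity 0 would block senders forever
  }

  void send(T x) {
    std::unique_lock<std::mutex> lock{mtx_};
    not_full_.wait(lock, [&] { return buf_.size() < capacity_; });
    buf_.push_back(std::move(x));
    not_empty_.notify_one();
  }

  T receive() {
    std::unique_lock<std::mutex> lock{mtx_};
    not_empty_.wait(lock, [&] { return !buf_.empty(); });
    T x = std::move(buf_.front());
    buf_.pop_front();
    not_full_.notify_one();
    return x;
  }

private:
  std::size_t capacity_;
  std::mutex mtx_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::deque<T> buf_;
};

// Usage: a producer thread sends 0..n-1 while this thread receives and sums.
// The producer can never be more than `cap` items ahead of the consumer.
long sum_through_channel(int n, std::size_t cap) {
  bounded_channel<int> ch{cap};
  std::thread producer{[&] { for (int i = 0; i < n; ++i) ch.send(i); }};
  long sum = 0;
  for (int i = 0; i < n; ++i)
    sum += ch.receive();
  producer.join();
  return sum;
}
```

Blocking the sender is the implicit backpressure signal that an unbounded actor mailbox lacks; the memory held in flight is bounded by the capacity instead of growing with the rate mismatch.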

SLIDE 50

Solution #1: Flow Control

Flow Control

Feed capacity information from the overloaded resource back to the sender

Revised indexing process

  • 1. Partition spawns indexers and dispatches events
      ◮ Queue length: number of events sent to an indexer
  • 2. Indexers report back how many events they have indexed
      ◮ Queue length decreases by the number of events processed
  • 3. Receiver polls index every 100 ms for the maximum queue length
      ◮ If the watermark is reached, tell ingestors to stop
      ◮ If the watermark is cleared, tell ingestors to go
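A minimal sketch of the flow-control loop described above, assuming a single watermark as on the slide and plain function calls in place of messages: a partition tracks outstanding events per indexer, and the receiver's periodic poll turns the maximum queue length into `stop` or `go` signals, emitted only when the state changes. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical partition-side bookkeeping: events sent to each indexer
// minus events the indexer has reported as indexed.
class partition_load {
public:
  void dispatched(const std::string& indexer, std::size_t n) { queue_[indexer] += n; }

  void indexed(const std::string& indexer, std::size_t n) {
    auto& q = queue_[indexer];
    q -= std::min(q, n);  // guard against underflow on late or duplicate reports
  }

  std::size_t max_queue_length() const {
    std::size_t m = 0;
    for (const auto& entry : queue_)
      m = std::max(m, entry.second);
    return m;
  }

private:
  std::map<std::string, std::size_t> queue_;
};

enum class flow_signal { none, stop, go };

// Hypothetical receiver-side check, meant to run on every poll (e.g. 100 ms).
// Emits a signal only when the ingestors' state should change.
class throttle {
public:
  explicit throttle(std::size_t watermark) : watermark_{watermark} {}

  flow_signal poll(std::size_t max_queue_length) {
    if (!stopped_ && max_queue_length >= watermark_) {
      stopped_ = true;
      return flow_signal::stop;
    }
    if (stopped_ && max_queue_length < watermark_) {
      stopped_ = false;
      return flow_signal::go;
    }
    return flow_signal::none;
  }

private:
  std::size_t watermark_;
  bool stopped_ = false;
};
```

Signaling only on a state change avoids sending ingestors a redundant stop on every poll; if the queue length tends to hover around the threshold, a second, lower watermark would add hysteresis.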

SLIDE 51

Problem #2: Data Structure Inflation

Initial indexing process

  • 1. Unpack segment
  • 2. Create one vector<event> for meta indexes (across events)
  • 3. Create one vector<event> for data indexes (per event)
  • 4. Forward the events to the corresponding indexers

Issues

  • 1. Memory overhead from maintaining multiple different data slices
  • 2. Effect exacerbated by bufferbloat

SLIDE 52

Solution #2: Data Sharing

Intra-Process Performance

Share data intelligently instead of partitioning it beforehand

Revised indexing process

◮ Do not “inflate” data just to partition it for workers
◮ GPGPU-style: make data available “globally” to all workers
    ◮ Relieves the CPU: no time spent transforming data
    ◮ Reduces memory footprint: data exists exactly once
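The data-sharing idea can be sketched with a `std::shared_ptr` to an immutable batch: every worker receives the same pointer and extracts only the field it cares about, so the events exist once in memory regardless of how many workers run. Here `std::async` stands in for actors and counting a field stands in for indexing; both are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <memory>
#include <string>
#include <vector>

using event_fields = std::map<std::string, std::string>;
// Read-only batch shared by all workers: one copy, many readers.
using shared_events = std::shared_ptr<const std::vector<event_fields>>;

// Worker: scans the shared batch and extracts only its own field.
// Counting occurrences stands in for real indexing work.
std::size_t count_field(shared_events batch, std::string field) {
  std::size_t n = 0;
  for (const auto& ev : *batch)
    if (ev.count(field) > 0)
      ++n;
  return n;
}

// Dispatcher: hands the same pointer to one worker per field; the events
// themselves are never copied or re-partitioned.
std::map<std::string, std::size_t> index_batch(const shared_events& batch,
                                               const std::vector<std::string>& fields) {
  std::map<std::string, std::future<std::size_t>> pending;
  for (const auto& f : fields)
    pending[f] = std::async(std::launch::async, count_field, batch, f);
  std::map<std::string, std::size_t> result;
  for (auto& [f, fut] : pending)
    result[f] = fut.get();
  return result;
}
```

Because the batch is `const`, concurrent readers need no locking; copying the pointer costs a reference-count update rather than a copy of the events.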

SLIDE 53

Problem #3: Messaging Complexity

Complex query processing

A query actor receives messages from the archive, the index, and the client

◮ The query acts as an “iterator” over the archive for index hits
◮ Maintains lots of state for incremental extraction of matches
◮ Difficult to implement correctly when messages can arrive in any order
◮ Many if-then-else constructs clutter the main logic

SLIDE 54

Solution #3: State Machine

Finite State Machine

Implement stateful logic with a finite state machine

Revised query process

◮ Each state defines a set of valid messages
◮ Explicit transitions make the code readable and clear
◮ libcppa primitive: become/unbecome

[Figure: query state machine with states idle, waiting, ready, extracting, done, and failed. Transition labels: hits arrive; segment arrives; user triggers extraction; extract fewer events than available in segment; finishes segment, more segments available; finishes segment, need more hits; finishes segment, no more segments; all hits arrived, no hits to process; experiences unrecoverable error]
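A sketch of the query state machine as a transition function over an enum, using the states and transition labels from the diagram. Which label connects which pair of states is an assumption here; the wiring below is one plausible reading, not VAST's code. In libcppa each state would be a separate behavior installed with become/unbecome; this standalone version does not use libcppa, but it shows the same property: a message that is not valid in the current state is rejected instead of being handled by scattered if-then-else checks.

```cpp
#include <cassert>
#include <optional>

enum class query_state { idle, waiting, ready, extracting, done, failed };

enum class query_input {
  hits_arrive,        // index delivers hits
  no_hits_left,       // all hits arrived, no hits to process
  segment_arrives,    // archive delivers a segment
  user_extract,       // user triggers extraction
  partial_extract,    // extracted fewer events than the segment holds
  segment_more,       // finished segment, more segments available
  segment_need_hits,  // finished segment, need more hits
  segment_exhausted,  // finished segment, no more segments
  error               // unrecoverable error
};

// Returns the next state, or std::nullopt if `in` is not valid in `s`.
// The edges are a hypothetical wiring of the diagram's labels.
std::optional<query_state> next_state(query_state s, query_input in) {
  using S = query_state;
  using I = query_input;
  if (s == S::done || s == S::failed)
    return std::nullopt;  // terminal states accept nothing
  if (in == I::error)
    return S::failed;
  switch (s) {
    case S::idle:
      if (in == I::hits_arrive) return S::waiting;
      if (in == I::no_hits_left) return S::done;
      break;
    case S::waiting:
      if (in == I::segment_arrives) return S::ready;
      break;
    case S::ready:
      if (in == I::user_extract) return S::extracting;
      break;
    case S::extracting:
      if (in == I::partial_extract) return S::ready;
      if (in == I::segment_more) return S::extracting;
      if (in == I::segment_need_hits) return S::idle;
      if (in == I::segment_exhausted) return S::waiting;
      break;
    default:
      break;
  }
  return std::nullopt;
}

// Holds the current state; invalid messages are rejected and leave it unchanged.
class query_fsm {
public:
  bool handle(query_input in) {
    if (auto s = next_state(state_, in)) {
      state_ = *s;
      return true;
    }
    return false;
  }
  query_state current() const { return state_; }

private:
  query_state state_ = query_state::idle;
};
```

Keeping the transition table in one function makes it easy to check the machine against the diagram and to add logging or assertions for unexpected messages.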

SLIDE 55

Summary & Lessons Learned

Lesson #1

Programming distributed systems feels like “networking”

◮ Flow control keeps a fast sender from overwhelming a slow receiver
◮ Bufferbloat increases latency and causes processing spikes
◮ Explicit state machines keep asynchronous messaging manageable

Lesson #2

The GPGPU programming style fits intra-process concurrency well

◮ Make the full data available to all workers
◮ Each worker is responsible for extracting its relevant data

SLIDE 57

Thank you... Questions?

FIN

https://github.com/mavam/vast
https://github.com/Neverlord/libcppa
IRC at Freenode: #vast, #libcppa