Designing for Distributed, Unstructured Data by Matt Brender (PowerPoint presentation)

SLIDE 1

Designing for Distributed, Unstructured Data

Matt Brender

Developer Advocate at Basho

SLIDE 2

tweet me @mjbrender

=> curl $RIAK/props

{ “Matt Brender” :

‘developer advocate’, ‘ops > dev’, ‘mbrender@basho.com’, ‘@mjbrender’, ‘neckbeardinfluence.com’, ‘geek-whisperers.com’, ‘indoor enthusiast’

}

SLIDE 3

I’m saying “Riak”

Not “react,” as in react.js

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

{
  "text": "Woot! #qconnewyork",
  "entities": {
    "hashtags": ["#qconnewyork"],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      { "screen_name": "mjbrender", "name": "Matt Brender", "id": 4948123, "id_str": "42424242", "indices": [81, 92] },
      { "screen_name": "mjbrender", "name": "Matt Brender", "id": 376825877, "id_str": "376825877", "indices": [121, 132] }
    ]
  }
}
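A tweet like the one above is semi-structured: nested objects, arrays that may be empty, and keys that other records may not have at all. A minimal Python sketch of reading it defensively with the standard `json` module; the `summarize` helper and the fields it picks are illustrative choices, not something from the talk.

```python
import json

# The tweet from the slide, with straight quotes so it parses as JSON.
RAW_TWEET = """
{
  "text": "Woot! #qconnewyork",
  "entities": {
    "hashtags": ["#qconnewyork"],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      {"screen_name": "mjbrender", "name": "Matt Brender", "id": 4948123,
       "id_str": "42424242", "indices": [81, 92]},
      {"screen_name": "mjbrender", "name": "Matt Brender", "id": 376825877,
       "id_str": "376825877", "indices": [121, 132]}
    ]
  }
}
"""


def summarize(tweet: dict) -> dict:
    """Extract a few fields, tolerating missing keys (common in unstructured data)."""
    entities = tweet.get("entities", {})
    mentions = entities.get("user_mentions", [])
    return {
        "text": tweet.get("text", ""),
        "hashtags": list(entities.get("hashtags", [])),
        # dict.fromkeys de-duplicates while keeping first-seen order.
        "mentioned": list(dict.fromkeys(m.get("screen_name") for m in mentions)),
        "url_count": len(entities.get("urls", [])),
    }


summary = summarize(json.loads(RAW_TWEET))
print(summary)
```

The two mentions collapse to a single screen name even though their `id` values differ, which is the kind of inconsistency code reading unstructured data has to tolerate.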

SLIDE 8

Just Hoarding?

SLIDE 9

Just Hoarding?

SLIDE 10

A common pattern

SLIDE 11

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

Our Problem(s)

  • Same data in different formats
    • Cache
    • Denormalisation
    • Indexes
    • Aggregations
  • We’re sticking to what we know
    • Relational databases with SQL queries
    • Not anticipating scaling needs
  • We’re not sure what’s next
    • Bitten by architectural choices in the past
    • New systems require consideration
    • Not sure what justifies investment
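The first problem, the same data kept in several formats, is easy to picture in code. A small Python sketch that derives all four shapes from one set of tweets; the records, user profiles and variable names are hypothetical. Every derived structure is another copy that must be updated, or rebuilt, whenever a tweet is written.

```python
from collections import Counter, defaultdict

# Hypothetical source records (tweet id -> document) and user profiles.
tweets = {
    "t1": {"user": "mjbrender", "hashtags": ["#qconnewyork", "#riak"]},
    "t2": {"user": "basho", "hashtags": ["#riak"]},
    "t3": {"user": "mjbrender", "hashtags": ["#nosql"]},
}
users = {"mjbrender": {"name": "Matt Brender"}, "basho": {"name": "Basho"}}

# Cache: a copy shaped for one hot read path (each user's most recent tweet id).
# Relies on dict insertion order, so later tweets overwrite earlier ones.
latest_by_user = {}
for tweet_id, doc in tweets.items():
    latest_by_user[doc["user"]] = tweet_id

# Denormalisation: copy the user's display name into every tweet.
denormalised = {
    tweet_id: {**doc, "user_name": users[doc["user"]]["name"]}
    for tweet_id, doc in tweets.items()
}

# Index: hashtag -> ids of the tweets that use it.
by_hashtag = defaultdict(set)
for tweet_id, doc in tweets.items():
    for tag in doc["hashtags"]:
        by_hashtag[tag].add(tweet_id)

# Aggregation: how often each hashtag appears across all tweets.
hashtag_counts = Counter(tag for doc in tweets.values() for tag in doc["hashtags"])
```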

SLIDE 22

Can’t I just…

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

SLIDE 30

SLIDE 31

The Choices

SLIDE 32

This or That

  • NoSQL
    • Types
      • Key/Value
      • Document
      • Columnar
      • Graph
  • “Messaging Queues”
    • Pub/Sub
    • Commit Log
  • Hadoop
    • HDFS
    • Map/Reduce
    • YARN
  • Spark
    • Successor to Map/Reduce
    • Compute-focused
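Map/Reduce, listed under Hadoop above, splits work into a map step that emits key/value pairs, a shuffle that groups them by key, and a reduce step that combines each group. A single-process Python sketch of that flow, counting hashtags across documents; the function names are illustrative, and real frameworks such as Hadoop distribute each phase across machines, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc: dict) -> list[tuple[str, int]]:
    """Emit a (hashtag, 1) pair for every hashtag in one document."""
    return [(tag.lower(), 1) for tag in doc.get("hashtags", [])]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine every value emitted for one key; here, a sum."""
    return key, sum(values)


def map_reduce(docs) -> dict[str, int]:
    emitted = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


docs = [
    {"hashtags": ["#riak", "#nosql"]},
    {"hashtags": ["#Riak"]},
    {"hashtags": []},
]
print(map_reduce(docs))  # {'#riak': 2, '#nosql': 1}
```

Spark keeps the same map and reduce vocabulary but favors holding intermediate data in memory between steps, which is part of why the slide calls it compute-focused.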
SLIDE 33

So, NoSQL

SLIDE 34

Basho Confidential

What Qualifies as NoSQL?

SLIDE 35

NOSQL Community

SLIDE 36

Persistence

Querying

Scaling

SLIDE 37

Persistence

SLIDE 38

SLIDE 39

Querying

SLIDE 40

Other Queries

Understanding how you get your data back

Query Languages

  • SQL(?)

Query Interfaces

  • HTTP/S
  • Protocol Buffers
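Both query interfaces carry the same key/value operations. A Python sketch of the HTTP side using only the standard library; it assumes Riak KV 2.x's bucket-type URL scheme (`/types/<type>/buckets/<bucket>/keys/<key>`) and a node on localhost port 8098, and the bucket and key names are made up. The requests are only built, not sent, so the sketch runs without a cluster; check the paths against the HTTP API docs for your Riak version.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a Riak KV node serving its HTTP interface on the default port 8098.
RIAK_HTTP = "http://127.0.0.1:8098"


def object_url(bucket_type: str, bucket: str, key: str) -> str:
    """URL of a single object: /types/<type>/buckets/<bucket>/keys/<key>."""
    t, b, k = (quote(part, safe="") for part in (bucket_type, bucket, key))
    return f"{RIAK_HTTP}/types/{t}/buckets/{b}/keys/{k}"


def store_request(bucket_type: str, bucket: str, key: str, value: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores value as JSON."""
    return urllib.request.Request(
        object_url(bucket_type, bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def fetch_request(bucket_type: str, bucket: str, key: str) -> urllib.request.Request:
    """Build a GET for the same object."""
    return urllib.request.Request(object_url(bucket_type, bucket, key), method="GET")


put = store_request("default", "tweets", "t1", {"text": "Woot! #qconnewyork"})
get = fetch_request("default", "tweets", "t1")
print(put.get_method(), put.full_url)
# Against a running node, send with: urllib.request.urlopen(put)
```

The client libraries on the next slide wrap the same operations, usually over Protocol Buffers, which avoids HTTP's per-request overhead.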

SLIDE 41

Apache Solr Integration

Write it like Riak. Query it like Solr.

Distributed Full-Text Search

Standard full-text Solr queries automatically expand into distributed search queries, returning a complete result set across instances.

Ad-Hoc Query Support

Broad support for Solr query parameters, e.g., exact match, range queries, AND/OR/NOT, sorting, pagination, scoring, and ranking.

Index Synchronization

Riak KV detects changes to stored data and automatically propagates them to the Solr indexes, keeping the two synchronized.

Solr API Support

Query data in Riak KV using existing Solr APIs.

Auto-Restart

Solr OS processes are monitored continuously and automatically started or restarted whenever a failure is detected.
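To make the ad-hoc query list concrete, a Python sketch that builds a search URL with standard Solr parameters (`q`, `fq`, `sort`, `start`, `rows`, `wt`). It assumes Riak Search 2.x's Solr-compatible endpoint at `/search/query/<index>` on the node's HTTP port; the index name `tweets` and the field names `text_t`, `user_s` and `created_dt` are hypothetical. Only the URL is built, so no cluster is needed.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Assumption: Riak Search 2.x exposes a Solr-compatible endpoint at
# /search/query/<index> on the node's HTTP port (8098 by default).
RIAK_HTTP = "http://127.0.0.1:8098"


def search_url(index: str, q: str, *, sort=None, start=0, rows=10, filters=()) -> str:
    """Build a Solr-style query URL from standard Solr parameters."""
    params = [("wt", "json"), ("q", q), ("start", start), ("rows", rows)]
    if sort:
        params.append(("sort", sort))
    # fq (filter query) narrows the result set without affecting scoring.
    params.extend(("fq", f) for f in filters)
    return f"{RIAK_HTTP}/search/query/{quote(index, safe='')}?{urlencode(params)}"


# Hypothetical index and field names.
url = search_url(
    "tweets",
    "text_t:riak AND NOT user_s:basho",                   # boolean query
    sort="created_dt desc",                               # sorting
    start=20, rows=10,                                    # pagination: third page of 10
    filters=["created_dt:[2015-05-25T00:00:00Z TO *]"],  # range query
)
print(url)
print(parse_qs(urlsplit(url).query))  # decoded parameters, for readability
```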

SLIDE 42

Polylingual Querying

There is a diverse set of client libraries for Riak that support both the HTTP and Protocol Buffers APIs:

Basho Supported Libraries:

  • Java
  • Ruby
  • Python
  • PHP
  • Erlang
  • .NET
  • Node.js
  • C

Community Libraries:

  • Clojure
  • Go
  • Perl
  • Scala
  • R

SLIDE 43

Scale means

SLIDE 44

SLIDE 45

Sharding

SLIDE 46

Sharding Strategies

Master, Slave, Slave, Slave

OR

Node 1, Node 2, Node 3
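The second strategy, where every node is a peer, is usually built on consistent hashing, the approach described in the Dynamo paper and used by Riak. A simplified Python sketch: the key space is a 160-bit ring (SHA-1) cut into equal partitions, each key hashes to one partition, and its replicas go to the next partitions around the ring. The `Ring` class, the node names, round-robin ownership and hashing the string "bucket/key" are simplifying assumptions, not Riak's exact scheme.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 produces 160-bit hashes


class Ring:
    """Simplified Dynamo-style ring: fixed, equal-sized partitions claimed round-robin."""

    def __init__(self, nodes: list[str], num_partitions: int = 64):
        if RING_SIZE % num_partitions:
            raise ValueError("num_partitions must divide the ring evenly (use a power of 2)")
        self.num_partitions = num_partitions
        self.width = RING_SIZE // num_partitions
        # Partition i is owned by nodes[i % len(nodes)]: no master, every node owns a share.
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        """Hash bucket/key onto the ring and return the partition it falls in."""
        digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
        return int.from_bytes(digest, "big") // self.width

    def preference_list(self, bucket: str, key: str, n_val: int = 3) -> list[tuple[int, str]]:
        """The owning partition plus the next n_val - 1 partitions clockwise, with their nodes."""
        first = self.partition_for(bucket, key)
        parts = [(first + i) % self.num_partitions for i in range(n_val)]
        return [(p, self.owners[p]) for p in parts]


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.preference_list("tweets", "t1"))
```

Adding a node only moves the partitions it claims, instead of reshuffling every key as `hash(key) % node_count` would. With a node count that does not divide the partition count (say 3 nodes and 64 partitions), the wrap-around point can place two replicas on one node; real systems add claim rules to avoid that.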

SLIDE 47

Sharding Strategies

SLIDE 48

CAP Theorem

(Diagram: CAP triangle with vertices C, A and P)

  • AP: Riak, Cassandra, Couchbase, Voldemort
  • CP: MongoDB, BigTable, Redis, HBase
  • CA: RDBMS (MySQL, Postgres)

SLIDE 49

What Are You Sacrificing?

  • CA
    • Data is consistent and can be read and written from any node until a partition occurs, at which point data will be out of sync (and won't re-sync)
  • CP
    • Data is consistent across all nodes, and partition tolerance is maintained (preventing data de-sync) by becoming unavailable when a node goes down
  • AP
    • Nodes remain online even if they can't communicate with each other, and will re-sync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)

SLIDE 50

The Dynamo Paper
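The Dynamo paper, which Riak follows, replicates each key to N nodes and lets each request wait for R read or W write acknowledgements; Riak exposes these as the `n_val`, `r` and `w` properties. A short Python check of the quorum arithmetic behind the AP trade-off: when R + W > N, every possible read set shares at least one replica with every write set, while smaller R and W favor availability and latency. The function names are illustrative. This assumes strict quorums; the sloppy quorums with hinted handoff also described in the paper weaken the guarantee, and concurrent writes can still conflict.

```python
from itertools import combinations


def quorum_rule(n_val: int, r: int, w: int) -> bool:
    """Dynamo-style rule of thumb: reads overlap the latest write when R + W > N."""
    if not (1 <= r <= n_val and 1 <= w <= n_val):
        raise ValueError("r and w must each be between 1 and n_val")
    return r + w > n_val


def quorums_always_overlap(n_val: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set share a replica with every write set?"""
    replicas = range(n_val)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# The rule and the brute-force check agree for every small configuration.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorum_rule(n, r, w) == quorums_always_overlap(n, r, w)

print(quorum_rule(3, 2, 2), quorum_rule(3, 1, 1))  # True False
```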

SLIDE 51

Conflict

SLIDE 52

Conflict Resolution

SLIDE 53

set conflict resolution

{ [“Beth” : “Tom”], [“Beth” : “Jim”], [“Beth” : “George”] } (2015-05-27)

{ [“George” : “Tom”], [“Beth” : “Jim”], [“George” : “Jim”] } (2015-05-26)

{ [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] } (2015-05-25)

SLIDE 54

set conflict resolution

(Diagram: Riak with three clients)

SLIDE 55

set conflict resolution

(Diagram: Riak with three clients)

{ [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 56

set conflict resolution

(Diagram: Riak with three clients)

{ [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 57

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 58

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 59

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 60

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Tom”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 61

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Tom”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Beth”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”] }

SLIDE 62

set conflict resolution

(Diagram: Riak with three clients)

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Tom”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Beth”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Beth”: “Jane”] }

SLIDE 63

set conflict resolution

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Tom”: “Jane”] }
{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Beth”: “Jane”] }

Merged with set CRDT behavior:

{ [“Jane”: “Tom”], [“Tom” : “Beth”], [“Beth” : “Tom”], [“George” : “Jim”], [“Tom”: “Jane”], [“Beth”: “Jane”] }
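The merge on the slide above behaves like a grow-only set: siblings are combined by set union, so no client's addition is lost. A minimal Python sketch of that merge, with last-write-wins shown for contrast; the variable names and timestamps are hypothetical. Riak's built-in set data type also supports removals, which needs extra causal metadata that this additions-only sketch leaves out.

```python
# The pairs from the slides, e.g. ["Jane" : "Tom"], modelled as tuples.
BASE = {("Jane", "Tom"), ("Tom", "Beth"), ("Beth", "Tom"), ("George", "Jim")}

# Two siblings written concurrently through different clients.
sibling_a = BASE | {("Tom", "Jane")}
sibling_b = BASE | {("Beth", "Jane")}


def merge_add_only(*siblings: set) -> set:
    """Grow-only set merge. Union is commutative, associative and idempotent,
    so replicas converge no matter the order (or number of times) they merge."""
    merged = set()
    for sibling in siblings:
        merged |= sibling
    return merged


def merge_last_write_wins(timestamped: list[tuple[int, set]]) -> set:
    """For contrast: keep only the sibling with the highest timestamp."""
    return max(timestamped, key=lambda entry: entry[0])[1]


merged = merge_add_only(sibling_a, sibling_b)
lww = merge_last_write_wins([(1, sibling_a), (2, sibling_b)])

print(sorted(merged))
print(("Tom", "Jane") in lww)  # False: last-write-wins dropped one client's addition
```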

SLIDE 64

The Choices

SLIDE 65

What Matters Most

What do I need (most) from my database?

  • Consistency?
  • Transactionality?
  • Availability (and Performance)?

What do I need most from my data?

  • A single, scalable platform?
  • A revolving door of new systems?
  • A mix?
SLIDE 66

SLIDE 67

Closing Questions

SLIDE 68

Do I have to scale?

SLIDE 69

Will I ever want this?

SLIDE 70

Do we have to build it?

SLIDE 71

Get Hands-on

SLIDE 72

riak-dev cluster

git clone from https://github.com/basho-labs

SLIDE 73

Matt Brender @mjbrender

Thank You