Designing for Distributed, Unstructured Data, a presentation by Matt Brender

  1. Designing for Distributed, Unstructured Data. Matt Brender, Developer Advocate at Basho

  2. => curl $RIAK/props { "Matt Brender": ["developer advocate", "ops > dev", "mbrender@basho.com", "@mjbrender", "neckbeardinfluence.com", "geek-whisperers.com", "indoor enthusiast"] } tweet me @mjbrender

  3. I’m saying “Riak,” not “react” as in React.js.

  7. { "text": "Woot! #qconnewyork", "entities": { "hashtags": ["#qconnewyork"], "symbols": [], "urls": [], "user_mentions": [{ "screen_name": "mjbrender", "name": "Matt Brender", "id": 4948123, "id_str": "42424242", "indices": [81, 92] }, { "screen_name": "mjbrender", "name": "Matt Brender", "id": 376825877, "id_str": "376825877", "indices": [121, 132] }] } }
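
The tweet on slide 7 is typical semi-structured data: nested objects and arrays whose shape can vary from record to record. A minimal sketch of pulling a few fields out of such a record with Python's standard `json` module, assuming a trimmed copy of the slide's tweet; the helper name `summarize_tweet` is hypothetical:

```python
import json

# A trimmed copy of the slide-7 tweet (id_str and indices dropped for brevity).
RAW_TWEET = """
{
  "text": "Woot! #qconnewyork",
  "entities": {
    "hashtags": ["#qconnewyork"],
    "symbols": [],
    "urls": [],
    "user_mentions": [
      {"screen_name": "mjbrender", "name": "Matt Brender", "id": 4948123},
      {"screen_name": "mjbrender", "name": "Matt Brender", "id": 376825877}
    ]
  }
}
"""


def summarize_tweet(raw: str) -> dict:
    """Hypothetical helper: pull a few nested fields out of one tweet."""
    tweet = json.loads(raw)
    # Any nested section may be absent in a given record, so default it.
    entities = tweet.get("entities", {})
    mentions = entities.get("user_mentions", [])
    return {
        "text": tweet.get("text", ""),
        "hashtags": list(entities.get("hashtags", [])),
        # Deduplicate screen names; the same user can be mentioned twice.
        "mentioned": sorted({m["screen_name"] for m in mentions}),
    }


if __name__ == "__main__":
    print(summarize_tweet(RAW_TWEET))
```

The defaults matter more than the happy path: a record without `entities` still summarizes cleanly instead of raising.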

  8. Just Hoarding?

  9. Just Hoarding?

  10. A common pattern

  21. Our Problem(s)
      • Same data in different formats: cache, denormalisation, indexes, aggregations
      • We’re sticking to what we know: relational databases with SQL queries; not anticipating scaling needs
      • We’re not sure what’s next: bitten by architectural choices in the past; new systems require consideration; not sure what justifies investment
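
"Same data in different formats" means one set of records gets copied into several read-optimised shapes. A minimal sketch, assuming hypothetical tweet-like records, that derives a cache, an index, a denormalised view, and an aggregation from the same list (the names `TWEETS` and `build_views` are illustrative only):

```python
from collections import Counter, defaultdict

# Hypothetical source records; in practice these might be tweets like slide 7's.
TWEETS = [
    {"id": 1, "user": "mjbrender", "hashtags": ["#qconnewyork", "#riak"]},
    {"id": 2, "user": "basho", "hashtags": ["#riak"]},
    {"id": 3, "user": "mjbrender", "hashtags": []},
]


def build_views(tweets):
    """Derive four read-optimised copies of the same records."""
    # Cache: direct lookup by primary key.
    cache = {t["id"]: t for t in tweets}

    # Index: hashtag -> ids of tweets carrying it (an inverted index).
    index = defaultdict(list)
    for t in tweets:
        for tag in t["hashtags"]:
            index[tag].append(t["id"])

    # Denormalisation: each user's tweets copied under that user.
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(dict(t))

    # Aggregation: how many tweets use each hashtag.
    counts = Counter(tag for t in tweets for tag in t["hashtags"])

    return cache, dict(index), dict(by_user), counts
```

Every change to a source record has to be pushed into all four views, which is the maintenance cost the first bullet points at.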

  22. Can’t I just …

  31. The Choices

  32. This or That
      • NoSQL: types include key/value, document, columnar, graph
      • Hadoop: HDFS, Map/Reduce, YARN
      • “Messaging Queues”: pub/sub, commit log
      • Spark: successor to Map/Reduce, compute-focused

  33. So, NoSQL

  34. What Qualifies as NoSQL? Basho Confidential

  35. NoSQL Community

  36. Persistence, Querying, Scaling

  37. Persistence

  39. Querying

  40. Other Queries: understanding how you get your data back
      • Query languages: SQL(?)
      • Query interfaces: HTTP/S, Protocol Buffers
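
Over the HTTP interface, Riak KV addresses each object by a URL of the form `/types/<type>/buckets/<bucket>/keys/<key>` (the bucket-type segment arrived with Riak 2.0). A minimal sketch using only Python's standard library; the base URL `http://localhost:8098` (Riak's default HTTP port) and the bucket and key names are assumptions, and `put_json`/`get_json` only work against a running node:

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Riak node listening on the default HTTP port.
RIAK_HTTP = "http://localhost:8098"


def _segment(value: str) -> str:
    # Percent-encode one path segment, including any "/" inside it.
    return urllib.parse.quote(value, safe="")


def object_url(base: str, bucket: str, key: str, bucket_type: str = "default") -> str:
    """URL of one object in Riak KV's HTTP interface."""
    return (f"{base}/types/{_segment(bucket_type)}"
            f"/buckets/{_segment(bucket)}/keys/{_segment(key)}")


def put_json(url: str, doc: dict) -> int:
    """Store a JSON document; returns the HTTP status code."""
    body = json.dumps(doc).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def get_json(url: str) -> dict:
    """Fetch a stored JSON document."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a node running, `put_json(object_url(RIAK_HTTP, "tweets", "42"), {"text": "Woot! #qconnewyork"})` followed by `get_json(object_url(RIAK_HTTP, "tweets", "42"))` would round-trip the document. The Protocol Buffers interface is normally reached through one of the client libraries rather than by hand.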

  41. Apache Solr Integration: write it like Riak, query it like Solr.
      • Distributed Full-Text Search: standard full-text Solr queries automatically expand into distributed search queries, returning a complete result set across instances.
      • Ad-Hoc Query Support: broad support for Solr query parameters, e.g., exact match, range queries, and/or/not, sorting, pagination, scoring, ranking, etc.
      • Index Synchronization: data is automatically synchronized between Riak KV and Solr; monitoring detects changes and propagates them to the Solr indexes.
      • Solr API Support: query data in Riak KV using existing Solr APIs.
      • Auto-Restart: Solr OS processes are monitored continuously and automatically started or restarted whenever failures are detected.
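
With Riak Search, a Solr query is sent over HTTP to `/search/query/<index>` with ordinary Solr parameters such as `q`, `wt`, `rows`, `start`, and `sort`. A minimal sketch that only builds the URL and parses a response; the index name `tweets_idx` and the fields `user_s`/`text_t` are hypothetical (following the default schema's dynamic-field suffixes), and reading the Riak key from a `_yz_rk` field is an assumption about the response shape:

```python
import json
import urllib.parse
from typing import List, Optional

# Assumption: local node on the default HTTP port, with a hypothetical
# search index named "tweets_idx" attached to the bucket.
RIAK_HTTP = "http://localhost:8098"


def search_url(base: str, index: str, query: str, rows: int = 10,
               start: int = 0, sort: Optional[str] = None) -> str:
    """Build a Solr-style query URL against Riak Search's HTTP endpoint."""
    params = {"wt": "json", "q": query, "rows": rows, "start": start}
    if sort:
        params["sort"] = sort  # e.g. "score desc"
    index_path = urllib.parse.quote(index, safe="")
    return f"{base}/search/query/{index_path}?{urllib.parse.urlencode(params)}"


def matching_keys(solr_json: str) -> List[str]:
    """Pull Riak keys out of a Solr JSON response.

    Assumes each hit carries the Riak key in the "_yz_rk" field.
    """
    docs = json.loads(solr_json)["response"]["docs"]
    return [doc["_yz_rk"] for doc in docs]
```

Pagination is then just a matter of bumping `start` by `rows` on each request.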

  42. Polylingual Querying: there is a diverse group of client libraries for Riak that support both the HTTP and Protocol Buffers APIs. Between the Basho-supported libraries and the community libraries, the languages covered include Java, Clojure, Ruby, Go, Python, Perl, PHP, Scala, Erlang, R, .NET, Node.js, and C.

  43. Scale means

  45. Sharding

  46. Sharding Strategies (diagram): Master with Slave, Slave, Slave; OR Node 1, Node 2, Node 3
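
Why the placement strategy matters: with naive hash-mod sharding (an assumption for illustration, not what Riak does), a key stays on the same node only when `h mod 3 == h mod 4`, which holds for 3 of every 12 residues, so growing from three nodes to four moves about three quarters of the keys. A minimal sketch; MD5 is used purely as a convenient uniform hash:

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Naive hash-mod placement: node index = hash(key) mod N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def fraction_moved(keys, before: int, after: int) -> float:
    """Share of keys whose node changes when the cluster is resized."""
    moved = sum(1 for k in keys if shard_for(k, before) != shard_for(k, after))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"tweet:{i}" for i in range(10_000)]
    # Growing from Node 1-3 to a fourth node.
    print(f"{fraction_moved(keys, 3, 4):.1%} of keys move")
```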

  47. Sharding Strategies
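
Consistent hashing, the Dynamo-style alternative, places both nodes and keys on a ring, so adding a node only takes over the keys that now fall just before its positions. A minimal sketch of the classic variant with virtual nodes; note this is a simplification, since Riak itself splits its 160-bit SHA-1 ring into a fixed number of equal partitions (64 by default) that nodes claim. `HashRing`, the node names, and `vnodes_per_node` are illustrative assumptions; `n=3` mirrors Riak's default of three replicas:

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on a 160-bit ring, using SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent hashing with virtual nodes (simplified, illustrative)."""

    def __init__(self, nodes, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            bisect.insort(self._tokens, (ring_hash(f"{node}#{i}"), node))

    def _start(self, key: str) -> int:
        # Index of the first token clockwise from the key's position.
        return bisect.bisect_right(self._tokens, (ring_hash(key), ""))

    def owner(self, key: str) -> str:
        return self._tokens[self._start(key) % len(self._tokens)][1]

    def preference_list(self, key: str, n: int = 3) -> list:
        """Walk clockwise from the key, collecting n distinct nodes for replicas."""
        start, nodes = self._start(key), []
        for step in range(len(self._tokens)):
            node = self._tokens[(start + step) % len(self._tokens)][1]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes
```

Because adding a node only inserts new tokens, every key that changes owner moves to the new node, and roughly a quarter of the keys move when going from three nodes to four.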

  48. CAP Theorem (Consistency, Availability, Partition tolerance)
      • AP: Riak, Cassandra, Couchbase, Voldemort
      • CA: RDBMS, MySQL, Postgres
      • CP: MongoDB, BigTable, Redis, HBase

  49. What Are You Sacrificing?
      • CA: data is consistent and can be read/written from any node until a partition occurs; then the nodes fall out of sync (and won't re-sync).
      • CP: data is consistent across all nodes, and partition tolerance is maintained (preventing data de-sync) by becoming unavailable when a node goes down.
      • AP: nodes remain online even if they can't communicate with each other and resync data once the partition is resolved, but there is no guarantee that all nodes hold the same data (either during or after the partition).
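
The CP and AP trade-offs above can be seen in a toy model: two replicas, a flag that simulates a network partition, and a mode that decides what happens to writes while partitioned. This is a hypothetical illustration, not a replication protocol, and it models only the CP and AP cases:

```python
class Replica:
    """One copy of the data (hypothetical toy, not a real protocol)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}


class ToyCluster:
    """Two replicas that either refuse writes (CP) or diverge (AP) when partitioned."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("a"), Replica("b")]
        self.partitioned = False

    def write(self, replica: int, key: str, value: str) -> None:
        if not self.partitioned:
            # Healthy network: apply the write to every replica.
            for r in self.replicas:
                r.data[key] = value
        elif self.mode == "CP":
            # The peer is unreachable: stay consistent by refusing the write.
            raise RuntimeError(f"replica {self.replicas[replica].name} unavailable")
        else:
            # AP: accept the write locally; replicas may now disagree.
            self.replicas[replica].data[key] = value

    def read(self, replica: int, key: str):
        return self.replicas[replica].data.get(key)

    def divergent_keys(self) -> list:
        a, b = self.replicas[0].data, self.replicas[1].data
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

In AP mode, the keys reported by `divergent_keys` are exactly the conflicts that have to be resolved once the partition heals.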

  50. The Dynamo Paper

  51. Conflict
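
The Dynamo paper detects conflicting writes with vector clocks: if neither version's clock has seen all of the other's updates, the writes were concurrent, and Riak keeps such values as siblings to be resolved. A minimal sketch with clocks as plain dicts; the actor names are hypothetical, and Riak's own implementation (which later adopted dotted version vectors) differs in detail:

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates seen from that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Record one more update by `actor`, returning a new clock."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen every update that `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    a_after_b, b_after_a = descends(a, b), descends(b, a)
    if a_after_b and b_after_a:
        return "equal"
    if a_after_b:
        return "a_newer"
    if b_after_a:
        return "b_newer"
    return "concurrent"  # neither saw the other: a conflict to resolve


def merge(a: VClock, b: VClock) -> VClock:
    """Entry-wise maximum: the clock of a value that resolves both."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}
```

Two clients that both update the same base version, one as `client_b` and one as `client_c`, end up with clocks that compare as `"concurrent"`; a value written with the merged clock supersedes both.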

  52. Conflict Resolution

  53. Set conflict resolution
      • 2015:05:25 { ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }
      • 2015:05:26 { ["George": "Tom"], ["Beth": "Jim"], ["George": "Jim"] }
      • 2015:05:27 { ["Beth": "Tom"], ["Beth": "Jim"], ["Beth": "George"] }

  54. Set conflict resolution (diagram): three clients and Riak.

  55. Set conflict resolution: a client and Riak both hold { ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }.

  56. Set conflict resolution: a client's copy is now { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }, while Riak still holds { ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }.

  57. Set conflict resolution: the client's copy and Riak both hold { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }.

  58. Set conflict resolution: two clients and Riak all hold { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }.

  59. Set conflict resolution: two clients and Riak all hold { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"] }.

  60. Set conflict resolution: one copy now also contains ["Tom": "Jane"], giving { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"], ["Tom": "Jane"] }; the other two copies still hold the four-element set.

  61. Set conflict resolution: another copy now contains ["Beth", "Jane"], giving { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"], ["Beth", "Jane"] }; one copy still holds the four-element set, and one holds the set with ["Tom": "Jane"].

  62. Set conflict resolution: two copies now hold { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"], ["Beth", "Jane"] }, while one still holds { ["Jane": "Tom"], ["Tom": "Beth"], ["Beth": "Tom"], ["George": "Jim"], ["Tom": "Jane"] }.
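
For sets that only grow, conflicting copies like the ones on slides 58 to 62 can be resolved by taking their union, so both concurrent additions survive. A minimal sketch using the slides' pairs as tuples; treating ["Beth", "Jane"] as the pair ("Beth", "Jane") is an assumption, and `merge_siblings` is a hypothetical name:

```python
from typing import FrozenSet, Iterable, Tuple

Pair = Tuple[str, str]  # e.g. ("Jane", "Tom") from the slides


def merge_siblings(siblings: Iterable[FrozenSet[Pair]]) -> FrozenSet[Pair]:
    """Resolve conflicting copies of a set by taking their union."""
    merged: FrozenSet[Pair] = frozenset()
    for sibling in siblings:
        merged |= sibling
    return merged


# The four-element set that every copy held on slides 58-59.
BASE = frozenset({("Jane", "Tom"), ("Tom", "Beth"), ("Beth", "Tom"), ("George", "Jim")})

# Two concurrent additions, as on slides 60-62.
WITH_TOM_JANE = BASE | {("Tom", "Jane")}
WITH_BETH_JANE = BASE | {("Beth", "Jane")}
```

Merging `WITH_TOM_JANE` and `WITH_BETH_JANE` yields all six pairs. Union alone breaks down once removals exist: if one client removes ("George", "Jim") while another concurrently adds something, the union brings the removed pair back. Riak 2.0's built-in set data type addresses this by tracking which additions each removal has observed (an observed-remove set).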
