
Modern Fast Streaming Data - Todd L. Montgomery (@toddlmontgomery)



  1. Modern Fast Streaming Data Todd L. Montgomery @toddlmontgomery

  2. Why Should We Care? • Myths & Misconceptions • You can’t escape the Math • Technologies & Techniques

  3. Why Should We Care?

  4. Human Knowledge is now doubling every year* (* by discipline, 12-18 months)

  5. Fueled by Technology

  6. Middle Ages - 1500 yrs; Renaissance - 250 yrs; Industrial Revolution - 150 yrs; WWII - 25 yrs (Buckminster Fuller, Critical Path, 1981)

  7. IoT

  8. IoT Ubiquitous Computing

  9. In the near future, Human Knowledge could double every 72 hours

  10. What this could mean for our systems…

  11. Either ingest or streaming (2x for Request/Response): Updates/Sec = Devices * Frequency * Market Share

  12. Updates/Sec = Devices * Frequency * Market Share. Devices: 9 Billion (Today); 50 Billion by 2020 (Cisco); 26 Billion by 2020 (Smartphone/Tablet - Gartner); 75 Billion by 2020 (Morgan Stanley)

  13. Updates/Sec = 50 Billion * 6/min * 1% = 50 Million/sec

  14. Bandwidth = 50 Billion * 6/min * 1% * 200 bytes = 9.3 GB/s (74.5 Gb/s)
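The arithmetic on slides 13 and 14 can be checked with a short script. This is only a sketch using the figures quoted on the slides; the function and constant names are made up for illustration. Note that the slide's 9.3 GB/s and 74.5 Gb/s match binary units (GiB and Gib); in decimal units the same load is 10 GB/s, or 80 Gb/s.

```python
# Back-of-envelope load estimate using the figures quoted on the slides.
DEVICES = 50_000_000_000   # 50 billion devices (the Cisco 2020 estimate)
UPDATES_PER_MIN = 6        # each device sends 6 updates per minute
MARKET_SHARE_PCT = 1       # we handle 1% of all devices
BYTES_PER_UPDATE = 200     # payload size per update


def updates_per_sec(devices, per_min, share_pct):
    """Updates/Sec = Devices * Frequency * Market Share."""
    return devices * per_min * share_pct / (60 * 100)


def bandwidth_bytes_per_sec(devices, per_min, share_pct, bytes_per_update):
    """Bandwidth = Updates/Sec * bytes per update."""
    return updates_per_sec(devices, per_min, share_pct) * bytes_per_update


if __name__ == "__main__":
    rate = updates_per_sec(DEVICES, UPDATES_PER_MIN, MARKET_SHARE_PCT)
    bw = bandwidth_bytes_per_sec(
        DEVICES, UPDATES_PER_MIN, MARKET_SHARE_PCT, BYTES_PER_UPDATE
    )
    print(f"updates/sec: {rate:,.0f}")
    print(f"bandwidth:   {bw / 2**30:.1f} GiB/s ({bw * 8 / 2**30:.1f} Gib/s)")
    print(f"             {bw / 1e9:.1f} GB/s ({bw * 8 / 1e9:.1f} Gb/s)")
```

Per slide 11, a request/response design would double these numbers.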

  15. And… Geographic Distribution [map: regional shares of 10%, 30%, 20%, 10%, 15%, 15%]

  16. Social & Societal demands will require processing an intense stream of data in real-time

  17. Myths & Misconceptions

  18. Excuses, Excuses!

  19. Myth: (CPUs, Storage, Networks) are not capable of processing in real-time* (* for some unknown, unquantified data volume)

  20. [Chart: accumulated improvement over time for Network Bandwidth, CPU Cores, Storage Capacity, Memory Capacity, and Response Time]

  21. Year  Processor                                MIPS
      1974  Intel 8080                               0.29
      1982  Intel 286                                1.28
      1993  PowerPC 601                              157
      2003  Pentium 4 Extreme                        9,726
      2008  Intel Core i7 920 (Quad)                 82,300
      2011  Intel Core i7 2600K (4/8, Sandy Bridge)  128,300
      2014  Intel Core i7 5960x (8/16, Haswell)      298,190
      http://en.wikipedia.org/wiki/Instructions_per_second

  22. Raspberry Pi 2 (Quad) 1,186 MIPS! http://en.wikipedia.org/wiki/Instructions_per_second

  23. DDR, SSD, PCIe 3, 100 GbE, … OmniPath

  24. 1 thread of awesome > 128 cores of so-so http://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/ http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html

  25. Misconception: Data needs to come to rest to be processed

  26. Data at rest is a liability

  27. Data at rest is a liability: ETL, Warehouse, ILM, MDM

  28. Why would you want transient data at rest?

  29. But what about sort? …REALLY?!

  30. You can’t escape the Math

  31. … and you don’t want to. The Math will guide you to a solution.

  32. "AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons

  33. [Diagram: Setup & Scheduling, then four Work Units, then Post Processing]

  34. [Diagram: Setup & Scheduling (Contention), then four Work Units, then Post Processing (Contention)]

  35. Contention isn’t the biggest enemy

  36. Coherence is!

  37. Universal Scalability Law [chart: Speedup (0 to 20) versus Processors (1 to 1024), comparing the Amdahl and USL curves]
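The two curves on slide 37 can be reproduced from the usual formulas. The sketch below uses the common USL form, speedup(N) = N / (1 + σ(N - 1) + κN(N - 1)), where σ models contention and κ models coherence; setting κ = 0 gives Amdahl's Law. The parameter values are assumptions chosen for illustration: σ = 0.05 matches the chart's ceiling of about 20, and κ = 0.0001 is arbitrary. They are not taken from the talk.

```python
def amdahl(n, sigma):
    """Amdahl's Law: speedup limited only by the serial (contended) fraction."""
    return n / (1 + sigma * (n - 1))


def usl(n, sigma, kappa):
    """Universal Scalability Law: contention (sigma) plus coherence (kappa)."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


if __name__ == "__main__":
    SIGMA = 0.05     # assumed contention fraction
    KAPPA = 0.0001   # assumed coherence penalty
    print(f"{'N':>5} {'Amdahl':>8} {'USL':>8}")
    for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
        print(f"{n:>5} {amdahl(n, SIGMA):>8.2f} {usl(n, SIGMA, KAPPA):>8.2f}")
```

With any κ > 0 the USL curve peaks and then falls as processors are added, while the Amdahl curve only flattens. That retrograde region is why the next slides treat coherence, not contention, as the bigger enemy.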

  38. [Diagram: Setup & Scheduling (Contention), then four Work Units bracketed by Contention + Coherence, then Post Processing (Contention)]

  39. Up Front Partitioning [diagram: four Work Units]

  40. … and more: Queuing Theory ⭐, Complexity Theory, CAP Theorem

  41. Technologies & Techniques

  42. The Essence of Architecture: Data Structures, Protocols of Interaction, Mechanical Sympathy

  43. Understanding is essential

  44. [Chart: accumulated improvement over time for Network Bandwidth, CPU Cores, Storage Capacity, Memory Capacity, and Response Time]

  45. Batching… [chart: accumulated improvement over time for Network Bandwidth, CPU Cores, Storage Capacity, Memory Capacity, and Response Time]

  46. Technique: Smart Batching (Natural Batching) http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html

  47. Resource

  48. Ring Buffer Resource

  49. Batching Thread, Resource: pull off as much waiting data as possible

  50. Single Writer Principle; Avoid Resource Contention; Batching only when needed; Rate Decoupling; Back Pressure
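The batching idea on slides 47 to 50 can be sketched as a single consumer that blocks for one item and then takes whatever else is already waiting, so batches form only when producers outpace the resource. This sketch uses Python's standard `queue.Queue` in place of a ring buffer, which is an assumption made for brevity: `Queue` uses locks internally, so the code shows the batching policy only, not a lock-free structure. The names `drain_batch`, `batching_thread`, and `run_demo` are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling the batching thread to exit


def drain_batch(q, max_batch=1024):
    """Block for one item, then pull off whatever else is already waiting."""
    batch = [q.get()]                      # wait until at least one item exists
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())   # take more only if already queued
        except queue.Empty:
            break                          # nothing waiting: do not linger
    return batch


def batching_thread(q, write_batch):
    """Single writer: the only thread that ever touches the resource."""
    while True:
        batch = drain_batch(q)
        stop = any(item is _STOP for item in batch)
        items = [item for item in batch if item is not _STOP]
        if items:
            write_batch(items)             # one expensive call per batch
        if stop:
            return


def run_demo(n=1000):
    """Push n items through the batching thread and return the batches."""
    q = queue.Queue()
    batches = []
    worker = threading.Thread(target=batching_thread, args=(q, batches.append))
    worker.start()
    for i in range(n):
        q.put(i)
    q.put(_STOP)
    worker.join()
    return batches


if __name__ == "__main__":
    sizes = [len(b) for b in run_demo()]
    print(f"{sum(sizes)} items written in {len(sizes)} batches")
```

The batch count varies from run to run. When the consumer keeps up, batches shrink toward a single item, which is the "batching only when needed" behaviour. Giving `Queue` a `maxsize` would make producers block when it fills, a simple form of back pressure.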

  51. Techniques: Freedom! Lock-Free, Wait-Free http://en.wikipedia.org/wiki/Non-blocking_algorithm

  52. Words Matter

  53. Obstruction-Freedom: Partially completed operations aborted & changes made rolled back

  54. Lock-Freedom: Individual thread may starve, but guaranteed system-wide throughput. Lock-Free is Obstruction-Free.

  55. Wait-Freedom: Starvation free and guaranteed system-wide throughput. Wait-Free is Lock-Free.

  56. These properties are awesome! Who wouldn’t want them?

  57. System-wide properties start at the lowest level

  58. Essence: Just because we could take an action right now, doesn’t mean we should

  59. Technology: CRDTs http://en.wikipedia.org/wiki/Conflict-free_replicated_data_type

  60. Node/Value: [0] = 0, [1] = 0, [2] = 0, …, [N] = 0; sum(0,N) = 0

  61. Node/Value: [0] = 0, [1] = 1, [2] = 0, …, [N] = 0; sum(0,N) = 1

  62. Node/Value: [0] = 1, [1] = 1, [2] = 0, …, [N] = 0; sum(0,N) = 2

  63. Gossip for visibility

  64. Shared View: [0] = 4, [1] = 2, [2] = 0, …, [N] = 0
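The counter on slides 60 to 64 behaves like what is usually called a grow-only counter (G-Counter). The slides do not name the type, so treating it as a G-Counter is an assumption. In this sketch each node increments only its own slot, the value is the sum of all slots, and gossip merges two views by taking the element-wise maximum. The class and method names are illustrative.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by element-wise max."""

    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.slots = [0] * n_nodes

    def increment(self, amount=1):
        # Single writer per slot: a node only ever updates its own entry.
        self.slots[self.node_id] += amount

    def value(self):
        # sum(0,N) over the local view of every node's slot.
        return sum(self.slots)

    def merge(self, other):
        # Gossip: slots only grow, so the larger value is always the newer one.
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]


if __name__ == "__main__":
    nodes = [GCounter(i, 3) for i in range(3)]
    nodes[0].increment(4)
    nodes[1].increment(2)
    # Gossip every view to every other node for visibility.
    for a in nodes:
        for b in nodes:
            a.merge(b)
    print(nodes[2].slots, nodes[2].value())
```

Because the merge is commutative, associative, and idempotent, gossip messages can arrive late, repeated, or out of order and every node still converges on the same shared view.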

  65. Technology: Append-only Data Structures https://github.com/real-logic/Aeron

  66. Log: [Header | Message]

  67. Log: [Header | Message] [Header | Message] [Header | Message]

  68. Efficiently Replicating an Append-only Log

  69. What If…? The Data Structure could be directly sent to the “network”?

  70. [Header | Message]

  71. Position in Log: [Header (Length) | Message]

  72. Position in Log: [Header (Length + Version/Flags, Type, etc.) | Message]

  73. Fragment 0: [Header | Message]

  74. Fragment 0: [Header | Message] [Header | Message]

  75. Fragment 0: [Header | Message] [Header | Message] [Header | Message] [Header | Message]

  76. Fragment 0: [Header | Message] [Header | Message] [Header | Message]; Fragment 1: [Header | Message]

  77. Fragment 0: [Header | Message] [Header | Message] [Header | Message] [Header | Message]; Fragment 1: [Header | Message] [Header | Message]

  78. Natural for broadcast replication
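The log layout on slides 66 to 72 can be sketched as length-prefixed frames appended to a byte buffer. The header layout below (length, version, flags, type in 8 little-endian bytes) is hypothetical and is not Aeron's actual wire format; the slides list only "Length + Version/Flags, Type, etc.". The helper names `append` and `frames` are likewise made up for this example.

```python
import struct

# Hypothetical 8-byte header: frame length, version, flags, message type.
HEADER = struct.Struct("<IBBH")
VERSION = 1


def append(log, msg_type, payload, flags=0):
    """Append one framed message; return its position in the log."""
    position = len(log)
    length = HEADER.size + len(payload)   # length covers header + message
    log.extend(HEADER.pack(length, VERSION, flags, msg_type))
    log.extend(payload)
    return position


def frames(buf, position=0):
    """Walk the log from a position, yielding (position, type, payload)."""
    while position < len(buf):
        length, _version, _flags, msg_type = HEADER.unpack_from(buf, position)
        payload = bytes(buf[position + HEADER.size:position + length])
        yield position, msg_type, payload
        position += length


if __name__ == "__main__":
    log = bytearray()
    append(log, 1, b"hello")
    second = append(log, 2, b"world!")

    # Replication: ship the raw bytes after a known position, unchanged.
    replica = bytearray(log[:second])      # what the replica already holds
    replica.extend(log[second:])           # bytes as they would go on the wire
    for pos, msg_type, payload in frames(replica):
        print(pos, msg_type, payload)
```

Each frame carries its own length, so a receiver can parse the bytes it is sent without any re-encoding. That is the property slide 69 points at when it asks whether the data structure could go straight to the network.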

  79. In Closing…

  80. A flood of data is coming. Many say it is already here. How will you deal with it?

  81. Ever seen a Pitbull drink from a Fire Hydrant?

  82. It won’t give up… Be the Pitbull at the Fire Hydrant!

  83. Questions? Aeron: https://github.com/real-logic/Aeron • SlideShare: http://www.slideshare.com/toddleemontgomery • Twitter: @toddlmontgomery • Thank You!
