
Caribou: Intelligent Distributed Storage. Zsolt István, David Sidler, Gustavo Alonso


  1. Caribou: Intelligent Distributed Storage. Zsolt István, David Sidler, Gustavo Alonso. Systems Group, Department of Computer Science, ETH Zurich.

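  2. Rack-scale thinking. The slide contrasts two ways of organizing compute and storage nodes behind a top-of-rack (ToR) switch. In the cloud, compute and storage are separate: + provisioning, + independent scalability, - data movement bottleneck. In an appliance, compute and storage are combined.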

  3. Storage design options. Compute > bandwidth (Oracle Exadata, IBM PureData, Deuteronomy, …): + full-fledged, with features similar to software; - large footprint. Compute < bandwidth (Samsung YourSQL, Wisconsin SmartSSD, Kinetic Drives, BlueCache, …): + no-overhead access; - outside management, - SW+HW overhead. Compute ~ bandwidth: balanced design, + small footprint.

  4. What is Caribou? Intelligent distributed storage with FPGAs: easy integration on a commodity network; random access to tuples and in-storage scans; selection predicate pushdown; data replicated consistently across nodes; extensible (open-source) design. The diagram shows clients and four Caribou nodes connected through a 10Gbps switch.
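Slide 4 says data is replicated consistently across nodes, and slide 6 names primary/backup replication with atomic broadcast. The Python sketch below models only the ordering idea in software, under loud assumptions: the `Primary` and `Backup` classes are hypothetical, delivery is assumed reliable and synchronous, and failure handling (timeouts, electing a new primary) is omitted entirely. It is not Caribou's hardware protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical backup replica: applies writes strictly in sequence order."""
    store: dict = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, seq: int, key: str, value: bytes) -> None:
        # A gap or reordering would let replicas diverge, so refuse it.
        if seq != self.applied_seq + 1:
            raise ValueError(f"expected seq {self.applied_seq + 1}, got {seq}")
        self.store[key] = value
        self.applied_seq = seq


@dataclass
class Primary:
    """Hypothetical primary: assigns one global order and broadcasts each write."""
    backups: list
    store: dict = field(default_factory=dict)
    next_seq: int = 1

    def put(self, key: str, value: bytes) -> int:
        seq = self.next_seq
        # Every backup applies the write before the client is acknowledged
        # (assumes reliable delivery; no failure handling in this sketch).
        for backup in self.backups:
            backup.apply(seq, key, value)
        self.store[key] = value
        self.next_seq += 1
        return seq

    def get(self, key: str):
        return self.store.get(key)


primary = Primary(backups=[Backup(), Backup()])
primary.put("user:1", b"alice")
primary.put("user:2", b"bob")
```

Acknowledging a write only after all backups have applied it, in one global order, is what keeps the replicas identical in this model.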

  5. FPGA 101. A Field Programmable Gate Array is reprogrammable hardware with a large number of configurable logic blocks, providing tight integration and massive parallelism. The FPGA enables network/application co-design and innovation…

  6. Inside a Caribou node. The pipeline runs at the same speed as the network (line-rate). Software clients use a key-value interface (single-key lookup or scanning) and connect through a 10Gbps switch. The node is organized into network, replication, key-value management, and processing components: the TCP/IP network stack handles 1000s of connections from software clients; replication uses primary/backup with atomic broadcast; key-value management uses a cuckoo hash table, slab memory allocation, and bitmap indexes over DRAM; processing supports conditionals, regex, and decompression.
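Slide 6 lists a cuckoo hash table for key-value management. As a software illustration of that data structure only (Caribou implements it in hardware, and its table sizes, hash functions, and overflow handling are not given here), the Python sketch below uses two tables with BLAKE2b-derived hash functions. The salts, table size, kick limit, and the overflow stash are all assumptions.

```python
import hashlib


class CuckooHashTable:
    """Two-table cuckoo hash table (software sketch; all parameters assumed)."""

    SALTS = (b"table-0", b"table-1")  # one BLAKE2b salt per hash function

    def __init__(self, size: int = 16, max_kicks: int = 32):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size for _ in range(2)]  # slots hold (key, value)
        self.stash = []  # overflow entries when displacement gives up

    def _slot(self, which: int, key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8,
                                 salt=self.SALTS[which]).digest()
        return int.from_bytes(digest, "little") % self.size

    def get(self, key: str):
        # At most two table probes, then the (small) stash.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        for k, v in self.stash:
            if k == key:
                return v
        return None

    def put(self, key: str, value) -> None:
        # Update in place if the key is already stored.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        for i, (k, _) in enumerate(self.stash):
            if k == key:
                self.stash[i] = (key, value)
                return
        # Insert, evicting the current occupant and moving it to its other table.
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, item[0])
            item, self.tables[which][idx] = self.tables[which][idx], item
            if item is None:
                return
            which = 1 - which
        self.stash.append(item)  # a real design would rehash or resize instead
```

A lookup touches at most two table slots (plus the stash in this sketch), which is one reason cuckoo hashing suits a fixed-latency pipeline. Inserts may displace existing entries; here a failed displacement chain falls back to the stash rather than rehashing.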

  7. Throughput of random access to storage.

  8. Random access response times. Response times are comparable to software systems on InfiniBand, but Caribou uses commodity networking. (Figure: response time in µs, 0 to 60, versus value size in bytes, 0 to 256, for Get, Put/Update, and replicated Put/Update.)

  9. Operator push-down. The filtering circuits are parameterized at runtime, with no overhead, for example for the query SELECT … FROM customer WHERE age<35 AND purchases>2 AND address LIKE '%Luzern%CH%'. Supported: multiple comparisons to constants (conjunction); substring or regular expression matching [1]; filtering of compressed data (LZ77); an extensible pipeline design.
     [1] Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures. D. Sidler, Zs. István, M. Ewaida, G. Alonso. 2017 ACM SIGMOD/PODS Conference (SIGMOD'17).
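The slide describes filters configured at runtime rather than by reprogramming the FPGA. To make the semantics of such a pushed-down selection concrete, here is a software model in Python: a filter is a set of parameters (comparisons to constants combined with AND, plus an optional LIKE pattern compiled to a regular expression) applied to each tuple. The `Filter` class, the `OPS` operator table, and the dict-based row format are hypothetical and not Caribou's interface; LIKE `ESCAPE` clauses are not handled.

```python
import operator
import re
from dataclasses import dataclass
from typing import Optional

# Comparison operators a filter may be configured with (assumed set).
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def like_to_regex(pattern: str):
    """Translate a SQL LIKE pattern ('%' any run, '_' one char) to a compiled regex."""
    regex = "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
                    for ch in pattern)
    return re.compile(regex, re.DOTALL)


@dataclass
class Filter:
    """Runtime parameters of one pushed-down selection (hypothetical format)."""
    comparisons: list                  # [(column, op, constant), ...], ANDed together
    like_column: Optional[str] = None
    like_pattern: Optional[str] = None

    def __post_init__(self):
        self._regex = like_to_regex(self.like_pattern) if self.like_pattern else None

    def keep(self, row: dict) -> bool:
        # Conjunction: every comparison to a constant must hold.
        if not all(OPS[op](row[col], const) for col, op, const in self.comparisons):
            return False
        # Optional LIKE condition, matched against the whole field.
        if self._regex is not None:
            return self._regex.fullmatch(row[self.like_column]) is not None
        return True


# WHERE age<35 AND purchases>2 AND address LIKE '%Luzern%CH%'
f = Filter([("age", "<", 35), ("purchases", ">", 2)], "address", "%Luzern%CH%")
rows = [
    {"age": 30, "purchases": 5, "address": "Seestrasse 1, Luzern, CH"},
    {"age": 40, "purchases": 5, "address": "Seestrasse 1, Luzern, CH"},
    {"age": 30, "purchases": 1, "address": "Zurich, CH"},
]
matches = [row for row in rows if f.keep(row)]  # only the first row qualifies
```

Changing the query only changes the data passed to `Filter`, not the code that evaluates it, which is the software analogue of circuits parameterized at runtime.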

  10. Exploiting parallelism. (Figure: values read from DRAM are spread across replicated processing cores to increase throughput, while complexity is handled by chaining stages inside each core: LZ77 transform, comparison predicate, and regular expressions. Each core turns a value into a transformed value (Value’) and emits a keep bit, 1 or 0, per value.)
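The figure describes two axes: replicating cores for throughput and chaining stages for complexity. The Python sketch below models that dataflow under assumptions: a made-up LZ77 token format (`("lit", bytes)` or `("ref", distance, length)`), a comparison predicate passed in as a function, and round-robin assignment of values to cores. In hardware the cores would run concurrently; here they run one after another, so only the data flow is illustrated, not the speedup.

```python
import re
from itertools import cycle
from typing import Callable, List


def lz77_decode(tokens) -> bytes:
    """Decode hypothetical LZ77 tokens: ("lit", data) or ("ref", distance, length)."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, distance, length = tok
            start = len(out) - distance
            # Byte-by-byte copy so overlapping references (length > distance) work.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


def make_core(predicate: Callable[[bytes], bool], pattern: bytes):
    """One processing core: LZ77 transform -> comparison predicate -> regex -> keep bit."""
    regex = re.compile(pattern)

    def core(compressed) -> int:
        value = lz77_decode(compressed)         # transform stage
        if not predicate(value):                # comparison predicate stage
            return 0
        return 1 if regex.search(value) else 0  # regular-expression stage
    return core


def scan(values: list, cores: List[Callable]) -> List[int]:
    """Assign values to replicated cores round-robin; keep bits follow input order."""
    # Sequential here; in hardware each core would process its share in parallel.
    return [core(value) for value, core in zip(values, cycle(cores))]


cores = [make_core(lambda v: len(v) >= 6, rb"Luzern") for _ in range(4)]
values = [
    [("lit", b"Luzern "), ("lit", b"CH")],  # decodes to b"Luzern CH"
    [("lit", b"ab"), ("ref", 2, 4)],        # decodes to b"ababab"
    [("lit", b"Zug")],                      # too short for the predicate
]
keep_bits = scan(values, cores)  # [1, 0, 0]
```

Adding cores raises throughput without changing the per-value logic; adding stages raises what each core can evaluate without changing how values are distributed.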

  11. Scan and filter. The choice of filter and the value size do not impact the scan rate: measured in GB/s, it stays the same regardless of value size. (Figure regions: bound by the network/client; bound by filter performance.)
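One way to read the two regions in the figure is as a minimum of two rates: a scan cannot be faster than either the filter pipeline or the network/client side. The Python sketch below encodes that bottleneck model, assuming (this is not stated on the slide) that the pipeline consumes a fixed number of bytes per clock cycle regardless of the filter and of value boundaries. The 8 bytes per cycle at 156.25 MHz and the 10 Gbit/s link are illustrative numbers only.

```python
def scan_rate_gb_s(bytes_per_cycle: int, clock_hz: float, link_gb_s: float) -> float:
    """Scan rate in GB/s under a simple bottleneck model (assumed, not measured)."""
    # Assumption: the filter pipeline consumes a fixed number of bytes per cycle,
    # so its rate depends neither on the filter chosen nor on the value size.
    filter_gb_s = bytes_per_cycle * clock_hz / 1e9
    # The slower of the filter and the network/client side bounds the scan.
    return min(filter_gb_s, link_gb_s)


def tuples_per_s(rate_gb_s: float, value_size_bytes: int) -> float:
    """Only the tuple rate depends on value size; the byte rate stays the same."""
    return rate_gb_s * 1e9 / value_size_bytes


link = 10 / 8  # a 10 Gbit/s line rate is 1.25 GB/s before protocol overhead
rate = scan_rate_gb_s(8, 156.25e6, link)  # same GB/s for any value size
small, large = tuples_per_s(rate, 64), tuples_per_s(rate, 256)
```

In this model, larger values mean fewer tuples per second but the same bytes per second, which matches the flat GB/s curve the slide describes.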

  12. Near-data processing without surprises. Filtering can be combined with random-access reads as well.
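The same filters also apply to single-key reads. A minimal Python model of that combination, with a hypothetical `FilteringStore` class and predicates passed as functions (this is not Caribou's client API): a lookup carries an optional predicate, and the value is only returned if it passes.

```python
from typing import Callable, Dict, Optional

Predicate = Callable[[bytes], bool]


class FilteringStore:
    """Hypothetical key-value store whose lookups can carry a pushed-down predicate."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

    def get(self, key: str, predicate: Optional[Predicate] = None) -> Optional[bytes]:
        value = self.data.get(key)
        # The predicate runs next to the data, so a non-matching value is never
        # sent back; here "missing" and "filtered out" both return None.
        if value is None or (predicate is not None and not predicate(value)):
            return None
        return value


store = FilteringStore()
store.put("customer:7", b"Seestrasse 1, Luzern, CH")
hit = store.get("customer:7", lambda v: b"Luzern" in v)
miss = store.get("customer:7", lambda v: b"Zug" in v)
```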

  13. “The Times They Are A-Changin’”. In-storage processing: stand-alone boards, MPSoC (ARM+FPGA); add NVMe flash and non-volatile memory; explore different KVSs (memcached, Redis, …). In-network processing: Microsoft Catapult NICs; work on streaming data; distributed service in the cloud. Accelerator: Intel Xeon+FPGA; offload computation without partitioning or copying data.

  14. Time to explore… The data movement bottleneck exists on many levels. Caribou as intelligent distributed storage: a software-like service in a small footprint; a balanced design with the “right amount” of compute. Caribou as a platform to explore near-data processing: open source, modular, and portable; its data processing operators are applicable to other hardware platforms. Links: https://github.com/fpgasystems/caribou, https://www.systems.ethz.ch/fpga/, contact: zsolt.istvan@inf.ethz.ch.
