Data Processing on the fast lane - Gustavo Alonso, Systems Group - PowerPoint PPT Presentation


SLIDE 1

Data Processing on the fast lane

Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland

SLIDE 2

The team behind the work:

  • René Müller (now at IBM Almaden)
  • Louis Woods (now at Apcera)
  • Jens Teubner (now Professor at TU Dortmund)

David Sidler Zsolt Istvan Kaan Kara Muhsen Owaida

SLIDE 3

Data processing today: Appliances; Data Centers (Cloud)

SLIDE 4

What is a database engine?

  • As complex as or more complex than an operating system
  • Full software stack including:
    • Parsers, compilers, optimizers
    • Own resource management (memory, storage, network)
    • Plugins for application logic
    • Infrastructure for distribution, replication, notifications, recovery
    • Extract, Transform, and Load (ETL) infrastructure
  • Large legacy, backward compatibility, standards
  • Hugely optimized
SLIDE 5

Databases are blindingly fast at what they do well

SLIDE 6

From Oracle documentation

Databases = think big

ORACLE EXADATA

SLIDE 7

Database engine trends: Appliances

  • Oracle: T7, SQL in hardware, RAPID
  • SAP: OLTP+OLAP on main memory; Hana on an SGI supercomputer

SAP Hana on SGI UV 300H (SGI documentation)

Nobody ever got fired for using Hadoop on a Cluster

  • A. Rowstron, D. Narayanan, A. Donnelly, G. O’Shea, A. Douglas

HotCDP 2012, Bern, Switzerland

SLIDE 8

SQL on FPGAs

Presentation at HotChips’16 from Baidu http://www.nextplatform.com/2016/08/24/baidu-takes-fpga-approach-accelerating-big-sql/

SLIDE 9

The challenge of hardware acceleration

SLIDE 10

If it sounds too good to be true…

SLIDE 11

Usual unspoken caveats in HW acceleration

  • Where is the data to start with?
  • Where does the data have to be at the end?
  • What happens with irregular workloads?
  • What happens with large intermediate states?
  • What is the architecture?
  • Is the design preventing the system from doing something else?
  • Can the accelerator be multithreaded?
  • Is the gain big enough to justify the additional complexity?
  • Can the gains be characterized?
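Several of these caveats boil down to a simple cost model: offloading only pays off when the time saved on compute exceeds the time spent moving the data to and from the accelerator. A back-of-the-envelope sketch (the function and all the numbers are illustrative, not from the talk):

```python
def offload_pays_off(data_bytes, cpu_time_s, speedup, link_bytes_per_s):
    """True if shipping the work to an accelerator beats staying on the CPU.

    Accelerator time = data in + results out (worst case) + accelerated compute.
    """
    transfer_s = 2 * data_bytes / link_bytes_per_s
    accel_s = cpu_time_s / speedup
    return transfer_s + accel_s < cpu_time_s

# 1 GB over a ~16 GB/s PCIe link with a 10x compute speedup:
# a 1 s CPU job is worth offloading, a 0.1 s job is not.
print(offload_pays_off(1e9, 1.0, 10, 16e9))   # True
print(offload_pays_off(1e9, 0.1, 10, 16e9))   # False
```

The "where is the data" questions on this slide decide `data_bytes` and `link_bytes_per_s`, which is why they dominate the outcome.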
SLIDE 12

Do not replace, enhance
Help the CPU to do what it does not do well

SLIDE 13

Text search in databases

INTEL HARP: This is an experimental system provided by Intel; any results presented are generated using pre-production hardware and software, and may not reflect the performance of production or future systems.

FCCM’16
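FPGA text search engines typically compile the query into a finite-state machine that consumes one input character per clock cycle, regardless of pattern complexity. A software sketch of that streaming structure (sequential here, whereas the hardware evaluates transitions in parallel; the function names are mine):

```python
def next_state(pattern, state, c):
    """Longest prefix of `pattern` that is a suffix of the matched text plus `c`."""
    s = pattern[:state] + c
    for k in range(min(len(s), len(pattern)), -1, -1):
        if k == 0 or s.endswith(pattern[:k]):
            return k

def stream_match(pattern, stream):
    """One state update per character, like one clock cycle in hardware."""
    state, hits = 0, []
    for i, c in enumerate(stream):
        state = next_state(pattern, state, c)
        if state == len(pattern):
            hits.append(i - len(pattern) + 1)  # start offset; overlaps handled
    return hits

print(stream_match("aba", "xababa"))  # [1, 3]
```

Because the per-character work is constant, the hardware version runs at line rate no matter what the pattern is.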

SLIDE 14

100% processing on FPGA

SLIDE 15

Hybrid Processing CPU/FPGA

SLIDE 16

Accelerators to come

From Oracle M7 documentation

SLIDE 17

If the data moves, do it efficiently

Bumps in the wire(s)

SLIDE 18

(Woods, VLDB’14)

IBEX

SLIDE 19

A processor on the data path
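A bump-in-the-wire design like Ibex applies selection and projection on the path between the drive and the host, so only qualifying rows ever cross the bus. A sketch of the idea in plain software (the table, column names, and predicate are made up for illustration):

```python
def bump_in_the_wire(row_stream, predicate, columns):
    """Filter and project rows as they stream by; only survivors move upstream."""
    for row in row_stream:
        if predicate(row):
            yield {c: row[c] for c in columns}

# Hypothetical table: push "price > 100" plus a projection onto the data path,
# so the host only ever sees one small row instead of the whole relation.
rows = [{"id": 1, "price": 50,  "note": "cheap"},
        {"id": 2, "price": 150, "note": "pricey"}]
survivors = list(bump_in_the_wire(rows, lambda r: r["price"] > 100, ["id", "price"]))
print(survivors)  # [{'id': 2, 'price': 150}]
```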

SLIDE 20

Storage to come

  • Recent example: BISCUIT from Samsung (ISCA’16)
  • User-programmable near-data processing for SSDs

From Samsung presentation at ISCA’16 http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/3A-1.pdf
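Near-data processing in the BISCUIT style means shipping a small task to the drive and getting a result back, instead of shipping raw pages to the host. A minimal sketch, with a made-up page layout and column name:

```python
def near_data_sum(pages, column):
    """Run the reduction inside the device and return one number, not raw pages."""
    total = 0
    for page in pages:            # pages never leave the SSD
        for record in page:
            total += record[column]
    return total

# Hypothetical layout: two flash pages of small records.
pages = [[{"amount": 10}, {"amount": 5}], [{"amount": 7}]]
print(near_data_sum(pages, "amount"))  # 22 -> the host receives one integer
```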

SLIDE 21

Sounds good?

The goal is to be able to do this at all levels:

  • Smart storage
  • On the network switch (SDN-like)
  • On the network card (smart NIC)
  • On the PCI Express bus
  • On the memory bus (active memory)

Every element in the system (a node, a computer rack, a cluster) will be a processing component

SLIDE 22

Disaggregated data center

Near Data Computation

SLIDE 23

01-Sep-16

Consensus in a Box (Istvan et al., NSDI’16)

[Figure: Xilinx VC709 Evaluation Board: four SFP+ network ports and 8 GB DRAM attached to an FPGA that stacks networking, atomic broadcast, and a replicated key-value store; SW clients issue reads and writes over TCP, other nodes connect over direct links.]
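The protocol the box runs is leader-based atomic broadcast: the leader ships each write to the followers and commits once a majority holds it. A stripped-down software model of the leader-side logic (class and method names are mine; real Zookeeper-style replication also handles ordering, leader election, and recovery):

```python
class Follower:
    """Stand-in for a replica; a dead one never acknowledges."""
    def __init__(self, alive=True):
        self.alive, self.log = alive, []

    def append(self, entry):
        if self.alive:
            self.log.append(entry)
        return self.alive

def replicate_put(entry, followers):
    """Leader side of one round: commit once a majority of the group holds the entry."""
    acks = 1  # the leader counts itself
    for f in followers:
        acks += f.append(entry)
    majority = (len(followers) + 1) // 2 + 1
    return acks >= majority

# A 3-node group tolerates one failed follower, but not two.
print(replicate_put(("k", "v"), [Follower(True), Follower(False)]))   # True
print(replicate_put(("k", "v"), [Follower(False), Follower(False)]))  # False
```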

SLIDE 24
The system

  • Drop-in replacement for memcached with Zookeeper’s replication
  • Standard tools for benchmarking (libmemcached)
  • Simulating 100s of clients
  • Communication over TCP/IP
  • Communication over direct connections
  • + Leader election
  • + Recovery

[Figure: 12 clients behind a 10 Gbps switch connected to a 3-FPGA cluster.]

SLIDE 25


Latency of puts in a KVS

  • Consensus: 15-35 μs
  • Memaslap client (ixgbe) over TCP / 10 Gbps Ethernet: ~10 μs
  • Direct connections: ~3 μs

SLIDE 26

[Figure: throughput (consensus rounds/s, log scale) vs. consensus latency (μs, log scale) for FPGA (Direct), FPGA (TCP), DARE* (Infiniband), Libpaxos (TCP), Etcd (TCP), and Zookeeper (TCP).]

The benefit of specialization…

Specialized solutions outperform general-purpose solutions by 10-100x.

[1] Dragojevic et al. FaRM: Fast Remote Memory. NSDI’14.
[2] Poke et al. DARE: High-Performance State Machine Replication on RDMA Networks. HPDC’15.
* We extrapolated from the 5-node setup to a 3-node setup.

SLIDE 27

This is the end …

Most exciting time to be in research. Many opportunities at all levels and in all areas. FPGAs are great tools to:

  • Explore parallelism
  • Explore new architectures
  • Explore Software Defined X/Y/Z
  • Prototype accelerators

SLIDE 28

FPGAs: the view from an outsider

SLIDE 29

Difficulty to program

  • FPGAs are no more difficult to program than system software (OS, databases, infrastructure, etc.)
  • Only a handful of programmers can do system software; my guess is that system programmers are not many more than the people who can program FPGAs
  • But FPGAs have no tools to enhance productivity, especially no freely available tools (GCC, instrumentation, libraries, open source tools…)

SLIDE 30

CS vs EE

  • EE = understand parallelism
  • CS = understand abstraction

You need both (and these days a lot more: systems, algorithms, machine learning, data center architecture, …)

SLIDE 31

Complete systems

  • The proof of something that makes a difference is an end-to-end argument
  • Showing that something is faster when running on an FPGA does not mean it will be faster when hooked into a real system (example: GPUs)