Microkernels in the Era of Data-Centric Computing
  Martin Děcký
  martin.decky@huawei.com
  February 2018
Who Am I
  Passionate programmer and operating systems enthusiast
    With a specific inclination towards multiserver microkernels
  HelenOS developer since 2004
  Research Scientist since 2006
    Charles University (Prague), Distributed Systems Research Group
  Senior Research Engineer since 2017
    Huawei Technologies (Munich), Central Software Institute
  Martin Děcký, FOSDEM 2018, February 3rd 2018, Microkernels in the Era of Data-Centric Computing
Motivation
Memory Barrier
LaMarca, Ladner (1996)
  Quick Sort: O(n × log n) operations
  Radix Sort: O(n) operations
  [Figure: instructions per item (axis 0 to 1200) against thousands of items (4 to 4096), for Quick Sort and Radix Sort]
LaMarca, Ladner (1996)
  Quick Sort: O(n × log n) operations
  Radix Sort: O(n) operations
  [Figure: clock cycles per item (axis 0 to 2000) against thousands of items (4 to 4096), for Quick Sort and Radix Sort]
LaMarca, Ladner (1996)
  Quick Sort: O(n × log n) operations
  Radix Sort: O(n) operations
  [Figure: cache misses per item (axis 0 to 5) against thousands of items (4 to 4096), for Quick Sort and Radix Sort]
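The radix sort in this comparison can be sketched roughly as follows. This is a minimal least-significant-digit variant for non-negative integers; the slides do not say which variant LaMarca and Ladner measured, so the 8-bit digit width, the 32-bit key width and the name `radix_sort_lsd` are assumptions made for illustration only.

```python
def radix_sort_lsd(values, key_bits=32, digit_bits=8):
    """Sort non-negative integers below 2**key_bits (assumed key width)."""
    mask = (1 << digit_bits) - 1
    items = list(values)
    for shift in range(0, key_bits, digit_bits):
        # Counting pass: histogram of the current digit.
        counts = [0] * (mask + 1)
        for v in items:
            counts[(v >> shift) & mask] += 1
        # Prefix sums turn the histogram into bucket start offsets.
        offsets = [0] * (mask + 1)
        total = 0
        for d in range(mask + 1):
            offsets[d] = total
            total += counts[d]
        # Scatter pass: stable placement into the output array.
        out = [0] * len(items)
        for v in items:
            d = (v >> shift) & mask
            out[offsets[d]] = v
            offsets[d] += 1
        items = out
    return items


sample = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_sample = radix_sort_lsd(sample)
```

For a fixed key width each pass touches every item a constant number of times, so the operation count grows linearly with n. The scatter pass, however, writes to locations spread across the whole output array, an access pattern that is typically less cache-friendly than the sequential partitioning in quicksort.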
The Myth of RAM
  Accessing a random memory location requires O(1) operations
  Accessing a random memory location takes O(1) time units
The Myth of RAM
  Accessing a random memory location requires O(1) operations
  Accessing a random memory location takes O(√n) time units
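To get a feel for what the O(√n) claim implies, the sketch below evaluates the model for a few working-set sizes. It is only the formula from the slide turned into code, not a measurement; the function name and the 32 KiB reference size are hypothetical choices.

```python
import math


def relative_access_cost(n_bytes, base_bytes=32 * 1024):
    """Relative cost of one random access under the O(sqrt(n)) model.

    base_bytes is an assumed reference working-set size with cost 1.0.
    """
    return math.sqrt(n_bytes / base_bytes)


# Working sets of 32 KiB, 2 MiB, 128 MiB and 8 GiB (illustrative sizes).
for size in (32 * 1024, 2 * 1024**2, 128 * 1024**2, 8 * 1024**3):
    print(f"{size:>12} bytes -> {relative_access_cost(size):6.1f}x")
```

Under this model, quadrupling the working set doubles the cost of a random access, so an algorithm with fewer operations but more scattered accesses can lose to one with more operations and better locality.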
Breaking the Barrier
Von Neumann Forever?
  [Diagram: von Neumann architecture; blocks: RAM, ALU, Controller, Input peripheral, Output peripheral; signals: Data, Control, Status]
Von Neumann Forever?
  New emerging memory technologies
    Bridging the gap between volatile and non-volatile memory
    No longer necessary to keep the distinction between RAM and storage (peripherals)
    Single-level memory (universal memory)
      See also the talk by Liam Proven (The circuit less traveled), Janson, Sat 13:00
  Many technologies in development
    Magnetoresistive random-access memory (MRAM)
    Racetrack memory
    Ferroelectric random-access memory (FRAM)
    Phase-change memory (PCM)
    Nano-RAM (Nanotube RAM)
Less Radical Solution
  Near-Data Processing
    Moving the computation closer to the place where the data is
    Not a completely new idea at all
      Spatial locality in general
      GPUs processing graphics data locally
  Breaking the monopoly of CPU on data processing even more
    CPUs are fast, but also power-hungry
    CPUs can only process the data already fetched from the memory/storage
    The more data we avoid moving from the memory/storage to the CPU, the more efficiently the CPU runs
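The data-movement argument can be illustrated with a toy model that counts the bytes crossing the bus in both cases. `ToyStorage`, its methods and the sample records are hypothetical; no real device exposes this interface.

```python
class ToyStorage:
    """Hypothetical storage device that counts bytes sent to the host."""

    def __init__(self, records):
        self.records = list(records)  # each record is a bytes object
        self.bytes_to_host = 0

    def read_all(self):
        # Conventional path: every record crosses the bus.
        for rec in self.records:
            self.bytes_to_host += len(rec)
            yield rec

    def read_filtered(self, predicate):
        # Near-data path: the predicate runs inside the device,
        # so only matching records cross the bus.
        for rec in self.records:
            if predicate(rec):
                self.bytes_to_host += len(rec)
                yield rec


def wanted(rec):
    return rec.startswith(b"ERR")


records = [b"ERR disk", b"OK boot", b"OK net", b"ERR fan"] * 1000

host_side = ToyStorage(records)
hits_host = [r for r in host_side.read_all() if wanted(r)]

near_data = ToyStorage(records)
hits_ndp = list(near_data.read_filtered(wanted))
```

Both paths return the same records, but `near_data.bytes_to_host` only accounts for the matches, while `host_side.bytes_to_host` accounts for the whole data set.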
Near-Data Processing Benefits
  Decreased latency, increased throughput
    Not necessarily on an unloaded system, but improvement under load
  [1] Gu B., Yoon A. S., Bae D.-H., Jo I., Lee J., Yoon J., Kang J.-U., Kwon M., Yoon C., Cho S., Jeong J., Chang D.: Biscuit: A Framework for Near-Data Processing of Big Data Workloads, in Proceedings of 43rd Annual International Symposium on Computer Architecture, ACM/IEEE, 2016
Near-Data Processing Benefits (2)
  Decreased energy consumption
  [2] Kim S., Oh H., Park C., Cho S., Lee S.-W.: Fast, Energy Efficient Scan inside Flash Memory SSDs, in Proceedings of 37th International Conference on Very Large Data Bases (VLDB), VLDB Endowment, 2011
In-Memory Near-Data Processing
  Adding computational capability to DRAM cells
    Simple logical comparators/operators that could be computed in parallel on individual words
      Filtering based on bitwise pattern
      Bitwise operations
  Making use of the inherent parallelism
    Avoiding moving unnecessary data out of DRAM
    Avoiding linear processing of independent words of the data in CPU
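A software analogue of filtering on a bitwise pattern might look like the sketch below. The function names and the mask/pattern encoding are assumptions; the slides do not specify how a pattern would be presented to the memory.

```python
def filter_words(words, mask, pattern):
    """Return indices of words whose masked bits equal the pattern."""
    return [i for i, w in enumerate(words) if (w & mask) == pattern]


def and_words(words, operand):
    """Apply one bitwise operation (AND) to every word independently."""
    return [w & operand for w in words]


row = [0x12340001, 0xABCD0002, 0x12340003, 0x00000004]

# Select the words whose upper 16 bits are 0x1234.
matches = filter_words(row, mask=0xFFFF0000, pattern=0x12340000)

# Keep only the lower 16 bits of every word.
low_halves = and_words(row, 0x0000FFFF)
```

The Python loops visit the words one after another. The point of doing this inside the DRAM is that each word is independent of the others, so the comparisons could run concurrently and only the matching words would need to leave the memory.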
Dynamic RAM
  [Diagram: DRAM organization; inputs: address, RAS, CAS, WE; blocks: control logic, row decoder, memory matrix, sense amps, Y-gating; output: data]
Dynamic RAM with NDP
  [Diagram: DRAM organization with NDP; inputs: address, RAS, CAS, WE, opcode; blocks: control logic, row decoder, memory matrix, sense amps, Filtering / Computing; output: data]
In-Storage Near-Data Processing
  Adding computational capability to SSD controllers
    Again, making use of inherent parallelism of flash memory
    But SSD controllers also contain powerful embedded cores
      Flash Translation Layer, garbage collection, wear leveling
    Thus computation is not limited to simple bitwise filtering and operations
    Complex trade-offs between computing on the primary CPU and offloading to the SSD controller
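One of these trade-offs can be made concrete with a deliberately crude break-even estimate. Everything here is an assumption for illustration: the function names, the parameters, and the model itself, which ignores pipelining, energy, and the controller cycles already spent on the Flash Translation Layer and garbage collection.

```python
def host_time(data_bytes, link_bw, host_rate):
    """Seconds to ship all data to the host and process it there."""
    return data_bytes / link_bw + data_bytes / host_rate


def offload_time(data_bytes, selectivity, link_bw, ssd_rate):
    """Seconds to process on the SSD controller and ship only the results."""
    return data_bytes / ssd_rate + (data_bytes * selectivity) / link_bw


def should_offload(data_bytes, selectivity, link_bw, host_rate, ssd_rate):
    """True if the modelled offload path is faster than the host path."""
    on_ssd = offload_time(data_bytes, selectivity, link_bw, ssd_rate)
    on_host = host_time(data_bytes, link_bw, host_rate)
    return on_ssd < on_host


# Selective filter: 1 % of the data survives, embedded cores half as fast.
selective = should_offload(1e9, 0.01, link_bw=1e9, host_rate=4e9, ssd_rate=2e9)

# Nothing is filtered out and the embedded cores are much slower.
unselective = should_offload(1e9, 1.0, link_bw=1e9, host_rate=4e9, ssd_rate=0.5e9)
```

In this model offloading pays off when the operation discards most of the data and the embedded cores are not too much slower than the host, and it loses when the result is as large as the input.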
Our Prototype
  Based on OpenSSD
    http://openssd.io/
    Open source (GPL) NVMe SSD controller
    Hanyang University (Seoul), Embedded and Network Computing Lab
  FPGA design for Xilinx Zynq-7000
    ONFI NAND flash controller with ECC engine
    PCI-e NVMe host interface with scatter-gather DMA engine
  Controller firmware
    ARMv7
    Flash Translation Layer, page caching, greedy garbage collection
Our Prototype (2)
  NVMe NDP extensions
    NDP module deployment
      Static native code so far, moving to safe byte-code (eBPF)
    NDP datasets
      Safety boundaries for the NDP modules (for multitenancy, etc.)
    NDP Read / Write
      Extensions of the standard NVMe Read / Write commands
      NDP module executed on each block, transforms/filters data
    NDP Transform
      Arbitrary data transformations (in-place copying, etc.)
      Flow-based computational model
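The semantics of an NDP Read, as far as the slide describes them, can be mimicked in a few lines. This is not the prototype's interface or the NVMe command encoding, neither of which is given here. `NdpDataset`, `ToyNdpDevice`, `ndp_read` and the module signature are hypothetical, and a plain Python callable stands in for the native or eBPF module.

```python
class NdpDataset:
    """Hypothetical safety boundary: a contiguous range of block addresses."""

    def __init__(self, first_lba, block_count):
        self.first_lba = first_lba
        self.block_count = block_count

    def contains(self, lba, count):
        end = self.first_lba + self.block_count
        return lba >= self.first_lba and lba + count <= end


class ToyNdpDevice:
    """Hypothetical device; blocks is a list of bytes objects indexed by LBA."""

    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, lba, count):
        # Standard read: all requested blocks are returned unchanged.
        return self.blocks[lba:lba + count]

    def ndp_read(self, dataset, module, lba, count):
        # The module may only touch blocks inside its dataset.
        if not dataset.contains(lba, count):
            raise PermissionError("request outside the NDP dataset")
        results = []
        for block in self.blocks[lba:lba + count]:
            out = module(block)      # executed on each block
            if out is not None:      # None means "filtered out"
                results.append(out)
        return results


def keep_nonzero(block):
    """Example module: drop blocks that contain only zero bytes."""
    return block if any(block) else None


dev = ToyNdpDevice([b"\x00" * 4, b"abcd", b"\x00" * 4, b"wxyz"])
kept = dev.ndp_read(NdpDataset(0, 4), keep_nonzero, lba=0, count=4)
```

The dataset check models the safety boundary: a module deployed for one tenant cannot be pointed at blocks outside the range it was granted.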
Our Prototype (3)
  Fast prototyping
    QEMU model of the OpenSSD hardware (for running the OpenSSD firmware on ARMv7)
    Connected to a second host QEMU/KVM (as a regular PCI-e NVMe storage device)
  Planned evaluation
    Real-world application proof-of-concept
      Custom storage engine for MySQL with operator push-down
      Ceph operator push-down
      Key-value store
      Generic file system acceleration
  Still very much a work in progress
    Preliminary results quite promising
How Do Microkernels Fit into This?
Future Vision
  Not just near-data processing, but data-centric computing
    As opposed to CPU-centric computing
    Running the computation dynamically where it is the most efficient
      Not necessarily moving the data to the central processing unit
      The CPU is the orchestrator
  Massively distributed systems
    Within the box of your machine (desktop, laptop, smartphone)
    Outside your machine (edge cloud, fog, cloud, data center)
  Massively heterogeneous systems
    Different ISAs
    Fully programmable, partially programmable, fixed-function
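The orchestrator role could be pictured as a placement decision over heterogeneous units. The sketch below is one possible reading of that idea, not a design from the talk; `Unit`, `place`, the operation names and all cost figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    supported_ops: frozenset    # a fixed-function unit supports few operations
    cost_per_byte: float        # assumed processing cost on this unit
    move_cost_per_byte: float   # assumed cost of bringing the data to this unit


def place(op, data_bytes, units):
    """Pick the cheapest unit that is able to run the operation at all."""
    candidates = [u for u in units if op in u.supported_ops]
    if not candidates:
        raise ValueError(f"no unit can run {op!r}")
    return min(
        candidates,
        key=lambda u: data_bytes * (u.cost_per_byte + u.move_cost_per_byte),
    )


units = [
    Unit("cpu", frozenset({"filter", "sort"}), 1.0, 4.0),
    Unit("ssd-controller", frozenset({"filter"}), 3.0, 0.0),
]

filter_unit = place("filter", 1 << 20, units)
sort_unit = place("sort", 1 << 20, units)
```

With these made-up numbers the filter runs on the SSD controller, because avoiding the data movement outweighs the slower cores, while the sort falls back to the CPU, the only unit that supports it.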