Microkernels in the Era of Data-Centric Computing



  1. Microkernels in the Era of Data-Centric Computing. Martin Děcký, martin.decky@huawei.com, February 2018

  2. Who Am I
     - Passionate programmer and operating systems enthusiast, with a specific inclination towards multiserver microkernels
     - HelenOS developer since 2004
     - Research Scientist since 2006: Charles University (Prague), Distributed Systems Research Group
     - Senior Research Engineer since 2017: Huawei Technologies (Munich), Central Software Institute
     Martin Děcký, FOSDEM 2018, February 3rd 2018, Microkernels in the Era of Data-Centric Computing

  3. Motivation

  4. Memory Barrier

  5. LaMarca, Ladner (1996): Quick Sort, O(n × log n) operations, versus Radix Sort, O(n) operations. [Chart: instructions per item (axis 0 to 1200) against thousands of items (4 to 4096), one curve per algorithm]

  6. LaMarca, Ladner (1996): Quick Sort, O(n × log n) operations, versus Radix Sort, O(n) operations. [Chart: clock cycles per item (axis 0 to 2000) against thousands of items (4 to 4096), one curve per algorithm]

  7. LaMarca, Ladner (1996): Quick Sort, O(n × log n) operations, versus Radix Sort, O(n) operations. [Chart: cache misses per item (axis 0 to 5) against thousands of items (4 to 4096), one curve per algorithm]
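
To make the comparison on the three LaMarca and Ladner slides concrete, here is a minimal sketch of a least-significant-digit radix sort. The function name `radix_sort_lsd`, the 32-bit key width and the 8-bit digit are assumptions chosen for illustration, not details taken from the talk or the paper. The comments mark where the memory access pattern becomes scattered, which is the aspect that an operation count alone does not capture.

```python
def radix_sort_lsd(values, key_bits=32, digit_bits=8):
    """Sort non-negative integers below 2**key_bits with LSD radix sort.

    Performs key_bits / digit_bits passes, each linear in len(values).
    """
    radix = 1 << digit_bits
    mask = radix - 1
    items = list(values)
    for shift in range(0, key_bits, digit_bits):
        # Counting pass: one sequential read of the input.
        counts = [0] * radix
        for v in items:
            counts[(v >> shift) & mask] += 1
        # Prefix sums turn per-digit counts into bucket start offsets.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Scatter pass: consecutive writes land in up to `radix` different
        # regions of the output, so locality is poor for large inputs.
        output = [0] * len(items)
        for v in items:
            d = (v >> shift) & mask
            output[offsets[d]] = v
            offsets[d] += 1
        items = output
    return items


sample = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_sample = radix_sort_lsd(sample)
```

A Python sketch like this only illustrates the access pattern. Measuring instructions, clock cycles and cache misses per item, as the charts do, would need a compiled implementation and hardware performance counters.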

  8. The Myth of RAM
     - Accessing a random memory location requires O(1) operations
     - Accessing a random memory location takes O(1) time units

  9. The Myth of RAM
     - Accessing a random memory location requires O(1) operations
     - Accessing a random memory location takes O(1) time units

  10. The Myth of RAM
     - Accessing a random memory location requires O(1) operations
     - Accessing a random memory location takes O(√n) time units
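
The difference between the two access-time models on the "Myth of RAM" slides can be sketched as a small cost model. The function names and the unit constant factor are assumptions for illustration only. The model counts abstract time units, not measured latencies.

```python
import math


def access_cost_flat(n):
    """Idealized RAM model: one time unit per access, whatever the data size."""
    return 1.0


def access_cost_sqrt(n):
    """Model from the slides: a random access into n items takes O(sqrt(n)).

    The constant factor is assumed to be 1 for illustration.
    """
    return math.sqrt(n)


def random_touch_cost(n, cost):
    """Modeled time to touch each of n items once in random order."""
    return n * cost(n)


n = 1_000_000
flat_total = random_touch_cost(n, access_cost_flat)
sqrt_total = random_touch_cost(n, access_cost_sqrt)
# Quadrupling the data multiplies the modeled time by 4 under the flat
# model, but by 8 under the square-root model.
growth_flat = random_touch_cost(4 * n, access_cost_flat) / flat_total
growth_sqrt = random_touch_cost(4 * n, access_cost_sqrt) / sqrt_total
```

Under the square-root model, an algorithm that performs n random accesses costs on the order of n^1.5 time units, even though its operation count stays linear.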

  11. Breaking the Barrier

  12. Von Neumann Forever? [Diagram labels: RAM, ALU, Controller, Input peripheral, Output peripheral; signals: Data, Control, Status]

  13. Von Neumann Forever?
     - New emerging memory technologies
     - Bridging the gap between volatile and non-volatile memory
     - No longer necessary to keep the distinction between RAM and storage (peripherals)
     - Single-level memory (universal memory)
     - See also the talk by Liam Proven (The circuit less traveled), Janson, Sat 13:00
     - Many technologies in development: magnetoresistive random-access memory (MRAM), racetrack memory, ferroelectric random-access memory (FRAM), phase-change memory (PCM), Nano-RAM (nanotube RAM)

  14. Less Radical Solution: Near-Data Processing
     - Moving the computation closer to the place where the data is
     - Not a completely new idea at all: spatial locality in general, GPUs processing graphics data locally
     - Breaking the monopoly of the CPU on data processing even more
     - CPUs are fast, but also power-hungry
     - CPUs can only process the data already fetched from the memory/storage
     - The more data we avoid moving from the memory/storage to the CPU, the more efficiently the CPU runs

  15. Near-Data Processing Benefits
     - Decreased latency, increased throughput
     - Not necessarily on an unloaded system, but improvement under load
     [1] Gu B., Yoon A. S., Bae D.-H., Jo I., Lee J., Yoon J., Kang J.-U., Kwon M., Yoon C., Cho S., Jeong J., Chang D.: Biscuit: A Framework for Near-Data Processing of Big Data Workloads, in Proceedings of 43rd Annual International Symposium on Computer Architecture, ACM/IEEE, 2016

  16. Near-Data Processing Benefits (2)
     - Decreased energy consumption
     [2] Kim S., Oh H., Park C., Cho S., Lee S.-W.: Fast, Energy Efficient Scan inside Flash Memory SSDs, in Proceedings of 37th International Conference on Very Large Data Bases (VLDB), VLDB Endowment, 2011

  17. In-Memory Near-Data Processing
     - Adding computational capability to DRAM cells
     - Simple logical comparators/operators that could be computed in parallel on individual words: filtering based on bitwise pattern, bitwise operations
     - Making use of the inherent parallelism
     - Avoiding moving unnecessary data out of DRAM
     - Avoiding linear processing of independent words of the data in CPU

  18. Dynamic RAM [Diagram labels: address, row decoder, memory matrix, sense amps, Y-gating, control logic (RAS, CAS, WE), data]

  19. Dynamic RAM with NDP [Diagram labels: address, row decoder, memory matrix, sense amps, Filtering / Computing (in place of Y-gating), control logic (RAS, CAS, WE, opcode), data]
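
The bitwise-pattern filtering described for in-memory near-data processing can be modeled in software as follows. This is only a sketch under stated assumptions: the function name `ndp_filter_row`, the 16-bit example words and the mask/pattern encoding are hypothetical and do not describe a real DRAM command set. In hardware the comparisons could run in parallel on all words of a row, while this model simply loops.

```python
def ndp_filter_row(row_words, mask, pattern):
    """Model of an in-memory filter over one DRAM row.

    Keeps only the words whose bits selected by `mask` equal `pattern`.
    """
    return [w for w in row_words if (w & mask) == pattern]


# One modeled row of eight 16-bit words.
row = [0x1A00, 0x2B01, 0x1A7F, 0x3C10, 0x1AFF, 0x0000, 0x1A01, 0x4D22]

# Select the words whose upper byte is 0x1A.
matches = ndp_filter_row(row, mask=0xFF00, pattern=0x1A00)

# Without NDP the CPU has to fetch every word of the row and filter it
# itself. With NDP only the matching words leave the memory.
words_moved_without_ndp = len(row)
words_moved_with_ndp = len(matches)
```

In this example the filter cuts the number of words crossing the memory interface from eight to four. The actual saving depends on the selectivity of the pattern.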

  20. In-Storage Near-Data Processing
     - Adding computational capability to SSD controllers
     - Again, making use of the inherent parallelism of flash memory
     - But SSD controllers also contain powerful embedded cores: Flash Translation Layer, garbage collection, wear leveling
     - Thus computation is not limited to simple bitwise filtering and operations
     - Complex trade-offs between computing on the primary CPU and offloading to the SSD controller

  21. Our Prototype
     - Based on OpenSSD, http://openssd.io/
     - Open source (GPL) NVMe SSD controller
     - Hanyang University (Seoul), Embedded and Network Computing Lab
     - FPGA design for Xilinx Zynq-7000: ONFI NAND flash controller with ECC engine, PCI-e NVMe host interface with scatter-gather DMA engine
     - Controller firmware on ARMv7: Flash Translation Layer, page caching, greedy garbage collection

  22. Our Prototype (2): NVMe NDP extensions
     - NDP module deployment: static native code so far, moving to safe byte-code (eBPF)
     - NDP datasets: safety boundaries for the NDP modules (for multitenancy, etc.)
     - NDP Read / Write: extensions of the standard NVMe Read / Write commands; the NDP module is executed on each block and transforms/filters the data
     - NDP Transform: arbitrary data transformations (in-place copying, etc.)
     - Flow-based computational model
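
The NDP Read idea from the prototype, where a deployed module runs on each block and transforms or filters it before the data reaches the host, can be sketched as a host-side simulation. Everything here is a hypothetical model: `ndp_read`, `plain_read`, `keep_lines_with`, the dictionary standing in for the device and the 4096-byte block size are assumptions for illustration, not the prototype's interface or any NVMe command encoding.

```python
from typing import Callable, Dict

BLOCK_SIZE = 4096  # assumed logical block size in bytes

# A module maps the bytes of one block to the bytes returned to the host.
NdpModule = Callable[[bytes], bytes]


def plain_read(device: Dict[int, bytes], lba: int, count: int) -> bytes:
    """Standard read: every block travels to the host unchanged."""
    return b"".join(device[lba + i] for i in range(count))


def ndp_read(device: Dict[int, bytes], lba: int, count: int,
             module: NdpModule) -> bytes:
    """Modeled NDP Read: the module runs on each block inside the device."""
    return b"".join(module(device[lba + i]) for i in range(count))


def keep_lines_with(needle: bytes) -> NdpModule:
    """Build a filter module that keeps only the lines containing `needle`."""
    def module(block: bytes) -> bytes:
        lines = block.rstrip(b"\0").split(b"\n")
        return b"".join(line + b"\n" for line in lines if needle in line)
    return module


def make_block(text: bytes) -> bytes:
    """Pad text with zero bytes up to one full block."""
    return text.ljust(BLOCK_SIZE, b"\0")


device = {
    0: make_block(b"id=1 status=ok\nid=2 status=error\n"),
    1: make_block(b"id=3 status=ok\nid=4 status=ok\n"),
    2: make_block(b"id=5 status=error\n"),
}

full = plain_read(device, 0, 3)
filtered = ndp_read(device, 0, 3, keep_lines_with(b"status=error"))
```

The per-block model keeps the sketch simple but ignores records that span a block boundary. A real module would have to handle that case, for example by carrying state between blocks.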

  23. Our Prototype (3)
     - Fast prototyping: QEMU model of the OpenSSD hardware (for running the OpenSSD firmware on ARMv7), connected to a second host QEMU/KVM (as a regular PCI-e NVMe storage device)
     - Planned evaluation: real-world application proof-of-concept
     - Custom storage engine for MySQL with operator push-down
     - Ceph operator push-down
     - Key-value store
     - Generic file system acceleration
     - Still very much a work in progress
     - Preliminary results quite promising

  24. How Microkernels Fit into This?

  25. Future Vision
     - Not just near-data processing, but data-centric computing, as opposed to CPU-centric computing
     - Running the computation dynamically where it is the most efficient
     - Not necessarily moving the data to the central processing unit
     - The CPU is the orchestrator
     - Massively distributed systems: within the box of your machine (desktop, laptop, smartphone) and outside your machine (edge cloud, fog, cloud, data center)
     - Massively heterogeneous systems: different ISAs; fully programmable, partially programmable, fixed-function
