SLIDE 1

GPUfs: Integrating a file system with GPUs

Mark Silberstein (UT Austin/Technion), Bryan Ford (Yale), Idit Keidar (Technion), Emmett Witchel (UT Austin)

SLIDE 2

Traditional System Architecture

Diagram: Applications → OS → CPU.

SLIDE 3

Modern System Architecture

Diagram: accelerated applications now run across manycore processors, FPGAs, hybrid CPU-GPUs, and GPUs, while the OS still runs only on the CPU.

SLIDE 4

Software-hardware gap is widening

Diagram: the same picture: the OS covers only the CPU while the accelerators (manycore processors, FPGAs, hybrid CPU-GPUs, GPUs) multiply.

SLIDE 5

Software-hardware gap is widening

Diagram: the same picture; accelerators are managed through ad-hoc abstractions and management mechanisms.

SLIDE 6

On-accelerator OS support closes the programmability gap

Diagram: on-accelerator OS support, coordinating with the host OS, lets native accelerator applications run on manycore processors, FPGAs, hybrid CPU-GPUs, and GPUs.

SLIDE 7

GPUfs: File I/O support for GPUs

  • Motivation
  • Goals
  • Understanding the hardware
  • Design
  • Implementation
  • Evaluation
SLIDE 8

Building systems with GPUs is hard. Why?

SLIDE 9

Goal of GPU programming frameworks

Diagram: the framework mediates data transfers, GPU invocation, and memory management between the CPU and the parallel algorithm on the GPU.

SLIDE 10

Headache for GPU programmers

Diagram: the parallel algorithm runs on the GPU, but the data transfer, invocation, and memory management code stays on the CPU.

Half of the CUDA SDK 4.1 samples: at least 9 CPU LOC per 1 GPU LOC

SLIDE 11

GPU kernels are isolated

Diagram: the parallel algorithm on the GPU is walled off from the CPU-side data transfer, invocation, and memory management code.

SLIDE 12

Example: accelerating photo collage

http://www.codeproject.com/Articles/36347/Face-Collage

    while (Unhappy()) {
        Read_next_image_file();
        Decide_placement();
        Remove_outliers();
    }

SLIDE 13

CPU Implementation

Diagram: the application runs across several CPUs.

    while (Unhappy()) {
        Read_next_image_file();
        Decide_placement();
        Remove_outliers();
    }

SLIDE 14

Offloading computations to GPU

Diagram: the application on the CPUs, with the compute loop marked "Move to GPU":

    while (Unhappy()) {
        Read_next_image_file();
        Decide_placement();
        Remove_outliers();
    }

SLIDE 15

Offloading computations to GPU

Diagram: CPU-GPU timeline of kernel start, data transfer, and kernel termination.

Co-processor programming model

SLIDE 16

Kernel start/stop overheads

Diagram: CPU-GPU timeline: copy to GPU, invoke, copy to CPU. Overheads: invocation latency, synchronization, cache flush.

SLIDE 17

Hiding the overheads

Diagram: the same timeline, with transfers overlapped: manual data reuse management, asynchronous invocation, double buffering (sketched below).
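To make "double buffering" concrete, here is a minimal CUDA sketch of the pattern this slide alludes to; the process_chunk kernel, chunk size, and buffer handling are hypothetical, not code from the talk:

    // Double buffering with two CUDA streams: while stream b copies and
    // processes chunk i, the other stream works on chunk i+1.
    // NOTE: host should be pinned memory (cudaHostAlloc) for real overlap.
    #include <cuda_runtime.h>

    __global__ void process_chunk(float *data, size_t n) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;               // placeholder computation
    }

    void run(const float *host, size_t total, size_t chunk) {
        float *dev[2];
        cudaStream_t s[2];
        for (int b = 0; b < 2; b++) {
            cudaMalloc(&dev[b], chunk * sizeof(float));
            cudaStreamCreate(&s[b]);
        }
        for (size_t off = 0, b = 0; off < total; off += chunk, b ^= 1) {
            size_t n = (total - off < chunk) ? total - off : chunk;
            cudaMemcpyAsync(dev[b], host + off, n * sizeof(float),
                            cudaMemcpyHostToDevice, s[b]);
            process_chunk<<<(unsigned)((n + 255) / 256), 256, 0, s[b]>>>(dev[b], n);
        }
        cudaDeviceSynchronize();
        for (int b = 0; b < 2; b++) {
            cudaFree(dev[b]);
            cudaStreamDestroy(s[b]);
        }
    }

Even this simplified version shows how much host-side machinery the pattern demands, which is exactly the slide's point.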

SLIDE 18

Implementation complexity

Diagram: the same overlapped timeline (manual data reuse management, asynchronous invocation, double buffering).

Management overhead

SLIDE 19

Implementation complexity

Diagram: the same overlapped timeline (manual data reuse management, asynchronous invocation, double buffering).

Why do we need to deal with low-level system details?

Management overhead

SLIDE 20

The reason is...

GPUs are peer processors. They need OS services such as file I/O.

SLIDE 21

GPUfs: application view

Diagram: CPUs and GPU1-GPU3 all access the host file system through GPUfs. One GPU calls open("shared_file") and mmap(); another calls open("shared_file") and write().

SLIDE 22

GPUfs: application view

Diagram: as on the previous slide, CPUs and GPUs share files through GPUfs over the host file system.

System-wide shared namespace
Persistent storage
POSIX (CPU)-like API
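A minimal sketch of what this view means in code, using the gopen/gread names and argument order from the deck's later slides; the file name, sizes, and device-side details are illustrative assumptions:

    // CPU side: plain POSIX against the shared namespace.
    //   int fd = open("shared_file", O_RDWR);
    //   write(fd, buf, len);  close(fd);

    // GPU side (hypothetical kernel): same file, same namespace.
    __global__ void consumer(float *out) {
        int fd = gopen("shared_file", O_GRDWR);
        float chunk[8];
        size_t offset = (size_t)blockIdx.x * sizeof(chunk);
        gread(fd, sizeof(chunk), chunk, offset);   // (fd, size, buf, offset), as on the sqrt slide
        out[blockIdx.x] = chunk[0];                // placeholder use of the data
        gclose(fd);
    }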

SLIDE 23

Accelerating collage app with GPUfs

Diagram: the GPU opens and reads image files directly through GPUfs.

No CPU management code

SLIDE 24

Accelerating collage app with GPUfs

Diagram: GPUfs buffer caches on CPU and GPU overlap computations with transfers and provide read-ahead.

SLIDE 25

Accelerating collage app with GPUfs

Diagram: the GPUfs buffer cache also captures data reuse and serves random data access.

SLIDE 26

Challenge

GPU ≠ CPU

SLIDE 27

Massive parallelism

NVIDIA Fermi*: 23,000 active threads. AMD HD5870*: 31,000 active threads.

*From M. Houston/A. Lefohn/K. Fatahalian, "A trip through the architecture of modern GPUs"

Parallelism is essential for performance in deeply multi-threaded wide-vector hardware

SLIDE 28

Heterogeneous memory

Diagram: CPU memory (10-32 GB/s) and GPU memory (288-360 GB/s) linked by a 6-16 GB/s interconnect, roughly a 20x bandwidth gap.

GPUs inherently impose high bandwidth demands on memory

SLIDE 29

How to build an FS layer on this hardware?
SLIDE 30

GPUfs: principled redesign of the whole file system stack

  • Relaxed FS API semantics for parallelism
  • Relaxed FS consistency for heterogeneous memory
  • GPU-specific implementation of synchronization primitives, lock-free data structures, memory allocation, … (see the sketch below)
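To give the last bullet some substance, a minimal sketch of a GPU-side spinlock built on CUDA atomics; this illustrates the kind of primitive GPUfs must provide, not GPUfs's actual implementation:

    // One thread per block takes the lock for its whole SIMD group,
    // avoiding divergent spinning inside a warp.
    __device__ int lock_word = 0;

    __device__ void block_lock() {
        if (threadIdx.x == 0)
            while (atomicCAS(&lock_word, 0, 1) != 0) { /* spin */ }
        __syncthreads();              // group enters only once the lock is held
    }

    __device__ void block_unlock() {
        __syncthreads();              // group leaves the critical section together
        if (threadIdx.x == 0) {
            __threadfence();          // publish writes made under the lock
            atomicExch(&lock_word, 0);
        }
    }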

SLIDE 31

GPUfs high-level design

Diagram: on the CPU side, unchanged applications use the OS file API over the host file system, with GPUfs hooks at the OS file-system interface and the disk below; on the GPU side, applications use the GPUfs file API through the GPUfs GPU file I/O library. A GPUfs distributed buffer cache spans CPU memory and a page cache in GPU memory. Design drivers: massive parallelism, heterogeneous memory.

SLIDE 32

GPUfs high-level design

Diagram: same as the previous slide.

SLIDE 33

Buffer cache semantics

Local or Distributed file system data consistency?

SLIDE 34

GPUfs buffer cache: weak data consistency model

  • close(sync)-to-open semantics (as in AFS)

Diagram: GPU1 issues write(1) and fsync(); GPU2 then issues open() and read(1) and sees the write. A later write(2) is not visible to the CPU until synced.

Remote-to-local memory performance ratios are similar to a distributed system's (remote >> local).
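A sketch of how close(sync)-to-open consistency plays out across two GPUs, using the g* calls from the next slide; gfsync's exact signature is an assumption:

    // GPU1 (hypothetical producer): the write becomes visible to other
    // processors only after gfsync() (or gclose()), as in AFS.
    __global__ void producer(void) {
        int fd = gopen("shared_file", O_GRDWR);
        float v = 1.0f;
        gwrite(fd, sizeof(v), &v, 0);
        gfsync(fd);                    // publish point
        gclose(fd);
    }

    // GPU2 (hypothetical consumer): a gopen() issued after GPU1's gfsync()
    // must observe the write; a gopen() issued before it may see stale data.
    __global__ void consumer(float *out) {
        int fd = gopen("shared_file", O_GRDWR);
        gread(fd, sizeof(float), out, 0);
        gclose(fd);
    }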

SLIDE 35

On-GPU File I/O API

    CPU call        GPU call
    open/close      gopen/gclose
    read/write      gread/gwrite
    mmap/munmap     gmmap/gmunmap
    fsync/msync     gfsync/gmsync
    ftrunc          gftrunc

(In the paper)

Changes in the semantics are crucial

SLIDE 36

Implementation bits

  • Paging support
  • Dynamic data structures and memory allocators
  • Lock-free radix tree
  • Inter-processor communications (IPC)
  • Hybrid H/W-S/W barriers
  • Consistency module in the OS kernel

(In the paper)

~1.5K GPU LOC, ~600 CPU LOC

SLIDE 37

Evaluation

All benchmarks are written as a GPU kernel: no CPU-side development

SLIDE 38

Matrix-vector product (inputs/outputs in files)

Chart: throughput (MB/s) vs. input matrix size (280 MB to 11,200 MB) for CUDA pipelined, CUDA optimized, and GPU file I/O versions. Vector: 1x128K elements; page size: 2 MB; GPU: Tesla C2075.
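For flavor, a hedged sketch of what the GPU file I/O version's kernel might look like; the file name, tiling, and 256-thread block size are assumptions, not the paper's benchmark code:

    // Each block computes one output element y[r] = dot(A[r,:], x).
    // Launch with 256 threads per block; the matrix streams in via gread.
    #define N    (128 * 1024)         // vector length (1x128K, per the slide)
    #define TILE 1024                 // floats staged per cooperative gread

    __global__ void matvec(const float *x, float *y, int rows) {
        __shared__ float tile[TILE];
        __shared__ float partial[256];
        int r = blockIdx.x;
        if (r >= rows) return;
        int fd = gopen("matrix_file", O_GRDWR);
        float acc = 0.0f;
        for (size_t base = 0; base < N; base += TILE) {
            // One cooperative gread per block, per the GPUfs semantics slides.
            gread(fd, TILE * sizeof(float), tile,
                  ((size_t)r * N + base) * sizeof(float));
            __syncthreads();
            for (int i = threadIdx.x; i < TILE; i += blockDim.x)
                acc += tile[i] * x[base + i];
            __syncthreads();
        }
        partial[threadIdx.x] = acc;
        __syncthreads();
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction
            if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0) y[r] = partial[0];
        gclose(fd);
    }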

SLIDE 39

Word frequency count in text

  • Count frequency of modern English words in the works of Shakespeare and in the Linux kernel source tree

Challenges: dynamic working set; small files; lots of file I/O (33,000 files, 1-5 KB each); unpredictable output size

English dictionary: 58,000 words

SLIDE 40

Results

    Workload                              8 CPUs   GPU-vanilla   GPU-GPUfs
    Linux source (33,000 files, 524MB)    6h       50m (7.2X)    53m (6.8X)
    Shakespeare (1 file, 6MB)             292s     40s (7.3X)    40s (7.3X)

SLIDE 41

Results

    Workload                              8 CPUs   GPU-vanilla   GPU-GPUfs
    Linux source (33,000 files, 524MB)    6h       50m (7.2X)    53m (6.8X)
    Shakespeare (1 file, 6MB)             292s     40s (7.3X)    40s (7.3X)

Unbounded input/output size support, at an 8% overhead over the hand-coded GPU-vanilla version.

SLIDE 42

GPUfs


Code is available for download at: https://sites.google.com/site/silbersteinmark/Home/gpufs http://goo.gl/ofJ6J

GPUfs is the first system to provide native access to host OS services from GPU programs

SLIDE 43

Our life would have been easier with

  • PCI atomics
  • Preemptive background daemons
  • GPU-CPU signaling support
  • In-GPU exceptions
  • GPU virtual memory API (host-based or device)
  • Compiler optimizations for register-heavy libraries (seems to be addressed in CUDA 5.0)
SLIDE 44

Sequential access to a file: 3 versions

CUDA pipelined transfer: the CPU reads a chunk, transfers it to the GPU, and repeats.

CUDA whole-file transfer: the CPU reads the entire file, then transfers it to the GPU.

GPU file I/O: the GPU gmmap()s the file directly.

SLIDE 45

Sequential read: throughput vs. page size

Chart: throughput (MB/s, 500 to 4000) against page size (16K to 2M) for GPU file I/O, CUDA whole-file, and CUDA pipelined versions.

SLIDE 46

Sequential read: throughput vs. page size

Chart: same as the previous slide.

Benefit: decouples performance constraints from application logic.

SLIDE 47

Diagram: yesterday, accelerators as co-processors; tomorrow, accelerators as peers with on-accelerator OS support.

SLIDE 48

Diagram: yesterday, accelerators as co-processors; tomorrow, accelerators as peers. What about software?

SLIDE 49

Set GPUs free!

SLIDE 50

Parallel square root on GPU

    gpu_thread(thread_id i) {
        float buffer;
        int fd = gopen(filename, O_GRDWR);
        size_t offset = sizeof(float) * i;
        gread(fd, sizeof(float), &buffer, offset);
        buffer = sqrt(buffer);
        gwrite(fd, sizeof(float), &buffer, offset);
        gclose(fd);
    }

The same code runs in all of the thousands of GPU threads.
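For context, a hedged sketch of how this per-thread routine might be wrapped and launched in CUDA; the wrapper and launch geometry are hypothetical:

    // Hypothetical CUDA wrapper: the global thread id plays the role of i,
    // and gpu_thread is assumed to be a __device__ function.
    __global__ void sqrt_file_kernel(void) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        gpu_thread(i);
    }

    // Host side: one thread per float in the file (assumes n_floats % 256 == 0).
    //   sqrt_file_kernel<<<n_floats / 256, 256>>>();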

SLIDE 51

GPUfs impact on GPU programs

  • Memory overhead
  • Register pressure
  • Very little CPU coding
  • Makes exitless GPU kernels possible

Pay-as-you-go design

SLIDE 52

Preserve CPU semantics?

GPU threads are different from CPU threads

Diagram: two SIMD vectors, each a group of threads.

What does it mean to open/read/write/close/mmap a file in thousands of threads?

SLIDE 53

Preserve CPU semantics?

GPU threads are different from CPU threads

Diagram: two SIMD vectors, each a group of threads.

A GPU kernel is a single data-parallel application.

What does it mean to open/read/write/close/mmap a file in thousands of threads?

SLIDE 54

GPUfs semantics (see more discussion in the paper)

    int fd = gopen("filename", O_GRDWR);

One file descriptor per file: open()/close() state is cached on the GPU.
One call per SIMD vector: bulk-synchronous, cooperative execution.

Diagram: two SIMD vectors, each a group of threads.
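A hedged sketch of what these semantics look like from inside a kernel; how the library coordinates the group's single logical call is internal to GPUfs, and the staging scheme here is illustrative:

    // Every thread in the SIMD group reaches the call together; the library
    // executes it once on the group's behalf and returns the same fd to all.
    __global__ void cooperative_read(float *out, size_t n) {
        __shared__ float buf[256];
        int fd = gopen("filename", O_GRDWR);     // one cached descriptor per file

        // One cooperative gread per block into shared staging memory.
        gread(fd, sizeof(buf), buf, (size_t)blockIdx.x * sizeof(buf));
        __syncthreads();

        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) out[i] = buf[threadIdx.x];

        gclose(fd);                              // likewise one logical close
    }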

SLIDE 55

GPU hardware characteristics

Parallelism Heterogeneous memory

SLIDE 56

API semantics

    int fd = gopen("filename", O_GRDWR);

SLIDE 57

API semantics

    int fd = gopen("filename", O_GRDWR);

CPU

    int fd = gopen("filename", O_GRDWR);

SLIDE 58

This code runs in 100,000 GPU threads

    int fd = gopen("filename", O_GRDWR);

CPU ≠ GPU

    int fd = gopen("filename", O_GRDWR);