SLIDE 1

Distributed Shared Memory and Machine Learning

CSci 8211 Chai-Wen Hsieh 11/5/2018

SLIDE 2

Agenda

Distributed Shared Memory

  • Architecture: Shared Memory & Distributed Shared Memory

Machine Learning

  • Supervised & Unsupervised Learning
  • Gradient Descent
  • Model/Data Parallelism

Topics

  • Problems We Could Solve
  • Distributed Shared Memory
  • Deep Learning & DSM
SLIDE 3

Architecture - Shared Memory

  • Sharing one memory among several processors
  • Communication through shared variables
  • Architectures:
    ○ SMP
    ○ NUMA
    ○ COMA

From Advanced Operating Systems - Udacity
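
To make "communication through shared variables" concrete, here is a minimal sketch in Python: several threads on one SMP-style machine exchange data purely through a shared counter guarded by a lock. The worker function and the counts are illustrative choices, not from the slides.

```python
import threading

# Shared state: every thread sees the same memory.
counter = 0
lock = threading.Lock()

def worker(n_increments):
    global counter
    for _ in range(n_increments):
        # The lock keeps concurrent read-modify-write updates coherent.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: all communication happened through shared memory
```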

SLIDE 4

Architecture - Distributed Shared Memory (DSM)

  • Multiple independent processing nodes, each with its own local memory modules
  • Models: Message Passing vs. DSM
  • Hidden data movement
  • Locality of reference
  • Provides a large virtual memory space
  • Cheaper than a multiprocessor system
  • Unlimited number of nodes

From Advanced Operating Systems - Udacity
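
A single-machine analogue of the two models, sketched with Python's multiprocessing module (real DSM spans several machines; this only contrasts the two programming styles, and the worker functions are invented for illustration). In message passing the data movement is explicit; with shared memory it hides behind ordinary variable access.

```python
from multiprocessing import Process, Pipe, Value

# Message passing: data is explicitly sent and received.
def mp_worker(conn):
    x = conn.recv()          # explicit receive
    conn.send(x * 2)         # explicit send

# Shared memory: data movement is hidden behind ordinary reads/writes.
def shm_worker(shared):
    with shared.get_lock():
        shared.value *= 2    # looks like a plain variable update

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=mp_worker, args=(child,))
    p.start()
    parent.send(21)
    print(parent.recv())     # 42, via explicit messages
    p.join()

    shared = Value("i", 21)
    q = Process(target=shm_worker, args=(shared,))
    q.start()
    q.join()
    print(shared.value)      # 42, via a shared variable
```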

SLIDE 5

DSM Issues

  • Programs must be rewritten to be shared-memory aware
  • Cache coherence problem: maintaining coherence among several copies of a data item
  • Performance loss:
    ○ Network
    ○ Synchronization: locks, barriers
  • Failure of nodes
  • “Shared memory machines scale well when you don’t share memory.” - Chuck Thacker
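
As an illustration of the synchronization cost listed above, here is a minimal barrier sketch: every thread must wait at each step for the slowest one, so a single straggler stalls the whole group. The step counts and sleep times are made up for illustration.

```python
import threading
import time

N_WORKERS = 4
barrier = threading.Barrier(N_WORKERS)

def worker(worker_id):
    for step in range(3):
        # Simulate uneven compute: worker 0 is the straggler.
        time.sleep(0.5 if worker_id == 0 else 0.1)
        # Every worker blocks here until all N_WORKERS arrive, so each
        # step proceeds at the pace of the slowest participant.
        barrier.wait()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"elapsed: {time.time() - start:.2f}s")  # ~1.5s, dominated by the straggler
```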
SLIDE 6

Machine Learning

Supervised Learning

  • Have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function
  • Problems:
    ○ Classification
    ○ Regression

Unsupervised Learning

  • Only have input data (X) and no corresponding output variables
  • Problems:
    ○ Clustering
    ○ Association
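
A minimal sketch of both settings, assuming scikit-learn is available (the toy data is invented for illustration): the classifier is given inputs X with known labels y and learns the mapping, while the clustering algorithm is given only X and must discover structure on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.0], [0.1], [0.9], [1.0]])

# Supervised: inputs X plus known outputs y -> learn the mapping X -> y.
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.05], [0.95]]))   # -> [0 1]

# Unsupervised: only X -> discover structure (here, two clusters).
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                      # group membership found without labels
```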

SLIDE 7

Deep Learning - Gradient Descent
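
The slide itself is a figure; as a minimal sketch of the idea it illustrates, gradient descent repeatedly steps the parameters against the gradient of the loss, θ ← θ − η∇L(θ). A toy version for a one-dimensional quadratic loss (the learning rate and starting point are arbitrary choices):

```python
def loss(theta):
    return (theta - 3.0) ** 2          # minimized at theta = 3

def grad(theta):
    return 2.0 * (theta - 3.0)         # dL/dtheta

theta = 0.0                            # arbitrary starting point
eta = 0.1                              # learning rate
for step in range(50):
    theta -= eta * grad(theta)         # theta <- theta - eta * grad

print(theta)                           # ~3.0, the minimizer of the loss
```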

SLIDE 8

Multi-node Strategy: Data/Model Parallelism

[Figure: Model Parallelism vs. Data Parallelism]
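
A minimal sketch of the data-parallel strategy using plain NumPy on one machine (a real system would run the shards on separate nodes; the linear model, learning rate, and averaging step are illustrative assumptions): each worker computes a gradient on its own shard of the batch, and the gradients are aggregated before the single shared copy of the model is updated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))           # one global batch
y = X @ np.arange(1.0, 6.0)            # targets from a known linear model
w = np.zeros(5)                        # shared model parameters

def shard_gradient(w, X_shard, y_shard):
    # Gradient of mean squared error on this worker's shard.
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

N_WORKERS = 4
for step in range(200):
    # Data parallelism: split the batch, one shard per worker.
    grads = [shard_gradient(w, Xs, ys)
             for Xs, ys in zip(np.array_split(X, N_WORKERS),
                               np.array_split(y, N_WORKERS))]
    # Aggregate (here, average) the workers' gradients, then update
    # the shared model.
    w -= 0.05 * np.mean(grads, axis=0)

print(np.round(w, 2))                  # ~[1. 2. 3. 4. 5.]
```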

SLIDE 9

Problems We Could Solve

1. Design a distributed shared memory framework that benefits machine learning training
2. Rewrite existing serial programs into parallel programs with ML
3. Adding nodes to a running system: where and when
4. Reduce overhead via prefetching and redistribution

We need to pick one topic, focus on it, and go deeper.

SLIDE 10

Topics - Distributed Shared Memory

1. Z. Tasoulas, I. Anagnostopoulos, L. Papadopoulos, and D. Soudris. "A Message-Passing Microcoded Synchronization for Distributed Shared Memory Architectures." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018.
2. J. Fresno, D. Barba, A. Gonzalez-Escribano, et al. "HitFlow: A Dataflow Programming Model for Hybrid Distributed and Shared-Memory Systems." International Journal of Parallel Programming, 2018. https://doi.org/10.1007/s10766-018-0561-2
3. Yuji Tamura, Doan Truong Th, Takahiro Chiba, Myungryun Yoo, and Takanori Yokoyama. "A Real-Time Operating System Supporting Distributed Shared Memory for Embedded Control Systems." Information Science and Applications 2017 (ICISA 2017), Lecture Notes in Electrical Engineering, vol. 424. Springer, Singapore.
SLIDE 11

Topics - Deep Learning & DSM

1. Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Abhinav Vishnu, Dipanjan Sengupta, and Xu Liu. 2018. "NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks." ACM Trans. Archit. Code Optim. 15, 2, Article 24 (June 2018), 26 pages. https://doi.org/10.1145/3199605
2. Shinyoung Ahn, Joongheon Kim, and Sungwon Kang. 2018. "A novel shared memory framework for distributed deep learning in high-performance computing architecture." In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 191-192. https://doi.org/10.1145/3183440.3195091

SLIDE 12

Topics - Deep Learning & DSM - cont'd

1. Amin Tootoonchian, Aurojit Panda, Aida Nematzadeh, and Scott Shenker. 2018. "Tasvir: Distributed Shared Memory for Machine Learning." SysML Conference. http://www.sysml.cc/doc/214.pdf
2. Jinliang Wei. 2018. "Efficient and Programmable Distributed Shared Memory Systems for Machine Learning Training." PhD dissertation, Carnegie Mellon University.