High Performance Data Intensive Computing Dongfang Zhao, Assistant - PowerPoint PPT Presentation

High Performance Data Intensive Computing
Dongfang Zhao, Assistant Professor
Department of Computer Science & Engineering
University of Nevada, Reno

Who am I
• 2017–present, Assistant Professor, University of Nevada, Reno
• 2016, Postdoctoral Fellow, University of Washington, Seattle
• 2015, PhD in Computer Science, Illinois Institute of Technology, Chicago
• 2015, Summer Intern, IBM Research – Almaden, San Jose, CA
• 2009-2011, Software Engineer, Epic Systems, Madison, WI
• 2008, MS in Computer Science, Emory University, Atlanta, GA
• 2005, MS in Statistics, Katholieke Universiteit Leuven, Belgium

Outline
• Past Work
  – 2005-2008: Machine Intelligence, Computer Vision
  – 2012-2015: High Performance Computing, Distributed Systems
  – 2015-2016: Big Data Systems, Database Systems
• Current Status
  – Personnel
  – Facilities
• Future Research Directions
  – Distributed Memory Management for Big Data Systems
  – Locality-aware Resource Management in Virtualized Computing
  – High Performance Database Systems

Past Work: 2005-2008
• Incremental Dimensionality Reduction
• E.g., published in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)

Past Work: 2012-2015
• High Performance Computing
• E.g., published in IEEE Transactions on Parallel and Distributed Systems (TPDS)

Past Work: 2015-2016
• Big Data Systems
• E.g., published at Very Large Data Bases (VLDB)

Current Status: Personnel
• Currently at Nevada:
  – 1 PhD student starting Fall 2017
  – 1 master's student starting Fall 2017
  – Collaborating closely with Prof. Dr. Feng Yan on Data Mining and Performance Modelling. He has published at KDD, SIGMETRICS, Supercomputing, CLOUD, NOMS, etc.
• Plan:
  – By Fall 2018, the lab will recruit two more PhD students and two more master's students

Current Status: Facilities
• Nevada’s HPC cluster
  – 56 compute nodes: PowerEdge C6320
    • 1792 cores
    • 128 (or 192?) GB RAM per node
  – 11 GPU nodes: PowerEdge C4130, each with 4 × P100 with NVLink
    • 352 cores
    • 44 P100 GPUs
• Our lab’s 10-node GPU cluster; each node has
  – 12 CPU cores
  – 4 GeForce GTX 1080 cards
  – 64 GB RAM

Future Directions
• Distributed Memory Management for Big Data Systems
  – Motivation: Modern big data systems do not have a coordinated way to manage memory
    • Users are asked to specify the memory allocation
    • The local OS takes responsibility for the rest
  – Objective
    • A middleware that automatically manages memory for big data systems
    • The middleware oversees the cluster-wide memory status rather than optimizing local usage
    • Users should be able to plug in ad-hoc strategies for the underlying memory management
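The "pluggable strategy" objective can be illustrated with a minimal Python sketch. This is not the actual middleware design: the names NodeMemory, MemoryMiddleware, and proportional_strategy are hypothetical, and the sketch assumes that a node may be granted more memory than it holds locally (that is, the excess would be served from remote memory).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NodeMemory:
    """Memory status reported by one node, in GB (hypothetical record)."""
    capacity: float
    demand: float


# A strategy maps the cluster-wide status to a per-node memory grant.
Strategy = Callable[[Dict[str, NodeMemory]], Dict[str, float]]


def proportional_strategy(nodes: Dict[str, NodeMemory]) -> Dict[str, float]:
    """Grant every node its demand, scaled down only if the cluster as a
    whole is oversubscribed. A grant may exceed local capacity (assumption:
    the excess is backed by memory on other nodes)."""
    total_capacity = sum(n.capacity for n in nodes.values())
    total_demand = sum(n.demand for n in nodes.values())
    scale = min(1.0, total_capacity / total_demand) if total_demand > 0 else 1.0
    return {name: n.demand * scale for name, n in nodes.items()}


class MemoryMiddleware:
    """Keeps a global view of memory and delegates decisions to a strategy."""

    def __init__(self, strategy: Strategy):
        self.nodes: Dict[str, NodeMemory] = {}
        self.strategy = strategy  # user-supplied, replaceable at any time

    def report(self, name: str, capacity: float, demand: float) -> None:
        """Record the latest status sent by a node."""
        self.nodes[name] = NodeMemory(capacity, demand)

    def rebalance(self) -> Dict[str, float]:
        """Compute per-node grants from the cluster-wide status."""
        return self.strategy(self.nodes)


mw = MemoryMiddleware(proportional_strategy)
mw.report("a", 64, 192)
mw.report("b", 64, 64)
print(mw.rebalance())
```

A user could swap in a purely local policy by assigning a different callable to mw.strategy, which is the kind of plug-in point the objective describes.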

Future Directions
• Locality-aware Resource Management in Virtualized Computing
  – Extension of my internship work in Summer 2015
  – Motivation: Load balance is sometimes overemphasized
  – Objective: Improve data locality for virtualized computation
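The trade-off between load balance and data locality can be sketched as two placement rules. This is only an illustration, not the method from the internship work: the function names, the replica map, and the max_load threshold are all assumptions made for the example.

```python
from typing import Dict, List


def place_load_balanced(loads: Dict[str, int]) -> str:
    """Pick the least-loaded host, ignoring where the data lives.
    Ties are broken by host name so the result is deterministic."""
    return min(loads, key=lambda h: (loads[h], h))


def place_locality_aware(block: str,
                         loads: Dict[str, int],
                         replicas: Dict[str, List[str]],
                         max_load: int) -> str:
    """Prefer a host that already stores the input block, as long as its
    load stays below max_load; otherwise fall back to load balancing."""
    local = [h for h in replicas.get(block, [])
             if h in loads and loads[h] < max_load]
    if local:
        return min(local, key=lambda h: (loads[h], h))
    return place_load_balanced(loads)


loads = {"h1": 3, "h2": 0}       # running tasks per host (hypothetical)
replicas = {"b1": ["h1"]}        # block b1 is stored on h1 only
print(place_load_balanced(loads))
print(place_locality_aware("b1", loads, replicas, max_load=4))
```

With these inputs the load-balanced rule sends the task to the idle host h2 and must move the data, while the locality-aware rule accepts a busier host h1 to keep the computation next to its input.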

Future Directions
• High Performance Distributed Databases
  – Motivation: For some reason, the dominant storage solution in HPC is the file system
  – Objective: Build a high-performance distributed database system atop existing parallel/distributed file systems, with performant support for:
    • Queries expressed in SQL
    • Data load, transform, extract, etc.
  – Challenges
    • Performance bottleneck: from network to what?
    • How to leverage GPUs, InfiniBand, MPI, etc. for database workloads?
    • …

Thanks!
Dongfang Zhao
dzhao@unr.edu
