High Performance Data Intensive Computing, Dongfang Zhao



SLIDE 1

High Performance Data Intensive Computing

Dongfang Zhao, Assistant Professor
Department of Computer Science & Engineering
University of Nevada, Reno

SLIDE 2

Who am I

  • 2017–present, Assistant Professor, University of Nevada, Reno
  • 2016, Postdoctoral Fellow, University of Washington, Seattle
  • 2015, PhD in Computer Science, Illinois Institute of Technology, Chicago
  • 2015, Summer Intern, IBM Research – Almaden, San Jose, CA
  • 2009-2011, Software Engineer, Epic Systems, Madison, WI
  • 2008, MS in Computer Science, Emory University, Atlanta, GA
  • 2005, MS in Statistics, Katholieke Universiteit Leuven, Belgium
SLIDE 3

Outline

  • Past Work

– 2005-2008: Machine Intelligence, Computer Vision
– 2012-2015: High Performance Computing, Distributed Systems
– 2015-2016: Big Data Systems, Database Systems

  • Current Status

– Personnel
– Facilities

  • Future Research Directions

– Distributed Memory Management for Big Data Systems
– Locality-aware Resource Management in Virtualized Computing
– High Performance Database Systems

SLIDE 4

Past Work: 2005-2008

  • Incremental Dimensionality Reduction
  • E.g., published in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
SLIDE 5

Past Work: 2012-2015

  • High Performance Computing
  • E.g., published in IEEE Transactions on Parallel and Distributed Systems (TPDS)
SLIDE 6

Past Work: 2015-2016

  • Big Data Systems
  • E.g., published at the International Conference on Very Large Data Bases (VLDB)
SLIDE 7

Current Status: Personnel

  • Currently at Nevada:

– 1 PhD student starting Fall 2017
– 1 master's student starting Fall 2017
– Close collaboration with Prof. Feng Yan, who works on data mining and performance modeling and has published at KDD, SIGMETRICS, Supercomputing, CLOUD, NOMS, etc.

  • Plan:

– By Fall 2018, the lab plans to recruit two more PhD students and two more master's students

SLIDE 8

Current Status: Facilities

  • Nevada’s HPC cluster

– 56 compute nodes: PowerEdge C6320

  • 1792 cores
  • 128 (or 192?) GB RAM per node

– 11 GPU nodes: PowerEdge C4130, each with 4 P100 GPUs connected via NVLink

  • 352 cores
  • 44 P100 GPUs
  • Our lab’s 10-node GPU cluster, where each node has:

– 12 CPU cores
– 4 GeForce GTX 1080 cards
– 64 GB RAM
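The per-node and aggregate figures implied by the numbers above can be checked with a few lines of arithmetic. This is only a sanity-check sketch: the numbers are copied from the slide, while the variable and function names are hypothetical. The RAM total for the university cluster is left out because the slide itself is unsure whether nodes have 128 or 192 GB.

```python
# Figures copied from the slide; all names here are illustrative only.
HPC_COMPUTE_NODES, HPC_COMPUTE_CORES = 56, 1792
HPC_GPU_NODES, HPC_GPU_CORES, HPC_GPUS_PER_NODE = 11, 352, 4
LAB_NODES, LAB_CORES_PER_NODE = 10, 12
LAB_GPUS_PER_NODE, LAB_RAM_GB_PER_NODE = 4, 64


def per_node(total: int, nodes: int) -> int:
    """Return an exact per-node count, failing loudly if it does not divide evenly."""
    if total % nodes != 0:
        raise ValueError(f"{total} does not divide evenly across {nodes} nodes")
    return total // nodes


# University cluster: cores per node and total GPU count.
compute_cores_per_node = per_node(HPC_COMPUTE_CORES, HPC_COMPUTE_NODES)
gpu_node_cores_per_node = per_node(HPC_GPU_CORES, HPC_GPU_NODES)
hpc_total_gpus = HPC_GPU_NODES * HPC_GPUS_PER_NODE

# Lab cluster: aggregate resources across the 10 nodes.
lab_totals = {
    "cpu_cores": LAB_NODES * LAB_CORES_PER_NODE,
    "gpus": LAB_NODES * LAB_GPUS_PER_NODE,
    "ram_gb": LAB_NODES * LAB_RAM_GB_PER_NODE,
}

print(compute_cores_per_node, gpu_node_cores_per_node, hpc_total_gpus, lab_totals)
```

Both node types of the university cluster work out to 32 cores per node, and the 44 P100 GPUs listed on the slide match 11 nodes with 4 GPUs each.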

SLIDE 9

Future Directions

  • Distributed Memory Management for Big Data Systems

– Motivation: Modern big data systems do not have a coordinated way to manage memory

  • Users are asked to specify the memory allocation
  • Otherwise, the local OS takes responsibility for memory management

– Objective

  • A middleware that automatically manages memory for big data systems
  • The middleware oversees the overall memory status of the system rather than optimizing only local usage
  • Users should be able to plug in ad hoc strategies for the underlying memory management
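The objective above could be sketched roughly as follows. This is a toy illustration, not the proposed middleware: `MemoryCoordinator`, `most_free`, and `best_fit` are hypothetical names, and memory is modeled as a plain per-node counter in MB. The point is only that the coordinator holds a global view of free memory and delegates the placement decision to a user-supplied strategy function.

```python
from typing import Callable, Dict, Optional

# A strategy sees the global view (free MB per node) and a request size,
# and returns the chosen node name, or None if no node can serve the request.
Strategy = Callable[[Dict[str, int], int], Optional[str]]


def most_free(free_mb: Dict[str, int], request_mb: int) -> Optional[str]:
    """Pick the node with the most free memory that can fit the request."""
    fits = {n: f for n, f in free_mb.items() if f >= request_mb}
    return max(fits, key=fits.get) if fits else None


def best_fit(free_mb: Dict[str, int], request_mb: int) -> Optional[str]:
    """Pick the node whose free memory is the tightest fit for the request."""
    fits = {n: f for n, f in free_mb.items() if f >= request_mb}
    return min(fits, key=fits.get) if fits else None


class MemoryCoordinator:
    """Toy coordinator: tracks free memory on every node, delegates placement."""

    def __init__(self, capacity_mb: Dict[str, int], strategy: Strategy = most_free):
        self.free_mb = dict(capacity_mb)
        self.strategy = strategy  # pluggable by the user

    def allocate(self, request_mb: int) -> Optional[str]:
        # Pass a copy so a strategy cannot mutate the coordinator's state.
        node = self.strategy(dict(self.free_mb), request_mb)
        if node is None:
            return None
        if self.free_mb.get(node, 0) < request_mb:
            raise ValueError(f"strategy chose {node!r}, which cannot fit {request_mb} MB")
        self.free_mb[node] -= request_mb
        return node

    def release(self, node: str, mb: int) -> None:
        self.free_mb[node] += mb


spread = MemoryCoordinator({"a": 100, "b": 50})                   # default strategy
packed = MemoryCoordinator({"a": 100, "b": 50}, strategy=best_fit)
spread_choices = [spread.allocate(40) for _ in range(4)]
packed_first = packed.allocate(40)
print(spread_choices, packed_first)
```

Swapping `most_free` for `best_fit` changes where requests land without touching the coordinator, which is the kind of plug-in point the last bullet asks for.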
SLIDE 10

Future Directions

  • Locality-aware Resource Management in Virtualized Computing

– Extension of my internship work in Summer 2015
– Motivation: Load balance is sometimes overemphasized
– Objective: improve data locality for virtualized computation
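The tension between load balance and data locality can be illustrated with a small greedy placement sketch. It is an assumption-laden toy, not the method from the internship work: `place`, `local_count`, and the `locality_weight` parameter are hypothetical, each task is assumed to have its data on exactly one host, and host load is just a task count.

```python
from typing import Dict, List, Tuple


def place(tasks: List[Tuple[str, str]], hosts: List[str],
          locality_weight: int) -> Dict[str, str]:
    """Greedily assign each (task_id, data_host) pair to the cheapest host.

    Cost of a host = its current task count, plus locality_weight if the
    task's data lives on a different host. A weight of 0 is pure load balancing.
    """
    load = {h: 0 for h in hosts}
    placement: Dict[str, str] = {}
    for task_id, data_host in tasks:
        chosen = min(
            hosts,
            key=lambda h: load[h] + (0 if h == data_host else locality_weight),
        )
        load[chosen] += 1
        placement[task_id] = chosen
    return placement


def local_count(tasks: List[Tuple[str, str]], placement: Dict[str, str]) -> int:
    """Number of tasks that run on the host holding their data."""
    return sum(1 for task_id, data_host in tasks if placement[task_id] == data_host)


# Three tasks whose data all sits on h1.
tasks = [("t1", "h1"), ("t2", "h1"), ("t3", "h1")]
hosts = ["h1", "h2"]

balanced = place(tasks, hosts, locality_weight=0)
locality_aware = place(tasks, hosts, locality_weight=5)
print(local_count(tasks, balanced), local_count(tasks, locality_aware))
```

With the weight at 0 the tasks are spread evenly and one of them reads its data remotely; with a larger weight all three stay next to their data at the cost of an unbalanced load.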

SLIDE 11

Future Directions

  • High Performance Distributed Databases

– Motivation: for some reason, the dominant storage solution in HPC is the file system
– Objective: build a high-performance distributed database system atop existing parallel/distributed file systems, with performant support for:

  • Queries expressed in SQL
  • Data load, transform, extract, etc.

– Challenges

  • Performance bottleneck: from network to what?
  • How to leverage GPUs, InfiniBand, MPI, etc. for database workloads?
SLIDE 12

Thanks!

Dongfang Zhao
dzhao@unr.edu