Towards Unification of HPC and Big Data Paradigms
Universidad Complutense de Madrid Conferencias de Postgrado
Computer Science and Engineering Department University Carlos III of Madrid
Jesús Carretero
jcarrete@inf.uc3m.es
• More input data (ingestion)
• More output data for integration/analysis
• Real-time and near-real-time requirements
Daniel A. Reed and Jack Dongarra. Exascale Computing and Big Data. Communications of the ACM, 58(7), July 2015.
• A combination of the Map and Reduce models with an associated implementation
[Figure: the input data is split into chunks, one chunk per node (Node 1, Node 2, Node 3).]
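As an illustration of the Map and Reduce combination, with one chunk of input data per node, the following is a minimal single-process sketch of a word count. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample chunks are assumptions made for this example; a real framework would run the map calls on separate nodes and perform the shuffle over the network.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input: one chunk of text per node.
chunks = {
    "node1": "big data meets hpc",
    "node2": "hpc meets big data",
    "node3": "data data data",
}

# Each map call only sees its own chunk, so the calls are independent.
mapped = [map_phase(text) for text in chunks.values()]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each map call touches only its local chunk, the map stage parallelizes trivially; the shuffle is the only step that needs data from every node.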
• Subsequent simulations can run autonomously for each (Tx; parameters) entry:
  • with the necessary data that was mapped to them in the previous stage
  • plus the required simulation parameters that are common to every partition
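The idea of independent simulations per (Tx; parameters) entry can be sketched as follows. This is only an illustration: `Tx` is treated as an opaque partition key, and `COMMON_PARAMS`, `map_stage`, `simulate` and the round-robin partitioning are hypothetical stand-ins, not the method of the cited work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical simulation parameters shared by every partition.
COMMON_PARAMS = {"time_step": 0.5, "scale": 2.0}


def map_stage(records, num_partitions):
    """Assign records to partitions and attach the common parameters."""
    partitions = {tx: [] for tx in range(num_partitions)}
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)  # round-robin (assumed)
    return [(tx, data, COMMON_PARAMS) for tx, data in partitions.items()]


def simulate(entry):
    """Stand-in kernel: uses only its own data plus the common parameters."""
    tx, data, params = entry
    total = sum(x * params["scale"] for x in data)
    return tx, total * params["time_step"]


records = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
entries = map_stage(records, num_partitions=3)

# No entry depends on another, so they can be dispatched concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(simulate, entries))
print(results)
```

Each entry carries everything its simulation needs, which is what lets the second stage run without communication between partitions.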
"Efficient design assessment in the railway electric infrastructure domain using cloud computing", S. Caíno-Lores, A. García, F. García-Carballeira, J. Carretero, Integrated Computer-Aided Engineering, vol. 24, no. 1, pp. 57-72, December 2016.
http://mapreduce.sandia.gov/
[1] T. Gao, Y. Guo, B. Zhang, P. Cicotti, Y. Lu, P. Balaji, and M. Taufer. Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. In Proceedings of the IPDPS, 2017.
[Figure labels: 64X, 4X]
[Figure: storage hierarchy with compute nodes, I/O nodes, storage nodes and back-end storage (GPFS, Lustre, PVFS); four NVRAM labels; a fast interconnection network. Remaining labels: management, burst scheduling, loads.]
• Connect scheduler with Hercules to ask for data allocation
• Multiple-level write-back / write-through, multiple-level prefetching
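To make the difference between the two write policies concrete, here is a minimal in-memory sketch of a three-level hierarchy in which each level is either write-through or write-back and can prefetch from the level below. The `StorageLevel` class and the level names are hypothetical, chosen for this example only; they do not describe any particular system's API.

```python
class StorageLevel:
    """One level of a storage hierarchy (hypothetical, in-memory)."""

    def __init__(self, name, lower=None, write_through=True):
        self.name = name
        self.lower = lower              # next level down; None for the back end
        self.write_through = write_through
        self.data = {}
        self.dirty = set()

    def write(self, key, value):
        self.data[key] = value
        if self.lower is None:
            return
        if self.write_through:
            self.lower.write(key, value)   # propagate immediately
        else:
            self.dirty.add(key)            # write-back: defer propagation

    def flush(self):
        """Push dirty blocks one level down, then flush that level too."""
        if self.lower is None:
            return
        for key in sorted(self.dirty):
            self.lower.write(key, self.data[key])
        self.dirty.clear()
        self.lower.flush()

    def read(self, key):
        if key in self.data:
            return self.data[key]
        if self.lower is None:
            raise KeyError(key)
        value = self.lower.read(key)
        self.data[key] = value             # keep a copy on the way up
        return value

    def prefetch(self, keys):
        """Bring blocks into this level (and the levels in between) early."""
        for key in keys:
            self.read(key)


backend = StorageLevel("back-end storage")
io_node = StorageLevel("I/O node", lower=backend, write_through=False)
compute = StorageLevel("compute node", lower=io_node, write_through=True)

compute.write("block0", b"abc")            # reaches the I/O node, not the back end
in_backend_before_flush = "block0" in backend.data
compute.flush()                            # drains the write-back level
in_backend_after_flush = "block0" in backend.data

backend.data["block1"] = b"xyz"
compute.prefetch(["block1"])               # now cached at both upper levels
```

Write-through keeps lower levels up to date at the cost of more traffic per write, whereas write-back absorbs bursts and pays the cost at flush time.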
Distributed Systems, 2011.
• Integrate with memory-centric ad-hoc storage systems.
• Create mechanisms to induce data locality in HPC-oriented paradigms.
• Joining MPI and Spark models through RDDs