Hadoop Performance Evaluation



  1. Hadoop Performance Evaluation
     Advanced Lab Course (Praktikum für Fortgeschrittene)
     Name: Tien Duc Dinh
     Supervisors: Olga Mordvinova, Julian Kunkel
     Date: 04-12-2007

  2. Outline
     1. Introduction
        - Motivation
        - Basic notations
     2. HDFS Overview
        - Architecture
        - MapReduce
     3. HDFS Performance
        - Test scenarios
        - Write
        - Read
        - Comparison with local FS

  3. What is Hadoop?
     - an open-source, Java-based programming framework (Apache project)
     - supports the processing of large data sets in a distributed computing environment
     - was inspired by Google MapReduce and the Google File System (GFS)
     - currently used by many well-known IT companies, e.g. Google, Yahoo, IBM

  4. Basic notations
     - HDFS = Hadoop Distributed File System
       - a distributed file system
       - contains mechanisms for job scheduling/execution, e.g. it allows moving jobs to the data
     - Job/Task = MapReduce job/task
     - Metadata
       - data that describes other data, e.g. file name, block location
     - Block
       - part of a logical file
       - contiguous data stored on one server
       - 64 MB by default (configurable)
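To make the block notion concrete, here is a minimal Python sketch of how a file is cut into fixed-size blocks. It assumes the 64 MB default mentioned above; `split_into_blocks` is a hypothetical helper for illustration, not part of Hadoop.

```python
# Sketch: split a file of `file_size` bytes into HDFS-style blocks.
# Assumption: the 64 MB default from the slide; clusters may configure another size.
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB in bytes
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
    """Return one (offset, length) pair per block; only the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


blocks = split_into_blocks(200 * MB)
print(len(blocks))                             # 4
print([length // MB for _, length in blocks])  # [64, 64, 64, 8]
```

A 200 MB file therefore occupies four blocks, and each block can be stored (and replicated) on a different server.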

  5. HDFS Overview
     [Figure: HDFS/MapReduce architecture. Components: Client, Namenode with metadata, Secondary Namenode, JobTracker with a job queue, and several Datanodes, each running a TaskTracker on top of a local filesystem. Labeled interactions: submit job (Client to Queue), get job (JobTracker from Queue), metadata request/response (Client and Namenode), op-request/op-response (Client and Datanode).]

  6. Client
     - is the API used by an HDFS application
     - communicates with the Namenode to obtain metadata and runs the actual operations directly on the Datanodes
     - for a MapReduce operation, the client creates a job and submits it to the queue, which is handled by the JobTracker
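A minimal sketch of the access pattern above, in Python: the client asks the Namenode only for block locations and then reads the bytes directly from the Datanodes. `StubNamenode`, `StubDatanode` and `client_read` are hypothetical in-memory stand-ins, not the Hadoop client API.

```python
# Hypothetical in-memory stand-ins for the Namenode and Datanodes (not the Hadoop API).
from dataclasses import dataclass, field


@dataclass
class StubDatanode:
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class StubNamenode:
    # path -> ordered list of (block id, name of a Datanode holding that block)
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]  # metadata only, never file contents


def client_read(path, namenode, datanodes):
    """Read a file: metadata from the Namenode, data directly from the Datanodes."""
    parts = []
    for block_id, node_name in namenode.get_block_locations(path):  # metadata request/response
        parts.append(datanodes[node_name].read_block(block_id))     # op-request/op-response
    return b"".join(parts)


datanodes = {"dn1": StubDatanode({"blk_1": b"Hello "}), "dn2": StubDatanode({"blk_2": b"World"})}
namenode = StubNamenode({"/data/a": [("blk_1", "dn1"), ("blk_2", "dn2")]})
print(client_read("/data/a", namenode, datanodes))  # b'Hello World'
```

Keeping file data off the Namenode is what lets a single metadata server coordinate many Datanodes without becoming a data bottleneck.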

  7. Namenode
     - is the master server that manages all system metadata, such as the namespace, access control information, the mapping from files to chunks (blocks) and the chunk locations
     - executes file system namespace operations like opening, closing and renaming files and directories
     - gives instructions to the Datanodes to perform system operations, e.g. block creation, deletion and replication
     - having only one Namenode simplifies the design
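The Namenode's bookkeeping can be sketched as two maps (file to blocks, block to Datanodes) plus a list of instructions to send to the Datanodes. This is a simplified Python illustration: the class `ToyNamenode`, the round-robin replica placement and the replication factor of 3 are assumptions of the sketch, not Hadoop's actual placement policy.

```python
import itertools


class ToyNamenode:
    """Hypothetical, simplified Namenode: flat namespace, no permissions, no persistence."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = replication   # assumption: 3 replicas per block
        self.files = {}                  # path -> list of block ids
        self.locations = {}              # block id -> Datanodes holding a replica
        self.instructions = []           # (command, datanode, block id) to send out
        self._ids = itertools.count(1)
        self._next_dn = 0                # round-robin placement (an assumption)

    def create(self, path, num_blocks):
        if path in self.files:
            raise FileExistsError(path)
        n = len(self.datanodes)
        if n == 0:
            raise RuntimeError("no Datanodes available")
        blocks = []
        for _ in range(num_blocks):
            block_id = f"blk_{next(self._ids)}"
            targets = [self.datanodes[(self._next_dn + i) % n]
                       for i in range(min(self.replication, n))]
            self._next_dn = (self._next_dn + 1) % n
            self.locations[block_id] = targets
            self.instructions += [("create_block", dn, block_id) for dn in targets]
            blocks.append(block_id)
        self.files[path] = blocks

    def rename(self, src, dst):
        # pure metadata operation: no block data is moved
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)

    def delete(self, path):
        for block_id in self.files.pop(path):
            for dn in self.locations.pop(block_id):
                self.instructions.append(("delete_block", dn, block_id))


nn = ToyNamenode(["dn1", "dn2", "dn3", "dn4"])
nn.create("/data/a", num_blocks=2)
nn.rename("/data/a", "/data/b")
print(nn.files)               # {'/data/b': ['blk_1', 'blk_2']}
print(nn.locations["blk_2"])  # ['dn2', 'dn3', 'dn4']
```

Note that `rename` touches only metadata, while `create` and `delete` also produce instructions that the Datanodes would carry out.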

  8. Datanode
     - one per node
     - stores HDFS data in its local file system
     - performs operations requested by clients, as well as system operations upon instruction from the Namenode
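A Datanode can be pictured as a process that keeps each block as an ordinary file in a local directory. The Python sketch below assumes that layout; the `blk_<id>` naming mirrors HDFS block files, but `ToyDatanode` and its methods are hypothetical.

```python
import os
import tempfile


class ToyDatanode:
    """Hypothetical Datanode: one local file per HDFS block."""

    def __init__(self, data_dir):
        self.data_dir = data_dir
        os.makedirs(data_dir, exist_ok=True)

    def _path(self, block_id):
        return os.path.join(self.data_dir, block_id)

    # operations requested by clients
    def write_block(self, block_id, data: bytes):
        with open(self._path(block_id), "wb") as f:
            f.write(data)

    def read_block(self, block_id) -> bytes:
        with open(self._path(block_id), "rb") as f:
            return f.read()

    # system operations issued by the Namenode
    def delete_block(self, block_id):
        os.remove(self._path(block_id))

    def replicate_block(self, block_id, target):
        target.write_block(block_id, self.read_block(block_id))

    def stored_blocks(self):
        return sorted(os.listdir(self.data_dir))


with tempfile.TemporaryDirectory() as root:
    dn1 = ToyDatanode(os.path.join(root, "dn1"))
    dn2 = ToyDatanode(os.path.join(root, "dn2"))
    dn1.write_block("blk_1", b"Hello World")
    dn1.replicate_block("blk_1", dn2)  # as if instructed by the Namenode
    print(dn2.read_block("blk_1"))     # b'Hello World'
```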

  9. Secondary Namenode
     - the Namenode stores modifications to the file system in a log file
     - at startup, the Namenode reads the HDFS state from an image file (fsimage) and then applies the modifications from the log file
     - after the Namenode has finished writing the new HDFS state to the image file, it empties the log file
     - the Secondary Namenode periodically merges fsimage and the log file and keeps the log size within a limit
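The image-plus-log scheme can be sketched in a few lines of Python: the namespace is a dict from path to block ids, and the log is a list of operations replayed on top of the image. The operation tuples and function names are assumptions for illustration; real HDFS stores fsimage and the edit log in its own binary formats.

```python
import copy


def apply_edit(state, op):
    """Apply one logged modification to the namespace (hypothetical op format)."""
    kind, *args = op
    if kind == "create":
        path, blocks = args
        state[path] = list(blocks)
    elif kind == "rename":
        src, dst = args
        state[dst] = state.pop(src)
    elif kind == "delete":
        (path,) = args
        del state[path]
    else:
        raise ValueError(f"unknown edit: {kind}")


def load_namespace(fsimage, edit_log):
    """Namenode startup: read the image, then replay the log on top of it."""
    state = copy.deepcopy(fsimage)
    for op in edit_log:
        apply_edit(state, op)
    return state


def checkpoint(fsimage, edit_log):
    """Merge image and log into a new image; the log starts over empty."""
    return load_namespace(fsimage, edit_log), []


image = {"/data/a": ["blk_1"]}
log = [("create", "/data/b", ["blk_2"]), ("rename", "/data/a", "/data/c")]
image, log = checkpoint(image, log)
print(image)  # {'/data/b': ['blk_2'], '/data/c': ['blk_1']}
print(log)    # []
```

Running `checkpoint` periodically on a separate machine is what keeps the log short, so a Namenode restart only has to replay a few recent edits.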

  10. TaskTracker
     - is a node in the cluster that accepts MapReduce tasks from the JobTracker
     - is configured with a set of slots, which indicate the number of tasks it can accept
     - spawns a separate JVM process to do the actual work; this helps to ensure that a process failure does not take down the TaskTracker
     - monitors these processes and reports their state to the JobTracker
     - contacts the JobTracker through heartbeat messages
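The slot and process-isolation ideas can be sketched in Python, with child Python processes standing in for the separate JVMs. `ToyTaskTracker`, its methods and the report format are hypothetical; a failing task only shows up as a non-zero exit code in the next heartbeat.

```python
import subprocess
import sys


class ToyTaskTracker:
    """Hypothetical TaskTracker: fixed number of slots, one child process per task."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = {}  # task id -> subprocess.Popen

    def free_slots(self):
        return self.slots - len(self.running)

    def accept(self, task_id, python_code):
        if self.free_slots() <= 0:
            return False  # all slots busy: the JobTracker has to try another tracker
        # Run the task in its own process (Hadoop uses a separate JVM), so a
        # crashing task cannot take the tracker down with it.
        self.running[task_id] = subprocess.Popen([sys.executable, "-c", python_code])
        return True

    def heartbeat(self):
        """Report tasks that finished since the last heartbeat, plus free slots."""
        finished = {}
        for task_id, proc in list(self.running.items()):
            exit_code = proc.poll()
            if exit_code is not None:
                finished[task_id] = "SUCCEEDED" if exit_code == 0 else "FAILED"
                del self.running[task_id]
        return {"tracker": self.name, "free_slots": self.free_slots(), "finished": finished}


tt = ToyTaskTracker("tt1", slots=2)
tt.accept("task_1", "print('working')")
tt.accept("task_2", "raise SystemExit(1)")  # a failing task
print(tt.accept("task_3", "pass"))          # False: no free slot
for proc in tt.running.values():
    proc.wait()                             # demo only: let both tasks finish
print(tt.heartbeat())
```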

  11. JobTracker (1)
     - is the MapReduce master
     - normally runs on a separate node
     - uses a queue to schedule the submitted jobs
     - talks to the Namenode to determine the location of the data
     - submits the work to the chosen TaskTracker nodes and monitors them through heartbeat messages sent at regular intervals
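Choosing TaskTrackers with the data location in mind ("moving jobs to data") can be sketched as follows in Python. The block locations would come from the Namenode; here they are a plain dict. The function names and the "most free slots" tie-break are assumptions of the sketch, not the JobTracker's real scheduling policy.

```python
def choose_tracker(block_id, block_locations, free_slots):
    """Prefer a host holding a replica of the input block; fall back to any free host.

    block_locations: block id -> hosts with a replica (Namenode metadata)
    free_slots:      host -> number of free TaskTracker slots
    """
    candidates = [host for host, n in free_slots.items() if n > 0]
    if not candidates:
        return None  # all trackers busy: the task stays queued
    local = [host for host in candidates if host in block_locations.get(block_id, [])]
    pool = local or candidates  # remote execution only if no local slot is free
    return max(pool, key=lambda host: free_slots[host])  # hypothetical tie-break


def schedule(map_tasks, block_locations, free_slots):
    """Assign (task id, input block) pairs to hosts; returns task id -> host."""
    slots = dict(free_slots)
    assignment = {}
    for task_id, block_id in map_tasks:
        host = choose_tracker(block_id, block_locations, slots)
        if host is None:
            break
        assignment[task_id] = host
        slots[host] -= 1
    return assignment


locations = {"blk_1": ["node1", "node2"], "blk_2": ["node3"]}
free = {"node1": 1, "node2": 2, "node3": 0}
print(schedule([("m1", "blk_1"), ("m2", "blk_2")], locations, free))
# {'m1': 'node2', 'm2': 'node1'}
```

Here `m2` has to run remotely because `node3`, the only host with a replica of `blk_2`, has no free slot.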

  12. JobTracker (2)
     - if a task fails, it may be resubmitted elsewhere
     - when the work is completed, the JobTracker updates its status
     - client applications can poll the JobTracker for information
     - the JobTracker is a single point of failure for the MapReduce infrastructure: if it goes down, all running jobs are lost, while the file system remains live
     - there is currently no checkpointing or recovery within a single MapReduce job
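Resubmitting a failed task elsewhere can be sketched as a retry loop that avoids trackers on which the task already failed. In this Python sketch, `MAX_ATTEMPTS = 4` and the `run_attempt` callback are assumptions for illustration (Hadoop makes the attempt limit configurable).

```python
MAX_ATTEMPTS = 4  # assumption for this sketch


def run_with_retries(task_id, trackers, run_attempt, max_attempts=MAX_ATTEMPTS):
    """run_attempt(task_id, tracker) -> bool. Returns (final status, attempt history)."""
    history = []
    failed_on = set()
    for _ in range(max_attempts):
        # prefer a tracker on which this task has not failed yet
        options = [t for t in trackers if t not in failed_on] or list(trackers)
        tracker = options[0]
        ok = run_attempt(task_id, tracker)
        history.append((tracker, "SUCCEEDED" if ok else "FAILED"))
        if ok:
            return "SUCCEEDED", history
        failed_on.add(tracker)
    return "FAILED", history


def flaky(task_id, tracker):
    return tracker != "tt1"  # simulate a broken node "tt1"


status, hist = run_with_retries("m_0001", ["tt1", "tt2"], flaky)
print(status, hist)  # SUCCEEDED [('tt1', 'FAILED'), ('tt2', 'SUCCEEDED')]
```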

  13. MapReduce (1)
     - a programming model and an associated implementation for processing and generating large data sets
     - its functions map and reduce are supplied by the user
     - Map
       - processes a key/value pair to generate a set of intermediate key/value pairs
       - the MapReduce library groups together all intermediate values with the same key and passes them to the Reducer
     - Reduce
       - XXXXXXXXXXXXXXX
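The model above can be sketched in a single Python process: the user supplies `map_fn` and `reduce_fn`, and a small driver plays the role of the library that groups intermediate values by key. This illustrates the programming model only (no distribution, no Hadoop API); `map_reduce` and the inverted-index example are hypothetical, using the two sample documents "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop".

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """inputs: iterable of (key, value); map_fn(k, v) yields (k2, v2);
    reduce_fn(k2, values) returns the output value for k2."""
    groups = defaultdict(list)
    for key, value in inputs:            # map phase
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)        # group intermediate values by key
    return {k2: reduce_fn(k2, values)    # reduce phase, keys in sorted order
            for k2, values in sorted(groups.items())}


# Example: an inverted index (word -> documents containing it)
def index_map(doc_name, text):
    for word in text.split():
        yield word, doc_name


def index_reduce(word, doc_names):
    return sorted(set(doc_names))


docs = [("a", "Hello World Bye World"), ("b", "Hello Hadoop Goodbye Hadoop")]
print(map_reduce(docs, index_map, index_reduce))
# {'Bye': ['a'], 'Goodbye': ['b'], 'Hadoop': ['b'], 'Hello': ['a', 'b'], 'World': ['a']}
```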

  14. MapReduce (2)

  15. MapReduce (3)

  16. Example: Counting word occurrences (1)

      map(String key, String value):
        // key: document name (usually not used)
        // value: document contents
        for each word w in value:
          EmitIntermediate(w, "1");

      reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
          result += ParseInt(v);
        Emit(AsString(result));
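A direct Python translation of the pseudocode above, run on the two sample documents "Hello World Bye World" and "Hello Hadoop Goodbye Hadoop". Counts are passed as strings to mirror `EmitIntermediate(w, "1")` and `ParseInt(v)`; the grouping step in `run_word_count` is a hypothetical single-process stand-in for the framework.

```python
from collections import defaultdict


def wc_map(key, value):
    # key: document name (unused); value: document contents
    for w in value.split():
        yield w, "1"


def wc_reduce(key, values):
    # key: a word; values: a list of counts (as strings)
    result = 0
    for v in values:
        result += int(v)
    return str(result)


def run_word_count(documents):
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in wc_map(name, contents):
            intermediate[word].append(count)
    return {word: wc_reduce(word, counts) for word, counts in sorted(intermediate.items())}


docs = {"a": "Hello World Bye World", "b": "Hello Hadoop Goodbye Hadoop"}
for word, count in run_word_count(docs).items():
    print(count, word)
```

The printed lines (`1 Bye`, `1 Goodbye`, `2 Hadoop`, `2 Hello`, `2 World`) match the single-machine result on the next slide.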

  17. Example: Counting word occurrences (2)
     - the folder "data" contains 2 files, a and b, with the following contents:
       - a: Hello World Bye World
       - b: Hello Hadoop Goodbye Hadoop
     - the following command solves this problem:

         > perl -p -e 's/\s+/\n/g' data/* | sort | uniq -c

     - the output looks like:

         1 Bye
         1 Goodbye
         2 Hadoop
         2 Hello
         2 World
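The shell pipeline roughly mirrors the MapReduce phases: splitting into one word per line acts like map, `sort` brings equal keys together like the grouping step, and `uniq -c` counts each group like reduce. A Python sketch of the same pipeline, assuming the two example files end with a newline; `sort_uniq_count` is a hypothetical name.

```python
import itertools
import os
import tempfile


def sort_uniq_count(paths):
    """Equivalent of: perl -p -e 's/\\s+/\\n/g' FILES | sort | uniq -c"""
    words = []
    for path in paths:
        with open(path) as f:
            words += f.read().split()  # one word per "line"
    words.sort()                       # sort (code-point order, like LC_ALL=C sort)
    return [(len(list(group)), word)   # uniq -c: count runs of equal adjacent words
            for word, group in itertools.groupby(words)]


with tempfile.TemporaryDirectory() as root:
    data = os.path.join(root, "data")
    os.mkdir(data)
    for name, text in {"a": "Hello World Bye World\n", "b": "Hello Hadoop Goodbye Hadoop\n"}.items():
        with open(os.path.join(data, name), "w") as f:
            f.write(text)
    for count, word in sort_uniq_count([os.path.join(data, n) for n in os.listdir(data)]):
        print(f"{count:7d} {word}")
```

Unlike the MapReduce version, this only works while all words fit through a single sort on one machine.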
