Coupling Task Progress for MapReduce Resource-Aware Scheduling
Jian Tan, Xiaoqiao Meng, Li Zhang
IBM T. J. Watson Research Center, Yorktown Heights, New York 10598
Email: {tanji, xmeng, zhangli}@us.ibm.com
Abstract—Schedulers are critical to enhancing the performance of MapReduce/Hadoop in the presence of multiple jobs with different characteristics and performance goals. Though current schedulers for Hadoop are quite successful, they still have room for improvement: map tasks (MapTasks) and reduce tasks (ReduceTasks) are not jointly optimized, even though there is a strong dependence between them. This can cause job starvation and unfavorable data locality. In this paper, we design and implement a resource-aware scheduler for Hadoop. It couples the progress of MapTasks and ReduceTasks, utilizing Wait Scheduling for ReduceTasks and Random Peeking Scheduling for MapTasks to jointly optimize task placement. This mitigates the starvation problem and improves the overall data locality. Our extensive experiments demonstrate significant improvements in job response times.
I. INTRODUCTION
MapReduce [1] has emerged as a popular paradigm for processing large datasets in parallel over a cluster. As an open source implementation, Hadoop [2] has been successfully used in a variety of applications, such as social network mining, log processing, video and image analysis, search indexing, and recommendation systems. In many scenarios, long batch jobs and short interactive queries are submitted to the same MapReduce cluster, sharing limited computing resources while pursuing different performance goals. To meet these challenges, an efficient scheduler is critical to providing the desired quality of service for the MapReduce cluster. In this domain, Fair Scheduler [3] is the most widely used scheduler in practice. Other commonly used schedulers include the default FIFO Scheduler, Capacity Scheduler [4], and variations [5]–[7]. To improve the performance of large-scale MapReduce clusters, more complicated resource management schemes have also been proposed [8]–[11].

While focusing on Fair Scheduler, as it is the de facto standard in the Hadoop community, we observe that it, as well as many other schedulers, still has room for improvement.

1) Map and reduce tasks are scheduled separately [5] without joint optimization. First, Fair Scheduler only guarantees the fairness of MapTasks and is not really fair for ReduceTasks. We observe that allocating excess resources to ReduceTasks without coordinating with the map progress leads to cluster-wide resource under-utilization, as evidenced by the starvation problem [12]. Second, most MapReduce schedulers only consider data locality for MapTasks and ignore that it is also an issue for ReduceTasks. Though the latter has recently been addressed in [13], [14], the adopted approaches are sensitive to future run-time information (e.g., the map output distribution and competition among new jobs) that is difficult to predict.

2) Fair Scheduler uses Delay Scheduling [12], which allows MapTasks to wait for a period of time in order to find local data. This usually improves the data locality of MapTasks. However, we observe that the introduced delays may lead to under-utilization and instability, i.e., the number of MapTasks running simultaneously can fall far below the desired level and change with large variations over time.

In view of these observations, we propose a resource-aware scheduler, termed Coupling Scheduler. It couples the progress of map and reduce tasks to mitigate starvation, and jointly optimizes the placement of both to improve the overall data locality. Specifically, we utilize Wait Scheduling for ReduceTasks and Random Peeking Scheduling for MapTasks, taking the interactions between them into consideration, to holistically optimize data locality. Our extensive experiments demonstrate significant improvements in job processing times.
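To make the coupling idea concrete, the following sketch shows one simple way a scheduler could tie the number of reduce slots a job holds to its map progress, rather than letting a job greedily occupy every free slot. This is only an illustration of the principle under simplified assumptions; the names (JobInfo, grantReduceSlot) are hypothetical and do not correspond to actual Hadoop or Coupling Scheduler code.

public class CouplingSketch {

    /** Hypothetical per-job bookkeeping. */
    static class JobInfo {
        int finishedMaps;   // MapTasks completed so far
        int totalMaps;      // total MapTasks of the job
        int runningReduces; // ReduceTasks currently holding slots
        int totalReduces;   // ReduceTasks the job will eventually run
    }

    /**
     * Grant a free reduce slot only if the job's reduce allocation has
     * not caught up with its map progress. This couples the two phases:
     * a job with 40% of its maps finished may hold at most 40% of its
     * reduce slots, leaving the remainder available to other jobs.
     */
    static boolean grantReduceSlot(JobInfo job) {
        double mapProgress = (double) job.finishedMaps / job.totalMaps;
        int allowedReduces = (int) Math.ceil(mapProgress * job.totalReduces);
        return job.runningReduces < allowedReduces;
    }
}

Because the allowance grows only as maps finish, a newly submitted job can still acquire reduce slots as its own maps progress, instead of being blocked by an earlier job that claimed every slot at once.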
A. Scheduling ReduceTasks
While MapTasks are small tasks that can run independently in parallel, ReduceTasks are long-running tasks that contain copy/shuffle and reduce phases. In most existing schedulers, ReduceTasks are not preemptive, i.e., a ReduceTask will not release its occupied slot until its reduce phase completes. It is feasible to implement ReduceTask preemption in practice [15]; however, this work adheres to the current non-preemption assumption. Under Fair Scheduler, once a certain percentage of a job's MapTasks finish, its ReduceTasks are launched greedily up to the maximum number of available slots. This method overlaps the copy/shuffle phase with the map phase of a job and can greatly reduce job processing times. However, it can starve newly arrived jobs [12], a problem that is even more pronounced when many small jobs arrive after large ones [16]. The experiment in Fig. 1 further illustrates this problem.
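For contrast with the coupling sketch above, the greedy behavior just described can be sketched as follows. In Hadoop 1.x the trigger percentage is the slowstart threshold (property mapred.reduce.slowstart.completed.maps, 5% by default); the surrounding structure is a hypothetical simplification of the scheduling loop, not actual Fair Scheduler code.

public class GreedyLaunchSketch {

    /**
     * ReduceTasks become eligible once the finished-map fraction crosses
     * the slowstart threshold (e.g., 0.05 by default in Hadoop 1.x).
     */
    static boolean reducesEligible(int finishedMaps, int totalMaps,
                                   double slowstart) {
        return (double) finishedMaps / totalMaps >= slowstart;
    }

    /**
     * Once eligible, the job launches ReduceTasks on every free slot it
     * can get. Since ReduceTasks are non-preemptive, these slots are
     * returned only when the reduce phase completes, so jobs arriving
     * later may find no reduce slots left: the starvation problem.
     */
    static int launchGreedily(int freeReduceSlots, int pendingReduces) {
        return Math.min(freeReduceSlots, pendingReduces);
    }
}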
Fig. 1 shows the number of map and reduce tasks running simultaneously at each time point for two Grep jobs. Job 1 grabs all the reduce slots at time 0.9 minutes, just before job 2 is submitted at time 1.0 minute. Thus, when job 2 finishes its MapTasks at time 3.8 minutes, it cannot launch