SLIDE 1

Detecting and Fixing Memory-Related Performance Problems in Managed Languages

Lu Fang

Committee: Prof. Guoqing Xu (Chair), Prof. Alex Nicolau, Prof. Brian Demsky

University of California, Irvine

May 26, 2017, Irvine, CA, USA

Lu Fang (UC Irvine) Final Defense May 26, 2017 1 / 51

SLIDE 5

Performance Problems in Real World

Many distributed systems, such as Spark and Hadoop, also suffer from performance problems, for example:

java.lang.OutOfMemoryError: Java heap space

SLIDE 8

Performance Problems

Commonly exist in real-world applications

◮ Single-machine apps, such as Eclipse and IE
◮ Traditional databases and web servers, such as MySQL and Tomcat
◮ Big Data systems, such as Hadoop and Spark

Further exacerbated by managed languages

◮ Such as Java and C#
◮ Large overhead introduced by automatic memory management

Cannot be optimized away by compilers

◮ Compilers cannot understand the deep semantics
◮ Compilers cannot guarantee the correctness

SLIDE 11

Performance Problems

Difficult to find, especially during development

◮ Invisible effect
◮ Often escape to production runs

Difficult to fix

◮ Large systems are complicated
◮ Sufficient diagnostic information is necessary
◮ Problems may be located deep inside systems

Can lead to severe problems

◮ Scalability reductions
◮ Program hangs and crashes
◮ Financial losses

SLIDE 13

Existing Solutions

Many solutions have been proposed

◮ Pattern-based
◮ Mining-based
◮ Learning-based

Most are postmortem debugging techniques

◮ Require user logs/input to trigger bugs
◮ Bugs have already escaped to production runs

SLIDE 17

Drawbacks in Existing Works → Our Solutions

◮ Lacking a general way to describe problems
    → Instrumentation Specification Language (ISL)
◮ Cannot detect problems under small workloads
    → PerfBlower
◮ Lacking a systematic approach to tune memory usage in data-intensive systems
    → ITask

SLIDE 18

Lu Fang, Liang Dou, Guoqing Xu. PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification. ECOOP'15.

SLIDE 20

Instrumentation Specification Language

◮ Motivation 1: an easy way to develop new detectors
◮ Motivation 2: detect problems with small effects

SLIDE 21

Instrumentation Specification Language

◮ Focus on problems with observable heap symptoms
◮ Users define symptoms/counter-evidence in events
◮ Two important actions: amplify and deamplify

SLIDE 24

Amplification and Deamplification

amplify: increases the penalty
deamplify: resets the penalty

Virtual space overhead (VSO)

◮ $VSO = \dfrac{Sum_{penalty} + Size_{live\ heap}}{Size_{live\ heap}}$
◮ Reflects the severity in two dimensions: time and size
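The VSO ratio can be illustrated with a small calculation. The following is only a sketch: the class and method names are hypothetical, and it assumes penalties are counted in the same unit as the live heap size, which the slide does not state.

```java
// Illustrative helper; names are hypothetical and not part of PerfBlower.
final class VsoSketch {
    /** VSO = (sum of penalties + live heap size) / live heap size. */
    static double vso(long[] penalties, long liveHeapSize) {
        if (liveHeapSize <= 0) {
            throw new IllegalArgumentException("live heap size must be positive");
        }
        long sum = 0;
        for (long p : penalties) {
            sum += p;
        }
        return (double) (sum + liveHeapSize) / liveHeapSize;
    }

    public static void main(String[] args) {
        // No penalties: the ratio is exactly 1.
        System.out.println(vso(new long[] {0, 0}, 100));
        // Penalties twice the live heap size: the ratio grows to 3.
        System.out.println(vso(new long[] {150, 50}, 100));
    }
}
```

Under this reading, a program without amplified objects has a VSO of 1, and the value grows as penalties accumulate.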

SLIDE 27

An ISL Program Example

1 Context defines the type
2 History of partition instance
3 Heap partitioning
4 Tracked objects
5 The actions on events

Detecting Leaking Object Arrays

    Context TypeContext {
      type = "java.lang.Object[]";
    }
    History UseHistory {
      type = "boolean";
      size = 10;
    }
    Partition AllPartition {
      kind = all;
      history = UseHistory;
    }
    TObject TrackedObject {
      include = TypeContext;
      partition = AllPartition;
      instance boolean useFlag = false;
    }
    Event on_rw(Object o, Field f, Word w1, Word w2) {
      o.useFlag = true;
      deamplify(o);
    }
    Event on_reachedOnce(Object o) {
      UseHistory h = getHistory(o);
      h.update(o.useFlag);
      if (h.isFull() && !h.contains(true))
        amplify(o);
      o.useFlag = false;
    }
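The bookkeeping that this ISL program describes can be emulated in plain Java. This is a minimal sketch under stated assumptions: all names are hypothetical, the history is modeled as a sliding window of the last 10 flags, and amplify is modeled as adding one unit of penalty, since the slides do not say how much penalty is added.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-Java emulation of the detector's bookkeeping; all names are hypothetical.
final class StalenessSketch {
    static final int HISTORY_SIZE = 10; // mirrors "size = 10" in UseHistory

    private final Deque<Boolean> history = new ArrayDeque<>();
    private boolean useFlag = false;
    private int penalty = 0;

    /** Mirrors on_rw: a read or write marks the object as used and resets the penalty. */
    void onReadWrite() {
        useFlag = true;
        penalty = 0; // deamplify
    }

    /** Mirrors on_reachedOnce: record the flag, amplify if the window shows no use. */
    void onReachedOnce() {
        if (history.size() == HISTORY_SIZE) {
            history.removeFirst(); // keep only the most recent entries
        }
        history.addLast(useFlag);
        if (history.size() == HISTORY_SIZE && !history.contains(true)) {
            penalty++; // amplify (assumed to add one unit)
        }
        useFlag = false;
    }

    int penalty() {
        return penalty;
    }
}
```

An object that is reached ten times in a row without being read or written starts to accumulate penalty, and a single access resets it.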

SLIDE 28

PerfBlower

◮ A general performance testing framework
◮ Supports ISL
◮ Can capture problems with small effects
◮ Reports reference paths to problematic objects

SLIDE 32

Heap Reference Path

1 Object leak is referenced by array
2 Knowing allocation site 1 is not enough
3 Key point: array keeps a reference to leak, which can be shown by leak's heap reference path

Leak is referenced by whom?

    Object[] array = new Object[10];
    // Allocation site 1, creating the leak.
    Object leak = new Object();
    // Object leak is referenced by array
    array[0] = leak;
    // Keep using Object leak
    ...
    // ... Never use leak again.
    // However, leak is referenced by array,
    // GC cannot reclaim object leak.
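To make the idea of a heap reference path concrete, here is a toy model that searches a hand-built reference graph for a path from a root to a target object. It is a sketch only: the graph representation and all names are assumptions for illustration, and it does not reproduce how PerfBlower records paths.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy heap model: each object name maps to the names of the objects it references.
final class RefPathSketch {
    /** Returns one shortest reference path from root to target, or an empty list. */
    static List<String> findPath(Map<String, List<String>> refs, String root, String target) {
        Map<String, String> parent = new HashMap<>(); // visited node -> predecessor
        Queue<String> queue = new ArrayDeque<>();
        parent.put(root, null);
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(target)) {
                List<String> path = new ArrayList<>();
                for (String n = node; n != null; n = parent.get(n)) {
                    path.add(n);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : refs.getOrDefault(node, Collections.<String>emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

For the example above, a graph with edges root → array → leak yields the path root, array, leak, which points at the array as the holder of the stale reference.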

SLIDE 33

Mirroring the reference path

[Figure: original objects and their mirror objects]

Mirroring Ref. Path

    Stack stack = new Stack();
    // Allocation site 1, creating the leak.
    Object obj = new Object();
    // stack.elements[0] = obj
    stack.push(obj);
    // Keep using obj
    ...
    // ... Never use obj again.
    // However, obj is referenced by stack,
    // GC cannot reclaim obj.

SLIDE 34

Experiments

Three detectors

◮ Memory leak amplifier
◮ Under-utilized container amplifier
◮ Over-populated container amplifier

DaCapo benchmarks with 500MB heap

SLIDE 35

Memory Leak Amplifier

[Figure: bar chart of VSOs (axis ticks 10 to 60) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, hsqldb, jython, and xalan. Legend: VSOs caused by confirmed memory leaks; basic VSOs. Annotation: programs with confirmed, previously unknown leaks.]

VSO is large → The program is likely to have leaks

SLIDE 36

Under-Utilized Container Amplifier

[Figure: bar chart of VSOs (axis ticks 10 to 60) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, hsqldb, jython, and xalan. Legend: VSOs caused by confirmed under-utilized containers; basic VSOs. Annotation: programs with confirmed, previously unknown UUCs.]

VSO is large → The program is very likely to have UUCs

SLIDE 37

Over-Populated Container Amplifier

[Figure: bar chart of VSOs (axis ticks 5 to 30) for antlr, bloat, eclipse, fop, luindex, lusearch, pmd, xalan, hsqldb, and jython. Legend: VSOs caused by confirmed over-populated containers; basic VSOs. Annotation: programs with confirmed, previously unknown OPCs.]

VSO is large → The program is very likely to have OPCs

SLIDE 38

Performance Improvements

Benchmark     Space Reduction   Time Reduction
xalan-leak    25.4%             14.6%
jython-leak   24.3%             7.4%
hsqldb-leak   15.6%             3.1%
xalan-UUC     5.4%              34.1%
jython-UUC    19.1%             1.1%
hsqldb-UUC    17.4%             0.7%
hsqldb-OPC    14.9%             2.9%

SLIDE 39

The Effectiveness of PerfBlower

VSOs indicate the existence of problems

◮ 8 previously unknown problems were detected
◮ All reports contain useful diagnostic information

Low overhead

◮ Space overheads are 1.23–1.25×
◮ Time overheads are 2.39–2.74×

SLIDE 40

Fixing Performance Problems

Fixing performance problems is hard

◮ Sufficient information is necessary
◮ Developers have to understand the logic of the system
◮ The problem may lie deep inside the system

Memory pressure

◮ A common performance problem in data-parallel systems

SLIDE 41

Lu Fang, Khanh Nguyen, Guoqing Xu, Brian Demsky, Shan Lu. Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs. SOSP'15.

SLIDE 43

Memory Pressure in Data-Parallel Systems

Data-parallel system

◮ Input data are divided into independent partitions
◮ Many popular big data systems are data-parallel

Memory pressure on single nodes

Our study

◮ Searched for "out of memory" and "data parallel" on StackOverflow
◮ Collected 126 related problems

SLIDE 46

Memory Pressure in the Real World

Memory pressure on individual nodes

◮ Executions push the heap limit (using a managed language)
◮ Data-parallel systems struggle for memory

[Figure: memory consumption over execution time, plotted against the heap size, marking the period of long and useless GC and the OutOfMemoryError point]

CRASH: OutOfMemoryError
SLOW: Huge GC effort

SLIDE 49

Root Cause 1: Hot Keys

Key-value pairs

Popular keys have many associated values

Case study (from StackOverflow)

◮ Process StackOverflow posts
◮ Long and popular posts
◮ Many tasks process long and popular posts

SLIDE 51

Root Cause 2: Large Intermediate Results

Temporary data structures

Case study (from StackOverflow)

◮ Use an NLP library to process customers' reviews
◮ Some reviews are quite long
◮ The NLP library creates giant temporary data structures for long reviews

SLIDE 55

Existing Solutions

More memory? Not really!

◮ Data double in size every two years [http://goo.gl/tM92i0]
◮ Memory doubles in size every three years [http://goo.gl/50Rrgk]

Application-level solutions

◮ Configuration tuning
◮ Skew fixing

System-level solutions

◮ Cluster-wide resource managers, such as YARN

We need a systematic and effective solution!

SLIDE 56

Our Solution

Interruptible Task: treat memory pressure as an interrupt

Dynamically change the degree of parallelism

SLIDE 57

Why Does Our Technique Help

[Figure: animation of memory consumption over execution time against the heap size, showing the memory consumed by each task. Captions, in order of appearance:]

◮ Program starts with multiple tasks
◮ Program pushes the heap limit
◮ Long and useless GC
◮ OutOfMemoryError
◮ Long and useless GCs are detected; start interrupting tasks (tasks are killed)
◮ Release the memory; memory pressure is gone
◮ Program executes without memory pressure
◮ If there is enough memory, increase the parallelism degree (a task is newly created)

Memory consumed by a killed task

◮ Local data structures: released
◮ Processed input: released
◮ Unprocessed input: kept in memory, can be serialized
◮ Output, final result: pushed out and released
◮ Output, intermediate result: kept in memory, can be serialized

SLIDE 72

Making Task Interruptible Is Non-trivial

[Figure: an interrupted task and its consumed memory, divided into four parts, each labeled "Released or Kept in Memory?"]

Require Semantics

SLIDE 76

Challenges

How to expose semantics → a programming model

How to interrupt/reactivate tasks → a runtime system

SLIDE 78

The Programming Model

An ITask requires more semantics

◮ Separate processed and unprocessed input
◮ Specify how to serialize and deserialize
◮ Safely interrupt tasks
◮ Specify the actions when an interrupt happens
◮ Merge the intermediate results

A unified representation of input/output

A definition of an interruptible task

SLIDE 81

Representing Input/Output as DataPartitions

◮ How to separate processed and unprocessed input
◮ How to serialize and deserialize the data

1 A cursor points to the first unprocessed tuple
2 Users implement serialize and deserialize methods

DataPartition Abstract Class

    // The DataPartition abstract class
    abstract class DataPartition {
      // Some fields and methods
      ...
      // A cursor points to the first
      // unprocessed tuple
      int cursor;
      // Serialize the DataPartition
      abstract void serialize();
      // Deserialize the DataPartition
      abstract DataPartition deserialize();
    }
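A concrete partition might look roughly like the following. This is a sketch under assumptions, not the ITask library's class: the class name is hypothetical, a byte array stands in for a file on disk, and deserialize restores the partition in place instead of returning a new object as the abstract class above does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory partition of strings.
final class StringPartitionSketch {
    private List<String> tuples; // null while the partition is serialized
    private byte[] spilled;      // stands in for a file on disk
    int cursor = 0;              // index of the first unprocessed tuple

    StringPartitionSketch(List<String> tuples) {
        this.tuples = new ArrayList<>(tuples);
    }

    boolean hasNext() { return cursor < tuples.size(); }

    String next() { return tuples.get(cursor++); }

    /** Writes only the unprocessed tuples, then drops the in-memory copy. */
    void serialize() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(tuples.size() - cursor);
            for (int i = cursor; i < tuples.size(); i++) {
                out.writeUTF(tuples.get(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilled = bytes.toByteArray();
        tuples = null;
    }

    /** Restores the unprocessed tuples; the cursor restarts at 0. */
    void deserialize() {
        List<String> restored = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(spilled))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                restored.add(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        tuples = restored;
        spilled = null;
        cursor = 0;
    }
}
```

Because the cursor separates processed from unprocessed tuples, only the unprocessed suffix needs to be written out when the partition is serialized.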

SLIDE 84

Defining an ITask

◮ What actions should be taken when an interrupt happens
◮ How to safely interrupt a task

1 In interrupt, we define how to deal with partial results
2 Tasks are always interrupted at the beginning of a loop iteration in scaleLoop

ITask Abstract Class

    // The ITask abstract class in the library
    abstract class ITask {
      // Some methods
      ...
      abstract void interrupt();
      boolean scaleLoop(DataPartition dp) {
        // Iterate dp, and process each tuple
        while (dp.hasNext()) {
          // If pressure occurs, interrupt
          if (HasMemoryPressure()) {
            interrupt();
            return false;
          }
          process();
        }
        // All tuples were processed without an interrupt
        return true;
      }
    }
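The interrupt-and-resume behavior of scaleLoop can be tried out with a toy task that sums integers. This is a minimal sketch: the class is hypothetical, the memory-pressure check is injected as a `BooleanSupplier` so that it can be simulated, and a one-element array stands in for the partition's cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for an interruptible task that sums integers.
final class SumTaskSketch {
    private final BooleanSupplier hasMemoryPressure; // simulated pressure check
    private final List<Long> partialResults = new ArrayList<>();
    private long sum = 0;

    SumTaskSketch(BooleanSupplier hasMemoryPressure) {
        this.hasMemoryPressure = hasMemoryPressure;
    }

    /** Mirrors interrupt(): hand off the partial result and reset local state. */
    void interrupt() {
        partialResults.add(sum);
        sum = 0;
    }

    /**
     * Mirrors scaleLoop: checks for pressure before each tuple. Returns false if
     * interrupted; cursor[0] then points to the first unprocessed tuple.
     */
    boolean scaleLoop(int[] input, int[] cursor) {
        while (cursor[0] < input.length) {
            if (hasMemoryPressure.getAsBoolean()) {
                interrupt();
                return false;
            }
            sum += input[cursor[0]++];
        }
        partialResults.add(sum);
        sum = 0;
        return true;
    }

    /** Merges all partial results produced so far. */
    long mergedResult() {
        long total = 0;
        for (long r : partialResults) {
            total += r;
        }
        return total;
    }
}
```

Since the check happens only at the top of an iteration, a tuple is never left half-processed, and calling scaleLoop again with the same cursor continues where the task stopped.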

SLIDE 86

Multiple Input for an ITask

◮ How to merge intermediate results

1 scaleLoop takes a PartitionIterator as input

MITask Abstract Class

    // The MITask abstract class in the library
    abstract class MITask extends ITask {
      // Most parts are the same as ITask
      ...
      // The only difference
      boolean scaleLoop(PartitionIterator<DataPartition> i) {
        // Iterate partitions through iterator
        while (i.hasNext()) {
          DataPartition dp = (DataPartition) i.next();
          // Iterate all the data tuples in this partition
          ...
        }
        return true;
      }
    }
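For a word-count style job, merging intermediate results from several partitions amounts to summing counts per key. The sketch below illustrates only that merge step with plain maps; the class name is hypothetical and it does not use the MITask or PartitionIterator types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merge step for word-count style intermediate results.
final class MergeSketch {
    /** Merges per-partition word counts into one map by summing counts per word. */
    static Map<String, Integer> merge(List<Map<String, Integer>> partitions) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partition : partitions) {
            for (Map.Entry<String, Integer> e : partition.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```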

SLIDE 87

ITask WordCount on Hyracks

[Figure: WordCount dataflow on Hyracks, showing HDFS, Map operators, shuffling, Reduce operators, Merge operators, and final results]

MapOperator

    class MapOperator extends ITask implements HyracksOperator {
      void interrupt() {
        // Push out final
        // results to shuffling
        ...
      }
      // Some other fields and methods
      ...
    }

SLIDE 88

ITask WordCount on Hyracks

ReduceOperator

    class ReduceOperator extends ITask implements HyracksOperator {
      void interrupt() {
        // Tag the results;
        // Output as intermediate
        // results
        ...
      }
      // Some other fields and methods
      ...
    }

SLIDE 89

ITask WordCount on Hyracks

MergeOperator

    class MergeTask extends MITask {
      void interrupt() {
        // Tag the results;
        // Output as intermediate
        // results
      }
      // Some other fields and methods
      ...
    }

SLIDE 90

Challenges

How to expose semantics → a programming model

How to interrupt/activate tasks → a runtime system

SLIDE 94

ITask Runtime System

[Figure: the ITask runtime system, with a monitor, a scheduler, and a partition manager, connected to memory, ITasks, disk, and four data partitions. Edge labels: Grow/Reduce, Check, Interrupt/Create, Serialize/Deserialize, Input/Output, Reduce.]
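A monitor's "long and useless GC" check could be approximated as follows. This is only a sketch: the thresholds and the definition (a collection that takes long but reclaims little of the heap) are assumptions chosen for illustration, not the criteria used by the ITask runtime.

```java
// Hypothetical pressure check; thresholds are illustrative assumptions.
final class GcPressureSketch {
    static final long LONG_GC_MILLIS = 1_000;          // assumed "long" threshold
    static final double USELESS_RECLAIM_RATIO = 0.10;  // assumed "useless" threshold

    /** True if a GC took long and reclaimed less than 10% of the heap. */
    static boolean isLongAndUseless(long gcMillis, long usedBefore, long usedAfter, long heapSize) {
        double reclaimed = (double) (usedBefore - usedAfter) / heapSize;
        return gcMillis >= LONG_GC_MILLIS && reclaimed < USELESS_RECLAIM_RATIO;
    }

    /** Heap bytes currently in use by this JVM, as reported by Runtime. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

A scheduler built on such a check would interrupt tasks when it returns true and create new ones once memory is available again.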

slide-96
SLIDE 96

Evaluation Environments

We have implemented ITask on

◮ Hadoop 2.6.0
◮ Hyracks 0.2.14

An 11-node Amazon EC2 cluster

◮ Each machine: 8 cores, 15 GB memory, 2 × 80 GB SSD

slide-98
SLIDE 98

Experiments on Hadoop

Goal

◮ Show the effectiveness on real-world problems

Benchmarks

◮ Original: five real-world programs collected from Stack Overflow
◮ RFix: apply the fixes recommended on websites
◮ ITask: apply ITask on original programs

Name                               Dataset
Map-Side Aggregation (MSA)         Stack Overflow Full Dump
In-Map Combiner (IMC)              Wikipedia Full Dump
Inverted-Index Building (IIB)      Wikipedia Full Dump
Word Cooccurrence Matrix (WCM)     Wikipedia Full Dump
Customer Review Processing (CRP)   Wikipedia Sample Dump

slide-99
SLIDE 99

Improvements

Benchmark   Original Time    RFix Time   ITask Time   Speed Up
MSA         1047 (crashed)   48          72           -33.3%
IMC         5200 (crashed)   337         238          41.6%
IIB         1322 (crashed)   2568        1210         112.2%
WCM         2643 (crashed)   2151        1287         67.1%
CRP         567 (crashed)    6761        2001         237.9%

◮ With ITask, all programs survive memory pressure
◮ On average, ITask versions are 62.5% faster than RFix
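The slide does not say how the Speed Up column or the average is computed. The column is consistent with RFix time divided by ITask time, minus one, and the geometric mean of those ratios comes out at roughly 1.62, close to the reported average. The sketch below checks that reading; the class and method names are made up for illustration, and the geometric-mean interpretation is an assumption.

```java
// Times copied from the table above as {RFix, ITask} pairs.
class SpeedUpSketch {
    static final double[][] HADOOP_TIMES = {
        {48, 72},      // MSA
        {337, 238},    // IMC
        {2568, 1210},  // IIB
        {2151, 1287},  // WCM
        {6761, 2001},  // CRP
    };

    // Speed-up of ITask over RFix as a fraction (0.416 means 41.6%).
    static double speedUp(double rfix, double itask) {
        return rfix / itask - 1.0;
    }

    // Geometric mean of the RFix/ITask time ratios, computed in log space.
    static double geometricMeanRatio(double[][] times) {
        double logSum = 0.0;
        for (double[] t : times) {
            logSum += Math.log(t[0] / t[1]);
        }
        return Math.exp(logSum / times.length);
    }

    public static void main(String[] args) {
        for (double[] t : HADOOP_TIMES) {
            System.out.printf("%.1f%%%n", 100.0 * speedUp(t[0], t[1]));
        }
        System.out.printf("geometric mean ratio: %.3f%n",
                          geometricMeanRatio(HADOOP_TIMES));
    }
}
```

An arithmetic mean of the five percentages would give a noticeably larger figure, which is why the geometric mean seems the more likely choice here.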

slide-101
SLIDE 101

Experiments on Hyracks

Goal

◮ Show the improvements on performance
◮ Show the improvements on scalability

Benchmarks

◮ Original: five hand-optimized applications from repository
◮ ITask: apply ITask on original programs

Name                  Dataset
WordCount (WC)        Yahoo Web Map and Its Subgraphs
Heap Sort (HS)        Yahoo Web Map and Its Subgraphs
Inverted Index (II)   Yahoo Web Map and Its Subgraphs
Hash Join (HJ)        TPC-H Data
Group By (GR)         TPC-H Data

slide-102
SLIDE 102

Tuning Configurations for Original Programs

Configurations for best performance

Name                  Thread Number   Task Granularity
WordCount (WC)        2               32KB
Heap Sort (HS)        6               32KB
Inverted Index (II)   8               16KB
Hash Join (HJ)        8               32KB
Group By (GR)         6               16KB

Configurations for best scalability

Name                  Thread Number   Task Granularity
WordCount (WC)        1               4KB
Heap Sort (HS)        1               4KB
Inverted Index (II)   1               4KB
Hash Join (HJ)        1               4KB
Group By (GR)         1               4KB

slide-103
SLIDE 103

Improvements on Performance

Normalized Speed Up

Benchmark   Original Best   ITask
WC          1               1.4
HS          1               1.11
II          1               1.28
HJ          1               1.67
GR          1               1.61

On average, ITask is 34.4% faster

slide-104
SLIDE 104

Improvements on Scalability

Normalized Dataset Size

Benchmark   Original Best   ITask
WC          1               5.1
HS          1               2.7
II          1               24
HJ          1               6
GR          1               5

On average, ITask scales to 6.3×+ larger datasets
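The slide does not state how this average is taken. As a hedged check, the geometric mean of the five ITask values on this slide reproduces the 6.3 figure, whereas their arithmetic mean would be about 8.6:

```latex
\[
\left(5.1 \times 2.7 \times 24 \times 6 \times 5\right)^{1/5}
  = 9914.4^{1/5} \approx 6.3
\]
```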

slide-105
SLIDE 105

The Effectiveness of ITask

ITask is practical

◮ It has helped 13 real-world applications survive memory problems

ITask improves performance and scalability

◮ On Hadoop, ITask is on average 62.5% faster than RFix
◮ On Hyracks, ITask is on average 34.4% faster than the best-tuned original
◮ ITask helps programs scale to 6.3× larger datasets

A programming model + a runtime system

◮ Non-intrusive
◮ Easy to use

slide-106
SLIDE 106

Conclusions

First general technique to amplify problems

◮ A class of performance problems
◮ Reveals potential problems during testing

A general performance testing framework

◮ Includes a compiler and a runtime system
◮ Very practical

First systematic approach to address memory pressure

◮ Consists of a programming model and a runtime system
◮ Solves real-world problems
◮ Significantly improves data-parallel tasks’ performance and scalability

slide-107
SLIDE 107

Future Work

Extend ISL

Add support to production JVMs

Consider more factors to improve the test oracle

Instantiate ITask in more data-parallel systems

slide-108
SLIDE 108

Publications

◮ K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, O. Mutlu. Yak: A High-Performance Big-Data-Friendly Garbage Collector. OSDI’16
◮ Z. Zuo, L. Fang, S. Khoo, G. Xu, S. Lu. Low-Overhead and Fully Automated Statistical Debugging with Abstraction Refinement. OOPSLA’16
◮ K. Nguyen, L. Fang, G. Xu, B. Demsky. Speculative Region-based Memory Management for Big Data Systems. PLOS’15
◮ L. Fang, K. Nguyen, G. Xu, B. Demsky, S. Lu. Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs. SOSP’15
◮ L. Fang, L. Dou, G. Xu. PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification. ECOOP’15
◮ K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, G. Xu. Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. ASPLOS’15

slide-109
SLIDE 109

Thank You

Q & A
