

SLIDE 1

Intro Software streams Case studies Conclusions

Software Streams Big Data Challenges in Dynamic Program Analysis

Irene Finocchi

  • Dept. Computer Science – Sapienza U. Rome

1 / 41 Irene Finocchi CiE 2013 special session on data streams and compression


SLIDE 4

Theory versus practice

Theory is when you know something, but it doesn’t work. Practice is when something works, but you don’t know why. Programmers combine theory and practice: nothing works, and they don’t know why.

(Anonymous)


SLIDE 6

Topic of the talk

Algorithm engineering talk: boosting practice with theory
  • Theory: data stream algorithmics
  • Application area: dynamic program analysis


SLIDE 8

Program analysis

Development of techniques and tools for analyzing the structure and the behavior of a software system

Goals:
  • conclude properties about the program: e.g., correctness, resource consumption
  • seek opportunities for optimization
  • error detection and correction: e.g., type checking, memory safety, data structure repair, protection against security attacks
  • study how the program or its parts are used: e.g., usage patterns, intrusion detection
  • program understanding


SLIDE 10

Static vs. dynamic analysis

Static analysis: based on knowledge of code (source, object, ...)
  • Examples: compilers, formal verification systems, theoretical analysis of algorithms

Dynamic analysis: exploits information gathered at runtime
  • Examples: debuggers, memory checkers, performance profilers, platforms for the experimental evaluation of algorithms

SLIDE 11

Program analysis in algorithm engineering

SLIDE 12

Soundness vs. accuracy

Static analysis has been hugely successful in software design, but the dynamic nature of modern computing scenarios makes it increasingly inaccurate.

SLIDE 13

Program analysis community

Many disciplines involved: programming languages, SE, architectures, algorithms, statistics...

SLIDE 14

This talk: algorithmics for dynamic program analysis

Events of interest:
  • routine calls
  • memory accesses
  • low-level instructions
  • system calls
  • cache misses
  • interrupts
  • ...


SLIDE 18

What’s difficult?

  • Capturing events: hardware support (counters, watchpoints), programmable interrupts/signals, program instrumentation (source code or binary code)
  • Intrusiveness: Heisenberg effects (the act of observing a system causes the system to change)
  • Performance: analysis is inlined with program execution and slows down analyzed programs; real-time performance means billions of events per second
  • Massive data: dynamic analysis tools process huge amounts of data and cannot store all of it

SLIDE 19

Efficient algorithms can make a difference

Automated dynamic analysis is less explored than static analysis from an algorithmic perspective...

SLIDE 20

Software streams

SLIDE 21

An example: performance profiling

Form of dynamic program analysis that typically measures:
  • execution time of instructions, basic blocks, routines
  • frequency of portions of code

Our goal: identify routines that contribute most to the running time (hot routines). Mainly useful for performance optimization.


SLIDE 24

Profiler characteristics

Granularity
  • Basic blocks
  • Routines

Metrics
  • Time
  • Number of routine calls
  • Cache misses, I/Os, ...

Data aggregation level
  • Vertex: how many times is routine f called?
  • Edge: how many times is f called from g?
  • Calling context: how many times is f called along path main → g → h → f?


SLIDE 26

Vertex vs. calling context profiling

Vertex profiling:
  • Stream Σ = ⟨main, g, h, f, ...⟩ = ⟨f1, f2, ..., fn⟩
  • Item universe: fi ∈ {routines}
  • Query: find most frequently called routines

Calling context profiling:
  • Stream Σ = ⟨main, main → h, main → g → h, ...⟩ = ⟨π1, π2, ..., πn⟩
  • Item universe: πi ∈ ∪_{j≥1} {routines}^j
  • Query: find most frequent calling contexts

15 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
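The two streams above can be derived from a single trace of call/return events by maintaining the current call stack. A minimal sketch (the trace format and function name are illustrative, not from the talk):

```python
# Derive the vertex stream and the calling-context stream from one
# trace of ("call", routine) / ("ret", None) events.

def software_streams(trace):
    stack = []
    vertex_stream, context_stream = [], []
    for event, routine in trace:
        if event == "call":
            stack.append(routine)
            vertex_stream.append(routine)        # item f_i in {routines}
            context_stream.append(tuple(stack))  # item pi_i: path main -> ... -> f_i
        else:  # "ret"
            stack.pop()
    return vertex_stream, context_stream

trace = [("call", "main"), ("call", "g"), ("call", "h"),
         ("ret", None), ("ret", None), ("call", "h"), ("ret", None)]
vs, cs = software_streams(trace)
# vs == ["main", "g", "h", "h"]
# cs == [("main",), ("main", "g"), ("main", "g", "h"), ("main", "h")]
```

Note that the two calls to h produce the same vertex item but two distinct calling-context items, which is exactly why the context universe is unbounded.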


SLIDE 28

Conventional approaches

Keep complete profiling info about vertices or paths.

Vertex profiling: hash table
  • Space required = Θ(number of distinct routines in Σ)

Calling context profiling: calling context tree (CCT)
  • Space required = Θ(number of CCT nodes) = Θ(number of distinct call paths in Σ)

16 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
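A calling context tree is essentially a trie over call paths, built online. A minimal sketch (node layout and trace format are illustrative):

```python
# A minimal calling context tree (CCT): one node per distinct call path,
# with a counter for how many times that exact context occurred.

class CCTNode:
    __slots__ = ("routine", "count", "children")
    def __init__(self, routine):
        self.routine = routine
        self.count = 0
        self.children = {}          # routine name -> CCTNode

def build_cct(trace):
    root = CCTNode("<root>")
    cursor = [root]                 # path from the root to the current context
    for event, routine in trace:
        if event == "call":
            node = cursor[-1].children.setdefault(routine, CCTNode(routine))
            node.count += 1
            cursor.append(node)
        else:                       # "ret"
            cursor.pop()
    return root

trace = [("call", "main"), ("call", "h"), ("ret", None),
         ("call", "g"), ("call", "h"), ("ret", None), ("ret", None), ("ret", None)]
cct = build_cct(trace)
# main -> h and main -> g -> h are distinct CCT nodes, each with count 1
```

The space cost stated above is visible here: the tree grows with the number of distinct call paths, not with the number of distinct routines.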

SLIDE 29

How much space?

Application   |Call graph|   |Call sites|        |CCT|            |Σ|
amarok             13 754       113 362      13 794 470     991 112 563
audacity            6 895        79 656      13 131 115     924 534 168
bluefish            5 211        64 239       7 274 132     248 162 281
dolphin            10 744        84 152      11 667 974     390 134 028
firefox             6 756       145 883      30 294 063     625 133 218
gedit               5 063        57 774       4 183 946     407 906 721
gimp                5 146        93 372      26 107 261     805 947 134
sudoku              5 340        49 885       2 794 177     325 944 813
inkscape            6 454        89 590      13 896 175     675 915 815
oocalc             30 807       394 913      48 310 585     551 472 065
pidgin              7 195        80 028      10 743 073     404 787 763
quanta             13 263       113 850      27 426 654     602 409 403

Runs of a few minutes of real applications produce Gigabytes of information. Storing the CCT requires hundreds of Megabytes.

SLIDE 30

Skewness

SLIDE 31

Pareto principle (80-20 rule)

[Figure: cumulative frequency distributions — cumulative frequency relative to the total number of calls vs. % of hot contexts sorted by rank, for audacity, audacity (startup only), bzip2, gimp, gnome-dictionary, inkscape]

SLIDE 32

Patterns

Execution traces typically contain:
  • several event repetitions, either contiguous or not
  • a very large number of patterns
  • patterns that can each have thousands of occurrences

Data mining, pattern detection, and compression techniques are very useful to understand the characteristics of execution traces.

SLIDE 33

Case studies

SLIDE 34

Mining hot calling contexts space-efficiently

Keep information about hot contexts only: discard on the fly info about contexts with low frequency.

[D’Elia, Demetrescu & F., PLDI 2011]

SLIDE 35

Hot calling context tree

The CCT unfolds during program execution. How do we prune it on-line (to get the HCCT)?

SLIDE 36

The Britney Spears problem...

SLIDE 37

... tracking who’s hot and who’s not

“... can’t just pay attention to a few popular subjects, because you can’t know in advance which ones are going to rank near the top. To be certain of catching every new trend as it unfolds, you have to monitor all the incoming queries – and their variety is unbounded.”


SLIDE 42

Heavy hitters

Given a stream of n items, find those that appear “most frequently” (e.g., items occurring more than 1% of the time). Formally “hard” in small space, so allow approximation:
  • No false negatives: return all items with count ≥ ϕn
  • “Good” false positives: no item with count < (ϕ − ε)n is returned (error ε ∈ (0, 1), ε ≪ ϕ)

Related problem: estimate each frequency with error ±εn


SLIDE 45

A well-studied problem

Core streaming problem: connections with entropy estimation, itemset mining, compressed sensing. Extensive research: scores of streaming papers on frequent items and its variations.

Two approaches:

1. Sketch-based
  • Maintain a sketch of the whole data set
  • Estimate frequency of both frequent and non-frequent items

2. Counter-based
  • Maintain estimated counters of frequent items only
  • Work very well on skewed input distributions

SLIDE 46

(Some) counter-based algorithms

1. Sticky sampling [Gibbons & Matias, SIGMOD 1998 – Manku & Motwani, VLDB 2002]
  • probabilistic, sampling-based approach
  • correct with probability ≥ 1 − δ, with δ ∈ (0, 1) a user-specified probability of failure
  • space O((1/ε) · log(1/(ϕδ)))

2. Lossy counting [Manku & Motwani, VLDB 2002]
  • deterministic
  • space O((1/ε) · log(εn))

3. Space saving [Metwally, Agrawal & El Abbadi, ACM TODS 2006]
  • deterministic
  • space O(1/ε) (provably optimal)

28 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
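Space Saving fits in a few lines. A sketch with k = ⌈1/ε⌉ counters (variable names are illustrative): each monitored item’s counter overestimates its true frequency by at most n/k ≤ εn, which gives exactly the no-false-negative / good-false-positive guarantee above.

```python
import math

def space_saving(stream, eps):
    """Space Saving: maintain k = ceil(1/eps) counters over the stream."""
    k = math.ceil(1 / eps)
    counters = {}                       # item -> estimated count
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Evict the item with the minimum counter; the newcomer
            # inherits (and increments) that count.
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters

stream = ["f"] * 60 + ["g"] * 25 + ["h"] * 10 + ["a", "b", "c", "d", "e"]
est = space_saving(stream, eps=0.1)    # k = 10 counters, n = 100
# Every item with true count >= eps*n = 10 ("f", "g", "h") is monitored,
# and its estimate is within eps*n of the true count.
```

The min-search here is linear for clarity; the original paper’s “stream summary” structure makes eviction O(1), which matters at the event rates quoted earlier.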


SLIDE 49

Back to hot calling contexts

Maintain a set M of monitored calling contexts. Upon query, return a subset A ⊆ M: A = {(ϕ, ε)-heavy hitters}
  • All true hot contexts are returned: H ⊆ A (no false negatives)
  • False positives are “good”

[Figure: nested sets — CCT (all contexts) ⊇ monitored M ⊇ (ε,ϕ)-heavy hitters A ⊇ true hot H, with A \ H the false positives; the HCCT and the (ε,ϕ)-HCCT are the corresponding trees]

The CCT induces a tree structure over sets H, A, M.


SLIDE 51

(ϕ, ε)-hot calling context tree

[Figure: example trees (a)–(c) with per-node counts, false positives marked]

(a) CCT: entire calling context tree
(b) HCCT: hot calling context tree — hot nodes plus cold internal nodes
(c) (ϕ, ε)-HCCT: (ϕ, ε)-hot calling context tree — hot nodes, cold internal nodes, and “almost hot” leaves (false positives)

33 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
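The HCCT in (b) keeps every hot node plus any cold ancestor needed to connect a hot descendant to the root. An offline pruning sketch (the node layout is illustrative, not the paper’s data structure):

```python
# Offline HCCT pruning: keep nodes whose count meets the threshold,
# plus cold internal nodes that have a hot descendant.

class Node:
    def __init__(self, routine, count=0):
        self.routine, self.count, self.children = routine, count, {}

def prune_hcct(node, threshold):
    """Return a pruned copy of the subtree, or None if it contains no hot node."""
    kept = {}
    for name, child in node.children.items():
        p = prune_hcct(child, threshold)
        if p is not None:
            kept[name] = p
    if node.count >= threshold or kept:            # hot, or cold internal node
        copy = Node(node.routine, node.count)
        copy.children = kept
        return copy
    return None

# Example: main(1) -> g(100); main -> h(2) -> f(50); threshold 40
root = Node("main", 1)
root.children["g"] = Node("g", 100)
root.children["h"] = Node("h", 2)
root.children["h"].children["f"] = Node("f", 50)
hcct = prune_hcct(root, 40)
# g and f are kept as hot nodes; h survives only as a cold internal node
```

The on-line algorithm of the PLDI 2011 paper cannot afford a post-hoc pass like this; it combines the tree with a counter-based heavy-hitter algorithm, which is where the (ϕ, ε)-HCCT and its “almost hot” leaves come from.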

SLIDE 52

How many false positives?

Lossy Counting (left bars) versus Space Saving (right bars)

[Figure: classification of (φ,ε)-HCCT nodes — cold nodes / hot nodes / false positives (%) across benchmarks (amarok, ark, audacity, bluefish, dolphin, firefox, gedit, ghex2, gimp, sudoku, gwenview, inkscape, oocalc, ooimpress, oowriter, pidgin, quanta, vlc)]


SLIDE 56

Parameter tuning

Rule of thumb: ε = ϕ/10 [Cormode and Hadjieleftheriou, PVLDB 2008]

ε = ϕ/5 works great here due to distribution skewness. What about ϕ?

Benchmark   HCCT nodes (φ = 10⁻³)   HCCT nodes (φ = 10⁻⁵)   HCCT nodes (φ = 10⁻⁷)
audacity             112                  9 181                  233 362
dolphin               97                 14 563                  978 544
gimp                  96                 15 330                  963 708
inkscape              80                 16 713                  830 191
oocalc               136                 13 414                1 339 752
quanta                94                 13 881                  812 098

SLIDE 57

Space analysis

Space Saving (LSS) vs. Lossy Counting (LC): ϕ = 10⁻⁴, ε = ϕ/5

SLIDE 58

Counter accuracy

Space Saving (LSS) vs. Lossy Counting (LC): ϕ = 10⁻⁴, ε = ϕ/5

[Figure: avg/max counter error among hot elements (% of the true frequency) across the benchmarks — LSS avg error, LC avg error, LSS max error, LC max error]

SLIDE 59

Other applications


SLIDE 62

Range adaptive profiling

Assume, e.g., that we want to profile lines of code: if 90% of the time is spent on the top half of the code, fine-grained profile data on the bottom half would not be very useful.

Output profile data in a hierarchical fashion, grouping data into ranges [Mysore et al., CGO 2006]:
  • most frequent ranges broken down into subranges
  • least frequent events kept as larger ranges

Adaptive spatial partitioning (ranges and their counters stored in a tree) [Hershberger et al., Algorithmica 2006]:
  • when a range gets sufficiently hot, the corresponding tree node is split into subranges
  • ranges that get colder are merged together, pruning the tree

39 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
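A toy sketch of the split-on-hot idea (not the exact algorithms of the cited papers; the class, thresholds, and line numbers are all illustrative): counts over line-number ranges live in a binary tree, and a range that gets hot is subdivided so it is profiled at finer granularity.

```python
# Counts over half-open line-number ranges [lo, hi); a hot leaf splits.

class RangeNode:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.left = self.right = None

    def record(self, line, split_threshold):
        self.count += 1
        if self.left is not None:                  # already split: descend
            child = self.left if line < self.left.hi else self.right
            child.record(line, split_threshold)
        elif self.count >= split_threshold and self.hi - self.lo > 1:
            mid = (self.lo + self.hi) // 2         # hot leaf: subdivide it
            self.left, self.right = RangeNode(self.lo, mid), RangeNode(mid, self.hi)

root = RangeNode(0, 1024)
for line in [10] * 100 + [900] * 3:    # a hot spot near line 10, little elsewhere
    root.record(line, split_threshold=50)
# The subtree around line 10 keeps splitting, while the range
# containing line 900 stays coarse.
```

This captures the first bullet of each cited scheme; merging cooling ranges back together (the pruning step of adaptive spatial partitioning) is omitted for brevity.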


SLIDE 66

Uniform sampling over software streams

Sampling is very popular to reduce the size of execution traces and the runtime overhead of dynamic analysis tools.

Main approach in state-of-the-art profilers: fixed rate sampling (e.g., take one item every 10 ms)
  • Pros: easy to implement
  • Cons: produces biased samples when the original trace exhibits regular patterns

To avoid bias and get representative samples, we need a uniform sampling probability [Mytkowicz et al., PLDI 2010]. Randomized approaches (e.g., reservoir sampling [Vitter, ACM TMS 1985]) address these issues [Coppa et al., 2013].

40 / 41 Irene Finocchi CiE 2013 special session on data streams and compression
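Reservoir sampling (Vitter’s Algorithm R) keeps a uniform random sample of k items from a stream of unknown length in one pass and O(k) space, which is exactly what a software stream requires:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: after processing n items, each item is in the
    reservoir with probability k/n."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i enters with probability k/(i+1), evicting a
            # uniformly chosen reservoir slot.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1_000_000), k=10)
```

Unlike fixed-rate sampling, the inclusion probability does not depend on an item’s position, so periodic patterns in the trace cannot bias the sample.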

SLIDE 67

Conclusions

Dynamic program analysis:
  • its data-intensive nature makes it a great source of algorithmic problems
  • a lot of fun with algorithms, systems, and architectures
  • automated analysis provides valuable tools in algorithm engineering

Challenges: analysis of programs on multi-core platforms, big data applications, and resource-constrained systems.

41 / 41 Irene Finocchi CiE 2013 special session on data streams and compression