Scalability-First Pointer Analysis with Self-Tuning Context Sensitivity



slide-1
SLIDE 1

Scalability-First Pointer Analysis with Self-Tuning Context Sensitivity

Yue Li, Tian Tan, Anders Møller and Yannis Smaragdakis

1

slide-2
SLIDE 2

Pointer Analysis

  • Concept

Which objects may a variable point to?

  • Importance

Fundamental for virtually all static analyses, e.g., call graphs, alias, etc.

Useful for many software engineering tasks, e.g., bug detection, security analysis, program understanding, etc.

2

slide-3
SLIDE 3

Problem: Unpredictable Scalability

3

slide-4
SLIDE 4

Problem: Unpredictable Scalability

  • Precise pointer analysis is hard to scale
  • Context Sensitivity (CS): precise but slow
  • Context Insensitivity (CI): imprecise but fast

4

slide-5
SLIDE 5

Problem: Unpredictable Scalability

  • Precise pointer analysis is hard to scale
  • Context Sensitivity (CS): precise but slow
  • Context Insensitivity (CI): imprecise but fast
  • Variants of Context Sensitivity
  • Object Sensitivity (obj)
  • Type Sensitivity (type)

2obj → 2type → 1type → CI (less precise, faster)

5

slide-6
SLIDE 6

Problem: Unpredictable Scalability

[Chart: analysis time in seconds per program under 2obj, 2type, and CI; axis from 0 to 10000, with runs over the budget marked as timeout (>10800 seconds).]

6

slide-7
SLIDE 7

Problem: Unpredictable Scalability

  • Scenario

7

slide-8
SLIDE 8

Problem: Unpredictable Scalability

  • Scenario

as a part of a large-scale security analysis

8

slide-10
SLIDE 10

Problem: Unpredictable Scalability

  • Scenario

as a part of a large-scale security analysis

Precise 2obj: unscalable for many? (shown as X marks in the figure)

10

slide-11
SLIDE 11

Problem: Unpredictable Scalability

  • Scenario

as a part of a large-scale security analysis

Precise 2obj: unscalable for many?

Scalable CI: imprecise for all?

11

slide-12
SLIDE 12

Problem: Unpredictable Scalability

  • Scenario

as a part of a large-scale security analysis

Precise 2obj: unscalable for many?

Scalable CI: imprecise for all?

  • Iterate until reaching the most precise variant that scales:

2obj → 2type → 1type → CI

  • Sleepless nights and still not great precision!
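The manual workflow on this slide can be sketched as a fallback loop. This is only an illustration: `run` is a hypothetical callback (not part of any real tool) that runs one analysis variant and reports whether it finished within the time budget.

```python
from typing import Callable, Optional

# Variants ordered from most precise (slowest) to least precise (fastest).
VARIANTS = ["2obj", "2type", "1type", "CI"]


def most_precise_that_scales(run: Callable[[str, float], bool],
                             budget_seconds: float) -> Optional[str]:
    """Try each variant in order; return the first one that finishes in budget."""
    for variant in VARIANTS:
        if run(variant, budget_seconds):
            return variant
    # None of the variants finished within the budget.
    return None
```

In the worst case this loop spends the full budget once for every variant that fails before any result is available.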

slide-13
SLIDE 13

Scaler

Good Scalability & High Precision regardless of the program being analyzed

13

slide-14
SLIDE 14

Scaler

Scalability: as good as CI

Precision: comparable to or better than the best scalable CS

[Chart: analysis time in seconds per program under 2obj, 2type, and Scaler; axis from 0 to 10000, with runs over the budget marked as timeout (>10800 seconds).]

Good Scalability & High Precision regardless of the program being analyzed

14

slide-15
SLIDE 15

Idea

Scaler

15

slide-16
SLIDE 16

  • Key Concept

$\#ctx_m^c$: number of contexts for method m under CS c

$\#pts_m$: number of points-to facts for method m

$\#ctx_m^c \cdot \#pts_m$: number of worst-case CS points-to facts for method m
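A minimal Python sketch of this product, assuming both counts are already available as integers. The function name is hypothetical and is not taken from Scaler's implementation.

```python
def worst_case_cs_facts(num_ctx: int, num_pts: int) -> int:
    """Worst-case number of context-sensitive points-to facts for one method.

    num_ctx: number of contexts of method m under CS variant c (#ctx_m^c)
    num_pts: number of points-to facts of method m (#pts_m)
    """
    if num_ctx < 0 or num_pts < 0:
        raise ValueError("counts must be non-negative")
    # In the worst case every points-to fact is replicated in every context.
    return num_ctx * num_pts
```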

16

slide-17
SLIDE 17
Insight

  • Too many CS points-to facts generated for certain methods

17

slide-18
SLIDE 18

Insight

  • Too many CS points-to facts generated for certain methods
  • $\#ctx_m^c \cdot \#pts_m > ST$ (Scalability Threshold): m is a scalability-critical method under CS c (c is expensive)

slide-19
SLIDE 19

Insight

  • Too many CS points-to facts generated for certain methods
  • Identify scalability-critical method m
  • $\#ctx_m^c \cdot \#pts_m > ST$ (Scalability Threshold): m is a scalability-critical method under CS c (c is expensive)
  • $\#ctx_m^{c'} \cdot \#pts_m \le ST$ (Scalability Threshold): choose cheap c'

slide-20
SLIDE 20

Insight

  • Too many CS points-to facts generated for certain methods
  • Identify scalability-critical method m
  • $\#ctx_m^c \cdot \#pts_m > ST$ (Scalability Threshold): m is a scalability-critical method under CS c (c is expensive)
  • $\#ctx_m^{c'} \cdot \#pts_m \le ST$ (Scalability Threshold): choose cheap c'

How to identify scalability-critical methods?

How to estimate $\#ctx_m^c \cdot \#pts_m$?

slide-21
SLIDE 21

How to estimate $\#ctx_m^c \cdot \#pts_m$?

Pre-analysis: points-to results of CI

  • $\#pts_m$: obtained directly
  • $\#ctx_m^c$: obtained by leveraging Object Allocation Graph* (based on CI)

Context estimation problem → Graph traversal problem

* Making k-object-sensitive pointer analysis more precise with still k-limiting. Tan et al. SAS 2016
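To illustrate how context estimation can become a graph traversal, here is a simplified sketch for k-object-sensitivity. It assumes the Object Allocation Graph is given as a map from each object to the objects that allocate it, and it models a context as a backward path of at most k objects ending at a receiver object of the method. The encoding and the function name are assumptions for illustration and do not reproduce the exact estimation used by Scaler.

```python
from functools import lru_cache
from typing import Dict, List, Set


def count_obj_contexts(oag_preds: Dict[str, List[str]],
                       receivers: Set[str], k: int) -> int:
    """Estimate #ctx for one method under k-object-sensitivity.

    oag_preds maps an object to its allocator objects (OAG predecessors).
    receivers is the set of receiver objects of the method (from CI results).
    """
    @lru_cache(maxsize=None)
    def paths(obj: str, depth: int) -> int:
        # Number of backward paths with at most `depth` objects ending at obj.
        preds = oag_preds.get(obj, [])
        if depth == 1 or not preds:
            return 1  # path is cut by k-limiting or has reached a root object
        return sum(paths(p, depth - 1) for p in preds)

    return sum(paths(o, k) for o in receivers)
```

Because the recursion depth is bounded by k, the traversal terminates even when the graph contains cycles.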

21

slide-22
SLIDE 22

Example

Scaler

22

slide-23
SLIDE 23

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj; axis labels 1, 10 000, 100 000, 1 000 000.]

23

slide-24
SLIDE 24

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, with the threshold $ST_p$ marked; axis labels 1, 1000, 10 000, 100 000, 1 000 000.]

ST: Scalability Threshold

24

slide-26
SLIDE 26

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, with the threshold $ST_p$ marked and a "?" annotation; axis labels 1, 1000, 10 000, 100 000, 1 000 000.]

ST: Scalability Threshold

slide-27
SLIDE 27

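[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, with the threshold $ST_p$ marked, the candidate variants c = 2type and c = 1type, and a "?" annotation; axis labels 1, 1000, 10 000, 100 000, 1 000 000.]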

27

slide-28
SLIDE 28

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method with the threshold $ST_p$ marked; labels c = 2obj, c = 2type, and c = 1type; axis labels 1, 1000, 10 000, 100 000, 1 000 000.]

28

slide-29
SLIDE 29

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

29

slide-30
SLIDE 30

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

For any scalability-critical method, use the most precise CS variant that can turn it to a non-scalability-critical method
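The selection rule above can be sketched in a few lines of Python. The variant list follows the slides; the function name, the dictionary layout, and the fallback to CI when no variant fits under the threshold are assumptions of this sketch.

```python
from typing import Dict

# CS variants ordered from most to least precise, as in the slides.
VARIANTS = ["2obj", "2type", "1type"]


def select_variant(estimates: Dict[str, int], st: int) -> str:
    """Pick the most precise variant whose estimated #ctx * #pts is <= st.

    estimates maps a variant name to the estimated worst-case number of
    CS points-to facts for one method under that variant.
    """
    for variant in VARIANTS:
        if estimates[variant] <= st:
            return variant
    # Assumption: if even the cheapest CS variant is too expensive, use CI.
    return "CI"
```

For example, with hypothetical estimates of 1 000 000 (2obj), 4000 (2type), and 2000 (1type) and a threshold of 10 000, the method is scalability-critical under 2obj and would be analyzed with 2type.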

30

slide-32
SLIDE 32

Total Scalability Threshold (TST)

To automatically choose an appropriate $ST_p$ for each program p

slide-33
SLIDE 33

Total Scalability Threshold (TST)

To automatically choose an appropriate $ST_p$ for each program p

  • TST is memory size related
  • TST indicates analysis capacity

How many points-to facts can the memory hold?

33

slide-34
SLIDE 34

Total Scalability Threshold (TST)

To automatically choose an appropriate $ST_p$ for each program p

  • TST is memory size related
  • TST indicates analysis capacity

How many points-to facts can the memory hold?

[Figure: Memory → TST, related to the total for each program.]

Program A: $\sum_{m} \#ctx_m^c \cdot \#pts_m$

Program B: $\sum_{m} \#ctx_m^c \cdot \#pts_m$

slide-35
SLIDE 35

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

35

slide-36
SLIDE 36

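[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked and areas labeled $A_1$, $A_2$, $A_3$; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

$E(ST_p) = A_1 + A_2 + A_3 \le TST$, where $E(ST_p) = \sum_{m \in P} \#ctx_m^c \cdot \#pts_m$ for program P

$ST_p$ is automatically computed based on the above inequality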

36

slide-37
SLIDE 37

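[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked and areas labeled $A_1$, $A_2$, $A_3$; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

$E(ST_p) = A_1 + A_2 + A_3 \le TST$, where $E(ST_p) = \sum_{m \in P} \#ctx_m^c \cdot \#pts_m$ for program P

$ST_p$ is automatically computed based on the above inequality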

37

slide-38
SLIDE 38

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked and areas labeled $A_1$, $A_2$, $A_3$; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

$E(ST_p) = A_1 + A_2 + A_3 \le TST$

$ST_p$ is automatically computed based on the above inequality

$ST_p$ is the max value satisfying this inequality
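One way to picture the self-tuning step is a scan over candidate thresholds, keeping the largest one whose total estimate stays within TST. The sketch below is an assumption-laden illustration: the data layout, the function names, the choice of candidate values, and the CI fallback cost are all hypothetical and not taken from the Scaler tool.

```python
from typing import Dict, List

VARIANTS = ["2obj", "2type", "1type"]  # most precise first


def selected_cost(est: Dict[str, int], pts: int, st: int) -> int:
    """Estimated facts for one method after choosing its variant under st."""
    for variant in VARIANTS:
        if est[variant] <= st:
            return est[variant]
    return pts  # assumed CI fallback: one context, so #ctx * #pts = #pts


def choose_st(methods: List[Dict], tst: int) -> int:
    """Largest candidate ST whose total estimate E(ST) is at most tst.

    Each method is a dict with "est" (variant -> estimated #ctx * #pts)
    and "pts" (#pts from the CI pre-analysis).
    """
    # E(ST) only changes at the estimate values, so those are the candidates.
    candidates = sorted({e for m in methods for e in m["est"].values()},
                        reverse=True)
    for st in candidates:
        total = sum(selected_cost(m["est"], m["pts"], st) for m in methods)
        if total <= tst:
            return st
    return 0  # no candidate fits: every method falls back to CI
```

A larger TST admits a larger threshold, so more methods keep a precise variant, which matches the precision and efficiency trade-off described in the settings.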

38

slide-39
SLIDE 39

Scalability-First Pointer Analysis with Self-Tuning Context Sensitivity

Scaler

CS variants are self-tuned by $ST_p$; $ST_p$ depends on TST

[Figure: estimated $\#ctx_m^c \cdot \#pts_m$ for each method under c = 2obj, c = 2type, and c = 1type, with the threshold $ST_p$ marked and areas labeled $A_1$, $A_2$, $A_3$; axis labels 1, 2000, 4000, 10 000, 100 000, 1 000 000.]

$E(ST_p) = A_1 + A_2 + A_3 \le TST$

$ST_p$ is automatically computed based on the above inequality

39

slide-40
SLIDE 40

Results

Scaler

40

slide-41
SLIDE 41

Luindex Chart

10 Popular Java Programs

41

slide-42
SLIDE 42
Settings

  • TST = 30M (48G Memory)
  • 20M, 40M, 60M, etc. are all ok
  • Larger TST means better precision but worse efficiency
  • Time budget = 3 hours (per program)

Results

Scalability: as good as CI

Precision: comparable to or better than the best scalable CS

[Chart: analysis time in seconds per program under 2obj, 2type, and Scaler; axis from 0 to 10000, with runs over the budget marked as timeout (>10800 seconds).]

Scaler

42

slide-43
SLIDE 43

Settings

  • TST = 30M (48G Memory)
  • 20M, 40M, 60M, etc. are all ok
  • Larger TST means better precision but worse efficiency
  • Time budget = 3 hours (per program)

Results

Scaler

Scalability: as good as CI

Precision: comparable to or better than the best scalable CS

Programs highlighted: complex program, medium-complexity program, simple program (Luindex)

43

slide-44
SLIDE 44

Complex program (3h = 10800s). The last four columns are precision metrics. In all cases, lower is better.

Analysis               Time (seconds)     #may-fail casts   #poly calls   #reachable methods   #call graph edges
CI                     112                2234              2778          12718                114856
2obj → 2type → 1type   >3h + >3h + 1997   2117              2577          12430                111834
Scaler                 452                1852              2500          12167                107410

44

slide-45
SLIDE 45

Medium-complexity program (3h = 10800s). The last four columns are precision metrics. In all cases, lower is better.

Analysis               Time (seconds)   #may-fail casts   #poly calls   #reachable methods   #call graph edges
CI                     49               2508              2925          13036                77370
2obj → 2type → 1type   2458             1409              2182          12657                65836
Scaler                 272              1452              2195          12676                66177

45

slide-46
SLIDE 46

Simple program: Luindex (3h = 10800s). The last four columns are precision metrics. In all cases, lower is better.

Analysis               Time (seconds)   #may-fail casts   #poly calls   #reachable methods   #call graph edges
CI                     22               734               940           6670                 33130
2obj → 2type → 1type   53               297               675           6256                 29021
Scaler                 53               297               675           6256                 29021

46

slide-47
SLIDE 47

[The three result tables for the simple (Luindex), medium-complexity, and complex programs shown together; the values are the same as on slides 44 to 46.]

47

slide-48
SLIDE 48

[The three result tables for the simple (Luindex), medium-complexity, and complex programs shown together; the values are the same as on slides 44 to 46.]

Scaler: Good Scalability & High Precision regardless of the program being analyzed

48

slide-49
SLIDE 49

Want to have a good night’s sleep?

49

slide-50
SLIDE 50

Good Scalability & High Precision

Scaler

Want to have a good night’s sleep?

50

slide-51
SLIDE 51

Conclusion

Scaler

51

slide-52
SLIDE 52
  • Extremely good scalability + high precision (one shot)
  • Directly benefits many software engineering tasks/tools
  • We expect the underlying idea to help scale other static analyses

Scaler

  • Artifact successfully evaluated
  • Open Source Tool available

http://www.brics.dk/scaler

  • Predict #CS-points-to facts using fast pre-analysis
  • Memory-related scalability (TST)
  • Method-level CS configurations (ST)

slide-53
SLIDE 53

Luindex

[Figure: choosing $ST_p$ for Luindex under the constraint $E(ST_p) \le TST$ (30M); values labeled in the figure: 16607, 78699, 35080733.]

53

slide-54
SLIDE 54

[Chart: analysis time and precision (#may-fail casts) for TST values 20M, 30M, 60M, 80M, and 150M, with memory sizes 12GB, 48GB, and 368GB; #may-fail casts labels 2 188, 2 176, 2 080, 2 080, 2 050; time axis from 2 000 to 12 000; some configurations are marked Timeout.]

54