CO444H Ben Livshits 1 Basic Instrumentation Insert additional - - PowerPoint PPT Presentation



SLIDE 1

Runtime monitoring

CO444H

Ben Livshits

SLIDE 2

Basic Instrumentation

  • Insert additional code into the program
  • This code is designed to record important events as they occur at runtime
  • Some examples
      • Record that a particular function or statement is being hit
      • This leads to function-level or line-level coverage
      • Record each allocation to measure overall memory allocation
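
As a rough illustration of function-level coverage, the sketch below wraps functions so that each call records a hit. The names `instrumented`, `hits`, `parse`, and `unused` are hypothetical and not from the slides; a real tool would insert the recording code automatically rather than through a hand-applied decorator.

```python
import functools
from collections import Counter

# Hit counts per function name; this is the "recorded events" store.
hits = Counter()

def instrumented(func):
    """Wrap func so that every call is recorded before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        hits[func.__name__] += 1          # the inserted instrumentation
        return func(*args, **kwargs)
    return wrapper

@instrumented
def parse(text):
    return text.split()

@instrumented
def unused():
    return None

parse("a b")
parse("c")

# Functions that never appear in `hits` were not covered by this run.
covered = set(hits)
```

After the two calls, `covered` contains only `parse`, which is exactly the function-level coverage information the slide describes.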

SLIDE 3

Levels of Instrumentation

  • Native code
      • Instrument machine code
      • Tools like LLVM are often used for rewriting
  • Bytecode
      • Common for languages such as Java and C#
      • A variety of tools are available for each bytecode format
      • JoeQ is in this category as well, although it’s a lot more general
  • Source code
      • Common for languages like JavaScript
      • Often the easiest option – parse the code and add more statements
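
The source-level option ("parse the code and add more statements") can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions: the slides discuss JavaScript, and the names `EntryLogger`, `_hit`, `SOURCE`, and `instrument_and_run` are hypothetical.

```python
import ast

SOURCE = """
def square(x):
    return x * x

def cube(x):
    return x * square(x)
"""

class EntryLogger(ast.NodeTransformer):
    """Insert a call `_hit("<name>")` as the first statement of every function."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also instrument nested functions
        call = ast.Expr(value=ast.Call(
            func=ast.Name(id="_hit", ctx=ast.Load()),
            args=[ast.Constant(value=node.name)],
            keywords=[],
        ))
        node.body.insert(0, call)
        return node

def instrument_and_run(source):
    """Parse, rewrite, compile, and load the instrumented program."""
    hit_log = []
    tree = EntryLogger().visit(ast.parse(source))
    ast.fix_missing_locations(tree)       # new nodes need line numbers
    env = {"_hit": hit_log.append}        # the recording function
    exec(compile(tree, "<instrumented>", "exec"), env)
    return env, hit_log

env, hit_log = instrument_and_run(SOURCE)
result = env["cube"](3)
```

Calling the instrumented `cube` also calls the instrumented `square`, so `hit_log` records both entries in call order while the program's result is unchanged.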

SLIDE 4

Runtime Code Monitoring

  • Three major examples of monitoring
      • Detecting memory errors (Purify/Valgrind)
      • Detecting data races
      • Detecting memory leaks

SLIDE 5

Memory Error Detection

SLIDE 6

Purify

  • C and C++ are not type-safe
      • Neither the type system nor the runtime enforces type safety
  • What are some of the examples?
      • Possible to read and write outside of your intended data structures
      • Write beyond loop bounds
      • Or object bounds
      • Or overwrite a code pointer, etc.

SLIDE 7

Track Each Byte of Memory

  • Three states for every byte of tracked memory
      • Unallocated: cannot be read or written
      • Allocated but not initialized: cannot be read
      • Allocated and initialized: all operations are allowed

SLIDE 8

Instrumentation for Purify

  • Check the state of each byte at every access
  • Binary instrumentation:
      • Add code before each load and store
  • 2 bits of state per byte of memory (enough to encode the 3 different states)
  • 25% memory overhead as a result (2 extra bits for every 8 bits of data)
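
The three byte states and the per-access checks can be sketched as a small shadow-memory simulation. This is a toy model, not Purify's implementation: it stores one enum value per byte instead of 2 packed bits, and the names `ByteState`, `ShadowMemory`, `check_load`, and `check_store` are hypothetical.

```python
from enum import Enum

class ByteState(Enum):
    UNALLOCATED = 0   # cannot be read or written
    ALLOCATED = 1     # allocated but not initialized: cannot be read
    INITIALIZED = 2   # all operations are allowed

class ShadowMemory:
    def __init__(self, size):
        self.state = [ByteState.UNALLOCATED] * size
        self.errors = []

    def on_malloc(self, addr, n):
        for a in range(addr, addr + n):
            self.state[a] = ByteState.ALLOCATED

    def on_free(self, addr, n):
        for a in range(addr, addr + n):
            self.state[a] = ByteState.UNALLOCATED

    def check_store(self, addr):
        """Code inserted before each store."""
        if self.state[addr] is ByteState.UNALLOCATED:
            self.errors.append(("write to unallocated", addr))
        else:
            self.state[addr] = ByteState.INITIALIZED

    def check_load(self, addr):
        """Code inserted before each load."""
        if self.state[addr] is ByteState.UNALLOCATED:
            self.errors.append(("read of unallocated", addr))
        elif self.state[addr] is ByteState.ALLOCATED:
            self.errors.append(("read of uninitialized", addr))

shadow = ShadowMemory(16)
shadow.on_malloc(4, 4)     # bytes 4..7 become allocated
shadow.check_load(4)       # error: read before initialization
shadow.check_store(4)      # fine: byte 4 becomes initialized
shadow.check_load(4)       # fine
shadow.check_store(8)      # error: byte 8 was never allocated
```

Only the first load and the out-of-bounds store end up in `shadow.errors`; the store and load of the initialized byte pass the checks.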

SLIDE 9

Red Zones

  • Leave buffer space between allocated objects that is never allocated – red zones
  • Red zones are unallocated chunks of memory
  • Guarantees that walking off the end of an array hits unallocated memory

SLIDE 10

Aging Free Memory

  • When memory is freed, do not reallocate it immediately
  • Wait until the memory has “aged” somewhat
  • This helps with catching dangling pointer errors
  • Red zones and aging are easily implemented in the malloc library
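
A toy allocator can show how red zones and aging fit into a malloc-style library. This is a minimal sketch under assumptions: the constants `RED_ZONE` and `QUARANTINE`, the class `ToyAllocator`, and the first-fit reuse policy are all hypothetical choices for illustration.

```python
from collections import deque

RED_ZONE = 2     # bytes left unallocated on each side of a block (assumed)
QUARANTINE = 2   # number of freed blocks held back before reuse (assumed)

class ToyAllocator:
    def __init__(self):
        self.next_addr = 0
        self.live = {}              # start address -> size of live blocks
        self.quarantine = deque()   # recently freed blocks, still "aging"
        self.free_list = []         # aged blocks that may be reused

    def malloc(self, size):
        # Reuse an aged block if one is large enough.
        for i, (start, block_size) in enumerate(self.free_list):
            if block_size >= size:
                del self.free_list[i]
                self.live[start] = size
                return start
        # Otherwise carve a fresh block, surrounded by red zones.
        start = self.next_addr + RED_ZONE
        self.next_addr = start + size + RED_ZONE
        self.live[start] = size
        return start

    def free(self, start):
        self.quarantine.append((start, self.live.pop(start)))
        if len(self.quarantine) > QUARANTINE:
            self.free_list.append(self.quarantine.popleft())

    def is_valid(self, addr):
        """True only for addresses inside a live block."""
        return any(s <= addr < s + n for s, n in self.live.items())

alloc = ToyAllocator()
p = alloc.malloc(4)
q = alloc.malloc(4)
off_the_end = alloc.is_valid(p + 4)   # lands in a red zone, not in q
alloc.free(p)
r = alloc.malloc(4)                   # p is still aging, so it is not reused
dangling = alloc.is_valid(p)          # use of p after free is detectable
```

Because `p` sits in the quarantine queue, the next allocation gets a fresh address and a later access through the dangling pointer `p` still hits unallocated memory.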

SLIDE 11

Summary of Purify

  • Used quite widely
  • Started with Purify
  • Now people use Valgrind
  • An open-source tool
  • What is the overhead?
  • Can you use these in production?

SLIDE 12

Data Race Detection

SLIDE 13

Data Races

  • Data races are multithreaded bugs
      • At least two threads share a variable or memory location
      • At least one thread writes to the variable
      • This is similar to what we did for loop analysis
  • Races are to be avoided
      • Typical bug patterns in multithreaded code
      • Sources of non-determinism
      • Very hard to reproduce bugs
      • Why?

SLIDE 14

Not All Races Are Made Equal

  • We can have data races that involve writes that don’t lead to anything particularly bad
  • x=1 by two threads – doesn’t matter which one gets to execute first

SLIDE 15

Looking for Data Races

  • Event A happens before event B if
      • B follows A in a single thread, or
      • A is in thread a, B is in thread b, and there is a sync event c that comes after A in a and before B in b
  • Happens-before is a natural partial order on events

SLIDE 16

Early Days of Race Detection

  • The first race detection tools were based on happens-before
  • Monitor all data references
  • Watch for
      • Access of v in thread a
      • Access of v in thread b
      • No intervening sync between a and b

SLIDE 17

Issues with This Approach

  • Can be expensive
  • We need to do a lot of instrumentation:
      • Requires monitoring accesses to all shared variables
      • All synchronization points
  • The approach is fundamentally unsound, i.e. prone to false negatives
      • Can miss data races
      • Needs to be tested with many execution schedules

SLIDE 18

What Happens Here?

  • Thread a
      • y=y+1
      • lock(m)
      • unlock(m)
  • Thread b
      • lock(m)
      • unlock(m)
      • y=y+1
  • How many schedules are there to explore?
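
To see why the result depends on the schedule, here is a minimal happens-before detector run on the two threads above. It uses vector clocks, which is one common way to track happens-before and is an assumption here (the slides do not say how the early tools tracked ordering). The names `HappensBeforeDetector`, `run`, `a_first`, and `b_first` are hypothetical, and `y=y+1` is modeled as a single write.

```python
class HappensBeforeDetector:
    def __init__(self):
        self.thread_vc = {}   # thread -> vector clock {thread: counter}
        self.lock_vc = {}     # lock -> vector clock at its last release
        self.accesses = {}    # var -> [(thread, epoch, is_write)]
        self.races = []

    def _vc(self, t):
        return self.thread_vc.setdefault(t, {t: 1})

    def acquire(self, t, lock):
        # Acquiring a lock orders this thread after the last release.
        vc = self._vc(t)
        for u, n in self.lock_vc.get(lock, {}).items():
            vc[u] = max(vc.get(u, 0), n)

    def release(self, t, lock):
        vc = self._vc(t)
        self.lock_vc[lock] = dict(vc)
        vc[t] += 1

    def access(self, t, var, is_write):
        vc = self._vc(t)
        for u, epoch, was_write in self.accesses.get(var, []):
            ordered = vc.get(u, 0) >= epoch   # earlier access happens before
            if u != t and (is_write or was_write) and not ordered:
                self.races.append((var, u, t))
        self.accesses.setdefault(var, []).append((t, vc[t], is_write))

def run(schedule):
    detector = HappensBeforeDetector()
    for thread, op in schedule:
        if op == "lock":
            detector.acquire(thread, "m")
        elif op == "unlock":
            detector.release(thread, "m")
        else:
            detector.access(thread, "y", is_write=True)
    return detector.races

# Thread a runs to completion first: the unlock/lock pair orders the writes.
a_first = [("a", "y"), ("a", "lock"), ("a", "unlock"),
           ("b", "lock"), ("b", "unlock"), ("b", "y")]
# Thread b takes the lock first: nothing orders a's write before b's write.
b_first = [("b", "lock"), ("b", "unlock"), ("a", "y"),
           ("a", "lock"), ("a", "unlock"), ("b", "y")]

races_a_first = run(a_first)
races_b_first = run(b_first)
```

The same program is reported race-free under the first schedule and racy under the second, which is the false-negative problem: a happens-before tool only sees the orderings of the run it observed.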

SLIDE 19

What Else Can We Do?

  • What is the proper programming discipline?
      • Most likely, we need to guard access to shared variables with locks
  • Enforce this discipline:
      • Any access to a shared variable is protected by at least one lock
      • Any access that is not protected by locks is an error

SLIDE 20

Which Lock?

  • How do we know which lock protects a variable?
      • A program may have many unrelated locks
      • Links between shared variables and locks may not be very clear
      • At runtime, we don’t want to do extensive analysis because of overhead
  • Lock inference:
      • It must be one of the locks that is held at the time of accessing the variable
      • Initialize C(v) to the set of all locks in the program
      • On access to v by thread t: C(v) ← C(v) ∩ locks_held(t)
      • If C(v) is empty, print an error
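
The lock inference rule can be sketched in a few lines. This is a minimal model of the update C(v) ← C(v) ∩ locks_held(t), with errors collected in a list instead of printed; the class `LocksetChecker`, its method names, and the lock names `m` and `n` are hypothetical.

```python
class LocksetChecker:
    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = {}         # thread -> set of locks currently held
        self.candidates = {}   # variable -> C(v)
        self.errors = []

    def lock(self, t, m):
        self.held.setdefault(t, set()).add(m)

    def unlock(self, t, m):
        self.held.setdefault(t, set()).discard(m)

    def access(self, t, v):
        # C(v) starts as the set of all locks and only ever shrinks.
        c = self.candidates.get(v, self.all_locks) & self.held.get(t, set())
        self.candidates[v] = c
        if not c:
            self.errors.append((v, t))

checker = LocksetChecker(all_locks={"m", "n"})

# x is always accessed with m held, so m survives in C(x).
checker.lock("a", "m")
checker.access("a", "x")
checker.unlock("a", "m")
checker.lock("b", "m")
checker.lock("b", "n")
checker.access("b", "x")
checker.unlock("b", "n")
checker.unlock("b", "m")

# y is accessed with no lock held, so C(y) becomes empty.
checker.access("b", "y")
```

Unlike the happens-before check, this flags the unprotected access to `y` regardless of how the threads happen to be scheduled in the observed run.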

SLIDE 21

Complications

  • It’s not this simple
  • We need to think about
      • Uninitialized data
      • Read-shared data
      • Read-write locks
  • Uninitialized data
      • Data is initialized by the owner
      • No need to lock access before initialization
      • When does initialization happen?
      • No good answer at runtime

SLIDE 22

More Complications

  • Some data is only read
      • We don’t have to worry about shared reads
  • We don’t have to update locksets until
      • More than one thread has the value
      • At least one thread is writing the value
  • Keep the lockset algorithm as before but only infer locksets for locations in the shared-modified state
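
One way to delay lockset checking until data is both shared and modified is a small per-location state machine. The sketch below is an assumption-laden illustration: the slides name only the shared-modified state, and the other state names (`VIRGIN`, `EXCLUSIVE`, `SHARED`), the `Location` class, and `on_access` are hypothetical choices for this example.

```python
from enum import Enum

class State(Enum):
    VIRGIN = 0            # never accessed
    EXCLUSIVE = 1         # accessed by a single thread only (initialization)
    SHARED = 2            # read by several threads, no writes since sharing
    SHARED_MODIFIED = 3   # several threads, at least one writes

class Location:
    def __init__(self, all_locks):
        self.state = State.VIRGIN
        self.owner = None
        self.candidates = frozenset(all_locks)   # C(v)

def on_access(loc, thread, is_write, locks_held):
    """Update loc; return True when a race should be reported."""
    if loc.state is State.VIRGIN:
        loc.state, loc.owner = State.EXCLUSIVE, thread
        return False
    if loc.state is State.EXCLUSIVE:
        if thread == loc.owner:
            return False                  # owner may initialize without locks
        loc.state = State.SHARED_MODIFIED if is_write else State.SHARED
    elif loc.state is State.SHARED and is_write:
        loc.state = State.SHARED_MODIFIED
    loc.candidates &= frozenset(locks_held)
    # Empty locksets are only reported once the data is shared and modified.
    return loc.state is State.SHARED_MODIFIED and not loc.candidates

loc = Location(all_locks={"m"})
r1 = on_access(loc, "a", True, set())    # owner initializes without a lock
r2 = on_access(loc, "b", False, set())   # second thread only reads
r3 = on_access(loc, "b", True, set())    # unprotected write to shared data
```

Only the third access is reported: initialization by the owner and read-only sharing are tolerated, which removes the false alarms the plain lockset rule would raise.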

SLIDE 23

Read-Write Locks

  • Support a single writer but multiple readers
  • Some lock must be held either in write mode or read mode for all accesses of a shared location
  • We distinguish between read-mode and write-mode locks
  • For each location read
      • C(v) ← C(v) ∩ locks_held(t)
  • For each location write
      • C(v) ← C(v) ∩ write_locks_held(t)
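
The two update rules differ only in which held locks count. A minimal sketch, assuming each held lock is tagged with the mode it was acquired in; the class `RWLocksetChecker` and the lock name `rw` are hypothetical.

```python
class RWLocksetChecker:
    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = {}         # thread -> {lock: "read" or "write"}
        self.candidates = {}   # variable -> C(v)
        self.errors = []

    def lock(self, t, m, mode):
        self.held.setdefault(t, {})[m] = mode

    def unlock(self, t, m):
        self.held.get(t, {}).pop(m, None)

    def access(self, t, v, is_write):
        held = self.held.get(t, {})
        if is_write:
            # Writes are protected only by locks held in write mode.
            usable = {m for m, mode in held.items() if mode == "write"}
        else:
            # Reads are protected by locks held in either mode.
            usable = set(held)
        c = self.candidates.get(v, self.all_locks) & usable
        self.candidates[v] = c
        if not c:
            self.errors.append((v, t))

checker = RWLocksetChecker(all_locks={"rw"})
checker.lock("a", "rw", "read")
checker.access("a", "v", is_write=False)   # fine: rw protects the read
checker.access("a", "v", is_write=True)    # error: rw is held in read mode only
```

The write removes `rw` from C(v) because a read-mode hold does not exclude other readers, so it cannot protect a write.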

SLIDE 24

Implementation Details

  • Instrument the program at the binary level
      • Could also be done at the level of the source
  • Every memory word has a shadow word (32 bits)
      • 30 bits are used for the lockset key
      • Sets of locks are encoded using an integer key in a hashtable
      • Depends on there not being many distinct sets of locks
      • 2 bits for state in the DFA
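
The shadow-word layout can be sketched as a lockset interning table plus bit packing. Which bits hold the state and which hold the key is an assumption here (the slides give only the 30 + 2 split), and `LocksetTable`, `pack`, and `unpack` are hypothetical names.

```python
class LocksetTable:
    """Give each distinct set of locks a small integer key."""

    def __init__(self):
        self.index_of = {}   # frozenset of locks -> key
        self.sets = []       # key -> frozenset of locks

    def key(self, lockset):
        fs = frozenset(lockset)
        if fs not in self.index_of:
            self.index_of[fs] = len(self.sets)
            self.sets.append(fs)
        return self.index_of[fs]

def pack(key, state):
    """Build a 32-bit shadow word: 30-bit lockset key, 2-bit state."""
    assert 0 <= key < (1 << 30) and 0 <= state < 4
    return (key << 2) | state   # assumed layout: state in the low 2 bits

def unpack(word):
    return word >> 2, word & 0b11

table = LocksetTable()
k_m = table.key({"m"})
k_mn = table.key({"m", "n"})
k_again = table.key({"m"})       # same set, same key
word = pack(k_mn, 3)
```

Because many locations share the same few locksets, the table stays small and every shadow word fits in 32 bits.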

SLIDE 25

This is the Basis for a Tool Called Eraser

  • Works quite well
  • Can find lots of errors with relatively few runs
  • However, the overhead is dramatic
  • 10-30x slowdown
  • Could be optimized with the help of a static analysis

SLIDE 26

Memory Leak Detection

SLIDE 27

Looking for Memory Leaks

  • Generally, very difficult to find
  • They manifest themselves over time
  • Sometimes, it takes hours or days in a long-running program to find a slow memory leak
  • An issue in production code when these things are not found in testing

SLIDE 28

Basic Idea

  • Approach:
      • Look for memory leaks using techniques that are borrowed from garbage collection
      • Any allocated memory that has no more pointers to it is considered to be a leak
      • It’s possible to run a garbage collector that doesn’t free any garbage, but just detects and reports it
  • What is a memory leak in Java?
      • Objects that haven’t been accessed for a long time
      • Track the time of allocation, track the last access time, periodically report unused objects
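
The Java-style notion of a leak (objects not accessed for a long time) can be sketched with a logical clock. This is an illustrative model only: the class `StalenessTracker`, the object names, and the threshold value are hypothetical.

```python
class StalenessTracker:
    def __init__(self):
        self.now = 0             # logical time, advanced by the program
        self.allocated_at = {}   # object id -> allocation time
        self.last_access = {}    # object id -> time of last access

    def tick(self, n=1):
        self.now += n

    def on_alloc(self, obj_id):
        self.allocated_at[obj_id] = self.now
        self.last_access[obj_id] = self.now

    def on_access(self, obj_id):
        self.last_access[obj_id] = self.now

    def on_free(self, obj_id):
        self.allocated_at.pop(obj_id, None)
        self.last_access.pop(obj_id, None)

    def stale(self, threshold):
        """Objects that have not been accessed for at least `threshold` ticks."""
        return sorted(o for o, t in self.last_access.items()
                      if self.now - t >= threshold)

tracker = StalenessTracker()
tracker.on_alloc("cache")
tracker.on_alloc("session")
tracker.tick(100)
tracker.on_access("session")   # still in use
tracker.tick(10)
report = tracker.stale(threshold=50)
```

A periodic call to `stale` gives the report of unused objects; here only `cache` qualifies, since `session` was touched 10 ticks ago.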

SLIDE 29

Difficult in C and C++

  • While in Java, we can easily tell what portions of the heap are accessible, in C and C++ that is a difficult task
  • Some of the possibilities:
      • No pointers to a malloced block at all – garbage
      • No pointers to the head of a malloced block – likely garbage
  • How do we identify what is reachable in C/C++?
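
One possible answer is a conservative scan: treat every word that looks like an address inside a malloced block as a pointer. The sketch below is an assumption, not a method stated in the slides; memory is modeled as word-addressed, and `classify`, `blocks`, `memory`, and `roots` are hypothetical names.

```python
def classify(blocks, memory, roots):
    """Classify malloced blocks by the kind of pointers found to them.

    blocks: {start address: size in words}
    memory: {address: word stored there} (missing addresses read as 0)
    roots:  words found in globals, stacks, and registers
    """
    def find_block(word):
        for start, size in blocks.items():
            if start <= word < start + size:
                return start
        return None

    head_ref, interior_ref, scanned = set(), set(), set()
    work = list(roots)
    while work:
        word = work.pop()
        start = find_block(word)
        if start is None:
            continue                      # not a pointer into the heap
        (head_ref if word == start else interior_ref).add(start)
        if start not in scanned:          # scan the block's contents once
            scanned.add(start)
            work.extend(memory.get(a, 0)
                        for a in range(start, start + blocks[start]))

    report = {}
    for start in blocks:
        if start in head_ref:
            report[start] = "reachable"
        elif start in interior_ref:
            report[start] = "likely garbage"   # only interior pointers
        else:
            report[start] = "garbage"          # no pointers at all
    return report

blocks = {100: 4, 200: 4, 300: 4}
memory = {100: 202}   # block 100 holds a pointer into the middle of block 200
roots = [100]         # a root points at the head of block 100
report = classify(blocks, memory, roots)
```

The scan is conservative because an integer that merely happens to equal a heap address is treated as a pointer, so it can miss leaks but does not misreport a referenced block as garbage.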
