Runtime Monitoring
CO444H, Ben Livshits

Basic Instrumentation
• Insert additional code into the program
• This code is designed to record important events as they occur at runtime
• Some examples:
  • A particular function or statement is hit
    • This leads to function-level or line-level coverage
  • Each allocation is recorded, to measure overall memory allocation
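
A minimal sketch of source-level instrumentation for function-level coverage, assuming Python as the illustration language. The `instrument` decorator, the `hits` counter, and the two sample functions are hypothetical names introduced here, not part of the lecture.

```python
import functools
from collections import Counter

# Hypothetical event log: function name -> number of times it was hit.
hits = Counter()

def instrument(func):
    """Wrap func so every call is recorded before the original body runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        hits[func.__name__] += 1      # the inserted "additional code"
        return func(*args, **kwargs)
    return wrapper

@instrument
def parse(text):
    return text.split()

@instrument
def unused():
    return None

parse("a b c")
parse("d")

# Function-level coverage: instrumented functions hit at least once.
covered = {name for name, count in hits.items() if count > 0}
```

Here `covered` contains only `parse`, because `unused` was never called. Line-level coverage would follow the same pattern with one recorded event per statement.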

Levels of Instrumentation
• Native code
  • Instrument machine code
  • Tools like LLVM are often used for rewriting
• Bytecode
  • Common for languages such as Java and C#
  • A variety of tools are available for each bytecode format
  • JoeQ is in this category as well, although it’s a lot more general
• Source code
  • Common for languages like JavaScript
  • Often the easiest option – parse the code and add more statements

Runtime Code Monitoring
• Three major examples of monitoring
  • Purify/Valgrind
  • Detecting data races
  • Detecting memory leaks

Memory Error Detection

Purify
• C and C++ are not type-safe
  • The type system and the runtime fail to enforce type safety
• What are some of the examples?
  • Possible to read and write outside of your intended data structures
  • Write beyond loop bounds
  • Or object bounds
  • Or overwrite the code pointer, etc.

Track Each Byte of Memory
• Three states for every byte of tracked memory
  • Unallocated: cannot be read or written
  • Allocated but not initialized: cannot be read (a write initializes it)
  • Allocated and initialized: all operations are allowed

Instrumentation for Purify
• Check the state of each byte at every access
• Binary instrumentation:
  • Add code before each load and store
  • 2 bits of state per byte of memory (enough for the 3 different states)
  • 25% memory overhead as a result (2 extra bits for every 8 bits of data)
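
The per-byte state tracking above can be sketched as a small simulation. This is an illustration under assumptions, not Purify's actual implementation: the `ShadowMemory` class, its method names, and the error strings are hypothetical, and a Python list stands in for the packed 2-bit shadow map.

```python
from enum import IntEnum

class State(IntEnum):
    UNALLOCATED = 0        # cannot be read or written
    ALLOCATED_UNINIT = 1   # cannot be read
    INITIALIZED = 2        # all operations allowed

class ShadowMemory:
    def __init__(self, size):
        self.state = [State.UNALLOCATED] * size
        self.errors = []

    def on_malloc(self, addr, n):
        for a in range(addr, addr + n):
            self.state[a] = State.ALLOCATED_UNINIT

    def on_free(self, addr, n):
        for a in range(addr, addr + n):
            self.state[a] = State.UNALLOCATED

    def check_store(self, addr):
        """Code inserted before each store."""
        if self.state[addr] == State.UNALLOCATED:
            self.errors.append(("invalid write", addr))
        else:
            self.state[addr] = State.INITIALIZED

    def check_load(self, addr):
        """Code inserted before each load."""
        if self.state[addr] == State.UNALLOCATED:
            self.errors.append(("invalid read", addr))
        elif self.state[addr] == State.ALLOCATED_UNINIT:
            self.errors.append(("uninitialized read", addr))

shadow = ShadowMemory(16)
shadow.on_malloc(4, 4)    # bytes 4..7 become allocated but uninitialized
shadow.check_load(4)      # flagged: read before any write
shadow.check_store(4)     # fine, and marks byte 4 as initialized
shadow.check_load(4)      # fine
shadow.check_store(8)     # flagged: one byte past the end of the block

# 2 shadow bits for every 8-bit byte of program memory.
overhead = 2 / 8
```

After this run `shadow.errors` holds the uninitialized read at address 4 and the invalid write at address 8, and `overhead` is 0.25.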

Red Zones
• Leave buffer space between allocated objects that is never allocated – red zones
• Red zones are unallocated chunks of memory
• Guarantees that walking off the end of an array hits unallocated memory

Aging Free Memory
• When memory is freed, do not reallocate it immediately
• Wait until the memory has “aged” somewhat
• This helps with catching dangling pointer errors
• Red zones and aging are easily implemented in the malloc library
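
A toy allocator can illustrate both red zones and aging. Everything here is a hypothetical sketch: the `ToyAllocator` class, the red-zone width `RED_ZONE`, and the quarantine length `AGE_LIMIT` are assumed values chosen for the example, not parameters of any real malloc library.

```python
from collections import deque

RED_ZONE = 4    # assumed gap (in bytes) left before every block
AGE_LIMIT = 2   # assumed number of freed blocks kept out of circulation

class ToyAllocator:
    def __init__(self):
        self.next_addr = 0
        self.live = {}              # start address -> size of live block
        self.quarantine = deque()   # freed blocks that are still "aging"
        self.free_list = []         # aged blocks that may be reused

    def malloc(self, size):
        # Reuse only blocks that have already aged (exact fit, for simplicity).
        for i, (start, block_size) in enumerate(self.free_list):
            if block_size == size:
                del self.free_list[i]
                self.live[start] = size
                return start
        start = self.next_addr + RED_ZONE   # skip a red zone before the block
        self.next_addr = start + size
        self.live[start] = size
        return start

    def free(self, start):
        size = self.live.pop(start)
        self.quarantine.append((start, size))
        if len(self.quarantine) > AGE_LIMIT:
            self.free_list.append(self.quarantine.popleft())

    def is_valid(self, addr):
        return any(s <= addr < s + n for s, n in self.live.items())

alloc = ToyAllocator()
a = alloc.malloc(8)
b = alloc.malloc(8)
alloc.free(a)
c = alloc.malloc(8)   # does not reuse a's block: it has not aged yet
```

Walking one byte past `a` lands in a red zone rather than in `b`, and an access through the dangling pointer `a` still hits unallocated memory because `c` was placed elsewhere.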

Summary of Purify
• Used quite widely
  • Started with Purify
  • Now people use Valgrind
    • An open-source tool
• What is the overhead?
• Can you use these in production?

Data Race Detection

Data Races
• Data races are multithreaded bugs
  • At least two threads share a variable or memory location
  • At least one thread writes to the variable
  • This is similar to what we did for loop analysis
• Races are to be avoided
  • Typical bug patterns in multithreaded code
  • Sources of non-determinism
  • Very hard to reproduce bugs
    • Why?

Not All Races Are Made Equal
• We can have data races that involve writes that don’t lead to anything particularly bad
• x=1 by two threads – doesn’t matter which one gets to execute first

Looking for Data Races
• Event A happens before event B if
  • B follows A in a single thread, or
  • A is in thread a and B is in thread b, and there is an event c such that
    • c is a sync event after A in a and before B in b
• There is a natural partial order on events

Early Days of Race Detection
• The first race detection tools were based on happens-before
• Monitor all data references
• Watch for
  • Access of v in thread a
  • Access of v in thread b
  • No intervening sync between a and b

Issues with This Approach
• Can be expensive
  • We need to do a lot of instrumentation:
    • Requires access to all shared variables
    • All synchronization points
• The approach is fundamentally unsound, i.e. prone to false negatives
  • Can miss data races
  • Needs to be tested with many execution schedules

What Happens Here?
Thread a       Thread b
y=y+1          lock(m)
lock(m)        unlock(m)
unlock(m)      y=y+1
• How many schedules are there to explore?
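
One way to explore the question is to enumerate the schedules by brute force. The sketch below assumes the two-thread program reads as a: `y=y+1; lock(m); unlock(m)` and b: `lock(m); unlock(m); y=y+1`. The helper names are hypothetical, and `race_visible` encodes the happens-before rule from the earlier slide for this specific program only.

```python
from itertools import combinations

THREAD_A = ["y=y+1", "lock(m)", "unlock(m)"]
THREAD_B = ["lock(m)", "unlock(m)", "y=y+1"]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        slots = set(a_slots)
        ia, ib = iter(a), iter(b)
        yield [("a", next(ia)) if i in slots else ("b", next(ib))
               for i in range(n)]

def feasible(schedule):
    """A schedule is feasible if nobody acquires m while it is already held."""
    holder = None
    for thread, op in schedule:
        if op == "lock(m)":
            if holder is not None:
                return False
            holder = thread
        elif op == "unlock(m)":
            holder = None
    return True

def race_visible(schedule):
    """The two y accesses are ordered by happens-before only when a's
    unlock(m) precedes b's lock(m); otherwise the detector sees the race."""
    return schedule.index(("a", "unlock(m)")) > schedule.index(("b", "lock(m)"))

schedules = list(interleavings(THREAD_A, THREAD_B))
valid = [s for s in schedules if feasible(s)]
visible = [s for s in valid if race_visible(s)]
```

Under these assumptions there are 20 interleavings, 11 of which respect the lock, and the race on `y` is visible to a happens-before detector in 10 of those. The single remaining schedule, where thread a runs to completion first, hides the race, which is why one test run can miss it.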

What Else Can We Do?
• What is the proper programming discipline?
  • Most likely, we need to guard access to shared variables with locks
• Enforce this discipline:
  • Any access to a shared variable is protected by at least one lock
  • Any access that is not protected by locks is an error

Which Lock?
• How do we know which lock protects a variable?
  • A program may have many unrelated locks
  • Links between shared variables and locks may not be very clear
  • At runtime, we don’t want to do extensive analysis because of overhead
• Lock inference:
  • It must be one of the locks that is held at the time of accessing the variable
  • Initialize C(v) to the set of all locks in the program
  • On access to v by thread t
    • C(v) ← C(v) ∩ locks_held(t)
    • If C(v) is empty, print an error
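
The lock inference rule, C(v) ← C(v) ∩ locks_held(t), can be sketched directly. The `LocksetChecker` class and its method names are hypothetical, and a real tool would hook these calls into instrumented lock operations and memory accesses rather than invoke them by hand.

```python
class LocksetChecker:
    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.candidates = {}   # variable -> C(v), the candidate lockset
        self.held = {}         # thread -> set of locks currently held
        self.errors = []

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        # C(v) starts as the set of all locks in the program.
        c = self.candidates.get(var, self.all_locks)
        # C(v) <- C(v) ∩ locks_held(t)
        c = c & frozenset(self.held.get(thread, ()))
        self.candidates[var] = c
        if not c:
            self.errors.append(var)   # no single lock protects var

checker = LocksetChecker({"m1", "m2"})

# Thread t1 accesses v under m1, thread t2 accesses v under m2.
checker.acquire("t1", "m1")
checker.access("t1", "v")
checker.release("t1", "m1")

checker.acquire("t2", "m2")
checker.access("t2", "v")
checker.release("t2", "m2")
```

After the second access C(v) is empty, so `v` is reported even though each individual access held some lock. Unlike the happens-before check, the outcome does not depend on which thread happened to run first.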

Complications
• It’s not this simple
• We need to think about
  • Uninitialized data
  • Read-shared data
  • Read-write locks
• Uninitialized data
  • Data initialized by the owner
  • No need to lock access before initialization
  • When does initialization happen?
    • No good answer at runtime

More Complications
• Some data is only read
  • We don’t have to worry about shared reads
• We don’t have to update locksets until
  • More than one thread has the value
  • At least one thread is writing the value
• Keep the lockset algorithm as before but only infer locksets for shared-modified state locations
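
A sketch of a per-location state machine in this spirit follows. Only the shared-modified state is named on the slide; the other three state names (virgin, exclusive, shared) follow the published Eraser design as I understand it and should be read as assumptions. Note one deliberate deviation, also an assumption: this sketch starts refining the lockset once a second thread touches the location, but reports an error only in the shared-modified state.

```python
VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED = range(4)

class Location:
    def __init__(self, all_locks):
        self.state = VIRGIN
        self.owner = None                     # first thread to touch the location
        self.lockset = frozenset(all_locks)   # C(v)
        self.race_reported = False

def on_access(loc, thread, is_write, locks_held):
    if loc.state == VIRGIN:
        # First access: assume the owner is still initializing the data.
        loc.state, loc.owner = EXCLUSIVE, thread
        return
    if loc.state == EXCLUSIVE:
        if thread == loc.owner:
            return                            # still private, nothing to check
        loc.state = SHARED_MODIFIED if is_write else SHARED
    elif loc.state == SHARED and is_write:
        loc.state = SHARED_MODIFIED
    # The location is shared: refine the candidate lockset.
    loc.lockset &= frozenset(locks_held)
    # Only shared and modified locations can produce a report.
    if loc.state == SHARED_MODIFIED and not loc.lockset:
        loc.race_reported = True

loc = Location({"m"})
on_access(loc, "t1", True, set())    # unlocked initialization by the owner
on_access(loc, "t2", False, set())   # unlocked read by a second thread
read_only_report = loc.race_reported
on_access(loc, "t2", True, set())    # unlocked write by the second thread
```

Unlocked initialization and read-only sharing produce no report here (`read_only_report` is `False`), while the final unlocked write moves the location to shared-modified with an empty lockset and is flagged.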

Read-Write Locks
• Support a single writer but multiple readers
• Some lock must be held either in write mode or read mode for all accesses of a shared location
• We separate between read and write mode locks
• For each location read
  • C(v) ← C(v) ∩ locks_held(t)
• For each location write
  • C(v) ← C(v) ∩ write_locks_held(t)
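
The two update rules for read-write locks differ only in which held locks count. A small sketch, assuming `locks_held(t)` means locks held in either mode; the function `refine` and its parameter names are hypothetical.

```python
def refine(candidates, read_held, write_held, is_write):
    """Return the new C(v) after one access by a thread.

    read_held / write_held: locks the thread holds in read / write mode.
    """
    if is_write:
        # A write is only protected by locks held in write mode.
        return candidates & write_held
    # A read is protected by locks held in either mode.
    return candidates & (read_held | write_held)

c = frozenset({"rw", "other"})

# A reader holds rw in read mode: rw remains a candidate.
c = refine(c, read_held={"rw"}, write_held=set(), is_write=False)

# A writer that holds rw only in read mode leaves no candidate: an error.
c_bad_write = refine(c, read_held={"rw"}, write_held=set(), is_write=True)

# A writer that holds rw in write mode keeps the lockset intact.
c_good_write = refine(c, read_held=set(), write_held={"rw"}, is_write=True)
```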

Implementation Details
• Instrument the program at the binary level
  • Could also be done at the level of the source
• Every memory word has a shadow word (32 bits)
  • 30 bits reserved for the lockset key
    • Sets of locks are encoded using an integer key in a hashtable
    • Depends on there not being many distinct sets of locks
  • 2 bits for the state in the DFA
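
The shadow-word encoding can be sketched as a lockset interning table plus bit packing. The bit layout (key in the upper 30 bits, state in the lower 2) and the names `LocksetTable`, `pack`, and `unpack` are assumptions made for this illustration; the slide only fixes the 30/2 split.

```python
STATE_BITS = 2
KEY_BITS = 30

class LocksetTable:
    """Maps each distinct set of locks to a small integer key."""
    def __init__(self):
        self.key_of = {}   # frozenset of locks -> key
        self.set_of = []   # key -> frozenset of locks

    def intern(self, locks):
        locks = frozenset(locks)
        key = self.key_of.get(locks)
        if key is None:
            key = len(self.set_of)
            if key >= 1 << KEY_BITS:
                raise OverflowError("too many distinct locksets")
            self.key_of[locks] = key
            self.set_of.append(locks)
        return key

def pack(key, state):
    """Build a 32-bit shadow word from a lockset key and a 2-bit state."""
    return (key << STATE_BITS) | state

def unpack(word):
    """Split a shadow word back into (lockset key, state)."""
    return word >> STATE_BITS, word & ((1 << STATE_BITS) - 1)

table = LocksetTable()
k_empty = table.intern(set())
k_ab = table.intern({"a", "b"})
word = pack(k_ab, 3)
key, state = unpack(word)
```

Interning is what keeps the shadow word small: the same set of locks always maps to the same key, so the scheme works as long as the program uses few distinct locksets.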

This is the Basis for a Tool Called Eraser
• Works quite well
  • Can find lots of errors with relatively few runs
• However, the overhead is dramatic
  • 10-30x slowdown
  • Could be optimized with the help of a static analysis

Memory Leak Detection

Looking for Memory Leaks
• Generally, very difficult to find
• They manifest themselves over time
• Sometimes, it takes hours or days in a long-running program to find a slow memory leak
• An issue in production code when these things are not found in testing

Basic Idea
• What is a memory leak in Java?
  • Objects that haven’t been accessed for a long time
  • Track the time of allocation, track the last access time, periodically report unused objects
• Approach:
  • Look for memory leaks using techniques that are borrowed from garbage collection
  • Any allocated memory that has no more pointers to it is considered to be a leak
  • It’s possible to run a garbage collector without freeing any garbage, just detecting and reporting it
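
Both ideas can be sketched over a toy heap. The object graph, the root set, and the timestamps below are made-up inputs, and the function names are hypothetical; a real tool would obtain them from the runtime.

```python
def reachable(roots, edges):
    """Mark phase of a garbage collector: everything reachable from the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(edges.get(obj, ()))
    return seen

def find_leaks(allocated, roots, edges):
    """Allocated objects with no path from any root: detect, but do not free."""
    return set(allocated) - reachable(roots, edges)

def find_stale(last_access, now, threshold):
    """Objects that have not been accessed for longer than threshold."""
    return {obj for obj, t in last_access.items() if now - t > threshold}

allocated = {"A", "B", "C", "D"}
edges = {"A": ["B"], "C": ["D"], "D": ["C"]}   # C and D only point at each other
roots = ["A"]

leaks = find_leaks(allocated, roots, edges)

last_access = {"A": 95, "B": 10}               # assumed last-access timestamps
stale = find_stale(last_access, now=100, threshold=60)
```

Here `leaks` is `{"C", "D"}` (unreachable, even though they reference each other), while `stale` is `{"B"}`: still reachable, but untouched for a long time, which is the Java-style notion of a leak.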

Difficult in C and C++
• While in Java, we can easily tell what portions of the heap are accessible, in C and C++ that is a difficult task
• Some of the possibilities:
  • No pointers to a malloced block at all – garbage
  • No pointers to the head of a malloced block – likely garbage
• How do we identify what is reachable in C/C++?
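
The two possibilities can be illustrated with a simplified classification pass, written in Python as a simulation rather than real C tooling. It assumes the scanner has already collected every word in memory that looks like a pointer; the function `classify`, its inputs, and the label "reachable" are hypothetical, and the sketch does not follow pointers transitively.

```python
def classify(blocks, pointer_words):
    """Classify malloced blocks.

    blocks: {start_address: size}
    pointer_words: set of integer values found while scanning memory
    """
    result = {}
    for start, size in blocks.items():
        if start in pointer_words:
            result[start] = "reachable"        # something points at the head
        elif any(start < p < start + size for p in pointer_words):
            result[start] = "likely garbage"   # only interior pointers
        else:
            result[start] = "garbage"          # no pointers at all
    return result

blocks = {100: 16, 200: 16, 300: 16}
pointer_words = {100, 208}   # head of the first block, middle of the second

report = classify(blocks, pointer_words)
```

In this run the block at 100 is reachable, the block at 200 is likely garbage because only an interior pointer refers to it, and the block at 300 is garbage.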
