

Slide 1

Mining Coverage Data for Test Set Coverage Efficiency

Monica Farkash (presenter) and Balavinayagam Samynathan, UT Austin; Bryan Hickerson and Mike Behm, IBM Austin

Slide 2

Outline

  • Coverage Efficiency
  • Coverage in Time
  • First Time Per Test Coverage
  • Hard To Hit Coverage
  • Coverage Distribution
  • Scenarios to Waves
  • Wave Windows of Probability
  • Controlling the Test Load
  • Results & Conclusion
  • Acknowledgments / References


Slide 3

Coverage Efficiency

  • 12,000 scenario files
  • Millions of tests
  • Coverage
    – All events: 150k
    – Hard-to-hit: 73k (fewer than 2,000 hits per 1M tests)
    – Never-hit: 15k
  • Automatic or manual targeting
    – Coverage-driven verification
    – Coverage-driven test case generation
    – Graph-based test case generation


[Figure: scenario files S1, S2, …, Sk are input to the test case generator (TCG); from S1 it generates n test files T11, T12, …, T1n, which run on the simulator to produce coverage data C1_1, ….]

A test case generator receives as input a file containing the scenario that the user desires as a pattern for generating tests. It uses it to generate as many tests as desired, longer or shorter, all following the given pattern. For each scenario we can generate tests, and for each such test we gather coverage information as a witness of the impact the test had on the design (i.e., the functions it exercised). The PowerPC methodology uses more than 12 thousand such scenario files while generating millions of tests. It collects information on about 150 thousand coverage events. Each coverage event is classified according to how many times it was hit in the moving window of the last 1 million tests; the classification used here considers events hit fewer than 2,000 times to be hard-to-hit. We notice an approximately 10% ratio of never-hit events.
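The classification step described above can be sketched as follows. This is a minimal illustration: the event names, the dictionary of hit counts, and the `classify_events` helper are hypothetical; only the 2,000-hits-per-1M-tests threshold comes from the talk.

```python
# Threshold from the talk: events hit fewer than 2,000 times in the
# moving window of the last 1M tests count as hard-to-hit.
HARD_TO_HIT_THRESHOLD = 2000

def classify_events(all_events, hit_counts):
    """Split coverage events into never-hit / hard-to-hit / often-hit
    buckets from their hit counts over the moving window of recent tests."""
    buckets = {"never-hit": [], "hard-to-hit": [], "often-hit": []}
    for event in all_events:
        hits = hit_counts.get(event, 0)
        if hits == 0:
            buckets["never-hit"].append(event)
        elif hits < HARD_TO_HIT_THRESHOLD:
            buckets["hard-to-hit"].append(event)
        else:
            buckets["often-hit"].append(event)
    return buckets
```

The never-hit and hard-to-hit buckets are the two lists the rest of the talk focuses on.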

Slide 4

  • Coverage
    – Never-hit
    – Hard-to-hit
    – Often-hit => redundancy
  • Efficiency
    – Achieve the coverage goal with fewer resources
    – Reduce redundancy
  • Observe
    – Summarization, model identification, probability
  • Control
    – Control the test case generation


Coverage Efficiency

used to drive the verification process

Coverage efficiency shows how quickly we cover the coverage event list, given existing resources. Most of the time we talk in terms of two lists: the never-hit and the hard-to-hit. To increase the efficiency of our verification process, however, we also need to reduce the redundancy in our often-hit events. The first step toward a more efficient methodology is to observe how coverage happens; we did so using summarization, model identification, and probability analysis. The next step is to use what we learned from the data to control the verification process to our benefit. We used that information to control only one very simple parameter in the process of test case generation, with spectacular impact on the resulting efficiency.

Slide 5


Coverage in Time

Same scenario:
  • Semaphores (locking mechanism)

Same load:
  • Number of instructions
  • Number of cycles

The first thing we looked at was how coverage happens in time. We chose a few test cases generated using the same scenario input file, meaning they normally follow the same high-level pattern, and followed the evolution of their coverage in time. For this we simply put the names of the signals representing coverage monitors in the list of signals to be recorded during simulation. What we show here is the number of coverage events hit in the same simulation cycle, for 10 tests. The scenario is a locking mechanism, and even though we could guess that a lack of activity might mean locking, that is all we could guess. We did not learn much about how efficient coverage is while running tests.

Slide 6


First Time Per Test Coverage

Test A Test B

Same Scenario DSI_EAO

To increase our understanding, we decided to remove the redundancy from our plotting. We post-process the data and keep only the first time a given coverage event is hit in a test. We then plot, in time, how many events were covered at a given cycle for the first time. We can remove redundancy because we assume there is less value in a coverage event being hit many times within the same test than in that event being hit in different tests. By removing redundancy we start to learn how real coverage efficiency happens. We notice a rather large wave of first-time-per-test (FTPT) coverage followed by other, smaller waves later on. We also notice that, even though the overall coverage seems impressive, the coverage that actually matters is, comparatively, not much. This means the most efficient tests would be the shortest ones, those with the highest ratio of FTPT coverage per cycle, with the big problem that some coverage events would never happen in that time frame.
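The FTPT post-processing step can be sketched as below. This is a minimal illustration with hypothetical names, assuming the raw data for one test is a list of (cycle, event) coverage hits.

```python
def first_time_per_test(hit_log):
    """Keep only the first cycle at which each event is hit within one test,
    discarding later (redundant) hits of the same event."""
    first_hit = {}
    for cycle, event in sorted(hit_log):  # visit hits in cycle order
        first_hit.setdefault(event, cycle)
    return first_hit

def ftpt_per_cycle(hit_log):
    """Number of events covered for the first time at each cycle:
    the quantity plotted as the FTPT waves."""
    per_cycle = {}
    for cycle in first_time_per_test(hit_log).values():
        per_cycle[cycle] = per_cycle.get(cycle, 0) + 1
    return per_cycle
```

Plotting `ftpt_per_cycle` over the simulation cycles produces the wave-shaped curves discussed in the following slides.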

Slide 7

HTH Coverage in Time and FTPT

A LARX_STCX Test


We come back to our real problem, the hard-to-hit events, and try to understand where they happen. We plot the hard-to-hit events against all events, first overall, and then restricted to the FTPT events. We already know that, by the definition of a hard-to-hit event in our methodology, approximately half of the events are classified as such; this is visible in both comparisons. What we also notice is that the FTPT coverage keeps the same shape for all events as for hard-to-hit events. This means we can again state that the hard-to-hit events are accessed in "waves" throughout the test.

Slide 8


A LARX_STCX Test

FTPT Gamma Distribution


What we have learned so far is that the FTPT hard-to-hit events follow the shape of waves. We identify the waves as gamma distributions, which reinforces the shape we generally use to show how coverage is achieved. We now know that there are "coverage waves" that come throughout a test. We can explain them as areas being "opened" by certain activities. For example, after a few memory operations with a given relation between addresses, a cache operation is triggered. That cache activity is new to the test, hence it "triggers" a new wave.
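One lightweight way to identify a wave as a gamma distribution is a method-of-moments fit over the first-hit cycles inside the wave. The talk does not say which fitting procedure was used, so this sketch, with hypothetical names, is only an illustration of the idea.

```python
def fit_gamma_moments(samples):
    """Method-of-moments estimate of gamma shape k and scale theta.
    For a gamma distribution, mean = k*theta and variance = k*theta**2,
    so k = mean**2 / variance and theta = variance / mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean * mean / var, var / mean  # (shape, scale)
```

Given the first-hit cycles belonging to one wave, the returned shape and scale describe the gamma curve that approximates it.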

Slide 9

  • Expectation-Maximization (EM) algorithm to identify the mixture of Gaussians
  • Waves show the exercising of a new area in the design
  • We do not target coverage, we target coverage waves

Mixture of Coverage Waves

Cycles Number Of FTPT Events


We checked whether this holds for the rest of the test, not only the obvious first wave. We used a model identification algorithm and, for ease, approximated the gamma distributions with Gaussians. We learned that the rest of the FTPT coverage fits waves throughout the whole test, no matter how far from the "initial" wave they are.
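A minimal 1-D EM sketch for fitting such a mixture of Gaussians to first-hit cycles is shown below. It is hand-rolled and deterministic for illustration; the talk does not specify which implementation was used.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_gaussian_mixture(data, k, iters=100):
    """Expectation-Maximization for a 1-D mixture of k Gaussians.
    Returns (weights, means, stddevs)."""
    lo, hi = min(data), max(data)
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]  # spread evenly
    sigmas = [(hi - lo) / k or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            probs = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(probs) or 1e-300
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))
    return weights, means, sigmas
```

Each fitted component then corresponds to one coverage wave, with its mean locating the wave's center in simulation time.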

Slide 10


Different Scenarios

Four tests, two different scenarios

(DSI_EAO 456 and 163 and ATOMIC 58 and 20 )

What we have learned so far is that coverage happens in waves. What we show now is that the waves differ from one scenario to another, but are consistent for a given scenario file. Intuitively this is easy to explain. If a scenario stresses caches, it will eventually trigger the same activity, opening a similar "wave" throughout the hardware. A different scenario, going for a different activity, will reach it in a different way, and that activity can open a larger area of new events. Here we compare two scenarios, with two tests each.

Slide 11


Scenarios to Generate Certain Waves

Particular wave(s) targeted by each scenario =>

Focus on the Hard-To-Hit waves for each scenario

We notice that, looking at the "wave" pattern, there is a huge difference between these two scenarios, but the pattern is rather consistent across tests generated with the same scenario.

Slide 12

For each scenario:
  – Identify which hard-to-hit wave it targets
  – Identify the conditions under which it succeeds in achieving it


HTH Coverage Wave Windows

[Figure: tests T1–T12, each hitting the wave at a different time; together they define the cycle window likely to see a given wave.]

If we look at tests generated with the same scenario, some will reach the desired functionality, and hence trigger the activity wave, earlier, later, or maybe never. If we run enough such tests we can define a "window" in which that wave is more likely to happen. This means there is a cycle window that our tests should target. Longer tests would be a waste because we have already reached the targeted wave; shorter tests would be a waste because we decrease the chances of reaching it. This is why we look at hard-to-hit events and the probability distribution of those events being hit in a given cycle.

Slide 13


Probability Mass Function

[Figure: probability mass functions over cycle windows for events ID93352, ID41930, ID126982, and ID206127.]

Overall probability => identifies the hard-to-hit cycle windows.

Probability for an event e to be hit during a test, where Nftpt(e) is the number of tests which contain event e and Ntests is the total number of tests:

P(e) = Nftpt(e) / Ntests

Probability mass function p : Cycles → [0,1], where Cycles is the set of cycles in simulation and c is one such cycle; E, the sample space, is the set of all possible outcomes, and e is one element of E. The probability distribution of an event C happening at cycle c:

p(c) = Pr(C = c) = Pr({e ∈ E : C(e) = c}),  with  Σ_{c ∈ Cycles} p(c) = 1
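These definitions translate directly into code. The sketch below uses hypothetical names: `first_hits` maps each test id to the cycle of the event's first hit (tests that never hit the event are absent), and cycles are grouped into fixed-size buckets to form the windows.

```python
def event_hit_probability(first_hits, n_tests):
    """P(e) = Nftpt(e) / Ntests: fraction of tests in which event e was hit."""
    return len(first_hits) / n_tests

def cycle_pmf(first_hits, bucket=100):
    """Empirical probability mass function over cycle buckets; values sum to 1."""
    counts = {}
    for cycle in first_hits.values():
        b = cycle // bucket
        counts[b] = counts.get(b, 0) + 1
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def likely_window(first_hits, bucket=100):
    """Cycle window (bucket) with the highest probability of seeing the event."""
    pmf = cycle_pmf(first_hits, bucket)
    best = max(pmf, key=pmf.get)
    return best * bucket, (best + 1) * bucket
```

`likely_window` returns the cycle range the tests should target, which is the quantity the next slides use to tune the test load.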

Slide 14


HTH-FTPT Load Dependency

[Figure: FTPT event counts per cycle for a shorter test load (roughly 9,000 cycles) and a longer one (roughly 29,000 cycles).]

What we have learned so far is that, if we identify the window of opportunity for the wave we are interested in, we can decide how long or short the tests should be to increase our likelihood of hitting that wave without wasting resources. We can control a single variable in the test case generation process that determines how many instructions the test will have. The higher the number, the more loaded with instructions the test will be, even if it follows the same pattern as another test with fewer instructions.

Slide 15


Experimental Test Size to HTH Coverage

40 tests; TM scenario; 40,000+ cycles vs. original size

We extracted experimentally, for a complicated transactional memory scenario, its coverage as a function of the size of the test (i.e., how many cycles it runs). We easily notice that the original load was not large enough to reach two important waves of activity, and only by increasing the load did we enable the tests to reach those areas. What we learned was that with fewer tests we increased our coverage, each test being a hit, where we used to run many tests with no result. For some of the scenario files we needed to increase the load to find the "sweet spot"; for others, to decrease it. This analysis allowed us to suggest a load that should enable more efficient coverage. We implemented a feedback mechanism which uses a minimalistic implementation of such an analysis and uses the dynamic values to adjust the test case generation accordingly.
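The feedback idea, scaling the one knob (instruction count) so a typical test just covers the hard-to-hit probability window, can be sketched as below. Everything here is a hypothetical illustration: the names, the median as "typical", and the assumption that cycles grow roughly linearly with instruction count.

```python
def adjust_load(current_instructions, window_end_cycle, observed_cycles, margin=1.1):
    """Suggest a new instruction count so that a typical test runs just past
    the end of the hard-to-hit window: longer wastes simulation cycles,
    shorter risks missing the wave entirely."""
    observed = sorted(observed_cycles)
    typical = observed[len(observed) // 2]  # median test length in cycles
    target = window_end_cycle * margin      # land slightly past the window
    scale = target / typical                # assume cycles ~ instructions
    return max(1, round(current_instructions * scale))
```

A feedback loop would re-run this after each batch of tests, feeding the newly observed cycle counts and window estimates back into the test case generator.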

Slide 16

Summary

  • Coverage Efficiency
  • Observe
    – Coverage in Time
    – FTPT Coverage in Time
    – HTH Coverage Waves
    – Mixture Model Fitting
    – Probability Distributions
  • Control
    – Test case number of instructions
  • Industry results

Slide 17

Results

  • Decreased hard-to-hit events by 12%
    – 73,000 to 64,000
  • Never-hit-before events decreased by 13%
    – 15,000 to 13,000, saving 18 person-months
  • Less redundancy on easy-to-hit coverage
  • Shifted manual work to the automatic process
  • Decreased time to achieve targeted coverage => enabled finding bugs earlier

The results we got were impressive. With a simple tuning of our default values per scenario, we achieved a 12% decrease in hard-to-hit events and a 13% decrease in never-hit events. This means many events that nobody will have to go and hunt down independently. This is on top of removing the useless redundancy of covering the same events over and over again.

Slide 18

Acknowledgments

University of Texas

Adnan Aziz

IBM

Wolfgang Roesner


Slide 19

References

  • Adir, A., Almog, E., Fournier, L. & Eitan, M., 2004. Genesys-Pro: innovations in test program generation for functional processor verification. IEEE Design & Test of Computers, 21(2), pp. 84-93.
  • Benjamin, M., Geist, D., Hartman, A. & Wolfsthal, Y., 1999. A study in coverage-driven test generation. New Orleans, IEEE, pp. 970-975.
  • Bergeron, J., Nightingale, A., Cerny, E. & Hunter, A., 2006. Coverage-Driven Verification. In: Verification Methodology Manual for SystemVerilog. Springer, pp. 259-280.
  • Bishop, C. M., 2006. Pattern Recognition and Machine Learning. Springer.
  • Bruce, W., Gross, C. J. & Roesner, W., 2005. Comprehensive Functional Verification: The Complete Industry Cycle. Amsterdam: Elsevier/Morgan Kaufmann.
  • Dit, B., Revelle, M., Gethers, M. & Poshyvanyk, D., 2013. Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25(1), pp. 53-59.
  • Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996. From Data Mining to Knowledge Discovery in Databases. Artificial Intelligence, 17(3).
  • Foster, H., 2013. Wilson Research Group 2012 Functional Verification Study. [Online] Available at: http://testandverification.com/DVClub/08_Apr_2013/2013AprWRGStudyatDVClubUK.pdf [Accessed 20 05 2014].
  • IBM Research, 2014. Coverage Directed Test Generation. [Online] Available at: http://www.research.ibm.com/haifa/projects/verification/ml_cdg/cdg_sbfv.html [Accessed 20 05 2014].
  • Piziali, A., 2004. Coverage Driven Verification. In: Functional Verification Coverage Measurement and Analysis. Kluwer, pp. 109-136.
  • Sane, M. Solving Modern Verification Challenges for Today's Industry Leaders. [Online] Available at: http://chipdesignmag.com/display.php?articleId=4503 [Accessed 20 05 2014].