Slide 1

Mining Coverage Data for Test Set Coverage Efficiency

Bryan Hickerson, Monica Farkash (presenter), Mike Behm, Balavinayagam Samynathan
IBM Austin / UT Austin
Slide 2
Outline
- Coverage Efficiency
- Coverage in Time
- First Time Per Test Coverage
- Hard To Hit Coverage
- Coverage Distribution
- Scenarios to Waves
- Wave Windows of Probability
- Controlling the Test Load
- Results & Conclusion
- Acknowledgments / References
Slide 3
Coverage Efficiency
- 12,000 scenario files
- Millions of tests
- Coverage
  – All events: 150k
  – Hard-to-Hit: 73k (fewer than 2,000 hits per 1M tests)
  – Never-Hit: 15k
- Coverage-driven verification
- Coverage-driven test case generation
- Graph-based test case generation (automatic or manual targeting)
[Figure: scenario files S1, S2, …, Sk are input to the test case generator (TCG); each scenario S1 yields n generated test files T11, T12, …, T1n, which run on the simulator to produce coverage data.]
A test case generator receives as input a file containing the scenario that the user desires as a pattern for generating tests. It uses it to generate as many tests as desired, longer or shorter, all following the given pattern. For each scenario we can generate tests, and for each such test we gather coverage information as a witness of the impact the test had on the design (as in the functions it exercised). The PowerPC methodology uses more than 12 thousand such scenario files while generating millions of tests. It collects information on about 150 thousand coverage events. Each coverage event is classified by how many times it was hit in the moving window of the last 1 million tests; the classification used here considers events hit fewer than 2,000 times as hard-to-hit. We notice an approximate 10% ratio of never-hit events. (A classification sketch follows.)
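To make the classification concrete, here is a minimal sketch in Python. The window size and the 2,000-hit threshold come from the slide; the data layout (a list of (test_id, event_id) hit records) and all names are illustrative assumptions, not the production tooling.

```python
# Sketch: classify coverage events by hit count in the most recent
# window of tests, following the thresholds described above.
from collections import Counter

WINDOW_TESTS = 1_000_000       # moving window: last 1M tests (from the slide)
HARD_TO_HIT_THRESHOLD = 2_000  # fewer hits than this => hard-to-hit

def classify_events(all_events, recent_hits):
    """all_events: iterable of event IDs.
    recent_hits: (test_id, event_id) hit records for the last WINDOW_TESTS tests."""
    counts = Counter(event_id for _, event_id in recent_hits)
    classes = {}
    for event in all_events:
        hits = counts.get(event, 0)
        if hits == 0:
            classes[event] = "never-hit"
        elif hits < HARD_TO_HIT_THRESHOLD:
            classes[event] = "hard-to-hit"
        else:
            classes[event] = "often-hit"
    return classes
```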
Slide 4

Coverage Efficiency
used to drive the verification process

- Coverage
  – Never-Hit
  – Hard-to-Hit
  – Often-Hit => redundancy
- Efficiency
  – Achieve the coverage goal with fewer resources
  – Reduce redundancy
- Observe
  – Summarization, model identification, probability
- Control
  – Control the test case generation
Coverage efficiency shows how quickly we cover the coverage event list given existing resources. Most of the time we talk in terms of two lists: the never-hit and the hard-to-hit events. Still, to increase the efficiency of our verification process we need to reduce the redundancy in our often-hit events too. To increase the efficiency of our methodology, the first step is to observe how coverage happens. We did so using summarization, model identification, and probability analysis. The next step is to use what we learned from the data to control the verification process to our benefit. We used that information to control only one very simple parameter in the process of test case generation, with spectacular impact on the resulting efficiency.
Slide 5
Coverage in Time
Same scenario:
- Semaphores (locking mechanism)

Same load:
- Number of instructions
- Number of cycles
The first thing we looked at was how coverage happens in time. We chose a few test cases generated from the same scenario input file, meaning they all followed the same high-level pattern, and followed the evolution of coverage in time. For this we simply put the names of the signals that represent coverage monitors in the list of signals to be recorded during simulation. What we show here is the number of coverage events hit in the same simulation cycle, for 10 tests. The scenario is a locking mechanism, and even though we could guess that a lack of activity could mean locking, that is all we could guess. We did not learn much about how efficient coverage is while running tests. (A minimal counting sketch is shown below.)
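As a rough illustration of the counting behind this plot, here is a small sketch. The trace layout (monitor name mapped to the cycles in which it fired) is an assumed, simplified stand-in for the recorded simulation signals.

```python
# Sketch: tally how many coverage monitors fire in each simulation
# cycle of one test, given recorded monitor signals.
from collections import Counter

def hits_per_cycle(trace):
    """trace: {monitor_name: [cycles in which the monitor fired]}.
    Returns {cycle: number of coverage events hit in that cycle}."""
    counts = Counter()
    for monitor, cycles in trace.items():
        for cycle in cycles:
            counts[cycle] += 1
    return counts
```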
Slide 6
First Time Per Test Coverage
[Figure: FTPT coverage over time for Test A and Test B, generated from the same scenario (DSI_EAO).]
To increase our understanding, we decided to remove the redundancy from our plotting. We post-process the data and keep only the first time a given coverage event is hit. We plot it again in time, as in how many events were covered in a given cycle for the first time. We can remove redundancy because we can assume there is less value in a coverage event being hit many times in the same test than in that event being hit in different tests. By removing redundancy we start to learn how real coverage efficiency happens. We notice that there is a rather large wave of first-time-per-test (FTPT) coverage followed by other, smaller waves later on. We also notice that, even though the overall coverage seems impressive, the part that actually matters is, comparatively, not much. This would mean that the most efficient tests would be the shortest ones, those with the highest ratio of FTPT coverage per cycle, with the big problem that some coverage events would never happen in that time frame. (A post-processing sketch follows.)
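A minimal sketch of this FTPT post-processing, assuming hit records arrive as (event_id, cycle) pairs in simulation order; the record format and function names are illustrative.

```python
# Sketch: keep only the first cycle in which each event is hit
# within one test ("first time per test", FTPT).
from collections import Counter

def ftpt(hits):
    """hits: iterable of (event_id, cycle) pairs, in simulation order.
    Returns {event_id: first cycle it was hit in this test}."""
    first_hit = {}
    for event_id, cycle in hits:
        if event_id not in first_hit:
            first_hit[event_id] = cycle
    return first_hit

def ftpt_per_cycle(hits):
    """Number of events covered for the first time, per cycle."""
    return Counter(ftpt(hits).values())
```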
Slide 7

HTH Coverage in Time and FTPT

A LARX_STCX Test
We come back to our real problem, the hard-to-hit events, and try to understand where they happen. We plot the hard-to-hit events against all events, first overall and then restricted to the FTPT events. We already know that, according to our methodology's definition of a hard-to-hit event, approximately half of all events are classified as such; this is visible in both comparisons. What we also notice is that the FTPT coverage keeps the same shape for hard-to-hit events as for all events. This means we can again state that the hard-to-hit events are reached in "waves" throughout the test.
Slide 8
FTPT Gamma Distribution

[Figure: FTPT event counts over cycles for a LARX_STCX test, with a fitted gamma distribution.]
What we have learned so far is that the FTPT hard-to-hit events follow the shape of waves. We identify the waves as gamma distributions. This reinforces the shape we generally use to show how coverage is achieved. So far we know that there are "coverage waves" that arrive throughout a test. We can explain them as areas being "opened" by certain activities. For example, after a few memory operations with a given relation between addresses, a cache operation is triggered. That cache activity is new to the test, hence it "triggers" a new wave. (A fitting sketch follows.)
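As an illustration, a gamma distribution can be fitted to the FTPT hit cycles of one wave with SciPy's maximum-likelihood fit. The synthetic data below stands in for cycles extracted from a real trace; nothing here reproduces the exact fit from the slide.

```python
# Sketch: fit a gamma distribution to the cycles at which FTPT
# events occur within one coverage wave.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in data: one wave of FTPT hit cycles (replace with trace data).
wave_cycles = rng.gamma(shape=4.0, scale=50.0, size=400)

shape, loc, scale = stats.gamma.fit(wave_cycles)
print(f"gamma fit: shape={shape:.2f}, loc={loc:.1f}, scale={scale:.1f}")
```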
Slide 9
Mixture of Coverage Waves

- Expectation-Maximization (EM) algorithm to identify the mixture of Gaussians
- Waves show the exercising of a new area in the design
- We do not target coverage, we target coverage waves

[Figure: number of FTPT events per cycle, decomposed into a mixture of coverage waves.]
We checked whether this holds for the rest of the test, not only the obvious first wave. We used a model identification algorithm and, for ease, approximated the gamma distributions with Gaussians. We learned that the rest of the FTPT coverage also fits waves, throughout the whole test, no matter how far from the "initial" wave they are. (A mixture-fitting sketch follows.)
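A sketch of this step using scikit-learn's GaussianMixture, which implements the EM algorithm. The three-component choice and the synthetic three-wave data are illustrative assumptions; in practice the number of components could be selected with a criterion such as BIC.

```python
# Sketch: model the FTPT-vs-cycle profile as a mixture of Gaussians
# fitted with EM (scikit-learn's GaussianMixture).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in data: three "waves" of FTPT first-hit cycles.
ftpt_cycles = np.concatenate([
    rng.normal(200, 30, 300),
    rng.normal(900, 60, 120),
    rng.normal(1600, 80, 60),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(ftpt_cycles)
for mean, weight in zip(gmm.means_.ravel(), gmm.weights_):
    print(f"wave centered at cycle {mean:.0f}, weight {weight:.2f}")
```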
Slide 10
Different Scenarios
Four tests, two different scenarios
(DSI_EAO 456 and 163; ATOMIC 58 and 20)
What we have learned so far is that coverage happens in waves. What we show now is that the waves differ from one scenario to another but are consistent within a given scenario file. Intuitively, this is easy to explain. If a scenario stresses the caches, it will eventually trigger the same activity, opening a similar "wave" throughout the hardware. A different scenario, going after a different activity, will reach it in a different way, and that activity can open a larger area of new events. Here we compare two scenarios, with two tests each.
Slide 11
Scenarios to Generate Certain Waves
Particular wave(s) targeted by each scenario => focus on the hard-to-hit waves for each scenario
We notice that, looking at the "wave" pattern, there is a huge difference between these two scenarios, yet the pattern is rather consistent across tests generated from the same scenario.
Slide 12

HTH Coverage Wave Windows

- For each scenario
  – Identify which hard-to-hit wave it targets
  – Identify the conditions under which it succeeds in reaching it

[Figure: wave onset cycles for tests T1-T12 of the same scenario; the cycle window in which a given wave is likely to be seen.]
If we look at the tests generated with the same scenario, some reach the desired functionality, and hence trigger the activity wave, earlier, some later, and some never. If we run enough such tests we can define a "window" in which that wave is more likely to happen. This means there is a cycle window that we should target with our tests. Longer tests would be a waste because we have already reached the targeted wave; shorter tests would be a waste because we decrease the chances of reaching that wave at all. This is why we look at hard-to-hit events and the probability distribution of those events being hit in a given cycle. (A window-estimation sketch follows.)
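One simple way to derive such a window, sketched below: collect the wave onset cycle for each test of a scenario (None when the wave never appears) and take an inner percentile range. The 10th/90th percentile rule is an illustrative heuristic, not the method from the slides.

```python
# Sketch: estimate the cycle window in which a targeted wave is
# likely to appear, from observed onsets across many tests.
import numpy as np

def wave_window(onset_cycles, lo=10, hi=90):
    """onset_cycles: onset cycle per test, or None if the wave never appeared.
    Returns (window_start, window_end) as percentiles of observed onsets."""
    observed = np.array([c for c in onset_cycles if c is not None])
    return np.percentile(observed, lo), np.percentile(observed, hi)

# Example: onsets from 12 tests, two of which never reached the wave.
onsets = [410, 388, 460, None, 402, 455, 430, None, 395, 470, 415, 440]
print(wave_window(onsets))  # -> (394.3, 461.0)
```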
Slide 13
Probability Mass Function
[Figure: probability mass functions over cycles for events ID93352, ID41930, ID126982, and ID206127, plus the overall distribution.]

Probability => identifies the hard-to-hit cycle windows
Probability for an event $e$ to be hit during a test:

$$P(e) = \frac{N_{\mathrm{ftpt}}(e)}{N_{\mathrm{tests}}}$$

where $N_{\mathrm{ftpt}}(e)$ is the number of tests that contain event $e$ and $N_{\mathrm{tests}}$ is the total number of tests.

Probability mass function $p : \mathit{Cycles} \to [0,1]$: the probability distribution of an event $C$ happening at cycle $c \in \mathit{Cycles}$, where $\mathit{Cycles}$ is the set of cycles in simulation, $E$ is the sample space (the set of all possible outcomes), and $e \in E$:

$$p(c) = \Pr(C = c) = \Pr(\{e \in E : C(e) = c\}), \qquad \sum_{c \in \mathit{Cycles}} p(c) = 1.$$
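Empirical versions of these two quantities can be computed directly from the FTPT data, as in this sketch; the data layout (one {event: first-hit cycle} dict per test) is an assumption.

```python
# Sketch: empirical hit probability P(e) and per-cycle PMF p(c)
# computed from FTPT data across many tests.
from collections import Counter

def hit_probability(event, ftpt_by_test):
    """P(e) = N_ftpt(e) / N_tests: fraction of tests that hit `event`.
    ftpt_by_test: list of {event_id: first-hit cycle} dicts, one per test."""
    n_hit = sum(1 for ftpt in ftpt_by_test if event in ftpt)
    return n_hit / len(ftpt_by_test)

def cycle_pmf(event, ftpt_by_test):
    """p(c): probability that `event` is first hit at cycle c,
    normalized over the tests that hit it (values sum to 1)."""
    cycles = [ftpt[event] for ftpt in ftpt_by_test if event in ftpt]
    counts = Counter(cycles)
    total = len(cycles)
    return {c: n / total for c, n in counts.items()}
```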
Slide 14
HTH-FTPT Load Dependency
[Figure: FTPT events over cycles for two test loads, one running to roughly 9,000 cycles and one to roughly 29,000 cycles.]
What we have learned so far is that, if we identify the window of opportunity for the wave we are interested in, we can decide how long or short the tests should be to increase our likelihood of hitting that wave without wasting resources. We can control a single variable in the test case generation process that determines how many instructions the test will have. The higher the number, the more loaded with instructions the test will be, even if it follows the same pattern as another test with fewer instructions.
Slide 15
Experimental Test Size to HTH Coverage
[Figure: HTH coverage as a function of test size; 40 tests of a transactional-memory (TM) scenario, original size vs. 40,000+ cycles.]
We extracted experimentally, for a complicated transactional-memory scenario, its coverage as a function of the size of the test (as in how many cycles it runs). We easily notice that the original load was not large enough to reach two important waves of activity; only by increasing the load did we enable the tests to reach those areas. We learned that with fewer tests we increased our coverage, each test being a hit, where we used to run many tests with no result. For some of the scenario files we needed to increase the load to find the "sweet spot", for others to decrease it. This analysis allowed us to suggest a load that should enable more efficient coverage. We implemented a feedback mechanism that uses a minimalistic implementation of such an analysis and uses the dynamic values to adjust the test case generation accordingly. (A sketch of such a feedback rule follows.)
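A sketch of the kind of feedback rule such a mechanism might apply; the adjustment factors, thresholds, and names are illustrative assumptions, not the production implementation.

```python
# Sketch: after each batch of tests, compare the observed wave window
# against the test length and nudge the generator's instruction-count knob.
def adjust_instruction_count(current_count, test_end_cycle, window):
    """window: (start_cycle, end_cycle) where the targeted HTH wave is likely.
    Returns a suggested instruction count for the next batch of tests."""
    win_start, win_end = window
    if test_end_cycle < win_end:
        return int(current_count * 1.2)   # too short: tests end before the wave
    if test_end_cycle > 1.5 * win_end:
        return int(current_count * 0.8)   # too long: cycles past the wave are waste
    return current_count                  # inside the sweet spot

# Example: 5,000-instruction tests end near cycle 30,000, but the wave
# window is (34,000, 42,000) -> suggest a larger load.
print(adjust_instruction_count(5_000, 30_000, (34_000, 42_000)))  # -> 6000
```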
Slide 16
- Coverage Efficiency
- Observe
– Coverage in Time – FTPT Coverage in Time – HTH – Coverage waves Mixture – Model Fitting – Probability distributions
- Control
– Test case number of instructions
- Industry results
Summary
16
Slide 17

Results
- Decreased hard-to-hit by 12%
– 73,000 to 64,000
- Never-hit-before events decreased by 13%
  – 15,000 to 13,000, saving 18 person-months
- Less redundancy on easy-to-hit coverage.
- Shifted manual work to the automatic process
- Decreased time to achieve targeted coverage =>
enabled finding bugs earlier.
The results we got were impressive. With a simple tuning of our default values per scenario, we achieved a 12% decrease in hard-to-hit events and a 13% decrease in never-hit events. This means many events that nobody will have to go hunt down independently. This is on top of removing the useless redundancy of covering the same events over and over again.
Slide 18
Acknowledgments
University of Texas
Adnan Aziz
IBM
Wolfgang Roesner
Slide 19

References

- Adir, A., Almog, E., Fournier, L. & Eitan, M., 2004. Genesys-Pro: innovations in test program generation for functional processor verification. IEEE Design & Test of Computers, 21(2), pp. 84-93.
- Benjamin, M., Geist, D., Hartman, A. & Wolfsthal, Y., 1999. A study in coverage-driven test generation. New Orleans, IEEE, pp. 970-975.
- Bergeron, J., Nightingale, A., Cerny, E. & Hunter, A., 2006. Coverage-Driven Verification. In: Verification Methodology Manual for SystemVerilog. Springer, pp. 259-280.
- Bishop, C. M., 2006. Pattern Recognition and Machine Learning. Springer.
- Wile, B., Goss, J. C. & Roesner, W., 2005. Comprehensive Functional Verification: The Complete Industry Cycle. Amsterdam: Elsevier/Morgan Kaufmann.
- Dit, B., Revelle, M., Gethers, M. & Poshyvanyk, D., 2013. Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25(1), pp. 53-59.
- Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P., 1996. From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3).
- Foster, H., 2013. Wilson Research Group 2012 Functional Verification Study. [Online] Available at: http://testandverification.com/DVClub/08_Apr_2013/2013AprWRGStudyatDVClubUK.pdf [Accessed 20 May 2014].
- IBM Research, 2014. Coverage Directed Test Generation. [Online] Available at: http://www.research.ibm.com/haifa/projects/verification/ml_cdg/cdg_sbfv.html [Accessed 20 May 2014].
- Piziali, A., 2004. Coverage Driven Verification. In: Functional Verification Coverage Measurement and Analysis. Kluwer, pp. 109-136.
- Sane, M. Solving Modern Verification Challenges for Today's Industry Leaders. [Online] Available at: http://chipdesignmag.com/display.php?articleId=4503 [Accessed 20 May 2014].