Improving Trace Accuracy through Data-Driven Configuration and - PowerPoint PPT Presentation

Improving Trace Accuracy through Data-Driven Configuration and Composition of Tracing Features Barleen Kaur COMP 762

What is traceability?
Traces are navigable links between data held in software artifacts (such as requirements documents, design documents, code, and test cases) that are otherwise disconnected [1].
Requirements traceability matrix: a matrix that shows which pairs of artifacts are associated, and via which link.
[1] Software and Systems Traceability by Jane Cleland-Huang et al.

Algorithms for traceability
Without tool support, developers need to go through each document/artifact thoroughly and generate the links by hand. Manual maintenance of traceability links is an error-prone and laborious job.
Algorithms to semi-automate the process of creating trace links:
1) Vector space model (VSM)
2) Probabilistic approaches
3) Latent semantic indexing (LSI)
4) Rule-based approaches, and so on
Problem: different algorithms work best for different datasets.

Motivation
● Manual creation and maintenance of trace links is an error-prone and laborious job.
● There is no one-size-fits-all solution: finding the best configuration of tracing techniques for a specific dataset can lead to significant improvements in the quality of the generated links.
● The best configuration of existing techniques, rather than a single technique that performs inadequately on a specific dataset, should serve as the baseline for new techniques.
Goal: find the best combination of existing traceability techniques in order to generate accurate trace links for the specific dataset at hand.

Dynamic Trace Configuration (DTC): High-Level Architecture
Selection of the best configuration dynamically at run-time for a specific dataset.
[Architecture diagram] Inputs: a training set of source and target artifacts and validated trace links; existing tracing techniques. DTC: a genetic algorithm (to search through the space of viable configurations intelligently). Output: the best/top configuration generating quality links for a given dataset and feature model.
Elements of DTC:
1) Feature model
2) Simulation environment to generate trace links from a configuration for a dataset
3) Intelligent search algorithm

Feature Model
The feature model is represented using the Textual Variability Language (a non-graphical, text-based language, like C). Goals: to be scalable, succinct, modular, and comprehensible. It has its own grammar rules and semantics.
Preprocessors:
● Acronym expander: “RBAC” -> “role-based access control”
● Stemmer: reduces inflected word forms to their base form, e.g. bank, banking -> bank
● Stopper: removes commonly occurring words that don’t convey significant meaning, such as “the” and “this”
● Dynamic stopper: generates the stopword list dynamically instead of using a precompiled list
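The preprocessors listed above can be sketched as small functions chained together. This is only an illustration: the acronym table, the stopword list, and the toy suffix-stripping stemmer are assumptions made for the example, not the components used in the paper (a real setup would more likely use an established stemmer such as Porter's).

```python
import re

# Hypothetical lookup tables, for illustration only.
ACRONYMS = {"rbac": "role based access control"}
STOPWORDS = {"the", "this", "a", "an", "of", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def expand_acronyms(tokens):
    """Replace known acronyms with the words they stand for."""
    out = []
    for tok in tokens:
        out.extend(ACRONYMS.get(tok, tok).split())
    return out

def stop(tokens):
    """Drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]

def stem(tokens):
    """Toy stemmer: strips a trailing 'ing' or plural 's' (not Porter)."""
    out = []
    for t in tokens:
        if t.endswith("ing") and len(t) > 5:
            t = t[:-3]
        elif t.endswith("s") and not t.endswith("ss") and len(t) > 3:
            t = t[:-1]
        out.append(t)
    return out

def preprocess(text):
    """Apply the preprocessors in a fixed order."""
    return stem(stop(expand_acronyms(tokenize(text))))

print(preprocess("The RBAC banking"))
```

A dynamic stopper could be sketched in the same style by building STOPWORDS from the terms that occur in most artifacts of the dataset instead of hard-coding it.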

Feature Model
Dictionary builder:
● Local tf-idf: based on terms present in the artifacts
● ANC: based on terms in the American National Corpus
Trace algorithms: generate the trace links using VSM (tf-idf vector representation, then cosine similarity) or LSI (uses SVD to match queries and documents by meaning).
Ordering of trace links:
● Ranked order: based on similarity scores
● Incremental approach: incremental feedback is needed to decide the order
● Direct query manipulation: the query is modified until the user gets the desired results
In addition to these, the “requires” and “constraints” relationships and the parameters of each feature are also captured.
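A minimal sketch of the VSM option with a local tf-idf dictionary and ranked ordering, assuming the artifacts are already tokenized. The weighting used here (term frequency times log(N/df)) is one common tf-idf variant chosen for the example; the slides do not say which variant the paper uses, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_links(source, targets):
    """Rank target artifacts by similarity to one source artifact."""
    vecs = tfidf_vectors([source] + targets)
    scores = [(i, cosine(vecs[0], v)) for i, v in enumerate(vecs[1:])]
    return sorted(scores, key=lambda s: s[1], reverse=True)

requirement = ["user", "login", "password"]
code_files = [["login", "password", "check"], ["report", "print"]]
print(rank_links(requirement, code_files))
```

The sorted list corresponds to the "ranked order" option: candidate links are presented from the highest similarity score to the lowest.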

Evaluation of generated trace links
Intuition behind Mean Average Precision (MAP): suppose we are searching for images of a flower on an image retrieval system. We get back a list of ranked images (from most likely to least likely), and usually not all of them are correct. So we compute the precision at every correctly returned image and then take the average.
If the returned result is 1, 0, 0, 1, 1, 1, where 1 is an image of a flower and 0 is not, then the precision at every correct point is the number of correct images encountered up to that point (including the current one) divided by the total number of images seen up to that point: 1/1, 0, 0, 2/4, 3/5, 4/6. The AP for this example is (1 + 0.5 + 0.6 + 0.667) / 4 = 0.6917.
For example, an AP of 0.5 could have results like 0, 1, 0, 1, 0, 1, … where every second image is correct, while an AP of 0.333 has 0, 0, 1, 0, 0, 1, 0, 0, 1, … where every third image is correct.
MAP is just an extension, where the mean is taken across the AP scores of many queries.
Source: https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/
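The computation described above can be written down directly. This sketch follows the slide's definition, averaging the precision over the correct items that were actually returned; some definitions divide by the total number of relevant items in the ground truth instead. The function names are chosen for the example.

```python
def average_precision(relevance):
    """relevance: ranked list of 1 (correct) / 0 (incorrect) results."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this correct result: hits so far / items seen so far.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """queries: one ranked relevance list per query."""
    return sum(average_precision(q) for q in queries) / len(queries)

print(round(average_precision([1, 0, 0, 1, 1, 1]), 4))  # 0.6917
print(average_precision([0, 1, 0, 1, 0, 1]))            # 0.5
```

In the tracing setting, each query would be a source artifact and each ranked list the candidate links produced for it, marked correct or incorrect against the validated trace links.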

Pipe and filter architecture
● Components in the pipeline can be turned on or off to produce different configurations.
● Future work: dynamic sequencing of the preprocessor components and/or merging the output of multiple tracing techniques using voting techniques.
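One way to picture components being switched on and off is a fixed list of filters plus a bit string that says which ones are enabled. The filters, the stopword set, and the bit-to-filter mapping below are assumptions made for the example; they only illustrate how a binary configuration could drive a pipe-and-filter pipeline.

```python
STOPWORDS = {"the", "this"}  # hypothetical stopword list

def lowercase(tokens):
    return [t.lower() for t in tokens]

def stopper(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def dedupe(tokens):
    # dict.fromkeys keeps the first occurrence of each token, in order.
    return list(dict.fromkeys(tokens))

# The pipeline order is fixed; a configuration only turns filters on or off.
FILTERS = [("lowercase", lowercase), ("stopper", stopper), ("dedupe", dedupe)]

def config_from_bits(bits):
    """Map a binary string (one bit per filter) to an on/off configuration."""
    return {name: bool(bit) for (name, _), bit in zip(FILTERS, bits)}

def run_pipeline(tokens, enabled):
    """Pass the tokens through every enabled filter, in pipeline order."""
    for name, fn in FILTERS:
        if enabled.get(name, False):
            tokens = fn(tokens)
    return tokens

tokens = ["The", "Bank", "bank"]
print(run_pipeline(tokens, config_from_bits([1, 1, 1])))
print(run_pipeline(tokens, config_from_bits([0, 1, 0])))
```

The two calls give different outputs for the same input, which is the point: each bit string is a distinct configuration that a search algorithm can evaluate.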

Genetic Algorithm
A saying often attributed to Charles Darwin: it is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change.

Intelligent search: Genetic Algorithm
● Initialisation: define the population, where each individual has its own set of chromosomes (binary strings).
● Fitness function: assigns each chromosome a fitness score so that chromosomes can be compared; the paper uses MAP.
● Selection: select fit chromosomes from the population to mate and produce offspring. Always picking only the fittest, however, would make the chromosomes increasingly similar over the next few generations and therefore reduce diversity.
Roulette wheel: divide the wheel into m divisions, where m is the number of chromosomes in the population. The area occupied by each chromosome is proportional to its fitness value. The wheel is spun, and the chromosome whose region stops in front of the fixed point is chosen.
Source: https://www.analyticsvidhya.com/blog/2017/07/introduction-to-genetic-algorithm/
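Roulette wheel selection can be sketched in a few lines: spin to a random position on a wheel whose total size is the sum of all fitness values, then walk through the chromosomes until the running total passes that position. The function name and the fallback for an all-zero population are choices made for this example.

```python
import random

def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        # Every individual has zero fitness: fall back to a uniform pick.
        return rng.choice(population)
    pick = rng.random() * total  # position on the wheel, in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(42)
population = ["0110", "1011", "1110"]
fitness = [0.2, 0.5, 0.3]  # e.g. MAP scores of each configuration
print([roulette_select(population, fitness, rng) for _ in range(5)])
```

Fitter chromosomes are chosen more often, but weaker ones still have a chance, which helps preserve diversity in the population.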

Genetic Algorithm Contd...
Crossover: the reproduction step. A random crossover point is selected and the tails of the two chromosomes are swapped to produce new offspring. This is known as one-point crossover.
Mutation: children do not have exactly the same traits as their parents. Mutation is a random tweak in the chromosome, which also promotes diversity in the population.
The offspring thus produced are evaluated with the fitness function and, if considered fit, replace the less fit chromosomes in the population.
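One-point crossover and bit-flip mutation on binary chromosomes can be sketched as follows. Chromosomes are represented as lists of 0/1 with at least two genes; the mutation rate parameter and the function names are assumptions made for the example.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes at a random point."""
    point = rng.randint(1, len(parent_a) - 1)  # never cut at the very ends
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(chromosome, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

rng = random.Random(7)
a = [0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1]
child_a, child_b = one_point_crossover(a, b, rng)
print(child_a, child_b)
print(mutate(child_a, rate=0.1, rng=rng))
```

With all-zero and all-one parents the crossover point is easy to see: each child is a run of one parent's bits followed by a run of the other's.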

Stopping Criteria
● There is no improvement in the population for over x iterations (5 generations).
● A predefined absolute number of generations has been reached (the 60th generation).
● The fitness function has reached a predefined value.
End result: the highest-performing configuration across all generations is selected.
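The three criteria can be combined into a single check that runs after each generation. This is a sketch: the function name and its signature are hypothetical, the defaults of 5 and 60 are the values quoted on the slide, and "no improvement" is interpreted here as the best fitness of the last 5 generations not exceeding the best seen before them.

```python
def should_stop(best_per_generation, patience=5, max_generations=60, target=None):
    """best_per_generation: best fitness (e.g. MAP) of each generation so far."""
    generation = len(best_per_generation)
    if generation == 0:
        return False
    # Criterion: a predefined number of generations has been reached.
    if generation >= max_generations:
        return True
    # Criterion: the fitness has reached a predefined value.
    if target is not None and best_per_generation[-1] >= target:
        return True
    # Criterion: no improvement over the last `patience` generations.
    if generation > patience:
        recent_best = max(best_per_generation[-patience:])
        earlier_best = max(best_per_generation[:-patience])
        if recent_best <= earlier_best:
            return True
    return False

history = [0.41, 0.47, 0.52, 0.52, 0.52, 0.52, 0.52, 0.52]
print(should_stop(history))  # True
print(max(history))          # 0.52
```

Taking the maximum over the whole history mirrors the end result on the slide: the highest-performing configuration across all generations is kept, not just the best of the final generation.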

Hypothesis Testing
Question: A company has stated that its straw machine makes straws that are 4 mm in diameter. A worker believes that the machine no longer makes straws of that size and samples 100 straws to perform a hypothesis test with 99% confidence.
N = 100, c = 0.99, alpha = 1 - c = 0.01
Null hypothesis: H0 => mean = 4 mm
Alternative hypothesis: Ha => mean is not equal to 4 mm

Source: https://www.youtube.com/watch?v=cW16A7hXbTo

If the P-value is low (below alpha), the null must go! (reject H0) If the P-value is high, the null must fly! (fail to reject H0)
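A small sketch of the decision rule for the straw example, using a two-tailed z-test (a reasonable approximation for a sample of 100). The slides do not give the sample statistics, so the sample mean of 4.1 mm and standard deviation of 0.5 mm below are made-up values for illustration only; the function name is also hypothetical.

```python
import math

def two_tailed_z_test(sample_mean, sample_std, n, mu0, alpha):
    """Return (z, p_value, reject_h0) for H0: mean == mu0."""
    z = (sample_mean - mu0) / (sample_std / math.sqrt(n))
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value, p_value < alpha

# Hypothetical sample statistics (not from the slides).
z, p, reject = two_tailed_z_test(
    sample_mean=4.1, sample_std=0.5, n=100, mu0=4.0, alpha=0.01
)
print(round(z, 2), round(p, 4), reject)
```

With these made-up numbers the P-value is above alpha = 0.01, so the null "flies": there is not enough evidence to reject H0 at the 99% confidence level.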
