Towards Intelligence Augmented Software Traceability, Jin L.C. Guo
Towards Intelligence Augmented Software Traceability
Jin L.C. Guo
School of Computer Science, McGill University
Analysis, Design, Implementation
Requirement
R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message...
... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.
Design
…… ……
Source Code
... ... ... ...
Page 2
Source Code
... ... ... ...
Requirement
R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message ...
... … ...
R459: The BOS shall support at minimum two interface versions
Design
Test Cases
Program Comprehension
Code Inspection
Page 3
Source Code
... ... ... ...
Requirement
R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message ...
... … ...
R459: The BOS shall support at minimum two interface versions
Design
Test Cases
Impact Analysis
Evolution
Maintenance
Requirement Coverage Analysis
Regression Test Selection
Page 4
Source Code
... ... ... ...
Design
Regulatory Compliance/Certification
Regulatory Code/Standards
Test Cases
Requirement
R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message ...
... … ...
R459: The BOS shall support at minimum two interface versions
Page 5
Page 6
Regulatory Compliance/ Certification
Mind the Gap: Assessing the Conformance of Software Traceability to Relevant Guidelines, Patrick Rempel, Patrick Mäder, Tobias Kuschke, and Jane Cleland-Huang, ICSE 2014, Hyderabad, India
DO-178B: Software Considerations in Airborne Systems and Equipment Certification.
Automated Tracing Solution Needed!
Page 7
Human Process
Evaluate, Understand, Review
Page 8
Domain concepts, Terminology, Assumptions, etc.
Requirement
R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message...
... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.
Components
D1: During lamp-out conditions the WIU shall send the current state of the highway signal.
D2: Data points play a significant role in publish-subscribe-based communication.
D3: The BOS subsystem has the following external interfaces....
... … ...
D351: The BOMR Driver provides message logs through the BOMR_Recording interface as log files.
Tracing Network
Guo, J., Cheng, J., Cleland-Huang, J. (2016). Semantically Enhanced Software Traceability Using Deep Learning Techniques. Accepted to ICSE’17. IEEE.
Domain Corpus Sample Links New Links
Page 9
Automated Tracing?
Natural Language Inference
Premise: A soccer game with multiple males playing.
Hypothesis: Some men are playing a sport.
Entailment, Contradiction, Neutral
Informal reasoning Lexical semantic knowledge Variability of linguistic expression
Page 10
Tracing Network
Word Representation Mapping
Source: s1 s2 … sm
Target: t1 t2 … tn
Sentence Semantic Representation
Source: vs1 vs2 … vsm
Target: vt1 vt2 … vtn
Trace Link Evaluation
Source Vs, Target Vt
Plink
Informal reasoning, Lexical semantic knowledge, Variability of linguistic expression
Page 11
Source: s1 s2 … sm
Word Embedding Mapping
RNN Unit, RNN Unit, …, RNN Unit
Source Semantic Vector
Target: t1 t2 … tn
Word Embedding Mapping
RNN Unit, RNN Unit, …, RNN Unit
Target Semantic Vector
Semantic Relation Evaluation Layers
Vector Direction Comparison
Vector Distance Calculation
Integration Layer (Sigmoid)
Probability Generation Layer (Softmax)
Plink
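The evaluation layers named on this slide can be sketched in plain Python. The slide gives only the layer names, so the concrete operations are assumptions: element-wise product for the vector direction comparison, absolute difference for the vector distance calculation, and randomly initialised weights. The names `link_probabilities`, `w_dir`, `w_dist` and `w_out` are hypothetical, and the RNN encoders that would produce the two semantic vectors are not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def link_probabilities(v_s, v_t, w_dir, w_dist, w_out):
    """Return [p_no_link, p_link] for a source/target semantic vector pair."""
    direction = [a * b for a, b in zip(v_s, v_t)]      # assumed: element-wise product
    distance = [abs(a - b) for a, b in zip(v_s, v_t)]  # assumed: absolute difference
    # Integration layer: sigmoid over the two combined comparison signals.
    hidden = [
        sigmoid(d + e)
        for d, e in zip(matvec(w_dir, direction), matvec(w_dist, distance))
    ]
    # Probability generation layer: softmax over the two classes.
    return softmax(matvec(w_out, hidden))


rng = random.Random(0)
dim, hidden_dim = 4, 3
w_dir = random_matrix(hidden_dim, dim, rng)    # untrained, illustrative weights
w_dist = random_matrix(hidden_dim, dim, rng)
w_out = random_matrix(2, hidden_dim, rng)

v_s = [0.2, -0.1, 0.7, 0.4]  # stand-in for the source semantic vector
v_t = [0.3, -0.2, 0.6, 0.1]  # stand-in for the target semantic vector
probs = link_probabilities(v_s, v_t, w_dir, w_dist, w_out)
p_link = probs[1]
print(round(p_link, 3))
```

With untrained weights the printed probability carries no meaning; the point is only the data flow from two semantic vectors to a single link probability.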
Page 12
Dataset and Experiment
Positive Train Control (PTC) domain
Page 13
Source Artifact: 1651 Subsystem Requirements, 33 tokens on average
Target Artifact: 466 Subsystem Design Descriptions, 99 tokens on average
1387 Trace Links
769,366 pairs
PTC Domain Document: 52.7MB clean text
Training: 45%, 80%
Validating: 10%
Testing: 45%, 10%
Compare with: VSM, LSI
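For orientation, a VSM baseline can be sketched as cosine similarity between term-weight vectors. The slide names VSM only, so the TF-IDF weighting and tokenisation below are assumptions, and the helper names are hypothetical. The example texts are R1, D1 and D351 from the earlier slides; the sketch says nothing about which of them are true trace links.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Assumed weighting: relative term frequency times smoothed IDF.
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


requirement = "The Highway Wayside Segment shall monitor signal data"
designs = {
    "D1": "During lamp-out conditions the WIU shall send the current state of the highway signal.",
    "D351": "The BOMR Driver provides message logs through the BOMR_Recording interface as log files.",
}

vectors = tfidf_vectors([requirement] + list(designs.values()))
scores = {
    design_id: cosine(vectors[0], vector)
    for design_id, vector in zip(designs, vectors[1:])
}
# Candidate targets ordered by decreasing similarity to the requirement.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the score depends only on shared terms, a requirement and a design that describe the same behaviour in different words would score low, which is the term-mismatch limitation the tracing network is meant to address.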
Page 14
Figure: Precision-Recall Curve (axes: Precision, Recall; series: Tracing Network trained by 45% data, Tracing Network trained by 80% data, VSM, LSI)
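The points of a precision-recall curve like this one can be computed from a ranked list of candidate links. The sketch below is a generic illustration, not the evaluation script behind the slide; the function name and the artifact IDs are hypothetical.

```python
def precision_recall_points(ranked_links, true_links):
    """Return (precision, recall) after each cutoff of a ranked candidate list."""
    points = []
    hits = 0
    for cutoff, link in enumerate(ranked_links, start=1):
        if link in true_links:
            hits += 1
        points.append((hits / cutoff, hits / len(true_links)))
    return points


# Hypothetical candidate links, already sorted by decreasing link probability.
ranked = [("R1", "D1"), ("R1", "D7"), ("R2", "D3"), ("R3", "D9")]
# Hypothetical ground-truth links.
truth = {("R1", "D1"), ("R2", "D3")}

points = precision_recall_points(ranked, truth)
for precision, recall in points:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Each cutoff contributes one point, so lowering the acceptance threshold moves along the curve toward higher recall and, typically, lower precision.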
Software Tracing
Source Artifact: 1651 Subsystem Requirements, 33 tokens on average
Target Artifact: 466 Subsystem Design Descriptions, 99 tokens on average
769,366 pairs, 1387 Links (0.18%)
Natural Language Inference
Premise: 14 tokens on average
Hypothesis: 8 tokens on average
570,152 pairs
Corpus: PTC Domain Document 52.7MB, Wikipedia 19.92GB
Page 15
Page 16
Negative Link Sampling
Source Artifact, Target Artifact: 769,366 pairs
1387 Links (0.18%)
Negative Link Sampling Ratio k = 1, 5, 10
Figure: Precision-Recall Curve (axes: Precision, Recall; series: VSM, LSI, TN_Neg1, TN_Neg5, TN_Neg10)
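The imbalance figures on this slide follow from the dataset sizes, and the sampling ratio k can be illustrated with a small sketch. The slide states only the ratio, so drawing negatives uniformly at random is an assumption; `sample_training_pairs` and the toy artifact IDs are hypothetical.

```python
import random

# Figures taken from the slides.
n_sources, n_targets, n_links = 1651, 466, 1387
n_pairs = n_sources * n_targets
positive_rate = n_links / n_pairs
print(n_pairs, f"{positive_rate:.2%}")


def sample_training_pairs(positive_links, sources, targets, k, rng):
    """Keep every positive link and draw k distinct negative pairs per positive."""
    positives = set(positive_links)
    wanted = k * len(positives)
    negatives = set()
    # Assumed strategy: uniform random draws, rejecting true links and repeats.
    while len(negatives) < wanted:
        pair = (rng.choice(sources), rng.choice(targets))
        if pair not in positives:
            negatives.add(pair)
    labelled = [(pair, 1) for pair in sorted(positives)]
    labelled += [(pair, 0) for pair in sorted(negatives)]
    return labelled


# Toy artifact IDs, small enough to inspect by hand.
sources = [f"R{i}" for i in range(1, 21)]
targets = [f"D{i}" for i in range(1, 11)]
links = [("R1", "D1"), ("R2", "D3"), ("R5", "D2"), ("R9", "D7")]

negative_counts = {}
for k in (1, 5, 10):
    pairs = sample_training_pairs(links, sources, targets, k, random.Random(42))
    negative_counts[k] = sum(1 for _, label in pairs if label == 0)
print(negative_counts)
```

Training on every one of the 769,366 pairs would show the model almost exclusively non-links; fixing k bounds that imbalance at k negatives per positive.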
Page 18
Sentence Embedding
Skip-Thought
Encoder input: I could see the cat on the steps <eos>
Decoder (previous sentence): I got back home <eos>
Decoder (next sentence): This was strange <eos>
Auto-Encoder
Encoder input: I could see the cat on the steps
Decoder: I could see the cat on the steps <eos>
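The two training setups on this slide differ only in what the decoder is asked to produce. A minimal sketch of how the training examples could be assembled from consecutive sentences follows; the function names and dictionary keys are hypothetical, and no encoder or decoder model is included.

```python
EOS = "<eos>"


def skip_thought_examples(sentences):
    """Pair each inner sentence with its neighbours as decoding targets."""
    examples = []
    for i in range(1, len(sentences) - 1):
        examples.append({
            "encode": sentences[i],
            "decode_previous": f"{sentences[i - 1]} {EOS}",
            "decode_next": f"{sentences[i + 1]} {EOS}",
        })
    return examples


def autoencoder_examples(sentences):
    """Pair each sentence with itself as the decoding target."""
    return [{"encode": s, "decode": f"{s} {EOS}"} for s in sentences]


sentences = [
    "I got back home",
    "I could see the cat on the steps",
    "This was strange",
]
skip = skip_thought_examples(sentences)
auto = autoencoder_examples(sentences)
print(skip[0])
print(auto[1])
```

The skip-thought setup needs ordered text, while the auto-encoder setup can use isolated sentences, which may matter when artifacts are short and unordered.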
Source Artifact Source Artifact Target Artifact Target Artifact
Automated Trace Link Generation Techniques Trace Link Creation Tool Support Trace Link Applications Scalable, Trusted Traceability
Page 19
Knowledge Synthesis
Term-Matching, Basic Untyped Associations, Semantic Associations, Expert Systems
Guo, J., Gibiec M., Cleland-Huang, J. (2016). Tackling the Term-Mismatch Problem in Automated Trace Retrieval. EMSE Journal. Springer.
Cleland-Huang, J., Guo, J. (2014). Towards more intelligent trace retrieval algorithms. RAISE'14. ACM.
Cost Effort/Training Data Needed
Accuracy/Informative Rationales
Page 20
Automated Trace Link Generation Techniques Trace Link Creation Tool Support Trace Link Applications Configurable, Cost-effective Traceability
Page 21
Traceability Analyst
Trace Link Generation Engine
Columns: Target Artifact ID, Rank, Relevancy Score, Rationales
TA #134, rank 1, relevancy score 0.884, rationales: Explanations
TA #005, rank 2, relevancy score 0.762, rationales: Explanations
TA #048, rank 3, relevancy score 0.744, rationales: Explanations
… ...
TA #361, rank 8, relevancy score 0.690, rationales: Explanations
Guo, J., Monaikul, N., and Cleland-Huang, J. (2015). Trace Links Explained: An Automated Approach for Generating Rationales. RE’15 Next Track. IEEE.
Page 22
Explain Trace Link
Issue I Commit C File Legend: Artifact Trace Containment Bug Improvement Source Code File F Generalisation
Issue Tracking Repository
ISSUE TRACKER CODE REPO CODE REPO
Add reference to issue #2578: 0.91
Add reference to issue #3489: 0.68
Add reference to issue #19: 0.49
Rath, M., Rendall, J., Guo, J., Cleland-Huang, J., Maeder, P. (2018). Traceability in the Wild: Automatically Augmenting Incomplete Trace Links. ICSE 2018. ACM.
Recommend Issue ID
Automated Trace Link Generation Techniques Trace Link Creation Tool Support Trace Link Applications Purposed, Valued Traceability
Page 25
Syntactic Relatedness Analyser Association Rule Miner Topic Modeler Semantic Relatedness Domain Documents Project Artifacts Trace Links External Source text text entity pair lexical data Heuristic Based Ranking of facts Candidate Facts Present to User Domain Ontology Evaluate and Add to Ontology
Guo, J. (2016). Ontology Learning and its Application in Software-Intensive Projects. ICSE'16 Companion.
Page 26
Domain Knowledge Discovery
ISSUE TRACKER CODE REPO
Impact Analysis
Source Code Analysis CKJM Metrics Code Smells Code Change History Basic Counts Temporal Decay Requirements Content Analysis Requirements Relatedness to Code Metrics Across Multiple Releases Requirement to Requirement Set Similarity
Falessi, D., Roll, J., Guo, J., Cleland-Huang, J. (2018). Leveraging Semantic Similarity between Requirements to Identify Impacted Classes. Under minor revision for TSE Journal. IEEE.
Cleland-Huang, J., Guo, J., Monaikul, N., Lohar, S., Goss, W., and Rasin, A. Using Natural Language Processing to Translate Software Project Queries into Structured Form. NLSE Workshop @ FSE’16
Page 28
Project Q&A
Page 29
Domain Experts Engineer AI Experts/Scientists Assumptions Constraints Learning Models Platform Data