Towards Intelligence Augmented Software Traceability, Jin L.C. Guo


slide-1
SLIDE 1

Towards Intelligence Augmented Software Traceability

Jin L.C. Guo School of Computer Science McGill University

slide-2
SLIDE 2

Source Code

... ... ... ...

Requirement

R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message
... ... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.

Design

…… ……

Analysis Design Implementation

slide-3
SLIDE 3

Source Code

... ... ... ...

Requirement

R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message
... ... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.

Design Test Cases

Program Comprehension Code Inspection

slide-4
SLIDE 4

Source Code

... ... ... ...

Requirement

R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message
... ... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.

Design Test Cases

Impact Analysis Evolution Maintenance Requirement Coverage Analysis Regression Test Selection

slide-5
SLIDE 5

Source Code

... ... ... ...

Design

Regulatory Compliance/ Certification

Regulatory Code/Standards Test Cases

Requirement

R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message
... ... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.

slide-6
SLIDE 6

Regulatory Compliance/ Certification

slide-7
SLIDE 7

Mind the Gap: Assessing the Conformance of Software Traceability to Relevant Guidelines, Patrick Rempel, Patrick Mäder, Tobias Kuschke, and Jane Cleland-Huang, ICSE 2014, Hyderabad, India

DO-178B: Software Considerations in Airborne Systems and Equipment Certification.

Automated Tracing Solution Needed!

slide-8
SLIDE 8

Human Process

Evaluate Understand Review

Domain concepts, Terminology, Assumptions, etc.

Requirement

R1: The Highway Wayside Segment shall monitor signal data
R2: The BOS shall send messages in the order provided by the Sequence Number associated with a subdivision..
R3: The BOS shall set the Message Source field in a CSX Railroad Systems message
... ... … ...
R459: The BOS shall support at minimum two interface versions as defined by CSX ICD.

Components

D1: During lamp-out conditions the WIU shall send the current state of the highway signal.
D2: Data points play a significant role in publish-subscribe-based communication.
D3: The BOS subsystem has the following external interfaces.
... ... … ...
D351: The BOMR Driver provides message logs through the BOMR_Recording interface as log files.
slide-9
SLIDE 9

Tracing Network

Guo, J., Cheng, J., Cleland-Huang, J. (2016). Semantically Enhanced Software Traceability Using Deep Learning Techniques. Accepted to ICSE’17. IEEE.

Domain Corpus Sample Links New Links

slide-10
SLIDE 10

Automated Tracing?

Natural Language Inference

Premise: A soccer game with multiple males playing.
Hypothesis: Some men are playing a sport.
Entailment, Contradiction, Neutral

Informal reasoning, Lexical semantic knowledge, Variability of linguistic expression

slide-11
SLIDE 11

Tracing Network

Word Representation Mapping
Source: s1 s2 … sm
Target: t1 t2 … tn

Sentence Semantic Representation
Source: vs1 vs2 … vsm
Target: vt1 vt2 … vtn

Trace Link Evaluation
Source: Vs
Target: Vt
Plink

Informal reasoning, Lexical semantic knowledge, Variability of linguistic expression

slide-12
SLIDE 12

Source: s1 s2 … sm
Word Embedding Mapping
RNN Unit, RNN Unit, RNN Unit, …
Source Semantic Vector

Target: t1 t2 … tn
Word Embedding Mapping
RNN Unit, RNN Unit, RNN Unit, …
Target Semantic Vector

Semantic Relation Evaluation Layers
Vector Direction Comparison
Vector Distance Calculation
Integration Layer (Sigmoid)
Probability Generation Layer (Softmax)

Plink
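The layers named on this slide can be read as a forward pass from two token sequences to a link probability. The following is a minimal, untrained sketch of that reading, not the network from the cited paper. Several details are assumptions made only for illustration: a plain tanh recurrent unit shared by source and target, element-wise product for "Vector Direction Comparison", absolute difference for "Vector Distance Calculation", random weights, and the class name `TracingNetworkSketch`.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a trained network would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TracingNetworkSketch:
    """Hypothetical illustration of the slide's layer stack (untrained)."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        # Word Embedding Mapping: one vector per known word.
        self.embed = {w: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                      for w in sorted(vocab)}
        self.unknown = [0.0] * embed_dim
        self.hidden_dim = hidden_dim
        self.w_in = rand_matrix(rng, hidden_dim, embed_dim)
        self.w_rec = rand_matrix(rng, hidden_dim, hidden_dim)
        self.w_dir = rand_matrix(rng, hidden_dim, hidden_dim)
        self.w_dist = rand_matrix(rng, hidden_dim, hidden_dim)
        self.w_out = rand_matrix(rng, 2, hidden_dim)

    def encode(self, tokens):
        """RNN units: fold the word vectors into one semantic vector."""
        h = [0.0] * self.hidden_dim
        for tok in tokens:
            x = self.embed.get(tok, self.unknown)
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        return h

    def p_link(self, source_tokens, target_tokens):
        vs = self.encode(source_tokens)
        vt = self.encode(target_tokens)
        direction = [a * b for a, b in zip(vs, vt)]      # assumed: element-wise product
        distance = [abs(a - b) for a, b in zip(vs, vt)]  # assumed: absolute difference
        pre = [a + b for a, b in zip(matvec(self.w_dir, direction),
                                     matvec(self.w_dist, distance))]
        integrated = [sigmoid(p) for p in pre]           # Integration Layer (Sigmoid)
        probs = softmax(matvec(self.w_out, integrated))  # Probability Generation Layer
        return probs[1]                                  # probability of "linked"


source = "the bos shall send messages".split()
target = "the bos subsystem has the following external interfaces".split()
net = TracingNetworkSketch(set(source) | set(target))
p = net.p_link(source, target)
```

With random weights the value of `p` carries no meaning; the sketch only shows how the two semantic vectors are combined before the sigmoid and softmax layers.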

slide-13
SLIDE 13

Dataset and Experiment

Positive Train Control (PTC) domain

Source Artifact: 1651 Subsystem Requirements, 33 tokens on average
Target Artifact: 466 Subsystem Design Descriptions, 99 tokens on average
1387 Trace Links, 769,366 pairs
PTC Domain Document: 52.7MB clean text

Training / Validating / Testing: 45% / 10% / 45% and 80% / 10% / 10%

Compare with VSM, LSI
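For readers unfamiliar with the VSM baseline named above, the sketch below shows one common form of it: TF-IDF weighted term vectors compared by cosine similarity. The slide does not say which weighting or preprocessing was used, so the tokenizer, the smoothed IDF formula, and the function names are assumptions for illustration. The example texts are R1, D1 and D2 from the earlier slides.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed, minimal)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): terms in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(source, targets):
    """Return (score, target index) pairs, highest similarity first."""
    vectors = tfidf_vectors([source] + list(targets))
    src, tgt = vectors[0], vectors[1:]
    scored = [(cosine(src, v), i) for i, v in enumerate(tgt)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


requirement = "The Highway Wayside Segment shall monitor signal data"
designs = [
    "During lamp-out conditions the WIU shall send the current state of the highway signal.",
    "Data points play a significant role in publish-subscribe-based communication.",
]
ranking = rank_targets(requirement, designs)
```

Because this baseline scores only shared terms, it cannot relate "monitor signal data" to a design element that uses different vocabulary, which is the gap the Tracing Network is meant to address.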

slide-14
SLIDE 14

Precision-Recall Curve

[Figure: Precision and Recall axes; series: Tracing Network trained by 45% data, Tracing Network trained by 80% data, VSM, LSI]
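A precision-recall curve of this kind is typically obtained by sweeping a threshold over the scores a tracing method assigns to candidate links. The sketch below shows that computation under stated assumptions: the scores and labels are made-up illustration data, not results from the talk, and precision is defined as 1.0 when nothing is retrieved, which is one convention among several.

```python
def precision_recall_points(scored_links, thresholds):
    """scored_links: list of (score, is_true_link) pairs.

    Returns one (threshold, precision, recall) tuple per threshold, where a
    candidate link counts as retrieved if its score is at least the threshold.
    """
    total_true = sum(1 for _, is_true in scored_links if is_true)
    points = []
    for threshold in thresholds:
        retrieved = [is_true for score, is_true in scored_links if score >= threshold]
        true_positives = sum(1 for is_true in retrieved if is_true)
        # Convention (assumed): precision is 1.0 when no link is retrieved.
        precision = true_positives / len(retrieved) if retrieved else 1.0
        recall = true_positives / total_true if total_true else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical scores for five candidate links, three of which are true links.
example = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.2, True)]
points = precision_recall_points(example, [0.75, 0.5, 0.1])
```

Lowering the threshold retrieves more candidates, so recall can only rise while precision may fall; plotting the resulting pairs gives the curve.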

slide-15
SLIDE 15

Software Tracing
Source Artifact: 1651 Subsystem Requirements, 33 tokens on average
Target Artifact: 466 Subsystem Design Descriptions, 99 tokens on average
769,366 pairs, 1387 Links (0.18%)

Natural Language Inference
Premise: 14 tokens on average
Hypothesis: 8 tokens on average
570,152 pairs

PTC Domain Document: 52.7MB
Wikipedia: 19.92GB

slide-16
SLIDE 16

Negative Link Sampling

Source Artifact, Target Artifact: 769,366 pairs
1387 Links (0.18%)

Negative Link Sampling Ratio k = 1, 5, 10
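The figures on this slide follow from simple arithmetic: 1651 sources times 466 targets gives 769,366 candidate pairs, of which 1387 are true links. One plausible way to apply a sampling ratio k is to keep every true link and draw k non-links per true link. The sketch below assumes uniform random sampling without replacement; the slide does not say how negatives were drawn, and the function name is hypothetical.

```python
import random

# Arithmetic behind the slide's numbers.
total_pairs = 1651 * 466            # every source paired with every target
positive_rate = 1387 / total_pairs  # fraction of pairs that are true links


def sample_training_pairs(sources, targets, true_links, k, seed=0):
    """Keep all true links and draw k random non-links per true link.

    Returns (source, target, label) triples with label 1 for links, 0 otherwise.
    Uniform sampling without replacement is an assumption for illustration.
    """
    rng = random.Random(seed)
    positives = sorted(set(true_links))
    link_set = set(positives)
    negative_pool = [(s, t) for s in sources for t in targets
                     if (s, t) not in link_set]
    n_negatives = min(len(negative_pool), k * len(positives))
    negatives = rng.sample(negative_pool, n_negatives)
    return ([(s, t, 1) for s, t in positives]
            + [(s, t, 0) for s, t in negatives])


# Toy usage with made-up artifact IDs.
pairs = sample_training_pairs(
    sources=["R1", "R2", "R3"],
    targets=["D1", "D2", "D3", "D4"],
    true_links=[("R1", "D1"), ("R2", "D3")],
    k=2,
)
```

With k = 1, 5, or 10 the training set stays within a small multiple of the 1387 links instead of being dominated by the non-linked pairs.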

slide-17
SLIDE 17
Precision-Recall Curve

[Figure: Precision and Recall axes; series: VSM, LSI, TN_Neg1, TN_Neg5, TN_Neg10]

slide-18
SLIDE 18

Sentence Embedding

Skip-Thought
Input: I could see the cat on the steps <eos>
Output: I got back home <eos>
Output: This was strange <eos>

Auto-Encoder
Input: I could see the cat on the steps <eos>
Output: I could see the cat on the steps <eos>

Source Artifact, Target Artifact
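The two sentence-embedding objectives differ only in what the decoder is asked to reproduce. The sketch below builds the training examples for each from a sentence sequence, using the slide's example sentences. It covers data preparation only, not the encoder or decoder, and the function names are illustrative assumptions.

```python
EOS = "<eos>"


def skip_thought_examples(sentences):
    """For each interior sentence, pair it with its previous and next sentence.

    Returns (encoder_input, previous_target, next_target) token-list triples.
    """
    tokenized = [s.split() + [EOS] for s in sentences]
    return [(tokenized[i], tokenized[i - 1], tokenized[i + 1])
            for i in range(1, len(tokenized) - 1)]


def autoencoder_examples(sentences):
    """Each sentence is both the encoder input and the decoder target."""
    tokenized = [s.split() + [EOS] for s in sentences]
    return [(tokens, tokens) for tokens in tokenized]


story = [
    "I got back home",
    "I could see the cat on the steps",
    "This was strange",
]
skip_examples = skip_thought_examples(story)
auto_examples = autoencoder_examples(story)
```

Skip-thought needs ordered running text to supply neighbouring sentences, whereas the auto-encoder objective also works on isolated artifacts that have no meaningful neighbours.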

slide-19
SLIDE 19

Automated Trace Link Generation Techniques
Trace Link Creation Tool Support
Trace Link Applications

Scalable, Trusted Traceability

slide-20
SLIDE 20

Knowledge Synthesis

Term-Matching, Basic Untyped Associations, Semantic Associations, Expert Systems

Cost Effort/Training Data Needed
Accuracy/Informative Rationales

Guo, J., Gibiec, M., Cleland-Huang, J. (2016). Tackling the Term-Mismatch Problem in Automated Trace Retrieval. EMSE Journal. Springer.
Cleland-Huang, J., Guo, J. (2014). Towards more intelligent trace retrieval algorithms. RAISE'14. ACM.

slide-21
SLIDE 21

Automated Trace Link Generation Techniques
Trace Link Creation Tool Support
Trace Link Applications

Configurable, Cost-effective Traceability

slide-22
SLIDE 22

Explain Trace Link

Traceability Analyst, Trace Link Generation Engine

Rank   Target Artifact ID   Relevancy Score
1      TA #134              0.884
2      TA #005              0.762
3      TA #048              0.744
…      ...
8      TA #361              0.690

Rank, Target Artifact ID, Relevancy Score, Rationales: Explanations

Guo, J., Monaikul, N., and Cleland-Huang, J. (2015). Trace Links Explained: An Automated Approach for Generating Rationales. RE’15 Next Track. IEEE.

slide-23
SLIDE 23

Issue I Commit C File Legend: Artifact Trace Containment Bug Improvement Source Code File F Generalisation

Issue Tracking Repository

slide-24
SLIDE 24

ISSUE TRACKER CODE REPO CODE REPO

Add reference to issue #2578 (0.91)
Add reference to issue #3489 (0.68)
Add reference to issue #19 (0.49)

Rath, M., Rendall, J., Guo, J., Cleland-Huang, J., Maeder, P. (2018). Traceability in the Wild: Automatically Augmenting Incomplete Trace Links. ICSE 2018. ACM.

Recommend Issue ID

slide-25
SLIDE 25

Automated Trace Link Generation Techniques
Trace Link Creation Tool Support
Trace Link Applications

Purposed, Valued Traceability

slide-26
SLIDE 26

Syntactic Relatedness Analyser, Association Rule Miner, Topic Modeler, Semantic Relatedness
Domain Documents, Project Artifacts, Trace Links, External Source
text, text, entity pair, lexical data
Heuristic Based Ranking of facts, Candidate Facts, Present to User, Domain Ontology, Evaluate and Add to Ontology

Guo, J. (2016). Ontology Learning and its Application in Software-Intensive Projects. ICSE'16 Companion.

Domain Knowledge Discovery

slide-27
SLIDE 27

ISSUE TRACKER CODE REPO

Impact Analysis

Source Code Analysis CKJM Metrics Code Smells Code Change History Basic Counts Temporal Decay Requirements Content Analysis Requirements Relatedness to Code Metrics Across Multiple Releases Requirement to Requirement Set Similarity

Falessi, D., Roll, J., Guo, J., Cleland-Huang, J. (2018). Leveraging Semantic Similarity between Requirements to Identify Impacted Classes. Under minor revision for TSE Journal. IEEE.

slide-28
SLIDE 28

Cleland-Huang, J., Guo, J., Monaikul, N., Lohar, S., Goss, W., and Rasin, A. Using Natural Language Processing to Translate Software Project Queries into Structured Form. NLSE Workshop @ FSE’16

Project Q&A

slide-29
SLIDE 29

slide-30
SLIDE 30

Domain Experts, Engineer, AI Experts/Scientists, Assumptions, Constraints, Learning Models, Platform, Data

slide-31
SLIDE 31

Thank you!

Jin L.C. Guo jguo@cs.mcgill.ca http://jguo-web.com