SLIDE 1 Submodel Pattern Extraction for Simulink Models
James R. Cordy Queen’s University NECSIS Automotive Partnership Canada
SLIDE 2
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 4
Model Pattern Engineering discover, catalogue and formalize
submodel patterns
emergent
domain-specific, client-specific
SLIDE 5 discovery
analysis, identification methodology, techniques
classification
characterization, formalization notation, tooling, catalogues
application
deployment, analysis
organization, documentation, use cases
SLIDE 6
why?
reuse in model development
standards/consistency analysis/enforcement
failure/change propagation in model maintenance
verification/test optimization
deployment variation/optimization
model product lines
SLIDE 7
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 8
code clones
copy-paste programming
efficient, widely used
problematic
SLIDE 9 code clones
type 1 - exact
bool ConfNextToken (char **p)
{
    // skip white space
    while (1)
        switch (**p) {
            case '\t' : // ignore
            case ' ' :
                (*p)++;
                break;
            case '\0' :
                return FALSE;
            default :
                return TRUE;
        };
}

bool ConfNextToken (char **p)
{
    // skip white space
    while (1)
        switch (**p) {
            case '\t' : // ignore
            case ' ' :
                (*p)++;
                break;
            case '\0' :
                return FALSE;
            default :
                return TRUE;
        };
}

[Roy, Cordy, Koschke SCP 2009]
SLIDE 10 code clones
type 1 - exact
bool ConfNextToken (char **p)
{
    // skip white space
    while (1)
        switch (**p) {
            case '\t' : // ignore
            case ' ' :
                (*p)++;
                break;
            case '\0' :
                return FALSE;
            default :
                return TRUE;
        };
}

bool ConfNextToken (char **p)
{
    while (1)
        switch (**p) {
            case '\t':
            case ' ': // just skip
                (*p)++;
                break;
            case '\0': // eof
                return FALSE;
            default: // something we want
                return TRUE;
        };
}

[Roy, Cordy, Koschke SCP 2009]
SLIDE 11 code clones
type 2 - renamed
bool ConfNextToken (char **p)
{
    // skip white space
    while (1)
        switch (**p) {
            case '\t' : // ignore
            case ' ' :
                (*p)++;
                break;
            case '\0' :
                return FALSE;
            default :
                return TRUE;
        };
}

bool NextToken (char **bp)
{
    while (1) // not really
        switch (**bp) {
            case '\t':
            case ' ': // next
                (*bp)++;
                break;
            case '\0':
                return 0;
            default:
                return 1;
        };
}

[Roy, Cordy, Koschke SCP 2009]
SLIDE 12 code clones
type 3 - near miss
bool ConfNextToken (char **p)
{
    // skip white space
    while (1)
        switch (**p) {
            case '\t' : // ignore
            case ' ' :
                (*p)++;
                break;
            case '\0' :
                return FALSE;
            default :
                return TRUE;
        };
}

bool NextToken (char **bp)
{
    while (1) // not really
        switch (**bp) {
            case '\t':
                (*bp)++
            case ' ':
                break;
            case '\0':
                return 0;
            default:
                return 1;
        }
}

[Roy, Cordy, Koschke SCP 2009]
SLIDE 13 model clones
type 1 - exact
[Figure: the same Simulink subsystem shown twice. Labels in each copy: Fn 1; Torque Conversion (2/3*R*muk); Ratio of static to kinetic (mus/muk); Tfmaxk 1; Tfmaxs 2]
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012] [Störrle SSM 2013]
SLIDE 14 model clones
type 2 - renamed
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012] [Störrle SSM 2013]
SLIDE 15 model clones
type 3 - near miss
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012] [Störrle SSM 2013]
SLIDE 16
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 17 ConQAT
graph-based model clone detection
[Deissenboeck et al. IWSC 2010]
SLIDE 18
graph flattening ignores hierarchical structure
problems with near-miss
SLIDE 20
code-based near-miss works well
NiCad, iClones, others
mature, accurate, efficient
handles unexpected differences
threshold-based, tunable
scalable
SLIDE 21 NiCad
parse - extract - normalize - diff threshold
[Figure: NiCad process. Original Code Base -> 1. Parse / Extract (parsing and potential clone extraction) -> Pretty-printed Potential Clones -> 2. Rename / Filter / Normalize -> Normalized Potential Clones -> cluster comparable-size potential clones -> choose next potential clone as exemplar, pairwise comparison with exemplar (repeat) -> Clone Classes]
[Roy, Cordy ICPC 2008]
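The normalize and diff-threshold steps can be illustrated with a small sketch. This is not NiCad itself: the function names (`normalize`, `difference`, `is_clone`), the comment-stripping rule and the use of Python's `difflib` for the line comparison are all assumptions made for illustration, and the 30% default simply mirrors the "@30%" threshold quoted later in the deck.

```python
import difflib
import re


def normalize(code: str) -> list[str]:
    """Drop // comments and collapse whitespace; returns the non-empty lines.

    Naive on purpose: a '//' inside a string literal would also be cut.
    """
    lines = []
    for raw in code.splitlines():
        line = re.sub(r"//.*", "", raw)            # remove line comments
        line = re.sub(r"\s+", " ", line).strip()   # collapse runs of whitespace
        if line:
            lines.append(line)
    return lines


def difference(a: list[str], b: list[str]) -> float:
    """Fraction of the larger fragment's lines left unmatched by difflib."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / max(len(a), len(b))


def is_clone(a: str, b: str, threshold: float = 0.3) -> bool:
    """True when the normalized fragments differ by at most `threshold`."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return False
    return difference(na, nb) <= threshold
```

With a threshold of 0.0 only fragments that are identical after normalization match; raising the threshold admits fragments with a proportion of changed, added or removed lines.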
SLIDE 22 crazy idea:
can we use near-miss text code methods?
“Models are source code too”
Mark Harman, keynote at SCAM 2010
[Harman SCAM 2010]
SLIDE 23
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 24 Simone
Simulink near-miss clone detection
experiment: adapt NiCad near-miss code clone
detector to graphical models
validate vs. ConQAT
for types 1 & 2
hand validate type 3 (near-miss)
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012]
SLIDE 25 Simulink
hybrid hardware/software models
widespread in industry
- automotive, aerospace, embedded systems
mature and interesting at GM
SLIDE 26 Simulink
hierarchical models
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012]
SLIDE 27 Challenge #1
code methods require text
NiCad requires a parser
Solution: grammar inference
internal form
...
System {
  Name "onoff"
  Location [168, 385, 668, 686]
  Open on
  ModelBrowserVisibility off
  ModelBrowserWidth 200
  ScreenColor "automatic"
  PaperOrientation "landscape"
  PaperPositionMode "auto"
  PaperType "usletter"
  PaperUnits "inches"
  ZoomFactor "100"
  AutoZoom on
  ReportName "simulink-default.rpt"
  Block {
    BlockType DiscretePulseGenerator
    Name "Discrete Pulse\nGenerator"
    Position [45, 25, 75, 55]
    Amplitude "1"
    Period "2"
    PulseWidth "1"
    PhaseDelay "0"
    SampleTime "1"
  }
  Block {
    BlockType Product
    Name "Product"
    Ports [2, 1, 0, 0, 0]
    Position [145, 67, 175, 98]
    Inputs "2"
    SaturateOnIntegerOverflow on
  }
  ...
}
...
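To make the "code methods require text, NiCad requires a parser" point concrete, here is a minimal sketch of parsing the internal form into a tree. It is not the inferred grammar used by Simone. The grammar assumed here (a section is `Name { ... }`, a parameter is `Name Value`) is read off the excerpt above, and the names `TOKEN`, `tokenize` and `parse_section` are hypothetical.

```python
import re

# Tokens: quoted strings, [bracketed lists], braces, or bare words.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\[[^\]]*\]|[{}]|[^\s{}]+')


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


def parse_section(tokens: list[str], pos: int):
    """Parse `Name { ... }` starting at tokens[pos].

    Returns (node, next_pos), where node is a dict with the section kind,
    its (name, value) parameters in order, and its nested sections.
    """
    kind = tokens[pos]
    assert tokens[pos + 1] == "{", "expected an opening brace"
    pos += 2
    params, children = [], []
    while tokens[pos] != "}":
        if tokens[pos + 1] == "{":          # nested section
            child, pos = parse_section(tokens, pos)
            children.append(child)
        else:                               # simple `Name Value` parameter
            params.append((tokens[pos], tokens[pos + 1]))
            pos += 2
    return {"kind": kind, "params": params, "children": children}, pos + 1
```

Real model files contain constructs this sketch ignores (for example multi-line string values), so it should be read as an illustration of the text-to-tree step only.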
SLIDE 28 Challenge #2
what granularity?
NiCad requires candidates for comparison
Simulink: model (too big), block (too small), system (just right!)
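Choosing the system as the unit of comparison means every `System { ... }` section, nested ones included, becomes a candidate. The sketch below pulls those sections out of the model text by brace matching. The function name `extract_systems` is hypothetical and the approach is an assumption for illustration, not the extractor used in Simone.

```python
import re


def extract_systems(text: str) -> list[str]:
    """Return the text of every `System { ... }` section, outermost first."""
    systems = []
    for match in re.finditer(r"\bSystem\s*\{", text):
        depth = 0
        in_string = False
        i = match.end() - 1                 # index of the opening brace
        while i < len(text):
            ch = text[i]
            if ch == '"' and text[i - 1] != "\\":
                in_string = not in_string   # braces inside strings do not count
            elif not in_string:
                if ch == "{":
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0:
                        systems.append(text[match.start(): i + 1])
                        break
            i += 1
    return systems
```

Because `\b` requires a word boundary, the `BlockType SubSystem` value is not mistaken for a section header; only the `System` keyword itself starts a candidate.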
SLIDE 29 even with raw text, find some subsystem clones
but:
90% irrelevant Simulink internal “formatting” systems
some identical systems only 70% same
entirely missed exact copies displayed differently
[Figure: two copies of the neutral_up_down subsystem. Labels in each copy: mutually_exclusive (neutral, up, down, validated_neutral, validated_up, validated_down); check_up and check_down (action, reset, checked_action); Goto1 [reset], From1 [reset], From2 [reset]; ports neutral 1, up 2, down 3, reset 4]
SLIDE 30 Challenge #3
problems with “noise”
solution: “agile parsing” to
filter out irrelevant elements
[Dean, Cordy, Malton, Schneider JASE 2003]
SLIDE 31 filtering
removes more than 300 kinds of
elements and blocks
increases signal-to-noise
ratio in text
...
System {
  Name "onoff"
  Block {
    BlockType DiscretePulseGenerator
    Name "Discrete Pulse\nGenerator"
    Amplitude "1"
    Period "2"
    PulseWidth "1"
    PhaseDelay "0"
    SampleTime "1"
  }
  Block {
    BlockType Product
    Name "Product"
    Ports [2, 1, 0, 0, 0]
    Inputs "2"
  }
  ...
}
...
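The effect of filtering can be sketched as dropping parameters by name. The set below lists only the parameters that disappear between the raw and filtered excerpts on these slides; the real filter removes more than 300 kinds. The node layout (`kind`, `params`, `children`) and the names `IRRELEVANT` and `filter_node` are assumptions for illustration.

```python
# Parameters that disappear between the raw and filtered excerpts.
IRRELEVANT = {
    "Location", "Open", "ModelBrowserVisibility", "ModelBrowserWidth",
    "ScreenColor", "PaperOrientation", "PaperPositionMode", "PaperType",
    "PaperUnits", "ZoomFactor", "AutoZoom", "ReportName", "Position",
    "SaturateOnIntegerOverflow",
}


def filter_node(node: dict) -> dict:
    """Return a copy of node without irrelevant parameters, recursively."""
    return {
        "kind": node["kind"],
        "params": [(k, v) for k, v in node["params"] if k not in IRRELEVANT],
        "children": [filter_node(child) for child in node["children"]],
    }


# Hypothetical parsed form of part of the "onoff" example.
product = {
    "kind": "Block",
    "params": [
        ("BlockType", "Product"), ("Name", '"Product"'),
        ("Ports", "[2, 1, 0, 0, 0]"), ("Position", "[145, 67, 175, 98]"),
        ("Inputs", '"2"'), ("SaturateOnIntegerOverflow", "on"),
    ],
    "children": [],
}
system = {
    "kind": "System",
    "params": [("Name", '"onoff"'), ("Location", "[168, 385, 668, 686]"), ("Open", "on")],
    "children": [product],
}
filtered = filter_node(system)
```

After filtering, two systems that differ only in layout or display settings produce the same text, which is what raises the signal-to-noise ratio for the comparison.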
SLIDE 32
filtering significantly improved performance
precision - 10x fewer false positives (hand validation of results)
recall - many fewer false negatives:
fewer missed clones, much larger clones
but:
some clones we could clearly see by hand still not detected - why?
SLIDE 33
Challenge #4
no linear order of model elements
SLIDE 34
Challenge #4
solution: topological sort by block, line, port, branch
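Why ordering matters can be shown with a simple canonical sort. The sketch below is a stand-in, not the sort by block, line, port and branch named on the slide: it orders the children of each section lexicographically by kind and rendered content, so that two systems holding the same elements in a different stored order yield identical text. The node layout and the names `render` and `canonicalize` are assumptions.

```python
def render(node: dict) -> str:
    """Serialize a node back to single-line text."""
    parts = [node["kind"], "{"]
    for key, value in node["params"]:
        parts += [key, value]
    parts += [render(child) for child in node["children"]]
    parts.append("}")
    return " ".join(parts)


def canonicalize(node: dict) -> dict:
    """Return a copy of node whose children are in a fixed, content-based order."""
    children = [canonicalize(child) for child in node["children"]]
    children.sort(key=lambda child: (child["kind"], render(child)))
    return {"kind": node["kind"], "params": list(node["params"]), "children": children}


def block(block_type: str, name: str) -> dict:
    """Build a hypothetical Block node for the example."""
    return {
        "kind": "Block",
        "params": [("BlockType", block_type), ("Name", f'"{name}"')],
        "children": [],
    }


# Same two blocks, stored in opposite orders.
a = {"kind": "System", "params": [], "children": [block("Product", "Product"), block("Gain", "Gain")]}
b = {"kind": "System", "params": [], "children": [block("Gain", "Gain"), block("Product", "Product")]}
same_after_sorting = render(canonicalize(a)) == render(canonicalize(b))
```

Without the sort, a line-based comparison sees `a` and `b` as different even though they describe the same subsystem.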
SLIDE 35 sorting
increases recall, to find many more clones
[Figure: two copies of the neutral_up_down subsystem. Labels in each copy: mutually_exclusive (neutral, up, down, validated_neutral, validated_up, validated_down); check_up and check_down (action, reset, checked_action); Goto1 [reset], From1 [reset], From2 [reset]; ports neutral 1, up 2, down 3, reset 4]
SLIDE 36 sorting
increases recall, to find many more clones
[Figure: thirteen copies of the neutral_up_down subsystem, each with the labels mutually_exclusive, check_up, check_down, Goto1 [reset], From1 [reset], From2 [reset] and ports neutral 1, up 2, down 3, reset 4]
SLIDE 37
Challenge #5
finding type 2 (renamed) requires anonymization
names in Simulink not like other
languages
solution:
context-dependent anonymizer
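One way to read "context-dependent" is that a name is replaced only when it occurs in a chosen kind of section. The sketch below blind-renames the `Name` of every `Block` to a placeholder and leaves other names alone. Which contexts Simone actually anonymizes is not stated on the slide, so the choice of `Block`, the placeholder `"x"`, the node layout and the function name `anonymize` are all assumptions.

```python
def anonymize(node: dict, kinds=frozenset({"Block"})) -> dict:
    """Replace Name values with a placeholder, but only inside the given kinds."""
    rename = node["kind"] in kinds
    params = [
        (key, '"x"') if rename and key == "Name" else (key, value)
        for key, value in node["params"]
    ]
    return {
        "kind": node["kind"],
        "params": params,
        "children": [anonymize(child, kinds) for child in node["children"]],
    }


# Hypothetical system with one renamed block.
system = {
    "kind": "System",
    "params": [("Name", '"onoff"')],
    "children": [
        {"kind": "Block", "params": [("BlockType", "Product"), ("Name", '"Product1"')], "children": []},
    ],
}
anonymous = anonymize(system)
```

After this step two subsystems that differ only in block names compare as equal, which is what finding type 2 (renamed) clones requires.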
SLIDE 38 validation - Simone vs. ConQAT
on Matlab Central public model systems
finds all type 1 (exact) and type 2 (renamed)
clones found by ConQAT
finds many new type 3 (near-miss)
clones not found by ConQAT
finds larger clones and larger clone classes
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012]
SLIDE 39
Simone vs. ConQAT
SLIDE 40
Total nontrivial subsystems: 357

                 Extractor only              Filtered                 Filtered & Sorted        Filtered, Sorted & Renamed
Clone Type       Type 1     Type 3-1 @30%    Type 1   Type 3-1 @30%   Type 1   Type 3-1 @30%   Type 2   Type 3-2 @30%
Clone Pairs      116 / 10*  364 / 164*       204      204             303      181             279      1938
Clone Classes    8 / 4*     57 / 56*         44       55              45       52              48       24
Clone Coverage   8% / 3%    52% / 46%        37%      48%             42%      45%             49%      75%
Simone near-miss clones
in Simulink public automotive model variants
[Alalfi, Cordy, Dean, Stephan, Stevenson ICSM 2012]
SLIDE 41
[Table: Systems Extracted, with Clones and Classes under Filtered & Sorted, Consistent Renaming and Blind Renaming; rows include Matlab Central, Simulink Demo and ConQAT Examples]
Simone near-miss pattern mining
in Simulink models
SLIDE 42
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 43
Case study
GM fuel system models
SimGraph visualization
understanding Simulink model subsystem similarity
SLIDE 44 GM Fuel System Models
Subsystem similarity overview
Large subsystems, midsize subsystems, small subsystems
Z models (red), Y models (green), X models (blue)
SLIDE 45 GM Fuel System Models
Subsystem similarity overview
Many subsystems unique - not similar to any others in these models
SLIDE 46 GM Fuel System Models
Remove unique subsystems
Connecting lines represent subsystem similarity:
thick lines, 90-100% similar
thin lines, 70-80% similar
Similar subsystems are “near-miss clones”, both within and between models
SLIDE 47 GM Fuel System Models
Rearrange to cluster similar subsystems
Clusters reveal groups of similar subsystems - “clone classes”
SLIDE 48 GM Fuel System Models
Infer common subsystem patterns
Patterns characterize common repeated similar subsystem paradigms
Small groups of relatively large similar subsystems, both within and across models
Large groups of small to mid-sized similar subsystems across models
SLIDE 49
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 50
using patterns
SimNav
exemplar sets as patterns
SimPat
modeling variance
SLIDE 51
SimNav
presenting and integrating results in Simulink
exemplar sets as patterns
SLIDE 52 SimNav
[Figure labels: Simulink; SIMONE; MATLAB; Text-based clone detector (source code); Simulink clone detector (models); SimNav]
SLIDE 53
SLIDE 54
SimPat
characterizing and representing subsystem patterns
modeling variance
SLIDE 55
SimPat
SLIDE 56 Model Patterns
Model Clone Class: 3 subsystem clones
Model Pattern 1:
Merge Throttle and Pressure Estimation
SLIDE 57 Model Patterns
Model Clone Class: 3 subsystem clones
Model Pattern 2:
Merge all three subsystems
SLIDE 58 Model Patterns
Model Clone Class: 3 subsystem clones
Model Pattern 3:
Common elements only
SLIDE 59
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 60 model pattern evolution
SimCCT
evolution of patterns
across versions
pattern variance in two dimensions
instance, time
[Stephan, Alalfi, Cordy, Stevenson ME 2013]
SLIDE 61
model pattern
MCC - model clone class
elements of a model pattern
MCI - model clone instance
evolution of patterns
migration of MCIs between MCCs across versions of the system
SLIDE 62
evolution of patterns
1-1 pattern is stable across versions
1-1* pattern exists, but loses or gains MCIs
1-many pattern splits into multiple patterns
1-many* pattern splits, losing or gaining MCIs
1-0 pattern unifies or disappears
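The five categories can be sketched as a comparison of one clone class in the old version against the clone classes of the new version. This is one possible reading, not SimCCT's algorithm. It assumes each MCI has an identifier that is stable across versions, treats any new class sharing an instance with the old class as a successor, and reports 1-0 only for disappearance (the "unifies" case is not modelled). The function name `classify` is hypothetical.

```python
def classify(old: frozenset, new_classes: list[frozenset]) -> str:
    """Label how the MCC `old` evolved, given the MCCs of the next version."""
    successors = [cls for cls in new_classes if cls & old]
    if not successors:
        return "1-0"                       # no instance survives in any class
    # Instances were gained or lost if the successors do not cover exactly `old`.
    changed = set().union(*successors) != set(old)
    if len(successors) == 1:
        return "1-1*" if changed else "1-1"
    return "1-many*" if changed else "1-many"


# Hypothetical MCC with three instances.
old = frozenset({"a", "b", "c"})
stable = classify(old, [frozenset({"a", "b", "c"})])
split = classify(old, [frozenset({"a"}), frozenset({"b", "c"})])
```

Running the same function for every MCC over each pair of consecutive versions would give the per-pattern history that the slide describes as migration of MCIs between MCCs.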
SLIDE 63
SimCCT
SLIDE 64 SimCCT - Power Window MCC 3
[Figure: SimCCT diagram for Power Window MCC 3; legend includes "does not belong to any MCC"]
SLIDE 65 SimCCT - Power Window MCC 2
[Figure: SimCCT diagram for Power Window MCC 2; legend includes "does not belong to any MCC"]
SLIDE 66 SimCCT - AVS MCC 7
[Figure: SimCCT diagram for AVS MCC 7]
SLIDE 67
MPE
near-miss clones
near-miss clone detection
Simone
analysis of GM models
pattern extraction
pattern evolution
SLIDE 68
current work
Stateflow models
deployment at GM
analysis of more systems
SLIDE 69
Thank you!
Manar H. Alalfi, Thomas R. Dean, Matthew Stephan, Andrew Stevenson, Joseph d’Ambrosio, Cheryl Williams