When is a Clone not a Clone? (and vice-versa) Contextualized Analysis of Web Services




SLIDE 1

When is a Clone not a Clone? (and vice-versa)

Contextualized Analysis of Web Services

James R. Cordy, David B. Skillicorn, Douglas Martin, Scott Grant

School of Computing, Kingston, Canada

SLIDE 2

Motivation

— The Personal Web
— Rapidly growing number of web services makes it increasingly difficult to find and choose the right ones
— Need a quick and convenient way to find alternatives
— Hand tagging impractical – automation is needed!

SLIDE 3

Motivation

— Automation
— Similarity detection techniques offer solutions!
— Code clone detection from software engineering research can find similar code fragments – why not similar services?
— Topic models from data mining research can find text documents with similar semantics – why not similar services?

SLIDE 4

Web Service Similarity

— Web services are stored in service registries, containing WSDL service description files
— Could apply clone detection to entire service descriptions
— But what we really want are similar service operations

SLIDE 5

Let’s try it!

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<complexType name="Stock">
  <sequence>
    <element name="Supplier" type="xsd:string"/>
    <element name="Warehouse" type="xsd:string"/>
    <element name="OnHand" type="xsd:string"/>
    <element name="OnOrder" type="xsd:string"/>
    <element name="Demand" type="xsd:string"/>
  </sequence>
</complexType>

<complexType name="Stock">
  <sequence>
    <element name="date" type="xsd:string"/>
    <element name="open" type="xsd:float"/>
    <element name="high" type="xsd:float"/>
    <element name="low" type="xsd:float"/>
    <element name="close" type="xsd:float"/>
    <element name="volume" type="xsd:float"/>
  </sequence>
</complexType>

SLIDE 6

How about these?

<operation name="DrawRateChartCustom">
  <input message="DrawRateChartCustomIn"/>
  <output message="DrawRateChartCustomOut"/>
</operation>

<operation name="GetTopicBinaryChartCustom">
  <input message="GetTopicBinaryChartCustomSoapIn"/>
  <output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>

SLIDE 7

So what went wrong?

— At this point we thought maybe our idea wasn’t going to work
— Maybe clone detection can’t help with web service discovery?
— But why? What’s so special about WSDL?

SLIDE 11

Web Service Description Language (WSDL)

— A WSDL service description has 3 main parts:
— a <portType> element where the operations are declared;
— <message> elements corresponding to inputs, outputs and faults of the operations;
— and a <types> element containing an XML Schema that defines the data and structure types used in the messages
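The three parts listed above can be pulled apart with a few lines of Python. This is only a minimal sketch: the miniature service description is invented for illustration (the operation names ReserveRoom and GetAvailableRooms come from the example service in these slides, but the message names, the Room type and the `urn:example:hotel` namespace are assumptions), and it uses the standard `xml.etree.ElementTree` module with the WSDL 1.1 and XML Schema namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the path expressions below.
NS = {
    "w": "http://schemas.xmlsoap.org/wsdl/",
    "x": "http://www.w3.org/2001/XMLSchema",
}

# Hypothetical miniature service description, invented for illustration.
WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example:hotel">
  <types>
    <xsd:schema>
      <xsd:complexType name="Room">
        <xsd:sequence>
          <xsd:element name="number" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="ReserveRoomRequest"><part name="room" type="tns:Room"/></message>
  <message name="ReserveRoomResponse"><part name="ok" type="xsd:boolean"/></message>
  <message name="GetAvailableRoomsRequest"><part name="day" type="xsd:date"/></message>
  <message name="GetAvailableRoomsResponse"><part name="room" type="tns:Room"/></message>
  <portType name="HotelPortType">
    <operation name="ReserveRoom">
      <input message="tns:ReserveRoomRequest"/>
      <output message="tns:ReserveRoomResponse"/>
    </operation>
    <operation name="GetAvailableRooms">
      <input message="tns:GetAvailableRoomsRequest"/>
      <output message="tns:GetAvailableRoomsResponse"/>
    </operation>
  </portType>
</definitions>
""".strip()


def summarize(wsdl_text):
    """List the names found in each of the three main parts of a WSDL file."""
    root = ET.fromstring(wsdl_text)
    return {
        "operations": [op.get("name")
                       for op in root.findall("w:portType/w:operation", NS)],
        "messages": [m.get("name") for m in root.findall("w:message", NS)],
        "types": [t.get("name")
                  for t in root.findall("w:types/x:schema/x:complexType", NS)],
    }


summary = summarize(WSDL)
for part, names in summary.items():
    print(part, names)
```

Note how each operation only names its messages, and each message only names its types: the full description of one operation is spread over all three parts.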

SLIDE 14

Web Service Description Language (WSDL)

— This simple example service has two operations:
— ReserveRoom
— GetAvailableRooms

SLIDE 15

Web Service Description Language (WSDL)

— WSDL service description files contain descriptions of the operations that a web service has to offer
— But the pieces of each operation’s own description are scattered over different parts of the WSDL file
— Difficult to identify complete units to analyze and compare

SLIDE 16

The Problem

— This poses a problem for analysis techniques:
— Operations cannot easily be compared for similarity using clone detectors, because there are no contiguous fragments to compare
— And they cannot be analyzed using data mining topic models, because there are no separate complete documents to generate a model from

SLIDE 17

Our Solution

— Our solution is to contextualize the original <operation> elements, to create self-contained operation descriptions
— We use source transformation to inline remote information from the context into the elements that reference or depend on it
— We call these contextualized WSDL operations Web Service Cells, or WSCells
— The first example of a new kind of clone detection: contextual clones
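The inlining idea can be sketched in Python as follows. This is not the authors' source transformation, only an assumption-laden illustration: the tiny WSDL text, the `contextualize` and `local` helpers and the `urn:example:stock` namespace are hypothetical, message parts are copied under the `<input>`/`<output>` elements that reference them, and referenced complex types are inlined one level deep only.

```python
import copy
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"
NS = {"w": WSDL_NS, "x": XSD_NS}

# Hypothetical service description built around the GetStock example.
WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="urn:example:stock">
  <types>
    <xsd:schema>
      <xsd:complexType name="Stock">
        <xsd:sequence>
          <xsd:element name="date" type="xsd:string"/>
          <xsd:element name="open" type="xsd:float"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="GetStockRequest"><part name="symbol" type="xsd:string"/></message>
  <message name="GetStockResponse"><part name="stock" type="tns:Stock"/></message>
  <portType name="StockPortType">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>
""".strip()


def local(qname):
    """Drop a namespace prefix such as 'tns:' from a reference."""
    return qname.split(":")[-1]


def contextualize(wsdl_text):
    """Return one self-contained copy of each <operation> element."""
    root = ET.fromstring(wsdl_text)
    messages = {m.get("name"): m for m in root.findall("w:message", NS)}
    types = {t.get("name"): t for t in root.iter("{%s}complexType" % XSD_NS)}
    cells = []
    for op in root.findall("w:portType/w:operation", NS):
        cell = copy.deepcopy(op)  # leave the original document untouched
        for io in cell:  # <input>, <output> or <fault>
            msg = messages.get(local(io.get("message", "")))
            if msg is None:
                continue
            for part in msg.findall("w:part", NS):
                part_copy = copy.deepcopy(part)
                ref = part.get("type") or part.get("element") or ""
                typ = types.get(local(ref))
                if typ is not None:
                    part_copy.append(copy.deepcopy(typ))  # inline the type
                io.append(part_copy)  # inline the message part
        cells.append(cell)
    return cells


cells = contextualize(WSDL)
print(ET.tostring(cells[0], encoding="unicode"))
```

After this step each operation carries its own parameters and data types, so two GetStock operations with different Stock types no longer look identical.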

SLIDE 18

Contextualizing WSDL Operations

SLIDE 19

Contextual Clone Detection

SLIDE 20

An Experiment

— We have run an experiment to investigate the difference between clone detection on WSCells and original raw operations
— Two sets of WSDL service description files: 1,100 operations and 7,500 operations
— Compared NICAD clone detector results for each set at various near-miss difference thresholds (0% = exact clone, 10% = 1 line in 10 different, and so on)
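To make the threshold concrete, here is a rough Python sketch of a line-based difference measure. It is an assumption for illustration, not NICAD's actual comparison: it uses the standard `difflib.SequenceMatcher` to count matching lines and reports the unmatched share of the longer fragment. The function names and the sample fragments are hypothetical.

```python
import difflib


def line_difference(a_lines, b_lines):
    """Fraction of lines in the longer fragment that have no match."""
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a_lines), len(b_lines))
    if longest == 0:
        return 0.0
    return (longest - matched) / longest


def is_clone(a_lines, b_lines, threshold):
    """True when the fragments differ by at most the given fraction."""
    return line_difference(a_lines, b_lines) <= threshold


# Two hypothetical 10-line fragments that differ in exactly one line.
original = ["line %d" % i for i in range(10)]
variant = list(original)
variant[4] = "a changed line"

print(line_difference(original, variant))
print(is_clone(original, variant, 0.0), is_clone(original, variant, 0.1))
```

With one line in ten changed, the pair is rejected at the 0% threshold and accepted at 10%, matching the reading of the thresholds given above.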

SLIDE 21

An Experiment

— Number of clones decreases with WSCells

Clone pairs

Difference   Set 1                  Set 2
threshold    Originals   WSCells    Originals   WSCells
0.0          852         705        1434        1066
0.1          852         734        1434        1228
0.2          879         775        1438        1637
0.3          884         813        1469        1637

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<complexType name="Stock">
  <sequence>
    <element name="Supplier" type="xsd:string"/>
    <element name="Warehouse" type="xsd:string"/>
    <element name="OnHand" type="xsd:string"/>
    <element name="OnOrder" type="xsd:string"/>
    <element name="Demand" type="xsd:string"/>
  </sequence>
</complexType>

<complexType name="Stock">
  <sequence>
    <element name="date" type="xsd:string"/>
    <element name="open" type="xsd:float"/>
    <element name="high" type="xsd:float"/>
    <element name="low" type="xsd:float"/>
    <element name="close" type="xsd:float"/>
    <element name="volume" type="xsd:float"/>
  </sequence>
</complexType>

— Reduction in false positives

SLIDE 22

An Experiment

— Number of clone classes can increase with WSCells

Clone classes

Difference   Set 1                  Set 2
threshold    Originals   WSCells    Originals   WSCells
0.0          169         187        587         433
0.1          169         139        587         499
0.2          172         142        589         631
0.3          171         136        591         631

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<complexType name="Stock">
  <sequence>
    <element name="Supplier" type="xsd:string"/>
    <element name="Warehouse" type="xsd:string"/>
    <element name="OnHand" type="xsd:string"/>
    <element name="OnOrder" type="xsd:string"/>
    <element name="Demand" type="xsd:string"/>
  </sequence>
</complexType>

<complexType name="Stock">
  <sequence>
    <element name="date" type="xsd:string"/>
    <element name="open" type="xsd:float"/>
    <element name="high" type="xsd:float"/>
    <element name="low" type="xsd:float"/>
    <element name="close" type="xsd:float"/>
    <element name="volume" type="xsd:float"/>
  </sequence>
</complexType>

— Splits by deeper differences – more precision
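One way to see how pairs can go down while classes go up is to group clone pairs into classes. The Python sketch below assumes that a clone class is a connected group of pairs (a simplification that may differ from how the detector actually clusters), and the four operations A to D are hypothetical: as raw operations they all match each other, while as WSCells only A/B and C/D still match.

```python
from itertools import combinations


def clone_classes(pairs):
    """Group clone pairs into classes by transitive closure (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = {}
    for item in list(parent):
        classes.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical operations: identical as raw <operation> elements...
operations = ["A", "B", "C", "D"]
raw_pairs = list(combinations(operations, 2))
# ...but split into two groups once their data types are inlined.
cell_pairs = [("A", "B"), ("C", "D")]

print(len(raw_pairs), clone_classes(raw_pairs))
print(len(cell_pairs), clone_classes(cell_pairs))
```

In this toy case the pair count drops from six to two while the class count rises from one to two, because one coarse class is split by the deeper differences.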

SLIDE 23

Clone Detection for Web Services

— Contextual clone detection with WSCells works!
— Not only finds similar web service operations, but uncovers similar operations we could not find in any other way

<operation name="DrawRateChartCustom">
  <input message="DrawRateChartCustomIn"/>
  <output message="DrawRateChartCustomOut"/>
</operation>

<operation name="GetRealChartCustom">
  <input message="GetRealChartCustomSoapIn"/>
  <output message="GetRealChartCustomSoapOut"/>
</operation>

<operation name="GetLastSaleChartCustom">
  <input message="GetLastSaleChartCustomSoapIn"/>
  <output message="GetLastSaleChartCustomSoapOut"/>
</operation>

<operation name="DrawYieldCurveCustom">
  <input message="DrawYieldCurveCustomIn"/>
  <output message="DrawYieldCurveCustomOut"/>
</operation>

<operation name="GetTopicChartCustom">
  <input message="GetTopicChartCustomSoapIn"/>
  <output message="GetTopicChartCustomSoapOut"/>
</operation>

<operation name="GetTopicBinaryChartCustom">
  <input message="GetTopicBinaryChartCustomSoapIn"/>
  <output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>

SLIDE 24

Semantic Analysis of Web Services

— Contextualized WSCells also make it possible to use data mining topic models to do semantic analysis of web services
— Because they provide self-contained documents of significant size
— Might topic models provide a different view of web service similarity?

SLIDE 25

Latent Dirichlet Allocation

— Latent Dirichlet Allocation (LDA):
— A statistical model to uncover latent topics
— Identifies the correlation between documents in terms of shared latent topics (sets of tokens)
— Accepts a set of documents (e.g., source files) as input, returns probability distributions over inferred topics (a topic model) as output
— Each document has some probability of being related to topic 1, another probability for topic 2, and so on
— Similar documents should be related to similar topics

SLIDE 26

Latent Dirichlet Allocation

— Documents are represented in the model in terms of probability distributions over topics
— Similarity between documents is found using the Hellinger Distance
— A measure of how much agreement there is between the shared topics of two documents
— Almost identical documents have a small Hellinger Distance since they will be related to the same topics
— In terms of web services, small Hellinger Distances indicate highly related operations
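For two discrete topic distributions P and Q, the Hellinger distance is the square root of the summed squared differences of the square roots of their probabilities, divided by the square root of 2, so it ranges from 0 (identical) to 1 (no shared topics). A minimal Python sketch follows; the three-topic distributions are made-up values for illustration, not output of the model described here.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


# Hypothetical topic distributions over three topics.
get_weather = [0.80, 0.10, 0.10]
get_weather_report = [0.70, 0.20, 0.10]
get_libor_rate = [0.05, 0.05, 0.90]

print(hellinger(get_weather, get_weather_report))  # small: related operations
print(hellinger(get_weather, get_libor_rate))      # large: unrelated operations
```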

SLIDE 27

Evaluating WSCells

— To evaluate the use of WSCells with LDA, we:
— Generate an LDA model for the original <operation> elements, and another for the contextualized WSCells
— Explore the Global and Local Similarity between each pair of operations in the models
— Global Similarity: an overall view of the most closely related web service operations in the service set
— Local Similarity: a per-operation view of the other most related web service operations for each operation

SLIDE 28

Global Similarity

— We look at Global Similarity using a visualization called Bluevis
— Bluevis shows the global conceptual structure of a system by highlighting similar operations using an illuminated line from left-to-right
— Plot some top fraction of similar operations (top 25,000 in our examples)
— Use a consistently ordered list of web service operations for the LDA model to view the differences
— If a display is noisy, it is often an indication that the model is not identifying meaningful data
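Selecting the "top fraction" of similar operations amounts to scoring every pair and keeping the k closest. The sketch below is an assumption about how such a selection could be done, not a description of Bluevis itself: the operation names are borrowed from the deck's examples, but the topic vectors, the `most_similar_pairs` helper and the choice of k = 2 are hypothetical.

```python
import heapq
import itertools
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def most_similar_pairs(topics, k):
    """Return the k operation pairs with the smallest Hellinger distance."""
    names = sorted(topics)  # consistent ordering of operations
    scored = ((hellinger(topics[a], topics[b]), a, b)
              for a, b in itertools.combinations(names, 2))
    return heapq.nsmallest(k, scored)


# Hypothetical topic distributions (weather topic, rates topic, other).
topics = {
    "GetWeather": [0.80, 0.10, 0.10],
    "GetWeatherReport": [0.75, 0.15, 0.10],
    "GetAIDIBOR": [0.05, 0.90, 0.05],
    "GetTRLIBOR": [0.10, 0.85, 0.05],
}

for distance, a, b in most_similar_pairs(topics, k=2):
    print(round(distance, 3), a, b)
```

A visualization would then draw only these selected pairs, which is why a noisy picture suggests the underlying distances carry little meaning.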

SLIDE 29

Global Similarity

SLIDE 30

Global Similarity

— For original raw operations:
— Bluevis highlights the most similar operations according to LDA
— Some clear structure
— However, most of this is due to shared keywords, like get and SOAP
— This uncontextualized model has very little value

SLIDE 31

Global Similarity

SLIDE 32

Global Similarity

— For contextualized WSCells:
— A clearer semantic structure, less noise overall
— Operation similarity becomes meaningful
— Services with semantic similarity discovered
— E.g., operations with similar parameters or faults, such as those that manipulate holiday dates or financial rates

SLIDE 33

Local Similarity

— We can also examine the local similarity for each individual operation
— Identify the complete ordered list of similarity scores for an operation in the data set
— Using the top similarity scores, evaluate how meaningful the data is from a user's perspective
— For example, how can I find the most similar web service operations to the one I am using now?
— We use a tool called POCO (Pairwise Observation of Concepts) to examine the most similar operations
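The per-operation view can be sketched as a ranked neighbour list. This is a minimal illustration under stated assumptions, not the POCO tool: the `rank_neighbours` helper and the topic vectors are hypothetical, and only the operation names are taken from the deck's examples.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def rank_neighbours(target, topics):
    """Order every other operation by its distance to the target operation."""
    scores = [(hellinger(topics[target], vector), name)
              for name, vector in topics.items() if name != target]
    return sorted(scores)


# Hypothetical topic distributions (weather topic, rates topic, other).
topics = {
    "GetWeather": [0.80, 0.10, 0.10],
    "GetWeatherReport": [0.75, 0.15, 0.10],
    "GetAIDIBOR": [0.05, 0.90, 0.05],
    "GetTRLIBOR": [0.10, 0.85, 0.05],
}

ranking = rank_neighbours("GetWeather", topics)
for distance, name in ranking:
    print(round(distance, 3), name)
```

The first entry of the ranking answers the user's question directly: it is the operation most similar to the one currently in use.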

SLIDE 34

Local Similarity

SLIDE 35

Local Similarity

Operation                 Most similar WSCell             Most similar original raw WSDL operation
ListFinancials            GetFinancialServicesFromList    LanguagesList
ExportShipsAndCategories  ExportIteneraryAndSteps         Search
GetIssueData              GetFlightData                   word_cloud
GetWeatherReport          GetWeather                      GetIndices
GetAIDIBOR                GetTRLIBOR                      GetCarriers
searchByIdentifier        searchByNameAndAddress          GetLastSecurityHeadlines
ToolsAndHardwareBox       KitchenAndHousewareBox          ListRenditions
GetReservations           GetRoomAvailabilityForDay       GetSOFIBOR
GetOtherProductInfo       NextOtherProductPortion         GetParkingInfo
GetAllSplitsByExchange    GetAllCashDividendsByExchange   GetTeamLoyalties2

SLIDE 36

Summary

— Very-high-level domain-specific languages such as WSDL make poor targets for similarity analysis using clone detection and topic models
— Lack of local context prevents meaningful results
— Contextualizing using WSCells exposes both cloning and semantic relationships between web operations
— Clone detection of WSCells identifies similar web service operations
— Topic models of WSCells expose both global system-wide semantic relationships and local individual relationships between operations

SLIDE 37

Current & Future

— Continue analysis of web services for the Personal Web using our results
— Apply contextualization to similarity analysis of other modeling and specification languages (currently Simulink, Stateflow and UML sequence diagrams)
— Experiment with effect of contextualization on clone and topic model analysis of traditional languages such as Java and C (“contextual clones”)

SLIDE 38

Questions?

When is a Clone not a Clone? (and vice-versa)

Contextualized Analysis of Web Services

James R. Cordy, David B. Skillicorn, Douglas Martin, Scott Grant