School of Computing Kingston, Canada
Contextualized Analysis of Web Services
James R. Cordy David B. Skillicorn Douglas Martin Scott Grant
When is a Clone not a Clone? (and vice-versa) Contextualized Analysis of Web Services
Motivation
The Personal Web
Rapidly growing
increasingly difficult to find and choose the right ones
way to find alternatives
automation is needed!
software engineering research can find similar code fragments – why not similar services?
research can find text documents with similar semantics – why not similar services?
<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<operation name="GetStock">
  <input message="tns:GetStockRequest"/>
  <output message="tns:GetStockResponse"/>
</operation>

<complexType name="Stock">
  <sequence>
    <element name="Supplier" type="xsd:string"/>
    <element name="Warehouse" type="xsd:string"/>
    <element name="OnHand" type="xsd:string"/>
    <element name="OnOrder" type="xsd:string"/>
    <element name="Demand" type="xsd:string"/>
  </sequence>
</complexType>

<complexType name="Stock">
  <sequence>
    <element name="date" type="xsd:string"/>
    <element name="open" type="xsd:float"/>
    <element name="high" type="xsd:float"/>
    <element name="low" type="xsd:float"/>
    <element name="close" type="xsd:float"/>
    <element name="volume" type="xsd:float"/>
  </sequence>
</complexType>
<operation name="DrawRateChartCustom">
  <input message="DrawRateChartCustomIn"/>
  <output message="DrawRateChartCustomOut"/>
</operation>

<operation name="GetTopicBinaryChartCustom">
  <input message="GetTopicBinaryChartCustomSoapIn"/>
  <output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>
web service discovery?
a <portType> element where the
<message> elements
corresponding to inputs, outputs and faults of the operations;
and a <types> element
containing an XML Schema that defines the data and structure types used in the messages
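The three parts of a WSDL description listed above can be walked with a few lines of Python. This is a minimal sketch, assuming a WSDL 1.1 document; the sample document, the `urn:example` namespace and the `list_operations` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace-uri}localname".
WSDL = "{http://schemas.xmlsoap.org/wsdl/}"

# Hypothetical, heavily trimmed WSDL 1.1 document (illustration only).
SAMPLE_WSDL = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             targetNamespace="urn:example">
  <types>
    <xsd:schema targetNamespace="urn:example">
      <xsd:complexType name="Stock">
        <xsd:sequence>
          <xsd:element name="Supplier" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="GetStockRequest"/>
  <message name="GetStockResponse">
    <part name="result" type="tns:Stock"/>
  </message>
  <portType name="StockPort">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Map each portType operation to the messages it references."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall(WSDL + "portType"):
        for op in port_type.findall(WSDL + "operation"):
            refs = {}
            for kind in ("input", "output", "fault"):
                child = op.find(WSDL + kind)
                if child is not None:
                    # Drop the namespace prefix, e.g. "tns:GetStockRequest".
                    refs[kind] = child.get("message").split(":")[-1]
            operations[op.get("name")] = refs
    return operations


operations = list_operations(SAMPLE_WSDL)
print(operations)
```

Note that the operation itself only names its messages; the parts and data types live elsewhere in the file.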
ReserveRoom GetAvailableRooms
are scattered over different parts of the WSDL file
compare
using clone detectors, because there are no contiguous fragments to compare
models, because there are no separate complete documents to generate a model from
information from the context into the elements that reference or depend on them
contextual clones
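One way to picture contextualization is as inlining: each operation gets a copy of the messages it references, and each message part gets a copy of the type it uses, so that the operation becomes one contiguous fragment. The sketch below is an assumption about how that could be done, not the authors' tool; the sample document and the `contextualize` helper are hypothetical, and only one level of type references is followed.

```python
import copy
import xml.etree.ElementTree as ET

WSDL = "{http://schemas.xmlsoap.org/wsdl/}"
XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily trimmed WSDL 1.1 document (illustration only).
SAMPLE_WSDL = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             targetNamespace="urn:example">
  <types>
    <xsd:schema targetNamespace="urn:example">
      <xsd:complexType name="Stock">
        <xsd:sequence>
          <xsd:element name="Supplier" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
  </types>
  <message name="GetStockRequest"/>
  <message name="GetStockResponse">
    <part name="result" type="tns:Stock"/>
  </message>
  <portType name="StockPort">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>
"""


def local_name(qname):
    """Strip a namespace prefix such as "tns:" from a qualified name."""
    return qname.split(":")[-1]


def contextualize(wsdl_text):
    """Return {operation name: operation element with its context inlined}."""
    root = ET.fromstring(wsdl_text)
    messages = {m.get("name"): m for m in root.findall(WSDL + "message")}
    types = {t.get("name"): t
             for t in root.iter(XSD + "complexType") if t.get("name")}
    cells = {}
    for port_type in root.findall(WSDL + "portType"):
        for op in port_type.findall(WSDL + "operation"):
            cell = copy.deepcopy(op)
            for ref in cell:  # the <input>, <output> and <fault> children
                msg = messages.get(local_name(ref.get("message", "")))
                if msg is None:
                    continue
                for part in msg:
                    inlined = copy.deepcopy(part)
                    typ = types.get(local_name(part.get("type", "")))
                    if typ is not None:
                        inlined.append(copy.deepcopy(typ))
                    ref.append(inlined)
            cells[op.get("name")] = cell
    return cells


cells = contextualize(SAMPLE_WSDL)
print(ET.tostring(cells["GetStock"], encoding="unicode"))
```

After this step, two operations with identical signatures but different `Stock` types no longer look the same, because the type definition is part of each fragment.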
1,100 operations and 7,500 operations
set at various near-miss difference thresholds
0% = exact clone,
10% = 1 line in 10 different, and so on
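The threshold reading above (0% = exact clone, 10% = 1 line in 10 different) can be illustrated with a line-based difference ratio. This is a rough sketch using Python's `difflib`, assumed here as a stand-in for whatever measure the clone detector actually applies; `line_difference` and `is_near_miss_clone` are hypothetical names.

```python
from difflib import SequenceMatcher


def line_difference(a, b):
    """Fraction of lines that differ between two fragments (0.0 = identical)."""
    a_lines = [ln.strip() for ln in a.splitlines() if ln.strip()]
    b_lines = [ln.strip() for ln in b.splitlines() if ln.strip()]
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a_lines), len(b_lines))
    return 0.0 if longest == 0 else 1.0 - matched / longest


def is_near_miss_clone(a, b, threshold):
    """True when the two fragments differ by at most `threshold`."""
    return line_difference(a, b) <= threshold


# Two made-up ten-line fragments that differ in exactly one line.
first = "\n".join(f"line {i}" for i in range(10))
second = first.replace("line 4", "changed line")

print(is_near_miss_clone(first, first, 0.0))
print(is_near_miss_clone(first, second, 0.0))
print(is_near_miss_clone(first, second, 0.1))
```

With these fragments, the pair is rejected at the 0.0 threshold and accepted at 0.1.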
Difference   Clone Pairs in Set 1     Clone Pairs in Set 2
Threshold    Originals    WSCells     Originals    WSCells
0.0          852          705         1434         1066
0.1          852          734         1434         1228
0.2          879          775         1438         1637
0.3          884          813         1469         1637
Difference   Clone Classes in Set 1   Clone Classes in Set 2
Threshold    Originals    WSCells     Originals    WSCells
0.0          169          187         587          433
0.1          169          139         587          499
0.2          172          142         589          631
0.3          171          136         591          631
differences – more precision
<operation name="DrawRateChartCustom">
  <input message="DrawRateChartCustomIn"/>
  <output message="DrawRateChartCustomOut"/>
</operation>

<operation name="GetRealChartCustom">
  <input message="GetRealChartCustomSoapIn"/>
  <output message="GetRealChartCustomSoapOut"/>
</operation>

<operation name="GetLastSaleChartCustom">
  <input message="GetLastSaleChartCustomSoapIn"/>
  <output message="GetLastSaleChartCustomSoapOut"/>
</operation>

<operation name="DrawYieldCurveCustom">
  <input message="DrawYieldCurveCustomIn"/>
  <output message="DrawYieldCurveCustomOut"/>
</operation>

<operation name="GetTopicChartCustom">
  <input message="GetTopicChartCustomSoapIn"/>
  <output message="GetTopicChartCustomSoapOut"/>
</operation>

<operation name="GetTopicBinaryChartCustom">
  <input message="GetTopicBinaryChartCustomSoapIn"/>
  <output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>
significant size
terms of shared latent topics (sets of tokens)
to topic 1, another probability for topic 2, and so on
the shared topics of two documents
Distance since they will be related to the same topics
indicate highly related operations
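The idea of comparing documents by their topic probabilities can be sketched as a distance between two vectors. The surviving slide text does not say which distance measure is used, so Euclidean distance is an assumption here, and the topic vectors are made up for illustration.

```python
import math


def topic_distance(p, q):
    """Euclidean distance between two topic-probability vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must cover the same topics")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical probabilities for topics 1, 2 and 3 (not real model output).
weather_a = [0.70, 0.20, 0.10]
weather_b = [0.65, 0.25, 0.10]
stock = [0.05, 0.15, 0.80]

near = topic_distance(weather_a, weather_b)
far = topic_distance(weather_a, stock)
print(near < far)
```

Documents that load on the same topics end up close together, which is why a small distance suggests related operations.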
elements, and another for the contextualized WSCells
pair of operations in the models
(top 25,000 in our examples)
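Scoring every pair of operations and keeping only the closest ones can be sketched with `itertools.combinations` and `heapq.nsmallest`. This is an illustrative sketch under the assumption of Euclidean distance; the `closest_pairs` helper, the operation labels and the vectors are all hypothetical.

```python
import heapq
import math
from itertools import combinations


def closest_pairs(topic_vectors, k):
    """Return the k pairs of names with the smallest distance, closest first."""

    def distance(pair):
        p, q = topic_vectors[pair[0]], topic_vectors[pair[1]]
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    # combinations() yields every unordered pair exactly once.
    return heapq.nsmallest(k, combinations(sorted(topic_vectors), 2), key=distance)


# Made-up topic vectors for three hypothetical operations.
vectors = {
    "OpA": [0.70, 0.20, 0.10],
    "OpB": [0.65, 0.25, 0.10],
    "OpC": [0.05, 0.15, 0.80],
}

top = closest_pairs(vectors, 1)
print(top)
```

The number of pairs grows quadratically with the number of operations, so keeping only a fixed number of the closest pairs bounds how much needs to be inspected.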
model is not identifying meaningful data
most similar operations
due to shared keywords, like get and SOAP
structure, less noise overall
becomes meaningful
similar parameters or faults, such as those that manipulate holiday dates
for an operation in the data set
service operations to the one I am using now?
Operation                 Most similar WSCell              Most similar original raw WSDL operation
ListFinancials            GetFinancialServicesFromList     LanguagesList
ExportShipsAndCategories  ExportIteneraryAndSteps          Search
GetIssueData              GetFlightData                    word_cloud
GetWeatherReport          GetWeather                       GetIndices
GetAIDIBOR                GetTRLIBOR                       GetCarriers
searchByIdentifier        searchByNameAndAddress           GetLastSecurityHeadlines
ToolsAndHardwareBox       KitchenAndHousewareBox           ListRenditions
GetReservations           GetRoomAvailabilityForDay        GetSOFIBOR
GetOtherProductInfo       NextOtherProductPortion          GetParkingInfo
GetAllSplitsByExchange    GetAllCashDividendsByExchange    GetTeamLoyalties2
service operations
system-wide semantic relationships and local individual relationships between operations