ICSOC 2004
Discovering and Ranking Web Services with BASIL: A Personalized Approach with Biased Focus
James Caverlee, Ling Liu, and Daniel Rocco
College of Computing, Georgia Institute of Technology
Categorization-based Service Discovery
Find all stock ticker services:
Categorization-based Service Discovery
The UDDI approach: group services based on common properties
– All stock ticker services
– All services offered by New York companies
– ...
A user can search on properties or browse the registry to find candidate matches
Personalized Relevance-Based Service Discovery
Identify services based on their relationships to other services
– Not supported by today’s registries
Sample discovery tasks:
– Find the top-ten services that offer more coverage than the BLAST services at NCBI
– Which medical literature sites are more specialized than PubMed?
– …
Personalized Relevance-Based Service Discovery
(Diagram: candidate services ranked 1, 2, 3 relative to NCBI, on a scale from more general to more specialized)
Techniques for Service Discovery and Ranking
Based on communities
– Reputation systems
– PageRank-style (?)
Schema/interface matching
– Find the services with similar inputs and outputs
Semantic matching
– Using a markup language like OWL
Instance/data matching
– Use the data that the service provides to better understand the service
– Use that data to compare across services
Our Solution: BASIL
BiAsed Service dIscovery aLgorithm
Three key components:
– Source-biased probing
– Evaluation and ranking of services with biased focus
– Identification of interesting relationships based on bilateral evaluation of biased focus
Focuses on the nature and degree of topical relevance
Avoids significant human intervention or hand-tuned categorization schemes
We focus on one type of web service
Data-intensive web services
– Access to huge amounts of data
– Tools for searching, manipulating, and analyzing data
– Examples: Amazon, Google, life-sciences resources like BLAST (genetic sequence search)
Unlike transactional services (e.g., for purchasing a box of pencils)
Modeling Data-Intensive Web Services
Service Summary
– Bag-of-words model
– XML tags and text
ActualSummary(Si) = {(t1, w1), (t2, w2), …, (tN, wN)}
Example: Summary(PubMed) = {(arthritis, 3912), (bacteria, 2450), (cancer, 4201), (drug, 989), …}
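The bag-of-words summary above can be sketched as a term-frequency map built from a service's documents. This is a minimal illustration; the letters-only tokenizer stands in for real XML tag-and-text extraction.

```python
from collections import Counter
import re

def build_summary(documents):
    """Build a bag-of-words service summary {term: weight}, where the
    weight is the term's total frequency across the sampled documents.
    (The full model would include XML tag names as well as text.)"""
    summary = Counter()
    for doc in documents:
        # Simple tokenizer: lowercase runs of letters (an assumption)
        summary.update(re.findall(r"[a-z]+", doc.lower()))
    return summary

# Toy stand-in for documents sampled from a PubMed-like service
docs = ["cancer drug trial", "cancer screening", "drug interactions"]
summary = build_summary(docs)
print(summary["cancer"])  # 2
```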
Estimating Service Summaries
Query-based Sampling [Callan ’99]
– Send a query; retrieve the top-m documents; repeat until a stopping condition is reached
– EstSummary(PubMed) contains only a fraction of all terms in the actual summary
– Over text databases, ~300 documents are needed for high-quality estimated summaries
Good at generating overall summaries, but not necessarily good for comparing summaries (see paper)
– Intuition: a service with broad coverage (like Google) will have few terms in common with a service with narrow coverage (like PubMed)
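The sampling loop can be sketched as follows; `search(term)` is a hypothetical stand-in for the service's real query interface, and the stopping condition here (a document budget) is one simple choice.

```python
import random

def query_based_sampling(search, seed_terms, m=4, max_docs=300, max_probes=1000):
    """Sketch of query-based sampling [Callan '99]: send a one-term query,
    retrieve the top-m documents, fold their terms into the running
    estimate, and repeat until a stopping condition is reached (here,
    enough documents seen)."""
    estimate = {}    # term -> frequency: the estimated service summary
    n_sampled = 0
    for _ in range(max_probes):
        if n_sampled >= max_docs:
            break
        # Probe with a term from the estimate so far, else a seed term
        probe = random.choice(list(estimate) or list(seed_terms))
        for doc in search(probe)[:m]:
            n_sampled += 1
            for term in doc.lower().split():
                estimate[term] = estimate.get(term, 0) + 1
    return estimate

# Toy service: two documents, searched by substring match
corpus = ["cancer drug", "drug trial"]
est = query_based_sampling(lambda t: [d for d in corpus if t in d],
                           seed_terms=["drug"], m=2, max_docs=4)
```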
Source-Biased Probing
Bias the estimate of the target towards the source of bias
– EstSummaryPubMed(Google) vs. EstSummary(Google)
Home in on what Google has in common with PubMed
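The biasing step amounts to drawing probe terms from the source's summary rather than at random, so the target's estimate concentrates on the overlap. In this sketch, `search_target` is a hypothetical query interface and "probe with the source's top-weight terms" is one assumed probe-selection policy.

```python
def source_biased_probe(source_summary, search_target, n_probes=50, m=4):
    """Source-biased probing sketch: probe the target with the source's
    highest-weight terms, yielding EstSummary_source(Target), an
    estimate biased toward what the target shares with the source."""
    probes = sorted(source_summary, key=source_summary.get, reverse=True)[:n_probes]
    biased = {}
    for term in probes:
        for doc in search_target(term)[:m]:
            for t in doc.lower().split():
                biased[t] = biased.get(t, 0) + 1
    return biased

# Toy target with one medical page and one unrelated page
target = ["cancer drug research", "web search engine"]
pubmed_like = {"cancer": 10, "drug": 5}
biased = source_biased_probe(pubmed_like, lambda t: [d for d in target if t in d])
```

Only the medical page is ever retrieved, so the biased estimate of the target reflects its overlap with the PubMed-like source.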
Probe Selection
Uniform random selection
– Prob(selecting term j) = 1 / N′
Weighted random selection
– Prob(selecting term j) = wj / Σi(wi)
Weight-based selection
– Select terms that occur the most times across all documents
– Select terms that occur in the most documents
Focal term probing
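The first three strategies can be sketched directly over a dict-of-weights summary (function names are illustrative):

```python
import random

def uniform_probe(summary):
    """Uniform random selection: Prob(selecting term j) = 1 / N'."""
    return random.choice(list(summary))

def weighted_probe(summary):
    """Weighted random selection: Prob(selecting term j) = w_j / sum_i(w_i)."""
    terms = list(summary)
    return random.choices(terms, weights=[summary[t] for t in terms], k=1)[0]

def top_weight_probes(summary, k=1):
    """Weight-based selection: the k terms with the highest total weight."""
    return sorted(summary, key=summary.get, reverse=True)[:k]

summary = {"cancer": 40, "drug": 10, "trial": 1}
```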
Probing with Focal Terms
Instead of treating a source as a single collection of candidate probe terms, break the source up into rough groups of co-occurring terms
Cluster terms (not documents)
– Termj = {(doc1, wj1), …, (docM, wjM)}
Use an off-the-shelf clustering algorithm to find k focal term groups
– Simple KMeans, in this case
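The idea can be sketched by representing each term as its vector of per-document weights and clustering those vectors; a tiny pure-Python k-means stands in here for the off-the-shelf implementation, and the toy data is an assumption.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means over dense vectors (a stand-in for an
    off-the-shelf simple KMeans implementation)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance
        for i, v in enumerate(vectors):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
        # Update step: each center becomes the mean of its members
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Term_j = {(doc_1, w_j1), ..., (doc_M, w_jM)}: one weight per document
terms = ["cancer", "drug", "java", "python"]
vectors = [[5, 4, 0],   # cancer: heavy in the two medical documents
           [4, 5, 0],   # drug
           [0, 0, 6],   # java: heavy in the programming document
           [0, 1, 5]]   # python
groups = kmeans(vectors, k=2)  # two focal term groups
```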
Probing with Focal Terms (2)
Use round-robin selection to choose a probe from each focal term group
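Round-robin selection over the focal groups might look like this (a sketch; the group contents are illustrative):

```python
def round_robin_probes(focal_groups, n_probes):
    """Choose probe terms by cycling through the focal term groups,
    taking one term from each group in turn so that probes cover all
    topical regions of the source."""
    iters = [iter(g) for g in focal_groups]
    probes, exhausted = [], set()
    i = 0
    while len(probes) < n_probes and len(exhausted) < len(iters):
        g = i % len(iters)
        if g not in exhausted:
            try:
                probes.append(next(iters[g]))
            except StopIteration:
                exhausted.add(g)  # this group has no terms left; skip it
        i += 1
    return probes

print(round_robin_probes([["cancer", "drug"], ["java", "python"]], 3))
# ['cancer', 'java', 'drug']
```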
Evaluating and Ranking Services
Biased Focus
– Captures the topical focus of a target on the source: focussource(Target)
– Should range from 0 (no focus) to 1 (complete focus)
– Not a symmetric measure; for example:
– focusPubMed(Google) = high
– focusGoogle(PubMed) = low
Cosine-Based Biased Focus
Cosine
– Normalized inner product
– Independent of the vector length
– Biased focus = the cosine of the angle θ between the ESummary vectors of the source and the target
Other metrics are discussed in the paper
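The cosine-based biased focus is the normalized inner product of the two summary vectors; here is a sketch over dict-of-weights summaries:

```python
import math

def biased_focus(source_summary, biased_target_summary):
    """focus_source(Target) as the cosine of the angle between the two
    (estimated) summary vectors: a normalized inner product, independent
    of vector length, in [0, 1] for non-negative term weights."""
    dot = sum(w * biased_target_summary.get(t, 0)
              for t, w in source_summary.items())
    norm_s = math.sqrt(sum(w * w for w in source_summary.values()))
    norm_t = math.sqrt(sum(w * w for w in biased_target_summary.values()))
    return dot / (norm_s * norm_t) if norm_s and norm_t else 0.0
```

Identical summaries score 1.0 and summaries with no terms in common score 0.0, matching the intended 0-to-1 range.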
Identifying Interesting Relationships
Consider two services: A and B
Evaluate their relationship by understanding the focus of each with respect to the other
– focusB(A) and focusA(B)
Relies on a family of lambda parameters
Example:
– Let lambda_high = 0.9
– If focusB(A) > 0.9 and focusA(B) > 0.9, then A and B are lambda-equivalent
Of course, determining the appropriate lambda is tricky!
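A sketch of the bilateral classification: only the lambda-equivalence rule comes from the slide; the asymmetric case mirrors the PubMed/Google example (high focus one way, low the other), and its labels and the lam_low threshold are illustrative assumptions, not the paper's exact taxonomy.

```python
def classify(focus_b_of_a, focus_a_of_b, lam_high=0.9, lam_low=0.1):
    """Classify the relationship between services A and B from their
    mutual biased focus. focus_b_of_a = focusB(A), the focus of target A
    on source B; focus_a_of_b = focusA(B). The thresholds are assumed
    values for illustration."""
    if focus_b_of_a > lam_high and focus_a_of_b > lam_high:
        return "lambda-equivalent"
    if focus_b_of_a > lam_high and focus_a_of_b < lam_low:
        # A covers B's topics but not vice versa: A is the broader service
        return "A more general than B"
    if focus_a_of_b > lam_high and focus_b_of_a < lam_low:
        return "B more general than A"
    return "no strong relationship"

# Mirrors the talk's example with A = a Google-like service, B = a
# PubMed-like service: focusB(A) high, focusA(B) low
print(classify(0.95, 0.05))  # A more general than B
```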
Experimental setup
Two datasets:
– Newsgroups: 780 collections, 100-16,000 documents each, 2.5 GB total
– Web collection (‘in the wild’): 50 real web databases, 50 documents collected from each
Results (shown as figures in the talk)
– Probing efficiency
– SBP identifies high-quality documents
– Precision for 10 source newsgroups
– Ranking web sources
– Relationships relative to PubMed
More in the paper!
Conclusions
Introduced techniques to support personalized relevance-based service discovery
– Source-biased probing
– Focal term probing
– Source-biased ranking (with biased focus)
– Identification of relationships
Open issues
Exploiting structure
– E.g., for schema matching, use of ontologies, etc.
More advanced probing techniques
Fine-grained inter-service analysis
Better understanding of complex service computations (e.g., correlating input to output)
Could extend this “personalization” approach to consider other factors as well