IBM Confidential
Enterprise Information Mashups: Integrating Information, Simply
Anant Jhingran
CTO, Information Management, IBM

Outline
– Web 2.0 and Info 2.0
– Example and the research problems we see
– IBM efforts in this area

Creativity v. Control
1964: S/360 debuts
1971: First Intel Micro
1981: IBM PC
1994: Netscape Navigator
2000: Dot-com collapse
Information Technology Spend “had” been growing nicely
Actual Application Architecture for Consumer Electronics Company
[Diagram: a dense map of the company's applications and the interfaces between them, e.g. E01-EDI, G02 - General Ledger, A05 - AP, I06 Warehouse Management, S04 - Sales Posting, plus grouped boxes "OTHER APPS - PC", "ACCTS REC APPS - PC", "INVENTORY CONTROL APPS - PC", and "Misc Accounting/Finance Apps - PC/NT". Interfaces to and from the Data Warehouse are not displayed on this diagram. Legend: Mainframe, PC/NT apps, Unix apps, 3rd Party Interface.]
Prepared by Michelle Mills
Over time, complexity got built into the IT systems
[Diagram labels: Presentation Services; Portals, Browsers, and/or Devices; Strategic APPL; Tactical APPL; Tx APPL; Discovery APPL; Master Data APPL; App Server; Event Processing; Process Services; Information Integration Services; Analytic Services; Master Data Services; Transaction Application Services; Analytic Application Services; Business Process Management; Federation; Discovery Services; Content Services; Collaboration Services; Notes; Email; Enterprise Service Bus; Metadata Services; Master Data Hubs (Product, Customer, Supplier, Location); Transaction Services (OLTP, OLTP1, OLTP2); Business Rules; Business Monitoring; Streaming; Batch; Metadata; EDW; ECW; Legacy]
And using Information as a Strategic Asset to build better Architectures
Open Innovation is Here to Stay, Exemplified by Web 2.0
But…
Web 2.0, outside and inside an enterprise, will succeed only with an Info 2.0 Mashup Fabric
Info 2.0 enables the same separation of “data” and “logic” that revolutionized the use of databases in the 1980s.
Web 2.0 enables the same separation of “information” and “process” that is now happening in Web 1.5.
Within enterprises, it will…
Enable connections to information that does not make it into the enterprise IT architectures:
– Email
– Presentations and Documents
– External Data (Web)
– Spreadsheets
– Decision Support Datasets…
And enable it to be done “quickly”, as “assembly” as opposed to as “programming”
[Diagram: the enterprise architecture diagram shown earlier, repeated, with additional source labels: doc, CM, DB, Files, email]
How the Architecture could play out…
ppt
[Diagram labels: IT Focus; LOB Focus; Web 2.0; Info 2.0; SaaS Model and Software Model, each showing Info 2.0 Fabric, Situational Apps, Process Server/ESB, and Information Integration]
External Web
http://water.usgs.gov/waterwatch/
(Zipcode)
edc.usgs.gov/
Example
(Geocode = Latitude/Longitude) (Geocode = Latitude/Longitude)
http://www.dotd.louisiana.gov/
(HUC = Hydrological Unit Code)
http://florida.maps.anant/
Meet Pete, an insurance agent in Florida. He sees a news report of a severe storm. What is the company’s risk? He needs to forward a risk summary to executives.
Flood Risk Assessment Mashup
[Diagram labels: Mashup Search; Report; Standardize; www.floodlevels.com; policy XLS; water.usgs.gov; edc.usgs.gov; dotd.florida.gov; Screen Scraping; Lineage; Standardization]
So how can Pete write his mashup simply?
Simplicity Accuracy
Procedural Code
<?php
// Get policy holders in a Policy object array
$url = "file://policies/myclients.xsl";
$content = file_get_contents($url);
$policyArr = getPolicy($content);

// Find high risk zones
$url = "http://www.floodlevels.com";
$content = file_get_contents($url);
// Do screen scraping to extract high risk zones
$zoneArr = findRiskyZones($content);

// Initialize the return array
$riskArr = array();

// Find corresponding policy holders for each city
foreach ($policyArr as $policy) {
    if ($policy->amount < 250000) {
        continue;
    }
    // Standardize the address
    $policyZone = findZone($policy->address);
    // Check whether this policy is affected
    foreach ($zoneArr as $zone) {
        if ($zone == $policyZone) {
            // This policy carries a high risk.
            // Insert into high risk array
            $riskArr[] = $policy;
        }
    }
}

// Send email to manager for high risk policies
sendEmail("suzan@trustinsurance.com", "High risk policies", $riskArr);
?>
So how can Pete write his mashup simply?
Simplicity Accuracy
sendMail("suzan@trustinsurance.com",
  <highRiskPolicies> {
    for $i in url("file://policies/myclients.xsl")
    for $j in url("http://www.floodlevels.com")
    where $i//amount > 250000 and $i//address in $j/zone
    return <policy> {$i} </policy>
  } </highRiskPolicies>);
Declarative Queries
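To make concrete what the declarative query computes, here is a minimal Python sketch over in-memory stand-ins. The policy records, the zone set, and the `zone_of` helper are hypothetical; the slides do not say how an address is mapped to a zone, so the sketch simply assumes the trailing ZIP code is the zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    holder: str
    address: str
    amount: int


# Hypothetical stand-ins for myclients.xsl and the floodlevels.com result.
POLICIES = [
    Policy("A. Smith", "12 Bay Rd, Miami, FL 33101", 400_000),
    Policy("B. Jones", "8 Elm St, Orlando, FL 32801", 300_000),
    Policy("C. Lee", "3 Palm Ave, Miami, FL 33101", 100_000),
]
HIGH_RISK_ZONES = {"33101", "34106"}


def zone_of(address: str) -> str:
    """Assumed standardization step: treat the trailing ZIP code as the zone."""
    return address.rsplit(" ", 1)[-1]


def high_risk_policies(policies, zones, min_amount=250_000):
    # One expression states what is wanted: a filter plus a join on zone.
    return [
        p for p in policies
        if p.amount > min_amount and zone_of(p.address) in zones
    ]


at_risk = high_risk_policies(POLICIES, HIGH_RISK_ZONES)
```

The comprehension mirrors the `for ... where ... return` shape of the query: no loop bookkeeping, only the condition that relates the two sources.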
So how can Pete write his mashup simply?
Simplicity Accuracy
GUIs, Spreadsheets, Wikis
So how can Pete write his mashup simply?
Simplicity Accuracy
Flood risk for homes in myclients.xsl worth over 250000
Search
How do we get there?
Research Agenda
It is all about “simplicity” – do deep research and build deep technology, but make the job of the application writer much easier! Much of our past research is applicable (including Information Manifold and its children), but new problems exist because of new target users.
Info 2.0 Mashup Fabric needs to address these issues, over time
How to create such a Mashup?
– Finding what exists, specifying what he wants, and creating what is needed (expressiveness vs. ease of use – DWIS vs. DWIM)
How to integrate the information?
– What is the minimal level of semantics that the Information 2.0 layer needs to have, and has the world evolved to make it easier now?
How to deal with unstructured data?
How do Mashups evolve?
How does Pete find the floodlevels.com Mashup?
Pages on floodlevels.com are dynamically generated AJAX pages (produced by another mashup)
Pete may have typed “Flood Levels Louisiana” into a search engine
Similar to the deep Web search problem, but now we have to deal with joins and other mashup operations, or even workflow
Search has to understand the logic of the mashup
Web 2.0 magnifies the deep web search problem
How does Pete specify his Mashup?
Pete is an insurance agent, not an expert JavaScript or PHP/Java/Ruby/etc. programmer
How does Pete specify a screen scraper if needed?
How does Pete describe the Mashup flow?
– Current mashups are a hodge-podge of application and data access
– Similarity to ETL Flow
– Is the answer an XQuery-like language for mashups, or programming by example?
Web 2.0 needs simple methods to write mashups!
Can he create the Mashup by giving an example?
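For a sense of what "specifying a screen scraper" means today, here is a minimal hand-written scraper in Python using only the standard library. It is a sketch under stated assumptions: the page fragment is invented (the real floodlevels.com markup is not known), and `find_risky_zones` is a hypothetical counterpart of the `findRiskyZones` call in the procedural slide.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain no <td> cells and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment, invented for illustration.
PAGE = """
<table>
  <tr><th>Zone</th><th>Risk</th></tr>
  <tr><td>33101</td><td>high</td></tr>
  <tr><td>32801</td><td>low</td></tr>
  <tr><td>34106</td><td>high</td></tr>
</table>
"""


def find_risky_zones(html: str) -> list[str]:
    scraper = TableScraper()
    scraper.feed(html)
    return [zone for zone, risk in scraper.rows if risk == "high"]


risky = find_risky_zones(PAGE)
```

Even this small case needs parser callbacks and state flags, which is the gap the slide points at: a programming-by-example tool would let Pete mark one "high" row and generalize from it.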
Could it have been even easier?
Could Pete’s mashup have been dynamically constructed when he searched for “flood levels for zipcodes 33101, 34106, etc.”?
– Test of Time Award: “Information Manifold”, Querying Heterogeneous Information Sources Using Source Descriptions, by A. Halevy, A. Rajaraman, and J. Ordille
– automatically finding the right sources based on query
Extend Information Manifold to dynamically create Mashups!
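The idea of building a mashup from source descriptions can be sketched in a few lines of Python. This is a greatly simplified illustration, not the Information Manifold algorithm, which uses much richer source descriptions and query rewriting. Here each source is described only by one input and one output attribute, and the pairing of sites with attributes is an assumption loosely based on the labels on the example slide.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    needs: str  # attribute the source takes as input
    gives: str  # attribute the source returns


# Hypothetical descriptions; which site maps which attribute is assumed.
SOURCES = [
    Source("geocoder", needs="zipcode", gives="geocode"),
    Source("edc.usgs.gov", needs="geocode", gives="huc"),
    Source("water.usgs.gov/waterwatch", needs="huc", gives="flood_level"),
    Source("dotd.louisiana.gov", needs="geocode", gives="road_closures"),
]


def plan(have: str, want: str, sources=SOURCES):
    """Breadth-first search for the shortest chain of sources from have to want."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        attr, chain = queue.popleft()
        if attr == want:
            return [s.name for s in chain]
        for s in sources:
            if s.needs == attr and s.gives not in seen:
                seen.add(s.gives)
                queue.append((s.gives, chain + [s]))
    return None  # no combination of known sources answers the query


mashup = plan("zipcode", "flood_level")
```

Given the query "flood levels for zipcodes", the planner chains zipcode to geocode to HUC to flood level without Pete naming any source, and returns `None` when the described sources cannot reach the requested attribute.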
How does one simplify “semantics”?
Helped by:
– Microformats growing in popularity in the open community
– Standardization services increasingly available
– Master Data Management taking off in enterprises
Issues:
– Standardization is inherently uncertain. How is uncertainty handled?
– Quality of services differs. How to track the lineage of both data and integration services?
– Services vary in price. How to trade off price, quality, and time?
Search shows us some ways
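The three issues above (uncertainty, lineage, and the price/quality/time trade-off) can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the service names, their numbers, the linear scoring weights, and the rule that confidence is multiplied by service quality are not from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    name: str
    price: float    # cost per call, arbitrary units
    quality: float  # estimated accuracy in [0, 1]
    seconds: float  # typical latency


@dataclass
class Value:
    data: str
    confidence: float = 1.0
    lineage: list = field(default_factory=list)


def choose(services, w_quality=1.0, w_price=0.5, w_time=0.1):
    """Pick the service with the best weighted score (weights are assumptions)."""
    def score(s):
        return w_quality * s.quality - w_price * s.price - w_time * s.seconds
    return max(services, key=score)


def standardize(value: Value, service: Service, cleaned: str) -> Value:
    # Confidence shrinks with each uncertain step; lineage records which
    # source and which service produced the value.
    return Value(
        cleaned,
        value.confidence * service.quality,
        value.lineage + [service.name],
    )


# Hypothetical address standardization services.
SERVICES = [
    Service("free-geocoder", price=0.0, quality=0.80, seconds=2.0),
    Service("premium-geocoder", price=0.2, quality=0.98, seconds=0.5),
]

best = choose(SERVICES)
raw = Value("12 bay rd miami", lineage=["myclients.xsl"])
clean = standardize(raw, best, "12 Bay Rd, Miami, FL 33101")
```

Carrying `confidence` and `lineage` alongside the data lets a downstream report say both how far a standardized address can be trusted and which services touched it.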
Issues in Unstructured Data
Everybody wants to run analytics on unstructured data and create structured data, and then we are back in our favorite world.
This poses two challenges:
– Analytics are hard and require some fundamentally new techniques.
– The extracted structured (meta-) data is inherently imprecise.
But unstructured query systems have evolved to address this!
DATA QUERY/INTEGRATION
S U U S
Analytics Semantics
In another Web 2.0 sense, how does this co-exist with and augment social tagging?
Manual tagging – By Professionals
Cons: Costly; Human resource intensive; Cannot keep up
Pros: Controlled vocabularies & standard taxonomies; Higher quality
Example: ?

Social Tagging – By Users
Cons: Ambiguity; Uncontrolled vocabulary; Synonyms
Pros: User driven; Emergent folksonomies; Serendipitous browsing
Examples: Del.icio.us and Flickr

Automated Tagging – By Machine
Cons: Requires training of models; Lower quality than manual tagging
Pros: Learns from professional & user tagging; Lower human cost
Example: Semantic tagging

[Chart labels: Popularity; Digital item; Consumer content; Deep archives, large personal collections; High-value content & enterprise data sources; “Long tail”]
Mashup Evolution
[Diagram labels: Mission Critical; Best Effort, Ad Hoc; Limited Time, Immediate; Lots of Time; Mashups; SCA; Portals; New Initiatives; Proof of Concept; Line of Business; IT Dept; DataMart; DataWarehouse]
Mashup Starter Kit – A Mashup Fabric for Intranet Applications being built @ IBM
[Diagram labels: XML/Atom/RSS Feeds; HTML; Web Pages; Web Services; Atom/RSS Store; MAFIA; Ingestion; Augmentation; Presentation; Fusion; Union; Standardization; Transformation; Feed Generation; Screen Scraping; Lightweight Semantics]
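As a rough illustration of the Ingestion, Augmentation, and Presentation labels, the following Python sketch ingests two RSS feeds, unions them, and generates an Atom feed. It is not the Mashup Starter Kit's implementation: the mapping of functions to stages is an assumption, the feeds and `example.com` URLs are placeholders, and the generated Atom is deliberately incomplete (a valid feed also needs `id` and `updated` elements).

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def ingest_rss(xml_text: str):
    """Ingestion: turn RSS 2.0 items into plain dicts."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title", ""), "link": item.findtext("link", "")}
        for item in root.iter("item")
    ]


def union(*feeds):
    """Augmentation: merge entries, dropping duplicates by link."""
    seen, merged = set(), []
    for feed in feeds:
        for entry in feed:
            if entry["link"] not in seen:
                seen.add(entry["link"])
                merged.append(entry)
    return merged


def to_atom(title: str, entries) -> str:
    """Presentation: emit a minimal Atom feed as a string."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = e["title"]
        ET.SubElement(entry, f"{{{ATOM}}}link", href=e["link"])
    return ET.tostring(feed, encoding="unicode")


# Hypothetical feeds; the URLs are placeholders.
FEED_A = """<rss version="2.0"><channel>
<item><title>River gauge up</title><link>http://example.com/1</link></item>
<item><title>Road closed</title><link>http://example.com/2</link></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel>
<item><title>Road closed</title><link>http://example.com/2</link></item>
<item><title>Storm warning</title><link>http://example.com/3</link></item>
</channel></rss>"""

entries = union(ingest_rss(FEED_A), ingest_rss(FEED_B))
atom_xml = to_atom("Flood watch", entries)
```

Because every stage consumes and produces plain feed entries, the output of one mashup can be ingested by another, which matches the feed-in, feed-out shape the diagram labels suggest.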