Enterprise Information Mashups: Integrating Information, Simply - - PowerPoint PPT Presentation



Enterprise Information Mashups: Integrating Information, Simply Anant Jhingran CTO, Information Management IBM IBM Confidential Outline Web 2.0 and Info 2.0 Example and the research problems we see IBM efforts in this area


SLIDE 1

IBM Confidential

Enterprise Information Mashups:

Integrating Information, Simply Anant Jhingran CTO, Information Management IBM

SLIDE 2

Outline

Web 2.0 and Info 2.0
Example and the research problems we see
IBM efforts in this area
Creativity v. Control

SLIDE 3

1964: S/360 debuts
1971: First Intel Micro
1981: IBM PC
1994: Netscape Navigator
2000: Dot-com collapse

Information Technology Spend “had” been growing nicely

SLIDE 4

Actual Application Architecture for Consumer Electronics Company

[Diagram: application map of a consumer electronics company, drawn as a dense web of individually labeled applications and interfaces. Legend: Mainframe, PC/NT apps, Unix apps, 3rd Party Interface. Labeled systems include E01 EDI, G02 General Ledger, A05 AP, I06 Warehouse Management, S04 Sales Posting, V01 Price Management System, and I13 Auto Replenishment, plus groups of PC applications for accounts receivable, inventory control, and accounting/finance. Interfaces to and from the Data Warehouse are not displayed on the diagram. Prepared by Michelle Mills.]

Over time, complexity got built into the IT systems

SLIDE 5

[Diagram: enterprise information architecture built around an Enterprise Service Bus.
Services shown: Presentation Services, Process Services, Information Integration Services, Analytic Services, Master Data Services, Transaction Services, Transaction Application Services, Analytic Application Services, Content Services, Collaboration Services, Metadata Services, Discovery Services.
Applications shown: Strategic, Tactical, Tx, Discovery, and Master Data applications, plus an App Server.
Stores shown: EDW, ECW, Legacy systems, OLTP, OLTP1, OLTP2, and Master data Hubs for Product, Customer, Supplier, and Location.
Other labels: Portals, Browsers, and/or Devices; Business Process Management; Federation; Event Processing; Business Rules; Business Monitoring; Streaming; Batch; Metadata; Notes; Email.]

And using Information as a Strategic Asset to build better Architectures

SLIDE 6

Open Innovation is Here to Stay, Exemplified by Web 2.0

But…

SLIDE 7

Web 2.0, outside and inside an enterprise, will succeed only with an Info 2.0 Mashup Fabric

Info 2.0 enables the same separation of “data” and “logic” that revolutionized the use of databases in the ’80s.
Web 2.0 enables the same separation of “information” and “process” that is now happening in Web 1.5.

SLIDE 8

Within enterprises, it will…

Enable connections to information that does not make it into the enterprise IT Architectures:
– Email
– Presentations and Documents
– External Data (Web)
– Spreadsheets
– Decision Support Datasets…
And enable it to be done “quickly”, as “assembly” as opposed to as “programming”

[Diagram: the enterprise information architecture from Slide 5, shown again.]

SLIDE 9

How the Architecture could play out…

[Diagram: two parallel stacks under the headings Web 2.0 and Info 2.0, labeled with IT Focus, LOB Focus, SaaS Model, and Software Model. Each stack contains Situational Apps, an Info 2.0 Fabric, a Process Server/ESB, and Information Integration. Inputs shown: doc, ppt, email, Files, CM, DB, and the External Web.]

SLIDE 10

Example

Meet Pete, an insurance agent in Florida. He sees a news report of a severe storm. What is the company’s risk? He needs to forward a risk summary to executives.

[Diagram: the sources Pete would draw on: http://water.usgs.gov/waterwatch/, edc.usgs.gov/, http://www.dotd.louisiana.gov/, and http://florida.maps.anant/. The sources are keyed differently: by Zipcode, by Geocode (Latitude/Longitude), and by HUC (Hydrological Unit Code).]

SLIDE 11

Flood Risk Assessment Mashup

[Diagram: flood risk mashup flow. Sources: www.floodlevels.com, policy XLS, water.usgs.gov, edc.usgs.gov, dotd.florida.gov. Steps: Screen Scraping, Standardize, Mashup, Search, Report. Annotations: Lineage, Standardization.]

SLIDE 12

So how can Pete write his mashup simply?

[Chart axes: Simplicity, Accuracy]

SLIDE 13

So how can Pete write his mashup simply?

[Chart axes: Simplicity, Accuracy. Label: Procedural Code]

<?php
// Helper functions getPolicy(), findRiskyZones(), findZone(), and
// sendEmail() are assumed to be defined elsewhere; the slide does not show them.

// Get policy holders in a Policy object array
$url = "file://policies/myclients.xsl";
$content = file_get_contents($url);
$policyArr = getPolicy($content);

// Find high risk zones
$url = "http://www.floodlevels.com";
$content = file_get_contents($url);

// Do screen scraping to extract high risk zones
$zoneArr = findRiskyZones($content);

// Initialize the return array
$riskArr = array();

// Find corresponding policy holders for each city
foreach ($policyArr as $policy) {
    // Skip policies below the 250000 threshold
    if ($policy->amount < 250000) {
        continue;
    }

    // Standardize the address
    $policyZone = findZone($policy->address);

    // Check whether this policy is affected
    foreach ($zoneArr as $zone) {
        if ($zone == $policyZone) {
            // This policy carries a high risk.
            // Insert into high risk array
            $riskArr[] = $policy;
            break; // record each policy at most once
        }
    }
}

// Send email to manager for high risk policies
sendEmail("suzan@trustinsurance.com", "High risk policies", $riskArr);
?>

SLIDE 14

So how can Pete write his mashup simply?

[Chart axes: Simplicity, Accuracy. Label: Declarative Queries]

sendMail("suzan@trustinsurance.com",
  <highRiskPolicies> {
    for $i in url("file://policies/myclients.xsl")
    for $j in url("http://www.floodlevels.com")
    where $i//amount > 250000 and $i//address in $j/zone
    return <policy> {$i} </policy>
  } </highRiskPolicies>);
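For comparison, the same join can be sketched in Python, where a single comprehension plays the role of the declarative query. This is only an illustration: the `Policy` record, the `high_risk_policies` function, and the sample data are hypothetical stand-ins for myclients.xsl and floodlevels.com, and the zone is assumed to be already standardized from the address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    holder: str
    zone: str      # assumed to be already standardized from the address
    amount: int


def high_risk_policies(policies, risky_zones, min_amount=250000):
    """Return policies above min_amount whose zone is flagged as risky."""
    zones = set(risky_zones)
    # The comprehension states *what* is wanted (a filtered join),
    # not *how* to loop over the two inputs.
    return [p for p in policies if p.amount > min_amount and p.zone in zones]


# Illustrative data only; none of it comes from the slides.
policies = [
    Policy("A", "33101", 300000),
    Policy("B", "33101", 100000),
    Policy("C", "34106", 400000),
    Policy("D", "32801", 500000),
]
risky = ["33101", "34106"]

result = high_risk_policies(policies, risky)
print([p.holder for p in result])  # ['A', 'C']
```

The filter and the join condition sit side by side in one expression, which is the property the declarative version is meant to highlight.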

SLIDE 15

So how can Pete write his mashup simply?

[Chart axes: Simplicity, Accuracy. Label: GUIs, Spreadsheets, Wikis]

SLIDE 16

So how can Pete write his mashup simply?

[Chart axes: Simplicity, Accuracy. Label: a search box]

Flood risk for homes in myclients.xsl worth over 250000 [Search]

How do we get there?

SLIDE 17

Research Agenda

It is all about “simplicity”: do deep research and build deep technology, but make the job of the application writer much easier!
Much of our past research is applicable (including Information Manifold and its children), but new problems exist because of new target users.

SLIDE 18

Info 2.0 Mashup Fabric needs to address these issues, over time

How to create such a Mashup?
– Finding what exists, specifying what he wants, and creating what is needed (expressiveness vs. ease of use – DWIS vs. DWIM)
How to integrate the information?
– What is the minimal level of semantics that the Information 2.0 layer needs to have, and has the world evolved to make it easier now?
How to deal with unstructured data?
How do Mashups evolve?

SLIDE 19

How does Pete find the floodlevels.com Mashup?

Pages on floodlevels.com are dynamically generated AJAX pages (produced by another mashup)
Pete may have typed “Flood Levels Louisiana” into a search engine
Similar to the deep Web search problem, but now we have to deal with joins and other mashup operations, or even workflow
Search has to understand the logic of the mashup

Web 2.0 magnifies the deep web search problem

SLIDE 20

How does Pete specify his Mashup?

Pete is an insurance agent, not an expert JavaScript or PHP/Java/Ruby/etc. programmer
How does Pete specify a screen scraper if needed?
How does Pete describe the Mashup flow?
– Current mashups are a hodge-podge of application and data access
– Similarity to ETL Flow
– Is the answer an XQuery-like language for mashups, or programming by example?

Web 2.0 needs simple methods to write mashups!
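To make the screen-scraping burden concrete, here is a minimal sketch of what Pete would otherwise have to write by hand, using only Python's standard `html.parser` module. The `TableScraper` class and the sample markup are hypothetical; the real floodlevels.com page structure is not given in the slides.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment; the real markup is unknown.
SAMPLE_HTML = """
<table>
  <tr><th>Zone</th><th>Level</th></tr>
  <tr><td>33101</td><td>high</td></tr>
  <tr><td>34106</td><td>low</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
risky_zones = [zone for zone, level in scraper.rows if level == "high"]
print(risky_zones)  # ['33101']
```

Even this small scraper depends on the exact table layout of the page, which is why a specification method that non-programmers can use is posed here as an open problem.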

SLIDE 21

Can he create the Mashup by giving an example?

SLIDE 22

Could it have been even easier?

Could Pete’s mashup have been dynamically constructed when he searched for “flood levels for zipcodes 33101, 34106, etc.”?
– Test of Time Award: “Information Manifold”, Querying Heterogeneous Information Sources Using Source Descriptions by A. Halevy, A. Rajaraman, and J. Ordille
– automatically finding the right sources based on query

Extend Information Manifold to dynamically create Mashups!
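The idea of picking sources from their descriptions can be illustrated with a deliberately simplified sketch. This is not the Information Manifold algorithm, which rewrites queries over richer source descriptions; it is a plain breadth-first search that chains sources by the attribute each one needs and the attribute it returns. The `Source` type, `plan_sources`, and the catalog entries are all hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Source(NamedTuple):
    name: str
    needs: str      # attribute the source must be given
    gives: str      # attribute the source returns


def plan_sources(sources, have, want):
    """Breadth-first search for the shortest chain of sources from `have` to `want`."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        attr, chain = queue.popleft()
        if attr == want:
            return chain
        for src in sources:
            if src.needs == attr and src.gives not in seen:
                seen.add(src.gives)
                queue.append((src.gives, chain + [src.name]))
    return None  # no combination of sources answers the query


# Hypothetical source descriptions, loosely modeled on the flood example.
CATALOG = [
    Source("geocoder", needs="zipcode", gives="geocode"),
    Source("huc_lookup", needs="geocode", gives="huc"),
    Source("water_levels", needs="huc", gives="flood_level"),
    Source("maps", needs="geocode", gives="map_tile"),
]

print(plan_sources(CATALOG, have="zipcode", want="flood_level"))
# ['geocoder', 'huc_lookup', 'water_levels']
```

Given only a zip code and a wanted attribute, the planner assembles the chain of sources itself, which is the behavior a dynamically constructed mashup would need.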

SLIDE 23

How does one simplify “semantics”?

Helped by:
– Microformats growing in popularity in the open community
– Standardization services increasingly available
– Master Data Management taking off in enterprises
Issues:
– Standardization is inherently uncertain. How is uncertainty handled?
– Quality of services differs. How to track the lineage of both data and integration services?
– Services vary in price. How to trade off price, quality, and time?
Search shows us some ways
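One way to picture the three issues together is a value that carries its own confidence and lineage, plus a scoring function that trades quality against price and time. The sketch below is an assumption-laden illustration: the `Value` and `Service` types, the weights, and the two services are invented, and multiplying confidences is just one simple way to propagate uncertainty.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    data: str
    confidence: float = 1.0
    lineage: list = field(default_factory=list)


@dataclass(frozen=True)
class Service:
    name: str
    quality: float   # assumed probability that the output is correct
    price: float     # cost per call, arbitrary units
    seconds: float   # typical latency


def apply_service(value, service, transform):
    """Run `transform` and record which service touched the value."""
    return Value(
        data=transform(value.data),
        confidence=value.confidence * service.quality,
        lineage=value.lineage + [service.name],
    )


def pick_service(services, w_quality=1.0, w_price=0.1, w_time=0.05):
    """Choose the service with the best weighted quality/price/time score."""
    def score(s):
        return w_quality * s.quality - w_price * s.price - w_time * s.seconds
    return max(services, key=score)


# Hypothetical services; the numbers are made up for illustration.
services = [
    Service("free_standardizer", quality=0.80, price=0.0, seconds=2.0),
    Service("paid_standardizer", quality=0.98, price=1.0, seconds=0.5),
]

best = pick_service(services)
raw = Value("12 main st, miami fl", lineage=["policy XLS"])
clean = apply_service(raw, best, str.upper)
print(best.name, clean.data, round(clean.confidence, 2), clean.lineage)
```

Raising `w_price` makes the free service win instead, which shows how the trade-off becomes an explicit, tunable choice.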

SLIDE 24

Issues in Unstructured Data

Everybody wants to run analytics on unstructured data, and create structured data, and then we are back in our favorite world.
This poses two challenges:
– Analytics are hard and require some fundamentally new techniques.
– The extracted structured (meta-) data is inherently imprecise.
But unstructured query systems have evolved to address this!

[Diagram: grid with axes DATA and QUERY/INTEGRATION, cells marked S and U (presumably structured and unstructured), with the labels Analytics and Semantics.]
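The imprecision of extracted metadata can be shown with a toy extractor that attaches a confidence to each value it pulls out of free text. Everything here is an assumption made for illustration: the `extract_zipcodes` function, the cue-word heuristic, the confidence values 0.9 and 0.5, and the sample email.

```python
import re

ZIP_RE = re.compile(r"\b(\d{5})\b")


def extract_zipcodes(text):
    """Return (zipcode, confidence) pairs; the confidence values are illustrative."""
    results = []
    for match in ZIP_RE.finditer(text):
        # Look at the 12 characters just before the number for a cue word.
        context = text[max(0, match.start() - 12):match.start()].lower()
        confidence = 0.9 if "zip" in context else 0.5
        results.append((match.group(1), confidence))
    return results


email = "Storm damage reported near zip code 33101. Claim ref 48213 is pending."
facts = extract_zipcodes(email)
print(facts)                                 # [('33101', 0.9), ('48213', 0.5)]
print([z for z, c in facts if c >= 0.8])     # ['33101']
```

A downstream query then has to decide how much uncertainty to tolerate, here by filtering on a threshold, rather than treating the extracted values as exact.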

SLIDE 25

In another Web 2.0 sense, how does this co-exist with and augment social tagging?

Manual tagging – By Professionals
Cons: Costly; Human resource intensive; Cannot keep up
Pros: Controlled vocabularies & standard taxonomies; Higher quality
Example: ?

Social Tagging – By Users
Cons: Ambiguity; Uncontrolled vocabulary; Synonyms
Pros: User driven; Emergent folksonomies; Serendipitous browsing
Examples: Del.icio.us and Flickr

Automated Tagging – By Machine
Cons: Requires training of models; Lower quality than manual tagging
Pros: Learns from professional & user tagging; Lower human cost
Example: Semantic tagging

[Chart: Popularity by Digital item, with regions labeled Consumer content; Deep archives, large personal collections; High-value content & enterprise data sources; and “Long tail”.]

SLIDE 26

Mashup Evolution

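[Chart: positioning of integration approaches along two axes, one running from Best Effort, AdHoc to Mission Critical and one from Limited Time, Immediate to Lots of Time. Labels on the chart: Mashups, SCA, Portals, DataMart, DataWarehouse, New Initiatives, Proof of Concept, Line of Business, IT Dept.]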

SLIDE 27

Mashup Starter Kit – A Mashup Fabric for Intranet Applications being built @ IBM

[Diagram: Mashup Starter Kit architecture.
Sources: Enterprise IT Services and External Data Services, reached as Web Services, Web Pages (HTML), and XML/Atom/RSS Feeds.
Stages: Ingestion, Augmentation, Presentation.
Operations: Screen Scraping, Web Services, Feed Generation, Transformation, Standardization, Union, Fusion.
Supporting components: Atom/RSS Store, Lightweight Semantics.
The component label MAFIA also appears.]
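A few of the operations named in the diagram (Union, Standardization, Feed Generation) can be sketched as a tiny pipeline over feed entries. This is not the Mashup Starter Kit's code; the function names, the entry format, and the sample data are hypothetical, and the generated Atom document is minimal rather than spec-complete (for example, it omits the required `updated` elements).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def union(*feeds):
    """Union step: merge entry lists, dropping duplicate ids, keeping first seen."""
    seen, merged = set(), []
    for feed in feeds:
        for entry in feed:
            if entry["id"] not in seen:
                seen.add(entry["id"])
                merged.append(entry)
    return merged


def standardize(entries):
    """Standardization step: normalize titles (a stand-in for real cleansing)."""
    return [{**e, "title": e["title"].strip().title()} for e in entries]


def to_atom(entries, feed_title):
    """Feed generation step: serialize entries as a minimal Atom document."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = feed_title
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = e["id"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = e["title"]
    return ET.tostring(feed, encoding="unicode")


# Illustrative entries standing in for an enterprise feed and an external feed.
internal = [{"id": "urn:policy:1", "title": " flood claim "}]
external = [
    {"id": "urn:policy:1", "title": "duplicate"},
    {"id": "urn:usgs:7", "title": "river gauge alert"},
]

entries = standardize(union(internal, external))
xml_text = to_atom(entries, "Flood risk")
print(xml_text)
```

Because every step consumes and produces the same entry list, the steps can be reordered or extended, which matches the "assembly rather than programming" goal stated earlier in the deck.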

SLIDE 28

As my Mom used to say (perhaps still says!): “How can you have any pudding if you don’t eat clean your feet?” (Apologies to Pink Floyd, “The Wall”)

SLIDE 29

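How do we unleash creativity, yet keep light control?

[Diagram: information sources (Web Content, Departmental Content, Personal Assets, Enterprise Information) and activities (Create and Explore, Transform, Assemble & Use), divided into a part marked “Unleash this” and a part marked “Manage this”.]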

SLIDE 30

Summary

Web-style architectures represent the next “sustainable” phase of IT spend
The database research community can make a big difference!
– Re-enable the separation of data and logic: Web 2.0 built on Info 2.0!
New research problems exist
– Ease of use and ad-hoc integration
– Bringing unstructured and (semi-)structured data together
We at IBM are building such an Info 2.0 Fabric, targeting enterprise situational applications
One of the biggest battles will be creativity vs. control