
From MUD to MIRE: Managing Inherent Risk in the Enterprise
Peter J. Haas
IBM Almaden Research Center, San Jose, CA
MUD Workshop, September, 2010

The Two Perpetual Questions
• “Where do the probabilities come from?”
• “Who is going to use this stuff in the real world?”

My background in probabilistic DB

RAQA: Resolution-Aware Query Answering for Business Intelligence (Sismanis et al. 2009)
• OLAP querying (datacubes: roll-up, drill-down)
• Uncertainty due to entity resolution
• Bounds on query answers
• Implemented via SQL queries
• Conservative approach

Sum(Sales) group by City, State:
City            State   Strict range    Status
San Francisco   CA      [$30,$230]      guaranteed
San Jose        CA      [$70,$200]      non-guaranteed

Sum(Sales) group by State:
State   Strict range    Status
CA      [$230,$230]     guaranteed

The MCDB System (with Chris Jermaine & students)
[Diagram: the schema, VG functions, and parameter tables define a random DB D. A Monte Carlo generator draws i.i.d. samples d_1, d_2, ..., d_n from the possible-worlds distribution. The query Q (e.g., Select SUM(sales) AS t_sales) is run on each sample, so Q(d_1), Q(d_2), ..., Q(d_n) are i.i.d. samples from the query-result distribution Q(D). An estimator/inference step turns these into estimates of E[t_sales], Var[t_sales], and the 0.01-quantile q_.01[t_sales], plus a histogram and error bounds.]
• Many implementation tricks to ensure acceptable performance
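The sample-then-query loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MCDB's implementation: the parameter table `PARAMS`, the function `vg_sales`, and the normal sales model are hypothetical stand-ins for a VG function and its parameters.

```python
import random
import statistics

# Hypothetical parameter table: (customer, mean sales, std dev of sales).
PARAMS = [("c1", 100.0, 10.0), ("c2", 250.0, 40.0), ("c3", 80.0, 5.0)]

def vg_sales(rng, mean, sd):
    """Stand-in VG function: draw one uncertain sales value (assumed normal)."""
    return rng.gauss(mean, sd)

def sample_world(rng):
    """Generate one database instance d_i from the possible-worlds distribution."""
    return [{"cust": c, "sales": vg_sales(rng, m, s)} for c, m, s in PARAMS]

def query(world):
    """Q: SELECT SUM(sales) AS t_sales."""
    return sum(row["sales"] for row in world)

def monte_carlo(n, seed=0):
    """Run Q on n i.i.d. sampled worlds and summarize the query-result samples."""
    rng = random.Random(seed)
    results = sorted(query(sample_world(rng)) for _ in range(n))
    return {
        "mean": statistics.fmean(results),
        "var": statistics.variance(results),
        "q01": results[int(0.01 * n)],  # crude empirical 0.01-quantile
        "n": n,
    }

est = monte_carlo(5000)
```

Each call to `query` sees one fully instantiated database, so any query that runs on ordinary data can be reused unchanged; only the summary step knows that the results are samples.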

Query-Result Distributions
[Figure: frequency histograms of the query results for Q1 to Q4. Q1: revenue change (×10^9). Q2: days until completion, annotated "Long tail in delivery times". Q3: total supplier cost (×10^10). Q4: additional profits (×10^10).]

MC³: MapReduce + MCDB (Xu et al. 2009)
• Jaql: high-level query language for semi-structured JSON data (www.jaql.org, //code.google.com/p/jaql)
• Map-Reduce (Hadoop): parallel batch processing
• HDFS: distributed file system
• Tricks to manage pseudo-random numbers

Where do the probabilities come from?

Data-Warehouse Uncertainty
• Data integration (ETL): source records {John Smith, San Jose}, {John Smith, San Jose}, {John Smith, Los Angeles} become Name = John Smith, City = (SJ, 0.66), (LA, 0.33)
• Similarity join: (Name = John Smith, City = LA) joined with (Name = J. Smith, Sales = $50K) gives (City = LA, Sales = $50K) ? (0.92)
• Information extraction (Michelakis et al., 2009): the System T hotel annotator sees "NY Marriott" and "A lovely thing to behold is Paris Hilton in the Springtime …" and produces Hotel = Paris Hilton ? (0.20)
• Text miner: the message "09/09/2007 Re: system crash. This morning, my ORACLE system on LINUX exploded in a spectacular fireball …" produces Source = Cust0385, Problem type = (DBMS, 0.8), (OS, 0.2)

Data-Warehouse Uncertainty – Cont’d
• Measurement uncertainty
– Sensor: Sensor_ID = S23, Temp (F) = 78.32 [Figure: density f(t) over the reading, marked at t = 78.32]
– System monitor: Event = Buffer overflow, Time = 10/17/2007:18:20:02 [Figure: density f(t) over the event time t]

Real-World Challenges with Data-Warehouse Uncertainty
• People don’t like to admit that it exists!
– Retailers view uncertainty as failure of security, supply chain management
  • IBM research relationship manager for retail
– Law enforcement
  • Photo ID in meth dealer trial
– Scientists pretend data is perfect: uncertainty undermines results
  • Hans-Joachim Lenz
– Database vendors
  • Data “cleaning” products
• Data warehouse may not even exist!
– Ex: cancer data at medical center
– Ex: tomato soup supply chain data

Stochastic Predictive Analytics on Big Data
• Uncertain data describes future or hypothetical events
– Based on complex, fine-grained stochastic model over big data
– Minimizes denial problem
• Intense recent interest in “business analytics” driven by
– Need for low risk, quick payback projects (flexibility, low cost, fine data granularity)
– Technical advances
  • Cloud computing
  • Software as a Service (SaaS)
  • Next generation tools, portals, visualization
• Often with a spreadsheet front end
– $8 Billion of such tools [Gnatovich06]
– IBM services pricing
• Lots of prototype activity
– Fox/GreenPlum [Cohen09 MAD analytics paper]
– VISA/IBM [Das10 SIGMOD paper]

Ex. 1: Portfolio Values

Customer:
CustID       OptionID   NumShares   …
John Smith   23         50          …
…            …          …           …

EuroCallOptions:
OptionID   InitVal   …   StrikeP   OVal
23         $2.35     …   $4.00     ?
…          …         …   …         …

OVal is the option value one month from now (exercise date).

    SELECT SUM (c.NumShares * o.Val)
    FROM Customer c, EuroCallOptions o
    WHERE c.OptionID = o.OptionID
    AND c.CustType = 'Institutional'

Modified Black-Scholes model for European call option:
    OVal = \max\bigl( V(t_{\text{final}}) - S,\ 0 \bigr)
    dV = rV\,dt + a(V)\,V\,dW

Simulation approximation (Euler approach), with each Z_j sampled from a Normal dist’n:
    V(t + \Delta t) = V(t) + rV(t)\,\Delta t + a(V(t))\,V(t)\,\sqrt{\Delta t}\,Z_j

Also CMOs, etc.
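A minimal Python sketch of the Euler approximation for one option follows. The starting value and strike come from the slide's table (InitVal = $2.35, StrikeP = $4.00). The interest rate, the constant volatility function `constant_vol`, the one-month horizon expressed in years, and the step count are all assumptions made for illustration.

```python
import math
import random

def simulate_option_value(v0, strike, r, a, t_final, steps, rng):
    """Euler steps for dV = r V dt + a(V) V dW, then OVal = max(V(t_final) - S, 0)."""
    dt = t_final / steps
    v = v0
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)  # Z_j: one standard normal sample per step
        v = v + r * v * dt + a(v) * v * math.sqrt(dt) * z
    return max(v - strike, 0.0)

def constant_vol(v):
    """Assumed volatility function a(V); constant here for simplicity."""
    return 0.9

rng = random.Random(42)
# r, volatility, horizon, and step count below are hypothetical values.
samples = [
    simulate_option_value(2.35, 4.00, r=0.05, a=constant_vol,
                          t_final=1.0 / 12.0, steps=30, rng=rng)
    for _ in range(10_000)
]
mean_oval = sum(samples) / len(samples)
```

Each element of `samples` is one possible value of the "?" cell in the OVal column; the query's SUM would be evaluated once per sampled world.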

Ex. 2: Pricing Decisions
[Figure: price vs. demand data for all customers gives a global demand distribution (prior). Bayes' theorem combines it with price vs. demand data for one customer to give an individual demand distribution (posterior).]

CustID     Unit Price   Order Amount
J. Smith   $10.20       500
…          …            …

• Can analyze arbitrary dynamically-defined customer segments when determining effect of price increase
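The prior-to-posterior step can be illustrated with the simplest conjugate case. The slide does not say which demand model is used, so the normal prior, the known noise variance, and all numbers below are assumptions chosen only to show the mechanics.

```python
def posterior_normal(prior_mean, prior_var, obs, noise_var):
    """Conjugate normal update for an unknown mean with known noise variance."""
    n = len(obs)
    post_prec = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + sum(obs) / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

# Hypothetical numbers: a global prior on mean order amount,
# then three observed orders from one customer.
prior_mean, prior_var = 400.0, 100.0 ** 2
orders = [500.0, 520.0, 480.0]
post_mean, post_var = posterior_normal(prior_mean, prior_var, orders,
                                       noise_var=50.0 ** 2)
```

With these numbers the posterior mean is about 492.3: the customer's own orders pull the estimate most of the way from the global value of 400 toward the customer's average of 500, and the posterior variance is smaller than the prior variance.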

Ex. 3: Individual Click Behavior (EBay)
[Figure: click data for all EBay customers gives a global Markov model distribution (Dirichlet prior) over pages p1 to p4 with transition parameters x13, x14, x24, x32, x34. Data for one customer gives an individual Markov model distribution (posterior) over the same pages with parameters y13, y14, y24, y32, y34.]
• Can analyze arbitrary dynamic customer segments when determining effect of changing EBay pages
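For a Markov click model, a Dirichlet prior on each row of the transition matrix updates by adding observed transition counts. The sketch below shows the posterior mean for the transitions out of page p1; the pseudo-counts and the customer's click counts are hypothetical.

```python
def posterior_transition_probs(prior_counts, observed_counts):
    """Posterior mean of one transition-matrix row under a Dirichlet prior."""
    post = {page: prior_counts[page] + observed_counts.get(page, 0)
            for page in prior_counts}
    total = sum(post.values())
    return {page: count / total for page, count in post.items()}

# Hypothetical: transitions out of page p1 (to p3 or p4).
global_prior = {"p3": 6.0, "p4": 4.0}  # Dirichlet pseudo-counts from all customers
one_customer = {"p3": 1, "p4": 9}      # this customer's observed clicks
row = posterior_transition_probs(global_prior, one_customer)
```

Here the global model favors p3 (0.6), but after ten clicks from this customer the individual estimate becomes 0.35 for p3 and 0.65 for p4.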

Ex. 4: Clinic-Capacity Risk
[Diagram: medical data for all customers, pharmacy data for all customers, a stochastic dosage model, and a Cox hazard-rate disease model feed a clinic-resource demand model.]

CustID       Time period   Resource needed
Jane Smith   June-Sept     ?
…            …             …

MCDB: Improvement of Traditional Analytics Workflow
[Diagram: traditional workflow. Data reduction, then an analyst (PhD) develops the model in Arena, R, Matlab, …: model fitting, then model application & querying.]
• Data extraction slow and bug-prone
• Hard to re-link model results to DB
• Only coarse-grained modeling
• Hard to deal with data updates
• No encapsulation for user
• Sensitivity, what-if analysis are hard
Goal: Integrate model with database

Where do the probabilities come from?
From stochastic predictive models over big data

Who is going to use this stuff in the real world?

Key Driver: Risk Management
• Ex: Projected sales under micromarketing campaign

    SELECT SUM (s.amount)
    FROM SALES s, CUST c
    WHERE s.ID = c.ID
    AND c.city = 'Los Angeles'

• Ex: ERP
– # OS experts for help desk
– Demand projected from historical text data (2x uncertainty)
– Provide principled safety factor
• Regulatory pressure
– Basel II, Solvency II
• Business pressure
– Ex.: Energy Risk Professionals
[Figure: query-result distribution (probability vs. total LA sales, with the expected answer marked) and loss distribution (probability vs. loss, with the expected loss and the 5% VaR marked).]
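Given Monte Carlo samples of a loss, both quantities marked on the loss distribution can be read off directly. The sketch below assumes the common convention that the 5% VaR is the loss level exceeded in about 5% of the samples; the lognormal loss model is a hypothetical stand-in for real query-result samples.

```python
import random

def expected_loss_and_var(losses, tail=0.05):
    """Return (mean loss, VaR), where VaR is exceeded in roughly `tail` of the samples."""
    ordered = sorted(losses)
    k = min(len(ordered) - 1, int((1.0 - tail) * len(ordered)))
    return sum(ordered) / len(ordered), ordered[k]

rng = random.Random(7)
# Hypothetical loss samples with a long right tail.
losses = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]
mean_loss, var_5 = expected_loss_and_var(losses)
```

For a long-tailed loss like this one the VaR sits well above the expected loss, which is the gap a safety factor is meant to cover.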

Challenge: Decision-makers’ Poor Intuition About Risk
• Flaw of averages (weak form): mean correct, variance ignored
• Flaw of averages (strong form): wrong value of mean, f(E[X]) ≠ E[f(X)]
• Sam Savage’s book (why we underestimate risk)
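The strong form is easy to check by exact enumeration. In the sketch below the demand distribution (uniform on 0 to 10) and the function f (units sold when 5 units are stocked) are assumptions picked for illustration.

```python
from fractions import Fraction

def units_sold(demand, stock=5):
    """f(X): sales are capped by the stock on hand."""
    return min(demand, stock)

demands = range(0, 11)  # assumed: demand is uniform on 0, 1, ..., 10

# E[X]: the average demand
mean_demand = Fraction(sum(demands), len(demands))

# f(E[X]): plug the average demand into f
f_of_mean = units_sold(mean_demand)

# E[f(X)]: average f over every possible demand
mean_of_f = Fraction(sum(units_sold(d) for d in demands), len(demands))
```

Here E[X] = 5 and f(E[X]) = 5, but E[f(X)] = 40/11, about 3.64: planning on the average demand overstates expected sales because high demand cannot be served while low demand still hurts.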

Examples
• Red River (ND) flooding: “Expected to crest at 50 feet”
• Perishable Inventory (Red Lobster)
• U.S. accounting standards (FASB)
• Project completion time: 10 parallel tasks, E[T_i] = 6 mo.
• Data cleansing
• Machine learning
• Trio agg. paper (MUD 2008)
• Basic probability
[Figure: cost (axis ticks $200 to $800) vs. demand (0 to 10), with stock = E[demand] = 5.]
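The project-completion example can be simulated directly. The slide gives only E[T_i] = 6 months for each of the 10 parallel tasks; the sketch below additionally assumes the tasks are independent with durations uniform on 3 to 9 months, which is a hypothetical choice.

```python
import random

def mean_completion_time(n_tasks=10, low=3.0, high=9.0, n_runs=20_000, seed=1):
    """Average time until all parallel tasks finish (the max of the durations)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        total += max(rng.uniform(low, high) for _ in range(n_tasks))
    return total / n_runs

avg = mean_completion_time()
```

Under these assumptions the exact expected completion time is 3 + 6 · 10/11, about 8.45 months, well above the 6 months obtained by plugging in each task's average duration.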

Probability Management and Interactive Spreadsheets
• DIST 1.1 standard
– DIST = distribution string
– IID Monte Carlo (multivariate) samples
– Compressed, with metadata
• Ensures correct, coherent risk computations throughout enterprise and beyond
– E.g., Royal Dutch Shell
• “Electricity network” for probability
– Royal Dutch Shell, Merck Pharmaceutical, Oracle, Wells Fargo Bank, Bessemer Trust, and IBM
• DISTs can be manipulated like numbers
– Facilitates interactive spreadsheets (demo)
[Image: audit seal of approval]
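The idea that a distribution stored as a vector of Monte Carlo trials can be "manipulated like numbers" can be sketched as follows. The `SampleDist` class is hypothetical and does not implement the DIST 1.1 format (no compression or metadata); it only shows trial-by-trial arithmetic, which keeps results coherent when the same input appears in several formulas.

```python
import random

class SampleDist:
    """A distribution represented by a vector of Monte Carlo trials (hypothetical class)."""

    def __init__(self, trials):
        self.trials = list(trials)

    def _combine(self, other, op):
        # Combine trial i with trial i, or with a plain number.
        if isinstance(other, SampleDist):
            if len(other.trials) != len(self.trials):
                raise ValueError("trial vectors must have equal length")
            return SampleDist(op(a, b) for a, b in zip(self.trials, other.trials))
        return SampleDist(op(a, other) for a in self.trials)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def mean(self):
        return sum(self.trials) / len(self.trials)

    def prob_below(self, x):
        return sum(t < x for t in self.trials) / len(self.trials)

rng = random.Random(3)
# Hypothetical inputs: uncertain unit price and uncertain units sold.
price = SampleDist(rng.uniform(8.0, 12.0) for _ in range(10_000))
units = SampleDist(rng.gauss(500.0, 50.0) for _ in range(10_000))
profit = price * units - 4000.0
```

A spreadsheet cell holding `profit` can then report `profit.mean()` or `profit.prob_below(0.0)` in the same way it would report a single number.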

Demo 1

Demo 2
