Enterprise Data Unification Powered by Machine Learning


SLIDE 1

Confidential

Enterprise Data Unification Powered by Machine Learning

Jerome Gransac

SLIDE 2

HI-IS Big Data Event

  • Intro
  • Tamr Overview
      ○ Background
      ○ Use Cases & Customer Success
      ○ Technology & Differentiation
  • Product Demo
SLIDE 3

Reality: Constant change & entropy have created “Random Data Salad”

State of the Data in the Enterprise: Large Data Debt

Restructuring, Leadership Changes, Politics, Dynamic Schema DBs (Mongo et al.), “Data Hoarding”, Legacy Burden, M&A

Requirement: Accurate, up-to-date, curated view of core business entities

Customers, Suppliers, Products, Parts, Transactions [...]

Problem:
  1. Too much time spent on data prep vs. analysis & action
  2. High failure rate of BI & analytics projects
  3. Game-changing initiatives deemed ‘impossible’ and never start

SLIDE 4

Tamr At A Glance

Tamr solutions unify enterprise data by combining machine learning and human expertise to power transformational analytic and operational outcomes.

Company Overview

Headquarters: Cambridge, MA
Additional Offices: San Francisco, London
Founded: 2013

Key Founders

  • Dr. Michael Stonebraker, Co-Founder & CTO. Previously: Founder, Ingres/Postgres, HP Vertica
  • Andy Palmer, Co-Founder & CEO. Previously: Founder & CEO, HP Vertica

SLIDE 5

What Tamr Does: Enterprise Data Unification

Tamr uses machine learning to attack the enterprise data variety problem to power transformative analytic and operational outcomes.

  • $500M+ Savings: From sourcing analytics across siloed business
  • 10x Reduction: In new data set integration from 6 months to 2 weeks
  • 1,500+ Studies: Unified clinical study data to empower researchers
  • Customer Insights: Unified buyer profiles across siloed dealer systems in 30+ geos
  • Spend Classification: From pilot to live on Google Cloud Platform in 6 weeks
  • Inventory Optimization: $100M in reduced inventory by harmonizing parts across 5 fleets
  • Product Sales Insights: Unifying product sales data from distributors to enable new analytics
  • Well Productivity: Integrating disparate data related to wells to optimize productivity

SLIDE 6

Tamr Unify: Platform Overview

Internal Data, External Data

COMBINE, CONSOLIDATE, CLASSIFY

Machine Learning, Expert Input

Schema Mapping, Classification, Record Matching

Microservices: RESTful APIs

BI / Analytics, Data Wrangling, Custom Apps, Automated Integration, Source Remediation

SLIDE 7

Tamr’s Three Core Capabilities in Action

Building Data Integration Probabilistic Models for Mastering & Classification

Unified Name: Supplier
  Source1: Vendor_Name
  Source2: Supplier
  Source3: LFA1-MANDT
  Source4: CompanyName
  SourceN: Nom_de_Cie

  • Combine thousands of sources
  • Extract signal from underlying data to drive mapping decisions
  • Increase levels of mapping automation as sources increase
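
One way to picture "extract signal from underlying data to drive mapping decisions" is to compare the values stored in each source column with values already known for the unified attribute. The Python sketch below is only an illustration under stated assumptions, not Tamr's algorithm: the sample values, the `jaccard` and `suggest_mappings` helpers, and the 0.3 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of values."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical values already known for the unified "Supplier" attribute.
unified_supplier_values = ["acme corp", "husky manufacturing", "globex"]

# Hypothetical sample values from three source columns.
source_columns = {
    "Vendor_Name": ["husky manufacturing", "acme corp", "initech"],
    "CompanyName": ["globex", "acme corp"],
    "Invoice_Date": ["2017-01-03", "2017-02-11"],
}


def suggest_mappings(unified_values, columns, threshold=0.3):
    """Return {column: score} for columns whose values overlap the unified attribute."""
    scores = {name: jaccard(unified_values, vals) for name, vals in columns.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


suggestions = suggest_mappings(unified_supplier_values, source_columns)
for column, score in suggestions.items():
    print(f"{column} -> Supplier (value overlap {score:.2f})")
```

Column names play no part in this sketch; only the data does, which is why differently named columns such as Vendor_Name and CompanyName can both be proposed for the same unified attribute.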

SLIDE 8

Tamr’s Three Core Capabilities in Action

Building Data Integration Probabilistic Models for Mastering & Classification

Husky Manufacturing 342 Suite 34 KY
= or ≠
Allied Husky Incorporated 342 Main St KY, USA

  • Automatically group similar records using machine learning
  • Infuse expert feedback easily and quickly for maximum accuracy
  • Create ‘golden records’
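
The grouping step can be sketched with a plain token-overlap score and a union-find pass. This is a simplified stand-in rather than the machine-learned model the slide describes; the third record and the 0.25 threshold are hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def tokens(record):
    """Lowercase a record and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", record.lower()))


def similarity(a, b):
    """Token-level Jaccard similarity between two record strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def group_records(records, threshold):
    """Group records whose pairwise similarity meets the threshold (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


records = [
    "Husky Manufacturing 342 Suite 34 KY",
    "Allied Husky Incorporated 342 Main St KY, USA",
    "Globex Ltd 9 Elm Rd TX",  # hypothetical unrelated record
]
score = similarity(records[0], records[1])
clusters = group_records(records, threshold=0.25)
```

The two Husky records share only three of eleven distinct tokens, so whether they land in the same group depends entirely on the threshold. That borderline zone is where expert feedback adds the most value.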
SLIDE 9

Should “Husky #.25 J Blt” in table “Invoices” be categorized as Hardware > Fasteners > Bolt?

Tamr’s Three Core Capabilities in Action

Building Data Integration Probabilistic Models for Mastering & Classification

  • Automatically map records into any taxonomy
  • Leverage standard or custom n-tier classification scheme
  • Easily adjust as needs evolve
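
As a rough illustration of mapping a record into an n-tier taxonomy, the sketch below assigns a record the category path of its most similar labeled example. It is a nearest-example heuristic, not the probabilistic model the slide refers to; the labeled examples and helper names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase text and split it into tokens, keeping digits and decimal points."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def overlap(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical expert-labeled examples: (record text, taxonomy path).
labeled_examples = [
    ("Husky .25 hex blt", ("Hardware", "Fasteners", "Bolt")),
    ("Acme 3in wood screw", ("Hardware", "Fasteners", "Screw")),
    ("Laser printer toner", ("Office", "Supplies", "Toner")),
]


def classify(record, examples):
    """Return (taxonomy path, score) of the most similar labeled example."""
    best_text, best_path = max(examples, key=lambda ex: overlap(record, ex[0]))
    return best_path, overlap(record, best_text)


path, score = classify("Husky #.25 J Blt", labeled_examples)
print(" > ".join(path), round(score, 2))
```

Adjusting the taxonomy in this sketch means editing the labeled examples rather than rewriting rules.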
SLIDE 10

Use Cases & Customer Success

SLIDE 11

Case Study: General Electric

Agile, multi-domain entity mastering drives $500M+ in Value

Technical Challenge

  • Suppliers: Build an integrated view from 75+ ERP systems and 2M supplier records
  • Parts: 25M non-unique parts in purchasing systems across 8 BUs
  • M&A: Integrate data from acquired entities with existing master views

Technical Outcome

  • < 6 months from pilot to globally deployed data pipeline; 2M records consolidated to 700k
  • 25M reduced to 6.4M unique parts; 5-tier analytic-ready classification
  • Data from 3 acquisitions integrated with GE’s in < 2 weeks

Business Challenge

  • Suppliers: Get GE’s best terms with a given supplier in every negotiation
  • Parts: Optimize sourcing strategies to most cost-effective suppliers
  • M&A: Increase the velocity of realizing synergies post-acquisition

Business Outcome

  • $80M savings in Year 1 from Tamr-mastered unified supplier view
  • $300M in annual savings identified (0.5% reduction of direct spend)
  • Supplier, purchasing, and customer base opportunities quickly identified

“We’ve seen firsthand the transformative results that Tamr’s technology has on an enterprise of GE’s scale. When the cost and complexity of bringing together enterprise datasets is massively reduced, the resulting analytic breakthroughs create opportunities that were previously inaccessible.”
Lisa Coca, Managing Director, GE Ventures

“The supplier data integration was a big win.”
Bill Ruh, CEO, GE Digital & Chief Digital Officer at GE
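
The figures on this slide can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated on the slide; the implied direct-spend total is a derived value that the deck itself does not state, and it assumes the $300M corresponds to the 0.5% reduction.

```python
# Figures taken from the slide.
annual_savings_identified = 300e6   # $300M
share_of_direct_spend = 0.005       # 0.5%

# Derived, not stated in the deck: the direct spend implied by the two figures above.
implied_direct_spend = annual_savings_identified / share_of_direct_spend  # about $60B

suppliers_before, suppliers_after = 2_000_000, 700_000
parts_before, parts_after = 25_000_000, 6_400_000


def reduction(before, after):
    """Fraction of records removed by consolidation."""
    return (before - after) / before


supplier_reduction = reduction(suppliers_before, suppliers_after)  # 65% fewer supplier records
parts_reduction = reduction(parts_before, parts_after)             # about 74% fewer part records

print(f"Implied direct spend: ${implied_direct_spend / 1e9:.0f}B")
print(f"Supplier records reduced by {supplier_reduction:.0%}")
print(f"Part records reduced by {parts_reduction:.1%}")
```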

SLIDE 12

Case Study: Thomson Reuters

Optimizing an information company’s data curation operations

Technical Challenge

  • Increase levels of automation in data processing
  • Capture and leverage expertise of data stewards
  • Improve data quality

Technical Outcome

  • Automation levels as high as 90% in key projects
  • SMEs’ knowledge incorporated into Tamr’s models
  • Sustainable precision and recall rates in excess of 95%

Business Challenge

  • Accelerate time to market
  • Reduce manual effort
  • Support increasing cloud adoption
  • Take on data integration projects previously deemed ‘too hard’

Business Outcome

  • Months shaved off new product intro
  • 40% reduction of manual integration
  • Hybrid on-prem / cloud deployments
  • New TR products launched that were previously stuck on the drawing board

“Since we brought Tamr in four years ago, it has become an integral part of our big data platform which powers our products and services.”
Mona Vernon, CIO, Thomson Reuters Labs

“Tamr’s novel integration platform enabled us to expedite our own entity integration efforts by several months while reducing the manual effort by over 40% – a substantial achievement.”
Tim Baker, Global Head of Content Initiatives
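
For readers unfamiliar with the precision and recall figures quoted on this slide, here is a minimal Python sketch of how the two rates are computed for record matching. The pair identifiers are hypothetical and the numbers are chosen only to make the arithmetic easy to follow.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Compute precision and recall of predicted matches against known true matches."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record pairs: which ones truly match, and which the model proposed.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b9")}

precision, recall = precision_recall(predicted_pairs, true_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of proposed matches that are correct; recall is the share of true matches that were found. Rates above 95% on both mean few false matches and few missed ones.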

SLIDE 13

Case Study: Toyota Motor Europe

Unified customer 360 views from highly fragmented data collection

Technical Challenge

  • Effectively integrate locally managed customer data from 30 countries
  • Maintain flexibility for data collection and management at the country level
  • Integrate new sources quickly
  • Support local migration projects

Technical Outcome

  • ~125 sources currently integrated; 500 by project end
  • No disruption to local systems, but new mastered data now available
  • 1 - 2 weeks on average
  • TME France CRM migration completed in 6 weeks

Business Challenge

  • Deliver better service for customers who move between countries
  • Provide consistent experience across all customer touch points
  • Understand and predict the needs of customers to improve ability to exceed expectations

Business Outcome

  • CSRs now have single UI to search Tamr-mastered customer data
  • Better customer knowledge at point of sale / service
  • First ever unified view of customers fueling new analytical and operational use cases

“A lack of consistent view of customer data was restricting our ability to innovate and meet the expectations of our customers… Addressing these issues led us to an enterprise data unification approach, and a vendor, Tamr. We rejected traditional commercial offerings like MDM tools because their top-down approach required a single data model.”
Matt Stevens, Director of Information Systems

SLIDE 14

Case Study: GSK

Unified R&D data lake from fragmented research domain silos

Technical Challenge

  • R&D data too siloed / fragmented to be used effectively for exploratory purposes.
  • A traditional master data management approach would have taken too much time and effort to implement.

Technical Outcome

  • Used Tamr’s “probabilistic matching” to combine data into a single data lake with three different domains.
  • All assay, clinical, and genetic data moved into lake and unified in 3 months

Business Challenge

  • New drug development cycles impeded by inflexible, weakly integrated data environment
  • Needed transformation in how data and analytics were used across the organization

Business Outcome

  • Tamr’s machine learning-based enterprise data unification platform transformed the company’s data management capabilities.
  • Made it easier to access and use data for exploratory analysis and decision making about new medicines

“GSK R&D’s data environment is something that one often hears about in startups, but is rarely found in large enterprises whose roots go back over 300 years. And it’s great news for all of us humans who will benefit from the scientific advances it is likely to engender.”
Tom Davenport, “Biting The Data Management Bullet At GSK”, Forbes

SLIDE 15

Case Study: Société Générale

First ever unified view of Soc Gen spend globally that stays up-to-date

Technical Challenge

  • Legacy solution (rules-based, on-premise ERP add-on) can’t scale to cover all spend
  • Expensive to maintain and support
  • Tight security and infrastructure requirements have made it difficult to change solutions

Technical Outcome

  • Went from 60 days to add a new source to 5 days
  • 90% reduction in manual support effort (IT & Procurement)
  • Entire project completed in 2 months
  • Running on GCP

Business Challenge

  • 300+ user group of employees (primarily Sourcing) has an incomplete and inaccurate view of the company’s spend; executive mandate to solve the problem quickly (~3 months)
  • Google Cloud Platform being used to meet internal deadlines and security requirements, while lowering total costs

Business Outcome

  • Société Générale has a first-ever unified view of spend globally that stays up-to-date
  • Significant increase in analytic adoption and trust driven by 90%+ accuracy of classification

“In 30 hours of work, we accurately classified 75% of $12 billion Euros in spend, representing 6 million records”
Jean Baptiste, Head of Sourcing Methods & Information Systems
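
The headline numbers on this slide translate into a few derived rates. The Python sketch below uses only figures quoted on the slide. It assumes the 6 million records are the volume covered by the 30 hours of work, which is one reading of the quote, and it leaves the currency as stated.

```python
# Figures quoted on the slide.
records_classified = 6_000_000
hours_of_work = 30
spend_total_billion = 12
share_classified = 0.75
days_to_add_source_before, days_to_add_source_after = 60, 5

# Derived values (not stated in the deck).
records_per_hour = records_classified / hours_of_work               # records per hour of work
spend_classified_billion = spend_total_billion * share_classified   # spend covered, in billions
onboarding_speedup = days_to_add_source_before / days_to_add_source_after

print(f"{records_per_hour:,.0f} records per hour of work")
print(f"{spend_classified_billion:.0f} billion in spend classified")
print(f"{onboarding_speedup:.0f}x faster to add a new source")
```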

SLIDE 16

Demo

SLIDE 17

Next Steps: The Tamr Proof-of-Value Process

  • Fast: ~4 weeks start to finish
  • Easy: Limited customer engagement required, but full engagement welcome
  • Free: No charge for qualified engagements
  • Organized: Well-defined, documented process
  • High Impact: Focused on a hard technical problem with major business value
  • Clear KPIs: Mutually agreed success metrics
  • Action-Oriented: Shared expectation of commercial engagement pending successful outcome
SLIDE 18

Next Steps: Expert Services to Deliver a Quick Win

World-class data engineering and domain expertise experienced in deploying next generation data management stacks

Team Overview

Services Professionals: 20+
Expertise: Data Pipeline / ETL Enablement, Data Visualization, Solution Customization

Sample Profiles

  • Ted Gudmundsen: AB, Physics, Princeton; MS, Physics, Cornell. Previous Experience: Quantum Computing Research at MIT Lincoln Laboratory
  • Liam Cleary: BS & PhD, Electrical Engineering, Trinity College, Dublin. Previous Experience: Consultant at Ab Initio

Sample Projects

Data Migration, Data Pipelining, Data Enrichment

SLIDE 19

The Tamr Advantage

  • A Focused Mission: Tamr was built for unifying big data. Comprehending and unifying data is the company’s core mission.
  • Best-of-Breed Technology: A disruptive approach using machine learning, conceived by a Turing Award winner and developed with $50M of R&D over 6 years.
  • Open & Flexible: A modern cloud-ready architecture (also available on-prem), exposed via RESTful APIs, fuses Tamr’s patented IP with open-source big data technologies, making it easy to integrate with your current and future data investments.
  • Large Scale Deployment Success: Extensive experience putting large projects into production for both Global 2000 and government customers.
  • World-Class Services Team: Tamr’s data scientists and data engineers are exceptional in their skill and dedication to solving our customers’ hardest data challenges.

SLIDE 20

Internal Only

Links to Additional Resources

  • Solution Pitch Decks in Confluence
  • Case Study Slides (complete collection)
  • Whitepapers
  • Tamr Unify Product Screenshots
SLIDE 21

Appendix Slides

Not part of the standard intro deck but used as needed

SLIDE 22

Tamr’s Impact on the Data Debt Problem

  • 41bn+ Total Records Unified: Demonstrated ability to reliably solve customers’ biggest data challenges
  • 16+ Key Entities: Proven expertise across multiple domains
  • 20%+ Average Improvement: In matches identified, classification coverage vs. legacy processes
  • 99.9%+ Machine-driven: Limited human input required to develop accurate, scalable model

SLIDE 23

This is a problem for machines

OLD WAY: Rules-based
  Source data → Unified data
  Time: Months to years
  Quality: Mediocre
  Steps: Identify developers; Get business input; Write rules; Review with business; Modify rules, create exceptions; Iterate
  Timeline: Months 1 - 4, Months 5 - 12+

NEW WAY: Machine-driven
  Source data → Unified data
  Time: Days to weeks
  Quality: High
  Steps: Provide examples; Review recommendations; Iterate
  Timeline: Weeks 1 - 12
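
The machine-driven loop of providing examples, reviewing recommendations, and iterating can be sketched as a small human-in-the-loop routine. The Python below is a toy illustration under stated assumptions, not Tamr's implementation: the model is a single similarity cut-off, the `expert_review` function stands in for a human reviewer, and all scores are hypothetical.

```python
def fit_threshold(labeled):
    """Pick the similarity cut-off that classifies the most labeled pairs correctly."""
    candidates = sorted({score for score, _ in labeled})
    best_t, best_correct = 0.5, -1
    for t in candidates:
        correct = sum((score >= t) == is_match for score, is_match in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def most_uncertain(unlabeled, threshold):
    """Recommend the pair whose score lies closest to the current cut-off."""
    return min(unlabeled, key=lambda score: abs(score - threshold))


def expert_review(score):
    """Stand-in for a human reviewer; the hidden truth here is 'match if score >= 0.6'."""
    return score >= 0.6


labeled = [(0.95, True), (0.10, False)]        # "Provide examples"
unlabeled = [0.2, 0.4, 0.55, 0.65, 0.7, 0.9]   # similarity scores of candidate pairs

for _ in range(3):                             # "Iterate"
    threshold = fit_threshold(labeled)
    pick = most_uncertain(unlabeled, threshold)   # "Review recommendations"
    unlabeled.remove(pick)
    labeled.append((pick, expert_review(pick)))

threshold = fit_threshold(labeled)
print(f"Learned cut-off after {len(labeled)} labeled pairs: {threshold}")
```

No rules are written at any point. Each round of review moves the learned cut-off closer to the expert's implicit one, which is the contrast with the rules-based path that the slide draws.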

SLIDE 24

Tamr’s DNA

Tamr solutions unify enterprise data by combining machine learning and human expertise to power transformational analytic and operational outcomes.

Company Overview

Headquarters: Cambridge, MA
Additional Offices: San Francisco, London
Founded: 2013

Key Founders

  • Dr. Michael Stonebraker, Co-Founder & CTO. Previously: Founder, Ingres/Postgres, HP Vertica
  • Andy Palmer, Co-Founder & CEO. Previously: Founder & CEO, HP Vertica

Investors

SLIDE 25

Tamr at a Glance

What We Do

  • Data unification at scale
  • Powered by machine learning, informed by human expertise
  • Deployable on-prem or in cloud

How It’s Different

  • vs. Traditional ETL
      ○ User = data steward, not developer
      ○ Speed, ease, and cost of unifying myriad sources
  • vs. Self-Service Data Prep
      ○ Built for large numbers of sources
      ○ Big data ready architecture
  • vs. MDM
      ○ Faster and easier to deploy
      ○ Greater flexibility as underlying sources change

Who We Do It For

CDOs, CAOs, Innovative CIOs, & Frustrated Business Execs

How We Engage

High Impact Proof-of-Value (2 weeks)

  • Free proof of value (~ 2 weeks effort)
  • Prove Tamr is the best way to fix a painful problem
  • Clear success criteria

Succeed & Grow Approach

  • Initial deployment (typically $300k 2-year license)
  • Focus on nailing initial; deliver 10x ROI
  • Leverage world-class services team where needed

SLIDE 26

Tamr: Academic Project to Issued Patent in < 4 Years

2013: CIDR Research Paper
2017: Issued Patent, “Method & System for Large Scale Data Curation”

SLIDE 27

Tamr in the Public Sector

Applications in the Drug Development Lifecycle

Commercial, Development, Research

  • Citations
  • Assays
  • Metadata / Terminologies
  • Clinical Data
  • Biomarkers
  • Genetics
  • Clinical Operations
  • IDMP
  • PharmacoVigilance
  • Sales Analytics
  • Sunshine Act
SLIDE 28

Tamr in the Public Sector

US Intelligence Agency

  • Match data of known entities with inbound, open source data
  • Consolidate and score output of various intel gathering systems

US Customs and Border Protection

  • Consolidate and harmonize traveler information from watchlists, reservations systems and manifest lists

Proposed projects and POCs with DoD and other agencies

  • Identify and track sources of non-compliant imported goods
  • Supplier and logistic data mastering
  • Automating personnel record merging

SLIDE 29

Tamr in Financial Services

Anti-Money Laundering (AML) & Know-Your-Customer (KYC)

  • Simpler data integration process: multiple regulators / domains can be fed off the same data
  • Fewer full-time employees needed to manually review data matching
  • Enhanced customer experience and sales opportunities

Customer Experience and Journey (CRM)

  • Higher quality client records make prospecting more efficient
  • Less manual maintenance / increased automation
  • Seamlessly connecting internal and external data
  • Address fragmentation from free-form text entry and missing links
  • Fill in corporate hierarchy / family tree gaps caused by bad matching
  • Facilitate marketing attribution analysis and overall data quality

Risk Reporting and Regulatory

  • Accelerate cleanup times
  • Less manual maintenance / increased automation
  • Simpler data integration process: multiple regulators / domains can be fed off the same data

SLIDE 30

Tamr enterprise deployment architecture

Connect

RDBMS, Apps, Data Lake, Excel, Data Enrichment, Analytics / Visualization, 360° Views

Ingest: ETL, Scripts, APIs
Publish: ETL, Scripts, APIs

Schema Mapping, Classification, Record Matching

Tamr Compute Cluster, Human / Machine Collaboration, Feedback, Tamr Repository

SLIDE 31

Tamr System Architecture

Unify

SLIDE 32

DataOps: Key Tech Principles

Internal Data, External Data

BI / Analytics, Data Wrangling, Custom Apps, Automated Integration, Source Remediation

  • Agile/Continuous
  • Open/Best of Breed (not one platform/vendor)
  • Bi-Directional (Feedback)
  • Collaborative - Humans at the Core
  • Service Oriented
  • Loosely Coupled/RESTful Interfaces (Table In/Out)
  • Scale Out/Distributed
  • Hybrid - Mix of Cloud & On-Prem
SLIDE 33

Tamr’s Focus - Best of Breed Data Ops

BI / Data Viz / Analytics; Data Wrangling; Enterprise Data; Custom Apps; External Data; Logical Models - Unify; Automation, Logging; Provenance; Movement; Feedback; Governance; Raw Source; Catalog; Unified Data Hub

SLIDE 34

Tamr Professional Services

World-class data science and domain expertise to harness data for transformational results

  • Patented Data Unification Software Platform
  • Domain Expertise
  • Solution Customization
  • Data Pipeline / ETL Enablement

Transformational business results

Case Study: Tamr enabled a global leader in pharma to slash conversion time of clinical study data to industry standards from 6 weeks to 1 day, while reducing their dependency on external consultants and manual labor.

SLIDE 35

Tamr Professional Services

Best practices applied across four-phase implementation process to ensure successful deployment.

  1. Understand existing infrastructure, and tailor the deployment by developing custom pre- and post-processing
  2. Lead the initial training of Tamr’s models, maximizing the value of each subject matter expert
  3. Define strategy for further optimizing and maintaining results
  4. Train key stakeholders on capturing continuous value from their data through analytics and the Tamr software

SLIDE 36

Tamr Professional Services

Breadth of services offered that can be tailored to meet the needs of the specific use case

Service: Data Pipelining / ETL
Details: Develop custom pre- and post-processing to automate movement of data from source to target destination(s)
Timing: 2 weeks - 8 weeks

Service: Model Training
Details: Drive the process of training Tamr’s machine learning model, while engaging experts as necessary
Timing: 3 weeks - 6 weeks

Service: User Training
Details: Provide training to data curators and business users to use Tamr for the use case
Timing: Up to 1 week

Service: Analytics
Details: Deliver insights and build production-ready dashboards within analytics / visualization tool of choice
Timing: 2 weeks - 10 weeks

SLIDE 37

Tamr’s Service Offerings

Data Migration: One-time unification of multiple input sources into a target source

  • ERP Consolidation
  • Post-M&A IT Rationalization
  • Cloud Migration
  • SDTM Conversion (Biopharma)

Data Pipelining: A repeatable flow of unified data from multiple sources

  • BI Implementation
  • Data Lake / Data Warehouse Integration
  • Hybrid Cloud Integration

Data Enrichment: Identify and integrate high impact external data

  • Contact Profiles (job title, social media footprint)
  • Supplier Competition Graph
  • Organizational Parenting
SLIDE 38

Service-Led Project #1 - Data Migration

Data Migration

One-time unification of multiple input sources into a target source

  • ERP Consolidation
  • Post-M&A IT Rationalization
  • Cloud Migration
  • SDTM Conversion (Biopharma)

  • Define data preparation tasks that will enable you to make the data flexible enough to be utilized across a range of applications
      ○ This includes profiling input sources and identifying critical data attributes, transformations that should be performed, entities to master, and data that would benefit from classification
  • If code has already been written, we review it to ensure it follows best practices
  • Perform the data movement and unification tasks for the migration project
      ○ This may include writing ETL processes, mastering the data, and / or classifying the data to a target taxonomy
      ○ Includes unlimited use of Tamr’s software to accelerate the migration
  • Define the specific best-of-breed tooling to use, so that work can be reused to onboard a new source or modify elements of the data preparation
SLIDE 39

Service-Led Project #2 - Data Pipelining

Data Pipelining

A repeatable flow of unified data from multiple sources

  • BI Implementation
  • Data Lake / Data Warehouse Integration
  • Hybrid Cloud Integration

  • We provide frameworks to utilize in designing data pipelines so that they are built correctly for scalability the first time
  • Frameworks include specific tools and technologies that should be used, core principles for building the ETL processes, and recommendations on organizational structure & skill sets required to sustain the pipelines
  • If code has already been written, we review it to ensure it follows best practices
  • Develop the pipeline and perform a hand-off at the end of the project, providing training and enablement to ensure you are successful after the project is complete
  • Define the specific best-of-breed tooling to use to build the pipelines, so that our work can be reused later to onboard a new data source
  • We can quickly implement our software, Unify (licensed separately), as part of the pipeline if it becomes necessary

SLIDE 40

Service-Led Project #3 - Data Enrichment

Data Enrichment

Identify and integrate high impact external data

  • Contact Profiles (job title, social media footprint)
  • Supplier Competition Graph
  • Organizational Parenting
  • Define the analytical outcomes you are driving towards for specific use cases (e.g., customer segmentation, spend analytics, etc.), provide an overview of the external data sources that are relevant to the use case, and make recommendations on which sources to acquire and what the enrichment process should include
      ○ We offer these services for the domains where we have expertise, including sales & marketing, procurement, inventory management, and biopharma
  • Build the data pipeline, or specific pieces that have been challenging to build, to integrate the external data with internal data
      ○ This includes writing the API calls to acquire the data, performing the required data preparation to increase matches (inside or outside of Unify, depending on need and preference), and plumbing data back into the target systems on a repeatable basis
  • We hand off the pipeline after we deploy it, and provide training to maintain it