Confidential Confidential
Enterprise Data Unification Powered by Machine Learning
Jerome Gransac
HI-IS Big Data Event
- Intro
- Tamr Overview
  ○ Background
  ○ Use Cases & Customer Success
  ○ Technology & Differentiation
- Product Demo
Reality: Constant change & entropy have created “Random Data Salad”
State of the Data in the Enterprise: Large Data Debt
Restructuring, Leadership Changes, Politics, Dynamic Schema DBs (Mongo et al.), “Data Hoarding”, Legacy Burden, M&A
Requirement: Accurate, up-to-date, curated view of core business entities
Customers Suppliers Products Parts Transactions [...]
Problem:
1. Too much time spent on data prep vs. analysis & action
2. High failure rate of BI & analytics projects
3. Game-changing initiatives deemed ‘impossible’ and never start
Tamr At A Glance
Tamr solutions unify enterprise data by combining machine learning and human expertise to power transformational analytic and operational outcomes.
Company Overview
Headquarters: Cambridge, MA
Additional Offices: San Francisco, London
Founded: 2013
Key Founders
- Dr. Michael Stonebraker, Co-Founder & CTO. Previously: Founder, Ingres/Postgres, HP Vertica
- Andy Palmer, Co-Founder & CEO. Previously: Founder & CEO, HP Vertica
What Tamr Does: Enterprise Data Unification
$500M+ Savings
From sourcing analytics across siloed business
10x Reduction
In new data set integration from 6 months to 2 weeks
1,500+ Studies
Unified clinical study data to empower researchers
Customer Insights
Unified buyer profiles across siloed dealer systems in 30+ geos
Tamr uses machine learning to attack the enterprise data variety problem to power transformative analytic and operational outcomes
Spend Classification
From pilot to live on Google Cloud Platform in 6 weeks
Inventory Optimization
$100M in reduced inventory by harmonizing parts across 5 fleets
Product Sales Insights
Unifying product sales data from distributors to enable new analytics
Well Productivity
Integrating disparate data related to wells to optimize productivity
Tamr Unify: Platform Overview
Internal Data, External Data
COMBINE, CONSOLIDATE, CLASSIFY
Machine Learning, Expert Input
Schema Mapping, Classification, Record Matching
Microservices: RESTful APIs
BI / Analytics, Data Wrangling, Custom Apps, Automated Integration, Source Remediation
Unified Name: Supplier
Source1: Vendor_Name
Source2: Supplier
Source3: LFA1-MANDT
Source4: CompanyName
SourceN: Nom_de_Cie
Tamr’s Three Core Capabilities in Action
Building Data Integration Probabilistic Models for Mastering & Classification
- Combine thousands of sources
- Extract signal from underlying data to drive mapping decisions
- Increase levels of mapping automation as sources increase
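One way to picture "extracting signal from underlying data" is to compare the values in a new column with values already unified under each attribute, rather than relying on column names such as Nom_de_Cie. The following is a minimal sketch under that assumption; it is not Tamr's algorithm, and the function names, the "Pays" column, and all sample values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(values):
    """Lowercase and strip values so trivially different spellings compare equal."""
    return {str(v).strip().lower() for v in values if str(v).strip()}


def suggest_mapping(source_columns, unified_attributes):
    """For each source column, suggest the unified attribute whose known
    values overlap most with the column's values."""
    suggestions = {}
    for col, values in source_columns.items():
        col_values = normalize(values)
        scored = [
            (jaccard(col_values, normalize(known)), attr)
            for attr, known in unified_attributes.items()
        ]
        score, attr = max(scored)
        suggestions[col] = (attr, round(score, 2))
    return suggestions


# Hypothetical values already unified under each attribute.
unified = {
    "Supplier": ["Acme Corp", "Husky Manufacturing", "Globex"],
    "Country": ["USA", "France", "Japan"],
}
# Hypothetical new source whose column names alone say little.
new_source = {
    "Nom_de_Cie": ["acme corp", "Globex", "Initech"],
    "Pays": ["France", "USA"],
}

suggestions = suggest_mapping(new_source, unified)
print(suggestions)
```

In this toy setup each additional mapped source enlarges the pool of known values per attribute, which is one plausible reading of why mapping automation can increase as sources increase.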
Is “Husky Manufacturing, 342 Suite 34, KY” the same entity as “Allied Husky Incorporated, 342 Main St, KY, USA”?
Tamr’s Three Core Capabilities in Action
Building Data Integration Probabilistic Models for Mastering & Classification
- Automatically group similar records using machine learning
- Infuse expert feedback easily and quickly for maximum accuracy
- Create ‘golden records’
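The grouping, expert feedback, and golden record steps can be sketched with a simple token-overlap score applied to the two Husky records from the example. This is an illustration only, not Tamr's probabilistic model: the stopword list, the thresholds in `decide`, and the "keep the longest value" merge rule are all assumptions made for the sketch.

```python
import re

# Hypothetical list of low-signal tokens to ignore when comparing names.
STOPWORDS = {"inc", "incorporated", "corp", "co", "llc", "ltd"}


def tokens(text):
    """Lowercase, split on non-alphanumerics, and drop low-signal tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS}


def similarity(rec_a, rec_b):
    """Average per-field Jaccard similarity over the fields both records share."""
    scores = []
    for field in rec_a.keys() & rec_b.keys():
        a, b = tokens(rec_a[field]), tokens(rec_b[field])
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def decide(score, low=0.2, high=0.7):
    """Auto-decide confident pairs; route uncertain ones to an expert.
    The thresholds are illustrative assumptions."""
    if score >= high:
        return "match"
    if score <= low:
        return "no match"
    return "ask expert"


def golden_record(cluster):
    """Merge matched records by keeping the longest value seen for each field."""
    merged = {}
    for rec in cluster:
        for field, value in rec.items():
            if len(value) > len(merged.get(field, "")):
                merged[field] = value
    return merged


a = {"name": "Husky Manufacturing", "address": "342 Suite 34", "region": "KY"}
b = {"name": "Allied Husky Incorporated", "address": "342 Main St", "region": "KY, USA"}

score = similarity(a, b)
print(round(score, 2), decide(score))
print(golden_record([a, b]))
```

With these assumed thresholds the Husky pair lands in the uncertain band, which is the kind of case where an expert's yes/no answer would be most useful as feedback.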
Should “Husky #.25 J Blt” in table “Invoices” be categorized as Hardware > Fasteners > Bolt?
Tamr’s Three Core Capabilities in Action
Building Data Integration Probabilistic Models for Mastering & Classification
- Automatically map records into any taxonomy
- Leverage standard or custom n-tier classification scheme
- Easily adjust as needs evolve
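The “Husky #.25 J Blt” question can be illustrated with a tiny example-based classifier: each labeled description votes for its category whenever one of its tokens closely matches a token in the new record. This is a sketch under stated assumptions, not Tamr's classifier; the labeled examples, the taxonomy paths other than Hardware > Fasteners > Bolt, and the 0.8 cutoff are hypothetical. Fuzzy token matching uses `difflib.get_close_matches` from the Python standard library.

```python
import re
from collections import Counter
from difflib import get_close_matches


def tokens(text):
    """Lowercase alphanumeric tokens of a free-text description."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(description, labeled_examples, cutoff=0.8):
    """Return the taxonomy path with the most fuzzy token matches,
    or None when nothing matches (a case to route to an expert)."""
    votes = Counter()
    for example_text, category in labeled_examples:
        example_tokens = tokens(example_text)
        for tok in tokens(description):
            if get_close_matches(tok, example_tokens, n=1, cutoff=cutoff):
                votes[category] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical expert-labeled examples; each category is an n-tier path.
examples = [
    ("Husky 1/4 in hex bolt", ("Hardware", "Fasteners", "Bolt")),
    ("Steel bolt M8 zinc", ("Hardware", "Fasteners", "Bolt")),
    ("Nylon lock nut M8", ("Hardware", "Fasteners", "Nut")),
    ("Copier paper A4 ream", ("Office Supplies", "Paper", "Copy Paper")),
]

result = classify("Husky #.25 J Blt", examples)
print(" > ".join(result))
```

Because the categories are plain tuples, adjusting the taxonomy as needs evolve amounts to relabeling examples in this sketch; no rules have to be rewritten.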
Use Cases & Customer Success
Case Study: General Electric
Agile, multi-domain entity mastering drives $500M+ in Value
“We’ve seen firsthand the transformative results that Tamr’s technology has on an enterprise of GE’s scale. When the cost and complexity of bringing together enterprise datasets is massively reduced, the resulting analytic breakthroughs create opportunities that were previously inaccessible.”
Lisa Coca, Managing Director, GE Ventures
Technical Challenge
- Suppliers: Build an integrated view from 75+ ERP systems and 2M supplier records
- Parts: 25M non-unique parts in purchasing systems across 8 BUs
- M&A: Integrate data from acquired entities with existing master views
Technical Outcome
- < 6 months from pilot to globally deployed data pipeline; 2M records consolidated to 700k
- 25M reduced to 6.4M unique parts; 5-tier analytic-ready classification
- Data from 3 acquisitions integrated with GE’s in < 2 weeks
Business Challenge
- Suppliers: Get GE’s best terms with a given supplier in every negotiation
- Parts: Optimize sourcing strategies to most cost-effective suppliers
- M&A: Increase the velocity of realizing synergies post-acquisition
Business Outcome
- $80M savings in Year 1 from Tamr-mastered unified supplier view
- $300M in annual savings identified (0.5% reduction of direct spend)
- Supplier, purchasing, and customer base opportunities quickly identified
“The supplier data integration was a big win.”
Bill Ruh, CEO GE Digital & Chief Digital Officer at GE
Case Study: Thomson Reuters
Optimizing an information company’s data curation operations
Technical Challenge
- Increase levels of automation in data processing
- Capture and leverage expertise of data stewards
- Improve data quality
Technical Outcome
- Automation levels as high as 90% in key projects
- SMEs’ knowledge incorporated into Tamr’s models
- Sustainable precision and recall rates in excess of 95%
“Since we brought Tamr in four years ago, it has become an integral part of our big data platform which powers our products and services.”
Mona Vernon, CIO, Thomson Reuters Labs
Business Challenge
- Accelerate time to market
- Reduce manual effort
- Support increasing cloud adoption
- Take on data integration projects previously deemed ‘too hard’
Business Outcome
- Months shaved off new product intro
- 40% reduction of manual integration
- Hybrid on-prem / cloud deployments
- New TR products launched that were previously stuck on the drawing board
“Tamr’s novel integration platform enabled us to expedite our own entity integration efforts by several months while reducing the manual effort by over 40% – a substantial achievement.”
Tim Baker, Global Head of Content Initiatives
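For readers less familiar with the precision and recall figures cited for this project, the sketch below shows how the two rates are commonly computed for record matching. It assumes they are measured over matched record pairs against an expert-verified set; the slide does not say how the figures were derived, and the sample pairs are hypothetical.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Precision: share of predicted matches that are correct.
    Recall: share of true matches that were found."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record-ID pairs: what a model proposed vs. what experts confirmed.
predicted = [(1, 2), (3, 4), (5, 6), (7, 8)]
confirmed = [(1, 2), (3, 4), (5, 6), (9, 10)]

precision, recall = precision_recall(predicted, confirmed)
print(precision, recall)
```

Rates above 95% on both measures would mean that few proposed matches are wrong and few real matches are missed.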
Case Study: Toyota Motor Europe
Unified customer 360 views from highly fragmented data collection
Technical Challenge
- Effectively integrate locally managed customer data from 30 countries
- Maintain flexibility for data collection and management at the country level
- Integrate new sources quickly
- Support local migration projects
Technical Outcome
- ~125 sources currently integrated; 500 by project end
- No disruption to local systems, but new mastered data now available
- New sources integrated in 1 - 2 weeks on average
- TME France CRM migration completed in 6 weeks
Business Challenge
- Deliver better service for customers who move between countries
- Provide consistent experience across all customer touch points
- Understand and predict the needs of customers to improve ability to exceed expectations
Business Outcome
- CSRs now have single UI to search Tamr-mastered customer data
- Better customer knowledge at point of sale / service
- First ever unified view of customers fueling new analytical and operational use cases
“A lack of consistent view of customer data was restricting our ability to innovate and meet the expectations of our customers… Addressing these issues led us to an enterprise data unification approach, and a vendor, Tamr. We rejected traditional commercial offerings like MDM tools because their top-down approach required a single data model.”
Matt Stevens, Director of Information Systems
Case Study: GSK
Unified R&D data lake from fragmented research domain silos
Technical Challenge
- R&D data too siloed / fragmented to be used effectively for exploratory purposes
- A traditional master data management approach would have taken too much time and effort to implement
Technical Outcome
- Used Tamr’s “probabilistic matching” to combine data into a single data lake with three different domains
- All assay, clinical, and genetic data moved into lake and unified in 3 months
Business Challenge
- New drug development cycles impeded by inflexible, weakly integrated data environment
- Needed transformation in how data and analytics were used across the organization
Business Outcome
- Tamr’s machine learning-based enterprise data unification platform transformed the company’s data management capabilities
- Made it easier to access and use data for exploratory analysis and decision making about new medicines
“GSK R&D’s data environment is something that one often hears about in startups, but is rarely found in large enterprises whose roots go back over 300 years. And it’s great news for all of us humans who will benefit from the scientific advances it is likely to engender.”
Tom Davenport, “Biting The Data Management Bullet At GSK,” Forbes
Case Study: Société Générale
First ever unified view of Soc Gen spend globally that stays up-to-date
Technical Challenge
- Legacy solution (rules-based, on-premise ERP add-on) can’t scale to cover all spend
- Expensive to maintain and support
- Tight security and infrastructure requirements have made it difficult to change solutions
Technical Outcome
- Went from 60 days to add a new source to 5 days
- 90% reduction in manual support effort (IT & Procurement)
- Entire project completed in 2 months
- Running on GCP
Business Challenge
- 300+ user group of employees (primarily Sourcing) has an incomplete and inaccurate view of the company’s spend; executive mandate to solve the problem quickly (~3 months)
Business Outcome
- Société Générale has a first-ever unified view of spend globally that stays up-to-date
- Significant increase in analytic adoption and trust driven by 90%+ accuracy of classification
Google Cloud Platform being used to meet internal deadlines and security requirements, while lowering total costs
“In 30 hours of work, we accurately classified 75% of $12 billion Euros in spend, representing 6 million records”
Jean Baptiste, Head of Sourcing Methods & Information Systems
Demo
Next Steps: The Tamr Proof-of-Value Process
- Fast: ~4 weeks start to finish
- Easy: Limited customer engagement required, but full engagement welcome
- Free: No charge for qualified engagements
- Organized: Well-defined, documented process
- High Impact: Focused on a hard technical problem with major business value
- Clear KPIs: Mutually agreed success metrics
- Action-Oriented: Shared expectation of commercial engagement pending successful outcome
Next Steps: Expert Services to Deliver a Quick Win
World-class data engineering and domain expertise experienced in deploying next generation data management stacks
Team Overview
Services Professionals: 20+
Expertise: Data Pipeline / ETL Enablement, Data Visualization, Solution Customization
Sample Profiles
- Ted Gudmundsen: AB, Physics, Princeton; MS, Physics, Cornell. Previous Experience: Quantum Computing Research at MIT Lincoln Laboratory
- Liam Cleary: BS & PhD, Electrical Engineering, Trinity College, Dublin. Previous Experience: Consultant at Ab Initio
Sample Projects
- Data Migration
- Data Pipelining
- Data Enrichment
The Tamr Advantage
- A Focused Mission: Tamr was built for unifying big data. Comprehending and unifying data is the company’s core mission.
- Best-of-Breed Technology: A disruptive approach using machine learning, conceived by a Turing Award winner and developed with $50M of R&D over 6 years.
- Open & Flexible: Modern cloud-ready architecture (available on-prem), exposed via RESTful APIs that fuse Tamr’s patented IP with open-source big data technologies, makes it easy to integrate with your current and future data investments.
- Large Scale Deployment Success: Extensive experience putting large projects into production for both Global 2000 and government customers.
- World-Class Services Team: Tamr’s data scientists and data engineers are exceptional in their skill and dedication to solving our customers’ hardest data challenges.
Internal Only
Links to Additional Resources
- Solution Pitch Decks in Confluence
- Case Study Slides (complete collection)
- Whitepapers
- Tamr Unify Product Screenshots
Appendix Slides
Not part of the standard intro deck but used as needed
Tamr’s Impact on the Data Debt Problem
41bn+
Total Records Unified
Demonstrated ability to reliably solve customers’ biggest data challenges
16+
Key Entities
Proven expertise across multiple domains
20%+
Average Improvement
In matches identified, classification coverage vs. legacy processes
99.9%+
Machine-driven
Limited human input required to develop accurate, scalable model
This is a problem for machines
OLD WAY: Rules-based
Source data > Identify developers > Get business input > Write rules > Review with business > Modify rules, create exceptions > Iterate > Unified data
Timeline: Months 1 - 4, then Months 5 - 12+
Time: Months to years
Quality: Mediocre
NEW WAY: Machine-driven
Source data > Provide examples > Review recommendations > Iterate > Unified data
Timeline: Weeks 1 - 12
Time: Days to weeks
Quality: High
Tamr’s DNA
Tamr solutions unify enterprise data by combining machine learning and human expertise to power transformational analytic and operational outcomes.
Company Overview
Headquarters: Cambridge, MA
Additional Offices: San Francisco, London
Founded: 2013
Key Founders
- Dr. Michael Stonebraker, Co-Founder & CTO. Previously: Founder, Ingres/Postgres, HP Vertica
- Andy Palmer, Co-Founder & CEO. Previously: Founder & CEO, HP Vertica
Tamr at a Glance
What We Do
- Data unification at scale
- Powered by machine learning, informed by human expertise
- Deployable on-prem or in cloud
How It’s Different
- vs. Traditional ETL
  ○ User = data steward, not developer
  ○ Speed, ease, and cost of unifying myriad sources
- vs. Self-Service Data Prep
  ○ Built for large numbers of sources
  ○ Big data ready architecture
- vs. MDM
  ○ Faster and easier to deploy
  ○ Greater flexibility as underlying sources change
Who We Do It For
CDOs, CAOs, Innovative CIOs, & Frustrated Business Execs
How We Engage
High Impact Proof-of-Value (2 weeks)
- Free proof of value (~2 weeks effort)
- Prove Tamr is the best way to fix a painful problem
- Clear success criteria
Succeed & Grow Approach
- Initial deployment (typically $300k 2-year license)
- Focus on nailing the initial deployment; deliver 10x ROI
- Leverage world-class services team where needed
Tamr: Academic Project to Issued Patent in < 4 Years
2013: CIDR Research Paper
2017: Issued Patent, “Method & System for Large Scale Data Curation”
Applications in the Drug Development Lifecycle
Research, Development, Commercial
- Citations
- Assays
- Metadata / Terminologies
- Clinical Data
- Biomarkers
- Genetics
- Clinical Operations
- IDMP
- PharmacoVigilance
- Sales Analytics
- Sunshine Act
Tamr in the Public Sector
US Intelligence Agency
- Match data of known entities with inbound, open source data
- Consolidate and score output of various intel gathering systems
US Customs and Border Protection
- Consolidate and harmonize traveler information from watchlists, reservations systems and manifest lists
Proposed projects and POCs with DoD and other agencies
- Identify and track sources of non-compliant imported goods
- Supplier and logistic data mastering
- Automating personnel record merging
Tamr in Financial Services
Anti-Money Laundering (AML) & Know-Your-Customer (KYC)
- Simpler data integration process: multiple regulators / domains can be fed off the same data
- Fewer full time employees to manually review data matching
- Enhanced customer experience and sales opportunities
Customer Experience and Journey (CRM)
- Higher quality client records make prospecting more efficient
- Less manual maintenance / increased automation
- Seamlessly connecting internal and external data
- Address fragmentation from free-form text entry and missing links
- Fill in gaps in corporate hierarchy / family tree caused by bad matching
- Facilitate marketing attribution analysis and overall data quality
Risk Reporting and Regulatory
- Accelerate cleanup times
- Less manual maintenance / increased automation
- Simpler data integration process: multiple regulators / domains can be fed off the same data
Tamr enterprise deployment architecture
Connect
RDBMS, Apps, Data Lake, Excel
Ingest: ETL, Scripts, APIs
Schema Mapping, Classification, Record Matching
Tamr Compute Cluster, Tamr Repository
Human / Machine Collaboration, Feedback
Publish: ETL, Scripts, APIs
Data Enrichment, Analytics / Visualization, 360° Views
Tamr System Architecture
Unify
DataOps: Key Tech Principles
Internal Data, External Data
BI / Analytics, Data Wrangling, Custom Apps, Automated Integration, Source Remediation
- Agile/Continuous
- Open/Best of Breed (not one platform/vendor)
- Bi-Directional (Feedback)
- Collaborative - Humans at the Core
- Service Oriented
- Loosely Coupled/RESTful Interfaces (Table In/Out)
- Scale Out/Distributed
- Hybrid - Mix of Cloud & On-Prem
Tamr’s Focus - Best of Breed Data Ops
BI / Data Viz / Analytics, Data Wrangling, Custom Apps
Enterprise Data, External Data
Logical Models - Unify
Automation, Logging, Provenance, Movement, Feedback, Governance
Raw Source Catalog
Unified Data Hub
Tamr Professional Services
World-class data science and domain expertise to harness data for transformational results
- Patented Data Unification Software Platform
- Domain Expertise
- Solution Customization
- Data Pipeline / ETL Enablement
Transformational business results
Case Study: Tamr enabled a global leader in pharma to slash conversion time of clinical study data to industry standards from 6 weeks to 1 day, while reducing their dependency on external consultants and manual labor.
Tamr Professional Services
Best practices applied across four-phase implementation process to ensure successful deployment.
- Understand existing infrastructure, and tailor the deployment by developing custom pre- and post-processing
- Lead the initial training of Tamr’s models, maximizing the value of each subject matter expert
- Define strategy for further optimizing and maintaining results
- Train key stakeholders on capturing continuous value from their data through analytics and the Tamr software
Tamr Professional Services
Breadth of services offered that can be tailored to meet the needs of the specific use case
Service: Data Pipelining / ETL
Details: Develop custom pre- and post-processing to automate movement of data from source to target destination(s)
Timing: 2 weeks - 8 weeks
Service: Model Training
Details: Drive the process of training Tamr’s machine learning model, while engaging experts as necessary
Timing: 3 weeks - 6 weeks
Service: User Training
Details: Provide training to data curators and business users to use Tamr for the use case
Timing: Up to 1 week
Service: Analytics
Details: Deliver insights and build production-ready dashboards within analytics / visualization tool of choice
Timing: 2 weeks - 10 weeks
Tamr’s Service Offerings
Data Migration
One-time unification of multiple input sources into a target source
- ERP Consolidation
- Post-M&A IT Rationalization
- Cloud Migration
- SDTM Conversion (Biopharma)
Data Pipelining
A repeatable flow of unified data from multiple sources
- BI Implementation
- Data Lake / Data Warehouse Integration
- Hybrid Cloud Integration
Data Enrichment
Identify and integrate high impact external data
- Contact Profiles (job title, social media footprint)
- Supplier Competition Graph
- Organizational Parenting
Service-Led Project #1 - Data Migration
Data Migration
One-time unification of multiple input sources into a target source
- ERP Consolidation
- Post-M&A IT Rationalization
- Cloud Migration
- SDTM Conversion (Biopharma)
- Define data preparation tasks that will enable you to make the data flexible enough to be utilized across a range of applications
  ○ This includes profiling input sources and identifying critical data attributes, transformations that should be performed, entities to master, and data that would benefit from classification
- If code has already been written, we review it to ensure it follows best practices
- Perform the data movement and unification tasks for the migration project
  ○ This may include writing ETL processes, mastering the data, and / or classifying the data to a target taxonomy
  ○ Includes unlimited use of Tamr’s software to accelerate the migration
- Define the specific best-of-breed tooling to use, so that work can be reused to onboard a new source or modify elements of the data preparation
Service-Led Project #2 - Data Pipelining
Data Pipelining
A repeatable flow of unified data from multiple sources
- BI Implementation
- Data Lake / Data Warehouse Integration
- Hybrid Cloud Integration
- We provide frameworks to utilize in designing data pipelines so that they are built correctly for scalability the first time
- Frameworks include specific tools and technologies that should be used, core principles for building the ETL processes, and recommendations on organizational structure & skill sets required to sustain the pipelines
- If code has already been written, we review it to ensure it follows best practices
- Develop the pipeline and perform a hand-off at the end of the project, providing training and enablement to ensure you are successful after the project is complete
- Define the specific best-of-breed tooling to use to build the pipelines, so that our work can be reused later to onboard a new data source
- We can quickly implement our software, Unify (licensed separately), as part of the pipeline if it becomes necessary
Service-Led Project #3 - Data Enrichment
Data Enrichment
Identify and integrate high impact external data
- Contact Profiles (job title, social media footprint)
- Supplier Competition Graph
- Organizational Parenting
- Define the analytical outcomes you are driving towards for specific use cases (e.g., customer segmentation, spend analytics, etc.), provide an overview of the external data sources that are relevant to the use case, and make recommendations on which sources to acquire and what the enrichment process should include
  ○ We offer these services for the domains where we have expertise, including sales & marketing, procurement, inventory management, and biopharma
- Build the data pipeline, or specific pieces that have been challenging to build, to integrate the external data with internal data
  ○ This includes writing the API calls to acquire the data, performing the required data preparation to increase matches (inside or outside of Unify, depending on need and preference), and plumbing data back into the target systems on a repeatable basis
- We hand off the pipeline after we deploy it, and provide training to maintain it