Architecting a data platform to support analytic workflows for scientific data
Sun Maria Lehmann, Equinor
Jane McConnell, Teradata
#sunandjanetalkdata
We work in the Upstream Oil and Gas Industry
Exploration → Development → Production (Upstream) | Refining → Trading → Supply & Distribution → Retail (Downstream)
- Retail – like any other retail
- Transport and Logistics – like any other logistics
- Downstream – like chemical manufacturing
- Trading – similar to any other commodities trading
Upstream is the part that is unlike any other industry – and it is where we work.
Upstream O&G has complex and varied data
- IT data, scientific data, and OT data
- Spanning subsurface, facilities, and business/management reporting
Subsurface Data
§ Measurement data from sensors, often many TBs
§ Mainly received in batch, as the result of a data acquisition event carried out by a different company (an oilfield service company)
…buried in a long history of data exchange formats – delivered on DVD, tape, and disk
This is how this data is traditionally stored
§ Library style storage
- Physical items (rocks, fluids)
- Tapes and hardcopy
- Digital files
§ To use data, it is moved into technical/scientific applications
- File > Import…
- Manual
- Decisions made during import
Digital Transformation? Not yet – interpreting seismic
And it’s the same for interpreting well logs
True digital transformation requires a data platform that:
§ Keeps all the data safe for the future
§ Allows you to look wide – across all of the data, from different oil fields, from different countries
§ Allows you to look deep – into the detail and history of the data
§ Allows you to combine data across traditional boundaries
We need to reduce the manual steps
§ Every piece of data treated as unique
§ Data is stored as-is, and then manually coerced into applications
§ Data is valid for decades
- so we are often loading old formats
§ Data sets might not have complete metadata
§ If metadata is missing – can we infer it?
- Humans can, if they are experts in the field
Improving data management with autonomous pipelines
Explore → Identify → Contextualize → Standardize → Pipeline → Automate → Scale
Building Autonomy
Building Autonomous Capabilities
Fully Manual
- Humans explore, identify, and describe new pipelines
Assisted Ingest
- Humans standardize, test, and improve ingest
Partial Automation
- Humans build pipelines
Conditional Automation
- Humans define new scalable pipelines, and AI can be trained
- Scalable
High Automation
- AI can do new pipelines, with human supervision
- High scalability
Fully Autonomous
- Fully created by AI
- On demand
- Fully scalable
Increased Automation – Lifting the Baseline
Reference Information Architecture
- Consumers: Data Analysts, Data Scientists, General Consumers, Autonomous Applications
- Applications: Subsurface Interpretation, Production Forecasting, Simulation, Well Planning, New Apps?
- Sources: Business Generated, Human Generated, Interaction Generated, Machine Generated
- Data flow: Ingest → Prepare → Consume
- Cross-cutting: Governance & Security, API Layer
- Metadata: Operational, Business, Technical
Implement a layered data architecture
- Landing: data as received, with metadata; extract from closed formats
- Matching & Transforming: assign common keys, add measurements in standard units, create derived values
- Preparing: connect data as requested, in optimised structures
Autonomous Pipelines with Layered Architecture
LAND
- Safely store what you received, in the format you received it – and with any metadata that came with it
EXTRACT
- Get the data out of the weird file format
MATCH
- Assign correct keys from MDM
- Calculate and add standardised measurements
TRANSFORM
- Transform the data to a standard model
PREPARE
- Create datasets that will serve specific usage needs
Building Autonomy: add new products to the pipeline
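To make the five layers concrete, here is a minimal Python sketch of the pipeline, assuming a toy CSV transfer format; all function names, the sidecar-metadata convention, and the hard-coded MDM lookup are illustrative assumptions, not Equinor's or Teradata's implementation.

```python
import csv
import io
import json
import shutil
from pathlib import Path

def land(source: Path, archive: Path) -> Path:
    """LAND: keep the file exactly as received, plus a metadata sidecar."""
    dest = archive / source.name
    shutil.copy2(source, dest)
    meta = dest.with_name(dest.name + ".meta.json")
    meta.write_text(json.dumps({"received_as": source.name}))
    return dest

def extract(landed: Path) -> list[dict]:
    """EXTRACT: parse the (toy) transfer format into plain records."""
    return list(csv.DictReader(io.StringIO(landed.read_text())))

def match(records: list[dict], mdm_keys: dict[str, str]) -> list[dict]:
    """MATCH: assign master-data keys, add standard (SI) unit values."""
    for r in records:
        r["well_id"] = mdm_keys[r["well_name"]]        # key from MDM
        r["depth_m"] = float(r["depth_ft"]) * 0.3048   # keep depth_ft too
    return records

def transform(records: list[dict]) -> list[dict]:
    """TRANSFORM: reshape into the standard model (only what is 'true')."""
    return [{"well_id": r["well_id"], "depth_m": r["depth_m"]}
            for r in records]

def prepare(records: list[dict], out: Path) -> Path:
    """PREPARE: write a purpose-built, re-creatable dataset."""
    out.write_text(json.dumps(records))
    return out
```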
LAND: Store what you receive, together with the metadata about your measurement data
- What was measured?
- Why was it measured?
- When was it measured?
- Where was it measured?
- Who was it measured by?
- Is this the raw measure, or a derived value?
- What’s the accuracy of the measure?
- What unit?
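One way to make these questions enforceable is to land a small manifest next to every as-received file. A sketch, with field names and example values that are our assumptions:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class LandingManifest:
    what: str          # what was measured
    why: str           # purpose of the acquisition
    when: str          # ISO 8601 timestamp of the measurement
    where: str         # location / coordinate reference
    measured_by: str   # the acquiring (service) company
    is_raw: bool       # raw measure, or a derived value?
    accuracy: str      # stated accuracy of the measure
    unit: str          # unit of measure as received

manifest = LandingManifest(
    what="gamma ray log", why="formation evaluation",
    when="1998-06-02T00:00:00Z", where="well A-1",
    measured_by="<service company>", is_raw=True,
    accuracy="as stated by vendor", unit="gAPI")

# Store this JSON next to the as-received file in LAND.
print(json.dumps(asdict(manifest), indent=2))
```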
EXTRACT: Make it readable and re-usable. Don’t worry if data gets bigger
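A sketch of the idea: unpack a closed binary layout into self-describing JSON lines. The fixed-width record layout below is invented; real transfer formats such as SEG-Y or DLIS need proper readers (for example segyio or dlisio).

```python
import json
import struct

# Hypothetical layout: big-endian int32 depth in cm, float32 value.
RECORD = struct.Struct(">if")

def extract_records(blob: bytes):
    """Yield one self-describing JSON line per binary record."""
    for depth_cm, value in RECORD.iter_unpack(blob):
        # The JSON is bigger than the binary - that's fine.
        yield json.dumps({"depth_cm": depth_cm, "value": value})

blob = RECORD.pack(123456, 87.5) + RECORD.pack(123466, 88.25)
for line in extract_records(blob):
    print(line)
```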
MATCH: Common Keys
§ Master Data Management
§ Reference Data Management
§ Ontology and Business Glossary
- Old workflows stayed within disciplines, and now drilling engineers, exploration geoscientists, and production operations all use different words
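A minimal sketch of key matching via a synonym table, so each discipline's name for the same well resolves to one master key. All names and the table itself are invented; a real MDM system would back this service.

```python
# Master well registry and per-discipline aliases (all names invented).
MASTER_WELLS = {"W-001": "Alpha Main"}
SYNONYMS = {
    "alpha main": "W-001",        # exploration geoscience name
    "alpha-1 producer": "W-001",  # production operations name
    "alpha_main": "W-001",        # drilling engineering name
}

def master_key(name: str) -> str:
    """Resolve a discipline-specific name to the master data key."""
    key = SYNONYMS.get(name.strip().lower())
    if key is None:
        raise KeyError(f"no master key for {name!r}; route to a data steward")
    return key

print(master_key("Alpha Main"))  # -> W-001
```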
MATCH: Units of Measure
§ Define your standard Units of Measure (normally best to use SI)
§ Create a service to do UoM conversions
- We suggest using the Energistics UoM v1 dataset as your source
§ When your source data is in a different unit system, convert it and add the standard UoM values to your data
§ Keep the original measured data and unit – just in case
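A minimal conversion-service sketch. The Energistics UoM dataset expresses each conversion to a base unit in the form y = (A + B·x) / (C + D·x); the two factor rows below are examples only, and the service shape is our assumption.

```python
# Factors follow the y = (A + B*x) / (C + D*x) form used by the
# Energistics UoM dataset; only two example entries are shown here.
FACTORS = {
    "ft":   (0.0, 0.3048, 1.0, 0.0),   # feet -> metres (exact)
    "degF": (2298.35, 5.0, 9.0, 0.0),  # Fahrenheit -> kelvin
}

def to_base(value: float, unit: str) -> float:
    """Convert a measured value to its SI base unit."""
    a, b, c, d = FACTORS[unit]
    return (a + b * value) / (c + d * value)

# Convert for analytics, but keep the original value and unit as well.
depth = {"value": 9842.5, "unit": "ft"}
depth["value_si"], depth["unit_si"] = to_base(depth["value"], "ft"), "m"
print(depth)
```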
MATCH: Geospatial Data
§ Geospatial data in lat/lon – in decimal degrees, or degrees, minutes, seconds
§ Geospatial data in a projected coordinate system – like UTM or NAD83 – in metres or in yards
§ To be able to combine data from different regions, create a converted version (normally WGS84 lat/lon in decimal degrees) and store it with your data
§ Create a service to do transformations – we suggest http://www.epsg-registry.org as your source for transformations
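A sketch using pyproj, which bundles the EPSG registry data the slide points to. The EPSG codes and coordinates are illustrative.

```python
from pyproj import Transformer

# 32631 = WGS 84 / UTM zone 31N; 4326 = WGS 84 lat/lon.
to_wgs84 = Transformer.from_crs("EPSG:32631", "EPSG:4326", always_xy=True)

easting, northing = 435000.0, 6750000.0  # metres, example position
lon, lat = to_wgs84.transform(easting, northing)

record = {
    "x": easting, "y": northing, "crs": "EPSG:32631",  # as received
    "lon_wgs84": lon, "lat_wgs84": lat,                # converted copy
}
print(record)
```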
TRANSFORM: SOL vs SOR – it’s AND, not OR
- Data combined and ready for the analytics you want to perform needs precision, not guesses. This will require transformation models.
- Some transformations are not known, only guesses. How you transform the data to join it depends on what you are using it for.
Derived Data in Transform
§ With transactional data, derived data is normally SUM, MIN, MEAN, MAX
§ With scientific data, it can be positions and geometric projections
§ If there is an accepted company-standard way to convert from e.g. directional survey to well path, then this can be done in TRANSFORM (a sketch follows)
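For example, the minimum-curvature method is the commonly accepted way to derive a well path from directional survey stations (measured depth, inclination, azimuth). A simplified single-step sketch, not any particular company's implementation:

```python
from math import acos, cos, radians, sin, tan

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """North/East/TVD deltas between two survey stations.
    Measured depth along hole; inclination and azimuth in degrees."""
    i1, a1, i2, a2 = map(radians, (inc1, azi1, inc2, azi2))
    # Dogleg angle between the two station directions.
    cos_b = cos(i2 - i1) - sin(i1) * sin(i2) * (1 - cos(a2 - a1))
    beta = acos(max(-1.0, min(1.0, cos_b)))
    ratio = 1.0 if beta < 1e-9 else (2 / beta) * tan(beta / 2)
    half = (md2 - md1) / 2 * ratio
    d_north = half * (sin(i1) * cos(a1) + sin(i2) * cos(a2))
    d_east = half * (sin(i1) * sin(a1) + sin(i2) * sin(a2))
    d_tvd = half * (cos(i1) + cos(i2))
    return d_north, d_east, d_tvd

print(min_curvature_step(1000.0, 10.0, 45.0, 1030.0, 12.0, 47.0))
```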
Things which are guesses – Joining Well and Seismic Data
Seismic data:
- Measured in two-way travel time (ms)
- Each data point is the size of an office building
- Data for a large volume of the subsurface
Well data:
- Measured in depth, a distance (metres or feet/inches)
- Cm scale or smaller
- Data only valid on the well path
Matching this data requires decisions:
- A time-depth mapping
- Decisions on how far – and how – to propagate well data through the volume
When there is a choice, you shouldn’t do it in TRANSFORM – it belongs in PREPARE (see the sketch below)
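Applying one chosen time-depth mapping is exactly such a decision: a different checkshot survey or velocity model yields a different prepared dataset. A minimal sketch, assuming linear interpolation and invented values:

```python
import numpy as np

# One *chosen* time-depth relationship (e.g. from a checkshot survey):
# TVD in metres -> two-way time in ms. Values are illustrative.
td_depth_m = np.array([0.0, 500.0, 1500.0, 3000.0])
td_twt_ms = np.array([0.0, 450.0, 1150.0, 1900.0])

def depth_to_twt(tvd_m):
    """Put depth-indexed well data on the seismic two-way-time axis."""
    return np.interp(tvd_m, td_depth_m, td_twt_ms)

log_depths = np.array([750.0, 1000.0, 1250.0])
print(depth_to_twt(log_depths))  # ms, ready to join against seismic
```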
PREPARE: Datasets for a specific purpose
OLD WORLD APPLICATIONS
§ Creation of files in “transfer format”
§ Feeding existing app APIs
§ Re-creatable
NEW WORLD ANALYTICS
§ Creating big, wide analytical datasets
§ Datasets supporting new applications via APIs
Understand:
- Usage scenarios
- Data freshness
- Performance requirements
- Accuracy, precision
- Granularity
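One lightweight way to record that understanding is a specification that travels with each prepared dataset. The field names and example values below are our assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PreparedDatasetSpec:
    usage_scenario: str  # who uses it, for what
    freshness: str       # e.g. "rebuild nightly" or "on demand"
    performance: str     # layout / latency requirements
    accuracy: str        # acceptable error in derived values
    granularity: str     # e.g. "per well, 0.5 m sampling"

spec = PreparedDatasetSpec(
    usage_scenario="production forecasting features",
    freshness="on demand",
    performance="one wide row per well-day",
    accuracy="company-standard transforms only",
    granularity="well-day")
print(spec)
```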
PREPARE: The biggest blocker to data science is creating the analytical datasets
“If your boss asks you, tell them that I said ‘build a Unified Data Warehouse’” – Andrew Ng
Source: Nuts and Bolts of Applying Deep Learning
PREPARE: You need re-creatable datasets
Whether you persist your prepared layer or deliver it on the fly: AUTOMATE!
You will need to recreate these prepared datasets many, many times, based on changing assumptions for transformations and joins.
Anything you cannot automatically recreate – if there was human intervention – is a NEW dataset and needs to go back into LAND, with new metadata.
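A sketch of that automation idea: drive PREPARE entirely from a recorded set of assumptions, so any version can be rebuilt on demand and a changed assumption visibly produces a new version. The parameter names are invented.

```python
import hashlib
import json

def prepare_dataset(assumptions: dict) -> str:
    """Rebuild a prepared dataset deterministically from its assumptions.
    Returns a version tag derived from the assumptions themselves."""
    # ...run the TRANSFORM -> PREPARE joins using only `assumptions`...
    spec = json.dumps(assumptions, sort_keys=True)
    return hashlib.sha256(spec.encode()).hexdigest()[:12]

v1 = prepare_dataset({"time_depth_table": "checkshot_2018",
                      "propagation_radius_m": 250})
v2 = prepare_dataset({"time_depth_table": "checkshot_2021",
                      "propagation_radius_m": 250})
assert v1 != v2  # changed assumption -> a new, re-creatable version
```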
In Summary
§ LAND data – as it was received, with all required metadata
§ EXTRACT from ugly, maybe binary, transfer formats to human-readable, self-describing formats (and check metadata again)
§ MATCH – master and reference data, Units of Measure, geospatial data
§ TRANSFORM everything that is true – and no more
§ PREPARE datasets for specific usage, for the old way of working as well as the new
If you can’t recreate a dataset without human input – it’s a new dataset, and needs to go back to LAND
Jane McConnell
Practice Partner O&G, Industrial IoT Group, Teradata
Jane.mcconnell@teradata.com | +44 (0)7936 703343
Blog on Teradata.com | Twitter: @jane_mcconnell

Sun Maria Lehmann
Leading Engineer, Enterprise Data Management
Equinor, Trondheim, Norway
Twitter: @sunle