slide-1
SLIDE 1

Architecting a data platform to support analytic workflows for scientific data

Sun Maria Lehmann, Equinor Jane McConnell, Teradata

slide-2
SLIDE 2

#sunandjanetalkdata

We work in the Upstream Oil and Gas Industry

Exploration Development Production Refining Trading Supply & Distribution Retail

  • Retail – like any other
  • Transport and Logistics
  • Downstream – like chemical manufacturing
  • Trading – similar to any other commodities trading

Upstream

slide-3
SLIDE 3

IT data Scientific Data OT data

Upstream O&G has complex and varied data

Subsurface Facilities Business/Mgmt Reporting

slide-4
SLIDE 4

Subsurface Data

§ Measurement data from sensors, often many TBs
§ Mainly received in batch as the result of a data acquisition event carried out by a different company (oil field service company)

slide-5
SLIDE 5

…buried in a long history of data exchange formats

DVD Tape Disk

slide-6
SLIDE 6

This is how this data is traditionally stored

§ Library style storage

  • Physical items (rocks, fluids)
  • Tapes and hardcopy
  • Digital files

§ To use data, it is moved into technical/scientific applications

  • File > Import…
  • Manual
  • Decisions made during import
slide-7
SLIDE 7

Digital Transformation? Not yet. Interpreting seismic

slide-8
SLIDE 8

And it’s the same for interpreting well logs

slide-9
SLIDE 9

True digital transformation requires a data platform that:

§ Keeps all the data safe for the future
§ Allows you to look wide – across all of the data, from different oil fields, from different countries
§ Allows you to look deep – into the detail and history of the data
§ Allows you to combine data across traditional boundaries

slide-10
SLIDE 10

We need to reduce the manual steps

§ Every piece of data treated as unique
§ Data is stored as-is, and then manually coerced into applications
§ Data is valid for decades

  • so we are often loading old formats

§ Data sets might not have complete metadata
§ If metadata is missing – can we infer it?

  • Humans can, if they are experts in the field
slide-11
SLIDE 11

Improving data management with autonomous pipelines

Explore Identify Contextualize Standardize Pipeline Automate Scale

Building Autonomy

slide-12
SLIDE 12

Building Autonomous Capabilities

Fully Manual

  • Humans explore, identify and describe new pipelines

Assisted Ingest

  • Humans standardize, test and improve ingest

Partial Automation

  • Humans build pipelines

Conditional Automation

  • Humans define new scalable pipelines and AI can be trained
  • Scalable

High Automation

  • AI can do new pipelines with human supervision
  • High scalability

Fully Autonomous

  • Fully created by AI
  • On demand
  • Fully scalable

Increased Automation Lifting the Baseline

slide-13
SLIDE 13

Reference Information Architecture

Implement a layered data architecture

Consumers: Data Analysts, Data Scientists, General Consumers, Autonomous Application

Applications: Subsurface Interpretation, Production Forecasting, Simulation, Well Planning, New Apps?

Sources: Business Generated, Human Generated, Interaction Generated, Machine Generated

Ingest, Prepare, Consume

Governance & Security

API Layer

Metadata: Operational, Business, Technical

Landing, Matching & Transforming, Preparing

  • Data as received, with metadata
  • Extract from closed formats
  • Assign common keys
  • Add measurements in standard units
  • Create Derived Values
  • Connecting data as requested
  • Optimised Structures

slide-14
SLIDE 14

Autonomous Pipelines with Layered Architecture

LAND

  • Safely store what you received, in the format you received it – and with any metadata that came with it

EXTRACT

  • Get the data out of the weird file format

MATCH

  • Assign correct keys from MDM
  • Calculate and add standardised measurements

TRANSFORM

  • Transform the data to a standard model

PREPARE

  • Create datasets that will serve specific usage needs

Building Autonomy

Add New Products to the Pipeline
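The five layers above can be read as an ordered chain of steps applied to each received item. The following is only a minimal sketch of that idea, not the speakers' implementation: the `Item` class, the comma-separated payload, the `W-0001` key and the tiny master data lookup are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One received data item moving through the layers (hypothetical shape)."""
    payload: object
    metadata: dict
    history: list = field(default_factory=list)


def land(item):
    # LAND: keep the payload exactly as received, next to its metadata.
    return item


def extract(item):
    # EXTRACT: get the values out of the (here: toy) transfer format.
    name, depth, unit = item.payload.decode("ascii").split(",")
    item.payload = {"well_name": name, "depth": float(depth), "unit": unit}
    return item


def match(item):
    # MATCH: assign a common key and add a standardised measurement.
    mdm = {"WELL A-1": "W-0001"}  # hypothetical master data
    rec = item.payload
    rec["well_id"] = mdm[rec["well_name"]]
    rec["depth_m"] = rec["depth"] * 0.3048 if rec["unit"] == "ft" else rec["depth"]
    return item


def transform(item):
    # TRANSFORM: reshape to a standard model, keeping the original measurement.
    rec = item.payload
    item.payload = {"well_id": rec["well_id"], "depth_m": rec["depth_m"],
                    "source": {"depth": rec["depth"], "unit": rec["unit"]}}
    return item


def prepare(item):
    # PREPARE: build a dataset shaped for one specific usage.
    rec = item.payload
    item.payload = [(rec["well_id"], round(rec["depth_m"], 2))]
    return item


STAGES = [("LAND", land), ("EXTRACT", extract), ("MATCH", match),
          ("TRANSFORM", transform), ("PREPARE", prepare)]


def run_pipeline(item, stages=STAGES):
    for name, fn in stages:
        item = fn(item)
        item.history.append(name)  # record which layers the item has passed
    return item


result = run_pipeline(Item(b"WELL A-1,1000,ft", {"received_from": "service company"}))
```

Adding a new product to the pipeline then means adding or swapping stage functions, while the layer order stays fixed.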

slide-15
SLIDE 15

LAND: Store what you receive, together with the metadata about your measurement data

42

  • What was measured?
  • Who was it measured by?
  • Why was it measured?
  • When was it measured?
  • Where was it measured?
  • Is this the raw measure, or a derived value?
  • What’s the accuracy of the measure?
  • What unit?
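One way to make these questions operational is to land every value together with a record that has one field per question, and to report which answers are still missing. This is a sketch under assumptions: the class name `LandedMeasurement` and its field names are invented for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LandedMeasurement:
    """A bare value plus the answers to the questions on the slide."""
    value: float
    what: Optional[str] = None          # What was measured?
    who: Optional[str] = None           # Who was it measured by?
    why: Optional[str] = None           # Why was it measured?
    when: Optional[datetime] = None     # When was it measured?
    where: Optional[str] = None         # Where was it measured?
    is_derived: Optional[bool] = None   # Raw measure, or a derived value?
    accuracy: Optional[float] = None    # What's the accuracy of the measure?
    unit: Optional[str] = None          # What unit?

    def missing_metadata(self) -> list:
        # Names of all questions that still have no answer.
        return [name for name, answer in asdict(self).items() if answer is None]


bare = LandedMeasurement(42.0)
partly_described = LandedMeasurement(42.0, what="depth", unit="m")
```

Here `bare.missing_metadata()` lists all eight questions, which is the situation of a "42" arriving with no context.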

slide-16
SLIDE 16

EXTRACT: Make it readable and re-usable. Don’t worry if data gets bigger
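As a rough illustration of why extracted data gets bigger, the sketch below unpacks a toy binary layout into self-describing JSON lines. The record layout (a big-endian 32-bit float depth followed by a 32-bit integer reading) is purely an assumption for the example and does not correspond to any real exchange format.

```python
import json
import struct

# Hypothetical fixed-width record: big-endian float32 depth, int32 reading.
RECORD = struct.Struct(">fi")


def extract_records(blob: bytes) -> list:
    """Turn packed binary records into one JSON document per record."""
    lines = []
    for depth, reading in RECORD.iter_unpack(blob):
        lines.append(json.dumps({"depth": depth, "reading": reading}))
    return lines


blob = RECORD.pack(100.5, 7) + RECORD.pack(101.0, 9)
lines = extract_records(blob)
text_size = sum(len(line) for line in lines)
```

The 16 bytes of binary input become several times larger as text, but every value now carries its own name and can be read without the original application.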

slide-17
SLIDE 17

MATCH: Common Keys

§ Master Data Management
§ Reference Data Management
§ Ontology and Business Glossary

  • Old workflows stayed within disciplines, and now drilling engineers use different words from exploration geoscience and from production operations
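Assigning common keys can be sketched as a lookup from the names each discipline uses to a single master identifier. The master list, the aliases and the `W-0001` identifier below are hypothetical; a real implementation would read them from the MDM system.

```python
def normalise(name: str) -> str:
    """Ignore case, spaces and punctuation when comparing names."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


# Hypothetical master data: one key, several names used by different disciplines.
MASTER_WELLS = {
    "W-0001": ["Well A-1", "A1 well", "WELL_A_1"],
}

ALIAS_TO_KEY = {
    normalise(alias): key
    for key, aliases in MASTER_WELLS.items()
    for alias in aliases
}


def assign_key(source_name: str):
    """Return the master key, or None so that a human can review the record."""
    return ALIAS_TO_KEY.get(normalise(source_name))


matched = assign_key("well a 1")
unmatched = assign_key("Well B-2")
```

Returning `None` rather than guessing keeps unmatched records visible for review instead of silently attaching them to the wrong well.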
slide-18
SLIDE 18

MATCH: Units of Measure

§ Define your standard Units of Measure (normally best to use SI)
§ Create a service to do UoM conversions

  • We suggest using the Energistics UoM v1 dataset as your source

§ When your source data is in a different unit system, convert it and add the standard UoM values to your data
§ Keep the original measured data and unit – just in case :)
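A conversion service along these lines could look like the sketch below. The small `TO_SI` table is a hand-written stand-in, not the Energistics dataset, and the record field names are assumptions; the point is that the standard value is added next to the original value and unit, which are never overwritten.

```python
# Hypothetical conversion table: source unit -> (SI unit, factor, offset),
# applied as value_si = value * factor + offset.
TO_SI = {
    "m": ("m", 1.0, 0.0),
    "ft": ("m", 0.3048, 0.0),
    "psi": ("Pa", 6894.757293168, 0.0),
    "degC": ("K", 1.0, 273.15),
}


def add_standard_uom(record: dict) -> dict:
    """Return a copy of the record with SI value and unit added."""
    si_unit, factor, offset = TO_SI[record["unit"]]
    return {
        **record,  # original value and unit are kept as measured
        "value_si": record["value"] * factor + offset,
        "unit_si": si_unit,
    }


converted = add_standard_uom({"value": 100.0, "unit": "ft"})
```

An unknown unit raises a `KeyError` here, which is deliberate in this sketch: a missing conversion should stop the pipeline rather than produce an unlabelled number.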

slide-19
SLIDE 19

MATCH: Geospatial Data

§ Geospatial data in lat/lon – in decimal degrees, or degrees, minutes, seconds
§ Geospatial data in a projected coordinate system – like UTM, on a datum such as NAD83 – in metres or in yards
§ To be able to combine data from different regions, create a converted version (normally WGS84 lat/lon in decimal degrees) and store with your data
§ Create a service to do transformations – suggest http://www.epsg-registry.org as your source for transformations
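The simplest part of this step, turning degrees, minutes and seconds into decimal degrees, can be sketched as follows. The function name and the example coordinates are illustrative, and the sketch does not perform any datum or projection transformation; that part needs the transformation parameters from a registry such as the one named above.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds plus N, S, E or W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


latitude = dms_to_decimal(58, 30, 0.0, "N")
longitude = dms_to_decimal(1, 45, 0.0, "W")
```

As with units of measure, the converted value would be stored next to the coordinates as received, not instead of them.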

slide-20
SLIDE 20

TRANSFORM: SOL vs SOR – it’s AND not OR

Data combined and ready for the analytics you want to perform needs precision, not guesses. This will require transformation models.

Some transformations are not known, only guesses. How you transform the data to join it depends on what you are using it for.

slide-21
SLIDE 21

Derived Data in Transform

§ With transactional data, derived data is normally SUM, MIN, MEAN, MAX
§ With scientific data, it can be positions and geometric projections
§ If there is an accepted company standard way to convert from e.g. directional survey to well path, then this can be done in TRANSFORM
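As an example of such a derivation, the sketch below converts directional survey stations to a well path with the minimum curvature method, which is one widely used convention. Whether it is the accepted standard in a given company is an assumption; the function name and station layout are illustrative.

```python
import math


def minimum_curvature(stations):
    """stations: (measured_depth, inclination_deg, azimuth_deg), sorted by depth.

    Returns (north, east, tvd) offsets relative to the first station.
    """
    north = east = tvd = 0.0
    path = [(north, east, tvd)]
    for (md1, inc1, azi1), (md2, inc2, azi2) in zip(stations, stations[1:]):
        i1, i2 = math.radians(inc1), math.radians(inc2)
        a1, a2 = math.radians(azi1), math.radians(azi2)
        # Dogleg angle between the two station directions.
        cos_dl = math.cos(i2 - i1) - math.sin(i1) * math.sin(i2) * (1 - math.cos(a2 - a1))
        dogleg = math.acos(max(-1.0, min(1.0, cos_dl)))
        # Ratio factor; tends to 1 for a straight section.
        ratio = 1.0 if dogleg < 1e-9 else 2 / dogleg * math.tan(dogleg / 2)
        half = (md2 - md1) / 2
        north += half * (math.sin(i1) * math.cos(a1) + math.sin(i2) * math.cos(a2)) * ratio
        east += half * (math.sin(i1) * math.sin(a1) + math.sin(i2) * math.sin(a2)) * ratio
        tvd += half * (math.cos(i1) + math.cos(i2)) * ratio
        path.append((north, east, tvd))
    return path


vertical = minimum_curvature([(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)])
due_east = minimum_curvature([(0.0, 90.0, 90.0), (100.0, 90.0, 90.0)])
```

A vertical section ends 100 units deeper with no horizontal offset, and a horizontal section heading east ends 100 units east at the same depth, which is a quick sanity check on the formula.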

slide-22
SLIDE 22

Things which are guesses – Joining Well and Seismic Data

Seismic data

§ Measured in two-way travel time (ms)
§ Each data point is the size of an office building
§ Data for a large volume of the subsurface

Well data

§ Measured in depth, a distance (metres or feet/inches)
§ Cm scale or smaller
§ Data only valid on the well path

Matching this data requires decisions

  • A time-depth mapping
  • Decisions on how far – and how – to propagate well data through the volume

When there is a choice – you shouldn’t do it in TRANSFORM – it belongs in PREPARE
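To show why a time-depth mapping is a choice, here is one possible mapping: linear interpolation in a table of (two-way time, depth) pairs. The table values are invented, and linear interpolation is only one of several options, which is exactly why a step like this would sit in PREPARE with its assumptions recorded.

```python
from bisect import bisect_right


def depth_at_time(twt_ms: float, table) -> float:
    """Interpolate depth (m) at a two-way time (ms).

    table: at least two (twt_ms, depth_m) pairs, sorted by time.
    """
    times = [t for t, _ in table]
    if not times[0] <= twt_ms <= times[-1]:
        # Extrapolating beyond the table is a separate decision.
        raise ValueError("two-way time is outside the time-depth table")
    i = min(bisect_right(times, twt_ms), len(times) - 1)
    (t0, z0), (t1, z1) = table[i - 1], table[i]
    return z0 + (z1 - z0) * (twt_ms - t0) / (t1 - t0)


# Hypothetical time-depth table for one well.
TABLE = [(0.0, 0.0), (1000.0, 1200.0), (2000.0, 2800.0)]
mid_depth = depth_at_time(500.0, TABLE)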

slide-23
SLIDE 23

PREPARE: Datasets for a specific purpose

OLD WORLD APPLICATIONS

§ Creation of files in “transfer format”
§ Feeding existing app APIs
§ Re-creatable

NEW WORLD ANALYTICS

§ Creating big wide analytical datasets
§ Datasets supporting new applications via APIs

Understand:

  • Usage scenarios
  • Data freshness
  • Performance requirements
  • Accuracy, precision
  • Granularity
slide-24
SLIDE 24

If your boss asks you, tell them that I said “build a Unified Data Warehouse” – Andrew Ng

Source: Nuts and bolts of applying deep learning

PREPARE: The biggest blocker to data science is creating the analytical datasets

slide-25
SLIDE 25

PREPARE: You need re-creatable datasets

Whether you persist your prepared layer or deliver it on the fly: AUTOMATE!!!

You will need to recreate these prepared datasets many, many times based on changing assumptions for transformations and joins.

Anything you cannot automatically recreate – if there was human intervention – then it’s a NEW dataset and needs to go back into LAND, with new metadata.
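One way to keep prepared datasets re-creatable is to make each one a pure function of its input identifiers and its assumptions, and to stamp the result with a fingerprint of that recipe. The sketch below assumes a single hypothetical assumption, a constant velocity used to turn two-way time into depth; all names and values are illustrative.

```python
import hashlib
import json


def recipe_id(input_ids, assumptions: dict) -> str:
    """Short fingerprint of everything that determines the prepared dataset."""
    recipe = json.dumps({"inputs": sorted(input_ids), "assumptions": assumptions},
                        sort_keys=True)
    return hashlib.sha256(recipe.encode("utf-8")).hexdigest()[:12]


def prepare_dataset(twt_values_ms, input_ids, assumptions: dict) -> dict:
    """Rebuild the dataset from scratch; no manual steps, so it can be rerun at will."""
    velocity = assumptions["velocity_m_per_s"]
    # Two-way time in ms -> one-way time in s -> depth in m.
    rows = [{"twt_ms": t, "depth_m": t / 1000 / 2 * velocity} for t in twt_values_ms]
    return {"recipe": recipe_id(input_ids, assumptions), "rows": rows}


first = prepare_dataset([1000.0], ["survey-1"], {"velocity_m_per_s": 2000.0})
again = prepare_dataset([1000.0], ["survey-1"], {"velocity_m_per_s": 2000.0})
changed = prepare_dataset([1000.0], ["survey-1"], {"velocity_m_per_s": 2500.0})
```

Rerunning with the same recipe gives an identical dataset, while a changed assumption gives a new recipe fingerprint, so consumers can tell which assumptions a dataset was built on.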

slide-26
SLIDE 26

In Summary

§ LAND data – as it was received, with all required metadata
§ EXTRACT from ugly, maybe binary, transfer formats to human-readable, self-describing formats (and check metadata again)
§ MATCH

  • Master and reference data, Units of Measure, Geospatial data

§ TRANSFORM everything that is true – and no more
§ PREPARE datasets for specific usage, and for the old way of working as well as the new

If you can’t recreate a dataset without human input – it’s a new dataset, and needs to go back to LAND

slide-27
SLIDE 27

Jane McConnell

Practice Partner O&G, Industrial IoT Group

Jane.mcconnell@teradata.com

+44 (0)7936 703343

My blog on Teradata.com

Follow me on Twitter @jane_mcconnell

My profile

Sun Maria Lehmann

Leading Engineer, Enterprise Data Management

Equinor, Norway, Trondheim

Follow me on Twitter @sunle My profile

slide-28
SLIDE 28

Rate today’s session

Session page on conference website O’Reilly Events App