Architecting a data platform to support analytic workflows for scientific data
Sun Maria Lehmann, Equinor
Jane McConnell, Teradata
#sunandjanetalkdata
We work in the Upstream Oil and Gas Industry
Exploration → Development → Production (Upstream) | Refining → Trading → Supply & Distribution → Retail (Downstream)
- Retail – like any other retail
- Transport and Logistics – like any other logistics
- Downstream – like chemical manufacturing
- Trading – similar to any other commodities trading
Upstream is the part that is unlike any other industry – and it is where we work.
Upstream O&G has complex and varied data
- IT data, scientific data, and OT data
- Spanning subsurface, facilities, and business/management reporting
Subsurface Data
§ Measurement data from sensors, often many TBs
§ Mainly received in batch, as the result of a data acquisition event carried out by a different company (an oilfield service company)
…buried in a long history of data exchange formats – delivered on DVD, tape, and disk
This is how this data is traditionally stored
§ Library style storage
- Physical items (rocks, fluids)
- Tapes and hardcopy
- Digital files
§ To use data, it is moved into technical/scientific applications
- File > Import…
- Manual
- Decisions made during import
Digital Transformation? Not yet – interpreting seismic
And it’s the same for interpreting well logs
True digital transformation requires a data platform that:
§ Keeps all the data safe for the future
§ Allows you to look wide – across all of the data, from different oil fields, from different countries
§ Allows you to look deep – into the detail and history of the data
§ Allows you to combine data across traditional boundaries
We need to reduce the manual steps
§ Every piece of data treated as unique
§ Data is stored as-is, and then manually coerced into applications
§ Data is valid for decades
- so we are often loading old formats
§ Data sets might not have complete metadata
§ If metadata is missing – can we infer it?
- Humans can, if they are experts in the field
Improving data management with autonomous pipelines
Explore → Identify → Contextualize → Standardize → Pipeline → Automate → Scale
Building Autonomy
Building Autonomous Capabilities
Fully Manual
- Humans explore, identify, and describe new pipelines
Assisted Ingest
- Humans standardize, test, and improve ingest
Partial Automation
- Humans build pipelines
Conditional Automation
- Humans define new scalable pipelines, and AI can be trained
- Scalable
High Automation
- AI can do new pipelines, with human supervision
- High scalability
Fully Autonomous
- Fully created by AI
- On demand
- Fully scalable
Increased Automation – Lifting the Baseline
Reference Information Architecture
- Consumers: Data Analysts, Data Scientists, General Consumers, Autonomous Applications
- Applications: Subsurface Interpretation, Production Forecasting, Simulation, Well Planning, New Apps?
- Sources: Business Generated, Human Generated, Interaction Generated, Machine Generated
- Data flow: Ingest → Prepare → Consume
- Cross-cutting: Governance & Security, API Layer
- Metadata: Operational, Business, Technical
Implement a layered data architecture
- Landing: data as received, with metadata; extract from closed formats
- Matching & Transforming: assign common keys, add measurements in standard units, create derived values
- Preparing: connect data as requested, in optimised structures
Autonomous Pipelines with Layered Architecture
LAND
- Safely store what you received, in the format you received it – and with any metadata that came with it
EXTRACT
- Get the data out of the weird file format
MATCH
- Assign correct keys from MDM
- Calculate and add standardised measurements
TRANSFORM
- Transform the data to a standard model
PREPARE
- Create datasets that will serve specific usage needs
Building Autonomy: add new products to the pipeline
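To make the five layers concrete, here is a minimal Python sketch of the pipeline, assuming a toy CSV transfer format; all function names, the sidecar-metadata convention, and the hard-coded MDM lookup are illustrative assumptions, not Equinor's or Teradata's implementation.

```python
import csv
import io
import json
import shutil
from pathlib import Path

def land(source: Path, archive: Path) -> Path:
    """LAND: keep the file exactly as received, plus a metadata sidecar."""
    dest = archive / source.name
    shutil.copy2(source, dest)
    meta = dest.with_name(dest.name + ".meta.json")
    meta.write_text(json.dumps({"received_as": source.name}))
    return dest

def extract(landed: Path) -> list[dict]:
    """EXTRACT: parse the (toy) transfer format into plain records."""
    return list(csv.DictReader(io.StringIO(landed.read_text())))

def match(records: list[dict], mdm_keys: dict[str, str]) -> list[dict]:
    """MATCH: assign master-data keys, add standard (SI) unit values."""
    for r in records:
        r["well_id"] = mdm_keys[r["well_name"]]        # key from MDM
        r["depth_m"] = float(r["depth_ft"]) * 0.3048   # keep depth_ft too
    return records

def transform(records: list[dict]) -> list[dict]:
    """TRANSFORM: reshape into the standard model (only what is 'true')."""
    return [{"well_id": r["well_id"], "depth_m": r["depth_m"]}
            for r in records]

def prepare(records: list[dict], out: Path) -> Path:
    """PREPARE: write a purpose-built, re-creatable dataset."""
    out.write_text(json.dumps(records))
    return out
```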
LAND: Store what you receive, together with the metadata about your measurement data
- What was measured?
- Why was it measured?
- When was it measured?
- Where was it measured?
- Who was it measured by?
- Is this the raw measure, or a derived value?
- What’s the accuracy of the measure?
- What unit?
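One way to make these questions enforceable is to land a small manifest next to every as-received file. A sketch, with field names and example values that are our assumptions:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class LandingManifest:
    what: str          # what was measured
    why: str           # purpose of the acquisition
    when: str          # ISO 8601 timestamp of the measurement
    where: str         # location / coordinate reference
    measured_by: str   # the acquiring (service) company
    is_raw: bool       # raw measure, or a derived value?
    accuracy: str      # stated accuracy of the measure
    unit: str          # unit of measure as received

manifest = LandingManifest(
    what="gamma ray log", why="formation evaluation",
    when="1998-06-02T00:00:00Z", where="well A-1",
    measured_by="<service company>", is_raw=True,
    accuracy="as stated by vendor", unit="gAPI")

# Store this JSON next to the as-received file in LAND.
print(json.dumps(asdict(manifest), indent=2))
```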
EXTRACT: Make it readable and re-usable. Don’t worry if data gets bigger
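A sketch of the idea: unpack a closed binary layout into self-describing JSON lines. The fixed-width record layout below is invented; real transfer formats such as SEG-Y or DLIS need proper readers (for example segyio or dlisio).

```python
import json
import struct

# Hypothetical layout: big-endian int32 depth in cm, float32 value.
RECORD = struct.Struct(">if")

def extract_records(blob: bytes):
    """Yield one self-describing JSON line per binary record."""
    for depth_cm, value in RECORD.iter_unpack(blob):
        # The JSON is bigger than the binary - that's fine.
        yield json.dumps({"depth_cm": depth_cm, "value": value})

blob = RECORD.pack(123456, 87.5) + RECORD.pack(123466, 88.25)
for line in extract_records(blob):
    print(line)
```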
MATCH: Common Keys
§ Master Data Management
§ Reference Data Management
§ Ontology and Business Glossary
- Old workflows stayed within disciplines, and now drilling engineers, exploration geoscientists, and production operations all use different words
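A minimal sketch of key matching via a synonym table, so each discipline's name for the same well resolves to one master key. All names and the table itself are invented; a real MDM system would back this service.

```python
# Master well registry and per-discipline aliases (all names invented).
MASTER_WELLS = {"W-001": "Alpha Main"}
SYNONYMS = {
    "alpha main": "W-001",        # exploration geoscience name
    "alpha-1 producer": "W-001",  # production operations name
    "alpha_main": "W-001",        # drilling engineering name
}

def master_key(name: str) -> str:
    """Resolve a discipline-specific name to the master data key."""
    key = SYNONYMS.get(name.strip().lower())
    if key is None:
        raise KeyError(f"no master key for {name!r}; route to a data steward")
    return key

print(master_key("Alpha Main"))  # -> W-001
```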
MATCH: Units of Measure
§ Define your standard Units of Measure (normally best to use SI)
§ Create a service to do UoM conversions
- We suggest using the Energistics UoM v1 dataset as your source
§ When your source data is in a different unit system, convert it and add the standard UoM values to your data
§ Keep the original measured data and unit – just in case
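A minimal conversion-service sketch. The Energistics UoM dataset expresses each conversion to a base unit in the form y = (A + B·x) / (C + D·x); the two factor rows below are examples only, and the service shape is our assumption.

```python
# Factors follow the y = (A + B*x) / (C + D*x) form used by the
# Energistics UoM dataset; only two example entries are shown here.
FACTORS = {
    "ft":   (0.0, 0.3048, 1.0, 0.0),   # feet -> metres (exact)
    "degF": (2298.35, 5.0, 9.0, 0.0),  # Fahrenheit -> kelvin
}

def to_base(value: float, unit: str) -> float:
    """Convert a measured value to its SI base unit."""
    a, b, c, d = FACTORS[unit]
    return (a + b * value) / (c + d * value)

# Convert for analytics, but keep the original value and unit as well.
depth = {"value": 9842.5, "unit": "ft"}
depth["value_si"], depth["unit_si"] = to_base(depth["value"], "ft"), "m"
print(depth)
```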
MATCH: Geospatial Data
§ Geospatial data in lat/lon – in decimal degrees, or degrees, minutes, seconds
§ Geospatial data in a projected coordinate system – like UTM or NAD83 – in metres or in yards
§ To be able to combine data from different regions, create a converted version (normally WGS84 lat/lon in decimal degrees) and store it with your data
§ Create a service to do transformations – we suggest http://www.epsg-registry.org as your source for transformations
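A sketch using pyproj, which bundles the EPSG registry data the slide points to. The EPSG codes and coordinates are illustrative.

```python
from pyproj import Transformer

# 32631 = WGS 84 / UTM zone 31N; 4326 = WGS 84 lat/lon.
to_wgs84 = Transformer.from_crs("EPSG:32631", "EPSG:4326", always_xy=True)

easting, northing = 435000.0, 6750000.0  # metres, example position
lon, lat = to_wgs84.transform(easting, northing)

record = {
    "x": easting, "y": northing, "crs": "EPSG:32631",  # as received
    "lon_wgs84": lon, "lat_wgs84": lat,                # converted copy
}
print(record)
```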
TRANSFORM: SOL vs SOR – it’s AND, not OR
- Data combined and ready for the analytics you want to perform needs precision, not guesses. This will require transformation models.
- Some transformations are not known, only guesses. How you transform the data to join it depends on what you are using it for.
Derived Data in Transform
§ With transactional data, derived data is normally SUM, MIN, MEAN, MAX
§ With scientific data, it can be positions and geometric projections
§ If there is an accepted company-standard way to convert from e.g. directional survey to well path, then this can be done in TRANSFORM (a sketch follows)
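For example, the minimum-curvature method is the commonly accepted way to derive a well path from directional survey stations (measured depth, inclination, azimuth). A simplified single-step sketch, not any particular company's implementation:

```python
from math import acos, cos, radians, sin, tan

def min_curvature_step(md1, inc1, azi1, md2, inc2, azi2):
    """North/East/TVD deltas between two survey stations.
    Measured depth along hole; inclination and azimuth in degrees."""
    i1, a1, i2, a2 = map(radians, (inc1, azi1, inc2, azi2))
    # Dogleg angle between the two station directions.
    cos_b = cos(i2 - i1) - sin(i1) * sin(i2) * (1 - cos(a2 - a1))
    beta = acos(max(-1.0, min(1.0, cos_b)))
    ratio = 1.0 if beta < 1e-9 else (2 / beta) * tan(beta / 2)
    half = (md2 - md1) / 2 * ratio
    d_north = half * (sin(i1) * cos(a1) + sin(i2) * cos(a2))
    d_east = half * (sin(i1) * sin(a1) + sin(i2) * sin(a2))
    d_tvd = half * (cos(i1) + cos(i2))
    return d_north, d_east, d_tvd

print(min_curvature_step(1000.0, 10.0, 45.0, 1030.0, 12.0, 47.0))
```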
Things which are guesses – Joining Well and Seismic Data
Seismic data:
- Measured in two-way travel time (ms)
- Each data point is the size of an office building
- Data for a large volume of the subsurface
Well data:
- Measured in depth, a distance (metres or feet/inches)
- Cm scale or smaller
- Data only valid on the well path
Matching this data requires decisions:
- A time-depth mapping
- Decisions on how far – and how – to propagate well data through the volume
When there is a choice, you shouldn’t do it in TRANSFORM – it belongs in PREPARE (see the sketch below)
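Applying one chosen time-depth mapping is exactly such a decision: a different checkshot survey or velocity model yields a different prepared dataset. A minimal sketch, assuming linear interpolation and invented values:

```python
import numpy as np

# One *chosen* time-depth relationship (e.g. from a checkshot survey):
# TVD in metres -> two-way time in ms. Values are illustrative.
td_depth_m = np.array([0.0, 500.0, 1500.0, 3000.0])
td_twt_ms = np.array([0.0, 450.0, 1150.0, 1900.0])

def depth_to_twt(tvd_m):
    """Put depth-indexed well data on the seismic two-way-time axis."""
    return np.interp(tvd_m, td_depth_m, td_twt_ms)

log_depths = np.array([750.0, 1000.0, 1250.0])
print(depth_to_twt(log_depths))  # ms, ready to join against seismic
```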
PREPARE: Datasets for a specific purpose
OLD WORLD APPLICATIONS
§ Creation of files in “transfer format”
§ Feeding existing app APIs
§ Re-creatable
NEW WORLD ANALYTICS
§ Creating big, wide analytical datasets
§ Datasets supporting new applications via APIs
Understand:
- Usage scenarios
- Data freshness
- Performance requirements
- Accuracy, precision
- Granularity
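One lightweight way to record that understanding is a specification that travels with each prepared dataset. The field names and example values below are our assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PreparedDatasetSpec:
    usage_scenario: str  # who uses it, for what
    freshness: str       # e.g. "rebuild nightly" or "on demand"
    performance: str     # layout / latency requirements
    accuracy: str        # acceptable error in derived values
    granularity: str     # e.g. "per well, 0.5 m sampling"

spec = PreparedDatasetSpec(
    usage_scenario="production forecasting features",
    freshness="on demand",
    performance="one wide row per well-day",
    accuracy="company-standard transforms only",
    granularity="well-day")
print(spec)
```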
PREPARE: The biggest blocker to data science is creating the analytical datasets
“If your boss asks you, tell them that I said ‘build a Unified Data Warehouse’” – Andrew Ng
Source: Nuts and Bolts of Applying Deep Learning
PREPARE: You need re-creatable datasets
Whether you persist your prepared layer or deliver it on the fly: AUTOMATE!
You will need to recreate these prepared datasets many, many times, based on changing assumptions for transformations and joins.
Anything you cannot automatically recreate – if there was human intervention – is a NEW dataset and needs to go back into LAND, with new metadata.
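A sketch of that automation idea: drive PREPARE entirely from a recorded set of assumptions, so any version can be rebuilt on demand and a changed assumption visibly produces a new version. The parameter names are invented.

```python
import hashlib
import json

def prepare_dataset(assumptions: dict) -> str:
    """Rebuild a prepared dataset deterministically from its assumptions.
    Returns a version tag derived from the assumptions themselves."""
    # ...run the TRANSFORM -> PREPARE joins using only `assumptions`...
    spec = json.dumps(assumptions, sort_keys=True)
    return hashlib.sha256(spec.encode()).hexdigest()[:12]

v1 = prepare_dataset({"time_depth_table": "checkshot_2018",
                      "propagation_radius_m": 250})
v2 = prepare_dataset({"time_depth_table": "checkshot_2021",
                      "propagation_radius_m": 250})
assert v1 != v2  # changed assumption -> a new, re-creatable version
```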
In Summary
§ LAND data – as it was received, with all required metadata
§ EXTRACT from ugly, maybe binary, transfer formats to human-readable, self-describing formats (and check metadata again)
§ MATCH – master and reference data, Units of Measure, geospatial data
§ TRANSFORM everything that is true – and no more
§ PREPARE datasets for specific usage, for the old way of working as well as the new
If you can’t recreate a dataset without human input – it’s a new dataset, and needs to go back to LAND
Jane McConnell
Practice Partner O&G, Industrial IoT Group, Teradata
Jane.mcconnell@teradata.com | +44 (0)7936 703343
Blog on Teradata.com | Twitter: @jane_mcconnell

Sun Maria Lehmann
Leading Engineer, Enterprise Data Management
Equinor, Trondheim, Norway
Twitter: @sunle