You call it Data Lake; we call it Data Historian - Naghman Waheed




SLIDE 1

You call it Data Lake; we call it Data Historian

Naghman Waheed – Data Platforms Lead
Brian Arnold – Data Platforms Architect
May-24-2018

SLIDE 2

Naghman Waheed, Data Platforms Lead
25+ year career at Monsanto.
Data Warehousing, Business Intelligence, Data Architecture, Cloud Engineering.
Data solutions spanning key business functions such as Supply Chain, Manufacturing, Order-To-Cash, Finance and Procurement.

Brian Arnold, Data Platforms Architect
10 year career in IT, 6 years in Big Data
Software Development, Functional Programming, Streaming, Big Data, Cloud Engineering
Ecommerce, Recommendation Engines

SLIDE 3

Monsanto - Who are we?

Bringing a broad range of solutions to help nourish our growing world
Headquartered in Saint Louis, Missouri
>20,000 employees in 66 countries
A global company with >50% of employees based outside of the United States
One of the 25 World’s Best Multinational Workplaces by Great Place to Work Institute

Produce with more judicious use of limited natural resources.
Improve the lives of the world’s farmers.
Increase production to meet needs of a growing population.

“We succeed when farmers succeed.”
Hugh Grant, Monsanto CEO

SLIDE 4

Solving real challenges in agriculture industry

Rising Population

Growing enough for a growing world

Global Population: 4.4B (1980), 7.1B (today), 9.6B+ (2050)

Limited Farmland

Farmers will need to produce enough food with fewer resources to support our world population

Acres per Person (1961 vs. 2050): <1/3

Changing Climate

Farmers are impacted by climate change in many ways:

WATER AVAILABILITY ISSUES
INCREASINGLY UNPREDICTABLE WEATHER
INSECT RANGE EXPANSION
WEED PRESSURE CHANGES
CROP DISEASE INCREASES
PLANTING ZONE SHIFTS

Changing Economies and Diets

A growing global middle class is choosing animal protein – meat, eggs, and dairy – as a larger part of their diet

Dietary Percentage of Protein (1965 vs. 2030): 14%

SLIDE 5

Our Solutions for Sustainable Agriculture

Our toolkit includes:

Plant Breeding
Biotechnology
Crop Protection
Precision Agriculture

SLIDE 6

Key Technology Trends In Agriculture

Economies of Data Science at Scale
A typical farm is generating 20GB of unique field data every year
Computing unit costs have gone down by 1,000x in the last 10 years

Mobile Device Proliferation among Growers
94% of US farmers own a mobile phone or a smartphone, compared to less than 10% 10 years ago

Low-cost Observation Technology / IoT
Connected sensors on tractors, combines, and in fields have increased over 1000x in the last 10 years
The cost of the average digital sensor has dropped by more than half over that time

Source: Gartner Technology Trends 2015

SLIDE 7

Why Data Historian?

Strategy

Cloud First
Open Source
API First
Ecosystem fit

Capabilities

Ingestion
Access
Integration
Self Service

Architecture

Scalable
Fault Tolerance

Performance

TCO

License
Infrastructure
Cost review
Support

Build vs. Buy

Customization
Iterative release
Technology commitment

SLIDE 8

Data Strategy

Discover, Ingest, Process, Persist, Integrate, Analyze, Expose

Company 360, Product 360, Customer 360, Event 360, Location 360, Insights, Others

Data FrontDoor, Haystack, Kafka

Enterprise Data Hub, Visualization, Enterprise Data Warehouse, Research Datastore, Other Datastores, Ancestry Datastore

Data Historian

Change Data Capture, Geospatial Platform, Extract Transform Load, Quality Management, Analytics Platform

SLIDE 9

Data Platforms Ecosystem

API Gateway, Data FrontDoor

Custom API Harvester, Authentication, Authorization

Identity Management

Tag & Register APIs

Virtual Directory Service, Transactional Systems, Company 360, Product 360, Customer 360, Event 360, Location 360, Insights, Others

Trusted Partner Portal

Kafka

Enterprise Data Hub, Enterprise Data Warehouse, Research Datastore, Change Data Capture

Archive Log, 30 minute latency

Data Stores: Other Datastores, Ancestry Datastore

Data Historian

Haystack: Topic Metadata

Change Data Capture, Batch Ingestion, Streaming Ingestion, API Ingestion, Quality Management, UI Ingestion, Extract Transform Load, Visualization, Virtualization, Geospatial Platform, Ontology Management

To API Gateway; Metadata linked to search

Analytics Platform

To Data Historian

To IDM

SLIDE 10

Data Historian - Reference Architecture

Data Storage & Processing

Monsanto Internal Users

Adhoc Analysis

Identity Management

API Gateway, API Access

Authentication / Authorization

Metadata Management

Kafka

File Upload

AWS S3 Storage, Metadata Store, Archive Glacier Storage

Data Ingest

Access

Historian UI

Applications

Streaming Data Stores

Governance Rules

Data Stores

Audit Rules

Query Engine

Data Historian Processing Engine Security

SLIDE 11

Data Historian – Technology

S3, Glacier, Lambda

SLIDE 12

AWS Data Historian Architecture

SLIDE 13

Ingest

Batch imports from RDBMS: full, delta, merge
Streaming from Kafka (Datahub)
File ingestion through API and Data Historian UI
Users can append files to existing datasets as well
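
The slide names three batch load modes (full, delta, merge) without defining them. The sketch below shows one common interpretation, assuming rows are plain dicts keyed by a hypothetical `id` column. The function names are illustrative and are not part of Data Historian.

```python
from typing import Dict, List

Row = Dict[str, object]


def load_full(existing: List[Row], incoming: List[Row]) -> List[Row]:
    """Full load: discard the existing rows and keep only the new extract."""
    return list(incoming)


def load_delta(existing: List[Row], incoming: List[Row]) -> List[Row]:
    """Delta load: append the newly extracted rows to what is already stored."""
    return list(existing) + list(incoming)


def load_merge(existing: List[Row], incoming: List[Row], key: str = "id") -> List[Row]:
    """Merge load: upsert by key, so an incoming row replaces a stored row with the same key."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    current = [{"id": 1, "qty": 10}, {"id": 2, "qty": 20}]
    extract = [{"id": 2, "qty": 25}, {"id": 3, "qty": 30}]
    print(load_full(current, extract))
    print(load_delta(current, extract))
    print(load_merge(current, extract))
```

Under this reading, a delta load is cheap but can leave duplicate keys behind, while a merge load keeps one row per key at the cost of a lookup against the existing data.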

SLIDE 14

Ingestion Process

Scheduler
Import Raw Records
Build Hive Staging Tables in HDFS
Validation
Export Data To Master Tables in S3
Export Raw Data To S3
Archive Export
Rebuild Materialized View
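
The ingestion steps on this slide read as a sequential pipeline in which validation gates the exports. The sketch below is a minimal illustration of that idea only. The step functions are in-memory placeholders, the validation rule is an assumption, and the exact ordering of the export steps on the original diagram is not certain.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]
Step = Tuple[str, Callable[[Context], None]]


def import_raw_records(ctx: Context) -> None:
    # Placeholder for the RDBMS or file import; records are hard-coded here.
    ctx["raw"] = [{"id": 1, "value": "a"}, {"id": 2, "value": None}]


def build_staging(ctx: Context) -> None:
    # Stands in for building Hive staging tables in HDFS.
    ctx["staging"] = list(ctx["raw"])


def validate(ctx: Context) -> None:
    # Example rule (an assumption): every staged row needs a non-null id.
    bad = [row for row in ctx["staging"] if row.get("id") is None]
    if bad:
        raise ValueError(f"{len(bad)} rows failed validation")


def export_master(ctx: Context) -> None:
    # Stands in for exporting data to master tables in S3.
    ctx["master"] = list(ctx["staging"])


def export_raw(ctx: Context) -> None:
    # Stands in for exporting the raw data to S3 for archival.
    ctx["archive"] = list(ctx["raw"])


def rebuild_view(ctx: Context) -> None:
    # Stands in for rebuilding the materialized view over the master data.
    ctx["view_row_count"] = len(ctx["master"])


PIPELINE: List[Step] = [
    ("import_raw_records", import_raw_records),
    ("build_staging_tables", build_staging),
    ("validation", validate),
    ("export_master_tables", export_master),
    ("export_raw_data", export_raw),
    ("rebuild_materialized_view", rebuild_view),
]


def run_pipeline(steps: List[Step], ctx: Optional[Context] = None) -> Tuple[Context, List[str]]:
    """Run steps in order; an exception aborts the run and skips later steps."""
    ctx = {} if ctx is None else ctx
    completed: List[str] = []
    for name, step in steps:
        step(ctx)
        completed.append(name)
    return ctx, completed
```

A scheduler would call `run_pipeline(PIPELINE)` once per dataset run. Because validation sits before the exports, a failed check leaves the master tables untouched.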

SLIDE 15

Metadata

Required fields: Name, Description, Source, Publisher, etc.
Optional fields: Tags, custom fields
Forwarded to our metadata platform (Haystack)
Metadata objects pushed through Kafka (Datahub)
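
A metadata object with the required and optional fields listed on this slide might look like the sketch below. The class name, the validation rule, and the JSON encoding are assumptions for illustration. The slide's "etc." is deliberately not expanded, and no Kafka client is called.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DatasetMetadata:
    # Required fields named on the slide.
    name: str
    description: str
    source: str
    publisher: str
    # Optional fields named on the slide.
    tags: List[str] = field(default_factory=list)
    custom_fields: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject empty or whitespace-only values for the required fields.
        for attr in ("name", "description", "source", "publisher"):
            value = getattr(self, attr)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"required metadata field '{attr}' is empty")

    def to_message(self) -> bytes:
        """Serialize to UTF-8 JSON, a plausible payload for a metadata topic."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    meta = DatasetMetadata(
        name="yield_trials",
        description="Plot-level yield observations",
        source="research_db",
        publisher="R&D",
        tags=["yield"],
    )
    print(meta.to_message().decode("utf-8"))
```

Validating at construction time means an incomplete record never reaches the point where it would be published.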

SLIDE 16

Exports

Export to RDBMS: full, delta, merge
Export to Kafka
Export to Redshift
Export to S3
Materialized Export

Export process steps: Scheduler, Calculate Query Predicate, Materialized Export, Archive Export, RDBMS Export, Kafka Export, S3 Export, Purge Target, Purge Source, Validation
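
The "Calculate Query Predicate" step is not explained on the slide. One plausible reading is that it derives the row filter for an export run from the export mode. The sketch below assumes a hypothetical `updated_at` watermark column and `?` bind placeholders; neither is confirmed by the source.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def calculate_query_predicate(
    mode: str,
    last_export: Optional[datetime],
    watermark_column: str = "updated_at",
) -> Tuple[str, List[datetime]]:
    """Return a SQL WHERE fragment and its bind parameters for one export run."""
    if mode not in ("full", "delta", "merge"):
        raise ValueError(f"unknown export mode: {mode}")
    if mode == "full" or last_export is None:
        # Full export, or the first run of an incremental export: select everything.
        return "1 = 1", []
    # Incremental export: only rows changed since the previous run.
    return f"{watermark_column} > ?", [last_export]


def build_export_query(
    table: str, mode: str, last_export: Optional[datetime] = None
) -> Tuple[str, List[datetime]]:
    # Table and column names are interpolated directly, so they must come from
    # trusted configuration; only the timestamp is passed as a bind parameter.
    predicate, params = calculate_query_predicate(mode, last_export)
    return f"SELECT * FROM {table} WHERE {predicate}", params


if __name__ == "__main__":
    print(build_export_query("orders", "full"))
    print(build_export_query("orders", "delta", datetime(2018, 5, 24)))
```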

SLIDE 17

Governance

Archive & Retention
Automated Compliance Checks
Security
Permissions
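
An archive-and-retention rule can be sketched as an age check per dataset. The thresholds, the action names, and the idea of purging after a second threshold are assumptions for illustration; the slide does not describe the actual policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    archive_after_days: int  # hypothetical: move to archive storage after this age
    purge_after_days: int    # hypothetical: delete after this age


def retention_action(last_modified: date, policy: RetentionPolicy, today: date) -> str:
    """Return 'keep', 'archive', or 'purge' based on the age of the data."""
    age_days = (today - last_modified).days
    if age_days >= policy.purge_after_days:
        return "purge"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep"


if __name__ == "__main__":
    policy = RetentionPolicy(archive_after_days=90, purge_after_days=365)
    today = date(2018, 5, 24)
    for modified in (date(2018, 5, 1), date(2018, 1, 1), date(2017, 1, 1)):
        print(modified, retention_action(modified, policy, today))
```

Passing `today` explicitly keeps the check deterministic, which makes an automated compliance run easy to test.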

SLIDE 18

APIs & Integration

Get/List/Put Datasets
Get/Put Dataset Metadata
Get Dataset Status
Query
SDKs: Java, Scala, R, Python
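
The operations listed on this slide can be pictured as a small client interface. The class below is a toy in-memory stand-in, not the real Python SDK. Every method name, the status strings, and the predicate-based `query` are hypothetical.

```python
from typing import Callable, Dict, List


class InMemoryHistorian:
    """Toy stand-in for the API surface; nothing here calls a real service."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[dict]] = {}
        self._metadata: Dict[str, dict] = {}

    def put_dataset(self, name: str, rows: List[dict]) -> None:
        # Store a copy of the rows and make sure a metadata entry exists.
        self._rows[name] = list(rows)
        self._metadata.setdefault(name, {})

    def get_dataset(self, name: str) -> List[dict]:
        return list(self._rows[name])

    def list_datasets(self) -> List[str]:
        return sorted(self._rows)

    def put_dataset_metadata(self, name: str, metadata: dict) -> None:
        if name not in self._rows:
            raise KeyError(name)
        self._metadata[name] = dict(metadata)

    def get_dataset_metadata(self, name: str) -> dict:
        return dict(self._metadata[name])

    def get_dataset_status(self, name: str) -> str:
        # Status values are invented for this sketch.
        return "AVAILABLE" if name in self._rows else "NOT_FOUND"

    def query(self, name: str, predicate: Callable[[dict], bool]) -> List[dict]:
        # A real query endpoint would likely accept SQL; a predicate keeps this runnable.
        return [row for row in self._rows[name] if predicate(row)]


if __name__ == "__main__":
    historian = InMemoryHistorian()
    historian.put_dataset("sensors", [{"id": 1, "temp": 20}, {"id": 2, "temp": 31}])
    historian.put_dataset_metadata("sensors", {"publisher": "IoT"})
    print(historian.list_datasets())
    print(historian.query("sensors", lambda row: row["temp"] > 30))
```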

SLIDE 19

APIs - Query

Data Historian API

Physical

Virtual

Client, Data Historian UI, Data Historian JDBC Driver, Data Historian Security Service

SLIDE 20

Data Historian UI - Query Interface

SLIDE 21

Data Historian UI – Browse Datasets

SLIDE 22

Data Historian UI – Dataset Details

SLIDE 23

Data Historian UI – Permissions Management

SLIDE 24

Data Historian UI - Future

SLIDE 25

Highlights

v1.0 production release 16 months ago
164 active datasets in prod
10TB of data in prod
1,000+ query requests per day
Early Adopters: Internal Security Office, Research & Development
Early Majority: IoT, Data Assets, Supply Chain
Late Majority: Finance, Commercial, HR, Other

SLIDE 26

Lessons Learned

Open Source: Flexibility, Learning Curve, Specialized Skill Set
Cloud – AWS: Agility, Security, Support, Resource Staffing

SLIDE 27

Questions?