You call it Data Lake; we call it Data Historian
Naghman Waheed – Data Platforms Lead Brian Arnold – Data Platforms Architect May-24-2018
world
United States
Place to Work Institute
Produce with more judicious use of resources.
Improve the lives of the world’s farmers.
Increase production to meet the needs of a growing population.
“We succeed when farmers succeed.”
CEO
Rising Population
Growing enough for a growing world
Global Population
1980: 4.4B
Today: 7.1B
2050: 9.6B+
Limited Farmland
Farmers will need to produce enough food with fewer resources to support our world population
Acres per Person
1961: 1
2050: <1/3
Changing Climate
Farmers are impacted by climate change in many ways:
WATER AVAILABILITY ISSUES
INCREASINGLY UNPREDICTABLE WEATHER
INSECT RANGE EXPANSION
WEED PRESSURE CHANGES
CROP DISEASE INCREASES
PLANTING ZONE SHIFTS
Changing Economies and Diets
A growing global middle class is choosing animal protein – meat, eggs, and dairy – as a larger part of their diet
Dietary Percentage of Protein
1965: 9%
2030: 14%
Plant Breeding
Biotechnology
Crop Protection
Precision Agriculture
Economies of Data Science at Scale
Mobile Device Proliferation among Growers
A typical farm is generating 20GB of unique field data every year.
Computing unit costs have gone down by 1,000x in the last 10 years.
94% of US farmers own a mobile phone or a smartphone, compared to less than 10% 10 years ago.
Low-cost Observation Technology /IoT
The number of connected sensors on tractors, combines, and in fields has increased.
The cost of the average digital sensor has dropped by more than half over that time.
Source: Gartner Technology Trends 2015
Tolerance
release
commitment
Discover
Ingest, Process, Persist, Integrate, Analyze, Expose
Company 360, Product 360, Customer 360, Event 360, Location 360, Insights
Data FrontDoor, Haystack
Kafka
Enterprise Data Hub, Visualization, Enterprise Data Warehouse, Research Datastore, Other Datastores, Ancestry Datastore
Change Data Capture, Geospatial Platform, Extract Transform Load, Quality Management, Analytics Platform
API Gateway, Data FrontDoor
Custom API Harvester, Authentication, Authorization
Identity Management
Tag & Register APIs
Virtual Directory Service, Transactional Systems, Company 360, Product 360, Customer 360, Event 360, Location 360, Insights
Trusted Partner Portal
Kafka
Enterprise Data Hub, Enterprise Data Warehouse, Research Datastore, Change Data Capture
Archive Log 30 minute latency
Data Stores
Other Datastores, Ancestry Datastore
Data Historian
Haystack
Topic Metadata
Change Data Capture, Batch Ingestion, Streaming Ingestion, API Ingestion, Quality Management, UI Ingestion, Extract Transform Load, Visualization, Virtualization, Geospatial Platform, Ontology Management
To API Gateway
Metadata linked to search
Analytics Platform
To Data Historian
To IDM
Data Storage & Processing
Monsanto Internal Users
Adhoc Analysis
Identity Management
API Gateway, API Access
Authentication / Authorization
Metadata Management
Kafka
File Upload
AWS S3 Storage, Metadata Store, Archive Glacier Storage
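One common way to pair S3 storage with a Glacier archive tier is an S3 lifecycle rule that transitions older objects automatically. The slides do not say whether the Data Historian uses lifecycle rules or an explicit archive job, so the following is only a minimal sketch under that assumption. The function name `glacier_lifecycle_rule`, the `historian/raw/` prefix, and the 90-day threshold are hypothetical.

```python
from typing import Any, Dict


def glacier_lifecycle_rule(prefix: str, days_until_archive: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that moves objects under `prefix` to Glacier.

    Hypothetical helper: the prefix and the age threshold are illustrative,
    not values taken from the presentation.
    """
    if days_until_archive < 0:
        raise ValueError("days_until_archive must be non-negative")
    # Derive a readable rule ID from the prefix, e.g. "historian/raw/" -> "archive-historian-raw".
    rule_id = "archive-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Assumed example values: archive raw data 90 days after creation.
config = glacier_lifecycle_rule("historian/raw/", 90)
```

The returned dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument, so it could be applied to a bucket without further conversion. Building the configuration separately from the AWS call keeps the rule easy to review and test.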
Data Ingest
Historian UI
Access
Historian UI
Applications
Streaming Data Stores
Governance Rules
Data Stores
Audit Rules
Query Engine
Data Historian Processing Engine Security
S3, Glacier, Lambda
Scheduler, Import Raw Records, Build Hive Staging Tables in HDFS, Validation, Export Data To Master Tables in S3, Export Raw Data To S3 Archive, Export, Rebuild Materialized View
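The ingest steps above can be read as an ordered workflow in which each step runs only if the previous one succeeded. The sketch below illustrates that reading. It assumes the label boundaries are Import Raw Records, Build Hive Staging Tables in HDFS, Validation, Export Data To Master Tables in S3, Export Raw Data To S3 Archive, and Rebuild Materialized View, and it leaves out the Scheduler and the standalone "Export" label, whose roles the slide text does not make clear. Every function body is a hypothetical in-memory stand-in, not the Data Historian's actual logic.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, list]
Step = Tuple[str, Callable[[Context], None]]


def import_raw_records(ctx: Context) -> None:
    # Stand-in for pulling records from a source such as Kafka or a file upload.
    ctx["raw"] = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


def build_staging_tables(ctx: Context) -> None:
    # Stand-in for building Hive staging tables in HDFS.
    ctx["staging"] = list(ctx["raw"])


def validate(ctx: Context) -> None:
    # Assumed rule: staging must hold every raw record before anything is exported.
    if len(ctx["staging"]) != len(ctx["raw"]):
        raise ValueError("staging row count does not match raw row count")


def export_to_master(ctx: Context) -> None:
    # Stand-in for writing validated data to master tables in S3.
    ctx["master"] = list(ctx["staging"])


def export_raw_to_archive(ctx: Context) -> None:
    # Stand-in for copying the untouched raw data to the S3 archive.
    ctx["archive"] = list(ctx["raw"])


def rebuild_materialized_view(ctx: Context) -> None:
    # Stand-in for refreshing a view derived from the master tables.
    ctx["view"] = [row["id"] for row in ctx["master"]]


INGEST_STEPS: List[Step] = [
    ("Import Raw Records", import_raw_records),
    ("Build Hive Staging Tables in HDFS", build_staging_tables),
    ("Validation", validate),
    ("Export Data To Master Tables in S3", export_to_master),
    ("Export Raw Data To S3 Archive", export_raw_to_archive),
    ("Rebuild Materialized View", rebuild_materialized_view),
]


def run_workflow(steps: List[Step], ctx: Context) -> List[str]:
    """Run steps in order; an exception stops the run before later steps execute."""
    completed = []
    for name, step in steps:
        step(ctx)
        completed.append(name)
    return completed


context: Context = {}
completed_steps = run_workflow(INGEST_STEPS, context)
```

Placing validation before both exports means a failed check leaves the master tables and the archive untouched, which is one plausible reason for the ordering shown on the slide.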
Scheduler, Calculate Query Predicate, Materialized Export, Archive Export, RDBMS Export, Kafka Export, S3 Export, Purge Target, Purge Source, Validation
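The slide does not define "Calculate Query Predicate". One plausible reading is that each scheduled export first computes a filter selecting the rows to export, for example a time window ending at the scheduled run time. The sketch below illustrates only that assumed reading. The function name, the `event_ts` column, and the one-day window are hypothetical.

```python
from datetime import datetime, timedelta


def calculate_query_predicate(column: str, end: datetime, window: timedelta) -> str:
    """Return a half-open time-window predicate: start <= column < end.

    Hypothetical helper. A half-open window avoids exporting the same row
    twice when consecutive runs use adjacent windows.
    """
    start = end - window
    fmt = "%Y-%m-%d %H:%M:%S"
    # String formatting is used here for readability only; production code
    # should pass the timestamps as bound query parameters instead.
    return (
        f"{column} >= '{start.strftime(fmt)}' "
        f"AND {column} < '{end.strftime(fmt)}'"
    )


# Assumed example: a daily export scheduled for noon on the presentation date.
predicate = calculate_query_predicate(
    "event_ts", datetime(2018, 5, 24, 12, 0, 0), timedelta(days=1)
)
```

Under this reading, the same predicate could drive both the export query and the later purge steps, so that only rows confirmed as exported are removed from the source.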
Physical
Data Historian API
Virtual
Client, Data Historian UI, Data Historian JDBC Driver, Data Historian Security Service