Apache Ignite - Using a Memory Grid for Heterogeneous Computation Frameworks
A Use Case Guided Explanation
Chris Herrera, Hashmap
Topics
Who - Key Hashmap Team Members
The Use Case - Our Need for a Memory Grid
Requirements
WHO
and Pune
REACH
PARTNERS
Chris Herrera, Chief Architect/Innovation Officer, Hashmap, Houston, TX
Akshay Mhetre, Team Lead, Hashmap, Pune, India
Jay Kapadnis, Lead Architect, Hashmap, Pune, India
Oilfield Drilling Data Processing
Plan
WITSML Server
Plan Store Optimize
Execute
[Diagram: data sources (Vendors, Financial, Homegrown, TDM, EDM, WellView) and the Data Analyst.]
[Diagram: Mud Logger, Cement, Wireline, and MWD data arrives as CSV and DLIS files and from WITSML servers, passes through a box labeled "Magic", and reaches the Data Analyst.]
Vendors Financial Homegrown TDM EDM WellView Homegrown
data cleansing operations
that has to be done with a combination of experts
man-hours Data Analyst
What do we have to do?
Parse: Parse the data from CSV, WITSML, DLIS, etc.
Identify & Enrich: Understand where the data came from and what its global key should be
Load: Load the data into a staging area to start understanding what to do with it
Clean: Deduplicate, interpolate, pivot, split, aggregate
Feature Engineering: Generate additional features that are required to get useful insights into the data
Persist & Report: Land the data into a store that allows for BI reports and interactive queries
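As a rough illustration of the Clean step, the sketch below deduplicates readings by timestamp and fills gaps by linear interpolation. It is a plain Scala sketch, not the team's code: `Sample`, `dedupe`, `interpolate`, and `clean` are hypothetical names, and the data shape (a timestamp plus an optional value) is an assumption.

```scala
object CleanSketch {
  // One reading: a timestamp and an optional value (None marks a gap).
  // This shape is an assumption made for illustration.
  final case class Sample(t: Long, value: Option[Double])

  // Deduplicate: keep one reading per timestamp, preferring a reading
  // that carries a value, and return the result sorted by time.
  def dedupe(samples: Seq[Sample]): Seq[Sample] =
    samples
      .groupBy(_.t)
      .values
      .map(g => g.find(_.value.isDefined).getOrElse(g.head))
      .toSeq
      .sortBy(_.t)

  // Interpolate: fill each gap linearly from the nearest known readings
  // on both sides. Expects input sorted by time. Leading and trailing
  // gaps have no neighbor on one side and stay as None.
  def interpolate(samples: Seq[Sample]): Seq[Sample] = {
    val known = samples.collect { case Sample(t, Some(v)) => (t, v) }
    samples.map {
      case s @ Sample(_, Some(_)) => s
      case Sample(t, None) =>
        val before = known.filter(_._1 < t).lastOption
        val after  = known.find(_._1 > t)
        val filled = before.flatMap { case (t0, v0) =>
          after.map { case (t1, v1) =>
            v0 + (v1 - v0) * (t - t0).toDouble / (t1 - t0).toDouble
          }
        }
        Sample(t, filled)
    }
  }

  // Clean = deduplicate, then interpolate.
  def clean(samples: Seq[Sample]): Seq[Sample] = interpolate(dedupe(samples))
}
```

Calling `CleanSketch.clean` on a batch with a repeated timestamp and a missing reading returns one sample per timestamp, with the gap filled from its neighbors. Pivot, split, and aggregate would be further functions of the same shape.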
Requirement | Description
1 | Heterogeneous Data Ingest
2 | Robust Data Pipeline
... computational frameworks / runtimes
3 | Extensible Feature Engineering
4 | Scalable
... step, it does not continue with erroneous data
5 | Reliable
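The requirement that a pipeline "does not continue with erroneous data" can be sketched as a chain of steps that stops at the first failure. This is a minimal plain-Scala sketch under assumptions: `Step`, `run`, `validate`, and `feetToMetres` are hypothetical names, and the example steps are invented for illustration.

```scala
object PipelineSketch {
  // A step transforms a batch or reports an error message.
  type Step[A] = A => Either[String, A]

  // Run the named steps in order. The first Left ends the run, so
  // later steps never receive data from a failed step.
  def run[A](input: A, steps: Seq[(String, Step[A])]): Either[String, A] =
    steps.foldLeft[Either[String, A]](Right(input)) {
      case (Right(data), (name, step)) =>
        step(data).left.map(err => s"$name failed: $err")
      case (failed, _) => failed
    }

  // Example steps (assumptions, not from the deck): reject batches that
  // contain NaN, then convert depths from feet to metres.
  val validate: Step[Seq[Double]] = xs =>
    if (xs.exists(_.isNaN)) Left("batch contains NaN") else Right(xs)

  val feetToMetres: Step[Seq[Double]] = xs => Right(xs.map(_ * 0.3048))

  val steps: Seq[(String, Step[Seq[Double]])] =
    Seq("validate" -> validate, "feetToMetres" -> feetToMetres)
}
```

With these steps, a batch containing NaN comes back as a `Left` naming the `validate` step and the unit conversion never runs, while a clean batch comes back as a `Right` with converted values.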
How Then?
[Architecture diagram. Sources: TDM, EDM, WellView, Homegrown, WITSML Server, CSV files. Storage: HDFS and Hive (Staging, Reporting, Marts). Processing and access: Spark, Zeppelin, BI.]
Mostly achieved via Jupyter Notebooks
Requirement | Description | Achieved
1 | Heterogeneous Data Ingest
2 | Robust Data Pipeline
3 | Extensible Feature Engineering
4 | Scalable
5 | Robust
An Architectural Midstep
[Architecture diagram. Sources: TDM, EDM, WellView, Homegrown, WITSML Server, CSV files. Storage: HDFS, HDFS/IGFS, and Hive (Staging, Reporting, Marts). Ignite provides In-Memory MapReduce. Processing and access: Spark, Jupyter, BI.]
How Now?
[Architecture diagram. Components shown: Kubernetes, Docker, HDFS, Ignite (Service Grid, Memory Grid, Caches), Spark, Flink, Zeppelin, Workflow API, Scheduler API, Functions API, Workflow Cache, Functions, Persistent Storage (Configurable).]
[Diagram. Labels shown: Source; Function 1, Function 2, Function 3, each marked as a Service; caches marked Key/Val and SQL/DF between them.]
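One reading of these labels is a chain of functions that hand data to each other through shared caches. The sketch below imitates that idea with an in-memory `TrieMap` standing in for a cache. It does not use the Ignite API, and `Cache`, `Fn`, and `runChain` are hypothetical names chosen for illustration.

```scala
import scala.collection.concurrent.TrieMap

object WorkflowSketch {
  // Stand-in for a key-value cache. An assumption for illustration,
  // not an Ignite type.
  type Cache = TrieMap[String, Double]

  // A function reads every entry of its input cache and writes the
  // transformed value under the same key in its output cache.
  final case class Fn(name: String, transform: Double => Double) {
    def apply(in: Cache, out: Cache): Unit =
      in.foreach { case (k, v) => out.put(k, transform(v)) }
  }

  // Chain the functions: each stage writes into a fresh cache, which
  // becomes the input of the next stage. Returns the last cache.
  def runChain(source: Cache, fns: Seq[Fn]): Cache =
    fns.foldLeft(source) { (in, fn) =>
      val out = TrieMap.empty[String, Double]
      fn(in, out)
      out
    }
}
```

Because each function only sees caches, a stage could in principle be swapped for one running in a different framework, as long as it reads and writes the same keys.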
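import org.apache.ignite.spark.IgniteDataFrameSettings._

// Write a Spark DataFrame into an Ignite SQL table.
df.write
  .format(FORMAT_IGNITE)
  .option(OPTION_CONFIG_FILE, configPath) // Ignite XML config; configPath is an assumed name, not on the original slide
  .option(OPTION_TABLE, tableName) // table name to store data
  .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
  .save()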
Service
Key Val DF Spark Function
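import org.apache.ignite.cache.query.SqlFieldsQuery

// Query an Ignite table through the cache API with a SQL fields query.
val cache = ignite.getOrCreateCache(cacheConfig)
val cursor = cache.query(new SqlFieldsQuery(s"SELECT * FROM $tableName limit 20"))
val data = cursor.getAll // java.util.List of rows, each row a java.util.List of column values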
Service
Key Val DF Spark Function API
[Example workflow diagram. Labels shown: WITSML Server; three Services: Java WITSML Client (Docker), Channel Mapping / Unit Conversion (Docker), Rig State Detection / Enrichment / Pivot (Spark); Key/Val and SQL caches between them; Workflow API and Scheduler API.]
Requirement | Description | Achieved
1 | Heterogeneous Data Ingest
2 | Robust Data Pipeline
3 | Extensible Feature Engineering
4 | Scalable
5 | Robust