IOTA ARCHITECTURE: DATA VIRTUALIZATION AND PROCESSING MEDIUM
DR. KONSTANTIN BOUDNIK, DR. ALEXANDRE BOUDNIK

DR. KONSTANTIN BOUDNIK
EPAM SYSTEMS, CHIEF TECHNOLOGIST BIGDATA, OPEN SOURCE FELLOW
• Over 20 years of expertise in distributed systems, big- and fast-data platforms
• Apache Ignite Incubator Champion
• Author of 17 US patents in distributed computing
• A veteran Apache Hadoop developer
• Co-author of Apache Bigtop, used by Amazon EMR, Google Cloud Dataproc, and other major Hadoop vendors
• Co-author of the book "Professional Hadoop"

DR. ALEXANDRE BOUDNIK
EPAM SYSTEMS, LEAD SOLUTION ARCHITECT, BIG & FAST DATA
• Over 25 years of expertise in compilers, query engine development for MPP, computer security, distributed systems, Big Data and Fast Data
• Architect and Visionary at EPAM’s BigData CC
• Focus on scalable, fault-tolerant, distributed shared-nothing clusters
• Led projects for the financial and banking industries with intensive distributed in-memory calculations

AGENDA
• Modern data-processing architectures
• In-memory Data Fabric
• Iota in action: virtual data platform
• Use cases

EVERYTHING IS IN ONE SLIDE
THE REST IS MERE DETAILS
• Don’t separate batch and stream data processing
• Compute should be co-located with data
• Data mutations have to be tracked
• Data concurrency is annoying
That’s it: you can go now

NOT ALL LAMBDAs ARE EQUAL
Greek alphabet needs more letters
• Lambda (λ): an anonymous function (closure)
    def greeting = { it -> "Hello, $it!" }
    assert greeting('SEC 2017') == 'Hello, SEC 2017!'
• PaaS serverless architecture (AWS Lambda and the like)
    exports.handler = function (event, context) {
      context.succeed('Hello, SEC 2017!');
    };

LAMBDA: QUICK OVERVIEW
• Consists of three main layers
  1. High-latency layer for historical data
  2. Speed layer for recent/stream data
  3. Smart reconciliation layer
• Properties
  ◦ Immutable, one-way data ingest
• Drawbacks
  ◦ Data accuracy is an issue
  ◦ High operational complexity
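
The three layers can be illustrated with a toy, in-process sketch. It assumes a hypothetical event log of `(timestamp, key)` pairs and a made-up `BATCH_HORIZON` marking how far the batch layer has caught up; none of these names come from the slides.

```python
from collections import Counter

# Hypothetical event log: (timestamp, key) pairs.
events = [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b")]
BATCH_HORIZON = 3  # assumption: the batch layer has processed timestamps <= 3


def batch_view(log, horizon):
    """High-latency layer: recompute counts over the historical events."""
    return Counter(key for ts, key in log if ts <= horizon)


def speed_view(log, horizon):
    """Speed layer: count only the recent events the batch has not seen yet."""
    return Counter(key for ts, key in log if ts > horizon)


def serve(batch, speed):
    """Reconciliation layer: merge both views at query time."""
    return batch + speed


merged = serve(batch_view(events, BATCH_HORIZON),
               speed_view(events, BATCH_HORIZON))
```

Note that the counting logic exists twice, once per layer, which is one plausible source of the operational complexity listed under drawbacks.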

SOME LAMBDAs ARE KAPPAs
• Simplified to:
  1. Streaming source
  2. Streaming processing
  3. Stream-only serving DB
• Properties
  ◦ Historical processing is a stream
  ◦ Reprocessing is just a stream job
• Drawbacks
  ◦ (Re)streaming of the historical data on replay
  ◦ Moderate operational complexity
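
"Reprocessing is just a stream job" can be sketched as replaying the same log through a new version of the processing function. This is a minimal sketch; `replay`, `count_v1` and `count_v2` are hypothetical names, and the in-memory list stands in for a durable, replayable stream.

```python
def replay(log, process):
    """Run one stream job over the whole log, starting from the first offset."""
    state = {}
    for event in log:
        process(state, event)
    return state


def count_v1(state, event):
    """Original code: count events exactly as they arrive."""
    state[event] = state.get(event, 0) + 1


def count_v2(state, event):
    """Code change: normalise case before counting."""
    key = event.lower()
    state[key] = state.get(key, 0) + 1


log = ["A", "a", "b"]               # stands in for the streaming source
serving_v1 = replay(log, count_v1)  # serving view built by the old code
serving_v2 = replay(log, count_v2)  # rebuilt by replaying the same log
```

The whole history is streamed again on every code change, which matches the replay drawback listed above.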

NEXT TO EACH OTHER
• Processing (Lambda) architecture for slow and fast data
  [Diagram labels: Events; Batch (slow): ’Hello, ’; Stream (fast): ’I’,’M’,’C’,’S’,’ ’,’2’,’0’,’1’,’7’,’!’; Serving DB (to reconcile); Code change: reprocessing]
• Some Lambdas are really Kappas
  [Diagram labels: Events; Stream Processor: ’Hello’, ’I’,’M’,’C’,’S’,’ ’,’2’,’0’,’1’,’7’,’!’; Serving DB (up-to-date); Catch-up; Code change: reprocessing]

IN-MEMORY DATA FABRIC
PICTURE OR IT NEVER HAPPENED
• Separation of concerns
• Sources
• Consumers
• Abstraction and processing

IN-MEMORY DATA FABRIC IN A NUTSHELL
• Data Fabric is a unified view of data in multiple systems
• A layer for data access
• Low redundancy; few data movements
• Write-through caching (might violate legacy app data integrity)
• Affinity-sensitive compute medium
• Highly available and fault-tolerant
• Variety of APIs and integration with BigData
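
Write-through caching, in its generic form, means every write goes to the underlying system of record before it is considered done, while reads are served from memory. The sketch below is an illustration only: `WriteThroughCache` is a hypothetical class and the plain dictionary `backing_store` stands in for an RDBMS, DFS or cloud storage.

```python
class WriteThroughCache:
    """Toy access layer placed in front of a backing store."""

    def __init__(self, store):
        self.store = store  # stands in for the system of record
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value  # write through to the store first
        self.cache[key] = value  # then keep the value in memory

    def get(self, key):
        if key not in self.cache:  # read-through on a cache miss
            self.cache[key] = self.store[key]
        return self.cache[key]


backing_store = {"legacy": "row"}  # data that predates the cache
fabric = WriteThroughCache(backing_store)
fabric.put("user:1", "Ada")
legacy_value = fabric.get("legacy")
```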

NEXT STEP: IOTA
[Diagram labels: BIGMEMORY, Events, Real-time, Cache, In-Memory Data Fabric, Batch, Cloud storage, RDBMS, DFS]

A STEP TOWARDS THE DATA
• Don’t separate batch and stream data processing
• Compute should be co-located with data
• Data mutations have to be tracked (watched and versioned)
• Data concurrency is annoying
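
One way to picture co-locating compute with data is to send the function to whichever node owns the key instead of pulling the value to the caller. The sketch below is a toy, single-process model: `Node`, `owner` and `run_near_data` are hypothetical names, and the CRC32-modulo placement is an assumption, not the affinity function of Iota or of any particular data fabric.

```python
import zlib


class Node:
    """Toy cluster node that holds a slice of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def run(self, func, key):
        # The function executes next to the value it needs.
        return func(self.data[key])


nodes = [Node("node-0"), Node("node-1"), Node("node-2")]


def owner(key):
    """Deterministically map a key to the node that stores it."""
    return nodes[zlib.crc32(key.encode()) % len(nodes)]


def put(key, value):
    owner(key).data[key] = value


def run_near_data(key, func):
    """Ship the function to the owning node; only the result travels back."""
    return owner(key).run(func, key)


put("orders:42", [10, 20, 30])
total = run_near_data("orders:42", sum)
```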

ISSUES OF DATA STORING & PROCESSING
• Data state, persistence and immutability
• Misperception of data primacy: what is the main copy?
• Versioning of data, data structures, code and metadata
• Uniform data access, multi-structured data
• Granular data access rights and security
• ETL/ELT & Data Marts, data lifecycle

TWO BREEDS OF DATAWAREHOUSES
Heterogeneous sources: Update-Driven vs. Query-Driven
Update-Driven
• Provides higher performance
• Integrates data from heterogeneous sources
• Simplifies analyses: data are ready for direct querying
• Extra storage for copied data
• Complex CDC for each data source
Query-Driven
• Builds wrappers/mediators on top of heterogeneous databases
• Translates queries into data-source-specific form
• Single-Source-of-Truth practice
• Complex information filtering
• Massive data pull from data sources

BIGDATA & QUERY-DRIVEN WAREHOUSE
• Query-Driven Warehouse borrowed from BigData:
  ◦ On-demand extraction from schema-on-read data
  ◦ Avoids complex ETLs
• BigData addresses high query costs of Query-Driven Warehouse:
  ◦ Read less data: partitioning
  ◦ Less shuffle: shared nothing, co-location, local filtering (pushdown)
• Requires sophisticated extensible metadata
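
"Read less data" and "local filtering (pushdown)" can be shown on a toy partitioned dataset. This is a sketch under stated assumptions: the month-keyed `partitions` dictionary, the `scan` helper and the sample rows are all invented for illustration.

```python
# Hypothetical dataset, partitioned by month.
partitions = {
    "2017-01": [{"user": "u1", "amount": 10}, {"user": "u2", "amount": 250}],
    "2017-02": [{"user": "u1", "amount": 300}],
    "2017-03": [{"user": "u3", "amount": 5}],
}


def scan(parts, wanted, predicate):
    """Read only the wanted partitions and filter rows while scanning."""
    scanned = 0
    result = []
    for name in wanted:          # partition pruning: other months are never read
        for row in parts[name]:
            scanned += 1
            if predicate(row):   # filter applied before rows leave the partition
                result.append(row)
    return result, scanned


rows, scanned = scan(partitions, ["2017-02", "2017-03"],
                     lambda row: row["amount"] > 100)
```

Only two of the four stored rows are touched, and a single row is returned to the caller.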

TWO BREEDS OF DATA
PRIMARY & DERIVED
• Primary Data are nondeterministic, non-reproducible and UNIQUE
  ◦ persistent and immutable
• Derived Data are deterministic and reproducible EXACTLY
  ◦ ephemeral and immutable
• Versioned metadata are Primary by their nature
  ◦ persistent and immutable
• Versioned Code is Primary by its nature
  ◦ persistent and immutable
• All of the above are immutable and therefore STATELESS!

BENEFITS OF STATELESSNESS
• No data concurrency issues
• Majority of transactions are RAMP
• Leveraging functional programming paradigm (lambda again!)
• Read-through & memoization
• Higher re-use of the code
• Avoiding complex ETLs
• On-demand extraction from schema-on-read data
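
Memoization is safe here because derived data is a deterministic function of immutable inputs. A minimal sketch using the standard library's `functools.lru_cache`; the `total` function, the `PRIMARY` tuple and the `calls` counter are hypothetical and exist only to make the cache hit visible.

```python
import functools

calls = {"count": 0}  # counts how often the function body really runs


@functools.lru_cache(maxsize=None)
def total(primary):
    """Derived data: a pure function of an immutable (hashable) input."""
    calls["count"] += 1
    return sum(primary)


PRIMARY = (3, 4, 5)      # immutable primary data, modelled as a tuple
first = total(PRIMARY)   # computed
second = total(PRIMARY)  # served from the memo, no recomputation
```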

MOVING PARTS
• Persistent WORM stores (Write Once Read Many)
  ◦ Primary data
  ◦ Metadata & Code
• Transient Cache stores
  ◦ Derived data
• Compute Engine
  ◦ Reads WORM & Cache
  ◦ Produces results
  ◦ Puts results to Cache
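
The interplay of the three parts can be sketched in a few lines. Everything here is an assumption made for illustration: the `WormStore` and `ComputeEngine` classes, the key naming scheme such as `code/max@v1`, and the use of a plain dictionary as the transient cache.

```python
class WormStore:
    """Write Once Read Many: a key can be written exactly once."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        if key in self._data:
            raise KeyError(f"{key!r} is already written and cannot change")
        self._data[key] = value

    def read(self, key):
        return self._data[key]


class ComputeEngine:
    """Reads WORM and cache, produces results, puts results into the cache."""

    def __init__(self, worm, cache):
        self.worm = worm
        self.cache = cache

    def run(self, func_key, data_key):
        cache_key = (func_key, data_key)
        if cache_key not in self.cache:
            func = self.worm.read(func_key)  # versioned code lives in WORM too
            self.cache[cache_key] = func(self.worm.read(data_key))
        return self.cache[cache_key]


worm = WormStore()
worm.write("readings/v1", (3, 4, 5))  # primary data
worm.write("code/max@v1", max)        # versioned code
cache = {}                            # transient store for derived data
engine = ComputeEngine(worm, cache)
result = engine.run("code/max@v1", "readings/v1")
```

Because both code and data keys are versioned and immutable, the pair of keys fully identifies the derived value, so the cache can be dropped at any time and rebuilt.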

PARTITIONING VS PATCHWORK
HOW TO READ LESS
• Partitions: statically defined in DDL
• Patchworks: arbitrary structure of dynamically built patches

PATCHWORK
DATA BLOCKS & DATA CATALOG
• Data Blocks:
  ◦ Describe a quantum of data
  ◦ A set of semantically similar objects, limited by some dimensions
  ◦ A URI: ftp, web, files, a parametrized SQL SELECT
• Data Catalog:
  ◦ A part of versioned metadata
  ◦ Organizes Data Blocks into a Patchwork
  ◦ Is a functional equivalent of RDBMS catalog
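
A Data Block and a Data Catalog might be modelled as below. This is a sketch, not the authors' data model: the `DataBlock` fields, the single `day` dimension, the `select` method and the example URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBlock:
    """A quantum of data: a URI plus the bounds of one dimension."""
    uri: str
    day_from: int  # inclusive lower bound
    day_to: int    # inclusive upper bound


class DataCatalog:
    """Versioned metadata that organises blocks into a patchwork."""

    def __init__(self, version, blocks):
        self.version = version
        self.blocks = tuple(blocks)

    def select(self, day_from, day_to):
        """Return only the blocks whose bounds overlap the requested range."""
        return [b for b in self.blocks
                if b.day_from <= day_to and day_from <= b.day_to]


catalog = DataCatalog(version=1, blocks=[
    DataBlock("file:///data/trades-001.csv", 1, 10),
    DataBlock("file:///data/trades-002.csv", 11, 15),
    DataBlock("ftp://example.org/trades-003.csv", 16, 40),
])
hits = catalog.select(12, 20)
```

Unlike DDL partitions, the blocks here have uneven sizes and come from different kinds of sources; the catalog is what lets a query skip the first block entirely.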

CACHE
• Cache is transparent and transient by its nature:
  ◦ Holds function results, instead of actual calls
  ◦ Might hold Data Blocks
• Cache Entry includes Key, Value, and Statistics:
  ◦ last time value was accessed and how often (frequency)
  ◦ dependency depth
  ◦ resources spent, like CPU and IOs
• Retention & Eviction:
  ◦ Is based on Cache Entry statistics
  ◦ The dependency graph’s Data Blocks are evicted with the root entry
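
Statistics-driven eviction could look like the sketch below. The slide only says that retention and eviction are based on entry statistics; the concrete score used here (hits multiplied by cost, ties broken by last access) is an assumption, as are the `CacheEntry` and `StatCache` names. Dependency depth is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: object
    cost: float           # resources spent to produce the value
    hits: int = 0         # access frequency
    last_access: int = 0  # logical clock tick of the last access


class StatCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}
        self.clock = 0

    def get(self, key):
        self.clock += 1
        entry = self.entries[key]
        entry.hits += 1
        entry.last_access = self.clock
        return entry.value

    def put(self, key, value, cost):
        self.clock += 1
        if key not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._score)
            del self.entries[victim]
        self.entries[key] = CacheEntry(value, cost, last_access=self.clock)

    def _score(self, key):
        # Assumed policy: keep entries that are both popular and costly to rebuild.
        entry = self.entries[key]
        return (entry.hits * entry.cost, entry.last_access)


cache = StatCache(capacity=2)
cache.put("cheap", 1, cost=0.1)
cache.put("costly", 2, cost=9.0)
cache.get("cheap")
cache.get("costly")
cache.put("new", 3, cost=1.0)  # the cache is full, so one entry is evicted
```

With equal hit counts, the entry that is cheap to recompute is the one that goes.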

MISCELLANEOUS ASPECTS
• Dependency graph is built from data access history:
  ◦ Could be replaced by a reference to Data Block (compacted)
• Invalidation & Lineage is driven by dependency graph
• Functions: follow memoization pattern
• Scalability: just add more boxes, if:
  ◦ WORM uses distributed Key-Value storage
  ◦ Cache & Calculation engine use In-Memory Data Fabric
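
Building the dependency graph from access history and using it for invalidation can be sketched as follows. The names `record_access`, `invalidate` and the sample cache keys are hypothetical; a real system would record accesses inside the compute engine rather than by hand.

```python
from collections import defaultdict

dependents = defaultdict(set)  # source key -> keys derived from it
cache = {
    "block:a": [1, 2],
    "sum:a": 3,
    "report": "total=3",
    "block:b": [7],
}


def record_access(derived, source):
    """Access history: computing `derived` read `source`."""
    dependents[source].add(derived)


record_access("sum:a", "block:a")
record_access("report", "sum:a")


def invalidate(key):
    """Drop a key and everything that was transitively derived from it."""
    dropped = []
    stack = [key]
    while stack:
        current = stack.pop()
        if current in cache:
            del cache[current]
            dropped.append(current)
        stack.extend(dependents.pop(current, ()))
    return dropped


dropped = invalidate("block:a")
```

Read in the other direction, the same edges give lineage: `report` was derived from `sum:a`, which was derived from `block:a`.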

USE CASES
• Better data lakes: bi-directional data movements
  ◦ Minimal networking, Memory-centric, Integration with legacy
• Real-time personalization
  ◦ Better shopping with mobile devices, Location-based marketing
  ◦ Near real-time promotions, Advanced analytics
  ◦ Simplified ML-driven CEP
• Fraud detection
  ◦ Discovery of complex fraud patterns, based on historical data
  ◦ Real-time detection of abnormal behavior
  ◦ Simplified ML-driven CEP

IOTA BENEFITS
• Avoiding multiple copies of the data, instant consistency
• In-memory caching with read-ahead/write-behind support
• Batch, streaming, CEP, and (near) real-time processing
• Speeding up traditionally slow, batch-oriented frameworks
• Variety of data processing: read-only, read-write, transactional
• Lower inter-component impedance
