SQL for NoSQL and how Apache Calcite can help
FOSDEM 2017

Christian Tzolov
Engineer at Pivotal: BigData, Hadoop, Spring Cloud Dataflow
Apache Committer, PMC member: Apache {Crunch, Geode, HAWQ, ...}
blog.tzolov.net
twitter.com/christzolov
nl.linkedin.com/in/tzolov

Disclaimer
This talk expresses my personal opinions. It has not been read or approved by Pivotal and does not necessarily reflect the views and opinions of Pivotal, nor does it constitute any official communication of Pivotal. Pivotal does not support any of the code shared here.

“It will be interesting to see what happens if an established NoSQL database decides to implement a reasonably standard SQL; the only predictable outcome for such an eventuality is plenty of argument.”
Martin Fowler, P.J. Sadalage, NoSQL Distilled, 2012

Data Big Bang: Why?

NoSQL Driving Forces
• Rise of the Internet (Web, Mobile, IoT): data Volume, Velocity and Variety challenges.
  ACID and 2PC clash with distributed architectures; CAP and Paxos are used instead.
• Row-based relational model: object-relational impedance mismatch.
  More convenient data models: Datastores, Key/Value, Graph, Columnar, Full-text Search, Schema-on-Load …
• Infrastructure automation and elasticity (Cloud Computing).
  Eliminate operational complexity and cost. Shift from integration databases to application databases …

Data Big Bang Implications
• Over 150 commercial NoSQL and BigData systems.
• Organizations will have to mix data storage technologies!
• How to integrate such a multitude of data systems?

“Standard” Data Process/Query Language?
• Functional: Unified Programming Model
  • Apache {Beam, Spark, Flink, Apex, Crunch}, Cascading
  • Converging around Apache Beam
  • Batch & Streaming, OLTP

    pcollection.apply(Read.from("in.txt"))
        .apply(FlatMapElements.via((String word) -> asList(word.split("[^a-zA-Z']+"))))
        .apply(Filter.by((String word) -> !word.isEmpty()))
        .apply(Count.<String>perElement());

• Declarative: SQL
  • Adopted by many NoSQL vendors
  • Most Hadoop tasks: Hive and SQL-on-Hadoop
  • Spark SQL: most used production component for 2016
  • Google F1
  • OLAP, EDW, Exploration

    SELECT b."totalPrice", c."firstName"
    FROM "BookOrder" as b
    INNER JOIN "Customer" as c
      ON b."customerNumber" = c."customerNumber"
    WHERE b."totalPrice" > 0;
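The Beam pipeline above is a functional word count. As a rough, runnable analogue (an assumption: this uses plain `java.util.stream` on a single JVM, not Apache Beam, and the class name `WordCountSketch` is hypothetical), the same split, filter and count-per-element steps look like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogue of the Beam word count, using java.util.stream (not Beam).
class WordCountSketch {

    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // like FlatMapElements: split each line on runs of non-letter characters
                .flatMap(line -> Arrays.stream(line.split("[^a-zA-Z']+")))
                // like Filter.by: drop the empty token a leading separator produces
                .filter(word -> !word.isEmpty())
                // like Count.perElement(): group identical words and count them
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be, or not to be", "  that is the question");
        System.out.println(countWords(lines));
    }
}
```

The declarative SQL query expresses the same kind of intent without spelling out the steps, which is what lets an optimizer such as Calcite choose how to execute it.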

SQL for NoSQL?
• Extended relational algebra: already present in most NoSQL data systems
• Relational expression optimization: desirable but hard to implement

Organization Data: Integrated View
[Figure: two integration approaches. Direct (M:N): organization data tools connect over SQL/JDBC to one Calcite SQL adapter per NoSQL store (SQL Adapter 1 … n), each using that store's native API. Single Federated DB (M:1:N): tools connect over SQL/JDBC to a single Apache HAWQ federated database (HAWQ optimizer, columnar storage on HDFS), which reaches each NoSQL store through a PXF extension (PXF 1 … n).]
https://issues.apache.org/jira/browse/HAWQ-1235

Single Federated Database
Federated External Tables with Apache HAWQ (MPP, Shared-Nothing, SQL-on-Hadoop)

    CREATE EXTERNAL TABLE MyNoSQL (
        customer_id TEXT,
        first_name  TEXT,
        last_name   TEXT,
        gender      TEXT
    )
    LOCATION ('pxf://<MyNoSQL-URL>?FRAGMENTER=MyFragmenter&ACCESSOR=MyAccessor&RESOLVER=MyResolver')
    FORMAT 'custom' (formatter='pxfwritable_import');
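The LOCATION clause names three PXF plugin roles. In PXF, the Fragmenter splits the external data set into fragments that HAWQ segments can read in parallel, the Accessor reads records from one fragment, and the Resolver turns each record into the table's columns. A minimal sketch of that division of labor follows; the interfaces, the round-robin split and the comma-separated records are hypothetical simplifications, not the real PXF Java API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for the PXF plugin roles (NOT the real PXF API).
interface Fragmenter { List<List<String>> fragments(List<String> source); }
interface Accessor { List<String> read(List<String> fragment); }
interface Resolver { Object[] resolve(String line); }

class PxfSketch {

    // Round-robin split into a fixed number of fragments (illustrative only).
    static Fragmenter roundRobin(int fragmentCount) {
        return source -> {
            List<List<String>> fragments = new ArrayList<>();
            for (int i = 0; i < fragmentCount; i++) {
                fragments.add(new ArrayList<>());
            }
            for (int i = 0; i < source.size(); i++) {
                fragments.get(i % fragmentCount).add(source.get(i));
            }
            return fragments;
        };
    }

    // The "native store" here is just a list, so reading a fragment returns its records.
    static final Accessor LIST_ACCESSOR = fragment -> fragment;

    // Splits a comma-separated record into the four columns of the MyNoSQL table.
    static final Resolver CSV_RESOLVER = line -> line.split(",", -1);

    static List<Object[]> load(List<String> source, Fragmenter fragmenter,
                               Accessor accessor, Resolver resolver) {
        List<Object[]> rows = new ArrayList<>();
        // In HAWQ each fragment would be handled by a different segment in parallel;
        // this sketch simply walks the fragments one after another.
        for (List<String> fragment : fragmenter.fragments(source)) {
            for (String line : accessor.read(fragment)) {
                rows.add(resolver.resolve(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("c1,Ann,Lee,F", "c2,Bob,Ray,M", "c3,Cy,Ng,M");
        for (Object[] row : load(source, roundRobin(2), LIST_ACCESSOR, CSV_RESOLVER)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the three roles separate is what lets one store-specific Accessor be combined with different Resolvers (and vice versa) in the PXF design.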

Apache Calcite?
A Java framework that provides a SQL interface and advanced query optimization for virtually any data system.
• Query parser, validator and optimizer(s)
• JDBC drivers: local and remote
• Agnostic to data storage and processing

Calcite Applications
• Apache Apex
• Apache Drill
• Apache Flink
• Apache Hive
• Apache Kylin
• Apache Phoenix
• Apache Samza
• Apache Storm
• Cascading
• Qubole Quark
• SQL-Gremlin
• …
• Apache Geode

SQL Adapter Design Choices
SQL completeness vs. NoSQL design integrity
• Catalog: namespaces accessed in queries
  • Schema: collection of schemas and tables
  • Table: single data set, collection of rows
  • RelDataType: SQL field types in a Table
• Data Type Conversion
• Move Computation to Data
  • (simple) Predicate Pushdown: Scan, Filter, Projection
  • (complex) Custom Relational Rules and Operations: Sort, Join, GroupBy ...

Geode to Calcite Data Types Mapping
• The Geode Cache is mapped into a Calcite Schema.
• Regions are mapped into Tables (Region 1 … Region K become Table 1 … Table K).
• Each Geode key/value entry is mapped into a table row (entries k1→v1, k2→v2, … become Row1, Row2, … RowM, with column values V(1,1) … V(M,N)).
• Column types (RelDataType) are created from the Geode value class (JavaTypeFactoryImpl).
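The slide says column types are created from the Geode value class through Calcite's JavaTypeFactoryImpl. The sketch below illustrates the same idea without Calcite: it reflects over a hypothetical `Customer` value class to derive column names and SQL type names, then turns each region entry into a row. The Java-to-SQL type table and the alphabetical column order are assumptions of this sketch (Java reflection does not guarantee field order), not Calcite's actual behavior:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class stored in a Geode region (key = customerNumber).
class Customer {
    int customerNumber;
    String firstName;
    double balance;

    Customer(int customerNumber, String firstName, double balance) {
        this.customerNumber = customerNumber;
        this.firstName = firstName;
        this.balance = balance;
    }
}

class RegionToTableSketch {

    // Instance fields of the value class, sorted by name because
    // Class.getDeclaredFields() guarantees no particular order.
    static List<Field> columns(Class<?> valueClass) {
        List<Field> fields = new ArrayList<>();
        for (Field f : valueClass.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    // Assumed Java-to-SQL type table; unknown types fall back to ANY.
    static String sqlType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "INTEGER";
        if (t == long.class || t == Long.class) return "BIGINT";
        if (t == double.class || t == Double.class) return "DOUBLE";
        if (t == boolean.class || t == Boolean.class) return "BOOLEAN";
        if (t == String.class) return "VARCHAR";
        return "ANY";
    }

    // Region -> table: derive the row type (column name -> SQL type) from the value class.
    static Map<String, String> rowType(Class<?> valueClass) {
        Map<String, String> rowType = new LinkedHashMap<>();
        for (Field f : columns(valueClass)) {
            rowType.put(f.getName(), sqlType(f.getType()));
        }
        return rowType;
    }

    // Key/value entry -> row: the value's field values become the row's columns.
    static List<Object[]> toRows(Map<?, ?> region, Class<?> valueClass) {
        List<Field> fields = columns(valueClass);
        List<Object[]> rows = new ArrayList<>();
        for (Object value : region.values()) {
            Object[] row = new Object[fields.size()];
            try {
                for (int i = 0; i < fields.size(); i++) {
                    row[i] = fields.get(i).get(value);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, Customer> region = new LinkedHashMap<>();
        region.put(1, new Customer(1, "Ann", 10.5));
        region.put(2, new Customer(2, "Bob", 0.0));
        System.out.println(rowType(Customer.class));
        System.out.println(toRows(region, Customer.class).size());
    }
}
```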

Geode Adapter: Overview
• SQL clients: JDBC/ODBC
• Apache Calcite: parses SQL, converts it into relational expressions and optimizes them
• Geode Adapter: pushes down the relational expressions supported by Geode OQL and falls back to the Calcite Enumerable Adapter for the rest; converts SQL relational expressions into OQL queries
• Spring Data Geode (Geode client): Spring Data API for interacting with Geode, through the Geode API and OQL
• Geode servers, each holding data
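The adapter's key job is to turn the relational expressions it supports into OQL and leave the rest to Calcite. A hypothetical sketch of that split follows: `Predicate`, `OqlSketch` and the set of pushable operators are assumptions for illustration, not the Geode adapter's code, and `FUZZY_MATCH` stands for any operator the store cannot evaluate. It also shows one design constraint: a LIMIT may only be pushed down when every filter was pushed, otherwise Geode would cut the result before Calcite finishes filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simple predicate: <field> <op> <literal>.
class Predicate {
    final String field;
    final String op;
    final Object literal;

    Predicate(String field, String op, Object literal) {
        this.field = field;
        this.op = op;
        this.literal = literal;
    }

    String toOql(String alias) {
        // String literals are single-quoted, with embedded quotes doubled.
        String lit = literal instanceof String
                ? "'" + ((String) literal).replace("'", "''") + "'"
                : String.valueOf(literal);
        return alias + "." + field + " " + op + " " + lit;
    }
}

class OqlSketch {
    // Operators this sketch assumes Geode OQL can evaluate.
    static final Set<String> PUSHABLE =
            new HashSet<>(Arrays.asList("=", "<>", "<", "<=", ">", ">="));

    // Returns the OQL for the pushable part of the query. Predicates that cannot be
    // pushed are appended to 'residual' for Calcite's Enumerable operators to apply.
    static String toOql(String region, List<String> fields, List<Predicate> filters,
                        Integer limit, List<Predicate> residual) {
        List<String> pushed = new ArrayList<>();
        List<Predicate> leftover = new ArrayList<>();
        for (Predicate p : filters) {
            if (PUSHABLE.contains(p.op)) {
                pushed.add(p.toOql("b"));
            } else {
                leftover.add(p);
            }
        }
        // With an explicit projection, also fetch fields the leftover predicates need.
        Set<String> columns = new LinkedHashSet<>(fields);
        if (!fields.isEmpty()) {
            for (Predicate p : leftover) {
                columns.add(p.field);
            }
        }
        List<String> select = new ArrayList<>();
        for (String c : columns) {
            select.add("b." + c);
        }
        StringBuilder oql = new StringBuilder("SELECT ")
                .append(select.isEmpty() ? "*" : String.join(", ", select))
                .append(" FROM /").append(region).append(" b");
        if (!pushed.isEmpty()) {
            oql.append(" WHERE ").append(String.join(" AND ", pushed));
        }
        // Pushing LIMIT is only safe when no filtering is left for Calcite.
        if (limit != null && leftover.isEmpty()) {
            oql.append(" LIMIT ").append(limit);
        }
        residual.addAll(leftover);
        return oql.toString();
    }

    public static void main(String[] args) {
        List<Predicate> residual = new ArrayList<>();
        String oql = toOql("BookOrder", Arrays.asList("totalPrice"),
                Arrays.asList(new Predicate("totalPrice", ">", 0)), 10, residual);
        // prints: SELECT b.totalPrice FROM /BookOrder b WHERE b.totalPrice > 0 LIMIT 10
        System.out.println(oql);
    }
}
```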

Simple SQL Adapter
Initialize: connect with

    !connect jdbc:calcite:model=path-to-model.json

where the model file registers the adapter's schema factory:

    {
      version: '1.0',
      defaultSchema: 'MyNoSQL',
      schemas: [{
        type: 'custom',
        name: 'MyNoSQL',
        factory: 'MySchemaFactory',
        operand: { myNoSqlUrl: … }
      }]
    }

• <<SchemaFactory>> MySchemaFactory, +create(operands): Schema. Creates MySchema.
• <<Schema>> MySchema, +getTableMap(): Map<String, Table>. Creates MyTable.
• <<ScannableTable>> MyTable (or FilterableTable, ProjectableFilterableTable)
  +getRowType(RelDataTypeFactory): uses reflection to build the RelDataType from your value's class type.
  +scan(ctx): Enumerable<Object[]>. Returns an enumeration over the entire target data store.
• <<Enumerator>> MyEnumerator, created on scan(); the Enumerator interface is defined in the Linq4j sub-project.
  +moveNext()
  +convert(Object): E. Converts the MyNoSQL value response into Calcite row data.
  Gets all data from My NoSQL.

Example query:

    SELECT b."totalPrice"
    FROM "BookOrder" as b
    WHERE b."totalPrice" > 0;
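The MyEnumerator box converts store responses into rows on demand. The sketch below models that pattern with a local `RowEnumerator` interface shaped like linq4j's `Enumerator` (`current()`, `moveNext()`, `reset()`, `close()`); it is a stand-in so the example runs without Calcite on the classpath, and the list of maps is a hypothetical placeholder for the NoSQL store's response. Because MyTable is a plain ScannableTable, the `WHERE b."totalPrice" > 0` filter is not pushed down; Calcite applies it to every row the enumerator returns, which the `main` method imitates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Local stand-in shaped like linq4j's Enumerator<T>; not the real interface.
interface RowEnumerator<E> extends AutoCloseable {
    E current();
    boolean moveNext();
    void reset();
    @Override
    void close();
}

// Walks a (hypothetical) NoSQL response and converts each value into a row.
class MyEnumerator implements RowEnumerator<Object[]> {
    private final List<Map<String, Object>> response;
    private final String[] fieldNames;
    private int index = -1;
    private Object[] current;

    MyEnumerator(List<Map<String, Object>> response, String[] fieldNames) {
        this.response = response;
        this.fieldNames = fieldNames;
    }

    @Override
    public boolean moveNext() {
        if (index + 1 >= response.size()) {
            return false;
        }
        index++;
        current = convert(response.get(index));
        return true;
    }

    @Override
    public Object[] current() {
        if (current == null) {
            throw new NoSuchElementException("call moveNext() first");
        }
        return current;
    }

    @Override
    public void reset() {
        index = -1;
        current = null;
    }

    @Override
    public void close() {
        // A real adapter would release its cursor or connection here.
    }

    // Converts one store value into a row whose columns follow fieldNames.
    Object[] convert(Map<String, Object> value) {
        Object[] row = new Object[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            row[i] = value.get(fieldNames[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("totalPrice", 10.0);
        Map<String, Object> b = new HashMap<>();
        b.put("totalPrice", -1.0);
        List<Object[]> selected = new ArrayList<>();
        try (MyEnumerator e = new MyEnumerator(Arrays.asList(a, b), new String[] {"totalPrice"})) {
            // Full scan; the WHERE clause is evaluated outside the table.
            while (e.moveNext()) {
                if ((Double) e.current()[0] > 0) {
                    selected.add(e.current());
                }
            }
        }
        System.out.println(selected.size()); // prints 1
    }
}
```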

Non-Relational Tables (Simple)
Scanned without an intermediate relational expression.
• ScannableTable: can be scanned

    Enumerable<Object[]> scan(DataContext root);

• FilterableTable: can be scanned, applying supplied filter expressions

    Enumerable<Object[]> scan(DataContext root, List<RexNode> filters);

• ProjectableFilterableTable: can be scanned, applying supplied filter expressions and projecting a given list of columns

    Enumerable<Object[]> scan(DataContext root, List<RexNode> filters, int[] projects);
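In Calcite, the `filters` list passed to these `scan` methods is mutable: the table removes the filters it can evaluate and leaves the rest, which Calcite then applies itself; `projects` lists the column ordinals to return, or is null for all columns. The sketch below imitates that contract without Calcite. `SimpleFilter`, `EqualsFilter` and `GreaterThanFilter` are hypothetical stand-ins for `RexNode`, and the table is assumed to handle only equality filters; the example keeps column 2 in `projects` so the leftover filter can still be evaluated on the returned rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for RexNode filter expressions.
interface SimpleFilter {}

class EqualsFilter implements SimpleFilter {
    final int column;
    final Object value;
    EqualsFilter(int column, Object value) { this.column = column; this.value = value; }
}

class GreaterThanFilter implements SimpleFilter {
    final int column;
    final double value;
    GreaterThanFilter(int column, double value) { this.column = column; this.value = value; }
}

class InMemoryTable {
    private final List<Object[]> rows;

    InMemoryTable(List<Object[]> rows) { this.rows = rows; }

    // Mimics ProjectableFilterableTable.scan: consumes the filters it can evaluate
    // (equality only, in this sketch) and leaves the others in the mutable list.
    List<Object[]> scan(List<SimpleFilter> filters, int[] projects) {
        List<EqualsFilter> handled = new ArrayList<>();
        for (Iterator<SimpleFilter> it = filters.iterator(); it.hasNext(); ) {
            SimpleFilter f = it.next();
            if (f instanceof EqualsFilter) {
                handled.add((EqualsFilter) f);
                it.remove();
            }
        }
        List<Object[]> result = new ArrayList<>();
        for (Object[] row : rows) {
            if (!matchesAll(row, handled)) {
                continue;
            }
            if (projects == null) { // null: return every column
                result.add(row);
            } else {
                Object[] projected = new Object[projects.length];
                for (int i = 0; i < projects.length; i++) {
                    projected[i] = row[projects[i]];
                }
                result.add(projected);
            }
        }
        return result;
    }

    private static boolean matchesAll(Object[] row, List<EqualsFilter> filters) {
        for (EqualsFilter f : filters) {
            if (!Objects.equals(row[f.column], f.value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable(Arrays.asList(
                new Object[] {1, "Ann", 10.0},
                new Object[] {2, "Bob", -1.0},
                new Object[] {3, "Ann", 5.0}));
        List<SimpleFilter> filters = new ArrayList<>(Arrays.asList(
                new EqualsFilter(1, "Ann"), new GreaterThanFilter(2, 0.0)));
        List<Object[]> rows = table.scan(filters, new int[] {0, 2});
        // The GreaterThanFilter is still in 'filters' for the engine to apply.
        System.out.println(rows.size() + " rows, " + filters.size() + " filter left");
    }
}
```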

Calcite Ecosystem
Several “semi-independent” projects:
• JDBC and Avatica: local and remote JDBC driver.
• Linq4j: port of LINQ (Language-Integrated Query) to Java.
• Expression (LINQ/MSN port): method for translating executable code into data.
• SQL Parser & AST: converts SQL queries into an AST tree (SqlNode …).
• Interpreter: compiles Java code generated from linq4j “Expressions”; part of the physical plan implementer.
• Relational: relational algebra expressions, optimizations … (Relational Expressions, Row Expressions, Optimization Rules, Planner …).
• Enumerable Adapter: default (in-memory) data store adapter implementation; leverages Linq4j.
• 3rd party Adapters.

Calcite SQL Query Execution Flow
1. On a new SQL query, JDBC delegates to Prepare to prepare the query execution.
2. Parse the SQL and convert it to relational expressions, then validate and optimize them.
3. Start building a physical plan from the relational expressions.
4. Implement the Geode relations and encode them as an Expression tree.
5. Pass the Expression tree to the Interpreter to generate Java code.
6. Generate and compile a Binder instance that, on a bind() call, runs Geode's query method.
7. JDBC uses the newly compiled Binder to perform the query on the Geode cluster.
[Figure: the steps above across JDBC, the Calcite framework (Prepare, Planner, Interpreter), the Geode Adapter (Enumerable), the generated Geode Binder and the Geode cluster.]

Calcite Relational Expressions
• Relational expressions (RelNode): TableScan, Project, Filter, Aggregate, Join, Intersect, Sort, Window
• Row-level expressions (RexNode), used for Project and Sort fields and for Filter and Join conditions: Input Column Ref, Literal, Struct field access, Function call, Window expressions
• RelTrait: physical attribute of a relation
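To make the RelNode/RexNode split concrete, here is a toy model: relational operators produce sets of rows from their inputs, while row expressions are evaluated against a single row. The classes are hypothetical and far simpler than Calcite's (no traits, no planner, numeric comparison only); the tree corresponds to `SELECT customerNumber FROM BookOrder WHERE totalPrice > 0`, with `totalPrice` at column 0 and `customerNumber` at column 1 as an assumed layout:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy row expressions (the RexNode role): evaluated against one row.
interface Rex {
    Object eval(Object[] row);
}

class InputRef implements Rex {
    final int index;
    InputRef(int index) { this.index = index; }
    public Object eval(Object[] row) { return row[index]; }
}

class Literal implements Rex {
    final Object value;
    Literal(Object value) { this.value = value; }
    public Object eval(Object[] row) { return value; }
}

// Function call ">" on two numeric operands.
class GreaterThan implements Rex {
    final Rex left;
    final Rex right;
    GreaterThan(Rex left, Rex right) { this.left = left; this.right = right; }
    public Object eval(Object[] row) {
        return ((Number) left.eval(row)).doubleValue() > ((Number) right.eval(row)).doubleValue();
    }
}

// Toy relational expressions (the RelNode role): each one produces rows.
interface Rel {
    List<Object[]> execute();
}

class TableScan implements Rel {
    final List<Object[]> table;
    TableScan(List<Object[]> table) { this.table = table; }
    public List<Object[]> execute() { return table; }
}

class FilterRel implements Rel {
    final Rel input;
    final Rex condition;
    FilterRel(Rel input, Rex condition) { this.input = input; this.condition = condition; }
    public List<Object[]> execute() {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : input.execute()) {
            if (Boolean.TRUE.equals(condition.eval(row))) {
                out.add(row);
            }
        }
        return out;
    }
}

class ProjectRel implements Rel {
    final Rel input;
    final List<Rex> exprs;
    ProjectRel(Rel input, List<Rex> exprs) { this.input = input; this.exprs = exprs; }
    public List<Object[]> execute() {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : input.execute()) {
            Object[] projected = new Object[exprs.size()];
            for (int i = 0; i < exprs.size(); i++) {
                projected[i] = exprs.get(i).eval(row);
            }
            out.add(projected);
        }
        return out;
    }
}

class RelTreeSketch {
    // Project(customerNumber) <- Filter(totalPrice > 0) <- TableScan(BookOrder)
    static Rel plan(List<Object[]> bookOrder) {
        Rel scan = new TableScan(bookOrder);
        Rel filter = new FilterRel(scan, new GreaterThan(new InputRef(0), new Literal(0)));
        return new ProjectRel(filter, Arrays.<Rex>asList(new InputRef(1)));
    }

    public static void main(String[] args) {
        List<Object[]> bookOrder = Arrays.asList(
                new Object[] {10.0, 1}, new Object[] {0.0, 2}, new Object[] {7.5, 3});
        for (Object[] row : plan(bookOrder).execute()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

An adapter's pushdown rules work on trees like this one: for example, replacing the Filter-over-TableScan pair with a single scan node that hands the condition to the store.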
