Getting Started With Apache Ignite SQL Denis Magda, GridGain - - PowerPoint PPT Presentation

▶

Dec 14, 2023 272 likes •691 views

Getting Started With Apache Ignite SQL Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group Topics Ignite SQL Basics: DML, DDL, connectivity, configuration Affinity Co-Location and Distributed JOINs

SLIDE 1

Getting Started With Apache Ignite SQL

Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group

SLIDE 2

Topics

Ignite SQL Basics: DML, DDL, connectivity, configuration
Affinity Co-Location and Distributed JOINs
Beyond Memory Capacity: Disk Tier Usage and Memory Quotas
Ignite SQL Evolution With Apache Calcite

SLIDE 3

Ignite SQL Basics

SLIDE 4

Ignite SQL = ANSI SQL at Scale

ANSI-99 DML and DDL syntax

– SELECT, UPDATE, CREATE…

Distributed joins, grouping, sorting
Schema changes in runtime

– ALTER TABLE, CREATE/DROP INDEX

Works with in-memory and disk-only records

– If Ignite Persistence is used as a disk tier

SLIDE 5

Connectivity Options

Thick Client APIs

– Java, C#/.NET, C++

JDBC and ODBC drivers
Thin Client APIs

– Multi-language support

SLIDE 6

Configuration Option #1: Programmatically With Annotations

Usage Scenario:

Spring-style development by annotating POJOs
DDL can be used to apply changes in runtime.

SLIDE 7

Configuration Option #2: Spring XML With Query Entities

Usage Scenario:

Ignite as a cache that writes-through

changes to an external database.

DDL can be used to apply changes in

runtime.

SLIDE 8

Configuration Option #3: In Pure SQL With DDL

Usage Scenario:

SQL-driven applications
Green-field applications using Ignite as a

database with its native persistence

SLIDE 9

Demo Time

Cluster Startup and Database Creation

SLIDE 10

Affinity Co-Location and Distributed JOINs

SLIDE 11

Ignite SQL Engine Internals

Data & Indexes Ignite SQL H2 Engine Data & Indexes Ignite SQL H2 Engine Data & Indexes Ignite SQL H2 Engine

SLIDE 12

Query Execution Phases

City City Thick Client Map Map Map Reduce Reduce

SLIDE 13

Default Data Distribution

Canada Toronto Calgary Paris France Marseille Montreal Ottawa Country Table City Table

SLIDE 14

SQL JOIN With Data Shuffling

Thick Client

Canada Toronto Calgary Paris France Marseille Ottawa Montreal Paris Ottawa Montreal

1 & 4 2 2 3

1. Initiating Execution
2. Execution on Servers (map phase)
3. Data Shuffling
4. Reduce Phase

SLIDE 15

Co-Located Distribution (aka. Affinity Co-Location)

Canada

Toronto Calgary

France Marseille Country Table City Table

Montreal Ottawa

Paris

SLIDE 16

All You Need is to Configure Affinity Key

SLIDE 17

Affinity Key to Node Mapping Process

Affinity Key Partition Application Process Network Call Node

City Record

SLIDE 18

High-Performance SQL JOIN

Thick Client

Canada Toronto Calgary France Marseille

1 & 3 2 2

1. Initiating Execution
2. Execution on Servers (map phase)
3. Reduce Phase

Ottawa Paris

SLIDE 19

Demo Time

Queries With JOINs

SLIDE 20

Beyond Memory Capacity: Disk-Tier and Memory Quotas

SLIDE 21

Multi-Tier Storage architecture

1. In-Memory - General in-memory caching, high-performance computing 2. In-Memory + Native Persistence - Ignite as an in-memory database 3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

SLIDE 22

Multi-Tier Storage Architecture

Index Page

(root)

Index page

(inner)

Inner page 2 Leaf page 2 Index Page

(leaf)

Leaf page 3

Data page Index Page

(root)

Leaf page Metadata page Leaf page Data page Metadata page

Memory segment

Key-Value Key-Value Key-Value Key-Value

Data page

Data Index

SLIDE 23

Multi-Tier Storage Architecture

Data page #0 Data page #1

Partition file with Data

Data page #2 Data page #3 Data page #4 Data page #5 Data page #6 Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page

Memory segment

PageId Pages map Pointer in a memory segment

(read/write ops)

Position in a file

(load page/checkpoint)

SLIDE 24

Java off-heap vs Java heap

SQL query processing Parsing Planning Computing

(filters, joins, expressions)

Scanning

(index or table scan)

Heap Heap

More Heap

Off-heap

SLIDE 25

Java off-heap vs Java heap

Sorting Renaming Aggregation Projection Join Filtering Scanning COUNTRY CITY

σcode in (‘CAN’, ‘FRA’) ⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop πcountry.name, city.name, city.population ρname, name0, max_pop

Here we need full set in heap Here we need full set in heap too

SLIDE 26

Query memory quotas

How to configure:

SLIDE 27

Interim results offloading

Sorting Renaming Aggregation Projection Join Filtering Scanning COUNTRY CITY

σcode in (‘CAN’, ‘FRA’) ⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop πcountry.name, city.name, city.population ρname, name0, max_pop

Why don’t you flush result sets to disk? And it

SLIDE 28

Intermediate results offloading

How to configure:

SLIDE 29

When you need quotas/offloading enabled

Sorting (ORDER BY)
Grouping (DISTINCT, GROUP BY)
Complex subqueries

SLIDE 30

Demo Time

Running SQL Over Disk-Only Records

SLIDE 31

Apache Ignite SQL Evolution With Apache Calcite

SLIDE 32

Why do we need it?

Here we need Map-Reduce phase Here we need Map-Reduce phase too

SLIDE 33

Typical execution flow

User Parser Optimizer mode Rule-based

ptimizer

Cost-based

ptimizer

Dictionary Row source generator Execution SQL Query RBO CBO Statistics Result Query plan Validator Validation Schema Query AST

SLIDE 34

Apache Calcite

JDBC Client JDBC Server SQL Parser/Validator Query optimizer 3rd party ops 3rd party ops Metadata SPI Plugable rules 3rd party data 3rd party data Optional Core Pluggable

Need to implement:

Splitter
Runtime
Indexes support
DML support
DDL support

SLIDE 35

Query Parser and Transformer

Select expr=p.id, d.name from=person, dep cond=p.depId = d.id AND (p.id > 10 OR p.id < 10000)

rder=p.name DESC
ffset=10

dep(d) person(p)

σp.id > 10 OR p.id < 10000 ⋈p.depId = d.id τp.name DESC πp.id, p.name

Query AST Relational operators tree (query plan)

SLIDE 36

Cost-Based Optimizer

Estimator Dictionary Plan generator Query transformer Query AST (from parser) Relational tree Relational tree + costs Statistics Equivalent relational tree Query plan (to Row source generator) Rules

SLIDE 37

Cost-Based Splitter

dep(d) person(p)

σp.id > 10 OR p.id < 10000 ⋈p.depId = d.id τp.name DESC πp.id, p.name

Root

SLIDE 38

Reactive Execution Flow

Scan Filter Sender Receiver Client cursor Push Push Send Push Data flow Request Request Acknowledge Request Node buffer Node buffer Node buffer Node buffer Backpressure Network communication

SLIDE 39

Demo Time

Calcite Prototype Demo With Sub-Queries

SLIDE 40

Learn More

Apache Ignite SQL

–

https://apacheignite-sql.readme.io/docs

Memory Quotas (available in GridGain Community

Edition):

–

https://www.gridgain.com/docs/latest/developers-guide/me mory-configuration/memory-quotas

Demos shown in this webinar

–

https://github.com/GridGain-Demos/ignite-sql-intro-samples

New Apache Calcite-based engine

–

https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+ query+execution+engine