Getting Started With Apache Ignite SQL Denis Magda, GridGain - - PowerPoint PPT Presentation

getting started with apache ignite sql
SMART_READER_LITE
LIVE PREVIEW

Getting Started With Apache Ignite SQL Denis Magda, GridGain - - PowerPoint PPT Presentation

Getting Started With Apache Ignite SQL Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group Topics Ignite SQL Basics: DML, DDL, connectivity, configuration Affinity Co-Location and Distributed JOINs


slide-1
SLIDE 1

Getting Started With Apache Ignite SQL

Denis Magda, GridGain Developer Relations Igor Seliverstov, GridGain Architecture Group

slide-2
SLIDE 2

Topics

  • Ignite SQL Basics: DML, DDL, connectivity, configuration
  • Affinity Co-Location and Distributed JOINs
  • Beyond Memory Capacity: Disk Tier Usage and Memory Quotas
  • Ignite SQL Evolution With Apache Calcite
slide-3
SLIDE 3

Ignite SQL Basics

slide-4
SLIDE 4

Ignite SQL = ANSI SQL at Scale

  • ANSI-99 DML and DDL syntax

– SELECT, UPDATE, CREATE…

  • Distributed joins, grouping, sorting
  • Schema changes in runtime

– ALTER TABLE, CREATE/DROP INDEX

  • Works with in-memory and disk-only records

– If Ignite Persistence is used as a disk tier

slide-5
SLIDE 5

Connectivity Options

  • Thick Client APIs

– Java, C#/.NET, C++

  • JDBC and ODBC drivers
  • Thin Client APIs

– Multi-language support

slide-6
SLIDE 6

Configuration Option #1: Programmatically With Annotations

Usage Scenario:

  • Spring-style development by annotating POJOs
  • DDL can be used to apply changes in runtime.
slide-7
SLIDE 7

Configuration Option #2: Spring XML With Query Entities

Usage Scenario:

  • Ignite as a cache that writes-through

changes to an external database.

  • DDL can be used to apply changes in

runtime.

slide-8
SLIDE 8

Configuration Option #3: In Pure SQL With DDL

Usage Scenario:

  • SQL-driven applications
  • Green-field applications using Ignite as a

database with its native persistence

slide-9
SLIDE 9

Demo Time

Cluster Startup and Database Creation

slide-10
SLIDE 10

Affinity Co-Location and Distributed JOINs

slide-11
SLIDE 11

Ignite SQL Engine Internals

Data & Indexes Ignite SQL H2 Engine Data & Indexes Ignite SQL H2 Engine Data & Indexes Ignite SQL H2 Engine

slide-12
SLIDE 12

Query Execution Phases

City City Thick Client Map Map Map Reduce Reduce

slide-13
SLIDE 13

Default Data Distribution

Canada Toronto Calgary Paris France Marseille Montreal Ottawa Country Table City Table

slide-14
SLIDE 14

SQL JOIN With Data Shuffling

Thick Client

Canada Toronto Calgary Paris France Marseille Ottawa Montreal Paris Ottawa Montreal

1 & 4 2 2 3

  • 1. Initiating Execution
  • 2. Execution on Servers (map phase)
  • 3. Data Shuffling
  • 4. Reduce Phase
slide-15
SLIDE 15

Co-Located Distribution (aka. Affinity Co-Location)

Canada

Toronto Calgary

France Marseille Country Table City Table

Montreal Ottawa

Paris

slide-16
SLIDE 16

All You Need is to Configure Affinity Key

slide-17
SLIDE 17

Affinity Key to Node Mapping Process

Affinity Key Partition Application Process Network Call Node

City Record

slide-18
SLIDE 18

High-Performance SQL JOIN

Thick Client

Canada Toronto Calgary France Marseille

1 & 3 2 2

  • 1. Initiating Execution
  • 2. Execution on Servers (map phase)
  • 3. Reduce Phase

Ottawa Paris

slide-19
SLIDE 19

Demo Time

Queries With JOINs

slide-20
SLIDE 20

Beyond Memory Capacity: Disk-Tier and Memory Quotas

slide-21
SLIDE 21

Multi-Tier Storage architecture

1.

In-Memory - General in-memory caching, high-performance computing 2. In-Memory + Native Persistence - Ignite as an in-memory database 3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

slide-22
SLIDE 22

Multi-Tier Storage Architecture

Index Page

(root)

Index page

(inner)

Inner page 2 Leaf page 2 Index Page

(leaf)

Leaf page 3

Data page Index Page

(root)

Leaf page Metadata page Leaf page Data page Metadata page

Memory segment

Key-Value Key-Value Key-Value Key-Value

Data page

Data Index

slide-23
SLIDE 23

Multi-Tier Storage Architecture

Data page #0 Data page #1

Partition file with Data

Data page #2 Data page #3 Data page #4 Data page #5 Data page #6 Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page

Memory segment

PageId Pages map Pointer in a memory segment

(read/write ops)

Position in a file

(load page/checkpoint)

slide-24
SLIDE 24

Java off-heap vs Java heap

SQL query processing Parsing Planning Computing

(filters, joins, expressions)

Scanning

(index or table scan)

Heap Heap

More Heap

Off-heap

slide-25
SLIDE 25

Java off-heap vs Java heap

Sorting Renaming Aggregation Projection Join Filtering Scanning COUNTRY CITY

σcode in (‘CAN’, ‘FRA’) ⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop πcountry.name, city.name, city.population ρname, name0, max_pop

Here we need full set in heap Here we need full set in heap too

slide-26
SLIDE 26

Query memory quotas

How to configure:

slide-27
SLIDE 27

Interim results offloading

Sorting Renaming Aggregation Projection Join Filtering Scanning COUNTRY CITY

σcode in (‘CAN’, ‘FRA’) ⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop πcountry.name, city.name, city.population ρname, name0, max_pop

Why don’t you flush result sets to disk? And it

slide-28
SLIDE 28

Intermediate results offloading

How to configure:

slide-29
SLIDE 29

When you need quotas/offloading enabled

  • Sorting (ORDER BY)
  • Grouping (DISTINCT, GROUP BY)
  • Complex subqueries
slide-30
SLIDE 30

Demo Time

Running SQL Over Disk-Only Records

slide-31
SLIDE 31

Apache Ignite SQL Evolution With Apache Calcite

slide-32
SLIDE 32

Why do we need it?

Here we need Map-Reduce phase Here we need Map-Reduce phase too

slide-33
SLIDE 33

Typical execution flow

User Parser Optimizer mode Rule-based

  • ptimizer

Cost-based

  • ptimizer

Dictionary Row source generator Execution SQL Query RBO CBO Statistics Result Query plan Validator Validation Schema Query AST

slide-34
SLIDE 34

Apache Calcite

JDBC Client JDBC Server SQL Parser/Validator Query optimizer 3rd party ops 3rd party ops Metadata SPI Plugable rules 3rd party data 3rd party data Optional Core Pluggable

Need to implement:

  • Splitter
  • Runtime
  • Indexes support
  • DML support
  • DDL support
slide-35
SLIDE 35

Query Parser and Transformer

Select expr=p.id, d.name from=person, dep cond=p.depId = d.id AND (p.id > 10 OR p.id < 10000)

  • rder=p.name DESC
  • ffset=10

dep(d) person(p)

σp.id > 10 OR p.id < 10000 ⋈p.depId = d.id τp.name DESC πp.id, p.name

Query AST Relational operators tree (query plan)

slide-36
SLIDE 36

Cost-Based Optimizer

Estimator Dictionary Plan generator Query transformer Query AST (from parser) Relational tree Relational tree + costs Statistics Equivalent relational tree Query plan (to Row source generator) Rules

slide-37
SLIDE 37

Cost-Based Splitter

dep(d) person(p)

σp.id > 10 OR p.id < 10000 ⋈p.depId = d.id τp.name DESC πp.id, p.name

Root

slide-38
SLIDE 38

Reactive Execution Flow

Scan Filter Sender Receiver Client cursor Push Push Send Push Data flow Request Request Acknowledge Request Node buffer Node buffer Node buffer Node buffer Backpressure Network communication

slide-39
SLIDE 39

Demo Time

Calcite Prototype Demo With Sub-Queries

slide-40
SLIDE 40

Learn More

  • Apache Ignite SQL

https://apacheignite-sql.readme.io/docs

  • Memory Quotas (available in GridGain Community

Edition):

https://www.gridgain.com/docs/latest/developers-guide/me mory-configuration/memory-quotas

  • Demos shown in this webinar

https://github.com/GridGain-Demos/ignite-sql-intro-samples

  • New Apache Calcite-based engine

https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+ query+execution+engine