PRESTO TO PRESTO:
DISTRIBUTED QUERIES IN HYBRID DATALAKE
Sajumon “Saj” Joseph Principal Architect Comcast
Agenda:
- Query distribution challenge
- Presto-to-Presto (P2P) connector
- P2P challenges
- P2P best practices
- P2P future work
[Diagram labels: Teradata SQL; Hadoop; Teradata; Query Grid; ANSI SQL; Oracle; Cassandra; SQL Server; MongoDB; AWS S3; Aurora DB]
[Slide labels: source, authentication, connectors]
[Diagram: Presto single endpoint. On-premise Presto: datalake1 (Hadoop, on-premise), datawarehouse (Teradata, on-premise), datalake2 via P2P. Cloud Presto: datalake2 (AWS S3).]
[Diagram: Presto single endpoint with P2P in both directions. On-premise Presto: datalake1, datawarehouse, datalake2 via P2P. Cloud Presto: datalake2 (AWS S3), datalake1 and datawarehouse via P2P.]
[Diagram: On-premise Presto and Cloud Presto behind the Presto single endpoint, annotated with steps 1 to 5]
Step 1: User authenticates in on-premise cluster, submits query.
Step 2: On-premise cluster connects to Cloud Presto cluster by passing client certificate and user identity.
Step 3: User access to dataset validated by checking the user identity against Ranger.
Step 4: Cloud Presto runs the query.
Step 5: Results returned.
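The five steps above can be pictured with a small in-memory simulation. This is only a sketch under stated assumptions: the trusted certificate name, the policy table standing in for Ranger, and both function names are hypothetical and are not part of Presto or Ranger.

```python
# Hypothetical stand-ins: in a real deployment the trust decision comes from
# the coordinator's trust store and the access decision from Ranger.
TRUSTED_CLIENT_CNS = {"onprem-presto.example.com"}
ACCESS_POLICIES = {("alice", "datalake2.sales")}  # (user, dataset) pairs


def cloud_run_query(client_cn, user, dataset, sql):
    """Cloud side: check the calling cluster, then the user, then run."""
    # Step 2 (receiving end): only a known client certificate is accepted.
    if client_cn not in TRUSTED_CLIENT_CNS:
        raise PermissionError("untrusted client certificate: " + client_cn)
    # Step 3: the forwarded user identity is checked against the policies.
    if (user, dataset) not in ACCESS_POLICIES:
        raise PermissionError(user + " may not read " + dataset)
    # Steps 4 and 5: run the query and return rows (faked here).
    return [("executed", sql)]


def onprem_submit(user, dataset, sql):
    """On-premise side: step 1 is assumed done, so forward the identity."""
    return cloud_run_query("onprem-presto.example.com", user, dataset, sql)
```

The point of the sketch is that the cloud cluster makes two separate checks: one on the calling cluster's certificate and one on the end user's identity.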
connector.name=presto
connection-url=jdbc:presto://<host>:<port>/<catalog>
presto.SSL=true
presto.SSLTrustStorePath=<path of public certificate of remote cluster>
presto.SSLKeyStorePath=<path of client java keystore>
presto.SSLKeyStorePassword=<client java keystore password>
presto.clientTags=<client tags>
unsupported-type-handling=CONVERT_TO_VARCHAR
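When several remote clusters are registered, the catalog file above can be generated rather than written by hand. The following is a minimal sketch, assuming the property names shown on the slide; the helper names and every value passed in (host, paths, tags) are hypothetical placeholders.

```python
REQUIRED_KEYS = ("connector.name", "connection-url")


def p2p_catalog(host, port, catalog, truststore, keystore,
                keystore_password, client_tags):
    """Build the P2P catalog properties as an ordered dict."""
    return {
        "connector.name": "presto",
        "connection-url": f"jdbc:presto://{host}:{port}/{catalog}",
        "presto.SSL": "true",
        "presto.SSLTrustStorePath": truststore,
        "presto.SSLKeyStorePath": keystore,
        "presto.SSLKeyStorePassword": keystore_password,
        "presto.clientTags": client_tags,
        "unsupported-type-handling": "CONVERT_TO_VARCHAR",
    }


def render_catalog_properties(props):
    """Render key=value lines, refusing to emit an incomplete catalog."""
    missing = [key for key in REQUIRED_KEYS if key not in props]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    print(render_catalog_properties(p2p_catalog(
        "cloud-presto.example.com", 8443, "hive",
        "/etc/presto/remote-cluster.pem", "/etc/presto/client.jks",
        "changeit", "p2p")))
```

In practice the keystore password would come from a secret store rather than a literal argument.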
On Coordinator node:
etc/config.properties
http-server.authentication.type=JWT,PASSWORD,CERTIFICATE,KERBEROS
http-server.https.truststore.path=<path of trust store containing client cluster certificate>
On Coordinator node:
etc/rules.json
"principals": [
  {
    "principal": "CN=<client CN>.*",
    "user": "(.*)",
    "allow": true
  }
]
More details on Principal rules: https://prestosql.io/docs/current/security/built-in-system-access-control.html
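To see what the principal rule does, it helps to evaluate it by hand. The sketch below is a simplified model and not Presto's implementation: it assumes rules are checked in order, that the first rule whose principal and user patterns both fully match decides the outcome, and that no match means denial. The function name and the example certificate names are hypothetical, and the optional principal_to_user field is left out.

```python
import re


def is_principal_allowed(rules, principal, user):
    """Return the 'allow' value of the first rule matching both inputs."""
    for rule in rules:
        # The principal pattern must match the whole certificate principal.
        if not re.fullmatch(rule["principal"], principal):
            continue
        # The user pattern must match the whole requested user name.
        if re.fullmatch(rule["user"], user):
            return rule["allow"]
    # Assumption: a principal that matches no rule is denied.
    return False


# The rule from the slide with a hypothetical client CN filled in: the
# on-premise cluster's certificate may act on behalf of any user.
RULES = [
    {"principal": "CN=onprem-presto.*", "user": "(.*)", "allow": True},
]

if __name__ == "__main__":
    print(is_principal_allowed(RULES, "CN=onprem-presto.example.com", "alice"))
    print(is_principal_allowed(RULES, "CN=other-host.example.com", "alice"))
```

Because the user pattern is `(.*)`, the trusted cluster can impersonate every user, which is why the per-user access check on the cloud side still matters.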
SOLUTION:
- Certificate-based authentication
- JDBC-based data types
- LIMIT and PROJECTION pushdown
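LIMIT and PROJECTION pushdown means the remote cluster receives only the columns and row count the query needs, instead of a full table scan whose result is trimmed locally. The sketch below illustrates the idea with a hypothetical query builder; it is not the connector's code, and the table and column names are made up.

```python
def quote_identifier(name):
    """Double-quote a column name, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_remote_query(table, columns=None, limit=None):
    """Build the SQL sent to the remote cluster.

    table   -- already-qualified table name, passed through unchanged
    columns -- columns the local plan needs (projection pushdown)
    limit   -- row cap from the local plan (LIMIT pushdown)
    """
    if columns:
        select_list = ", ".join(quote_identifier(c) for c in columns)
    else:
        select_list = "*"  # no projection information: fetch every column
    sql = f"SELECT {select_list} FROM {table}"
    if limit is not None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        sql += f" LIMIT {int(limit)}"
    return sql


if __name__ == "__main__":
    # Without pushdown the remote side would scan and ship the whole table.
    print(build_remote_query("sales.orders"))
    # With pushdown only two columns and ten rows cross the network.
    print(build_remote_query("sales.orders", ["order_id", "total"], 10))
```

For a link between an on-premise and a cloud cluster, the saving is mostly in data transferred over the network.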
CHALLENGES:
P2P – DISTRIBUTED QUERY IN HYBRID DATA LAKE
P2P enables processing on our “hybrid” data lake:
Contact Details:
Sajumon Joseph
Sajumon_Joseph@cable.comcast.com