ALLUXIO
2019
Enabling Ultra-fast Presto in the Cloud with Alluxio
Haoyuan (H.Y.) Li | Founder & CTO | Alluxio | haoyuan@alluxio.com | alluxio.io/slack 2019-12-11 @ Presto Summit NYC
Outline
Alluxio Overview: History and its Open Source
Originated as the Tachyon project at UC Berkeley's AMPLab, started by then Ph.D. student and now Alluxio CTO Haoyuan (H.Y.) Li.
2013, 2015: Open source project established; company to commercialize Alluxio founded.
Goal: Orchestrate data for analytics and ML in the cloud, for data-driven applications such as big data analytics, ML, and AI.
Consumer | Travel & Transportation | Telco & Media | Technology | Financial Services | Retail & Entertainment | Data & Analytics Services
[Diagram: compute frameworks (Presto, Spark, Hive, TensorFlow) and storage systems (HDFS, NFS, S3, object store), some reached over WAN, connected through a data orchestration layer that serves any data app]
Now available as Developer Preview in v2.1
Without Alluxio: Presto, Hive Metastore
location=s3://bucket/table
Read/Write Metadata | Read/Write Data
With Alluxio: Presto, Alluxio, Hive Metastore (storage mounted to Alluxio)
location=alluxio:///table
Read/Write Metadata | Read/Write Data
Create a Table on Alluxio
> CREATE TABLE alluxio_table (id varchar) WITH (external_location = 'alluxio:///table');
Read a Table from Alluxio
> SELECT * FROM alluxio_table;
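The only change between the two setups above is the table location registered in the Hive Metastore. As a minimal sketch of that rewrite, the Python below maps an under-storage location to its alluxio:// equivalent and builds the matching CREATE TABLE statement. The mount table, function names, and the assumption that the bucket is mounted at the Alluxio root are all hypothetical, chosen only to reproduce the locations shown on the slide; this is not an Alluxio or Presto API.

```python
from urllib.parse import urlparse

# Hypothetical mount table: under-storage URI prefix -> Alluxio mount point.
# Illustrative assumption: the bucket is mounted at the Alluxio root.
# Prefixes are expected to end with "/".
MOUNTS = {"s3://bucket/": "/"}


def to_alluxio_location(location, mounts=MOUNTS):
    """Rewrite an under-storage table location to an alluxio:// location."""
    if urlparse(location).scheme == "alluxio":
        return location  # already points at Alluxio
    # Try the longest prefix first so nested mounts resolve correctly.
    for prefix in sorted(mounts, key=len, reverse=True):
        if location.startswith(prefix):
            mount_point = mounts[prefix].rstrip("/")
            relative = location[len(prefix):].lstrip("/")
            return f"alluxio://{mount_point}/{relative}"
    raise ValueError(f"no mount covers {location!r}")


def external_table_ddl(table, columns_sql, location, mounts=MOUNTS):
    """Build a Presto CREATE TABLE statement whose data lives behind Alluxio."""
    target = to_alluxio_location(location, mounts)
    return (
        f"CREATE TABLE {table} ({columns_sql}) "
        f"WITH (external_location = '{target}')"
    )


if __name__ == "__main__":
    print(external_table_ddl("alluxio_table", "id varchar", "s3://bucket/table"))
```

With the assumed mount, s3://bucket/table becomes alluxio:///table, the same location used in the CREATE TABLE statement on the slide. A location that no mount covers raises an error instead of being passed through silently.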
▪ S3 performance is variable, so consistent query SLAs are hard to achieve
▪ S3 metadata operations are expensive, making workloads run longer
▪ S3 egress costs add up, making the solution expensive
▪ S3 is eventually consistent, making it hard to predict query results
Accelerate analytical frameworks on the public cloud
Same instance / container
▪ Accessing data over the WAN is too slow
▪ Copying data to the compute cloud is time-consuming and complex
▪ Using another storage system like S3 means expensive application changes
▪ Using S3 via the HDFS connector leads to extremely low performance
Burst big data workloads in hybrid cloud environments
Same instance / container
Solution Benefits
▪ Same performance as local
▪ Same end-user experience
▪ 100% of I/O is offloaded
▪ Object store performance for big data workloads can be very poor
▪ No native support for popular frameworks
▪ Expensive metadata operations reduce performance even more
▪ No direct support for hybrid environments
Dramatically speed up big data
Same container / machine
Solution Benefits
▪ Same performance as HDFS
▪ Uses HDFS APIs
▪ Same end-user experience
▪ Storage at a fraction of the cost of HDFS
[Diagram: Presto, Alluxio, AWS S3, HDFS]
Leading Online Game Company in China
https://www.alluxio.io/blog/presto-on-alluxio-how-netease-games-leveraged-alluxio-to-boost-ad-hoc-sql-on-hdfs/
[Diagram: Presto, Spark, Alluxio, HDFS]
Leading Online Retailer in China
https://www.slideshare.net/Alluxio/alluxio-in-jd
[Diagram: Presto, Spark, Alluxio, HDFS]
Details: www.alluxio.io/powered-by-alluxio/
www.alluxio.io/data-or[…]summit-2019/
Accelerate query performance as cloud storage caching
On-premise satellite compute clusters across data centers
Satellite Presto Cluster
Spark Hive
Main Hadoop Cluster
Zero-copy burst workloads in hybrid cloud environments
Any Cloud / Multi Cloud Same data center / region
Enable big data on object stores across single or multiple clouds
Standalone
Orchestrate data frameworks
Any public / private cloud
Now available as Developer Preview in v2.1
[Diagram: Presto (Hive Connector, Alluxio Connector); Alluxio Caching Service, Alluxio Catalog Service, Alluxio Transformation Service; Hive Metastore; Storage]
https://www.starburstdata.com/technical-blog/starburst-presto-alluxio-better-together/
https://www.alluxio.io/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1
https://www.alluxio.io/blog/tutorial-presto-alluxio-hive-metastore-on-your-laptop-in-10-min/
structured-data/