Python, PySpark and Riak TS
Stephen Etheridge Lead Solution Architect, EMEA
Agenda
– Introduction to Riak TS
– The Riak Python client
– The Riak Spark connector and PySpark
Basho Technologies | CONFIDENTIAL
BASHO SNAPSHOT
Distributed Systems Software for Big Data, IoT and Hybrid Cloud applications
2011: Creators of Riak Distributed Systems
2015: New Products
databases, caching, in-memory analytics, and search
100+ employees, global offices
Over 1/3 of the Fortune 50
– High Availability: critical data
– High Scale: heavy reads and writes
– Geo Locality: multiple data centers
– Operational Simplicity: resources don’t scale as clusters grow
– Data Accuracy: write conflict options
– IoT/Devices
– Financial/Economic
– Scientific Observations
– User Data
– Session Data
– Profile Data
– Real-time Data
– Log Data
Riak TS currently supports the Protocol Buffers API and five client libraries: Java, Ruby, Python, Erlang, and Node.js.
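As a rough illustration of the Python client, here is a minimal sketch that writes two rows to a Riak TS table and reads them back with a time-range query. It assumes Basho's `riak` package is installed, a Riak TS node is listening on the default Protocol Buffers port 8087 on localhost, and a hypothetical `GeoCheckin` table already exists with columns `region`, `state`, `time`, `weather`, `temperature` and a partition key of `(region, state, QUANTUM(time, ...))`. The table, its columns and the helper names `to_epoch_ms`, `range_query` and `store_and_query` are illustrative assumptions, not part of the talk. Only the two helpers run without a cluster.

```python
import datetime

# Riak TS stores TIMESTAMP columns as UTC milliseconds since the Unix epoch.
# Naive datetimes in this sketch are treated as UTC.
EPOCH = datetime.datetime(1970, 1, 1)


def to_epoch_ms(dt):
    """Convert a naive UTC datetime to integer epoch milliseconds (exact)."""
    return (dt - EPOCH) // datetime.timedelta(milliseconds=1)


def range_query(table, region, state, start, end):
    """Build a Riak TS SELECT for one region/state series between two datetimes.

    In Riak TS 1.x a query must give every partition-key column (here region
    and state) with equality and bound the quantum column (time) on both sides.
    Values are interpolated directly, so only pass trusted strings.
    """
    return (
        "SELECT weather, temperature FROM {t} "
        "WHERE time > {s} AND time < {e} "
        "AND region = '{r}' AND state = '{st}'"
    ).format(t=table, s=to_epoch_ms(start), e=to_epoch_ms(end), r=region, st=state)


def store_and_query(host="127.0.0.1", pb_port=8087):
    """Write two rows to the hypothetical GeoCheckin table and read them back.

    Needs the `riak` package and a running Riak TS node; the import is done
    lazily so the helpers above work without either.
    """
    from riak import RiakClient

    client = RiakClient(host=host, pb_port=pb_port)
    try:
        table = client.table("GeoCheckin")  # hypothetical example table
        t0 = datetime.datetime(2015, 1, 1, 12, 0, 0)
        rows = [
            ["South Atlantic", "South Carolina", t0, "hot", 23.5],
            ["South Atlantic", "South Carolina",
             t0 + datetime.timedelta(minutes=1), "windy", 19.8],
        ]
        table.new(rows).store()  # both rows written with a single store() call

        query = range_query("GeoCheckin", "South Atlantic", "South Carolina",
                            t0 - datetime.timedelta(minutes=1),
                            t0 + datetime.timedelta(minutes=5))
        return table.query(query).rows
    finally:
        client.close()
```

Converting timestamps explicitly keeps the query text unambiguous: Riak TS compares `time` against integer milliseconds, so `to_epoch_ms` uses exact `timedelta` division rather than floating-point seconds.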
Diagram: APIs, Basho Clients, Community Clients
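The Riak Spark connector connects Spark applications to Riak TS with the Spark RDD and Spark DataFrames APIs.
You can write your application in:
– Scala (if you have to),
– Python (yay!),
– and Java (never!).
It partitions the data it reads from Riak TS so multiple Spark workers can process the data in parallel,
and it supports failover if a Riak node goes down while your Spark job is running.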
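A minimal PySpark sketch of the DataFrames route, assuming Spark 1.6 with the Spark-Riak connector jar on the driver and executor classpath (for example via `--jars` or `--packages` when launching `pyspark` or `spark-submit`), a Riak TS node at `127.0.0.1:8087`, and the same hypothetical `GeoCheckin` table keyed on `(region, state, QUANTUM(time, ...))`. The data source name `org.apache.spark.sql.riak` and the `spark.riak.connection.host` and `spark.riakts.bindings.timestamp` keys are given as we recall them from the connector's documentation; check them against the connector version you run. The helper names are hypothetical. The filter mirrors what a direct Riak TS query needs: fixed partition-key columns and a bounded time range.

```python
def ts_filter(region, state, start_ms, end_ms):
    """Spark SQL filter for one region/state series over (start_ms, end_ms).

    Like a direct Riak TS query, it fixes the partition-key columns and bounds
    `time` on both sides; times are epoch milliseconds.
    """
    return ("time > {s} AND time < {e} AND region = '{r}' AND state = '{st}'"
            .format(s=start_ms, e=end_ms, r=region, st=state))


def read_geocheckin(sql_context, start_ms, end_ms):
    """Load one slice of the hypothetical GeoCheckin table as a DataFrame."""
    return (sql_context.read
            .format("org.apache.spark.sql.riak")
            # Assumed per connector docs: expose TIMESTAMP columns as epoch-ms longs.
            .option("spark.riakts.bindings.timestamp", "useLong")
            .load("GeoCheckin")
            .filter(ts_filter("South Atlantic", "South Carolina", start_ms, end_ms)))


def append_to_geocheckin(df):
    """Append a DataFrame whose columns match GeoCheckin to the table."""
    (df.write
       .format("org.apache.spark.sql.riak")
       .mode("append")
       .save("GeoCheckin"))


def main():
    """Driver entry point; needs pyspark and the connector jar on the classpath."""
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = (SparkConf()
            .setAppName("riak-ts-sketch")
            # Protocol Buffers address of a Riak TS node.
            .set("spark.riak.connection.host", "127.0.0.1:8087"))
    sc = SparkContext(conf=conf)
    try:
        # 2015-01-01 11:59 to 12:05 UTC, in epoch milliseconds.
        df = read_geocheckin(SQLContext(sc), 1420113540000, 1420113900000)
        df.show()
    finally:
        sc.stop()
```

Setting the node address once on `SparkConf` keeps the read and write paths free of connection details; the connector then decides how to split the requested range across Spark partitions.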
be pathed in!
Requirements:
– Riak TS 1.2+
– Apache Spark 1.6+
– Scala 2.10
– Java 8