Design Patterns Leveraging Spark in PDI Chris Skirde Pentaho - - PowerPoint PPT Presentation



SLIDE 1

Design Patterns Leveraging Spark in PDI

Chris Skirde Pentaho Director of Sales Engineering, Hitachi Vantara Rakesh Saha Pentaho Senior Product Manager, Hitachi Vantara

SLIDE 2

Quiz Time!

  • What is Spark?
  • A. A good way to start a fire.
  • B. Necessary for a well-running internal combustion engine.
  • C. Fast and general purpose engine for large-scale data processing.
  • D. All of the above.
  • True or false: Pentaho supports Spark?
  • Who is using Spark today (with or without Pentaho)?
SLIDE 3

Agenda

  • Introduction to Spark
  • Common design patterns
  • How to leverage Spark with Pentaho
SLIDE 4

Introduction to Spark

  • Why are we interested?
  • What is it really?
  • What’s been done?
SLIDE 5

Spark Application Architecture

(Diagram labels: Daemon, PDI/Server)

SLIDE 6

What Do Those Applications Have in Common?

SLIDE 7

Common Design Patterns

  • Filter/Organize
  • Join
  • Sum
  • Transform/Enrich
  • Query
  • Machine Learning/Data Science
SLIDE 8

Filter/Organize
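
The slide carries no detail beyond its title, so here is a minimal sketch of the filter/organize pattern in plain Python. It is not PDI or Spark API; the records, field names, and the `filter_and_organize` helper are all hypothetical and only illustrate the shape of the operation (keep matching rows, then sort them).

```python
# Hypothetical sample records; the field names are assumptions for illustration.
events = [
    {"user": "ann", "country": "US", "amount": 30.0},
    {"user": "bob", "country": "DE", "amount": 12.5},
    {"user": "cy", "country": "US", "amount": 7.0},
]


def filter_and_organize(rows, country):
    """Keep rows for one country, then sort them by amount, largest first."""
    kept = [row for row in rows if row["country"] == country]
    return sorted(kept, key=lambda row: row["amount"], reverse=True)


us_events = filter_and_organize(events, "US")
```

In a distributed engine the same two stages (a row filter, then a sort) would run across partitions rather than on one in-memory list.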

SLIDE 9

Join
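
As an illustration of the join pattern, the sketch below performs an inner hash join in plain Python. It is an assumption about what the slide depicts, not PDI or Spark code; `hash_join` and the sample tables are hypothetical names.

```python
# Hypothetical sample tables; the column names are assumptions for illustration.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"customer_id": 1, "total": 20.0},
    {"customer_id": 1, "total": 5.0},
    {"customer_id": 3, "total": 9.0},
]


def hash_join(left, right, key):
    """Inner join: index the right side by key, then probe it with each left row."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined_orders = hash_join(orders, customers, "customer_id")
```

The order for customer 3 is dropped because no matching customer exists, which is the defining behavior of an inner join.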

SLIDE 10

Sum (and Other Aggregations)
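
A minimal sketch of the aggregation pattern, assuming a group-by-key sum. This is plain Python rather than PDI or Spark API, and `sum_by_key` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are assumptions for illustration.
sales = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 4.0},
    {"region": "east", "amount": 2.5},
]


def sum_by_key(rows, key, value):
    """Group rows by `key` and add up the `value` field within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


totals_by_region = sum_by_key(sales, "region", "amount")
```

Other aggregations (count, min, max) follow the same shape with a different per-group update.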

SLIDE 11

Transform/Enrich

  • Any step you like!
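
To make the transform/enrich pattern concrete, here is a small plain-Python sketch that applies a per-row function adding a looked-up field and a derived field. It is not a PDI step or Spark API; `enrich`, `region_names`, and the field names are hypothetical.

```python
# Hypothetical lookup table and input rows; all names are assumptions.
region_names = {"US": "Americas", "DE": "EMEA"}
raw_rows = [
    {"country": "US", "amount": 30.0},
    {"country": "FR", "amount": 12.5},
]


def enrich(row, lookup):
    """Return a copy of the row with a looked-up region and a derived field."""
    return {
        **row,
        "region": lookup.get(row["country"], "unknown"),
        "amount_cents": round(row["amount"] * 100),
    }


enriched_rows = [enrich(row, region_names) for row in raw_rows]
```

Because each row is transformed independently, this kind of step can be applied to partitions of the data in parallel.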
SLIDE 12

Query – Easy!

  • Cloudera: use Hive-on-Spark with Hive2
  • Hortonworks: use SparkSQL via Simba
SLIDE 13

Machine Learning/Data Science

SLIDE 14

Recap

What we covered today:

  • Reviewed what Spark is and why organizations are adopting it
  • Discussed several common data integration design patterns
  • Linked those design patterns to Pentaho features for you to try
SLIDE 15

Questions?

SLIDE 16

Next Steps

Want to learn more?

  • “Meet the Experts” Matt Casters and Mark Hall!
  • Adaptive Execution Layer http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture
  • SQL on Spark http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements
