Spark architecture

Dec 20, 2023

Hardware organization
In a local installation, cores serve as master and slaves.
Communication: shuffle
The same machines are used for both map and reduce, which decreases communication, but only slightly. Communication between slaves is the toughest bottleneck. Design your computation to minimize communication.
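To make the advice about minimizing communication concrete, here is a minimal sketch in plain Python (not Spark code). It counts how many records would cross the network in a word count, with and without combining values locally before the shuffle. The function names and the sample partitions are hypothetical and exist only for illustration.

```python
from collections import Counter

def shuffle_without_combine(partitions):
    # Every (word, 1) pair leaves its partition and is sent to a reducer.
    sent = [(word, 1) for part in partitions for word in part]
    totals = Counter()
    for word, count in sent:
        totals[word] += count
    return dict(totals), len(sent)

def shuffle_with_combine(partitions):
    # Each partition first sums its own words, then sends one pair per distinct word.
    sent = []
    for part in partitions:
        sent.extend(Counter(part).items())
    totals = Counter()
    for word, count in sent:
        totals[word] += count
    return dict(totals), len(sent)

# Hypothetical data: three partitions living on three slaves.
partitions = [["a", "b", "a", "a"], ["b", "b", "c"], ["a", "c", "c", "c"]]

plain_counts, plain_sent = shuffle_without_combine(partitions)
combined_counts, combined_sent = shuffle_with_combine(partitions)

print(plain_counts == combined_counts)  # same answer either way
print(plain_sent, combined_sent)        # records sent: 11 versus 6
```

Both versions produce the same counts, but the second sends fewer records. In Spark, reduceByKey performs this kind of local combining before the shuffle, whereas groupByKey does not.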
Spatial software organization
The driver runs on the master. It executes the "main()" code of your program.
The cluster master manages the computation resources. Mesos and YARN are resource management programs for clusters.
Workers run on the slaves (usually one per core). Each RDD is partitioned among the workers. Workers manage partitions and executors.
Executors execute tasks on their partition and are myopic: each one sees only its own partition.
Spatial organization (more detail)
SparkContext (sc) is the abstraction that encapsulates the cluster for the driver node (and the programmer).
Worker nodes manage resources on a single slave machine and communicate with the cluster manager.
Executors are the processes that can perform tasks.
Cache refers to the local memory on the slave machine.
RDD processing
By default, RDDs are not materialized. They do materialize if cached or otherwise persisted.
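The difference between an unmaterialized and a cached dataset can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Spark's implementation: the class LazyDataset, the function load, and the counter are all hypothetical names.

```python
class LazyDataset:
    """Toy stand-in for an RDD: holds a recipe, not the data."""

    def __init__(self, compute):
        self._compute = compute   # how to produce the data when asked
        self._cache_requested = False
        self._data = None         # filled only if caching was requested

    def map(self, fn):
        # Defines a new dataset; nothing is computed here.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        self._cache_requested = True
        return self

    def collect(self):
        if self._data is not None:
            return list(self._data)
        data = self._compute()
        if self._cache_requested:
            self._data = data     # materialize once and keep it
        return list(data)

calls = {"source": 0}

def load():
    calls["source"] += 1          # count how often the source is read
    return [1, 2, 3]

squares = LazyDataset(load).map(lambda x: x * x)
calls_after_define = calls["source"]          # defining does no work

squares.collect()
squares.collect()
calls_uncached = calls["source"]              # recomputed on every collect

cached = LazyDataset(load).map(lambda x: x * x).cache()
first = cached.collect()
second = cached.collect()
calls_cached = calls["source"] - calls_uncached  # computed only once
```

Without caching, each collect re-reads the source; with caching, the second collect is served from the stored result.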
Temporal organization: RDD graph and physical plan
Recall the spatial organization.
A stage ends when the RDD needs to be materialized.
Terms and concepts of execution
RDDs are partitioned across workers; each worker manages one partition of each RDD.
The RDD graph defines the lineage of the RDDs.
SparkContext divides the RDD graph into stages, which define the execution plan (or physical plan).
A task corresponds to one stage, restricted to one partition.
An executor is a process that can perform tasks.
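The relationship between lineage, stages, and tasks can be sketched in a few lines of plain Python. This is a deliberate simplification that follows the rules stated on the slides (a stage ends where materialization is needed; a task is one stage restricted to one partition). Real Spark planning is more involved, and the operation names and the materializes flag below are hypothetical.

```python
def split_into_stages(lineage):
    """lineage: list of (operation_name, materializes) pairs in execution order.
    A stage ends right after an operation that needs materialized data."""
    stages, current = [], []
    for name, materializes in lineage:
        current.append(name)
        if materializes:
            stages.append(current)
            current = []
    if current:                      # trailing operations form a last stage
        stages.append(current)
    return stages

def tasks_for(stages, num_partitions):
    # One task per (stage, partition) pair.
    return [(stage_id, partition)
            for stage_id in range(len(stages))
            for partition in range(num_partitions)]

# Hypothetical lineage of five operations over an RDD with 4 partitions.
lineage = [
    ("load", False),
    ("map", False),
    ("shuffle", True),
    ("filter", False),
    ("save", True),
]

stages = split_into_stages(lineage)
tasks = tasks_for(stages, num_partitions=4)

print(stages)      # [['load', 'map', 'shuffle'], ['filter', 'save']]
print(len(tasks))  # 8
```

Two stages over four partitions give eight tasks, each of which an executor can run on its own partition.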
Persistence and checkpointing
Levels of persistence
Caching is useful for retaining intermediate results. On the other hand, caching can consume a lot of memory. If memory is exhausted, cached data can be evicted, spilled to disk, etc. If needed again, it is recomputed or read from disk. The generalization of .cache() is called .persist(), which has many options.
Storage levels
.cache() is the same as .persist(MEMORY_ONLY).
Checkpointing
Spark is fault tolerant. If a slave machine crashes, its RDDs will be recomputed. If hours of computation have been completed before the crash, all the computation needs to be redone. Checkpointing reduces this problem by storing the materialized RDD on a remote disk. On recovery, the RDD will be recovered from the disk. It is recommended to cache an RDD before checkpointing it.
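The effect of a checkpoint on recovery can be shown with a toy model in plain Python. It is a sketch under stated assumptions, not Spark's mechanism: the class ToyRDD, its methods, the JSON file format, and the temporary directory standing in for a remote disk are all hypothetical.

```python
import json
import os
import tempfile

class ToyRDD:
    """Toy stand-in for an RDD that can be rebuilt from its lineage."""

    def __init__(self, compute, checkpoint_path=None):
        self.compute = compute                  # the lineage: how to rebuild the data
        self.checkpoint_path = checkpoint_path  # stands in for a remote disk
        self.in_memory = None

    def materialize(self):
        if self.in_memory is None:
            if self.checkpoint_path and os.path.exists(self.checkpoint_path):
                with open(self.checkpoint_path) as f:
                    self.in_memory = json.load(f)   # recover from the checkpoint
            else:
                self.in_memory = self.compute()     # recompute from the lineage
        return self.in_memory

    def checkpoint(self):
        data = self.materialize()                   # reuses the in-memory copy if present
        with open(self.checkpoint_path, "w") as f:
            json.dump(data, f)

    def crash(self):
        self.in_memory = None                       # the machine's memory is lost

runs = {"expensive": 0}

def expensive():
    runs["expensive"] += 1                          # count how often the work is done
    return [x * x for x in range(5)]

with tempfile.TemporaryDirectory() as tmp:
    rdd = ToyRDD(expensive, os.path.join(tmp, "rdd.json"))
    rdd.checkpoint()
    rdd.crash()
    recovered = rdd.materialize()
    runs_with_checkpoint = runs["expensive"]

    plain = ToyRDD(expensive)
    plain.materialize()
    plain.crash()
    plain.materialize()
    runs_without_checkpoint = runs["expensive"] - runs_with_checkpoint
```

With the checkpoint, the expensive computation runs once and recovery reads the file; without it, the crash forces a second run. In this toy, checkpoint() writes the copy already held in memory, which mirrors the reason for caching first: otherwise the RDD has to be computed again just to write the checkpoint.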
