Setting Up Spark, PySpark and Notebook: Setting up your workstation (PowerPoint PPT Presentation)



SLIDE 1

Setting Up Spark, PySpark and Notebook

Setting up your workstation

SLIDE 2

Session Outline

We’ll:

  • Set up your system
  • Run “Hello World”


SLIDE 3

Setting up

Your System

  • Ubuntu 16.04 LTS
  • 64-bit
  • Python 3 (Anaconda)

What we’ll set up

  • Spark 2.0
  • findspark


SLIDE 4

Hello World

We’ll:

  • Start a local Spark server
  • Use pyspark to run a program
  • Understand the Spark Master Web UI


SLIDE 5

Setting Up


SLIDE 6

Install Spark

We’ll use Spark 2.0.0, prebuilt for Hadoop 2.7 or later

Download link

  • http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz

Spark Download Page

  • http://spark.apache.org/downloads.html


SLIDE 7

PySpark

How to talk to PySpark from Jupyter Notebooks

  • PySpark isn’t on sys.path by default
    ○ This means the Python kernel in Jupyter Notebook doesn’t know where to look for PySpark
  • You can address this by either
    ○ symlinking pyspark into your site-packages, or
    ○ adding pyspark to sys.path at runtime
      ■ by passing the path directly
      ■ by looking at a running instance
  • findspark adds pyspark to sys.path at runtime
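As a sketch, the “passing the path directly” option amounts to a few lines of sys.path manipulation (the Spark directory below is an assumption; substitute wherever you extracted Spark):

```python
import glob
import os
import sys

# Assumed Spark location; replace with your own extraction directory.
spark_home = os.environ.get("SPARK_HOME", "/home/soumendra/downloads/spark2")

# PySpark's Python sources live under $SPARK_HOME/python, and the
# py4j bridge ships as a zip under $SPARK_HOME/python/lib.
sys.path.insert(0, os.path.join(spark_home, "python"))
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, zip_path)

# import pyspark  # should now resolve, if spark_home is correct
```

This is essentially what findspark automates for you.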


SLIDE 8

PySpark

How to talk to PySpark from Jupyter Notebooks

findspark homepage

  • https://github.com/minrk/findspark

Install

> pip install findspark


SLIDE 9

Hello World


SLIDE 10

Install Spark

Just extract the files and folders from the compressed file and you are done.

If you’ve used the link in the last slide to download Spark, then

  • go to the folder it has been downloaded in

> tar xvzf spark-2.0.0-bin-hadoop2.7.tgz
> mv spark-2.0.0-bin-hadoop2.7 spark2

  • Start a local (master) server

> cd spark2/sbin
> ./start-master.sh
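If you want to rehearse the extract-and-rename steps without the real download, you can run them against a throwaway archive (the tiny tarball below is fabricated purely for illustration; the real file is spark-2.0.0-bin-hadoop2.7.tgz):

```shell
# Build a dummy archive with the same top-level directory name.
mkdir -p demo/spark-2.0.0-bin-hadoop2.7/bin
echo "placeholder" > demo/spark-2.0.0-bin-hadoop2.7/bin/pyspark
tar czf demo.tgz -C demo spark-2.0.0-bin-hadoop2.7

# Same extract and rename steps as on the slide.
tar xvzf demo.tgz
mv spark-2.0.0-bin-hadoop2.7 spark2
ls spark2/bin
```

The real archive behaves the same way: extraction leaves a spark-2.0.0-bin-hadoop2.7 directory, which the slide renames to spark2 for convenience.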


SLIDE 11


SLIDE 12

Spark Master Web UI: localhost:8080


SLIDE 13

Hello World in Spark (counting words)

import findspark

# provide path to your spark directory directly
findspark.init("/home/soumendra/downloads/spark2")

import pyspark
sc = pyspark.SparkContext(appName="helloworld")

# let's test our setup by counting the number of lines in a text file
lines = sc.textFile('/home/soumendra/helloworld')
lines_nonempty = lines.filter(lambda x: len(x) > 0)
lines_nonempty.count()
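Note that the slide’s title says counting words, but the snippet counts non-empty lines. A plain-Python mirror of the same operations on a small in-memory sample (so it runs without Spark) shows the difference; a word count adds a flatMap-style split before counting:

```python
# Plain-Python mirror of the RDD pipeline above, on an in-memory sample.
lines = ["hello world", "", "hello spark", ""]

# Equivalent of lines.filter(lambda x: len(x) > 0).count()
lines_nonempty = [x for x in lines if len(x) > 0]
print(len(lines_nonempty))  # 2 non-empty lines

# A word count would use flatMap before counting:
#   lines_nonempty.flatMap(lambda x: x.split()).count()
words = [w for line in lines_nonempty for w in line.split()]
print(len(words))  # 4 words
```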


SLIDE 14

Hello World in Spark (counting words)

Spark_Activities_01_Basics.ipynb: Activity 1
