
Setting Up Spark, PySpark and Notebook: Setting up your workstation



  1. Setting Up Spark, PySpark and Notebook: Setting up your workstation

  2. Session Outline: we’ll
     ● Set up your system
     ● Run “Hello World”

  3. Your System
     ● Ubuntu 16.04 LTS 64-bit
     What we’ll set up
     ● Python3 (Anaconda)
     ● Spark 2.0
     ● findspark

  4. Hello World: we’ll
     ● Start a local Spark server
     ● Use pyspark to run a program
     ● Understand the Spark Master WebUI

  5. Setting Up

  6. Install Spark
     ● We’ll use Spark 2.0.0, prebuilt for Hadoop 2.7 or later
     ● Spark Download Page: http://spark.apache.org/downloads.html
     ● Download link: http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz

  7. PySpark: how to talk to PySpark from Jupyter Notebooks
     ● PySpark isn't on sys.path by default
       ○ This means the Python kernel in Jupyter Notebook doesn’t know where to look for PySpark
     ● You can address this by either
       ○ symlinking pyspark into your site-packages, or
       ○ adding pyspark to sys.path at runtime (a sketch follows this slide)
         ■ by passing the path directly
         ■ by looking at a running instance
     ● findspark adds pyspark to sys.path at runtime
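
A minimal sketch of the second option (passing the path directly), assuming Spark was extracted to the spark2 directory used later in this deck:

    import glob
    import os
    import sys

    # Assumed path: wherever you extracted Spark ("spark2" later in this deck)
    spark_home = "/home/soumendra/downloads/spark2"

    # pyspark lives under $SPARK_HOME/python; it also needs the bundled py4j,
    # which ships as a zip under $SPARK_HOME/python/lib
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))[0])

    import pyspark  # the Jupyter kernel can now find PySpark

findspark (next slide) does essentially this for you.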

  8. PySpark: how to talk to PySpark from Jupyter Notebooks (Install)
     ● findspark homepage: https://github.com/minrk/findspark
     ● Install: pip install findspark
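
For reference, findspark can also locate Spark via the SPARK_HOME environment variable, so no path needs to be hard-coded in the notebook. A minimal sketch (the path is the spark2 directory used elsewhere in this deck):

    import os
    import findspark

    # With SPARK_HOME set, findspark.init() needs no arguments
    os.environ["SPARK_HOME"] = "/home/soumendra/downloads/spark2"
    findspark.init()  # adds pyspark (and its py4j dependency) to sys.path

    import pyspark  # now importable from the Jupyter kernel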

  9. Hello World

  10. Install Spark
      ● If you’ve used the download link from an earlier slide, go to the folder Spark was downloaded to. Just extract the files and folders from the compressed file and you are done:
        > tar xvzf spark-2.0.0-bin-hadoop2.7.tgz
        > mv spark-2.0.0-bin-hadoop2.7 spark2
      ● Start a local (master) server:
        > cd spark2/sbin
        > ./start-master.sh
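
Once the master is up, a SparkContext can be pointed at it instead of the default local mode. A minimal sketch, with "hostname" as a placeholder (copy the real spark://... URL from the top of the Master WebUI; jobs only execute once at least one worker has registered, e.g. via sbin/start-slave.sh <master-url>):

    import findspark
    findspark.init("/home/soumendra/downloads/spark2")

    import pyspark

    # "hostname" is a placeholder: use the spark://... URL from the Master WebUI
    sc = pyspark.SparkContext(master="spark://hostname:7077", appName="helloworld")
    print(sc.master)  # confirm which master this context is bound to
    sc.stop()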

  11. [screenshot]

  12. The Spark Master WebUI at localhost:8080

  13. Hello World in Spark (counting words)

      import findspark

      # provide the path to your spark directory directly
      findspark.init("/home/soumendra/downloads/spark2")

      import pyspark
      sc = pyspark.SparkContext(appName="helloworld")

      # let's test our setup by counting the number of lines in a text file
      lines = sc.textFile('/home/soumendra/helloworld')
      lines_nonempty = lines.filter(lambda x: len(x) > 0)
      lines_nonempty.count()
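
The snippet above counts non-empty lines rather than words. A small sketch extending it into the word count the slide title promises (assumes the same sc and lines_nonempty from the code above):

    # Split each line into words, then count occurrences per word
    words = lines_nonempty.flatMap(lambda line: line.split())
    counts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    print(counts.take(10))  # first ten (word, count) pairs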

  14. Hello World in Spark (counting words): Activity 1 in Spark_Activities_01_Basics.ipynb
