Fundamentals of Big Data (BIG DATA FUNDAMENTALS WITH PYSPARK) - PowerPoint PPT Presentation



SLIDE 1

Fundamentals of Big Data

BIG DATA FUNDAMENTALS WITH PYSPARK

Upendra Devisetty

Science Analyst, CyVerse

SLIDE 2

What is Big Data?

Big data is a term used to refer to the study and applications of data sets that are too complex for traditional data-processing software - Wikipedia

SLIDE 3

The 3 V's of Big Data

Volume, Variety and Velocity
Volume: Size of the data
Variety: Different sources and formats
Velocity: Speed of the data

SLIDE 4

Big Data concepts and Terminology

Clustered computing: Collection of resources of multiple machines
Parallel computing: Simultaneous computation
Distributed computing: Collection of nodes (networked computers) that run in parallel
Batch processing: Breaking the job into small pieces and running them on individual machines
Real-time processing: Immediate processing of data

SLIDE 5

Big Data processing systems

Hadoop/MapReduce: Scalable and fault-tolerant framework written in Java; open source; batch processing
Apache Spark: General-purpose and lightning-fast cluster computing system; open source; both batch and real-time data processing

SLIDE 6

Features of Apache Spark framework

Distributed cluster computing framework
Efficient in-memory computations for large data sets
Lightning-fast data processing framework
Provides support for Java, Scala, Python, R and SQL

SLIDE 7

Apache Spark Components

[Figure: Spark Core together with its libraries: Spark SQL, MLlib (machine learning), GraphX and Spark Streaming]

SLIDE 8

Spark modes of deployment

Local mode: Single machine such as your laptop; convenient for testing, debugging and demonstration
Cluster mode: Set of pre-defined machines; good for production
Workflow: Local -> clusters; no code change necessary

SLIDE 9

Coming up next - PySpark


SLIDE 10

PySpark: Spark with Python


Upendra Devisetty

Science Analyst, CyVerse

SLIDE 11

Overview of PySpark

Apache Spark is written in Scala
To support Python with Spark, the Apache Spark Community released PySpark
Similar computation speed and power as Scala
PySpark APIs are similar to Pandas and Scikit-learn

SLIDE 12

What is Spark shell?

Interactive environment for running Spark jobs
Helpful for fast interactive prototyping
Spark’s shells allow interacting with data on disk or in memory
Three different Spark shells:
Spark-shell for Scala
PySpark-shell for Python
SparkR for R

SLIDE 13

PySpark shell

The PySpark shell is the Python-based command line tool
The PySpark shell allows data scientists to interface with Spark data structures
The PySpark shell supports connecting to a cluster

SLIDE 14

Understanding SparkContext

SparkContext is an entry point into the world of Spark
An entry point is a way of connecting to the Spark cluster
An entry point is like a key to the house
PySpark has a default SparkContext called sc

SLIDE 15

Inspecting SparkContext

Version: To retrieve the SparkContext version:

sc.version
2.3.1

Python Version: To retrieve the Python version of SparkContext:

sc.pythonVer
3.6

Master: URL of the cluster or “local” string to run in local mode of SparkContext:

sc.master
local[*]

SLIDE 16

Loading data in PySpark

SparkContext's parallelize() method

rdd = sc.parallelize([1,2,3,4,5])

SparkContext's textFile() method

rdd2 = sc.textFile("test.txt")

SLIDE 17

Let's practice


SLIDE 18

Use of Lambda function in python - filter()


Upendra Devisetty

Science Analyst, CyVerse

SLIDE 19

What are anonymous functions in Python?

Lambda functions are anonymous functions in Python
Very powerful and widely used in Python
Quite efficient with map() and filter()
Lambda functions create functions to be called later, similar to def
It returns the function without any name (i.e. anonymous)
Used to inline a function definition or to defer execution of code

SLIDE 20

Lambda function syntax

The general form of lambda functions is

lambda arguments: expression

Example of lambda function

double = lambda x: x * 2
print(double(3))
6
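A lambda follows the same argument rules as def: it can take multiple arguments and default values, though its body is limited to a single expression. A small sketch (the names `add` and `power` are made up for illustration):

```python
# Multiple arguments and a default value in lambda functions
add = lambda x, y: x + y
power = lambda x, n=2: x ** n

print(add(2, 3))   # 5
print(power(4))    # 16 (uses the default n=2)
print(power(2, 3)) # 8
```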

SLIDE 21

Difference between def vs lambda functions

Python code to illustrate cube of a number

def cube(x):
    return x ** 3

g = lambda x: x ** 3

print(g(10))
print(cube(10))
1000
1000

No return statement for lambda
Can put lambda function anywhere
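The second point above, that a lambda can go anywhere a function object is expected, is worth a quick sketch; passing an inline lambda as the `key` argument of `sorted()` is a common case (the `pairs` data here is made up for illustration):

```python
# Sort a list of tuples by their second element using an inline lambda;
# no separately named def is needed.
pairs = [(1, "b"), (3, "a"), (2, "c")]
print(sorted(pairs, key=lambda p: p[1]))  # [(3, 'a'), (1, 'b'), (2, 'c')]
```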

SLIDE 22

Use of Lambda function in python - map()

The map() function takes a function and a list and returns a new list containing the items returned by that function for each item
General syntax of map():

map(function, list)

Example of map()

items = [1, 2, 3, 4]
list(map(lambda x: x + 2, items))
[3, 4, 5, 6]

SLIDE 23

Use of Lambda function in python - filter()

The filter() function takes a function and a list and returns a new list containing the items for which the function evaluates to true
General syntax of filter():

filter(function, list)

Example of filter()

items = [1, 2, 3, 4]
list(filter(lambda x: (x % 2 != 0), items))
[1, 3]

SLIDE 24

Let's practice
