SLIDE 1

Big Data Visualization with Tableau

Avirup Chakraborty (MDS201908), Debangshu Bhattacharya (MDS201910), Ipsita Ghosh (MDS201913), Swaraj Bose (MDS201936), Sreya K.K. (MDS201804)

SLIDE 2

What is big data?

Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.

  • Velocity
  • Variety
  • Volume
  • Veracity

SLIDE 3

Why is data visualization important?

  • Communicates relationships in the data through images
  • Allows trends and patterns to be seen more easily
  • Gives meaning to complicated datasets so that their message is clear and concise
  • Makes outlier detection easier
  • Makes results from complex algorithms much easier to understand in a visual format
  • Provides a summary of the data
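Several of the points above (outlier detection, a summary of the data) are exactly what a box plot encodes. Here is a minimal sketch in plain Python, using made-up daily order counts, of the numbers behind one box plot. Flagging outliers with Tukey's 1.5 × IQR rule is our assumption: it is a common convention, not the only one.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot draws, plus points beyond the 1.5*IQR fences."""
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points Q1, median and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    inliers = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": inliers[0],    # end of the lower whisker
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inliers[-1],   # end of the upper whisker
        "outliers": outliers,
    }

# Hypothetical daily order counts with one unusual spike.
orders = [52, 48, 50, 55, 47, 51, 49, 53, 250, 50]
summary = box_plot_summary(orders)
print(summary["outliers"])  # [250]
```

In a chart, the spike would appear as an isolated dot far above the box. In a raw table of thousands of rows, it is easy to miss.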

SLIDE 4

Challenges in big data visualization (4 V’s yet again!)

  • Traditional visualization tools such as MS Excel and Minitab are not capable of handling large datasets.
  • Providing low latency in visualization is difficult.
  • Parallelization is required.
  • The dimensions of the data have to be chosen carefully.
  • Most current visualization tools have low performance with respect to scalability, functionality and response time.

SLIDE 5

Steps for big data visualization

Data acquisition → Parsing and filtering → Mining hidden patterns → Data visualization → Refinement
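The five steps above can be sketched as a tiny pipeline in standard-library Python. The CSV export, the function names and the "play counts per country" pattern are hypothetical stand-ins. A real big data pipeline would run each stage on a distributed engine, not in a single process.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export standing in for the data acquisition step.
RAW = """timestamp,country,event
2020-03-01T10:00,IN,play
2020-03-01T10:05,US,play
2020-03-01T10:07,IN,pause
2020-03-01T10:09,IN,play
bad row without enough fields
2020-03-01T10:12,UK,play
2020-03-01T10:15,US,play
"""

def acquire():
    return io.StringIO(RAW)

def parse_and_filter(stream):
    # Parse CSV rows; keep only well-formed "play" events (short rows get None fields).
    for row in csv.DictReader(stream):
        if row.get("event") == "play" and row.get("country"):
            yield row

def mine_patterns(rows):
    # The simplest possible "pattern": play counts per country.
    return Counter(row["country"] for row in rows)

def visualize(counts):
    # A text bar chart standing in for a real chart.
    return "\n".join(f"{k:<3}{'#' * v} {v}" for k, v in counts.most_common())

def refine(counts, min_count):
    # Refinement: drop categories too small to matter, then plot again.
    return Counter({k: v for k, v in counts.items() if v >= min_count})

counts = mine_patterns(parse_and_filter(acquire()))
print(visualize(counts))
print(visualize(refine(counts, min_count=2)))
```

Each stage consumes only the previous stage's output. This is what lets the refinement step loop back and re-plot without acquiring the data again.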

SLIDE 6
SLIDE 7

What is Tableau?

  • A powerful and fast-growing data visualization tool used in the Business Intelligence industry.
  • Connects easily to nearly any data source.
  • Allows instantaneous insight into data by transforming it into interactive visualizations called dashboards.

SLIDE 8

Why is Tableau helpful?

  • Handles large volumes of data
  • No scripts or code required; it provides a graphical user interface
  • Filters multiple datasets simultaneously
  • Creates interactive and shareable dashboards depicting trends and variations
  • Incorporates other programming languages, such as R and Python, to do complex calculations
  • And many more…

SLIDE 9

Trivia

  • Founded: January 2003, California
  • Founders: Christian Chabot, Chris Stolte (Stanford University), Pat Hanrahan
  • Headquarters: Seattle, Washington
  • Website: https://www.tableau.com/
  • Built using C++
  • Latest version: 2020.1

SLIDE 10
SLIDE 11

Tableau Desktop:

A data visualization tool designed to create visualizations, reports and dashboards quickly and intelligently.

  • Users can connect to multiple data sources, carry out multi-dimensional data analysis, create dashboards or reports, modify metadata and publish a complete workbook to Tableau Server if needed.
  • Content can be adapted to any screen size and device (e.g. desktop, laptop, tablet or even a smartphone!).

SLIDE 12

Tableau Desktop

  • Personal Edition
  • Professional Edition

SLIDE 13

Personal Edition:
  • Connects to a limited set of data sources: Microsoft Access, Microsoft Excel, Microsoft Azure, Tableau Data Extract and text files (CSVs).
  • Cannot connect to Tableau Server, but allows users to create package files for Tableau Reader.
  • Costs $999 per user.

Professional Edition:
  • Connects to a wider variety of data sources: Amazon Redshift, Google Analytics, Google BigQuery, Hortonworks Hadoop, OLAP databases, Salesforce.
  • Enables connection to Tableau Server and creation of package files for Tableau Reader.
  • Costs $1999 per user.

SLIDE 14

Tableau Server:

Tableau Server is essentially an online hosting platform that holds all your Tableau workbooks, data sources and more. It works like any other server: you can store things there and they will be safe from fires and pesky hackers.

So, what are the advantages of Tableau Server?

SLIDE 15
  • 1. Firstly… COLLABORATION!

  • Being a Tableau product, Tableau Server lets us use the functionality of Tableau without always having to download and open workbooks.
  • Users do not need to install Tableau Desktop on their machines, and they can still interact with dashboards shared with them.

SLIDE 16
  • 2. CLOUD SUPPORT

Tableau Server can be deployed on-premises as well as in public clouds such as Azure, AWS, IBM Cloud and Google Cloud Platform. It also lets an administrator easily track and manage the content, licenses, performance and permissions for data sources.

  • 3. COMPATIBILITY

Tableau Server content can be accessed through Android and iPhone apps and through web browsers such as Internet Explorer, Mozilla Firefox, Google Chrome and Safari.

SLIDE 17
  • 4. LIMITED ACCESS DESIGN

On Tableau Server, we can set permissions on different pieces of work. This lets us, as an organization, determine who can access and interact with what. Let us illustrate this with a really simple example >>

SLIDE 18

Consider this ‘imaginary’ company consisting of Tony Stark, Dr. Bruce Banner and Nick Fury.

❖ Tony Stark has access on the server to upload and edit work in a project containing test documents.
❖ Dr. Banner can interact only with the production-quality documents.
❖ And… Nick Fury can access, but not edit, the final presentation documents.
❖ Of course… Loki cannot even have a look at the documents! (at least we can hope so)
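The access rules in this example can be modelled as a deny-by-default table of capabilities per user and project. This is a simplified, hypothetical model written in Python for illustration. It is not Tableau Server's permissions API, and the capability and project names are made up.

```python
from enum import Flag, auto

class Capability(Flag):
    # Hypothetical capability set; combine with | to grant several at once.
    NONE = 0
    VIEW = auto()
    INTERACT = auto()
    EDIT = auto()
    UPLOAD = auto()

# Hypothetical project -> user -> capabilities table for the example above.
PERMISSIONS = {
    "test": {
        "Tony Stark": Capability.VIEW | Capability.INTERACT | Capability.EDIT | Capability.UPLOAD,
    },
    "production": {"Dr. Banner": Capability.VIEW | Capability.INTERACT},
    "final-presentation": {"Nick Fury": Capability.VIEW},
}

def allowed(user, project, capability):
    # Deny by default: a user missing from a project's table gets no capabilities.
    granted = PERMISSIONS.get(project, {}).get(user, Capability.NONE)
    return capability in granted

print(allowed("Nick Fury", "final-presentation", Capability.EDIT))  # False
print(allowed("Loki", "test", Capability.VIEW))                     # False
```

Denying by default means Loki needs no entry at all. Anyone not explicitly granted a capability simply does not have it.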

SLIDE 19
SLIDE 20

Tableau Public:

Tableau Public is a FREE tool that anyone can use to connect to data, create interactive data visualizations and publish them on the web.

  • Once these visualizations are on Tableau Public, one can share them on social media or even embed them in web pages.
  • Since everyone has access to published data, users should be careful not to put proprietary data on Tableau Public.

SLIDE 21

Limitations of Tableau Public:

  • Row limitation: limited to 15,000,000 rows of data per workbook.
  • Limited storage: limited to ten gigabytes (10 GB) of storage space for your workbooks.
  • No workbook privacy: Tableau Public does not allow saving workbooks locally. Workbooks must be saved publicly to the cloud, which means that everyone can see the data.
  • No security: as visualizations are public, anyone can download the workbook, access the data and make changes to their own copy.

SLIDE 22

Tableau Online:

Tableau Online is a hosted version of Tableau Server. It is a business analytics platform where people can share dashboards, interact with reports and gain insights. Because it is hosted in the cloud, no hardware or set-up time is needed.

“Want the sharing and collaboration of Server, but without having to actually manage a server? Then you want Tableau Online. Secure. Scalable. And Look Ma—No hardware to maintain!” (https://www.tableau.com/products)

Roughly, Tableau Online can also be thought of as a private (and, obviously, paid) version of Tableau Public.

SLIDE 23

Key Features:

  • Fully hosted in the cloud. Servers are managed by the Tableau team.
  • Supports live data connections to Amazon Redshift and Google BigQuery, as well as to SQL-based sources hosted on cloud platforms.
  • Ideal for a small number of users who need to interact with the data and visualizations in a secure way.

SLIDE 24

Key Features (Contd.):

  • Easily accessible from a browser or the Tableau Mobile app.
  • Authenticates users through TableauID (email address and password). No guest access is allowed.
  • The subscription rate is $500 per user per year (half the price of individual Tableau Server licenses).

SLIDE 25

Tableau Reader

Tableau Reader is a FREE desktop application that allows interaction with data visualizations created with Tableau Desktop.

  • Users can filter, drill down and view the details of the data as far as the author allows.

SLIDE 26

Tableau Start Page

SLIDE 27
SLIDE 28
SLIDE 29

  • Data grid: displays the first 1,000 rows of the data contained in the Tableau data source.
  • Left pane: displays the connected data source and other details about your data.
  • Canvas: displays information about how the data source is set up and options for combining the data.
  • Metadata grid: displays the fields in your data source as rows.

SLIDE 30

Tableau Worksheet

SLIDE 31
SLIDE 32

The Dashboard Workspace

SLIDE 33
SLIDE 34

DEMO

SLIDE 35

Philosophy of Tableau working with Big Data

  • Democratisation of data: knowledge workers of all skill levels should be able to access and analyze data wherever it resides.
  • Partnerships within the big data ecosystem

SLIDE 36

Overview of how Tableau works with big data

SLIDE 37

Data access and connectivity

To enable analysis of data of any size and format, Tableau supports broad access to data wherever it lives.

  • SQL and NoSQL based connections: Tableau uses SQL to interface with Hadoop, NoSQL databases and Spark.
  • Open Database Connectivity (ODBC): by using ODBC, one can access any data source that supports the SQL standard and implements the ODBC API. For Hadoop, this includes interfaces such as Hive Query Language (HiveQL), Impala SQL, BigSQL and Spark SQL.
  • Web Data Connector: with the Tableau Web Data Connector SDK, users can build connections to data that lives outside the existing connectors. This covers any data accessible over HTTP, including internal web services, JSON data and REST APIs.
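The Web Data Connector SDK itself is JavaScript, so the sketch below does not use it. Instead, it illustrates in Python, with a made-up JSON payload, the core job any such connector performs: turning nested JSON from a web service into a flat table with column types that a visualization tool can consume. The dotted column naming and the type rules are assumptions.

```python
import json

# Hypothetical JSON payload, as a REST endpoint might return it.
PAYLOAD = """
{"results": [
  {"id": 1, "user": {"name": "Asha", "country": "IN"}, "tags": ["a", "b"], "score": 4.5},
  {"id": 2, "user": {"name": "Ben", "country": "US"}, "tags": [], "score": null}
]}
"""

def flatten(record, prefix=""):
    """Flatten nested objects into dotted column names; join lists into one string."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            flat[column] = ",".join(str(v) for v in value)
        else:
            flat[column] = value
    return flat

def infer_schema(rows):
    """Map each column to a simple type name; nulls are skipped, first type seen wins."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue
            kind = {bool: "bool", int: "int", float: "float"}.get(type(value), "string")
            schema.setdefault(column, kind)
    return schema

rows = [flatten(r) for r in json.loads(PAYLOAD)["results"]]
schema = infer_schema(rows)
print(schema)
```

A fixed schema matters because the visualization layer needs to know, before drawing anything, which columns are numeric measures and which are text dimensions.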

SLIDE 38

Fast Interaction with all data at scale

  • 1. Hyper data engine
  • Hyper is a high-performance in-memory data engine that helps customers analyze large or complex data sets faster.
  • It uses dynamic code generation and cutting-edge parallelism techniques to achieve high query speed.
  • Hyper can also augment and accelerate slower data sources by creating an extract of the data and bringing it in-memory.

SLIDE 39

Fast Interaction with all data at scale

  • 2. Hybrid data architecture
  • Tableau can connect live to data sources or bring the data (or a subset of it) in-memory.
  • Users can switch back and forth between these modes to suit their needs.
  • This hybrid approach brings a lot of flexibility and helps with query optimization.
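The trade-off between live connections and in-memory extracts can be shown with a toy data source. In this hypothetical Python sketch, an in-memory SQLite database stands in for the live source and a plain list stands in for the extract. Hyper and Tableau's real extract format are not involved.

```python
import sqlite3

class HybridSource:
    """Toy data source that answers a query either live or from an in-memory extract."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table   # trusted, hard-coded identifier only (interpolated into SQL)
        self.extract = None  # list of (region, sales) rows once an extract is taken
        self.mode = "live"

    def take_extract(self, where="1=1"):
        # Copy the (optionally filtered) rows into memory, like refreshing an extract.
        # `where` is also interpolated, so it must be trusted input in this toy.
        cur = self.conn.execute(f"SELECT region, sales FROM {self.table} WHERE {where}")
        self.extract = cur.fetchall()
        self.mode = "extract"

    def use_live(self):
        self.mode = "live"

    def total_by_region(self):
        if self.mode == "live":
            # Push the aggregation down to the source database.
            cur = self.conn.execute(
                f"SELECT region, SUM(sales) FROM {self.table} GROUP BY region ORDER BY region"
            )
            return dict(cur.fetchall())
        # Aggregate the in-memory snapshot without touching the source.
        totals = {}
        for region, sales in self.extract:
            totals[region] = totals.get(region, 0) + sales
        return dict(sorted(totals.items()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("East", 10), ("West", 5), ("East", 7)])

src = HybridSource(conn, "orders")
print(src.total_by_region())  # {'East': 17, 'West': 5}
src.take_extract()
conn.execute("INSERT INTO orders VALUES ('West', 100)")  # new data lands at the source
print(src.total_by_region())  # still {'East': 17, 'West': 5}: the extract is a snapshot
src.use_live()
print(src.total_by_region())  # {'East': 17, 'West': 105}
```

The trade-off is visible in the middle call. The extract answers without touching the source, but it is only as fresh as its last refresh.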
SLIDE 40

Fast Interaction with all data at scale

  • 3. VizQL™
  • A traditional analysis tool analyzes data in rows and columns, chooses a subset of the data to present, organizes that data into a table, and then creates a chart from that table.
  • VizQL translates drag-and-drop actions into data queries and creates a visual representation of the data right away, giving visual feedback as the user analyzes.
  • VizQL provides an intuitive user experience that lets people answer questions as fast as they can think of them.
  • In this cycle of visual analysis, users learn as they go, add more data if needed, and ultimately gain deeper insights.
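VizQL's actual language is proprietary and far richer, but its central idea, that a drag-and-drop layout is compiled into a database query, can be illustrated. The `ShelfSpec` structure and `compile_to_sql` function below are hypothetical. Dimensions become `GROUP BY` columns, measures become aggregate expressions, and the query runs here against an in-memory SQLite table.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class ShelfSpec:
    """Hypothetical, heavily simplified view spec: which field was dragged where."""
    table: str
    dimensions: list                             # e.g. fields dropped on Columns
    measures: dict                               # field -> aggregate, e.g. {"sales": "SUM"}
    filters: dict = field(default_factory=dict)  # field -> required value

ALLOWED_AGGS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}

def compile_to_sql(spec):
    # Identifiers are assumed trusted; filter values are passed as ? parameters.
    select = list(spec.dimensions)
    for column, agg in spec.measures.items():
        if agg not in ALLOWED_AGGS:
            raise ValueError(f"unsupported aggregate {agg}")
        select.append(f"{agg}({column}) AS {agg.lower()}_{column}")
    sql = f"SELECT {', '.join(select)} FROM {spec.table}"
    params = []
    if spec.filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in spec.filters)
        params = list(spec.filters.values())
    if spec.dimensions:
        dims = ", ".join(spec.dimensions)
        sql += f" GROUP BY {dims} ORDER BY {dims}"
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, category TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("East", "Tech", 100.0), ("East", "Office", 40.0), ("West", "Tech", 60.0)],
)
sql, params = compile_to_sql(ShelfSpec("orders", ["region"], {"sales": "SUM"}))
print(sql)
print(conn.execute(sql, params).fetchall())  # [('East', 140.0), ('West', 60.0)]
```

Dragging another field onto a shelf just changes the spec, and the query is recompiled and re-run. This is the "visual feedback as the user analyzes" loop described above.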

SLIDE 41

Tableau and Big data analytics ecosystem

Tableau fits nicely into the big data paradigm because it prioritizes flexibility: the ability to move data across platforms, adjust infrastructure on demand, take advantage of new data types, and enable new users and use cases.

Cloud infrastructure

  • Organizations are increasingly moving business processes and infrastructure to the cloud.
  • Cloud-based infrastructure and data services have removed some of the major hurdles faced with on-premises Hadoop data lakes.
  • Cloud-based big data analytics solutions are easier to implement and manage than ever before.
  • Tableau delivers key integrations with cloud-based technologies that organizations already use, including Amazon Web Services, Google Cloud Platform and Microsoft Azure.

SLIDE 42

Ingest and prep

  • In modern ingest-and-load design patterns, the destination for raw data of any size or shape is often a data lake.
  • Stream data is generated continuously by connected devices and apps located everywhere, such as social networks, smart meters, home automation, video games and IoT sensors.
  • Often, this data is collected via pipelines of semi-structured data.
  • Real-time analytics and predictive algorithms can be applied to streams. Typically, however, stream data is routed using a lambda architecture and stored in raw formats in a data lake, such as Hadoop, for analytics use.

SLIDE 43

Ingest and prep

  • Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.
  • The design balances latency, throughput and fault-tolerance challenges.
  • A variety of options exist today for streaming data, including Amazon Kinesis, Storm, Flume, Kafka and Informatica Vibe Data Stream.
  • Once data has landed in a data lake, it needs to be ingested and prepared for analysis.
  • Tableau has partners like Informatica, Alteryx, Trifacta and Datameer that help with this process and work fluidly with Tableau.
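A lambda architecture can be reduced to three parts:

  • a batch layer that periodically recomputes a view over the full master dataset;
  • a speed layer that keeps incremental results for events the batch layer has not seen yet;
  • a serving layer that merges the two at query time.

The class below is a single-process toy built on those assumptions (it only counts events). It is not a production design.

```python
from collections import Counter

class LambdaPipeline:
    """Toy lambda architecture: batch view + real-time view, merged at query time."""

    def __init__(self):
        self.master = []              # append-only master dataset (the "data lake")
        self.batch_view = Counter()   # output of the last batch run
        self.realtime_view = Counter()

    def ingest(self, event):
        # Every event goes to the master dataset AND to the speed layer.
        self.master.append(event)
        self.realtime_view[event["key"]] += 1

    def run_batch(self):
        # Batch layer: recompute from scratch over all data. Slow but exact, and
        # fault tolerant because it can always be rebuilt from the master dataset.
        self.batch_view = Counter(e["key"] for e in self.master)
        # Single-threaded toy: the batch now covers every event, so reset the speed layer.
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge the batch view with the recent increments.
        return self.batch_view[key] + self.realtime_view[key]

p = LambdaPipeline()
for k in ["IN", "US", "IN"]:
    p.ingest({"key": k})
print(p.query("IN"))  # 2 (all from the speed layer)
p.run_batch()
p.ingest({"key": "IN"})
print(p.query("IN"))  # 3 (2 from the batch view + 1 recent event)
```

In a real deployment, the batch run and the stream overlap in time. The speed layer then has to discard only the events the batch actually covered, not everything.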

SLIDE 44

Storage

  • 1. Hadoop Data Lake
  • Hadoop has been used for data lakes due to its resilience and its low-cost, scale-out data storage, parallel processing and clustered workload management.
  • It provides massive storage for any kind of data, massive processing power and the ability to handle extreme volumes of concurrent tasks or jobs.
  • Tableau provides direct connectivity to all the major Hadoop distributions: Cloudera via Impala, Hortonworks via Hive and MapR via Apache Drill.
  • 2. Databases and Data warehouses
  • Even companies that adopt other technologies typically retain relational databases as part of their data source mix. Snowflake is one example of a cloud-native, SQL-based enterprise data warehouse with a native Tableau connector.

SLIDE 45

Storage

  • 3. Cloud
  • Object stores, such as Amazon Simple Storage Service (S3).
  • Tableau supports Amazon’s Athena data service to connect to Amazon S3.
  • 4. NoSQL Databases
  • NoSQL databases with flexible schemas can also be used as data lakes.
  • Tableau has various tools that enable connectivity to NoSQL databases directly.
  • Examples of NoSQL databases that are often used with Tableau include, but are not limited to, MongoDB, DataStax and MarkLogic.

SLIDE 46

Processing

  • The data science and engineering platform Databricks offers data processing on Spark.
  • Spark is a popular engine for both batch-oriented and interactive, scale-out data processing.
  • Through a native connector to Spark, one can visualize the results of complex machine learning models from Databricks in Tableau.

SLIDE 47

Query acceleration

  • Faster databases leveraging in-memory and massively parallel processing (MPP) technology, like Exasol and MemSQL
  • Hadoop-based stores like Kudu
  • Technologies that enable faster queries through preprocessing, like Vertica
  • Query accelerators:

❖ SQL-on-Hadoop engines like Apache Impala, Hive LLAP, etc.
❖ Online Analytical Processing (OLAP)-on-Hadoop technologies like AtScale, etc.
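The preprocessing and OLAP-style accelerators above rely on computing aggregates ahead of time, so that queries become lookups. The Python sketch below shows that idea. It assumes a made-up fact table and SUM as the only aggregate, and it pre-aggregates every combination of dimensions (a tiny data cube). It does not describe how Vertica or AtScale implement this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, year, category, sales).
FACTS = [
    ("East", 2019, "Tech", 100),
    ("East", 2020, "Tech", 120),
    ("West", 2019, "Office", 40),
    ("West", 2020, "Tech", 60),
    ("East", 2020, "Office", 30),
]
DIMENSIONS = ("region", "year", "category")

def build_cube(facts):
    """Pre-aggregate SUM(sales) for every subset of dimensions (all 2**3 group-bys)."""
    cube = defaultdict(int)
    for *dims, sales in facts:
        row = dict(zip(DIMENSIONS, dims))
        for r in range(len(DIMENSIONS) + 1):
            for subset in combinations(DIMENSIONS, r):
                cube[(subset, tuple(row[d] for d in subset))] += sales
    return dict(cube)

def lookup(cube, **filters):
    """Answer a SUM query with one dictionary lookup instead of scanning the facts."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    subset = tuple(d for d in DIMENSIONS if d in filters)  # same order as build_cube
    return cube.get((subset, tuple(filters[d] for d in subset)), 0)

cube = build_cube(FACTS)
print(lookup(cube))                            # 350 (grand total)
print(lookup(cube, region="East"))             # 250
print(lookup(cube, region="East", year=2020))  # 150
```

The cost moves to load time. The number of stored group-bys doubles with every added dimension, which is why real systems choose carefully which aggregates to precompute.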

SLIDE 48

Data Catalog

  • Enterprise data catalogs essentially serve as a business glossary of data sources and common data definitions. They allow users to find the right data for decision making more easily, from governed and approved data sources.
  • Data catalogs exist within visual analytics solutions and are also available as standalone offerings designed for seamless integration with Tableau. Informatica is an example of a data catalog partner of Tableau.

SLIDE 49

Major Cloud Provider examples

SLIDE 50
SLIDE 51
SLIDE 52
SLIDE 53

USE CASE

SLIDE 54

QUIZ

Which is the largest streaming platform for TV shows and movies?

NETFLIX!

SLIDE 55

NETFLIX

  • Has grown to account for more than a third of peak downstream internet traffic in North America
  • The need arose to expand its capabilities to support this scale
  • Its extensive platform, built on Tableau and AWS, acts as a blueprint for many organisations looking to build scalable and flexible business intelligence in the cloud

SLIDE 56
SLIDE 57

Features

  • The data platform is complex, but elegant.
  • It is built on events and operational data fed into Amazon S3.
  • Data is sent to the appropriate processors (NoSQL, Amazon Redshift, etc.), and the results are then aggregated into Tableau Data Extracts.
  • A data lake/warehouse strategy allows storage of massive amounts of data.
  • Provides a high-level view of data to analyze and explore.
  • All data connections and extracts end up on Tableau Server, hosted on Amazon EC2.

SLIDE 58

Benefits to NETFLIX of using Tableau Server

  • Netflix can reuse its data sources and govern them across a wide range of users. E.g. dashboards can be developed that show usage and viewing patterns within individual countries.
  • Helps country managers easily manage programming for their audiences.
  • Dozens of people can view a dashboard, but only one data source feeds it.
  • Permissions can be set so that the right people have access to the information that is relevant to them.
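The "one data source, many viewers" and permissions points can be sketched as row-level filtering. Every dashboard reads the same rows, and each viewer only sees the countries they are entitled to. The user names, entitlement table and viewing data below are hypothetical. In Tableau itself this is typically done with user filters or row-level security, not with code like this.

```python
from collections import Counter

# One shared data source (hypothetical viewing events).
VIEWS = [
    {"country": "IN", "title": "Show A"},
    {"country": "IN", "title": "Show B"},
    {"country": "IN", "title": "Show A"},
    {"country": "BR", "title": "Show C"},
    {"country": "BR", "title": "Show A"},
]

# Hypothetical entitlement table: which countries each viewer may see.
ENTITLEMENTS = {
    "india_manager": {"IN"},
    "brazil_manager": {"BR"},
    "global_analyst": {"IN", "BR"},
}

def dashboard_for(user, rows=VIEWS):
    """Build one viewer's 'top titles' dashboard from the shared source, row-filtered."""
    allowed = ENTITLEMENTS.get(user, set())  # unknown users see nothing
    visible = [r for r in rows if r["country"] in allowed]
    return Counter(r["title"] for r in visible).most_common()

print(dashboard_for("india_manager"))  # [('Show A', 2), ('Show B', 1)]
print(dashboard_for("loki"))           # []
```

Because the filter is applied to a single shared source, fixing or refreshing the data once updates every country manager's view.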

SLIDE 59

So what did we learn?

SLIDE 60

References:

  • Geetika Chawla, Savita Bamal, Rekha Khatana. Big Data Analytics for Data Visualization: Review of Techniques.
  • Ekaterina Olshannikova, Aleksandr Ometov, Yevgeni Koucheryavy, Thomas Olsson. Visualizing Big Data.
  • Sofia Machairidou. Big Data and Tableau.
  • https://www.tableau.com/learn/whitepapers/tableau-big-data-overview
  • https://www.tableau.com/products
  • https://www.thedataschool.co.uk/tom-pilgrem/earth-tableau-server/
  • https://en.wikipedia.org/wiki/Tableau_Software

SLIDE 61

THANK YOU