Big Data Visualization with Tableau
Avirup Chakraborty(MDS201908) Debangshu Bhattacharya(MDS201910) Ipsita Ghosh(MDS201913) Swaraj Bose(MDS201936) Sreya K.K.(MDS201804)
What is big data?
Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.
The four V’s of big data: Volume, Velocity, Variety, Veracity
Why is data visualization important?
❖ Communicates relationships in the data through images
❖ Allows trends and patterns to be seen more easily
❖ Gives meaning to complicated datasets so that their message is clear and concise
❖ Makes outlier detection easier
❖ Makes results from complex algorithms much easier to understand
❖ Provides a summary of the data
Challenges in big data visualization (4 V’s yet again!)
❖ Traditional visualization tools such as MS Excel and Minitab cannot handle large datasets
❖ Visualizations must be delivered with low latency
❖ Parallelization is required
❖ The dimensions of the data have to be chosen carefully
❖ Most current visualization tools perform poorly with respect to scalability, functionality and response time
Steps for big data visualization
1. Data acquisition
2. Parsing and filtering
3. Mining hidden patterns
4. Data visualization
5. Refinement
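The five steps above can be sketched end to end on a toy data set. This is only an illustration under stated assumptions: the CSV content, the column names (`country`, `minutes`) and all function names are hypothetical, and a text bar chart stands in for a real visualization tool.

```python
import csv
import io
from collections import Counter

# Hypothetical raw input standing in for an acquired data set.
RAW = """country,minutes
IN,120
US,300
IN,80
US,
UK,45
"""

def acquire(text):
    # Step 1: data acquisition (here, reading an in-memory CSV).
    return list(csv.DictReader(io.StringIO(text)))

def parse_and_filter(rows):
    # Step 2: parse fields and drop rows with missing values.
    return [(r["country"], int(r["minutes"])) for r in rows if r["minutes"]]

def mine(records):
    # Step 3: a very simple "pattern": total minutes per country.
    totals = Counter()
    for country, minutes in records:
        totals[country] += minutes
    return totals

def visualize(totals, scale=20):
    # Step 4: a text bar chart, one '#' per `scale` minutes.
    lines = []
    for country, total in totals.most_common():
        lines.append(f"{country:<3}{'#' * (total // scale)} {total}")
    return "\n".join(lines)

totals = mine(parse_and_filter(acquire(RAW)))
chart = visualize(totals)
# Step 5: refinement, e.g. redrawing with a coarser scale.
refined = visualize(totals, scale=50)
print(chart)
```

In a real big data setting each step would run on distributed infrastructure; the sketch only shows how the stages hand data to one another.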
What is Tableau?
❖ A powerful and fast-growing data visualization tool used in the business intelligence industry
❖ Connects easily to nearly any data source
❖ Allows for instantaneous insight into data by transforming it into interactive visualizations called dashboards
Why is Tableau helpful?
❖ Handles large volumes of data
❖ No scripts or code required; provides a user interface
❖ Filters multiple datasets simultaneously
❖ Creates interactive and shareable dashboards depicting trends and variations
❖ Can incorporate other programming languages for complex calculations
❖ And many more...
Trivia
❖ Founded: January 2003, California
❖ Founders: Christian Chabot, Chris Stolte (Stanford University), Pat Hanrahan
❖ Headquarters: Seattle, Washington
❖ Website: https://www.tableau.com/
❖ Built using C++
❖ Latest version: 2020.1
Tableau Desktop:
A data visualization tool designed to create data visualizations, reports and dashboards quickly and intelligently.
Users can connect to multiple data sources, carry out multi-dimensional data analysis, create dashboards or reports, modify metadata and publish a complete workbook to Tableau Server if needed.
Content can be adapted to perform on any screen size and any device (e.g. desktop, laptop, tablet or even a smartphone!).
Tableau Desktop
Personal Edition
❖ Connects to a limited set of data sources: Microsoft Access, Microsoft Excel, Microsoft Azure, Tableau Data Extract, text files (CSVs).
❖ Cannot connect to Tableau Server, but allows users to create package files for Tableau Reader.
❖ Costs $999 per user.
Professional Edition
❖ Connects to a wider variety of data sources: Amazon Redshift, Google Analytics, Google BigQuery, Hortonworks Hadoop, OLAP databases, Salesforce.
❖ Enables connection to Tableau Server and creation of package files for Tableau Reader.
❖ Costs $1999 per user.
Tableau Server:
Tableau Server is essentially an online hosting platform that holds all your Tableau workbooks, data sources and more. It works like any other server: you can store things here and they will be safe from fires and pesky hackers.
So, what are the advantages of Tableau Server?
❖ Being a Tableau product, Tableau Server lets us use the functionality of Tableau without always having to download and open workbooks.
❖ Users need not install Tableau Desktop on their machines, and they can still interact with dashboards shared with them.
❖ Tableau Server supports a variety of Android apps, iPhone apps and web browsers such as Internet Explorer, Mozilla Firefox, Google Chrome and Safari.
❖ Tableau Server can be deployed on-premises as well as in public clouds such as Azure, AWS, IBM Cloud and Google Cloud Platform. It also enables an administrator to track and manage content, licenses, performance, and permissions for data sources with ease.
On Tableau Server, we can set permissions on different bits of content to control who can access and interact with what. Let us illustrate this using a really simple example:
Consider this ‘imaginary’ company consisting of Tony Stark, Dr. Banner and Nick Fury.
❖ Tony Stark has access on the server to upload and edit work in a project containing test documents.
❖ Dr. Banner can interact with only the production-quality documents.
❖ Nick Fury can access but not edit the final presentation documents.
❖ Of course, Loki cannot even have a look at the documents! (At least we can hope so.)
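The example above can be written down as a small permission table. This is a minimal sketch of the idea only: the project names (`test`, `production`, `final`), the action names and the `can` helper are hypothetical and are not Tableau Server's actual permission model or API.

```python
# Hypothetical permission table: user -> project -> allowed actions.
# This illustrates the idea only; it is not Tableau Server's API.
PERMISSIONS = {
    "Tony Stark": {"test": {"view", "upload", "edit"}},
    "Dr. Banner": {"production": {"view", "interact"}},
    "Nick Fury": {"final": {"view"}},
}

def can(user, action, project):
    """Return True if `user` may perform `action` in `project`."""
    # Unknown users (such as Loki) and unknown projects get no rights.
    return action in PERMISSIONS.get(user, {}).get(project, set())

print(can("Tony Stark", "edit", "test"))   # allowed
print(can("Nick Fury", "edit", "final"))   # view only, so not allowed
print(can("Loki", "view", "final"))        # not in the table at all
```

Denying by default, as the `get(..., set())` fallback does, means anyone not explicitly listed has no access.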
Tableau Public:
Tableau Public is a FREE tool that anyone can use to connect to data, create interactive data visualizations and publish them on the web.
Once these visualizations are on Tableau Public, one can share them on social media.
Since everyone has access to published data, users should be careful not to put proprietary data on Tableau Public.
Limitations to Tableau Public:
Row limitation:
Limited to 15,000,000 rows of data per workbook.
Limited storage:
Limited to ten gigabytes (10 GB) of storage space for your workbooks.
No workbook privacy:
Tableau Public does not allow workbooks to be saved locally. They have to be saved publicly, which means that everyone can see the data since it is stored in the cloud.
No security:
Because visualizations are public, anyone can access the data and make changes by downloading the workbook.
Tableau Online:
Tableau Online is a hosted version of Tableau Server. It is a business analytics platform where people can share dashboards, interact with reports and gain insights. It is hosted in the cloud, so there is no hardware and no set-up time needed.
“Want the sharing and collaboration of Server, but without having to actually manage a server? Then you want Tableau
hardware to maintain!”
Roughly, Tableau Online can also be thought of as a private (and paid, obviously) version of Tableau Public.
Key Features:
❖ Fully hosted in the cloud. Servers are managed by the Tableau team.
❖ Supports live data connections to Amazon Redshift and Google BigQuery, as well as to SQL-based sources hosted on cloud platforms.
❖ Ideal for a small number of users who need to interact with the data and visualizations in a secure way.
❖ Easily accessible from a browser or the Tableau Mobile app.
❖ Authenticates users through TableauID (email address and password). No guest access allowed.
❖ Subscription rate is $500 per user per year (half the price of an individual Tableau Server license).
Tableau Reader
❖ Tableau Reader is a FREE desktop application.
❖ Allows interaction with data visualizations created with Tableau Desktop.
❖ Users can filter, drill down and view the details.
Tableau Start Page
❖ Data grid: displays the first 1,000 rows of the data contained in the Tableau data source.
❖ Left pane: displays the connected data source and other details about your data.
❖ Canvas: displays information about how the data source is set up and how the data is combined.
❖ Metadata grid: displays the fields in your data source as rows.
Tableau Worksheet
The Dashboard Workspace
Philosophy of Tableau working with Big Data
should be able to access and analyze data wherever it resides.
Overview of how Tableau works with big data
Data access and connectivity
To enable analysis of data of any size and format, Tableau supports broad access to data wherever it lives.
interface with Hadoop, NoSQL databases and Spark.
can access any data source that supports the SQL standard and implements the ODBC API. For Hadoop, this includes interfaces such as Hive Query Language (HiveQL), Impala SQL, BigSQL and Spark SQL.
SDK, users can build connections to data that lives outside of the existing connectors, that is, any data accessible over HTTP, including internal web services, JSON data, and REST APIs.
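Data that arrives as JSON over HTTP usually has to be flattened into rows and columns before a tool like Tableau can chart it. The sketch below shows that flattening step only. It is not the connector SDK itself: the payload, its field names and the `flatten` helper are hypothetical, and the JSON is embedded as a string instead of being fetched from a web service.

```python
import json

# Hypothetical JSON payload, as an HTTP web service might return it.
PAYLOAD = """
{"results": [
  {"id": 1, "user": {"name": "asha", "country": "IN"}, "views": 10},
  {"id": 2, "user": {"name": "ben", "country": "US"}, "views": 7}
]}
"""

def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Each JSON object becomes one table row; keys become column names.
rows = [flatten(r) for r in json.loads(PAYLOAD)["results"]]
columns = sorted(rows[0])
print(columns)
for row in rows:
    print([row[c] for c in columns])
```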
Fast Interaction with all data at scale
that helps customers analyze large or complex data sets faster.
techniques to achieve high query speed.
creating an extract of the data and bringing it in-memory.
Fast Interaction with all data at scale
subset) in-memory.
needs.
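The difference between querying the source directly and working from an in-memory extract can be illustrated with a toy example. This is a sketch under stated assumptions, not how Tableau implements extracts: an in-memory SQLite database stands in for the remote source, and the table and column names are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE views (country TEXT, minutes INTEGER)")
source.executemany(
    "INSERT INTO views VALUES (?, ?)",
    [("IN", 120), ("US", 300), ("IN", 80), ("UK", 45)],
)

QUERY = (
    "SELECT country, SUM(minutes) FROM views "
    "GROUP BY country ORDER BY country"
)

# Live connection: every question is sent to the source as a query.
live_result = source.execute(QUERY).fetchall()

# Extract: copy the needed rows once into a separate in-memory store,
# then answer later questions without touching the source again.
extract = sqlite3.connect(":memory:")
extract.execute("CREATE TABLE views (country TEXT, minutes INTEGER)")
extract.executemany(
    "INSERT INTO views VALUES (?, ?)",
    source.execute("SELECT country, minutes FROM views").fetchall(),
)
source.close()  # the extract keeps working after the source is gone

extract_result = extract.execute(QUERY).fetchall()
print(live_result)
print(extract_result)
```

The trade-off is the usual one: the live query always sees current data, while the extract answers quickly but is only as fresh as its last refresh.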
Fast Interaction with all data at scale
choose a subset of the data to present, organize that data into a table, then create a chart from that table.
giving visual feedback as the user analyzes.
answer questions as fast as they can think of them.
data if needed, and ultimately get deeper insights.
Tableau and Big data analytics ecosystem
Tableau fits nicely in the big data paradigm because it prioritizes flexibility—the ability to move data across platforms, adjust infrastructure on demand, take advantage of new data types, and enable new users and use cases.
Cloud infrastructure
faced with on-premises Hadoop data lakes.
before.
use, including Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Ingest and prep
for raw data of any size or shape is often a data lake.
and apps located everywhere, such as social networks, smart meters, home automation, video games, and IoT sensors.
data.
applied to streams, we typically see stream data routed and stored in raw formats using lambda architecture and into a data lake, such as Hadoop, for analytics usage.
Ingest and prep
massive quantities of data by taking advantage of both batch and stream processing methods.
Storm, Flume, Kafka, and Informatica Vibe Data Stream.
for analysis.
help with this process and work fluidly with Tableau.
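The combination of batch and stream processing mentioned above (lambda architecture) can be sketched in a few lines. This is a toy illustration only: the event shape, the sensor names and the `batch_view`, `SpeedLayer` and `serve` names are hypothetical, and real systems would use tools such as those listed on this slide.

```python
from collections import Counter

def batch_view(historical_events):
    """Batch layer: recompute totals over all stored events."""
    return Counter(e["sensor"] for e in historical_events)

class SpeedLayer:
    """Speed layer: incrementally count events not yet in the batch view."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["sensor"]] += 1

def serve(batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch + speed.counts

# Events already stored in the data lake (hypothetical).
historical = [{"sensor": "meter-1"}, {"sensor": "meter-1"}, {"sensor": "meter-2"}]
batch = batch_view(historical)

# Events that arrived after the last batch run (hypothetical).
speed = SpeedLayer()
speed.on_event({"sensor": "meter-2"})
speed.on_event({"sensor": "meter-3"})

merged = serve(batch, speed)
print(dict(merged))
```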
Storage
parallel processing, and clustered workload management.
handle extreme volumes of concurrent tasks or jobs.
Impala, Hortonworks via Hive, and MapR via Apache Drill.
their data source mixture. Snowflake is one example of a cloud-native SQL-based enterprise data warehouse with a native Tableau connector.
Storage
are not limited to, MongoDB, Datastax, and MarkLogic.
Processing
interactive, scale-out data processing.
results of complex machine learning models from Databricks in Tableau.
Query acceleration
processing (MPP) technology like Exasol and MemSQL
Vertica.
❖ SQL-on-Hadoop engines like Apache Impala, Hive LLAP, etc.
❖ Online Analytical Processing (OLAP)-on-Hadoop technologies like AtScale, etc.
Data Catalog
business glossary of data sources and common data definitions, allowing users to more easily find the right data for decision making from governed and approved data sources.
and are also available as standalone offerings designed for seamless integration with Tableau. Informatica is an example of a data catalog partner.
Major Cloud Provider examples
QUIZ
Which is the largest streaming platform for TV shows and movies?
NETFLIX!
traffic
blueprint for many organisations looking to build scalable and flexible business intelligence on the cloud
NETFLIX
Features
❖ The data platform is complex, but elegant
❖ Built on events and operational data fed into Amazon S3
❖ Data is sent to the appropriate processors (NoSQL, Amazon Redshift, etc.), and the results are then aggregated into Tableau Data Extracts
❖ The data lake/warehouse strategy allows storage of massive amounts of data
❖ Provides a high-level view of the data to analyze and explore
❖ All data connections and extracts end up on Tableau Server, hosted on EC2
Benefits to Netflix of using Tableau Server
❖ Data sources can be reused and governed across a wide range of users. E.g. dashboards can be developed that show usage and watch patterns within individual countries.
❖ Helps country managers easily manage programming for their audiences.
❖ Dozens of people can view a dashboard, but only one data source is needed to feed it.
❖ Permissions can be set so that the right people have access to the information that is relevant to them.
So what did we learn?
References:
❖ Big Data Analytics for Data Visualization: Review of Techniques - Geetika Chawla, Savita Bamal, Rekha Khatana
❖ Visualizing Big Data - Ekaterina Olshannikova, Aleksandr Ometov, Yevgeni Koucheryavy, Thomas Olsson
❖ Big Data and Tableau - Sofia Machairidou
❖ https://www.tableau.com/learn/whitepapers/tableau-big-data-overview
❖ https://www.tableau.com/products
❖ https://www.thedataschool.co.uk/tom-pilgrem/earth-tableau-server/
❖ https://en.wikipedia.org/wiki/Tableau_Software