Data Visualization: Non-Programming Approach to Visualize Data

SLIDE 1

Data Visualization

Non-Programming approach to Visualize Data
  • Dr. Omer Ayoub
Senior Data Scientist, House of Mathematical and Statistical Sciences, King Abdul Aziz University, Jeddah, Saudi Arabia
SLIDE 2

Dr. Omer Ayoub

Ph.D in Computer Science (USA)
ICTP Associate
Senior Data Scientist
House of Mathematical Sciences, Consulting Firm
King Abdul Aziz University, Jeddah, Saudi Arabia
Email: omerayoub@hotmail.com
omer@statisticalview.com
CoDATA-RDA Applied Workshops, ICTP
SLIDE 3

Content

  • 1. Introduction to Data Visualization
  • 2. What is the non-programming approach?
  • 3. How to benefit from this workshop?
  • 4. Data Openness and Open Access policy

  • 1. Which type of visual design should I select to present my findings?
  • 2. Chart types and Design best practices

  • 1. An idea and discussion about next sessions
  • 2. Getting yourself ready with the tools to practice

  • 1. Questions and Answers Session

  • 1. Wrap Up
SLIDE 4

First CoDATA-RDA Summer School Participants in ICTP - 2016
SLIDE 5

Your contribution to your society …

Self-assessment questions:
  • How do you plan to contribute to your society by applying the methodologies and practices learnt during this summer school?
  • Any plans to do something for open data access?
  • Any thoughts on following standardized procedures to overcome the barriers in data sharing?
SLIDE 6

Feedback and Suggestion
SLIDE 7

Visualization

“Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.” – Stephen Few, Now You See It: Simple Visualization Techniques for Quantitative Analysis
SLIDE 8

Visualization
SLIDE 9

Finding the Story in your Data

  • TRENDS: Ice cream sales over time
  • CORRELATIONS: Ice cream sales vs. temperature
  • OUTLIERS: Ice cream sales in an unusual region

  • Information can be visualized in a number of ways, each of which can provide a specific insight.
  • When you start to work with your data, it’s important to identify and understand the story you are trying to tell and the relationship you are looking to show. Knowing this information will help you select the proper visualization to best deliver your message.
SLIDE 10

Data Types

KNOW YOUR DATA

Before understanding visualizations, you must understand the types of data that can be visualized and their relationships to each other. Here are some of the most common types you are likely to encounter.

QUANTITATIVE
Data that can be counted or measured; all values are numerical.

DISCRETE
Numerical data that has a finite number of possible values. Example: Number of employees in the office.

CONTINUOUS
Data that is measured and has a value within a range. Example: Rainfall in a year.

CATEGORICAL
Data that can be sorted according to group or category. Example: Types of products sold.
SLIDE 11

Data Relationships

NOMINAL COMPARISON
This is a simple comparison of the quantitative values of subcategories. Example: Number of visitors to various websites.

TIME SERIES
This tracks changes in values of a consistent metric over time. Example: Monthly sales etc.

CORRELATION
This is data with two or more variables that may demonstrate a positive or negative correlation to each other. Example: Salaries according to education level.

RANKING
This shows how two or more values compare to each other in relative magnitude. Example: Historic weather patterns ranked from the hottest months to the coldest.

DEVIATION
This examines how data points relate to each other, particularly how far any given data point differs from the mean. Example: Amusement park tickets sold on a rainy day vs. a regular day.

DISTRIBUTION
This shows data distribution, often around a central value. Example: Heights of players in a basketball team.

PART-TO-WHOLE RELATIONSHIPS
This shows a subset of data compared to the larger whole. Example: Percentage of customers purchasing specific products etc.
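Although this workshop takes a non-programming approach, readers who want to check the ranking, deviation and part-to-whole definitions on their own numbers may find a small Python sketch useful. The ticket figures and variable names below are hypothetical and made up purely for illustration; only the Python standard library is assumed.

```python
from statistics import mean

# Hypothetical daily ticket sales; the numbers are made up for illustration.
tickets = {"Mon": 420, "Tue": 380, "Wed": 150, "Thu": 410, "Fri": 640}

# RANKING: order the categories by relative magnitude, largest first.
ranking = sorted(tickets, key=tickets.get, reverse=True)

# DEVIATION: how far each data point lies from the mean.
average = mean(tickets.values())
deviation = {day: sold - average for day, sold in tickets.items()}

# PART-TO-WHOLE: each category's share of the total, in percent.
total = sum(tickets.values())
share = {day: 100 * sold / total for day, sold in tickets.items()}

print("Ranking:", ranking)
for day in ranking:
    print(f"{day}: {deviation[day]:+.0f} vs. mean, {share[day]:.1f}% of total")
```

Each quantity corresponds to one relationship: the sorted list is what a ranked bar chart would show, the deviations are what a deviation chart would plot around the mean, and the shares are the values a pie or stacked bar chart would use.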
SLIDE 12

Chart Types

This section covers the most common chart types used for visualization, along with the best practices for using them:
  • Bar Chart
  • Pie Chart
  • Line Chart
  • Area Chart
  • Scatterplot
  • Bubble Chart
  • Heat Map
SLIDE 13

Bar Charts

Bar Chart Variations

Bar charts are very versatile. They are best used to show change over time, compare different categories, or compare parts of a whole. Common bar chart variations include stacked and 100% stacked versions. These variations are usually used to compare multiple part-to-whole relationships, e.g. monthly online traffic analysis by different sources.

VERTICAL (Column Chart)
Best used for chronological data (time series should always run left to right) or when visualizing negative values below the axis.

HORIZONTAL
Best used when categories with long names are to be labelled.
SLIDE 14

Bar Charts Design Best Practices

  • Use Horizontal Labels: Avoid steep diagonal or vertical type, as it can be difficult to read.
  • Space Bars Appropriately: Space between the bars should be at least ½ bar width.
  • Start the y-axis Value at Zero: Starting at a value above zero truncates the bars and doesn’t accurately reflect the full value.
  • Use Consistent Colors: Use one color for bar charts. You may use an accent color to highlight a significant data point.
  • Order Data Appropriately: Order the categories alphabetically, sequentially or by value.
SLIDE 15

Pie Chart

Pie Chart Variations

Pie charts are best used for making part-to-whole comparisons with discrete or continuous data. They are most impactful with a small data set.

STANDARD
Used to show part-to-whole relationships.

DONUT
A stylistic variation of the original pie chart that allows a total value or design element to be included in the center.
SLIDE 16

Pie Charts Design Best Practices

  • Visualize no more than 5 Categories per Chart: It is difficult to differentiate between small values; depicting too many slices makes the chart complex and decreases its impact. If needed, multiple small slices may be grouped as “Miscellaneous” or “Other”.
  • Don’t use Multiple Pie Charts for Comparison: Slice sizes are very difficult to compare side by side. If a comparison is required, use a stacked bar chart instead.
  • Total Data Count must be 100%: Make sure that the values sum to 100% and that pie slices are sized in proportion to their corresponding values.
  • Order the Slices Correctly:
    Option-1: Place the largest section at 12 o’clock going clockwise and the second largest at 12 o’clock going counterclockwise.
    Option-2: Place the largest section at 12 o’clock going clockwise. Place the remaining sections in descending order, going clockwise.
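The pie chart rules above (at most five slices, small slices merged into “Other”, percentages that sum to 100, largest slice first) can be checked before any chart is drawn. The following is a minimal Python sketch under stated assumptions: the function name `prepare_pie_slices`, its parameters and the sales figures are all hypothetical, and placing “Other” last is a common convention rather than something the slides prescribe.

```python
def prepare_pie_slices(counts, max_slices=5, other_label="Other"):
    """Return (label, percent) pairs ready for a pie chart.

    Keeps the largest categories, merges the rest into `other_label`,
    and converts counts to percentages that sum to 100.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Largest first, so slices can be placed clockwise from 12 o'clock.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_slices:
        kept = ordered[: max_slices - 1]
        merged = sum(value for _, value in ordered[max_slices - 1:])
        # The merged slice goes last, even if it is larger than a kept slice.
        ordered = kept + [(other_label, merged)]
    return [(label, 100 * value / total) for label, value in ordered]


# Hypothetical product sales counts, made up for illustration.
sales = {"A": 50, "B": 20, "C": 12, "D": 8, "E": 5, "F": 3, "G": 2}
slices = prepare_pie_slices(sales)
for label, percent in slices:
    print(f"{label}: {percent:.0f}%")
```

The resulting list can be pasted into any charting tool; because the percentages are computed from the raw counts, the slices are guaranteed to be proportionate to their values.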
SLIDE 17

Line Chart

Line Chart Variations

Line charts are used to show time-series relationships with continuous data. They help show trend, acceleration, deceleration, and volatility.

The line chart itself has no variants. A line may, however, be used to track or identify changing trends in a bar chart.

Example figure: Direct Marketing Views, By Date
SLIDE 18

Line Charts Design Best Practices

  • Inclusion of Zero Baseline: Although a line chart doesn’t have to start at zero, a zero baseline should be included whenever possible.
  • Don’t Plot more than 4 Lines: If you need to display more than 4 lines, break them into separate charts for better comparison.
  • Solid Lines ONLY: Dashed and dotted lines can be distracting.
  • Label Directly: This allows readers to quickly identify lines.
  • Use the Right Height: Plot all lines so that the line chart takes approximately two-thirds of the y-axis’s total scale.
SLIDE 19

Area Chart

Area Chart Variations

Area charts depict a time-series relationship, but they differ from line charts in that they can represent volume.

AREA CHART
Used to show or compare quantitative progression over time.

STACKED AREA CHART
Best used to visualize a part-to-whole relationship over time, showing how each category contributes to the cumulative total.

100% STACKED AREA CHART
Used to show the distribution of categories as parts of a whole, where the cumulative total is not important.
SLIDE 20

Area Charts Design Best Practices

  • It should be Easy to Read: In stacked area charts, arrange the data so that categories with highly variable data sit at the top of the chart and those with low variability at the bottom.
  • Start y-axis Value at 0: Starting above zero truncates the visualization of values.
  • Don’t Display more than 4 Categories: More categories result in a cluttered, complex visual.
  • Use Transparent Colors: Use transparency so that overlapping areas remain clearly visible.
  • Don’t use for Discrete Data: The connected lines imply intermediate values, which only exist in continuous data.
SLIDE 21

Scatterplot Chart

Scatterplot Chart Variations

Scatter plots show the relationship between items based on two sets of variables. They are best used to show correlation in a large amount of data.
SLIDE 22

Scatterplot Charts Design Best Practices

  • Start with y-axis value at 0
  • Include more Variables: Use size and dot color to encode additional data variables.
  • Use Trend Lines: These lines help show the correlation between the variables.
  • Don’t Compare more than 2 Trend Lines: Too many lines make the chart difficult to interpret.
SLIDE 23

Bubble Chart

Bubble Chart Variations

Bubble charts are good for displaying nominal comparisons or ranking relationships.

BUBBLE PLOT
A scatterplot with bubbles, best used to display an additional variable.

BUBBLE MAP
Best used to visualize values for specific geographic regions.
SLIDE 24

Bubble Chart Design Best Practices

  • Label Visibility must be Ensured: Make sure the labels are visible, easily identifiable and unobstructed.
  • Size the Bubbles Appropriately: Bubbles should be scaled according to their area and not their diameter.
  • Avoid using Odd Shapes: Avoid adding too much detail or using shapes that are not entirely circular, as this can lead to inaccuracies.
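The rule about scaling bubbles by area rather than diameter follows from the area formula for a circle: if the diameter is doubled, the area becomes four times larger, so a value that is twice as large would look four times as large. A minimal Python sketch of area-based sizing is shown below; the function name `bubble_radius`, the `max_radius` setting and the sample values are hypothetical and serve only as illustration.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius for `value` so that bubble AREA is proportional to the value.

    Area = pi * r**2, so the radius must grow with the square root of
    the value. `max_radius` is a hypothetical display setting.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical values, made up for illustration.
values = [25, 50, 100]
radii = [bubble_radius(v, max(values)) for v in values]
areas = [math.pi * r ** 2 for r in radii]
for v, r, a in zip(values, radii, areas):
    print(f"value={v:>3}  radius={r:5.1f}  area={a:7.1f}")
```

With this scaling, doubling a value doubles the bubble's area, while its radius grows only by a factor of about 1.41.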
SLIDE 25

Heat Maps

Heat Map Variations

Heat maps are used to display categorical data, using intensity of color to represent values of geographic areas or data tables.

Example figure: States with new service contracts
SLIDE 26

Heat Map Design Best Practices

  • Use a simple Map Outline: These lines are meant to frame the data.
  • Appropriate Choice of Colors: Use a single color with varying shades. This not only makes the map visually soothing and appealing but also presents the results correctly.
  • Use of Patterns: A pattern may be used to indicate a second variable, but using multiple patterns is overwhelming and distracting.
  • Appropriate Data Ranges: Select 3 to 5 numerical ranges that enable a fairly even data distribution. Use +/- signs to indicate high and low ranges.
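One possible way to obtain ranges with a fairly even data distribution is to cut the data at its quantiles, so that each range holds roughly the same number of regions. The slides do not name a method, so quantile binning is an assumption here, and the helper `range_index` and the per-region values are hypothetical. The sketch uses only the Python standard library (`statistics.quantiles` requires Python 3.8 or later).

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def range_index(value, cut_points):
    """Index of the range that `value` falls into (0 = lowest range).

    A value equal to a cut point is assigned to the higher range.
    """
    return bisect_right(cut_points, value)


# Hypothetical per-region values, made up for illustration.
contracts = [3, 5, 8, 12, 14, 19, 23, 27, 31, 40, 52, 75]

# Four ranges need three cut points; quartiles put roughly the same
# number of regions into each range.
cut_points = quantiles(contracts, n=4)
counts = Counter(range_index(v, cut_points) for v in contracts)

print("Cut points:", cut_points)
print("Regions per range:", sorted(counts.items()))
```

Each range index can then be mapped to one shade of a single color, from lightest (lowest range) to darkest (highest range).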
SLIDE 27

Do’s and Don’ts in DATA DESIGN & VISUALIZATION

  • Do Use one color to represent each category
  • Do order data sets using logical hierarchy
  • Do use callouts to highlight important or interesting information
  • Do visualize your data in a way that it is easy for readers to compare values
  • Do use icons to enhance comprehension and reduce unnecessary labelling
  • Don’t use high contrast color combinations such as Red/Green or Blue/Yellow
  • Don’t use 3D charts. They can skew perception of the visualization
  • Don’t add chart junk. Unnecessary illustrations, drop shadows or ornamentations distract from the data
  • Don’t use more than 6 colors in a single layout
  • Don’t use distracting fonts or elements (such as bold, italic or underlined text)
SLIDE 28

References

  • Pham Viao; Best Practices in Data Visualizations (2014), Microstrategy
  • Haider Al Seaidy; Dashboard Design and Data Visualization Best Practices (2016), Splunk Conference on Data Science
  • Syno et al; Best Practice Visualization, Dashboard and Key Figures Report (2013), Open Data Monitor
  • HubSpot; How to Design Charts and Graphs, Data Visualization 101
  • Tableau; Visual Analysis Best Practices: Simple Techniques for Making Every Data Visualization Useful and Beautiful