Statistics I – Chapter 2, Fall 2012 1 / 48
Statistics I – Chapter 2 Visualizing the Data
Ling-Chieh Kung
Department of Information Management National Taiwan University
September 12, 2012
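◮ In this chapter, we introduce some commonly adopted ways of organizing and visualizing data.
◮ Raw data, or data that have not been summarized in any way, are sometimes called ungrouped data.
◮ We will learn how to generate and present grouped data, i.e., data that have been organized into a frequency distribution.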
◮ Frequency distributions.
◮ Quantitative data graphs.
◮ Qualitative data graphs.
◮ Visualizing two variables.
Frequency distributions
◮ A frequency distribution is a summary of data presented in the form of class intervals and frequencies.
◮ Three steps to construct a frequency distribution from raw data:
  ◮ Determine the range, the difference between the largest and smallest numbers.
  ◮ Determine the number of classes.
    ◮ A rule of thumb: 5 to 15 classes.
  ◮ Determine the width of each class; then count!
    ◮ Typically all classes have the same width.
    ◮ Be aware of class endpoints! Classes should NOT overlap with each other.
◮ A sample: ages of managers from urban child care centers in the United States.
◮ Ungrouped data:
◮ Let’s summarize this sample by a frequency distribution.
◮ Step 1: Range = 74 − 23 = 51.
◮ Step 2: As we only have 50 numbers, it is not a good idea to have too many classes; let's use 6 classes.
◮ Step 3: Class width ≥ ⌈51/6⌉ = 9. But round widths like 5 or 10 are easier to read, so let's use 10.
◮ Why ceiling? Why not floor?
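The arithmetic of Steps 1 to 3 can be sketched in Python as follows. The helper `class_width` is a hypothetical name, not something from the slides; it divides the range by the chosen number of classes and rounds up, and the last lines show what goes wrong if we round down instead.

```python
import math


def class_width(data, num_classes):
    """Smallest integer width that lets num_classes classes span the data range."""
    data_range = max(data) - min(data)          # Step 1: range = largest - smallest
    return math.ceil(data_range / num_classes)  # Step 3: round UP


# Only the two extreme values from the example matter for the range.
width = class_width([23, 74], 6)
print(width)  # 9

# Rounding down instead would give 8, and 6 * 8 = 48 < 51,
# so six classes of width 8 could not cover the whole range.
print(6 * math.floor(51 / 6) >= 51)  # False
```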
◮ The resulting classes: [20, 30), [30, 40), [40, 50), [50, 60), [60, 70), [70, 80).
◮ Why not [21, 31), [31, 41), ...?
◮ Why not (20, 30], (30, 40], ...?
◮ How about [20, 29], [30, 39], ...?
◮ Then we count:
◮ This is a complete frequency distribution. It is grouped data.
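A minimal Python sketch of the counting step, assuming half-open classes [lower, upper) of equal width as above. The function `frequency_distribution` and the values in `ages` are hypothetical; they are not the 50 observations from the slides.

```python
def frequency_distribution(data, start, width, num_classes):
    """Count values into half-open classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = int((x - start) // width)  # index of the class that contains x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} falls outside every class")
        counts[k] += 1
    classes = [(start + k * width, start + (k + 1) * width) for k in range(num_classes)]
    return list(zip(classes, counts))


# Hypothetical ages, NOT the 50 observations used in the slides.
ages = [23, 26, 31, 34, 35, 42, 47, 52, 58, 61, 74]
for (lower, upper), f in frequency_distribution(ages, start=20, width=10, num_classes=6):
    print(f"[{lower}, {upper}): {f}")
```

Because each class includes its lower endpoint but not its upper one, a value such as 30 is counted exactly once, in [30, 40).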
◮ You may also call them frequency tables.
◮ In general, deciding the number of classes, the class width, and the class endpoints is a matter of judgment.
◮ There is NO best choice. There is NO standard answer.
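◮ We may add class midpoints, relative frequencies, and cumulative frequencies to a frequency distribution.
◮ A class midpoint (or a class mark) is the midpoint of the class interval, i.e., the average of its two endpoints.
◮ A relative frequency is the proportion of the total frequency that falls into a class.
◮ A cumulative frequency is the sum of all frequencies up to and including the current class.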
◮ Our extended frequency table:
◮ How about cumulative relative frequencies?
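The extra columns are simple running computations. The sketch below, with a hypothetical helper `extended_table` and hypothetical frequencies, adds class midpoints, relative frequencies, cumulative frequencies, and cumulative relative frequencies (cumulative frequency divided by the total number of observations).

```python
def extended_table(classes, freqs):
    """Add midpoints, relative, cumulative, and cumulative relative frequencies."""
    n = sum(freqs)
    rows = []
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        cumulative += f
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,        # class mark
            "frequency": f,
            "relative": f / n,                      # share of all observations
            "cumulative": cumulative,               # this class and all before it
            "cumulative_relative": cumulative / n,
        })
    return rows


# Hypothetical frequencies for classes [20, 30), [30, 40), ..., [70, 80).
classes = [(20 + 10 * k, 30 + 10 * k) for k in range(6)]
freqs = [2, 3, 2, 2, 1, 1]
for row in extended_table(classes, freqs):
    lower, upper = row["class"]
    print(f"[{lower}, {upper})  mid={row['midpoint']:.0f}  f={row['frequency']}  "
          f"rel={row['relative']:.3f}  cum={row['cumulative']}  "
          f"cum_rel={row['cumulative_relative']:.3f}")
```

The last cumulative frequency always equals the sample size, so the last cumulative relative frequency is always 1.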
Quantitative data graphs
◮ “A picture is worth a thousand words.”
◮ Graphs are intuitive to interpret.
◮ Graphs are helpful for determining the shape of a distribution.
◮ Typically we draw graphs to get some rough ideas before doing any formal analysis.
◮ Moreover, (probably) your boss can read nothing but graphs.
◮ A histogram is a graphical representation of a frequency distribution.
◮ It consists of a series of contiguous rectangles, each representing the frequency of one class.
◮ Never forget:
  ◮ Caption.
  ◮ Captions and labels for the x- and y-axes.
  ◮ Unit of measurement.
  ◮ Contiguous rectangles.
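Graphs are normally drawn with a plotting tool; to avoid assuming any particular library, the sketch below renders a histogram as text. The function `text_histogram`, its parameters, and the data are hypothetical. It still follows the checklist: a caption, an axis label with its unit, and one row per class with no gaps between rows.

```python
def text_histogram(classes, freqs, caption, x_label, symbol="#"):
    """Render a frequency distribution as a sideways text histogram.

    One row per class; rows are printed back to back with no blank lines,
    echoing the contiguous rectangles of a histogram.
    """
    lines = [caption, f"{x_label} | frequency"]
    for (lower, upper), f in zip(classes, freqs):
        lines.append(f"[{lower}, {upper}) | {symbol * f} {f}")
    return "\n".join(lines)


# Hypothetical data; caption and label are placeholders.
classes = [(20 + 10 * k, 30 + 10 * k) for k in range(6)]
freqs = [2, 3, 2, 2, 1, 1]
print(text_histogram(classes, freqs,
                     caption="Hypothetical ages",
                     x_label="Age (years)"))
```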
◮ Histograms are one of the most important types of graphs in statistics.
◮ One particular reason to draw histograms is to get some rough idea about the shape of the distribution:
  ◮ Bell shape? M shape? Skewed?
  ◮ Any outlier?
  ◮ Uniformly distributed? Normally distributed?
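◮ A frequency polygon also graphically visualizes a frequency distribution.
◮ Instead of using rectangles, it uses line segments to connect dots plotted at the class midpoints.
◮ The information contained in a frequency polygon is quite similar to that in a histogram.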
◮ Never forget:
  ◮ Plot dots at class midpoints.
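The dots of a frequency polygon are just (class midpoint, frequency) pairs; joining consecutive dots with line segments gives the polygon. A minimal sketch, with a hypothetical helper `polygon_points` and hypothetical frequencies:

```python
def polygon_points(classes, freqs):
    """(class midpoint, frequency) pairs; joining consecutive pairs gives the polygon."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]


# Hypothetical frequencies for classes [20, 30), ..., [70, 80).
classes = [(20 + 10 * k, 30 + 10 * k) for k in range(6)]
freqs = [2, 3, 2, 2, 1, 1]
print(polygon_points(classes, freqs))
# [(25.0, 2), (35.0, 3), (45.0, 2), (55.0, 2), (65.0, 1), (75.0, 1)]
```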
◮ It is more convenient to use a frequency polygon to compare several distributions, because several polygons can be drawn on one graph.
◮ However, people may misunderstand a frequency polygon
◮ An ogive is a cumulative frequency polygon.
◮ A dot of zero frequency is plotted at the beginning of the first class.
◮ Dots of cumulative frequencies are plotted at the end of each class.
◮ Useful for seeing running totals.
◮ How many classes, from bottom to top, do we need to achieve
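An ogive's dots can be computed in the same spirit, assuming equal-width classes as above: start with 0 at the lower endpoint of the first class, then place each running total at the upper endpoint of its class. The helper `ogive_points` and the frequencies are hypothetical; `itertools.accumulate` is from the Python standard library.

```python
from itertools import accumulate


def ogive_points(classes, freqs):
    """Points of an ogive: zero at the start of the first class, then the
    cumulative frequency at the end (upper endpoint) of every class."""
    points = [(classes[0][0], 0)]
    for (_, upper), cumulative in zip(classes, accumulate(freqs)):
        points.append((upper, cumulative))
    return points


classes = [(20 + 10 * k, 30 + 10 * k) for k in range(6)]
freqs = [2, 3, 2, 2, 1, 1]  # hypothetical
print(ogive_points(classes, freqs))
# [(20, 0), (30, 2), (40, 5), (50, 7), (60, 9), (70, 10), (80, 11)]
```

Since frequencies are never negative, the ogive never goes down, and its last dot sits at the sample size.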
◮ Which one is a correct ogive?
◮ A stem-and-leaf plot separates the digits of each number into two groups.
  ◮ The leftmost digits form the stem.
  ◮ The other digits form the leaf.
◮ The stems will be treated as categories (like those classes in a frequency distribution).
◮ In our example, the tens are stems and the units are leaves.
  ◮ E.g., 42: Stem is 4 and leaf is 2.
  ◮ E.g., 26: Stem is 2 and leaf is 6.
◮ In a column at left, one ranks stems in ascending order from top to bottom.
◮ For each stem, one ranks leaves in ascending order from left to right.
◮ The stem-and-leaf plot for our example:
2 | 3 5 6 6 8 9
3 | 1 1 2 2 2 2 3 4 5 6 7 7
4 | 2 3 3 6 7 9 9
5 | 2 2 3 4 5 7 8 8
6 | 1 4
7 | 4
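The two ranking rules can be sketched for two-digit numbers, using the tens digit as the stem and the units digit as the leaf as in our example. The function `stem_and_leaf` and its input values are hypothetical (42 and 26 are the values used in the examples above). Printing stems that have no leaves is a design choice here, so gaps in the data remain visible.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Stem-and-leaf plot for non-negative integers below 100:
    tens digit = stem, units digit = leaf."""
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # stems ascending, top to bottom
        leaves = " ".join(str(leaf) for leaf in sorted(groups[stem]))  # leaves ascending, left to right
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical values (42 and 26 are the examples from the text).
print(stem_and_leaf([42, 26, 23, 31, 74, 45, 31]))
```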
◮ The main advantage of a stem-and-leaf plot is that it does not lose any information from the raw data.
◮ The main disadvantage is the table size, especially when the data set is large.
  ◮ Good for small data sets but impractical for large ones.
◮ In general, how to divide a number into a stem and a leaf is a matter of judgment.
◮ Personally, I don’t think stem-and-leaf plots are widely used in practice.
Qualitative data graphs
◮ Qualitative data graphs are for qualitative data... XD
◮ Which two data scales belong to qualitative data?
◮ Qualitative data graphs are also for grouped quantitative data.
◮ A pie chart is a circular depiction of data where each slice represents the proportion of the total that falls into one category.
◮ It visualizes relative frequency distributions well.
◮ Consider a survey in the IM city on what do passengers
◮ No one says those slices must be sorted by their sizes. But sorting them usually makes the chart easier to read.
◮ Pie charts are useful in visualizing the proportion of each category.
◮ However, judging the relative sizes of slices in a pie chart by eye is difficult.
◮ In demonstrating the differences among categories, a bar chart is better.
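To see why slice sizes are hard to compare, note that a slice's angle is 360 degrees times the category's relative frequency. Two categories with 30% and 28% of the data differ by only 0.02 × 360 = 7.2 degrees. A minimal sketch with a hypothetical helper `pie_angles` and hypothetical counts:

```python
def pie_angles(freqs):
    """Slice angle (degrees) of each category = 360 * relative frequency."""
    total = sum(freqs.values())
    return {category: 360 * f / total for category, f in freqs.items()}


# Hypothetical categories and counts.
angles = pie_angles({"A": 50, "B": 30, "C": 20})
print(angles)  # {'A': 180.0, 'B': 108.0, 'C': 72.0}
```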
◮ A bar chart (or bar graph) depicts each category by a bar.
◮ Bars may be drawn either vertically or horizontally.
◮ No one says those bars must be sorted by their lengths. But sorting them usually makes the chart easier to read.
◮ A bar chart is different from a histogram!!
◮ A bar chart is better for comparing different categories; a histogram is better for showing the distribution of quantitative data.
¹ While bars in a bar chart are still allowed to be contiguous, I suggest separating them so that a bar chart is not confused with a histogram.
◮ What are the differences that distinguish a bar chart from a histogram?
◮ A Pareto chart is a bar chart in which bars are sorted in descending order of frequency.
◮ Pareto is not Plato!! He is Vilfredo Pareto, an Italian economist.
◮ Typically, bars in a Pareto chart are vertically depicted. The
◮ A Pareto chart is good for identifying the most critical categories, e.g., the most frequent causes of a problem.
◮ Some people add a cumulative frequency distribution, drawn as a line, on a Pareto chart.
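Preparing data for a Pareto chart amounts to sorting categories by frequency in descending order and, optionally, computing cumulative percentages for the added line. The sketch below uses a hypothetical helper `pareto_table` and hypothetical defect counts. Python's `sorted` is stable even with `reverse=True`, so tied categories keep their original order.

```python
def pareto_table(counts):
    """Sort categories by frequency (largest first) and attach cumulative percentages."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, f in ordered:
        running += f
        rows.append((category, f, 100 * running / total))
    return rows


# Hypothetical defect causes and counts.
defects = {"scratch": 12, "dent": 30, "crack": 5, "misaligned": 53}
for category, f, cum_pct in pareto_table(defects):
    print(f"{category:<11} {f:>3}  cumulative {cum_pct:5.1f}%")
```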
Visualizing two variables
◮ When we have data for two variables, typically we want to know whether and how they are related.
◮ Visualizing the data in a two-dimensional manner helps.
◮ Cross tabulation produces a two-dimensional table that displays the frequency counts of two variables simultaneously.
◮ Consider how people in three occupations select one out of four newspapers.
  ◮ Label occupations as 1, 2, and 3.
  ◮ Label newspapers as 1, 2, 3, and 4.
  ◮ Data:
◮ The data can be organized into a contingency table:
◮ Do people in different occupations prefer different newspapers?
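Building a contingency table is just counting (row, column) pairs. The sketch below uses the coding from the slide (occupations 1 to 3, newspapers 1 to 4), but the observations in `pairs` and the helper `cross_tab` are hypothetical, not the survey data.

```python
def cross_tab(pairs, row_labels, col_labels):
    """Contingency table: table[r][c] = number of observations with row r and column c."""
    table = {r: {c: 0 for c in col_labels} for r in row_labels}
    for r, c in pairs:
        table[r][c] += 1
    return table


# Hypothetical (occupation, newspaper) observations, coded as on the slide.
pairs = [(1, 2), (1, 2), (1, 3), (2, 1), (2, 4), (2, 4), (3, 3), (3, 1), (3, 3)]
occupations, newspapers = [1, 2, 3], [1, 2, 3, 4]
table = cross_tab(pairs, occupations, newspapers)

print("occupation " + " ".join(f"{c:>3}" for c in newspapers) + "  total")
for r in occupations:
    counts = [table[r][c] for c in newspapers]
    print(f"{r:>10} " + " ".join(f"{n:>3}" for n in counts) + f"  {sum(counts):>5}")
```

When occupation groups have different sizes, comparing row percentages (each count divided by its row total) is usually more informative than comparing raw counts.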
◮ What do you think?
◮ When the two variables are both measured in quantitative scales, we may visualize them with a scatter plot.
◮ Consider the size of a house and its price in the IM city:
◮ We may switch
◮ Is there any relationship between size and price?
◮ Does the line fit our data?
◮ Whether there exists a “significant” relationship between two variables cannot be determined precisely just by looking at a graph.
◮ Relationships may also be nonlinear.
◮ A scientific way, regression, will be introduced in the Spring semester.
◮ For now, judge a scatter plot by intuition.
◮ Scatter plots are typically for two quantitative variables.
◮ Scatter plots can be drawn when one variable is qualitative.
◮ What if both variables are qualitative?
◮ There is NO standard way of making frequency distributions
◮ In drawing a graph, never forget:
  ◮ Caption.
  ◮ Captions and labels for the x- and y-axes.
  ◮ Unit of measurement.