Tables: Zero-dimensional data with position. R.W. Oldford, University of Waterloo.
Magnitude
Visual representations - labels
Recall from last day, the ordering of the elementary tasks from most accurate to least accurate:
1. Position along a common scale
2. Position on identical but nonaligned scales
3. Lengths (N.B. line segments were all oriented horizontally or vertically, though nonaligned)
4. Angle / slope (not close to 0, π/2, or π radians)
5. Area
6. Volume
7. Colour density, colour saturation
8. Colour hue
Missing from the list is the possibility of using “labels” which, provided there are not too many, were observed to work well with categorical data. E.g. “forty-two” or 42. Because it is simply read (by the trained person), using the number itself as a “label” would probably give the most accurate (and fastest) decoding!
Magnitude
Visual representations - numbers as labels
Which is faster to decode? To compare values?
Word numbers? Or symbolic numbers?
Magnitude
Visual representations - numbers as labels
Which is easier to compare values?
Alternatives shown: left-aligned, centred, right-aligned, decimal-aligned, rounded.
Magnitude
Visual representations - choosing positions
Which is easier to compare magnitudes?
Alternatives shown: unordered, ascending, descending.
Magnitude
Visual representations - choosing positions
Which is easier to compare magnitudes?
Alternatives shown: unordered, ascending, descending.
Magnitude
Visual representations - choosing positions
Horizontal versus vertical for comparing magnitudes?
Modern numerals
A surprisingly recent innovation
◮ Early European forms.
◮ Roman numerals indicate century.
◮ From G.F. Hill’s The Development of Arabic Numerals in Europe (1915), p. 28, as recorded in Cajori (1928), p. 49.
Modern numerals
Important Characteristics

These characteristics permit ease of visual reasoning within the written system, i.e. hand calculation. E.g. consider multiplication and long division, at least for ‘natural’ everyday numbers.
Modern numerals
Important Characteristics
Calculating-Table
by Gregor Reisch (c. 1467-1525). A woodcut illustration from Reisch’s Margarita Philosophica (1503) featuring Arithmetica instructing an “algorist” (Boethius) on the left and an “abacist” (Pythagoras) on the right.
These are two different types of arithmetic (Typus Arithmeticae).
Algorism: “technique of performing basic arithmetic by writing numbers in place value form and applying a set of memorized rules and facts to the digits.” (Wikipedia)
Modern numerals
Search for an ostensive definition
◮ From Cajori (1928), p. 65.
◮ Earnest but fanciful hypotheses, post-hoc rationalizing.
◮ “They serve merely as entertaining illustrations of the operation of a pseudo-scientific imagination, uncontrolled by all the known facts.” Cajori (1928), p. 68.
◮ They are not themselves an ostensive definition, or pictorial form.
◮ Rather, they are a learned symbolic abstraction.
◮ They must be distinguishable, easily learned, and easily constructed by pen and paper.
Modern numerals
Even positional representation is a surprisingly recent innovation
◮ The nine numerals are enhanced with powers of 10 by surrounding them with that number of zeros. ◮ From Christoff Rudolff (Augsburg, 1574?) Künstliche Rechnung mit der Ziffer as taken from Cajori (1928), p. 56.
Modern numerals
Cuneiform and positional representation
One of the earliest written number systems was cuneiform, used by the Babylonians (circa 2500 BCE). A wedge-shaped (cuneus in Latin) tipped reed is pressed into wet clay, and a single stroke indicates a single unit, the number 1. (There is no zero.) A simple visual representation with a standardized layout of the symbols. [Figure: cuneiform symbols for 1 through 9.] A compression is needed for larger numbers. [Figure: symbols for 10, 20, 30, 40, 50.] Ten is like two hands pressed together. Note the standardized layout, which ends at 50. [Figure: symbols for 10, 11, 12, …, 19.] This works nicely up until 59. After that, positional representation is used! E.g. 70 = one sixty followed by one ten.
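The positional idea can be sketched in code. This is a minimal Python illustration (not from the original slides; the helper name to_base60 is invented here): repeated division extracts the base-60 “digits”, most significant first.

```python
# Sketch: express a non-negative integer in Babylonian-style
# base-60 positional form, most significant digit first.
def to_base60(n):
    digits = []
    while n > 0:
        digits.append(n % 60)   # the current base-60 "digit"
        n //= 60                # shift one position to the left
    return list(reversed(digits)) or [0]

print(to_base60(70))  # [1, 10], i.e. one sixty plus ten
```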
Modern numerals
Important Characteristics
◮ fixed and small base
  ◮ few characters to learn
  ◮ other bases, e.g. 12, might be even better; Babylonian cuneiform uses base 60
◮ positional
  ◮ character encodes value
  ◮ position encodes magnitude (increasing right to left)
◮ standardized size and layout
  ◮ align in columns
  ◮ sequences separate into groups of 3

These characteristics allow numerals to be used dynamically as visual aids to reasoning:
◮ visually executed algorithms rely on position (e.g. multiplication, long division, . . . )
◮ natural grouping by position
◮ several-pass sorting (by first digit, then second, . . . )
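The several-pass sorting idea is essentially radix sort. A minimal Python sketch (an illustration, not part of the slides), sorting by least significant digit first; each pass reads only one positional digit:

```python
# LSD radix sort: bucket numbers by one digit per pass, least
# significant digit first. Positional notation makes each pass a
# simple lookup of a single digit.
def radix_sort(nums):
    if not nums:
        return []
    passes = len(str(max(nums)))           # digits in the largest number
    for d in range(passes):
        buckets = [[] for _ in range(10)]  # one bucket per digit 0-9
        for n in nums:
            buckets[(n // 10**d) % 10].append(n)
        nums = [n for bucket in buckets for n in bucket]
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```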
Tables of numbers
A Babylonian innovation
[Figures: (a) Balance sheet; (b) Pythagorean triplets.]
◮ Rows (sometimes unaligned) and columns, gridlines, indexing, headings.
◮ For reference. Static.
Modern Tables
Another layout of digits.
Tables are:
◮ for the record
◮ visual aids to reasoning.
The first is more prevalent, the second more important. Like other layouts of numbers, tables should take advantage of the visual characteristics of the digits they display.
Modern Tables
Example: Acidity of Ontario Lakes
Background: One of the most pressing environmental problems facing large areas of North America is acidic precipitation. All of southern Ontario receives a steady bombardment of acids, acid-forming gases, and associated pollutants. The acids come down with rain, snow, fog, and small particles in the air. Man’s activities are responsible for the large majority of these acids. Smelters and coal-fired electric generating stations, both in Canada and the United States, spew millions of tonnes of sulphur dioxide into the atmosphere annually. Cars, trucks, and trains contribute more millions of tonnes of nitrogen dioxides. These gases react with sunlight, oxygen, ozone, water, and other gases to form sulphuric and nitric acid: strong, corrosive acids. In unpolluted areas, rain and snow are naturally slightly acidic since carbon dioxide, a natural component of the atmosphere, dissolves in water to form weak carbonic acid. The water quality of lakes has developed in response to weathering processes induced by this weak acid. Rocks and minerals react with carbonic acid to form bicarbonate, which is found in natural waters everywhere. The complex biological communities in lakes, streams, and forests have adapted and evolved in equilibrium with these natural conditions and processes. However, acid rain has seriously disturbed this equilibrium. There are over 250,000 lakes in Ontario. Thousands have been affected by acid rain, many of them in the Muskoka-Haliburton region, where there is a substantial cottage and tourist industry.
Acidity of Ontario Lakes
Ontario Government Publication: “Acid Sensitivity Survey of Lakes in Ontario – 1989”.
Acidity of Ontario Lakes
◮ Rows ordered alphabetically by “County or District”.
◮ Lots of redundant information.
◮ Difficult to see patterns, if any.

Ontario Government Publication: “Acid Sensitivity Survey of Lakes in Ontario – 1989”.
Acidity of Ontario Lakes
Arrange by region, remove uninteresting redundancy.
Acidity of Ontario Lakes
Arrange by acidity.
Acidity of Ontario Lakes
Arrange by acidity, remove further redundancy, annotate.
Modern Tables
Conveying information visually
Modern tables:
◮ need no longer be for the record (databases are)
◮ should be displayed as visual aids to reasoning.
Like other layouts of numbers, tables should take advantage of the visual characteristics of the digits they display.
◮ Take advantage of rows.
◮ Align digits in columns.
◮ Show important individual numbers.
◮ Use white space to separate groups of numbers.
◮ Think hard about the information to be communicated.
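As a sketch of the alignment advice, here is a hypothetical Python snippet (the area names and values echo the sales example analysed below, but the formatting code is ours, not Ehrenberg's):

```python
# Right-align numbers with a fixed number of decimals so that digits
# of like magnitude fall in the same column.
rows = [("North", 97.62), ("South", 48.29), ("East", 75.23), ("West", 49.69)]
header = f"{'Area':<6}{'Q1':>8}"
lines = [header] + [f"{area:<6}{value:>8.2f}" for area, value in rows]
table = "\n".join(lines)
print(table)
```

Left-aligned or centred numbers would scatter the digit positions; right alignment with fixed decimals keeps units under units and tens under tens.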
Tables
Analysis
Consider the following table of “Sales data”:
TABLE 1.1 Data in Four Areas and Eight Three-Month Periods in 1969-1970.

        13-15  16-18  19-21  22-24  25-27  28-30  31-33  34-36
A       97.62  92.24 100.90  90.39  95.69  94.44  91.13  97.81
B       48.29  42.31  49.98  39.09  46.38  49.74  41.74  37.39
C       75.23  75.16 100.11  74.23  74.23  76.97  71.66  76.47
D       49.69  57.21  80.19  51.09  52.88  49.41  59.32  52.56
Analysis:
◮ Goal is to see some patterns in the data.
◮ Develop a summary description (“model”) for the pattern.
◮ Assess the agreement of the pattern with the data.

Source: A.S.C. Ehrenberg (1975) Data Reduction: Analysing and Interpreting Statistical Data.
Analysis of table data
Step 1
Separate row and column headings
        13-15  16-18  19-21  22-24  25-27  28-30  31-33  34-36
A       97.62  92.24 100.90  90.39  95.69  94.44  91.13  97.81
B       48.29  42.31  49.98  39.09  46.38  49.74  41.74  37.39
C       75.23  75.16 100.11  74.23  74.23  76.97  71.66  76.47
D       49.69  57.21  80.19  51.09  52.88  49.41  59.32  52.56

TABLE 1.1 Data in Four Areas and Eight Three-Month Periods in 1969-1970.
Lines and space.
Next: assign meaningful labels. The columns are 3-month periods, numbered in months from the start of 1968.
Analysis of table data
Step 2
Meaningful labels. Separate years.
                Quarters (1969)              Quarters (1970)
Area        1      2      3      4      1      2      3      4
North   97.62  92.24 100.90  90.39  95.69  94.44  91.13  97.81
South   48.29  42.31  49.98  39.09  46.38  49.74  41.74  37.39
East    75.23  75.16 100.11  74.23  74.23  76.97  71.66  76.47
West    49.69  57.21  80.19  51.09  52.88  49.41  59.32  52.56
Gridlines added to separate years and define the table. Table title unnecessary (a different one with different information might be added).
Analysis of table data
Step 3
Focus on 1969. Reduce to significant digits.
Quarters (1969), full precision:

Area        1      2      3      4
North   97.62  92.24 100.90  90.39
South   48.29  42.31  49.98  39.09
East    75.23  75.16 100.11  74.23
West    49.69  57.21  80.19  51.09

⇒ reduced to significant digits:

Area      1    2    3    4
North    98   92  101   90
South    48   42   50   39
East     75   75  100   74
West     50   57   80   51

It is a common mistake to present too many digits. The number of significant digits is typically 1, 2, or 3. It may require a change in the units of measurement being displayed (e.g. 100,000s of dollars rather than 1000s). Patterns?
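The digit-reduction step is just rounding. A Python sketch of it (values from the table above; none of them sits exactly on a .5, so the built-in round suffices):

```python
# Round each entry to the nearest integer before display, keeping
# only the two or three varying digits that carry the pattern.
data = {
    "North": [97.62, 92.24, 100.90, 90.39],
    "South": [48.29, 42.31, 49.98, 39.09],
    "East":  [75.23, 75.16, 100.11, 74.23],
    "West":  [49.69, 57.21, 80.19, 51.09],
}
rounded = {area: [round(v) for v in values] for area, values in data.items()}
print(rounded["North"])  # [98, 92, 101, 90]
```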
Analysis of table data
On “significant digits”
Because we are looking for patterns in the table, reducing to “significant” digits may not always mean the usual “scientifically significant digits”. For example, had the table been, say:

Area              1            2            3            4
North   12345097.62  12345092.24  12345100.90  12345090.39
South   12345048.29  12345042.31  12345049.98  12345039.09
East    12345075.23  12345075.16  12345100.11  12345074.23
West    12345049.69  12345057.21  12345080.19  12345051.09

We still would have wanted the table below:

Area      1    2    3    4
North    98   92  101   90
South    48   42   50   39
East     75   75  100   74
West     50   57   80   51

Because the first five significant digits are identical, any pattern in the table will be in the remaining digits. Simply subtract 12,345,000 from every entry and report a location of 12,345,000 to accompany the table.
Analysis of table data
Step 4
Looking for patterns. Get column and row sums.
Quarters (1969)
Area      1    2    3    4  Total
North    98   92  101   90    381
South    48   42   50   39    179
East     75   75  100   74    324
West     50   57   80   51    238
Total   271  266  331  254   1122

Patterns? Sums are on a different scale (magnitude) than the points within the table.
Analysis of table data
Step 5
Looking for patterns. Get column and row averages or medians or . . . .
Quarters (1969)
Area      1    2    3    4  Ave.
North    98   92  101   90    95
South    48   42   50   39    45
East     75   75  100   74    81
West     50   57   80   51    60
Average  68   67   83   64    70

Patterns? Quarterly averages don't differ much from the overall average. Area averages differ more. Individual area values also do not differ much from their area averages taken across quarters.
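The row and column averages can be reproduced with a few lines of Python (a sketch using the rounded table; half_up is our helper that rounds halves upward, matching the figures displayed):

```python
# Row (area) and column (quarter) averages of the rounded 1969 table.
table = {
    "North": [98, 92, 101, 90],
    "South": [48, 42, 50, 39],
    "East":  [75, 75, 100, 74],
    "West":  [50, 57, 80, 51],
}

def half_up(x):
    # round halves upward (e.g. 66.5 -> 67), as in the displayed table
    return int(x + 0.5)

row_ave = {a: half_up(sum(v) / len(v)) for a, v in table.items()}
col_ave = [half_up(sum(col) / len(col)) for col in zip(*table.values())]
grand = half_up(sum(sum(v) for v in table.values()) / 16)
print(row_ave)  # {'North': 95, 'South': 45, 'East': 81, 'West': 60}
print(col_ave)  # [68, 67, 83, 64]
print(grand)    # 70
```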
Analysis of table data
Step 6
Lack of variation across quarters is better seen if these figures align within columns.
Quarters (1969)
Area      1    2    3    4  Ave.
North    98   92  101   90    95
South    48   42   50   39    45
East     75   75  100   74    81
West     50   57   80   51    60
Average  68   67   83   64    70

Transposed:

Area 1969
         North  South  East  West  Ave.
Q1          98     48    75    50    68
Q2          92     42    75    57    67
Q3         101     50   100    80    83
Q4          90     39    74    51    64
Average     95     45    81    60    70

Note how much easier it is to scan down the North column to see the little variability in the leading digit. Exceptions now stand out more easily: the 100 in the East column and the 80 in the West. Similarly Q3 overall. More widely spaced rows would diminish this visual advantage.
Analysis of table data
Step 7
Column order is arbitrary. Reorder to better reveal patterns.
Area 1969
         South  West  East  North  Ave.
Q1          48     50    75     98    68
Q2          42     57    75     92    67
Q3          50     80   100    101    83
Q4          39     51    74     90    64
Average     45     60    81     95    70

Columns ordered from smallest to largest average. It is much simpler to see that the same ordering of the areas occurs in every quarter. We could also rearrange the rows, except that there is not much variation across quarters and the quarters have a meaningful time order. When rows are rearranged, place the largest values at the top; it is easier to look at the subtractive difference between rows.
Analysis of table data
Step 8
Could exclude exceptions in calculating row and column summaries.
Area 1969
         South  West  East  North  Ave.
Q1          48     50    75     98    68
Q2          42     57    75     92    67
Q3          50     80   100    101    83
Q4          39     51    74     90    64
Average     45    53*   75*    95   67*
∗ Excluding Q3 in West and East.
Exceptions coloured brown in the original. They could also be indicated with parentheses, e.g. (100).
Analysis of table data
How far have we come?
Compare where we started with where we ended.

Original 1969 data:

     13-15  16-18  19-21  22-24
A    97.62  92.24 100.90  90.39
B    48.29  42.31  49.98  39.09
C    75.23  75.16 100.11  74.23
D    49.69  57.21  80.19  51.09

Final form:

Area 1969
         South  West  East  North  Ave.
Q1          48     50    75     98    68
Q2          42     57    75     92    67
Q3          50     80   100    101    83
Q4          39     51    74     90    64
Average     45    53*   75*    95   67*
∗ Excluding Q3 in West and East.
Analysis of table data
Summary Description
Four areas: North 95, East 75, West 53, and South 45. There were two exceptionally high values in Q3 from the East and West regions, differing from their averages by about 25 units.

Area 1969
         South  West  East  North  Ave.
Q1          48     50    75     98    68
Q2          42     57    75     92    67
Q3          50     80   100    101    83
Q4          39     51    74     90    64
Average     45    53*   75*    95   67*
∗ Excluding Q3 in West and East.
Guidelines for constructing tables
Things to note . . . Ehrenberg (1975), p. 14.
1. Base rules
◮ Reduce the number of digits. (N.B. this could require finding a common location.) Mental arithmetic is more difficult with more than two significant (varying) digits.
◮ Figures to be compared should be close together.
◮ Use memorable, self-explanatory symbols and labels.
◮ Separate different types of items/groups with white space or gridlines.

2. Calculations
◮ Avoid introducing new variables or scales (e.g. totals) whenever possible.
◮ Use averages (or medians) to help focus the eye over the array.
◮ Note dramatically exceptional values and exclude them from pattern summary calculations.

3. If possible, swap rows and columns, and reorder rows and/or columns:
◮ Numbers that vary the least should appear in columns. Both regularities and exceptions are easier to spot.
◮ Rearrange rows so that large numbers appear above small numbers. Differences are then easier to detect, following the rules of subtraction.
◮ Rearrange columns so that averages are strictly decreasing (or increasing) from left to right. It is then easier to detect departures from this pattern within the table.
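The column-reordering guideline is mechanical. A minimal Python sketch, using the areas from the running example:

```python
# Order columns (areas) so their averages increase from left to right.
table = {
    "North": [98, 92, 101, 90],
    "South": [48, 42, 50, 39],
    "East":  [75, 75, 100, 74],
    "West":  [50, 57, 80, 51],
}
order = sorted(table, key=lambda area: sum(table[area]) / len(table[area]))
print(order)  # ['South', 'West', 'East', 'North']
```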
Analysis of table data
Modelling the sales figures
South 45, West 53, East 75, and North 95. This is the essential pattern that our analysis uncovered. These are, in some sense, our “model” numbers (i.e. reasonable summaries, but not the actual value for any quarter). We think they capture essential structure in these sales figures.

This sounds like a “pictorial form”: modelling entails the possibility of an underlying structure connecting the sales figures to our numerical picture. The model effectively says that the different quarters need not be considered. But not all quarters had these values; there was variation from these values.

Questions: What should we say about this variation? About this deviation from our model? Does it matter?

Answers: Model the deviation as well. Determine its characteristics.
Analysis of table data
Step 9
Use model for areas, and look at deviations from model.
Area 1969
         South  West  East  North  Ave.
Q1           3     -3     0      3     1
Q2          -3      4     0     -3     0
Q3           5     27    25      6    6*
Q4          -6     -2    -1     -5    -4
Average      0     0*    0*      0

* Excluding Q3 in West and East.

Model has South 45, West 53, East 75, and North 95. Comments? Column averages must be zero, since the model values are the column averages. Note rounding effects.
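Computing the deviations is a single subtraction per cell. A Python sketch, using the model and the rounded 1969 table:

```python
# Deviations of each quarter's value from the model average for its area.
model = {"South": 45, "West": 53, "East": 75, "North": 95}
table = {
    "South": [48, 42, 50, 39],
    "West":  [50, 57, 80, 51],
    "East":  [75, 75, 100, 74],
    "North": [98, 92, 101, 90],
}
deviations = {a: [v - model[a] for v in table[a]] for a in table}
print(deviations["West"])  # [-3, 4, 27, -2]  (27 is the Q3 exception)
```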
Analysis of table data
Step 9
Model has North 95, East 75, West 53, South 45. Deviations are:

Area 1969
         South  West  East  North  Ave.
Q1           3     -3     0      3     1
Q2          -3      4     0     -3     0
Q3           5     27    25      6    6*
Q4          -6     -2    -1     -5    -4
Average      0     0*    0*      0

* Excluding Q3 in West and East.

Summarize the size of the deviations using the average absolute deviation:

Area 1969   South  West  East  North  Ave.
Ave. Dev.       4    3*    0*      4     3

* Excluding Q3 in West and East.
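The average absolute deviation, with the two Q3 exceptions excluded, can be sketched as:

```python
# Average absolute deviation per area, excluding the Q3 exceptions
# in West and East (quarters are indexed 0-3, so Q3 is index 2).
deviations = {
    "South": [3, -3, 5, -6],
    "West":  [-3, 4, 27, -2],
    "East":  [0, 0, 25, -1],
    "North": [3, -3, 6, -5],
}
exclude = {("West", 2), ("East", 2)}

def avg_abs_dev(area):
    kept = [abs(d) for i, d in enumerate(deviations[area])
            if (area, i) not in exclude]
    return round(sum(kept) / len(kept))

print({a: avg_abs_dev(a) for a in deviations})
# {'South': 4, 'West': 3, 'East': 0, 'North': 4}
```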
Analysis of table data
Summary Description
Four areas: North 95, East 75, West 53, and South 45. Deviation is about 3 units about these averages, with no regular pattern. There were two exceptionally high values in Q3 from the East and West regions, differing from their averages by about 25 units. The table has itself become redundant and need no longer be included.
Add to our guidelines
Things to note . . . Ehrenberg (1975), p. 14.
1. Base rules
◮ Reduce the number of digits. Mental arithmetic is more difficult with more than two significant (varying) digits.
◮ Figures to be compared should be close together.
◮ Use memorable, self-explanatory symbols and labels.
◮ Separate different types of items/groups with white space or gridlines.

2. Calculations
◮ Avoid introducing new variables or scales (e.g. totals) whenever possible.
◮ Use averages (or medians) to help focus the eye over the array.
◮ Note dramatically exceptional values and exclude them from pattern summary calculations.

3. If possible, swap rows and columns, and reorder rows and/or columns:
◮ Numbers that vary the least should appear in columns. Both regularities and exceptions are easier to spot.
◮ Rearrange rows so that large numbers appear above small numbers. Differences are then easier to detect, following the rules of subtraction.
◮ Rearrange columns so that averages are strictly decreasing (or increasing) from left to right. It is then easier to detect departures from this pattern within the table.

4. Summarize irregular aspects of the data statistically, e.g. by average deviations from appropriate averages.
Generalizing
Making use of the model
We started with two years' data, 1969 and 1970, but only built a model based on the first of these. Can we apply what we learned about 1969 directly to 1970? If the model works for both years, we have in some sense validated it. The possibility of an underlying structure, described by our model, would seem to be connected with reality (cf. pictorial form). Applying the model amounts to building a final table for 1970 according to the same table organization. For 1970 this yields:

Area 1970
         South  West  East  North  Ave.
Q1          46     53    74     96    67
Q2          50     49    77     94    68
Q3          42     59    72     91    66
Q4          37     53    76     98    66
Average     44     54    75     95    67

which tells essentially the same story.
Regularity
Model consistency
Area 1969
         South  West  East  North  Ave.
Q1          48     50    75     98    68
Q2          42     57    75     92    67
Q3          50     80   100    101    83
Q4          39     51    74     90    64
Average     45    53*   75*    95   67*

* Excluding Q3 in West and East.

Area 1970
         South  West  East  North  Ave.
Q1          46     53    74     96    67
Q2          50     49    77     94    68
Q3          42     59    72     91    66
Q4          37     53    76     98    66
Average     44     54    75     95    67

Consistent results across both years.
Irregularity
Deviations
Area 1969
         South  West  East  North  Ave.
Q1           3     -3     0      3     1
Q2          -3      4     0     -3     0
Q3           5     27    25      6    6*
Q4          -6     -2    -1     -5    -4
Average      0     0*    0*      0

* Excluding Q3 in West and East.
Area 1970
         South  West  East  North  Ave.
Q1           2     -1    -1      1     0
Q2           6     -4     2     -1     1
Q3          -2      5    -3     -3    -1
Q4          -7     -1     1      3    -1
Average      0      0     0      0

No regular patterns; the average absolute deviation is about the same (≈ 3).
Deviations
Deeper examination
Averaging the deviations over the two years:

Average of Area 1969 & ’70
         South  West  East  North  Ave.
Q1           2     -2     0      2     1
Q2           2      0     1     -2     0
Q3           2     3*   -2*      2     1
Q4          -6     -1     0     -2    -2
Average      0      0     0      0

* Excluding Q3 in 1969.

Still no regular patterns; the average absolute deviation is about 2. (Note that 3/√2 ≈ 2.12.)
Generalizing
Essential features
1. Consistent model for averages (over both years):

Area    South  West  East  North  Ave.
1969       45   53*   75*     95   67*
1970       44    54    75     95    67

* Excluding Q3 in West and East (1969).

2. Consistently patternless, irregular deviations (over both years).
◮ Average deviation of zero. (Forced by the calculation.)
◮ Average absolute deviation the same over each year (about 3).
◮ Consistently patternless deviations over quarters and areas for each year and over both years.

It is this consistently irregular pattern of deviations that indicates the ability to generalize the consistent averages of areas to other, unseen, years. Analysis of our tabular model suggests that it can be used to predict.
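The predictive claim can be checked directly: apply the 1969 model to the 1970 figures and measure the average absolute deviation per area. A sketch (the data are the rounded 1970 table from above):

```python
# Use the 1969 model averages as predictions for 1970 and measure the
# fit by the average absolute deviation within each area.
model = {"South": 45, "West": 53, "East": 75, "North": 95}
data_1970 = {
    "South": [46, 50, 42, 37],
    "West":  [53, 49, 59, 53],
    "East":  [74, 77, 72, 76],
    "North": [96, 94, 91, 98],
}
aad = {a: sum(abs(v - model[a]) for v in vals) / len(vals)
       for a, vals in data_1970.items()}
print(aad)  # {'South': 4.25, 'West': 2.5, 'East': 1.75, 'North': 2.25}
```

The deviations average about 2 to 4 units per area, consistent with the irregular, roughly 3-unit scatter summarized above.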