Welcome to this second in the series of lectures on Managerial - - PDF document

▶

Feb 03, 2023 270 likes •512 views

Welcome to this second in the series of lectures on Managerial statistics. In this lecture we will look at the first topic which is Classification, Summarization and Presentation of Data In the previous lecture we looked at the Overview of

SLIDE 1

SLIDE 2

Welcome to this second in the series of lectures on Managerial statistics. In this lecture we will look at the first topic which is Classification, Summarization and Presentation of Data

SLIDE 3

In the previous lecture we looked at the Overview of Statistical Analysis. In this lecture we will look at Classification, Summarization and processing of data followed by the presentation and communication of the data. The techniques for gathering data will be covered later as it is closely linked to research methodology and design. We will assume that data has been collected to address a research question and learn the basics of data classification, summarization and presentation.

SLIDE 4

Lets assume that you are trying to establish a central fuel storage facility for a given geographical area because of the difficulties in sustaining many small ones. We would need to determine the optimal size for this large central fuel storage facility. Let us assume that you have collected daily off take of fuel in KL from the area for the past 5 financial years. Let us see how this data can be used to determine the optimal size for the central storage facility that needs to be built.

SLIDE 5

First Look at the data. The top row has days numbered from 1 to 31. These are the days of the month. The first column has the year, starting from the current year at the bottom, annotated as CY and working its way up till CY -5 and CY -5. CY-5 is not visible only because of the scroll state of the data sheet otherwise be assured it is there. Questions that would naturally occur to you are what is the maximum offtake, what does the average look like, How many days have there been heavy offtakes? Etc. The first thing to do is to use excel to answer these questions.

SLIDE 6

The maximum and the average are fairly straightforward to obtain using the functions max and average already built into excel. Just provide the complete data range and you should straightaway get the maximum and average values of this data set. We still have a question pending How many days have there been heavy offtakes? This question is answered by a frequency table and we will see how to construct it now.

SLIDE 7

A frequency table achieves condensation of data by classification. By classification of data, for example in the fuel off take case, we mean calculating the number of days the fuel off take was less than 100 KL, between 100-200 KL and so on. Each such slot is called a class and the processes of condensing the data into these classes is called data classification. (P) How many classes should we have? It turns out that the ideal number of classes' that a data set should be divided into, depends on the number of data points in the data set and is give by the expression k = 1+3.3*log(n) where k is the number of classes and n is the number of data points. This is a bit counter intuitive since one would expect it to depend

n the range of the data. But trust me there is formal proof for this although it is beyond

the scope of this course. (P) Generally the number of classes for typical data sets range between 5 and 15 classes

SLIDE 8

Now lets look at some simple terms related to Classes. (P) Lets Assume this straight line is the total range of the data. (P) The minimum and maximum are as indicated. (P) Let us assume that this data has been equally divided into 5 classes using the Strugess

approximation. These classes are now shown divided using the triangular markers. (P) The

upper and lower class limits for Class 2 is now shown on the figure. (P) The upper and lower class limits of all the five classes are now annotated. Notice other than the maximum and the minimum value each limit has a definition for both the upper and the lower class. For example the upper class limit of class 2 is also the lower class limit of class 3 and so on. (P) The class interval which is the difference between two class limits is now annotated. (P) The mid class mark or MCM is the mid point of the class interval and is obtained as the average of the upper and lower class limits. The MCM is now annotated as a mark in the center of each class. This mark in data is used for plotting purposes which we will see shortly.

SLIDE 9

Earlier we showed how to determine the maximum and average of the entire data set using the max and average built in functions in excel. In this slide note that the range $c$5:$AG $79 shown in the formulae is the entire range of the data. This could vary in your worksheet if you had inserted or altered the format and position of your worksheet. We know that in order to calculate the class interval we need to determine the range of the data and divide it with the ideal number of classes that this data set should be divided into. (P) So first we will determine the number of data points (n) which is required to calculate the ideal number of classes using the count function. This data set has 2282 data points. (P) Next we will calculate the ideal number of classes using the strugess approximation 1+ 3.3 * Log (n) where n is the number of data points that we determined to be 2282. The result is 12 classes. (P) We next determine the range of the data by obtaining the difference between the maximum value in the data set and the minimum value in the data set which turns out to be 998. (P) Dividing this range by the ideal number of classes we get the class interval as 83. Since this is a sort of odd figure we could select 100 KL as our class

interval. We are now ready to look at the anatomy of a frequency distribution table.

SLIDE 10

Now we can look at a frequency distribution table for classifying data. The table typically has seven columns. (P) The first two are the Lower Class Limits and the Upper Class limits of each class. (P) The next column is the Mid Class Mark for each class. These definitions we have already seen previously. (P) The next column is frequency of occurrence in each class. This is fairly self explanatory although the mechanics of extracting the frequency of occurrence in each class needs some explanation. The next three columns i.e. Relative Frequency, Cumulative Frequency and Cumulative Relative frequencies are all different ways of presenting the frequency data and are used for plotting purposes. (P) The relative frequency is the percentage of the frequency in the current class to the total of all the frequencies in the data set. The cumulative frequency is the addition of all relative frequencies including and below a certain class. The cumulative relative frequency is the proportion of the relative frequency as a percentage of the relative frequency in the current class to the total of all the relative frequencies in the data set.

SLIDE 11

Let us no build the frequency distribution table step by step and if you pay attention you will learn the trick straight away. (P) As we have seen earlier the first two columns of the frequency distribution table are the lower and upper class limits of each class. Remember we had chosen a class interval of 100 Kilo Liters. (P) The next comes the mid class mark which is essentially the average of the lower and upper class limits. (P) The next comes the column which has the most action, i.e. the frequency column which is the basic purpose of this table. Fortunately for us excel has a built in function called frequency. Simple select the complete column and start typing the frequency formula. It will automatically start typing in the top of the selected column. The frequency function, as you would expect requires two basic parameters which are the complete range of the data on the work sheet and the upper class limits. Feed these two ranges and hold the control and shift keys down while you press enter. This action will fill in the frequencies in the complete column. Viola you have already determined the frequencies. I will repeat this just in case it was a bit quick. Simple select the complete column and start typing the frequency formula. It will automatically start typing in the top of the selected column. The frequency function, as you would expect requires two basic parameters which are the complete range of the data on the work sheet and the upper class limits. Feed these two ranges and hold the control and shift keys down while you press enter. This action will fill in the frequencies in the complete column. (P) The next action is to simply total the frequencies to check that the total is same as the count figure that we got earlier. You can see now that they are the same. (P) The next action is to get the relative frequency in each class which is essentially the frequency of the class as a percentage of the total

frequencies. (P) The cumulative frequency is then obtained by simply adding up all frequencies below

the UCL of the given class. (P) The cumulative relative frequency is obtained by simply adding up all the relative frequencies below the UCL of a given class. We now have the completed the table and are ready to do some data presentation.

SLIDE 12

There are three basic graphs which are used to present the data summarized in a frequency distribution table. They are histograms, Frequency Polygons and Ogives. We will now look at them one by one.

SLIDE 13

A histogram is a typical bar graph (with no spacing between the bars) showing the classes

n the x-axis and the occurrences on the y axis.

SLIDE 14

Lets build this with excel (P) Go to the table and select the column corresponding to the frequencies in the Frequency distribution table. Select insert and the clustered column type

f chart. The chart gets inserted automatically. Position it conveniently. (P) Right click on

the chart and choose select data. (P) The select data source dialog appears. Click the edit button on the Horizontal Category axis section of the dialog and bring up the select dialog (P) Select both the LCL and UCL columns on the frequency distribution table. (P) Select the Edit button in the Legend Entries section and (P) change the name of the series to

Frequency. (P) Select axis titles for the primary horizontal axis and select a title of “Fuel

Uptake – Classes”. (P) Select vertical axis title and select a title of “Frequency”(p) Select the title box and change the name to “Histogram” (P) Right click on any of the bars and choose format data series(P) Reduce the gap width to zero (P) Select the fill tab in the same format dialog and check vary colors by point. Your Histogram is ready for display. (P) You could move this to a separate sheet using the move chart option and copy it into a Microsoft power point presentation for communication (P) Right click on the bars and add data labels to improve the readability of the histogram

SLIDE 15

Take a look at the histogram that as been built and see how well the patterns in the data are revealed. The histogram can now be used to start thinking of some management decisions in respect of the size of the main and back up facilities that need to be built.

SLIDE 16

A frequency polygon is a graph with a curve representing the frequency distribution. This is another form of presentation of the histogram data and comes in hand when comparing two frequency distributions.

SLIDE 17

In excel the frequency polygon can be directly obtained from the graph of the histogram

itself. (P) First we create a copy of the histogram chart sheet that we created earlier using

the move or copy dialog by right clicking on the Histogram chart tab. (P) Rename the histogram sheet as a frequency polygon using the rename dialog by right clicking on the Histogram(2) tab (P) Select the bars by a left click and bring up the format data series

ption using the right click (P) Un check the vary colors by point dialog. (P) Right click on

the chart and bring up the Change Series Chart Type Dialog (P) Select the line with markers chart type and press OK (P) The resultant graph is the frequency polygon.

SLIDE 18

The next data presentation technique is the Ogive. An ogive is a distribution curve where the frequencies are cumulative. Usually this curve is drawn in both directions so that the complementary curves intersect and form the shape of an Ogive which is a pointed arch.

SLIDE 19

In order to generate the data for a Ogive, we need to add another column to our frequency distribution table. That is the greater than column which contains the data of the number of frequencies greater than the lower class limit of each class. (p) The first row of this column contains the frequencies greater than the lower class limit of the lowest class which would always be the total frequencies of occurrence. In this case there are 2282

ccurrences in the data. (P) The next row onwards has the frequencies greater than the

lower class limit of the current class. (P) This is simply calculated by taking the frequencies greater than the LCL of the previous class and the frequency of the previous class.

SLIDE 20

Select the cumulative frequency column in the frequency distribution table including the heading and insert a graph of type line with markers. (P) Once inserted position and size the graph conveniently (P) Select the greater than column including the heading and copy using the right click context menu (P) Right click anywhere on the graph and select paste from the context menu (P) Viola the Ogive shows up on the graph. Now lets spruce up the graph a little bit (P) select the x-axis labels showing the classes and title of “ Class of frequencies of fuel uptake in KL” ,(p) Give a chart title “Ogive Chart” using the chart title dailog (P) Give y axis label “Frequency” using the y-axis title dialog in chart tools (P) Now you can move the chart to a separate sheet as we had done before for histograms and start using it in other communication tools such as Microsoft Power point. (P) The Ogive chart gives a good basis to discuss satisfaction levels for different levels of storage that might be constructed in the facility.

SLIDE 21

We will close this session with a summary of important points to remember while condensing data using frequency distributions. (P) The classes used as the basis

f the frequency distribution must be of equal width and should not overlap. They

should also cover the entire range of data chosen. This is technically called by the fancy phrase “Mutually Exclusive and Collectively Exhaustive. (P) Remember too many classes with not result in meaningful data condensation. At the same time too few classes will result in loss of information due to too much condensation of

data. (P) The number of classes in which the data is to be divided is determined

by the number of data points ‘n’ and not the range of data. This is give by the Strugess approximation k = 1 + 3.3. Log (n)

SLIDE 22

In this session classified, summarized and processed a collected data set in the form of a frequency distribution table. We then looked at the process of generating graphical representations of the summarized information for managerial communications in the form

f Histograms, frequency curves and Ogive chart. (P) In the next session onwards we will

begin looking at descriptive statistics assuming that the data corresponds to the whole population.

SLIDE 23