
SLIDE 1

Statistics for Business

Descriptive Statistics

Panagiotis Th. Konstantinou

MSc in International Shipping, Finance and Management, Athens University of Economics and Business

First Draft: July 15, 2015. This Draft: September 3, 2020.

P. Konstantinou (AUEB)

Statistics for Business – I September 3, 2020 1 / 20

SLIDE 2

Descriptive Statistics

Key Concepts

◮ A population is the collection of all items of interest or under investigation (N represents the population size).
◮ A sample is an observed subset of the population (n represents the sample size).
◮ A parameter is a specific characteristic of a population.
◮ A statistic is a specific characteristic of a sample.

[Figure: a population of items labelled a to z, with a sample (b, c, g, i, n, r, u, y) drawn from it.]

Values calculated using population data are called parameters; values computed from sample data are called statistics.
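To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each item a to z carries a numeric value equal to its position in the alphabet (a = 1, ..., z = 26), and it uses the sampled items b, c, g, i, n, r, u, y from the figure. The population mean µ is then a parameter, and the sample mean x̄ is a statistic.

```python
import string
from statistics import fmean

# Hypothetical values: each item a..z is worth its alphabet position (a=1, ..., z=26).
population = {letter: pos for pos, letter in enumerate(string.ascii_lowercase, start=1)}
sample_items = ["b", "c", "g", "i", "n", "r", "u", "y"]  # the sampled items in the figure

N = len(population)    # population size
n = len(sample_items)  # sample size

mu = fmean(population.values())                     # parameter: population mean
x_bar = fmean(population[k] for k in sample_items)  # statistic: sample mean

print(f"N = {N}, population mean mu = {mu}")
print(f"n = {n}, sample mean x_bar = {x_bar}")
```

The statistic x̄ (12.375) differs from the parameter µ (13.5) because it is computed from only 8 of the 26 items; a different sample would generally give a different x̄.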

SLIDE 3

Descriptive Statistics

Data Types

Data
◮ Categorical (defined categories or groups). Examples: marital status; "Are you registered to vote?"; eye color
◮ Numerical
  ◮ Discrete (counted items). Examples: number of children; defects per hour
  ◮ Continuous (measured characteristics). Examples: weight; voltage

SLIDE 4

Descriptive Statistics

Relationships Between Variables

Scatter plot: Cost per Day vs. Production Volume (horizontal axis: Volume per Day; vertical axis: Cost per Day)

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324

SLIDE 5

Descriptive Statistics

Describing Data Numerically

◮ Central Tendency: Arithmetic Mean, Median, Mode
◮ Variation: Variance, Standard Deviation, Coefficient of Variation, Range, Interquartile Range

SLIDE 6

Descriptive Statistics Measures of Central Tendency

Measures of Central Tendency

Overview:
◮ Mean: the arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
◮ Median: the midpoint of the ranked values; median position $= \frac{n+1}{2}$ position in the ordered data
  ◮ If the number of values is odd, the median is the middle number
  ◮ If the number of values is even, the median is the average of the two middle numbers
◮ Mode: the most frequently observed value
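The three definitions above can be written out directly. The following is a minimal Python sketch (the function names mean, median and modes are our own, not from the slides). The median follows the (n + 1)/2 position rule, and modes returns a list because a data set can have more than one most frequent value; that return type is a design choice of this sketch.

```python
from collections import Counter


def mean(values):
    """Arithmetic average: sum of the values divided by n."""
    return sum(values) / len(values)


def median(values):
    """Value at position (n + 1) / 2 of the ranked data (positions counted from 1)."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[(n + 1) // 2 - 1]  # odd n: the middle value
    mid = n // 2
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average of the two middle values


def modes(values):
    """All most frequently observed values (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


print(median([3, 1, 2]))     # odd n: middle value
print(median([4, 1, 3, 2]))  # even n: average of 2 and 3
print(modes([1, 2, 2, 3]))
```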

SLIDE 7

Descriptive Statistics Measures of Central Tendency

Measures of Central Tendency

Example

House prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (sum = $3,000,000)
◮ Mean: $3,000,000 / 5 = $600,000
◮ Median: middle value of the ranked data = $300,000
◮ Mode: most frequent value = $100,000

The single $2,000,000 house pulls the mean far above the median.
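The house-price example can be checked with Python's standard `statistics` module. This is a minimal sketch that uses only the five prices from the slide.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
mode_price = statistics.mode(house_prices)

print(f"Mean:   ${mean_price:,.0f}")    # Mean:   $600,000
print(f"Median: ${median_price:,.0f}")  # Median: $300,000
print(f"Mode:   ${mode_price:,.0f}")    # Mode:   $100,000
```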

SLIDE 8

Descriptive Statistics Measures of Central Tendency

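Shape of a Distribution

[Figure: three distributions: Left-Skewed (Mean < Median), Symmetric (Mean = Median), Right-Skewed (Median < Mean).]

Shape describes how the data are distributed. Measures of shape:

◮ Symmetric or skewed
◮ Left-skewed = negative skew (mass of the distribution concentrated on the right of the figure); right-skewed = positive skew (mass of the distribution concentrated on the left of the figure)
◮ Skewness coefficient:
$$SK = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{s^3},$$
where in this formula $s$ is the standard deviation computed with divisor $n$ (not $n-1$).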
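The skewness formula above translates directly into code. The following is a minimal Python sketch (the function name skewness is our own). It uses divisor n in both sums, matching the formula, and assumes the values are not all identical, because otherwise the denominator is zero.

```python
def skewness(values):
    """SK = m3 / m2**1.5, where m_k is the k-th central moment computed with divisor n."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # (1/n) * sum of squared deviations
    m3 = sum((x - x_bar) ** 3 for x in values) / n  # (1/n) * sum of cubed deviations
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3]))        # symmetric data: SK = 0
print(skewness([1, 1, 1, 10]))    # long right tail: SK > 0 (right-skewed)
print(skewness([1, 10, 10, 10]))  # long left tail: SK < 0 (left-skewed)
```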

SLIDE 9

Descriptive Statistics Measures of Variability

Measures of Variability

[Figure: two distributions with the same center but different variation.]

Variation: Variance, Standard Deviation, Coefficient of Variation, Range, Interquartile Range

Measures of variation give information on the spread or variability of the data values.
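The simplest of these measures, the range, is the largest value minus the smallest value. The following minimal Python sketch uses two small hypothetical data sets to illustrate "same center, different variation". The data and the function name data_range are ours, chosen to avoid shadowing Python's built-in range.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


narrow = [14, 15, 16]  # hypothetical data: mean 15
wide = [5, 15, 25]     # hypothetical data: mean 15

for name, values in (("narrow", narrow), ("wide", wide)):
    center = sum(values) / len(values)
    print(f"{name}: mean = {center}, range = {data_range(values)}")
```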

SLIDE 10

Descriptive Statistics Measures of Variability

Quartiles and IQR – I

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% of the values in each segment). We can find a quartile by locating the value at the appropriate position in the ranked data, where n is the number of observed values:

◮ First quartile position: $0.25(n+1)$
◮ Second quartile position: $0.50(n+1)$ (the median position)
◮ Third quartile position: $0.75(n+1)$

Example: Find the first quartile

◮ Sample ranked data: 11 12 13 16 16 17 18 21 22 (n = 9)
◮ Q1 is at position $0.25(9+1) = 2.5$ in the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13)
◮ So Q1 = 12.5
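The positional rule above can be sketched in Python as follows. This is a minimal sketch: quartile is our own helper, and fractional positions are handled by linear interpolation between neighbouring values, which generalizes the slide's "halfway between" step. The result agrees with the standard library's `statistics.quantiles`, whose default "exclusive" method also uses positions p(n + 1).

```python
import statistics


def quartile(ranked, p):
    """Value at (1-based) position p * (n + 1) of ranked data, interpolating linearly."""
    n = len(ranked)
    pos = p * (n + 1)
    lower = int(pos)      # whole-number part of the position
    frac = pos - lower    # fractional part of the position
    if lower < 1:         # position falls before the first value
        return ranked[0]
    if lower >= n:        # position falls after the last value
        return ranked[-1]
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # already ranked, n = 9
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)
print(statistics.quantiles(data, n=4))  # same three values
```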

SLIDE 11

Descriptive Statistics Measures of Variability

Quartiles and IQR – II

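We can reduce the influence of outliers by using the interquartile range (IQR):

◮ Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
◮ Interquartile range = 3rd quartile − 1st quartile: $IQR = Q_3 - Q_1$

Example: Sample ranked data: 12 30 45 57 70 (n = 5)

◮ Q1 = 30; Q2 = 45; Q3 = 57
◮ IQR = Q3 − Q1 = 57 − 30 = 27

These quartiles are the medians of the lower and upper halves of the data (each half including the overall median). The positional rule of the previous slide, $0.25(n+1)$ and $0.75(n+1)$, would instead give Q1 = 21 and Q3 = 63.5, so IQR = 42.5. Quartile conventions differ, so state which one you use.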
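The two quartile conventions can be compared side by side in the following minimal Python sketch. The function names iqr_halves and iqr_positional are our own. iqr_halves takes medians of the lower and upper halves, each half including the overall median when n is odd, and reproduces the slide's 27. iqr_positional uses `statistics.quantiles`, whose default "exclusive" method corresponds to the 0.25(n + 1) and 0.75(n + 1) positions.

```python
import statistics


def iqr_halves(values):
    """IQR with Q1/Q3 = medians of the lower/upper halves (each includes the median if n is odd)."""
    ranked = sorted(values)
    n = len(ranked)
    lower = ranked[: (n + 1) // 2]
    upper = ranked[n // 2 :]
    return statistics.median(upper) - statistics.median(lower)


def iqr_positional(values):
    """IQR with quartiles at positions 0.25(n+1) and 0.75(n+1)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q3 - q1


data = [12, 30, 45, 57, 70]
print(iqr_halves(data))      # 27
print(iqr_positional(data))  # 42.5
```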

SLIDE 12

Descriptive Statistics Measures of Variability

Variance

Population variance: the average of the squared deviations of the values from the mean,
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N},$$
where
◮ µ = population mean
◮ N = population size
◮ Xi = i-th value of the variable X

Sample variance: approximately the average of the squared deviations of the values from the sample mean (the sum is divided by n − 1 rather than n):
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1},$$
where
◮ x̄ = sample mean (average)
◮ n = sample size
◮ xi = i-th value of the variable X
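The only difference between the two formulas is the divisor. This minimal Python sketch uses our own function names and takes the data from the standard deviation example that follows. It computes both variances and matches the standard library's `statistics.pvariance` (divisor N) and `statistics.variance` (divisor n − 1).

```python
import statistics


def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [10, 12, 14, 15, 17, 18, 18, 24]
print(population_variance(data), statistics.pvariance(data))  # 130 / 8 = 16.25
print(sample_variance(data), statistics.variance(data))       # 130 / 7
```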

SLIDE 13

Descriptive Statistics Measures of Variability

Standard Deviation

Population standard deviation: the most commonly used measure of variation
◮ Shows variation about the mean
◮ Has the same units as the original data
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}}$$

Sample standard deviation: the most commonly used measure of variation
◮ Shows variation about the sample mean
◮ Has the same units as the original data
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$

SLIDE 14

Descriptive Statistics Measures of Variability

Standard Deviation

Example: Sample Standard Deviation Computation

Sample data (xi): 10 12 14 15 17 18 18 24, so n = 8 and the sample mean is $\bar{x} = 16$. The sample standard deviation is
$$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}} = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095.$$
This is a measure of the "average" scatter around the (sample) mean.
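The worked example can be reproduced step by step in Python. This minimal sketch prints the sum of squared deviations (36 + 16 + 4 + 1 + 1 + 4 + 4 + 64 = 130) and the resulting sample standard deviation, and cross-checks the result with `statistics.stdev`.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                             # sample mean: 16.0
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations: 130.0
s = math.sqrt(sum_sq_dev / (n - 1))               # sample standard deviation

print(f"sum of squared deviations = {sum_sq_dev}")
print(f"s = {s:.4f}")                             # s = 4.3095
print(f"statistics.stdev: {statistics.stdev(data):.4f}")
```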

SLIDE 15

Descriptive Statistics Measures of Variability

Comparing Standard Deviations

The smaller the standard deviation, the more concentrated the values are around the mean.

[Figure: three data sets plotted on a common axis from 11 to 21, all with mean 15.5: Data A (s = 3.338), Data B (s = 0.926, small standard deviation) and Data C (s = 4.570, large standard deviation).]

Same mean, different standard deviations.

SLIDE 16

Descriptive Statistics Measures of Variability

Coefficient of Variation

◮ Measures relative variation and is always expressed as a percentage (%)
◮ Shows variation relative to the mean
◮ Can be used to compare two or more sets of data measured in different units
$$CV = \frac{s_x}{\bar{x}} \cdot 100\%$$

Stock A:
◮ Average price last year = $50
◮ Standard deviation = $5
$$CV_A = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B:
◮ Average price last year = $100
◮ Standard deviation = $5
$$CV_B = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
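The stock comparison reduces to one division. The following is a minimal Python sketch: coefficient_of_variation is our own helper, and it assumes a positive mean, since the CV is not meaningful when the mean is zero or negative. Multiplying by 100 before dividing keeps both results exact here.

```python
def coefficient_of_variation(std_dev, mean):
    """CV in percent: standard deviation relative to the mean (assumes mean > 0)."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)   # Stock A: $5 / $50
cv_b = coefficient_of_variation(5, 100)  # Stock B: $5 / $100
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```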

SLIDE 17

Descriptive Statistics Empirical Rule

The Empirical Rule

If the data distribution is bell-shaped, then the interval:

◮ µ ± 1σ contains about 68% of the values in the population or the sample
◮ µ ± 2σ contains about 95% of the values in the population or the sample
◮ µ ± 3σ contains almost all (about 99.7%) of the values in the population or the sample

[Figure: bell-shaped curve centered at µ with the intervals µ ± 1σ (68%), µ ± 2σ (95%) and µ ± 3σ (99.7%) marked.]
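The percentages above are rounded probabilities for a normal distribution. Assuming "bell-shaped" means normal, the share of values within k standard deviations of the mean is erf(k / √2). This minimal Python sketch (prob_within is our own name) computes it with `math.erf`.

```python
import math


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {prob_within(k):.4f}")
```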

SLIDE 18

Descriptive Statistics Covariance and Correlation

Covariance

The covariance measures the strength of the linear relationship between two variables.

The population covariance:
$$Cov(X,Y) = \sigma_{XY} = \frac{\sum_{i=1}^{N}(X_i-\mu_X)(Y_i-\mu_Y)}{N}.$$
The sample covariance:
$$Cov(x,y) = s_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}.$$
◮ Only concerned with the strength of the relationship
◮ No causal effect is implied
◮ Cov(x, y) > 0: x and y tend to move in the same direction
◮ Cov(x, y) < 0: x and y tend to move in opposite directions
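The sample covariance formula can be applied to the cost and volume data from the "Relationships Between Variables" slide. This is a minimal Python sketch; sample_covariance is our own helper. The positive result reflects that cost per day tends to rise with volume per day.

```python
def sample_covariance(x, y):
    """s_xy: sum of products of paired deviations from the means, divided by n - 1."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]         # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]  # cost per day

print(f"s_xy = {sample_covariance(volume, cost):.4f}")  # > 0: move in the same direction
```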

SLIDE 19

Descriptive Statistics Covariance and Correlation

Correlation Coefficients

The correlation coefficient measures the relative strength of the linear relationship between two variables.

The population correlation coefficient:
$$Corr(X,Y) = \rho_{XY} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}.$$
The sample correlation coefficient:
$$Corr(x,y) = r_{xy} = \frac{Cov(x,y)}{s_x s_y}.$$
◮ Unit free and ranges between −1 and 1
◮ The closer to −1, the stronger the negative linear relationship
◮ The closer to 1, the stronger the positive linear relationship
◮ The closer to 0, the weaker the linear relationship (positive or negative)
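Dividing the covariance by both standard deviations removes the units. The following minimal Python sketch applies r = s_xy / (s_x s_y) to the cost and volume data from the "Relationships Between Variables" slide; pearson_r is our own helper. The value is close to +1, and negating one variable flips its sign.

```python
import math


def pearson_r(x, y):
    """Sample correlation r = s_xy / (s_x * s_y); the n - 1 divisors cancel."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]         # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]  # cost per day

r = pearson_r(volume, cost)
print(f"r = {r:.3f}")  # strong positive linear relationship
print(f"r with cost negated = {pearson_r(volume, [-c for c in cost]):.3f}")
```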

SLIDE 20

Descriptive Statistics Covariance and Correlation

Correlation Coefficients

Examples

[Figure: scatter plots of Y against X illustrating r = −1, r = −0.6, r = 0, r = +0.3, r = +1, and a further panel with r = 0.]
