Aims of these Lectures Discuss some basic statistical - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Aims of these Lectures

  • Discuss some basic statistical concepts/techniques.
  • Relate these to Economics.
  • Help you to help yourself.
  • These lectures are not, and are not intended to be, a substitute for the Applied Statistics for Economics and Business course.

slide-2
SLIDE 2

A Useful Website:

http://www.maths.murdoch.edu.au/units/statsnotes/

A useful statistics package:

MINITAB: Start > Networked Applications > General Software > Statistics and Graphing > Minitab 14

slide-3
SLIDE 3

Two Big Issues

  • Why are Some Countries Richer than Others?
  • An old issue in Economics: Adam Smith, ‘Wealth of Nations’, 1776
  • Why do Some Countries Grow Faster than Others?
  • Countries are richer now because they grew faster in the past, e.g. compare 2000 with 1500

slide-4
SLIDE 4

Neoclassical Growth Theory

  • Higher s (the saving rate) implies larger output per capita and a higher real wage rate in the long run.
  • Higher n (the population growth rate) implies lower output per capita and a lower real wage rate in the long run.
  • Countries which are far below their long-run equilibrium will grow faster than countries which are close to their steady state.

slide-5
SLIDE 5

The Penn World Tables

  • Panel data set of macroeconomic variables
  • 208 countries
  • 25 macro variables
  • 1950 to 2000
  • Many missing values

slide-6
SLIDE 6

Accessing the PWT

  • Available at http://www.pwt.econ.upenn.edu/
  • Select countries/years/variables
  • Choose the CSV option
  • Copy and paste the data into Notepad
  • Save as, e.g., “mydata.csv” (keep the quotation marks)
  • Open in Excel
  • (Alternatively: use Word and save as a text file.)

slide-7
SLIDE 7

Describing Data

  • Tabulate, List
  • Numerical Summary
  • Graphical Summary

slide-8
SLIDE 8

Frequency Table

  • Select a suitable set of class intervals bounded by class limits.
  • The class frequency is the number of data points in each interval.
  • The class mark is the midpoint of the class interval.
  • Class boundaries may differ from class limits (as a result of rounding).
  • The class size is the difference between the upper and lower class boundaries.
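The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the class width of 1000 and the sample values are assumptions made for the example, not part of the lecture or of the PWT data.

```python
from collections import Counter

def frequency_table(values, width, start=0):
    """Tabulate non-negative integer data into classes of equal width.

    Returns a list of (lower limit, upper limit, class mark, frequency).
    Classes that contain no observations are simply left out.
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in sorted(counts):
        lower = start + k * width
        upper = lower + width - 1      # upper class limit for integer data
        mark = (lower + upper) / 2     # class mark = midpoint of the interval
        table.append((lower, upper, mark, counts[k]))
    return table

# Hypothetical incomes, for illustration only (not PWT data).
sample = [383, 950, 1200, 1750, 1999, 2400, 3100, 3800, 5200]
for lower, upper, mark, freq in frequency_table(sample, width=1000):
    print(f"{lower}-{upper}  class mark = {mark}  frequency = {freq}")
```

With integer class limits such as 0-999 and 1000-1999, the class boundaries would sit halfway between adjacent limits (999.5, 1999.5, ...), so the class size is 1000.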

slide-9
SLIDE 9

Frequency Distributions

  • Suppose we have a sample of n observations.
  • The (absolute) frequency of any value is the number of times that value appears in the sample.
  • The relative frequency of a value is the proportion of the sample which has that value.
  • The empirical frequency distribution of a random variable is the sample analogue of its probability distribution.
  • It can be graphed by constructing a histogram.
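A minimal sketch of absolute and relative frequencies, assuming a small made-up sample of discrete values; the function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

def frequency_distribution(sample):
    """Return (absolute, relative) frequencies of each distinct value."""
    n = len(sample)
    absolute = Counter(sample)
    relative = {value: count / n for value, count in absolute.items()}
    return absolute, relative

# Made-up sample for illustration only.
sample = [2, 3, 3, 5, 3, 2, 4]
absolute, relative = frequency_distribution(sample)
for value in sorted(absolute):
    print(value, absolute[value], round(relative[value], 3))
```

The relative frequencies sum to one, which is what makes the empirical distribution comparable to a probability distribution.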

slide-10
SLIDE 10

World Distribution of Real GDP per capita 1960, 2000

Class          rgdp1960  rgdp2000
0-999             23        15
1000-1999         27        13
2000-2999         21         7
3000-3999         13        10
4000-4999          5         8
5000-5999          3         4
6000-6999          1         4
7000-7999          6         2
8000-8999          2         2
9000-9999          2         3
10000-10999        4         2
11000-14999        3         3
15000-19999                  6
20000-24999                 10
25000 +                      9

Note: Class Boundaries are, e.g. 2999.5-3999.5

slide-11
SLIDE 11

The Median

  • Smaller than 50% of the sample and larger than 50% of the sample.
  • Order the sample from smallest to largest; the median lies halfway up the order.
  • Let n be the sample size:
      if n is odd, the median is at observation (n+1)/2;
      if n is even, average the two values at n/2 and (n/2)+1.
  • A useful property: the median is insensitive to (changes in) extreme sample values.
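The odd/even rule can be written out directly. The sketch below uses made-up incomes and a hypothetical function name `median`; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(sample):
    """Median by the rule above: middle value, or mean of the two middle values."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]              # observation (n+1)/2, counting from 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # observations n/2 and (n/2)+1

# Made-up incomes for illustration only.
incomes = [400, 900, 1500, 2300, 14800]
print(median(incomes))

# Replacing the largest value by a far more extreme one leaves the median unchanged.
print(median(incomes[:-1] + [50000]))
```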

slide-12
SLIDE 12

Quantiles

  • The First Quartile, Q1, is larger than 25% of the sample values and smaller than 75%.
  • The Third Quartile, Q3, is larger than 75% of the sample values and smaller than 25%.
  • The Second Quartile, Q2, is the Median.
  • The Interquartile Range, Q3-Q1, is a robust measure of the variability of the sample data.
  • Other frequently used quantiles are deciles and percentiles.
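A short sketch of quartiles and the IQR using Python's standard `statistics.quantiles`. The sample is made up, and the interpolation rule is an assumption: statistical packages (Minitab included) may place Q1 and Q3 slightly differently in small samples.

```python
import statistics

# Made-up sample for illustration only.
sample = [383, 1076, 1500, 2305, 3000, 3970, 5200, 9000, 14877]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3, " IQR =", iqr)

# Deciles use the same call with n=10 (nine cut points).
deciles = statistics.quantiles(sample, n=10)
print(len(deciles), "decile cut points")
```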

slide-13
SLIDE 13

The Mean

  • Defined as

    $$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

  • The ‘centre of gravity’ of the distribution.
  • Sensitive to extreme values.
  • Gives each sample value the same ‘weight’, 1/n.

slide-14
SLIDE 14

Comparisons

Statistic   RGDP1960   RGDP2000
Mean            3332       9088
Q1              1076       1669
Median          2305       4361
Q3              3970      15900
IQR             2893      14231
Minimum          383        482
Maximum        14877      44009

slide-15
SLIDE 15

Graphical Methods

  • Stem and Leaf Plots
  • Box Plots
  • Bar Charts
  • Histograms

slide-16
SLIDE 16

Stem and Leaf Plots

Given a set of numbers:

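  • The leaf is the last digit considered.
  • The leaf unit specifies which digit.
  • The stem is the rest of the number.
  • The first column is the cumulative count of observations, from the nearer end of the sample, up to and including each stem.
  • On the row where the median occurs, the count for that row alone is enclosed in parentheses.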
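A minimal sketch of how stems and leaves are split off, assuming non-negative values and a leaf unit of 100. The function name `stem_and_leaf` and the sample are made up, and the count column that Minitab prints is omitted to keep the example short.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the rows of a simple stem-and-leaf display (no count column)."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)      # drop the digits below the leaf unit
        stem, leaf = divmod(scaled, 10)   # last remaining digit is the leaf
        stems[stem].append(leaf)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        rows.append(f"{stem:>3} | {leaves}")
    return rows

# Made-up values for illustration only.
sample = [383, 450, 1020, 1190, 2305, 14877]
for row in stem_and_leaf(sample, leaf_unit=100):
    print(row)
```

With a leaf unit of 100, the value 383 is shown as stem 0, leaf 3, i.e. somewhere in the range 300-399.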

slide-17
SLIDE 17

Stem and Leaf for World GDP, 1960

Count  Stem  Leaves
  23      0  34445556677778889999999
  50      1  000001111123334455666778899
 (21)     2  011223333344566667799
  40      3  0000122234489
  27      4  12669
  22      5  238
  19      6  8
  18      7  334778
  12      8  12
  10      9  26
   8     10  1469
   4     11  55
   2     12  4
   1     13
   1     14  8

N = 111, Leaf Unit = 100, i.e. the lowest rgdp is in the range 300-400 and the highest is in the range 14800-14900.

slide-18
SLIDE 18

Departures from Symmetry

  • Skewness: a measure of asymmetry of a distribution.
  • Skewness is zero for a symmetric distribution.
  • Positive skewness: long tail to the right, mean greater than median.
  • Negative skewness: long tail to the left, mean less than median.
  • Kurtosis: a measure of thickness of the tails.
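One common way to put numbers on these ideas is through standardised central moments. The sketch below assumes that moment-based convention (the slides do not give a formula), and packages often apply small-sample corrections, so reported values may differ slightly. Function names and samples are made up.

```python
def central_moment(sample, k):
    """k-th central moment: average of (x - mean)^k over the sample."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** k for x in sample) / n

def skewness(sample):
    """Third central moment scaled by the 3/2 power of the variance."""
    return central_moment(sample, 3) / central_moment(sample, 2) ** 1.5

def kurtosis(sample):
    """Fourth central moment scaled by the squared variance."""
    return central_moment(sample, 4) / central_moment(sample, 2) ** 2

# Made-up samples for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 20]
left_tail = [-20, 1, 2, 3, 4]

for name, data in [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]:
    print(name, round(skewness(data), 3), round(kurtosis(data), 3))
```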

slide-19
SLIDE 19

Box Plots

  • Indicate symmetry and variability of the sample values.
  • Measuring along the horizontal or vertical axis, draw a box with edges at Q1 and Q3 so its length is the IQR.
  • The width is up to you.
  • Draw a line across the box at the median value.
  • Draw lines - whiskers - from the box to the sample maximum and sample minimum values (excluding outliers).
  • Observations lying more than 1.5*IQR from the edges of the box are ‘outliers’ and are represented by asterisks.
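The numbers behind a box plot can be computed without drawing anything. This is a sketch under assumptions: the function name `box_plot_summary` and the sample are made up, and the quartiles come from Python's `statistics.quantiles`, whose interpolation may differ a little from Minitab's.

```python
import statistics

def box_plot_summary(sample):
    """Box edges, whisker ends and outliers under the 1.5*IQR rule."""
    q1, median, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in sample if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in sample if x < lower_fence or x > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": outliers,
    }

# Made-up incomes for illustration only; 30000 is deliberately extreme.
sample = [400, 900, 1100, 1500, 2300, 2900, 3500, 4000, 30000]
summary = box_plot_summary(sample)
print(summary)
```

In this made-up sample the upper whisker stops at the largest non-outlying value, and the extreme observation would be drawn as an asterisk.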

slide-20
SLIDE 20

Boxplot of World Income Distribution, 1960

[Figure: box plot of rgdp1960; axis from 2000 to 16000]

slide-21
SLIDE 21

The Evolution of World Income Distribution, 1960- 2000

[Figure: box plots of rgdp1960, rgdp1970, rgdp1980, rgdp1990 and rgdp2000; axis from 10000 to 50000]

slide-22
SLIDE 22

Bar Chart of World GDP per capita, 1960

[Figure: bar chart for 1960; frequency axis from 5 to 30; income classes along the other axis, ending in an open-ended 28000+ class]

slide-23
SLIDE 23

Bar Chart of World GDP per capita, 2000

[Figure: bar chart for 2000; frequency axis from 2 to 16; income classes along the other axis, ending in an open-ended 28000+ class]

slide-24
SLIDE 24

Histograms

Given a sample of size n,

  • 1. Select a number of classes - ‘bins’ - of equal width. Each sample value falls into one of the classes.
  • 2. Calculate the number of values in each class - the class frequency.
  • 3. Construct a bar graph where
        (a) the base of each bar is the class width
        (b) the height is the frequency for that class or the relative frequency for the class.
  • 4. A rule for bin width:

    $$h = \frac{2\,\mathrm{IQR}}{n^{1/3}}$$

Note: Sometimes useful to have unequal bin widths.
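As a rough check of the bin-width rule, the sketch below reads it as h = 2·IQR / n^(1/3) and plugs in the 1960 figures reported elsewhere in these slides (IQR 2893, minimum 383, maximum 14877, N = 111). The function name `bin_width` is hypothetical, and rounding the number of bins up is an assumption of this example.

```python
import math

def bin_width(iqr, n):
    """Bin width h = 2 * IQR / n^(1/3)."""
    return 2 * iqr / n ** (1 / 3)

# Summary figures for rgdp1960 as reported on the other slides.
iqr_1960 = 2893
n_1960 = 111
minimum, maximum = 383, 14877

h = bin_width(iqr_1960, n_1960)
num_bins = math.ceil((maximum - minimum) / h)   # enough bins to cover the range

print(f"bin width is roughly {h:.0f}, giving {num_bins} bins")
```

In practice the width would be rounded to a convenient figure before drawing the histogram.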

slide-25
SLIDE 25

Histogram of World GDP per capita, 1960

[Figure: histogram; x-axis rgdp1960 (2000 to 14000), y-axis Frequency (10 to 40)]

slide-26
SLIDE 26

Histogram of World GDP per capita, 2000

[Figure: histogram; x-axis rgdp2000 (10000 to 40000), y-axis Frequency (5 to 35)]

Note: Badly chosen class intervals.

slide-27
SLIDE 27

Convergence

Do Poorer Countries Grow Faster?

slide-28
SLIDE 28

Scatterplot of Growth, 1960-2000, vs Initial RGDP

[Figure: scatterplot; x-axis rgdp1960 (2000 to 16000), y-axis avgrowth (-0.02 to 0.06)]

slide-29
SLIDE 29

A Linear Relationship between Two Variables.

  • Choose Y as the dependent variable and X as the independent variable.

    $$Y = a + bX$$

  • Which a and b best represent the data?

slide-30
SLIDE 30

Fitting a Line to Data

  • Could join any two points but the line may be a long way from the others.
  • Any line drawn through the data generates a set of residuals, some positive, some negative.
  • The distance of a point from the line can be measured by the squared residual.
  • The Least Squares criterion: ‘minimise the sum of the squared residuals’.
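To make the criterion concrete, the sketch below scores two candidate lines on a made-up data set: one joining the first two points and one chosen by eye. The function names, the data and the hand-picked coefficients are all assumptions for illustration.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum over the sample of (Y - (a + b*X))^2 for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def line_through(p, q):
    """Intercept and slope of the straight line joining two points."""
    (x1, y1), (x2, y2) = p, q
    b = (y2 - y1) / (x2 - x1)
    return y1 - b * x1, b

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

# Candidate 1: join the first two points exactly.
a1, b1 = line_through((xs[0], ys[0]), (xs[1], ys[1]))
ssr_two_points = sum_squared_residuals(a1, b1, xs, ys)

# Candidate 2: a line chosen by eye to pass near all five points.
ssr_by_eye = sum_squared_residuals(0.1, 1.0, xs, ys)

print(ssr_two_points, ssr_by_eye)
```

The line through two points fits those two perfectly but scores worse overall; the least squares criterion picks the a and b with the smallest such sum.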

slide-31
SLIDE 31

The ‘Least Squares’ Coefficients

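$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \qquad a = \bar{Y} - b\bar{X}$$

where x_i = X_i - \bar{X} and y_i = Y_i - \bar{Y} are deviations from the sample means.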

slide-32
SLIDE 32

A Regression Worksheet

Obs        X      Y       x      y     xy      xx
1      15.75   0.94   10.25   0.66   6.77  104.97
2       3.49   0.08   -2.01  -0.21   0.42    4.05
3       4.74   0.13   -0.76  -0.15   0.11    0.58
4       5.49   0.36   -0.02   0.07   0.00    0.00
5       5.41   0.47   -0.09   0.18  -0.02    0.01
6       2.07   0.10   -3.44  -0.19   0.65   11.80
7       7.69   0.41    2.18   0.13   0.27    4.76
8       2.48   0.11   -3.02  -0.18   0.53    9.12
9       5.44   0.13   -0.07  -0.15   0.01    0.00
10      2.48   0.11   -3.02  -0.17   0.52    9.13
Sums   55.05   2.84    0.00   0.00   9.26  144.42
Means   5.50   0.28

b = 0.06
a = -0.07
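The worksheet can be reproduced in a few lines. The X and Y values below are copied from the worksheet as printed (two decimal places), so intermediate sums agree with it only up to rounding; the variable names are chosen for this sketch.

```python
# X and Y as printed in the worksheet (rounded to two decimals).
X = [15.75, 3.49, 4.74, 5.49, 5.41, 2.07, 7.69, 2.48, 5.44, 2.48]
Y = [0.94, 0.08, 0.13, 0.36, 0.47, 0.10, 0.41, 0.11, 0.13, 0.11]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the means: the worksheet's x and y columns.
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]

# Column sums: the worksheet's xy and xx totals.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

b = sum_xy / sum_xx          # slope
a = y_bar - b * x_bar        # intercept

print(f"means: X = {x_bar:.2f}, Y = {y_bar:.2f}")
print(f"sums:  xy = {sum_xy:.2f}, xx = {sum_xx:.2f}")
print(f"b = {b:.2f}, a = {a:.2f}")
```

The deviations sum to zero by construction, which is a quick check on the x and y columns.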

slide-33
SLIDE 33

Average Growth vs RGDP 1960: 30 Richest Countries

[Figure: scatterplot; x-axis rgdp1960 (5000 to 15000), y-axis avgrowth (0.00 to 0.04)]

slide-34
SLIDE 34

Average Growth vs RGDP 1960: 30 Poorest Countries

[Figure: scatterplot; x-axis rgdp1960 (300 to 1200), y-axis avgrowth (-0.01 to 0.05)]