Business Statistics CONTENTS The role of data The data matrix - - PowerPoint PPT Presentation



SLIDE 1

DATA

Business Statistics

SLIDE 2

▪ The role of data
▪ The data matrix
▪ Data types
▪ Aspects of data
▪ Obtaining data
▪ Further study

CONTENTS

SLIDE 3

Data refers to observed facts

▪ “there are 82 persons in this train”
▪ “the weight of this pizza is 283 grams”
▪ “this museum hosts paintings by Picasso”

Data helps

▪ to suggest theories (“pizzas with a high price are less popular”)
▪ to test hypotheses (“advertising increases sales”)
▪ to calibrate coefficients of theories (“𝑞 = 𝑎 − 𝑏𝑝, but what are 𝑎 and 𝑏?”)

THE ROLE OF DATA
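
Calibrating the coefficients of a linear theory can be sketched with ordinary least squares. The sketch below is only an illustration: the price and quantity figures are made up, and `fit_line` is a hypothetical helper name, not something from the course material.

```python
# Hypothetical observations (made-up numbers): price charged and quantity sold.
prices = [6.0, 8.0, 10.0, 12.0]
quantities = [90.0, 80.0, 70.0, 60.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(prices, quantities)
print(f"quantity = {intercept:.1f} + ({slope:.1f}) * price")
```

With these made-up numbers the fitted slope is negative, which matches a theory in which a higher price goes with lower sales.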

SLIDE 4

▪ Columns: variables (may have an identifying name like “age”)
▪ Rows: subjects/cases (may have an identifying name like “John”)
▪ Cells: observations
▪ Entire table: data matrix

THE DATA MATRIX

[Figure: example data matrix with one observation, one subject/case, and one variable marked]

SLIDE 5

THE DATA MATRIX

[Figure: annotated data matrix with labels for variable name, variable unit, numerical data, nominal data, ordinal data, binary data, missing observation, and subject name]
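
A data matrix can be sketched in plain Python as rows keyed by subject name, with one shared list of variable names. All values below are invented for illustration, and `observation` and `column` are hypothetical helper names.

```python
# Hypothetical data matrix: each row is a subject, each column a variable.
variables = ["age", "pet", "income_group", "right_handed"]
data_matrix = {
    "John": [34, "dog", "medium", 1],
    "Mary": [41, "cat", "high", 0],
    "Ahmed": [29, None, "low", 1],  # None marks a missing observation
}


def observation(subject, variable):
    """Return one cell: the value of `variable` for `subject`."""
    return data_matrix[subject][variables.index(variable)]


def column(variable):
    """Return all observations of one variable, in row order."""
    j = variables.index(variable)
    return [row[j] for row in data_matrix.values()]


print(observation("John", "age"))
print(column("pet"))
```

Here "age" is numerical, "pet" is nominal, "income_group" is ordinal, and "right_handed" is binary.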

SLIDE 6

Information to extract from a data matrix

▪ One variable
  ▪ mean age at inauguration
  ▪ odds of Republicans vs. Democrats
  ▪ univariate analysis
▪ Two variables
  ▪ association between handedness and party
  ▪ correlation between age and number of terms
  ▪ bivariate analysis
▪ Many variables
  ▪ predict terms as a function of height and handedness
  ▪ multivariate analysis

THE DATA MATRIX
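
The univariate and bivariate summaries above can be sketched as follows. The numbers are hypothetical and do not describe real presidents; `pearson` is an illustrative helper written out by hand so the formula is visible.

```python
from math import sqrt
from statistics import mean

# Hypothetical columns of a data matrix (made-up values).
ages = [57, 61, 54, 68]
terms = [2, 1, 1, 2]
parties = ["R", "D", "R", "R"]

# Univariate: one variable at a time.
mean_age = mean(ages)
odds_r_vs_d = parties.count("R") / parties.count("D")


def pearson(x, y):
    """Pearson correlation coefficient between two numerical variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Bivariate: two variables together.
r_age_terms = pearson(ages, terms)
print(mean_age, odds_r_vs_d, round(r_age_terms, 3))
```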

SLIDE 7

The data matrix can represent:

▪ all data (the population)
  ▪ a list of all US presidents
▪ a non-random selection of data
  ▪ a list of all US presidents since 1969
▪ a random selection of data (a sample)
  ▪ a subset of randomly picked presidents from the full list

▪ descriptive statistics is applicable to all three cases
▪ inferential statistics focuses on how to draw conclusions for a population on the basis of information on a random sample

THE DATA MATRIX
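
The three cases can be sketched on one list. The ages are hypothetical, and the seed is fixed only so that the sketch is reproducible.

```python
import random
from statistics import mean

# Hypothetical population: one made-up age per subject, in chronological order.
population = [57, 61, 54, 68, 51, 47, 55, 64, 69, 46]

# Non-random selection: simply the four most recent subjects.
non_random = population[-4:]

# Random selection (a sample): four subjects drawn without replacement.
rng = random.Random(42)  # fixed seed, an assumption made for reproducibility
sample = rng.sample(population, k=4)

# Descriptive statistics apply to all three cases.
for label, data in [("population", population),
                    ("non-random selection", non_random),
                    ("random sample", sample)]:
    print(label, round(mean(data), 1))
```

Only the mean of the random sample is a sound basis for inference about the population mean; the non-random selection may be systematically different.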

SLIDE 8

You find data on the body size of 5 men and 5 women. Organize these data in a data matrix.

EXERCISE 1

SLIDE 9

▪ Type of data
  ▪ categorical, numerical
▪ Countability
  ▪ discrete, continuous
▪ Range
  ▪ restricted, infinite, semi-infinite
▪ Coded
  ▪ numbers for text
▪ Recoded
  ▪ text for ranges of numbers (or ranges of texts)

ASPECTS OF DATA

SLIDE 10

Type of data

▪ categorical
  ▪ e.g., dog, cat, horse
▪ numerical (cardinal)
  ▪ e.g., 12, 45.29

Has consequences for:

▪ transformations (income per capita vs. car type per capita)
▪ statistical summaries (average income vs. average car type)

Special cases

▪ Likert scale (5- or 7-point scale: “strongly agree”, “somewhat agree”, etc.)
▪ binary variable (0/1, yes/no, Dutch/foreign)

ASPECTS OF DATA
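
Why the type of data matters for summaries can be sketched with a small helper that picks a summary by type. `summarize` is a hypothetical name, and choosing the mode for categorical data is one reasonable option, not the only one.

```python
from statistics import mean, mode


def summarize(values):
    """Return a (summary name, value) pair suited to the type of data."""
    numerical = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numerical:
        return ("mean", mean(values))  # an average makes sense for numbers
    return ("mode", mode(values))      # categories only have a most frequent value


print(summarize([12, 45.29]))
print(summarize(["dog", "cat", "dog", "horse"]))
```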

SLIDE 11

Countability

▪ discrete
  ▪ e.g., eggs
▪ (semi-)continuous
  ▪ e.g., waiting time

Has consequences for:

▪ recoding (“binning”)
▪ statistical summaries (modal income vs. median income)

ASPECTS OF DATA
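
For discrete data the mode is directly meaningful; for continuous data each value tends to occur only once, so the mode becomes informative only after binning. A minimal sketch with made-up numbers, assuming a bin width of 5 minutes:

```python
from statistics import median, mode

eggs_per_box = [6, 12, 6, 10, 6]              # discrete (hypothetical counts)
waiting_minutes = [2.3, 4.1, 4.8, 7.5, 12.0]  # continuous (hypothetical times)

modal_eggs = mode(eggs_per_box)       # most frequent count
median_wait = median(waiting_minutes)  # middle value of the sorted times

BIN_WIDTH = 5  # assumed bin width in minutes


def to_bin(minutes):
    """Recode a waiting time into a text label for its bin."""
    lower = int(minutes // BIN_WIDTH) * BIN_WIDTH
    return f"{lower}-{lower + BIN_WIDTH} min"


binned = [to_bin(t) for t in waiting_minutes]
modal_bin = mode(binned)  # the mode is meaningful again after binning
print(modal_eggs, median_wait, modal_bin)
```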

SLIDE 12

Range

▪ (semi-)infinite
  ▪ e.g., income
▪ restricted
  ▪ e.g., percentage of satisfied customers

Has consequences for:

▪ dealing with outliers (exceptional data points)

ASPECTS OF DATA

SLIDE 13

Coding

▪ replacing nominal categories by numbers
  ▪ e.g., Ford=1, Audi=2, Volkswagen=3, Opel=4
▪ replacing ordinal categories by numbers
  ▪ e.g., tiny=1, small=2, normal=3, big=4, huge=5

Has consequences for:

▪ preventing recording mistakes (e.g., Vlokswgaen)
▪ preparing for statistical calculations (SPSS, Stata, R, etc.)

ASPECTS OF DATA
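
Coding can be sketched as a lookup table, using the codes from the examples above. The helper name `encode` is hypothetical. A fixed table also catches recording mistakes, because a misspelled category has no code.

```python
# Code tables taken from the examples above.
CAR_CODES = {"Ford": 1, "Audi": 2, "Volkswagen": 3, "Opel": 4}           # nominal
SIZE_CODES = {"tiny": 1, "small": 2, "normal": 3, "big": 4, "huge": 5}   # ordinal


def encode(value, codes):
    """Replace a category by its number; reject anything not in the table."""
    if value not in codes:
        raise ValueError(f"unknown category: {value!r}")
    return codes[value]


print(encode("Audi", CAR_CODES))
print(encode("big", SIZE_CODES))

try:
    encode("Vlokswgaen", CAR_CODES)
except ValueError as err:
    print(err)  # the typo is caught instead of silently becoming a new category
```

For the ordinal table the order of the numbers carries meaning (tiny < small < ...); for the nominal table it does not.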

SLIDE 14

Recoding

▪ grouping categorical data
  ▪ e.g., “Volkswagen”+“Audi”+“Opel”=“German car”
▪ grouping numerical data
  ▪ e.g., 𝑥 ∈ [20.000, 25.000] = “middle income”

Has consequences for:

▪ statistical summaries (histograms, modal values)

ASPECTS OF DATA
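
Both kinds of recoding can be sketched as small functions. Only “German car” and “middle income” come from the slide; the labels “other car”, “low income” and “high income”, and the choice to include both endpoints of the income range, are assumptions made for illustration.

```python
GERMAN_BRANDS = {"Volkswagen", "Audi", "Opel"}


def recode_car(brand):
    """Group categorical data: several brands become one category."""
    return "German car" if brand in GERMAN_BRANDS else "other car"


def recode_income(income):
    """Group numerical data: a range of numbers becomes a text label."""
    if income < 20_000:
        return "low income"      # assumed label
    if income <= 25_000:
        return "middle income"   # range from the slide, endpoints assumed included
    return "high income"         # assumed label


print([recode_car(b) for b in ["Audi", "Ford", "Opel"]])
print([recode_income(x) for x in [18_500, 22_000, 31_000]])
```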

SLIDE 15

Coding of categories into numbers

ASPECTS OF DATA

SLIDE 16

Coding of categories into several binary variables

▪ using dummy variables (or dummies for short)
▪ 𝑛_dummies = 𝑛_categories (redundant!)
▪ 𝑛_dummies = 𝑛_categories − 1 (with omitted category)

ASPECTS OF DATA
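
Dummy coding with an omitted category can be sketched as below. `dummy_code` and the `is_` prefix are hypothetical names. With three categories and one omitted, two dummies remain; the omitted category is the case where all dummies are 0.

```python
def dummy_code(values, omitted):
    """Turn one categorical variable into (number of categories - 1) dummies."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != omitted]
    # One row per subject, one 0/1 entry per kept category.
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]


pets = ["cat", "dog", "rabbit", "dog"]  # hypothetical observations
rows = dummy_code(pets, omitted="cat")
for pet, row in zip(pets, rows):
    print(pet, row)
```

Adding a third dummy for the omitted category would be redundant, since its value always follows from the other two.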

SLIDE 17

Some pitfalls:

▪ missing data
  ▪ blank? 0? 99?
▪ treating coded categories or number-like categories as numbers
  ▪ e.g., if Volkswagen=1, Audi=2, BMW=3, is the average car in this street 1.92?
▪ units of data
  ▪ see Math course
▪ decimals
  ▪ see Math course

ASPECTS OF DATA
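
The first two pitfalls can be sketched with made-up numbers. The sentinel value 99 for a missing age is an assumption; any real data set should document its own convention.

```python
from statistics import mean, mode

MISSING = 99  # assumed code for "age unknown"
ages = [34, 41, MISSING, 29]  # hypothetical observations

naive_mean = mean(ages)  # wrongly treats 99 as a real age
valid_ages = [a for a in ages if a != MISSING]
clean_mean = mean(valid_ages)  # missing observation left out
print(naive_mean, round(clean_mean, 2))

# Coded categories: arithmetic is possible but meaningless.
car_codes = [1, 2, 3, 1, 1]  # hypothetical street, Volkswagen=1, Audi=2, BMW=3
meaningless = mean(car_codes)  # there is no "average car"
most_common = mode(car_codes)  # the modal category is a valid summary
print(meaningless, most_common)
```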

SLIDE 18

Describe the appropriate data characteristic (categorical, ordinal, nominal, numerical, continuous, discrete, dummy, etc.) for

  • a. body size (171, 184, etc.)
  • b. pet (cat, dog, rabbit)
  • c. righthandedness (0, 1)
  • d. income group (low, medium, high)
  • e. number of children (0, 1, 2, etc.)

EXERCISE 2

SLIDE 19

Typing

▪ from books, etc.

Downloading

▪ from online databases (like CBS)
▪ from general webpages (like Wikipedia)

OBTAINING DATA

SLIDE 20

Purchasing

▪ commercial databases

OBTAINING DATA

SLIDE 21

Generating

▪ from secondary sources
  ▪ combining multiple sources
▪ by primary research
  ▪ doing interviews
  ▪ doing observations
  ▪ doing experiments

OBTAINING DATA

SLIDE 22

▪ Doane & Seward 5/E, 2.1-2.2
▪ Tutorial exercises week 1 data

FURTHER STUDY