
Business Statistics: Data



  1. DATA Business Statistics

  2. CONTENTS
     ▪ The role of data
     ▪ The data matrix
     ▪ Data types
     ▪ Aspects of data
     ▪ Obtaining data
     ▪ Further study

  3. THE ROLE OF DATA
     Data refers to observed facts
     ▪ “there are 82 persons in this train”
     ▪ “the weight of this pizza is 283 grams”
     ▪ “this museum hosts paintings by Picasso”
     Data helps
     ▪ to suggest theories (“pizzas with a high price are less popular”)
     ▪ to test hypotheses (“advertising increases sales”)
     ▪ to calibrate coefficients of theories (“𝑟 = 𝑏 − 𝑐𝑞, but what are 𝑏 and 𝑐?”)
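To make "calibrating coefficients" concrete, the sketch below estimates 𝑏 and 𝑐 in 𝑟 = 𝑏 − 𝑐𝑞 from a handful of observations. The slide does not say which estimation method is meant; ordinary least squares is assumed here, and the data points and the function name `calibrate` are invented for illustration.

```python
# Hypothetical observations of q and r, invented for illustration only.
# They were generated exactly from r = 10 - 2q, so the fit should recover b=10, c=2.
q = [1.0, 2.0, 3.0, 4.0]
r = [8.0, 6.0, 4.0, 2.0]


def calibrate(q, r):
    """Least-squares estimates of b and c in the model r = b - c*q."""
    n = len(q)
    mean_q = sum(q) / n
    mean_r = sum(r) / n
    s_qq = sum((qi - mean_q) ** 2 for qi in q)
    s_qr = sum((qi - mean_q) * (ri - mean_r) for qi, ri in zip(q, r))
    slope = s_qr / s_qq          # slope of r on q; the model writes it as -c
    c = -slope
    b = mean_r - slope * mean_q  # intercept
    return b, c


b, c = calibrate(q, r)
print(f"b = {b:.2f}, c = {c:.2f}")
```

With real data the points would not lie exactly on a line, and the estimates would carry uncertainty; that is where inferential statistics comes in.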

  4. THE DATA MATRIX
     ▪ Columns: variables (may have an identifying name like “age”)
     ▪ Rows: subjects/cases (may have an identifying name like “John”)
     ▪ Cells: observations
     ▪ Entire table: data matrix
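The row/column/cell vocabulary can be sketched with plain Python containers. The subjects, variable names, and values below are hypothetical; statistical packages would normally store this in a dedicated table type.

```python
# A tiny data matrix: one row per subject/case, one column per variable.
# All names and values are made up for illustration.
data_matrix = {
    "John":  {"age": 34, "height_cm": 182},
    "Mary":  {"age": 29, "height_cm": 170},
    "Ahmed": {"age": 41, "height_cm": 176},
}

# A row: all observations for one subject.
row = data_matrix["John"]

# A column: one variable across all subjects.
age_column = [case["age"] for case in data_matrix.values()]

# A cell: a single observation (one subject, one variable).
cell = data_matrix["Mary"]["height_cm"]

print(row, age_column, cell)
```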

  5. THE DATA MATRIX
     [Figure: annotated data matrix. Labels: variable name, variable unit, subject name, missing observation, binary data, nominal data, ordinal data, numerical data]

  6. THE DATA MATRIX
     Information to extract from a data matrix
     ▪ One variable
        ▪ mean age at inauguration
        ▪ odds of Republicans vs. Democrats
        ▪ univariate analysis
     ▪ Two variables
        ▪ association between handedness and party
        ▪ correlation between age and number of terms
        ▪ bivariate analysis
     ▪ Many variables
        ▪ predict terms as a function of height and handedness
        ▪ multivariate analysis
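As a rough illustration of the difference between univariate and bivariate analysis, the sketch below computes a mean for one variable and a Pearson correlation for two. The numbers are invented and do not describe actual presidents; the helper names `mean` and `pearson` are assumptions of this example.

```python
from math import sqrt

# Hypothetical values for two variables measured on the same four subjects.
age = [50, 55, 60, 65]
terms = [1, 2, 1, 2]


def mean(xs):
    """Univariate summary: arithmetic mean of one variable."""
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Bivariate summary: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


print("mean age:", mean(age))
print("correlation age/terms:", round(pearson(age, terms), 3))
```

Multivariate analysis (predicting one variable from several others) follows the same idea but needs more machinery than fits in a short sketch.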

  7. THE DATA MATRIX
     The data matrix can represent:
     ▪ all data (the population)
        ▪ a list of all US presidents
     ▪ a non-random selection of data
        ▪ a list of all US presidents since 1969
     ▪ a random selection of data (a sample)
        ▪ a subset of randomly picked presidents from the full list
     ▪ descriptive statistics is applicable to all three cases
     ▪ inferential statistics focuses on how to draw conclusions for a population on the basis of information on a random sample
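The three cases can be mimicked on a made-up population. The sketch below assumes a population of 20 hypothetical subjects and uses Python's `random` module with a fixed seed (an arbitrary choice) so that the draw is repeatable.

```python
import random

# The population: every case we could possibly observe (hypothetical labels).
population = [f"subject_{i}" for i in range(1, 21)]

# A non-random selection: the last five cases, chosen by position, not by chance.
non_random_selection = population[-5:]

# A random sample: five cases drawn without replacement.
rng = random.Random(42)  # fixed seed, used only to make the example repeatable
sample = rng.sample(population, k=5)

print(non_random_selection)
print(sample)
```

Descriptive summaries can be computed on any of the three lists; only the random sample supports the usual inferential statements about the population.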

  8. EXERCISE 1
     You find data on the body size of 5 men and 5 women. Organize these data in a data matrix.

  9. ASPECTS OF DATA
     ▪ Type of data
        ▪ categorical, numerical
     ▪ Countability
        ▪ discrete, continuous
     ▪ Range
        ▪ restricted, infinite, semi-infinite
     ▪ Coded
        ▪ numbers for text
     ▪ Recoded
        ▪ text for ranges of numbers (or ranges of texts)

  10. ASPECTS OF DATA
     Type of data
     ▪ categorical
        ▪ e.g., dog, cat, horse
     ▪ numerical (cardinal)
        ▪ e.g., 12, 45.29
     Has consequences for:
     ▪ transformations (income per capita vs. car type per capita)
     ▪ statistical summaries (average income vs. average car type)
     Special cases
     ▪ Likert scale (5- or 7-point scale: “strongly agree”, “somewhat agree”, etc.)
     ▪ binary variable (0/1, yes/no, Dutch/foreign)

  11. ASPECTS OF DATA
     Countability
     ▪ discrete
        ▪ e.g., eggs
     ▪ (semi-)continuous
        ▪ e.g., waiting time
     Has consequences for:
     ▪ recoding (“binning”)
     ▪ statistical summaries (modal income vs. median income)
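Why countability matters for summaries can be seen in a small sketch: in continuous data nearly every value is unique, so a mode only becomes informative after binning, whereas a median can be taken directly. The waiting times, the 5-minute bin width, and the helper `bin_label` are assumptions made for this example.

```python
from statistics import median, mode

# Hypothetical waiting times in minutes (continuous data).
waiting_minutes = [2.5, 3.1, 4.8, 7.2, 7.9, 12.4]


def bin_label(x, width=5):
    """Recode a number into a bin label; each bin covers lower <= x < lower + width."""
    lower = int(x // width) * width
    return f"{lower}-{lower + width}"


# Binning turns the continuous variable into a categorical one.
binned = [bin_label(x) for x in waiting_minutes]

modal_bin = mode(binned)               # most frequent bin
median_wait = median(waiting_minutes)  # middle of the raw values

print(binned)
print("modal bin:", modal_bin, "| median:", median_wait)
```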

  12. ASPECTS OF DATA
     Range
     ▪ (semi-)infinite
        ▪ e.g., income
     ▪ restricted
        ▪ e.g., percentage of satisfied customers
     Has consequences for:
     ▪ dealing with outliers (exceptional data points)

  13. ASPECTS OF DATA
     Coding
     ▪ replacing nominal categories by numbers
        ▪ e.g., Ford=1, Audi=2, Volkswagen=3, Opel=4
     ▪ replacing ordinal categories by numbers
        ▪ e.g., tiny=1, small=2, normal=3, big=4, huge=5
     Has consequences for:
     ▪ preventing recording mistakes (e.g., Vlokswgaen)
     ▪ preparing for statistical calculations (SPSS, Stata, R, etc.)
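A minimal sketch of coding, using the slide's brand codes. The function name `encode` and the choice to reject unknown labels with an error are assumptions; the point is that a fixed code table catches misspellings at entry time.

```python
# Code table taken from the slide's example.
brand_codes = {"Ford": 1, "Audi": 2, "Volkswagen": 3, "Opel": 4}


def encode(brand):
    """Replace a nominal category by its numeric code; reject unknown labels."""
    if brand not in brand_codes:
        raise ValueError(f"unknown brand: {brand!r}")
    return brand_codes[brand]


observations = ["Audi", "Opel", "Ford"]  # hypothetical recorded data
coded = [encode(b) for b in observations]
print(coded)

# A misspelled label is caught instead of silently becoming a new category.
try:
    encode("Vlokswgaen")
except ValueError as err:
    print("rejected:", err)
```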

  14. ASPECTS OF DATA
     Recoding
     ▪ grouping categorical data
        ▪ e.g., “Volkswagen” + “Audi” + “Opel” = “German car”
     ▪ grouping numerical data
        ▪ e.g., 20,000 ≤ 𝑦 ≤ 25,000 = “middle income”
     Has consequences for:
     ▪ statistical summaries (histograms, modal values)
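Both kinds of recoding can be sketched as small lookup functions. The slide only defines the “middle income” range; the labels “low income”, “high income”, and “other” below are assumptions added so that every input gets a group.

```python
german_brands = {"Volkswagen", "Audi", "Opel"}


def recode_brand(brand):
    """Group categorical data: several brands become one category."""
    return "German car" if brand in german_brands else "other"  # "other" is assumed


def recode_income(y):
    """Group numerical data: a range of numbers becomes a text label."""
    if y < 20_000:
        return "low income"      # assumed label, not on the slide
    if y <= 25_000:
        return "middle income"   # range given on the slide
    return "high income"         # assumed label, not on the slide


print([recode_brand(b) for b in ["Audi", "Ford", "Opel"]])
print([recode_income(y) for y in [18_500, 22_000, 31_000]])
```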

  15. ASPECTS OF DATA Coding of categories into numbers

  16. ASPECTS OF DATA
     Coding of categories into several binary variables
     ▪ using dummy variables (or dummies for short)
     ▪ number of dummies = number of categories (redundant!)
     ▪ number of dummies = number of categories − 1 (with omitted category)
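The "categories − 1" rule can be sketched as follows. The helper `dummy_code` and the pet data are hypothetical; the omitted category is the one represented by all dummies being 0.

```python
def dummy_code(values, omitted):
    """Turn one categorical variable into 0/1 dummies, leaving out one category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != omitted]
    return [{c: int(v == c) for c in kept} for v in values]


pets = ["cat", "dog", "rabbit", "dog"]  # hypothetical observations
rows = dummy_code(pets, omitted="cat")

for pet, dummies in zip(pets, rows):
    print(pet, dummies)
```

Three categories yield two dummies here. Adding a third dummy for “cat” would be redundant, because its value always follows from the other two.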

  17. ASPECTS OF DATA
     Some pitfalls:
     ▪ missing data
        ▪ blank? 0? 99?
     ▪ treating coded categories or number-like categories as numbers
        ▪ e.g., if Volkswagen=1, Audi=2, BMW=3, is the average car in this street 1.92?
     ▪ units of data
        ▪ see Math course
     ▪ decimals
        ▪ see Math course
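The first two pitfalls can be reproduced in a few lines. The sentinel value 99, the ages, and the list of cars in the street are all hypothetical; the brand codes are the ones from the slide.

```python
from collections import Counter
from statistics import mean

# Pitfall 1: a missing value stored as 99 is silently treated as a real age.
MISSING = 99  # hypothetical sentinel for "not recorded"
raw_ages = [34, 99, 29, 41]
naive_mean = mean(raw_ages)                      # includes the sentinel
valid_ages = [a for a in raw_ages if a != MISSING]
clean_mean = mean(valid_ages)                    # sentinel excluded first

# Pitfall 2: averaging category codes produces a number without meaning.
brand_codes = {"Volkswagen": 1, "Audi": 2, "BMW": 3}
street = [1, 2, 3, 1, 2, 1]                      # hypothetical coded cars
meaningless_average = mean(street)

# For nominal data, a mode is a summary that still makes sense.
code_to_brand = {code: brand for brand, code in brand_codes.items()}
most_common_code, count = Counter(street).most_common(1)[0]
modal_brand = code_to_brand[most_common_code]

print(naive_mean, round(clean_mean, 2))
print(round(meaningless_average, 2), modal_brand)
```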

  18. EXERCISE 2
     Describe the appropriate data characteristic (categorical, ordinal, nominal, numerical, continuous, discrete, dummy, etc.) for
     a. body size (171, 184, etc.)
     b. pet (cat, dog, rabbit)
     c. right-handedness (0, 1)
     d. income group (low, medium, high)
     e. number of children (0, 1, 2, etc.)

  19. OBTAINING DATA
     Typing
     ▪ from books, etc.
     Downloading
     ▪ from online databases (like CBS)
     ▪ from general webpages (like Wikipedia)

  20. OBTAINING DATA
     Purchasing
     ▪ commercial databases

  21. OBTAINING DATA
     Generating
     ▪ from secondary sources
        ▪ combining multiple sources
     ▪ by primary research
        ▪ doing interviews
        ▪ doing observations
        ▪ doing experiments

  22. FURTHER STUDY
     ▪ Doane & Seward 5/E, 2.1-2.2
     ▪ Tutorial exercises week 1: data
