COMM 291 Midterm Review Session By Simon Roberts

Types of Variables
• Categorical Variable: Values fall into “bins” or categories
  • Binary: There are exactly 2 options (true/false)
  • Nominal: Categories are simply named, with no order (colours, shapes)
  • Ordinal: Categories have a specific order (Infant, Youth, Teen, Adult)
• Quantitative Variable: Values are measured numeric quantities
• Identifier Variable: Unique identifiers, such as a Social Insurance Number, Student ID or Amazon Tracking Number

Types of Variables – Activity
Student ID   Age   Tuition   Major              Height   Satisfaction Rating
12345        19    $6800     Finance            180 cm   Neutral
12346        20    $5900     Computer Science   168 cm   Extremely Likes

Surveys and Sampling
• Population: All individuals with a common characteristic you want to generalize about
• Sample: A “slice” of the population
• Parameter: A fact or characteristic about the population
• Statistic: A fact or characteristic about the sample

Biased Samples
• Nonresponse/Undercoverage: Members of the population are systematically excluded from the sample
  • Conducting a telephone survey during the day (excludes commuters)
• Voluntary Bias: Subjects who feel strongly self-select to participate
  • Common with hot-button issues (gun control, affirmative action, etc.)
• Convenience Bias: Choosing subjects based on whether they’re easy to survey
  • Standing at a mall and asking the first 50 people who agree to take part

Sampling Designs
• Simple Random Sample: Each individual has an equal chance of being selected
• Stratified Random Sample: Divide the population into homogeneous groups (strata) and select from each stratum
  • Divide by age group or political affiliation, and then sample within each group
• Cluster Random Sample: Divide the population into heterogeneous groups (clusters) and select a few clusters
  • Randomly select a few high schools to represent a district
• Systematic Sampling: Select every nth individual
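As a rough illustration of the four designs, the Python sketch below draws each kind of sample from a made-up population. The population of 1,000 student IDs, the faculty tags, the 50-student sections used as clusters, and all sample sizes are assumptions for illustration and do not come from the slides.

```python
import random

# Hypothetical population: 1,000 student IDs, each tagged with a made-up faculty.
random.seed(291)  # fixed seed so the sketch is reproducible
faculties = ["commerce", "engineering", "science", "arts"]
population = [(student_id, faculties[student_id % 4]) for student_id in range(1000)]

# Simple random sample: every individual has the same chance of selection.
srs = random.sample(population, 40)

# Stratified random sample: draw separately from each (homogeneous) stratum.
stratified = []
for faculty in faculties:
    stratum = [person for person in population if person[1] == faculty]
    stratified.extend(random.sample(stratum, 10))

# Cluster random sample: pick whole groups at random and survey everyone in them.
# The clusters here are hypothetical class sections of 50 students each.
sections = [population[i:i + 50] for i in range(0, len(population), 50)]
cluster = [person for section in random.sample(sections, 2) for person in section]

# Systematic sample: every nth individual after a random starting point.
step = 25
start = random.randrange(step)
systematic = population[start::step]

print(len(srs), len(stratified), len(cluster), len(systematic))
```

Note the difference in where the randomness sits: stratified sampling randomizes within every group, while cluster sampling randomizes over which groups are chosen.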

Sampling Designs Activity
UBC is interested in improving its food services on campus, so it wishes to sample its students. Identify the method of sampling.
1. There are 4 faculties (commerce, engineering, science, arts). Randomly select 20 students from each.
2. Randomly select one of the faculties and survey all the students in that faculty.
3. Stop every 10th person who enters the Nest on a Thursday.
4. Each student has a student ID. Randomly select 200 participants.

Simpson’s Paradox
The direction of association in the population may be the opposite of the direction of association in its relevant subgroups.

Describing Categorical Data - Activity
             Male   Female   Total
Finance      200    130      330
Accounting   260    280      540
Marketing    240    300      540
Total        700    710      1410
1. What percentage of students chose marketing?
2. What percentage of finance students are female?
3. What percentage of male students chose accounting?
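One way to approach the three questions is to decide which total belongs in the denominator: the grand total, a row total, or a column total. The sketch below uses the counts from the table; the variable names are made up for illustration.

```python
# Counts copied from the activity table: rows are majors, columns are gender.
counts = {
    "Finance": {"Male": 200, "Female": 130},
    "Accounting": {"Male": 260, "Female": 280},
    "Marketing": {"Male": 240, "Female": 300},
}

grand_total = sum(sum(row.values()) for row in counts.values())
male_total = sum(row["Male"] for row in counts.values())

# 1. Marginal percentage: marketing students out of all students.
pct_marketing = 100 * sum(counts["Marketing"].values()) / grand_total

# 2. Conditional on major: females out of finance students only.
finance_total = sum(counts["Finance"].values())
pct_female_given_finance = 100 * counts["Finance"]["Female"] / finance_total

# 3. Conditional on gender: accounting students out of males only.
pct_accounting_given_male = 100 * counts["Accounting"]["Male"] / male_total

print(f"{pct_marketing:.1f}%  {pct_female_given_finance:.1f}%  {pct_accounting_given_male:.1f}%")
```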

Displaying Quantitative Data
• Typically presented in a histogram, stem-and-leaf plot or boxplot
• Mean = “Center of Mass”
• Median = Middle
• Mode = Most Frequent Observation
• Range = Max – Min
• Standard Deviation = $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
• Variance = $\sigma^2$
• IQR = Q3 – Q1
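These summaries can be checked quickly with Python's standard `statistics` module. The data values below are hypothetical, chosen only so the arithmetic is easy to follow. `pstdev` and `pvariance` divide by N, matching the slide's population formula; the quartiles come from `quantiles`, and other tools (including Excel) may use a slightly different quartile convention.

```python
from statistics import mean, median, mode, pstdev, pvariance, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

center = mean(data)                  # "center of mass"
middle = median(data)                # middle value
most_frequent = mode(data)           # most frequent observation
value_range = max(data) - min(data)  # max - min
sigma = pstdev(data)                 # population SD: divides by N
variance = pvariance(data)           # sigma squared
q1, _, q3 = quantiles(data, n=4)     # quartile conventions differ between tools
iqr = q3 - q1

print(center, middle, most_frequent, value_range, sigma, variance, iqr)
```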

Histograms vs Stem-and-Leaf
Histogram
• Only shows distribution
• Excellent for displaying large datasets
Stem-and-Leaf
• Shows individual values
• Impractical for large datasets

Drawing Boxplots
1. Get a five-number summary (Max, Q3, Median, Q1, Min)
2. Calculate IQR
3. Find Inner Fences, but do not plot them
   1. Q3 + 1.5IQR
   2. Q1 – 1.5IQR
4. Grow whiskers to the most extreme values within the fences
5. Show outliers
   1. (convention uses ○ for outliers inside the outer fences and * for far outliers outside the outer fences)
6. Use Excel

Effect of Changing Values Activity
Suppose you’ve drawn a boxplot with the following data: Min = 10; Q1 = 20; Med = 35; Q3 = 45; Max = 85
There was an error and the max was actually only 75. How does this affect:
• The mean?
• The median?
• The range?
• The IQR?
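The fence arithmetic from the boxplot steps can be sketched with this five-number summary. The helper name `fences` is made up for illustration. The sketch only covers what the summary can tell us: the median and quartiles stay put because the largest value is still the largest, while the mean must drop (by 10 divided by the number of observations, which the summary does not give).

```python
def fences(q1, q3):
    """Return the (lower, upper) inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

minimum, q1, median, q3 = 10, 20, 35, 45
lower, upper = fences(q1, q3)

results = {}
for maximum in (85, 75):  # recorded max first, then the corrected max
    value_range = maximum - minimum
    is_outlier = maximum > upper  # beyond the upper inner fence?
    results[maximum] = (value_range, q3 - q1, is_outlier)
    print(f"max={maximum}: range={value_range}, IQR={q3 - q1}, "
          f"upper fence={upper}, max is an outlier: {is_outlier}")
```

With the recorded max of 85, the point falls beyond the upper fence and would be drawn as an outlier; the corrected max of 75 sits inside the fence, so the whisker would simply extend to it.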

Scatterplots, Correlation and Linear Regression
• Correlation (r): How strong is the linear clustering around a line?
• Only for quantitative data with a linear pattern
• Use “Association” for categorical variables, as it is the more general term
• −1 ≤ r ≤ 1
• Correlation is unitless
• The correlation of variables X and Y = the correlation of variables Y and X
• Correlation does not necessarily imply causation.
• Lurking variable: a third variable that influences both X and Y
• Extrapolation: extending results beyond the range of data provided

Lurking Variables and Extrapolation

How to find a regression line
• Slope of the estimated regression equation: $b_1 = r \left( \frac{s_y}{s_x} \right)$
• Predicted value from regression equation: $\hat{y} = b_0 + b_1 x$
• R-Squared ($r^2$) is the % of variation in the y value that the model can explain
• Always takes on values from 0 to 1 inclusive
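A minimal sketch of these formulas on a tiny invented dataset follows. The data values and the helper name `predict` are assumptions for illustration. The intercept formula b0 = ȳ − b1·x̄ is not on the slide; it is the standard least-squares result that the line passes through the point of means.

```python
from statistics import mean, stdev

# Hypothetical (x, y) observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# Correlation r: sum of products of deviations, scaled by (n - 1) * s_x * s_y.
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((len(x) - 1) * s_x * s_y)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # intercept: line passes through (x_bar, y_bar)
r_squared = r ** 2        # share of variation in y explained by the model

def predict(x_new):
    """Predicted value y-hat = b0 + b1 * x."""
    return b0 + b1 * x_new

print(round(b1, 3), round(b0, 3), round(r_squared, 3))
```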

Residual Plots
• Residual = Observed – Predicted at each point
• Good fit if the residuals form a symmetric horizontal band around residual = 0
• If the pattern is curved, then the data does not follow a linear trend
• If the residuals create a linear trend, there’s either a problem with your algebra or you need to take a root of the observations

Residual Plots - Example
• Homoscedastic = Constant variance around the model. This is good.
• Heteroscedastic = Non-constant variance around the model. This violates several assumptions about linear inference later on.
• Bias = Residuals form a line/curve pattern around residual = 0. Indicator that the linear model is not a good fit OR the data should be transformed

Correlation and Linear Regression Activity
• Is there a relationship between an NFL team’s total spending (in millions) on player salary and its league performance? A linear model predicting Wins (out of 16 regular season games) is shown below:
  $\hat{w} = -16.32 + 0.219s$
A. What is the explanatory variable? What is the response variable?
B. What does the slope mean? What does the y-intercept mean?
C. Does a team that spends 130 million and wins 13 games over- or underperform the model’s prediction?
D. The residual SD is 3 games. How practical is this model?
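Part C comes down to the sign of one residual. The sketch below plugs the activity's numbers into the stated model; the function name `predicted_wins` is made up for illustration.

```python
def predicted_wins(spending_millions):
    """Predicted wins from the activity's model: w-hat = -16.32 + 0.219 * s."""
    return -16.32 + 0.219 * spending_millions

spending, actual_wins = 130, 13
prediction = predicted_wins(spending)
residual = actual_wins - prediction  # residual = observed - predicted

print(f"predicted = {prediction:.2f} wins, residual = {residual:+.2f}")
```

A positive residual means the team won more games than the model predicts. Comparing the size of that residual with the residual SD of 3 games is one way to start on part D.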

Combining Random Variables
E(X) = Expected Value of a Random Variable (think mean)
σ = Standard Deviation for Random Variables
E(X ± Y) = E(X) ± E(Y). Does not require independence
E(aX) = aE(X)
Var(aX) = $a^2$ Var(X)
SD(aX) = |a| SD(X)
Var(X ± Y) = Var(X) + Var(Y). Requires Independence!
SD(X ± Y) = $\sqrt{Var(X) + Var(Y)}$. Requires Independence!

Combining Random Variables Activity Variable B has an expected value of 9.6 and an SD of 0.8. Variable C has an expected value of 30 and SD of 2.2. Find E(24B + C) and SD(24B + C)
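A sketch of the calculation, under two assumptions that the activity does not state outright: B and C are independent (needed for the variance rule), and 24B means a single value of B multiplied by 24 rather than the sum of 24 separate draws of B.

```python
from math import sqrt

e_b, sd_b = 9.6, 0.8   # expected value and SD of B
e_c, sd_c = 30, 2.2    # expected value and SD of C
a = 24                 # multiplier on B

expected = a * e_b + e_c              # E(aB + C) = a*E(B) + E(C)
variance = a**2 * sd_b**2 + sd_c**2   # Var(aB + C) = a^2*Var(B) + Var(C), assuming independence
sd = sqrt(variance)                   # add variances first, then take the square root

print(round(expected, 1), round(sd, 2))
```

Standard deviations are never added directly; the work is done on the variance scale and converted back at the end.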

Normal Distribution + Empirical Rule
• Informally known as a “bell” curve
• Standard Deviations are measures of spread
• We use Z-Scores to standardize different observations
• What is the total area under the curve?
• Does this curve extend beyond 3 SD?

Finding Probabilities Activity
Calculate $Z = \frac{x - \mu}{\sigma}$ or use NORM.DIST(x, mu, sigma, true)
Given $\mu = 85$ and $\sigma = 15$, calculate the following:
1. X < 90
2. X > 105
3. 80 < X < 100
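Outside Excel, the same areas can be found with `statistics.NormalDist` from Python's standard library, whose `cdf` method returns the area to the left of a value in the same way as NORM.DIST with the cumulative flag set to true. This is a sketch of one way to check the activity, not the method prescribed by the slides.

```python
from statistics import NormalDist

scores = NormalDist(mu=85, sigma=15)

p_below_90 = scores.cdf(90)                    # P(X < 90): area to the left
p_above_105 = 1 - scores.cdf(105)              # P(X > 105): complement of the left area
p_between = scores.cdf(100) - scores.cdf(80)   # P(80 < X < 100): difference of two left areas

print(f"{p_below_90:.4f}  {p_above_105:.4f}  {p_between:.4f}")
```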

Finding X Activity
Use NORM.INV(p, mu, sigma) to return the value of x such that the area to the left of x is p
Given a test where $\mu = 75$ and $\sigma = 9$, calculate the following:
1. What score will put you in the top 5% of the class?
2. What score will put you in the bottom 30%?
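The Python counterpart of NORM.INV is `NormalDist.inv_cdf`, which also takes the area to the left. The main trap in this activity is that "top 5%" must first be converted to a left-hand area of 0.95. A sketch:

```python
from statistics import NormalDist

test = NormalDist(mu=75, sigma=9)

top_5_cutoff = test.inv_cdf(0.95)      # top 5% means 95% of the area lies to the left
bottom_30_cutoff = test.inv_cdf(0.30)  # bottom 30% is already a left-hand area

print(f"{top_5_cutoff:.2f}  {bottom_30_cutoff:.2f}")
```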

Central Limit Theorem
• The mean of a random sample has a sampling distribution that is approximated by a normal distribution
• Larger samples = better approximation!
• Has implications for probabilities for sample proportions and sample means
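A small simulation can make the theorem concrete. The sketch below assumes a right-skewed exponential population with mean 1 and SD 1 (an arbitrary choice for illustration), draws 2,000 samples at each of three sample sizes, and compares the spread of the sample means with σ/√n.

```python
import random
from statistics import mean, stdev

random.seed(291)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return mean([random.expovariate(1.0) for _ in range(n)])

results = {}
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(2000)]
    results[n] = (mean(means), stdev(means))
    # Columns: n, centre of the sample means, their spread, and sigma / sqrt(n).
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(1 / n ** 0.5, 3))
```

The centre of the sample means stays near the population mean while their spread shrinks roughly like 1/√n; a histogram of `means` would also look more bell-shaped as n grows.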

Sampling Distributions for Proportions
• Only for binary categorical data
• Sample Proportion is $\hat{p}$, Population Proportion is $p$
• $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
• $Z = \frac{\hat{p} - p}{SD(\hat{p})}$
• Assumptions: 10% Condition, Success/Failure, Independence, Sample Size
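A sketch of these formulas with invented numbers: the population proportion of 0.40, the sample size of 100, and the sample proportion of 0.46 are all hypothetical. The Success/Failure check uses the common textbook threshold of at least 10 expected successes and 10 expected failures, which is an assumption about how the slide's condition is applied.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 100   # hypothetical population proportion and sample size
p_hat = 0.46       # hypothetical sample proportion
q = 1 - p

# Success/Failure condition: at least 10 expected successes and 10 expected failures.
assert n * p >= 10 and n * q >= 10

sd_p_hat = sqrt(p * q / n)             # SD(p-hat) = sqrt(pq / n)
z = (p_hat - p) / sd_p_hat             # z-score of the observed sample proportion
p_at_least = 1 - NormalDist().cdf(z)   # P(p-hat >= 0.46) under the normal model

print(f"SD = {sd_p_hat:.4f}, z = {z:.3f}, P = {p_at_least:.4f}")
```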

Sampling Distributions for Means
• Only for quantitative data
• Sample Mean is $\bar{x}$, Population Mean is $\mu$
• Standard Error: $SD(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
• $Z = \frac{\bar{x} - \mu}{SD(\bar{x})}$
• If the population is normal, the sampling distribution of the sample mean is normal
• If the population is not normal, but conditions are met, then the sampling distribution will be approximately normal by the central limit theorem (same conditions as proportions)
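A sketch for sample means, reusing the test-score parameters from the Finding X activity (μ = 75, σ = 9). The sample size of 36 and the sample mean of 78 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 75, 9   # population parameters from the Finding X activity
n = 36              # hypothetical sample size
x_bar = 78          # hypothetical sample mean

standard_error = sigma / sqrt(n)    # SD(x-bar) = sigma / sqrt(n)
z = (x_bar - mu) / standard_error   # z-score of the sample mean
p_above = 1 - NormalDist(mu, standard_error).cdf(x_bar)  # P(x-bar > 78)

print(f"SE = {standard_error:.2f}, z = {z:.2f}, P = {p_above:.4f}")
```

An individual score of 78 is only a third of a standard deviation above the mean, but a sample mean of 78 across 36 students is two standard errors above it, which is why averages are far less variable than single observations.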
