Data Management - PowerPoint PPT Presentation



SLIDE 1

Data Management

“Not everything that can be counted counts, and not everything that counts can be counted.”

Albert Einstein (Physicist)

SLIDE 2

Golden rules for data tables

  • 1. A row represents a unit

– All measurements of a unit should normally be in the same row.
– Different units must be in different rows.
– Important to think about what your units are.

SLIDE 3

Golden rules for data tables

  • 2. If in doubt, add more rows

– If possible, use categorical (character) variables to indicate the independent effects (treatments, environments).
– Repeated measurements (e.g. time series data) normally get individual rows (e.g. time is added as a column).
– It is always easy to convert a long table to a wide table (Excel Pivot), but not vice versa.
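The long-to-wide conversion mentioned above can be sketched in a few lines. This is a minimal illustration in Python (the course itself uses R and SAS); the plot names, weeks, heights, and the `long_to_wide` helper are hypothetical and not from the slides.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (plot, week) measurement.
long_rows = [
    {"plot": "A", "week": 1, "height_cm": 10.2},
    {"plot": "A", "week": 2, "height_cm": 12.9},
    {"plot": "B", "week": 1, "height_cm": 9.8},
    {"plot": "B", "week": 2, "height_cm": 11.5},
]

def long_to_wide(rows, unit_key, column_key, value_key):
    """Pivot long rows into one dict per unit, with one column per level of column_key."""
    wide = defaultdict(dict)
    for row in rows:
        new_column = f"{value_key}_{column_key}{row[column_key]}"
        wide[row[unit_key]][new_column] = row[value_key]
    return [{unit_key: unit, **columns} for unit, columns in wide.items()]

wide_rows = long_to_wide(long_rows, "plot", "week", "height_cm")
for row in wide_rows:
    print(row)
```

Going the other way requires knowing which wide columns belong to the same variable and which part of each column name encodes the time point. That information is explicit in the long table, which is why the long form is the safer default.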

SLIDE 4

Golden rules for data tables

  • 3. Use strong IDs
SLIDE 5

Weak IDs

SLIDE 6

Strong IDs

SLIDE 7

Golden rules for data tables

  • 4. A column represents a variable

– Each column is a different independent or dependent variable
– Every column has to have a name

  • Don’t start names with symbols or numbers
  • Avoid duplicate column names
  • Avoid units in column names – keep them as metadata
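The naming rules above are easy to check automatically before analysis. The following is a rough Python sketch (the course uses R and SAS); the `check_column_names` helper and the example header are hypothetical, and "starts with a letter" is an assumed reading of "don't start names with symbols or numbers".

```python
import re
from collections import Counter

def check_column_names(names):
    """Return a list of human-readable problems found in the column names."""
    problems = []
    for name in names:
        if not name:
            problems.append("empty column name")
        elif not re.match(r"[A-Za-z]", name):
            problems.append(f"{name!r} does not start with a letter")
    # Report each duplicated (non-empty) name once, with its count.
    for name, count in Counter(names).items():
        if name and count > 1:
            problems.append(f"{name!r} appears {count} times")
    return problems

# Hypothetical header row with several rule violations.
header = ["plot", "2019_yield", "height", "height", "%cover", ""]
for problem in check_column_names(header):
    print(problem)
```

The sketch does not try to detect units embedded in names (e.g. `height_cm`), since that depends on the naming scheme of the particular dataset.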
SLIDE 8

Golden rules for data tables

  • 5. Keep a metafile with information about your datafile

– If possible, keep a record of how your data was collected

  • latitude/longitude of sites, slope, aspect
  • who collected it

– Keep a record of useful information

  • What each of your variable names stands for
  • Measurement units
  • resolution of spatial files
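One possible shape for such a metafile is a small structured text file stored next to the datafile. The sketch below builds one as JSON in Python (the course uses R and SAS); the file name, field names, and every value are hypothetical placeholders, and JSON is only one of several reasonable formats.

```python
import json

# All values below are hypothetical placeholders, not data from the course.
metafile = {
    "datafile": "growth_trial.csv",
    "collected_by": "field crew (name goes here)",
    "sites": [
        {"site": "S1", "latitude": 53.5, "longitude": -113.5,
         "slope_deg": 5, "aspect": "N"},
    ],
    "variables": {
        "plot": {"meaning": "plot identifier", "unit": None},
        "height": {"meaning": "tree height", "unit": "cm"},
        "precip": {"meaning": "annual precipitation", "unit": "mm"},
    },
}

# Serialize to text; in practice this would be written next to the datafile.
metafile_text = json.dumps(metafile, indent=2)
print(metafile_text)
```

Keeping units here rather than in the column names means the data table stays clean while the information is still recorded.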
SLIDE 9

Golden rules for data tables

  • 6. Modify your raw data entries with R scripts

– Easy to change something and re-run the analysis (e.g. with or without outliers)
– Hunting down and fixing errors is efficient, because the script leaves a perfect trail of what you did
– Save yourself from repetitive tasks (that likely introduce errors)
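The "with or without outliers" idea can be illustrated with a short script. The slides recommend R for this; the sketch below uses Python only to show the pattern. The height values, the `clean` function, and the plausibility threshold of 30 are all hypothetical.

```python
import statistics

# Hypothetical raw measurements; 58.0 is an assumed data-entry error.
raw_heights = [10.2, 11.1, 9.8, 10.5, 58.0, 10.9]

def clean(values, drop_outliers, max_plausible=30.0):
    """Return a cleaned copy of the values; the raw list is never modified."""
    if drop_outliers:
        return [v for v in values if v <= max_plausible]
    return list(values)

# Re-running the analysis both ways is a one-line change.
for drop in (False, True):
    cleaned = clean(raw_heights, drop_outliers=drop)
    print(f"drop_outliers={drop}: n={len(cleaned)}, mean={statistics.mean(cleaned):.2f}")
```

Because the raw values are left untouched and every change lives in the script, the script itself documents what was done to the data.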

SLIDE 10

The Data Table Concept

Type 1: Multiple populations

[Table figure; labels: "Crop variety", "Sample of population that you want to learn something about", "Dependent variables"]

SLIDE 11

The Data Table Concept

Type 2: Single populations

[Table figure; labels: "Independent variables", "Dependent variable", "You can think of this as representing a population: crop grown without fertilizer"]

SLIDE 12

Variable/Data types

  • Nominal: qualitative measurement where categories or numbers ONLY label the object being measured or identify the object as belonging to a category

E.g. – Forest plots identified by 1-10 or by location
– Qualitative categories: Low-Medium-High or Male/Female, etc.
Don’t calculate statistics – how do you take a mean of male/female?

  • Ordinal: quantitative measurement that indicates a relative amount, arranged in rank order, but DOES NOT imply an equal distance between points

E.g. – Ranking of growth performance of 10 trees, where 1 is worst and 10 is best
Percentiles or non-parametric statistics ONLY

  • Interval: quantitative measurement that indicates the order of magnitude AND implies equal intervals between the measurements. NOTE: These measurements have ARBITRARY ZEROS

E.g. – Temperature (°C)
All statistics allowed, but no × or ÷ (alternative: % change)

  • Ratio: quantitative measurement where numbers indicate a measure with EQUAL intervals and a TRUE ZERO

E.g. – Precipitation (156 mm)
– Frequencies (counts of just about anything)
All statistics allowed
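A short worked example may help show why × and ÷ are ruled out on an interval scale but allowed on a ratio scale. This is a Python sketch (the course uses R and SAS). Apart from the 156 mm precipitation figure, all values are hypothetical; the Kelvin conversion is added only to show what happens when the same temperatures are put on a scale with a true zero.

```python
import statistics
from collections import Counter

# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c                        # 2.0, but "twice as warm" is wrong
ratio_kelvin = (t2_c + 273.15) / (t1_c + 273.15)   # same temperatures, true zero
difference = t2_c - t1_c                           # differences ARE meaningful

# Ratio scale: precipitation has a true zero, so the ratio is meaningful.
p1_mm, p2_mm = 78.0, 156.0
ratio_precip = p2_mm / p1_mm                       # twice as much rain

# Ordinal scale: ranks support the median (a percentile), not the mean.
ranks = [1, 3, 4, 7, 10]
median_rank = statistics.median(ranks)

# Nominal scale: categories can only be counted.
sexes = ["F", "M", "F", "F", "M"]
counts = Counter(sexes)

print(ratio_celsius, round(ratio_kelvin, 3), difference)
print(ratio_precip, median_rank, dict(counts))
```

The Celsius ratio of 2 shrinks to roughly 1.035 once the zero point is physically meaningful, whereas the precipitation ratio stays at 2 whatever length unit is used.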

SLIDE 13

Variable/Data types

  • Discrete: values may only fall at particular points on the scale of measurement and cannot exist between points

E.g. Number of trees, number of cones, etc.

  • Continuous: values can fall anywhere on an unbroken scale of measurements with real limits

E.g. temperature, height, volume of fertilizer, etc.

SLIDE 14

Learning Objectives - Lab 2

  • Learn a complete set of commands to automate data preparation in R & SAS.
  • Work through some simplified examples to understand how they can be applied
  • Try to apply scripts to your own data
  • If you run into problems with your own data: let’s solve them together.