SLIDE 1
Principles of Data Management (for Biologists)
Dr Joe Thorley R.P.Bio.
Poisson Consulting
August 14th, 2017
Introduction
Biologists spend millions of dollars collecting data with little regard for its management.
SLIDE 2
SLIDE 3
Study Design
Study design should precede data management
◮ Identify question(s)
  ◮ what do we want to know and why?
◮ Assess existing data/understanding
  ◮ what do we already know?
◮ Develop field protocol
  ◮ how much will it cost?
  ◮ how useful is the answer likely to be?
SLIDE 4
Data Management
Once a study design has been developed, data management begins. Data management cycles through the 10 stages of
1. data collection
2. data backup
3. data security
4. data digitization
5. data cleansing
6. data tidying
7. data documentation
8. data analysis
9. data reporting
10. data archiving
SLIDE 5
Data Collection
Field crews should be trained, informed, and provided with standard protocols and data collection forms. Printed forms on waterproof paper provide a cheap, robust solution.
SLIDE 6
Data Backup
Duplicate data as soon as possible. A smartphone camera is a simple way to duplicate data and sync to the cloud.
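One way to confirm that a duplicate really matches the original is to compare checksums. The following is a minimal sketch only, not part of the workflow described on the slide: the helper names `sha256_of` and `backup_and_verify` and the file name `form_001.jpg` are hypothetical, and a local folder stands in for whatever cloud-synced location is actually used.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir and confirm the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / source.name
    shutil.copy2(source, copy)
    return sha256_of(source) == sha256_of(copy)


# Demonstration with a throwaway file standing in for a photographed form.
with tempfile.TemporaryDirectory() as tmp:
    form = Path(tmp) / "form_001.jpg"  # hypothetical file name
    form.write_bytes(b"placeholder image bytes")
    backup_ok = backup_and_verify(form, Path(tmp) / "backup")
    print("backup verified:", backup_ok)
```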
SLIDE 7
Data Security
Ensure the right people have access. Dropbox (https://www.dropbox.com) provides simple data security and sharing.
SLIDE 8
Data Digitization
Get the data into a useable electronic form. Excel is a useful data entry tool in the hands of a trained user.
SLIDE 9
Data Cleansing
Correct the inevitable errors. At best, errors add noise; at worst, they invalidate subsequent analyses!
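Many entry errors can be caught with simple range checks before any analysis is run. The sketch below is illustrative only: the `Record` fields, the plausible ranges and the example values are all assumptions made for this example, not data or rules from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One digitized row, before tidying (hypothetical fields)."""
    site: str
    hour: int       # hour of the visit, assumed to lie in 0-23
    depth_m: float  # depth in metres, assumed to be non-negative


def find_errors(records):
    """Return (row_index, message) pairs for values outside plausible ranges."""
    errors = []
    for index, record in enumerate(records):
        if not 0 <= record.hour <= 23:
            errors.append((index, f"hour {record.hour} is outside 0-23"))
        if record.depth_m < 0:
            errors.append((index, f"depth {record.depth_m} m is negative"))
    return errors


# Made-up records for illustration only.
records = [
    Record("A", 9, 1.5),
    Record("B", 25, 2.0),   # typo: an hour cannot be 25
    Record("C", 14, -3.0),  # sign error on entry
]
errors = find_errors(records)
for index, message in errors:
    print(f"row {index}: {message}")
```

Flagging rather than silently fixing keeps a human in the loop: the flagged rows can be checked against the original field forms before any value is changed.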
SLIDE 10
Data Tidying
Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. (Wickham 2014)
SQLite (https://sqlite.org) is free, open-source, cross-platform, embedded database software.
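The tidy structure can be sketched in SQLite using Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `Site` and `Visit` table names and the `Depth` and `Hour` columns echo the metadata example later in the deck, while the `VisitDate` column, the constraints and all values are hypothetical.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these off by default

# One table per type of observational unit, one column per variable.
conn.executescript("""
CREATE TABLE Site (
    Site  TEXT PRIMARY KEY,
    Depth REAL NOT NULL CHECK (Depth >= 0)
);
CREATE TABLE Visit (
    Site      TEXT NOT NULL REFERENCES Site (Site),
    VisitDate TEXT NOT NULL,
    Hour      INTEGER NOT NULL CHECK (Hour BETWEEN 0 AND 23),
    PRIMARY KEY (Site, VisitDate, Hour)
);
""")

# One row per observation (made-up values).
conn.execute("INSERT INTO Site VALUES (?, ?)", ("A", 1.5))
conn.executemany(
    "INSERT INTO Visit VALUES (?, ?, ?)",
    [("A", "2017-08-14", 9), ("A", "2017-08-14", 14)],
)
conn.commit()

# Related tables are recombined with a join when needed.
rows = conn.execute(
    """
    SELECT Visit.Site, Visit.Hour, Site.Depth
    FROM Visit JOIN Site ON Visit.Site = Site.Site
    ORDER BY Visit.Hour
    """
).fetchall()

# A visit to a site that was never recorded is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Visit VALUES (?, ?, ?)", ("Z", "2017-08-14", 10))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Storing the depth once per site, rather than repeating it on every visit row, means a correction only has to be made in one place.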
SLIDE 11
Relational Data
From R For Data Science (http://r4ds.had.co.nz) available via CC BY-NC-ND 3.0 US.
SLIDE 12
Data Documentation
Data are just numbers and categories unless people know what they mean. A simple metadata table can provide a description and units for each variable.

Table  Column  Units    Description
Site   Depth   m        The tidally corrected depth
Visit  Hour    PST8PDT  The hour of the visit
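If the metadata table lives in the same database as the data, a short script can list any columns that still lack a description. The sketch below assumes a SQLite database and a metadata table named `MetaData`; that name, the helper `undocumented_columns` and the cut-down `Site` and `Visit` schemas are hypothetical. Only the two metadata rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site (Site TEXT PRIMARY KEY, Depth REAL);
CREATE TABLE Visit (Site TEXT, Hour INTEGER);
CREATE TABLE MetaData (
    "Table"     TEXT NOT NULL,
    "Column"    TEXT NOT NULL,
    Units       TEXT,
    Description TEXT NOT NULL,
    PRIMARY KEY ("Table", "Column")
);
INSERT INTO MetaData VALUES
    ('Site', 'Depth', 'm', 'The tidally corrected depth'),
    ('Visit', 'Hour', 'PST8PDT', 'The hour of the visit');
""")


def undocumented_columns(conn, metadata_table="MetaData"):
    """Return sorted (table, column) pairs that have no metadata row."""
    documented = set(
        conn.execute(f'SELECT "Table", "Column" FROM "{metadata_table}"')
    )
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name != ?",
            (metadata_table,),
        )
    ]
    missing = []
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if (table, row[1]) not in documented:
                missing.append((table, row[1]))
    return sorted(missing)


missing = undocumented_columns(conn)
for table, column in missing:
    print(f"no metadata for {table}.{column}")
```

In this sketch the `Site` key columns are reported as undocumented, because the two example rows describe only `Depth` and `Hour`.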
SLIDE 13
Data Analysis
Analytic code can be shared on GitHub (https://github.com).
SLIDE 14
GitHub bcgov
The province already has a GitHub account for sharing code.
SLIDE 15
Data Reporting
An answer only has value if decision-makers are aware of it.
Zotero (https://www.zotero.org) is a free, easy-to-use tool to help you collect, organize, cite, and share your research sources.
ResearchGate (https://www.researchgate.net) is a free way to share and discover research.
SLIDE 16
Data Archiving
Ensure others are able to use the data in perpetuity. Zenodo (https://zenodo.org) is free, citeable, discoverable and long-term, with open, restricted and closed access options. It uses the same cloud infrastructure as CERN’s own Large Hadron Collider (LHC) research data.
SLIDE 17
Summary
Data management requires trained personnel with an understanding
of the principles but does not have to be expensive and pays for
itself many times over.
SLIDE 18
DFO
SLIDE 19
Parks
SLIDE 20
DataBC
The provincial government has DataBC.
SLIDE 21