
Data Handling: Import, Cleaning and Visualisation. Lecture 7: Data Sources, Data Gathering, Data Import



  1. 9/12/2019 Data Handling: Import, Cleaning and Visualisation Data Handling: Import, Cleaning and Visualisation Lecture 7: Data Sources, Data Gathering, Data Import Prof. Dr. Ulrich Matter 24/10/2019 file:///home/umatter/Dropbox/T eaching/HSG/datahandling/datahandling/materials/slides/html/07_data_import.html#1 1/54

  2. Recap: Programming with Data

  3. Loops
     · Repeatedly execute a sequence of commands.
     · Known or unknown number of iterations.
     · Types: ‘for-loop’ and ‘while-loop’.
       - ‘for-loop’: number of iterations typically known.
       - ‘while-loop’: number of iterations typically not known.

  4. for-loop

  5. while-loop
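The two loop slides above survive only as titles in this transcript. As a minimal sketch (the variable names and the stopping condition are made up for illustration and are not taken from the slides), the two loop types could look like this in R:

```r
# for-loop: the number of iterations is known in advance (here: 5)
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)

# while-loop: the number of iterations depends on a condition
# (here: keep doubling until the value exceeds 100)
value <- 1
n_iterations <- 0
while (value <= 100) {
  value <- value * 2
  n_iterations <- n_iterations + 1
}
print(value)
print(n_iterations)
```

The for-loop runs exactly once per element of `1:5`, while the while-loop stops as soon as its condition is no longer met, so its iteration count is only known after it has run.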

  6. Booleans and logical statements

         2 + 2 == 4
         ## [1] TRUE
         3 + 3 == 7
         ## [1] FALSE
         4 != 7
         ## [1] TRUE

  7. Booleans and logical statements

         condition <- TRUE
         if (condition) {
           print("This is true!")
         } else {
           print("This is false!")
         }
         ## [1] "This is true!"

  8. R functions
     · f : X → Y
     · ‘Take a variable/parameter value X as input and provide value Y as output’
     · For example, 2 × X = Y.
     · R functions take ‘parameter values’ as input, process those values according to a predefined program, and ‘return’ the results.

  9. R functions

         # define our own function to compute the mean, given a numeric vector
         my_mean <- function(x) {
           x_bar <- sum(x) / length(x)
           return(x_bar)
         }
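To see such a function in use, the following sketch repeats `my_mean()` from the slide and calls it on a small vector. The helper `double_it()` and the test vector `a` are hypothetical additions for illustration; `double_it()` simply implements the 2 × X = Y example from the previous slide.

```r
# the function from the slide: mean of a numeric vector
my_mean <- function(x) {
  x_bar <- sum(x) / length(x)
  return(x_bar)
}

# hypothetical second function implementing 2 × X = Y
double_it <- function(x) {
  return(2 * x)
}

# call the functions with parameter values as input
a <- c(5.5, 7.5)
my_mean(a)    # our own implementation
mean(a)       # R's built-in function, for comparison
double_it(3)
```

Comparing the result with the built-in `mean()` is a quick way to check that a self-written function behaves as intended.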

  10. Today: Putting it All Together

  11. Putting it all together
      · You know what ‘data’ is …
      · You know how digital data is stored …
      · You know how to write computer code …
      · You know the basics of programming in R …
      These are the basics you need to handle data properly! This is the foundation of data science!

  12. We are ready to start the data science journey
      The first key bottleneck in the data pipeline: gather and import the data!

  13. Sources/formats in economics

  14. Sources/formats in economics
      · CSV (typical for rectangular/table-like data)
      · Variants of CSV (tab-delimited, fixed-width, etc.)
      · XML and JSON (useful for complex/high-dimensional data sets)
      · HTML (a markup language to define the structure and layout of webpages)
      · Unstructured text
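To make the difference between CSV and one of its variants concrete, here is a small sketch using only base R. The example table (names and ages) is invented for illustration; `read.csv()` and `read.delim()` are the base R readers for comma-separated and tab-delimited text.

```r
# the same small table as comma-separated and as tab-delimited text
csv_text <- "name,age\nAnna,30\nBen,25"
tsv_text <- "name\tage\nAnna\t30\nBen\t25"

# parse both variants into data frames
df_csv <- read.csv(text = csv_text)
df_tsv <- read.delim(text = tsv_text)

# both formats describe the same rectangular data
print(df_csv)
print(df_tsv)
```

Only the delimiter differs between the two inputs; once imported, both are ordinary data frames with two rows and the columns `name` and `age`.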

  15. Sources/formats in economics
      · Excel spreadsheets ( .xls )
      · Formats specific to statistical software packages (SPSS: .sav , Stata: .dta , etc.)
      · Built-in R datasets
      · Binary formats

  16. Data Gathering Procedure

  17. Organize your data pipeline!
      · One R script to gather/import data.
      · The beginning of your data pipeline!

  18. A Template/Blueprint
      Tell your future self what this script is all about:

          #######################################################################
          # Data Handling Course: Example Script for Data Gathering and Import
          #
          # Imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

  19. Script sections
      · Recall: programming tasks can often be split into smaller tasks.
      · Use sections to implement task-by-task and keep order.
      · In RStudio: Use ---------- to indicate the beginning of sections.
      · Start with a ‘meta’-section.

  20. Script sections

          #######################################################################
          # Data Handling Course: Example Script for Data Gathering and Import
          #
          # Imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

          # SET UP --------------
          # load packages
          library(tidyverse)

          # set fixed variables
          INPUT_PATH <- "/rawdata"
          OUTPUT_FILE <- "/final_data/datafile.csv"

  21. Script sections
      Finally, we add sections with the actual code (in the case of a data import script, maybe one section per data source).

          #######################################################################
          # Project XY: Data Gathering and Import
          #
          # This script is the first part of the data pipeline of project XY.
          # It imports data from ...
          # Input: links to data sources (data comes in ... format)
          # Output: cleaned data as CSV
          #
          # U. Matter, St. Gallen, 2018
          #######################################################################

          # SET UP --------------
          # load packages
          library(tidyverse)

          # set fixed variables
          INPUT_PATH <- "/rawdata"
          OUTPUT_FILE <- "/final_data/datafile.csv"
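The slide's code stops after the set-up section. As a hedged sketch of what the remaining sections might look like, the following script adds one import section for a single data source and one export section. Everything below the set-up is an assumption for illustration: the section names, the file `source_a.csv`, and its contents are hypothetical, the paths point into a temporary folder so the sketch can run anywhere, and base R readers are used in place of the tidyverse.

```r
# SET UP --------------
# set fixed variables (hypothetical paths inside a temporary folder)
INPUT_PATH <- file.path(tempdir(), "rawdata")
OUTPUT_FILE <- file.path(tempdir(), "final_data", "datafile.csv")
dir.create(INPUT_PATH, showWarnings = FALSE)
dir.create(dirname(OUTPUT_FILE), showWarnings = FALSE)

# for this sketch only: create a small raw data file to import
writeLines(c("id,value", "1,10", "2,20"),
           file.path(INPUT_PATH, "source_a.csv"))

# IMPORT SOURCE A --------------
# one section per data source
source_a <- read.csv(file.path(INPUT_PATH, "source_a.csv"))

# EXPORT --------------
# write the imported data to the output file as CSV
write.csv(source_a, OUTPUT_FILE, row.names = FALSE)
```

Keeping all paths in the set-up section means that a change of folder only has to be made in one place, and each later section can be run and checked on its own.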

  22. Loading/Importing Rectangular Data

  23. Loading built-in datasets
      In order to load such datasets, simply use the data() function:

          data(swiss)
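After loading, it is usually worth taking a first look at the data. A minimal sketch, using only base R functions on the `swiss` dataset that ships with R:

```r
# load the built-in dataset into the workspace
data(swiss)

# inspect what was loaded
dim(swiss)       # number of rows and columns
head(swiss, 3)   # first three observations
str(swiss)       # variable names and types
```

`swiss` is a data frame with 47 observations and 6 variables, so `dim()`, `head()` and `str()` give a compact overview before any further processing.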
