Files, pathnames
Steve Bagley
somgen223.stanford.edu

Files have contents and a name
• A file that contains R code will have a name such as test.r.
• The base name is test.
• The extension is .r.
• The extension usually signals the file type.
• You should use .r or .R for files that contain R code, and .rmd or .Rmd for files that contain R Markdown. Some other possible extensions are .csv (for comma-separated values) and .txt or .text (for text files).
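The base name and extension can also be pulled apart in code. A minimal sketch, assuming the base R helpers file_path_sans_ext and file_ext from the tools package (which ships with R); these helpers are not part of the slides, and test.r is just the example name used above:

```r
# Split a file name into base name and extension using base R.
# The tools package is part of a standard R installation.
fname <- "test.r"

base_name <- tools::file_path_sans_ext(fname)  # name without the extension
extension <- tools::file_ext(fname)            # extension without the leading dot

print(base_name)
print(extension)
```

Note that file_ext returns the extension without the dot, so compare against "r", not ".r".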

File systems are hierarchical collections of directories and files
• Files live in directories, also called folders.
• Each directory can contain zero or more files, and zero or more directories.
• Each directory, except the top, lives inside a single (parent) directory.
• This produces a hierarchical, tree-like organization.
• Computer trees grow upside down.

Naming files, macOS and Unix
• Files are given full names using a path notation.
• A sample path: /Users/betty/Documents/test.r
• This file has base name test, with extension r.
• It is located in the directory Documents, which is located in the directory betty, which is located in the directory Users.
• The leading / refers to the highest level, or root, of the filesystem.

Naming files, Windows
• Files are given full names using a path notation.
• A sample path: C:\Users\betty\Documents\test.r
• This file has base name test, with extension r.
• It is located in the directory Documents, which is located in the directory betty, which is located in the directory Users.
• The leading C:\ refers to the highest level, or root, of the filesystem on drive C.
• When typing paths to R as strings, you must double the backslashes, or convert to forward slashes:
    C:\\Users\\sbagley\\Documents\\test.r
    C:/Users/sbagley/Documents/test.r
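To see why the backslashes must be doubled, it helps to compare the different spellings as R strings. This is a sketch only: the raw-string form r"(...)" is an addition not mentioned in the slides and needs R 4.0.0 or later.

```r
# Three ways to write the same Windows path as an R string.
p_double  <- "C:\\Users\\betty\\Documents\\test.r"   # doubled backslashes
p_forward <- "C:/Users/betty/Documents/test.r"       # forward slashes
p_raw     <- r"(C:\Users\betty\Documents\test.r)"    # raw string, R >= 4.0.0 only

# writeLines shows the characters actually stored: single backslashes.
writeLines(p_double)

# The doubled-backslash and raw-string forms are the same string.
same_as_raw <- identical(p_double, p_raw)
print(same_as_raw)

# Replacing each backslash with a forward slash gives the second form.
converted <- gsub("\\", "/", p_double, fixed = TRUE)
print(identical(converted, p_forward))
```

A single backslash inside an ordinary R string starts an escape sequence, which is why "C:\Users" is rejected or misread.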

The working directory
• It is annoying to have to always type the full path to the directory.
• In R and RStudio, you can set the working directory, which then becomes the default directory for all file operations.
• When you type a path that does not start with a slash, it is interpreted relative to the current working directory.
• If you don’t set the working directory, it is (probably) your home directory, or the directory from which R was started.
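The working directory can be inspected and changed from code with the base R functions getwd and setwd. A minimal sketch, using R's temporary directory as a stand-in for a project directory (the file name wd_demo.txt is made up for illustration):

```r
# Show and change the working directory, then restore it.
old_wd <- getwd()          # remember the current working directory
print(old_wd)

tmp <- tempdir()           # a scratch directory that exists for this R session
setwd(tmp)                 # make it the working directory

# A path with no leading slash is now resolved relative to tmp.
writeLines("hello", "wd_demo.txt")
found <- file.exists(file.path(tmp, "wd_demo.txt"))
print(found)

# Clean up and go back to where we started.
removed <- file.remove("wd_demo.txt")
setwd(old_wd)
```

In RStudio, working inside a project sets the working directory to the project directory, so calling setwd by hand is often unnecessary.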

Using relative pathnames
• A relative pathname does not start at the root directory.
• It is interpreted with respect to the current (working) directory.
• Example: data/file1.csv
• In this example, the file should be located in the data subdirectory of the current directory.
• If you use relative pathnames, then you can easily move or rename the top-level project directory without breaking things.
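The relationship between a relative pathname and the working directory can be sketched with base R's file.path. The path data/file1.csv is the example from the slide; nothing here requires the file to exist.

```r
# Build a relative path and see how it resolves against the working directory.
rel <- file.path("data", "file1.csv")   # components joined with "/"
print(rel)

# A relative path does not start at the root.
starts_at_root <- startsWith(rel, "/")
print(starts_at_root)

# The full path R would use is the working directory plus the relative path.
full <- file.path(getwd(), rel)
print(full)
```

Because only the relative part is written in the code, moving the whole project directory changes getwd() but not the code.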

Package fs for working with files and directories
• The easiest way to write code that manipulates files and directories is to use the fs package. It is usually installed as part of tidyverse, but you have to load it separately.
    library(fs)

dir_ls: list all files and directories in a directory
    dir_ls()
• dir_ls takes a number of optional arguments to control which files to return.

Find all csv files in a directory
    dir_ls(path = "/Users/sbagley/temp", glob = "*.csv")
• glob is jargon for a file name pattern. * matches anything.
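The effect of glob is easiest to see on a directory with known contents. The following is a sketch under assumptions: it builds a scratch directory with made-up file names inside R's temporary directory, and it also shows the regexp argument of dir_ls, which is not covered in the slides.

```r
library(fs)

# Make a scratch directory with a mix of file types (hypothetical names).
demo_dir <- path(tempdir(), "glob_demo")
dir_create(demo_dir)
file_create(path(demo_dir, c("a.csv", "b.csv", "notes.txt")))

# Only the csv files match the glob.
csv_files <- dir_ls(demo_dir, glob = "*.csv")
csv_names <- unname(as.character(path_file(csv_files)))
print(csv_names)

# The regexp argument takes a regular expression instead of a glob.
txt_files <- dir_ls(demo_dir, regexp = "\\.txt$")
txt_names <- unname(as.character(path_file(txt_files)))
print(txt_names)

dir_delete(demo_dir)  # clean up the scratch directory
```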

Getting the parts of a path
    p1 <- path("/Users/sbagley/temp/test.csv")
    path_dir(p1)
    [1] "/Users/sbagley/temp"
    path_file(p1)
    [1] "test.csv"
    path_ext(p1)
    [1] "csv"
    path_ext_set(p1, "tsv")
    /Users/sbagley/temp/test.tsv
• Using these functions is much easier than writing your own functions to manipulate filenames as strings.

Your home directory
• The tilde ~ stands for your home directory.
    p2 <- path("~/temp/test.csv")
    path_expand(p2)
    /Users/sbagley/temp/test.csv

Combining all files in a directory
• Sometimes your data are spread across multiple files in a directory. For example, there is one file for each of multiple runs of an experiment.
• You want to combine them together into a single data frame.
• You want some indicator of which run they came from.

Step 1: listing all the files in a directory
    library(fs)
    my_dir <- "~/sync/teaching/somgen223/website/data/multiple_runs"
    my_files <- dir_ls(my_dir)
    ## these pathnames are very long. to view them here, remove the
    ## directory part
    path_file(my_files)
    [1] "file1.csv" "file2.csv"

Inspect those files
    read_csv(my_files[1])
    # A tibble: 2 x 2
      gene   value
      <chr>  <dbl>
    1 ABC123  12.5
    2 DEF234 333
    read_csv(my_files[2])
    # A tibble: 2 x 2
      gene  value
      <chr> <dbl>
    1 DKK7    2.2
    2 LEM9    9

Step 2: read and combine into single data frame
    (new_df <- map_df(my_files, read_csv))
    # A tibble: 4 x 2
      gene   value
      <chr>  <dbl>
    1 ABC123  12.5
    2 DEF234 333
    3 DKK7     2.2
    4 LEM9     9
• map_df applies the second argument, a function, to all the things in the first argument, here a list of files.
• Each call to read_csv produces a data frame.
• These data frames are glued together (stacked vertically) to form the answer.

Elaboration: save the filename in the data frame
• This first solution glues together all the rows, but now you don’t know which file each row came from. (If you are lucky, there will already be a column that indicates this.)
• So we need to write a function that reads the csv from a file, and adds the filename, which contains the run number as part of the name, as a new column.

How to define your own function
• R has many built-in functions, but sometimes it is useful to define your own.
• A function encapsulates a set of expressions that perform some conceptually meaningful chunk of work. This is especially useful when you need to repeat the chunk of work multiple times.

Defining a function
    add1 <- function(v) {
      1 + v
    }
    add1(0)
    [1] 1
    add1(-1:1)
    [1] 0 1 2
    x <- 1:5
    add1(x)
    [1] 2 3 4 5 6

How to evaluate a function call
    add1(x)
    [1] 2 3 4 5 6
1. Evaluate the argument, here, x.
2. Evaluate the body of the function with the function’s parameter v temporarily bound to the value of the argument.
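The word "temporarily" in step 2 can be checked directly: the binding of v exists only while the body runs. A small sketch reusing add1 from the slides; the call to exists assumes nothing else in the session has created a top-level variable named v.

```r
add1 <- function(v) {
  1 + v
}

x <- 1:5
result <- add1(x)   # v is bound to the value of x while the body runs
print(result)

# v existed only during the call; this checks the current (top-level) environment.
# (Assumes no other code in the session has defined a variable named v.)
print(exists("v", inherits = FALSE))

# x itself is unchanged by the call.
print(x)
```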

A function to add the file name as a column
    read_and_record_filename <- function(filename) {
      read_csv(filename) %>%
        mutate(filename = path_file(filename))
    }

Try map_df again
    (new_df2 <- map_df(my_files, read_and_record_filename))
    # A tibble: 4 x 3
      gene   value filename
      <chr>  <dbl> <chr>
    1 ABC123  12.5 file1.csv
    2 DEF234 333   file1.csv
    3 DKK7     2.2 file2.csv
    4 LEM9     9   file2.csv
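As an aside, recent versions of readr can do the read, combine, and label steps in a single call: read_csv accepts a vector of paths, and its id argument names a column that records which file each row came from. This is an alternative to the approach in the slides, not a replacement for it. The sketch assumes readr 2.0.0 or later and writes two small files (with the same values as the slides) to a scratch directory whose name is made up.

```r
library(fs)
library(readr)
library(dplyr)

# Write two small csv files to a scratch directory under tempdir().
demo_dir <- path(tempdir(), "multiple_runs_demo")
dir_create(demo_dir)
write_csv(tibble(gene = c("ABC123", "DEF234"), value = c(12.5, 333)),
          path(demo_dir, "file1.csv"))
write_csv(tibble(gene = c("DKK7", "LEM9"), value = c(2.2, 9)),
          path(demo_dir, "file2.csv"))

my_files <- dir_ls(demo_dir, glob = "*.csv")

# id = "filename" adds a column holding the path each row was read from.
# as.character() passes plain strings rather than fs path objects.
combined <- read_csv(as.character(my_files), id = "filename",
                     show_col_types = FALSE) %>%
  mutate(filename = as.character(path_file(filename)))  # keep only the file part
print(combined)
```

Writing your own function, as in the slides, remains the more general tool, since it can do any per-file processing, not just record the name.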

Now convert filename to run number
    new_df2 %>%
      mutate(run_number = as.numeric(str_remove_all(filename, "(file)|(\\.csv)")),
             filename = NULL) ## remove filename column after we are done with it
    # A tibble: 4 x 3
      gene   value run_number
      <chr>  <dbl>      <dbl>
    1 ABC123  12.5          1
    2 DEF234 333            1
    3 DKK7     2.2          2
    4 LEM9     9            2

Convert filename to run number, version 2
    new_df2 %>%
      mutate(run_number = as.numeric(str_extract(filename, "[0-9]+")),
             filename = NULL) ## remove filename column after we are done with it
    # A tibble: 4 x 3
      gene   value run_number
      <chr>  <dbl>      <dbl>
    1 ABC123  12.5          1
    2 DEF234 333            1
    3 DKK7     2.2          2
    4 LEM9     9            2
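The two versions give the same answer on file1.csv and file2.csv, but they behave differently when a name departs from that pattern. A short sketch on a plain character vector; the second filename, run_file10.csv, is hypothetical and chosen to show the difference.

```r
library(stringr)

# One name follows the file<N>.csv pattern, one does not (hypothetical).
filenames <- c("file1.csv", "run_file10.csv")

# Version 1 removes the literal pieces "file" and ".csv";
# anything else in the name is left behind.
v1 <- str_remove_all(filenames, "(file)|(\\.csv)")
print(v1)

# Version 2 pulls out the first run of digits, whatever surrounds it.
v2 <- str_extract(filenames, "[0-9]+")
print(v2)

run_number <- as.numeric(v2)
print(run_number)
```

Passing the leftover text from version 1 to as.numeric would give NA with a warning for the second name, so version 2 is the more forgiving choice when filenames vary.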

Reading data in other formats

How to read different formats
• This course has focused on csv-formatted files: the items on each row are separated by commas, and the header row, if present, follows the same format.
• But there are other formats…

readr package
    Function      Notes
    read_csv      separator is “,”
    read_csv2     separator is “;”, decimal point is “,”
    read_tsv      separator is “\t” (TAB)
    read_delim    general case
    read_fwf      reads fixed-width fields
    read_table    separator is whitespace
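For the general case, read_delim takes the separator as its delim argument. A minimal sketch with a pipe-separated file; the data and column names are made up, and show_col_types = FALSE (which only silences the column-type message) assumes readr 2.0.0 or later.

```r
library(readr)

# Write a small pipe-separated file (made-up data) to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|weight|height", "1|70.5|180", "2|64|165"), tmp)

# read_delim handles the general case: name the separator with delim.
df <- read_delim(tmp, delim = "|", show_col_types = FALSE)
print(df)
```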

How to read a csv file that does not have a header line
    df <- read_csv("file1.csv",
                   col_names = c("id", "weight", "height"))

How to read a csv file that has a header line that you don’t want
    df <- read_csv("file1.csv",
                   col_names = c("id", "weight", "height"),
                   skip = 1)
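A runnable version of this pattern, as a sketch: it writes a small csv with an unwanted header to a temporary file and replaces the header with names of its own. The data are made up; the readr argument is spelled col_names, and show_col_types = FALSE assumes readr 2.0.0 or later.

```r
library(readr)

# A csv whose header line we want to replace (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,WT,HT", "1,70.5,180", "2,64,165"), tmp)

# skip = 1 drops the existing header line;
# col_names supplies the names to use instead.
df <- read_csv(tmp,
               col_names = c("id", "weight", "height"),
               skip = 1,
               show_col_types = FALSE)
print(df)
```

Without skip = 1, the old header line would be read as the first row of data.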

How to skip over some lines at the beginning of the csv file
    df <- read_csv("file1.csv", skip = 5)
• This will skip the first 5 lines, and start reading on line 6.

Other read_csv options
• n_max is the maximum number of rows to read. This can be useful if you want to first work with just part of a very large file.
• skip_empty_rows (TRUE or FALSE) controls whether to skip completely empty rows. If not skipped, they’ll produce NA values.
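A quick sketch of n_max on a small made-up file; with a very large file the same call would read only the first rows instead of the whole thing. As before, show_col_types = FALSE assumes readr 2.0.0 or later.

```r
library(readr)

# A small csv standing in for a very large one (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,weight", "1,70.5", "2,64", "3,81"), tmp)

# Read only the first two data rows; the header line does not count toward n_max.
first_two <- read_csv(tmp, n_max = 2, show_col_types = FALSE)
print(first_two)
```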
