CSV Files 1 / 10 "Comma"-Separated Values Files Say we - PowerPoint PPT Presentation

CSV Files 1 / 10

"Comma"-Separated Values Files Say we have data in a comma-separated values file $ cat capitals.dat # could be .dat, .csv, or anything Japan,Tokyo France,Paris Germany,Berlin U.S.A.,Washington, D.C Can use line-by-line file reading with the split() function we saw earlier to process comma-separated value files $ python >>> capitals = {} # initialize a dictionary to hold our capitals data >>> for line in open (’capitals.dat’, ’r’): # for each line in file ... k, v = line.split(’,’) # split into key and value ... capitals[k] = v # add key:value to dict ... Traceback (most recent call last): File "<stdin>", line 2, in <module> ValueError: too many values to unpack Why didn’t it work? 2 / 10

CSV Separator Characters We can troubleshoot in the Python interpreter >>> for line in open (’capitals.dat’, ’r’): print line.split(’,’) ... ... [’Japan’, ’Tokyo\n’] [’France’, ’Paris\n’] [’Germany’, ’Berlin\n’] [’U.S.A.’, ’Washington’, ’ D.C\n’] There’s a comma in Washington, D.C. that was taken as a separator So let’s change the capitals.dat file to use semicolons as the separators $ cat capitals.dat Japan;Tokyo France;Paris Germany;Berlin U.S.A.;Washington, D.C 3 / 10

CSV Files in Practice Now our capitals.dat file is readable as a "comma"-separated values file >>> capitals = {} >>> for line in open (’capitals.dat’, ’r’): ... k, v = line.split(’;’) ... capitals[k] = v ... >>> capitals {’Japan’: ’ Tokyo\n’, ’U.S.A.’: ’ Washington, D.C\n’, ’Germany’: ’Berlin\n’, ’France’: ’ Paris\n’} ◮ But the values have leading whitespace and trailing \n characters from the data file 4 / 10

Manually Cleaning CSV Lines ◮ We can make our code more robust with strip(), which removes leading and trailing whitespace and non-printing chars >>> for line in open (’capitals.dat’, ’r’): ... k, v = line.split(’;’) ... capitals[k.strip()] = v.strip() ... >>> capitals {’Japan’: ’Tokyo’, ’U.S.A.’: ’Washington, D.C’, ’Germany’: ’Berlin’, ’France’: ’Paris’} But we don’t need to go to this trouble . . . 5 / 10

The csv Module The best way to process CSV files is with the csv module. >>> import csv >>> scripters = [ ... [’Perl’, ’Larry Wall’], ... [’Python’, ’Guido Van Rossum’], ... [’Ruby’, ’Yukihiro Matsumoto’] ... ] >>> with open (’scripters’, ’wt’) as fout: ... csvout = csv.writer(fout) ... csvout.writerows(scripters) ... >>> ^D $ cat scripters Perl,Larry Wall Python,Guido Van Rossum Ruby,Yukihiro Matsumoto ◮ The with statement creates a context manager ◮ After the with block ends, the file is automatically closed 6 / 10

Reading CSV Files We can read our scripters file with >>> import csv >>> with open (’scripters’, ’r’) as fin: ... csvin = csv.reader(fin) scripters = [line for line in csvin] ... ... >>> scripters [[’Perl’, ’Larry Wall’], [’Python’, ’Guido Van Rossum’], [’Ruby’, ’Yukihiro Matsumoto’]] >>> 7 / 10

Column Headers in CSV Files Use a DictReader to store the records from the CSV file in a dict. >>> import csv >>> with open (’scripters’, ’r’) as fin: ... csvin = csv.DictReader(fin, fieldnames=[’langauge’, ’creator’]) ... scripters = [line for line in csvin] ... >>> scripters [{’creator’: ’Larry Wall’, ’langauge’: ’Perl’}, {’creator’: ’Guido Van Rossum’, ’langauge’: ’Python’}, {’creator’: ’Yukihiro Matsumoto’, ’langauge’: ’Ruby’}] 8 / 10

Writing Dicts to CSV Files And we can use a DictWriter to write a CSV file with a header line. >>> with open (’scripters’, ’w’) as fout: ... csvout = csv.DictWriter(fout, fieldnames=[’langauge’, ’creator’]) ... csvout.writeheader() ... csvout.writerows(scripters) ... >>> ^D $ cat scripters langauge,creator Perl,Larry Wall Python,Guido Van Rossum Ruby,Yukihiro Matsumoto 9 / 10

CSV Details CSV files can be complex. ◮ Different delimiters can be used. ◮ Delimiter characters can appear in fields. ◮ Fields can be surrounded with "quotes". ◮ Different operating systems may use different line endings. The CSV module handles all of these issues for you. Read the CSV module documentation to become familiar with its capabilities. 10 / 10

CSV Files 1 / 10 "Comma"-Separated Values Files Say we - PowerPoint PPT Presentation

CSV Files 1 / 10 "Comma"-Separated Values Files Say we have data in a comma-separated values file $ cat capitals.dat # could be .dat, .csv, or anything Japan,Tokyo France,Paris Germany,Berlin U.S.A.,Washington, D.C Can use

Lecture 12: Comma Separated Values (CSV) Format CSV CSV data representing financial information

Comma Police: The Design and Implementation of a CSV Library George Wilson Data61/CSIRO

CSV on the Web Intro to W3C CSV on the Web Specifications DDI Metadata Workshop Dagstuhl 2016

Manipulating Data Files in Python Learning Objectives Working with CSV files Reading

WE CREATE WE ARE ARTISANS WE AR WE ARE E CSV CSV. 1HFY2020 Interim results 2020.02.28

More than you ever wanted to know about CSV Digging into the CSV script Script Outline Load file

Course Business Two new datasets on CourseWeb: vocab.csv and bpd.csv Midterm grades &

Read, inspect, & clean data from csv files Importing & Managing Financial Data in Python

Importing text files Importing and Managing Financial Data in R getSymbols() with CSV files

Lecture 10: Temporary Files; Comma Separated Values (CSV) Format Temporary Files

Accessing Files in Python Learning Objectives Concepts about files in Python How to open

Interacting with Files Python Files Files Basic container of data in modern computing

What is a Jar File? Java archive (jar) files are compressed files that can store one or many

What is a Jar File? Java archive (jar) files are compressed files that can store one or many

Using files ITEC 1630 We save data in files on disk or some Week 9: Files & Streams

Course Business Midterm grades next week Two new datasets on CourseWeb: vocab.csv and

Basic Import & Update Mike Konovalske & Jason Swain Technical Team, Lucity General Tab

INFORMATION VISUALIZATION Alvitta Ottley Washington University in St. Louis CSE 557A: INFORMATION

Excel l as as a a Tool to Troubleshoot SIS IS Data for EMIS Reporting Overv ervie iew

OFFICE HOURS ESG-CV Reporting Prepared: September 22-24,2020 Call in If you are having audio

RDM + Conquaire RDM: A library perspective of versioning, curating and archiving research data

CS 105 SUMMER WEDNESDAY 6 What to talk about today? The with statement in more detail

Jan 30: How computers work; more histograms Storage is organized in directories (folders) and

CROWD-SOURCING Simin Chen Amazon Mechanical Turk Advantages On demand workforce

Sambuz

Useful Links

Newsletter

Mail Us

CSV Files 1 / 10 "Comma"-Separated Values Files Say we - PowerPoint PPT Presentation

CSV Files 1 / 10 "Comma"-Separated Values Files Say we have data in a comma-separated values file $ cat capitals.dat # could be .dat, .csv, or anything Japan,Tokyo France,Paris Germany,Berlin U.S.A.,Washington, D.C Can use

Lecture 12: Comma Separated Values (CSV) Format CSV CSV data representing financial information

Comma Police: The Design and Implementation of a CSV Library George Wilson Data61/CSIRO

CSV on the Web Intro to W3C CSV on the Web Specifications DDI Metadata Workshop Dagstuhl 2016

Manipulating Data Files in Python Learning Objectives Working with CSV files Reading

WE CREATE WE ARE ARTISANS WE AR WE ARE E CSV CSV. 1HFY2020 Interim results 2020.02.28

More than you ever wanted to know about CSV Digging into the CSV script Script Outline Load file

Course Business Two new datasets on CourseWeb: vocab.csv and bpd.csv Midterm grades &amp;

Read, inspect, &amp; clean data from csv files Importing &amp; Managing Financial Data in Python

Importing text files Importing and Managing Financial Data in R getSymbols() with CSV files

Lecture 10: Temporary Files; Comma Separated Values (CSV) Format Temporary Files

Accessing Files in Python Learning Objectives Concepts about files in Python How to open

Interacting with Files Python Files Files Basic container of data in modern computing

What is a Jar File? Java archive (jar) files are compressed files that can store one or many

What is a Jar File? Java archive (jar) files are compressed files that can store one or many

Using files ITEC 1630 We save data in files on disk or some Week 9: Files &amp; Streams

Course Business Midterm grades next week Two new datasets on CourseWeb: vocab.csv and

Basic Import &amp; Update Mike Konovalske &amp; Jason Swain Technical Team, Lucity General Tab

INFORMATION VISUALIZATION Alvitta Ottley Washington University in St. Louis CSE 557A: INFORMATION

Excel l as as a a Tool to Troubleshoot SIS IS Data for EMIS Reporting Overv ervie iew

OFFICE HOURS ESG-CV Reporting Prepared: September 22-24,2020 Call in If you are having audio

RDM + Conquaire RDM: A library perspective of versioning, curating and archiving research data

CS 105 SUMMER WEDNESDAY 6 What to talk about today? The with statement in more detail

Jan 30: How computers work; more histograms Storage is organized in directories (folders) and

CROWD-SOURCING Simin Chen Amazon Mechanical Turk Advantages On demand workforce

Sambuz

Useful Links

Newsletter

Mail Us

Course Business Two new datasets on CourseWeb: vocab.csv and bpd.csv Midterm grades &

Read, inspect, & clean data from csv files Importing & Managing Financial Data in Python

Using files ITEC 1630 We save data in files on disk or some Week 9: Files & Streams

Basic Import & Update Mike Konovalske & Jason Swain Technical Team, Lucity General Tab