CSV Files

This section is based in part on chapter 16 in Sweigart’s Automate the Boring Stuff with Python (second edition).

Introduction to CSV files

CSV stands for comma-separated values. A CSV file can be opened in Google Sheets or Excel and will be formatted as a spreadsheet. However, a CSV file is actually a plain-text file. It can also be opened with a text editor program such as Atom.

CSVs give us a good, simple way to organize data without using a database program. It’s easy to read from and write to CSV files with Python.

The csv module in Python

This is a built-in module, so you do not need to install it, However, you must import it in any script that uses it.

import csv

After the import, you can use any of the methods that are part of the module:

csv.reader()
csv.writer()
csv.DictReader()
csv.DictWriter()

Note that using methods from the csv module will also involve use of Python’s file handling functions, such as .open(). These are covered in Reading and Writing Files here.

Writing to a CSV

This creates a new CSV file and fills it, row by row.

../python_code_examples/csvs/write_csv.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
"""template script for writing to a csv file"""
import csv

# open new file for writing - will erase file if it already exists -
csvfile = open('example.csv', 'w', newline='', encoding='utf-8')

# make a new variable - c - for Python's CSV writer object -
c = csv.writer(csvfile)

# write a column headings row - do this only once -
c.writerow( ['name','address','job'] )

# call some function that returns a list --
# for purposes of the template, here is a list of lists -
the_list = [
    ['Alice', 'Gainesville, Florida', 'chef'],
    ['Bob', 'Chicago, Illinois', 'writer'],
    ['Ted', 'Miami, Florida', 'driver'],
    ['Carol', 'Portland, Oregon', 'executive']
]

# use a for-loop to write each row into the CSV file
for item in the_list:
    # write one row to csv — item MUST BE a LIST
    c.writerow(item)

# save and close the file
csvfile.close()

Note that lines 13–25 above would normally be replaced by a for-loop that incorporates a function that creates one row to be written to the CSV.

That function might be scraping hundreds of web pages one at a time, for example. In that case, what is scraped from one page is written as one row in the CSV.

Note the steps required:

  1. Import the module.

  2. Open a file for writing.

  3. Create a CSV writer object and assign it to a new variable.

  4. Write the header row into the CSV.

  5. Write all the other rows into the CSV. Normally this will involve a for-loop.

  6. Close the file.

Reading from a CSV

The following example script uses a CSV file named presidents.csv. It contains 46 rows: one row for each U.S. president, plus a header row at the top.

The script opens the CSV and then can get all rows from it.

CSV file in Excel screenshot

Above, the top rows of the CSV file as seen in Excel.

../python_code_examples/csvs/read_csv.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
"""template script for reading from a csv file"""
import csv

# open an existing file for reading -
csvfile = open('presidents.csv', newline='')

# make a new variable - c - for Python's CSV reader object -
c = csv.reader(csvfile)

# read whatever you want from the reader object
# print it or use it any way you like
for row in c:
    print( row[1] + ", " + row[5])

# save and close the file
csvfile.close()

Note that in line 13 above, row is a Python list, and so we can use list indexes to get only the second item — row[1] — and the sixth item — row[5]. When we use csv.reader(), each row from the CSV file is a Python list of strings.

The code above will print 46 lines, starting like this:

result of code screenshot

Note the steps required:

  1. Import the module.

  2. Open a file for reading.

  3. Create a CSV reader object and assign it to a new variable.

  4. Use a for-loop to read from all rows in the CSV.

  5. Close the file.

Reading into a dictionary

The csv.DictReader() method is used to convert a CSV file to a Python dictionary. You read from an existing CSV and create a Python dictionary from it. Note, the CSV file is unchanged, and the dictionary does not exist as a separate file.

../python_code_examples/csvs/dictreader_ex.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
"""template script for creating a dict from a csv file"""
import csv

# open an existing file for reading -
csvfile = open('presidents.csv', newline='')

# make a new variable - c - for Python's DictReader object -
c = csv.DictReader(csvfile)

# read whatever you want from the DictReader object
# using the column headings from the CSV as the dict keys
for row in c:
    print(row['President'] + " ... " + row['Party'])

# save and close the file
csvfile.close()

The key difference between this and the previous script here is in line 13: You can access data from the CSV using dictionary keys instead of list indexes. For working with a CSV with a lot of columns, this is really nice!

You can read generally about Python dictionaries here: Dictionaries

See also: Converting a CSV to a dictionary

Writing from a dictionary

The csv.DictWriter() method will write to a CSV file using rows of data that are already formatted as dictionaries.

If your data is already a list of dictionaries, as in the following example, you can use csv.DictWriter() to write to a normal CSV.

../python_code_examples/csvs/dictwriter_ex.py
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
"""template script for creating a csv file from a dictionary"""
import csv

# sample of a list of dictionaries
# normally this would be in a separate file, or
# constructed from data in a separate file
presidents_list = [
{"Presidency":1,"President":"George Washington","Wikipedia_entry":"http://en.wikipedia.org/wiki/George_Washington","Took_office":"4/30/1789","Left_office":"3/4/1797","Party":"Independent ","Home_state":"Virginia","Occupation":"Planter","College":"None","Age_when_took_office":57,"Birth_date":"2/22/1732","Birthplace":"Westmoreland County, Virginia","Death_date":"12/14/1799","Location_death":"Mount Vernon, Virginia"},
{"Presidency":2,"President":"John Adams","Wikipedia_entry":"http://en.wikipedia.org/wiki/John_Adams","Took_office":"3/4/1797","Left_office":"3/4/1801","Party":"Federalist ","Home_state":"Massachusetts","Occupation":"Lawyer","College":"Harvard","Age_when_took_office":61,"Birth_date":"10/30/1735","Birthplace":"Quincy, Massachusetts","Death_date":"7/4/1826","Location_death":"Quincy, Massachusetts"},
{"Presidency":3,"President":"Thomas Jefferson","Wikipedia_entry":"http://en.wikipedia.org/wiki/Thomas_Jefferson","Took_office":"3/4/1801","Left_office":"3/4/1809","Party":"Democratic-Republican ","Home_state":"Virginia","Occupation":"Planter, Lawyer","College":"William and Mary","Age_when_took_office":57,"Birth_date":"4/13/1743","Birthplace":"Albemarle County, Virginia","Death_date":"7/4/1826","Location_death":"Albemarle County, Virginia"},
{"Presidency":4,"President":"James Madison","Wikipedia_entry":"http://en.wikipedia.org/wiki/James_Madison","Took_office":"3/4/1809","Left_office":"3/4/1817","Party":"Democratic-Republican ","Home_state":"Virginia","Occupation":"Lawyer","College":"Princeton","Age_when_took_office":57,"Birth_date":"3/16/1751","Birthplace":"Port Conway, Virginia","Death_date":"6/28/1836","Location_death":"Orange County, Virginia"},
{"Presidency":5,"President":"James Monroe","Wikipedia_entry":"http://en.wikipedia.org/wiki/James_Monroe","Took_office":"3/4/1817","Left_office":"3/4/1825","Party":"Democratic-Republican ","Home_state":"Virginia","Occupation":"Lawyer","College":"William and Mary","Age_when_took_office":58,"Birth_date":"4/28/1758","Birthplace":"Westmoreland County, Virginia","Death_date":"7/4/1831","Location_death":"New York, New York"}
]

# open a new file for writing - if file exists, contents will be erased
csvfile = open('new_file.csv', 'w')

# set the headers
headers = ['Presidency', 'President', 'Wikipedia_entry', 'Took_office', 'Left_office', 'Party', 'Home_state', 'Occupation', 'College', 'Age_when_took_office', 'Birth_date', 'Birthplace', 'Death_date', 'Location_death']

# make a new variable - c - for Python's DictWriter object -
# note that fieldnames is required
c = csv.DictWriter(csvfile, fieldnames=headers)

# optional - write a header row
c.writeheader()

# write all rows from list to file
c.writerows(presidents_list)

# save and close file
csvfile.close()

JSON formatted data

Sweigart covers this in chapter 16 in Automate the Boring Stuff with Python (second edition).

Python has a separate built-in module for handling JSON data.

import json

After the import, you can use any of the methods that are part of the module:

new_dict = json.loads(json_string)
new_json = json.dumps(python_data)

Using json.loads() converts a string of JSON data into a Python dictionary. In the example above, new_dict would be that new Python dictionary.

Using json.dumps() converts a Python value into a string of JSON data. In the example above, new_json would be that new string.

Chapter review: chapter 16

Key points

  1. Call the open() function to return a File object for reading or for writing

  2. When and where to add parameters such as 'w', newline='', encoding='utf-8'

  3. When to use each one of these (all have different purposes):

    • csv.reader()

    • csv.writer()

    • csv.DictReader()

    • csv.DictWriter()

  4. How to use:

    • .writerow()

    • .writerows()

  5. The purpose of the built-in json module

Slides: chapters 5 and 16

Python Review 4

.