CSV Operations 101 — with Python's Own `csv` Module
There is a great chance that you have come across, or have to work with a CSV (comma-separated values) file. A CSV file holds data similar to a spreadsheet. It separates the values with a delimiter character, so it does not always have to be a comma. However, since it is a comma-separated values file, let's see an example as such:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
As in the example above, the very first row (first,last,house
) can be used for headers as column names.
Python conveniently has a built-in csv
module to work with CSV files in an efficient way. The two basic operations are reading and writing; with Python, we can do these operations using lists (or, more generally, any iterable), or dictionaries.
csv.reader()
To read a CSV file, one option is to use csv.reader()
method. Let's see an example for our simple students.csv
file:
import csv
with open('students.csv') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
if row == ['first', 'last', 'house']:
continue
print(f'{row[0]} {row[1]} is in {row[2]}.')
# Harry Potter is in Gryffindor.
# Hermione Granger is in Gryffindor.
# Luna Lovegood is in Ravenclaw.
# Draco Malfoy is in Slytherin.
import csv
with open('students.csv') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
if row == ['first', 'last', 'house']:
continue
print(f'{row[0]} {row[1]} is in {row[2]}.')
# Harry Potter is in Gryffindor.
# Hermione Granger is in Gryffindor.
# Luna Lovegood is in Ravenclaw.
# Draco Malfoy is in Slytherin.
We use a context manager to open our file. (Note that we do not need to specify the 'r'
argument as the mode for the reading operation because it is the default; see the official docs.)
When we use csv.reader()
, it returns a reader object, which we store in the reader
variable. Then, with a for
loop, we iterate through each row, which is a list — and, because our first row is actually the headers, we pass that iteration and continue. Actually, csv.DictReader()
has better usability for this, which we will see later on. One important point here is that each row is a list
containing three elements, and we access them with indexing (row[0]
, row[1]
, row[2]
). This is not the most elegant implementation, but a simple enough example to see how csv.reader()
works.
csv.writer()
Let's say we want to add Ron Weasley to our CSV file, because we want our Golden Trio to be together. Let's see how we can go about doing it:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Ron', 'Weasley', 'Gryffindor'])
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Ron', 'Weasley', 'Gryffindor'])
Here, we are again using a context manager to open our CSV file, this time with the 'a'
argument as the mode for appending to it. After that, we use csv.writer()
, passing into it our file object. It returns a writer object, which we store in the writer
variable. Then, we call writerow()
method to add Ron Weasley. One important thing is that we pass a list
as an argument, although it does not have to be a list
— we could have used an iterable like a tuple, but using lists is more common.
Now, our file looks like:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
csv.DictReader()
We can improve on our last example for reading our CSV file. Let's use csv.DictReader()
this time:
import csv
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(f'{row["first"]} {row["last"]} is in {row["house"]}.')
# Harry Potter is in Gryffindor.
# Hermione Granger is in Gryffindor.
# Luna Lovegood is in Ravenclaw.
# Draco Malfoy is in Slytherin.
# Ron Weasley is in Gryffindor.
import csv
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
print(f'{row["first"]} {row["last"]} is in {row["house"]}.')
# Harry Potter is in Gryffindor.
# Hermione Granger is in Gryffindor.
# Luna Lovegood is in Ravenclaw.
# Draco Malfoy is in Slytherin.
# Ron Weasley is in Gryffindor.
Like csv.reader()
, csv.DictReader()
returns a reader object, but this time, as the documentation says, it "maps the information in each row to a dict
whose keys are given by the optional fieldnames parameter."
If you have realized already, we have not specified the fieldnames parameter, so the first row is used as default fieldnames instead. We can see it with looking at the fieldnames
attribute of our reader object:
import csv
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
print(reader.fieldnames) # ['first', 'last', 'house']
import csv
with open('students.csv') as csvfile:
reader = csv.DictReader(csvfile)
print(reader.fieldnames) # ['first', 'last', 'house']
csv.DictReader()
definitely made the code more readable than the previous method before. Let's see how we can use a dictionary again, this time for writing to our file.
csv.DictWriter()
We have added Ron Weasley to our students.csv
before, let's add another Weasley this time, Ginny. Let's see how we can do it with csv.DictWriter()
:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'house': 'Gryffindor'})
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'house': 'Gryffindor'})
We are again appending to our file, like we did with csv.writer()
. We create a writer object with csv.DictWriter()
, of course passing into it the file object itself, then assigning the writer object returned to the variable writer
. Notice here that we pass an argument called fieldnames
as well, which is a list containing, well, the fieldnames. It is not an optional argument, so we have to provide it each time we use csv.DictWriter()
.
We use the writerow()
method again, but this time, we pass it a dictionary whose keys are the fieldnames we have just specified.
Now our file looks like this:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
Now, what if inside writerow()
you pass in a key that is not in fieldnames? Obviously, we will have a ValueError
, but let's see with an example. They say a house is not a home, but imagine that we mistakenly write Gryffindor as Ginny's home
, instead of her house
as we have in the fieldnames. Let's see:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'home': 'Gryffindor'})
# ValueError: dict contains fields not in fieldnames: 'home'
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'home': 'Gryffindor'})
# ValueError: dict contains fields not in fieldnames: 'home'
This actually comes from the optional extrasaction
parameter inside csv.DictWriter()
. Its default value is 'raise'
, so when the key is not found in the fieldnames, it raises a ValueError
. However, we can force it to ignore it with passing the value 'ignore'
. So, if we do it:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'], extrasaction='ignore')
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'home': 'Gryffindor'})
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'], extrasaction='ignore')
writer.writerow({'first': 'Ginny', 'last': 'Weasley', 'home': 'Gryffindor'})
In that case, we would have ignored the key altogether, resulting in our file looking like this:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,
This is not the wisest decision, though, it is better to have an error in such cases, unless we somehow have a reason to do otherwise.
We can also use the writerows()
method, to pass write more than one row. It takes an iterable as argument, so let's say we have two dictionaries in a list for the Weasley twins, we can add them both at once:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
twins = [
{'first': 'Fred', 'last': 'Weasley', 'house': 'Gryffindor'},
{'first': 'George', 'last': 'Weasley', 'house': 'Gryffindor'}
]
writer.writerows(twins)
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'])
twins = [
{'first': 'Fred', 'last': 'Weasley', 'house': 'Gryffindor'},
{'first': 'George', 'last': 'Weasley', 'house': 'Gryffindor'}
]
writer.writerows(twins)
Now, our file looks like:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
Fred,Weasley,Gryffindor
George,Weasley,Gryffindor
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
Fred,Weasley,Gryffindor
George,Weasley,Gryffindor
Quote constants
Especially when working with writer objects, we may need to specify how to quote fields — for example, to handle a situation where our fields might include the delimiter character itself. For this, we can use the optional quoting
parameter for the writer methods, along with quotechar
parameter to specify which character to use for quotes.
The csv
module has four constants to use for quoting
:
-
csv.QUOTE_ALL
: To quote all fields. -
csv.QUOTE_MINIMAL
: To quote the fields which have special characters (such as the delimiter itself). -
csv.QUOTE_NONNUMERIC
: To quote all non-numeric fields. -
csv.QUOTE_NONE
: To never quote fields. If the delimiter character occurs in one of the fields, then theescapechar
character is used. Ifescapechar
is not provided, it will raise an error.
Let's see a simple example with csv.QUOTE_ALL
using csv.DictWriter()
. As you can imagine, it is going to quote all the fields:
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'], quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow({'first': 'Cho', 'last': 'Chang', 'house': 'Ravenclaw'})
import csv
with open('students.csv', 'a') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['first', 'last', 'house'], quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow({'first': 'Cho', 'last': 'Chang', 'house': 'Ravenclaw'})
Now, Cho Chang's field is all in quotes:
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
Fred,Weasley,Gryffindor
George,Weasley,Gryffindor
"Cho","Chang","Ravenclaw"
# 📁 students.csv
first,last,house
Harry,Potter,Gryffindor
Hermione,Granger,Gryffindor
Luna,Lovegood,Ravenclaw
Draco,Malfoy,Slytherin
Ron,Weasley,Gryffindor
Ginny,Weasley,Gryffindor
Fred,Weasley,Gryffindor
George,Weasley,Gryffindor
"Cho","Chang","Ravenclaw"
Conclusion
The convenience that Python provides with its built-in modules is really valuable, and it is no different with the csv
module, too. This article explored very simple operations for reading and writing CSV files, but there is always a lot more to explore, and the first stop is always the official documentation. Hopefully, you now understand some of the basics of dealing with CSV files.
Happy coding.