CSV module for Python

Introduction

This page describes the CSV module developed by Dave Cole, which provides fast CSV parsing for Python.

The CSV module (with a much improved interface) is included with Python 2.3 or later.

Performance of the module was measured using the highly scientific technique of comparing the following two programs, each of which processes the same six-field line 100,000 times:

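# Program 1 (Python 2): split on commas with the string module
import string

for i in xrange(100000):
    string.split('1,2,3,4,5,6', ',')

# Program 2 (Python 2): parse the same line with this CSV module
import csv

p = csv.parser()
for i in xrange(100000):
    p.parse('1,2,3,4,5,6')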

On my 1.1 GHz Athlon running Python 2.0, the string.split version takes 1.136 s and the CSV parser version takes 1.032 s.
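To repeat this comparison on a current Python 3 interpreter, here is a minimal sketch using timeit and the standard-library csv module (the csv.parser() interface above belongs to the original module and is not part of current Python). It is not the original benchmark: csv_fields builds a new reader for every call, which adds overhead the loop above did not have, and no timings are claimed here.

```python
import csv
import timeit

LINE = '1,2,3,4,5,6'
N = 100000


def split_fields():
    # Naive approach: split on every comma, ignoring quoting entirely
    return LINE.split(',')


def csv_fields():
    # Standard-library csv.reader accepts any iterable of lines;
    # a new reader is created on each call
    return next(csv.reader([LINE]))


if __name__ == '__main__':
    split_time = timeit.timeit(split_fields, number=N)
    csv_time = timeit.timeit(csv_fields, number=N)
    print('str.split:  %.3f s' % split_time)
    print('csv.reader: %.3f s' % csv_time)
```

Both functions return the same six fields for this input; the difference only matters once fields contain quotes.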

My thanks to Skip Montanaro for providing the following examples.

CSV files can be syntactically more complex than simply inserting commas between fields. For example, if a field contains a comma, it must be quoted:

1,2,3,"I think, therefore I am",5,6

For this record, the parser returns the following fields:

['1', '2', '3', 'I think, therefore I am', '5', '6']
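The same record can be checked with the standard-library csv module (assumed here, Python 3); a naive split is shown alongside for comparison, to make clear why quoting matters:

```python
import csv

line = '1,2,3,"I think, therefore I am",5,6'

# csv.reader takes an iterable of lines; here a one-element list
fields = next(csv.reader([line]))
print(fields)
# ['1', '2', '3', 'I think, therefore I am', '5', '6']

# Splitting on every comma breaks the quoted field in two
print(line.split(','))
# ['1', '2', '3', '"I think', ' therefore I am"', '5', '6']
```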

Since fields are quoted using quotation marks, you also need a way to include a literal quotation mark inside a field. In CSV files created by Microsoft applications, this is done by doubling it:

1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"
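A minimal sketch of the same rule with the standard-library csv module, assuming its default 'excel' dialect (where doublequote is true): the reader turns each doubled quotation mark back into one, and the writer doubles it again on output.

```python
import csv
import io

line = '1,2,3,"""I see,"" said the blind man","as he picked up his hammer and saw"'

# A doubled "" inside a quoted field is read as one literal quotation mark
fields = next(csv.reader([line]))
print(fields[3])
# "I see," said the blind man

# Writing the fields back out doubles the embedded quotes again
buf = io.StringIO()
csv.writer(buf).writerow(fields)
print(buf.getvalue(), end='')
# 1,2,3,"""I see,"" said the blind man",as he picked up his hammer and saw
```

The writer leaves the last field unquoted because, under the default QUOTE_MINIMAL setting, it only quotes fields that contain the delimiter, the quote character, or line-ending characters.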

Excel and Access quite reasonably allow you to place newlines in cell and column data. When such data is exported as CSV, the output file contains fields with embedded newlines:

1,2,3,"""I see,""
said the blind man","as he picked up his
hammer and saw"

Here a single record is split over three lines because two of its text fields contain embedded newlines. This is what happens when you pass that data to the CSV parser one line at a time:

ferret:/home/djc% python
Python 2.0 (#0, Apr 14 2001, 21:24:22) 
[GCC 2.95.3 20010219 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> import csv
>>> p = csv.parser()
>>> p.parse('1,2,3,"""I see,""')
>>> p.parse('said the blind man","as he picked up his')
>>> p.parse('hammer and saw"')
['1', '2', '3', '"I see,"\012said the blind man', 'as he picked up his\012hammer and saw']

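Note that the parser returns a list of fields only once the record is complete; until then, parse() returns None, which is why the interpreter prints nothing after the first two calls.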
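The standard-library csv module in Python 2.3 and later handles multi-line records differently. Here is a sketch, assuming Python 3: IncrementalParser is a hypothetical class, not part of any library, that reproduces the push-style behaviour from the transcript on top of csv.reader. The last lines show that csv.reader on its own simply pulls further lines from its input while it is inside a quoted field. The quote-counting shortcut assumes well-formed CSV in which quotation marks appear only inside quoted fields.

```python
import csv

# The record from the transcript, as three physical lines without newlines
LINES = [
    '1,2,3,"""I see,""',
    'said the blind man","as he picked up his',
    'hammer and saw"',
]


class IncrementalParser:
    """Hypothetical push-style wrapper around the standard csv.reader.

    parse() returns None until a complete record has been fed, then
    returns the list of fields. Assumes the default 'excel' dialect.
    """

    def __init__(self):
        self._pending = []

    def parse(self, line):
        self._pending.append(line)
        text = '\n'.join(self._pending)
        # In well-formed CSV every '"' is an opening or closing delimiter or
        # half of a doubled "" escape, so an odd count means we are still
        # inside a quoted field and the record is incomplete.
        if text.count('"') % 2:
            return None
        self._pending = []
        return next(csv.reader([text]))


# Push style: feed one line at a time, as in the transcript
p = IncrementalParser()
for raw in LINES:
    print(p.parse(raw))
# None
# None
# ['1', '2', '3', '"I see,"\nsaid the blind man', 'as he picked up his\nhammer and saw']

# Pull style: csv.reader requests more lines while inside a quoted field.
# The '\n' must be kept on each line, or the embedded newlines are lost.
print(list(csv.reader(raw + '\n' for raw in LINES)))
```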