Web Analytics

CSV Files

Intermediate ~30 min read

CSV (Comma-Separated Values) is one of the most common data formats - every spreadsheet application can export to it, databases can import it, and it's human-readable. Python's built-in csv module handles all the tricky parts: quoting, escaping, different delimiters, and headers. Whether you're processing financial data, user records, or scientific measurements, mastering CSV is essential!

Reading CSV Files

The csv.reader() returns an iterator that yields each row as a list of strings. For more convenient access by column names, csv.DictReader() yields each row as a dictionary with column headers as keys. This makes code more readable and maintainable.

Output
Click Run to execute your code
reader vs DictReader:
csv.reader(f) - Rows as lists: row[0], row[1]
csv.DictReader(f) - Rows as dicts: row['name'], row['age']
Use reader for simple, positional access.
Use DictReader when columns have meaning - code is clearer!

Writing CSV Files

Use csv.writer() to write lists of values, and csv.DictWriter() to write dictionaries. Always open files with newline="" on Windows to prevent blank lines between rows. The writeheader() method writes column names.

Output
Click Run to execute your code
Windows Users! Always use newline="" when opening CSV files for writing: open("file.csv", "w", newline=""). Without this, you'll get extra blank lines between rows. This is because the csv module handles line endings itself.

Dialects and Options

Not all CSV files use commas! European files often use semicolons, tab-separated values (TSV) are common, and different applications quote fields differently. The csv module supports various dialects and lets you customize delimiter, quoting, and other options.

Output
Click Run to execute your code
Quoting Constants:
csv.QUOTE_MINIMAL - Only quote when needed (default)
csv.QUOTE_ALL - Quote all fields
csv.QUOTE_NONNUMERIC - Quote non-numeric fields
csv.QUOTE_NONE - Never quote (escape special chars)
Use QUOTE_ALL for maximum compatibility with other tools.

Practical Examples

In real projects, you'll process CSV data: calculate totals, filter rows, group by categories, add computed columns, and transform data. The csv module combines well with Python's built-in functions and collections for powerful data processing.

Output
Click Run to execute your code

Common Mistakes

1. Forgetting newline="" on Windows

# Wrong - creates extra blank lines on Windows!
with open("data.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(data)

# Correct - always use newline=""
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

2. Not handling commas in data

# Wrong - manual string building breaks on commas!
data = ["Alice", "New York, NY", "30"]
line = ",".join(data)  # "Alice,New York, NY,30" - 4 fields!

# Correct - let csv module handle quoting
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(data)  # "Alice","New York, NY","30"

3. Assuming all fields are strings

# Wrong - CSV values are always strings!
with open("data.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total = row['quantity'] * row['price']  # TypeError!

# Correct - convert to numbers
with open("data.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        total = int(row['quantity']) * float(row['price'])

4. Reading the same file twice

# Wrong - reader is exhausted after first loop!
with open("data.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
    for row in reader:  # Empty! Iterator exhausted
        process(row)

# Correct - read into list or open again
with open("data.csv", "r") as f:
    rows = list(csv.reader(f))  # Store in memory
for row in rows:
    print(row)
for row in rows:  # Works!
    process(row)

5. DictWriter with missing keys

# Wrong - KeyError if dict missing a field!
fieldnames = ["name", "age", "city"]
data = {"name": "Alice", "age": 30}  # Missing 'city'!

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writerow(data)  # KeyError: 'city'

# Correct - use extrasaction and restval
with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=fieldnames,
        extrasaction='ignore',  # Ignore extra keys
        restval=""  # Default for missing keys
    )
    writer.writerow(data)  # Works!

Exercise: Sales Report Generator

Task: Process sales data and generate a summary report CSV.

Requirements:

  • Read a sales CSV with columns: product, quantity, price
  • Calculate total revenue per product
  • Write a summary CSV with: product, total_quantity, total_revenue
  • Sort by revenue (highest first)
Output
Click Run to execute your code
Show Solution
import csv
import os
from collections import defaultdict

# Create sample sales data
sales_csv = """product,quantity,price
Widget,10,9.99
Gadget,5,19.99
Widget,8,9.99
Gadget,12,19.99
Thing,20,5.00
Widget,15,9.99
Thing,25,5.00"""

with open("sales.csv", "w") as f:
    f.write(sales_csv)

# Process sales data
totals = defaultdict(lambda: {"quantity": 0, "revenue": 0.0})

with open("sales.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        product = row["product"]
        qty = int(row["quantity"])
        price = float(row["price"])
        totals[product]["quantity"] += qty
        totals[product]["revenue"] += qty * price

# Convert to sorted list (by revenue, descending)
summary = [
    {"product": p, "total_quantity": d["quantity"], "total_revenue": f"{d['revenue']:.2f}"}
    for p, d in sorted(totals.items(), key=lambda x: -x[1]["revenue"])
]

# Write summary report
with open("summary.csv", "w", newline="") as f:
    fieldnames = ["product", "total_quantity", "total_revenue"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(summary)

# Display result
print("=== Sales Summary ===")
with open("summary.csv", "r") as f:
    print(f.read())

# Cleanup
os.remove("sales.csv")
os.remove("summary.csv")

Summary

  • Read with reader: csv.reader(f) - rows as lists
  • Read with DictReader: csv.DictReader(f) - rows as dicts
  • Write with writer: csv.writer(f) with writerow()
  • Write with DictWriter: csv.DictWriter(f, fieldnames=[])
  • Write header: writer.writeheader()
  • Batch write: writer.writerows(data)
  • Custom delimiter: delimiter=";" or delimiter="\t"
  • Windows fix: Always use newline="" when writing
  • Quoting: quoting=csv.QUOTE_ALL for safety
  • Convert types: CSV values are strings - convert with int/float

What's Next?

CSV is great for tabular data, but for more complex structures you'll need JSON. JSON supports nested objects, arrays, and multiple data types - perfect for configuration files, API responses, and hierarchical data. Let's learn how to read and write JSON with Python!