Web Analytics

Working with Paths

Intermediate ~30 min read

File paths are different on Windows (C:\Users\name) vs Unix (/home/name). Python provides tools to handle paths in a cross-platform way. The traditional os.path module uses string functions, while the modern pathlib module (Python 3.4+) treats paths as objects. Understanding both makes you prepared for any codebase!

The os.path Module

The os.path module has been Python's path-handling solution since the beginning. It provides functions to join, split, normalize, and query paths. The key function is os.path.join() which uses the correct separator for the operating system.

Output
Click Run to execute your code
Essential os.path Functions:
os.path.join(a, b) - Join paths with correct separator
os.path.split(path) - Split into (directory, filename)
os.path.splitext(path) - Split into (name, extension)
os.path.exists(path) - Check if path exists
os.path.isfile(path) / isdir(path) - Check type
os.path.abspath(path) - Get absolute path

The pathlib Module (Modern)

The pathlib module, introduced in Python 3.4, represents paths as objects with methods and properties. This is now the recommended approach for new code. Paths can be joined with the / operator, making code more readable and Pythonic.

Output
Click Run to execute your code
Why Prefer pathlib?
- Object-oriented: methods on path objects, not functions
- Readable: Path("a") / "b" / "c" vs os.path.join("a", "b", "c")
- Convenient: properties like .name, .suffix, .parent
- Powerful: built-in glob(), read_text(), write_text()
- Type hints: better IDE support and static analysis

File and Directory Operations

Beyond just manipulating path strings, you'll often need to create directories, copy files, rename items, and delete things. The pathlib module handles most operations, but shutil is needed for copying files and recursively deleting directories.

Output
Click Run to execute your code
Destructive Operations! Be careful with shutil.rmtree() - it recursively deletes everything without confirmation. Always double-check the path before deleting. Consider using send2trash package to move files to trash instead.

Directory Traversal

Finding files in a directory tree is a common task. Use iterdir() for immediate contents, glob() for pattern matching, and os.walk() when you need full control over the traversal. The ** pattern enables recursive searching.

Output
Click Run to execute your code
Glob Patterns:
* - Match any characters (except path separator)
** - Match any path (recursive, crosses directories)
? - Match single character
[abc] - Match character set
*.py - All Python files in current directory
**/*.py - All Python files recursively

Common Mistakes

1. Hardcoding path separators

# Wrong - breaks on Windows!
path = "folder/subfolder/file.txt"
path = "folder" + "/" + "file.txt"

# Correct - use os.path.join or pathlib
import os
path = os.path.join("folder", "subfolder", "file.txt")

from pathlib import Path
path = Path("folder") / "subfolder" / "file.txt"

2. Not checking if path exists before operating

# Wrong - crashes if file doesn't exist!
from pathlib import Path
p = Path("maybe_exists.txt")
content = p.read_text()  # FileNotFoundError!

# Correct - check first
if p.exists():
    content = p.read_text()
else:
    content = ""

# Or use try/except
try:
    content = p.read_text()
except FileNotFoundError:
    content = ""

3. Forgetting parents=True for nested directories

# Wrong - fails if parent doesn't exist!
from pathlib import Path
Path("new/nested/dir").mkdir()  # FileNotFoundError!

# Correct - create parents too
Path("new/nested/dir").mkdir(parents=True)

# And exist_ok to avoid error if exists
Path("new/nested/dir").mkdir(parents=True, exist_ok=True)

4. Using strings instead of Path objects

# Mixing strings and Paths can cause issues
from pathlib import Path

# Wrong - string concatenation
base = Path("/home/user")
full = str(base) + "/file.txt"  # String, not Path!

# Correct - use / operator
full = base / "file.txt"  # Still a Path object

# Or convert at the end if needed
path_str = str(base / "file.txt")

5. Forgetting glob returns an iterator

# Wrong - iterator exhausted after first use!
from pathlib import Path
files = Path(".").glob("*.py")
print(f"Count: {len(list(files))}")
for f in files:  # Empty! Iterator already consumed
    print(f)

# Correct - convert to list first
files = list(Path(".").glob("*.py"))
print(f"Count: {len(files)}")
for f in files:
    print(f)

Exercise: File Organizer

Task: Create a function that organizes files into folders by extension.

Requirements:

  • Use pathlib for all path operations
  • Find all files in a directory (not subdirectories)
  • Group them by extension (.py, .txt, etc.)
  • Move each file to a folder named after its extension
Output
Click Run to execute your code
Show Solution
from pathlib import Path
import shutil

def organize_by_extension(directory):
    """Organize files into folders by their extension."""
    base = Path(directory)

    # Find all files (not directories)
    files = [f for f in base.iterdir() if f.is_file()]

    for file in files:
        # Get extension without dot, or 'no_extension'
        ext = file.suffix[1:] if file.suffix else "no_extension"

        # Create extension folder
        ext_folder = base / ext
        ext_folder.mkdir(exist_ok=True)

        # Move file to extension folder
        dest = ext_folder / file.name
        file.rename(dest)
        print(f"Moved: {file.name} -> {ext}/{file.name}")


# Test it
test_dir = Path("test_organize")
test_dir.mkdir(exist_ok=True)

# Create sample files
(test_dir / "script.py").touch()
(test_dir / "utils.py").touch()
(test_dir / "data.txt").touch()
(test_dir / "notes.txt").touch()
(test_dir / "image.png").touch()
(test_dir / "README").touch()

print("Before organizing:")
for f in test_dir.iterdir():
    print(f"  {f.name}")
print()

organize_by_extension(test_dir)
print()

print("After organizing:")
for item in sorted(test_dir.iterdir()):
    if item.is_dir():
        print(f"  {item.name}/")
        for f in item.iterdir():
            print(f"    {f.name}")

# Cleanup
shutil.rmtree(test_dir)

Summary

  • os.path.join(): Join paths with correct separator
  • pathlib.Path: Modern OOP approach to paths
  • Path / operator: Path("a") / "b" joins paths
  • Path properties: .name, .suffix, .parent, .stem
  • Path methods: .exists(), .is_file(), .is_dir()
  • Create dirs: Path.mkdir(parents=True, exist_ok=True)
  • File ops: .read_text(), .write_text(), .touch()
  • Traversal: .iterdir(), .glob(), .rglob()
  • Copying: Use shutil.copy(), shutil.copytree()
  • Deleting: .unlink() for files, shutil.rmtree() for dirs

What's Next?

Now that you can navigate the filesystem, let's learn about CSV files - one of the most common data formats. Python's csv module makes it easy to read and write spreadsheet-like data, and you'll see how to handle headers, different delimiters, and common pitfalls!