Nested Data Structures
Real-world data is rarely flat; it is usually nested and hierarchical. Whether you're working with JSON from APIs, configuration files, or database results, you'll encounter lists inside dictionaries, dictionaries inside lists, and deeper combinations. Mastering nested structures is essential for working with real data in Python!
List of Dictionaries
This is the most common pattern you'll encounter, especially when working with JSON APIs. A list of dictionaries represents a collection of records, where each dictionary is one record with named fields. Think of it like rows in a spreadsheet or database table.
For example: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}].
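Here is a minimal sketch of the pattern in use. The records and field names (id, name, age) are made up for illustration and do not come from any real API.

```python
# Hypothetical records: each dictionary is one "row" with named fields.
users = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Charlie", "age": 35},
]

# Access one field of one record: list index first, then dict key.
first_name = users[0]["name"]

# Extract one field from every record.
names = [user["name"] for user in users]

# Filter records by a condition.
over_28 = [user for user in users if user["age"] > 28]

# Sort records by a field.
by_age = sorted(users, key=lambda user: user["age"])

# Look up a record by id; next() returns None if nothing matches.
bob = next((user for user in users if user["id"] == 2), None)

print(first_name)
print(names)
print([user["name"] for user in over_28])
print([user["name"] for user in by_age])
print(bob)
```

Passing a default to next() avoids a StopIteration error when no record matches, which is usually what you want for lookups.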
Dictionary of Lists
Use this pattern when you need to group or categorize items. The dictionary keys are category names, and each value is a list of items belonging to that category. Perfect for grouping, tagging, or organizing data by some attribute.
Use dict.setdefault(key, []) or collections.defaultdict(list) to handle missing keys automatically.
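The grouping pattern can be sketched with both approaches side by side. The employee names and departments below are invented sample data.

```python
from collections import defaultdict

# Hypothetical (name, department) pairs to group.
employees = [
    ("Alice", "engineering"),
    ("Bob", "sales"),
    ("Charlie", "engineering"),
]

# Option 1: setdefault creates the empty list the first time a key is seen.
by_dept = {}
for name, dept in employees:
    by_dept.setdefault(dept, []).append(name)

# Option 2: defaultdict(list) creates the empty list automatically.
by_dept_dd = defaultdict(list)
for name, dept in employees:
    by_dept_dd[dept].append(name)

print(by_dept)
print(dict(by_dept_dd))
```

One difference worth knowing: a defaultdict also creates the key when you merely read a missing key with brackets, so a plain dict with setdefault can be the safer choice when stray empty entries would be a problem.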
Dictionary of Dictionaries
When you need to create hierarchical or tree-like structures, dictionaries of dictionaries are your tool. This pattern is perfect for configuration files, organizational charts, or any data with named sub-sections.
When you chain bracket lookups such as data["a"]["b"]["c"], any missing key raises a KeyError. Chain .get() calls with defaults, as in data.get("a", {}).get("b", {}).get("c", default), to safely navigate nested structures.
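As a rough illustration, here is a small configuration stored as a dictionary of dictionaries. The section names and settings are hypothetical.

```python
# Hypothetical configuration with named sub-sections.
config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"host": "localhost", "ttl": 300},
}

# Direct access works when every key on the path exists.
db_port = config["database"]["port"]

# Chained .get() returns a default instead of raising KeyError.
log_level = config.get("logging", {}).get("level", "INFO")

# Add or update a nested value; setdefault creates the section if needed.
config.setdefault("logging", {})["level"] = "DEBUG"

# Iterate over sections and their settings.
for section, settings in config.items():
    for key, value in settings.items():
        print(f"{section}.{key} = {value}")
```

Note that chained .get() only protects against missing keys. If an intermediate value exists but is not a dictionary (for example None), the next .get() call still fails.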
Working with JSON-like Data
Real API responses combine all of these patterns: lists of dictionaries, where the dictionaries contain more lists and dictionaries. Let's look at navigating, extracting, and transforming this complex nested data.
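The sketch below walks through navigating, extracting, and transforming a small response. The JSON payload is invented for illustration; a real API will have its own shape.

```python
import json

# Hypothetical API response, parsed from a JSON string.
raw = '''
{
  "status": "success",
  "data": {
    "users": [
      {"name": "Alice", "tags": ["admin", "dev"], "address": {"city": "Paris"}},
      {"name": "Bob", "tags": ["dev"], "address": {}},
      {"name": "Charlie", "tags": []}
    ]
  }
}
'''
response = json.loads(raw)
users = response["data"]["users"]

# Navigate: dict key -> dict key -> list index -> dict key.
first_city = users[0]["address"]["city"]

# Extract: flatten every user's tags with a nested comprehension.
all_tags = [tag for user in users for tag in user["tags"]]

# Transform: build a name -> city mapping, tolerating missing keys.
cities = {
    user["name"]: user.get("address", {}).get("city", "Unknown")
    for user in users
}

print(first_city)
print(all_tags)
print(cities)
```

In the nested comprehension, the for clauses read left to right in the same order as the equivalent nested loops: the outer loop over users comes first, then the inner loop over each user's tags.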
Common Mistakes
1. Forgetting to handle missing keys
# Wrong - crashes if "address" doesn't exist
city = user["address"]["city"] # KeyError!
# Correct - use get() with defaults
city = user.get("address", {}).get("city", "Unknown")
# Or use try/except for complex cases
try:
    city = user["address"]["city"]
except KeyError:
    city = "Unknown"
2. Modifying while iterating
# Wrong - modifying dict while iterating
for key in data:
    if should_remove(key):
        del data[key] # RuntimeError!
# Correct - iterate over a copy of keys
for key in list(data.keys()):
    if should_remove(key):
        del data[key]
# Or use dict comprehension to filter
data = {k: v for k, v in data.items() if not should_remove(k)}
3. Creating unintended shared references
# Wrong - all inner lists are the same object!
matrix = [[]] * 3
matrix[0].append(1)
print(matrix) # [[1], [1], [1]] - oops!
# Correct - create separate list objects
matrix = [[] for _ in range(3)]
matrix[0].append(1)
print(matrix) # [[1], [], []]
# Same issue with dict.fromkeys() and a mutable value
groups = dict.fromkeys(["a", "b"], []) # Wrong - both keys share one list
groups = {key: [] for key in ["a", "b"]} # Correct - a separate list per key
4. Wrong index type for nested access
# Data structure
data = {"users": [{"name": "Alice"}, {"name": "Bob"}]}
# Wrong - mixing up list index and dict key
user = data["users"]["0"] # TypeError - list indices must be integers, not str
user = data[0]["name"] # KeyError - data is dict, not list
# Correct - use proper types
user = data["users"][0] # List uses integer index
name = data["users"][0]["name"] # Dict uses string key
5. Assuming data structure without checking
# API might return different structures
response = get_api_data()
# Wrong - assumes "data" always exists and is a list
for item in response["data"]: # Might crash!
    process(item)
# Correct - validate structure first
if response.get("status") == "success":
    data = response.get("data", [])
    if isinstance(data, list):
        for item in data:
            process(item)
Exercise: Process API Response
Task: Extract and aggregate data from a nested API response.
Requirements:
- Find all user emails
- Calculate total posts across all users
- Find the user with the most posts
Solution
api_response = {
    "data": {
        "users": [
            {"name": "Alice", "email": "[email protected]", "posts": [1, 2, 3]},
            {"name": "Bob", "email": "[email protected]", "posts": [4, 5]},
            {"name": "Charlie", "email": "[email protected]", "posts": [6, 7, 8, 9]}
        ]
    }
}
# 1. All emails
emails = [user["email"] for user in api_response["data"]["users"]]
print(f"Emails: {emails}")
# 2. Total posts
total = sum(len(user["posts"]) for user in api_response["data"]["users"])
print(f"Total posts: {total}")
# 3. User with most posts
most_posts_user = max(api_response["data"]["users"], key=lambda u: len(u["posts"]))
print(f"Most active: {most_posts_user['name']} ({len(most_posts_user['posts'])} posts)")
Summary
- List of dicts: Collection of records - [{"name": "Alice"}, {"name": "Bob"}]
- Dict of lists: Grouped/categorized items - {"even": [2,4], "odd": [1,3]}
- Dict of dicts: Hierarchical/config data - {"db": {"host": "localhost"}}
- Access nested data: Chain brackets: data["users"][0]["name"]
- Safe access: Chain .get(): data.get("a", {}).get("b", default)
- Extract data: Use comprehensions with nested loops
- Transform: Build new structures from nested data with comprehensions
- Always validate: Check structure before accessing deeply nested paths
What's Next?
Congratulations on completing the Data Structures module! You now know how to work with all of Python's built-in data structures. Next, we'll dive into Functions - how to define reusable blocks of code, pass arguments, return values, and write cleaner, more modular programs!