Day 32 - Visualizing Data with Python
Visualizing Data with Pandas and Matplotlib
1. Introduction (10 minutes)
- Overview:
- Recap the restaurant orders DataFrame and our computed columns from the previous lecture.
- Today we will visualize our data to better understand restaurant performance.
- Learning Goals:
- Learn to generate plots with Matplotlib.
- Understand how to work with wide versus tall data.
- Explore how to reformat data (using melt) for visualization.
- Do Now:
- Ask: “Why is visualizing data important in data analysis? How might a chart help you see trends that raw numbers do not?”
2. Plotting with Matplotlib (20 minutes)
- Setting Up Matplotlib:
- Example Code:
import matplotlib.pyplot as plt
# Ensure converters are registered for dates if needed
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
- Example Code:
- Line Plot of Daily Total Sales:
- First, compute total sales per day:
daily_sales = orders.groupby('date')['total_price'].sum().reset_index()
print(daily_sales) - Plotting:
plt.figure()
plt.plot(daily_sales['date'], daily_sales['total_price'], marker='o')
plt.xlabel('Date')
plt.ylabel('Total Sales ($)')
plt.title('Daily Total Sales')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
- First, compute total sales per day:
- Bar Chart for Sales by Dish:
- Example Code:
dish_sales = orders.groupby('dish')['total_price'].sum().reset_index()
plt.figure()
plt.bar(dish_sales['dish'], dish_sales['total_price'], color='skyblue')
plt.xlabel('Dish')
plt.ylabel('Total Sales ($)')
plt.title('Sales by Dish')
plt.show()
- Example Code:
- Interactive Exercise:
- Ask: “Try modifying the plots—change markers, colors, or add gridlines. What effect do these changes have?”
3. Reshaping Data: Wide vs. Tall Format (15 minutes)
- Concepts:
- Explain that data can be organized in different shapes. Wide data has many columns; tall data has more rows and fewer columns.
- In Pandas, the
melt
function converts wide data into tall (long) format, which can simplify plotting.
- Example:
- Suppose we have a wide table of weekly sales for different regions:
wide_sales = pd.DataFrame({
'week': ['Week1', 'Week2', 'Week3'],
'north': [150, 200, 180],
'south': [130, 170, 160],
'east': [140, 210, 190],
'west': [120, 160, 150]
})
print(wide_sales) - Convert to Tall Format:
tall_sales = wide_sales.melt(id_vars=['week'], var_name='region', value_name='sales')
print(tall_sales)
- Suppose we have a wide table of weekly sales for different regions:
- Using the Tall Data for Plotting:
- Example:
Plot sales by region for each week:plt.figure()
sns.barplot(x='week', y='sales', hue='region', data=tall_sales)
plt.title('Weekly Sales by Region')
plt.show()
- Example:
- Discussion:
- Ask students: “What are some advantages of having data in tall format for plotting?”
4. Wrap-Up (5 minutes)
- Recap Key Points:
- We generated plots using Matplotlib to visualize daily sales and sales by dish.
- We learned about wide vs. tall data and used the melt function to reformat data.
- Visualizations help us communicate insights and detect trends that aren’t obvious from raw numbers.