Skip to main content

Day 32 - Visualizing Data with Python

Visualizing Data with Pandas and Matplotlib

1. Introduction (10 minutes)

  • Overview:
    • Recap the restaurant orders DataFrame and our computed columns from the previous lecture.
    • Today we will visualize our data to better understand restaurant performance.
  • Learning Goals:
    • Learn to generate plots with Matplotlib.
    • Understand how to work with wide versus tall data.
    • Explore how to reformat data (using melt) for visualization.
  • Do Now:
    • Ask: “Why is visualizing data important in data analysis? How might a chart help you see trends that raw numbers do not?”

2. Plotting with Matplotlib (20 minutes)

  • Setting Up Matplotlib:
    • Example Code:
      import matplotlib.pyplot as plt
      # Ensure converters are registered for dates if needed
      from pandas.plotting import register_matplotlib_converters
      register_matplotlib_converters()
  • Line Plot of Daily Total Sales:
    • First, compute total sales per day:
      daily_sales = orders.groupby('date')['total_price'].sum().reset_index()
      print(daily_sales)
    • Plotting:
      plt.figure()
      plt.plot(daily_sales['date'], daily_sales['total_price'], marker='o')
      plt.xlabel('Date')
      plt.ylabel('Total Sales ($)')
      plt.title('Daily Total Sales')
      plt.xticks(rotation=45)
      plt.tight_layout()
      plt.show()
  • Bar Chart for Sales by Dish:
    • Example Code:
      dish_sales = orders.groupby('dish')['total_price'].sum().reset_index()
      plt.figure()
      plt.bar(dish_sales['dish'], dish_sales['total_price'], color='skyblue')
      plt.xlabel('Dish')
      plt.ylabel('Total Sales ($)')
      plt.title('Sales by Dish')
      plt.show()
  • Interactive Exercise:
    • Ask: “Try modifying the plots—change markers, colors, or add gridlines. What effect do these changes have?”

3. Reshaping Data: Wide vs. Tall Format (15 minutes)

  • Concepts:
    • Explain that data can be organized in different shapes. Wide data has many columns; tall data has more rows and fewer columns.
    • In Pandas, the melt function converts wide data into tall (long) format, which can simplify plotting.
  • Example:
    • Suppose we have a wide table of weekly sales for different regions:
      wide_sales = pd.DataFrame({
      'week': ['Week1', 'Week2', 'Week3'],
      'north': [150, 200, 180],
      'south': [130, 170, 160],
      'east': [140, 210, 190],
      'west': [120, 160, 150]
      })
      print(wide_sales)
    • Convert to Tall Format:
      tall_sales = wide_sales.melt(id_vars=['week'], var_name='region', value_name='sales')
      print(tall_sales)
  • Using the Tall Data for Plotting:
    • Example:
      Plot sales by region for each week:
      plt.figure()
      sns.barplot(x='week', y='sales', hue='region', data=tall_sales)
      plt.title('Weekly Sales by Region')
      plt.show()
  • Discussion:
    • Ask students: “What are some advantages of having data in tall format for plotting?”

4. Wrap-Up (5 minutes)

  • Recap Key Points:
    • We generated plots using Matplotlib to visualize daily sales and sales by dish.
    • We learned about wide vs. tall data and used the melt function to reformat data.
    • Visualizations help us communicate insights and detect trends that aren’t obvious from raw numbers.