Skip to main content

Day 13 - Extracting columns

Skills: None

Pre-reading: 5.1.1, 5.1.2, 5.1.3, 5.1.4.1

Reference: For all work with tables, refer to the Tables page in the menu at the top of the page!

Intro (25 mins)

  • Consider you have a small cafe that tracks daily sales, where each row has the number of drinks sold:

    cafe-data =
    table: day, drinks-sold
    row: "Mon", 45
    row: "Tue", 30
    row: "Wed", 55
    row: "Thu", 40
    row: "Fri", 60
    end
  • There are many basic statistical questions we might want to ask -- e.g., what is the maximum number of drinks sold, the average sold per day, or the total sold.

  • We can extract single columns from tables, and then use standard operations from two different software libraries -- math and statistics -- to operate over those columns.

    import math as M
    import statistics as S

    sales = cafe-data.get-column("drinks-sold")
    M.max(sales) # maximum sales
    S.mean(sales) # average sales
    M.sum(sales) # total sold
  • Here, import math as M declares that you want to use code that comes from Pyret's math library (https://pyret.org/docs/latest/math.html), and in order to not get confused between the functions provided from it and the functions available by default or defined by you, the as M gives the library a short name. This means that any function function-name from the math library is availabe as M.function-name. The choice M is arbitrary -- but convention is to use capital letters, and usually just one or two.

  • If you aren't worried about mixing up functions from the library, you can instead write:

    include math

    This adds all the functions from the math library to the current program directory with no change of name, so we could write the above alternately as:

    include math
    include statistics

    sales = cafe-data.get-column("drinks-sold")
    max(sales) # maximum sales
    mean(sales) # average sales
    sum(sales) # total sold

    However, the context that we use for most programs in this class dcic2024 already defines many of the names defined in the statistics library as table functions (as we saw on Day 7), so Pyret will give you a shadowing error if you try to do that.

  • A column is a type of data called a list -- if we print it out we can see that it looks like:

    cafe-data.get-column("drinks-sold")

    This shows at the interactions as:

    [list: 45, 30, 55, 40, 60]
  • Lists have order, but no column header, and unlike tables, only a single value per entry.

  • We can define list values directly using the above syntax:

    sample-list = [list: 10, 20, 30]
    empty-list = [list: ]
  • And lists can have any type of data in them. On lists of numbers we can use functions from math and statistics on them.

Class Exercise (30 mins)

  • Using the cafe data from the intro, extract the "day" column and determine which day appears first alphabetically using M.min. Then extract both columns and create a list of the total drinks sold across all days using M.sum.
  • Create a new table of student quiz scores:
    quiz-scores =
    table: student, quiz1, quiz2, quiz3
    row: "Alice", 85, 92, 78
    row: "Bob", 90, 88, 95
    row: "Charlie", 78, 85, 82
    row: "Diana", 95, 90, 88
    end
    Extract each quiz column and calculate the class average for each quiz using S.mean. Which quiz had the highest average?
  • Create a list directly using the syntax [list: 12, 8, 15, 22, 5, 18] and use functions from the math library to find the minimum, maximum, and sum. What's the range (difference between max and min)?
  • Load the employees dataset from https://data.boston.gov/dataset/employee-earnings-report using the CSV URL (as in Day 9). Extract the "REGULAR" column, convert it to numbers using the string-to-number-unsafe helper from Day 9, and calculate the average regular salary across all employees.

Wrap-Up (5 mins)

  • Columns can be extracted from tables -- these are lists.
  • Many built in functions work on lists of values.