Day 13 - Extracting columns
Skills: None
Pre-reading: 5.1.1, 5.1.2, 5.1.3, 5.1.4.1
Reference: For all work with tables, refer to the Tables page in the menu at the top of the page!
Intro (25 mins)
-
Consider you have a small cafe that tracks daily sales, where each row has the number of drinks sold:
cafe-data =
table: day, drinks-sold
row: "Mon", 45
row: "Tue", 30
row: "Wed", 55
row: "Thu", 40
row: "Fri", 60
end -
There are many basic statistical questions we might want to ask -- e.g., what is the maximum number of drinks sold, the average sold per day, or the total sold.
-
We can extract single columns from tables, and then use standard operations from two different software libraries --
math
andstatistics
-- to operate over those columns.import math as M
import statistics as S
sales = cafe-data.get-column("drinks-sold")
M.max(sales) # maximum sales
S.mean(sales) # average sales
M.sum(sales) # total sold -
Here,
import math as M
declares that you want to use code that comes from Pyret'smath
library (https://pyret.org/docs/latest/math.html), and in order to not get confused between the functions provided from it and the functions available by default or defined by you, theas M
gives the library a short name. This means that any functionfunction-name
from themath
library is availabe asM.function-name
. The choiceM
is arbitrary -- but convention is to use capital letters, and usually just one or two. -
If you aren't worried about mixing up functions from the library, you can instead write:
include math
This adds all the functions from the
math
library to the current program directory with no change of name, so we could write the above alternately as:include math
include statistics
sales = cafe-data.get-column("drinks-sold")
max(sales) # maximum sales
mean(sales) # average sales
sum(sales) # total soldHowever, the context that we use for most programs in this class
dcic2024
already defines many of the names defined in thestatistics
library as table functions (as we saw on Day 7), so Pyret will give you a shadowing error if you try to do that. -
A column is a type of data called a
list
-- if we print it out we can see that it looks like:cafe-data.get-column("drinks-sold")
This shows at the interactions as:
[list: 45, 30, 55, 40, 60]
-
Lists have order, but no column header, and unlike tables, only a single value per entry.
-
We can define list values directly using the above syntax:
sample-list = [list: 10, 20, 30]
empty-list = [list: ] -
And lists can have any type of data in them. On lists of numbers we can use functions from
math
andstatistics
on them.
Class Exercise (30 mins)
- Using the cafe data from the intro, extract the "day" column and determine
which day appears first alphabetically using
M.min
. Then extract both columns and create a list of the total drinks sold across all days usingM.sum
. - Create a new table of student quiz scores:
Extract each quiz column and calculate the class average for each quiz using
quiz-scores =
table: student, quiz1, quiz2, quiz3
row: "Alice", 85, 92, 78
row: "Bob", 90, 88, 95
row: "Charlie", 78, 85, 82
row: "Diana", 95, 90, 88
endS.mean
. Which quiz had the highest average? - Create a list directly using the syntax
[list: 12, 8, 15, 22, 5, 18]
and use functions from themath
library to find the minimum, maximum, and sum. What's the range (difference between max and min)? - Load the employees dataset from https://data.boston.gov/dataset/employee-earnings-report
using the CSV URL (as in Day 9). Extract the "REGULAR" column, convert it to
numbers using the
string-to-number-unsafe
helper from Day 9, and calculate the average regular salary across all employees.
Wrap-Up (5 mins)
- Columns can be extracted from tables -- these are lists.
- Many built in functions work on lists of values.