Day 13 - Extracting columns
Skills: None
Pre-reading: 5.1.1, 5.1.2, 5.1.3, 5.1.4.1
Supplementary Videos
Reference: For all work with tables, refer to the Tables page in the menu at the top of the page!
Intro (25 mins)
-
Consider you have a small cafe that tracks daily sales, where each row has the number of drinks sold:
cafe-data =
table: day :: String, drinks-sold :: Number
row: "Mon", 45
row: "Tue", 30
row: "Wed", 55
row: "Thu", 40
row: "Fri", 60
end -
There are many basic statistical questions we might want to ask -- e.g., what is the maximum number of drinks sold, the average sold per day, or the total sold.
-
We can extract single columns from tables, and then use standard operations from two different software libraries --
mathandstatistics-- to operate over those columns.import math as M
import statistics as S
sales = cafe-data.get-column("drinks-sold")
M.max(sales) # maximum sales
S.mean(sales) # average sales
M.sum(sales) # total sold -
Here,
import math as Mdeclares that you want to use code that comes from Pyret'smathlibrary (https://pyret.org/docs/latest/math.html), and in order to not get confused between the functions provided from it and the functions available by default or defined by you, theas Mgives the library a short name. This means that any functionfunction-namefrom themathlibrary is available asM.function-name. The choiceMis arbitrary -- but convention is to use capital letters, and usually just one or two. -
If you aren't worried about mixing up functions from the library, you can instead write:
include mathThis adds all the functions from the
mathlibrary to the current program directory with no change of name, so we could write the above alternately as:include math
include statistics
sales = cafe-data.get-column("drinks-sold")
max(sales) # maximum sales
mean(sales) # average sales
sum(sales) # total soldHowever, the context that we use for most programs in this class
dcic2024already defines many of the names defined in thestatisticslibrary as table functions (as we saw on Day 7), so Pyret will give you a shadowing error if you try to do that. -
A column is a type of data called a
list-- if we print it out we can see that it looks like:cafe-data.get-column("drinks-sold")This shows at the interactions as:
[list: 45, 30, 55, 40, 60] -
Lists have order, but no column header, and unlike tables, only a single value per entry.
-
We can define list values directly using the above syntax:
sample-list = [list: 10, 20, 30]
empty-list = [list: ] -
And lists can have any type of data in them. On lists of numbers we can use functions from
mathandstatisticson them.
Class Exercise (30 mins)
-
Using the cafe data from the intro, extract the "day" column and determine which day appears first alphabetically using
M.min. -
Create a list for the number of drinks, and compute the total drinks sold across all days using
M.sum. -
Create a new table of student quiz scores:
quiz-scores =
table: student :: String, quiz1 :: Number, quiz2 :: Number, quiz3 :: Number
row: "Alice", 85, 92, 78
row: "Bob", 90, 88, 95
row: "Charlie", 78, 85, 82
row: "Diana", 95, 90, 88
endExtract each quiz column and calculate the class average for each quiz using
S.mean. Which quiz had the highest average? -
Create a list directly using the syntax
[list: 12, 8, 15, 22, 5, 18]and use functions from themathlibrary to find the minimum, maximum, and sum. What's the range (difference between max and min)? -
Load the employees dataset from https://data.boston.gov/dataset/employee-earnings-report using the CSV URL (as in Day 9). Transform the "REGULAR" column to numbers using the
string-to-number-unsafehelper from Day 9, extract it as a list and calculate the average regular salary across all employees.
Wrap-Up (5 mins)
- Columns can be extracted from tables -- these are lists.
- Many built in functions work on lists of values.