Day 7 - Introduction to tables

Skills: 2

Pre-reading: 4.1.1 & 4.1.2

Supplementary Videos

Intro to Tables

Reference: For all work with tables, refer to the Tables page in the menu at the top of the page!

Intro (15 mins)

Goal: Learn about tabular data, writing tables out literally, importing data, and extracting rows and cell values.

  • Many everyday pieces of data, like a workout journal, recipe index, or library catalog, are naturally represented as tables: data made up of many rows, where every row has the same set of attributes, called columns.

  • Tables are values, just like numbers, strings, images, and booleans, and small ones can be directly typed into Pyret as:

    workouts = table: date :: String, activity :: String, duration :: Number
      row: "2025-04-01", "Running", 30
      row: "2025-04-02", "Yoga", 45
      row: "2025-04-03", "Cycling", 60
    end
  • Note that table: is followed by a list of column names, each with an optional type annotation. Next comes a sequence of rows, each of which must supply exactly one value per column, in the order the columns were declared.

  • Since tables are values, they can be the input and output of functions, and can be used in examples. An important detail: when comparing tables for equality (like in test cases) the order of rows matters! Similarly, the order of the columns matters (see reading).

  • We can use check: ... end to write a set of tests not associated with a function, and use one to confirm that row order matters:

    check:
      table: date :: String, activity :: String, duration :: Number
        row: "2025-04-01", "Running", 30
        row: "2025-04-02", "Yoga", 45
        row: "2025-04-03", "Cycling", 60
      end
      is-not
      table: date :: String, activity :: String, duration :: Number
        row: "2025-04-03", "Cycling", 60
        row: "2025-04-01", "Running", 30
        row: "2025-04-02", "Yoga", 45
      end
    end
  • To work with external files, we first need to include a Pyret library that is not enabled by default. It handles tables stored as "comma-separated values" (CSV) files.

  • Then we can use load-table: instead of table:. Rather than listing the rows, we specify that they come from a CSV file (in this case from a URL, but in HW and lab it will often be a file in the same project, loaded with csv-table-file).

    include csv
    recipes = load-table:
      title :: String,
      servings :: Number,
      prep-time :: Number
      source: csv-table-url("https://pdi.run/f25-2000-recipes.csv", default-options)
    end
  • IMPORTANT: by default, all CSV data are strings; to convert numeric data to numbers, we can use "sanitizers". We will talk more about data cleaning on Day 11, but for any csv that has numeric columns, you should add clauses, after the source: line, that look like sanitize column-name using num-sanitizer (each on separate lines, no commas between). This requires that we add a line include data-source at the top. So the above should be done as:

    include csv
    include data-source
    recipes = load-table:
      title :: String,
      servings :: Number,
      prep-time :: Number
      source: csv-table-url("https://pdi.run/f25-2000-recipes.csv", default-options)
      sanitize servings using num-sanitizer
      sanitize prep-time using num-sanitizer
    end
  • In addition to printing the whole table (or a prefix, if the table is long), you can extract a row from it by writing table-identifier.row-n(N) for some N. The first row is numbered 0, and the last is numbered the number of rows in the table minus one.

    second-workout = workouts.row-n(1)
    # -> Row: date = "2025-04-02", activity = "Yoga", duration = 45
  • From a row, you can extract a column's value using row-identifier["column-name"], e.g.,

    second-workout["activity"]  # -> "Yoga"
    # or all at once:
    workouts.row-n(1)["duration"] # -> 45
  • There are also built-in functions that compute results. For example, mean takes a table and a column name, and returns the average of all values in that column. Other functions like median, modes, sum, and stdev exist as well! To use these functions, be sure your Pyret file starts with use context dcic2024 (see the Tables reference page for more details).

    mean(workouts, "duration") # -> 45
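
Because tables are values, the pieces above (a literal table, row-n, and column extraction) can be combined inside an ordinary function. The following is a minimal sketch, assuming the dcic2024 context; duration-at is a hypothetical helper introduced only for illustration and is not part of the course materials.

```pyret
use context dcic2024

workouts = table: date :: String, activity :: String, duration :: Number
  row: "2025-04-01", "Running", 30
  row: "2025-04-02", "Yoga", 45
  row: "2025-04-03", "Cycling", 60
end

# Hypothetical helper (an assumption for illustration only):
# takes a table as input and looks up the "duration" cell in row n.
fun duration-at(t :: Table, n :: Number) -> Number:
  doc: "Returns the duration stored in row n of t, counting rows from 0"
  t.row-n(n)["duration"]
where:
  # The examples use the workouts table defined above.
  duration-at(workouts, 0) is 30
  duration-at(workouts, 2) is 60
end
```

The where: block plays the same role as the stand-alone check: block shown earlier, except that its examples are attached to the function.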

Class Exercise (40 mins)

  • Browse and pick a dataset on the London DataStore. Once you have found a dataset to work with, open its page for details. Scroll down until you see a data set in CSV format. Right-click the "Download" button so you can copy (not download) the link; Pyret will download it. Use the copied link (URL) in your code to create a table from it. For now, declare a single column with any name you choose.
  • The number of columns must match the CSV. If it doesn't, Pyret will report an error that shows how many columns are in the file; use this information to fix the table declaration and re-run.
  • Use the interactions window, .row-n, and the column extraction to explore the data a little bit.
  • Check the total number of rows with table-identifier.length(). What happens if you try to do table-identifier.row-n(M) for M that is greater than the total number of rows?
  • Similarly, try extracting a column that doesn't exist.
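
As a warm-up for the row-count step, here is a small sketch on the workouts table from the intro rather than on a downloaded dataset. It assumes the dcic2024 context; the names row-count and last-workout are made up for this example. It shows only the valid end of the index range, so the out-of-range experiments are still left for you to try.

```pyret
use context dcic2024

workouts = table: date :: String, activity :: String, duration :: Number
  row: "2025-04-01", "Running", 30
  row: "2025-04-02", "Yoga", 45
  row: "2025-04-03", "Cycling", 60
end

# Total number of rows in the table.
row-count = workouts.length()

# Rows are numbered from 0, so the last valid index is row-count - 1.
last-workout = workouts.row-n(row-count - 1)

check:
  row-count is 3
  last-workout["activity"] is "Cycling"
end
```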

Wrap-up (5 mins)

  • Tables are an extremely common and powerful form of data. Today we just learned how to look at them; in the upcoming days we will see how to program with them!