Skip to main content

Recitation 9 — Trees

Skills: 6

Reading: 7.1

Trees and Hierarchical Data

Trees are hierarchical data -- data that nest sub-entities within a larger entity. Consider a file system where folders contain files and other folders. A table represents flat rows. While you can include a "parent folder" field in each row, you lose the nesting structure.

A File System Table Example

Here's a file system table:

filesystem-table = table: name, type, parent, size
row: "Documents", "folder", "", 0
row: "Photos", "folder", "", 0
row: "resume.pdf", "file", "Documents", 150
row: "notes.txt", "file", "Documents", 25
row: "School", "folder", "Documents", 0
row: "homework.doc", "file", "School", 75
row: "vacation.jpg", "file", "Photos", 500
row: "family.jpg", "file", "Photos", 300
end

What happens if we want to calculate the total size of the Documents folder including all subfolders?

While tables are great for flat data, they make representing and processing hierarchical (tree) data hard. The recursive nature of trees (folders contain folders contain folders...) suggests we need a recursive data structure.

Defining a Recursive Datatype for File System Trees

Data Definition

Define a new datatype called FileTree:

data FileTree:
| file(name :: String, size :: Number)
| folder(name :: String, contents :: List<FileTree>)
end

We have two variants: files (with name and size) and folders (with name and list of contents).

Example

Build a simple file system tree:

resume = file("resume.pdf", 150)
notes = file("notes.txt", 25)
homework = file("homework.doc", 75)
vacation-pic = file("vacation.jpg", 500)
family-pic = file("family.jpg", 300)

school-folder = folder("School", [list: homework])
docs-folder = folder("Documents", [list: resume, notes, school-folder])
photos-folder = folder("Photos", [list: vacation-pic, family-pic])

root = folder("Root", [list: docs-folder, photos-folder])

Discovering Recursive Functions on Trees

Notice that a FileTree can be either a file (base case) or a folder containing a list of other FileTrees (recursive case).

Example

  • For a file, count is 1
  • For a folder, count is 1 (for the folder itself) plus the count of each item in contents

Design a function count-items that counts all items in a file tree:

fun count-items(ft :: FileTree) -> Number:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
where:
count-items(resume) is 1
count-items(school-folder) is 2
end

fun count-items-list(fts :: List<FileTree>) -> Number:
cases (List) fts:
| empty => # YOUR CODE HERE
| link(first, rest) => # YOUR CODE HERE
end
end

Traversing Trees Recursively

Design a function that returns only file names (not folder names):

fun list-files-only(ft :: FileTree) -> List<String>:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
where:
list-files-only(resume) is [list: "resume.pdf"]
list-files-only(docs-folder) is [list: "resume.pdf", "notes.txt", "homework.doc"]
end

You'll also need to design a helper function list-files-only-list that processes a list of file trees.

Transforming Trees

for each loops require linear data (lists) because they process elements sequentially. Trees have branching structure, so you must reconstruct the structure recursively.

Design a function that keeps only files larger than a certain size:

fun keep-large-files(ft :: FileTree, min-size :: Number) -> FileTree:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
end

You'll also need to design a helper function keep-large-files-list that processes a list of file trees.

Wrap-Up

  • How is processing trees different from processing lists?
  • Why can't we use for-each loops on trees?