Recitation 9 — Trees
Skills: 6
Reading: 7.1
Trees and Hierarchical Data
Trees are hierarchical data -- data that nest sub-entities within a larger entity. Consider a file system where folders contain files and other folders. A table represents flat rows. While you can include a "parent folder" field in each row, you lose the nesting structure.
A File System Table Example
Here's a file system table:
filesystem-table = table: name, type, parent, size
row: "Documents", "folder", "", 0
row: "Photos", "folder", "", 0
row: "resume.pdf", "file", "Documents", 150
row: "notes.txt", "file", "Documents", 25
row: "School", "folder", "Documents", 0
row: "homework.doc", "file", "School", 75
row: "vacation.jpg", "file", "Photos", 500
row: "family.jpg", "file", "Photos", 300
end
What happens if we want to calculate the total size of the Documents folder including all subfolders?
While tables are great for flat data, they make representing and processing hierarchical (tree) data hard. The recursive nature of trees (folders contain folders contain folders...) suggests we need a recursive data structure.
Defining a Recursive Datatype for File System Trees
Data Definition
Define a new datatype called FileTree
:
data FileTree:
| file(name :: String, size :: Number)
| folder(name :: String, contents :: List<FileTree>)
end
We have two variants: files (with name and size) and folders (with name and list of contents).
Example
Build a simple file system tree:
resume = file("resume.pdf", 150)
notes = file("notes.txt", 25)
homework = file("homework.doc", 75)
vacation-pic = file("vacation.jpg", 500)
family-pic = file("family.jpg", 300)
school-folder = folder("School", [list: homework])
docs-folder = folder("Documents", [list: resume, notes, school-folder])
photos-folder = folder("Photos", [list: vacation-pic, family-pic])
root = folder("Root", [list: docs-folder, photos-folder])
Discovering Recursive Functions on Trees
Notice that a FileTree
can be either a file (base case) or a folder containing a list of other FileTree
s (recursive case).
Example
- For a file, count is 1
- For a folder, count is 1 (for the folder itself) plus the count of each item in contents
Design a function count-items
that counts all items in a file tree:
fun count-items(ft :: FileTree) -> Number:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
where:
count-items(resume) is 1
count-items(school-folder) is 2
end
fun count-items-list(fts :: List<FileTree>) -> Number:
cases (List) fts:
| empty => # YOUR CODE HERE
| link(first, rest) => # YOUR CODE HERE
end
end
Traversing Trees Recursively
Design a function that returns only file names (not folder names):
fun list-files-only(ft :: FileTree) -> List<String>:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
where:
list-files-only(resume) is [list: "resume.pdf"]
list-files-only(docs-folder) is [list: "resume.pdf", "notes.txt", "homework.doc"]
end
You'll also need to design a helper function list-files-only-list
that processes a list of file trees.
Transforming Trees
for each
loops require linear data (lists) because they process elements sequentially. Trees have branching structure, so you must reconstruct the structure recursively.
Design a function that keeps only files larger than a certain size:
fun keep-large-files(ft :: FileTree, min-size :: Number) -> FileTree:
cases (FileTree) ft:
| file(n, s) => # YOUR CODE HERE
| folder(n, contents) => # YOUR CODE HERE
end
end
You'll also need to design a helper function keep-large-files-list
that processes a list of file trees.
Wrap-Up
- How is processing trees different from processing lists?
- Why can't we use for-each loops on trees?