Skip to main content

Day 23 - More trees

Skills: None

Pre-reading: 7.1.3, 7.1.4

Supplementary Videos

More Trees

Intro (20 mins)

  • On Monday we saw trees that represent river networks. Today, we'll see more examples of binary trees. First, one of the most important types of binary trees -- a binary search tree.

  • Binary search trees (BSTs) are a particular type of tree that allows quick lookup of values stored at particular ordered keys. The keys are often numbers, and in our example, the values will be strings.

  • We can represent such a BST with the following data definition:

    data BST:
    | leaf
    | node(key :: Number, val :: String, left :: BST, right :: BST)
    end
    # Invariant: keys are unique, and all nodes in the left
    # subtree have keys less than the key in the current node,
    # all nodes in the right subtree have keys greater than
    # the key in the current node

    BST0 = leaf
    BST1 = node(1, "hello", BST0, BST0)
    BST2 = node(5, "bye", BST1, BST0)
    BST3 = node(6, "bye", BST2, node(10, "cs2000", BST0, BST0))

    #|
    fun bst-temp(bst :: BST) -> ???:
    cases (BST) bst:
    | leaf => ...
    | node(k, v, left, right) =>
    ... k ...
    ... v ...
    ... bst-temp(left) ...
    ... bst-temp(right) ...
    end
    end
    |#
  • The point of BSTs is that they allow faster lookup than a list would -- if you are looking for a key, if you don't find it at the root, it can only be in either the left or the right subtree.

  • If each subtree is roughly the same size, this means at each step we eliminate half of the nodes to look through.

  • The worst case is that you have to go to the longest path from the root to a leaf -- this is also called the height of the tree. Let's design a function to compute that.

    fun height(bst :: BST) -> Number:
    cases (BST) bst:
    | leaf => 0
    | node(k, v, left, right) =>
    1 + num-max(height(left), height(right))
    end
    where:
    height(BST0) is 0
    height(BST1) is 1
    height(BST2) is 2
    height(BST3) is 3
    end
  • As you will probably see in much more detail in a class like algorithms, if the tree is balanced -- i.e., roughly the same number of nodes in each subtree at every level -- the height is roughly the logarithm (base 2) of the number of nodes.

  • This is compared to the worst case when you have a list of elements -- that the one you are looking for is at the end of the list, so you have to look through all of the nodes.

  • As the number of nodes gets large, the difference between n and lg(n) becomes very very big. lg(1,000,000) is roughly 20.

Class Exercises (35 mins)

  1. The main goal of a BST is to be able to look up values at a given key. Design a function retrieve that takes a BST, a Number, and returns an Option<String> -- if the given number exists as a key in the tree, your function should return some(v) where v is the string value that is on the node with the key. If the given number doesn't exist, your function should return none.

  2. In order for BSTs to have the performance characteristics that define them, they have to be balanced. This means not that each subtree is the same size (this would only allow trees that had number of elements equal to a power of 2), but rather, that each subtree is close to the size of the other. Rather than measuring size, often BSTs compare heights, calculated as we did with our height function above. Design a function is-balanced that should take a BST and return a boolean that indicates if the tree is balanced: this means that at every level, the height of the left and right subtrees differs by at most 1.

  3. Programs are also often tree shaped! In this exercise, we consider something simpler than full programs -- arithmetic expressions. Design a data definition Arith that allows you to represent numbers, addition of two arithmetic expressions, multiplication of two arithmetic expressions, division of two arithmetic expressions, and subtraction of two arithmetic expressions. This should be a binary tree with one leaf constructor (for numbers) and four node constructors (for the four types of binary operations).

  4. Define constants using your Arith to represent the following expressions:

1 + 2
1 * (3 / 4)
(7 - 3) * (2 + (5 / 2))
  1. Design a function eval that takes an Arith and returns the single number to which it evaluates. If you use the three examples from exercise 4, your test cases should result in 3, 0.75, and 18.

Wrap-up (5 mins)

  • BSTs are a powerful idea, and there are sophisticated algorithms to handle inserting and deleting data from them while maintaining balance. The core of their performance is exactly as you've seen, however.
  • Trees also show up in many other problems!