## Lecture 22 - 03/10

### Balancing BST's, Tree Rotation, Red-Black Trees

• Problem: performance of BST operations is bad when BST's are spindly/ not well balanced (goes from logN$\log N$ to N$N$)
• Inserting in random order gives logN$\log N$ height, but this won't work (we don't necessarily know all our data in advance to then choose a random insertion order for; have to insert as we recieve data)
• Naive solution: never add new leaves to the tree, add new elements to appropriate leaf node
• For example, we can have a leaf node with three elements in it, all of which are less/greater than the leaf's parent (depending on if leaf is left/right child)
• Problem: have to search through entire leaf, this defeats purpose when leaf has almost N$N$ nodes in it
• Better solution: set cap to number of items a leaf can have (say, 3)
• Once cap met, give middle item to the parent (say, the left-middle element)
• Now the parent node has two elements and three children nodes (a node for less than both parents, in between parents, and greater than both parents)
• Example:
• When we add q and r to the tree, they go into the rightmost leaf, since this is the most appropriate location
• When we add s to the tree, we must move the left-middle item to the parent (which is q)
• We then appropriately split up the node that the left-middle item came from into two nodes (one with elements less than the left-middle (in between both parent elements), and one with elements greater than the left-middle (greater than both parent elements)), for a total of three children nodes:

• Lets say we want to check if the structure contains r
• We first see that r > m so we go right
• Then we see r > o, so compare to q
• r > q so go right
• Highlight: examining a node might take K$K$ comparisons, but this is ok since K$K$ is a capped amount
• Note: with a cap of k=3$k=3$, it is possible for a parent to have 3$3$ elements (for example if we added t and u - in such a case the parent would have k+1=4$k+1=4$ children:
• Note 2: this can continue all the way up the tree,ex - inserting y, z (after inserting t, u already):
• Observe how the 5$5$ children of the 4$4$ element parent node are broken up as the parent node is broken up
• Note 3: If we need to break up root, we simply just set the middle-left element as the new root, and assign the smaller/larger elements from old root as left/right children respectively
• Since new root will only have one element, it will only have at most two children nodes (for smaller/larger as there is no in-between case)
• These B-trees are perfectly balanced
• If M$M$ is max number of children, we have a max of M1$M-1$ items per node
• Height is in between logM(N)$\log_M(N)$ and log2(N)$\log_2(N)$
• Max number of splitting operations per insert is ~H$H$
• Time per insert/contains call: Θ(logN)$\Theta(\log N)$
• A B-tree where M=4$M=4$ is also called a 2-3-4 tree or a 2-4 tree
• Name refers to number of children a node may have (2,3,$2,3,$ or 4$4$)

#### Tree Rotation:

• Tree rotation for shortening/lengthening a tree
• Preserves search tree property:

Example of RotateLeft before and after (this ended up reducing tree height):

#### Red-Black Trees:

• Problem with B-trees/2-3 trees is they are hard to implement, and have performance problems of:
• Hard to maintain different node types
• Interconversion between 1-element and 2-element nodes
• Walking up tree hierarchy to split nodes
• Goal: represent B-tree/2-3 tree as a binary tree - red-black tree

• We create a binary tree that directly maps to a 2-3 tree
• We represent 2-element nodes with 'glue links'
• Red links 'glue' together two nodes (forming the equivalent of a 2-element node in a 2-3 tree)
• The two nodes glued together by a red link have a total of 3 children (consistent with 2-3 tree mapping)
• Black links connect 1-element and 2-element nodes of the tree
• No node has more than one red link (or else we would have 3-element nodes)
• Every path from a root to leaf must have same number of black links (this maintains balance; makes sense since all links in the 2-3 tree correspond to black links in the red-black tree, and as the 2-3 tree is balanced, all root-leaf paths have same number of links)
• Red links lean left (this dictates how the 2-element nodes are split up, see below example):

Red-Black tree example vs 2-3 tree equivalent:

• Cool note: for any 2-3 tree that is balanced, there exists a corresponding red-black tree of depth no more than twice that of the 2-3 tree
• Further reading: there is a very advanced and complex set of rules for how to insert/delete elements from a red-black tree (tree isometry maintenance using tree rotations), look these up