- Note, insertion sort with elements inserted into a binary search tree is actually a quicksort in disguise!
- We have the standard set of Comparison Based sorting algorithms, the Integer/Counting based sorting algorithms, and finally the digit-by-digit/Radix sorting algorithms that build off Counting sorts.
- We have ways to search a data structure that are analogous to using the compareTo() of its elements, and we have ways to sort structures that are more analogous to a Integer/Counting sort (with these, we use properties of the element itself, like a hashcode, in our search/sort)
- There also exists a data structure that is analogous to digit-by-digit sorting and searching; it is called a
*trie*

- Short for re
**trie**val tree - Pronounced officially same way as
*tree*, but everyone pronounces it*try* - Below is how a trie would look after inserting in: “sam”, “sad”, “sap”, “same”, “a”, and “awls”
- We store the characters in words as a tree of nodes, all with a common root. Words with a common sequence of characters in them will share a commn set of nodes
- Note how the nodes corresponding to the final character in the words have a special blue color/attribute; this is how we track the actual words in our structure: simply get the string of characters in the path from the root to the blue-colored node; this is the word
- With this example, we would get the following results upon calling
`contains(String s)`

:`contains("sam")`

: True - following the tree we see that there exists nodes 's', 'a', and 'm' in that order (valid path); furthermore the final node 'm' is a blue node, marking that it is a terminal character in a word. Thus we return true`contains("a")`

: True - same reason as above`contains("z")`

: False - no node 'z' exists from the root`contains("sa")`

: False - The nodes 's' and 'a' exist in the root and are a valid path, but the terminal node, 'a', is not a blue-colored node. This means that there is no word in our trie structure that ends here, so we must return false`contains("")`

: False - same reasoning as above; the root node is not a blue-colored node

- Performance: For a trie with
N keys/nodes in it and a key of sizeL to insert/search,- Worst case insert time:
O(L) - Worst case search time:
O(L) - Best case search time:
O(1) (for a miss on the first ofL characters)

- Worst case insert time:
- Comparing different implementations:
- Note: hashtables may look at many unnecessary characters, since as we look up a hashcode we could be comparing to many unrelated elements
- Searching is constant with respect to number of keys
- Main purpose of tries:
*Prefex Matching*and*Approximate Matching*:- For example: find all keys that match a given prefix:
`prefixMatch(“sa”)`

- With above example, this would return "sad", "sam", "same", and "sap"

- Example 2: find longest prefix of a given key
- For both these examples, a trie is much more efficient to use than a hashmap or BST

- For example: find all keys that match a given prefix:

public class TrieSet {// Support characters upto 256private static final int R = 256;private class Node {// Upto R linksboolean exists;Node[] links;public Node() {links = new Node[R];exists = false;}}private Node root = new Node();public void put(String key) {put(root, key, 0);}private Node put(Node x, String key, int d) {if (x == null) {x = new Node();}if (d == key.length()) {x.exists = true;return x;}char c = key.charAt(d);x.links[c] = put(x.links[c], key, d + 1);return x;}}

- Normally in old phones, to type text, each of the numbers from
0−9 represent multiple letters (ex:2 is 'abc',3 is 'def', ...) and to get different letters on the same key, we multitap (a is just a single tap on1 , while b is a double tap on1 ), see below: - An alternate way of texting is
*T9 texting*, where we only press each key once for a letter. Then, with the string of pressed number keys, the program gives us a list of all the possible words that the number string could represent, and we select the desired word from the list- For example,
4663 could either represent "good", "home", "gone", or "hoof"; so the cell phone returns us a list of these words (possibly ordered by relevance/frequency) - Note that the letters on the
3 key are 'def',4 are 'ghi', and6 are 'mno'

- For example,
- We can nicely implement this with a trie
- We will want a structure like
`TrieMap<String, TreeSet<WeightedWord>>`

- Our alphabet will be
`R = 8`

, since we are using the numbers2−9 - The key is the number string the user types, such as
4663 - The returned value is a tree set containing all the possible words (such as "good", "home", "gone", "hoof")
- The values contained in the tree set are of type
`WeightedWord`

, which are Strings that have a`compareTo()`

method that contains some sort of weight - When a word is used, increase its weight, moving its position in the TreeSet if necessary. This is how we add relevance-based ordering to the results

- The values contained in the tree set are of type
- Note: TreeMap and HashMap would have worked as well, but not as efficient for this use case as a TrieMap

- We will want a structure like

- Tries have exelent performance
- The naive implementation is very bad memory-wise
- The fundamental issue is that mapping a node to children is done with a data-indexed array of size
R (alphabet size) - Thus each node in the trie uses memory
R

- The fundamental issue is that mapping a node to children is done with a data-indexed array of size
- We can improve memory usage with the following Trie optimizations (look these up for fun!):
- Tracking children in a better way than with an array of length R
- Ternary Search Tries
- Patricia/Radix Tries
- DAWG
- Suffix/Prefix Array

- Child link optimizations:
- In the current naive implementation, we have
N nodes, each of which haveR links - Each node uses
R memory since the node stores its 'children' with an array of sizeR **Map-Trie**: we can instead store a node's children in a map from alphabet characters to nodes, so any node only stores the elements of the alphabet for which it has children. This takes much less thanR memory (if we only have zero/one/two/... children on a certain node, the node uses zero/one/two/... memory to store now respectively)- Using a data-indexed approach (naive method), we have the best performance but poor memory efficiency; using a map-based approach, we have slower query performance, but waste much less memory

- In the current naive implementation, we have
**Ternary Search Trie**: Restrict the number of links- Essentially redesign the trie such that we have a fixed number of links
- Assign an alphabet character to each node
- Give each node 3 links:
- Child in left link if key’s next character < node’s character.
- Child in middle link if key’s next character == node’s character.
- Child in right link if key’s next character > node’s character.

With this implementation, we would have a trie structure looking like the following, with the same example input as earlier:

public class TSTSet<Valeue>{private Node<Value> root;private static class Node<Value> {// Character at node:private char c;// Left, middle, and right subtries:private Node<Value> left;private Node<Value> middle;private Node<Value> right;}}- Full implementation: http://algs4.cs.princeton.edu/52trie/TST.java.html
- Note, ternary search tries have a much worse performance than with standard tries if we were to insert the following elements into the trie: "A", "B", "C", "D" (similar to inputs that cause creation of spindly binary trees)

- Summary of data structure performances:
- Where we have
N keys,L digits per key, andR is alphabet size

- Where we have