- Hashing is most common implementation for a set/map
- Slight performance improvement over BST implementations
- With BSTs:
- We must have comparable items
- Must also maintain bushiness (non-trivial, can use methods like red-black tree or 2-3 tree)
- BST gets us to inserting/checking for containing data in
Θ(logN) , but we can improve on this

- Terrible idea for storing data: unordered array
`contains`

and`insert`

takeΘ(N) time

- Good idea: use the data itself as an array index
- This leads to constant time lookup - just looking up memory location specified by data value
- This is highly wasteful of memory, but gives us constant time
`contains`

and`insert`

- Generalizing this idea: use data object's actual value to get an index number
- For Strings, we can try converting the object to base
26 - Ex: for 'cat' let c =
3 , a =1 , and t =20 (position in alphabet). Then we can say the index of 'cat'=3∗262+1∗261+20∗260=2074 - Could also use base
32 (this is particularly nice for binary representations, since it would make each letter of the String5 bits (32=25 ), so every5 bits of memory encodes for a letter - try to understand why multiplying by25 is equivalent to shifting bits right5 places) **Summary:**- We have some sort of
**hashCode**that takes some data object and returns some integer representation, that we can then use as an index - We need to resolve ambiguity - when hashCode returns same index for different objects - called
**collision handling**

- We have some sort of

- For Strings, we can try converting the object to base
- Better idea -
`HashTable`

:- Have a fixed number of boxes
M in an array, take hashCode number moduloM , and use this as index - Collision handling: use method like
**external chaining**- at each of theM boxes, have a linked-list/ArrayList/... structure to store all the items of that common index- Other methods for collision resolution exist, such as
*open addressing*(look this up, not necessarily better than external chaining; in this class we will use external chaining)

- Other methods for collision resolution exist, such as
- Note: a good hashCode will give index values that are well balanced (very even distribution across all
M values) **Load factor:**if we haveN items distributed overM buckets, the load factor isL=N/M - Average runtime for
`contains`

and`insert`

isΘ(L) - We can ensure a very fast data structure by making sure that
L is always small- Whenever
L exceeds some threshold, we resize our array of boxes, thus increasingM and decreasingL - Must appropriately re-assign all the
N elements to their new positions in the resized array of boxes

- Whenever
- Note: for doing modulo operation, use
`Math.floorMod(hashCodeValue, M)`

, as this accounts for negative hashCode values - Performance: if we ensure resizing to keep the load factor
L below a certain constant threshold, and we ensure theN items of the table are spread out/well distributed across theM buckets, we have an amortized constant average runtime:Θ(1) **Hash functions**:- Constant time amortized performance depends on a balanced hash function (like ensuring a BST is bushy, but a good hash function is luckily a much conceputally simpler problem)
- Using a base that is a power of
2 (2,4,8,16,32,... ) makes for a bad hashCode - For strings, base
31 is used - Important to have many parts of the data contribute to the hashCode value - nothing should be ignored
- Base of hashCodes should NOT be a factor of the number of buckets
M of the array - For objects that consist of multiple members, such as a Collection object, we can get a hashCode of the entire Collection by something like the following:
- Take the first member's own hashcode (most likely by calling that member object's own hashCode function)
- Add it to some total hashCode number, and then "smear/elevate" the number (spread it's encoding out over all the bits by multiplying by a number like
31 ) - Repeat with the remaining members (get the hashCode of the next member, add it to the total, smear the total, repeat)

- Can save time in hashing function by only looking at 'top' portion of data when computing
- Java requires all methods have
`hashCode`

method

- Never store mutable objects in a HashTable/HashMap
- Never override
`equals`

without also overriding`hashCode`

- Have a fixed number of boxes