Lecture 19 - 03/03

Asymptotics 3: Amortized and Empirical Analysis

• Let's say we are creating a data type for an array-like structure
• Naive resizing: when adding to the array but the array is at full capacity, create a new array of size + 1 elements, copy everything over, then add the element
• This has terrible runtime: adding $N$ items requires roughly $N^2/2$ element copies in total, which gets very large
• The smarter way was to instead of resizing size by adding a constant (in this case 1), resize by multiplying by a constant (such as 2)
• Now, whenever adding an element but array at full capacity, create new array of 2*size elements and copy over, then add the element
• If we plot the runtime of this smarter resizing method, most add operations are constant time, with a few operations (whenever a resize is needed) being very expensive
• Worst-case runtime (on resize calls) is $\Theta(N)$
• We will prove below that the average runtime is $\Theta(1)$:
• Each time we resize a size-$N$ array on an add call, we must do $N$ read calls to the old array, $N$ write calls to the new array (thus copying it over), and $1$ write call to add the new element, for a total cost of $2N + 1$
• Each add call that doesn't need a resize has a cost of just $1$ for the single write
• Tabulating the per-add costs for $N = 1$ to $14$ gives a cumulative cost of $44$, so the average runtime is $44/14 \approx 3.142$
• GOAL: show that this average runtime of $\approx 3$ holds as $N$ gets even larger
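The doubling scheme and its cost accounting can be sketched in Python (a minimal sketch; the class name and cost-counting field are illustrative, not from the lecture). It tallies read/write calls under the cost model above and reproduces the cumulative cost of 44 for 14 adds:

```python
# Sketch of an array list with geometric (doubling) resizing, instrumented
# with the cost model from the notes: a resize costs 2 * size (reads + writes)
# and every add costs 1 more write for the new element.

class ResizingArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.items = [None] * self.capacity
        self.cost = 0                            # running count of read/write calls

    def add(self, x):
        if self.size == self.capacity:
            # Geometric resizing: double the capacity and copy everything over.
            self.capacity *= 2
            new_items = [None] * self.capacity
            for i in range(self.size):           # size reads + size writes
                new_items[i] = self.items[i]
            self.items = new_items
            self.cost += 2 * self.size
        self.items[self.size] = x                # 1 write for the new element
        self.size += 1
        self.cost += 1

arr = ResizingArray()
for n in range(1, 15):
    arr.add(n)
print(arr.cost, round(arr.cost / 14, 3))         # → 44 3.143
```

Resizes happen on adds 2, 3, 5, and 9 (old sizes 1, 2, 4, 8), costing 3, 5, 9, and 17; the other ten adds cost 1 each, matching the $44$ total above.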

Using Potentials for Amortization

• The idea: on the cheaper calls, we build up a saved "potential" to spend on the more expensive calls, and we weight the potentials so that at any given expensive call, the potential accumulated from the most recent run of cheap calls is enough to pay for it
• Let us define the potential $\Phi_i \geq 0$ at the $i^\text{th}$ operation, with $\Phi_0 = 0$
• Let us define the amortized cost of the $i^\text{th}$ operation as $a_i = c_i + \Phi_{i+1} - \Phi_i$
• Here, $c_i$ is the real cost of the operation
• $a_i$ is analogous to the deposit we make, $c_i$ the withdrawal taken, and $\Phi$ the total holdings (taking the difference between two timesteps tells us how much potential/holdings we currently have to spend)
• On cheap calls, we pick $a_i > c_i$, which increases $\Phi$
• On expensive calls, we pick $a_i$ such that $\Phi_i$ stays non-negative
• For our array data structure example, we want $a_i \in \Theta(1)$ and $\Phi_i \geq 0 ~~ \forall ~ i$
• To do this, we must choose $a_i$ such that the accumulated potential stays ahead of the real costs $c_i$
• We choose some $a_i$ that satisfies our criteria for all timesteps $i$; at each timestep, the difference between that amortized cost and the real cost gives the change in potential, which we add to the running potential to update it; repeat for the next timestep
• Working through this with a choice of $3$ for all amortized costs:
• Note: total cost is calculated with a resize-add on a size-$N$ array costing $2N + 1$ and a non-resize add costing $1$
• This shows that $3$ for the amortized cost is not sufficient: at $i = 5$ we get a negative potential
• This means the average runtime of this function is greater than $3$
• Now, if we instead use an amortized cost of $5$, the potential stays non-negative throughout
• This shows that the true average runtime is less than or equal to $5$
• Note: we have only shown this for $i \in [1, 14]$, not rigorously for all values of $i$
• Rigorous analysis is out of scope for this class (covered in CS 70/170); we will rely on intuition here
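The potential bookkeeping above can be checked numerically. A small Python sketch (function names are illustrative, not from the lecture) computes $\Phi_{i+1} = \Phi_i + a_i - c_i$ for the first 14 adds under the cost model from the notes:

```python
# Check the potential method numerically: Phi_{i+1} = Phi_i + a_i - c_i,
# where on add i the old array holds i - 1 elements, so a resize-add costs
# 2*(i - 1) + 1 and a non-resize add costs 1.

def cost(i):
    # With doubling from capacity 1, a resize happens on add i exactly when
    # the old size i - 1 is a power of two (sizes 1, 2, 4, 8, ...).
    old_size = i - 1
    resize = old_size > 0 and (old_size & (old_size - 1)) == 0
    return 2 * old_size + 1 if resize else 1

def potentials(a, n=14):
    """Potential after each of the first n operations, given constant a_i = a."""
    phi, out = 0, []
    for i in range(1, n + 1):
        phi += a - cost(i)       # Phi_{i+1} = Phi_i + a_i - c_i
        out.append(phi)
    return out

print(potentials(3))  # goes negative at i = 5, so a_i = 3 is not enough
print(potentials(5))  # stays non-negative for all i in [1, 14]
```

This reproduces both claims above: with $a_i = 3$ the potential first dips below zero at $i = 5$, while $a_i = 5$ keeps $\Phi_i \geq 0$ over the whole range.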

Empirical Analysis

• We say for two functions $f, g$: $f(N) \sim g(N)$ iff $\lim_{N\to\infty}\cfrac{f(N)}{g(N)} = 1$
• To determine this, we ignore lower-order terms and look only at the highest-order term AND its coefficient (since we want the limit to equal $1$, not some constant $k$)
• We use this analysis to estimate runtime: crude, experimental curve fitting
• For example, suppose our true runtime $R(N)$ asymptotically approaches $f = aN^b$ for some constants $a, b$
• Then we can estimate the function's runtime by finding $a, b$ such that $f \sim R(N)$
• $a, b$ are determined from actual experimental data points: run the function on inputs of different sizes, measure the time taken, and use log math/algebra to solve for $a, b$ from the resulting data points
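The log math in the last bullet works because $\log R = \log a + b \log N$ is a straight line in log-log space, so $b$ is the slope between two measured points. A minimal sketch, using made-up timing data for illustration:

```python
import math

# Hypothetical timing measurements (N, seconds) -- made up for illustration.
# We assume the model R(N) ~ a * N^b and solve for a and b from two points.
data = [(1000, 0.004), (2000, 0.016)]

# In log-log space, log R = log a + b * log N is a line:
# b is the slope between the two points, then a follows from either point.
(n1, r1), (n2, r2) = data
b = (math.log(r2) - math.log(r1)) / (math.log(n2) - math.log(n1))
a = r1 / n1 ** b

# Here runtime quadruples when N doubles, so b comes out to about 2
# (i.e., the function looks quadratic).
print(b, a)
```

With real measurements, one would fit the line over many $(N, R(N))$ points rather than just two, since individual timings are noisy.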