Analysis of Skiplists

Discover the advantages of using skiplists.

Here, we analyze the expected height, size, and length of the search path in a skiplist. This section requires a background in basic probability. Several proofs are based on the following basic observation about coin tosses.

Lemma 1: Let $T$ be the number of times a fair coin is tossed up to and including the first time the coin comes up heads. Then $E[T] = 2$.

Proof: Suppose we stop tossing the coin the first time it comes up heads. Define the indicator variable

$$I_i = \begin{cases} 0 & \text{if the coin is tossed less than $i$ times} \\ 1 & \text{if the coin is tossed $i$ or more times} \end{cases}$$

Note that $I_i = 1$ if and only if the first $i - 1$ coin tosses are tails, so $E[I_i] = \Pr\{I_i = 1\} = 1/2^{i-1}$. Observe that $T$, the total number of coin tosses, can be written as $T = \sum^\infty_{i = 1} I_i$. Therefore,

$$\begin{split}
E[T] & = E \left[ \sum^\infty_{i=1} I_i \right] \\
& = \sum^\infty_{i=1} E[I_i] \\
& = \sum^\infty_{i=1} 1/2^{i-1} \\
& = 1 + 1/2 + 1/4 + 1/8 + \cdots \\
& = 2
\end{split}$$
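Lemma 1 can be sanity-checked empirically. The following is a minimal simulation sketch and not part of the original analysis; the function names `tosses_until_heads` and `average_tosses`, the seed, and the trial count are illustrative assumptions.

```python
import random


def tosses_until_heads(rng):
    """Toss a fair coin until it comes up heads; return the number of tosses T."""
    tosses = 1
    # Assumption for this sketch: a draw below 0.5 counts as tails.
    while rng.random() < 0.5:
        tosses += 1
    return tosses


def average_tosses(trials, seed=0):
    """Estimate E[T] by averaging T over independent trials."""
    rng = random.Random(seed)
    return sum(tosses_until_heads(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # The estimate should land close to the value 2 given by Lemma 1.
    print(average_tosses(100_000))
```

With a large number of trials, the printed average should be close to 2, although any single run only approximates the expectation.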

The next two lemmata tell us that skiplists have linear size:

Number of nodes excluding sentinel

Lemma 2: The expected number of nodes in a skiplist containing $n$ elements, not including occurrences of the sentinel, is $2n$.

Proof: The probability that any particular element, $x$, is included in list $L_r$ is $1/2^r$, so the expected number of nodes in $L_r$ is $n/2^r$. Therefore, the total expected number of nodes in all lists is

$$\sum^\infty_{r = 0} n/2^r = n(1 + 1/2 + 1/4 + 1/8 + \cdots) = 2n$$
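The counting argument in Lemma 2 can be illustrated with a short simulation. This is a sketch under the assumption, taken from the proof, that each element is promoted from $L_r$ to $L_{r+1}$ independently with probability $1/2$; the names `lists_containing` and `average_nodes_per_element` are hypothetical.

```python
import random


def lists_containing(rng):
    """Number of lists L_0, L_1, ... that contain a single element.

    Every element is in L_0 and is promoted to the next list with
    probability 1/2 until the first failed promotion.
    """
    count = 1
    while rng.random() < 0.5:
        count += 1
    return count


def average_nodes_per_element(n, trials, seed=0):
    """Average number of non-sentinel nodes per element over several skiplists."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(lists_containing(rng) for _ in range(n))
    return total / (trials * n)


if __name__ == "__main__":
    # Lemma 2 predicts about 2 nodes per element, i.e. 2n nodes in total.
    print(average_nodes_per_element(n=1000, trials=100))
```

Dividing the total node count by $n$ makes the result easy to compare with the factor of 2 in the lemma.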

Height

Lemma 3: The expected height of a skiplist containing $n$ elements is at most $\log n + 2$.

Proof: For each $r \in \{1, 2, 3, \ldots, \infty\}$, define the indicator random variable

$$I_r = \begin{cases} 0 & \text{if $L_r$ is empty} \\ 1 & \text{if $L_r$ is non-empty} \end{cases}$$

The height, $h$, of the skiplist is then given by

$$h = \sum^\infty_{r = 1} I_r$$

Note that $I_r$ is never more than the length, $|L_r|$, of $L_r$, so

$$E[I_r] \le E\left[|L_r|\right] = n/2^r$$

Therefore, we have

$$\begin{split}
E[h] & = E \left[ \sum^\infty_{r = 1} I_r \right] \\
& = \sum^\infty_{r = 1} E[I_r] \\
& = \sum^{\lfloor \log n \rfloor}_{r = 1} E[I_r] + \sum^\infty_{r = \lfloor \log n \rfloor + 1} E[I_r] \\
& \le \sum^{\lfloor \log n \rfloor}_{r = 1} 1 + \sum^\infty_{r = \lfloor \log n \rfloor + 1} n/2^r \\
& \le \log n + \sum^\infty_{r = 0} 1/2^r \\
& = \log n + 2
\end{split}$$
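The height bound can be compared against a simulation. The sketch below assumes the logarithm is base 2, which is what the step bounding $n/2^r$ for $r > \lfloor \log n \rfloor$ requires, and it takes the height to be the largest $r$ for which $L_r$ is non-empty. The names `skiplist_height` and `average_height` are illustrative.

```python
import math
import random


def skiplist_height(n, rng):
    """Largest r such that L_r is non-empty, for n randomly promoted elements."""
    h = 0
    for _ in range(n):
        r = 0
        while rng.random() < 0.5:  # promote with probability 1/2
            r += 1
        h = max(h, r)
    return h


def average_height(n, trials, seed=0):
    """Estimate E[h] by averaging the height of independently built skiplists."""
    rng = random.Random(seed)
    return sum(skiplist_height(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 1024
    print("estimated E[h]:", average_height(n, trials=500))
    print("bound log n + 2:", math.log2(n) + 2)
```

The estimate is expected to fall below the bound, since Lemma 3 gives an upper bound rather than an exact value.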

Number of nodes including sentinel

Lemma 4: The expected number of nodes in a skiplist containing $n$ elements, including all occurrences of the sentinel, is $2n + O(\log n)$.

Proof: By Lemma 2, the expected number of nodes, not including the sentinel, is $2n$. The number of occurrences of the sentinel is equal to the height, $h$, of the skiplist so, by Lemma 3, the expected number of occurrences of the sentinel is at most $\log n + 2 = O(\log n)$.

Length of a search path

Lemma 5: The expected length of a search path in a skiplist is at most $2 \log n + O(1)$.

Proof: The easiest way to see this is to consider the reverse search path for a node, $x$. This path starts at the predecessor of $x$ in $L_0$. At any point in time, if the path can go up a level, then it does. If it cannot go up a level then it goes left. Thinking about this for a few moments will convince us that the reverse search path for $x$ is identical to the search path for $x$, except that it is reversed.

The number of nodes that the reverse search path visits at a particular level, $r$, is related to the following experiment: Toss a coin. If the coin comes up as heads, then move up and stop. Otherwise, move left and repeat the experiment. The number of coin tosses before the heads represents the number of steps to the left that a reverse search path takes at a particular level.

Note: This might overcount the number of steps to the left, since the experiment should end either at the first heads or when the search path reaches the sentinel, whichever comes first. This is not a problem since the lemma only states an upper bound.

Lemma 1 tells us that the expected number of coin tosses up to and including the first heads is $2$, so the expected number of coin tosses before the first heads is $1$.

Let $S_r$ denote the number of steps the forward search path takes at level $r$ that go to the right. We have just argued that $E[S_r] \le 1$. Furthermore, $S_r \le |L_r|$, since we can't take more steps in $L_r$ than the length of $L_r$, so

$$E[S_r] \le E \left[|L_r|\right] = n/2^r$$

We can now finish as in the proof of Lemma 3. Let $S$ be the length of the search path for some node, $u$, in a skiplist, and let $h$ be the height of the skiplist. Then

$$\begin{split}
E[S] & = E\left[ h + \sum^\infty_{r = 0} S_r \right] \\
& = E[h] + \sum^\infty_{r = 0} E[S_r] \\
& = E[h] + \sum^{\lfloor \log n\rfloor}_{r = 0} E[S_r] + \sum^\infty_{r = \lfloor \log n \rfloor + 1} E[S_r] \\
& \le E[h] + \sum^{\lfloor \log n\rfloor}_{r = 0} 1 + \sum^\infty_{r = \lfloor \log n \rfloor + 1} n/2^r \\
& \le E[h] + \sum^{\lfloor \log n\rfloor}_{r = 0} 1 + \sum^\infty_{r = 0} 1/2^r \\
& \le E[h] + \log n + 3 \\
& \le 2 \log n + 5
\end{split}$$
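The bound of Lemma 5 can also be checked numerically. The sketch below does not build a linked structure; it assumes elements are identified by their sorted index, that `heights[i]` is the largest $r$ with element $i$ in $L_r$, and that index `-1` stands for the sentinel. It counts $h$ down steps plus the right steps taken at each level. All function names are hypothetical, and the logarithm is assumed to be base 2.

```python
import bisect
import math
import random


def random_heights(n, rng):
    """heights[i] is the largest r such that element i appears in L_r."""
    heights = []
    for _ in range(n):
        r = 0
        while rng.random() < 0.5:  # promote with probability 1/2
            r += 1
        heights.append(r)
    return heights


def search_path_length(heights, target):
    """Down steps plus right steps on the search path for index `target`."""
    h = max(heights)
    # levels[r] holds the sorted indices of the elements present in L_r.
    levels = [[i for i, hi in enumerate(heights) if hi >= r] for r in range(h + 1)]
    pos = -1  # current node; -1 represents the sentinel
    right_steps = 0
    for r in range(h, -1, -1):
        # Nodes of L_r strictly between the current node and the target.
        lo = bisect.bisect_right(levels[r], pos)
        hi = bisect.bisect_left(levels[r], target)
        right_steps += hi - lo
        if hi > lo:
            pos = levels[r][hi - 1]
    return h + right_steps


def average_search_path(n, trials, seed=0):
    """Average path length over random skiplists and random targets."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += search_path_length(random_heights(n, rng), rng.randrange(n))
    return total / trials


if __name__ == "__main__":
    n = 1024
    print("estimated E[S]:", average_search_path(n, trials=200))
    print("bound 2 log n + 5:", 2 * math.log2(n) + 5)
```

For example, with `heights = [0, 1, 0, 2, 0]` the search for index 4 takes one right step at the top level and two down steps, so `search_path_length` returns 3.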

The following theorem summarizes the results in this section:

Theorem: A skiplist containing $n$ elements has expected size $O(n)$ and the expected length of the search path for any particular element is at most $2 \log n + O(1)$.
