Computer Science Distilled, Chapter 2: Complexity

This is a full chapter excerpt from Wladston Viana Ferreira Filho’s brand-new book Computer Science Distilled, which he has graciously allowed us to publish here.

In almost every computation, a variety of arrangements for the processes is possible. It is essential to choose that arrangement which shall tend to minimize the time necessary for the calculation.

—Ada Lovelace

How much time does it take to sort 26 shuffled cards? If instead you had 52 cards, would it take twice as long? How much longer would it take for a thousand decks of cards? The answer is intrinsic to the method used to sort the cards.

A method is a list of unambiguous instructions for achieving a goal. A method that always requires a finite series of operations is called an algorithm. For instance, a card-sorting algorithm is a method that will always specify some operations to sort a deck of 26 cards per suit and per rank.

Fewer operations require less computing power. We like fast solutions, so we monitor the number of operations in our algorithms. Many algorithms require a fast-growing number of operations as the input grows in size. For example, our card-sorting algorithm might take a few operations to sort 26 cards, but four times as many operations to sort 52 cards!

To avoid bad surprises when our problem size grows, we find the algorithm’s time complexity. In this chapter, you’ll learn to:

  • Count and interpret time complexities
  • Express their growth with fancy Big-O's
  • Run away from exponential algorithms
  • Make sure you have enough computer memory.

But first, how do we define time complexity?

Time complexity is written T(n). It gives the number of operations the algorithm performs when processing an input of size n. We also refer to an algorithm's T(n) as its running cost. If our card-sorting algorithm follows T(n) = n², we can predict how much longer it takes to sort a deck once we double its size: T(2n) / T(n) = 4.
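To see why the answer is four: if T(n) = n², then

T(2n) / T(n) = (2n)² / n² = 4n² / n² = 4,

so doubling the deck size quadruples the sorting time.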

Hope for the best, prepare for the worst

Isn't it faster to sort a pile of cards that's almost sorted already?

Input size isn't the only characteristic that impacts the number of operations required by an algorithm. When an algorithm can have different values of T(n) for the same value of n, we resort to cases:

  • Best Case: when the input requires the minimum number of operations for any input of that size. In sorting, it happens when the input is already sorted.
  • Worst Case: when the input requires the maximum number of operations for any input of that size. In many sorting algorithms, that’s when the input was given in reverse order.
  • Average Case: refers to the average number of operations required for typical inputs of that size. For sorting, an input in random order is usually considered.

In general, the worst case is the most important one: it gives you a guaranteed baseline you can always count on. When nothing is said about the scenario, the worst case is assumed. Next, we’ll see how to analyze a worst-case scenario, hands on.

Figure 2.1: “Estimating Time”, courtesy of xkcd.com.

2.1 Counting Time

We find the time complexity of an algorithm by counting the number of basic operations it requires for a hypothetical input of size n. We'll demonstrate it with Selection Sort, a sorting algorithm that uses a nested loop. An outer for loop updates the current position being sorted, and an inner for loop selects the item that goes in the current position[1]:

function selection_sort(list)
    for current ← 1 … list.length - 1
        smallest ← current
        for i ← current + 1 … list.length
            if list[i] < list[smallest]
                smallest ← i
        list.swap_items(current, smallest)
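
If you'd rather run Selection Sort than trace it on paper, here's a minimal Python sketch of the same procedure (using 0-based indices instead of the pseudocode's 1-based ones):

def selection_sort(items):
    n = len(items)
    for current in range(n - 1):          # position currently being sorted
        smallest = current                # assume the current item is the smallest so far
        for i in range(current + 1, n):   # scan the rest of the list for something smaller
            if items[i] < items[smallest]:
                smallest = i
        items[current], items[smallest] = items[smallest], items[current]  # swap into place

cards = [9, 3, 7, 1, 8, 2]
selection_sort(cards)
print(cards)  # [1, 2, 3, 7, 8, 9]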

Let's see what happens with a list of n items, assuming the worst case. The outer loop runs n − 1 times and does two operations per run (one assignment and one swap), totaling 2n − 2 operations. The inner loop first runs n − 1 times, then n − 2 times, n − 3 times, and so on. We know how to sum these types of sequences[2]:

number of inner loop runs = (n − 1) + (n − 2) + ⋯ + 2 + 1   (one term for each of the n − 1 runs of the outer loop)
= (n − 1)n / 2 = (n² − n) / 2.

In the worst case, the if condition is always met. This means the inner loop does one comparison and one assignment (n² − n)/2 times, hence n² − n operations. In total, the algorithm costs 2n − 2 operations for the outer loop, plus n² − n operations for the inner loop. We thus get the time complexity:

T(n) = n² + n − 2.

Now what? If our list size was n=8 and we double it, the sorting time will be multiplied by:

T(16) / T(8) = (16² + 16 − 2) / (8² + 8 − 2) ≈ 3.86.

If we double it again, we multiply time by 3.90. Double it over and over, and we find 3.94, 3.97, 3.98. Notice how this gets closer and closer to 4? This means it would take roughly four times as long to sort two million items as to sort one million items.
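
You can reproduce these ratios with a few lines of Python that evaluate T(n) = n² + n − 2 at doubling input sizes:

def T(n):
    return n**2 + n - 2  # Selection Sort's operation count from the analysis above

n = 8
for _ in range(5):
    print(f"T({2 * n}) / T({n}) = {T(2 * n) / T(n):.2f}")
    n *= 2
# prints 3.86, 3.90, 3.94, 3.97, 3.98, creeping toward 4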

2.1.1 Understanding Growth

Say the input size of an algorithm is very large, and we increase it even more. To predict how the execution time will grow, we don't need to know all terms of T(n). We can approximate T(n) by its fastest-growing term, called the dominant term.

The Index Card Problem: Yesterday, you knocked over one box of index cards. It took you two hours of Selection Sort to fix it. Today, you spilled ten boxes. How much time will you need to put the cards back in order?

We've seen Selection Sort follows T(n) = n² + n − 2. The fastest-growing term is n², therefore we can write T(n) ≈ n². Assuming there are n cards per box, we find:

T(10n) / T(n) ≈ (10n)² / n² = 100.

It will take you approximately 100 × 2 hours = 200 hours! What if we had used a different sorting method? For example, there’s one called "Bubble Sort" whose time complexity is T(n) = 0.5n² + 0.5n. The fastest-growing term then gives T(n) ≈ 0.5n², hence:

T(10n) / T(n) ≈ (0.5 × (10n)²) / (0.5 × n²) = 100.
Figure 2.2: Zooming out n², n² + n − 2, and 0.5n² + 0.5n, as n gets larger and larger.
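
Here's the same reasoning as a quick calculation, assuming each box holds 1,000 cards (the exact number is an assumption for illustration; only the tenfold increase matters):

cards_per_box = 1000  # assumed box size, for illustration only

def selection_cost(n):
    return n**2 + n - 2          # Selection Sort's exact cost

def bubble_cost(n):
    return 0.5 * n**2 + 0.5 * n  # Bubble Sort's exact cost

for name, cost in [("Selection Sort", selection_cost), ("Bubble Sort", bubble_cost)]:
    print(name, round(cost(10 * cards_per_box) / cost(cards_per_box), 2))
# both print about 99.91, essentially the 100 predicted by the dominant term alone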

The 0.5 coefficient cancels itself out! The idea that n² + n − 2 and 0.5n² + 0.5n both grow like n² isn't easy to get. How does the fastest-growing term of a function ignore all other numbers and dominate growth? Let’s try to visually understand this.

In Figure 2.2, the two time complexities we've seen are compared to n² at different zoom levels. As we plot them for larger and larger values of n, their curves seem to get closer and closer. Actually, you can plug any numbers into the bullets of T(n) = n² + •n + •, and it will still grow like n².

Remember, this effect of curves getting closer works only if the fastest-growing term is the same. The plot of a function with linear growth (n) never gets closer and closer to one with quadratic growth (n²), which in turn never gets closer and closer to one with cubic growth (n³).
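
A few lines of Python make the convergence concrete: the full Selection Sort cost divided by its dominant term heads toward 1, while a linear term divided by n² heads toward 0.

for n in [10, 100, 1000, 10000]:
    full_cost = n**2 + n - 2
    print(n, full_cost / n**2, n / n**2)
# the first ratio approaches 1: same dominant term, so the curves converge
# the second approaches 0: linear growth never catches up with quadratic growth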

That’s why with very big inputs, algorithms with a quadratically growing cost perform a lot worse than algorithms with a linear cost. However, they perform a lot better than those with a cubic cost. If you’ve understood this, the next section will be easy: we will just learn the fancy notation coders use to express this.

2.2 The Big-O Notation

There's a special notation to refer to classes of growth: the Big-O notation. A function with a fastest-growing term of 2ⁿ or weaker is O(2ⁿ); one with a quadratic or weaker growth is O(n²); growing linearly or less, O(n), and so on. The notation is used for expressing the dominant term of algorithms' cost functions in the worst case—that's the standard way of expressing time complexity[3].

Figure 2.3: Different orders of growth often seen inside O.

Both Selection Sort and Bubble Sort are O(n²), but we'll soon discover O(n log n) algorithms that do the same job. With our O(n²) algorithms, 10× the input size resulted in 100× the running cost. Using an O(n log n) algorithm, 10× the input size results in only 10 log 10 ≈ 34× the running cost.

When n is a million, n² is a trillion, whereas n log n is just a few million. Years running a quadratic algorithm on a large input could be equivalent to minutes if an O(n log n) algorithm were used. That’s why you need time complexity analysis when you design systems that handle very large inputs.
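
To put numbers on that claim (using the base-2 logarithm here; the base only changes a constant factor):

from math import log2

n = 1_000_000
print(f"{n**2:,}")            # 1,000,000,000,000 operations for the quadratic algorithm
print(f"{n * log2(n):,.0f}")  # about 20 million operations for the n log n algorithm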

When designing a computational system, it’s important to anticipate the most frequent operations. Then you can compare the Big-O costs of different algorithms that do these operations[4]. Also, most algorithms only work with specific input structures. If you choose your algorithms in advance, you can structure your input data accordingly.

Some algorithms always run for a constant duration regardless of input size—they're O(1). For example, checking if a number is odd or even: we see if its last digit is odd and boom, problem solved. No matter how big the number. We'll see more O(1) algorithms in the next chapters. They're amazing, but first let's see which algorithms are not amazing.
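
For example, a parity check in Python is a single operation whether the number has one digit or a hundred:

def is_even(number):
    return number % 2 == 0  # one check of the lowest digit, no matter how big the number is

print(is_even(7))        # False
print(is_even(10**100))  # True, with the same single check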

2.3 Exponentials

We say O(2ⁿ) algorithms are exponential time. From the graph of growth orders (Figure 2.3), it doesn't seem the quadratic n² and the exponential 2ⁿ are much different. Zooming out the graph, it's obvious the exponential growth brutally dominates the quadratic one:

Figure 2.4: Different orders of growth, zoomed out. The linear and logarithmic curves grow so little they aren’t visible anymore.

Exponential time grows so much, we consider these algorithms “not runnable”. They run for very few input types, and require huge amounts of computing power if inputs aren’t tiny. Optimizing every aspect of the code or using supercomputers doesn’t help. The crushing exponential always dominates growth and keeps these algorithms unviable.

To illustrate the explosiveness of exponential growth, let's zoom out the graph even more and change the numbers (Figure 2.5). The exponential was reduced in power (from 2 to 1.5) and had its growth divided by a thousand. The polynomial had its exponent increased (from 2 to 3) and its growth multiplied by a thousand.

Figure 2.5: No exponential can be beaten by a polynomial. At this zoom level, even the n log n curve grows too little to be visible.
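
You can replay Figure 2.5's race in a few lines of Python: the weakened exponential described above, 1.5ⁿ divided by a thousand, still overtakes the boosted polynomial, n³ times a thousand.

def weak_exponential(n):
    return 1.5**n / 1000   # exponential with reduced base, divided by a thousand

def boosted_polynomial(n):
    return 1000 * n**3     # polynomial with increased exponent, multiplied by a thousand

for n in [10, 50, 100, 150]:
    print(n, weak_exponential(n) > boosted_polynomial(n))
# prints False, False, True, True: past a certain point the exponential
# overtakes the polynomial and never looks back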

Some algorithms are even worse than exponential time algorithms. That's the case for factorial time algorithms, whose time complexities are O(n!). Exponential and factorial time algorithms are horrible, but we need them for the hardest computational problems: the famous NP-complete problems. We will see important examples of NP-complete problems in the next chapter. For now, remember this: the first person to find a non-exponential algorithm for an NP-complete problem gets a million dollars[5] from the Clay Mathematics Institute.

It’s important to recognize the class of problem you’re dealing with. If it’s known to be NP-complete, trying to find an optimal solution is fighting the impossible. Unless you’re shooting for that million dollars.

2.4 Counting Memory

Even if we could perform operations infinitely fast, there would still be a limit to our computing power. During execution, algorithms need working storage to keep track of their ongoing calculations. This consumes computer memory, which is not infinite.

The measure for the working storage an algorithm needs is called space complexity. Space complexity analysis is similar to time complexity analysis. The difference is that we count computer memory, and not computing operations. We observe how space complexity evolves when the algorithm’s input size grows, just as we do for time complexity.

For example, Selection Sort just needs working storage for a fixed set of variables. The number of variables does not depend on the input size. Therefore, we say Selection Sort's space complexity is O(1): no matter what the input size, it requires the same amount of computer memory for working storage.

However, many other algorithms need working storage that grows with input size. Sometimes, it's impossible to meet an algorithm’s memory requirements. You won't find an appropriate sorting algorithm with O(n log n) time complexity and O(1) space complexity. Computer memory limitations sometimes force a tradeoff. With low memory, you’ll probably need an algorithm with slow O(n²) time complexity because it has O(1) space complexity.
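
Sorting aside, here's a tiny illustration of the difference: reversing a list in place only ever needs two extra index variables, O(1) space, while building a reversed copy needs new storage as large as the input, O(n) space.

def reverse_in_place(items):
    left, right = 0, len(items) - 1       # O(1) extra space: just two indices
    while left < right:
        items[left], items[right] = items[right], items[left]
        left += 1
        right -= 1

def reversed_copy(items):
    result = []                           # O(n) extra space: a whole new list
    for i in range(len(items) - 1, -1, -1):
        result.append(items[i])
    return result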

Conclusion

In this chapter, we learned that algorithms can differ in how voraciously they consume computing time and computer memory. We’ve seen how to assess this with time and space complexity analysis. We learned to calculate time complexity by finding the exact T(n) function, the number of operations performed by an algorithm.

We've seen how to express time complexity using the Big-O notation (O). Throughout this book, we'll perform simple time complexity analysis of algorithms using this notation. Many times, calculating T(n) is not necessary for inferring the Big-O complexity of an algorithm.

We’ve seen the cost of running exponential algorithms explode in a way that makes these algorithms not runnable for big inputs. And we learned how to answer these questions:

  • Given different algorithms, do they have a significant difference in terms of operations required to run?
  • Multiplying the input size by a constant, what happens with the time an algorithm takes to run?
  • Would an algorithm perform a reasonable number of operations once the size of the input grows?
  • If an algorithm is too slow for running on an input of a given size, would optimizing the algorithm, or using a supercomputer help?

[1]: To understand a new algorithm, run it on paper with a small sample input.

[2]: In the previous chapter, we showed that 1 + 2 + ⋯ + n = n(n + 1)/2.

[3]: We say "oh", e.g., "that sorting algorithm is oh-n-squared".

[4]: For the Big-O complexities of most algorithms that do common tasks, see http://code.energy/bigo

[5]: It has been proven that a non-exponential algorithm for any NP-complete problem could be generalized to all NP-complete problems. Since we don’t know if such an algorithm exists, you also get a million dollars if you prove that an NP-complete problem cannot be solved by non-exponential algorithms!


Computer Science Distilled: Learn the Art of Solving Computational Problems by Wladston Viana Ferreira Filho is available on Amazon now.