
In almost every computation, a variety of arrangements for the processes is possible. It is essential to choose that arrangement which shall tend to minimize the time necessary for the calculation.

—Ada Lovelace

How much time does it take to sort 26 shuffled cards? If instead, you had 52 cards, would it take twice as long? How much longer would it take for a thousand decks of cards? The answer is intrinsic to the **method** used to sort the cards.

A method is a list of unambiguous instructions for achieving a goal. A method that always requires a finite series of operations is called an **algorithm**. For instance, a card-sorting algorithm is a method that will always specify some operations to sort a deck of 26 cards per suit and per rank.

Fewer operations need less computing power. We like fast solutions, so we monitor the number of operations in our algorithms. Many algorithms require a fast-growing number of operations when the input grows in size. For example, our card-sorting algorithm could take a few operations to sort 26 cards, but four times as many operations to sort 52 cards!

To avoid bad surprises when our problem size grows, we find the algorithm’s **time complexity**. In this chapter, you’ll learn to:

- Count and interpret time complexities
- Express their growth with fancy **Big-O**'s
- Run away from **exponential** algorithms
- Make sure you have enough computer **memory**.

But first, how do we define time complexity?

Time complexity is written $T(n)$. It gives the number of operations the algorithm performs when processing an input of size $n$. We also refer to an algorithm's $T(n)$ as its **running cost**. If our card-sorting algorithm follows $T(n)={n}^{2}$, we can predict how much longer it takes to sort a deck once we double its size: $\frac{T(2n)}{T(n)}=4$.

### Hope for the best, prepare for the worst

Isn't it faster to sort a pile of cards that's almost sorted already?

Input size isn't the only characteristic that impacts the number of operations required by an algorithm. When an algorithm can have different values of $T(n)$ for the same value of $n$, we resort to cases:

- **Best Case:** when the input requires the minimum number of operations for any input of that size. In sorting, it happens when the input is already sorted.
- **Worst Case:** when the input requires the maximum number of operations for any input of that size. In many sorting algorithms, that’s when the input was given in reverse order.
- **Average Case:** refers to the average number of operations required for typical inputs of that size. For sorting, an input in random order is usually considered.

In general, the worst case is the most important one. It gives you a guaranteed baseline you can always count on. When nothing is said about the scenario, the worst case is assumed. Next, we’ll see how to analyze a worst-case scenario, hands on.

### 2.1 Counting Time

We find the time complexity of an algorithm by counting the number of basic operations it requires for a hypothetical input of size $n$. We'll demonstrate it with **Selection Sort**, a sorting algorithm that uses a nested loop. An outer `for` loop updates the current position being sorted, and an inner `for` loop selects the item that goes in the current position^{1}:

```
function selection_sort(list)
    for current ← 1 … list.length - 1
        smallest ← current
        for i ← current + 1 … list.length
            if list[i] < list[smallest]
                smallest ← i
        list.swap_items(current, smallest)
```
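For readers who want to run it, here is a minimal Python sketch of the same procedure. It is a translation under a few assumptions rather than the book's own code: positions are 0-based instead of 1-based, the function sorts the list in place and also returns it, and the name `items` is chosen only for illustration.

```python
def selection_sort(items):
    """Sort a list in place with Selection Sort and return it."""
    n = len(items)
    # Outer loop: the position currently being filled (0-based, 0 .. n-2).
    for current in range(n - 1):
        smallest = current
        # Inner loop: find the smallest item in the unsorted tail.
        for i in range(current + 1, n):
            if items[i] < items[smallest]:
                smallest = i
        # Move the smallest remaining item into the current position.
        items[current], items[smallest] = items[smallest], items[current]
    return items


print(selection_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```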

Let's see what happens with a list of $n$ items, assuming the worst case. The outer loop runs $n-1$ times and does two operations per run (one assignment and one swap), totaling $2n-2$ operations. The inner loop first runs $n-1$ times, then $n-2$ times, $n-3$ times, and so on. We know how to sum these types of sequences^{2}:

$$\begin{aligned}
\text{number of inner loop runs} &= \overbrace{(n-1) + (n-2) + \cdots + 2 + 1}^{n-1\text{ total runs of the outer loop}} \\
&= \sum_{i=1}^{n-1} i = \frac{(n-1)\,n}{2} = \frac{n^{2}-n}{2}.
\end{aligned}$$

In the worst case, the if condition is always met. This means the inner loop does one comparison and one assignment $({n}^{2}-n)/2$ times, hence ${n}^{2}-n$ operations. In total, the algorithm costs $2n-2$ operations for the outer loop, plus ${n}^{2}-n$ operations for the inner loop. We thus get the time complexity:

$$T(n)={n}^{2}+n-2.$$
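The count can be checked with a small instrumented version of the sort. This is only a sketch: the function names `count_operations` and `worst_case_bound` are illustrative, and the counting convention (one assignment and one swap per outer run, one comparison per inner run, one extra assignment when the comparison succeeds) is the one used in the derivation above. Since $T(n)$ assumes the comparison succeeds every time, it acts as an upper bound; real inputs, including a reverse-ordered list, can stay below it.

```python
def count_operations(items):
    """Run Selection Sort on a copy of items and count basic operations."""
    items = list(items)
    n = len(items)
    operations = 0
    for current in range(n - 1):
        smallest = current
        operations += 1              # assignment: smallest <- current
        for i in range(current + 1, n):
            operations += 1          # comparison
            if items[i] < items[smallest]:
                smallest = i
                operations += 1      # assignment: smallest <- i
        items[current], items[smallest] = items[smallest], items[current]
        operations += 1              # swap
    return operations


def worst_case_bound(n):
    """T(n) = n^2 + n - 2, the worst-case count derived in the text."""
    return n * n + n - 2


n = 8
already_sorted = list(range(n))
reverse_order = list(range(n, 0, -1))
print("sorted input: ", count_operations(already_sorted))
print("reverse input:", count_operations(reverse_order))
print("T(n) bound:   ", worst_case_bound(n))
```

An already sorted input never triggers the extra assignment, so it needs the fewest operations, while the reverse-ordered input lands between that minimum and the $T(n)$ bound.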

Now what? If our list size was $n=8$ and we double it, the sorting time will be multiplied by:

$$\frac{T(16)}{T(8)}=\frac{{16}^{2}+16-2}{{8}^{2}+8-2}\approx 3.86.$$

If we double it again, we will multiply time by $3.90$. Double it over and over and find $3.94$, $3.97$, $3.98$. Notice how this gets closer and closer to 4? This means it would take about four times as long to sort two million items as to sort one million items.
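These ratios are easy to reproduce. The short Python sketch below evaluates $T(2n)/T(n)$ for successive doublings starting at $n=8$; the helper name `doubling_ratios` is just an illustrative choice.

```python
def T(n):
    """Worst-case operation count of Selection Sort, as derived in the text."""
    return n**2 + n - 2


def doubling_ratios(start, count):
    """Return T(2n)/T(n) for n = start, 2*start, 4*start, ... (count values)."""
    ratios = []
    n = start
    for _ in range(count):
        ratios.append(T(2 * n) / T(n))
        n *= 2
    return ratios


for ratio in doubling_ratios(8, 5):
    print(f"{ratio:.2f}")  # 3.86, 3.90, 3.94, 3.97, 3.98
```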

#### 2.1.1 Understanding Growth

Say the input size of an algorithm is very large, and we increase it even more. To predict how the execution time will grow, we don't need to know all terms of $T(n)$. We can approximate $T(n)$ by its fastest-growing term, called the **dominant term**.

The Index Card Problem: Yesterday, you knocked over one box of index cards. It took you two hours of Selection Sort to fix it. Today, you spilled ten boxes. How much time will you need to arrange the cards back in?

We've seen Selection Sort follows $T(n)={n}^{2}+n-2$. The fastest-growing term is ${n}^{2}$, therefore we can write $T(n)\approx {n}^{2}$. Assuming there are $n$ cards per box, we find:

$$\frac{T(10n)}{T(n)}\approx \frac{{(10n)}^{2}}{{n}^{2}}=100.$$

It will take you approximately $(100\times 2)\text{hours}=200$ hours! What if we had used a different sorting method? For example, there’s one called "Bubble Sort" whose time complexity is $T(n)=0.5{n}^{2}+0.5n$. The fastest-growing term then gives $T(n)\approx 0.5{n}^{2}$, hence:

$$\frac{T(10n)}{T(n)}\approx \frac{0.5\times {(10n)}^{2}}{0.5\times {n}^{2}}=100.$$

The $0.5$ coefficient cancels itself out! The idea that ${n}^{2}+n-2$ and $0.5{n}^{2}+0.5n$ both grow like ${n}^{2}$ isn't easy to get. How does the fastest-growing term of a function ignore all other numbers and dominate growth? Let’s try to visually understand this.

In Figure 2.2, the two time complexities we've seen are compared to ${n}^{2}$ at different zoom levels. As we plot them for larger and larger values of $n$, their curves seem to get closer and closer. Actually, you can plug any numbers into the bullets of $T(n)=\mathrm{\bullet}{n}^{2}+\mathrm{\bullet}n+\mathrm{\bullet}$, and it will still grow like ${n}^{2}$.

Remember, this effect of curves getting closer works if the fastest-growing term is the same. The plot of a function with a linear growth ($n$) never gets closer and closer to one with a quadratic growth (${n}^{2}$), which in turn never gets closer and closer to one having a cubic growth (${n}^{3}$).
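The same convergence can be seen numerically. The sketch below takes the two cost functions quoted in the text, divides each by its dominant term, and also computes the growth factor for a tenfold larger input; the function names are illustrative.

```python
def selection_sort_cost(n):
    return n**2 + n - 2            # T(n) for Selection Sort, from the text


def bubble_sort_cost(n):
    return 0.5 * n**2 + 0.5 * n    # T(n) for Bubble Sort, from the text


for n in (10, 1_000, 100_000):
    print(
        n,
        selection_sort_cost(n) / n**2,                           # tends to 1
        bubble_sort_cost(n) / (0.5 * n**2),                      # tends to 1
        selection_sort_cost(10 * n) / selection_sort_cost(n),    # tends to 100
        bubble_sort_cost(10 * n) / bubble_sort_cost(n),          # tends to 100
    )
```

As $n$ grows, the lower-order terms contribute less and less, which is why both ratios in the last two columns settle near 100.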

That’s why with very big inputs, algorithms with a quadratically growing cost perform a lot worse than algorithms with a linear cost. However, they perform a lot better than those with a cubic cost. If you’ve understood this, the next section will be easy: we will just learn the fancy notation coders use to express this.

### 2.2 The Big-O Notation

There's a special notation to refer to classes of growth: the Big-O notation. A function with a fastest-growing term of ${2}^{n}$ *or weaker* is $O({2}^{n})$; one with a quadratic *or weaker* growth is $O({n}^{2})$; growing linearly *or less*, $O(n)$, and so on. The notation is used for expressing the dominant term of algorithms' cost functions in the worst case—that's the standard way of expressing time complexity^{3}.

Both Selection Sort and Bubble Sort are $O({n}^{2})$, but we'll soon discover $O(n\log n)$ algorithms that do the same job. With our $O({n}^{2})$ algorithms, 10$\times $ the input size resulted in 100$\times $ the running cost. Using an $O(n\log n)$ algorithm, 10$\times $ the input size results in only slightly more than 10$\times $ the running cost: $\frac{10n\log (10n)}{n\log n}=10\left(1+\frac{\log 10}{\log n}\right)$, which is about $11.7\times $ when $n$ is a million.

When $n$ is a million, ${n}^{2}$ is a trillion, whereas $n\log n$ is only about twenty million (taking the logarithm in base 2). A quadratic algorithm that runs for years on a large input could be replaced by an $O(n\log n)$ algorithm that finishes in minutes. That’s why you need time complexity analysis when you design systems that handle very large inputs.
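To make the gap concrete, the following sketch evaluates both growth functions for $n$ equal to one million. The base of the logarithm is an assumption here (base 2 is used); a different base changes the figure by a constant factor only, and either way the result is tiny next to a trillion.

```python
import math

n = 1_000_000
quadratic = n**2
# Base 2 is an assumption; the text does not fix the base of the logarithm.
linearithmic = n * math.log2(n)

print(f"n^2      = {quadratic:,}")
print(f"n log2 n = {linearithmic:,.0f}")
print(f"n^2 is roughly {quadratic / linearithmic:,.0f} times larger")
```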

When designing a computational system, it’s important to anticipate the most frequent operations. Then you can compare the Big-O costs of different algorithms that do these operations^{4}. Also, most algorithms only work with specific input structures. If you choose your algorithms in advance, you can structure your input data accordingly.

Some algorithms always run for a constant duration regardless of input size: they're $O(1)$. For example, checking if a number is odd or even: we see if its last digit is odd and *boom*, problem solved. No matter how big the number. We'll see more $O(1)$ algorithms in the next chapters. They're amazing, but first let's see which algorithms are *not* amazing.
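The last-digit check can be sketched in a few lines of Python. The example assumes the number arrives as a string of decimal digits (a representation chosen here for illustration), so that only one character needs to be inspected however long the number is.

```python
def is_odd(decimal_digits):
    """Return True if the number written as a decimal string is odd.

    Only the last character is inspected, so the work done does not
    depend on how many digits the number has.
    """
    return decimal_digits[-1] in "13579"


print(is_odd("7"))                                # True
print(is_odd("123456789012345678901234567890"))   # False
```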

### 2.3 Exponentials

We say $O({2}^{n})$ algorithms are exponential time. From the graph of growth orders (Figure 2.3), it doesn't seem the quadratic ${n}^{2}$ and the exponential ${2}^{n}$ are much different. Zooming out the graph, it's obvious the exponential growth brutally dominates the quadratic one:

Exponential time grows so much, we consider these algorithms “not runnable”. They run for very few input types, and require huge amounts of computing power if inputs aren’t tiny. Optimizing every aspect of the code or using supercomputers doesn’t help. The crushing exponential always dominates growth and keeps these algorithms unviable.

To illustrate the explosiveness of exponential growth, let's zoom out the graph even more and change the numbers (Figure 2.5). The exponential was reduced in power (from $2$ to $1.5$) and had its growth divided by a thousand. The polynomial had its exponent increased (from $2$ to $3$) and its growth multiplied by a thousand.
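Even with these handicaps, the exponential overtakes the polynomial eventually. The sketch below searches for the point where that happens, assuming the two curves are $1.5^{n}/1000$ and $1000\,n^{3}$; these exact forms are read off the description above and are an assumption, as are the function names.

```python
def exponential(n):
    return 1.5**n / 1000      # assumed form of the weakened exponential


def polynomial(n):
    return 1000 * n**3        # assumed form of the boosted polynomial


def first_crossover(limit=1000):
    """Smallest n >= 1 where the exponential exceeds the polynomial, or None."""
    for n in range(1, limit + 1):
        if exponential(n) > polynomial(n):
            return n
    return None


n = first_crossover()
print("exponential overtakes the polynomial at n =", n)
print("at twice that size it is ahead by a factor of",
      exponential(2 * n) / polynomial(2 * n))
```

Under these assumptions the crossover happens well before $n=100$, and beyond it the gap widens very quickly.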

Some algorithms are even worse than exponential time algorithms. It's the case of factorial time algorithms, whose time complexities are $O(n!)$. Exponential and factorial time algorithms are horrible, but we need them for the hardest computational problems: the famous NP-complete problems. We will see important examples of NP-complete problems in the next chapter. For now, remember this: the first person to find a non-exponential algorithm for an NP-complete problem gets a million dollars^{5} from the Clay Mathematics Institute.

It’s important to recognize the class of problem you’re dealing with. If it’s known to be NP-complete, trying to find an optimal solution is fighting the impossible. Unless you’re shooting for that million dollars.

### 2.4 Counting Memory

Even if we could perform operations infinitely fast, there would still be a limit to our computing power. During execution, algorithms need working storage to keep track of their ongoing calculations. This consumes **computer memory**, which is not infinite.

The measure for the working storage an algorithm needs is called **space complexity**. Space complexity analysis is similar to time complexity analysis. The difference is that we count computer memory, and not computing operations. We observe how space complexity evolves when the algorithm’s input size grows, just as we do for time complexity.

For example, Selection Sort just needs working storage for a fixed set of variables. The number of variables does not depend on the input size. Therefore, we say Selection Sort's space complexity is $O(1)$: no matter what the input size, it requires the same amount of computer memory for working storage.

However, many other algorithms need working storage that grows with input size. Sometimes, it's impossible to meet an algorithm’s memory requirements. Many fast $O(n\log n)$ sorting algorithms need more than $O(1)$ working storage, so computer memory limitations sometimes force a tradeoff. With low memory, you may have to settle for an algorithm with slow $O({n}^{2})$ time complexity because it has $O(1)$ space complexity.

### Conclusion

In this chapter, we learned algorithms can have different types of voracity for consuming computing time and computer memory. We’ve seen how to assess it with time and space complexity analysis. We learned to calculate time complexity by finding the *exact* $T(n)$ function, the number of operations performed by an algorithm.

We've seen how to express time complexity using the Big-O notation ($O$). Throughout this book, we'll perform simple time complexity analysis of algorithms using this notation. Many times, calculating $T(n)$ is not necessary for inferring the Big-O complexity of an algorithm.

We’ve seen the cost of running exponential algorithms explode in a way that makes these algorithms not runnable for big inputs. And we learned how to answer these questions:

- Given different algorithms, do they have a significant difference in terms of operations required to run?
- Multiplying the input size by a constant, what happens with the time an algorithm takes to run?
- Would an algorithm perform a reasonable number of operations once the size of the input grows?
- If an algorithm is too slow for running on an input of a given size, would optimizing the algorithm, or using a supercomputer help?

^{1}: To understand a new algorithm, run it on paper with a small sample input.

^{2}: In the previous chapter, we showed ${\sum}_{i=1}^{n}i=n(n+1)/2$.

^{3}: We say *‘oh’*, e.g., “that sorting algorithm is *oh-n-squared*”.

^{4}: For the Big-O complexities of most algorithms that do common tasks, see http://code.energy/bigo

^{5}: It has been proven a non-exponential algorithm for *any* NP-complete problem could be generalized to *all* NP-complete problems. Since we don’t know if such an algorithm exists, you also get a million dollars if you prove an NP-complete problem cannot be solved by non-exponential algorithms!

*Computer Science Distilled: Learn the Art of Solving Computational Problems* by Wladston Viana Ferreira Filho is available on Amazon now.

Nice explanation of time (and memory) complexity analysis. Small nitpick (emphasis mine):

This isn’t true because it depends on the sorting algorithm. For example, a quicksort implementation that picks either the first element or the last element as the pivot would exhibit worst-case behavior on sorted input, whereas bubblesort would show best-case behavior. As another example, heapsort has the same best, worst, and average case complexity.

Hi Agop! I’m the book’s author. Thank you for your comment :) Yes, your observation is totally correct. I wrote it like that to make things easier for the reader. Because considering all known comparison sort algorithms, it’s only possible to sort the cards with an O(n) cost in the case the input is given in sort order. So what I’m saying is not super precise, but not entirely incorrect as well :)

Fair enough! Thanks for the reply :)

After reading these comments, I remember this: “All models are wrong, but some are useful.” – George Box

– from https://betterexplained.com/articles/math-cartoonist/

Hi Wladston, this was a great chapter.

Although I looked everywhere, I couldn’t find an index of contents of your book.

If it’s no bother to you, can you post an index here? I am still deliberating on whether to buy your book or not, and since it’s crazy expensive in India, I would really like to know what I am getting into.

Thanks in advance

Hi Silver,

Thanks for the kind words! I offer special conditions for readers in developing countries, including India. Please, email me at [email protected], and I’ll send you another sample chapter, and we can find an affordable way to get a book delivered to you.