BlackFlame33

If you lack the strength to master it, freedom becomes a burden. Personal blog: https://blackflame33.cn/

Algorithm Learning Notes-02

Analysis of Best, Worst, Average, and Amortized Time Complexity

Best and Worst Case Time Complexity#

Let's take a code snippet as an example:

// n is the length of the input array
int find(int[] array, int n, int x) {
    int i = 0;
    int pos = -1;
    for (; i < n; ++i) {
        if (array[i] == x) {
            pos = i;
            break;
        }
    }

    return pos;
}

Is the time complexity of this code snippet O(n)? Not necessarily.

Here, the complexity depends on the input, so it has to be discussed case by case. This gives rise to several notions of time complexity:

Best Case Time Complexity: The time complexity of executing this code in the best-case scenario. In this example, if we are lucky and find the number at array[0], the code will stop here. In this case, the time complexity of this code is O(1).

Worst Case Time Complexity: The time complexity of executing this code in the worst-case scenario. In this example, if we are unlucky and find the number at array[n-1], or if the number is not found at all, the code needs to traverse the entire array. In this case, the time complexity of this code is O(n).
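To make the two extremes concrete, here is a small driver of my own (the FindDemo class and its sample values are not part of the original post) showing that the very same find() can cost one comparison or n comparisons depending on the input:

public class FindDemo {
    // Same find() as above: returns the index of x in array, or -1 if absent
    static int find(int[] array, int n, int x) {
        int pos = -1;
        for (int i = 0; i < n; ++i) {
            if (array[i] == x) {
                pos = i;
                break;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] array = {7, 3, 5, 9, 1};
        // Best case: x is at array[0], the loop breaks after one comparison -> O(1)
        System.out.println(find(array, array.length, 7));  // prints 0
        // Worst case: x is absent, the loop scans all n elements -> O(n)
        System.out.println(find(array, array.length, 42)); // prints -1
    }
}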

Average Time Complexity#

The best-case and worst-case time complexities are both extremes, and neither represents how the code behaves on typical inputs. To capture the typical behavior, we introduce the concept of average time complexity.

Going back to the previous example, there are n+1 possible scenarios for the position of the value x: it is at one of the positions [0]~[n-1] (n scenarios), or it is not in the array at all (1 scenario). For each scenario we count the number of elements that must be examined before the search stops, sum these counts, and divide by n+1 to get the average number of elements traversed:

Average number of elements traversed = (1 + 2 + 3 + … + n + n) / (n + 1) = n(n + 3) / (2(n + 1))

According to the big O notation for time complexity, coefficients, constants, and lower-order terms can be ignored. Therefore, after simplification, the average time complexity is O(n).

March 18, uncertain: how was it simplified? I don't understand. When the expression is a fraction, do we just keep the highest-order terms?

March 19, answer: How was it simplified? According to the explanation of a senior on Leetcode, n(n+3) / (2(n+1)) expands to (n^2+3n) / (2n+2). Since constants can be ignored, this simplifies to (n^2+3n) / (2n) = (n+3) / 2, which is linear in n. Therefore, the simplified average time complexity is O(n)!!!
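A quick sanity check (my own addition, not from the Leetcode senior): summing the cost of all n+1 scenarios directly and comparing the average with n/2 confirms that the average grows linearly:

public class AverageCheck {
    public static void main(String[] args) {
        for (int n : new int[]{10, 100, 1000, 10000}) {
            // Finding x at positions [0]..[n-1] costs 1..n comparisons;
            // "not in the array" costs another n comparisons.
            long total = 0;
            for (int k = 1; k <= n; ++k) {
                total += k;
            }
            total += n;
            double average = (double) total / (n + 1); // n(n+3) / (2(n+1))
            System.out.printf("n=%6d  average=%9.2f  n/2=%7.1f%n", n, average, n / 2.0);
        }
    }
}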

However, this conclusion still has a small problem: the n+1 scenarios for the position of x in the array do not all occur with the same probability.

First, is the variable x in the array or not? To simplify, let's assume that the probabilities of being in the array and not being in the array are both 1/2. In addition, the probability of appearing in each position [0]~[n-1] in the array is the same, which is 1/n. Therefore, according to the multiplication rule of probability, the probability of x appearing in any position [0]~[n-1] in the array is 1/(2n).

Taking these probabilities into account, the calculation process of the average complexity becomes:

1 × 1/(2n) + 2 × 1/(2n) + 3 × 1/(2n) + … + n × 1/(2n) + n × 1/2 = (3n + 1) / 4

This value is the weighted average in probability theory, also known as the expected value. Therefore, the full name of the average time complexity should be weighted average time complexity or expected time complexity.

Finally, the weighted average time complexity of this code is still O(n).
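As a cross-check, the expectation can be computed directly under the probabilities assumed above (1/2 for "not in the array", 1/(2n) per position). This sketch of my own (not from the post) reproduces the closed form (3n + 1) / 4:

public class WeightedAverageCheck {
    public static void main(String[] args) {
        int n = 1000;
        // Position k (0-based) costs k+1 comparisons with probability 1/(2n);
        // "not in the array" costs n comparisons with probability 1/2.
        double expected = 0.0;
        for (int k = 0; k < n; ++k) {
            expected += (k + 1) * (1.0 / (2 * n));
        }
        expected += n * 0.5;
        System.out.println(expected);          // 750.25
        System.out.println((3.0 * n + 1) / 4); // closed form (3n + 1) / 4 = 750.25
    }
}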

In fact, in most cases, we do not need to distinguish between the best-case, worst-case, and average-case time complexities. We only use these three complexity notations to differentiate when the time complexity of the same code has a significant difference in magnitude under different scenarios.

Amortized Time Complexity#

Let's take another example:

// array is an array of length n (array.length == n)
int[] array = new int[n];
int count = 0;

void insert(int val) {
    if (count == array.length) { // the array is full
        int sum = 0;
        for (int i = 0; i < array.length; ++i) { // sum all elements: O(n)
            sum = sum + array[i];
        }
        count = 1;      // "clear" the array
        array[0] = sum; // keep only the running sum
    }

    array[count] = val; // normal insertion: O(1)
    ++count;
}

The logic of this code is as follows: the basic job of insert() is to add a value to an integer array. When the array is not full, the value is inserted directly. When the array is full, the elements are first summed with a for loop, the sum is written to array[0], and count is reset to 1 (which "clears" the array); only then is the new value inserted.

Note: here, "clearing" the array simply means setting count back to 1. For writable storage, if the user treats the space as empty, it is empty; if you prefer to define "empty" as overwriting everything with 0 or some other value, that works too. The user only cares about the new values to be stored.
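For concreteness, here is a trace of my own (the InsertDemo wrapper class and its tiny capacity are hypothetical, chosen so the "full" branch triggers quickly):

import java.util.Arrays;

public class InsertDemo {
    static int[] array = new int[3]; // tiny capacity to trigger the full branch
    static int count = 0;

    // Same insert() as above
    static void insert(int val) {
        if (count == array.length) {
            int sum = 0;
            for (int i = 0; i < array.length; ++i) {
                sum = sum + array[i];
            }
            count = 1;
            array[0] = sum;
        }
        array[count] = val;
        ++count;
    }

    public static void main(String[] args) {
        for (int v = 1; v <= 5; ++v) {
            insert(v);
            System.out.println(Arrays.toString(Arrays.copyOf(array, count)));
        }
        // Output: [1], [1, 2], [1, 2, 3], [6, 4], [6, 4, 5]
    }
}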

In this code, the best-case time complexity is when the array is not full and the value is inserted directly. It is O(1).

The worst-case time complexity is when the array is full and we need to traverse and sum the array. It is O(n).

The average time complexity can be analyzed using probability theory. Assuming the length of the array is n, there are n scenarios based on the different insertion positions. Each scenario has a time complexity of O(1). In addition, there is an "extra" scenario when the array is full and we need to insert a value, which has a time complexity of O(n). Moreover, these n+1 scenarios have the same probability, which is 1/(n+1). Therefore, according to the calculation of weighted average, the average time complexity is:

1 × 1/(n+1) + 1 × 1/(n+1) + … + 1 × 1/(n+1) + n × 1/(n+1) = 2n / (n + 1)

March 18, uncertain. Why is it 2n / (n+1)... and how can that be considered O(1)? I really don't understand...

March 19, answer: I finally asked a senior on Leetcode, and the senior said: "In 2n/(n+1), the numerator and the denominator are both of order n, so it can be treated as 2n/n = 2, which is a constant, so the complexity is O(1)." After reading this, I suddenly got it. What an enlightening answer!
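The senior's point is easy to verify numerically; this tiny check (mine, not the senior's) shows 2n/(n+1) flattening out at the constant 2:

public class LimitCheck {
    public static void main(String[] args) {
        // 2n/(n+1) approaches 2 as n grows, so the expression is O(1)
        for (int n : new int[]{1, 10, 100, 10000}) {
            System.out.printf("n=%6d  2n/(n+1)=%.4f%n", n, 2.0 * n / (n + 1));
        }
    }
}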

In fact, we don't need to analyze the average complexity of this example in such a complex way. We can compare the find() and insert() functions to see the difference.

First, the find() function has a time complexity of O(1) only in extreme cases. However, the insert() function has a time complexity of O(1) in most cases, and only has a higher complexity of O(n) in a few cases. This is the first difference between insert() and find().

Second, for the insert() function, the O(1) insertions and the O(n) insertion occur in a predictable pattern with a sequential relationship. Generally, after n-1 O(1) insertions (inserting data when the array is not full), there is always one O(n) insertion (traversing and summing the array, and clearing it), and this pattern repeats.

Therefore, for this special scenario of complexity analysis, we introduce a simpler analysis method: amortized analysis, which allows us to amortize the higher time complexity of some operations over the lower time complexity of other operations. In addition, in scenarios where amortized analysis can be applied, the amortized time complexity is generally equal to the best-case time complexity.

Amortized time complexity is a special case of average time complexity.
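To see the amortization concretely, here is a cost-counting sketch of my own (the "work" counter and unit costs are an assumed accounting scheme, not part of the original code): over each cycle of insertions, the one expensive O(n) operation is spread across the cheap ones, and the measured cost settles at about 2 units per operation:

public class AmortizedDemo {
    static int n = 1000;
    static int[] array = new int[n];
    static int count = 0;
    static long work = 0; // unit-cost counter

    static void insert(int val) {
        if (count == array.length) {
            int sum = 0;
            for (int i = 0; i < array.length; ++i) {
                sum = sum + array[i];
                ++work; // one unit per element summed
            }
            count = 1;
            array[0] = sum;
            ++work; // one unit for writing the sum back
        }
        array[count] = val;
        ++count;
        ++work; // one unit for the plain insertion
    }

    public static void main(String[] args) {
        int ops = 100 * n;
        for (int v = 0; v < ops; ++v) {
            insert(v);
        }
        // Roughly 2 units per operation: amortized O(1)
        System.out.printf("total work = %d, per operation = %.2f%n",
                work, (double) work / ops);
    }
}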

Summary#

These are several concepts related to complexity analysis: best-case time complexity, worst-case time complexity, average time complexity, and amortized time complexity. These concepts are introduced because the time complexity of a code snippet may vary under different inputs. After introducing these concepts, we can more comprehensively represent the execution efficiency of a code snippet.

After-class Exercise#

// Global variables: an array named array with initial size 10, its length len, and the write index i.
int[] array = new int[10]; 
int len = 10;
int i = 0;

// Add an element to the array
void add(int element) {
    if (i >= len) { // Array space is not enough
        // Reallocate an array space with twice the size
        int[] new_array = new int[len * 2];
        // Copy the data from the original array to the new_array one by one
        for (int j = 0; j < len; ++j) {
            new_array[j] = array[j];
        }
        // Assign new_array to array, and the size of array is now twice len
        array = new_array;
        len = 2 * len;
    }
    // Put the element at index i, and increment i by one
    array[i] = element;
    ++i;
}

The logic of this code is as follows: Add an element to the array. When the array is not full, the value is inserted directly. When the array is full, a new array with twice the size is allocated, and the data in the original array is copied to the new array using a for loop. Finally, the new array is assigned to the original array, replacing it.

In this code, the best-case time complexity is when the array is not full and the value is inserted directly. It is O(1).

The worst-case time complexity is when the array is full and we need to traverse and copy the array. It is O(n).

The average time complexity can again be analyzed from the insertion position: there are n scenarios in which the array is not full, each with time complexity O(1), plus one scenario in which the array is full, with time complexity O(n). Assuming each of the n+1 scenarios has the same probability 1/(n+1), the weighted average gives:

1 × 1/(n+1) + 1 × 1/(n+1) + … + 1 × 1/(n+1) + n × 1/(n+1) = 2n / (n + 1) = O(1)

The amortized time complexity is O(1): after every n O(1) operations there is one O(n) operation, and because these operations occur in a fixed sequential pattern, the O(n) cost can be spread evenly over the preceding O(1) operations.
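The same conclusion can be checked by counting copies, as in this sketch of mine (the "copies" counter is an assumed accounting device, not part of the exercise): with capacity doubling, a million add() calls trigger far fewer than two copies per insertion on average:

public class GrowthDemo {
    static int[] array = new int[10];
    static int len = 10;
    static int i = 0;
    static long copies = 0; // element copies performed by resizes

    // Same add() as above, instrumented to count copies
    static void add(int element) {
        if (i >= len) {
            int[] new_array = new int[len * 2];
            for (int j = 0; j < len; ++j) {
                new_array[j] = array[j];
                ++copies;
            }
            array = new_array;
            len = 2 * len;
        }
        array[i] = element;
        ++i;
    }

    public static void main(String[] args) {
        int m = 1_000_000;
        for (int v = 0; v < m; ++v) {
            add(v);
        }
        // Total copies stay below 2m in a doubling scheme, so add() is amortized O(1)
        System.out.printf("inserts=%d, copies=%d, copies per insert=%.2f%n",
                m, copies, (double) copies / m);
    }
}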

Application Scenarios of Average Time Complexity and Amortized Time Complexity#

When performing a series of continuous operations on a data structure, the time complexity is generally low in most cases, and only a few cases have higher time complexity. Moreover, these operations have a sequential relationship. In this case, we can analyze this group of operations together to see if the higher time complexity of some operations can be amortized over the lower time complexity of other operations. In addition, in scenarios where amortized analysis can be applied, the amortized time complexity is generally equal to the best-case time complexity.
