BlackFlame33

Freedom is a burden to those who lack the strength to master it. Personal blog: https://blackflame33.cn/

Algorithm Learning Notes-01

Analysis of Complexity

1. Why do we need complexity analysis?#

Usually, we evaluate performance by running the code and collecting data through statistics and monitoring. This common approach is sometimes called "post-hoc analysis".

However, although post-hoc analysis is widely used, it has some flaws:

  1. Test results are highly dependent on the test environment.
    • The same code segment may produce different results on different machines and under different compilation environments.
  2. Test results are greatly affected by the test scale.
    • The same code segment may behave very differently on large inputs than on small ones.

So, is there a method that allows us to ignore the test environment and accurately analyze the performance of the code at any scale?

Of course there is: use "Big O notation" to analyze the code's "time complexity" and "space complexity".

2. Big O Notation#

Big O notation: The time complexity of an algorithm is usually expressed using the Big O symbol, defined as $T(n) = O\big(f(n)\big)$, which says that the function $T(n)$ is bounded above by $f(n)$. If the scale of a problem is $n$, the time an algorithm requires to solve it is $T(n)$, called the "time complexity" of the algorithm. As the input size $n$ grows, the limiting behavior of the time complexity is called the "asymptotic time complexity" of the algorithm. - Baidu Baike

Time Complexity#

Time complexity: Also known as "asymptotic time complexity", it represents the growth relationship between the execution time of an algorithm and the data scale.

We use a formula to represent it:

$$T(n) = O\big(f(n)\big)$$

| Symbol | Meaning |
| ------ | ------- |
| $T(n)$ | Execution time of the code |
| $n$ | Scale of the data |
| $f(n)$ | Total number of executions of all lines of code |
| $O$ | $T(n)$ is directly proportional to $f(n)$ |

To better understand this formula, let's take an example:

#include <stdio.h>
int main(){
    int n = 100;  /* problem scale; fixed here so the example compiles */
    int c = 0;
    for(int i=1; i<=n; ++i)
        c += i;

    printf("%d\n", c);
    return 0;
}

From the perspective of the CPU, each line of code in this code segment is performing similar operations: read data - perform calculations - write data. Although the number of executions and the execution time of each line of code are different, we are only making a rough estimate here, so we can assume that the execution time of each line of code is the same, which is unit_time. Based on this assumption, what is the total execution time of this code segment?

The line `int c = 0;` takes 1 unit_time. The loop header and the statement `c += i;` each execute n times, so together they take 2n * unit_time. Therefore, the total execution time of this code segment is (2n+1) * unit_time. As you can see, the total execution time T(n) of the code is directly proportional to the total number of executions of its lines.
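Dropping the constant coefficient and the constant term, neither of which affects the growth trend, gives exactly the Big O form from above:

$$T(n) = (2n+1)\cdot \text{unit\_time} = O(n)$$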

Methods for Analyzing Time Complexity#

1. Only focus on the code segment with the most loops#

Here is an important corollary: since lower-order terms, coefficients, and constants do not affect the growth trend, we can take the highest-order term as the representative.
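For example, suppose counting every line of some code gives $T(n) = 3n^2 + 5n + 2$. Dropping the coefficient 3, the lower-order term $5n$, and the constant 2 leaves only the highest-order term:

$$T(n) = 3n^2 + 5n + 2 = O(n^2)$$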

2. Addition rule#

The total complexity is equal to the complexity of the code segment with the highest order of magnitude

Let's take an example:

#include <stdio.h>
int main(){
    int n = 100;  /* problem scale; fixed here so the example compiles */

    int x=0;
    for(int i=1; i<=100; ++i)
        x += i;

    int y=0;
    for(int i=1; i<=n; ++i)
        y += i;

    int z=0;
    for(int i=1; i<=n; ++i){
        for(int j=1; j<=n; ++j)
            z += j;
    }

    printf("%d %d %d\n", x, y, z);
    return 0;
}

In this example, we need to calculate x, y, and z. We calculate the time complexity of each part separately, and then take the highest order as the time complexity of the entire code segment.

For the calculation of x, the loop runs a constant 100 times regardless of n, so its cost is constant and can be ignored.

For the calculation of y, it loops n times. Let's denote it as $O(n)$.

For the calculation of z, it loops $n^2$ times. Let's denote it as $O(n^2)$.

We take the highest order $O(n^2)$ as the time complexity of the entire code segment.

Therefore, the addition formula can be expressed as:

$$\because T_1(n)=O\big(f(n)\big),\quad T_2(n)=O\big(g(n)\big)$$

$$\therefore T_{total}(n)=T_1(n)+T_2(n)=O\Big(\max\big(f(n),g(n)\big)\Big)$$
3. Multiplication rule#

The complexity of nested code is equal to the product of the complexities of the outer and inner code

Let's take an example:

#include <stdio.h>
int f(int n);
int main(){
    int n = 100;  /* problem scale; fixed here so the example compiles */
    int x = 0;
    for(int i=0; i<=n; ++i)
        x += f(i);

    printf("%d\n", x);
    return 0;
}

int f(int n){
    int y = 0;
    for(int i=0; i<=n; ++i)
        y += i;

    return y;
}

In this example, if we treat f() as an ordinary constant-time operation, then the loop in main has time complexity $T_1(n)=O(n)$. However, f() is itself a function containing a loop, with time complexity $T_2(n)=O(n)$. Therefore, the time complexity of the whole code segment is $T_{total}(n)=T_1(n) * T_2(n)=O(n*n)=O(n^2)$.

Analysis of Several Common Time Complexities#

(Figure: common complexity levels)

These complexities can be divided into two categories: polynomial and non-polynomial. Among the common ones, only two are non-polynomial: $O(2^n)$ and $O(n!)$.

Problems that can only be solved in non-polynomial time are often loosely called NP (non-deterministic polynomial) problems. As the data scale n grows, the execution time of a non-polynomial-complexity algorithm increases sharply, so such algorithms are very inefficient in practice.

1. $O(1)$ - Constant time complexity#
  • The '1' here is just a stand-in for any constant, not literally one operation.

As long as the execution time of the code does not grow with n (in general, as long as the algorithm contains no loops or recursion), the time complexity is $O(1)$, even if there are thousands of lines of code.
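A minimal sketch: no matter how large n is, this snippet performs a fixed number of operations, so it runs in $O(1)$ time. The value of n here is an arbitrary assumption for illustration.

#include <stdio.h>
int main(){
    int n = 1000000;          /* arbitrary scale; the work below does not depend on it */
    int a = 1, b = 2;
    int sum = a + b + n % 10; /* a fixed number of operations, regardless of n */
    printf("%d\n", sum);
    return 0;
}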

2. $O(\log n)$, $O(n\log n)$ - Logarithmic time complexity#
  • All logarithmic time complexities can be written as $O(\log n)$ regardless of the base, and the same applies to $O(n\log n)$.

By the change-of-base formula, $\log_a n = \log_a b \cdot \log_b n$, so any logarithm can be written as $C \cdot \log_b n$ for some constant $C$. According to the conclusion we obtained earlier, constants do not affect the growth trend and can be ignored. Therefore, in logarithmic time complexities we ignore the base of the logarithm and write them uniformly as $O(\log n)$.
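A minimal sketch of a typical $O(\log n)$ loop, with the scale n assumed for illustration: i doubles on every iteration, so the loop body executes about $\log_2 n$ times.

#include <stdio.h>
int main(){
    int n = 100;  /* problem scale; assumed for the example */
    int count = 0;
    for(int i = 1; i <= n; i *= 2)  /* i: 1, 2, 4, 8, ... */
        ++count;
    printf("%d\n", count);          /* roughly log2(n) iterations */
    return 0;
}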

3. $O(m+n)$, $O(m*n)$ - Time complexity of two data scales#

In general, when the complexity of a code segment is determined by two independent data scales, we cannot use the addition rule to drop one of them, because we cannot know in advance which of m and n is larger. Therefore, we write it as $O(m+n)$ or $O(m*n)$.

In this case, the addition rule should be:

$$\because T_1(m)=O\big(f(m)\big),\quad T_2(n)=O\big(g(n)\big)$$

$$\therefore T_{total}(m,n)=T_1(m)+T_2(n)=O\big(f(m)+g(n)\big)$$
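A minimal sketch, with m and n assumed as two independent input scales: the two loops run m and n times respectively, and neither term can be dropped.

#include <stdio.h>
int main(){
    int m = 100, n = 200;  /* two independent scales; assumed for the example */
    int sum1 = 0;
    for(int i = 1; i <= m; ++i)  /* O(m) */
        sum1 += i;

    int sum2 = 0;
    for(int j = 1; j <= n; ++j)  /* O(n) */
        sum2 += j;

    printf("%d\n", sum1 + sum2); /* total time: O(m + n) */
    return 0;
}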

The multiplication rule remains the same.

Space Complexity Analysis#

Space complexity, also known as asymptotic space complexity, represents the growth relationship between an algorithm's storage usage and the data scale.

Let's take an example to see.

#include <stdio.h>
int main(){
    int n = 100;  /* problem scale; fixed here so the example compiles */
    int a[n];
    for(int i=0; i<n; ++i)
        a[i] = i * i;
    for(int i=n-1; i>=0; --i)
        printf("%d", a[i]);

    return 0;
}

Analyzing this code: the declaration `int a[n]` allocates an int array of size n, and apart from that the remaining code occupies only constant extra space, so the space complexity of the whole code segment is $O(n)$.
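For contrast, a minimal sketch that prints each square as it is computed, without storing them: it uses only a fixed number of variables, so its space complexity is $O(1)$. The value of n is again assumed for illustration.

#include <stdio.h>
int main(){
    int n = 100;  /* problem scale; assumed for the example */
    for(int i=0; i<n; ++i)
        printf("%d", i * i);  /* no array needed: O(1) extra space */

    return 0;
}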

3. Summary#

Complexity, also known as asymptotic complexity, includes time complexity and space complexity, and is used to analyze the growth relationship between an algorithm's execution efficiency and the data scale. Roughly speaking, the higher the order of complexity, the lower the execution efficiency. The common complexities are not many; from low to high they are $O(1)$, $O(\log n)$, $O(n)$, $O(n\log n)$, and $O(n^2)$.

(Figure: how T(n) grows with n for the common complexities)

Reference Articles#

Basic Usage of MathJax

Data Structures and Algorithms - Wang Zheng
