Optimization of MySQL

Optimization of MySQL is generally focused on query operations. First, we need to understand the operations involved in an SQL query.

iShot_2023-02-28_08.58.53

1. Connection Configuration Optimization#

1.1 Server-Side Optimization#

Modify the default maximum number of connections on the server (usually 151) and the default client timeout period (usually 8 hours). However, it is not necessary to modify them if not required. The following are the ways to query their default values:

iShot_2023-02-28_09.11.29

iShot_2023-02-28_09.13.11

1.2 Client-Side Optimization#

For client-side optimization, we do not want clients to frequently create and destroy connections. In this case, we need to use a database connection pool to reuse connections and reduce the overhead of database resources. The most popular database connection pools currently are Druid and HikariCP. ~~DBCP and C3P0 have...~~

Of course, we need to pay attention that the size of the connection pool is not the bigger the better. Because each connection requires a thread to maintain, when the number of threads exceeds the number of CPUs, frequent thread context switching is required, which leads to increased performance overhead.

2. Architecture Optimization#

2.1 Use Caching#

Add caching between queries and the database to store data that does not change frequently.

2.2 Read-Write Separation#

This method is mainly used to reduce the load on the server. Use a server cluster, with one server as the master node and the others as slave nodes. When the data on the master server changes, the other servers need to synchronize. The master node is only responsible for insert, update, and delete operations, while the slave nodes are responsible for queries. This method effectively reduces the server load.

2.3 Sharding#

2.3.1 Vertical Sharding#

iShot_2023-02-28_14.08.52

Based on a monolithic database, divide it vertically according to business logic and distribute different tables to different databases, which are located on different servers. This is called vertical sharding.

iShot_2023-02-28_14.12.03

2.3.2 Vertical Partitioning#

iShot_2023-02-28_14.03.01

Vertical partitioning means dividing a single table into multiple smaller tables by splitting one or more fields of the table. This operation needs to be judged based on specific business requirements. Usually, frequently used fields (hot fields) are placed in one table, while less frequently used or not immediately used fields (cold fields) are placed in another table. This not only solves the problem of large data volume in a single table but also improves query speed.

iShot_2023-02-28_14.05.19

Taking the above image as an example: Usually, the details of a product are relatively long, and when viewing the product list, it is often not necessary to immediately display the product details (usually displayed when clicking the details button), but to display more important information about the product (such as price). Based on this business logic, we have vertically partitioned the original product table.

2.3.3 Horizontal Sharding#

After vertical sharding, the database performance issues are partially resolved. However, with the growth of business volume, the data stored in the monolithic database has exceeded expectations, and a single server can no longer support it. However, from a business perspective, vertical sharding is no longer feasible. At this time, horizontal sharding can be attempted based on certain rules, such as placing products with odd IDs and even IDs in two different databases.

iShot_2023-02-28_14.14.57

iShot_2023-02-28_14.15.44

2.3.4 Horizontal Partitioning#

Save the data of a single table on multiple data tables according to certain rules (known as sharding rules). This is horizontal partitioning.

iShot_2023-02-28_14.19.27

iShot_2023-02-28_14.20.14

2.3.5 Summary#

Horizontal partitioning is mainly used to solve storage bottlenecks, while vertical partitioning is mainly used to reduce concurrency pressure.

2.4 Peak Shaving with Message Queues#

Usually, user requests directly access the database. If there is a large number of online users at the same time, it is likely to overwhelm the database (similar to the status of Weibo during a celebrity scandal or announcement of a relationship). In this case, reducing the pressure on the database can be achieved by using message queues. Regardless of the number of simultaneous user requests, they are first stored in the message queue, and then the system consumes the requests from the message queue in an orderly manner.

3. Optimizer - SQL Analysis and Optimization#

At this stage, we have entered the territory of the parser and optimizer. Generally, if your SQL syntax is correct, there should be no problem with the parser. In addition, to prevent low efficiency of your SQL queries, the optimizer will automatically perform some optimizations. However, if the SQL is really bad, the optimizer cannot help, and your SQL query will become a slow query.

3.1 Slow Queries#

As the name suggests, slow queries are queries that are very slow. You can use the following command to check the slow query status of MySQL:

iShot_2023-02-28_22.42.48

Output the following command to view the criteria for slow queries:

iShot_2023-02-28_22.44.30

3.2 View Running Threads#

Run show full processlist to view all running threads in MySQL.

iShot_2023-02-28_23.03.44

Among them,

Id: The unique identifier of the thread, which can be used to kill a specific thread.
User: The user who started the thread. Ordinary accounts can only view their own threads.
Host: The IP and port from which the connection was made.
db: The database operated by the thread.
Command: The command of the thread.
Time: The duration of the operation, in seconds.
State: The state of the thread.
Info: The first 100 characters of the SQL statement.

3.3 View Server Running Status#

Use SHOW STATUS to view the running status of the MySQL server. There are two scopes: session and global.

iShot_2023-02-28_23.12.45

The above code is used to view the number of select operations.

3.4 View Storage Engine Running Information#

SHOW ENGINE INNODB STATUS;

3.5 `EXPLAIN` Execution Plan#

The so-called plan refers to whether the optimizer in our MySQL will optimize the SQL statement we wrote (such as changing outer joins to inner joins, optimizing subqueries to join queries, etc.). The optimizer estimates the cost of executing this SQL statement and decides which index to use (or even not to use an index and perform a full table scan). The optimizer's strategy for executing a single table is determined based on the specific SQL statement, such as which indexes to use, etc.

EXPLAIN execution plan used to be mainly used for query statements, but now it can also be used for insert, update, and delete statements.

3.6 SQL and Index Optimization#

3.6.1 SQL Optimization#

Use small tables to drive large tables; rewrite subqueries (nested queries) with join or union;
In join queries, minimize the fan-out (number of records) of the driving table, and try to create indexes on the join columns of the driven table to reduce access costs. It is best to use the primary key or unique secondary index columns of the driven table as the join columns, which can further reduce the cost of the driven table;
For LIMIT with a large offset, filter first and then sort.

iShot_2023-03-01_23.06.18

3.6.2 Index Optimization#

Avoid unnecessary lookups. If lookups are necessary, reduce the number of lookups as much as possible (lookups have costs and may require loading data pages from external storage);
Use index covering.

4. Storage Structure and Table Structure#

4.1 Selecting Storage Engines#

It is recommended to choose different storage engines based on different business requirements. For example:

For business tables with frequent query and insert operations, MyISAM is recommended;
Use Memory for temporary tables;
Use InnoDB for high concurrency and update-intensive businesses;
Use the default engine if unsure.

4.2 Field Optimization#

Principle: Use the smallest data type that can correctly store the data.

4.2.1 Integer Types#

MySQL provides 6 integer types, which are:

tinyint
smallint
mediumint
int
integer
bigint

4.2.2 Character Types#

If the field length is uncertain, use varchar without hesitation, because varchar has additional space to record the length of the field. However, if the field length is fixed, try to use char, which can save a lot of memory space.

4.2.3 Non-Null Fields#

Set non-null fields as NOT NULL and provide default values, or use special values instead of NULL. Storing NULL types has performance issues in terms of storage and optimization.

4.2.4 Avoid Using Foreign Keys, Triggers, and Views (Especially with Large Data)#

This is also mentioned in the "Alibaba Development Manual". There are three reasons:

It reduces readability. When checking the code, you also need to check the database code.
Assign the calculation work to the program (business layer) instead of relying on the database. The database should only focus on storage and do it well.
In general, when dealing with large data, to solve storage bottlenecks, horizontal sharding is used. Foreign keys will lead to messy table structures and possible circular constraints.
Triggers can be understood as capturing data changes in real-time and performing corresponding operations. The original intention is good, but the disadvantages outweigh the advantages. Insert, update, and delete operations will frequently trigger triggers, resulting in significant resource consumption.
Data integrity verification should be done by developers, not relying on foreign keys. Once foreign keys are used, you will find it difficult to delete garbage data during testing (foreign key constraints will result in a messy table structure and possible circular constraints).

4.2.5 Storage of Images, Audio, and Video#

Store their addresses only.

4.2.6 Large Field Splitting and Data Redundancy#

Large field splitting is actually vertical partitioning mentioned earlier, which separates less frequently used fields or fields with large data volumes to avoid having too many columns and large data volumes. This is especially important when using SELECT *, as having too many columns and large data volumes can cause significant problems!

Field redundancy does not strictly conform to the normalization principles of database design, but it can greatly improve query performance. For example, in a contract table, in addition to storing the client ID, you can also redundantly store the client name, so that you do not need to retrieve the client name based on the client ID. Therefore, appropriate redundancy based on business logic is a good optimization technique. (Usually used in scenarios with frequent queries and fewer modifications)

5. Business Optimization#

Strictly speaking, business optimization is not considered as MySQL tuning, but business optimization can effectively reduce the pressure on database access. A typical example is Alibaba. Here are a few simple examples to provide some ideas:

Pre-sale diversion: In the past, the "buy, buy, buy" mode started on the night of Double 11. In recent years, the pre-sale period for Double 11 has become longer, starting more than half a month in advance. Various deposit and red envelope models have emerged. This method is called pre-sale diversion. This can divert customer service requests and avoid the need to wait until the early morning of Double 11 to place orders collectively.
Degradation strategy: During the early morning of Double 11, you may want to query orders from days other than that day, but the query fails. Even the chicken feed in Alipay is delayed. This is a degradation strategy. It gathers the computing resources of less important services to ensure the core business at the moment.
During Double 11, Alipay strongly recommends using Huabei for payment instead of bank cards. Although part of the consideration is to increase software stickiness, on the other hand, using Yu'ebao actually uses Alibaba's internal servers, which have faster access speed. Using a bank card requires calling the bank's interface, which is slower in comparison.