Blockchain Performance Testing and Optimization - Part 1

Guest Columnist

2023-04-02 02:04


The purpose of this article is to introduce the complete performance project process through a specific example (cosmos-sdk SimApp).

outline

This article walks through a complete performance project using a concrete example (cosmos-sdk SimApp). It covers three aspects of blockchain performance testing:
1. Basic concepts
2. Common tools
3. Performance tuning

Each of these three topics is broad and has many books and articles devoted to it, so this article does not go into every detail.

Basic concepts

Blockchain performance testing is methodologically no different from traditional performance testing. Performance testing has many easily confused concepts, so the ones used in this article are defined here.

Definition of performance testing

Performance testing means establishing a monitoring strategy for the performance indicators of a system or service, executing tests in specific scenarios, analyzing and locating performance bottlenecks and optimizing them, and finally obtaining results that show whether the indicators meet their predetermined values. Each element is explained below using the SimApp blockchain of cosmos-sdk.
1. Clear indicators: there are generally two types, technical indicators and business indicators. Technical indicators are generally TPS, response time, and resource utilization. For a blockchain, these correspond to how many transactions can be processed per second, what the response times (or their statistics) are for those transactions, and how heavily the system's resources are used under that load. The expected business indicators should come from production statistics. Taking cosmos-hub, the production application built on cosmos-sdk, as an example: the current block time is about 6 seconds and most blocks contain fewer than 10 transactions, so an expected TPS of 100 is a reasonable business indicator. (Such a low TPS is related to the goals of cosmos-hub, whose main focus is chain interoperability.)
2. Test model: an abstraction of the real scenario that describes what the business model looks like. Taking cosmos-hub as an example, blockchain nodes distributed around the world process transactions with about 500 validator nodes, of which about 200 are active. The real topology can be scaled down proportionally for testing.
3. Test plan: covers the test environment, test data, test model, performance indicators, and so on. For a blockchain system, this means deciding the test architecture and preparing data such as 1000 users, each with a balance of 1000 stake.
4. Monitoring: the monitored objects include the load generators, the blockchain nodes, and other components such as load balancers. In the cloud-native era, monitoring is generally done with Kubernetes + Prometheus + Grafana.
5. Test conditions: the hardware environment, the test execution strategy, and so on. For example: 4 CPU cores and 8 GB of memory, adding 10 threads per second for the first 60 seconds.
7. Result report: the report contains the actual measured indicator data.
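As a rough sanity check on the figures quoted in items 1 and 5, the arithmetic can be sketched as follows. The function names are hypothetical, and the inputs are only the approximate numbers given above (about 6 seconds per block, fewer than 10 transactions per block, 10 threads added per second for 60 seconds).

```python
def tps_from_blocks(tx_per_block: float, block_time_s: float) -> float:
    """Average transactions per second implied by block size and block time."""
    if block_time_s <= 0:
        raise ValueError("block_time_s must be positive")
    return tx_per_block / block_time_s


def ramp_up_threads(threads_per_second: int, ramp_seconds: int) -> list[int]:
    """Number of active threads at the end of each second of a linear ramp-up."""
    return [threads_per_second * (s + 1) for s in range(ramp_seconds)]


if __name__ == "__main__":
    # Upper bound on the observed load: 10 transactions in a 6-second block.
    print(f"observed TPS <= {tps_from_blocks(10, 6):.2f}")
    schedule = ramp_up_threads(10, 60)
    print(f"threads after 1 s: {schedule[0]}, after 60 s: {schedule[-1]}")
```

With these inputs the observed production load is under 2 TPS, so an expected indicator of 100 TPS leaves a wide margin.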

Classification of performance scenarios
1. Baseline performance scenario: measure the capacity (traffic volume) of a single transaction type or interface on its own, in preparation for the mixed capacity scenario.
2. (Mixed) capacity performance scenario: real production traffic is made up of different services, so a mixed capacity test applies gradient (stepped) load from these services in their respective concurrency ratios.
4. Abnormal performance scenario: simulate failures while the system is under heavy load.
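The mixed capacity scenario in item 2 can be sketched as a small planning helper that splits each load step across services by ratio. This is only an illustration: the service names and the 7:2:1 ratio are made-up assumptions, not measurements from any chain.

```python
def mixed_capacity_plan(ratios: dict[str, float], steps: list[int]) -> list[dict[str, int]]:
    """Split the total concurrency of each gradient step across services by ratio.

    Rounding means the per-service counts may differ from the step total by a
    thread or two when the ratios do not divide evenly.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must sum to a positive number")
    plan = []
    for concurrency in steps:
        plan.append({name: round(concurrency * r / total) for name, r in ratios.items()})
    return plan


if __name__ == "__main__":
    # Hypothetical traffic mix: 70% transfers, 20% delegations, 10% votes.
    ratios = {"bank_send": 7, "staking_delegate": 2, "gov_vote": 1}
    for step in mixed_capacity_plan(ratios, [10, 50, 100]):
        print(step)
```

In a real project the ratios would come from production statistics, in the same way as the business indicators.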

important performance indicators

There are many indicators for performance testing, such as:
1. RT, Response Time
2. HPS, Hits Per Second
3. TPS, Transactions Per Second
4. QPS, Queries Per Second
5. PV, Page View
6. Throughput
7. IOPS, Input/Output Operations Per Second

The "transaction" in TPS generally means a unit of work (as in a database transaction) in traditional applications, and an on-chain transaction in the blockchain field.
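Two of the indicators listed above, TPS and RT, can be derived from the same raw samples. The sketch below assumes each completed transaction is recorded as a (start time, response time) pair in seconds; the function name and the nearest-rank percentile are illustrative choices, not a standard prescribed by any tool.

```python
import math
import statistics


def summarize(samples: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize (start_time_s, response_time_s) pairs of completed transactions."""
    if not samples:
        raise ValueError("no samples")
    rts = sorted(rt for _, rt in samples)
    first_start = min(start for start, _ in samples)
    last_end = max(start + rt for start, rt in samples)
    window = last_end - first_start
    # Nearest-rank 95th percentile of the response times.
    p95 = rts[math.ceil(0.95 * len(rts)) - 1]
    return {
        "tps": len(samples) / window if window > 0 else float("inf"),
        "rt_mean": statistics.mean(rts),
        "rt_p95": p95,
        "rt_max": rts[-1],
    }


if __name__ == "__main__":
    # Made-up samples for illustration only.
    print(summarize([(0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]))
```

Reporting a percentile alongside the mean matters because a few slow transactions can hide behind a good average.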

The more important indicators are resource utilization, throughput, and response time. Service providers care more about the first two, while users care more about the last. The general behavior of these indicators is illustrated by the classic diagram in Performance Testing Methodology (http://hosteddocs.ittoolbox.com/questnolg22106java.pdf), although real systems may behave differently. The figure defines 3 lines, 3 areas, and 3 states. It is worth studying closely, because it gives a rough picture of how the indicators relate to each other.
1. 3 lines: Utilization, Throughput, Response Time

other
1. When performance tests generally need to be done:
a. Before a project goes online, to estimate the carrying capacity of the system
b. After a project is restructured, to evaluate the effect
2. If a project ends as soon as a performance report is obtained, it is only performance verification. A complete performance project runs comprehensive performance tests and also tunes the system to its optimal state. Performance tuning takes a long time and may require developers to take part, so it is expensive.

Blockchain performance testing

Why blockchain performance is hard to measure

Latency

How are the start and end of the latency interval defined?
1. Is the starting point the user clicking submit, or the transaction arriving in the mempool?
2. Is the end point when the transaction is confirmed by the first block? When it is confirmed by the 6th block (as PoW blockchains assume)? Or when the end user receives the response from the interface?
3. Some blockchain systems wait for a certain delay and a certain number of transactions before they start processing. In that case the luckiest transaction is the last one to join, because it waits the shortest time before processing.
5. Some blockchain systems process transactions by priority. Transactions with high fees are confirmed quickly, while transactions with low fees are relatively slow. Differences in fees therefore affect both latency and TPS statistics.
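The ambiguity in items 1 and 2 can be made concrete by recording several timestamps per transaction and computing latency under each definition. This is a minimal sketch: the field names are hypothetical and the numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class TxTimeline:
    """Timestamps in seconds for one transaction (hypothetical field names)."""
    submitted: float         # user clicks submit
    in_mempool: float        # transaction reaches the mempool
    first_block: float       # transaction is included in its first block
    nth_confirmation: float  # e.g. the 6th block on a PoW chain
    response: float          # client receives the interface response


def latencies(t: TxTimeline) -> dict[str, float]:
    """Latency of the same transaction under different start/end definitions."""
    return {
        "submit_to_first_block": t.first_block - t.submitted,
        "mempool_to_first_block": t.first_block - t.in_mempool,
        "submit_to_nth_confirmation": t.nth_confirmation - t.submitted,
        "submit_to_response": t.response - t.submitted,
    }


if __name__ == "__main__":
    # Made-up timeline for illustration only.
    timeline = TxTimeline(0.0, 0.5, 6.0, 36.0, 6.5)
    for name, value in latencies(timeline).items():
        print(f"{name}: {value:.1f} s")
```

The same transaction yields very different numbers depending on the definition, so a report should state which one it uses.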

throughput

Another practical problem is that users do not really care about the TPS of a blockchain. Users only care about paying lower fees and having their transactions completed as soon as possible. From this perspective, TPS is only meaningful to system service providers.

basic tools

Load testing tools

General-purpose load testing tool: JMeter

Alternatively, there are test tools built for specific applications.

Using JMeter is closer to real usage scenarios and more general. There are generally several ways to interact with blockchain nodes (command-line interaction ultimately calls one of these interfaces as well):
1. gRPC protocol

JMeter's built-in samplers support HTTP, while gRPC support requires a plug-in, jmeter-grpc-request.
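To show what a load tool does at its core, here is a minimal sketch of a concurrent load driver. It is neither JMeter nor a cosmos-sdk client: `send_tx_stub` is a hypothetical stand-in for a real HTTP or gRPC request to a node, and `run_load` is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_tx_stub(i: int) -> bool:
    """Stand-in for a real request to a blockchain node; always succeeds."""
    time.sleep(0.001)
    return True


def run_load(send, total_requests: int, workers: int) -> dict[str, float]:
    """Send requests from a fixed-size thread pool and report basic indicators."""
    if total_requests <= 0 or workers <= 0:
        raise ValueError("total_requests and workers must be positive")

    def timed(i: int) -> tuple[bool, float]:
        start = time.perf_counter()
        ok = send(i)
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed, range(total_requests)))
    elapsed = time.perf_counter() - wall_start

    succeeded = sum(1 for ok, _ in results if ok)
    return {
        "requests": total_requests,
        "succeeded": succeeded,
        "tps": succeeded / elapsed,
        "rt_mean": sum(rt for _, rt in results) / len(results),
    }


if __name__ == "__main__":
    print(run_load(send_tx_stub, total_requests=200, workers=10))
```

A real tool adds ramp-up control, protocol samplers, and reporting on top of this loop, which is why a general-purpose tool is usually preferable to a hand-written driver.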

Monitoring tools

General-purpose monitoring tool: Prometheus. This tool can monitor many kinds of targets, and its ecosystem is shown in the architecture figure (https://prometheus.io/assets/architecture.png).

In practice, when testing blockchain applications, docker-compose is generally used to deploy multiple blockchain nodes that simulate the formal test environment. The formal test environment generally has a high hardware configuration, and unless you run your own data center, machines from cloud service providers are expensive, so this saves costs.

In docker-compose, you can limit the resources used by containers, such as memory and CPU capacity, and even pin containers to CPU cores. These resources can be monitored with cadvisor. A related tool is stress-ng.

performance tuning

The common root causes of performance bottlenecks (I call them meta-causes, the reasons behind the reasons, usually at the hardware level) are network, CPU, and disk IO. Operations that cause disk IO bottlenecks include frequent log writes, unnecessary log printing, and accessing disks over the network. Access to these resources goes through system calls, so strace can be used to see which system calls are executed and how much time they take.
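One way to use strace for this is its summary mode (`strace -c`), which prints a table of time, call counts, and errors per system call. The sketch below ranks the rows of such a table by time spent. The parser name is hypothetical, and the sample text contains made-up numbers in the usual layout of that table, not output from a real run.

```python
# Made-up numbers in the layout printed by `strace -c`; not a real measurement.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.006000          60       100           write
 30.00    0.003000          30       100        10 openat
 10.00    0.001000          10       100           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.010000                   300        10 total
"""


def parse_strace_summary(text: str) -> list[dict]:
    """Return one dict per system call, sorted by total seconds, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields (errors column empty) or 6 fields.
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            seconds = float(parts[1])
            usecs_per_call = int(parts[2])
            calls = int(parts[3])
        except ValueError:
            continue  # separator line made of dashes
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append({
            "syscall": parts[-1],
            "seconds": seconds,
            "usecs_per_call": usecs_per_call,
            "calls": calls,
            "errors": errors,
        })
    return sorted(rows, key=lambda r: r["seconds"], reverse=True)


if __name__ == "__main__":
    for row in parse_strace_summary(SAMPLE):
        print(f'{row["syscall"]:<8} {row["seconds"]:.6f} s over {row["calls"]} calls')
```

A table dominated by `write` would be consistent with the log-writing causes mentioned above, but the summary only points to where to look next.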

Another issue that may be encountered is system instability, which can manifest itself as CPU usage/TPS instability.

If CPU usage is unstable (the overall trend is steady but it fluctuates widely), then from the perspective of instruction execution the CPU is spending varying lengths of time in the idle state. The cause is not that some CPUs sit idle, but that the idle periods are sometimes long and sometimes short. Linux system tools and the profiling tools for the program's language are needed to observe this and find the reason.
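The fluctuation described above can be quantified from successive snapshots of the aggregate `cpu` line in `/proc/stat` on Linux. The following is a minimal sketch with hypothetical function names; it counts idle plus iowait ticks as idle time, which is a common convention rather than the only possible one, and the snapshots in the usage example are made up.

```python
import statistics


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]
    return idle, sum(values)


def busy_percent(prev: str, curr: str) -> float:
    """CPU busy percentage between two snapshots of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(prev)
    idle1, total1 = parse_cpu_line(curr)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    return 100.0 * (1 - (idle1 - idle0) / delta_total)


def usage_stats(snapshots: list[str]) -> tuple[float, float]:
    """Mean and population standard deviation of busy % over consecutive intervals."""
    usage = [busy_percent(a, b) for a, b in zip(snapshots, snapshots[1:])]
    return statistics.mean(usage), statistics.pstdev(usage)


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    snaps = [
        "cpu 100 0 100 800 0 0 0 0 0 0",
        "cpu 150 0 150 900 0 0 0 0 0 0",
        "cpu 160 0 160 1080 0 0 0 0 0 0",
    ]
    mean, spread = usage_stats(snaps)
    print(f"mean busy: {mean:.1f}%, std dev: {spread:.1f}%")
```

A steady mean with a large standard deviation matches the pattern described here: a stable trend with idle periods of uneven length.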

Analysis tools

Figure: Linux performance tools (https://www.brendangregg.com/Perf/linux_perf_tools_full.png)

Disk IO is a common cause of system bottlenecks, and the disk IO stack is relatively deep, which makes it difficult to analyze. Familiarity with the IO stack helps in finding problems (https://www.thomas-krenn.com/en/wikiEN/images/c/c2/Linux-storage-stack-diagram_v6.2.pdf).

Once the cause is found, the quickest optimizations are adjustments to operating system parameters or application parameters. If the code must be changed, the work involves system architecture optimization and coding, and the tuning cycle becomes very long.
