
Blockchain Performance Testing and Optimization - Part 1

ZgTech
Guest columnist
2023-04-02 02:04
This article is about 2849 words, reading the full article takes about 5 minutes
The purpose of this article is to introduce the complete performance project process through a specific example (cosmos-sdk SimApp).


outline

  • The purpose of this article is to introduce the complete process of a performance project through a specific example (cosmos-sdk SimApp). It covers three aspects of blockchain performance testing:

  • 1. Basic concepts

  • 2. Common tools

  • 3. Performance tuning

Each of these three topics is broad enough to fill many books and articles, so this article does not go into every detail.

basic concepts

Methodologically, blockchain performance testing is no different from traditional performance testing. Performance testing has many easily confused concepts, so the ones used in this article are defined here.


Definition of performance testing

  • Performance testing means establishing a monitoring strategy for the performance indicators of a system or service, executing tests in specific scenarios, analyzing and locating performance bottlenecks and optimizing them, and finally obtaining results that show whether the indicators meet their predetermined values. The points below explain this using the SimApp blockchain of cosmos-sdk.

  • 1. Clear indicators. There are generally two types: technical indicators and business indicators. Technical indicators are generally TPS, response time, and resource utilization. For a blockchain this means: how many transactions can be processed per second, what are the response times (or their statistics) for these transactions, and what is the state of the system's resources under that load? The business indicators to be met should come from production statistics. Taking cosmos-hub, a production application built on cosmos-sdk, as an example: the current block time is about 6 seconds and most blocks contain fewer than 10 transactions, which is under 2 TPS, so an expected business indicator of 100 TPS is reasonable. (Such a low TPS is related to the goals of cosmos-hub, whose main focus is chain interoperability.)

  • 2. Test model: an abstraction of the real scenario that describes what the business looks like. Taking cosmos-hub as an example, transactions are processed by blockchain nodes distributed around the world, with about 500 validator nodes of which about 200 are active. For testing, this can be scaled down proportionally.

  • 3. Test plan: covers the test environment, test data, test model, performance indicators, and so on. For a blockchain system this means deciding the test architecture and preparing data such as 1000 users, each with a balance of 1000 stake.

  • 4. Monitoring: the monitored objects include the load generators, the blockchain nodes, and other components such as load-balancing servers. In the cloud-native era, monitoring is generally done with Kubernetes + Prometheus + Grafana.

  • 5. Predetermined test conditions: hardware environment, test execution strategy, and so on. For example: 4 cores and 8 GB of memory (4C8G), adding 10 threads per second for the first 60 seconds.

  • 7. A result report: the report contains the indicator data actually measured.
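The indicator and ramp-up figures quoted above can be made concrete with a little arithmetic. This is a minimal sketch using only the numbers from the text (about 10 transactions per block, about 6 seconds per block, 10 threads added per second for the first 60 seconds). The function names are illustrative and do not belong to any tool.

```python
def observed_tps(txs_per_block: float, block_time_s: float) -> float:
    """Average transactions per second implied by block statistics."""
    if block_time_s <= 0:
        raise ValueError("block_time_s must be positive")
    return txs_per_block / block_time_s


def ramp_threads(elapsed_s: int, threads_per_s: int = 10, ramp_s: int = 60) -> int:
    """Number of load threads running after elapsed_s seconds of a linear ramp.

    The count stops growing once the ramp period (ramp_s) is over.
    """
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    return threads_per_s * min(elapsed_s, ramp_s)


if __name__ == "__main__":
    current = observed_tps(10, 6)
    print(f"upper bound from the quoted block statistics: {current:.2f} TPS")
    print(f"a 100 TPS target is about {100 / current:.0f}x that figure")
    print(f"threads after 30 s: {ramp_threads(30)}; after 90 s: {ramp_threads(90)}")
```

Under these assumptions the ramp levels off at 600 threads, which is the peak concurrency such a plan would need the load generators to sustain.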


Classification of performance scenarios

  • 1. Baseline performance scenario: measure the capacity (traffic volume) of a single transaction type or interface on its own, in preparation for the mixed-capacity scenario.

  • 2. (Mixed) capacity performance scenario: real online traffic is made up of different services, so a mixed-capacity test applies gradient (stepwise increasing) load from these services at their respective concurrency ratios.

  • 4. Abnormal performance scenario: simulate failures while the system is under heavy load.
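The mixed-capacity scenario comes down to splitting each gradient step's total concurrency across services by a fixed ratio. A minimal sketch of that split follows. The service names and the 7:2:1 ratio are invented for illustration, and largest-remainder rounding is one reasonable choice rather than a prescribed method.

```python
def split_concurrency(total: int, ratios: dict[str, int]) -> dict[str, int]:
    """Split `total` concurrent users across services in proportion to `ratios`.

    Largest-remainder rounding keeps the parts summing exactly to `total`.
    """
    if total < 0 or not ratios or any(r < 0 for r in ratios.values()):
        raise ValueError("total and ratios must be non-negative, ratios non-empty")
    weight = sum(ratios.values())
    if weight == 0:
        raise ValueError("at least one ratio must be positive")
    exact = {name: total * r / weight for name, r in ratios.items()}
    result = {name: int(x) for name, x in exact.items()}
    leftover = total - sum(result.values())
    # Give the remaining users to the services with the largest fractional part.
    by_fraction = sorted(exact, key=lambda n: exact[n] - result[n], reverse=True)
    for name in by_fraction[:leftover]:
        result[name] += 1
    return result


def gradient_plan(steps: list[int], ratios: dict[str, int]) -> list[dict[str, int]]:
    """One concurrency split per gradient step."""
    return [split_concurrency(total, ratios) for total in steps]


if __name__ == "__main__":
    # Hypothetical traffic mix: 70% transfers, 20% delegations, 10% votes.
    mix = {"bank_send": 7, "staking_delegate": 2, "gov_vote": 1}
    for step in gradient_plan([10, 50, 100], mix):
        print(step)
```

In a real project the ratios would come from production statistics, in the same way as the business indicators.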


important performance indicators

There are many indicators for performance testing, such as:

  • 1. RT, Response Time

  • 2. HPS, Hits Per Second

  • 3. TPS, Transactions Per Second

  • 4. QPS, Queries Per Second

  • 5. PV, Page View

  • 6. Throughput

  • 7. IOPS, Input/Output Operations Per Second

The "transaction" in TPS generally means a business or database transaction (a unit of work) in traditional applications, and an on-chain transaction in the blockchain field.
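Several of these indicators can be derived from the same raw data: one start and end timestamp per request. A minimal sketch under that assumption follows. The `Sample` type and the nearest-rank percentile are illustrative choices, not the output format of any particular tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    start_s: float   # when the request was sent
    end_s: float     # when the response (or confirmation) was observed
    ok: bool = True  # False if the request failed


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("no values")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def summarize(samples: list[Sample]) -> dict[str, float]:
    """TPS, response-time statistics and error rate for one test run."""
    ok = [s for s in samples if s.ok]
    if not ok:
        raise ValueError("no successful samples")
    rts = [s.end_s - s.start_s for s in ok]
    # Throughput is measured over the window spanned by successful requests.
    window = max(s.end_s for s in ok) - min(s.start_s for s in ok)
    return {
        "tps": len(ok) / window if window > 0 else float("nan"),
        "rt_mean_s": sum(rts) / len(rts),
        "rt_p95_s": percentile(rts, 95),
        "error_rate": 1 - len(ok) / len(samples),
    }
```

Reporting a percentile next to the mean matters because a few very slow transactions can hide behind an acceptable average.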

  • The more important indicators are resource utilization, throughput, and response time. Service providers care more about the first two, and users care more about the last. The typical behavior of these indicators is illustrated by the classic diagram in Performance Testing Methodology (http://hosteddocs.ittoolbox.com/questnolg 22106 java.pdf); actual behavior may differ. The figure defines 3 lines, 3 areas, and 3 states. It is worth studying closely, as it gives a rough picture of how the indicators relate to each other.

  • 1. 3 lines: Utilization, Throughput, Response Time


picture

other


  • 1. When is performance testing generally needed?

  • a. Before the project goes online, to estimate the carrying capacity of the system

  • b. After the project is refactored, to evaluate the effect

  • 2. If a project ends as soon as a performance report is produced, it is only performance verification. A complete performance project runs comprehensive performance tests and also tunes the system to its optimal state. Performance tuning takes a long time and may require developer participation, so it is expensive.

Blockchain performance testing

Why blockchain performance is hard to measure

Delay


  • How are the start and end of the delay interval defined?

  • 1. Is the starting point the moment the user clicks submit, or the moment the transaction arrives in the mempool?

  • 2. Is the end point when the transaction is confirmed by the first block, when it is confirmed by the 6th block (as PoW blockchains usually assume), or when the end user receives the response from the interface?

  • 3. Some blockchain systems wait for a certain time and a certain number of transactions before starting to process them. In that case the luckiest transaction is the last one to join the batch, and its processing delay is the shortest.

  • 5. Some blockchain systems process transactions by priority: transactions with high fees are confirmed quickly, while transactions with low fees are confirmed relatively slowly. The difference in fees affects both transaction delay and TPS statistics.
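How much the choice of start and end points changes the reported delay is easy to see with numbers. The sketch below computes the delay for one transaction under the alternative definitions listed above. The `TxTimeline` record and the 6-second block spacing in the example are assumptions for illustration, not data from a real chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxTimeline:
    submitted_s: float    # user clicks submit
    in_mempool_s: float   # transaction arrives in a node's mempool
    included_height: int  # height of the first block that contains it
    response_s: float     # client receives the interface response


def delay(tx: TxTimeline, block_time_s: dict[int, float], start: str = "submitted",
          end: str = "block", confirmations: int = 1) -> float:
    """Delay of `tx` under an explicit choice of start and end points.

    block_time_s maps block height to the time that block was committed.
    """
    starts = {"submitted": tx.submitted_s, "mempool": tx.in_mempool_s}
    if start not in starts:
        raise ValueError("start must be 'submitted' or 'mempool'")
    if end == "response":
        return tx.response_s - starts[start]
    if end != "block" or confirmations < 1:
        raise ValueError("end must be 'block' (confirmations >= 1) or 'response'")
    # The n-th confirmation is the block n - 1 heights above the including block.
    final_height = tx.included_height + confirmations - 1
    return block_time_s[final_height] - starts[start]


if __name__ == "__main__":
    blocks = {h: 6.0 * h for h in range(1, 11)}  # assumed 6 s block spacing
    tx = TxTimeline(submitted_s=0.5, in_mempool_s=0.8, included_height=1, response_s=6.3)
    print(delay(tx, blocks))                                    # submit to first block
    print(delay(tx, blocks, start="mempool", confirmations=6))  # mempool to 6th block
    print(delay(tx, blocks, end="response"))                    # submit to response
```

The same transaction yields very different figures depending on the definition, so a delay figure is only comparable when its start and end points are stated.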


throughput

Another practical problem is that users do not really care about the TPS of a blockchain. They only care about paying lower fees and having their transactions completed as soon as possible. From this perspective, TPS is only meaningful to system service providers.

basic tools


pressure tool

The general-purpose pressure (load-generation) tool is JMeter.

Alternatively, test tools written for a specific application can be used.

  • JMeter is closer to real usage scenarios and more general. There are several ways to interact with blockchain nodes (command-line interaction ultimately calls one of these interfaces as well):

  • 1. gRPC protocol

JMeter's built-in Samplers support HTTP; support for the gRPC protocol requires a plug-in, jmeter-grpc-request.
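Whatever the protocol, a pressure tool does the same basic job: it issues requests from many concurrent workers and times each one. The sketch below illustrates only that principle and is not a substitute for JMeter. `send` is a placeholder for whatever HTTP or gRPC call reaches the node, and `fake_send` is a stand-in so the example runs without a blockchain node.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(send: Callable[[int], bool], total: int, workers: int) -> dict[str, float]:
    """Call send(i) for i in range(total) from `workers` threads, timing each call."""

    def timed(i: int) -> tuple[bool, float]:
        began = time.perf_counter()
        try:
            ok = bool(send(i))
        except Exception:
            ok = False  # a failed request counts as an error, not a crash
        return ok, time.perf_counter() - began

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed, range(total)))
    wall = time.perf_counter() - wall_start

    ok_count = sum(1 for ok, _ in results if ok)
    return {
        "sent": total,
        "ok": ok_count,
        "tps": ok_count / wall if wall > 0 else float("nan"),
        "rt_mean_s": sum(rt for _, rt in results) / total if total else float("nan"),
    }


def fake_send(i: int) -> bool:
    """Stand-in for a real request: sleeps briefly and fails every 10th call."""
    time.sleep(0.001)
    return i % 10 != 9


if __name__ == "__main__":
    print(run_load(fake_send, total=200, workers=20))
```

A fixed worker pool like this models a closed workload, where each worker waits for its response before sending again. That is also how a basic JMeter thread group behaves.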


monitoring tool

The general-purpose monitoring tool is Prometheus. It can monitor a wide range of targets, and its ecosystem is shown in the architecture diagram (https://prometheus.io/assets/architecture.png).

When testing blockchain applications in practice, docker-compose is generally used to deploy multiple blockchain nodes and simulate the formal test environment. The formal test environment generally has a high hardware configuration and, unless you run your own data center, machines from cloud service providers are expensive, so this approach saves cost.

In docker-compose you can limit the resources a container uses, such as memory and CPU, and even bind it to specific CPU cores. Container resource usage can be monitored with cadvisor, and stress-ng can generate load to exercise those limits.

picture
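Once Prometheus is scraping cadvisor, per-container figures can be pulled out through Prometheus' HTTP API for use in a report. This is a minimal sketch, assuming Prometheus listens on `http://localhost:9090` and that the containers carry a `name` label. Both are assumptions about the test setup, as is the example query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus' instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' `name` label to its value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [timestamp, "value"]; the value arrives as a string.
    return {
        series["metric"].get("name", "<unlabelled>"): float(series["value"][1])
        for series in data["result"]
    }


def query(base_url: str, promql: str) -> dict[str, float]:
    """Run an instant query against a live Prometheus server."""
    with urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # CPU cores used per container over the last minute (assumed label and address).
    expr = "sum by (name) (rate(container_cpu_usage_seconds_total[1m]))"
    print(query("http://localhost:9090", expr))
```

Grafana dashboards are normally the day-to-day view. A scripted query like this is mainly useful for putting exact numbers into the result report.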


performance tuning

Generally, the common root causes of performance bottlenecks (I call them meta-causes, the reasons behind the reasons, usually at the hardware level) are network, CPU, and disk IO. Operations that cause disk IO bottlenecks include frequent log writing, unnecessary log printing, and accessing disks over the network. Access to these resources goes through system calls. To trace system calls, you can use strace to see which system calls are executed and how much time they take.
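With the `-T` option, strace appends the time spent in each system call in angle brackets at the end of the line, which makes the output easy to aggregate. The sketch below sums that time per system call from lines captured with something like `strace -T -p <pid>`. The sample lines are made up for illustration, and `strace -c` prints a similar summary directly.

```python
import re
from collections import defaultdict

# Matches e.g.: write(1, "log line\n", 9) = 9 <0.000150>
# An optional "[pid N]" prefix (as printed with -f) is tolerated.
LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s*=\s*.*<(\d+\.\d+)>\s*$")


def time_by_syscall(lines: list[str]) -> list[tuple[str, int, float]]:
    """Return (syscall, call count, total seconds), slowest syscall first."""
    totals: dict[str, list] = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip signals, unfinished/resumed calls and other noise
        entry = totals[match.group(1)]
        entry[0] += 1
        entry[1] += float(match.group(2))
    rows = [(name, count, seconds) for name, (count, seconds) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    sample = [
        'write(1, "log line\\n", 9) = 9 <0.000150>',
        'write(1, "log line\\n", 9) = 9 <0.000250>',
        'read(3, "data", 4096) = 4 <0.000020>',
        'access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000010>',
        'read(3,  <unfinished ...>',
    ]
    for name, count, seconds in time_by_syscall(sample):
        print(f"{name:10s} calls={count:3d} total={seconds:.6f}s")
```

If `write` dominates such a table on a node under load, log output is an obvious first suspect, in line with the disk IO causes listed above.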

Another issue that may be encountered is system instability, which can manifest itself as CPU usage/TPS instability.

If CPU usage is unstable (the overall trend is stable but it fluctuates widely), then from the perspective of instruction execution the CPU is spending varying lengths of time idle. The cause is not that some CPUs sit idle all the time, but that the idle periods are sometimes long and sometimes short. Use Linux system tools and the profiling tools for the program's language to observe this and find the cause.
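"Stable trend, large fluctuation" can be checked numerically before reaching for a profiler. The sketch below compares the drift between the two halves of a CPU usage series with its coefficient of variation. The sample values are made up, and any threshold for "too much fluctuation" would have to be agreed per project.

```python
import statistics


def fluctuation(samples: list[float]) -> dict[str, float]:
    """Summarize the trend and spread of a series of CPU usage samples (percent)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    half = len(samples) // 2
    # Drift: how far the average moved between the first and second half.
    drift = statistics.fmean(samples[half:]) - statistics.fmean(samples[:half])
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean if mean else float("inf"),  # coefficient of variation
        "drift": drift,
    }


if __name__ == "__main__":
    steady = [50, 51, 49, 50, 50, 51, 49, 50]
    jumpy = [20, 80, 25, 75, 15, 85, 30, 70]
    print("steady:", fluctuation(steady))
    print("jumpy: ", fluctuation(jumpy))
```

Both series average 50% with no drift, but the second has a far higher coefficient of variation, which is the pattern described above.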


analyzing tool

An overview of Linux performance analysis tools: https://www.brendangregg.com/Perf/linux_perf_tools_full.png

picture


Disk IO is a common cause of system bottlenecks, and the disk IO stack is relatively deep, which makes it difficult to analyze. Familiarity with the IO stack helps in locating problems (https://www.thomas-krenn.com/en/wikiEN/images/c/c 2/Linux-storage-stack-diagram_v 6.2.pdf)

picture


Once the cause is found, the quickest way to improve performance is to adjust operating system parameters or application parameters. If the code has to be modified, that involves system architecture optimization and coding work, and the tuning cycle becomes very long.
