forewordforeword | ||||
text
This survey compared the implementation systems similar to Ethereum, and analyzed the difficulties and possibilities of parallel execution of transactions. The chain itself is designed based on the Account model instead of the UTXO model.
first level title
1. Many alliance chains in China, such as FISCO-BCOS, support parallel execution of transaction verification inside the block.
2. Khipu, a public chain project, is a scala implementation of the Ethereum protocol.
3. Aptos, public chain project, Move virtual machine.
first level title
Difficulties in Parallel Execution of Transactions
Let's first review the traditional transaction execution process:
The execution module takes out transactions one by one from the block, and then executes them one by one. During the execution process, the latest world state will be modified. After a transaction is executed, the state will be accumulated to reach the latest world state after the completion of the block. The execution of the next block strictly depends on the state of the world after the execution of the previous block, so the traditional linear execution process of transactions cannot be well optimized for parallel execution.
1. Account Conflict. If two threads process the balance or other attributes of an address account at the same time, can they be consistent with the results of sequential processing, that is, whether the world state is a definite finite state machine.
3. Cross-contract call conflicts. Originally, contract Ar is deployed first, and contract B needs to wait until contract A is deployed to call contract A. However, when the transactions are parallel, there is no such sequence, and there is a conflict.
FISCO-BCOS
first level title
Comparison of transaction parallel schemes
first level title
overview
FISCO-BCOS 2.0 uses a graph structure in the process of processing transactions, and designs a parallel transaction executor (PTE, Parallel Transaction Executor) based on the DAG model. PTE can give full play to the advantages of multi-core processors, so that the transactions in the block can be executed in parallel as much as possible. At the same time, it provides users with a simple and friendly programming interface, so that users do not need to care about the cumbersome parallel implementation details. The experimental results of the benchmark test program show that compared with the traditional serial transaction execution scheme, under ideal conditions, PTE running on a 4-core processor can achieve about 200% ~ 300% performance improvement, and the improvement in computing is comparable to the number of cores In direct proportion, the more cores, the higher the performance.
Program Details
A directed graph without cycles is called a directed acyclic graph (DAG). In a batch of transactions, the mutually exclusive resources occupied by each transaction can be identified in a certain way, and then a transaction-dependent DAG graph can be constructed according to the order of transactions in the block and the occupation relationship of mutually exclusive resources. As shown in the figure below, all transactions with an in-degree of 0 (no dependent pre-order tasks) can be executed in parallel. After topological sorting based on the order of the original transaction list on the left, the transaction DAG on the right can be obtained.
modular architecture
• Users initiate transactions directly or indirectly through the SDK.
• Then the transaction is synchronized among the nodes, and the node with the packaging right calls the sealer (Sealer) to take a certain amount of transactions from (TxPool) and package it into a block. After that, the block is sent to the consensus unit (Consensus) for inter-node consensus.
• Transaction verification needs to be performed before consensus is reached, and this is where the work of PTE begins. As can be seen from the architecture diagram, PTE first reads the transactions in the block sequentially and inputs them into the DAG Constructor. The DAG constructor will construct a transaction DAG containing all transactions based on the dependencies of each transaction. PTE then wakes up the worker thread pool and uses multiple threads to execute the transaction DAG in parallel. The joiner is responsible for suspending the main thread until all threads in the worker thread pool finish executing the DAG. At this time, the Joiner is responsible for calculating the state root and receipt root based on the modification records of the states of each transaction pair, and returns the execution results to the upper caller.
• After the block verification is passed, the block is uploaded to the chain. After the transaction is executed, if the status of each node is consistent, a consensus is reached. The block is then written to the underlying storage (Storage) and permanently recorded on the blockchain.
Transaction DAG construction process
1. Take out all transactions of the block from the packaged block.
2. Initialize a DAG instance with the number of transactions as the maximum number of vertices.
3. Read all transactions sequentially. If a transaction is parallelizable, resolve its conflict domain and check whether any previous transaction conflicts with this transaction. If there is a conflict, a dependent edge is constructed between the corresponding transactions. If the transaction cannot be parallelized, it is considered that it must be executed after all previous transactions are executed, so a dependent edge is established between this transaction and all previous transactions.
Remarks: After all dependent edges are established, they cannot be executed in parallel, but can only be executed sequentially.
DAG execution process
1. The main thread will first initialize a thread group of a corresponding size according to the number of hardware cores. If it fails to obtain the number of hardware cores, no other threads will be created.
2. When the DAG has not finished executing, the thread loops and waits for the waitPop method from the DAG to take out the ready transaction with an in-degree of 0. If the transaction to be executed is successfully taken out, the transaction will be executed, and the in-degree of the subsequent dependent tasks will be reduced by 1 after execution. If the degree of a transaction is reduced to 0, add the transaction to topLevel. If it fails, it means that the DAG has been executed and the thread exits.
problem and solution
1. For the same block, how to ensure that all nodes are executed and have the same status (three roots match)?
FISCO BCOS uses the method of verifying whether the state root, transaction root and receipt root triples are equal to determine whether the state is consistent. The transaction root is a hash value calculated based on all transactions in the block. As long as the block data processed by all consensus nodes is the same, the transaction root must be the same. Since this is relatively easy to guarantee, the focus is on how to ensure that the state generated after the transaction is executed is the same as the receipt root.
It is well known that for instructions executed in parallel on different CPU cores, the order of execution between instructions cannot be predicted in advance. The same is true for transactions executed in parallel. In the traditional transaction execution scheme, every time a transaction is executed, the state root changes once, and the changed state root is written into the transaction receipt. After all transactions are executed, the final state root represents the state of the current blockchain, and a receipt root is calculated based on all transaction receipts.
It can be seen that in the traditional execution scheme, the state root plays a role similar to a global shared variable. When transactions are executed in parallel and out of order, the traditional way of calculating the state root is obviously no longer applicable. This is because the execution order of transactions is generally different on different machines, and the consistency of the final state root cannot be guaranteed at this time. Similarly, the consistency of the receipt root cannot be guaranteed.In FISCO BCOS, transactions are executed in parallel first, and the state change history of each transaction is recorded. After all transactions are executed, a state root is comprehensively calculated based on these historical records. This ensures that even if transactions are executed in parallel, the final consensus nodes can still reach a consensus.
2. How to determine whether two transactions are dependent?
If the two transactions have no dependencies but are judged to be, it will cause unnecessary performance loss; conversely, if the two transactions will rewrite the state of the same account but are executed in parallel, the final state of the account may be is uncertain. therefore,
The determination of dependencies is an important issue that affects performance and even determines whether the blockchain can work normally.
In a simple transfer transaction, we can judge whether there is a dependency between the two transactions based on the addresses of the sender and receiver of the transfer.
Take the following 3 transfer transactions as an example: A→B, C→D, D→E. It can be easily seen that transaction D→E depends on the result of transaction C→D, but transaction A→B has no connection with the other two transactions, so it can be executed in parallel.
This analysis is true in blockchains that only support simple transfers. But because we can't know exactly what operations are in the transfer contract written by the user, once this kind of analysis is put into the blockchain that is Turing complete and runs smart contracts, it is not accurate enough.
The possible situation is: A → B transaction seems to have nothing to do with the account status of C and D, but in the underlying implementation of the user, A is a special account, and every money transferred through the A account must first be transferred from C A certain handling fee will be deducted from the account. In this scenario, the three transactions are all related, so they cannot be executed in parallel. If transactions are also divided according to the previous dependency analysis method, problems will inevitably arise.
So can we automatically deduce which dependencies actually exist in the transaction based on the user's contract content? the answer is negative. As mentioned in static analysis, it is difficult to analyze contract dependencies and execution process.
FISCO BCOS assigns the task of specifying transaction dependencies to developers who are more familiar with contract content. Specifically, the mutually exclusive resources that a transaction depends on can be represented by a set of strings. FISCO BCOS exposes the interface to the developer. The developer defines the resources that the transaction depends on in the form of a string, and informs the executor on the chain. The executor automatically arranges all transactions in the block into a transaction DAG according to the transaction dependencies specified by the developer. . For example, in a simple transfer contract, the developer only needs to specify that the dependency of each transfer transaction is {sender address + receiver address}. Furthermore, if the developer introduces another third-party address in the transfer logic, then the dependency needs to be defined as {sender address + receiver address + third-party address}.
This method is more intuitive and simple to implement, and it is also more general, applicable to all smart contracts, but it also increases the responsibility on the shoulders of developers accordingly. Developers must be very careful when specifying transaction dependencies. If the dependencies are not written correctly, the consequences will be unpredictable.
Parallel Framework ContractFISCO BCOS has set some contract writing specifications for developers to use the framework of parallel contracts. The details are as follows:parallel mutexWhether two transactions can be executed in parallel depends on whether the two transactions exist。
mutually exclusive
• . Mutual exclusion means that two transactions areThere is an intersection between the sets of operation contract storage variablesFor example, in a transfer scenario, a transaction is a transfer operation between users. Using transfer(X, Y) to represent the transfer interface from user X to user Y, the mutual exclusion is as follows:mutually exclusive parameters
• :contractinterface
, the parameters related to the "read/write" operation of contract storage variables. For example, the transfer interface is transfer(X, Y), where X and Y are mutually exclusive parameters.
Mutex
: In a transaction, the specific mutually exclusive content extracted according to the mutually exclusive parameters. For example, the transfer interface is transfer(X, Y), and in a transaction calling this interface, the specific parameter is transfer(A, B), then the mutually exclusive object of this operation is [A, B]; another transaction The parameter of the call is transfer(A, C), then the mutex object of this operation is [A, C].
Judging whether two transactions can be executed in parallel at the same time is to judge whether the mutually exclusive objects of the two transactions overlap. Transactions whose intersection with each other is empty can be executed in parallel.
FISCO-BCOS provides two ways to write parallel contracts, one is solidity contract and the other is precompiled contract. Only the solidity contract is introduced here, and the precompiled contract is the same.
solidity contract parallel framework
When writing a parallel solidity contract, basically you only need to use ParallelContract.sol as the base class of the contract that needs to be parallel, and call the registerParallelFunction() method to register the parallel interface.
The parallelContract code is as follows:
• The following is the transfer contract written based on the parallel framework contract:
Determine if an interface is parallelizable
A parallel contract interface must satisfy:
No calls to external contracts.
• No interface for calling other functions.
Determine Mutual Exclusion Parameters
Before writing an interface, you need to determine the mutual exclusion parameters of the interface. The mutual exclusion of the interface is the mutual exclusion of global variables. The rules for determining mutually exclusive parameters are:
• The interface accesses the global mapping, and the key of the mapping is a mutually exclusive parameter."globalA",• The interface accesses the global array, and the subscript of the array is a mutually exclusive parameter.
• The interface accesses global variables of simple type, and all global variables of simple type share a mutex parameter, and use different variable names as mutex objects.For example: There are multiple simple types of global variables in the contract, and different interfaces access different global variables. If you need to parallelize different interfaces, you need to define a mutually exclusive parameter in the interface parameters that have modified the global variable, and specify which global variable is used when calling. When calling, actively pass the "variable name" of the correspondingly modified global variable to the mutex parameter to indicate the mutex object of this transaction. For example, if the global parameter globalA is modified in the setA(int x) function, the setA function needs to be defined as set(string aflag, int x), and when calling, pass in setA(
10), use the variable name "globalA" to indicate that the mutex object of this transaction is globalA.string、address、uint 256、int 256 After determining the mutually exclusive parameters, determine the parameter type and order according to the rules, the rules are:
Determine parameter type and order
- Interface parameters are limited to:
(More types will be supported in the future).- All mutually exclusive parameters are arranged at the top of the interface parameters.。
Khipu
As can be seen,
In fact, FISCO-BCOS transaction parallelism largely relies on the specification of contracts written by users. If the contract written by the user is not standardized, and the system hastily executes it in parallel, it may cause the problem of inconsistency of the ledger root
first level title
overview
Unlike FISCO-BCOS, Khipu believes that it is unrealistic for users to identify and mark address ranges where static conflicts will occur when writing contracts without errors.
Whether, where, and under what conditions the race will occur can only be judged when the deterministic acquisition involves the current state. With the current contract programming language, it is almost impossible to obtain completely error-free and non-missing results through static analysis of the code.
Khipu has made a comprehensive attempt in this regard and completed the project implementation.
Program Details
In Khipu's implementation, each transaction in the same block starts from the world state of the previous block, and then executes in parallel, recording all the above three races encountered on the ideal experience path during execution . After the parallel execution phase is over, it goes to the merge phase. In the merging phase, the parallel world states are merged one by one. When merging a transaction, it is first judged from the recorded static conditions whether there is a conflict with the previously merged race condition. If not, merge directly. If so, execute this transaction again starting from the previously merged world state. The final merged world state will be finalized with the hash of the block. This is the last line of defense. If the verification is wrong, the previous parallel scheme will be abandoned and the transactions in the block will be re-executed serially.parallelism index:
Khipu introduced a parallelism index here, that is, the proportion of transactions in a block that can directly merge the results without re-executing. Through several days of replay observations on Ethereum from the genesis block to the latest block, Khipu found that this ratio (parallelism) can reach 80% on average.
In general, if computing tasks can be fully parallelized, the scalability of a single chain will be infinite, because more CPU cores can always be added to a node. If this is not the case, the maximum theoretical rate is limited by
Andal's theorem
The limit to which you can speed up a system is the inverse of what cannot be parallelized. That is, if you can parallelize 99% of the time, you can achieve a 100x speedup. But if you can only achieve 95% parallelization, then you can only achieve a speedup of 20 times.
conflict marker
From the replay of all transactions in Ethereum, about 80% of the transactions can be parallelized, and 20% cannot be parallelized. Therefore, the average efficiency of Khipu for speeding up Ethereum is about 5 times.
Through the interpretation of the evm code instructions, it can be found that the conflict is that some limited instructions generate a read and write process for stroage, so a read and write set can be formed by recording these read and write places. Static code analysis alone cannot ensure that these processes are recorded, so it is necessary to pre-execute each transaction in parallel when processing transactions in each block. Through the pre-execution process, it can be known whether these transactions read and write to the same account or storage, and then generate a readSet and writeSet for each transaction. In short, the pre-execution process is to first make multiple copies of the world state as the initial state of all transactions. Assuming that there are 100 transactions in the blockchain, then these 100 transactions can be executed in parallel through the thread pool. Each contract has the same initial world state, and 100 readSets and writeSets will be generated during execution, and 100 new states will be generated each.
Through the transaction replay of the Ethereum mainnet, it can be found that where there are many conflicts, most of them are interrelated transactions carried out by exchanges in the same block, which is also consistent with this process.
Aptos
Process Flowchart
Specific parallel execution process
first level title
overview
Aptos created a high-throughput chain based on Diem's Move language and MoveVM, enabling parallel execution. Aptos' approach is to detect associations while being transparent to users/developers. That is, transactions are not required to explicitly state which part of the state (memory location) they use.
Program Details
Aptos implemented a parallel execution engine based on the Block-STM paper, using a modified version of Software transaction Memory called Block-STM.
Block-STM uses MVCC (Multi-Version Concurrency Control) to avoid write-write conflicts. All writes to the same location are stored with their versions containing their tx-id and the number of times the write tx was re-executed. When the transaction tx reads the value of a certain memory location, it obtains the value written to the location that appeared before tx from MVCC, as well as the associated version, so as to determine whether there is a read-write conflict. In Block-STM, transactions are pre-ordered within a block and divided among processor threads during execution to execute in parallel. In parallel execution, assuming there are no dependencies to execute the transaction, the memory locations modified by the transaction are recorded. After execution, all transaction results are verified. During validation, if a transaction is found to access a memory location modified by a previous transaction, the transaction is invalidated. Refresh the result of the transaction and re-execute the transaction. This process is repeated until all transactions in the block have been executed. Block-STM speeds up execution when multiple processor cores are used. Acceleration depends on the degree of interdependence between transactions.
It can be seen that the scheme adopted by Aptos is roughly similar to the Khipu mentioned above, but the implementation details are slightly different. The main differences are as follows:
• Khipu uses parallel execution and sequential verification for transactions in the block, while Aptos uses parallel execution and parallel verification for transactions in the block. Both options have advantages and disadvantages. Khipu's scheme is easy to implement, but slightly less efficient. Aptos' Block-STM implementation uses synchronization and signal operations in many threads, which is difficult to implement in code, but it is more efficient.
• Since Move natively supports global resource addressing, Aptos will reorder transactions, even across blocks, if it is conducive to parallel execution. Officials claim that this solution can not only improve parallel efficiency, but also solve the MEV problem, but whether this will affect the user experience remains to be considered.
Benchmarks
After integrating Block-STM, Aptos made a corresponding benchmark, and compared the performance of a block of 10k transactions in the case of sequential execution and parallel execution. The results are as follows:
As can be seen from the figure above, Block STM achieves a 16-fold speedup compared to the sequential execution of threads when 32 threads are executed in parallel, while Block-STM achieves a speedup of more than 8 times under high contention conditions.
first level title
Summarize
To sum up, some solutions require users to write storage according to established rules when writing contracts, so that dependencies can be found by static and dynamic analysis. Solana and Sui take a similar approach, but with different user perceptions. Such solutions are essentially changes to the storage model to obtain better analysis results.
The solutions of Khipu and Aptos are user-unaware. That is, users do not need to write contracts carefully, and the virtual machine will dynamically analyze dependencies before execution, so that there will be no parallel execution of dependencies. This type of scheme is difficult to implement, and the degree of parallelism depends to a certain extent on the account division of the transaction. When there are many transaction conflicts, continuous re-execution will lead to serious performance degradation. Aptos mentioned in the paper that some optimizations may be made to user-written contracts in the future, so as to better analyze dependencies and achieve faster execution speed.
Considering engineering implementation and efficiency issues, OlaVM will most likely adopt the Khipu+ customized storage model solution. While improving performance, avoid the complexity brought by the introduction of Block-STM, and facilitate better engineering optimization in the later stage.
2.FISCO-BCOS Github, FISCO-BCOS
3.Khipu Github, GitHub - khipu-io/khipu: An enterprise blockchain platform based on Ethereum
4.Aptos Github, GitHub - aptos-labs/aptos-core: Aptos is a layer 1 blockchain built to support the widespread use of of blockchain through better technology and user experience.
refer to:
1. Block-STM paper: [ 2203.06871 ] Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing (arxiv.org)
first level title
about Us
Sin7y was established in 2021 and is composed of top blockchain developers. We are both a project incubator and a blockchain technology research team, exploring the most important and cutting-edge technologies such as EVM, Layer 2, cross-chain, privacy computing, and autonomous payment solutions. The team launched the OlaVM white paper in July 2022, aiming to build the first fast, scalable, and EVM-compatible ZKVM.
WeChat public account: Sin 7 y
Official website: https://sin 7 y.org/
White Paper: https://olavm.org/
Github:Sin 7 y
Community: http://t.me/sin 7 y_labs
