Monad Co-founder: After Cancun, what are the performance bottlenecks of Rollups?

Azuma

Odaily Senior Author

@azuma_eth

2024-03-26 09:15

A hardcore analysis of why L2 gas fees rose instead of falling.

Original author: Keone Hon, co-founder of Monad

Compiled by: Odaily Azuma

Editor's note: On the morning of March 26, Beijing time, Monad co-founder Keone Hon published an in-depth article about the performance of Rollups on his personal X account. In it, Keone explains how to calculate the theoretical TPS ceiling of Rollups after the Cancun upgrade, and why a single transaction on some Layer 2s (such as Base) still costs several dollars after the upgrade. He also outlines some of the bottlenecks Rollups face and potential improvements.

The following is Keone's original content, compiled by Odaily. For readers' convenience, the translator has made some additions to the original text.

There has been some discussion recently about execution bottlenecks and gas limits on Rollups. These concern not only Layer 1 but also Layer 2. I discuss these bottlenecks below.

Data Availability (DA)

With the introduction of blobs (EIP-4844) in the Cancun upgrade, Ethereum's data availability (DA) capacity has greatly improved. Transactions that publish Layer 2 data no longer have to bid in the same fee market as ordinary Layer 1 transactions.

Currently, blob capacity is about three 125 KB blobs per block (12 seconds), which works out to 31.25 KB per second. Since a transaction is about 100 bytes, the combined TPS shared by all Rollups is about 300.

A few caveats apply here.

First, if Rollups adopt better transaction data compression to reduce the size of each transaction, TPS can grow.
Second, in theory, Rollups can still publish data through calldata (the pre-Cancun approach) in addition to blobs, although doing so adds complexity.
Third, some ZK-rollups (notably zkSync Era and Starknet) publish state diffs rather than transaction data, so for these Rollups both the calculation method and the result differ.
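The capacity figure above is simple arithmetic. The Python sketch below uses only the numbers from the text: about 125 KB per blob, 3 blobs per 12-second block, and about 100 bytes per transaction. The function names are illustrative.

```python
# Inputs taken from the text; all are approximations.
BLOB_SIZE_BYTES = 125_000   # ~125 KB per blob
BLOBS_PER_BLOCK = 3         # blobs per L1 block
L1_BLOCK_TIME_S = 12        # seconds per L1 block


def da_bytes_per_second(blob_size=BLOB_SIZE_BYTES,
                        blobs_per_block=BLOBS_PER_BLOCK,
                        block_time_s=L1_BLOCK_TIME_S):
    """Blob bytes per second available to all Rollups combined."""
    return blob_size * blobs_per_block / block_time_s


def shared_rollup_tps(tx_size_bytes, **capacity):
    """Upper bound on TPS across all Rollups if every transaction is posted to blobs."""
    return da_bytes_per_second(**capacity) / tx_size_bytes


if __name__ == "__main__":
    print(da_bytes_per_second())   # 31250.0 bytes/s, i.e. 31.25 KB/s
    print(shared_rollup_tps(100))  # 312.5, i.e. "about 300"
    print(shared_rollup_tps(50))   # 625.0: halving tx size via compression doubles TPS
```

The first caveat is visible directly: halving the average transaction size doubles the bound. For reference, EIP-4844 defines a raw blob as 4096 field elements of 32 bytes (131,072 bytes), with a target of 3 blobs per block and a maximum of 6. Passing `blob_size=131_072` gives a slightly higher figure of the same order.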

Rollup gas limit

Base has recently drawn a lot of attention because its gas fees exploded. An ordinary transaction on the network now costs a few dollars.

Why did fees on Base fall only briefly after the Cancun upgrade, and why have they now returned to, or even exceeded, their pre-upgrade level? Blocks on Base have a total gas limit, enforced through a parameter in the code.

Base currently uses the same gas parameters as Optimism: a total of 5 million gas per Layer 2 block (one block every 2 seconds). When demand for transactions on the network exceeds the supply of block space, transactions must outbid each other for inclusion, and gas prices on the network surge.
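The Python sketch below makes the supply/demand mechanism concrete. It assumes EIP-1559-style pricing, which OP Stack chains use: the base fee rises whenever a block uses more gas than the target. The sketch treats the 5 million gas per block from the text as that target. The denominator of 8 and the hard limit of twice the target are Ethereum L1's values. They stand in here for illustration only, because OP Stack chains configure their own.

```python
GAS_PER_BLOCK = 5_000_000   # per-block gas figure from the text
BLOCK_TIME_S = 2            # L2 block time from the text
TRANSFER_GAS = 21_000       # gas used by a plain ETH transfer

gas_per_second = GAS_PER_BLOCK / BLOCK_TIME_S           # 2.5 million gas/s
transfers_per_second = gas_per_second / TRANSFER_GAS    # roughly 119 plain transfers/s


def next_base_fee(base_fee, gas_used, gas_target, denominator=8):
    """EIP-1559 base fee update, using the EIP's integer arithmetic."""
    if gas_used == gas_target:
        return base_fee
    if gas_used > gas_target:
        # Blocks fuller than the target push the fee up (by at least 1 wei).
        delta = base_fee * (gas_used - gas_target) // gas_target // denominator
        return base_fee + max(delta, 1)
    # Emptier blocks pull the fee down.
    delta = base_fee * (gas_target - gas_used) // gas_target // denominator
    return base_fee - delta


def simulate_fees(base_fee, demand_per_block, gas_target, gas_limit, blocks):
    """Base fee over time under constant demand; usage is capped by the block limit."""
    fees = [base_fee]
    for _ in range(blocks):
        gas_used = min(demand_per_block, gas_limit)
        base_fee = next_base_fee(base_fee, gas_used, gas_target)
        fees.append(base_fee)
    return fees


if __name__ == "__main__":
    target = GAS_PER_BLOCK
    # Hypothetical scenario: demand 5x the target, hard cap at 2x the target.
    fees = simulate_fees(1_000_000_000, 5 * target, target, 2 * target, blocks=30)
    print(f"{gas_per_second:,.0f} gas/s, ~{transfers_per_second:.0f} transfers/s")
    print(f"base fee grew {fees[-1] / fees[0]:.1f}x in 30 blocks (60 s)")
```

With these stand-in parameters, demand that stays above the target raises the fee by 12.5% per block. Prices therefore compound quickly until demand falls back to what the block space can serve, however cheap DA becomes.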

Why doesn't Base raise this gas limit? Put another way, why do Rollups need a total gas limit at all?

Besides the data availability TPS ceiling described above, there are two other major reasons: the execution throughput bottleneck and the risks of state growth.

Problem 1: Execution throughput bottleneck

Generally speaking, EVM Rollups run an EVM client forked from Geth, so their performance characteristics are similar to Geth's.

Geth executes transactions on a single thread, so it can only process one transaction at a time. It stores its state in a Merkle Patricia trie (MPT). The trie is in turn encoded into LevelDB/PebbleDB, a general-purpose key-value database that uses another tree structure (an LSM tree) under the hood to store data on a solid-state drive (SSD).

For Rollups, state access (reading values from the Merkle trie) and state updates (updating the Merkle trie at the end of each block) are the most expensive parts of execution. A single SSD read costs 40-100 microseconds. On top of that, the Merkle trie is embedded inside another data structure (the LSM tree), so every access requires many unnecessary extra lookups.

Think of it as finding a specific file in a complex file system. You have to go from the root directory (the trie's root node) all the way down to the target file (a leaf node). Each step along the way requires looking up a specific key in LevelDB, and inside LevelDB that lookup is itself carried out through another data structure, the LSM tree. This adds many extra lookup steps, which make reading and updating data slow and inefficient.
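These costs multiply. The back-of-the-envelope Python sketch below takes the 40-100 microsecond SSD read latency from the text. The trie depth and the number of LSM-tree probes per node lookup are hypothetical illustrative values, not measurements. Real clients also cache heavily, which the sketch ignores.

```python
def state_read_us(trie_depth, lsm_probes_per_node, ssd_read_us):
    """Latency of one state read if every trie node on the root-to-leaf path is a
    separate database key and each key lookup touches several LSM-tree levels on disk."""
    return trie_depth * lsm_probes_per_node * ssd_read_us


def reads_per_second(read_us):
    """Throughput of a single thread issuing blocking reads one after another."""
    return 1_000_000 / read_us


if __name__ == "__main__":
    for ssd_us in (40, 100):  # SSD latency range from the text
        # Hypothetical shape: 7 trie levels, 2 LSM probes per node lookup.
        per_read = state_read_us(trie_depth=7, lsm_probes_per_node=2, ssd_read_us=ssd_us)
        print(f"SSD {ssd_us} us: {per_read} us per state read, "
              f"~{reads_per_second(per_read):,.0f} state reads/s on one thread")
```

Even a single raw SSD read caps a blocking single thread at 10,000-25,000 reads per second. Each extra layer of indirection divides that further.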

In Monad's design, we solved this problem with MonadDb. MonadDb is a custom database that stores the Merkle trie directly on disk, avoiding LevelDB's overhead. It supports asynchronous IO, so multiple reads can be processed in parallel, and it bypasses the file system.
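MonadDb's internals are not described here. The Python sketch below only illustrates the general idea behind asynchronous IO, using a hypothetical `FakeSSD` class with simulated latency. With blocking reads, only one request is outstanding at a time. With async IO, many reads are in flight at once, so their device latencies overlap instead of adding up.

```python
import asyncio


class FakeSSD:
    """Hypothetical stand-in for a disk: every read takes `latency_s` seconds,
    and the class records the peak number of reads in flight at once."""

    def __init__(self, latency_s=0.001):
        self.latency_s = latency_s
        self.in_flight = 0
        self.max_in_flight = 0

    async def read(self, key):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(self.latency_s)  # simulated device latency
        self.in_flight -= 1
        return f"value-of-{key}"


async def read_blocking_style(ssd, keys):
    # Issue the next read only after the previous one has completed.
    return [await ssd.read(key) for key in keys]


async def read_async_style(ssd, keys):
    # Submit every read up front, then wait for all of them together.
    return await asyncio.gather(*(ssd.read(key) for key in keys))


async def main(num_reads=32):
    keys = [f"slot-{i}" for i in range(num_reads)]
    seq_ssd, async_ssd = FakeSSD(), FakeSSD()
    seq = await read_blocking_style(seq_ssd, keys)
    par = await read_async_style(async_ssd, keys)
    return seq, par, seq_ssd.max_in_flight, async_ssd.max_in_flight


if __name__ == "__main__":
    seq, par, seq_peak, async_peak = asyncio.run(main())
    print(f"peak reads in flight: blocking {seq_peak}, async {async_peak}")
```

Both styles return the same values. The difference is that the async version keeps all 32 reads outstanding simultaneously.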

In addition, Monad's optimistic parallel execution lets multiple transactions run in parallel and fetch their state from MonadDb in parallel.
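The general idea of optimistic parallel execution can be sketched in Python. This is a generic illustration under simplifying assumptions, not Monad's implementation. First, every transaction runs against the same pre-block state while recording which keys it read. Then the results are merged in block order, and any transaction that read a key already written by an earlier transaction in the block is re-executed. The `transfer` transaction type is hypothetical. Python threads are limited by the GIL, so the sketch shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackingView:
    """View over a state dict that records the keys a transaction reads
    and buffers its writes instead of applying them."""

    def __init__(self, base):
        self.base = base
        self.reads = set()
        self.writes = {}

    def get(self, key):
        if key in self.writes:          # read-your-own-writes
            return self.writes[key]
        self.reads.add(key)
        return self.base.get(key, 0)

    def set(self, key, value):
        self.writes[key] = value


def run(tx, state):
    view = TrackingView(state)
    tx(view)
    return view


def execute_serial(txs, state):
    """Reference result: apply transactions one by one."""
    state = dict(state)
    for tx in txs:
        state.update(run(tx, state).writes)
    return state


def execute_optimistic(txs, state):
    """Run all txs against the pre-block state, then merge in block order,
    re-executing any tx that read a key written earlier in the block."""
    snapshot = dict(state)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda tx: run(tx, snapshot), txs))
    state = dict(state)
    written, reexecuted = set(), 0
    for tx, view in zip(txs, results):
        if view.reads & written:        # inputs were stale: redo against current state
            view = run(tx, state)
            reexecuted += 1
        state.update(view.writes)
        written.update(view.writes)
    return state, reexecuted


def transfer(src, dst, amount):
    """Hypothetical transaction type: move `amount` if `src` can afford it."""
    def tx(view):
        balance = view.get(src)
        if balance >= amount:
            view.set(src, balance - amount)
            view.set(dst, view.get(dst) + amount)
    return tx


if __name__ == "__main__":
    genesis = {"alice": 100, "bob": 50, "carol": 0, "dave": 10}
    txs = [transfer("alice", "bob", 30), transfer("carol", "dave", 5),
           transfer("bob", "carol", 60), transfer("dave", "alice", 10)]
    final, redone = execute_optimistic(txs, genesis)
    assert final == execute_serial(txs, genesis)
    print(final, f"re-executed {redone} of {len(txs)}")
```

The merge step guarantees the same final state as serial execution. Transactions that touch disjoint state never need a second run.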

Rollups, however, lack these optimizations and are therefore bottlenecked on execution throughput.

Note that the Erigon and Reth clients include some optimizations for database efficiency, and some Rollup clients are built on them (such as OP-Reth). Erigon/Reth use a flat data structure, which reduces read costs to some extent. However, they do not support asynchronous reads or multi-threaded processing. In addition, the Merkle root still has to be recalculated after each block, which is a fairly slow process.
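Per-block root recomputation is expensive for a simple reason: every modified key forces every node on its path to the root to be rehashed, and on a disk-backed trie each of those nodes must also be read and written. The toy Python sketch below uses a binary SHA-256 Merkle tree. That is a simplification, not Ethereum's hexary Merkle Patricia trie with Keccak-256. It shows that an incremental update touches only one path yet produces the same root as a full rebuild.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Hash all leaves and internal nodes bottom-up; levels[-1][0] is the root.
    Assumes len(leaves) is a power of two to keep the sketch short."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def update_leaf(levels, index, new_leaf):
    """Rehash only the nodes on the path from leaf `index` to the root.
    Returns how many hashes were computed."""
    levels[0][index] = h(new_leaf)
    hashes = 1
    for depth in range(1, len(levels)):
        index //= 2
        below = levels[depth - 1]
        levels[depth][index] = h(below[2 * index] + below[2 * index + 1])
        hashes += 1
    return hashes


if __name__ == "__main__":
    leaves = [f"account-{i}".encode() for i in range(1024)]
    levels = build_levels(leaves)
    cost = update_leaf(levels, 5, b"account-5-new-balance")
    leaves[5] = b"account-5-new-balance"
    assert levels[-1][0] == build_levels(leaves)[-1][0]
    print(f"{cost} hashes for one update vs {2 * len(leaves) - 1} for a full rebuild")
```

A block that modifies thousands of keys still triggers thousands of root-ward paths. When each node on those paths is a separate database read and write, that work dominates the end of the block.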

Problem 2: Risks of state growth

Like other blockchains, Rollups limit their throughput to keep their active state from growing too quickly.

A common argument is that state growth is a concern because significant growth would push up the solid-state drive (SSD) requirements for nodes. I think this is somewhat inaccurate. SSDs are relatively cheap (a high-quality 2 TB SSD costs about $200), and Ethereum's full state has only reached about 200 GB over its nearly 10-year history. From a pure storage perspective, there is still plenty of room for growth.

The bigger risk is that as state keeps growing, it takes longer to look up any given piece of state. The current Merkle Patricia trie takes a shortcut whenever a node has only one child. This reduces the trie's effective depth and speeds up lookups. As the Merkle trie fills up, however, fewer and fewer such shortcuts are available.

In short, the risk of state growth ultimately comes down to state access efficiency. Speeding up state access is therefore the key to making state growth more sustainable.
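A small Python experiment illustrates the trend. It builds a path-compressed hexary trie over random keys, with single-child runs collapsed in the spirit of the MPT's shortcut nodes, and counts the branch nodes a lookup must pass through. Real MPT keys are 32-byte hashes. Shorter random keys are used here to keep the sketch cheap, which does not change the result at these sizes. The model also ignores MPT node encoding, so the numbers show only the trend: as the number of keys grows, shortcuts disappear near the top of the trie and the average lookup depth rises roughly logarithmically.

```python
import random
from collections import defaultdict


def branch_depths(keys):
    """For each hex key, count branch nodes (nodes with 2+ children) on its path
    in a path-compressed hexary trie. Single-child runs are collapsed into
    shortcuts and therefore cost no extra lookup."""
    next_nibbles = defaultdict(set)  # prefix -> distinct nibbles that follow it
    for key in keys:
        for i in range(len(key)):
            next_nibbles[key[:i]].add(key[i])
    return [sum(1 for i in range(len(key)) if len(next_nibbles[key[:i]]) >= 2)
            for key in keys]


def average_depth(num_keys, seed=0):
    """Average branch-node depth over `num_keys` random 8-byte hex keys."""
    rng = random.Random(seed)
    keys = {f"{rng.getrandbits(64):016x}" for _ in range(num_keys)}
    depths = branch_depths(keys)
    return sum(depths) / len(depths)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(f"{n:>5} keys: average branch depth {average_depth(n):.2f}")
```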

Why isn't optimizing hardware alone enough?

Layer 2s are still fairly centralized: each network relies on a single sequencer to maintain state and produce blocks. One might ask: why not run the sequencer on a machine with a very large amount of RAM (random access memory), so that all state can be held in memory?

There are two reasons for this.

First, this would not solve the data availability bottleneck on Ethereum mainnet. Base's current gas surge is not caused by insufficient DA capacity on mainnet, but in the long run DA will become a major bottleneck for Rollups.

Second, decentralization. Even though the sequencer is still highly centralized, other participants in the network matter too. They also need to be able to run nodes independently, replay the same transaction history, and arrive at the same state.

Raw transaction data and state commitments posted to Layer 1 are not by themselves enough to give access to the full state. Anyone who needs the full state, such as a merchant, an exchange, or an automated trader, should run a full Layer 2 node to process transactions and keep an up-to-date copy of the state.

Rollups are still blockchains, and blockchains are interesting because they achieve global coordination through shared global state. Every blockchain needs powerful software; optimizing hardware alone cannot solve the problem.

Community interaction

After Keone posted the article, key figures from several leading Layer 2 projects responded in the replies.

Alex Gluchowski, co-founder of zkSync, pointed to the article's statement that the Merkle root must be recalculated after each block and asked how Monad differs on this point.

Keone replied that Monad would use an optimized algorithm for computing the Merkle root after each block.

Jesse Pollak, head of Base, also used the thread to explain why Base's gas costs rose rather than fell after the Cancun upgrade. He said that EIP-4844 significantly reduced DA costs at the Layer 1 level, which should have lowered gas costs. However, transaction demand on the network has grown more than fivefold, and blocks on Base are limited to 2.5 million gas/s. With demand exceeding supply, gas fees rose.