a16z: Why Encrypted Mempools Are Unlikely to Be a Panacea for MEV

Original article by Pranav Garimidi, Joseph Bonneau, Lioba Heimbach, a16z

Original translation: Saoirse, Foresight News

On a blockchain, value can be extracted by deciding which transactions to include in a block, which to exclude, and in what order to place them. The maximum value that can be earned this way is called maximal extractable value, or MEV for short. MEV is present on most blockchains and has been a topic of widespread concern and discussion in the industry.

Note: This article assumes that readers have a basic understanding of MEV. Readers who are new to the topic may want to read our MEV explainer first.

Many researchers who have looked at MEV have asked an obvious question: can cryptography solve the problem? One approach is an encrypted mempool: users broadcast encrypted transactions, which are decrypted and revealed only after ordering is complete. The consensus protocol then has to choose the transaction order blindly, which seems to rule out profiting from MEV opportunities during the ordering stage.

Unfortunately, for both practical and theoretical reasons, encrypted mempools cannot provide a universal solution to the MEV problem. This article explains the difficulties and explores which encrypted mempool designs are feasible.

How encrypted mempools work

There have been many proposals for encrypted mempools, but the general framework is as follows:

1. The user broadcasts an encrypted transaction.
2. Encrypted transactions are submitted to the chain (in some proposals, transactions must first undergo a verifiable random shuffle).
3. When the block containing these transactions is finalized, the transactions are decrypted.
4. Finally, the transactions are executed.

Step 3 (transaction decryption) raises a key question: who is responsible for decryption, and what happens if decryption fails? A simple idea is to let users decrypt their own transactions (in which case encryption is not even required; a hiding commitment is enough). However, this approach has a loophole: attackers can carry out speculative MEV.

In speculative MEV, the attacker guesses that a certain encrypted transaction contains an MEV opportunity, then encrypts a transaction of their own and tries to get it inserted at a favorable position (such as directly before or after the target transaction). If the transactions end up in the expected order, the attacker decrypts and extracts MEV through their own transaction; if not, they refuse to decrypt, and their transaction is never included in the final blockchain.
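The loophole can be made concrete with a minimal commit-and-reveal sketch. Everything here is an illustrative assumption rather than any particular protocol: the salted SHA-256 commitment, the function names (commit, verify_reveal, execute_block), and the rule that unrevealed commitments are simply skipped at execution time.

```python
import hashlib
import secrets

def commit(tx):
    """Return (commitment, salt); the random salt hides low-entropy payloads."""
    salt = secrets.token_bytes(32)
    return hashlib.sha256(salt + tx).digest(), salt

def verify_reveal(commitment, salt, tx):
    """Check that a revealed (salt, tx) pair matches the earlier commitment."""
    return hashlib.sha256(salt + tx).digest() == commitment

def execute_block(ordered_commitments, reveals):
    """Execute, in committed order, only the transactions whose owners revealed.

    `reveals` maps commitment -> (salt, tx). Unrevealed commitments are
    skipped without consequence, which is exactly the loophole.
    """
    executed = []
    for c in ordered_commitments:
        if c in reveals:
            salt, tx = reveals[c]
            if verify_reveal(c, salt, tx):
                executed.append(tx)
    return executed

VICTIM_TX = b"victim: swap 100 ETH for TOKEN"
ATTACKER_TX = b"attacker: buy TOKEN first"

victim_c, victim_salt = commit(VICTIM_TX)
attacker_c, attacker_salt = commit(ATTACKER_TX)

# Case 1: the attacker landed in front of the victim, so it reveals and front-runs.
favorable = execute_block(
    [attacker_c, victim_c],
    {victim_c: (victim_salt, VICTIM_TX), attacker_c: (attacker_salt, ATTACKER_TX)},
)

# Case 2: the attacker landed behind the victim, so it simply never reveals.
unfavorable = execute_block(
    [victim_c, attacker_c],
    {victim_c: (victim_salt, VICTIM_TX)},
)
```

In the second case the attacker's transaction disappears without cost, so the guess is a free option: the attacker pays only when the ordering happens to be favorable.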

It may be possible to penalize users who fail to decrypt, but such a mechanism is extremely difficult to implement. The penalty must be the same for all encrypted transactions (after all, transactions cannot be told apart once encrypted), and it must be severe enough to deter speculative MEV even against high-value targets. This means a large amount of funds has to be locked up, and those funds must remain anonymous (to avoid revealing the link between transactions and users). Worse still, honest users who cannot decrypt because of a software bug or a network failure would be penalized as well.
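A back-of-the-envelope model shows why the required penalty grows so large. The model is an assumption made for illustration and does not come from any specific proposal: the attacker's speculative transaction lands favorably with probability p_favorable and then earns mev_value; otherwise the attacker refuses to decrypt and forfeits a bond.

```python
def attacker_expected_profit(p_favorable, mev_value, bond):
    """Expected profit of one speculative attempt under the simple model above."""
    return p_favorable * mev_value - (1 - p_favorable) * bond

def min_deterrent_bond(p_favorable, mev_value):
    """Smallest bond that makes the expected profit non-positive."""
    if not 0 <= p_favorable < 1:
        raise ValueError("p_favorable must be in [0, 1)")
    return p_favorable * mev_value / (1 - p_favorable)

# Hypothetical numbers: a target worth 1,000,000 and a 50% chance of a good slot.
required_bond = min_deterrent_bond(0.5, 1_000_000)
profit_at_required_bond = attacker_expected_profit(0.5, 1_000_000, required_bond)
```

Because encrypted transactions are indistinguishable, the bond sized for the most valuable conceivable target would have to be posted with every transaction, including the smallest transfer.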

Therefore, most proposals encrypt transactions in a way that guarantees they can be decrypted at some point in the future, even if the originating user is offline or refuses to cooperate. This goal can be achieved in the following ways:

Trusted Execution Environments (TEEs): Users encrypt transactions to a key held inside a TEE enclave. In the most basic versions, the TEE only decrypts transactions after a certain point in time (which requires a notion of time inside the TEE). More complex schemes make the TEE responsible for decrypting transactions and building the block, ordering transactions by criteria such as arrival time and fees. Compared with other encrypted mempool schemes, the advantage of a TEE is that it can operate directly on plaintext transactions and reduce on-chain clutter by filtering out transactions that would revert. The drawback is that this approach relies on trusting the hardware.

Secret-sharing and threshold encryption: In this scheme, users encrypt transactions to a key that is held by a specific committee (usually a subset of validators). Decryption requires a certain threshold condition (for example, two-thirds of the committee members agree).

With threshold decryption, trust shifts from hardware to a committee. Proponents argue that since most protocols already assume an honest majority of validators for consensus, we can make a similar assumption that a majority of validators will remain honest and will not decrypt transactions early.

However, there is a key distinction to note here: these two trust assumptions are not the same. Consensus failures such as blockchain forks are publicly visible (a weak trust assumption), whereas a malicious committee that privately decrypts transactions early leaves no public evidence, so the attack can be neither detected nor punished (a strong trust assumption). So although the security assumptions for consensus and for the encryption committee look identical on the surface, in practice the assumption that the committee will not collude is much less credible.
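To see why collusion leaves no trace, here is a toy Shamir secret-sharing sketch. It is a simplification under stated assumptions: real threshold encryption schemes typically combine partial decryptions rather than reconstructing the key, and the field size, function names, and 5-of-7 parameters below are chosen only for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as the field modulus in this toy example

def split_secret(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

decryption_key = 123456789
shares = split_secret(decryption_key, threshold=5, num_shares=7)

# Any 5 colluding members can recover the key entirely offline.
recovered_by_first_five = recover_secret(shares[:5])
recovered_by_last_five = recover_secret(shares[2:])
```

The recovery step is a purely local computation over the pooled shares. Nothing is broadcast and nothing is written to the chain, which is why early decryption by a colluding threshold cannot be observed from outside.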

Time-lock and delay encryption: As an alternative to threshold encryption, delay encryption encrypts transactions to a public key whose corresponding private key is hidden inside a time-lock puzzle. A time-lock puzzle is a cryptographic puzzle that encapsulates a secret which can be revealed only after a preset amount of time has passed; more precisely, it is solved by repeatedly performing a series of computations that cannot be parallelized. In this mechanism anyone can solve the puzzle to obtain the key and decrypt the transactions, but only after completing a slow (essentially sequential) computation designed to take long enough that transactions cannot be decrypted before they are finalized. The strongest form of this primitive uses delay encryption to generate such puzzles publicly. The process can also be approximated by a trusted committee using time-lock encryption, although at that point its advantage over threshold encryption is questionable.
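The sequential computation can be sketched with the classic repeated-squaring time-lock puzzle. This is a toy illustration under explicit assumptions: the primes are small, well-known Mersenne primes chosen so the example runs instantly, the secret is masked by simple modular addition, and the function names are hypothetical. A real deployment would use large random primes that are discarded after setup.

```python
def make_puzzle(secret, p, q, t, base=2):
    """Create a puzzle that takes t sequential squarings to open.

    Knowing the factorization of n = p * q lets the creator reduce the
    exponent 2**t modulo phi(n) and compute the mask quickly.
    """
    n = p * q
    if not 0 <= secret < n:
        raise ValueError("secret must be smaller than the modulus")
    phi = (p - 1) * (q - 1)
    e = pow(2, t, phi)        # shortcut available only with the factorization
    mask = pow(base, e, n)    # equals base**(2**t) mod n
    return n, base, t, (secret + mask) % n

def solve_puzzle(n, base, t, masked):
    """Open the puzzle without the factorization: t squarings, one after another."""
    x = base % n
    for _ in range(t):        # each squaring needs the previous result
        x = (x * x) % n
    return (masked - x) % n

P, Q = 2**31 - 1, 2**61 - 1   # toy primes, far too small for real use
SECRET_KEY = 123456789
puzzle = make_puzzle(SECRET_KEY, P, Q, t=10_000)
opened = solve_puzzle(*puzzle)
```

The delay is set by the number of squarings t, not by a clock, so the real-world decryption time depends on how fast the solver's hardware is.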

Whether they use delay encryption or a trusted committee to perform the computation, such schemes face several practical challenges. First, because the delay is inherently tied to a computation, it is hard to guarantee the exact decryption time. Second, these schemes rely on some entity running high-performance hardware to solve the puzzles efficiently; anyone can take on this role, but it is still unclear how to incentivize anyone to do so. Finally, in such designs every broadcast transaction is eventually decrypted, including transactions that were never finalized in a block. Threshold-based (or witness encryption) schemes, by contrast, can potentially decrypt only the transactions that were successfully included.

Witness encryption: The final and most advanced cryptographic approach is witness encryption. In theory, witness encryption works as follows: once information is encrypted, it can be decrypted only by someone who knows a witness for a specific NP relation. For example, information can be encrypted so that only someone who can solve a particular Sudoku puzzle, or provide the preimage of a particular hash value, can decrypt it.

(Note: an NP relation is a relation between problems and solutions in which a proposed solution can be verified quickly.)

SNARKs can likewise be constructed for any NP relation. Witness encryption can be thought of as encrypting data so that it can be decrypted only by a party who could produce a SNARK proving that a certain condition holds. In the encrypted mempool setting, a typical condition is that transactions can be decrypted only after the block has been finalized.

This is a theoretical primitive with great potential. In fact, it is a general scheme of which the committee-based and delay-based approaches are just special cases. Unfortunately, no practical witness encryption scheme exists today. Moreover, even if one did, it is hard to say that it would offer an advantage over committee-based approaches on proof-of-stake chains. Even if witness encryption is set up so that a transaction can be decrypted only after it has been ordered in a finalized block, a malicious committee can still privately simulate the consensus protocol to forge finality for the transaction and then use this private chain as the witness to decrypt it. In that case, threshold decryption by the same committee achieves the same security and is much simpler to operate.

On proof-of-work chains, however, the advantage of witness encryption is more significant, because even a completely malicious committee cannot privately mine several new blocks on top of the current chain head to forge finality.

Technical Challenges Facing Encrypted Mempools

A number of practical challenges limit the ability of encrypted mempools to prevent MEV. In general, keeping information confidential is a hard problem in its own right. Encryption is not yet widely used in Web3, but decades of experience deploying it on the web (TLS/HTTPS) and in private communications (from PGP to modern encrypted messaging platforms such as Signal and WhatsApp) have made the difficulties clear: encryption is a tool for protecting confidentiality, but it does not guarantee it.

First, some parties may have direct access to the plaintext of a user's transaction. In a typical setup, users do not encrypt transactions themselves but delegate this work to a wallet provider. The wallet provider can therefore see the plaintext and could even use or sell this information to extract MEV. Encryption is only ever as secure as the set of parties that have access to the key.

Beyond that, the biggest problem is metadata, meaning the unencrypted data surrounding the encrypted payload (the transaction). Searchers can use this metadata to infer what a transaction intends to do and then attempt speculative MEV. Note that a searcher does not need to understand the transaction fully, nor to guess correctly every time. For example, it is enough to determine with reasonable probability that a transaction is a buy order on a specific decentralized exchange (DEX).

Metadata falls into several categories: some are classic problems inherent to encryption, and others are unique to encrypted mempools.

Transaction size: Encryption by itself does not hide the size of the plaintext (notably, the formal definition of semantic security explicitly excludes hiding plaintext size). This is a common attack vector in encrypted communications; a typical example is that, even with encryption, an eavesdropper can determine in real time what is being played on Netflix from the size of each packet in the video stream. In an encrypted mempool, certain types of transactions may have distinctive sizes, which leaks information.
Broadcast time: Encryption does not hide timing information either (another classic attack vector). In Web3, some senders (for example, in structured selling) may submit transactions at regular intervals. Transaction times may also correlate with other information, such as activity on external exchanges or news events. A subtler use of timing is arbitrage between centralized exchanges (CEXs) and decentralized exchanges (DEXs): a sequencer can exploit the latest CEX price information by inserting a transaction created as late as possible, while excluding all other transactions (even encrypted ones) broadcast after a certain point in time, so that only its own transaction benefits from the latest prices.
Source IP address: A searcher can infer the identity of a transaction's sender by monitoring the peer-to-peer network and tracing the source IP address. This problem has been known since the early days of Bitcoin (more than a decade ago). It can be very valuable to a searcher if a particular sender behaves consistently. For example, knowing the sender's identity makes it possible to link encrypted transactions to the sender's past, already-decrypted transactions.
Transaction sender and fee/gas information: Transaction fees are a type of metadata unique to encrypted mempools. In Ethereum, a traditional transaction contains an on-chain sender address (used to pay fees), a maximum gas budget, and the per-unit gas fee the sender is willing to pay. Like the source network address, the sender address can be used to link multiple transactions to a real-world entity, and the gas budget can hint at what the transaction does. For example, interacting with a specific DEX may require a recognizable, fixed amount of gas.

Sophisticated searchers may combine multiple of the above metadata types to predict transaction content.

In theory, all of this information can be hidden, but at a cost in performance and complexity. For example, padding transactions to a standard length hides their size but wastes bandwidth and on-chain space; adding a delay before sending hides timing but increases latency; and submitting transactions through an anonymity network such as Tor hides IP addresses but introduces challenges of its own.
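The size-hiding trade-off can be sketched as padding to fixed buckets before encryption. The bucket sizes, the 4-byte length prefix, and the function names are hypothetical choices for illustration, not part of any proposal discussed here.

```python
BUCKETS = (256, 512, 1024, 2048)  # hypothetical standard sizes, in bytes

def pad_to_bucket(payload):
    """Length-prefix the payload and zero-pad it to the smallest bucket that fits."""
    framed = len(payload).to_bytes(4, "big") + payload
    for size in BUCKETS:
        if len(framed) <= size:
            return framed + b"\x00" * (size - len(framed))
    raise ValueError("payload larger than the largest bucket")

def unpad(padded):
    """Recover the original payload using the length prefix."""
    length = int.from_bytes(padded[:4], "big")
    return padded[4:4 + length]

transfer = b"t" * 110   # stand-in for a short transfer
swap = b"s" * 230       # stand-in for a longer DEX swap

padded_transfer = pad_to_bucket(transfer)
padded_swap = pad_to_bucket(swap)
wasted_bytes = len(padded_transfer) - len(transfer)
```

After padding, both stand-in payloads are 256 bytes long, so their sizes no longer distinguish them; the price is that the shorter one now carries more padding than data.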

The hardest metadata to hide is transaction fee information. Encrypting fee data creates a series of problems for block builders. The first is spam: if fee data is encrypted, anyone can broadcast malformed encrypted transactions. These transactions will be ordered, but they cannot pay their fees; after decryption they cannot be executed, and no one can be held accountable. SNARKs may solve this by proving that a transaction is well formed and sufficiently funded, but at a substantial increase in cost.

The second is the efficiency of block building and fee auctions. Builders rely on fee information to construct profit-maximizing blocks and to determine the current market price of on-chain resources. Encrypting fee data disrupts this process. One solution is to set a fixed fee for each block, but this is economically inefficient and may give rise to a secondary market for transaction inclusion, which defeats the purpose of an encrypted mempool. Another solution is to run the fee auction through secure multi-party computation or trusted hardware, but both approaches are extremely costly.

Finally, a secure encrypted mempool adds overhead to the system in several ways: encryption increases chain latency, computational load, and bandwidth consumption; it is not yet clear how it would combine with important future goals such as sharding or parallel execution; it may introduce new points of failure for liveness (such as the decryption committee in threshold schemes, or the delay-function solver); and design and implementation complexity increase significantly.

Many of the problems of encrypted mempools resemble the challenges faced by blockchains designed for transaction privacy (such as Zcash and Monero). If there is a silver lining, it is that solving the challenges of using encryption to mitigate MEV would also help clear the way for transaction privacy.

Economic Challenges Facing Encrypted Mempools

Finally, encrypted mempools face economic challenges. Unlike the technical challenges, which can be mitigated over time with sufficient engineering effort, these economic challenges are fundamental limitations that are extremely difficult to resolve.

The core problem of MEV stems from the information asymmetry between those who create transactions (users) and those who exploit MEV opportunities (searchers and block builders). Users usually do not know how much extractable value their transactions contain, so even with a perfect encrypted mempool they may be induced to give up their decryption keys in exchange for a reward lower than the actual MEV. This phenomenon can be called incentivized decryption.

This scenario is not difficult to imagine, because similar mechanisms such as MEV Share already exist in reality. MEV Share is an order flow auction mechanism that allows users to selectively submit transaction information to a pool, and searchers compete to obtain the right to use the MEV opportunity of the transaction. After the successful bidder extracts MEV, part of the proceeds (i.e., the bid amount or a certain percentage thereof) will be returned to the user.

This model can be adapted directly to an encrypted mempool: to participate, users disclose their decryption key (or partial information). Most users, however, are unaware of the opportunity cost of participating in such mechanisms. They see only the immediate reward and are happy to disclose information. Traditional finance offers similar cases: for example, the zero-commission trading platform Robinhood earns revenue by selling user order flow to third parties through payment for order flow.

Another possible scenario is that large builders force users to disclose transaction content (or related information) for the purpose of censorship. Censorship resistance is an important and controversial topic in Web3, but if large validators or builders are legally required to enforce a sanctions list (for example, under the regulations of the U.S. Office of Foreign Assets Control, OFAC), they may refuse to process any encrypted transaction. Technically, users may be able to prove with zero-knowledge proofs that their encrypted transactions satisfy the compliance requirements, but this adds cost and complexity. Even if the blockchain is strongly censorship-resistant (guaranteeing that encrypted transactions must be included), builders may still place known plaintext transactions at the front of the block and encrypted transactions at the end. Transactions that need guaranteed execution priority may therefore end up being forced to disclose their content to builders.

Other efficiency challenges

Encrypted mempools add overhead to the system in several obvious ways. Transactions have to be encrypted by the user and decrypted by the system somehow, which increases computational cost and may also increase transaction size. As mentioned earlier, handling metadata further increases this overhead. However, there are also efficiency costs that are less obvious. In finance, a market is considered efficient when prices reflect all available information, and latency and information asymmetry make markets less efficient. Latency and information asymmetry are exactly what encrypted mempools introduce.

One direct consequence of this inefficiency is greater price uncertainty, a direct product of the additional latency that encrypted mempools introduce. As a result, more transactions may fail because they exceed their price slippage tolerance, which wastes on-chain space.

Likewise, this price uncertainty could also give rise to speculative MEV transactions that try to profit from on-chain arbitrage. Encrypted mempools could make such opportunities more common: because execution is delayed, the current state of decentralized exchanges (DEXs) becomes less certain, which is likely to lead to less efficient markets and price differences between trading venues. Such speculative MEV transactions also waste block space, because they tend to abort execution when no arbitrage opportunity is found.
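The slippage failure can be illustrated with a toy constant-product pool. The pricing rule, the absence of swap fees, the 0.5% tolerance, and all reserve figures are assumptions chosen for illustration and are not taken from the article.

```python
def amount_out(amount_in, reserve_in, reserve_out):
    """Output of a constant-product (x * y = k) swap, ignoring swap fees."""
    return reserve_out * amount_in / (reserve_in + amount_in)

def fills(amount_in, min_out, reserve_in, reserve_out):
    """A swap executes only if it yields at least the user's minimum output."""
    return amount_out(amount_in, reserve_in, reserve_out) >= min_out

# The user quotes against the pool state visible at submission time.
reserve_in, reserve_out = 1_000.0, 1_000_000.0
quote = amount_out(10.0, reserve_in, reserve_out)
min_out = quote * (1 - 0.005)   # 0.5% slippage tolerance
fills_at_submission = fills(10.0, min_out, reserve_in, reserve_out)

# While the transaction waits to be decrypted, another trader swaps 20 units
# in the same direction, moving the price against the user.
other_out = amount_out(20.0, reserve_in, reserve_out)
later_in, later_out = reserve_in + 20.0, reserve_out - other_out
fills_after_delay = fills(10.0, min_out, later_in, later_out)
```

In this sketch the swap would have executed against the state the user saw, but it fails against the state that exists once it is finally decrypted and executed, while still occupying space in the block.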

Summary

This article set out to lay out the challenges facing encrypted mempools so that attention can turn to researching and developing other solutions. Encrypted mempools may nevertheless still become part of the solution to MEV.

One possible approach is a hybrid design: some transactions are ordered blindly through an encrypted mempool, while others use a different ordering scheme. A hybrid design may suit certain types of transactions, such as buy and sell orders from large market participants who are able to encrypt and pad their transactions carefully and are willing to pay higher costs to avoid MEV. It also makes practical sense for highly sensitive transactions, such as security fixes for vulnerable contracts.

However, given the technical limitations, high engineering complexity, and performance overhead, encrypted mempools are unlikely to become the universal MEV solution that people hope for. The community needs to develop other solutions, including MEV auctions, application-layer defenses, and shorter finality times. MEV will remain a challenge for some time to come, and in-depth research is needed to find the right balance among the various solutions for dealing with its negative effects.