Vitalik: Analyzing the Rollup scaling roadmap - data sharding

Unitimes
Guest Columnist
2021-11-29 02:53

Editor: Southwind

For Ethereum, Rollups are the only trustless scaling solution in the short to medium term, and possibly in the long term as well. Transaction fees on Ethereum L1 have been high for months, so it is all the more urgent to do whatever is necessary to help move the whole ecosystem onto Rollups. Rollups have already lowered fees significantly for many Ethereum users: l2fees.info often shows Optimism and Arbitrum at roughly 3-8 times cheaper than the Ethereum base layer itself, while zk-Rollups, which have better data compression and can avoid including signatures, are roughly 40-100 times cheaper than the base layer.

However, even these Rollup fees are too expensive for many users. Data sharding has long been regarded as the solution to the long-term inadequacy of Rollups in their current form: it is expected to add roughly 1-2 MB/s of dedicated data space for Rollups on the Ethereum chain. This article describes a practical path to implementing that solution, unlocking data space for Rollups as quickly as possible and adding more space and security over time.
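As a rough sanity check on the 1-2 MB/s figure, the sketch below assumes the eventual configuration of 64 shards at roughly 256 kB of data per shard per 12-second slot (i.e. the ~16 MB per slot mentioned later in this article); the exact parameters are illustrative assumptions:

```python
# Back-of-the-envelope check of the "~1-2 MB/s" figure. The shard count and
# per-shard data size are illustrative assumptions taken from the ~16 MB/slot
# figure quoted later in this article (64 shards x ~256 kB per slot).
SLOT_SECONDS = 12
SHARDS = 64
BYTES_PER_SHARD_PER_SLOT = 256 * 1024   # ~256 kB

bytes_per_slot = SHARDS * BYTES_PER_SHARD_PER_SLOT
mb_per_second = bytes_per_slot / SLOT_SECONDS / 1_000_000
print(f"{bytes_per_slot / 1_000_000:.1f} MB per slot ≈ {mb_per_second:.2f} MB/s")
# -> 16.8 MB per slot ≈ 1.40 MB/s, consistent with the 1-2 MB/s estimate
```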

Step 1: Extend transaction calldata

Existing Rollups use transaction calldata today. Therefore, if we want to increase Rollup capacity and reduce costs in the short term without requiring any extra work from the Rollup teams, we should simply reduce the gas cost of transaction calldata. The current average block size is nowhere near large enough to threaten the stability of the Ethereum network, so this is likely safe to do, though some extra logic may be needed to prevent very unsafe edge cases.

See the EIP-4488 proposal, or the alternative (simpler but milder) EIP-4490 proposal:

  • EIP-4488:

https://github.com/ethereum/EIPs/pull/4488

  • EIP-4490:

https://github.com/ethereum/EIPs/pull/4490

EIP-4488 should increase the data space available to Rollups per slot to a theoretical maximum of about 1 MB, and cut Rollup costs by roughly 5x.
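As a rough illustration of where the ~5x figure comes from, the sketch below compares calldata gas under current pricing (16 gas per non-zero byte, 4 gas per zero byte) with the flat 3 gas per byte that EIP-4488 proposes. The 90% non-zero-byte ratio is an illustrative assumption, and the per-block calldata limit that the EIP also adds is ignored here:

```python
# Hedged sketch: calldata gas under current pricing vs the flat per-byte cost
# proposed in EIP-4488. The byte mix below is an assumption for illustration.
def calldata_gas_current(nonzero_bytes: int, zero_bytes: int) -> int:
    # Current pricing: 16 gas per non-zero byte, 4 gas per zero byte.
    return 16 * nonzero_bytes + 4 * zero_bytes

def calldata_gas_eip4488(nonzero_bytes: int, zero_bytes: int) -> int:
    # EIP-4488 proposes 3 gas per calldata byte (plus a per-block calldata
    # limit, which this sketch ignores).
    return 3 * (nonzero_bytes + zero_bytes)

data_bytes = 100_000                              # a 100 kB rollup batch
nonzero, zero = int(data_bytes * 0.9), int(data_bytes * 0.1)
now = calldata_gas_current(nonzero, zero)
proposed = calldata_gas_eip4488(nonzero, zero)
print(now, proposed, f"{now / proposed:.1f}x cheaper")   # roughly 5x
```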

Step 2: Several shards

In the meantime, we can start work on rolling out "proper" sharding. Implementing sharding in its complete form will take a long time, but we can implement it step by step and benefit from each step. First, it is natural to implement the "business logic" of the sharding specification, but to keep the number of shards that go live at first very low (e.g. 4 shards) to avoid most of the difficulties around the sharded networking. Each shard would be broadcast on its own subnetwork. By default, validators would trust the committees, but if they wish they can choose to join every subnetwork, and then only accept a beacon block once they have seen the full data of every shard block that the beacon block confirms.
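A minimal sketch of that acceptance rule, with hypothetical types and names (this is not consensus-spec code): an opted-in validator only accepts a beacon block once it has seen the full data behind every shard block the beacon block confirms.

```python
# Hypothetical sketch of the acceptance rule described above; all names are
# illustrative and not taken from the consensus spec.
from dataclasses import dataclass, field

@dataclass
class BeaconBlock:
    slot: int
    confirmed_shard_data_roots: list[bytes]   # commitments to shard data blobs

@dataclass
class OptedInValidator:
    # Roots of shard data blobs this validator has fully downloaded.
    seen_shard_data: set[bytes] = field(default_factory=set)

    def on_shard_blob(self, data_root: bytes) -> None:
        self.seen_shard_data.add(data_root)

    def accept_beacon_block(self, block: BeaconBlock) -> bool:
        # Default behaviour would be to trust the committee; an opted-in
        # validator instead requires having seen the full data itself.
        return all(root in self.seen_shard_data
                   for root in block.confirmed_shard_data_roots)
```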

The sharding spec itself is not particularly difficult; it is a set of boilerplate code changes similar in size to the recently released Altair hard fork (Altair's beacon change spec file is 728 lines, the sharding beacon change spec file is 888 lines), so it is reasonable to expect it could be implemented and deployed on a time frame similar to Altair's.

For sharded data to actually be usable by Rollups, Rollups need to be able to make proofs that reference the sharded data. There are two options:

  1. Add a BEACONBLOCKROOT opcode; Rollups would add code to verify Merkle proofs rooted in a historical beacon chain block root (see the sketch after this list);

  2. Add future-proof state and history access precompiles, so that Rollups do not need to change their code if the commitment scheme changes in the future.
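As a sketch of option 1, the following shows the kind of Merkle-branch verification a Rollup contract would perform against a historical beacon block root exposed by the proposed BEACONBLOCKROOT opcode; the indexing convention is a simplified SSZ-style proof, written here in Python for illustration:

```python
# Minimal sketch of verifying a Merkle branch against a beacon block root.
# The layout convention is simplified; real SSZ proofs use generalized indices.
from hashlib import sha256

def hash_pair(a: bytes, b: bytes) -> bytes:
    return sha256(a + b).digest()

def verify_merkle_branch(leaf: bytes, branch: list[bytes],
                         index: int, root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path; `index` encodes
    whether the node sits on the left or right at each level."""
    node = leaf
    for sibling in branch:
        if index % 2 == 0:
            node = hash_pair(node, sibling)   # node is the left child
        else:
            node = hash_pair(sibling, node)   # node is the right child
        index //= 2
    return node == root
```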

Step 3: N shards, protected by a committee

Increase the number of active shards from 4 to 64. Sharded data will now go into many subnetworks, so by this point the P2P layer must already be robust enough that splitting into a much larger number of subnetworks is feasible. The security of data availability will rest on an honest-majority assumption among validators, relying on committee security.
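To illustrate what committee security means here, the sketch below computes the probability that a randomly sampled committee of 128 validators (the beacon chain's target committee size) ends up more than two-thirds dishonest, for a couple of illustrative global attacker fractions:

```python
# Illustrative committee-security calculation; the attacker fractions are
# assumptions, and the model treats committee sampling as independent draws.
from math import comb

def p_committee_captured(n: int, attacker_fraction: float,
                         threshold: float = 2 / 3) -> float:
    """P[more than threshold*n of n sampled members are dishonest]."""
    need = int(threshold * n) + 1
    return sum(comb(n, k) * attacker_fraction**k * (1 - attacker_fraction)**(n - k)
               for k in range(need, n + 1))

for frac in (0.2, 0.33):
    print(frac, p_committee_captured(128, frac))
# With a minority attacker, capturing a single 128-member committee is
# astronomically unlikely; with a dishonest majority the assumption breaks,
# which is what data availability sampling (step 4) is meant to address.
```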

Step 4: Data Availability Sampling (DAS)

Data availability sampling (DAS) is added to ensure a higher level of security, protecting users even against a dishonest-majority attack. It can be rolled out in stages: first in a non-binding way so the network can test it, then as a requirement for accepting beacon blocks, possibly starting on some clients before others.
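The core argument behind DAS can be sketched with one line of arithmetic: assuming rate-1/2 erasure coding, data that is not recoverable means at least half of the extended chunks are missing, so each uniformly random sample fails with probability at least 1/2, and k successful samples bound the chance of being fooled by (1/2)^k. The sample counts below are illustrative:

```python
# Sketch of the DAS soundness bound under an assumed rate-1/2 erasure code:
# if the data cannot be reconstructed, each random sample independently fails
# with probability >= 1/2, so k passing samples are misleading with
# probability at most (1/2)**k.
def max_fooling_probability(num_samples: int) -> float:
    return 0.5 ** num_samples

for k in (10, 20, 30):
    print(k, max_fooling_probability(k))
# 30 samples already push the chance of accepting unavailable data below
# one in a billion.
```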

Optimistic Rollups and ZK Rollups under sharding

One major difference between Ethereum today and Ethereum after sharding is that, in a sharded world, Rollup data can no longer be part of the transaction that submits the Rollup block to the smart contract. Instead, data publication and block submission will have to be separate steps: first, data publication puts the data on chain (that is, into the shard chain); then block submission submits the block header together with a proof pointing to the underlying data.

Optimism and Arbitrum already use a two-step design for Rollup block submission, so this will be a small code change for both.
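A minimal sketch of that two-step flow, using hypothetical names rather than Optimism's or Arbitrum's actual contracts: step 1 publishes the batch data (conceptually, into a shard), and step 2 later submits only a header plus a pointer to that data.

```python
# Hypothetical two-step rollup submission flow; all names and structures here
# are illustrative, not any production rollup's actual interface.
from dataclasses import dataclass
from hashlib import sha256

@dataclass
class DataPointer:
    slot: int
    shard: int
    data_root: bytes          # commitment to the published batch data

@dataclass
class RollupBlockHeader:
    parent_root: bytes
    state_root: bytes
    data: DataPointer         # where the underlying batch data lives

def publish_data(shard_chain: dict, slot: int, shard: int, data: bytes) -> DataPointer:
    """Step 1: publish the rollup batch data (modeled as a dict standing in
    for the shard chain) and return a pointer/commitment to it."""
    root = sha256(data).digest()
    shard_chain[(slot, shard)] = data
    return DataPointer(slot, shard, root)

def submit_block(rollup_contract: list, header: RollupBlockHeader) -> None:
    """Step 2: submit only the header plus its pointer to already-published data."""
    rollup_contract.append(header)
```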

With ZK Rollups, things are a bit trickier, because the submission transaction must provide a proof that operates directly over the data. They could use a ZK-SNARK to prove that the data in the shard matches the commitment on the beacon chain, but this is very expensive. Fortunately, there are cheaper alternatives.

If the ZK-SNARK is a BLS12-381-based PLONK proof, they can simply include the shard data commitment directly as an input. The BLS12-381 shard data commitment is a KZG commitment, the same type of commitment used in PLONK, so it can be passed directly into the proof as a public input.

In a sharded world, who will store historical data?

A necessary condition for increasing data space is removing the property that the Ethereum core protocol is responsible for permanently maintaining all the data it comes to consensus on, simply because the amount of data is too large. For example (a short sketch after this list reproduces the arithmetic):

  • The theoretical maximum chain size brought by EIP-4488 is about 1,262,861 bytes per 12-second slot, which is about 3.0 TB per year, but in practice it is more likely to be about 250-1000 GB per year, especially at the beginning.

  • 4 shards (1 MB per slot) would add an additional ~2.5 TB per year.

  • 64 shards (16 MB per slot) would result in a total of about 40 TB of storage per year.
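The yearly figures above can be reproduced from the per-slot numbers, assuming 12-second slots and binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes), which is the interpretation that makes the quoted numbers line up:

```python
# Reproducing the yearly-storage figures above from the per-slot numbers.
# Assumes 12-second slots and binary MB/TB (2**20 and 2**40 bytes).
SLOTS_PER_YEAR = 365 * 24 * 3600 // 12           # 2,628,000

def tb_per_year(bytes_per_slot: int) -> float:
    return bytes_per_slot * SLOTS_PER_YEAR / 2**40

print(round(tb_per_year(1_262_861), 1))          # EIP-4488 theoretical max -> ~3.0
print(round(tb_per_year(1 * 2**20), 1))          # 4 shards, ~1 MB/slot     -> ~2.5
print(round(tb_per_year(16 * 2**20), 1))         # 64 shards, ~16 MB/slot   -> ~40.1
```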

Most users' hard drives are between 256 GB and 2 TB, with 1 TB appearing to be the median. The chart below shows the results of an informal poll of a group of blockchain researchers on how much hard drive space their computers have:

This means that users can run a node today, but if any part of this roadmap were implemented without further changes, they would no longer be able to. Larger drives are of course available, but users would have to go out of their way to buy them, which adds significant complexity to running a node. The main solution today is EIP-4444, which removes node operators' responsibility for storing blocks or receipts older than one year. With sharding, this one-year period will likely be shortened further, and nodes will only be responsible for the shards on the subnetworks they actively participate in.

This raises a question: if the Ethereum core protocol does not store this data, who will?

First, it is important to remember that even with sharding, the amount of data will not be that large. Yes, 40 TB per year is beyond the reach of an individual running "default" consumer hardware (in fact, even 1 TB per year already is). But for someone willing to invest some resources in storing this data, it is entirely within reach. A 48 TB HDD currently sells for about $1,729, and a 14 TB drive for about $420. Someone running a 32 ETH validator slot may well be willing to pay for and store the whole chain after sharding for the sake of their staking rewards. So in practice, the scenario where "nobody stores some piece of a shard's historical data, and the data is lost completely" seems implausible.
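Using only the drive prices quoted above, a rough cost check for a year of post-sharding history (~40 TB of raw disk):

```python
# Rough cost check using only the prices quoted above: what ~40 TB of yearly
# post-sharding history costs in raw disk at those price points.
drives = {"48 TB HDD": (48, 1729), "14 TB HDD": (14, 420)}
for name, (tb, usd) in drives.items():
    per_tb = usd / tb
    print(f"{name}: ${per_tb:.0f}/TB -> ~${per_tb * 40:.0f} for 40 TB/year")
# Roughly $1,200-1,450 per year of raw storage: a real but modest cost for a
# motivated individual or institution.
```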

So who will store this data? Some of my thoughts:

  • Individual and institutional volunteers;

  • Block explorers (etherchain.org, etherscan.io, amberdata.io, etc.) will definitely store all data, since it is their business model to provide data to users.

  • Rollup DAOs appoint and pay participants to store and provide historical data related to their Rollups.

  • Historical data can be uploaded and shared via torrents.

  • Clients can voluntarily choose to each store a random 0.05% of the chain's historical data (using erasure coding, so the data is only lost if many clients go offline at the same time; see the sketch after this list).

  • Clients in the Portal Network can randomly store a part of the historical data of the blockchain, and the Portal Network will automatically direct the data request to the node that stores the data.

  • The storage of historical data can be incentivized in the protocol.

  • Protocols like The Graph can create incentivized markets where clients pay servers for historical data and Merkle proofs of correctness. This incentivizes people and institutions to run servers that store historical data and make that data available on demand.
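A sketch of the "random 0.05%" bullet above: if N clients each independently store a fraction f of the chunks, any given chunk has about N*f expected copies, and the probability that nobody holds it is (1-f)^N. The client counts below are illustrative assumptions:

```python
# Illustrative replication arithmetic for voluntary random storage; the
# client counts are assumptions, the 0.05% fraction comes from the text.
f = 0.0005                     # each client stores 0.05% of history
for n_clients in (5_000, 10_000, 50_000):
    expected_copies = n_clients * f
    p_unreplicated = (1 - f) ** n_clients
    print(n_clients, round(expected_copies, 1), p_unreplicated)
# With erasure coding on top, losing a few chunks is recoverable anyway; the
# data is only lost if a large share of clients drop offline simultaneously.
```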

Some of these solutions (individual and institutional volunteers, block explorers) already exist today, and the P2P torrenting world is an excellent example of an ecosystem largely driven by volunteers that stores a huge amount of content. The remaining protocol-based schemes are more robust because they provide incentives, but they will take longer to develop. In the long run, it may even be more efficient to access historical data through these L2 protocols than through the Ethereum protocol itself.
