Blockchain and State Explosion

Guest Columnist

2019-05-24 10:00


In this article, Jan analyzes blockchain state explosion by comparing the history and state of Bitcoin and Ethereum.


The focus of Layer 1 should be state rather than computation.

State

After running in the network for a while, every full node in a blockchain network accumulates data in its local storage. We can divide this data into two categories, history and the present:

History - block data and transaction data are both history; history is the path from Genesis to the current state.
State (i.e. the present) - the final result the node reaches after processing all blocks and transactions from Genesis up to the current height. The state changes constantly as blocks are added, and transactions are what cause the changes.

The role of the consensus protocol is to ensure, through a series of message exchanges, that every node sees the same current state, and it achieves this by ensuring that every node sees the same history. As long as the history is the same (that is, all transactions are in the same order) and transactions are processed the same way (executed in the same deterministic virtual machine), every node ends up with the same current state. When we say "the blockchain is immutable", we mean that its history cannot be tampered with; the state, by contrast, is always changing.
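The deterministic-replay idea can be sketched as folding a state-transition function over the ordered transaction list. This is a minimal sketch assuming a hypothetical balance-transfer transaction format; apply_tx and replay are illustrative names, not any chain's real API. Replaying the same history always yields the same state, while the same transactions in a different order can yield a different one, which is why consensus is about agreeing on order.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical toy transaction format: (sender, receiver, amount).
Tx = Tuple[str, str, int]
State = Dict[str, int]

def apply_tx(state: State, tx: Tx) -> State:
    """Deterministic transition: the same state and the same tx always give the same result."""
    sender, receiver, amount = tx
    if state.get(sender, 0) < amount:
        return dict(state)  # toy rule: an unaffordable tx leaves the state unchanged
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

def replay(genesis: State, history: List[Tx]) -> State:
    """Fold the ordered history over the genesis state to get the current state."""
    return reduce(apply_tx, history, genesis)

genesis: State = {"alice": 10}
history: List[Tx] = [("alice", "bob", 7), ("bob", "carol", 5), ("alice", "carol", 5)]

print(replay(genesis, history))    # {'alice': 3, 'bob': 2, 'carol': 5}
reordered = [history[2], history[0], history[1]]
print(replay(genesis, reordered))  # {'alice': 5, 'carol': 5}
```

Two nodes that replay the identical list always print the first result; only a different ordering changes the outcome.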

Interestingly, different blockchains store history and state in different ways, and these differences give each blockchain its own characteristics. Since this article is about state, and the historical data that affects state is mainly transactions (rather than block headers), the discussion of history below focuses on transactions.

Example: History and State of Bitcoin

The state of Bitcoin is the current state of the Bitcoin ledger. It is made up of UTXOs (unspent transaction outputs): each UTXO represents a certain amount of bitcoin and carries a name (its scriptPubkey) recording who owns it. To use an analogy, the current state of Bitcoin is a bag of gold coins, each engraved with its owner's name.

The history of Bitcoin consists of a series of transactions, and the main structures within a transaction are its inputs and outputs. A transaction changes the state by marking some UTXOs in the current state (those referenced by its inputs) as spent, removing them from the UTXO set, and then adding new UTXOs (its own outputs) to the UTXO set.
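The UTXO state transition described above can be sketched as operations on a map from outpoints to outputs. This is a simplified model, assuming an outpoint is a (txid, output index) pair and using a plain owner string as a stand-in for scriptPubkey; script execution, signatures, and fee rules beyond "inputs must cover outputs" are deliberately left out, and apply_transaction is a hypothetical name.

```python
from typing import Dict, List, Tuple

OutPoint = Tuple[str, int]  # (transaction id, output index) identifies one TXO
Output = Tuple[int, str]    # (amount in satoshis, owner); owner stands in for scriptPubkey

def apply_transaction(utxos: Dict[OutPoint, Output], txid: str,
                      inputs: List[OutPoint], outputs: List[Output]) -> None:
    """Spend the UTXOs referenced by `inputs` and add this transaction's outputs."""
    if len(set(inputs)) != len(inputs):
        raise ValueError("the same UTXO is referenced twice")
    for op in inputs:
        if op not in utxos:
            raise ValueError(f"{op} does not exist or is already spent")
    total_in = sum(utxos[op][0] for op in inputs)
    total_out = sum(amount for amount, _ in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for op in inputs:
        del utxos[op]                     # spent: removed from the state
    for index, out in enumerate(outputs):
        utxos[(txid, index)] = out        # new UTXOs join the state

utxo_set: Dict[OutPoint, Output] = {("cb0", 0): (50, "alice")}
apply_transaction(utxo_set, "tx1", [("cb0", 0)], [(30, "bob"), (19, "alice")])
print(utxo_set)  # {('tx1', 0): (30, 'bob'), ('tx1', 1): (19, 'alice')}
```

Trying to spend ("cb0", 0) a second time raises ValueError, because that output is no longer part of the state.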

Note that the outputs of Bitcoin transactions (TXOs, Transaction Outputs) are exactly the UTXOs mentioned above: a UTXO is simply a TXO at a particular stage (not yet spent). Because the components of Bitcoin's state (UTXOs) are also the components of its transactions (TXOs), Bitcoin has an elegant property: the state at any moment is a subset of the history, and history and state contain data types of the same dimension. The history of transactions (the set of all packaged transactions, that is, the set of all TXOs ever created) is also the history of the state (the UTXO set corresponding to each block, which is also the set of all TXOs ever created), so Bitcoin's history contains only transactions.
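The subset property can be checked directly on a toy history. This sketch assumes a transaction is reduced to its id, the outpoints it spends, and the number of outputs it creates (amounts and scripts omitted); all_txos and utxo_set are hypothetical helper names. The UTXO set obtained by replaying the history is always contained in the set of all TXOs that the history created.

```python
from typing import List, Set, Tuple

OutPoint = Tuple[str, int]
# Toy history: each entry is (txid, outpoints spent, number of outputs created).
History = List[Tuple[str, List[OutPoint], int]]

def all_txos(history: History) -> Set[OutPoint]:
    """Every output ever created; this is determined by the history alone."""
    return {(txid, i) for txid, _, n_out in history for i in range(n_out)}

def utxo_set(history: History) -> Set[OutPoint]:
    """Replay the history: start empty, remove spent outputs, add new ones."""
    state: Set[OutPoint] = set()
    for txid, inputs, n_out in history:
        state.difference_update(inputs)
        state.update((txid, i) for i in range(n_out))
    return state

history: History = [
    ("cb1", [], 1),            # coinbase: no inputs
    ("tx1", [("cb1", 0)], 2),
    ("tx2", [("tx1", 0)], 1),
]
state = utxo_set(history)
print(sorted(state))                # [('tx1', 1), ('tx2', 0)]
print(state <= all_txos(history))   # True: the state is a subset of the history
```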

In the Bitcoin network, every block and every UTXO keeps occupying the node's storage space. At present, the size of the state is only about 3G (consisting of about 50 million UTXOs).
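As a rough sanity check on these figures, dividing the quoted state size by the quoted UTXO count gives the average footprint of one UTXO. This is back-of-the-envelope arithmetic only; it assumes "3G" means 3 x 10^9 bytes and says nothing about how a particular node encodes or indexes the UTXO set.

```python
# Figures quoted in the text; "3G" is read here as 3 x 10^9 bytes (an assumption).
state_bytes = 3 * 10**9
utxo_count = 50 * 10**6

avg_bytes_per_utxo = state_bytes / utxo_count
print(f"about {avg_bytes_per_utxo:.0f} bytes per UTXO")  # about 60 bytes per UTXO
```

At tens of bytes per entry, each individual piece of Bitcoin state is small.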

Another example: Ethereum history and state

The state of Ethereum, also called the "world state", is the current state of the Ethereum ledger. It is a Merkle tree made up of accounts (the accounts are its leaves). An account records not only a balance (a certain amount of ether) but also contract data (for example, the data of each CryptoKitty). The Ethereum state can be thought of as a large ledger: the first column is the name, the second the balance, and the third the contract data.
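A minimal sketch of "state as a Merkle tree of accounts" follows. It is a deliberate simplification: Ethereum actually uses a Merkle Patricia Trie with RLP encoding and Keccak-256, whereas this example swaps in a plain binary Merkle tree, JSON serialization, and SHA-256 from Python's standard library; account_leaf, merkle_root, and state_root are hypothetical names. What it preserves is the key property that changing any account changes the root.

```python
import hashlib
import json
from typing import Dict, List

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def account_leaf(address: str, account: Dict) -> bytes:
    """Serialize one account (one 'row' of the ledger) deterministically and hash it."""
    encoded = json.dumps({"address": address, **account}, sort_keys=True).encode()
    return h(encoded)

def merkle_root(leaves: List[bytes]) -> bytes:
    """Plain binary Merkle tree; an unpaired node at any level is paired with itself."""
    if not leaves:
        return h(b"")
    level = leaves
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def state_root(world_state: Dict[str, Dict]) -> bytes:
    # Sort by address so every node builds the same tree from the same state.
    return merkle_root([account_leaf(a, world_state[a]) for a in sorted(world_state)])

world_state = {
    "0xalice": {"balance": 5, "storage": {}},
    "0xkitties": {"balance": 0, "storage": {"kitty:1": "owner=0xalice"}},
}
root_before = state_root(world_state)
world_state["0xalice"]["balance"] = 4
print(state_root(world_state) != root_before)  # True: any account change changes the root
```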

The history of Ethereum is also made up of transactions, and the main fields of a transaction are:

to - the account the transaction is sent to
value - the amount of ether carried by the transaction
data - arbitrary information carried by the transaction

To apply a transaction, the EVM finds the account the transaction is sent to and then:

1. Calculate the new balance of the target account according to the value of the transaction;
2. Pass the data carried by the transaction as a parameter to the smart contract of the target account and run the contract's logic, which may modify the internal state of any account during execution, producing a new state;
3. Construct a new leaf to store the new state, and update the state Merkle tree.
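The three steps can be sketched as follows, with a Python function standing in for contract bytecode. Everything here is hypothetical and heavily simplified (no gas, no nonces, no signatures, no reverts), and step 3 is only indicated by a comment; Account, Transaction, apply_transaction, and counter_contract are illustrative names, not EVM interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class Account:
    balance: int = 0
    storage: Dict[str, int] = field(default_factory=dict)
    # Hypothetical stand-in for contract code, called as code(world_state, sender, data).
    code: Optional[Callable[..., None]] = None

WorldState = Dict[str, Account]

@dataclass
class Transaction:
    sender: str
    to: str
    value: int
    data: str = ""

def apply_transaction(state: WorldState, tx: Transaction) -> None:
    sender = state[tx.sender]
    target = state.setdefault(tx.to, Account())
    if sender.balance < tx.value:
        raise ValueError("insufficient balance")
    # Step 1: move `value` from the sender to the target account.
    sender.balance -= tx.value
    target.balance += tx.value
    # Step 2: if the target is a contract, run it with `data` as its argument;
    # the contract may modify the state of any account.
    if target.code is not None:
        target.code(state, tx.sender, tx.data)
    # Step 3 (not modeled): rebuild the changed leaves and the state Merkle root.

def counter_contract(state: WorldState, sender: str, data: str) -> None:
    """Hypothetical contract: counts how many times each sender has called it."""
    store = state["0xcounter"].storage
    store[sender] = store.get(sender, 0) + 1

world: WorldState = {
    "0xalice": Account(balance=10),
    "0xcounter": Account(code=counter_contract),
}
apply_transaction(world, Transaction("0xalice", "0xcounter", 3, "inc"))
print(world["0xalice"].balance, world["0xcounter"].balance, world["0xcounter"].storage)
# 7 3 {'0xalice': 1}
```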

Ethereum's history and transaction structure are thus very different from Bitcoin's. The state of Ethereum is made up of accounts, while transactions are made up of information that triggers changes to accounts. State and transactions record completely different types of data, and neither is a subset of the other: history and state contain data types of two different dimensions, and there is no necessary relationship between the size of the transaction history and the size of the state. When a transaction modifies the state, it not only produces a new state (the leaves in the solid-line box in the figure) but also leaves the old state (the leaves in the dotted-line box in the figure) behind as historical state, so Ethereum's history contains historical states as well as transactions. Because history and state belong to different dimensions, an Ethereum block header must contain not only the Merkle root of the transactions but also, explicitly, the Merkle root of the state. (Thinking question: EOS uses an account model similar to Ethereum's but does not include the state Merkle root in the block header. Is this good or bad?)
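The independence of history size and state size in an account model can be illustrated with a toy simulation in which each transaction overwrites one account's value. This is a hypothetical model (run is an illustrative name), not Ethereum's actual data layout: the same number of transactions can leave a tiny or a large current state depending on how many accounts they touch, while every overwrite leaves an old value behind as historical state.

```python
from typing import Dict, List, Tuple

def run(num_txs: int, num_accounts: int) -> Dict[str, int]:
    """Toy account model: each tx overwrites one account and keeps the old value."""
    state: Dict[str, int] = {}
    history_txs: List[Tuple[str, int]] = []
    historical_states: List[Tuple[str, int]] = []  # old leaves an archive node keeps
    for i in range(num_txs):
        account = f"acct{i % num_accounts}"
        if account in state:
            historical_states.append((account, state[account]))
        state[account] = i                 # the new leaf replaces the old one
        history_txs.append((account, i))
    return {"transactions": len(history_txs),
            "current_state": len(state),
            "historical_states": len(historical_states)}

print(run(num_txs=1000, num_accounts=10))
# {'transactions': 1000, 'current_state': 10, 'historical_states': 990}
print(run(num_txs=1000, num_accounts=1000))
# {'transactions': 1000, 'current_state': 1000, 'historical_states': 0}
```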

Every block and every account in Ethereum keeps occupying the node's storage space. Ethereum nodes can synchronize in several modes. In Archive mode, all history and state are preserved, where history includes both historical transactions and historical states; all of this data together exceeds 2TB. In Default mode, historical states are pruned and only historical transactions and the current state are kept locally; all of this data adds up to about 170G, of which the transaction history is 150G and the current state is 10G. All resource costs in Ethereum are managed uniformly under the gas billing model: the size of a transaction consumes a corresponding amount of gas, and the gas charged for each EVM instruction accounts for storage overhead as well as computing overhead. Through each block's gas limit, the growth rate of history and state is limited indirectly.
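To see how a block gas limit indirectly caps state growth, here is a back-of-the-envelope upper bound. All constants are assumptions labeled in the code: a block gas limit of about 8 million (roughly the Ethereum mainnet value around the time of writing), 20,000 gas for an SSTORE that writes a new non-zero slot, 15-second blocks, and 64 raw bytes per slot. It ignores the base cost of each transaction and the overhead of the state trie.

```python
# Assumptions for illustration only, not constants to rely on:
BLOCK_GAS_LIMIT = 8_000_000     # roughly the mainnet block gas limit in mid-2019
SSTORE_NEW_SLOT_GAS = 20_000    # SSTORE writing a non-zero value into an empty slot
BLOCK_TIME_S = 15               # approximate average block interval
BYTES_PER_SLOT = 32 + 32        # 32-byte key + 32-byte value, no trie overhead

def max_state_growth_per_day() -> int:
    """Upper bound on new raw slot bytes per day if all gas went to new-slot SSTOREs."""
    slots_per_block = BLOCK_GAS_LIMIT // SSTORE_NEW_SLOT_GAS   # 400
    blocks_per_day = 24 * 60 * 60 // BLOCK_TIME_S               # 5760
    return slots_per_block * BYTES_PER_SLOT * blocks_per_day

print(max_state_growth_per_day())  # 147456000 (about 147 MB per day)
```

Raising the gas limit or lowering the price of storage raises this bound proportionally; the limit bounds the rate of growth, not the total size.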

ps. A common misunderstanding is that Ethereum's "blockchain size" has exceeded 1T. The analysis above shows that "blockchain size" is a very vague term. If historical states are included, it does exceed 1T, but any node can delete historical states without problems, because as long as the Genesis and the transaction history are available, the historical state at any point in time can be recomputed (setting aside how long the computation takes). The really meaningful figure is the size of the data a full node needs: 200G for Bitcoin and 170G for Ethereum. The two are roughly the same, and both fit on an averagely configured cloud host. The decline in the number of nodes is not caused by growing storage requirements (the root cause is the computational overhead during synchronization, which I will not go into here). Considering that Ethereum's history length (the timestamp of the current block minus the timestamp of Genesis) is less than half that of Bitcoin, we can see that Ethereum's history and state grow faster.
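The growth-rate comparison can be made concrete with the figures quoted above. This sketch assumes the article's publication date as "now" and uses the Bitcoin genesis date (2009-01-03) and the Ethereum mainnet launch date (2015-07-30) as start points; the resulting rates are lifetime averages and hide any acceleration.

```python
from datetime import date

ARTICLE_DATE = date(2019, 5, 24)   # publication date, used as "now" (an assumption)
BTC_GENESIS = date(2009, 1, 3)     # Bitcoin genesis block
ETH_GENESIS = date(2015, 7, 30)    # Ethereum mainnet launch

btc_days = (ARTICLE_DATE - BTC_GENESIS).days
eth_days = (ARTICLE_DATE - ETH_GENESIS).days

# Data a full node needs, as quoted in the text (GB).
BTC_FULL_NODE_GB = 200
ETH_FULL_NODE_GB = 170

btc_rate = BTC_FULL_NODE_GB / (btc_days / 365.25)  # average GB per year
eth_rate = ETH_FULL_NODE_GB / (eth_days / 365.25)

print(btc_days, eth_days, round(eth_days / btc_days, 2))  # 3793 1394 0.37
print(f"Bitcoin ~{btc_rate:.0f} GB/year, Ethereum ~{eth_rate:.0f} GB/year")
# Bitcoin ~19 GB/year, Ethereum ~45 GB/year
```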

The Tragedy of (Storage) Commons: The Blockchain Version of the Tragedy of the Commons
The tragedy of the commons refers to a situation in which a finite shared resource is overconsumed because nothing restricts its use. The storage that blockchain nodes provide to preserve history and state is exactly such a shared resource.

Blockchain nodes consume three types of resources to process transactions: CPU, storage, and network bandwidth. CPU and bandwidth are refreshed with each block: we can assume the same amount of CPU and bandwidth is available in every block interval, and the CPU and bandwidth consumed in one block do not reduce what is available to the next. For refreshable resources, a one-time transaction fee is enough to compensate nodes.

Unlike CPU and bandwidth, storage is an occupied resource: storage taken up in one block cannot be used by other users in subsequent blocks unless its occupant actively releases it. Nodes must pay for storage continuously, but users do not (remember that a transaction fee is paid only once). By paying a small fee when writing data to the blockchain, a user gains permanent use of storage whose availability exceeds that of Amazon S3, while the cost of storing that data forever is borne by every full node in the network.
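The mismatch between a one-time fee and an ongoing cost can be shown with a small simulation. All parameters are hypothetical integer cost units chosen only to show the shape of the curves, not real prices: fees collected grow linearly with the number of blocks, while the cost of keeping everything written so far grows quadratically, so the fees eventually fall short. The cost shown is for a single node; every full node pays it.

```python
# Hypothetical parameters in abstract integer cost units; not real prices.
BYTES_PER_BLOCK = 1_000          # new data users write in each block
FEE_PER_BYTE = 1_000             # one-time fee paid when a byte is written
HOLD_COST_PER_BYTE_BLOCK = 1     # cost to one node of keeping one byte for one block

def simulate(num_blocks: int) -> tuple:
    stored = 0      # storage is occupied: it accumulates from block to block
    fees = 0        # users pay once, at write time
    node_cost = 0   # a node pays every block for everything still stored
    for _ in range(num_blocks):
        # CPU and bandwidth would simply reset here each block; storage does not.
        stored += BYTES_PER_BLOCK
        fees += BYTES_PER_BLOCK * FEE_PER_BYTE
        node_cost += stored * HOLD_COST_PER_BYTE_BLOCK
    return fees, node_cost

for n in (1_000, 2_000, 4_000):
    fees, cost = simulate(n)
    print(n, fees, cost, "fees cover cost" if fees >= cost else "cost exceeds fees")
```

With these numbers the fees cover the cost at 1,000 blocks but not at 2,000 or 4,000.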

Because of the many DApps on Ethereum, the Tragedy of (Storage) Commons is relatively more serious there. For example, the five contracts occupying the largest shares of Ethereum's state are:

1. EtherDelta, 5.09%
2. IDEX, 4.17%
3. CryptoKitties, 3.05%
4. ENS, 1.92%
5. EOS Sale, 1.73%

The last one, EOS Sale, is the most interesting. Although the EOS crowdsale is over and EOS tokens now circulate on the EOS chain, the records of the crowdsale remain on Ethereum nodes forever, consuming storage resources on every Ethereum node.

Clearly, without management, a blockchain's storage resources will be abused, intentionally or not. In a well-designed economic model, users must bear the cost of the storage they occupy.

State Explosion

Both historical and state data take up storage resources. From the analysis of Bitcoin and Ethereum above (the state models of other blockchains can basically be classified as one of the two), we can see that although both limit how fast history and state grow, neither controls their total size. This data will keep accumulating without end, making the storage needed to run a full node ever larger. A higher barrier to running a full node will make the network less and less decentralized, which is something we don't want to see.

You might ask whether improvements in average hardware could outpace the accumulation of history and state. My answer is that this is very unlikely:

From this graph we can see that as the Ethereum network develops, the amount of accumulated state data grows exponentially. It took 10 years for Bitcoin's state data to grow from 0 to 3G, and 4 years for Ethereum's state data to grow from 0 to 10G. And this is the growth rate while the scalability problem is still unsolved and blockchain is still a niche technology. Once we solve the scalability problem, and blockchain achieves real mass adoption with an explosion in the number of DApps and users, how fast will blockchain history and state data accumulate?
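Averaging the quoted state sizes over each chain's age gives a rough sense of the gap. These are lifetime averages computed only from the approximate figures in the text; if growth is exponential, as the graph suggests, they understate the most recent rate.

```python
# Approximate figures from the text: state size (GB) and years taken to reach it.
BTC_STATE_GB, BTC_YEARS = 3, 10
ETH_STATE_GB, ETH_YEARS = 10, 4

btc_rate = BTC_STATE_GB / BTC_YEARS   # average GB of state added per year
eth_rate = ETH_STATE_GB / ETH_YEARS

print(f"Bitcoin {btc_rate:.1f} GB/yr, Ethereum {eth_rate:.1f} GB/yr, "
      f"ratio ~{eth_rate / btc_rate:.1f}x")
# Bitcoin 0.3 GB/yr, Ethereum 2.5 GB/yr, ratio ~8.3x
```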

This is the state explosion problem. We classify it as a post-scalability problem, because it becomes very obvious once the scalability problem is solved. We first noticed it when implementing permissioned chain scenarios: permissioned chains perform far better than public chains, so they are already in the post-scalability stage. (Thinking question: How do permissioned chains solve the state explosion problem?)

Accumulated historical data is relatively easy to deal with. In the future it can be compressed with techniques such as decentralized checkpoints or zero-knowledge proofs; until then, a full node can even discard history outright and still operate normally. Accumulated state data is much more troublesome, because a full node needs it to operate.

Many blockchain projects have recognized this problem and proposed solutions. EOS RAM is a useful attempt to tackle state explosion: RAM represents the memory resources available on the super nodes' servers, and accounts, contract state, and code all need to occupy a certain amount of RAM to run. The RAM design also has many problems. RAM must be purchased through a built-in trading market; it is non-transferable and cannot be rented; it mixes the short-term memory needs of contract execution with the long-term storage needs of contract state; and the total amount of RAM is not set by a well-defined rule, but depends more on the hardware configuration the super nodes can support than on the cost of consensus space.
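The basic idea of charging state against a RAM quota can be sketched as simple accounting. This is a hypothetical toy model, not EOS's actual implementation: RamLedger and its methods are illustrative names, and the sketch ignores how RAM is priced on the trading market and how the total supply is set.

```python
from typing import Dict

class RamLedger:
    """Toy quota model: an account can occupy at most as many bytes as the RAM it holds."""

    def __init__(self) -> None:
        self.quota: Dict[str, int] = {}  # bytes of RAM held by each account
        self.used: Dict[str, int] = {}   # bytes of state each account currently occupies

    def buy(self, account: str, nbytes: int) -> None:
        self.quota[account] = self.quota.get(account, 0) + nbytes

    def store(self, account: str, nbytes: int) -> None:
        used = self.used.get(account, 0)
        if used + nbytes > self.quota.get(account, 0):
            raise ValueError(f"{account} does not hold enough RAM")
        self.used[account] = used + nbytes

    def release(self, account: str, nbytes: int) -> None:
        # Deleting state frees RAM for the same account to reuse later.
        self.used[account] = max(0, self.used.get(account, 0) - nbytes)

ledger = RamLedger()
ledger.buy("dapp", 1_024)
ledger.store("dapp", 1_000)       # fits within the 1,024-byte quota
try:
    ledger.store("dapp", 100)     # 1,100 bytes would exceed the quota
except ValueError as err:
    print(err)                    # dapp does not hold enough RAM
ledger.release("dapp", 500)
ledger.store("dapp", 100)         # fits again after releasing state
```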

The Ethereum community has also recognized this issue and proposed Storage Rent as a solution: users pay rent in advance for the storage resources they use, and occupying storage continuously consumes this rent, so the longer the occupation, the more rent the user pays. The Storage Rent scheme has two problems:

1. Prepaid rent will eventually run out; what happens to the state it occupies at that point? Solving exactly this problem requires Storage Rent to add supplementary mechanisms such as recovery, which increases design complexity, greatly weakens the immutability of smart contracts, and also hurts the user experience;

2. Ethereum's state model is one of shared state, not First-class State. Take ERC20 tokens as an example: all users' asset records are stored in the storage of a single ERC20 contract. In this case, who should pay the rent?
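Both problems can be seen in a toy rent model. It is a hypothetical sketch, not any concrete Ethereum proposal: the rent rate, the 64-byte size per storage entry, and the names Contract and charge_rent are assumptions. Rent is charged to the contract that owns the storage, so when a single ERC20-style contract runs out of prepaid rent, every holder's balance is affected at once, and the sketch can only mark the contract as evicted without saying how its state could be recovered.

```python
from dataclasses import dataclass, field
from typing import Dict

RENT_PER_BYTE_PER_BLOCK = 1  # hypothetical rent rate, abstract units
BYTES_PER_ENTRY = 64         # assume a 32-byte key plus a 32-byte value per entry

@dataclass
class Contract:
    rent_balance: int                                   # prepaid rent
    storage: Dict[str, int] = field(default_factory=dict)
    evicted: bool = False

    def size_bytes(self) -> int:
        return BYTES_PER_ENTRY * len(self.storage)

def charge_rent(contract: Contract, blocks: int) -> None:
    """Deduct rent for `blocks` blocks; mark the contract evicted if the rent runs out."""
    due = contract.size_bytes() * RENT_PER_BYTE_PER_BLOCK * blocks
    if due > contract.rent_balance:
        contract.rent_balance = 0
        contract.evicted = True   # problem 1: what happens to this state now?
    else:
        contract.rent_balance -= due

# Problem 2: one shared ERC20-style contract stores every holder's balance.
token = Contract(rent_balance=10_000)
for i in range(100):
    token.storage[f"holder{i}"] = 1       # 100 holders -> 6,400 bytes
charge_rent(token, blocks=1)
print(token.rent_balance, token.evicted)  # 3600 False
charge_rent(token, blocks=1)
print(token.rent_balance, token.evicted)  # 0 True, for all 100 holders at once
```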


Original link: https://talk.nervos.org/t/top...

