In-depth interpretation of storage proofs: enabling blockchain state awareness across time and chains
Original author: LongHash Ventures
Compiled by: Deep Chao TechFlow
What if you lost your memory every hour and had to constantly ask people to tell you what you had done? This is the situation smart contracts are in today. On blockchains like Ethereum, a smart contract cannot directly access state older than 256 blocks. The problem is exacerbated in multi-chain ecosystems, where retrieving and validating data across different execution layers is even more difficult.
In 2020, Vitalik Buterin and Tomasz Stanczak proposed a method for accessing data across time. Although that EIP has stalled, demand for it has re-emerged in today's rollup-centric multi-chain world. Storage proofs have now come to the forefront, giving smart contracts awareness and memory.
How to access on-chain data
Dapps can access data and state in a variety of ways. All of these approaches require the application to place some level of trust in humans/entities, cryptoeconomic security, or code, and each comes with trade-offs:
Trust humans/entities:
Archive nodes: Operators can run an archive node themselves, or rely on archive-node service providers such as Alchemy and Infura, to access all data starting from the genesis block. Archive nodes provide the same data as full nodes but also retain the blockchain's entire historical state. Off-chain services such as Etherscan and Dune Analytics use archive nodes to access on-chain data. Off-chain participants can attest to the validity of this data, and on-chain smart contracts can verify that it was signed by a trusted participant/committee, but the integrity of the underlying data itself cannot be verified. This approach requires the Dapp to trust that the archive-node service provider runs its infrastructure correctly and without malicious intent.
Trust Cryptoeconomic Security:
Indexers: Indexing protocols organize all data on a blockchain, allowing developers to build and publish open APIs that applications can query. Individual indexers are node operators who stake tokens to provide indexing and query-processing services. However, when incorrect data is served, disputes may arise, and the arbitration process takes time. In addition, data from indexers such as The Graph cannot be used directly in smart contract business logic; it is mainly used for web2-style data analysis.
Oracles: Oracle service providers aggregate data from many independent node operators. The challenge here is that oracle data may not be updated frequently and has limited scope. Oracles like Chainlink usually maintain only specific state, such as price feeds, and are not practical for application-specific state or historical data. This approach also introduces a degree of bias into the data and requires trust in the node operators.
Trust code:
Special variables and functions: Blockchains like Ethereum provide special variables and functions that expose information about the chain or serve as general utilities. Smart contracts can access only the block hashes of the most recent 256 blocks; for scalability reasons, older block hashes are not available. Access to historical block hashes would be very useful, as it would allow proofs to be verified against them. However, the EVM has no opcodes for accessing the contents of older blocks, previous transactions, or receipt outputs, so nodes can safely forget them and still process new blocks. This approach is also limited to a single blockchain.
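To make the 256-block limitation concrete, here is a toy Python simulation (not real EVM or client code; the class and its methods are invented for illustration) of a node that retains only the most recent 256 block hashes, mirroring the behavior smart contracts see:

```python
from collections import deque
import hashlib

WINDOW = 256  # the EVM exposes only the most recent 256 block hashes

class BlockHashWindow:
    """Toy node that forgets block hashes older than the window."""
    def __init__(self):
        self.recent = deque(maxlen=WINDOW)  # oldest entries fall off automatically
        self.height = 0

    def add_block(self, payload: bytes):
        h = hashlib.sha256(payload + self.height.to_bytes(8, "big")).hexdigest()
        self.recent.append((self.height, h))
        self.height += 1

    def blockhash(self, number: int):
        # Like the EVM: a hash is returned only if it is within the window.
        for n, h in self.recent:
            if n == number:
                return h
        return None  # the EVM returns zero for out-of-window blocks

w = BlockHashWindow()
for _ in range(1000):
    w.add_block(b"block")

assert w.blockhash(999) is not None  # recent block: accessible
assert w.blockhash(100) is None      # older than 256 blocks: forgotten
```

A contract that needs the hash of block 100 at height 1000 is simply out of luck without an external proof system, which is the gap storage proofs fill.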
Given the challenges and limitations of these solutions, there is a clear need to store and provide block hashes on-chain. This is where storage proofs come in. To better understand storage proofs, let's take a quick look at how blockchains store data.
Data storage in blockchain
Blockchain is a public database that is updated and shared among many computers in a network. Data and state are stored in contiguous groups of blocks, with each block cryptographically referencing its parent block by storing the hash of the previous block header.
Take Ethereum blocks as an example. Ethereum uses a special Merkle tree called a Merkle Patricia Trie (MPT). Ethereum's data is encoded in four kinds of MPTs: the state trie, storage tries, the receipts trie, and the transactions trie. The block header commits to the state, transactions, and receipts roots, while each account's storage root is stored in the state trie. Merkle trees are used because of their storage efficiency: with recursive hashing, only the root hash ultimately needs to be stored, saving a great deal of space. They allow anyone to prove that an element exists in the tree by showing that recursively hashing the nodes along its path leads to the same root hash. Merkle proofs allow Ethereum light clients to answer questions such as:
Does this transaction exist in a specific block?
What is the current balance on my account?
Does this account exist?
Instead of downloading every transaction and every block, light clients download only the chain of block headers and use Merkle proofs to verify information. This makes the whole process very efficient.
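The inclusion-proof idea can be sketched in a few lines of Python. This is an illustrative toy over a simple binary Merkle tree using SHA-256, not Ethereum's actual MPT or Keccak; all function names here are invented:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _level_up(level):
    # Hash adjacent pairs; duplicate the last node on odd-length levels.
    if len(level) % 2:
        level = level + [level[-1]]
    return [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [H(l) for l in leaves]
    while len(level) > 1:
        level = _level_up(level)
    return level[0]

def merkle_proof(leaves, index):
    # Collect the sibling hash at each level for the leaf at `index`.
    level, proof = [H(l) for l in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = index + 1 if index % 2 == 0 else index - 1
        proof.append((level[sib], index % 2 == 0))
        level, index = _level_up(level), index // 2
    return proof

def verify(leaf, proof, root):
    # Recompute the path by hashing upward with each sibling.
    acc = H(leaf)
    for sib, leaf_is_left in proof:
        acc = H(acc + sib) if leaf_is_left else H(sib + acc)
    return acc == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)
assert verify(b"tx2", merkle_proof(txs, 2), root)   # inclusion holds
assert not verify(b"tx9", merkle_proof(txs, 2), root)  # other data fails
```

Note that the verifier only needs the root and a logarithmic number of sibling hashes, which is exactly why a light client holding only block headers can check inclusion.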
Proof of storage
A storage proof uses cryptography to show that something was recorded in the database and is valid. Such a proof is a verifiable statement that something happened on the blockchain.
What can proof of storage achieve?
Proof of storage allows two main functions:
Access historical on-chain data beyond the last 256 blocks, all the way back to the genesis block
Access on-chain data (historical and current) from one blockchain to another, with consensus verification or L2 bridging (for L2)
How does proof of storage work?
Simply put, proof of storage checks whether a specific block is part of the canonical history of the blockchain and then verifies whether the specific data requested is part of the block. This can be achieved by:
On-chain processing: A Dapp can take an initial trusted block, pass block data in as calldata to access the previous block, and walk all the way back to the genesis block. This requires enormous amounts of on-chain computation and calldata, making the approach completely impractical. Aragon tried an on-chain approach in 2018, but it was not feasible due to high on-chain costs.
Use zero-knowledge proofs: The approach is similar to on-chain processing, except that complex calculations are moved off-chain using zero-knowledge proofs.
Accessing data from the same chain: Zero-knowledge proofs can be used to assert that any historical block header is an ancestor of one of the most recent 256 block headers accessible in the execution environment. Another approach is to index the entire history of the source chain and generate a zero-knowledge proof to prove that the indexing was done correctly. This proof is updated regularly as new blocks are added to the source chain.
Access cross-chain data: The provider collects block headers from the source chain on the target chain and proves the validity of these block headers using zero-knowledge consensus proofs. Block headers can also be queried using existing cross-chain messaging solutions such as Axelar, Celer or LayerZero.
Maintain a cache of block header hashes of the source chain on the target chain, or the root hash of an off-chain block hash accumulator. This cache is updated regularly and is used to efficiently prove on-chain that a given block exists and has a cryptographic link to the most recent block hash accessible from the state. This process is called proving the continuity of the chain. It is also possible to use a dedicated blockchain to store block headers for all source chains.
Based on the Dapp's request on the target chain, historical data/blocks are accessed from off-chain indexed data or from the on-chain cache (depending on the complexity of the request). Cached block-header hashes are maintained on-chain, while the actual data may be stored off-chain.
A Merkle inclusion proof checks whether the data exists in the specified block, and a zero-knowledge proof of this check is generated. That proof is combined with a correct-indexing zero-knowledge proof or a zero-knowledge consensus proof and provided on-chain for trustless verification.
The Dapp can then verify the proof on-chain and perform the required actions with the data. In addition to validating zero-knowledge proofs, public parameters such as block numbers and block hashes are also checked against a cache of block headers maintained on-chain.
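Stripped of the zero-knowledge machinery, the final on-chain verification step described above can be sketched as follows. This is a hypothetical illustration: the class, cache layout, and the `zk_proof_ok` stub (standing in for actual proof verification) are all invented:

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class StorageProofVerifier:
    """Toy verifier: a trusted block-header cache plus a stubbed ZK check."""
    def __init__(self):
        self.header_cache = {}  # block number -> trusted block hash

    def update_cache(self, number: int, block_hash: bytes):
        # In practice updated via consensus proofs or L1->L2 messaging.
        self.header_cache[number] = block_hash

    def verify(self, number: int, claimed_hash: bytes, zk_proof_ok: bool) -> bool:
        # 1) the proof's public block hash must match the trusted cache;
        # 2) the ZK (inclusion + continuity) proof must itself verify.
        return self.header_cache.get(number) == claimed_hash and zk_proof_ok

v = StorageProofVerifier()
v.update_cache(18_000_000, H(b"header-18M"))
assert v.verify(18_000_000, H(b"header-18M"), zk_proof_ok=True)
assert not v.verify(18_000_000, H(b"forged"), zk_proof_ok=True)
```

The key point is the double check: even a valid-looking proof is rejected if its public inputs (block number and hash) do not match the on-chain header cache.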
Projects taking this approach include Herodotus, Lagrange, Axiom, HyperOracle, Brevis Network, and the Nil Foundation. While significant efforts are being made to make applications state-aware across multiple blockchains, IBC (Inter-Blockchain Communication) stands out as an interoperability standard that enables such applications through technologies like ICQ (Interchain Queries) and ICA (Interchain Accounts). ICQ lets an application on chain A query the state of chain B by including queries in simple IBC packets, while ICA allows one blockchain to securely control accounts on another. Combined, they support interesting cross-chain use cases. RaaS providers like Saga will use IBC by default to provide these capabilities for all of their app-chains.
Storage proofs can be optimized in a variety of ways to balance memory consumption, proof time, verification time, computational efficiency, and developer experience. The overall process can be roughly divided into three main sub-processes:
Data access;
Data processing;
Zero-knowledge proof generation for data access and processing.
Data Access: In this sub-process, the service provider accesses the source chain's block headers either natively at the execution layer or by maintaining an on-chain cache. For cross-chain data access, source-chain consensus must be verified on the target chain. Methods and optimizations include:
Existing Ethereum blockchain: The existing structure of the Ethereum blockchain can be used to prove the value of any historical storage slot relative to the current block header using zero-knowledge proofs. This can be thought of as one large inclusion proof: given a recent block header X at height b, there is a block header Y at height b − k that is an ancestor of X. This relies on the security of Ethereum consensus and requires an efficient proof system. This is the approach taken by Lagrange.
On-chain Merkle Mountain Range (MMR) cache: A Merkle Mountain Range can be viewed as a list of Merkle trees that are combined whenever two trees reach the same size; each tree in an MMR grows by adding a parent node on top of a previous root. MMRs are similar to Merkle trees but offer additional advantages, such as efficient appends and efficient queries, especially for reading sequential data from large data sets (appending to a plain Merkle tree requires supplying all sibling nodes at each level). To append data efficiently, Axiom uses an MMR to maintain an on-chain cache of block-header hashes, while Herodotus stores the root hash of an MMR block-hash accumulator on-chain. This lets them check fetched data against these block-header hashes via inclusion proofs. The approach requires regular cache updates, which can cause liveness issues if not decentralized.
To optimize efficiency and computational cost, Herodotus maintains two different MMRs. Depending on the specific blockchain or layer, the accumulator can be customized with different hash functions: Poseidon hashes can be used when proving to Starknet, while Keccak hashes can be used for EVM chains.
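The append-with-merge behavior that makes MMRs cheap to extend can be sketched in Python. This is an illustrative toy (SHA-256 instead of Poseidon/Keccak, and no inclusion-proof logic), not any project's actual accumulator:

```python
import hashlib

def H(a: bytes, b: bytes = b"") -> bytes:
    return hashlib.sha256(a + b).digest()

class MMR:
    """Toy Merkle Mountain Range: peaks are roots of perfect binary trees."""
    def __init__(self):
        self.peaks = []  # list of (tree_size, root_hash), largest tree first

    def append(self, leaf: bytes):
        size, node = 1, H(leaf)
        # Merge while the smallest existing peak has the same size,
        # so an append touches only O(log n) hashes.
        while self.peaks and self.peaks[-1][0] == size:
            _, prev_root = self.peaks.pop()
            node = H(prev_root, node)
            size *= 2
        self.peaks.append((size, node))

    def root(self) -> bytes:
        # "Bag" all peaks into a single commitment, right to left.
        acc = self.peaks[-1][1]
        for _, peak in reversed([p for p in self.peaks[:-1]]):
            acc = H(peak, acc)
        return acc

mmr = MMR()
for i in range(7):
    mmr.append(f"block-{i}".encode())
# 7 leaves decompose into perfect trees of sizes 4 + 2 + 1.
assert [s for s, _ in mmr.peaks] == [4, 2, 1]
```

Because peak sizes follow the binary decomposition of the leaf count, each new block header costs only a handful of hashes, which is why MMRs suit an ever-growing header cache.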
Off-chain MMR cache: Herodotus maintains an off-chain cache of previously fetched queries and results so they can be served faster when requested again. This requires more infrastructure than just running an archive node, but optimizations in the off-chain infrastructure can reduce costs for end users.
Dedicated blockchain for storage: Brevis relies on a dedicated zero-knowledge rollup (an aggregation layer) to store the block headers of every chain it proves. Without this aggregation layer, each chain would need to store the block headers of every other chain, resulting in O(N²) connections for N blockchains. With the aggregation layer, each blockchain only needs to store the rollup's state root, reducing the overall connections to O(N). The layer is also used to aggregate multiple block-header/query-result proofs and submit a single verification proof on each connected blockchain.
L1-L2 messaging: Since L2s support native messaging for updating L2 contracts from L1, source-chain consensus verification can be avoided. Caches can be updated on Ethereum, and L1-L2 messaging can be used to send off-chain-compiled block hashes or tree roots to other L2s. Herodotus takes this approach, but it is not feasible for alt-L1s.
Data Processing:
In addition to accessing data, smart contracts should be able to perform arbitrary computation on it. While some use cases may not require computation, for many others it is an important value-add. Many providers support computation over data in the form of zero-knowledge proofs, which are provided on-chain to verify validity. Because existing cross-chain messaging solutions such as Axelar, LayerZero, and Polyhedra Network can be used for data access, data processing may become the point of differentiation among storage-proof providers.
For example, HyperOracle allows developers to define custom off-chain calculations using JavaScript. Brevis designed an open zero-knowledge query-engine marketplace that accepts data queries from Dapps and processes them using proven block headers: a smart contract sends a data query, a prover in the marketplace picks it up and generates a proof based on the query input, the relevant block header (from the Brevis aggregation layer), and the result. Lagrange introduces a zero-knowledge big-data stack for proven distributed programming models such as SQL, MapReduce, and Spark/RDD. These proofs are modular and can be generated from any block header delivered by existing cross-chain bridging and messaging protocols. The first product of Lagrange's zero-knowledge big-data stack is zero-knowledge MapReduce, a distributed computing engine (based on the well-known MapReduce programming model) for proving results of computations over large amounts of multi-chain data. For example, a single zero-knowledge MapReduce proof can attest to the liquidity changes of a DEX deployed on 4-5 chains within a specified time window. For relatively simple queries, calculations can also be done directly on-chain, as Herodotus currently does.
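The MapReduce pattern described above, minus the proofs, looks like the following sketch. The chain names and liquidity numbers are made up for illustration; in a zk setting, each map and reduce step would additionally carry a proof that is recursively combined:

```python
from functools import reduce

# Hypothetical per-chain liquidity snapshots for one DEX over a time window.
snapshots = {
    "chain_a": {"start": 1_000_000, "end": 1_250_000},
    "chain_b": {"start": 500_000, "end": 480_000},
    "chain_c": {"start": 2_000_000, "end": 2_100_000},
}

# Map: each chain's snapshot -> liquidity delta for the window.
deltas = [s["end"] - s["start"] for s in snapshots.values()]

# Reduce: fold per-chain deltas into one cross-chain aggregate.
total_change = reduce(lambda a, b: a + b, deltas, 0)

assert deltas == [250_000, -20_000, 100_000]
assert total_change == 330_000
```

The map step parallelizes naturally per chain and per block range, which is what makes the model attractive for proving computations over large multi-chain datasets.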
Proof Generation:
Updatable proofs: Updatable proofs can be used when proofs must be computed and efficiently maintained over a moving stream of blocks. When a new block is created, an existing proof of a contract variable's moving average (such as a token price) can be updated efficiently without recomputing a new proof from scratch. To prove dynamic, data-parallel computations over on-chain state, Lagrange built Recproof, a batched vector commitment on top of a portion of the MPT that is updated in real time and computed dynamically. By recursively building Verkle trees on top of the MPT, Lagrange can efficiently compute over large amounts of dynamic on-chain state.
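The underlying efficiency argument can be seen in the plain (proof-free) computation. A moving average over the last N blocks can be updated in O(1) per new block rather than recomputed from scratch, and an updatable proof tracks the same incremental update. This class is a made-up illustration, not any project's code:

```python
from collections import deque

class RollingAverage:
    """Moving average over the last `window` block values, updated in O(1)."""
    def __init__(self, window: int):
        self.window = window
        self.values = deque(maxlen=window)
        self.total = 0.0

    def add_block_value(self, price: float) -> float:
        if len(self.values) == self.window:
            self.total -= self.values[0]  # subtract the value leaving the window
        self.values.append(price)        # deque drops the oldest automatically
        self.total += price
        return self.total / len(self.values)

avg = RollingAverage(window=3)
assert avg.add_block_value(10) == 10.0
assert avg.add_block_value(20) == 15.0
assert avg.add_block_value(30) == 20.0
assert avg.add_block_value(40) == 30.0  # 10 dropped: (20 + 30 + 40) / 3
```

An updatable proof scheme aims for the same property: per-block proof maintenance proportional to the delta, not to the whole window.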
Verkle trees: Unlike a Merkle tree, where a proof must include every sibling node along the path, a Verkle tree proof needs only the path to the root, which is much smaller. Ethereum is also considering Verkle trees in future upgrades to minimize the amount of state full nodes need to hold. Brevis uses Verkle trees to store proven block headers and query results at its aggregation layer. This greatly reduces the size of inclusion proofs, especially when the tree contains a large number of elements, and supports efficient inclusion proofs for bulk data.
Mempool monitoring to speed up proof generation: Herodotus recently released Turbo, which lets developers specify data queries by adding a few lines of code to their smart contracts. Herodotus monitors the mempool for transactions that interact with the Turbo contract, and proof generation begins while the transaction is still in the mempool. Once the proof is generated and verified on-chain, the results are written to the on-chain Turbo contract; results are written only after being authenticated via a storage proof. When this happens, a portion of the transaction fee is shared with the sequencer or block producer, incentivizing them to wait longer to collect the fee. For simple data queries, the requested data may already be available on-chain before the user's transaction is included in a block.
Applications of state/storage proofs
Proof of state and storage can unlock many new use cases for smart contracts at the application, middleware, and infrastructure layers. Some of them are:
Application layer:
Governance:
Cross-chain voting: The on-chain voting protocol can allow users on Chain B to prove ownership of assets on Chain A. Users do not need to bridge their assets to gain voting rights on the new chain. For example: SnapshotX on Herodotus
Governance token distribution: Applications can distribute more governance tokens to active users or early adopters. For example: RetroPGF on Lagrange.
Identity and Reputation:
Proof of ownership: Users can prove that they own a certain NFT, SBT, or asset on chain A to perform certain operations on chain B. For example, a gaming application chain could decide to launch its NFT collection on other chains with existing liquidity like Ethereum or any L2. This would allow games to take advantage of liquidity that exists elsewhere without actually requiring cross-chain NFTs.
Proof of Usage: Users can receive discounts or premium features based on their historical usage on the platform (proof that the user has traded X amount on Uniswap).
OG proof: Users can prove that they have an active account that is more than X days old.
On-chain credit scoring: A cross-chain credit scoring platform can aggregate data from multiple accounts of a single user to generate a credit score.
All of the above proofs can be used to provide customized experiences to users. Dapps can offer discounts or privileges to retain experienced traders or users and provide a streamlined user experience for novice users.
DeFi:
Cross-chain lending: Users can lock assets on chain A and obtain loans on chain B without the need for bridging tokens.
On-chain insurance: Failures can be determined by accessing historical on-chain data, and insurance compensation can be completed entirely on the chain.
Time-weighted average price of assets in the pool: The application can calculate and obtain the average price of assets in the AMM pool within a specified period of time. For example: Uniswap TWAP oracle on Axiom.
Option pricing: The on-chain options protocol can price options using the volatility of the asset over the past n blocks on the decentralized exchange.
The last two use cases will require updating the proof when a new block is added to the source chain.
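The TWAP use case above reduces to a simple time-weighted sum once the historical observations are proven. The block numbers and prices below are made up for illustration; a real implementation would read proof-verified on-chain observations:

```python
# Each observation is (block_number, price_in_effect_from_that_block).
# The final entry marks the end of the window and carries no price.
observations = [
    (100, 1500.0),  # price 1500 from block 100
    (110, 1520.0),  # price changed at block 110
    (115, 1480.0),  # price changed at block 115
    (120, None),    # end of window at block 120
]

# Weight each price by the number of blocks it was in effect.
weighted_sum = 0.0
for (b0, p), (b1, _) in zip(observations, observations[1:]):
    weighted_sum += p * (b1 - b0)

window = observations[-1][0] - observations[0][0]
twap = weighted_sum / window
assert twap == (1500 * 10 + 1520 * 5 + 1480 * 5) / 20
```

Because the window slides as new blocks arrive, this is also the kind of computation that benefits from the updatable proofs discussed earlier.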
Middleware:
Intents: Storage proofs allow users to be more expressive and explicit about their intents. While the solver's job is to perform the steps needed to satisfy the user's intent, the user can specify conditions based on on-chain data and parameters more precisely. The solver can also demonstrate the validity of the on-chain data used to find the optimal solution.
Account abstraction: Users can leverage storage proofs to set rules based on data from other chains. For example, every wallet has a nonce: we can show that a year ago the nonce was a specific number and that it is still the same today, proving the wallet has not been used at all; access to the wallet could then be delegated to another wallet.
On-chain automation: Smart contracts can automatically perform certain actions based on predefined conditions that rely on on-chain data. Automated programs need to make regular calls to smart contracts to maintain optimal price flow for AMMs or to keep lending protocols healthy by avoiding bad debt. HyperOracle supports automation and on-chain data access.
Infrastructure:
Trustless on-chain oracles: Today's decentralized oracle networks aggregate responses from many individual oracle nodes. Storage proofs can eliminate this redundancy, relying on cryptographic security to bring data on-chain. An oracle network can aggregate data from multiple chains (L1s, L2s, and alt-L1s) onto a single chain and simply use storage proofs to prove its existence elsewhere. DeFi protocols making significant progress can also use custom solutions. For example, Lido Finance, the largest liquid staking provider, has partnered with the Nil Foundation to fund development of a zkOracle. These solutions will enable trustless access to EVM historical data and help protect Lido Finance's $15 billion of staked Ethereum liquidity.
Cross-chain messaging protocols: Existing cross-chain messaging solutions can increase the expressiveness of their messages by partnering with storage-proof service providers. This is the approach Lagrange suggests in its modularity paper.
Conclusion
Awareness allows tech companies to better serve their customers. From user identity to purchasing behavior to social connections, technology companies use this awareness to unlock capabilities such as precise targeting, customer segmentation, and viral marketing. Traditional tech companies require explicit permission from users and must manage user data with caution. On permissionless blockchains, however, all user data is public and does not necessarily reveal user identities. Smart contracts should be able to leverage this publicly available data to serve users better. The development and adoption of more specialized ecosystems will make state awareness across time and chains an increasingly important issue. Storage proofs allow Ethereum to emerge as an identity and asset-ownership layer rather than just a settlement layer: users can keep their identity and key assets on Ethereum and use them across multiple blockchains without constantly bridging assets. We remain excited about the new possibilities and use cases that will be unlocked in the future.


