Guide: The Background of the Rise of Distributed Storage Technology
According to IDC's forecast, by 2022 the digitally driven economy will account for 60% of global GDP, and China's share will exceed the global average, reaching 65%. As digitalization accelerates and the push toward intelligent systems advances, the data generated by enterprises will keep growing and will become both massive and diverse. Multi-cloud and cloud-edge-device deployment environments are increasingly complex, and enterprises' requirements for real-time access, security, and reliability of data keep rising. The current centralized data storage model urgently needs to change to fit the new structure of Internet data and the growing demand.
Web3.0 will be the natural evolution of today's Internet into its next generation, which must meet the connectivity requirements of machine intelligence and the Internet of Everything. As people look ahead to the Web3.0 era, they will find that today's data storage system (centralized cloud storage supplemented by device-side storage) does not fit the needs of Web3.0. This gap is precisely the opportunity for decentralized storage, which is bound to become a necessary foundation of the Web3.0 era.
The greatest significance of decentralized storage is that it can use distributed storage protocols to store the large volume of edge data generated by interactions between people and machines and between machines, and then use blockchain technology to add a trust layer. Individuals can then take part in governing the network, and each is rewarded in proportion to the work they contribute. This will be an important area in which blockchain technology can prove its worth.
Distributed storage moves data from distant centralized servers to edge storage devices closer to where the data is produced. This lowers network communication overhead, interaction latency, and bandwidth cost, and gives edge computing real-time, reliable data storage and access.
This article covers:
The problems encountered by centralized storage;
The significance of distributed storage;
Typical blockchain projects in the field of distributed storage;
The conditions that a successful decentralized storage project should meet.
1 Problems and challenges encountered by centralized data storage
Our society is experiencing an unprecedented information explosion. Computers, smart devices, TVs, home security systems, wearables, cars, and even robots generate and consume data all the time, and the volume of this data is growing exponentially. The coming era of AI and the Internet of Things (IoT) keeps pushing the limits of current data storage.
Today's centralized cloud storage places storage resources in the cloud for people to access. This approach concentrates data further and involves ever larger volumes of it, and massive, centrally stored data is highly vulnerable to attack, which poses serious risks to data privacy, security, and persistence.
1.1 Data privacy
With today's centralized storage, users upload all of their sensitive data. They lose control over their own data, and the risk of leakage is concentrated at the cloud storage operator. Two kinds of data are especially sensitive here. The first is an enterprise's own confidential information, such as strategic plans, financial information, investment and financing decisions, production, purchasing, and sales strategies, key customer data, and business analysis reports. This is business and technical information that is not known to the public, has practical value, and can bring the enterprise economic benefit. The second is users' personal information. With the rapid growth of centralized cloud storage, business production systems have accumulated large amounts of data containing names, ID numbers, addresses, phone numbers, account numbers, and other sensitive fields. If this data is leaked or damaged, the harm is not limited to the individuals whose data was exposed: because the data is stored centrally, a single incident can affect large groups of people and even cause social instability, which hinders the healthy development of the big data era.
Data breach growth rate, source: Identity Theft Resource Center
1.2 Data security
Data security has two meanings: keeping data intact and free from loss, and keeping private data from leaking. Competition in the cloud storage market is fierce. As the number of users grows, providers' costs of maintaining a good user experience rise, while a reliable path to profit is still lacking. As a result, recent years have seen frequent reports of providers disappearing or shutting down their services, and users have no way to constrain providers or claim compensation. Users therefore tend to store data with larger, more credible providers, so data becomes ever more centralized, and any loss that does occur happens on a large scale.
Cloud real-time data continues to rise, source: IDC
1.3 Sustainability of data storage
Under the future Web3.0 framework, a large number of smart devices will be connected to the network and will generate massive data in real time, so data volume will grow exponentially. Centralized data storage clearly cannot meet this demand, which will be especially evident in autonomous driving and the Internet of Things (IoT). Future data storage systems must not only store, share, and read data but also transmit and analyze it efficiently and accurately, which poses a great challenge to the centralized storage architecture.
Storage technology therefore still faces huge challenges today. These problems are closely tied to human factors and to centralized operation and management, and solving them at the root requires a decentralized approach. The industry has therefore turned to blockchain technology, and decentralized storage solutions based on it have emerged.
2 What is the significance of the existence of decentralized storage?
The challenges faced by centralized storage are the opportunities for decentralized storage.
Decentralized storage combines the best features of blockchain technology to meet the needs of massive data storage. As the name suggests, it distributes data across multiple network nodes, much like the distributed ledger of a blockchain.
Applying blockchain thinking and structure to storage opens up a better path for data storage today.
2.1 Adapt to storage of unstructured data and edge data
With the large-scale rise of image and video applications, the term unstructured data has become ubiquitous. A common simplification is that content stored in traditional relational databases is structured data, while data stored as ordinary files, such as pictures, audio, video, and documents, is unstructured data. According to an IDC report, 75% of future data will be unstructured edge data.
Decentralized storage is better suited to this current and future structure of data.
2.2 The ownership of the data is returned to the creator of the data
Blockchain technology has succeeded in the field of cryptocurrencies and has shown that its security, efficiency, and decentralized control can democratize data management and return ownership to users. Distributed storage built on blockchain technology applies the same approach to data.
2.3 Conform to the sustainable development trend of the big data explosion
As analyzed above, contemporary society has entered an era of data explosion in which massive data is expanding exponentially. As the Internet of Things, artificial intelligence, cloud computing, and related technologies mature, this trend will only strengthen.
Distributed storage adapts better to this rapidly expanding body of data, and in structure it resembles a decentralized distributed ledger. A blockchain is a growing chained data structure in which multiple network nodes take part in computing and recording data and in verifying its validity. From this perspective, blockchain is itself a particular kind of database technology. Because of the security and convenience of decentralized databases, many in the industry are optimistic about their development and regard them as an upgrade and supplement to existing Internet technology.
The storage form of distributed storage is more in line with the development trajectory of data, source: ZTE Research Report
The data of the next-generation Internet will be dominated by unstructured data and edge data. For data of this kind, decentralized storage protocols are the only viable choice.
People imagine Web3.0 as an Internet that can remember all data and can also forget it: a secure Internet that resists attack, where ownership of data belongs to the user who generated it. Built on distributed storage protocols, it can be an online shopping mall or a public square for speech, and it returns the Internet to all of its users rather than leaving it under the control of a few Internet giants.
3 Case Analysis of Decentralized Storage Projects
3.1 Representative projects
Among the decentralized storage projects released so far, many technologies and implementations overlap. Each project has its own characteristics, and there is no clear-cut taxonomy.
Roughly grouped by implementation logic, projects such as BitTorrent, Filecoin, Arweave, and Crust can be classed as incentive layers for content-addressed file sharing networks, while projects such as Sia, Storj, and MaidSafe lean toward using incentive tokens to get participants to share their own hard disk space. Projects can be further divided by whether they run on their own independent public chain: Filecoin and Arweave do, while Crust is built on Polkadot.
Below we summarize and compare existing distributed storage projects:
3.2 BitTorrent - the prototype of the decentralized storage route
BitTorrent (BT for short) is an open-source content distribution protocol designed by Bram Cohen in 2001. It uses peer-to-peer technology to share large files (such as a movie or TV show) efficiently, with every user also uploading to others and so acting as a redistribution node in the network. Commonly used clients include BitTorrent and μTorrent.
Although BitTorrent is the earliest decentralized storage project, it lacks a sound incentive mechanism and can only be called a prototype of the decentralized storage model. The BTT token, issued on TRON, merely borrows the BitTorrent story and does nothing substantive to empower the BitTorrent network.
3.3 Filecoin (the official incentive layer of the IPFS protocol) - the leading decentralized storage project
Filecoin is an incentive mechanism and public chain built on the IPFS (InterPlanetary File System) protocol. IPFS defines how files are stored, retrieved, and transmitted in a distributed system, so that files can be saved and shared durably without a central server. It is a content-addressable, peer-to-peer distributed protocol.
IPFS aims to build a truly peer-to-peer, decentralized file storage system, drawing on ideas from BitTorrent. In IPFS, all files belong to one shared system with a common addressing scheme, and every user is part of that system, which lets users find files and transfer them to one another.
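To make "content-addressable" concrete, here is a minimal sketch in Python. It is not the IPFS implementation: real IPFS uses its own identifier format, while this toy uses a plain SHA-256 hex digest as the address. The class name ContentStore and its methods are hypothetical and exist only for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a blob's address is the hash of its bytes.

    Illustrative only; the name and the plain SHA-256 address are assumptions,
    not the identifier format used by IPFS.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address depends only on the content, not on where it is stored.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Whoever receives the data can re-hash it to check it matches the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello, distributed storage")
print(address, store.get(address))
```

Because the address is derived from the content, identical files get the same address on every node, and a node cannot serve altered data without the mismatch being detectable.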
Filecoin is the incentive token launched by Protocol Labs, the team behind IPFS. It rewards the various roles in the storage and retrieval markets of the Filecoin network. Filecoin's main technical challenges are proving data possession, preventing cheating and attacks, and zero-knowledge proofs.
IPFS has reportedly been applied in more than 100 scenarios. JD.com and Huawei are deploying IPFS, and Microsoft, Google, Firefox, and others have also adopted it in their applications, which suggests that IPFS is developing rapidly. In the future, whatever users want to store, whether text, pictures, or video, may be stored through IPFS.
3.4 Crust - an IPFS-compatible distributed storage project in the Polkadot ecosystem
Crust Network is the incentive layer of an application-oriented public chain for decentralized cloud services. In this respect it resembles Filecoin, the distributed storage incentive layer of IPFS, and to a first approximation Crust can be thought of as Filecoin on the Polkadot network. Crust is an incentive layer protocol for distributed clouds built on a Polkadot parachain. It supports a variety of storage protocols, including IPFS, and at the current stage it focuses on storage, where its vision is similar to Filecoin's.
As a key Polkadot project on the distributed storage track, Crust has drawn steady attention from the crypto community, investment institutions, and the Polkadot ecosystem. Positioned as a cornerstone of Web3.0, its development is closely followed by the management of Parity and the Web3 Foundation. It participates in the Substrate Builders Program and the Web3 Foundation Grants program, and it is one of the core members of the Web3.0 Bootcamp established by Wanxiang Blockchain and the Web3 Foundation.
Building on TEE (trusted execution environment) technology, Crust proposed MPoW (Meaningful Proof of Work), a mechanism that measures each node's storage workload and reports it to the chain. The Crust team also created an original PoS consensus algorithm in which storage resources define the staking quota, called GPoS (Guaranteed Proof of Stake). Workload reports are recorded alongside other transactions and packed into blocks, a staking quota is calculated from them, and PoS consensus is then conducted on the basis of that quota.
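The relationship between reported storage and the staking quota can be sketched as follows. This is only an illustration of the idea described above, not Crust's actual formula: the linear conversion, the parameter cru_per_tib, and both function names are assumptions made for this example.

```python
def staking_quota(reported_storage_tib: float, cru_per_tib: float) -> float:
    """Quota derived from the storage workload a node reports on chain.

    Hypothetical linear rule; the real conversion used by Crust is not
    specified in this article.
    """
    return reported_storage_tib * cru_per_tib


def effective_stake(staked_cru: float, quota: float) -> float:
    """Only stake up to the quota counts toward PoS consensus."""
    return min(staked_cru, quota)


quota = staking_quota(reported_storage_tib=10, cru_per_tib=5.0)
# A node with more stake than its storage justifies is capped at the quota.
print(effective_stake(staked_cru=80, quota=quota))
# A node with less stake than its quota is limited by the stake itself.
print(effective_stake(staked_cru=30, quota=quota))
```

The point of the sketch is the cap: stake alone does not buy influence, because the usable stake is bounded by storage the node actually provides.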
3.5 Storj - a distributed cloud storage project based on Ethereum
Storj is an Ethereum-based distributed cloud storage protocol developed by the for-profit company Storj Labs. Its core technology is an enforceable peer-to-peer storage contract: two parties (people or computers) who do not know each other agree to exchange a certain amount of storage for payment.
Storj Labs earns revenue by renting out its network to thousands of users and charging for usage. This is a somewhat more centralized model that competes with the likes of Dropbox and Google Drive. Storj Labs also has a partnership with Microsoft Azure to deploy some of its development tools.
3.6 Swarm - a storage service provider on the Ethereum network
Swarm is also based on Ethereum and provides a distributed storage platform and content distribution service. Participants pool their storage and bandwidth resources to serve everyone in the network, and in return they receive a share of Ethereum rewards.
From the end user's point of view, Swarm differs little from the ordinary Internet, except that uploads do not go to a specific server.
3.7 Sia - a distributed storage project based on an independent public chain
Siacoin runs on the independent Sia public chain and is used to settle agreements between storage renters and storage providers.
Sia can significantly reduce users' cloud storage costs by letting people rent out their unused hard drive space, which is why many call Sia the Airbnb of hard drives. Data on Sia is private: files cannot be viewed without the private key.
3.8 Arweave - a non-IPFS incentive layer focused on permanent data storage
Unlike IPFS with its decentralized storage and Crust with its decentralized cloud services, Arweave focuses on permanent storage. It targets the restricted freedom of speech, excessive censorship, and easy tampering found on today's Internet, and offers a protocol for storing files permanently in exchange for a one-time payment. Its storage solution, the Permaweb, relies on the immutability of the blockchain and writes content directly into blocks.
Many community developers point out, however, that the application scenarios for Arweave's permanent record keeping are relatively narrow, and that the most commonly stored items today are screenshots of liberal speech on Twitter. Moreover, because data on Arweave can never be altered, developing programs on it is harder.
4 Core elements for the success of the "storage track" project
In the current blockchain ecosystem, the base layer of the blockchain is widely used to establish trust, but decentralized applications often lack the large amounts of storage they need. We therefore believe that decentralized storage is still in its infancy and is a very promising direction.
In our view, a successful decentralized storage project should have the following characteristics:
1) Solid technical foundation: the underlying technology determines what can be built on top of the project and plays a decisive role in its consensus and future development;
2) Innovative incentive mechanism: given that decentralized storage is technically feasible, the design of the incentive model is what lets a project launch smoothly and keep developing;
3) Clear scenario entry point and user positioning: the project starts from users' real scenarios and positions its product accordingly;
4) Excellent team: the teams behind most successful decentralized storage projects are able both to build decentralized storage and to design token economic models.
4.1 Solid technical foundation
File sharing in a decentralized storage system works completely differently from a centralized one. In a centralized system, an uploaded large file is stored, whole or in slices, on a single server or on a cluster run by one operator, and an extremely efficient development and operations team is needed to keep it running. Decentralized storage, by contrast, must use distributed storage technology. The initial seed node (the node that first holds the complete file) slices the large file into multiple Pieces, and each Piece is stored on a different node. An ordinary node that downloads a Piece and then serves it to the decentralized storage network for others to download becomes a seed node for that Piece. Once a Piece is shared among multiple nodes, it can even be removed from the initial seed node, and the number of nodes in the file sharing network keeps growing. Other conditions being equal, the more downloaders there are, the faster the same content can be downloaded. The decentralized storage system thus makes up for the slow transmission speed of centralized storage, and it also removes the single point of failure and improves the security of the data.
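The slicing step described above can be sketched in a few lines of Python. This is a simplified illustration, not the scheme of any particular project: the round-robin placement, the fixed piece size, and the function names split_into_pieces, assign_pieces, and reassemble are all assumptions made for the example.

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut a file into fixed-size Pieces; the last one may be shorter."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def assign_pieces(pieces, node_ids):
    """Hypothetical round-robin placement: Piece i goes to node i mod N.

    Each entry keeps the Piece index and its SHA-256 digest so that a
    downloader can restore the order and check integrity.
    """
    placement = {node: [] for node in node_ids}
    for index, piece in enumerate(pieces):
        node = node_ids[index % len(node_ids)]
        placement[node].append((index, hashlib.sha256(piece).hexdigest(), piece))
    return placement


def reassemble(placement) -> bytes:
    """Collect Pieces from all nodes and join them in index order."""
    indexed = [(index, piece)
               for held in placement.values()
               for (index, _digest, piece) in held]
    return b"".join(piece for _index, piece in sorted(indexed))


original = bytes(range(256)) * 4
placement = assign_pieces(split_into_pieces(original, 100), ["node-a", "node-b", "node-c"])
print({node: len(held) for node, held in placement.items()})
print(reassemble(placement) == original)
```

In a real network each Piece would also be replicated to several nodes, which is what lets download speed rise with the number of participants and removes the single point of failure.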
IPFS is a pioneer in the field of decentralized storage. Since its introduction in 2014, it has grown organically, much as BT did, and has stored a large amount of data. But for IPFS to become a commercially usable storage system rather than an ad hoc data sharing platform, it must provide quality-of-service guarantees. This is the problem that Filecoin, the economic incentive layer of IPFS, sets out to solve.
The Filecoin protocol builds two markets: a data storage market and a data retrieval market. Users with storage needs declare them in the storage market: I want to store data of size XX, with XX copies, for XX days. Storage service providers (storage miners) quote prices for this demand, and the user accepts a quote, signs a contract with the miner, and pays the fee. When users need their data, they make a request in the retrieval market, and retrieval miners quote prices for serving it.
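The storage-market flow above (declare a need, collect quotes, accept) can be modelled with a small sketch. It does not use any Filecoin API: the types StorageOrder and Quote, the pricing unit, and the rule of choosing the cheapest miners are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageOrder:
    size_gib: int   # "data of size XX"
    replicas: int   # "XX copies"
    days: int       # "for XX days"


@dataclass
class Quote:
    miner: str
    price_per_gib_day: float  # hypothetical pricing unit


def total_cost(order: StorageOrder, quote: Quote) -> float:
    """Cost of storing one replica with this miner for the whole term."""
    return order.size_gib * order.days * quote.price_per_gib_day


def pick_quotes(order: StorageOrder, quotes: list[Quote]) -> list[Quote]:
    """Accept the cheapest quotes, one miner per requested replica."""
    cheapest = sorted(quotes, key=lambda q: q.price_per_gib_day)
    if len(cheapest) < order.replicas:
        raise ValueError("not enough miners quoted for the requested replicas")
    return cheapest[:order.replicas]


order = StorageOrder(size_gib=10, replicas=2, days=30)
quotes = [Quote("miner-a", 0.03), Quote("miner-b", 0.01), Quote("miner-c", 0.02)]
accepted = pick_quotes(order, quotes)
print([q.miner for q in accepted], sum(total_cost(order, q) for q in accepted))
```

A real market would also weigh a miner's reliability and collateral, not only price; the sketch shows just the matching step.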
The process above does not look complicated, but implementing it involves several difficulties:
1) Miners must provide unforgeable cryptographic proof that they are storing the user's data;
2) For the duration of the contract, the protocol must continuously check that miners are keeping the data as promised, and miners who breach the contract are fined;
3) To encourage miners to store data, capacity that holds data should earn more than idle capacity, yet miners must be prevented from claiming extra rewards by filling capacity with garbage data.
Filecoin designed Proof-of-Replication (PoRep) to solve the first problem and adopted Proof-of-Spacetime (PoSt) together with a collateral mechanism to solve the second. It addresses the third by tuning the economic model and introducing verification of real users.
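The general idea of proving that data is still being stored can be illustrated with a Merkle-tree challenge. This sketch is deliberately much simpler than Filecoin's actual proofs and is not how they are implemented: the verifier keeps only a root hash, challenges the miner for one Piece, and checks the returned Piece against the root. All function names here are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(pieces):
    """All Merkle tree levels, leaves first; an odd node is paired with itself."""
    level = [_h(p) for p in pieces]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Miner side: sibling hashes from the challenged leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, piece, proof):
    """Verifier side: recompute the root from the Piece and the proof."""
    node = _h(piece)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


pieces = [b"piece-0", b"piece-1", b"piece-2", b"piece-3", b"piece-4"]
levels = build_levels(pieces)
root = levels[-1][0]            # the only value the verifier must remember
challenge = 3                   # in practice chosen unpredictably
print(verify(root, pieces[challenge], prove(levels, challenge)))
print(verify(root, b"garbage", prove(levels, challenge)))
```

Repeating such challenges over the life of a contract approximates the "continuous check" in difficulty 2; it does not by itself show that a miner keeps a unique physical copy, which is the harder problem difficulty 1 describes.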
This complex game leads to the design of the incentive model, discussed in the next section.
4.2 Innovative incentive mechanism
Given that decentralized storage is technically feasible, the design of its incentive mechanism becomes the key to a project's success. Trust and consensus design is the core of decentralized storage, and the economic model behind the consensus is the soul of the project.
Because storage requires real storage hardware, miners in decentralized storage projects have strong industrial characteristics that set them apart from Bitcoin miners. They are not purely financial participants who simply mine a coin; they serve a large demand side for data storage. How to energize the whole ecosystem through incentive tokens tests the project team's ability to design an economic model.
The essence of storage mining design is to decide who receives newly issued tokens and how many, so that the system becomes a self-running distributed system.
Tokens, for their part, have no value at first. Mining anchors and captures value, making tokens scarce and giving them an acquisition cost. Under a PoW mechanism, for example, miners invest in computing power and operations in exchange for block rewards and trade them on the secondary market, which gives rise to the concept of a "shutdown price".
In the economic model of the Crust project, for example, the maximum capacity a miner can store is tied to the CRU the miner holds (or the CRU provided by guarantors), and the CRU a miner stakes determines the upper limit of the miner's mining income. The GPoS (Guaranteed Proof of Stake) consensus evolved from the original PoS model on this basis.
4.3 Clear scenario entry point and user positioning
Storage is a very broad market with varied requirements. Some use cases place high demands on permanent storage, others on data credibility (tamper resistance), others on high-frequency interaction, and others on edge data.
The future development of distributed storage will therefore see projects specialize in segments of the track and build consensus there; a "one trick fits all" approach is unrealistic. Projects can start from users' real scenarios to position their products, and connect technology to commercial value by continually strengthening the technology and putting it to work in those scenarios.
Arweave, for example, is a distributed cloud storage project that stands out by focusing on a deliberately narrow application scenario: the permanent storage of information.
Arweave's consensus mechanism is not PoS but a simple extension of PoW called PoA (Proof of Access). Each round of block production requires nodes to recall a past block, and only miners who have stored the recalled block are eligible to compete to produce the new block.
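The recall rule can be sketched as follows. This is a simplified illustration rather than Arweave's actual algorithm: here the recalled block is chosen by taking the previous block's hash modulo the current chain height, and the names recall_index and can_mine are hypothetical.

```python
import hashlib


def recall_index(previous_block_hash: bytes, height: int) -> int:
    """Derive which past block must be recalled for the next round.

    Assumed rule for illustration: interpret the previous block hash as an
    integer and reduce it modulo the chain height.
    """
    return int.from_bytes(previous_block_hash, "big") % height


def can_mine(stored_block_indices: set, previous_block_hash: bytes, height: int) -> bool:
    """A miner is eligible only if it actually stores the recalled block."""
    return recall_index(previous_block_hash, height) in stored_block_indices


previous_hash = hashlib.sha256(b"block-99").digest()
height = 100
full_node = set(range(height))
partial_node = set(range(0, height, 2))   # stores only every second block
print(recall_index(previous_hash, height))
print(can_mine(full_node, previous_hash, height))
print(can_mine(partial_node, previous_hash, height))
```

Because the recalled block cannot be predicted far in advance, a miner maximizes its chance of being eligible by storing as much of the history as possible, which is how the mechanism rewards long-term storage.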
This mining mechanism and consensus design is ingenious: it achieves permanent preservation of on-chain data and delivers a storage application for a specific niche.
5 Challenges and bottlenecks encountered in decentralized storage
A wide variety of decentralized storage solutions can now be found on the market. This variety shows that the decentralized storage track is still at a very early, trial-and-error stage. Projects of every size have a very high failure rate, and many designs have flaws.
Decentralized storage faces huge challenges and bottlenecks in terms of incentive policy formulation, legal compliance, storage costs, data retrieval and storage stability.
5.1 The storage cost is unstable and the cost structure is unreasonable
Decentralized storage projects encourage their tokens to circulate on exchanges, where token prices fluctuate widely. Miners, as providers of storage space, and users of that space therefore both face large swings in usage fees, and market speculation may occur.
Unstable costs and an unreasonable cost structure both drive participants out of the ecosystem, storage providers and storage users alike.
5.2 Centralized project teams behind decentralized storage
An autonomous decentralized system should be lightweight in design. Decentralized storage projects are a special case, however: because storing data requires nodes to govern their own behavior, the economic models of these projects tend to be complex, and this complexity exposes weaknesses in multi-party collaboration.
5.3 The participation threshold is too high
The ultimate goal of decentralized storage is to attract as many users as possible to contribute the idle storage space on their own devices, connecting idle space around the world for edge computing and edge storage. In practice, however, the barrier to participation in distributed storage is too high, and providers may even need to buy specialized machines. This departs sharply from the original intent of distributed storage.
5.4 There are major flaws in using stored data as the basis for mining income
Stored data is a cumulative quantity, so those who start storing first have a first-mover advantage. The advantage is not limited to larger incentives and rewards at the start; it keeps growing over time and continually squeezes miners who want to join later, which is very harmful to the development of the whole ecosystem.
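A small worked example makes the first-mover effect visible. The model is an assumption chosen for simplicity and does not describe any specific project: two miners add the same amount of storage per period, each period's reward (normalized to 1) is split in proportion to cumulative stored data, and the late miner starts head_start periods after the early one.

```python
def reward_shares(head_start: int, periods: int, added_per_period: float = 1.0):
    """Cumulative rewards of an early and a late miner under a toy model.

    Assumed model: one unit of reward per period, divided in proportion to
    each miner's cumulative stored data at that time.
    """
    early = late = 0.0
    early_reward = late_reward = 0.0
    for t in range(periods):
        early += added_per_period
        if t >= head_start:
            late += added_per_period
        total = early + late
        early_reward += early / total
        late_reward += late / total
    return early_reward, late_reward


for horizon in (10, 50, 100):
    early_reward, late_reward = reward_shares(head_start=5, periods=horizon)
    print(horizon, round(early_reward - late_reward, 3))
```

Under these assumptions the late miner's share of each period's reward approaches one half but never reaches it, so the gap in cumulative rewards keeps widening even though both miners contribute equally from the moment the late miner joins.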
6 Summary
The business model of society is gradually shifting from "producing materials" to "producing data", and data storage infrastructure will have a huge impact on the future of business. Distributed storage has indeed solved some of the current problems of centralized storage, and in the future it will surely claim a share of the trillion-dollar storage market. This is an important reason why Taihe Capital is optimistic about this track and will continue to invest in it.
Taihe Capital Mining Fund is one of the earliest blockchain investors in China to focus on building the distributed storage ecosystem. It holds multiple positions across the Filecoin ecosystem, including IPFS mining machines, FIL6, FIL12, and cloud computing power, as well as Crust, the distributed storage project on the Polkadot track.