A 10,000-word look ahead at the Web3 data market: how can the problems in data usage be solved?

PANews

Invited columnist

2022-01-24 09:14

This article is about 14057 words, reading the full article takes about 21 minutes

Let’s start with the data market today and sort out some of my understanding of this field.

Original source: Mirror

At the beginning of 2022, I wrote on Twitter:

Areas I am personally watching in 2022: web3 data markets/infrastructure, web3 social/streaming media, non-capital chain games, wallets (the web3 entry point, including DID), subculture/youth-culture NFT consumption, DeFi regulatory solutions, and public chains/cross-chain projects with technological breakthroughs.

Table of contents

1. How important is data?

1. Changes in production methods and migration of organizational forms

2. Internet's reconstruction of traditional business models

3. The data island of web2

2. What challenges exist in data usage?

1. Privacy boundary and privacy protection

2. Data externalities and establishment of property rights

3. Internet of things and data collection

4. Data value matching

5. Data valuation

3. In which ways may web3 solve these problems?

1. What is web3?

2. Possible ways for blockchain to solve some problems

1. How important is data?

1. Changes in production methods and migration of organizational forms

Over the course of human history, productivity has undergone several transformations. Changes in productivity change the mode of production, which in turn reshapes how production is organized, since production organizations exist to serve production activities. With production alone, people have to barter to meet their needs, which is inefficient and cumbersome. To improve efficiency, money emerged as the general equivalent for commodity exchange. Markets for circulation were gradually established, and the commercial activity built on them began to flourish.

In my opinion, humans have so far experienced three major changes in the mode of production:

First, marked by the appearance of tools, primitive society gave way to agrarian society. Using stone, bronze, and then iron tools, human beings began to transform nature according to their own needs: they grew rice and wheat, raised poultry, and settled down. This period was dominated by self-sufficient production (agriculture and cottage industry), and as civilization developed some commercial activity gradually appeared (the Chinese word for "merchant" originated with the Shang Dynasty). As products became more complex, self-sufficient production found it harder and harder to meet individual needs, and the share of commercial activity grew. This stage lasted for thousands of years, and many commercial institutions common in modern society, such as banks and customs, took shape during it.

Second, marked by the invention of the steam engine, handicraft industry gave way to machine industry. Coal and steel solved the energy and material problems of this change in productivity, while the steam engine addressed labor efficiency. Physical strength is not humanity's strong suit, after all, and repetitive, inefficient production eventually hits ceilings and bottlenecks. Machinery (including the later electrical revolution) freed human hands and raised production efficiency. As a result, the mode of production began to evolve toward a specialized division of labor. By putting machines to work, human beings gained more time to develop science, technology, and the humanities, and civilization could advance along a more complex and diverse path. The liberation of the production side brought prosperity to the circulation side: commercial activity exploded and the modern enterprise system began to take shape.

Third, marked by the emergence of the Internet, machine production gave way to information production. The Internet, as the name suggests, is a network of interconnected computers. Civilization produces a large amount of information as it develops. Take bookkeeping as an example: human beings first recorded the quantities involved in economic activity by knotting ropes and carving tallies, and later, after papermaking was invented, recorded them on paper. As civilization evolved and production became more complex, the need for a clear and understandable set of bookkeeping rules grew stronger, and practice gradually developed into the double-entry bookkeeping that we know in accounting today.

However, this information never had the chance to realize greater value. Through most of history it was either recorded and left in some corner that nobody visited, or forgotten and lost to the past. Only when computers (in the broad sense, computing chips) replaced pen and paper as the carriers of information could human beings record and share information more efficiently and at far greater capacity. In the context of the Internet, production and commerce have rediscovered the value of information, making it not only a product but also a means of production.

Before Internet products emerged, information could of course be used as a means of production, but only at high cost. The Internet allowed information to be digitized, which gave it a very important property: zero marginal cost. (Put simply, marginal cost is how much total cost increases for each additional unit produced.)
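As a minimal illustration of the marginal-cost definition above, the sketch below compares a physical product with a digitized one. The cost figures and the function names are assumptions invented for this example, not numbers from the article.

```python
def total_cost(units, fixed_cost, unit_cost):
    """Total cost of producing `units` items."""
    return fixed_cost + unit_cost * units


def marginal_cost(units, fixed_cost, unit_cost):
    """Extra cost of producing one more item after `units` items."""
    return (total_cost(units + 1, fixed_cost, unit_cost)
            - total_cost(units, fixed_cost, unit_cost))


# Hypothetical numbers: a factory-made cup versus a digitized copy of information.
physical = {"fixed_cost": 10_000, "unit_cost": 15}
digital = {"fixed_cost": 10_000, "unit_cost": 0}

for n in (1, 1_000, 1_000_000):
    avg_physical = total_cost(n, **physical) / n
    avg_digital = total_cost(n, **digital) / n
    # Columns: volume, marginal cost (physical, digital), average cost (physical, digital)
    print(n, marginal_cost(n, **physical), marginal_cost(n, **digital),
          round(avg_physical, 2), round(avg_digital, 2))
```

Under these assumptions the average cost of the digital product keeps falling toward zero as volume grows, while the physical product can never cost less per unit than its unit cost.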

In fact, information production has another major advantage over machine production: network externalities (network effects). A network effect means that each node added to the network brings positive utility to the existing nodes. In essence this still comes from the zero marginal cost of information: every newly added node shares some new information, at zero cost, with all the nodes in the network (this is where the positive utility comes from).

Zero marginal cost and network externalities give information production some formidable characteristics, such as rapid expansion and natural monopoly. Once you understand these two points, it is easy to see why Internet companies can create more value than traditional manufacturing in just a few years, why startups in the Internet industry always like to burn money, and why Chinese Internet companies have recently started to go downhill.

2. Internet's reconstruction of traditional business models

As the mode of production keeps changing, the focus of human economic activity has begun to shift as well. Compared with material production, information production has received more attention because of its broader prospects. Beyond business that is native to the Internet, using the Internet to transform traditional industries is becoming ever more pressing.

Existing transformations take two directions. One starts from the production process, with the goal of improving production efficiency: for example Industry 4.0, long since an overused buzzword (2013, Germany), which improves the existing production system, industrial division of labor, logistics management, and so on. The other reconstructs business models, as in the sharing economy, information platforms, online shopping, and social networking.

Traditional business models are linear. Suppose you want to buy a thermos cup (why a thermos cup came to mind first, I don't know). Your first thought is to go to a retailer such as a supermarket or mall. You won't think of going to the manufacturer to get the goods directly, and the manufacturer usually won't sell to you anyway; nor would you say "I want my thermos made of titanium steel" and go to a steel mill further upstream. The complete chain from upstream material suppliers to midstream manufacturers and then to downstream retailers is the industrial chain.

Manufacturers also produce relatively blindly. Why? Because a manufacturer keeps its own ledger, with cost at one end and profit at the other. Profit comes from downstream orders, and it usually accepts orders from whoever offers the most suitable terms. Consumers' needs cannot be communicated to manufacturers directly. More broadly, no node in the industrial chain can exchange information and value with non-adjacent nodes directly and at low cost.

The Internet's reconstruction turns the "chain" into a "network".

In a network, any two nodes can establish a connection (unless whoever is in charge does not allow it). Consumers can bypass retailers and go directly to manufacturers for wholesale or customized products. The former means that the boundaries between traditional roles begin to blur: if they want to, consumers can become retailers too. The latter means that every node in the industrial chain has more choices, which helps break vertical monopolies and improves efficiency. This may look like a deliberate elimination of retailers, but it is not. The Internet actually strengthens the retailer's role as an information intermediary: going directly to manufacturers costs consumers something, and retailers that integrate and match information well can profit from it.

However, we know that distributed systems produce a lot of redundant information. If the Internet merely turned the "chain" into a "network", the result would be information congestion and noise, and information could not be matched efficiently and accurately. The second element of the Internet's reconstruction of business models is therefore the emergence of platforms.

What a platform does is, in essence, information matching. Once the Internet has broken the linear industrial chain into nodes, something must take over what the chain used to do, namely matching supply and demand. Manufacturers sit on the B side (business) and consumers on the C side (customer). Consumers' demand for a certain type of product can be captured by manufacturers, and when enough identical demand appears across the platform, production becomes profitable for the manufacturer (decreasing marginal cost).

As we said earlier, production on the Internet has two characteristics: zero marginal cost and network externalities. As more and more nodes connect through a platform, they gradually become path-dependent on it, which means the platform's bargaining power in production and commerce keeps growing. Bargaining power means pricing power, and zero marginal cost means the platform's own costs are close to zero, so pricing power translates almost directly into a higher profit margin per node; network externalities, meanwhile, accelerate the arrival of new nodes. When both factors of profit grow at such a frightening speed, one can imagine how much a successful platform stands to gain.

Let us explain the three previously mentioned issues below:

Why can Internet companies create value beyond traditional manufacturing in just a few years? Why do startups in the Internet industry always like to burn money? Why have Chinese Internet companies started to decline recently?

Question one has already been answered. On to question two. Platforms that are still competing face unstable bargaining power, and new nodes have multiple platforms to choose from. Against similar opponents the outcome is uncertain (the bike-sharing wars are a typical case). Raising funds and burning money nonstop to win users is meant to leave users with no alternative in the future, so that the platform can then use its bargaining power to extract profit (for example, Didi).

This is the essence of the Internet platform business model: "winner takes all".

But a platform can in fact do more than that. Distorting the normal development of the market simply because the platform's characteristics allow it is short-sighted and unsustainable. A platform that wins by burning money is bound to "tax" its nodes later to recoup what it burned. At that point a strong new platform can appear and easily attract traffic with better service and lower prices. The newcomer carries no debt at that moment, but what about you? (Hello after the bike-sharing war is one such case.)

Network externalities are not a moat in themselves; rather, "good service = an extremely strong moat" and "bad service = the whole building collapses". An unhealthy business model like this cannot last long.

Now let me talk about what a platform can do. (This is actually off topic, but since I have started, let me finish.)

As mentioned earlier, the Internet restructures the industrial chain, turning the "chain" into a "network", and platforms fight to grab the nodes. But they overlook that network externalities presuppose the nodes' path dependence on the platform, and they overlook the differences between nodes. Take online car-hailing as an example. Drivers and passengers are two different kinds of node. Passengers hail rides fairly randomly and care mostly about the result, "getting to the destination by car"; as for which platform, it is whichever one is offering a discount. Believe me, during a car-hailing war passengers will download every app, and anyone who can freeload will rarely pass up the chance. Drivers are different. The relationship between driver and platform is more like a new kind of freelance employment: although drivers use several apps at once, they know very well how each one treats them.

In other words, drivers' loyalty is easier to cultivate, and drivers also play the more important role in a ride (drivers are the service providers: a driver who meets a bad passenger will not blame the platform, but when a passenger meets a bad driver the platform can hardly escape blame). The goal, therefore, is to use incentive mechanisms to align the interests of drivers and the platform as closely as possible. Whether through subsidies or other measures, incentives should lean toward drivers as much as possible. Someone will ask: what about passengers? Don't forget that, given network externalities, online car-hailing is still the better of the passenger's two options (taxi and online car-hailing), even if the rewards are slightly smaller.

So a more reasonable and healthier approach is to balance interests across the life cycle: pour more resources into long-term incentives for drivers so that their interests align with the platform's, and on the passenger side give priority to an experience that is more convenient and comfortable than a taxi (delivered by the drivers), with economic incentives second.

[Figure: schematic diagram of the models discussed above]

Everything above concerns the Internet, which connects computers (people) with computers (people). What happens when the Internet of Things joins in? Connections between computers (things) and computers (things), and between computers (things) and computers (people), will make the network grow by orders of magnitude. Think about how many objects each of us owns on average, and you can see how much complexity each new node adds to the network.
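One way to make this growth concrete is to count the possible pairwise links in a network. The sketch assumes that every node can in principle connect to every other node; the population and the number of devices per person are invented for illustration.

```python
def possible_links(nodes):
    """Number of distinct pairs among `nodes` fully connectable nodes."""
    return nodes * (nodes - 1) // 2


def links_added_by_new_node(nodes):
    """Potential new links created when one node joins `nodes` existing ones."""
    return possible_links(nodes + 1) - possible_links(nodes)


people = 1_000
devices_per_person = 20  # hypothetical average number of connected objects

print(possible_links(people))                             # people only
print(possible_links(people * (1 + devices_per_person)))  # people plus their things
print(links_added_by_new_node(people))                    # one newcomer's potential links
```

Because the number of potential links grows roughly with the square of the number of nodes, adding a few dozen devices per person multiplies the network's complexity far more than it multiplies the node count.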

3. The data island of web2

As mentioned earlier, Internet companies collect and match information by building platforms, and earn large profits from the zero marginal cost and network externalities of information production. As the Internet of Things, big data, cloud computing, artificial intelligence, and other technologies develop, human life will become more and more "digitalized": digital tools will handle payments, workflows, social connections, financial services, and more. During this digital migration, the time people spend "online" will keep increasing, and more human activity will be recorded as data and stored on the Internet.

Think about today: sleep monitors can get your sleep data, smart homes can get your daily-life data, smart travel tools can get your movement track, and ubiquitous surveillance can get all your physical and behavioral data... In the future, the Internet of Things will only make your store of data richer. Big data and cloud computing will let algorithms draw your digital portrait from data and precisely link data to individuals through search...

The data ecology of web2 is obviously difficult to meet the increasingly complex data production and demand activities.

Giant Internet companies profit by monopolizing user data, but they do not actually own the data; they merely obtain it by providing free services. They have no sound mechanism for protecting it (and, obviously, no incentive to build one), so privacy leaks are the norm. The data sits on their central servers, and they do not bother to record the details of every copy. Most importantly, different institutions each keep their own databases, built through wasteful duplicate collection; data storage and management are unsystematic and full of distortions; data islands form between institutions, with no means of interoperability; and illegal data trading is frequent, so the cost of trust is abnormally high.

When web3 and the Internet of Things come together, data will grow exponentially. If the above problems are still not resolved, how many inefficient market transactions will be born? The application value of new technologies will be greatly reduced.

2. What challenges exist in data usage?

Modern commercial activities are based on the market mechanism. According to different exchange objects, the market is usually divided into: commodity market, service market, technology market, financial market, labor market and information market.

Of these, the technology market can be split into technical goods and technical services and so does not need its own category, and services can in essence also be packaged as commodities. From my point of view, therefore, markets broadly divide into the commodity market, labor market, financial market, and information market. (Labor is singled out because there are people behind it, and human behavior is complex and unpredictable and cannot simply be defined as a commodity.)

The first three are markets we often encounter, but the information market is a more abstract concept. As the name implies, what is exchanged in the information market is information, such as business information, economic information, and talent information. Most of the information markets we know, such as real estate agencies, headhunters, HowNet, and user-information trading, have specialized information intermediaries. Users must pay to obtain the information; otherwise they bear a high cost to find it themselves.

1. Privacy boundary and privacy protection

The first issue that needs to be mentioned is privacy protection. I mentioned earlier that a lot of data will be recorded:

Sleep monitors can get your sleep data, smart homes can get your life data, smart travel tools can get your movement track, ubiquitous monitoring can get all your body and behavior data...

This data is valuable to companies that provide related services. For example, if your smart air conditioner detects that you like to run it in winter, that piece of data may be bought by the maker of some "Balabara Ion Heater", which then pushes you an ad for its product: "Healthier and more energy-efficient than air conditioners"... For the manufacturer, buying 1,000 such pieces of targeted data may cost far less than advertising on the homepage of some website. Ideally, of course, the money is paid to you; after all, you are the owner of this piece of data.

Here comes the question: what if you don’t want people to know that you like to turn on the air conditioner?

The crudest way, of course, is to rip out the smart air conditioner and replace it with an ordinary one. But what if the chip in an ordinary air conditioner can collect data too? It may be safer to go to the second-hand market and find an old-fashioned electric fan. The same goes for the smart refrigerator, which is best replaced with an ice cellar; and you can no longer take high-speed rail or pass through toll booths. Society progresses, but you regress into a primitive.

Rejecting new products and rejecting data collection is obviously unrealistic. The point is that individuals must have the right to choose for themselves which data is collected and which is not. But is this really realistic?

Anyone who has studied economics knows the concept of "moral hazard", which arises from information asymmetry after the fact. That is: if users choose which data is collected, they can choose to provide no data at all, or to provide false data for profit, because no one wants real data about their life to be known.

If things develop like this, it is meaningless to discuss data, and the digital economy will cease to exist. Because no one wants to go through a lot of trouble and finally learn that your name is "Kambu Nittlesweezybuck Nibvista I won't give you your real name and you can take your time to guess but I'll take the money first Scattering Lala Zhang".

Therefore, data collection must be objective and on by default, which requires a level of privacy protection that users themselves accept as sufficient. On this front, current cryptography offers some directions.

But the real question is often philosophical: How do you define the boundaries of privacy? Should privacy boundaries be chosen by individuals or groups? How to balance regulation and individual rights? How to deal with privacy externalities?

For example, suppose data is collected by default and the user chooses whether the collected data is encrypted. In an emergency, the government can unlock the data the user chose to "encrypt"; data ordinarily used for business is likewise left to the user's choice, and the user receives the proceeds. This seems like a good solution.

But in reality, what if someone is a terrorist, and the data he chooses not to release contains information that could locate him? Some will say: let the government unlock it! The problem is that the government does not know who the terrorist is before unlocking, so to find out it can only unlock everyone's data, which affects innocent users (their privacy is leaked). That is a negative externality. How should such externalities be handled?

2. Data externalities and establishment of property rights

When it comes to data externalities, two concepts must be introduced: non-rivalry and non-excludability. These two concepts are used to define public goods, and externality exists in the problem of public goods.

**Non-rivalry means that when one person consumes a product, it does not reduce or limit other people's consumption of that product.** Typically this implies zero or low marginal cost (so Internet products are usually non-rival). Most of the data we see can be reused; it does not burn up or change its content after being used once. By contrast, university admission places are limited: if I squeeze in above the score line, someone else must be squeezed out, so the college entrance examination is "rival".

**Non-excludability means that when one person consumes a product, other people cannot be excluded from consuming it too (or the cost of excluding them is high).** What does that mean? For example, if you go fishing in a pond, you cannot stop others from fishing there (unless the pond belongs to your family). Or if you take a stroll down the road in the middle of the night and see someone else strolling there, you cannot drive him off; you could pay him a lot of money to leave, but if he leaves and another person turns up, you still cannot drive that person off, because the road belongs to everyone.

Those that satisfy non-rivalry and non-excludability are public goods. There is a famous game in the problem of public goods: the "tragedy of the commons", which means that everyone wants to use public resources as much as possible for personal gain, which eventually leads to the collapse of public resources. This is because each person's use of the common resource creates a "negative externality" for everyone else. We know that in the Internet, externalities are positive. This stems from the zero marginal cost of information production, which public resources obviously do not have.

Regardless of whether externalities are positive or negative, the existence of externalities means that property rights are not clear enough. The market is unable to make reasonable prices for commodities whose property rights are not clear enough. How to treat data externalities?

First we need to classify data using the concepts of non-rivalry and non-excludability. Data that is non-rival and non-excludable should obviously be provided by the government or a public organization, with the proceeds going to them; weather forecasts and macroeconomic data are examples. Public data of this kind has one characteristic: it has nothing to do with individuals. This is the clearest category.

For rival/excludable data, the rights holders cannot be clearly separated during production, so the public and private content in the data cannot be separated either. For example, a company wants to find investment opportunities in City X using the daily-life data of ordinary people there. A total of 100,000 people in City X are willing to provide such data, but the company needs only 10,000. Data of this type has externalities: part of its content is shared, so the adoption of any one piece of data imposes a "negative externality" on the others and lowers their value.

As another example, I am not the only one who knows my listening data; the software that records it must know too, because I use that software to listen to songs. Apart from my own behavior, the rest of the data is essentially produced by the software. Does this mean the software also owns part of the property rights in my listening data?

3. Internet of things and data collection

The first two points both touch on data collection to some degree. For example: should data be collected automatically rather than at the individual's discretion? How can data collected under individual control be guaranteed authentic? How can automatic collection ensure that privacy is not violated? And what should the scope, method, and scale of data collection be?

Existing data collection probably occurs mainly through "surfing the Internet". For example, shopping habits and movement tracks can be obtained from payment and consumption records; a person's thoughts and views can be inferred from online speech; personal preferences can be obtained from browsing records, app download records, and so on. Smart homes, autonomous driving, surveillance, and the like, however, may represent another collection path with wider coverage: the Internet of Things.

The Internet of Things will fill people's lives with machines equipped with high-speed computing chips. In their daily operation these machines accumulate large amounts of data, which is computed, processed, and matched into databases. These richer details will sharpen big data's portrait of the individual, from simple behavioral habits to ways of thinking and psychological traits. On the one hand this matters greatly for the digital economy and social governance; on the other it creates an Orwellian privacy dilemma for the individual, not only because of the anxiety of constant monitoring, but also because once such important data leaks, the Internet can more or less pronounce the "death" of a citizen of the digital age.

4. Data value matching

When it comes to the data market, one of the issues that has to be said is the value matching of data.

What does that mean? In the commodity market we know very clearly what each commodity can do, and on that basis we set an expected price according to our own needs. For example, I am a farmer who can cut ten catties of firewood a day, and a catty of firewood sells for twenty yuan. I want to go to the market to buy an ax that will last thirty days. So I know the ax can cut firewood worth 6,000 yuan in total; I should earn 3,000 yuan for the hard labor of chopping, so my expected price for the ax is less than 3,000 yuan.
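The farmer's reasoning is simple arithmetic: 10 catties a day at 20 yuan per catty over 30 days gives 6,000 yuan of firewood, and after setting aside 3,000 yuan for the farmer's own labor, at most 3,000 yuan is left for the ax. A small sketch makes the calculation explicit; the function and parameter names are hypothetical.

```python
def max_tool_price(output_per_day, price_per_unit, lifetime_days, labor_income):
    """Upper bound on what a buyer should pay for a tool.

    Revenue the tool makes possible over its lifetime, minus the income
    the buyer expects for their own labor over the same period.
    """
    revenue = output_per_day * price_per_unit * lifetime_days
    return revenue - labor_income


ceiling = max_tool_price(output_per_day=10, price_per_unit=20,
                         lifetime_days=30, labor_income=3_000)
print(ceiling)  # 3000
```

The calculation only works because every input is known before the purchase, which is exactly what a data buyer lacks.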

But data marketplaces are different. There is a paradox in the discussion of the value of data: if I do not know the content of a piece of data, I cannot determine its value; but once I know the content of this piece of data, this piece of data has no value for me. This feature makes it very difficult for the data market to naturally complete value matching.

Fortunately, big data technology makes value discovery possible for data whose content cannot be seen at a glance. Data demanders can search for or mine the data they want, and the hard problem they now face is: how to verify that the data content is "correct"?

That is: if low-value data is disguised as high-value data, how can data demanders who cannot view the content in advance quickly filter to meet their needs?

There is a technology in cryptography that "convinces the verifier that a certain assertion is correct without providing any useful information to the verifier", which is called "zero-knowledge proof". However, how does the provider of zero-knowledge proof ensure that his motivation to provide correct assertions is not affected by high profits? It is a good idea to design an ex-ante incentive mechanism, but if the exact value of the data cannot be known, how to adjust the incentive amount?
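The article does not name a specific protocol. As one classic illustration of the idea, the sketch below is a toy version of Schnorr's interactive identification protocol, in which a prover convinces a verifier that it knows the secret x behind a public value y without revealing x. The parameters are deliberately tiny and insecure, and the example proves knowledge of a secret, not the correctness of a dataset; a real data market would need far more elaborate proof systems.

```python
import secrets

# Toy public parameters (far too small for real use): G generates a
# subgroup of prime order Q in the integers modulo P.
P, Q, G = 23, 11, 2


def keygen():
    x = secrets.randbelow(Q - 1) + 1      # prover's secret, in 1..Q-1
    return x, pow(G, x, P)                # (secret x, public value y = G^x mod P)


def commit():
    r = secrets.randbelow(Q)              # fresh one-time nonce
    return r, pow(G, r, P)                # (nonce r, commitment t = G^r mod P)


def respond(x, r, c):
    return (r + c * x) % Q                # response s = r + c*x mod Q


def verify(y, t, c, s):
    # Accept if G^s == t * y^c (mod P), which holds when s was built from x.
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keygen()
r, t = commit()
c = secrets.randbelow(Q)                  # verifier's random challenge
s = respond(x, r, c)
print(verify(y, t, c, s))                 # True for an honest prover
```

The verifier only ever sees y, t, c, and s. Because the nonce r is random and used once, the response s on its own does not expose x.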

5. Data valuation

There is another point that is easily overlooked: data valuation. Since there is a transaction, there must be a generally recognized valuation system, otherwise the market will be chaotic. Current data valuation methods include:

The cost method takes the cost of collecting, storing, and analyzing data as the basis for valuation. An obvious problem is that most data is not produced on purpose but is a by-product of other activities; most data is collected and stored together; and the property rights of most data are still hard to define. All of this makes the costs hard to apportion.

The income method forecasts the future cash flows of the data and discounts them. However, the utility that data generates is very hard to model. Take the value matching just discussed: if the match is wrong, the data may be worthless. Should that probability be folded into the expected value? In addition, the same data has completely different utility for different users, so it is hard to set a common standard.
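To show what the income method involves, here is a minimal discounted-cash-flow sketch that also folds in the probability of a correct match, as the question above suggests one might. The cash flows, the 10% discount rate, and the 60% match probability are all assumptions chosen for illustration.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of yearly cash flows (year 1, 2, ...) to today."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def expected_data_value(cash_flows, discount_rate, match_probability):
    """Weight the discounted value by the chance the data fits the buyer."""
    return match_probability * present_value(cash_flows, discount_rate)


flows = [1_000, 1_000, 1_000]   # hypothetical yearly income from the data
print(round(present_value(flows, 0.10), 2))
print(round(expected_data_value(flows, 0.10, 0.6), 2))
```

Every input here is a guess by a particular buyer, so two buyers can run the same formula on the same data and reach very different valuations.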

3. In which ways may web3 solve these problems?

1. What is web3?

I found a few definitions on the Internet.

These statements, together with my own summary, point to several defining features of web3: data property rights, community co-construction and sharing, open source, data transparency, individual value creation, and a value layer.

Data property rights: Individuals own their private data and can use it to create and capture value; private data is bounded by privacy technology. This contrasts with the monopoly of user data by web2 giants.

Community co-construction and sharing:

Open source: Open source is the premise of consensus and of joint construction and sharing. Open source is the future of algorithm-based trust mechanisms.

Data transparency: Data is recorded under consensus approval, so it is traceable and cannot be tampered with.

Individual value creation: Individuals can, as independent units, divide labor with others through the cooperation mechanisms established by algorithms. Governance issues of all kinds begin to become clear and simple.

Value layer: web3 is built on an underlying layer of monetary value, which provides incentives and guidance for confirming and exchanging data rights, for community co-construction and sharing, and for individual value creation.

The underlying technology of web3 is blockchain, which features distributed accounting, transaction traceability, tamper resistance, openness and transparency, programmable smart contracts, and a collaborative "algorithm + incentive mechanism" drive. I have also written an earlier article on this for readers who want to learn more.
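The traceability and tamper-resistance properties can be illustrated with a hash-chained ledger. The sketch below is a single-machine toy with hypothetical function names; it has no consensus, mining, or networking, and only shows why editing an old record is detectable.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a record that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append(
        {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def is_valid(chain):
    """Recompute every hash and link; any edited block breaks the check."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_block(ledger, "alice pays bob 5")
    append_block(ledger, "bob pays carol 2")
    print("valid before tampering:", is_valid(ledger))
    ledger[0]["payload"] = "alice pays bob 500"  # rewrite history
    print("valid after tampering:", is_valid(ledger))
```

On a real chain the same check is repeated independently by many nodes, which is what turns tamper evidence on one machine into tamper resistance for the network.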

2. Possible solutions to existing problems by blockchain

So what are the advantages of blockchain in solving the data market problems mentioned above?

Let’s take a look at the questions mentioned earlier. The summary is as follows, and my personal answer is attached:

Q: Giant Internet companies profit by monopolizing user data, but in essence they do not own that data; they merely obtain it by providing free services;

A: Every transaction a user makes on the blockchain is maintained by multiple miners, and the transaction records are open, transparent and queryable. At present, any project that needs this data completes its cold start by distributing tokens, which rewards users at the same time. As privacy and zero-knowledge proof technologies advance, users will in the future own their private data and its property rights, and will be able to decide independently how that data is used.

Q: They also do not have a perfect mechanism to protect these data (obviously there is no incentive to do so), and privacy leakage has become the norm;

A: The consensus mechanism of a public chain means that its security is not controlled by one or a few centers. Because blockchain design already combines the consensus mechanism with the incentive mechanism, no special incentive is needed beyond mining rewards. (Considering environmental impact and wasted resources, ETH is currently shifting from PoW to PoS. However, PoS is not a perfect mechanism either, and consensus mechanisms are still evolving.) Malicious behavior is also punished by the consensus mechanism according to the algorithm. The security weaknesses of a consensus mechanism come from malicious attacks that follow the mechanism's own rules, and as more nodes join, such opportunities become fewer and fewer.

Q: The data is stored on the central server of the giant, and they will not deliberately record the details of each copy;

A: Blockchain stores data in distributed ledgers maintained by decentralized miners. At present it does not seem possible to log every access to the data, but this is not necessary, because the records open to inspection have always been public and transparent. In the future, data that involves privacy will be protected by the corresponding algorithms, and any access will require paying a cost and holding a transaction record.

Q: Different institutions have their own databases, which come from invalid repetitive collection;

A: Because the underlying data is shared, blockchain users do not need to collect it repeatedly. They only need to use a modular front end or crawl the data themselves, and they can share the results with others. Sharing is not a concern, because the data is open and transparent.

Q: The storage and management of data is not systematic, and there are a lot of distortions;

A: Everything recorded on the chain is confirmed by the miners under the consensus mechanism. Because the ledger is distributed, data loss is not a problem; in case of serious disagreement, the chain is forked after a community vote, and the history is still faithfully recorded.

Q: Data islands are formed among institutions, and there is a lack of interoperability measures;

A: Data sharing. Modular products will be more conducive to interoperability.

Q: Illegal data transactions occur frequently, and the cost of trust is abnormally high;

A: Public data has no need to be traded illicitly. Transactions in non-public data are entirely free and are recorded without a separate trust process, because the algorithm takes care of it.

Q: How to define the boundaries of privacy? Should privacy boundaries be chosen by individuals or groups?

A: On the question of where privacy boundaries lie, there is a concept called the "reasonable expectation of privacy". It was proposed in the 1967 case Katz v. United States to settle the boundary of privacy rights. Federal agents had eavesdropped on the public telephone booth Katz was using, and Katz challenged this in court. The Supreme Court of the United States held that the Fourth Amendment "protects people, not places", meaning that as long as an individual does not intend an act to be public and deliberately avoids attention, the act can be protected even if it takes place in a public place. However, this concept has a fatal problem: no one knows whether the individual's intent is benign or malicious. As in the example I gave earlier, terrorists also do not want publicity and deliberately avoid attracting attention. Should such privacy be protected?

My personal view is that privacy should be protected as long as it has no externalities. Once personal privacy harms the outside world (a negative externality), someone needs to be held responsible. Individuals who generate negative externalities should pay the cost of restoring society to its original state, much as pollution rules define the right to discharge sewage.

However, as mentioned earlier, we cannot be sure who a negative externality comes from, so we could only find out by searching through everyone's privacy, and that behavior creates a negative externality of its own. Is there a technology that could solve this by having a machine answer any question from an information requester through zero-knowledge proof verification?

As for the latter question, I think the basic boundary should be chosen by the group and the objective boundary determined by the externality principle; the combination of the two is the legal privacy boundary. Within that legal boundary, individuals can freely choose either to keep their data private or to profit from its use.

Q: How to balance supervision and individual rights?

A: Pardon my ignorance.

Q: How to deal with privacy externalities?

A: Blockchain technology is driven by "algorithm + incentive mechanism". When a transaction occurs on the chain, the parties to it can be distinguished. If the ownership held by the different nodes in any transaction can be made clear, property rights can be divided on that basis, which addresses the problem of data externalities. (This is speculation on my part.) Privacy is also data, but privacy externalities still pose a problem: how to prevent harm in advance.

Q: The externality of data seems inevitable, how can we establish clear property rights for data?

A: See the answer above.

Q: Should data collection be spontaneous rather than controlled by individual choice? How can data collection under individual control ensure authenticity? How does spontaneous data collection ensure that privacy is not violated?

A: I think that, given the technical support of a secure and complete privacy algorithm, data collection should be spontaneous. The reason, as I said before, is that if collection is controlled by individuals, the data market will be polluted by a large amount of false data and will lose its reason to exist. If data collection under individual control is to ensure authenticity, it must come with a sufficient punishment mechanism. For example, once a participant is found to have produced false data, they are removed from the data market (which makes their right to data revenue worthless). However, while technology cannot yet guarantee the privacy and security of data collection, I think individuals should retain the right to choose whether to participate (opting out of data collection also means almost giving up the right to use data, because without machine assistance it is almost impossible for people to process data and use it effectively).
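The punishment mechanism described here (removal from the market once false data is discovered, so that the revenue right becomes worthless) can be sketched as follows. The class and method names are hypothetical, and how false data is actually detected is left outside the sketch.

```python
class DataMarket:
    """Toy registry of data providers with a permanent-ban penalty."""

    def __init__(self):
        self.active = set()        # providers currently allowed to sell
        self.banned = set()        # providers removed for false data
        self.pending_revenue = {}  # unpaid revenue per active provider

    def register(self, provider):
        """Admit a provider unless it was previously banned."""
        if provider in self.banned:
            raise PermissionError(f"{provider} was removed for false data")
        self.active.add(provider)
        self.pending_revenue.setdefault(provider, 0.0)

    def record_sale(self, provider, amount):
        """Credit revenue to an active provider."""
        if provider not in self.active:
            raise PermissionError(f"{provider} is not an active provider")
        self.pending_revenue[provider] += amount

    def report_false_data(self, provider):
        """Ban the provider and return the revenue it forfeits."""
        self.active.discard(provider)
        self.banned.add(provider)
        return self.pending_revenue.pop(provider, 0.0)


if __name__ == "__main__":
    market = DataMarket()
    market.register("provider-1")
    market.record_sale("provider-1", 10.0)
    forfeited = market.report_false_data("provider-1")
    print("forfeited revenue:", forfeited)
    print("still active:", "provider-1" in market.active)
```

The ban only deters if a removed participant cannot simply return under a new identity, which is the anonymity problem raised later in this section.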

Q: How to determine the matching "correctness" of data content and data title? That is: if low-value data is disguised as high-value data, how can data demanders who cannot view the content in advance quickly filter to meet their needs?

A: As with privacy externalities, I pin my hope on new technologies: you can always trust "algorithm + incentive mechanism". If you don't believe that, then read it as: I have always believed in "algorithm + incentive mechanism".

Q: How does the provider of zero-knowledge proof ensure that his motivation to provide correct assertions is not affected by high profits? It is a good idea to design an ex-ante incentive mechanism, but if the exact value of the data cannot be known, how to adjust the incentive amount?

A: As with privacy externalities, my ultimate expectation is that the role of the verifier will be played by artificial intelligence. The existing solution may be that once a verifier misbehaves, it is expelled from the set of nodes forever; but because of the anonymity of the blockchain, we still cannot judge from experience whether an address is good or bad. It is precisely because penalties are ineffective that verifiers have an incentive to make dishonest proofs when tempted by high returns. From some perspectives, a better result would be achieved if the zero-knowledge proof verifier were a trusted real-world entity (that is, centralization).

Q: In the process of using the market method to value data, how to define "similar data" of non-standardized data?

3. Web3 data market outlook

Let me summarize the article so far.

First, starting from the transitions in humanity's modes of production and the corresponding organizational forms, I pointed out two characteristics of information production: zero marginal cost and network externalities. With the help of these characteristics, the Internet is gradually reconstructing traditional industries and business models and moving people toward digitalization. In this trend the importance of data is becoming clear, but web2's use of data has various problems. In data market transactions specifically, the issues include privacy boundaries and protection, data externalities, difficulties in data collection, data value matching, and data valuation. Web3, based on blockchain technology, is an innovation on the traditional Internet. It hopes to solve many of these problems through the combination of "algorithm + incentive mechanism" and so make a data market possible.

So, what do I expect from the data marketplace?

First, the infrastructure of the data market. Existing technologies for privacy, data externalities, zero-knowledge proofs and so on need breakthroughs. A new high-concurrency, high-performance public chain will also be a hard requirement (and it needs to be sufficiently secure as well, which is very difficult). In my personal view, because many parts of the data market involve supervision, the setting of general rules, individual property rights and so on, building it on a public chain is unrealistic. The global data market is more likely to arrive after the technological breakthroughs have been made on public chains (breakthroughs generally do happen on public chains): a stable consortium chain would be built among mutually trusting countries, in line with the existing political structure, and developed once the underlying protocol rules have been negotiated and agreed.

However, parts of the data market will certainly get a head start on public chains. What data is involved? Public data, such as the usage records of dapps and the content users create on public chains, has no transaction value between users, but it is valuable to businesses (the B-side), which can be expected to use it for token airdrops (using the data to delineate target users). In the future, private data recorded under privacy algorithms will give rise to an initial peer-to-peer market, and the first markets to emerge will probably serve supporting services, such as mortgage and guarantee transactions.

Once privacy algorithms mature, institutions or whales will hide part of their transactions, because in traditional finance information is value. The on-chain token market will become more complicated, and ordinary users will bear more risk owing to the lack of supervision. Business models similar to paid knowledge may emerge, because algorithms can complete the related transactions automatically, which will be friendlier to individual creators.

In a broad sense, data transactions also include on-chain cultural and spiritual consumption. For example, the membership systems of video websites and online novel platforms are essentially non-rival but excludable data products; if such cultural products are not free to outsiders, they need the support of privacy algorithms or some other technical means (for example, OpenSea unlocking hidden content after an NFT is purchased). In the field of spiritual consumption there is one direction that has always been a favorite of "gentlemen" and that fits the characteristics of privacy and anonymity perfectly. I won't say what it is.

With all that said, the data market still has a long way to go. This is called an outlook for 2022, but by the time that day comes it may be 2032. Or maybe... maybe... it never comes?
