YBB Capital: With Sora Bursting onto the Scene, Will 2024 Be the First Year of the AI + Web3 Transformation?
YBB Capital
Invited Columnist
2024-02-24 10:30
Exploring the future of AI and Web3 integration: decentralized computing power, big data, Dapp innovation, and their far-reaching impact on industry transformation.

Original Author: YBB Capital Zeke

Introduction

On February 16, OpenAI announced Sora, its latest text-controlled video generation diffusion model, marking another milestone for generative AI with its ability to generate high-quality video across a wide range of visual data types. Unlike AI video tools such as Pika, which are still at the stage of generating a few seconds of video from several images, Sora trains by compressing videos and images into a latent space and decomposing them into spacetime patches, achieving scalable video generation. The model also demonstrates the ability to simulate both the physical and digital worlds, and its 60-second demos come close to a "universal simulator of the physical world."

In terms of construction, Sora follows the technical path of earlier GPT models, "source data-Transformer-diffusion-emergence," which means its maturation also requires computing power as an engine. And because video training requires far more data than text training, the demand for computing power will only increase further. We already discussed the importance of computing power in the AI era in our earlier article "Prospects of the Potential Track: Decentralized Computational Power Market," and with the recent surge in AI's popularity, a wave of computing power projects has appeared in the market, while Depin projects that benefit passively (storage, computing power, etc.) have also seen a run-up. So beyond Depin, what sparks can the intersection of Web3 and AI produce? What opportunities does this track hold? The main purpose of this article is to update and supplement the previous pieces and to reflect on the possibilities for Web3 in the AI era.

Three Major Directions of AI Development

Artificial intelligence is an emerging science and technology that aims to simulate, extend, and enhance human intelligence. Since its birth in the 1950s, and after more than half a century of development, AI has become a key technology driving transformation across society and industry. Along the way, the three research directions of symbolism, connectionism, and behaviorism have intertwined and become the foundation of today's rapid AI development.

Symbolism

Also known as logicism or rule-based approach, symbolism believes that simulating human intelligence through processing symbols is feasible. This approach uses symbols to represent and manipulate objects, concepts, and their relationships in the problem domain, and uses logical reasoning to solve problems. It has made significant achievements in expert systems and knowledge representation. The core idea of symbolism is that intelligent behavior can be achieved through the manipulation of symbols and logical reasoning, where symbols represent highly abstract representations of the real world.

Connectionism

Also known as the neural network approach, connectionism aims to achieve intelligence by imitating the structure and functionality of the human brain. This approach constructs networks consisting of numerous simple processing units (similar to neurons), and achieves learning by adjusting the connection strengths (similar to synapses) between these units. Connectionism emphasizes the ability to learn from data and generalize, making it particularly suitable for pattern recognition, classification, and continuous input-output mapping problems. Deep learning, as a development of connectionism, has made breakthroughs in fields such as image recognition, speech recognition, and natural language processing.

Behaviorism

Behaviorism is closely related to the research of biomimetics and autonomous intelligent systems, emphasizing that intelligent agents can learn through interaction with the environment. Unlike the previous two approaches, behaviorism does not focus on simulating internal representations or thought processes, but achieves adaptive behavior through the cycle of perception and action. Behaviorism believes that intelligence is manifested through dynamic interaction and learning with the environment. This approach is particularly effective when applied to mobile robots and adaptive control systems that need to operate in complex and unpredictable environments.

Although these three research directions have essential differences, they can also interact and integrate with each other in actual AI research and applications, jointly promoting the development of the AI field.

AIGC Principles Overview

The explosive growth of Artificial Intelligence Generated Content (AIGC) today is an evolution and application of connectionism. AIGC can mimic human creativity to generate novel content. These models are trained on large datasets with deep learning algorithms to learn the underlying structures, relationships, and patterns in the data; given a user's prompt, they produce novel, unique output including images, video, code, music, designs, translations, answers to questions, and text. AIGC today essentially rests on three pillars: deep learning (DL), big data, and large-scale computing power.

Deep Learning

Deep learning is a subfield of machine learning (ML), and deep learning algorithms are modeled on the neural networks of the human brain. The brain contains billions of interconnected neurons that work together to learn and process information; similarly, deep learning neural networks (artificial neural networks) are composed of multiple layers of artificial neurons working together inside a computer. These artificial neurons, software modules called nodes, use mathematical calculations to process data, and deep learning algorithms use them to solve complex problems.

The neural network can be divided into input layers, hidden layers, and output layers, with parameters connecting different layers.

  • Input Layer: The input layer is the first layer of the neural network and is responsible for receiving external input data. Each neuron in the input layer corresponds to a feature of the input data. For example, when processing image data, each neuron may correspond to a pixel value of the image;

  • Hidden Layers: The data from the input layer is passed into the deeper layers of the network. These hidden layers process information at different levels and adjust their behavior as they receive new information. Deep learning networks can have hundreds of hidden layers, which let them analyze a problem from many different angles. For example, to classify an image of an unknown animal, you would compare it with animals you already know, judging by the shape of its ears, the number of its legs, and the size of its pupils. Hidden layers in a deep neural network work the same way: if a deep learning algorithm is trying to classify animal images, each hidden layer processes a different feature of the animal and tries to classify it accurately;

  • Output Layer: The output layer is the last layer of a neural network, responsible for generating the network's output. Each neuron in the output layer represents a possible output category or value. For example, in a classification problem, each output layer neuron may correspond to a category, while in a regression problem, the output layer may have only one neuron whose value represents the prediction result;

  • Parameters: In a neural network, the connections between different layers are represented by weight and bias parameters, which are optimized during training so that the network can accurately identify patterns in the data and make predictions. Increasing the number of parameters enhances the model capacity of the neural network, that is, its ability to learn and represent complex patterns in the data, but it also increases the computational requirements (see the sketch after this list).
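To make the layer-and-parameter picture above concrete, the following is a minimal numpy sketch of a tiny feed-forward network. The layer sizes, ReLU activation, and softmax output are illustrative assumptions, not a description of any particular production model.

```python
import numpy as np

# Minimal sketch of the input / hidden / output layer structure described above.
# All sizes (4 input features, 8 hidden units, 3 output classes) are illustrative.
rng = np.random.default_rng(0)

# Parameters: weights and biases connect adjacent layers and are what training optimizes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # hidden layer -> output layer

def forward(x):
    """One forward pass through the network."""
    h = np.maximum(0, x @ W1 + b1)              # hidden layer with ReLU activation
    logits = h @ W2 + b2                        # output layer: one value per class
    e = np.exp(logits - logits.max())           # softmax turns values into probabilities
    return e / e.sum()

x = rng.normal(size=4)                          # one sample with 4 input features
print(forward(x))                               # predicted probabilities over 3 classes
```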

Big Data

To train effectively, neural networks usually require large amounts of diverse, high-quality data from multiple sources. This data forms the foundation for training and validating machine learning models. By analyzing big data, machine learning models can learn patterns and relationships in the data, enabling predictions or classifications.

Large-scale Computing Power

The demand for large-scale computing power comes from several factors combined: the multi-layer structure of neural networks and their huge number of parameters; the need to process big data; the iterative training process, in which every pass requires forward propagation and backpropagation through each layer (activation calculations, loss calculation, gradient computation, and weight updates); high-precision arithmetic; parallel computing requirements; optimization and regularization techniques; and model evaluation and validation.
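To illustrate why this iterative loop is so compute-hungry, here is a toy training loop in numpy showing one forward pass, loss calculation, gradient computation, and weight update per step. The single linear layer and synthetic data are assumptions chosen to keep the example tiny; real models repeat the same loop over billions of parameters and samples.

```python
import numpy as np

# Toy version of the iterative training process described above.
# Synthetic data and a single linear layer are illustrative assumptions.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 10))                  # 256 samples, 10 features
true_w = rng.normal(size=(10, 1))
y = X @ true_w + 0.1 * rng.normal(size=(256, 1))

w = np.zeros((10, 1))                           # parameters to be learned
lr = 0.1                                        # learning rate
for step in range(100):
    pred = X @ w                                # forward propagation
    loss = ((pred - y) ** 2).mean()             # loss function (mean squared error)
    grad = 2 * X.T @ (pred - y) / len(X)        # gradient (backpropagation, here in closed form)
    w -= lr * grad                              # weight update
print(f"final loss: {loss:.4f}")
```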

Sora

As the latest video-generation AI model released by OpenAI, Sora represents a major advance in AI's ability to process and understand diverse visual data. By adopting a video compression network and spacetime patching techniques, Sora transforms massive amounts of visual data captured on different devices around the world into a unified representation, enabling efficient processing and understanding of complex visual content. Built on a text-conditioned diffusion model, Sora can generate videos or images highly tailored to a given text prompt, demonstrating strong creativity and adaptability.
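OpenAI has not published Sora's exact architecture, but the spacetime-patch idea it describes can be sketched roughly as follows: a video tensor is cut into small space-time blocks that a Transformer can treat as tokens. The patch sizes below are illustrative assumptions.

```python
import numpy as np

# Rough sketch of turning a video into spacetime patch tokens.
# Patch sizes (4 frames x 16 x 16 pixels) are illustrative assumptions.
def to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """Split a (T, H, W, C) video into flattened (num_patches, patch_dim) tokens."""
    T, H, W, C = video.shape
    video = video[: T - T % pt, : H - H % ph, : W - W % pw]   # crop to patch multiples
    t, h, w = video.shape[0] // pt, video.shape[1] // ph, video.shape[2] // pw
    patches = video.reshape(t, pt, h, ph, w, pw, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)          # group each space-time block
    return patches.reshape(t * h * w, pt * ph * pw * C)

video = np.random.rand(16, 64, 64, 3)           # 16 frames of 64x64 RGB
tokens = to_spacetime_patches(video)
print(tokens.shape)                             # (64, 3072): 64 patch tokens of dimension 3072
```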

However, although Sora has made breakthroughs in video generation and in simulating interactions with the real world, it still faces limitations: the accuracy of its physical-world simulation, consistency over long videos, understanding of complex textual instructions, and the efficiency of training and generation. More fundamentally, Sora relies on OpenAI's near-monopoly on computing power and its first-mover advantage to brute-force its way along the traditional "big data-Transformer-diffusion-emergence" path, while other AI companies still have the potential to overtake it via technical shortcuts.

Although Sora itself has little connection to blockchain, I believe that over the next year or two its influence will force other high-quality AI generation tools to emerge and develop rapidly, and these will radiate into many areas of Web3 such as GameFi, social, creative platforms, and Depin. It is therefore worth having a general understanding of Sora, as it may be a focal point for thinking about how AI will effectively integrate with Web3 in the future.

Four Paths of AI x Web3

As mentioned above, we can see that the underlying foundation required for generative AI can be summarized into three aspects: algorithms, data, and computing power. On the other hand, from the perspectives of versatility and generative performance, AI is a tool that disrupts production methods. Meanwhile, blockchain has two major functions: reconstructing production relations and decentralization. Therefore, I personally believe that the collision between the two can give rise to the following four paths:

Decentralized Computing Power

Because I have already written related articles in the past, the main purpose of this section is to update the latest situation in the computing power field. When it comes to AI, computing power is always a factor that cannot be ignored, and since the birth of Sora the scale of AI's demand for computing power has become hard to imagine. At the 2024 World Economic Forum in Davos, Switzerland, OpenAI CEO Sam Altman bluntly stated that computing power and energy are currently the biggest constraints, and that in the future their importance may even be equivalent to currency. On February 10, Sam Altman announced an astonishing plan on Twitter: to raise $7 trillion (equivalent to roughly 40% of China's 2023 GDP) to reshape the global semiconductor industry and build a chip empire. When I previously wrote about computing power, my imagination was limited to national blockades and corporate monopolies; that a single company now wants to control the global semiconductor industry is genuinely crazy.

The importance of decentralized computing power is therefore self-evident: blockchain's characteristics can indeed address the extreme concentration of computing power and the high price of specialized GPUs. From the perspective of AI's needs, the use of computing power splits into two directions: inference and training. Projects focused on training are still very few, because designing a decentralized network that coordinates with neural network training, and meeting the extremely high hardware requirements, make it a very challenging direction. Inference, by comparison, is simpler: its decentralized network design is less complicated and its hardware and bandwidth requirements are lower, making it the more mainstream direction at present.

The imagination space of the decentralized computing power market is huge, often tied to the keyword "trillions," and it is also the most easily hyped topic of the AI era. However, judging from the recent flood of projects, the vast majority are simply riding the hype. They wave the flag of decentralization but never discuss the inefficiency of decentralized networks, and their designs are highly homogeneous, with many projects looking very similar (one-click L2-plus-mining designs), which may ultimately end in a mess. In such a state, it is very hard to break into the traditional AI field at all.

Algorithm and Model Collaboration System

Machine learning algorithms refer to algorithms that can learn patterns and models from data and make predictions or decisions based on them. Algorithms are technologically intensive because their design and optimization require deep expertise and technological innovation. Algorithms are the core of training AI models, defining how data is transformed into useful insights or decisions. Common generative AI algorithms, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformers, are designed for specific domains (such as painting, speech recognition, translation, video generation) or purposes, and specialized AI models are trained through these algorithms.

With so many algorithms and models, each has its own strengths. Can we integrate them into a versatile model? Bittensor, which has recently gained popularity, is a leader in this direction. It enables different AI models and algorithms to collaborate and learn through mining incentives, creating more efficient and versatile AI models. Other projects in the same direction include Commune AI (code collaboration). However, algorithms and models are valuable assets for AI companies and are not readily available for borrowing.

The narrative of AI collaboration is fascinating: the collaborative ecosystem uses blockchain's advantages to break down the drawback of isolated AI algorithm silos. Whether it can create corresponding value, however, remains unknown. After all, leading AI companies are highly capable of updating, iterating, and integrating their closed-source algorithms and models; OpenAI, for example, went from early text-generation models to multi-domain generation models in under two years. Projects like Bittensor may need to take a different approach in the domains their models and algorithms target.

Decentralized Big Data

In simple terms, feeding AI with private data and labeling data fit naturally with blockchain; the key is preventing junk and malicious data, and data storage can also benefit projects such as FIL and AR and the broader Depin sector. From a more complex perspective, using blockchain data for machine learning (ML) to address the accessibility of blockchain data is also an interesting direction (as explored by Giza).

In theory, blockchain data is accessible at any time and reflects the state of the entire blockchain. However, for people outside the blockchain ecosystem, accessing this massive amount of data is not easy. Storing a complete blockchain requires extensive expertise and a large amount of specialized hardware resources. To overcome the challenges of accessing blockchain data, several solutions have emerged in the industry. For example, RPC providers offer access to nodes through APIs, and indexing services make data extraction possible through SQL and GraphQL, both of which play a crucial role in solving the problem. However, these methods have limitations. RPC services are not suitable for high-density use cases that require a large number of data queries and often fail to meet the demand. Meanwhile, although indexing services provide a more structured way to retrieve data, the complexity of the Web3 protocol makes it extremely difficult to build efficient queries, sometimes requiring writing hundreds or even thousands of lines of complex code. This complexity is a significant barrier for general data practitioners and those with little knowledge of Web3 details. The cumulative effect of these limitations highlights the need for a more accessible and usable method of obtaining blockchain data, which can promote wider application and innovation in the field.
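The two access patterns described above can be sketched as follows. The endpoint URLs are placeholders (substitute a real RPC provider or indexing service); the JSON-RPC call follows the standard Ethereum spec, while the GraphQL field names are illustrative assumptions about a typical indexer's schema.

```python
import json
import urllib.request

RPC_URL = "https://rpc.example.com"                    # placeholder RPC provider endpoint
GRAPH_URL = "https://indexer.example.com/graphql"      # placeholder indexing service endpoint

def rpc_call(method, params):
    """One raw JSON-RPC request, e.g. fetching the latest block number."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    req = urllib.request.Request(
        RPC_URL, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

def graphql_query(query):
    """One structured query against an indexing service."""
    req = urllib.request.Request(
        GRAPH_URL, data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]

# Point lookups suit RPC; bulk, structured retrieval suits an indexer.
latest_block = rpc_call("eth_blockNumber", [])
transfers = graphql_query("{ transfers(first: 5) { from to value } }")
```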

Therefore, by combining ZKML (zero-knowledge machine learning, which reduces the on-chain burden of machine learning) with high-quality blockchain data, it may be possible to create datasets that solve blockchain data accessibility. AI can greatly lower the threshold for accessing blockchain data, so over time developers, researchers, and ML enthusiasts will gain access to more high-quality, relevant datasets for building effective and innovative solutions.

Empowering Dapps with AI

Since ChatGPT took off in 2023, empowering Dapps with AI has become a very common direction. Broadly applicable generative AI can be accessed through APIs to make applications such as data analysis platforms, trading bots, and blockchain encyclopedias simpler and smarter. It can also act as a chatbot (such as Myshell) or an AI companion (Sleepless AI), or even generate NPCs in blockchain games. However, because the technical barrier is low, most of these products are little more than light fine-tuning on top of an API, their integration with the underlying project is far from seamless, and so they are rarely talked about.
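As one concrete example of the "access generative AI through an API" pattern, the sketch below calls OpenAI's chat completions endpoint to answer a question for a hypothetical blockchain encyclopedia bot. The model name, system prompt, and question are illustrative; a real Dapp would typically wrap such calls behind its own backend.

```python
import json
import os
import urllib.request

# Illustrative call to a hosted generative-AI API (here, OpenAI's chat completions endpoint).
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]                 # assumes the key is set in the environment

def ask(question: str) -> str:
    payload = {
        "model": "gpt-3.5-turbo",                      # illustrative model choice
        "messages": [
            {"role": "system", "content": "You are a blockchain encyclopedia bot."},
            {"role": "user", "content": question},
        ],
    }
    req = urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

print(ask("What is an ERC-20 token?"))
```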

However, after the arrival of Sora, I personally believe that the direction of AI empowering GameFi (including the metaverse) and creative platforms will be the focus of attention. Because of the bottom-up nature of the Web3 field, it is definitely difficult to produce products that can compete with traditional games or creative companies. However, the emergence of Sora is likely to break this dilemma (perhaps within two to three years). Based on Sora's demo, it already has the potential to compete with micro-drama companies. The active community culture of Web3 can also generate a lot of interesting ideas. When the only limitation is imagination, the barriers between bottom-up industries and top-down traditional industries will be broken.

Conclusion

With the continuous improvement of generative AI tools, we will experience more epoch-making "iPhone moments" in the future. Although many people scoff at the combination of AI and Web3, I actually think that the current direction is mostly fine. There are only three key points that need to be addressed: necessity, efficiency, and fit. Although the integration of the two is still in the exploration stage, it does not hinder this track from becoming the mainstream of the next bull market.

We should stay curious and open to new things. Historically, the shift from horse-drawn carriages to cars happened in a flash, just as with inscriptions and the NFTs of the past. Holding too many prejudices will only cause you to miss opportunities.
