Original article by @Web3_Mario
I have been looking for a new project direction recently. While working on product design, I ran into a technology stack I had never touched before, so I did some research and organized my notes to share with you. In short, zkTLS is a new technology that combines zero-knowledge proofs (ZKP) with TLS (Transport Layer Security). In the Web3 track it is mainly used in on-chain virtual machine environments, where it lets a smart contract verify the authenticity of off-chain HTTPS data submitted to it without trusting a third party. Authenticity here covers three aspects: the data really does come from a given HTTPS resource, the returned data has not been tampered with, and the timeliness of the data can be guaranteed. With this cryptographic mechanism, on-chain smart contracts gain the ability to access off-chain Web2 HTTPS resources in a trusted manner, breaking down data silos.
What is the TLS protocol?
To better understand the value of zkTLS, it helps to briefly review the TLS protocol. TLS (Transport Layer Security) provides encryption, authentication and data integrity for network communication, ensuring the secure transmission of data between clients (such as browsers) and servers (such as websites). If you are not a network developer, you may still have noticed that some website addresses are prefixed with https and others with http. When you visit an http site, mainstream browsers warn that it is not secure. When you visit an https site, you may occasionally run into warnings such as "Your connection is not private" or an HTTPS certificate error. These warnings exist because of the checks the TLS protocol performs.
Specifically, HTTPS is the HTTP protocol running on top of TLS, which ensures the privacy and integrity of the transmitted information and makes the authenticity of the server verifiable. HTTP on its own transmits plain text and cannot verify the authenticity of the server, which creates several security issues:
1. The information transmitted between you and the server may be monitored by a third party, resulting in privacy leakage;
2. You cannot verify the authenticity of the server, that is, whether your request has been hijacked by a malicious node that returns malicious information;
3. You cannot verify the integrity of the returned information, that is, whether it has been altered or partially lost in transit.
The TLS protocol was designed to solve these problems. A quick note: some of you may know the SSL protocol. TLS 1.0 was developed from SSL 3.0 (internally it even identifies itself as version 3.1) and was renamed when it was standardized. The two are closely related but not identical, and SSL itself is now deprecated. Even so, in many contexts the two terms are still used interchangeably.
TLS addresses the problems above in three main ways:
1. Encrypted communication: Use symmetric encryption (AES, ChaCha20) to protect data and prevent eavesdropping.
2. Identity authentication: Use a digital certificate (such as an X.509 certificate) issued to the site's organization by a trusted third party (a certificate authority) to verify the server's identity and prevent man-in-the-middle (MITM) attacks.
3. Data integrity: Use HMAC (Hash-based Message Authentication Code) or AEAD (Authenticated Encryption with Associated Data) to ensure that the data has not been tampered with.
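To make the data integrity point concrete, here is a minimal sketch using only Python's standard library. The key and the JSON payloads are made-up values for illustration; in real TLS the key would come out of the handshake, and TLS 1.3 uses AEAD ciphers rather than a standalone HMAC over each record.

```python
import hashlib
import hmac


def tag_message(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag_message(key, message), tag)


# Hypothetical values: a real session key is negotiated, never hard-coded.
session_key = b"hypothetical-session-key"
original = b'{"symbol": "BTC", "price": "65000"}'
tampered = b'{"symbol": "BTC", "price": "1"}'

tag = tag_message(session_key, original)

print(verify_message(session_key, original, tag))  # True
print(verify_message(session_key, tampered, tag))  # False
```

Anyone who changes the message without knowing the key cannot produce a matching tag, which is what lets the receiver detect tampering.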
Let's briefly walk through how HTTPS uses TLS during a data exchange. The whole process has two stages. The first is the handshake stage, in which the client and the server negotiate security parameters and establish an encrypted session. The second is the data transmission stage, in which the session key is used for encrypted communication. The process breaks down into four steps:
1. The client sends ClientHello:
The client (such as a browser) sends a ClientHello message to the server, which includes:
- Supported TLS versions (such as TLS 1.3) 
- Supported encryption algorithms (cipher suites, such as AES-GCM, ChaCha20) 
- Client Random (used for key generation) 
- Key sharing parameters (such as ECDHE public key) 
- SNI (Server Name Indication) (optional, tells the server which domain name is being requested, so that multiple HTTPS sites can share one IP address) 
Its purpose is to let the server know the client's encryption capabilities and prepare security parameters.
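2. The server sends ServerHello:
The server responds with a ServerHello message. In TLS 1.3 the certificate and Finished messages follow it in the same flight as separate handshake messages. Taken together, the server's reply includes: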
- Selected encryption algorithm 
- Server Random 
- The server's certificate (X.509 certificate) 
- The server's key sharing parameters (such as ECDHE public key) 
- Finished message (used to confirm that the handshake is complete) 
Its purpose is to let the client know the identity of the server and confirm security parameters.
3. Client authenticates server:
The client does the following:
- Verify the server certificate: Ensure that the certificate is issued by a trusted CA (certificate authority) and verify whether the certificate has expired or been revoked; 
- Calculate the shared key: Combine the client's own ECDHE private key with the server's ECDHE public key to compute the session key, which is used for symmetric encryption (such as AES-GCM) in subsequent communications. 
- Send Finished message: prove the integrity of handshake data and prevent man-in-the-middle attacks (MITM). 
Its purpose is to ensure that the server is trustworthy and to generate session keys.
4. Start encrypted communication:
The client and server now communicate in encrypted form using the negotiated session key.
- Symmetric encryption (such as AES-GCM, ChaCha20) is used to encrypt data, because it is fast while still being secure. 
- Data integrity protection: Use AEAD (such as AES-GCM) to prevent tampering. 
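The four steps above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not real TLS: it swaps ECDHE for classic Diffie-Hellman over a small prime (insecure, chosen only so the standard library suffices), it skips certificates entirely, and the helper names are hypothetical. What it does show is the core idea that each side combines its own private key with the peer's public key and arrives at the same session key.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group. NOT secure; real TLS 1.3 uses ECDHE groups.
P = 2**127 - 1  # a Mersenne prime
G = 3


def dh_keypair():
    """Return a (private, public) key pair for the toy group."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def derive_session_key(own_private, peer_public, client_random, server_random):
    """Mix the shared secret with both randoms (a stand-in for the TLS key schedule)."""
    shared = pow(peer_public, own_private, P)
    material = shared.to_bytes(16, "big") + client_random + server_random
    return hashlib.sha256(material).digest()


def finished_mac(session_key, transcript):
    """MAC over the handshake transcript, in the spirit of the Finished message."""
    digest = hashlib.sha256(transcript).digest()
    return hmac.new(session_key, digest, hashlib.sha256).digest()


# Step 1: the client prepares its random and key share (ClientHello).
client_priv, client_pub = dh_keypair()
client_random = secrets.token_bytes(32)

# Step 2: the server prepares its random and key share (ServerHello).
server_priv, server_pub = dh_keypair()
server_random = secrets.token_bytes(32)

# Step 3: each side uses its OWN private key and the PEER's public key.
client_key = derive_session_key(client_priv, server_pub, client_random, server_random)
server_key = derive_session_key(server_priv, client_pub, client_random, server_random)

# Both sides MAC the same transcript; a mismatch would reveal tampering.
transcript = (client_random + client_pub.to_bytes(16, "big")
              + server_random + server_pub.to_bytes(16, "big"))
client_finished = finished_mac(client_key, transcript)
server_finished = finished_mac(server_key, transcript)

# Step 4: both sides now hold the same key for symmetric encryption.
print(client_key == server_key)  # True
print(hmac.compare_digest(client_finished, server_finished))  # True
```

Note that the private keys never leave their owners. This detail matters later: a fully transparent on-chain environment has nowhere to keep such a secret.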
After these four steps, the problems of plain HTTP are effectively solved. However, this basic technology, so widely used in the Web2 world, causes trouble for Web3 application development, especially when an on-chain smart contract wants to access off-chain data. Because externally fetched data cannot be independently checked by every node, on-chain virtual machines do not expose the ability to call external data. This keeps all data traceable and thereby protects the security of the consensus mechanism.
However, after a series of iterations, developers found that DApps still needed off-chain data, so a series of oracle projects emerged, such as Chainlink and Pyth. They break down data silos by acting as a relay bridge between on-chain and off-chain data. To ensure the trustworthiness of the relayed data, these oracles are generally secured by a PoS-style mechanism: the cost of malicious behavior for a relay node is higher than the benefit, so from an economic point of view nodes will not feed wrong information to the chain. For example, if we want to use the weighted price of BTC across centralized exchanges such as Binance and Coinbase in a smart contract, we need to rely on these oracles to fetch and aggregate the data off-chain and then write it to an on-chain smart contract for storage before we can use it.
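As a rough illustration of the off-chain aggregation step, here is a minimal sketch of a weighted price calculation. Weighting by traded volume is my assumption (the weighting scheme is left open above), and the prices and volumes are invented numbers, not real market data.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    exchange: str
    price: float   # quoted BTC price in USD (hypothetical)
    volume: float  # traded volume used as the weight (hypothetical)


def volume_weighted_price(quotes):
    """Average the prices, weighting each one by its traded volume."""
    total_volume = sum(q.volume for q in quotes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(q.price * q.volume for q in quotes) / total_volume


quotes = [
    Quote("Binance", 65000.0, 300.0),
    Quote("Coinbase", 65100.0, 100.0),
]

print(volume_weighted_price(quotes))  # 65025.0
```

In an oracle network this calculation is repeated by many nodes, and only the agreed result is written on-chain.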
What problem does zkTLS solve?
However, people have found that this oracle-based approach to data acquisition has two problems:
1. Excessive cost: To guarantee that the data an oracle delivers to the chain is genuine and untampered, it relies on a PoS consensus mechanism. The security of PoS depends on the amount of staked funds, which creates a maintenance cost. In addition, PoS consensus usually involves a lot of redundant data exchange, because the data set has to be transmitted, computed and aggregated repeatedly across the network before consensus is reached, and this also raises the cost of using the data. As a result, in order to acquire customers, oracle projects usually maintain only the most mainstream data for free, such as the prices of major assets like BTC. For anything more specific, you have to pay. This hinders application innovation, especially for long-tail and customized needs.
2. Low efficiency: PoS consensus generally takes a certain amount of time, which makes on-chain data lag behind. This is a poor fit for high-frequency access scenarios, because there is a large delay between the data available on-chain and the actual off-chain data.
zkTLS emerged to solve these problems. Its main idea is to introduce zero-knowledge proofs (ZKP), so that the on-chain smart contract can act as a third party and directly verify that the data provided by a node really is the data returned by a given HTTPS resource and has not been tampered with. This avoids the high usage cost that the consensus algorithm imposes on traditional oracles.
Some readers may ask why the ability to call Web2 APIs is not simply built into the on-chain VM environment. The answer is that it cannot be. The on-chain environment maintains a closed data set in order to keep all data traceable: during consensus, every node must apply the same evaluation logic, or an objective verification logic, to the correctness of a piece of data or an execution result. This ensures that in a completely trustless environment, the honest majority of nodes can rely on their redundant copies of the data to judge whether a result is genuine. For Web2 data, however, it is difficult to build such a unified evaluation logic, because different nodes may get different results when accessing the same Web2 HTTPS resource due to network delays. This makes consensus harder, especially for high-frequency data. Another key issue is that the security of TLS, which HTTPS depends on, relies on the client-generated random number (Client Random, used for key generation) and the key sharing parameters to negotiate encryption keys with the server. The on-chain environment, as we know, is open and transparent. If a smart contract had to hold the random number and key sharing parameters, this key material would be exposed, compromising data privacy.
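The consensus problem is easy to see in a small simulation. The sketch below is purely illustrative: `fake_https_price` is a hypothetical stand-in for a Web2 price API whose answer drifts over time, and the node names and request times are made up. Each node hashes what it fetched, and consensus needs those hashes to match.

```python
import hashlib


def fake_https_price(request_time_ms: int) -> str:
    """Hypothetical Web2 price API: the quoted price changes every 100 ms."""
    return str(65000 + request_time_ms // 100)


def state_hash(value: str) -> str:
    """Hash of the value a node would commit to during consensus."""
    return hashlib.sha256(value.encode()).hexdigest()


def nodes_agree(values) -> bool:
    """Consensus needs every node to arrive at the same hash."""
    return len({state_hash(v) for v in values}) == 1


# Each node calls the API itself, at a slightly different moment.
request_times_ms = {"node_a": 0, "node_b": 120, "node_c": 260}
fetched = [fake_https_price(t) for t in request_times_ms.values()]
print(fetched)               # ['65000', '65001', '65002']
print(nodes_agree(fetched))  # False

# If one value is submitted and every node only verifies it, they agree.
submitted = fake_https_price(0)
print(nodes_agree([submitted] * 3))  # True
```

The second case is the pattern oracles and zkTLS both rely on: a single submitted value plus a way for every node to check it deterministically.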
zkTLS takes a different route. Its idea is to use cryptographic guarantees in place of the costly consensus mechanism that traditional oracles rely on to make data trustworthy. This is similar to how ZK-Rollups improve on Optimistic Rollups in L2. Specifically, an off-chain relay node requests a given HTTPS resource and then generates a zero-knowledge proof covering the resource it obtained, the relevant CA certificate verification information, a proof of timing, and a data integrity proof based on HMAC or AEAD. The necessary verification information and verification algorithm are maintained on-chain, so that the smart contract can verify the authenticity, timeliness, and reliability of the data source without any key information being exposed. The specific algorithm details will not be discussed here, and interested readers can study them in depth on their own.
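To give a feel for what the verifying side checks, here is a deliberately simplified sketch. It is not a zkTLS implementation and involves no zero-knowledge proof: the `commitment` field is only a plain hash placeholder for where a real proof would go, and every name in it (`Attestation`, `TRUSTED_SERVERS`, `MAX_AGE_SECONDS`, `api.example.com`) is a hypothetical I made up. It only mirrors the three checks named above: source, timing, and integrity.

```python
import hashlib
from dataclasses import dataclass, replace

TRUSTED_SERVERS = {"api.example.com"}  # hypothetical allowlist kept "on-chain"
MAX_AGE_SECONDS = 60                   # hypothetical freshness window


@dataclass(frozen=True)
class Attestation:
    server_name: str   # which HTTPS resource the data claims to come from
    timestamp: int     # when the response was obtained (Unix seconds)
    response: bytes    # the data submitted to the contract
    commitment: bytes  # PLACEHOLDER for the proof; not zero-knowledge


def commit(server_name: str, timestamp: int, response: bytes) -> bytes:
    """Length-prefixed SHA-256 over the three public fields."""
    h = hashlib.sha256()
    for part in (server_name.encode(), str(timestamp).encode(), response):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def verify(att: Attestation, now: int) -> bool:
    """Run the three checks; a real verifier would check a ZK proof instead."""
    source_ok = att.server_name in TRUSTED_SERVERS
    fresh = 0 <= now - att.timestamp <= MAX_AGE_SECONDS
    intact = commit(att.server_name, att.timestamp, att.response) == att.commitment
    return source_ok and fresh and intact


now = 1_700_000_000
body = b'{"symbol": "BTC", "price": "65000"}'
good = Attestation("api.example.com", now - 5, body,
                   commit("api.example.com", now - 5, body))

print(verify(good, now))                              # True
print(verify(replace(good, response=b"{}"), now))     # False (tampered)
print(verify(good, now + 3600))                       # False (stale)
```

Unlike this placeholder, a real proof would also convince the contract that the response actually came out of a TLS session with the named server, which is exactly the part that needs the heavy cryptography.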
The biggest benefit of this technical solution is that it lowers the cost of making Web2 HTTPS resources usable on-chain. This has stimulated many new demands, especially lowering the cost of getting long-tail asset prices on-chain and using authoritative Web2 websites for on-chain KYC, which in turn improves the technical architecture of DID and Web3 games. Of course, zkTLS also puts pressure on existing Web3 companies, especially the current mainstream oracle projects. To cope with this, industry giants such as Chainlink and Pyth are actively following research in related directions, trying to stay dominant through this round of technological iteration. It will also give rise to new business models, such as switching from time-based charging to usage-based charging, Compute as a Service, and so on. The difficulty here is the same as for most ZK projects: how to reduce computing costs enough to make the technology commercially viable.
In summary, when designing products, it is worth following the development of zkTLS and integrating this technology stack where appropriate. You may find some new directions in business innovation and technical architecture.

