Compilation of the original text: Deep Tide TechFlow
Zero-Knowledge Machine Learning (ZKML) is an area of research and development that has recently been making waves in the cryptography community. But what is it, and what is it good for? First, let's break the term down into its two components and explain what each one is.
What is ZK?
A zero-knowledge (ZK) proof is a cryptographic protocol in which one party (the prover) can prove to another party (the verifier) that a given statement is true without revealing any information beyond the fact that the statement is true. The field is advancing quickly on several fronts, from theoretical research to protocol implementations and applications.
ZK provides two main "primitives" (or building blocks). The first is the ability to create a proof of computational integrity for a given computation, where verifying the proof is much easier than performing the computation itself. (We call this property "succinctness".) The second is the option to hide parts of the computation while still guaranteeing its correctness. (We call this property "zero knowledge".)
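The gap between doing a computation and checking it can be illustrated with Freivalds' algorithm, a classical randomized check for matrix products. The sketch below is only an analogy: it is neither zero-knowledge nor a succinct cryptographic proof, and the function names are illustrative rather than taken from any ZK library.

```python
import random


def matmul(a, b):
    """Full matrix product: about n^3 multiplications for n x n inputs."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def matvec(m, v):
    """Matrix-vector product: about n^2 multiplications."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def freivalds_check(a, b, c, rounds=3, rng=None):
    """Probabilistically check the claim a @ b == c without recomputing a @ b.

    Each round compares a @ (b @ r) with c @ r for a random vector r,
    which needs only matrix-vector products. Drawing r from a large range
    makes it very unlikely that a wrong c passes a round.
    """
    rng = rng or random.Random()
    width = len(b[0])
    for _ in range(rounds):
        r = [rng.randrange(1, 2**61) for _ in range(width)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False  # the claimed product is definitely wrong
    return True  # the claimed product is correct with high probability


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
claimed = matmul(a, b)  # the "prover" does the expensive work
accepted = freivalds_check(a, b, claimed)  # the "verifier" does less work
```

The verifier here still needs the full inputs and output. A succinct ZK proof goes further: the verifier's work can be much smaller than the computation, and parts of the inputs can stay hidden.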
Generating a zero-knowledge proof is very computationally intensive, roughly 100 times more expensive than the original computation. This means that in some cases zero-knowledge proofs are impractical, because generating them would take too long even on the best available hardware.
However, recent advances in cryptography, hardware, and distributed systems have made zero-knowledge proofs a viable option for increasingly powerful computations. These advances have enabled the creation of protocols that can use computationally intensive proofs, expanding the design space for new applications.
ZK use cases
Zero-knowledge cryptography is one of the most popular technologies in the Web3 space because it allows developers to build scalable and/or private applications. Here are some examples of how it is used in practice (note that many of these projects are still under development):
1. Scaling Ethereum through ZK rollups
Starknet
Scroll
Polygon Zero, Polygon Miden, Polygon zkEVM
zkSync
2. Building privacy-preserving applications
Semaphore
MACI
Penumbra
Aztec Network
3. Identity Primitives and Data Sources
WorldID
Sismo
Clique
Axiom
4. Layer 1 protocols
Zcash
Mina
As ZK technology matures, we believe there will be an explosion of new applications because the tools used to build these applications will require less domain expertise and will be easier for developers to use.
What is machine learning?
Machine learning is an area of research in the field of artificial intelligence ("AI") that enables computers to automatically learn and improve from experience without being explicitly programmed. It utilizes algorithms and statistical models to analyze and identify patterns in data and then make predictions or decisions based on these patterns. The ultimate goal of machine learning is to develop intelligent systems that can learn adaptively, without human intervention, and solve complex problems in various fields such as healthcare, finance, and transportation.
Recently, you may have seen the progress of large language models such as ChatGPT and Bard, and of text-to-image models such as DALL-E 2, Midjourney, or Stable Diffusion. As these models get better and are able to perform a wider range of tasks, it becomes important to understand which model performed a given operation, or whether the operation was performed by a human. In the next sections, we will explore this line of thought.
ZKML Motivation and Current Efforts
We live in a world where AI/ML-generated content is increasingly difficult to distinguish from human-generated content. Zero-knowledge cryptography will allow us to make statements like: "Given a piece of content C, it was generated by a model M applied to some input X." We will be able to verify whether a certain output was generated by a large language model (such as ChatGPT), a text-to-image model (such as DALL-E 2), or any other model for which we have created a zero-knowledge circuit representation. The zero-knowledge property of these proofs will also allow us to hide the input or parts of the model if desired. A good example is applying a machine learning model to sensitive data: users can learn the result of model inference on their data without revealing their input to any third party (for example, in the medical industry).
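To make the statement "C was generated by model M applied to input X" more concrete, here is a minimal sketch of the relation such a proof would attest to. Everything in it is an assumption made for illustration: the toy integer model, the salted-hash commitment, and the names Statement, Witness, and relation_holds. A real ZKML system would prove that this relation holds without revealing the witness; the sketch simply evaluates the relation in the clear.

```python
import hashlib
import json
from dataclasses import dataclass


def commit(weights, salt):
    """Toy commitment to model weights: SHA-256 over a salt plus the weights."""
    return hashlib.sha256(salt + json.dumps(weights).encode()).hexdigest()


def toy_model(weights, x):
    """Hypothetical stand-in for the model M: an integer dot product."""
    return sum(w * xi for w, xi in zip(weights, x))


@dataclass(frozen=True)
class Statement:
    """Public part: which model (by commitment) produced which output C."""
    model_commitment: str
    output: int


@dataclass(frozen=True)
class Witness:
    """Private part: the weights, the commitment salt, and the input X."""
    weights: list
    salt: bytes
    x: list


def relation_holds(statement, witness):
    """The relation a proof would attest to, evaluated here in the clear."""
    same_model = commit(witness.weights, witness.salt) == statement.model_commitment
    same_output = toy_model(witness.weights, witness.x) == statement.output
    return same_model and same_output


witness = Witness(weights=[2, -1, 3], salt=b"example-salt", x=[1, 4, 2])
statement = Statement(
    model_commitment=commit(witness.weights, witness.salt),
    output=toy_model(witness.weights, witness.x),
)
assert relation_holds(statement, witness)
```

In this framing, hiding the input or parts of the model amounts to choosing which values go into the public statement and which stay in the private witness.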
Note: When we talk about ZKML, we mean creating zero-knowledge proofs of the inference step of ML models, not of ML model training (which is already very computationally intensive on its own). Currently, state-of-the-art zero-knowledge systems coupled with high-performance hardware are still orders of magnitude away from proving huge models such as the currently available large language models (LLMs), but some progress has been made in creating proofs of smaller models.
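One practical reason proving inference is harder than running it: proof systems generally operate on integers in a finite field, while ML models use floating-point numbers, so the inference step has to be re-expressed in fixed-point arithmetic. The following is a minimal sketch of that idea for a single dot product. The prime, the scale factor, and the helper names are assumptions chosen for illustration and do not come from any particular proof system.

```python
P = 2**61 - 1   # a prime modulus, picked here only for illustration
SCALE = 2**8    # fixed-point scale: 8 fractional bits


def to_field(x):
    """Encode a real number as a field element using fixed-point rounding."""
    return round(x * SCALE) % P


def from_field(v, scale):
    """Decode a field element, treating the upper half of the field as negative."""
    if v > P // 2:
        v -= P
    return v / scale


def field_dot(ws, xs):
    """Dot product carried out entirely in modular integer arithmetic."""
    acc = 0
    for w, x in zip(ws, xs):
        acc = (acc + w * x) % P
    return acc


weights = [0.5, -1.25, 2.0]
inputs = [1.0, 2.0, 0.75]

encoded = field_dot([to_field(w) for w in weights], [to_field(x) for x in inputs])
# Multiplying two scaled values multiplies the scales, hence SCALE * SCALE here.
decoded = from_field(encoded, SCALE * SCALE)
```

With these particular values the encoding is exact. In general, rounding introduces a small quantization error, and non-linear layers need additional techniques that this sketch does not cover.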
We researched the state of the art of zero-knowledge cryptography in the context of creating proofs for ML models, and compiled a collection of related research, articles, applications, and code repositories. These ZKML resources can be found in the ZKML community's awesome-zkml repository on GitHub.
The Modulus Labs team recently published a paper called "The Cost of Intelligence", which benchmarks existing ZK proof systems against multiple models of different sizes. Currently, a proof system like plonky2 can create proofs for models with ~18 million parameters in ~50 seconds on a powerful AWS machine. Here is a diagram from the paper:
Another initiative aimed at improving the state of the art of ZKML systems is Zkonduit's ezkl library, which allows you to create ZK proofs of ML models exported using ONNX. This enables any ML engineer to create ZK proofs of their model's inference step and prove the output to any correctly implemented verifier.
Several teams are working on improving ZK technology, creating hardware optimized for the operations performed inside ZK proofs, and building optimized implementations of these protocols for specific use cases. As the technology matures, it will become possible to prove larger models on less powerful machines in a shorter amount of time. We hope these advances will enable new ZKML applications and use cases to emerge.
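As a back-of-the-envelope reading of the benchmark figure quoted above (~18 million parameters proven in ~50 seconds), the sketch below converts it into a throughput and extrapolates to a larger model. The linear scaling is an assumption made purely for illustration; real proving cost depends on the architecture, the proof system, and memory limits, so treat the result as a rough order of magnitude at best. The one-billion-parameter model is hypothetical.

```python
# Figures quoted in the text for plonky2 on a powerful AWS machine.
BENCH_PARAMS = 18_000_000
BENCH_SECONDS = 50.0


def params_per_second():
    """Proving throughput implied by the quoted benchmark."""
    return BENCH_PARAMS / BENCH_SECONDS


def naive_proving_seconds(n_params):
    """Extrapolated proving time, ASSUMING cost grows linearly with parameters."""
    return n_params / params_per_second()


throughput = params_per_second()  # 360,000 parameters per second

# Hypothetical model size, not a figure from the paper.
hypothetical_params = 1_000_000_000
minutes = naive_proving_seconds(hypothetical_params) / 60
print(f"~{minutes:.0f} minutes under the linear-scaling assumption")
```

Even under this optimistic assumption, a billion-parameter model would take the better part of an hour to prove, which is consistent with the remark that large language models remain out of reach for now.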

Potential use cases
To determine whether ZKML is suitable for a particular application, we can consider how the properties of ZK cryptography will solve problems related to machine learning. This can be illustrated with a Venn diagram:
Definitions:
1. Heuristic optimization - a problem solving method that uses rules of thumb or "heuristics" to find good solutions to difficult problems, rather than using traditional optimization methods. Rather than attempting to find an optimal solution, heuristic optimization methods aim to find good or "good enough" solutions in a reasonable amount of time given their relative importance and difficulty of optimization.
2. FHE ML - Fully Homomorphic Encryption ML allows developers to train and evaluate models in a privacy-preserving manner; however, unlike ZK proofs, there is no way to cryptographically prove the correctness of the computations performed.
Teams like Zama.ai are working in this space.
3. ZK vs Validity - In the industry, these terms are often used interchangeably, since a validity proof is a ZK proof that does not hide parts of the computation or its result. In the context of ZKML, most current applications rely on the validity-proof aspect of ZK proofs.
4. Validity ML - ZK proofs of ML models in which no computation or result is kept secret. They prove the correctness of the computation.
Here are some examples of potential ZKML use cases:
1. Computational integrity (validity ML)
Modulus Labs
On-Chain Verifiable ML Trading Bot - RockyBot
The vision of self-improving blockchains (examples):
Enhancing the Lyra finance options protocol AMM with intelligent features
Creating a transparent AI-based reputation system (ZK oracle) for Astraly
Working on the technical breakthroughs required for contract-level compliance tools that use ML, for Aztec Protocol (a zk-rollup with privacy features)
2. Transparent Machine Learning as a Service (MLaaS)
3. ZK anomaly/fraud detection:
This use case makes it possible to create ZK proofs of exploitability/fraud. Anomaly detection models can be trained on smart contract data and agreed upon by DAOs as metrics of interest, in order to automate security procedures such as more proactive, preventative pausing of contracts. There are already start-ups working on ways to use ML models for security purposes in the context of smart contracts, so ZK anomaly detection proofs seem like a natural next step.
4. Universal Validity Proofs for ML Inference: The ability to easily prove and verify that the output is the product of a given model and input pair.
5. Privacy (ZKML).
6. Decentralized Kaggle: Prove that the model is more than x% accurate on some test data without showing weights.
7. Privacy-preserving inference: Feed private patient data into a medical diagnosis model and send the sensitive inference (e.g., a cancer test result) to the patient.
8. Worldcoin:
Upgradability of IrisCode: World ID users will be able to self-custody their biometrics in encrypted storage on their mobile device, download the ML model used to generate the IrisCode, and locally create a zero-knowledge proof that their IrisCode was created successfully. This IrisCode can then be inserted permissionlessly into the set of registered Worldcoin users, because the receiving smart contract can verify the zero-knowledge proof and thus the creation of the IrisCode. This means that if Worldcoin upgrades its machine learning model in the future to create IrisCodes in a way that breaks compatibility with the previous version, users won't have to go to the Orb again, but can create this zero-knowledge proof locally on their device.
Orb security: Currently, the Orb implements several fraud and tamper detection mechanisms in its trusted environment. However, we could create a zero-knowledge proof that these mechanisms were active when the image was taken and the IrisCode was generated, in order to provide better liveness guarantees for the Worldcoin protocol, since we could be certain that these mechanisms were running throughout the IrisCode generation process.
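Returning to the "Decentralized Kaggle" item in the list above, the claim "the model is more than x% accurate on some test data, without showing the weights" can be written as a relation between public and private values. The sketch below is a hypothetical illustration: the linear classifier, the salted-hash commitment, and all function names are assumptions, and the relation is evaluated in the clear rather than inside a proof system.

```python
import hashlib
import json


def commit(weights, salt):
    """Toy commitment that binds the prover to one specific set of weights."""
    return hashlib.sha256(salt + json.dumps(weights).encode()).hexdigest()


def predict(weights, features):
    """Hypothetical linear classifier: label 1 if the weighted sum is positive."""
    return 1 if sum(w * f for w, f in zip(weights, features)) > 0 else 0


def accuracy(weights, test_set):
    """Fraction of (features, label) pairs classified correctly."""
    correct = sum(predict(weights, x) == y for x, y in test_set)
    return correct / len(test_set)


def accuracy_claim_holds(commitment, test_set, threshold, weights, salt):
    """Public: commitment, test_set, threshold. Private: weights, salt."""
    same_model = commit(weights, salt) == commitment
    return same_model and accuracy(weights, test_set) > threshold


secret_weights = [1, -1]
salt = b"example-salt"
public_commitment = commit(secret_weights, salt)
public_test_set = [([2, 1], 1), ([1, 3], 0), ([3, 0], 1), ([0, 1], 1)]

claim_ok = accuracy_claim_holds(
    public_commitment, public_test_set, 0.7, secret_weights, salt
)
```

A ZK proof of this relation would let a competition organizer check the accuracy claim against the published commitment while the weights stay with the participant.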


