Detailed explanation of Celer "Pantheon": zero-knowledge proof development framework evaluation platform

Zero-knowledge proof development framework evaluation platform "Pantheon"
Over the past few months, we have invested a lot of time and effort in developing cutting-edge infrastructure built with zk-SNARK succinct proofs. This next-generation innovative platform enables developers to build unprecedented new paradigms of blockchain applications.
In our development work, we tested and used various zero-knowledge proof (ZKP) development frameworks. While this journey has been rewarding, we do realize that the variety of ZKP frameworks often presents a challenge for new developers as they try to find the framework that best suits their specific use case and performance requirements. Considering this pain point, we believe that there is a need for a community evaluation platform that can provide comprehensive performance test results, which will greatly facilitate the development of these new applications.
To meet this need, we are introducing the zero-knowledge proof development framework evaluation platform "Pantheon", a public-interest community initiative. The first step of the initiative is to encourage the community to share reproducible performance test results. Our ultimate goal is to collaboratively create and maintain a widely recognized test platform.
Step 1: Performance testing of the circuit framework using SHA-256
In this article, we take the first step in building the ZKP Pantheon by providing a reproducible set of performance test results for SHA-256 across a range of low-level circuit development frameworks. While we acknowledge that other benchmark granularities and primitives are possible, we chose SHA-256 because it applies to a wide range of ZKP use cases, including blockchain systems, digital signatures, zkDID, and more. We also use SHA-256 in our own system, so this is handy for us too!
Proof systems
In recent years, we have observed a proliferation of zero-knowledge proof systems. Keeping up with all the exciting advancements in the field is challenging, and we have handpicked the following proof systems for testing based on maturity and developer adoption. Our goal is to provide a representative sample of different frontend/backend combinations.
Circom + snarkjs / rapidsnark: Circom is a popular DSL for writing circuits and generating R1CS constraints, and snarkjs can generate Groth16 or Plonk proofs for Circom circuits. Rapidsnark is another prover for Circom. It generates Groth16 proofs and is usually much faster than snarkjs because it uses the ADX instruction set extension and parallelizes proof generation as much as possible.
gnark: gnark is a comprehensive Golang framework from Consensys that supports Groth16, Plonk, and many more advanced features.
Arkworks: Arkworks is a comprehensive Rust framework for zk-SNARKs.
Halo2 (KZG): Halo2 is Zcash's zk-SNARK implementation based on Plonk. It comes with a highly flexible Plonkish arithmetization that supports many useful primitives, such as custom gates and lookup tables. We use the fork of Halo2 with KZG support from the Ethereum Foundation and Scroll.
Plonky2: Plonky2 is a SNARK implementation from Polygon Zero based on techniques from PLONK and FRI. It uses the small Goldilocks field and supports efficient recursion. In our performance tests, we target 100 bits of conjectured security and use the parameters that yield the best proof times for the benchmark workload. Specifically, we used 28 Merkle queries, a blowup factor of 8, and a 16-bit proof-of-work grinding challenge. We also set num_of_wires = 60 and num_routed_wires = 60.
Starky: Starky is Polygon Zero's high-performance STARK framework. In our performance tests, we target 100 bits of conjectured security and use the parameters that yield the best proof times. Specifically, we used 90 Merkle queries, a blowup factor of 2, and a 10-bit proof-of-work grinding challenge.
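As a sanity check on the FRI parameters quoted for Plonky2 and Starky, the sketch below applies the commonly used conjectured-security estimate, in which each query contributes log2(blowup factor) bits and the proof-of-work challenge adds its bit length on top. The function name is ours, and the formula is the usual rule of thumb rather than something stated in this article.

```python
def fri_conjectured_security_bits(num_queries: int, blowup_factor: int, pow_bits: int) -> int:
    """Rule-of-thumb conjectured FRI security: queries * log2(blowup) + grinding bits."""
    rate_bits = blowup_factor.bit_length() - 1  # exact log2 for powers of two
    if blowup_factor < 2 or (1 << rate_bits) != blowup_factor:
        raise ValueError("blowup factor must be a power of two >= 2")
    return num_queries * rate_bits + pow_bits


# Parameters quoted above for the two FRI-based frameworks.
plonky2_bits = fri_conjectured_security_bits(num_queries=28, blowup_factor=8, pow_bits=16)
starky_bits = fri_conjectured_security_bits(num_queries=90, blowup_factor=2, pow_bits=10)
print(plonky2_bits, starky_bits)
```

Under this estimate both configurations land on the same 100-bit target, which illustrates the trade-off: a larger blowup factor needs fewer queries (smaller proofs), while a smaller blowup factor needs more queries but less prover work per query.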
The table below summarizes the above frameworks and the associated configurations used in our performance tests. This list is by no means exhaustive, and we will also investigate many state-of-the-art frameworks/technologies (e.g., Nova, GKR, Hyperplonk) in the future.
Note that these performance test results are for the circuit development framework only. We plan to publish a separate article in the future with performance tests of different zkVMs (e.g., Scroll, Polygon zkEVM, Consensys zkEVM, zkSync, Risc Zero, zkWasm) and IR compiler frameworks (e.g., Noir, zkLLVM).
Performance Evaluation Methodology
To benchmark these proof systems, we compute the SHA-256 hash of N bytes of data, with N = 64, 128, ..., 64 K. (Starky is an exception: its circuit repeats the SHA-256 computation over a fixed 64-byte input while keeping the same total number of message blocks.) The benchmarking code and SHA-256 circuit configurations can be found in this repository.
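To make the workload concrete, here is a minimal sketch that enumerates the preimage sizes and counts how many 64-byte message blocks SHA-256 processes for each one. It assumes the sizes double at each step (the article only writes "64, 128, ..., 64 K"), and the helper names are ours, not taken from the benchmark repository.

```python
import hashlib

BLOCK_BYTES = 64  # SHA-256 consumes the padded message in 512-bit blocks


def sha256_block_count(message_len: int) -> int:
    """Blocks processed for a message of message_len bytes.

    Padding appends one 0x80 byte and an 8-byte length field, then
    zero-fills up to the next block boundary.
    """
    return (message_len + 1 + 8 + BLOCK_BYTES - 1) // BLOCK_BYTES


def preimage_sizes(start: int = 64, stop: int = 64 * 1024) -> list[int]:
    """Sizes from start to stop inclusive, doubling each step (an assumption)."""
    sizes = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= 2
    return sizes


for n in preimage_sizes():
    digest = hashlib.sha256(bytes(n)).hexdigest()  # all-zero preimage as a stand-in
    print(f"N={n:6d} bytes  blocks={sha256_block_count(n):5d}  digest={digest[:16]}...")
```

The block count, rather than the byte count, is what drives circuit size, which is why the Starky variant can keep the workload comparable by matching the total number of blocks.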
Additionally, we performed performance tests on each system using the following performance metrics:
Proof generation time (including witness generation time)
Peak memory usage during proof generation
Average CPU utilization during proof generation (this metric reflects the degree of parallelization in the proof generation process)
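The three metrics above can be collected for any prover that runs as a separate process. The following is a minimal sketch for Unix-like systems, not the harness used for the published results: the `measure` helper is hypothetical, and a short Python child process stands in for a real prover command.

```python
import os
import resource
import subprocess
import sys
import time


def measure(cmd: list[str]) -> dict:
    """Run cmd once and report wall time, peak RSS, and average CPU utilization."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU seconds (user + system) consumed by the child we just waited for.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is a high-water mark over all waited-for children, so use a
    # fresh parent process per measurement. Units: KiB on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    cores = os.cpu_count() or 1
    return {
        "wall_seconds": wall,
        "peak_rss_bytes": after.ru_maxrss * scale,
        "cpu_percent": 100.0 * cpu / wall,  # can exceed 100 on multi-core runs
        "cpu_percent_per_core": 100.0 * cpu / wall / cores,
    }


# Stand-in workload; replace with the actual prover invocation.
stats = measure([sys.executable, "-c", "sum(range(10**6))"])
print(stats)
```

Dividing CPU time by wall time gives a utilization figure that exceeds 100% when several cores are busy, and dividing again by the core count gives the per-core average.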
Machines
We ran performance tests on two different machines:
Linux server: 20 cores @ 2.3 GHz, 384 GB RAM
MacBook M1 Pro: 10 cores @ 3.2 GHz, 16 GB RAM
The Linux server is used to simulate a scenario with a large number of CPU cores and ample memory. The MacBook M1 Pro, which is typically used for research and development, has a more powerful CPU but fewer cores.
Number of constraints
Before we move on to the detailed performance test results, it is useful to first understand the complexity of SHA-256 by looking at the number of constraints in each proof system. It is important to note that constraint counts in different arithmetization schemes cannot be compared directly.
The results below correspond to a preimage size of 64 KB. Results for other preimage sizes will differ, but they scale roughly linearly.
Circom, gnark, and Arkworks all use the same R1CS arithmetization, and the number of R1CS constraints for computing SHA-256 over 64 KB is roughly between 30 M and 45 M. The differences among Circom, gnark, and Arkworks may be due to configuration differences.
Both Halo2 and Plonky2 use Plonkish arithmetization, where the number of rows ranges from 2^22 to 2^23. Halo2's implementation of SHA-256 is much more efficient than Plonky2's because it uses lookup tables.
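To put these totals in perspective, the sketch below divides them by the number of SHA-256 message blocks in a 64 KB preimage. This is back-of-the-envelope arithmetic on the ranges quoted above, not a measured result. The Plonkish figures are upper bounds per block, since row counts are padded to a power of two and include fixed costs such as lookup tables.

```python
PREIMAGE_BYTES = 64 * 1024
# 64-byte blocks after SHA-256 padding (1 marker byte + 8 length bytes).
BLOCKS = (PREIMAGE_BYTES + 9 + 63) // 64


def per_block(total: int, blocks: int = BLOCKS) -> int:
    """Average cost per message block, rounded down."""
    return total // blocks


r1cs_low, r1cs_high = 30_000_000, 45_000_000  # R1CS range quoted above
rows_low, rows_high = 2**22, 2**23            # Plonkish row range quoted above

print("blocks:", BLOCKS)
print("R1CS constraints per block:", per_block(r1cs_low), "to", per_block(r1cs_high))
print("Plonkish rows per block (upper bound):", per_block(rows_low), "to", per_block(rows_high))
```

The per-block view also shows why the two families cannot be compared directly: an R1CS constraint and a Plonkish row do very different amounts of work.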

Proof generation time
[Figure 1] shows the proof generation time of each framework for SHA-256 over various preimage sizes, measured on the Linux server. We make the following observations:
For SHA-256, the Groth16 frameworks (rapidsnark, gnark, and Arkworks) generate proofs faster than the Plonk-based frameworks (Halo2 and Plonky2). This is because SHA-256 consists mostly of bitwise operations, where the wire values are either 0 or 1. For Groth16, this reduces most of the computation from elliptic curve scalar multiplication to elliptic curve point addition. However, wire values are not used directly in Plonk's computations, so the special wire structure of SHA-256 does not reduce the amount of computation required in the Plonk-based frameworks.
Among all Groth16 frameworks, gnark and rapidsnark are 5 to 10 times faster than Arkworks and snarkjs. This is due to their superior ability to leverage multiple cores to parallelize proof generation. Gnark is 25% faster than rapidsnark.
Among the Plonk-based frameworks, Plonky2's SHA-256 is 50% slower than Halo2's for larger preimage sizes (>= 4 KB). This is because the Halo2 implementation relies mainly on lookup tables to speed up bitwise operations, resulting in 2x fewer rows than Plonky2. However, if we compare Plonky2 and Halo2 at the same number of rows (for example, SHA-256 over 2 KB in Halo2 vs. SHA-256 over 4 KB in Plonky2), Plonky2 is 50% faster than Halo2. If SHA-256 were implemented in Plonky2 using lookup tables, we would expect Plonky2 to be faster than Halo2, although Plonky2's proof size is larger.
On the other hand, when the preimage size is small (<= 512 bytes), Halo2 is slower than Plonky2 (and the other frameworks) because the fixed setup cost of the lookup table accounts for most of the constraints. However, Halo2's performance becomes more competitive as the preimage size increases: its proof generation time remains roughly constant for preimage sizes up to 2 KB, after which it scales almost linearly, as shown in the figure.
As expected, Starky's proof generation time is much shorter (5x-50x) than that of any SNARK framework, but this comes at the cost of a larger proof size.
Also note that even though the circuit size scales linearly with the preimage size, proof generation time for SNARKs grows superlinearly because of the O(n log n) FFTs (although this is not obvious in the figure because of the logarithmic scale).
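Two of the observations above can be illustrated with a simple cost model. The first part counts group operations for a naive double-and-add scalar multiplication, showing why 0/1 wire values make the multi-scalar multiplications in Groth16 cheap. The second shows how much an n log n cost grows when n doubles. This is a sketch under stated assumptions: real provers use bucket-based algorithms such as Pippenger rather than double-and-add, and the 254-bit scalar size is an assumption matching BN254-style fields, not a figure from this article.

```python
import math
import random


def double_and_add_ops(scalar: int) -> int:
    """Group operations needed to add scalar * P into an accumulator (naive method)."""
    if scalar == 0:
        return 0  # nothing to add
    doublings = scalar.bit_length() - 1
    additions = bin(scalar).count("1") - 1
    return doublings + additions + 1  # +1 for adding the result to the accumulator


def msm_ops(scalars) -> int:
    """Total group operations for a naive multi-scalar multiplication."""
    return sum(double_and_add_ops(s) for s in scalars)


def nlogn_growth_when_doubling(n: int) -> float:
    """cost(2n) / cost(n) for cost(n) = n * log2(n)."""
    return (2 * n * math.log2(2 * n)) / (n * math.log2(n))


rng = random.Random(0)
n = 1000
field_scalars = [rng.getrandbits(254) for _ in range(n)]  # generic wire values
bit_scalars = [rng.getrandbits(1) for _ in range(n)]      # SHA-256-style 0/1 wires

print("avg ops per generic scalar:", msm_ops(field_scalars) / n)
print("avg ops per 0/1 scalar:", msm_ops(bit_scalars) / n)
print("n log n growth when doubling n = 2^22:", nlogn_growth_when_doubling(2**22))
```

In this model a 0/1 scalar costs at most one point addition, while a generic scalar costs hundreds of operations. The n log n growth factor at these sizes is only slightly above 2, which is consistent with the superlinear effect being hard to see on a log-scale plot.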

We also benchmarked proof generation time on a MacBook M1 Pro, as shown in [Figure 2]. Note that rapidsnark is not included in this benchmark because it lacks support for the arm64 architecture. To use snarkjs on arm64, we have to generate the witness with WebAssembly, which is slower than the C++ witness generation used on the Linux server.
A few additional observations from running the performance tests on the MacBook M1 Pro:
With the exception of Starky, all frameworks run into out-of-memory (OOM) errors or start using swap memory (resulting in slower proof times) as the preimage size grows. Specifically, the Groth16 frameworks (snarkjs, gnark, Arkworks) start using swap memory at preimage sizes >= 8 KB, and gnark runs out of memory at preimage sizes >= 64 KB. Halo2 hits the memory limit at preimage sizes >= 32 KB. Plonky2 starts using swap memory at preimage sizes >= 8 KB.
Peak memory usage
[Figure 3] and [Figure 4] show the peak memory usage during proof generation on the Linux server and the MacBook M1 Pro, respectively. The following observations can be made from these results:
Of all the SNARK frameworks, rapidsnark is the most memory efficient. We also see that Halo2 uses more memory at smaller preimage sizes because of the fixed setup cost of the lookup table, but consumes less memory overall at larger preimage sizes.
Starky is more than 10 times more memory efficient than the SNARK frameworks, partly because it uses fewer rows.
CPU utilization
We evaluate the degree of parallelization of each proof system by measuring the average CPU utilization during proof generation for SHA-256 over a 4 KB preimage. The table below shows the average CPU utilization (with the average per-core utilization in parentheses) on the Linux server (20 cores) and the MacBook M1 Pro (10 cores).
The main observations are as follows:
Gnark and rapidsnark exhibit the highest CPU utilization on the Linux server, indicating that they can use multiple cores efficiently and parallelize proof generation. Halo2 also shows good parallelization.
The CPU utilization of most frameworks on the Linux server is twice that on the MacBook M1 Pro, with the exception of snarkjs.
Conclusions and Future Research
This article compares the performance test results of SHA-256 across various zk-SNARK and zk-STARK development frameworks. Through this comparison, we gain insight into the efficiency and practicality of each framework, in the hope of helping developers who need to generate succinct proofs for SHA-256 operations. We found that Groth16 frameworks (e.g., rapidsnark, gnark) generate proofs faster than Plonk-based frameworks (e.g., Halo2, Plonky2). Lookup tables in Plonkish arithmetization significantly reduce SHA-256 constraints and proof times at larger preimage sizes. Additionally, gnark and rapidsnark demonstrate an excellent ability to exploit multiple cores to parallelize operations. Starky, on the other hand, has a much shorter proof generation time, at the cost of a much larger proof size. In terms of memory efficiency, rapidsnark and Starky outperform the other frameworks.
As the first step in building the zero-knowledge proof evaluation platform "Pantheon", we acknowledge that these performance test results are far from the comprehensive test platform we ultimately hope to build. We welcome feedback and criticism, and we invite everyone to contribute to this initiative to make zero-knowledge proofs more accessible and lower the barrier to entry for developers. We are also willing to provide grants to individual independent contributors to cover the cost of computational resources for large-scale performance testing. We hope that together we can improve the efficiency and utility of ZKPs and benefit the community more broadly.
Finally, we would like to thank the Polygon Zero team, the gnark team at Consensys, Pado Labs, and the Delphinus Lab team for their valuable reviews and feedback on performance test results.


