a16z: How does Lasso+Jolt implement a more efficient SNARK system?

A technical Q&A about Lasso, Jolt, and SNARKs

Original title: A technical FAQ on Lasso, Jolt, and recent advancements in SNARK design

Original author: Justin Thaler

Original compilation: Luccy, BlockBeats

Editor's note: On November 20, Ben Diamond and Jim Posen (DP) of Ulvetanna published a paper that improves polynomial IOPs based on the sum-check protocol together with the Ligero/Brakedown commitment scheme, trading larger proofs for a faster prover, and integrates them with sum-check-based SNARKs such as Lasso. After a16z research partner Justin Thaler published an in-depth analysis of the technique, researchers raised a number of questions about the article.

Related reading: a16z: Lasso+Jolt, a new perspective on rapid commitment

This FAQ collects 13 questions covering, among other things, Jolt prover performance, why DP chose the Keccak and Grøstl hash functions, and the security of hash-based commitment schemes. It also addresses the relationship between Lasso/Jolt and commitment schemes, the performance advantages of Reed-Solomon codes, the advantages of Ligero/Brakedown over FRI, and why committing to GF[2] elements is especially cheap in DP's scheme.

Discussion topics include whether to implement entire computations as Boolean circuits, the advantages of characteristic-two fields, and the technical and cost issues of shrinking SNARK proofs through recursive composition and of combining DP's commitment scheme with elliptic-curve-based SNARKs. Finally, the article asks whether SNARKs using DP's commitment scheme can be combined with folding schemes such as Nova. Further research on these questions is expected to drive innovation and improvement in the field.

Q1: How fast do you expect the Jolt prover to be, relative to native execution of the RISC-V program, once Jolt is reimplemented to use DP's commitment scheme?

A: Here is a rough and speculative estimate. We expect the Jolt prover to commit roughly 800 bits of data per step of the RISC-V CPU, using 32-bit data types and the multiplication extension. Two caveats: first, some RISC-V instructions are handled via multiple pseudo-instructions. For example, the division instruction is handled by having the prover supply the quotient and remainder and verifying both via multiplications, additions, and an inequality check. Second, this estimate may change slightly once we work out the decomposition of the lookup tables over GF[2^128].

Using DP's commitment scheme, and assuming recursion does not become a bottleneck, the main commitment costs in a T-step computation are as follows.

First, an additive FFT is applied to a total of about 200·T bytes of data (roughly 200 bytes per step). More precisely, the Ligero/Brakedown prover performs O(√T) independent FFTs, each of size O(√T), which involves less total work and parallelizes better than a single FFT of length O(T). Second, about 200·T bytes are hashed with a standard hash function such as Keccak or SHA-2.

Empirically, DP found that FFT and hashing are roughly equal in prover time.

Using Keccak hashing at an estimated cost of about 70 RISC-V cycles per byte, these operations will be about 30,000 times slower than simply running the RISC-V program unproved. In other words, to prove it correctly ran a RISC-V program Ψ, the Jolt prover itself (if implemented in RISC-V) would need at least 20,000 times more cycles than Ψ itself.
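As a rough illustration of where these numbers come from, here is the back-of-the-envelope arithmetic as a small Python sketch. The inputs are the figures quoted in the surrounding text (800 bits per step, 70 cycles per byte) plus the rate-1/2 Reed-Solomon expansion discussed in Q7; this is a sketch of the estimate, not a measurement.

```python
# Back-of-the-envelope estimate of the commitment-cost slowdown.
bits_committed_per_step = 800                              # ~800 bits committed per RISC-V step
bytes_committed_per_step = bits_committed_per_step // 8    # 100 bytes
rs_expansion = 2                                           # rate-1/2 Reed-Solomon roughly doubles the data
bytes_processed_per_step = bytes_committed_per_step * rs_expansion   # ~200 bytes per step

keccak_cycles_per_byte = 70                                # rough cost of Keccak per byte on RISC-V
hashing_cycles_per_step = bytes_processed_per_step * keccak_cycles_per_byte   # ~14,000

# DP observe empirically that the additive FFT costs roughly as much as the hashing.
fft_cycles_per_step = hashing_cycles_per_step

commitment_cycles_per_step = hashing_cycles_per_step + fft_cycles_per_step
print(commitment_cycles_per_step)                          # ~28,000, i.e. "about 30,000x" slowdown
```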

These commitment costs are fast enough that the prover's bottleneck may instead be the finite field operations it performs in the sum-check protocol, even accounting for optimizations for fields of small characteristic. So, as a rough guess, I expect the Jolt prover (implemented in RISC-V) to be about 50,000 times slower than simply running the RISC-V program.

This whole calculation is somewhat contrived: it is unlikely that the prover itself will be implemented in RISC-V when Jolt is deployed. But it gives a general sense of how to estimate the overhead of a zkVM prover.

While a 50,000x slowdown sounds enormous, it is 20x faster than the roughly 1,000,000x overhead I optimistically estimated about 18 months ago. The main reason for the improvement is that Lasso and Jolt commit to less data (and each committed value is smaller). The rest comes from better commitment schemes (e.g., an improved ability to use fast hash functions and to exploit the small size of committed values in hash-based SNARKs).

Q2: DP provide fast SNARKs for Keccak and Grøstl. Why were these hash functions chosen? What other hash functions are suitable for these techniques? (BlockBeats note: Grøstl is a cryptographic hash function.)

A: Keccak/SHA-3 is an obvious choice because it is a NIST standard, an Ethereum precompile, and a key bottleneck for Type-1 zkEVMs. SHA-3 is the official NIST standard and Keccak is the Ethereum precompile; the differences between them are irrelevant to SNARK costs.

DP considered Grøstl because it should give an even faster prover while retaining many of Keccak's benefits. In particular, because it reached the final round of NIST's SHA-3 competition and uses the AES S-box, Grøstl has withstood significant cryptanalytic scrutiny, even though it was ultimately not selected. And thanks to the AES acceleration instructions (AES-NI), Grøstl runs even faster than Keccak on Intel chips.

DP's prover should be faster for Grøstl than for Keccak because Grøstl is essentially defined natively over GF[2^8], which means DP's prover can commit to fewer field elements than for Keccak. (See Q9 for how this benefits the prover.) Overall, Grøstl should be even better suited to (recursive) SNARKs than Keccak, since it is faster both for the prover and on chip.

That said, nothing about DP's SNARKs is specific to Keccak and Grøstl; the techniques should apply to many other hash functions. For example, DP believe SHA-2 should be comparable to Keccak, but they have not yet studied it in detail.

Q3: I thought Lasso/Jolt were specific to elliptic-curve-based commitment schemes?

A: No, Lasso and Jolt are not specific to curve-based commitment schemes. But as of a few months ago, their advantages over prior work were most apparent when combined with curve-based commitments. This is because curve-based commitments pay an especially high price when the prover has to commit to random field elements, so Lasso/Jolt's novel ability to avoid this has its most compelling performance impact when those commitments are used.

In short, while no one had previously designed curve-based SNARKs to exploit the smallness of the committed values, hash-based commitments over small fields already exploited this to some extent.

However, even with hash-based commitments, Lasso and Jolt improve on prior work in two respects. First, DP show that hash-based commitments can benefit from committing only to small field elements more strongly than was previously known. For example, whereas today's commitment schemes incur the same prover cost for committing to a 1-bit value as for a 32-bit value, DP's scheme makes committing to a 1-bit value almost 32 times cheaper. Second, Lasso and Jolt not only ensure that the prover commits only to small field elements, they also ensure that the prover commits to fewer field elements than in non-sum-check-based SNARKs. Indeed, in Jolt we carefully counted the total bit complexity of all committed field elements and confirmed that it is far smaller than in existing zkVMs.

When Lasso/Jolt were released a few months ago, another technical issue favored curve-based commitments: the only hash-based commitment scheme with polylogarithmic proof size, FRI, is designed for univariate polynomials, while Lasso/Jolt use multilinear polynomials. Several transformations are known for adapting FRI into a commitment scheme for multilinear polynomials, but they add overhead in prover time, proof size, or both, which we consider highly undesirable. BaseFold now enables direct multilinear commitments with polylogarithmic proof size, although the resulting prover is slower than Brakedown's and the proofs are larger than FRI's.

Unlike FRI, the Ligero/Brakedown commitment schemes apply directly to multilinear polynomials and have a very fast prover. But previously it was difficult to use recursion to reduce their proof size, because the verifier performs a large number of hash evaluations, which makes recursion expensive. By giving fast SNARKs for hashing, DP's work significantly reduces the cost of this recursion.

Q4: Didn't you say that curve-based commitment schemes are faster than hash-based ones (when only small values are committed, as in Lasso/Jolt)? Doesn't that contradict your endorsement of DP's SNARKs?

A: First, as I have said before, there are important SNARK applications where hash-based SNARKs are clearly not the best-performing option, because it makes sense to work over the base field of an elliptic-curve group, and curve-based commitments are faster over such fields. Proving any statement about elliptic-curve cryptosystems (including knowledge of ECDSA signatures authorizing blockchain transactions) falls into this category.

Second, even in applications where it is reasonable to use fields of small characteristic, the performance comparison between hash-based and curve-based schemes is complicated. For example, it depends heavily on how fast the hash function used in the hash-based scheme is. Today many (but not all) projects use slower hash functions such as Poseidon in order to support recursion. With such hash functions, hash-based schemes are substantially slower than curve-based schemes when committing to small values (as in Lasso/Jolt). Even with fast hash functions, it was not clear they were faster (as my previous comments stated).

However, DP speed up hash-based commitments over fields of characteristic two and allow the prover to better exploit the small size of the committed values, compared with existing hash-based schemes such as FRI. So my current expectation is that, over characteristic-two fields, Ligero/Brakedown is the way to go, except for statements natively defined over other (large) fields.

In short, the main reason hash-based commitment schemes have generally been considered faster than curve-based ones is that popular SNARKs like Plonk require the prover to commit to random rather than small field elements, and in that setting curve-based commitments are very slow. Lasso and Jolt show that the prover does not need to commit to random field elements, and there the comparison is at least more nuanced. Until now, curve-based schemes were actually faster, but with DP's improvements the situation is reversed (except for statements natively defined over large fields).

Q5: Didn't you say that hash-based commitment schemes are less secure?

A: There is nothing inherently insecure about hash-based commitment schemes such as FRI or Ligero/Brakedown. However, projects have generally prioritized performance over security by deploying FRI at configurations where known attacks are close to feasible, while assuming that those known attacks on FRI happen to be optimal.

One benefit of the Ligero/Brakedown commitment scheme is that the main conjecture about FRI, namely its conjectured security at parameter settings beyond the Johnson bound, simply does not arise, so SNARK designers face no temptation to base security on such conjectures.

Likewise, I have long been uncomfortable with the use of ostensibly "SNARK-friendly" hash functions such as Poseidon in SNARK deployments. The security of these hash functions (even the ones that have been around longest) has received far less scrutiny than that of standard hash functions like Keccak.

In both cases, I believe projects have been compromising security to paper over the performance shortcomings of today's SNARKs. The easiest way to eliminate these practices is simply to develop better-performing SNARKs.

Relatedly, I consider the current practice of hand-designing "SNARK-friendly" virtual machines (VMs), and the constraint systems that implement them, to be a significant security problem (and a huge drain on developer resources), because designing constraint systems and building new compilers from high-level languages down to the assembly code of custom VMs is error-prone. I hope Jolt renders this practice obsolete by showing that standard instruction sets are just as SNARK-friendly, removing any need or incentive to hand-design constraint systems implementing those instruction sets.

Once again, the way to eliminate a practice with negative security consequences is to come up with higher-performance SNARKs that make the practice irrelevant.

Q6: DP used Reed-Solomon codes in their implementation of the polynomial commitment scheme, but Brakedown can use any code. Is it worth exploring other codes for possible performance advantages?

A: Yes.

Q7: In what ways is Ligero/Brakedown better for the prover than FRI?

A: DP's significant improvement in prover time when committing to very small values is unique to Ligero/Brakedown, at least for now. In addition, DP improve not only prover time when committing to small values but also prover space. Space is a major bottleneck, especially for hash-based SNARKs. For example, Polygon's zkEVM prover today needs over 250 GB to prove a circuit handling a transaction batch of roughly 10 million gas.

Ligero/Brakedown also offers greater flexibility in the choice of error-correcting code. In fact, many of DP's improvements for committing to small values can be obtained simply by using concatenated codes inside Ligero/Brakedown.

When using Reed-Solomon codes, the Ligero/Brakedown prover performs many small FFTs instead of one big FFT. This saves roughly a factor of two in FFT time and parallelizes better, as the rough comparison below illustrates.
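The following sketch compares the asymptotic operation counts of the two layouts for an arbitrary choice of T (constants and lower-order terms ignored):

```python
import math

# One FFT of length T versus sqrt(T) independent FFTs, each of length sqrt(T).
T = 2**30                                                  # hypothetical number of committed values
single_fft = T * math.log2(T)                              # ~T log T operations
s = math.isqrt(T)
many_small_ffts = s * (s * math.log2(s))                   # ~T * (log T)/2 operations in total

print(single_fft / many_small_ffts)                        # -> 2.0: about half the FFT work,
                                                           # and the small FFTs parallelize trivially
```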

Also, FRI requires the prover to perform both an FFT and an IFFT (roughly, this is because FRI needs to evaluate the committed polynomial at many points). The Ligero/Brakedown prover can skip the IFFT (technically, skipping the IFFT is another consequence of Ligero/Brakedown's greater flexibility in choosing the error-correcting code). With a Reed-Solomon expansion factor of 2 (the factor that optimizes prover time), this saves roughly another 33% of the prover's FFT-related time: an IFFT of length n plus an FFT of length 2n is about 3n·log(n) work, and the IFFT accounts for roughly a third of it.

Ligero/Brakedown's evaluation proofs require no additional Merkle hashing by the prover, whereas FRI's do, although most invocations of FRI amortize this evaluation-proof cost across many committed polynomials.

Q8: Can you outline how DP ensure that committing to GF[2] elements is cheaper than committing to GF[2^2] elements, which is cheaper than committing to GF[2^4] elements, and so on?

A: With DP's commitment scheme, the time for the prover to commit to a collection of values depends roughly only on the total number of bits needed to specify those values, where b bits specify a value between 0 and 2^b. Other prover costs in DP's SNARKs do grow with the number of field elements used to encode those bits (see Q9 for details). Here is how DP achieve this.

Brakedown's polynomial commitment scheme encodes subvectors of the values to be committed using any desired error-correcting code. Suppose the values to be committed all lie in GF[2], but we want the code itself to operate over the larger field GF[2^16]. There are technical reasons to want this; indeed, it is necessary if we want to apply certain codes to vectors of length up to 2^16.

To achieve this we can simply use code concatenation: divide the GF[2] values into chunks of size 16 and pack each chunk of 16 GF[2] values into a single GF[2^16] field element. This reduces the number of field elements to be committed by a factor of 16. We can then apply any error-correcting code that operates over GF[2^16]; this is called the outer code. Each symbol of the resulting codeword can then be unpacked into sixteen GF[2] elements and encoded with an inner code defined over GF[2].

In effect, the concatenation approach reduces the length of the committed vector (measured in field elements) by a factor of 16, at the cost of requiring the prover to perform the packing and unpacking and to apply the inner code. A minimal sketch of the pack/unpack step appears below.
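Here is that pack/unpack step as a small sketch; the little-endian bit ordering is an illustrative assumption, not necessarily DP's exact convention.

```python
# Pack GF(2) values into GF(2^16) symbols, 16 bits per symbol, and unpack them again.
def pack_gf2_into_gf2_16(bits):
    assert len(bits) % 16 == 0 and all(b in (0, 1) for b in bits)
    return [
        sum(bit << i for i, bit in enumerate(bits[j:j + 16]))
        for j in range(0, len(bits), 16)
    ]

def unpack_gf2_16_into_gf2(symbols):
    return [(s >> i) & 1 for s in symbols for i in range(16)]

bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4            # 32 GF(2) values
packed = pack_gf2_into_gf2_16(bits)            # 2 GF(2^16) elements: 16x fewer field elements
assert unpack_gf2_16_into_gf2(packed) == bits  # packing is lossless
```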

This simple approach of applying Brakedown with concatenated codes already achieves many of the benefits of DP's work. But DP take a different approach that yields an even faster prover (at the cost of somewhat larger proofs). For example, DP's actual approach avoids the cost of applying the inner code to each unpacked symbol of the outer codeword.

Q9: Since DP's commitment scheme makes committing to values in {0, 1} very cheap, why not simply have the prover commit to the bit decomposition of every value appearing in the computation? That is, why not implement the entire computation as a Boolean circuit and have the SNARK assign one field element to each input bit and each gate of the circuit?

A: In their SNARK for Keccak, DP do have the prover commit only to values in {0, 1}, but in general this is not necessarily a good idea.

It is true that DP's commitment time is roughly proportional to the total bit complexity of all committed values, regardless of how many field elements those bits are spread across (which is why committing only to one-bit values is sensible in their Keccak SNARK).

However, that does not mean all costs are independent of the number of committed field elements. In particular, the size of the commitment scheme's evaluation proof grows with (the square root of) the number of committed field elements.

Another cost that grows with the number of committed field elements is the number of terms that must be summed in some invocations of the sum-check protocol inside DP's SNARKs. Roughly speaking, committing to x times as many field elements means the sum-check prover has x times as many terms to sum. There are optimizations that mitigate this overhead, but the mitigation is imperfect: proving the sum-check will likely still be slower for x committed one-bit values than for the same values packed into a single x-bit field element before committing.

DP mitigate this latter cost of committing to many one-bit values by giving a sum-check-based protocol that packs many committed one-bit values into a single field element after they have been committed. This lets them largely avoid paying extra sum-check prover time for the many committed values while still enjoying their benefits (in particular, once a bit decomposition has been committed and proven via sum-check, certain operations such as bitwise AND can be proven without any additional commitment cost).

Q10: What specifically are the advantages DP obtain from using fields of characteristic two?

A: There are many; here are some examples.

DP make extensive use of tower field constructions. For fields of characteristic two, this means constructing GF[2^2] as a quadratic extension of GF[2], then GF[2^4] as a quadratic extension of GF[2^2], then GF[2^8] as a quadratic extension of GF[2^4], and so on. Particularly efficient tower constructions are known for fields of characteristic two; the sketch below illustrates one such construction.
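The following is a minimal sketch of a Wiedemann-style binary tower, where each level is a quadratic extension of the previous one and level-k elements are packed into 2^k-bit integers. The packing convention and generator choice here are illustrative assumptions; real implementations of this idea are far more optimized.

```python
def tower_mul(k, a, b):
    """Multiply two elements of the level-k tower field GF(2^(2^k)), packed as 2^k-bit ints."""
    if k == 0:
        return a & b                          # GF(2): multiplication is AND
    half = 1 << (k - 1)                       # number of bits in each half
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half              # a = a0 + a1 * X_k
    b0, b1 = b & mask, b >> half
    a0b0 = tower_mul(k - 1, a0, b0)
    a1b1 = tower_mul(k - 1, a1, b1)
    # Karatsuba: a0*b1 + a1*b0 (addition in characteristic two is XOR)
    mid = tower_mul(k - 1, a0 ^ a1, b0 ^ b1) ^ a0b0 ^ a1b1
    # Reduction uses X_k^2 = X_k * X_{k-1} + 1, where X_{k-1} is the previous level's generator.
    x_prev = 1 if k == 1 else 1 << (1 << (k - 2))
    hi = mid ^ tower_mul(k - 1, a1b1, x_prev)
    lo = a0b0 ^ a1b1
    return (hi << half) | lo

# Sanity check in GF(4) = GF(2)[X]/(X^2 + X + 1): X * X = X + 1, i.e. 2 * 2 = 3.
assert tower_mul(1, 2, 2) == 3
# tower_mul(3, a, b) multiplies in GF(2^8); tower_mul(7, a, b) in GF(2^128).
```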

The sum-check protocol computes ∑_{x ∈ {0,1}^n} g(x) for a multivariate polynomial g. The size of the Boolean hypercube {0,1}^n (and of its subcubes) is a power of two, so subfields align nicely with subcubes. DP exploit this to make it easy to pack many small field elements into a single element of a larger extension field.

DP currently use Reed-Solomon codes inside the Ligero/Brakedown polynomial commitment scheme. Efficient Reed-Solomon encoding requires an additive FFT, which is very efficient over fields of characteristic two but not over other fields. (Using other codes, however, would avoid the need for FFTs entirely.)

Fields of characteristic two are handled especially well by real hardware. Real-world computers are built around power-of-two-sized data types, so these field elements fit perfectly into registers, cache lines, and so on, without padding. Intel has even built primitive instructions into its chips (the Galois Field New Instructions, GFNI) for particularly fast arithmetic in GF[2^8]. Combined with tower constructions, these can be used to implement very fast GF[2^k] arithmetic even for k > 8. A sketch of the underlying GF[2^8] multiplication appears below.
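As a sketch of the kind of arithmetic GFNI accelerates, here is byte-wise GF[2^8] multiplication with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) used by the GF2P8MULB instruction, which performs this operation across an entire vector register at once. Note that this is a different representation of GF[2^8] than the tower above, so using GFNI for tower arithmetic presumably involves a change of basis.

```python
def gf256_mul(a: int, b: int) -> int:
    """Schoolbook GF(2^8) multiplication modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a                # add (XOR) the current shifted copy of a
        b >>= 1
        a <<= 1
        if a & 0x100:                  # degree-8 overflow: reduce modulo 0x11B
            a ^= 0x11B
    return result

assert gf256_mul(0x02, 0x80) == 0x1B   # x * x^7 = x^8 = x^4 + x^3 + x + 1
```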

Q11: Can the proofs of SNARKs that use DP's commitment scheme be made smaller by recursively composing the SNARK with itself?

A: Yes, but the recursion threshold for SNARKs using Ligero/Brakedown commitments is relatively high. Here, the recursion threshold means the proof size below which recursively applying a Brakedown/Ligero-based SNARK to the verifier no longer produces a shorter proof. I expect this threshold to be on the order of a few MB.

To get smaller proofs than that, I expect one would compose with other SNARKs that have smaller proofs; see Q12 for details. If that expectation turns out to be wrong, it should be viewed not as a failure of Binius but as an indictment of the scalability of today's popular SNARKs: how can we call them scalable if they cannot prove, in a reasonable amount of time, that a few MB of data was hashed?

In any case, there are important reasons for fast recursive composition beyond shrinking proofs. Most importantly, it is a key tool for controlling the prover's space requirements. Because (non-recursive) SNARK provers are space-hungry, people break large computations into small chunks, prove each chunk separately, and use recursive proofs to stitch the chunks together. DP's fast SNARKs for standard hash functions such as Keccak allow this kind of recursive proving to be done quickly, even if the proofs involved are somewhat large.

Q12: Suppose you want to compose a SNARK using DP's commitment scheme with an elliptic-curve-based SNARK (such as Plonk or Groth16) to reduce verifier costs before posting a proof on chain. Doesn't that require proving statements involving non-native field arithmetic, since DP's verifier operates over GF[2^128] while curve-based SNARKs use large prime-order fields?

A: Yes, this is a potential challenge, but I expect a solution can be found.

One possibility is to first compose with a SNARK that uses a hash-based polynomial commitment scheme with smaller proofs (perhaps FRI converted into a multilinear polynomial commitment scheme, or BaseFold), and only then compose with a curve-based SNARK. Note that FRI can run natively over fields of characteristic two; indeed, this case was considered in the original FRI paper. The restrictions that currently popular SNARKs place on the use of such fields come from their polynomial IOPs, which are not based on sum-check, rather than from FRI itself.

This does not eliminate the non-native field arithmetic problem, but it greatly mitigates it, because for sufficiently large polynomials the FRI verifier performs far fewer total operations, and in particular far fewer field operations, than the Ligero/Brakedown verifier.

Q13: Can SNARKs using DP's commitment scheme be combined with folding schemes such as Nova?

A: This runs into the same issue as Q12: folding schemes use elliptic curves, which are typically defined over large prime-order fields, while DP's commitment scheme uses fields whose size is a power of two.

I expect continued substantial progress on folding schemes, and that they will play an important role in future SNARK designs. However, it may simply be the case that they do not combine nicely with hash-based SNARKs over fields of very small characteristic.

For now, elliptic-curve-based commitments and folding schemes should be used for statements natively defined over large fields, or when prover space is paramount (folding schemes like Nova are far better than other SNARKs in prover space, roughly because they can break a large computation into much smaller pieces than other SNARKs can). Otherwise, hash-based schemes should be used, especially over fields of small characteristic.

That said, future advances in folding schemes may allow them to overtake hash-based schemes. In fact, Nova is already faster than the current generation of hash-based SNARKs on some benchmarks (despite widespread claims that current hash-based SNARKs are faster than curve-based ones).

Original link
