Experiments and Evaluation of ZK Backends

This document outlines our experiments and evaluation of various Zero Knowledge (ZK) backends. Our evaluation uses two benchmark sets: Direct Implementation and Proofs of Proofs.

Direct Implementation

In this category, we consider four simple programs relevant to blockchain and AI. We use various ZK backends to directly generate their ZK proofs. The programs include:

For each ZK backend, we directly implement these programs in the respective programming language and generate ZK proofs of their execution traces.

Proofs of Proofs

This category combines ZK proofs and logical/mathematical proofs. For a program Pgm in a programming language PL, we use the K framework to generate a logical proof PI(Pgm, PL). This proof demonstrates the correctness of an execution trace of Pgm using a formal semantics of PL.

A logical proof checker can automatically verify proof_check(PI(Pgm, PL)). We then generate a ZK proof demonstrating the correctness of an execution trace of proof_check.

In essence, we generate a ZK proof that validates the correctness of a logical proof, which in turn verifies the correctness of Pgm written in the language PL. This is why we refer to this benchmark set as Proofs of Proofs--it generates (ZK) proofs of (logical) proofs.

Performance Tables

Some key details about the performance data for the ZK backends. The evaluations were carried out on an AMD Ryzen 9 7950X machine, with 16 cores, 32 threads, and 128 GB of memory, paired with a 4090RTX GPU and 108 GB of memory swap.

The performance time is measured in seconds, and it's important to note how these times were calculated. For all implementations, except RISC Zero, the execution time was determined by measuring the time difference between the start and end timestamps of each execution phase. RISC Zero, has its own performance counter that was utilized to measure both the execution time and cycles.

We've also noted where the implementation is using CPU or GPU acceleration. If you see the prefix "CPU", this refers to an implementation without GPU acceleration, whereas "GPU" denotes an implementation that utilized GPU acceleration.


ZK Backends

Cairo Backend

Cairo Zero Direct Implementation

Cairo Zero (v0.13.0)
ExamplesCPU Exec TimeCPU Prove TimeCPU Verify TimeCPU Total Time
transfer0.4400.1950.0080.643
batch-transfer6.82530.1960.86937.890
perceptron0.4480.1660.0080.662
svm0.4430.1760.0080.627

The programs were compiled using the default compiler of Cairo Zero, and were proven and verified using Lambdaworks Cairo Platinum Prover (v0.3.0)

Cairo One Direct Implementation

Cairo One (v2.3.1)
ExamplesCPU Exec TimeCPU Prove TimeCPU Verify TimeCPU Total Time
transfer0.5830.0090.0020.594
batch-transfer0.69165.6931.78768.171
perceptron0.5920.0290.0030.624
svm0.5930.0320.0030.628

The programs were compiled using LambdaClass Cairo-VM (v1.0.0-rc0), and were proven and verified using Lambdaworks Cairo Platinum Prover (v0.3.0)

Cairo Proofs of Proofs

Currently, we do not have Proof of Proofs for Cairo. However, we are making progress towards implementing them.


Lurk Backend

Lurk Direct Implementation

Lurk (v0.3.1)
ExamplesIterationsCPU Prove TimeGPU Prove TimeCPU Verify TimeGPU Verify TimeCPU Total TimeGPU Total Time
transfer342.3932.3130.5540.6182.9472.931
batch-transfer*5050373681.8191193.3559.8456.5713691.6641199.926
perceptron113.5010.8300.5410.5794.0421.409
svm91.8320.8200.5380.5982.3701.418

* The batch-transfer example utilizes lurk --rc 400 batch_transfer.lurk, while other tests do not use the --rc flag

Lurk Implementation Notes

Lurk is an interpreted programming language. When we run a Lurk example, we are actually executing the interpreter, which in turn executes the program. The execution time includes the time needed for the interpreter to load every function and definition. Consequently, we cannot measure the compilation time of the program itself.

Executing large Lurk examples requires an increase in swap memory, which results in slower execution times compared to other implementations. This limitation makes it challenging to measure and compare execution times between Lurk and other implementations accurately. Despite having 128GB of RAM plus 108GB of swap memory, we were still unable to execute most of the PI2PI^2 examples in Lurk. The symbol in the performance tables indicates this inability.

The --rc n flag in Lurk is used to enhance execution performance of larger programs. The rc value indicates the number of iterations that Lurk bundles together in a single Nova folding step. Here, iterations represent reduction steps in the Lurk Universal Circuit.

In terms of parallelism, higher rc values allow Lurk to generate more partial witnesses simultaneously. However, a larger rc value also requires more memory to execute the program. The default rc value is 10. For the batch-transfer example, we used rc=400. In smaller cases, increasing the rc value can reduce execution time, which is why it's used for programs with over 100K iterations.

The Lurk examples were run on the following version:

commit: 2023-12-21 510d7042990844760d97d65c7e6c7ab75f934630
lurk 0.3.1

The following setup was used to compile the Lurk binary and run the example using GPU:

export EC_GPU_CUDA_NVCC_ARGS='--fatbin --gpu-architecture=sm_89 --generate-code=arch=compute_89,code=sm_89'
export CUDA_ARCH=89
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITITES=compute,utility
export EC_GPU_FRAMEWORK=cuda
cargo install --path . --features=cuda --force

The following setup was used to compile the Lurk binary and run the example using CPU:

export CUDA_PATH=
export NVCC=off
export EC_GPU_FRAMEWORK=none
cargo install --path . --force

Lurk Proofs of Proofs

Lurk (v0.3.1)
ExamplesCyclesCPU Prove TimeGPU Prove TimeCPU Verify TimeGPU Verify TimeCPU Total TimeGPU Total Time
impreflex*55651217.268107.5585.8003.967223.068111.525
transfer-goal3202986
batch-transfer-goal30122047
perceptron-goal6404208
svm-goal6404208

* The impreflex example utilizes lurk --rc 400 batch_transfer.lurk, while other tests do not use the --rc flag


RISC Zero Backend

RISC Zero Direct Implementation

RISC Zero (v0.16.1)
ExamplesCyclesCPU Exec TimeGPU Exec TimeCPU Prove TimeGPU Prove TimeCPU Verify TimeGPU Verify TimeCPU Total TimeGPU Total Time
transfer211560.0170.0302.3530.6130.0010.0022.3710.645
batch-transfer7541990.0570.05737.8787.5320.0020.00137.9377.590
perceptron211560.0170.0282.3550.5950.0010.0022.3730.625
svm211560.0280.0282.3510.6020.0020.0022.3810.632

RISC Zero Implementation Notes

From the RiscZero Terminogy, the Cycles we refer to in the performance tables are the smallest unit of computation in the zkVM circuit, similar to a clock cycle on a physical CPU. The execution complexity of a guest program is measured in these clock cycles as they directly impact the memory, proof size, and time performance of the zkVM.

Generally, a single cycle corresponds to one RISC-V operation. However, some operations may require two cycles.

RISC Zero Proofs of Proofs

RISC Zero (v0.16.1)
Examples*CyclesCPU Exec TimeGPU Exec TimeCPU Prove TimeGPU Prove TimeCPU Verify TimeGPU Verify TimeCPU Total TimeGPU Total Time
impreflex663660.0310.0304.7541.2560.0010.0014.7861.287
transfer-goal11392470.0340.03448.93810.6630.0030.00348.97510.700
batch-transfer-goal67248050.1140.114274.23759.8190.0110.011274.36259.944
perceptron-goal32123460.0490.050127.91128.4330.0060.006127.96628.489
svm-goal32123460.0690.050128.28928.6950.0060.006128.36428.751

* The main implementation for RISC Zero PI2PI^2 implementation is defined here, and the inputs are defined here. The inputs are divided into three files: *-gamma, *-claim, and *-proof. Ultimately, we anticipate that all PI2PI^2 implementations will support a unique binary input format. As a result, all implementations will utilize the same inputs and have a single main implementation.


zkLLVM Backend

zkLLVM Zero Direct Implementation

zkLLVM (v0.1.11-48)
ExamplesCPU Circuit Gen TimeCPU Prove+Verify Time
transfer0.7300.131
batch-transfer1.367143.183
perceptron0.7500.130
svm0.7300.132

zkLLVM Implementation Notes

zkLLVM does not support GPU acceleration in any stage, so there are no GPU results for these experiments.

The proof and verification for zkLLVM were generated using the command transpiler -m gen-test-proof.

The versions of the individual tools used for the examples are as follows:

$ clang-17 --version
clang version 17.0.4 (http://www.github.com/NilFoundation/zkllvm-circifier.git 4c393658e71bed430b996cff8555a548fbe8bbda)

$ assigner --version
0.1.11-48

$ transpiler --version
0.1.11-48

The symbol in the performance tables indicate that the example either did not finish executing after 6 hours or was terminated by the OS due to lack of memory.

zkLLVM Proofs of Proofs

zkLLVM (v0.1.11-48)
ExamplesCPU Circuit Gen TimeCPU Prove+Verify Time
impreflex0.585298.686
transfer-goal31.58524905.477
batch-transfer-goal236.820
perceptron-goal94.530
svm-goal93.431

The main implementation for the zkLLVM PI2PI^2 implementation can be found here. We translate the inputs, which are defined here to match zkLLVM input requirements, so each binary input file was encoded into an array on a single file in JSON format.