Zero Knowledge Proof — Introduction to SP1 zkVM Source Code

14 min readJun 19, 2024

zkVM has brought the application of zero-knowledge proof technology into a new era. A few years ago, applying zero-knowledge proof technology required understanding complex zero-knowledge proof algorithms and describing the proof logic as a “circuit”. zkVM encapsulates these complex logics. Based on zkVM, developers can easily describe proof logic using familiar high-level languages. Currently, there are many zkVMs emerging in the market. Let’s take a look at the code and design of SP1.

SP1 — zkVM

SP1 is a virtual machine that supports the RISC-V instruction set. For programs executed on this virtual machine, SP1 can generate proofs to demonstrate correct execution.

SP1 provides a toolchain to compile Rust code into ELF file (RISC-V instruction set). According to chip functions, the SP1 virtual machine is composed of multiple chips: CPU, ALU, memory, etc. Based on Plonky3, SP1 enhances the AIR description of chip interconnection and uses the LogUp algorithm to achieve a complete description of the RISC-V CPU-based virtual machine constraints. Besides, the memory access consistency problem is cleverly transformed into a permutation problem, which is also achieved through the LogUp algorithm.

Building Circuits with Plonky3

SP1 builds its virtual machine based on Plonky3. SP1 can be seen as a large circuit “application” on Plonky3. Therefore, to understand the SP1 source code, you need to first understand the logic and interfaces of Plonky3 circuit building.

Plonky3 source code: https://github.com/Plonky3/Plonky3.git.

In it, uni-stark/tests/fib_air.rs is an example of building a circuit with Plonky3, implementing the Fibonacci sequence calculation. From this example, you can quickly learn the circuit building method of Plonky3.

First, look at the interface of the prove function:

pub fn prove<
    SC,
    #[cfg(debug_assertions)] A: for<'a> Air<crate::check_constraints::DebugConstraintBuilder<'a, Val<SC>>>,
    #[cfg(not(debug_assertions))] A,
>(
    config: &SC,
    air: &A,
    challenger: &mut SC::Challenger,
    trace: RowMajorMatrix<Val<SC>>,
    public_values: &Vec<Val<SC>>,
) -> Proof<SC>
where
    SC: StarkGenericConfig,
    A: Air<SymbolicAirBuilder<Val<SC>>> + for<'a> Air<ProverConstraintFolder<'a, SC>>,
{

In addition to configuration information (config) and challenge generation calculation method (challenger), the prove function requires: 1/ circuit description (air) 2/ trace information and 3/ public input information (public_values). The Plonky3 proof system calls the eval function in the Air interface to build or check circuit constraints through the AirBuilder. After obtaining the polynomial form of the circuit constraints, the Plonky3 proof system generates proofs through polynomial commitment and opening.

Let’s mainly look at the process of generating AIR circuit descriptions and trace.

AIR Description

Plonky3 uses AIR (Algebraic Intermediate Representation) to describe the circuit. The structure describing the circuit must implement the BaseAir and Air interfaces (defined in air/src/air.rs). The BaseAir interface implements the basic information of the circuit (width). The Air interface implements the eval function to describe circuit constraints.

Focus on the definition of the eval function:

pub trait Air<AB: AirBuilder>: BaseAir<AB::F> {
    fn eval(&self, builder: &mut AB);
}

Plonky3 provides a construction tool (AirBuilder) for constraint description. Developers can easily build constraints through AirBuilder. AirBuilder implements a series of when and assert functions to facilitate developers in defining constraints “under certain conditions”. At the same time, AirBuilder provides the main function, abstracting the “internal” representation of the current row and the next row.

Plonky3 comes with three builders: DebugConstraintBuilder, SymbolicAirBuilder, and ProverConstraintFolder. DebugConstraintBuilder is defined in uni-stark/src/check_constraints.rs and is mainly used to check whether the constraints of each row are met. ProverConstraintFolder is defined in uni-stark/src/folder.rs and is mainly used to construct the quotient polynomial of the constraint polynomial. SymbolicAirBuilder is mainly used to determine the degree of the quotient polynomial.

Taking the example of Fibonacci sequence calculation, the core logic described by AIR is as follows:

let main = builder.main();
let (local, next) = (main.row_slice(0), main.row_slice(1));

//If it is the first line, the left and right elements of the first line are public inputs a and b
let mut when_first_row = builder.when_first_row();
when_first_row.assert_eq(local.left, a);
when_first_row.assert_eq(local.right, b);

//If it is not the first or last row, the left element of the next row is equal to the right element of the previous row, and the right 
let mut when_transition = builder.when_transition();
when_transition.assert_eq(local.right, next.left);
when_transition.assert_eq(local.left + local.right, next.right);

//If it is the last row, the element on the right is another public input (x)
builder.when_last_row().assert_eq(local.right, x);

First, get the description of the current row and the next row through the builder, and constrain the constraint relationship between the elements of these two rows through a series of assert functions.

Trace Generation

The trace information is stored as a matrix (RowMajorMatrix). The number of columns in the matrix is equal to the number of columns defined in the BaseAir interface. Taking the Fibonacci sequence calculation example, the trace consists of 8 rows and 2 columns, and its generation logic is as follows:

let mut trace =
        RowMajorMatrix::new(vec![F::zero(); n * NUM_FIBONACCI_COLS], NUM_FIBONACCI_COLS);  //Initialize matrix information

for i in 1..n {
        rows[i].left = rows[i - 1].right; //The left element of the next row is equal to the right element of the previous
        rows[i].right = rows[i - 1].left + rows[i - 1].right;  //The right element of the next row is equal to the right element of the previous row
    }

After getting familiar with the circuit construction method of Plonky3, I started to look at SP1.

The source code of SP1 is available at https://github.com/succinctlabs/sp1.git. The last commit of the SP1 source code used in this article is as follows:

commit 3c0dae88ffbcda9fd261e51bd1fea7571f0fd59e
Merge: 2ce75c6e f0a9f2f2
Author: Ratan Kaliani <ratankaliani@berkeley.edu>
Date:   Thu May 30 11:49:13 2024 -0700

    refactor: ProverClient::remote -> ProverClient::network (#840)

SP1 Client/SDK

SP1 provides the client command “cargo prove” to help developers to build their logics, which will be proved . All logics are written in rust code. The cargo prove command is installed through the sp1up script, and its corresponding source code is: cli/src/bin/cargo-prove.rs. “cargo prove” implements some subcommands: project creation, compilation, proof generation, compilation and installation tool chain.

Usage: cargo prove [OPTIONS]
       cargo prove <COMMAND>

Commands:
  new                Setup a new project that runs inside the SP1.
  build              Build a program
  prove              (default) Build and prove a program
  build-toolchain    Build the cargo-prove toolchain.
  install-toolchain  Install the cargo-prove toolchain.
  help               Print this message or the help of the given subcommand(s)

The SP1 virtual machine uses the riscv32im instruction set. SP1 modifies the core code of rust, adds support for the riscv32im instruction set, and provides a corresponding tool chain.

The new subcommand is used to create a project template: create program and script directories. The program directory creates a virtual machine to run the program. The script directory initializes the virtual machine, executes the program and generates a certificate.

The compilation process is relatively simple. Specify the compilation tool chain and call the cargo command to compile and generate an elf file:

let result = Command::new("cargo")
            .env("RUSTUP_TOOLCHAIN", "succinct")
            .env("CARGO_ENCODED_RUSTFLAGS", rust_flags.join("\x1f"))
            .args(&cargo_args)
            .status()
            .context("Failed to run cargo command.")?;

"passes=loweratomic" //Single-threaded program, no need to consider atomic operations
"link-arg=-Ttext=0x00200800", //Program address starts from 0x00200800

With the compiled elf file and the specified input data, the prove subcommand generates a proof through ProverClient:

let client = ProverClient::new();
let (pk, _) = client.setup(&elf);
let proof = client.prove(&pk, stdin).unwrap();

ProverClient is defined in the SDK and abstracts several Prover types: MockProver, LocalProver, and NetworkProver. In order to understand the core logic of SP1, we mainly focus on the implementation of LocalProver. LocalProver is an encapsulation of SP1Prover. SP1Prover is implemented in the prover directory. The core logic of SP1Prover, such as setup/prove/verify, is implemented in the core directory.

StarkMachine — A RISC-V CPU based machine

StarkMachine is a machine described by AIR. For better description, SP1 created a new trait interface — MachineAir. MachineAir added a generate_trace interface to generate trace information. MachineAir also added a generate_dependencies interface, descripting dependencies between chips.

StarkMachine consists of the following chips:

CpuChip is responsible for the execution sequence of the program. MemoryChip is responsible for memory information. MemoryProgramChip stores the execution program. ByteChip stores the lookup table for byte operations. ALU includes addition, subtraction, multiplication, division and bit operation chips. In addition, there are some precompiled chips. From the definition of the column and the constraint description (eval function), you can roughly understand the main functions of the chip.

CPU chip

All the column information of the CPU chip is defined by the CpuCols structure: core/src/cpu/columns/mod.rs. The column information of the CPU chip includes: shard/channel, clk, pc,instruction, op_a/op_b/op_c and other information. There are many column information of the CPU chip, which are not listed one by one. Most of them are relatively simple and easy to understand.

CpuChip constraint implementation: core/src/cpu/air/mod.rs. It mainly constrains the connection relationship with other chips (Interaction is explained in detail later), the correctness of pc, the correct use of registers, and whether the operands in the instruction are correct. For detailed constraint relationships, interested partners can view the source code.

Memory Chip

MemoryChip is divided into two types: Initialize (memory initialization) and Finalize (memory final state).

pub enum MemoryChipType {
    Initialize,
    Finalize,
}

Although the memory is divided into two types, both types of memory chips use similar constraint structures. The column information is defined by the MemoryInitCols structure:

pub struct MemoryInitCols<T> {
    pub shard: T,
    pub timestamp: T,
    pub addr: T,
    pub value: Word<T>,
    pub is_real: T,
}

The main information includes: the time, address and value of memory access. is_real indicates whether a certain memory access actually occurs in the execution sequence. In other words, some addresses in the memory have never appeared in the execution sequence of a certain program.

It should be noted that the register address space is mapped as part of the memory chip address space, starting from address 0. In the constraint description, there is a clear constraint that the memory of register 0 must be 0.

ProgramChip

The program chip is defined in: core/src/program/mod.rs. The column information of the program chip is divided into preprocessing and non-preprocessing parts.

pub struct ProgramPreprocessedCols<T> {
    pub pc: T,
    pub instruction: InstructionCols<T>,
    pub selectors: OpcodeSelectorCols<T>,
}
pub struct ProgramMultiplicityCols<T> {
    pub shard: T,
    pub multiplicity: T,
}

The preprocessing part mainly includes PC and corresponding instruction information (including decoding information). The non-preprocessing part describes the usage of the code during execution.

MemoryProgramChip

MemoryProgramChip provides program’s “fixed” memory and is defined in core/src/memory/program.rs.

pub struct MemoryProgramPreprocessedCols<T> {
    pub addr: T,
    pub value: Word<T>,
    pub is_real: T,
}
pub struct MemoryProgramMultCols<T> {
    pub multiplicity: T,
    pub is_first_shard: IsZeroOperation<T>,
}

The column information of MemoryProgramChip is divided into two parts: one part is the part that can be preprocessed, and the other part is the part that cannot be preprocessed.

ByteChip

ByteChip provides byte processing related functions, such as bit logic operations, range checks, etc. ByteChip is defined in core/src/bytes/mod.rs.

ByteChip column information also includes two parts: preprocessed and non-preprocessed.

pub struct BytePreprocessedCols<T> {
    pub b: T,
    pub c: T,
    pub and: T,
    pub or: T,
    pub xor: T,
    pub sll: T,
    pub shr: T,
    pub shr_carry: T,
    pub ltu: T,
    pub msb: T,
    pub value_u16: T,
}

pub struct MultiplicitiesCols<T> {
    pub multiplicities: [T; NUM_BYTE_OPS],
}

pub struct ByteMultCols<T> {
    pub shard: T,
    pub mult_channels: [MultiplicitiesCols<T>; NUM_BYTE_LOOKUP_CHANNELS as usize],
}

You can imagine that BytePreprocessedCols gives the function “truth table”. ByteMultCols provides specific function usage. Note that ByteChip provides multi-channel functions. In other words, a ByteChip provides byte processing functions for 4 channels.

ALU chip — AddSubChip

The ALU chip has many functions. Take addition and subtraction as an example. AddSubChip is defined in core/src/alu/add_sub/mod.rs.

pub struct AddSubCols<T> {
    pub shard: T,
    pub channel: T,
    pub add_operation: AddOperation<T>,
    pub operand_1: Word<T>,
    pub operand_2: Word<T>,
    pub is_add: T,
    pub is_sub: T,
}

channel indicates that the current operation uses the Byte function of a channel. operand_1/operand_2 and add_operation indicate the operator and the calculation result.

All chips are further encapsulated by the Chip structure: core/src/stark/chip.rs

pub struct Chip<F: Field, A> {
    /// The underlying AIR of the chip for constraint evaluation.
    air: A,
    /// The interactions that the chip sends.
    sends: Vec<Interaction<F>>,
    /// The interactions that the chip receives.
    receives: Vec<Interaction<F>>,
    /// The relative log degree of the quotient polynomial, i.e. `log2(max_constraint_degree - 1)`.
    log_quotient_degree: usize,
}

Interaction

Let’s first introduce an important concept: Interaction. Interaction is used to verify or constrain the connection relationship between multiple chips. A chip includes two “directional” interconnection operations: send and receive. Sending interconnection refers to the logic that the chip itself cannot verify and needs to be “sent” to other chips for verification. Receiving interconnection refers to the chip receiving verification requests from other chips.

Interaction is defined in core/src/lookup/interaction.rs:

pub struct Interaction<F: Field> {
    pub values: Vec<VirtualPairCol<F>>,
    pub multiplicity: VirtualPairCol<F>,
    pub kind: InteractionKind,
}

InteractionKind is used to distinguish different connection types:

pub enum InteractionKind {
    /// Interaction with the memory table, such as read and write.
    Memory = 1,
    /// Interaction with the program table, loading an instruction at a given pc address.
    Program = 2,
    /// Interaction with instruction oracle.
    Instruction = 3,
    /// Interaction with the ALU operations.
    Alu = 4,
    /// Interaction with the byte lookup table for byte operations.
    Byte = 5,
    /// Requesting a range check for a given value and range.
    Range = 6,
    /// Interaction with the field op table for field operations.
    Field = 7,
    /// Interaction with a syscall.
    Syscall = 8,
}

In order to better “collect” the interconnection information between a chip and other chips, SP1 created InteractionBuilder. InteractionBuilder only focuses on the interconnection information between chips and ignores the constraint information of the chips.

Review the eval functions of all chips and summarize the interconnection information of the chips in SP1 as follows:

An interesting question is, how to constrain the “interconnection” to exist? The data sent and the received data of a connection are consistent? SP1 uses the LogUp (Log Derivative Lookup Argument) algorithm to prove that the two sequences are permutation relations. The LogUp algorithm and the proof protocol of SP1 can be found in 6.3 of the SP1 technical design document. Draw a picture to illustrate the idea.

For example, Chip2 and Chip3 send some data to Chip1. Multiplicity records the number of times each row of data is used.

The sender and receiver first perform random linearization on the data of each row. The result is multiplied by the multiplicity corresponding to each row and stored in the permutation column corresponding to each chip. Note that the mulitplicity of the receiver is negative. After all permutations are calculated, they are accumulated. Obviously, if the data sent and received by the sender and receiver are consistent (except for the order of rows), the cumulative sum of the sender’s chip and the cumulative sum of the receiver’s chip add up to 0.

The formula in the SP1 technical design document describes the process more clearly.

Note that the calculation of permutation and the calculation of cumulative sum are also constrained and proved in SP1.

Memory consistency guarantee

Careful friends may find that the “intermediate change” process of the memory is not reflected in the memory chip. The consistency of memory access (the data read by the memory is the data written before) is achieved by proving that the read and write data are “permuted”. The principle of permutation proof is explained in detail in the “Chip Interconnection” section. Here, explain how the consistency of memory access is converted into a “permutation” problem. The reading and writing of memory are constrained to a pair of read and write operations. The memory read becomes (read, write), and the read and written data are consistent. The memory write becomes (read, write), and the read and written data are inconsistent. Except for the initialization data and the final data, the read and write sequences are the same. After the read and write sequences and clk numbering, the order is eliminated, and the problem is converted into a problem of the read and write sequence being permuted. Taking the read and write sequence of a memory unit (x0) as an example, the following figure illustrates the logical principle of permutation:

The memory consistency problem is converted into the question of whether the Memory Read/Memory Write sequence in the above figure is permuted. Similar to chip interconnection, Memory Read/Write constitutes a “virtual” connection, and the logic is as follows:

fn eval_memory_access<E: Into<Self::Expr> + Clone>(
    &mut self,
    shard: impl Into<Self::Expr>,
    channel: impl Into<Self::Expr>,
    clk: impl Into<Self::Expr>,
    addr: impl Into<Self::Expr>,
    memory_access: &impl MemoryCols<E>,
    do_check: impl Into<Self::Expr>,
) {
...
        // The previous values get sent with multiplicity = 1, for "read".
        self.send(AirInteraction::new(
            prev_values,
            do_check.clone(),
            InteractionKind::Memory,
        ));

        // The current values get "received", i.e. multiplicity = -1
        self.receive(AirInteraction::new(
            current_values,
            do_check.clone(),
            InteractionKind::Memory,
        ));
}

The above describes the SP1 virtual machine from the AIR perspective. Next, we can look at the virtual machine execution and trace generation.

Runtime — Program execution

A program is executed in the Runtime environment and records various events required for tracing. The logic is relatively simple, and the main code framework is as follows:

fn execute(&mut self) -> Result<bool, ExecutionError> {
    if self.state.global_clk == 0 {
        self.initialize();
    }
    ...
    loop {
        if self.execute_cycle()? {
            done = true;
            break;
        }
        ...
    }

    if done {
        self.postprocess();
    }

    Ok(done)
}

A Runtime environment mainly consists of three parts: IO (input and output management), ExecutionState (storing intermediate states during execution), and ExecutionRecord (recording various events after the program).

Note that every certain number (1<<22) of instructions forms a shard. The shard size is configured as follows:

pub struct SP1CoreOpts {
    pub shard_size: usize,
    pub shard_batch_size: usize,
    pub shard_chunking_multiplier: usize,
    pub reconstruct_commitments: bool,
}

impl Default for SP1CoreOpts {
    fn default() -> Self {
        Self {
            shard_size: 1 << 22,
            shard_batch_size: 16,
            shard_chunking_multiplier: 1,
            reconstruct_commitments: false,
        }
    }
}

Sharding — Trace generation and segmentation

According to the various Event information in ExecutionRecord, reorganize according to shard information. For all the information of each shard, enumerate each chip to generate the corresponding trace information.

With trace information and AIR constraints, proofs can be generated. The prove_shard function is used to generate proofs for a shard and is defined in core/src/ stark/prover.rs.

pub fn prove_shard(
    config: &SC,
    pk: &StarkProvingKey<SC>,
    chips: &[&MachineChip<SC, A>],
    mut shard_data: ShardMainData<SC>,
    challenger: &mut SC::Challenger,
) -> ShardProof<SC>

In addition to the proof of chip interconnection, the core algorithm of the proof is similar to Plonky3 (polynomial representation of trace, commitment, opening). So far, the proof of each shard proves that the logic of each chip in the shard, the permutation calculation on the chip, and the interconnection logic between chips are correct. The detailed code of the shard proof algorithm can be viewed by interested partners.

Recursive Proof

For scenarios where verification overhead needs to be reduced, SP1 provides recursive proof logic to “compress” multiple shard proofs into one proof. This proof can be further generated through the Groth16 algorithm to facilitate on-chain verification. Borrowing Figure 6 of the SP1 technical whitepaper: