Recently, I had some free time to explore RISC0’s source code. Zero-knowledge virtual machines (zkVMs) have always been a topic I wanted to delve deeper into. A zkVM abstracts and encapsulates zero-knowledge proof technology: complex operations can be described in a higher-level language, executed by the zkVM, and proven with little extra effort. It feels incredibly cool that proofs can be generated quickly even for complex programs. Anyone interested in zkVM technology is welcome to leave comments and discuss!

RISC0 Source Code Repository: https://github.com/risc0/risc0.git. The last commit used in this article is as follows:

commit ba4fdb21c06dd543462d4bd7f3a7873f0066f22f
Author: Austin Abell <austinabell8@gmail.com>
Date: Thu Dec 19 21:07:30 2024 -0500

update RSA patch to tag, use in compat test (#2649)

Switched rsa tag from specific commit

Source Code Structure

RISC0’s main logic is implemented in Rust. The following modules in the source code are relatively important:

  • bonsai: Bonsai proof service. Proof requests are submitted through its API, and Bonsai returns the corresponding proofs.
  • groth16_proof: The final Groth16 proofs in RISC0. Circuits are implemented using the Circom language.
  • risc0/circuit: Circuits used in the RISC0 project, including RV32IM circuits, Keccak circuits, “compression” circuits, etc. Note that specific circuits are not implemented in this project. This directory contains only the interfaces related to circuit functionality. The actual implementation methods are introduced later.
  • risc0/core: Definitions of the finite fields on which the RISC0 proof system relies. RISC0 uses the Baby Bear field.
  • risc0/groth16: Implements the Groth16 verifier.
  • risc0/r0vm: Command-line implementation of RISC0.
  • risc0/zkp: The RISC0 proof system.
  • risc0/zkvm: Implementation related to the virtual machine (VM), including interactions between guest and host, host-side APIs, and prover interfaces.
  • rzup: RISC-V compilation/management toolchain.
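
To make the Baby Bear field mentioned under risc0/core more concrete, here is a minimal sketch of arithmetic modulo the Baby Bear prime p = 2^31 - 2^27 + 1. The function names (bb_add, bb_mul, bb_pow, bb_inv) are hypothetical and chosen for illustration only. The sketch uses plain modular reduction; the real implementation in risc0/core may use a different internal representation.

```rust
/// Baby Bear prime: p = 2^31 - 2^27 + 1 = 15 * 2^27 + 1.
pub const P: u32 = 15 * (1 << 27) + 1;

/// Addition modulo P. Inputs are assumed to be already reduced (< P).
pub fn bb_add(a: u32, b: u32) -> u32 {
    ((a as u64 + b as u64) % P as u64) as u32
}

/// Multiplication modulo P. The product of two values < 2^31 fits in a u64.
pub fn bb_mul(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) % P as u64) as u32
}

/// Square-and-multiply exponentiation modulo P.
pub fn bb_pow(mut base: u32, mut exp: u32) -> u32 {
    let mut acc = 1u32;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = bb_mul(acc, base);
        }
        base = bb_mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P.
/// Only meaningful for a != 0.
pub fn bb_inv(a: u32) -> u32 {
    bb_pow(a, P - 2)
}
```

For example, bb_mul(P - 1, P - 1) returns 1, because P - 1 represents -1 in the field and (-1)² = 1.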

High-level Logic of RISC0

RISC0, like any zkVM, follows a general logic: a guest program (based on a specific instruction set) is executed within a VM environment, and a proof of that execution is generated.

https://dev.risczero.com/api/zkvm/

RISC0 is this type of VM. The part that implements the VM environment is usually called the Host, and the program executed inside the VM is called the Guest. The Guest program targets the RV32IM instruction set (privileged instructions such as CSR access are not supported). Guest programs are compiled into an instruction sequence by the Rust compiler and executed by the Executor, which may produce multiple Segments. These segments are packaged into a Session and sent to the Prover, which finally produces the result together with its proof (the Receipt).

VM and Emulator

VMs used for zero-knowledge proofs are typically simplified virtual machines. They focus mainly on computation results and consist primarily of a CPU and memory. The CPU is based on the RV32IM instruction set. RISC0’s memory layout is defined in risc0/zkvm/platform/src/memory.rs:

pub const MEM_BITS: usize = 28;
pub const MEM_SIZE: usize = 1 << MEM_BITS;
pub const GUEST_MIN_MEM: usize = 0x0000_0400;
pub const GUEST_MAX_MEM: usize = SYSTEM.start;

pub const STACK_TOP: u32 = 0x0020_0400;

pub const TEXT_START: u32 = 0x0020_0800;
pub const SYSTEM: Region = Region::new(0x0C00_0000, mb(16));
pub const PAGE_TABLE: Region = Region::new(0x0D00_0000, mb(16));
pub const PRE_LOAD: Region = Region::new(0x0D70_0000, mb(9));

The memory size is set to 2²⁸ bytes (256 MB). Starting at address 0x0C000000, 16 MB is allocated as system memory; registers are mapped to system addresses. Starting at 0x0D000000, another 16 MB is used for page table information. The top of the stack (STACK_TOP, 0x00200400) and the start of the program text (TEXT_START, 0x00200800) are defined as well.
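
The layout arithmetic can be checked with a small self-contained sketch. The Region struct and the mb helper below are simplified stand-ins written for this article (only their names appear in the excerpt above), and the end method is an addition for illustration; the constants are copied from the excerpt.

```rust
/// Simplified stand-in for the `Region` type used in memory.rs (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Region {
    pub start: u32,
    pub len: u32,
}

impl Region {
    pub const fn new(start: u32, len: u32) -> Self {
        Self { start, len }
    }

    /// Address one past the last byte of the region (helper added for this sketch).
    pub const fn end(&self) -> u32 {
        self.start + self.len
    }
}

/// Number of bytes in `n` mebibytes.
pub const fn mb(n: u32) -> u32 {
    n * 1024 * 1024
}

// Constants copied from the excerpt above.
pub const MEM_BITS: usize = 28;
pub const MEM_SIZE: usize = 1 << MEM_BITS; // 2^28 bytes = 256 MB
pub const SYSTEM: Region = Region::new(0x0C00_0000, mb(16));
pub const PAGE_TABLE: Region = Region::new(0x0D00_0000, mb(16));
pub const PRE_LOAD: Region = Region::new(0x0D70_0000, mb(9));
```

With these definitions, SYSTEM.end() equals PAGE_TABLE.start (0x0D000000), and both PAGE_TABLE.end() and PRE_LOAD.end() equal 0x0E000000, so the PRE_LOAD region occupies the last 9 MB of the page table region.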

The Emulator simulates CPU execution and memory management. It is implemented in risc0/circuit/rv32im/src/prove/emu. The instruction set implementation resides in risc0/circuit/rv32im/src/prove/emu/rv32im.rs.

The Executor’s run function calls the Emulator to execute the Guest program:

pub fn run<F: FnMut(Segment) -> Result<()>>(
    &mut self,
    segment_po2: usize,
    max_cycles: Option<u64>,
    mut callback: F,
) -> Result<ExecutorResult> {
    ...
    let mut emu = Emulator::new();
    loop {
        ...
        emu.step(self)?;
        ...
    }
}

The main purpose of executing the program through the Emulator is to divide it into multiple Segments and generate the corresponding system state for each segment. The system state primarily comprises memory (including register) information.
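
As a rough illustration of how execution can be cut into segments, the sketch below splits a run of a given length into chunks of at most 2^segment_po2 cycles. ToySegment and split_into_segments are hypothetical names invented here. The real Executor decides where to split while stepping the Emulator and also has to account for overhead that this sketch ignores, so treat it as a simplified model, not as RISC0's actual splitting rule.

```rust
/// Hypothetical, heavily simplified segment record: only the cycle range it covers.
#[derive(Debug, PartialEq)]
pub struct ToySegment {
    pub index: usize,
    pub start_cycle: u64,
    pub cycles: u64,
}

/// Split `total_cycles` of execution into segments of at most
/// 2^segment_po2 cycles each (assumption: a fixed per-segment cycle limit).
pub fn split_into_segments(total_cycles: u64, segment_po2: usize) -> Vec<ToySegment> {
    let limit = 1u64 << segment_po2;
    let mut segments = Vec::new();
    let mut start = 0u64;
    while start < total_cycles {
        // The last segment may be shorter than the limit.
        let cycles = limit.min(total_cycles - start);
        let index = segments.len();
        segments.push(ToySegment { index, start_cycle: start, cycles });
        start += cycles;
    }
    segments
}
```

For instance, split_into_segments(10, 2) yields three segments covering 4, 4, and 2 cycles.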

System State Representation

System states are represented by the SystemState structure:

pub struct SystemState {
    pub pc: u32,
    pub merkle_root: Digest,
}

The system state includes the program counter (PC) and the memory (including register) data. Memory data is represented by a Merkle tree root. Judging from the code, however, the memory data is not stored in an actual Merkle tree structure; the root is computed by iteratively applying SHA-256 hashes. In summary, besides the PC, the system state covers:

  1. Memory data.
  2. Register information.

After defining the system state, the program can be divided accordingly.

Understanding Segments

Segments are defined in risc0/circuit/rv32im/src/prove/segment.rs:

pub struct Segment {
    pub partial_image: MemoryImage,
    pub pre_state: SystemState,
    pub post_state: SystemState,
    pub syscalls: Vec<SyscallRecord>,
    pub insn_cycles: usize,
    pub po2: usize,
    pub exit_code: ExitCode,
    pub index: usize,
    pub input_digest: Digest,
    pub output_digest: Option<Digest>,
}

Key components of a Segment:

  1. insn_cycles: Number of instructions in the current segment.
  2. pre_state/post_state: System states before and after executing instructions in the segment.
  3. input_digest/output_digest: Digests of the input and of the (optional) output.

A complete program, after being divided into multiple segments, forms a Session, which is sent to the Prover for proof generation.
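
One consequence of the pre_state/post_state fields is that the segments of a session should chain together: the state after one segment is the state before the next. The sketch below expresses that check with hypothetical simplified types (ToyState, SegmentStates) and a plain 32-byte array in place of Digest. It is an assumption about how the pieces fit together, not code taken from RISC0.

```rust
/// Hypothetical stand-in for `SystemState`, with a raw 32-byte root instead of `Digest`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyState {
    pub pc: u32,
    pub merkle_root: [u8; 32],
}

/// Hypothetical stand-in for the two state fields of a `Segment`.
#[derive(Clone, Copy, Debug)]
pub struct SegmentStates {
    pub pre_state: ToyState,
    pub post_state: ToyState,
}

/// Returns true if every segment starts exactly where the previous one ended.
/// An empty or single-segment session is trivially continuous.
pub fn is_continuous(segments: &[SegmentStates]) -> bool {
    segments
        .windows(2)
        .all(|pair| pair[0].post_state == pair[1].pre_state)
}
```

A gap between two neighbouring segments (a post_state that does not match the following pre_state) makes is_continuous return false.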

Multiple Provers

There are two types of Provers:

  1. Local Prover: Generates proofs locally.
  2. Remote Prover: Uses a remote service called Bonsai.

Local Provers are further divided into three implementations according to the language and backend they target: Rust (CPU), CUDA, and Metal. To support these multiple implementations, RISC0 defines a Hardware Abstraction Layer (HAL), which comes in two types:

  • Field HAL: Handles field-related computations and data management (risc0/zkp/src/hal/mod.rs).
  • Circuit HAL: Handles witness generation for circuits (risc0/circuit/rv32im/src/prove/hal/mod.rs):
pub(crate) trait CircuitWitnessGenerator<H: Hal> {
    #[allow(clippy::too_many_arguments)]
    fn generate_witness(
        &self,
        mode: StepMode,
        trace: &RawPreflightTrace,
        steps: usize,
        count: usize,
        ctrl: &H::Buffer<H::Elem>,
        io: &H::Buffer<H::Elem>,
        data: &H::Buffer<H::Elem>,
    );
}

To prove a Segment, the Circuit HAL generates the witness, and then the Field HAL completes the proof computations and generation.
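
The division of labour between the two HALs can be sketched with two small traits and a CPU backend. Everything here (ToyHal, ToyWitnessGenerator, CpuHal, CpuWitgen, toy_prove) is hypothetical and far simpler than the real traits; in particular, the "field-side work" is just an element-wise addition standing in for the actual proof computation, and the returned vector is a placeholder, not a proof.

```rust
const P: u64 = 2013265921; // Baby Bear prime

/// Hypothetical field-level HAL: allocates buffers and does element-wise field work.
pub trait ToyHal {
    fn alloc(&self, size: usize) -> Vec<u32>;
    fn eltwise_add(&self, out: &mut [u32], a: &[u32], b: &[u32]);
}

/// Hypothetical circuit-level HAL: fills the witness buffer from an execution trace.
pub trait ToyWitnessGenerator<H: ToyHal> {
    fn generate_witness(&self, trace: &[u32], data: &mut [u32]);
}

/// CPU backend for the field-level HAL.
pub struct CpuHal;

impl ToyHal for CpuHal {
    fn alloc(&self, size: usize) -> Vec<u32> {
        vec![0; size]
    }

    fn eltwise_add(&self, out: &mut [u32], a: &[u32], b: &[u32]) {
        for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
            *o = ((x as u64 + y as u64) % P) as u32;
        }
    }
}

/// CPU witness generator: copies the trace into the witness, leaving the rest zeroed.
pub struct CpuWitgen;

impl ToyWitnessGenerator<CpuHal> for CpuWitgen {
    fn generate_witness(&self, trace: &[u32], data: &mut [u32]) {
        for (slot, &value) in data.iter_mut().zip(trace) {
            *slot = value;
        }
    }
}

/// Step 1: the circuit HAL generates the witness.
/// Step 2: the field HAL runs the (here: placeholder) field computations on it.
pub fn toy_prove<H: ToyHal, W: ToyWitnessGenerator<H>>(
    hal: &H,
    witgen: &W,
    trace: &[u32],
    steps: usize,
) -> Vec<u32> {
    let mut data = hal.alloc(steps);
    witgen.generate_witness(trace, &mut data);
    let mut out = hal.alloc(steps);
    hal.eltwise_add(&mut out, &data, &data); // placeholder for real prover work
    out
}
```

The point of the design is that toy_prove is written once against the traits, so a CUDA or Metal backend would only need to supply its own implementations of the two traits.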

Constraint Circuit Generation

Zirgen is an interesting circuit compilation tool built around a domain-specific language (DSL). Circuits written in this DSL can be “compiled” into implementations in multiple languages.

https://github.com/risc0/zirgen

RISC0’s zkVM circuits and recursive circuits (for aggregating proofs) are both implemented with Zirgen. The recursive circuit aggregates multiple zkVM proofs into a single proof.

The zkVM circuit logic is implemented in zirgen/circuit/rv32im/v2/dsl/top.zir. The Top() function implements the circuit's top-level design, reading the pre-execution state, executing an instruction, and generating the post-execution state.

Zirgen’s compilation system leverages LLVM and MLIR, creating its own dialect. Through MLIR dialects, the circuits are converted into Rust/C++/CUDA code. Understanding this logic requires familiarity with LLVM/MLIR. Interested readers can explore the details further.

RISC0 Proof System

RISC0’s proof system implements zk-STARK using the FRI protocol, DEEP-ALI, and HMAC-SHA-256-based pseudo-random functions (PRFs). The detailed algorithm is described in the design document:

https://dev.risczero.com/proof-system-in-detail.pdf

The implementation is in risc0/zkp/src/prove/prover.rs.

https://github.com/risc0/risc0/blob/main/website/docs/proof-system/proof-system.md
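
To give a feel for the FRI protocol named above, here is a textbook factor-2 folding step in coefficient form over the Baby Bear field. Writing f(x) = f_even(x²) + x·f_odd(x²), one fold with a random challenge alpha produces f'(y) = f_even(y) + alpha·f_odd(y), which has half the degree bound. This is a generic simplification: the folding factor and data layout in risc0/zkp may differ, and all function names below are invented for this sketch.

```rust
const P: u64 = 2013265921; // Baby Bear prime

// Basic field helpers; inputs are assumed to be already reduced (< P).
pub fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

pub fn sub(a: u64, b: u64) -> u64 {
    (a + P - b) % P
}

pub fn mul(a: u64, b: u64) -> u64 {
    (a * b) % P
}

/// Evaluate a polynomial (coefficients ordered low to high) at `x` with Horner's rule.
pub fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// One factor-2 FRI fold: pair up even/odd coefficients and combine them
/// as even + alpha * odd. A missing odd coefficient is treated as zero.
pub fn fri_fold(coeffs: &[u64], alpha: u64) -> Vec<u64> {
    coeffs
        .chunks(2)
        .map(|pair| {
            let even = pair[0];
            let odd = if pair.len() > 1 { pair[1] } else { 0 };
            add(even, mul(alpha, odd))
        })
        .collect()
}
```

The folded polynomial stays consistent with the original one through the identity 2x·f'(x²) = x·(f(x) + f(-x)) + alpha·(f(x) - f(-x)), which is the kind of relation a FRI verifier spot-checks at queried points.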

Through the Emulator, a program is divided into multiple segments. These segments are proven by the Prover, generating multiple Receipts. These Receipts are further aggregated into a single proof using recursive circuits. To enable on-chain verification (reducing verification costs), the aggregated proof is further proven using Groth16 to produce the final proof.
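
The overall flow (segment receipts, recursive aggregation, final Groth16 wrapping) can be modelled with a small data structure. ToyReceipt, aggregate, wrap_for_chain, and count_segments are hypothetical names used only for this sketch; no cryptography is performed, and the left-to-right join order is an arbitrary choice made here, not necessarily the order RISC0 uses.

```rust
/// Hypothetical model of the receipt kinds described above (no real proofs inside).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyReceipt {
    /// Proof of a single segment, identified by its index.
    Segment(usize),
    /// Result of recursively joining two receipts into one.
    Joined(Box<ToyReceipt>, Box<ToyReceipt>),
    /// Final wrapping of an aggregated receipt for cheap on-chain verification.
    Groth16(Box<ToyReceipt>),
}

/// Join receipts left to right into a single receipt; returns None for an empty session.
pub fn aggregate(receipts: Vec<ToyReceipt>) -> Option<ToyReceipt> {
    let mut iter = receipts.into_iter();
    let first = iter.next()?;
    Some(iter.fold(first, |acc, next| {
        ToyReceipt::Joined(Box::new(acc), Box::new(next))
    }))
}

/// Wrap the aggregated receipt in the final (modelled) Groth16 layer.
pub fn wrap_for_chain(receipt: ToyReceipt) -> ToyReceipt {
    ToyReceipt::Groth16(Box::new(receipt))
}

/// Count how many segment receipts a receipt ultimately covers.
pub fn count_segments(receipt: &ToyReceipt) -> usize {
    match receipt {
        ToyReceipt::Segment(_) => 1,
        ToyReceipt::Joined(left, right) => count_segments(left) + count_segments(right),
        ToyReceipt::Groth16(inner) => count_segments(inner),
    }
}
```

However many segments a session contains, the output of wrap_for_chain(aggregate(...)) is a single object, which is what keeps on-chain verification cost independent of program length.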

Summary

RISC0 is a zkVM that supports program execution based on the RISC-V instruction set. The zkVM’s circuits are written using Zirgen. Zirgen leverages LLVM/MLIR to convert circuit constraints into multi-language implementations. The RISC0 proof system implements zk-STARK using the FRI protocol, DEEP-ALI, and HMAC-SHA-256-based pseudo-random functions (PRFs).

Written by Trapdoor-Tech

Trapdoor-Tech tries to connect the world with zero-knowledge proof technologies. zk-SNARK/STARK solution and proving acceleration are our first small steps :)
