Filecoin — How storage replication is proved using zk-SNARK?
The Filecoin team released a beta version of Lotus in the second half of 2019. The hardware configuration of the test network is relatively high, 256G memory + Nvidia 2080TI graphics card. The leaderboard of the test network has also become a playing field. Increased computing power and block production efficiency are the main indicators.
I have sorted out the relevant logic of Lotus PoREP’s data processing (including Sector precommit and commit processes).
Lotus overall modules
Simply put, the Lotus / Filecoin project consists of three parts:
- Lotus Blockchain part — Implement blockchain-related logic (consensus algorithms, P2P, block management, virtual machine, etc.). Note that Lotus’s blockchain-related data is stored on IPFS. The part is implemented by Go language implementation.
- RUST-FIL-PROOF part — Implements Sector storage and proving. That is FPS (Filecoin Proving Subsystem). The part is implemented by Rust language implementation.
- Bellman part — Zero-knowledge proof (zk-SNARK) proof system, mainly based on BLS12_381 elliptic curve and realizes Groth16’s zero-knowledge proof system. Lotus officially recommends Nvidia’s 2080ti graphics card, which also mainly speeds up performance in this part. The part is implemented by Rust language implementation.
This article mainly introduces the core logic of the second part (that is, the storage and proof of Sector).
Before explaining the specific logic, introduce two basic terms: one is Stacked DRG and the other is Sector. Sector is relatively simple, it is a unit of data processing. Anyone who knows the structure of the hard disk knows that the smallest storage unit of the hard disk is called “Sector”. The sector used by Lotus is relatively large, 32G for Lotus test network.
Stacked DRG is an algorithm for Sector data processing. The storage data is processed to a certain extent to prove that the storage server has indeed stored some data truthfully, not forgery (attack). Filecoin begins to use “Zig Zag DRG” algorithm. Perhaps because it is too complicated (too slow), Lotus uses the “simplified” Stacked DRG algorithm. The differences between the two algorithms are shown below:
Two things need to be explained:
- Each node of Stacked DRG and each layer do not use “Zig Zag” liked dependencies. In other words, the dependency relationship between each node and other nodes is fixed.
- Stacked DRG addes node dependencies between each layer.
Sector Precommit Phase
Sector processing, which is the legendary precommit phase, is mainly composed of the following steps:
a. Construct the Merkel tree tree_d (sha256) for the original data, and the root of the tree is comm_d.
b. Label & Encode calculation:
Raw data, every 32 bytes, is called a Node. Each 128M is divided into a Window. The 32G sector has 256 windows. Each Window, according to the Stacked DRG algorithm, generates data for 4 layers. From the previous layer, the data of the next layer is generated by Encode calculation. Encode calculation is currently the modulo add operation. Generate the “key” by using the sha256 algorithm between the window number and the node relationship of the Stacked DRG. Add the Key and the original data to generate the result of the Encode calculation.
c. Generate data of Layer 4 to construct the Merkel tree tree_q (pedersen), the root of which is comm_q . The generated data of Layer4 with exp dependency, constructs the Merkel tree tree_r_last (pedersen), the root of which is comm_r_last.
d. Column Hash calculation
In the data of 256 windows of Layer 4, the Nodes at the same position are spliced together, and the Column Hash result is generated after hashing. Note that the calculation result of Column Hash is only one window size. According to the calculation result of Column Hash, a Merkel tree tree_c is constructed, and the root of the tree is comm_c.
The data that needs to saved on chain is the root of all the Merkel trees: comm_d and comm_r, where comm_r is the result of the pedersen hash of (comm_c | comm_q | comm_r_last).
The core code is in the transform_and_replicate_layers function of rust-fil-proofs / storage-proofs / src / stacked / proof.rs. You can check the specific code according to the following call relationship.
Sector Commit Phase
After Sector precommit, the processed data needs to be “proved”. After all, all processed data cannot be submitted to the blockchain. Lotus uses Bellman’s zero-knowledge proof library and uses Groth16 algorithm for data processing proof. The related logic components are as follows:
RUST-FIL-PROOF (FPS) implements StackedCompound, which is specifically used to implement Stacked DRG data processing proving. StackedCompound integrates two parts, one is a circuit (Stacked Circuit), and the other is a Stacked Drg, which implements circuit data preparation. These sections are divided into sub-functions or sub-circuits (Window, Wrapper, ReplicaColumn, etc.). When Bellman is used to generate a proof, the synthesize interface of the corresponding circuit will be called, thereby completing the process of generating R1CS for the entire circuit.
Lotus’s commit phase proves the following facts:
1 / By random number generated on the chain, randomly select 50 processed data nodes. That is, in the replica (replicate data), randomly select 50 node data, which is called challenges.
2 / Prove that these data are generated step by step from the original data. Moreover, the roots of the four Merkel trees generated during the previous processing can be constructed.
3 / Repeat steps 1 and 2 10 times, which is called partitions.
The proof of step 1/2 is shown below:
Simply put, by random information from the chain (related to the block height), 500 processed Node data is randomly selected and proved that these data are calculated from the original data by the Stacked DRG algorithm . In addition, these data can form the roots of the four Merkel trees that were previously submitted to the chain.
Lotus’s circuit logic is relatively complicated, and the circuit scale has reached 100 million. Proof generation is also very long, more than 2 hours with server-end CPU and more than 1 hour with Nvidia 2080ti GPU .
Proof on-chain verification
Totally, the data submitted to the chain includes comm_d, comm_r and proof proof data. The miner smart contract on the chain will verify that the submitted proof is correct:
func ValidatePoRep (ctx context.Context, maddr address.Address, ssize uint64, commD, commR, ticket, proof, seed  byte, sectorID uint64) (bool, ActorError)
The specific code is in the lotus / chain / actors / actor_miner.go file. Ticket and seed are the random information provided on the chain. The verification process is that the RUST-FIL-PROOF module calls Bellman library to verify whether the zero-knowledge proof is correct. The verification process is relatively fast and around several milliseconds.
Filecoin(Lotus) uses the Stacked DRG algorithm to replicate data, and uses the Groth16 zero-knowledge proof algorithm to prove the data is processed correctly. Lotus’s zero-knowledge proof proves that the randomly selected 500 node data in one replica is correctly generated by the Stacked DRG algorithm, and can generate the roots of the specified Merkel trees. Lotus’s circuit logic is relatively large and the circuit scale has reached 100 million. Proof generation is also very long, more than 2 hours with server-end CPU and more than 1 hour with Nvidia 2080ti GPU .