MergeSync: Parallel UTXO-set construction in assumevalid trust model

Nov 9 - Nov 21, 2025

The Bitcoin blockchain has grown remarkably over the past 16 years, accumulating more than 1.2 billion on-chain transactions and nearly 700 GiB of data.

This growth makes setting up a new node difficult, because the entire dataset has to be processed. A critical part of that work is handling over 3 billion transaction outputs (TXOs), the vast majority of which are already spent or unspendable; only the remainder forms the set of Unspent Transaction Outputs (UTXOs). The conventional method of sequentially downloading and validating each block during the initial block download (IBD) is limited by I/O rather than by CPU speed, which leads to a prolonged setup time for new nodes.

In response to these scalability and efficiency concerns, a new approach has been proposed that is inspired by, but distinct from, Ruben Somsen's SwiftSync. It uses a map/reduce strategy with parallel data processing to speed up construction of the UTXO set, which shortens the time a node needs to synchronize with the network under certain trust assumptions. The blockchain data is sharded across multiple cores, and each core extracts OutPoints from transactions, excluding provably unspendable outputs and coinbase inputs. After a single-pass elimination of duplicates, the OutPoints are merged recursively until a single file remains, which gives the cardinality of the UTXO set at a specified block height. The method, termed MergeSync, exploits parallelism and streamlined set operations, but it performs no signature processing or verification, so its result carries a potential security risk unless it is validated by a trustless node.
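The map/reduce flow described above can be sketched roughly as follows. This is a minimal illustration, not the proposal's actual implementation: the `Tx` and `OutPoint` types, the function names, and the in-memory sets are all assumptions made for the example, and "duplicates" is read here as outpoints that appear both as a created output and as a spent input. In practice the map step would run on separate cores and write files rather than hold sets in memory.

```python
from typing import NamedTuple


class OutPoint(NamedTuple):
    txid: str
    vout: int


class Tx(NamedTuple):
    """Hypothetical toy transaction used only for this sketch."""
    txid: str
    inputs: list                          # OutPoints spent; empty for a coinbase
    n_outputs: int
    unspendable: frozenset = frozenset()  # vouts that are provably unspendable


def cancel(created, spent):
    """Drop every outpoint that is both created and spent."""
    both = created & spent
    return created - both, spent - both


def map_shard(txs):
    """Map step: extract outpoints from one shard of transactions."""
    created, spent = set(), set()
    for tx in txs:
        for vout in range(tx.n_outputs):
            if vout not in tx.unspendable:
                created.add(OutPoint(tx.txid, vout))
        spent.update(tx.inputs)           # coinbase contributes no inputs
    return cancel(created, spent)


def merge(a, b):
    """Reduce step: combine two partial results and cancel again."""
    return cancel(a[0] | b[0], a[1] | b[1])


def build_utxo_set(shards):
    """Merge partial results pairwise until one remains."""
    partials = [map_shard(s) for s in shards]   # could run in parallel
    if not partials:
        return set()
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            pair = partials[i:i + 2]
            merged.append(merge(*pair) if len(pair) == 2 else pair[0])
        partials = merged
    created, spent = partials[0]
    if spent:
        raise ValueError("input spends an outpoint that was never created")
    return created
```

For example, `len(build_utxo_set(shards))` would give the cardinality of the resulting set. Note that nothing here checks scripts or signatures, which matches the trust caveat above.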

Despite its novelty, the proposal has met skepticism about its efficiency, especially compared with existing approaches such as Bitcoin Core's current strategy and SwiftSync's use of accumulators to optimize checks of spent UTXOs. One concern is that it must write out all UTXO data along with the outpoints of all inputs, which might not reduce I/O enough. SwiftSync, which uses hints and also allows parallel processing, is seen as the more advanced solution because it compresses the verification process into a single accumulator, improving speed and efficiency. The critique therefore questions what the proposal contributes beyond these established methods and asks for clear evidence of an advantage before it is considered a viable alternative.
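To make the accumulator idea concrete, here is a simplified sketch of an order-independent accumulator with hints. It is an illustration only and should not be read as SwiftSync's actual specification: the salted SHA-256 hash, the modulus, and all names (`Accumulator`, `process_block`, `hinted_unspent`) are assumptions chosen for the example. The point it shows is that additions and removals can happen in any order and on different workers, and the state stays a single number.

```python
import hashlib
import os

MOD = 2 ** 256  # assumed modulus for this sketch


def outpoint_hash(salt, txid, vout):
    """Salted hash of an outpoint, interpreted as an integer."""
    data = salt + txid.encode() + vout.to_bytes(4, "little")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class Accumulator:
    def __init__(self, salt=None, value=0):
        # A random salt keeps the hash values unpredictable to others.
        self.salt = salt if salt is not None else os.urandom(32)
        self.value = value

    def add(self, txid, vout):
        self.value = (self.value + outpoint_hash(self.salt, txid, vout)) % MOD

    def remove(self, txid, vout):
        self.value = (self.value - outpoint_hash(self.salt, txid, vout)) % MOD

    def combine(self, other):
        """Merge the state of two workers that share the same salt."""
        if other.salt != self.salt:
            raise ValueError("accumulators must share a salt")
        return Accumulator(self.salt, (self.value + other.value) % MOD)


def process_block(acc, created, spent, hinted_unspent):
    """Add outputs the hints say will be spent; remove every input."""
    for op in created:
        if op not in hinted_unspent:   # hinted outputs go to the UTXO set
            acc.add(*op)
    for op in spent:
        acc.remove(*op)
```

If every output that was added is later removed exactly once, the combined value returns to zero regardless of processing order; a non-zero final value signals a mismatch between the hints and the chain.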

Furthermore, because the proposed method tracks everything and removes duplicates by sorting, its memory footprint could scale with the number of (U)TXOs, much like a regular full node. SwiftSync avoids this scaling problem through compression and a hints file. Not needing a hints file (~80 MB compressed) is acknowledged as a benefit, but the substantial memory required for sorting or lookups is a considerable downside. The inability to disable assumevalid and perform full signature validation raises further concerns. Still, the observation that sorting could replace database lookups for order-independent addition and removal of UTXOs is an interesting angle, though its practical trade-offs need careful evaluation.
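The idea that sorting can stand in for database lookups can be illustrated with a small sketch. It assumes, hypothetically, that each worker emits a sorted run of `(outpoint, tag)` records, where the tag marks a creation or a spend; the record layout and function names are inventions for this example. Merging the sorted runs places a creation and its spend next to each other, so they can be cancelled in one streaming pass without any keyed lookup.

```python
import heapq
from itertools import groupby

CREATE, SPEND = 0, 1  # assumed record tags for this sketch


def sorted_run(records):
    """Sort one worker's (outpoint, tag) records; outpoint = (txid, vout)."""
    return sorted(records)


def surviving_outpoints(*runs):
    """Stream-merge sorted runs and yield outpoints never spent."""
    merged = heapq.merge(*runs)           # lazy k-way merge of sorted inputs
    for outpoint, group in groupby(merged, key=lambda rec: rec[0]):
        tags = [tag for _, tag in group]
        if tags == [CREATE]:
            yield outpoint                # created and never spent
        elif tags != [CREATE, SPEND]:
            raise ValueError(f"inconsistent records for {outpoint}")
        # [CREATE, SPEND] cancels out and yields nothing
```

The merge itself only holds one record per run at a time, but the runs still have to be produced and stored somewhere, which is where the memory and I/O concerns raised above come in.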
