Posted by Julian
Nov 10, 2025/20:57 UTC
The discussion centers on accurately benchmarking script operations, starting with the CMAKE_BUILD_TYPE used to compile the benchmarks. The default, RelWithDebInfo, is believed to be the setting most programmers use, and the performance difference observed between Release and RelWithDebInfo was negligible; even so, explicitly specifying the build options would make benchmark results from different setups easier to compare.
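For example, the build type can be pinned at configure time with standard CMake flags, so everyone compiles the benchmarks the same way (the build directory name here is arbitrary):

```
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```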
A new GitHub repository at https://github.com/jmoik/varopsData has been set up to collect benchmark CSV files via pull requests, keeping the data organized and accessible for shared analysis.
Errors encountered during benchmarking are acknowledged and expected for certain benchmarks. An errored run reports a time of 0 seconds, which is simply discarded, since the analysis only aims to capture the worst (slowest) times. The benchmarking approach for OP_DIV / OP_MOD still needs refinement to ensure the data collected for those operations is accurate and relevant.
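A minimal sketch of that filtering step, assuming run times are recorded in seconds and an errored run is encoded as 0.0 (the function name and the zero-as-error convention are illustrative, not the thread's actual code):

```cpp
#include <algorithm>
#include <vector>

// Keep the worst (largest) time across runs, ignoring the 0-second
// entries produced by benchmarks that errored out.
double WorstTimeSeconds(const std::vector<double>& run_times)
{
    double worst = 0.0;
    for (double t : run_times) {
        if (t > 0.0) worst = std::max(worst, t); // 0.0 marks an errored run
    }
    return worst;
}
```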
Whether to cap the variable-operations (varops) budget at 100% is debated, particularly for its effect on evaluating how accurate the varops ratios are. In the XOR tests, for example, capping would limit how far the operation count and stack element sizes can be pushed, potentially skewing the time per input byte metric. The decision not to cap keeps script lengths practical without compromising the integrity of the performance data relative to input sizes.
Another strategy discussed is stripping unnecessary overhead from the measurements, for instance by not calling EvalScript() at all so that the benchmark isolates per-input-byte performance. That methodological discipline matters: it ensures the benchmarks measure only the intended work, without extraneous noise.
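A rough sketch of what timing an opcode's inner work without EvalScript() could look like, using byte-wise XOR as the measured primitive (the buffer size and the XOR loop are illustrative assumptions, not the author's actual harness):

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    // Two 1 MiB "stack elements"; only the raw XOR work is timed, with no
    // script-interpreter machinery around it.
    std::vector<uint8_t> a(1 << 20, 0xAA), b(1 << 20, 0x55);

    const auto start = std::chrono::steady_clock::now();
    for (size_t i = 0; i < a.size(); ++i) a[i] ^= b[i];
    const double elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    // Print a result byte so the loop is not optimized away, then the
    // time-per-input-byte figure the discussion focuses on.
    std::printf("check=%u, %.3f ns/byte\n", a[0], 1e9 * elapsed / a.size());
    return 0;
}
```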
The conversation also touches on how hard it is to derive meaningful numbers from these benchmarks, given the many variables involved and the initial assumptions made about hashing and operation costs. One surprising personal observation is shared: RIPEMD can run faster than SHA256 on certain hardware, even though it is typically slower and is restricted in newer script versions.
Additionally, there is an exploration of assigning compute units per operation and per byte based on the benchmark results, a departure from earlier models that either ignored operation costs or oversimplified them. This finer-grained approach probes the real-world cost of script execution and suggests that hardcoded operation-cost factors may need to be revisited.
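Sketched as code, such a model amounts to charging each opcode a fixed term plus a per-byte term. The struct below is an assumption about the shape of the model, not code or constants from the thread:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical two-component cost model: a fixed charge per execution of
// an opcode, plus a charge proportional to the bytes it processes.
struct OpCost {
    uint64_t per_op;   // compute units charged once per execution
    uint64_t per_byte; // compute units charged per input byte
};

uint64_t ComputeUnits(const OpCost& cost, size_t input_bytes)
{
    return cost.per_op + cost.per_byte * static_cast<uint64_t>(input_bytes);
}
```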