TLDR

Benchmarking Bitcoin Script Evaluation for the Varops Budget (Great Script Restoration)

Posted by ajtowns

Nov 10, 2025/09:02 UTC

When compiling with benchmarks enabled (the -DBUILD_BENCH=ON option), the build options need to be chosen carefully so that benchmark results are comparable. In particular, CMAKE_BUILD_TYPE should be set to something other than Debug (for example Release or RelWithDebInfo), since Debug builds are typically unoptimized and skew the timings. Separately, several places in the val64 code should use #ifdef DEBUG and #ifdef USE_GMP instead of #if DEBUG and #if USE_GMP, so that conditional compilation depends only on whether those macros are defined.
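A minimal C++ illustration of the #ifdef versus #if difference follows. The helper functions debug_checks_enabled and gmp_enabled are hypothetical and are not taken from the val64 code; only the macro names come from the post. #ifdef tests only whether a macro is defined, while #if evaluates the macro as an integer expression: an undefined macro silently counts as 0, a macro defined as 0 counts as off, and a macro defined with no value makes #if a compile error.

```cpp
// Hypothetical illustration: the macro names DEBUG and USE_GMP match the
// post, but these helpers are not part of the val64 implementation.
//
// `#ifdef X` asks only "is X defined?". `#if X` would instead evaluate X as
// an integer expression, so `#define X 0` would read as "off" and an empty
// `#define X` would fail to compile. Testing definedness avoids both cases.

constexpr bool debug_checks_enabled() {
#ifdef DEBUG
    return true;   // DEBUG is defined (with any value, or none)
#else
    return false;  // DEBUG is not defined at all
#endif
}

constexpr bool gmp_enabled() {
#ifdef USE_GMP
    return true;   // build was configured to use GMP
#else
    return false;  // fall back to the non-GMP code path
#endif
}
```

In real code the #ifdef blocks would wrap the debug checks and GMP-based paths directly; wrapping them in functions here just makes the behaviour easy to check.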

Sharing benchmark results from different systems would help the analysis, and one suggestion is a git repository where contributors can submit their results via pull requests. Some problems with the current benchmarks were also noted: a few test scripts fail with errors from operations by zero (such as division by zero), so their timings may not be reliable. Fixing these errors should be a priority so that the benchmarks measure what they are intended to measure.

The benchmarks are also less informative when a test is capped at 100% of the varops budget. Such a test shows that the budget bounds script execution time, but says little about whether the individual varops ratios are accurate. For example, the XOR tests hit the varops limit regardless of input size, so they reveal little about how performance changes with input size. This suggests an area where the methodology could be improved.

Analyzing the hashing results from tests that use less than 100% of the varops budget gives preliminary estimates of a constant factor and a per-byte factor for several opcodes. These estimates could be used to revise the hardcoded varops factors, or to assess how specific operations perform on particular hardware. Comparing RIPEMD160, SHA256 and HASH160 (which is RIPEMD160 applied to a SHA256 digest) gave some surprising results, suggesting that per-operation overhead and stack setup costs deserve further investigation to make the benchmarks more accurate.
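One way to derive "constant and per-byte" factors is a least-squares line fit of per-execution cost against input size. The minimal C++ sketch below assumes cost(n) = constant + per_byte * n for an n-byte input and fits the two factors by ordinary least squares; it is not necessarily how the figures in the thread were computed, and the names LinearCost, fit_linear_cost, cost_at and predict_hash160 are hypothetical. The last function uses the fact that HASH160(x) = RIPEMD160(SHA256(x)), with a 32-byte SHA256 digest, to predict HASH160's cost from the other two fits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cost model for one opcode execution: constant + per_byte * n.
struct LinearCost {
    double constant;  // fixed cost per execution (e.g. ns)
    double per_byte;  // additional cost per input byte (e.g. ns/byte)
};

// Evaluate the model for an input of n_bytes.
double cost_at(const LinearCost& c, double n_bytes) {
    return c.constant + c.per_byte * n_bytes;
}

// Ordinary least-squares fit of ys = constant + per_byte * xs.
// xs: input sizes in bytes; ys: measured cost per execution at each size.
LinearCost fit_linear_cost(const std::vector<double>& xs,
                           const std::vector<double>& ys) {
    assert(xs.size() == ys.size());
    assert(xs.size() >= 2);
    const double n = static_cast<double>(xs.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sx += xs[i];
        sy += ys[i];
        sxx += xs[i] * xs[i];
        sxy += xs[i] * ys[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // needs at least two distinct input sizes
    const double per_byte = (n * sxy - sx * sy) / denom;
    const double constant = (sy - per_byte * sx) / n;
    return {constant, per_byte};
}

// Naive composition: HASH160 on n bytes ~ SHA256(n) + RIPEMD160(32).
// If each fitted constant includes per-opcode overhead (dispatch, stack
// setup), this counts that overhead twice, so a gap versus the measured
// HASH160 cost hints at how large that overhead is.
double predict_hash160(const LinearCost& sha256, const LinearCost& ripemd160,
                       double n_bytes) {
    return cost_at(sha256, n_bytes) + cost_at(ripemd160, 32.0);
}
```

For example, fitting SHA256 timings measured at several input sizes gives its two factors; doing the same for RIPEMD160 and comparing predict_hash160 against the measured HASH160 timing then indicates how much of the constant terms is per-opcode overhead rather than hashing work.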

In summary, refining the benchmark build settings, fixing the script errors, and reconsidering how results that exhaust the varops budget are used could make the benchmark data more reliable and more useful for evaluating and optimizing script execution performance. Per-hardware estimates of the opcode factors also offer a way to understand how script performance varies across hardware configurations.


Thread Summary (13 replies)

Nov 7 - Dec 18, 2025