Statistical Methods and Algorithms in Fishtest

This document outlines the core statistical models, testing methodologies, and algorithms employed by Fishtest for chess engine evaluation and parameter tuning.

Engine Match Modeling

Fishtest utilizes advanced models to accurately represent the outcome probabilities in chess engine matches:

Pentanomial Model:
- Fishtest employs the pentanomial model for analyzing engine match results.
- This model considers five outcomes (e.g., ll, ld, dd/wl, wd, ww), providing a more accurate representation than the traditional trinomial model (win, draw, loss).
- Benefit: Leads to a substantial saving of testing resources compared to the trinomial model.
Opening Book Bias Estimation:
- The difference between the results predicted by the pentanomial and trinomial models allows for an estimation of the Root Mean Square (RMS) value of biases present in the opening book.
- This calculation is based on the accounting identity ^[archive].

Sequential Testing Framework (GSPRT)

For efficient sequential testing (determining if a change is statistically significant as quickly as possible), Fishtest uses the Generalized Sequential Probability Ratio Test (GSPRT):

Core Principle (GSPRT):
- Fishtest uses the GSPRT, a generalization of the standard SPRT.
- In GSPRT, unknown parameters within the statistical hypotheses (H0 and H1) are replaced by their maximum likelihood estimates.
- It's shown (loc. cit.) that GSPRT asymptotically behaves like SPRT when the error probabilities approach zero.
- The asymptotic guarantees of GSPRT do not depend on the Elo differences being tested being small.
Normalized Elo for Bounds:
- Test bounds in Fishtest are expressed in terms of "normalized Elo".
- Advantage: This normalization makes the expected duration of a test dependent only on the chosen bounds, independent of the specific draw ratio or opening book used.
- See this document ^[archive] for details.
Elo Estimation from GSPRT Results:
- To estimate Elo differences from completed GSPRT tests, Fishtest uses formula (6.1) in this document ^[archive].
- The underlying model approximates the GSPRT process as a random walk, specifically using formula (2.1) from this document ^[archive].
- This approximation essentially treats the GSPRT like an SPRT for the drift of a Brownian motion, where the infinitesimal variance is estimated from the sample data.
SPRT Calculator and Resource Estimation:
- The web-based SPRT calculator and the resource consumption estimation tables rely on formulas detailed in this document ^[archive].

Parameter Tuning

Fishtest uses the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm for parameter tuning process:

Normalized parameter space:
- Users choose per‑axis c_i so θ_i ± c_i Δ_i gives a small, measurable Elo gap (≈ a few Elo); this is equivalent to use normalized parameters φ_i = θ_i / c_i with the Elo gap computed with φ_i ± Δ_i.
- Gradient signal. wins − losses over the task supplies the two‑sided finite‑difference signal; for small Elo gaps, g_φ_i ≈ wins − losses · Δ_i (constant coefficients absorbed into r_i).
- In φ-space the learning rate r_i has the same magnitude order, we can use a global learning rate value.
- Mapped back to the θ-space, the θ‑update is Δθ_i = r_i · c_i · g_φ_i.
- Newton‑like behavior. Per‑axis c_i encodes axis scale/curvature, yielding diagonal quasi‑Newton steps in θ (larger on flatter axes, smaller on steeper ones) even while tuning with a global learning rate.
Parallel/out‑of‑order updates:
- Many heterogeneous workers run in parallel.
- Each task is tied to its dispatch snapshot and applies its update on arrival (order‑agnostic).

Quality Control

To ensure the reliability of test results contributed by distributed workers, Fishtest uses a χ² (Chi-squared) test to detect statistically anomalous workers whose results deviate significantly from the norm.

Validation and Simulation

Many of the formulas and statistical models used in Fishtest have been validated through simulation:

Statistical Methods and Algorithms in Fishtest ​

Engine Match Modeling ​

Sequential Testing Framework (GSPRT) ​

Parameter Tuning ​

Quality Control ​

Validation and Simulation ​