
Depth vs. TC

Newer and older results showing the average search depth reached in games played under fishtest conditions.


Elo cost of small Hash

We measure the influence of Hash on playing strength, using games of SF15.1 at LTC (60+0.6s) and VLTC (240+2.4s) with the UHO book. Hash is varied in powers of two, between 1 and 64 MB at LTC and between 1 and 256 MB at VLTC, leading to an average hashfull between about 100 and 950 per thousand. The data suggests that keeping the average hashfull below 30% is best to maintain strength.

Raw data for the above graph
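LTC (60+0.6s):

| Hash (MB) | Hashfull (per mille) | Elo | Elo-err |
|---|---|---|---|
| 64 | 109 | 0.00 | 0.00 |
| 32 | 199 | -3.80 | 13.00 |
| 16 | 336 | 0.70 | 12.80 |
| 8 | 513 | -10.70 | 11.00 |
| 4 | 689 | -21.50 | 13.30 |
| 2 | 825 | -29.50 | 13.10 |
| 1 | 902 | -47.80 | 8.80 |

VLTC (240+2.4s):

| Hash (MB) | Hashfull (per mille) | Elo | Elo-err |
|---|---|---|---|
| 256 | 131 | 0.00 | 0.00 |
| 128 | 239 | -1.00 | 7.50 |
| 64 | 397 | -0.80 | 6.60 |
| 32 | 591 | -12.10 | 6.10 |
| 16 | 766 | -21.40 | 7.30 |
| 8 | 865 | -32.30 | 4.20 |
| 4 | 931 | -52.40 | 6.20 |
| 2 | 943 | -67.40 | 5.70 |
| 1 | 947 | -95.20 | 6.60 |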

Elo cost of using MultiPV

MultiPV provides the N best moves and their principal variations. This is a great tool for understanding the options available in a given position. However, this information does not come for free: the cost of computing it reduces the quality of the best move found, compared to a search that only needs to find a single line.

| MultiPV | Elo | Elo-err |
|---|---|---|
| 1 | 0.0 | 0.0 |
| 2 | -97.2 | 2.1 |
| 3 | -156.7 | 2.8 |
| 4 | -199.3 | 2.9 |
| 5 | -234.5 | 2.8 |

Engine: Stockfish 15.1
Time control: 60s+0.6s
Book: UHO


Elo gain using MultiPV at fixed depth

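| MultiPV | Elo | Elo-err | Points | Played |
|---|---|---|---|---|
| 1 | 0.0 | | 13496.5 | 30614 |
| 2 | 45.7 | 3.1 | 15388.0 | 30697 |
| 3 | 53.9 | 3.5 | 15732.5 | 30722 |
| 4 | 59.5 | 3.2 | 15862.5 | 30479 |
| 5 | 63.7 | 3.6 | 16078.5 | 30604 |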

Time control: 580s+5.8s
Depth: 18


Elo gain using syzygy

TB6 testing for various versions of SF

Consistent measurement of the Elo gain (syzygy 6-man vs. none) for various SF versions:

The TB are in RAM (so access is fast), the TC is 10+0.1s (STC) and the book is UHO_XXL_+0.90_+1.19.epd. No adjudication. The introduction of NNUE (with SF12) is clearly visible. With SF15, the gain is just 2.7 Elo.

Raw data for the above graph
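| SF | Elo | Elo-err |
|---|---|---|
| 6 | 14.5 | 1.4 |
| 7 | 15.6 | 1.3 |
| 8 | 15.8 | 1.3 |
| 9 | 16.5 | 1.5 |
| 10 | 16.2 | 1.5 |
| 11 | 15.8 | 1.5 |
| 12 | 7.2 | 1.4 |
| 13 | 11.1 | 1.4 |
| 14 | 7.3 | 1.4 |
| 15 | 2.7 | 1.4 |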

Testing depending on number of pieces and TC

Tested at 10+0.1, with all syzygy WDL files on tmpfs (i.e. RAM), using none (0), 4-, 5- and 6-man TB in a round-robin tournament (SF10dev).

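| Rank | Name | Elo | +/- | Games | Score | Draws |
|---|---|---|---|---|---|---|
| 1 | syzygy6 | 13 | 2 | 82591 | 51.8% | 59.5% |
| 2 | syzygy5 | 2 | 2 | 82590 | 50.3% | 59.4% |
| 3 | syzygy4 | -7 | 2 | 82591 | 49.0% | 59.3% |
| 4 | syzygy0 | -7 | 2 | 82592 | 48.9% | 59.4% |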

Tested at 60+0.6, with all syzygy WDL files on tmpfs (i.e. RAM), testing none (0) against 6-man TB:

Score of syzygy6 vs syzygy0: 4084 - 3298 - 18510 [0.515] 25892
Elo difference: 10.55 +/- 2.25
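The score and Elo difference in the match output above follow from the win/loss/draw counts through the usual logistic Elo formula. A minimal sketch (the function name is ours; the ± error bar, which needs a variance estimate, is not reproduced):

```python
import math


def score_and_elo(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (score, Elo difference) from a win-loss-draw count.

    Uses the logistic model score = 1 / (1 + 10 ** (-elo / 400)), solved for elo.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = -400 * math.log10(1 / score - 1)
    return score, elo


# Counts from the 60+0.6 syzygy6 vs syzygy0 match
score, elo = score_and_elo(4084, 3298, 18510)
print(f"score {score:.3f}, Elo {elo:.2f}")  # score 0.515, Elo 10.55
```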


Threading efficiency and Elo gain

Efficiency

Here we look at the threading efficiency of the lazySMP parallelization scheme. To focus on the algorithm, we play games with a given budget of nodes rather than at a given TC. In principle, the nps of lazySMP scales excellently with the number of cores, but practical measurements are influenced by e.g. frequency adjustments, SMT/hyperthreading, and sometimes hardware limitations.

Equivalent nodestime

In these tests, matches are played at a fixed nodes budget (using the nodestime feature of SF). Equivalence in strength between the serial player and the threaded player (for x threads in the graph below) is found by adjusting the number of nodes given to the threaded player (e.g. with 16 threads, the threaded player might need 200% of the nodes of the serial player to match its strength). This 'equivalent nodestime' is determined for various numbers of threads and various nodes budgets (60+0.6Mnodes/game is somewhat similar to our usual LTC of 60+0.6s/game, if we assume 1Mnps).

The interesting observation one can make immediately is that this 'equivalent nodestime' grows with the number of threads, but not too steeply, and furthermore that it decreases with increasing nodes budget. The data shows that with 64 threads, the equivalent nodestime is about 200% for a node budget of 240+2.4Mn, i.e. the threaded player needs about twice the nodes of the serial player. So, despite such games being much faster than STC (10+0.1s), efficiency is still around 50%.
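Reading the 'equivalent nodestime' E as a ratio, the efficiency is 1/E, which is how 200% maps to about 50%. The sketch below also derives a speedup as threads × efficiency; that step is our assumption and only holds if the nps really scales linearly with the number of threads. The function names are hypothetical.

```python
def efficiency(equivalent_nodestime: float) -> float:
    """Efficiency of the threaded search relative to the serial search.

    equivalent_nodestime: nodes the threaded player needs to match the serial
    player, as a ratio (2.0 means 200%).
    """
    return 1.0 / equivalent_nodestime


def speedup(threads: int, equivalent_nodestime: float) -> float:
    """Effective speedup, ASSUMING nps scales linearly with the thread count."""
    return threads * efficiency(equivalent_nodestime)


print(efficiency(2.0))    # 0.5  (64 threads at 240+2.4Mn)
print(speedup(64, 2.0))   # 32.0 (only if the nps really scales 64x)
```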

The curves are sufficiently smooth to be fitted with a model having a single parameter that differs between the curves (f(x), parameter a, see caption). A smaller value of a means a higher efficiency.

A fit for the parameter a, and extrapolation to long TCs

The parameter a of the above model can be fit as a function of the nodes budget. This allows extrapolating the parameter, and arriving at an estimate of the 'equivalent nodestime' at large TCs / node budgets:

The fit is again fairly good. Taking a leap of faith, these measurements at up to 240+2.4Mn can be extrapolated to node budgets typical of TCEC or CCC (up to 500Gn). This allows us to predict speedup and/or efficiency.


These extrapolations suggest that even with more than 300 threads, efficiency at TCEC TCs could be 80% or higher, provided the nps scales with the number of threads.

Elo results (older)

LTC

Playing 8 threads vs 1 thread at LTC (60+0.6, 8moves_v3.pgn):

Score of t8 vs seq: 476 - 3 - 521  [0.737] 1000
Elo difference: 178.6 +/- 14.0, LOS: 100.0 %, DrawRatio: 52.1 %

Playing 1 thread at 8xLTC (480+4.8) vs 1 thread at LTC (60+0.6) (8moves_v3.pgn):

Score of seq8 vs seq: 561 - 5 - 434  [0.778] 1000
Elo difference: 217.9 +/- 15.8, LOS: 100.0 %, DrawRatio: 43.4 %

This corresponds to roughly 82% efficiency (178.6 / 217.9).

STC

Playing 8 threads vs 1 thread at STC (10+0.1):

Score of threads vs serial: 1606 - 15 - 540  [0.868] 2161
Elo difference: 327.36 +/- 14.59

Playing 8 threads @ 10+0.1 vs 1 thread @ 80+0.8:

Score of threads vs time: 348 - 995 - 2104  [0.406] 3447
Elo difference: -66.00 +/- 7.15

So, going from 1 to 8 threads gives about 83% scaling efficiency (327 / (327 + 66)) in this test.
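The two Elo-based efficiency estimates (LTC and STC) can be written as small helpers. This is a sketch: the names are ours, and the STC variant assumes that Elo differences add up transitively, i.e. that 1 thread at 8x time vs. 1 thread at the base TC is worth roughly 327 + 66 Elo.

```python
def efficiency_time_odds(elo_threads_vs_serial: float, elo_time_vs_serial: float) -> float:
    """LTC method: Elo(N threads vs 1 thread) / Elo(1 thread at N x time vs 1 thread)."""
    return elo_threads_vs_serial / elo_time_vs_serial


def efficiency_direct(elo_threads_vs_serial: float, elo_threads_vs_time: float) -> float:
    """STC method: elo_threads_vs_time is N threads vs 1 thread at N x time
    (negative when the threaded player is weaker). Assuming transitivity,
    1 thread at N x time vs 1 thread is worth elo_threads_vs_serial - elo_threads_vs_time."""
    return elo_threads_vs_serial / (elo_threads_vs_serial - elo_threads_vs_time)


print(round(efficiency_time_odds(178.6, 217.9), 2))  # 0.82 (LTC)
print(round(efficiency_direct(327.36, -66.00), 2))   # 0.83 (STC)
```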


Elo from speedups

For small speedups (below ~5%), a linear estimate can be used that gives the Elo gain as a function of the speedup percentage (x):

Elo_stc(x) = 2.10 x
Elo_ltc(x) = 1.43 x

To have 50% passing chance at STC<-0.5,1.5>, we need a 0.24% speedup, while at LTC<0.25,1.75> we need 0.70% speedup. A 1% speedup has nearly 85% passing chance at LTC.
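Inverting the linear model gives the speedup needed for a given Elo. The sketch below assumes, as the 0.24% and 0.70% figures imply, that a 50% passing chance corresponds to a true Elo at the midpoint of the SPRT bounds. The names are hypothetical, and the slopes are only meant for small speedups.

```python
# Slopes of the linear model: Elo gained per 1% speedup (only for speedups below ~5%).
ELO_PER_PERCENT = {"stc": 2.10, "ltc": 1.43}


def elo_from_speedup(speedup_percent: float, tc: str) -> float:
    """Linear Elo estimate for a small speedup, given in percent."""
    if abs(speedup_percent) > 5:
        raise ValueError("the linear model is only meant for speedups below ~5%")
    return ELO_PER_PERCENT[tc] * speedup_percent


def speedup_for_even_odds(elo0: float, elo1: float, tc: str) -> float:
    """Speedup (%) whose expected Elo equals the midpoint of SPRT(elo0, elo1),
    taken here (an assumption) as the point of ~50% passing chance."""
    return 0.5 * (elo0 + elo1) / ELO_PER_PERCENT[tc]


print(round(speedup_for_even_odds(-0.5, 1.5, "stc"), 2))   # 0.24
print(round(speedup_for_even_odds(0.25, 1.75, "ltc"), 2))  # 0.7
```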

Raw data:

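tc 10+0.1:

| Speedup (%) | Elo | Elo-err |
|---|---|---|
| 16 | 32.42 | 3.06 |
| 8 | 13.67 | 3.05 |
| 4 | 8.99 | 3.04 |
| 2 | 3.52 | 3.05 |

tc 60+0.6:

| Speedup (%) | Elo | Elo-err |
|---|---|---|
| 16 | 20.85 | 2.59 |
| 8 | 12.20 | 2.57 |
| 4 | 4.67 | 2.57 |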

_Note: Numbers will depend on the precise hardware. The model was verified quite accurately on fishtest, see https://github.com/locutus2/Stockfish-old/commit/82958c97214b6d418e5bc95e3bf1961060cd6113#commitcomment-38646654_


Distribution of lengths of games at LTC (60+0.6) on fishtest

In a collection of a few million games, the longest was 902 plies.


Win-Draw-Loss statistics of LTC games on fishtest

The following graph gives information on the Win-Draw-Loss (WDL) statistics, relating them to score and material count. It answers the question: 'What fraction of positions with a given score (and material count) in fishtest LTC games lead to a Win, a Draw or a Loss?'

This model is used when Stockfish provides WDL statistics during analysis with the UCI_ShowWDL option set to True, as well as for the normalization of Stockfish's evaluation that ensures that a score of "100 centipawns" means the engine has a 50% probability to win from this position in selfplay at fishtest LTC time control. For details see the WDL model repo.
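The logistic form of the WDL model can be sketched as follows. We assume a win probability of 1 / (1 + exp((a - x) / b)) for an internal evaluation x, with the loss probability obtained from -x and the draw probability as the remainder. In the real model, a and b depend on the material count. The values below are made up for illustration and are not Stockfish's fitted coefficients. Dividing by a is what makes "100 centipawns" correspond to a 50% win probability.

```python
import math


def wdl(x: float, a: float, b: float) -> tuple[float, float, float]:
    """Win/draw/loss probabilities for an internal evaluation x (logistic model).

    a, b: model parameters for the current material count (hypothetical here).
    """
    win = 1.0 / (1.0 + math.exp((a - x) / b))
    loss = 1.0 / (1.0 + math.exp((a + x) / b))  # the same curve evaluated at -x
    return win, 1.0 - win - loss, loss


def normalized_cp(x: float, a: float) -> float:
    """Scale x so that x == a (50% win probability) reads as 100 centipawns."""
    return 100.0 * x / a


a, b = 300.0, 60.0  # made-up parameters, for illustration only
win, draw, loss = wdl(a, a, b)
print(win)                  # 0.5
print(normalized_cp(a, a))  # 100.0
```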


Equivalent time odds and normalized game pair Elo

A suitable measure to define the Elo difference between two engines is normalized game pair Elo as defined from the pentanomial statistics by:

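```python
import numpy as np

def normalized_game_pair_elo(row):
    # row holds the pentanomial counts pntl0..pntl4: game pairs in which engine1 scored 0, 0.5, 1, 1.5, 2 points
    return -100 * np.log10((2 * row['pntl0'] + row['pntl1']) / (2 * row['pntl4'] + row['pntl3']))
```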

It is nearly book independent, and thus a good measure of the relative strength of two engines at a given TC. To express more clearly what a given strength difference implies, we use 'equivalent time odds', i.e. the TC factor needed for equivalent strength, that is, zero Elo difference in a match between the two engines (a criterion that is independent of the definition of Elo used).

We see that at STC the equivalent time odds is about 6x for SF14 vs SF17, while at LTC this time odds factor has become 16x.

Raw data for the above graph
=======================================  UHO_Lichess_4852_v1 =======================================
   engine1        tc1    engine2        tc2        elo  pntl0  pntl1  pntl2  pntl3  pntl4    ngp_Elo
      sf17     10+0.1       sf14   10.0+0.1     165.29     10    432   6509  25598   3291     185.24
      sf17     10+0.1       sf14   40.0+0.4      41.74    251   5110  16632  13516    331      40.25
      sf17     10+0.1       sf14   60.0+0.6       7.47    395   7724  17826   9736    159       7.22
      sf17     10+0.1       sf14   80.0+0.8     -17.52    569  10075  17691   7408     97     -16.88
=======================================          noob_3moves =======================================
   engine1        tc1    engine2        tc2        elo  pntl0  pntl1  pntl2  pntl3  pntl4    ngp_Elo
      sf17     10+0.1       sf14   10.0+0.1     108.47      6    610  16073  16012   3139     155.43
      sf17     10+0.1       sf14   40.0+0.4      10.85    100   3004  27471   5089    176      23.00
      sf17     10+0.1       sf14   60.0+0.6      -4.87    165   4048  28312   3257     58     -11.33
      sf17     10+0.1       sf14   80.0+0.8     -15.02    219   4892  28520   2184     25     -37.76
=======================================  UHO_Lichess_4852_v1 =======================================
   engine1        tc1    engine2        tc2        elo  pntl0  pntl1  pntl2  pntl3  pntl4    ngp_Elo
      sf17     60+0.6       sf14   60.0+0.6     163.96      1    194   5269  29060   1316     220.87
      sf17     60+0.6       sf14  240.0+2.4      88.09     25   2021  14134  19482    178      98.13
      sf17     60+0.6       sf14  360.0+3.6      63.06     41   3212  16546  15938    103      69.03
      sf17     60+0.6       sf14  480.0+4.8      46.39     72   4243  17703  13760     62      50.03
=======================================          noob_3moves =======================================
   engine1        tc1    engine2        tc2        elo  pntl0  pntl1  pntl2  pntl3  pntl4    ngp_Elo
      sf17     60+0.6       sf14   60.0+0.6      71.55      0    131  22234  12279   1196     204.92
      sf17     60+0.6       sf14  240.0+2.4      19.23      0    436  31090   4231     83     100.37
      sf17     60+0.6       sf14  360.0+3.6      11.51      3    616  32255   2938     28      68.25
      sf17     60+0.6       sf14  480.0+4.8       7.08      6    716  32949   2149     20      47.81
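A sketch that recomputes ngp_Elo from the pentanomial counts in the raw data and estimates the equivalent time odds. How the 6x and 16x figures were obtained is not stated. Here we assume that ngp_Elo is roughly linear in log2 of the time factor, fit it by least squares over the 4x, 6x and 8x rows of the UHO_Lichess_4852_v1 book, and solve for zero Elo; for LTC this is an extrapolation beyond 8x. The function names are hypothetical.

```python
import math


def normalized_game_pair_elo(p: list[int]) -> float:
    """p = [pntl0, ..., pntl4]: game pairs in which engine1 scored 0, 0.5, 1, 1.5, 2 points."""
    return -100 * math.log10((2 * p[0] + p[1]) / (2 * p[4] + p[3]))


def equivalent_time_odds(factors: list[float], elos: list[float]) -> float:
    """Least-squares fit of Elo vs. log2(time factor); return the factor where the fit is zero."""
    xs = [math.log2(f) for f in factors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(elos) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, elos)) / sum((x - mx) ** 2 for x in xs)
    return 2 ** (mx - my / slope)


# sf17 vs sf14, both at 10+0.1, UHO_Lichess_4852_v1
print(round(normalized_game_pair_elo([10, 432, 6509, 25598, 3291]), 2))  # 185.24

# UHO_Lichess_4852_v1, sf14 given 4x, 6x and 8x the time of sf17
print(equivalent_time_odds([4, 6, 8], [40.25, 7.22, -16.88]))  # STC: roughly 6.5
print(equivalent_time_odds([4, 6, 8], [98.13, 69.03, 50.03]))  # LTC: roughly 16.3 (extrapolated)
```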

Elo gain with time odds

See also: https://github.com/official-stockfish/Stockfish/discussions/3402


One year of NNUE speed improvements

Nodes per second (nps) measurements for all SF versions between the first NNUE commit (SF_NNUE, Aug 2nd 2020) and the end of July 2021, on an AMD Ryzen 9 3950X, compiled with `make -j ARCH=x86-64-avx2 profile-build`. The graph shows the last nps reported for a depth 22 search from startpos using NNUE (best of about 20 measurements). For reference, the last classical evaluation (SF_classical, July 30 2020) reaches 2.30 Mnps.


The impact of efficient (incremental) updates (NNUE)

Measured with SF17dev (dev-20230824-4c4cb185) with the update_accumulator_incremental() functionality disabled.

Speedup:

Result of  10 runs
==================
base (./stockfish.master       ) =    1287575  +/- 8703
test (./stockfish.patch        ) =     696064  +/- 3451
diff                             =    -591511  +/- 7318

speedup        = -0.4594
P(speedup > 0) =  0.0000

CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor

This corresponds to -67.55 ± 9.5 Elo on fishtest with the UHO book at LTC.
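The reported speedup is the relative nps change of the patch with respect to the base. A minimal sketch (the names are ours). The second number, the time factor the patch would need to search as many nodes as master, is our own illustration and not part of the original output.

```python
def relative_speedup(base_nps: float, test_nps: float) -> float:
    """Relative nps change of the test binary with respect to the base."""
    return (test_nps - base_nps) / base_nps


base_nps, test_nps = 1287575, 696064  # mean nps of master and of the patch
print(round(relative_speedup(base_nps, test_nps), 4))  # -0.4594
# Time factor the patch would need to search as many nodes as master:
print(round(base_nps / test_nps, 2))  # 1.85
```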


Round-robin tournament with SF releases, impact of book and time odds

Measured by playing games at 5+0.05s with SF 7 to SF 15, using three different books (UHO, noob and 8moves). Each version plays once at the base TC and once with 20% time odds.

Raw data for the above graph

UHO

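| SF | Elo | 20%-odds | Elo-err | Odds-err |
|---|---|---|---|---|
| SF7 | 0.0 | 38.1 | 0.0 | 4.1 |
| SF8 | 95.8 | 40.3 | 4.2 | 5.9 |
| SF9 | 142.3 | 38.8 | 3.9 | 5.5 |
| SF10 | 199.5 | 38.2 | 4.0 | 5.7 |
| SF11 | 231.2 | 40.5 | 4.3 | 5.7 |
| SF12 | 405.6 | 37.5 | 4.0 | 5.9 |
| SF13 | 476.5 | 28.4 | 4.2 | 6.0 |
| SF14 | 553.4 | 27.8 | 4.5 | 6.3 |
| SF15 | 627.6 | 24.5 | 4.6 | 6.7 |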

noob

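| SF | Elo | 20%-odds | Elo-err | Odds-err |
|---|---|---|---|---|
| SF7 | 0.0 | 37.8 | 0.0 | 3.8 |
| SF8 | 97.2 | 39.8 | 4.3 | 5.9 |
| SF9 | 146.8 | 40.5 | 3.9 | 5.9 |
| SF10 | 211.1 | 39.3 | 4.3 | 6.2 |
| SF11 | 241.8 | 43.0 | 4.4 | 6.0 |
| SF12 | 458.4 | 32.1 | 4.3 | 6.2 |
| SF13 | 536.2 | 31.9 | 4.2 | 6.3 |
| SF14 | 611.3 | 29.1 | 4.5 | 6.5 |
| SF15 | 660.9 | 24.8 | 4.3 | 6.2 |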

8moves

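| SF | Elo | 20%-odds | Elo-err | Odds-err |
|---|---|---|---|---|
| SF7 | 0.0 | 33.0 | 0.0 | 4.2 |
| SF8 | 86.7 | 32.2 | 4.2 | 5.8 |
| SF9 | 126.7 | 37.3 | 4.0 | 5.6 |
| SF10 | 182.3 | 33.7 | 4.3 | 5.6 |
| SF11 | 206.5 | 42.6 | 4.0 | 5.4 |
| SF12 | 380.7 | 31.6 | 4.1 | 5.6 |
| SF13 | 445.8 | 25.0 | 4.0 | 5.7 |
| SF14 | 512.4 | 23.8 | 4.1 | 5.9 |
| SF15 | 554.5 | 26.4 | 4.1 | 5.9 |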

Branching factor of Stockfish

The branching factor ($B_f$) of Stockfish is defined such that $\text{nodes} = B_f^{\text{rootDepth}}$ or equivalently $B_f = \exp\left(\frac{\log(\text{nodes})}{\text{rootDepth}}\right)$. Here, this has been measured with a single search from the starting position.

The trend is that the deeper one searches, the lower the branching factor, and newer versions of SF have a lower branching factor. A small difference in branching factor leads to very large differences in the number of nodes searched. For example, Stockfish 10 needs about 338x more nodes than Stockfish 17 to reach depth 49.
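The branching-factor definition translates directly into code. The sketch (function names are ours) also shows why small differences matter: a 338x node ratio at depth 49 corresponds to a branching-factor ratio of only about 1.126.

```python
import math


def branching_factor(nodes: float, root_depth: int) -> float:
    """B_f such that nodes == B_f ** root_depth."""
    return math.exp(math.log(nodes) / root_depth)


def nodes_ratio(bf_a: float, bf_b: float, depth: int) -> float:
    """How many times more nodes branching factor bf_a needs than bf_b at the same depth."""
    return (bf_a / bf_b) ** depth


print(round(branching_factor(2 ** 20, 20), 6))  # 2.0

# Branching-factor ratio that corresponds to 338x more nodes at depth 49:
r = 338 ** (1 / 49)
print(round(r, 3))                     # 1.126
print(round(nodes_ratio(r, 1.0, 49)))  # 338
```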


Raw data for the above graph
DepthSF_9SF_10SF_11SF_ClassicalSF_12SF_13SF_14SF_14.1SF_15SF_15.1SF_16SF_16.1SF_17SF_17.1
12020202020202020202020202020
24754545642424851456640444845
31521361471501538418115419112070697672
44952475744793032416308072641441019297512
51036115778298953257274110611449174131123124609
62148225027341161100474124481761395413034891815197752
73836448157802808168011693488545962693126156020963092019
86480784995681022447171849517269981104957912105256519843436
9129581184614134158017299389410602120531534785414500590134154748
10279332733818613252311557110144236762878523259209787548710336348620
11893876197837332362053230029662447513455140064290401305822753840215172
12161734809179954198957797125723260151103152577674120735836279551115734187
13257254160903161123159227979058998699443166195975516030859384315531497937544
142925162634202358571887741358901146021408902267981527369778973942479914796750495
15619466474032525778372631239380263818206290296769214622142447173344904548293064450
16101903468014365475040564442059731944232787646052548409119988730230513048211795380751
17127349810655311154718497031502334539694700175620141570807268004358119279903130041100673
182090089173629013335536027948883626051751254757674532873460298071406239378645278301242639
1938895242883012246604816973251163966849949146108910860241119774332202517102492584479846294310
20665173941869263442296222395422336761504284214255117313451494341557706768773898610631058356864
218289557480568051392402921191244581331683962296200221757521706387536338720831176108698143693162
221085941868278916158594488729830078783898518399212429394012709915119748913565831504022942656753875
23151238101173136397426205254745413564853548375163931707176231442471586189224013524868331239406901681
2420808306177831591217049558041285840781630039182035729001254519638223057293180221284948124421091179519
2529432182257593821657764198873658616999922613596635159575262723034032519123441646358265528510042042808
2638405658373556592941543515416329123678601329138913132115125408001015599735459475785705516065237773772915802
2761348538651933453533480116892508142004651761423017260982142536161175761938408117477520671358741823864365029
2891156568849660565236054522446945201517112294920323305850170866671495220442489207825201731509445794104505986
291520148431135269905674939731024928243362152754567032590861245375851652292266657879338607893554455425644841166
30208471933134803005802644134063616841377827529794713789742229100645175321069416686113734751280894262769696090037
3124077284219601538812096991843679803498725757000161843511005399890642132630912481255142910271639926870845466583727
3230561412424332715917641986060779582770927868323702466185509584761653532795113176152145080311916067191054818718066
3346034129852526127926281823010419655399594493121678252702653377472036642781773290475032153661628320792129750398882232
3462744449862672333634966265414711778713441159013150299111766961389119274519733693377730024887993317334551788704812279026
358778032141117294961437775571244005733167315794172011884149278426110476836550022903995379632167747367031062107766313673813
369907735301824606196547475749296743922216932118226846508281527921161111544661885615269723136605915450268343018408715440239
3717418684761948381278625189048309340804255329101286768418327695873324326807890763576361329739166657487499813665084416953238
382363551706360870637214322461603582555053789489344289102934152353094199540031044993458130072243595727579549723931041118760781
3931489662175554166307195963179141855382565426610057875084962481744548533155114670730410937386262509514806176774046475731534030
403652327064100146465702986844761647206461794186607736676898984219044627202823181038342120243575896029271050042734139995538626444
41522450225011274140350454368553680261693414764863559454546471610720260702641311327908422127205662989407491456154634918174341710464
42612707831713841000586593229351383622744015330857611746304445195380083410522139185886463481388535111179734931806071806731639448578868
431623262000317967835068778386283589636231119629278792372473217241629492512047299237042879211871445991433536411906060397082341165829957
44177188775032527382045596525278441952302791209961640533729242743072158370197067088187297720823308294915672910722765740210187124691721103
45257416221963058420184614557217345286228606825343622334495023334445337922121455704351125148935286366031203808982286695307139675067113349617
46404992363494078958496615067833369333444007850725620824943259313861256101139020620511928316855357211612241777135443906150146118621132959880
476211187375246215215573195581810659230842551576098620510088353213878803721074558545372230430028479309517286733914524498432151232944174172903
48868476345366380037595223005194627106412236587116913776136277129651263310712581908908023212535920658225438367143915641079230158042099236563617
49111558931172120597835455731245046062352086060221079212074152739841742213208512391795510773418110578839590469543884199887068496356737000379142240
5012153284805028477080465339365488774406643583111975030336544415436610938631427019654021119335316422735554425769633

Contempt measurements

Older versions of SF (around SF10) had a contempt setting that worked rather well. This data shows how the Elo difference between the SFdev of October 2018 and older versions of Stockfish depends on the contempt value (the SFdev used is approx. 40 Elo above SF9). Upper and lower bounds represent the values with maximum error.

Graphs of the Elo difference vs. contempt, against SF7, SF8 and SF9, at STC and LTC.

Full data with values: https://docs.google.com/spreadsheets/d/1R_eopD8_ujlBbt_Q0ygZMvuMsP1sc4UyO3Md4qL1z5M/edit#gid=1878521689


Elo change with respect to TC

Here are the results of some scaling tests with the 2moves book, 40000 games each (STC = 10+0.1, LTC = 60+0.6):

| | SF7 -> SF8 | SF8 -> SF9 | SF9 -> SF10 |
|---|---|---|---|
| Elo STC | 95.91 ± 2.3 | 58.28 ± 2.3 | 71.03 ± 2.4 |
| Elo LTC | 100.40 ± 2.1 | 68.55 ± 2.1 | 65.55 ± 2.2 |

So the common wisdom that a longer TC causes Elo compression is not always true: SF7 -> SF8 and SF8 -> SF9 gained more Elo at LTC than at STC.

See https://github.com/official-stockfish/Stockfish/issues/1859#issuecomment-449624976


Discussed here https://github.com/official-stockfish/Stockfish/pull/2401#issuecomment-552768526


Elo contributions from various evaluation terms

See spreadsheet at: https://github.com/official-stockfish/Stockfish/files/3828738/Stockfish.Feature.s.Estimated.Elo.worth.1.xlsx

Note: The estimated Elo worth of the various features may be outdated, or may become outdated soon.