ATOMICC Computer Chess Testing: Controls

Hardware

AMD Phenom II 1100T 6 CPUs @ 3.7 GHz
16 GB of memory @ 2,000 MHz

Match Controls

10' + 10" increment (10 minutes per game + 10 seconds per move)
1 CPU with 64-bit engines and SSE4 technology where possible
Ponder (permanent brain) = On
Hash = 512 MB
Learning = Disabled
3-, 4-, and 5-man endgame tablebases on an internal SSD, with a 64 MB cache where applicable
Book = Silver Opening Suite. Both sides play each of the 50 openings as White and Black, making it a 100-game match.
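
The 100-game format is 50 openings times two color assignments. Below is a minimal Python sketch of that pairing rule. The engine names and the `silver-NN` opening labels are hypothetical placeholders, and this is not how Fritz 13 itself is configured.

```python
from typing import List, NamedTuple


class Game(NamedTuple):
    opening: str
    white: str
    black: str


def build_schedule(engine_a: str, engine_b: str, openings: List[str]) -> List[Game]:
    """Pair every opening twice, once with each engine playing White."""
    games: List[Game] = []
    for opening in openings:
        games.append(Game(opening, engine_a, engine_b))  # engine_a has White
        games.append(Game(opening, engine_b, engine_a))  # colors reversed
    return games


if __name__ == "__main__":
    # Hypothetical labels standing in for the 50 Silver Opening Suite positions.
    openings = [f"silver-{i:02d}" for i in range(1, 51)]
    schedule = build_schedule("EngineA", "EngineB", openings)
    print(len(schedule))  # 100
```

Playing each opening back to back with colors reversed means any advantage built into a particular opening position is shared equally by both engines.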

Software

OS = Windows 7 Pro 64-bit, SP1
GUI = Fritz 13

Why these settings?

Increment play balances the time spent in the middlegame and the endgame. With an increment, engines can choose to spend more time in the opening or middlegame (unlike 40-moves-in-X-minutes controls, where engines have the same amount of time in every phase), and if their clock runs low, they can play the endgame on the increment. By then there are fewer pieces on the board, so game quality should not suffer, especially with ponder on and tablebases in use. Additionally, not all engines do well at 40/X, and I think it's useful to measure an engine's performance under another time control.
I think this is a respectable time control, long enough to produce quality games on my hardware. The standard benchmark for other chess engine testing groups is an AMD X2 4600+ @ 2.4 GHz (or AMD64 4200+ @ 2.2 GHz). Using Crafty Bench 19.17, my machine benchmarks almost twice as fast as the standard 2.4 GHz PC, so it processes as much in 40/21 as a "standard" 2.4 GHz PC does in 40/40. The average game lasts about 80 moves. In a typical 80-move game, each engine has 10 minutes plus 80 × 10 seconds, or 1,400 seconds, on its own clock; with ponder on, it also thinks during the opponent's 1,400 seconds, for a total of 2,800 seconds, or about 47 minutes. If I took part in a 40/40 rating list with ponder off, the equivalent 40/21 control on my machine would give each engine 42 minutes for 80 moves. Forty-two minutes is 2,520 seconds, which is still 280 seconds less than the 2,800 seconds my engines think in 80 moves under my time controls.
I test with only 1 CPU to remove the scaling variable. I want to test an engine's ability to analyze a position, so leaving out how well (or poorly) it scales to more CPUs seems correct.
Ponder combined with a time increment gives the tournament a more human-like feel and, perhaps, better-quality games.
The 10-second increment also means fewer time-outs than under 40/X or G/X controls.
I am trying to minimize opening book bias. Some books are tuned for certain engines, and some are tuned to the playing styles of only the top engines. The Silver Opening Suite is quite impressive thanks to the diligent work of GM John Nunn and Albert Silver. Fairness is further improved because every engine plays both sides of the same openings, across a wide variety of styles.
I could use Fischer Random Chess/Chess960 positions to minimize opening book bias, but not every engine supports it. Also, I want to see regular chess played, not FRC.
Having engines play a wide variety of openings is fun and refreshing for experienced testers, and 100-game matches where both sides play the same openings as White and Black are simply interesting.
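
The timing arguments above can be checked with a short Python sketch. All function names are hypothetical. `increment_thinking_time` and `repeating_control_time` reproduce the 80-move comparison, taking the ponder accounting (an engine counted as thinking on its opponent's clock as well as its own) and the 40/21 equivalent from the text rather than measuring anything. `simulate_increment_clock` tracks one side's clock at 10' + 10" under a made-up time budget and assumes the increment is credited after each completed move; it is not a model of any engine's actual time management.

```python
from typing import Iterable, List


def increment_thinking_time(base_s: float, inc_s: float, moves: int,
                            ponder: bool) -> float:
    """Seconds of thinking per engine over `moves` moves per side.

    Follows the text's accounting: with ponder on, an engine is counted as
    thinking on the opponent's clock as well as its own.
    """
    own_clock = base_s + inc_s * moves
    return 2 * own_clock if ponder else own_clock


def repeating_control_time(minutes_per_40: float, moves: int) -> float:
    """Seconds available for `moves` moves under a 40/X repeating control."""
    return minutes_per_40 * 60 * moves / 40


def simulate_increment_clock(initial_s: float, increment_s: float,
                             spent_per_move: Iterable[float]) -> List[float]:
    """Remaining time on one side's clock after each move.

    Assumes the increment is credited after each completed move; raises
    ValueError if a move would take longer than the time left.
    """
    clock = initial_s
    history: List[float] = []
    for move_no, spent in enumerate(spent_per_move, start=1):
        if spent > clock:
            raise ValueError(f"flag falls on move {move_no}")
        clock = clock - spent + increment_s
        history.append(clock)
    return history


if __name__ == "__main__":
    mine = increment_thinking_time(600, 10, 80, ponder=True)  # 10' + 10", ponder on
    list_equiv = repeating_control_time(21, 80)  # 40/21 on this PC, ponder off
    print(mine, list_equiv, mine - list_equiv)  # 2800 2520.0 280.0

    # Hypothetical budget: 20 s/move up to move 40, then 10 s/move to move 80.
    clock = simulate_increment_clock(600, 10, [20] * 40 + [10] * 40)
    print(clock[39], clock[-1])  # 200 200
```

In the simulated game, spending 20 seconds per move through move 40 leaves 200 seconds on the clock. From there the engine can keep playing on the 10-second increment without the clock dropping further, which is the point about finishing the endgame on the increment.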
