NNUE PyTorch
Setup
Docker
The recommended setup uses Docker with the PyTorch container, which removes the need for a local Python environment and C++ compilation. If Docker is not available, or if you want native Apple Silicon MPS acceleration, use Conda or Micromamba instead. Docker does run on Apple hardware for CPU-only testing, but it does not support native MPS acceleration.
Prerequisites
For AMD Users:
- Docker
- Up-to-date ROCm driver
For NVIDIA Users:
- Docker
- Up-to-date NVIDIA driver
- NVIDIA Container Toolkit
For Apple Silicon Users (MPS):
- Native MPS acceleration does not work with Docker.
- See "Setup for Apple Silicon" below for the recommended setup, or test with CPU only.
For CPU only (for testing purposes):
- Docker
For driver requirements, check Running ROCm Docker containers (AMD) or the PyTorch container release notes (NVIDIA).
The container includes CUDA 12.x / ROCm 6.4.3 and all required dependencies. Your local CUDA/ROCm toolkit version doesn't matter.
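As a quick sanity check before building the container, the prerequisites above can be roughly verified from Python. This is a minimal sketch and not part of the repository: the `REQUIRED_TOOLS` mapping and the `missing_tools` helper are hypothetical names, and it assumes that `nvidia-smi` is on the `PATH` whenever the NVIDIA driver is installed, which holds for typical installations but is not guaranteed. It only checks that the executables can be found; it does not verify driver versions or the NVIDIA Container Toolkit.

```python
import shutil

# Assumed mapping for illustration only; it is not taken from run_docker.sh.
# AMD and CPU-only setups are checked for Docker alone.
REQUIRED_TOOLS = {
    "nvidia": ["docker", "nvidia-smi"],
    "amd": ["docker"],
    "cpu": ["docker"],
}


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


for vendor, tools in REQUIRED_TOOLS.items():
    missing = missing_tools(tools)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{vendor}: {status}")
```

A tool reported as missing is a hint to revisit the prerequisites list for that vendor; a tool reported as found still needs to be recent enough for the container.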
Running the container
Use the provided script to build and start the container:
    ./run_docker.sh

You'll be prompted to select the target GPU vendor (or CPU only for testing) and the path to your data directory, which will be mounted into the container. Once inside the container, you can run training commands directly. The script also supports non-interactive workflows if all necessary arguments are given through the CLI.
Building the container takes some time and disk space (~30-60 GB).
Setup for Apple Silicon
- An up-to-date package manager is recommended: conda or micromamba.
- Create environment and activate (works the same with micromamba):
    conda create -n nnue_pytorch -c pytorch -c conda-forge \
      python=3.12 \
      pytorch \
      torchvision \
      torchaudio \
      compilers \
      llvm-openmp \
      jpeg \
      libjpeg-turbo \
      cmake \
      make
    conda activate nnue_pytorch

- Afterwards run:
    pip install --no-cache-dir -r requirements.txt
    ./setup_script.sh
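After the environment is set up, it can be useful to confirm that PyTorch actually sees the MPS backend. The following is a minimal sketch, not part of the repository: `pick_device` is a hypothetical helper, and it falls back to `"cpu"` if PyTorch cannot be imported. It relies only on `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return "cpu"
    if torch.backends.mps.is_available():
        return "mps"  # Apple Silicon GPU via Metal Performance Shaders
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPUs, and AMD GPUs on ROCm builds of PyTorch
    return "cpu"


print(pick_device())
```

On Apple Silicon with the environment above activated, a result other than `mps` suggests that the installed PyTorch build or the macOS version does not support MPS.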
Network training and management
Hard way: wiki
Easier way: wiki
Logging
TODO: Move to wiki. Add setup for easy_train.py
    tensorboard --logdir=logs

Then, go to http://localhost:6006/
Automatically run matches to determine the best net generated by a (running) training
TODO: Move to wiki
    python run_games.py --concurrency 16 --stockfish_exe ./stockfish.master --c_chess_exe ./c-chess-cli --ordo_exe ./ordo --book_file_name ./noob_3moves.epd run96

This automatically converts all .ckpt files found under run96 to .nnue and runs games to find the best net. Games are played using c-chess-cli and nets are ranked using ordo. The script runs in a loop and monitors the directory for new checkpoints. It can be run in parallel with training if idle cores are available.
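The monitoring behaviour described above can be pictured as a simple polling loop. The sketch below is an illustration of that idea only and is not the code in `run_games.py`: the names `find_new_checkpoints` and `watch`, the polling interval, and the `handle` callback (standing in for conversion to .nnue and game scheduling) are all assumptions.

```python
import tempfile
import time
from pathlib import Path


def find_new_checkpoints(root, seen):
    """Return .ckpt files under `root` not yet in `seen`, and record them."""
    new = sorted(p for p in Path(root).rglob("*.ckpt") if p not in seen)
    seen.update(new)
    return new


def watch(root, handle, interval_s=60.0, max_polls=None):
    """Poll `root` and call `handle` once for every newly found checkpoint."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for ckpt in find_new_checkpoints(root, seen):
            handle(ckpt)  # placeholder for conversion and match scheduling
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)


# Small demonstration on a temporary directory standing in for run96.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "run_0").mkdir()
    (Path(tmp) / "run_0" / "epoch=1.ckpt").touch()
    found = []
    watch(tmp, found.append, interval_s=0.0, max_polls=1)
    print([p.name for p in found])
```

Keeping a set of already seen paths means each checkpoint is handled once, which is what allows such a loop to run alongside a training job that keeps writing new files.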
Thanks
- Sopel - for the amazing fast sparse data loader
- connormcmonigle - https://github.com/connormcmonigle/seer-nnue, and loss function advice.
- syzygy - http://www.talkchess.com/forum3/viewtopic.php?f=7&t=75506
- https://github.com/DanielUranga/TensorFlowNNUE
- https://hxim.github.io/Stockfish-Evaluation-Guide/
- dkappe - Suggesting ranger (https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer)