From an afternoon hack to its own chess AI.
clrsrc didn't appear overnight - but almost. In about three months the path led from a Python engine written in one afternoon, through a Rust rewrite, to a standalone engine with a self-trained neural network that plays live on Lichess today. Here is the whole story: the tech, the hardware, the method - and why none of it would have been possible without a human-AI partnership.
Three months, one sprint
The public engine matured from version 1.0.0 to 1.1.1 in only about two and a half weeks - preceded by weeks of prototyping with the predecessor Jugernaut.
From Jugernaut to clrsrc
- ①Jugernaut 1.0 (Python). Written in one afternoon: a classic alpha-beta search with a hand-written evaluation. The training ground for understanding chess programming.
- ②Jugernaut (Rust). The same idea, rebuilt in Rust - considerably faster, with first experiments toward a neural evaluation.
- ③clrsrc. The clean restart: a standalone engine with its own search and a self-trained NNUE. Not a Stockfish clone: own code, own training data, open source under GPL-3.0.
Why "clrsrc"?
A small retro nod to the craft: in the early C days (Turbo C / Borland, conio.h) almost every program started with the call clrscr() - "clear screen", to wipe the console. That became the project name: clr (clear) + src (source) = "clear source" - the pure source.
How a neural network learns chess
An NNUE (Efficiently Updatable Neural Network) evaluates positions - and is trained in four steps:
- ▸Self-play. clrsrc plays itself millions of times and collects about 200M positions across several runs.
- ▸Teacher. Stockfish 17.1 re-evaluates every position ("knowledge distillation") - as a separate process over UCI. Stockfish itself is neither shipped nor linked into clrsrc; only the computed evaluation numbers flow in as training labels.
- ▸Training. The free trainer bullet learns a compact net from this (king buckets, SCReLU).
- ▸Embedding. The finished net sits directly in the binary - a single .exe plays at full strength.
Staying honest: a distilled net can approach its teacher, but on the learned distribution it cannot surpass it. That is exactly where the ongoing research starts - bigger nets, deeper training labels.
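The training step names SCReLU as the activation and Stockfish evaluations as labels. A minimal sketch of what those two pieces could look like numerically; the function names and the scale constant of 400 are assumptions for illustration, not values taken from clrsrc or the bullet trainer.

```python
import math


def screlu(x: float) -> float:
    """Squared clipped ReLU: clamp x to [0, 1], then square it."""
    clipped = min(max(x, 0.0), 1.0)
    return clipped * clipped


def teacher_target(cp: int, scale: float = 400.0) -> float:
    """Squash a teacher score in centipawns into a (0, 1) training target.

    `scale` is an assumed constant for this sketch.
    """
    return 1.0 / (1.0 + math.exp(-cp / scale))


def squared_error(prediction: float, target: float) -> float:
    """Per-position loss between the net's output and the teacher target."""
    return (prediction - target) ** 2
```

An even position (0 centipawns) maps to a target of 0.5; the further the teacher score moves from zero, the closer the target gets to 0 or 1.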
Observe, analyze, fix
An engine doesn't get stronger because you claim it does - but because you measure every change, find weaknesses from real games and eliminate bugs consistently.
Monitoring - distributed, not central
There is no central monitor: each AI instance watches its own domain, and the findings converge over the message bus.
- ▸Bot side. Watches the live games - connectivity audit (detects missed or aborted moves), post-game analysis (loss triage) and matchmaking pool logging.
- ▸Engine/data side. Datagen throughput and progress per device, plus the SPRT pipeline: every engine change is tested sequentially against the previous version before it goes live.
- ▸Training side. The val loss as an early warning for divergence/overfitting - a sanity check, explicitly not a strength metric.
- ▸Example. The bot suddenly played almost no games. The matchmaking pool logging showed at once that the opponent fetch was healthy (100 online); an overly tight filter gate let only 32 of them through. Not a network bug. Diagnosis in minutes instead of hours.
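The SPRT pipeline mentioned above boils down to a running log-likelihood ratio that is compared against two bounds. The following is a sketch using a common normal approximation, not the project's actual test harness; the hypotheses (0 vs. 5 Elo) and the error rates (5% each) are assumed defaults.

```python
import math


def expected_score(elo: float) -> float:
    """Expected score per game for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the observed result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    if var == 0.0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)


def sprt_decision(wins: int, draws: int, losses: int,
                  elo0: float = 0.0, elo1: float = 5.0,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Return 'accept', 'reject' or 'continue' for the change under test."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept"
    if llr <= lower:
        return "reject"
    return "continue"
```

The test keeps playing games while the result is "continue", which is why a clearly good or clearly bad change finishes early and a marginal one takes thousands of games.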
Game & bug analysis
- ▸Cold probe. A suspicious position is analyzed in a fresh engine process (reproducible, no carry-over from the hash table) and compared against a stronger "referee" (Stockfish).
- ▸Blind-spot scan. Targeted search for position types the NNUE systematically misjudges - e.g. an exposed own king or endgame optimism.
- ▸Eval-bias diagnosis. It cleanly separates whether a blunder is a search problem (horizon/time) or a data problem (NNUE evaluation) - which decides whether the fix belongs in the search code or in training.
- ▸Example. In won endgames the engine sometimes sacrificed queen or rook against the bare king - the win was never in danger, but it looked absurd. The cold probe showed: without a tablebase the search finds the mate cleanly; the tablebase's 50-move-rule metric had overwritten the mating move.
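A cold probe as described above can be sketched as a fresh engine process that receives a single position over UCI. The UCI commands are standard; the function names, the default depth and the way the result is returned are assumptions for this illustration, not the project's cold-probe skill.

```python
import subprocess


def probe_commands(fen: str, depth: int) -> list[str]:
    """UCI commands for one isolated analysis of a single position."""
    return ["uci", "isready", "ucinewgame",
            f"position fen {fen}", f"go depth {depth}"]


def parse_score(info_line: str):
    """Return ('cp', n) or ('mate', n) from a UCI info line, else None."""
    tokens = info_line.split()
    if "score" not in tokens:
        return None
    i = tokens.index("score")
    if i + 2 >= len(tokens) or tokens[i + 1] not in ("cp", "mate"):
        return None
    try:
        return tokens[i + 1], int(tokens[i + 2])
    except ValueError:
        return None


def cold_probe(engine_path: str, fen: str, depth: int = 20):
    """Start a fresh engine process, analyze one position, return (score, bestmove)."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    last_score, best_move = None, None
    try:
        for cmd in probe_commands(fen, depth):
            proc.stdin.write(cmd + "\n")
        proc.stdin.flush()
        for line in proc.stdout:
            line = line.strip()
            if line.startswith("info "):
                last_score = parse_score(line) or last_score
            elif line.startswith("bestmove"):
                parts = line.split()
                best_move = parts[1] if len(parts) > 1 else None
                break
    finally:
        try:
            proc.stdin.write("quit\n")
            proc.stdin.flush()
        except OSError:
            pass  # engine has already exited
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
    return last_score, best_move
```

Running the same call once against the engine under test and once against a referee engine gives two verdicts that can be compared, with no hash table contents carried over between runs.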
Bug fixing
Four examples following the pattern symptom → cause → fix → effect:
- ▸Mate thrown away into a draw. A mate entry in the warm hash table returned a non-progressing move. Fix: accept mate only after checking the winning line, plus a draw check at the leaf nodes. → +48.4 Elo.
- ▸Tactics seen too late. A search reduction ran before the pruning gates. Fix: one line reordered. → +55.9 Elo (100% LOS).
- ▸Opening book never hit. The en-passant bit always went into the Polyglot hash, even without a possible capture. Fix: mix it in only on a real ep capture. → book hits instead of none.
- ▸Material sacrifice against the bare king (see above). Fix: in winning positions, first search briefly for a forced mate, otherwise take the tablebase move. → no more sacrifices, verified strength-neutral.
- ▸Honesty included: a "TT aging" experiment was discarded again after 2,409 games via SPRT (−15 Elo). Not every good idea survives the statistics - and that's exactly what it's for.
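The opening-book fix above hinges on one condition: the en-passant file belongs in a Polyglot key only when the side to move has a pawn standing next to the pawn that just made a double push. A sketch of that gate, assuming squares are given as algebraic strings; the function name and the data representation are hypothetical, and the check covers pawn adjacency only, not full legality such as pins.

```python
def ep_file_for_hash(ep_square, side_to_move, own_pawns):
    """Return the file index (0..7) to mix into the hash, or None.

    ep_square:    en-passant target square such as "d6", or None
    side_to_move: "w" or "b"
    own_pawns:    set of squares holding pawns of the side to move
    """
    if ep_square is None:
        return None
    file = ord(ep_square[0]) - ord("a")
    rank = int(ep_square[1])
    # The target square is on rank 6 when White may capture, rank 3 for Black.
    if rank != (6 if side_to_move == "w" else 3):
        return None
    # A capturing pawn stands beside the pushed pawn: rank 5 (White) or 4 (Black).
    capturer_rank = 5 if side_to_move == "w" else 4
    for delta in (-1, 1):
        f = file + delta
        if 0 <= f <= 7 and f"{chr(ord('a') + f)}{capturer_rank}" in own_pawns:
            return file
    return None
```

After 1.e4 the FEN carries the target square e3, but no black pawn can capture there, so the function returns None and the key matches the one stored in the book.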
The curated experience book
So the engine doesn't have to recompute every known position, an "experience book" collects deep search verdicts - and it is maintained as a small data format of its own.
- ▸What it is. An opening/experience book in the compact JBK2 format (32 bytes per entry: move, evaluation, win/draw/loss stats, source) - fed from engine self-play and real live-bot games (WDL "harvest").
- ▸How it's curated. An overlay collects new games; via expmerge they move into the main book (priority configurable, home book first). A mandatory step removes provably bad opening moves ("poison entries") that self-learning would otherwise pick up again.
- ▸Multi-source clean. Source bits record the origin per entry, separate evaluation fields prevent overwriting, and a "golden fixture" test guarantees byte-identical merges.
- ▸Key figures. Base book about 192,000 entries, generated from self-play and freely shareable; the live book grows continuously via harvest + merge.
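To make the idea of a fixed 32-byte entry with source bits and separate evaluation fields concrete, here is a sketch with an invented layout. The field order, field widths, padding and bit values are all assumptions; this is not the real JBK2 layout.

```python
import struct

# Hypothetical layout, little-endian, 32 bytes in total:
# key (u64), move (u16), eval_selfplay (i16), eval_live (i16),
# wins (u32), draws (u32), losses (u32), source bits (u8), 5 padding bytes.
ENTRY = struct.Struct("<QHhhIIIB5x")
assert ENTRY.size == 32

SRC_SELFPLAY = 0b01  # assumed bit: entry seen in engine self-play
SRC_LIVE = 0b10      # assumed bit: entry seen in live bot games


def pack_entry(entry: tuple) -> bytes:
    """Serialize one entry tuple into exactly 32 bytes."""
    return ENTRY.pack(*entry)


def unpack_entry(raw: bytes) -> tuple:
    """Parse 32 bytes back into an entry tuple."""
    return ENTRY.unpack(raw)


def merge_entries(home: tuple, overlay: tuple) -> tuple:
    """Merge two entries for the same position and move, home book first."""
    if home[:2] != overlay[:2]:
        raise ValueError("entries differ in key or move")
    key, move, h_self, h_live, hw, hd, hl, h_src = home
    _, _, o_self, o_live, ow, od, ol, o_src = overlay
    # Each evaluation field is only filled from the overlay when the
    # home entry has no value from that source, so nothing is overwritten.
    eval_self = h_self if h_src & SRC_SELFPLAY else o_self
    eval_live = h_live if h_src & SRC_LIVE else o_live
    return (key, move, eval_self, eval_live,
            hw + ow, hd + od, hl + ol, h_src | o_src)
```

Because the merge is a pure function of its two inputs, a fixed pair of input books always produces the same bytes, which is the property a golden-fixture test can pin down.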
A data center built from leftovers
The millions of training positions were not created in the cloud, but on a patched-together home cluster - desktop, old laptops, even smartphones under Termux, orchestrated over SSH - complemented by a rented VPS, the only node running all year round.
| Node | Processor | System |
|---|---|---|
| Desktop | Ryzen 9 9950X + RTX 5060 Ti | Windows 11 |
| Laptop Yoga | Core i7-1165G7 · AVX-512 | Debian |
| Laptop Device33 | Core i5-7200U | Debian |
| Laptop X230i | Core i3-3110M | Debian |
| Smartphone X1 | Dimensity 9300+ | Android · Termux |
| Smartphone X2 | Snapdragon 778G | Android · Termux |
| Smartphone X3 | Snapdragon 888 | Android · Termux |
| Smartphone Samsung | Snapdragon 888 | Android · Termux |
| TV box | ARMv8 (aarch64) | Android · Termux |
| VPS (rented) | x86-64 · AVX-512 | Linux · 24/7/365 |
When AIs talk to each other
Each subproject - engine, bot, training, website - has its own AI instance. To work in sync, they exchange facts over a lean message bus. clrsrc is the hub.
Some instances even respond autonomously to incoming messages - safeguarded by several brakes: limited reply depth, cooldowns, a daily limit, a budget cap and a kill switch. A human message resets the chain at any time.
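The brakes listed above can be pictured as one gate that every autonomous reply has to pass. A sketch under assumed names and default limits; the real bus may implement these checks differently, and the daily counter reset is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ReplyGuard:
    """Decides whether an instance may answer a message on its own."""
    max_depth: int = 3          # assumed limit on consecutive auto-replies
    cooldown_s: float = 60.0    # assumed minimum gap between auto-replies
    daily_limit: int = 50       # assumed maximum auto-replies per day
    budget_cap: float = 5.0     # assumed spending cap, in arbitrary units
    kill_switch: bool = False
    depth: int = 0
    replies_today: int = 0
    spent: float = 0.0
    last_reply_at: float = float("-inf")

    def may_reply(self, now: float, cost: float) -> bool:
        """Check all five brakes; any single one can block the reply."""
        if self.kill_switch:
            return False
        if self.depth >= self.max_depth:
            return False
        if now - self.last_reply_at < self.cooldown_s:
            return False
        if self.replies_today >= self.daily_limit:
            return False
        if self.spent + cost > self.budget_cap:
            return False
        return True

    def record_reply(self, now: float, cost: float) -> None:
        """Book an autonomous reply against every counter."""
        self.depth += 1
        self.replies_today += 1
        self.spent += cost
        self.last_reply_at = now

    def on_human_message(self) -> None:
        """A human message resets the reply chain."""
        self.depth = 0
```

Keeping the checks independent means a failure of one brake, for example a misconfigured cooldown, still leaves the others in place.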
Five specialists, one toolbox
Each instance has its own remit - and self-built tools for it, called skills in Claude Code jargon: small, recurring workflows that launch with a single command. What the four working instances can do:
clrsrc - engine & hub
The actual chess engine and at the same time the central instance through which all the others are coordinated.
Skills: expmerge-deploy (curate-merge the opening book from real bot games & ship it live), sprt (statistically A/B-test new versions), cold-probe (reproduce reported blunders in a fresh process), fleet-status (control the compute cluster). Rust hardening with Kani, Miri and Clippy.
bot - Lichess bot
Runs @clrsrc_lc0 live on Lichess and analyzes every game.
Skills: game-review (triage lost games: rating drift + eval trajectory to the tipping point), cold-probe (prove a finding move-by-move), report-finding (report & file findings uniformly), bot-health (live status, read-only). Plus occasional code-review and an hourly tournament poll.
nnue_train - net training
Trains the neural network and decides which candidate may proceed.
Skills: coverage-round (a complete training round), eval-net (check a candidate against the baseline), sprt-handoff (hand a net over for strength testing), build-book (a targeted opening book for data generation). Principle: selection by playing strength, not by training loss.
chess_engines - the yardstick
A curated collection of 57 reference engines with uniform technical profiles. Measures clrsrc's strength from real engine-vs-engine games - measured, not estimated.
Work: profile analysis (reading engine internals faithfully from source), cutechess tournaments (round-robin & gauntlet), source verification and an independent fact & code review of this website.
And the fifth instance? It built this site - static, no framework - and draws its numbers exclusively from the verified facts of the other four. Above them all sits shared tooling: the message bus with its shared fact log, an automatic check for new mail at the end of every reply, an autopilot for headless operation and a crash recovery that can resume any session with full context.
Who actually does what?
The human - architect & decision-maker
Sets the goals and quality bars, runs the hardware, keeps the legal guardrails and judges results with chess understanding.
Claude Code - the executing developer
Writes and refactors the code, builds tools, researches and documents - and runs partly autonomously within clear limits.
Thinking together
The biggest progress comes not from plain task processing, but from thinking together. The human brings a question or an observation; the instance digs in, checks hunches against the actual data and discards what doesn't hold. Often a long-held assumption flips - because someone measured instead of guessing. Human and AI here are not a command chain, but conversation partners: the human sets direction and limits, the instance delivers depth, evidence and sometimes pushback.
Working together and talking about it
The work happens in conversation: active brainstorming, open exchange, collecting ideas and sharpening them together. The special part is the short path - over the inter-instance bus the instances talk to each other directly, flexibly and fast. Questions, answers and evidence travel back and forth automatically, without anyone copying text from one chat into the next.
This is vibe coding in the serious variant: not "let it generate blindly", but stating intent in natural language - and then proving every engine change statistically via SPRT. Vibe meets the measuring stick.
Without this partnership the project wouldn't exist.
The endless positions
Chess can't be "brute-forced". There is a perfect solution - but it's practically unreachable.
The way out is two levers that multiply: search smarter (prune the search tree aggressively) and evaluate better (the NNUE). These two levers are the entire story of this project.
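The first lever, pruning the search tree, is easiest to see on a toy example. Below is a sketch of plain alpha-beta search over a nested list, where inner lists are positions and numbers are leaf evaluations from the root player's point of view. It illustrates the principle only and is not clrsrc's search; the `stats` counter is an addition for demonstration.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, stats=None):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if isinstance(node, (int, float)):
        if stats is not None:
            stats["leaves"] = stats.get("leaves", 0) + 1
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the root player already has a better option elsewhere
    return best


tree = [[3, 5], [2, 9], [0, 1]]
stats = {}
value = alphabeta(tree, stats=stats)
```

On this tree the search returns 3 after looking at 4 of the 6 leaves: once the second and third branches each show a reply worse than 3, their remaining leaves no longer matter. The second lever, the evaluation, is what supplies the leaf numbers.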
Open and clean
- ▸A standalone engine, open. clrsrc is not a fork: it has its own search and its own net, is licensed under GPL-3.0, and its full source is on GitHub.
- ▸Built on the open-source community. Individual building blocks (e.g. the time management and the SIMD kernel of the evaluation) are adapted from other GPL engines like Stockfish, Stash and Viridithas - every source openly credited. That is exactly why clrsrc itself is under GPL-3.0.
- ▸Own training data. The positions are generated by self-play - no third-party bulk databases.
- ▸Teacher cleanly separated. During training Stockfish labels the positions as an external process; its code is neither shipped nor linked.
- ▸The bot licensed in its own right. The Lichess bot LiRu-Bot is a Rust port of the official lichess-bot (lichess-bot-devs) and is under AGPL-3.0, the engine under GPL-3.0. In the standard build both run as separate programs; in the embedded live build they form a combined work - no conflict, since GPL-3.0 and AGPL-3.0 each explicitly permit combination with the other. Upstream credited.
- ▸Fair use of free data. Where Lichess data flows in, it is explicitly free (CC0).
Full acknowledgement of all projects from which code, algorithms or data formats originate is in the CREDITS file on GitHub ↗.
Where it's heading
- ▸Break through the NNUE plateau - larger net architectures and deeper training labels, each measured via SPRT.
- ▸Prepare an entry into the CCRL rating lists - only then will there be an official Elo number here.
- ▸Keep the bot @clrsrc_lc0 in stable continuous operation.
- ▸A blog series out of this story: NNUE training, the smartphone cluster, AIs that talk, SPRT as a discipline.