I'm currently building a chess engine and in this series of posts, I write about the progress I made and how I implemented the various different parts needed to get a working engine. If you're new and don't know where to start, I'd recommend reading this post which gives a general overview of how chess engines work.
As mentioned a couple of times previously, the move generation in my engine is still quite slow. While playing my first game against the engine, I noticed that this was a big bottleneck and the engine only managed to calculate to depth 5 in a few seconds. This made it clear to me that performance improvements should be my top priority.
The most complicated and hence slowest part of the move generation is the generation of sliding moves, i.e. the moves of bishops, rooks and queens. While my old implementation avoided going through each square along the lines of a sliding piece to find possible blockers, it still involved many calculations. So I looked at what the top engines are using and most of them generate the sliding moves with a technique called magic bitboards.
The implementation of magic bitboards took longer to implement than I had hoped, because the debugging proved to be quite difficult. But the speedup I got from it definitely made it worth it.
Magic Bitboards
While the name sounds quite mysterious, the idea behind magic bitboards is quite simple. Instead of computing all the moves of a rook (or bishop or queen) on the fly, they can be computed once for each possible starting square and then stored in a table. To generate the moves, one then simply needs to look in the table for the starting square and find the correct bitboard.
But the last step is actually a bit more nuanced than it first appears to be. The logical choice for the index of the table would be the configuration of the pieces along the file and rank of the rook. After all, the possible moves only depend on where the blocking pieces are standing. But the pieces are represented as a 64 bit number, while the file and rank only include 15 squares. Using the bitboard of the blockers along the files and ranks, the table would need 2^64 (which is more than 18 quintillion) entries, which is way too many. So the index has to be a bit more sophisticated.
This is where the "magic" comes into play.
To compress the 64 bit number, we need a function that maps it to a smaller number of the length we need. This is done by generating a so-called magic number for each square. Multiplying the magic number with the configuration of the blockers gives an index where the moves of the rook can be stored. One needs to verify that there aren't any unwanted collisions with this method. The magic numbers can be found via brute force once and then hard-coded into the program.
With these magic numbers, the engine calculates the appropriate lookup tables for each square on startup.
With these tables, the move generation is reduced to looking at the entry in the table. This is great for the speed, but was difficult to debug, since mistakes in the move generation aren't linked to any chess related logic.
Example
The explanation above might be a bit confusing, so I think that looking at an example of how the lookup tables with magic numbers works in practice is a good idea.
Imagine that we have a sports team with three players and each player has a number. Now we want to store them in a table and be able to index the players by their numbers. Suppose we have the following number and player combinations:
17: Player A
41: Player B
54: Player C
If we would just use the numbers as indices, we would need to create a table with 99 entries to be able to store each number that might occur. This is clearly very inefficient.
Instead we can compute a magic number number for this team. When we multiply this magic number with a player number, the first digit of the result should be either 1, 2 or 3 and there should be no duplicates.
After some trial and error we find the magic number 6:
17 * 6 = 102, giving the index 1
41 * 6 = 246, giving the index 2
54 * 6 = 324, giving the index 3
We can now store the magic number 6 and refer to each player by the index calculated as above. If we want to find the player with the number 41, we can calculate 41 * 6, only look at the first digit and then use this digit as the index for the table lookup.
Note that starting the list with the smallest value and filling the list up doesn't scale well, since one would need to order the numbers and also search through them. The single multiplication is much quicker.
Speed Improvement
The whole point of changing the sliding move generation was to improve the performance of the engine, so let's see how this has changed.
I wrote a function to test the move generation of the engine to ensure that I didn't introduce any bugs and to compare the speed of the engine. Using the old move generation for sliding pieces, the engine took 289 seconds to compute all the moves. After implementing the magic bitboards, the time was reduced to 54 seconds!
The move generation is now over 5 times faster and I hope that the speed will at least be decent for now. There are more improvements I'd like to try, but for now I'm looking forward to implementing new features instead of trying to optimize the engine.
Next Steps
As I've already mentioned, I want to address the problems I had with the horizon effect next. I'll also look into move ordering while I try to implement quiescence search.
After this is done, I'll implement the UCI protocol to be able to interface with the engine in GUIs instead of the command line. I also want to create a Lichess bot then.