I have already written two posts, here and here, about using LC0's WDL (win/draw/loss probabilities) to evaluate the sharpness of a position. This post explains the idea behind it and how it can be used.
Idea
Not all equal chess positions are the same.
One position could be equal because neither side has any winning chances. Another position could be equal because either side could win it. The basic idea behind the sharpness score is to quantify how easy the position is to mess up for either side.
Since the WDL stores the probabilities of winning, drawing and losing, a sharpness score can be calculated from it that increases as the winning and losing probabilities increase.
Calculation of the Score
In my first post, I came up with a fairly arbitrary formula, but then an LC0 developer reached out to me. They were working on something similar, the WDL contempt feature, and used the following formula:
Note that in my earlier post I didn't square the result, which I should have done.
This is the formula I will be using in this post and going forward.
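As a minimal sketch, assuming the formula has the form sharpness = (2 / (ln(1/W - 1) + ln(1/L - 1)))^2, where W and L are the win and loss probabilities, the score could be computed as follows. The function name, the per-mille input format (the way LC0 reports WDL) and the clamping constant are illustrative choices, not something taken from LC0 itself.

```python
import math


def sharpness(win: int, draw: int, loss: int) -> float:
    """Sharpness from a WDL triple, e.g. per-mille values like 250, 500, 250.

    Assumed formula: (2 / (ln(1/W - 1) + ln(1/L - 1))) ** 2
    """
    total = win + draw + loss
    if total <= 0:
        raise ValueError("WDL values must sum to a positive number")

    # Clamp the probabilities so that log(0) cannot occur when W or L is 0 or 1.
    eps = 1e-4
    w = min(max(win / total, eps), 1 - eps)
    l = min(max(loss / total, eps), 1 - eps)

    denom = math.log(1 / w - 1) + math.log(1 / l - 1)
    # The denominator reaches 0 when the draw probability is 0 (W + L = 1);
    # treat that case as unbounded sharpness instead of dividing by zero.
    if denom <= 1e-12:
        return math.inf
    return (2 / denom) ** 2


if __name__ == "__main__":
    print(sharpness(250, 500, 250))  # both sides have real winning chances
    print(sharpness(50, 900, 50))    # mostly drawn, so a much lower score
```

Under this assumed form, the score stays close to 0 when a draw is nearly certain and grows as the win and loss probabilities rise together, which matches the behaviour described above.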
Using the Score
Openings
The first thing I thought of doing with the sharpness score was comparing different openings. Having a single number that reflects the sharpness of a position can help when choosing a specific opening for a game.
Here are the sharpness scores for some 1.e4 openings:
This chart agrees quite well with my understanding of the sharpness of the various openings. The Morphy Defence is slightly sharper than the Berlin and the Marshall Attack is sharper than both of them. The Open Sicilian is sharper than all these Ruy Lopez lines but the sharpest option I looked at was the King’s Gambit.
I also looked at some 1.d4 openings:
The first thing to note is that 1.d4 is considered less sharp than 1.e4 which isn’t surprising. What surprised me a bit was that the Queen’s Gambit Declined is considered sharper than 1.d4 and that the Grünfeld Defence isn’t considered to be very sharp. It’s less surprising that the sharpest options are the Leningrad Dutch, the King’s Indian Defence and the Budapest Gambit.
Piece Exchanges
Another interesting thing to look into is how piece exchanges affect the sharpness of a position. Our intuition says that fewer pieces on the board mean a less sharp position. I decided to test this with a position from a recent game of mine:
This position is certainly equal but it's not a dead draw. The sharpness score is 0.31 which means that not much is happening but the score is also not 0, meaning that the position isn’t a clear draw.
Removing the rooks and queens makes the position much duller which is reflected in a lower sharpness score of 0.16. So when the position gets simplified, the sharpness score is reduced.
It’s good to see that the sharpness score agrees with my human understanding of sharp positions.
Sharpness and Evaluation
I think that sharpness can be very interesting if it's combined with the evaluation of a position. We all know that there are many different types of positions that are +1 according to Stockfish. So my hope is that the sharpness score provides some way to differentiate between them.
Take a look at the following position from the game Caruana-Vachier-Lagrave, 2021:
White is a piece and three pawns down but Black has massive problems with their development. Analysing this position with Stockfish gives an evaluation of about -0.5.
Now take a look at the following position:
Stockfish also evaluates this position as around -0.5 but the two positions are very different in nature.
The sharpness score gives a good indication of the differences between the positions. The Caruana-MVL position has a sharpness of 1.8, while the second position has a sharpness of only 0.14.
Conclusion
I find it fascinating to use engines in novel ways and I think that the sharpness scores can give additional insight into a position.
If you haven’t seen it already, check out LC0’s WDL contempt feature which uses something similar to simulate Leela playing against lower rated opponents.
If you have any questions, or ideas about how to use the sharpness score, let me know.
I was also surprised that the model does not regard the Grunfeld as a sharp opening. The model may have been thrown off by the fact that, while the Grunfeld has a very high potential sharpness indeed, not every line is sharp. In many lines best play involves a very early queen trade leading directly to an endgame. In that respect, it resembles the Berlin! We can effectively exclude such lines by asking: how sharp is the Grunfeld middlegame? Very sharp, of course!
This is a really interesting topic. I've always been interested in how to calculate scores that reflect human assessment. Now, you've done this with an engine-driven approach. The next step is to compare it with real human games. With this data-driven approach, you could get something like a blunder probability score. If the engine and data scores correlate well, then this could be really useful for things like building an opening repertoire.
Another comment: we can certainly assume that a position's sharpness is highly influenced by ratings. You could adjust the number of nodes according to the intended rating range (a simple linear function would work to begin with, I guess). That should also reflect how some openings are dangerous at one level but drawish at another.