Calculating the Sharpness of Different Players
How well does the sharpness score agree with human intuition about sharpness?
I recently wrote about a way to use Leela Chess Zero to compute a sharpness score for a chess position and used it to compare the sharpness of different openings. The results mostly agreed with my understanding of the sharpness of those openings.
Now I want to use this score to see how sharply a player plays, in a single game or on average.
How to calculate the sharpness of a player?
The most obvious idea is to average the sharpness scores of the positions occurring during the game. The problem with that approach is that the sharpness of a position depends on both players.
To illustrate this, imagine you want to evaluate the sharpness of a player with the black pieces. If their opponent decides to play the King's Gambit, there will almost certainly be a very sharp position on the board, regardless of the moves Black plays.
To avoid that problem, I decided to look at the change in sharpness after each move.
This is simply the sharpness after a move has been played minus the sharpness before it. For example, the starting position has a sharpness of 0.468. After 1.e4 the sharpness is 0.471, so White gets a sharpness change of 0.003 for 1.e4. The sharpness after 1.d4 is 0.450, so White would get a sharpness change of -0.018.
A player's sharpness change score is then the average of these changes over all the moves they have played.
Looking at that change in sharpness as opposed to the sharpness of the position itself should make it possible to isolate the play of an individual player.
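To make this concrete, here is a minimal Python sketch of the calculation. The function names are mine for illustration; this is not the exact analysis code:

```python
def sharpness_changes(sharpness_by_position):
    """Per-move sharpness change: score after the move minus score before.

    sharpness_by_position[0] is the starting position; index i is the
    position after the i-th half-move.
    """
    return [after - before
            for before, after in zip(sharpness_by_position,
                                     sharpness_by_position[1:])]


def average_sharpness_change(sharpness_by_position):
    """Average sharpness change for White and Black separately."""
    changes = sharpness_changes(sharpness_by_position)
    white = changes[0::2]  # changes caused by White's moves
    black = changes[1::2]  # changes caused by Black's moves
    return (sum(white) / len(white) if white else 0.0,
            sum(black) / len(black) if black else 0.0)


# The numbers from the example above: the starting position scores 0.468
# and the position after 1.e4 scores 0.471, a change of about +0.003.
print(sharpness_changes([0.468, 0.471]))
```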
Sharpness for many players
The first thing I decided to test was the average sharpness change per move across many players.
I did this by analysing the games of all Candidates tournaments since 2013 (not including the 2024 Candidates). This led to the following sharpness change scores for the players:
Note that the players have played different numbers of games and that all games are from the Candidates only.
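For the aggregation over many games, the pipeline looks roughly like the sketch below. It assumes a sharpness(board) callable that returns the Lc0-based score from my earlier post; the Lc0 query itself is omitted, and the names are illustrative:

```python
import chess
import chess.pgn
from collections import defaultdict


def player_sharpness_changes(pgn_path, sharpness):
    """Average sharpness change per player over all games in a PGN file.

    `sharpness` is a callable board -> float, e.g. the Lc0-based score.
    """
    changes = defaultdict(list)
    with open(pgn_path) as pgn:
        while (game := chess.pgn.read_game(pgn)) is not None:
            players = {chess.WHITE: game.headers.get("White", "?"),
                       chess.BLACK: game.headers.get("Black", "?")}
            board = game.board()
            before = sharpness(board)
            for move in game.mainline_moves():
                mover = players[board.turn]  # who plays this move
                board.push(move)
                after = sharpness(board)
                changes[mover].append(after - before)
                before = after
    return {name: sum(vals) / len(vals) for name, vals in changes.items()}
```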
Some sharpness values seem a bit surprising, like Carlsen having the second-highest sharpness or Firouzja having the lowest.
There are many possible reasons for this; the main one is probably the small sample size. Looking at every game of a player would give a much better picture of their sharpness, but it would also take far too long to analyse all the positions. Note that long draws also reduce the sharpness quite a bit, especially in a smaller sample.
I also only looked at Candidates tournaments, so some players might have played more conservatively there than in their usual events. Similarly, the tournament situation can play a role when looking at games from only a few tournaments.
Overall, I would need to analyse many more games per player to make the sharpness change scores more comparable. I'll certainly look into this in the future, but as I said, it takes a lot of computing time.
Sharpness in a match
In order to remove some of the problems mentioned above and make the sharpness change of different players more comparable, I decided to look at a match between players.
Looking at a match makes the sharpness of different players easier to compare, since they are playing the same games. When one game is a long draw with few changes in sharpness, it lowers the average sharpness change for both players equally, so the scores remain comparable.
There is still the problem of different match situations, but I'm unsure how that can easily be solved.
The first match that came to mind was Tal-Botvinnik 1960, since the two players had such different styles and I hoped this would be reflected in the sharpness scores. And indeed, it was:
Tal's sharpness change is significantly higher than Botvinnik's, which is a good illustration of the players' different styles.
2014 Magnus vs 2019 Magnus
The final thing I wanted to test for now was how the sharpness change of one player might change over time.
The first example that came to mind was Magnus Carlsen. He had an amazing year in 2019, and the most striking thing was that he changed his style quite a bit and played more for the initiative. I decided to compare his games from 2019 with those from 2014, since he also played very well that year, so the quality of his games should be similar.
I expected Carlsen's play to be much sharper in 2019 than in 2014, and this is also what the sharpness change score says:
Overall I’m very happy that the average sharpness change score agrees quite well with my intuitive feeling about the sharpness of the players.
Final Remarks
In my previous post, I looked at the distribution of the centipawn loss per move as opposed to the more common metric of average centipawn loss since I felt like some insight gets lost when only computing the average.
Why then am I only looking at the average of the sharpness change score?
When testing something new like this, I want to keep the first tests as simple as possible. I could have looked at the distributions (and I certainly will in the future) and might have gotten a more nuanced picture of the players' sharpness. However, this would have made comparisons between players much more difficult. My foremost goal was to check that the sharpness change score agrees with my human understanding of sharpness, and using a single number makes such a comparison much easier.
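For reference, plotting the distribution would only take a few lines on top of the per-move changes; here is a sketch with matplotlib (not the code behind the plots in this post):

```python
import matplotlib.pyplot as plt


def plot_change_distribution(changes, player):
    """Histogram of one player's per-move sharpness changes."""
    plt.hist(changes, bins=40)
    plt.xlabel("Sharpness change per move")
    plt.ylabel("Number of moves")
    plt.title(f"Sharpness change distribution: {player}")
    plt.show()
```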
Another important point to consider when looking at the sharpness is its relation to the quality of play. As an example, imagine a completely drawn position in which one player makes a mistake and is much worse after that. Due to the mistake, the sharpness will be higher and therefore the player will be classified as sharper according to the measure I presented.
Now one can ask if this is a big issue with the score.
It depends on how you look at it. But I don't think it's too much of a problem, since engine metrics like accuracy or sharpness shouldn't be looked at in isolation. Together, they give a much better picture of a game.
I’m very interested to know what you think about the average sharpness change score, so let me know!
Comments
I have a basic question upstream of many of your articles; it may be very naive. I have seen many conversion curves by now, from Lichess to Lc0 to Stockfish (and Maia). I may not have dug very deep, but each time I wonder what they mean by the centipawn axis value being associated with certain odds.
I understand the odds already, perhaps as outcome statistics, but for me the natural lowest-level association is to a game. Ultimately, perhaps odds for pairs of players could be obtained.
But I suspect that in those places, which use either pawn units or some derived measure called position difficulty, the positions of a full game, with their different possible centipawn values, are being integrated over somehow, and I wonder how.
Here is an attempt: for all positions in a pool of games (for some set of OTB events, or some period of online play), one can collect all the games that visit each position (as a FEN to a given depth, or a depth-agnostic FEN such as an EPD); the game outcome value is then part of the estimator data being aggregated into the statistics shown on those curves. I don't know why I need to have that spelled out; maybe this is not even what it is. Can you help? Sorry to barge in here.
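In code, the estimator I am imagining would be something like this (a rough sketch; the names and bin width are just illustrative):

```python
from collections import defaultdict


def win_probability_curve(samples, bin_width=25):
    """Average game outcome per centipawn bin.

    `samples` is an iterable of (eval_cp, outcome) pairs, one per
    position in the pool, where outcome is 1.0 / 0.5 / 0.0 from the
    point of view of the side to move.
    """
    bins = defaultdict(list)
    for eval_cp, outcome in samples:
        bins[round(eval_cp / bin_width) * bin_width].append(outcome)
    # The curve: expected score as a function of the (binned) evaluation.
    return {cp: sum(o) / len(o) for cp, o in sorted(bins.items())}
```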
This ought really to be a comment on one of the previous posts, but I am not sure draw percentage is an indication of sharpness.
A heavily imbalanced position, like Q vs 3 pieces or Q vs 2 rooks, may well imply a decisive result. But that doesn't mean the positions in the game are sharp.
For me, sharpness means something like walking through a minefield: unless you find the only one or two moves, your position becomes lost or your winning advantage disappears.
By this definition, a position could be really sharp in the sense that unless White finds an insane computer move, Black draws comfortably; and despite the sharpness, the most likely human result is a draw.