I was also surprised that the model does not regard the Grünfeld as a sharp opening. The model may have been thrown off by the fact that, while the Grünfeld has very high potential sharpness, not every line is sharp. In many lines best play involves a very early queen trade leading directly to an endgame. In that respect, it resembles the Berlin! We can effectively exclude such lines by asking: how sharp is the Grünfeld middlegame? Very sharp, of course!
Also, there are many forced draws in the Grünfeld. And a certain result counts as low sharpness, no matter whether it's king vs. king or a forced draw on a board full of pieces.
Yes, it's probably the case that the engine would go into a less sharp line of the Grünfeld and therefore give it a low sharpness score.
Is this not about how one defines the sharpness of an opening?
What is the opening, exactly? A position? A prefix sequence leading to some tail position?
My understanding is that sharpness using the LC0 measures is about the continuation profile from a given position. I might not have read carefully enough to see how, from the sharpness of a position, one can assign a definition of sharpness to "an opening." So when chess players with actual experience with the "opening" talk about it, they might not be talking about the same thing. They might be talking about the way it is played, likely about deeper positions where the sharpness has been committed to (or the non-sharpness has been lost). What am I missing? A lot, surely.
This is a really interesting topic. I've always been interested in how to calculate scores that match human assessments, and you've done this with an engine-driven approach. The next step is to compare it with real human games. With that data-driven approach you could get something like a blunder-probability score. If the engine and data scores correlate well, this could be really useful for things like building an opening repertoire, for example.
Another comment: we can certainly assume that a position's sharpness is highly influenced by rating. You could adjust the number of nodes according to the intended rating range (a simple linear function would probably work as a start). That should also reflect how some openings are dangerous at one level but drawish at another.
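A minimal sketch of what such a linear node budget could look like; the rating endpoints and node counts below are made-up illustrative values, not anything measured:
```
// Hypothetical linear mapping from a target rating to an LC0 node budget.
// The anchor points (1000 Elo -> 100 nodes, 2800 Elo -> 100_000 nodes) are
// illustrative assumptions only.
fn nodes_for_rating(rating: f64) -> u64 {
    let (lo_rating, hi_rating) = (1000.0, 2800.0);
    let (lo_nodes, hi_nodes) = (100.0, 100_000.0);
    // Clamp the rating to the supported range, then interpolate linearly.
    let t = ((rating - lo_rating) / (hi_rating - lo_rating)).clamp(0.0, 1.0);
    (lo_nodes + t * (hi_nodes - lo_nodes)).round() as u64
}
```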
I'm looking into the correlation with human games. One thing I'm currently trying to figure out is how to best deal with theory. For example, the Najdorf Poisoned Pawn Variation leads to very sharp positions, but GMs will hardly ever make mistakes there, since they know the moves. So I would like to first be able to filter out theory moves. I'm working on that.
Adjusting for ratings is a bit difficult, I think, since lowering the number of nodes could also lead LC0 to believe that a position is rather drawish because it's missing some resources for one side. But rating could certainly be a factor in the blunder-probability score you suggested. The team behind Maia Chess also tried to predict blunders, so you can look into that if you are interested.
Very instructive, thank you.
Evaluating position sharpness is something I had been looking for for a very long time, so thanks for writing about it :).
I really like the general idea of using the LC0 WDL to assess sharpness. But I have some doubts about the formulas used (both the one you proposed first and the LC0 contempt one). If you compare the WDLs of three positions and apply the LC0 function, you get (assuming I typed it correctly into my spreadsheet):
| W | D | L | Contempt |
|-----|-----|-----|----------|
| 333 | 227 | 440 | 24.22 |
| 50 | 50 | 900 | 37.98 |
| 10 | 10 | 980 | 42.88 |
My intuitive feeling is that the first position is the sharpest, because any result is almost equally likely, and that the last position is the least sharp, because it is almost surely a win for Black. But the numbers suggest the opposite ordering.
I was thinking about how to address this and came up with the following idea: W, D, and L are the probabilities of a random variable taking the value 1, 0.5, or 0, respectively. If we know the probabilities of these values, we can calculate the standard deviation, and that standard deviation would be the sharpness.
If we apply this idea to the example above, we get:
| W | D | L | Std dev |
|-----|-----|-----|---------|
| 333 | 227 | 440 | 0.87 |
| 50 | 50 | 900 | 0.48 |
| 10 | 10 | 980 | 0.22 |
and this seems to work better.
What do you think?
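For concreteness, here is a minimal sketch of this calculation (the values in the table match the standard deviation scaled by 2, i.e. divided by its maximum possible value of 0.5, so the range is 0 to 1):
```
// Sharpness as the standard deviation of the game result (1 / 0.5 / 0),
// normalized by the maximum possible standard deviation (0.5) so that a
// 50/0/50 position scores 1.0. WDL values are per-mille, as LC0 reports them.
fn sharpness_stddev(win: u32, draw: u32, loss: u32) -> f64 {
    let (p_win, p_draw, p_loss) = (win as f64 / 1000.0, draw as f64 / 1000.0, loss as f64 / 1000.0);
    let mean = 1.0 * p_win + 0.5 * p_draw; // a loss contributes 0 to the mean
    let variance =
        p_win * (1.0 - mean).powi(2) + p_draw * (0.5 - mean).powi(2) + p_loss * mean.powi(2);
    variance.sqrt() / 0.5
}

fn main() {
    for (w, d, l) in [(333, 227, 440), (50, 50, 900), (10, 10, 980)] {
        println!("{w} {d} {l} -> {:.2}", sharpness_stddev(w, d, l)); // prints 0.87, 0.48, 0.22
    }
}
```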
Trying to evaluate the sharpness for one-sided positions is certainly a bit of a problem.
I have looked quite a bit at such edge cases in the past and haven't found a good solution yet. My original formula rescaled the sharpness by the minimum of the win and loss rates in order to reduce the sharpness of your second and third examples. This worked, but it felt a bit artificial.
Another possible way would be to rescale the sharpness by the evaluation of the position. A potential problem with this solution is that boring drawn positions might get a rescaled sharpness that is higher than that of a sharper but more one-sided position. I haven't looked into it too much, but it shouldn't be a real problem if you only compare positions with similar evaluations.
Your approach is interesting, but I would say that a sharp position means that the draw rate is low and the win and loss rates are high and roughly equal. So I would calculate sharpness in a way that differentiates between draws and decisive results. But that's just my opinion on sharpness.
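Purely as an illustration, the two rescalings mentioned above could take roughly this shape; the exact formulas from the post are not reproduced here, and the names and the evaluation damping term are placeholders:
```
// Illustrative sketch only; the exact formulas from the post are not reproduced here.
// `base` is some underlying sharpness score, `p_win`/`p_loss` are LC0 probabilities,
// and `eval` is the engine evaluation in pawns.

// Rescale by the smaller of the win and loss probabilities, so positions that are
// nearly sure wins or losses for one side get damped.
fn rescale_by_min_rate(base: f64, p_win: f64, p_loss: f64) -> f64 {
    base * p_win.min(p_loss)
}

// Rescale by the evaluation: the further from equality, the stronger the damping
// (the 1 / (1 + |eval|) form is just one possible choice).
fn rescale_by_eval(base: f64, eval: f64) -> f64 {
    base / (1.0 + eval.abs())
}
```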
I've taken a crack at this myself a bit, for my site chessbook. I used a similar std dev approach, but multiplied it by (1 - draw_rate), and then multiplied it by 2. That gives a nice 0->1 range, scales down for draws, and also doesn't treat lopsided positions as high in sharpness. Might check all the boxes? Could be missing something though.
```
pub fn get_sharpness(wdl: &WDL) -> f64 {
    // LC0 reports WDL in per-mille, so convert to probabilities.
    let p_win = wdl.win as f64 / 1000.;
    let p_draw = wdl.draw as f64 / 1000.;
    let p_loss = wdl.loss as f64 / 1000.;
    // Standard deviation of the game result (1 / 0.5 / 0).
    let mean = 1.0 * p_win + 0.5 * p_draw + 0.0 * p_loss;
    let variance = p_win * (1.0 - mean).powi(2)
        + p_draw * (0.5 - mean).powi(2)
        + p_loss * (0.0 - mean).powi(2);
    // Scale down drawish positions, then divide by the maximum std dev (0.5)
    // to get a 0..=1 range.
    variance.sqrt() * (1. - p_draw) / 0.5
}
```
Hi Marcus,
Multiplying it by a constant definitely cannot do any harm. Multiplying by (1 - p_draw) would reduce the sharpness for drawish positions even more than my formula does, which I think should be fine. Maybe to convince yourself that it is good, you could try sampling about 10 positions whose scores differ a lot between my formula and yours and check which score you tend to agree with more.
PS: congrats on including the "this move aligns with your repertoire" feature in chessbook! :-). (Sorry for going off-topic, I don't want to hijack the thread.)