This is really cool to see; it's probably the first time I've seen Magnus' endgame superiority in numbers. Have you published this dataset anywhere? I have a few questions: 1) How many games are there in this dataset? 2) How many moves did the games last after the equal position?
The second question is because I'm curious whether Magnus wins his endgames by grinding or through opponent mistakes.
Great post again, btw!
I used all classical games by Carlsen and the other 2800 players and kept only the games that reached an equal endgame.
There are around 470 Carlsen games and 1000 games by other 2800 players.
I haven't looked into the number of moves after the start of the endgame, but it's an interesting question.
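For anyone who wants to try the move-count question, here is a minimal Python sketch. It assumes each game has already been reduced to a hypothetical record holding the total number of plies, the ply at which the position was first judged an equal endgame, and the result; none of these fields come from the post's dataset. Comparing decisive and drawn games this way might hint at grinding (long decisive games) versus quick mistakes (short ones), though it can't prove either.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Game:
    # Hypothetical per-game record; field names are illustrative only.
    total_plies: int        # half-moves played in the whole game
    equal_endgame_ply: int  # ply where the equal endgame was first detected
    result: str             # "1-0", "0-1" or "1/2-1/2"


def moves_after_equal_endgame(game: Game) -> int:
    """Full moves played from the equal-endgame position to the end of the game."""
    plies = game.total_plies - game.equal_endgame_ply
    # Two plies make one move; round up so a trailing half-move still counts.
    return (plies + 1) // 2


def median_length_by_outcome(games):
    """Median number of moves after the equal endgame, split into decisive and drawn games."""
    groups = {"decisive": [], "drawn": []}
    for game in games:
        key = "drawn" if game.result == "1/2-1/2" else "decisive"
        groups[key].append(moves_after_equal_endgame(game))
    # Skip empty groups, since the median of an empty list is undefined.
    return {key: lengths and median(lengths) for key, lengths in groups.items() if lengths}


games = [Game(100, 60, "1-0"), Game(120, 60, "0-1"), Game(81, 60, "1/2-1/2")]
print(median_length_by_outcome(games))  # {'decisive': 25.0, 'drawn': 11}
```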
Just to clarify: what is the intersection between the 470 games and the 1000 games?
I guess I have no clue about Carlsen's rating there, or about the nature of the Carlsen games.
It might be the linguistic "and" in "all classical games by C. and the other 2800 players".
I might just not know off the bat what that would mean. Not blocking.
I chose the intersection to be empty, so to be precise: one dataset consists of all of Carlsen's games where he reached an equal ending. The other consists of all games with equal endings where one player is rated above 2800, their opponent is rated below 2800, and Carlsen is not playing.
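The split described above can be sketched in Python. This is a minimal sketch, not the author's actual pipeline: the Game record, the equal_endgame flag (standing in for whatever engine evaluation decided an ending was equal) and the assumed PGN spelling of Carlsen's name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    # Hypothetical game record; "equal_endgame" stands in for whatever
    # engine-based check decided that the game reached an equal ending.
    white: str
    black: str
    white_elo: int
    black_elo: int
    equal_endgame: bool


TARGET = "Carlsen, Magnus"  # assumed PGN spelling of the name


def split_datasets(games, threshold=2800):
    """Split equal-endgame games into a Carlsen set and a disjoint 2800-vs-sub-2800 set."""
    carlsen_set, other_set = [], []
    for game in games:
        if not game.equal_endgame:
            continue
        if TARGET in (game.white, game.black):
            carlsen_set.append(game)
            continue  # a Carlsen game never reaches the second set
        high = max(game.white_elo, game.black_elo)
        low = min(game.white_elo, game.black_elo)
        # Strict inequalities: a player rated exactly 2800 excludes the game.
        if high > threshold and low < threshold:
            other_set.append(game)
    return carlsen_set, other_set
```

Because every Carlsen game hits the `continue` before the rating check, the two lists can never share a game, which is what makes the intersection empty. Players, however, can still appear in both sets.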
I read your answers backward for some reason, my bad. Thanks for all your replies. I should clean up my other replies, but instead I'll apologize here and warn you.
There I was considering the effect of overlap: whether it was meaningful to have overlap or not, but then there are ratio statistics and all. Now I will have to think about a completely empty intersection. I guess these are not rating-pool game statistics. One could use one set as a generalizable reference to look at the other set, assuming the base set is generally representative. I get lost in my own ruminations sometimes.
A move-by-move look at the EGTB (endgame tablebase) challenge would interest me. I initially thought you might have been trying to rate the last positions at a draw or resignation, since I still had echoes of top players drawing in such endgames, hence the 2800 target-rating "point of view" method.
Also, reiterating my curiosity about more visualization of the dataset construction, if that is something others might enjoy as well. I am always learning.
Looking at how well top players play tablebase endings is something I'd like to do in the future.
I am not sure how to connect the relative wins and losses in the graphs, on one hand, with the sentence that precedes the method choice (which I also don't fully understand yet), where it is stated as the antecedent (arguing for the method as the consequent) that top-level players would draw equal endgames. What am I missing? (Really.)
I think that I missed a word in that sentence. I just meant that equal endings will end as draws most of the time (as they are by definition equal positions) and so it's difficult to say beforehand which score is "good" for a top player.
So you are arguing that you can only use hard non-draws as a measure.
Now I also get "these endings" in relation to EGTB. That would mean enlarging the mini-game's initial conditions beyond just equal (as you have defined them; I just forgot how, no worries, it will come back) to the expected perfect play given by the EGTB's picture of all legal candidate moves from the start of the mini-game, and then using that as a basis to measure human players' outcomes. Something like that.
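To make the "hard non-draws" point concrete, here is a small Python sketch with made-up counts (not the post's data). The overall score stays close to 50% when most equal endings are drawn, while the share of wins among decisive games isolates who converts and who blunders. The function name and the choice of these two measures are illustrative assumptions, not necessarily what the post uses.

```python
def score_summary(wins, draws, losses):
    """Return (overall score, share of wins among decisive games)."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games")
    # Standard chess scoring: 1 for a win, 0.5 for a draw, 0 for a loss.
    score = (wins + 0.5 * draws) / games
    decisive = wins + losses
    # With no decisive games the win share is undefined, so return None.
    win_share = wins / decisive if decisive else None
    return score, win_share


# Made-up example: 95 equal endings, 80 of them drawn.
score, win_share = score_summary(wins=10, draws=80, losses=5)
print(f"score {score:.1%}, win share of decisive games {win_share:.1%}")
# score 52.6%, win share of decisive games 66.7%
```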
Also welcome to the missing words (or phrases for me) club.
Is this a possible question? Could we have a look at the rating distribution of the games on both sides (pre-game ratings; I guess that is obvious, but given how thick I can get, it doesn't hurt to say)?
This might be related to my question about the set relation between the Carlsen game set and the game set with a 2800 player on one side of each pairing. I am trying to get my head around the dataset, and I have no experience with such two-player datasets (to be honest, that might be exactly my wobble).
Edit: I just read the paragraphs that clue me in more on that question, like C.'s rating and then a statement about the others' performance rating, relative to the dataset(s) (union?) of 470 "and" 1000 games. I still think that visualizing the pairing distributions would help (is that even a thing, or possible? I'm imagining perhaps a 2D domain with a histogram over it).
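A 2D histogram over pairings is indeed a thing. As a minimal sketch, assuming each game is reduced to a pair of pre-game ratings (the pairs below are made up), one can bin the higher and lower rating separately and count games per cell. This stdlib version prints a text grid; for an actual picture, the same pairs could be fed to numpy.histogram2d or matplotlib's hist2d.

```python
from collections import Counter


def pairing_histogram(pairs, bin_width=50):
    """Count games per (higher-rated bin, lower-rated bin) cell of a 2D grid."""
    counts = Counter()
    for rating_a, rating_b in pairs:
        high, low = max(rating_a, rating_b), min(rating_a, rating_b)
        # Floor each rating to the start of its bin, e.g. 2837 -> 2800 for width 50.
        cell = (high // bin_width * bin_width, low // bin_width * bin_width)
        counts[cell] += 1
    return counts


def print_grid(counts):
    """Print the histogram as a text table: rows = higher-rated bin, columns = lower-rated bin."""
    rows = sorted({high for high, _ in counts}, reverse=True)
    cols = sorted({low for _, low in counts})
    print("high\\low " + " ".join(f"{c:>5}" for c in cols))
    for r in rows:
        print(f"{r:>8} " + " ".join(f"{counts.get((r, c), 0):>5}" for c in cols))


# Made-up pre-game rating pairs, one per game.
print_grid(pairing_histogram([(2850, 2760), (2790, 2830), (2812, 2701)]))
```

Ordering each pair as (higher, lower) folds both colors into one triangle of the grid, which suits a dataset defined by "one player above, one below" a threshold.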
As Carlsen was always the number 1 player during the period I looked at, for the 2800 dataset I decided to use only games where one player was above 2800 (and not Carlsen) and the other was below 2800, to simulate the rating difference.
I didn't worry too much about it afterwards and just calculated the performance rating to make sure that there weren't any anomalies.
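As a sketch of that kind of sanity check, here is one common way to compute a performance rating: invert the Elo expected-score formula, so the performance is the average opponent rating plus 400 * log10(p / (1 - p)), where p is the fraction of points scored. This is an assumption; FIDE's official performance uses a lookup table of rating differences instead, and the post doesn't say which variant was used.

```python
import math


def performance_rating(opponent_ratings, points):
    """Performance rating from opponent ratings and points scored (1 / 0.5 / 0 per game).

    Uses the logistic inverse of the Elo expected score, one common
    definition; FIDE's official method uses a lookup table instead.
    """
    n = len(opponent_ratings)
    if n == 0:
        raise ValueError("need at least one game")
    p = points / n
    # At 0% or 100% the logistic inverse diverges, so refuse those cases.
    if not 0 < p < 1:
        raise ValueError("performance is unbounded for a 0% or 100% score")
    average_opponent = sum(opponent_ratings) / n
    return average_opponent + 400 * math.log10(p / (1 - p))


# Scoring 3/4 against four 2700 opponents:
print(round(performance_rating([2700] * 4, 3.0)))  # 2891
```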
So the 1000 games do not include C. as either player.
But the C. games could include some of the players from the 1000 games as Carlsen's opponents.
Actually the more the better perhaps.
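To check how many players the two sets share (as opposed to games), a minimal Python sketch could look like this. It assumes games are reduced to hypothetical (white, black) name pairs with consistent spelling; the names below are placeholders, not data from the post.

```python
def players_in(games):
    """All player names appearing in a list of (white, black) pairs."""
    return {name for pair in games for name in pair}


def carlsen_opponents_in_other_set(carlsen_games, other_games, target="Carlsen, Magnus"):
    """Opponents of the target player who also appear somewhere in the other dataset."""
    opponents = players_in(carlsen_games) - {target}
    return opponents & players_in(other_games)


carlsen_games = [("Carlsen, Magnus", "Player A"), ("Player B", "Carlsen, Magnus")]
other_games = [("Player A", "Player C"), ("Player D", "Player E")]
print(carlsen_opponents_in_other_set(carlsen_games, other_games))  # {'Player A'}
```

The game sets stay disjoint by construction, so any overlap between the two datasets shows up only at this player level.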
My ruminations:
This is not about predicting anything, just characterizing the datasets.
I worry irrationally about mixing "training" and "testing" influences. I guess that is also a worry with ratio statistics, but I need to rein that in. Words tend to make me lose my visual understanding after a while: tripping over words, or words not conveying enough of logic that is not only sequential. Here the logic is "graphical": crossings in some tabular set relations. I guess math language over internet communication is a chore, enough to block fluent communication. I am fine though. Never mind.