Chess Engine Lab

May 20

I used all classical games by Carlsen and the other 2800 players and kept only the games that reached equal endgames.

There are around 470 Carlsen games and 1000 games by other 2800 players.

I haven't looked into the number of moves after the start of the endgame but it's an interesting question.

Just to clarify: what is the intersection between the 470 games and the 1000 games?

I guess I have no clue about Carlsen's rating there, or about the nature of the Carlsen games.

It might be the linguistic "and" in "all classical games by C. 'and' the other 2800 players".

I might not be used to what that would mean off the bat. Not blocking.

I chose the intersection to be empty. To be precise: one dataset is all of Carlsen's games in which he reached an equal ending. The other dataset is all games with equal endings in which one player is rated above 2800, their opponent is rated below 2800, and Carlsen is not playing.
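
The split described above can be sketched in a few lines. This is only an illustration, not the code used for the post: the `Game` record, the `split_datasets` helper and the name string `"Carlsen, Magnus"` are assumptions, and the input is assumed to already contain only games that reached an equal ending.

```python
from dataclasses import dataclass

CARLSEN = "Carlsen, Magnus"  # assumed spelling of the name in the game headers
THRESHOLD = 2800


@dataclass(frozen=True)
class Game:
    """Hypothetical minimal game record (pre-game ratings)."""
    white: str
    black: str
    white_elo: int
    black_elo: int


def split_datasets(games):
    """Split equal-ending games into two disjoint sets.

    Returns (carlsen_games, other_games), where other_games have exactly
    one player above 2800, the other below 2800, and no Carlsen.
    """
    carlsen_games, other_games = [], []
    for game in games:
        if CARLSEN in (game.white, game.black):
            carlsen_games.append(game)
            continue
        high = max(game.white_elo, game.black_elo)
        low = min(game.white_elo, game.black_elo)
        if high > THRESHOLD and low < THRESHOLD:
            other_games.append(game)
    return carlsen_games, other_games
```

Because a Carlsen game is assigned to the first set before the rating check runs, the two sets cannot overlap.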

Jun 22 (edited)

I read your answers backward for some reason, my bad. Thanks for all your replies. I should clean up my other replies, but instead I apologize here and give you a warning.

There I was considering the effect of overlap, and whether it was meaningful to have overlap or not, but then there are ratio statistics and all. Now I will have to think about a completely empty intersection of players. I guess these are not rating-pool game statistics. One could use one set as a generalizable reference basis to look at another set, assuming the base set is generally representative. I get lost in my own ruminations sometimes.

A move-by-move analysis for the EGTB challenge would interest me. I initially thought that you might have been trying to rate the final positions at the draw or resignation, since I still had echoes in mind of top players drawing such endgames, hence the 2800 target rating "point of view" method.

Also, reiterating my curiosity about more visualization of the dataset construction, if that is something others might enjoy as well. I am always learning.

Looking at how well top players play tablebase endings is something I'd like to do in the future.

I am not sure how to connect the relative wins and losses in the graphs, on the one hand, with the sentence that precedes a method choice (which I also do not fully understand from the wording yet), where it is stated as an antecedent (arguing for the method as consequent) that top-level players would draw equal endgames. What am I missing? (Really.)

I think that I missed a word in that sentence. I just meant that equal endings will end as draws most of the time (as they are by definition equal positions) and so it's difficult to say beforehand which score is "good" for a top player.

So you are arguing that you can only use hard non-draws as a measure.

Now I also get the "these endings" in relation to EGTB. That is enlarging the mini-game initial conditions from just equal (as you have defined them, which I just forgot, no worries, it will come back) to expected perfect chess from the EGTB portrait of all legal candidates from the start of the mini-game, and then using that as a basis to measure human player outcomes. Something like that.

Also welcome to the missing words (or phrases for me) club.

Jun 22 (edited)

Is this a possible question? Could we have a look at the game rating distribution on both sides? (Pre-game, I guess that is obvious, but given how thick I might get, it does not hurt to say.)

This might be related to my question about the set relation between the Carlsen game set and the game set with a 2800 player on one side of each pairing. I am trying to get my head around the dataset, and I have no experience with such two-player datasets (to be honest, that might be exactly my wobble).

Edit: I just read paragraphs cluing me in more on that question, like Carlsen's rating, and then some statement about the others' performance rating, being relative to the dataset(s) (union?) of 470 "and" 1000 games. I am still thinking about visualizing pairing distributions (is that even a thing, or possible? I imagine perhaps a 2D domain and a histogram over that).
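
For what it is worth, a 2D histogram over rating pairings is easy to build. The sketch below is hypothetical (the helper name `rating_pair_histogram` and the 50-point bin width are assumptions, and no data from the post is used): it counts games per (higher-rated, lower-rated) rating bin, and the resulting counts could be fed to any heatmap plot.

```python
from collections import Counter


def rating_pair_histogram(pairs, bin_width=50):
    """Count games per (higher rating bin, lower rating bin).

    `pairs` is an iterable of (higher_rating, lower_rating) pre-game
    ratings; each bin is labelled by its lower edge.
    """
    counts = Counter()
    for higher, lower in pairs:
        key = (higher // bin_width * bin_width, lower // bin_width * bin_width)
        counts[key] += 1
    return counts
```

Each key is a cell of the 2D domain, so comparing the two datasets would amount to comparing two such tables.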

As Carlsen was always the number 1 player during the period I looked at, I decided to use for the 2800 dataset only games where one player was above 2800 (and not Carlsen) and the other one was below 2800, to simulate the rating difference.

I didn't worry too much about it afterwards and just calculated the performance rating to make sure that there weren't any anomalies.
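
As an illustration of that sanity check, here is one common way to compute a performance rating. The post does not say which formula was used, so this is an assumption: it inverts the logistic Elo expectation around the average opponent rating, and the function name `performance_rating` is hypothetical.

```python
import math


def performance_rating(opponent_ratings, results):
    """Estimate a performance rating from opponents' ratings and results.

    `results` holds 1 for a win, 0.5 for a draw and 0 for a loss.
    Uses avg_opponent + 400 * log10(p / (1 - p)), where p is the score
    fraction; this is undefined for a 0% or 100% score.
    """
    if not results or len(opponent_ratings) != len(results):
        raise ValueError("need one result per opponent rating")
    average = sum(opponent_ratings) / len(opponent_ratings)
    score = sum(results) / len(results)
    if score <= 0.0 or score >= 1.0:
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    return average + 400 * math.log10(score / (1 - score))
```

With a 50% score the estimate equals the average opponent rating, which makes it easy to eyeball whether a dataset's numbers look plausible.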
