Thursday 28 March 2024

Playing chess with a Generative AI Large Language Model (or at least trying)

Generative AI took our lives by storm, and it is indeed a revolutionary technology. Moreover, it gets better and faster with each model that is released. Today we will examine Claude Sonnet on Amazon Bedrock. It was released only a few weeks ago and already performs remarkably well.

I decided to put the LLM to the test by asking it to do something it was never designed to do: play chess.

For people who know a little about how LLMs are actually trained, it is clear that at some point the training data also included material related to chess. After all, the providers claim that the models are trained on the most common resources, like Wikipedia, Reddit, StackOverflow, and others, and there are indeed articles about chess on Wikipedia.

I started by thinking about how to present the board. We need to visualize the board, but an LLM operates (mostly) with text.

So I came up with the following representation:


       A.    B.    C.    D.    E.    F.    G.    H.
1. ["R", "N", "B", "Q", "K", "B", "N", "R"],
2. ["P", "P", "P", "P", "P", "P", "P", "P"],
3. ["_", "_", "_", "_", "_", "_", "_", "_"],
4. ["_", "_", "_", "_", "_", "_", "_", "_"],
5. ["_", "_", "_", "_", "_", "_", "_", "_"],
6. ["_", "_", "_", "_", "_", "_", "_", "_"],
7. ["p", "p", "p", "p", "p", "p", "p", "p"],
8. ["r", "n", "b", "q", "k", "b", "n", "r"]


I also explained to the LLM the meaning of each symbol on the board:

Black queen represented by "Q".
White queen represented by "q".
Black king represented by "K".
White king represented by "k".
Black pawn represented by "P".
White pawn represented by "p".
Black knight represented by "N".
White knight represented by "n".
Black bishop represented by "B".
White bishop represented by "b".
Black rook represented by "R".
White rook represented by "r".
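
In code, the legend is just a lookup table. Note that the convention is inverted relative to standard chess notation, where uppercase usually means white; here uppercase is the side the LLM plays:

PIECE_LEGEND = {
    "Q": "black queen",  "q": "white queen",
    "K": "black king",   "k": "white king",
    "P": "black pawn",   "p": "white pawn",
    "N": "black knight", "n": "white knight",
    "B": "black bishop", "b": "white bishop",
    "R": "black rook",   "r": "white rook",
    "_": "empty square",
}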


I also provided the LLM with the rules of chess and with several examples (hoping it would learn better by following them).

This is the current situation of the game, and the human should play now.
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "P", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "_", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["_", "_", "_", "_", "_", "_", "_", "_"], 
7. ["p", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 

Human prompts: pawn moves from A7 to A6. 
The board after the move looks like this.
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "P", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "_", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["p", "_", "_", "_", "_", "_", "_", "_"], 
7. ["_", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 

You respond by moving your pawn from E2 to E4.
Remember, you can only move pieces that are represented as uppercase letters. The board after this move looks like this.
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "_", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "P", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["p", "_", "_", "_", "_", "_", "_", "_"], 
7. ["_", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 


The full prompt that I used can be found here.
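
And for readers who want to reproduce the setup, here is a rough sketch of the Bedrock call (the model ID and request shape follow the standard Bedrock Messages API for Claude 3; CHESS_SYSTEM_PROMPT is a placeholder for the full prompt linked above):

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

CHESS_SYSTEM_PROMPT = "..."  # the board format, legend, rules, and examples

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "system": CHESS_SYSTEM_PROMPT,
        "messages": [
            {"role": "user", "content": "pawn moves from A7 to A6"},
        ],
    }),
)
reply = json.loads(response["body"].read())
print(reply["content"][0]["text"])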

Now let's see what I got.

We can see that my first command was processed correctly.

But when the LLM responded, it actually moved my piece instead of using its own.

I tried to correct the situation.

I tried to correct the LLM, and it actually agreed that it was not playing according to the rules, but the result was still not good.

From this point, I spent another half an hour trying to teach/educate/correct the LLM to make correct moves, but it didn't work.

I could not help myself and tested the same prompt on ChatGPT.

And here are the results.

Wow, this looks very promising. Let's continue.

A disappointment. The board didn't contain my previous move, and the AI claimed it had made a move, but nothing changed on the board.


Conclusion:

The result is no less than amazing! Yes, the AI didn't provide the correct results, but it was never designed to do this, and it was still able to produce some kind of result that makes sense.

In addition, let's not forget that I didn't use the latest and greatest models. I used GPT-3.5, not GPT-4, and Claude Sonnet, not Claude Opus. It could be that with access to those models, the results would be better.

Probably, if I had used more examples or fine-tuned the models on many example games, the results would be better. That could be a very interesting challenge that the reader is welcome to try on their own.