Thursday 25 April 2024

Call center analytics by using speech-to-text and generative AI

How many times have you called customer support? I have to say that I do it a lot. It may be about a wrong delivery from the supermarket (I think I already have their number on my speed dial) or an insurance company when I need to deal with the different policies I have (car, house).

Now this is the perspective of the customer. But let's think for a second about the perspective of the call center. They probably get hundreds or even thousands of calls every day. 

How exactly do they know if the problem was resolved? What was the sentiment of the call? If a manager wants a quick summary of a call, should they listen to the entire conversation with the agent?

This was probably the situation until very recently: managers randomly picked calls to monitor their agents.

This is no longer the situation. Thanks to machine learning, we can fully automate the process. 

We need to have several building blocks for this task. 

1. We need to convert the recording of the call to text.

2. We need to get insights from the produced text.

3. We need to store it for further analysis.

The good news is that you can simply purchase a solution that does all of this for you and consume it as a SaaS service. The service is called Amazon Connect, and it provides all of the above and much more, including the telephony infrastructure.

In addition to the full solution, Amazon provides you with the ability to use building blocks to create your own solution. This is exactly what this blog is about. We will create a simple solution to get insights from the contact center recordings. 

The solution is very simplified, and its goal is to get the reader familiar with the technology. The reader should not see it as a production-ready solution.

The solution will do the following:

1. Transcribe the recordings to text by using the Amazon Transcribe service.

2. Get insights from the text by using one of the large language models available on Amazon Bedrock.

3. Store the results as a JSON document in Amazon DynamoDB. You can use this data as a source for analytics or management-facing dashboards.

4. In addition, the JSON object will be sent by email.

Let's describe the architecture.





We start the process by uploading the audio file to an S3 bucket.

S3 allows you to define event notifications that fire when a new object is uploaded. When the object-created event happens, it invokes a Lambda that starts a transcription job on the uploaded file.
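To give a feel for it, here is a minimal sketch of what this first Lambda might look like (the output bucket environment variable is my own placeholder, not taken from the repository, and I assume the recordings are MP3 files):

import os
import uuid
import urllib.parse
import boto3

transcribe = boto3.client("transcribe")

def handler(event, context):
    # The S3 object-created event carries the bucket and key of the new recording.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Start an asynchronous transcription job on the uploaded audio file.
    transcribe.start_transcription_job(
        TranscriptionJobName=f"call-{uuid.uuid4()}",
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="mp3",                                    # assumption: recordings are MP3
        LanguageCode="en-US",
        OutputBucketName=os.environ["TRANSCRIPTS_BUCKET"],    # hypothetical env var
    )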

This job is asynchronous, so we use Amazon EventBridge to get notified when it completes. Once the job is completed, EventBridge invokes another Lambda.
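The EventBridge rule only needs to match the job-completion event that Transcribe emits. Roughly, the event pattern looks like this (shown here as a small boto3 sketch with an illustrative rule name; in my case the rule is created by the CDK project):

import json
import boto3

events = boto3.client("events")

# Match only successfully completed Amazon Transcribe jobs.
pattern = {
    "source": ["aws.transcribe"],
    "detail-type": ["Transcribe Job State Change"],
    "detail": {"TranscriptionJobStatus": ["COMPLETED"]},
}

# The second Lambda would then be attached as the rule's target with put_targets.
events.put_rule(Name="transcribe-job-completed", EventPattern=json.dumps(pattern))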

This second Lambda calls an LLM on Amazon Bedrock to summarize the call, extract its sentiment and tone, and return the result as JSON.
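A rough sketch of that call, assuming Claude 3 Sonnet and the Anthropic messages format (the exact prompt and model ID in the repository may differ):

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def analyze_transcript(transcript_text):
    # Ask the model to return a JSON document with summary, sentiment and tone.
    prompt = (
        "You are a call center analyst. Analyze the following call transcript "
        "and return a JSON object with the fields summary, sentiment and tone.\n\n"
        + transcript_text
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",   # assumption: Claude 3 Sonnet
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        }),
    )
    body = json.loads(response["body"].read())
    return json.loads(body["content"][0]["text"])            # the model's JSON answer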

Finally, the JSON is also published to Amazon SNS (a pub/sub service), which has built-in email integration. This allows a subscriber to receive the JSON by email.
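Both the DynamoDB write and the SNS publish are plain boto3 calls; something along these lines, with the table name and topic ARN as hypothetical environment variables:

import os
import json
import boto3

table = boto3.resource("dynamodb").Table(os.environ["RESULTS_TABLE"])   # hypothetical
sns = boto3.client("sns")

def store_and_notify(call_id, insights):
    # Keep the insights as an item in DynamoDB for later analytics.
    table.put_item(Item={"call_id": call_id, **insights})

    # Publish the same JSON to SNS; an email subscription delivers it to the inbox.
    sns.publish(
        TopicArn=os.environ["RESULTS_TOPIC_ARN"],                        # hypothetical
        Subject=f"Call insights for {call_id}",
        Message=json.dumps(insights, indent=2),
    )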

The whole system is completely serverless. There are no virtual machines or Kubernetes clusters that you need to run and manage.

In fact, all I had to write was the code for the two Lambdas.

You can find all the source code in this GitHub repository. I also created an AWS CDK project that can deploy all the components directly into your AWS account.

Just follow the README instructions.



Thursday 28 March 2024

Playing chess with the Generative AI Large Language Model (or at least trying)

Generative AI came into our lives like a storm, and it is indeed a revolutionary technology. Moreover, it gets better and faster with each model that is released. Today we will examine Claude Sonnet on Amazon Bedrock. It was released only a few weeks ago and already performs remarkably well.

I decided to put the LLM to the test by making it do something it was completely not designed to do: play chess.

For people who know a little bit about how LLMs are trained, it is clear that at some point the training data also included data related to chess games. After all, the providers claim the models are trained on the most common resources, like Wikipedia, Reddit, StackOverflow, and others. And indeed, there are articles about chess on Wikipedia.

I started by thinking about how to represent the board. We need to visualize the board, but an LLM operates (mostly) with text.

So I came up with the following output:


       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "P", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "_", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["_", "_", "_", "_", "_", "_", "_", "_"], 
7. ["p", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 

I also instructed the LLM on the meaning of the board:

The black queen is represented by "Q".
The white queen is represented by "q".
The black king is represented by "K".
The white king is represented by "k".
The black pawn is represented by "P".
The white pawn is represented by "p".
The black knight is represented by "N".
The white knight is represented by "n".
The black bishop is represented by "B".
The white bishop is represented by "b".
The black rook is represented by "R".
The white rook is represented by "r".


I also provided the LLM with the rules of the chess game and with several examples (hopefully it would learn better by following the examples).

This is the situation of the game, and the human should play now. 
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "P", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "_", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["_", "_", "_", "_", "_", "_", "_", "_"], 
7. ["p", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 

Human prompts: pawn moves from A7 to A6. 
The board after the move looks like this.
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "P", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "_", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["p", "_", "_", "_", "_", "_", "_", "_"], 
7. ["_", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 

You respond by moving your pawn from E2 to E4. 
Remember, you can only move pieces that are represented as uppercase letters. The board after this move looks like this. 
       A.    B.    C.    D.    E.    F.    G.    H. 
1. ["R", "N", "B", "Q", "K", "B", "N", "R"], 
2. ["P", "P", "P", "P", "_", "P", "P", "P"], 
3. ["_", "_", "_", "_", "_", "_", "_", "_"], 
4. ["_", "_", "_", "_", "P", "_", "_", "_"], 
5. ["_", "_", "_", "_", "_", "_", "_", "_"], 
6. ["p", "_", "_", "_", "_", "_", "_", "_"], 
7. ["_", "p", "p", "p", "p", "p", "p", "p"], 
8. ["r", "n", "b", "q", "k", "b", "n", "r"] 


The full prompt that I used can be found here.
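For reference, the back-and-forth itself is just a growing list of messages sent to the model. Here is a minimal sketch of the kind of loop I ran against Claude Sonnet on Bedrock (the system prompt constant below stands in for the full prompt linked above, and the model ID is an assumption):

import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"   # assumption: Claude 3 Sonnet

CHESS_PROMPT = "...the full prompt with the board, legend, rules and examples..."

messages = []

def play(human_move):
    # Append the human move and send the whole conversation so far to the model.
    messages.append({"role": "user", "content": [{"type": "text", "text": human_move}]})
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "system": CHESS_PROMPT,
            "messages": messages,
        }),
    )
    reply = json.loads(response["body"].read())["content"][0]["text"]
    messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
    return reply

print(play("pawn moves from A7 to A6"))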

Now let's see what I got.


We can see that my first command was processed well.

But when the LLM responded, it actually moved my piece instead of using its own piece.

I tried to correct the situation.




I tried to correct the LLM, and it actually agreed that it was not playing according to the rules, but the result was still not good.

From this point on, I spent another half an hour trying to teach/educate/correct the LLM to make correct moves, but it didn't work.

I could not help myself and tested the same prompt on ChatGPT.

And here are the results.



Wow, this looks very promising. Let's continue.

A disappointment. The board didn't contain my previous move, and the AI also claimed that it had made a move, but nothing changed on the board.


Conclusion:

The result is no less than amazing! Yes, the AI didn't provide correct results, but it was never designed to do this, and it was still able to produce some kind of result that makes sense.

In addition, let's not forget that I didn't use the latest and greatest models: I used GPT-3.5 and not GPT-4, and Claude Sonnet rather than Claude Opus. It could be that with access to those models, the results would be better.

Probably, if I had used more few-shot examples or fine-tuned the models, the results would be better. It could be a very interesting challenge that the reader is welcome to try on their own.