Thursday 25 April 2024

Call center analytics by using speech-to-text and generative AI

How many times have you called customer support? I have to say that I do it a lot. Sometimes it's about a wrong delivery from the supermarket (I think their number is already on my speed dial), and sometimes it's an insurance company when I need to deal with one of my policies (car, house). Really, I do it a lot.

Now this is the perspective of the customer. But let's think for a second about the perspective of the call center. They probably get hundreds or even thousands of calls every day. 

How exactly do they know whether the problem was resolved? What was the sentiment of the call? If a manager wants a quick summary of a call, do they have to listen to the whole conversation with the agent?

That was probably the situation in the recent past: managers randomly picked calls to monitor their agents.

This is no longer the situation. Thanks to machine learning, we can fully automate the process. 

We need to have several building blocks for this task. 

1. We need to convert the recording of the call to text.

2. We need to get insights from the produced text.

3. We need to store it for further analysis.

The good news is that you can simply purchase a solution that does all of this for you and consume it as SaaS. The service is Amazon Connect, and it provides all of the above and much more, including the telephony infrastructure.

In addition to the full solution, AWS also gives you the building blocks to create your own. This is exactly what this blog post is about: we will create a simple solution that extracts insights from contact center recordings.

The solution is deliberately simplified; its goal is to get you familiar with the technology, not to serve as a production-ready system.

The solution will do the following:

1. Transcribe the recordings to text by using the Amazon Transcribe service.

2. Get insights from the text by using one of the large language models available on Amazon Bedrock.

3. Store the results as a JSON document in Amazon DynamoDB. You can use this data as a source for analytics or management-facing dashboards.

4. In addition, the JSON object will be sent by email.

Let's describe the architecture.





We will start the process by uploading the audio file to an S3 bucket.

S3 lets you emit an event when a new object is uploaded. When this event fires, it invokes a Lambda function that starts a transcription job on the uploaded file.
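Here is a minimal sketch of what that first Lambda might look like, using Python and boto3. The output-bucket environment variable, the job naming scheme, the media format, and the language code are all assumptions for illustration; the actual code is in the repository.

```python
import os
import uuid

import boto3

transcribe = boto3.client("transcribe")


def handler(event, context):
    """Triggered by the S3 ObjectCreated event; starts a transcription job
    for the uploaded recording."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    transcribe.start_transcription_job(
        TranscriptionJobName=f"call-{uuid.uuid4()}",   # assumed naming scheme
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="mp3",                             # assumed recording format
        LanguageCode="en-US",                          # assumed call language
        OutputBucketName=os.environ["OUTPUT_BUCKET"],  # hypothetical env variable
    )
```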

The transcription job is asynchronous, so we use Amazon EventBridge to get notified when the job completes. Once it completes, EventBridge invokes another Lambda function.
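The EventBridge rule matches the Transcribe "job state change" event when the status is COMPLETED. In the repository this wiring is done by the CDK project; the sketch below only shows the shape of the event pattern, with a hypothetical rule name.

```python
import json

import boto3

events = boto3.client("events")

# Hypothetical rule name; the pattern matches completed Amazon Transcribe jobs.
events.put_rule(
    Name="transcribe-job-completed",
    EventPattern=json.dumps({
        "source": ["aws.transcribe"],
        "detail-type": ["Transcribe Job State Change"],
        "detail": {"TranscriptionJobStatus": ["COMPLETED"]},
    }),
)
```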

This Lambda calls an LLM on Amazon Bedrock to summarize the call, determine its sentiment and tone, and return the result as JSON.
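A sketch of that second Lambda could look like the following. The model ID, the prompt, and the way the transcript URI is parsed are assumptions for illustration; here I assume an Anthropic Claude model, which uses the messages request format on Bedrock.

```python
import json

import boto3

transcribe = boto3.client("transcribe")
s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

PROMPT = (
    "Summarize the following call center conversation. Return only a JSON object "
    "with the fields summary, sentiment and tone.\n\nTranscript:\n{transcript}"
)


def handler(event, context):
    # EventBridge passes the Transcribe job name in the event detail.
    job_name = event["detail"]["TranscriptionJobName"]
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]

    # With an output bucket configured, the URI looks like
    # https://s3.<region>.amazonaws.com/<bucket>/<key>; parsing is simplified here.
    _, _, _, bucket, key = uri.split("/", 4)
    transcript_doc = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    transcript = transcript_doc["results"]["transcripts"][0]["transcript"]

    # Ask the model to produce summary, sentiment and tone as JSON.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [
                {"role": "user", "content": PROMPT.format(transcript=transcript)}
            ],
        }),
    )
    insights = json.loads(response["body"].read())["content"][0]["text"]
    return json.loads(insights)
```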

The Lambda stores the resulting JSON document in DynamoDB, and the JSON is also published to Amazon SNS (a pub/sub service), which has a built-in email subscription option. This allows subscribers to receive the JSON by email.
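Storing and fanning out the result could look like this. The table name, partition key, and topic ARN are placeholders; in the CDK project these would come from the deployed resources.

```python
import json

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

# Assumed table name and topic ARN, for illustration only.
table = dynamodb.Table("call-insights")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:call-insights"


def store_and_notify(job_name: str, insights: dict) -> None:
    """Persist the insights JSON in DynamoDB and publish it to SNS,
    so email subscribers receive a copy."""
    table.put_item(Item={"call_id": job_name, **insights})  # "call_id" is an assumed partition key
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Call insights",
        Message=json.dumps(insights, indent=2),
    )
```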

The whole system is completely serverless. There are no virtual machines or Kubernetes clusters to run and manage.

In fact, the only code I had to write was for the two Lambda functions.

You can find all the source code in this GitHub repository. I also created an AWS CDK project that deploys all the components directly into your AWS account.

Just follow the instructions in the README.


