Saturday, 5 April 2025

Game analysis with an LLM, OpenSearch, and a multimodal embedding model

Enhancing Semantic Image Search with Amazon Bedrock and OpenSearch for Sports Analytics

Introduction

In today's blog, I'll explore how to build an advanced semantic image search system tailored for sports analytics, building on the excellent foundation provided by AWS in their original semantic image search blog. While the original solution leveraged Amazon Rekognition and Titan multimodal embedding models, I've enhanced it with Amazon Nova Pro to provide richer contextual analysis of basketball game footage.

The Original Solution

The AWS blog presented a comprehensive approach to semantic image search using:

  • Amazon Rekognition for extracting image labels and celebrity recognition

  • Amazon Titan multimodal embedding models for vector embeddings

  • Amazon OpenSearch for storing and retrieving image metadata

  • A user-friendly search interface

The solution allowed users to search for images semantically, beyond just keyword matching. The GitHub repository is available here.

The Enhancement Challenge

While the original architecture was solid, I found the image descriptions extracted by Amazon Rekognition lacked the detailed context needed for meaningful sports analytics. For example, a basketball image might be tagged with generic labels like "person," "ball," and "court" without capturing the action's intensity or significance.

Enhanced Architecture with Bedrock

To address these limitations, I replaced the Rekognition label extraction with an Amazon Bedrock-powered Lambda function that provides richer contextual analysis:

import os
import json
import base64
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")


def get_image_base64(image_path):
    """Helper to base64-encode an image (kept for reuse; the handler below sends raw bytes)."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def lambda_handler(event, context):
    try:
        # Download the S3 image to local storage
        bucket = event["body"]["bucket"]
        key = event["body"]["key"]

        fileshort = os.path.basename(key)
        file_tmp = "/tmp/" + fileshort

        model_id = "us.amazon.nova-pro-v1:0"

        s3.download_file(bucket, key, file_tmp)

        # Analysis prompt for Nova Pro, supplied via an environment variable
        prompt = os.environ.get("LLM_PROMPT")

        # The Converse API expects the format name "jpeg", not "jpg"
        image_ext = file_tmp.split(".")[-1].lower()
        if image_ext == "jpg":
            image_ext = "jpeg"

        with open(file_tmp, "rb") as f:
            image = f.read()

        # Build a multimodal message containing the prompt and the raw image bytes
        message = {
            "role": "user",
            "content": [
                {"text": prompt},
                {
                    "image": {
                        "format": image_ext,
                        "source": {"bytes": image},
                    }
                },
            ],
        }

        messages = [message]

        # Send the message to the model
        response = bedrock.converse(
            modelId=model_id,
            messages=messages,
        )

        # Parse the response text
        image_description = response["output"]["message"]["content"][0]["text"]

        # Clean up the temporary file
        os.remove(file_tmp)

        return {
            "statusCode": 200,
            "body": json.dumps("Image processed successfully!"),
            "image_description": image_description,
        }

    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps(f"Error processing image: {str(e)}"),
        }

This new approach uses Amazon Bedrock's multimodal capabilities with the Nova Pro model to analyze basketball game images, extracting detailed information about the action and players and assessing how interesting the moment is based on specific criteria.
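To try this handler in isolation, you can invoke it with an event shaped like the one the state machine passes in. The bucket and key below are placeholders, and the LLM_PROMPT environment variable must contain the analysis prompt shown further down:

import os

# Placeholder values for a local smoke test of the analysis Lambda above
os.environ["LLM_PROMPT"] = "You are a basketball highlight analysis AI ..."  # full prompt shown below

sample_event = {
    "body": {
        "bucket": "my-game-footage-bucket",          # placeholder bucket
        "key": "frames/game01/q3/frame_0421.jpeg",   # placeholder object key
    }
}

result = lambda_handler(sample_event, None)  # handler defined above
print(result["image_description"])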

I added this new Lambda to the state machine described in the original blog. I kept the Rekognition step in place, but its output is no longer used by the downstream code.

(Screenshot: the updated state machine with the new Bedrock analysis Lambda)


Here is the prompt used to analyze images of the game:


You are a basketball highlight analysis AI that evaluates footage of basketball plays. When presented with a video clip image of any basketball play, analyze it and determine whether the moment is interesting or not. Base your assessment on:

Analysis Parameters:

  1. Creativity Score (1-10):

    • Assess originality, unexpectedness, and innovation in the play.

    • Examples: creative passing, unique dribble moves, unexpected shot selection, clever defensive strategies, or innovative ways to bypass defenders.

  2. Execution Score (1-10):

    • Evaluate technical precision, timing, and effectiveness of the play.

    • Examples: perfect shooting form, precise passing, well-timed defensive rotations, or flawless ball handling.

  3. Athleticism Score (1-10):

    • Rate physical capabilities demonstrated such as speed, vertical leap, strength, balance, and body control.

    • Examples: explosive first step, high-flying rebounds, quick defensive reactions, or acrobatic finishes.

  4. Context Score (1-10):

    • Consider game situation, score differential, time remaining, playoff implications, or rivalry significance.

    • Examples: clutch moments, momentum-changing plays, or high-pressure situations.

  5. Technical Difficulty Score (1-10):

    • Evaluate complexity of the execution relative to standard basketball plays.

    • Examples: contested shots, through-traffic passes, defensive stops against elite players, or complex team movements.

Your Response Should:

  • Provide an overall verdict: "Interesting Play" or "Not Interesting Play"

  • Identify the type of play (dunk, three-pointer, assist, defensive stop, steal, block, crossover, alley-oop, etc.)

  • Justify your verdict with specific observations from the video

  • Highlight the most notable aspects (positive or negative)

  • Compare to standard/common plays of that type when relevant

Note: Be honest in your assessment - not every basketball moment is exceptional. A play may be uninteresting due to:

  • Simple, common execution without distinguishing elements

  • Poor technique or incomplete execution

  • Low difficulty level relative to performer's capabilities

  • Minimal impact on the flow or outcome of the game

  • Lack of defensive pressure or game context

Provide your analysis in a clear, structured format with specific timestamps when relevant.
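In my setup this prompt lives in the Lambda's LLM_PROMPT environment variable, which the handler reads with os.environ.get. A rough sketch of setting it with boto3 follows; the function name is a placeholder, and note that Lambda environment variables have a 4 KB total size limit, so a very long prompt may be better stored in S3 or Parameter Store:

import boto3

lambda_client = boto3.client("lambda")

with open("analysis_prompt.txt") as f:   # file containing the prompt above
    prompt_text = f.read()

# Note: this call replaces the function's existing environment variables,
# so merge in any other variables you need to keep.
lambda_client.update_function_configuration(
    FunctionName="bedrock-image-analysis",            # placeholder function name
    Environment={"Variables": {"LLM_PROMPT": prompt_text}},
)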


And here is an example result of the image analysis:


Overall Verdict: Interesting Play

Type of Play: Dunk

Analysis:

  1. Creativity Score: 8

    • The dunk itself is not highly creative, but the sequence leading to it involves a clever pass and a well-timed cut to the basket, which adds an element of surprise and coordination.

  2. Execution Score: 9

    • The execution of the play is nearly flawless. The pass is precise, and the player receiving it shows excellent timing and ball handling to finish the dunk.

  3. Athleticism Score: 10

    • The dunk is highly athletic, showcasing excellent vertical leap and body control. The player demonstrates impressive explosiveness and timing in finishing the play.

  4. Context Score: 7

    • The play occurs in the third quarter with the score close (55-52). While it’s not a clutch moment, it’s still a significant play that could shift momentum.

  5. Technical Difficulty Score: 6

    • The play, while impressive, is somewhat standard for high-level NBA players. The difficulty lies more in the execution under game pressure rather than the complexity of the play itself.

Notable Aspects:

  • The player’s explosive first step and vertical leap are particularly impressive.

  • The precise timing and coordination between the passer and the dunker highlight strong team play.

  • The defensive effort to contest the dunk is notable, though ultimately unsuccessful.

Comparison to Standard Plays:

  • This play is more impressive than a standard dunk due to the athleticism and timing involved. However, it doesn’t reach the level of a truly extraordinary or game-changing moment, which keeps the context score moderate.


The result of the image analysis is stored in Amazon OpenSearch. When we run a search, however, this full analysis is too large to present and overwhelms the UI. This is why we use Amazon Nova again to summarize the text at retrieval time.

In a real-world scenario, the summarization should happen at insert time, before the document is written to OpenSearch. That makes retrieval faster and avoids re-running the summarization on every search.
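Here is a minimal sketch of what summarizing at insert time could look like in the indexing branch of the handler, assuming the summarise_article_titan and index_document helpers shown later are available there:

# Sketch only: summarize once when the document is indexed, instead of on every search
vector = event["ImageText"]["Image_embedding"]
full_analysis = event["ImageText"]["Sentence_labels"]
celebrities = event["ImageText"]["Sentence_people"]

os_document = {
    "image_path": f"s3://{event['detail']['bucket']['name']}/{event['detail']['object']['key']}",
    "image_words": summarise_article_titan(full_analysis, celebrities),  # store the short summary
    "celebrities": celebrities,
    "image_vector": vector,
}
index_document(os_document)

The full analysis could still be kept in an additional field if you want to preserve it for later reprocessing.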

But even if you choose not to summarize the image analysis at insert time, you can use the new Amazon Bedrock prompt caching feature to avoid reprocessing the static portion of the summarization prompt on every request.
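I have not wired this into the solution, but here is a rough sketch of how the static instruction prefix could be marked as cacheable with the Converse API. It assumes prompt caching is available for the chosen model in your Region, and caching only kicks in once the cached prefix exceeds the model's minimum token threshold:

import boto3

bedrock = boto3.client("bedrock-runtime")

# Sketch only: mark the static instructions as cacheable so repeated
# summarization calls only reprocess the per-image analysis text.
static_instructions = (
    "Provide a summary of the analysis of the image from the sport game. "
    "Do not add information that is not in the text. Use at most two sentences."
)

def summarize_with_cache(analysis_text):
    response = bedrock.converse(
        modelId="us.amazon.nova-pro-v1:0",
        system=[
            {"text": static_instructions},
            {"cachePoint": {"type": "default"}},  # everything above this point is cached
        ],
        messages=[{"role": "user", "content": [{"text": analysis_text}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.3},
    )
    return response["output"]["message"]["content"][0]["text"]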

Improved Search with Hybrid Capabilities

I also enhanced the retrieval Lambda function to better present the image descriptions and implement hybrid search using OpenSearch:

import os
import json

import boto3
import awswrangler as wr

import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

sec = boto3.client("secretsmanager")
get_secret_value = sec.get_secret_value(SecretId=os.environ["OPENSEARCH_SECRET"])
secret_value = json.loads(get_secret_value["SecretString"])

bedrock = boto3.client("bedrock-runtime")

os_client = wr.opensearch.connect(
    host=os.environ["OPENSEARCH_ENDPOINT"],
    username=secret_value["username"],
    password=secret_value["password"],
)


def get_names(payload):
    logger.info("Detecting entities...")
    comprehend = boto3.client("comprehend")
    response = comprehend.detect_entities(Text=payload, LanguageCode="en")
    names = ""
    for entity in response["Entities"]:
        # Collect every entity type; the PERSON-only filter is left commented out
        # if entity["Type"] == "PERSON":
        names += entity["Text"] + " "
    return names


def summarise_article_titan(payload, celebrity=None):
    prompt = f"""Provide a summary of the analysis of the image from the sport game. 
    Do not add any information that is not mentioned in the text below. Be very concrete. 
    If the analysis indicates very low scores (bellow 3 for at least two categories), 
    the content is not interesting and just say "Not interesting content."Try to use at most two sentences. 
    Mention only major impressions from image analysis. 
    If celebrity people included in the analysis bellow, include them in the summary.
    <text>
    {payload}
    </text>
    <celebrity>
    {celebrity}
    </celebrity>
    """

    body = json.dumps(
    {
        "schemaVersion": "messages-v1",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": prompt
                    }
                ]
            }
        ],
        "inferenceConfig": {
            "maxTokens": 300,
            "temperature": 0.3,
            "topP": 0.1,
            "topK": 20
        }
    })

    modelId = "us.amazon.nova-pro-v1:0"  # Nova Lite model ID
    accept = 'application/json'
    contentType = 'application/json'

    try:
        response = bedrock.invoke_model(
            body=body, 
            modelId=modelId, 
            accept=accept, 
            contentType=contentType
        )
        response_body = json.loads(response.get('body').read())
        
        # Nova models return response in a different format
        answer = response_body["output"]["message"]["content"][0]["text"]
        
        logger.info('## SUMMARIZED')
        logger.info(f'Response: "{answer}"')
        return answer
    except Exception as e:
        logger.error(f"Error in summarize_article_titan: {str(e)}")
        return payload[:200] + "..." if len(payload) > 200 else payload  # Return truncated original text as fallback


def get_vector_titan(payload_summary):
    try:
        body = json.dumps({"inputText": payload_summary})

        response = bedrock.invoke_model(
            body=body,
            modelId="amazon.titan-embed-image-v1",
            accept="application/json",
            contentType="application/json",
        )
        response_body = json.loads(response.get("body").read())
        embedding = response_body.get("embedding")
        return embedding
    except Exception as e:
        logger.error(f"Error in get_vector_titan: {str(e)}")
        raise


def index_document(document):
    logger.info("Indexing document...")
    
    # Create Index with hybrid search support
    try:
        index_exists = os_client.indices.exists(index="imagesh")
        if not index_exists:
            wr.opensearch.create_index(
                client=os_client,
                index="imagesh",
                settings={
                    "index.knn": True,
                    "index.knn.space_type": "cosinesimil",
                    "analysis": {
                        "analyzer": {
                            "default": {
                                "type": "standard",
                                "stopwords": "_english_"
                            }
                        }
                    }
                },
                mappings={
                    "properties": {
                        "image_vector": {
                            "type": "knn_vector",
                            "dimension": len(document["image_vector"]),
                            "method": {
                                "name": "hnsw",
                                "space_type": "cosinesimil",
                                "engine": "nmslib"
                            }
                        },
                        "image_path": {
                            "type": "text",
                            "store": True,
                            "fields": {
                                "keyword": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "image_words": {
                            "type": "text",
                            "store": True,
                            "analyzer": "standard",
                            "fields": {
                                "keyword": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "celebrities": {
                            "type": "text",
                            "store": True,
                            "fields": {
                                "keyword": {
                                    "type": "keyword"
                                }
                            }
                        }
                    }
                },
            )
            logger.info("Created 'imagesh' index")
    except Exception as e:
        logger.error(f"Error checking/creating index: {str(e)}")

    try:
        response = wr.opensearch.index_documents(
            client=os_client,
            index="imagesh",
            documents=[document],
        )
        return response
    except Exception as e:
        logger.error(f"Error indexing document: {str(e)}")
        raise


def search_document_vector(vector, text_query=None, knn_weight=0.7, text_weight=0.3, k=3):
    """
    Perform hybrid search combining vector similarity and text search
    """
    logger.info("Searching documents...")
    
    try:
        if text_query and text_query.strip():
            # Hybrid search query
            search_body = {
                "size": k,
                "_source": True,
                "query": {
                    "bool": {
                        "should": [
                            # KNN vector similarity
                            {
                                "knn": {
                                    "image_vector": {
                                        "vector": vector,
                                        "k": k,
                                        "boost": knn_weight
                                    }
                                }
                            },
                            # Text matching component
                            {
                                "multi_match": {
                                    "query": text_query,
                                    "fields": ["image_words", "celebrities"],
                                    "fuzziness": "AUTO",
                                    "boost": text_weight
                                }
                            }
                        ]
                    }
                }
            }
        else:
            # Pure KNN search if no text query
            search_body = {
                "size": k,
                "_source": True,
                "query": {
                    "knn": {
                        "image_vector": {
                            "vector": vector,
                            "k": k
                        }
                    }
                }
            }

        results = wr.opensearch.search(
            client=os_client,
            index="imagesh",
            search_body=search_body
        )
        
        if results.empty:
            logger.info("No search results found")
            return {"results": []}
        
        # Drop the vector column which can be large
        if "image_vector" in results.columns:
            results = results.drop(columns=["image_vector"])
            
        return results.to_dict()
    
    except Exception as e:
        logger.error(f"Error in search_document_vector: {str(e)}")
        raise


def search_document_celeb_context(person_names, vector):
    try:
        search_body = {
            "size": 10,
            "query": {
                "bool": {
                    "must": [
                        {"match": {"celebrities": person_names}}
                    ],
                    "should": [
                        {
                            "knn": {
                                "image_vector": {
                                    "vector": vector,
                                    "k": 10
                                }
                            }
                        }
                    ]
                }
            }
        }
        
        results = wr.opensearch.search(
            client=os_client,
            index="imagesh",
            search_body=search_body
        )
        
        if results.empty:
            # Fallback to regular vector search
            res = search_document_vector(vector)
        else:
            # Drop the vector column which can be large
            if "image_vector" in results.columns:
                results = results.drop(columns=["image_vector"])
            res = results.to_dict()
            
        return res
    except Exception as e:
        logger.error(f"Error in search_document_celeb_context: {str(e)}")
        # Fallback to regular vector search
        return search_document_vector(vector)


def lambda_handler(event, context):
    try:
        if "ImageText" in event:
            logger.info("Processing ImageText event")
            metad = (
                event["ImageText"]["Sentence_labels"]
                + " "
                + event["ImageText"]["Sentence_people"]
            )

            vector = event["ImageText"]["Image_embedding"]
            os_document = {
                "image_path": f"s3://{event['detail']['bucket']['name']}/{event['detail']['object']['key']}",
                "image_words": event["ImageText"]["Sentence_labels"],
                "celebrities": event["ImageText"]["Sentence_people"],
                "image_vector": vector,
            }
            index_document(os_document)
            return {"Vector": vector}
            
        elif event.get("httpMethod") == "POST" and "body" in event:
            logger.info("Processing search API request")
            body = event["body"]
            
            if len(body) > 20000:
                logger.info('## Text too long')            
                return {
                    "statusCode": 400,
                    "headers": {
                        "Access-Control-Allow-Origin": "*",
                        "Access-Control-Allow-Headers": "*",
                        "Access-Control-Allow-Methods": "*",
                    },
                    "body": json.dumps({"error": "Text too long"})
                }
            
            vector = get_vector_titan(body)
            os_results = search_document_vector(vector, body)
            
            # Format results 
            if os_results and os_results.get(list(os_results.keys())[0], {}):
                results = [
                    {k: os_results[k][n] for k in os_results.keys()}
                    for n in os_results[list(os_results.keys())[0]].keys()
                ]
            else:
                results = []
                
            # Summarize image_words for each result
            for item in results:
                image_words = item.get('image_words', '')
                celebrities = item.get('celebrities', '').strip() or None
                item['image_words'] = summarise_article_titan(image_words, celebrities)
            
            logger.info(f"Returning {len(results)} search results")
            
            return {
                "statusCode": 200,
                "headers": {
                    "Access-Control-Allow-Origin": "*",
                    "Access-Control-Allow-Headers": "*",
                    "Access-Control-Allow-Methods": "*",
                },
                "body": json.dumps({"results": results})
            }
        else:
            logger.error("Invalid event structure")
            return {
                "statusCode": 400,
                "headers": {
                    "Access-Control-Allow-Origin": "*",
                    "Access-Control-Allow-Headers": "*",
                    "Access-Control-Allow-Methods": "*",
                },
                "body": json.dumps({"error": "Invalid request format"})
            }
    except Exception as e:
        logger.error(f"Error in lambda_handler: {str(e)}")
        return {
            "statusCode": 500,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Headers": "*", 
                "Access-Control-Allow-Methods": "*",
            },
            "body": json.dumps({"error": f"Internal server error: {str(e)}"})
        }

Understanding Hybrid Search

Hybrid search in OpenSearch combines:

  1. Vector search - Finding semantically similar content using vector embeddings (KNN search)

  2. Text search - Traditional keyword-based search with text matching

This powerful combination allows users to find relevant images even when their search terms don't exactly match the indexed metadata. For example, searching for "amazing dunk" might return spectacular slam dunks even if those exact words weren't in the original description.

The weights (knn_weight and text_weight) allow tuning the relevance formula to balance semantic similarity with keyword matching.
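For instance, a query that relies heavily on player names might shift weight toward keyword matching (the values below are illustrative only):

# Illustrative weights: favour exact matches on player names and play types
query_text = "Stephen Curry three-pointer"
query_vector = get_vector_titan(query_text)

results = search_document_vector(
    query_vector,
    text_query=query_text,
    knn_weight=0.4,   # semantic similarity contributes less
    text_weight=0.6,  # keyword matches contribute more
    k=5,
)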

Use Case: Basketball Game Analysis

This enhanced solution is particularly valuable for basketball game analysis. Analysts can:

  • Search for specific plays like "LeBron James dunk" or "three-point shot in final seconds"

  • Find visually similar game situations without knowing the specific keywords

  • Identify exciting moments in a game based on contextual analysis scores

  • Click on a search result to view the actual footage of that moment

For example, a coach might search for "defensive stance against pick and roll" and find relevant examples across multiple games, even if those specific terms weren't in the original metadata.
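Since the retrieval Lambda sits behind an API, the search can be driven from any client. A hypothetical call against the search endpoint (the URL is a placeholder) could look like this:

import requests

# Placeholder API Gateway URL for the search Lambda
search_url = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/search"

# The Lambda expects the raw query text as the POST body
response = requests.post(search_url, data="defensive stance against pick and roll")

for hit in response.json()["results"]:
    print(hit["image_path"], "-", hit["image_words"])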

Results and Improvements

The enhanced system provides several key improvements:

  1. Richer context - Detailed descriptions of basketball plays beyond simple object recognition

  2. Excitement assessment - Analysis of how interesting a moment is based on specific criteria

  3. Hybrid search - Combination of semantic and keyword search for more relevant results

Sample search results might include:

"LeBron James executing a powerful slam dunk over two defenders, showing exceptional athleticism with a score of 8/10 for excitement."

"Stephen Curry hitting a long-range three-pointer in the final seconds of the game, turning the momentum with high pressure shot execution rated 9/10."


Here is how the search looks after implementing the changes above:

(Screenshot: search results returned by the enhanced solution)

Conclusion

While the original AWS solution provided an excellent foundation for semantic image search, the enhancements with Amazon Bedrock and hybrid search capabilities in OpenSearch take sports analytics to the next level. This approach enables coaches, analysts, and fans to find meaningful moments in basketball games beyond what traditional object recognition could provide.

The ability to identify and retrieve specific game situations based on rich contextual understanding opens up new possibilities for game analysis, highlight creation, and player development.

Credit goes to the original AWS blog for laying the groundwork with their semantic image search solution. Their approach was state-of-the-art at the time of release, but the rapidly evolving AI landscape now allows us to build even more powerful tools by combining multiple AWS AI services.


Sports fans can also use this feature to get insights from a game after the fact. For example, users can search for "interesting dunk" and immediately get relevant images. The solution can be extended further so that clicking an image in the search results plays the video from 10 seconds before to 10 seconds after the moment described in the image.
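As a rough sketch of that extension, assuming each indexed frame also stored the source video's object key and the frame's offset in seconds (hypothetical fields that the current index does not have), the playback window could be computed like this:

import boto3

s3 = boto3.client("s3")

def get_clip_window(bucket, video_key, frame_offset_seconds, padding=10):
    """Return a presigned URL for the source video plus the window to play.

    Sketch only: video_key and frame_offset_seconds are hypothetical fields
    that would need to be stored alongside each indexed frame.
    """
    start = max(0, frame_offset_seconds - padding)
    end = frame_offset_seconds + padding
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": video_key},
        ExpiresIn=3600,
    )
    # The front end seeks to `start` and stops at `end` when playing the clip
    return {"url": url, "start_seconds": start, "end_seconds": end}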


Note: This implementation can be extended to other sports or video content analysis use cases where finding specific moments based on rich contextual understanding is valuable.