Enhancing Semantic Image Search with Amazon Bedrock and OpenSearch for Sports Analytics
Introduction
In today's blog, I'll explore how to build an advanced semantic image search system specifically tailored for sports analytics, building upon the excellent foundation provided by AWS in their original semantic image search blog. While the original solution leveraged Amazon Rekognition and the Titan multimodal embedding model, I've enhanced it with Amazon Nova Pro capabilities to provide richer contextual analysis of basketball game footage.
The Original Solution
The AWS blog presented a comprehensive approach to semantic image search using:
Amazon Rekognition for image label extraction and celebrity recognition
Amazon Titan multimodal embedding models for vector embeddings
Amazon OpenSearch for storing and retrieving image metadata
A user-friendly search interface
The solution allowed users to search for images semantically, beyond just keyword matching. The GitHub repository is available here.
The Enhancement Challenge
While the original architecture was solid, I found the image descriptions extracted by Amazon Rekognition lacked the detailed context needed for meaningful sports analytics. For example, a basketball image might be tagged with generic labels like "person," "ball," and "court" without capturing the action's intensity or significance.
Enhanced Architecture with Bedrock
To address these limitations, I replaced the Rekognition label extraction with an Amazon Bedrock-powered Lambda function that provides richer contextual analysis:
import os
import json
import base64
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

def get_image_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def lambda_handler(event, context):
    try:
        # Download the S3 image to local storage
        bucket = event["body"]["bucket"]
        key = event["body"]["key"]
        fileshort = os.path.basename(key)
        file_tmp = "/tmp/" + fileshort
        model_id = "us.amazon.nova-pro-v1:0"
        s3.download_file(bucket, key, file_tmp)

        # Prepare the prompt for Nova Pro (read from the Lambda environment)
        prompt = os.environ.get("LLM_PROMPT")

        # The Converse API expects "jpeg", not "jpg", as the image format
        image_ext = file_tmp.split(".")[-1].lower()
        if image_ext == "jpg":
            image_ext = "jpeg"

        with open(file_tmp, "rb") as f:
            image = f.read()

        message = {
            "role": "user",
            "content": [
                {"text": prompt},
                {
                    "image": {
                        "format": image_ext,
                        "source": {"bytes": image},
                    }
                },
            ],
        }
        messages = [message]

        # Send the message to the model
        response = bedrock.converse(
            modelId=model_id,
            messages=messages,
        )

        # Parse the response
        image_description = response["output"]["message"]["content"][0]["text"]

        # Clean up the temporary file
        os.remove(file_tmp)

        return {
            "statusCode": 200,
            "body": json.dumps("Image processed successfully!"),
            "image_description": image_description,
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps(f"Error processing image: {str(e)}"),
        }
This new approach uses Amazon Bedrock's multimodal capabilities with the Nova Pro model to analyze basketball game images, extract detailed information about the action and players, and assess how interesting each moment is based on specific criteria.
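Before wiring the function into the pipeline, you can sanity-check it with a minimal test event; the bucket and key below are placeholders:

# Hypothetical test event; replace bucket/key with real values from your account
test_event = {
    "body": {
        "bucket": "my-sports-footage-bucket",
        "key": "frames/game1/frame_00042.jpg",
    }
}

result = lambda_handler(test_event, None)
print(result["image_description"])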
I added this new Lambda as a task in the Step Functions state machine described in the original blog. I kept the Rekognition step in place, but its output is no longer used by the code.
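For illustration, here is roughly what the added state looks like, written as a Python dict mirroring the Amazon States Language definition; the function name, state names, and field mappings are placeholders for my setup:

# Hypothetical Task state (as a Python dict mirroring Amazon States Language);
# the Lambda function name and neighboring state names are placeholders.
bedrock_analysis_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": "ImageAnalysisBedrockFunction",
        "Payload": {
            "body": {
                "bucket.$": "$.detail.bucket.name",
                "key.$": "$.detail.object.key",
            }
        },
    },
    "ResultSelector": {"image_description.$": "$.Payload.image_description"},
    "ResultPath": "$.BedrockAnalysis",
    "Next": "GenerateEmbeddings",
}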
Here is the prompt used to analyze the game images:
You are a basketball highlight analysis AI that evaluates footage of basketball plays. When presented with a video clip image of any basketball play, analyze it and determine whether the moment is interesting or not. Base your assessment on:
Analysis Parameters:
Creativity Score (1-10):
Assess originality, unexpectedness, and innovation in the play.
Examples: creative passing, unique dribble moves, unexpected shot selection, clever defensive strategies, or innovative ways to bypass defenders.
Execution Score (1-10):
Evaluate technical precision, timing, and effectiveness of the play.
Examples: perfect shooting form, precise passing, well-timed defensive rotations, or flawless ball handling.
Athleticism Score (1-10):
Rate physical capabilities demonstrated such as speed, vertical leap, strength, balance, and body control.
Examples: explosive first step, high-flying rebounds, quick defensive reactions, or acrobatic finishes.
Context Score (1-10):
Consider game situation, score differential, time remaining, playoff implications, or rivalry significance.
Examples: clutch moments, momentum-changing plays, or high-pressure situations.
Technical Difficulty Score (1-10):
Evaluate complexity of the execution relative to standard basketball plays.
Examples: contested shots, through-traffic passes, defensive stops against elite players, or complex team movements.
Your Response Should:
Provide an overall verdict: "Interesting Play" or "Not Interesting Play"
Identify the type of play (dunk, three-pointer, assist, defensive stop, steal, block, crossover, alley-oop, etc.)
Justify your verdict with specific observations from the video
Highlight the most notable aspects (positive or negative)
Compare to standard/common plays of that type when relevant
Note: Be honest in your assessment - not every basketball moment is exceptional. A play may be uninteresting due to:
Simple, common execution without distinguishing elements
Poor technique or incomplete execution
Low difficulty level relative to performer's capabilities
Minimal impact on the flow or outcome of the game
Lack of defensive pressure or game context
Provide your analysis in a clear, structured format with specific timestamps when relevant.
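In my setup this prompt reaches the analysis Lambda through the LLM_PROMPT environment variable read in the code above. A minimal sketch of setting it with boto3; the function name is a placeholder:

import boto3

lambda_client = boto3.client("lambda")

analysis_prompt = "You are a basketball highlight analysis AI that ..."  # full prompt text from above

# Hypothetical function name; adjust to your deployment
lambda_client.update_function_configuration(
    FunctionName="ImageAnalysisBedrockFunction",
    Environment={"Variables": {"LLM_PROMPT": analysis_prompt}},
)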
And here is an example result of the image analysis:
Overall Verdict: Interesting Play
Type of Play: Dunk
Analysis:
Creativity Score: 8
The dunk itself is not highly creative, but the sequence leading to it involves a clever pass and a well-timed cut to the basket, which adds an element of surprise and coordination.
Execution Score: 9
The execution of the play is nearly flawless. The pass is precise, and the player receiving it shows excellent timing and ball handling to finish the dunk.
Athleticism Score: 10
The dunk is highly athletic, showcasing excellent vertical leap and body control. The player demonstrates impressive explosiveness and timing in finishing the play.
Context Score: 7
The play occurs in the third quarter with the score close (55-52). While it’s not a clutch moment, it’s still a significant play that could shift momentum.
Technical Difficulty Score: 6
The play, while impressive, is somewhat standard for high-level NBA players. The difficulty lies more in the execution under game pressure rather than the complexity of the play itself.
Notable Aspects:
The player’s explosive first step and vertical leap are particularly impressive.
The precise timing and coordination between the passer and the dunker highlight strong team play.
The defensive effort to contest the dunk is notable, though ultimately unsuccessful.
Comparison to Standard Plays:
This play is more impressive than a standard dunk due to the athleticism and timing involved. However, it doesn’t reach the level of a truly extraordinary or game-changing moment, which keeps the context score moderate.
The result of the image analysis is stored in Amazon OpenSearch. At search time, however, this text is too long to present as-is and overwhelms the UI, which is why we use Amazon Nova again to summarize it on retrieval.
In a real-world scenario, the text should be summarized at insert time, before it is stored in OpenSearch. That would make retrieval faster and avoid re-running the summarization on every search.
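As a sketch, the indexing branch of the Lambda shown in the next section could store a pre-computed summary alongside the full analysis; summarise_article_titan is the helper defined there, and image_words_full is a hypothetical extra field:

# Sketch: summarize once at index time rather than on every search
summary = summarise_article_titan(
    event["ImageText"]["Sentence_labels"],
    event["ImageText"]["Sentence_people"],
)
os_document = {
    "image_path": image_path,                                   # s3:// URI built as in the handler below
    "image_words": summary,                                     # short summary served to the UI
    "image_words_full": event["ImageText"]["Sentence_labels"],  # hypothetical field keeping the full analysis
    "celebrities": event["ImageText"]["Sentence_people"],
    "image_vector": vector,
}
index_document(os_document)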
But even if you choose not to summarize the image analysis at insert time, you can use Amazon Bedrock's new prompt caching feature to avoid reprocessing the static part of the summarization prompt over and over.
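A rough illustration with the Converse API; note that the summarization code below uses invoke_model instead, and prompt caching support plus the exact cachePoint placement are assumptions to verify against the current Bedrock documentation for your model and region:

# Sketch: mark the static instruction prefix as cacheable so repeated
# summarization calls only reprocess the per-image analysis text.
response = bedrock.converse(
    modelId="us.amazon.nova-pro-v1:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": STATIC_SUMMARY_INSTRUCTIONS},  # identical on every call
                {"cachePoint": {"type": "default"}},    # cache everything above this point
                {"text": payload},                      # varies per image
            ],
        }
    ],
)
summary = response["output"]["message"]["content"][0]["text"]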
Improved Search with Hybrid Capabilities
I also enhanced the retrieval Lambda function to better present the image descriptions and implement hybrid search using OpenSearch:
import os
import json
import boto3
import awswrangler as wr
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Retrieve OpenSearch credentials from Secrets Manager
sec = boto3.client("secretsmanager")
get_secret_value = sec.get_secret_value(SecretId=os.environ["OPENSEARCH_SECRET"])
secret_value = json.loads(get_secret_value["SecretString"])

bedrock = boto3.client("bedrock-runtime")

os_client = wr.opensearch.connect(
    host=os.environ["OPENSEARCH_ENDPOINT"],
    username=secret_value["username"],
    password=secret_value["password"],
)

def get_names(payload):
    logger.info("Detecting entities...")
    comprehend = boto3.client("comprehend")
    response = comprehend.detect_entities(Text=payload, LanguageCode="en")
    names = ""
    for entity in response["Entities"]:
        # Intentionally not filtering on entity["Type"] == "PERSON":
        # all detected entities are kept to widen the search context
        names += entity["Text"] + " "
    return names
def summarise_article_titan(payload, celebrity=None):
    # Name kept from the original Titan-based version; this now calls Nova Pro.
    prompt = f"""Provide a summary of the analysis of the image from the sport game.
    Do not add any information that is not mentioned in the text below. Be very concrete.
    If the analysis indicates very low scores (below 3 for at least two categories),
    the content is not interesting and just say "Not interesting content." Try to use at most two sentences.
    Mention only major impressions from image analysis.
    If celebrity people are included in the analysis below, include them in the summary.
    <text>
    {payload}
    </text>
    <celebrity>
    {celebrity}
    </celebrity>
    """
    body = json.dumps(
        {
            "schemaVersion": "messages-v1",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"text": prompt}
                    ],
                }
            ],
            "inferenceConfig": {
                "maxTokens": 300,
                "temperature": 0.3,
                "topP": 0.1,
                "topK": 20,
            },
        }
    )
    modelId = "us.amazon.nova-pro-v1:0"  # Nova Pro model ID
    accept = "application/json"
    contentType = "application/json"
    try:
        response = bedrock.invoke_model(
            body=body,
            modelId=modelId,
            accept=accept,
            contentType=contentType,
        )
        response_body = json.loads(response.get("body").read())
        # Nova models return the generated text under output.message.content
        answer = response_body["output"]["message"]["content"][0]["text"]
        logger.info("## SUMMARIZED")
        logger.info(f'Response: "{answer}"')
        return answer
    except Exception as e:
        logger.error(f"Error in summarise_article_titan: {str(e)}")
        # Return the truncated original text as a fallback
        return payload[:200] + "..." if len(payload) > 200 else payload
def get_vector_titan(payload_summary):
    # Embed the text with the Titan multimodal embedding model
    try:
        body = json.dumps({"inputText": payload_summary})
        response = bedrock.invoke_model(
            body=body,
            modelId="amazon.titan-embed-image-v1",
            accept="application/json",
            contentType="application/json",
        )
        response_body = json.loads(response.get("body").read())
        embedding = response_body.get("embedding")
        return embedding
    except Exception as e:
        logger.error(f"Error in get_vector_titan: {str(e)}")
        raise
def index_document(document):
    logger.info("Indexing document...")
    # Create the index with hybrid search support if it does not exist yet
    try:
        index_exists = os_client.indices.exists(index="imagesh")
        if not index_exists:
            wr.opensearch.create_index(
                client=os_client,
                index="imagesh",
                settings={
                    "index.knn": True,
                    "index.knn.space_type": "cosinesimil",
                    "analysis": {
                        "analyzer": {
                            "default": {
                                "type": "standard",
                                "stopwords": "_english_",
                            }
                        }
                    },
                },
                mappings={
                    "properties": {
                        "image_vector": {
                            "type": "knn_vector",
                            "dimension": len(document["image_vector"]),
                            "method": {
                                "name": "hnsw",
                                "space_type": "cosinesimil",
                                "engine": "nmslib",
                            },
                        },
                        "image_path": {
                            "type": "text",
                            "store": True,
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                        "image_words": {
                            "type": "text",
                            "store": True,
                            "analyzer": "standard",
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                        "celebrities": {
                            "type": "text",
                            "store": True,
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                    }
                },
            )
            logger.info("Created 'imagesh' index")
    except Exception as e:
        logger.error(f"Error checking/creating index: {str(e)}")
    try:
        response = wr.opensearch.index_documents(
            client=os_client,
            index="imagesh",
            documents=[document],
        )
        return response
    except Exception as e:
        logger.error(f"Error indexing document: {str(e)}")
        raise
def search_document_vector(vector, text_query=None, knn_weight=0.7, text_weight=0.3, k=3):
    """
    Perform hybrid search combining vector similarity and text search.
    """
    logger.info("Searching documents...")
    try:
        if text_query and text_query.strip():
            # Hybrid search: combine a boosted KNN clause with a boosted text clause
            search_body = {
                "size": k,
                "_source": True,
                "query": {
                    "bool": {
                        "should": [
                            # KNN vector similarity component
                            {
                                "knn": {
                                    "image_vector": {
                                        "vector": vector,
                                        "k": k,
                                        "boost": knn_weight,
                                    }
                                }
                            },
                            # Text matching component
                            {
                                "multi_match": {
                                    "query": text_query,
                                    "fields": ["image_words", "celebrities"],
                                    "fuzziness": "AUTO",
                                    "boost": text_weight,
                                }
                            },
                        ]
                    }
                },
            }
        else:
            # Pure KNN search when no text query is provided
            search_body = {
                "size": k,
                "_source": True,
                "query": {
                    "knn": {
                        "image_vector": {
                            "vector": vector,
                            "k": k,
                        }
                    }
                },
            }
        results = wr.opensearch.search(
            client=os_client,
            index="imagesh",
            search_body=search_body,
        )
        if results.empty:
            logger.info("No search results found")
            return {"results": []}
        # Drop the vector column, which can be large
        if "image_vector" in results.columns:
            results = results.drop(columns=["image_vector"])
        return results.to_dict()
    except Exception as e:
        logger.error(f"Error in search_document_vector: {str(e)}")
        raise
def search_document_celeb_context(person_names, vector):
    try:
        # Require a match on the detected names, then rank by vector similarity
        search_body = {
            "size": 10,
            "query": {
                "bool": {
                    "must": [
                        {"match": {"celebrities": person_names}}
                    ],
                    "should": [
                        {
                            "knn": {
                                "image_vector": {
                                    "vector": vector,
                                    "k": 10,
                                }
                            }
                        }
                    ],
                }
            },
        }
        results = wr.opensearch.search(
            client=os_client,
            index="imagesh",
            search_body=search_body,
        )
        if results.empty:
            # Fall back to regular vector search
            res = search_document_vector(vector)
        else:
            # Drop the vector column, which can be large
            if "image_vector" in results.columns:
                results = results.drop(columns=["image_vector"])
            res = results.to_dict()
        return res
    except Exception as e:
        logger.error(f"Error in search_document_celeb_context: {str(e)}")
        # Fall back to regular vector search
        return search_document_vector(vector)
def lambda_handler(event, context):
    try:
        if "ImageText" in event:
            logger.info("Processing ImageText event")
            metad = (
                event["ImageText"]["Sentence_labels"]
                + " "
                + event["ImageText"]["Sentence_people"]
            )
            vector = event["ImageText"]["Image_embedding"]
            os_document = {
                "image_path": f"s3://{event['detail']['bucket']['name']}/{event['detail']['object']['key']}",
                "image_words": event["ImageText"]["Sentence_labels"],
                "celebrities": event["ImageText"]["Sentence_people"],
                "image_vector": vector,
            }
            index_document(os_document)
            return {"Vector": vector}
        elif event.get("httpMethod") == "POST" and "body" in event:
            logger.info("Processing search API request")
            body = event["body"]
            if len(body) > 20000:
                logger.info("## Text too long")
                return {
                    "statusCode": 400,
                    "headers": {
                        "Access-Control-Allow-Origin": "*",
                        "Access-Control-Allow-Headers": "*",
                        "Access-Control-Allow-Methods": "*",
                    },
                    "body": json.dumps({"error": "Text too long"}),
                }
            vector = get_vector_titan(body)
            os_results = search_document_vector(vector, body)
            # search_document_vector returns a DataFrame converted to a dict of
            # {column: {row_index: value}}; pivot it into a list of row dicts
            if os_results and os_results.get(list(os_results.keys())[0], {}):
                results = [
                    {k: os_results[k][n] for k in os_results.keys()}
                    for n in os_results[list(os_results.keys())[0]].keys()
                ]
            else:
                results = []
            # Summarize the stored image analysis for each result at retrieval time
            for item in results:
                image_words = item.get("image_words", "")
                celebrities = item.get("celebrities", "").strip() or None
                item["image_words"] = summarise_article_titan(image_words, celebrities)
            logger.info(f"Returning {len(results)} search results")
            return {
                "statusCode": 200,
                "headers": {
                    "Access-Control-Allow-Origin": "*",
                    "Access-Control-Allow-Headers": "*",
                    "Access-Control-Allow-Methods": "*",
                },
                "body": json.dumps({"results": results}),
            }
        else:
            logger.error("Invalid event structure")
            return {
                "statusCode": 400,
                "headers": {
                    "Access-Control-Allow-Origin": "*",
                    "Access-Control-Allow-Headers": "*",
                    "Access-Control-Allow-Methods": "*",
                },
                "body": json.dumps({"error": "Invalid request format"}),
            }
    except Exception as e:
        logger.error(f"Error in lambda_handler: {str(e)}")
        return {
            "statusCode": 500,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Headers": "*",
                "Access-Control-Allow-Methods": "*",
            },
            "body": json.dumps({"error": f"Internal server error: {str(e)}"}),
        }
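To exercise the search path end to end, you can POST a plain-text query to the API Gateway endpoint that fronts this Lambda. A minimal sketch with the requests library; the URL is a placeholder:

import requests

# Hypothetical endpoint; substitute the URL from your deployment
SEARCH_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/search"

resp = requests.post(SEARCH_URL, data="interesting dunk in the final seconds")
for hit in resp.json()["results"]:
    print(hit["image_path"], "-", hit["image_words"])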
Understanding Hybrid Search
Hybrid search in OpenSearch combines:
Vector search - Finding semantically similar content using vector embeddings (KNN search)
Text search - Traditional keyword-based search with text matching
This powerful combination allows users to find relevant images even when their search terms don't exactly match the indexed metadata. For example, searching for "amazing dunk" might return spectacular slam dunks even if those exact words weren't in the original description.
The weights (knn_weight and text_weight) allow tuning the relevance formula to balance semantic similarity with keyword matching.
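Because both clauses sit in a bool/should query, OpenSearch sums their boosted scores, so shifting the weights changes which signal dominates. For example, to favor exact wording over visual similarity, you could call the search function above like this:

# Favor keyword matching over semantic similarity for this query
results = search_document_vector(
    vector,
    text_query="alley-oop dunk",
    knn_weight=0.3,
    text_weight=0.7,
    k=5,
)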
Use Case: Basketball Game Analysis
This enhanced solution is particularly valuable for basketball game analysis. Analysts can:
Search for specific plays like "LeBron James dunk" or "three-point shot in final seconds"
Find visually similar game situations without knowing the specific keywords
Identify exciting moments in a game based on contextual analysis scores
Click on a search result to view the actual footage of that moment
For example, a coach might search for "defensive stance against pick and roll" and find relevant examples across multiple games, even if those specific terms weren't in the original metadata.
Results and Improvements
The enhanced system provides several key improvements:
Richer context - Detailed descriptions of basketball plays beyond simple object recognition
Excitement assessment - Analysis of how interesting a moment is based on specific criteria
Hybrid search - Combination of semantic and keyword search for more relevant results
Sample search results might include:
"LeBron James executing a powerful slam dunk over two defenders, showing exceptional athleticism with a score of 8/10 for excitement."
"Stephen Curry hitting a long-range three-pointer in the final seconds of the game, turning the momentum with high pressure shot execution rated 9/10."
Here is how the search looks after implementing the changes above:
Conclusion
While the original AWS solution provided an excellent foundation for semantic image search, the enhancements with Amazon Bedrock and hybrid search capabilities in OpenSearch take sports analytics to the next level. This approach enables coaches, analysts, and fans to find meaningful moments in basketball games beyond what traditional object recognition could provide.
The ability to identify and retrieve specific game situations based on rich contextual understanding opens up new possibilities for game analysis, highlight creation, and player development.
Credit goes to the original AWS blog for laying the groundwork with their semantic image search solution. Their approach was state-of-the-art at the time of release, but the rapidly evolving AI landscape now allows us to build even more powerful tools by combining multiple AWS AI services.
Sports fans can also use this feature to gain insights from a game after the fact. For example, users can search for "interesting dunk" and immediately get relevant images. The solution can be extended further so that clicking an image in the search results plays a video clip spanning the 10 seconds before and after the moment described in the image.
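As a trivial sketch of that extension, given the timestamp stored with a frame, the player would request a window around it; the timestamp source here is hypothetical:

# Hypothetical helper: compute a playback window around a frame timestamp
def clip_window(frame_ts_seconds, pad_seconds=10):
    start = max(0, frame_ts_seconds - pad_seconds)
    return start, frame_ts_seconds + pad_seconds

start, end = clip_window(1342)  # e.g. a frame captured 22:22 into the game feed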
Note: This implementation can be extended to other sports or video content analysis use cases where finding specific moments based on rich contextual understanding is valuable.