This notebook uses the latest YOLO11 model (at the moment of writing) to identify sign language. It is trained on a custom dataset created by the author. The classes we will identify are: ['food', 'hello', 'love', 'me', 'mother', 'no', 'please', 'thankyou', 'yes', 'you', 'your']
First of all, we check which CUDA version is installed.
!nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
We need it to identify the proper dependencies to install. Once you know your CUDA version, go to https://pytorch.org/get-started/locally/
and check for the proper dependencies. I didn't see a 12.5 option, but assumed that the 12.4 build would probably also work.
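For reference, the selector on that page generates a pip command along these lines (a sketch only; the exact index URL depends on the CUDA build you pick, so verify it on the site):
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124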
In this notebook, installing the default builds together with Ultralytics worked:
!pip3 install --upgrade torch torchvision torchaudio ultralytics
Now we need to download the base model from https://github.com/ultralytics/ultralytics. There are several model sizes (n, s, m, l, and x), and this example uses the largest one, YOLO11x.
!wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt
The next command may NOT be necessary. But when I ran the training for the first time, the process failed with an error about a missing C library. Running this command fixed the problem.
!sudo apt-get install -y libgl1-mesa-glx
Next, you need to create the dataset. I used the Roboflow annotation tool - https://roboflow.com/ (free sign-up required). You can actually use any annotation tool; the main reason I used Roboflow is that it lets you download the dataset in the format that YOLO expects. It created the training, validation, and test folders with all the relevant metadata.
"love" sign
Your folders should look like this after you extract the files from Roboflow and download the YOLO model.
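Roboflow also generates the data.yaml file that the train() call below points to. A minimal sketch of what it contains (the exact paths come from your export; the class list matches the one above):
train: train/images
val: valid/images
test: test/images
nc: 11
names: ['food', 'hello', 'love', 'me', 'mother', 'no', 'please', 'thankyou', 'yes', 'you', 'your']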
The code to train is very simple. Note that we train on a GPU, and memory consumption is around 17GB, so you need to choose a suitable instance for training. I used ml.g5.xlarge, but you may try a different one.
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11x.pt")
# Train the model (I used ml.g5.xlarge instance)
train_results = model.train(
    data="data.yaml",  # path to dataset YAML
    epochs=100,  # number of training epochs
    imgsz=640,  # training image size
    device=0,  # device to run on, i.e. device=0 or device=0,1,2,3 or device=cpu
)
(I cut some of the output.)
Creating new Ultralytics Settings v0.0.6 file ✅
View Ultralytics Settings with 'yolo settings' or at '/home/sagemaker-user/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
Ultralytics 8.3.34 🚀 Python-3.11.10 torch-2.5.1+cu124 CUDA:0 (NVIDIA A10G, 22503MiB)
engine/trainer: task=detect, mode=train, model=yolo11x.pt, data=data.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://ultralytics.com/assets/Arial.ttf to '/home/sagemaker-user/.config/Ultralytics/Arial.ttf'...
.
.
.
.
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/100 17.2G 1.674 4.297 1.922 8 640: 100%|██████████| 29/29 [00:15<00:00, 1.83it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:02<00:00, 1.94it/s]
all 137 131 0.666 0.291 0.373 0.213
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/100 17.1G 1.475 2.712 1.738 9 640: 100%|██████████| 29/29 [00:15<00:00, 1.92it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.74it/s]
all 137 131 0.599 0.337 0.297 0.141
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/100 17.3G 1.462 2.377 1.689 13 640: 100%|██████████| 29/29 [00:14<00:00, 1.95it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.84it/s]
all 137 131 0.277 0.137 0.0701 0.0196
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/100 17.3G 1.517 2.154 1.682 7 640: 100%|██████████| 29/29 [00:14<00:00, 1.95it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.92it/s]
all 137 131 0.388 0.133 0.121 0.0482
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/100 17.1G 1.413 1.692 1.596 17 640: 100%|██████████| 29/29 [00:14<00:00, 1.95it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.87it/s]
all 137 131 0.326 0.449 0.432 0.252
.
.
.
.
.
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
98/100 17.3G 0.7151 0.328 1.152 8 640: 100%|██████████| 29/29 [00:14<00:00, 1.96it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.99it/s]
all 137 131 0.977 0.996 0.995 0.745
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
99/100 17.3G 0.7023 0.3214 1.142 7 640: 100%|██████████| 29/29 [00:14<00:00, 1.96it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.99it/s]
all 137 131 0.977 0.997 0.995 0.753
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
100/100 17.3G 0.6968 0.3162 1.135 7 640: 100%|██████████| 29/29 [00:14<00:00, 1.96it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.99it/s]
all 137 131 0.976 0.997 0.995 0.754
100 epochs completed in 0.565 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 114.4MB
Optimizer stripped from runs/detect/train/weights/best.pt, 114.4MB
Validating runs/detect/train/weights/best.pt...
Ultralytics 8.3.34 🚀 Python-3.11.10 torch-2.5.1+cu124 CUDA:0 (NVIDIA A10G, 22503MiB)
YOLO11x summary (fused): 464 layers, 56,839,729 parameters, 0 gradients, 194.5 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 5/5 [00:01<00:00, 2.94it/s]
all 137 131 0.98 0.993 0.995 0.775
food 7 7 0.992 1 0.995 0.822
hello 16 16 0.992 1 0.995 0.739
love 20 20 0.989 1 0.995 0.802
me 11 11 1 0.922 0.995 0.73
mother 28 28 0.997 1 0.995 0.776
no 9 9 0.982 1 0.995 0.779
please 4 4 0.97 1 0.995 0.798
thankyou 9 9 0.894 1 0.995 0.768
yes 4 4 0.975 1 0.995 0.67
you 18 18 0.992 1 0.995 0.817
your 5 5 1 1 0.995 0.822
Speed: 0.2ms preprocess, 9.4ms inference, 0.0ms loss, 0.8ms postprocess per image
Results saved to runs/detect/train
MLflow: results logged to runs/mlflow
MLflow: disable with 'yolo settings mlflow=False'
You can find the best model (best.pt) in the runs/detect/train/weights directory; its path appears at the end of the training output. I renamed it to yolo11x_custom.pt
For deployment to SageMaker I created the following folder structure (see the screenshot).
We will discuss later why we need it.
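In plain text, the layout looks roughly like this (reconstructed from the deployment steps described below; the file names are the ones used in this post):
deploy/
├── code/
│   ├── inference.py
│   └── requirements.txt
└── yolo11x_custom.pt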
# Now we load the trained model
from ultralytics import YOLO
model_custom = YOLO("deploy/yolo11x_custom.pt")
model_custom.info()
YOLO11x summary: 631 layers, 56,886,481 parameters, 0 gradients, 195.5 GFLOPs
(631, 56886481, 0, 195.5136)
# We will use one of the validation images (it was not used for training) to check the results.
# Your validation image will probably have a different name.
# Note the "save" parameter.
VALIDATION_IMAGE='valid/images/me000040_jpg.rf.c17bb94aafb9977099e35ec30076fbb5.jpg'
response = model_custom.predict(source=VALIDATION_IMAGE, show=False, save=True, conf=0.25, line_width=1)
response
Since we set save=True, we can find the output image in this directory (in your case it will be different).
And here is the output, which looks good. Now we want to extract the textual classification of the image ("me").
# This is generic code to read the output from the model.
# We focused on object detection, but the model can also do other things,
# like semantic segmentation.
import json

infer = {}
for result in response:
    if result.boxes:
        infer['boxes'] = result.boxes.cpu().numpy().data.tolist()
    if result.masks:
        infer['masks'] = result.masks.cpu().numpy().data.tolist()
    if result.probs:
        infer['probs'] = result.probs.cpu().numpy().data.tolist()
j_result = json.dumps(infer)

# To get the predicted class, run the following code
predictions = json.loads(j_result)
model_custom.names.get(predictions["boxes"][0][-1])
Output:
'me'
Hosting on Amazon SageMaker
In order to host the model on SageMaker, you need to perform very specific steps. They are described here: https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html (check the Model Directory Structure section).
But in short, you need to have the inference.py and requirements.txt files in a code directory, and the model file next to (not inside) the code directory. You can find both files in the deploy/code directory in this GitHub repository. The code folder and the model should be zipped into a model.tar.gz file and uploaded to S3.
To create the gz file, run "tar -czvf model.tar.gz code/ yolo11x_custom.pt" from the deploy directory.
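The repository contains the actual files; the following is only a sketch of what inference.py has to implement, assuming the standard SageMaker PyTorch handler functions (model_fn, input_fn, predict_fn, output_fn). The handler names and the container contract are real; the body is an illustration, not the exact file from the repo. requirements.txt would need at least ultralytics and opencv-python-headless.
# Sketch of deploy/code/inference.py (an illustration, not the repo's exact file).
# The SageMaker PyTorch container calls these four handler functions.
import os
import json
import numpy as np
import cv2
from ultralytics import YOLO

def model_fn(model_dir):
    # YOLOV11_MODEL is the env variable we set on the PyTorchModel below
    return YOLO(os.path.join(model_dir, os.environ['YOLOV11_MODEL']))

def input_fn(request_body, request_content_type):
    # The client sends raw JPEG bytes (ContentType='application/x-image');
    # decode them back into an image array.
    if request_content_type == 'application/x-image':
        img_array = np.frombuffer(request_body, dtype=np.uint8)
        return cv2.imdecode(img_array, cv2.IMREAD_COLOR)
    raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(input_data, model):
    return model.predict(source=input_data, conf=0.25)

def output_fn(prediction, response_content_type):
    # Same extraction logic we used locally: boxes/masks/probs as lists
    infer = {}
    for result in prediction:
        if result.boxes:
            infer['boxes'] = result.boxes.cpu().numpy().data.tolist()
        if result.masks:
            infer['masks'] = result.masks.cpu().numpy().data.tolist()
        if result.probs:
            infer['probs'] = result.probs.cpu().numpy().data.tolist()
    return json.dumps(infer)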
# Now upload the model to S3. You need to make sure that you have
# proper permissions to work with the S3 bucket.
from sagemaker import s3

# Change to your bucket
bucket = "s3://YOUR_BUCKET_HERE"
prefix = "yolov11/demo-custom-endpoint"
model_data = s3.S3Uploader.upload("deploy/model.tar.gz", bucket + "/" + prefix)
model_data
Output
's3://YOUR_BUCKET_HERE/yolov11/demo-custom-endpoint/model.tar.gz'
Create model metadata.
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
import sagemaker
model_name = 'yolo11x_custom.pt'
role = get_execution_role()
session = sagemaker.Session()
# I used a prebuilt SageMaker image with PyTorch and GPU support.
# You can find the relevant image here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
# Note that the one I use is for the "us-east-1" region.
model = PyTorchModel(entry_point='inference.py',
                     model_data=model_data,
                     image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker',
                     role=role,
                     env={'TS_MAX_RESPONSE_SIZE': '20000000', 'YOLOV11_MODEL': model_name},
                     sagemaker_session=session)
Deploy the model to SageMaker endpoint
from sagemaker.deserializers import JSONDeserializer
# I chose a g5 instance since it has an A10G GPU card.
# A less powerful instance might also work, but you have to check that yourself :).
INSTANCE_TYPE = 'ml.g5.2xlarge'
ENDPOINT_NAME = 'yolov11-custom-sign-language'
predictor = model.deploy(initial_instance_count=1,
                         instance_type=INSTANCE_TYPE,
                         deserializer=JSONDeserializer(),
                         endpoint_name=ENDPOINT_NAME)
Run inference with a test image
import boto3
import numpy as np
import json
import io
import cv2
# By default, the endpoint input serializer expects bytes, so we convert the image to bytes.
# Inside the inference.py file we convert it back to an image object.
# Load the image. It is the same one we used when calling "predict" directly on the model_custom object.
image_path = VALIDATION_IMAGE  # Change this to your image file
image = cv2.imread(image_path)
_, img_bytes = cv2.imencode('.jpg', image)
# Convert the memory buffer to bytes
image_bytes = img_bytes.tobytes()
# Create a SageMaker runtime client
runtime_client = boto3.client('sagemaker-runtime')
# Invoke the endpoint
response = runtime_client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/x-image',
    Body=image_bytes
)
# Process the response
result = response['Body'].read()
predictions = json.loads(result)
print(predictions)
{'boxes': [[280.300048828125, 333.251708984375, 345.956787109375, 436.8214111328125, 0.8167926073074341, 3.0]]}
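# Each box is [x1, y1, x2, y2, confidence, class_id]; the last element
# is the class id, which model_custom.names maps back to a label.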
model_custom.names.get(predictions["boxes"][0][-1])
Output:
'me'
This is exactly what we expected to get!
Delete all the resources to save cost.
Get the endpoint metadata.
#Cleanup
import boto3
sm_client = boto3.client(service_name="sagemaker")
response = sm_client.describe_endpoint_config(EndpointConfigName=ENDPOINT_NAME)
print(response)
Output
{'EndpointConfigName': 'yolov11-custom-sign-language', 'EndpointConfigArn': 'arn:aws:sagemaker:us-east-1:346399954218:endpoint-config/yolov11-custom-sign-language', 'ProductionVariants': [{'VariantName': 'AllTraffic', 'ModelName': 'pytorch-inference-2024-11-20-12-16-14-001', 'InitialInstanceCount': 1, 'InstanceType': 'ml.g5.xlarge', 'InitialVariantWeight': 1.0}], 'CreationTime': datetime.datetime(2024, 11, 20, 12, 16, 15, 113000, tzinfo=tzlocal()), 'EnableNetworkIsolation': False, 'ResponseMetadata': {'RequestId': 'f9970e96-8722-42cb-a121-db251a410fca', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f9970e96-8722-42cb-a121-db251a410fca', 'content-type': 'application/x-amz-json-1.1', 'content-length': '414', 'date': 'Wed, 20 Nov 2024 12:47:09 GMT'}, 'RetryAttempts': 0}}
Delete endpoint.
endpoint_config_name = response['EndpointConfigName']
# Delete Endpoint
sm_client.delete_endpoint(EndpointName=ENDPOINT_NAME)
Output:
{'ResponseMetadata': {'RequestId': '84edae87-657c-40cf-834d-6cbf1fa8a2af',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amzn-requestid': '84edae87-657c-40cf-834d-6cbf1fa8a2af',
'content-type': 'application/x-amz-json-1.1',
'date': 'Wed, 20 Nov 2024 12:47:12 GMT',
'content-length': '0'},
'RetryAttempts': 0}}
Delete the endpoint configuration. This is only logical metadata, but we will remove it anyway.
# Delete Endpoint Configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
Output:
{'ResponseMetadata': {'RequestId': '243fa8e3-d522-4812-a9b8-c987efe499c5',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amzn-requestid': '243fa8e3-d522-4812-a9b8-c987efe499c5',
'content-type': 'application/x-amz-json-1.1',
'date': 'Wed, 20 Nov 2024 12:47:16 GMT',
'content-length': '0'},
'RetryAttempts': 0}}
Now delete the model resource itself.
# Delete Model
for prod_var in response['ProductionVariants']:
    model_name = prod_var['ModelName']
    sm_client.delete_model(ModelName=model_name)
The bottom line: training YOLO is very easy. Deploying to Amazon SageMaker is a little more complicated, because you actually need to write the code to serialize and deserialize the image.