Sunday, 24 November 2024

Customize YOLO11 for sign language object detection and deploy to SageMaker Endpoint

The commands in this blog should be executed in a Jupyter notebook. The full notebook can be found here

This notebook uses the latest YOLO11 model (at the time of writing) to detect sign language gestures. It is trained on a custom dataset created by the author. The classes we will identify are: ['hello', 'love', 'me', 'mother', 'no', 'please', 'thankyou', 'yes', 'you', 'your']

First, we check which version of CUDA is installed.

!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0

We need it to identify the proper dependencies to install. Once you know the CUDA version, go to https://pytorch.org/get-started/locally/ and check for the matching install command. I didn't see a 12.5 option, but assumed that the 12.4 build would probably also work.

[Screenshot: PyTorch "Get Started" install selector showing the CUDA 12.4 option]

Next, we install the dependencies:

!pip3 install --upgrade torch torchvision torchaudio ultralytics
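
As an optional sanity check (not in the original notebook), you can verify that the installed PyTorch build actually sees the GPU and was built against a compatible CUDA version:

import torch

print(torch.__version__)          # e.g. 2.5.1+cu124
print(torch.version.cuda)         # CUDA version the build was compiled against
print(torch.cuda.is_available())  # should be True on a GPU instance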

Now we need to download the base model from https://github.com/ultralytics/ultralytics. There are several model sizes, and this example uses the largest one - YOLO11x.

[Screenshot: YOLO11 model sizes table from the Ultralytics repository]


!wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt

The next command may NOT be necessary. But the first time I ran training, the process failed with a missing C library, and running this command fixed the problem.

!sudo apt-get install -y libgl1-mesa-glx

Next, you need to create the dataset. I used the Roboflow annotation tool (https://roboflow.com/, free sign-up required). You can actually use any annotation tool; the main reason I used Roboflow is that it lets you download the dataset in the format YOLO expects. It created the train, validation, and test folders with all the relevant metadata.

[Screenshot: the "love" sign annotated in Roboflow]

[Screenshots: additional annotation examples in Roboflow]


Your folders should look like this after you extract the Roboflow files and download the YOLO model:

[Screenshot: dataset and model folder structure]
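
The Roboflow export also includes the data.yaml file that the training call below points at. It looks roughly like this (the paths and class names come from your own export, so treat this as an illustrative sketch):

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 10
names: ['hello', 'love', 'me', 'mother', 'no', 'please', 'thankyou', 'yes', 'you', 'your']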


The training code is very simple. Note that we train on a GPU, and memory consumption is around 17 GB, so you need to choose an appropriate instance for training. I used ml.g5.xlarge, but you may try a different one.


from ultralytics import YOLO

# Load a model
model = YOLO("yolo11x.pt")

# Train the model (I used ml.g5.xlarge instance)

train_results = model.train(
    data="data.yaml",  # path to dataset YAML
    epochs=100,  # number of training epochs
    imgsz=640,  # training image size
    device=0,  # device to run on, i.e. device=0 or device=0,1,2,3 or device=cpu
)
(some output has been cut)
Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/home/sagemaker-user/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
Ultralytics 8.3.34 🚀 Python-3.11.10 torch-2.5.1+cu124 CUDA:0 (NVIDIA A10G, 22503MiB)
engine/trainer: task=detect, mode=train, model=yolo11x.pt, data=data.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://ultralytics.com/assets/Arial.ttf to '/home/sagemaker-user/.config/Ultralytics/Arial.ttf'...

.
.
.
.
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

      1/100      17.2G      1.674      4.297      1.922          8        640: 100%|██████████| 29/29 [00:15<00:00,  1.83it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:02<00:00,  1.94it/s]
                   all        137        131      0.666      0.291      0.373      0.213


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

      2/100      17.1G      1.475      2.712      1.738          9        640: 100%|██████████| 29/29 [00:15<00:00,  1.92it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.74it/s]
                   all        137        131      0.599      0.337      0.297      0.141


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

      3/100      17.3G      1.462      2.377      1.689         13        640: 100%|██████████| 29/29 [00:14<00:00,  1.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.84it/s]
                   all        137        131      0.277      0.137     0.0701     0.0196


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

      4/100      17.3G      1.517      2.154      1.682          7        640: 100%|██████████| 29/29 [00:14<00:00,  1.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.92it/s]
                   all        137        131      0.388      0.133      0.121     0.0482


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

      5/100      17.1G      1.413      1.692      1.596         17        640: 100%|██████████| 29/29 [00:14<00:00,  1.95it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.87it/s]
                   all        137        131      0.326      0.449      0.432      0.252
.
.
.
.
.

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

     98/100      17.3G     0.7151      0.328      1.152          8        640: 100%|██████████| 29/29 [00:14<00:00,  1.96it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.99it/s]
                   all        137        131      0.977      0.996      0.995      0.745


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

     99/100      17.3G     0.7023     0.3214      1.142          7        640: 100%|██████████| 29/29 [00:14<00:00,  1.96it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.99it/s]
                   all        137        131      0.977      0.997      0.995      0.753


      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size

    100/100      17.3G     0.6968     0.3162      1.135          7        640: 100%|██████████| 29/29 [00:14<00:00,  1.96it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.99it/s]
                   all        137        131      0.976      0.997      0.995      0.754


100 epochs completed in 0.565 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 114.4MB
Optimizer stripped from runs/detect/train/weights/best.pt, 114.4MB

Validating runs/detect/train/weights/best.pt...
Ultralytics 8.3.34 🚀 Python-3.11.10 torch-2.5.1+cu124 CUDA:0 (NVIDIA A10G, 22503MiB)
YOLO11x summary (fused): 464 layers, 56,839,729 parameters, 0 gradients, 194.5 GFLOPs

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  2.94it/s]

                   all        137        131       0.98      0.993      0.995      0.775
                  food          7          7      0.992          1      0.995      0.822
                 hello         16         16      0.992          1      0.995      0.739
                  love         20         20      0.989          1      0.995      0.802
                    me         11         11          1      0.922      0.995       0.73
                mother         28         28      0.997          1      0.995      0.776
                    no          9          9      0.982          1      0.995      0.779
                please          4          4       0.97          1      0.995      0.798
              thankyou          9          9      0.894          1      0.995      0.768
                   yes          4          4      0.975          1      0.995       0.67
                   you         18         18      0.992          1      0.995      0.817
                  your          5          5          1          1      0.995      0.822
Speed: 0.2ms preprocess, 9.4ms inference, 0.0ms loss, 0.8ms postprocess per image
Results saved to runs/detect/train
MLflow: results logged to runs/mlflow
MLflow: disable with 'yolo settings mlflow=False'


You can find the best model in the runs/detect/train/weights directory; it appears at the end of the training process. I renamed it to yolo11x_custom.pt.


[Screenshot: the runs/detect/train/weights directory containing best.pt]
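
The copy/rename itself can be done with something like this (assuming you have already created the deploy directory described in the next step):

!cp runs/detect/train/weights/best.pt deploy/yolo11x_custom.pt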


For deployment to SageMaker, I created the following folder structure (see the screenshot).

[Screenshot: the deploy folder structure]

We will discuss later why we need it.
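
In text form, the layout looks roughly like this (the file names are the ones used in the deployment section below):

deploy/
├── code/
│   ├── inference.py
│   └── requirements.txt
└── yolo11x_custom.pt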


# Now we load the trained model
from ultralytics import YOLO
model_custom = YOLO("deploy/yolo11x_custom.pt")
model_custom.info()

YOLO11x summary: 631 layers, 56,886,481 parameters, 0 gradients, 195.5 GFLOPs
(631, 56886481, 0, 195.5136)

# We will use one of the validation images (not used for training) to check the results.
# Your validation image will probably have a different name.
# Note the "save" parameter.
VALIDATION_IMAGE='valid/images/me000040_jpg.rf.c17bb94aafb9977099e35ec30076fbb5.jpg'

response = model_custom.predict(source=VALIDATION_IMAGE,show=False,save=True,conf=0.25,line_width=1)
response

Since we set "save=True", we can find the output image in this directory (in your case the path will be different):

[Screenshot: the directory with the saved output image]

And here is the output, which looks good. Now we want to extract the textual classification of the image ("me").

[Screenshot: the detected "me" sign with its bounding box]


# This is generic code to read the output from the model.
# We focused on object detection, but the model can also do other things,
# such as semantic segmentation.
import json

infer = {}
for result in response:
    if result.boxes:
        infer['boxes'] = result.boxes.cpu().numpy().data.tolist()
    if result.masks:
        infer['masks'] = result.masks.cpu().numpy().data.tolist()
    if result.probs:
        infer['probs'] = result.probs.cpu().numpy().data.tolist()
j_result=json.dumps(infer)



# To get the predicted class, run the following code
predictions = json.loads(j_result)
model_custom.names.get(predictions["boxes"][0][-1])
       

Output:

'me'
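
Each entry in "boxes" follows the Ultralytics layout [x1, y1, x2, y2, confidence, class_id] (you can also see this in the raw endpoint response later in the post), so you can pull out the confidence score the same way:

# Each box is [x1, y1, x2, y2, confidence, class_id]
box = predictions["boxes"][0]
print("class:", model_custom.names.get(int(box[-1])))
print("confidence:", round(box[-2], 3))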

Hosting on Amazon SageMaker


In order to host the model on SageMaker, you need to follow some very specific steps. They are described here: https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html (see the Model Directory Structure section).

In short, you need the inference.py and requirements.txt files in a code directory, and the model file in parallel to the code directory. You can find both files in the deploy/code directory in this GitHub repository. The code folder and the model should be packaged into a model.tar.gz file and uploaded to S3.

To create the archive, run "tar -czvf model.tar.gz code/ yolo11x_custom.pt" from the deploy directory.
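
For reference, here is a minimal sketch of what inference.py needs to do. The function names (model_fn, input_fn, predict_fn, output_fn) are the standard SageMaker PyTorch serving hooks; the actual file in the repository may differ in its details, and requirements.txt would list at least ultralytics (and typically opencv-python-headless).

import json
import os

import cv2
import numpy as np
from ultralytics import YOLO


def model_fn(model_dir):
    # SageMaker extracts model.tar.gz into model_dir; the weight file name
    # is passed via the YOLOV11_MODEL environment variable (set on the PyTorchModel below)
    weights = os.path.join(model_dir, os.environ.get("YOLOV11_MODEL", "yolo11x_custom.pt"))
    return YOLO(weights)


def input_fn(request_body, content_type):
    # The client sends raw JPEG bytes with ContentType='application/x-image';
    # decode them back into an image array
    if content_type == "application/x-image":
        return cv2.imdecode(np.frombuffer(request_body, np.uint8), cv2.IMREAD_COLOR)
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(input_data, model):
    # Run the YOLO model on the decoded image
    return model.predict(source=input_data, conf=0.25)


def output_fn(prediction, accept):
    # Serialize the detections to JSON, the same way we did locally above
    infer = {}
    for result in prediction:
        if result.boxes is not None:
            infer["boxes"] = result.boxes.cpu().numpy().data.tolist()
    return json.dumps(infer)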


# Now upload the model to S3. Make sure you have the proper permissions for the S3 bucket.
from sagemaker import s3
# Change to your bucket
bucket = "s3://YOUR_BUCKET_HERE"
prefix = "yolov11/demo-custom-endpoint"
model_data = s3.S3Uploader.upload("deploy/model.tar.gz", bucket + "/" + prefix)
model_data

Output

's3://YOUR_BUCKET_HERE/yolov11/demo-custom-endpoint/model.tar.gz'

Create model metadata.

from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
import sagemaker

model_name = 'yolo11x_custom.pt'

role = get_execution_role()
session = sagemaker.Session()

# I used a prebuilt SageMaker image with PyTorch and GPU support.
# You can find the relevant image here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
# Note that the one I use is for the "us-east-1" region.

model = PyTorchModel(entry_point='inference.py',
                     model_data=model_data,
                     image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker',
                     role=role,
                     env={'TS_MAX_RESPONSE_SIZE':'20000000', 'YOLOV11_MODEL': model_name},
                     sagemaker_session=session)

Deploy the model to a SageMaker endpoint

from sagemaker.deserializers import JSONDeserializer

# I chose a g5 instance since it has an A10G GPU card.
# A less powerful instance might also work, but you will have to check that yourself :).

INSTANCE_TYPE = 'ml.g5.2xlarge'
ENDPOINT_NAME = 'yolov11-custom-sign-language'

predictor = model.deploy(initial_instance_count=1,
                         instance_type=INSTANCE_TYPE,
                         deserializer=JSONDeserializer(),
                         endpoint_name=ENDPOINT_NAME)
Run inference with a test image
import boto3
import numpy as np
import json
import io
import cv2

# By default, the endpoint expects bytes as input, so we convert the image to bytes.
# Inside the inference.py file we convert it back to an image object.

# Load the image. It is the same one we used by directly calling "predict" on the model_custom object

image_path = VALIDATION_IMAGE  # Change this to your image file
image = cv2.imread(image_path)
_, img_bytes = cv2.imencode('.jpg', image)

# Convert the memory buffer to bytes
image_bytes = img_bytes.tobytes()

# Create a SageMaker runtime client
runtime_client = boto3.client('sagemaker-runtime')

# Invoke the endpoint
response = runtime_client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/x-image',
    Body=image_bytes
)

# Process the response
result = response['Body'].read()
predictions = json.loads(result)
print(predictions)

{'boxes': [[280.300048828125, 333.251708984375, 345.956787109375, 436.8214111328125, 0.8167926073074341, 3.0]]}

model_custom.names.get(predictions["boxes"][0][-1])

Output:

'me'
This is exactly what we expected to get!!!

Delete all the resources to save cost.

Get endpoint metadata.

#Cleanup
import boto3

sm_client = boto3.client(service_name="sagemaker")

response = sm_client.describe_endpoint_config(EndpointConfigName=ENDPOINT_NAME)
print(response)

Output

{'EndpointConfigName': 'yolov11-custom-sign-language', 'EndpointConfigArn': 'arn:aws:sagemaker:us-east-1:346399954218:endpoint-config/yolov11-custom-sign-language', 'ProductionVariants': [{'VariantName': 'AllTraffic', 'ModelName': 'pytorch-inference-2024-11-20-12-16-14-001', 'InitialInstanceCount': 1, 'InstanceType': 'ml.g5.xlarge', 'InitialVariantWeight': 1.0}], 'CreationTime': datetime.datetime(2024, 11, 20, 12, 16, 15, 113000, tzinfo=tzlocal()), 'EnableNetworkIsolation': False, 'ResponseMetadata': {'RequestId': 'f9970e96-8722-42cb-a121-db251a410fca', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f9970e96-8722-42cb-a121-db251a410fca', 'content-type': 'application/x-amz-json-1.1', 'content-length': '414', 'date': 'Wed, 20 Nov 2024 12:47:09 GMT'}, 'RetryAttempts': 0}}

Delete endpoint.

endpoint_config_name = response['EndpointConfigName']

# Delete Endpoint
sm_client.delete_endpoint(EndpointName=ENDPOINT_NAME)

Output:

{'ResponseMetadata': {'RequestId': '84edae87-657c-40cf-834d-6cbf1fa8a2af',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '84edae87-657c-40cf-834d-6cbf1fa8a2af',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 20 Nov 2024 12:47:12 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}

Delete the endpoint configuration. This is only logical metadata, but we will remove it anyway.

# Delete Endpoint Configuration
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)

Output:

{'ResponseMetadata': {'RequestId': '243fa8e3-d522-4812-a9b8-c987efe499c5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '243fa8e3-d522-4812-a9b8-c987efe499c5',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 20 Nov 2024 12:47:16 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}

Now delete the SageMaker model resource.

# Delete Model
for prod_var in response['ProductionVariants']:
    model_name = prod_var['ModelName']
    sm_client.delete_model(ModelName=model_name)

The bottom line: training YOLO is very easy. Deploying to Amazon SageMaker is a bit more involved, because you actually need to write the code to serialize and deserialize the image.