Mike's place: December 2022

We use PCs, laptops and smartphones to visit our favorite web sites. The technology is smart enough to present a different layout for different devices. I have often noticed that when I enter a URL like mysite.com from my smartphone, I am eventually redirected to m.mysite.com.

Clearly, something identifies that my device is not a desktop and redirects me to another location. But why should I hit the web server with a request that does nothing but redirect? It consumes web server compute power and it consumes traffic. Is there a way to do this redirection at an earlier stage?

I will show you how to do it on the edge by using AWS WAF. It is normally used for layer 7 protection, but it has specific features that can be used to achieve our goal.

First of all, I will create a simple web site that prints the "User-Agent" header. The simplest way to do it is with Lambda and API Gateway. I chose API Gateway because it integrates with AWS WAF.

My Lambda code is very simple:

export const handler = async(event) => {
    // Echo the caller's User-Agent header inside a page that loads the AWS WAF challenge script
    const response = {
        statusCode: 200,
        body: '<html><script type="text/javascript" src="https://74f7043f5910.us-east-1.sdk.awswaf.com/74f7043f5910/c3757efb1817/challenge.js" defer></script>'+JSON.stringify(event.headers["User-Agent"])+'</html>',
        headers: {
            "Content-Type": "text/html"
        }
    };
    return response;
};

If I access the API Gateway URL from my desktop, I see the following output:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"

and if I access it from my smartphone, I get:

"Mozilla/5.0 (Linux; Android 12; M2011K2G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"

The User-Agent header clearly identifies the source device.
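To make the difference concrete, here is a minimal Python sketch of the same check that the WAF rule will perform later: a plain, case-sensitive "contains Android" test on the header value. The function name `is_android` is hypothetical and only for illustration; the two strings are the ones shown above.

```python
def is_android(user_agent: str) -> bool:
    """Return True when the User-Agent value contains the substring 'Android'."""
    return "Android" in user_agent


# The two User-Agent values captured above.
DESKTOP_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
)
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 12; M2011K2G) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
)

print("desktop:", is_android(DESKTOP_UA))
print("mobile:", is_android(MOBILE_UA))
```

A substring test like this is deliberately naive: it only recognizes Android devices, which is all this demo needs.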

I also created another Lambda. It simulates my redirected mobile web site. I will use a "Lambda Function URL" since it is the fastest way to access this Lambda by URL.

export const handler = async(event) => {
    // Static page that stands in for the mobile web site
    const response = {
        statusCode: 200,
        body: '<html>I was redirected</html>',
        headers: {
            "Content-Type": "text/html"
        }
    };
    return response;
};

The result is a page that simply says "I was redirected".

So the flow that I expect to have is the following.

I will open my smartphone and access the API Gateway URL. I will add a WAF rule that checks whether my User-Agent contains "Android" and the Host header contains the host name of the API Gateway. If both conditions match, I will be redirected to my second Lambda, which simulates the redirection to the mobile web site.

Let's create the WAF.

The next step is to associate it with API Gateway.
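I did the association in the console, but the same step can be scripted. The sketch below is only an illustration, assuming a REST API stage and a regional web ACL: the API id, stage name and function names are placeholders of mine, not values from this setup. The stage ARN format and the `associate_web_acl` call are the standard ones from API Gateway and the boto3 `wafv2` client.

```python
def api_gateway_stage_arn(region: str, rest_api_id: str, stage: str) -> str:
    """Build the ARN of a REST API stage, the resource a web ACL is attached to."""
    return f"arn:aws:apigateway:{region}::/restapis/{rest_api_id}/stages/{stage}"


def associate_web_acl_with_stage(web_acl_arn: str, stage_arn: str, region: str) -> None:
    """Attach a regional web ACL to the stage (needs AWS credentials to run)."""
    import boto3  # imported here so the ARN helper above works without the SDK

    wafv2 = boto3.client("wafv2", region_name=region)
    wafv2.associate_web_acl(WebACLArn=web_acl_arn, ResourceArn=stage_arn)


# Placeholder values, for illustration only.
print(api_gateway_stage_arn("us-east-1", "a1b2c3d4e5", "prod"))
```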

Create a new rule that checks the User-Agent and performs the redirection.

My WAF rule has two statements and one redirect action.
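For readers who prefer JSON to the console, a rule of this kind could look roughly like the Python dictionary below, which follows the shape of the WAF JSON rule editor. Treat it as a sketch under assumptions: the rule name, metric name, host name and target URL are placeholders, and the redirect is expressed as a Block action with a custom 302 response and a Location header.

```python
import json

# Placeholders, not the real endpoints used in this post.
API_GATEWAY_HOST = "example-api-id.execute-api.us-east-1.amazonaws.com"
MOBILE_SITE_URL = "https://mobile.example.com/"


def header_contains(header_name: str, search_string: str) -> dict:
    """Statement that matches when the given request header contains the string."""
    return {
        "ByteMatchStatement": {
            "SearchString": search_string,
            "FieldToMatch": {"SingleHeader": {"Name": header_name}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": "CONTAINS",
        }
    }


redirect_rule = {
    "Name": "redirect-android-to-mobile",  # hypothetical name
    "Priority": 1,
    "Statement": {
        "AndStatement": {
            "Statements": [
                header_contains("user-agent", "Android"),
                header_contains("host", API_GATEWAY_HOST),
            ]
        }
    },
    # WAF has no dedicated redirect action; a custom 302 response does the job.
    "Action": {
        "Block": {
            "CustomResponse": {
                "ResponseCode": 302,
                "ResponseHeaders": [{"Name": "Location", "Value": MOBILE_SITE_URL}],
            }
        }
    },
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "redirect-android-to-mobile",
    },
}

print(json.dumps(redirect_rule, indent=2))
```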

For redirection alone, this is enough.

But since we are talking about WAF I also want to show an additional power of AWS WAF.

What if some bot starts to probe our web site? A bot can simulate different User-Agents, and we don't want to "spend" traffic handling redirects for bots.

We can avoid it by using AWS WAF Managed rules.

If you checked the first Lambda code carefully, you probably noticed that my simple HTML page also contains this JavaScript:

<script type="text/javascript" src="https://74f7043f5910.us-east-1.sdk.awswaf.com/74f7043f5910/c3757efb1817/challenge.js" defer></script>

Why do we need this script? It is how AWS WAF checks whether the connected client is a legitimate one or a malicious one. The script is mandatory if you want to enable bot protection, and it has to be enabled under "Application Integration SDK".

Once we have it in our web page, we can define the proper WAF rule.

Create new managed rule:

Select and edit the bot control rule:

Choose "Override to challenge" for the following settings:

Save the rule and make it the first in the list of rules.

If a bot is detected, WAF sets something called a "label". You can read about labels here.

Now we will add one more statement to our custom rule, so that the redirect happens only if the label was not set, which indicates that the request was made by a human.
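In JSON terms, this extra condition is a NotStatement wrapped around a LabelMatchStatement. The sketch below shows the idea in Python. Note the assumption: it matches on the whole Bot Control label namespace, `awswaf:managed:aws:bot-control:`, which may be broader than the specific label you pick in the console. The helper name and the stand-in statements are hypothetical.

```python
# Assumption: match any label in the Bot Control namespace.
BOT_CONTROL_NAMESPACE = "awswaf:managed:aws:bot-control:"


def require_no_bot_label(existing_statements: list) -> dict:
    """Combine the existing conditions with 'no Bot Control label was set'."""
    no_bot_label = {
        "NotStatement": {
            "Statement": {
                "LabelMatchStatement": {
                    "Scope": "NAMESPACE",
                    "Key": BOT_CONTROL_NAMESPACE,
                }
            }
        }
    }
    return {"AndStatement": {"Statements": list(existing_statements) + [no_bot_label]}}


# Stand-ins for the two header checks already present in the rule.
user_agent_check = {"ByteMatchStatement": {"SearchString": "Android"}}
host_check = {"ByteMatchStatement": {"SearchString": "example-api-host"}}

combined = require_no_bot_label([user_agent_check, host_check])
print(len(combined["AndStatement"]["Statements"]), "statements")
```

Labels are only visible to rules that run after the rule that set them, which is why the managed Bot Control rule has to come first in the list.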

Save the rule.

We will not see any difference in the response. I am still redirected from my smartphone to the Lambda that simulates my mobile web site when I access the API Gateway URL, but this time I also have bot protection in place.

The last thing I want to show: if you did everything correctly and the challenge JavaScript is in place, you will see the following in the browser network inspector.

You can see a "verify" request. This is the request WAF uses to verify that the connection comes from a human and not from a bot.

You can read more about AWS WAF challenge here.

Thanks to the development of the internet, undersea fiber-optic cables and better compute power, streaming video is now taken for granted by most internet users.

However, it is actually a very complicated process, and a very expensive one. Let's assume I am an OTT provider and I receive a 4K video as input. I need to convert it to different resolutions to support different connection speeds, and I also need to adjust it so that it displays well on different devices (PCs, TVs, mobiles and so on). So I need a lot of compute power to create the different videos. I also consume a lot of traffic to stream this data. All of it costs a lot of money, and I am always looking for places where I can save.

One of the areas directly related to video streaming is the bitrate. Generally speaking, the bitrate is how much information my video contains per second. Obviously, if my video shows nothing but Malevich's black square, I need a very low bitrate, since all I have to display is a black screen. A crowded street, however, requires a very high bitrate. A lower bitrate produces smaller files (less compute power), and smaller files require less internet traffic.
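The relation between bitrate and file size is simple arithmetic: size is bitrate multiplied by duration. A small Python sketch makes the gap visible. The two bitrates are made-up illustrative numbers, not measurements from my videos.

```python
def stream_size_megabytes(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate size of a video stream in MB (1 MB = 1,000,000 bytes)."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000


# Hypothetical bitrates for a 10-minute clip.
for scene, kbps in [("black screen", 300), ("crowded street", 8000)]:
    size = stream_size_megabytes(kbps, 600)
    print(f"{scene}: {kbps} kbps -> {size:.1f} MB")
```

With these numbers, the same ten minutes take 22.5 MB at 300 kbps and 600 MB at 8000 kbps.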

OK, so now we know that adjusting the bitrate to the current picture can potentially save a lot of money. But how exactly do I know whether my video contains a crowded street or a black screen? Could you put an army of cheap workers to go over all your videos and set the bitrate manually? That sounds like nonsense.

Another approach is to develop a sophisticated ML model that can determine the bitrate needed for specific pieces of the video. Yes, such things exist, and this was implemented by Netflix several years ago.

And like every good idea, it was adopted by other vendors, AWS among them.

The service that performs the transcoding is AWS Media Convert, and the feature is called "Automated ABR". While an enormous amount of work probably went into training such a model, from the end user's perspective all you need to do is enable this option in the Media Convert job definition.

So I decided to do a little hands-on, to see the actual result.

Since no one likes to reinvent the wheel, I used an AWS sample to transcode the video with Media Convert.

The idea is simple: upload the media file to S3, a Lambda picks up the file and uses Media Convert to perform the encoding, and the result is stored back in an S3 bucket.

The guide is not bad. The only thing missing from the explanation is the creation of "MediaConvertRole".

I created it manually. It needs to have S3 full access and include the following "trust relationships":

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "s3.amazonaws.com",
                    "mediaconvert.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

The guide contains a CloudFormation stack, and this role is one of its parameters.

I also created a separate JSON file with the job definition, including the "ABR" options.

You can download it here.
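If you just want to see where the option lives, the fragment below sketches the relevant part of such a job definition as a Python dictionary. It is not my actual file: the bitrate limits, rendition count and bucket name are illustrative assumptions, and the outputs, inputs and codec settings are left out. The key point is the `AutomatedEncodingSettings` block that sits on the output group.

```python
import json

# Fragment of a job definition: one HLS output group with Automated ABR enabled.
# All numeric values and the bucket name are illustrative assumptions.
abr_output_group = {
    "Name": "Apple HLS",
    "OutputGroupSettings": {
        "Type": "HLS_GROUP_SETTINGS",
        "HlsGroupSettings": {
            "Destination": "s3://example-destination-bucket/assets_abr/",
            "SegmentLength": 10,
            "MinSegmentLength": 0,
        },
    },
    # Let MediaConvert choose the renditions within these limits.
    "AutomatedEncodingSettings": {
        "AbrSettings": {
            "MaxAbrBitrate": 8000000,  # bits per second
            "MinAbrBitrate": 600000,   # bits per second
            "MaxRenditions": 6,
        }
    },
}

job_settings_fragment = {"OutputGroups": [abr_output_group]}
print(json.dumps(job_settings_fragment, indent=2))
```

The nesting matches the path the Lambda code uses when it overrides the HLS destination: `OutputGroups[0].OutputGroupSettings.HlsGroupSettings.Destination`.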

And lastly, I modified the Lambda code a little to make it more specific to our use case:

#!/usr/bin/env python

import json
import os
import traceback
import uuid

import boto3

def handler(event, context):

    assetID = str(uuid.uuid4())
    sourceS3Bucket = event['Records'][0]['s3']['bucket']['name']
    sourceS3Key = event['Records'][0]['s3']['object']['key']
    sourceS3 = 's3://'+ sourceS3Bucket + '/' + sourceS3Key
    sourceS3Basename = os.path.splitext(os.path.basename(sourceS3))[0]
    destinationS3 = 's3://' + os.environ['DestinationBucket']
    destinationS3basename = os.path.splitext(os.path.basename(destinationS3))[0]
    mediaConvertRole = os.environ['MediaConvertRole']
    region = os.environ['AWS_DEFAULT_REGION']
    statusCode = 200
    body = {}
    
    # Use MediaConvert SDK UserMetadata to tag jobs with the assetID
    # Events from MediaConvert will have the assetID in UserMetadata
    jobMetadata = {'assetID': assetID}

    print (json.dumps(event))
    
    try:
        # Job settings are in the lambda zip file in the current working directory
        with open('job_abr.json') as json_data:
            jobSettings = json.load(json_data)
            print(jobSettings)
        
        # get the account-specific mediaconvert endpoint for this region
        mc_client = boto3.client('mediaconvert', region_name=region)
        endpoints = mc_client.describe_endpoints()

        # add the account-specific endpoint to the client session 
        client = boto3.client('mediaconvert', region_name=region, endpoint_url=endpoints['Endpoints'][0]['Url'])

        # Update the job settings with the source video from the S3 event and destination 
        # paths for converted videos
        
        jobSettings['Inputs'][0]['FileInput'] = sourceS3
        
        
        S3KeyHLS = 'assets_abr/' + assetID + '/HLS/' + sourceS3Basename
        jobSettings['OutputGroups'][0]['OutputGroupSettings']['HlsGroupSettings']['Destination'] \
            = destinationS3 + '/' + S3KeyHLS
         
 
        print('jobSettings:')
        print(json.dumps(jobSettings))

        # Convert the video using AWS Elemental MediaConvert
        job = client.create_job(Role=mediaConvertRole, UserMetadata=jobMetadata, Settings=jobSettings,Queue='arn:aws:mediaconvert:us-west-2:621094298987:queues/ABR')
        print (json.dumps(job, default=str))

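    except Exception as e:
        # Log the failure and report it via the status code. No re-raise here:
        # the return in the finally block below would discard it anyway.
        print('Exception: %s' % e)
        statusCode = 500
        traceback.print_exc()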

    finally:
        return {
            'statusCode': statusCode,
            'body': json.dumps(body),
            'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*'}
        }

I used the same mp4 file as the one that was used in the original guide.

Several seconds after I uploaded the file to S3, Media Convert picked up the job and started the transcoding process.

It took a while but eventually the process completed and I saw all the files in the S3 bucket.

Since I didn't want to open my S3 bucket to public access, I used the AWS CLI "sync" command to copy the content of the bucket to a local folder.

I used VLC to play the different resolutions.

And that's it. Media Convert made it really easy to use a feature that would otherwise take a huge effort to implement yourself. One important note: the "ABR" feature is part of the professional tier.

For the cost, you also need to consider S3 storage. A source file of several MB eventually produced about 300 MB of output to support the different resolutions and devices.


Wednesday 28 December 2022

Redirect from desktop to mobile application by using AWS WAF

Friday 2 December 2022

Putting Content Aware Encoding to work with AWS Media Convert "Automated ABR"
