First of all, there is no such thing as persistent storage at CloudFront edge locations. It doesn't exist yet. Period. However, there are some attempts to implement a workaround. Most of them are proposed here.
I found one of them especially interesting: storing the data in Lambda memory. The other options add much more latency, since all of them access resources outside the edge location.
The idea is that variables stored in the Lambda execution environment (declared outside the handler) can survive between invocations. This is by design: AWS reuses the same environment for subsequent calls to avoid provisioning a new Lambda environment for every request.
Unfortunately, there is no deterministic way to know when internal AWS logic will recycle the environment. In practice, though, the data persists for at least several minutes. That may sound short, but when thousands of requests arrive in that window, the latency improvement is very significant.
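Here is a minimal Python sketch of this behaviour. The post's own function code is not shown, so the language and all names here (`handler`, `_invocation_count`) are hypothetical. Module-level variables are initialised once per execution environment. Calling the handler twice in the same process mimics two warm invocations, and a cold start runs the module code again and resets the values.

```python
import time

# Module-level (global) state: initialised once per execution environment,
# i.e. on a cold start. Warm invocations that reuse the same environment
# see whatever values earlier invocations left here.
_cold_start_time = time.time()
_invocation_count = 0


def handler(event, context):
    """Hypothetical handler reporting how often this environment was reused."""
    global _invocation_count
    _invocation_count += 1
    return {
        "invocations_in_this_environment": _invocation_count,
        "environment_age_seconds": round(time.time() - _cold_start_time, 3),
    }


if __name__ == "__main__":
    # Two calls in the same process behave like two warm invocations.
    print(handler({}, None)["invocations_in_this_environment"])  # 1
    print(handler({}, None)["invocations_in_this_environment"])  # 2
```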
We will start by creating the architecture diagram:
The request starts at the CloudFront CDN. An origin-request Lambda@Edge function inserts a custom header, using the current date and time as its value. This makes it easy to see whether the value changes or stays the same, because a newly generated timestamp differs on every request, down to the millisecond.
From CloudFront, the request continues to an Application Load Balancer (ALB). For the back end we use another Lambda function (this time a standard one, not Lambda@Edge) that prints the headers, so we can eventually see the result in CloudWatch.
The back-end Lambda code is very simple. It prints the headers to the console and returns the text "Hello from Lambda!" with the current time appended. The timestamp lets me verify that the response comes from the origin and not from the cache. Note that the Application Load Balancer is defined as the trigger for this Lambda.
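Here is a Python sketch of such a back end. It assumes the ALB target group uses the default single-value header format. The name `lambda_handler` and the plain-text content type are my own choices, not necessarily the author's. The returned dict uses the fields the ALB expects from a Lambda target: `statusCode`, `statusDescription`, `isBase64Encoded`, `headers` and `body`.

```python
import json
from datetime import datetime, timezone


def lambda_handler(event, context):
    # Print the incoming request headers so they show up in CloudWatch Logs.
    # With multi-value headers disabled (the default), the ALB passes the
    # headers as a flat name -> value dict.
    print(json.dumps(event.get("headers", {})))

    # Append the current time so each response is visibly fresh,
    # i.e. produced by the origin rather than served from a cache.
    now = datetime.now(timezone.utc).isoformat()
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "headers": {"Content-Type": "text/plain"},
        "body": f"Hello from Lambda! {now}",
    }
```

Printing the headers as a single JSON object keeps each request on one log line, which makes the custom header easy to spot in CloudWatch.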
Read these instructions to set up the Lambda integration with the Load Balancer.
In short, you only need to run this command:
aws lambda add-permission --function-name YOUR_LAMBDA_NAME \
--statement-id load-balancer --action "lambda:InvokeFunction" \
--principal elasticloadbalancing.amazonaws.com
Then register the Lambda function as the target of an Application Load Balancer target group.
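The same two steps can be sketched with boto3, using `add_permission` on the Lambda client and `register_targets` on the `elbv2` client. The helper name and its parameters are hypothetical. The clients are passed in so the function can be exercised without AWS access. Unlike the command above, it also sets `SourceArn` to scope the permission to a single target group; that part is optional.

```python
def connect_lambda_to_alb(lambda_client, elbv2_client,
                          function_name, function_arn, target_group_arn):
    """Grant ELB permission to invoke the function, then register it as a target.

    lambda_client / elbv2_client are expected to be boto3 clients, e.g.
    boto3.client("lambda") and boto3.client("elbv2"). The target group
    itself must have been created with TargetType="lambda".
    """
    # Equivalent of the `aws lambda add-permission` command; SourceArn
    # additionally restricts the permission to this target group.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId="load-balancer",
        Action="lambda:InvokeFunction",
        Principal="elasticloadbalancing.amazonaws.com",
        SourceArn=target_group_arn,
    )
    # The permission has to exist before the function can be registered.
    # For Lambda target groups, the target Id is the function ARN.
    elbv2_client.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": function_arn}],
    )
```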
At this point you can already test your Lambda by requesting the ALB's domain name.
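This test can also be scripted. Below is a minimal sketch using only the Python standard library. The helper names and the URL in the comment are hypothetical placeholders. The same `poll` helper could later be pointed at the CloudFront distribution's domain name to send several requests over a few minutes.

```python
import time
import urllib.request


def fetch_body(url):
    """Send one GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def poll(url, times, interval_seconds, fetch=fetch_body):
    """Request `url` `times` times, sleeping `interval_seconds` between requests."""
    bodies = []
    for i in range(times):
        bodies.append(fetch(url))
        if i < times - 1:  # no need to wait after the last request
            time.sleep(interval_seconds)
    return bodies


# Example (placeholder ALB DNS name; requires network access):
# for body in poll("http://my-alb-123456.us-east-1.elb.amazonaws.com/", 3, 10):
#     print(body)  # each body should show a different timestamp
```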
And the result is:
Next, we create the CloudFront distribution and set the ALB we created earlier as its origin. The steps can be found here.
It is important to set the TTL in the cache policy to 0, because we want every request to go to the origin. This is, of course, done for demo purposes only; otherwise, how could we check that the value was preserved? Sorry for the typo "Ratain" in the screenshot.
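For reference, here is a sketch of an equivalent `CachePolicyConfig` for boto3's CloudFront `create_cache_policy`, with every TTL set to 0. The helper name and the policy name are hypothetical. AWS also provides a managed cache policy named CachingDisabled that has the same effect.

```python
def build_no_cache_policy_config(name):
    """CachePolicyConfig with all TTLs at 0, so every request goes to the origin."""
    return {
        "Name": name,
        "Comment": "Demo only: disable caching so every request reaches the origin",
        "MinTTL": 0,
        "DefaultTTL": 0,
        "MaxTTL": 0,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            # Compression support stays off when caching is disabled.
            "EnableAcceptEncodingGzip": False,
            "EnableAcceptEncodingBrotli": False,
            # Nothing extra in the cache key; caching is off anyway.
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }


# Usage with boto3 (requires AWS credentials):
# cloudfront = boto3.client("cloudfront")
# cloudfront.create_cache_policy(
#     CachePolicyConfig=build_no_cache_policy_config("demo-no-cache"))
```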
We will set this policy in the behaviour of our distribution.
And finally let's create our Lambda@Edge function.
The code is very simple. The variable we want to persist is "dataToPersist". If it is not defined yet, we create a new value based on the current time; if it already exists, we simply reuse it.
The value is set in a new header with the very creative name "custom_header". Note that we define the CloudFront distribution we created earlier as the trigger.
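Here is a Python sketch of such an origin-request function. The post does not show its code as text, so this is a port under assumptions: `data_to_persist` stands in for "dataToPersist", and the timestamp is ISO 8601 in UTC. Lambda@Edge represents each header as a lowercase key mapping to a list of `{"key", "value"}` objects. Returning the request object lets CloudFront continue to the origin.

```python
from datetime import datetime, timezone

# Declared at module level, outside the handler: created once per execution
# environment and kept in memory across warm invocations of that environment.
data_to_persist = None


def lambda_handler(event, context):
    global data_to_persist

    # Generate a value only if this environment has none yet (cold start);
    # otherwise reuse the value already held in memory.
    if data_to_persist is None:
        data_to_persist = datetime.now(timezone.utc).isoformat()

    request = event["Records"][0]["cf"]["request"]

    # Lambda@Edge header format: lowercase name -> list of {"key", "value"}.
    request["headers"]["custom_header"] = [
        {"key": "custom_header", "value": data_to_persist}
    ]

    # Returning the modified request forwards it to the origin (the ALB).
    return request
```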
We are done! Now let's check the result.
I invoked the distribution URL several times over several minutes. Let's see what we have in CloudWatch:
Note the value of "custom_header": for requests 10 seconds apart, we got the same value. So our test to persist data at the edge was successful.
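This check can also be automated. The following is a minimal sketch. It assumes the back-end function prints the received headers as one JSON object per log line, for example with `print(json.dumps(event["headers"]))`. The log lines could be fetched with boto3's `logs` client, whose `filter_log_events` returns events with a `message` field. Both function names are hypothetical.

```python
import json


def persisted_values(log_messages):
    """Collect custom_header values from log lines holding a JSON header dict."""
    values = []
    for message in log_messages:
        try:
            headers = json.loads(message)
        except json.JSONDecodeError:
            continue  # skip START/END/REPORT and other non-JSON lines
        if isinstance(headers, dict) and "custom_header" in headers:
            values.append(headers["custom_header"])
    return values


def value_was_persisted(log_messages):
    """True if several requests all carried the same custom_header value."""
    values = persisted_values(log_messages)
    return len(values) > 1 and len(set(values)) == 1
```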
Several important things:
Lambda@Edge functions must be created in the US East (N. Virginia) region, us-east-1. To avoid permission problems, create the rest of the resources in the same region. You can create them in another region, but you will need to adjust the IAM policies.
My example is very simple. In practice, we would probably want to store data per user session. With a lot of users, the persisted data may consume a lot of memory, and Lambda's memory is limited.
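One of those permission details is the execution role's trust policy. A Lambda@Edge execution role must trust both `lambda.amazonaws.com` and `edgelambda.amazonaws.com`. Below is a sketch of that trust policy as a Python dict. The constant and helper names are hypothetical. The JSON string could be passed to boto3's `iam.create_role` as `AssumeRolePolicyDocument`.

```python
import json

# Trust policy for a Lambda@Edge execution role: both the Lambda service
# and the Lambda@Edge service must be allowed to assume the role.
EDGE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["lambda.amazonaws.com", "edgelambda.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json():
    """Serialise the trust policy, e.g. for iam.create_role(AssumeRolePolicyDocument=...)."""
    return json.dumps(EDGE_TRUST_POLICY)
```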
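One way to keep per-session data from growing without bound is a size-capped LRU cache. This technique is swapped in here and is not part of the original example. The class name, the capacity, and how the session id is derived (for example, from a cookie) are all assumptions. Keep in mind that each execution environment holds its own copy, so two requests from the same user can land on different environments and see different caches.

```python
from collections import OrderedDict


class SessionCache:
    """Per-session values with least-recently-used eviction (hypothetical helper)."""

    def __init__(self, max_sessions):
        self.max_sessions = max_sessions
        self._items = OrderedDict()

    def get_or_create(self, session_id, create_value):
        if session_id in self._items:
            self._items.move_to_end(session_id)  # mark as most recently used
            return self._items[session_id]

        value = create_value()
        self._items[session_id] = value
        if len(self._items) > self.max_sessions:
            self._items.popitem(last=False)  # evict the least recently used
        return value

    def __len__(self):
        return len(self._items)


# Module-level instance, so it survives warm invocations like dataToPersist.
# The cap of 1000 is an arbitrary example; size it to the function's memory.
SESSION_CACHE = SessionCache(max_sessions=1000)
```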