Friday 25 November 2022

Persistent storage at CloudFront by using Lambda@Edge

First of all, there is no such thing as persistent storage at CloudFront. It doesn't exist yet. Period. However, there are some attempts to implement a workaround; most of them are proposed here.

I found one of them especially interesting: storing the data in Lambda memory. The rest of the options add much more latency, since all of them try to access resources outside the edge.

The idea is that variables stored in the Lambda environment can survive between invocations. This is because AWS reuses the same execution environment by design, to avoid provisioning a new Lambda environment for every call.

Unfortunately, there is no deterministic way to know when the environment will be recycled by internal AWS logic, but the data persists for at least several minutes. That may sound like a low number, but when it comes to thousands of requests, the latency improvement is very significant.

We will start by creating the architecture diagram:



The request will start at the CloudFront CDN. An Origin Request Lambda@Edge will be used to insert a custom header, with the current datetime as the value. This makes it very easy to see whether the value changes or stays the same, since the time changes every millisecond.

From CloudFront we will continue to an Application Load Balancer. For the back-end we will use another Lambda (this time a standard one, not @Edge) to print the headers, and eventually we will see the result in CloudWatch.



The Lambda code is very simple. It prints the headers to the console and returns the text "Hello from Lambda!". I also appended the current time to the output, to verify that I get the result from the origin and not from the cache. Note that the Application Load Balancer is defined as a trigger for this Lambda.

Read these instructions to set up the Lambda integration with a Load Balancer.

To sum up, you only need to run this command:

aws lambda add-permission --function-name YOUR_LAMBDA_NAME \
    --statement-id load-balancer --action "lambda:InvokeFunction" \
    --principal elasticloadbalancing.amazonaws.com

and register the Lambda in the Application Load Balancer target group.

At this point you can already test your Lambda by using the ALB domain name.


And the result is:



Next we will create the CloudFront Distribution and set the ALB we created earlier as an Origin. The steps can be found here.

It is important to set the TTL in the cache policy to 0, because we do want to go to the origin for every request. This is, of course, done for demo purposes only; otherwise, how could we check that the value was preserved? Sorry for the typo in "Ratain".
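If you prefer the CLI, such a policy could be created like this (a sketch; the policy name "demo-no-cache" is arbitrary, and AWS also ships a managed "CachingDisabled" policy you can attach instead):

```shell
# Hypothetical: create a cache policy with all TTLs set to 0,
# so CloudFront forwards every request to the origin.
aws cloudfront create-cache-policy --cache-policy-config '{
  "Name": "demo-no-cache",
  "MinTTL": 0,
  "MaxTTL": 0,
  "DefaultTTL": 0,
  "ParametersInCacheKeyAndForwardedToOrigin": {
    "EnableAcceptEncodingGzip": false,
    "HeadersConfig": { "HeaderBehavior": "none" },
    "CookiesConfig": { "CookieBehavior": "none" },
    "QueryStringsConfig": { "QueryStringBehavior": "none" }
  }
}'
```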





We will set this policy in the behaviour of our distribution.



And finally let's create our Lambda@Edge function.



The code is very simple. The variable that we want to persist is "dataToPersist". If it is not defined, we create a new value based on the current time; if it already exists, we just reuse it.

The value is set into a new header with the very creative name "custom_header". Note that as a trigger, we define the CloudFront distribution we created earlier.

We are done! Now let's check the result.

I will invoke the distribution URL several times over several minutes, and we will see what we have in CloudWatch.


Note the value of "custom_header". For a time difference of 10 seconds we got the same value, so our test of persisting data at the edge was successful.

Several important things:

Lambda@Edge must be created in the N. Virginia region. To avoid permission problems, create the rest of the resources in the same region. It doesn't mean that you cannot create them in another region, but you will need to adjust the IAM policies.

My example is very simple. Usually you will probably want to store the data per user session. If you have a lot of users, the persisted data may consume a lot of memory. Remember that Lambda's memory is limited.


Sunday 13 November 2022

AWS Solution Architect - Professional Certification

Some time ago I passed the AWS Solutions Architect - Professional certification exam. The exam is hard: there are a lot of questions, and most of them actually tell a story, so you need to react fast, because for most questions it takes more time to read the question than to answer it. As with the ML certification I did earlier, I wrote down the key points that helped me prepare and answer the questions, and I am going to share them with you.

  • Amazon SQS extended client library for Java is designed for messages up to 2 GB.
  • For time series data, create a new DynamoDB table for each series. For example, a new table for each week.
  • DynamoDB - you need to know the RCU and WCU (read/write capacity units) to calculate the number of partitions
  • Aurora supports regional failover
  • Beanstalk has a "swap URL" feature to support blue/green deployment
  • StackPolicy - used in CloudFormation to protect resources from modification.
  • With an implicit deny, you also need an explicit "Allow". The stack policy can be updated only from the CLI
  • Lambda calls from API Gateway have a 30s timeout. You can decrease the timeout, but not increase it.
  • API Gateway - responses like 403 can be customized to something else, like 400 (with ability to add custom headers)
  • VPC CIDR is not modifiable, but you can add up to 4 secondary CIDR.
  • For BYOL (L-license) use AWS license manager
  • Cannot upload ACM certificate from IAM. No management from IAM console.
  • OpsWorks is supported in CloudFormation. If you need to use OpsWorks inside a CloudFormation stack, it is better to manage the EC2 instances within the OpsWorks stack, while CloudFormation should deal with the resources which do not change frequently.
  • Instances from different customers, running on the same physical machine, are isolated by the hypervisor
  • If using CodePipeline with OpsWorks, the OpsWorks action should be put into the Deploy stage
  • RDS MySQL can create cross-region read replica.
  • ENI can have a static MAC address that doesn't change when you reattach it to another EC2.
  • AWS workdocs can be used to create file-sharing solution and can be integrated with Active Directory.
  • There is no option to modify a DHCP option set in a VPC. You need to create a new one if you want to change it.
  • S3 Transfer Acceleration cannot compare the speed at Edges, only at Regions
  • CloudFront caches GET,HEAD and optionally OPTIONS requests
  • To replicate RDS from AWS to On-Prem use IPSEC VPN.
  • You can assign several ENIs to an EC2 instance, and each ENI can get an SSL certificate.
  • CloudTrail can send the logs from different accounts to a single bucket. Also from different regions.
  • AppStream 2.0 pricing is based on a monthly fee per user and the streaming resources.
  • Custom SSL certificate or third-party certificate can not be configured in Route 53. Origin Access Identity does not deal with custom SSL.
  • Amazon Inspector is used for EC2. It cannot inspect API Gateway
  • Once you enable encryption on RDS, the snapshots and read replicas are encrypted as well.
  • The CloudFormation "DeletionPolicy" for RDS can be "Retain" or "Snapshot". With "Snapshot" the RDS data will be stored in a snapshot; with "Retain" the RDS instance will keep running
  • Aurora MySQL can be set as replication slave for self-managed MySQL or RDS MySQL
  • CloudWatch can monitor data cross-region
  • Redshift cannot replicate data from a cluster in Region A to a cluster in Region B. It can only replicate the snapshots.
  • Raid 0 - instead of writing to a single disk, the data is split between several disks to increase the throughput. 
  • CloudFront doesn't support IP as origin
  • Route 53 alias can point to ALB
  • OAuth 2.0 is used with WebIdentityFederation. Also used with public identity providers like Google, Facebook...
  • A CloudFront signer can be a "key group" or an AWS account. With a "key group", you don't need the root user to manage the keys.
  • DynamoDB Streams have a 24-hour retention limit.
  • Cognito identity pool - supports anonymous guest user.
  • Beanstalk cannot delete the source bundle or previous versions
  • Spot instances are not good for EMR core nodes.
  • Beanstalk "zero downtime" creates new instances. Cannot update existing.
  • CodeDeploy can disable ELB health check.
  • EC2 termination protection doesn't prevent the ASG from terminating the EC2 instance. Instance protection does.
  • To support client-side certificates, use TCP listeners on the ELB, so HTTPS is terminated at the EC2 instance and not at the ELB
  • AWS doesn't support promiscuous mode.
  • The client IP can be preserved in ELB for both HTTP and TCP with the Proxy Protocol feature.
  • DMS selection rules allow you to filter the source data
  • DMS transformation rules can be used to remove columns or add prefixes to table names.
  • Changes in "All Features" in AWS Organization can be upgraded in flight. 
  • S3 static website - the CloudFront origin protocol policy needs to be HTTP.
  • Classic Load Balancer cannot handle multiple certificates.
  • NAT gateway is IPv4 only.
  • Kinesis Data Stream data retention can be set up to 7 days (default = 1 day)
  • Simple AD allows access to AWS WorkSpaces, WorkDocs... and supports automated snapshots.
  • An on-prem load balancer can be set as a custom origin in CloudFront
  • ALB can use both on-prem IP addresses and AWS IP addresses as  target
  • Beanstalk can use custom AMIs, Docker containers, and "create custom environment" based on Ubuntu, RedHat or Amazon Linux.
  • Storage Gateway. Cached volume = 1024TB, stored volume  = 512TB
  • AWS Organizations - cannot resend an invite if the present invite is still open
  • EC2 Image builder can distribute AMI to multiple regions and share with other AWS accounts. 
  • Data moved from EBS to S3 is not encrypted.
  • EFS enables encryption at rest when created. Encryption in transit is enabled when mounting.
  • Snapshots can be created only for EBS volumes.
  • AMI can be created for instance store.
  • CloudHSM can perform SSL transactions.
  • Cost allocation tags do not appear in cost reports before they are activated.
  • HVM AMI - extends the host hardware to the "guest". Increases performance
  • Root user cannot "switch role".
  • NLB can preserve source IP if the target is "instance IP".
  • Billing reports can integrate with Redshift, Athena and Quicksight.
  • AWS Storage Gateway  - the replication is asynchronous.
  • RAID 1 and 5 are not recommended for EBS.
  • Java SDK - can increase the SQS visibility timeout for messages with a specific header.
  • Only existing DX connections can be aggregated into a LAG. The connections operate as Active/Active. The max number is 4 connections, and they have to be of the same bandwidth.
  • Single-root I/O virtualization (SR-IOV) provides high-performance networking and lower CPU utilization
  • Reserved Instances EC2 discount works cross-account, but only in the same AZ.
  • Classic LB - doesn't modify the headers for SSL
  • ASG - can be suspended and resumed.
  • Amazon Managed Blockchain does not allow you to manage EC2 or ECS. AWS Blockchain Templates does.
  • CloudFront supports TTL=0.
  • Hibernate - only on ROOT volume. Volume should be encrypted. Must use HVM AMI.
  • EC2 ephemeral volumes are not encrypted at rest.
  • DynamoDB - cannot combine On-Demand for reads and provisioned for writes.
  • Cannot delete VPC if it has NAT instance
  • Container images for Lambda need to put the extensions in the /opt folder.
  • Public virtual interfaces used to connect to AWS services reachable by public IP. (e.g S3)
  • Snowball data cannot be directly copied to Glacier. 
  • gp2 - each 1 GB adds 3 IOPS.
  • Kinesis Firehose can aggregate CloudWatch Logs from different accounts by creating subscription filters.
  • You don't pay for S3 accelerated transfer if there is no acceleration.
  • SAM framework - can deploy blue/green deployment for Lambda via CodeDeploy
  • VPC sharing (part of AWS RAM) allows multiple accounts to share resources in a centrally managed VPC.
  • VPC flow logs cannot detect packet loss or inspect network traffic.
  • An S3 object is always owned by the uploading account. If the object owner is not the same as the bucket owner, the bucket owner cannot access the object without the relevant permissions.
  • CloudFront - to improve the cache hit ratio, configure separate cache behaviours for static and dynamic content. Configure cookie forwarding to the origin for dynamic content.
  • S3 - it can take up to 24 hours to propagate S3 bucket names to another region
  • QuickSight can access RDS in a private subnet by using a VPC connection from the QuickSight admin.
  • Route 53 private or public zones can use EC2 health check only if EC2 has public IP.  
  • S3 static websites - objects cannot be encrypted by KMS. The bucket and object owner should be the same.
  • When choosing NLB or Global Accelerator for static IP, the NLB is cheaper.
  • AWS Server Migration Service is already at the end of life.
  • To connect on-prem to a VPC over Direct Connect, use a private virtual interface
  • Route 53 traffic flow policies - can route domains and subdomains but not path-based flows. ALB and Lambda@Edge - can.
  • S3 Gateway endpoint - no access from on-prem or another AWS region. Uses S3 public IP.
  • S3 - to ignore all public ACLs, set the IgnorePublicAcls flag to true.
  • Fargate task - if internet access is needed: in a public subnet, enable auto-assign public IP; in a private subnet, disable auto-assign IP and configure a NAT gateway.
  • Tags - take up to 24 hours to activate
  • NLB - no security groups.
  • OpsWorks supports only blue/green deployment, not canary.