Modern serverless applications on AWS are complex, with a lot of moving parts. Mapping a developer workflow onto those applications can be difficult. This article discusses the developer workflow I have developed for complex serverless applications at aleph0, with example CloudFormation templates and GitHub Actions snippets to illustrate the concepts.
This article will use the following architecture to drive the discussion:
The architecture uses an API Gateway REST API to define endpoints that use StartSyncExecution to invoke an Express Step Function, which in turn invokes Lambda function(s) and/or other AWS services.
Building APIs with this architecture involves coordinating changes to three different components, all of which must work together:
The remainder of this discussion focuses on a developer workflow for building an application with this architecture.
The developer workflow makes use of several modern best practices, which are outlined here. The tech stack this workflow uses to implement these practices is as follows:
Other tools could easily be substituted (GitLab for GitHub, Jenkins for GitHub Actions, and so on), but this workflow uses the tooling above.
CI/CD, or Continuous Integration and Continuous Delivery, is a form of developer tooling that takes code changes and builds, validates, and deploys them automatically. It’s a cornerstone of modern development workflows, and all developer workflows should use it. There are many reasons why, which have already been discussed at length online, but here are just a few:
This developer workflow uses the following (simple, bog-standard) CI/CD pipeline:
When a developer is ready to submit a code change, they push to GitHub. After the change has passed the team’s code review process and been accepted, a CI/CD process runs as a GitHub action, which ultimately deploys the code change to AWS.
References to CI/CD in the rest of the workflow refer to this pipeline.
IaC — or Infrastructure as Code — is another form of developer tooling that takes a description of a software deployment and assembles it automatically. IaC is another cornerstone of modern development practices that all workflows should use. Its virtues have also been extolled online, but here are a few for reference:
The list should look familiar.
Given the nature of the task — building serverless architectures in AWS — this developer workflow uses AWS CloudFormation for IaC with the Serverless Application Model (SAM) transformation for ease of use.
References to IaC in the rest of the workflow, including sample templates, refer to this stack.
Per the above architecture and assumptions, the Lambda Functions, Step Functions, and the REST API are all managed separately. It’s possible to manage all of these things together in one CloudFormation template, and doing so makes things simpler, but this discussion assumes the harder case where everything is managed independently. It would not be difficult to adapt this process to the simpler case where everything is managed together.
Each Lambda Function implements a microservice, and therefore has its own repository and can be deployed independently. The developer workflow for Lambda Functions is the standard CI/CD pipeline. The deployment is implemented as a SAM template, which allows it to publish a new Lambda Function version and alias automatically on each deploy.
So when developers push a code change to a Lambda Function repository, the updated code becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”.
This snippet shows the relevant details of the CloudFormation template. The CD workflow below assumes the template is stored at /cfn.yml in the GitHub repository.
Note that the resource is of type AWS::Serverless::Function, not AWS::Lambda::Function.
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31 # Must use SAM transform
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  LambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      # Many other properties for architecture, runtime, code, IAM, ...
      FunctionName: xyz
      AutoPublishAlias: stag # Auto-publish stag alias
      Timeout: 30 # API Gateway's default integration timeout is 29 seconds
      VersionDescription: !Ref BuildId # Label for reference only
Outputs:
  LambdaFunctionArn:
    Value: !GetAtt LambdaFunction.Arn
    Description: The ARN of the XYZ Lambda function
    Export:
      Name: xyz-lambda-function-arn
  LambdaFunctionName:
    Value: !Ref LambdaFunction # Ref returns the function name
    Description: The Name of the XYZ Lambda function
    Export:
      Name: xyz-lambda-function-name
This snippet shows the skeleton of a CD workflow for the Lambda function. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.
name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # Set up platform, e.g., Java actions/setup-java
      # Configure build tool if necessary, e.g., mvn
      # Set up caching if desired, e.g., maven repo actions/cache
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      # Perform build with build tool, e.g., mvn
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND
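As a rough sketch of how the delivery workflow above might be adapted for integration, the following runs on pull requests instead of pushes to main and stops before any deployment. The workflow name, the job name, and the Maven command are assumptions for illustration only; substitute whatever your build actually uses.

```yaml
name: integration
on:
  pull_request:
    branches:
      - main
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # Set up platform, build tool, and caching as in the delivery workflow
      - name: Build and run tests
        # Assumes a Maven project; replace with your own build command
        run: mvn --batch-mode verify
```

Because nothing is deployed, this variant needs neither the OIDC permission nor AWS credentials.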
Each Step Function implements a service, which is comprised of orchestrated calls to the Lambda microservices from above and other AWS services. The developer workflow for Step Functions is the standard CI/CD pipeline. The repository only needs to contain the Step Function definition, the relevant GitHub actions, and a way to deploy the Step Function, e.g., a CloudFormation template.
Also, the Step Function must take as part of its input a value indicating the environment in which it’s been invoked, e.g., “stag” vs. “prod”. (Surprisingly, if a Step Function is invoked using an alias, that alias is not available inside the Step Function, even in the State Machine ARN, which is why this is necessary.) When calling a Lambda Function, the Step Function must use the environment value to resolve the alias to use to perform the invocation, typically using the environment value as the alias itself.
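One way to resolve the alias from the input is the States.Format intrinsic function. The Task state below is a minimal sketch, assuming the environment arrives in an input field named environment (a hypothetical name, not something the workflow prescribes) and that lambdaFunctionArn is supplied through DefinitionSubstitutions.

```yaml
# Sketch of a Task state; the "environment" input field name is an assumption.
LambdaInvoke:
  Type: Task
  Resource: arn:aws:states:::lambda:invoke
  OutputPath: "$.Payload"
  Parameters:
    # The .$ suffix tells Step Functions to evaluate the value.
    # States.Format replaces {} with the environment value, producing
    # <function ARN>:<alias>, e.g. an ARN ending in :stag
    FunctionName.$: "States.Format('${lambdaFunctionArn}:{}', $.environment)"
    Payload.$: "$"
  End: true
```

With this state, an execution started with {"environment": "stag"} would invoke the stag alias, and one started with {"environment": "prod"} would invoke the prod alias.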
So when developers push a code change to a Step Function repository, the updated Step Function becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”. The Step Function must also be an Express Step Function (as opposed to Standard Step Function).
Users may find the template examples in the AWS Console to be very helpful in creating new State Machines.
This snippet shows the relevant details of the CloudFormation template. The CD workflow below assumes the template is stored at /cfn.yml in the GitHub repository.
Note that the resource is of type AWS::Serverless::StateMachine, not AWS::StepFunctions::StateMachine.
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31 # Must use SAM transform
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  StateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      # Many other properties for logging, tracing, IAM, ...
      Name: xyz
      Type: EXPRESS # Has to be express
      AutoPublishAlias: stag # Auto-publish stag alias
      # This example definition simply calls the Lambda function above.
      # Obviously, it can do anything you want it to.
      Definition:
        Comment: "${buildId}"
        StartAt: LambdaInvoke
        States:
          LambdaInvoke:
            Type: Task
            Resource: arn:aws:states:::lambda:invoke
            OutputPath: "$.Payload"
            Parameters:
              FunctionName: "${lambdaFunctionArn}:stag"
              Payload.$: "$"
            Retry:
              - ErrorEquals:
                  - Lambda.ServiceException
                  - Lambda.AWSLambdaException
                  - Lambda.SdkClientException
                  - Lambda.TooManyRequestsException
                IntervalSeconds: 1
                MaxAttempts: 3
                BackoffRate: 2
            End: true
      DefinitionSubstitutions:
        buildId: !Ref BuildId
        lambdaFunctionArn: !ImportValue xyz-lambda-function-arn
      Policies:
        - LambdaInvokePolicy:
            FunctionName: !ImportValue xyz-lambda-function-name
Outputs:
  StateMachineArn:
    Value: !Ref StateMachine
    Description: The ARN of the XYZ State Machine
    Export:
      Name: xyz-state-machine-arn
  StateMachineName:
    Value: !GetAtt StateMachine.Name
    Description: The name of the XYZ State Machine
    Export:
      Name: xyz-state-machine-name
For reference, this step function renders as follows in the excellent Workflow Studio:
This snippet shows the skeleton of a CD workflow for the State Machine. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.
name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND
The REST API should be defined as an OpenAPI spec. A GitHub repository will be dedicated to the API and its deployment. The spec itself must either (a) reside in the repository, or (b) be publicly available over HTTP using a well-known URL. A process may be used to prepare the OpenAPI spec for deployment (for example, by adding API Gateway Extensions to the spec), or the spec may be stored already prepared.
The developer workflow for the REST API is the standard CI/CD pipeline. By default, the pipeline should deploy changes to a staging environment, which is represented as a REST API stage with the name “stag”. In addition to the normal “git push” trigger, the CI/CD pipeline may also have a manual trigger for cases when an externally-stored OpenAPI spec changes.
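The manual trigger mentioned above can be expressed in GitHub Actions by adding workflow_dispatch alongside the push trigger. This is a minimal sketch of only the on: section of a workflow; the rest of the workflow is unchanged.

```yaml
on:
  push:
    branches:
      - main
  # Allows the workflow to be started by hand from the Actions tab,
  # e.g., after an externally-stored OpenAPI spec changes
  workflow_dispatch: {}
```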
In general, endpoint implementations should invoke Step Function services using StartSyncExecution (which is why they must be Express Step Functions), passing the stage name (available as $context.stage in mapping templates) as the required environment input.
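To make this concrete, here is a hedged sketch of what such an endpoint might look like in an OpenAPI spec with API Gateway extensions. The path, account ID, region, IAM role name, state machine name, and the environment and request input field names are all hypothetical placeholders, and the mapping template is one possible approach rather than the only one.

```yaml
# Fragment of an OpenAPI spec; every name and ARN below is a placeholder.
paths:
  /xyz:
    post:
      responses:
        "200":
          description: Successful execution
      x-amazon-apigateway-integration:
        type: aws
        httpMethod: POST
        uri: arn:aws:apigateway:us-east-2:states:action/StartSyncExecution
        # Role that API Gateway assumes; it needs states:StartSyncExecution
        credentials: arn:aws:iam::123456789012:role/apigateway-states-role
        requestTemplates:
          application/json: |
            #set($request = $util.escapeJavaScript($input.json('$')))
            {
              "stateMachineArn": "arn:aws:states:us-east-2:123456789012:stateMachine:xyz:$context.stage",
              "input": "{\"environment\": \"$context.stage\", \"request\": $request}"
            }
        responses:
          default:
            statusCode: "200"
```

The stage name appears twice: once as the alias on the state machine ARN, and once inside the execution input so the Step Function can use it to resolve Lambda aliases. Request bodies containing single quotes may need extra handling, since $util.escapeJavaScript escapes them in a way that JSON parsers can reject.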
So when developers push a code change to the REST API repository, the updated stag stage of the REST API becomes available in the cloud shortly thereafter.
Serverless templates for REST APIs are essentially thin wrappers around OpenAPI specs with API Gateway extensions. The AWS Management Console has an outstanding builder for REST APIs that can export the definition automatically to help users get started. This is left as an exercise to the reader.
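For orientation only, a thin wrapper of this kind might look like the following sketch. The resource name, the API name, the output, and the openapi.yml file name are assumptions; the BuildId parameter is included because the delivery workflow passes it as a parameter override.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  RestApi:
    Type: AWS::Serverless::Api
    Properties:
      Name: xyz
      Description: !Ref BuildId # Label for reference only
      StageName: stag
      # Local path to the prepared OpenAPI spec (hypothetical file name);
      # `aws cloudformation package` uploads it to S3 and rewrites this value
      DefinitionUri: openapi.yml
Outputs:
  RestApiId:
    Value: !Ref RestApi
    Description: The ID of the XYZ REST API
```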
This snippet shows the skeleton of a CD workflow for the API Gateway. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.
name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND
This section assumes that the API has already been deployed and contains at least one endpoint implemented as a Step Function that calls a Lambda Function.
If a developer needs to make a change to an existing Lambda Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Lambda Function to be deployed with the “stag” alias automatically. Note that this makes the updated Lambda Function available over the “stag” API immediately with no additional changes or deployments to Step Functions or the REST API required. Any Step Functions automatically call this updated version due to the updated alias.
If a developer needs to make a change to an existing Step Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Step Function to be deployed with the “stag” alias automatically. Note that this makes the updated Step Function available over the “stag” API immediately with no additional changes or deployments to Lambda Functions or the REST API required. Any REST API endpoints automatically call this updated version due to the updated alias.
If a developer needs to make a (backwards-compatible) change to the REST API or OpenAPI spec, then they simply push the change, and perhaps trigger the manual workflow in the corresponding GitHub repository. Ideally, any Step Functions and Lambda Functions it depends on will already have been deployed first. A version of this process can be used to bootstrap the API for the first time.
The API owner should write a set of integration tests that run against an API deployment and determine whether its behavior is acceptable. The REST API repository should have an action that runs these tests and performs the following steps if they all pass:
The API owner can choose to have this action run:
Note that as long as all changes being deployed are backwards compatible, no service interruption is implied.
The above developer workflow works well for non-breaking changes, i.e., changes not involving updates to the call-level interface between the API and Step Function, or Step Function and Lambda Function. However, because changes to Step Functions and Lambda Functions are not transactional, breaking changes can result in (a) a brief period where incompatible components are in production, at best, or (b) incomplete deployments on partial failure resulting in broken endpoint(s), at worst.
To work around this, it may be necessary to hard-code a Lambda Function version in a Step Function (instead of using the alias) temporarily, or to hard-code a Step Function version in the REST API (instead of using the alias) temporarily, in order to avoid the issue. After a complete deployment with these hard-coded values, a normal deployment with appropriate aliases should be possible.
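As an illustration of the first workaround, the Parameters of a Task state could temporarily pin a numbered Lambda version in place of the alias. This is only a sketch: the version number 42 is made up, and lambdaFunctionArn is assumed to come from DefinitionSubstitutions.

```yaml
# Temporary pin during a breaking change; version 42 is a made-up example.
Parameters:
  FunctionName: "${lambdaFunctionArn}:42" # Numbered version instead of an alias
  Payload.$: "$"
```

Once every component has been deployed successfully, the qualifier would be switched back to the alias in a follow-up deployment.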
I suspect there is a version of this developer workflow that uses a different, dynamic alias (e.g., timestamps) instead of “prod” for the production environment that would solve this more transparently, but I haven’t run it to ground yet.