A Developer Workflow for Modern AWS Serverless Applications

Modern serverless applications on AWS are complex, with a lot of moving parts. Mapping a developer workflow onto those applications can be difficult. This article discusses the developer workflow I have developed for complex serverless applications at aleph0, with example CloudFormation template and GitHub Actions snippets to illustrate the concepts.

Goals for Developer Workflow

An efficient developer workflow compatible with team growth and modern practices, e.g., CI/CD, Infrastructure as Code, etc.
Separate environments for staging, production, etc. are a requirement.

Assumptions for Developer Workflow

An AWS serverless architecture, per the title. A straw man architecture is documented below for the sake of this discussion.
Serverless components are managed independently. While managing everything together (i.e., in a single CloudFormation template) can simplify things, it’s important that the process survive growth in team size, complexity, etc.

Example Architecture

This article will use the following architecture to drive the discussion:

The architecture uses an API Gateway REST API to define endpoints that use StartSyncExecution to invoke an Express Step Function, which in turn invokes Lambda function(s) and/or other AWS services.

Building APIs with this architecture involves coordinating changes to three different components, all of which must work together:

API Gateway mapping template, one per endpoint
Express Step Function, one per endpoint
Lambda Function, one per microservice

The remainder of this discussion focuses on a developer workflow for building an application with this architecture.

Tools and Techniques

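The developer workflow makes use of several modern best practices, which are outlined here. The tech stack this workflow uses to implement these practices is GitHub for source control, GitHub Actions for CI/CD, and AWS CloudFormation with the Serverless Application Model (SAM) for Infrastructure as Code.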

Certainly, these tools could easily be substituted for others — GitLab for GitHub, Jenkins for GitHub Actions, and so on — but this workflow uses the above tooling.

Continuous Integration and Continuous Delivery

CI/CD, or Continuous Integration and Continuous Delivery, is a form of developer tooling that takes code changes and builds, validates, and deploys them automatically. It’s a cornerstone of modern development workflows, and all developer workflows should use it. There are many reasons why, which have already been discussed at length online, but here are just a few:

Efficiency. Developers should be able to see the results of their code changes live on demand, preferably in 5 minutes or less. Wait is waste.
Consistency. Computers are better at performing tasks the same way than humans are. Putting computers in charge of deployments reduces deployment errors.
Security. Fewer people need access to the staging and production environments.
Definition of Done. Features are not done until their tests are passing in CI, and they are working in staging as deployed by CD.

This developer workflow uses the following (simple, bog-standard) CI/CD pipeline:

When a developer is ready to submit a code change, they push to GitHub. After the change has passed the team’s code review process and been accepted, a CI/CD process runs as a GitHub Actions workflow, which ultimately deploys the code change to AWS.

References to CI/CD in the rest of the workflow are talking about this pipeline.

Infrastructure as Code

IaC — or Infrastructure as Code — is another form of developer tooling that takes a description of a software deployment and assembles it automatically. IaC is another cornerstone of modern development practices that all workflows should use. Its virtues have also been extolled online, but here are a few for reference:

Efficiency. Developers should be able to see the results of their code changes live on demand, preferably in 5 minutes or less. Wait is waste.
Consistency. Computers are better at performing tasks the same way than humans are. Putting computers in charge of deployments reduces deployment errors.
Security. Fewer people need access to the staging and production environments.
Definition of Done. Features are not done until their tests are passing in CI, and they are working in staging as deployed by CD.

The list should look familiar.

Given the nature of the task — building serverless architectures in AWS — this developer workflow uses AWS CloudFormation for IaC with the Serverless Application Model (SAM) transformation for ease of use.

References to IaC in the rest of the workflow, including sample templates, are talking about this stack.

Developer Workflow

Per the above architecture and assumptions, the Lambda Functions, Step Functions, and the REST API are all managed separately. It’s possible to manage all of these together in one CloudFormation template, and doing so makes things simpler, so this discussion covers the harder case where everything is managed independently. It would not be difficult to adapt this process to the simpler case where everything is managed together.

Lambda Function Workflow

Each Lambda Function implements a microservice, and therefore has its own repository and can be deployed independently. The developer workflow for Lambda Functions is the standard CI/CD pipeline. The deployment is implemented as a SAM template, which allows it to publish a new Lambda Function version and alias automatically on each deploy.

So when developers push a code change to a Lambda Function repository, the updated code becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”.

CloudFormation Template Snippet

This snippet shows the relevant details of the CloudFormation template. The below CD workflow assumes the template is stored at /cfn.yml in the GitHub repository.

Note that the resource is of type AWS::Serverless::Function, not AWS::Lambda::Function.
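A template along these lines might look like the following minimal sketch. It is modeled on the Step Function template in this article and is not the author's actual template: the function name, runtime, handler, code location, memory, and timeout are all hypothetical placeholders, and the export names are assumed to match the values the Step Function template imports.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31 # SAM transform is required for AWS::Serverless::Function
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  LambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: xyz
      Description: !Sub "Build ${BuildId}"
      # Hypothetical values; replace with the real runtime, handler, and build output
      Runtime: java17
      Handler: com.example.Handler::handleRequest
      CodeUri: target/xyz.jar
      MemorySize: 512
      Timeout: 30
      # Publishes a new version when the code changes and points the alias at it
      AutoPublishAlias: stag
Outputs:
  LambdaFunctionArn:
    Value: !GetAtt LambdaFunction.Arn
    Description: The ARN of the XYZ Lambda Function
    Export:
      Name: xyz-lambda-function-arn
  LambdaFunctionName:
    Value: !Ref LambdaFunction
    Description: The name of the XYZ Lambda Function
    Export:
      Name: xyz-lambda-function-name
```

The exported ARN is unqualified, so consumers append the alias they want (for example, ":stag") themselves.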

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the Lambda function. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # Set up platform here, e.g., Java with actions/setup-java
      # Configure build tool here if necessary, e.g., mvn
      # Set up caching here if desired, e.g., maven repo with actions/cache
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      # Perform build with build tool here, e.g., mvn
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND
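For comparison, an integration workflow adapted from a delivery workflow like this one might look like the sketch below. It assumes a Java project built with Maven (the article only mentions Java and mvn as examples), and the Java version and distribution are illustrative choices. It builds and tests pull requests without touching AWS.

```yaml
name: integration
on:
  pull_request:
    branches:
      - main
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      - name: Set up Java
        uses: actions/setup-java@v3
        with:
          distribution: temurin # Assumed distribution
          java-version: "17" # Assumed version
          cache: maven # Caches the local Maven repository between runs
      - name: Build and run tests
        run: mvn --batch-mode verify
```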

Step Function Workflow

Each Step Function implements a service, which is composed of orchestrated calls to the Lambda microservices from above and other AWS services. The developer workflow for Step Functions is the standard CI/CD pipeline. The repository only needs to contain the Step Function definition, the relevant GitHub Actions workflows, and a way to deploy the Step Function, e.g., a CloudFormation template.

Also, the Step Function must take as part of its input a value indicating the environment in which it’s been invoked, e.g., “stag” vs. “prod”. (Surprisingly, if a Step Function is invoked using an alias, that alias is not available inside the Step Function, even in the State Machine ARN, which is why this is necessary.) When calling a Lambda Function, the Step Function must use the environment value to resolve the alias to use to perform the invocation, typically using the environment value as the alias itself.
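One way to resolve the alias from the input is sketched below as a fragment of a SAM state machine resource. It assumes the caller supplies the environment under a hypothetical top-level "environment" key, and that "lambdaFunctionArn" is a definition substitution holding the unqualified Lambda Function ARN. The States.Format intrinsic function appends the environment value to the ARN as the alias.

```yaml
Definition:
  StartAt: LambdaInvoke
  States:
    LambdaInvoke:
      Type: Task
      Resource: arn:aws:states:::lambda:invoke
      OutputPath: "$.Payload"
      Parameters:
        # Builds "<function ARN>:<environment>", e.g., "...:function:xyz:stag"
        # "environment" is an assumed input key, not a Step Functions built-in
        FunctionName.$: "States.Format('${lambdaFunctionArn}:{}', $.environment)"
        Payload.$: "$"
      End: true
DefinitionSubstitutions:
  lambdaFunctionArn: !ImportValue xyz-lambda-function-arn
```

With this shape, an input of {"environment": "prod"} invokes the "prod" alias of the Lambda Function without any change to the state machine definition.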

So when developers push a code change to a Step Function repository, the updated Step Function becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”. The Step Function must also be an Express Step Function (as opposed to Standard Step Function).

Starting Points

Users may find the template examples in the AWS Console to be very helpful in creating new State Machines.

*The suggested templates are uncommonly useful!*

CloudFormation Template Snippet

This snippet shows the relevant details of the CloudFormation template. The below CD workflow assumes the template is stored at /cfn.yml in the GitHub repository.

Note that the resource is of type AWS::Serverless::StateMachine, not AWS::StepFunctions::StateMachine.

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31 # Must use SAM xformation
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  StateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      # Many other properties for architecture, runtime, code, IAM, ...
      Name: xyz
      Type: EXPRESS # Has to be express
      AutoPublishAlias: stag # Auto-publish stag alias
      # This example definition simply calls the Lambda function above.
      # Obviously, it can do anything you want it to.
      Definition:
        Comment: "${buildId}"
        StartAt: LambdaInvoke
        States:
          LambdaInvoke:
            Type: Task
            Resource: arn:aws:states:::lambda:invoke
            OutputPath: "$.Payload"
            Parameters:
              FunctionName: "${lambdaFunctionArn}:stag"
              Payload.$: "$"
            Retry:
              - ErrorEquals:
                  - Lambda.ServiceException
                  - Lambda.AWSLambdaException
                  - Lambda.SdkClientException
                  - Lambda.TooManyRequestsException
                IntervalSeconds: 1
                MaxAttempts: 3
                BackoffRate: 2
            End: true
      DefinitionSubstitutions:
        buildId: !Ref BuildId
        lambdaFunctionArn: !ImportValue xyz-lambda-function-arn
      Policies:
        # SAM policy template; its parameter name is FunctionName
        - LambdaInvokePolicy:
            FunctionName: !ImportValue xyz-lambda-function-name
Outputs:
  StateMachineArn:
    Value: !Ref StateMachine
    Description: The ARN of the XYZ State Machine
    Export:
      Name: xyz-state-machine-arn
  StateMachineName:
    Value: !GetAtt StateMachine.Name
    Description: The name of the XYZ State Machine
    Export:
      Name: xyz-state-machine-name

For reference, this step function renders as follows in the excellent Workflow Studio:

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the State Machine. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND

REST API Workflow

The REST API should be defined as an OpenAPI spec. A GitHub repository will be dedicated to the API and its deployment. The spec itself must either (a) reside in the repository, or (b) be publicly available over HTTP using a well-known URL. A process may be used to prepare the OpenAPI spec for deployment (for example, by adding API Gateway Extensions to the spec), or the spec may be stored already prepared.

The developer workflow for the REST API is the standard CI/CD pipeline. By default, the pipeline should deploy changes to a staging environment, which is represented as a REST API stage with the name “stag”. In addition to the normal “git push” trigger, the CI/CD pipeline may also have a manual trigger for cases when an externally-stored OpenAPI spec changes.
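The manual trigger can be added next to the push trigger in the workflow's "on" section. A minimal sketch, assuming the pipeline deploys from the main branch:

```yaml
on:
  push:
    branches:
      - main
  # Manual trigger, e.g., for when an externally stored OpenAPI spec changes
  workflow_dispatch:
```

With workflow_dispatch present, the workflow can be started on demand from the repository's Actions tab.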

In general, endpoint implementations should invoke Step Function services using StartSyncExecution (which is why they must be Express Step Functions) and passing the stage name (available as $context.stage in mapping templates) as the required environment input.
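As a rough sketch of what such an endpoint could look like in the OpenAPI spec, the fragment below uses the API Gateway integration extension with a request mapping template. The path, account ID, region, role name, and the "environment" and "body" input keys are all hypothetical. Following this article's design, the state machine ARN is qualified with the stage name so that the "stag" stage calls the "stag" alias; treat that as an assumption to verify in your own account.

```yaml
paths:
  /xyz:
    post:
      responses:
        "200":
          description: OK
      x-amazon-apigateway-integration:
        type: aws
        httpMethod: POST
        # Region is illustrative
        uri: arn:aws:apigateway:us-east-2:states:action/StartSyncExecution
        # Hypothetical role that allows states:StartSyncExecution
        credentials: arn:aws:iam::123456789012:role/apigateway-stepfunctions-role
        passthroughBehavior: never
        requestTemplates:
          # The stage name doubles as the alias and as the environment input.
          # Request bodies containing single quotes may need extra handling,
          # because escapeJavaScript emits \' which is not valid JSON.
          application/json: |
            {
              "stateMachineArn": "arn:aws:states:us-east-2:123456789012:stateMachine:xyz:$context.stage",
              "input": "{\"environment\": \"$context.stage\", \"body\": $util.escapeJavaScript($input.json('$'))}"
            }
        responses:
          default:
            statusCode: "200"
```

The fragment omits a response mapping template. StartSyncExecution returns the execution result as a JSON string in its "output" field, so a real endpoint would usually unwrap that before responding.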

So when developers push a code change to the REST API repository, the updated stag stage of the REST API becomes available in the cloud shortly thereafter.

CloudFormation Template Snippet

Serverless templates for REST APIs are essentially thin wrappers around OpenAPI specs with API Gateway extensions. The AWS Management Console has an outstanding builder for REST APIs that can export the definition automatically to help users get started. This is left as an exercise to the reader.

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the API Gateway. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND

Workflow Examples

This section assumes that the API has already been deployed and contains at least one endpoint implemented as a Step Function that calls a Lambda Function.

If a developer needs to make a change to an existing Lambda Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Lambda Function to be deployed with the “stag” alias automatically. Note that this makes the updated Lambda Function available over the “stag” API immediately with no additional changes or deployments to Step Functions or the REST API required. Any Step Functions automatically call this updated version due to the updated alias.

If a developer needs to make a change to an existing Step Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Step Function to be deployed with the “stag” alias automatically. Note that this makes the updated Step Function available over the “stag” API immediately with no additional changes or deployments to Lambda Functions or the REST API required. Any REST API endpoints automatically call this updated version due to the updated alias.

If a developer needs to make a (backwards-compatible) change to the REST API or OpenAPI spec, then they simply push the change, and perhaps trigger the manual workflow in the corresponding GitHub repository. Ideally, any Step Functions and Lambda Functions it depends on will already have been deployed first. A version of this process can be used to bootstrap the API for the first time.

Promoting to Production

The API owner should write a set of integration tests that run against an API deployment and determine whether its behavior is acceptable. The REST API repository should have an action that runs these tests and performs the following steps if they all pass:

Apply the alias “prod” to all Lambda Function versions that currently have the “stag” alias
Apply the alias “prod” to all Step Function versions that currently have the “stag” alias
Deploy the “prod” REST API stage
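The steps above can be sketched as a GitHub Actions workflow that calls the AWS CLI. This is only an illustration under several assumptions: the repository variables naming the resources and the test script are hypothetical, only one Lambda Function and one Step Function are promoted (a real workflow would loop over all of them), and the “prod” aliases are assumed to exist already, since the update commands fail on a missing alias.

```yaml
name: promote
on:
  workflow_dispatch: # Manual trigger shown here as one possible choice
permissions:
  id-token: write
  contents: read
jobs:
  promote:
    runs-on: ubuntu-latest
    env:
      # Hypothetical repository variables naming the resources to promote
      FUNCTION_NAME: "${{ vars.LAMBDA_FUNCTION_NAME }}"
      STATE_MACHINE_ARN: "${{ vars.STATE_MACHINE_ARN }}"
      REST_API_ID: "${{ vars.REST_API_ID }}"
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Run integration tests against stag
        run: ./run-integration-tests.sh stag # Hypothetical test script
      - name: Point the Lambda Function prod alias at the stag version
        run: |
          VERSION=$(aws lambda get-alias --function-name "$FUNCTION_NAME" --name stag --query FunctionVersion --output text)
          aws lambda update-alias --function-name "$FUNCTION_NAME" --name prod --function-version "$VERSION"
      - name: Point the Step Function prod alias at the stag version
        run: |
          VERSION_ARN=$(aws stepfunctions describe-state-machine-alias --state-machine-alias-arn "$STATE_MACHINE_ARN:stag" --query 'routingConfiguration[0].stateMachineVersionArn' --output text)
          aws stepfunctions update-state-machine-alias --state-machine-alias-arn "$STATE_MACHINE_ARN:prod" --routing-configuration "stateMachineVersionArn=$VERSION_ARN,weight=100"
      - name: Deploy the prod REST API stage
        run: aws apigateway create-deployment --rest-api-id "$REST_API_ID" --stage-name prod
```

Because a failed step stops the job, a failing integration test leaves every “prod” alias and the “prod” stage untouched.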

The API owner can choose to have this action run:

Automatically, after a successful API “stag” deployment, which implements Continuous Deployment
Manually, on demand, by the API owner only, which implements Continuous Delivery

Note that as long as all changes being deployed are backwards compatible, no service interruption is implied.

On Breaking Changes

The above developer workflow works well for non-breaking changes, i.e., changes not involving updates to the call-level interface between the API and Step Function, or Step Function and Lambda Function. However, because changes to Step Functions and Lambda Functions are not transactional, breaking changes can result in (a) a brief period where incompatible components are in production, at best, or (b) incomplete deployments on partial failure resulting in broken endpoint(s), at worst.

To work around this, it may be necessary to hard-code a Lambda Function version in a Step Function (instead of using the alias) temporarily, or to hard-code a Step Function version in the REST API (instead of using the alias) temporarily, in order to avoid the issue. After a complete deployment with these hard-coded values, a normal deployment with appropriate aliases should be possible.
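For instance, temporarily pinning a Lambda Function version inside a Step Function definition might look like the fragment below. The version number 7 is hypothetical, and "lambdaFunctionArn" is assumed to be a definition substitution holding the unqualified function ARN.

```yaml
Parameters:
  # Temporarily pinned to version 7 (hypothetical) instead of an alias;
  # restore the alias once all components are deployed
  FunctionName: "${lambdaFunctionArn}:7"
  Payload.$: "$"
```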

I suspect there is a version of this developer workflow that uses a different, dynamic alias (e.g., timestamps) instead of “prod” for the production environment that would solve this more transparently, but I haven’t run it to ground yet.

