Avoiding Long Builds and Slow Deployments in DevOps: How We Helped a Startup Streamline Their CI/CD Processes

Looking for a DevOps-as-a-Service provider to fine-tune your CI/CD processes? Read about how we helped a startup streamline their builds and deployments.


StreamSer (NDA)

A cutting-edge platform for live broadcasts that went above and beyond by revolutionizing its CI/CD processes, ensuring a remarkably smooth experience for all clients. Dive into the case study to see how our DevOps expertise can improve your streaming efforts.

TEAM

101-250 people

PERIOD OF COLLABORATION

2022 - present

CLIENT’S LOCATION

Boston, Massachusetts, United States

About the client

In February 2022, a media and entertainment startup contacted us with a DevOps-as-a-service request. The client operates an AWS-Lambda-based platform for creating streaming apps for TV channels. They used DevOps and CI/CD pipelines to maintain and enhance their platform.

Despite using top DevOps solutions, the client struggled with slow updates and redundant processes. They needed a vendor that specialized in DevOps for startups and enterprises and had deep cloud architecture expertise.

Among other tools, the client used:

  • Java and GraalVM to write Lambda functions
  • Apache Maven and Quarkus to perform builds
  • AWS Cloud Development Kit (CDK) and GitLab CI for deployments


One Problem, Three Causes

After thoroughly auditing the client’s system and DevOps processes, we found that the previous DevOps team had set up the client’s infrastructure-as-code improperly, which was the root cause of the slow builds and deployments. Even the smallest source code change would sometimes add six hours to deployment! We identified three main factors behind such devastating delays.


Factor #1: Redeployments 

AWS, the managed cloud service provider the client relies on, uses CloudFormation (CF) to provision the cloud resources needed to deploy Lambda functions. And given that the client’s app comprised twenty stacks of resources at the project’s onset, this created the following problem.

By default, the AWS CDK doesn’t deploy stacks in parallel. In addition to deploying new artifacts, it redeploys all Lambda functions in a CDK app. This means that with the smallest update, it first deploys the artifacts from the first stack, then the second, and so on, until it finishes all twenty stacks. With each change, CF and the CDK also initiate a database migration. This process involves deploying custom resources (CRs), which are essentially CDK constructs backed by Lambda functions.

As a result, deployment of the entire CDK app took about thirty minutes.


Factor #2: Redundancies in builds 

When the client came to us, redundant builds were their most pressing problem. Whenever the algorithm initiated a build of artifacts, the process spanned the entire system: instead of rebuilding only the changed modules, it rebuilt the whole system after even the smallest commit. And because the project had a few dozen modules, each build took about two hours and produced over 1 GB of artifacts (i.e., Lambda function images).


Factor #3: Three environments 

When the client reached out to us, they had two environments, DEV and Stage, and each required a separate build and CDK app deployment. Given that one build took two hours and one deployment took thirty minutes, every single update cost the client at least five hours. Adding a third environment, Prod, added another two to three hours.

Challenge #1: Redundant builds in the CI/CD pipeline 

We can all agree there’s no point in building and deploying each update three times. As our first and, frankly speaking, easiest task, we set out to modify the client’s CI/CD pipeline.

To eliminate redundant builds, we needed to set up the pipeline so that built artifacts are stored once they’re committed to the main branch of the DEV environment. Because GitLab CI has a hard limit on artifact size, we decided to store them in GitLab’s cache (located in Amazon S3) instead and use CI_COMMIT_SHA as the cache key.

This meant every commit to the main branch would create a separate ZIP archive in Amazon S3 with all the built artifacts needed for deployment. Deployment to DEV then starts, and once the tag associated with the artifacts is created, deployments to Stage and Prod kick in. Within this setup, artifacts are built only once and automatically pulled from the Amazon S3 cache by tag on each deployment.
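Here is a minimal sketch of that caching setup. The stage and job names, artifact paths, and deploy command are illustrative assumptions, not the client’s actual configuration:

```yaml
stages: [build, deploy]

build:
  stage: build
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  script:
    - mvn -B package                  # produces the Lambda artifacts (function.zip files)
  cache:
    key: "$CI_COMMIT_SHA"             # one cache entry per commit to main
    paths:
      - "**/target/function.zip"
    policy: push                      # the build job only uploads the cache

deploy:dev:
  stage: deploy
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  cache:
    key: "$CI_COMMIT_SHA"
    paths:
      - "**/target/function.zip"
    policy: pull                      # deploy jobs only download, they never rebuild
  script:
    - cdk deploy --all --require-approval never
```

Deployments to Stage and Prod reuse the same cached artifacts once the release tag is created, so nothing is rebuilt along the way.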

As you can see, a few small pipeline modifications saved at least four hours per update.

Challenge #2: Long builds in Apache Maven

Although the above pipeline changes removed the redundant builds, each build still took two hours and progressed serially through every module in the client’s software system. That is, if we had system version 1.0.0 and wanted to update it to 1.0.1, we’d have to wait two hours for the build, even if the version change was caused by a single artifact in a small patch.

Thus, our next move was to modify the workflow so that each build would involve only the module (or modules) it’s meant for.

Attempt #1: Modifying workflows in GitLab pipelines

Our first attempt at solving this involved using GitLab pipelines. This was the path of least resistance, requiring minimal changes to the project’s settings. With this approach, the automatic workflow would have the following logic:


  • Assigning one job per module using rules:changes
  • Defining groups of modules (and jobs) with shared dependencies and declaring those dependencies with needs
  • Specifying which module(s) to build with mvn install -pl, as in the sketch below
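A simplified sketch of what this per-module setup would have looked like; the module names and paths are hypothetical:

```yaml
# One job per module; each job runs only when files in that module change.
build:payments:
  stage: build
  rules:
    - changes:
        - payments/**/*
  script:
    - mvn -B install -pl payments -am   # build this module plus the modules it depends on

build:streaming:
  stage: build
  rules:
    - changes:
        - streaming/**/*
  needs: ["build:payments"]             # shared dependency between module jobs
  script:
    - mvn -B install -pl streaming -am
```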

However, it turned out that GitLab caps the number of needs dependencies between jobs at 50, so this solution wouldn’t scale to the client’s project, and we moved on to a new one.

Attempt #2 (success!): GIB

We resolved the problem with gitflow-incremental-builder (GIB), a Maven extension for incrementally building multimodule Maven projects. Since the extension builds only the changed modules, it was a real lifesaver for this project. Plus, the -am option (Maven’s --also-make flag) also builds the modules that the changed ones depend on.

Initially, we used GIB with the -am option exclusively for the development branch. Meanwhile, builds for the main branch still relied on the client’s previous approach, because the CDK app deployment wouldn’t succeed without all the needed artifacts available locally. This led us to our next move.
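At that stage, the split between branches looked roughly like the sketch below. The job names are hypothetical, and the gib.disable property is shown only to illustrate keeping a full build on main:

```yaml
# Development branches: incremental build, since GIB is registered as a Maven extension.
build:dev:
  stage: build
  rules:
    - if: '$CI_COMMIT_BRANCH != "main"'
  script:
    - mvn -B install -am                # GIB rebuilds only the changed modules and their dependencies

# Main branch: still a full build, so the CDK app has every artifact locally.
build:main:
  stage: build
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
  script:
    - mvn -B install -Dgib.disable=true
```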


Recall that we initiated deployments to Stage and Prod using a tag that appeared after code changes were deployed to DEV. So, to streamline builds, we came up with the following mechanism (sketched in the job configuration after this list):

  • Using the cache:policy keyword, the algorithm pulls the artifacts cached under the previous software version (1.0.0).
  • The algorithm compares the previous software version (1.0.0) with the new one (1.0.1) and builds only the changed modules.
  • After building the changed modules, the system creates a new cache for version 1.0.1 and uploads all the unchanged artifacts from version 1.0.0, along with those changed for version 1.0.1.
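A rough sketch of such a release job. The PREVIOUS_VERSION variable, the cache keys and paths, and the GIB comparison flag are simplified assumptions rather than the exact setup:

```yaml
# Runs in the tag pipeline that releases version 1.0.1.
build:release:
  stage: build
  rules:
    - if: '$CI_COMMIT_TAG'
  cache:
    # Pull the artifact cache of the previous release (read-only).
    - key: "artifacts-$PREVIOUS_VERSION"    # e.g. artifacts-1.0.0
      paths:
        - "**/target/function.zip"
      policy: pull
    # Push a fresh cache for the version being released.
    - key: "artifacts-$CI_COMMIT_TAG"       # e.g. artifacts-1.0.1
      paths:
        - "**/target/function.zip"
      policy: push
  script:
    # GIB compares the working tree against the previous release and rebuilds
    # only the changed modules; unchanged function.zip files are carried over.
    - mvn -B install -am -Dgib.referenceBranch=refs/tags/$PREVIOUS_VERSION
```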

This approach avoided unnecessary rebuilds for patch releases, enabling us to cut build time for Stage and Prod from two hours to ten minutes.


Challenge #3: Extra deployment processes and the limitations of AWS CDK class wrappers

Deploying Lambda functions in AWS CDK is pretty straightforward: you need to specify an S3 object, and AWS CDK takes care of the rest. 

The AWS CDK has class wrappers that automatically prepare built artifacts for deployment, upload them to Amazon S3 or ECR, and paste the corresponding link into the CloudFormation template. To avoid uploading the same artifact twice, the AWS CDK computes a hash over the artifact’s files and adds it to the artifact name. Meanwhile, Quarkus creates a separate ZIP archive (function.zip) containing the image of the built Lambda function.

Given this, and the fact that we were still running a full build for the main branch, we faced another problem: every time the artifact files are recompiled for a build, their metadata changes (e.g., creation and modification times). From the CDK’s point of view, the version of the artifact changes too, so all of the project’s artifacts (including those that didn’t change) get re-uploaded with each new deployment.

This led us to the following solutions.

Solution #1: A custom script to upload artifacts to storage

To prevent redundant deployments, we had to opt out of using class wrappers for automatic artifact processing. Instead, we created a bash script and used the Exec Maven Plugin.

The script uploads function.zip with the Lambda function image to Amazon S3 during the build, and the CDK then uses that image for deployment:

  • mvn deploy is initiated.
  • The Exec Maven Plugin calls the script.
  • Once the artifact is built, the script uploads it to S3 under an object key of the form group_id/artefact_id/function.zip, where group_id is the service name and artefact_id is the Lambda function name.
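A minimal sketch of such an upload script, assuming hypothetical bucket and parameter names:

```bash
#!/usr/bin/env bash
# Invoked by the Exec Maven Plugin once the module is packaged.
# The bucket name and the way group/artefact IDs are passed in are assumptions.
set -euo pipefail

GROUP_ID="$1"                        # service name, e.g. streaming-api
ARTEFACT_ID="$2"                     # Lambda function name, e.g. channel-resolver
ARTIFACT_PATH="target/function.zip"  # the image Quarkus produces for the function
BUCKET="s3://example-lambda-artifacts"

# Upload the built Lambda image under group_id/artefact_id/function.zip
aws s3 cp "$ARTIFACT_PATH" "$BUCKET/$GROUP_ID/$ARTEFACT_ID/function.zip"
```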

We later complemented this solution with the Build Helper Maven Plugin to help manage the built artifacts that are uploaded to S3 and ECR.

Solution #2: Establishing artifact versioning and creating a custom registry of artifact versions

To ensure the algorithm deploys the same Lambda functions to all three environments, we had to create our own registry of artifact versions.

For this, we used 

We also created a custom script that automatically updates all Lambda functions when the version of the internal library they rely on changes.

Solution #3: A CDK module that sources required artifacts and pulls them directly from S3 and ECR

Next, we built a CDK module that scans the registry, checks whether the needed artifact version exists, and provides a link to it. Because such lookups can’t be performed at runtime in the AWS CDK, the module finds the needed artifact before the CDK code is triggered and then uses the resolved link at runtime.

We continue to improve the module. Our next step is to enhance it to fetch links to local artifacts, not just remote ones. This will make it easy for developers to use the artifacts they build themselves.


Results: Lightning-fast builds and deployments

1. We reduced build time considerably by initiating builds only for the changed modules. A build now rarely exceeds ten minutes for small commits.

2. We changed the CI/CD pipelines so that each update is built only once for all three environments.

3. We reduced deployment time from two to three hours to forty minutes by ensuring that built artifacts are uploaded only once and stored locally. There is no longer any need to redeploy unchanged Lambda functions.