Accelerated Data Processing and Secure Releases for a B2B Platform

How we replaced manual Ansible deployments with a managed AWS EKS cluster featuring autoscaling and centralized monitoring.


Migration to AWS EKS

Infrastructure and CI/CD automation

TEAM

DevOps engineers

PERIOD OF COLLABORATION

2025

CLIENT’S LOCATION

USA

A Mature Business Trapped in a Legacy Monolith

Our client is a successful US-based B2B platform that builds sales-enablement solutions for optimizing sales processes. Before we got involved, the company already had a stable user flow and an established infrastructure that continuously processed data uploaded by clients.

However, the product's technical foundation relied on outdated approaches: it was a classic Ruby on Rails monolith deployed on unmanaged Linux servers using Ansible scripts. Client assets were uploaded via a plain FTP server, and vital background tasks (cron jobs) were scattered across various machines with no central control, which made the system hard to maintain and prevented the business from scaling.

Key Outcomes

100% deployment automation (no human intervention)

Resource autoscaling (the system adapts to the load automatically)

Instant and secure rollbacks (via GitOps)


The Main Challenge: Slow Data Processing and Fear of Every Deployment

For a sales-enablement platform, speed and uninterrupted operation are critical business requirements. Users upload their work assets daily (e.g., images or product listings), and these must be processed quickly and made available in the Web UI. However, the client's technical base was not ready to scale. Instead of focusing on feature development, the development team spent much of its time maintaining the unstable legacy system.

Challenge #1: Performance Bottleneck

This was a fundamental problem of the system. The product operated as a massive Ruby on Rails monolith deployed on several unmanaged Linux virtual machines. Client assets (such as product images) were uploaded via an outdated FTP server. The application ran as a separate process, and background tasks (cron jobs) were scattered across different machines. These scripts functioned as schedulers: they placed messages into a queue implemented as a PostgreSQL table, and the corresponding workers then picked those messages up. Because of this decentralized and inflexible architecture, the system could not scale under load, and data processing tasks took an unacceptably long time, which directly harmed end users.

Challenge #2: Complex Deployments & Error-prone Rollbacks

The system was deployed to unmanaged virtual machines using Ansible scripts, and the process was convoluted. The biggest pain point was the lack of a safe path backward: attempts to roll back to a previous version constantly produced errors. As a result, every update became a risk.

Challenge #3: "Blindness" of the system

Since various processes and cron jobs ran on different isolated machines, the team faced serious monitoring challenges. Without centralized collection of metrics and logs, it was hard to understand how the system behaved as a whole, and finding root causes during incidents was slow.

Complete Automation and Migration to AWS EKS with a GitOps Approach

To solve the problems with performance and risky releases, we fundamentally changed the architectural approach: we migrated the system from unmanaged virtual machines to a managed Kubernetes cluster (AWS EKS) and fully adopted GitOps. This allowed us to eliminate routine work, accelerate processes, and make the system far more flexible.

Our "Killer Features":

  • Smart Savings (Spot & On-Demand): We configured Auto Scaling Groups for the AWS EKS worker nodes that use a mix of On-Demand and Spot instances. The cluster automatically adds capacity during peak data processing loads while keeping the client's infrastructure costs down.
  • Smart CI/CD: We deployed fully isolated environments (Staging and Production) via Terraform and set up automated pipelines using ArgoCD. They include smart branch and file filtering, as well as pre-validation of database migrations.
  • Next-Level Security: For secure storage of configurations and secrets (e.g., credentials for the external managed PostgreSQL database), we used AWS Systems Manager Parameter Store. We also configured IAM Roles for Service Accounts (IRSA), which gives the application secure access to client Amazon S3 buckets.
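
The branch and file filtering mentioned under Smart CI/CD can be illustrated with a small decision function. This is a minimal sketch, not the client's actual pipeline configuration: the branch patterns, the ignored paths, and the function name `should_run_pipeline` are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; the real pipeline's rules are not described in this case study.
DEPLOY_BRANCHES = ("main", "release/*")
IGNORED_PATHS = ("docs/*", "*.md", ".github/*")


def matches_any(value: str, patterns: tuple) -> bool:
    """Return True if value matches at least one glob pattern."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def should_run_pipeline(branch: str, changed_files: list) -> bool:
    """Run the pipeline only for deployable branches with relevant file changes."""
    if not matches_any(branch, DEPLOY_BRANCHES):
        return False
    # At least one changed file must fall outside the ignored paths.
    return any(not matches_any(path, IGNORED_PATHS) for path in changed_files)


if __name__ == "__main__":
    print(should_run_pipeline("main", ["app/models/asset.rb"]))
    print(should_run_pipeline("main", ["README.md", "docs/setup.txt"]))
```

Skipping documentation-only commits and non-deployable branches is one common way to keep CI/CD runs short; most CI systems offer equivalent built-in path and branch filters.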

From fear of deployments to a 100% automated and reliable system

The biggest engineering challenge: decomposition and migration of background tasks. The most complex stage was migrating the data processing mechanism. In the legacy system, the application ran as a separate process, and schedulers (classic Linux crontab entries) were scattered across different machines. They placed messages into a queue (a PostgreSQL table), where workers picked them up. To move to the cloud, we had to completely rethink this mechanism:

  • We migrated the old scripts to native Kubernetes CronJobs, which are now run centrally in the cluster.
  • We configured autoscaling for the worker pods so that they scale dynamically with the length of the database queue, with node capacity supplied by EKS Auto Scaling Groups that mix On-Demand and Spot instances.
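
To make the queue-based scaling idea concrete, here is a minimal sketch of how a worker count could be derived from the queue backlog. It assumes a simple "jobs per worker" target and fixed lower and upper bounds; the function name, parameters, and default values are hypothetical, and the case study does not state which autoscaling tool or formula was actually used.

```python
import math


def desired_workers(queue_length: int, jobs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Return how many worker pods to run for a given queue backlog.

    All parameter names and defaults are illustrative assumptions.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # One worker per `jobs_per_worker` queued messages, rounded up.
    needed = math.ceil(max(queue_length, 0) / jobs_per_worker)
    # Clamp to the allowed range so the fleet never drops to zero or grows unbounded.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 120, 5000):
        print(backlog, desired_workers(backlog, jobs_per_worker=50))
```

In practice the backlog would come from a count of pending rows in the queue table, exposed as a metric that the autoscaler reads. When more pods are requested than the current nodes can host, the node groups add instances to fit them.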

Smart CI/CD and secure DB migrations. To eliminate the release risks caused by error-prone rollbacks, we built a full pipeline based on ArgoCD (GitOps). Instead of relying on manual deployments, the system now reacts automatically to changes in the repository. The pipeline includes:

  • Smart filtering of branches and files to optimize CI/CD speed.
  • Pre-deployment validation: before each new version is deployed, a dedicated database migration job validates the migrations, so conflicts are caught before the release goes out.
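
The case study does not describe what the migration validation job checks, so the following is only an illustrative sketch of the kind of checks such a job might perform. It assumes the standard Rails naming convention for migration files (a 14-digit timestamp followed by a snake_case name) and flags duplicate versions as well as applied versions whose files are missing. The function name and example filenames are hypothetical.

```python
import re
from collections import defaultdict

# Standard Rails migration filename: 14-digit timestamp, underscore, snake_case name.
MIGRATION_NAME = re.compile(r"^(\d{14})_([a-z0-9_]+)\.rb$")


def validate_migrations(filenames, applied_versions):
    """Return a list of human-readable problems; an empty list means no conflicts found."""
    problems = []
    by_version = defaultdict(list)
    for name in filenames:
        match = MIGRATION_NAME.match(name)
        if match is None:
            problems.append(f"unrecognized migration filename: {name}")
            continue
        by_version[match.group(1)].append(name)
    # Two files sharing one version typically means two branches collided.
    for version, names in sorted(by_version.items()):
        if len(names) > 1:
            problems.append(f"duplicate version {version}: {', '.join(sorted(names))}")
    # A version recorded in the database with no file suggests a deleted or renamed migration.
    for version in sorted(set(applied_versions) - set(by_version)):
        problems.append(f"applied version {version} has no migration file")
    return problems


if __name__ == "__main__":
    files = ["20250101120000_create_assets.rb", "20250102090000_add_status_to_assets.rb"]
    print(validate_migrations(files, ["20250101120000"]))
```

A job like this would exit with a non-zero status when the list is non-empty, which stops the rollout before any schema change is applied.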

Network security and Service Mesh. Since the product works with B2B client assets, we strengthened security at the infrastructure level:

  • To control ingress and egress traffic and secure communication between services inside Kubernetes, we implemented the Istio service mesh.
  • Access credentials for the external managed PostgreSQL database are stored securely in AWS Systems Manager Parameter Store. To integrate the application with client buckets, we used IAM Roles for Service Accounts (IRSA), which allows a secure connection to Amazon S3 without hardcoded keys.
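
As an illustration of how IRSA avoids hardcoded keys, the sketch below builds a Kubernetes ServiceAccount manifest annotated with an IAM role. The `eks.amazonaws.com/role-arn` annotation is the one EKS uses for IRSA; the service account name, namespace, and role ARN are hypothetical placeholders rather than values from the client's environment.

```python
import json


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest that links pods to an IAM role via IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS reads this annotation to decide which IAM role the pods may assume.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Hypothetical values for illustration only.
manifest = service_account_manifest(
    name="asset-worker",
    namespace="production",
    role_arn="arn:aws:iam::111122223333:role/asset-worker-s3",
)

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

Pods that run under such a service account receive short-lived credentials through a projected web identity token, which AWS SDKs pick up automatically, so no access keys need to be stored in the application or its configuration.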

Tech Stack & ISV Tools:

  • Cloud & IaC: AWS, Terraform (using terraform-aws-modules for rapid environment creation).
  • Containers & Orchestration: Docker (as a builder for OCI images), AWS EKS, Helm.
  • GitOps & Networking: ArgoCD, Istio.
  • Databases & Cache: Crunchy Bridge (managed PostgreSQL provider), Memcached.
  • Observability: kube-prometheus-stack, Loki, Promtail (for collecting and storing logs inside the cluster).
