Migration to EKS: 4x Faster Releases and $45,600/Year in Savings

How we transformed the infrastructure of a global B2B service from a bottleneck into a growth driver.


Global B2B Validation Service (Strict NDA)

B2B service for validating online casinos (revenue model based on subscriptions to authenticity certificates)

TEAM

2 part-time DevOps engineers from Alpacked

PERIOD OF COLLABORATION

Late 2024 to present

CLIENT’S LOCATION

Worldwide (with the strongest traction in the Brazilian and Turkish markets)

Background: 8 Fired Teams and a Blocked Release

In late 2024, the company approached us with a critically unstable legacy infrastructure built on AWS ECR/ECS. Persistent CI/CD pipeline failures were blocking developers and severely delaying the product's first go-live. The situation had become so serious that, in the six months before we came on board, the client had let go of 8 DevOps teams amid overwhelming frustration from both management and staff.

Key Results in Numbers

$45,600/year - reduction in infrastructure costs

4x - increase in deployment frequency through fully automated CI/CD

82% - reduction in mean time to recovery (MTTR)


The Core Challenge: Extreme Development Pace Without a Tech Lead

The central problem was that the unstable infrastructure was delaying the go-to-market launch, yet the client demanded stability and best practices before launch - while refusing to slow down the relentless development pace. This challenge came down to three critical factors:

Challenge #1: No CTO at Extreme Velocity.

The company had no CTO or architect to make foundational decisions, so all communication went directly through the CEO and CMO. With release cycles of one day or less, accompanied by dozens of micro-calls daily, the lack of technical oversight combined with a junior development team generated hours of patching work. Because this business can tolerate only a few minutes of downtime, these issues constantly put production at risk.

Challenge #2: Hard Deadlines and Feature Blockers.

Migrating to EKS, rebuilding the CI/CD pipelines, and establishing baseline security best practices amounted to a complete rewrite of the infrastructure - all while new features were being developed in parallel, which made the deadlines extremely tight.

Challenge #3: Unreliable Pipelines.

The outdated approach produced a constant stream of errors, so the speed and quality delivered by previous DevOps teams fell far short of the client's expectations.

Our Strategy: Seamless Migration Instead of Patching

Rather than attempting to patch the old system, we chose a lower-risk strategy: we built the new architecture in parallel with the existing one so developers were never blocked, and only then performed the migration.

  • Cost Optimization (FinOps): Deployed Karpenter to manage mixed node groups combining cost-effective Spot instances with standard On-Demand instances.
  • Security Excellence: Configured full RBAC for cluster access, granular IAM permissions for service-level AWS resource access, and traffic flow control via Istio service mesh.
  • Observability: Integrated Datadog APM, which provided developers with an intuitive interface and significantly streamlined the troubleshooting process.

Engineering Journey: The "Quick Win" vs. Security Dilemma 

During the Discovery phase, our team faced a serious architectural decision. The first option - a quick-win strategy - was to improve the existing AWS ECR/ECS legacy architecture to accelerate go-live as much as possible, giving the client a chance to start generating revenue that would fund further improvements. The second option was to rewrite everything from scratch on Amazon EKS. We made a compelling case to the client that staying on the old architecture risked compromising the integrity and stability of production from day one - potentially costing them the trust of their earliest users. The client decided not to take that risk: we stood up EKS clusters in parallel with the old system and carried out a completely seamless migration. Only after that did the focus shift to post-migration tasks: security hardening, cost optimization, and bug fixes.

Architectural Leadership and Lessons Learned

This was the fastest SDLC we had ever encountered: release cycles of one day or less, accompanied by dozens of micro-calls daily. Since the client had neither a dedicated architect nor a CTO, and their development team was relatively junior, we had to introduce a practice of close code review. This decision saved production from many hours of patching and downtime. The key lesson learned: under these conditions, it's essential to push the client from day one to hire a dedicated architect. Also worth highlighting is our close collaboration with the client's AI team, which saved the business from serious reputational damage. Knowledge transfer was virtually absent on the client side, but we successfully ran a fast internal onboarding when a second DevOps engineer joined the project.

Full Technology Stack and Killer Features

We achieved 100% infrastructure-as-code coverage via Terraform and fully implemented a GitOps approach through ArgoCD. This made it possible to roll out changes within minutes and reduced rollbacks to a simple, safe git revert.
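
In a GitOps setup, Git holds the desired state of the cluster and a controller such as ArgoCD continuously reconciles the live state toward it, which is why a rollback can be an ordinary commit. The following is a toy Python model of that reconcile loop, not ArgoCD's implementation; the ToyCluster class, the service names, and the image tags are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical live state: service name -> deployed image tag."""
    live: dict[str, str] = field(default_factory=dict)

    def reconcile(self, desired: dict[str, str]) -> list[str]:
        """Converge the live state to the desired state read from Git.

        Returns a list of the actions taken, in a stable order.
        """
        actions = []
        # Create or update services whose desired image differs from what is live.
        for name, image in sorted(desired.items()):
            if self.live.get(name) != image:
                actions.append(f"apply {name}={image}")
                self.live[name] = image
        # Prune services that are running but no longer declared in Git.
        for name in sorted(set(self.live) - set(desired)):
            actions.append(f"prune {name}")
            del self.live[name]
        return actions


# Git history modeled as a list of commits, each holding the full desired state.
history = [
    {"api": "api:1.4.0", "worker": "worker:2.1.0"},
    {"api": "api:1.5.0", "worker": "worker:2.1.0"},  # a bad release
]

cluster = ToyCluster()
for commit in history:
    cluster.reconcile(commit)

# Reverting the latest commit adds a new commit that restores the previous desired state.
history.append(dict(history[-2]))
print(cluster.reconcile(history[-1]))  # ['apply api=api:1.4.0']
```

Because the revert is just another commit, the rollback travels through the same reviewed, auditable path as any forward change, and the controller only touches what actually differs.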

  • Core Infrastructure: The foundation consists of 2 isolated Amazon EKS clusters. Network communication and traffic management are handled by Istio service mesh, which also provides mTLS encryption and deep observability.
  • AWS Services Integration: We configured VPC Peering for secure connectivity with the MongoDB Atlas database. Also integrated: Amazon CloudFront (CDN), AWS WAF, and the AWS Load Balancer Controller (ALB/NLB), with AWS Security Hub and Amazon Inspector handling continuous vulnerability monitoring.
  • Extended FinOps Stack: Node management is fully delegated to Karpenter, which automatically provisions mixed On-Demand/Spot instance groups - delivering up to 80% savings on Spot instances. The auto-consolidation (bin-packing) feature further reduced node count by 30–35%. For additional savings, we migrated workloads to Graviton ARM64 processors (r8g family), which proved ~20% cheaper than comparable x86 instances.
  • Observability & Security Ecosystem: While the system supports Prometheus, Grafana, and OpenTelemetry for comprehensive monitoring, one of the standout decisions was to adopt Datadog APM instead of the standard Prometheus/Grafana stack as the developer-facing tool, because its interface best suited the developers' needs. Sensitive data management is automated via External Secrets Operator integrated with AWS Secrets Manager, and TLS certificate issuance is handled by cert-manager.
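
Consolidation is, at its core, a bin-packing problem: fit pod resource requests onto as few nodes as possible so that emptied nodes can be removed. The sketch below uses the classic first-fit-decreasing heuristic on CPU requests only, purely to illustrate why repacking frees nodes. It is not Karpenter's actual consolidation algorithm, which also accounts for memory, instance pricing, and disruption constraints; the node size and pod requests are hypothetical.

```python
def first_fit_decreasing(requests_m: list[int], node_capacity_m: int) -> list[list[int]]:
    """Pack CPU requests (millicores) onto nodes using first-fit decreasing.

    Returns one list of pod requests per node.
    """
    if any(r > node_capacity_m for r in requests_m):
        raise ValueError("a pod request exceeds node capacity")
    nodes: list[list[int]] = []
    free: list[int] = []  # remaining capacity per node, parallel to `nodes`
    # Placing the largest pods first tends to leave less fragmented free space.
    for req in sorted(requests_m, reverse=True):
        for i, room in enumerate(free):
            if req <= room:
                nodes[i].append(req)
                free[i] -= req
                break
        else:
            # No existing node has room: provision a new one.
            nodes.append([req])
            free.append(node_capacity_m - req)
    return nodes


# Hypothetical CPU requests that were scattered across 5 lightly loaded 4-vCPU nodes.
pods = [1500, 1200, 1000, 900, 800, 700, 600, 500, 400, 300]
packed = first_fit_decreasing(pods, node_capacity_m=4000)
print(len(packed))  # 2
```

In this toy case the same pods fit on 2 nodes instead of 5; real-world gains are smaller because memory, affinity rules, and disruption budgets all limit how tightly pods can be packed.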

Business Impact of the Migration

With the move to the new architecture, infrastructure stopped being a blocker and became a driver for the development team - providing clear visibility into cloud costs and handling significant load spikes with near-zero downtime. We delivered the following key results:

Development Velocity

Thanks to GitOps automation, deployment frequency increased 4x. Deployments now run with zero downtime, and rollbacks execute instantly via git - reducing mean time to recovery (MTTR) by 82%.
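
For reference, MTTR is the average time from a failure to restored service, so an 82% reduction means the new MTTR is 18% of the old one. A minimal Python sketch of that arithmetic; the incident durations are hypothetical and are not the client's data.

```python
def mttr_minutes(incident_durations_min: list[float]) -> float:
    """Mean time to recovery: average minutes from failure to restored service."""
    if not incident_durations_min:
        raise ValueError("need at least one incident")
    return sum(incident_durations_min) / len(incident_durations_min)


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Hypothetical incident durations in minutes (not the client's real data).
before = mttr_minutes([55, 40, 65, 40])  # 50.0
after = mttr_minutes([10, 8, 9, 9])      # 9.0
print(round(percent_reduction(before, after)))  # 82
```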

Cost Reduction

Running Spot instances, with Karpenter handling Spot interruptions automatically, delivered up to 80% in compute savings. Karpenter's auto-consolidation reduced total node count by 30–35%. Migrating to Graviton ARM64 processors proved ~20% cheaper than x86. Combined, these steps produced $45,600 in annual savings.
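
When several cost measures apply to the same spend, their percentages compound rather than add: each reduction applies to what is left after the previous one. A minimal Python sketch of that arithmetic, assuming the measures are independent and act on the same compute bill; the baseline and the percentages are hypothetical, not the client's figures. The last line simply converts the reported $45,600/year into a monthly figure.

```python
def compound_savings(baseline: float, reductions: list[float]) -> float:
    """Apply fractional reductions one after another; return the total amount saved.

    Assumes each reduction applies to the spend remaining after the previous ones.
    """
    remaining = baseline
    for r in reductions:
        if not 0 <= r < 1:
            raise ValueError("reductions must be fractions in [0, 1)")
        remaining *= 1 - r
    return baseline - remaining


# Hypothetical monthly compute bill and per-measure reductions (not client data).
baseline = 10_000.0
saved = compound_savings(baseline, [0.30, 0.20])  # e.g. fewer nodes, then cheaper nodes
print(round(saved, 2))  # 4400.0 (a naive sum of 30% + 20% would claim 5000)

# The reported annual savings expressed per month.
print(45_600 / 12)  # 3800.0
```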

Reliability & Security

99.7% uptime achieved across all services, with Change Failure Rate (CFR) reduced by 4%. The infrastructure underwent deep security hardening, enabling the company to successfully obtain ISO 27001 certification. These rigorous security standards now serve as a solid foundation for the next compliance milestone: SOC 2.
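
An availability figure translates directly into a downtime budget. A minimal Python sketch of that conversion for the 99.7% figure, assuming a 30-day month and a 365-day year:

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, days: int) -> float:
    """Maximum downtime (minutes) allowed over `days` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_pct / 100) * days * MINUTES_PER_DAY


print(round(downtime_budget_minutes(99.7, 30), 1))   # 129.6 minutes per 30-day month
print(round(downtime_budget_minutes(99.7, 365), 1))  # 1576.8 minutes (~26.3 hours) per year
```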

Want to speed up deployments and cut your infrastructure bill?

We'll implement reliable GitOps practices for your product.

Let's arrange a free consultation

Just fill out the form below, and we will contact you via email to arrange a free call to discuss your project and estimates.