
We left this block for those who love technical depth. Here is exactly how we migrated a huge financial product from AWS Lambda to Kubernetes while preserving every transaction.
Architectural Debates: Why EKS and not the simpler ECS?
At the start of the project, we faced a choice: where to migrate? The client's team had relevant experience working with Amazon ECS but not much understanding of EKS. It seemed logical to choose ECS—it is much simpler to set up, and migrating to it would have taken less time. However, we convinced the team to take the more difficult route. Our arguments:
- Scale dictates the rules: At large scale (terabytes of data, dozens of services, millions of requests) it is too late to think about "simplicity." You need tools that provide maximum control.
- Tailored approach: With ECS, you are limited to the functionality Amazon provides. EKS is more complex, but with the right expertise it delivers a result configured precisely to the product's needs.
- Money: Most importantly, EKS allows the most flexible configuration of cost-optimized autoscaling. At these volumes, proper optimization made EKS 37% cheaper.
The Biggest Technical Challenge: State Handling
The most difficult stage of the project was migrating and configuring multi-regional replication for the stateful databases (the Hydra OAuth store and CockroachDB). This process required extremely careful state handling. The strictest condition was that all of this had to happen with zero downtime, so that wallet users noticed nothing. Furthermore, we had to very cautiously transition the native AWS event-driven logic to Kafka.
Absolute Automation: GitOps, ArgoCD, and Argo Rollouts
We completely abandoned manual cluster management and introduced a strict GitOps enforcement approach:
- Git as the Single Source of Truth: All deployments are automated and launched exclusively from git commits. We completely excluded manual infrastructure changes via kubectl. If something is not in the code, it does not exist in the system.
- Managing Multiple Clusters: To deploy configurations to all 6 clusters in different regions at once, we used ArgoCD ApplicationSets.
- Zero-Downtime Releases: To ensure updates were absolutely invisible to the end user, we implemented Argo Rollouts. This allowed us to perform safe canary deployments.
- Safe Rollbacks: Thanks to GitOps, rolling back the system after an error is now elementary: a simple git revert, instead of the complex Lambda version management we had before.
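As an illustration, the multi-cluster setup described above can be sketched with an ApplicationSet that fans one application out to every cluster registered in ArgoCD. All names here (the payments-api app, repo URL, labels) are hypothetical, not our production values:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments-api            # hypothetical application name
  namespace: argocd
spec:
  generators:
    # The cluster generator emits one Application per registered
    # cluster whose secret matches the label selector.
    - clusters:
        selector:
          matchLabels:
            env: production     # hypothetical label on the cluster secrets
  template:
    metadata:
      name: 'payments-api-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://git.example.com/platform/deploy.git  # hypothetical repo
        targetRevision: main
        path: apps/payments-api
      destination:
        server: '{{server}}'    # injected by the cluster generator
        namespace: payments
      syncPolicy:
        automated:
          prune: true           # delete resources removed from git
          selfHeal: true        # revert manual drift, enforcing GitOps
```

With selfHeal enabled, any out-of-band kubectl change is reverted to the state in git, which is what makes the "single source of truth" rule enforceable rather than aspirational.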
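The canary releases mentioned above follow the standard Argo Rollouts pattern: shift a small slice of traffic to the new version, pause and observe, then promote. A minimal sketch; the service name, weights, and pause durations are illustrative, not our production settings:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api            # hypothetical service
spec:
  replicas: 6
  strategy:
    canary:
      steps:
        - setWeight: 5          # send 5% of traffic to the new version
        - pause: {duration: 10m}
        - setWeight: 25
        - pause: {duration: 30m}
        - setWeight: 50
        - pause: {}             # wait for manual promotion before 100%
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: registry.example.com/payments-api:1.2.3  # hypothetical image
```

If metrics degrade during a pause, the rollout is aborted and traffic returns to the stable version, which is what keeps releases invisible to end users.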
The Ultimate Technology Stack
- Cloud & Compute: 6 EKS clusters in 2 AWS regions. The ca-central-1 region holds Production, Staging, Load Testing, and Tooling, while us-east-1 is allocated for Development and Data.
- Cost Ops: Smart autoscaling with Karpenter, mixing Spot and On-Demand instances, plus a transition to Graviton ARM64 processors (the r8g family). Automatic consolidation (bin-packing) via Karpenter reduced the number of nodes by 30-35%.
- FinOps & Cost Management: Costs for the old Lambda setup were opaque, so we implemented Cloud Intelligence Dashboards (CID), Cost and Usage Reports (CUR 2.0), and QuickSight. Costs are now attributed down to the level of an individual namespace.
- Data & Mesh: Istio (mTLS, traffic management, circuit breaking) is responsible for service communication. Data layer: Amazon RDS Aurora (multi-region), DynamoDB (global tables), ElastiCache (Valkey), MSK (Kafka), and CockroachDB.
- Security & Observability: Datadog (APM), Prometheus, Grafana, OpenTelemetry for comprehensive monitoring. Security is handled by External Secrets Operator, AWS Secrets Manager, Cert-manager, Kyverno (policy enforcement), and Cloudflare Tunnel for zero-trust access.
- IaC: 100% of the infrastructure is defined in Pulumi (TypeScript): 67 micro-stacks and a custom component library.
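To make the Cost Ops item concrete: a Karpenter NodePool that mixes Spot and On-Demand capacity on Graviton (ARM64, r8g family) and enables consolidation could look roughly like this. This is a sketch against the Karpenter v1 NodePool API; the NodeClass reference and CPU limit are illustrative:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-arm64
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                    # hypothetical EC2NodeClass
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]              # Graviton only
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Karpenter prefers Spot when available
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["r8g"]
  disruption:
    # Consolidation is the bin-packing behavior: Karpenter repacks pods
    # and removes underutilized nodes.
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
  limits:
    cpu: "1000"                          # illustrative cap on total provisioned CPU
```

Listing both capacity types in one requirement lets Karpenter fall back to On-Demand when Spot capacity is unavailable, which is the usual way to trade cost against interruption risk.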
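The Istio circuit breaking mentioned in the Data & Mesh item is typically expressed as a DestinationRule combining connection pool limits with outlier detection. A hedged sketch with an illustrative host and thresholds (not our production values):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments-api
spec:
  host: payments-api.payments.svc.cluster.local  # hypothetical service
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL           # mTLS between sidecars
    connectionPool:
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 10
    outlierDetection:              # the circuit breaker
      consecutive5xxErrors: 5      # eject an endpoint after 5 consecutive 5xx
      interval: 30s                # how often endpoints are analyzed
      baseEjectionTime: 1m         # minimum ejection duration
      maxEjectionPercent: 50       # never eject more than half the pool
```

Ejected endpoints are removed from the load-balancing pool for the ejection window, so a misbehaving pod degrades gracefully instead of dragging down upstream callers.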