Zero Downtime: Crypto Wallet Migration to EKS

How we eliminated outages during 500% load spikes and reduced infrastructure costs by $45,600 per year



A crypto trading app: a seamless migration from AWS Lambda to Amazon EKS (100% IaC on Pulumi) with zero downtime.

TEAM

DevOps engineers

PERIOD OF COLLABORATION

5 months (April – September 2024)

CLIENT’S LOCATION

USA and Canada

When Marketing Success Becomes an Infrastructure Challenge

Our client, a leading crypto wallet processing terabytes of data, relied entirely on an AWS Lambda architecture. While stable under normal conditions, the system critically failed during peak marketing activities.

During free crypto giveaways, instant load spikes of 300–500% caused the infrastructure to crash for 5–10 minutes. For a financial product of this level, even a few seconds of downtime destroys user trust, making an immediate architectural overhaul absolutely necessary.

Project Metrics


-37% AWS costs (actual savings of $45,600 per year)


3.4x increase in deployment frequency and 0-second "cold starts"


99.97% Uptime across two regions simultaneously (measured over 4 months)


Why the "Serverless" Architecture Stopped Working

The problem wasn't just the downtime itself, but the deep architectural limits of the old system. The cost of downtime was too high, and to stabilize the product we had to eliminate three fundamental technical pain points at once:

1

Challenge #1: Databases were "suffocating"

During promotional crypto giveaways, traffic spiked by 300–500%, instantly spinning up thousands of concurrent AWS Lambda functions. Because the legacy event-driven architecture relied on these stateless functions, each instance opened its own direct connection to the database at the same moment, creating an uncontrolled flood of connection requests. With no pooling or traffic-distribution mechanism in between, database resources were exhausted within seconds: the database could not keep up with the query volume, choked under the pressure, and brought the entire platform down for 5–10 minutes at the exact moment user activity peaked.
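The arithmetic behind that connection storm is worth spelling out. The sketch below uses illustrative numbers of our own (the baseline concurrency and the connection cap are assumptions, not the client's real limits), but it shows why a 500% spike in stateless functions translates directly into an exhausted database:

```typescript
// Back-of-the-envelope model of the Lambda connection storm.
// All numbers are illustrative assumptions, not the client's actual limits.

const baselineConcurrency = 400;  // concurrent Lambda executions under normal load
const spikeMultiplier = 5;        // the 500% giveaway spike
const connectionsPerInstance = 1; // each stateless function opens its own connection
const dbMaxConnections = 1000;    // a typical managed-database connection cap

const peakConnections =
  baselineConcurrency * spikeMultiplier * connectionsPerInstance;

console.log(`Peak connection demand: ${peakConnections}`); // 2000
console.log(`Database cap: ${dbMaxConnections}`);
console.log(`Cap exceeded: ${peakConnections > dbMaxConnections}`); // true
```

Long-lived pods on EKS avoid this failure mode because a fixed set of replicas shares pooled connections, so peak demand is bounded by pool size rather than by however many instances the platform decides to spin up.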

2

Challenge #2: Technical Limits and Slowness

The architecture capped every process at Lambda's 15-minute execution limit. Furthermore, some functions had "cold starts" of up to 20 seconds, and building a reliable multi-region infrastructure on such a setup was extremely difficult.

3

Challenge #3: High and "Blind" Costs

Running long-lived processes on Lambda was significantly more expensive than on regular servers: the cloud bill came to roughly $10,200 per month. Worse, the costs were opaque, and the business didn't fully understand where exactly the money was going.

Rescue Strategy: From Quick Fixes to Global Migration

Understanding the criticality of the situation, we did not immediately rewrite the entire system, but instead applied a cautious two-step approach. We needed to stop the crashes as quickly as possible and only then build a reliable foundation for the future.

After stabilization, we began the step-by-step migration of dependent services to Amazon EKS (Kubernetes). To ensure maximum reliability (SLA) and reduce user latency, we immediately deployed the architecture across two regions (multi-region), which guaranteed uninterrupted operation even in the event of local AWS failures. We also implemented smart autoscaling and gave developers the ability to create ephemeral environments (Preview environments) for secure code testing.

We left this block for those who love technical depth. Here is exactly how we migrated a huge financial product from AWS Lambda to Kubernetes while preserving every transaction.

Architectural Debates: Why EKS and not the simpler ECS?

At the start of the project, we faced a choice: where to migrate? The client's team had relevant experience working with Amazon ECS but not much understanding of EKS. It seemed logical to choose ECS—it is much simpler to set up, and migrating to it would have taken less time. However, we convinced the team to take the more difficult route. Our arguments:

  • Scale dictates the rules: At large scale (terabytes of data, dozens of services, millions of requests) it is too late to think about "simplicity." You need tools that provide maximum control.
  • Tailored approach: With ECS, you are limited to the functionality Amazon provides. EKS is more complex, but with the right expertise it allows a result tailored precisely to the workload.
  • Money: Most importantly, EKS allows the most flexible configuration of cost-optimized autoscaling. At these volumes, proper optimization makes EKS 37% cheaper.

The Biggest Technical Challenge: State Handling

The most difficult stage of the project was migrating and configuring multi-region replication for the stateful data stores behind Hydra OAuth and for CockroachDB. This process required extremely careful state handling. The strictest condition was that all of it had to happen with zero downtime, so that wallet users noticed nothing. On top of that, we had to very cautiously transition the native AWS event-driven logic to Kafka.

Absolute Automation: GitOps, ArgoCD, and Argo Rollouts

We completely abandoned manual cluster management and introduced a strict GitOps enforcement approach:

  • Git as the Single Source of Truth: All deployments are automated and launched exclusively from git commits. We completely excluded manual infrastructure changes via kubectl. If something is not in the code, it does not exist in the system.
  • Managing Multiple Clusters: To deploy configurations to all 6 clusters across our regions at once, we used ArgoCD ApplicationSets.
  • Zero-Downtime Releases: To ensure updates were absolutely invisible to the end user, we implemented Argo Rollouts. This allowed us to perform safe canary deployments.
  • Safe Rollbacks: Thanks to GitOps, rolling the system back after an error is now elementary: a simple git revert (instead of the complex Lambda version management we had before).
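To make the fan-out concrete, here is a small sketch of what an ApplicationSet list generator does under the hood: it renders one Application per cluster entry from a single template. The cluster names and servers below are hypothetical placeholders, not the client's real inventory:

```typescript
// Sketch of an ArgoCD ApplicationSet list generator: one Application is
// rendered per cluster entry from a single template.
// Cluster names and servers are hypothetical placeholders.

interface ClusterEntry {
  name: string;
  server: string;
}

interface RenderedApp {
  name: string;
  destServer: string;
  path: string;
}

function renderApplications(
  appName: string,
  path: string,
  clusters: ClusterEntry[]
): RenderedApp[] {
  // Equivalent to templating `{{name}}` / `{{server}}` in an ApplicationSet.
  return clusters.map((c) => ({
    name: `${appName}-${c.name}`,
    destServer: c.server,
    path,
  }));
}

// Six clusters across two regions, mirroring the layout described below.
const clusters: ClusterEntry[] = [
  { name: "prod-ca", server: "https://prod-ca.example" },
  { name: "staging-ca", server: "https://staging-ca.example" },
  { name: "loadtest-ca", server: "https://loadtest-ca.example" },
  { name: "tooling-ca", server: "https://tooling-ca.example" },
  { name: "dev-us", server: "https://dev-us.example" },
  { name: "data-us", server: "https://data-us.example" },
];

const apps = renderApplications("wallet-api", "deploy/wallet-api", clusters);
console.log(apps.length); // 6: one commit fans out to every cluster
```

With this model, adding a seventh cluster is a one-line change to the generator list in git, and ArgoCD reconciles the new Application automatically.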

The Ultimate Technology Stack

  • Cloud & Compute: 6 EKS clusters in 2 AWS regions. The ca-central-1 region holds Production, Staging, Load Testing, and Tooling, while us-east-1 is allocated for Development and Data.
  • Cost Ops: Smart autoscaling with Karpenter, mixing Spot and On-Demand instances, plus a move to Graviton ARM64 processors (r8g family). Automatic node consolidation (bin-packing) via Karpenter reduced the number of nodes by 30-35%.
  • FinOps & Cost Management: Costs on the old Lambda setup were opaque, so we implemented Cloud Intelligence Dashboards (CID), Cost and Usage Reports (CUR 2.0), and QuickSight. Costs are now attributed down to the individual namespace.
  • Data & Mesh: Istio (mTLS, traffic management, circuit breaking) is responsible for service communication. Data layer: Amazon RDS Aurora (multi-region), DynamoDB (global tables), ElastiCache (Valkey), MSK (Kafka), and CockroachDB.
  • Security & Observability: Datadog (APM), Prometheus, Grafana, OpenTelemetry for comprehensive monitoring. Security is handled by External Secrets Operator, AWS Secrets Manager, Cert-manager, Kyverno (policy enforcement), and Cloudflare Tunnel for zero-trust access.
  • IaC: 100% of the infrastructure in Pulumi (TypeScript): 67 micro-stacks and a custom component library.
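As an illustration of how those cost levers fit together, below is a sketch of a Karpenter NodePool combining Spot-first capacity, Graviton (arm64, r8g) instances, and consolidation. The field names follow Karpenter's NodePool API, but the values and the pool name are our own illustrative assumptions; in the real setup such resources are defined through Pulumi rather than written by hand:

```typescript
// Sketch of a Karpenter NodePool expressing the cost levers described above:
// Spot-first capacity, Graviton (arm64, r8g family), and consolidation.
// Field names follow Karpenter's NodePool API; the values and the pool
// name are illustrative, not taken from the client's real configuration.

const nodePool = {
  apiVersion: "karpenter.sh/v1",
  kind: "NodePool",
  metadata: { name: "wallet-general" }, // hypothetical name
  spec: {
    template: {
      spec: {
        requirements: [
          // Prefer Spot, fall back to On-Demand when Spot is reclaimed.
          { key: "karpenter.sh/capacity-type", operator: "In", values: ["spot", "on-demand"] },
          // Graviton ARM64 only, restricted to the r8g family.
          { key: "kubernetes.io/arch", operator: "In", values: ["arm64"] },
          { key: "karpenter.k8s.aws/instance-family", operator: "In", values: ["r8g"] },
        ],
      },
    },
    disruption: {
      // Consolidation (bin-packing) is what delivers the 30-35% node reduction.
      consolidationPolicy: "WhenEmptyOrUnderutilized",
    },
  },
};

const capacity =
  nodePool.spec.template.spec.requirements.find(
    (r) => r.key === "karpenter.sh/capacity-type"
  )?.values ?? [];
console.log(capacity.join(",")); // spot,on-demand
```

Listing both spot and on-demand lets Karpenter fall back to On-Demand capacity when Spot is reclaimed, which is what keeps the discount from turning into downtime.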

An Honest Look (Lessons Learned)

We believe in transparency, so we are sharing what worked and what we would have done differently:

  • What worked perfectly: Allocating a full 2 months (April–June 2024) to build a reliable EKS foundation before application migration. Phased migration: we started with non-critical services (Airbyte), and only then moved the critical Hydra OAuth. Early deployment of monitoring systems (Datadog, Prometheus) also helped a lot.
  • What we would change: We should have designed the multi-region architecture from day one; adding it later required complex refactoring. Likewise, the cost-saving tools (Karpenter, Graviton) and FinOps dashboards should have been enabled from the start to give the client immediate cost transparency and savings.

Final Results: Speed, Economy, and Stability

This architectural transformation did not just solve the crashing problem—it fundamentally changed how the business operates with infrastructure. We not only eliminated all bottlenecks but also ensured absolute budget transparency and created an environment where developers can release new features exponentially faster. Today, the crypto wallet continues to grow actively, and our team remains a reliable partner. The process of cost optimization and improved monitoring is now an ongoing joint initiative.

1

Development Speed (Velocity)

We completely eliminated the 15-minute execution limit and "cold start" delays.
Thanks to the transition to GitOps, the release frequency increased by 3.4 times, and the Mean Time To Recovery (MTTR) decreased by 68%—now system rollback is a simple git revert.

2

Reliability for the Business

The system achieved 99.97% uptime. All updates now occur without a single second of downtime (zero-downtime). Developers received a powerful, easily scalable platform, which allowed the client to implement complex technologies such as Airflow, Flink, and CockroachDB.

3

Budget Optimization (FinOps)

Overall costs fell by 37%, saving the client $45,600 per year (the bill decreased from $10,200 to ~$6,400/month). The use of Spot Instances via Karpenter provided up to an 80% discount, Graviton ARM64 added ~20% savings, and smart node packing reduced their total number by 30-35%. The problem of "blind" costs was 100% solved.

Let's arrange a free consultation

Just fill out the form below and we will contact you via email to arrange a free call to discuss your project and an estimate.