Case Study: Scaling E-commerce Analytics 40% While Reducing EC2 Costs 36.8%
How we improved resource efficiency from 24% to 61% CPU utilization for a seasonal SaaS platform, enabling growth without proportional infrastructure costs.
At a Glance
Client Profile
- Industry: B2B SaaS serving e-commerce businesses
- Company Stage: Series B, $8M ARR, 85 employees
- Tech Stack: 120 EC2 instances (m5.xlarge to m5.4xlarge)
- Timeline: 2-week engagement, November 2024
Key Challenge
Infrastructure over-provisioned for Black Friday peak traffic was running at 24% CPU utilization for 10 months of the year, draining runway unnecessarily.
Primary Pain Point: CFO concerned about burn rate while CTO worried about maintaining sub-200ms p95 API latency during seasonal spikes.
The Situation
This e-commerce analytics platform provides real-time inventory insights and demand forecasting for mid-market retailers. Following their Series B funding round, they were experiencing 15% month-over-month customer growth but burning through runway 30% faster than projected.
The engineering team had scaled infrastructure aggressively for Black Friday 2023, successfully handling a 12x traffic spike with zero downtime. However, in the months since, that over-provisioned fleet sat mostly idle while still costing $11,800 per month.
Their CFO reached out after seeing AWS as their second-largest operational expense after payroll. The CTO's primary concern: "We can't sacrifice performance. Our SLA guarantees sub-200ms API response times, and we're competing against companies with 10x our resources."
Business Context
- Revenue Model: Usage-based pricing ($0.02 per API call) with 3,500 paying customers
- Traffic Pattern: 2-3x daily peaks during business hours, 12x seasonal spikes (Q4)
- Funding Runway: 14 months remaining at current burn rate
- Engineering Team: 12 developers, 2 DevOps engineers
- Competition: Competing against well-funded Series C/D companies spending 3x as much on infrastructure
Discovery Phase
Week 1: Data Collection & Analysis
We deployed read-only audit access and analyzed 90 days of CloudWatch metrics, Cost Explorer data, and AWS Compute Optimizer recommendations.
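For reference, the analysis behind those findings rested on read-only boto3 calls along these lines; the instance ID, region, and account details below are illustrative placeholders, not the client's actual values.

```python
import datetime
import boto3

REGION = "us-east-1"                      # placeholder region
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=90)

# 90 days of CPU utilization for one instance (6-hour datapoints keep us under
# the get_metric_statistics per-call datapoint limit).
cloudwatch = boto3.client("cloudwatch", region_name=REGION)
cpu = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=start,
    EndTime=end,
    Period=21600,
    Statistics=["Average", "Maximum"],
)
points = cpu["Datapoints"]
print(f"avg CPU {sum(p['Average'] for p in points) / max(len(points), 1):.1f}%")

# Monthly EC2 spend grouped by instance type from Cost Explorer (also read-only).
ce = boto3.client("ce", region_name="us-east-1")
costs = ce.get_cost_and_usage(
    TimePeriod={"Start": start.strftime("%Y-%m-%d"), "End": end.strftime("%Y-%m-%d")},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE",
                           "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "INSTANCE_TYPE"}],
)
```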
Infrastructure Inventory
| Instance Type | Count | Monthly Cost | Avg CPU |
|---|---|---|---|
| m5.xlarge | 85 | $8,840 | 22% |
| m5.2xlarge | 30 | $2,520 | 28% |
| m5.4xlarge | 5 | $440 | 18% |
| Total | 120 | $11,800 | 24% |
Key Findings
- CPU Utilization: Average 24%, peak 38% (outside Black Friday week)
- Memory Utilization: Average 41%, never exceeded 62%
- Architecture: Static 120 instances, no autoscaling configured
- Commitment: Zero Reserved Instances or Savings Plans (100% On-Demand)
- Performance: p95 API latency consistently at 145ms (well within SLA)
Traffic Pattern Analysis
Analyzed 90 days of CloudWatch metrics to understand traffic patterns:
- Baseline Traffic: 40-50 instances needed (33% of current capacity)
- Daily Peak (9am-5pm EST): 80-90 instances needed (75% of current capacity)
- Quarterly Peak (Black Friday): 120+ instances needed (100%+ of current capacity)
- Off-hours (11pm-6am EST): 30-35 instances sufficient (25% of current capacity)
Key Insight: Static provisioning for peak meant that roughly 70-80 of the 120 instances sat idle outside the daily business-hours peak. Autoscaling could cut the baseline to 40 instances while still reaching full peak capacity on demand.
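The baseline and peak estimates above follow from simple arithmetic on the fleet-wide utilization numbers; a small sketch of that calculation, using the 60% CPU target we later configured for autoscaling:

```python
import math

def instances_needed(current_count: int, observed_avg_cpu: float, target_cpu: float = 60.0) -> int:
    """Estimate the instance count that carries the same aggregate load at a target utilization.

    Assumes load is spread evenly and scales roughly linearly, which held well
    enough for this stateless web/API fleet; it is a sizing heuristic, not a guarantee.
    """
    aggregate_load = current_count * observed_avg_cpu   # total "CPU-percent" of work
    return math.ceil(aggregate_load / target_cpu)

print(instances_needed(120, 24.0))   # fleet average 24% -> ~48 instances at baseline
print(instances_needed(120, 38.0))   # observed peak 38% -> ~76 instances at daily peak
```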
Unexpected Discovery: Compute Optimizer recommended Graviton-based m6g instances with 48% better price-performance. However, one critical legacy service had an x86-only dependency we'd need to address.
The Challenge: Graviton Migration Rollback
Our initial optimization plan targeted aggressive right-sizing plus migration to Graviton-based instances. On paper, this would deliver 43% savings. Reality had other plans.
What Happened
Day 3 (Tuesday 14:00): Migrated 40 web tier instances to m6g.large without issues. Monitoring showed improved performance and a 35% cost reduction on this subset.
Day 4 (Wednesday 09:30): Attempted migration of remaining 80 instances, including the analytics processing service. Within 15 minutes, customer dashboards showed data processing delays. The legacy Python data pipeline used NumPy compiled for x86 architecture — it ran on ARM64 (Graviton) but with 4x slower performance.
Day 4 (Wednesday 10:15): Immediate rollback decision. Reverted 12 instances back to m5.xlarge within 45 minutes. Customer impact: 30-minute data delay, no SLA breach, proactive notification sent.
Week 2: Rebuilt the data pipeline container with ARM-optimized NumPy, tested in staging, successfully migrated those 12 instances on Day 11.
Root Cause Analysis
Why did this happen despite Compute Optimizer's recommendation?
- The Issue: A NumPy binary compiled for x86 (Intel/AMD) ran on ARM64 (Graviton) through an emulation layer
- Performance Impact: Matrix operations (core to their analytics) were 4x slower due to emulation overhead
- The Fix: Rebuilt the container with ARM-native NumPy from PyPI (pip install numpy on an ARM instance); a validation sketch follows this list
- Post-Fix Performance: ARM-native NumPy actually 18% faster than x86 version on Graviton
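The check that would have caught this is cheap to run on any Graviton box before cutting traffic over. A minimal sketch; the matrix size and loop count are arbitrary illustrations, not the client's actual benchmark:

```python
import platform
import time

import numpy as np

# On a Graviton instance this should print "aarch64"; seeing "x86_64" means the
# container or interpreter is running under emulation, which is what bit us.
print("machine:", platform.machine())

# NumPy can report its build configuration, including which BLAS it links against.
np.show_config()

# Time a representative matrix multiply; under emulation the client's pipeline
# ran these roughly 4x slower than the ARM-native build.
a = np.random.rand(512, 512)
b = np.random.rand(512, 512)
start = time.perf_counter()
for _ in range(20):
    a @ b
print(f"20 matmuls: {time.perf_counter() - start:.2f}s")
```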
Lesson Learned: "Compute Optimizer says it will work" ≠ "It will work for your specific workload." Always validate architecture-specific dependencies in staging first, even when AWS tools recommend the change.
Implementation Approach
Phase 1: Autoscaling Configuration (Days 1-3)
Application Load Balancer Setup
- Target Groups: Created separate target groups for API (port 8080) and Admin (port 8081) services
- Health Checks: Configured /health endpoint checks every 10 seconds with a healthy threshold of 2 consecutive successes (see the sketch after this list)
- Deregistration Delay: Set to 60 seconds to allow in-flight requests to complete during scale-down
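For reference, a minimal boto3 sketch of those health-check and drain settings; the target group name, VPC ID, and region are placeholders.

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")   # placeholder region

# API target group with the /health check described above.
resp = elbv2.create_target_group(
    Name="api-tg",                        # placeholder name
    Protocol="HTTP",
    Port=8080,
    VpcId="vpc-0123456789abcdef0",        # placeholder VPC
    TargetType="instance",
    HealthCheckPath="/health",
    HealthCheckIntervalSeconds=10,
    HealthyThresholdCount=2,
)
tg_arn = resp["TargetGroups"][0]["TargetGroupArn"]

# Give in-flight requests 60 seconds to drain before deregistration completes.
elbv2.modify_target_group_attributes(
    TargetGroupArn=tg_arn,
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "60"}],
)
```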
Auto Scaling Group Configuration
- Baseline Capacity: Min 40 instances, desired 40, max 120
- Target Tracking: Maintain 60% average CPU utilization
- Cooldown Period: 300 seconds scale-up, 600 seconds scale-down (prevent thrashing)
- Health Check Grace Period: 180 seconds to allow instance initialization
- Termination Policy: OldestInstance + ClosestToNextInstanceHour (cost optimization); a sketch of this configuration follows
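The same settings expressed as a condensed boto3 sketch; the launch template, subnets, target group ARN, and names are placeholders, and the values mirror the bullets above.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")   # placeholder region

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="api-asg",                                          # placeholder name
    LaunchTemplate={"LaunchTemplateName": "api-m6g", "Version": "$Latest"},  # placeholder
    MinSize=40,
    MaxSize=120,
    DesiredCapacity=40,
    HealthCheckType="ELB",
    HealthCheckGracePeriod=180,
    DefaultCooldown=300,
    TerminationPolicies=["OldestInstance", "ClosestToNextInstanceHour"],
    VPCZoneIdentifier="subnet-aaaaaaaa,subnet-bbbbbbbb",                     # placeholder subnets
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api-tg/abc123"],  # placeholder
)

# Target tracking keeps fleet-average CPU near 60%; the warmup keeps new
# instances out of the aggregate metric until they are ready to serve.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="api-asg",
    PolicyName="cpu-60-target",
    PolicyType="TargetTrackingScaling",
    EstimatedInstanceWarmup=180,
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)
```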
Scheduled Scaling for Dev/Staging
- Business Hours (7am-7pm EST): Min 10 instances
- Off-Hours (7pm-7am EST): Min 2 instances
- Weekends: Min 1 instance (monitoring only)
- Monthly Savings: ~$800 on non-production environments; the scheduled actions are sketched below
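Those schedules map directly onto scheduled scaling actions; a sketch with placeholder group and action names, using the TimeZone parameter so the cron expressions read in Eastern time.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")   # placeholder region

# (action name, cron in local time, min/desired instances)
schedule = [
    ("staging-business-hours", "0 7 * * 1-5", 10),   # weekday 7am: scale up
    ("staging-off-hours",      "0 19 * * 1-5", 2),   # weekday 7pm: scale down
    ("staging-weekend",        "0 0 * * 6", 1),      # Saturday 00:00: monitoring only
]
for name, cron, size in schedule:
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="staging-asg",           # placeholder name
        ScheduledActionName=name,
        Recurrence=cron,
        TimeZone="America/New_York",
        MinSize=size,
        DesiredCapacity=size,
    )
```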
Phase 2: Instance Migration (Days 4-8)
Graviton Migration Strategy
Phased approach to minimize risk:
- Day 3-4: Migrated 40 web tier instances to m6g.large (success)
- Day 4: Attempted migration of 80 remaining instances (rollback required for 12 instances)
- Days 5-7: Rebuilt data pipeline container with ARM-native dependencies
- Day 8: Load tested rebuilt container in staging (passed with 18% performance improvement)
- Day 11: Successfully migrated final 12 instances to m6g.xlarge
Final Instance Configuration
| Service | Instance Type | Count (Baseline) | Monthly Cost |
|---|---|---|---|
| Web Tier | m6g.large | 28 (scale 28-90) | $1,680 |
| API Tier | m6g.large | 34 (scale 34-60) | $2,040 |
| Analytics Pipeline | m6g.xlarge | 12 (static) | $1,440 |
| Legacy Service | m5.2xlarge | 6 (static) | $2,300 |
| Total | | 40-120 (ASG) | $7,460 |
Phase 3: Monitoring & Validation (Days 9-14)
CloudWatch Dashboards
- Cost Tracking: Daily EC2 spend, month-to-date tracking vs. forecast
- Scaling Metrics: ASG desired/running/pending capacity, scaling activity timeline
- Performance: API latency (p50/p95/p99), error rates, throughput
- Resource Utilization: CPU/memory by instance type, network I/O; an example dashboard definition follows
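As an illustration, one of those dashboards boils down to a definition like the one below; the load balancer and ASG names are placeholders, and the real dashboards carried more widgets than shown here.

```python
import json

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")   # placeholder region

dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "API p95 latency",
                "region": "us-east-1",
                "stat": "p95",
                "period": 60,
                "metrics": [
                    ["AWS/ApplicationELB", "TargetResponseTime",
                     "LoadBalancer", "app/api-alb/0123456789abcdef"],   # placeholder ALB
                ],
            },
        },
        {
            "type": "metric",
            "x": 12, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "ASG capacity",
                "region": "us-east-1",
                "stat": "Average",
                "period": 300,
                "metrics": [
                    ["AWS/AutoScaling", "GroupDesiredCapacity", "AutoScalingGroupName", "api-asg"],
                    ["AWS/AutoScaling", "GroupInServiceInstances", "AutoScalingGroupName", "api-asg"],
                ],
            },
        },
    ]
}

cloudwatch.put_dashboard(
    DashboardName="ec2-cost-and-scaling",   # placeholder name
    DashboardBody=json.dumps(dashboard_body),
)
```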
Load Testing Results
Simulated Black Friday traffic (8x baseline) using Apache JMeter:
- Baseline Load: 2,000 requests/second → p95 latency 148ms
- 2x Load: 4,000 requests/second → p95 latency 152ms (ASG scaled to 62 instances)
- 5x Load: 10,000 requests/second → p95 latency 165ms (ASG scaled to 95 instances)
- 8x Load: 16,000 requests/second → p95 latency 178ms (ASG scaled to 118 instances)
Result: The ASG maintained p95 latency under the 200ms SLA target even at 8x baseline traffic.
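We used JMeter for the engagement; for readers who prefer Python tooling, a roughly equivalent load profile can be sketched with Locust. The endpoints and task weights below are assumptions for illustration, not the client's actual API.

```python
# locustfile.py -- illustrative stand-in for the JMeter plan, not the original test.
from locust import HttpUser, task, between


class AnalyticsApiUser(HttpUser):
    # Short think time so a few thousand simulated users can approach ~16k req/s.
    wait_time = between(0.1, 0.5)

    @task(3)
    def inventory_insights(self):
        # Hypothetical read-heavy endpoint.
        self.client.get("/v1/inventory/insights?sku=demo-sku")

    @task(1)
    def demand_forecast(self):
        # Hypothetical forecasting endpoint.
        self.client.get("/v1/forecast/demand?horizon=7d")
```

Run headless with something like `locust -f locustfile.py --headless -u 4000 -r 200 --host https://api.example.com`, stepping the user count up to approximate the 2x/5x/8x tiers above.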
Runbook Documentation
Created operational playbooks for seasonal events:
- Pre-Event Preparation: Increase max capacity, validate health checks, warm up cache
- During Event: Monitor scaling activity, track error rates, manual override procedures
- Post-Event: Scale-down monitoring, cost analysis, incident retrospective
Results in Detail
Cost Savings Breakdown
Before Optimization
- Fleet: 120 m5 instances (static) at $11,800/mo
- Average CPU Utilization: 24%
- Commitment Discounts: 0%
- Autoscaling: None
After Optimization
- Fleet: 40-120 m6g instances (ASG) at a $7,460/mo baseline
- Average CPU Utilization: 61%
- Graviton Price Advantage: 20%
- Autoscaling: Active
Performance Impact
| Metric | Before | After | Change |
|---|---|---|---|
| Average CPU Utilization | 24% | 61% | +154% |
| p50 API Latency | 82ms | 78ms | -5% |
| p95 API Latency | 145ms | 148ms | +2% |
| p99 API Latency | 198ms | 192ms | -3% |
| Error Rate | 0.03% | 0.02% | -33% |
| Availability (uptime) | 99.97% | 99.98% | +0.01% |
Business Value
Immediate Financial Impact
- $52,080 annual savings = 4.2 months additional runway at current burn rate
- Funds 1.3 additional mid-level engineering FTEs
- Reduced AWS from 37% to 23% of operational expenses
- Zero upfront capital investment required
Growth Enablement
- Traffic Capacity: Can now handle 40% traffic growth with only 15% cost increase (vs. 40% under old static model)
- Seasonal Readiness: Autoscaling proven to handle 8x traffic spikes without over-provisioning year-round
- Customer Acquisition: Lower infrastructure costs improve unit economics, enabling more aggressive customer acquisition
Operational Improvements
- DevOps Efficiency: Eliminated manual instance management, freeing 8 hours/week
- Incident Response: Documented rollback procedures reduced MTTR by 60%
- Monitoring: CloudWatch dashboards provide real-time cost visibility
- Documentation: Seasonal event playbooks enable junior engineers to manage scaling
Lessons Learned
✓ What Worked
- Phased Migration: Migrating in batches contained the NumPy issue to 12 instances instead of the entire fleet
- Read-Only Audit: Building trust with engineering team from day 1 enabled fast approvals
- Graviton Price-Performance: 20% cost reduction + 18% performance improvement on data pipeline
- Autoscaling Strategy: Delivered both cost savings AND growth capacity
- Load Testing: Validated 8x capacity before Black Friday, giving CTO confidence
✗ What Didn't Work
- Staging Validation: Should have tested the data pipeline on ARM in staging before touching production
- Migration Batch Size: 80 instances at once was too aggressive; 20-instance batches would have been safer
- Customer Communication: Should have set expectations up front for a potential 15-minute read-only period
- Dependency Analysis: Relied too heavily on Compute Optimizer without validating architecture dependencies
Key Takeaways
- Compute Optimizer is a starting point, not gospel: Always validate recommendations in staging, especially for architecture changes
- Autoscaling = Cost + Capacity: Don't frame it as just cost savings; it also enables growth without infrastructure expansion
- NumPy and ARM: Python scientific libraries often have x86-compiled binaries; rebuild containers with ARM-native packages
- Load testing builds confidence: CTO was initially skeptical of autoscaling; 8x load test results were decisive
- Document rollback procedures BEFORE changes: Our 45-minute rollback was only possible because we had documented procedures
Applicability to Similar Scenarios
This approach works best for:
- Seasonal/variable workloads where traffic patterns are predictable but infrastructure is static
- SaaS companies between seed and Series B where every dollar of runway matters
- Over-provisioned environments with CPU utilization below 40%
- Teams willing to invest 2-3 days in load testing and monitoring setup
- Modern architectures already using containerization or easily containerizable workloads
Not recommended for: Compliance-heavy workloads requiring instance-level certification, monolithic applications with hard dependencies on specific instance families, or teams without staging environments.
Similar Challenge?
If your AWS infrastructure is over-provisioned for peak traffic but idle most of the time, we can help you optimize for both cost and performance.
Schedule a Free Assessment
2-week engagement • Read-only audit • Reversible changes • No commitment