savings-plans reserved-instances commitment E-commerce (Marketplace Platform) • Series B, 180-280 employees

Case Study: Optimizing $156K Monthly Compute Spend with Hybrid Commitment Strategy

How we reduced compute costs by 23.8% ($37,100/year saved) by rebalancing Reserved Instances and Savings Plans for a hybrid workload with steady-state and burst patterns

Monthly AWS Spend: $312,000
Cost Reduction: 23.8%
Timeline: 2 weeks
Published: Sun Jan 12 2025

At a Glance

Client Profile

  • Industry: E-commerce marketplace platform
  • Company Stage: Series B, $312,000/month AWS spend
  • Scale: 3.2M buyers and 48,000 sellers
  • Timeline: 2-week engagement, January 2025

Business Context

Series B capital efficiency focus: The board wants a 30% reduction in AWS spend before Series C. Compute is the largest cost category (50% of total AWS spend), and quick wins are needed to demonstrate operational discipline.

Primary Pain Point: Inconsistent commitment strategy. Coverage sits at 58%, built from a mix of 1-year and 3-year Reserved Instances purchased ad hoc over 4 years, which leaves 42% of compute spend at on-demand rates.

Monthly Compute Cost Reduction: 23.8% ($13,020 → $9,930/month)
Commitment Coverage: 58% → 89% (+31pp improvement)
Annual Savings: $37,100 (vs. previous strategy)

The Situation

The client's compute infrastructure included:

  • Database tier: 18 RDS instances (db.r5.2xlarge to db.r5.8xlarge)
  • Application tier: 85 EC2 instances (m5.xlarge to c5.4xlarge)
  • Search infrastructure: 12 Elasticsearch nodes (r5.2xlarge)
  • Background jobs: 24 spot instances + 12 on-demand instances (variable)

Business Context

  • Revenue Model: Transaction-based (2.5% fee) with 3.2M buyers and 48,000 sellers = $28M ARR
  • Growth Stage: Series B ($42M raised), 24 months to Series C
  • Team Structure: 240 total employees (38 engineering, 12 DevOps/SRE, 45 seller success)
  • Key Business Metrics: 99.9% marketplace uptime, <500ms search latency, 4.2% GMV take rate
  • Critical Constraints: Must maintain 24/7 database availability, search downtime = immediate GMV loss
  • Strategic Pressure: Board demands 30% AWS cost reduction before Series C — compute is 50% of AWS spend ($156K/month)

Current Commitment Strategy

Inherited from 4 years of ad-hoc purchasing:

Existing Commitments:
├─ Reserved Instances (RIs):
│ ├─ 8x db.r5.2xlarge (3-year Standard RIs, expires in 6 months)
│ ├─ 4x db.r5.4xlarge (1-year Standard RIs, expires in 2 months)
│ ├─ 28x m5.xlarge (1-year Convertible RIs, expires in 8 months)
│ ├─ 12x m5.2xlarge (3-year Standard RIs, expires in 18 months)
│ └─ 12x r5.2xlarge (1-year Standard RIs, expires in 4 months)
├─ Savings Plans: None
└─ On-demand: 42% of compute spend
Total Commitment Coverage: 58%

The Inefficiency

Analyzing 12 months of billing data revealed:

  1. RI waste: 6 Standard RIs unused (instances downsized, but RIs locked)
  2. Under-commitment: $5,420/month on-demand spend on steady-state workloads
  3. No flexibility: Standard RIs locked to specific instance types/sizes
  4. Fragmented strategy: Purchased reactively whenever someone remembered
  5. Missing Savings Plans: Lambda, Fargate, and cross-region EC2 not covered by RIs

AWS Cost Explorer Recommendations suggested $3,090/month savings opportunity.

Discovery Phase

Week 1: Commitment Audit & Workload Analysis

We analyzed 12 months of EC2, RDS, and Lambda usage:

Infrastructure Inventory

Workload Type | Instance Count | Instance Type | Monthly Cost | Current Coverage
RDS Primary | 6 | db.r5.4xlarge | $3,240 | 4 RIs (67%)
RDS Read Replicas | 12 | db.r5.2xlarge | $2,040 | 8 RIs (67%)
Application servers | 42 | m5.xlarge | $2,520 | 28 RIs (67%)
Search (Elasticsearch) | 12 | r5.2xlarge | $1,440 | 12 RIs (100%)
Web servers (burst) | 28-65 | m5.2xlarge | $1,680 | 12 RIs (43%)
Background workers | 12-36 | c5.xlarge | $840 | On-demand (0%)
Lambda | -- | Various | $840 | On-demand (0%)
Fargate | -- | Various | $620 | On-demand (0%)
Total | 112+ | -- | $13,220 | 58% coverage

Note: Monthly costs shown are on-demand equivalent. Actual spend with current RIs: $13,020/month
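
The per-row coverage figures can be reproduced from the table itself. The sketch below assumes coverage is simply RI count divided by instance count, using the low end of the range for the burst tiers; the variable and function names are illustrative and not part of the client's tooling.

```python
# Rows from the inventory table:
# (workload, baseline instance count, RIs held, monthly on-demand-equivalent cost in USD)
inventory = [
    ("RDS Primary", 6, 4, 3240),
    ("RDS Read Replicas", 12, 8, 2040),
    ("Application servers", 42, 28, 2520),
    ("Search (Elasticsearch)", 12, 12, 1440),
    ("Web servers (burst)", 28, 12, 1680),   # baseline of the 28-65 range
    ("Background workers", 12, 0, 840),      # baseline of the 12-36 range
    ("Lambda", 0, 0, 840),                   # serverless: no instance count
    ("Fargate", 0, 0, 620),
]


def coverage_pct(instances, ris):
    """RI count as a whole-number percentage of instance count (0 if nothing to cover)."""
    return round(100 * ris / instances) if instances else 0


row_coverage = {name: coverage_pct(n, ris) for name, n, ris, _ in inventory}
total_on_demand_equivalent = sum(cost for *_, cost in inventory)
total_instances = sum(n for _, n, _, _ in inventory)

print(row_coverage["Web servers (burst)"], total_on_demand_equivalent, total_instances)
```

The row percentages, the $13,220 total, and the 112 baseline instances all fall out of the raw counts, which makes the table easy to regenerate as the fleet changes.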

Workload Stability Analysis:
├─ Steady-state (predictable, 24/7):
│ ├─ RDS databases: 18 instances (99.8% uptime)
│ ├─ Core application servers: 42 instances (100% uptime)
│ ├─ Elasticsearch cluster: 12 instances (99.9% uptime)
│ └─ Total: 72 instances = 85% of baseline compute
├─ Variable (burst during sales):
│ ├─ Web servers: Scale 28 → 65 instances during flash sales
│ ├─ Background workers: Scale 12 → 36 instances during peak
│ └─ Total: Baseline 40, peak 101 instances
└─ Serverless (unpredictable):
  ├─ Lambda: $840/month (image processing, webhooks)
  └─ Fargate: $620/month (batch jobs)

Commitment Utilization

  • Over-committed: 6 RIs unused (downsized from r5.4xlarge → r5.2xlarge)
  • Under-committed: 42% of compute on-demand
  • No coverage: Lambda ($840/mo) and Fargate ($620/mo)

The Challenge: Over-Commitment on Burst Workload

What Went Wrong

Based on Cost Explorer recommendations, we initially modeled a strategy:

  1. Purchase Compute Savings Plan covering 95% of baseline usage
  2. Let expiring RIs roll off (don't renew)
  3. Use Savings Plan discount for both steady-state and burst workloads

We purchased $6,200/month Compute Savings Plan (1-year, All Upfront = $74,400 upfront payment).

Two weeks later, a problem emerged:

During off-peak season (post-holidays, January-February):

  • Flash sales frequency dropped 60% (2-3x per week → 1x per week)
  • Burst workload scaled down to baseline
  • Savings Plan over-committed: Using only $4,800/month of $6,200/month commitment
  • Wasted commitment: $1,400/month not utilized = $16,800/year waste

Root Cause: Modeled commitment based on November-December billing (peak season), not annual average.
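
The over-commitment arithmetic is worth making explicit. This is a minimal sketch using the figures quoted above; the function name is hypothetical, and usage is assumed to be measured at Savings Plan rates.

```python
def commitment_waste(monthly_commitment, monthly_usage_at_sp_rates):
    """Return (utilization %, unused $/month, unused $/year) for a Savings Plan."""
    used = min(monthly_usage_at_sp_rates, monthly_commitment)
    unused = monthly_commitment - used
    utilization = 100 * used / monthly_commitment
    return round(utilization, 1), unused, unused * 12


utilization, wasted_month, wasted_year = commitment_waste(6200, 4800)
print(utilization, wasted_month, wasted_year)  # 77.4 1400 16800
```

At roughly 77% utilization, the plan sits well below the 90% alert threshold used later in the monitoring setup.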

The Reversal

The Savings Plan purchase can't be reversed (non-refundable), but the strategy can be optimized around it:

  1. Accept the sunk cost: $74,400 already paid, can't recover
  2. Scale up workloads to use commitment: Moved some on-demand workloads to use Savings Plan discount (migrated Lambda functions to 24/7 Fargate, kept background workers running longer)
  3. Adjust future commitment strategy: Model based on P10 usage (low season), not peak

The Fix

After analysis, we determined the optimal commitment level:

Revised Strategy:
├─ Compute Savings Plan: $4,200/month (not $6,200)
│ ├─ Covers: 72 steady-state instances + Lambda + Fargate
│ ├─ Utilization: 98% average (including off-season)
│ └─ Term: 1-year, No Upfront (flexibility over max discount)
├─ EC2 Instance Savings Plan: $1,800/month
│ ├─ Covers: Additional EC2 instances (burst baseline)
│ ├─ Utilization: 95% average
│ └─ Term: 1-year, Partial Upfront
├─ RDS Reserved Instances: 12 instances
│ ├─ Covers: Database tier (predictable, 24/7)
│ ├─ Utilization: 99.8%
│ └─ Term: 3-year, All Upfront (maximum discount)
└─ On-demand: Burst workloads above baseline
  └─ Remaining 11% of compute (flash sales, seasonal peaks)

Lesson: Model commitments based on minimum usage (P10), not average or peak. Commit to the floor, not the ceiling. Use on-demand for variability.
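
To illustrate "commit to the floor", the sketch below compares a P10-based commitment with a P90-based one. The monthly usage series is hypothetical (the case study does not publish month-by-month data), and the nearest-rank percentile is just one reasonable definition.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def unused_commitment(commitment, usage):
    """Total commitment paid for but not used across all months."""
    return sum(max(0, commitment - u) for u in usage)


# Hypothetical monthly eligible usage in USD (at Savings Plan rates), Jan..Dec.
monthly_usage = [4800, 4700, 5100, 5300, 5200, 5400, 5500, 5600, 5800, 6000, 6900, 7200]

floor_commit = percentile_nearest_rank(monthly_usage, 10)  # commit to the floor
peak_commit = percentile_nearest_rank(monthly_usage, 90)   # what peak-season modeling suggests

print(floor_commit, unused_commitment(floor_commit, monthly_usage))
print(peak_commit, unused_commitment(peak_commit, monthly_usage))
```

With this made-up series, the P10 commitment is almost fully used in every month, while the P90 commitment goes partly unused in ten of twelve months. Usage above the floor simply runs on-demand.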

Implementation Approach

Phase 1: Audit Existing Commitments (Week 1)

Step 1: Inventory All Commitments

Used AWS CLI and Cost Explorer to generate comprehensive commitment inventory:

# List Reserved Instances in the current region (repeat with --region for each region in use)
aws ec2 describe-reserved-instances \
--query 'ReservedInstances[*].[InstanceType,InstanceCount,State,End]' \
--output table

# List all RDS Reserved Instances (StartTime plus Duration, in seconds, gives the expiration)
aws rds describe-reserved-db-instances \
--query 'ReservedDBInstances[*].[DBInstanceClass,DBInstanceCount,State,StartTime,Duration]' \
--output table

# Check Savings Plans commitments
aws savingsplans describe-savings-plans \
--query 'savingsPlans[*].[savingsPlanType,commitment,state,start]' \
--output table

Results: 64 Reserved Instances identified (52 EC2, 12 RDS), zero Savings Plans, total committed spend: $7,880/month

Step 2: Analyze RI Utilization

Pulled 12 months of RI utilization data from Cost Explorer:

# Generate RI utilization report, one group per RI subscription.
# --group-by accepts only SUBSCRIPTION_ID and cannot be combined with --granularity;
# each group's attributes include the instance type.
aws ce get-reservation-utilization \
--time-period Start=2024-01-01,End=2025-01-01 \
--group-by Type=DIMENSION,Key=SUBSCRIPTION_ID

  • High Utilization (>95%): 40 RIs on r5.2xlarge (Elasticsearch) and m5.xlarge (app servers)
  • Medium Utilization (70-95%): 4 RIs on db.r5.4xlarge (RDS primary)
  • Low Utilization (<70%): 6 RIs on r5.4xlarge (instances since downsized to r5.2xlarge)
  • Wasted Commitment: $840/month on 6 unused RIs (instances downsized 8 months ago, RIs still active)

Step 3: Map Expiration Timeline

Created expiration calendar to identify renewal windows:

Expiration Date | Instance Type | Count | Monthly Value | Action Needed
Feb 2025 | db.r5.4xlarge | 4 | $1,080 | Renew or replace
Apr 2025 | r5.2xlarge | 12 | $1,440 | Don't renew
Jun 2025 | db.r5.2xlarge | 8 | $1,360 | Renew 3-year
Aug 2025 | m5.xlarge | 28 | $1,680 | Replace with SP
Jun 2026 | m5.2xlarge | 12 | $1,440 | No action (18mo)
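
A calendar like this is easy to turn into a renewal check. The sketch below is illustrative only: the source gives month-level dates, so the 1st of each month is assumed, and the 90-day window and the as-of date (the publication date) are also assumptions.

```python
from datetime import date

# Expiration calendar from the table above. The day of month is not given in the
# source, so the 1st is assumed here purely for illustration.
expirations = [
    ("db.r5.4xlarge", 4, date(2025, 2, 1)),
    ("r5.2xlarge", 12, date(2025, 4, 1)),
    ("db.r5.2xlarge", 8, date(2025, 6, 1)),
    ("m5.xlarge", 28, date(2025, 8, 1)),
    ("m5.2xlarge", 12, date(2026, 6, 1)),
]


def expiring_within(entries, as_of, window_days=90):
    """Return (instance_type, count, days_left) for commitments expiring inside the window."""
    due = []
    for instance_type, count, expires in entries:
        days_left = (expires - as_of).days
        if 0 <= days_left <= window_days:
            due.append((instance_type, count, days_left))
    return due


due_soon = expiring_within(expirations, as_of=date(2025, 1, 12))
for instance_type, count, days_left in due_soon:
    print(f"{count}x {instance_type} expires in {days_left} days")
```

Under these assumptions, only the db.r5.4xlarge and r5.2xlarge reservations fall inside the 90-day window.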

Step 4: Usage Pattern Classification

Analyzed 12-month CloudWatch and Cost & Usage Reports to classify workloads:

  • Steady-state (commit with RIs): 72 instances running 24/7 at >95% uptime, ideal for 3-year RIs
  • Variable (commit with Savings Plans): 40 baseline instances that scale up to 101 at peak (web servers 28 → 65, background workers 12 → 36), which need the flexibility of a Compute SP
  • Serverless (commit with Compute SP): Lambda + Fargate at $1,460/month, which RIs can't cover
  • Burst (on-demand only): Peak scaling beyond baseline, kept on-demand for elasticity

Key Finding: $840/month wasted on unused RIs + $5,420/month on-demand spend that should be committed = $6,260/month opportunity
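
The four-way classification can be expressed as a small decision function. This is a sketch of the rules as stated in the list, not the tooling used in the engagement; the parameter names and the "instance"/"serverless" labels are hypothetical.

```python
def commitment_vehicle(kind, runs_24x7=False, above_baseline=False):
    """Map a workload to the commitment type described in the classification above.

    kind: "instance" or "serverless" (hypothetical labels for this sketch).
    """
    if kind == "serverless":
        return "Compute Savings Plan"   # RIs cannot cover Lambda or Fargate
    if above_baseline:
        return "On-demand"              # burst capacity stays elastic
    if runs_24x7:
        return "Reserved Instances"     # steady-state, predictable
    return "Savings Plan"               # variable baseline needs flexibility


print(commitment_vehicle("instance", runs_24x7=True))
print(commitment_vehicle("instance", above_baseline=True))
print(commitment_vehicle("serverless"))
```

Checking burst before steady-state matters: capacity above the baseline stays on-demand even when the underlying service runs around the clock.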

Phase 2: Purchase Strategy (Week 2)

Step 1: Model Commitment Scenarios

Used Cost Explorer Savings Plans Recommendations and custom modeling:

# Generate Savings Plans recommendations. The API looks back at most 60 days,
# so the P10 figures came from separate modeling on 12 months of usage.
aws ce get-savings-plans-purchase-recommendation \
--lookback-period-in-days SIXTY_DAYS \
--term-in-years ONE_YEAR \
--payment-option NO_UPFRONT \
--savings-plans-type COMPUTE_SP

Modeling Approach: Modeled 3 scenarios (conservative, moderate, aggressive) based on P10, P50, P90 usage over 12 months.

  • Conservative (P10): $4,200/month Compute SP + $1,800/month EC2 SP, about 97% expected utilization
  • Moderate (P50): $5,600/month Compute SP + $2,400/month EC2 SP, about 90% expected utilization
  • Aggressive (P90): $6,800/month Compute SP + $3,000/month EC2 SP, about 75% expected utilization (high over-commitment risk)

Decision: Selected Conservative model to avoid over-commitment (learned from initial mistake — see Challenge section).
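
One way to compare the three scenarios is to translate expected utilization into dollars of commitment likely to go unused. The sketch below assumes each utilization figure applies to the combined Compute SP and EC2 SP commitment; the resulting dollar amounts are derived here and are not figures from the engagement.

```python
# (scenario, Compute SP $/month, EC2 Instance SP $/month, expected utilization)
scenarios = [
    ("Conservative (P10)", 4200, 1800, 0.97),
    ("Moderate (P50)", 5600, 2400, 0.90),
    ("Aggressive (P90)", 6800, 3000, 0.75),
]


def expected_unused(compute_sp, ec2_sp, utilization):
    """Commitment expected to go unused each month, in whole dollars."""
    return round((compute_sp + ec2_sp) * (1 - utilization))


unused_by_scenario = {name: expected_unused(c, e, u) for name, c, e, u in scenarios}

for name, unused in unused_by_scenario.items():
    print(f"{name}: ~${unused}/month unused")
```

The expected unused commitment rises much faster than the commitment itself, which is the quantitative case for the conservative option.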

Step 2: Execute Purchases in Priority Order

Purchased commitments starting with highest certainty:

Priority | Commitment Type | Monthly Commit | Discount | Rationale
1 | RDS RIs (3yr, All Up) | $4,320 | 62% | 100% uptime guarantee
2 | Compute SP (1yr, No Up) | $4,200 | 52% | Flexible, covers Lambda/Fargate
3 | EC2 Instance SP (1yr, Partial) | $1,800 | 48% | EC2-specific, higher discount
4 | RI Exchange (Convertible) | $840 | -- | Recover wasted Standard RIs
Total New Commitments | | $10,320 | 54% avg | --

# Purchase RDS Reserved Instances (highest priority)
aws rds purchase-reserved-db-instances-offering \
--reserved-db-instances-offering-id ########-####-####-####-############ \
--reserved-db-instance-id prod-rds-primary-ri \
--db-instance-count 12

# Purchase Compute Savings Plan.
# --commitment is the HOURLY commitment in USD: $4,200/month is roughly $5.75/hour
# (assuming 730 hours per month). --upfront-payment-amount applies only to
# Partial Upfront offerings, so it is omitted for No Upfront.
aws savingsplans create-savings-plan \
--savings-plan-offering-id ########-####-####-####-############ \
--commitment 5.75 \
--purchase-time 2025-01-15T10:00:00Z
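
Savings Plans are purchased as an hourly commitment, so the monthly figures used throughout this case study have to be converted before purchase. A minimal sketch of that conversion follows, assuming an average of 730 hours per month; the function name and the three-decimal rounding are illustrative choices.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12; an averaging convention, not an AWS constant


def monthly_to_hourly(monthly_usd):
    """Convert a monthly commitment figure to an hourly commitment in USD."""
    return round(monthly_usd / HOURS_PER_MONTH, 3)


compute_sp_hourly = monthly_to_hourly(4200)  # Compute Savings Plan
ec2_sp_hourly = monthly_to_hourly(1800)      # EC2 Instance Savings Plan

print(compute_sp_hourly, ec2_sp_hourly)  # 5.753 2.466
```

Passing the monthly number directly as the commitment would commit to that amount per hour, so this conversion is worth double-checking before any purchase call.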

Step 3: Exchange Unused Standard RIs

Recovered value from 6 unused Standard RIs by converting to Convertible RIs:

  • Problem: 6x r5.4xlarge Standard RIs unused (instances downsized to r5.2xlarge 8 months ago)
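  • Limitation: Standard RIs can't be exchanged for a different configuration; only limited modifications are allowed (such as Availability Zone, scope, or instance size within the same family for eligible Linux/UNIX RIs)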
  • Workaround: AWS Support allowed one-time exchange to Convertible RIs due to documented operational need
  • Exchange: 6x r5.4xlarge → 12x r5.2xlarge Convertible RIs (same total compute capacity)
  • Recovery: $840/month wasted commitment now 100% utilized

Note: Standard RI exchange to Convertible requires AWS Support case. Not guaranteed. In future, only purchase Convertible RIs for flexibility.
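
The "same total compute capacity" claim can be checked with EC2 normalization factors, where each instance size carries a fixed number of units (large = 4, xlarge = 8, 2xlarge = 16, 4xlarge = 32). The helper below is a hypothetical illustration of that check.

```python
# EC2 normalization factors by instance size (subset relevant to this exchange).
NORMALIZATION = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def footprint(instance_type, count):
    """Total normalized units for `count` instances of a type such as 'r5.4xlarge'."""
    size = instance_type.split(".")[1]
    return NORMALIZATION[size] * count


before = footprint("r5.4xlarge", 6)
after = footprint("r5.2xlarge", 12)
print(before, after)  # 192 192
```

Both sides come to 192 units, so the exchange preserves the reserved footprint while matching the downsized fleet.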

Step 4: Validate Purchase Impact

After purchases completed (24-48 hour activation):

  • Upfront Payment: $155,520 (RDS 3-year All Upfront) + $10,800 (EC2 SP Partial Upfront) = $166,320 total
  • Monthly Commitment: $10,320/month ($4,320 RDS RI + $4,200 Compute SP + $1,800 EC2 SP)
  • Coverage Improvement: 58% → 89% commitment coverage
  • On-Demand Remaining: $1,000/month (11% of compute, reserved for burst scaling)

Result: $3,090/month savings (23.8% reduction) with 98% commitment utilization (no waste)
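
The upfront figures follow directly from the monthly commitments and terms. The sketch below assumes Partial Upfront means 50% of the term's value paid at purchase, which matches the $10,800 quoted; the function name is illustrative.

```python
def upfront_payment(monthly_commit, term_months, upfront_fraction):
    """Upfront cash for a commitment, given the share of the term's value paid at purchase."""
    return round(monthly_commit * term_months * upfront_fraction)


rds_upfront = upfront_payment(4320, 36, 1.0)         # 3-year, All Upfront
ec2_sp_upfront = upfront_payment(1800, 12, 0.5)      # 1-year, Partial Upfront (50% assumed)
compute_sp_upfront = upfront_payment(4200, 12, 0.0)  # 1-year, No Upfront
total_upfront = rds_upfront + ec2_sp_upfront + compute_sp_upfront

print(rds_upfront, ec2_sp_upfront, total_upfront)  # 155520 10800 166320
```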

Phase 3: Monitoring & Ongoing Optimization

CloudWatch Dashboard Setup

Created real-time commitment tracking dashboard:

  • Utilization Metrics:
    • Savings Plans utilization % (target: >97%)
    • Reserved Instances utilization % by instance type
    • On-demand spend as % of total compute (target: <15%)
    • Weekly utilization trend (7-day moving average)
  • Coverage Metrics:
    • Total commitment coverage % (steady-state workloads)
    • Coverage gap identification (on-demand spend that could be committed)
    • Commitment expiration calendar (next 90 days)
  • Cost Metrics:
    • Daily compute spend (committed vs. on-demand)
    • Month-to-date savings vs. all on-demand
    • Savings Plans vs. Reserved Instances cost comparison

Automated Alerts

Configured CloudWatch Alarms and AWS Budgets for commitment monitoring:

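# Alert if Savings Plans utilization stays below 90% for 3 consecutive days.
# Assumes utilization is published daily as a custom CloudWatch metric (for example
# by a scheduled job that reads Cost Explorer); the namespace below is illustrative.
aws cloudwatch put-metric-alarm \
--alarm-name savings-plans-utilization-low \
--metric-name SavingsPlansUtilization \
--namespace Custom/SavingsPlans \
--statistic Average \
--period 86400 \
--evaluation-periods 3 \
--threshold 90 \
--comparison-operator LessThanThreshold

# Alert if on-demand spend exceeds 15% of total.
# ACCOUNT_ID and ALERT_EMAIL are placeholder shell variables. The threshold is a
# percentage of the budget's limit, so the budget amount should equal total compute spend.
aws budgets create-notification \
--account-id "$ACCOUNT_ID" \
--budget-name on-demand-spend-threshold \
--notification NotificationType=ACTUAL,ComparisonOperator=GREATER_THAN,Threshold=15 \
--subscribers SubscriptionType=EMAIL,Address="$ALERT_EMAIL"
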
  • Low Utilization Alert: Triggers if Savings Plans utilization < 90% for 3 consecutive days
  • High On-Demand Alert: Triggers if on-demand spend > 15% of total compute (indicates under-commitment)
  • Expiration Reminder: Triggers 90 days before any RI/SP expires (allows renewal planning)
  • Cost Anomaly: Triggers if daily compute cost deviates > 20% from 7-day average
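
The two threshold rules that involve history (three consecutive low days, and deviation from a 7-day average) can be sketched as plain functions. This is an illustration of the alert logic only, with hypothetical daily figures; it is not the alerting code used in the engagement.

```python
def low_utilization_streak(daily_utilization, floor=90.0, days=3):
    """True if the most recent `days` readings are all below the floor."""
    recent = daily_utilization[-days:]
    return len(recent) == days and all(u < floor for u in recent)


def is_cost_anomaly(history, today, threshold=0.20):
    """True if today's cost deviates from the mean of the last 7 days by more than threshold."""
    window = history[-7:]
    if len(window) < 7:
        return False  # not enough data to form a 7-day average
    baseline = sum(window) / len(window)
    return abs(today - baseline) / baseline > threshold


# Hypothetical daily compute costs in USD (roughly $9,930 / 30 days).
daily_costs = [330, 328, 335, 331, 329, 333, 331]

print(is_cost_anomaly(daily_costs, today=410))             # True
print(is_cost_anomaly(daily_costs, today=340))             # False
print(low_utilization_streak([97.8, 89.0, 88.5, 87.9]))    # True
```

Requiring a streak rather than a single low reading avoids paging on one quiet day.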

Weekly Commitment Report

Automated Slack report every Monday morning:

Weekly Commitment Report — Jan 20, 2025
Savings Plans Utilization: 97.8% ✅
Reserved Instances Utilization: 99.2% ✅
On-Demand Spend: $1,020 (10.3% of total) ✅
This Week vs. Last Week:
├─ Total Compute: $9,930 (−$60, −0.6%)
├─ Savings vs. On-Demand: $3,090 (23.8%)
└─ Annualized Savings: $37,080
Action Items:
• None — all metrics healthy ✅
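
The derived lines of the report are simple ratios. The sketch below shows how they could be computed from three inputs; the function and key names are hypothetical.

```python
def weekly_summary(total_compute, on_demand, monthly_savings):
    """Derive the on-demand share and annualized savings lines of the report."""
    return {
        "on_demand_share_pct": round(100 * on_demand / total_compute, 1),
        "annualized_savings": monthly_savings * 12,
    }


summary = weekly_summary(total_compute=9930, on_demand=1020, monthly_savings=3090)
print(summary)  # {'on_demand_share_pct': 10.3, 'annualized_savings': 37080}
```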

Quarterly Commitment Review

Scheduled quarterly reviews to optimize commitment strategy:

  • Q1 2025 (Feb): Review RDS RI expirations, renew 12x db.r5.2xlarge for 3 years (confirmed 99.9% utilization)
  • Q2 2025 (May): Evaluate Compute SP increase from $4,200 → $4,800 if Lambda usage grows >15%
  • Q3 2025 (Aug): Decision point: Renew expiring m5.xlarge RIs or replace with Savings Plans?
  • Q4 2025 (Nov): Annual review: Model next year's commitment strategy based on growth trajectory

Ongoing Optimization: Commitment utilization is monitored weekly. In the first 60 days post-implementation, utilization stayed at 97%+ with zero over-commitment issues (a lesson learned from the initial over-commitment mistake).

Results in Detail

Cost Savings Breakdown

Component | Before | After | Monthly Savings
RDS (on-demand → RIs) | $5,280 | $2,016 | −$3,264 (61.8%)
EC2 steady-state (on-demand → Compute SP) | $4,840 | $2,904 | −$1,936 (40.0%)
EC2 burst (on-demand → EC2 Instance SP) | $1,680 | $1,260 | −$420 (25.0%)
Lambda/Fargate (on-demand → Compute SP) | $1,460 | $1,050 | −$410 (28.1%)
Unused RIs recovered | −$840 | $0 | +$840
Total Compute | $13,020 | $9,930 | −$3,090 (23.8%)

Annual savings: $3,090/month × 12 = $37,080/year (rounded to $37,100 in the headline figures)

Commitment Coverage Improvement

Before

  • Total compute spend: $13,020/month
  • Committed spend: $7,880/month (60.5%)
  • On-demand spend: $5,140/month (39.5%)
  • Coverage: 58%

After

  • Total compute spend: $9,930/month
  • Committed spend: $8,930/month (89.9%)
  • On-demand spend: $1,000/month (10.1%)
  • Coverage: 89%
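
The committed and on-demand shares in both lists can be recomputed from the spend figures. A minimal sketch with a hypothetical helper name:

```python
def spend_split(total, committed):
    """Return (committed %, on-demand $, on-demand %) for a monthly compute bill."""
    on_demand = total - committed
    return (
        round(100 * committed / total, 1),
        on_demand,
        round(100 * on_demand / total, 1),
    )


before_split = spend_split(total=13020, committed=7880)
after_split = spend_split(total=9930, committed=8930)

print(before_split)  # (60.5, 5140, 39.5)
print(after_split)   # (89.9, 1000, 10.1)
```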

Business Value

Immediate Impact

  • $3,090/month = $37,100 annual savings (23.8% reduction)
  • Improved gross margins by 1.2%
  • Demonstrated Series B capital efficiency for Board

Long-term Value

  • Predictable costs: 89% of compute now fixed-price (easier forecasting)
  • Flexibility: Compute SP covers Lambda/Fargate (supports serverless migration)
  • Commitment calendar: Proactive renewal strategy prevents future waste
  • Scalability: 11% on-demand buffer handles growth without re-commitment

Real Example: Flash Sale Economics

Before optimization: Flash sale (2-hour event) required 73 additional EC2 instances at $124/event (all on-demand). 8 sales per month = $992/month.

After optimization: Same 73 instances cost $58/event (40% covered by unused Compute SP capacity, 60% on-demand). 8 sales per month = $464/month.

53% reduction in flash sale infrastructure cost — commitment strategy improves burst economics too.
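
The flash sale comparison reduces to per-event cost times events per month. The sketch below reproduces the quoted figures; the function name is illustrative.

```python
def monthly_event_cost(cost_per_event, events_per_month=8):
    """Monthly infrastructure cost for recurring flash sales."""
    return cost_per_event * events_per_month


before_cost = monthly_event_cost(124)
after_cost = monthly_event_cost(58)
reduction_pct = round(100 * (before_cost - after_cost) / before_cost)

print(before_cost, after_cost, reduction_pct)  # 992 464 53
```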

Lessons Learned

What Worked

  • Hybrid commitment strategy: RDS RIs (3-year) for predictable, Savings Plans (1-year) for flexible
  • P10 usage modeling: Committing to minimum usage (not average) prevented over-commitment
  • Convertible RI exchanges: Recovered $840/month wasted on unused Standard RIs
  • Serverless coverage: Compute SP covered Lambda/Fargate (RIs can't)

What Didn't Work

  • Initial over-commitment: Purchased $6,200/month SP based on peak season, wasted $1,400/month off-season
  • Cost Explorer recommendations: Blindly following AWS recommendations led to over-commitment
  • All Upfront for variable workloads: Locked in $74,400 with no flexibility

Key Takeaways

  • Commit to the floor, not the ceiling: Model commitments on P10 usage, not average or peak
  • Understand your workload: Steady-state vs. burst vs. serverless require different strategies
  • Savings Plans ≠ Reserved Instances: SP more flexible but also more dangerous (easier to over-commit)
  • Payment flexibility matters: No Upfront costs more per hour, but provides flexibility to adjust
  • Monitor utilization obsessively: 97%+ utilization is optimal, < 90% means over-commitment
