Tags: savings-plans, reserved-instances, commitment
E-commerce (Marketplace Platform) • Series B, 180-280 employees

Case Study: Optimizing $156K Monthly Compute Spend with Hybrid Commitment Strategy

How we reduced compute costs by 23.8% ($37,100/year saved) by rebalancing Reserved Instances and Savings Plans for a hybrid workload with steady-state and burst patterns

Monthly AWS Spend

$312,000

Cost Reduction

23.8%

Timeline

2 weeks

Published

Sun Jan 12 2025

At a Glance

Client Profile

Industry: E-commerce marketplace platform
Company Stage: Series B, $312,000/month AWS spend
Scale: 3.2M buyers and 48,000 sellers
Timeline: 2-week engagement, January 2025

Business Context

Series B capital efficiency focus: Board wants 30% reduction in AWS spend before Series C. Compute is largest cost category (50% of total AWS spend). Quick wins needed to demonstrate operational discipline.

Primary Pain Point: Inconsistent commitment strategy. Coverage sat at 58%, built from a mix of 1-year and 3-year Reserved Instances purchased ad hoc over 4 years, which left 42% of compute spend on-demand with no commitment discount.

23.8%

Monthly Compute Cost Reduction

$13,020 → $9,930/month

58% → 89%

Commitment Coverage

+31pp improvement

$37,100

Annual Savings

vs. previous strategy

The Situation

The client's compute infrastructure included:

Database tier: 18 RDS instances (db.r5.2xlarge to db.r5.8xlarge)
Application tier: 85 EC2 instances (m5.xlarge to c5.4xlarge)
Search infrastructure: 12 Elasticsearch nodes (r5.2xlarge)
Background jobs: 24 spot instances + 12 on-demand instances (variable)

Business Context

Revenue Model: Transaction-based (2.5% fee) with 3.2M buyers and 48,000 sellers = $28M ARR
Growth Stage: Series B ($42M raised), 24 months to Series C
Team Structure: 240 total employees (38 engineering, 12 DevOps/SRE, 45 seller success)
Key Business Metrics: 99.9% marketplace uptime, <500ms search latency, 4.2% GMV take rate
Critical Constraints: Must maintain 24/7 database availability, search downtime = immediate GMV loss
Strategic Pressure: Board demands 30% AWS cost reduction before Series C — compute is 50% of AWS spend ($156K/month)

Current Commitment Strategy

Inherited from 4 years of ad-hoc purchasing:

Existing Commitments:
├─ Reserved Instances (RIs):
│  ├─ 8x db.r5.2xlarge (3-year Standard RIs, expires in 6 months)
│  ├─ 4x db.r5.4xlarge (1-year Standard RIs, expires in 2 months)
│  ├─ 28x m5.xlarge (1-year Convertible RIs, expires in 8 months)
│  ├─ 12x m5.2xlarge (3-year Standard RIs, expires in 18 months)
│  └─ 12x r5.2xlarge (1-year Standard RIs, expires in 4 months)
├─ Savings Plans: None
└─ On-demand: 42% of compute spend

Total Commitment Coverage: 58%

The Inefficiency

Analyzing 12 months of billing data revealed:

RI waste: 6 Standard RIs unused (instances downsized, but RIs locked)
Under-commitment: $5,420/month on-demand spend on steady-state workloads
No flexibility: Standard RIs locked to specific instance types/sizes
Fragmented strategy: Purchased reactively whenever someone remembered
Missing Savings Plans: Lambda, Fargate, and cross-region EC2 not covered by RIs

AWS Cost Explorer Recommendations suggested $3,090/month savings opportunity.

Discovery Phase

Week 1: Commitment Audit & Workload Analysis

We analyzed 12 months of EC2, RDS, and Lambda usage:

Infrastructure Inventory

Workload Type	Instance Count	Instance Type	Monthly Cost	Current Coverage
RDS Primary	6	db.r5.4xlarge	$3,240	4 RIs (67%)
RDS Read Replicas	12	db.r5.2xlarge	$2,040	8 RIs (67%)
Application servers	42	m5.xlarge	$2,520	28 RIs (67%)
Search (Elasticsearch)	12	r5.2xlarge	$1,440	12 RIs (100%)
Web servers (burst)	28-65	m5.2xlarge	$1,680	12 RIs (43%)
Background workers	12-36	c5.xlarge	$840	On-demand (0%)
Lambda	--	Various	$840	On-demand (0%)
Fargate	--	Various	$620	On-demand (0%)
Total	112+	--	$13,220	58% coverage

Note: Monthly costs shown are on-demand equivalent. Actual spend with current RIs: $13,020/month
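The per-workload coverage percentages in the table can be reproduced from the instance and RI counts alone. The sketch below is illustrative: the counts are copied from the table, the function name is ours, and it measures coverage by instance count, so it does not reproduce the spend-weighted 58% headline figure.

```python
# Instance counts and matching RI counts, copied from the inventory table.
# Each value is (running instances at baseline, Reserved Instances held).
inventory = {
    "RDS Primary": (6, 4),
    "RDS Read Replicas": (12, 8),
    "Application servers": (42, 28),
    "Search (Elasticsearch)": (12, 12),
    "Web servers (burst baseline)": (28, 12),
}


def coverage_pct(instances: int, reserved: int) -> int:
    """Share of running instances matched by an RI, as a whole percent."""
    return round(100 * reserved / instances)


coverage = {name: coverage_pct(n, ri) for name, (n, ri) in inventory.items()}

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
```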

Workload Stability Analysis:
├─ Steady-state (predictable, 24/7):
│  ├─ RDS databases: 18 instances (99.8% uptime)
│  ├─ Core application servers: 42 instances (100% uptime)
│  ├─ Elasticsearch cluster: 12 instances (99.9% uptime)
│  └─ Total: 72 instances = 85% of baseline compute
├─ Variable (burst during sales):
│  ├─ Web servers: Scale 28 → 65 instances during flash sales
│  ├─ Background workers: Scale 12 → 36 instances during peak
│  └─ Total: Baseline 40, peak 101 instances
└─ Serverless (unpredictable):
   ├─ Lambda: $840/month (image processing, webhooks)
   └─ Fargate: $620/month (batch jobs)

Commitment Utilization

Over-committed: 6 RIs unused (downsized from r5.4xlarge → r5.2xlarge)
Under-committed: 42% of compute on-demand
No coverage: Lambda ($840/mo) and Fargate ($620/mo)

The Challenge: Over-Commitment on Burst Workload

What Went Wrong

Based on Cost Explorer recommendations, we initially modeled a strategy:

Purchase Compute Savings Plan covering 95% of baseline usage
Let expiring RIs roll off (don't renew)
Use Savings Plan discount for both steady-state and burst workloads

We purchased a $6,200/month Compute Savings Plan (1-year, All Upfront = $74,400 upfront payment).

Two weeks later, a problem emerged:

During off-peak season (post-holidays, January-February):

Flash sales frequency dropped 60% (2-3x per week → 1x per week)
Burst workload scaled down to baseline
Savings Plan over-committed: Using only $4,800/month of the $6,200/month commitment (about 77% utilization)
Wasted commitment: $1,400/month not utilized = $16,800/year if the off-peak pattern held all year

Root Cause: We modeled the commitment on November-December billing (peak season) instead of a full year of usage.
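The off-peak figures above work out as follows. This is a minimal sketch with a function name of our own choosing; it assumes usage above the commitment would simply bill at on-demand rates and therefore cannot push utilization past 100%.

```python
def savings_plan_utilization(commitment: float, used: float) -> tuple[float, float]:
    """Return (utilization percent, unused dollars) for one billing period."""
    applied = min(used, commitment)  # usage above the commitment bills on-demand
    unused = commitment - applied
    return round(100 * applied / commitment, 1), unused


# Off-peak month from the case study: $4,800 used of a $6,200 commitment.
utilization, wasted = savings_plan_utilization(commitment=6200, used=4800)
annualized_waste = wasted * 12  # only holds if every month looks like off-peak

print(f"utilization={utilization}%  wasted=${wasted}/month  annualized=${annualized_waste}")
```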

The Reversal

A Savings Plan purchase can't be reversed (it is non-refundable), but we could optimize around it:

Accept the sunk cost: $74,400 is already paid and can't be recovered
Scale up workloads to use the commitment: Moved some on-demand workloads onto the Savings Plan discount (migrated Lambda functions to 24/7 Fargate, kept background workers running longer)
Adjust future commitment strategy: Model on P10 usage (low season), not peak

The Fix

After analysis, we determined the optimal commitment level:

Revised Strategy:
├─ Compute Savings Plan: $4,200/month (not $6,200)
│  ├─ Covers: 54 steady-state EC2 instances (app servers + Elasticsearch) + Lambda + Fargate; Savings Plans do not apply to RDS, which is covered by the RIs below
│  ├─ Utilization: 98% average (including off-season)
│  └─ Term: 1-year, No Upfront (flexibility over max discount)
├─ EC2 Instance Savings Plan: $1,800/month
│  ├─ Covers: Additional EC2 instances (burst baseline)
│  ├─ Utilization: 95% average
│  └─ Term: 1-year, Partial Upfront
├─ RDS Reserved Instances: 12 instances
│  ├─ Covers: Database tier (predictable, 24/7)
│  ├─ Utilization: 99.8%
│  └─ Term: 3-year, All Upfront (maximum discount)
└─ On-demand: Burst workloads above baseline
   └─ Remaining 11% of compute (flash sales, seasonal peaks)

Lesson: Model commitments based on minimum usage (P10), not average or peak. Commit to the floor, not the ceiling. Use on-demand for variability.
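A minimal sketch of what "commit to the floor" can look like in practice. The monthly spend figures are hypothetical (they are not the client's billing data), and the nearest-rank percentile is one of several reasonable definitions; a real model would more likely work from hourly usage in the Cost and Usage Report.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical Savings-Plan-eligible spend per month, in dollars (illustrative only).
monthly_spend = [4300, 4250, 4400, 4600, 4500, 4700, 4900, 4800, 5000, 5400, 6200, 6100]

p10 = percentile(monthly_spend, 10)  # the floor: conservative commitment level
p50 = percentile(monthly_spend, 50)  # typical month
p90 = percentile(monthly_spend, 90)  # near-peak month: risky to commit to

print(f"P10=${p10}  P50=${p50}  P90=${p90}")
```

Committing at the P10 level means almost every month uses the full commitment, and anything above it falls through to on-demand pricing.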

Implementation Approach

Phase 1: Audit Existing Commitments (Week 1)

Step 1: Inventory All Commitments

Used AWS CLI and Cost Explorer to generate comprehensive commitment inventory:

# List EC2 Reserved Instances (the call is per region; repeat with --region for each region in use)
aws ec2 describe-reserved-instances \
  --query 'ReservedInstances[*].[InstanceType,InstanceCount,State,End]' \
  --output table

# List all RDS Reserved Instances
aws rds describe-reserved-db-instances \
  --query 'ReservedDBInstances[*].[DBInstanceClass,DBInstanceCount,State,StartTime]' \
  --output table

# Check Savings Plans commitments
aws savingsplans describe-savings-plans \
  --query 'savingsPlans[*].[savingsPlanType,commitment,state,start]' \
  --output table

Results: 64 Reserved Instances identified (52 EC2, 12 RDS, matching the inventory above), zero Savings Plans, total committed spend: $7,880/month

Step 2: Analyze RI Utilization

Pulled 12 months of RI utilization data from Cost Explorer:

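# Generate RI utilization report for the last 12 months
# This API groups only by SUBSCRIPTION_ID, and --granularity cannot be combined with --group-by;
# each returned group carries the instance type in its attributes
aws ce get-reservation-utilization \
  --time-period Start=2024-01-01,End=2025-01-01 \
  --group-by Type=DIMENSION,Key=SUBSCRIPTION_ID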

High Utilization (>95%): r5.2xlarge (Elasticsearch), m5.xlarge (app servers): 40 RIs
Medium Utilization (70-95%): db.r5.4xlarge (RDS primary): 4 RIs
Low Utilization (<70%): r5.4xlarge (instances downsized to r5.2xlarge): 6 RIs
Wasted Commitment: $840/month on 6 unused RIs (instances downsized 8 months ago, RIs still active)
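The three utilization bands used above can be expressed as a small classifier. This is a sketch: the thresholds come from the text, while the sample utilization percentages are hypothetical values chosen only to show one instance type in each band.

```python
def utilization_band(pct: float) -> str:
    """Classify RI utilization using the bands from the audit: >95, 70-95, <70."""
    if pct > 95:
        return "high"
    if pct >= 70:
        return "medium"
    return "low"


# Hypothetical 12-month average utilization per instance type (illustrative only).
sample_utilization = {
    "m5.xlarge": 99.1,
    "db.r5.4xlarge": 84.0,
    "r5.4xlarge": 0.0,
}

bands = {itype: utilization_band(pct) for itype, pct in sample_utilization.items()}
print(bands)
```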

Step 3: Map Expiration Timeline

Created expiration calendar to identify renewal windows:

Expiration Date	Instance Type	Count	Monthly Value	Action Needed
Feb 2025	db.r5.4xlarge	4	$1,080	Renew or replace
Apr 2025	r5.2xlarge	12	$1,440	Don't renew
Jun 2025	db.r5.2xlarge	8	$1,360	Renew 3-year
Aug 2025	m5.xlarge	28	$1,680	Replace with SP
Jun 2026	m5.2xlarge	12	$1,440	No action (18mo)
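An expiration calendar like this one lends itself to a simple "what is due soon" check. The sketch below is an assumption-laden illustration: the table gives only month and year, so each expiry is pinned to the first of the month, and the 90-day window and the reference date of 12 January 2025 (the publication date) are our choices.

```python
from datetime import date

# (expiry date, instance type, count); day-of-month is assumed to be the 1st.
expirations = [
    (date(2025, 2, 1), "db.r5.4xlarge", 4),
    (date(2025, 4, 1), "r5.2xlarge", 12),
    (date(2025, 6, 1), "db.r5.2xlarge", 8),
    (date(2025, 8, 1), "m5.xlarge", 28),
    (date(2026, 6, 1), "m5.2xlarge", 12),
]


def due_within(items, today: date, window_days: int = 90):
    """Return the commitments that expire within window_days of today."""
    return [
        (expiry, itype, count)
        for expiry, itype, count in items
        if 0 <= (expiry - today).days <= window_days
    ]


due = due_within(expirations, today=date(2025, 1, 12))
for expiry, itype, count in due:
    print(f"{count}x {itype} expires {expiry.isoformat()}")
```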

Step 4: Usage Pattern Classification

Analyzed 12-month CloudWatch and Cost & Usage Reports to classify workloads:

Steady-state (commit with RIs): 72 instances running 24/7 at >95% uptime — ideal for 3-year RIs
Variable (commit with Savings Plans): 40 baseline instances (web servers scaling 28 → 65, background workers scaling 12 → 36) that need the flexibility of a Compute SP
Serverless (commit with Compute SP): Lambda + Fargate $1,460/month — can't use RIs, need Compute SP
Burst (on-demand only): Peak scaling beyond baseline — keep on-demand for elasticity

Key Finding: $840/month wasted on unused RIs + $5,420/month on-demand spend that should be committed = $6,260/month opportunity

Phase 2: Purchase Strategy (Week 2)

Step 1: Model Commitment Scenarios

Used Cost Explorer Savings Plans Recommendations and custom modeling:

# Generate Savings Plans recommendations
# The lookback window must be SEVEN_DAYS, THIRTY_DAYS, or SIXTY_DAYS; the P10 modeling was done separately
aws ce get-savings-plans-purchase-recommendation \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --savings-plans-type COMPUTE_SP

Modeling Approach: Modeled 3 scenarios (conservative, moderate, aggressive) based on P10, P50, P90 usage over 12 months.

Conservative (P10): $4,200/month Compute SP + $1,800/month EC2 SP = 97% utilization expected
Moderate (P50): $5,600/month Compute SP + $2,400/month EC2 SP = 90% utilization expected
Aggressive (P90): $6,800/month Compute SP + $3,000/month EC2 SP = 75% utilization risk

Decision: Selected Conservative model to avoid over-commitment (learned from initial mistake — see Challenge section).
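One way to compare the three scenarios is to turn each utilization estimate into expected unused commitment per month. The sketch below uses the commitment and utilization figures listed above; the function name is ours, and it deliberately ignores the extra discount a larger commitment would earn when it is fully used.

```python
# (Compute SP $/month, EC2 Instance SP $/month, expected utilization)
scenarios = {
    "conservative": (4200, 1800, 0.97),
    "moderate": (5600, 2400, 0.90),
    "aggressive": (6800, 3000, 0.75),
}


def expected_unused(compute_sp: float, ec2_sp: float, utilization: float) -> int:
    """Expected dollars of commitment paid for but not used each month."""
    return round((compute_sp + ec2_sp) * (1 - utilization))


unused = {name: expected_unused(*params) for name, params in scenarios.items()}
for name, dollars in unused.items():
    print(f"{name}: ${dollars}/month expected unused commitment")
```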

Step 2: Execute Purchases in Priority Order

Purchased commitments starting with highest certainty:

Priority	Commitment Type	Monthly Commit	Discount	Rationale
1	RDS RIs (3yr, All Up)	$4,320	62%	100% uptime guarantee
2	Compute SP (1yr, No Up)	$4,200	52%	Flexible, covers Lambda/Fargate
3	EC2 Instance SP (1yr, Partial)	$1,800	48%	EC2-specific, higher discount
4	RI Exchange (Convertible)	$840	--	Recover wasted Standard RIs
Total New Commitments		$10,320	54% avg	--

# Purchase RDS Reserved Instances (highest priority)
# The offering ID is a redacted placeholder; it is quoted so the shell does not read # as a comment
aws rds purchase-reserved-db-instances-offering \
  --reserved-db-instances-offering-id '########-####-####-####-############' \
  --reserved-db-instance-id prod-rds-primary-ri \
  --db-instance-count 12

# Purchase Compute Savings Plan
# --commitment is an hourly amount: $4,200/month ÷ 730 hours ≈ $5.75/hour
aws savingsplans create-savings-plan \
  --savings-plan-offering-id '########-####-####-####-############' \
  --commitment 5.75 \
  --upfront-payment-amount 0.00 \
  --purchase-time 2025-01-15T10:00:00Z
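Savings Plans commitments are expressed in dollars per hour, so a monthly budget figure has to be converted before it is passed to `--commitment`. A minimal sketch of that conversion, assuming an average month of 730 hours (8,760 hours per year divided by 12); the function name is ours.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def monthly_to_hourly(monthly_commitment: float) -> float:
    """Convert a monthly commitment in dollars to the hourly rate Savings Plans use."""
    return round(monthly_commitment * 12 / HOURS_PER_YEAR, 2)


compute_sp_hourly = monthly_to_hourly(4200)  # Compute Savings Plan
ec2_sp_hourly = monthly_to_hourly(1800)      # EC2 Instance Savings Plan

print(f"Compute SP: ${compute_sp_hourly}/hour, EC2 Instance SP: ${ec2_sp_hourly}/hour")
```

Passing the monthly figure directly would commit to that amount every hour, so this conversion is worth double-checking before any purchase.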

Step 3: Exchange Unused Standard RIs

Recovered value from 6 unused Standard RIs by converting to Convertible RIs:

Problem: 6x r5.4xlarge Standard RIs unused (instances downsized to r5.2xlarge 8 months ago)
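Limitation: Standard RIs can't be exchanged once purchased (only Convertible RIs can); they support only limited modifications such as Availability Zone, scope, or instance size within the same family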
Workaround: AWS Support allowed one-time exchange to Convertible RIs due to documented operational need
Exchange: 6x r5.4xlarge → 12x r5.2xlarge Convertible RIs (same total compute capacity)
Recovery: $840/month wasted commitment now 100% utilized

Note: Exchanging a Standard RI for a Convertible RI requires an AWS Support case and is not guaranteed. In future, purchase only Convertible RIs where flexibility matters.
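The "same total compute capacity" claim can be checked with AWS's instance size normalization factors, which double with each size step (large = 4, xlarge = 8, 2xlarge = 16, 4xlarge = 32). A short sketch, with a helper name of our own:

```python
# AWS instance size normalization factors (subset relevant here).
NORMALIZATION_FACTOR = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def footprint(instance_type: str, count: int) -> int:
    """Total normalized units for `count` instances of the given type."""
    size = instance_type.split(".")[-1]
    return NORMALIZATION_FACTOR[size] * count


before_units = footprint("r5.4xlarge", 6)   # the unused RIs
after_units = footprint("r5.2xlarge", 12)   # what they were exchanged into

print(before_units, after_units)
```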

Step 4: Validate Purchase Impact

After purchases completed (24-48 hour activation):

Upfront Payment: $155,520 (RDS 3-year All Upfront) + $10,800 (EC2 SP Partial Upfront) = $166,320 total
Monthly Commitment: $10,320/month ($4,320 RDS RI + $4,200 Compute SP + $1,800 EC2 SP)
Coverage Improvement: 58% → 89% commitment coverage
On-Demand Remaining: $1,000/month (11% of compute, reserved for burst scaling)

Result: $3,090/month savings (23.8% reduction) with 98% commitment utilization (no waste)
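The upfront payment figures above follow directly from the monthly commitments and terms. This sketch assumes Partial Upfront means half of the term's cost is paid at purchase, which is what the $10,800 figure implies; the function name is illustrative.

```python
def upfront_payment(monthly_commit: float, term_months: int, upfront_share: float) -> float:
    """Dollars due at purchase for a commitment paid partly or fully upfront."""
    return monthly_commit * term_months * upfront_share


rds_upfront = upfront_payment(4320, term_months=36, upfront_share=1.0)  # 3-year, All Upfront
ec2_upfront = upfront_payment(1800, term_months=12, upfront_share=0.5)  # 1-year, Partial Upfront (assumed 50%)
total_upfront = rds_upfront + ec2_upfront

print(f"RDS RIs: ${rds_upfront:,.0f}  EC2 SP: ${ec2_upfront:,.0f}  Total: ${total_upfront:,.0f}")
```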

Phase 3: Monitoring & Ongoing Optimization

CloudWatch Dashboard Setup

Created real-time commitment tracking dashboard:

Utilization Metrics:
- Savings Plans utilization % (target: >97%)
- Reserved Instances utilization % by instance type
- On-demand spend as % of total compute (target: <15%)
- Weekly utilization trend (7-day moving average)
Coverage Metrics:
- Total commitment coverage % (steady-state workloads)
- Coverage gap identification (on-demand spend that could be committed)
- Commitment expiration calendar (next 90 days)
Cost Metrics:
- Daily compute spend (committed vs. on-demand)
- Month-to-date savings vs. all on-demand
- Savings Plans vs. Reserved Instances cost comparison

Automated Alerts

Configured CloudWatch Alarms and AWS Budgets for commitment monitoring:

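# Alert if Savings Plans utilization stays below 90% for 3 consecutive days
# Assumes utilization is available as a CloudWatch metric under this namespace and name; verify in your account
aws cloudwatch put-metric-alarm \
  --alarm-name savings-plans-utilization-low \
  --metric-name SavingsPlansUtilization \
  --namespace AWS/SavingsPlans \
  --statistic Average \
  --period 86400 \
  --evaluation-periods 3 \
  --threshold 90 \
  --comparison-operator LessThanThreshold

# Alert if on-demand spend exceeds 15% of total
# Threshold is a percentage of the budget limit, so the budget is assumed to be sized to total compute spend
# AWS_ACCOUNT_ID and ALERT_EMAIL are placeholders for your own values
aws budgets create-notification \
  --account-id "$AWS_ACCOUNT_ID" \
  --budget-name on-demand-spend-threshold \
  --notification NotificationType=ACTUAL,ComparisonOperator=GREATER_THAN,Threshold=15 \
  --subscribers SubscriptionType=EMAIL,Address="$ALERT_EMAIL"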

Low Utilization Alert: Triggers if Savings Plans utilization < 90% for 3 consecutive days
High On-Demand Alert: Triggers if on-demand spend > 15% of total compute (indicates under-commitment)
Expiration Reminder: Triggers 90 days before any RI/SP expires (allows renewal planning)
Cost Anomaly: Triggers if daily compute cost deviates > 20% from 7-day average
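The low-utilization and cost-anomaly rules above reduce to two small checks. The following is a sketch of that logic only, not the alerting implementation used in the engagement; the function names and sample readings are hypothetical.

```python
def low_utilization_alert(daily_utilization: list[float], threshold: float = 90.0, days: int = 3) -> bool:
    """True if the most recent `days` readings are all below the threshold."""
    recent = daily_utilization[-days:]
    return len(recent) == days and all(u < threshold for u in recent)


def cost_anomaly(previous_daily_costs: list[float], today_cost: float, tolerance: float = 0.20) -> bool:
    """True if today's cost deviates from the trailing 7-day average by more than tolerance."""
    window = previous_daily_costs[-7:]
    baseline = sum(window) / len(window)
    return abs(today_cost - baseline) / baseline > tolerance


# Hypothetical readings for illustration.
print(low_utilization_alert([97.0, 89.0, 88.0, 87.0]))  # three low days in a row
print(cost_anomaly([330.0] * 7, today_cost=400.0))       # about 21% above the average
```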

Weekly Commitment Report

Automated Slack report every Monday morning:

Weekly Commitment Report - Jan 20, 2025
Savings Plans Utilization: 97.8% ✅
Reserved Instances Utilization: 99.2% ✅
On-Demand Spend: $1,020 (10.3% of total) ✅

This Week vs. Last Week:
├─ Total Compute: $9,930 (−$60, −0.6%)
├─ Savings vs. On-Demand: $3,090 (23.8%)
└─ Annualized Savings: $37,080

Action Items:
• None: all metrics healthy ✅

Quarterly Commitment Review

Scheduled quarterly reviews to optimize commitment strategy:

Q1 2025 (Feb): Review RDS RI expirations, renew 12x db.r5.2xlarge for 3 years (confirmed 99.9% utilization)
Q2 2025 (May): Evaluate Compute SP increase from $4,200 → $4,800 if Lambda usage grows >15%
Q3 2025 (Aug): Decision point: Renew expiring m5.xlarge RIs or replace with Savings Plans?
Q4 2025 (Nov): Annual review: Model next year's commitment strategy based on growth trajectory

Ongoing Optimization: Commitment utilization is monitored weekly. In the first 60 days post-implementation, utilization stayed at 97% or higher with zero over-commitment issues (the lesson from the initial over-commitment mistake).

Results in Detail

Cost Savings Breakdown

Component	Before	After	Monthly Savings
RDS (on-demand → RIs)	$5,280	$2,016	−$3,264 (61.8%)
EC2 steady-state (on-demand → Compute SP)	$4,840	$2,904	−$1,936 (40.0%)
EC2 burst (on-demand → EC2 Instance SP)	$1,680	$1,260	−$420 (25.0%)
Lambda/Fargate (on-demand → Compute SP)	$1,460	$1,050	−$410 (28.1%)
Unused RIs recovered	−$840	$0	+$840
Total Compute	$13,020	$9,930	−$3,090 (23.8%)

Annual savings: $3,090/month × 12 = $37,080/year (rounded to $37,100 elsewhere in this case study)

Commitment Coverage Improvement

Before

Total compute spend: $13,020/month
Committed spend: $7,880/month (60.5%)
On-demand spend: $5,140/month (39.5%)
Coverage: 58%

After

Total compute spend: $9,930/month
Committed spend: $8,930/month (89.9%)
On-demand spend: $1,000/month (10.1%)
Coverage: 89%
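The committed and on-demand percentages in the before/after comparison can be recomputed from the dollar figures. A minimal sketch; the function name is ours and the inputs are the monthly amounts listed above.

```python
def spend_split(committed: float, on_demand: float) -> tuple[float, float]:
    """Return (committed %, on-demand %) of total compute spend, to one decimal place."""
    total = committed + on_demand
    return round(100 * committed / total, 1), round(100 * on_demand / total, 1)


before = spend_split(committed=7880, on_demand=5140)  # total $13,020/month
after = spend_split(committed=8930, on_demand=1000)   # total $9,930/month

print(f"before: {before[0]}% committed, {before[1]}% on-demand")
print(f"after:  {after[0]}% committed, {after[1]}% on-demand")
```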

Business Value

Immediate Impact

$3,090/month = $37,100 annual savings (23.8% reduction)
Improved gross margins by 1.2%
Demonstrated Series B capital efficiency for Board

Long-term Value

Predictable costs: 89% of compute now fixed-price (easier forecasting)
Flexibility: Compute SP covers Lambda/Fargate (supports serverless migration)
Commitment calendar: Proactive renewal strategy prevents future waste
Scalability: 11% on-demand buffer handles growth without re-commitment

Real Example: Flash Sale Economics

Before optimization: Flash sale (2-hour event) required 73 additional EC2 instances at $124/event (all on-demand). 8 sales per month = $992/month.

After optimization: Same 73 instances cost $58/event (40% covered by unused Compute SP capacity, 60% on-demand). 8 sales per month = $464/month.

53% reduction in flash sale infrastructure cost — commitment strategy improves burst economics too.
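The flash sale comparison is simple arithmetic on the per-event costs quoted above. A short sketch, with a function name of our own:

```python
def monthly_flash_sale_cost(cost_per_event: float, events_per_month: int) -> float:
    """Monthly infrastructure cost attributable to flash sale events."""
    return cost_per_event * events_per_month


cost_before = monthly_flash_sale_cost(124, 8)  # all on-demand
cost_after = monthly_flash_sale_cost(58, 8)    # partly covered by Compute SP capacity
reduction_pct = round(100 * (cost_before - cost_after) / cost_before)

print(f"${cost_before}/month -> ${cost_after}/month ({reduction_pct}% lower)")
```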

Lessons Learned

What Worked

Hybrid commitment strategy: RDS RIs (3-year) for predictable, Savings Plans (1-year) for flexible
P10 usage modeling: Committing to minimum usage (not average) prevented over-commitment
Convertible RI exchanges: Recovered $840/month wasted on unused Standard RIs
Serverless coverage: Compute SP covered Lambda/Fargate (RIs can't)

What Didn't Work

Initial over-commitment: Purchased $6,200/month SP based on peak season, wasted $1,400/month off-season
Cost Explorer recommendations: Blindly following AWS recommendations led to over-commitment
All Upfront for variable workloads: Locked in $74,400 with no flexibility

Key Takeaways

Commit to the floor, not the ceiling: Model commitments on P10 usage, not average or peak
Understand your workload: Steady-state vs. burst vs. serverless require different strategies
Savings Plans ≠ Reserved Instances: SP more flexible but also more dangerous (easier to over-commit)
Payment flexibility matters: No Upfront costs more per hour, but provides flexibility to adjust
Monitor utilization obsessively: 97%+ utilization is optimal, < 90% means over-commitment
