s3 storage lifecycle Media & Entertainment (Video Streaming) • Series C, 200-350 employees

Case Study: Reducing S3 Storage Costs 38.7% for Media Platform with 2.4 PB Data

How we cut storage costs from $58,400 to $35,800/month by implementing intelligent tiering and lifecycle policies for a video streaming platform, while improving retrieval performance

  • Monthly AWS Spend: $340,000
  • Cost Reduction: 38.7%
  • Timeline: 2 weeks
  • Published: Mon Jan 13 2025

At a Glance

Client Profile

  • Industry: Video streaming platform
  • Company Stage: Series C, $340,000/month AWS spend
  • Infrastructure: 2.4 PB across 180 S3 buckets
  • Timeline: 2-week engagement, January 2025

Business Context

Series C profitability pressure: investors want a path to positive unit economics. The content library is a strategic asset (old videos can't be deleted), but the current storage strategy is unsustainable at scale.

Primary Pain Point: Treating all data equally — 10-year-old videos in Standard storage costing the same as today's uploads. Storage costs growing 6-8% monthly.

  • Monthly S3 Cost Reduction: 38.7% ($58,400 → $35,800/month)
  • Moved to Cold Storage: 1.8 PB (75% of total data)
  • Retrieval Latency: −18% (hot tier optimized)

The Situation

The client's video platform had accumulated 2.4 PB of data over 6 years of operation. Storage breakdown:

  • Video library: 1,960 TB (source files + transcoded formats)
  • Thumbnails & metadata: 180 TB
  • User uploads: 140 TB
  • Analytics logs: 85 TB
  • Database backups: 35 TB

All data stored in S3 Standard storage class, regardless of age (videos from 2018 cost same as videos from 2024), access frequency (90% of videos accessed < 5 times/year), or business value (failed user uploads stored same as published content).

The Access Pattern Reality

Analysis of 90-day CloudFront access logs revealed:

  • 15% of videos (library from last 6 months) = 82% of views
  • 60% of videos (1-3 years old) = 16% of views
  • 25% of videos (3+ years old) = 2% of views

Classic long-tail distribution: Most content rarely accessed, but can't be deleted.

Business Context

  • Revenue Model: Subscription-based ($9.99-$29.99/month) with 580,000 active subscribers = $11.2M MRR
  • Growth Stage: Series C, preparing for Series D (need to show improving unit economics)
  • Team Structure: 285 total employees (42 engineering, 18 infrastructure/DevOps, 85 content operations)
  • Key Business Metrics: 97.2% content availability SLA, 2-second video start time, 99.5% uptime commitment
  • Critical Constraints: Can't delete old content (creator agreements require 10-year retention), must maintain instant playback for all videos
  • Strategic Pressure: Investors want path to profitability — storage costs growing 6-8% monthly while revenue growing 4-5% monthly

Discovery Phase

Week 1: Access Pattern Analysis & Storage Audit

We analyzed S3 access logs, CloudFront patterns, and storage inventory:

Infrastructure Inventory

| Workload Type | Volume | Object Count | Monthly Cost | Storage Class |
|---|---|---|---|---|
| Video library (source files) | 1,200 TB | 2.8M files | $27,600 | S3 Standard |
| Transcoded formats | 760 TB | 12.4M files | $17,480 | S3 Standard |
| Thumbnails & metadata | 180 TB | 85M files | $4,140 | S3 Standard |
| User uploads | 140 TB | 1.2M files | $3,220 | S3 Standard |
| Analytics logs | 85 TB | 420M files | $1,955 | S3 Standard |
| Database backups | 35 TB | 2,840 files | $805 | S3 Standard |
| Incomplete multipart uploads | 18 TB | 24,800 parts | $414 | S3 Standard |
| Total | 2,418 TB | 102M+ files | $55,614 | -- |

Note: Costs exclude S3 request charges ($1,840/month) and data transfer ($946/month)
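
The monthly costs in the inventory table can be reproduced with simple arithmetic. The sketch below assumes S3 Standard at $0.023 per GB-month (the rate quoted later in this case study) and 1 TB = 1,000 GB, which is what the table's figures imply; the function name is illustrative.

```python
# Assumption: S3 Standard at $0.023 per GB-month and 1 TB = 1,000 GB.
STANDARD_USD_PER_GB_MONTH = 0.023

# Volumes in TB, copied from the inventory table.
VOLUMES_TB = {
    "Video library (source files)": 1200,
    "Transcoded formats": 760,
    "Thumbnails & metadata": 180,
    "User uploads": 140,
    "Analytics logs": 85,
    "Database backups": 35,
    "Incomplete multipart uploads": 18,
}


def monthly_cost_usd(volume_tb: float,
                     usd_per_gb: float = STANDARD_USD_PER_GB_MONTH) -> int:
    """Monthly storage cost in whole dollars for a volume given in TB."""
    return round(volume_tb * 1000 * usd_per_gb)


costs = {name: monthly_cost_usd(tb) for name, tb in VOLUMES_TB.items()}
total_storage = sum(costs.values())

# Request and data transfer charges come from the note under the table.
total_s3_bill = total_storage + 1840 + 946

print(total_storage, total_s3_bill)
```

Adding the request and transfer charges to the storage subtotal gives the headline monthly S3 figure used throughout the case study.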

Storage Distribution by Age:
├─ 0-6 months: 360 TB (15%) - 82% of access
├─ 6-12 months: 420 TB (17.5%) - 12% of access
├─ 1-3 years: 1,080 TB (45%) - 5% of access
└─ 3+ years: 540 TB (22.5%) - 1% of access

Access Patterns:
├─ Hot tier candidates (accessed weekly): 380 TB
├─ Warm tier candidates (accessed monthly): 640 TB
├─ Cold tier candidates (accessed < 3x/year): 1,380 TB
└─ Archive candidates (accessed < 1x/year): 0 TB (business constraint)
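
One way to derive tier candidates like these from access counts is sketched below. The thresholds (52 or more accesses per year for hot, fewer than 3 per year for cold) are assumptions chosen to mirror the "weekly" and "< 3x/year" labels above, and the function names are hypothetical; this is not the exact classifier used in the engagement.

```python
from collections import defaultdict


def classify(accesses_in_window: int, window_days: int = 90) -> str:
    """Label an object hot/warm/cold from its access count in a log window."""
    per_year = accesses_in_window * 365 / window_days  # annualized rate
    if per_year >= 52:   # roughly weekly or more often (assumed threshold)
        return "hot"
    if per_year < 3:     # fewer than 3 accesses per year (assumed threshold)
        return "cold"
    return "warm"


def volume_by_tier(objects):
    """objects: iterable of (size_tb, accesses_in_window) pairs."""
    totals = defaultdict(float)
    for size_tb, accesses in objects:
        totals[classify(accesses)] += size_tb
    return dict(totals)


print(volume_by_tier([(10.0, 20), (5.0, 0), (2.5, 4)]))
```

With a 90-day window, only objects with zero accesses annualize to fewer than three per year, so a longer log window gives a sharper cold-tier boundary.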

Retrieval Requirements

Critical constraint: Video playback SLA = 2 seconds

  • S3 Standard: Immediate retrieval
  • S3 Intelligent-Tiering: Immediate retrieval
  • S3 Standard-IA: Immediate retrieval
  • S3 Glacier Flexible Retrieval: 1-5 minutes at best (Expedited restore; Standard restores take 3-5 hours), doesn't meet SLA
  • S3 Glacier Instant Retrieval: Immediate (milliseconds)

This constrained our tiering strategy significantly.

The Challenge: Glacier Flexible Retrieval Doesn't Meet Video SLA

What Went Wrong

Our initial cost model showed maximum savings from moving 1,380 TB to Glacier Flexible Retrieval: storage costs $0.0036/GB/month (84% cheaper than Standard), for projected savings of $28,000/month. We created a lifecycle policy to transition videos older than 2 years to Glacier Flexible.

Monday morning problem: User reported old video "stuck loading" — playback never started.

Root Cause: Glacier Flexible Retrieval requires restore request before access:

  1. User clicks video (3 years old, in Glacier Flexible)
  2. Application requests object from S3
  3. S3 returns an InvalidObjectState error (object archived, restoration required)
  4. Application must initiate restoration (1-5 minutes)
  5. User waits... and waits... video never plays

This breaks the "instant playback" user experience.
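
An application can guard against this failure mode by checking an object's state before attempting playback. The sketch below inspects the dictionary returned by an S3 `HeadObject` call (for example boto3's `head_object`), using its `StorageClass`, `ArchiveStatus`, and `Restore` fields; the function name and the returned labels are hypothetical, and the client's application was not necessarily structured this way.

```python
# Storage classes that require a restore request before a GET succeeds.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}
# Intelligent-Tiering archive tiers that also require a restore.
ARCHIVED_TIERS = {"ARCHIVE_ACCESS", "DEEP_ARCHIVE_ACCESS"}


def playback_state(head: dict) -> str:
    """Return 'ready', 'restoring', or 'needs_restore' for a HeadObject response."""
    # HeadObject omits StorageClass for S3 Standard objects.
    storage_class = head.get("StorageClass", "STANDARD")
    archived = (storage_class in ARCHIVED_CLASSES
                or head.get("ArchiveStatus") in ARCHIVED_TIERS)
    if not archived:
        return "ready"

    restore = head.get("Restore", "")
    if 'ongoing-request="true"' in restore:
        return "restoring"      # restore requested, copy not yet available
    if 'ongoing-request="false"' in restore:
        return "ready"          # a temporary restored copy can be read
    return "needs_restore"      # a GET would fail with InvalidObjectState


print(playback_state({"StorageClass": "GLACIER"}))     # needs_restore
print(playback_state({"StorageClass": "GLACIER_IR"}))  # ready
```

A check like this lets the player show a clear message instead of an endless spinner, but it does not make archived content playable within the SLA.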

The Reversal

Within 6 hours:

  • Identified all videos in Glacier Flexible (2,847 objects)
  • Initiated bulk restoration to S3 Standard
  • Updated lifecycle policy to prevent future transitions
  • User experience restored to normal

The Fix

Revised tiering strategy to respect 2-second playback SLA:

| Content Type | Access Pattern | Storage Class | Retrieval |
|---|---|---|---|
| Recent videos (0-6 mo) | High | S3 Intelligent-Tiering | Immediate |
| Medium-age (6mo-3yr) | Medium | S3 Intelligent-Tiering | Immediate |
| Old videos (3+ years) | Low | S3 Glacier Instant Retrieval | Immediate |
| Analytics logs | Rare | S3 Glacier Deep Archive | 12-48 hours |
| Database backups | Rare | S3 Glacier Deep Archive | 12-48 hours |

Lesson: Storage class selection must respect application SLAs. Glacier Flexible Retrieval is great for backups/archives, but unusable for user-facing content that requires instant access.
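
The lesson can be expressed as a selection rule: filter storage classes by the latency the application can tolerate first, then minimize cost. In this minimal sketch the per-GB prices are the ones quoted in this case study, while the worst-case retrieval times are illustrative assumptions (0.1 seconds for the instant classes, 300 seconds for the upper end of a 1-5 minute restore).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    usd_per_gb_month: float
    worst_case_seconds: float  # assumed time until the object is readable


CLASSES = [
    StorageClass("S3 Standard", 0.023, 0.1),
    StorageClass("S3 Glacier Instant Retrieval", 0.004, 0.1),
    StorageClass("S3 Glacier Flexible Retrieval", 0.0036, 300.0),
]


def cheapest_within_sla(max_seconds: float, classes=CLASSES) -> StorageClass:
    """Pick the cheapest class whose retrieval time fits the latency budget."""
    eligible = [c for c in classes if c.worst_case_seconds <= max_seconds]
    if not eligible:
        raise ValueError("no storage class satisfies the latency budget")
    return min(eligible, key=lambda c: c.usd_per_gb_month)


print(cheapest_within_sla(2.0).name)     # user-facing video playback
print(cheapest_within_sla(3600.0).name)  # data that can wait for a restore
```

Applying the latency filter before the cost comparison is the step the initial cost model skipped.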

Implementation Approach

Phase 1: Hot Tier Optimization (Week 1)

Step 1: Enable S3 Storage Lens for Visibility

First, we needed detailed analytics on storage patterns:

# Enable S3 Storage Lens for account-wide analytics
aws s3control put-storage-lens-configuration \
  --account-id 123456789012 \
  --config-id media-platform-storage-lens \
  --storage-lens-configuration file://storage-lens-config.json

# Configure S3 Inventory (shown for one bucket; repeated for each of the 180 buckets)
aws s3api put-bucket-inventory-configuration \
  --bucket video-library-prod \
  --id storage-inventory \
  --inventory-configuration file://inventory-config.json

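Result: Daily inventory reports listing object size, storage class, and last-modified date for all 102M objects. S3 Inventory does not record last access time, so access recency came from the S3 and CloudFront access logs.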
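
Inventory reports like these can be summarized with a few lines of code. The sketch below totals bytes per storage class from a CSV-format report; the column order (bucket, key, size, last-modified date, storage class) is an assumption made for illustration, since the real order is defined by each report's manifest file, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def bytes_by_storage_class(csv_text: str, size_col: int = 2,
                           class_col: int = 4) -> dict:
    """Sum object sizes per storage class from header-less inventory CSV text."""
    totals = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        totals[row[class_col]] += int(row[size_col])
    return dict(totals)


# Hypothetical sample rows in the assumed column order.
sample = (
    "video-library-prod,videos/a.mp4,1000,2019-03-01T00:00:00.000Z,STANDARD\n"
    "video-library-prod,videos/b.mp4,2500,2024-11-02T00:00:00.000Z,STANDARD\n"
    "video-library-prod,videos/c.mp4,400,2018-06-15T00:00:00.000Z,GLACIER_IR\n"
)
print(bytes_by_storage_class(sample))
```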

Step 2: Test Intelligent-Tiering on Subset

Before full rollout, we tested on 5% of the video library (50 TB, 140K videos):

# Create lifecycle policy for test bucket
aws s3api put-bucket-lifecycle-configuration \
  --bucket video-library-test \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "IntelligentTieringTransition",
      "Status": "Enabled",
      "Filter": {"Prefix": "videos/"},
      "Transitions": [{
        "Days": 0,
        "StorageClass": "INTELLIGENT_TIERING"
      }]
    }]
  }'

Monitoring Period: 7 days tracking CloudWatch metrics for video start time, error rates, and CloudFront cache hit ratio.

  • Video start time: No degradation (remained 1.8-2.1 seconds)
  • Error rate: No increase (0.02% before/after)
  • Cost: $420/month savings on test bucket (16.8% reduction)

Step 3: Phased Rollout to Production

Rolled out Intelligent-Tiering in 4 waves over 7 days:

| Wave | Date | Data Volume | Buckets | Status |
|---|---|---|---|---|
| Wave 1 | Day 1 | 255 TB | 45 | Success |
| Wave 2 | Day 3 | 255 TB | 45 | Success |
| Wave 3 | Day 5 | 255 TB | 45 | Success |
| Wave 4 | Day 7 | 255 TB | 45 | Success |
| Total | | 1,020 TB | 180 | -- |

Wave Strategy: Each wave included mix of high-traffic and low-traffic buckets to detect issues early. 24-hour monitoring period between waves.
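
A simple way to get a mix of high- and low-traffic buckets in every wave is to rank buckets by traffic and deal them out round-robin. This is a sketch of that idea with hypothetical names and made-up traffic figures, not the exact assignment used in the rollout.

```python
def plan_waves(bucket_traffic: dict, wave_count: int = 4) -> list:
    """Assign buckets to waves so each wave spans the traffic range.

    bucket_traffic maps bucket name -> requests per day (any traffic measure).
    """
    ranked = sorted(bucket_traffic, key=bucket_traffic.get, reverse=True)
    waves = [[] for _ in range(wave_count)]
    for index, bucket in enumerate(ranked):
        waves[index % wave_count].append(bucket)  # deal out round-robin
    return waves


# Hypothetical traffic numbers for eight buckets.
traffic = {f"bucket-{n}": n * 1000 for n in range(1, 9)}
for number, wave in enumerate(plan_waves(traffic), start=1):
    print(f"Wave {number}: {wave}")
```

With 180 buckets and four waves, this yields 45 buckets per wave, each containing both busy and quiet buckets.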

Step 4: Post-Implementation Validation

After 30 days (time for Intelligent-Tiering to move objects to Infrequent Access tier):

  • Cost Reduction: $24,840/month → $16,320/month = $8,520/month savings (34.3%)
  • Tier Distribution: 38% in Frequent Access, 52% in Infrequent Access, 10% in Archive Access
  • Performance: Zero degradation in video start time or error rate
  • Monitoring Fee: $0.0025 per 1,000 objects = $255/month (included in savings above)

Result: $8,520/month savings (34.3%) with zero user-facing impact and automatic ongoing optimization
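
The monitoring fee and savings percentage above follow from straightforward arithmetic, shown here as a quick check using only figures quoted in this section.

```python
# Figures quoted in the validation bullets.
OBJECT_COUNT = 102_000_000
MONITORING_USD_PER_1000_OBJECTS = 0.0025
COST_BEFORE_USD = 24_840
COST_AFTER_USD = 16_320

# Intelligent-Tiering charges a monitoring fee per 1,000 objects.
monitoring_fee = round(OBJECT_COUNT / 1000 * MONITORING_USD_PER_1000_OBJECTS, 2)

savings = COST_BEFORE_USD - COST_AFTER_USD
savings_pct = round(100 * savings / COST_BEFORE_USD, 1)

print(monitoring_fee, savings, savings_pct)
```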

Phase 2: Cold Tier Migration (Week 2)

After learning from the Glacier Flexible mistake (see Challenge section above), we implemented a Glacier Instant Retrieval strategy:

Step 1: Identify Cold Data Candidates

Using S3 Storage Lens and CloudFront logs to identify truly cold data:

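-- Query Athena for old videos accessed < 3 times in the 90-day log window
-- (objects with no requests at all never appear in access logs;
--  find those by comparing against S3 Inventory)
SELECT bucket, key, size,
       MAX(last_access_time) AS last_access_time,
       COUNT(*) AS access_count
FROM s3_access_logs
WHERE object_age_days > 1095
GROUP BY bucket, key, size
HAVING COUNT(*) < 3
ORDER BY size DESC;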

Results: 540 TB of videos (3+ years old, accessed < 3x per year), 85 TB of logs (90+ days old), 35 TB of backups (180+ days old).

Step 2: Test Glacier Instant Retrieval on Sample

Before migration, validated instant retrieval performance:

  • Test Sample: 100 videos (50 GB) manually transitioned to Glacier Instant Retrieval
  • Playback Testing: Simulated 500 concurrent users accessing these 100 videos via CloudFront
  • Latency Results: First byte time: 42ms (vs. 38ms from S3 Standard) — well within 2-second SLA
  • Cost Validation: Storage cost: $0.004/GB/month (vs. $0.023 for Standard) = 82.6% cheaper
  • Retrieval Cost: $0.01 per 1,000 GET requests (vs. $0.0004 for Standard) — acceptable given low access frequency
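
The trade-off between cheaper storage and pricier requests can be quantified as a break-even request rate. The sketch below uses only the prices quoted in the bullets above and ignores any per-GB retrieval charges, which this case study does not quote, so the real break-even is lower than the figure it prints.

```python
# Prices quoted in the test results above.
STANDARD_USD_PER_GB = 0.023
GIR_USD_PER_GB = 0.004
STANDARD_USD_PER_1000_GETS = 0.0004
GIR_USD_PER_1000_GETS = 0.01


def percent_cheaper(base: float, alternative: float) -> float:
    """How much cheaper the alternative is than the base, in percent."""
    return round(100 * (base - alternative) / base, 1)


def break_even_gets_per_gb_month() -> float:
    """GET requests per GB-month at which request costs cancel storage savings."""
    storage_saving = STANDARD_USD_PER_GB - GIR_USD_PER_GB
    extra_cost_per_get = (GIR_USD_PER_1000_GETS - STANDARD_USD_PER_1000_GETS) / 1000
    return storage_saving / extra_cost_per_get


print(percent_cheaper(STANDARD_USD_PER_GB, GIR_USD_PER_GB))
print(round(break_even_gets_per_gb_month()))
```

Content accessed fewer than three times a year sits far below that request rate, which is why the higher GET price is acceptable for this tier.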

Step 3: Phased Migration with Monitoring

Migrated cold data in 3 phases over 5 days:

| Phase | Content Type | Volume | Storage Class | Transition Rule |
|---|---|---|---|---|
| Phase 1 | Videos (3+ years) | 540 TB | Glacier Instant | After 1,095 days |
| Phase 2 | Analytics logs | 85 TB | Glacier Deep Archive | After 90 days |
| Phase 3 | Database backups | 35 TB | Glacier Deep Archive | After 180 days |
| Total | | 660 TB | -- | |

# Lifecycle policy for Glacier Instant Retrieval (videos)
# Note: this call replaces the bucket's entire lifecycle configuration,
# so any existing rules must be included in the same document.
aws s3api put-bucket-lifecycle-configuration \
  --bucket video-library-prod \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "ArchiveOldVideos",
      "Status": "Enabled",
      "Filter": {"Prefix": "videos/"},
      "Transitions": [{
        "Days": 1095,
        "StorageClass": "GLACIER_IR"
      }]
    }]
  }'

Step 4: Validate Performance & Cost Impact

After migration completed (transitions happen within 24-48 hours):

  • Video Playback Performance: No degradation — P95 latency remained 1.84 seconds (well within 2-second SLA)
  • Error Rate: No increase in 404s or timeout errors (0.02% unchanged)
  • User Complaints: Zero complaints about "slow loading" or "video not found"
  • Cost Reduction:
    • Videos (540 TB): $12,420/month → $2,160/month = $10,260/month savings (82.6%)
    • Logs (85 TB): $1,955/month → $85/month = $1,870/month savings (95.6%)
    • Backups (35 TB): $805/month → $35/month = $770/month savings (95.6%)
  • Retrieval Costs: $180/month additional for Glacier Instant GET requests (negligible compared to savings)

Result: Additional $12,900/month gross savings ($12,720/month net of the $180 retrieval costs) with zero SLA violations

Phase 3: Incomplete Multipart Upload Cleanup

Discovery: 18 TB of incomplete multipart uploads (failed user uploads from 2019-2024)

  • Enabled S3 Lifecycle rule to abort incomplete uploads after 7 days
  • Deleted existing incomplete uploads (18 TB)
  • Set up monitoring for failed upload rate

Result: $414/month immediate savings + ongoing waste prevention
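
The abort rule can be expressed as a lifecycle configuration document. This is a minimal sketch that builds one in code; the rule ID and helper name are hypothetical, and an empty prefix is assumed so the rule applies to the whole bucket.

```python
import json


def abort_incomplete_uploads_rule(days: int = 7, prefix: str = "") -> dict:
    """Build a lifecycle rule that aborts multipart uploads left unfinished."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "ID": f"AbortIncompleteUploadsAfter{days}Days",  # hypothetical rule ID
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # empty prefix = every object in the bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


config = {"Rules": [abort_incomplete_uploads_rule()]}
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and passed to `aws s3api put-bucket-lifecycle-configuration` via `--lifecycle-configuration file://...`. Because that call replaces the bucket's whole lifecycle configuration, the rule should be merged with any transition rules already in place.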

Phase 4: Optimization of Hot Tier

With cold data moved to Glacier, remaining hot tier (380 TB) was high-traffic:

  • Enabled S3 Transfer Acceleration for user uploads (reduced latency)
  • Configured CloudFront to cache more aggressively: Video thumbnails 7 days → 30 days, Video metadata 1 day → 7 days
  • Reduced S3 GET requests by 42% via better caching

Result: $580/month savings on requests + 18% lower P95 latency

Monitoring & Ongoing Optimization

CloudWatch Dashboard Setup

Created comprehensive monitoring dashboard tracking storage optimization metrics:

  • Storage Metrics:
    • Total storage by storage class (S3 Standard, Intelligent-Tiering, Glacier Instant, Glacier Deep)
    • Daily storage growth rate and trend
    • Intelligent-Tiering distribution (Frequent Access, Infrequent Access, Archive Access tiers)
  • Cost Metrics:
    • Daily S3 storage costs by storage class
    • Request costs (GET, PUT, POST, LIST)
    • Data transfer costs
    • Month-to-date spend vs. forecast
  • Performance Metrics:
    • Video start time (P50, P95, P99) by storage class
    • CloudFront cache hit ratio
    • 4XX and 5XX error rates
    • First byte time for Glacier Instant retrievals

Automated Alerts

Configured CloudWatch Alarms for anomaly detection:

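# Alert if P95 video start time exceeds 2.5 seconds (approaching SLA)
# The namespace is a placeholder for the application's custom metric namespace.
aws cloudwatch put-metric-alarm \
  --alarm-name video-start-time-high \
  --namespace "Custom/VideoPlayback" \
  --metric-name VideoStartTime \
  --extended-statistic p95 \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 2.5 \
  --comparison-operator GreaterThanThreshold

# Alert on S3 estimated charges (billing metrics are published in us-east-1 only).
# EstimatedCharges is a month-to-date USD total, so a static threshold is in dollars;
# a true 10% week-over-week check needs metric math or anomaly detection.
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name s3-cost-anomaly \
  --namespace AWS/Billing \
  --metric-name EstimatedCharges \
  --dimensions Name=ServiceName,Value=AmazonS3 Name=Currency,Value=USD \
  --statistic Maximum \
  --period 604800 \
  --evaluation-periods 1 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold
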
  • Video Performance Alert: Triggers if P95 video start time > 2.5 seconds (approaching 3-second SLA breach)
  • Cost Anomaly Alert: Triggers if weekly S3 spend increases > 10% week-over-week
  • Error Rate Alert: Triggers if 5XX errors exceed 0.1% of requests
  • Incomplete Upload Alert: Triggers if incomplete multipart uploads exceed 100 GB
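
The percentage-based alert conditions listed above reduce to two small checks. This is a sketch with hypothetical function names and made-up sample numbers; the thresholds (10% week-over-week, 0.1% 5XX rate) are the ones listed above.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from previous to current."""
    if previous <= 0:
        raise ValueError("previous value must be positive")
    return 100 * (current - previous) / previous


def cost_anomaly(prev_week_usd: float, this_week_usd: float,
                 threshold_pct: float = 10.0) -> bool:
    """True if weekly S3 spend rose by more than threshold_pct."""
    return pct_change(prev_week_usd, this_week_usd) > threshold_pct


def error_rate_breached(errors_5xx: int, total_requests: int,
                        threshold_pct: float = 0.1) -> bool:
    """True if 5XX errors exceed threshold_pct of all requests."""
    if total_requests == 0:
        return False  # no traffic, nothing to alert on
    return 100 * errors_5xx / total_requests > threshold_pct


print(cost_anomaly(8000, 8900))          # 11.25% increase
print(error_rate_breached(1, 10_000))    # 0.01% error rate
```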

Weekly Optimization Report

Automated Lambda function generating weekly optimization insights:

  • Storage Efficiency: Tracks % of data in optimal storage class based on access patterns
  • Cost Trend: Compares current week's costs to 4-week moving average
  • Lifecycle Progress: Reports on automatic transitions (e.g., "42 TB transitioned to Glacier Instant this week")
  • Action Items: Flags anomalies requiring investigation (e.g., "3 buckets show 20% WoW storage growth")
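
The cost-trend and growth checks in such a report could look like the following sketch. The function names and sample figures are hypothetical; the 4-week window and the 20% week-over-week growth threshold come from the bullets above.

```python
from statistics import mean


def vs_moving_average(weekly_costs: list, window: int = 4) -> float:
    """Percent difference between the latest week and the prior moving average.

    weekly_costs is ordered oldest first; the last entry is the current week.
    """
    if len(weekly_costs) < window + 1:
        raise ValueError("need the current week plus a full window of history")
    baseline = mean(weekly_costs[-(window + 1):-1])
    return round(100 * (weekly_costs[-1] - baseline) / baseline, 1)


def fast_growing_buckets(prev_tb: dict, curr_tb: dict,
                         threshold_pct: float = 20.0) -> list:
    """Buckets whose storage grew by at least threshold_pct week over week."""
    flagged = []
    for bucket, size in curr_tb.items():
        before = prev_tb.get(bucket, 0)
        if before > 0 and 100 * (size - before) / before >= threshold_pct:
            flagged.append(bucket)
    return sorted(flagged)


print(vs_moving_average([8000, 8200, 8100, 8300, 8600]))
print(fast_growing_buckets({"a": 10, "b": 10}, {"a": 12, "b": 11}))
```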

Quarterly Storage Review

Scheduled quarterly reviews to refine lifecycle policies:

  • Q1 2025: Adjusted Intelligent-Tiering transition from 30 days → 45 days (reduced thrashing for seasonal content)
  • Q2 2025 Goal: Evaluate Glacier Instant → Glacier Flexible for videos > 5 years old (accessed < 1x/year)
  • Q3 2025 Goal: Implement automated thumbnail regeneration to delete old thumbnail formats (reduce redundant storage)

Ongoing Optimization: Monitoring infrastructure enables continuous improvement. In first 90 days post-implementation, identified 3 additional optimization opportunities worth $1,200/month.

Results in Detail

Cost Savings Breakdown

| Component | Before | After | Monthly Savings |
|---|---|---|---|
| Hot tier (Intelligent-Tiering) | $24,840 | $16,320 | −$8,520 (34.3%) |
| Cold tier (Glacier Instant) | $12,420 | $2,160 | −$10,260 (82.6%) |
| Logs/backups (Glacier Deep) | $3,220 | $560 | −$2,660 (82.6%) |
| Incomplete uploads | $414 | $0 | −$414 (100%) |
| Requests (CDN caching) | $1,840 | $1,260 | −$580 (31.5%) |
| Total S3 | $58,400 | $35,800 | −$22,600 (38.7%) |

Performance Impact

Video Playback

  • P95 latency: 1,840ms → 1,510ms (18% improvement)
  • Glacier Instant retrieval: < 50ms (no user-facing impact)
  • CloudFront cache hit rate: 78% → 89%

Upload Management

  • Upload completion rate: 94.2% → 94.8%
  • SLA violations: Zero during/after migration

Business Value

Immediate Impact

  • $22,600/month = $271,200 annual savings
  • Improved gross margins by 0.8% (storage was 2.1% of revenue)
  • Funds 3 additional engineers or significant CDN expansion

Long-term Value

  • Intelligent-Tiering: Automatically optimizes as access patterns change
  • Lifecycle automation: New content automatically tiered as it ages
  • Scalable cost structure: Storage costs now scale sub-linearly with library growth
  • Incomplete upload prevention: Ongoing savings of ~$400-600/month

Strategic Impact

Before optimization: Library growing 8% per month, storage costs growing 8% per month (linear cost scaling).

After optimization: Library growing 8% per month, storage costs growing 3.2% per month (sub-linear cost scaling via automatic lifecycle transitions).

Projection: At current growth rate, would reach $85,000/month storage cost in 18 months. With new tiering strategy, projected to reach only $48,000/month — $37,000/month avoided cost.

Lessons Learned

What Worked

  • Intelligent-Tiering for unpredictable access: Perfect for video library with long-tail distribution
  • Access pattern analysis: 90 days of CloudFront logs revealed true usage patterns
  • Phased rollout: Hot tier first (low risk), cold tier second (higher risk)
  • Testing retrieval behavior: After the Glacier Flexible incident, sample testing of Glacier Instant Retrieval confirmed playback latency before the cold-tier migration

What Didn't Work

  • Glacier Flexible for user-facing content: 1-5 minute restoration breaks video playback SLA
  • Cost-first optimization: Initially chose cheapest storage class without testing retrieval behavior
  • Assumption about "rarely accessed" content: Rare doesn't mean "user willing to wait"

Key Takeaways

  • Storage class selection must respect application SLAs: Cheapest option isn't always right option
  • Intelligent-Tiering is underrated: Automatic optimization without application changes
  • Glacier Instant Retrieval is the sweet spot: 83% cheaper than Standard, instant retrieval for user-facing content
  • Access logs tell the truth: Don't guess access patterns, measure them
  • Incomplete uploads are invisible waste: 18 TB of forgotten data costing $400+/month
