Garbage Collection Tuning
Master Go's garbage collector with GOGC, GOMEMLIMIT, GC tracing, and container-aware configuration for low-latency applications.
Go's Garbage Collector Architecture
Go's garbage collector is a concurrent tri-color mark-and-sweep collector designed for low pause times and responsive applications. It operates through phases that overlap with application execution, minimizing stop-the-world (STW) pauses.
Tri-Color Mark-and-Sweep Algorithm
The GC uses three color categories during marking:
- White objects: Not yet visited, presumed unreachable
- Gray objects: Visited but children not yet scanned
- Black objects: Visited and all children scanned, definitely reachable
The GC cycle proceeds through four phases:
1. Mark Setup (STW): 0.1-0.3ms pause
- Enables the write barrier
- Prepares GC data structures
- Scans stack roots
2. Concurrent Marking: runs alongside the application
- Traverses the object graph
- Colors objects based on reachability
- Write barrier tracks pointer writes
3. Mark Termination (STW): 0.1-0.5ms pause (proportional to goroutine count)
- Re-scans changed stacks
- Disables the write barrier
- Completes the marking phase
4. Concurrent Sweeping: runs alongside the application
- Reclaims memory from white objects
- Returns freed memory to the allocator
- Very fast (micro-operations per object)
Total pause time: typically under 1ms even for large heaps (1GB+), because mark and sweep phases are mostly concurrent.
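You can observe these pauses directly by forcing a cycle with runtime.GC and reading the recorded pause from runtime.MemStats. A minimal sketch (lastPause is a helper name of my own, not a runtime API):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// lastPause forces a full GC cycle and returns the total cycle count
// and the most recent stop-the-world pause duration.
func lastPause() (uint32, time.Duration) {
	runtime.GC() // force a full cycle so at least one pause is recorded
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// PauseNs is a circular buffer of the last 256 pause durations;
	// (NumGC+255)%256 indexes the most recent entry.
	return m.NumGC, time.Duration(m.PauseNs[(m.NumGC+255)%256])
}

func main() {
	n, pause := lastPause()
	fmt.Printf("GC cycles: %d, last STW pause: %v\n", n, pause)
}
```

On an idle program the printed pause is typically well under a millisecond, matching the numbers above.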
Write Barriers
During concurrent marking, the GC uses write barriers to track changes:
// When you write: parent.child = newChild
// Go inserts a barrier operation that:
// 1. Records the write for GC tracking
// 2. Ensures newChild is marked if parent is already black
// 3. Prevents incorrect collection of reachable objects
// Modern Go uses hybrid write barriers (Dijkstra + Yuasa)
// - Cost: a few CPU cycles per pointer write on hot paths
// - Overhead: typically < 5% CPU even for write-heavy workloads
GC Trigger Calculation: GOGC Deep Dive
GOGC controls when garbage collection runs using a relative heap growth percentage. The default value is 100, meaning "collect when heap size doubles."
import (
"fmt"
"runtime"
"runtime/debug"
)
func explainGOGC() {
var m runtime.MemStats
runtime.ReadMemStats(&m)
// GOGC formula:
// next_gc_goal = last_live_heap * (1 + GOGC/100)
//
// Example: GOGC=100 (default)
// - After GC, live heap = 100MB
// - next_gc_goal = 100 * (1 + 100/100) = 200MB
// - GC runs when heap reaches 200MB
// - This allows application to allocate 100MB new objects between GCs
//
// Example: GOGC=50
// - After GC, live heap = 100MB
// - next_gc_goal = 100 * (1 + 50/100) = 150MB
// - GC runs sooner (more frequent collection, less memory between GCs)
// - Higher CPU usage (GC runs more often)
// - Lower peak memory (heap smaller)
//
// Example: GOGC=200
// - After GC, live heap = 100MB
// - next_gc_goal = 100 * (1 + 200/100) = 300MB
// - GC runs later (less frequent collection)
// - Lower CPU usage (fewer GC cycles)
// - Higher peak memory (heap can grow larger)
// runtime/debug has no GetGCPercent; SetGCPercent returns the
// previous value, so set-and-restore reads the current setting
currentGC := debug.SetGCPercent(100)
debug.SetGCPercent(currentGC)
fmt.Printf("Current GOGC: %d\n", currentGC)
fmt.Printf("Live heap: %v MB\n", m.Alloc/1024/1024)
fmt.Printf("Heap in use: %v MB\n", m.HeapInuse/1024/1024)
fmt.Printf("Next GC at: ~%v MB (estimated)\n", m.NextGC/1024/1024)
}
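The goal formula in the comments above can be written out as a tiny function (nextGCGoal is an illustrative name of my own, not a runtime API):

```go
package main

import "fmt"

// nextGCGoal applies the trigger formula from the comments above:
// next_gc_goal = live_heap * (1 + GOGC/100), here in integer MB.
func nextGCGoal(liveHeapMB, gogc int) int {
	return liveHeapMB * (100 + gogc) / 100
}

func main() {
	// Reproduce the three worked examples for a 100MB live heap.
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("live=100MB GOGC=%d -> goal=%dMB\n", gogc, nextGCGoal(100, gogc))
	}
}
```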
// GOGC tuning guidelines:
// GOGC=50: Memory-constrained (frequent GC, smaller peak heap, more GC CPU)
// GOGC=100: Default (balanced)
// GOGC=200: Throughput-optimized (infrequent GC, less GC CPU, larger peak heap)
// GOGC=400: Memory-abundant scenarios (very infrequent GC)
// GOGC=off: Special cases only (no automatic GC; manual runtime.GC() needed)
GOMEMLIMIT: Modern Container-Aware Tuning (Go 1.19+)
GOMEMLIMIT sets an absolute memory limit, allowing the GC to stay within container constraints. This is the recommended approach for production services.
import (
"fmt"
"os"
)
func setupGOMEMLIMIT() {
// GOMEMLIMIT format: number + unit
// Valid units: B, KiB, MiB, GiB
// Examples:
// GOMEMLIMIT=512MiB
// GOMEMLIMIT=1GiB
// GOMEMLIMIT=100MiB
limit := os.Getenv("GOMEMLIMIT")
if limit == "" {
limit = "(not set)"
}
fmt.Printf("GOMEMLIMIT: %s\n", limit)
// If not set, the limit defaults to math.MaxInt64 (effectively none)
// For containers: set GOMEMLIMIT to 85-90% of the container limit
// Example: container limit 1024MiB
// GOMEMLIMIT=900MiB (leave ~10% headroom for kernel, stacks, GC metadata)
// How GOMEMLIMIT affects GC:
// - As heap approaches limit, GC becomes more aggressive
// - When near limit, GC may mark/sweep more frequently
// - Prevents OOM kills by keeping heap under limit
// - Can cause periodic CPU spikes as you approach limit
}
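The limit can also be set programmatically via debug.SetMemoryLimit (Go 1.19+), which installs a new soft limit and returns the previous one; a minimal sketch:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setMemoryLimit wraps debug.SetMemoryLimit (Go 1.19+): it installs a
// new soft memory limit in bytes and returns the previous one. A
// negative input leaves the limit unchanged and just returns the
// current value.
func setMemoryLimit(bytes int64) int64 {
	return debug.SetMemoryLimit(bytes)
}

func main() {
	prev := setMemoryLimit(900 << 20) // e.g. 90% of a 1GiB container
	fmt.Printf("previous limit: %d bytes\n", prev)
	setMemoryLimit(prev) // restore the original limit
}
```

This is equivalent to setting the GOMEMLIMIT environment variable before startup, but works for limits computed at runtime.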
// GOMEMLIMIT vs GOGC interaction:
// GOGC=100 only, GOMEMLIMIT not set:
// - GC triggers when the heap doubles relative to the live set
// - Heap tracks the live set, so it grows whenever the live set grows
//   (dangerous in containers with hard limits)
//
// GOGC=100, GOMEMLIMIT=1GiB:
// - GC tries to keep heap under 1GB
// - May override GOGC behavior if approaching limit
// - GC becomes more aggressive as limit is approached
//
// GOGC=off, GOMEMLIMIT=1GiB:
// - Automatic GC disabled
// - Only GOMEMLIMIT triggers collection
// - GC runs aggressively to stay under limit
// - Not recommended unless you are manually managing GC
Container-Specific Configuration
import (
"fmt"
"strconv"
)
// Kubernetes memory limits interaction with Go GC
func configureForKubernetes(containerMemoryMiB string) {
memMiB, _ := strconv.Atoi(containerMemoryMiB)
// Kubernetes example:
// resources:
// requests:
// memory: "512Mi" # GC soft target
// limits:
// memory: "1Gi" # Hard OOM kill limit
// Go configuration:
// - GOMEMLIMIT should be 85-90% of the container limit
// - GOGC can remain at the default (100)
// - The Go runtime does NOT read cgroup memory limits automatically;
//   set GOMEMLIMIT yourself (env var or debug.SetMemoryLimit)
gomemlimitMiB := int(float64(memMiB) * 0.9)
fmt.Printf("Container memory: %dMiB\n", memMiB)
fmt.Printf("Recommended GOMEMLIMIT: %dMiB\n", gomemlimitMiB)
// Set at runtime (Go 1.19+); note os.Setenv("GOMEMLIMIT", ...) has
// no effect once the process has started:
// debug.SetMemoryLimit(int64(gomemlimitMiB) << 20)
// Check actual cgroup limits (stubbed below, so guard against zeros)
cpuQuota := readCgroupLimit("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
cpuPeriod := readCgroupLimit("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
if cpuQuota > 0 && cpuPeriod > 0 {
numCPU := float64(cpuQuota) / float64(cpuPeriod)
fmt.Printf("Cgroup CPU limit: %.1f cores\n", numCPU)
}
}
func readCgroupLimit(path string) int64 {
// Simplified: in production, use full cgroup/cgroupv2 reading
return 0
}
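readCgroupLimit above is stubbed; for memory, a cgroup v2 sketch might parse /sys/fs/cgroup/memory.max, whose contents are either a decimal byte count or the literal max. The path, helper names, and the 90% factor here are illustrative assumptions:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemoryMax parses the content of a cgroup v2 memory.max file:
// either a decimal byte count or the literal "max" (no limit).
// It returns the byte count and whether a limit is actually set.
func parseMemoryMax(content string) (int64, bool) {
	s := strings.TrimSpace(content)
	if s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// recommendedGOMEMLIMIT returns 90% of the container limit in bytes,
// leaving headroom for the kernel, stacks, and GC metadata.
func recommendedGOMEMLIMIT(limitBytes int64) int64 {
	return limitBytes * 9 / 10
}

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max") // cgroup v2 path
	if err != nil {
		fmt.Println("no cgroup v2 memory.max file:", err)
		return
	}
	if limit, ok := parseMemoryMax(string(data)); ok {
		fmt.Printf("container limit: %d bytes, suggested GOMEMLIMIT: %d bytes\n",
			limit, recommendedGOMEMLIMIT(limit))
	} else {
		fmt.Println("no container memory limit set")
	}
}
```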
// Example: 1GB container with 4 cores
// GOMEMLIMIT=900MiB
// GOGC=100 (default)
// Result:
// - With GOGC=100 the heap goal is 2x the live heap, capped at 900MiB
// - Once the live heap exceeds ~450MB, the limit (not GOGC) drives GC
// - Peak memory: ~920MB (including GC metadata, goroutine stacks)
// - Safe margin below the 1GB OOM kill threshold
// OOM prevention checklist:
// 1. Set GOMEMLIMIT = 0.9 * container_limit_mb
// 2. Monitor heap via runtime.MemStats.HeapAlloc
// 3. Set resource requests = 0.7 * limit (for Kubernetes scheduler)
// 4. Alert if HeapAlloc > 0.8 * GOMEMLIMIT
// 5. Use Go 1.19+ for GOMEMLIMIT (cgroup limits are not auto-detected)
Memory Limit Cycle Detection and Prevention
import (
"fmt"
"runtime"
"time"
)
// Detecting memory limit cycles (rapid GC due to constant high allocation)
func detectMemoryLimitCycle() {
var m runtime.MemStats
prevGCCount := uint32(0)
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
for range ticker.C {
runtime.ReadMemStats(&m)
// Report each completed GC; reports every second indicate a cycle
if m.NumGC > prevGCCount {
heapMB := m.Alloc / 1024 / 1024
fmt.Printf("GC #%d: Heap=%dMB, GC pause=%v\n",
m.NumGC, heapMB, time.Duration(m.PauseNs[(m.NumGC+255)%256]))
// If GCs are happening every second, you're in a cycle
// Solution: reduce allocation rate or increase GOMEMLIMIT
prevGCCount = m.NumGC
}
}
}
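The back-of-envelope arithmetic for time between GCs (heap headroom divided by allocation rate) can be coded directly; a sketch with names of my own:

```go
package main

import "fmt"

// secondsBetweenGCs estimates the time between GC cycles: with GOGC
// set, the heap may grow by liveHeap*GOGC/100 bytes before the next
// collection, so divide that headroom by the allocation rate.
func secondsBetweenGCs(liveHeapMB, gogc int, allocMBPerSec float64) float64 {
	headroomMB := float64(liveHeapMB*gogc) / 100
	return headroomMB / allocMBPerSec
}

func main() {
	// 100MB live heap, GOGC=100, allocating 50MB/sec:
	// 100MB of headroom / 50MB/s = a GC every ~2 seconds.
	fmt.Printf("%.1fs between GCs\n", secondsBetweenGCs(100, 100, 50))
}
```

If the result drops below about one second, you are in the cycle described above: raise GOMEMLIMIT/GOGC or cut the allocation rate.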
// Preventing memory limit cycles:
// 1. Measure actual allocation rate: objects/sec * avg_object_size
// 2. Calculate time between GCs: heap_size / allocation_rate
// 3. If < 1 second: increase GOMEMLIMIT or reduce allocation rate
// 4. Use object pooling to reduce allocation rate
// 5. Use a recent Go release (each version ships GC improvements)
GC Pause Analysis and Measurement
import (
"fmt"
"runtime"
"sort"
"time"
)
func measureGCPauses() {
var m runtime.MemStats
lastNumGC := uint32(0)
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()
var pauses []time.Duration
for range ticker.C {
runtime.ReadMemStats(&m)
// Check if GC ran since last check
if m.NumGC > lastNumGC {
// Get pause time (circular buffer of last 256 pauses)
pauseNs := m.PauseNs[(m.NumGC+255)%256]
pauseDuration := time.Duration(pauseNs)
pauses = append(pauses, pauseDuration)
if len(pauses) > 100 {
pauses = pauses[1:]
}
fmt.Printf("GC pause: %v, heap: %vMB\n", pauseDuration, m.Alloc/1024/1024)
lastNumGC = m.NumGC
}
// Calculate percentiles (sort a copy; pauses is in arrival order)
if len(pauses) > 10 {
sorted := append([]time.Duration(nil), pauses...)
sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
p50 := sorted[len(sorted)/2]
p95 := sorted[(len(sorted)*95)/100]
p99 := sorted[(len(sorted)*99)/100]
fmt.Printf("Pause times - p50: %v, p95: %v, p99: %v\n", p50, p95, p99)
}
}
}
// Typical pause times:
// - Small heap (< 100MB): 0.1-0.3ms
// - Medium heap (100MB-1GB): 0.3-1ms
// - Large heap (> 1GB): 0.5-2ms
//
// STW pause time DOES scale with:
// - Goroutine count (stack re-scanning during mark termination)
// - Pointer write rate during marking (more stacks to re-scan)
//
// STW pause time does NOT scale with heap size
// (most marking is concurrent; heap and object graph size affect
// GC cycle length and CPU, not the STW pauses)
//
// For P99 latency-sensitive services:
// - Target: GC pause < 10ms
// - Typical: < 2ms is achievable with GOMEMLIMIT tuning
// - If > 5ms: increase GOMEMLIMIT or reduce goroutine count
GC Pacer: How Runtime Decides When to Collect
// The GC pacer makes decisions based on:
// 1. Allocation rate: bytes allocated per unit time
// 2. Current live heap size
// 3. Configured GOGC or GOMEMLIMIT
// 4. Historical GC time measurements
// Pacer behavior (simplified, illustrative):
// - Tracks bytes allocated since the last GC
// - Predicts when next_gc_goal will be reached at the current rate
// - Starts marking early enough that it can finish before the goal
// - Ensures the mark phase completes before allocation reaches the goal
// Factors affecting pacer behavior:
// - High allocation rate: starts GC earlier (gives marking time)
// - Variable allocation rate: may be conservative (prevents missed goals)
// - Multi-threaded contention: may delay GC slightly
// If you see unexpected GCs:
// 1. Check allocation rate with pprof
// 2. Verify GOGC/GOMEMLIMIT settings
// 3. Monitor for memory leaks (heap growing unbounded)
// 4. Check for finalizers (objects with finalizers need an extra
//    GC cycle before they can be reclaimed)
Benchmarking with Different GC Settings
import (
"fmt"
"runtime"
"runtime/debug"
"testing"
"time"
)
// Benchmark allocation and GC impact
func BenchmarkGCSettings(b *testing.B) {
for _, gogc := range []int{50, 100, 200} {
b.Run(fmt.Sprintf("GOGC=%d", gogc), func(b *testing.B) {
debug.SetGCPercent(gogc)
defer debug.SetGCPercent(100) // Restore
b.ReportAllocs()
b.ResetTimer()
var m runtime.MemStats
var sink []byte // keep the allocation observable to the compiler
for i := 0; i < b.N; i++ {
sink = make([]byte, 1024*1024) // allocate 1MB per iteration
}
_ = sink
runtime.ReadMemStats(&m)
fmt.Printf("GOGC=%d: NumGC=%d, PauseTotal=%v\n",
gogc, m.NumGC, time.Duration(m.PauseTotalNs))
})
}
// Illustrative trend for ~1GB of total allocation (numbers vary):
// GOGC=50: most GC cycles, highest total GC CPU
// GOGC=100: balanced
// GOGC=200: fewest GC cycles, lowest total GC CPU, larger peak heap
}
// Real-world: Compare throughput vs latency
func BenchmarkGCThroughputVsLatency(b *testing.B) {
b.Run("LowLatency_GOGC50", func(b *testing.B) {
debug.SetGCPercent(50)
defer debug.SetGCPercent(100)
b.ReportAllocs()
for i := 0; i < b.N; i++ {
processRequest() // Allocates ~1KB per request
}
// Result: More GCs but lower p99 latency
})
b.Run("HighThroughput_GOGC200", func(b *testing.B) {
debug.SetGCPercent(200)
defer debug.SetGCPercent(100)
b.ReportAllocs()
for i := 0; i < b.N; i++ {
processRequest()
}
// Result: Fewer GCs, higher throughput, worse p99 latency
})
}
var requestSink []byte
func processRequest() {
// assign to a package-level sink so the 1KB allocation escapes
// to the heap instead of being stack-allocated away
requestSink = make([]byte, 1024)
}
Reducing Allocation Rate: The Most Impactful Optimization
import (
"sync"
)
// ANTI-PATTERN: Creates new slice each call
func processItemsBad(items []string) {
var results []string // Allocates
for _, item := range items {
results = append(results, item+"_processed")
}
_ = results
}
// PATTERN 1: Preallocate with known capacity
func processItemsGood(items []string) []string {
results := make([]string, 0, len(items)) // Single allocation
for _, item := range items {
results = append(results, item+"_processed")
}
return results
}
// PATTERN 2: Use sync.Pool for temporary scratch buffers
var scratchPool = sync.Pool{
New: func() interface{} {
s := make([]string, 0, 100)
return &s // store a pointer so Put does not allocate
},
}
func processItemsPooled(items []string, handle func(string)) {
bufp := scratchPool.Get().(*[]string)
buf := (*bufp)[:0] // reset length, keep capacity
for _, item := range items {
buf = append(buf, item+"_processed")
}
for _, r := range buf {
handle(r) // consume results before the buffer goes back
}
*bufp = buf // keep any grown backing array
scratchPool.Put(bufp)
}
// NOTE: never return a pooled slice to the caller; after Put, the
// pool may hand the same backing array to another goroutine
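Allocation counts like these can be verified empirically with testing.AllocsPerRun, which also works outside test files; a self-contained sketch comparing the naive and preallocated versions (note that the per-item string concatenation allocates in both):

```go
package main

import (
	"fmt"
	"testing"
)

func processItemsBad(items []string) []string {
	var results []string // grows via repeated append reallocations
	for _, item := range items {
		results = append(results, item+"_processed")
	}
	return results
}

func processItemsGood(items []string) []string {
	results := make([]string, 0, len(items)) // single slice allocation
	for _, item := range items {
		results = append(results, item+"_processed")
	}
	return results
}

// measureAllocs reports average heap allocations per call for both
// versions over 100 items.
func measureAllocs() (bad, good float64) {
	items := make([]string, 100)
	for i := range items {
		items[i] = "item"
	}
	bad = testing.AllocsPerRun(10, func() { processItemsBad(items) })
	good = testing.AllocsPerRun(10, func() { processItemsGood(items) })
	return bad, good
}

func main() {
	bad, good := measureAllocs()
	fmt.Printf("bad: %.0f allocs/call, good: %.0f allocs/call\n", bad, good)
}
```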
// Impact per call with 100 items (the per-item string concatenation
// still allocates in every version):
// - processItemsBad: ~8 extra slice allocations from append growth
// - processItemsGood: 1 slice allocation
// - processItemsPooled: 0 slice allocations (after warmup)
//
// For 1M items per second:
// - Bad: 10M allocations/sec → GC pressure, high memory bandwidth
// - Good: 1M allocations/sec → manageable
// - Pooled: 0 allocations/sec → near-zero GC overhead
//
// GC impact (illustrative, with GOGC=100):
// - Bad: GC every 1-2 seconds
// - Good: GC every 5-10 seconds
// - Pooled: GC every 30+ seconds
Production GC Monitoring
import (
"fmt"
"runtime"
"time"
)
type GCMetrics struct {
NumGC uint32
LastGCTime time.Time
LastPauseDuration time.Duration
HeapAllocMB uint64
HeapInuseMB uint64
GCCPUFraction float64
}
func GetGCMetrics() GCMetrics {
var m runtime.MemStats
runtime.ReadMemStats(&m)
return GCMetrics{
NumGC: m.NumGC,
LastGCTime: time.Unix(0, int64(m.LastGC)),
LastPauseDuration: time.Duration(m.PauseNs[(m.NumGC+255)%256]),
HeapAllocMB: m.Alloc / 1024 / 1024,
HeapInuseMB: m.HeapInuse / 1024 / 1024,
GCCPUFraction: m.GCCPUFraction,
}
}
func MonitorGC(interval time.Duration) {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for range ticker.C {
metrics := GetGCMetrics()
// Alert conditions:
// 1. GC pause exceeding threshold
if metrics.LastPauseDuration > 5*time.Millisecond {
fmt.Printf("WARNING: GC pause exceeded 5ms: %v\n", metrics.LastPauseDuration)
}
// 2. High GC CPU overhead
if metrics.GCCPUFraction > 0.25 { // >25% CPU spent in GC
fmt.Printf("WARNING: High GC CPU: %.1f%%\n", metrics.GCCPUFraction*100)
}
// 3. Heap allocation near limit
if metrics.HeapAllocMB > 900 { // Assuming 1GB limit
fmt.Printf("WARNING: Heap near limit: %dMB\n", metrics.HeapAllocMB)
}
// Log metrics for analysis
fmt.Printf("GC Metrics - Count: %d, LastPause: %v, Heap: %dMB, CPU: %.1f%%\n",
metrics.NumGC,
metrics.LastPauseDuration,
metrics.HeapAllocMB,
metrics.GCCPUFraction*100)
}
}
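As an alternative to runtime.MemStats, the runtime/metrics package (Go 1.16+) exposes GC statistics through stable named metrics with lower overhead; a minimal sketch reading a few of them:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readGCMetrics samples a few stable runtime/metrics names.
func readGCMetrics() (gcCycles, heapBytes, goroutines uint64) {
	samples := []metrics.Sample{
		{Name: "/gc/cycles/total:gc-cycles"},
		{Name: "/memory/classes/heap/objects:bytes"},
		{Name: "/sched/goroutines:goroutines"},
	}
	metrics.Read(samples)
	return samples[0].Value.Uint64(),
		samples[1].Value.Uint64(),
		samples[2].Value.Uint64()
}

func main() {
	gcs, heap, gs := readGCMetrics()
	fmt.Printf("GC cycles: %d, heap objects: %dB, goroutines: %d\n", gcs, heap, gs)
}
```

Use metrics.All() to enumerate every supported metric name and its kind before wiring these into a dashboard.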
// Production monitoring checklist:
// - GC pause times (should stay < 10ms for p99)
// - GC frequency (should be predictable, not random spikes)
// - Heap allocation trend (should be stable, not growing)
// - GC CPU overhead (should be < 20% for most workloads)
// - Goroutine count (affects mark termination pause time)
Real-World Tuning Examples
Example 1: HTTP API Server
import (
"os"
"runtime/debug"
)
func setupAPIServer() {
// For an API server with 100 RPS average, 1000 RPS burst:
// - Each request: ~10KB allocation
// - Live set: ~50MB (working set)
// - Allocation rate: 100 RPS * 10KB = 1MB/sec
// Configuration:
// GOGC=100 (default is fine)
// GOMEMLIMIT=512MiB (well above the ~100MB steady-state heap)
// Result: ~50MB headroom at 1MB/sec means a GC roughly every
// 50 seconds, with sub-ms pauses
if os.Getenv("GOGC") == "" {
debug.SetGCPercent(100)
}
// Enable GC tracing in development
if os.Getenv("DEBUG") != "" {
os.Setenv("GODEBUG", "gctrace=1")
}
}
// Run with:
// GOMEMLIMIT=512MiB ./api-server
// For lower latency:
// GOGC=50 GOMEMLIMIT=512MiB ./api-server
Example 2: Batch Processing
import (
"runtime/debug"
)
func setupBatchProcessor() {
// For batch processing:
// - Each batch: 100MB allocation
// - Process time: 10 seconds per batch
// - No tight latency requirements
// - Throughput critical
// Configuration:
// GOGC=200 (less frequent GC)
// GOMEMLIMIT=1GiB (plenty of room)
// Result: GC every 30+ seconds, infrequent pauses
debug.SetGCPercent(200)
}
// Run with:
// GOMEMLIMIT=1GiB ./batch-processor
Example 3: Real-Time Data Processing
import (
"runtime/debug"
)
func setupRealtimeProcessor() {
// For real-time processing (10ms latency target):
// - Must avoid GC pauses during critical windows
// - Prioritize stable low latency over throughput
// Configuration:
// GOGC=50 (more frequent GC, smaller heap between cycles)
// GOMEMLIMIT=256MiB (tight; keeps the heap small)
// Result: GC every 1-2 seconds, <0.5ms pauses
debug.SetGCPercent(50)
}
// Run with:
// GOMEMLIMIT=256MiB ./realtime-processor
When NOT to Disable the GC (GOGC=off / SetGCPercent(-1))
import (
"runtime/debug"
)
// ANTI-PATTERN: Disabling GC
func disableGCBad() {
debug.SetGCPercent(-1) // Disable automatic GC
// This is almost never correct
// You must manually call runtime.GC() periodically
// Risk: Memory grows unbounded between manual GC calls
// Use case: Custom GC scheduling in extreme scenarios
// Better: Use GOMEMLIMIT instead
}
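For the rare batch-boundary case where disabling automatic GC is defensible, the pattern looks roughly like this (runBatches and processBatch are illustrative names, not library APIs):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// runBatches disables automatic GC and collects only at batch
// boundaries, when the working set is at its smallest.
func runBatches(batches int, processBatch func(int)) {
	old := debug.SetGCPercent(-1) // disable automatic GC
	defer debug.SetGCPercent(old) // always restore on exit
	for i := 0; i < batches; i++ {
		processBatch(i)
		runtime.GC() // collect between batches, not during them
	}
}

func main() {
	runBatches(3, func(i int) {
		_ = make([]byte, 1<<20) // stand-in for real batch work
		fmt.Printf("batch %d done\n", i)
	})
}
```

The defer is the important part: forgetting to restore GOGC leaves the process without automatic collection and the heap growing unbounded.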
// When GOGC=off might be appropriate:
// 1. Custom GC scheduler (very rare)
// 2. Predictable batch boundaries (process batch → GC → next batch)
// 3. Extreme low-latency requirement with careful heap management
//
// Better alternatives:
// - GOMEMLIMIT (handles most cases)
// - Reduce allocation rate (most effective)
// - Increase GOGC value (simpler than disabling)
Go GC Improvements Over Versions
Go 1.12: Concurrent sweep improvements
Go 1.13: Scavenging improvements, GC CPU fraction tracking
Go 1.14: Smaller Mark Termination pauses
Go 1.15: Write barrier improvements
Go 1.16: Pacer improvements
Go 1.17: Faster write barriers
Go 1.18: Better pacer tuning
Go 1.19: GOMEMLIMIT support (major improvement for containers)
Go 1.20: GOMEMLIMIT refinements and hardening
Go 1.21: GC latency improvements
Summary
Go's garbage collector is highly optimized for low latency, with typical pause times under 1ms even for large heaps. Control GC behavior with GOGC (percentage-based) or GOMEMLIMIT (absolute limit), preferring GOMEMLIMIT in containerized environments. The default GOGC=100 means "collect when the heap doubles"; lower values (e.g. 50) collect more often, trading CPU for a smaller peak heap, while higher values (e.g. 200) collect less often, trading memory for throughput. For containers, set GOMEMLIMIT to 85-90% of the container memory limit to prevent OOM kills. Measure actual GC impact with runtime.MemStats and GODEBUG=gctrace=1. The most impactful optimization is reducing the allocation rate through preallocation and object pooling, which can reduce GC frequency by 10-100x. Monitor pause times and GC CPU overhead in production; alert on p99 pauses above 10ms or GC consuming more than 25% of CPU. Use go test -benchmem to identify allocation hotspots. For low-latency services (sub-10ms targets), combine GOGC=50 with GOMEMLIMIT and aggressive allocation reduction. For throughput services, use GOGC=200 or higher with a larger GOMEMLIMIT. Always measure actual GC behavior in production; tuning is workload-specific, and generic settings rarely match production characteristics.