Go Performance Guide
Memory Management

Garbage Collection Tuning

Master Go's garbage collector with GOGC, GOMEMLIMIT, GC tracing, and container-aware configuration for low-latency applications.

Go's Garbage Collector Architecture

Go's garbage collector is a concurrent tri-color mark-and-sweep collector designed for low pause times and responsive applications. It operates through phases that overlap with application execution, minimizing stop-the-world (STW) pauses.

Tri-Color Mark-and-Sweep Algorithm

The GC uses three color categories during marking:

  • White objects: Not yet visited, presumed unreachable
  • Gray objects: Visited but children not yet scanned
  • Black objects: Visited and all children scanned, definitely reachable

The GC cycle proceeds through phases:

  1. Mark Setup (STW): 0.1-0.3ms pause

    • Enables write barrier
    • Prepares data structures
    • Scans stack roots
  2. Concurrent Marking: Runs alongside application

    • Traverses object graph
    • Colors objects based on reachability
    • Write barrier tracks changes
  3. Mark Termination (STW): 0.1-0.5ms pause (proportional to goroutine count)

    • Flushes remaining mark work and write barrier buffers (stacks are not re-scanned with the hybrid barrier)
    • Disables write barrier
    • Completes marking phase
  4. Concurrent Sweeping: Runs alongside application

    • Reclaims memory from white objects
    • Returns freed memory to allocator
    • Very fast (micro-operations per object)

Total pause time: typically under 1ms even for large heaps (1GB+), because mark and sweep phases are mostly concurrent.
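
To observe these phases directly, run a binary with GODEBUG=gctrace=1: the runtime prints one line per collection to stderr, and the three clock times map onto the STW and concurrent phases above. The line below is illustrative only (the binary name is a placeholder, and the exact format is documented in the runtime package and changes slightly between releases; newer versions also print stack and globals sizes):

$ GODEBUG=gctrace=1 ./myserver
gc 4 @2.104s 0%: 0.018+1.2+0.021 ms clock, 0.14+0.35/1.0/2.9+0.17 ms cpu, 4->4->1 MB, 5 MB goal, 8 P

  • gc 4 @2.104s 0%: fourth GC cycle, 2.104s after program start, ~0% of CPU time spent in GC so far
  • 0.018+1.2+0.021 ms clock: STW setup + concurrent marking + STW mark termination
  • 4->4->1 MB: heap size at GC start, heap size at GC end, live heap after marking
  • 5 MB goal: heap goal for this cycle
  • 8 P: processors used by the collector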

Write Barriers

During concurrent marking, the GC uses write barriers to track changes:

// When you write: parent.child = newChild
// Go inserts a barrier operation that:
// 1. Records the write for GC tracking
// 2. Ensures newChild is marked if parent is already black
// 3. Prevents incorrect collection of reachable objects

// Modern Go uses hybrid write barriers (Dijkstra + Yuasa)
// - Cost: a few CPU cycles per pointer write on hot paths
// - Overhead: typically < 5% CPU for write-heavy workloads

GC Trigger Calculation: GOGC Deep Dive

GOGC controls when garbage collection runs using a relative heap growth percentage. The default value is 100, meaning "collect when heap size doubles."

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func explainGOGC() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// GOGC formula:
	// next_gc_goal = last_live_heap * (1 + GOGC/100)
	//
	// Example: GOGC=100 (default)
	// - After GC, live heap = 100MB
	// - next_gc_goal = 100 * (1 + 100/100) = 200MB
	// - GC runs when heap reaches 200MB
	// - This allows application to allocate 100MB new objects between GCs
	//
	// Example: GOGC=50
	// - After GC, live heap = 100MB
	// - next_gc_goal = 100 * (1 + 50/100) = 150MB
	// - GC runs sooner (more frequent collection, less memory between GCs)
	// - Higher CPU usage (GC runs more often)
	// - Lower peak memory (heap smaller)
	//
	// Example: GOGC=200
	// - After GC, live heap = 100MB
	// - next_gc_goal = 100 * (1 + 200/100) = 300MB
	// - GC runs later (less frequent collection)
	// - Lower CPU usage (fewer GC cycles)
	// - Higher peak memory (heap can grow larger)

	currentGC := debug.GetGCPercent()
	fmt.Printf("Current GOGC: %d\n", currentGC)
	fmt.Printf("Live heap: %v MB\n", m.Alloc/1024/1024)
	fmt.Printf("Heap goal: %v MB\n", m.HeapAlloc/1024/1024)
	fmt.Printf("Next GC at: ~%v MB (estimated)\n", m.NextGC/1024/1024)
}

// GOGC tuning guidelines:
// GOGC=50:  Latency/memory-sensitive (frequent GC, more GC CPU, lower peak heap)
// GOGC=100: Default (balanced)
// GOGC=200: Throughput-optimized (infrequent GC, less GC CPU, higher peak heap)
// GOGC=400: Memory-abundant scenarios (very infrequent GC)
// GOGC=off: Special cases only (no automatic GC; trigger manually or rely on GOMEMLIMIT)
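
To see the formula in action, here is a minimal sketch (the function name and data sizes are illustrative): it pins ~64MB of live data, forces a cycle, and prints the ratio of the runtime's next-GC goal to the live heap. At GOGC=100 the ratio should be close to 2, though newer pacers also count stacks and globals toward the goal, so it will not be exact.

import (
	"fmt"
	"runtime"
)

func verifyGOGCFormula() {
	// Keep ~64MB of live data so the numbers are easy to read.
	live := make([][]byte, 64)
	for i := range live {
		live[i] = make([]byte, 1<<20)
	}

	runtime.GC() // finish a cycle so NextGC reflects the current live set

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("live heap: %d MB, next GC goal: %d MB (ratio %.2f)\n",
		m.HeapAlloc>>20, m.NextGC>>20, float64(m.NextGC)/float64(m.HeapAlloc))

	runtime.KeepAlive(live) // keep the data reachable across the forced GC
}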

GOMEMLIMIT: Modern Container-Aware Tuning (Go 1.19+)

GOMEMLIMIT sets an absolute memory limit, allowing the GC to stay within container constraints. This is the recommended approach for production services.

import (
	"fmt"
	"os"
	"runtime/debug"
)

func setupGOMEMLIMIT() {
	// GOMEMLIMIT format: number + unit
	// Valid units: B, KiB, MiB, GiB
	// Examples:
	// GOMEMLIMIT=512MiB
	// GOMEMLIMIT=1GiB
	// GOMEMLIMIT=100MiB

	limit := os.Getenv("GOMEMLIMIT")
	fmt.Printf("GOMEMLIMIT set to: %s\n", limit)

	// If not set, the default is math.MaxInt64, i.e. effectively no limit (Go 1.19+)
	// For containers: set it to 85-90% of the container memory limit

	// Container memory: 1GB
	// Container limit: 1024MB
	// GOMEMLIMIT: 900MiB (leave 10% buffer for kernel, GC metadata)

	// How GOMEMLIMIT affects GC:
	// - As heap approaches limit, GC becomes more aggressive
	// - When near limit, GC may mark/sweep more frequently
	// - Prevents OOM kills by keeping heap under limit
	// - Can cause periodic CPU spikes as you approach limit
}

// GOMEMLIMIT vs GOGC interaction:
// GOGC=100 only, GOMEMLIMIT not set:
//   - GC triggers when heap doubles
//   - Can grow unbounded (dangerous in containers)
//
// GOGC=100, GOMEMLIMIT=1GiB:
//   - GC tries to keep heap under 1GB
//   - May override GOGC behavior if approaching limit
//   - GC becomes more aggressive as limit is approached
//
// GOGC=off, GOMEMLIMIT=1GiB:
//   - Automatic GC disabled
//   - Only GOMEMLIMIT triggers collection
//   - GC runs aggressively to stay under limit
//   - Not recommended unless manually managing GC
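
A short sketch (assuming Go 1.19+) of how the last combination above could be configured in code; configureMemoryLimitOnly is an illustrative helper, not a standard API. SetMemoryLimit and SetGCPercent return the previous values, so the old settings can be restored. As noted above, this mode is rarely the right choice; GOGC=100 plus a memory limit is the usual production setup.

import (
	"runtime/debug"
)

func configureMemoryLimitOnly(limitBytes int64) (restore func()) {
	prevLimit := debug.SetMemoryLimit(limitBytes) // e.g. 1 << 30 for 1GiB
	prevGC := debug.SetGCPercent(-1)              // disable the GOGC-based trigger

	return func() {
		debug.SetGCPercent(prevGC)
		debug.SetMemoryLimit(prevLimit)
	}
}

// Usage: defer configureMemoryLimitOnly(1 << 30)()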

Container-Specific Configuration

import (
	"fmt"
	"strconv"
)

// Kubernetes memory limits interaction with Go GC
func configureForKubernetes(containerMemoryMiB string) {
	memMiB, err := strconv.Atoi(containerMemoryMiB)
	if err != nil {
		fmt.Printf("invalid memory value %q: %v\n", containerMemoryMiB, err)
		return
	}

	// Kubernetes example:
	// resources:
	//   requests:
	//     memory: "512Mi"  # scheduler hint; the Go runtime never sees this
	//   limits:
	//     memory: "1Gi"    # hard OOM kill limit

	// Go configuration:
	// - GOMEMLIMIT should be 85-90% of the container limit
	// - GOGC can remain at the default (100)
	// - Go does NOT read cgroup limits automatically; set GOMEMLIMIT yourself
	//   (via the environment, debug.SetMemoryLimit, or a helper library)

	gomemlimitMiB := int(float64(memMiB) * 0.9)
	fmt.Printf("Container memory: %dMiB\n", memMiB)
	fmt.Printf("Recommended GOMEMLIMIT: %dMiB\n", gomemlimitMiB)

	// Set at runtime (Go 1.19+):
	// debug.SetMemoryLimit(int64(gomemlimitMiB) << 20)

	// Check actual cgroup limits (cgroup v1 paths shown; cgroup v2 uses cpu.max)
	cpuQuota := readCgroupLimit("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	cpuPeriod := readCgroupLimit("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if cpuQuota > 0 && cpuPeriod > 0 {
		fmt.Printf("Cgroup CPU limit: %.1f cores\n", float64(cpuQuota)/float64(cpuPeriod))
	}
}

func readCgroupLimit(path string) int64 {
	// Simplified: in production, use full cgroup/cgroupv2 reading
	return 0
}

// Example: 1GB container with 4 cores
// GOMEMLIMIT=900MiB
// GOGC=100 (default)
// Result:
// - Heap can grow to 900MiB before the limit forces collection
// - With GOGC=100 alone, GC runs when the heap doubles over the live set
//   (e.g. a 450MB live set triggers at ~900MB, right at the limit)
// - Peak memory: ~920MB (including GC metadata, stacks)
// - Safe margin below the 1GB OOM kill threshold

// OOM prevention checklist:
// 1. Set GOMEMLIMIT = 0.9 * container_limit
// 2. Monitor heap via runtime.MemStats.HeapAlloc
// 3. Set resource requests = 0.7 * limit (for the Kubernetes scheduler)
// 4. Alert if HeapAlloc > 0.8 * GOMEMLIMIT
// 5. Set GOMEMLIMIT explicitly; Go does not auto-detect cgroup limits (a sketch follows below)
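
Since the runtime does not read cgroup limits itself, a process that wants item 1 handled automatically has to do the reading at startup. A minimal sketch, assuming a cgroup v2 container with the standard /sys/fs/cgroup mount (applyCgroupMemoryLimit is illustrative; adjust the path and head-room for your environment):

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

func applyCgroupMemoryLimit() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return // not a cgroup v2 container, or a different mount point
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return // no memory limit configured
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return
	}

	// Leave ~10% head-room for stacks, runtime metadata, and non-heap memory.
	target := int64(float64(limit) * 0.9)
	debug.SetMemoryLimit(target)
	fmt.Printf("GOMEMLIMIT set to %d bytes (cgroup limit %d)\n", target, limit)
}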

Memory Limit Cycle Detection and Prevention

import (
	"fmt"
	"runtime"
	"time"
)

// Detecting memory limit cycles (rapid GC due to constant high allocation)
func detectMemoryLimitCycle() {
	var m runtime.MemStats
	prevGCCount := uint32(0)
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		runtime.ReadMemStats(&m)

		// A GC completed since the last tick
		if m.NumGC > prevGCCount {
			heapMB := m.Alloc / 1024 / 1024
			fmt.Printf("GC #%d: Heap=%dMB, last pause=%v\n",
				m.NumGC, heapMB, time.Duration(m.PauseNs[(m.NumGC+255)%256]))

			// If GCs are happening every second, you're in a cycle
			// Solution: reduce allocation rate or increase GOMEMLIMIT
			prevGCCount = m.NumGC
		}
	}
}

// Preventing memory limit cycles:
// 1. Measure the actual allocation rate: objects/sec * avg_object_size (sketched below)
// 2. Estimate the time between GCs: allocation headroom / allocation rate
// 3. If < 1 second: increase GOMEMLIMIT or reduce the allocation rate
// 4. Use object pooling to reduce the allocation rate
// 5. Keep the Go toolchain up to date (each release ships GC improvements)
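
A rough sketch of steps 1-2 above. estimateAllocationRate and its headroomBytes parameter are illustrative; headroom is roughly the gap between the live heap and the next GC goal (or the memory limit), and the output is an estimate, not a guarantee.

import (
	"fmt"
	"runtime"
	"time"
)

func estimateAllocationRate(window time.Duration, headroomBytes uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	time.Sleep(window)
	runtime.ReadMemStats(&after)

	// TotalAlloc is cumulative, so the delta over the window is the allocation rate.
	allocPerSec := float64(after.TotalAlloc-before.TotalAlloc) / window.Seconds()
	fmt.Printf("allocation rate: %.1f MB/s\n", allocPerSec/1024/1024)

	if allocPerSec > 0 {
		secsBetweenGCs := float64(headroomBytes) / allocPerSec
		fmt.Printf("estimated time between GCs: %.1fs\n", secsBetweenGCs)
		if secsBetweenGCs < 1 {
			fmt.Println("WARNING: likely GC cycle; raise GOMEMLIMIT or cut allocations")
		}
	}
}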

GC Pause Analysis and Measurement

import (
	"fmt"
	"runtime"
	"sort"
	"time"
)

func measureGCPauses() {
	var m runtime.MemStats
	lastNumGC := uint32(0)

	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	var pauses []time.Duration

	for range ticker.C {
		runtime.ReadMemStats(&m)

		// Check if GC ran since last check
		if m.NumGC > lastNumGC {
			// Get pause time (circular buffer of last 256 pauses)
			pauseNs := m.PauseNs[(m.NumGC+255)%256]
			pauseDuration := time.Duration(pauseNs)

			pauses = append(pauses, pauseDuration)
			if len(pauses) > 100 {
				pauses = pauses[1:]
			}

			fmt.Printf("GC pause: %v, heap: %vMB\n", pauseDuration, m.Alloc/1024/1024)
			lastNumGC = m.NumGC
		}

		// Calculate percentiles over a sorted copy (pauses arrive in time order)
		if len(pauses) > 10 {
			sorted := make([]time.Duration, len(pauses))
			copy(sorted, pauses)
			sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
			p50 := sorted[len(sorted)/2]
			p95 := sorted[(len(sorted)*95)/100]
			p99 := sorted[(len(sorted)*99)/100]
			fmt.Printf("Pause times - p50: %v, p95: %v, p99: %v\n", p50, p95, p99)
		}
	}
}

// Typical pause times:
// - Small heap (< 100MB): 0.1-0.3ms
// - Medium heap (100MB-1GB): 0.3-1ms
// - Large heap (> 1GB): 0.5-2ms
//
// Pause time DOES scale with:
// - Goroutine count (more stacks to scan)
// - Object graph size (mark phase duration)
// - Allocation rate (more objects to scan)
//
// Pause time scales only weakly with raw heap size
// (hence the modest growth in the ranges above),
// because most marking and sweeping runs concurrently
//
// For P99 latency-sensitive services:
// - Target: GC pause < 10ms
// - Typical: < 2ms is achievable with GOMEMLIMIT tuning
// - If consistently > 5ms: reduce the allocation rate or goroutine count,
//   or give the GC more head-room via GOMEMLIMIT/GOGC
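
If hand-rolled percentile tracking is more than you need, runtime/debug.ReadGCStats can fill in pause quantiles directly from the recorded pause history. A minimal sketch:

import (
	"fmt"
	"runtime/debug"
	"time"
)

func reportPauseQuantiles() {
	stats := debug.GCStats{
		// With length 5, ReadGCStats fills min, 25%, 50%, 75%, and max pause times.
		PauseQuantiles: make([]time.Duration, 5),
	}
	debug.ReadGCStats(&stats)

	fmt.Printf("GC cycles: %d, total pause: %v\n", stats.NumGC, stats.PauseTotal)
	fmt.Printf("pause quantiles (min/p25/p50/p75/max): %v\n", stats.PauseQuantiles)
}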

GC Pacer: How Runtime Decides When to Collect

// The GC pacer makes decisions based on:
// 1. Allocation rate: bytes allocated per unit time
// 2. Current live heap size
// 3. Configured GOGC or GOMEMLIMIT
// 4. Historical GC time measurements

// Pacer algorithm (simplified):
// - Tracks bytes allocated since the last GC and the measured marking rate
// - Predicts when the next heap goal will be reached at the current allocation rate
// - Starts the mark phase early enough that marking can finish before allocation
//   reaches the goal (allocating goroutines are made to assist if marking falls behind)

// Factors affecting pacer behavior:
// - High allocation rate: starts GC earlier (gives marking time)
// - Variable allocation rate: may be conservative (prevents missed goals)
// - Multi-threaded contention: may delay GC slightly

// If you see unexpected GCs:
// 1. Check the allocation rate with pprof
// 2. Verify GOGC/GOMEMLIMIT settings and the current heap goal (see the runtime/metrics sketch below)
// 3. Monitor for memory leaks (heap growing unbounded)
// 4. Check for finalizers (objects with finalizers need an extra GC cycle to be reclaimed)
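
To see what the pacer is currently aiming for, the runtime/metrics package (Go 1.16+) exposes the live heap goal. A small sketch; the metric names below are assumptions based on the published metric set, so list metrics.All() on your toolchain if a read comes back with a bad value:

import (
	"fmt"
	"runtime/metrics"
)

func readPacerGoal() {
	samples := []metrics.Sample{
		{Name: "/gc/heap/goal:bytes"},
		{Name: "/gc/cycles/total:gc-cycles"},
	}
	metrics.Read(samples)

	if samples[0].Value.Kind() == metrics.KindUint64 {
		fmt.Printf("current heap goal: %d MiB\n", samples[0].Value.Uint64()>>20)
	}
	if samples[1].Value.Kind() == metrics.KindUint64 {
		fmt.Printf("completed GC cycles: %d\n", samples[1].Value.Uint64())
	}
}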

Benchmarking with Different GC Settings

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"testing"
	"time"
)

// Benchmark allocation and GC impact
var gcBenchSink []byte // package-level sink so the compiler cannot elide the allocation

func BenchmarkGCSettings(b *testing.B) {
	for _, gogc := range []int{50, 100, 200} {
		b.Run(fmt.Sprintf("GOGC=%d", gogc), func(b *testing.B) {
			prev := debug.SetGCPercent(gogc)
			defer debug.SetGCPercent(prev) // restore the previous setting

			b.ReportAllocs()
			b.ResetTimer()

			for i := 0; i < b.N; i++ {
				// Allocate 1MB
				gcBenchSink = make([]byte, 1024*1024)
			}

			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			fmt.Printf("GOGC=%d: NumGC=%d, PauseTotal=%v\n",
				gogc, m.NumGC, time.Duration(m.PauseTotalNs))
		})
	}

	// Illustrative results for 1000 iterations (~1GB allocated); actual numbers vary:
	// GOGC=50:  NumGC≈20, most total pause time, lowest peak heap
	// GOGC=100: NumGC≈10, balanced
	// GOGC=200: NumGC≈5,  least GC CPU, highest peak heap
}

// Real-world: Compare throughput vs latency
func BenchmarkGCThroughputVsLatency(b *testing.B) {
	b.Run("LowLatency_GOGC50", func(b *testing.B) {
		debug.SetGCPercent(50)
		defer debug.SetGCPercent(100)

		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			processRequest() // Allocates ~1KB per request
		}
		// Result: More GCs but lower p99 latency
	})

	b.Run("HighThroughput_GOGC200", func(b *testing.B) {
		debug.SetGCPercent(200)
		defer debug.SetGCPercent(100)

		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			processRequest()
		}
		// Result: Fewer GCs, higher throughput, worse p99 latency
	})
}

func processRequest() {
	_ = make([]byte, 1024)
}

Reducing Allocation Rate: The Most Impactful Optimization

import (
	"sync"
)

// ANTI-PATTERN: Creates new slice each call
func processItemsBad(items []string) {
	var results []string // nil slice; append reallocates as it grows
	for _, item := range items {
		results = append(results, item+"_processed")
	}
	_ = results
}

// PATTERN 1: Preallocate with known capacity
func processItemsGood(items []string) []string {
	results := make([]string, 0, len(items)) // Single allocation
	for _, item := range items {
		results = append(results, item+"_processed")
	}
	return results
}

// PATTERN 2: Use sync.Pool for temporary buffers.
// The pooled buffer must stay inside the function: returning it to the caller
// while also putting it back in the pool would let two users share the same
// backing array. Here the caller consumes the result via a callback instead.
var stringSlicePool = sync.Pool{
	New: func() interface{} {
		s := make([]string, 0, 100)
		return &s // pool pointers to slices to avoid an extra allocation on Put
	},
}

func processItemsPooled(items []string, consume func([]string)) {
	bufp := stringSlicePool.Get().(*[]string)
	buf := (*bufp)[:0] // reset length, keep capacity

	for _, item := range items {
		buf = append(buf, item+"_processed")
	}
	consume(buf) // the caller must not retain buf after this call

	*bufp = buf // keep any grown capacity for the next user
	stringSlicePool.Put(bufp)
}

// Impact (slice allocations only; the per-item string concatenation still allocates):
// - processItemsBad: 100 items → ~10 slice allocations (append growth)
// - processItemsGood: 100 items → 1 slice allocation
// - processItemsPooled: 100 items → 0 slice allocations (after warmup)
//
// At 1M items per second, the bad version produces orders of magnitude more
// allocations than the preallocated version → GC pressure and wasted memory
// bandwidth, while the pooled version removes the slice churn entirely.
//
// Rough GC impact (with GOGC=100; actual numbers depend on the live set):
// - Bad: GC every 1-2 seconds, 1-2ms pauses
// - Good: GC every 5-10 seconds, <1ms pauses
// - Pooled: GC every 30+ seconds, <0.5ms pauses
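
To verify these allocation counts on your own workload, here is a hedged benchmark sketch (BenchmarkProcessItems, the 100-item input, and the sink variable are illustrative); run it with go test -bench=ProcessItems -benchmem and compare the allocs/op columns:

import (
	"testing"
)

var itemsSink []string // sink to keep results from being optimized away

func BenchmarkProcessItems(b *testing.B) {
	items := make([]string, 100)
	for i := range items {
		items[i] = "item"
	}

	b.Run("Good", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			itemsSink = processItemsGood(items)
		}
	})

	b.Run("Pooled", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			processItemsPooled(items, func(out []string) {
				_ = len(out) // consume in place; do not retain out past this call
			})
		}
	})
}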

Production GC Monitoring

import (
	"fmt"
	"runtime"
	"time"
)

type GCMetrics struct {
	NumGC             uint32
	LastGCTime        time.Time
	LastPauseDuration time.Duration
	HeapAllocMB       uint64
	HeapInuseMB       uint64
	GCCPUFraction     float64
}

func GetGCMetrics() GCMetrics {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	return GCMetrics{
		NumGC:             m.NumGC,
		LastGCTime:        time.Unix(0, int64(m.LastGC)),
		LastPauseDuration: time.Duration(m.PauseNs[(m.NumGC+255)%256]),
		HeapAllocMB:       m.Alloc / 1024 / 1024,
		HeapInuseMB:       m.HeapInuse / 1024 / 1024,
		GCCPUFraction:     m.GCCPUFraction,
	}
}

func MonitorGC(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		metrics := GetGCMetrics()

		// Alert conditions:
		// 1. GC pause exceeding threshold
		if metrics.LastPauseDuration > 5*time.Millisecond {
			fmt.Printf("WARNING: GC pause exceeded 5ms: %v\n", metrics.LastPauseDuration)
		}

		// 2. High GC CPU overhead
		if metrics.GCCPUFraction > 0.25 { // >25% CPU spent in GC
			fmt.Printf("WARNING: High GC CPU: %.1f%%\n", metrics.GCCPUFraction*100)
		}

		// 3. Heap allocation near limit
		if metrics.HeapAllocMB > 900 { // Assuming 1GB limit
			fmt.Printf("WARNING: Heap near limit: %dMB\n", metrics.HeapAllocMB)
		}

		// Log metrics for analysis
		fmt.Printf("GC Metrics - Count: %d, LastPause: %v, Heap: %dMB, CPU: %.1f%%\n",
			metrics.NumGC,
			metrics.LastPauseDuration,
			metrics.HeapAllocMB,
			metrics.GCCPUFraction*100)
	}
}

// Production monitoring checklist:
// - GC pause times (should stay < 10ms for p99)
// - GC frequency (should be predictable, not random spikes)
// - Heap allocation trend (should be stable, not growing)
// - GC CPU overhead (should be < 20% for most workloads)
// - goroutine count (affects mark termination pause time)
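
For scraping these numbers without extra dependencies, the standard library expvar package publishes a JSON snapshot of runtime.MemStats under the "memstats" key at /debug/vars once imported. A minimal sketch; the localhost:6060 address is an example and should only be exposed on an internal interface:

import (
	_ "expvar" // registers /debug/vars on http.DefaultServeMux, including "memstats"
	"log"
	"net/http"
)

func serveDebugVars() {
	go func() {
		// Serve /debug/vars for an external collector to poll.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}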

Real-World Tuning Examples

Example 1: HTTP API Server

import (
	"os"
	"runtime/debug"
)

func setupAPIServer() {
	// For an API server with 100 RPS average, 1000 RPS burst:
	// - Each request: ~10KB allocation
	// - Live set: ~50MB (working set)
	// - Allocation rate: 100 RPS * 10KB = 1MB/sec

	// Configuration:
	// GOGC=100 (default is fine)
	// GOMEMLIMIT=512MiB (comfortably above the expected peak)
	// Result: GC every ~5s under burst load, tens of seconds apart at average load; <1ms pauses

	if os.Getenv("GOGC") == "" {
		debug.SetGCPercent(100)
	}

	// Enable GC tracing in development by launching with GODEBUG=gctrace=1.
	// GODEBUG is read at process start, so set it in the environment rather
	// than calling os.Setenv after the program is already running.
}

// Run with:
// GOMEMLIMIT=512MiB ./api-server
// For lower latency:
// GOGC=50 GOMEMLIMIT=512MiB ./api-server

Example 2: Batch Processing

import (
	"runtime/debug"
)

func setupBatchProcessor() {
	// For batch processing:
	// - Each batch: 100MB allocation
	// - Process time: 10 seconds per batch
	// - No tight latency requirements
	// - Throughput critical

	// Configuration:
	// GOGC=200 (less frequent GC)
	// GOMEMLIMIT=1GiB (plenty of room)
	// Result: GC every 30+ seconds, infrequent pauses

	debug.SetGCPercent(200)
}

// Run with:
// GOMEMLIMIT=1GiB ./batch-processor

Example 3: Real-Time Data Processing

import (
	"runtime/debug"
)

func setupRealtimeProcessor() {
	// For real-time processing (10ms latency target):
	// - Must avoid GC pauses during critical windows
	// - Prioritize stable low latency over throughput

	// Configuration:
	// GOGC=50 (more frequent, smaller pauses)
	// GOMEMLIMIT=256MiB (tight, forces GC often)
	// Result: GC every 1-2 seconds, <0.5ms pauses

	debug.SetGCPercent(50)
}

// Run with:
// GOMEMLIMIT=256MiB ./realtime-processor

When NOT to Use GOGC=off (SetGCPercent(-1))

import (
	"runtime/debug"
)

// ANTI-PATTERN: Disabling GC
func disableGCBad() {
	debug.SetGCPercent(-1) // Disable automatic GC
	// This is almost never correct
	// You must manually call runtime.GC() periodically
	// Risk: Memory grows unbounded between manual GC calls
	// Use case: Custom GC scheduling in extreme scenarios
	// Better: Use GOMEMLIMIT instead
}

// When GOGC=off might be appropriate:
// 1. Custom GC scheduler (very rare)
// 2. Predictable batch boundaries (process batch → GC → next batch)
// 3. Extreme low-latency requirement with careful heap management
//
// Better alternatives:
// - GOMEMLIMIT (handles most cases)
// - Reduce allocation rate (most effective)
// - Increase GOGC value (simpler than disabling)

Go GC Improvements Over Versions

Go 1.12: Concurrent sweep improvements
Go 1.13: Scavenging improvements, GC CPU fraction tracking
Go 1.14: Smaller Mark Termination pauses
Go 1.15: Write barrier improvements
Go 1.16: Pacer improvements
Go 1.17: Faster write barriers
Go 1.18: Better pacer tuning
Go 1.19: GOMEMLIMIT support (major improvement for containers)
Go 1.20: GOMEMLIMIT hardening
Go 1.21: GC latency improvements

Summary

Go's garbage collector is highly optimized for low latency, with typical pause times under 1ms even for large heaps. Control GC behavior with GOGC (percentage-based) or GOMEMLIMIT (absolute limit), preferring GOMEMLIMIT in containerized environments. The default GOGC=100 means "collect when the heap doubles over the live set"; lower values (e.g. 50) collect more often, trading GC CPU for a smaller peak heap, while higher values (e.g. 200) collect less often, trading memory for throughput. For containers, set GOMEMLIMIT to 85-90% of the container memory limit to prevent OOM kills; Go does not detect cgroup limits on its own. Measure actual GC impact with runtime.MemStats and GODEBUG=gctrace=1. The most impactful optimization is reducing the allocation rate through preallocation and object pooling, which can cut GC frequency by 10-100x. Monitor pause times and GC CPU overhead in production; alert when p99 pauses exceed 10ms or the GC consumes more than 25% of CPU. Use go test -benchmem and profiling to identify allocation hotspots. For low-latency services (sub-10ms targets), combine GOGC=50 with GOMEMLIMIT and aggressive allocation reduction; for throughput services, use GOGC=200 or higher with a larger GOMEMLIMIT. Always measure actual GC behavior in production: tuning is workload-specific, and generic settings rarely match production characteristics.
