Go Performance Guide
Memory Management

Value Copy Costs

Understanding Go's value model: direct parts, indirect parts, type sizes, and the performance impact of value copying in various scenarios.

Introduction

Go's value model requires developers to understand the distinction between direct and indirect value parts. When you copy a value in Go, only the direct part is copied. For types like slices and maps, this means the underlying data remains shared; for large arrays and structs, the whole value is copied, which can be expensive. This article explores how to identify copy costs and reduce them.

Direct Parts vs Indirect Parts

Every Go type falls into one of two categories:

Direct-only types: The entire value is stored inline. Copying copies everything.

  • Primitive types: bool, int, uint, float64, etc.
  • Arrays: [N]T — the entire array is the direct part
  • Structs: all fields are stored inline

Header + Backing types: The direct part is just a header pointing to backing data stored elsewhere (usually the heap).

  • string: pointer (8 bytes) + length (8 bytes) = 16 bytes total
  • []T slice: pointer (8 bytes) + length (8 bytes) + capacity (8 bytes) = 24 bytes total
  • map[K]V: just a pointer (8 bytes) on 64-bit systems
  • interface{}: type pointer (8 bytes) + data pointer (8 bytes) = 16 bytes total
  • chan T: just a pointer (8 bytes)
  • Function values: just a pointer (8 bytes)

When you copy a slice, you copy only the 24-byte header. The underlying array remains in the same location. When you copy an array, you copy all the bytes.

Type Sizes Reference Table

Understanding the size of types helps predict copy costs:

Type            Size (64-bit)   Category      Notes
bool            1 byte          Direct-only
int8 / uint8    1 byte          Direct-only
int16 / uint16  2 bytes         Direct-only
int32 / uint32  4 bytes         Direct-only
int64 / uint64  8 bytes         Direct-only
int / uint      8 bytes         Direct-only   On 64-bit systems
float32         4 bytes         Direct-only
float64         8 bytes         Direct-only
uintptr         8 bytes         Direct-only
*T              8 bytes         Direct-only   Any pointer type
string          16 bytes        Header only   Pointer (8) + length (8)
[]T slice       24 bytes        Header only   Pointer (8) + length (8) + cap (8)
interface{}     16 bytes        Header only   Type pointer (8) + data pointer (8)
map[K]V         8 bytes         Header only   Just a pointer
chan T          8 bytes         Header only   Just a pointer

The Cost of Copying Large Values

Copying is cheap for small types and expensive for large ones. Let's benchmark:

// copy_test.go — run with: go test -bench=. -benchmem
package main

import "testing"

// Small struct: 4 fields (32 bytes on 64-bit)
type SmallStruct struct {
	A int64
	B int64
	C int64
	D int64
}

// Large struct: 5 fields (40 bytes on 64-bit)
type LargeStruct struct {
	A int64
	B int64
	C int64
	D int64
	E int64
}

// Medium array: 100 integers (800 bytes)
type MediumArray [100]int64

// Large array: 10,000 integers (80,000 bytes)
type LargeArray [10000]int64

// Package-level sinks keep the compiler from eliminating the copies.
// A bare `_ = s` is a no-op and would be optimized away.
var (
	sinkSmall SmallStruct
	sinkLarge LargeStruct
	sinkMed   MediumArray
	sinkBig   LargeArray
	sinkSlice []int64
)

func BenchmarkSmallStructCopy(b *testing.B) {
	s := SmallStruct{1, 2, 3, 4}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkSmall = s // real 32-byte copy
	}
}

func BenchmarkLargeStructCopy(b *testing.B) {
	s := LargeStruct{1, 2, 3, 4, 5}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkLarge = s // real 40-byte copy
	}
}

func BenchmarkMediumArrayCopy(b *testing.B) {
	arr := MediumArray{}
	for i := range arr {
		arr[i] = int64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkMed = arr // copies 800 bytes
	}
}

func BenchmarkLargeArrayCopy(b *testing.B) {
	arr := LargeArray{}
	for i := range arr {
		arr[i] = int64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkBig = arr // copies 80,000 bytes
	}
}

func BenchmarkSliceHeaderCopy(b *testing.B) {
	s := make([]int64, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkSlice = s // copies only the 24-byte header
	}
}

Sample Results (on a typical modern CPU):

  • SmallStructCopy: ~0.5 ns/op
  • LargeStructCopy: ~1.0 ns/op (5 fields vs 4 fields causes a measurable difference)
  • MediumArrayCopy: ~20 ns/op (800 bytes)
  • LargeArrayCopy: ~2000 ns/op (80,000 bytes)
  • SliceHeaderCopy: ~0.2 ns/op (only 24 bytes copied)

Notice the dramatic difference: a slice copy is orders of magnitude faster than a large array copy, even though the slice contains the same data, because only the header is copied.

The Compiler's Optimization Threshold

The Go compiler optimizes copies of small structs and arrays: values that fit in a few machine words (roughly four or fewer word-sized fields) can often be kept in registers rather than copied through memory. Crossing this threshold can trigger a performance cliff:

type Person struct {
	ID        int64
	Name      *string
	Email     *string
	CreatedAt int64
}
// 4 pointers/ints — compiler may optimize heavily

type PersonPlus struct {
	ID        int64
	Name      *string
	Email     *string
	CreatedAt int64
	Active    bool
}
// 5 fields — may spill to memory operations

Common Value Copy Scenarios

Scenario 1: Function Parameters (Pass by Value)

Every time you call a function with a value parameter, Go copies the argument:

func ProcessData(data [1000]int) {
	// Copies 8000 bytes into the function!
}

// Better for large types:
func ProcessDataPtr(data *[1000]int) {
	// Only 8 bytes copied (the pointer)
}

Scenario 2: Range Loop with Value Copy

One of the most common performance pitfalls:

type Event struct {
	ID        int64
	Timestamp int64
	Message   string
	Data      [256]byte
}

// Expensive: copies Event (including [256]byte) for each iteration
for _, event := range events {
	process(event) // Event copied here!
}

// Better: use index-only range
for i := range events {
	process(&events[i]) // Only pointer copied
}

// Or: range over pointers
for _, event := range ptrEvents {
	process(event) // Only pointer copied
}

Let's benchmark this:

type LargeEvent struct {
	ID    int64
	Value int64
	Blob  [512]byte
}

var sinkSum int64 // sink so the compiler cannot eliminate the loop body

func BenchmarkRangeWithValue(b *testing.B) {
	events := make([]LargeEvent, 1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events { // copies a 528-byte LargeEvent each iteration
			sum += e.ID + e.Value
		}
		sinkSum = sum
	}
}

func BenchmarkRangeWithIndex(b *testing.B) {
	events := make([]LargeEvent, 1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for idx := range events { // indexing copies nothing
			sum += events[idx].ID + events[idx].Value
		}
		sinkSum = sum
	}
}

func BenchmarkRangePointers(b *testing.B) {
	events := make([]*LargeEvent, 1000)
	for i := range events {
		events[i] = &LargeEvent{}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events { // copies only the 8-byte pointer
			sum += e.ID + e.Value
		}
		sinkSum = sum
	}
}

Results: RangeWithValue is ~4x slower than RangeWithIndex due to the constant copying of the 512-byte blob.

Scenario 3: Channel Send and Receive

Channels copy values:

type Packet struct {
	Header  [64]byte
	Payload [4096]byte
}

// Expensive: sends copy of entire Packet
ch := make(chan Packet)
ch <- packet

// Better: send pointer
chPtr := make(chan *Packet)
chPtr <- &packet

Scenario 4: Map Insertion

Maps copy their value types:

type Config struct {
	Setting1 string
	Setting2 int
	Setting3 [1000]byte
}

configs := make(map[string]Config)
configs["key"] = cfg // Copies entire Config including [1000]byte

// Better approach: store pointers
configPtrs := make(map[string]*Config)
configPtrs["key"] = &cfg

Scenario 5: interface{} Boxing

Assigning a value to an interface{} boxes it: the value is copied into the interface, and for anything larger than a single pointer that usually means a heap allocation:

var data [1000]int
var i interface{} = data // Array gets copied during boxing

var ptr *[1000]int = &data
i = ptr // Only pointer (8 bytes) boxed

Value vs Pointer: Decision Tree

Use value parameters when:

  • The type is ≤ 4 machine words (32 bytes on 64-bit)
  • The function doesn't modify the value
  • You need the safety of value semantics (copy-on-assignment)

Use pointer parameters when:

  • The type is > 4 machine words
  • The function modifies the value and you want those changes visible to the caller
  • The value is expensive to copy

func ProcessSmall(s SmallStruct) {
	// Value copy is cheap, semantics are clear
}

func ProcessLarge(l *LargeStruct) {
	// Pointer is much faster
}

func Modify(s *SmallStruct, newValue int64) {
	// We want to modify the original, so a pointer is correct
	s.A = newValue
}

Optimization Patterns

Pattern 1: Use Pointers in Hot Paths

// Hot path that processes millions of events
type Event struct {
	ID    int64
	Type  int
	Value int64
	Data  [128]byte
}

// Slower: copies Event each iteration
for _, e := range events {
	total += processEvent(e)
}

// Faster: iterate a slice of pointers so only 8 bytes move per element
var sinkTotal int64 // sink so the compiler cannot eliminate the loop body

func BenchmarkEventProcessing(b *testing.B) {
	events := make([]*Event, 10000)
	for i := range events {
		events[i] = &Event{ID: int64(i)}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events {
			sum += e.ID
		}
		sinkTotal = sum
	}
}

Pattern 2: Pre-allocate Large Values

// Don't create large values in loops
for i := 0; i < 1000; i++ {
	data := [1000]int{} // Allocated each iteration
	process(data)
}

// Better: allocate once outside the loop and reuse
data := [1000]int{}
for i := 0; i < 1000; i++ {
	clear(data[:]) // zero stale contents if needed (clear builtin, Go 1.21+)
	process(data)  // still copies the array at the call; pass &data to avoid that too
}

Pattern 3: Return Pointers from Factory Functions

// If constructing a large value, return a pointer
func NewConfig() *Config {
	return &Config{...}
}

// Not:
func NewConfig() Config {
	return Config{...}
}

Memory Layout and Alignment

The compiler aligns struct fields to optimize memory access. This can affect actual size:

type Aligned struct {
	A bool    // 1 byte, then 7 bytes padding
	B int64   // 8 bytes (needs 8-byte alignment)
	C int32   // 4 bytes, then 4 bytes padding
	D int64   // 8 bytes
}
// Total: 32 bytes (not 21)

type Better struct {
	A int64
	B int64
	C int32
	D bool
}
// Total: 24 bytes (fields sorted by size)

Optimizing alignment reduces struct size, which reduces copy costs.

Summary and Recommendations

  1. Understand your type sizes: Small values (32 bytes or less) are cheap to copy. Larger types should use pointers.

  2. Watch the range loop pitfall: Avoid copying large values in for _, v := range slice loops. Use index-only range or pointer slices instead.

  3. Profile before optimizing: Use go test -bench and memory profilers to identify actual copy costs in your code.

  4. Use pointers in hot paths: If a code path executes millions of times, even small optimizations matter.

  5. Consider the compiler threshold: Structs with 4 fields often compile more efficiently than those with 5+.

  6. Document your choice: When you use pointers for large values, make it clear in comments why you're not using value semantics.

The most impactful optimization is often as simple as changing for _, e := range events to for i := range events when each element is large. This small change can improve loop performance by 50-80% in some cases.
