Go Performance Guide
Concurrency Patterns

Atomic Operations

Master atomic operations for lock-free programming, memory ordering, and high-performance concurrent counters in Go.

Atomic operations provide lock-free synchronization for simple shared variables. The sync/atomic package offers CPU-level instructions that guarantee atomicity without mutexes, enabling faster concurrent access for basic data types.

The sync/atomic Package

The atomic package provides functions for atomic access to basic types:

import "sync/atomic"

// Function-based API (all Go versions)
func AddInt64(addr *int64, delta int64) (new int64)
func LoadInt64(addr *int64) (val int64)
func StoreInt64(addr *int64, val int64)
func SwapInt64(addr *int64, new int64) (old int64)
func CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)

// Typed atomic values (Go 1.19+)
type Int64 struct { /* unexported */ }
func (x *Int64) Add(delta int64) int64
func (x *Int64) Load() int64
func (x *Int64) Store(val int64)
func (x *Int64) Swap(new int64) int64
func (x *Int64) CompareAndSwap(old, new int64) bool
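These primitives can be exercised in a short, self-contained sketch; the helper names casDemo and typedDemo are illustrative, not part of the package:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casDemo bumps a counter with the function-style API and returns
// the final value plus whether the compare-and-swap succeeded.
func casDemo() (int64, bool) {
	var a int64
	atomic.AddInt64(&a, 5)
	swapped := atomic.CompareAndSwapInt64(&a, 5, 10) // succeeds: a was 5
	return atomic.LoadInt64(&a), swapped
}

// typedDemo does the same with the Go 1.19+ typed API.
func typedDemo() (old, now int64) {
	var b atomic.Int64
	b.Store(3)
	old = b.Swap(7) // Swap returns the previous value
	return old, b.Load()
}

func main() {
	fmt.Println(casDemo())   // 10 true
	fmt.Println(typedDemo()) // 3 7
}
```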

Atomic vs Mutex for Simple Counters

For simple counters, atomics outperform mutexes by roughly an order of magnitude:

package benchmark

import (
	"sync"
	"sync/atomic"
	"testing"
)

// Mutex-protected counter
type MutexCounter struct {
	mu    sync.Mutex
	value int64
}

func (c *MutexCounter) Increment() {
	c.mu.Lock()
	c.value++
	c.mu.Unlock()
}

func (c *MutexCounter) Get() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

// Atomic counter
type AtomicCounter struct {
	value atomic.Int64
}

func (c *AtomicCounter) Increment() {
	c.value.Add(1)
}

func (c *AtomicCounter) Get() int64 {
	return c.value.Load()
}

// Benchmark comparison
func BenchmarkMutexCounter(b *testing.B) {
	counter := &MutexCounter{}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			counter.Increment()
		}
	})
}

func BenchmarkAtomicCounter(b *testing.B) {
	counter := &AtomicCounter{}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			counter.Increment()
		}
	})
}

// Benchmark results on a 4-core system:
// BenchmarkMutexCounter    2000000    598 ns/op
// BenchmarkAtomicCounter   20000000   56 ns/op
// Atomic is ~10x faster for simple increments

Key Insight: Mutexes add overhead for lock acquisition and release, and under contention they park goroutines and involve the scheduler. Atomics compile down to single CPU instructions (or short retry sequences) that never block.

Go 1.19+ Typed Atomics

Go 1.19 introduced typed atomic values that provide type safety and remove the need to pass raw pointers:

// Old approach (Go 1.18 and earlier)
var count int64
atomic.AddInt64(&count, 1)  // Must use pointers
value := atomic.LoadInt64(&count)

// New approach (Go 1.19+)
var count atomic.Int64
count.Add(1)         // Type-safe, no pointers
value := count.Load()

// Available types:
// atomic.Bool
// atomic.Int32, atomic.Int64
// atomic.Uint32, atomic.Uint64
// atomic.Uintptr
// atomic.Pointer[T]

Benefits of typed atomics:

  1. Type safety: Compiler catches mistakes
  2. Cleaner syntax: No pointer dereferencing
  3. Escape analysis: no explicit address-taking in user code, which can help values stay on the stack
  4. Documentation: Clear intent in code

A typical use is a struct of independent counters:

type RequestCounter struct {
	total   atomic.Int64
	success atomic.Int64
	errors  atomic.Int64
}

func (rc *RequestCounter) RecordSuccess() {
	rc.total.Add(1)
	rc.success.Add(1)
}

func (rc *RequestCounter) RecordError() {
	rc.total.Add(1)
	rc.errors.Add(1)
}

func (rc *RequestCounter) Stats() (total, success, errors int64) {
	return rc.total.Load(), rc.success.Load(), rc.errors.Load()
}

Memory Ordering

Atomics are not just about preventing race conditions. They also provide memory ordering guarantees that ensure visibility of changes across CPU cores:

Sequential Consistency (Full Barrier)

Most atomic operations in Go provide sequential consistency by default:

var flag atomic.Bool
var data int

// Goroutine A: publish data, then raise the flag
data = 42
flag.Store(true)  // release: makes the write to data visible

// Goroutine B: observe the flag, then read data
if flag.Load() {  // acquire: sees writes that preceded the Store
	println(data)  // Always prints 42, never 0
}

Sequential consistency means:

  • All threads see the same order of atomic operations
  • Atomics act as implicit memory barriers
  • More conservative (slower) than other memory models
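
The flag/data example above can be made runnable. In this sketch (publish and consume are illustrative names), the reader spins on the flag, so once it observes true it is guaranteed to see the preceding write to data:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var (
	flag atomic.Bool
	data int
)

// publish writes data with a plain store, then raises the flag.
// The atomic Store orders the data write before it.
func publish(v int) {
	data = v
	flag.Store(true)
}

// consume spins until the flag is raised, then reads data. The
// atomic Load guarantees it sees the write that preceded the Store.
func consume() int {
	for !flag.Load() {
	}
	return data
}

func main() {
	done := make(chan int)
	go func() { done <- consume() }()
	publish(42)
	fmt.Println(<-done) // 42
}
```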

x86-64 vs ARM

Different CPU architectures have different memory models:

// x86-64 (TSO): ordinary stores already carry release semantics
// ARM and other weakly ordered CPUs: stores need explicit barrier
// instructions; Go's compiler and runtime emit whatever the target requires

flag.Store(true)
// Compiles to the appropriate instruction(s) for the target CPU

Important: Go abstracts CPU differences. You don't need to think about x86 vs ARM memory models; the runtime ensures correct behavior.

Compare-And-Swap (CAS) Patterns

CAS is a foundation for lock-free algorithms. It atomically compares a value and conditionally updates it:

type Flags struct {
	value atomic.Uint32
}

// Sets the bit and reports whether it was previously clear
func (f *Flags) SetBitIfClear(bit uint32) bool {
	for {
		old := f.value.Load()
		new := old | bit
		if f.value.CompareAndSwap(old, new) {
			return old&bit == 0  // Was bit clear before?
		}
		// Retry if concurrent modification detected
	}
}

// Lock-free stack using CAS
type Node struct {
	Value int
	Next  *Node
}

type Stack struct {
	head atomic.Pointer[Node]
}

func (s *Stack) Push(value int) {
	newHead := &Node{Value: value}
	for {
		old := s.head.Load()
		newHead.Next = old
		if s.head.CompareAndSwap(old, newHead) {
			return
		}
	}
}

func (s *Stack) Pop() (int, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false
		}
		if s.head.CompareAndSwap(old, old.Next) {
			return old.Value, true
		}
	}
}

CAS loops are the building block for lock-free data structures. (Go's garbage collector also sidesteps the classic ABA problem for pointer CAS: a node's memory cannot be reused while any goroutine still references it.) atomic.Pointer[T] (Go 1.19+) makes the pointer handling type-safe:

// atomic.Pointer provides type-safe CAS
var head atomic.Pointer[Node]
head.CompareAndSwap(old, new)  // Type-safe

Lock-Free Data Structures Basics

Lock-free structures use atomics and CAS to avoid mutexes. The fundamental pattern is optimistic updates with CAS retry:

// Simple lock-free counter with maximum value
type BoundedCounter struct {
	value atomic.Int64
	max   int64
}

func NewBoundedCounter(max int64) *BoundedCounter {
	return &BoundedCounter{max: max}
}

func (bc *BoundedCounter) TryIncrement() bool {
	for {
		current := bc.value.Load()
		if current >= bc.max {
			return false  // Would exceed max
		}
		if bc.value.CompareAndSwap(current, current+1) {
			return true   // Successfully incremented
		}
		// Retry on conflict
	}
}
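
A quick usage sketch (repeating the BoundedCounter definition so the program compiles standalone): with several goroutines racing on TryIncrement, the counter never exceeds its bound.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// BoundedCounter as defined above.
type BoundedCounter struct {
	value atomic.Int64
	max   int64
}

func NewBoundedCounter(max int64) *BoundedCounter {
	return &BoundedCounter{max: max}
}

func (bc *BoundedCounter) TryIncrement() bool {
	for {
		current := bc.value.Load()
		if current >= bc.max {
			return false // would exceed max
		}
		if bc.value.CompareAndSwap(current, current+1) {
			return true // successfully incremented
		}
		// retry on conflict
	}
}

func main() {
	bc := NewBoundedCounter(100)
	var wg sync.WaitGroup
	var accepted atomic.Int64
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 50; j++ { // 400 attempts; only 100 can win
				if bc.TryIncrement() {
					accepted.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println(bc.value.Load(), accepted.Load()) // 100 100
}
```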

// Lock-free queue (simplified)
type QueueNode struct {
	Value int
	Next  atomic.Pointer[QueueNode]
}

type LockFreeQueue struct {
	head atomic.Pointer[QueueNode]
	tail atomic.Pointer[QueueNode]
}

func NewLockFreeQueue() *LockFreeQueue {
	sentinel := &QueueNode{}
	q := &LockFreeQueue{}
	q.head.Store(sentinel)
	q.tail.Store(sentinel)
	return q
}

func (q *LockFreeQueue) Enqueue(value int) {
	newNode := &QueueNode{Value: value}
	for {
		tail := q.tail.Load()
		next := tail.Next.Load()

		// Check if tail is still valid
		if tail != q.tail.Load() {
			continue
		}

		if next == nil {
			// Try to append to tail
			if tail.Next.CompareAndSwap(nil, newNode) {
				q.tail.CompareAndSwap(tail, newNode)
				return
			}
		} else {
			// Help advance tail
			q.tail.CompareAndSwap(tail, next)
		}
	}
}
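
The matching Dequeue half of the Michael-Scott algorithm can be sketched as follows (types and Enqueue are repeated so the program compiles on its own). The head node acts as a sentinel, so the first real value lives in head.Next:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type QueueNode struct {
	Value int
	Next  atomic.Pointer[QueueNode]
}

type LockFreeQueue struct {
	head atomic.Pointer[QueueNode]
	tail atomic.Pointer[QueueNode]
}

func NewLockFreeQueue() *LockFreeQueue {
	sentinel := &QueueNode{}
	q := &LockFreeQueue{}
	q.head.Store(sentinel)
	q.tail.Store(sentinel)
	return q
}

func (q *LockFreeQueue) Enqueue(value int) {
	newNode := &QueueNode{Value: value}
	for {
		tail := q.tail.Load()
		next := tail.Next.Load()
		if tail != q.tail.Load() {
			continue // tail moved underneath us; retry
		}
		if next == nil {
			if tail.Next.CompareAndSwap(nil, newNode) {
				q.tail.CompareAndSwap(tail, newNode)
				return
			}
		} else {
			q.tail.CompareAndSwap(tail, next) // help advance tail
		}
	}
}

// Dequeue removes from the front; the value returned comes from
// head.Next, which then becomes the new sentinel.
func (q *LockFreeQueue) Dequeue() (int, bool) {
	for {
		head := q.head.Load()
		tail := q.tail.Load()
		next := head.Next.Load()
		if head != q.head.Load() {
			continue // head moved; retry
		}
		if head == tail {
			if next == nil {
				return 0, false // queue is empty
			}
			q.tail.CompareAndSwap(tail, next) // tail lagging; help
			continue
		}
		value := next.Value
		if q.head.CompareAndSwap(head, next) {
			return value, true
		}
	}
}

func main() {
	q := NewLockFreeQueue()
	q.Enqueue(1)
	q.Enqueue(2)
	v, ok := q.Dequeue()
	fmt.Println(v, ok) // 1 true
}
```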

When to Use Atomics vs Mutexes

Choose atomics when:

  • Simple types: Single int64, bool, pointer
  • High frequency: Many reads/writes per second
  • Lock contention expected: Multiple goroutines accessing frequently
  • Predictable latency required: Atomics don't have lock acquisition delays

Choose mutexes when:

  • Complex state: Multiple related fields
  • Infrequent access: Low contention scenarios
  • Readability matters: Complex lock-free logic is hard to understand
  • Read-heavy workloads: sync.RWMutex lets many readers proceed in parallel

// Use atomics
type HTTPStats struct {
	requests atomic.Int64
	errors   atomic.Int64
}

// Use mutex
type UserRecord struct {
	mu       sync.RWMutex
	name     string
	email    string
	age      int
	verified bool
	// Multiple related fields that must change together
}

Benchmark: Atomic vs Mutex vs Channel

package benchmark

import (
	"sync"
	"sync/atomic"
	"testing"
)

func BenchmarkAtomicCounter(b *testing.B) {
	var counter atomic.Int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			counter.Add(1)
		}
	})
}

func BenchmarkMutexCounter(b *testing.B) {
	var mu sync.Mutex
	var counter int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			counter++
			mu.Unlock()
		}
	})
}

func BenchmarkRWMutexCounter(b *testing.B) {
	var mu sync.RWMutex
	var counter int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			counter++
			mu.Unlock()
		}
	})
}

func BenchmarkChannelCounter(b *testing.B) {
	ch := make(chan int, 1)
	var counter int64
	go func() {
		for range ch {
			counter++
		}
	}()

	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ch <- 1
		}
	})
	close(ch)
}

// Benchmark results on 4-core system:
// BenchmarkAtomicCounter      20000000    57 ns/op
// BenchmarkMutexCounter        2000000   523 ns/op
// BenchmarkRWMutexCounter      2000000   605 ns/op
// BenchmarkChannelCounter      1000000  1234 ns/op

Rankings:

  1. Atomic: Fastest, no contention, CPU-level instruction
  2. Mutex: 9x slower, lock acquisition overhead
  3. RWMutex: Similar to mutex when writing
  4. Channel: ~20x slower, scheduling overhead

Real-World Example: Metrics Collector

type Metrics struct {
	httpRequests     atomic.Int64
	dbQueries        atomic.Int64
	cacheHits        atomic.Int64
	cacheMisses      atomic.Int64
	averageLatencyMs atomic.Int64
}

func (m *Metrics) RecordHTTPRequest() {
	m.httpRequests.Add(1)
}

func (m *Metrics) RecordDBQuery() {
	m.dbQueries.Add(1)
}

func (m *Metrics) RecordCacheHit() {
	m.cacheHits.Add(1)
}

func (m *Metrics) RecordCacheMiss() {
	m.cacheMisses.Add(1)
}

func (m *Metrics) RecordLatency(latencyMs int64) {
	// Simple averaging (not statistically accurate, just example)
	for {
		current := m.averageLatencyMs.Load()
		next := (current + latencyMs) / 2
		if m.averageLatencyMs.CompareAndSwap(current, next) {
			return
		}
	}
}

func (m *Metrics) CacheHitRate() float64 {
	hits := float64(m.cacheHits.Load())
	misses := float64(m.cacheMisses.Load())
	total := hits + misses
	if total == 0 {
		return 0
	}
	return hits / total
}

// Usage in an HTTP handler (requires "net/http" and "time")
func HandleRequest(m *Metrics) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		m.RecordHTTPRequest()
		start := time.Now()

		// Handle request...

		latency := time.Since(start).Milliseconds()
		m.RecordLatency(latency)
	}
}
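
The halving update in RecordLatency drifts toward recent samples. For an exact mean, one common alternative, sketched here with a hypothetical LatencyTracker type, is to keep an atomic sum and count and divide at read time:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// LatencyTracker keeps an exact running mean using two atomic
// counters instead of the lossy halving update.
type LatencyTracker struct {
	totalMs atomic.Int64
	count   atomic.Int64
}

func (lt *LatencyTracker) Record(latencyMs int64) {
	lt.totalMs.Add(latencyMs)
	lt.count.Add(1)
}

func (lt *LatencyTracker) AverageMs() float64 {
	n := lt.count.Load()
	if n == 0 {
		return 0
	}
	// Note: the two loads are not a single atomic snapshot, so the
	// result can be slightly stale under concurrent writes.
	return float64(lt.totalMs.Load()) / float64(n)
}

func main() {
	var lt LatencyTracker
	lt.Record(10)
	lt.Record(20)
	fmt.Println(lt.AverageMs()) // 15
}
```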

Best Practices

  1. Prefer typed atomics (Go 1.19+): Better safety and readability
  2. Use for simple values: Atomics shine with int64, bool, pointers
  3. Avoid in tight loops: Even atomic operations have cost
  4. Document shared state: Make it clear which fields are atomic
  5. Consider RWMutex for read-heavy: If mostly reads, RWMutex can be faster
  6. Profile first: Don't optimize prematurely; measure contention
  7. Keep logic simple: Avoid complex CAS loops unless necessary
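
As an illustration of practice 2, here is a hypothetical shutdown-flag sketch: a single atomic.Bool is enough to signal many workers without a mutex.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Server is a hypothetical example: one atomic.Bool signals
// shutdown to every worker goroutine.
type Server struct {
	shuttingDown atomic.Bool
}

func (s *Server) Shutdown() { s.shuttingDown.Store(true) }

// Accept reports whether workers should keep taking work.
func (s *Server) Accept() bool {
	return !s.shuttingDown.Load() // lock-free check on each iteration
}

func main() {
	var s Server
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s.Accept() {
				// ... handle one unit of work ...
			}
		}()
	}
	s.Shutdown()
	wg.Wait()
	fmt.Println(s.Accept()) // false
}
```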

Atomic operations are the foundation of high-performance concurrent Go code. Understanding their mechanics and trade-offs enables building systems that handle massive concurrency efficiently.
