Atomic Operations
Master atomic operations for lock-free programming, memory ordering, and high-performance concurrent counters in Go.
Atomic operations provide lock-free synchronization for simple shared variables. The sync/atomic package offers CPU-level instructions that guarantee atomicity without mutexes, enabling faster concurrent access for basic data types.
The sync/atomic Package
The atomic package provides functions for atomic access to basic types:
import "sync/atomic"
// Original int64 interface (all Go versions)
func AddInt64(addr *int64, delta int64) (new int64)
func LoadInt64(addr *int64) (val int64)
func StoreInt64(addr *int64, val int64)
func SwapInt64(addr *int64, new int64) (old int64)
func CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)
// Typed atomics (Go 1.19+)
type Int64 struct { /* unexported */ }
func (x *Int64) Add(delta int64) int64
func (x *Int64) Load() int64
func (x *Int64) Store(val int64)
func (x *Int64) Swap(new int64) int64
func (x *Int64) CompareAndSwap(old, new int64) (swapped bool)

Atomic vs Mutex for Simple Counters
For simple counters, atomics outperform mutexes by roughly an order of magnitude:
package benchmark
import (
"sync"
"sync/atomic"
"testing"
)
// Mutex-protected counter
type MutexCounter struct {
mu sync.Mutex
value int64
}
func (c *MutexCounter) Increment() {
c.mu.Lock()
c.value++
c.mu.Unlock()
}
func (c *MutexCounter) Get() int64 {
c.mu.Lock()
defer c.mu.Unlock()
return c.value
}
// Atomic counter
type AtomicCounter struct {
value atomic.Int64
}
func (c *AtomicCounter) Increment() {
c.value.Add(1)
}
func (c *AtomicCounter) Get() int64 {
return c.value.Load()
}
// Benchmark comparison
func BenchmarkMutexCounter(b *testing.B) {
counter := &MutexCounter{}
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
counter.Increment()
}
})
}
func BenchmarkAtomicCounter(b *testing.B) {
counter := &AtomicCounter{}
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
counter.Increment()
}
})
}
// Benchmark results on a 4-core system:
// BenchmarkMutexCounter 2000000 598 ns/op
// BenchmarkAtomicCounter 20000000 56 ns/op
// Atomic is ~10x faster for simple incrementsKey Insight: Mutexes have overhead from acquiring/releasing locks and context switching. Atomics use CPU-level instructions that don't require kernel calls.
Go 1.19+ Typed Atomics
Go 1.19 introduced generic atomic types that provide type safety and eliminate pointer arithmetic:
// Old approach (Go 1.18 and earlier)
var count int64
atomic.AddInt64(&count, 1) // Must use pointers
value := atomic.LoadInt64(&count)
// New approach (Go 1.19+)
var count atomic.Int64
count.Add(1) // Type-safe, no pointers
value := count.Load()
// Available types:
// atomic.Bool
// atomic.Int32, atomic.Int64
// atomic.Uint32, atomic.Uint64
// atomic.Uintptr
// atomic.Pointer[T]

Benefits of typed atomics:
- Type safety: Compiler catches mistakes
- Cleaner syntax: No pointer dereferencing
- Escape analysis: No address-taking with &, so values can more often stay on the stack
- Documentation: Clear intent in code
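One of the listed types, atomic.Bool, pairs naturally with CompareAndSwap for one-shot transitions. A minimal sketch (the shutdown name is illustrative, not a standard API):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// closed guards a one-time transition, e.g. shutting down a service.
var closed atomic.Bool

// shutdown returns true only for the first caller; every later call
// sees the flag already set and returns false.
func shutdown() bool {
	return closed.CompareAndSwap(false, true)
}

func main() {
	fmt.Println(shutdown()) // true: first caller wins
	fmt.Println(shutdown()) // false: already shut down
}
```

Because the check and the update happen in one atomic step, two goroutines calling shutdown concurrently can never both observe "first caller".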
For example, a request counter built entirely from typed atomics:
type RequestCounter struct {
total atomic.Int64
success atomic.Int64
errors atomic.Int64
}
func (rc *RequestCounter) RecordSuccess() {
rc.total.Add(1)
rc.success.Add(1)
}
func (rc *RequestCounter) RecordError() {
rc.total.Add(1)
rc.errors.Add(1)
}
func (rc *RequestCounter) Stats() (total, success, errors int64) {
return rc.total.Load(), rc.success.Load(), rc.errors.Load()
}

Memory Ordering
Atomics are not just about preventing race conditions. They also provide memory ordering guarantees that ensure visibility of changes across CPU cores:
Sequential Consistency (Full Barrier)
Most atomic operations in Go provide sequential consistency by default:
var flag atomic.Bool
var data int
// Goroutine A: publish data, then set the flag (release semantics)
data = 42
flag.Store(true) // full memory barrier
// Goroutine B: observe the flag, then read data (acquire semantics)
if flag.Load() { // full memory barrier
println(data) // Always prints 42, never 0
}

Sequential consistency means:
- All threads see the same order of atomic operations
- Atomics act as implicit memory barriers
- More conservative (and on some CPUs slower) than the relaxed orderings exposed by C++ or Rust atomics
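The publish/observe guarantee above can be demonstrated in runnable form. This sketch wraps the two goroutines in a helper (publishAndRead is a name chosen here for illustration):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// publishAndRead has a writer publish data behind an atomic flag and a
// reader spin on the flag; the Store/Load pair orders the accesses.
func publishAndRead() int {
	var flag atomic.Bool
	var data int
	result := make(chan int)

	go func() {
		// Spin until the writer raises the flag; once Load observes
		// true, the earlier write to data is guaranteed visible.
		for !flag.Load() {
		}
		result <- data
	}()

	data = 42
	flag.Store(true) // release: publishes data before the flag
	return <-result
}

func main() {
	fmt.Println(publishAndRead()) // 42
}
```

Without the atomic flag this would be a data race; with it, the reader can never observe the zero value of data after seeing the flag set.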
x86-64 vs ARM
Different CPU architectures have different memory models:
// x86-64: strong memory model (TSO); a plain store already has release semantics
// ARM, MIPS, RISC-V: weakly ordered; the compiler emits explicit barrier instructions
flag.Store(true)
// Go translates this into whatever the target CPU needs for sequential consistency

Important: Go abstracts CPU differences. You don't need to think about x86 vs ARM memory models; the compiler and runtime ensure correct behavior.
Compare-And-Swap (CAS) Patterns
CAS is a foundation for lock-free algorithms. It atomically compares a value and conditionally updates it:
type Flags struct {
value atomic.Uint32
}
// CAS returns true if the swap succeeded
func (f *Flags) SetBitIfClear(bit uint32) bool {
for {
old := f.value.Load()
new := old | bit
if f.value.CompareAndSwap(old, new) {
return old&bit == 0 // Was bit clear before?
}
// Retry if concurrent modification detected
}
}
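To see SetBitIfClear in action, here is a self-contained sketch that repeats the type from above and exercises the same bit twice:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Flags stores a bit set in a single atomic word.
type Flags struct {
	value atomic.Uint32
}

// SetBitIfClear sets bit and reports whether it was clear beforehand,
// retrying the CAS until no concurrent writer interferes.
func (f *Flags) SetBitIfClear(bit uint32) bool {
	for {
		old := f.value.Load()
		if f.value.CompareAndSwap(old, old|bit) {
			return old&bit == 0
		}
	}
}

func main() {
	var f Flags
	const ready = 1 << 0
	fmt.Println(f.SetBitIfClear(ready)) // true: bit was clear
	fmt.Println(f.SetBitIfClear(ready)) // false: bit already set
}
```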
// Lock-free stack using CAS
type Node struct {
Value int
Next *Node
}
type Stack struct {
head atomic.Pointer[Node]
}
func (s *Stack) Push(value int) {
newHead := &Node{Value: value}
for {
old := s.head.Load()
newHead.Next = old
if s.head.CompareAndSwap(old, newHead) {
return
}
}
}
func (s *Stack) Pop() (int, bool) {
for {
old := s.head.Load()
if old == nil {
return 0, false
}
if s.head.CompareAndSwap(old, old.Next) {
return old.Value, true
}
}
}

CAS loops are the building block for lock-free data structures. Note that the stack above already relies on Go 1.19's atomic.Pointer[T], which makes the CAS type-safe:
// atomic.Pointer provides type-safe CAS
var head atomic.Pointer[Node]
head.CompareAndSwap(old, new) // Type-safe

Lock-Free Data Structures Basics
Lock-free structures use atomics and CAS to avoid mutexes. The fundamental pattern is optimistic updates with CAS retry:
// Simple lock-free counter with maximum value
type BoundedCounter struct {
value atomic.Int64
max int64
}
func NewBoundedCounter(max int64) *BoundedCounter {
return &BoundedCounter{max: max}
}
func (bc *BoundedCounter) TryIncrement() bool {
for {
current := bc.value.Load()
if current >= bc.max {
return false // Would exceed max
}
if bc.value.CompareAndSwap(current, current+1) {
return true // Successfully incremented
}
// Retry on conflict
}
}
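A quick way to validate TryIncrement is to hammer it from several goroutines and confirm the counter never passes max. A self-contained sketch, repeating the type for completeness:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// BoundedCounter increments up to max and refuses further increments.
type BoundedCounter struct {
	value atomic.Int64
	max   int64
}

func (bc *BoundedCounter) TryIncrement() bool {
	for {
		current := bc.value.Load()
		if current >= bc.max {
			return false // would exceed max
		}
		if bc.value.CompareAndSwap(current, current+1) {
			return true
		}
		// retry on conflict
	}
}

func main() {
	bc := &BoundedCounter{max: 100}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				bc.TryIncrement() // most attempts fail once max is hit
			}
		}()
	}
	wg.Wait()
	fmt.Println(bc.value.Load()) // 100
}
```

The CAS in the loop is what makes the bound reliable: a plain Add(1) followed by a check could briefly overshoot max under contention.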
// Lock-free queue (simplified Michael-Scott enqueue; Dequeue omitted for brevity)
type QueueNode struct {
Value int
Next atomic.Pointer[QueueNode]
}
type LockFreeQueue struct {
head atomic.Pointer[QueueNode]
tail atomic.Pointer[QueueNode]
}
func NewLockFreeQueue() *LockFreeQueue {
sentinel := &QueueNode{}
q := &LockFreeQueue{}
q.head.Store(sentinel)
q.tail.Store(sentinel)
return q
}
func (q *LockFreeQueue) Enqueue(value int) {
newNode := &QueueNode{Value: value}
for {
tail := q.tail.Load()
next := tail.Next.Load()
// Check if tail is still valid
if tail != q.tail.Load() {
continue
}
if next == nil {
// Try to append to tail
if tail.Next.CompareAndSwap(nil, newNode) {
q.tail.CompareAndSwap(tail, newNode)
return
}
} else {
// Help advance tail
q.tail.CompareAndSwap(tail, next)
}
}
}

When to Use Atomics vs Mutexes
Choose atomics when:
- Simple types: Single int64, bool, pointer
- High frequency: Many reads/writes per second
- Lock contention expected: Multiple goroutines accessing frequently
- Predictable latency required: Atomics don't have lock acquisition delays
Choose mutexes when:
- Complex state: Multiple related fields
- Infrequent access: Low contention scenarios
- Readability matters: Complex lock-free logic is hard to understand
- Read-heavy workloads: RWMutex lets many readers proceed in parallel
// Use atomics
type HTTPStats struct {
requests atomic.Int64
errors atomic.Int64
}
// Use mutex
type UserRecord struct {
mu sync.RWMutex
name string
email string
age int
verified bool
// Multiple related fields that must change together
}

Benchmark: Atomic vs Mutex vs Channel
package benchmark
import (
"sync"
"sync/atomic"
"testing"
)
func BenchmarkAtomicCounter(b *testing.B) {
var counter atomic.Int64
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
counter.Add(1)
}
})
}
func BenchmarkMutexCounter(b *testing.B) {
var mu sync.Mutex
var counter int64
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
mu.Lock()
counter++
mu.Unlock()
}
})
}
func BenchmarkRWMutexCounter(b *testing.B) {
var mu sync.RWMutex
var counter int64
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
mu.Lock()
counter++
mu.Unlock()
}
})
}
func BenchmarkChannelCounter(b *testing.B) {
ch := make(chan int, 1)
var counter int64
go func() {
for range ch {
counter++
}
}()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
ch <- 1
}
})
close(ch)
}
// Benchmark results on 4-core system:
// BenchmarkAtomicCounter 20000000 57 ns/op
// BenchmarkMutexCounter 2000000 523 ns/op
// BenchmarkRWMutexCounter 2000000 605 ns/op
// BenchmarkChannelCounter 1000000 1234 ns/op

Rankings:
- Atomic: fastest; a single CPU instruction, contention handled in hardware
- Mutex: ~9x slower; lock acquisition overhead
- RWMutex: similar to Mutex for write-only workloads
- Channel: ~20x slower; scheduling and communication overhead
Real-World Example: Metrics Collector
type Metrics struct {
httpRequests atomic.Int64
dbQueries atomic.Int64
cacheHits atomic.Int64
cacheMisses atomic.Int64
averageLatencyMs atomic.Int64
}
func (m *Metrics) RecordHTTPRequest() {
m.httpRequests.Add(1)
}
func (m *Metrics) RecordDBQuery() {
m.dbQueries.Add(1)
}
func (m *Metrics) RecordCacheHit() {
m.cacheHits.Add(1)
}
func (m *Metrics) RecordCacheMiss() {
m.cacheMisses.Add(1)
}
func (m *Metrics) RecordLatency(latencyMs int64) {
// Running average weighted toward recent samples (illustrative, not a true mean)
for {
current := m.averageLatencyMs.Load()
next := (current + latencyMs) / 2
if m.averageLatencyMs.CompareAndSwap(current, next) {
return
}
}
}
func (m *Metrics) CacheHitRate() float64 {
hits := float64(m.cacheHits.Load())
misses := float64(m.cacheMisses.Load())
total := hits + misses
if total == 0 {
return 0
}
return hits / total
}
// Usage in an HTTP handler (requires "net/http" and "time" imports)
func HandleRequest(m *Metrics) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
m.RecordHTTPRequest()
start := time.Now()
// Handle request...
latency := time.Since(start).Milliseconds()
m.RecordLatency(latency)
}
}

Best Practices
- Prefer typed atomics (Go 1.19+): Better safety and readability
- Use for simple values: Atomics shine with int64, bool, pointers
- Avoid in tight loops: Even atomic operations have cost
- Document shared state: Make it clear which fields are atomic
- Consider RWMutex for read-heavy: If mostly reads, RWMutex can be faster
- Profile first: Don't optimize prematurely; measure contention
- Keep logic simple: Avoid complex CAS loops unless necessary
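As an illustration of the "simple values" guideline, atomic.Pointer[T] can publish an immutable snapshot, such as a configuration struct, so readers never observe a half-updated value. The Config type and helper names here are hypothetical:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical immutable configuration snapshot.
// Writers never mutate an existing Config; they build a new one.
type Config struct {
	Timeout int
	Debug   bool
}

var current atomic.Pointer[Config]

// getConfig loads the latest snapshot; the pointer swap is atomic,
// so readers see either the old struct or the new one, never a mix.
func getConfig() *Config { return current.Load() }

// setConfig publishes a fresh snapshot.
func setConfig(c *Config) { current.Store(c) }

func main() {
	setConfig(&Config{Timeout: 30})
	fmt.Println(getConfig().Timeout) // 30
	setConfig(&Config{Timeout: 60, Debug: true})
	fmt.Println(getConfig().Timeout, getConfig().Debug) // 60 true
}
```

This pattern gives lock-free reads at the cost of allocating a new struct per update, which is usually a good trade for read-heavy configuration.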
Atomic operations are the foundation of high-performance concurrent Go code. Understanding their mechanics and trade-offs enables building systems that handle massive concurrency efficiently.