Channel Internals
Explore the internal structure of Go channels, including the hchan struct, ring buffer implementation, goroutine scheduling, and performance characteristics.
Introduction
Channels are Go's primary tool for communication between goroutines. While their surface API is simple—send, receive, close—the implementation is sophisticated. This article explores the internal hchan structure, the ring buffer algorithm, goroutine parking and unparking, and performance implications.
The hchan Structure
The core channel data structure is hchan:
// runtime/chan.go (simplified)
type hchan struct {
	qcount   uint           // number of elements queued in buffer
	dataqsiz uint           // size of the circular buffer
	buf      unsafe.Pointer // points to the circular buffer array
	elemsize uint16         // size of each element
	closed   uint32         // 0 = open, 1 = closed
	elemtype *_type         // type of elements
	sendx    uint           // send index in the circular buffer
	recvx    uint           // receive index in the circular buffer
	recvq    waitq          // queue of goroutines waiting to receive
	sendq    waitq          // queue of goroutines waiting to send
	lock     mutex          // protects the above fields
}
Breaking this down:
- qcount: current number of elements in the buffer
- dataqsiz: capacity of the buffer (0 for unbuffered channels)
- buf: pointer to the allocated circular buffer
- elemsize, elemtype: metadata about the element type
- closed: channel state flag
- sendx, recvx: write and read indices into the circular buffer
- recvq, sendq: doubly-linked queues of waiting goroutines
- lock: mutex protecting all above fields
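Two of these fields are visible from ordinary Go code: len(ch) reports qcount and cap(ch) reports dataqsiz. A quick check:

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 4) // allocates a 4-slot buffer: dataqsiz = 4
	ch <- 1
	ch <- 2 // two queued elements: qcount = 2

	// len reads qcount, cap reads dataqsiz
	fmt.Println(len(ch), cap(ch)) // 2 4
}
```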
Ring Buffer Implementation
Buffered channels use a circular (ring) buffer. The sendx and recvx pointers track the write and read positions.
Visualization: Ring Buffer
Channel with capacity 4, 2 elements:
sendx = 2
|
v
+---+---+---+---+
| X | X | . | . |
+---+---+---+---+
^
|
recvx = 0
qcount = 2, dataqsiz = 4
As elements are sent and received:
- A sender writes to buf[sendx] and increments sendx (wrapping at dataqsiz)
- A receiver reads from buf[recvx] and increments recvx (also wrapping)
- When sendx == recvx, the buffer is either full (qcount == dataqsiz) or empty (qcount == 0)
Code Example: Manual Ring Buffer
package main

import (
	"fmt"
	"sync/atomic"
)

// SimpleRingBuffer mimics hchan's circular buffer and index bookkeeping.
type SimpleRingBuffer struct {
	buf      []int
	sendx    int
	recvx    int
	qcount   int
	capacity int
	lock     int32 // simplified spinlock standing in for hchan's mutex
}

func (rb *SimpleRingBuffer) Send(val int) bool {
	for !atomic.CompareAndSwapInt32(&rb.lock, 0, 1) {
		// Spin until the lock is acquired
	}
	defer atomic.StoreInt32(&rb.lock, 0)
	if rb.qcount >= rb.capacity {
		return false // Buffer full
	}
	rb.buf[rb.sendx] = val
	rb.sendx = (rb.sendx + 1) % rb.capacity
	rb.qcount++
	return true
}

func (rb *SimpleRingBuffer) Receive() (int, bool) {
	for !atomic.CompareAndSwapInt32(&rb.lock, 0, 1) {
		// Spin until the lock is acquired
	}
	defer atomic.StoreInt32(&rb.lock, 0)
	if rb.qcount == 0 {
		return 0, false // Buffer empty
	}
	val := rb.buf[rb.recvx]
	rb.recvx = (rb.recvx + 1) % rb.capacity
	rb.qcount--
	return val, true
}

func main() {
	rb := &SimpleRingBuffer{
		buf:      make([]int, 4),
		capacity: 4,
	}
	rb.Send(10)
	rb.Send(20)
	rb.Send(30)
	v1, _ := rb.Receive()
	v2, _ := rb.Receive()
	v3, _ := rb.Receive()
	fmt.Printf("Received: %d, %d, %d\n", v1, v2, v3)
}
Output:
Received: 10, 20, 30
The sudog Structure: Goroutine Waiting
When a goroutine blocks on a channel, it is represented by a sudog (a "pseudo-g": a goroutine waiting in a list):
// runtime/runtime2.go (simplified)
type sudog struct {
	g           *g             // the goroutine
	next        *sudog         // next in list
	prev        *sudog         // prev in list
	waitlink    *sudog         // for select
	c           *hchan         // the channel
	elem        unsafe.Pointer // pointer to the data being sent/received
	releasetime int64          // when should this sudog be released?
}
Sudogs are allocated from a per-P cache backed by a central pool of pre-allocated structs, avoiding allocation overhead during blocking operations.
Send Operation Path
A send operation ch <- value follows this path:
Step 1: Lock Acquisition
lock(ch.lock)
Step 2: Check for Waiting Receivers
If ch.recvq is not empty (a receiver is waiting):
// Direct copy from sender's stack to receiver's stack
// No buffer involved!
memmove(receiver.elem, sender.elem, elemsize)
goready(receiver.g) // Wake receiver
unlock(ch.lock)
return
This is the direct send optimization: one less memory copy.
Step 3: Check for Buffer Space
If the buffer has space:
// Copy to buffer
memmove(buf[sendx], sender.elem, elemsize)
sendx = (sendx + 1) % dataqsiz
qcount++
unlock(ch.lock)
return
Step 4: Goroutine Parks
If both above fail (no waiting receiver, buffer full):
// Create sudog
sudoG := acquireSudog()
sudoG.elem = addressOfValue
sudoG.g = currentG
sudoG.c = ch
// Add to send queue
ch.sendq.enqueue(sudoG)
// Park this goroutine
gopark(unlock, &ch.lock)
// When woken:
releaseSudog(sudoG)
Receive Operation Path
A receive operation value := <- ch mirrors the send path:
Step 1: Lock Acquisition
lock(ch.lock)
Step 2: Check for Waiting Senders
If ch.sendq is not empty:
// Unbuffered case: direct copy from the sender's stack to the receiver's variable
// No buffer!
memmove(receiver.elem, sender.elem, elemsize)
goready(sender.g) // Wake sender
unlock(ch.lock)
return
(For a buffered channel, a waiting sender means the buffer is full: the receiver takes the value at buf[recvx], and the woken sender's value is copied into the buffer in its place.)
Step 3: Check Buffer
If buffer has data:
// Copy from buffer
memmove(receiver.elem, buf[recvx], elemsize)
recvx = (recvx + 1) % dataqsiz
qcount--
unlock(ch.lock)
return
Step 4: Goroutine Parks
Otherwise, park and wait.
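Both fast paths and the park path can be probed from user code: a select with a default clause takes the operation only if it would complete without parking. A sketch (trySend and tryRecv are helpers invented for this example):

```go
package main

import "fmt"

// trySend reports whether a send would complete without parking:
// the default case fires instead of enqueuing a sudog.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true // buffer space or a waiting receiver (steps 2-3)
	default:
		return false // would have parked (step 4)
	}
}

// tryRecv is the receive-side equivalent.
func tryRecv(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true // buffered data or a waiting sender
	default:
		return 0, false // would have parked
	}
}

func main() {
	ch := make(chan int, 1)

	fmt.Println(trySend(ch, 10)) // true: the buffer had space
	fmt.Println(trySend(ch, 20)) // false: buffer full, no receiver waiting

	v, ok := tryRecv(ch)
	fmt.Println(v, ok) // 10 true: copied out of the buffer
	_, ok = tryRecv(ch)
	fmt.Println(ok) // false: buffer empty, no sender waiting
}
```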
The Direct Send Optimization
This is one of Go's clever optimizations. When a receiver is waiting, the sender copies data directly to the receiver's stack, bypassing the buffer entirely:
// Sender: value := 42
// Receiver: x := <- ch
// Without receiver waiting: 42 -> buffer -> x
// With receiver waiting: 42 -> x (direct!)
This saves a memory copy and skips the buffer entirely.
Demonstration: Unbuffered Rendezvous
// Save these benchmarks in a _test.go file and run: go test -bench=.
package main

import (
	"testing"
	"time"
)

func BenchmarkChannelDirectTransfer(b *testing.B) {
	ch := make(chan int) // Unbuffered: forces direct transfer
	go func() {
		for {
			<-ch
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- i
	}
}

func BenchmarkChannelWithBuffer(b *testing.B) {
	ch := make(chan int, 1000) // Buffered: may use the ring buffer
	go func() {
		for {
			<-ch
		}
	}()
	time.Sleep(10 * time.Millisecond) // Let the receiver goroutine start
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- i
	}
}
Typical results:
BenchmarkChannelDirectTransfer-8    50000000    25 ns/op
BenchmarkChannelWithBuffer-8       100000000    12 ns/op
In this run the unbuffered channel is roughly twice as slow per send: the direct transfer saves a copy, but every handoff still pays to park and wake a goroutine, while the buffered channel amortizes that synchronization across many sends.
The Mutex Lock: Channel Bottleneck
Every send and receive operation acquires ch.lock, a mutex. This is the primary bottleneck for channel performance:
// Simplified send
func chansend(ch *hchan, ep unsafe.Pointer) {
	lock(&ch.lock)   // <-- Expensive
	// ... perform send ...
	unlock(&ch.lock) // <-- Expensive
}
This is why channels are slower than alternatives for simple operations:
- Mutex: ~20ns per lock/unlock pair
- Atomic: ~5ns per operation
- Channel: ~50-100ns per operation (includes mutex + memory copy)
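The relative costs above vary by machine, but the three mechanisms can be exercised side by side. The sketch below (runCounters is a helper invented for this example) checks only correctness, since absolute timings are hardware-dependent:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runCounters increments three counters concurrently: one atomic,
// one mutex-protected, and one built from channel sends.
func runCounters() (int64, int64, int64) {
	const workers, perWorker = 10, 1000

	var atomicCount int64 // single shared word, cheapest
	var mu sync.Mutex     // a lock/unlock pair per increment
	var mutexCount int64
	// each send pays hchan's mutex plus a memory copy
	ch := make(chan int64, workers*perWorker)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&atomicCount, 1)
				mu.Lock()
				mutexCount++
				mu.Unlock()
				ch <- 1
			}
		}()
	}
	wg.Wait()
	close(ch)

	var chanCount int64
	for v := range ch {
		chanCount += v
	}
	return atomicCount, mutexCount, chanCount
}

func main() {
	a, m, c := runCounters()
	fmt.Println(a, m, c) // 10000 10000 10000
}
```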
Unbuffered Channels
An unbuffered channel has dataqsiz == 0 and no allocated buffer:
ch := make(chan int) // dataqsiz = 0, buf = nil
// Every send must rendezvous with a receive
// Direct copy from sender to receiver
Unbuffered channels force synchronization: each send blocks until a receiver consumes the value.
The select Statement Internals
A select statement in Go compiles to a call to runtime.selectgo():
select {
case v := <-ch1:
	// ...
case ch2 <- w:
	// ...
case <-ch3:
	// ...
default:
	// ...
}
The selectgo() function:
- Shuffles the polling order of the cases to prevent starvation (if several cases are ready, one is chosen pseudo-randomly)
- Acquires the channels' locks in a fixed order (sorted by address) to prevent deadlock
- Polls every case to find one that is ready
- If none is ready and there is a default, runs it; otherwise enqueues a sudog on every channel and parks the goroutine
- Wakes when any case becomes ready
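The randomized choice is observable from user code: when two cases are ready on every iteration, both win a nontrivial share. A small sketch (selectCounts is a name invented for this example):

```go
package main

import "fmt"

// selectCounts runs n selects in which both cases are always ready
// and tallies which case selectgo picks.
func selectCounts(n int) (ch1Wins, ch2Wins int) {
	ch1 := make(chan int, 1)
	ch2 := make(chan int, 1)
	for i := 0; i < n; i++ {
		ch1 <- 1
		ch2 <- 1
		select { // both ready: one is chosen pseudo-randomly
		case <-ch1:
			ch1Wins++
			<-ch2 // drain the other so the next round starts clean
		case <-ch2:
			ch2Wins++
			<-ch1
		}
	}
	return ch1Wins, ch2Wins
}

func main() {
	a, b := selectCounts(1000)
	// Neither case is starved
	fmt.Println(a > 0 && b > 0) // true
}
```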
Example: select with Multiple Channels
package main

import (
	"fmt"
	"time"
)

func main() {
	ch1 := make(chan string)
	ch2 := make(chan string)
	go func() {
		time.Sleep(100 * time.Millisecond)
		ch1 <- "one"
	}()
	go func() {
		time.Sleep(200 * time.Millisecond)
		ch2 <- "two"
	}()
	for i := 0; i < 2; i++ {
		select {
		case msg1 := <-ch1:
			fmt.Println("Received from ch1:", msg1)
		case msg2 := <-ch2:
			fmt.Println("Received from ch2:", msg2)
		}
	}
}
Output:
Received from ch1: one
Received from ch2: two
Closing a Channel
Closing a channel (close(ch)) sets the closed flag:
func closechan(ch *hchan) {
	lock(&ch.lock)
	if ch.closed != 0 {
		unlock(&ch.lock)
		panic("close of closed channel")
	}
	ch.closed = 1
	// Wake all waiting receivers; they get the zero value
	for {
		sg := ch.recvq.dequeue()
		if sg == nil {
			break
		}
		if sg.elem != nil {
			typedmemclr(ch.elemtype, sg.elem) // Zero value
		}
		goready(sg.g)
	}
	// Waiting senders are woken the same way;
	// each one panics when it resumes
	unlock(&ch.lock)
}
Key behaviors:
- Receivers get zero values on a closed channel
- Senders panic if they try to send on a closed channel
- Closing an already-closed channel panics
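These behaviors are easy to confirm directly; a minimal sketch:

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)

	// Values already in the buffer are still delivered after close
	fmt.Println(<-ch) // 1
	fmt.Println(<-ch) // 2

	// Once drained, receives return the zero value with ok == false
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false

	// A send on a closed channel panics
	defer func() { fmt.Println("recovered:", recover()) }()
	ch <- 3
}
```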
Nil Channels
Sending or receiving on a nil channel blocks forever:
var ch chan int // nil
ch <- 1 // Blocks forever
<-ch // Blocks forever
This is useful in select statements to dynamically disable a case:
func producer(ch chan int) {
	for i := 0; i < 10; i++ {
		ch <- i
	}
	close(ch)
}

func consumer() {
	ch := make(chan int)
	go producer(ch)
	var sendCh chan int // nil: permanently disables the send case
	receiveCh := ch
	for receiveCh != nil {
		select {
		case val, ok := <-receiveCh:
			if !ok {
				receiveCh = nil // disable this case once ch is closed
				continue
			}
			fmt.Println("Received:", val)
		case sendCh <- 42:
			// Never triggers: a send on a nil channel blocks forever
		}
	}
}
Performance Characteristics
Latency: Operation Time
Operation               Latency
Unbuffered send/recv    ~100 ns
Buffered send (space)   ~50 ns
Buffered recv (data)    ~50 ns
Type assertion          ~5 ns
Mutex lock/unlock       ~20 ns
Atomic operation        ~5 ns
Throughput Benchmark
// Save in a _test.go file and run: go test -bench=.
package main

import (
	"testing"
)

func benchSend(b *testing.B, bufSize int) {
	ch := make(chan int, bufSize)
	go func() {
		for range ch {
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- i
	}
	close(ch)
}

func BenchmarkChannelThroughput(b *testing.B) {
	b.Run("Unbuffered", func(b *testing.B) { benchSend(b, 0) })
	b.Run("BufferSize1", func(b *testing.B) { benchSend(b, 1) })
	b.Run("BufferSize10", func(b *testing.B) { benchSend(b, 10) })
	b.Run("BufferSize100", func(b *testing.B) { benchSend(b, 100) })
	b.Run("BufferSize1000", func(b *testing.B) { benchSend(b, 1000) })
}
Typical output:
BenchmarkChannelThroughput/Unbuffered-8 50000000 22 ns/op
BenchmarkChannelThroughput/BufferSize1-8 100000000 15 ns/op
BenchmarkChannelThroughput/BufferSize10-8 150000000 12 ns/op
BenchmarkChannelThroughput/BufferSize100-8 200000000 10 ns/op
BenchmarkChannelThroughput/BufferSize1000-8    200000000     9 ns/op
Larger buffers reduce lock contention, improving throughput.
When to Use Channels vs Alternatives
Use Channels For:
- Goroutine communication — Channels are the idiomatic way
- Signaling — Done channels, timeout channels
- Work distribution — Worker pool patterns
// Good use: communicating between goroutines
func worker(jobs <-chan Job, results chan<- Result) {
	for job := range jobs {
		results <- process(job)
	}
}
Use Mutexes For:
- Protecting shared state — Concurrent map access, counters
- Fine-grained locking — Performance-critical sections
// Good use: protecting a map
type Cache struct {
	mu    sync.RWMutex
	items map[string]string
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	c.items[key] = value
	c.mu.Unlock()
}
Use Atomics For:
- Simple counters — Goroutine-safe increment/decrement
- Flags — Done signals, state flags
// Good use: atomic counter
var requests int64

func increment() {
	atomic.AddInt64(&requests, 1)
}
Summary
Channels are sophisticated primitives with well-designed internals:
- The hchan structure manages a circular buffer and queues of waiting goroutines
- The ring buffer is tracked with sendx and recvx indices
- The direct send optimization copies between sender and receiver stacks
- A single lock protects all operations (a bottleneck at high concurrency)
- select randomizes among ready cases and can park a goroutine on multiple channels at once
- Nil channels are useful for disabling select cases
- Unbuffered channels force a rendezvous; buffered channels decouple sender and receiver
- Mutexes are faster for simple shared-state updates; channels fit communication patterns
Understanding these internals helps you design more efficient concurrent Go programs.