Execution Tracing
Use Go's execution tracer to visualize goroutine scheduling, GC events, and I/O operations
While pprof shows where CPU time goes, execution tracing reveals when and why things happen. It captures goroutine scheduling, garbage collection events, network I/O, and synchronization points—essential for understanding concurrent performance issues.
runtime/trace Fundamentals
The runtime/trace package provides low-level tracing capabilities:
package main

import (
    "fmt"
    "os"
    "runtime/trace"
)

func main() {
    // Start tracing
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()
    // Application runs here with tracing active
    doWork()
    fmt.Println("Trace written to trace.out")
}
func doWork() {
    // Simulate some concurrent work
    done := make(chan struct{})
    for i := 0; i < 100; i++ {
        go func() {
            defer func() { done <- struct{}{} }()
            // Simulate work
            sum := 0
            for j := 0; j < 1000000; j++ {
                sum += j
            }
        }()
    }
    // Wait for completion
    for i := 0; i < 100; i++ {
        <-done
    }
}

Run and analyze:
go run main.go
go tool trace trace.out

This opens a web-based interactive viewer in your browser.
Integration with net/http/pprof
For HTTP services, add tracing endpoints:
package main

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // Tracing is available at /debug/pprof/trace
    // Capture 5-second trace:
    // curl http://localhost:6060/debug/pprof/trace?seconds=5 > trace.out
    // go tool trace trace.out
    http.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("OK"))
    })
    http.ListenAndServe(":6060", nil)
}

Capture traces from running services:
curl http://localhost:6060/debug/pprof/trace?seconds=10 > trace.out
go tool trace trace.out

Understanding the Trace Viewer
The trace viewer UI displays several critical views:
Goroutine Timeline
Shows each goroutine's execution over time:
- Green: Running on CPU
- Orange: Blocked (network, locks, channels)
- Gray: Runnable but waiting for CPU
Network Blocking Profile
Displays I/O operations and their duration:
package main

import (
    "io"
    "net"
    "os"
    "runtime/trace"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    // Network I/O appears in trace
    conn, err := net.Dial("tcp", "example.com:80")
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    // Send a minimal HTTP/1.0 request so the server responds and closes
    // the connection; otherwise io.Copy would block until a remote timeout.
    conn.Write([]byte("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"))
    io.Copy(io.Discard, conn)
}

In the trace viewer, you'll see network blocking times annotated.
GC Events
The trace captures every garbage collection:
package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/trace"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    // Trigger allocations to see GC in trace
    for i := 0; i < 10; i++ {
        data := make([]byte, 10000000)
        _ = data
    }
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("GC runs: %d\n", m.NumGC)
}

Trace shows:
- Mark phase duration
- Sweep phase duration
- GC pause times
- Goroutines blocked by GC
Programmatic Task and Region Tracking
Use tasks and regions to annotate your trace:
package main

import (
    "context"
    "fmt"
    "os"
    "runtime/trace"
    "sync"
)

func processRequest(ctx context.Context, id int) {
    // Create a task for this logical unit of work. NewTask returns a
    // derived context that associates subsequent regions with the task.
    ctx, task := trace.NewTask(ctx, fmt.Sprintf("request-%d", id))
    defer task.End()
    // Create regions within the task
    trace.WithRegion(ctx, "parseInput", func() {
        // Simulate input parsing
        sum := 0
        for i := 0; i < 100000; i++ {
            sum += i
        }
    })
    trace.WithRegion(ctx, "processData", func() {
        // Simulate processing
        sum := 0
        for i := 0; i < 1000000; i++ {
            sum += i
        }
    })
    trace.WithRegion(ctx, "formatOutput", func() {
        // Simulate formatting
        _ = fmt.Sprintf("Result: %d", 42)
    })
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    ctx := context.Background()
    // Process multiple requests to see concurrent execution
    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            processRequest(ctx, id)
        }(i)
    }
    // Wait for completion so the deferred trace.Stop flushes the trace
    wg.Wait()
}

In the trace viewer, click the "Trace" tab to see execution flow with your custom regions highlighted.
Analyzing Goroutine Scheduling
Identify scheduling issues and imbalance:
package main

import (
    "os"
    "runtime/trace"
    "sync"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    var wg sync.WaitGroup
    ch := make(chan int, 1) // Tiny buffer forces senders and receivers to contend
    for i := 0; i < 100; i++ {
        wg.Add(2)
        // Sender goroutine
        go func() {
            defer wg.Done()
            for j := 0; j < 10000; j++ {
                ch <- j
            }
        }()
        // Receiver goroutine
        go func() {
            defer wg.Done()
            for j := 0; j < 10000; j++ {
                <-ch
            }
        }()
    }
    wg.Wait()
}

The trace reveals:
- Which goroutines block waiting for channel operations
- CPU utilization across cores
- Synchronization bottlenecks
Detecting Goroutine Contention
Find lock contention patterns:
package main

import (
    "context"
    "os"
    "runtime/trace"
    "sync"
    "time"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    var mu sync.Mutex
    var wg sync.WaitGroup
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // NewTask returns a derived context and the task handle
            _, task := trace.NewTask(context.Background(), "criticalSection")
            defer task.End()
            mu.Lock()
            defer mu.Unlock()
            // Critical section
            time.Sleep(1 * time.Millisecond)
        }()
    }
    wg.Wait()
}

In the trace:
- Orange sections show lock blocking
- Concentration of orange indicates high contention
- Look for periods where most goroutines are blocked
GC Impact Analysis
Understand how garbage collection affects performance:
package main

import (
    "os"
    "runtime"
    "runtime/trace"
)

func allocateMemory(iterations int) {
    for i := 0; i < iterations; i++ {
        // Small allocations trigger frequent GC
        data := make([]byte, 1024)
        _ = data
    }
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    // Show GC impact on concurrent work
    done := make(chan struct{})
    go func() {
        defer func() { done <- struct{}{} }()
        allocateMemory(100000)
    }()
    go func() {
        defer func() { done <- struct{}{} }()
        // Count operations while GC runs
        count := 0
        for i := 0; i < 1000000000; i++ {
            count++
        }
    }()
    <-done
    <-done
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
}

The trace shows:
- GC pause frequency and duration
- Impact on other goroutines during GC
- Relationship between allocation rate and GC frequency
Practical Example: Tracing a Request Pipeline
package main

import (
    "context"
    "fmt"
    "os"
    "runtime/trace"
    "sync"
    "time"
)

func processRequest(ctx context.Context, id int) string {
    ctx, task := trace.NewTask(ctx, fmt.Sprintf("request-%d", id))
    defer task.End()
    var result string
    // Stage 1: Fetch
    trace.WithRegion(ctx, "fetch", func() {
        time.Sleep(10 * time.Millisecond)
        result = fmt.Sprintf("data-%d", id)
    })
    // Stage 2: Transform
    trace.WithRegion(ctx, "transform", func() {
        time.Sleep(5 * time.Millisecond)
        result = result + "-processed"
    })
    // Stage 3: Enrich (parallel sub-tasks)
    trace.WithRegion(ctx, "enrich", func() {
        var wg sync.WaitGroup
        enrichments := make([]string, 3)
        for i := 0; i < 3; i++ {
            wg.Add(1)
            idx := i
            go func() {
                defer wg.Done()
                _, subTask := trace.NewTask(ctx, fmt.Sprintf("enrich-%d", idx))
                defer subTask.End()
                time.Sleep(3 * time.Millisecond)
                enrichments[idx] = fmt.Sprintf("e%d", idx)
            }()
        }
        wg.Wait()
        for _, e := range enrichments {
            result = result + "-" + e
        }
    })
    // Stage 4: Serialize
    trace.WithRegion(ctx, "serialize", func() {
        time.Sleep(2 * time.Millisecond)
        result = fmt.Sprintf(`{"data": "%s"}`, result)
    })
    return result
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()
    ctx := context.Background()
    // Simulate concurrent requests
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            _ = processRequest(ctx, id)
        }(i)
    }
    wg.Wait()
    fmt.Println("Trace complete")
}

Run and analyze:
go run main.go
go tool trace trace.out

In the viewer:
- Click on request tasks to see execution timeline
- Identify which stages are bottlenecks
- See parallelism during enrichment stage
- Observe scheduling of concurrent requests
Sampling Strategies for Production
Full tracing has overhead. Use sampling for production:
package main

import (
    "math/rand"
    "net/http"
    _ "net/http/pprof"
    "os"
    "runtime/trace"
    "sync/atomic"
    "time"
)

var traceEnabled atomic.Bool

func enableTracing() {
    // CompareAndSwap makes the check-and-set atomic, so concurrent
    // requests cannot both start a trace
    if !traceEnabled.CompareAndSwap(false, true) {
        return
    }
    f, _ := os.Create("prod-trace.out")
    trace.Start(f)
    // Stop after 10 seconds
    time.AfterFunc(10*time.Second, func() {
        trace.Stop()
        f.Close()
        traceEnabled.Store(false)
    })
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Sample 1% of requests for tracing
    if rand.Float32() < 0.01 {
        enableTracing()
    }
    w.Write([]byte("OK"))
}

func main() {
    http.HandleFunc("/api", handleRequest)
    http.ListenAndServe(":8080", nil)
}

Correlating Traces with pprof Data
Combine tracing and profiling for a complete picture:
# Capture both simultaneously
go test -bench=BenchmarkWork \
-trace=trace.out \
-cpuprofile=cpu.prof \
-benchmem
# Analyze trace for scheduling issues
go tool trace trace.out
# Analyze pprof for hotspots
go tool pprof cpu.prof

Use trace viewer to identify when hotspots occur, then use pprof to understand why they're expensive.
Trace Viewer Navigation Tips
- View Options: Toggle goroutines, processors, network blocking
- Zoom: Select time range to focus on specific events
- Search: Find specific goroutine IDs or task names
- Statistics: View execution stats per goroutine
- Synchronization: See channel, lock, and semaphore events
- Execution Timeline: Drag to pan, scroll to zoom on time axis
Common Trace Patterns
Runnable Goroutine Spike: Many goroutines queued for CPU
- Solution: Reduce concurrency or add backpressure
Blocking During GC: Goroutines suspended by garbage collection
- Solution: Reduce allocation rate or tune GC with GOGC
Channel Contention: Orange blocking on channel operations
- Solution: Buffer channels or redesign communication pattern
Lock Contention: Goroutines waiting for mutexes
- Solution: Reduce critical section size or use lock-free algorithms
Key Takeaways
- Use runtime/trace to capture detailed execution flow
- Tasks and regions annotate traces with application semantics
- Trace viewer reveals scheduling inefficiencies invisible to pprof
- GC impact becomes visible in traces
- Sampling reduces production overhead
- Correlate traces with pprof for complete performance picture
- Goroutine timeline shows when (not just how much) work happens
- Identify contention on channels, locks, and I/O operations