Go Performance Guide
Profiling & Benchmarking

Profiling with pprof

Master CPU, memory, and goroutine profiling using Go's pprof tool to identify performance bottlenecks

Go's pprof is one of the most powerful tools for identifying performance bottlenecks. Whether you're investigating high CPU usage, memory leaks, or goroutine explosions, pprof provides detailed insights into your application's runtime behavior.

Understanding pprof Basics

The pprof tool is built into Go and available through two primary interfaces: the runtime/pprof package for programmatic profiling and net/http/pprof for HTTP-based profiling in running services.

runtime/pprof Package

The runtime/pprof package provides low-level profiling capabilities for standalone applications:

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
)

func main() {
    // Start CPU profiling
    cpuFile, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    defer cpuFile.Close()

    if err := pprof.StartCPUProfile(cpuFile); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()

    // Your application code here
    expensiveOperation()

    fmt.Println("CPU profile written to cpu.prof")
}

func expensiveOperation() {
    sum := 0
    for i := 0; i < 100000000; i++ {
        sum += i
    }
}

Run with go run main.go and analyze the profile with go tool pprof cpu.prof.

net/http/pprof Integration

For HTTP services, net/http/pprof registers profiling endpoints whose overhead is negligible until a profile is actually requested:

package main

import (
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "time"
)

func handler(w http.ResponseWriter, r *http.Request) {
    time.Sleep(100 * time.Millisecond)
    fmt.Fprintf(w, "Hello World\n")
}

func main() {
    http.HandleFunc("/api/hello", handler)

    // pprof endpoints automatically registered at:
    // /debug/pprof/
    // /debug/pprof/profile (CPU)
    // /debug/pprof/heap (Memory)
    // /debug/pprof/goroutine
    // /debug/pprof/mutex
    // /debug/pprof/block

    if err := http.ListenAndServe(":6060", nil); err != nil {
        panic(err)
    }
}

Visit http://localhost:6060/debug/pprof/ to explore available profiles.

CPU Profiling

CPU profiling identifies which functions consume the most processing time. It uses statistical sampling to determine where your program spends CPU cycles.

Collecting CPU Profiles

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
)

func fibonacci(n int) int {
    if n <= 1 {
        return n
    }
    return fibonacci(n-1) + fibonacci(n-2)
}

func main() {
    cpuFile, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    defer cpuFile.Close()
    if err := pprof.StartCPUProfile(cpuFile); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()

    // Run intensive computation
    for i := 0; i < 100; i++ {
        fibonacci(30)
    }

    fmt.Println("Done")
}
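
A CPU profile of this program will attribute nearly all time to fibonacci, whose naive recursion is exponential. The usual fix, sketched here with an illustrative fibMemo helper, is memoization:

```go
package main

import "fmt"

// fibMemo caches previously computed values, turning the exponential
// recursion into a linear-time computation.
func fibMemo(n int, cache map[int]int) int {
    if n <= 1 {
        return n
    }
    if v, ok := cache[n]; ok {
        return v
    }
    v := fibMemo(n-1, cache) + fibMemo(n-2, cache)
    cache[n] = v
    return v
}

func main() {
    fmt.Println(fibMemo(30, map[int]int{})) // 832040, same as the naive version
}
```

Capturing a new CPU profile after a change like this is how you confirm the hotspot is gone.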

Analyzing CPU Profiles

go tool pprof cpu.prof

Interactive pprof commands:

  • top: Show top 10 functions by CPU time
  • list <function>: View source code with CPU time per line
  • web: Generate graph visualization (requires graphviz)
  • peek <function>: Brief info about a function
  • pdf: Export as PDF

Example session:

$ go tool pprof cpu.prof
File: main
Type: cpu
Time: Jan 10 2025 at 10:00am (1s total)
Entering interactive mode (type "help" for commands)

(pprof) top
Showing nodes accounting for 900ms, 90% of 1000ms total
Showing top 10 nodes out of 15
      flat  flat%   sum%        cum   cum%
     500ms 50.0% 50.0%     500ms 50.0%  main.fibonacci
     300ms 30.0% 80.0%     800ms 80.0%  main.expensiveCompute
     100ms 10.0% 90.0%     100ms 10.0%  runtime.gcBgMarkWorker

(pprof) list fibonacci
Total: 1000ms
ROUTINE ======================== main.fibonacci in /app/main.go
     500ms     800ms (flat, cum) 80.0% of Total
        .       .     3:func fibonacci(n int) int {
     300ms     300ms     4:	if n <= 1 {
     200ms     200ms     5:		return n
        .     300ms     6:	}
        .     500ms     7:	return fibonacci(n-1) + fibonacci(n-2)

Tip: For live HTTP services, capture CPU profiles without stopping the service:

go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'

Memory Profiling

Memory profiling tracks heap allocations, helping identify memory leaks and excessive allocation patterns.

Heap Allocation Profiling

package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"
)

func allocPressure(iterations int) {
    for i := 0; i < iterations; i++ {
        // Each 1MB slice becomes garbage as soon as the loop advances,
        // creating allocation and GC pressure
        _ = make([]byte, 1024*1024)
    }
}

func efficientFunction(iterations int) {
    buf := make([]byte, 0, 10)
    for i := 0; i < iterations; i++ {
        buf = buf[:0] // Reuse buffer
        buf = append(buf, byte(i))
    }
}

func main() {
    // Write a baseline profile, run the allocation-heavy code, then write
    // a second profile to a separate file so the two can be compared
    // (e.g. with go tool pprof -base=mem-before.prof mem-after.prof)
    before, err := os.Create("mem-before.prof")
    if err != nil {
        panic(err)
    }
    defer before.Close()

    runtime.GC() // flush recent allocation statistics
    if err := pprof.WriteHeapProfile(before); err != nil {
        panic(err)
    }

    allocPressure(1000)

    after, err := os.Create("mem-after.prof")
    if err != nil {
        panic(err)
    }
    defer after.Close()

    runtime.GC()
    if err := pprof.WriteHeapProfile(after); err != nil {
        panic(err)
    }

    fmt.Println("Heap profiles written")
}

Profile Types: alloc_space vs inuse_space

$ go tool pprof -alloc_space mem.prof  # Total allocated memory
$ go tool pprof -inuse_space mem.prof  # Currently allocated memory

The critical difference:

  • alloc_space: Total memory ever allocated (includes freed memory)
  • inuse_space: Active allocations right now (true memory usage)
  • alloc_objects: Number of allocations made
  • inuse_objects: Current number of objects

package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"
)

func demonstrateAllocation() {
    // These allocations will be freed
    for i := 0; i < 10000; i++ {
        data := make([]byte, 1024)
        data[0] = byte(i) // Touch the slice so the allocation is not optimized away
    }
    // Total alloc_space is high, but inuse_space is low
}

func main() {
    f, _ := os.Create("heap.prof")
    defer f.Close()

    runtime.GC()
    demonstrateAllocation()
    runtime.GC()

    pprof.WriteHeapProfile(f)
    fmt.Println("Heap profile created")
}

Goroutine Profiling

Identify goroutine leaks and contention:

package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"
    "time"
)

func leakyWorker(done <-chan struct{}) {
    for {
        select {
        case <-done:
            return
        default:
            time.Sleep(1 * time.Millisecond)
        }
    }
}

func main() {
    // Start goroutines but forget to stop some
    for i := 0; i < 100; i++ {
        go leakyWorker(make(chan struct{})) // Channel never closed!
    }

    time.Sleep(1 * time.Second)

    // Write goroutine profile
    f, _ := os.Create("goroutine.prof")
    defer f.Close()
    pprof.Lookup("goroutine").WriteTo(f, 0)

    fmt.Printf("Active goroutines: %d\n", runtime.NumGoroutine())
}

Analyze with:

go tool pprof goroutine.prof
(pprof) top
(pprof) list leakyWorker

Block and Mutex Profiling

Block Profiling

Identifies goroutine blocking on channels and locks:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
    "sync"
    "time"
)

func main() {
    // Record every blocking event; use a higher rate in production
    runtime.SetBlockProfileRate(1)

    var wg sync.WaitGroup
    ch := make(chan int, 1)

    // A slow consumer drains the channel, so senders pile up waiting
    go func() {
        for range ch {
            time.Sleep(time.Millisecond)
        }
    }()

    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // These block waiting for buffer space in the channel
            ch <- 42
        }()
    }

    wg.Wait()
    close(ch)

    f, err := os.Create("block.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := pprof.Lookup("block").WriteTo(f, 0); err != nil {
        panic(err)
    }
}

Mutex Profiling

Track lock contention:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
    "sync"
)

func main() {
    // Sample every contention event; use a larger fraction in production
    runtime.SetMutexProfileFraction(1)

    var mu sync.Mutex
    var wg sync.WaitGroup
    counter := 0

    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                mu.Lock()
                counter++ // every goroutine contends for the same lock
                mu.Unlock()
            }
        }()
    }

    wg.Wait()

    f, err := os.Create("mutex.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()
    if err := pprof.Lookup("mutex").WriteTo(f, 0); err != nil {
        panic(err)
    }
}

Reading Flame Graphs

Flame graphs visualize call stacks over time:

go tool pprof -http=:8080 cpu.prof

This opens a web UI where you can:

  • View "Flame Graph" tab for interactive visualization
  • Click sections to zoom into specific functions
  • Hover to see function names and percentages
  • Use "View Options" to change visualization style

The width of a frame represents its share of CPU time; height represents call-stack depth.

pprof Web UI

When analyzing HTTP profiles, access the built-in web UI:

go tool pprof http://localhost:6060/debug/pprof/heap

Available views:

  • Graph: Call graph with percentages
  • Flame Graph: Interactive stack visualization
  • Top: Function list by metric
  • Source: Annotated source code
  • Disasm: Assembly code with profiling data

Production Profiling

Continuously profile live systems safely:

package main

import (
    "fmt"
    "log"
    "net/http"
    _ "net/http/pprof"
    "os"
    "runtime/pprof"
    "time"
)

func init() {
    // Write profiles periodically
    go func() {
        ticker := time.NewTicker(1 * time.Hour)
        defer ticker.Stop()

        for range ticker.C {
            writeProfiles()
        }
    }()
}

func writeProfiles() {
    timestamp := time.Now().Format("2006-01-02-15-04-05")

    if err := os.MkdirAll("profiles", 0o755); err != nil {
        log.Printf("create profiles dir: %v", err)
        return
    }

    // Memory profile
    memFile, err := os.Create(fmt.Sprintf("profiles/heap-%s.prof", timestamp))
    if err != nil {
        log.Printf("create heap profile: %v", err)
        return
    }
    defer memFile.Close()
    if err := pprof.WriteHeapProfile(memFile); err != nil {
        log.Printf("write heap profile: %v", err)
    }

    // Goroutine profile
    gorFile, err := os.Create(fmt.Sprintf("profiles/goroutine-%s.prof", timestamp))
    if err != nil {
        log.Printf("create goroutine profile: %v", err)
        return
    }
    defer gorFile.Close()
    if err := pprof.Lookup("goroutine").WriteTo(gorFile, 0); err != nil {
        log.Printf("write goroutine profile: %v", err)
    }
}

func main() {
    go http.ListenAndServe(":6060", nil)

    // Application logic
    select {}
}

Practical Workflow: Investigating a Slow Endpoint

Step 1: Identify the Problem

package main

import (
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "time"
)

func slowEndpoint(w http.ResponseWriter, r *http.Request) {
    // Simulate slow operation
    sum := 0
    for i := 0; i < 500000000; i++ {
        sum += i
    }
    fmt.Fprintf(w, "Result: %d\n", sum)
}

func main() {
    http.HandleFunc("/slow", slowEndpoint)
    http.ListenAndServe(":8080", nil)
}

Step 2: Capture CPU Profile

# Collect 30-second profile
go tool pprof 'http://localhost:8080/debug/pprof/profile?seconds=30'

# While the profile is recording, send traffic (e.g. curl http://localhost:8080/slow), then analyze
(pprof) top10
(pprof) list slowEndpoint

Step 3: Identify Bottleneck

The list command shows CPU time per line, revealing exact hotspots in your code.

Step 4: Optimize and Verify

func slowEndpoint(w http.ResponseWriter, r *http.Request) {
    // Use formula instead of loop: sum = n*(n-1)/2
    n := 500000000
    sum := n * (n - 1) / 2
    fmt.Fprintf(w, "Result: %d\n", sum)
}

Capture new profile to confirm improvement.
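
Before shipping the rewrite, it is also worth sanity-checking that the closed-form sum matches the loop it replaces: the sum of 0 through n-1 is n*(n-1)/2. A small sketch with illustrative helper names:

```go
package main

import "fmt"

// sumLoop mirrors the original handler: it adds the integers 0 through n-1.
func sumLoop(n int) int {
    sum := 0
    for i := 0; i < n; i++ {
        sum += i
    }
    return sum
}

// sumFormula computes the same value in constant time via n*(n-1)/2.
func sumFormula(n int) int {
    return n * (n - 1) / 2
}

func main() {
    n := 500000000
    fmt.Println(sumLoop(n) == sumFormula(n)) // true on 64-bit platforms
}
```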

Common pprof Patterns

Find memory leaks:

go tool pprof -base=baseline.prof current.prof

Compare profiles:

go tool pprof -http=:8080 -diff_base=old.prof new.prof

Export data:

go tool pprof -svg cpu.prof > cpu.svg
go tool pprof -pdf cpu.prof > cpu.pdf

Filter results:

(pprof) focus handler
(pprof) ignore runtime

Key Takeaways

  • Use net/http/pprof in production for low-overhead, on-demand profiling
  • CPU profiling identifies compute bottlenecks
  • Heap profiling distinguishes between alloc_space (total) and inuse_space (current)
  • Goroutine profiling detects leaks
  • Block/mutex profiling reveals synchronization issues
  • Flame graphs provide intuitive visualization
  • Always profile with real workloads before optimizing
