Function Inlining

Understand how Go's compiler inlines functions to eliminate call overhead and enable further optimizations like escape analysis and bounds check elimination.

What is Function Inlining?

Function inlining is a compiler optimization where a function call is replaced with the actual function body. Instead of jumping to a function's address, executing its code, and returning, the compiler copies the function's code directly into the call site.

// Original code
func add(a, b int) int {
    return a + b
}

func main() {
    result := add(5, 3)  // Function call
    _ = result           // Use the value so the snippet compiles
}

// After inlining
func main() {
    result := 5 + 3  // Function body inlined
    _ = result
}

While this sounds simple, inlining has profound implications for performance and optimization.

Why Inlining Matters

1. Eliminating Call Overhead

Each function call has a cost:

Saving the return address on the stack
Adjusting the stack pointer (for non-leaf functions)
Spilling/restoring registers across the call boundary
Fetching the first instruction of the called function (instruction cache miss risk)

On modern CPUs, a simple function call costs 1-5 nanoseconds. For functions that do minimal work, the call overhead can exceed the computation time.

// Without inlining
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

// Calling min 1 billion times:
// Cost: 1 billion calls × 1-5 ns/call = 1-5 seconds (just call overhead!)

2. Enabling Further Optimizations

Inlining creates new optimization opportunities:

Escape Analysis: When a function is inlined, the compiler can perform more accurate escape analysis on variables allocated within the inlined function.

// Without inlining, compiler must assume slice escapes
func makeBuffer(n int) []byte {
    return make([]byte, n)  // Might escape to heap
}

// With inlining, compiler sees actual usage
result := makeBuffer(100)
// If the buffer is never returned or stored, compiler can stack-allocate it

Constant Folding: Inlining exposes constant arguments, which the compiler can then fold.

func kib(n int) int {
    return n * 1024
}

// After inlining kib(4), the multiplication is folded at compile time:
// size := kib(4) → size := 4096

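Bounds Check Elimination: Inlining can expose the context the compiler needs to prove that an index is in range.

func at(s []int, i int) int {
    return s[i]  // Compiled on its own, this needs a bounds check
}

// After inlining at(s, i) into a loop such as "for i := range s",
// the compiler knows 0 <= i < len(s) and can drop the check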

Go's Inlining Budget: The Cost Model

The Go compiler uses a budget system to decide whether inlining a function is worthwhile. The budget is based on the function's AST (Abstract Syntax Tree) node count.

How the Budget Works

The compiler counts the number of AST nodes in a function. Each node has a cost:

Simple operations (assignments, arithmetic): 1 node each
Control flow (if, for, switch): higher cost
Function calls: higher cost

The current default threshold is 80 AST nodes (as of Go 1.21). Functions smaller than this budget can be inlined.

Let's examine some functions:

// Very small - easily inlined (cost ≈ 5 nodes)
func isPositive(x int) bool {
    return x > 0
}

// Medium - may be inlined; the two calls add to the cost, so confirm with -gcflags=-m
func parseHeader(data []byte) (magic uint32, version uint8, err error) {
    if len(data) < 5 {
        return 0, 0, errors.New("too short")
    }
    magic = binary.LittleEndian.Uint32(data[0:4])
    version = data[4]
    return magic, version, nil
}

// Large - might not be inlined (cost ≈ 200+ nodes)
func complexParser(input string) (ast.Node, error) {
    // Complex parsing logic with multiple levels of if/else
    // ...extensive code...
}

Checking Inlining Decisions

To see what the compiler is inlining, use:

go build -gcflags="-m" ./...

Example output (exact wording varies between Go releases; the "does not escape" line reports on a value, not on a function):

./main.go:5:6: can inline add
./main.go:10:9: inlining call to add
./main.go:20:10: buf does not escape

For more detailed information:

go build -gcflags="-m=2" ./...

At this level the compiler also reports why a function could not be inlined:

./main.go:15:6: cannot inline complexFn: function too complex: cost 1250 exceeds budget 80
./main.go:20:6: cannot inline withDefer: unhandled op DEFER
./main.go:25:6: cannot inline withRecover: call to recover
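To try these flags, a small file like the one below can be built with go build -gcflags=-m=2. This is only a sketch: the function names (double, withDefer, withRecover) are made up for illustration, and the diagnostics the compiler prints for them depend on the Go release.

```go
package main

import "fmt"

// double is tiny, so the compiler is expected to report it as inlinable.
func double(x int) int {
	return x * 2
}

// withDefer contains a defer statement. The deferred closure increments
// the named result after the return value has been set.
func withDefer(x int) (out int) {
	defer func() { out++ }()
	return x
}

// withRecover runs f and reports whether it panicked.
func withRecover(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(double(21))
	fmt.Println(withDefer(1))
	fmt.Println(withRecover(func() { panic("boom") }))
}
```

Comparing the -m=2 lines for double with those for the other two functions shows both the "can inline" and the "cannot inline" forms of the report in one build.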

What Prevents Inlining

Several constructs prevent a function from being inlined, either because they add too many nodes or because inlining is unsafe.

1. Function Size

The primary limitation is function size. If a function exceeds the budget, it won't be inlined:

func processLargeData(data []byte) (result []int) {
    // 150+ lines of complex logic...
    // This function probably exceeds 80 nodes
    // It won't be inlined
}

2. Defer Statements

Functions containing defer are not inlined by the gc compiler at the time of writing:

func withDefer(x int) int {
    defer func() { /* cleanup */ }()
    return x + 1
}
// Not inlined: the inliner does not handle defer

Since Go 1.14 most defer statements are cheap at run time (open-coded defers), but they still keep the enclosing function from being inlined.

3. Calls to Built-in recover()

Functions that call recover() cannot be inlined:

func safeCall(f func()) {
    defer func() {
        if r := recover(); r != nil {
            // Handle panic
        }
    }()
    f()
}
// Not inlined

4. Go Statements

Launching goroutines prevents inlining:

func launchWorker(f func()) {
    go f()  // Makes this function non-inlineable
}

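5. Closures (Before Go 1.17)

Before Go 1.17, a function that contained a closure could not be inlined. Go 1.17 and later can inline such functions, so this only matters for old toolchains:

func makeAdder(x int) func(int) int {
    return func(y int) int {
        return x + y  // Captures x
    }
}
// Go 1.17+: reported as inlinable by -gcflags=-m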

6. Type Switches (sometimes)

Complex type switches can exceed the inlining budget:

func handleInterface(v interface{}) string {
    switch v.(type) {
    case int:
        return "int"
    case string:
        return "string"
    case []byte:
        return "bytes"
    // ... many more cases ...
    }
    return "unknown"
}
// Might not be inlined due to size

Mid-Stack Inlining (Go 1.12+)

Traditionally, Go only inlined leaf functions (functions that don't call other functions). Mid-stack inlining (introduced in Go 1.12) allows inlining of functions that call other functions.

This enables chains of inlining:

func A() int {
    return B()
}

func B() int {
    return C()
}

func C() int {
    return 42
}

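// Caller invokes A()
// C is inlined into B, so B becomes "return 42"
// B is inlined into A, and A into its caller
// Final result: the caller uses 42 directly

Mid-stack inlining matters most when a function in the middle of the chain calls something that cannot be inlined. The small function can still be inlined into its caller, and only the call to the larger function remains.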
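A common place where mid-stack inlining pays off is a small fast-path function that falls back to a larger slow path; sync.Mutex.Lock in the standard library is structured this way. The sketch below illustrates the shape with a hypothetical batcher type. Here //go:noinline stands in for "too large to inline" so the example stays short, and whether add is actually inlined should be confirmed with -gcflags=-m.

```go
package main

import "fmt"

// batcher is a hypothetical type that counts events and flushes in batches.
type batcher struct {
	pending int
	limit   int
	flushed int
}

// add is the hot fast path: one increment and one comparison. It is a
// non-leaf function because it may call flush.
func (b *batcher) add() {
	b.pending++
	if b.pending >= b.limit {
		b.flush()
	}
}

// flush is the rare slow path. The directive below keeps it out of line,
// standing in for a function that is too large to inline.
//
//go:noinline
func (b *batcher) flush() {
	b.flushed += b.pending
	b.pending = 0
}

func main() {
	b := &batcher{limit: 4}
	for i := 0; i < 10; i++ {
		b.add()
	}
	fmt.Println(b.flushed, b.pending) // prints "8 2"
}
```

Keeping the slow path in its own function keeps the cost of add low, so the increment and comparison can end up in the caller's loop while flush remains a real call.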

Compiler Directives for Inlining Control

//go:noinline

Prevent inlining of a specific function:

//go:noinline
func benchmark() int {
    // Code that should not be inlined
    sum := 0
    for i := 0; i < 1000000; i++ {
        sum += i
    }
    return sum
}

This is useful when benchmarking, where inlining would distort measurements.

//go:inline

This is not a standard directive in Go (unlike some other languages). The compiler makes its own inlining decisions based on the cost model. You cannot force inlining, only prevent it with //go:noinline.

//go:nosplit

Omit the stack-growth check from the function prologue (often mentioned alongside inlining, but unrelated to it):

//go:nosplit
func criticalPath() {
    // This function won't emit stack checks
    // Useful for signal handlers, runtime code
}

Impact of Inlining on Escape Analysis

Inlining dramatically improves escape analysis by giving the compiler more context.

Example: Stack Allocation Thanks to Inlining

type Buffer struct {
    data [1024]byte
    pos  int
}

// Without inlining, Buffer would escape to heap
func NewBuffer() *Buffer {
    return &Buffer{}
}

func main() {
    b := NewBuffer()
    b.data[0] = 42
    // When NewBuffer is inlined:
    // b := &Buffer{} // Compiler sees this is used locally only
    // Buffer can be stack-allocated!
}

Example: Preventing Unnecessary Allocation

type Encoder struct {
    buf []byte
}

// With inlining, the compiler sees buf's lifetime
func newEncoder(capacity int) *Encoder {
    return &Encoder{
        buf: make([]byte, capacity),
    }
}

// In your function:
enc := newEncoder(256)
enc.buf[0] = 1
// After inlining, the compiler might stack-allocate the byte slice!

Patterns to Keep Functions Inlineable

To write code that inlines effectively:

1. Keep Helper Functions Small

// ✓ Good - small, well under the budget
func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

// ✗ Poor - likely too large
func complexMax(a, b int) int {
    if a <= 0 || b <= 0 {
        // 50 lines of validation
    }
    // complex logic
}

2. Avoid Defer in Hot Paths

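// ✗ Poor - defer keeps processHot from being inlined
func processHot() {
    mu.Lock()
    defer mu.Unlock()
    // process
}

// ✓ Better - no defer in the way; whether it is inlined still depends on
// its total cost, and this is only safe if the critical section cannot
// panic or return early
func processHot() {
    mu.Lock()
    // process
    mu.Unlock()
}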

3. Split Large Functions

// ✗ Poor - 500 nodes, won't inline
func processEverything(data []byte) {
    // Validation, parsing, processing, encoding - all in one function
}

// ✓ Good - each part inlineable
func validateData(data []byte) error { /*small*/ }
func parseData(data []byte) (ast, error) { /*small*/ }
func encodeResult(ast) []byte { /*small*/ }

4. Use Interface Methods Carefully

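Calls through an interface use dynamic dispatch, so the compiler usually cannot inline the callee unless it can prove the concrete type (devirtualization):

// ✗ Interface call - r.Read is dispatched at run time and is not inlined
func readVia(r io.Reader, p []byte) (int, error) {
    return r.Read(p)
}

// ✓ Concrete type - the callee is known at compile time and can be considered for inlining
func readFrom(b *bytes.Buffer, p []byte) (int, error) {
    return b.Read(p)
}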

Profile-Guided Optimization (PGO) - Go 1.21+

PGO (Profile-Guided Optimization) arrived as a preview in Go 1.20 and became generally available in Go 1.21. It uses profiles collected at run time to make better inlining decisions.

How PGO Improves Inlining

The compiler reads the profile to find call sites on hot paths. At those call sites it applies a larger inlining budget, so functions that exceed the default budget can still be inlined where the profile shows it matters.

Using PGO

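Collect a CPU profile from a representative workload, for example a benchmark in a single package:

go test -bench=. -cpuprofile=cpu.prof .

Copy the profile into the main package's directory under the name default.pgo:

cp cpu.prof default.pgo

Build as usual:

go build -o myapp

Since Go 1.21, go build uses default.pgo automatically when it is present in the main package's directory. A profile stored elsewhere can be passed with -pgo=path.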

Benchmark: Inlined vs Non-Inlined Functions

package main

import (
    "testing"
)

//go:noinline
func addNoInline(a, b int) int {
    return a + b
}

func addInline(a, b int) int {
    return a + b
}

func BenchmarkNoInline(b *testing.B) {
    result := 0
    for i := 0; i < b.N; i++ {
        result += addNoInline(i, i+1)
    }
    _ = result
}

func BenchmarkInline(b *testing.B) {
    result := 0
    for i := 0; i < b.N; i++ {
        result += addInline(i, i+1)
    }
    _ = result
}

// Complex function requiring several arguments
//go:noinline
func complexCalcNoInline(a, b, c, d, e int) int {
    return (a + b) * (c - d) / (e + 1)
}

func complexCalcInline(a, b, c, d, e int) int {
    return (a + b) * (c - d) / (e + 1)
}

func BenchmarkComplexNoInline(b *testing.B) {
    result := 0
    for i := 0; i < b.N; i++ {
        result += complexCalcNoInline(i, i+1, i+2, i+3, i+4)
    }
    _ = result
}

func BenchmarkComplexInline(b *testing.B) {
    result := 0
    for i := 0; i < b.N; i++ {
        result += complexCalcInline(i, i+1, i+2, i+3, i+4)
    }
    _ = result
}

Expected Results

Illustrative results from one run on a modern system (absolute numbers vary with CPU and Go version):

BenchmarkNoInline-8           300000000   4.12 ns/op
BenchmarkInline-8             1000000000  0.98 ns/op
BenchmarkComplexNoInline-8    100000000   10.8 ns/op
BenchmarkComplexInline-8      500000000   2.15 ns/op

In this run the inlined versions are roughly 4x and 5x faster. Part of the gain comes from optimizations that inlining enables inside the loop, not only from removing the call itself.

Tradeoffs: Binary Size vs Speed

Inlining comes with a cost: increased binary size. Each inlined call site gets a copy of the function body.

Measuring Binary Size Impact

go build -o app
ls -lh app

go build -gcflags="all=-l" -o app-no-inline  # Disable inlining in every package, not just the named ones
ls -lh app-no-inline

The difference is often in the range of 5-15%, but it depends on your codebase, so measure it. This is usually a worthwhile tradeoff for performance-critical applications.

Advanced Inlining Patterns

Inlining in Generic Functions

func min[T interface{ int | float64 }](a, b T) T {
    if a < b {
        return a
    }
    return b
}

// Each instantiation (min[int], min[float64]) gets its own copy
// Both can be inlined independently

Inlining in Methods

type Point struct {
    x, y float64
}

func (p Point) Distance() float64 {
    return math.Sqrt(p.x*p.x + p.y*p.y)
}

// Receiver methods are evaluated for inlining like regular functions

Checking Assembly Output

To verify inlining is happening, dump the compiler's assembly (the go command prints it on standard error) and look at main:

go build -gcflags=-S . 2> main.s
grep -A 20 'main STEXT' main.s

Look for the actual arithmetic operations in main's assembly. If you see CALL instructions to helper functions, they weren't inlined.

Summary

Function inlining is a critical optimization in Go that eliminates call overhead and enables further compiler optimizations. By understanding the 80-node budget and avoiding constructs that prevent inlining, you can write code that compiles to faster machine code.

Key takeaways:

Small helper functions are usually inlined - keep them well within the budget of 80
Inlining enables escape analysis - allowing stack allocation instead of heap
Use -m flag to verify inlining - don't assume, check!
Avoid defer and go statements in functions you want inlined - they prevent inlining
Profile-guided optimization (1.21+) improves decisions - use default.pgo for hints
The tradeoff is worthwhile - 5-15% binary size increase for significant speed

The Go compiler's inliner is deliberately conservative, but the inlining it does perform pays off consistently in performance-sensitive code.
