Function Inlining
Understand how Go's compiler inlines functions to eliminate call overhead and enable further optimizations like escape analysis and bounds check elimination.
What is Function Inlining?
Function inlining is a compiler optimization where a function call is replaced with the actual function body. Instead of jumping to a function's address, executing its code, and returning, the compiler copies the function's code directly into the call site.
// Original code
func add(a, b int) int {
return a + b
}
func main() {
result := add(5, 3) // Function call
}
// After inlining
func main() {
result := 5 + 3 // Function body inlined
}
While this sounds simple, inlining has profound implications for performance and optimization.
Why Inlining Matters
1. Eliminating Call Overhead
Each function call has a cost:
- Saving the return address on the stack
- Adjusting the stack pointer (for non-leaf functions)
- Spilling/restoring registers across the call boundary
- Fetching the first instruction of the called function (instruction cache miss risk)
On modern CPUs, a simple function call costs 1-5 nanoseconds. For functions that do minimal work, the call overhead can exceed the computation time.
// Without inlining
func min(a, b int) int {
if a < b {
return a
}
return b
}
// Calling min 1 billion times:
// Cost: 1 billion calls × 1-5 ns/call = 1-5 seconds (just call overhead!)
2. Enabling Further Optimizations
Inlining creates new optimization opportunities:
Escape Analysis: When a function is inlined, the compiler can perform more accurate escape analysis on variables allocated within the inlined function.
// Without inlining, compiler must assume slice escapes
func makeBuffer(n int) []byte {
return make([]byte, n) // Might escape to heap
}
// With inlining, compiler sees actual usage
result := makeBuffer(100)
// If the buffer is never returned or stored, the compiler can stack-allocate it
Constant Folding: Inlining reveals constant values that the compiler can fold.
func getBuffer() []byte {
return make([]byte, 1024)
}
// After inlining with constant folding:
// buf := make([]byte, 1024) → compiler allocates exactly 1024 bytes
Bounds Check Elimination: Inlining can create the patterns needed for BCE.
func safeAccess(s []int) int {
if len(s) >= 10 {
return s[9] // After inlining, surrounding context enables BCE
}
return 0
}
Go's Inlining Budget: The Cost Model
The Go compiler uses a budget system to decide whether inlining a function is worthwhile. The budget is based on the function's AST (Abstract Syntax Tree) node count.
How the Budget Works
The compiler counts the number of AST nodes in a function. Each node has a cost:
- Simple operations (assignments, arithmetic): 1 node each
- Control flow (if, for, switch): higher cost
- Function calls: higher cost
The current default threshold is 80 AST nodes (as of Go 1.21). Functions smaller than this budget can be inlined.
Let's examine some functions:
// Very small - easily inlined (cost ≈ 5 nodes)
func isPositive(x int) bool {
return x > 0
}
// Medium - likely inlined (cost ≈ 30 nodes)
func parseHeader(data []byte) (magic uint32, version uint8, err error) {
if len(data) < 5 {
return 0, 0, errors.New("too short")
}
magic = binary.LittleEndian.Uint32(data[0:4])
version = data[4]
return magic, version, nil
}
// Large - might not be inlined (cost ≈ 200+ nodes)
func complexParser(input string) (ast.Node, error) {
// Complex parsing logic with multiple levels of if/else
// ...extensive code...
}
Checking Inlining Decisions
To see what the compiler is inlining, use:
go build -gcflags="-m" ./...
Output example:
./main.go:5:6: can inline add
./main.go:10:9: inlining call to add
./main.go:20:10: add does not escape
For more detailed information, including the functions that could not be inlined:
go build -gcflags="-m=2" ./...
This shows the reasons why inlining couldn't happen (the exact wording varies between Go versions):
./main.go:15:6: cannot inline complexFn: function too complex: cost 1250 exceeds budget 80
./main.go:20:6: cannot inline withDefer: unhandled op DEFER
./main.go:25:6: cannot inline withRecover: call to recover
What Prevents Inlining
Several constructs prevent a function from being inlined, either because they add too many nodes or because inlining is unsafe.
1. Function Size
The primary limitation is function size. If a function exceeds the budget, it won't be inlined:
func processLargeData(data []byte) (result []int) {
// 150+ lines of complex logic...
// This function probably exceeds 80 nodes
// It won't be inlined
}
2. Defer Statements
Functions containing defer are not inlined:
func withDefer(x int) int {
defer func() { /* cleanup */ }()
return x + 1
}
// Not inlined: the inliner does not handle defer
Go 1.14 made defer itself much cheaper (open-coded defers), but as of Go 1.21 a function that contains a defer statement still cannot be inlined into its callers.
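Because a defer keeps its function out of line, one common workaround is to split the function into a small fast path without defer and a slow path that holds the defer. The sketch below illustrates the idea under stated assumptions: lazyValue, Get, and getSlow are hypothetical names invented for this example, and whether Get is actually inlined should be confirmed with -gcflags="-m".

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyValue computes a value once and caches it.
// The type and its methods are hypothetical, for illustration only.
type lazyValue struct {
	done atomic.Bool
	mu   sync.Mutex
	val  int
}

// Get is the fast path: one atomic load and no defer, so it stays
// small and is a plausible inlining candidate.
func (l *lazyValue) Get(compute func() int) int {
	if l.done.Load() {
		return l.val
	}
	return l.getSlow(compute)
}

// getSlow holds the lock and the defer. It will not be inlined,
// but it only runs until the value has been computed.
func (l *lazyValue) getSlow(compute func() int) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.done.Load() {
		l.val = compute()
		l.done.Store(true)
	}
	return l.val
}

func main() {
	calls := 0
	l := &lazyValue{}
	compute := func() int { calls++; return 42 }
	fmt.Println(l.Get(compute), l.Get(compute), calls)
}
```

The same split appears in parts of the standard library, where a short exported method checks a flag and delegates the rare case to a separate function.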
3. Calls to Built-in recover()
Functions that call recover() cannot be inlined:
func safeCall(f func()) {
defer func() {
if r := recover(); r != nil {
// Handle panic
}
}()
f()
}
// Not inlined
4. Go Statements
Launching goroutines prevents inlining:
func launchWorker(f func()) {
go f() // Makes this function non-inlineable
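}
5. Closures Over Variables
Before Go 1.17, a function that contained a closure could not be inlined. Since Go 1.17 such functions are eligible, and the closure body itself can also be inlined where it is called directly:
func makeAdder(x int) func(int) int {
return func(y int) int {
return x + y // Captures x
}
}
// Go 1.17+: makeAdder can be inlined; older compilers refuse because of the closure
6. Type Switches (sometimes)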
Complex type switches can exceed the inlining budget:
func handleInterface(v interface{}) string {
switch v.(type) {
case int:
return "int"
case string:
return "string"
case []byte:
return "bytes"
// ... many more cases ...
}
return "unknown"
}
// Might not be inlined due to size
Mid-Stack Inlining (Go 1.12+)
Traditionally, Go only inlined leaf functions (functions that don't call other functions). Mid-stack inlining (introduced in Go 1.12) allows inlining of functions that call other functions.
This enables chains of inlining:
func A() int {
return B()
}
func B() int {
return C()
}
func C() int {
return 42
}
// Caller invokes A()
// C is inlined into B, B into A, and A into the caller // Mid-stack inlining
// Final result: the caller uses 42 directly, with no calls left
Without mid-stack inlining, only the leaf function C could be inlined (into B); the calls to A and B would remain.
Compiler Directives for Inlining Control
//go:noinline
Prevent inlining of a specific function:
//go:noinline
func benchmark() int {
// Code that should not be inlined
sum := 0
for i := 0; i < 1000000; i++ {
sum += i
}
return sum
}
This is useful when benchmarking, where inlining would distort measurements.
//go:inline
This is not a standard directive in Go (unlike some other languages). The compiler makes its own inlining decisions based on the cost model. You cannot force inlining, only prevent it with //go:noinline.
//go:nosplit
Prevent stack overflow checking (related but different from inlining):
//go:nosplit
func criticalPath() {
// This function won't emit stack checks
// Useful for signal handlers, runtime code
}
Impact of Inlining on Escape Analysis
Inlining dramatically improves escape analysis by giving the compiler more context.
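One way to observe this effect is to count heap allocations with testing.AllocsPerRun, which can be called from an ordinary program. The sketch below compares a constructor marked //go:noinline with an identical one the compiler is free to inline. The point type and function names are hypothetical, and the measured counts depend on the compiler version and build flags.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPointNoInline returns a pointer across a real call boundary,
// so the point has to live on the heap.
//
//go:noinline
func newPointNoInline(x, y int) *point { return &point{x, y} }

// newPointInline is identical but may be inlined, which lets escape
// analysis see that the caller never lets the pointer escape.
func newPointInline(x, y int) *point { return &point{x, y} }

// sink keeps the results alive so the calls are not optimized away.
var sink int

// allocsNoInline reports average heap allocations per call.
func allocsNoInline() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := newPointNoInline(1, 2)
		sink += p.x
	})
}

// allocsInline does the same for the inlineable constructor.
func allocsInline() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := newPointInline(1, 2)
		sink += p.x
	})
}

func main() {
	fmt.Println("noinline allocs/op:", allocsNoInline())
	fmt.Println("inline allocs/op:  ", allocsInline())
}
```

With a default build, the non-inlined version should report one allocation per call, and the inlined version is expected to report fewer, typically zero. Building with -gcflags="-l" disables inlining and should make the two numbers match.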
Example: Stack Allocation Thanks to Inlining
type Buffer struct {
data [1024]byte
pos int
}
// Without inlining, Buffer would escape to heap
func NewBuffer() *Buffer {
return &Buffer{}
}
func main() {
b := NewBuffer()
b.data[0] = 42
// When NewBuffer is inlined:
// b := &Buffer{} // Compiler sees this is used locally only
// Buffer can be stack-allocated!
}
Example: Preventing Unnecessary Allocation
type Encoder struct {
buf []byte
}
// With inlining, the compiler sees buf's lifetime
func newEncoder(capacity int) *Encoder {
return &Encoder{
buf: make([]byte, capacity),
}
}
// In your function:
enc := newEncoder(256)
enc.buf[0] = 1
// After inlining, the compiler might stack-allocate the byte slice!
Patterns to Keep Functions Inlineable
To write code that inlines effectively:
1. Keep Helper Functions Small
// ✓ Good - easily inlined (5 nodes)
func max(a, b int) int {
if a > b {
return a
}
return b
}
// ✗ Poor - likely too large
func complexMax(a, b int) int {
if a <= 0 || b <= 0 {
// 50 lines of validation
}
// complex logic
}
2. Avoid Defer in Hot Paths
// ✗ Poor - defer prevents inlining
func processHotDeferred() {
mu.Lock()
defer mu.Unlock()
// process
}
// ✓ Good - inlineable, as long as "process" cannot panic or return early
func processHot() {
mu.Lock()
// process
mu.Unlock()
}
3. Split Large Functions
// ✗ Poor - 500 nodes, won't inline
func processEverything(data []byte) {
// Validation, parsing, processing, encoding - all in one function
}
// ✓ Good - each part inlineable
func validateData(data []byte) error { /*small*/ }
func parseData(data []byte) (ast, error) { /*small*/ }
func encodeResult(ast) []byte { /*small*/ }
4. Use Interface Methods Carefully
Calls through an interface use dynamic dispatch, so the compiler usually cannot inline the method being called unless it can prove the concrete type:
// ✗ Interface dispatch - r.Read cannot be inlined
func readFrom(r io.Reader, p []byte) (int, error) {
return r.Read(p)
}
// ✓ Concrete type - the call is direct and eligible for inlining
func readBuffer(b *bytes.Buffer, p []byte) (int, error) {
return b.Read(p)
}
Profile-Guided Optimization (PGO) - Go 1.21+
Go 1.20 introduced PGO (Profile-Guided Optimization) as a preview, and Go 1.21 made it generally available. PGO uses runtime profiling data to make better inlining decisions.
How PGO Improves Inlining
The compiler builds with profiling information to understand which functions are called frequently in hot paths. It can then decide to inline functions that slightly exceed the budget if profiling shows they're called in critical code.
Using PGO
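- Collect a CPU profile, ideally from a representative workload (-cpuprofile works on one package at a time):
go test -cpuprofile=cpu.prof -bench=. .
- Place the profile in the main package's directory under the name default.pgo:
cp cpu.prof default.pgo
- Build as usual:
go build -o myapp
Since Go 1.21, the compiler automatically uses default.pgo if it is present in the main package's directory.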
Benchmark: Inlined vs Non-Inlined Functions
package main
import (
"testing"
)
//go:noinline
func addNoInline(a, b int) int {
return a + b
}
func addInline(a, b int) int {
return a + b
}
func BenchmarkNoInline(b *testing.B) {
result := 0
for i := 0; i < b.N; i++ {
result += addNoInline(i, i+1)
}
_ = result
}
func BenchmarkInline(b *testing.B) {
result := 0
for i := 0; i < b.N; i++ {
result += addInline(i, i+1)
}
_ = result
}
// Complex function requiring several arguments
//go:noinline
func complexCalcNoInline(a, b, c, d, e int) int {
return (a + b) * (c - d) / (e + 1)
}
func complexCalcInline(a, b, c, d, e int) int {
return (a + b) * (c - d) / (e + 1)
}
func BenchmarkComplexNoInline(b *testing.B) {
result := 0
for i := 0; i < b.N; i++ {
result += complexCalcNoInline(i, i+1, i+2, i+3, i+4)
}
_ = result
}
func BenchmarkComplexInline(b *testing.B) {
result := 0
for i := 0; i < b.N; i++ {
result += complexCalcInline(i, i+1, i+2, i+3, i+4)
}
_ = result
}
Expected Results
On a modern system:
BenchmarkNoInline-8 300000000 4.12 ns/op
BenchmarkInline-8 1000000000 0.98 ns/op
BenchmarkComplexNoInline-8 100000000 10.8 ns/op
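BenchmarkComplexInline-8 500000000 2.15 ns/op
In this run, inlining the simple function gives roughly a 4x speedup, and the more complex function gains about 5x. Actual numbers depend on the CPU and Go version, but speedups of 3-5x are typical for functions this small.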
Tradeoffs: Binary Size vs Speed
Inlining comes with a cost: increased binary size. Each inlined call site gets a copy of the function body.
Measuring Binary Size Impact
go build -o app
ls -lh app
go build -gcflags="-l" -o app-no-inline # Disable inlining
ls -lh app-no-inline
The difference can be 5-15% depending on your codebase. This is usually a worthwhile tradeoff for performance-critical applications.
Advanced Inlining Patterns
Inlining in Generic Functions
func min[T interface{ int | float64 }](a, b T) T {
if a < b {
return a
}
return b
}
// Each instantiation (min[int], min[float64]) gets its own copy
// Both can be inlined independently
Inlining in Methods
type Point struct {
x, y float64
}
func (p Point) Distance() float64 {
return math.Sqrt(p.x*p.x + p.y*p.y)
}
// Receiver methods are evaluated for inlining like regular functions
Checking Assembly Output
To verify inlining is happening (the compiler writes the assembly listing to standard error):
go build -gcflags="-S" . 2> main.s
grep -A 20 "main.main STEXT" main.s
Look for the actual arithmetic operations in main's assembly. If you see CALL instructions to helper functions, they weren't inlined.
Summary
Function inlining is a critical optimization in Go that eliminates call overhead and enables further compiler optimizations. By understanding the 80-node budget and avoiding constructs that prevent inlining, you can write code that compiles to faster machine code.
Key takeaways:
- Small helper functions will be inlined - keep them under 80 nodes
- Inlining enables escape analysis - allowing stack allocation instead of heap
- Use the -m flag to verify inlining - don't assume, check!
- Avoid defer and goroutines in inlineable functions - they prevent inlining
- Profile-guided optimization (1.21+) improves decisions - use default.pgo for hints
- The tradeoff is worthwhile - 5-15% binary size increase for significant speed
The Go compiler is aggressive about inlining because it pays off consistently in performance-sensitive code.