Compiler Pipeline and SSA Optimization
How the Go compiler transforms source code through lexing, parsing, type-checking, SSA generation, optimization passes, and machine code emission — and how understanding this pipeline helps write faster code.
Introduction
The Go compiler is a sophisticated multi-phase pipeline that transforms human-readable source code into efficient machine instructions. Understanding this pipeline—particularly the Static Single Assignment (SSA) form and its optimization passes—is essential for writing performant Go code. This article explores each phase in depth, with practical examples and insights that help you write code the compiler can optimize effectively.
The Go compiler is written in Go itself (for versions 1.5+) and lives in the cmd/compile directory of the Go repository. Unlike C compilers that often operate on a single intermediate representation, the Go compiler transforms code through multiple distinct representations, each optimized for specific tasks.
Compiler Architecture Overview
The Go compiler pipeline follows a classic compiler design with six major phases:
Source Code
↓
[Phase 1: Lexing & Parsing] → AST
↓
[Phase 2: Type Checking] → Typed AST
↓
[Phase 3: AST → IR] → Intermediate Representation
↓
[Phase 4: IR → SSA] → Static Single Assignment Form
↓
[Phase 5: SSA Optimizations] → Optimized SSA
↓
[Phase 6: SSA → Machine Code] → Binary
↓
Final Executable

Each phase is independent, allowing incremental improvements without redesigning the entire compiler. The modular design has proven invaluable for Go's evolution: optimizations added in one phase don't require rewriting others.
Phase 1: Lexing and Parsing → AST
Lexical Analysis (Lexing)
The lexer (syntax/scanner.go) converts raw source characters into tokens. This is a straightforward phase: read bytes, recognize patterns (keywords, identifiers, operators, literals), and emit token streams.
// Example: lexing this Go code
x := 42
if x > 40 {
fmt.Println(x)
}
// Produces tokens like:
// IDENT(x) ASSIGN INT(42) IF IDENT(x) GT INT(40) LBRACE ...

The lexer handles Go's unique lexical rules:
- Automatic semicolon insertion based on newlines
- Comments (line and block)
- Raw string literals (backtick strings)
- Rune literals with escape sequences
Performance note: The lexer is rarely a bottleneck. Go's simple syntax means tokenization is linear and very fast.
Parsing
The parser (syntax/parser.go) consumes the token stream and builds an Abstract Syntax Tree (AST). It uses recursive descent parsing—predictable, easy to understand, and sufficient for Go's context-free grammar.
// Input tokens: IDENT(x) ASSIGN INT(42)
// Parser produces AST node:
// AssignStmt {
// Lhs: [Ident("x")]
// Rhs: [BasicLit(42)]
// }

The AST preserves all syntactic structure but loses some information (like parentheses in expressions; the tree structure handles precedence). This is intentional; the AST is meant to be easy to walk and analyze.
Key AST node types:
- DeclStmt for declarations
- FuncDecl for function definitions
- BlockStmt for compound statements
- IfStmt, ForStmt, SwitchStmt for control flow
- CallExpr for function calls
- BinaryExpr, UnaryExpr for operations
Phase 2: Type Checking and Name Resolution
The types2 Package
After parsing, the compiler performs semantic analysis in types2/ (the compiler's internal port of the go/types checker). This phase:
- Resolves names: Maps identifiers to their declarations
- Checks types: Ensures operations are valid (can't add string + int)
- Infers types: Determines untyped constant types
- Reports errors: Type mismatches, undefined variables, etc.
// Input AST:
// AssignStmt { Lhs: [Ident("x")], Rhs: [BasicLit(42)] }
//
// Type checker:
// - Resolves x to a new variable declaration
// - Infers that 42 is an untyped integer constant
// - Infers x's type as `int` (from assignment)
// - Produces typed AST with type information attached

The output is a fully typed AST: every expression node knows its type. This information is crucial for the next phases.
Important Type Information
The type checker also computes:
- Method sets (which methods a type has)
- Interface satisfaction (does type T satisfy interface I?)
- Pointer vs. value receivers for methods
- Exported vs. unexported identifiers
This phase is responsible for the strict type safety Go is known for.
Phase 3: AST → IR (Intermediate Representation)
The noder and irgen Passes
The noder package (cmd/compile/internal/noder) converts the type-checked syntax tree into the compiler's intermediate representation; its irgen step produces the ir package's node types.
Typed AST (from types2)
↓
IR nodes (ir/node.go, ir/expr.go, ir/stmt.go)
↓
// Function bodies ready for SSA conversion

The IR is more compiler-centric than the AST. It includes:
- Function signatures with calling conventions
- Variable scopes and lifetimes
- Memory effects of operations
- Explicit conversions and type assertions
Example transformation:
// AST: if x > 40 { ... }
// Type-checked AST has type info
// IR (simplified):
// IfStmt {
// cond: CompareOp(GreaterThan, x, 40),
// body: BlockStmt { ... }
// }

This IR is higher-level than SSA but lower-level than the original syntax. It's used for:
- Generating escape analysis information
- Computing inline costs
- Preparing function bodies for inlining
- Creating the foundation for SSA generation
Phase 4: IR → SSA (Static Single Assignment Form)
What is SSA?
Static Single Assignment is a form of intermediate representation where every variable is assigned exactly once. This constraint is powerful for optimization.
In traditional code:
x = 5
x = x + 3 // x assigned twice
y = x // which x? the first or second?

In SSA:
x₁ = 5
x₂ = x₁ + 3 // new variable x₂
y = x₂ // unambiguous

Notice the subscripts: each assignment creates a new "version" of the variable. SSA makes data flow explicit and enables powerful optimizations.
SSA Building Blocks: Values, Blocks, Operations
SSA in Go (cmd/compile/internal/ssa) consists of:
- Values: individual computations
  - Constants (immediate values)
  - Variables (memory locations)
  - Operations (arithmetic, comparisons, calls)
- Blocks: sequential instruction lists
  - Control flow between blocks via jumps/branches
  - No jumps within a block (jumps are block-level)
- Operations (Ops): primitive computations
  - OpAdd (addition)
  - OpLoad (memory load)
  - OpStore (memory store)
  - OpCall (function call)
  - 200+ different operations
Example SSA structure:
// Input function:
func abs(x int) int {
if x < 0 {
return -x
}
return x
}
// Becomes (simplified SSA):
// Block0: entry
// v0 = Arg(x) // parameter x
// v1 = LessThan(v0, 0)
// Branch v1 -> Block1, Block2
//
// Block1: negative case
// v2 = Neg(v0)
// Ret v2
//
// Block2: positive case
// Ret v0

Each block ends with a control-flow operation (Branch, Ret, Jump, etc.). A value's definition always dominates its uses, and the "Static" in SSA refers to each variable being assigned at exactly one point in the program text.
Phi Functions: Merging Values from Multiple Paths
When control flow paths merge, we need to decide which version of a variable to use. SSA uses Phi (φ) functions:
// Input:
func max(x, y int) int {
var result int
if x > y {
result = x
} else {
result = y
}
return result
}
// SSA with Phi:
// Block0: entry
// v0 = Arg(x)
// v1 = Arg(y)
// v2 = GreaterThan(v0, v1)
// Branch v2 -> Block1, Block2
//
// Block1: true branch
// Ret v0
//
// Block2: false branch
// Ret v1
//
// (Note: In this case, we return directly, so no Phi needed.
// But if we assigned to a variable, it would use Phi.)

Phi functions handle cases like:
// Input:
if cond {
result = a
} else {
result = b
}
use(result) // which a or b?
// SSA:
// Block1: then
// Goto Block3
// Block2: else
// Goto Block3
// Block3: merge
// v3 = Phi(a, b) // join point

The Phi function selects between its inputs based on which predecessor block was executed.
Phase 5: SSA Optimization Passes
This is where the compiler makes code fast. The SSA form enables a suite of powerful optimizations. They run in sequence, with earlier optimizations enabling later ones.
Optimization Pipeline
SSA (from Phase 4)
↓
[pass: BlockCombine] → Merge empty blocks
↓
[pass: Dead Code Elimination] → Remove unused values
↓
[pass: Prove] → Bounds check elimination, nil check elimination
↓
[pass: Copyelim] → Remove copies
↓
[pass: Simple Phi elimination] → Remove trivial phis
↓
[pass: Common Subexpression Elimination] → Share identical computations
↓
[pass: Inlining] → Expand function calls
↓
[pass: Escape Analysis] → Determine which values escape
↓
[pass: Copy Propagation] → Remove intermediate variables
↓
[pass: Dead Store Elimination] → Remove unused writes
↓
[pass: Nilcheck Insertion] → Insert safety checks (if needed)
↓
[pass: Fuse] → Combine operations
↓
[pass: Dupe Block] → Duplicate blocks for optimization
↓
[pass: Deadcode] → Final sweep
↓
[pass: Regalloc] → Assign variables to registers
↓
Optimized SSA

Let's examine the most impactful optimizations. (This listing is illustrative rather than the compiler's literal pass order, which lives in cmd/compile/internal/ssa/compile.go; notably, inlining and escape analysis actually run on the IR before SSA conversion.)
Dead Code Elimination (DCE)
Removes values that are computed but never used.
// Input:
func compute(x int) int {
y := x * 2 // computed
z := y + 5 // computed but z is never used
return x
}
// Before DCE:
// v0 = Arg(x)
// v1 = Mul(v0, 2)
// v2 = Add(v1, 5) // z unused
// Ret v0
// After DCE:
// v0 = Arg(x)
// Ret v0

The entire multiplication and addition are eliminated because z is never used. DCE runs multiple times: early DCE exposes opportunities for later passes, which may enable even more DCE.
Common Subexpression Elimination (CSE)
Identifies identical computations and reuses results.
// Input:
func redundant(x, y int) int {
a := x + y
// ... code using a ...
b := x + y // identical to a
return a + b
}
// Before CSE:
// v0 = Arg(x)
// v1 = Arg(y)
// v2 = Add(v0, v1) // first addition
// ... use v2 ...
// v3 = Add(v0, v1) // identical addition
// v4 = Add(v2, v3)
// Ret v4
// After CSE:
// v0 = Arg(x)
// v1 = Arg(y)
// v2 = Add(v0, v1) // computed once
// ... use v2 ...
// v4 = Add(v2, v2) // reuse v2
// Ret v4

CSE reuses a result wherever an identical computation is already available, provided the earlier occurrence dominates the later one.
Constant Folding and Propagation
Evaluates constant expressions at compile time.
// Input:
func constants() int {
a := 10
b := 20
c := a + b
return c
}
// Before constant folding:
// v0 = Const(10)
// v1 = Const(20)
// v2 = Add(v0, v1)
// Ret v2
// After constant folding:
// Ret Const(30)

Constants flow through operations. If we know the inputs are constants, we compute the output at compile time.
Bounds Check Elimination (Prove Pass)
The prove pass uses mathematical reasoning to eliminate redundant bounds checks.
// Input:
func safe(arr []int, i int) int {
if i < len(arr) {
return arr[i] // bounds check here
}
return 0
}
// Before prove:
// v0 = Arg(arr)
// v1 = Arg(i)
// v2 = IfLessThan(v1, Len(v0))
// Branch v2 -> Block1, Block2
// Block1:
// v3 = Bounds(v1, Len(v0)) // safety check
// v4 = Load(v0, v3)
// Ret v4
// Block2:
// Ret Const(0)
// After prove:
// Block1:
// v4 = Load(v0, v1) // bounds check eliminated!
// Ret v4

Since we've checked i < len(arr) before loading, the bounds check is redundant. The prove pass tracks constraints through the CFG and determines which checks can be safely eliminated. This is one of the most impactful optimizations for hot loops accessing slices and arrays.
For details on bounds check elimination, see our dedicated BCE article.
Nil Check Elimination
Similar to bounds check elimination, but for pointer operations.
// Input:
func deref(p *int) int {
if p != nil {
return *p // nil check here
}
return 0
}
// Before nilcheck elimination:
// v0 = Arg(p)
// v1 = IfNotNil(v0)
// Branch v1 -> Block1, Block2
// Block1:
// v2 = Nilcheck(v0) // safety check
// v3 = Load(v2)
// Ret v3
// After nilcheck elimination:
// Block1:
// v3 = Load(v0) // check eliminated
// Ret v3

The compiler proves the pointer is non-nil and removes the check.
Copy Propagation
Eliminates intermediate variables that just copy values.
// Input:
func copies(x int) int {
y := x
z := y
return z
}
// Before copy propagation:
// v0 = Arg(x)
// v1 = Copy(v0) // y = x
// v2 = Copy(v1) // z = y
// Ret v2
// After copy propagation:
// v0 = Arg(x)
// Ret v0

By tracking that v2 only copies from v1, which only copies from v0, the compiler can eliminate the intermediate steps.
Strength Reduction
Replaces expensive operations with cheaper ones.
// Input:
func multiply(n int) int {
return n * 8
}
// Before strength reduction:
// v0 = Arg(n)
// v1 = Mul(v0, 8)
// Ret v1
// After strength reduction:
// v0 = Arg(n)
// v1 = Lsh(v0, 3) // multiply by 8 = left shift by 3
// Ret v1

Bit shifts are cheaper than general multiplication. The optimizer recognizes powers of two and uses shifts instead.
Other strength reductions:
- x / 2 → x >> 1 (exact for unsigned values; for signed values the shift floors while division truncates toward zero, so the compiler adds a correction for negative operands)
- x % 2 → x & 1 (likewise, directly valid for unsigned or known non-negative values)
- x + x → x << 1
Function Inlining
Expands function calls inline, eliminating call overhead.
// Input:
func double(x int) int { return x * 2 }
func caller(a int) int { return double(a) + double(a+1) }
// Before inlining:
// caller:
// v0 = Arg(a)
// v1 = Call(double, v0)
// v2 = Add(v0, 1)
// v3 = Call(double, v2)
// v4 = Add(v1, v3)
// Ret v4
// After inlining:
// caller:
// v0 = Arg(a)
// v1 = Mul(v0, 2) // inlined double(a)
// v2 = Add(v0, 1)
// v3 = Mul(v2, 2) // inlined double(a+1)
// v4 = Add(v1, v3)
// Ret v4

Inlining is crucial for performance. It enables other optimizations by exposing more of the computation to the optimizer. See our dedicated inlining article for details on controlling and understanding inlining.
Escape Analysis Integration
Determines whether allocated values escape the function. If a value doesn't escape, the compiler can allocate it on the stack instead of the heap, avoiding garbage collection overhead.
// Input:
func allocate() *int {
x := 42
return &x // x escapes (returned)
}
// Allocation: must be heap-allocated
// &x requires pointer to x, and x outlives the function
func local() int {
x := 42
y := x + 1
return y // x doesn't escape
}
// Allocation: can be stack-allocated (or even eliminated)
// x never leaves the function

The escape analysis pass computes escape information, which subsequent passes use to:
- Avoid allocating on the heap
- Avoid pointer indirection
- Enable more aggressive inlining
See our dedicated escape analysis article for deep analysis.
Write Barrier Elimination
Go's garbage collector uses write barriers to track heap pointers. These barriers have overhead. The writebarrierelim pass removes redundant ones.
// Input:
type Node struct {
Next *Node
}
func setup(n *Node, child *Node) {
n.Next = child // pointer assignment
}
// Before writebarrierelim:
// v0 = Arg(n)
// v1 = Arg(child)
// v2 = Write(v0, field_offset, v1) // includes barrier
// After writebarrierelim (if the barrier is unnecessary):
// v0 = Arg(n)
// v1 = Arg(child)
// v2 = Store(v0, field_offset, v1) // no barrier

The pass analyzes whether a pointer write could affect the GC invariant. If the pointer is newly allocated or other conditions hold, the barrier is unnecessary.
Register Allocation (Regalloc)
Assigns values to CPU registers, minimizing memory access.
// Before regalloc:
// v0 = Arg(x) // where is x?
// v1 = Add(v0, 5) // where is the result?
// Ret v1
// After regalloc:
// RAX = Arg(x) // x in register RAX
// RAX = Add(RAX, 5) // add to RAX
// Ret RAX

Register allocation is a complex optimization problem (NP-hard in general). Go uses a linear-scan allocator that runs in linear time, making good (though not perfect) decisions quickly.
The register allocator runs last because it must consider all previous optimizations' results.
Phase 6: SSA → Machine Code
The final phase generates machine code from optimized SSA. This happens in the ssa package together with the architecture-specific backends (cmd/compile/internal/amd64, cmd/compile/internal/arm64, and so on).
Optimized SSA
↓
[pass: Compile] → Instruction selection
↓
[arch-specific backend] → amd64, arm64, etc.
↓
[pass: schedule] → Instruction ordering
↓
[pass: regalloc] → Register allocation (on the lowered, machine-specific ops)
↓
Machine code (assembly)
↓
Object file format (ELF, Mach-O, PE)

Key steps:
1. Instruction Selection: map SSA operations to machine instructions
  - OpAdd on amd64 → addq instruction
  - OpLoad → movq instruction
  - Architecture-specific patterns recognized
2. Calling Convention: prepare function calls
  - Arguments placed in correct registers/stack locations
  - Return values positioned correctly
  - Caller/callee-saved registers respected
3. Code Layout: arrange code in memory
  - Hot paths laid out contiguously
  - Cold paths (error handling) moved out of line
  - Jumps kept within cache lines when possible
4. Encoding: convert instructions to binary
  - Variable-length x86 instructions are carefully packed
  - Relocations emitted for external references
Visualizing SSA: GOSSAFUNC
To understand how your code is optimized, use GOSSAFUNC:
GOSSAFUNC=myFunction go build

This generates ssa.html showing every SSA optimization pass.
// example.go
func abs(x int) int {
if x < 0 {
return -x
}
return x
}
// Build with:
// GOSSAFUNC=abs go build

The HTML output shows:
- Initial SSA (right after phase 4)
- Each optimization pass in sequence
- Final machine assembly
- Differences highlighted between passes
Reading the dump:
Block0 (entry)
v0 = Arg(x) [int] # v0 is the parameter x
v1 = LessThan(v0, [0]) # compare x < 0
IfLessThan v1 -> b1 b2 # branch on comparison
Block1 (negative case)
v2 = Neg(v0) [int] # v2 = -x
Ret v2 # return -x
Block2 (positive case)
Ret v0 # return x

Each line shows:
- v#: value identifier
- Operation type
- Arguments
- Type (in brackets)
- Comments
Watching optimizations remove values is enlightening.
Profile-Guided Optimization (PGO)
Go 1.21+ integrates PGO into the compiler pipeline. If a default.pgo file exists in the main package's directory, the compiler uses profiling information to guide optimization decisions.
Source Code
↓
[Phases 1-4: same]
↓
[Read PGO information if available]
↓
[Phase 5: SSA Optimizations using PGO hints]
- Inlining: more aggressive for hot functions
- Bounds check elimination: prioritize hot paths
- Register allocation: optimize hot loops
↓
[Phase 6: code generation]

PGO feeds into:
- Inlining heuristics: Hot functions are inlined more aggressively
- Bounds elimination: Prioritize hot paths
- Code layout: Keep hot code together
- Devirtualization: More likely to inline when call is observed
See the PGO documentation for profiling and optimization workflows.
Compiler Flags and Their Effects
Several flags affect the optimization pipeline:
-N (disable optimizations)
Disables most optimizations (except register allocation, which is required). Useful for debugging.
go build -gcflags="-N"

This skips:
- Inlining
- DCE
- CSE
- Escape analysis optimizations
- Most other passes
Output code is larger and slower but easier to debug (variable names preserved).
-l (disable inlining)
Disables function inlining specifically.
go build -gcflags="-l"

Useful for testing whether inlining affects performance:
# Benchmark with inlining
go test -bench=. -benchmem
# Benchmark without inlining
go test -bench=. -benchmem -gcflags="-l"
# Compare results

-O (optimization level)
Go doesn't expose explicit optimization levels like C's -O2. All optimizations run by default. Flags control specific passes:
# Most aggressive (default)
go build
# No optimizations except register allocation
go build -gcflags="-N"
# No inlining
go build -gcflags="-l"
# No escape analysis optimizations (harder to control)
# Not exposed via flags, requires code changes

Environment Variables
GOSSAFUNC: Generate SSA HTML dump
GOSSAFUNC=funcname go build

GOGC: GC trigger percentage (affects runtime, not compiler)
GOGC=75 ./program # trigger GC when the heap grows 75% over the live set

GODEBUG: Various debug flags
GODEBUG=gctrace=1 ./program # print a summary line for each GC cycle

Writing Compiler-Friendly Code
Understanding the pipeline helps you write code the compiler can optimize:
1. Bounds Check Elimination
Help the compiler prove bounds checks are redundant:
// A range loop already avoids per-element bounds checks:
func sumSlice(arr []int) int {
total := 0
for _, v := range arr {
total += v
}
return total
}
// With explicit indexing, an up-front check lets the prove pass
// eliminate the per-iteration bounds checks:
func sumSliceRange(arr []int, start, end int) int {
if start < 0 || end > len(arr) || start > end {
return 0 // or panic
}
total := 0
for i := start; i < end; i++ {
total += arr[i] // bounds check eliminated!
}
return total
}
// Even better: preallocate with capacity
func readInto(buf []int, limit int) error {
if limit > len(buf) {
return errors.New("limit exceeds buffer")
}
// Now the compiler knows indices 0..limit-1 are safe
for i := 0; i < limit; i++ {
buf[i] = rand.Intn(100)
}
return nil
}

2. Keep Functions Small for Inlining
Small functions are more likely to be inlined, exposing optimization opportunities:
// BAD: Complex function unlikely to be inlined
func process(x int) int {
if x < 0 {
return 0
}
result := x * 2
// ... 20 more lines ...
return result
}
// GOOD: Simple helpers are inlined
func double(x int) int { return x * 2 }
func isPositive(x int) bool { return x > 0 }
func process(x int) int {
if !isPositive(x) {
return 0
}
return double(x)
}

The inlining budget is roughly 80 IR node cost units. Keep critical hot-path functions below this.
3. Avoid Allocations in Hot Loops
Allocations that escape require GC overhead. Stack allocations are free:
// BAD: allocation in loop
func sumMany(slices [][]int) int {
total := 0
for _, slice := range slices {
temp := make([]int, len(slice)) // heap allocation in loop
copy(temp, slice)
for _, v := range temp {
total += v
}
}
return total
}
// GOOD: allocation outside loop
func sumMany(slices [][]int) int {
total := 0
temp := make([]int, 0, 1000) // allocate once
for _, slice := range slices {
temp = temp[:len(slice)]
copy(temp, slice)
for _, v := range temp {
total += v
}
}
return total
}

4. Use Constants for Known Values
Constant propagation eliminates computation:
// GOOD: evaluated once at compile time
const (
MB = 1024 * 1024
pageSize = 4096
)
// At runtime, MB and pageSize are constants: zero cost
// AVOID: a variable is instead computed at package initialization
var bufferSize = 1024 * 1024

5. Help the Prove Pass Eliminate Checks
Give the prove pass explicit facts before a loop. The compiler already handles the simple case on its own: in a loop like for i := 0; i < len(arr); i++, the loop condition itself proves arr[i] is in bounds. Where it needs help is a loop over a subrange:
// TRICKY: compiler struggles here
func sumRange(arr []int, start, end int) int {
total := 0
for i := start; i < end; i++ {
total += arr[i] // is i < len(arr)?
}
return total
}
// GOOD: explicit proof before loop
func sumRange(arr []int, start, end int) int {
if start < 0 || end > len(arr) || start > end {
return 0
}
total := 0
for i := start; i < end; i++ {
total += arr[i] // bounds check eliminated!
}
return total
}

Practical Examples: Observing Optimizations
Let's build several examples and examine their SSA to understand optimizations in action.
Example 1: Dead Code Elimination
package main
func unused(x int) int {
y := x + 5
z := x * 2 // computed but unused
return y
}

Build and examine:
GOSSAFUNC=unused go build

Look at the SSA dump. You'll see z computed initially, then eliminated by DCE. The final assembly won't include the multiplication.
Example 2: Constant Folding
package main
func constants() int {
a := 10
b := 20
c := a + b
d := c * 2
return d
}

The compiler evaluates (10 + 20) * 2 = 60 at compile time. The function returns a constant.
Example 3: Bounds Check Elimination
package main
func sumFirst(arr []int, n int) int {
if n <= 0 || n > len(arr) {
return 0
}
sum := 0
for i := 0; i < n; i++ {
sum += arr[i] // can bounds check be eliminated?
}
return sum
}

The prove pass should eliminate the bounds check inside the loop since it's proven that i < n <= len(arr).
Example 4: Common Subexpression Elimination
package main
func redundant(x, y int) int {
a := x + y
b := x + y // same computation
c := a + b
return c
}

CSE eliminates the second addition, reusing the first result.
Summary
The Go compiler is a well-engineered pipeline transforming source code through multiple representations, each optimized for specific tasks:
- Lexing/Parsing produces an AST
- Type checking enriches the AST with type information
- IR generation prepares for optimization
- SSA conversion enables powerful optimizations
- SSA passes optimize for performance
- Code generation produces binary instructions
Understanding this pipeline—especially SSA and optimization passes—helps you write code the compiler can optimize effectively. Use GOSSAFUNC to visualize, keep functions small, help the prove pass, and avoid allocations in hot loops.
The compiler is sophisticated, but it's not magic. By understanding what it can and cannot do, you can write genuinely faster Go code.