String Optimization in Go

Master Go string internals, concatenation patterns, and efficient string handling techniques

Strings are fundamental to Go programs, yet many developers treat them as zero-cost abstractions. This guide explores string internals and practical optimization techniques that can dramatically improve performance.

String Internals: Immutable Byte Sequences

Go strings are immutable sequences of bytes, represented at run time by a small header. Understanding that structure is critical for optimization.

String Structure

// Internal representation (simplified)
type StringHeader struct {
    Data uintptr  // Pointer to byte array (immutable)
    Len  int      // Length of string
}

Unlike slices, strings have no capacity field, and their bytes are immutable. As a result:

No capacity to track: a string can never grow in place
Safe to share across goroutines without synchronization
Cannot be modified in place
Any "modification" allocates a new string
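The points above can be checked directly. The following is a minimal sketch, with helper names (`headerWords`, `withFirstByte`) invented for illustration: it shows that a string value is just a pointer plus a length, that `len` counts bytes rather than characters, and that "changing" a byte means building a new string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// headerWords reports how many machine words a string value occupies.
func headerWords() int {
	var s string
	return int(unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0)))
}

// withFirstByte returns a copy of s whose first byte is c.
// The input string is left untouched because strings are immutable.
func withFirstByte(s string, c byte) string {
	if s == "" {
		return s
	}
	b := []byte(s) // allocates and copies
	b[0] = c
	return string(b) // allocates and copies again
}

func main() {
	fmt.Println(headerWords()) // pointer + length, no capacity

	s := "héllo"
	fmt.Println(len(s), utf8.RuneCountInString(s)) // bytes vs. runes

	orig := "hello"
	changed := withFirstByte(orig, 'j')
	fmt.Println(orig, changed) // orig is unchanged
}
```

Because only the header is copied when a string is passed to a function, passing large strings by value is cheap.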

String vs Byte Slice: When to Use Which

The choice between string and []byte has significant performance implications.

Conversion Costs

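Converting between strings and byte slices generally involves an allocation and a copy (the compiler can skip the copy in a few recognized patterns, such as a map lookup keyed by string(b)):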

// String to []byte: allocation + copy
s := "hello world"
b := []byte(s)  // Allocates and copies

// []byte to string: allocation + copy
b2 := []byte("hello world")
s2 := string(b2)  // Allocates and copies

// Benchmark: conversion costs
func BenchmarkStringByteConversion(b *testing.B) {
    s := "hello world with some longer content"

    b.Run("string_to_bytes", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = []byte(s)
        }
    })

    b.Run("bytes_to_string", func(b *testing.B) {
        bs := []byte(s)
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = string(bs)
        }
    })
}
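Not every conversion pays this cost. As a hedged illustration of behavior observed with the standard gc compiler (an optimization, not a language guarantee), the sketch below uses testing.AllocsPerRun to compare a plain conversion with a conversion that is used only as a map key. The function names and the `sink` variable are made up for this example.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the converted slice escapes to the heap.
var sink []byte

// conversionAllocs measures allocations per string-to-[]byte conversion.
func conversionAllocs(s string) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = []byte(s)
	})
}

// lookupAllocs measures allocations per map lookup keyed by string(key).
func lookupAllocs(m map[string]int, key []byte) float64 {
	total := 0
	allocs := testing.AllocsPerRun(1000, func() {
		total += m[string(key)] // conversion used only as a lookup key
	})
	_ = total
	return allocs
}

func main() {
	s := "hello world with some longer content"
	m := map[string]int{s: 1}
	fmt.Println("[]byte(s) allocs/op:    ", conversionAllocs(s))
	fmt.Println("m[string(b)] allocs/op: ", lookupAllocs(m, []byte(s)))
}
```

If the second figure is zero on your toolchain, a lookup such as m[string(b)] does not need an unsafe workaround.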

When to Use Each

Use Case	Type	Reason
API parameters	`string`	Standard, immutable, zero-copy sharing
Data modification	`[]byte`	Mutable, in-place operations
Network I/O	`[]byte`	Efficient reading into buffers
Text processing	`string`	Built-in string functions
Performance-critical paths	`[]byte`	Avoid conversion overhead

// BAD: Multiple string-byte conversions
func ProcessDataBad(data string) string {
    b := []byte(data)           // Convert to bytes
    // ... modification ...
    return string(b)            // Convert back
}

// GOOD: Use appropriate type from start
func ProcessDataGood(data []byte) []byte {
    // Modify in-place
    return data
}
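To make "modify in-place" concrete, here is a small sketch. The function name `upperASCIIInPlace` is an assumption for illustration, and it handles ASCII letters only.

```go
package main

import "fmt"

// upperASCIIInPlace uppercases ASCII letters in buf without allocating.
// It mutates its argument, so the caller must own the slice.
func upperASCIIInPlace(buf []byte) []byte {
	for i, c := range buf {
		if 'a' <= c && c <= 'z' {
			buf[i] = c - ('a' - 'A')
		}
	}
	return buf
}

func main() {
	data := []byte("content-type: text/html")
	fmt.Printf("%s\n", upperASCIIInPlace(data))
}
```

The same operation on a string would need one conversion in and one conversion out, each with its own copy.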

String Concatenation: Avoiding O(n²) Complexity

The + operator is convenient, but each use allocates a new string and copies both operands. Inside a loop, that adds up to quadratic complexity.

The O(n²) Problem

// CATASTROPHIC: O(n²) complexity
func ConcatenateWithPlus(words []string) string {
    result := ""
    for _, word := range words {
        result = result + word  // Creates new string each iteration
    }
    return result
}

// Example: concatenating 1000 words
// Iteration 1: copy 0 words + 1 word
// Iteration 2: copy 1 word + 1 word
// Iteration 3: copy 2 words + 1 word
// ...
// Iteration 1000: copy 999 words + 1 word
// Total re-copied: 0 + 1 + 2 + ... + 999 = 499,500 (~500,000) word copies,
// versus 1,000 word copies if each word were copied only once
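The arithmetic can be reproduced with a small model. This sketch only counts bytes; it does not concatenate anything, and both function names are hypothetical.

```go
package main

import "fmt"

// bytesCopiedByPlus returns how many bytes a naive "result = result + word"
// loop copies in total: each iteration writes the old result plus the new word
// into a freshly allocated string.
func bytesCopiedByPlus(words []string) int {
	copied, resultLen := 0, 0
	for _, w := range words {
		resultLen += len(w) // length of the newly allocated string
		copied += resultLen // every byte of it is written once
	}
	return copied
}

// bytesCopiedOnce is the lower bound: each input byte is copied exactly once,
// which is what a correctly presized buffer achieves.
func bytesCopiedOnce(words []string) int {
	total := 0
	for _, w := range words {
		total += len(w)
	}
	return total
}

func main() {
	words := make([]string, 1000)
	for i := range words {
		words[i] = "x" // one byte per word keeps the arithmetic easy to follow
	}
	fmt.Println(bytesCopiedByPlus(words)) // 1 + 2 + ... + 1000
	fmt.Println(bytesCopiedOnce(words))
}
```

Doubling the number of words roughly quadruples the first figure and only doubles the second.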

Solution 1: strings.Builder

The standard approach for efficient string concatenation:

import "strings"

func ConcatenateWithBuilder(words []string) string {
    var sb strings.Builder
    sb.Grow(estimateSize(words))  // Preallocate

    for _, word := range words {
        sb.WriteString(word)
    }

    return sb.String()
}

func estimateSize(words []string) int {
    size := 0
    for _, w := range words {
        size += len(w)
    }
    return size
}
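One way to see what Grow buys is to count allocations directly. The sketch below is illustrative only: `build`, `allocsPerBuild`, and `sampleWords` are hypothetical helpers, and exact counts depend on the Go version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// result is a package-level sink so the built string is actually kept.
var result string

// sampleWords returns n short, distinct words.
func sampleWords(n int) []string {
	words := make([]string, n)
	for i := range words {
		words[i] = "word_" + strconv.Itoa(i) + "_"
	}
	return words
}

// build concatenates words, presizing the Builder when size > 0.
func build(words []string, size int) string {
	var sb strings.Builder
	if size > 0 {
		sb.Grow(size)
	}
	for _, w := range words {
		sb.WriteString(w)
	}
	return sb.String()
}

// allocsPerBuild reports the average number of allocations per build call.
func allocsPerBuild(words []string, size int) float64 {
	return testing.AllocsPerRun(100, func() {
		result = build(words, size)
	})
}

func main() {
	words := sampleWords(1000)
	total := 0
	for _, w := range words {
		total += len(w)
	}
	fmt.Println("with Grow:   ", allocsPerBuild(words, total))
	fmt.Println("without Grow:", allocsPerBuild(words, 0))
}
```

Without Grow, the Builder reallocates its buffer each time the buffer fills up, so the second figure should be noticeably higher.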

Solution 2: bytes.Buffer

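Similar to Builder, with additional capabilities such as reading the data back (it implements io.Reader). Note that its String() method copies the buffer: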

import "bytes"

func ConcatenateWithBuffer(words []string) string {
    var buf bytes.Buffer
    buf.Grow(estimateSize(words))

    for _, word := range words {
        buf.WriteString(word)
    }

    return buf.String()
}
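As an example of those additional capabilities, a bytes.Buffer can be handed to anything that expects an io.Reader. A minimal sketch, where `countLines` is a made-up helper:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// countLines writes the chunks into a bytes.Buffer and then reads the buffer
// back line by line, which strings.Builder cannot do because it has no read side.
func countLines(chunks []string) int {
	var buf bytes.Buffer
	for _, c := range chunks {
		buf.WriteString(c)
	}

	n := 0
	sc := bufio.NewScanner(&buf) // *bytes.Buffer implements io.Reader
	for sc.Scan() {
		n++
	}
	return n
}

func main() {
	// The chunk boundaries deliberately do not line up with the newlines.
	fmt.Println(countLines([]string{"alpha\nbe", "ta\ngam", "ma\n"}))
}
```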

Comprehensive Concatenation Benchmark

func BenchmarkConcatenation(b *testing.B) {
    // Prepare test data
    words := make([]string, 1000)
    totalLen := 0
    for i := 0; i < 1000; i++ {
        words[i] = fmt.Sprintf("word_%d_", i)
        totalLen += len(words[i])
    }

    b.Run("plus_operator", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            result := ""
            for _, word := range words {
                result = result + word
            }
            _ = result
        }
    })

    b.Run("strings.Builder", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            sb.Grow(totalLen)
            for _, word := range words {
                sb.WriteString(word)
            }
            _ = sb.String()
        }
    })

    b.Run("strings.Builder_no_grow", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            for _, word := range words {
                sb.WriteString(word)
            }
            _ = sb.String()
        }
    })

    b.Run("bytes.Buffer", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var buf bytes.Buffer
            buf.Grow(totalLen)
            for _, word := range words {
                buf.WriteString(word)
            }
            _ = buf.String()
        }
    })

    b.Run("join", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.Join(words, "")
        }
    })
}

// Result (typical; varies by machine and Go version): Builder ~5x faster than +, roughly on par with Join

Best practice: Use strings.Builder with .Grow() for known sizes, or without for unknown sizes.

fmt.Sprintf Overhead

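fmt.Sprintf is convenient, but it parses the format string at run time and receives its arguments as interface values, falling back to reflection for some types.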

func BenchmarkStringFormatting(b *testing.B) {
    b.Run("sprintf", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = fmt.Sprintf("user_%d_%s", i, "data")
        }
    })

    b.Run("builder", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            sb.WriteString("user_")
            sb.WriteString(strconv.Itoa(i))
            sb.WriteString("_data")
            _ = sb.String()
        }
    })

    b.Run("concat", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = "user_" + strconv.Itoa(i) + "_data"
        }
    })
}

// Result (typical; varies by machine and Go version): direct concatenation fastest, Builder about 2x faster than Sprintf

Tip: For simple formatting, use direct concatenation or Builder. Reserve Sprintf for complex layouts.
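When the same kind of key is formatted repeatedly, the strconv.Append* functions can write into a reusable byte slice. The sketch below assumes a hypothetical `appendUserKey` helper that produces the same "user_<id>_<suffix>" shape as the benchmark.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendUserKey appends "user_<id>_<suffix>" to dst and returns the extended
// slice. Reusing dst across calls avoids a fresh allocation per key.
func appendUserKey(dst []byte, id int, suffix string) []byte {
	dst = append(dst, "user_"...)
	dst = strconv.AppendInt(dst, int64(id), 10)
	dst = append(dst, '_')
	dst = append(dst, suffix...)
	return dst
}

func main() {
	buf := make([]byte, 0, 64) // scratch buffer reused for every key
	for id := 1; id <= 3; id++ {
		buf = appendUserKey(buf[:0], id, "data")
		fmt.Println(string(buf))
	}
}
```

This pays off mainly when the bytes are consumed directly (written to a socket or file); converting each result to a string reintroduces a copy.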

String Interning: Deduplicating Strings

Strings built at run time with identical content usually live in separate allocations. Interning keeps one canonical copy of each distinct value in a pool.

import (
    "strings"
    "sync"
)

// String interning pool
type StringPool struct {
    mu    sync.RWMutex
    pool  map[string]string
}

func NewStringPool() *StringPool {
    return &StringPool{
        pool: make(map[string]string),
    }
}

// Intern returns the canonical string
func (sp *StringPool) Intern(s string) string {
    sp.mu.RLock()
    if canonical, exists := sp.pool[s]; exists {
        sp.mu.RUnlock()
        return canonical
    }
    sp.mu.RUnlock()

    // Double-check after acquiring write lock
    sp.mu.Lock()
    defer sp.mu.Unlock()

    if canonical, exists := sp.pool[s]; exists {
        return canonical
    }

    s = strings.Clone(s)  // Store a compact copy so the pool never pins a larger parent string
    sp.pool[s] = s
    return s
}

// Use case: parsing many duplicate strings
func ParseWithInterning(lines []string, pool *StringPool) {
    for _, line := range lines {
        fields := strings.Fields(line)
        for _, field := range fields {
            canonical := pool.Intern(field)
            // All identical fields now point to same string
            _ = canonical
        }
    }
}

When to use:

Parsing logs/configs with many duplicate values
DNS names, hostnames, labels
Cached taxonomy/enum values

Warning: Strings in the pool are never garbage collected. Only intern strings with known longevity.
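The effect of interning can be observed by comparing data pointers. This is a single-goroutine sketch, not the pool above: `intern`, `sameBacking`, and `internDemo` are hypothetical names, and unsafe.StringData requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// intern returns the first string stored in pool with the same content as s.
func intern(pool map[string]string, s string) string {
	if canonical, ok := pool[s]; ok {
		return canonical
	}
	s = strings.Clone(s) // keep a compact copy in the pool
	pool[s] = s
	return s
}

// sameBacking reports whether a and b point at the same bytes in memory.
func sameBacking(a, b string) bool {
	return len(a) == len(b) && unsafe.StringData(a) == unsafe.StringData(b)
}

// internDemo compares two equal strings before and after interning.
func internDemo() (before, after bool) {
	// strings.Clone gives each variable its own allocation.
	a := strings.Clone("us-east-1.example.internal")
	b := strings.Clone("us-east-1.example.internal")
	before = sameBacking(a, b)

	pool := make(map[string]string)
	a, b = intern(pool, a), intern(pool, b)
	after = sameBacking(a, b)
	return before, after
}

func main() {
	before, after := internDemo()
	fmt.Println("shared before interning:", before)
	fmt.Println("shared after interning: ", after)
}
```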

Substring Gotcha: GC Prevention

Substrings share the backing array of the original string, so a small substring can keep a large string from being garbage collected.

// PROBLEMATIC: Holding substrings keeps the entire file contents allocated
func ProcessLargeFileProblematic(filepath string) []string {
    content, err := os.ReadFile(filepath)  // e.g. 10 MB
    if err != nil {
        return nil
    }
    lines := strings.Split(string(content), "\n")

    // Keeping only the first 100 lines (fewer for short files)
    n := len(lines)
    if n > 100 {
        n = 100
    }
    results := lines[:n]

    // The large string created by string(content) is still reachable (not GC'd),
    // because every element of results points into its backing array
    return results
}

// SOLUTION: Clone substrings (strings.Clone, Go 1.18+)
func ProcessLargeFileFixed(filepath string) []string {
    content, err := os.ReadFile(filepath)
    if err != nil {
        return nil
    }
    lines := strings.Split(string(content), "\n")
    if len(lines) > 100 {
        lines = lines[:100]
    }

    results := make([]string, len(lines))
    for i, line := range lines {
        results[i] = strings.Clone(line)  // Makes independent copy
    }

    // The large string can now be GC'd
    return results
}

// Manual clone (pre-1.18)
func cloneString(s string) string {
    return string([]byte(s))  // Forces copy
}

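// Benchmark: CPU and allocation cost of cloning.
// Note: retention itself is not visible here; to observe it, keep the
// results alive and inspect a heap profile.
func BenchmarkSubstringGC(b *testing.B) {
    b.Run("substring_shared", func(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            s := strings.Repeat("x", 10000)
            _ = s[:100]  // No copy; would pin all 10,000 bytes if retained
        }
    })

    b.Run("substring_cloned", func(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            s := strings.Repeat("x", 10000)
            _ = strings.Clone(s[:100])  // Independent 100-byte copy
        }
    })
}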

Critical: Use strings.Clone() when extracting long-lived substrings from large strings.
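Whether a substring still points into its parent can be checked with unsafe.StringData (Go 1.20+). The sketch below is for exploration only; `pointsInto` and `sharingDemo` are hypothetical helpers and not something to ship in production code.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// pointsInto reports whether sub's bytes lie inside parent's backing array.
func pointsInto(parent, sub string) bool {
	if len(parent) == 0 || len(sub) == 0 {
		return false
	}
	start := uintptr(unsafe.Pointer(unsafe.StringData(parent)))
	p := uintptr(unsafe.Pointer(unsafe.StringData(sub)))
	return p >= start && p+uintptr(len(sub)) <= start+uintptr(len(parent))
}

// sharingDemo slices a large string and then clones the slice.
func sharingDemo() (sliced, cloned bool) {
	big := strings.Repeat("x", 10000)
	sub := big[:100]             // shares big's memory
	copyOf := strings.Clone(sub) // independent 100-byte allocation
	return pointsInto(big, sub), pointsInto(big, copyOf)
}

func main() {
	sliced, cloned := sharingDemo()
	fmt.Println("slice shares parent memory:", sliced)
	fmt.Println("clone shares parent memory:", cloned)
}
```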

Unsafe Zero-Alloc String Conversions

Advanced technique for specific scenarios where safety guarantees allow:

// UNSAFE: Converting []byte to string without copy (Go 1.20+)
// Only safe if bytes are immutable for lifetime of string
func unsafeByteSliceToString(b []byte) string {
    return unsafe.String(unsafe.SliceData(b), len(b))
}

// Use case: receiving data that won't be modified
func parseReceivedData(buffer []byte) {
    // Only if we're CERTAIN buffer won't be mutated
    s := unsafeByteSliceToString(buffer)
    _ = s
}

Use only when:

Working with data you control that won't mutate
The byte slice is not modified, reused, or returned to a pool while the string is in use
Performance is critical and profiling shows this is a bottleneck

General advice: Avoid unsafe conversions. The alloc/copy is usually negligible.
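The risk is easy to demonstrate. In this sketch (using unsafe.String and unsafe.SliceData from Go 1.20+, with a made-up `aliasingDemo` function), the string changes after it has been created, which breaks the immutability every other piece of Go code relies on.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// aliasingDemo shows why the source bytes must never change: the string
// produced by the zero-copy conversion observes later writes to the buffer.
func aliasingDemo() (before, after string) {
	buf := []byte("hello")
	s := unsafe.String(unsafe.SliceData(buf), len(buf)) // shares buf's memory

	before = strings.Clone(s) // snapshot taken before the write
	buf[0] = 'j'              // mutates the bytes s points to
	after = strings.Clone(s)  // snapshot taken after the write
	return before, after
}

func main() {
	before, after := aliasingDemo()
	fmt.Println(before, after)
}
```

A string that changes underneath a map key or a cached value can cause bugs that are very hard to trace, which is why the conditions listed above matter.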

strings.Builder vs bytes.Buffer

Both accumulate strings, but with different tradeoffs:

// Comparison
func CompareBuilderVsBuffer() {
    // strings.Builder
    var sb strings.Builder
    sb.WriteString("hello")
    s1 := sb.String()  // Zero-copy return

    // bytes.Buffer
    var buf bytes.Buffer
    buf.WriteString("hello")
    s2 := buf.String()  // Creates copy

    _, _ = s1, s2  // Use the results so the snippet compiles
}

// Benchmark: which is faster?
func BenchmarkBuilderVsBuffer(b *testing.B) {
    b.Run("Builder.String", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            for j := 0; j < 100; j++ {
                sb.WriteString("test_string_content_")
            }
            _ = sb.String()
        }
    })

    b.Run("Buffer.String", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var buf bytes.Buffer
            for j := 0; j < 100; j++ {
                buf.WriteString("test_string_content_")
            }
            _ = buf.String()
        }
    })
}

// Result (typical): Builder slightly faster for string output, since String() does not copy

Choose:

Builder: When your output is a string
Buffer: When you need byte-level operations, encoding, or to read the accumulated data back through io.Reader

String Comparison Performance

Different comparison methods have different costs:

func BenchmarkStringComparison(b *testing.B) {
    s1 := "hello"
    s2 := "hello"

    b.Run("equality_check", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = s1 == s2
        }
    })

    b.Run("strings.EqualFold", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.EqualFold(s1, s2)
        }
    })

    b.Run("strings.Compare", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.Compare(s1, s2)
        }
    })
}

// Result (typical; ratios depend on input and Go version): == fastest, EqualFold ~10% slower because it also handles case folding
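The three functions also answer different questions, which matters more than their relative speed. A short sketch with hypothetical wrapper names:

```go
package main

import (
	"fmt"
	"strings"
)

// equalIgnoreCase compares under Unicode case folding without building
// lowercased copies of its arguments.
func equalIgnoreCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

// equalIgnoreCaseSlow gives the same answer for ASCII input, but ToLower may
// allocate a temporary string for each argument.
func equalIgnoreCaseSlow(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

// order returns -1, 0, or +1, which is the shape sort comparators expect.
func order(a, b string) int {
	return strings.Compare(a, b)
}

func main() {
	fmt.Println("Go" == "go")                                   // exact, byte-wise equality
	fmt.Println(equalIgnoreCase("Content-Type", "content-type"))
	fmt.Println(equalIgnoreCaseSlow("Content-Type", "content-type"))
	fmt.Println(order("apple", "banana"))
}
```

For plain equality, == remains the idiomatic choice; reach for EqualFold only when case-insensitive matching is actually required.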

Efficient String Searching

Different search methods for different scenarios:

func BenchmarkStringSearch(b *testing.B) {
    haystack := strings.Repeat("the quick brown fox ", 1000)
    needle := "brown"

    b.Run("Contains", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.Contains(haystack, needle)
        }
    })

    b.Run("Index", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.Index(haystack, needle)
        }
    })

    b.Run("HasPrefix", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.HasPrefix(haystack, needle)
        }
    })

    b.Run("Contains_short_haystack", func(b *testing.B) {
        single := "word"
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = strings.Contains(single, needle)
        }
    })
}
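In practice, picking the narrowest function for the job often matters more than micro-benchmarks. The sketch below assumes a simple "key = value" line format; `parseKV` and `isComment` are invented for illustration, and strings.Cut requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKV splits "key=value" at the first '='. strings.Cut returns two
// substrings of line and allocates nothing, unlike strings.Split, which
// builds a slice of results.
func parseKV(line string) (key, value string, ok bool) {
	key, value, ok = strings.Cut(line, "=")
	if !ok {
		return "", "", false
	}
	return strings.TrimSpace(key), strings.TrimSpace(value), true
}

// isComment only needs to inspect the start of the line, so HasPrefix is
// enough and the rest of the line is never scanned.
func isComment(line string) bool {
	return strings.HasPrefix(line, "#")
}

func main() {
	for _, line := range []string{"# settings", "host = example.org", "broken"} {
		if isComment(line) {
			continue
		}
		k, v, ok := parseKV(line)
		fmt.Println(k, v, ok)
	}
}
```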

Real-World Example: Building JSON Efficiently

// BAD: Sprintf plus += in a loop allocates on every iteration
func BuildJSONBad(users []User) string {
    if len(users) == 0 {
        return "[]"  // Avoid slicing an empty result below
    }
    result := ""
    for _, u := range users {
        result += fmt.Sprintf(`{"id":%d,"name":"%s"},`, u.ID, u.Name)
    }
    return "[" + result[:len(result)-1] + "]"
}

// GOOD: Use Builder with preallocated size
// NOTE: Name is written unescaped; values containing quotes or backslashes need JSON escaping
func BuildJSONGood(users []User) string {
    var sb strings.Builder

    // Estimate: 50 bytes per user + 2 for brackets
    sb.Grow(len(users)*50 + 2)

    sb.WriteString("[")
    for i, u := range users {
        if i > 0 {
            sb.WriteString(",")
        }
        sb.WriteString(`{"id":`)
        sb.WriteString(strconv.Itoa(u.ID))
        sb.WriteString(`,"name":"`)
        sb.WriteString(u.Name)
        sb.WriteString(`"}`)
    }
    sb.WriteString("]")

    return sb.String()
}

// Benchmark
func BenchmarkJSONConstruction(b *testing.B) {
    users := make([]User, 1000)
    for i := 0; i < 1000; i++ {
        users[i] = User{ID: i, Name: fmt.Sprintf("User_%d", i)}
    }

    b.Run("Sprintf", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = BuildJSONBad(users)
        }
    })

    b.Run("Builder", func(b *testing.B) {
        b.ReportAllocs()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = BuildJSONGood(users)
        }
    })
}

type User struct {
    ID   int
    Name string
}
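Both builders above write Name verbatim, which produces invalid JSON as soon as a name contains a quote. One possible variation, sketched here with a hypothetical `buildJSON` function and a locally redeclared `user` type, keeps the Builder but delegates string escaping to encoding/json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// user mirrors the User type from the example above (redeclared so this
// sketch compiles on its own).
type user struct {
	ID   int
	Name string
}

// buildJSON assembles the array with a Builder and lets encoding/json quote
// and escape each name, so the output stays valid JSON.
func buildJSON(users []user) (string, error) {
	var sb strings.Builder
	sb.Grow(len(users)*50 + 2) // same rough estimate as above

	sb.WriteByte('[')
	for i, u := range users {
		if i > 0 {
			sb.WriteByte(',')
		}
		name, err := json.Marshal(u.Name) // quoted, escaped JSON string
		if err != nil {
			return "", err
		}
		sb.WriteString(`{"id":`)
		sb.WriteString(strconv.Itoa(u.ID))
		sb.WriteString(`,"name":`)
		sb.Write(name)
		sb.WriteByte('}')
	}
	sb.WriteByte(']')
	return sb.String(), nil
}

func main() {
	out, err := buildJSON([]user{{ID: 1, Name: `Ann "A" Lee`}, {ID: 2, Name: "Bo"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The per-name json.Marshal call allocates, so this trades some speed for correctness. If the whole structure is simple, calling json.Marshal on the slice directly is usually the more maintainable option.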

String Optimization Checklist

Avoid string concatenation in loops: Use strings.Builder
Preallocate with .Grow(): Avoids repeated buffer regrowth; a single allocation when the size is known
Choose correct type: string for APIs, []byte for I/O
Intern frequently-repeated strings: In cache/pool workloads
Clone extracted substrings: Prevent unintended GC holds
Use direct concatenation for small strings: Faster than formatting
Prefer strings.Builder: Over bytes.Buffer for string output
Avoid multiple conversions: Between string and []byte
Monitor allocations: With go test -bench=. -benchmem and b.ReportAllocs()
