Syscall and OS Integration
How Go handles system calls, the difference between syscall.Syscall and syscall.RawSyscall, M parking during blocking calls, the netpoller for async I/O, and CGO threading implications.
Introduction
The Go runtime bridges the gap between your high-level goroutines and the operating system through a sophisticated syscall integration layer. Understanding how Go makes system calls, how it prevents thread exhaustion, and how it implements asynchronous I/O is critical for writing performant concurrent programs.
In this article, we'll dissect:
- Direct syscall mechanisms (avoiding libc)
- The distinction between syscall.Syscall and syscall.RawSyscall
- The "M handoff" mechanism for non-blocking goroutine scheduling
- The netpoller architecture for async network I/O
- CGO threading implications
- Performance considerations and optimization strategies
How Go Makes System Calls
Unlike languages like Python or Ruby that route syscalls through libc, Go typically makes system calls directly using syscall instructions, bypassing the C standard library entirely (except on macOS/iOS where dynamic linking requirements force libc usage).
Direct vs. Libc Syscalls
Direct syscalls (most platforms):
- Go emits syscall instructions (e.g., SYSCALL on x86-64, SVC on ARM64)
- Bypasses libc overhead
- Exposed as functions like syscall.Open(), syscall.Read(), syscall.Write()
- Architecture-specific code lives in runtime/sys_*.s and syscall/zsyscall_*.go
Example architecture breakdown:
User Code
↓
syscall.Open() [Go implementation]
↓
func open(path string, mode int, perm uint32) (fd int, err error) {
r0, _, e1 := syscall.Syscall(syscall.SYS_OPEN, uintptr(unsafe.Pointer(...)...))
// syscall.Syscall() calls entersyscall/exitsyscall
}
↓
go/src/runtime/sys_linux_amd64.s [assembly]
MOVQ $syscall.SYS_OPEN, AX ; syscall number
SYSCALL ; CPU instruction
↓
OS Kernel
↓
File system layer

Libc syscalls (macOS/iOS):
User Code
↓
syscall.Open()
↓
libc trampoline → libc open() → kernel

Why Not Always Use Libc?
- Overhead: Function call, PLT relocation, library initialization
- Compatibility: Direct syscalls don't depend on glibc version
- Performance: Measurable difference in tight loops with many syscalls
- Control: Go can implement custom error handling and path strategies
Syscall vs. RawSyscall: The Critical Difference
The syscall package exposes two fundamental syscall entry points, and the difference is crucial for Go's scheduling model.
syscall.Syscall (with scheduler notification)
// From src/syscall/syscall.go
func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
// Pseudo-code behavior:
// 1. Call runtime.entersyscall() [notify scheduler]
// 2. Execute the actual syscall
// 3. Call runtime.exitsyscall() [scheduler recovery logic]

What entersyscall does:
entersyscall() {
g := getg() // Get current goroutine
g.m.locks++ // Prevent preemption
g.syscallsp = getcallersp()
g.syscallpc = getcallerpc()
atomic.Store(&g.atomicstatus, Gsyscall) // Mark G as in syscall
if atomic.Load(&sched.gcwaiting) != 0 {
// GC is waiting; let's give up our P
atomic.Xchg(&g.m.p.ptr().status, Psyscall)
handoffp(releasep()) // Hand P off
}
g.m.syscalltick = g.m.p.ptr().syscalltick
g.m.locks--
}

Key implications:
- The P (processor) is detached from the M (OS thread)
- The P can be given to another M to run other goroutines
- The current M can now block without stalling other goroutines
What exitsyscall does:
exitsyscall() {
g := getg()
oldp := g.m.oldp.ptr()
if oldp != nil && atomic.Load(&oldp.status) == Psyscall &&
atomic.Cas(&oldp.status, Psyscall, Prunning) {
// Successfully reacquired P
g.m.p.set(oldp)
atomic.Store(&g.atomicstatus, Grunning)
g.syscallsp = 0
return
}
// Could not reacquire P; goroutine must wait
mcall(exitsyscallSlow) // Park M, put G in global queue
}
exitsyscallSlow() {
// Put G in global run queue
// Park M in idle queue
// Schedule() will pick up work when M is unparked
}

syscall.RawSyscall (no scheduler notification)
// From src/syscall/syscall.go
func RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)
// Pseudo-code behavior:
// 1. Execute the syscall directly
// 2. NO entersyscall/exitsyscall

When to use RawSyscall:
- Very fast syscalls that never block (e.g., getpid(), gettimeofday())
- Cases where syscall.Syscall's scheduler bookkeeping would be pure overhead
- Cases where the cost of entersyscall/exitsyscall exceeds the syscall latency itself
Never use RawSyscall for blocking operations like read(), write(), or accept() — this blocks the entire OS thread and prevents the scheduler from running other goroutines.
Benchmark: Syscall vs. RawSyscall
package main
import (
	"syscall"
	"testing"
	"unsafe"
)
// Benchmark reading from /dev/null (fast, non-blocking path)
func BenchmarkSyscall(b *testing.B) {
fd, _ := syscall.Open("/dev/null", syscall.O_RDONLY, 0)
defer syscall.Close(fd)
buf := make([]byte, 1)
b.ResetTimer()
for i := 0; i < b.N; i++ {
syscall.Read(fd, buf)
}
}
// Raw syscall (unsafe if read blocks, but fast for /dev/null)
func BenchmarkRawSyscall(b *testing.B) {
	path, _ := syscall.BytePtrFromString("/dev/null")
	fd, _, _ := syscall.RawSyscall(
		syscall.SYS_OPEN,
		uintptr(unsafe.Pointer(path)),
		uintptr(syscall.O_RDONLY),
		0,
	)
defer syscall.RawSyscall(syscall.SYS_CLOSE, fd, 0, 0)
buf := make([]byte, 1)
b.ResetTimer()
for i := 0; i < b.N; i++ {
syscall.RawSyscall(
syscall.SYS_READ,
fd,
uintptr(unsafe.Pointer(&buf[0])),
1,
)
}
}
// Illustrative results on Linux x86-64:
// BenchmarkSyscall-12      2000000   500 ns/op
// BenchmarkRawSyscall-12   3000000   350 ns/op

The M Handoff Mechanism
The cornerstone of Go's scalability is the M (OS thread) handoff: when an M blocks in a syscall, its P is handed to another M (or a new one is created), allowing other goroutines to run.
Visual Overview
Before Syscall:
┌────────────────┬────────────────┬────────────────┐
│ M1 (busy) │ M2 (idle) │ M3 (idle) │
├────────────────┼────────────────┼────────────────┤
│ P1 │ idle │ idle │
├────────────────┼────────────────┼────────────────┤
│ G1, G2, G3 │ - │ - │
└────────────────┴────────────────┴────────────────┘
M1 runs entersyscall() for G1:
┌────────────────┬────────────────┬────────────────┐
│ M1 (blocked) │ M2 (busy) │ M3 (idle) │
├────────────────┼────────────────┼────────────────┤
│ - │ P1 │ idle │
├────────────────┼────────────────┼────────────────┤
│ G1 (Gsyscall) │ G2, G3, ... G4 │ - │
└────────────────┴────────────────┴────────────────┘
↓
[blocked on syscall for G1]
↓
[P1 handed off to M2]
↓
[M2 runs other goroutines]

Handoff Algorithm
// Simplified from runtime/proc.go
func entersyscall_handoff(gp *g) {
// Check if we should hand off P
if atomic.Load(&sched.gcwaiting) != 0 {
// GC needs us to give up P
mp := acquirem() // Acquire current M
mp.p.ptr().status = Psyscall
handoffp(releasep()) // Hand off P to another M
releasem(mp)
}
}
// Simplified: the real handoffp in runtime/proc.go handles more cases
func handoffp(pp *p) {
	// If this P has local work, or the global queue is non-empty,
	// it must keep running: wake an idle M (or create one) for it
	if !runqempty(pp) || sched.runqsize != 0 {
		startm(pp, false) // startm reuses an idle M via mget(), or calls newm()
		return
	}
	// No work anywhere: park the P on the idle list
	pidleput(pp)
}

The M Limit: debug.SetMaxThreads
Go limits the maximum number of OS threads created by the runtime via runtime/debug.SetMaxThreads():
// From runtime/debug (simplified); the default limit is 10,000 threads
func SetMaxThreads(threads int) int {
	return setMaxThreads(threads) // runtime hook; returns the previous limit
}

Why this limit?
- OS overhead per thread: ~2 MB stack allocation (Linux)
- Context switch overhead scales with thread count
- Runaway thread creation → resource exhaustion
- Unbounded syscall count could create one thread per syscall
Note: This is independent of GOMAXPROCS, which controls the P count (logical CPUs).
The Netpoller: Asynchronous I/O
While blocking syscalls like read() and write() on regular files must block an M, Go provides asynchronous I/O for network operations through the netpoller.
Why Netpoller?
Network I/O is fundamentally different from file I/O:
- Network: Packets arrive asynchronously; multiplexing is natural (epoll/kqueue)
- Files: Sequential access model; kernel doesn't provide efficient async I/O (AIO is complex)
The netpoller allows thousands of goroutines to wait on I/O without consuming OS threads.
Architecture: epoll on Linux
Go User Code:
↓
conn.Read(buf) [net.Conn interface]
↓
(*netFD).Read() [Internal file descriptor wrapper]
↓
pollDesc.waitRead() [Register for read interest]
↓
netpoll() [Check for ready I/O, park goroutine if needed]
↓
epoll_wait(epfd, events, maxevents, timeout) [Linux syscall]
↓
Kernel [epoll multiplexer]
↓
[When packet arrives, event generated]
↓
Ready events returned to netpoll()
↓
Goroutines unparked, resumed

pollDesc Structure
// From src/runtime/netpoll.go
type pollDesc struct {
link *pollDesc // Linked list of poll descriptors
fd uintptr // OS file descriptor
// Goroutines waiting on this fd
rg uintptr // G waiting for read; 0 if none
wg uintptr // G waiting for write; 0 if none
// Deadline handling
rt timer // Read deadline timer
wt timer // Write deadline timer
// Poll state
user uint32 // User-settable data (opaque)
rseq uint32 // Read sequence number
wseq uint32 // Write sequence number
}
// Illustrative only; the runtime consumes platform event structs directly
type netpollEvent struct {
fd uintptr // Which fd became ready
pd *pollDesc // Descriptor for that fd
mask uint32 // POLLIN | POLLOUT
}

netpoll() Integration with Scheduler
The netpoller is invoked during findrunnable() when the scheduler has no work:
// From src/runtime/proc.go, simplified
func findrunnable() (gp *g, inheritTime bool) {
// Try to find work locally
if gp := runqget(pp); gp != nil {
return gp, false
}
// Check netpoller for ready network I/O
	if gp := netpoll(0); gp != nil {
		return gp, false // Any other ready Gs are injected into the run queue
	}
}
// Check work-stealing from other Ps
// ...
// If still nothing, park M in idle queue
// ...
}
func netpoll(delay int64) *g {
// timeout = delay
// Wait for events
n := epoll_wait(epfd, events, 128, timeout)
var gp *g
for i := 0; i < n; i++ {
pd := events[i].pollDesc
// Unpark goroutines waiting on this fd
if events[i].mask&(POLLIN|POLLHUP|POLLERR) != 0 {
		if rg := atomic.LoadUintptr(&pd.rg); rg != 0 {
			gp = (*g)(unsafe.Pointer(rg))
			casgstatus(gp, Gwaiting, Grunnable) // Unpark: waiting → runnable
		}
}
// Similarly for write readiness
}
return gp
}

Deadline/Timeout Mechanism
Deadlines set with conn.SetDeadline (and its read/write variants) use runtime timers that integrate with the netpoller:

// User code with a read timeout
conn.SetReadDeadline(time.Now().Add(5 * time.Second))
n, err := conn.Read(buf)
// Once the deadline passes, Read fails with an error matching
// os.ErrDeadlineExceeded (a net.Error whose Timeout() is true)

Internally:
func (fd *netFD) SetReadDeadline(t time.Time) error {
// Schedule a timer that will interrupt the read
runtime_pollSetDeadline(fd.pd, t, 'r')
}
// Timer implementation
func runtime_pollSetDeadline(pd *pollDesc, t time.Time, mode byte) {
var d int64
if !t.IsZero() {
d = t.UnixNano()
}
	// If the deadline passes, the timer unparks the goroutine,
	// and the pending Read/Write returns os.ErrDeadlineExceeded
}

Netpoller Platforms
Platform          Multiplexer
──────────────────────────────────────────────
Linux             epoll
macOS/BSD         kqueue
Windows           IOCP (I/O Completion Ports)
Solaris/illumos   event ports
AIX               poll
Plan 9            custom

All of these scale far beyond the ~1024-fd FD_SETSIZE limit of classic select().

File I/O: Why It's Blocking
File I/O is NOT async — there's no netpoller for regular files. This is a fundamental OS limitation:
- POSIX aio_read()/aio_write() exist but are rarely used and poorly supported
- Linux io_uring is newer (the Go runtime does not use it yet)
- Most syscalls on regular files complete quickly, so briefly blocking an M is usually acceptable
When you call os.Open() or os.ReadFile():
func ReadFile(filename string) ([]byte, error) {
f, err := os.Open(filename) // Syscall, blocks M
if err != nil {
return nil, err
}
defer f.Close() // Also a syscall, blocks M
b, err := io.ReadAll(f) // read() syscalls, block M
return b, err
}

Each read() syscall blocks an OS thread for its duration. With thousands of goroutines doing file I/O concurrently, the runtime can end up creating thousands of OS threads (bounded only by the SetMaxThreads limit).
Workaround: Goroutine Pool for File I/O
package main
import (
	"os"
	"sync"
)

type FileIOPool struct {
	workers int
	tasks   chan FileTask
	wg      sync.WaitGroup
}

type FileTask struct {
	path   string
	result chan []byte
	err    chan error
}

func NewFileIOPool(workers int) *FileIOPool {
	p := &FileIOPool{
		workers: workers,
		tasks:   make(chan FileTask, 100),
	}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.worker()
	}
	return p
}

func (p *FileIOPool) worker() {
	defer p.wg.Done()
	for task := range p.tasks {
		data, err := os.ReadFile(task.path)
		if err != nil {
			task.err <- err
		} else {
			task.result <- data
		}
	}
}
func (p *FileIOPool) ReadFile(path string) ([]byte, error) {
result := make(chan []byte)
errChan := make(chan error)
p.tasks <- FileTask{path, result, errChan}
select {
case data := <-result:
return data, nil
case err := <-errChan:
return nil, err
}
}
// Usage:
// pool := NewFileIOPool(10) // 10 dedicated threads for file I/O
// data, _ := pool.ReadFile("/path/to/file")

This pattern bounds the number of OS threads doing file I/O.
CGO and Threading Implications
When you call C code from Go via cgo, the call is treated similarly to a syscall:
entersyscall for CGO
// A Go → C call goes through runtime.cgocall, which behaves like a
// syscall from the scheduler's perspective:
// 1. entersyscall() [P can be handed off]
// 2. The C function runs on the M's system stack
// 3. exitsyscall() [P reacquired, or the M parks and the G is queued]

func main() {
	// Call C function from Go
	result := C.some_c_function(42) // entersyscall/exitsyscall around the call
	_ = result
}

Thread Affinity: LockOSThread
Some C libraries require calls on a specific OS thread (e.g., OpenGL, some database drivers). Use runtime.LockOSThread():
package main
import (
"C"
"runtime"
)
func InitOpenGL() {
runtime.LockOSThread()
// C.glInit() // Must run on same thread
// All GL calls must be from this goroutine
}
func main() {
go InitOpenGL()
// This goroutine is permanently bound to its M
}

Cost: while locked, that M can run no other goroutines. For N thread-affine goroutines, N OS threads are needed.
C Calling Back to Go
When C code calls back into Go from a thread the runtime doesn't know about, an M is attached to that thread and a G is set up to run the callback:
// C code calls back into Go
extern int go_handler(int x);
void c_library_init(int (*callback)(int)) {
// Later...
int result = callback(42); // Calls Go function
}

//export go_handler
func go_handler(x int) int {
return x * 2
}
func main() {
C.c_library_init(C.go_handler)
}When callback is invoked from C:
- An M is attached to the calling thread (acquired via needm if the thread is foreign to Go)
- A G is set up on that M
- go_handler executes
- The M is released (or cached for the next callback) when the handler returns
Signal Handling
Go manages OS signals through a dedicated signal stack and sigtramp trampoline:
User Code
↓
Signal arrives (SIGTERM, SIGINT, etc.)
↓
Kernel [Signal handling]
↓
sigtramp (assembly) [runtime/sys_linux_amd64.s]
↓
Signal handler (Go function)
↓
signal.Notify channels
↓
User signal receivers

Key Points
// Register signal handler
sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
go func() {
sig := <-sigs // Blocks until signal
fmt.Println("Received signal:", sig)
}()

Signals arrive asynchronously but are delivered to channels read by ordinary goroutines, preserving Go's concurrency model.
Performance Implications and Optimization Tips
1. Many Blocking Syscalls = Thread Explosion
// Risky: unbounded goroutines; blocking steps (e.g., cgo DNS lookups) can each pin an OS thread
func FetchManyUrls(urls []string) {
var wg sync.WaitGroup
for _, url := range urls {
wg.Add(1)
go func(u string) {
defer wg.Done()
resp, err := http.Get(u) // May block in several places
if err == nil {
resp.Body.Close() // Don't leak connections
}
// ...
}(url)
}
wg.Wait()
}
// Better: Use connection pooling
client := &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
},
}
// Now syscalls are multiplexed over fewer threads

2. Prefer net Package Over Raw Sockets
// Avoid (blocks on syscall)
fd, _ := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
syscall.Connect(fd, &sa)
syscall.Read(fd, buf)
// Prefer (integrated with netpoller)
conn, _ := net.Dial("tcp", "example.com:80")
conn.Read(buf)

The net package wraps syscalls with netpoller integration.
3. Batch Syscalls
// Bad: N syscalls
for _, path := range paths {
os.Stat(path) // syscall per file
}
// Better: batch where the kernel allows it
// Example: os.ReadDir uses getdents64, which returns many entries per syscall

4. Be Cautious with CGO in Hot Paths
// Avoid in tight loops
func ProcessMany(items []Item) {
for _, item := range items {
C.process_item(item) // full cgocall (entersyscall/exitsyscall) per item
}
}
// Better: Batch items
func ProcessBatch(items []Item) {
// Copy items once, process in C
C.process_many_items(...)
}

5. Monitor Thread Count
package main
import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

func main() {
	ticker := time.NewTicker(1 * time.Second)
	for range ticker.C {
		// There is no runtime.NumThread(); the threadcreate profile
		// counts the OS threads the runtime has created
		threads := pprof.Lookup("threadcreate").Count()
		fmt.Printf("Goroutines: %d, OS threads created: %d\n",
			runtime.NumGoroutine(), threads)
	}
}

Benchmarks: Syscall Overhead
package main
import (
	"io"
	"net"
	"os"
	"testing"
)
// Benchmark: Writing to network socket (with netpoller)
func BenchmarkNetWrite(b *testing.B) {
ln, _ := net.Listen("tcp", "127.0.0.1:0")
defer ln.Close()
go func() {
for {
conn, _ := ln.Accept()
go io.Copy(io.Discard, conn)
}
}()
conn, _ := net.Dial("tcp", ln.Addr().String())
defer conn.Close()
data := []byte("Hello")
b.ResetTimer()
for i := 0; i < b.N; i++ {
conn.Write(data)
}
}
// Illustrative: a few hundred ns/op (loopback, netpoller-managed fd)
// Benchmark: Writing to file (blocks M)
func BenchmarkFileWrite(b *testing.B) {
f, _ := os.Create("/tmp/test.txt")
defer f.Close()
data := []byte("Hello")
b.ResetTimer()
for i := 0; i < b.N; i++ {
f.Write(data)
}
}
// Illustrative: ~500 ns/op (each write syscall briefly blocks the M)

Summary
Go's syscall integration is a marvel of engineering:
- Direct syscalls bypass libc overhead
- entersyscall/exitsyscall allows goroutines to block without stalling others via M handoff
- The netpoller enables thousands of concurrent network connections on a few OS threads
- File I/O remains blocking (use worker pools for heavy concurrent file access)
- CGO requires careful threading consideration (LockOSThread for thread-affine code)
- Signals are delivered asynchronously through goroutines
Understanding these mechanisms helps you write scalable, efficient concurrent Go programs.