Since I last talked about Mach ports, Mac OS X has changed a bit. In this essay, rather than investigating a particular issue, I will talk about some changes to the OS X Mach IPC layer.

The Deprecation of bootstrap_register

When a new task is created on OS X, it is given a set of special Mach ports. Among these are its host port, which represents the machine on which the task is running; its task port, which refers to the task itself; and its bootstrap port, which is a connection to the bootstrap server. The bootstrap server provides a port namespace in which tasks can register their own ports, which other tasks can then look up and send messages to. Think of the bootstrap server as a telephone directory: a task can publish a well-known name that maps to a Mach port on which that task is listening.

To register a service with the bootstrap server, a task could use the bootstrap_register() function, which takes a string name and the Mach port to associate with it. However, Apple deprecated this function in 10.5 and recommended using launchd instead. There is a long thread about this on the darwin-dev mailing list, which largely centers on one problem: how to connect a parent task to its child.

On OS X, a process is created with the fork() system call, which is handled by the BSD part of XNU, since most process management goes through the BSD mechanisms for POSIX compatibility. The Mach process creation facility, task_create(), has been disabled since 10.5, because too many other system calls assumed the presence of a BSD process in an execution context. Mach ports are not inherited across fork() (unlike file descriptors), so passing ports to child processes requires some work. A typical way to pass a port from a parent was to create a port in the parent process, register it under a name that could be passed to the child, fork/exec, and then have the child look up the port using the shared name. With the deprecation of bootstrap_register(), though, a new way to pass ports was needed.

A simple replacement is found in the -[NSMachBootstrapServer registerPort:name:] API, which wraps a private bootstrap_register2() function. This is what Chrome uses, but it has the drawback of being an Objective-C API rather than plain C. If the application has an installer, you can use launchd and its configuration directives to create the bootstrap server entry for you; this is what Apple recommends, but it requires additional setup and cannot be done dynamically.

Discovering mach_ports_register()

While scanning the Mach system calls for other possibilities to hand off a port to a child during fork(), I found mach_ports_register(). This function is documented as taking an array of ports that are passed to the child during task_create(). The child could then look these up using mach_ports_lookup(). This seemed incredibly promising, but my tests revealed that it did not work.

In the kernel, when the fork() system call is handled, it calls fork_create_child() [xnu/bsd/kern/kern_fork.c], which then calls task_create_internal() and then ipc_task_init() [xnu/osfmk/kern/ipc_tt.c]. Reading through this chain, it’s apparent that mach_ports_register() does in fact work: it places up to TASK_PORT_REGISTER_MAX (3) ports into the itk_registered field of the task, and during ipc_task_init(), if a parent task is present, those send rights are copied into the child task. Everything looked correct, so it was unclear why my test was failing.

A quick dtrace session revealed why:

$ sudo dtrace -n 'fbt::mach_ports_register:entry { ustack() }' -c ./parent
    CPU     ID                    FUNCTION:NAME
      4 266220        mach_ports_register:entry 
                  libsystem_kernel.dylib`mach_msg_trap+0xa
                  libsystem_kernel.dylib`mach_ports_register+0x70
                  parent`main+0x116
                  libdyld.dylib`start
                  parent`0x1

      4 266220        mach_ports_register:entry 
                  libsystem_kernel.dylib`mach_msg_trap+0xa
                  libsystem_kernel.dylib`mach_ports_register+0x70
                  libxpc.dylib`xpc_atfork_prepare+0x2b
                  libSystem.B.dylib`libSystem_atfork_prepare+0x9
                  libsystem_c.dylib`fork+0xc
                  parent`main+0x138
                  libdyld.dylib`start
                  parent`0x1

And there’s the problem: mach_ports_register() works just fine, but the system, specifically libc and libxpc, clobbers the registered ports as part of the fork() routine. After my test program, parent, calls mach_ports_register(), hooks in the libc fork() wrapper call out to libSystem_atfork_prepare(), which then runs xpc_atfork_prepare(). XPC is not open source (I’ve filed rdar://problem/11192369 requesting it), but my guess was that it registers a special port for its own use. A disassembly of /usr/lib/system/libxpc.dylib confirms it:

_xpc_atfork_prepare:
000000000000d596        pushq   %rbp
000000000000d597        movq    %rsp, %rbp
000000000000d59a        pushq   %r14
000000000000d59c        pushq   %rbx
000000000000d59d        subq    $16, %rsp
000000000000d5a1        movl    _xpc_bootstrap_port(%rip), %eax
000000000000d5a7        movl    %eax, -20(%rbp)
000000000000d5aa        movq    89031(%rip), %rax
000000000000d5b1        movl    (%rax), %edi
000000000000d5b3        leaq    -20(%rbp), %rsi
000000000000d5b7        movl    $1, %edx
000000000000d5bc        callq   0x1704e ## symbol stub for: _mach_ports_register

XPC registers its own bootstrap port (which is set up by an XPC initialization routine that runs before main()) as the only entry in the array of ports to be registered for a child, overwriting anything the parent set up before calling fork().

I tested whether mach_ports_register() worked on 10.6, before XPC was introduced, and there it does pass the port down to the child as expected. You can find the sample code here. You can build and test using make && ./parent. I filed rdar://problem/15417334 as a regression, but Apple responded by saying, “This API was never shareable, and now the system owns it” (whatever that means).

The Port Swap Dance

With mach_ports_register() broken, the only other way to pass ports from a parent to a child is to use the special ports declared in /usr/include/mach/task_special_ports.h. The special ports are connections to various system services, and they are also inherited across fork(). Here is how to pass a port from parent to child using a special port:

  1. The parent grabs one of its special ports with task_get_special_port() and stores it in a local variable.
  2. The parent allocates the port it wants to pass to its child with mach_port_allocate(), gives itself a send right to it with mach_port_insert_right(), and installs that right as the special port with task_set_special_port(). Alternatively, it could install an additional send right to a port it already owns.
  3. The parent forks, giving the child its inherited special port.
  4. The parent resets its special port to its saved value.
  5. The new child grabs the swapped special port and saves it appropriately.
  6. The child messages the parent over the grabbed port, to check in with the parent.
  7. The parent receives the check-in message and replies with the original special port it had previously saved.
  8. The child receives the real special port from the parent and sets its special port back to that.

The parent and child are now connected over a port across fork(), with the parent holding the receive end of a port and the child having a send right to it.

Passing FDs Over Mach IPC

Until 10.6, there was no way to pass a file descriptor over Mach IPC. This was likely a result of joining two kernels, BSD and Mach, into XNU, where the POSIX layer of BSD is coupled to the Mach part only in specific places. Starting in OS X 10.6, however, Apple introduced a way to bridge these parts of the kernel with two private APIs: one to “convert” an FD to a port right, and another to get the underlying FD back. These SPIs should not be used, since they are not public and are liable to change, but the underlying concept is interesting and worth discussing.

fileport_makeport() takes a file descriptor and an out-pointer for the new Mach port. In the kernel, this creates a new port, allocates a send right to it, and associates the fileglob (the kernel structure that backs an fd) with the port. Since the fd is now represented by a Mach send right, it can be sent to another process as a port descriptor in a Mach message.

When another process receives a fileport, it can call fileport_makefd() to convert the send right back into a file descriptor. Since file descriptors represent files, pipes, and sockets, this is a convenient way to use POSIX conventions across Mach IPC.