Welcome to the next installment of infrequent deep dives into XNU, the kernel that powers Apple’s operating systems. In this post, I will walk through the Mach voucher subsystem from a software engineering perspective. Ian Beer discusses the Mach voucher system from an exploitation perspective in this post. References herein are from xnu-8020.140.41 and libdispatch-1325.120.2.

Vouchers are immutable kernel key-value stores, represented by a Mach port name (handle). While vouchers are immutable, they can be used to derive new vouchers with different attributes. Vouchers are maintained in the kernel by resource managers, as defined by the ipc_voucher_attr_manager struct vtable. Every possible voucher key has a corresponding resource manager. Resource managers provide the following operations:

Attribute managers are registered in the kernel using ipc_register_well_known_mach_voucher_attr_manager(). Voucher managers are currently in-kernel only. The presence of host_register_well_known_mach_voucher_attr_manager() suggests a design for allowing arbitrary userspace managers via MIG, but this is not currently supported. In addition, voucher managers can choose whether to participate in Mach message send pre- and post-processing.
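To make the "immutable store plus per-key manager" idea concrete, here is a minimal sketch in portable C. It is a toy model, not XNU code: every name prefixed with vch_ is hypothetical, the key values are made up, and the single accept_value callback stands in for the much larger ipc_voucher_attr_manager vtable. It only illustrates that a key needs a registered manager and that changing an attribute produces a new voucher rather than mutating the old one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key space for this sketch; these are not XNU's key values. */
enum vch_key { VCH_KEY_USER_DATA, VCH_KEY_BANK, VCH_KEY_IMPORTANCE, VCH_KEY_COUNT };

/* A one-callback stand-in for a manager vtable (assumption, not XNU's layout). */
typedef struct vch_manager {
    const char *name;
    int (*accept_value)(uint32_t value);   /* nonzero if the value is valid */
} vch_manager_t;

/* A voucher holds one value per key and is never modified after creation. */
typedef struct vch_voucher {
    uint32_t value[VCH_KEY_COUNT];          /* 0 means no attribute for that key */
} vch_voucher_t;

/* One manager slot per key, filled in by registration. */
static const vch_manager_t *vch_managers[VCH_KEY_COUNT];

static int vch_accept_nonzero(uint32_t value) { return value != 0; }
static const vch_manager_t vch_user_data_manager = { "user-data", vch_accept_nonzero };

/* Register the manager for a key. Returns 0, or -1 if the key is invalid or taken. */
static int vch_register_manager(enum vch_key key, const vch_manager_t *mgr)
{
    if ((int)key < 0 || key >= VCH_KEY_COUNT || mgr == NULL || vch_managers[key] != NULL)
        return -1;
    vch_managers[key] = mgr;
    return 0;
}

/* Derive a new voucher from prev with one attribute replaced; prev is untouched. */
static int vch_derive(const vch_voucher_t *prev, enum vch_key key,
                      uint32_t value, vch_voucher_t *out)
{
    assert(prev != NULL && out != NULL);
    if ((int)key < 0 || key >= VCH_KEY_COUNT || vch_managers[key] == NULL)
        return -1;                          /* no manager owns this key */
    if (!vch_managers[key]->accept_value(value))
        return -1;                          /* the manager rejected the value */
    *out = *prev;                           /* copy, then change only the copy */
    out->value[key] = value;
    return 0;
}
```

Calling vch_derive() before a manager is registered for the key fails, which mirrors the rule that every voucher key has a corresponding manager.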

As of macOS 13, there are the following attribute managers: User Data, pthread Priority, Bank, and Importance.

I’ll briefly describe each of those next:

User Data

This attribute class is fairly self-explanatory. It allows storing arbitrary userspace-supplied data up to USER_DATA_MAX_DATA (currently 16 KB) in the kernel, attached to the voucher object. Currently this is used by libdispatch to perform activity tracing between asynchronous dispatch continuations. See the _voucher_mach_udata_s struct for details.

pthread Priority

This attribute manager is marked as deprecated. It was used to encode a pthread priority value and transfer it over IPC. It has been superseded by the importance manager, as discussed below.

Bank

The Bank key is used to transmit the “persona” of a process for the purposes of resource accounting chargeback. Personas are a meta-user management/organization facility, and they are classified into persona types, such as PERSONA_SYSTEM, PERSONA_DEFAULT, and PERSONA_SYSTEM_PROXY. Personas are identified by a uid_t and are associated with an originating user logon name and process path. On macOS, personas do not appear to be widely used, though they can be displayed via ps aux -O prsna. The usage I observed appears related to system extensions or plugins. The purpose appears to be to charge the CPU time and energy that one process consumes on behalf of another back to a persona.

Importance

The importance attribute is the most complicated, and it interacts with several other userspace subsystems. The importance system is used to propagate quality-of-service (QoS) across IPCs.

Each task has an associated ipc_importance_task struct that contains the following booleans:

These flags are set depending on the task’s app type/role, per osfmk/kern/task_policy.c. Most user-facing applications are TASK_APPTYPE_APP_DEFAULT, which specifies live_donor and denap but neither donor nor receiver. Standard/legacy daemons are only donor. Adaptive daemons are receiver, but not live_donor, donor, or denap. Background daemons do not participate in importance at all.
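The role-to-flag mapping described above can be written down as a small lookup. This is a sketch of the prose only, not the logic in task_policy.c: the role_ enum names and the struct are hypothetical, and the real kernel derives these flags from TASK_APPTYPE_* values alongside other policy state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role names for this sketch; XNU uses TASK_APPTYPE_* constants. */
enum role_kind {
    ROLE_APP_DEFAULT,
    ROLE_DAEMON_STANDARD,
    ROLE_DAEMON_ADAPTIVE,
    ROLE_DAEMON_BACKGROUND
};

/* The four booleans discussed in the text. */
struct role_flags {
    bool live_donor;
    bool donor;
    bool receiver;
    bool denap;
};

/* Return the importance flags the text attributes to each role. */
static struct role_flags role_flags_for(enum role_kind role)
{
    struct role_flags f = { false, false, false, false };

    switch (role) {
    case ROLE_APP_DEFAULT:          /* user-facing apps */
        f.live_donor = true;
        f.denap = true;
        break;
    case ROLE_DAEMON_STANDARD:      /* standard/legacy daemons only donate */
        f.donor = true;
        break;
    case ROLE_DAEMON_ADAPTIVE:      /* adaptive daemons only receive */
        f.receiver = true;
        break;
    case ROLE_DAEMON_BACKGROUND:    /* no participation in importance */
        break;
    }
    return f;
}
```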

There are two ways in which the kernel propagates QoS between threads (including between processes) when sending and receiving a Mach message. In order for importance to propagate, the receiving Mach port must be configured with MACH_PORT_IMPORTANCE_RECEIVER. There is a related port flag MACH_PORT_DENAP_RECEIVER that indicates whether or not a process that is currently suppressed (i.e. in App Nap) can be boosted. Interestingly, despite there being two public definitions for those Mach port attributes, they both map to the same port property, ip_impdonation in the kernel.

The first way to propagate QoS is by using one of two flags to mach_msg(). The MACH_SEND_OVERRIDE flag uses the priority specified in the mach_msg_overwrite_trap_args. Oddly, this value comes from the notify parameter of mach_msg(); this argument was likely unused and re-purposed for this QoS value. Alternatively, the MACH_SEND_PROPAGATE_QOS flag instructs the kernel to record the current thread’s QoS in the message and propagate it to the receiver. If neither option is specified, the per-message priority is unspecified. See ipc_kmsg_set_qos() for details.
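The decision between the two flags can be summarized as a tiny function. This is a simplified model rather than the body of ipc_kmsg_set_qos(): the QSK_ constants are stand-ins for the real mach_msg() options, QoS is reduced to a plain integer, and letting the override win when both flags are set is my assumption for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MACH_SEND_OVERRIDE and MACH_SEND_PROPAGATE_QOS (values are arbitrary). */
#define QSK_SEND_OVERRIDE       0x1u
#define QSK_SEND_PROPAGATE_QOS  0x2u

/* Stand-in for "no per-message priority recorded". */
#define QSK_QOS_UNSPECIFIED     0u

/*
 * Pick the QoS recorded in an outgoing message.
 *   override_qos: the value the sender passed explicitly (the re-purposed
 *                 notify argument in the real API)
 *   thread_qos:   the sending thread's current QoS
 */
static uint32_t qsk_message_qos(uint32_t options, uint32_t override_qos, uint32_t thread_qos)
{
    if (options & QSK_SEND_OVERRIDE)
        return override_qos;        /* explicit value supplied by the sender */
    if (options & QSK_SEND_PROPAGATE_QOS)
        return thread_qos;          /* snapshot of the sender thread's QoS */
    return QSK_QOS_UNSPECIFIED;     /* neither flag: nothing recorded */
}
```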

The other, and preferred, way QoS is propagated is adaptive, via the mach_msg_header::msgh_voucher_port in a Mach message, with a voucher that contains an IMPORTANCE attribute. Like the other header ports, this port’s disposition is conferred in the msgh_bits field. In order for importance to be transferred this way, the receiving Mach port must be configured as a MACH_PORT_IMPORTANCE_RECEIVER, and MACH_RCV_VOUCHER must be specified to mach_msg(). The msgh_voucher_port does not need to be explicitly set by a sender, as the kernel will potentially produce a voucher when the message is in transit.
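To show what "the disposition is conferred in msgh_bits" means, here is a portable sketch of the bit packing. The masks and disposition numbers below are written from my recollection of mach/message.h (where MACH_MSGH_BITS_SET() does the equivalent packing) and are redefined locally with a BSK_ prefix so the example compiles anywhere. Treat the exact values as assumptions and check them against the header before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror MACH_MSGH_BITS_{REMOTE,LOCAL,VOUCHER}_MASK; verify against the header. */
#define BSK_BITS_REMOTE_MASK   0x0000001fu
#define BSK_BITS_LOCAL_MASK    0x00001f00u
#define BSK_BITS_VOUCHER_MASK  0x001f0000u

/* Assumed to mirror MACH_MSG_TYPE_MOVE_SEND, _COPY_SEND, and _MAKE_SEND_ONCE. */
#define BSK_TYPE_MOVE_SEND       17u
#define BSK_TYPE_COPY_SEND       19u
#define BSK_TYPE_MAKE_SEND_ONCE  21u

/* Pack the three header port dispositions into one msgh_bits-style word. */
static uint32_t bsk_msgh_bits(uint32_t remote, uint32_t local, uint32_t voucher)
{
    return (remote & BSK_BITS_REMOTE_MASK) |
           ((local << 8) & BSK_BITS_LOCAL_MASK) |
           ((voucher << 16) & BSK_BITS_VOUCHER_MASK);
}

/* Extract the voucher port's disposition from a packed word. */
static uint32_t bsk_voucher_disposition(uint32_t bits)
{
    return (bits & BSK_BITS_VOUCHER_MASK) >> 16;
}
```

For example, a request that copies a send right to the destination, makes a send-once right for the reply, and moves a voucher send right would pack all three dispositions into a single word, with the voucher's disposition in the third field.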

When a message is sent, the kernel invokes the importance subsystem via ipc_importance_send(). In that function, the ipc_kmsg is linked to a task’s ipc_importance_task struct. A reference to this struct is stored in the importance voucher. If the sender did not specify a voucher port, then the sending task’s importance is used; otherwise, the reference from the voucher is used. Sending with importance also sets the MACH_MSGH_BITS_RAISEIMP flag in the msgh_bits field of the message and increases the boost count of the receiving port. Boosts are associated with a specific port, but the boost then propagates up to the task whose IPC space contains the port; this is called an importance assertion. Boost propagation happens at send time, so that the receiver task is raised in priority for scheduling to wake it up. If this is the first boost the task has received, that takes the task out of App Nap suppression.

When the message is received, the kernel again calls into the importance subsystem via ipc_importance_receive(). This function will produce a new voucher, using the ipc_importance_task that the ipc_kmsg was linked to. It also finds an existing, or allocates a new, ipc_importance_inherit struct, which is used to track the number of externalized boosts between the two processes. The importance inheritance is stored in the new voucher, which is then attached to the msgh_voucher_port. Therefore, so long as the voucher port remains active, so does the boost. When the last send right to the voucher port is deallocated, the boost count is decremented, and the receiving task potentially loses its importance assertion.
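The send, receive, and release steps amount to reference counting, which the following toy model tries to capture. None of this is kernel code: the imp_ types are hypothetical, the ipc_importance_inherit bookkeeping is collapsed into a single voucher struct, and treating receive as moving the boost from the port into the voucher is my simplification of what the kernel calls externalizing a boost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A receiving task: how many importance assertions it holds, and its App Nap state. */
typedef struct imp_task {
    unsigned assertions;
    bool app_nap;
} imp_task_t;

/* A port owned by a task; importance_receiver models MACH_PORT_IMPORTANCE_RECEIVER. */
typedef struct imp_port {
    imp_task_t *owner;
    bool importance_receiver;
    unsigned boosts;
} imp_port_t;

/* The voucher handed to the receiver; it keeps the boost alive while rights remain. */
typedef struct imp_voucher {
    imp_task_t *task;       /* NULL if the voucher carries no boost */
    unsigned send_rights;
} imp_voucher_t;

/* Send time: boost the port and its owning task. Returns the RAISEIMP-style flag. */
static bool imp_send(imp_port_t *port)
{
    if (!port->importance_receiver)
        return false;
    port->boosts++;
    if (port->owner->assertions++ == 0)
        port->owner->app_nap = false;   /* first boost lifts App Nap suppression */
    return true;
}

/* Receive time: move the in-flight boost from the port into a new voucher. */
static imp_voucher_t imp_receive(imp_port_t *port, bool raiseimp)
{
    imp_voucher_t v = { NULL, 1 };
    if (raiseimp) {
        assert(port->boosts > 0);
        port->boosts--;
        v.task = port->owner;           /* the task assertion now follows the voucher */
    }
    return v;
}

/* Drop one send right; the boost ends when the last right goes away. */
static void imp_voucher_release(imp_voucher_t *v)
{
    assert(v->send_rights > 0);
    if (--v->send_rights == 0 && v->task != NULL) {
        v->task->assertions--;
        v->task = NULL;
    }
}
```

In this model, copying the voucher (incrementing send_rights) keeps the task boosted until every copy has been released, which is the property that lets a receiver forward or proxy a voucher.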

In the simple case, this allows an IPC sender to donate its system-level importance to a receiver, without needing to explicitly set QoS or create a voucher—so long as the receiver is set up to accept importance. But the voucher system allows for more complex propagation of importance state by having senders forward or proxy the voucher port.

Importance in Userspace

The above describes how importance propagates in the kernel between a sender process and a receiver. However, many systems that use IPC are multi-threaded and often do asynchronous operations. Apple’s XPC and libdispatch systems build on the kernel support to provide QoS propagation for this kind of programming. Unfortunately XPC is not open source, so most of this discussion is informed by the libdispatch source and debugging sample XPC programs. These two libraries are deeply intertwined: XPC connection events and messages are delivered onto dispatch queues with block event handlers.

When an XPC connection is created, it uses the private Mach channel API declared in private/mach_private.h for messaging. This internally uses the public DISPATCH_SOURCE_TYPE_MACH_RECV API, but the private mach_channel API is richer and provides libdispatch object-oriented wrappers around the low-level Mach libsyscall API. Specifically, it encapsulates bidirectional communication into a dispatch_mach_t object, messages into a dispatch_mach_msg_t object, and vouchers into a voucher_t object. The Mach channel API is also tightly integrated into the libdispatch queue system, and using the dispatch_mach_xpc_hooks_t, it provides special optimizations for XPC message dispatching.

The Mach channel code is activated by the central libdispatch kqueue via _dispatch_mach_merge_msg() when a Mach message is received. This uses _dispatch_mach_msg_create_recv() to transform the raw Mach message into a dispatch_mach_msg_t object. As part of this, it inspects the msgh_voucher_port, and from it creates a libdispatch voucher_t. It also records if the message was delivered with MACH_MSGH_BITS_RAISEIMP in the voucher_t::v_kv_has_importance field. The delivered message is then queued for handling on the appropriate libdispatch queue.

When the handler block is to be invoked, libdispatch uses _dispatch_mach_msg_invoke_with_mach() to set up the context. Before the client callout block is invoked, the current thread for the libdispatch queue adopts the QoS and voucher from the voucher_t. This happens using the _pthread_set_properties_self() call, which takes the pthread_priority_t and mach_port_t voucher set for the current thread. That private pthread function is also called by the public pthread_set_qos_class_self_np() API, which encodes the qos_class_t and relpri parameters into a pthread_priority_t. However, the public API does not provide a way to set the kernel thread’s voucher port. Setting the thread’s voucher port currently only appears to be used to implement current_persona_get_id().

Before invocation, all dispatch blocks – whether from dispatch_async(), event handlers, etc. – establish a dispatch_continuation_t. This is used to store, amongst other things, the voucher and priority of the current libdispatch thread. The continuation enables the system to propagate the voucher and QoS across libdispatch callouts as asynchronous work completes. For example, if an XPC message was received with importance and a QoS boost, libdispatch will invoke the XPC message handler on a queue with the boosted QoS. If that message handler in turn calls dispatch_async() (or some other libdispatch-using asynchronous API), that QoS will be recorded in a libdispatch continuation. When that asynchronous API ultimately completes and the next block is to be invoked, libdispatch will use that stored continuation to further propagate the QoS.
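The capture-then-adopt pattern can be sketched without libdispatch at all. The model below is hypothetical throughout: the dcs_ types are not libdispatch's, a voucher is reduced to an integer id, and a "thread" is just a struct. It only shows the two halves of the mechanism: recording the submitting thread's QoS and voucher when work is enqueued, and adopting them on whichever thread later runs the work.

```c
#include <assert.h>
#include <stddef.h>

/* A thread's propagated state: QoS level and a stand-in voucher id. */
typedef struct dcs_thread {
    unsigned qos;
    unsigned voucher;
} dcs_thread_t;

typedef void (*dcs_work_fn)(dcs_thread_t *self, void *ctx);

/* What is captured at enqueue time, loosely modeled on a dispatch continuation. */
typedef struct dcs_continuation {
    dcs_work_fn fn;
    void *ctx;
    unsigned qos;
    unsigned voucher;
} dcs_continuation_t;

/* Enqueue side: snapshot the submitting thread's QoS and voucher. */
static dcs_continuation_t dcs_async(const dcs_thread_t *current, dcs_work_fn fn, void *ctx)
{
    dcs_continuation_t dc = { fn, ctx, current->qos, current->voucher };
    return dc;
}

/* Worker side: adopt the captured state, run the work, then restore the worker. */
static void dcs_invoke(dcs_thread_t *worker, const dcs_continuation_t *dc)
{
    dcs_thread_t saved = *worker;
    worker->qos = dc->qos;
    worker->voucher = dc->voucher;
    dc->fn(worker, dc->ctx);
    *worker = saved;
}

/* Example work item: record the QoS the work observed into *ctx. */
static void dcs_record_qos(dcs_thread_t *self, void *ctx)
{
    *(unsigned *)ctx = self->qos;
}
```

If the thread calling dcs_async() is running an XPC handler at a boosted QoS, the work item observes that same QoS when it later runs on an unrelated worker, and the worker returns to its own state afterwards.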

Conclusion

The voucher importance system, combined with libdispatch, is extremely elegant: an XPC message sent from a high-priority process will propagate its importance to the message receiver, potentially boosting its priority as a result. As that receiver processes the message, potentially through asynchronous work or multiple dispatch_async() calls, that boosted QoS will follow the block continuations. When the last reference to the voucher is dropped, whether by replying to the message, by no longer creating libdispatch continuations that retain it, or both, the voucher’s Mach port will be destroyed, and the associated boost count will be decremented. Dynamically boosting and propagating QoS in this way means that the system can be much more efficient with resources, especially when using libdispatch to also dynamically manage the thread pool.