Welcome to the next installment of infrequent deep dives into XNU, the kernel that powers Apple’s operating systems. In this post, I will walk through the Mach voucher subsystem from a software engineering perspective. Ian Beer discusses the Mach voucher system from an exploitation perspective in this post. References herein are from xnu-8020.140.41 and libdispatch-1325.120.2.
Vouchers are immutable kernel key-value stores, represented by a Mach port name
(handle). While vouchers are immutable, they can be used to derive new vouchers
with different attributes. Vouchers are maintained in the kernel by resource
managers, as defined by the ipc_voucher_attr_manager
struct
vtable.
Every possible voucher key has a corresponding resource manager. Resource
managers provide common operations, such as MACH_VOUCHER_ATTR_REDEEM, as well
as manager-specific ones.

Attribute managers are registered in the kernel using
ipc_register_well_known_mach_voucher_attr_manager(). Voucher managers are
currently in-kernel only, but it looks like there are some designs in
host_register_well_known_mach_voucher_attr_manager()
to allow arbitrary
userspace managers via MIG. However, these are not currently supported. In
addition, voucher managers can choose whether to participate in Mach message
send pre- and post-processing.
As of macOS 13, there are the following attribute managers:

- MACH_VOUCHER_ATTR_KEY_USER_DATA in osfmk/ipc/ipc_voucher.c
- MACH_VOUCHER_ATTR_KEY_IMPORTANCE in osfmk/ipc/ipc_importance.c
- MACH_VOUCHER_ATTR_KEY_BANK in osfmk/bank/bank.c
- MACH_VOUCHER_ATTR_KEY_PTHPRIORITY in osfmk/voucher/ipc_pthread_priority.c

I’ll briefly describe each of those next:
This attribute class is fairly self-explanatory. It allows storing arbitrary
userspace-supplied data up to USER_DATA_MAX_DATA
(currently 16 KB) in the
kernel, attached to the voucher object. Currently this is used by libdispatch
to perform activity tracing between asynchronous dispatch continuations. See
the _voucher_mach_udata_s
struct for details.
This attribute manager is marked as deprecated. It was used to encode a pthread priority value and transfer it over IPC. It has been superseded by the importance manager, as discussed below.
The Bank key is used to transmit the “persona” of a process for the purposes of
resource accounting chargeback. Personas are a meta-user
management/organization facility, and they are classified into persona types,
such as PERSONA_SYSTEM
, PERSONA_DEFAULT
, and PERSONA_SYSTEM_PROXY
.
Personas are identified by a uid_t
and are associated with an originating
user logon name and process path. On macOS, personas do not appear to be widely
used, though they can be displayed via ps aux -O prsna
. The usage I observed
appears related to system extensions or plugins. The purpose of this appears to
be to account back to a persona the CPU time and energy usage that were
consumed by one process on behalf of another.
The importance attribute is the most complicated, and it interacts with several other userspace subsystems. The importance system is used to propagate quality-of-service (QoS) across IPCs.
Each task has an associated ipc_importance_task
struct that contains the
following booleans:

- iit_receiver: “the task can receive importance boost”
- iit_denap: “the task can be awaked from App Nap”
- iit_donor: “the task always sends boosts regardless of boost status”
- iit_live_donor: “the task temporarily sends boosts regardless of boost status”

These flags are set depending on the task’s app type/role, per
osfmk/kern/task_policy.c. Most user-facing applications are
TASK_APPTYPE_APP_DEFAULT
, which specifies live_donor
and denap
but not
donor
nor receiver
. Standard/legacy daemons are only donor
. Adaptive
daemons are receiver
, but not live_donor
, donor
, nor denap
.
Background daemons do not participate in importance at all.
There are two ways in which the kernel propagates QoS between threads
(including between processes) when sending and receiving a Mach message. In
order for importance to propagate, the receiving Mach port must be configured
with MACH_PORT_IMPORTANCE_RECEIVER
. There is a related port flag
MACH_PORT_DENAP_RECEIVER
that indicates whether or not a process that is
currently suppressed (i.e. in App Nap) can be boosted. Interestingly, despite
there being two public definitions for those Mach port attributes, they both
map to the same port property, ip_impdonation
in the kernel.
The first way to propagate QoS is by using one of two flags to mach_msg()
.
The MACH_SEND_OVERRIDE
flag uses the priority specified in the
mach_msg_overwrite_trap_args
. Oddly, this value comes from the notify
parameter of mach_msg()
; this argument was likely unused and re-purposed for
this QoS value. Alternatively, the MACH_SEND_PROPAGATE_QOS
flag instructs the
kernel to record the current thread’s QoS in the message and propagate it to
the receiver. If neither option is specified, the per-message message priority
is unspecified. See ipc_kmsg_set_qos()
for details.
The other, and preferred, way QoS is propagated is adaptive, via the
mach_msg_header::msgh_voucher_port
in a Mach message, with a voucher that
contains an IMPORTANCE
attribute. Like the other header ports, this port’s
disposition is conferred in the msgh_bits
field. In order for importance to
be transferred this way, the receiving Mach port must be configured as a
MACH_PORT_IMPORTANCE_RECEIVER
, and MACH_RCV_VOUCHER
must be specified to
mach_msg()
. The msgh_voucher_port
does not need to be explicitly set by a
sender, as the kernel will potentially produce a voucher when the message is in
transit.
When a message is sent, the kernel invokes the importance subsystem via
ipc_importance_send()
. In that function, the ipc_kmsg
is linked to a
task’s ipc_importance_task
struct. A reference to this struct is stored in
the importance voucher. If the sender did not specify a voucher port, then the
sending task’s importance is used; otherwise, the reference from the voucher is
used. Sending with importance also sets the MACH_MSGH_BITS_RAISEIMP
flag in
the msgh_bits
field of the message and increases the boost count of the
receiving port. Boosts are associated with a specific port, but the boost then
propagates up to the task whose IPC space contains the port; this is called an
importance assertion. Boost propagation happens at send time, so that the
receiver task is raised in priority for scheduling to wake it up. If this is
the first boost the task has received, that takes the task out of App Nap
suppression.
When the message is received, the kernel again calls into the importance
subsystem via ipc_importance_receive()
. This function will produce a new
voucher, using the ipc_importance_task
that the ipc_kmsg
was linked to. It
also finds an existing, or allocates a new, ipc_importance_inherit
struct,
which is used to track the number of externalized boosts between the two
processes. The importance inheritance is stored in the new voucher, which is
then attached to the msgh_voucher_port. Therefore, so long as the voucher
port remains active, so does the boost. When the last send right to the voucher
port is deallocated, the boost count is decremented, and the receiving task
potentially loses its importance assertion.
In the simple case, this allows an IPC sender to donate its system-level importance to a receiver, without needing to explicitly set QoS or create a voucher—so long as the receiver is set up to accept importance. But the voucher system allows for more complex propagation of importance state by having senders forward or proxy the voucher port.
The above describes how importance propagates in the kernel between a sender process and a receiver. However, many systems that use IPC are multi-threaded and often do asynchronous operations. Apple’s XPC and libdispatch systems build on the kernel support to provide QoS propagation for this kind of programming. Unfortunately XPC is not open source, so most of this discussion is informed by the libdispatch source and debugging sample XPC programs. These two libraries are deeply intertwined: XPC connection events and messages are delivered onto dispatch queues with block event handlers.
When an XPC connection is created, it uses the private Mach channel API
declared in private/mach_private.h for messaging. This internally uses the
public DISPATCH_SOURCE_TYPE_MACH_RECV
API, but the private mach_channel
API
is richer and provides libdispatch object-oriented wrappers around the
low-level Mach libsyscall API. Specifically, it encapsulates bidirectional
communication into a dispatch_mach_t
object, messages into a
dispatch_mach_msg_t
object, and vouchers into a voucher_t
object. The Mach
channel API is also tightly integrated into the libdispatch queue system, and
using the dispatch_mach_xpc_hooks_t
, it provides special optimizations for
XPC message dispatching.
The Mach channel code is activated by the central libdispatch kqueue via
_dispatch_mach_merge_msg()
when a Mach message is received. This uses
_dispatch_mach_msg_create_recv()
to transform the raw Mach message into a
dispatch_mach_msg_t
object. As part of this, it inspects the
msgh_voucher_port
, and from it creates a libdispatch voucher_t
. It also
records if the message was delivered with MACH_MSGH_BITS_RAISEIMP
in the
voucher_t::v_kv_has_importance
field. The delivered message is then queued
for handling on the appropriate libdispatch queue.
When the handler block is to be invoked, libdispatch uses
_dispatch_mach_msg_invoke_with_mach()
to set up the context. Before the
client callout block is invoked, the current thread for the libdispatch queue
adopts the QoS and voucher from the voucher_t
. This happens using the
_pthread_set_properties_self()
call, which takes the pthread_priority_t
and
mach_port_t
voucher set for the current thread. That private pthread function
is also called by the public pthread_set_qos_class_self_np()
API, which
encodes the qos_class_t
and relpri
parameters into a pthread_priority_t
.
However, the public API does not provide a way to set the kernel thread’s
voucher port. Setting the thread’s voucher port currently only appears to be
used to implement current_persona_get_id()
.
Before invocation, all dispatch blocks – whether from dispatch_async()
, event
handlers, etc. – establish a dispatch_continuation_t
. This is used to store,
amongst other things, the voucher and priority of the current libdispatch
thread. The continuation enables the system to propagate the voucher and QoS
across libdispatch callouts as asynchronous work completes. For example, if an
XPC message was received with importance and a QoS boost, libdispatch will
invoke the XPC message handler on a queue with the boosted QoS. If that message
handler in turn calls dispatch_async()
(or some other libdispatch-using
asynchronous API), that QoS will be recorded in a libdispatch continuation.
When that asynchronous API ultimately completes and the next block is to be
invoked, libdispatch will use that stored continuation to further propagate the
QoS.
The voucher importance system, combined with libdispatch, is extremely elegant:
an XPC message sent from a high-priority process will propagate its importance
to the message receiver, potentially boosting its priority as a result. As that
receiver processes the message, potentially through asynchronous work or
multiple dispatch_async()
calls, that boosted QoS will follow the block
continuations. When the last reference to the voucher is dropped, either by
replying to the message and/or not creating any more libdispatch continuations
to retain it, the voucher’s Mach port will be destroyed, and the associated
boost count will be decremented. Dynamically boosting and propagating QoS in
this way means that the system can be much more efficient with resources,
especially when using libdispatch to also dynamically manage the thread pool.