December 27, 2021
realtime audio on macOS in the age of asymmetrical multicore CPUs
It's that time again when I bitch about, and document, my experiences dealing with Apple's documentation/APIs/etc. For some reason I never feel the need to do this on other platforms -- maybe they tend to have better documentation, or less fuss to deal with, I'm not sure why -- but if you search for "macOS" on this blog you'll find previous installments. Anyway, let's begin.
A bit over a year ago Apple started making computers with their own CPUs, starting with the M1. These have 8 or more cores, but a mix of slower and faster cores, with the slower cores having lower power consumption (whether they are more efficient per unit of work done is unclear -- it wouldn't surprise me if their efficiency was similar under full load, but now I'm just guessing).
The implications of these asymmetric cores for realtime audio are pretty complex and can produce all sorts of weird behavior. The biggest issue seems to be when your code ends up running on the efficiency cores even though you need the results ASAP, causing underruns. Counterintuitively, it seems that under very light load things work well, and under very heavy load things work well, but for medium loads there is failure. Also counterintuitively, the newer M1 Pro and M1 Max CPUs, with more performance cores (6-8) and fewer efficiency cores (2), seem to have a larger "medium load" range where things don't work well.
Perhaps this was all obvious and documented and I failed to read the right things, but I'm putting this here in case somebody like me finds it useful. The short summary:
- Ignore the thread QoS APIs; for realtime audio they're apparently not applicable (and do not address these issues). This was the biggest timesink for me -- I spent a ton of time going "why doesn't this QoS setting do anything?" Also, Xcode has a CPU meter, and for each thread it says "QoS unavailable"... so confusing.
- If a normal thread yields via usleep() or pthread_cond_timedwait() for more than a few hundred microseconds, it'll likely end up running on an efficiency core when it resumes (and it takes an eternity, in terms of audio blocks, to get bumped back to a performance core, by which time there's been an underrun and the thread will probably go back to sleep anyway). Reducing all sleeps/waits to at most a few hundred microseconds is a way to avoid that fate (though Apple recommends against spinning, likely for good reason). It's not ideal, but you can effectively pin normal threads to performance cores using this method.
- Porting Your Audio Code to Apple Silicon was the most helpful guide (I wish I had seen the link at the bottom of one of the other less-helpful guides sooner! so much time wasted...), though it assumes some knowledge which doesn't seem to be linked in the document:
You want to get your realtime threads into the same thread workgroup as the CoreAudio device's (obtained via kAudioDevicePropertyIOThreadOSWorkgroup). To do that, you first have to make your threads realtime threads using thread_policy_set(THREAD_TIME_CONSTRAINT_POLICY) (side note: we probably should have been doing this already, doh), ideally with parameters similar to what the CoreAudio thread uses, which seems to be a period and constraint of (1000000000.0 * blocksize * mach_timebase_info().denom) / (mach_timebase_info().numer * srate), and a computation of half that. If you don't set this policy, adding your thread to the workgroup will fail (EINVAL in that case means "thread is not realtime" and not "workgroup is canceled" per the docs). Once you do that, you do effectively get your threads locked to performance cores, and can start breathing again.