Security topics: io_uring, VM attestation, and random-reseed notifications
The kernel-development community has recently been discussing a number of independent patches, each of which is intended to help improve the security of deployed systems in some way. They touch on a number of areas within the kernel, including the question of how widely io_uring should be available, how to allow virtual machines to attest to their integrity, and the best way to inform applications when their random-number generators need to be reseeded.
Disabling io_uring
The io_uring interface has been a boon to users striving for the best performance with I/O-heavy workloads; it has finally given Linux an approach to asynchronous I/O (and more) that the community can be proud of. It has also brought a number of security-related bugs, to the point that Google recently described it as being "safe only for use by trusted components". It is thus not surprising that somebody (Matteo Rizzo, in this case) has put together a patch allowing the system administrator to disable io_uring entirely.
This patch adds a new sysctl knob (kernel.io_uring_disabled) that controls the availability of the io_uring feature. At the knob's default value of zero, io_uring remains available as always. Setting it to one disables it for unprivileged users, where "privileged" is defined as having the CAP_SYS_ADMIN capability. In response to a request from Andres Freund after a previous posting, Rizzo added another knob, kernel.io_uring_group, that can be set with a group number; any process that is a member of the indicated group is also allowed to use io_uring. Finally, setting kernel.io_uring_disabled to two turns the feature off entirely.
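The three settings described above amount to a small decision table. What follows is only a user-space sketch of that policy, not the kernel's code: the function `io_uring_allowed()`, the enum names, and the boolean parameters are all hypothetical stand-ins for the checks the patch performs.

```c
#include <stdbool.h>

/* Hypothetical names for the three values of kernel.io_uring_disabled. */
enum io_uring_mode {
    IO_URING_MODE_ENABLED    = 0, /* default: available to every process */
    IO_URING_MODE_PRIVILEGED = 1, /* CAP_SYS_ADMIN or io_uring_group only */
    IO_URING_MODE_DISABLED   = 2  /* turned off for everybody */
};

/*
 * Illustrative policy check (an assumption, not the kernel implementation).
 * has_cap_sys_admin: the caller holds CAP_SYS_ADMIN.
 * in_io_uring_group: the caller is a member of kernel.io_uring_group.
 */
bool io_uring_allowed(int mode, bool has_cap_sys_admin, bool in_io_uring_group)
{
    switch (mode) {
    case IO_URING_MODE_ENABLED:
        return true;
    case IO_URING_MODE_PRIVILEGED:
        return has_cap_sys_admin || in_io_uring_group;
    default:
        /* Two, or any unexpected value: refuse, failing closed. */
        return false;
    }
}
```

Treating unknown values like "two" is a choice made for this sketch; the article does not say how the patch handles out-of-range settings.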
After five revisions, the patch seems about ready to go into the mainline; there does not seem to be any real opposition to it. One might wonder how long it will really be useful, though. As Ben Hawkes recently wrote, the bulk of the io_uring problems may have already been found:
The era of io_uring is probably coming to an end, but it's been a very popular area of research recently. It reminds me of the gold rush around unprivileged user namespaces. Basically these complex new kernel features are consistently more bug-prone than we'd like, and this pattern seems to repeat itself every few years.
In the case of io_uring, perhaps the worst problems have been found and the stream of vulnerabilities will begin to taper off.
Virtual-machine attestation
The field of confidential computing has put a lot of effort into the ability to run virtual machines that cannot be compromised or spied upon, even by the host computer on which those machines are run. Getting to that point requires a lot of system hardening, use of encryption, and hardware that provides features (such as encrypted memory) to protect virtual machines from the surrounding world. All that work will be for nothing, though, if a virtual machine is compromised in some way: if, for example, its data has been tampered with, or if the hardware features it is depending on are not really there.
Users of confidential-computing systems tend to start them and, after convincing themselves that all is well, entrust them with the encryption keys or other secrets they need to get their job done. For a virtual machine, convincing an orchestration system is a matter of using the available integrity-measurement mechanisms and having the CPU attest to its own integrity using a secret key buried deeply inside. All of this information can be signed by a device like a trusted platform module, then passed out of the machine, where it can be verified externally.
Numerous vendors are working on this functionality and, naturally, each is solving the problem in its own way. This, as Dan Williams noted in this patch series, is not the best way forward:
The approach of adding adding new char devs and new ioctls, for what amounts to the same logical functionality with minor formatting differences across vendors, is untenable. Common concepts and the community benefit from common infrastructure.
Williams is working to provide that infrastructure. The result is a configfs interface where the orchestration system can create a directory, then write nonce data to a special file (called inblob) within it. The virtual machine will then read the nonce data, incorporate it into its attestation report, and make it available to be read from outblob. The orchestrator can then verify the signatures and nonce data; if everything checks out, the machine should be safe to use.
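The write-then-read exchange might look something like the following from user space. This is a minimal sketch under stated assumptions: the function `fetch_attestation_report()` and its helper are hypothetical, the report directory is assumed to have been created already, and its location is passed in by the caller since the article does not give the configfs path. Only the inblob and outblob file names come from the description above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the attribute file <dir>/<name>; returns a descriptor or -1. */
static int open_attr(const char *dir, const char *name, int flags)
{
    char path[4096];
    int len = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (len < 0 || (size_t)len >= sizeof(path))
        return -1; /* path would have been truncated */
    return open(path, flags);
}

/*
 * Hypothetical helper: write the nonce to inblob, then read the report
 * back from outblob. Returns the number of report bytes, or -1 on error.
 */
ssize_t fetch_attestation_report(const char *report_dir,
                                 const void *nonce, size_t nonce_len,
                                 void *report, size_t report_cap)
{
    size_t total = 0;
    ssize_t n;
    int fd = open_attr(report_dir, "inblob", O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, nonce, nonce_len);
    close(fd);
    if (n < 0 || (size_t)n != nonce_len)
        return -1;

    fd = open_attr(report_dir, "outblob", O_RDONLY);
    if (fd < 0)
        return -1;
    /* Keep reading until end of file or until the buffer is full. */
    while (total < report_cap) {
        n = read(fd, (char *)report + total, report_cap - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

The bytes returned are still in a vendor-specific format, so verifying signatures and checking the nonce remains the caller's job.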
It's worth noting that this proposal says nothing about the format of the data written to and read from these configfs files; they are still specific to the confidential-computing mechanism that is in use. There is, evidently, a discussion underway concerning the standardization of this data, but it is not clear if or when that will happen. Meanwhile, though, there will at least be a uniform interface for working with this information.
Random reseeding
The kernel's random-number generator is meant to be fast, but it is still not fast enough for some users. In such cases, it is common to implement a pseudo-random-number generator in user space, which is seeded from the kernel at application startup. That can work well, but there is a problem: sometimes the random seed may be in danger of compromise and in need of replacement. This can happen, for example, if a virtual machine is snapshotted and later restored, resulting in two machines generating the same "random" number series from the same seed. This problem was addressed in the kernel in 2022, but it remains for user space.
The kernel is aware of events that may require reseeding a random-number generator; it is just a matter of making that information available to interested processes in user space. A system call to check whether reseeding is necessary could be added, but that would defeat the purpose of using the user-space generator in the first place; something faster is needed.
The approach currently under consideration can be seen in this short patch series from Babis Chalios. It allows a process to open /dev/random, invoke a new ioctl() to get a special-purpose file descriptor, then pass that descriptor to mmap() to map a single page of shared memory into the process's address space. That page contains a 32-bit value split into two fields: an eight-bit "notifier ID" and a 24-bit "epoch counter".
There are numerous notifiers in the kernel that may detect and signal the need to reseed the random-number generator; each of these is assigned a unique ID. Examples of notifiers might include the virtual-machine snapshot mechanism or a periodic timer. Whenever a notifier decides that a reseed is warranted, it increments the epoch counter and writes its own ID into the notifier-ID field; the combination of the two values ensures that the full 32-bit value will change with every update regardless of any races between notifiers. With this mechanism in place, a user-space process need only read this value before generating a random number; if it has changed since the last read, a reseed should happen before anything else.
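To make the arithmetic concrete, here is a small sketch of how such a value could be packed, updated, and checked. The bit layout (notifier ID in the top eight bits, epoch counter in the low 24) is an assumption, as the description above does not say which field goes where; all of the function names are hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout: bits 31..24 hold the notifier ID, bits 23..0 the counter. */
#define EPOCH_BITS 24u
#define EPOCH_MASK ((UINT32_C(1) << EPOCH_BITS) - 1)

uint32_t pack_epoch(uint8_t notifier_id, uint32_t counter)
{
    return ((uint32_t)notifier_id << EPOCH_BITS) | (counter & EPOCH_MASK);
}

uint8_t epoch_notifier_id(uint32_t value)
{
    return (uint8_t)(value >> EPOCH_BITS);
}

uint32_t epoch_counter(uint32_t value)
{
    return value & EPOCH_MASK;
}

/* Notifier side: bump the counter (wrapping at 24 bits) and stamp our ID. */
uint32_t notify_reseed(uint32_t current, uint8_t notifier_id)
{
    return pack_epoch(notifier_id, epoch_counter(current) + 1);
}

/*
 * Consumer side: compare the value just read from the shared page with the
 * one remembered from last time; record the new value if it has changed.
 */
bool reseed_needed(uint32_t *last_seen, uint32_t now)
{
    if (*last_seen == now)
        return false;
    *last_seen = now;
    return true;
}
```

A consumer compares the whole 32-bit value rather than the counter alone, so it has no need to decode the fields; the accessors are there only to show the layout.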
Some discussion on the details of the reporting format is still ongoing (Greg Kroah-Hartman suggested using two 32-bit values), but otherwise this mechanism, which was evidently hashed out at the 2022 Linux Plumbers Conference, appears to be uncontroversial. Unless something surprising happens, reseed notifications should be ready for merging by the time the 6.7 merge window opens.
Index entries for this article | |
---|---|
Kernel | Confidential computing |
Kernel | io_uring |
Kernel | Random numbers |
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 4, 2023 16:51 UTC (Mon) by tux3 (subscriber, #101245) [Link]
One very welcome set of patches. Though I'm not sure how useful that will be in its current state.
The library implementors that deal with those chardevs have much worse problems in that they need to parse the actual blobs, and deal with a lot of different nested vendor-specific structures everywhere.
The chardevs themselves are abstracted away in a couple functions and not heard from again by the rest of the code.
The content of the blobs themselves, the various options when generating them, and the convoluted mechanisms to verify their validity are vastly more frightening than the weird chardevs, in my humble experience :')
> This approach later allows for the standardization of the attestation blob format without needing to invent a new ABI. Once standardization happens the standard format can be emitted by $report/outblob and indicated by $report/provider, or a new attribute like "$report/tcg_coco_report" can emit the standard format alongside the vendor format.
... ah, there it is, music to my ears =)
The problem is that the current outblob is a giant flaming hairball of mud, sprinkled with vendor-specific options, where any operation on the blob involves covering your arms elbow-deep in vendor-specific mud.
The patch does make it easier to get that blob into your hands. A very mild relief washes over me.
But the real benefit will be if and when the players manage to standardize any part of the format, even if just a few fields at first. Here's to hoping!
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 4, 2023 22:31 UTC (Mon) by roc (subscriber, #30627) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 11:53 UTC (Tue) by hmh (subscriber, #3838) [Link]
The lack of correct-use and discoverability documentation at feature acceptance is the bane of proper widespread (and correct!) use of a *lot* of interesting kernel functionality. No matter how much (very appreciated!) effort sites like LWN make to offset this, an online article in an LWN edition has nowhere near the same long-term discoverability as appropriate documentation stored at the appropriate location, especially five or ten years from now.
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 14:18 UTC (Tue) by kaesaecracker (subscriber, #126447) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 16:12 UTC (Tue) by calumapplepie (subscriber, #143655) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 16:29 UTC (Tue) by mb (subscriber, #50428) [Link]
Anybody remember the Debian OpenSSL disaster?
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 19:50 UTC (Tue) by calumapplepie (subscriber, #143655) [Link]
This change will probably just add a new line to the openssl manpage laying out another circumstance that triggers reseeding, but existing systems will hopefully not be devastatingly insecure without it.
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 6, 2023 5:18 UTC (Wed) by ianmcc (subscriber, #88379) [Link]
// global - assume we previously initialized rnd_epoch_ptr
uint32_t rand_epoch = *rnd_epoch_ptr;

int get_random()
{
    // get random bytes
    int r = call_my_prng();
    // same type as rand_epoch, so the comparison below is unsigned
    uint32_t current_epoch = *rnd_epoch_ptr;
    while (current_epoch != rand_epoch) {
        rand_epoch = current_epoch;
        // re-seed the generator
        reseed_my_prng();
        // regenerate the random bytes
        r = call_my_prng();
        // check again in case the epoch changed while we were reseeding
        current_epoch = *rnd_epoch_ptr;
    }
    return r;
}
I agree with the need for good documentation, since I think the above code still probably isn't correct (eg surely it needs some memory barriers to stop the compiler re-ordering or eliding the reads from *rnd_epoch_ptr) and might have other problems too.
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 6, 2023 11:10 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 6, 2023 14:01 UTC (Wed) by ianmcc (subscriber, #88379) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 15:13 UTC (Tue) by sirdarckcat (subscriber, #155945) [Link]
What Ben said was that since it is being disabled, the age of io_uring is coming to an end, as security-critical systems will disable it.
The problems with it will remain, and are unlikely to be over any time soon, as going forward there will be less of an incentive to look for them (as it'll be disabled on systems that care about security, and io_uring is not good enough that it can be trusted).
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 18:30 UTC (Tue) by willy (subscriber, #9762) [Link]
In the context of the whole article where he prioritises Android-affecting bugs but considers enterprise distro bugs almost as important, he doesn't mean "Google have disabled it so problem solved". I'd expect Google to re-enable it in a few years once it's a bit more settled.
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 18:37 UTC (Tue) by sirdarckcat (subscriber, #155945) [Link]
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 21:24 UTC (Tue) by axboe (subscriber, #904) [Link]
As someone who received and dealt with these reports, I think it's safe to say that the main cause of most of them has long since been eliminated. io_uring initially had some unsafe practices for handling thread offload, for example, which was the main driver of a lot of them. These aren't relevant anymore, even in 5.10-stable. Outside of that, the few recent ones (around spring time) were all issues in older kernels where more prudent rewrites had eliminated those cases in recent kernels.
So while I don't think the patch from Matteo is particularly interesting now, it would've been earlier. But at the same time, it doesn't hurt, and it makes some people's lives easier, so...
Security topics: io_uring, VM attestation, and random-reseed notifications
Posted Sep 5, 2023 21:49 UTC (Tue) by sirdarckcat (subscriber, #155945) [Link]
Yes, I think you are talking about this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/li... which backported io_uring implementation from 5.15 to 5.10 and hence got rid of issues like
9eac1904d3364254d622bf2c771c4f85cd435fc2
I believe we've seen similarly serious io_uring bugs more recently though?
a26a35e9019fd70bf3cf647dcfdae87abc7bacea
and 12ad3d2d6c5b0131a6052de91360849e3e154846 for example but here are a couple more:
ef7dfac51d8ed961b742218f526bd589f3900a59
9d94c04c0db024922e886c9fd429659f22f48ea4
fc7222c3a9f56271fba02aabbfbae999042f1679
Or, do you mean there was another commit on 6.x which also had some unsafe practices for handling thread offload that were refactored away more recently?
Regards
random-reseed notifications
Posted Sep 7, 2023 9:58 UTC (Thu) by rwmj (subscriber, #5474) [Link]
random-reseed notifications
Posted Sep 7, 2023 10:05 UTC (Thu) by rwmj (subscriber, #5474) [Link]
In answer to my own link, VMGENID is at least mentioned in the patch series:
> The feature is similar to Microsoft's Virtual Machine Generation ID and it can be used to (1) avoid the race-condition that exists in our current VMGENID implementation, between the time vcpus are resumed and the ACPI notification is being handled and (2) propagate these events to user space through the random.c epoch mechanism.
It's a shame that (1) cannot be fixed, as VMGENID is a widespread mechanism, and covers many more cloning cases than this patch series.