Security topics: io_uring, VM attestation, and random-reseed notifications

By Jonathan Corbet
September 4, 2023

The kernel-development community has recently been discussing a number of independent patches, each of which is intended to help improve the security of deployed systems in some way. They touch on a number of areas within the kernel, including the question of how widely io_uring should be available, how to allow virtual machines to attest to their integrity, and the best way to inform applications when their random-number generators need to be reseeded.

Disabling io_uring

The io_uring interface has been a boon to users striving for the best performance with I/O-heavy workloads; it has finally given Linux an approach to asynchronous I/O (and more) that the community can be proud of. It has also brought a number of security-related bugs, to the point that Google recently described it as being "safe only for use by trusted components". It is thus not surprising that somebody (Matteo Rizzo, in this case) has put together a patch allowing the system administrator to disable io_uring entirely.

This patch adds a new sysctl knob (kernel.io_uring_disabled) that controls the availability of the io_uring feature. At the knob's default value of zero, io_uring remains available as always. Setting it to one disables it for unprivileged users, where "privileged" is defined as having the CAP_SYS_ADMIN capability. In response to a request from Andres Freund after a previous posting, Rizzo added another knob, kernel.io_uring_group, that can be set with a group number; any process that is a member of the indicated group is also allowed to use io_uring. Finally, setting kernel.io_uring_disabled to two turns the feature off entirely.
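The access rules above boil down to a small decision function. What follows is a hypothetical user-space model for illustration, not the kernel's implementation; the enum and function names are invented here. It also assumes that the knob appears at /proc/sys/kernel/io_uring_disabled, following the usual mapping of sysctl names to /proc/sys paths.

```c
#include <stdbool.h>
#include <stdio.h>

/* Settings of kernel.io_uring_disabled as described in the patch posting. */
enum io_uring_disabled_mode {
    IO_URING_ALLOW_ALL = 0,   /* default: io_uring available to everyone */
    IO_URING_PRIVILEGED = 1,  /* CAP_SYS_ADMIN or members of kernel.io_uring_group */
    IO_URING_OFF = 2,         /* turned off entirely */
};

/* Hypothetical model of the access check; not kernel code. */
bool io_uring_permitted(int mode, bool has_cap_sys_admin, bool in_io_uring_group)
{
    switch (mode) {
    case IO_URING_ALLOW_ALL:
        return true;
    case IO_URING_PRIVILEGED:
        return has_cap_sys_admin || in_io_uring_group;
    case IO_URING_OFF:
    default:                  /* treat unexpected values conservatively */
        return false;
    }
}

/* Read the current setting, e.g. from /proc/sys/kernel/io_uring_disabled.
 * Returns -1 if the file is missing (as on a kernel without the patch)
 * or does not contain an integer. */
int read_io_uring_disabled(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

On a kernel carrying the patch, an administrator could select the middle setting with something like "sysctl -w kernel.io_uring_disabled=1" and then name the trusted group in kernel.io_uring_group.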

After five revisions, the patch seems about ready to go into the mainline; there does not seem to be any real opposition to it. One might wonder how long it will really be useful, though. As Ben Hawkes recently wrote, the bulk of the io_uring problems may have already been found:

The era of io_uring is probably coming to an end, but it's been a very popular area of research recently. It reminds me of the gold rush around unprivileged user namespaces. Basically these complex new kernel features are consistently more bug-prone than we'd like, and this pattern seems to repeat itself every few years.

In the case of io_uring, perhaps the worst problems have been found and the stream of vulnerabilities will begin to taper off.

Virtual-machine attestation

The field of confidential computing has put a lot of effort into the ability to run virtual machines that cannot be compromised or spied upon, even by the host computer on which those machines are run. Getting to that point requires a lot of system hardening, use of encryption, and hardware that provides features (such as encrypted memory) to protect virtual machines from the surrounding world. All that work will be for nothing, though, if a virtual machine is compromised in some way: if, for example, its data has been tampered with, or if the hardware features it is depending on are not really there.

Users of confidential-computing systems tend to start virtual machines and, after convincing themselves that all is well, entrust them with the encryption keys or other secrets those machines need to get their job done. For a virtual machine, convincing an orchestration system is a matter of using the available integrity-measurement mechanisms and having the CPU attest to its own integrity using a secret key buried deeply inside. All of this information can be signed by a device like a trusted platform module, then passed out of the machine, where it can be verified externally.

Numerous vendors are working on this functionality and, naturally, each is solving the problem in its own way. This, as Dan Williams noted in this patch series, is not the best way forward:

The approach of adding new char devs and new ioctls, for what amounts to the same logical functionality with minor formatting differences across vendors, is untenable. Common concepts and the community benefit from common infrastructure.

Williams is working to provide that infrastructure. The result is a configfs interface in which the orchestration system can create a directory and write nonce data to a special file (called inblob) within it. The virtual machine will then read the nonce data, incorporate it into its attestation report, and make that report available to be read from outblob. The orchestrator can then verify the signatures and nonce data; if everything checks out, the machine should be safe to use.
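The flow just described (create a directory, write a nonce to inblob, read the report back from outblob) can be sketched with plain POSIX file operations. This is a minimal sketch under assumptions: the directory location (something like /sys/kernel/config/tsm/report/report0 in the patch series) is not stated in the text above, the helper names are hypothetical, and the nonce size and report format remain vendor-specific, so the returned blob is treated as opaque bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a report directory. On configfs, mkdir() is what makes the kernel
 * populate the new directory with its attribute files. An existing
 * directory is reused. Returns 0 on success, -1 on error. */
int attest_create_report(const char *report_dir)
{
    if (mkdir(report_dir, 0700) == 0 || errno == EEXIST)
        return 0;
    return -1;
}

/* Write the verifier's nonce to <report_dir>/inblob, then read the opaque,
 * vendor-format report from <report_dir>/outblob into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t attest_get_report(const char *report_dir,
                          const void *nonce, size_t nonce_len,
                          unsigned char *out, size_t out_cap)
{
    char path[4096];
    size_t total = 0;
    ssize_t n;
    int fd;

    if (snprintf(path, sizeof(path), "%s/inblob", report_dir) >= (int)sizeof(path))
        return -1;
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    n = write(fd, nonce, nonce_len);
    close(fd);
    if (n < 0 || (size_t)n != nonce_len)
        return -1;

    if (snprintf(path, sizeof(path), "%s/outblob", report_dir) >= (int)sizeof(path))
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while (total < out_cap) {
        n = read(fd, out + total, out_cap - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;            /* end of the report */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

The returned bytes would then be shipped to the orchestrator for signature and nonce checks; parsing them is still a per-vendor job, which is the pain point raised in the comments below.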

It's worth noting that this proposal says nothing about the format of the data written to and read from these configfs files; they are still specific to the confidential-computing mechanism that is in use. There is, evidently, a discussion underway concerning the standardization of this data, but it is not clear if or when that will happen. Meanwhile, though, there will at least be a uniform interface for working with this information.

Random reseeding

The kernel's random-number generator is meant to be fast, but it is still not fast enough for some users. In such cases, it is common to implement a pseudo-random-number generator in user space, which is seeded from the kernel at application startup. That can work well, but there is a problem: sometimes the random seed may be in danger of compromise and in need of replacement. This can happen, for example, if a virtual machine is snapshotted and later restored, resulting in two machines generating the same "random" number series from the same seed. This problem was addressed in the kernel in 2022, but it remains for user space.
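To make the snapshot problem concrete, here is a minimal sketch of a user-space generator seeded from the kernel with getrandom() at startup. splitmix64 is used only because it is short; it is not a cryptographically secure generator, and the struct and function names are hypothetical. Copying the state struct stands in for snapshotting and restoring a virtual machine.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical user-space generator state (illustration only, NOT a CSPRNG). */
struct user_rng {
    uint64_t state;
};

/* Seed once from the kernel, as an application might at startup.
 * Returns 0 on success, -1 on failure. */
int user_rng_seed(struct user_rng *rng)
{
    ssize_t n = getrandom(&rng->state, sizeof(rng->state), 0);
    return n == (ssize_t)sizeof(rng->state) ? 0 : -1;
}

/* splitmix64 step: output depends only on user-space state, no system call. */
uint64_t user_rng_next(struct user_rng *rng)
{
    uint64_t z = (rng->state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}
```

Once the copy exists, the original and the "clone" emit exactly the same sequence until something reseeds one of them. Nothing in the generator itself can notice the copy, so the kernel has to say so.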

The kernel is aware of events that may require reseeding a random-number generator; it is just a matter of making that information available to interested processes in user space. A system call to check whether reseeding is necessary could be added, but that would defeat the purpose of using the user-space generator in the first place; something faster is needed.

The approach currently under consideration can be seen in this short patch series from Babis Chalios. It allows a process to open /dev/random, invoke a new ioctl() to get a special-purpose file descriptor, then pass that descriptor to mmap() to map a single page of shared memory into the process's address space. That page contains a 32-bit value split into two fields: an eight-bit "notifier ID" and a 24-bit "epoch counter".
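The 32-bit value can be packed and unpacked with a few bit operations. The text gives the field widths but not their order, so this sketch assumes that the notifier ID occupies the top eight bits and the epoch counter the low 24 bits; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bits 31..24 = notifier ID, bits 23..0 = epoch counter. */
#define EPOCH_BITS 24u
#define EPOCH_MASK ((UINT32_C(1) << EPOCH_BITS) - 1u)   /* 0x00FFFFFF */

/* Combine a notifier ID and an epoch counter into the shared 32-bit value. */
static inline uint32_t epoch_pack(uint8_t notifier_id, uint32_t epoch)
{
    return ((uint32_t)notifier_id << EPOCH_BITS) | (epoch & EPOCH_MASK);
}

/* Extract the eight-bit notifier ID. */
static inline uint8_t epoch_notifier_id(uint32_t value)
{
    return (uint8_t)(value >> EPOCH_BITS);
}

/* Extract the 24-bit epoch counter. */
static inline uint32_t epoch_counter(uint32_t value)
{
    return value & EPOCH_MASK;
}
```

For example, epoch_pack(0x12, 0x345678) yields 0x12345678 under this layout; any counter bits above bit 23 are discarded, so the counter wraps.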

There are numerous notifiers in the kernel that may detect and signal the need to reseed the random-number generator; each of these is assigned a unique ID. Examples of notifiers might include the virtual-machine snapshot mechanism or a periodic timer. Whenever a notifier decides that a reseed is warranted, it increments the epoch counter and writes its own ID into the notifier-ID field; the combination of the two values ensures that the full 32-bit value will change with every update regardless of any races between notifiers. With this mechanism in place, a user-space process need only read this value before generating a random number; if it has changed since the last read, a reseed should happen before anything else.
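The update-and-check protocol can be modeled in one process with C11 atomics. This sketch rests on several assumptions. It assumes the notifier ID sits in the top eight bits. The _Atomic variable stands in for the shared page. Notifiers are serialized with a compare-and-swap loop, which is one way to get the race-free behavior described, not necessarily what the patch does. The function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: notifier ID in bits 31..24, epoch counter in bits 23..0. */
#define EPOCH_BITS 24u
#define EPOCH_MASK ((UINT32_C(1) << EPOCH_BITS) - 1u)

/* Stand-in for the shared word. In the proposal the kernel writes it and
 * user space only reads it; here both sides live in one process. */
static _Atomic uint32_t shared_epoch;

/* Notifier side: bump the 24-bit counter and stamp this notifier's ID.
 * The compare-and-swap retries if another notifier updated the word first. */
void notifier_signal_reseed(uint8_t id)
{
    uint32_t old = atomic_load_explicit(&shared_epoch, memory_order_relaxed);
    uint32_t desired;

    do {
        uint32_t counter = (old + 1u) & EPOCH_MASK;   /* carry into bit 24 is dropped */
        desired = ((uint32_t)id << EPOCH_BITS) | counter;
    } while (!atomic_compare_exchange_weak_explicit(&shared_epoch, &old, desired,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Reader side: true if the word changed since *last_seen (which is updated),
 * meaning the user-space generator should be reseeded before further use. */
bool reseed_needed(uint32_t *last_seen)
{
    uint32_t now = atomic_load_explicit(&shared_epoch, memory_order_acquire);

    if (now == *last_seen)
        return false;
    *last_seen = now;
    return true;
}
```

The reader's check is a single load and compare, with no system call. Because the counter is only 24 bits, an equality test could miss a change only if the word came back to exactly the same 32-bit value between two reads, which would take a multiple of 16,777,216 updates ending with the same notifier ID.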

Some discussion of the details of the reporting format is still ongoing (Greg Kroah-Hartman suggested using two 32-bit values), but otherwise this mechanism, which was evidently hashed out at the 2022 Linux Plumbers Conference, appears to be uncontroversial. Unless something surprising happens, reseed notifications should be ready for merging by the time the 6.7 merge window opens.

Index entries for this article
Kernel	Confidential computing
Kernel	io_uring
Kernel	Random numbers

Posted Sep 4, 2023 16:51 UTC (Mon) by tux3 (subscriber, #101245)

> Williams is working to provide that infrastructure. The result is a configfs interface where the orchestration system can create a directory, write nonce data to a special file (called inblob). The virtual machine will then read the nonce data, incorporate it into its attestation report, and make it available to be read from outblob. The orchestrator can then verify the signatures and nonce data; if everything checks out, the machine should be safe to use.

One very welcome set of patches. Though I'm not sure how useful that will be in its current state.
The library implementors that deal with those chardevs have much worse problems in that they need to parse the actual blobs, and deal with a lot of different nested vendor-specific structures everywhere.
The chardevs themselves are abstracted away in a couple functions and not heard from again by the rest of the code.

The content of the blobs themselves, the various options when generating them, and the convoluted mechanisms to verify their validity are vastly more frightening than the weird chardevs, in my humble experience :')

> This approach later allows for the standardization of the attestation blob format without needing to invent a new ABI. Once standardization happens the standard format can be emitted by $report/outblob and indicated by $report/provider, or a new attribute like "$report/tcg_coco_report" can emit the standard format alongside the vendor format.

... ah, there it is, music to my ears =)

The problem is that the current outblob is a giant flaming hairball of mud, sprinkled with vendor-specific options, where any operation on the blob involves covering your arms elbow-deep in vendor-specific mud.
The patch does make it easier to get that blob into your hands. A very mild relief washes over me.

But the real benefit will be if and when the players manage to standardize any part of the format, even if just a few fields at first. Here's to hoping!

Posted Sep 4, 2023 22:31 UTC (Mon) by roc (subscriber, #30627)

Seems critically important that user-space must generate the new (pseudo) random numbers and *then* check for a notification before proceeding. Doing it the other way around would leave open a possible race. That needs to be documented very loudly because it will be easy for user-space to get this wrong.

Posted Sep 5, 2023 11:53 UTC (Tue) by hmh (subscriber, #3838)

Then please don't allow this to get merged without a companion *complete* manpage, and a Documentation/ patch. Hopefully the current patchsets already have it, even: I did not check, because this is a "general" rant post, not anything against this specific feature patchset, and it is not directed at @roc or anyone specific.

The lack of correct-use and discoverability documentation at feature acceptance is the bane of proper widespread (and correct!) use of a *lot* of interesting kernel functionality. No matter how much (very appreciated!) effort sites like LWN make to offset this, an online article in a LWN edition has nowhere the same long-term discoverability as appropriate documentation stored at the appropriate location, especially five or ten years from now.

Posted Sep 5, 2023 14:18 UTC (Tue) by kaesaecracker (subscriber, #126447)

Well, even if you do it like that, the VM could be saved after you got the random value and checked for the reseed but before you used it, so I don't know whether that helps significantly. I think it is more about e.g. having two servers use the same random seed for multiple random values.

Posted Sep 5, 2023 16:12 UTC (Tue) by calumapplepie (subscriber, #143655)

Agreed: the exploitability of a single predictable random number is heavily limited, especially since security-critical random number generation would probably use the kernel interfaces directly rather than have a userspace generator.

Posted Sep 5, 2023 16:29 UTC (Tue) by mb (subscriber, #50428)

I think it's not that uncommon to use a custom RNG + pool in userspace for security critical apps.
Anybody remember the Debian OpenSSL disaster?

Posted Sep 5, 2023 19:50 UTC (Tue) by calumapplepie (subscriber, #143655)

Okay, I poked around OpenSSL, and while it does still have a userspace random number generator, it will automatically reseed from the OS after a certain amount of time (by default). So if the VM you're spinning up or resuming from hibernation in or whatever has an updated time, I *think* that it will probably be greater than this reseed time and thus trigger a reseed when you attempt to use the data.

This change will probably just add a new line to the openssl manpage laying out another circumstance that triggers reseeding, but existing systems will hopefully not be devastatingly insecure without it.

Posted Sep 6, 2023 5:18 UTC (Wed) by ianmcc (subscriber, #88379)

Isn't there a race either way? I'd have thought that the correct way to use it would be something like:

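#include <stdint.h>

// provided elsewhere: the user-space PRNG and its reseed routine
int call_my_prng(void);
void reseed_my_prng(void);

// assume we previously initialized rnd_epoch_ptr to point into the shared page
extern const uint32_t *rnd_epoch_ptr;

// global - last epoch seen; set by init_rand_epoch() once rnd_epoch_ptr is valid
static uint32_t rand_epoch;

void init_rand_epoch(void)
{
    rand_epoch = *rnd_epoch_ptr;
}

int get_random(void)
{
    // get random bytes
    int r = call_my_prng();

    uint32_t current_epoch = *rnd_epoch_ptr;

    while (current_epoch != rand_epoch) {
        rand_epoch = current_epoch;

        // re-seed the generator
        reseed_my_prng();

        // regenerate the random bytes
        r = call_my_prng();

        current_epoch = *rnd_epoch_ptr;
    }
    return r;
}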

I agree with the need for good documentation, since I think the above code still probably isn't correct (e.g. surely it needs some memory barriers to stop the compiler re-ordering or eliding the reads from *rnd_epoch_ptr) and might have other problems too.

Posted Sep 6, 2023 11:10 UTC (Wed) by mathstuf (subscriber, #69389)

Wouldn't `rnd_epoch_ptr` correctly be marked as `volatile`? Or are the semantics around that not enough either?

Posted Sep 6, 2023 14:01 UTC (Wed) by ianmcc (subscriber, #88379)

I think volatile isn't sufficient. And once you've got the required memory barriers then volatile isn't necessary.
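One way to replace volatile is C11 <stdatomic.h>. The sketch below restates ianmcc's generate-then-check loop with atomic loads and full fences, which are intended to keep each epoch read after the generation step. The user_prng hooks and epoch_tracker structure are hypothetical, and "shared" stands in for a pointer into the mmap()ed page. As kaesaecracker points out, no ordering closes the window completely: a snapshot taken after the final check but before the value is used still yields a duplicate.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical hooks for a user-space generator and its reseed routine. */
struct user_prng {
    uint64_t (*next)(void *state);   /* produce output from the current seed */
    void (*reseed)(void *state);     /* fetch a fresh seed, e.g. via getrandom() */
    void *state;
};

/* shared stands in for a pointer into the mmap()ed page; last_seen must be
 * initialized from it before the first call. */
struct epoch_tracker {
    _Atomic uint32_t *shared;
    uint32_t last_seen;
};

/* Generate first, then check the epoch. If it moved, discard the value,
 * reseed and regenerate, looping in case another event arrives meanwhile. */
uint64_t get_random_u64(struct epoch_tracker *t, struct user_prng *p)
{
    uint64_t r = p->next(p->state);
    uint32_t now;

    /* Full fence: keep the epoch read from moving ahead of generation. */
    atomic_thread_fence(memory_order_seq_cst);
    now = atomic_load_explicit(t->shared, memory_order_relaxed);

    while (now != t->last_seen) {
        t->last_seen = now;
        p->reseed(p->state);
        r = p->next(p->state);
        atomic_thread_fence(memory_order_seq_cst);
        now = atomic_load_explicit(t->shared, memory_order_relaxed);
    }
    return r;
}
```

With this structure, a value generated before an epoch change is thrown away rather than returned, and the loop only exits once the generation step has been followed by an unchanged epoch read.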

Posted Sep 5, 2023 15:13 UTC (Tue) by sirdarckcat (subscriber, #155945)

> As Ben Hawkes recently wrote, the bulk of the io_uring problems may have already been found

What Ben said was that since it is being disabled, the age of io_uring is coming to an end, as security-critical systems will disable it.

The problems with it will remain, and are unlikely to be over any time soon, as going forward there will be less of an incentive to look for them (as it'll be disabled on systems that care about security, and io_uring is not good enough that it can be trusted).

Posted Sep 5, 2023 18:30 UTC (Tue) by willy (subscriber, #9762)

I agree that that is one possible interpretation of what Ben said, but I do not think it is the only interpretation. He could mean that "The initial security problems with this feature have now been found and fixed; there may be a long tail of minor bugs, but we're not likely to see anything like the number of bugs in the future".

In the context of the whole article where he prioritises Android-affecting bugs but considers enterprise distro bugs almost as important, he doesn't mean "Google have disabled it so problem solved". I'd expect Google to re-enable it in a few years once it's a bit more settled.

Posted Sep 5, 2023 18:37 UTC (Tue) by sirdarckcat (subscriber, #155945)

We can just ask Ben https://twitter.com/sirdarckcat/status/1699129492281622609 🤷‍♂️

Posted Sep 5, 2023 21:24 UTC (Tue) by axboe (subscriber, #904)

> The problems with it will remain, and are unlikely to be over any time soon [...]

As someone who received and dealt with these reports, I think it's safe to say that the main cause of most of them has long since been eliminated. io_uring initially had some unsafe practices for handling thread offload, for example, which was the main driver of a lot of them. These aren't relevant anymore, even in 5.10-stable. Outside of that, the few recent ones (around spring time) were all issues in older kernels where more prudent rewrites had eliminated those cases in recent kernels.

So while I don't think the patch from Matteo is particularly interesting now, it would've been earlier. But at the same time, it doesn't hurt, and it makes some people's lives easier, so...

Posted Sep 5, 2023 21:49 UTC (Tue) by sirdarckcat (subscriber, #155945)

Thanks!

Yes, I think you are talking about this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/li... which backported the io_uring implementation from 5.15 to 5.10 and hence got rid of issues like
9eac1904d3364254d622bf2c771c4f85cd435fc2

I believe we've seen similarly serious io_uring bugs more recently though?
a26a35e9019fd70bf3cf647dcfdae87abc7bacea
and 12ad3d2d6c5b0131a6052de91360849e3e154846 for example but here are a couple more:
ef7dfac51d8ed961b742218f526bd589f3900a59
9d94c04c0db024922e886c9fd429659f22f48ea4
fc7222c3a9f56271fba02aabbfbae999042f1679

Or, do you mean there was another commit on 6.x which also had some unsafe practices for handling thread offload that were refactored away more recently?

Regards

random-reseed notifications

Posted Sep 7, 2023 9:58 UTC (Thu) by rwmj (subscriber, #5474)

Are we reinventing VMGENID? https://github.com/libguestfs/virt-v2v/blob/master/docs/v...

random-reseed notifications

Posted Sep 7, 2023 10:05 UTC (Thu) by rwmj (subscriber, #5474)

In answer to my own link, VMGENID is at least mentioned in the patch series:

The feature is similar to Microsoft's Virtual Machine Generation ID and
it can be used to (1) avoid the race-condition that exists in our
current VMGENID implementation, between the time vcpus are resumed and
the ACPI notification is being handled and (2) propagate these events to
user space through the random.c epoch mechanism.

It's a shame that (1) cannot be fixed, as VMGENID is a widespread mechanism, and covers many more cloning cases than this patch series.