trusted_for() bounces off the merge window

By Jake Edge
April 12, 2022

When last we looked in on the proposed trusted_for() system call, which would allow user-space interpreters and other tools to ask the kernel whether a file is "trusted" for execution, it looked like it was on track for the mainline. That was back in October 2020; the patch has been updated multiple times since then, made its way into linux-next, and a pull request was made by Mickaël Salaün for the 5.18 merge window. But it seems that there will be more to the story of getting this functionality into the kernel, as Linus Torvalds declined to pull trusted_for(), at least partly because he did not like the name, but there were other reasons as well. While he is not opposed to the functionality it would provide, he also had strong feelings that a new system call was not the right approach.

Background

The patch has been through 18 versions since it was first introduced in 2018. It started out as a new flag (O_MAYEXEC) for the openat2() system call. The idea behind it is fairly straightforward: the kernel enforces a number of security checks on files before they can be executed, but various kinds of tools can simply read files in order to execute them. Those files are not subject to the same checks, since the kernel is unaware that they contain code to be executed; finding a way to apply the same checks to files that are, effectively, being opened for execution, is the goal of Salaün's work.

Obviously, user space needs to be involved since the kernel cannot know that any file being opened is going to be used that way—the vast majority of files are not, after all. Python and other tools are interested in supporting security checks for files containing code (see PEP 578, for example), but there will clearly be a long tail of tools needing to inform the kernel of their intention and some may well resist or be uninterested in doing so. There would be value in having the feature for some types of locked-down systems that only have "well-behaved" tools that make the check.

Along the way, Al Viro, maintainer of the virtual filesystem (VFS) layer, complained that openat2() was not the proper place for handling this kind of check. He suggested a new system call instead. The next version of the patches moved to an AT_INTERPRETED flag for the faccessat2() system call, but Viro thought that was not any better and again suggested a new system call.

After a round of bikeshedding about the name, Salaün decided on trusted_for(). The subsequent revisions were mostly cosmetic changes or updating the code for more recent kernels. It looks nearly the same as it did in our article a year and a half ago:

    int trusted_for(const int fd, const int usage, const unsigned int flags);

The call will check the file indicated by fd to see if it is allowed for the usage (TRUSTED_FOR_EXECUTION is the only option currently defined); flags is, as yet, unused. It will return zero if the file is trusted or EACCES if it is not. By default, however, trusted_for() does not actually do anything, but there is a new fs.trusted_for_policy sysctl knob that can be set to have it check for files on a filesystem mounted with noexec, files that do not have execute permission, or both.

No merge

After the 5.18 merge window had closed without trusted_for() being pulled, both Salaün and Kees Cook asked about the status. It turns out that Torvalds was not happy to see a new, non-standard system call with a "completely random interface with no semantics except for random 'future flags'". Salaün disagreed that the semantics were unspecified; "I think the semantic is well defined: 'This new syscall enables user space to ask the kernel: is this file descriptor's content trusted to be used for this purpose?'"

Torvalds had a few other complaints as well:

What the system call seems to actually *want* is basically a new flag to access() (and faccessat()). One that is very close to what X_OK already is.
[...] No way will this ever get merged, and whoever came up with that disgusting "trusted_for()" (for WHAT? WHO TRUSTS? WHY?) should look themselves in the mirror.
If you add a new X_OK variant to access(), maybe that could fly.

The X_OK flag for access() (and faccessat2()) is used to determine whether the process has permission to execute a given file, using the real user and group IDs (rather than the effective IDs, which could be different for set-user-ID programs). For faccessat2(), the AT_EACCESS flag can be used to check the effective IDs instead. As Salaün noted, though, Torvalds's suggestion was similar to what Salaün had earlier done with AT_INTERPRETED for faccessat2(); he is willing to go back to that mechanism and wondered if Torvalds liked that approach better.

Torvalds looked at the earlier patch, which he said was a more reasonable approach, though he had some specific questions and suggestions. He wondered why a new mode bit, perhaps called EXECVE_OK, could not be used instead of adding the new AT_INTERPRETED flag value. That way it could be used for both access(), which lacks a flags parameter, and for faccessat2(); that makes more sense given what is being checked. The currently defined mode bits for those system calls check for read, write, or execute access.

Salaün agreed that using a mode bit was a better choice. Some of the other oddities that Torvalds noted in the patch were due to it being an early version of the feature on a path that was quickly abandoned after Viro's objection. Salaün plans to update the patch and resubmit, though one might guess Viro will still have the same objections, so how far it all goes is not clear. In addition, if further checks are added, such as for Linux security module (LSM) access restrictions or file-integrity verification, it may be done by way of additional bits on fs.trusted_for_policy (with a new name), but it will require additional code for access()/faccessat2() to actually perform the checks.

Bikeshed history

Ted Ts'o suggested that the history of the evolution of the feature would be a good addition to the changelog:

As a suggestion, something that can be helpful for something which has been as heavily bike-sheded as this concept might be to write a "legislative history", or perhaps, a "bike shed history".
And not just with links to mailing list discussions, but a short summary of why, for example, we moved from the open flag O_MAYEXEC to the faccessat(2) approach. I looked, but I couldn't find the reasoning while diving into the mail archives. [...]
It might be that when all of this is laid out, we can either revisit prior design decisions as "that bike-shed request to support this corner case was unreasonable", or "oh, OK, this is why we need as fully general a solution as this".

Some of that information is contained in the patch that actually adds the system call, though it mostly just lists the changes for each version without a lot of explanation of the sort Ts'o is looking for. This article and the earlier two may also help fill in some of those holes.

Overall, it is a fairly simple feature that could provide some useful functionality in specialized environments. But where it actually will live has been rather difficult to resolve. Given Torvalds's preference, returning to the plan for putting it in access() and faccessat2() looks like it has a plausible future, but we will have to see how version 19 (and beyond) of the patch set fares.



Posted Apr 13, 2022 2:56 UTC (Wed) by clay.sweetser@gmail.com (guest, #155278)

Is there a reason why interpreters can't just fstat() a file descriptor, and check whether the file has executable permission before running it?

I also feel like this functionality is overly focused - the whole "how can a program tell if a script should be interpreted" seems like something file metadata would be better suited for (or some central database).

As an example, on Windows both Chrome and Firefox store the "origin" of downloaded files as part of a file's metadata. Then, if a user attempts to execute such a file, the operating system displays an "are you sure you want to execute this?" prompt to the user. Putting aside whether that *particular* mechanism is worthwhile, the general idea seems close to what trusted_for is supposed to do.

Posted Apr 13, 2022 8:57 UTC (Wed) by Baughn (subscriber, #124425)

The kernel does more checks than just the execute flag. Just to name one, the filesystem may be mounted with noexec.

There are others, there may be more in the future, and handling all the checks in one spot—the kernel—makes more sense than duplicating it across a dozen potentially buggy interpreters.

Posted Apr 13, 2022 23:16 UTC (Wed) by simcop2387 (subscriber, #101710)

> Is there a reason why interpreters can't just fstat() a file descriptor, and check whether the file has executable permission before running it?

This trivially leads to a Time-Of-Check, Time-Of-Use race condition since it's not a property of the FD but instead of the entry on the file system that has the execute permission.

Posted Apr 14, 2022 0:28 UTC (Thu) by clay.sweetser@gmail.com (guest, #155278)

Huh, I didn't know that. I had assumed that a file descriptor kept a snapshot of the file's permissions when it was initially created, and so would not be affected by future permission changes.

Posted Apr 14, 2022 2:10 UTC (Thu) by NYKevin (subscriber, #129325)

A file descriptor *does* do that, sort of. It's just that the permissions it keeps track of are O_RDONLY, O_WRONLY, etc., rather than S_IRUSR etc. (i.e. it keeps track of what you asked to do when you called open(2), not what stat(2) or access(2) would have told you). That's probably why the first version of this patch was attempting to add a flag to open(2).

Posted Apr 13, 2022 18:40 UTC (Wed) by dullfire (subscriber, #111432)

Ultimately this idea looks much like forcing policy into kernel space.

The kernel doesn't decide whether the interpreter can run the script (the interpreter does, meaning the interpreter could be written to ignore the syscall). Furthermore, the kernel doesn't actually know the answer to the question with its resources. It *could* tell you if the file is executable or not. However, if that's what you want, there's an easy solution: make the interpreter only work as an interpreter (and/or use access(X_OK)). Those are the things the kernel knows about or has control over.

Posted Apr 13, 2022 19:15 UTC (Wed) by atnot (subscriber, #124910)

I recall an important use case for this was to integrate well with things like selinux and IMA. In those cases, the kernel would be responsible for security policy and verifying signatures or hashes for executables already. However, for interpreters that causes issues, as the kernel has no way of knowing whether a file is being opened for regular reading or execution, as only the latter should require special verification from the kernel.

Posted Apr 21, 2022 7:42 UTC (Thu) by njs (subscriber, #40338)

The kernel is already responsible for making the policy decision "is it ok to execute this ELF binary". This is just letting you reuse that same policy machinery for interpreters.

It helps to remember that this is mostly useful for systems with fancy security hardening, e.g. ones where all executables have to be signed with a trusted key, as a way to make life difficult for attackers. This feature should make it possible for that kind of system to e.g. ship a regular python interpreter, without breaking the security hardening.

Posted Apr 28, 2022 7:36 UTC (Thu) by arnout (subscriber, #94240)

How is access() or faccessat2() supposed to be used safely? Is there a way to avoid the TOCTOU that is mentioned in the man page?

Warning: Using these calls to check if a user is authorized to, for example, open a file before actually doing so using open(2) creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. For this reason, the use of this system call should be avoided. (In the example just described, a safer alternative would be to temporarily switch the process's effective user ID to the real ID and then call open(2).)