Controlling access to user namespaces
The user namespaces feature holds an interesting promise for system security: users can be confined within a namespace, given full root privileges within that namespace, and still be unable to adversely affect the system as a whole. The path to better security has, perhaps predictably, proved to be a bit rocky, however. In response, there is now an effort to make the feature configurable by system administrators, but this new configuration knob is proving to be a harder sell than one might expect.
User namespaces are created by passing the CLONE_NEWUSER flag to the clone() or unshare() system calls. Administrators who are nervous about allowing access to this feature currently only have one option: configure out support at kernel build time. That option is not easily available to the many systems running distribution-built kernels, though. Kees Cook set out to create an easier way with this patch set creating a new sysctl knob to control access to the user-namespace feature, saying:
In particular, the patch adds a knob called /proc/sys/kernel/userns_restrict. When it is set to the default value (zero), user namespaces are unrestricted. Setting it to one allows only privileged users to create user namespaces; a setting of two disables user namespaces altogether. In that final case, it is not possible to re-enable user namespaces without rebooting the system.
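As a rough illustration of the semantics just described, the following Python sketch models the three settings and shows how a process asks for a user namespace in the first place. The function names are hypothetical, the `privileged` flag stands in for whatever check the patch actually performs, and the `CLONE_NEWUSER` value is copied from `<linux/sched.h>`; none of this is taken from the patch itself.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def userns_allowed(restrict: int, privileged: bool) -> bool:
    """Model of the proposed userns_restrict settings (hypothetical helper)."""
    if restrict == 0:       # default: user namespaces are unrestricted
        return True
    if restrict == 1:       # only privileged users may create them
        return privileged
    if restrict == 2:       # disabled altogether until the next reboot
        return False
    raise ValueError(f"unknown userns_restrict value: {restrict}")


def probe_userns() -> int:
    """Fork a child that calls unshare(CLONE_NEWUSER).

    Returns 0 if the child succeeded, otherwise the child's errno
    (255 if the child failed in some other way).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        code = 255
        try:
            if libc.unshare(CLONE_NEWUSER) == 0:
                code = 0
            else:
                code = ctypes.get_errno()
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255
```

The probe runs in a forked child so that the calling process does not itself end up in a new namespace. Which error code a restrictive setting would return is not stated above, so the sketch simply reports whatever errno the kernel hands back.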
One of the first issues to be aired had to do with naming: it turns out that Debian currently carries a similar patch, but, on Debian systems, the knob is called unprivileged_userns_clone and doesn't support the "privileged users only" setting. Ben Hutchings agreed that the new naming was probably better and said that, should Kees's patch go upstream, Debian would slowly move over to it.
Some developers worried that allowing user namespaces to be turned off would slow the process of finding and fixing any remaining security issues. Additionally, Serge Hallyn suggested that, if application developers could not count on the availability of user namespaces, they wouldn't use them at all. He suggested that, if the knob is accepted, it be marked as a short-term workaround that would eventually be removed.
The strongest opposition, though, came from Eric Biederman, the creator of user namespaces and also the developer who has done the most work on the sysctl code in recent times. He stated flat out that "the code is buggy, and poorly thought through" and would not be merged. In another message he described his objections in detail, starting with a challenge to the idea that user namespaces are a security risk at all:
Others, though, seem to think that, if problems elsewhere are being "amplified," there is indeed a security exposure. Andy Lutomirski described some concerns of his own:
Eric echoed the point that making it possible to disable user namespaces would be a net loss in security, since the feature would not be available on all systems. He cited web browsing with Chrome as a use case; Kees responded that this patch wasn't really aimed at desktop systems in the first place.
Next on Eric's list was a complaint that a system-wide knob was too coarse; he suggested that perhaps the seccomp() mechanism should be used instead if access to user namespaces must really be restricted. Kees's answer here is that it's not really possible to set a global seccomp() policy, that performance would suffer in any case, and that seccomp() is meant for developers to use rather than system administrators. "It's an extraordinarily big hammer for wanting to turn off a single area of the kernel with a long history of problems." He noted that trying to use a Linux security module to achieve this end would have a number of similar problems.
Then, Eric said, the sysctl knob could create "a false sense of security" since it would have no effect on processes that are already running in a user namespace. If a security issue comes to light, just turning off the knob will not be enough to protect a system; a reboot will also be necessary. Eric returned to this point later, calling the patch "fatally flawed" as a result of the "subtlety and nuance" involved in using it.
Kees acknowledged the "corner case" in the sysctl implementation, one that, he said, applies to a number of other, existing knobs as well. But, he said, it really does not matter to an administrator who simply wants to disable the feature outright as a way of reducing the attack surface of a system. Even so, he allowed: "I'm open to having this sysctl kill all CLONE_NEWUSERed process trees", without noting that having a sysctl knob kill off processes might pose some interesting "subtlety and nuance" of its own.
As a sort of postscript, Eric suggested that, perhaps, the desired restriction could be implemented as a resource limit controlling the number of user namespaces that any user would be allowed to create. Setting that number to zero would effectively disable the feature. Kees indicated a willingness to look at this idea; it is the end result he wants, rather than the sysctl knob itself.
There is an evident desire for the ability to turn off access to user namespaces; various other developers spoke in its favor over the course of the discussion. But this desire is clearly not universal and, as a result, the current patches do not appear to have an easy path into the mainline. It is entirely possible that the concerns blocking this feature may eventually be addressed and overcome, but it also seems possible that, in the end, this knob ends up being part of the patch set carried by distributors and users. It seems that getting security-related changes into the kernel is still a difficult task.
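Eric's concern about processes that are already running inside a user namespace can at least be examined from user space. The sketch below, which is only an illustration and not part of any patch, compares each process's `/proc/<pid>/ns/user` link against that of a reference process. The function names are hypothetical, and the default reference (`self`) assumes the caller is itself running in the initial user namespace; processes whose links cannot be read (typically for lack of permission) are skipped.

```python
import os


def userns_id(proc_root, pid):
    """Return the user-namespace link text for a pid, or None if unreadable."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "ns", "user"))
    except OSError:
        return None


def pids_in_other_userns(proc_root="/proc", reference="self"):
    """List pids whose user namespace differs from the reference process's."""
    ref_ns = userns_id(proc_root, reference)
    if ref_ns is None:
        raise RuntimeError("cannot read the reference user namespace")
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        ns = userns_id(proc_root, entry)
        if ns is not None and ns != ref_ns:
            found.append(int(entry))
    return sorted(found)
```

An administrator who flips a restriction knob in response to a vulnerability would still have to deal with whatever such a scan turns up, which is the "corner case" under discussion.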
Controlling access to user namespaces
Posted Jan 29, 2016 1:56 UTC (Fri) by zuki (subscriber, #41808)
Also, I don't really buy the argument that setting the sysctl does not work retroactively and this is terrrrrrible. The same is true for most settings... If I had a setuid binary, dropping the bit only affects the future, running instances are not killed. If I change the permissions on a file, processes which had it open just continue. Etc, etc. For example kernel.modules_disabled=1 follows a similar pattern.
It seems that EB doesn't like that people want to disable some feature which he deeply cares about and loses objectivity. The "shortcomings" of the patch seem like things made up post factum to justify the initial emotional response.
Also a global per-user limit doesn't seem very useful. If there's a vulnerability, just one namespace is enough to exploit it. And otherwise, why would we care how many namespaced processes are running? So only two values of the limit make sense: 0 and infinity. So we're back to the original sysctl patch.