Exclusive page-frame ownership
An attacker wanting to get the kernel to run arbitrary code faces a problem: where can that code be put so that the kernel might run it? If the kernel can be convinced to run code found in user space, that problem becomes much easier to solve, since placing code in user-space memory is something that anybody can do. Since user-space memory remains mapped while the processor is running in kernel mode, all that needs to be done is to convince the kernel to jump to a user-space address. Years ago, it was possible to simply map the page at address 0 and find a bug that would cause the kernel to jump to a null pointer. Such simple attacks have been headed off, but more complex exploits are still common.
Obviously, the best solution is to ensure that the kernel will never try to jump to a user-space address. If one accepts that there will always be bugs, though, it makes sense to add other defenses, such as simply preventing the execution of user-space memory by the kernel. The PaX KERNEXEC and UDEREF mechanisms are designed to prevent this kind of user-space access. More recently, the processor manufacturers have gotten into the game as well; Intel now has supervisor mode access prevention and supervisor mode execute protection, while ARM has added privileged execute-never. On systems where these mechanisms are fully implemented, it should be impossible for the kernel to execute code found in user-space memory.
Except, as this paper from Vasileios P. Kemerlis et al. points out, there's a loophole. User-space memory is accessed via a process's page tables, and the various access-prevention mechanisms work to block kernel access via those page tables. But the kernel also maintains a linear mapping (often called the direct mapping) of the entire range of physical memory (on 64-bit systems; the situation on 32-bit systems is a bit more complicated). This mapping has many uses within the kernel, with page-level memory management being near the top of the list. It provides a separate address for every physical page in the system. Importantly, it's a kernel-space address and, on some systems (x86 before 3.9 and all ARM), this memory range is executable by the kernel.
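The arithmetic behind such a mapping can be sketched in a few lines. This is a toy model rather than kernel code: the PAGE_OFFSET value is assumed to be the historical x86-64 base of the mapping (other architectures and newer kernels use different or randomized bases), and the helper names are illustrative only.

```python
# Toy model of a linear mapping of physical memory into kernel space.
# ASSUMPTION: PAGE_OFFSET is the historical x86-64 base of the mapping;
# other architectures and newer kernels use different or randomized values.
PAGE_OFFSET = 0xFFFF_8800_0000_0000
PAGE_SHIFT = 12  # ASSUMPTION: 4 KiB base pages


def phys_to_virt(phys: int) -> int:
    """Kernel-space alias of a physical address inside the mapping."""
    return PAGE_OFFSET + phys


def virt_to_phys(virt: int) -> int:
    """Inverse translation; only meaningful for addresses in the mapping."""
    if virt < PAGE_OFFSET:
        raise ValueError("address is not in the linear mapping")
    return virt - PAGE_OFFSET


def pfn_to_virt(pfn: int) -> int:
    """Alias of the page with the given page frame number (PFN)."""
    return phys_to_virt(pfn << PAGE_SHIFT)
```

Every physical page thus has a second, kernel-space address that is independent of whatever user-space address a process uses for it.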
If an attacker can cause the kernel to jump into the direct mapping, none of the user-space access-prevention mechanisms will apply, even if the target address corresponds to a user-space page. So the direct mapping offers a convenient way to bypass these protections, with only one little catch: an attacker must be able to determine the physical address of the page containing the exploit code. As the paper points out, the pagemap files under /proc will provide that information, and, while these files can be disabled, distributions tend not to do that. So, on most systems, everything is in place to enable an attacker to exploit a bug that can cause a jump to an arbitrary address and the existing access-prevention mechanisms are powerless to stop it.
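As an illustration of what the pagemap files expose, here is a minimal sketch that decodes one 64-bit pagemap entry into a physical address. The bit layout (PFN in bits 0-54, "present" flag in bit 63) follows the kernel's pagemap documentation; the function names and the 4 KiB page size are assumptions made for the example.

```python
import struct

PAGE_SIZE = 4096            # ASSUMPTION: 4 KiB pages
PFN_MASK = (1 << 55) - 1    # bits 0-54: page frame number
PRESENT_BIT = 1 << 63       # bit 63: page is present in RAM


def decode_pagemap_entry(entry: int):
    """Return the PFN stored in a pagemap entry, or None if not present."""
    if not entry & PRESENT_BIT:
        return None
    return entry & PFN_MASK


def physical_address(entry: int, vaddr: int):
    """Physical address backing vaddr, given its pagemap entry."""
    pfn = decode_pagemap_entry(entry)
    if pfn is None:
        return None
    return pfn * PAGE_SIZE + vaddr % PAGE_SIZE


def read_pagemap_entry(vaddr: int, pid: str = "self") -> int:
    """Read the 8-byte entry for vaddr from /proc/<pid>/pagemap (Linux only).

    Note: newer kernels report the PFN field as zero to callers that lack
    CAP_SYS_ADMIN, so an unprivileged caller may get no useful address.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)   # one 8-byte entry per virtual page
        data = f.read(8)
    return struct.unpack("=Q", data)[0]    # native byte order
```

Combining the resulting physical address with the base of the direct mapping gives the kernel-space alias an attacker would aim for.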
(Life gets a little harder on current x86 kernels, where it is no longer possible to directly execute code via the direct mapping. In such cases, the attacker must resort to return-oriented programming instead — not a huge obstacle for many attackers.)
The solution, as described in the paper and implemented in the exclusive page frame ownership (XPFO) patch set posted by Juerg Haefliger, is to take away the back-door access to user-space pages via the direct mapping. The mechanism is fairly simple in concept. Whenever a page is allocated for user-space use (something the kernel already indicates with the GFP flags in the allocation request), the direct mapping for that page is removed. Thus, even if an attacker can generate the directly mapped address for the page and get the kernel to jump there, the kernel will fault due to lack of access permissions to that page. When user space frees a page, it will be zeroed (to prevent attacks via hostile code left in the page) and returned to the direct map.
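The page lifecycle described above can be modeled with a small simulation. This is only a sketch of the concept, not the patch set's code: the class, its methods, and the Fault exception are all hypothetical names, and a Python set stands in for the kernel's page tables.

```python
class Fault(Exception):
    """Stands in for the page fault the kernel would take."""


class ToyXPFO:
    """Hypothetical model: user pages are absent from the direct map."""

    def __init__(self, nr_pages: int):
        self.direct_map = set(range(nr_pages))      # PFNs mapped for the kernel
        self.contents = {pfn: b"" for pfn in range(nr_pages)}
        self.free_list = list(range(nr_pages))

    def alloc_page(self, for_user: bool) -> int:
        pfn = self.free_list.pop()
        if for_user:
            # Allocation for user space: drop the kernel's alias of the page.
            self.direct_map.discard(pfn)
        return pfn

    def free_page(self, pfn: int) -> None:
        self.contents[pfn] = b""    # "zero" the page before reuse
        self.direct_map.add(pfn)    # return it to the direct map
        self.free_list.append(pfn)

    def kernel_jump(self, pfn: int) -> bytes:
        """Model the kernel touching a page through its direct-map alias."""
        if pfn not in self.direct_map:
            raise Fault(f"no direct mapping for PFN {pfn}")
        return self.contents[pfn]
```

In this model, a jump to a user-owned page raises Fault, while the same page becomes reachable again (and empty) once it has been freed.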
There are times when the kernel must access user-space memory, of course; the copy_to_user() and copy_from_user() functions are obvious examples. In such cases, the direct mapping is restored for the duration of the operation.
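The "restore for the duration of the operation" pattern maps naturally onto a scoped helper. The following sketch uses a Python context manager over a toy direct map; the names are hypothetical and say nothing about how the patch set itself structures this step.

```python
from contextlib import contextmanager

# Toy direct map: the set of PFNs the kernel can currently reach.
direct_map = set()


@contextmanager
def temporarily_mapped(pfn: int):
    """Make a user page reachable for one kernel access, then undo it."""
    was_mapped = pfn in direct_map
    direct_map.add(pfn)
    try:
        yield pfn
    finally:
        # Remove the mapping again even if the access raised an error,
        # but leave pages alone that were mapped before we started.
        if not was_mapped:
            direct_map.discard(pfn)
```

The point of the scoped form is that the window in which the page is reachable is as short as the operation that needs it.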
Naturally, there is a performance cost to this. The mapping and unmapping of pages in the kernel's address space will slow things down somewhat, as will the zeroing of returned user-space pages. Perhaps more significant, though, is a change in how the direct mapping is implemented. Normally, the kernel creates this mapping with huge pages; that, among other things, greatly reduces the pressure on the processor's translation lookaside buffer (TLB) when the direct mapping is accessed. But use of huge pages is incompatible with adding and removing mappings for individual (small) pages in that range, so, with XPFO, the huge-page mappings have to go. There is also some increased memory overhead resulting from the need to store more per-page information. All told, enabling XPFO has a performance cost of up to about 3% in the worst case, though most of the benchmarks reported in the paper suffered much less than that.
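A bit of arithmetic shows why losing huge pages matters for the TLB. The sketch below counts how many page-table mappings are needed to cover a given amount of memory and how much memory a TLB of a given size can cover; the 16 GiB memory size and 64-entry TLB in the usage lines are made-up illustrative numbers, not figures from the paper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def mappings_needed(mem_bytes: int, page_size: int) -> int:
    """Number of page-sized mappings required to cover mem_bytes."""
    return -(-mem_bytes // page_size)   # ceiling division


def tlb_reach(tlb_entries: int, page_size: int) -> int:
    """Bytes of memory covered by a fully populated TLB."""
    return tlb_entries * page_size


if __name__ == "__main__":
    mem = 16 * GIB                      # hypothetical machine
    for size in (4 * KIB, 2 * MIB):
        print(size, mappings_needed(mem, size), tlb_reach(64, size))
```

With these example numbers, 2 MiB pages need 8,192 mappings to cover memory where 4 KiB pages need 4,194,304, and the same 64 TLB entries cover 128 MiB instead of 256 KiB.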
The patch set needs some completion work before it can be seriously considered for merging into the mainline. Once that point comes, one can assume that the conversation will hinge on how effective it is at preventing exploits and whether it is worth the performance cost. The fact that the slowdown for kernel builds is 2.5% could prove to be a bit of an obstacle in this discussion. A performance hit on that scale is a hard thing to swallow, but so are successful exploits. Which pill will prove to be the more bitter remains to be seen as the patch set progresses.
Exclusive page-frame ownership
Posted Sep 15, 2016 1:11 UTC (Thu) by spender (guest, #23067) [Link]
I should also mention that there are numerous glaring factual errors in the article (unsurprisingly due to repeating the same mistakes of the paper author and not adjusting some of the claims to kernels within the past 2 years or so) but I'm curious to see if anyone else can spot them without my help, or if everyone will just take the article at face value.
-Brad
Exclusive page-frame ownership
Posted Sep 15, 2016 6:35 UTC (Thu) by tao (subscriber, #17563) [Link]
Patch submissions, with follow-through, until the patch has been merged will always count for more than an out of tree patch that the author, and their fanclub, doesn't want to, or cannot be bothered to, go through the effort of getting merged.
Exclusive page-frame ownership
Posted Sep 15, 2016 8:56 UTC (Thu) by PaXTeam (guest, #24616) [Link]
> and their fanclub, doesn't want to, or cannot be bothered to, go through the effort of getting merged.
here's some reading material: https://lwn.net/Articles/700358/ tl;dr: you don't get to make demands on my free time and then blame me for not abiding by them.
Exclusive page-frame ownership
Posted Sep 15, 2016 17:55 UTC (Thu) by flussence (subscriber, #85566) [Link]
That's exactly what you've been doing for years. You go a bit beyond lashing out at merely the upstream kernel hand that feeds you though; you also spew venom at reporters, developers and end users who don't shower you in praise and roses while also telling them you know what's best for them.
It's a shame you're *not* a paid slander-mercenary like rcweir, because at least then someone could pull the plug to spare us all from this whinging.
Exclusive page-frame ownership
Posted Sep 15, 2016 18:20 UTC (Thu) by PaXTeam (guest, #24616) [Link]
OK...
Posted Sep 15, 2016 18:22 UTC (Thu) by corbet (editor, #1) [Link]
I don't think this conversation is doing any good for anybody involved, bystanders included. Perhaps it's time for everybody to stop throwing mudballs and let it be, please?
Exclusive page-frame ownership
Posted Sep 15, 2016 11:08 UTC (Thu) by spender (guest, #23067) [Link]
-Brad
Exclusive page-frame ownership
Posted Sep 15, 2016 11:50 UTC (Thu) by pizza (subscriber, #46) [Link]
In the immortal words of _The Critic_ -- "And nothing of value was lost."
Exclusive page-frame ownership
Posted Sep 15, 2016 12:08 UTC (Thu) by spender (guest, #23067) [Link]
-Brad
Exclusive page-frame ownership
Posted Sep 15, 2016 13:40 UTC (Thu) by pizza (subscriber, #46) [Link]
Perhaps, but I'm not claiming that my comments here are of any inherent value, nor am I threatening to deprive LWN readers of the awesomeness of my "free review or hints".
> Have you ever contributed anything of technical value?
Why yes, I have:
Two chunks of Linux kernel code bear my name -- One wifi driver that I heavily cleaned up [ie rewrote large chunks of], mainlined and still maintain, and another that I originally wrote but someone else mainlined (after some rewriting).
(So yes, I have first-hand knowledge of just how much work mainlining something can be. Incidentally, the latter bears quite a few similarities to this situation -- Only I had the maturity to not complain about how those folks doing the work I was unwilling to do were somehow doing it all wrong. I even tossed some patches their way to help out)
I'm also a major contributor to Gutenprint, having written drivers for several dozen photo printers.
Beyond that, there's a long tail of mostly-minor contributions to various F/OSS projects. I try to give back whenever possible, and my track record shows that I play well with others.
Any more questions?
Exclusive page-frame ownership
Posted Sep 16, 2016 15:36 UTC (Fri) by jschrod (subscriber, #1646) [Link]
Thanks in advance.
Exclusive page-frame ownership
Posted Sep 15, 2016 1:46 UTC (Thu) by PaXTeam (guest, #24616) [Link]
Exclusive page-frame ownership
Posted Sep 15, 2016 3:27 UTC (Thu) by luto (subscriber, #39314) [Link]
Exclusive page-frame ownership
Posted Sep 16, 2016 8:31 UTC (Fri) by pbonzini (subscriber, #60935) [Link]
Exclusive page-frame ownership
Posted Sep 15, 2016 3:58 UTC (Thu) by kees (subscriber, #27264) [Link]
PXN/SMEP: http://kernsec.org/wiki/index.php/Exploit_Methods/Userspa...
PAN/SMAP: http://kernsec.org/wiki/index.php/Exploit_Methods/Userspa...
If there are mistakes there, let me know and I'll fix 'em. :)
Exclusive page-frame ownership
Posted Sep 15, 2016 12:21 UTC (Thu) by PaXTeam (guest, #24616) [Link]
2. the UDEREF style page table entry shadowing and switching on user/kernel transitions would work on any arch that can otherwise support kernel mode execution control (so UDEREF works on pre-IVB, let alone pre-BDW). if the arch has some form of address space/context ID mechanism then this can be further optimized though in my experience the end result still sucks for performance unfortunately.
3. i wouldn't call data access control/prevention a superset of execution prevention as i think most processors clearly distinguish between insn fetches and data accesses (different caches, TLBs, access control, etc) and thus you can control them independently.
Exclusive page-frame ownership
Posted Sep 15, 2016 19:55 UTC (Thu) by kees (subscriber, #27264) [Link]
2. I think I covered that already in the text above the tables ("via separate page tables" and "page table swapping"). Is it the table's "could use PCID?" note that feels inaccurate?
3. Yup, fair point. I've clarified it to mention the emulation case (e.g. CONFIG_SW_DOMAIN_PAN provides PXN emulation as well as PAN emulation) and distinguish the instruction fetch from data access.
Thanks!
Exclusive page-frame ownership
Posted Sep 15, 2016 10:37 UTC (Thu) by MarkRutland (subscriber, #74197) [Link]
I think that the paper is somewhat outdated w.r.t. its comments regarding ARM.
For arm, since v3.18, as of commit 1e6b48116a95046e ("ARM: mm: allow non-text sections to be non-executable"), the linear map is not executable, modulo a small overlap with the kernel text mapping.
For arm64, since v3.19, commit da141706aea52c1a ("arm64: add better page protections to arm64"), the linear map is not executable, modulo a small overlap with the kernel text mapping. v4.5 commit 068a17a5805dfbca ("arm64: mm: create new fine-grained mappings at boot") gets rid of that overlap.
Exclusive page-frame ownership
Posted Sep 18, 2016 17:44 UTC (Sun) by rjw@sisk.pl (subscriber, #39252) [Link]