An end to high memory?
A high-memory refresher
The younger readers out there may be forgiven for not remembering just what high memory is, so a quick refresh seems in order. We'll start by noting, for the oldest among our readers, that it has nothing to do with the "high memory" concept found on early personal computers. That, of course, was memory above the hardware-implemented hole at 640KB — memory that was, according to a famous quote often attributed to Bill Gates, surplus to the requirements of any reasonable user. The kernel's notion of high memory, instead, is a software construct, not directly driven by the hardware.
Since the earliest days, the kernel has maintained a "direct map", wherein all of physical memory is mapped into a single, large, linear array in kernel space. The direct map makes it easy for the kernel to manipulate any page in the system; it also, on somewhat newer hardware, is relatively efficient since it is mapped using huge pages.
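To see why the direct map is so handy, here is a simplified sketch (not the kernel's literal code, and using the traditional i386 value for PAGE_OFFSET) of how helpers like __pa() and __va() turn address translation into simple arithmetic for directly mapped memory:

    /* Simplified sketch of __pa()/__va() for directly mapped memory;
     * 0xC0000000 is the traditional i386 default for PAGE_OFFSET.    */
    #define PAGE_OFFSET  0xC0000000UL

    #define __pa(vaddr)  ((unsigned long)(vaddr) - PAGE_OFFSET)            /* virtual to physical */
    #define __va(paddr)  ((void *)((unsigned long)(paddr) + PAGE_OFFSET))  /* physical to virtual */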
A problem arose, though, as memory sizes increased. A 32-bit system has the ability to address 4GB of virtual memory; while user space and the kernel could have distinct 4GB address spaces, arranging things that way imposes a significant performance cost resulting from the need for frequent translation lookaside buffer flushes. To avoid paying this cost, Linux used the same address space for both kernel and user mode, with the memory protections set to prevent user space from accessing the kernel's portion of the shared space. This arrangement saved a great deal of CPU time — at least, until the Meltdown vulnerability hit and forced the isolation of the kernel's address space.
The kernel, by default, divided the 4GB virtual address space by assigning 3GB to user space and keeping the uppermost 1GB for itself. The kernel itself fits comfortably in 1GB, of course — even 5.x kernels are smaller than that. But the direct memory map, which is naturally as large as the system's installed physical memory, must also fit into that space. Early kernels could only manage memory that could be directly mapped, so Linux systems, for some years, could only make use of a bit under 1GB of physical memory. That worked for a surprisingly long time; even largish server systems didn't exceed that amount.
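The arithmetic behind that "bit under 1GB" figure, sketched here with the usual i386 defaults (the size of the reserved region varies with the configuration), works out to 896MB:

    /* Sketch of the classic i386 low-memory limit; these are the usual
     * defaults rather than values taken from the article.              */
    #define PAGE_OFFSET      0xC0000000UL   /* the kernel owns the top 1GB     */
    #define VMALLOC_RESERVE  (128UL << 20)  /* address space kept for vmalloc,
                                               fixmap, kmap() and friends      */

    /* 4GB - PAGE_OFFSET leaves 1GB of kernel address space; subtracting the
     * reserve leaves 896MB that the direct map can actually cover.           */
    #define MAXMEM  ((0xFFFFFFFFUL - PAGE_OFFSET + 1) - VMALLOC_RESERVE)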
Eventually, though, it became clear that the need to support larger installed memory sizes was coming rather more quickly than 64-bit systems were, so something would need to be done. The answer was to remove the need for all physical memory to be in the direct map, which would only contain as much memory as the available address space would allow. Memory above that limit was deemed "high memory". Where the dividing line sat depended entirely on the kernel configuration and how much address space was dedicated to kernel use, rather than on the hardware.
In many ways, high memory works like any other; it can be mapped into user space, and the recipients don't see any difference. But being absent from the direct map means that the kernel cannot access it without first creating a temporary, single-page mapping, which is expensive. That implies that high memory cannot hold anything the kernel must be able to access quickly; in practice, that rules out almost any kernel data structure. Those structures must live in low memory, which turns low memory into a highly contended resource on many systems.
64-Bit systems do not have the 4GB virtual address space limitation, so they have never needed the high-memory concept. But high memory remains for 32-bit systems, and traces of it can be seen throughout the kernel. Consider, for example, all of the calls to kmap() and kmap_atomic(); they do nothing on 64-bit systems, but are needed to access high memory on smaller systems. And, sometimes, high memory affects development decisions being made today.
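As a concrete, hypothetical example of what those calls look like, a bit of kernel code that wants to zero a page that may live in high memory has to map it first; this is a sketch, not code from any of the patches discussed here:

    /* Hypothetical helper: zero a page that may be in high memory. */
    #include <linux/highmem.h>
    #include <linux/string.h>

    static void zero_any_page(struct page *page)
    {
            void *vaddr = kmap(page);  /* temporary mapping if the page is in
                                          high memory; otherwise just the
                                          direct-map address                 */

            memset(vaddr, 0, PAGE_SIZE);
            kunmap(page);              /* drop the temporary mapping again   */
    }

The kernel's clear_highpage() helper exists for exactly this pattern; kmap_atomic() is the variant used when sleeping is not allowed.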
Inode-cache shrinking vs. highmem
When a file is accessed on a Linux system, the kernel loads an inode structure describing it; those structures are cached, since a file that is accessed once will frequently be accessed again in the near future. Pages of data associated with that file are also cached in the page cache as they are accessed; they are associated with the cached inode. Neither cache can be allowed to grow without bound, of course, so the memory-management system has mechanisms to remove data from the caches when memory gets tight. For the inode cache, that is done by a "shrinker" function provided by the virtual filesystem layer.
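The shrinker interface itself is small. What follows is a minimal, stand-in sketch of a cache registering a shrinker with the 5.x-era API; it is not the actual VFS superblock shrinker, and the demo_* names are invented for illustration:

    /* A stand-in sketch of the shrinker interface (not the real VFS code).
     * demo_nr_cached represents whatever bookkeeping a real cache keeps.   */
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/shrinker.h>

    static unsigned long demo_nr_cached;   /* objects we could free (locking omitted) */

    static unsigned long demo_count(struct shrinker *s, struct shrink_control *sc)
    {
            /* Tell the memory-management code how much is reclaimable. */
            return demo_nr_cached;
    }

    static unsigned long demo_scan(struct shrinker *s, struct shrink_control *sc)
    {
            /* Free up to sc->nr_to_scan objects and report how many went. */
            unsigned long freed = min(demo_nr_cached, sc->nr_to_scan);

            demo_nr_cached -= freed;
            return freed ? freed : SHRINK_STOP;
    }

    static struct shrinker demo_shrinker = {
            .count_objects = demo_count,
            .scan_objects  = demo_scan,
            .seeks         = DEFAULT_SEEKS,
    };

    static int __init demo_init(void)
    {
            return register_shrinker(&demo_shrinker);
    }

    static void __exit demo_exit(void)
    {
            unregister_shrinker(&demo_shrinker);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");

When memory gets tight, the memory-management code calls count_objects() to size the cache, then scan_objects() to ask for some of it back.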
In his patch description, Johannes Weiner notes that the inode-cache shrinker is allowed to remove inodes that have associated pages in the page cache; that causes those pages to be reclaimed as well. This happens despite the fact that the inode-cache shrinker has no way of knowing whether those pages are in active use. This is, he notes, old behavior that no longer makes sense.
Andrew Morton, it turns out, is the developer responsible for this behavior, which is driven by the constraints of high memory. Inodes, being kernel data structures, must live in low memory; page-cache pages, instead, can be placed in high memory. But if the existence of pages in the page cache can prevent inode structures from being reclaimed, then a few high-memory pages can prevent the freeing of precious low memory. On a system using high memory, sacrificing many pages worth of cached data may well be worth it to gain a few hundred bytes of low memory. Morton said that the problem being solved was real, and that the solution cannot be tossed even now: "a 7GB highmem machine isn't crazy and I expect the inode has become larger since those days".
The conversation took a bit of a turn, though, when Linus Torvalds interjected that "in the intervening years a 7GB highmem machine has indeed become crazy". He continued that high memory should now be considered deprecated: "In this day and age, there is no excuse for running a 32-bit kernel with lots of physical memory". Others were quick to add their support for this idea; removing high-memory support would simplify the memory-management code significantly with no negative effects on the 64-bit systems that everyone is using now.
Except, of course, not every system has a 64-bit CPU in it. The area of biggest concern is the Arm architecture, where 32-bit CPUs are still being built, sold, and deployed. Russell King noted that there are a lot of 32-bit Arm systems with more than 1GB of installed memory being sold: "You're probably talking about crippling support for any 32-bit ARM system produced in the last 8 to 10 years".
Arnd Bergmann provided a rather more detailed look at the state of 32-bit Arm systems; he noted that there is one TI CPU that is being actively marketed with the ability to handle up to 8GB of RAM. But, he said, many new Arm-based devices are actually shipping with smaller installed memory because memory sizes up to 512MB are cheap to provide. There are phones out there with 2GB of memory that still need to be supported, though it may be possible to support them without high memory by increasing the kernel's part of the address space to 2GB. Larger systems still exist, he said, though systems with 3GB or more "are getting very rare". Rare is not the same as nonexistent, though.
The conversation wound down without any real conclusions about the fate of high-memory support. Reading between the lines, one might conclude that, while it is still a bit early to deprecate high memory, the pressure to do so will only increase in the coming years. In the meantime, though, nobody will try to force the issue by regressing performance on high-memory systems; the second version of Weiner's patch retains the current behavior on such machines. So users of systems needing high memory are safe — for now.
An end to high memory?
Posted Feb 27, 2020 18:11 UTC (Thu) by willy (subscriber, #9762) [Link]
As always with caches, it's not creating the mapping that's expensive. It's removing it afterwards that's the expensive part!
An end to high memory?
Posted Feb 27, 2020 19:28 UTC (Thu) by mwsealey (subscriber, #71282) [Link]
Just get rid of the linear map. At the cost of a tiny bit of extra cache and TLB maintenance in certain cases, and of not being able to cheap out on pointer math (not that this isn't being tracked many times over anyway), I think everyone's life gets easier and more computer science students would understand how the hell the kernel mm works! Also, no address aliasing.
An end to high memory?
Posted Feb 27, 2020 20:37 UTC (Thu) by luto (subscriber, #39314) [Link]
On an architecture with lousy TLB maintenance facilities like x86, the expense would be *huge*.
An end to high memory?
Posted Feb 27, 2020 20:41 UTC (Thu) by willy (subscriber, #9762) [Link]
As for undergraduates being able to figure out how the Linux MM works ... I don't think HIGHMEM is the problem here.
We may end up with an option to map everything. Some of the security people see this as a solution to the various cache information leakage problems like Meltdown and Spectre. But it's likely to have a negative performance impact.
An end to high memory?
Posted Feb 29, 2020 2:57 UTC (Sat) by mwsealey (subscriber, #71282) [Link]
“Low” or pinned memory for kernel allocations needs to be there, I don’t deny that. However as a way of providing watermarks for particular kinds of allocations is it *really* “faster” to not have to map it?
For 64-bit kernels, not having the linear map of an arbitrary amount of physical memory and just having those watermarks will still divide throwaway pages from fixed ones. For 32-bit kernels, they hit HIGHMEM pressure so quickly it doesn't make sense to me to "reserve" the memory used for non-HIGHMEM regions.
In any case by removing the linear map no physical address is mapped >=twice to virtual addresses without there being an extremely specific purpose for it. Userspace pages will never “turn up” in the middle of kernel space, or alias with other memory types.
For Arm the linear map is just trouble - I spend a good deal of time running training courses on Arm architecture and it is always disappointing to have to point out that a particular architectural feature like fine grained memory type specification is just a pain in the backside in Linux.
If you have a need to mark something as only outer cacheable or not-outer-shareable (and your HW supports that) the possibility that it also exists in the linear map as inner cacheable and outer shareable too just defeats any efficiencies you could gain from it. It serves to promote all cache activity to “L1 vs. DRAM” and all DVM transactions towards all other devices with nothing lighter in between. The x86-ism of poor TLB functionality and limited memory types and coherency traffic spamming leaks to other architectures like a plague.
Getting rid of HIGHMEM and making everything a linear mapped address range just codifies that x86-ism for 64-bit, while crippling 32-bit. I can't say I like either idea. I'd rather see HIGHMEM live on and see us reduce our requirement (and subsequently the watermarks) for NORMAL. Spectre/Meltdown and the rise of virtualization have helped here; x86 cores are now gaining more efficient TLB flush operations, working ASIDs, et al.
I reckon it’s worth investigating, for all we know it could surprisingly end up more performant and with less side effects and give Linux a few extra features on top of a more fine-grained memory management subsystem. I’m failing to find significant parts of mm that are actually hardcore enough to rely on the linear mapping and don’t have a “slow path” through using high memory (what requires it will allocate it, and then it’s doing the job it is meant to do), and from what I’ve seen the machinations around contiguous/large pages and other zone allocators like CMA already implement all you’d need to efficiently divorce the kernel from having to keep the linear map around.
An end to high memory?
Posted Feb 27, 2020 18:21 UTC (Thu) by ecm (subscriber, #129897) [Link]
This does not seem entirely clear.
Actually, there is an area on the IBM PC and 86-DOS platform that is called the Upper Memory Area (UMA), which can hold things such as ROMs, video memory, EMS frames, and Upper Memory Blocks (UMBs). It usually lies between 640 KiB (linear A_0000h) and 1024 KiB (10_0000h). It wasn't due to the CPU, however, but due to the IBM PC's memory layout.
The High Memory Area starts at 1024 KiB and reaches up to 1088 KiB minus 16 Bytes. This was introduced with the 286 when the physical memory was expanded to 24 address lines. This meant that even in Real 86 Mode (and on the 386 likewise in Virtual 86 Mode), the segmented addresses between 0FFFFh:0010h and 0FFFFh:0FFFFh would no longer wrap around to the linear addresses within the first 64 KiB minus 16 Bytes, and would instead form a 21-bit linear address pointing into the HMA. (This is also what A20 was about.)
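The wraparound described here is easy to check with a few lines of C; this is purely illustrative, using the standard real-mode segment arithmetic:

    /* Illustrative only: real-mode address arithmetic behind the HMA and A20. */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t linear(uint16_t seg, uint16_t off, int a20_enabled)
    {
            uint32_t addr = ((uint32_t)seg << 4) + off;  /* segment * 16 + offset */

            if (!a20_enabled)
                    addr &= 0xFFFFF;  /* 8086/8088-style 20-bit wraparound */
            return addr;
    }

    int main(void)
    {
            /* 0FFFFh:0010h reaches 10_0000h (1024 KiB) only with A20 enabled;
             * on the 8086/8088 it wraps around to linear 0.                  */
            printf("%06X\n", linear(0xFFFF, 0x0010, 1));  /* prints 100000 */
            printf("%06X\n", linear(0xFFFF, 0x0010, 0));  /* prints 000000 */
            return 0;
    }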
I call the memory area from zero up to below the UMA the Lower Memory Area, in symmetry with the UMA and HMA. It is also called base memory or conventional memory.
To add to the confusion, in German MS-DOS localisations, the UMA was called hoher Speicherbereich ("high Memory-area") and then the HMA was called oberer Speicherbereich ("upper Memory-area"). That is, the terms were swapped as compared to the English terms. This I believe was fixed starting in MS-DOS 7.00 (bundled with MS Windows 95).
An end to high memory?
Posted Feb 27, 2020 21:20 UTC (Thu) by farnz (subscriber, #17727) [Link]
Note, though, that the IBM PC has that memory layout because the BIOS ROM has to be mapped at 0xFFFF0 on the 8086/8088 - CS:IP of 0xFFFF:0x0000 - because that's where the reset vector is, and you also want 0x00000 to 0x00400 to be in RAM because that's where the 8086/88 put their Interrupt Vector Table.
The Model 5150 was being built to a price point, and was in any case limited to 256 KiB RAM, so it wasn't worth the extra ICs needed to allow you to remap all but the first 1 KiB RAM and the BIOS ROM after reset, when you could leave them in place.
Arguably, things would have been different if IBM had used the Motorola 68k instead of the 8088; the 68k has its reset vector at address 0, and thus you'd naturally put your ROM at the beginning of address space.
An end to high memory?
Posted Feb 28, 2020 11:48 UTC (Fri) by ecm (subscriber, #129897) [Link]
An end to high memory?
Posted Feb 28, 2020 14:18 UTC (Fri) by farnz (subscriber, #17727) [Link]
In the original PC, they didn't set a 640 KiB limit (that comes in with the EGA card). Original IBM PCs with MDA displays have a 704 KiB limit, CGA raises that to 736 KiB, and a theoretical video adapter using I/O ports instead of a memory-mapped buffer could raise it to 768 KiB.
Honestly, it looks like IBM never really thought about the real mode address limits; EGA lowered it to 640 KiB, but that comes in with the 80286 in the PC/AT, which in theory could be run in protected mode and thus not have issues around the 1 MiB limit. OS/2 1.x could thus have ensured we never needed to know about HMA, UMA, XMS etc, had IBM's vision been successful, and hence the "640 KiB" limit of their 8088 products would never have mattered. It's just that IBM failed in delivering its vision, and thus we kept treating x86 systems as ways to run DOS for far longer than intended.
An end to high memory?
Posted Mar 6, 2020 11:32 UTC (Fri) by khim (subscriber, #9252) [Link]
Indeed, IBM was never concerned about memory limits and hadn't planned for that infamous 640KiB barrier to ever exist.
Your opponent says IBM didn't have to reserve ⅜ of the (20-bit) address space for the UMA, that 256 KiB or even only 128 KiB would have been possible too… but IBM didn't reserve ⅜ of the address space for that! Look at the System Memory Map in the manual. It places the "128KB RESERVED GRAPHIC/DISPLAY BUFFER" at the 256K position!
The XT acknowledged the fact that you might actually add more RAM, but even the AT manual says it's not standard use but an option which requires an add-on card!
In fact, that's how IBM itself perceived it until the PS/2: that's why they were so happy to allow IBM PC DOS 4 to become so much larger than IBM PC DOS 3. The idea was: "hey, 512K is standard now, we could offer the option card we talked about long ago, and people would get a bigger DOS, yet more space for programs, too."
Only by that time "an option" had already become a de-facto "standard" and many packages needed 640KiB with DOS 3…
P.S. And after all that, IBM decided that it doesn't matter how much space the BIOS takes since everyone would soon use OS/2 anyway… and introduced ABIOS… but that's another story…
An end to high memory?
Posted Feb 27, 2020 20:35 UTC (Thu) by Polynka (guest, #129183) [Link]
Oldest? I was born in 1991 and still remember it. In fact, I was really confused when I read the title of this article, thinking that it referred to the HMA.
An end to high memory?
Posted Feb 28, 2020 7:50 UTC (Fri) by cpitrat (subscriber, #116459) [Link]
... Fruit flies like a banana.
An end to high memory?
Posted Feb 28, 2020 15:19 UTC (Fri) by Polynka (guest, #129183) [Link]
I’m pretty sure that round fruit fly like an apple, not like a banana. In fact, are there any other fruit shaped like a banana besides bananas? Because flight of bananas may be actually _sui generis_.
An end to high memory?
Posted Feb 27, 2020 20:52 UTC (Thu) by flussence (subscriber, #85566) [Link]
There are hidden kconfig options to make that 2 or 3GB, and there are third-party patches to make the default 1GB. It's probably for compatibility with some ancient blob, because I can't imagine that being a good default to keep otherwise; there are quite a few 32-bit systems out there sitting between ⅞ and 4GB of RAM installed.
An end to high memory?
Posted Feb 28, 2020 10:04 UTC (Fri) by eru (subscriber, #2753) [Link]
An end to high memory?
Posted Feb 28, 2020 13:12 UTC (Fri) by leromarinvit (subscriber, #56850) [Link]
I wonder what the performance impact of just getting rid of the split in this manner would be. Isn't the performance advantage already rendered moot by KPTI?
An end to high memory?
Posted Feb 28, 2020 14:56 UTC (Fri) by arnd (subscriber, #8866) [Link]
However, it does seem possible to use a 4G vmsplit on 32-bit ARM (probably also MIPS or others), using ASIDs to avoid the TLB flush and keep the performance impact way down. On ARM with LPAE, we could e.g. use the split page tables to have the top 256MB (or another power-of-two size) reserved for vmalloc pages, modules, MMIO and the kernel image, while the lower 3.75GB are mapped to either user space or a large linear map. There is some extra overhead for copy_to_user() etc., but also extra isolation between user and kernel addresses that may provide security benefits.
It's also interesting that the original VMSPLIT_4G_4G patches were not even intended to replace highmem, but to allow configurations with 32GB or 64GB of physical RAM that would otherwise fill a lot of the 896MB lowmem area with 'struct page' structures even before you start allocating inodes, dentries or page tables.
An end to high memory?
Posted Feb 29, 2020 12:25 UTC (Sat) by willy (subscriber, #9762) [Link]
An end to high memory?
Posted Feb 29, 2020 17:56 UTC (Sat) by nivedita76 (guest, #121790) [Link]
An end to high memory?
Posted Feb 29, 2020 19:29 UTC (Sat) by willy (subscriber, #9762) [Link]
Yes, you're correct, only 896MB of physical RAM is usable unless you enable at least HIGHMEM4G.
Updated list of machines with 4GB or more
Posted Feb 28, 2020 16:05 UTC (Fri) by arnd (subscriber, #8866) [Link]
- Calxeda Midway servers used in build farms for native 32-bit builds to avoid differences in compilation results between running on 32-bit and 64-bit kernels, usually with 16GB of RAM.
- Most TI keystone-2 systems (sometimes up to 8GB): https://lwn.net/ml/linux-kernel/7c4c1459-60d5-24c8-6eb9-d...
- Baikal T1 (MIPS, not ARM) with up to 8GB: https://www.t-platforms.ru/production/products-on-baikal/
- Dragonbox Pyra game consoles with TI OMAP5: https://pyra-handheld.com/boards/pages/pyratech/
- Novena Laptop: https://www.crowdsupply.com/sutajio-kosagi/novena
- SolidRun CuBox Pro i4x4 (early models only) https://www.solid-run.com/solidrun-introduces-4gb-mini-co...
- Tegra K1 and Rockchips RK3288 based Chromebooks: https://www.chromium.org/chromium-os/developer-informatio...
- Very rare industrial NXP i.MX6 and Renesas RZ/G1 systems (most board manufacturers I talked to said they never sold 4GB options, or never even offered them because of 8Gbit DDR3 availability and cost reasons)
Updated list of machines with 4GB or more
Posted Feb 29, 2020 1:23 UTC (Sat) by clopez (guest, #66009) [Link]
How can it be possible that a compiler produces different results depending on the kernel running the compiler (instead of depending on the kernel targeted by the compiler)?
How do cross-compilers even work then?
Updated list of machines with 4GB or more
Posted Feb 29, 2020 10:12 UTC (Sat) by rossburton (subscriber, #7254) [Link]
I’m guessing the point was to do native and not cross compilation.
Updated list of machines with 4GB or more
Posted Feb 29, 2020 10:48 UTC (Sat) by arnd (subscriber, #8866) [Link]
This is about building entire packages, not just a single source file, so a lot of differences between native 32-bit kernels and compat mode can be relevant, though it would usually be a bug in a package if they are:
* The contents of /proc/cpuinfo, uname or the cpuid registers are different, which can confuse configure scripts in all kinds of ways, such as thinking they are cross-compiling when they should be building natively, or being unable to parse the incompatible /proc/cpuinfo format. Simply changing these to look like an ARMv7 CPU would be bad for applications that have a legitimate interest in finding out the actual CPU type on a 64-bit kernel.
* For doing local builds, upstream maintainers may default to using -march=native, but when building a distro package you normally want to target the minimum supported CPU instead. This bug can also hit when building an ARMv6 package on an ARMv7VE host just as it can when building an i586 package on Skylake x86-64 host.
* The compat mode in the kernel is a pretty good approximation of the native interfaces, but there are always a few differences -- usually those are bugs in compat handling, but there are also things like sysvipc being broken on native sparc32 kernels, or arm32 kernels traditionally not having NUMA syscalls, both of which work fine in 64-bit compat mode.
* On 64-bit kernels, you usually have 4GB of virtual address space for user space, while native kernels have only 3GB or less. This is great for most applications, but it can cause problems in a runtime environment that treats any high pointer value as a special cookie.
Some of these can be worked around using the 'personality' interfaces, others should probably be configurable the same way, and some should just be considered bugs that need to be fixed in upstream user space packages. I personally think the build systems should use 64-bit kernels with 32-bit user space and fix the resulting bugs, but it's not my decision.
> How do cross-compilers even work then?
You can probably cross-compile a majority of all packages in a distro these days, but there are enough exceptions that Debian or OBS decide to only build packages natively.
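As one concrete illustration of the "personality" interfaces mentioned above, a build system could wrap its native 32-bit toolchain so that a 64-bit kernel looks a little more like a 32-bit one; this is only a sketch, and whether it is sufficient for any given package is exactly what would need testing:

    /* Sketch: run a command with a 32-bit-looking persona and a 3GB
     * address-space limit, as a 64-bit kernel can be asked to do.   */
    #include <sys/personality.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
            if (argc < 2) {
                    fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
                    return 1;
            }
            /* PER_LINUX32 makes uname() report a 32-bit machine;
             * ADDR_LIMIT_3GB caps the user address space at 3GB.   */
            if (personality(PER_LINUX32 | ADDR_LIMIT_3GB) == -1)
                    perror("personality");
            execvp(argv[1], &argv[1]);
            perror("execvp");
            return 127;
    }

The setarch/linux32 tool shipped in util-linux does essentially this.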
An end to high memory?
Posted Mar 1, 2020 12:32 UTC (Sun) by ldearquer (guest, #137451) [Link]
An end to high memory?
Posted Mar 2, 2020 1:05 UTC (Mon) by dave4444 (guest, #127523) [Link]
Let's also not forget about highmem's ugly stepchild, the bounce buffer for I/O devices that DMA to/from high memory. Oh, the headaches from that.
An end to high memory?
Posted Mar 5, 2020 18:52 UTC (Thu) by kpfleming (subscriber, #23250) [Link]
An end to high memory?
Posted Mar 6, 2020 9:39 UTC (Fri) by geert (subscriber, #98403) [Link]
An end to high memory?
Posted Mar 6, 2020 10:59 UTC (Fri) by kpfleming (subscriber, #23250) [Link]
An end to high memory?
Posted Mar 6, 2020 11:04 UTC (Fri) by geert (subscriber, #98403) [Link]
An end to high memory?
Posted Mar 6, 2020 11:09 UTC (Fri) by kpfleming (subscriber, #23250) [Link]
An end to high memory?
Posted Mar 2, 2020 19:31 UTC (Mon) by neilbrown (subscriber, #359) [Link]
32bit MIPS needs high memory just to make use of 512MB of RAM. The first 448MB are directly mapped, then there is IO space, then there is the rest of RAM as high-mem.
Maybe this "interesting" address space layout could be managed differently to avoid the dependency on highmem - I don't know.
[    0.000000] MIPS: machine is GB-PC1
[    0.000000] Determined physical RAM map:
[    0.000000]  memory: 1c000000 @ 00000000 (usable)
[    0.000000]  memory: 04000000 @ 20000000 (usable)
...
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000001fffffff]
[    0.000000]   HighMem  [mem 0x0000000020000000-0x0000000023ffffff]
...
An end to high memory?
Posted Mar 6, 2020 14:19 UTC (Fri) by arnd (subscriber, #8866) [Link]
As far as I can tell, most MIPS32r2 based chips are limited to less than 512MB anyway by having only a single DDR2 channel, your MT7621A being a notable exception that has up to 512MB DDR3.
Out of the chips that support more than 512MB, the ones based on MIPS32r3 or higher (Baikal, Intel/Lantiq, ...) may be able to use the extended virtual addressing to allow larger direct lowmem mappings with a bit of work.
The Creator CI20 is one that has 1GB of RAM on MIPS32r2, and I can see that the bmips defconfigs enable highmem support, so I would guess they also need it, but I could not find any actual products. Are there others?
An end to high memory?
Posted Mar 8, 2020 23:46 UTC (Sun) by neilbrown (subscriber, #359) [Link]
I know about precisely one MIPS board - the MT7621A that you have already mentioned. So you already know more than me :-(
Maybe ask on linux-mips ??
An end to high memory?
Posted Jan 30, 2023 1:57 UTC (Mon) by ringerc (subscriber, #3071) [Link]
The 16 Exabyte boundary will be a problem before we know it 😀