
Go's memory management, ulimit -v, and RSS control

By Jonathan Corbet
February 15, 2011
Many years ago, your editor ported a borrowed copy of the original BSD vi editor to VMS; after all, using EDT was the sort of activity that lost its charm relatively quickly. DEC's implementation of C for VMS wasn't too bad, so most of the port went reasonably well, but there was one hitch: the vi code assumed that two calls to sbrk() would return virtually contiguous chunks of memory. That was true on early BSD systems, but not on VMS. Your editor, being a fan of elegant solutions to programming problems, solved this one by simply allocating a massive array at the beginning, thus ensuring that the second sbrk() call would never happen. Needless to say, this "fix" was never sent back upstream (the VMS uucp port hadn't been done yet in any case) and has long since vanished from memory.

That said, your editor was recently amused by this message on the golang-dev list indicating that the developers of the Go language have adopted a solution of equal elegance. Go has memory management and garbage collection built into it; the developers believe that this feature is crucial, even in a systems-level programming language. From the FAQ:

One of the biggest sources of bookkeeping in systems programs is memory management. We feel it's critical to eliminate that programmer overhead, and advances in garbage collection technology in the last few years give us confidence that we can implement it with low enough overhead and no significant latency.

In the process of trying to reach that goal of "low enough overhead and no significant latency," the Go developers have made some simplifying assumptions, one of which is that the memory being managed for a running application comes from a single, virtually-contiguous address range. Such assumptions can run into the same problem your editor hit with vi - other code can allocate pieces in the middle of the range - so the Go developers adopted the same solution: they simply allocate all the memory they think they might need (they figured, reasonably, that 16GB should suffice on a 64-bit system) at startup time.

That sounds like a bit of a hack, but an effort has been made to make things work well. The memory is allocated with an mmap() call, using PROT_NONE as the protection parameter. This call is meant to reserve the range without actually instantiating any of the memory; when a piece of that range is used by the application, the protection is changed to make it readable and writable. At that point, a page fault on the pages in question will cause real memory to be allocated. Thus, while this mmap() call will bloat the virtual address space size of the process, it should not consume much more memory until the running program actually needs it.
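
In rough terms, the technique looks something like the following sketch (illustrative C, not the runtime's actual code; the sizes and flags here are assumptions):

    /* Reserve a large, contiguous address range without backing it
     * with memory, then commit pieces on demand. */
    #include <stdio.h>
    #include <sys/mman.h>

    #define RESERVE_SIZE (16UL << 30)       /* 16GB of address space */

    int main(void)
    {
        /* PROT_NONE: claim the addresses, instantiate nothing. */
        void *arena = mmap(NULL, RESERVE_SIZE, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (arena == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* When the application needs the first 1MB, make it usable;
         * real pages are only allocated on the subsequent faults. */
        if (mprotect(arena, 1 << 20, PROT_READ | PROT_WRITE) != 0) {
            perror("mprotect");
            return 1;
        }

        ((char *)arena)[0] = 1;     /* first touch faults in a page */
        return 0;
    }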

This mechanism works fine on the developers' machines, but it runs into trouble in the real world. It is not uncommon for users to use ulimit -v to limit the amount of virtual memory available to any given process; the purpose is to keep applications from getting too large and causing the entire system to thrash. When users go to the trouble to set such limits, they tend, for some reason, to choose numbers rather smaller than 16GB. Go applications will fail to run in such an environment, even though their memory use is usually far below the limit that the user set. The problem is that ulimit -v does not restrict memory use; it restricts the maximum virtual address space size, which is a very different thing.
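
The distinction is easy to demonstrate; in this hypothetical sketch, the 16GB reservation consumes almost no memory, but the mmap() call will still fail under a 1GB address-space limit:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* The equivalent of "ulimit -v 1048576" (1GB; -v is in KB). */
        struct rlimit rl = { 1UL << 30, 1UL << 30 };
        setrlimit(RLIMIT_AS, &rl);

        /* Needs no memory, but 16GB of address space... */
        void *p = mmap(NULL, 16UL << 30, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            perror("mmap");         /* fails with ENOMEM */
        return 0;
    }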

One might argue that, given what users typically want to do with ulimit -v, it might make more sense to have it restrict resident set size instead of virtual address space size. Making that change now would be an ABI change, though; it would also make Linux inconsistent with the behavior of other Unix-like systems. Restricting resident set size is also simply harder than restricting the virtual address space size. But even if this change could be made, it would not help current users of Go applications, who may not update their kernels for a long time.

One might also argue that the Go developers should dump the contiguous-heap assumption and implement a data structure which allows allocated memory to be scattered throughout the virtual address space. Such a change also appears not to be in the cards, though; evidently that assumption makes enough things easy (and fast) that they are unwilling to drop it. So some other kind of solution will need to be found. According to the original message, that solution will be to shift allocations for Go programs (on 64-bit systems) up to a range of memory starting at 0xf800000000. No memory will be allocated until it is needed; the runtime will simply assume that nobody else will take pieces of that range in between allocations. Should that assumption prove false, the application will die messily.
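
The scheme might look vaguely like this sketch (illustrative only; the real runtime's flags, sizes, and failure handling certainly differ):

    #include <stdio.h>
    #include <sys/mman.h>

    static char *arena_next = (char *)0xf800000000UL;

    /* Ask for memory at the next expected address; the address is
     * only a hint (no MAP_FIXED), so the result must be checked. */
    static void *grow_arena(size_t len)
    {
        void *p = mmap(arena_next, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        if (p != arena_next) {      /* somebody took the range */
            munmap(p, len);
            fprintf(stderr, "contiguous arena lost\n");
            return NULL;            /* the runtime would abort here */
        }
        arena_next += len;
        return p;
    }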

For now, that assumption is good; the Linux kernel will not hand out memory in that range unless the application asks for it explicitly. As with many things that just happen to work, though, this kind of scheme could break at any time in the future. Kernel policy could change, the C library might begin doing surprising things, etc. That is always the hazard of relying on accidental, undocumented behavior. For now, though, it solves the problem and allows Go programs to run on systems where users have restricted virtual address space sizes.

It's worth considering what a longer-term solution might look like. If one assumes that Go will continue to need a large, virtually-contiguous heap, then we need to find a way to make that possible. On 64-bit systems, it should be possible; there is a lot of address space available, and the cost of reserving unused address space should be small. The problem is that ulimit -v is not doing exactly what users are hoping for; it regulates the maximum amount of virtual memory an application can use, but it has relatively little effect on how much physical memory an application consumes. It would be nice if there were a mechanism which controlled actual memory use - resident set sizes - instead.

As it turns out, we have such a mechanism in the memory controller. Even better, this controller can manage whole groups of processes, meaning that an application cannot increase its effective memory limit by forking. The memory controller is somewhat resource-intensive to use (though work is being done to reduce its footprint) and, like other control group-based mechanisms, it's not set up to "just work" by default. With a bit of work, though, the memory controller could replace ulimit -v and do a better job as well. With a suitably configured controller running, a Go process could run without limits on address space size and still be prevented from driving the system into thrashing. That seems like a more elegant solution, somehow.
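
Setting that up by hand might look something like the following sketch; the mount point and group name are assumptions, and the memory controller must already be mounted there:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        if (mkdir("/cgroup/memory/goapp", 0755) != 0)
            perror("mkdir");        /* may already exist */

        /* Limit actual memory use to 512MB; the address space
         * remains unlimited. */
        FILE *f = fopen("/cgroup/memory/goapp/memory.limit_in_bytes", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "%lu\n", 512UL << 20);
        fclose(f);

        /* Move this process into the group; the limit then covers
         * it and any children it forks. */
        f = fopen("/cgroup/memory/goapp/tasks", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "%d\n", (int)getpid());
        fclose(f);
        return 0;
    }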

Index entries for this article
Kernel: Memory management



Go's memory management, ulimit -v, and RSS control

Posted Feb 17, 2011 4:20 UTC (Thu) by alkbyby (subscriber, #61687) [Link]

AFAIR the memory controller just kills processes that allocate more than allowed, whereas rlimit just makes the malloc() call return 0.

This is argument for the controller, not against it...

Posted Feb 17, 2011 5:55 UTC (Thu) by khim (subscriber, #9252) [Link]

In practice, programs don't cope well with the case where malloc() returns 0. Worse: they often DO contain code which tries to do something, but in reality it's poorly tested, buggy, and often destroys more data than it saves.

It's probably a good idea to provide some warning when an application is close to the limit (it's usually much easier to cope with a "low memory" problem than with a "no memory" problem), but that's a separate issue.

This is argument for the controller, not against it...

Posted Feb 18, 2011 1:08 UTC (Fri) by giraffedata (guest, #1954) [Link]

Yes, killing the process is almost always better than failing the allocation. ISTR seeing Linux do that sometimes for rlimit violation; maybe there's a switch for that.

Besides the fact that programmers just don't take the time to tediously check every memory allocation, there's not much they can do anyway if there is no memory available. It takes a pretty intelligent program to be able to function when the memory well is dry and adjust itself to the available memory. For programs that are that intelligent, there should be a way to do an explicit conditional memory allocation.

This is argument for the controller, not against it...

Posted Feb 19, 2011 22:34 UTC (Sat) by nix (subscriber, #2304) [Link]

I check most allocations when doing so does not make the program too ugly (which means everywhere other than in tiny string allocations, pretty much). I don't care about functioning under OOM conditions, but being told *which* allocation failed can sometimes point to catastrophic memory leaks and that sort of thing. It's saved my bacon more than once.

If a crashing process were hit with SIGABRT so that it dumped core when it ran the machine out of memory, that might be OK... but the problem is that you then can't free up the memory until the core dump is finished, that the machine is in dire straits until then, and that the core dump is likely to be gigantic. If there were a way to automatically dump a backtrace... (unfortunately core_pattern pipes don't help here, because you have to suck in the entire dump to get a backtrace, and if you're out of memory that probably means you have to write it to disk...)

changing ulimit -v behaviour

Posted Feb 17, 2011 10:02 UTC (Thu) by pjm (guest, #2080) [Link]

If a user requests RLIMIT_AS (ulimit -v) instead of RLIMIT_RSS (ulimit -m) then it may well be that the user actually wanted it to limit (say) swap usage as well as just the resident set size.

Yes, it may well also be because RLIMIT_RSS is "hard to implement", as the article notes, and indeed consequently unimplemented on many systems including current versions of Linux; but that doesn't seem like a good basis on which to suggest that RLIMIT_AS should be implemented as RLIMIT_RSS.

The problem with RLIMIT_DATA (ulimit -d) is that it only affects brk (and its wrapper sbrk), not mmap, and so doesn't limit memory allocated with mmap; and many memory allocators use mmap instead of brk.

Rather than changing RLIMIT_AS (or even just ulimit -v as distinct from RLIMIT_AS) to limit only resident set size, would it be better either to change it to exclude PROT_NONE mappings, and/or to change RLIMIT_DATA behaviour so that it also limits malloc-like memory mappings (say, those that are either private or anonymous), so that people could reasonably switch from ulimit -v to ulimit -d?

changing ulimit -v behaviour

Posted Feb 18, 2011 1:19 UTC (Fri) by giraffedata (guest, #1954) [Link]

Indeed, I use RLIMIT_AS extensively and I think I am like most users in that I'm not interested in limiting real memory usage. That just doesn't concern me. What concerns me is a runaway process using up all the real memory plus swap space and causing an innocent process to fail.

Of course, I don't get that either, because of all the virtual address space that is not backed by memory and swap space. The first time this bit me was with my X server, whose mmap of the frame buffer on the video controller failed for "lack of resources."

I would welcome an RLIMIT that just covers anonymous memory. I don't think RLIMIT_AS could be changed for that, because some people today use it the way I did with my X server: add the size of the frame buffer to the limit I really wanted.

Go's memory management, ulimit -v, and RSS control

Posted Feb 17, 2011 11:11 UTC (Thu) by kov (guest, #7423) [Link]

WebKit's JavaScriptCore also allocates 2GB for its own usage at startup. A problem I discovered while debugging crashes in a buildbot was that Linux is not happy to overcommit too much by default, so on a machine with 2GB of RAM plus less than a gigabyte of swap, the likelihood of an mmap() failing for hitting the limit increased quite a bit when the tests were running. The limit is system-wide. I fixed it by disabling the overcommit limit. If Go goes that route I'm afraid we'll run into this problem more often, so disabling that limit by default seems to be in order.
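
(For reference, the workaround amounts to something like this hypothetical snippet; the setting is system-wide and needs root:)

    #include <stdio.h>

    int main(void)
    {
        /* 0 = heuristic overcommit, 1 = always allow, 2 = strict. */
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
        if (!f) { perror("fopen"); return 1; }
        fputs("1\n", f);
        fclose(f);
        return 0;
    }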

Go's memory management, ulimit -v, and RSS control

Posted Feb 17, 2011 14:57 UTC (Thu) by jcm (subscriber, #18262) [Link]

1). Applications programmers should do their homework before relying on behavior that is undocumented or different from the relevant standard.

2). Fortunately this is in "Go". The only Go I am interested in is the game. This particular nonsense is unlikely to affect me.

Jon.

Go's memory management, ulimit -v, and RSS control

Posted Feb 17, 2011 16:31 UTC (Thu) by meuh (guest, #22042) [Link]

Even if it does not allocate physical memory, a 16GB reservation might have an impact on the number of memory descriptors (pgd, pmd, pte?) allocated by the kernel.

Go's memory management, ulimit -v, and RSS control

Posted Sep 27, 2011 17:14 UTC (Tue) by Blaisorblade (guest, #25465) [Link]

PGD, PMD and PTE are part of page tables, and they won't be allocated by a simple mmap. The kernel will record the mmap with a single VMA (a structure representing an mmap), which is small. My only concern is that if you do actual allocations (i.e. change protections) a page at a time, you'll get a VMA for each page, and that consumes memory and slows down page faults for that program. A good memory manager will allocate memory from the OS in bigger chunks, and I hope/guess that Go's memory manager is good enough to do this.

Go's memory management, ulimit -v, and RSS control

Posted Feb 18, 2011 2:08 UTC (Fri) by jengelh (subscriber, #33263) [Link]

>It is not uncommon for users to use ulimit -v to limit the amount of virtual memory available to any given process; the purpose is to keep applications from getting too large and causing the entire system to thrash.

What's worse, ulimit -v will even prevent read-only mappings (think ISO images) if they are sufficiently large, even though they won't take up any real memory besides the PTEs.

Go's memory management, ulimit -v, and RSS control

Posted Feb 18, 2011 4:21 UTC (Fri) by cmccabe (guest, #60281) [Link]

Hmm. When the program is started, can't you just call getrlimit() to find out what the RLIMIT_AS limit is? Then, if it is less than 16GB, reserve only that much address space rather than the full 16GB.

Of course, if that's *all* you do, you can't get any more pages by other means, such as mmap.

I guess you can work around that too. Whenever you are about to call mmap(), first shrink your original reservation by the size of the mapping you're about to create, using mremap(MREMAP_FIXED).

Does this work, or am I missing something?
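
(For concreteness, the first half of that idea might look like the sketch below; it's hypothetical, and leaves out the mremap() juggling:)

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>

    int main(void)
    {
        size_t want = 16UL << 30;   /* the full 16GB reservation */
        struct rlimit rl;

        if (getrlimit(RLIMIT_AS, &rl) == 0 &&
            rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < want)
            want = rl.rlim_cur / 2; /* leave room for everything else */

        void *arena = mmap(NULL, want, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (arena == MAP_FAILED)
            perror("mmap");
        return 0;
    }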

Go's memory management, ulimit -v, and RSS control

Posted Feb 21, 2011 10:45 UTC (Mon) by xilun (guest, #50638) [Link]

So in the first place, 16GB should be enough for everyone?


Copyright © 2011, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds