Transcendent memory

By Jonathan Corbet
July 8, 2009

Making the best use of available memory is one of the biggest challenges for any operating system. Throwing virtualization into the mix adds both new challenges (balancing memory use between guests, for example) and opportunities (sharing pages between guests). Developers have responded with technologies like hot-plug memory and KSM, but nobody seems to think that the problem is fully solved. Transcendent memory is a new memory-management technique which, it is hoped, will improve the system's use of scarce RAM, regardless of whether virtualization is being used.

In his linux-kernel introduction, Dan Magenheimer asks:

What if there was a class of memory that is of unknown and dynamically variable size, is addressable only indirectly by the kernel, can be configured either as persistent or as "ephemeral" (meaning it will be around for awhile, but might disappear without warning), and is still fast enough to be synchronously accessible?

Dan (along with a list of other kernel developers) is exploring this concept, which he calls "transcendent memory." In short, transcendent memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again. At first blush, it may seem like a relatively useless sort of device, but it is hoped that transcendent memory will be able to improve performance in a few situations.

There is an API specification [PDF] available; there is also a related C API found in the patch itself. This discussion will focus on the latter, which suffers from less EXCESSIVE CAPITAL USE and is generally easier to understand.

Transcendent memory operates on the concept of page pools; once a pool is created, data can be stored to pages within the pool. The calls for creating and destroying pools look like this:

    u32 pool_id = tmem_new_pool(struct tmem_pool_uuid uuid, u32 flags);
    tmem_destroy_pool(u32 pool_id);

Pools are identified by the uuid value, though the identification really only matters for pools which might be shared among multiple users. A fair amount of information is stored in the flags field, including:

An "ephemeral" bit, which controls whether data successfully written to the pool is allowed to disappear at a random future time.
A "shared" bit indicating whether the pool is to be shared with other users.
The size of pages to use in the pool, expressed as a kernel "order" value.
A specification version number, used to ensure that both sides of the conversation know how to understand each other.

While users are expected to specify an expected page size, there is no way to specify the size of the pool as a whole. Determining the proper sizing for a pool (which almost certainly changes over time) is left to the hypervisor or whatever other software component is managing the pool.
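To make the flags field a little more concrete, here is a minimal sketch of how the four pieces of information listed above might be packed into a single 32-bit value. The bit positions, field widths, macro names, and helper functions are all assumptions made up for illustration; the patch and the specification define their own layout, which may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; not taken from the patch. */
#define TMEM_FLAG_EPHEMERAL  (1u << 0)   /* pages may vanish after a put */
#define TMEM_FLAG_SHARED     (1u << 1)   /* pool may be shared between users */
#define TMEM_ORDER_SHIFT     4           /* page size as a kernel "order" */
#define TMEM_ORDER_MASK      0xffu
#define TMEM_VERSION_SHIFT   24          /* specification version */
#define TMEM_VERSION_MASK    0xffu

/* Pack the pool attributes into one flags word. */
static uint32_t tmem_make_flags(int ephemeral, int shared,
                                unsigned order, unsigned version)
{
    uint32_t flags = 0;

    assert(order <= TMEM_ORDER_MASK && version <= TMEM_VERSION_MASK);
    if (ephemeral)
        flags |= TMEM_FLAG_EPHEMERAL;
    if (shared)
        flags |= TMEM_FLAG_SHARED;
    flags |= (uint32_t)order << TMEM_ORDER_SHIFT;
    flags |= (uint32_t)version << TMEM_VERSION_SHIFT;
    return flags;
}

/* Extract the page order from a flags word. */
static unsigned tmem_flags_order(uint32_t flags)
{
    return (flags >> TMEM_ORDER_SHIFT) & TMEM_ORDER_MASK;
}

/* Extract the specification version from a flags word. */
static unsigned tmem_flags_version(uint32_t flags)
{
    return (flags >> TMEM_VERSION_SHIFT) & TMEM_VERSION_MASK;
}
```

Under these assumptions, a private, ephemeral pool of order-0 pages speaking version 1 of the specification would be requested with tmem_make_flags(1, 0, 0, 1). Note that nothing in the flags describes the size of the pool itself.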

As suggested by the above interface, transcendent memory is very much page-based. Beyond that, it also can never be referenced directly; users are required to copy data into and out of the pool explicitly. The functions used for moving data between normal and transcendent memory are:

    int tmem_put_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);
    int tmem_get_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);

For both of these calls, pool_id specifies an existing pool. The object_id and page_id values, together, form a unique identifier for the page within the pool. If the pool is being used to cache file pages, for example, the object_id would identify the file, while page_id would be the offset within the file. pfn (a page frame number) identifies the page which is the source of the data (for tmem_put_page()) or the destination (tmem_get_page()).

Note that either call might fail. Since the size of the pool is not known, callers can never know in advance whether tmem_put_page() will succeed. So any transcendent memory user must have a backup plan ready in case the call fails. For pools marked as "ephemeral," tmem_get_page() is allowed to fail even if tmem_put_page() on the same ID succeeded; in other words, the implementation is allowed to drop pages from ephemeral pools if it decides that the memory can be put to better use elsewhere. It's also worth noting that, with private, ephemeral pools, tmem_get_page() will remove the indicated page from the pool.
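These semantics can be modeled in user space with a toy pool. The sketch below is not the hypervisor implementation and does not use the real API: the mock_ names, the fixed two-slot capacity, and the "drop everything" reclaim policy are all assumptions chosen to make the three behaviors visible (puts that fail, gets that fail after a successful put, and gets that remove the page from a private, ephemeral pool).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096
#define MOCK_SLOTS     2   /* deliberately tiny; real callers never learn the size */

struct mock_slot {
    int           used;
    uint64_t      object_id;
    uint32_t      page_id;
    unsigned char data[MOCK_PAGE_SIZE];
};

struct mock_pool {
    int              ephemeral;
    struct mock_slot slots[MOCK_SLOTS];
};

/* Find the slot holding (object_id, page_id), or NULL if it is absent. */
static struct mock_slot *mock_find(struct mock_pool *p, uint64_t oid, uint32_t pid)
{
    for (int i = 0; i < MOCK_SLOTS; i++)
        if (p->slots[i].used && p->slots[i].object_id == oid &&
            p->slots[i].page_id == pid)
            return &p->slots[i];
    return NULL;
}

/* Copy a page into the pool; returns 0 on success, -1 if there is no room. */
static int mock_put_page(struct mock_pool *p, uint64_t oid, uint32_t pid,
                         const void *src)
{
    struct mock_slot *s = mock_find(p, oid, pid);

    assert(p != NULL && src != NULL);
    for (int i = 0; !s && i < MOCK_SLOTS; i++)
        if (!p->slots[i].used)
            s = &p->slots[i];
    if (!s)
        return -1;              /* the caller needs its backup plan */
    s->used = 1;
    s->object_id = oid;
    s->page_id = pid;
    memcpy(s->data, src, MOCK_PAGE_SIZE);
    return 0;
}

/* Copy a page out of the pool; returns 0 on success, -1 if it is not there. */
static int mock_get_page(struct mock_pool *p, uint64_t oid, uint32_t pid,
                         void *dst)
{
    struct mock_slot *s = mock_find(p, oid, pid);

    if (!s)
        return -1;
    memcpy(dst, s->data, MOCK_PAGE_SIZE);
    if (p->ephemeral)
        s->used = 0;            /* private, ephemeral pool: get removes the page */
    return 0;
}

/* Stand-in for the pool manager taking memory back from an ephemeral pool. */
static void mock_reclaim(struct mock_pool *p)
{
    if (!p->ephemeral)
        return;                 /* persistent pages must be kept */
    for (int i = 0; i < MOCK_SLOTS; i++)
        p->slots[i].used = 0;
}
```

With an ephemeral pool, a third mock_put_page() fails because both slots are taken, a second mock_get_page() of the same ID fails because the first one removed the page, and any page still in the pool is gone after mock_reclaim().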

As an example of how this feature might be used, consider the Linux page cache, which maintains copies of pages from disk files. When memory gets tight, the page cache will start forgetting pages which are clean, but which have not been referenced in the recent past. With transcendent memory, the page cache could, before dropping the pages, attempt to store them into an ephemeral transcendent memory pool. At some future time, when one of those pages is needed again, the page cache would first attempt to fetch it from the pool. If the tmem_get_page() call succeeds, a disk I/O operation will have been avoided and everybody benefits; otherwise the page is read from disk as usual.
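The control flow of that page cache scenario might look something like the following user-space sketch. Everything here is a stand-in invented for illustration: stub_put() and stub_get() play the role of tmem_put_page() and tmem_get_page() on a one-page ephemeral pool, read_from_disk() fakes the I/O, and evict_clean_page() and fill_page() are hypothetical hooks, not functions from the kernel or the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Stand-in for an ephemeral pool that can hold a single page. */
static struct {
    int           valid;
    uint64_t      object_id;
    uint32_t      page_id;
    unsigned char data[PAGE_BYTES];
} stub;

static int disk_reads;          /* counts simulated disk I/O operations */

/* Store a page, silently displacing whatever was there before. */
static int stub_put(uint64_t oid, uint32_t pid, const unsigned char *src)
{
    stub.valid = 1;
    stub.object_id = oid;
    stub.page_id = pid;
    memcpy(stub.data, src, PAGE_BYTES);
    return 0;
}

/* Fetch a page if it is still present; the get removes it from the pool. */
static int stub_get(uint64_t oid, uint32_t pid, unsigned char *dst)
{
    if (!stub.valid || stub.object_id != oid || stub.page_id != pid)
        return -1;
    memcpy(dst, stub.data, PAGE_BYTES);
    stub.valid = 0;
    return 0;
}

/* Fake disk read: fills the page with a byte derived from its identity. */
static void read_from_disk(uint64_t inode, uint32_t index, unsigned char *dst)
{
    memset(dst, (int)((inode + index) & 0xff), PAGE_BYTES);
    disk_reads++;
}

/* Called when a clean page is about to be dropped from the page cache. */
static void evict_clean_page(uint64_t inode, uint32_t index,
                             const unsigned char *page)
{
    /* A failed put is harmless: the page is clean and still on disk. */
    (void)stub_put(inode, index, page);
}

/* Called on a page cache miss. */
static void fill_page(uint64_t inode, uint32_t index, unsigned char *page)
{
    assert(page != NULL);
    if (stub_get(inode, index, page) == 0)
        return;                 /* hit in the pool: no disk I/O needed */
    read_from_disk(inode, index, page);
}
```

The file's inode number serves as the object ID and the page's offset within the file as the page ID, as described above. The important property is that correctness never depends on the pool: a miss simply costs the disk read that would have happened anyway.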

Persistent (non-ephemeral) pools could be used as a sort of swap device. If the swapping code succeeds in writing a page to the pool, it can avoid writing it to the real swap device. The result is saved I/O at both swap-out and swap-in times. If the pool lacks space for the swapped page, it will be written to the real swap device in the usual way.
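The swap-out half of that idea reduces to "try the pool first, fall back to the device." The sketch below shows only that decision; the one-page pool, the device_write() counter, and the swap_out() function are assumptions for illustration and do not correspond to code in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096
#define POOL_PAGES 1            /* tiny, so that the fallback path is exercised */

enum swap_location { SWAP_IN_POOL, SWAP_ON_DEVICE };

/* Stand-in for a persistent pool: pages stay until explicitly removed. */
static unsigned char pool_data[POOL_PAGES][PAGE_BYTES];
static uint32_t      pool_key[POOL_PAGES];
static int           pool_used[POOL_PAGES];

static int device_writes;       /* counts simulated writes to the swap device */

/* Returns 0 if the page was stored, -1 if the pool has no room. */
static int pool_put(uint32_t offset, const unsigned char *src)
{
    for (int i = 0; i < POOL_PAGES; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            pool_key[i] = offset;
            memcpy(pool_data[i], src, PAGE_BYTES);
            return 0;
        }
    }
    return -1;
}

/* Fake write to the real swap device; only counts the I/O. */
static void device_write(uint32_t offset, const unsigned char *src)
{
    (void)offset;
    (void)src;
    device_writes++;
}

/* Swap a page out, preferring the pool over the real device. */
static enum swap_location swap_out(uint32_t offset, const unsigned char *page)
{
    assert(page != NULL);
    if (pool_put(offset, page) == 0)
        return SWAP_IN_POOL;    /* no I/O at all */
    device_write(offset, page); /* pool is full: swap in the usual way */
    return SWAP_ON_DEVICE;
}
```

A real user would also have to remember where each page ended up so that swap-in knows whether to ask the pool or the device. Because the pool is persistent, a page that was stored successfully can be counted on to be there later.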

Meanwhile, the transcendent memory implementation can try to optimize its management of the memory pools. Guests which are more active (or which have been given a higher priority) might be allowed to allocate more pages from the pools. Duplicate pages can be coalesced; KSM-like techniques could be used, but the use of object IDs could make it easier to detect duplicates in a number of situations. And so on.

The API specifies a number of other operations. There are a couple of calls to flush pages from the pool; one of them can remove all pages with a given object ID. Sub-page-size reads and writes are supported; there is also a tmem_xchg() call to atomically exchange data within a transcendent memory page. See the API specification for the full list.

A number of concerns were raised in the subsequent discussion; as a result, the above API is likely to change a bit. The biggest concern, though, appears to be security. The potential for hostile code to tap into shared pools and read out pages has developers worried; the need to guess a 128-bit UUID first has proved not to be sufficiently reassuring. Even with legitimate users only, a shared pool has the potential to contain data which should not, in reality, be shared between guests. As a result, any transcendent memory user will have to be written to take high-level security issues into account in low-level code.

Dan seemingly doesn't see the security problems as being as worrisome as others do. Even so, he eventually announced that the next transcendent memory patch would not include support for shared pools, and, indeed, version 2 lacks that feature. That feature will probably not come back until the security issues have been thought through and the concerns have been addressed.

Beyond that, transcendent memory will need some convincing evidence that it improves performance before it can make it into the mainline. The potential for improvements is clearly there; it is essentially a way for the system to take higher-level information into account when managing its virtual memory resources. If transcendent memory is able to fulfill that potential in a secure way, there may well be a place for it in the mainline kernel.

Transcendent memory

Posted Jul 8, 2009 18:52 UTC (Wed) by nix (subscriber, #2304)

"the need to guess a 128-bit UUID first has proved not to be sufficiently
reassuring"? This is mystifying. If it's sufficiently random (which as a
UUID it had better be), brute-forcing any of the pool IDs is going to be
next to impossible. Are people with 128-bit secret keys worried that
someone is going to guess their key by brute force? No: they're worried
about attacks that avoid brute-forcing and reduce the search space.

Transcendent memory

Posted Jul 8, 2009 23:16 UTC (Wed) by aliguori (subscriber, #30636)

Why limit your security to a shared secret when you can implement stronger policies within the hypervisor itself?

A common requirement with virtualization is to implement "Chinese wall" security policies. Imagine if you had a single box that was running a production server as a VM for both Coke and Pepsi. No matter what, neither company wants there to be any chance that the other one can access its data. The hypervisor must be able to enforce that. If the Pepsi VM was somehow able to obtain the UUID for the Coke shared tmem pool (even if it was because of a bug in the Coke server), you'd have one unhappy customer.

If you were to support a memory sharing system like this, you would want the available pools to be enumerated by the hypervisor. You likely want to support dynamic pools too so you need some way to hot add/remove pools. Using uuids is certainly a reasonable means of identifying pools but the point is that you need a more coherent strategy for exposing the pools to the guest that is arbitrated by the hypervisor.

Good example...

Posted Jul 9, 2009 1:07 UTC (Thu) by khim (subscriber, #9252)

You can see what this kind of thinking can lead to - if you sha1 hash is not good enough? But sometimes security requirements are not so strict - so it'll be good (disabled by default) option...

Good example...

Posted Jul 9, 2009 6:51 UTC (Thu) by nix (subscriber, #2304)

Haven't seen it yet (got to go to work too soon), but I'd just like to
comment on the astounding quality of the comments on that article on
youtube. It's like something out of xkcd: hundreds of comments, all
pushing their 'site for free games' or complaining that they won't watch
it because it's 'too long'. It makes you appreciate lwn's comment quality
(generally high when I keep quiet) all the more...

Shared memory = shared secrets

Posted Jul 9, 2009 1:14 UTC (Thu) by PaulWay (subscriber, #45600)

If we're talking about a shared system, then at some point that UUID has to be shared amongst hosts. I think the fear is that a hacked client will be able to see the UUIDs used by other clients, and therefore be able to use those UUIDs directly rather than having to guess them.

Have fun,

Paul

Shared memory = shared secrets

Posted Jul 9, 2009 12:20 UTC (Thu) by nix (subscriber, #2304)

Ow. Yeah, that's plausible, but unfortunately it would apply to all other shared-secret mechanisms too :/ basically if people can steal your key, you've lost. (But if they can steal your key they can presumably steal anything else they care to, as well.)

Transcendent memory

Posted Jul 8, 2009 20:16 UTC (Wed) by johill (subscriber, #25196)

What's the use case for shared anyway? Copying data between the different guests?

Transcendent memory

Posted Jul 8, 2009 20:57 UTC (Wed) by elanthis (guest, #6227)

Sharing data between the guests. If both guests have an identical binary loaded up from their identical page cache of their identical read only /usr, why copy it into memory twice and waste the space and incur the cache/copying/performance overhead?

Transcendent memory

Posted Jul 9, 2009 0:44 UTC (Thu) by johill (subscriber, #25196)

I don't see how that's possible with this API though. Or rather, how the API has any influence on it.

See, if the host were to hash the pages I gave it (from the guest) with this API, and memcmp() them, it could with very little effort tell me it's stored it even though it just made a note that my ID(s) point to that particular page, and refcounted the page. This is easy since a page, once pushed from the guest to the host, is immutable.

The way the article was written though it seems that with the shared stuff, different hosts could access the same page frame by the same ID. So guest A could push out a page and guest B could retrieve it with that same ID? Where's the use in that?

Transcendent memory

Posted Jul 9, 2009 0:59 UTC (Thu) by ncm (guest, #165)

If the client were responsible for hashing the page contents to produce the UUID, then the kernel wouldn't have to do it; it could rely on the UUID itself to identify sharable pages automatically.

Transcendent memory

Posted Jul 9, 2009 14:35 UTC (Thu) by johill (subscriber, #25196)

That (1) doesn't work – the host would still have to verify there are no hash collisions, and (2) doesn't really make a difference afaict?

Transcendent memory

Posted Jul 11, 2009 16:04 UTC (Sat) by MarkWilliamson (subscriber, #30166)

How about for implementing a shared in-memory filesystem that multiple guests can retrieve information from? The filesystem itself might have been created by a third-party "trusted" VM or the hypervisor itself.

Transcendent (?) memory

Posted Jul 8, 2009 23:46 UTC (Wed) by ncm (guest, #165)

What's supposed to be "transcendent" about this memory? It seems more "transient" to me. We're not talking about memories of past lives or Jungian archetypes.

Transcendent (?) memory

Posted Jul 9, 2009 3:54 UTC (Thu) by firasha (guest, #4230)

Dan explained his choice of "transcendent" in the original thread:

> While true that this memory is "exceeding usual limits", the more
> important criteria is that it may disappear.
>
> It might be clearer to just call it "ephemeral memory".
Ephemeral tmem (precache) may be the most interesting, but there is persistent tmem (preswap) as well. Both are working today and both are included in the patches I posted.
Looking for a term encompassing both, I chose "transcendent".

Transcendent memory

Posted Jul 9, 2009 5:45 UTC (Thu) by stewart (subscriber, #50665)

anybody else thinking 'memcached'?

Transcendent memory

Posted Jul 9, 2009 11:21 UTC (Thu) by rwmj (subscriber, #5474)

I was thinking "weak hash tables", but either way "everything old is new again".

Transcendent memory

Posted Jul 15, 2009 1:54 UTC (Wed) by holstein (guest, #6122)

That's the first thing that came to my mind.

Transcendent memory

Posted Jul 22, 2009 12:10 UTC (Wed) by tdz (subscriber, #58733)

"As an example of how this feature might be used, consider the Linux page cache, which maintains copies of pages from disk files. When memory gets tight, the page cache will start forgetting pages which are clean, but which have not been referenced in the recent past. With transcendent memory, the page cache could, before dropping the pages, attempt to store them into an ephemeral transcendent memory pool."

I don't understand why it is better to move a page to transcendent memory instead of keeping it in the page cache. The same amount of memory is needed in both cases. Can someone enlighten me?

Regards, Thomas

Transcendent memory

Posted Jul 22, 2009 12:14 UTC (Wed) by johill (subscriber, #25196)

The only difference is who decides to drop it -- if it's in the page cache that decision has to be made by the guest, if it's in the transcendent memory that decision can be made by the host too, if _it_ needs memory.

Transcendent memory

Posted Jul 22, 2009 15:31 UTC (Wed) by tdz (subscriber, #58733)

Hmm, that makes sense. Thanks for the fast answer.

Regards, Thomas

Word?

Posted Mar 30, 2011 12:51 UTC (Wed) by juliank (guest, #45896)

They used Word to write the spec, I don't believe they should seriously be listened to.