Transcendent memory
In his linux-kernel introduction, Dan Magenheimer asks:
Dan (along with a list of other kernel developers) is exploring this concept, which he calls "transcendent memory." In short, transcendent memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again. At first blush, it may seem like a relatively useless sort of device, but it is hoped that transcendent memory will be able to improve performance in a few situations.
There is an API specification [PDF] available; there is also a related C API found in the patch itself. This discussion will focus on the latter, which suffers from less EXCESSIVE CAPITAL USE and is generally easier to understand.
Transcendent memory operates on the concept of page pools; once a pool is created, data can be stored to pages within the pool. The calls for creating and destroying pools look like this:
    u32 pool_id = tmem_new_pool(struct tmem_pool_uuid uuid, u32 flags);
    tmem_destroy_pool(u32 pool_id);
Pools are identified by the uuid value, though the identification really only matters for pools which might be shared among multiple users. A fair amount of information is stored in the flags field, including:
- An "ephemeral" bit, which controls whether data successfully written
to the pool is allowed to disappear at a random future time.
- A "shared" bit indicating whether the pool is to be shared with other
users.
- The size of pages to use in the pool, expressed as a kernel "order"
value.
- A specification version number, used to ensure that both sides of the conversation know how to understand each other.
While users specify the page size to use, there is no way to specify the size of the pool as a whole. Determining the proper sizing for a pool (which almost certainly changes over time) is left to the hypervisor or whatever other software component is managing the pool.
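As a rough illustration of how the four pieces of information listed above might be packed into a single u32, here is a minimal sketch. The bit positions, field widths, and helper names (tmem_make_flags() and friends) are invented for this example; the actual layout is defined by the API specification and is not given in this article.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical layout, chosen only for this sketch. */
#define TMEM_FLAG_EPHEMERAL  (1u << 0)
#define TMEM_FLAG_SHARED     (1u << 1)
#define TMEM_ORDER_SHIFT     4
#define TMEM_ORDER_MASK      0xfu
#define TMEM_VERSION_SHIFT   24
#define TMEM_VERSION_MASK    0xffu

/* Build a flags word from its logical fields. */
static u32 tmem_make_flags(int ephemeral, int shared, u32 page_order, u32 version)
{
    u32 flags = 0;

    /* Reject values that would spill into a neighboring field. */
    assert(page_order <= TMEM_ORDER_MASK);
    assert(version <= TMEM_VERSION_MASK);

    if (ephemeral)
        flags |= TMEM_FLAG_EPHEMERAL;
    if (shared)
        flags |= TMEM_FLAG_SHARED;
    flags |= page_order << TMEM_ORDER_SHIFT;
    flags |= version << TMEM_VERSION_SHIFT;
    return flags;
}

/* Extract the page order (log2 of the page count per pool page). */
static u32 tmem_flags_order(u32 flags)
{
    return (flags >> TMEM_ORDER_SHIFT) & TMEM_ORDER_MASK;
}

/* Extract the specification version number. */
static u32 tmem_flags_version(u32 flags)
{
    return (flags >> TMEM_VERSION_SHIFT) & TMEM_VERSION_MASK;
}
```

A private, ephemeral pool of order-0 pages speaking version 1 of the specification would then be requested with tmem_make_flags(1, 0, 0, 1).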
As suggested by the above interface, transcendent memory is very much page-based. Beyond that, it also can never be referenced directly; users are required to copy data into and out of the pool explicitly. The functions used for moving data between normal and transcendent memory are:
    int tmem_put_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);
    int tmem_get_page(u32 pool_id, u64 object_id, u32 page_id, unsigned long pfn);
For both of these calls, pool_id specifies an existing pool. The object_id and page_id values, together, form a unique identifier for the page within the pool. If the pool is being used to cache file pages, for example, the object_id would identify the file, while page_id would be the offset within the file. pfn (a page frame number) identifies the page which is the source of the data (for tmem_put_page()) or the destination (tmem_get_page()).
Note that either call might fail. Since the size of the pool is not known, callers can never know in advance whether tmem_put_page() will succeed. So any transcendent memory user must have a backup plan ready in case the call fails. For pools marked as "ephemeral," tmem_get_page() is allowed to fail even if tmem_put_page() on the same ID succeeded; in other words, the implementation is allowed to drop pages from ephemeral pools if it decides that the memory can be put to better use elsewhere. It's also worth noting that, with private, ephemeral pools, tmem_get_page() will remove the indicated page from the pool.
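These semantics can be modeled with a toy, user-space pool. This is only a sketch, not the patch's implementation: the toy_* names are hypothetical, buffers stand in for page frame numbers, the pool has a fixed two-slot capacity so that failures are easy to provoke, and the policies (a full persistent pool refuses puts, a full ephemeral pool evicts an older page, gets from a persistent pool leave the page in place) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE  4096
#define TOY_POOL_SLOTS 2   /* deliberately tiny; a real pool's size is unknown */

struct toy_slot {
    int used;
    uint64_t object_id;
    uint32_t page_id;
    unsigned char data[TOY_PAGE_SIZE];
};

struct toy_pool {
    int ephemeral;           /* nonzero: stored pages may vanish */
    unsigned next_victim;    /* round-robin eviction cursor */
    struct toy_slot slots[TOY_POOL_SLOTS];
};

static struct toy_slot *toy_find(struct toy_pool *p, uint64_t oid, uint32_t pid)
{
    for (unsigned i = 0; i < TOY_POOL_SLOTS; i++)
        if (p->slots[i].used && p->slots[i].object_id == oid &&
            p->slots[i].page_id == pid)
            return &p->slots[i];
    return NULL;
}

/* Copy a page into the pool. Returns 0 on success, -1 on failure. */
static int toy_put_page(struct toy_pool *p, uint64_t oid, uint32_t pid,
                        const void *page)
{
    struct toy_slot *s = toy_find(p, oid, pid);   /* overwrite if present */

    for (unsigned i = 0; !s && i < TOY_POOL_SLOTS; i++)
        if (!p->slots[i].used)
            s = &p->slots[i];
    if (!s) {
        if (!p->ephemeral)
            return -1;       /* persistent pool is full: caller needs a plan B */
        /* Ephemeral pool: silently drop an older page to make room. */
        s = &p->slots[p->next_victim];
        p->next_victim = (p->next_victim + 1) % TOY_POOL_SLOTS;
    }
    s->used = 1;
    s->object_id = oid;
    s->page_id = pid;
    memcpy(s->data, page, TOY_PAGE_SIZE);
    return 0;
}

/* Copy a page out of the pool. Returns 0 on success, -1 if it is gone. */
static int toy_get_page(struct toy_pool *p, uint64_t oid, uint32_t pid,
                        void *page)
{
    struct toy_slot *s = toy_find(p, oid, pid);

    if (!s)
        return -1;
    memcpy(page, s->data, TOY_PAGE_SIZE);
    if (p->ephemeral)
        s->used = 0;         /* private, ephemeral pool: a get removes the page */
    return 0;
}
```

With this model, a third put into a persistent pool fails outright, while a third put into an ephemeral pool succeeds at the cost of making an earlier page unretrievable.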
As an example of how this feature might be used, consider the Linux page cache, which maintains copies of pages from disk files. When memory gets tight, the page cache will start forgetting pages which are clean and which have not been referenced in the recent past. With transcendent memory, the page cache could, before dropping the pages, attempt to store them into an ephemeral transcendent memory pool. At some future time, when one of those pages is needed again, the page cache would first attempt to fetch it from the pool. If the tmem_get_page() call succeeds, a disk I/O operation will have been avoided and everybody benefits; otherwise the page is read from disk as usual.
Persistent (non-ephemeral) pools could be used as a sort of swap device. If the swapping code succeeds in writing a page to the pool, it can avoid writing it to the real swap device. The result is saved I/O at both swap-out and swap-in times. If the pool lacks space for the swapped page, it will be written to the real swap device in the usual way.
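Both the page cache and the swap use cases follow the same pattern: try the pool first, then fall back to the normal I/O path. The sketch below shows that control flow only. Everything prefixed with stub_ is a stand-in written for this example rather than kernel code, the zero-on-success return convention is an assumption, and so is the mapping of inode numbers and swap offsets onto object_id and page_id.

```c
#include <assert.h>
#include <stdint.h>

#define STUB_POOL_CAPACITY 1   /* hypothetical; callers never know the real size */

static int stub_tmem_has_page;        /* set by the caller to simulate hit/miss */
static unsigned stub_pool_used;       /* pages currently held by the stub pool */
static unsigned disk_reads;           /* fallback reads performed */
static unsigned swap_device_writes;   /* fallback swap writes performed */

static int stub_tmem_get_page(uint32_t pool_id, uint64_t object_id,
                              uint32_t page_id, unsigned long pfn)
{
    (void)pool_id; (void)object_id; (void)page_id; (void)pfn;
    return stub_tmem_has_page ? 0 : -1;
}

static int stub_tmem_put_page(uint32_t pool_id, uint64_t object_id,
                              uint32_t page_id, unsigned long pfn)
{
    (void)pool_id; (void)object_id; (void)page_id; (void)pfn;
    assert(stub_pool_used <= STUB_POOL_CAPACITY);
    if (stub_pool_used == STUB_POOL_CAPACITY)
        return -1;                    /* no room: the put simply fails */
    stub_pool_used++;
    return 0;
}

/* Page cache miss: an ephemeral pool may save a disk read. */
static int fill_page(uint32_t pool_id, uint64_t inode, uint32_t index,
                     unsigned long pfn)
{
    if (stub_tmem_get_page(pool_id, inode, index, pfn) == 0)
        return 0;                     /* hit: no disk I/O needed */
    disk_reads++;                     /* miss: read from disk as usual */
    return 0;
}

enum swap_location { IN_TMEM, ON_SWAP_DEVICE };

/* Swap-out: a persistent pool may save a write to the swap device. */
static enum swap_location swap_out_page(uint32_t pool_id, uint64_t swap_type,
                                        uint32_t swap_offset, unsigned long pfn)
{
    if (stub_tmem_put_page(pool_id, swap_type, swap_offset, pfn) == 0)
        return IN_TMEM;
    swap_device_writes++;             /* pool full: use the real swap device */
    return ON_SWAP_DEVICE;
}
```

The caller of swap_out_page() has to remember where each page went, since a later swap-in must look in the same place.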
Meanwhile, the transcendent memory implementation can try to optimize its management of the memory pools. Guests which are more active (or which have been given a higher priority) might be allowed to allocate more pages from the pools. Duplicate pages can be coalesced; KSM-like techniques could be used, but the use of object IDs could make it easier to detect duplicates in a number of situations. And so on.
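One way coalescing could work is sketched below: hash each incoming page, confirm a match with memcmp() so that hash collisions cannot cause false sharing, and keep a reference count on the stored copy. This is an illustrative content-based approach, not the technique used by the patch; the dedup_* names, the fixed-size store, and the choice of the 64-bit FNV-1a hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_PAGE_SIZE 4096
#define DEDUP_MAX_PAGES 8

struct dedup_page {
    unsigned refcount;               /* 0 means the slot is free */
    uint64_t hash;
    unsigned char data[DEDUP_PAGE_SIZE];
};

static struct dedup_page dedup_store[DEDUP_MAX_PAGES];

/* 64-bit FNV-1a over the whole page. */
static uint64_t page_hash(const unsigned char *page)
{
    uint64_t h = 14695981039346656037ULL;

    for (size_t i = 0; i < DEDUP_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Store a page; returns the slot holding its contents, or -1 if full. */
static int dedup_put(const unsigned char *page)
{
    uint64_t h = page_hash(page);
    int free_slot = -1;

    for (int i = 0; i < DEDUP_MAX_PAGES; i++) {
        struct dedup_page *p = &dedup_store[i];

        if (p->refcount == 0) {
            if (free_slot < 0)
                free_slot = i;       /* remember the first free slot */
            continue;
        }
        /* The hash narrows the search; memcmp() rules out collisions. */
        if (p->hash == h && memcmp(p->data, page, DEDUP_PAGE_SIZE) == 0) {
            p->refcount++;           /* share the existing copy */
            return i;
        }
    }
    if (free_slot < 0)
        return -1;
    dedup_store[free_slot].refcount = 1;
    dedup_store[free_slot].hash = h;
    memcpy(dedup_store[free_slot].data, page, DEDUP_PAGE_SIZE);
    return free_slot;
}

/* Drop one reference; the slot becomes free when the count reaches zero. */
static void dedup_release(int slot)
{
    assert(slot >= 0 && slot < DEDUP_MAX_PAGES);
    assert(dedup_store[slot].refcount > 0);
    dedup_store[slot].refcount--;
}
```

Sharing a stored copy this way is safe only because pool pages are never referenced directly; data moves in and out solely through explicit copies.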
The API specifies a number of other operations. There are a couple of calls to flush pages from the pool; one of them can remove all pages with a given object ID. Sub-page-size reads and writes are supported; there is also a tmem_xchg() call to atomically exchange data within a transcendent memory page. See the API specification for the full list.
A number of concerns were raised in the subsequent discussion; as a result, the above API is likely to change a bit. The biggest concern, though, appears to be security. The potential for hostile code to tap into shared pools and read out pages has developers worried; the need to guess a 128-bit UUID first has proved not to be sufficiently reassuring. Even with legitimate users only, a shared pool has the potential to contain data which should not, in reality, be shared between guests. As a result, any transcendent memory user will have to be written to take high-level security issues into account in low-level code.
Dan seemingly doesn't see the security problems as being as worrisome as others do. Even so, he eventually announced that the next transcendent memory patch would not include support for shared pools, and, indeed, version 2 lacks that feature. That feature will probably not come back until the security issues have been thought through and the concerns have been addressed.
Beyond that, transcendent memory will need some convincing evidence that it improves performance before it can make it into the mainline. The potential for improvements is clearly there; it is essentially a way for the system to take higher-level information into account when managing its virtual memory resources. If transcendent memory is able to fulfill that potential in a secure way, there may well be a place for it in the mainline kernel.
Transcendent memory
Posted Jul 8, 2009 18:52 UTC (Wed) by nix (subscriber, #2304) [Link]
reassuring"? This is mystifying. If it's sufficiently random (which as a
UUID it had better be), brute-forcing any of the pool IDs is going to be
next to impossible. Are people with 128-bit secret keys worried that
someone is going to guess their key by brute force? No: they're worried
about attacks that avoid brute-forcing and reduce the search space.
Transcendent memory
Posted Jul 8, 2009 23:16 UTC (Wed) by aliguori (subscriber, #30636) [Link]
A common requirement with virtualization is to implement "Chinese wall" security policies. Imagine if you had a single box that was running a production server as a VM for both Coke and Pepsi. No matter what, neither company wants there to be any chance that the other one can access its data. The hypervisor must be able to enforce that. If the Pepsi VM was somehow able to obtain the UUID for the Coke shared tmem pool (even if it was because of a bug in the Coke server), you'd have one unhappy customer.
If you were to support a memory sharing system like this, you would want the available pools to be enumerated by the hypervisor. You likely want to support dynamic pools too so you need some way to hot add/remove pools. Using uuids is certainly a reasonable means of identifying pools but the point is that you need a more coherent strategy for exposing the pools to the guest that is arbitrated by the hypervisor.
Good example...
Posted Jul 9, 2009 1:07 UTC (Thu) by khim (subscriber, #9252) [Link]
You can see what this kind of thinking can lead to - if your sha1 hash is not good enough? But sometimes security requirements are not so strict - so it'll be a good (disabled by default) option...
Good example...
Posted Jul 9, 2009 6:51 UTC (Thu) by nix (subscriber, #2304) [Link]
comment on the astounding quality of the comments on that article on
youtube. It's like something out of xkcd: hundreds of comments, all
pushing their 'site for free games' or complaining that they won't watch
it because it's 'too long'. It makes you appreciate lwn's comment quality
(generally high when I keep quiet) all the more...
Shared memory = shared secrets
Posted Jul 9, 2009 1:14 UTC (Thu) by PaulWay (subscriber, #45600) [Link]
Have fun,
Paul
Shared memory = shared secrets
Posted Jul 9, 2009 12:20 UTC (Thu) by nix (subscriber, #2304) [Link]
Transcendent memory
Posted Jul 8, 2009 20:16 UTC (Wed) by johill (subscriber, #25196) [Link]
Transcendent memory
Posted Jul 8, 2009 20:57 UTC (Wed) by elanthis (guest, #6227) [Link]
Transcendent memory
Posted Jul 9, 2009 0:44 UTC (Thu) by johill (subscriber, #25196) [Link]
See, if the host were to hash the pages I gave it (from the guest) with this API, and memcmp() them, it could with very little effort tell me it's stored it even though it just made a note that my ID(s) point to that particular page, and refcounted the page. This is easy since a page, once pushed from the guest to the host, is immutable.
The way the article was written though it seems that with the shared stuff, different hosts could access the same page frame by the same ID. So guest A could push out a page and guest B could retrieve it with that same ID? Where's the use in that?
Transcendent memory
Posted Jul 9, 2009 0:59 UTC (Thu) by ncm (guest, #165) [Link]
Transcendent memory
Posted Jul 9, 2009 14:35 UTC (Thu) by johill (subscriber, #25196) [Link]
That
- doesn't work: the host would still have to verify there are no hash collisions
- doesn't really make a difference afaict?
Transcendent memory
Posted Jul 11, 2009 16:04 UTC (Sat) by MarkWilliamson (subscriber, #30166) [Link]
Transcendent (?) memory
Posted Jul 8, 2009 23:46 UTC (Wed) by ncm (guest, #165) [Link]
Transcendent (?) memory
Posted Jul 9, 2009 3:54 UTC (Thu) by firasha (guest, #4230) [Link]
Dan explained his choice of "transcendent" in the original thread:
> While true that this memory is "exceeding usual limits", the more
> important criteria is that it may disappear.
>
> It might be clearer to just call it "ephemeral memory".
Ephemeral tmem (precache) may be the most interesting, but there is persistent tmem (preswap) as well. Both are working today and both are included in the patches I posted.
Looking for a term encompassing both, I chose "transcendent".
Transcendent memory
Posted Jul 9, 2009 5:45 UTC (Thu) by stewart (subscriber, #50665) [Link]
Transcendent memory
Posted Jul 9, 2009 11:21 UTC (Thu) by rwmj (subscriber, #5474) [Link]
I was thinking "weak hash tables", but either way "everything old is new again".
Transcendent memory
Posted Jul 15, 2009 1:54 UTC (Wed) by holstein (guest, #6122) [Link]
Transcendent memory
Posted Jul 22, 2009 12:10 UTC (Wed) by tdz (subscriber, #58733) [Link]
I don't understand why it is better to move a page to transcendent memory instead of keeping it in the page cache. The same amount of memory is needed in both cases. Can someone enlighten me?
Regards, Thomas
Transcendent memory
Posted Jul 22, 2009 12:14 UTC (Wed) by johill (subscriber, #25196) [Link]
Transcendent memory
Posted Jul 22, 2009 15:31 UTC (Wed) by tdz (subscriber, #58733) [Link]
Regards, Thomas
Word?
Posted Mar 30, 2011 12:51 UTC (Wed) by juliank (guest, #45896) [Link]