
Implementing network channels

Last January, Van Jacobson presented his network channel concept at the 2006 linux.conf.au gathering. Channels, by concentrating network processing in ways which are most friendly to SMP systems, look like a promising way to improve high-speed networking performance. There was a fair amount of excitement about the idea. Unfortunately, Mr. Jacobson appears to have since become busy with other projects, so no contributions of actual code have resulted from his work. So not much has happened on this front in the last few months - or so it seemed.

David Miller recently let slip that he was working on his own channel implementation. It was not something he expected to see functioning anytime soon, however:

[D]on't expect major progress and don't expect anything beyond a simple channel to softint packet processing on receive any time soon.

Going all the way to the socket is a large endeavor and will require a lot of restructuring to do it right, so expect this to take on the order of months.

It turns out, however, that David was not the only person working on this idea; Kelly Daly and Rusty Russell have also put together a rudimentary channel implementation; in response to David's note, they posted their code for review. Since this version is more advanced, it has been the center of most of the discussion.

The Daly/Russell patch creates a data structure called struct channel_ring. It consists of 256 pages of memory, mapped contiguously into the receiving process's address space - though the pages will not be contiguous in kernel space. As Van Jacobson described, the variables used by the producer side are located at the beginning of the ring, while variables used by the consumer are at the end; this separation helps to ensure that the cache lines representing those variables do not bounce between processors. These variables include the circular buffer indexes indicating which buffer each side will use next. There are also flags allowing the consumer to request a wakeup when buffers are added to the ring.
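
In rough outline, that layout might look like the following sketch; the structure and field names here are invented for illustration, not taken from the patch itself:

    /*
     * Illustrative sketch of the ring layout described above; these
     * names are invented, not the ones from the Daly/Russell patch.
     */
    #include <stdint.h>

    #define VJ_RING_PAGES  256
    #define VJ_PAGE_SIZE   4096    /* assuming 4KB pages */

    /* Producer (driver) variables live at the very start of the
     * mapping... */
    struct vj_producer {
            uint32_t head;         /* next slot the driver will fill */
    };

    /* ...while consumer (user-space) variables live at the far end,
     * 256 pages away, so the cache lines holding each side's state
     * never bounce between processors. */
    struct vj_consumer {
            uint32_t tail;         /* next slot user space will read */
            uint32_t wake_flags;   /* set to request a wakeup on new data */
    };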

User-space starts by creating a socket with the new PF_VJCHAN protocol type, then using mmap() to map the ring buffer. Thereafter, it can use buffers as they become available (using poll() or select(), if need be, to wait for more data). When a buffer is no longer needed, incrementing the appropriate index will free it up for new data.
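
The user-space side might look something like this sketch; PF_VJCHAN is the patch's new protocol family, but its numeric value, the socket type, and the buffer-walking details below are all assumptions:

    #include <poll.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define PF_VJCHAN   27               /* hypothetical value; the real one
                                            would come from patched headers */
    #define RING_BYTES  (256 * 4096)     /* 256 pages, assuming 4KB pages */

    int main(void)
    {
            int fd = socket(PF_VJCHAN, SOCK_RAW, 0);  /* socket type is a guess */
            char *ring = mmap(NULL, RING_BYTES, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);

            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            for (;;) {
                    poll(&pfd, 1, -1);   /* sleep until buffers are ready */
                    /* Walk the slots between the producer and consumer
                     * indexes, process each packet in place, then advance
                     * the consumer index to release the buffers for reuse. */
            }

            /* not reached in this sketch */
            munmap(ring, RING_BYTES);
            close(fd);
            return 0;
    }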

The driver-side interface is, so far, quite simple. A buffer can be allocated from a given ring with a call to vj_get_buffer(); once the data has been placed there by the network interface, vj_netif_rx() sends that buffer up into the protocol code. The tricky part is getting each packet into the correct buffer in the first place. Copying packets inside the kernel would defeat the purpose of this whole exercise; it is important that the network interface choose the correct buffer before DMAing the packet data into memory. As it happens, contemporary network cards can be smart enough to make that decision, if programmed properly by the driver.
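
A driver's receive path might use those two calls roughly as follows. This is a sketch only: the signatures shown, and the nic_post_rx_buffer() helper, are guesses based on the description above, not the patch's actual prototypes.

    /* Hypothetical glue around the driver-side calls; every name except
     * vj_get_buffer() and vj_netif_rx() is invented for this sketch, and
     * even those two signatures are guesses. */

    struct vj_ring;      /* the channel ring, opaque here */
    struct nic;          /* some NIC driver's private state */

    extern void *vj_get_buffer(struct vj_ring *ring, unsigned int len);
    extern void vj_netif_rx(struct vj_ring *ring, void *buf, unsigned int len);
    extern void nic_post_rx_buffer(struct nic *nic, int flow, void *buf);

    #define MAX_PKT 1518    /* classic Ethernet frame size */

    /* Refill path: reserve an empty ring slot and program the NIC to DMA
     * the next packet for this flow straight into it. */
    static void example_refill(struct nic *nic, struct vj_ring *ring, int flow)
    {
            void *buf = vj_get_buffer(ring, MAX_PKT);
            if (buf)
                    nic_post_rx_buffer(nic, flow, buf);
    }

    /* Completion path: the packet data is already in the ring buffer, so
     * just hand it up to the protocol code -- no copy is ever made. */
    static void example_rx_complete(struct vj_ring *ring, void *buf,
                                    unsigned int len)
    {
            vj_netif_rx(ring, buf, len);
    }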

There are still vast numbers of issues to be worked out. David Miller takes exception to the preallocated buffers, seeing them as inflexible and hard to change; he would rather see a pointer-oriented data structure. But it is hard to see how such a structure could work while still avoiding the overhead of mapping buffers into user space with every packet.

A more difficult issue, perhaps, is netfilter. The zero-copy approach can be quite fast, but it also naturally shorts out the packet filtering done by the netfilter code. It has been suggested that, for established connections, that is an acceptable tradeoff. But Rusty has pointed out that people do use filtering on established connections, for packet counting if nothing else. As he put it: "Basically I don't think we can 'relax' our firewall implementation and retain trust". So some other sort of solution will have to be found here.

Another open issue has to do with whether the channel should go all the way through to user space or not. Van Jacobson's linux.conf.au presentation included discussion of a user-space TCP implementation, taking the end-to-end principle to its logical conclusion. The reasoning behind this move is that, since the data will be processed by the application, putting the protocol code in the same place will be the fastest, most cache-friendly way to do it. But moving protocol code to user space also means duplicating much of the networking stack and adding to the complexity of the system as a whole. Leaving the protocol code in the kernel simplifies the situation, and, it is believed, can be made to yield almost all of the same performance benefits. In particular, protocol processing can happen on the same processor as the destination application (a fair amount of it is done that way now), and zero-copy networking will still be possible.

It has also been pointed out that, since most of the system calls involved with network data reception (read() or recv(), for example) already imply copying the data, that copy might as well be done in kernel space. But implicit in that statement is another conclusion: if channels are to be used to their fullest potential for high-performance networking, a new set of user-space interfaces will have to be developed. The venerable socket interface was never designed for a channel-oriented environment. How such an interface might look is not entirely clear; it could be based on the current asynchronous I/O API, on kevents, or on something completely new.
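
To make that concrete, a channel-oriented interface might, purely speculatively, look something along these lines; every name below is invented, and nothing of the sort has actually been proposed:

    #include <stddef.h>

    /* One imaginable shape for a channel-aware receive interface, in the
     * spirit of the AIO and kevent APIs. Events point straight into the
     * mapped ring, so no copy is implied. Entirely hypothetical. */
    struct chan_event {
            void   *data;        /* points into the mapped ring */
            size_t  len;         /* length of this packet */
    };

    /* Wait for up to 'max' incoming packets without copying them. */
    int chan_getevents(int chan_fd, struct chan_event *events, int max,
                       int timeout_ms);

    /* Hand a buffer back to the ring once the application is done. */
    int chan_putbuf(int chan_fd, struct chan_event *event);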

In summary, the networking developers are working on some major changes to how networking will be done in Linux, and there are a lot of issues which are not yet understood. The developers are groping around for ideas. So the channel implementations which are being posted now are unlikely to resemble the code which will, someday, be merged into the mainline; they are, instead, exercises intended mainly to obtain a better understanding of the real nature of the problem. But they are still a promising start to what looks to be an interesting development effort.




Implementing network channels

Posted May 4, 2006 10:58 UTC (Thu) by liljencrantz (guest, #28458)

How does this relate to the zero-copy work going on with splice, tee and friends? Are those functions inherently oriented exclusively toward pipes and kernel buffers, or could they be used for sending I/O to network cards as well?

Implementing network channels

Posted May 4, 2006 12:43 UTC (Thu) by nix (subscriber, #2304)

If the networking stack moved into libc, every app could stay the same unless it wanted direct channel access.

I think the netfilter problems are more significant.

Implementing network channels

Posted May 4, 2006 14:05 UTC (Thu) by kfiles (subscriber, #11628)

> I think the netfilter problems are more significant.

I don't see why. If I'm designing a server process that requires very high throughput, I'm not going to install iptables rules for established connections. That kind of performance hit just seems antithetical to high throughput.

I would think the following logic would be fine for users:
* If the iptables rules installed only filter on the first packet in a connection, network channels can be used for data reception.
* If per-packet (established connection) rules are in effect, disable network channels.

I'd be perfectly happy with such a compromise, and I can't imagine it would be too hard to set a /proc variable when iptables installs a rule for established connections.

--kirby

Implementing network channels

Posted May 4, 2006 21:02 UTC (Thu) by caitlinbestler (guest, #32532)

Or, more generally, before binding a flow to a netchannel:

1) Find all netfilter rules that would apply to the flow.
2) If the hardware end of the netchannel can implement those restrictions, proceed; otherwise don't assign the netchannel directly to the hardware.

The rule you cited deals with the easy subset: there are no rules that apply once the connection is established. And obviously any hardware would be able to implement zero rules. But other hardware may be able to implement *some* rules, the most plausible and important probably being to count every packet within the connection.
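
In rough pseudocode (every helper name below is invented, just to spell out the steps):

    /* Sketch of the binding decision; none of these helpers exist. */
    struct flow;
    struct nic;
    struct rules;

    extern struct rules *netfilter_rules_for_flow(struct flow *f);
    extern int  hw_can_enforce(struct nic *n, struct rules *r);
    extern void bind_flow_to_hw_netchannel(struct nic *n, struct flow *f);
    extern void keep_flow_in_stack(struct flow *f);

    static void maybe_bind(struct nic *nic, struct flow *flow)
    {
            struct rules *r = netfilter_rules_for_flow(flow);

            /* No rules at all, or the NIC can enforce what there is
             * (e.g. it can at least count packets): go direct. */
            if (!r || hw_can_enforce(nic, r))
                    bind_flow_to_hw_netchannel(nic, flow);
            else
                    keep_flow_in_stack(flow);
    }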

Implementing network channels

Posted May 4, 2006 22:19 UTC (Thu) by smoogen (subscriber, #97)

The case where I could see the need for both high throughput and high-integrity inspection or modification would be a router. In some cases you want the netfilter stack to be very low-level. I could see netfilter in this 'world' being split into a layered approach: a very high-level port-open/port-closed ACL layer, a lower related/established layer, and a very low-level 'what the f is this doing in my packet' layer.

routers / firewalls

Posted May 9, 2006 2:48 UTC (Tue) by xoddam (subscriber, #2322)

Packets don't go to userspace at all if they're going *through* a router. But we still need this functionality for firewalls on the host.

Some firewall applications need to track connections, scan packets within a connection, and even have the option of dropping connections altogether (e.g. intrusion protection). Netfilter will need some rearrangement to achieve this if channels go direct to userspace.

Implementing network channels

Posted May 11, 2006 8:26 UTC (Thu) by amcrae (guest, #25501)

User-level networking has lots of interesting possibilities:
http://au.netd.com/papers/user-networking.pdf
http://au.netd.com/papers/router-on-linux.pdf
Cheers,
AMc

Implementing network channels

Posted May 11, 2006 16:21 UTC (Thu) by nirajgupta (guest, #37693)

Sorry about this late post. We actually implemented this concept about two years ago, while building a high-speed packet analyser; of course our company still uses it heavily. Right now our implementation does copies, but even then the performance is very significantly higher than pcap's.

