[Tux3] Patch: Introduce ddlink interface to expose extended functionality

Daniel Phillips phillips at phunq.net
Wed Feb 25 21:16:04 PST 2009


When Tux3 snapshots arrive we will need some means of creating, deleting, listing and otherwise managing them.  This could get quite involved, particularly when per-directory snapshots are supported.  There is no existing interface in Linux suitable for this purpose.  Today I will present an interface that I think can do the job well, and be useful in a number of other ways, particularly for monitoring and debugging.

Here is my original ddlink writeup, posted a year ago:

   "An alternative interface to device mapper"
   http://lwn.net/Articles/271805/

Ddlink is a generic pipe-like interface originally intended for controlling device drivers, but useful for many other kinds of kernel/userspace interaction.  Interfaces may range from very simple, implemented in a few dozen lines of kernel and userspace code, to complex state machines such as the device mapper control interface given as an example in my earlier post.

Ddlink was inspired by Trond Myklebust's venerable and successful rpc_pipefs interface, currently used to control NFS clients and servers.  Ddlink provides an application program with an fd object that can be read, written, ioctled and polled much like a pipe, suitable for efficient binary communication with kernel components.  Unlike a pipe, there is no write buffering.  Each write to a ddlink directly triggers some kernel handler.  Reads are buffered via an output queue of ddlink "items", each of which is an unrestricted blob.  In practice, a ddlink item is usually a C structure or ASCII text.  Ioctls on ddlinks are unrestricted and the ioctl command space is unpolluted.
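To make the write and read semantics concrete, here is a small userspace model of a single ddlink: a FIFO of items, a write path that immediately runs an echo handler (in the spirit of the fs/tux3 example) instead of buffering, and a read path that only ever hands out whole items, returning zero when the queue is empty and failing with EIO when the buffer is too small.  This is an illustrative sketch only, not code from the patch; the names ddmodel, ddmodel_queue, ddmodel_write and ddmodel_read, and this dditem layout, are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical layout of one queued output item: an unrestricted blob. */
struct dditem {
	struct dditem *next;
	size_t size;
	unsigned char data[];
};

/* Toy model of one ddlink: nothing but a FIFO of outbound items. */
struct ddmodel {
	struct dditem *head, *tail;
};

/* Queue a copy of buf as a new outbound item. Returns 0 or -ENOMEM. */
static int ddmodel_queue(struct ddmodel *dd, const void *buf, size_t len)
{
	struct dditem *item = malloc(sizeof *item + len);
	if (!item)
		return -ENOMEM;
	item->next = NULL;
	item->size = len;
	memcpy(item->data, buf, len);
	if (dd->tail)
		dd->tail->next = item;
	else
		dd->head = item;
	dd->tail = item;
	return 0;
}

/* Write side: no buffering, each write runs the handler immediately.
 * The handler here is an echo that requeues the data for reading. */
static ssize_t ddmodel_write(struct ddmodel *dd, const void *buf, size_t len)
{
	int err = ddmodel_queue(dd, buf, len);
	if (err) {
		errno = -err;
		return -1;
	}
	return (ssize_t)len;
}

/* Read side: whole items only. Returns 0 if nothing is queued, fails with
 * EIO if len is too small (the item stays queued), otherwise copies the
 * item out, frees it and returns its size. */
static ssize_t ddmodel_read(struct ddmodel *dd, void *buf, size_t len)
{
	struct dditem *item = dd->head;
	if (!item)
		return 0;
	if (len < item->size) {
		errno = EIO;
		return -1;
	}
	memcpy(buf, item->data, item->size);
	dd->head = item->next;
	if (!dd->head)
		dd->tail = NULL;
	ssize_t size = (ssize_t)item->size;
	free(item);
	return size;
}
```

Because a too-small read leaves the item at the head of the queue, the caller can simply retry with a bigger buffer and never has to stitch fragments back together.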

There are no partial reads of ddlink output data.  A read call must either supply a buffer large enough to hold the next outbound kernel item, or it fails with EIO, meaning "make your buffer bigger and try again".  This relieves the userspace program of having to buffer partial reads in order to reassemble input that would otherwise be brutally dismembered.  As a bonus, the kernel code for ddlink is considerably simpler than Trond's rpc_pipefs precursor.
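On the userspace side, the "make your buffer bigger and try again" rule boils down to a short retry loop.  The sketch below assumes, as described above, that EIO from a ddlink read means only that the buffer was too small; read_item is a hypothetical helper name, and the 64-byte starting size is an arbitrary choice.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one whole item from a ddlink-style fd into a heap buffer, doubling
 * the buffer and retrying whenever read() fails with EIO.
 * *buf/*cap describe the caller's buffer; pass NULL/0 to start fresh.
 * Returns the item size, 0 if nothing is queued, or -1 on other errors. */
static ssize_t read_item(int fd, char **buf, size_t *cap)
{
	if (*cap == 0) {
		char *p = malloc(64);
		if (!p)
			return -1;
		*buf = p;
		*cap = 64;
	}
	for (;;) {
		ssize_t n = read(fd, *buf, *cap);
		if (n >= 0 || errno != EIO)
			return n;
		/* Next item does not fit: grow the buffer and try again. */
		if (*cap > (size_t)-1 / 2) {
			errno = ENOMEM;
			return -1;
		}
		size_t newcap = *cap * 2;
		char *p = realloc(*buf, newcap);
		if (!p)
			return -1;
		*buf = p;
		*cap = newcap;
	}
}
```

Keeping the buffer across calls means it quickly settles at the size of the largest item seen, so later reads normally succeed on the first try.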

Also unlike a pipe, there is no waiting for input on a ddlink: if there is nothing to read, the read returns immediately with zero length.  If some other behavior is desired, it can be obtained using poll.
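A blocking read with a timeout can be layered on top with poll(2).  This sketch assumes a ddlink fd reports POLLIN once an item has been queued; wait_and_read is a hypothetical helper, and it works on any pollable fd.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms (-1 = forever) for fd to become readable, then
 * read once. Returns bytes read, 0 on timeout or empty read, -1 on error. */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready;

	do
		ready = poll(&pfd, 1, timeout_ms);
	while (ready < 0 && errno == EINTR);

	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;	/* timed out: nothing was queued */
	return read(fd, buf, len);
}
```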

Ddlink gives the implementor a simple framework for generalized allocation and destruction of ddlink items (dditems).  There is a small library of helper functions that are useful for creating domain-specific ddlink interfaces.

The code for ddlink is compact:

  * ~150 lines of core ddlink code
  * ~100 lines of support for kernel ddlink implementations
  * ~50 lines for an example kernel ddlink implementation

In terms of object size:

  * ~1800 bytes of kernel code for ddlink and library
  * ~325 bytes of module code for example implementation

So ddlink is about as light and tight as an interface can be.  It is also highly efficient, flexible and extensible, and requires very little boilerplate code, either in kernel or user space.

Ddlink has a number of advantages over ioctl:

 - Input and output transfer size are part of the interface

 - Delivers error messages as readable text

 - Provides a mechanism for queueing asynchronous results, as well
   as for returning immediate results

 - Supports stateful interface protocols

 - The creator of a ddlink fd owns it and does not have to worry
   about traffic on it from other sources.

 - Supports a file-oriented security model

 - Pollable

Two example userspace ddlink programs are attached, based on the example kernel ddlink implementation in the patch.

 - The fs/tux3 kernel implementation is a simple echo, cut and
   pasted from the ddlink.c example: all text written to the
   ddlink is simply requeued for reading.

 - Ioctl any file or directory on the mounted filesystem with 0xdd
   to obtain a ddlink

 - Example "ddtest.c" reads arguments from the command line, writes
   to the ddlink, reads from the ddlink and shows the output.

 - Example "ddfork.c" shows ddlink used for interprocess communication.
   It forks a child process that reads from the ddlink; the parent
   writes several items to the ddlink and exits, and the child reads
   some of them and exits.
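The attached programs are not reproduced here, but a client in the spirit of ddtest.c might look like the following sketch.  It assumes the 0xdd ioctl returns the new ddlink fd as its result, and that this fd stays valid after the file used to obtain it is closed; neither detail is confirmed above.  ddlink_open, ddlink_echo and DDLINK_IOCTL are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define DDLINK_IOCTL 0xdd	/* ioctl command number given in the post */

/* Obtain a ddlink from any file or directory on a mounted Tux3 volume.
 * ASSUMPTION: the ioctl's return value is the new ddlink fd, which
 * remains usable after the original fd is closed. */
static int ddlink_open(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	int dd = ioctl(fd, DDLINK_IOCTL);
	int saved = errno;	/* keep the ioctl's errno across close() */
	close(fd);
	errno = saved;
	return dd;	/* -1 on failure, e.g. ENOTTY on other filesystems */
}

/* Write one message, then read back one queued item (echo example). */
static ssize_t ddlink_echo(int dd, const char *msg, char *out, size_t outlen)
{
	if (write(dd, msg, strlen(msg)) < 0)
		return -1;
	return read(dd, out, outlen);
}
```

With the echo handler, ddlink_echo should return the message just written; because ddlink reads never block, a handler that answered asynchronously would instead call for the poll-based approach.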
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ddtest.c
Type: text/x-csrc
Size: 515 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20090225/0b36af95/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ddfork.c
Type: text/x-csrc
Size: 966 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20090225/0b36af95/attachment-0001.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tux3.ddlink.patch
Type: text/x-diff
Size: 12234 bytes
Desc: not available
URL: <http://phunq.net/pipermail/tux3/attachments/20090225/0b36af95/attachment.patch>
-------------- next part --------------
_______________________________________________
Tux3 mailing list
Tux3 at tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
