2004-04-26 · in Ideas · 893 words

When packaging software for GARstow, there are a few common things that I need to know for every package: the name, a brief (and optionally a long) description of the package, the latest release number, where to get the source from, what build system it uses, and what its dependencies are. This is true for all packaging systems, and there are many of them in use across all the different Unix-like (and sometimes not-so-Unix-like) systems. Some require more information: for instance, Debian's packagers need to know the package author and the package's license, and Gentoo packages can have a link to a home page and list older versions too.

Finding some of this information is fairly straightforward (for instance, much of it is contained in Freshmeat's database). Some of it — dependency information in particular — can be very tricky to figure out. What's needed is a common format for software packaging information: a way that software developers can specify most of the basic information about their software that's needed for other people to build installable packages of it.

Note the word "most": I'm not proposing a common packaging format, or even something from which different packaging formats can be generated. The problem with the latter is that many of the decisions a packager must make depend on system policy rather than package policy. As an example, take the installation of initialisation scripts, which must be done in dramatically different ways on current Debian, FreeBSD and Solaris systems, and would probably be done differently again were someone to develop a new operating system next year. The specification needs to stay general enough that unusual and future systems will be able to make use of it.
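To make the list of fields from the opening paragraph concrete, here is a minimal Python sketch of the kind of record such a file might carry. The class name, field names and example values are all hypothetical, chosen only to mirror the items listed earlier; this is not a proposed schema, and system-specific decisions such as where initialisation scripts go are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareDescription:
    """Hypothetical record of what a developer might publish about a release."""
    name: str
    summary: str                  # brief description
    version: str                  # latest release number
    source_url: str               # where to get the source from
    build_system: str             # free-form here; a real format would need a vocabulary
    dependencies: List[str] = field(default_factory=list)
    description: Optional[str] = None   # optional long description
    # Extra fields that only some packaging systems ask for.
    author: Optional[str] = None
    license: Optional[str] = None
    homepage: Optional[str] = None
    older_versions: List[str] = field(default_factory=list)

    def missing_for(self, required):
        """Return the names of required fields that are unset or empty."""
        return [name for name in required if not getattr(self, name)]


# Illustrative values only; the URL is a placeholder.
rawdog = SoftwareDescription(
    name="rawdog",
    summary="RSS aggregator",
    version="3.14",
    source_url="http://example.org/rawdog-3.14.tar.gz",
    build_system="distutils",
    dependencies=["python"],
)
```

A packager for a system that needs the extra fields could call `rawdog.missing_for(["author", "license"])` to see what still has to be tracked down by hand.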

One use of this would be as a means for developers to publish latest-version information for their packages so that users and packagers can tell when a package needs updating. For instance, I could publish the URL of a rawdog-latest.sdf file (which would actually be a symlink to rawdog-3.14.sdf, or similar); packagers would poll it daily and check the version number against their package version. This model works well for RSS; it would provide a distributed equivalent to Freshmeat.
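The polling check could be sketched roughly as follows. Since no file layout has been settled on, the sketch assumes a simple "Key: value" text layout with a Version field, and assumes that version numbers are dotted integers; both are assumptions made for illustration, as are the function names. Fetching the file is left out so that the comparison stands on its own.

```python
def parse_fields(text):
    """Parse 'Key: value' lines into a dict (hypothetical file layout)."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so URLs in values survive intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def version_key(version):
    """Turn '3.14' into (3, 14) so that 3.14 compares as newer than 3.9."""
    return tuple(int(part) for part in version.split("."))


def needs_update(published_text, packaged_version):
    """Return True if the published description names a newer release."""
    latest = parse_fields(published_text)["version"]
    return version_key(latest) > version_key(packaged_version)


# Stand-in for the contents of a fetched rawdog-latest.sdf file.
published = "Name: rawdog\nVersion: 3.14\n"
print(needs_update(published, "3.9"))
```

Comparing tuples of integers rather than raw strings matters here: as plain strings, "3.9" would sort after "3.14". Real version numbers are often messier than this (suffixes, dates, epochs), which is one of the things a shared format would have to pin down.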

Another use would be to provide a standard means of reporting bugs upstream; the file could include email addresses and URLs to use when filing bug reports. A third could be documentation searching: if the file listed the binaries provided by the package and linked to documentation, FAQs, mailing list archives and so on, then a "man" equivalent could both display a local manual and suggest external locations to look for help.
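As a rough sketch of the documentation-searching idea, a "man" equivalent could look up which package provides a command and then list the external locations from that package's description. The dictionary keys ("binaries", "bugs", "docs"), the function name and the URLs below are all made up for this example; a real format would define its own field names.

```python
def find_help(command, descriptions):
    """Return help locations for the package that provides `command`, or None."""
    for desc in descriptions:
        if command in desc.get("binaries", []):
            return {
                "package": desc["name"],
                "bugs": desc.get("bugs", []),
                "docs": desc.get("docs", []),
            }
    return None


# Illustrative data only; the keys and URLs are invented for this sketch.
descriptions = [
    {
        "name": "rawdog",
        "binaries": ["rawdog"],
        "bugs": ["mailto:bugs@example.org"],
        "docs": ["http://example.org/rawdog/faq.html"],
    },
]

help_info = find_help("rawdog", descriptions)
if help_info is not None:
    for url in help_info["docs"]:
        print("See also:", url)
```

The same lookup gives a bug-reporting tool the upstream address to use, so both uses could share one set of description files.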

Specifying dependency information is an interesting problem for several reasons:

It'd clearly be useful to be able to describe both source-available and closed-source software this way (particularly since some packages straddle the line between the two: a collection of data files can be installed as-is, and source-available software may depend on closed-source software or vice versa).

An obvious starting point for the format would be RDF; while it's a bit clumsy for human manipulation, it's already solved many of the problems of linking together a distributed collection of information. (For instance, the FOAF seeAlso mechanism would probably work well for dependencies.)

There are at least three other file formats called "SDF", so it's a poor choice of name; perhaps "Open Software Description Format" or something like that would be better.

The old Linux Software Map format handled the Freshmeat-like parts of this spec, but has long since fallen into disuse. Ports-like systems implement another subset of it.

DOAP implements quite a lot of the above in RDF.