When packaging software for GARstow, there are a few common things that I need to know for every package: the name, a brief (and optionally a long) description of the package, the latest release number, where to get the source from, what build system it uses, and what its dependencies are. This is true for all packaging systems, and there are many of them in use across all the different Unix-like (and sometimes not-so-Unix-like) systems. Some require more information: for instance, Debian's packagers need to know the package author and the package's license, and Gentoo packages can have a link to a home page and list older versions too.
Finding some of this information is fairly straightforward (for instance, much of it is contained in Freshmeat's database). Some of it — dependency information in particular — can be very tricky to figure out. What's needed is a common format for software packaging information: a way that software developers can specify most of the basic information about their software that's needed for other people to build installable packages of it.
Note that "most"; I'm not proposing a common packaging format, or even something from which different packaging formats can be generated. The problem with the latter is that many decisions that the packager must make will be based on system policy rather than package policy; as an example, take the installation of initialisation scripts, which must be done in dramatically different ways on current Debian, FreeBSD and Solaris systems, and would probably be done differently again were someone to develop a new operating system next year. It's necessary to keep the specification sufficiently general that unusual and future systems will be able to make use of it.
One use of this would be as a means for developers to publish latest-version information for their packages so that users and packagers can tell when a package needs updating. For instance, I could publish the URL of a rawdog-latest.sdf file (which would actually be a symlink to rawdog-3.14.sdf, or similar); packagers would poll it daily and check the version number against their package version. This model works well for RSS; it would provide a distributed equivalent to Freshmeat.
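As a rough sketch of the packager's side of that polling, here is how the version check might look. No file format has been settled on, so the "key: value" layout, the field names and the example.org URL are all assumptions made purely for illustration; fetching the file would be an ordinary HTTP GET and is left out.

```python
import re


def parse_sdf(text):
    """Parse a hypothetical 'key: value' description file into a dict.

    The layout is an assumption for this sketch, not a defined format.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def version_key(version):
    """Turn '3.14' into a sortable key so that 3.14 ranks above 3.9.

    Digit runs compare numerically; letter runs compare as text and
    sort after numbers. Real version schemes are messier than this.
    """
    parts = re.findall(r"\d+|[A-Za-z]+", version)
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


def needs_update(sdf_text, packaged_version):
    """Return True if the published version is newer than the packaged one."""
    latest = parse_sdf(sdf_text).get("version")
    if latest is None:
        raise ValueError("description file has no version field")
    return version_key(latest) > version_key(packaged_version)


# Hypothetical contents of rawdog-latest.sdf.
sample = """\
name: rawdog
version: 3.14
source: http://example.org/rawdog-3.14.tar.gz
"""

if __name__ == "__main__":
    print(needs_update(sample, "3.9"))
```

Comparing versions piece by piece rather than as strings matters here: a plain string comparison would rank 3.9 above 3.14.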
Another use would be to provide a standard means of reporting bugs upstream; the file could include email addresses and URLs to use when filing bug reports. A third could be documentation searching: if the file listed the binaries provided by the package and linked to documentation, FAQs, mailing list archives and so on, then a "man" equivalent could both display a local manual and suggest external locations to look for help.
Specifying dependency information is an interesting problem for several reasons:
- It would be preferable to avoid needing a central registry of package names; while each package has a name, it's not unknown for a package to fork without a name change (for instance, there are currently two versions of procps, and two libraries called audiofile). When referring to a package, it's necessary to provide both a name and a URL to its SDF file.
- ... which works fine, until that SDF file moves, since then you have to update all the packages that depend on it. This may require searching a central registry (such as Freshmeat, or a hypothetical "Google Software").
- Dependencies need to include version information too (GARstow doesn't capture this; Gentoo does). This means that not only do you need to specify an open or closed range of versions that you depend on, but you usually don't want to link to the latest version. It might be useful to be able to describe all the versions of a piece of software in a single file.
- Dependencies come in several types. Extract dependencies are required to turn the source distribution into buildable source. Build dependencies are required to compile the software. Runtime dependencies are required while the software is in use (so they're the only interesting dependencies for binary distribution). The GAR system pays careful attention to this, since it's designed for building systems which must run in limited space.
- Some dependencies will be "virtual": while fontconfig is a specific bit of software, not everyone uses XFree86 to provide their X libraries, and most systems have their own libc. Some sort of "provides" approach might work here.
- Some packages need to specify dependencies on system APIs like the Single UNIX Specification (for instance, not all systems have the poll system call).
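To make the points above a little more concrete, here is one way the dependency information might be modelled. This is only a sketch: the class names, field names, the three kind strings and the example.org URLs are my own assumptions, and versions are assumed to be purely numeric and dotted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def vtuple(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '2.6.1' into (2, 6, 1)."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class Dependency:
    name: str                          # real name, or a virtual one such as "libc"
    sdf_url: str                       # where the dependency's description lives
    kind: str = "runtime"              # "extract", "build" or "runtime"
    min_version: Optional[str] = None  # inclusive lower bound; None means open
    max_version: Optional[str] = None  # inclusive upper bound; None means open


@dataclass
class Installed:
    name: str
    version: str
    provides: List[str] = field(default_factory=list)  # virtual names offered


def satisfies(pkg: Installed, dep: Dependency) -> bool:
    """Does this installed package meet the dependency?"""
    if dep.name != pkg.name and dep.name not in pkg.provides:
        return False
    # Assumption: version bounds only apply to a direct name match,
    # since a provider's version says little about the virtual name.
    if dep.name == pkg.name:
        version = vtuple(pkg.version)
        if dep.min_version is not None and version < vtuple(dep.min_version):
            return False
        if dep.max_version is not None and version > vtuple(dep.max_version):
            return False
    return True


def unmet(deps: List[Dependency], installed: List[Installed], kind: str) -> List[Dependency]:
    """List dependencies of one kind that nothing installed satisfies."""
    return [
        dep for dep in deps
        if dep.kind == kind and not any(satisfies(pkg, dep) for pkg in installed)
    ]


installed = [
    Installed("glibc", "2.3.2", provides=["libc"]),
    Installed("fontconfig", "2.2.1"),
]
deps = [
    Dependency("libc", "http://example.org/libc.sdf"),
    Dependency("fontconfig", "http://example.org/fontconfig.sdf", min_version="2.2"),
    Dependency("gcc", "http://example.org/gcc.sdf", kind="build"),
]

if __name__ == "__main__":
    print([dep.name for dep in unmet(deps, installed, "build")])
```

A binary distribution would only ask for the unmet "runtime" dependencies, while a build from source would check all three kinds in turn. Dependencies on system APIs such as poll could perhaps be expressed as further virtual names, though that's a guess.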
It'd clearly be useful to be able to describe both source-available and closed-source software this way (particularly since some packages straddle the line between the two: a collection of data files can be installed as-is, and source-available software may depend on closed-source software or vice versa).
An obvious starting point for the format would be RDF; while it's a bit clumsy for human manipulation, it's already solved many of the problems of linking together a distributed collection of information. (For instance, the FOAF seeAlso mechanism would probably work well for dependencies.)
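The linking idea can be sketched without any RDF machinery at all. In the example below each description is just a dict with a hypothetical "see_also" list of URLs, and the fetch function is passed in by the caller; both are assumptions for illustration, and a real implementation would parse RDF rather than use plain dicts.

```python
def collect(start_url, fetch):
    """Follow see_also links from one description to every one reachable.

    fetch(url) must return a dict; its optional "see_also" key lists
    further URLs. Each URL is fetched once, so cycles are harmless.
    """
    seen = {}
    queue = [start_url]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        doc = fetch(url)
        seen[url] = doc
        queue.extend(doc.get("see_also", []))
    return seen


# A tiny in-memory "web" of hypothetical descriptions, including a cycle.
docs = {
    "a.sdf": {"name": "a", "see_also": ["b.sdf", "c.sdf"]},
    "b.sdf": {"name": "b", "see_also": ["c.sdf"]},
    "c.sdf": {"name": "c", "see_also": ["a.sdf"]},
}

if __name__ == "__main__":
    found = collect("a.sdf", docs.__getitem__)
    print(sorted(found))
```

Passing fetch in as a parameter keeps the traversal separate from how the files are actually retrieved.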
There are at least three other file formats called "SDF", so it's a poor choice of name; perhaps "Open Software Description Format" or something like that would be better.
The old Linux Software Map format handled the Freshmeat-like parts of this spec, but has long since fallen into disuse. Ports-like systems implement another subset of it.
DOAP implements quite a lot of the above in RDF.