2003-12-29 · in Ideas · 185 words

Unix pipelines work very nicely for processing text and simple columnar data. These days, however, I increasingly find that I want to process XML. I would like to have tools that convert between columnar and XML formats, perform basic operations (grep, tr, etc.) on XML data, and allow me to do more complex operations with an awk-like language (preferably with a nicer API than DOM). The hard part would be coming up with succinct ways of describing the relevant bits of an XML file as command-line options; XPath is quite good for selection, but you really want to be able to do XSLT-style transformations too. The particular use case I've got in mind is translating XHTML to RSS by chaining together a select to pick the ideas from the XHTML, a translation operation to produce the appropriate elements in the output, and a head to select only the first few items.
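To make the select / translate / head chain concrete, here is a minimal sketch in Python rather than as real command-line tools. Everything specific in it is an assumption for illustration: the `<div class="idea">` markup, the `SAMPLE` page, and the function names `select`, `to_rss_item` and `xhtml_to_rss` are all hypothetical, and the output is only a bare RSS skeleton (a real channel would also need its own title, link and description). Note that ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from itertools import islice

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical input page; the class name "idea" is an assumption.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="idea"><h2><a href="/ideas/1">First</a></h2></div>
<div class="idea"><h2><a href="/ideas/2">Second</a></h2></div>
<div class="idea"><h2><a href="/ideas/3">Third</a></h2></div>
</body></html>"""


def select(root, path):
    """'select' stage: lazily yield elements matching an ElementTree path."""
    yield from root.iterfind(path, XHTML_NS)


def to_rss_item(div):
    """'translate' stage: turn one idea <div> into an RSS <item>."""
    link = div.find("h:h2/h:a", XHTML_NS)
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "".join(link.itertext())
    ET.SubElement(item, "link").text = link.get("href")
    return item


def xhtml_to_rss(xhtml_text, limit):
    """Chain select | translate | head and serialise the result."""
    root = ET.fromstring(xhtml_text)
    selected = select(root, ".//h:div[@class='idea']")
    translated = (to_rss_item(div) for div in selected)
    first_few = list(islice(translated, limit))  # 'head' stage
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    channel.extend(first_few)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(xhtml_to_rss(SAMPLE, 2))
```

Each stage is a generator feeding the next, which is roughly what the shell pipe would do; the awkward part the sketch sidesteps is exactly the one described above, namely expressing the translate step succinctly as command-line options.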

Giles Radford suggested XMLStarlet, which provides many of the tools I described above; Piet Delport also suggested the XSH language built on top of Perl, which is pretty much the "XML awk" I wanted.