2007-04-24 · in Ideas · 168 words

I'm currently looking at changing the mail-archive-to-HTML system I use. The switch is largely painless, but has the disadvantage that all the URLs to individual messages will change. I could reverse-engineer both systems and figure out how each one generates filenames from the messages, but I think there's probably an easier approach.

I'm imagining a "make-redirects" tool. Given two trees of documents, this program should identify pairs of similar documents in the "old" and "new" trees, and emit an Apache config file containing Redirect directives. The documents don't need to be identical, just as close as possible (sharing the largest percentage of identical text) -- I'd expect this to work for mail archives, where the message text survives the conversion largely intact.
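To make the idea concrete, here's a rough sketch in Python of what such a tool might look like. None of this exists yet: the names `read_tree` and `make_redirects` and the 0.5 similarity threshold are made up for illustration, and I'm using the standard library's `difflib.SequenceMatcher` as a stand-in for whatever "closest text" measure turns out to work best. It compares every old document against every new one, so it would be slow on a big archive.

```python
import difflib
import os


def read_tree(root):
    """Map each file's path (relative to root, with / separators) to its text."""
    docs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                docs[rel] = f.read()
    return docs


def make_redirects(old_docs, new_docs, threshold=0.5):
    """Yield an Apache Redirect line for each old document with a close match.

    threshold is an arbitrary guess, not a tuned value.
    """
    for old_path, old_text in sorted(old_docs.items()):
        best_path, best_score = None, 0.0
        for new_path, new_text in sorted(new_docs.items()):
            # autojunk=False: the default heuristic discards frequent
            # characters, which distorts character-level comparison of prose.
            matcher = difflib.SequenceMatcher(
                None, old_text, new_text, autojunk=False
            )
            score = matcher.ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        # Skip documents with no good match, or whose URL didn't change.
        if best_path is not None and best_score >= threshold and best_path != old_path:
            yield f"Redirect permanent /{old_path} /{best_path}"


if __name__ == "__main__":
    import sys

    old_root, new_root = sys.argv[1], sys.argv[2]
    for line in make_redirects(read_tree(old_root), read_tree(new_root)):
        print(line)
```

Run as something like `python make_redirects.py old-archive/ new-archive/ > redirects.conf`, assuming both trees are served from the site root. One caveat: Apache's `Redirect` matches URL path prefixes, so this is only safe when the old paths are individual files rather than directories.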

The same approach could be used to deal in a mostly automatic way with web content moving around on any site -- and even with smaller documents being merged into one larger one. (The other way around would require generation of a stub page, and would be a bit harder to detect automatically.)
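For the merged-document case, a plain similarity score would probably be too low, since a small old page is only a fraction of the larger new one. One possible tweak (again just a sketch, with a hypothetical `containment` function built on `difflib`) is to ask instead how much of the old document's text turns up in the new one, comparing word by word:

```python
import difflib


def containment(old_text, new_text):
    """Fraction of old_text's words that fall in blocks shared with new_text."""
    old_words = old_text.split()
    new_words = new_text.split()
    if not old_words:
        return 0.0
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(old_words)
```

An old page whose score against some new page is close to 1.0 has most likely been absorbed into it, even if the new page is many times longer.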