The UK Mirror Service uses the homebrew "syncfs" program to do FTP mirroring. This has a number of useful features aimed at getting the best performance when mirroring slow upstream sites. However, FTP is (quite reasonably) falling out of favour as a mirroring protocol; we tend to prefer rsync these days. Some of syncfs's features would be worth porting to rsync:
- Shadow copies. syncfs can connect to multiple source sites, treating one as the reference but preferring to pull data from the others. We use this when there is a slow master site and a faster local mirror: the master site is configured as the reference and the mirror as the shadow, and syncfs pulls from the mirror any file whose copy there matches the one on the master site. rsync should be able to do this without too much trouble.
- Reverse mirroring. We used to run from two physical sites, and mirrored over FTP to both. Each site was set to use the other as a shadow, so on average half the content was pulled from the source and half from the other site. To avoid duplication as far as possible, one site walked the directory tree in alphabetical order, and the other in reverse alphabetical order. It should be safe to run two copies of rsync on the same tree, since rsync does proper atomic file updates; if it were possible to tell one copy to walk the tree in reverse order, then two connections could be used with minimal overlap.
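
As a rough illustration of the two ideas above, here is a minimal Python sketch. It is not how syncfs or rsync is implemented. The listing format (path mapped to size and modification time), the rule that a shadow copy "matches" when both values are equal, and the names plan_sources, walk_order and simulate_two_walkers are all assumptions made for the example. The simulation also assumes both walkers advance one file at a time at the same speed and can see which files the other has already fetched.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical listing format: relative path -> (size in bytes, mtime).
Listing = Dict[str, Tuple[int, int]]


def plan_sources(reference: Listing, shadow: Listing) -> Dict[str, str]:
    """Shadow copies: decide where each reference file should be fetched from."""
    plan = {}
    for path, meta in reference.items():
        # Assumption: the shadow's copy is usable only if size and mtime
        # are identical to the reference site's entry.
        plan[path] = "shadow" if shadow.get(path) == meta else "reference"
    return plan


def walk_order(paths: Iterable[str], reverse: bool = False) -> List[str]:
    """Return unique paths in alphabetical or reverse alphabetical order."""
    return sorted(set(paths), reverse=reverse)


def simulate_two_walkers(paths: Iterable[str]) -> Dict[str, str]:
    """Reverse mirroring: walker A goes forward, walker B goes backward.

    Each walker skips files the other has already fetched, so the two
    meet in the middle and no file is fetched twice.
    """
    forward = walk_order(paths)
    backward = walk_order(paths, reverse=True)
    fetched_by: Dict[str, str] = {}
    i = j = 0
    while len(fetched_by) < len(forward):
        # Walker A takes the next file (alphabetically) not yet fetched.
        while i < len(forward) and forward[i] in fetched_by:
            i += 1
        if i < len(forward):
            fetched_by[forward[i]] = "A"
        # Walker B takes the next file (reverse order) not yet fetched.
        while j < len(backward) and backward[j] in fetched_by:
            j += 1
        if j < len(backward):
            fetched_by[backward[j]] = "B"
    return fetched_by


if __name__ == "__main__":
    master = {"a/x": (10, 100), "a/y": (20, 200), "b/z": (30, 300)}
    mirror = {"a/x": (10, 100), "a/y": (20, 199)}
    print(plan_sources(master, mirror))
    print(simulate_two_walkers(master))
```

In the sample data only a/x is identical on the mirror, so it is the only file planned from the shadow; a/y is stale there and b/z is missing, so both come from the reference. Sorting full paths is a simplification of a real directory walk, but it shows why opposite orders keep the overlap between two connections small.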