2004-10-26 · in Tech Notes · 633 words

I've done a couple of small patches for KRoC this week to cure minor annoyances. While the compiler's code is big and complex, it's actually pretty readable; some cautious grep usage (and a couple of hints from Fred a while ago) rapidly located the right places to make the changes. (Apparently the code used to be really messy, and Fred ran much of it through GNU indent to clean it up...)

Anyway, the first patch removed the need for the clock file. KRoC can use CPU timers on x86 to implement occam timers, and it needs to know the CPU clock speed to convert RDTSC's result into milliseconds. You used to have to provide this in a .kroc_clock file in your home directory or /etc, but it's pretty straightforward to read it from /proc/cpuinfo on Linux instead, which removes a configuration step. (While I was doing this, I found that my clock file still had the clock speed of my old "slow" machine in it.) Other OSs usually have ways of getting the x86 clock speed too (some of them even do a Linux /proc emulation). We could do with fixing the KRoC build script so that it doesn't tell you to create the clock file any more...
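For illustration only, here is roughly what the /proc/cpuinfo approach looks like. This is a Python sketch, not the C code that went into KRoC; the function names are made up for the example, and it assumes the usual Linux x86 "cpu MHz" line and a timestamp counter that ticks at the CPU clock rate.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the first 'cpu MHz' value in /proc/cpuinfo-style text, or None.

    Hypothetical helper for illustration; not part of KRoC.
    """
    for line in cpuinfo_text.splitlines():
        match = re.match(r"cpu MHz\s*:\s*([0-9]+(?:\.[0-9]+)?)", line)
        if match:
            return float(match.group(1))
    return None  # empty or unrecognised input: the caller must notice this


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the clock speed from the running Linux system."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())


def ticks_to_milliseconds(ticks, cpu_mhz):
    """Convert a timestamp-counter difference to milliseconds.

    Assumes the counter advances cpu_mhz * 1000 times per millisecond.
    """
    return ticks / (cpu_mhz * 1000.0)
```

Returning None for empty input, rather than silently carrying on, is the same kind of check the clock file code was missing.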

While doing that, I also fixed a tiny bug whereby KRoC-compiled programs wouldn't notice if the clock file was empty. For future reference, the clock speed detection code is in ccsp/common/rtsmain.c, and the result is used in tranx86/arch386.c; the variable is glob_cpufactor.

The second patch implements one of the items off my occam syntax ideas list: arbitrary indentation levels. You can now use any number of spaces (Python/Haskell-style) instead of just two between indentation levels, and tabs are counted as 8 spaces so tab damage in editors is less of an issue. This was a self-contained hack to the occ21 lexer (in occ21/fe/lex1.c): the compiler keeps track of the horizontal position of everything that it reads from a file, and those positions are used all over the place to check correct indentation, detect continuation lines and VALOF blocks, and so on.

The easiest way of implementing this turned out to be to keep a stack of previously-used indentation levels (I started off with just one, but that breaks if you drop back more than one level!) and a depth counter. The lexer's routine to count whitespace at the start of the line now adjusts a fudge-factor variable that's added to the horizontal position when working out where in the line we are -- so the rest of the code still thinks that the programmer's using two-space indentation, and (nearly) everything works happily.
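The idea can be sketched in a few lines. This is a Python illustration rather than the actual change to lex1.c: the class, the names and the MAX_STOPS limit are all invented for the example, and it ignores blank lines and continuation lines entirely.

```python
TAB_WIDTH = 8    # tabs are counted as 8 spaces
MAX_STOPS = 64   # hypothetical limit on the stack of indentation stops


class IndentError(Exception):
    pass


def leading_width(line):
    """Count the leading whitespace of a line, in columns."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += TAB_WIDTH
        else:
            break
    return width


class IndentTracker:
    def __init__(self):
        self.stops = [0]  # stack of widths in use; depth is len(stops) - 1
        self.fudge = 0    # added to real columns to fake two-space indentation

    def start_line(self, line):
        """Update the stack for a new line; return the column the rest of
        the compiler should see for the first token on that line."""
        width = leading_width(line)
        if width > self.stops[-1]:
            # Deeper than before: whatever width was used becomes a new stop.
            if len(self.stops) >= MAX_STOPS:
                raise IndentError("too many indentation levels")
            self.stops.append(width)
        else:
            # Same level or shallower: pop until we are back at a known stop.
            while self.stops[-1] > width:
                self.stops.pop()
            if self.stops[-1] != width:
                raise IndentError("dropped back to an unused indentation level")
        depth = len(self.stops) - 1
        self.fudge = 2 * depth - width
        return width + self.fudge
```

Feeding it lines indented by 0, 4, 8 (one tab), 4 and 0 columns gives apparent columns of 0, 2, 4, 2 and 0, which is what code expecting two-space indentation wants to see.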

Fred took the first patch, but the second one's still pretty experimental -- the handy you-are-here pointer in compiler error messages points to the wrong place. I also need to add a couple of new lexing errors: the stack of indentation "stops" (analogous to tab stops) can fill up, and it's possible to have a new kind of broken indentation where you fall back to a level that you hadn't used before:

A
    B
  C

It did strike me while looking at the source that it'd be pretty straightforward to make some of the compiler errors more helpful: for example, there are loads of places that generate the "Incorrect indentation" error, and for some errors it'd really help to have a list of possible causes. (The most common cause of that error for new occam programmers appears to be "Didn't expect to find code here; have you forgotten a SEQ?")

I'm also not entirely convinced that the logic that deals with extremely long lines is correct: the variable that keeps track of where the compiler is in the input buffer is used in the indentation-level calculation, so when the buffer is reused it'll go back to zero...