2004-10-10 · in Ideas · 903 words

The most striking feature of occam's syntax for newcomers to the language is that it uses significant indentation: rather than delimiting code blocks with {} like C or Perl, the parser simply looks for changes in indentation level. Since good programmers indent their code anyway, this reduces redundancy and visual clutter.

occam isn't alone in this: the popular modern languages Python and Haskell are also indentation-based. Their syntaxes have some nice features that occam currently doesn't. I'd like to revise the occam syntax so that it's comparable to more recent indentation-based languages.

I don't want to change the semantics of the language at all; I'm just proposing an alternative syntax for the existing occam-pi language. The intention is to borrow the useful syntactical features from newer languages, and hopefully make occam look a bit less alien to new programmers at the same time.

A few really useful features don't fit into this scheme very well at the moment: replicated IFs, extended channel inputs, VALOF expressions. I'll need to think some more about how these could be represented.

I'd also like to come up with an example bit of occam code that uses all the features here, and mutate it as I work through the suggestions.

(One of the suggestions here was to add ASSERT to the language, until Fred pointed out that KRoC already has it!)

Flexible indentation

In occam, each indentation step must be two spaces.

WHILE foo
  SEQ
    bar ()
    IF
      condition
        baz ()

Python and Haskell don't care, provided you're consistent between lines in the same block. Python counts tabs as eight spaces; some people have suggested making it complain if you mix tabs and spaces.

WHILE foo
    SEQ
        bar ()
        IF
          condition
            baz ()

(Not that I'd actually want to indent code like that!)
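
Here's a rough sketch of how a lexer could accept any consistent indentation width, in the style Python uses: keep a stack of indentation columns, push when the column increases, pop when it decreases, and complain if a dedent doesn't land on a column that an enclosing block already used. The function name indent_tokens and the token names are made up for illustration; this isn't how any existing occam compiler works, and it dodges the tabs question by rejecting tabs in indentation outright.

```python
def indent_tokens(source):
    """Turn indented text into (kind, text) tokens with INDENT/DEDENT markers.

    Illustrative only: any indentation width is accepted, as long as each
    dedent returns to a column that an enclosing block already used.
    """
    stack = [0]          # indentation columns of the enclosing blocks
    tokens = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip():
            continue     # blank lines don't affect block structure
        body = raw.lstrip(" ")
        if body.startswith("\t"):
            raise SyntaxError(f"line {lineno}: tab in indentation")
        col = len(raw) - len(body)
        if col > stack[-1]:
            # deeper than the current block: open a new one
            stack.append(col)
            tokens.append(("INDENT", ""))
        else:
            # shallower: close blocks until we reach a matching column
            while col < stack[-1]:
                stack.pop()
                tokens.append(("DEDENT", ""))
            if col != stack[-1]:
                raise SyntaxError(f"line {lineno}: inconsistent dedent")
        tokens.append(("LINE", body.rstrip()))
    # close any blocks still open at end of input
    tokens.extend(("DEDENT", "") for _ in stack[1:])
    return tokens
```

Fed the oddly-indented example above, this produces the same block structure as the two-space version would; only the column numbers on the stack differ.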

Lowercase keywords

These days, most languages don't make you SHOUT ALL THE TIME. Modern syntax-highlighting editors differentiate keywords by colour, so there's no particular need for them to be capitalised any more.

while foo
  seq
    bar ()
    if
      condition
        baz ()

Simpler IF syntax

The occam IF syntax is elegant for complicated stuff, but for simple usage it's a bit verbose:

IF
  condition
    do.something ()
  other.condition
    do.something.else ()
  TRUE
    SKIP

A Python-style syntax could write this as:

if condition:
    do_something()
elif other_condition:
    do_something_else()
else:
    skip

Or perhaps even have the compiler insert the skip clause automatically; when you want the old behaviour, you can always add an else: stop clause yourself.

Changing what colons mean

occam uses colons to indicate that a declaration is in force for the next block:

INT foo:
BOOL bar:
SEQ
  c ? foo
  do.something (c)

Python uses colons after statements that need an indentation increase after them:

while foo:
    bar()
    if x == 3:
        print("foo")
    else:
        print("bar")

In neither case are the colons actually needed (the Ruby language is pretty similar but has neither, for instance), but I find the Python style to be a bit more readable.

Implicit SEQs

Current occam requires you to insert SEQ or PAR whenever you have multiple processes:

WHILE condition
  SEQ
    do.one.thing ()
    do.another.thing ()
    IF
      other.condition
        SEQ
          do.third.thing ()
          do.fourth.thing ()

A quick look at the occamnet code shows that I use about three times as many SEQs as I do PARs. It'd be possible to make occam code rather more compact by assuming that any set of multiple processes is wrapped in a SEQ unless it's already wrapped in a PAR. I suspect this'd be a controversial change, because it could result in programmers thinking less about opportunities to parallelise their code; we'd have to decide whether the increased readability makes it worthwhile.

WHILE condition
  do.one.thing ()
  do.another.thing ()
  IF
    other.condition
      do.third.thing ()
      do.fourth.thing ()

You'd still need to have SEQ in the language, so that you can do replicated SEQs or force an extra internal scope. It may also be necessary to come up with a different syntax for extended inputs, which need to have two code blocks specified (without an implicit SEQ).
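
To show that the implicit form can be turned back into today's occam mechanically, here's a minimal sketch of the rewriting pass. It assumes the program has already been parsed into a tree of (text, children) pairs, and that any node whose first word isn't one of a handful of structured constructs takes a plain list of processes as its body. Both the tree shape and the STRUCTURED set are assumptions made for this example, not part of any real compiler.

```python
# Constructs whose children are already grouped (or aren't plain processes).
# This set is an assumption made for the sketch, not a complete list.
STRUCTURED = {"SEQ", "PAR", "IF", "ALT", "CASE"}

def add_implicit_seq(node):
    """Return a copy of node with an explicit SEQ round each implicit one.

    A node is (text, children); children is a list of nodes.
    """
    text, children = node
    children = [add_implicit_seq(child) for child in children]
    words = text.split()
    keyword = words[0] if words else ""
    if len(children) > 1 and keyword not in STRUCTURED:
        # several processes with no explicit constructor: wrap them in SEQ
        children = [("SEQ", children)]
    return (text, children)

def render(node, depth=0):
    """Format a node tree as a list of two-space-indented lines."""
    text, children = node
    lines = ["  " * depth + text]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines
```

Running render(add_implicit_seq(tree)) on a tree built from the compact WHILE example gives back the explicit version shown earlier in this section. Extended inputs would need special handling, since their two blocks mustn't be merged into one SEQ.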

Better syntax for INITIAL

occam-pi lets you initialise a variable when you declare it. However, initialising a variable currently looks more like a VAL declaration:

INT foo:
INITIAL INT bar IS 4:
VAL INT baz IS 5:

I'd prefer to have a syntax that looks more like a variable declaration, such as one of these:

INT foo IS 4:
INT foo := 4:

Underscores in variable names

occam, unlike pretty much every other programming language, allows dots in variable names.

INT my.string:

Many languages use underscores instead for the same purpose.

int my_string

Field access

occam uses array-like syntax to refer to fields in structures.

packet[ip.id] := 3

C-derived languages use dots.

packet.ip_id := 3

C-style assignment and equality operators

occam uses the same scheme as many 70s-era languages for setting and comparing variables.

num := 3
IF
  num = 3
    print ("occam works!")
  num <> 3
    print ("occam doesn*'t work!")
  TRUE
    print ("occam really doesn*'t work!")

Modern languages tend to use a set of operators derived from C instead:

num = 3
if num == 3:
    print("occam works!")
elif num != 3:
    print("occam doesn't work!")
else:
    print("occam really doesn't work!")
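
The mapping between the two sets of operators is mechanical, which a few lines of code can illustrate. This is only a sketch: the function name c_style_operators is hypothetical, it works on one line of text at a time, and it doesn't know about string literals, so an = inside a string would be rewritten too. A real translator would do this on tokens rather than raw text.

```python
import re

# occam operator -> C-style operator. "<=" and ">=" map to themselves;
# they are listed so that the "=" inside them is not doubled.
OPERATORS = {":=": "=", "=": "==", "<>": "!=", "<=": "<=", ">=": ">="}

# Two-character operators come first so they win over a bare "=".
PATTERN = re.compile(r":=|<>|<=|>=|=")

def c_style_operators(line):
    """Rewrite occam assignment and comparison operators in one line."""
    return PATTERN.sub(lambda match: OPERATORS[match.group(0)], line)
```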

C-style string escapes

occam (following Modula?) uses * to escape characters in string constants (and complains if you don't escape ' characters).

print ("Hello, world!*c*n")

Modern languages follow the C conventions.

print("Hello, world!\r\n")

It's also unusual to use \r\n as a line-ending sequence on the systems that occam's used on these days; I'd rather just be able to write \n. (As with C, you could have \n translated to the appropriate line-ending sequence on platforms that use \r or \r\n.)
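
Converting existing string constants would be straightforward. The sketch below rewrites the body of an occam string using C-style escapes; the function name star_to_backslash is made up, and the escape table lists only the common occam escapes as I understand them, so treat it as an assumption rather than a complete list. Anything it doesn't recognise (hex escapes, for instance) raises an error instead of being guessed at.

```python
# Character after '*' -> C-style replacement.
# Covers the common occam escapes; an assumption, not a complete table.
STAR_ESCAPES = {
    "c": "\\r",   # carriage return
    "n": "\\n",   # newline
    "t": "\\t",   # tab
    "s": " ",     # space
    "'": "'",     # single quote needs no escape in a C-style string
    '"': '\\"',   # double quote
    "*": "*",     # literal asterisk
}

def star_to_backslash(body):
    """Rewrite the body of an occam string literal using C-style escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            out.append("\\\\")   # backslash is ordinary in occam, so escape it
        elif ch == "*":
            i += 1               # look at the character after the '*'
            if i >= len(body) or body[i] not in STAR_ESCAPES:
                raise ValueError(f"unsupported escape at index {i - 1}")
            out.append(STAR_ESCAPES[body[i]])
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

This keeps the *c that the original string contained; dropping it in favour of a bare \n would be a separate, deliberate step.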