One of the buzzwords of the compiler community is automatic parallelisation -- taking sequential code and making it parallel to get better performance. However, this presupposes that sequential code is the easier kind to write. Instead, let's pretend we've got a language in which everything can be trivially written in a parallel fashion -- this means that to get reasonable performance, we now have to automatically decide which things can safely be done sequentially. This may well be an easier problem to solve.
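Since we don't have that hypothetical language to hand, here's a minimal sketch of the idea in Go: two processes written as a parallel composition that share no channels and no state, which is exactly the situation where a compiler would be free to run them one after the other instead. The functions and names are invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumTo stands in for an arbitrary independent chunk of work.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

func main() {
	var wg sync.WaitGroup
	var a, b int
	wg.Add(2)
	// Written as a parallel composition...
	go func() { defer wg.Done(); a = sumTo(1000) }()
	go func() { defer wg.Done(); b = sumTo(2000) }()
	wg.Wait()
	// ...but the two goroutines never communicate, so a compiler could
	// legally sequentialise this to: a = sumTo(1000); b = sumTo(2000)
	// and skip the scheduling overhead entirely.
	fmt.Println(a, b)
}
```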
A related problem is automatic buffering: relaxing CSP's rendezvous semantics in cases where we know it'll make no observable difference, in order to obtain better performance. A good example is a process which just generates a stream of data from its initial state; there is no reason why this can't be run on its own for many cycles and the output buffered, since it does not otherwise interact with the rest of the system. A more complex example is an audio processing system with a number of filter processes; if we have an upper bound for the total processing latency of the system, we could automatically compute buffering parameters to reduce context-switch overhead. This gets more complex if the process graph contains cycles. It might be possible to tag channels as "deterministic" or "non-deterministic" to make the decision easier.
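To make this concrete, here's a minimal sketch in Go, whose channels happen to offer both behaviours: an unbuffered channel gives the strict CSP rendezvous on every communication, while a buffered one is the relaxed version a compiler could introduce automatically. The generator, the buffer size of 64, and all the names here are illustrative assumptions rather than anything from a real design.

```go
package main

import "fmt"

// counter emits an infinite stream determined purely by its initial
// state; it never reads from the rest of the system, so it is exactly
// the kind of deterministic generator that can safely be buffered.
func counter(start int, out chan<- int) {
	for i := start; ; i++ {
		out <- i
	}
}

func main() {
	// Strict CSP semantics would be make(chan int): a rendezvous (and
	// likely a context switch) on every single item. The relaxed version
	// lets the generator run ahead by up to 64 items; 64 is an arbitrary
	// choice, standing in for a buffering parameter the compiler would
	// compute.
	out := make(chan int, 64)
	go counter(0, out)
	for i := 0; i < 5; i++ {
		fmt.Println(<-out)
	}
}
```

Because the generator is deterministic and the consumer can't influence it, no program could tell the difference between the two buffer sizes; only the scheduling, and hence the performance, changes.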
A final optimisation would be to detect deterministic stream generator processes, run them at compile time, and just include their output as an array (provided it's not too long).
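Continuing the Go sketch, and assuming the compiler has worked out that only a bounded prefix of the stream is ever consumed (the bound of five is invented here), the counter process from the earlier sketch disappears entirely:

```go
package main

import "fmt"

// counterOutput is what the compiler might emit instead of running the
// counter process at all: its first few outputs, computed at compile
// time and baked into the binary as an array.
var counterOutput = [...]int{0, 1, 2, 3, 4}

func main() {
	for _, v := range counterOutput {
		fmt.Println(v)
	}
}
```

No process, no channel, no scheduling -- just a table lookup, which is also why the "not too long" caveat matters: the array trades code size for the cost of running the generator.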