  1. Purpose of this document
  2. About my coding style
  3. Formatting
  4. Coding
  5. Architecture


Purpose of this document:

There are plenty of coding-style manifestoes out there. This isn't intended to be yet another one of those. I see these coding suggestions as serving several purposes:

  1. As an aid to other developers who may need to read my code. My coding style differs significantly from nearly everyone else's, so an explanation of how and why I code the way that I do may make my code easier for others to read.
  2. As suggestions that may be helpful to others. I'm not claiming that my style is the One True Way. However, I've been writing computer programs for over twenty years now, and my coding style has been evolving over that entire time. Unlike many of the other style-guides out there, these suggestions are deeply rooted in practice, not just theory: I've experimented with many different coding conventions, keeping those that have made my life easier, and abandoning many that, while elegant in theory, proved unhelpful in the actual business of writing correct and maintainable code.
  3. To discourage bad coding practices. While there is no one Right Way to code, there are certainly a few Wrong Ways -- many of them depressingly common, even in professional code. In what follows, you will find plenty of rants against what I consider bad coding practices.

This document is not meant to be exhaustive. There are quite a few issues on which I remain entirely neutral. (For example, "Hungarian notation": some people love it, some people hate it. I personally don't have particularly strong feelings either way.) Instead, I focus only on those aspects of coding where I believe my coding style offers an advantage over "general practice", or where I personally have seen a lot of bad coding practices that I would like to see corrected.


About my coding style:

First, one of my goals has been to develop a coding style that is language-neutral. Nearly all published style guides that I've seen have been geared towards one specific language -- usually Java or Perl or C++ -- and contain many highly specific recommendations that aren't applicable to other languages.

One problem with this approach is that it often defeats one of the primary goals of using a style-guide in the first place, which is to keep coding styles consistent. Large projects often encompass many different programming languages, and therefore a set of guidelines which can be applied to all languages -- and thus to all of the code in a project -- is highly desirable. Of course some adjustments will need to be made to adapt from one language to another, but these adjustments should be as minor and obvious as possible.



Formatting:

My formatting style is based on "K&R style". In any construct in which an opening statement is followed by a code block, the opening curly-brace of the block is placed on the same line as the opening statement, statements inside the code block are indented one tab, and the closing curly-brace is placed on a new line, aligned with the opening statement:

    //  this is how I would do it:
    char * jump() {
        return "how high?";
    }

Many C/C++ coders prefer to keep the curly braces lined up, like so:

    //  this is okay too:
    char * jump()
    {
        return "how high?";
    }

This is fine too, though I don't personally find any great value in keeping the braces aligned, and it does take up considerable amounts of vertical screen space. One thing that's not fine (in my opinion) is when developers use multiple levels of indentation for a single code block, like this:

    //  this is BAD:
    char * jump()
        {
            return "how high?";
        }

This is confusing and hard to read. Don't do it.

Now here's where I think my coding style improves on K&R style: I've generalized the formatting for code blocks to all multi-line constructs. Here's the rule, more-or-less:

In any syntactical element that spans more than one line, the closing punctuation for that element should be placed on a separate line, aligned with the opening of that element, and intervening sub-elements should be placed on separate lines and indented one tab. This formatting should be applied recursively to any multi-line sub-elements.

This is only a heuristic explanation, not a strict formal definition. A few examples should make it clear:

    #    Perl example with formatting applied recursively
    #    to sub-elements:

    my %config = (
        name    => 'example',
        servers => [
            'alpha',
            'beta',
        ],
        options => {
            retries => 3,
            timeout => 30,
        },
    );

In some cases it makes sense to treat what are technically sub-elements as separate elements, and the closing of one of these elements as the opening of the next. The most common examples are function definitions and if-then-else statements:

    void jump(
        float           height,
        int             x_distance,
        long            y_distance,
        const char *    message,
        int *           return_code
    ) {
        *return_code = k_not_yet_implemented;
    }

    if ( foo ) {
        do_something();
    } else if ( bar ) {
        do_something_else();
    } else {
        do_yet_another_thing();
    }

There are several reasons why I like this formatting convention. First is simply consistency. I know of no developers who would format an if-then statement like this:

    if (foo) { do_something(); do_something_else();
                                do_yet_another_thing(); }

Yet many developers will write a long function call like so:

    some_function_with_lots_of_arguments( argument_1, argument_2,
                                                        argument_3 );

This is just as ugly and illegible as the if-then example above, so why do it?

Another reason I like this convention is that it supports another guideline of mine:

The most important parts of the code should be placed as far to the left -- i.e. as close to the beginning of a line -- as possible.

Unless you're coding in Arabic, code is read from left to right. Constructs placed to the left, at the beginning of a line, will stand out. Constructs placed to the right, at the end of a line, are easily overlooked. This is another reason why formatting a function call like this

    some_function_with_lots_of_arguments( argument_1, argument_2,
                                                        argument_3 );

is bad. Important information is being hidden away at the right edge of the screen where it's hard to see.

Following this guideline, whenever a complex arithmetic and/or logical expression is broken up into multiple lines, each connecting operator will be placed at the beginning of a line, not at the end:

    sum_of_products = (
        (value_1 * multiplier_1)
        + (value_2 * multiplier_2)
        + (value_3 * multiplier_3)
    );

    is_human = (
        ( number_of_legs == 2 )
        && !has_feathers
    );

If you think this looks "weird", it might help to consider how you learned to write arithmetic expressions back in grade school:

      1234
    +  678
    ------
      1912

This guideline is especially helpful for taming the useful but unwieldy ternary operator:

    value = (
        some_condition()
        ? do_foo()
        : do_bar()
    );

Expressions using the ternary operator are often difficult to read. If you're consistent in formatting them this way, their semantics become much more immediately obvious.

This brings me to my next guideline:

Splitting expressions into multiple lines is GOOD.

You don't have to wait until the code hits the 80-character mark, or until it becomes utterly illegible, before adding a carriage return. If splitting an expression up into multiple lines will make it even a little easier to read, then you should do so.

The observant will have noticed that many of the expressions above contain "unnecessary" parentheses. This brings me to my next guideline:

Extra parentheses are GOOD.

It's better to have too many parentheses than not enough. When in doubt, use extra parentheses. When not in doubt, use extra parentheses. In my opinion, it's scarcely possible to use too many parentheses.

For one thing, real programmers have better things to do with their brain cells than memorize operator-precedence rules. This is especially true for programmers who work in many different languages. A developer shouldn't need to know whether bitwise-or binds more or less tightly than integer addition, or whether it's right- or left-associative, in order to understand your code.

I often hear developers say "As long as the expression does what it looks like it ought to do, then there's no reason to use extra parentheses." Such a developer might give as an example the following Perl expression:

    $value = defined $value ? $value : '';

It might seem silly to parenthesize this expression, like so:

    $value = ( defined($value) ? $value : '' );

because really, this is the only order of operations that would make any sense.

But this argument against extra parentheses misses one important point: someone else looking over your code has no a priori assurances that your code is correct. They may very well be scanning your code for bugs. Even if your code is flawless, the person reading your code may have spent the past week wading through thousands of lines of buggy spaghetti-code written by others, and so may be distrustful of any even slightly ambiguous construct.

Sure, it's obvious what you intended when you wrote:

    $value = defined $value ? $value : '';

But how do I know, without looking up Perl's operator-precedence rules, that this line is actually doing what you intended it to do? Maybe I've found a bug in your code. Maybe Perl is actually parsing this line as:

    $value = defined( $value ? $value : '' );

or even as:

    ( $value = defined( $value ) ) ? $value : '';

If you had added those "unnecessary" parentheses to begin with, then that would be one less thing that I would have to worry about.

Another argument I hear against using extra parentheses is that they make an expression cluttered and hard to read. This is only true if you insist on cramming everything onto one line. But if you follow my other guidelines, splitting expressions into multiple lines and indenting sub-expressions, then the extra parentheses can make even huge, messy algebraic expressions relatively easy to read.

    x = sqrt( b ^^ 2 - 4 * a * c ) / 2 * a;

    x = (
        sqrt(
            ( b ^^ 2 )
            - ( 4 * a * c )
        )
        / 2
        * a
    );

Yes, the first version is prettier, but the second version is clearer.



Coding:

Know the difference between Java and a "Swiss Army Chainsaw".

There are, essentially, two different kinds of programming languages.

On the one hand, there are languages like Java or Pascal. These languages were designed with very specific coding philosophies: there is a Right Way and a Wrong Way to write code in Java or Pascal. The syntax and feature-set of these languages were designed to make it easy to write code the Right Way and difficult or impossible to write code the Wrong Way. The language is designed to encourage good coding practices and to discourage bad coding practices.

On the other hand, there are what might be called "Swiss Army Chainsaw" languages like C++ and Perl. The philosophy behind C++ or Perl is that the language should provide as many features and different ways of doing things as possible. Some of the features provided are probably not a very good way of doing things, ever, but they're there anyway, just in case. It's up to you, the programmer, to decide what are the Right Ways and Wrong Ways to program in a Swiss Army Chainsaw language. The language isn't going to make those decisions for you, because it isn't the job of the language to teach you how to code.

Unfortunately, many programmers really ought to be working in a language that does teach them how to code. A Swiss Army Chainsaw Language empowers programmers to write code in any way that they want. In far too many cases, that means empowering programmers to write code very, very badly.

A large part of this is because many programmers don't understand the design philosophy behind the Swiss Army Chainsaw languages. Most programmers seem to assume that every single language feature is there for a reason. It is, but the reason is often that it just might possibly be legitimately useful to someone, somewhere, sometime -- not that it's usually a Good Idea. The annotations to many, perhaps most, features of C++ and Perl really ought to read: "DON'T EVER, EVER USE THIS FEATURE."

But many programmers seem to think that it's somehow "cool" to use as many language features as possible. They apparently think that the more language features you use, the better programmer you are. I wouldn't be surprised to find that some C++ programmers deliberately design class hierarchies in which classes inherit from multiple subclasses of the same base class just so they can define their base classes as "virtual".

(I have a hint for C++ programmers out there: if you are ever, ever thinking about declaring a base class as "virtual" -- or in fact of using an object hierarchy in which you need to worry about whether a base class is virtual or not -- it's time to take a break. Get away from the computer. Take a nap. And while you're at it you might want to check your abdomen for parasitic wasp larvae. Come back when you're thinking straight, and are certain that your design decisions aren't being dictated by strange insect pheromones...)

Yes, a good programmer will be aware of most or all of a language's features (if for no other reason than to avoid invoking any of them unintentionally) but that does not mean that they are going to use all or even most of them.

And even when a language feature has many legitimate uses, most programmers can't seem to resist overusing and misusing it. Take, for example, operator overloading in C++. It makes sense to overload an operator if and only if the overloaded operator has a meaning which is the same as, or very similar to, the meaning that operator has for basic types. It makes sense to define operator+ for a complex number class, because addition is just as much a standard, well-defined mathematical operation for complex numbers as it is for integers. I recently coded a function in which I needed to apply a bitmask to every member of a fairly large structure, so it made sense to define operator& for that structure. It does not make sense to define bitshift operators to do I/O. (Yet this is what the STANDARD LIBRARY does! Is there any logical similarity between reading from an open socket and bitshifting right? No. The standard library overloads operator>> that way because it "looks pretty". This is stupid.)



Architecture:

Circle is not a subclass of Ellipse.

It is rather unfortunate that, in standard OOP terminology, object types are known as "classes", and that inheritance relationships are described in terms of "subclasses" and "superclasses". What many developers don't understand is that "class", "subclass", and "superclass" are technical terms whose OOP meanings are only tangentially related to their more common, logical meanings. If object type Subclass inherits from object type Superclass, this does not necessarily mean that Subclass is in all ways a logical subclass of Superclass. All it means is that objects of type Subclass are plug-compatible with objects of type Superclass.

Type Subclass is a logical subclass of "things which can do everything that things of type Superclass can do". Whether this means that type Subclass is a logical subclass of type Superclass in any other way depends on what exactly Subclass and Superclass are meant to be modeling. Object hierarchies are often useful for modeling logical class relationships, but that does not mean that OO classes and logical classes are the same thing.

Unfortunately, many developers don't understand this, and think that anything which is a logical subclass of something else can and should be modeled as a subclass in an OO class hierarchy. This leads to all sorts of absurdities. The classic example is trying to make Circle a subclass of Ellipse. Logically, circles represent a subclass of ellipses. But if you try to make an object type Circle that inherits from an object type Ellipse, it doesn't work, because circles are not plug-compatible with ellipses. An ellipse can do things a circle can't, like having differing height and width.

In this case, it would make more sense to have the object type Ellipse inherit from the object type Circle -- since ellipses are generally plug-compatible with circles -- making Ellipse a "subclass" of Circle. If this seems "backwards", it's only because of a confusion of terminology: "subclass" means something different in OOP than it does in formal logic. (Of course, this still might not be the best solution: it would probably make more sense to have Circle and Ellipse be siblings, and both inherit from a common base type such as "Shape". One might also question whether one really needs a Circle class at all, since objects of type Ellipse will function as circles when passed the correct parameters. The point here is that making Circle a subclass of Ellipse is definitely the wrong thing to do.)