The Hacker's Corner, #0: Shell Programming By The Block

The importance of parentheses and braces as grouping symbols
by Greywolf

Greetings and welcome to the inaugural article from the Hacker's Corner. Over the years, we all pick up a few tricks, traps, and tips that would probably be of some use to somebody else.

Remember this is all advice, and you can take it for what it's worth.

[Bourne] Shell Programming - By the Block

Anyone who has had occasion to wander through a shell script of any substance will have found that, as the script progresses, the logical blocks (the while/for/until-do-done, the if-then-elif-then-else-fi, and the case-esac constructs) all become harder to match up, sometimes due to inconsistent or unfamiliar indenting styles, sometimes because one is unsure of precisely which fi REALLY matches which if.

This gets even trickier if one is writing self-generating shell scripts, but that's a topic for another article.

In the Bourne shell, there are a couple of operators which can help divide the script into neatly delineated programming "blocks" of code, all of which will be easily navigable by any editor which has a paren-matching function.

- Subshells

Parentheses ( <-- you know, these things --> ) in most shells indicate that everything within is to be done in a subshell; that is, the shell will fork, or create an exact copy of itself and run the commands in the context of the copy, after which the copy exits. The end result is that the main shell is unaffected. This is useful if you want to temporarily change directories or modify shell variables but don't want to have to remember their old values and reset them.
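To see that isolation for yourself, here is a minimal sketch. The variable names are made up for illustration, and it assumes /tmp exists on your system:

```shell
# Hypothetical names; only the parenthesized grouping is the point.
color=red
startdir=`pwd`

(
    cd /tmp        # changes directory in the subshell only
    color=blue     # changes the subshell's copy only
)

# Back in the main shell, neither change is visible.
echo "color is still ${color}"
[ "`pwd`" = "${startdir}" ] && echo "still in the starting directory"
```

Nothing was saved and nothing was restored; the copy simply went away and took its changes with it.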

It turns out that these parentheses can be used as blocking symbols. If you're going to be running commands which will ultimately not affect the outcome of the shell, either by happenstance or design, rather than writing:

if [ -f ${somefile} ]; then
    OLDWD=`pwd`;
    OLDPATH=${PATH};
    PATH=/usr/sbin:/usr/bin:/usr/local/bin:${PROG_HOME}/bin;
    cd ${somedir};
    ${somecmd} -r -x ${datafile} -v > /tmp/cmd.out 2> /tmp/cmd.errs;
    cd ${OLDWD};
    PATH=${OLDPATH};
fi;

You may instead write:

if [ -f ${somefile} ]; then (
    PATH=/usr/sbin:/usr/bin:/usr/local/bin:${PROG_HOME}/bin;
    cd ${somedir};
    ${somecmd} -r -x ${datafile} -v > /tmp/cmd.out 2> /tmp/cmd.errs;
) fi;

And, unless your system is REALLY tight for process overhead, you've just saved yourself some lines of code by letting the subshell do all the work for you.
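The flip side, as a hedged sketch: since nothing set inside the parentheses survives, anything you DO want back has to come out through output, a file, or the exit status. The variable names and the /no/such/dir path below are made up for illustration:

```shell
# An assignment made inside the subshell is lost when it exits.
inner=unset
( cd / && inner=`pwd` )
echo "inner=${inner}"

# Command substitution runs a subshell too, but captures its output.
where=`cd / && pwd`
echo "where=${where}"

# The exit status of the subshell is visible to the main shell.
if ( cd /no/such/dir 2>/dev/null ); then
    reached=yes
else
    reached=no
fi
echo "reached=${reached}"
```

This is why the example above sends its output to files under /tmp: the files outlive the subshell even though its variables do not.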

- Shell Blocks

Braces { <-- you know, these things --> } in Bourne-like shells indicate that the enclosed code is to be done in a block. This is similar to the parentheses, with two differences:

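  1. The code in question is run in the context of the current shell; i.e. if you type
    	( exit; )

    the shell will not exit, whereas if you type
    	{ exit; }

    the shell will exit; and

  2. Parentheses are not reserved words, as "for", "while", "do", and braces are; they are recognized as operators wherever they appear, while a brace is only special where the shell expects the first word of a command.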
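The first difference is easy to try without losing your login shell. This is a minimal sketch that assumes an sh on your PATH and runs each case in a throwaway child shell; the exit code 3 and the variable names are arbitrary choices for illustration:

```shell
# exit inside parentheses ends only the subshell; the echo still runs.
paren_out=`sh -c '( exit 3; ); echo "survived, status $?"'`
echo "parentheses: ${paren_out}"

# exit inside braces ends the child shell itself; the echo is never reached.
brace_status=0
brace_out=`sh -c '{ exit 3; }; echo "survived, status $?"'` || brace_status=$?
echo "braces: ${brace_out:-nothing printed}, child exited with ${brace_status}"
```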

Programmatically, braces are not necessary, but they are very helpful. You can achieve a C-like style through judicious use of braces. For example:

if ${checkcmd} -s 
    then ${runcmd} -r -x ${datafile};
        ...
        for file in ${flist}
        do ...
            while ${condition}
            do ... 
                ${updcondition};
            done;
        done;
    fi;

If you have a few if/then/elifs sprinkled in there, with some interspersed while/for loops, it can get rather difficult to match the blocks.

Now consider this:

    if ${checkcmd} -s; then {
        ${runcmd} -r -x ${datafile};
        ...
        for file in ${flist}; do {
            ...
            while ${condition}; do {
                ...
                ${updcondition};
            } done;
        } done;
    } fi;

(The practice of putting the 'then' and 'do' on the same line, along with the trailing semicolons, is a matter of personal preference.)
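For something you can actually run, here is a hedged sketch of the same shape with the placeholders filled in. The list, the counters, and the loop bound are all made-up stand-ins, not anything from a real script:

```shell
# Hypothetical stand-ins for ${flist}, ${condition} and ${updcondition}.
flist="a b"
total=0

if [ -n "${flist}" ]; then {
    for file in ${flist}; do {
        n=0
        while [ ${n} -lt 2 ]; do {
            total=`expr ${total} + 1`
            n=`expr ${n} + 1`
        } done
    } done
} fi

echo "total=${total}"
```

Because braces run in the current shell, total keeps its value after the block; the same code wrapped in parentheses would leave it at 0.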

Not only is it much more obvious that there's a block of code here, but you can use your editor's paren-matching function (the percent key in command mode in vi, when the cursor is atop a grouping symbol, will find its match -- try it!) to locate precisely which "if" matches the "fi" you just typed.

This may not seem like much while writing the script, but if you have to debug a script which keeps saying

$script: 'fi' unexpected

or some such, you'll wish the script had braces to match against.

As mentioned above, the brace is a reserved word of sorts. The opening brace may immediately follow a "then" or "do", or it may appear on a line by itself; either way it must be followed by a space or a newline. The closing brace MUST be the first word of a command, that is, the first word on a line or the first word after a semicolon, and there must be a space between it and the next word.
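A quick way to convince yourself of the closing-brace rule, sketched under the assumption that your sh is a Bourne-family shell such as dash or bash (some other shells are more forgiving here); the variable names are made up:

```shell
# Well formed: a space after "{" and a semicolon before "}".
if sh -c '{ echo one; echo two; }' >/dev/null 2>&1
then good=accepted
else good=rejected
fi

# No semicolon: "}" is just another argument to echo, so the block
# never closes and the shell complains about the syntax.
if sh -c '{ echo one; echo two }' >/dev/null 2>&1
then bad=accepted
else bad=rejected
fi

echo "with semicolon: ${good}; without: ${bad}"
```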

The upshot of all this is that if you use parentheses and braces as code group operators, it will be much easier to go traipsing around a script and know exactly what matches up, saving yourself a bit of trouble in the long run.

There is one caveat: You cannot use braces or parens to block a case statement; i.e.

    case ${foo} in {
    foo)
        ${do_something};
        ;;
    bar)
        ${do_something_else};
        ;;
    } esac;

will generate a syntax error as soon as the shell reads it; you must write

    case ${foo} in
    foo)
        ${do_something};
        ;;
    bar)
        ${do_something_else};
        ;;
    esac;

However, you will also notice that paren-matching breaks inside case statements because of the use of the closing parenthesis as a case label. The solution to this is to comment an opening parenthesis somewhere before the one used as the case label. The above turns into:

    case ${foo} in #(
    foo)
        ${do_something};
        ;; #(
    bar)
        ${do_something_else};
        ;;
    esac;

The #( is a comment (remember, comments start with pound signs/sharps/number signs/octothorpes/whatever you want to call this thing -> # <-, and they end at the end of the line), and thus has no effect. A # is NOT a comment when it is quoted or when it is in the middle of a word, including when it is immediately preceded by a $, at which point it represents the number of parameters passed into the main program (or a function), but we digress...
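For the curious, the digression in runnable form. This is only a sketch; the parameter values and variable names are invented for the occasion:

```shell
set -- alpha beta gamma     # three made-up positional parameters

echo start # everything from the # to the end of the line is ignored
midword=one#two             # mid-word: the # is part of the value
quoted="one # two"          # quoted: the # is ordinary text
count=$#                    # after a $: the number of parameters

echo "${midword}"
echo "${quoted}"
echo "${count} parameters"
```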

To cast a long story into a short, by adding the #(, we preserve the balance of parentheses and allow matching to work.
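Here is a small runnable sketch of the trick. The function name, labels, and messages are hypothetical; the only thing being shown is that the #( comments change nothing about how the case behaves:

```shell
# Hypothetical function; each ")" label has a commented "(" before it.
classify() {
    case $1 in #(
    foo)
        echo "got foo"
        ;; #(
    bar)
        echo "got bar"
        ;; #(
    *)
        echo "got something else"
        ;;
    esac
}

classify foo
classify bar
classify baz
```

Every closing parenthesis used as a label now has an opening partner on an earlier line, so the percent key in vi has something to land on.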

Next time: Stylistic tips for Shellers and Makers

Wanna submit something? For now, use your handy-dandy emailer, and send mail!