
Multiple getopts

dacav - 2019-8-27
Tags: [ shell ] [ Today I learned ] [ wtf ]

An interesting pitfall I encountered while shell scripting.

Let's consider the case of two functions, one invoking the other, both using the shell builtin getopts to extract command-line options:

fun() {
    local opt

    while getopts "x:y:z:" opt; do
        printf "fun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
    done
}

morefun() {
    local x y opt

    while getopts "x:y:" opt; do
        printf "morefun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
        case "$opt" in
        x) x="$OPTARG";;
        y) y="$OPTARG";;
        esac
    done

    printf "morefun: [%s, %s]\n" "$x" "$y" >&2

    fun -x "$x" -y "$y" -z 5
}

What would you expect from the following invocation?

morefun -x one -y two

You are running it in dash.

If you are using dash (the /bin/sh interpreter on Debian-like systems), you will see reasonable behaviour:

morefun [x] -> [one]
morefun [y] -> [two]
morefun: [one, two]
fun [x] -> [one]
fun [y] -> [two]
fun [z] -> [5]

You are running it in bash.

If you run it in bash (the /bin/sh interpreter on Red Hat, CentOS, Fedora and friends), you will be surprised:

morefun [x] -> [one]
morefun [y] -> [two]
morefun: [one, two]
fun [z] -> [5]

That's funny, right?

What happens there?

I had a discussion with a colleague who had actually seen this before. It turned out to be yet another of those shell quirks that we all love. It almost makes sense once you remember that getopts keeps its state in shell variables, most notably OPTIND, the index of the next argument to be processed.
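A quick way to see that state is to parse some arguments at the top level of a script and print OPTIND afterwards. This is a minimal sketch: the argument list is faked with `set --` purely for illustration. POSIX specifies that once the options are exhausted, OPTIND points at the first operand (or at `$# + 1` when there are none), so both dash and bash should report 5 here.

```shell
#!/bin/sh
# Pretend the script was invoked as: script.sh -x one -y two (illustrative only)
set -- -x one -y two

while getopts "x:y:" opt; do
    printf "[%s] -> [%s]\n" "$opt" "$OPTARG"
done

# getopts leaves its position behind in the shell variable OPTIND:
# four arguments were consumed, so it now points past all of them.
printf "OPTIND is now %s\n" "$OPTIND"
```

Any later getopts loop in the same shell starts from that leftover value unless something resets it.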

Of course, this problem appears only when getopts is used multiple times within the same shell process, which in practice means inside shell functions. If you use getopts only once, to fetch the command-line options of the whole script, you are on the safe side.
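For reference, the safe once-per-script case usually looks like the sketch below: getopts runs a single time over the script's own arguments, then `shift $((OPTIND - 1))` drops the parsed options so that only the operands remain in `"$@"`. The option letters, the usage message and the simulated invocation are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical invocation: script.sh -x one -y two file.txt
set -- -x one -y two file.txt

x= y=
while getopts "x:y:" opt; do
    case "$opt" in
    x) x="$OPTARG" ;;
    y) y="$OPTARG" ;;
    *) echo "usage: $0 [-x arg] [-y arg] [file...]" >&2; exit 2 ;;
    esac
done
# Discard the options getopts has consumed; only operands are left.
shift $((OPTIND - 1))

printf "x=%s y=%s operands=%s\n" "$x" "$y" "$*"
```

Since getopts is never called again, the value left in OPTIND does no harm.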

The interesting bit is that dash and bash behave differently, and dash does the least surprising thing, in my opinion.

I often use dash to interpret my scripts, even under Fedora. The reason is that dash is stricter when it comes to POSIX compliance (not to mention faster; oh, and the man page is much clearer). I hit the problem the first time I ran the script under Fedora's /bin/sh, which is bash. The same would have happened to a script written on Debian and then executed on any Red-Hat-like system.

Which of the two implementations is doing it right? Let's see what POSIX has to say:

If the application sets OPTIND to the value 1, a new set of parameters can be used: either the current positional parameters or new arg values. Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results.

Of course. What is more neutral than unspecified results?

In conclusion, OPTIND is to blame: it is the variable holding the global getopts state. The right pattern is to reset OPTIND to 1 before each getopts loop.

OPTIND=1    # reset getopts state before parsing a new argument list
while getopts "x:y:z:" opt; do
    # ...
done
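Applied to the example from the beginning of this post, the fix could look like the sketch below. Both functions reset OPTIND before their loop: the reset in fun is what lets it see all of its options, while the one in morefun makes repeated calls to morefun safe as well. With this change, dash and bash should both print the same six lines that dash printed for the original code. As in the original functions, `local` is not POSIX, but both shells accept it.

```shell
#!/bin/sh
fun() {
    local opt

    OPTIND=1    # start parsing fun's own arguments from the first one
    while getopts "x:y:z:" opt; do
        printf "fun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
    done
}

morefun() {
    local x y opt

    OPTIND=1    # do not inherit whatever a previous getopts loop left behind
    while getopts "x:y:" opt; do
        printf "morefun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
        case "$opt" in
        x) x="$OPTARG";;
        y) y="$OPTARG";;
        esac
    done

    printf "morefun: [%s, %s]\n" "$x" "$y" >&2

    fun -x "$x" -y "$y" -z 5
}

morefun -x one -y two
```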