Interesting pitfall encountered while shell-scripting.
Let's consider the case of two functions, one invoking the other, both
using the shell builtin getopts to extract command line parameters:
fun() {
    local opt
    while getopts "x:y:z:" opt; do
        # Quote the expansions so that empty or multi-word values stay one argument each.
        printf "fun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
    done
}
morefun() {
    local x y opt
    while getopts "x:y:" opt; do
        printf "morefun [%s] -> [%s]\n" "$opt" "$OPTARG" >&2
        case "$opt" in
            x) x="$OPTARG";;
            y) y="$OPTARG";;
        esac
    done
    printf "morefun: [%s, %s]\n" "$x" "$y" >&2
    fun -x "$x" -y "$y" -z 5
}
What would you expect from the following invocation?
morefun -x one -y two
You are running it in dash.
If you are using dash (the usual /bin/sh on Debian-like systems) you will see reasonable behaviour:
morefun [x] -> [one]
morefun [y] -> [two]
morefun: [one, two]
fun [x] -> [one]
fun [y] -> [two]
fun [z] -> [5]
You are running it in bash.
If you are running it in bash (used as the /bin/sh interpreter by Red Hat / CentOS / Fedora and friends) you will be surprised:
morefun [x] -> [one]
morefun [y] -> [two]
morefun: [one, two]
fun [z] -> [5]
That's funny, right?
What happens there?
I had a discussion with a colleague who had actually seen this before. It
turned out to be yet another of those shell quirks that we all love. It
almost makes sense, if one keeps in mind that getopts keeps its parsing
state in shell variables.
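To make that state visible, here is a small sketch. The function name optind_after_parse is made up for illustration and is not part of the original script. It sets OPTIND to 1 only so that the demonstration is repeatable, parses two options, and prints the value that getopts leaves behind.

```sh
# Hypothetical helper, used only to inspect the state getopts leaves behind.
optind_after_parse() {
    local opt
    OPTIND=1                              # known starting point for the demo
    while getopts "x:y:" opt; do :; done  # consume -x ARG and -y ARG
    printf '%s\n' "$OPTIND"               # index of the next argument to parse
}

optind_after_parse -x one -y two          # prints 5
```

In bash that value is shared by every function in the same shell process. When fun is then called with six arguments, its getopts loop resumes at argument number 5, which is -z. That is consistent with the single "fun [z] -> [5]" line in the bash output above.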
Of course this problem appears only when using getopts multiple times in
the context of the same shell process, that is, basically when using shell
functions. If you are using getopts to fetch the command line options for
the whole script (and never again) you are on the safe side.
The interesting bit is that dash and bash behave differently, and dash
does the least surprising thing, in my opinion.
I often use dash to interpret my scripts, even under Fedora. The reason
is that dash is stricter when it comes to POSIX compliance (not to mention
faster; oh, and the man page is much clearer).
I hit the problem the first time I ran the script under /bin/sh (which is
bash). But the same would have happened if the script had been written
under Debian and executed on any Red Hat-like system.
Which of the two implementations is doing it right? Let's see what POSIX has to say:
If the application sets OPTIND to the value 1, a new set of parameters can be used: either the current positional parameters or new arg values. Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results.
Of course. What is more neutral than unspecified results?
In conclusion, OPTIND is to blame. That is the variable holding the
global state. The right pattern is to reset OPTIND before each while loop.
OPTIND=1
while getopts "x:y:z:" opt; do
    # ...
done
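Applied to the two functions from the beginning of this post, the reset could look like the sketch below. It is a trimmed-down variant, not the original script: the diagnostic output of morefun is dropped and fun prints to stdout instead of stderr, so that the result is easy to capture.

```sh
fun() {
    local opt
    OPTIND=1                    # start parsing fun's own arguments from the first one
    while getopts "x:y:z:" opt; do
        printf 'fun [%s] -> [%s]\n' "$opt" "$OPTARG"
    done
}

morefun() {
    local x y opt
    OPTIND=1                    # same reset, in case an earlier getopts call moved it
    while getopts "x:y:" opt; do
        case "$opt" in
            x) x="$OPTARG" ;;
            y) y="$OPTARG" ;;
        esac
    done
    fun -x "$x" -y "$y" -z 5    # called after morefun's own loop has finished
}

morefun -x one -y two
```

With the reset in place, fun reports all three options under both bash and dash. Note that fun is only called once morefun's loop is done, so resetting OPTIND inside fun does not disturb a loop that is still running in the caller.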