As of the latest commit, ShellCheck will try to detect shadowed case branches.
Here’s an adaptation from an unnamed script on GitHub:
    case $1 in
        -h|--help)
            help
            exit 0
            ;;
        -h|--hub)
            hub=$2
            shift
            ;;
        *)
            die "Unknown option: $1"
            ;;
    esac
The original case statement was significantly longer, so you’d be excused for not noticing the problem: -h is used for two different branches. Because of this, -h as a short option for --hub will not work.
If you run ShellCheck on this example now, you will get a pair of helpful warnings:
    Line 4:
        -h|--help)
        ^-- SC2221: This pattern always overrides a later one.

    Line 8:
        -h|--hub)
        ^-- SC2222: This pattern never matches because of a previous pattern.
Very simple and probably somewhat useful in certain cases, right? Well, it gets slightly more interesting.
Here is another example adapted from the wild:
    case $1 in
        -h|--help|-?)
            usage
            exit
            ;;
        -v|--verbose)
            verbose=1
            ;;
        *)
            die "Unknown option: $1"
            ;;
    esac
Did you spot the same problem? ShellCheck did:
    Line 4:
        -h|--help|-?)
        ^-- SC2221: This pattern always overrides a later one.
Since an unescaped ? matches any character, it will also match -v, so the short form of --verbose will not work.
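To see the shadowing in action, here is a small sketch. The two parse functions and their echoed labels are made up for illustration; only the patterns come from the example. Escaping the ? makes it literal, which gives -v its branch back:

```shell
# Hypothetical helpers: each reports which branch an argument lands in.
parse_unescaped() {
    case $1 in
        -h|--help|-?) echo "usage" ;;      # unescaped ? matches any one character
        -v|--verbose) echo "verbose" ;;    # -v never gets this far
        *) echo "unknown" ;;
    esac
}

parse_escaped() {
    case $1 in
        -h|--help|-\?) echo "usage" ;;     # \? matches only a literal question mark
        -v|--verbose) echo "verbose" ;;
        *) echo "unknown" ;;
    esac
}

parse_unescaped -v    # prints: usage
parse_escaped -v      # prints: verbose
parse_escaped '-?'    # prints: usage
```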
Similarly, it recognizes two separate issues in this example:
    -*|--*)
        die "Invalid option: $1"
        ;;
    --)
        shift; break
        ;;
The end-of-option -- marker will never be recognized, and the --* in -*|--* is redundant because -* already covers it.
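One way to fix the ordering, sketched with a made-up classify function whose echoed labels stand in for the original actions: put the specific -- branch first and let a single -* catch everything else.

```shell
# Hypothetical helper that names the branch taken instead of acting on it.
classify() {
    case $1 in
        --) echo "end of options" ;;      # specific pattern goes first
        -*) echo "invalid option" ;;      # -* already covers --*
        *)  echo "operand" ;;
    esac
}

classify --         # prints: end of options
classify --bogus    # prints: invalid option
classify file.txt   # prints: operand
```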
These are all very simple cases, but this also works more generally. Here’s a fabricated music sorting script where the bug would be exceedingly hard to spot in a longer list of bands:
    case "${filename,,}" in
        *"abba"*.mp3 )
            rm "$filename"
            ;;
        *"black"*"sabbath"*.mp3 )
            mv "$filename" "Music/Metal"
            ;;
    esac
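If the bug isn’t jumping out, this sketch makes it visible. The action_for helper and the file name are invented for the demonstration, and tr stands in for ${filename,,} so that it runs in any POSIX shell:

```shell
# Hypothetical helper: prints the action the case statement would take.
action_for() {
    lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $lower in
        *"abba"*.mp3) echo "delete" ;;
        *"black"*"sabbath"*.mp3) echo "move to Music/Metal" ;;
        *) echo "leave alone" ;;
    esac
}

action_for "Black Sabbath - Paranoid.mp3"   # prints: delete
```

The word sabbath contains abba, so the first branch wins and the second can never match.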
So how does it work?
There are very clever ways of determining whether one regular language is a superset of another by intersecting it with the complement of the other, and checking the result for satisfiability.
ShellCheck uses none of them.
I’ve written a regex inverter before, and that level of complexity was not something I wanted to introduce.
Instead, ShellCheck’s pattern intersection and superset checks support only basic DOS-style wildcard patterns: ?, * and literals. It just does a simple recursive match on the two patterns.
Let’s call the patterns A and B, and we wish to check if A is a superset of B, i.e. if A matches everything that B does.
We have two arbitrary shell patterns that we want to turn into a simplified form, while ensuring we don’t simplify away any details that will cause a false positive. ShellCheck does this in two ways:
- It creates A in such a way that it’s guaranteed to match a (non-strict) subset of the actual glob. This just means giving up on any pattern that uses features we don’t explicitly recognize. $(cmd)foo@(ab|c) is rejected, while *foo* is allowed.
- It then creates B to guarantee that it matches a (non-strict) superset of the actual glob. This is done by replacing anything we don’t support with a *. $(cmd)foo@(ab|c) just becomes *foo*.
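As a rough illustration of the two directions of approximation (this is not ShellCheck’s actual code), the sketch below uses two made-up helpers that work on plain text. strict_form gives up on anything outside a small set of plain characters. loose_form only knows how to blank out the two unsupported constructs from the example; a real implementation would have to replace every construct it doesn’t recognize.

```shell
# Under-approximation for A: accept the pattern only if every character is
# a letter, digit, '_', '.', '-', or one of the wildcards ? and *.
strict_form() {
    case $1 in
        *[!A-Za-z0-9_.?*-]*) return 1 ;;    # unrecognized feature: give up
        *) printf '%s\n' "$1" ;;
    esac
}

# Over-approximation for B: replace $(...) and extglob groups such as @(...)
# with *. Assumes no nested parentheses; other constructs are not handled.
loose_form() {
    printf '%s\n' "$1" | sed -e 's/\$([^)]*)/*/g' -e 's/[@?*+!]([^)]*)/*/g'
}

strict_form '*foo*'                                 # prints: *foo*
strict_form '$(cmd)foo@(ab|c)' || echo "rejected"   # prints: rejected
loose_form '$(cmd)foo@(ab|c)'                       # prints: *foo*
```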
Now we can just match the two patterns against each other with an inefficient but simple recursive matcher. Matching two patterns is slightly trickier than matching a pattern against a string, but it’s still a first-year CS exercise.
It just involves breaking down the patterns by prefix, and matching until you reach a trivial base case:
- superset("", "") = True
- superset("", cY) = False
- superset(cX, cY) = superset(X, Y)
- superset(*X, *Y) = superset(*X, Y)
- …
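For experimenting without a Haskell toolchain, here is a rough shell sketch of these rules, with the elided cases filled in from the Haskell version shown further down. It is an approximation under one loud assumption: patterns are plain strings in which every ? and * is a wildcard, so literal question marks and asterisks can’t be expressed. The function name superset is borrowed from the rules above.

```shell
# superset A B: succeeds if simplified pattern A matches everything B does.
# Assumption: patterns are plain strings in which every ? and * is a wildcard.
# The body is a ( subshell ) so each recursive call gets its own variables.
superset() (
    a=$1
    b=$2
    ar=${a#?}; af=${a%"$ar"}    # remainder and first element of A
    br=${b#?}; bf=${b%"$br"}    # remainder and first element of B

    if [ -z "$b" ]; then
        case $a in
            "")   true ;;                    # superset("", "") = True
            "*"*) superset "$ar" "" ;;       # a leading * can match nothing
            *)    false ;;                   # A still needs a character
        esac
    elif [ -z "$a" ]; then
        false                                # superset("", cY) = False
    elif [ "$af" = "*" ] && [ "$bf" = "*" ]; then
        superset "$a" "$br"                  # superset(*X, *Y) = superset(*X, Y)
    elif [ "$af" = "*" ]; then
        # * either absorbs the next element of B or matches nothing
        superset "$a" "$br" || superset "$ar" "$b"
    elif [ "$bf" = "*" ]; then
        false                                # one element cannot cover *
    elif [ "$af" = "?" ]; then
        superset "$ar" "$br"                 # ? covers any single element
    elif [ "$bf" = "?" ]; then
        false                                # a literal cannot cover ?
    else
        # superset(cX, cY) = superset(X, Y), and differing literals fail
        [ "$af" = "$bf" ] && superset "$ar" "$br"
    fi
)

superset '-?' '-v' && echo "-? shadows -v"
superset 'Linux-3.1*' 'Linux-3.12*' && echo "Linux-3.1* shadows Linux-3.12*"
superset '-v' '-?' || echo "-v does not shadow -?"
```

Running each call in a subshell is slow, but it keeps the sketch short and avoids relying on local, which POSIX doesn’t guarantee.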
The actual code calls the simplified patterns “PseudoGlobs”, inhabited by PGAny ?, PGMany *, and PGChar c:
    pseudoGlobIsSuperSetof :: [PseudoGlob] -> [PseudoGlob] -> Bool
    pseudoGlobIsSuperSetof = matchable
      where
        matchable x@(xf:xs) y@(yf:ys) =
            case (xf, yf) of
                (PGMany, PGMany) -> matchable x ys
                (PGMany, _) -> matchable x ys || matchable xs y
                (_, PGMany) -> False
                (PGAny, _) -> matchable xs ys
                (_, PGAny) -> False
                (_, _) -> xf == yf && matchable xs ys
        matchable [] [] = True
        matchable (PGMany : rest) [] = matchable rest []
        matchable _ _ = False
That’s really all there is to it. ShellCheck just goes through each pattern, and flags the first pattern (if any) that it shadows. There’s also a pattern simplifier which rearranges c*?*?****d into c??*d to add some efficiency to obviously diseased patterns.
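A sketch of what such a simplifier might do, operating on plain strings rather than ShellCheck’s PseudoGlob lists (the simplify function is made up for illustration): within each run of wildcards, collect the ?s, remember whether any * was seen, and emit the ?s followed by at most one *.

```shell
# simplify PATTERN: normalize every run of wildcards to "?...?*" form.
# Assumption: every ? and * in the string is a wildcard.
# The body is a ( subshell ) so the working variables stay local.
simplify() (
    rest=$1
    out=      # output built so far
    q=        # ?s collected in the current wildcard run
    star=     # "*" if the current run contained at least one *
    while [ -n "$rest" ]; do
        next=${rest#?}
        c=${rest%"$next"}       # first character of what is left
        rest=$next
        case $c in
            "?") q="$q?" ;;
            "*") star="*" ;;
            *)   out="$out$q$star$c"; q=; star= ;;   # literal: flush the run
        esac
    done
    printf '%s\n' "$out$q$star"
)

simplify 'c*?*?****d'    # prints: c??*d
```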
Future work could include supporting character sets/ranges since [yY] is at least occasionally used, but extglob is too rare to warrant full regex support.
Of course, 99% of the time, there are no duplicates. 99.9% of the time, you’d get the same result with simple string matches.
However, that 0.1% of cases where you get delightful insights like -? shadowing -v or Linux-3.1* shadowing Linux-3.12* makes it all worthwhile.