Here I present various approaches to testing shell programs and discuss their relative merits. Further discussion of my shell test runner, Urchin, is in a separate article.
The overwhelming majority of shell programs don't have tests; here are some programs that didn't have tests as of 2012.
Didn't have tests in 2012
- git flow (still no tests in 2016)
- homeshick (has tests as of 2016)
- ievms (still no tests in 2016)
- rbenv (has tests as of 2016)
- z (still no tests in 2016)
Here are some that didn't have tests as of 2016. Some of these are very concise, and I imagine that is part of how they do fine without tests.
And here are some shell profiles that similarly lacked tests.
Some projects test with a long file that does a bunch of things and prints lots of output; the test is to run the file, compare its output to a stored expected-output file, and make sure that the two are exactly the same. This can get messy.
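Here is a minimal sketch of that approach; the file names and the fake program output are made up for illustration.

```shell
# Golden-file testing in miniature: capture the program's output and
# diff it against a blessed copy. "expected.txt" and "actual.txt" are
# stand-ins, not from any real project.
printf 'hello\nworld\n' > expected.txt

# Pretend this came from the program under test.
printf 'hello\nworld\n' > actual.txt

if diff -u expected.txt actual.txt; then
    result=PASS
else
    result=FAIL
fi
echo "$result"
```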
This approach is somewhat standard in other languages: Write functions inside of files or classes, and run assertions within those functions. Failed assertions and other errors are caught and raised.
In some cases these libraries use shell's built-in error handling, and in other cases they add their own assertion functions.
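To make the assertion-function idea concrete, here is a minimal hand-rolled sketch in plain POSIX shell; `assert_equals` is my own invention, not taken from any of the frameworks discussed below.

```shell
# A hand-rolled assertion function: compare two values and record failures.
failures=0
assert_equals() {
    if [ "$1" != "$2" ]; then
        echo "FAIL: expected '$1', got '$2'" >&2
        failures=$((failures + 1))
    fi
}

assert_equals "hello" "$(echo hello)"   # passes
assert_equals "4" "$((2 + 2))"          # passes
echo "failures: $failures"
```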
Implementations of this approach
In Roundup, test cases are functions, and their return codes determine whether the tests pass. Shell already has a nice assertion function called `test`, so Roundup doesn't need to implement its own. Roundup also provides its own way of structuring your tests: you can use the `describe` function to name your tests, and you can define `before` and `after` functions to be run before and after test cases, respectively.
Here's a simple example from the Roundup documentation.
describe "My Utility"

it_displays_usage() {
    usage=$(./my-utility | head -n 1)
    test "$usage" = "usage: my-utility [arg1 ... argN]"
}
You can see more Roundup tests in spark.
As far as I can tell, there is no standard way of listing all of the functions that are presently defined in a shell process. (Bash has `declare`, but that isn't standard.) Roundup uses regular expressions to look for function names within files; here is the relevant section.
# Seek test methods and aggregate their names, forming a test plan.
# This is done before populating the sandbox with tests to avoid odd
# conflicts.
roundup_plan=$(
    grep "^it_.*()" $roundup_p |
        sed "s/\(it_[a-zA-Z0-9_]*\).*$/\1/g"
)
shunit follows the same paradigm of organizing things into functions, but it defines its own assertion functions, like `assertEquals` and `assertFalse`. git-ftp uses it.
test_inits() {
    init=$($GIT_FTP init)
    assertEquals 0 $?
    assertTrue 'file does not exist' "remote_file_exists 'test 1.txt'"
    assertTrue 'file differs' "remote_file_equals 'test 1.txt'"
}
Like Roundup, shunit expects tests to be organized into functions and uses its own regular expression to list the functions; here is the relevant section in shunit.
# extract the lines with test function names, strip of anything besides the
# function name, and output everything on a single line.
_shunit_regex_='^[ ]*(function )*test[A-Za-z0-9_]* *\(\)'
egrep "${_shunit_regex_}" "${_shunit_script_}" \
    | sed 's/^[^A-Za-z0-9_]*//;s/^function //;s/\([A-Za-z0-9_]*\).*/\1/g' \
    | xargs
bash-infinity-framework is absolutely insane! It implements something that at least looks like object orientation and libraries; I don't really know how that works, but its included test library seems to work the same way as the libraries I mention above.
In ts, tests and setup/teardown procedures are specified as functions in a file.
#!/bin/sh
# pick a shell, any (POSIX) shell

setup () {              # optional setup
    mkdir -p "$ts_test_dir"
}

teardown () {           # optional teardown
    rm -r "$ts_test_dir"
}

test_true () {          # write tests named like "test_"
    true                # return 0 to pass.
}

. ts                    # source ts to run the tests
ts provides a couple of assertion functions and a skip function, and it exposes some of its state through shell variables. It, too, looks for tests by parsing source files with its own regular expression.
# Prints all functions in a test file starting with 'test_' or the pattern
# given by ts_test_pattern. Recurses into sourced files if TS_TESTS_IN_SOURCE
# is set to true.
ts_list () {
    ts_file="$1"
    shift 1
    if [ $# -eq 0 ]
    then
        grep -onE "^[[:space:]]*(${ts_test_pattern:-test_\w+})[[:space:]]*\(\)" /dev/null "$ts_file" |
            sed -e 's/^\([^:]*\):\([0-9]\{1,\}\):[[:space:]]*\([^ (]\{1,\}\).*/\3 \1:\2/'
    else
        ts_list "$@" | awk -v file="$ts_file" '{ $2=file " -> " $2; print }'
    fi
}
Bats's test cases use a bespoke syntax rather than the ordinary shell function syntax. (And its parser is also a regular expression.) Bats otherwise has a similar organizational structure to Roundup and shunit, but it adds its own idioms for passing certain information around. Here's an example from the Bats documentation.
@test "invoking foo with a nonexistent file prints an error" {
    run foo nonexistent_filename
    [ "$status" -eq 1 ]
    [ "$output" = "foo: no such file 'nonexistent_filename'" ]
}
The `$status` variable contains the status code of the command, and
the `$output` variable contains the combined contents of the command's
standard output and standard error streams.
`run` copies various aspects of program output to different variables. This approach may be less concise than the standard ways of accessing these values, but the variable names may be easier for some people to read.
`load` sources files (it is a Bash version of `.`) relative to the current test file, rather than relative to the current directory. I don't know why they use this rather than just changing directory to the test file's directory and using ordinary `.`.
Bats is quite popular.
tf provides some special shell-style assertions ("matchers") that are specified as shell comments.
## User comments start with double #
## command can be written in one line with multiple tests:
true # status=0; match=/^$/
## or tests can be placed in following lines:
false
# status=1
Rather than just testing status codes or stdout, you can also test environment characteristics, and you can test multiple properties of one command. rvm uses it. tf is written in Ruby.
Cram implements its own language for specifying shell code that should be run, specifying what the output should be, and for adding arbitrary descriptions of what the tests are doing. Here's a simple example from its test suite.
Options in an environment variable:

  $ CRAM='-y -n' cram
  options -y and -n are mutually exclusive
  [2]
When Cram runs this test, it prints "Options in an environment variable:" to explain what is going on, and then it runs the line `CRAM='-y -n' cram`. It expects `options -y and -n are mutually exclusive` as output, followed by an exit status of 2 (the `[2]` line), and it reports success or failure based on whether the observed output matches. Cram is written in Python.
In cmdtest, one test case spans multiple files. Minimally, you provide the test script, but you can also provide files for the stdin, the intended stdout, the intended stderr, and the intended exit code. You can also specify setup and teardown procedures as files.
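To make that concrete, here is a hypothetical layout for a single test named "greet"; the exact file names are my guesses at the convention, so consult the cmdtest manual for the real ones.

```
tests/
    greet.script    # the test script itself (required)
    greet.stdin     # fed to the script's standard input
    greet.stdout    # the intended standard output
    greet.stderr    # the intended standard error
    greet.exit      # the intended exit code
```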
rnt is quite similar to cmdtest. Each test case corresponds to a directory containing, minimally, a "cmd" file and, optionally, a few others. "cmd" is run, the results are compared to the files, and any differences are reported.
sharness instead has you call a shell function to define each test case.
test_expect_success "Success is reported like this" "
    echo hello world | grep hello
"
This looks a lot like the aforementioned "tests-as-shell-functions" tools, but it is different in that it runs in the ordinary shell interpreter; `test_expect_success` is a shell function, so this test case is an invocation of a shell function rather than a definition of one. Thus, sharness doesn't rely on a bespoke interpreter for listing the test functions.
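A toy version of this design shows why no bespoke parser is needed; this is my own sketch, not sharness's actual implementation.

```shell
# The whole "framework" is one shell function; the test file simply calls
# it, so the ordinary interpreter discovers the tests by executing the file.
test_count=0
failed=0
test_expect_success() {
    test_count=$((test_count + 1))
    if eval "$2" > /dev/null 2>&1; then
        echo "ok $test_count - $1"
    else
        echo "not ok $test_count - $1"
        failed=$((failed + 1))
    fi
}

test_expect_success "Success is reported like this" "
    echo hello world | grep hello
"
```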
JSON.sh has a bespoke test suite that runs all of the files in a directory and converts their exit codes to Test Anything Protocol output. Its code is simple enough that we can go through the whole thing right now.
#!/bin/sh
cd ${0%/*}
#set -e
fail=0
tests=0
#all_tests=${__dirname:}
#echo PLAN ${#all_tests}
for test in test/*.sh ;
do
    tests=$((tests+1))
    echo TEST: $test
    ./$test
    ret=$?
    if [ $ret -eq 0 ] ; then
        echo OK: ---- $test
        passed=$((passed+1))
    else
        echo FAIL: $test $fail
        fail=$((fail+ret))
    fi
done
if [ $fail -eq 0 ]; then
    echo -n 'SUCCESS '
    exitcode=0
else
    echo -n 'FAILURE '
    exitcode=1
fi
echo $passed / $tests
exit $exitcode
bocker's test suite uses the same concept and is even simpler.
#!/usr/bin/env bash
exit_code=0
for t in tests/test_*; do
    bash tests/teardown > /dev/null 2>&1
    bash "$t" > /dev/null 2>&1
    if [[ $? == 0 ]]; then
        echo -e "\e[1;32mPASSED\e[0m : $t"
    else
        echo -e "\e[1;31mFAILED\e[0m : $t"
        exit_code=1
    fi
    bash tests/teardown > /dev/null 2>&1
done
exit "$exit_code"
Urchin has more features than JSON.sh's and bocker's tests but is based on the same principle of each test being a Unix program.
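The principle is easy to demonstrate: a test is any executable file, and its exit code is the verdict. The file name and its contents below are hypothetical.

```shell
# Create a test that is nothing but an executable program; exit 0 means pass.
mkdir -p tests
cat > tests/addition_works <<'EOF'
#!/bin/sh
test "$((1 + 1))" = 2
EOF
chmod +x tests/addition_works

# Any runner that executes the file and inspects $? can judge it.
if ./tests/addition_works; then echo PASS; else echo FAIL; fi
```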
testlib.sh is a framework that runs as standard shell. Tests look like this.
begin_test "the thing"
(
    set -e
    echo "hello"
    false
)
end_test
This may look like a completely different language, but it is in fact ordinary shell, except that the functions are run in a particular order. `begin_test` and `end_test` are ordinary functions, and the parentheses are an ordinary subshell.
shpec is another framework that runs in an ordinary shell interpreter. Tests look like this.
# in shpec/network_shpec.sh
describe "my server"
    it "serves responses"
        assert still_alive "my-site.com"
    end
end
Again, while this doesn't look like shell, it is; `describe`, `it`, `still_alive`, and `end` are functions that shpec defines. `still_alive`, in particular, is a shpec "matcher" function. shpec matchers are usually simple wrappers around `test`.
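As an illustration of the matcher idea, here is a hypothetical matcher of my own; it is just a thin wrapper around `test`, and `file_exists` is an invented example, not part of shpec.

```shell
# A shpec-style "matcher": a function whose exit code is a test(1) result.
file_exists() {
    test -e "$1"
}

touch demo_file
if file_exists demo_file; then
    echo "matched"
fi
rm demo_file
```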
The shall utility is intended specifically for testing portability. If you invoke a shell program with shall, the program gets run in several different shells, and the output from each is displayed. If you open an interactive shall shell, the same thing happens with each command you run. From the shall documentation,
# Pass a script to all shells via stdin, plus an argument on the command line.
echo 'echo "Passed to $0: $1"' | shall -s one

# Execute script 'foo-script' with argument 'bar' in all shells.
shall foo-script bar

# Print the type of the 'which' command in Bash and Zsh.
shall -w bash,zsh -c 'type which'

# Enter a REPL that evaluates commands in both Bash and Dash.
SHELLS=bash,dash shall -i
Here's what the output looks like for a command that echoes the name of each executing shell.

$ shall -c 'echo "Hello from $0."'
Urchin, which I mention above, has similar support for running tests in multiple shells.
History of cross-shell testing in Urchin

First, I wrote cross-shell tests for Urchin itself. Then Michael Klement realized that these tested only Urchin and not anything else that Urchin might be testing, so he wrote the original cross-shell testing feature; it hasn't changed fundamentally since. Michael Klement later wrote shall.
When the topic of cross-shell testing originally came up, I had wanted to run my test cases with something like shall, but I still haven't come up with a nice way to do this without making things confusing or annoying to install.
There are also linters that check your shell code for strange formatting and sources of potential error. I haven't used any, and they aren't exactly testing tools, but I think they deserve mention.
Having looked at how the different tools work, who uses the different tools, and what people test with them, I believe that different approaches are appropriate depending on your particular situation.
Here are the main things I would consider when determining which approach to follow.
For simple programs with simple user interfaces it might be best to avoid thinking much about the test runner; it might be best to forgo automated tests or to have a bespoke test script and thus to avoid introducing dependencies.
People who are very familiar with shells, especially sh, should quickly grasp bocker, the JSON.sh tests, and Urchin. rnt and cmdtest should come easily too, though they will probably be less intuitive to such people because these programs have their own ways of sending inputs and comparing outputs.
The language extensions and environment variables rarely make tests any shorter to write, but I think that they are helpful for some people. These seemingly redundant features can make shell programs look more like programs from other languages, which some people might know better.
The testing frameworks that I have referenced mostly do very similar things, so much of the difference is just that they have different conventions. If you use BSD and write shell very often, I suspect that you'll find Urchin to be most intuitive, but if you are more familiar with other operating systems and with object-oriented programming, you will probably find one of the others to be more intuitive.
- Test runner language
- Special Urchin features
- Parallel test execution
Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better. (Dijkstra)
I see no particular organizational constraints for personal projects.
If you want lots of people to use it, something that follows existing norms is good so people have an easier time learning it. Probably something written in shell with TAP output is good.
If you are working at a tech company, you should prefer the tools that implement their own substantial languages or conventions, such as Cram, tf, cmdtest, and rnt. Because these require specialized knowledge, you and your colleagues are likely to have a harder time using them, at least at first, so it will look like you are working harder.
In case you decide against any of those, you should prefer the tools that expect tests to be defined as shell functions. While the test suites for these tools are technically standard shell, these tools implement their own mechanisms for listing the defined functions, and the functions must sometimes follow special conventions.
Avoid the tools that are implemented as ordinary shell functions because those are too easy to use.
Tangential comment on the merits of GNU/Linux
People complain a lot about how GNU and Linux are too messy to trust and are thus appalled when they see real businesses using such software. I think that business software ideally lives at the edge of catastrophic failure.
In my experience, the entire tech industry is snake oil, and the primary job of the tech worker is to maintain the illusion, to himself or herself and to colleagues, that he or she is important.
Functional software is not a priority because the software is fundamentally a scam. Unreliable software, on the other hand, helps the tech worker; maintaining such software is stressful and unpleasant and requires specialized knowledge, and all of this adds to the illusion that the tech work is important and that the workers are unique.
The tech workers are important because they are people, but the work they are doing is just a silly game that some people take very seriously.
First, note that these recommendations are mostly not from my personal experience; the only shell program that I presently maintain is Urchin, and its tests are run in Urchin.
Second, keep in mind that the differences among the tools are mostly in the ways that you think about tests and write test cases; if you want to use a particular tool that lacks a particular feature (for example, cross-shell testing or advanced assertion functions), it should be easy to write a separate utility that provides at least a rough version of that feature.