Testing shell programs

16 Nov.,2022



Here I present various approaches to testing shell programs and discuss their relative merits. Further discussion of my shell test runner, Urchin, is in a separate article.

No automated tests

The overwhelming majority of shell programs don't have tests; here are some programs that didn't have tests as of 2012.

Didn't have tests in 2012

  • git flow (still no tests in 2016)
  • homeshick (has tests as of 2016)
  • ievms (still no tests in 2016)
  • rbenv (has tests as of 2016)
  • z (still no tests in 2016)

Here are some that didn't have tests as of 2016. Some of these are very concise, and I imagine that is part of how they do fine without tests.

And here are some shell profiles that similarly lacked tests.

Simple single-file test

Some projects test with a long file that does a bunch of things and prints lots of output; the test is to compare the output to the expected output and make sure that it is exactly the same.

You use them like this.

  1. Write decent error messages in the program you are testing.
  2. Write a sequence of commands with your program that should work.
  3. Run that sequence of commands.
  4. Look for error messages.

This can get messy.
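
A minimal sketch of this style follows. The greet function is a hypothetical stand-in for the program being tested, and the file names are arbitrary; mktemp -d is assumed to be available. The whole test is a single comparison between the observed transcript and the expected one.

```shell
#!/bin/sh
# Hypothetical stand-in for the program under test.
greet() {
  if [ $# -eq 0 ]; then
    echo 'greet: missing name' >&2
    return 1
  fi
  echo "Hello, $1."
}

# Run a fixed sequence of commands and print everything they produce.
transcript() {
  greet world
  greet shell
  greet || echo "exit status: $?"
}

workdir=$(mktemp -d)

# The expected transcript, normally a file kept under version control.
cat > "$workdir/expected" <<'EOF'
Hello, world.
Hello, shell.
greet: missing name
exit status: 1
EOF

transcript > "$workdir/observed" 2>&1

# The test passes only if the two transcripts are exactly the same.
if diff "$workdir/expected" "$workdir/observed"; then
  result=pass
else
  result=fail
fi
echo "$result"
rm -r "$workdir"
```

When the transcripts differ, diff prints the differing lines, and that is the only failure report you get; this is part of why the approach gets messy as the file grows.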

Functions, or something like functions, as test cases

This approach is somewhat standard in other languages: Write functions inside of files or classes, and run assertions within those functions. Failed assertions and other errors are caught and raised.

In some cases these libraries use shell's built-in error handling, and in other cases they add their own assertion functions.

Implementations of this approach

Now let's discuss some implementations of this approach.


In Roundup, test cases are functions, and their return code determines whether the test passes. Shell already has a nice assertion function called test, so Roundup doesn't need to implement its own. It also provides its own way of structuring your tests; you can use the describe function to name your tests, and you can define before and after functions to be run before and after test cases, respectively. Here's a simple example from the Roundup documentation.

describe "My Utility"

it_displays_usage() {
  usage=$(./my-utility | head -n 1)
  test "$usage" = "usage: my-utility [arg1 ... argN]"
}

You can see more Roundup tests in spark.

As far as I can tell, there is no standard way of listing all of the functions that are presently defined in a shell process. (Bash has declare, but that isn't standard.) Roundup uses regular expressions to look for function names within files; here is the relevant section.

# Seek test methods and aggregate their names, forming a test plan.
# This is done before populating the sandbox with tests to avoid odd
# conflicts.
  grep "^it_.*()" $roundup_p           |
  sed "s/\(it_[a-zA-Z0-9_]*\).*$/\1/g"
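
To make the idea concrete, here is a small sketch of regex-based test discovery followed by running each discovered function. It is not Roundup's code; the test file contents, the variable names, and the pass/fail report format are all hypothetical.

```shell
#!/bin/sh
# Write a small hypothetical test file to scan.
test_file=$(mktemp)
cat > "$test_file" <<'EOF'
it_passes() {
  test 1 -eq 1
}

it_fails() {
  test 1 -eq 2
}

helper() {
  echo 'not a test'
}
EOF

# List the test functions by matching definitions that start with "it_".
plan=$(grep '^it_[a-zA-Z0-9_]*()' "$test_file" | sed 's/().*$//')

# Load the definitions, then run each test in a subshell and record the
# result based on its return code.
. "$test_file"
results=''
for name in $plan; do
  if ( "$name" ) > /dev/null 2>&1; then
    results="$results$name:pass "
  else
    results="$results$name:fail "
  fi
done
echo "$results"
rm "$test_file"
```

Note that helper is defined in the same file but is never run, because only names matching the pattern make it into the plan. A function defined in an unusual style (for example, indented or written across two lines) would be missed in the same way.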


shunit follows the same paradigm of organizing things into functions, but it defines its own assertion functions, like assertEquals and assertFalse. git-ftp uses it.

test_inits() {
  init=$($GIT_FTP init)
  assertEquals 0 $?
  assertTrue 'file does not exist' "remote_file_exists 'test 1.txt'"
  assertTrue 'file differs' "remote_file_equals 'test 1.txt'"
}

Like Roundup, shunit expects tests to be organized into functions and uses its own regular expression to list the functions; here is the relevant section in shunit.

# extract the lines with test function names, strip of anything besides the
# function name, and output everything on a single line.
_shunit_regex_='^[  ]*(function )*test[A-Za-z0-9_]* *\(\)'
egrep "${_shunit_regex_}" "${_shunit_script_}" \
|sed 's/^[^A-Za-z0-9_]*//;s/^function //;s/\([A-Za-z0-9_]*\).*/\1/g' \
|xargs


bash-infinity-framework is absolutely insane! It implements something that at least looks like object orientation and libraries; I don't really know how that works, but its included test library seems to work the same way as the libraries I mention above.


In ts, tests and setup/teardown procedures are specified as functions in a file.

#!/bin/sh
# pick a shell, any (POSIX) shell

setup () {              # optional setup
  mkdir -p "$ts_test_dir"
}

teardown () {           # optional teardown
  rm -r "$ts_test_dir"
}

test_true () {          # write tests named like "test_"
  true                  # return 0 to pass.
}

. ts                    # source ts to run the tests

ts provides a couple assertion functions and a skip function, and it exposes some of its state through shell variables.

It too looks for tests by parsing source files with its own regular expression.

# Prints all functions in a test file starting with 'test_' or the pattern
# given by ts_test_pattern.  Recurses into sourced files if TS_TESTS_IN_SOURCE
# is set to true.
ts_list () {
  ts_file="$1"
  shift 1

  if [ $# -eq 0 ]
  then
    grep -onE "^[[:space:]]*(${ts_test_pattern:-test_\w+})[[:space:]]*\(\)" /dev/null "$ts_file" |
    sed -e 's/^\([^:]*\):\([0-9]\{1,\}\):[[:space:]]*\([^ (]\{1,\}\).*/\3 \1:\2/'
  else
    ts_list "$@" | awk -v file="$ts_file" '{ $2=file " -> " $2; print }'
  fi
}


Bats's test cases use a bespoke syntax rather than the ordinary shell function syntax. (And its parser is also a regular expression.) Bats otherwise has a similar organizational structure to Roundup and shunit, but it adds its own idioms for passing certain information around. Here's an example from the Bats documentation.

@test "invoking foo with a nonexistent file prints an error" {
  run foo nonexistent_filename
  [ "$status" -eq 1 ]
  [ "$output" = "foo: no such file 'nonexistent_filename'" ]
}

The `$status` variable contains the status code of the command, and
the `$output` variable contains the combined contents of the command's
standard output and standard error streams.

run copies various aspects of program output to different variables. This approach may be less concise than the standard way of accessing these values, but the word names may be easier to read for some people.
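
For comparison, a run-like helper can be sketched in a few lines of plain shell. This is only an illustration of the idea, not Bats's implementation, and foo here is a hypothetical stand-in for the command being tested.

```shell
#!/bin/sh
# A sketch of a run-like helper; this is not Bats's actual implementation.
run() {
  # Capture standard output and standard error together.
  output=$("$@" 2>&1)
  # The status of an assignment from a command substitution is the status
  # of the substituted command.
  status=$?
}

# Hypothetical command under test.
foo() {
  echo "foo: no such file '$1'" >&2
  return 1
}

run foo nonexistent_filename
[ "$status" -eq 1 ] && [ "$output" = "foo: no such file 'nonexistent_filename'" ] && echo ok
```

Without the helper, the same checks would read $? directly after the command and use a command substitution for the output, which is shorter but relies on the reader knowing those idioms.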

load sources files (source is the bash version of .) relative to the current test file, rather than relative to the current directory. I don't know why they use this rather than just changing directory to the test file's directory and using the ordinary . command.

Bats is quite popular.

Extend a shell language in other ways

  • tf
  • cram


tf provides some special shell-style assertions ("matchers") that are specified as shell comments.

## User comments start with double #
## command can be written in one line with multiple tests:
true # status=0; match=/^$/
## or tests can be placed in following lines:
false
# status=1

Rather than just testing status codes or stdout, you can also test environment characteristics, and you can test multiple properties of one command. rvm uses it. It is written in Ruby.


Cram implements its own language for specifying shell code that should be run, specifying what the output should be, and for adding arbitrary descriptions of what the tests are doing. Here's a simple example from its test suite.

Options in an environment variable:

  $ CRAM='-y -n' cram
  options -y and -n are mutually exclusive

When Cram runs this test it prints "Options in an environment variable:" to explain what is going on and then runs the line CRAM='-y -n' cram. It expects options -y and -n are mutually exclusive as output, and it reports a success or failure based on whether the observed output matches. It is written in Python.

Several files make up one test

  • cmdtest
  • rnt


In cmdtest, one test case spans multiple files. Minimally, you provide the test script, but you can also provide files for the stdin, the intended stdout, the intended stderr and the intended exit code. You can also specify setup and teardown procedures as files.

  • stdin
  • expected stdout
  • expected stderr
  • expected exit code
  • setup
  • teardown
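
The general shape of a runner for this kind of test can be sketched as follows. This is not cmdtest's code, and the file naming scheme (a shared prefix with .script, .stdin, .stdout and .exit suffixes, plus .observed files for the results) is an assumption made for the sketch; stderr, setup and teardown are left out to keep it short.

```shell
#!/bin/sh
# Build one hypothetical test case, "upper", in a temporary directory.
suite=$(mktemp -d)
echo 'tr a-z A-Z; exit 3' > "$suite/upper.script"
echo hello > "$suite/upper.stdin"
echo HELLO > "$suite/upper.stdout"
echo 3 > "$suite/upper.exit"

# Run one test case. The script is required; the standard input and the
# expected standard output and exit code are used only if their files exist.
run_case() {
  base=$1
  input=/dev/null
  if [ -f "$base.stdin" ]; then
    input=$base.stdin
  fi
  sh "$base.script" < "$input" > "$base.stdout.observed"
  echo $? > "$base.exit.observed"
  case_ok=0
  for kind in stdout exit; do
    if [ -f "$base.$kind" ]; then
      diff "$base.$kind" "$base.$kind.observed" || case_ok=1
    fi
  done
  return "$case_ok"
}

failures=0
for script in "$suite"/*.script; do
  if run_case "${script%.script}"; then
    echo "ok: $script"
  else
    echo "not ok: $script"
    failures=$((failures + 1))
  fi
done
echo "failures: $failures"
rm -r "$suite"
```

The test author writes no assertions at all in this style; the expectation files are the assertions, and the runner decides how to compare them.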


rnt is quite similar to cmdtest. Each test case corresponds to a directory containing, minimally, a "cmd" file and, optionally, a few others.

  • cmd (required)
  • exit.expected
  • out.expected
  • err.expected

"cmd" is run, the results are compared to the files, and any differences are reported.

Tests as ordinary shell calls

  • JSON.sh
  • bocker
  • sharness
  • Urchin


sharness provides a more typical shell function.

test_expect_success "Success is reported like this" "
    echo hello world | grep hello
"

This looks a lot like the aforementioned "tests-as-shell-functions" tools, but it is different in that it runs in the ordinary shell interpreter; test_expect_success is a shell function, so this test case is an invocation of a shell function rather than a definition of a shell function. Thus, it doesn't rely on a bespoke interpreter for listing the test functions.
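
A function of this kind can be sketched in a few lines. The following is not sharness's code; the counters and the exact output format are assumptions, with the output loosely following the Test Anything Protocol.

```shell
#!/bin/sh
# A sketch of a test_expect_success-like function; not sharness's own code.
test_count=0
test_failures=0

test_expect_success() {
  test_count=$((test_count + 1))
  # $1 is the description and $2 is a string of shell code to evaluate.
  if ( eval "$2" ) > /dev/null 2>&1; then
    echo "ok $test_count - $1"
  else
    echo "not ok $test_count - $1"
    test_failures=$((test_failures + 1))
  fi
}

test_expect_success "Success is reported like this" "
    echo hello world | grep hello
"

test_expect_success "Failure is reported like this" "
    echo hello world | grep goodbye
"

# Print the plan line at the end, once the number of tests is known.
echo "1..$test_count"
```

Because each test is just a function call, the tests run in the order they are written, and nothing needs to parse the file to find them.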

Tests as ordinary Unix programs

  • JSON.sh, bocker
  • Urchin


JSON.sh has a bespoke test suite that runs all of the files in a directory and converts their exit codes to Test Anything Protocol output. Its code is simple enough that we can go through the whole thing right now.


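cd ${0%/*}

#set -e
fail=0; tests=0; passed=0
#echo PLAN ${#all_tests}
for test in test/*.sh ;
do
  tests=$((tests+1))
  echo TEST: $test
  ./$test
  ret=$?
  if [ $ret -eq 0 ] ; then
    echo OK: ---- $test
    passed=$((passed+1))
  else
    echo FAIL: $test $fail
    fail=$((fail+ret))
  fi
done

if [ $fail -eq 0 ]; then
  echo -n 'SUCCESS '
  exitcode=0
else
  echo -n 'FAILURE '
  exitcode=1
fi
echo   $passed / $tests
exit $exitcode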


bocker's test suite uses the same concept and is even simpler.

#!/usr/bin/env bash

exit_code=0
for t in tests/test_*; do
  bash tests/teardown > /dev/null 2>&1
  bash "$t" > /dev/null 2>&1
  if [[ $? == 0 ]]; then
    echo -e "\e[1;32mPASSED\e[0m : $t"
  else
    echo -e "\e[1;31mFAILED\e[0m : $t"
    exit_code=1
  fi
  bash tests/teardown > /dev/null 2>&1
done
exit "$exit_code"


Urchin has more features than JSON.sh's and bocker's tests but is based on the same principle of each test being a Unix program.

Tests as ordinary shell calls between special shell functions

  • testlib.sh
  • shpec


testlib.sh is a framework that runs as standard shell. Tests look like this.

begin_test "the thing"
(
  set -e
  echo "hello"
)
end_test

This may look like a completely different language, but is in fact ordinary shell, except that the functions are run in a particular order. begin_test and end_test are ordinary functions, and the parentheses are an ordinary subshell.
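
To show that nothing unusual is going on, here is a sketch of how a begin_test and end_test pair could be defined. It is not testlib.sh's code; the counters and messages are assumptions.

```shell
#!/bin/sh
# A sketch of begin_test and end_test; not testlib.sh's own code.
tests=0
failures=0

begin_test() {
  tests=$((tests + 1))
  test_description=$1
}

end_test() {
  # $? on entry is the exit status of the subshell that ran just before.
  test_status=$?
  if [ "$test_status" -eq 0 ]; then
    echo "ok: $test_description"
  else
    echo "failed: $test_description"
    failures=$((failures + 1))
  fi
}

begin_test "the thing"
(
  set -e
  echo "hello"
)
end_test

begin_test "the failing thing"
(
  set -e
  false
  echo "not reached"
)
end_test

echo "$failures of $tests tests failed"
```

The subshell matters: set -e makes the test body stop at the first failing command, and because the body runs in a subshell, that exit ends only the test case rather than the whole test file.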


shpec is another framework that runs in an ordinary shell interpreter. Tests look like this.

# in shpec/network_shpec.sh
describe "my server"
  it "serves responses"
    assert still_alive "my-site.com"
  end
end

Again, while this doesn't look like shell, it is; describe, it, still_alive, and end are functions that shpec defined. still_alive, in particular, is a shpec "matcher" function. shpec matchers are usually simple wrappers around test.
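
The matcher idea can be sketched like this. It is a guess at the general pattern rather than shpec's implementation; the assert dispatch, the matcher names equal and file_present, and the report format are all hypothetical.

```shell
#!/bin/sh
# A sketch of matchers as thin wrappers around test; not shpec's own code.
failures=0

# Call the named matcher with the remaining arguments and report the result.
assert() {
  matcher=$1
  shift
  if "$matcher" "$@"; then
    echo "ok: $matcher $*"
  else
    echo "not ok: $matcher $*"
    failures=$((failures + 1))
  fi
}

# Hypothetical matchers, each a one-line wrapper around test.
equal() {
  test "$1" = "$2"
}

file_present() {
  test -e "$1"
}

assert equal "foo" "foo"
assert equal "foo" "bar"
assert file_present /
```

A matcher written this way adds a readable name and nothing else, which is why a test that calls test directly is usually no longer than one that goes through a matcher.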

Cross-shell testing

The shall utility is intended specifically for testing portability. If you invoke a shell program with shall, the program gets run in several different shells, and the output from each is displayed. If you open an interactive shall shell, the same thing happens with each command you run. From the shall documentation,

    # Pass a script to all shells via stdin, plus an argument on the command line.
    echo 'echo "Passed to $0: $1"' | shall -s one

    # Execute script 'foo-script' with argument 'bar' in all shells.
    shall foo-script bar

    # Print the type of the 'which' command in Bash and Zsh.
    shall -w bash,zsh -c 'type which'

    # Enter a REPL that evaluates commands in both Bash and Dash.
    SHELLS=bash,dash shall -i

Here's what the output looks like.

# Echo the name of each executing shell; sample output included.
$ shall -c 'echo "Hello from $0."'

Urchin, which I mention above, has similar support for running tests in multiple shells.
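
The core of the idea is a loop over shells. Here is a minimal sketch, which is not how shall or Urchin implement it; the list of shells and the script being run are arbitrary examples.

```shell
#!/bin/sh
# A sketch of cross-shell testing; this is not how shall or Urchin do it.
script=$(mktemp)
echo 'echo "Hello from a shell."' > "$script"

ran=0
for shell in sh bash dash ksh zsh; do
  # Skip shells that are not installed on this machine.
  if ! command -v "$shell" > /dev/null 2>&1; then
    echo "skip: $shell"
    continue
  fi
  ran=$((ran + 1))
  # Run the same script in this shell and report its exit status.
  if "$shell" "$script" > /dev/null 2>&1; then
    echo "ok: $shell"
  else
    echo "not ok: $shell"
  fi
done
rm "$script"
```

A real tool has to decide what to do about missing shells; this sketch reports them as skipped, but treating them as failures would be just as defensible.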

History of cross-shell testing in Urchin

  1. I wrote cross-shell tests for Urchin itself.
  2. Michael Klement wrote the cross-shell test runner.
  3. Michael Klement wrote shall.

First I wrote cross-shell tests for Urchin itself.

Then Michael Klement realized that these tested only Urchin and not anything else that Urchin might be testing. So he wrote the original cross-shell testing feature. It hasn't changed fundamentally since.

Michael Klement later wrote shall.

When the topic of cross-shell testing originally came up, I had wanted to run my test cases with something like shall, but I still haven't come up with a nice way to do this without making things confusing or annoying to install.


These linters check your shell code for strange formatting and sources of potential error. I haven't used either, and they aren't exactly testing, but I think they deserve mention.

Which approach should I use?

Having looked at how the different tools work, who uses the different tools, and what people test with them, I believe that different approaches are appropriate depending on your particular situation.

Here are the main things I would consider when determining which approach to follow.

  • How complex is the program that you are testing?
  • What programming styles/languages are most familiar to the people writing the program?
  • What languages are you using in the project? In particular,
      • Which shell(s) do you want to support?
      • Are you writing tests in languages other than shell?
  • Organizational structure of the collaboration
  • What dependencies are acceptable?
  • Special features
      • Do you think cram or tf would be a particularly convenient way of specifying your particular tests?
      • Do you want a strong structure for specifying inputs and outputs, as in rnt and cmdtest?
      • Do you require parallel test execution or testing in multiple environments?

Complexity of the program you are testing

For simple programs with simple user interfaces it might be best to avoid thinking much about the test runner; it might be best to forgo automated tests or to have a bespoke test script and thus to avoid introducing dependencies.

Programming styles/languages

People who are very familiar with shells, especially sh, should quickly grasp bocker, the JSON.sh tests, and Urchin. rnt and cmdtest should come very easily too, but they will probably be less intuitive to such people because these programs have their own ways of sending inputs and comparing outputs.

The language extensions and environment variables rarely make tests any shorter to write, but I think that they are helpful for some people. These seemingly redundant features can make shell programs look more like programs from other languages, which some people might know better.

The testing frameworks that I have referenced mostly do very similar things, so much of the difference is just that they have different conventions. If you use BSD and write shell very often, I suspect that you'll find Urchin to be most intuitive, but if you are more familiar with other operating systems and with object-oriented programming, you will probably find one of the others to be more intuitive.

Test suite languages

  • Most of the test runners require that tests be written in the shell that the test runner is run in.
  • Many test runners are extensions of sh (cram, tf, cmdtest, &c.)
  • JSON.sh, bocker, and Urchin can run tests written in any language as long as they are Unix-style programs.

Test runner dependencies

Test runner language

  • cram and cmdtest are written in Python.
  • tf is written in Ruby.
  • Bats is written in bash.
  • Many of the above tools use non-standard shell features, especially bash. Often the developers don't know what shells their tools work on.

Special features

  • Do you think cram or tf would be a particularly convenient way of specifying your particular tests?
  • Do you want a strong structure for specifying inputs and outputs, as in rnt and cmdtest?

  • Cram language or tf language
  • Input and output fixtures structured as files
  • Special Urchin features
      • Parallel test execution
      • Testing in multiple environments

Organizational structure of the collaboration

Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better. (Dijkstra)

Personal projects

I see no particular organizational constraints for personal projects.

Free software with lots of contributors

If you want lots of people to use it, something that follows existing norms is good so people have an easier time learning it. Probably something written in shell with TAP output is good.

Tech companies

You should prefer the more complex tools if you are working at a tech company.

  • Prefer complex test runners
  • Avoid test runners written in shell

In a tech company you should prefer the tools that implement their own substantial language or conventions, such as cram, tf, cmdtest, and rnt. Because these require specialized knowledge, you and your colleagues are likely to have a harder time using them, at least at first, so it will look like you are working harder.

In case you decide against any of those, you should prefer the tools that expect tests to be defined as shell functions. While the test suites for these tools are technically standard shell, these tools implement their own mechanisms for listing the defined functions, and the functions must sometimes follow special conventions.

Avoid the tools that are implemented as ordinary shell functions because those are too easy to use.

Tangential comment on the merits of GNU/Linux

People complain a lot about how GNU and Linux are too messy to trust and are thus appalled when they see real businesses using such software. I think that business software ideally lives at the edge of catastrophic failure.

For business applications, your software should live at the edge of catastrophic failure.

In my experience, the entire tech industry is snake oil, and the primary job of the tech worker is to maintain the illusion, to himself/herself and to colleagues, that he or she is important.

Functional software is not a priority because the software is fundamentally a scam. Unreliable software, on the other hand, helps the tech worker; maintaining such software is stressful and unpleasant and requires specialized knowledge, and all of this adds to the illusion that the tech work is important and that the workers are unique.

The tech workers are important because they are people, but the work they are doing is just a silly game that some people take very seriously.

Final thoughts

First, note that these recommendations are mostly not from my personal experience; the only shell program that I presently maintain is Urchin, and its tests are run in Urchin.

Second, keep in mind that the differences among the tools are mostly in the ways that you think about tests and write test cases; if you want to use a particular tool that lacks a particular feature (for example, cross-shell testing or advanced assertion functions), it should be easy to write a separate utility that provides at least a rough version of that feature.