Hi everyone! I'm Mats Lidell. I'm going to talk about my journey writing test cases for GNU Hyperbole and what I learned on the way.

So, why write tests for GNU Hyperbole? There is some background. I'm the co-maintainer of GNU Hyperbole together with Bob Weiner. Bob is the author of the package. The package is available through the Emacs package manager and GNU ELPA if you want to try it out. The package has some age: I think it dates back to a first release around 1993, which is also when I came in contact with it for the first time. I was a user of the package for many years. Later I became the maintainer of the package for the FSF, although at that time I did not have much knowledge of Emacs Lisp, and I still have a lot to learn. A few years ago we started to work actively on the package, setting up goals and having meetings.

My starting point is that I had experience with test automation from development in C++, Java and Python, using the different xUnit frameworks like CppUnit and JUnit. That was in my daytime work, where the technique of using pull requests with changes backed up by tests was the daily routine. It was really a requirement, for a change to go in, that it had supporting test cases. I believe that is a quite common setup and requirement these days. I had also been an Emacs user for many years, but with the focus on being a user, so, as I mentioned, I had limited Emacs Lisp knowledge.

When we decided to start working actively on Hyperbole again, it was natural for me to look into raising the quality by adding unit tests. This goes hand in hand with running the tests regularly as part of a build process, all in all following the current best practice of software development. But since Hyperbole had no tests at all, it would not be enough just to add tests for new or changed functionality. We wanted to add them more broadly, ideally everywhere. So work started with adding tests here and there, based on our gut feeling of where they would be most useful. This work is still ongoing. So this is where my journey starts: with much functionality to test, no knowledge of what testing frameworks existed, and not really knowing a lot about Emacs Lisp at all.

Luckily there is a package for writing tests in Emacs. It is called ERT, Emacs Lisp Regression Testing. It contains support both for defining tests and for running them. Defining a test is done with the macro ert-deftest. In its simplest form a test has a name, a doc string and a body. The doc string is where you typically give a detailed description of the test; it has space for more info than can be given in the test name. The body is where all the interesting things happen. It is here you prepare the test, run it and verify the outcome.

Schematically it looks like this: you have the ert-deftest, the test name, the doc string and then the body. It is in the body where everything interesting happens. The test is prepared, the function under test is executed, and the outcome of the test is evaluated: did the test succeed or not? The verification of a test is performed with one or more so-called assertions, and in ERT they are implemented with the macro should, together with a set of related macros. should takes a form as argument, and if the form evaluates to nil the test has failed.

So let's look at an example. This simple test verifies that the function + can add the numbers 2 and 3 to get the result 5.
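As a sketch, that example test looks roughly like this (the exact test name in the demo may differ slightly):

    ;; Sketch of the example test from the demo.
    (ert-deftest example-0 ()
      "Verify that `+' can add 2 and 3 to get 5."
      (should (= (+ 2 3) 5)))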
So now we have defined a test case. How do we run it? The ert package has the function (or rather convenience alias) ert. It takes a test selector. The test name works as a selector for running just one test. So here we have the example; let's evaluate it and then run it using ert. As you see, we get prompted for a test selector, but we only have one test case defined at the moment, the example zero. So let's hit return. And as you see here, we get some output describing what we have just done: one test case, it has passed, zero failed, zero skipped, a total of 1 of 1 test cases, and some time stamps for the execution. We also see this green mark here, indicating one test case and that it was successful. For inspecting the test we can hit the letter l, which shows all the should forms that were executed during this test case. So here we see that we have one should executed, we see the form, that it evaluated to true, and that it was 5 equals 5. A good example of a successful test case.

So now we have seen how we can run a test case. Can we debug it? Yes! For debugging, the ert-deftest can be instrumented using edebug-defun, just as a function or macro is instrumented for debugging. So let's try that. We run edebug-defun here. Now it is instrumented for debugging. And we run it with ert. And we are inside the debugger, and we can inspect here what is happening. We step through it and it succeeds just as before.

It is time for a commercial break! Hyperbole itself can help with running tests and also with running them in debug mode. That is because Hyperbole identifies the ert-deftest as an implicit button. An implicit button is basically a string, or pattern, that Hyperbole has assigned some meaning to. For the string ert-deftest, that meaning is to run the test case. You activate a button with the Action Key. The standard binding is the middle mouse button or, from the keyboard, M-RET. Let's try that. So we move the cursor here and we type M-RET. And boom, the test case was executed. And to run it in debug mode we type C-u M-RET to get the Assist Key, and then we are in the debugger. So that is pretty useful and convenient.

A related useful feature here is the step-in functionality, bound to the letter i in the debugger. It allows you to step into a function and continue debugging from there. For the cases where your test does not do what you want, looking at what happens in the function under test can be really useful. Let's try that with another example. So here we have two helper functions: f1-add, which uses the built-in + function, and my-add, which uses f1-add. We are going to test my-add. Let's run this using Hyperbole by typing C-u M-RET. We are in the debugger again; let's step forward to the function under test and then press i to get it instrumented and step into it for debugging. And here we can inspect that we get the arguments 1 and 3 and the result 4, as expected. And our test case will then succeed.
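For reference, a sketch of those two helpers and the test from the step-in demo; the exact arguments are my approximation, and the same helpers come back in the mocking examples later:

    ;; Sketch of the helpers and test used in the step-in demo.
    (defun f1-add (a b)
      "Add A and B using the built-in `+'."
      (+ a b))

    (defun my-add (a b)
      "Add A and B by calling `f1-add'."
      (f1-add a b))

    (ert-deftest example-my-add ()
      "Verify that `my-add' adds 1 and 3 to get 4."
      (should (= (my-add 1 3) 4)))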
The next tool in our toolbox is mocking. Mocking is needed when we want to simulate the response from a function used by the function under test, that is, used by the implementation of the function. This could be for various reasons. One example could be that it would be hard or impossible in the test setup to get the behavior you want to test for, like an external error case. But a mock can also be used to verify that a function is called with a specific argument. We can view it as a way to isolate the function under test from its dependencies. So in order to test a function in isolation we need to cut out any dependencies on external behavior.

Most obvious would be dependencies on external resources such as web pages. As an example: Hyperbole contains functionality to link you to social media resources and other resources on the net. Testing that would require the test system to call out to the social media resource and would depend on it being available, and so on. Nothing technically stops a test case from depending on external resources, but it would, if nothing else, be flaky or slow. It could be part of an end-to-end suite where we want to test that it works all the way, but in this case we want to look at the isolated case that can be run with no dependency on external resources. What you want to do is to replace the function with a mock that behaves as the real function would.

The package I found and have been using for mocking is "el-mock". The workhorse in this package is the with-mock macro. It looks like this: with-mock followed by a body. During the execution of the body, stubs and mocks defined in the body are respected. Let's look at some examples to make that clearer. In this case the with-mock macro works so that the expression "stub + => 10" is interpreted so that the function + is replaced with a stub. The stub will return 10 regardless of how it is called. Note that the stubbed function does not have to be called at this level but could be called at any level in the call chain. So by knowing how the function under test is implemented and how the implementation works, you can find function calls you want to mock to force certain behavior that you want to test, or to avoid calls to external resources, slow calls and so on. Simply isolate the function under test and simulate its environment.

A mock is a little bit more sophisticated and depends on the arguments that the mocked function is called with. Or, more precisely, it is checked after the with-mock clause that the declared arguments matched the arguments the function was called with, or even whether it was called at all. So if it is called with other arguments there will be an error, and if it is not called at all that is also an error. This way we are sure that the function we expected to be called actually was called, an important piece of testing: we are sure that the mock we have provided actually is triggered by the test case. So here we have an example of with-mock with a mock, where the f1-add function is mocked so that if it is called with arguments 2 and 3 it returns 10. Then we have a test case where we try the my-add function, as you might remember, call it with 2 and 3, and see that it should also return 10, because it is using f1-add.

Moving over to cl-letf. On rare occasions the limitations of el-mock mean you would want to implement a full-fledged function to be used under test. Then the macro cl-letf can be useful. However, you need to handle the case yourself if the function was not called. Looking through the test cases where I have used cl-letf, I think most can be implemented using plain mocking. The cases left are where the args to the mock might differ due to environment issues; in that case a static mock will not work. Another trick: for functions that use hooks, you can override or replace the hooks to do the testing. So you can use the hook function just to do the verification and not do anything useful in the hook. Also here you need to be careful to make sure the test handler is called and nothing else.
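Hedged sketches of the stub, mock and cl-letf variants, reusing f1-add and my-add from before (the test names are mine):

    ;; Sketches only: el-mock provides with-mock, stub and mock;
    ;; cl-lib provides cl-letf.
    (require 'el-mock)
    (require 'cl-lib)

    (ert-deftest example-stub ()
      "With a stub, `+' returns 10 no matter how it is called."
      (with-mock
        (stub + => 10)
        (should (= (+ 2 3) 10))))

    (ert-deftest example-mock ()
      "The mock requires `f1-add' to be called with 2 and 3 and returns 10."
      (with-mock
        (mock (f1-add 2 3) => 10)
        (should (= (my-add 2 3) 10))))

    (ert-deftest example-cl-letf ()
      "Replace `f1-add' with a full replacement function using `cl-letf'."
      (cl-letf (((symbol-function 'f1-add)
                 (lambda (_a _b) 10)))
        (should (= (my-add 2 3) 10))))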
So far we have been talking about testing what a function returns. In the best of worlds we have a pure function that only depends on its arguments and produces no side effects. But many operations produce side effects or operate on the contents of buffers, such as writing a message in the message buffer, changing the state of a buffer, moving point, and so on. Hyperbole is not an exception, quite the contrary: much of the functionality for creating links is just about updating buffers. This poses a special problem for tests. The tests get longer, since you need to create buffers and files and initialize their contents. Verifying the outcome becomes trickier, since you need to make sure you look at the right place. And at the end of the test you need to clean up, both to avoid leaving a lot of garbage buffers and files around and, even worse, to avoid later tests depending on the leftovers from other tests. Here are some functions and variables I have found useful for this.

For creating tests:

with-temp-buffer: It provides you a temp buffer that you visit, and afterwards there is no need to clean up. So this is a first choice if that is all you need.

make-temp-file: If you need a file, this is the function to use. It creates a temp file or a directory. A file can be filled with initial contents. This needs to be cleaned up after the test.

Moving to verifying and debugging:

buffer-string: Returns the full contents of the buffer as a string. That can sound a bit voluminous, but since tests are normally small, it often works well. I have in particular found good use of comparing the contents of a buffer with the empty string. That will give an error, but as we have seen with the output produced by the should assertion, this is almost like a print statement and can be compared to the good old technique of debugging with print statements. There might be other ways to do the same, as we saw with debugging.

buffer-name: Getting the buffer name is good for verifying what buffer we are looking at. I have often found it useful to check that my assumptions about what buffer I'm acting on are correct, by adding should clauses in the middle of the test execution or after preparing the test input. Sometimes Emacs can switch buffers in strange ways, maybe because the test case is badly written, and making sure your assumptions are correct is a good sanity check. Even the ert package does some buffer and window manipulation for its reporting that I have not fully learned how to master, so assertions checking the sanity of the test are good.

major-mode: Verifying that the buffer has the proper mode can also be very useful and is a good sanity check.

Finally, cleaning up: unwind-protect. The tool for cleaning up is the unwind-protect form, which ensures that the unwind forms are always executed regardless of the outcome of the body. So even if your test fails, you are sure the cleanup is executed. Let's look at unwind-protect together with a temporary file example. Many tests look like this: you create some resource, you call unwind-protect, you do the test, and afterwards you do the cleanup. The cleanup for a file and buffer is so common that I have created a helper for that. It looks like this. The trick with the buffer-modified flag is to avoid getting prompted about killing a buffer that has not been saved; the test buffers are often in a state where they have been modified but not saved.
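A sketch of this pattern, with a hypothetical cleanup helper; the actual helper in Hyperbole may look different:

    ;; Hypothetical cleanup helper.  Clearing the modified flag avoids
    ;; being prompted about killing a modified but unsaved buffer.
    (defun example--delete-file-and-buffer (file)
      "Kill any buffer visiting FILE without prompting, then delete FILE."
      (let ((buf (find-buffer-visiting file)))
        (when buf
          (with-current-buffer buf
            (set-buffer-modified-p nil))
          (kill-buffer buf)))
      (delete-file file))

    ;; A typical test: create the resource, run the test inside
    ;; unwind-protect, and always clean up in the unwind forms.
    (ert-deftest example-visit-temp-file ()
      "Visit a temp file with initial contents and verify the buffer."
      (let ((file (make-temp-file "hypb-test" nil nil "hello")))
        (unwind-protect
            (progn
              (find-file file)
              (should (string= "hello" (buffer-string))))
          (example--delete-file-and-buffer file))))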
Another problem for tests is input. In the middle of execution, a function might want to have some interaction with the user. Testing these poses a problem, not only in that the input matters but also in how to even get the test case to receive the input. Ideally the tests are run in batch mode, which in some sense means no user interaction, and in batch mode there is no event loop running. Fortunately there is a package, "with-simulated-input", that gets you around these issues. It is a macro that allows us to define a set of characters that will be read by the function under test, and all of this works in batch mode. It looks like this: we have with-simulated-input, a string of keys, and then a body. The form takes the string of keys and runs the rest of the body, and if any input is required it is picked from the string of keys. In our example the read-string call will read up until RET and then return the characters read. As you see in the example, a space needs to be provided as the string SPC and return as the string RET.

So now we have seen ways to create test cases and even make it possible to run some of them that have I/O in batch mode. But the initial goal was to run them all at once. How do you do that? Let's go back to the ert command. It prompts for a test selector. If we give it the selector t, it will run all tests we have currently defined. So let's try that with a subset of the Hyperbole tests. Here is the test folder in the Hyperbole directory. Let's go up and load all the demo tests, and then try to run ert. Now we see we have a bunch of test cases. We could run them all individually, but we can instead run them with the selector t, and they will all run at once. So now ert is executing all our test cases, and we get a nice green display with all the test cases.

So that was fine, but we are still running them by manually calling ert. How could we run them from the command line? ERT comes with functions for running tests in batch mode. For Hyperbole we use make for repetitive tasks, so we have a make target that uses the ert batch functionality, and this is the line from the Makefile. It is a bit detailed, but you see that we have a part here where we load the test dependencies, for getting packages such as el-mock and with-simulated-input loaded. I also want to point out the setting of auto-save-default to nil, to avoid the auto-save files that can otherwise pile up after running the tests a few times.

Even with the help of simulated input, not all tests can be run in batch mode. They will simply not work there and have to be run in an interactive Emacs with a running event loop. One trick to still be able to use batch mode for automation is to put a guard at the top of the test case, as the first thing to be executed, so that it kicks in before anything else and stops Emacs from trying to run the test case. It looks like this: (skip-unless (not noninteractive)). When ert sees that a test should be skipped, it skips it and makes a note of that, so you will see how many tests have been skipped.
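Rough sketches of both of these techniques, assuming the with-simulated-input package is installed; the prompt text and test names are mine:

    ;; Simulated user input that also works in batch mode.
    (require 'with-simulated-input)

    (ert-deftest example-simulated-input ()
      "Feed `read-string' the keys \"hello SPC world RET\"."
      (should (string= "hello world"
                       (with-simulated-input "hello SPC world RET"
                         (read-string "Type something: ")))))

    ;; Guard for a test that only works in an interactive Emacs; in
    ;; batch mode it is skipped and reported as skipped.
    (ert-deftest example-interactive-only ()
      "A test that needs a running event loop."
      (skip-unless (not noninteractive))
      (should t))

For the batch run itself, ERT provides ert-run-tests-batch-and-exit, so a simplified sketch of such a Makefile invocation (not the actual Hyperbole line) is something like: emacs --batch -l ert -l the-tests.el -f ert-run-tests-batch-and-exit.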
So, too bad: we have a number of test cases defined, and to run them we need to run them manually? Well, yes, sort of. Not being able to run all tests easily is a bit counterproductive, since our goal is to run all the tests. There is, however, no ert function for running the tests batch-style from an interactive Emacs. The closest I have gotten is either to start Emacs from the command line, calling the ert function as we have seen, and then killing it manually when done, or to add a function that extracts the contents of the *ert* buffer when done and echoes it to standard output.

This is how it looks in the Makefile to get that cut-and-paste behavior: the ert output goes into a file, so you can kill Emacs and then spit out the contents of the *ert* buffer. One final word here: when you run this in a CI pipeline you might not have a TTY for getting Emacs to start, and that is then another problem with getting the interactive mode to work.

So we are reaching the end of the talk. If you have any new ideas or some suggestions for improvements, feel free to reach out, because I'm still on a learning curve when it comes to writing good test cases. If you look at the test cases we have in Hyperbole and think they contradict what I'm saying here, that is OK; you are probably right. I have changed the style as I go, and we have not yet refactored old tests to benefit from new designs. That is also a beauty of a test case: as long as it serves its purpose, it is not terrible if it is not optimal or does not have the best style.

And yes, thanks for listening. Bye!