Hi everyone, I'm Mats Liddell. I'm going to talk about my journey writing test cases for GNU Hyperbole and what I learned on the way.

So, why write tests for GNU Hyperbole? First, some background. I'm the co-maintainer of GNU Hyperbole together with Bob Weiner. Bob is the author of the package. The package is available through the Emacs package manager from GNU ELPA if you want to try it out. The package has some age: I think the first release dates back to around 1993, which is also when I first came in contact with it. I used the package for many years. Later I became the maintainer of the package for the FSF, even though I did not have much knowledge of Emacs Lisp, and I still have a lot to learn. A few years ago we started to work actively on the package, setting up goals and having meetings.

My starting point is that I had experience with test automation from development in C++, Java and Python, using different xUnit frameworks like CppUnit and JUnit. That was in my daytime work, where the technique of using pull requests with changes backed up by tests was the daily routine. It was really a requirement for a change to go in that it had supporting test cases; I believe that is a quite common setup and requirement these days. I had also been an Emacs user for many years, but with the focus on being a user, so as I mentioned, I have limited Emacs Lisp knowledge.

So when we decided to start working actively on Hyperbole again, it was natural for me to look into raising the quality by adding unit tests. This also goes hand in hand with running them regularly as part of a build process; all in all, following the current best practice of software development. But since Hyperbole had no tests at all, it would not be enough just to add tests for new or changed functionality. We wanted to add them more broadly, ideally everywhere. So work started with adding tests here and there, based on our gut feeling of where they would be most useful. This work is still ongoing.

So this is where my journey starts: with much functionality to test, no knowledge of what testing frameworks existed, and not really knowing a lot about Emacs Lisp at all.

Luckily there is a package for writing tests in Emacs. It is called ERT, Emacs Lisp Regression Testing. It contains support both for defining tests and for running them. Defining a test is done with the macro ert-deftest. In its simplest form a test has a name, a docstring and a body. The docstring is where you typically give a detailed description of the test; it has space for more info than what can be given in the test name. The body is where all the interesting things happen. Schematically it looks like this: you have ert-deftest, the test name, the docstring, and then the body. It is in the body that the test is prepared, the function under test is executed, and the outcome is evaluated: did the test succeed or not?

The verification of a test is performed with one or more so-called assertions, and in ERT they are implemented with the macro should, together with a set of related macros. should takes a form as argument, and if the form evaluates to nil, the test has failed.

So let's look at an example. This simple test verifies that the function + can add the numbers 2 and 3 and get the result 5.
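The slide code is not captured in the transcript, but a minimal sketch of the example could look like this (the name example-0 matches the selector used in the demo below):

    ;; A test in its simplest form: a name, a docstring, and a body
    ;; with a single `should' assertion.
    (ert-deftest example-0 ()
      "Verify that `+' can add 2 and 3 and return 5."
      (should (= 5 (+ 2 3))))

Evaluating this form defines the test so that ERT can find it.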
So now we have defined a test case. How do we run it? The ERT package has a function with the rather convenient alias ert. It takes a test selector, and the test name works as a selector for running just one test. So here we have the example; let's just evaluate it. We define it and then we run it using ert. As you see, we get prompted for a test selector, but we only have one test case defined at the moment, so it is example-0. Let's hit return. And as you see here, we get some output describing what we have just done: one test case passed, zero failed, zero skipped, in total one of one test case, plus some timestamps for the execution. We also see this green mark here, indicating one test case and that it was successful. For inspecting the test we can hit the letter l, which shows all the should forms that were executed during this test case. Here we see that one should was executed, and we see the form: adding 2 and 3 gave 5, and 5 equals 5. A good example of a successful test case.

So now we've seen how we can run a test case. Can we debug it? Yes. For debugging, the ERT test can be instrumented with edebug-defun, just as a function or macro is instrumented for debugging. So let's try that. We instrument the test for debugging and run it with ert, and we're inside the debugger, where we can inspect what's happening. Step through it and, yes, it succeeded just as before.

It's time for a commercial break. Hyperbole itself can help with running tests, and also with running them in debug mode. That is because Hyperbole identifies an ert-deftest as an implicit button. An implicit button is basically a string or pattern that Hyperbole has assigned some meaning to; for the string ert-deftest, the meaning is to run the test case. You activate the button with the Action Key; the standard binding is the middle mouse button or, from the keyboard, M-RET. So let's try that. We move the cursor here and type M-RET, and boom, the test case was executed. And to run it in debug mode, we type C-u M-RET to get the Assist Key behavior, and then we're in the debugger. So that's pretty useful and convenient.

A related useful feature is the step-in functionality, bound to the letter i in debug mode. It allows you to step into a function and continue debugging from there. For the cases where your test does not do what you want, looking at what happens inside the function under test can be really useful. Let's try that with another example. Here we have two helper functions (sketched below): first f1-add, which uses the built-in + function, and then my-add, which calls f1-add. We are going to test my-add. Let's run this using Hyperbole in debug mode, C-u M-RET. We're in the debugger again; let's step forward to the function under test and press i to get it instrumented and step into it for debugging. Here we can inspect that it gets the arguments 1 and 3 and returns the result 4, as expected, and yes, of course, our test case then succeeds.
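The helper functions from this demo are only shown on screen; reconstructed from the narration, they would look roughly like this (the names f1-add and my-add are my approximations):

    ;; A helper that wraps the built-in `+' ...
    (defun f1-add (a b)
      (+ a b))

    ;; ... and the function under test, which calls the helper.
    (defun my-add (a b)
      (f1-add a b))

    (ert-deftest my-add-test ()
      "Verify that `my-add' adds 1 and 3 into 4."
      (should (= 4 (my-add 1 3))))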
The next tool in our toolbox is mocking. Mocking is needed when we want to simulate the response from a function used by the function under test, that is, used by its implementation. This could be for various reasons. One example could be that it would be hard or impossible in the test setup to get the behavior you want to test for, like an external error case. But a mock can also be used to verify that a function is called with a specific argument. We can view it as a way to isolate the function under test from its dependencies.

So in order to test the function in isolation, we need to cut out any dependencies on external behavior. Most obvious would be dependencies on external resources, such as web pages. Hyperbole contains functionality to link you to social media resources and other resources on the net. Testing that for real would require the test system to call out to the social media resources and would depend on them being available, etc. Nothing technically stops a test case from depending on external resources, but it would, if nothing else, be flaky or slow. That could be part of an end-to-end suite where we want to test that things work all the way; here we want to look at the isolated case that can be run with no dependency on external resources. What you want to do is to replace the function with a mock that behaves as the real function would.

The package I have found and have used for mocking is el-mock. The workhorse in this package is the with-mock macro. It looks like this: with-mock followed by a body, and during the execution of the body, the stubs and mocks defined in the body are in effect. Let's look at some examples to make that clearer.

In this case we have the macro with-mock. It works so that the expression (stub + => 10) is interpreted so that the function + is replaced with the stub. The stub will return 10 regardless of how it is called. Note that the stubbed function does not have to be called at this level; it could be called at any level in the call chain. By knowing how the function under test is implemented and how the implementation works, you can find function calls you want to mock, to force certain behavior that you want to test, or to avoid calls to external resources, slow calls, etc. Simply put: isolate the function under test and simulate its environment.

mock is a little bit more sophisticated and depends on the arguments that the mocked function is called with. More precisely, the mock clause checks that the arguments it declares match the arguments the function was called with, and even whether it was called at all. If it is called with other arguments, there will be an error, and if it is not called, that is also an error. This way we are sure that the function we expected to be called actually was called, an important piece of the testing: we are sure that the mock we have provided actually is triggered by the test case. So here we have an example of a with-mock where the f1-add function is mocked so that if it is called with 2 and 3 as arguments, it will return 10. Then we have a test case where we try the my-add function, which, as you might remember, uses f1-add, call it with 2 and 3, and see that it should then also return 10 (both examples are sketched below).
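As mentioned, here is a sketch of both el-mock examples as I understand them from the narration; the exact slide code is not in the transcript:

    ;; `stub' replaces a function unconditionally: any call to `+',
    ;; at any depth in the call chain, now returns 10.
    (ert-deftest stub-example ()
      "A stubbed `+' returns 10 no matter what it is called with."
      (with-mock
        (stub + => 10)
        (should (= 10 (+ 2 3)))))

    ;; `mock' also verifies the call: `f1-add' must be called with
    ;; exactly the arguments 2 and 3, or the test errors out.
    (ert-deftest mock-example ()
      "`my-add' returns 10 when its helper `f1-add' is mocked."
      (with-mock
        (mock (f1-add 2 3) => 10)
        (should (= 10 (my-add 2 3)))))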
So, moving over to cl-letf. On rare occasions, the limitations of el-mock mean you may want to implement a full-fledged replacement function to be used under test. Then the macro cl-letf can be useful. However, you then need to handle the case yourself where the function was not called. Looking through the test cases where I have used cl-letf, I think most could be implemented with plain mocking. The cases left are those where the arguments to the mock might differ due to environment issues; in that case a static mock will not work.

Another trick concerns functions that use hooks: you can overload or replace the hooks to do the testing, using the hook function just to do the verification rather than anything useful in the hook. Also here you need to be careful to make sure your test handler is called and nothing else.

So far we have been talking about testing what the function returns. In the best of worlds we have a pure function that only depends on its arguments and produces no side effects. But many operations produce side effects or operate on the contents of buffers, such as writing a message in the message buffer, changing the state of a buffer, moving point, etc. Hyperbole is no exception, quite the contrary: much of the link-creating functionality is exactly about updating buffers. This poses a special problem for tests. The tests get longer, since you need to create buffers and files and initialize their contents; verifying the outcome becomes trickier, since you need to make sure you look at the right place; and at the end of the test you need to clean up, both to not leave a lot of garbage in buffers and files lying around and, even worse, to not cause later tests to depend on the leftovers from other tests. Here are some functions and variables I have found useful for this.

For creating tests: with-temp-buffer. It provides you a temporary buffer that you visit, and afterwards there is no need to clean up. This is the first choice if that is all you need. make-temp-file: if you need a file, this is the function to use. It creates a temp file or a directory, and the file can be filled with initial contents. This needs to be cleaned up after the test.

Moving on to verifying and debugging: buffer-string. It returns the full contents of the buffer as a string. That can sound a bit voluminous, but since test buffers are normally small, this often works well. I have in particular found good use in comparing the contents of a buffer with the empty string. That comparison fails, but as we have seen, the output produced by the failing should assertion then shows the buffer contents, so this is almost like a print statement and can be compared with the good old technique of debugging with print statements. There might be other ways to do the same, as we saw with debugging.

buffer-name: getting the buffer name is good for verifying what buffer we are looking at. I have often found it useful to check that my assumptions about what buffer I am acting on are correct, by adding should clauses in the middle of the test execution or after preparing the test input. Sometimes Emacs can switch buffers in strange ways, maybe because the test code is badly written, and making sure your assumptions are correct is a good sanity check. Even the ERT package does some buffer and window manipulation for its reporting that I have not fully learned to master, so a should assertion checking the sanity of the test is good. And major-mode: verifying that the buffer has the proper mode can also be very useful and is a good sanity check.

Finally, cleaning up: unwind-protect. The tool for cleaning up is the unwind-protect form, which ensures that the unwind forms are always executed regardless of the outcome of the body. So if your test fails, you are sure the cleanup is executed. Let's look at unwind-protect together with the temporary file example. Many tests look like this: you create some resource, call unwind-protect, do the test in the body, and afterwards do the cleanup in the unwind forms. The cleanup for a file and a buffer is so common that I have created a helper for it (a sketch of the same shape follows below). The trick with the buffer-modified flag is to avoid getting prompted for killing a buffer that is not saved; test buffers are often in a state where they are modified but have not been saved.
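The helper shown on screen is part of Hyperbole's test sources and is not captured in the transcript; here is a minimal hand-rolled sketch of the same create/test/clean-up shape (the file prefix, contents and assertions are made up for illustration):

    (ert-deftest temp-file-example ()
      "Create a temp file, test against its buffer, then clean up."
      (let ((file (make-temp-file "hypb" nil ".txt" "some contents")))
        (unwind-protect
            (progn
              (find-file file)
              ;; Sanity checks: right buffer and right contents.
              (should (string= (buffer-name)
                               (file-name-nondirectory file)))
              (should (string= "some contents" (buffer-string))))
          ;; Always runs, even when the should forms above fail.
          ;; Clearing the modified flag avoids the kill-buffer prompt.
          (set-buffer-modified-p nil)
          (kill-buffer)
          (delete-file file))))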
Another problem for tests is input. In the middle of execution, a function might want to have some interaction with the user. Testing this poses a problem, not only in what the input should be, but also in how to even get the test case to receive input. I run the tests in batch mode, which in some sense means no user interaction; in batch mode there is no event loop running. Fortunately there is a package, with-simulated-input, that gets you around these issues. It gives us a macro that allows us to define a set of characters that will be read by the function under test, and all of this works in batch mode. It looks like this: with-simulated-input, then a string of keys, and then a body. The form runs the body, and if input is required, it is picked from the string of keys. In our example, the read-string call will read up until RET and then return the characters read. As you see in the example, a space needs to be provided by the string SPC, and return is provided by the string RET.
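Reconstructed from the narration (the prompt and the exact input string are my guesses), the example looks something like this:

    ;; `read-string' inside the body consumes the simulated keys;
    ;; SPC stands for space and RET ends the input.
    (ert-deftest simulated-input-example ()
      "Feed \"hello world\" followed by RET to a `read-string' call."
      (should (string= "hello world"
                       (with-simulated-input "hello SPC world RET"
                         (read-string "Give me some input: ")))))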
So now we have seen ways to create test cases and even make it possible to run some of them that have I/O in batch mode. But the initial goal was to run them all at once, so how do we do that? Let's go back to the ert command. It prompts for a test selector, and if we give it the selector t, it will run all tests we currently have defined. So let's try that with a subset of the Hyperbole tests. Here is the test folder in the Hyperbole directory; let's go up here and load all the demo tests. Now we have a bunch of test cases defined. We could run them all individually, but if we run ert with t instead, it runs them all at once. And here we have a nice green display with all the test cases.

So that was fine, but we were still running it manually by calling ert. How could we run it from the command line? ERT comes with functions for running tests in batch mode. For Hyperbole we use make for repetitive tasks, so we have a make target that uses the ERT batch functionality, and this is the line from the makefile. It is a bit detailed, but you can see that we have a part here where we load the test dependencies, getting packages such as el-mock and with-simulated-input loaded. I also want to point out the setting of auto-save-default to nil, to get rid of the prompts for the excess auto-save files that can pile up after running the tests a few times.

Even with the help of with-simulated-input, not all tests can be run in batch mode. They would simply not work there and have to be run in an interactive Emacs with a running event loop. One trick to still be able to use batch mode for automation is to put a guard at the top of each such test case, as the first thing to be executed, so that it kicks in before anything else and stops ERT from trying to run the rest of the test case. It looks like this: (skip-unless (not noninteractive)). When ERT sees that the test should be skipped, it skips it and makes a note of it, so you will see how many tests have been skipped.

So, too bad: we have a number of test cases that we then need to run manually. Well, sort of. Not being able to run all tests automatically is a bit counterproductive, since our goal is to run all tests. There is, however, no ERT function for running tests unattended in an interactive Emacs. The closest I have got is to start Emacs from the command line, calling the ert function as we have just seen, and then killing it manually when done, or to add a function that extracts the contents of the ERT buffer when done and echoes it to standard output. This is how it looks in the makefile to get that behavior: the ERT output goes into a file, and we can then kill Emacs and spit out the contents of the ERT buffer. One final word here: when you run this in a continuous integration pipeline, you might not have a TTY for Emacs to start on, and that is then another problem with getting the interactive mode to work.

So we have reached the end of the talk. If you have any new ideas or suggestions for improvements, feel free to reach out, because I am still on the learning curve of how to write good test cases. If you look at the test cases we have in Hyperbole and think they contradict what I am saying here, that is OK; you are probably right. I have changed style as I go, and we have not yet refactored all tests to benefit from new designs. That is also the beauty of a test case: as long as it serves its purpose, it is not terrible if it is not optimal or does not have the best style. And yes, thanks for listening. Bye.