Hi everyone! I'm Mats Lidell. I'm going to talk about my journey writing test cases for GNU Hyperbole and what I learned on the way.

So, why write tests for GNU Hyperbole? There is some background. I'm the co-maintainer of GNU Hyperbole together with Bob Weiner. Bob is the author of the package. The package is available through the Emacs package manager and GNU ELPA if you want to try it out. The package has some age: I think it dates back to a first release around 1993, which is also when I came in contact with it for the first time. I was a user of the package for many years. Later I became the maintainer of the package for the FSF, although at that time I did not have much knowledge of Emacs Lisp, and I still have a lot to learn. A few years ago we started to work actively on the package, setting up goals and having meetings.

My starting point is that I had experience with test automation from development in C++, Java and Python, using the different xUnit frameworks like CppUnit and JUnit. That was in my daytime work, where the technique of using pull requests with changes backed up by tests was the daily routine. It was really a requirement, for a change to go in, that it had supporting test cases. I believe that is a quite common setup and requirement these days. I had also been an Emacs user for many years, but with the focus on being a user, so, as I mentioned, I had limited Emacs Lisp knowledge.

When we decided to start working actively on Hyperbole again, it was natural for me to look into raising the quality by adding unit tests. This goes hand in hand with running the tests regularly as part of a build process, all in all following the current best practice of software development. But since Hyperbole had no tests at all, it would not be enough just to add tests for new or changed functionality. We wanted to add them more broadly, ideally everywhere. So work started with adding tests here and there, based on our gut feeling of where they would be most useful. This work is still ongoing. So this is where my journey starts: with much functionality to test, no knowledge of what testing frameworks existed, and not really knowing a lot about Emacs Lisp at all.

Luckily there is a package for writing tests in Emacs. It is called ERT, Emacs Lisp Regression Testing. It contains support both for defining tests and for running them. Defining a test is done with the macro ert-deftest. In its simplest form a test has a name, a doc string and a body. The doc string is where you typically give a detailed description of the test; it has space for more info than can be given in the test name. The body is where all the interesting things happen. It is here you prepare the test, run it and verify the outcome.

Schematically it looks like this: you have the ert-deftest, the test name, the doc string and then the body. It is in the body where everything interesting happens. The test is prepared, the function under test is executed, and the outcome of the test is evaluated: did the test succeed or not? The verification of a test is performed with one or more so-called assertions, and in ERT they are implemented with the macro should, together with a set of related macros. should takes a form as argument, and if the form evaluates to nil the test has failed.

So let's look at an example. This simple test verifies that the function + can add the numbers 2 and 3 to get the result 5.
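As a sketch, that example test looks roughly like this (the exact test name in the demo may differ slightly):

    ;; Sketch of the example test from the demo.
    (ert-deftest example-0 ()
      "Verify that `+' can add 2 and 3 to get 5."
      (should (= (+ 2 3) 5)))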
So now we have defined a test case. How do we run it? The ert package has the function (or rather convenience alias) ert. It takes a test selector. The test name works as a selector for running just one test. So here we have the example; let's evaluate it and then run it using ert. As you see, we get prompted for a test selector, but we only have one test case defined at the moment, the example zero. So let's hit return. And as you see here, we get some output describing what we have just done: one test case, it has passed, zero failed, zero skipped, a total of 1 of 1 test cases, and some time stamps for the execution. We also see this green mark here, indicating one test case and that it was successful. For inspecting the test we can hit the letter l, which shows all the should forms that were executed during this test case. So here we see that we have one should executed, we see the form, that it evaluated to true, and that it was 5 equals 5. A good example of a successful test case.

So now we have seen how we can run a test case. Can we debug it? Yes! For debugging, the ert-deftest can be instrumented using edebug-defun, just as a function or macro is instrumented for debugging. So let's try that. We run edebug-defun here. Now it is instrumented for debugging. And we run it with ert. And we are inside the debugger, and we can inspect here what is happening. We step through it and it succeeds just as before.

It is time for a commercial break! Hyperbole itself can help with running tests and also with running them in debug mode. That is because Hyperbole identifies the ert-deftest as an implicit button. An implicit button is basically a string, or pattern, that Hyperbole has assigned some meaning to. For the string ert-deftest, that meaning is to run the test case. You activate a button with the Action Key. The standard binding is the middle mouse button or, from the keyboard, M-RET. Let's try that. So we move the cursor here and we type M-RET. And boom, the test case was executed. And to run it in debug mode we type C-u M-RET to get the Assist Key, and then we are in the debugger. So that is pretty useful and convenient.

A related useful feature here is the step-in functionality, bound to the letter i in the debugger. It allows you to step into a function and continue debugging from there. For the cases where your test does not do what you want, looking at what happens in the function under test can be really useful. Let's try that with another example. So here we have two helper functions: f1-add, which uses the built-in + function, and my-add, which uses f1-add. We are going to test my-add. Let's run this using Hyperbole by typing C-u M-RET. We are in the debugger again; let's step forward to the function under test and then press i to get it instrumented and step into it for debugging. And here we can inspect that we get the arguments 1 and 3 and the result 4, as expected. And our test case will then succeed.
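For reference, a sketch of those two helpers and the test from the step-in demo; the exact arguments are my approximation, and the same helpers come back in the mocking examples later:

    ;; Sketch of the helpers and test used in the step-in demo.
    (defun f1-add (a b)
      "Add A and B using the built-in `+'."
      (+ a b))

    (defun my-add (a b)
      "Add A and B by calling `f1-add'."
      (f1-add a b))

    (ert-deftest example-my-add ()
      "Verify that `my-add' adds 1 and 3 to get 4."
      (should (= (my-add 1 3) 4)))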
The next tool in our toolbox is mocking. Mocking is needed when we want to simulate the response from a function used by the function under test, that is, used by the implementation of the function. This could be for various reasons. One example could be that it would be hard or impossible in the test setup to get the behavior you want to test for, like an external error case. But a mock can also be used to verify that a function is called with a specific argument. We can view it as a way to isolate the function under test from its dependencies. So in order to test a function in isolation we need to cut out any dependencies on external behavior.

Most obvious would be dependencies on external resources such as web pages. As an example: Hyperbole contains functionality to link you to social media resources and other resources on the net. Testing that would require the test system to call out to the social media resource and would depend on it being available, and so on. Nothing technically stops a test case from depending on external resources, but it would, if nothing else, be flaky or slow. It could be part of an end-to-end suite where we want to test that it works all the way, but in this case we want to look at the isolated case that can be run with no dependency on external resources. What you want to do is to replace the function with a mock that behaves as the real function would.

The package I found and have been using for mocking is "el-mock". The workhorse in this package is the with-mock macro. It looks like this: with-mock followed by a body. During the execution of the body, stubs and mocks defined in the body are respected. Let's look at some examples to make that clearer. In this case the with-mock macro works so that the expression "stub + => 10" is interpreted so that the function + is replaced with a stub. The stub will return 10 regardless of how it is called. Note that the stubbed function does not have to be called at this level but could be called at any level in the call chain. So by knowing how the function under test is implemented and how the implementation works, you can find function calls you want to mock to force certain behavior that you want to test, or to avoid calls to external resources, slow calls and so on. Simply isolate the function under test and simulate its environment.

A mock is a little bit more sophisticated and depends on the arguments that the mocked function is called with. Or, more precisely, it is checked after the with-mock clause that the declared arguments matched the arguments the function was called with, or even whether it was called at all. So if it is called with other arguments there will be an error, and if it is not called at all that is also an error. This way we are sure that the function we expected to be called actually was called, an important piece of testing: we are sure that the mock we have provided actually is triggered by the test case. So here we have an example of with-mock with a mock, where the f1-add function is mocked so that if it is called with arguments 2 and 3 it returns 10. Then we have a test case where we try the my-add function, as you might remember, call it with 2 and 3, and see that it should also return 10, because it is using f1-add.

Moving over to cl-letf. On rare occasions the limitations of el-mock mean you would want to implement a full-fledged function to be used under test. Then the macro cl-letf can be useful. However, you need to handle the case yourself if the function was not called. Looking through the test cases where I have used cl-letf, I think most can be implemented using plain mocking. The cases left are where the args to the mock might differ due to environment issues; in that case a static mock will not work. Another trick: for functions that use hooks, you can override or replace the hooks to do the testing. So you can use the hook function just to do the verification and not do anything useful in the hook. Also here you need to be careful to make sure the test handler is called and nothing else.
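Hedged sketches of the stub, mock and cl-letf variants, reusing f1-add and my-add from before (the test names are mine):

    ;; Sketches only: el-mock provides with-mock, stub and mock;
    ;; cl-lib provides cl-letf.
    (require 'el-mock)
    (require 'cl-lib)

    (ert-deftest example-stub ()
      "With a stub, `+' returns 10 no matter how it is called."
      (with-mock
        (stub + => 10)
        (should (= (+ 2 3) 10))))

    (ert-deftest example-mock ()
      "The mock requires `f1-add' to be called with 2 and 3 and returns 10."
      (with-mock
        (mock (f1-add 2 3) => 10)
        (should (= (my-add 2 3) 10))))

    (ert-deftest example-cl-letf ()
      "Replace `f1-add' with a full replacement function using `cl-letf'."
      (cl-letf (((symbol-function 'f1-add)
                 (lambda (_a _b) 10)))
        (should (= (my-add 2 3) 10))))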
So far we have been talking about testing what a function returns. In the best of worlds we have a pure function that only depends on its arguments and produces no side effects. But many operations produce side effects or operate on the contents of buffers, such as writing a message in the message buffer, changing the state of a buffer, moving point, and so on. Hyperbole is not an exception, quite the contrary: much of the functionality for creating links is just about updating buffers. This poses a special problem for tests. The tests get longer, since you need to create buffers and files and initialize their contents. Verifying the outcome becomes trickier, since you need to make sure you look at the right place. And at the end of the test you need to clean up, both to avoid leaving a lot of garbage buffers and files around and, even worse, to avoid later tests depending on the leftovers from other tests. Here are some functions and variables I have found useful for this.

For creating tests:

with-temp-buffer: It provides you a temp buffer that you visit, and afterwards there is no need to clean up. So this is a first choice if that is all you need.

make-temp-file: If you need a file, this is the function to use. It creates a temp file or a directory. A file can be filled with initial contents. This needs to be cleaned up after the test.

Moving to verifying and debugging:

buffer-string: Returns the full contents of the buffer as a string. That can sound a bit voluminous, but since tests are normally small, it often works well. I have in particular found good use of comparing the contents of a buffer with the empty string. That will give an error, but as we have seen with the output produced by the should assertion, this is almost like a print statement and can be compared to the good old technique of debugging with print statements. There might be other ways to do the same, as we saw with debugging.

buffer-name: Getting the buffer name is good for verifying what buffer we are looking at. I have often found it useful to check that my assumptions about what buffer I'm acting on are correct, by adding should clauses in the middle of the test execution or after preparing the test input. Sometimes Emacs can switch buffers in strange ways, maybe because the test case is badly written, and making sure your assumptions are correct is a good sanity check. Even the ert package does some buffer and window manipulation for its reporting that I have not fully learned how to master, so assertions checking the sanity of the test are good.

major-mode: Verifying that the buffer has the proper mode can also be very useful and is a good sanity check.

Finally, cleaning up: unwind-protect. The tool for cleaning up is the unwind-protect form, which ensures that the unwind forms are always executed regardless of the outcome of the body. So even if your test fails, you are sure the cleanup is executed. Let's look at unwind-protect together with a temporary file example. Many tests look like this: you create some resource, you call unwind-protect, you do the test, and afterwards you do the cleanup. The cleanup for a file and buffer is so common that I have created a helper for that. It looks like this. The trick with the buffer-modified flag is to avoid getting prompted about killing a buffer that has not been saved; the test buffers are often in a state where they have been modified but not saved.
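A sketch of this pattern, with a hypothetical cleanup helper; the actual helper in Hyperbole may look different:

    ;; Hypothetical cleanup helper.  Clearing the modified flag avoids
    ;; being prompted about killing a modified but unsaved buffer.
    (defun example--delete-file-and-buffer (file)
      "Kill any buffer visiting FILE without prompting, then delete FILE."
      (let ((buf (find-buffer-visiting file)))
        (when buf
          (with-current-buffer buf
            (set-buffer-modified-p nil))
          (kill-buffer buf)))
      (delete-file file))

    ;; A typical test: create the resource, run the test inside
    ;; unwind-protect, and always clean up in the unwind forms.
    (ert-deftest example-visit-temp-file ()
      "Visit a temp file with initial contents and verify the buffer."
      (let ((file (make-temp-file "hypb-test" nil nil "hello")))
        (unwind-protect
            (progn
              (find-file file)
              (should (string= "hello" (buffer-string))))
          (example--delete-file-and-buffer file))))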
Another problem for tests is input. In the middle of execution, a function might want to have some interaction with the user. Testing these poses a problem, not only in that the input matters but also in how to even get the test case to receive the input. Ideally the tests are run in batch mode, which in some sense means no user interaction, and in batch mode there is no event loop running. Fortunately there is a package, "with-simulated-input", that gets you around these issues. It is a macro that allows us to define a set of characters that will be read by the function under test, and all of this works in batch mode. It looks like this: we have with-simulated-input, a string of keys, and then a body. The form takes the string of keys and runs the rest of the body, and if any input is required it is picked from the string of keys. In our example the read-string call will read up until RET and then return the characters read. As you see in the example, a space needs to be provided as the string SPC and return as the string RET.

So now we have seen ways to create test cases and even make it possible to run some of them that have I/O in batch mode. But the initial goal was to run them all at once. How do you do that? Let's go back to the ert command. It prompts for a test selector. If we give it the selector t, it will run all tests we have currently defined. So let's try that with a subset of the Hyperbole tests. Here is the test folder in the Hyperbole directory. Let's go up and load all the demo tests, and then try to run ert. Now we see we have a bunch of test cases. We could run them all individually, but we can instead run them with the selector t, and they will all run at once. So now ert is executing all our test cases, and we get a nice green display with all the test cases.

So that was fine, but we are still running them by manually calling ert. How could we run them from the command line? ERT comes with functions for running tests in batch mode. For Hyperbole we use make for repetitive tasks, so we have a make target that uses the ert batch functionality, and this is the line from the Makefile. It is a bit detailed, but you see that we have a part here where we load the test dependencies, for getting packages such as el-mock and with-simulated-input loaded. I also want to point out the setting of auto-save-default to nil, to avoid the auto-save files that can otherwise pile up after running the tests a few times.

Even with the help of simulated input, not all tests can be run in batch mode. They will simply not work there and have to be run in an interactive Emacs with a running event loop. One trick to still be able to use batch mode for automation is to put a guard at the top of the test case, as the first thing to be executed, so that it kicks in before anything else and stops Emacs from trying to run the test case. It looks like this: (skip-unless (not noninteractive)). When ert sees that a test should be skipped, it skips it and makes a note of that, so you will see how many tests have been skipped.
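Rough sketches of both of these techniques, assuming the with-simulated-input package is installed; the prompt text and test names are mine:

    ;; Simulated user input that also works in batch mode.
    (require 'with-simulated-input)

    (ert-deftest example-simulated-input ()
      "Feed `read-string' the keys \"hello SPC world RET\"."
      (should (string= "hello world"
                       (with-simulated-input "hello SPC world RET"
                         (read-string "Type something: ")))))

    ;; Guard for a test that only works in an interactive Emacs; in
    ;; batch mode it is skipped and reported as skipped.
    (ert-deftest example-interactive-only ()
      "A test that needs a running event loop."
      (skip-unless (not noninteractive))
      (should t))

For the batch run itself, ERT provides ert-run-tests-batch-and-exit, so a simplified sketch of such a Makefile invocation (not the actual Hyperbole line) is something like: emacs --batch -l ert -l the-tests.el -f ert-run-tests-batch-and-exit.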
So, too bad: we have a number of test cases defined, and to run them we need to run them manually? Well, yes, sort of. Not being able to run all tests easily is a bit counterproductive, since our goal is to run all the tests. There is, however, no ert function for running the tests batch-style from an interactive Emacs. The closest I have gotten is either to start Emacs from the command line, calling the ert function as we have seen, and then killing it manually when done, or to add a function that extracts the contents of the *ert* buffer when done and echoes it to standard output.

This is how it looks in the Makefile to get that cut-and-paste behavior: the ert output goes into a file, so you can kill Emacs and then spit out the contents of the *ert* buffer. One final word here: when you run this in a CI pipeline you might not have a TTY for getting Emacs to start, and that is then another problem with getting the interactive mode to work.

So we are reaching the end of the talk. If you have any new ideas or some suggestions for improvements, feel free to reach out, because I'm still on a learning curve when it comes to writing good test cases. If you look at the test cases we have in Hyperbole and think they contradict what I'm saying here, that is OK; you are probably right. I have changed the style as I go, and we have not yet refactored old tests to benefit from new designs. That is also a beauty of a test case: as long as it serves its purpose, it is not terrible if it is not optimal or does not have the best style.

And yes, thanks for listening. Bye!