Hi, I'm Austin Terrio, and this is writing a language server in OCaml for Emacs, fun, and profits.
Real quick, who am I? I'm a software engineer at Semgrep. I work on our editor integrations, and I love working on programming languages, editors, and cryptography.
What is Semgrep? We're a small cybersecurity startup whose core product is a SaaS tool for static application security testing. You can think of it as a security linter. Normal linters will say, hey, you wrote ugly code, fix it. We'll say, hey, you wrote a SQL injection, fix that. We support 30-plus languages, and we have lots of customers all using different IDEs.
Why does that matter? Well, our goal is to show security bugs as early as possible in the development cycle. In the industry, we call this shifting left. And how far left can we shift? The editor. So that's why it matters that our customers have different editors.
Our goal is to have Semgrep show up in the editor like other language tooling. What I mean by that is I wrote some bad OCaml up here, and the editor gave me that red squiggly and said, fix your OCaml, and we want Semgrep to do something similar. So our first goal is to provide an experience similar to normal language checking. And since we're a small startup, and there's a ton of different IDEs that our customers use, we don't want to have to rewrite a plugin for every single type of editor out there. So our other goal is to abstract away editing and language features for editors into one code base. Ideally, we write it once and then plug it into all of them.
So how can we do that? Well, in the process of working on this stuff, I found out about the Language Server Protocol. What's great about the Language Server Protocol is that it's a specification that defines all the ways that language tools might interact with a development tool. And by development tool, I mean like VS Code, Sublime, Emacs, any of those.
And by language tool, I mean something like Pyright or mypy. What's cool about LSP is that you can separate those language tools out into language servers and the development tools into language clients. And because they share this common specification, they can now interact without knowing about each other. So it's this great abstraction that means all you have to do is write one language server, and you can hook it up to a bunch of language clients and it'll just work.
So let's do a quick case study on language servers and LSP, just so you get an idea of why this is super cool. There's this language server called Rust Analyzer. It's a language server for the Rust language. If you've ever developed in Rust, you'll know that it takes a really long time to compile, but the compiler gives you fantastic feedback. Rust has a lot of advanced language features, so that feedback is super important for developing. And Rust Analyzer will give you that feedback instantly. Here's a ton of things that it gives you: code completion, fixes, compiler errors, warnings, type signatures. Rust has a pretty strong type system. It also has this thing called lifetimes. That's a bunch of advanced language features, and Rust Analyzer helps you manage all that and gives you all that info without having to wait for it to compile. Developing with Rust Analyzer is just orders of magnitude easier than trying to write Rust straight.
Rust Analyzer, fantastic. They went and developed it, and now you can go use that in Emacs, Neovim, VS Code, wherever. So you can develop Rust in a way that's relatively efficient without having to give up your favorite editor.
So here's a quick little demo of all the cool things it can do. You can see I typed an error, and it tells me that I wrote an error. I used the incorrect lifetime, which is an advanced language feature, and it'll let me know that.
I expanded a Rust macro just there, which is similar to Lisp macros, and then I ran a single unit test, and that's really cool because I ran it from my editor. I didn't have to go and type any commands or anything. It just worked.
So why is this useful in general for a user? Well, you get the same experience across editors. Like I was saying, you don't have to give up one editor for another just to get some cool language feature. You can easily set up and use language servers made for other editors if the developers don't support your editor of choice. Performance is not dependent on the editor. That's fantastic because all that Rust stuff takes a lot of CPU power, and so that's going to be slow if your editor's language is not fast. And then bug fixes, updates, all that, it all comes out at the same time.
And then from the developer perspective, adding new editors is quick and easy. For reference, when I wrote the Semgrep language server, it took me maybe two or three weeks, but then actually going and setting it up for VS Code took an hour. For Emacs, 30 minutes. IntelliJ, maybe another hour. So it took me a day to add support for three different editors, which was, I think, something like 75% of the market share or something crazy like that. So very quick. You only need one mental model. You don't have to figure out all these different extension mental models, how those editors work, anything like that. And another thing that's cool is you only have to write tests for the language server, not necessarily for the editor. It's great to have just one set of tests that you have to pass.
So why does the Language Server Protocol matter for Emacs? Well, like I was saying before, Emacs gets the benefit of work put into other editors. So we get all this language support, and no one actually has to go and write the Lisp for it or write those tools specifically for Emacs.
The language tooling, the CPU-intensive part, can be written in something else. Lisp is fast. It's not that fast. Having that speed is fantastic. It's all asynchronous. It won't slow down Emacs. And then there's this package called LSP mode, which is an LSP client commonly included in popular Emacs distributions. So a lot of people already have that. If you're using Emacs 29 or greater, you have Eglot, which is a lighter-weight alternative to LSP mode. It's just another LSP client. When I wrote the Semgrep language server, Emacs 29 hadn't come out yet. So I'm not going to talk too much about Eglot because I did everything in LSP mode, but I would imagine a lot of this stuff is very similar. And here's a list of some supported languages.
Now let's get into the technical part. How does LSP actually work? Let's go over how it communicates first. It uses JSON-RPC, which is kind of like HTTP, but instead of sending plain text, you're sending JSON. So it's just sending JSON back and forth. It's great because it's a way for two programs to communicate without sharing a common programming language. It's transport and platform agnostic, so it could be standard in and standard out, sockets, whatever. It's just JSON. You can send it over whatever. There are two different types of messages: a request, which requires a response from the other party, and a notification, which does not expect a response.
So just a quick little example. A user might open a document, and then the editor will send a textDocument/didOpen notification, saying which document it was, to the language server. Then they'll change it. Maybe they edit some code and introduce a syntax error.
The changes will be sent to the language server, and then the language server will publish diagnostics, which are those red squigglies I was talking about earlier, and say, hey, syntax error or whatever here. Or maybe the user says, I want to go to the definition of this function, and then the language server will spit back, hey, this is where that function lives. All very useful, and the communication is relatively simple, which is great. This is what a request looks like. Notifications look somewhat similar.
So now we know how LSP communication works, but how does the actual protocol work? Well, almost all of the protocol is opt-in, meaning you don't have to support the entire specification. You can just pick and choose. Servers and clients will communicate which parts of the protocol they each support, so they'll both say, hey, we support being notified when a user opens a document, or when they're looking for documentation. Once they agree on what they both support, they'll send those notifications and requests back and forth. Things like opening and closing files, diagnostics, code completion, hovering over stuff, type signatures, all of that.
And what's cool is even though the specification is huge and probably has everything you need, you can go ahead and add custom capabilities if you really want to. You can just define a custom method, and now that works for you, and you can have that in all your editors. For example, Rust Analyzer has structural search and replace, which is like find and replace, but with respect to the structure of the code. If you choose to go down this route with custom capabilities, you do have to remember you're going to have to implement it in every client. That's a little bit more work, but it's better than where we were without LSP.
So some quick tips on writing a language server.
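Before those tips, here is a minimal sketch of the messages described above, written in Python purely for illustration (the server in this talk is OCaml). It shows the one structural difference between a request and a notification, which is the `id` field, plus the `Content-Length` framing that LSP puts in front of each JSON body. The file URI, the file contents, and the custom method name `example/scanProject` are made up for the example.

```python
import json


def make_request(msg_id, method, params):
    """A request carries an id; the receiver must reply with the same id."""
    return {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}


def make_notification(method, params):
    """A notification has no id, so no reply is expected."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def expects_response(message):
    """Only messages that carry an id need an answer."""
    return "id" in message


def frame(message):
    """Encode one message as LSP sends it over stdio or a socket:
    a Content-Length header, a blank line, then the UTF-8 JSON body."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def unframe(data):
    """Decode one complete frame (assumes Content-Length is the only header)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.decode("ascii").split(":", 1)[1].strip())
    return json.loads(rest[:length].decode("utf-8"))


# Made-up file URI and contents, for illustration only.
did_open = make_notification("textDocument/didOpen", {
    "textDocument": {"uri": "file:///tmp/demo.py", "languageId": "python",
                     "version": 1, "text": "print('hi')\n"},
})
goto_definition = make_request(1, "textDocument/definition", {
    "textDocument": {"uri": "file:///tmp/demo.py"},
    "position": {"line": 0, "character": 0},
})
# Hypothetical custom method: every client has to be taught about it.
scan_project = make_notification("example/scanProject", {})

# Show what actually goes over the wire for the request.
print(frame(goto_definition).decode("utf-8"))
```

Because the framing is just bytes, the same functions would work over standard in and standard out or over a socket, which is the transport-agnostic part.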
I'm not going to get too deep into writing the server itself, because it's very application specific. I wrote Semgrep's in OCaml since our code base was almost all OCaml already, and I wanted to leverage that. Would not recommend unless you also have a code base all in OCaml. The structure is similar to a REST server, so a bunch of independent endpoints. I would do everything functionally if I were you. This is EmacsConf. We're all hopefully used to writing functional Lisp.
I would recommend TypeScript or Rust, though, depending on the level of performance you really need, or, ideally, whatever language you're trying to support. Most languages have some sort of Language Server Protocol support already. But if they don't, then it might be easier to do it in that language. TypeScript has a lot of support, a lot of documentation, a lot of examples out there, because it's what Microsoft originally intended the Language Server Protocol for: VS Code, which is written in TypeScript. Rust is going to take more effort, but it's very fast, and Rust Analyzer has a great library that they use and support. So the support and examples there are great.
The hard part is not really the Language Server Protocol, but the actual logic. If you're doing language tooling, you're going to have to do analysis on the code, so you need to do parsing, possibly compiling, all these different advanced things. For example, Rust Analyzer will do incremental compilation, which is really, really cool, but that's a whole separate talk. If you're adapting an existing language tool, this stuff is really easy. You're basically just wiring stuff up.
So, now we know all about LSP and language servers. Say you want to actually add support for a language server in Emacs. How do you do that? Well, let's look at LSP mode, because, like I said, this is what I'm most familiar with. I'm sure Eglot is pretty similar.
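Before switching to the client side, here is a rough sketch of the "bunch of independent endpoints" shape mentioned above, again in Python for illustration rather than OCaml. Each handler is a pure function from the current state and the message params to a new state, a result, and any outgoing notifications, and a small dispatcher routes on the method name. The `eval(` check is a made-up stand-in for real analysis, and names like `toy-linter` are hypothetical. This is not how the Semgrep server is actually structured.

```python
def find_problems(text):
    """Toy analysis (hypothetical rule): flag every line that calls eval()."""
    problems = []
    for line_no, line in enumerate(text.splitlines()):
        col = line.find("eval(")
        if col != -1:
            problems.append({
                "range": {"start": {"line": line_no, "character": col},
                          "end": {"line": line_no, "character": col + len("eval(")}},
                "severity": 1,  # 1 means Error in the LSP specification
                "source": "toy-linter",
                "message": "avoid eval on untrusted input",
            })
    return problems


def handle_initialize(state, params):
    # Advertise only what this server supports; 1 means full-document sync.
    return state, {"capabilities": {"textDocumentSync": 1}}, []


def handle_did_open(state, params):
    doc = params["textDocument"]
    new_state = {**state, doc["uri"]: doc["text"]}  # no mutation of old state
    publish = {"jsonrpc": "2.0", "method": "textDocument/publishDiagnostics",
               "params": {"uri": doc["uri"],
                          "diagnostics": find_problems(doc["text"])}}
    return new_state, None, [publish]


# One entry per endpoint, much like routes in a web server.
HANDLERS = {
    "initialize": handle_initialize,
    "textDocument/didOpen": handle_did_open,
}


def dispatch(state, message):
    """Route one incoming message; return (new_state, messages_to_send)."""
    is_request = "id" in message
    handler = HANDLERS.get(message["method"])
    if handler is None:
        if is_request:  # -32601 is JSON-RPC's "Method not found"
            return state, [{"jsonrpc": "2.0", "id": message["id"],
                            "error": {"code": -32601,
                                      "message": "Method not found"}}]
        return state, []  # unknown notifications are simply dropped
    new_state, result, outgoing = handler(state, message.get("params", {}))
    if is_request:
        reply = {"jsonrpc": "2.0", "id": message["id"], "result": result}
        outgoing = [reply] + outgoing
    return new_state, outgoing


state, replies = dispatch({}, {"jsonrpc": "2.0", "id": 1,
                               "method": "initialize", "params": {}})
print(replies)
```

Keeping the handlers free of side effects means each endpoint can be tested by calling it with a dictionary, with no editor involved, which is the "one set of tests" benefit described earlier.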
So, LSP mode's repository is on GitHub, like everything, and it has a ton of different clients for a ton of different languages and frameworks and tools, like Semgrep, and these are available to anyone who installs LSP mode. Alternatively, you can make a separate package and just use LSP mode as a library, but I'm not going to focus on this, because there's already a ton of resources out there on packaging in Emacs. So, our steps, very quickly, are going to look like this: add an Emacs Lisp file that contains some logic, add an entry to the list of clients so LSP mode knows about our new client, and then do some documentation, because documentation's great.
First, creating a client. In the clients folder in LSP mode, literally just add a file named lsp- plus whatever your tool is, require the library, and register a client. Registering a client just means saying what kind of connection it is. It's most likely going to be standard IO, because that's pretty easy to implement, and then you just pass it the executable that you actually want to run. Say what the activation function is, which is when the client should start. You can specify the language or the major mode or whatever, and now your client will start whenever that's triggered. Then provide a server ID, so that it's easy to keep track of, and run this lsp-consistency-check function. This just makes sure everything up there is good. You can do more advanced stuff when making an LSP client that I'm not going to get into, but just know that these aren't your only options. And then finally, provide your client.
Next, you just have to add your client to the list of clients that LSP mode supports, and now you've added support for a whole new language, whole new framework, whole new tool to Emacs, and it's taken you, what is that, 20 lines of Lisp? No, not even, like, 15. 15 lines of Lisp, whole new language for Emacs. It's really exciting.
Now that you have your client, let's do some documentation. Go fill out the name and where the repository, the source code, is, because free software is great, and you should open source your stuff. Specify the installation command. What's cool about this is that it can be run automatically from Emacs. So if it's, like, pip install pyright, you can put that there, and Emacs will ask, do you want to install the language server, and users can hit yes and just have it installed for them. Then you can say whether or not it's a debugger. This is completely separate: there's this thing called DAP, the Debug Adapter Protocol, and it's similar to LSP but for debuggers, which is very cool. And then finally, link to your documentation. Please, please document your stuff.
If you want to add a custom Emacs function or custom capabilities, it's super easy. It's literally just calling a normal Emacs function. For example, Semgrep normally only scans files when you open them, but we added an Emacs function that will scan your entire project, and that was just a client notification. It was just lsp-notify and then a custom method, and it's great because now you can scan your project from a simple Emacs function. Requests are very similar to notifications. You send one, pass it a lambda, and do something with the result. So that's adding custom capabilities, and that's pretty much it, so thank you for listening.
Some resources here. These links are clickable if you get the PDF, if you get the slides. Semgrep: we're hiring. If you want to work on programming language theory stuff, compilers, parsers, editors, email me or go look at our jobs. The LSP specification, this is, like, the holy Bible. It has all the specs, all the types, everything. LSP mode and the docs. LSP mode, right, that's where you want to add your client. The docs are great, super useful.
Rust Analyzer is just a great reference for language servers in general, if you want to write one or if you just want to see how they work. It's all really well done. It's great code, very readable. And then down here is a longer video tutorial, not by me, by someone else, on how to add a language client to Emacs, but hopefully this is sufficient for y'all. And now it's time for some Q&A.