All talks: https://emacsconf.org/2024/talks/
p-search: a local search engine in Emacs
https://emacsconf.org/2024/talks/p-search - Zac Romero - Track: Development
Watch/participate: https://emacsconf.org/2024/watch/dev/
Q&A room: https://media.emacsconf.org/2024/current/bbb-p-search.html
IRC: https://chat.emacsconf.org/#/connect?join=emacsconf,emacsconf-dev or #emacsconf-dev on libera.chat network
Guidelines for conduct: https://emacsconf.org/conduct
See end of file for license (CC Attribution-ShareAlike 4.0 + GPLv3 or later)
----------------------------------------------------------------
Notes, discussions, links, feedback:
- I like the dedicated-buffer interface (I'm assuming using magit-section and transient).
- <meain> Very interesting ideas. I was very happy when I was able to do simple filters with orderless, but this is great [11:46]
- <NullNix> I dunno about you, but I want to start using p-search yesterday. (possibly integrating lsp-based tokens somehow...) [11:44]
- <codeasone> Awesome job Ryota, thank you for sharing!
----------------------------------------------------------------
Questions and answers go here:
- Q: Do you think a reduced version of this functionality could be integrated into isearch? Right now you can turn on various flags when using isearch with M-s <key>, like M-s SPC to match spaces literally. Is it possible to add a flag to "search the buffer semantically"? (Ditto with M-x occur, which is more similar to your buffer-oriented results interface)
- A: It's essentially a framework, so you would create a generator for it; but that does not exist yet.
- Q: Any idea how this would work with personal information like Zettelkastens?
- A: Usable as is, because all the files are in a directory, so you only have to set which files to search. You can then add information to ignore some files (like daily notes). Documentation is coming.
- Q: How good does the search work for synonyms especially if you use different languages?
- A: There is an entire field of search devoted to normalizing the input words (like plural -> singular transformation). Currently p-search does not address this.
- A: for different languages it gets complicated (vector search possible, but might be too slow in Elisp).
- Q: When searching by author I know authors may setup a new machine and not put the exact same information. Is this doing anything to combine those into one author?
- A: Currently using the git command. So if you know the emails the author has used, you can add different priors.
- Q: There is a cool, more powerful grep called "Rak" that may have some good ideas for increasing the value of searches, for example using Raku code while searching. Rak is written in Raku. Have you seen it?
- Q: Have you thought about integrating results from using cosine similarity with a deep-learning based vector embedding? This will let us search for "fruit" and get back results that have "apple" or "grapes" in them -- that kind of thing. It will probably also handle the case of terms that could be abbreviated/formatted differently like in your initial example.
- A: Goes back to semantic search. Probably can be implemented, but also probably too slow. And it is hard to get the embeddings and the system running on the machine.
- Q: I missed the start of the talk, so apologies if this has been covered - is it possible to save/bookmark searches or search templates so they can be used again and again?
- A: Exactly. I just recently added bookmarking capabilities, so we can bookmark and rerun our searches from where we left off. I tried to create a one-to-one mapping between the search session and a data representation of it (there is a command to do this): you get a custom plist that can resume the search where we left off, and that can be used to create a command to trigger a prior search.
- Q: You mentioned candidate generators. Could you explain what the score is assigned to? Is it to a line or to whatever the candidate generator produces? How does it work with rg in your demo?
FOLLOW-UP: How does the git scoring thingy hook into this?
- A: A candidate generator produces documents. Documents have properties (like an id and a path). From that you get subproperties like the content of the document. Each candidate generator knows how to search in its kind of source (emails, buffers, files, urls, ...). There is only the notion of score + document.
- Then another method is used to extract the lines that match in the document (to show precisely the lines that match).
- Q: Hearing about this makes me think about how nice the emergent workflow with denote is, using easy filtering with orderless. It is really easy to search for file tags, titles etc. and do things with them. Did this or something like this help or influence the design of p-search?
- A: You can search for whatever you want. Nothing is hardcoded (files, directories, tags, titles...).
- Q: [comments from IRC] <NullNix> git covers the "multiple names" thing itself: see .mailmap 10:51:19
- <NullNix> this is a git feature, p-search shouldn't need to implement it 10:51:34
- <NullNix> To me this seems to have similarities to notmuch -- honestly I want notmuch with the p-search UI :) (of course, notmuch uses a xapian index, because repeatedly grepping all traffic on huge mailing lists would be insane.) 10:55:30
- <NullNix> (notmuch also has bookmark-like things as a core feature, but no real weighting like p-search does.) 10:56:07
- A: I have not used notmuch, but many extensions are possible. mu4e is using a full index for the search. This could be adapted here too, with the SQL database as a source.
- Q: You can search a buffer using ripgrep by feeding it in as stdin to the ripgrep process, can't you?
- A: Yes you can. But the aim is to search many different things in elisp. So there is a mechanism in p-search anyway to be able to represent anything, including buffers. This is working pretty well.
- Q: Thanks for making this lovely thing, I'm looking forward to trying it out. Seems modular and well thought out. Questions about integration and about the interface
- A: project.el is used to search only in the local files of the project (as done by default)
- Q: how happy are you with the interface?
- A: p-search goes over all the files trying to find the best ones. Many features can be added, e.g., to improve debuggability (is this highly ranked due to a bug? due to a high weight? many matching documents?)
- A: hopefully will be on ELPA at some point with proper documentation.
- Q: Remembering searches is not available everywhere (rg.el? but AI package like gptel already have it). Also useful for using the document in the future.
- A: Retrieval-augmented generation: p-search could be used for the search, combining it with an AI to fine-tune the search with a Q-A workflow. Although currently there is no API.
- (gptel author here: I'm looking forward to seeing if I can use gptel with p-search)
- A: As the results are surprisingly good, why is that not used anywhere else? But there is a lot of setup to get it right. You need something like emacs with many configuration options (transient is helping to do that) without scaring the users.
- Everyone uses emacs differently, so unclear how people will really use it. (PlasmaStrike) For example consult-omni (elfeed-tube, ...) searching multiple webpages at the same time, with orderless. However, no webpage offers this option. Somehow those tools stay in emacs only. (Corwin Brust) This is the strength of emacs: people invest a lot of time to improve their workflow from tomorrow. [see xkcd on emacs learning curve vs nano vs vim]
- https://github.com/armindarvish/consult-omni
- https://github.com/karthink/elfeed-tube
- https://www.reddit.com/r/ProgrammerHumor/comments/9d6f19/text_editor_learning_curves_fixed/
- A: emacs is not the most beginner friendly, but the solution space is very large
- (Corwin Brust) Emacs supports all approaches and is extensible. (PlasmaStrike) Youtube much larger, but somehow does not have this nice sane interface.
- Q: Do you think the Emacs being kinda slow will get in the way of being able to run a lot of scoring algorithms?
- A: The code currently is dumb in a lot of places (like going over all files to calculate a score), but surprisingly that is not that slow. Enumerating all files in the emacs repo and multiplying numbers in Elisp isn't really slow. But if you have to search in files, this will be slow without relying on ripgrep or another faster tool. Take for example the search in info files / elisp info files: the search in elisp is almost instant. For human-size documents, probably fast enough -- and if not, there is room for optimizations. For company-size documents (like repos), it could be too slow.
- Q: When do you have to make something more complicated to scale better?
- A: I do not know yet really. I try to automate tasks as much as possible, like in the emacs configuration meme "not doing work I have to do the configuration". Usually I do not add web-based things into emacs.
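The "score + document" model from the candidate-generator answer above can be sketched roughly as follows. This is a minimal illustration in Python rather than Elisp, and it is not p-search's code: the names `Document`, `text_prior`, `path_prior`, `score_documents` and `matching_lines` are hypothetical, and combining priors by multiplying their weights is an assumption based on the "multiplying numbers" remark in the performance answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Document:
    """A candidate with an id and properties (hypothetical model, not p-search's)."""
    doc_id: str
    path: str
    content: str


# A prior maps a document to a probability-like weight (assumed to be >= 0).
Prior = Callable[[Document], float]


def text_prior(term: str, hit: float = 0.7, miss: float = 0.3) -> Prior:
    """Favor documents whose content mentions `term`."""
    return lambda doc: hit if term in doc.content else miss


def path_prior(fragment: str, hit: float = 0.7, miss: float = 0.3) -> Prior:
    """Favor documents whose path contains `fragment`."""
    return lambda doc: hit if fragment in doc.path else miss


def score_documents(docs: List[Document],
                    priors: List[Prior]) -> List[Tuple[Document, float]]:
    """Multiply all prior weights per document, normalize, and sort best-first."""
    raw = []
    for doc in docs:
        weight = 1.0
        for prior in priors:
            weight *= prior(doc)
        raw.append((doc, weight))
    total = sum(weight for _, weight in raw)
    if total == 0:
        # Every document was ruled out; report zero scores instead of dividing by 0.
        return [(doc, 0.0) for doc, _ in raw]
    ranked = [(doc, weight / total) for doc, weight in raw]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def matching_lines(doc: Document, term: str) -> List[Tuple[int, str]]:
    """Separate step: extract the (line number, line) pairs to display."""
    return [(number, line)
            for number, line in enumerate(doc.content.splitlines(), start=1)
            if term in line]
```

Scoring happens per document; the matching lines are only extracted afterwards, for display, which mirrors the two steps described in the answer.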
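For the cosine-similarity question above ("fruit" finding "apple" or "grapes"), a minimal sketch of the ranking step, in Python. The three-dimensional vectors are hand-made toy values chosen for illustration, not output of any real embedding model, and `TOY_EMBEDDINGS` and `rank_by_similarity` are hypothetical names. Obtaining real embeddings is the hard part mentioned in the answer and is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, hand-made vectors (assumption): a real system would get these from a model.
TOY_EMBEDDINGS: Dict[str, Tuple[float, ...]] = {
    "fruit": (0.9, 0.1, 0.0),
    "apple": (0.8, 0.2, 0.1),
    "grapes": (0.85, 0.15, 0.05),
    "compiler": (0.0, 0.1, 0.95),
}


def rank_by_similarity(query: str,
                       embeddings: Dict[str, Tuple[float, ...]]
                       ) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `query`, best first."""
    query_vector = embeddings[query]
    scored = [(word, cosine_similarity(query_vector, vector))
              for word, vector in embeddings.items()
              if word != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With these toy vectors, `rank_by_similarity("fruit", TOY_EMBEDDINGS)` places "apple" and "grapes" ahead of "compiler", even though none of the words share any characters with the query.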
----------------------------------------------------------------
Next talks:
Questions/comments related to EmacsConf 2024 as a whole? https://pad.emacsconf.org/2024
----------------------------------------------------------------
This pad will be archived at https://emacsconf.org/2024/talks/p-search after the conference.
Except where otherwise noted, the material on the EmacsConf pad is dual-licensed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International Public License; and the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Copies of these two licenses are included in the EmacsConf wiki repository, in the COPYING.GPL and COPYING.CC-BY-SA files (https://emacsconf.org/COPYING/)
By contributing to this pad, you agree to make your contributions available under the above licenses. You are also promising that you are the author of your changes, or that you copied them from a work in the public domain or a work released under a free license that is compatible with the above two licenses. DO NOT SUBMIT COPYRIGHTED WORK WITHOUT PERMISSION.