All talks: https://emacsconf.org/2024/talks/
Transducers: finally, ergonomic data processing for Emacs!
https://emacsconf.org/2024/talks/transducers - Colin Woodbury - Track: General
Watch/participate: https://emacsconf.org/2024/watch/gen/
Q&A room: https://media.emacsconf.org/2024/current/bbb-transducers.html
IRC: https://chat.emacsconf.org/#/connect?join=emacsconf,emacsconf-gen or #emacsconf-gen on libera.chat network
Guidelines for conduct: https://emacsconf.org/conduct
See end of file for license (CC Attribution-ShareAlike 4.0 + GPLv3 or later)
----------------------------------------------------------------
Notes, discussions, links, feedback:
- <lounge-081> What made transducers click for me way back when was realizing that the usual operations (think map/reduce/filter/etc and their generalizations) implicitly or explicitly demand realized collections, plus intermediate ones between those operations (and GC), instead of composing such transformations into one operation and then making this applicable to streams of unknown-sizes and such.
- <Ez3> Sorry but map is collect and filter is select for me :)
- <lounge-081> @robin: there are many ways to get to them (some may already think of those functions as folds, etc.), but for the bulk of people familiar with map/reduce/filter/etc, it's useful to enter the thinking realizing that you may have infinite streams (of any type even) on input, and may not want to have realized collections at every step of the usual function applications, thus the need to compose the things that make up a map/reduce/filter/etc before applying them one by one on a continued stream of things one at a time
- <NullNix> ellis: I wrote about half of one in binutils libctf (generators, anyway). See the *_next() functions. Having seen this I'll try to mutate the internals to something closer, right now every iterator is independently implemented (mostly).
- <NullNix> (inspired by generators, not transducers, since I didn't know about them)
- <NullNix> still *far* less ugly than "call a function with every item" *_iter() stuff
- <lounge-081> Thanks for the answers fosskers, I'm quite hopeful with transducers working their way into many things, so thinking ahead to where that may happen and to solving perf hurdles
- <ElephantErgo> I'm totally sold. I'm working on a CL codebase right now, and these are going in there immediately
- <robin> (also CL does not require TCO but good compilers support it to the extent that extensive use of FP is practical)
- <ellis> it's a tricky protocol to get perfect the first time for sure, is something that transcends language barriers so always fun to see more impls :)
- <robin> CLTL2 docs on SERIES for those who are curious http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/clm/node347.html#SECTION003400000000000000000
- <robin> definitely watching this one more carefully. if it's CLOS-oriented i'm going to like it
- <robin> note that full TCO is strictly more expressive than special-case optimizations as with emacs's cl-labels
----------------------------------------------------------------
Questions and answers go here:
- Q: When I tried comparing transducers.el to cl-lib and dash (benchmark-compiled), I got the following results:
cl-lib: 0.5 sec, 2 GCs
dash: 0.3 sec, 1 GC,
transducers: 0.75 sec, 2 GC
cl-loop: 0.025 sec, 0 GC (note: 0.025, one order-of-magnitude faster)
I expected transducers to be slower than cl-loop but faster than the cl-lib or dash. However this isn't the case. Any idea why? (benchmark is here: https://old.reddit.com/r/emacs/comments/1h5c778/which_emacsconf_2024_talks_have_your_attention/m061dge/)
- (benchmark-run-compiled 1000 (cl-loop for n from 1 below 2000 by 2 sum (* n n) into total finally return total))
- A: Loop is always going to win in cases like this where we are not doing two nested (e.g.) change calls, what I called comp steps. tranducers shines while we need to do things which chain things together; we can sometimes express ourselves more clearly vs loop. this may sound sounds like moving the goal-posts: it's really about internal function calls, avoiding stepping though each loop in ways which loop doesn't need to do, so loop might "win".
- Note: I'm comparing against cl-lib and dash -- the cl-loop is only for reference. I'm curious about the performance gap between transducers and cl-lib/dash. The number of function calls is the same for cl-lib and dash and transducers.
- Q: Did you think about generators as a source cf lists, vectors, etc? Maybe I got a word wrong. After the development that generators and Series operations needed-each-other, not being redundant as had been thought. I forgot who the generators person was in lisp.
- Q: Do you know of any theoretical texts on transducers?
- Q: Waters (lazy series in lisp, late 70s) said that this *should have* been done as an additional compiler feature in compilers, but if not it must be a macro package. Did you think about that viz your cl, fennel, elisp, porting of your transducers?
- A: I think my work could help provide the basis for this; there's much more to be done.
- Q: Does t-buffer-read provide a lazy stream that's linewise, or charwise, or do something else entirely?
- Q: Can the Elisp library be combined with the stream.el API (https://elpa.gnu.org/packages/stream.html)? Or seq in general?
- A: I'd say these libraries are completely orthogonal. (Re: what made me ask this question was seeing `t-buffer-read' and hoping that this doesn't convert a buffer into a string) With seq/generics it should just work: magic of defgeneric
- Q: How does one debug a t-comp expression? Can you single step and see intermediate results of the different statements you declare?
- A: In Common Lisp it is wonderful. In Emacs Lisp it is more complicated. I don't know if step debugging would work, but there is a "log" (?) function to print out values.
- Q: Is there a path for transducers to enable elisp processing of otherwise overly large datasets as if just normal Emacs "buffers" (i.e. just pulling one thing at a time so essentially stream-like under the hood but buffer-like in interface), with none of the usual perf issues with a traditional buffer structure?
- Q: Re the performance issues mentioned and that popped up recently in the lists/forums, to what extend does tailcall optimization and other mechanisms (incl. inlining, GC-friendliness, etc.) could eventually alleviate issues and enable their use at little to no extra cost?
- A: Over time, with further work and optimizations. Some already there (taillcall notably)
- Q: Is there an option to read a csv/json and produce an alist or plist instead of a hash table for an entry?
- Q: Is the common lisp version ready for 'production' use? Is it complete enough and the API stable enough?
- A: I use it all the time. I use it extensively. Programming a game, not realizing a dent in the frame rate.
- Q: Do we need a pre-written "t-" version for every already existing reducing function like + or is there a function to construct them from already defined reducer 2-arg functions?
- Q: Is the compelling argument for transducers is that it's a better abstraction? Seems like a lot of concerns/objections (while pragmatically valid) are focused on implementation. Can this abstraction allow for advances in implementation?
- Time for the closing remarks, but people can stay on the pad or join the BBB and keep going
----------------------------------------------------------------
Next talks:
Questions/comments related to EmacsConf 2024 as a whole? https://pad.emacsconf.org/2024
----------------------------------------------------------------
This pad will be archived at https://emacsconf.org/2024/talks/transducers after the conference.
Except where otherwise noted, the material on the EmacsConf pad are dual-licensed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International Public License; and the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) an later version. Copies of these two licenses are included in the EmacsConf wiki repository, in the COPYING.GPL and COPYING.CC-BY-SA files (https://emacsconf.org/COPYING/)
By contributing to this pad, you agree to make your contributions available under the above licenses. You are also promising that you are the author of your changes, or that you copied them from a work in the public domain or a work released under a free license that is compatible with the above two licenses. DO NOT SUBMIT COPYRIGHTED WORK WITHOUT PERMISSION.