Editing captions

I want to quickly demonstrate how I reflow and edit captions. These recordings don't have sound because I forgot to capture the system audio. Watching me type probably won't be all that interesting, but they're there anyway. =)

Reflowing the text

First, let's start with reflowing. We like each caption to be a single line of about 50 characters so that it displays nicely in the stream. You probably don't need to do this step if you're working with the VTT files in the backstage area, since I try to reflow things before people edit them, but I thought I'd demonstrate it in case people are curious.

I start with the text file that OpenAI Whisper generates. I set my fill-column to 50 and use display-fill-column-indicator-mode to give myself a goal column. A little over is fine too. Then I use emacsconf-reflow from the emacsconf-el repository to quickly split up the text into captions: I look for where I want to add a newline and then type the word or words at that spot. I type ' to join lines. Sometimes, if it splits at the wrong occurrence of a word, I just undo it and edit it normally.
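If you're not using Emacs, here's a rough sketch of the same idea in Python. This is not what emacsconf-reflow does: it's just a greedy word wrap with the standard textwrap module, plus a check for lines that run well past the 50-character goal. The function names and the slack of 5 characters are my own assumptions for illustration. You would still want to move the breaks by hand so that captions split at natural pauses.

```python
import textwrap

TARGET_WIDTH = 50  # same goal column as the fill-column above


def rough_reflow(text, width=TARGET_WIDTH):
    """Greedy first pass: wrap text into lines of at most `width` characters."""
    # Collapse existing newlines and repeated spaces before wrapping.
    return textwrap.wrap(" ".join(text.split()), width=width)


def overlong_lines(lines, width=TARGET_WIDTH, slack=5):
    """Return (line_number, length) for lines longer than width + slack.

    `slack` is a made-up tolerance, since a little over the goal is fine.
    """
    return [
        (number, len(line))
        for number, line in enumerate(lines, start=1)
        if len(line) > width + slack
    ]
```

For example, rough_reflow(open("talk.txt").read()) would give a list of lines to start from (talk.txt is a hypothetical filename), and overlong_lines can flag lines that still need splitting after manual edits.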

It took me about 4 minutes to reflow John Wiegley's 5-minute presentation.

Alignment

The next step is to align it with aeneas. This takes each line of text and figures out the start and end timestamps for it. I've just added subed-align to the subed package, but you can also call aeneas manually by following its instructions.
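For calling aeneas manually, this is a minimal sketch of building the command line from Python. It assumes aeneas is installed in the same Python environment, and it follows the execute_task invocation from the aeneas documentation. The filenames and the helper name are hypothetical, and I'm not claiming this is exactly how subed-align calls it.

```python
import sys


def aeneas_command(audio_path, text_path, output_path, language="eng"):
    """Build an argument list for aeneas's execute_task tool.

    The text file is treated as plain text (one caption per line)
    and the output is written as VTT.
    """
    config = (
        f"task_language={language}"
        "|is_text_type=plain"
        "|os_task_file_format=vtt"
    )
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path, text_path, config, output_path,
    ]


# To actually run it (needs aeneas and its dependencies installed):
#   import subprocess
#   subprocess.run(aeneas_command("talk.opus", "talk.txt", "talk.vtt"), check=True)
```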

Editing the VTT

The last step is to edit these subtitles. VTT files are plain text, so you can edit them with regular text-mode if you want to. I like to use subed because then the video playback is synchronized with my editing, which makes it easier to figure out technical words. subed tries to load the video based on the filename, but if it can't find it, I can use C-c C-v (subed-mpv-find-video) to play a video file or C-c C-u to play a video at a URL.
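Since VTT files are plain text, they're also easy to read from a script. Here's a minimal sketch of pulling the cues out of a VTT file in Python. It only handles simple files (a timing line followed by text lines, with blank lines between cues), and the sample captions are made up for illustration.

```python
import re

# A timing line looks like "00:00:03.200 --> 00:00:06.500"; hours are optional.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def parse_cues(vtt_text):
    """Return a list of (start, end, text) tuples from simple VTT content."""
    cues = []
    current = None
    for line in vtt_text.splitlines():
        match = TIMING.match(line)
        if match:
            current = {"start": match.group(1), "end": match.group(2), "text": []}
            cues.append(current)
        elif not line.strip():
            current = None  # a blank line ends the current cue
        elif current is not None:
            current["text"].append(line)
        # Anything else (the WEBVTT header, cue identifiers) is skipped.
    return [(c["start"], c["end"], "\n".join(c["text"])) for c in cues]


SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:03.200
Hello, and welcome to this talk.

00:00:03.200 --> 00:00:06.500
Today I want to talk about Emacs.
"""

for start, end, text in parse_cues(SAMPLE):
    print(start, end, text)
```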

I look for misrecognized words and use C-s (isearch-forward) to jump to them. I also like to rewrite keybindings so that they follow Emacs conventions. I sometimes spell out acronyms on first use or add extra information in brackets. The captions will be used in a transcript as well, so I like to add punctuation, remove some filler words, and try to make it read better.

Sometimes I want to tweak how the captions are split. I use M-j (subed-jump-to-current-subtitle) to jump to the caption if I'm not already on it, listen for the right spot, and maybe use M-SPC to toggle playback. I use M-. (subed-split-subtitle) to split a caption at the current MPV playing position and M-m (subed-merge-with-next) to merge a subtitle with the next one. Times don't need to be very precise.
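To show what splitting and merging do to the timestamps, here's a simplified sketch in Python. It is not subed's implementation: it represents a caption as a (start_ms, end_ms, text) tuple, ignores any spacing that an editor might leave between captions, and uses function names I made up for illustration.

```python
def split_cue(cue, at_ms, text_index):
    """Split one caption into two at a playback position.

    at_ms is the playback position in milliseconds, and text_index
    is where the text should be divided.
    """
    start_ms, end_ms, text = cue
    if not start_ms < at_ms < end_ms:
        raise ValueError("split position must fall inside the caption")
    first = (start_ms, at_ms, text[:text_index].rstrip())
    second = (at_ms, end_ms, text[text_index:].lstrip())
    return first, second


def merge_cues(first, second):
    """Merge a caption with the next one, keeping the outer timestamps."""
    return (first[0], second[1], f"{first[2]} {second[2]}")


cue = (0, 4000, "I use subed to edit captions in Emacs")
first, second = split_cue(cue, 1500, cue[2].index(" captions"))
print(first)
print(second)
print(merge_cues(first, second))
```

Since times don't need to be very precise, a split position that is off by a fraction of a second is usually fine.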

It took me about 5 minutes to edit the 5-minute talk.

It usually takes me between 1x and 4x the video time to edit captions. I don't usually listen to everything all the way through, though, so there are probably still a few errors. I e-mail captions to the speakers for review to help catch them. So that's how I edit the captions for EmacsConf. Hope that helps!

sacha@sachachua.com