Editing captions
I want to quickly demonstrate how I reflow and edit captions. I don't have sound on these recordings because I forgot to have it capture the system audio. Watching me type probably won't be all that interesting, but they're there anyway. =)
Reflowing the text
First, let's start with reflowing. We like to have one line of captions about 50 characters long so that they'll display nicely in the stream. You probably don't need to do this step if you're working with the VTT files in the backstage area, since I try to reflow things before people edit them, but I thought I'd demonstrate it in case people are curious.
I start with the text file that OpenAI Whisper generates. I set my
fill-column to 50 and use display-fill-column-indicator-mode to give
myself a goal column. A little over is fine too. Then I use
emacsconf-reflow from the emacsconf-el repository to quickly split up
the text into captions by looking for where I want to add newlines and
then typing the word or words. I type in ' to join lines. Sometimes,
if it splits at the wrong spot, I just undo it and edit it normally.
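If you want the same guide column, here is a minimal sketch of that setup. The command name my-caption-reflow-setup is made up for this example and is not part of emacsconf-el; fill-column and display-fill-column-indicator-mode are built into Emacs (the mode needs Emacs 27 or later).

```emacs-lisp
;; Hypothetical helper; the name is an assumption for this example.
(defun my-caption-reflow-setup ()
  "Show a guide at column 50 in the current buffer."
  (interactive)
  ;; Buffer-local, so other buffers keep their usual fill-column.
  (setq-local fill-column 50)
  ;; By default the indicator is drawn at `fill-column'.
  (display-fill-column-indicator-mode 1))
```

Run it with M-x my-caption-reflow-setup in the buffer that holds the Whisper text before starting to reflow.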
It took me about 4 minutes to reflow John Wiegley's 5-minute presentation.
Alignment
Editing the VTT
The last step is to edit these subtitles. VTT files are plain text, so
you can edit them with regular text-mode if you want to. I like to
use subed because then the video playback is synchronized with my
editing, which makes it easier to figure out technical words. subed
tries to load the video based on the filename, but if it can't find
it, I can use C-c C-v (subed-mpv-find-video) to play a video file
or C-c C-u to play a video at a URL.
I look for misrecognized words and use C-s (isearch-forward) to
jump to them. I also like to change things to follow Emacs keybinding
conventions. I sometimes spell out acronyms on first use or add extra
information in brackets. The captions will be used in a transcript as
well, so I like to add punctuation, remove some filler words, and try
to make it read better.
Sometimes I want to tweak how the captions are split. I use M-j (subed-jump-to-current-subtitle) to jump to the caption if I'm not already on it, listen for the right spot, and maybe use M-SPC to toggle playback. I use M-. (subed-split-subtitle) to split a caption at the current MPV playing position and M-m (subed-merge-with-next) to merge a subtitle with the next one. Times don't need to be very precise.
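To make splitting and merging a bit more concrete, here is a rough sketch of what those two operations do to a caption's times and text. It is not how subed implements them (subed works directly on the buffer text); the cue representation and the function names my-cue-split and my-cue-merge are assumptions made up for illustration.

```emacs-lisp
(require 'subr-x) ; provides `string-trim' on older Emacs versions

;; In this sketch a cue is a plain list: (START-MS STOP-MS TEXT).

(defun my-cue-split (cue at-ms at-char)
  "Split CUE at time AT-MS and text offset AT-CHAR.
Return a list of two cues that meet at AT-MS."
  (pcase-let ((`(,start ,stop ,text) cue))
    (list (list start at-ms (string-trim (substring text 0 at-char)))
          (list at-ms stop (string-trim (substring text at-char))))))

(defun my-cue-merge (first second)
  "Merge FIRST with SECOND, keeping the outer timestamps."
  (list (nth 0 first)
        (nth 1 second)
        (concat (nth 2 first) " " (nth 2 second))))
```

For example, (my-cue-split '(0 4000 "hello world today") 1500 11) returns two cues that meet at 1500 ms, and passing those two cues to my-cue-merge gives back the original cue.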
It took me about 5 minutes to edit the 5-minute talk.
It usually takes me between 1x and 4x the video time to edit captions, but I don't usually listen to everything all the way through, so there are probably still a few errors. I e-mail captions to the speakers for review to help catch those. So that's how I edit the captions for EmacsConf. Hope that helps!