Building a Browser-Based Video Editor in Vanilla JS

I made a video editor over the long weekend and wanted to share my experience of diving into JavaScript in 2021.

The code can be found here: https://github.com/bwasti/mebm and a demo can be found here.

Here's a quick rundown of how and why I built it.

Why:

How:

Turns out modern browsers (Safari included) have a shocking number of features. After browsing around for a bit, I conclude that this whole project is basically just going to be patching together the impressive work of browser developers. I can probably get it done in a weekend.

I start in familiar territory: a window.requestAnimationFrame(loop) where I paint a video element to a canvas. I hide the video element with a low z-index, a large offset and overflow: hidden. A user can move a cursor around ("scrub") to control the time of the video (which I easily set with video.currentTime = t). Perfect.

My sample hardcoded .mov works well out of the box. If the user scrubs, I pause the video, set a different time, and then resume the video. But then I try to get fancy and reactive. My Player class has an attribute time, so I should be able to just set player.time to whatever, whenever, wherever in the code and the video will JUMP to that time. That would certainly make life a lot easier. So in the requestAnimationFrame loop (which conveniently gives the callback a time parameter) I set video.currentTime = this.time.
Now the user interaction (and anything else) could just set player.time and immediately see the video at the right time. Great.
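
A minimal sketch of that reactive loop, not the actual mebm code. The Player class name and its time attribute come from the description above; the constructor arguments, tick() and start() are names I'm assuming for illustration.

```javascript
// Sketch of the "reactive" player: anything in the app sets player.time,
// and the animation loop pushes that value into the video on every frame.
class Player {
  constructor(video, ctx) {
    this.video = video; // an HTMLVideoElement (hidden off-screen)
    this.ctx = ctx;     // a CanvasRenderingContext2D to paint into
    this.time = 0;      // desired playback position, in seconds
  }

  // One step of the loop: seek the video, then paint its current frame.
  tick() {
    this.video.currentTime = this.time;
    this.ctx.drawImage(this.video, 0, 0);
  }

  // Browser-only: run tick() once per animation frame.
  start() {
    const loop = () => {
      this.tick();
      window.requestAnimationFrame(loop);
    };
    window.requestAnimationFrame(loop);
  }
}

// Hypothetical usage in a page with a <video> and a <canvas>:
//   const canvas = document.querySelector('canvas');
//   const player = new Player(document.querySelector('video'),
//                             canvas.getContext('2d'));
//   player.start();
//   player.time = 3.2; // the canvas now shows the frame at 3.2s
```

Keeping the seek inside the loop is what makes player.time the single source of truth: scrubbing code never has to touch the video element directly.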

Then I tried an mp4 and ... crap. I could scrub to certain times really easily, but ran into jittery messes every so often. It seemed to be every 2 seconds that it would freeze. Turns out that rapidly setting video.currentTime is effectively useless with mp4s. This is because the browser doesn't cache the partial decoding of mp4s, so a seek would occasionally land on a frame that can only be decoded by first working through many of the frames before it. Slow.

So I came up with a dumb idea. Decode every frame up front and save them into an array of ImageData objects. I initially thought that there's no way a browser could

  1. decode quickly and
  2. store all these massive image files for a full video.

I was wrong! The little % that shows when you drag a video in is actually the decode process, and it seems that browsers are given plenty of RAM, so it's quite easy to work with thousands of images.
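
To make the idea concrete, here's a rough sketch of the pieces involved. The function names (captureFrame, frameIndexForTime, estimateBytes) and the fixed fps parameter are my assumptions for illustration; only drawImage and getImageData are real canvas APIs.

```javascript
// Grab the frame the video is currently showing as an ImageData object.
// Assumes the video has already been seeked to the frame we want.
function captureFrame(ctx, video, width, height) {
  ctx.drawImage(video, 0, 0, width, height);
  return ctx.getImageData(0, 0, width, height);
}

// Map a playback time (seconds) to an index into the decoded frame array,
// clamped so scrubbing past either end still returns a valid frame.
function frameIndexForTime(time, fps, frameCount) {
  const index = Math.floor(time * fps);
  return Math.min(frameCount - 1, Math.max(0, index));
}

// ImageData stores 4 bytes (RGBA) per pixel, so memory use is easy to estimate.
function estimateBytes(width, height, frameCount) {
  return width * height * 4 * frameCount;
}

// A 10 second 720p clip at 30fps is 300 frames:
console.log(estimateBytes(1280, 720, 300)); // 1105920000 bytes, about 1.1 GB
```

Once the frames are in an array, scrubbing is just an array lookup followed by ctx.putImageData, so there's no seeking at all while the user drags the cursor.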

Cool, videos now play and I can scrub around easily. Audio doesn't work, but oh well, I'm just making gifs.

So what do I actually want in a meme? Just pasting text and pictures, right? I quickly realize that a bunch of stationary images and text on video is boring. REALLY boring.

So I decide to add key-frame animation. If you do anything to an image or text like scale it up, move it around, etc. (jk there is no "etc", you can only do that), I want to capture the moment in video time (checking that trusty player.time attribute) and record it to a set of frames associated with the image or text and label it a "key" frame. Then, as you play the video, the image or text will slowly morph into the next "key" frame. This is exactly the same behavior as CSS keyframes.

I eventually find the easiest way to implement this is about as dumb as the video implementation. Record a datapoint for every frame in the video and then at each time step look up the datapoint and transform the image or text accordingly (scale + x + y). Since we can easily store and draw full images at like 60fps I figure this won't hurt. However, it does mean I need to keep every datapoint updated with every user interaction. That's not exactly ideal because the user might have a transform at the beginning of the video and one transform at the end, so I'd need to update every single point in-between. Seems dumb.

But dumb is best, so I do exactly that and call it a day.
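
Roughly, the "datapoint for every frame" approach could look like the sketch below. I'm assuming linear interpolation between key frames and a { frame, x, y, scale } shape for each key frame; fillDatapoints and lerp are made-up names, not the ones in the repo.

```javascript
// Linear interpolation between a and b, with t in [0, 1].
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Expand a handful of key frames into one datapoint per video frame.
// Frames before the first key frame hold its values, frames after the
// last key frame hold that one's values, and everything in between is
// linearly interpolated.
function fillDatapoints(keyframes, frameCount) {
  const keys = [...keyframes].sort((a, b) => a.frame - b.frame);
  const points = new Array(frameCount).fill(null);
  if (keys.length === 0) return points;

  let k = 0; // index of the latest key frame at or before frame f
  for (let f = 0; f < frameCount; f++) {
    while (k + 1 < keys.length && keys[k + 1].frame <= f) k++;
    const cur = keys[k];
    const next = keys[k + 1];
    if (f <= cur.frame || !next) {
      points[f] = { x: cur.x, y: cur.y, scale: cur.scale };
    } else {
      const t = (f - cur.frame) / (next.frame - cur.frame);
      points[f] = {
        x: lerp(cur.x, next.x, t),
        y: lerp(cur.y, next.y, t),
        scale: lerp(cur.scale, next.scale, t),
      };
    }
  }
  return points;
}

// Two key frames: one at the start, one at frame 10.
const points = fillDatapoints(
  [{ frame: 0, x: 0, y: 0, scale: 1 }, { frame: 10, x: 100, y: 50, scale: 2 }],
  11
);
console.log(points[5]); // { x: 50, y: 25, scale: 1.5 }
```

The cost is that every user edit means calling something like fillDatapoints again for the whole clip, which is the "update every single point in-between" tradeoff described above. In exchange, the draw loop only ever does points[frameIndex].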

Now on to importing and exporting! StackOverflow to the rescue. There's an incredible MediaRecorder API in modern browsers that makes exporting super easy. You just record the canvas as if it's a webcam and then write the output to a file. I wasn't super happy with a popup blocker ruining the download, so I instead inject a link (as is done in the SO post) and make the user click "download" afterward. Uploading is also straightforward using the FileReader and file drag and drop API. (Not to be confused with the HTML drag and drop API. Which I do. Repeatedly.) And voilà, the editor is usable!
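
For reference, the export flow can be sketched like this. It's an approximation under my own assumptions (the recordCanvas and makeDownloadLink names, the fixed recording duration, the default fps), not the code from the SO post or the repo. canvas.captureStream, MediaRecorder, Blob and URL.createObjectURL are the real browser APIs doing the work.

```javascript
// Record whatever is drawn to the canvas for durationMs milliseconds
// and resolve with a Blob containing the encoded video.
function recordCanvas(canvas, durationMs, fps = 30) {
  return new Promise((resolve, reject) => {
    const stream = canvas.captureStream(fps); // the canvas as a "webcam"
    const recorder = new MediaRecorder(stream); // browser picks the format
    const chunks = [];

    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) chunks.push(event.data);
    };
    recorder.onstop = () => {
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.onerror = (event) => reject(event.error || event);

    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}

// Build a link the user clicks themselves, which sidesteps popup blockers.
function makeDownloadLink(blob, filename, doc = document) {
  const link = doc.createElement('a');
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.textContent = 'download';
  return link;
}

// Hypothetical usage in a page:
//   const blob = await recordCanvas(document.querySelector('canvas'), 5000);
//   document.body.appendChild(makeDownloadLink(blob, 'meme.webm'));
```

The container format depends on the browser, so the 'meme.webm' filename in the usage comment is a guess; checking blob.type before picking an extension would be safer.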

With a functional edit and export flow I'm pretty happy with the results. Time to start making memes!

Thanks for reading :^)