Over-engineering bunnies.io 🐰

bunnies.io is a simple little site which serves hundreds of gigabytes of rabbit-related videos to bunny lovers every month. This blog post documents what makes the site work, from the software that serves API requests, to the addition and transcoding of the videos themselves.

Almost all the software described below is visible-source at the Bunnies GitHub Organisation 🐰, but be warned that most of it was written as a learning tool (and might therefore not follow “best practices” for its context).

I deliberately keep the site ad- and tracker-free, despite the costs of hosting it. The best way to support it is through Patreon.

Backend

The heart of the backend is a Dropwizard API layer, written in Kotlin. I wrote some fairly comprehensive docs for it which describe the endpoints. The most commonly used feature is to fetch the media for a random bunny, as shown here.

Endpoints (and their variants) exist as @Path entries in a rather large BunnyServiceV2 file. Most of the code in there is actually sanity-checking user input. There are also some goodies, like a binary search to fulfil requests for media of a specific aspect ratio.
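
To illustrate the aspect-ratio lookup: with resources kept sorted by ratio, a binary search can find the closest match in logarithmic time. This is a Python sketch with made-up IDs, not the actual Kotlin in BunnyServiceV2:

```python
import bisect

# Hypothetical data: (resource ID, aspect ratio), pre-sorted by ratio.
resources = [
    ("156", 0.5617977528089888),   # portrait bunny
    ("155", 1.7733990147783252),   # nearly 16:9
    ("042", 1.7777777777777777),   # exactly 16:9 (made-up entry)
]

def closest_by_aspect_ratio(target: float) -> str:
    """Binary-search the sorted ratios, then compare the two neighbours
    either side of the insertion point to find the nearest match."""
    ratios = [r for _, r in resources]
    i = bisect.bisect_left(ratios, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(resources)]
    best = min(candidates, key=lambda j: abs(resources[j][1] - target))
    return resources[best][0]
```

The neighbour comparison matters: the insertion point alone only tells you where the target *would* go, and the closest existing ratio can sit on either side of it.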

A counter for each resource (and the current total) is persisted using Redis (there’s a Dropwizard plugin for it). Counts are stored under bunnies.%s.count keys, where %s is the resource ID of the bunny (or total). Counts are updated in memory and synced with Redis once a second, so at most a second’s worth of counts is lost if the API crashes, and there are at most 60 Redis updates a minute even when the site’s under heavy load.
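
The write-buffering scheme can be sketched like so — Python rather than the service’s Kotlin, with the once-a-second scheduler and the actual Redis write stubbed out as a callback:

```python
import threading

class BufferedCounters:
    """Accumulate per-resource hit counts in memory and flush them in one
    batch. A sketch of the idea only: in the real service the flush target
    is Redis (bunnies.%s.count keys) and a scheduler calls sync() each second."""

    def __init__(self, flush):
        self._flush = flush      # e.g. a batch of Redis INCRBY commands
        self._pending = {}
        self._lock = threading.Lock()

    def hit(self, resource_id: str):
        """Record one request for a bunny; also bump the running total."""
        with self._lock:
            self._pending[resource_id] = self._pending.get(resource_id, 0) + 1
            self._pending["total"] = self._pending.get("total", 0) + 1

    def sync(self):
        """Swap out the pending counts under the lock, then flush outside it.
        At most one write per tick regardless of load, and at most one
        tick's worth of counts is lost on a crash."""
        with self._lock:
            pending, self._pending = self._pending, {}
        if pending:
            self._flush(pending)
```

Swapping the dict under the lock keeps the critical section tiny, so request threads never wait on Redis.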

The API is given metadata in the form of a couple of JSON files describing resource IDs and their associated information; the tool that generates them is described below in the “Metaformer” section.

After completing the API, I used a load testing service called loader.io to flood it with requests and see what broke. To my surprise it handled thousands of requests a second, and the limiting factor was actually CPU usage on the Linode instance it’s hosted on. The testing turned out to be prudent, as someone from Mojang later retweeted the site and it was briefly featured on their blog.

Metaformer

Metadata is fed into the API service through a couple of JSON files: one for “derived” metadata and one for “specified” metadata. Derived metadata is computed for all the media files produced by the asset pipeline (described below), and specified metadata is hand-crafted by me to add things like source/attribution links.
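
Presumably the two files get merged per resource ID before serving, with the hand-written fields layered over the computed ones. A minimal sketch of that idea (Python for brevity; the key names other than the files’ `resources` shape are assumptions):

```python
def merge_metadata(derived: dict, specified: dict) -> dict:
    """Combine derived (computed) and specified (hand-written) metadata
    per resource ID. Specified fields win on conflict, so hand-crafted
    entries can override anything the pipeline computed."""
    merged = {}
    for rid, computed in derived.get("resources", {}).items():
        entry = dict(computed)
        entry.update(specified.get("resources", {}).get(rid, {}))
        merged[rid] = entry
    return merged
```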

Metaformer is a small Kotlin app which analyses all the media associated with the site (gifs, webms, mp4s and poster images at the time of writing) and spits out a rather large JSON file containing information like file sizes, aspect ratios, and IDs. Here’s a small sample of the derived_metadata.json file for bunnies 155 and 156:

{
  "resources": {
    "155": {
      "width": 720,
      "height": 406,
      "aspect_ratio": 1.7733990147783252,
      "sizes": {
        "gif": 28569134,
        "webm": 17849460,
        "mp4": 13481197
      }
    },
    "156": {
      "width": 250,
      "height": 445,
      "aspect_ratio": 0.5617977528089888,
      "sizes": {
        "gif": 13017206,
        "webm": 813779,
        "mp4": 901965
      }
    }
  }
}
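
The shape of those entries suggests how the derivation works: probe each resource for its dimensions, compute the ratio, and stat the files for their sizes. A minimal Python sketch (the real Metaformer is Kotlin, and how it probes media dimensions isn’t shown in the post, so they’re passed in here):

```python
from pathlib import Path

def derive_metadata(media_dir, dimensions):
    """Build the derived_metadata.json structure. `dimensions` maps each
    resource ID to a (width, height) pair; in the real tool these would
    come from analysing the media files themselves."""
    resources = {}
    for rid, (width, height) in dimensions.items():
        sizes = {}
        for ext in ("gif", "webm", "mp4"):
            path = Path(media_dir) / f"{rid}.{ext}"
            if path.exists():
                sizes[ext] = path.stat().st_size
        resources[rid] = {
            "width": width,
            "height": height,
            "aspect_ratio": width / height,
            "sizes": sizes,
        }
    return {"resources": resources}
```

Serialising the result with `json.dumps(..., indent=2)` would give output in the shape shown above.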

There’s quite a bit more that can be done with this metadata generation layer. Proper image analysis for attributes like colour would make for an interesting project.

Asset Pipeline

A surprising amount of work goes into adding a particular bunny to the site, and I haven’t invested the time to automate any of it yet.

The videos always start out as gifs, and are passed through a conversion script which transcodes them to MP4 and WebM formats at very high quality settings, and also captures a still image to display as a “poster”.

It turns out video transcoding is something of a dark art, so the script also does things like truncate the gifs to an even height, as most video players can’t display videos with odd heights (especially MP4s, whose usual yuv420p pixel format requires even dimensions) 🤔.
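
The even-height fix boils down to rounding the dimensions down before encoding. A hypothetical helper (Python; the actual conversion script isn’t shown in the post) that builds the corresponding ffmpeg scale filter:

```python
def even_scale_filter(width: int, height: int) -> str:
    """Build an ffmpeg -vf scale expression that rounds both dimensions
    down to the nearest even number, as 4:2:0 chroma subsampling requires.
    At most one pixel is shaved off each axis."""
    return f"scale={width - width % 2}:{height - height % 2}"
```

So a 250×445 gif would be encoded at 250×444, which every player can handle.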

Transcoding the entire library of ~160 gifs takes a number of hours on my server, but I only have to redo them all if I expect improvements in quality (from ffmpeg bug fixes, encoder improvements, etc.). The libvpx-vp9 encoder (for WebMs) was particularly slow until recently, when row-based multithreading was added, which halved the transcode time.

When the transcoding is done, I just use rsync to copy the results over to a media server.

Frontend

The frontend is deliberately sparse and arguably needs the most work, but it’s also the part that interests me least.

A dead simple jQuery script listens for button taps and makes AJAX calls to populate divs with content.

The video file format served to you is chosen by your browser / OS, with MP4 and WebM files available to choose from. iOS and macOS devices probably choose the MP4, Android probably chooses the WebM, and I have no idea what Windows does.

Distribution

Whilst my server could handle at least 1 TB of traffic a month, it has a relatively slow outbound connection (200 Mbps). When there was a sudden influx of traffic (often from someone posting a bunny in a large Slack channel), the site would crawl for 30 seconds whilst it served an enormous gif to everybody.

The solution to this was Cloudflare, which acts as a CDN for media I put on a specific domain. It’s free, fast, and they served literally 90% of my bandwidth from their cache last month!

[Cloudflare bandwidth graph]

Further Improvements

I’ve alluded to possible improvements in the previous sections. I would particularly like to work on different kinds of automation:

  • Being able to put gifs somewhere and have a process decide which need to be transcoded and copied
  • Automated running of Metaformer on the resulting files
  • Dev-ops is all manual right now 😱. There’s definitely an opportunity to learn about configuration management, containerisation, and all that good stuff.

Conclusion

This started as a fun “little” project for learning my way around backend development (and the API layer was one of the first things I did in Kotlin). I’m really happy it serves so much traffic, and I’m glad there’s lots of room for improvement and future sub-projects!

I made a GitHub Organisation for it even though it’s rough, because I think it’s interesting for people to see what goes into it. When I feel like updating the frontend, that’ll be visible-source too.

It’s ok to learn, move fast, and make things that aren’t perfect if you’re having fun!

Enjoy reading this post? Want to support my work, and get sneak-peeks at other side-projects? Become a Patron for $1 ❤️ 👇