Making Your Own URL Shortener 🔗

This month I decided I wanted a URL shortener, after noticing the domain crrt.io was available. Many such services exist (both self-hosted and hosted) but I had specific requirements that they didn’t tend to meet, and making things is fun, so I wrote one called Kit (as in baby rabbit) 🐇.

Requirements

First, I wrote out a list of requirements for the shortening service:

  • Should be able to add things using an HTTP POST
  • Should be written in Java or Kotlin
  • Should be able to add links manually with custom IDs
  • Should be able to add links and randomly generate IDs
  • Should be able to use emoji in IDs
  • IDs should be configurable (in length, and character set)
  • Should be able to deduplicate links
  • Should be able to look at access logs easily

A few homegrown services do exist that meet many of these, but none seemed to be able to handle emoji in the links.

Construction

Technologies And Building It

I used Dropwizard for the service framework - it also powers bunnies.io, and I’m somewhat familiar with it and with using it from Kotlin. There are technically two endpoints - one to GET links (what everybody hits when they follow a short URL), and one to POST them (what I use to add links).

Persistence uses Redis because, again, I’ve used it before and the problem is simple enough that I really don’t need a relational database. Links are stored as link.id.<id> → URL, and hashes of links are stored as link.hash.<hash> → ID. Because the hash of an incoming link is looked up first, a link that has already been shortened isn’t stored again (you just return the existing ID instead).
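To make the two-key scheme concrete, here is a minimal sketch of the deduplication logic. It is not Kit's actual code: the class name LinkStore, the methods shorten and resolve, the choice of SHA-256, and the in-memory map standing in for Redis are all assumptions made so the example runs on its own.

```kotlin
import java.security.MessageDigest

// Hypothetical sketch: a MutableMap stands in for Redis so this runs without a server.
class LinkStore(private val generateId: () -> String) {
    private val keys = mutableMapOf<String, String>()

    // Hex-encoded SHA-256 of the URL. The hash function is an assumption;
    // the post does not say which one is used.
    private fun hash(url: String): String =
        MessageDigest.getInstance("SHA-256")
            .digest(url.toByteArray(Charsets.UTF_8))
            .joinToString("") { "%02x".format(it) }

    /** Stores [url] and returns its ID, reusing the existing ID if the URL is already known. */
    fun shorten(url: String, customId: String? = null): String {
        val hashKey = "link.hash.${hash(url)}"

        // Deduplication: a known link returns its existing ID and nothing new is written.
        keys[hashKey]?.let { return it }

        // Random IDs are regenerated until an unused one turns up.
        var id = customId ?: generateId()
        while (customId == null && "link.id.$id" in keys) id = generateId()
        require("link.id.$id" !in keys) { "ID already in use: $id" }

        keys["link.id.$id"] = url   // link.id.<id> -> URL
        keys[hashKey] = id          // link.hash.<hash> -> ID
        return id
    }

    /** Returns the URL stored under [id], or null if there is none. */
    fun resolve(id: String): String? = keys["link.id.$id"]
}
```

One choice in this sketch worth flagging: if a link is already stored, a custom ID passed alongside it is ignored and the existing ID wins. Whether that is the right behaviour is a judgment call.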

httpie proved invaluable for making lots of HTTP requests and understanding the responses whilst debugging.

[Image: httpie example]

Difficulties

As expected, adding support for configurable emoji was the hard part: specifically, parsing them correctly both from links and from the file that specifies the permitted characters for ID generation. In the end I used a popular i18n library called icu4j to provide a BreakIterator that understands recent versions of Unicode, and wrote a relatively small function that walks the ‘breaks’ in a string to extract the grapheme clusters (user-perceived symbols, whether that’s a multi-code-point emoji or a single ASCII character). It works pretty well: https://crrt.io/🐰.

import com.ibm.icu.text.BreakIterator

fun extractGraphemeClusters(input: String): List<String> {
    val clusters = mutableListOf<String>()

    // icu4j's character-instance iterator reports grapheme cluster boundaries
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(input)

    // Walk consecutive boundaries; each [start, end) span is one cluster
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        clusters += input.substring(start, end)
        start = end
        end = iterator.next()
    }

    return clusters
}
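Once the permitted characters are available as a list of grapheme clusters, generating a random ID of a configurable length is short. The sketch below is an illustration rather than Kit's real generator: the function name generateId and its parameters are hypothetical, and the alphabet is passed in directly so the example has no icu4j dependency.

```kotlin
import kotlin.random.Random

/**
 * Builds an ID of [length] symbols, each drawn uniformly at random from [alphabet].
 * Every alphabet entry is treated as one symbol, so a multi-code-point emoji
 * counts once towards the length, however many chars it occupies.
 */
fun generateId(
    alphabet: List<String>,
    length: Int,
    random: Random = Random.Default
): String {
    require(alphabet.isNotEmpty()) { "alphabet must not be empty" }
    require(length > 0) { "length must be positive" }
    return (1..length).joinToString("") { alphabet.random(random) }
}
```

Working on clusters instead of chars is the point here: picking random chars out of a string containing emoji could split a surrogate pair or a joined sequence and produce a broken ID.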

Sadly, many chat clients don’t seem to parse links with emoji in them properly 😢. The solution to this was to use the emoji shortened links sparingly, offer text alternatives (like https://crrt.io/bunnies), and hope it gets better in the future. If you’re a client developer checking for links - treat them like email addresses (i.e. be lenient, do something like checking for an http prefix and then parse to the next space).
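For client developers, the lenient approach described above might look something like this. It is only a sketch: the names lenientLink and findLinks are made up for illustration, and a real client would likely want extra handling on top.

```kotlin
// An http:// or https:// prefix, then everything up to the next whitespace.
// \S matches any non-whitespace character, so emoji in the path are kept.
val lenientLink = Regex("""https?://\S+""")

/** Returns every lenient link match in [text], in order of appearance. */
fun findLinks(text: String): List<String> =
    lenientLink.findAll(text).map { it.value }.toList()
```

The trade-off is that trailing punctuation gets swept up too: a link at the end of a sentence will include the full stop, so a client may want to trim that afterwards.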

Usage

When I want to add a new link, I can make an HTTP request like so:

⚡️ http POST 'https://link.shortener/link?key=an_api_key&id=something&link=https://skywelch.io/'

Which can be fetched with:

⚡️ http GET https://link.shortener/something

TweetBot also has support for custom URL shorteners - I can just put links in my tweets like usual and they’ll automatically get shortened by my own service! However, it’s pretty much undocumented, and only appears to support GET requests with the API key in the URL 😱. In practice this is OK for me because I’m only responding over HTTPS and I trust that I’m the only person looking at my own logs.

Analysing Usage

After the success of using GoAccess in my previous post about analysing the usage of bunnies.io, I decided to use it again for this project.

[Image: goaccess] Who the fuck is trying to get to /wp-login.php?!

By ‘wrapping’ links with my shortener (as I’ve done through most of this site), I can look at referrers and tell where people are coming from. So far, the majority come from Twitter, Patreon and this site.
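GoAccess does this tallying for me, but the idea is simple enough to sketch by hand. The function below is hypothetical (countReferrerHosts is not part of Kit or GoAccess) and assumes the referrer values have already been pulled out of the access log as strings.

```kotlin
import java.net.URI

/**
 * Counts how many times each referrer host appears.
 * Values with no host (such as the "-" logged when there is no referrer)
 * and values that do not parse as a URI are skipped.
 */
fun countReferrerHosts(referrers: List<String>): Map<String, Int> =
    referrers
        .mapNotNull { runCatching { URI(it).host }.getOrNull() }
        .groupingBy { it }
        .eachCount()
```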

Conclusion

Having my own shortener has actually turned out to be really useful, and I had loads of fun making it! Hopefully documenting some of the technologies and decisions here helps someone else make something cool, too.

If you like these posts, or you learned something useful, please consider tipping through Patreon 🎉
