This is the story of how and why we built a Heroku-like service using Kubernetes, with no previous experience and no DevOps/SREs on our team.
At Wilco, we aim to let developers get better through hands-on, lifelike challenges (which we call “quests”) that have them ship code to a cloud provider. Our users get a sandboxed workspace in the cloud, where they can experiment with the different aspects of shipping code to the cloud: pushing code changes, setting up logs, monitoring performance, handling errors, and much more.
Until recently, we were using Heroku as our infrastructure. If you’re unfamiliar with Heroku, it is a popular cloud Platform-as-a-Service (PaaS) that lets users deploy their apps to the cloud. We would use Heroku’s API to programmatically create a sandboxed “app” within our account for each user and add them as a “collaborator,” which gave them access to deploy code to that “app”:
As this high-level diagram shows, users could experiment with different aspects of cloud deployment (setting configurations and environment variables, the CI/CD pipeline, and more) within their allocated user “app” space, over which they had full control.
This setup worked well for our users and us, with some caveats. The biggest, of course, was that this sandboxed environment was based on a proprietary solution. Although the high-level concepts are somewhat transferable to other solutions, the setup was tied to a specific vendor.
While we were okay with that, and Heroku’s offering initially helped us move quickly, other issues started popping up:
We knew we’d eventually have to migrate to another solution or roll our own infrastructure. After further discussion (and a good nudge from a rogue user who found a loophole and provisioned the most expensive resources, resulting in a substantial bill), we decided it was time to migrate.
Setting our sights on Kubernetes
We started with a defined goal: build an in-house version of Heroku to suit our needs, providing the same experience but with more control over the user experience, the level of abstraction, our spending limits, and more.
Although we could have chosen a more managed solution, we decided to use Kubernetes. With little to no Kubernetes experience on our team, this decision might seem ill-advised. However, we had an ulterior motive: to help teach our users to work with Kubernetes. And what better way to teach something than to learn it yourself first?
Since I am not an SRE or a DevOps Engineer by trade, and since time was of the essence and we were aware of our institutional knowledge gap, we opted to get some initial guidance from an expert. Omri Siri, CEO of Project Circle, was quick to help by going over our list of requirements and offering his sage advice. Omri helped us come up with an initial plan to achieve this migration.
Kubernetes is, as its documentation puts it, “an open-source system for automating deployment, scaling, and management of containerized applications.” Originally designed at Google and open-sourced in 2014, it is now maintained by the Cloud Native Computing Foundation, with contributions from engineers at Google, Red Hat, Amazon, and other big tech companies. It is battle-tested, flexible enough for a wide range of use cases, and used in production by many companies. For a cool history of how it came to be, watch the free, two-part Kubernetes Documentary.
Our final plan involved setting up a Kubernetes cluster on AWS. Following Omri’s input, we decided to automate as much of the product as possible, assisted by several third-party tools:
Remember the previous Heroku diagram? We aimed to build something similar using GitHub, AWS, and Kubernetes. Let’s start with the result and then break it down:
Kubernetes (also called k8s for short) runs on a set of one or more computers, called Worker Nodes in Kubernetes lingo. A group of Worker Nodes provisioned to a single Kubernetes “instance” is called a Kubernetes Cluster, and a Cluster has at least one Worker Node.
As you can see in the diagram above, our Kubernetes Cluster (dubbed “Anythink Cluster”) runs on AWS. All three major infrastructure providers (AWS, Google Cloud, and Microsoft Azure) support Kubernetes. We went with AWS due to our previous experience.
We provisioned EC2 machines as Worker Nodes for the Kubernetes Cluster and used a service called EKS (“Elastic Kubernetes Service”, Amazon’s hosted Kubernetes offering). Besides EKS and EC2, we also use ECR (“Elastic Container Registry”), which is like a Git repository for storing public or private Docker images.
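To give a sense of the moving parts, here is roughly how such a cluster could be provisioned with eksctl, a popular CLI for EKS. This is a minimal sketch rather than our actual setup; the region, instance type, and node count are hypothetical:

```yaml
# cluster.yaml -- a minimal eksctl cluster definition (hypothetical values)
# Provision with: eksctl create cluster -f cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: anythink-cluster      # echoes the cluster name from the diagram
  region: us-east-1           # hypothetical region
managedNodeGroups:
  - name: workers
    instanceType: t3.medium   # the EC2 machines that become Worker Nodes
    desiredCapacity: 2        # start with two Worker Nodes
```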
The diagrams leave out a few other aspects: Authentication, Authorization, and Resource Management. In Heroku’s world, these are all abstracted away as part of the offering (and controlled via the UI or API).
In the Kubernetes world, most of these are available within Kubernetes itself. There are several options for handling Authentication (we went with Service Accounts), Authorization (we use RBAC Authorization), and Resource Management (see the Kubernetes documentation on the topic).
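To make the Authorization piece concrete, here is what a per-user Role and RoleBinding might look like. This is a sketch rather than our exact policy; the namespace, role, and user names are hypothetical:

```yaml
# A Role scoped to a single user's namespace (all names are hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: user-abc123
  name: workspace-user
rules:
  - apiGroups: ["", "apps"]    # core and apps API groups
    resources: ["pods", "pods/log", "services", "deployments"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
# Bind the Role to the identity baked into the user's client certificate
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: user-abc123
  name: workspace-user-binding
subjects:
  - kind: User
    name: user-abc123          # matches the CN of the user's certificate
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: workspace-user
  apiGroup: rbac.authorization.k8s.io
```

Because a Role (unlike a ClusterRole) is namespaced, these permissions cannot leak outside the user’s own sandbox.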
For CI/CD, we opted to use GitHub Actions, as we already integrate deeply with GitHub as part of the Wilco experience.
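As an illustration, a deploy workflow in this setup could look roughly like the one below. The secret names, image name, manifest path, and target namespace are all hypothetical stand-ins, not our real pipeline:

```yaml
# .github/workflows/deploy.yml -- a sketch of a build-and-deploy pipeline
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}       # hypothetical secrets
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build the Docker image and push it to ECR
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/anythink:${{ github.sha }}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
      - name: Roll out to the user's namespace
        run: |
          aws eks update-kubeconfig --name anythink-cluster
          kubectl apply -f k8s/ --namespace user-abc123
```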
Once our cluster was up and running, we could start letting users work on it. Kubernetes has a concept called Namespaces, which provides a mechanism for isolating groups of resources within a single cluster. This mechanism is ideal for our use case of isolating users’ sandboxed workspaces. When a user starts a quest that requires working with cloud deployment, we create a new, empty namespace and generate a unique certificate for the user with specific permissions (using the aforementioned RBAC mechanism). The namespace is preconfigured with the resources the user is allotted (number of pods, max memory/CPU, etc.).
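Concretely, the per-user namespace and its limits are plain Kubernetes objects. A minimal sketch (the name and limits below are hypothetical, not our production values):

```yaml
# A sandboxed namespace for a single user (hypothetical name)
apiVersion: v1
kind: Namespace
metadata:
  name: user-abc123
---
# Cap what the user can consume inside that namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: workspace-quota
  namespace: user-abc123
spec:
  hard:
    pods: "10"              # max number of pods
    limits.cpu: "2"         # total CPU limit across all pods
    limits.memory: 4Gi      # total memory limit across all pods
```

Because the quota lives in the namespace itself, it applies no matter how the user chooses to deploy their workloads.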
Users receive a kubeconfig file which they use along with Kubernetes’ official CLI tool, kubectl, to interact with their allocated namespace. The kubeconfig file defines which cluster, user, namespace, and authentication mechanisms to use and acts as their authentication key to access their cloud resources.
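Stripped down, such a kubeconfig could look like this (every value below is a placeholder):

```yaml
# kubeconfig -- scoped to a single user's namespace (placeholder values)
apiVersion: v1
kind: Config
clusters:
  - name: anythink-cluster
    cluster:
      server: https://<cluster-endpoint>
      certificate-authority-data: <base64-encoded-CA-certificate>
users:
  - name: user-abc123
    user:
      client-certificate-data: <base64-encoded-client-certificate>
      client-key-data: <base64-encoded-client-key>
contexts:
  - name: user-abc123@anythink-cluster
    context:
      cluster: anythink-cluster
      user: user-abc123
      namespace: user-abc123   # kubectl defaults to this namespace
current-context: user-abc123@anythink-cluster
```

With this file in place (for example, via the KUBECONFIG environment variable), a plain `kubectl get pods` runs against the user’s own namespace by default.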
Then, once users finish a quest, we can scale their resources down to zero to minimize our AWS spend.
On a personal level, this project proved to be a very worthwhile learning experience. I started it knowing very little about Kubernetes and finished it knowing how to set up and run my own cluster. Kubernetes has many more bells and whistles that I have yet to experiment with, but as a software engineer with little experience in SRE/DevOps land, I feel I’ve leveled up tremendously.
On a team level, this project helped introduce Kubernetes to our team, and many of our engineers have since contributed to the project, including writing our own “Kubernetes First Steps” quest.
On a product level, we achieved our goal of enabling users to learn Kubernetes through hands-on quests, which you are welcome to check out. We plan to expand on it with more content. If you’d like, you can also contribute and write your own quests using our Quest Builder (link).