Designing a Distributed System for an Online Multiplayer Game — Deployment (Part 7)

Apr 2, 2022

This is part seven of the Garage series; you can access all parts at the end of this post.

Let’s get started deploying all these applications.

The game manager and the game server applications have been dockerized, and some GitHub Actions have been added to publish the artifacts (Docker images) to the GitHub Container Registry when they are tagged or merged.

Let’s check our requirements again:

Game Manager
Redis: used as cache DB and matchmaking queue in the game manager
MySQL: used as the database in the game manager (needs persistent storage)
KubeMQ: used as the event broker
Game Server
Ingress: to expose the game manager APIs to the public

Cluster

I used HashiCorp Terraform (infrastructure as code) to create the cluster. I tried it on both DigitalOcean Managed Kubernetes and Amazon Elastic Kubernetes Service (EKS).

All Kubernetes resources, configs, and Terraform code are stored in a separate repository to keep better control of the code. I’ll refer to this repository as k8s.

AWS Managed Kubernetes Service (EKS)

There are two kinds of resources to define the nodes: node groups and worker groups. The node groups are EKS-managed nodes, whereas the worker groups are self-managed nodes. As discussed before, the game server nodes need to be directly accessible from the public, so we need to define a Security Group for the game server nodes that opens a range of ports for public access (the game client). As of this writing, there is an open issue on the AWS container roadmap repository requesting an option to assign a Security Group to node groups. To work around this, I used worker groups to define the nodes. Therefore, two worker groups are needed: one to schedule the game manager and the other services, and one to schedule the game servers with a custom Security Group.

AWS Security group for the game servers worker node to open a range of ports:

DigitalOcean Managed Kubernetes

Node Pools
As with EKS, we need two node pools here: a manager node pool to schedule the game manager, MySQL, Redis, and KubeMQ pods, and a game server node pool to schedule the game server pods.

Firewall
The game server exposes its ports in the host network namespace so that clients can connect to the game server directly. This means we need a firewall rule that allows incoming packets on the game server port range.

Firewall resource for DO (Terraform):

Scaling

Remember that the game manager is stateless and can be scaled horizontally, but the game server is stateful, runs as a dedicated server, and cannot be scaled horizontally. We’ll scale the game server node pool manually (covered in a different blog post).

Kustomize

I should mention that I know about GitOps and the tools that make all of this easy and secure, but I wanted to try the fundamentals.

I used Kustomize to manage the K8S configurations (YAML files) for different environments. The directory structure of each service:

There are two main directories: base and overlays. The base directory includes the default configs, and the overlays directory includes the configs for each environment.

Base

config.env: the hard-coded default configs like default ports
deployment.yaml: the default Deployment file
kustomization.yaml: the default Kustomization file to define resources and configs
service.yaml: the default Service file

Overlays

config.env: the hard-coded configs for the environment, like ports
config.secret: the environment secrets, like passwords and …
deployment-patch.yaml: the Deployment patch file to merge with the base (default) deployment, for example to change the replicas
kustomization.yaml: the Kustomization file to define the base resources and config maps for the environment

Kustomize takes the base config and merges the overlay configs into it.
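As a rough mental model of that merge, here is a minimal Python sketch of a recursive dictionary merge. It is a deliberate simplification and not Kustomize's real patching logic (which, for instance, handles lists specially); `merge_patch` and the example manifests are hypothetical names made up for illustration.

```python
import copy


def merge_patch(base: dict, patch: dict) -> dict:
    """Return a new dict with `patch` recursively merged over `base`.

    Simplified model: nested dicts are merged key by key, any other
    value in the patch replaces the value in the base.
    """
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base Deployment fragment and an overlay patch that
# only changes the number of replicas.
base_deployment = {
    "kind": "Deployment",
    "metadata": {"name": "game-manager"},
    "spec": {"replicas": 1},
}
production_patch = {"spec": {"replicas": 3}}

patched = merge_patch(base_deployment, production_patch)
```

The base stays untouched, and only the keys named in the patch change in the result, which is the property that lets one base serve several environments.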

Game Server Configs

The game server has no Deployment file because it is run by the game manager through the Kubernetes API. In other words, the deployment configs are passed via an API call instead of a YAML file.

Quotas
Each game container is limited to a certain amount of CPU resources. To calculate this limit, I put some pressure on the pod (the worst-case scenario) and measured the maximum CPU usage with the kubectl top command. You need to install the Kubernetes Metrics Server to use this command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The game manager limits CPU resources while creating the game server pod using the Kubernetes API.
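To make the idea concrete, here is a minimal Python sketch of the kind of pod definition that could be sent to the Kubernetes API instead of a YAML file. It is not the game manager's actual code: `build_game_server_pod`, the label, the image name, and the example values are all assumptions. Only the manifest fields themselves (`hostNetwork`, `resources.limits.cpu`, and so on) are standard Kubernetes pod fields.

```python
def build_game_server_pod(name: str, image: str, port: int, cpu_limit: str) -> dict:
    """Build a pod manifest (as a plain dict) for one dedicated game server.

    All parameter names are hypothetical; the dict mirrors what would
    otherwise be written in a YAML file.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "game-server"}},
        "spec": {
            # Expose the container ports in the host network namespace
            # so game clients can connect to the node directly.
            "hostNetwork": True,
            "containers": [
                {
                    "name": "game-server",
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # CPU limit found by load-testing the pod, e.g. "500m".
                    "resources": {"limits": {"cpu": cpu_limit}},
                }
            ],
        },
    }


# Example with made-up values.
pod = build_game_server_pod(
    name="game-server-1",
    image="ghcr.io/example/game-server:latest",
    port=7001,
    cpu_limit="500m",
)
```

Building the manifest in a function like this keeps the CPU limit and port as parameters, so the caller can pick a free port per game while the limit stays consistent.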

Roles

The game manager needs permissions to create, watch, or list pods across the cluster. To achieve this, RBAC authorization is used: a Role and a ClusterRole define the permissions, and bindings grant them to the game manager.
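As an illustration, the permissions above could look like the following manifests, written here as Python dicts. This is a sketch, not the contents of the k8s repository: the function names, object names, and the choice of binding to a service account are assumptions. The `apiVersion`, `rules`, `roleRef`, and `subjects` fields are the standard RBAC fields, and pods belong to the core API group, which is written as an empty string.

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"


def build_pod_cluster_role(name: str) -> dict:
    """ClusterRole allowing create/watch/list on pods (core API group)."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods"],
                "verbs": ["create", "watch", "list"],
            }
        ],
    }


def build_cluster_role_binding(
    name: str, role_name: str, service_account: str, namespace: str
) -> dict:
    """Bind the ClusterRole to a service account (assumed subject type)."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": RBAC_API_GROUP,
            "kind": "ClusterRole",
            "name": role_name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }


# Hypothetical names for illustration only.
role = build_pod_cluster_role("game-manager-pods")
binding = build_cluster_role_binding(
    "game-manager-pods-binding", "game-manager-pods", "game-manager", "default"
)
```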

Ingress

The Nginx ingress proxies the game manager exposed ports to the public. First, we need to deploy the Nginx ingress controller:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.0/deploy/static/provider/do/deploy.yaml

This command deploys the Nginx ingress controller for the DigitalOcean provider.

Services

These pods need to be exposed in the cluster by services (ClusterIP):

Game manager
KubeMQ
Redis
MySQL

Diagram

Now, let's take a look at the diagram we talked about in the architecture post:

The game manager is replicated and multiple instances are run concurrently.
The Nginx ingress works as the load balancer to proxy the user connection from the public to the game manager instances.
The user opens a long-lived connection to one of the game manager pods.
The game node pool is scaled manually.

Applying the configs for the first time

The k8s object configs must be applied in order because of their dependencies. This pipeline is automated in the Makefile.

Docker Registry secret

The artifacts (Docker images) are published privately on GHCR, so a docker-registry secret is created to let the cluster pull them.

Roles

The Role and ClusterRole configs are applied first so that the game manager can access the pods and APIs.

KubeMQ

The KubeMQ configs are applied to expose it as a service.

Redis

The Redis configs are applied to expose it as a service.

MySQL

MySQL needs persistent storage so that data is not lost when the pod is destroyed. After applying the MySQL configs, we must wait for the pod to be ready and responsive, then create the required database and users. Kubernetes has an API for pod readiness status, so I created a shell script that waits for the MySQL pod to report ready before continuing to apply the other object configs.

The shell script checks the pod readiness in a loop with sleep, then pings the MySQL daemon with mysqladmin inside the pod to make sure it is alive. Once everything is OK, it creates the database and users and grants the privileges to the users. The main process can then continue.
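The wait-then-continue pattern described above can be sketched as a small polling helper. This is a Python illustration rather than the shell script itself: `wait_until` and its parameters are hypothetical names, and the readiness check is passed in as a function (in practice it could wrap a kubectl call or a mysqladmin ping). A timeout is added here as an assumption so the loop cannot hang forever.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds pass.

    Returns True if the check succeeded, False if the deadline was hit.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example: a fake readiness probe that succeeds on the third attempt.
attempts = iter([False, False, True])
ready = wait_until(lambda: next(attempts), sleep=lambda seconds: None)
```

The same helper can be reused for each waiting step (pod readiness first, then the daemon ping) by passing a different check function.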

Nginx Ingress

The ingress needs the ingress controller to be deployed first, so a shell script waits for the controller pod to report ready. Then the ingress configs are applied.

After that, another shell script waits for the load balancer's external IP, which might take some time.

Note that the AWS load balancer is exposed through a hostname instead of an IP, because the load balancer's IP may change and the hostname always resolves to the correct one.
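Since the address can be either an IP or a hostname, whatever reads it has to handle both. Here is a minimal Python sketch that extracts it from a Service object, assuming the object has already been fetched and parsed into a dict (for example from JSON output). `load_balancer_address` is a hypothetical helper; the `status.loadBalancer.ingress` list with `ip` and `hostname` entries is the standard Service status layout.

```python
from typing import Optional, Tuple


def load_balancer_address(service: dict) -> Optional[Tuple[str, str]]:
    """Return ("ip", value) or ("hostname", value) for a LoadBalancer Service.

    Returns None while the cloud provider has not assigned an address yet,
    which is the state the waiting script keeps polling through.
    """
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if entry.get("ip"):
            return ("ip", entry["ip"])
        if entry.get("hostname"):
            return ("hostname", entry["hostname"])
    return None


# Made-up examples of the three states.
pending = {"status": {"loadBalancer": {}}}
with_ip = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
with_hostname = {
    "status": {"loadBalancer": {"ingress": [{"hostname": "lb.example.com"}]}}
}
```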

Updating the DNS Records

After getting the load balancer public IP (DO) or public hostname (AWS), we need to update the DNS records for our API domain.

I wrote a shell script to update the domain's DNS records using the Cloudflare API. The script creates an “A” record pointing to the DigitalOcean load balancer IP or a “CNAME” record pointing to the AWS load balancer hostname.
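The A-versus-CNAME decision can be sketched in a few lines of Python. This is an illustration under assumptions, not the shell script: `build_dns_record` and the example domain are hypothetical, and the function only builds the request body without sending it. The `type`, `name`, `content`, `ttl`, and `proxied` fields follow the Cloudflare DNS record format, where a TTL of 1 means automatic; leaving the record unproxied is my assumption.

```python
import ipaddress


def build_dns_record(name: str, target: str) -> dict:
    """Build a DNS record body: A for an IPv4 address, CNAME for a hostname."""
    try:
        ipaddress.IPv4Address(target)
        record_type = "A"
    except ValueError:
        # Not a valid IPv4 address, so treat the target as a hostname.
        record_type = "CNAME"
    return {
        "type": record_type,
        "name": name,
        "content": target,
        "ttl": 1,  # 1 = automatic TTL
        "proxied": False,  # assumption: DNS only, no proxying
    }


# Made-up targets: an IP as a DigitalOcean load balancer would give,
# and a hostname as an AWS load balancer would give.
do_record = build_dns_record("api.example.com", "203.0.113.10")
aws_record = build_dns_record("api.example.com", "my-lb.elb.amazonaws.com")
```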