Designing a Distributed System for an Online Multiplayer Game — Deployment (Part 7)
This is part seven of the Garage series; you can access all parts at the end of this post.
Let’s get started with deploying all these applications.
The game manager and the game server applications have been dockerized, and some GitHub Actions have been added to publish the artifacts (Docker images) to the GitHub Container Registry when they are tagged or merged.
Let’s check our requirements again:
- Game Manager
- Redis: used as cache DB and matchmaking queue in the game manager
- MySQL: used as the database in the game manager (needs persistent storage)
- KubeMQ: used as the event broker
- Game Server
- Ingress: to expose the game manager APIs to the public
I used HashiCorp Terraform (IaC) to create the cluster. I used both DigitalOcean Managed Kubernetes and AWS Managed Kubernetes Service (EKS).
All Kubernetes resources, configs, and Terraform code are stored in a separate repository to keep better control of the code. I’ll refer to this repository as
AWS Managed Kubernetes Service (EKS)
There are two kinds of resources for defining the nodes: node groups and worker groups. Node groups are EKS-managed nodes, whereas worker groups are self-managed nodes. As discussed before, the game server nodes need to be directly accessible from the public, so we need to define a Security Group for the game server nodes that opens a range of ports for public access (the game client). As of this writing, there is an open issue on the AWS containers roadmap repository requesting an option to assign a Security Group to node groups. To work around this, I used worker groups to define the nodes. Therefore, two worker groups are needed: one for scheduling the game manager and other services, and one for scheduling the game servers with a custom Security Group.
AWS Security group for the game servers worker node to open a range of ports:
DigitalOcean Managed Kubernetes
As with EKS, we need two node pools here: a manager node pool to schedule the game manager, MySQL, Redis, and KubeMQ pods, and a game server node pool to schedule the game server pods.
The game server exposes its ports in the host network namespace so that clients can connect to the game server directly, which means we need to allow incoming packets on the game server port range.
Firewall resource for DO (Terraform):
Remember that the game manager is stateless and can be scaled horizontally, but the game server is stateful, runs as a dedicated server, and cannot be scaled horizontally. We’ll scale the game server node pool manually (covered in a different blog post).
I should mention that I know about GitOps and the tools that make all of this easier and more secure, but I wanted to try the fundamentals.
I used Kustomize to manage the K8S configurations (YAML files) for different environments. The directory structure of each service:
There are two main directories: base and overlays. The base directory contains the default configs, and the overlays directory contains the configs for each environment.
config.env: the hard-coded default configs like default ports
deployment.yaml: the default Deployment file
kustomization.yaml: the default Kustomization file to define resources and configs
service.yaml: the default Service file
config.env: the hard-coded configs for the environment, like ports
config.secret: the environment secrets, like passwords and …
deployment-patch.yaml: the Deployment patch file to merge with the base (default) deployment, for example to change the replicas
kustomization.yaml: the Kustomization file to define the base resources and config maps for the environment
Kustomize takes the base configs and merges the overlay configs into them.
Game Server Configs
The game server has no Deployment file because it is run by the game manager using the Kubernetes API. In other words, the deployment configs are passed via an API call instead of a YAML file.
Each game container is limited to a certain amount of CPU resources. To calculate this limit, I put the pod under load (the worst-case scenario) and measured the maximum CPU usage with the kubectl top command. You need to install the Kubernetes metrics tools (such as Metrics Server) to use this command:
The game manager limits CPU resources while creating the game server pod using the Kubernetes API.
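As a rough illustration of what such an API call might carry, here is a minimal Python sketch that builds a Pod manifest with a CPU limit and a host port. The function name, labels, image, port, and limit value are all hypothetical, and using hostPort is only one assumed way to expose a port on the node; the actual game manager may build its request differently.

```python
from typing import Any, Dict


def build_game_server_pod(name: str, image: str, port: int, cpu_limit: str) -> Dict[str, Any]:
    """Build a Pod manifest (as a plain dict) for one dedicated game server.

    All names and values are illustrative assumptions, not the project's real config.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "game-server"}},
        "spec": {
            "containers": [
                {
                    "name": "game-server",
                    "image": image,
                    # hostPort maps the container port onto the node itself,
                    # so game clients can reach the server directly (assumed approach).
                    "ports": [{"containerPort": port, "hostPort": port}],
                    # The CPU limit would come from the peak usage observed under load.
                    "resources": {"limits": {"cpu": cpu_limit}},
                }
            ],
        },
    }


pod = build_game_server_pod("game-1", "ghcr.io/example/game-server:latest", 7777, "250m")
print(pod["spec"]["containers"][0]["resources"])
```

With the official Kubernetes Python client, a dict like this could be passed as the body of CoreV1Api.create_namespaced_pod; clients for other languages expose an equivalent call.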
The game manager needs permissions to create, watch, or list pods across the cluster. To achieve this, RBAC authorization is used to define a Role and a ClusterRole and bind them to the API Group.
The Nginx ingress proxies the game manager’s exposed ports to the public. First, we need to deploy the Nginx ingress controller:
This command will deploy the Nginx ingress controller for the DigitalOcean provider.
These pods need to be exposed in the cluster by services (ClusterIP):
- Game manager
Now, let's take a look at the diagram we talked about in the architecture post:
- The game manager is replicated and multiple instances are run concurrently.
- The Nginx ingress works as the load balancer to proxy the user connection from the public to the game manager instances.
- The user opens a long-living connection to one of the game manager pods.
- The game node pool is scaled manually.
Applying the configs for the first time
The k8s object configs must be applied in order because of dependencies and this pipeline is automated in the Makefile.
Docker Registry secret
The artifacts (Docker images) are published privately on ghcr, so a docker-registry secret is created to access them in the cluster.
The Role and ClusterRole configs are applied first so that the game manager can access the pods and APIs.
The KubeMQ configs are applied to expose it as a service.
The Redis configs are applied to expose it as a service.
MySQL needs persistent storage to prevent data loss when the pod is destroyed. After applying the MySQL configs, we must wait for the pod to be ready and responsive, then create the required database and users. Kubernetes has an API for pod readiness status, so I created a shell script that waits for the MySQL pod to become ready before continuing to apply the other object configs.
The shell script checks the pod readiness in a loop with sleep, then uses mysqladmin in the pod to ping the MySQL daemon and make sure it is alive. Once everything is OK, it creates a database and users and grants the privileges to the users. Now the main process can continue.
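The readiness loop can be sketched in Python as follows. This is not the author's shell script: the function names are hypothetical, and the pod object is assumed to be the JSON that the Kubernetes API returns for a pod, where readiness is reported as a condition of type Ready in status.conditions.

```python
import time
from typing import Any, Callable, Dict


def pod_is_ready(pod: Dict[str, Any]) -> bool:
    """Return True if the pod reports a condition of type Ready with status "True"."""
    conditions = pod.get("status", {}).get("conditions") or []
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def wait_until_ready(
    fetch_pod: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_pod until the pod is ready; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if pod_is_ready(fetch_pod()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Simulated API responses: not ready twice, then ready.
responses = iter([
    {"status": {}},
    {"status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    {"status": {"conditions": [{"type": "Ready", "status": "True"}]}},
])
print(wait_until_ready(lambda: next(responses), sleep=lambda _: None))
```

In practice, fetch_pod could wrap a call such as kubectl get pod with JSON output. The follow-up mysqladmin check and the database setup are left out of this sketch.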
The ingress needs the ingress controller to be deployed first, so a shell script waits for the controller pod to become ready. After that, the ingress configs are applied.
Then another shell script waits for the load balancer external IP, which might take some time.
Note that the AWS load balancer uses a hostname instead of an IP, because the load balancer’s IP may change and the hostname always resolves to the correct load balancer IP.
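To make the IP versus hostname difference concrete, here is a small Python sketch that reads the external address from a Service object as returned by the Kubernetes API. The function name is hypothetical; the status.loadBalancer.ingress field is where Kubernetes reports either an ip or a hostname once the cloud load balancer has been provisioned.

```python
from typing import Any, Dict, Optional


def load_balancer_address(service: Dict[str, Any]) -> Optional[str]:
    """Return the load balancer IP or hostname, or None if it is not assigned yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Some providers report an IP, others (such as AWS) report a hostname.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None


pending = {"status": {"loadBalancer": {}}}
with_ip = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
with_hostname = {"status": {"loadBalancer": {"ingress": [{"hostname": "lb.example.com"}]}}}

print(load_balancer_address(pending))
print(load_balancer_address(with_ip))
print(load_balancer_address(with_hostname))
```

A waiting script would call a function like this in a loop until it returns something other than None.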
Updating the DNS Records
After getting the load balancer public IP (DO) or public hostname (AWS), we need to update the DNS records for our API domain.
I wrote a shell script to update the domain’s DNS records using the Cloudflare API. The script creates an “A” record pointing to the DigitalOcean load balancer IP or a “CNAME” record pointing to the AWS load balancer hostname.
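The choice between the two record types can be sketched in Python like this. It is not the author's script: the function name and domain are hypothetical, and the ttl and proxied values are assumptions. The payload follows the general shape of a Cloudflare DNS record (type, name, content).

```python
import ipaddress
from typing import Any, Dict


def dns_record_for(name: str, target: str) -> Dict[str, Any]:
    """Build a DNS record payload: "A" for an IPv4 target, "CNAME" for a hostname."""
    try:
        ipaddress.IPv4Address(target)
        record_type = "A"
    except ValueError:
        # Not a valid IPv4 address, so treat the target as a hostname.
        record_type = "CNAME"
    return {
        "type": record_type,
        "name": name,
        "content": target,
        "ttl": 1,          # assumed: Cloudflare treats 1 as "automatic"
        "proxied": False,  # assumed: plain DNS, no Cloudflare proxying
    }


print(dns_record_for("api.example.com", "203.0.113.10"))
print(dns_record_for("api.example.com", "lb.example.com"))
```

A payload like this would then be sent to the zone's dns_records endpoint of the Cloudflare API with an API token; that request is omitted here.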
Finally, it’s the game manager configs’ turn to be applied.
Now all of our applications and services are running:
It’s time to test all these together:
In the next part, we’ll take a look at the CI/CD pipeline to automate the integrations and deployments.