How I migrated my hobby project to k8s

In this article, I want to talk about my hobby project for searching and classifying apartment rental ads from the social network vk.com and my experience moving it to k8s.

A bit about the project

In March 2017, I launched a service for parsing and classifying apartment rental ads from vk.com.

You can read more here about how I tried different ways to classify ads and eventually settled on the lexical parser Yandex Tomita Parser.

You can also read here about the project's architecture at the start and the technologies used and why.

Developing the first version of the service took about a year. I wrote Ansible scripts to deploy each component of the service. From time to time the service broke because of bugs in the overly complicated code or misconfigured components.

In June 2019, I found a bug in the parser code that prevented new ads from being collected. Instead of fixing it, I decided to turn the parser off temporarily.

What prompted me to bring the service back was my decision to learn k8s.

Getting to know k8s

k8s is open-source software for automating the deployment, scaling, and management of containerized applications.

The entire infrastructure of the service is described in configuration files, usually in YAML format.

I won't go into the inner workings of k8s, but I'll briefly describe a few of its components.

k8s components

Example of a config with ConfigMap

containers:
    -   name: collect-consumer
        image: mrsuh/rent-collector:1.3.1
        envFrom:
            -   configMapRef:
                    name: collector-configmap-1.1.0
            -   secretRef:
                    name: collector-secrets-1.0.0
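
The referenced ConfigMap and Secret are plain key-value manifests. A minimal sketch of what they might contain (the object names come from the config above, but the keys and values are invented for illustration):

apiVersion: v1
kind: ConfigMap
metadata:
    name: collector-configmap-1.1.0
data:
    QUEUE_HOST: queue-service # hypothetical plain-text setting
    DB_HOST: db-service # hypothetical plain-text setting

---

apiVersion: v1
kind: Secret
metadata:
    name: collector-secrets-1.0.0
type: Opaque
stringData:
    VK_API_SECRET: "***" # hypothetical sensitive value

Every container that references these objects through envFrom receives their keys as environment variables.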

Example of a config with Labels

apiVersion: apps/v1
kind: Deployment
metadata:
    name: deployment-name
    labels:
        app: deployment-label-app
spec:
    selector:
        matchLabels:
            app: pod-label-app
    template:
        metadata:
            name: pod-name
            labels:
                app: pod-label-app
        spec:
            containers:
                -   name: container-name
                    image: mrsuh/rent-parser:1.0.0
                    ports:
                        -   containerPort: 9080

---

apiVersion: v1
kind: Service
metadata:
    name: service-name
    labels:
        app: service-label-app
spec:
    type: NodePort
    selector:
        app: pod-label-app
    ports:
        -   protocol: TCP
            port: 9080
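
The Service finds its Pods through labels: its selector (app: pod-label-app) must match the labels of the Pod template inside the Deployment, so traffic sent to the Service reaches those Pods. The Deployment's own label (deployment-label-app) is separate and only identifies the Deployment object itself.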

Getting Ready to Move

Simplifying Features

To make the service more stable and predictable, I removed the extra components that didn't work well and rewrote some of the core ones.
I decided to stop using:

Service components

After all the changes, the service now looks like this:

Building Docker Images

To manage and monitor components in a consistent way, I decided to:

The Docker images themselves don’t have anything special.

Developing k8s Configuration

Now that I had the components in Docker images, I started creating the k8s configuration.

All components that run as daemons were set up in Deployment.
Each daemon needs to be accessible inside the cluster, so all of them have a Service.
Tasks that need to run on a schedule were set up as CronJob.
Static files (like images, JavaScript, and CSS) are stored in the view container, but they need to be served by an Nginx container. Both containers are in the same Pod.

Containers in a Pod don't share a file system by default, but you can copy all the static files into a shared volume (an emptyDir) when the Pod starts. That volume is shared between the containers, but only within the same Pod.

Example of a config with emptyDir

apiVersion: apps/v1
kind: Deployment
metadata:
    name: view
spec:
    selector:
        matchLabels:
            app: view
    replicas: 1
    template:
        metadata:
            labels:
                app: view
        spec:
            volumes:
                -   name: view-static
                    emptyDir: {}
            containers:
                -   name: nginx
                    image: mrsuh/rent-nginx:1.0.0
                    volumeMounts:
                        -   name: view-static
                            mountPath: /var/www/html
                -   name: view
                    image: mrsuh/rent-view:1.1.0
                    volumeMounts:
                        -   name: view-static
                            mountPath: /var/www/html
                    lifecycle:
                        postStart:
                            exec:
                                command: ["/bin/sh", "-c", "cp -r /app/web/. /var/www/html"]

The collector component is used in both Deployment and CronJob.
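
A CronJob manifest for the collector might look roughly like this; only the image comes from the real config, while the object name, schedule, and console command are assumptions for illustration:

apiVersion: batch/v1beta1 # batch/v1 on newer clusters
kind: CronJob
metadata:
    name: collect-cron
spec:
    schedule: "*/10 * * * *" # hypothetical: run every 10 minutes
    jobTemplate:
        spec:
            template:
                spec:
                    restartPolicy: OnFailure
                    containers:
                        -   name: collect-cron
                            image: mrsuh/rent-collector:1.3.1
                            command: ["php"]
                            args: ["bin/console", "app:collect"] # hypothetical command name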

All these components need to access the vk.com API and share the same access token. To handle this, I used a PersistentVolumeClaim. The volume is mounted into each Pod and shared between them, but only within the same node.

Example of a config with PersistentVolumeClaim

apiVersion: apps/v1
kind: Deployment
metadata:
    name: collector
spec:
    selector:
        matchLabels:
            app: collector
    replicas: 1
    template:
        metadata:
            labels:
                app: collector
        spec:
            volumes:
                -   name: collector-persistent-storage
                    persistentVolumeClaim:
                        claimName: collector-pv-claim
            containers:
                -   name: collect-consumer
                    image: mrsuh/rent-collector:1.3.1
                    volumeMounts:
                        -   name: collector-persistent-storage
                            mountPath: /tokenStorage
                    command: ["php"]
                    args: ["bin/console", "app:consume", "--channel=collect"]

                -   name: parse-consumer
                    image: mrsuh/rent-collector:1.3.1
                    volumeMounts:
                        -   name: collector-persistent-storage
                            mountPath: /tokenStorage
                    command: ["php"]
                    args: ["bin/console", "app:consume", "--channel=parse"]
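
The claim itself is declared as a separate object. A minimal sketch (the claim name matches the Deployment above; the access mode and size are my assumptions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
    name: collector-pv-claim
spec:
    accessModes:
        -   ReadWriteOnce
    resources:
        requests:
            storage: 100Mi # assumption: the shared token file needs very little space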

A PersistentVolumeClaim is also used to store database data.
Here’s the final structure (each block groups the Pods of one component):

Setting up the k8s Cluster

First, I set up the cluster locally using Minikube.
Of course, there were some errors, so the following commands helped me a lot:

kubectl logs -f pod-name
kubectl describe pod pod-name 

After I learned how to set up a cluster in Minikube, it was easy for me to set it up in DigitalOcean.
In conclusion, I can say that the service has been working steadily for 2 months. You can see the full configuration here.