Running a data processing job in Kubernetes
In this section, we will deploy the simple data processing job from Chapter 1 on Kubernetes. We have already developed the job (https://github.com/PacktPublishing/Bigdata-on-Kubernetes/blob/main/Chapter01/run.py) and written a Dockerfile to package it into a container image (https://github.com/PacktPublishing/Bigdata-on-Kubernetes/blob/main/Chapter01/Dockerfile_job).
Now, we have to build a Docker image and push it to a repository that’s accessible to Kubernetes.
docker build --platform linux/amd64 -f Dockerfile_job -t <USERNAME>/dataprocessingjob:v1 .
docker push <USERNAME>/dataprocessingjob:v1
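Note that pushing the image requires you to be authenticated with the target registry. Assuming Docker Hub is used here, you can log in first with the following command:

docker login -u <USERNAME>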
With the image available in the registry, we can create a Kubernetes job to run our data processing task. Here’s an example job manifest:
job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dataprocessingjob
spec:
  template:
    spec:
      ...
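To make the manifest runnable, the Pod template needs a container that references the image we pushed earlier, plus a restart policy. A minimal sketch could look like the following; the container name, backoffLimit, and restartPolicy values are illustrative assumptions, and the image tag matches the one built in the previous step:

apiVersion: batch/v1
kind: Job
metadata:
  name: dataprocessingjob
spec:
  backoffLimit: 3                 # illustrative: retry a failed Pod up to 3 times
  template:
    spec:
      containers:
      - name: dataprocessing      # illustrative container name
        image: <USERNAME>/dataprocessingjob:v1
      restartPolicy: Never        # Jobs require Never or OnFailure

You can then submit the job and check its progress with kubectl:

kubectl apply -f job.yaml
kubectl get jobs
kubectl logs job/dataprocessingjob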