Setting up an Air-Gapped Raspberry Pi 5 Cluster - Phase 1
I've been inspired to create a cluster out of Raspberry Pis because of the challenge it poses. Currently I have two Raspberry Pis and want to set up a cluster between the two, then in the future add more nodes and eventually have my own tiny server that can do whatever I configure it to do.
First, though, I need to lay the foundation, so I will document everything I did to get the Raspberry Pis running, how to connect the nodes, and how to set them up for Kubernetes, the clustering technology I want to learn.
Halfway through the project I switched to an air-gapped approach because my internet is not stable; to learn and understand Kubernetes, I'm simply going to run it on an isolated network.
Here is a summary of what I have in my possession to get all this to work:
- Ethernet Cables
- Network Switch
- Raspberry Pis
- 2.5" SSD (Optional)
- SATA-to-USB Adapter (Optional)
- Cluster Case (Optional)
Prepping the Raspberry Pi 5
I used the official Raspberry Pi Imager to image Raspberry Pi OS Lite onto the 2.5" SSD for when I connect it to the Raspberry Pi 5. This image has no desktop, since I expect my cluster to be headless with no peripherals. So I focused on two things: making sure the network is set on first boot, and that the Pi can be powered over Ethernet. This gives me a true experience of handling a headless server.
There were two files that I edited to make this possible:
- config.txt
  - boot_delay=5
  - Added usb_max_current_enable=1 to allow PoE (Power over Ethernet); without it, the Raspberry Pi refused to boot at all
  - I found the boot delay made the first boot more consistent, although a restart may still be required
- cmdline.txt
  - Appended cgroup_enable=cpuset cgroup_enable=memory ip=192.168.1.1 to the end of the command
  - This assigns a static IP to the Raspberry Pi, which simplifies configuration greatly. Kubernetes also requires these cgroups to be enabled
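For reference, the additions look like this. In config.txt:

boot_delay=5
usb_max_current_enable=1

And appended to the existing single line in cmdline.txt (it must remain one line):

cgroup_enable=cpuset cgroup_enable=memory ip=192.168.1.1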
Setting up the network
In this air-gapped environment, the machines must all communicate through a network switch. I have a basic unmanaged switch that is relatively easy to set up; in the future I'm thinking about getting a more advanced switch once I have enough nodes installed. Regardless, the main point is getting the machines to communicate with each other, which is done by putting their IPs on the same subnet. In my case the subnet is 192.168.1.0/24. As seen above, one of my machines has the IP 192.168.1.1; for my admin box, where I SSH into the Raspberry Pi machines, I've set it to 192.168.1.5, and I can expand that scheme when I add more nodes.
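The admin box needs its static address set by hand as well. On a Linux box, a quick non-persistent way to do this looks like the following (the interface name eth0 is an assumption; check yours with ip link):

sudo ip addr add 192.168.1.5/24 dev eth0

For an address that survives reboots, configure it through your distribution's network manager instead.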
Make sure the network switch connects the Raspberry Pi and the admin box at this point.
After this, I turned on the switch, which provided power to the Raspberry Pi as planned. After waiting for a minute, I tested the LAN connection by pinging the IP 192.168.1.1, and I received a response, meaning that I had successfully set up an isolated network between my desktop and the Raspberry Pi. I then used SSH to enter the headless Raspberry Pi node without any peripherals.
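Those checks look like this from the admin box (the username is whatever you set in the Imager; pi is shown as a placeholder):

ping -c 3 192.168.1.1
ssh pi@192.168.1.1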
Gathering our dependencies
There are a lot of dependencies and configurations needed for the air-gapped cluster to initialize successfully. I'm going to organize them by tool: Kubernetes will have its dependencies and Docker will have its own. The architecture of my packages is arm64, as shown below, to match the Raspberry Pi; please ensure the architecture of the packages and your server match.
Here is the list:
- Kubernetes
- kubelet:arm64
- kubeadm:arm64
- kubectl:arm64
- kubernetes-cni:arm64
- cri-tools:arm64
- cdebconf:arm64
- conntrack:arm64
- install-info:arm64
- libdebian-installer4:arm64
- libpcre2-8-0:arm64
- libsmartcols1:arm64
- libtextwrap1:arm64
- libc6:arm64
- libcgroup2:arm64
- cgroup-tools:arm64
- Docker
- iptables:arm64
- libip6tc2:arm64
- docker-ce:arm64
- docker-ce-cli:arm64
- containerd.io:arm64
- docker-buildx-plugin:arm64
- docker-compose-plugin:arm64
- Images
- registry.k8s.io/kube-apiserver:v1.32.3
- registry.k8s.io/kube-controller-manager:v1.32.3
- registry.k8s.io/kube-scheduler:v1.32.3
- registry.k8s.io/kube-proxy:v1.32.3
- registry.k8s.io/coredns/coredns:v1.11.3
- registry.k8s.io/pause:3.10
- registry.k8s.io/etcd:3.5.16-0
- flannel/flannel-cni-plugin:v1.6.2-flannel1
- flannel/flannel:v0.26.5
- registry (the official Docker registry image)
- Configs
- kube-flannel.yml from https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
- CNI plugins from https://github.com/containernetworking/plugins/releases/download/$cni_version/cni-plugins-linux-$arch-$cni_version.tgz
- crictl tools and config from https://github.com/kubernetes-sigs/cri-tools/releases/download/$crictl_version/crictl-$crictl_version-linux-$arch.tar.gz
- containerd.toml modified from the default config, obtained via the containerd config default command
- k8s.conf created for kernel networking tuning
- kubeadm_config.yaml modified from the default config, obtained via the kubeadm config print init-defaults command
- daemon.json for docker tuning
- How to retrieve the dependencies
  - Use the package manager to download the Kubernetes and Docker packages with the apt-get download $packages_here command (see the consolidated sketch after this list)
  - Use docker to save the images
    - Pull the images for your architecture using docker pull --platform=linux/arm64 $image_here
    - Save each image into a tar using docker save -o image_file.tar $image_here
    - Make sure permissions on the image tar files are set to 644
  - Use curl/wget to save the config files from trusted sources
  - Use the default config files as a base and modify them to suit your needs
    - Consult my configs to see exactly what I changed to get this to work
    - Consult each of the following tools for the command that prints its default config: containerd, kubeadm, crictl
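Pulled together, the retrieval step on an internet-connected machine looks roughly like this sketch (an amd64 box needs the arm64 architecture added first; only a couple of representative packages and one image are shown):

sudo dpkg --add-architecture arm64
sudo apt-get update
apt-get download kubelet:arm64 kubeadm:arm64 kubectl:arm64
docker pull --platform=linux/arm64 registry.k8s.io/pause:3.10
docker save -o pause.tar registry.k8s.io/pause:3.10
chmod 644 *.tar
containerd config default > containerd.toml
kubeadm config print init-defaults > kubeadm_config.yaml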
My intent for this project was to learn how to set up a cluster and use it to run applications, trial by fire. Most of my issues were related to configuration and networking, which I'll detail further below. I learned a lot from getting this cluster to run in an air-gapped environment: making SSL certificates, running a registry for images, learning how to use iptables, learning what Kubernetes needs to work, and improving my skill at reading logs to spot issues. First, I'm going to go into great detail on how to install the cluster manually, as that is necessary to understand the automation script I created to bootstrap it.
Installing the dependencies
- Installing all packages using the package manager
  - Make sure to disable the sources.list repositories; since this is an air-gapped environment we cannot have the package manager looking to outside sources
  - apt -q install $packages_dir/*
  - This installs all packages under the specified directory, assuming all packages are contained in a single directory
- Unpacking CNI and crictl
  - Use the tar command to unpack the cni-plugins and crictl archives into the appropriate system directories
  - Since the names are specific I simply used a wildcard to select the tars, but best practice would be to spell out each file name, using metadata variables for the versions
  - tar -C /opt/cni/bin -xzf $package_dir/configs/cni-plugins-linux-*
  - tar -C /usr/local/bin -xzf $package_dir/configs/crictl-*
- Disabling swap
  - dphys-swapfile swapoff is the command to disable swap on the Raspberry Pi 5
  - For other machines, consult the instructions on how to disable swap
- Loading kernel modules
  - modprobe -a vxlan br_netfilter
  - These modules are necessary for flannel to work; note the -a flag, which tells modprobe to insert both modules (without it, the second name would be treated as a parameter of the first module)
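To keep these settings across reboots, something like the following should work (paths assumed from standard systemd and Raspberry Pi OS conventions; verify on your own system):

printf 'vxlan\nbr_netfilter\n' | sudo tee /etc/modules-load.d/k8s.conf
sudo systemctl disable dphys-swapfile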
The next section details how to prepare the image tars for use by the cluster; this is where our cluster will get the images it spins up.
Creating a Registry and uploading the images
This is a major component of the air-gapped cluster. If your systems can already communicate with a registry over https, then this section can be skipped. My solution is to create a local registry within the machine for the cluster to pull images from when it first boots. It requires only the official Docker registry image and Docker itself to host the registry.
Make sure to also have the images required for the cluster and the registry ready in tar files as previously shown.
- Loading Images onto Docker
First make sure Docker is installed on the system along with its dependencies, as covered above. Then we load each tar file into an image using the following command:
docker load -i path/to/image.tar
An example would be the registry image as shown in the following command:
docker load -i registry.tar
This gives us the ability to transport images without requiring an external internet connection. Now we get to run a local registry for the rest of the cluster to pull from.
- Running the registry
Once the registry has been loaded into Docker as an image, we run it on a specific port using certificates. I chose port 443, since Kubernetes requires an https connection anyway. As for the certificates that enable https, I created my own and deployed the custom root authority; to learn how I did this, see my documentation on creating a root authority using OpenSSL.
Running the registry uses the following command:
sudo docker run -d --restart=always \
--name=registry \
-v "$certs_dir"/:/certs \
-e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/registry.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/registry.key \
-p 443:443 \
registry
- -d detaches the container so the registry can run in the background
- --restart=always ensures the container restarts if it fails for any reason
- --name=registry makes the container easier to track
- -v mounts the certificates we made earlier into the container for use
- -e sets an environment variable within the container
  - REGISTRY_HTTP_ADDR specifies the address the registry listens on
  - REGISTRY_HTTP_TLS_CERTIFICATE is the location of the certificate within the container
  - REGISTRY_HTTP_TLS_KEY is the location of the certificate key within the container
- -p publishes port 443 from the container to the host machine
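A quick way to confirm the container actually stayed up before moving on:

sudo docker ps --filter name=registry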
- Using the Registry
When Kubernetes boots up, part of the process pulls from this local registry, based on our configuration. Therefore we need to have pushed the images it will need; refer to the earlier list of images, which is just the Kubernetes images and the flannel images. Whenever we manipulate images we need to specify the version as well; for example, the pause image needs its name and version separated by a colon, as in pause:3.10.
To push these images into our local registry use the following commands:
docker tag image:version localhost:443/image:version
docker push localhost:443/image:version
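As a concrete example with the pause image (the exact repository path you tag it under should line up with the imageRepository you set in kubeadm_config.yaml):

docker tag registry.k8s.io/pause:3.10 localhost:443/pause:3.10
docker push localhost:443/pause:3.10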
To check whether our registry has received the images, we use the curl command from two separate boxes: the registry host itself and a separate node. The host itself, because the machine starting the cluster must be able to pull from its own registry; the separate node, to check that it can pull images from an already established registry on a different box. The command I used was the following:
curl -vv https://192.168.1.1/v2/_catalog/
This shows the network interaction between the machines, verifying that the certificates work, and lists the images the registry holds. This becomes useful once firewalls get introduced later.
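A healthy reply ends with a JSON body from the registry's catalog endpoint, shaped roughly like this (the names will match whatever you pushed):

{"repositories":["kube-apiserver","kube-proxy","pause"]}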
Creating and placing the Configs
All of these configurations are necessary for the system to initialize a cluster.
- cp containerd.toml /etc/containerd/config.toml
  - Necessary for the containerd settings, as containerd is how kubeadm installs images and boots up the cluster
  - Settings that I changed: sandbox_image set to the pause version from the Kubernetes images, disabled_plugins cleared so the CRI plugin stays enabled, the endpoint under the registry.mirrors settings pointed at our local registry, and SystemdCgroup set to true under containerd.runtimes.runc.options
  - Here is my version: containerd.toml
- cp k8s.conf /etc/sysctl.d/
  - Necessary for Kubernetes networking; see the sketch after this list
  - Here is my version: k8s.conf
- cp crictl.yaml /etc/crictl.yaml
  - Not necessary, but useful for debugging with crictl when kubeadm fails to pull images for whatever reason
  - Here is my version: crictl.yaml
- kube-flannel.yaml
  - The only settings I changed were the image fields, where I replaced ghcr.io with my control node's IP for the local registry
  - Here is my version: kube-flannel.yaml
- kubeadm_config.yaml
  - I changed the following settings from the default config: advertiseAddress, imagePullPolicy, name, imageRepository, kubernetesVersion, and podSubnet
  - Here is my version: kubeadm_config.yaml

Make sure to restart the daemons so they pick up the configs, with the following commands:
sudo sysctl --system
sudo systemctl restart containerd
sudo systemctl restart docker
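For reference, the networking sysctls a k8s.conf like mine carries are the standard Kubernetes prerequisites (a sketch; compare against your own file):

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1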
Creating the Firewall and Networking
Using iptables was confusing at first, but after tinkering with the ports and IP addresses the cluster needs, I started to understand how the chains and rules work. One of the first things I did was change the default policy of the INPUT and FORWARD chains to drop all packets; that way any packet that does not match an existing rule is dropped. Additionally, keep port 22 open so our remote sessions don't get blocked; we specify the source IP and interface to make the rule more robust and secure.
iptables -A INPUT -s $source_ip -i eth0 -p tcp --dport 22 -j ACCEPT
iptables -P INPUT DROP
iptables -P FORWARD DROP
Before we open ports, there are two important rules that need to be added to prevent basic communication failures in the cluster. These rules let the host accept traffic from already-established connections and communicate with itself. The first rule matters whenever the host makes a connection to a trusted IP and expects a response; it beats hunting down how and where content is sent back for every response, in my case an image from the local registry.
iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
Opening a port is done using the following template command:
iptables -A INPUT -s $source_ip -p tcp --dport $port_number -j ACCEPT
- We are adding rules to the INPUT chain
- Specifying a source IP or a subnet with the -s option, where our subnet is 192.168.1.0/24
- -p lets us choose a protocol, tcp in all cases, except for 8472 where udp is used
- --dport lets us specify the destination port number
- -j specifies where to send the packet if it satisfies the chain, ACCEPT in this case
Both control plane and worker nodes need to have these ports open using the tcp protocol:
- 443
- 10250
- 8472, the exception: this one uses the udp protocol, since we're using the vxlan backend
Kubernetes control plane and worker nodes need to have their respective ports open using the tcp protocol.
- For control plane nodes
- 6443
- 2379 and 2380
- 10259
- For worker nodes
- 10256
- 30000 to 32767
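Instantiating the template above for every port, the control-plane rule set looks like this (the source subnet is mine; adjust to yours):

for port in 443 6443 2379 2380 10250 10259; do
  iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport $port -j ACCEPT
done
iptables -A INPUT -s 192.168.1.0/24 -p udp --dport 8472 -j ACCEPT

Worker nodes open 443, 10250, and 8472 the same way, plus their own ports:

iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 10256 -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 30000:32767 -j ACCEPT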
For the flannel pod network, we need to allow traffic to and from the pod subnet; the default is 10.244.0.0/16, as shown in the commands. Without this, pods could not communicate with the cluster or with each other. We also add FORWARD rules, since flannel relies on forwarding for pod-to-pod communication.
iptables -A INPUT -s 10.244.0.0/16 -j ACCEPT
iptables -A FORWARD -s 10.244.0.0/16 -j ACCEPT
iptables -A FORWARD -d 10.244.0.0/16 -j ACCEPT
This last piece is required on all nodes so each one can reach the local registry and the other nodes. In my experience it prevented traffic from being dropped when retrieving images and allowed communication with my local registry. The command creates a default route through the ethernet interface:
ip route add default dev eth0
Starting the Cluster, Adding more Nodes, and Testing
This is by far the easiest step, but it's where you'll see the problems begin to occur. To jumpstart the cluster, simply use the following command: kubeadm init --config kubeadm_config.yaml
Here --config takes in the modified kubeadm_config.yaml file. It'll take a minute or two, but if there's no issue the command guides you on how to set up permissions to start using kubectl, the command used to manage the cluster. Make sure to remember or save the instructions printed when the cluster is initialized.
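The permission setup it prints boils down to copying the admin kubeconfig into your home directory; kubeadm's standard instructions look like this:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config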
Before we can join another node to the cluster, run the following command: kubectl apply -f kube-flannel.yaml
This applies the kube-flannel config to create the pods that facilitate pod communication for the cluster. Always check the health of all our pods using kubectl get pods -A, which lists ALL pods regardless of namespace.
If there is an issue, I found that kubectl logs, kubectl describe, or searching the internet for ways to read the logs in greater detail is the most useful route to finding the problem.
If everything is running smoothly, we can use the join instructions we saved earlier to add another node, which involves redoing the entire process of installing dependencies, since our network is air-gapped. However, I have a handy script that installs everything for me; refer to Automation below to read more about it. After the node is prepped, run the join command printed by the first node, which looks like the following:
kubeadm join 192.168.1.1:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
Everything should now be put together, and we can start deploying. The first thing I did was deploy a busybox image that I had saved earlier, to test whether pods can communicate with each other.
The command I used to do this was kubectl run -it --rm tester --image=busybox:latest --restart=Never -- sh
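From that shell, a quick sanity check is resolving the cluster's own DNS (note that nslookup in some busybox builds is known to be flaky):

nslookup kubernetes.default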
This gave me a shell to start poking around the cluster. The next section covers the major issues I encountered that prevented the cluster from starting or working properly.
Major issues I encountered getting the air-gapped cluster to work
- Architecture issue
  - Symptoms: refusal to install; an overwhelming number of extra packages listed to install
  - Cause: wrong architecture for the packages; the system was arm64, the packages were amd64
  - Solution: I changed the architecture of all the packages from amd64 to arm64 by adding the arm64 architecture to my main machine's package manager and downloading from there
- Config issue
  - Symptoms: refusal to jumpstart the cluster, some containers in the cluster stuck in a FAILED status, and the container registry unreachable
  - Cause: configuration of the config files; wrong versions specified, wrong repositories specified, or misplaced settings
  - Solutions: obtain the default config files and edit the appropriate settings, use crictl to find errors (a misplaced setting in one case), replace the specified versions with current ones, and replace domains with local ones for air-gapped capability
- Networking issue
  - Symptoms: refusal to pull from the registry during spin-up and inability to communicate with other nodes
  - Cause: absence of an https connection, absence of a default route, and missing firewall rules
  - Solutions: establishing a local root CA enabled https communication, establishing a default route allowed communication with the local registry, and allowing established and related connections through the firewall fixed the dropped responses
Automation
- I had a directory that contained all the dependencies, images, configs, and certificates needed for the cluster to boot. Having a single package directory for a script to walk through and install from made it much easier to bootstrap Kubernetes in the air-gapped environment. Here's the file structure:
- package
- certs
- contains the certificates for the registry and the root CA certificate to distribute
- configs
- contains the tar of crictl and cni-plugins, with all other necessary config files
- docker_packages
- all packages necessary for docker to operate, self-explanatory
- images
- location of all tarballs for docker images
- kube_packages
- location of all packages necessary for kubernetes
- kubernetets_install.sh is a script I made to bootstrap both control and worker nodes. It automatically installs the dependencies, loads the images, creates the registry, places the configs, and creates the iptables firewall rules. It does not set up the certificates or modify the sources list automatically; I prefer to do those manually to strengthen my memory of them as I add more nodes and battle-test the script.
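For a sense of its shape, the script reduces to a skeleton like this (illustrative only; the real script has more steps and error handling, and the paths follow the package tree above):

#!/usr/bin/env bash
set -euo pipefail
pkg=./package
# 1. install every local deb
apt -q install "$pkg"/docker_packages/* "$pkg"/kube_packages/*
# 2. load all saved images into docker
for tar in "$pkg"/images/*.tar; do docker load -i "$tar"; done
# 3. place configs and reload the daemons
cp "$pkg"/configs/containerd.toml /etc/containerd/config.toml
cp "$pkg"/configs/k8s.conf /etc/sysctl.d/
sysctl --system && systemctl restart containerd docker
# 4. firewall rules as in the iptables section above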
References
These references were the most useful articles for getting my cluster to work, solving the issues I encountered, and coming to understand the system that is a Kubernetes cluster.
- Bootstrap an Air Gapped Cluster With Kubeadm By Rob Mengert
- Ports and Protocols
- Understanding Flannel by mvallim
What's next?
- I disabled internet sharing, since I've decided on an "air-gapped" cluster, meaning this cluster will be isolated from the internet. I changed the connection method from shared to manual. (Previously I routed my Pis through my main box to reach the internet via ethernet.)
- For the next phase, I want to actually deploy things onto the cluster to see how it does. I intend to explore options like Helm, Ansible, and similar tools to handle the configuration of nodes and pods; after observing how pods are deployed and run, I see even more reason to explore those technologies. Something I definitely want to try is hosting a website from the cluster and accessing it from an outside machine, so that this air-gapped cluster can securely serve resources, like a website, to users.
- My end goal is a CI/CD pipeline where I can upload code to some repository, pull those changes onto my admin box, then transfer them to the cluster, which will apply the changes. I may need more applications to test this pipeline, but we'll see; I haven't gotten that far yet.