Kubernetes Infrastructure
Production-grade multi-node Kubernetes cluster built on bare-metal hardware — 7 machines, 1 switch, 1 router, 1 radio
Overview
This project is a self-designed, self-built Kubernetes homelab running across 3 desktop PCs and 4 laptops, interconnected through a single managed switch, a dedicated router handling inter-VLAN routing and NAT, and a wireless radio providing mesh backhaul for the laptop nodes. The goal was to replicate a production-grade container orchestration environment entirely from consumer hardware — no cloud, no rented rack space, no managed services.
The cluster runs kubeadm-bootstrapped Kubernetes on Ubuntu Server VMs provisioned inside Proxmox, with Calico as the CNI for pod networking and network policy enforcement, MetalLB for bare-metal LoadBalancer services, Longhorn for distributed persistent storage across nodes, and a full observability stack built on Prometheus + Grafana + Loki. Deployments are automated through Bash scripts and Docker Compose for auxiliary services, with GitOps workflows managing application manifests.
Every component — from the physical cable runs and VLAN trunking to the Kubernetes RBAC policies and Grafana dashboards — was planned, configured, and documented by hand. The cluster currently hosts all self-hosted services (Nextcloud, AI model, DNS, email, wiki, search engine, photo server) in production.
Physical Topology
The entire infrastructure runs from a single physical location. Three desktop PCs serve as the primary compute and storage nodes, while four laptops act as lightweight worker nodes and provide redundancy. All wired nodes connect to a managed switch with 802.1Q VLAN trunking. The router handles inter-VLAN routing, DHCP reservation, NAT to the public internet, and port forwarding for externally-facing services. A wireless radio bridges the laptop nodes into the cluster network over a dedicated 5 GHz backhaul link.
Network Architecture
Network segmentation is enforced at the switch level using three VLANs. The router performs inter-VLAN routing with firewall rules restricting lateral movement between segments. This mirrors enterprise network design where management, application, and storage traffic are isolated to prevent blast radius expansion during a compromise.
| VLAN | ID | Subnet | Purpose |
|---|---|---|---|
| Management | 10 | 10.10.10.0/24 | SSH access, Proxmox UI, router admin, switch management, IPMI/BMC |
| Cluster | 20 | 10.20.20.0/24 | Kubernetes API server, pod-to-pod (Calico overlay), service mesh, MetalLB external IPs |
| Storage | 30 | 10.30.30.0/24 | Longhorn replication, NFS mounts, ZFS snapshot sync, backup traffic |
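As a quick illustration of the segmentation above, the following sketch maps an address to its VLAN using the three /24 subnets from the table. The function name `vlan_for_ip` is hypothetical, and the prefix match only works because every subnet here is a /24.

```shell
# Map an IPv4 address to the VLAN it belongs to, using the /24 subnets
# from the table above. Prints "unknown" for anything else.
vlan_for_ip() {
  case "$1" in
    10.10.10.*) echo "Management (VLAN 10)" ;;
    10.20.20.*) echo "Cluster (VLAN 20)" ;;
    10.30.30.*) echo "Storage (VLAN 30)" ;;
    *)          echo "unknown" ;;
  esac
}

vlan_for_ip 10.20.20.10   # control plane address used later in this document
vlan_for_ip 10.30.30.5    # hypothetical storage-side address
```

A helper like this can be handy in provisioning scripts to sanity-check that a node's addresses land in the intended segment before firewall rules are written.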
Kubernetes Architecture
Tech Stack
| Layer | Technology | Role |
|---|---|---|
| Hardware | 3 × Desktop PC, 4 × Laptop | Bare-metal compute and storage nodes |
| Networking | Managed Switch (802.1Q), Router, 5 GHz Radio | VLAN segmentation, NAT, wireless mesh backhaul |
| Hypervisor | Proxmox VE 8.x | VM provisioning, snapshots, live migration |
| OS | Ubuntu Server 24.04 LTS | Minimal server images inside Proxmox VMs |
| Container Runtime | containerd | CRI-compliant runtime for Kubernetes |
| Orchestration | Kubernetes (kubeadm) | Cluster bootstrapping, upgrades, node management |
| CNI | Calico | Pod networking, BGP peering, NetworkPolicy enforcement |
| Load Balancer | MetalLB (L2 mode) | External IP allocation for bare-metal LoadBalancer services |
| Ingress | Nginx Ingress Controller | HTTP/HTTPS routing, TLS termination, path-based routing |
| Storage | Longhorn | Distributed block storage with 3x replication across nodes |
| TLS | Let’s Encrypt + cert-manager | Automated certificate provisioning and renewal |
| Monitoring | Prometheus + Node Exporter | Metrics collection from nodes, pods, and Kubernetes internals |
| Dashboards | Grafana | Visualization, alerting rules, and SLA tracking |
| Logging | Loki + Promtail | Centralized log aggregation and querying |
| Automation | Bash, Docker Compose | Cluster provisioning scripts, auxiliary service orchestration |
| DNS | Pi-hole + Unbound | Internal cluster DNS resolution and ad/threat blocking |
Build Process
Hardware Preparation & Network Wiring
Each desktop PC and laptop was prepared with a clean BIOS/UEFI configuration, boot order set to PXE/USB, and hardware diagnostics run to verify RAM, disk health, and thermal performance. Cat6 Ethernet cables were run from each desktop to the managed switch. The switch was configured with three VLANs (10, 20, 30) and trunk ports for the router uplink. The wireless radio was mounted and configured in bridge mode on VLAN 20 to extend the cluster network to the laptop nodes over 5 GHz.
! Switch VLAN configuration (IOS-style CLI example)
enable
configure terminal
vlan 10
 name MANAGEMENT
vlan 20
 name CLUSTER
vlan 30
 name STORAGE
! Trunk port to router
interface GigabitEthernet0/1
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
! Trunk ports for desktop nodes: Proxmox tags VM traffic for VLAN 20 and 30,
! and the Proxmox UI is reached over VLAN 10
interface range GigabitEthernet0/2-4
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
! Trunk port to wireless radio
interface GigabitEthernet0/5
 switchport mode trunk
 switchport trunk allowed vlan 20
end
Proxmox Installation & VM Provisioning
Proxmox VE 8.x was installed on each desktop PC as the base hypervisor. Ubuntu Server 24.04 LTS VMs were created on each node with resource allocations matched to the hardware. VMs were configured with two virtual NICs: one on VLAN 20 (cluster traffic) and one on VLAN 30 (storage traffic). Cloud-init templates were used to standardize hostname, SSH keys, and network configuration across all VMs.
# Create a cloud-init template VM on Proxmox
qm create 9000 --name ubuntu-cloud --memory 4096 --cores 2 \
  --net0 virtio,bridge=vmbr0,tag=20 \
  --net1 virtio,bridge=vmbr0,tag=30 \
  --scsihw virtio-scsi-single
# Import Ubuntu cloud image
qm set 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --boot order=scsi0
qm set 9000 --serial0 socket --vga serial0
# Configure cloud-init defaults ("changeme" is a placeholder password)
# No gateway on the storage NIC, so the default route stays on VLAN 20;
# replace X with each node's host address
qm set 9000 --ciuser taki --cipassword changeme \
  --sshkeys ~/.ssh/authorized_keys \
  --ipconfig0 ip=dhcp \
  --ipconfig1 ip=10.30.30.X/24
# Convert the VM into a template, then clone it for each node
qm template 9000
for i in 1 2 3 4 5 6; do
  qm clone 9000 10${i} --name k8s-node-${i} --full
  qm start 10${i}
done
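Because every clone needs its own storage-VLAN address, one option is to derive it from the node index. The sketch below is a dry run that only prints commands. The VMID and name follow the clone loop above, while the host octet (10 + index) and the function name `node_plan` are hypothetical choices, not taken from the original setup.

```shell
# Print the VMID, VM name, and storage-VLAN address for node index $1.
# VMID and name follow the clone loop above; the host octet (10 + index)
# is a hypothetical numbering scheme.
node_plan() {
  echo "10${1} k8s-node-${1} 10.30.30.$((10 + $1))/24"
}

# Dry run: show the per-clone cloud-init override that would be applied.
for i in 1 2 3; do
  node_plan "$i" | while read -r vmid name addr; do
    echo "qm set ${vmid} --ipconfig1 ip=${addr}   # ${name}"
  done
done
```

Printing the commands first makes it easy to review the address plan before running anything against Proxmox.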
Kubernetes Cluster Bootstrap with kubeadm
All nodes were prepared with the Kubernetes prerequisites: swap disabled, kernel modules loaded (br_netfilter, overlay), sysctl parameters set for IP forwarding, and containerd installed as the CRI runtime. The control plane was initialized on PC-01, and worker nodes joined using the bootstrap token.
# Run on ALL nodes
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay && sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
sudo swapoff -a && sudo sed -i '/swap/d' /etc/fstab
# Install containerd
sudo apt install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
# Install kubeadm, kubelet, kubectl
sudo apt install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update && sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
# Initialize control plane (PC-01)
sudo kubeadm init \
  --pod-network-cidr=192.168.0.0/16 \
  --apiserver-advertise-address=10.20.20.10 \
  --control-plane-endpoint=10.20.20.10:6443
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
# Make the copied kubeconfig readable by the current user
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Join workers (run on each worker node)
sudo kubeadm join 10.20.20.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
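Before running kubeadm on a node, it can help to confirm that the sysctl file actually contains the expected settings. The following is a minimal sketch, assuming the file layout written by the block above. The function name `check_k8s_sysctl` is hypothetical, and the check only inspects the file; it does not confirm the values are loaded in the running kernel.

```shell
# Return 0 only if every sysctl key from the prerequisites block is set to 1
# in file $1. Matches lines of the form "key = 1" with optional spaces.
check_k8s_sysctl() {
  conf="$1"
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward; do
    if ! grep -Eq "^${key}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$conf"; then
      echo "missing or not enabled: ${key}" >&2
      return 1
    fi
  done
  echo "sysctl prerequisites OK"
}

# Usage on a node:
#   check_k8s_sysctl /etc/sysctl.d/k8s.conf
```

Running the check on each node ahead of `kubeadm init` or `kubeadm join` catches a missed configuration step early.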
Calico CNI & Network Policy Deployment
Calico was deployed as the CNI plugin for pod networking with BGP peering between nodes and NetworkPolicy enforcement for micro-segmentation. Network policies were written to restrict pod-to-pod traffic: only explicitly allowed communication paths are permitted, following a default-deny ingress posture.
# Install Calico operator and custom resources
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml
# Verify Calico pods are running
kubectl get pods -n calico-system -w
# Default-deny ingress NetworkPolicy (applied per namespace)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only Nginx Ingress to reach web pods
# (kubernetes.io/metadata.name is the label Kubernetes sets on every namespace)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-web
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
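Since the default-deny policy is applied per namespace, a small generator avoids copying the manifest by hand. This is a sketch rather than the author's actual tooling. The function name `default_deny_policy` is hypothetical, and the namespaces in the loop are taken from the services table in this document.

```shell
# Emit a default-deny ingress NetworkPolicy manifest for namespace $1.
default_deny_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF
}

# Print one manifest per namespace, separated by YAML document markers.
for ns in production ai search; do
  echo "---"
  default_deny_policy "$ns"
done
```

The output can be reviewed as-is or piped into `kubectl apply -f -` once the namespaces exist.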
MetalLB & Nginx Ingress Controller
MetalLB was deployed in L2 (ARP) mode to provide external IP addresses for LoadBalancer services on the bare-metal cluster. An IP pool of 10.20.20.200–250 was reserved on VLAN 20. The Nginx Ingress Controller was deployed as a DaemonSet to handle all HTTP/HTTPS traffic with TLS termination via cert-manager and Let’s Encrypt.
# Install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.5/config/manifests/metallb-native.yaml
# Configure IP address pool
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: cluster-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.20.20.200-10.20.20.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: cluster-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - cluster-pool
# Install Nginx Ingress Controller
# (the stock bare-metal manifest creates a Deployment behind a NodePort Service;
# it must be adjusted to run as a DaemonSet and to use a LoadBalancer Service
# that receives a MetalLB address)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/baremetal/deploy.yaml
# Install cert-manager for automated TLS
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml
# ClusterIssuer for Let's Encrypt (replace <email> with a real contact address)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: <email>
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
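To reason about the MetalLB pool, it can be useful to test whether an address falls inside 10.20.20.200-10.20.20.250 and to count how many addresses are available. The sketch below does both with integer arithmetic. The helper names `ip_to_int`, `in_metallb_pool`, and `pool_size` are hypothetical.

```shell
# Convert a dotted-quad IPv4 address to an integer for range comparisons.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed if address $1 lies inside the MetalLB pool 10.20.20.200-10.20.20.250.
in_metallb_pool() {
  ip=$(ip_to_int "$1")
  [ "$ip" -ge "$(ip_to_int 10.20.20.200)" ] && [ "$ip" -le "$(ip_to_int 10.20.20.250)" ]
}

# Number of addresses MetalLB can hand out from the pool (both ends inclusive).
pool_size() {
  echo $(( $(ip_to_int 10.20.20.250) - $(ip_to_int 10.20.20.200) + 1 ))
}

in_metallb_pool 10.20.20.210 && echo "10.20.20.210 is in the pool"
echo "pool size: $(pool_size)"   # 51
```

The same membership test can be pointed at the router's DHCP lease list to confirm that no lease falls inside the range reserved for MetalLB.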
Longhorn Distributed Storage
Longhorn was deployed for persistent storage with 3x replication across the desktop nodes. Each desktop contributes its SSD/NVMe storage to the Longhorn pool. Storage traffic is isolated on VLAN 30 to prevent replication I/O from contending with cluster API and pod traffic on VLAN 20. Scheduled snapshots and backups to an NFS target provide disaster recovery capability.
# Install Longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.2/deploy/longhorn.yaml
# Set as default StorageClass
kubectl patch storageclass longhorn -p \
  '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# Verify volumes and nodes
kubectl -n longhorn-system get pods
kubectl -n longhorn-system get nodes.longhorn.io
# Example PVC using Longhorn
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 100Gi
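With 3x replication, each claim can consume up to three times its requested size in raw disk across the desktop nodes. The sketch below works through that arithmetic for the PVC above. The function name `raw_gib` is hypothetical, and the result is a rough ceiling for live data that ignores snapshots and Longhorn's thin provisioning.

```shell
# Raw disk (GiB) consumed across the cluster by a volume of $1 GiB
# stored with $2 replicas (defaults to the 3 replicas described above).
raw_gib() {
  echo $(( $1 * ${2:-3} ))
}

echo "nextcloud-data: $(raw_gib 100) GiB raw for a 100 GiB claim"
```

Summing this figure over all claims gives a quick check of whether the combined SSD/NVMe capacity of the three desktops is sufficient.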
Observability Stack — Prometheus + Grafana + Loki
The metrics stack was deployed using the kube-prometheus-stack Helm chart, with Loki and Promtail installed from the Grafana loki-stack chart. Prometheus scrapes metrics from Node Exporter (hardware), kubelet (pod resources), kube-state-metrics (Kubernetes object states), and application-level exporters. Grafana provides pre-built dashboards for cluster health, node resources, pod performance, and Longhorn storage utilization. Loki with Promtail aggregates logs from all pods and system journals into a queryable interface within Grafana.
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Add repos
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword='secureDashboard' \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
# Install Loki + Promtail
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=longhorn \
  --set loki.persistence.size=20Gi
# Expose Grafana via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: [grafana.tyfsadik.org]
      secretName: grafana-tls
  rules:
    - host: grafana.tyfsadik.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monitoring-grafana
                port:
                  number: 80
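Whether 30 days of retention fits in the 50Gi Prometheus volume depends on the ingest rate. A common rule of thumb from the Prometheus storage documentation is retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it. The function name `prom_disk_gib` is hypothetical, and the 10000 samples per second figure is an assumed ingest rate, not a measurement from this cluster.

```shell
# Estimate Prometheus TSDB disk usage in GiB (rounded down):
#   retention_seconds * samples_per_second * bytes_per_sample
# $1 = retention in days, $2 = ingested samples per second,
# $3 = bytes per sample (defaults to 2, the upper end of the usual 1-2 range).
prom_disk_gib() {
  echo $(( $1 * 86400 * $2 * ${3:-2} / 1073741824 ))
}

# 30d retention as configured above; 10000 samples/s is an assumed ingest rate.
echo "estimated usage: $(prom_disk_gib 30 10000) GiB of the 50Gi volume"
```

Substituting the real ingest rate, which Prometheus reports about itself, shows how much headroom the volume has before retention needs to be shortened or the claim enlarged.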
Services Running on the Cluster
| Service | Namespace | Domain | Replicas |
|---|---|---|---|
| Nextcloud (Cloud Storage) | production | cloud.tyfsadik.org | 2 |
| Private AI Model (Ollama) | ai | ai.tyfsadik.org | 1 |
| Pi-hole + Unbound DNS | dns | Internal | 2 |
| Email Server (Postfix/Dovecot) | | tyfsadik.org | 1 |
| SearXNG Search Engine | search | search.tyfsadik.org | 2 |
| Photo Server | media | photo.tyfsadik.org | 1 |
| Wiki Server (Wikipedia Mirror) | wiki | wiki.tyfsadik.org | 1 |
| Grafana Dashboards | monitoring | grafana.tyfsadik.org | 1 |
| Prometheus + Loki | monitoring | Internal | 1 |
Deployment Workflow
Challenges & Solutions
- Laptop nodes dropping from cluster over Wi-Fi: The laptop worker nodes periodically lost connectivity over the wireless bridge, causing kubelet to miss heartbeats and the control plane to mark them as NotReady. Resolved by deploying a dedicated 5 GHz radio in bridge mode with a fixed channel, disabling power management on the laptop NICs (iw dev wlan0 set power_save off), and increasing the node-monitor-grace-period to 60s to tolerate brief interruptions without evicting pods.
- Longhorn replication saturating the network: Initial deployment placed storage replication on the same VLAN as cluster traffic, causing API server latency spikes during large write operations. Resolved by creating a dedicated VLAN 30 for storage traffic and configuring Longhorn to use the VLAN 30 interface for replication.
- MetalLB ARP conflicts with router DHCP: MetalLB’s L2 mode ARP responses conflicted with the router’s DHCP leases when the IP pool overlapped with the DHCP range. Fixed by reserving 10.20.20.200–250 as a static range excluded from DHCP and configuring MetalLB to only advertise within that range.
- etcd performance on a single control plane node: With only one control plane node, etcd write latency spiked during heavy scheduling. Mitigated by placing etcd data on the NVMe drive rather than a SATA SSD, tuning heartbeat-interval and election-timeout, and ensuring no other I/O-heavy workloads run on PC-01.
- TLS certificate provisioning for multiple subdomains: cert-manager’s HTTP-01 solver required each subdomain to be publicly reachable, which conflicted with internal-only services. Resolved by using DNS-01 challenge validation for internal services and HTTP-01 for public-facing ones.
- Resource contention on 8 GB laptop nodes: Laptops with only 8 GB RAM struggled when Longhorn and Prometheus exporters consumed too much memory alongside application pods. Fixed by applying resource limits and requests to all pods, tainting the laptop nodes with node-role=lightweight:PreferNoSchedule, and using node affinity rules to keep heavy workloads on the desktop nodes.
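The laptop taint described in the last item can be applied with `kubectl taint`. The sketch below only prints the commands so they can be reviewed first. The function name `taint_commands` and the node names are hypothetical, since the actual node names are not given here.

```shell
# Print (rather than run) the kubectl commands that would taint each
# laptop node. Node names passed as arguments are hypothetical.
taint_commands() {
  for node in "$@"; do
    echo "kubectl taint nodes ${node} node-role=lightweight:PreferNoSchedule"
  done
}

taint_commands laptop-01 laptop-02 laptop-03 laptop-04
```

PreferNoSchedule is a soft taint: the scheduler avoids the laptops when desktop capacity is available but can still place pods there, which matches their role as lightweight workers that also provide redundancy.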
Security Hardening
- RBAC: Least-privilege service accounts per namespace; no pods run with cluster-admin
- NetworkPolicy: Default-deny ingress on all namespaces; explicit allow rules per service
- Pod Security Standards: restricted profile enforced via Pod Security Admission; no privileged containers
- Image Scanning: Trivy scans on all container images before deployment; CVE alerts sent to Grafana
- Secrets Management: Kubernetes Secrets encrypted at rest with aescbc encryption provider
- SSH Hardening: Key-only authentication, fail2ban on all nodes, management access restricted to VLAN 10
- Firewall Rules: Router ACLs restrict inter-VLAN traffic; only necessary ports open between segments
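The aescbc provider expects a 32-byte key, base64-encoded, in the API server's EncryptionConfiguration. A minimal sketch of generating such a key follows. The function name `gen_aescbc_key` is hypothetical, and `base64 -d` assumes GNU coreutils as shipped with Ubuntu.

```shell
# Generate a random 32-byte key, base64-encoded, suitable for the
# "secret" field of an aescbc provider in an EncryptionConfiguration.
gen_aescbc_key() {
  head -c 32 /dev/urandom | base64
}

key=$(gen_aescbc_key)
# Sanity check: the decoded key must be exactly 32 bytes long.
printf '%s' "$key" | base64 -d | wc -c
```

The key itself is deliberately not printed; it should go straight into the encryption configuration file on the control plane node and be kept out of version control.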
What I Learned
- End-to-end Kubernetes cluster lifecycle: bootstrapping, upgrading, scaling, and troubleshooting with kubeadm
- Bare-metal networking for Kubernetes: VLAN segmentation, BGP with Calico, ARP-based load balancing with MetalLB
- Distributed storage engineering: Longhorn replication, IOPS tuning, and failure recovery across physical nodes
- Enterprise-grade observability: Prometheus metric design, Grafana alerting pipelines, Loki log correlation
- Physical infrastructure design: cable management, switch VLAN configuration, wireless backhaul for cluster nodes
- Security-first architecture: network micro-segmentation, RBAC design, pod security standards, image vulnerability scanning
- Resource management on heterogeneous hardware: taints, tolerations, affinity rules, and resource quotas to balance workloads across nodes with different capabilities
- The discipline of documentation: every configuration change logged, every decision recorded, every diagram kept current