(Part 2) Reducing the Cost of Running a Personal k8s Cluster: Volumes and Load Balancers

In the previous post in this series, we showed how using Spot Instances and Reserved Instances reduces the annual bill for running our Kubernetes cluster from ~$2K to ~$1.2K. In this post, we’ll pursue cost reductions for storage and networking resources, our final two prominent, unoptimized costs.[1] Our quick calculations from the first post in this series show that, with the default Kops configuration, we pay ~$360 annually for EBS (storage) and ~$216 annually for ELBs (networking), for an annual total of just over $500.

These costs are significant. Even more troubling, our current Kubernetes setup uses a unique ELB for each publicly available service, meaning our networking costs will grow linearly with the number of such services running on our cluster. Our goal in this blog post is to show how we can reduce, or at the very least cap, our expenditures for storage and networking resources on our Kubernetes cluster.

Optimizing EBS volumes

We pay for EBS in two different ways with our Kubernetes cluster.

Types of EBS Costs: Root Volumes

First, we pay for the EBS volumes which Kops attaches directly to the master and nodes in our cluster. We call these the root volumes. The default root volume size for the master is 64GB, and the default root volume size for each node is 128GB. Altogether, Kops allocated 320GB. At $0.10 per GB-month, when using gp2 volumes, we pay $32 a month for the root volumes mounted directly to our master and nodes.

I’m a little surprised that Kops uses such a large default volume size, because ssh’ing into the hosts and running a quick df -h showed that they were vastly over-provisioned with respect to storage.

We immediately reduced the rootVolumeSize for both the master and the nodes to 30GB and 64GB respectively, using the method described in the Kops documentation.
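
As a rough sketch of what that change looks like, the fragment below shows the relevant fields of two Kops InstanceGroup specs. The cluster name and instance group names are hypothetical, unrelated fields are omitted, and the apiVersion is an assumption (older Kops releases used kops/v1alpha2), so treat it as an illustration rather than our exact configuration.

```yaml
# Hypothetical InstanceGroup for the master, edited with `kops edit ig <name>`.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: master-us-east-1a               # hypothetical instance group name
  labels:
    kops.k8s.io/cluster: k8s.example.com  # hypothetical cluster name
spec:
  role: Master
  rootVolumeSize: 30     # GB, down from the 64GB default
  rootVolumeType: gp2
---
# Hypothetical InstanceGroup for the worker nodes.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
  labels:
    kops.k8s.io/cluster: k8s.example.com  # hypothetical cluster name
spec:
  role: Node
  rootVolumeSize: 64     # GB, down from the 128GB default
  rootVolumeType: gp2
```

After editing, the change is applied with kops update cluster --yes followed by kops rolling-update cluster --yes, since existing instances must be replaced to pick up the smaller volumes.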

We now pay ~$15 a month, instead of $32, for our hosts’ EBS volumes, cutting our EBS bill in half for a savings of around $180 annually.

Monitoring and Alerting on Root Volume Provisioning

We can again use a combination of Prometheus and Grafana to verify we did not under-provision our instances with respect to storage when we shrank the rootVolumeSize.

To do so, we start by running the Prometheus Node Exporter as a DaemonSet on our cluster. We have not yet discussed DaemonSets, but they can be thought of as similar to Deployments, except that instead of ensuring our cluster is running n copies of a Pod, a DaemonSet ensures that every eligible node runs one instance of the specified Pod. Running node monitoring daemons, such as Node Exporter, is one of the most common uses of DaemonSets.

The manifest for our Node Exporter DaemonSet can be seen below. We heavily based it on the manifest defined in the CoreOS Prometheus Operator’s kube-prometheus manifests.

---
# This manifest declares the DaemonSet responsible for managing the Pods which
# will export node metrics to prometheus. A DaemonSet ensures all eligible nodes
# run a copy of the pod.
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  labels:
    app: node-exporter
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - args:
        - --web.listen-address=127.0.0.1:9100
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        image: quay.io/prometheus/node-exporter:v0.16.0
        name: node-exporter
        resources:
          limits:
            cpu: 102m
            memory: 180Mi
          requests:
            cpu: 102m
            memory: 180Mi
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: false
        - mountPath: /host/sys
          name: sys
          readOnly: false
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      - args:
        - --secure-listen-address=$(IP):9100
        - --upstream=http://127.0.0.1:9100/
        env:
        - name: IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        image: quay.io/coreos/kube-rbac-proxy:v0.4.0
        name: kube-rbac-proxy
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: https
        resources:
          limits:
            cpu: 20m
            memory: 40Mi
          requests:
            cpu: 10m
            memory: 20Mi
      hostNetwork: true
      hostPID: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: node-exporter
      # Schedule this pod on the master, so we can get node resource usage on
      # the master as well.
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - hostPath:
          path: /proc
        name: proc
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /
        name: root

As with a Deployment, we define a Service for the Node Exporter, which gives us a single point of access for all the Node Exporter pods running on the different nodes.

---
# This manifest declares the Service, which allows Prometheus to access all
# node-exporters for scraping.
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: node-exporter
  name: node-exporter
spec:
  # We use a headless ClusterIP Service (clusterIP: None) because we only want
  # this endpoint accessible within the cluster, and Prometheus scrapes each
  # pod individually rather than going through a single virtual IP.
  clusterIP: None
  ports:
  - name: https
    port: 9100
    targetPort: https
  selector:
    app: node-exporter

You can see all of the manifest files in my personal-k8s repo.

We can configure Prometheus to begin collecting these metrics by adding a Node Exporter ServiceMonitor, as shown in the manifest below.

---
# This manifest declares a ServiceMonitor for node-exporter metrics. Any
# Prometheus clusters which select this ServiceMonitor will be configured to
# scrape all node-exporter metrics.
#
# Based on
# https://github.com/coreos/prometheus-operator/blob/7de4c8c901bb06cf9c9738dff7da9c2d015ccde3/contrib/kube-prometheus/manifests/node-exporter-serviceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: node-exporter
    app: node-exporter
    # Prometheus searches for any ServiceMonitor with the `role=service-monitor` label.
    role: service-monitor
  name: node-exporter
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: node-exporter

Node Exporter gives us two very useful metrics: node_filesystem_free_bytes and node_filesystem_size_bytes. Dividing the first by the second gives the fraction of disk space still free, so we can use the query below to show how much of each instance’s root volume remains available.

min by(device, instance)(node_filesystem_free_bytes{device=~"/dev/.*"}) /
min by(device, instance)(node_filesystem_size_bytes{device=~"/dev/.*"})

We can also graph this expression in Grafana, which gives us insight into available disk space on the root volumes over time.

We can also add an alert that fires whenever the available disk space on a partition drops below a chosen threshold (10% in the rule below).

---
# This manifest defines the PrometheusRules specific to resource usage
# for monitoring/alerting our k8s cluster.
#
# - `severity=critical` indicates we should raise an alert via Opsgenie.
# - 15m is the default `for`. We may refine this value over time.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-usage
  labels:
    app: resource-usage
    # Prometheus searches for any PrometheusRules with the `role=alert-rules`
    # label.
    role: alert-rules
spec:
  groups:
  - name: resource-usage.rules
    rules:
    ...
    - alert: NodeEBSLowAvailableSpace
      annotations:
        message: 'A direct EBS mount on the node (provisioned via Kops) is almost out of space.'
      expr: 'count((min by(device, instance)(node_filesystem_free_bytes{device=~"/dev/.*"}) / min by(device, instance)(node_filesystem_size_bytes{device=~"/dev/.*"})) < .1) >= 1'
      for: 15m
      labels:
        severity: critical

Our initial examination of these graphs shows that even after cutting the default Kops root volume size by over 50%, we’re still over-provisioning with respect to storage. When we analyze resource usage in our cluster as part of our regularly scheduled review, we may decide to shrink the root volumes even more. For now, we feel confident in our decision to decrease the root volume size.

Types of EBS Costs: PersistentVolumes

We also pay for EBS volumes when we request storage via a PersistentVolumeClaim. Essentially, a PersistentVolumeClaim is how a pod requests persistent storage, and Kubernetes satisfies the claim with a PersistentVolume, which in our case is backed by an AWS EBS volume. Currently, only Prometheus requires us to save state, so Prometheus is our only application making use of PersistentVolumes.

We request only 20GB for each Prometheus pod, for a total of 40GB across our two pods. Again, at $0.10 per GB-month, we are only paying $4 a month for this persistent storage, which feels fairly negligible.
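
For illustration, a Prometheus Operator resource requesting this storage might look roughly like the sketch below. The resource name and storage class are assumptions, the selectors mirror the role labels used in the manifests in this post, and all other fields are omitted. Note that Kubernetes expresses the request in Gi (gibibytes), which differs slightly from the GB figure AWS bills on.

```yaml
# Hypothetical Prometheus resource; only the storage-related fields matter here.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                      # hypothetical name
spec:
  replicas: 2                    # two pods, each with its own volume
  serviceMonitorSelector:
    matchLabels:
      role: service-monitor
  ruleSelector:
    matchLabels:
      role: alert-rules
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: gp2    # assumption: a gp2-backed EBS StorageClass
        resources:
          requests:
            storage: 20Gi        # one PersistentVolumeClaim of this size per replica
```

The operator turns the volumeClaimTemplate into one PersistentVolumeClaim per replica, which is why two replicas add up to roughly 40GB of EBS storage.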

Monitoring and Alerting on PersistentVolume Provisioning

We can reuse much of our root volume monitoring and alerting work for PersistentVolumes. The only difference is that instead of the node_filesystem_(free|size)_bytes metrics, we use the kubelet_volume_stats_(available|capacity)_bytes metrics. We can use the query below to show the fraction of disk space still available on Pods’ PersistentVolumes.

min by(persistentvolumeclaim)(kubelet_volume_stats_available_bytes) /
min by(persistentvolumeclaim)(kubelet_volume_stats_capacity_bytes)

We can graph this expression on Grafana, giving us insight into historical available disk space.

We also add an alert that fires whenever the available disk space on a PersistentVolume drops below a chosen threshold (again 10% in the rule below).

---
# This manifest defines the PrometheusRules specific to resource usage
# for monitoring/alerting our k8s cluster.
#
# - `severity=critical` indicates we should raise an alert via Opsgenie.
# - 15m is the default `for`. We may refine this value over time.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-usage
  labels:
    app: resource-usage
    # Prometheus searches for any PrometheusRules with the `role=alert-rules`
    # label.
    role: alert-rules
spec:
  groups:
  - name: resource-usage.rules
    rules:
    ...
    - alert: PersistentVolumeClaimLowAvailableSpace
      annotations:
        message: 'A persistent volume claim is almost out of available space.'
      expr: 'count((min by(persistentvolumeclaim)(kubelet_volume_stats_available_bytes) / min by(persistentvolumeclaim)(kubelet_volume_stats_capacity_bytes)) < .1) >= 1'
      for: 15m
      labels:
        severity: critical
    ...

With this alert in place, we can be confident in our inexpensive allocation of 20GB of persistent storage to each of our Prometheus pods.

Optimizing ELB load balancers

We now turn to our ELB, for which we pay $18 a month, or $216 annually.

Our Kubernetes cluster needs an ELB whenever we want to make an application running on the cluster publicly available.[2] Currently, only our blog is publicly available, meaning we pay for just a single ELB.
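
To make the source of this cost concrete, here is a minimal sketch of a Service of type LoadBalancer. The name, labels, and ports are hypothetical and are not taken from our actual blog manifest.

```yaml
# Hypothetical Service; on AWS, `type: LoadBalancer` causes Kubernetes to
# provision a dedicated ELB for this one Service.
apiVersion: v1
kind: Service
metadata:
  name: blog            # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: blog           # hypothetical pod label
  ports:
  - name: http
    port: 80            # port exposed on the ELB
    targetPort: 8080    # hypothetical container port
```

Every additional Service declared this way provisions another ELB, which is where the linear cost growth comes from.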

However, over time, we will want to run more publicly available applications. We’d like our ELB expenditures not to grow linearly with them, as ELBs could quickly become the most expensive component of our cluster. We’d like a way to ensure that no matter how many public-facing applications we run, our cluster will only have one ELB.

Fortunately, the Kubernetes Ingress resource solves exactly this problem. An Ingress exposes HTTP(S) routes from outside the cluster to internal services. Most importantly, it allows us to trade multiple services, each with its own ELB, for one Ingress with one ELB, which routes each request to the correct service based on the request’s host and path.
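
The sketch below shows what such an Ingress could look like for two applications. It is not our implementation, which does not exist yet: the hostnames, service names, and ingress class are hypothetical, it assumes an Ingress controller (for example ingress-nginx) is already running in the cluster, and it uses the networking.k8s.io/v1 API, whereas clusters as old as ours would use extensions/v1beta1 with a slightly different backend syntax.

```yaml
# Hypothetical Ingress routing two hostnames to two internal Services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-apps               # hypothetical name
spec:
  ingressClassName: nginx         # assumption: an nginx Ingress controller is installed
  rules:
  - host: blog.example.com        # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: blog            # hypothetical Service
            port:
              number: 80
  - host: app.example.com         # hypothetical second application
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: other-app       # hypothetical Service
            port:
              number: 80
```

With this layout, only the Ingress controller’s own Service needs to be of type LoadBalancer. The application Services can be plain ClusterIP Services, so adding another application adds a rule rather than an ELB.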

We will implement an Ingress for our cluster as soon as we have another service we want to expose publicly. Our efforts to do so are captured in this ticket, and there will be a blog post describing the implementation after it’s complete.

We can now be confident that our ELB expenditures will stay capped at $18 a month for our Kubernetes cluster’s entire lifecycle.

Wrapping Up

With our examination of storage and networking resources complete, we have examined all the individual costs making up our Kubernetes cluster. When possible (EC2 and EBS), we reduced costs and when not possible (ELB), we ensured costs will not grow linearly with the number of applications running on our cluster.

In the next, and final, post in this series, we’ll wrap up our cost reduction efforts and discuss our total savings. We’ll also cover further avenues we could pursue for cost reductions and how we plan to remain vigilant about cluster costs over the coming months.



1. We also pay for S3 storage for our Kops configuration and Route53 entries for DNS, but these costs are negligible.

2. Our blog service is publicly available because the Service specifies a LoadBalancer as its type. When the Service type is LoadBalancer, Kubernetes will allocate a publicly facing load balancer on the cloud provider on which the cluster is running. Since we are using AWS as our cloud provider, it allocates an ELB.