Over the holiday break, I spent a lot of my leisure coding time rethinking the way we deploy applications to Kubernetes. The blog series this post kicks off will explore how we migrated from an overly simplistic deploy strategy to one giving us the flexibility we need to deploy more complex applications. To ensure a solid foundation, in this first post, we’ll define our requirements for deploying Kubernetes’ applications and evaluate whether our previous systems and strategies met these requirements (spoiler alert… it didn’t).
Overall impact In parts one and two of this series, we sought to reduce our AWS costs by optimizing our computing, networking, and storage expenditures. Since this post is the final one in the series, let’s consider how we did in aggregate. Before any resource optimizations, we had the following bill: master ec2 (1 m3.medium): (1 * 0.067 $/hour * 24 * 30) = 48.24 nodes ec2 (2 t2.medium): (2 * 0.
In the previous post in this series, we showed how utilizing Spot Instances and Reserved Instances reduces the annual bill for running our Kubernetes cluster from ~2K to ~1.2K. In this post, we’ll pursue cost reduction for storage and networking resources, our final two prominent, unoptimized costs.1 Our quick calculations from the first post in this series show, that with the default Kops configuration, we pay ~$360 annually for EBS (storage) and ~$216 annually for ELBs (networking), for an annual total of just over $500.
Introduction In my last blog post, I introduced our goal of decreasing the cost of running a personal k8s cluster, and made the case for why decreasing the cost is important. We also did some quick calculations which showed that EC2 instances are the most expensive part of our cluster, costing ~$115 per month or ~$1.4K per year. There’s no time like the present to actually start decreasing EC2 costs, so let’s get down to business.
For the last couple of months, I’ve spent the majority of my non-work coding time creating a Kubernetes of my own. My central thesis for this work is that Kubernetes is one of the best platforms for individual developers who want to self-host multiple applications with “production” performance needs (i.e. hosting a blog, a private Gitlab, a NextCloud, etc.). Supporting this thesis requires multiple forms of evidence. via GIPHY
I’m pretty excited to be writing this blog post, as it is the final one in our SLO Implementation series. via GIPHY In this final post, we’ll discuss using Prometheus Alerting Rules and Alertmanager to notify us when our blog is violating its SLO. Adding this alerting ensures we will be aware of any severe issues our users may face, and allows us to minimize the error budget spent by each incident.
For the past couple of weeks, our Prometheus cluster has been quietly polling this blog’s web server for metrics. Now that we’re collecting the data, our next job is make the data provide value. Our data provides value when it assists us in understanding our application’s past and current SLO adherence, and when it improves our actual SLO adherence. In this blog post, we’ll focus on the first of the two aforementioned value propositions.
For all of you just itching to deploy another application to your Kubernetes cluster, this post is for you. via GIPHY In it, I’ll be discussing deploying Prometheus, the foundation of our planned monitoring and alerting, to our Kubernetes cluster. This post will only discuss getting the Prometheus cluster running on our Kubernetes cluster. I’ll leave setting up monitoring, alerting, and useful visualizations for a later blog post in the series.
The Problem So far, my ideas for experimenting with my personal Kubernetes cluster have been spread out across discrete blog posts. As a result, its difficult for me, and I imagine y’all as the readers, to track a prioritized list of projects. via GIPHY I also think that, in the future, it will be useful for us to be able to easily see which projects have been completed and which have not.
In the blog post overviewing our SLO implementation, I listed configuring our blog to expose the metrics for Prometheus to scrape as the first step. To fulfill that promise, this post examines the necessary steps for taking our static website and serving it via a production web server which exposes the latency and success metrics our SLO needs. A brief examination of Prometheus metrics Application monitoring has two fundamental components: instrumentation and exposition.