(Part 4) SLO Implementation: Alerting

I’m pretty excited to be writing this blog post, as it is the final one in our SLO Implementation series. In this final post, we’ll discuss using Prometheus Alerting Rules and Alertmanager to notify us when our blog is violating its SLO. Adding this alerting ensures we will be aware of any severe issues our users may face, and allows us to minimize the error budget spent by each incident.
(Part 3) SLO Implementation: Deploying Grafana

For the past couple of weeks, our Prometheus cluster has been quietly polling this blog’s web server for metrics. Now that we’re collecting the data, our next job is to make the data provide value. Our data provides value when it helps us understand our application’s past and current SLO adherence, and when it improves our actual SLO adherence. In this blog post, we’ll focus on the first of these two value propositions.
(Part 2) SLO Implementation: Prometheus Up & Running

For all of you just itching to deploy another application to your Kubernetes cluster, this post is for you. In it, I’ll be discussing deploying Prometheus, the foundation of our planned monitoring and alerting, to our Kubernetes cluster. This post will only discuss getting the Prometheus cluster running on our Kubernetes cluster. I’ll leave setting up monitoring, alerting, and useful visualizations for a later blog post in the series.
Personal k8s Cluster Roadmap

The Problem
So far, my ideas for experimenting with my personal Kubernetes cluster have been spread out across discrete blog posts. As a result, it’s difficult for me, and I imagine for y’all as the readers, to track a prioritized list of projects. I also think that, in the future, it will be useful for us to be able to easily see which projects have been completed and which have not.
(Part 1) SLO Implementation: Release the Metrics

In the blog post overviewing our SLO implementation, I listed configuring our blog to expose the metrics for Prometheus to scrape as the first step. To fulfill that promise, this post examines the necessary steps for taking our static website and serving it via a production web server which exposes the latency and success metrics our SLO needs.
A brief examination of Prometheus metrics
Application monitoring has two fundamental components: instrumentation and exposition.
(Part 0) SLO Implementation: Overview

My last two blog posts enumerated this blog’s SLO and error budget. Our next logical step is adding the monitoring and alerting infrastructure which will transform our SLO usage from theoretical to practical. Like creating a Kubernetes of One’s Own, this project contains multiple steps which we’ll explore over multiple blog posts. While this series focuses on achieving this goal for this blog’s specific SLO, the techniques are applicable to many scenarios.
This Blog Has an Error Budget Policy

In my last blog post, I publicized an SLO for this blog. I also mentioned that, in the future, I’d couple the SLO with an error budget and error budget policy. Well, the future is today, because this post will define error budgets and error budget policies and their benefits, before proposing a specific error budget and error budget policy to accompany our previously defined SLO.
What are Error Budget and Error Budget Policies?
This Blog Has an SLO

Background
I recently started reading The Site Reliability Workbook, which is the companion book to the excellent Site Reliability Engineering: How Google Runs Production Systems. These books devote considerable attention to Service Level Objectives (SLOs), which are a way of defining a given level of service that users can expect. More technically, an SLO is a collection of Service Level Indicators (SLIs), metrics that measure whether our service is providing value, and their acceptable ranges.
Hosting Static Blog on Kubernetes

In my last three blog posts, we focused on creating a Kubernetes cluster you can use for your own personal computing needs. But what good is a Kubernetes cluster if we’re not using it to run applications? Spoiler alert, not much. Let’s make your Kubernetes cluster worth the cash you’re paying and get some applications running on it. In this post, we’ll walk through deploying your first application to Kubernetes: a static blog.
(Part 2) A Kubernetes of One's Own: Can We Build It? Yes We Can!

In my last blog post, we outlined the different methods of creating and maintaining a Kubernetes cluster, before deciding on Kops. In this blog post, we’ll actually create the cluster using Kops. I’ll provide source code and instructions, so by the end of this post, you can have your own Kubernetes cluster! This tutorial is strongly based on the Kops AWS tutorial, although it’s even simpler because I’ve written some generic Terraform configurations which simplify the initial AWS configuration.