In our last blog post, we increased the stability of our Kubernetes cluster and also increased its available resources. With these improvements in place, we can tackle deploying our most complex application yet: Nextcloud. By the end of this blog post, you’ll have insight into the major architecture decisions we made when deploying Nextcloud to Kubernetes. As always, we’ll link the full source code should you want to dive deeper.
In our blog series on decreasing the cost of our Kubernetes cluster, we suggested replacing on-demand EC2 instances with spot instances for our Kubernetes nodes. When we first introduced this idea, we mentioned that this strategy could hurt both our applications’ availability and our ability to monitor that availability. At the time, we still converted to spot instances because we believed the savings were worth the decrease in reliability.
In the first blog post in this series, we examined how our previous deployment strategy of running kubectl apply -f on static manifests did not meet our increasingly complex requirements for deploying Kubernetes applications. In this second and final post in the mini-series, we’ll outline the new deployment strategy and how it fulfills our requirements.
The new system
Our new deployment strategy makes use of Helm, a popular tool in the Kubernetes ecosystem which describes itself as “a tool that streamlines installing and managing Kubernetes applications.
Over the holiday break, I spent a lot of my leisure coding time rethinking the way we deploy applications to Kubernetes. The blog series this post kicks off will explore how we migrated from an overly simplistic deployment strategy to one that gives us the flexibility we need to deploy more complex applications. To ensure a solid foundation, in this first post, we’ll define our requirements for deploying Kubernetes applications and evaluate whether our previous systems and strategies met these requirements (spoiler alert… they didn’t).
Overall impact
In parts one and two of this series, we sought to reduce our AWS costs by optimizing our computing, networking, and storage expenditures. Since this post is the final one in the series, let’s consider how we did in aggregate. Before any resource optimizations, we had the following bill:
master ec2 (1 m3.medium): (1 * 0.067 $/hour * 24 * 30) = 48.24
nodes ec2 (2 t2.medium): (2 * 0.
In the previous post in this series, we showed how utilizing Spot Instances and Reserved Instances reduces the annual bill for running our Kubernetes cluster from ~$2K to ~$1.2K. In this post, we’ll pursue cost reduction for storage and networking resources, our final two prominent, unoptimized costs.¹ Our quick calculations from the first post in this series show that, with the default Kops configuration, we pay ~$360 annually for EBS (storage) and ~$216 annually for ELBs (networking), for an annual total of just over $500.
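As a quick sanity check on the storage and networking figures quoted above, here is a minimal Python sketch that adds them up. The constant and function names are our own, chosen for illustration; the dollar amounts are the approximate annual figures from the text, not exact AWS prices.

```python
# Approximate annual costs quoted in the text (USD). These are rounded
# figures from the post, not authoritative AWS pricing.
EBS_ANNUAL_USD = 360  # storage (EBS volumes)
ELB_ANNUAL_USD = 216  # networking (ELBs)


def annual_total(*costs):
    """Sum any number of annual cost line items."""
    return sum(costs)


total = annual_total(EBS_ANNUAL_USD, ELB_ANNUAL_USD)
monthly = total / 12

print(f"storage + networking: ~${total}/year (~${monthly:.0f}/month)")
```

The sum comes to $576 per year, which matches the "just over $500" described above.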
Introduction
In my last blog post, I introduced our goal of decreasing the cost of running a personal k8s cluster, and made the case for why decreasing the cost is important. We also did some quick calculations which showed that EC2 instances are the most expensive part of our cluster, costing ~$115 per month or ~$1.4K per year. There’s no time like the present to actually start decreasing EC2 costs, so let’s get down to business.
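The monthly-to-annual conversion quoted above is easy to reproduce. The snippet below is a minimal sketch: the variable names are hypothetical, and the $115 figure is the approximate monthly EC2 cost from the text rather than an exact bill.

```python
# Approximate monthly EC2 cost quoted in the text (USD).
EC2_MONTHLY_USD = 115

# Twelve months of the same monthly spend.
ec2_annual_usd = EC2_MONTHLY_USD * 12

# Express the annual figure in thousands, rounded to one decimal place.
ec2_annual_k = round(ec2_annual_usd / 1000, 1)

print(f"EC2: ~${EC2_MONTHLY_USD}/month is ~${ec2_annual_k}K/year")
```

Since the monthly figure is itself an approximation, the annual figure is only a ballpark.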
For the last couple of months, I’ve spent the majority of my non-work coding time creating a Kubernetes of my own. My central thesis for this work is that Kubernetes is one of the best platforms for individual developers who want to self-host multiple applications with “production” performance needs (e.g. hosting a blog, a private GitLab, a Nextcloud, etc.). Supporting this thesis requires multiple forms of evidence.
I’m pretty excited to be writing this blog post, as it is the final one in our SLO Implementation series. In this final post, we’ll discuss using Prometheus Alerting Rules and Alertmanager to notify us when our blog is violating its SLO. Adding this alerting ensures we will be aware of any severe issues our users may face, and allows us to minimize the error budget spent by each incident.
For the past couple of weeks, our Prometheus cluster has been quietly polling this blog’s web server for metrics. Now that we’re collecting the data, our next job is to make the data provide value. Our data provides value when it helps us understand our application’s past and current SLO adherence, and when it improves our actual SLO adherence. In this blog post, we’ll focus on the first of these two value propositions.