When it comes to monitoring Kubernetes workloads, Grafana is the one tool that comes up in every conversation. More often than not, it is paired with Prometheus or Loki, because Grafana only visualizes data, while Prometheus or Loki supply the data streams. In practice, this means working with three different tools: installing each of them on your clusters, setting them up, and configuring Grafana dashboards to surface the insights you’re looking for.
Even then, your analytics are limited, because you need a multi-tenant model to gain visibility across all your clusters on one dashboard. Before we look at how Ozone simplifies this and offers more than you could set up on your own, let us start from the beginning.
What is Grafana?
Grafana is an open-source interactive data visualization platform developed by Grafana Labs. It allows users to see their data via charts and graphs that are unified into one or multiple dashboards for easier interpretation and understanding.
Why do we need alerts?
Alerts are crucial when you run a service in production, whether your software is a blog, an e-commerce website, or a large management system. It’s important to learn about issues with the software quickly, and nobody can stare at dashboards with dozens of metrics all day. Alerts give a machine an easy way to tell us about unusual events happening in our software.
What are Grafana alerts?
Grafana alerts are a way to send notifications when a metric crosses a threshold you have configured. For example, you might want to send a Slack message to your team’s channel when your cloud server’s CPU utilization exceeds 80 percent.
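To make the CPU example concrete, here is a minimal Python sketch of the idea: compare a reading against the 80 percent threshold and, if it is exceeded, build a message body for Slack. This is not how Grafana implements alerts internally. The server name `web-1` and the function name are made up for illustration, and the sketch only builds the JSON body (Slack incoming webhooks accept a JSON object with a `text` field) without sending anything.

```python
import json
from typing import Optional

CPU_THRESHOLD_PERCENT = 80.0  # threshold taken from the example in the text


def build_slack_payload(server: str, cpu_percent: float) -> Optional[str]:
    """Return a JSON message body if the threshold is exceeded, else None."""
    if cpu_percent <= CPU_THRESHOLD_PERCENT:
        return None  # metric is within limits, nothing to send
    text = (
        f"CPU utilization on {server} is {cpu_percent:.1f}% "
        f"(threshold {CPU_THRESHOLD_PERCENT:.0f}%)"
    )
    return json.dumps({"text": text})


# "web-1" is a hypothetical server name used only for this illustration.
quiet = build_slack_payload("web-1", 42.0)
noisy = build_slack_payload("web-1", 91.5)
print(quiet)
print(noisy)
```

In a real setup, Grafana evaluates the metric and delivers the message for you. The sketch only shows the decision that an alert encodes.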
Anatomy of Grafana Alerting System
Grafana alerts are split into four key components:
· alert rules
· contact points
· notification policies
· silences
Alert rules
Alert rules define the trigger of an alert.
An alert rule consists of one or more queries and expressions, a condition, the frequency of evaluation, and, optionally, the duration over which the condition is met.
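The interplay between the condition, the evaluation frequency, and the optional duration can be sketched in a few lines of Python. This is a simplified model and not Grafana's implementation: the class name, the state names (`normal`, `pending`, `firing`), and the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    threshold: float   # condition: value must be above this
    for_seconds: int   # how long the condition must hold before firing
    breached_since: Optional[int] = None  # time of the first failing evaluation

    def evaluate(self, value: float, now: int) -> str:
        """Evaluate one sample taken at time `now` (in seconds)."""
        if value <= self.threshold:
            self.breached_since = None  # condition cleared, reset the timer
            return "normal"
        if self.breached_since is None:
            self.breached_since = now
        if now - self.breached_since >= self.for_seconds:
            return "firing"
        return "pending"


rule = AlertRule(threshold=80.0, for_seconds=120)
# (time, value) pairs, one per evaluation; here the rule is evaluated every 60 s.
samples = [(0, 70.0), (60, 85.0), (120, 90.0), (180, 95.0), (240, 60.0)]
states = [rule.evaluate(value, now) for now, value in samples]
print(states)  # ['normal', 'pending', 'pending', 'firing', 'normal']
```

The duration is what keeps a brief spike from triggering a notification: the rule only fires once the condition has held for the full two minutes.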
Contact points
Contact points define the channels through which alert notifications are sent, such as email groups, Slack messages, or any other medium.
Notification policies
Notification policies allow you to specify where and how frequently you want alert notifications to be sent. One common pattern is to limit the number of times a notification is sent during a certain time period.
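The "limit how often a notification is sent" pattern can be illustrated with a small Python sketch. It assumes a single repeat interval per policy and tracks the last send time per alert name. The class and field names are hypothetical and do not mirror Grafana's configuration schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NotificationPolicy:
    contact_point: str    # where notifications go (hypothetical name)
    repeat_interval: int  # minimum number of seconds between notifications
    last_sent: Dict[str, int] = field(default_factory=dict)

    def should_send(self, alert_name: str, now: int) -> bool:
        """Allow a notification only if the repeat interval has elapsed."""
        previous = self.last_sent.get(alert_name)
        if previous is not None and now - previous < self.repeat_interval:
            return False  # too soon since the last notification
        self.last_sent[alert_name] = now
        return True


policy = NotificationPolicy(contact_point="team-email", repeat_interval=3600)
# The same alert is still firing at each of these times (in seconds).
firing_times = [0, 600, 1800, 3600, 4000]
sent_at = [t for t in firing_times if policy.should_send("High CPU", t)]
print(sent_at)  # [0, 3600]
```

Out of five evaluations in which the alert is firing, only two produce a notification, which is the behavior the pattern is meant to achieve.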
Silences
Silences are a way to configure periods of time to suppress notifications. During a silence, Grafana will continue to track metrics and trigger alerts, but it won’t send notifications to any of your channels.
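As a rough illustration, a silence can be thought of as a time window that is checked just before a notification goes out. The Python sketch below models only that time window. Silences in Grafana also select which alerts they apply to by matching labels, which is left out here, and all names and dates in the sketch are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Silence:
    starts_at: datetime
    ends_at: datetime

    def covers(self, moment: datetime) -> bool:
        return self.starts_at <= moment < self.ends_at


def handle_alert(fired_at: datetime, silences: List[Silence]) -> Dict[str, bool]:
    """The alert is always tracked; only the notification is suppressed."""
    silenced = any(s.covers(fired_at) for s in silences)
    return {"tracked": True, "notified": not silenced}


# Hypothetical maintenance window from 22:00 to 02:00.
maintenance = Silence(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 2, 0))
during = handle_alert(datetime(2024, 1, 1, 23, 30), [maintenance])
after = handle_alert(datetime(2024, 1, 2, 9, 0), [maintenance])
print(during)  # {'tracked': True, 'notified': False}
print(after)   # {'tracked': True, 'notified': True}
```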
Setting up an alert in Grafana
In this example, we’ll start by setting up a contact point in Grafana. Select the Alert option in the sidebar and switch to the tab for contact points. Click the Add contact point button to create an email contact point.
Your screen should look something like this while creating an email contact point:
Click on “Test” to send yourself a test email. Once you receive it, you can save the contact point.
Now let’s set up an alert rule with a query that triggers when `failed` keys are listed in your app logs.
The above query fetches all log lines that match the filter namespace=ozone and contain the keyword error.
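For readers without the screenshot, a query of this shape can be written in LogQL (Loki's query language) as a stream selector followed by a line filter. The Python sketch below builds such a query string and mimics what it selects on a few made-up log entries. The exact query in the screenshot may differ, and the helper names and sample logs are assumptions for illustration.

```python
from typing import Dict, List


def build_log_query(namespace: str, keyword: str) -> str:
    """Build a LogQL query: stream selector plus a 'line contains' filter."""
    return f'{{namespace="{namespace}"}} |= "{keyword}"'


def matches(entry: Dict, namespace: str, keyword: str) -> bool:
    """Mimic the query locally: label must match and the line must contain the keyword."""
    return entry["labels"].get("namespace") == namespace and keyword in entry["line"]


query = build_log_query("ozone", "error")
print(query)  # {namespace="ozone"} |= "error"

# Made-up log entries, only to show which lines the filter keeps.
logs: List[Dict] = [
    {"labels": {"namespace": "ozone"}, "line": "error: could not reach database"},
    {"labels": {"namespace": "ozone"}, "line": "request completed in 12ms"},
    {"labels": {"namespace": "other"}, "line": "error: timeout"},
]
selected = [e["line"] for e in logs if matches(e, "ozone", "error")]
print(selected)  # ['error: could not reach database']
```

Note that the `|=` filter is a case-sensitive substring match, so a line containing only `ERROR` would not be selected by this query.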
You can specify hard limits for when a notification is sent. For example, the above screenshot specifies a threshold of 2 errors every minute, after which a notification is sent. After finishing this step, you will receive notifications by email whenever these conditions are met. Here’s what the mail alerts look like:
Simplifying Kubernetes Monitoring on Ozone
As we said at the beginning of the blog, for Grafana to work the way it should, Prometheus, Loki, or similar tools would need to be installed on your cluster to send back data for Grafana to process and present on its visual dashboard.
Here’s what Ozone offers to simplify Kubernetes monitoring:
- It bundles all three licenses together (Grafana, Prometheus, and Loki).
- It offers a multi-tenant option for Grafana. This means that you can view data from multiple clusters on one Grafana dashboard.
- You can configure alerts on Ozone rather than moving away to Grafana.
- Grafana is easy to access from the Ozone dashboard, without the need for a separately licensed instance for your users.
- Granular cluster-level metrics are available without having to leave Ozone to gather them.
Setting up alerts on Prometheus (client-side) through the Ozone UI
All you need to add is the alert message, the PromQL query, duration, severity, name of the cluster, and the type of notification alert. Once you click “Create,” the alerts will be saved in an alerts list on Ozone for easy access. As you can see, there’s no need to shift to and from the DevOps platform and Prometheus/Grafana.
The notifications/alerts will then be shown on the Ozone dashboard itself.
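To make the list of fields tangible, here is a hypothetical Python model of the information the form asks for. It does not represent Ozone's actual API or data model: the class, the field names, the allowed severities, and the sample values are all assumptions. The only borrowed convention is that the query is written in PromQL and the duration is a string such as `5m`, as in Prometheus alerting rules.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_SEVERITIES = {"info", "warning", "critical"}  # assumed set of values


@dataclass(frozen=True)
class PrometheusAlert:
    message: str            # text shown when the alert fires
    promql: str             # PromQL expression to evaluate
    duration: str           # how long the expression must hold, e.g. "5m"
    severity: str
    cluster: str            # name of the cluster to evaluate against
    notification_type: str  # e.g. "email" (hypothetical value)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        for name in ("message", "promql", "duration", "cluster", "notification_type"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is required")
        if self.severity not in ALLOWED_SEVERITIES:
            problems.append(f"unknown severity: {self.severity}")
        return problems


alert = PrometheusAlert(
    message="Checkout service is down",
    promql='up{job="checkout"} == 0',  # hypothetical job label
    duration="5m",
    severity="critical",
    cluster="prod-cluster-1",          # hypothetical cluster name
    notification_type="email",
)
incomplete = PrometheusAlert("Missing query", "", "5m", "urgent", "prod-cluster-1", "email")
print(alert.validate())
print(incomplete.validate())
```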
Multi-Cluster Grafana Dashboard
Ozone supplies a multi-tenant model for Grafana. This allows you to visualize data points from multiple clusters on a single Grafana setup, without running multiple separate Grafana licenses or instances of your own.
Granular Cluster-Level Visibility on Ozone
Under the Infrastructures screen on Ozone, a detailed cluster screen is provided. One of its tabs is an “Overview” screen that displays the cluster health. This is not available in most “new-age software delivery platforms,” where users are expected to gather the data from the relevant provider portals.