Seeing Green Alerts in Action
In this lesson, we will see green alerts in action.
Next, we’ll take a look at the alerts screen.
open "http://$PROM_ADDR/alerts"
The screen is empty. Do not despair. We’ll get back to that screen quite a few times. The number of alerts will grow as we progress. For now, just remember that’s where you can find your alerts.
Finally, we’ll open the graph screen.
open "http://$PROM_ADDR/graph"
That is where you’ll spend your time debugging issues you’ll discover through alerts.
Retrieve node information using kube_node_info
As our first task, we’ll try to retrieve information about our nodes. We’ll use kube_node_info, so let’s take a look at its description (help) and its type.
kubectl -n metrics run -it test \
--image=appropriate/curl \
--restart=Never \
--rm \
-- prometheus-kube-state-metrics:8080/metrics \
| grep "kube_node_info"
The output, limited to the HELP and TYPE entries, is as follows.
# HELP kube_node_info Information about a cluster node.
# TYPE kube_node_info gauge
...
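If you'd like to see only the HELP and TYPE entries, one option is to tighten the grep pattern. The following is a minimal sketch that runs against a small sample instead of a live cluster. The sample series line, including its label values, is an assumption made up for illustration.

```shell
# Hypothetical sample of the metrics endpoint output (label values are made up).
sample='# HELP kube_node_info Information about a cluster node.
# TYPE kube_node_info gauge
kube_node_info{node="node-1",os_image="Ubuntu 16.04.5 LTS"} 1'

# Keep only the HELP and TYPE comment lines for kube_node_info.
help_and_type=$(printf '%s\n' "$sample" | grep -E '^# (HELP|TYPE) kube_node_info ')
printf '%s\n' "$help_and_type"
```

Against a real cluster, the same grep pattern could replace the one at the end of the kubectl command above.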
🔍 You are likely to see variations between your results and mine. That’s normal since our clusters probably have different amounts of resources, my bandwidth might be different, and so on. In some cases, my alerts will fire, and yours won’t, or the other way around. I’ll do my best to explain my experience and provide screenshots that accompany them. You’ll have to compare that with what you see on your screen.
Now, let’s try using that metric in Prometheus.
Please type the following query in the expression field.
kube_node_info
Click the Execute button to retrieve the values of the kube_node_info metric.
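If you prefer the terminal over the expression field, the same query can probably be sent to the Prometheus HTTP API. This is only a sketch: it assumes PROM_ADDR points at the same address used for the alerts and graph screens, and prom_query is a hypothetical helper name, not something from the Gist.

```shell
# Sketch: run an instant query against the Prometheus HTTP API.
# Assumes PROM_ADDR holds the same host used for the /alerts and /graph URLs.
prom_query() {
  # --get sends the URL-encoded data as a query string instead of a POST body.
  curl --silent --get "http://$PROM_ADDR/api/v1/query" \
    --data-urlencode "query=$1"
}

# Only call the API when an address is configured.
if [ -n "${PROM_ADDR:-}" ]; then
  prom_query 'kube_node_info'
fi
```

The response should be JSON rather than the table you see in the UI, so it is better suited to scripts than to exploring.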
🔍 Unlike previous chapters, the Gist from this one (03-monitor.sh) contains not only the commands but also Prometheus expressions. They are all commented (with #). If you’re planning to copy & paste the expressions from the Gist, please exclude the comments. Each expression has a # Prometheus expression comment on top to help you identify it. As an example, the one you just executed is written in the Gist as follows.
# Prometheus expression
# kube_node_info
If you check the HELP entry of kube_node_info, you’ll see that it provides information about a cluster node and that it is a gauge. “A gauge is a metric that represents a single numerical value that can arbitrarily go up and down”. That makes sense for information about nodes since their number can increase or decrease over time.
Prometheus Gauge metric
📌 A Prometheus gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
If we focus on the output, you’ll notice that there are as many entries as there are worker nodes in the cluster. The value (1) is useless in this context. Labels, on the other hand, can provide some useful information. For example, in my case, the operating system (os_image) is Ubuntu 16.04.5 LTS. Through that example, we can see that we can use the metrics not only to calculate values (e.g., available memory) but also to get a glimpse into the specifics of our system.
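To make the point about labels concrete, here is a small sketch that counts the entries and pulls the os_image label out of kube_node_info series. The two series lines are made-up samples. Only the os_image value mirrors the one mentioned above, and the node names are assumptions.

```shell
# Hypothetical kube_node_info series, one line per node (node names are made up).
series='kube_node_info{node="node-1",os_image="Ubuntu 16.04.5 LTS"} 1
kube_node_info{node="node-2",os_image="Ubuntu 16.04.5 LTS"} 1'

# One entry per node: count the series lines.
node_count=$(printf '%s\n' "$series" | grep -c '^kube_node_info{')

# The sample value (1) is ignored; the os_image label carries the information.
os_images=$(printf '%s\n' "$series" | sed -n 's/.*os_image="\([^"]*\)".*/\1/p' | sort -u)

echo "entries: $node_count"
echo "$os_images"
```

The value of each series never enters the picture. Everything of interest comes from counting the series and reading their labels.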