A complete Prometheus based email monitoring system using docker compose

Need help setting up a monitoring system? Here’s a complete, easy to deploy, dockerised monitoring system for a local development environment with email alerts.

Photo by Christian Paul Stobbe on Unsplash

TL;DR

Prometheus is open source software that offers an effective and reliable way of monitoring a software service such as a web application. It monitors your service and notifies you when it goes down. Prometheus also collects many kinds of metric data from the target for diagnostics and display purposes. Presented below is a docker compose script with the full set of components needed to run a complete Prometheus monitoring system in a local environment for evaluation and integration testing. Be operational with one docker command. The compose design includes containers for Prometheus, Prometheus Alertmanager, Mailhog (a test SMTP server), Stunnel (a TLS proxy in front of Mailhog) and some Python code acting as a target to be continuously monitored. Prometheus is also able to monitor code developed in Node, Java, Ruby, Go, Rust, C++, C# and PHP to name a few [3]. The compose script is easy to update to incorporate your particular service to be monitored.

Introduction
Docker compose design
1. Container descriptions
2. Compose script usage
The email alert flow
Additional metrics
Other development languages can also be monitored
Conclusion
References

Introduction

I recently worked on a project where I designed and implemented a web application for a client. To evaluate the stability of said web application I felt it important to monitor the service over time and have a mechanism to notify me when it goes down. After doing some cursory research I became aware of the open source monitoring software Prometheus, a toolkit useful for monitoring services and sending alerts when they go down [1].

According to the Git repo: “Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed [2].”

Sounds good so far but will it cover my specific requirements? The features I required were:

Be able to continuously monitor the target web application service.
Have SMTP functionality to notify me when the service goes down.
Have a Django or general Python client library. This is because the target web application to be monitored was developed in Django.
To ease deployment on a cloud computer the final solution must be able to be dockerised.
It would be a bonus if additional metrics like CPU usage, memory usage and disk usage are collected in addition to just monitoring if the service is down.

Preliminary research indicated that Prometheus would meet all my requirements and that it is very well regarded in the industry. However, I wanted a way to evaluate its functionality on my local development machine before deploying it to the cloud. Presented below is a docker compose based design I created to evaluate Prometheus and to get an idea of how a client library could be integrated into a target service. In the end my evaluation tests showed Prometheus is very well suited for monitoring. The docker compose script proved useful not only for showing Prometheus’ overall ability to know when a service is down, but also as a test environment to:

Understand how Prometheus is used in conjunction with other components to make a complete monitoring system.
Be able to tweak and optimise the monitoring parameters.
Have closed loop verification the client library was correctly integrated into the target service.
Have the whole setup be platform independent by means of running containers in Docker.

In the sections to follow we describe the docker compose design for this monitoring and alert system, explain how to use it and finally show how it can also process and display custom metric information.

Docker compose design

Below is the docker compose script. It is part of the complete GitHub repo, which one can find here.

# https://phicygni.com/
# https://github.com/PhiCygni

version: "3"
services:
  # This container contains the Prometheus server
  phi-cygni-prometheus-container:
    image: prom/prometheus:v2.30.3
    container_name: phi-cygni-prometheus-container
    restart: 'no'
    volumes:
      - ./config:/etc/prometheus
    networks:
      - psi-cygni-prometheus-network
    ports:
      - "9090:9090"

  # This container contains the Prometheus Alertmanager server
  phi-cygni-prometheus-alertmanager-container:
    image: prom/alertmanager:v0.23.0
    container_name: phi-cygni-prometheus-alertmanager-container
    restart: 'no'
    volumes:
      - ./config_alertmanager:/etc/alertmanager
    networks:
      - psi-cygni-prometheus-network
    ports:
      - "9093:9093"

  # This container contains the Mailhog SMTP test server
  phi-cygni-mailhog:
    image: mailhog/mailhog:v1.0.1
    container_name: phi-cygni-mailhog
    restart: 'no'
    networks:
      - psi-cygni-prometheus-network
    ports:
      - "1025:1025"
      - "8025:8025"

  # This container contains the Python code running the Prometheus client code
  phi-cygni-python-prometheus-client-container:
    build: ./python
    image: phi-cygni-python-prometheus-client-image:1.0.0
    container_name: phi-cygni-python-prometheus-client-container
    command: sh -c "pip install -r requirements.txt && python test_prometheus_client.py"
    restart: 'no'
    volumes:
      - ./python:/root/python-prometheus-client
    networks:
      - psi-cygni-prometheus-network
    ports:
      - "8000:8000"

  # This container contains the TLS SMTP server which connects to the Mailhog server
  phi-cygni-stunnel:
    image: dweomer/stunnel:latest
    container_name: phi-cygni-stunnel
    restart: 'no'
    environment:
      - STUNNEL_SERVICE=smtps
      - STUNNEL_ACCEPT=465
      - STUNNEL_CONNECT=phi-cygni-mailhog:1025
    ports:
      - "465:465"
    networks:
      - psi-cygni-prometheus-network



networks:
  psi-cygni-prometheus-network:
    driver: bridge

The docker compose script incorporates the following images:

Prometheus: prom/prometheus:v2.30.3
Prometheus Alertmanager: prom/alertmanager:v0.23.0
Stunnel: dweomer/stunnel:latest
Mailhog: mailhog/mailhog:v1.0.1
Python: python:3.8-alpine3.15

All containers are connected through a shared bridge network as shown in the diagram below:

These are the containers of the docker compose script connected to a bridge network.

Container descriptions

Prometheus

The Prometheus container is responsible for continuously monitoring the target service to determine if it is still running and to also collect additional metric information. When it detects the service as being down, an alert is generated which is then sent to the Prometheus Alertmanager container for notification processing.

Prometheus Alertmanager

The Prometheus Alertmanager container is responsible for processing alerts generated by Prometheus and producing email notifications.

Stunnel

Stunnel is a proxy designed to add TLS encryption functionality to existing clients and servers without any changes in the programs’ code [4]. It is necessary because Mailhog (the SMTP server) only supports unencrypted SMTP functionality at present [5]. The Prometheus Alertmanager will send an email to Mailhog via this Stunnel proxy.

Mailhog

Mailhog is a test SMTP server complete with a web UI, which makes it very convenient. It is trivial to set up and the perfect SMTP server for this application. I previously wrote a more detailed article about Mailhog, which can be found here.

Python target container (target service)

Finally we have the Python container running the target service we want to monitor. For the evaluation test Python code was produced to act as a target service for monitoring. The official Python client library was integrated with the code and the appropriate functions called to serve metric information for monitoring. Find the source as part of the repo here.

"""

This module makes use of the prometheus client library to
provide the prometheus monitoring software with telemetry.

"""
from prometheus_client import start_http_server, Counter, Gauge
import time
# https://pypi.org/project/psutil/
import psutil
import math

objPsutilVirtualMemoryPercent = Gauge('psutil_virtual_memory_percent','psutil_virtual_memory_percent')
objPsutilCpuPercent = Gauge('psutil_cpu_percent','psutil_cpu_percent')
objPsutilDiskUsagePrecent = Gauge('psutil_disk_usage_percent','psutil_disk_usage_percent')
objPsutilSensorsTemperaturesCoretempPackageId0C = Gauge('psutil_sensors_temperatures_coretemp_package_id_0', 'psutil_sensors_temperatures_coretemp_package_id_0')
objPeriodicFunctionCalls = Counter('periodic_function_calls', 'periodic_function_calls')
objPeriodicFunctionArbitraryGraph = Gauge('periodic_function_arbitrary_graph', 'periodic_function_arbitrary_graph')

def vPeriodicFunction(dctCounterPar: dict):
    """ A function which periodically updates prometheus client info. 
    
    """

    # Get the virtual memory as a percentage
    try:
        objPsutilVirtualMemoryPercent.set(psutil.virtual_memory().percent)
    except Exception as e:
        print(f"{e}")

    # Get the CPU utilisation as a percentage
    try:
        objPsutilCpuPercent.set(psutil.cpu_percent(interval=0))
    except Exception as e:
        print(f"{e}")

    # Get the disk usage as a percentage
    try:
        objPsutilDiskUsagePrecent.set(psutil.disk_usage('/root/').percent)
    except Exception as e:
        print(f"{e}")

    # Get the CPU sensors temperature for Package ID 0
    try:
        objSensorsTemperatures = psutil.sensors_temperatures()
        objPsutilSensorsTemperaturesCoretempPackageId0C.set(objSensorsTemperatures['coretemp'][0].current)
    except Exception as e:
        print(f"{e}")

    # Keep a counter for how many times this function has been called
    try:
        dctCounterPar['iFunctionCounter'] = dctCounterPar['iFunctionCounter'] + 1
        objPeriodicFunctionCalls.inc(1)
    except Exception as e:
        print(f"{e}")

    # Set an arbitrary graph of a sine function
    try:
        fTrigValue = math.sin(math.pi * (dctCounterPar['iFunctionCounter'] / 450.0))
        objPeriodicFunctionArbitraryGraph.set(fTrigValue)
    except Exception as e:
        print(f"{e}")

    # Sleep for 500 milliseconds
    time.sleep(0.5)

    return


def vMain():
    """ The main function 
    
    """
    dctCounterPar = {}
    dctCounterPar['iFunctionCounter'] = 0

    # Start the prometheus client listen server
    start_http_server(8000)

    # Continuously call the function collecting the telemetry
    while True:
        vPeriodicFunction(dctCounterPar)

    return


if __name__ == '__main__':
    vMain()
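Once the target is running, port 8000 serves the gauges and the counter in Prometheus' plain-text exposition format. As a rough sketch of what Prometheus reads on each scrape, the snippet below parses such text into a dictionary. The function name parse_metrics and the sample values are made up for illustration, and the parser skips labelled samples, so it is not a substitute for a real parser. Note that the official Python client is expected to expose a Counter with a _total suffix, which is why the sample uses periodic_function_calls_total.

```python
def parse_metrics(text: str) -> dict:
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines
        if not line or line.startswith("#"):
            continue
        # Skip labelled samples such as name{label="x"} to keep the sketch small
        if "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


# Hand-written sample in the exposition format; the values are invented
SAMPLE = """\
# HELP psutil_cpu_percent psutil_cpu_percent
# TYPE psutil_cpu_percent gauge
psutil_cpu_percent 12.5
# HELP periodic_function_calls_total periodic_function_calls
# TYPE periodic_function_calls_total counter
periodic_function_calls_total 42.0
"""

metrics = parse_metrics(SAMPLE)
print(metrics)
```

With the containers up, the same function could be fed the text fetched from http://localhost:8000 to confirm the client library is serving what you expect.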

Compose script usage

To check out the repo type the following:

$ git clone https://github.com/PhiCygni/prometheus-plus-altermanager-mailhog-smtp-docker-compose.git

Change the directory to the one just checked out:

$ cd prometheus-plus-altermanager-mailhog-smtp-docker-compose/

To build and run the docker containers type the following:

$ docker-compose up

This creates and starts all the containers. The following ports are exposed for use:

Port	Description	URL
9090	Prometheus Web UI	http://localhost:9090
9093	Prometheus Alertmanager Web UI	http://localhost:9093
8025	Mailhog Web UI	http://localhost:8025
8000	Python client Prometheus metric API	http://localhost:8000
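As a quick smoke test after starting the containers, one could check that each of these ports accepts a TCP connection. The sketch below is only an illustration: the helper names port_is_open and check_ports are hypothetical, and an open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports taken from the table above; the labels are only for display
PORTS = {
    9090: "Prometheus Web UI",
    9093: "Prometheus Alertmanager Web UI",
    8025: "Mailhog Web UI",
    8000: "Python client metric API",
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str = "localhost") -> dict:
    """Map each known port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in PORTS}


if __name__ == "__main__":
    for port, is_open in check_ports().items():
        state = "open" if is_open else "closed"
        print(f"{port} ({PORTS[port]}): {state}")
```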

To stop the containers type the following:

$ docker-compose stop

On Linux, to stop and remove all the containers type the following:

$ ./script-docker-compose-down-remove-everything.sh

On Windows, to stop and remove all the containers type the following:

$ script-docker-compose-down-remove-everything.bat

The email alert flow

The Prometheus container continuously monitors the Python target service and generates an alert when it goes down. One can induce a target down event by stopping the Python service by means of the command:

$ docker-compose stop phi-cygni-python-prometheus-client-container

Active alerts are shown on the Prometheus UI as in this screen capture:

The Prometheus UI shows the target service as being down.
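The same state can be read programmatically. Prometheus records a built-in up series for every scrape target (1 when the last scrape succeeded, 0 when it failed) and exposes it through its HTTP API at /api/v1/query. The sketch below is a minimal illustration: the function names and the sample instance and job labels are hypothetical, and the alert rule used in the repo is not reproduced here.

```python
import json
import urllib.parse
import urllib.request


def down_targets(api_response: dict) -> list:
    """Return the 'instance' label of every series whose value is 0."""
    down = []
    for series in api_response["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"]
        _, value = series["value"]
        if float(value) == 0.0:
            down.append(series["metric"].get("instance", "unknown"))
    return down


def query_up(base_url: str = "http://localhost:9090") -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


# Hand-written sample response; the instance and job names are hypothetical
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "target:8000", "job": "python"},
             "value": [1635000000.0, "0"]},
            {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
             "value": [1635000000.0, "1"]},
        ],
    },
}

print(down_targets(SAMPLE))
```

With the compose setup running, down_targets(query_up()) should list the Python target shortly after it has been stopped.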

From here there are several steps which need to happen before an email is generated. The diagram below shows how a Prometheus alert finally translates into getting a notification email:

The flow between containers when an email is generated.

The four steps in the diagram above can be described as:

(1) After the target service goes down, Prometheus will generate an alert and send it to the Alertmanager container via port 9093.

(2) The Alertmanager reacts to the alert by generating an SMTP email and sending it to the Stunnel container via SMTP TLS port 465.

(3) Stunnel strips the TLS layer and forwards the now unencrypted SMTP email to the Mailhog container via port 1025.

(4) The Mailhog container receives the unencrypted SMTP email and displays it on its Web UI.
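To verify the last hop in isolation, one could send a hand-made message straight to Mailhog's unencrypted SMTP port, which the compose script publishes on the host as 1025. The sketch below assumes that port mapping. The addresses, the instance name and the subject line are invented for illustration and do not reflect the Alertmanager's real email template.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(instance: str, sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text message that resembles an instance-down alert."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[TEST] Instance {instance} down"
    message.set_content(f"{instance} has not responded to scrapes.")
    return message


def send_to_mailhog(message: EmailMessage, host: str = "localhost", port: int = 1025) -> None:
    """Deliver the message over unencrypted SMTP (step 4, bypassing Stunnel)."""
    with smtplib.SMTP(host, port, timeout=5) as smtp:
        smtp.send_message(message)


# Hypothetical addresses and instance name, used only for illustration
email = build_alert_email("target:8000", "alertmanager@example.org", "dev@example.org")
print(email["Subject"])
```

Calling send_to_mailhog(email) while the containers are running should make the message appear in the Mailhog Web UI on port 8025. If it does, any missing alert emails are more likely caused by the Alertmanager or Stunnel configuration than by Mailhog.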

Alerts received by the Alertmanager will be shown on its UI as in this screen capture:

The Alertmanager UI shows the service as being down.

An example of an alert email notification received by Mailhog is shown in the screen capture below:

An instance down email alert as received on Mailhog.

Additional metrics

Apart from monitoring if services are down, Prometheus also collects and displays various kinds of metric information. This screen capture shows the Prometheus UI displaying the CPU temperature as captured from the target Python service:

Prometheus displays the CPU temperature information as a graph.

One can also monitor custom metrics by calling the appropriate client library functions in the target service. For demonstrative purposes I designed the target Python code to generate a floating point value from a sinusoidal function for display over time. It is a completely arbitrary example but demonstrates the functionality. Shown below is a screen capture from the Prometheus UI showing the custom metric as a graph.

A graph of an arbitrary custom metric generated by the Python target service.
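The shape of that graph follows directly from the target code: the gauge is set to sin(pi * n / 450), where n is the call counter, and each call ends with a 0.5 second sleep. A small worked example, using hypothetical constant and function names, shows the resulting period. The figure is a lower bound, since the psutil calls add a little time to every iteration.

```python
import math

CALLS_PER_HALF_CYCLE = 450.0   # divisor used in the target code
SLEEP_SECONDS = 0.5            # sleep at the end of each call


def sine_metric(call_count: int) -> float:
    """Value the target sets on the gauge after call_count calls."""
    return math.sin(math.pi * (call_count / CALLS_PER_HALF_CYCLE))


# A full sine cycle spans 2 * pi, so it takes twice the divisor in calls
calls_per_period = int(2 * CALLS_PER_HALF_CYCLE)
min_period_seconds = calls_per_period * SLEEP_SECONDS

print(calls_per_period)      # 900
print(min_period_seconds)    # 450.0
print(sine_metric(225))      # 1.0
```

In other words, one full wave on the Prometheus graph should take at least 7.5 minutes, which is useful to know when choosing the graph's time range.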

Other development languages can also be monitored

It should also be noted that Prometheus usage is not limited to monitoring Python code. Prometheus also provides official libraries for the Go, Java, Scala, Ruby and Rust languages. In addition to those there are unofficial third-party client libraries for Bash, C, C++, Common Lisp, Dart, Elixir, Erlang, Haskell, Lua, C#, Node (JavaScript), OCaml, Perl, PHP and R [3]. Prometheus, therefore, will be compatible with almost any kind of code requiring monitoring.

Conclusion

Prometheus is a well respected, enterprise level monitoring system, and the docker compose script presented above incorporates it into a complete and easy to use monitoring setup which includes email alerts. The design and usage of the script were explained, along with how email alerts are generated when a target service goes down. Examples of metric information displayed via the Prometheus Web UI were also shown. From here users can easily modify the compose script to incorporate their own specific services for monitoring. The dockerised design means the monitoring system can be rapidly deployed, is reproducible and provides a convenient development environment in which to tweak monitoring parameters, evaluate alert generation and do the necessary integration work for collecting diagnostic metric information.

References

[1] https://prometheus.io/

[2] https://github.com/prometheus/prometheus

[3] https://prometheus.io/docs/instrumenting/clientlibs/

[4] https://www.stunnel.org/

[5] https://github.com/mailhog/MailHog/issues/84