Our engineering department uses Prometheus for many of our development tasks. I have seen it improve my own work in several ways, and it has had a positive impact on the department as a whole.
I am a software engineer on the analytics team at callstats.io, where I work with many of our artificial intelligence algorithms. I received my Ph.D. in Electrical Engineering and Automation from Aalto University in 2018 and my M.Sc. in Machine Automation from Tampere University of Technology in 2012.
Before Prometheus, we lacked visibility into the performance of our artificial intelligence algorithms. In my case, that meant trends and anomalies stayed hidden.
Additionally, a big discussion internally at callstats.io has been about how to follow and maintain a culture of responsibility. Specific to the engineering team, this means ensuring we are all responsible for the code and features we deliver. Prometheus has been a great tool to help each developer monitor the performance of their own part and ensure accuracy.
How Do We Use Prometheus for Improved Algorithm Performance?
Before Prometheus, it was difficult to measure the computational time of our algorithms. Some parts of an algorithm get stuck for various reasons, and without visualization it is hard to identify the problem. The causes vary, from heavy traffic on the database to connection issues. It is also critical for us to know which part of the algorithm consumes most of the computational time, because minimizing computational time is key when providing rapid, real-time information to the user.
At callstats.io, we designed an algorithm specifically for trend detection of different appIDs. We trained the model, then tested and validated it against the data we already had. Initially, the model showed minimal errors. However, the bottlenecks and perceptual results only showed up once we were able to actually visualize the performance.
Prometheus helped us by providing real-time monitoring of some important parameters, including the computational time of different sections of the algorithms, the output of components such as prediction and detection, and the output of the user notifications.
Additionally, Prometheus enabled us to find bugs in our code that we could not previously identify.
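To make the per-section timing described above concrete, here is a minimal Python sketch. It is not the callstats.io code. It uses only the standard library to time named sections and render the totals in the Prometheus text exposition format. The metric name `algorithm_section_seconds`, the section names, and both helper functions are hypothetical. In practice, a Prometheus client library would normally handle this bookkeeping and expose the metrics over HTTP for scraping.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process store: section name -> [total seconds, call count].
_timings = defaultdict(lambda: [0.0, 0])


@contextmanager
def timed_section(name):
    """Accumulate wall-clock time spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        entry = _timings[name]
        entry[0] += time.perf_counter() - start
        entry[1] += 1


def render_metrics():
    """Render the accumulated timings in the Prometheus text exposition format.

    Assumes section names contain no quotes, backslashes, or newlines,
    so no label-value escaping is done here.
    """
    lines = [
        "# HELP algorithm_section_seconds Time spent in each algorithm section.",
        "# TYPE algorithm_section_seconds summary",
    ]
    for name, (total, count) in sorted(_timings.items()):
        lines.append(f'algorithm_section_seconds_sum{{section="{name}"}} {total}')
        lines.append(f'algorithm_section_seconds_count{{section="{name}"}} {count}')
    return "\n".join(lines) + "\n"


# Usage: wrap each stage of a (made-up) pipeline, then print the metrics.
with timed_section("load_data"):
    data = list(range(10_000))
with timed_section("predict"):
    prediction = sum(data) / len(data)

print(render_metrics())
```

Dividing the `_sum` by the `_count` for each section gives its average duration, which is enough to see which part of an algorithm dominates the computational time.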
Using Prometheus with Grafana for Improved Team Responsibility
By connecting Prometheus with Grafana, we can visualize trends easily.
For example, we developed an artificial intelligence-based detection system to notify and report hidden trends to our customers. This system shows trends not only as a time series, but also in real time. It is critical that this system stays operational at all times. Therefore, we need to be aware of app crashes immediately.
In our prototype, we use Prometheus and Grafana to visualize the traffic of each app and identify whether traffic is trending upward or downward. If an app crashes, Grafana can send an immediate notification to the app owner. Similarly, if an app is gradually losing users, our model can detect the trend and alert the app owner. In this instance, the app owner is me.
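As a rough illustration of what "upward or downward trend" can mean numerically, the sketch below fits a least-squares line to a series of traffic samples and reports the sign of its slope. This is a deliberately simple stand-in, not the trained model described in this post. The function name `trend_direction` and the `tolerance` parameter are hypothetical.

```python
def trend_direction(values, tolerance=0.0):
    """Classify a series as "up", "down", or "flat" from its least-squares slope.

    `values` are equally spaced samples (for example, requests per minute).
    Slopes within +/- `tolerance` are treated as flat.
    """
    n = len(values)
    if n < 2:
        return "flat"  # not enough points to define a slope
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Slope of the best-fit line: covariance(x, y) / variance(x).
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "flat"


# Usage with made-up traffic samples for one app.
print(trend_direction([100, 90, 85, 70]))
```

A nonzero `tolerance` keeps small fluctuations from being reported as a trend; the right value depends on the scale of the traffic being measured.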
If something goes wrong with the system while I am unavailable, other engineers can take responsibility and report the problem to the team.
For example, Prometheus monitors the output of the artificial intelligence detection system and app traffic. In Grafana, I can set a traffic threshold that sends a notification if it is surpassed. If this happens, I don’t need to be on site for the team to be notified and resolve the issue. This helps us avoid downtime as much as possible and keep our system running smoothly.
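Grafana evaluates thresholds like this on its own, so no custom code is needed for the setup described above. Purely to illustrate the logic, here is a small Python sketch that finds the points where a series rises above a threshold, notifying once per crossing rather than on every sample that stays above it. The function name `threshold_crossings` and the sample values are hypothetical.

```python
def threshold_crossings(samples, threshold):
    """Return the indices where the series rises above `threshold`.

    Only the first sample of each run above the threshold is reported,
    so a sustained spike produces a single notification.
    """
    alerts = []
    above = False
    for i, value in enumerate(samples):
        if value > threshold and not above:
            alerts.append(i)
        above = value > threshold
    return alerts


# Usage with made-up traffic samples and a threshold of 4.
for index in threshold_crossings([1, 5, 6, 2, 7], threshold=4):
    print(f"notify team: threshold surpassed at sample {index}")
```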
Prometheus has been a great help in making sure our team works effectively and keeps our code as free of bugs as possible. We believe making effective products is critical, which is one reason we develop a monitoring process for WebRTC. If you want to improve the quality of audio and video calls in your web application, try a demo of our product today.