It is 2024, and many monitoring systems on the market are slowly fading from view while some newer ones are gradually coming into their own. Let's take a look at which IT operations monitoring systems are most worth watching this year.
Prometheus
Prometheus is undoubtedly the monitoring system most worth watching, because its specification and ecosystem are very strong. A lot of middleware and databases come with built-in support for Prometheus, such as etcd, Kubernetes, RabbitMQ, Nginx VTS, and so on, which gives it a great deal of momentum.
Especially in container and microservice monitoring scenarios, the Prometheus ecosystem is a no-brainer because:
- Resources have relatively short lifecycles, so monitoring targets are usually found through service discovery rather than through asset management (which is the approach Zabbix takes).
- There is a strong need for multi-dimensional queries, such as aggregating, filtering, and grouping by labels. That calls for a purpose-built query language, and PromQL was created for exactly this.
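To make the second point concrete, here is a minimal Python sketch of what "filter by label, then group and aggregate" means. It is not how Prometheus is implemented. The sample data, the label names, and the `sum_by` helper are all made up for illustration. The call at the end is roughly what a PromQL expression like `sum by (service) (http_requests{code="200"})` asks for, where `http_requests` is a hypothetical metric name.

```python
from collections import defaultdict

# Hypothetical samples: (labels, value). Label names and values are
# illustrative only and do not come from any real system.
samples = [
    ({"service": "api", "pod": "api-1", "code": "200"}, 120.0),
    ({"service": "api", "pod": "api-2", "code": "200"}, 80.0),
    ({"service": "api", "pod": "api-1", "code": "500"}, 5.0),
    ({"service": "web", "pod": "web-1", "code": "200"}, 40.0),
]


def sum_by(samples, by, matchers=None):
    """Keep samples whose labels equal every matcher, then sum per group."""
    matchers = matchers or {}
    totals = defaultdict(float)
    for labels, value in samples:
        # Filtering: drop the sample if any matcher does not match.
        if any(labels.get(k) != v for k, v in matchers.items()):
            continue
        # Grouping: the group key is the tuple of the "by" label values.
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


result = sum_by(samples, by=["service"], matchers={"code": "200"})
print(result)
```

The pods disappear from the result because they are not in the `by` list, which is the same reason short-lived pods do not clutter a well-written PromQL aggregation.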
Of course, we are talking about the Prometheus ecosystem here, which does not necessarily mean running the Prometheus binary itself. The storage and query performance of Prometheus itself is not very good, so many companies choose Prometheus-compatible products such as VictoriaMetrics, Thanos, and so on.
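One practical consequence of this compatibility is that client code written against the Prometheus HTTP query API (`/api/v1/query`) can usually be pointed at a compatible backend just by changing the base URL. The sketch below only builds the request URL and parses an instant-query response. It makes no network call, the host name is a placeholder, and the sample response body is hand-written in the documented response shape rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_instant_query_url(base_url, promql):
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body):
    """Turn an instant-vector response body into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each item carries its labels and a [timestamp, "value-as-string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Placeholder host; swap in whichever compatible backend you run.
url = build_instant_query_url("http://prometheus.example:9090/", "up")

# Hand-written example body, for illustration only.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node"},
             "value": [1700000000, "1"]},
        ],
    },
})
series = parse_vector(sample_body)
print(url)
print(series)
```

To actually send the request you could pass `url` to `urllib.request.urlopen` and feed the response body to `parse_vector`. Authentication, timeouts, and retries are left out of this sketch.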
Grafana
Prometheus takes care of data collection and storage and provides a query interface and a query language, but it is not very strong at displaying data, so people usually choose Grafana as the visualization tool.
Grafana not only provides a lot of dashboard templates for Prometheus, but also supports a variety of data sources, such as InfluxDB, Elasticsearch, Loki, MySQL, PostgreSQL, CloudWatch, Zabbix, and so on. Grafana's visualization capabilities are basically the benchmark, or even the de facto standard, in the open source space.
Nightingale
Many companies run multiple Prometheus deployments. I have seen one company in the community with more than 200 of them, and companies with four, five, eight, or nine are everywhere. At that point you want to manage them in a unified way. For example, suppose a company has 8 Kubernetes clusters, each with its own Prometheus. The data in these Prometheus instances is similar and the alerting rules are shared, so every time you modify an alerting rule you have to modify it in 8 places, which is tedious. In addition, monitoring is a basic capability that is usually opened up to all the business R&D teams in the company, which calls for access control and a way to accumulate shared knowledge. Nightingale can help you solve these problems.
At its core, Nightingale is an alerting engine. It connects to Prometheus, VictoriaMetrics, Thanos, M3DB, Loki, and other data sources and manages alerting rules in one place. It also takes network partitions at edge data centers into account: even if the network between an edge data center and the central data center goes down, the edge data center can still generate and send alerts on its own, as a self-contained loop.
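The idea of defining a rule once and evaluating it against several data sources can be sketched in a few lines of Python. This is only an illustration of the concept, not Nightingale's actual implementation or data model. The `AlertRule` class, the `evaluate` function, the cluster names, and the fake query functions are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A data source is modelled as a function that takes a query string and
# returns the current values. In real life it would call the backend.
QueryFn = Callable[[str], List[float]]


@dataclass
class AlertRule:
    """One rule definition shared by every data source (hypothetical model)."""
    name: str
    promql: str
    threshold: float


def evaluate(rule: AlertRule, datasources: Dict[str, QueryFn]) -> List[str]:
    """Run the same rule against each data source and collect firing alerts."""
    firing = []
    for ds_name, query in datasources.items():
        for value in query(rule.promql):
            if value > rule.threshold:
                firing.append(f"{rule.name} firing on {ds_name}: {value}")
    return firing


# Fake data sources standing in for per-cluster Prometheus instances.
datasources = {
    "k8s-cluster-1": lambda q: [0.42, 0.95],
    "k8s-cluster-2": lambda q: [0.10],
}

rule = AlertRule(name="HighCPU", promql="cpu_usage_ratio", threshold=0.9)
alerts = evaluate(rule, datasources)
print(alerts)
```

Changing the threshold now means editing one `AlertRule` instead of touching every cluster's rule file. The edge scenario described above would correspond to running an evaluator like this next to the edge data source, so that it keeps working when the link to the center is down.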
Zabbix
Zabbix is relatively old. It is good at monitoring servers and network devices, and not good at monitoring Kubernetes and microservices. As more and more companies adopt the public cloud, where the provider naturally takes care of monitoring hardware and network devices, Zabbix's market share is gradually declining.
Many companies in China still use Zabbix, the community is fairly active, and many vendors have built commercial products on top of it. If you are a network engineer or a system operations engineer, Zabbix is still worth paying attention to.
Others
Of course, there are other monitoring systems, such as Cacti and Nagios, but they are too old and are not recommended. Cacti still has some market share among network engineers, while Nagios has basically disappeared.
Monitoring is an important means of ensuring stability, and it covers a very wide and varied range of topics. If you are looking for a vendor to help build a monitoring or observability solution, you are welcome to contact us for a product and technical discussion: /contact/.