My Open-Source Operations Monitoring System of Choice: Prometheus

2020-09-04 15:24:08

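Operations monitoring systems need little introduction; there are plenty of articles about them online.

Why choose Prometheus? Everyone probably has their own reasons.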

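Simple deployment: it is written in Go, so you can basically unpack it and run it, without installing many dependencies.
Automatic service discovery: you can use Consul, or dynamic file-based discovery (the approach I want to use).
High performance: a single node already performs very well. Many monitoring systems today support distributed deployment, but their single-node performance is poor and they rely on distribution to relieve the load.
A powerful query language with many practical functions for querying and analysis.
Combined with Grafana, dashboards can be built quickly.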

There can be a thousand reasons to dislike something, but to like it, one reason is enough.

Deployment and Testing

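The installation package can be downloaded from the Prometheus website; it is just a compressed archive. Many platforms are supported, but we still prefer to deploy on Linux.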

tar zxvf prometheus-2.20.0.linux-amd64.tar.gz

The Prometheus configuration file is in YAML format. The default configuration file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

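global: Prometheus's global settings, such as the scrape interval and the scrape timeout.
alerting: the Alertmanager configuration; Prometheus pushes the alerts it fires to the Alertmanager instances listed here.
rule_files: the alerting rule files; Prometheus evaluates these rules and pushes the resulting alerts to Alertmanager.
scrape_configs: the scrape configuration; Prometheus collects data according to the jobs defined here.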

The default listening port is 9090, and the default configuration already includes a job that scrapes Prometheus's own metrics.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

Configuring Startup

Prometheus has no built-in daemon mode. You can run it in the background with nohup, or manage it with systemd by simply writing a service file.

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=prometheus
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/prometheus
ExecStart=/opt/prometheus/prometheus --config.file="/opt/prometheus/prometheus.yml"
LimitNOFILE=65536
PrivateTmp=true
RestartSec=2
StartLimitInterval=0
Restart=always

[Install]
WantedBy=multi-user.target

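When using systemd, be sure to set WorkingDirectory. The default storage path (data/) is relative, so the working directory determines where your data is stored.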

Managing the service with systemd:

systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl stop prometheus
systemctl restart prometheus

Deploying node_exporter

node_exporter is the program that collects host-level metrics. It is also written in Go and just as easy to deploy: simply run the binary.

tar  zxvf node_exporter-1.0.1.linux-amd64.tar.gz

It can likewise be managed as a systemd service:

vim /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/node_exporter
ExecStart=/opt/node_exporter/node_exporter
LimitNOFILE=65536
PrivateTmp=true
RestartSec=2
StartLimitInterval=0
Restart=always

[Install]
WantedBy=multi-user.target

Managing the service with systemd:

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
systemctl stop node_exporter
systemctl restart node_exporter
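
Once the service is running, node_exporter serves its metrics as plain text over HTTP, by default on port 9100 under /metrics. As a quick sanity check, the sketch below parses that text format into a dictionary. The function name parse_metrics and the sample text are illustrative assumptions, and the parser deliberately ignores edge cases such as trailing timestamps.

```python
def parse_metrics(text):
    """Parse Prometheus text-format samples into {series: float}.

    A minimal sketch: comment lines (# HELP, # TYPE) are skipped and
    optional trailing timestamps are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field; everything
        # before it is the metric name plus optional {labels}.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical sample resembling node_exporter output.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
```

To try it against a live exporter, fetch the text with urllib.request.urlopen("http://192.168.122.100:9100/metrics").read().decode() and pass the result to parse_metrics.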

Adding Monitoring Targets

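Edit the Prometheus configuration file, prometheus.yml, and add a job.

Copy the 'prometheus' job shown above and mind the YAML format: the new job needs the same indentation. If you only have a few fixed machines, a static configuration is all you need.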

  - job_name: 'node_exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['192.168.122.100:9100','192.168.122.101:9100']
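
After the new job is loaded, Prometheus records an `up` series for every target: 1 if the last scrape succeeded, 0 if it failed. Running the instant query up{job="node_exporter"} against the HTTP API (/api/v1/query) returns JSON that can be checked with a few lines of Python. The sketch below works on a hand-written sample response; the function name down_instances and the sample values are assumptions made for illustration.

```python
import json


def down_instances(response):
    """Return instances whose `up` value is 0 in an instant-query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    down = []
    for sample in response["data"]["result"]:
        # Each sample is {"metric": {labels}, "value": [timestamp, "string"]}.
        _, value = sample["value"]
        if float(value) == 0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)


# Hypothetical response body for the query up{job="node_exporter"}.
SAMPLE_BODY = """
{"status": "success",
 "data": {"resultType": "vector", "result": [
   {"metric": {"__name__": "up", "instance": "192.168.122.100:9100", "job": "node_exporter"},
    "value": [1599204248.0, "1"]},
   {"metric": {"__name__": "up", "instance": "192.168.122.101:9100", "job": "node_exporter"},
    "value": [1599204248.0, "0"]}
 ]}}
"""

print(down_instances(json.loads(SAMPLE_BODY)))
```

The same information is shown on the Targets page of the Prometheus web UI; the script form is only handy when the check should be automated.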

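If you have many hosts to monitor, and hosts may be added or removed at any time, the simplest approach is to have Prometheus watch a directory and load its targets from the files there. The configuration is simple: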

  - job_name: 'node_exporter'
    file_sd_configs:
      - files:
        - /opt/prometheus/servers/*.json
        refresh_interval: 1m

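With this in place, the targets follow whatever the files contain, which is quite flexible.

refresh_interval sets how often the files are re-read; the default is 5 minutes.

Every file with a .json suffix in the configured directory is loaded automatically. The files do have to follow a specific format: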

[
    {
        "targets": [
            "192.168.122.100:9100"
        ],
        "labels": {
            "instance": "192.168.122.100",
            "job": "node_exporter"
        }
    },
    {
        "targets": [
            "192.168.122.101:9100"
        ],
        "labels": {
            "instance": "192.168.122.101",
            "job": "node_exporter"
        }
    }
]

Labels can be changed or added here. The targets can also be split across several files, as long as every file uses the same format.
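
When the host list comes from an inventory or a script, the target file can be generated rather than edited by hand. The following is a minimal sketch, assuming the same label layout as the JSON above. The function name write_targets and the file name nodes.json are hypothetical. The file is written under a temporary name and then renamed, so Prometheus never reads a half-written file.

```python
import json
import os
import tempfile


def write_targets(path, hosts, port=9100, job="node_exporter"):
    """Write a file_sd target file for the given hosts, replacing it atomically."""
    groups = [
        {
            "targets": ["%s:%d" % (host, port)],
            "labels": {"instance": host, "job": job},
        }
        for host in hosts
    ]
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file ends in .tmp, so a *.json glob does not match it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(groups, tmp, indent=4)
    # mkstemp creates the file as 0600; make it readable by the Prometheus user.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
    return groups


# Demo in a scratch directory; in practice the path would be something
# like /opt/prometheus/servers/nodes.json (the file name is hypothetical).
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "nodes.json")
write_targets(demo_file, ["192.168.122.100", "192.168.122.101"])
with open(demo_file) as f:
    print(f.read())
```

Adding or removing a host then amounts to calling write_targets again with the new list.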

Reloading the Configuration

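With a static configuration, the Prometheus process has to reload its configuration after each change; sending it a SIGHUP with kill -HUP [pid] does that.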

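You can also reload through the HTTP management API. This requires starting Prometheus with the --web.enable-lifecycle flag.

Then send a POST request to the reload endpoint:

curl -X POST http://localhost:9090/-/reload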

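With the dynamic file approach you do not need to do anything: once the files are in place, changes are picked up and loaded automatically at the next refresh interval.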
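
For static configurations, the reload call can also be scripted. The sketch below builds the request with Python's standard library; the function names are illustrative assumptions. The /-/reload endpoint accepts only POST or PUT, and it is served only when Prometheus was started with --web.enable-lifecycle.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for Prometheus's /-/reload endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url="http://localhost:9090", timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on a non-2xx response, for
    example when the lifecycle API is not enabled.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


# Only build and inspect the request here; call reload_prometheus()
# on a host where Prometheus is actually running.
request = build_reload_request()
print(request.get_method(), request.full_url)
```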

Summary

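Simple, isn't it? The deployment takes only a short while, but there is still plenty of room for improvement, above all on security:

systemd starts services as root by default. You can set User=username to run the service as an unprivileged user. If the service has already been run as root, you will need to fix the permissions of the data directory, data.
Try not to enable the HTTP management API (flags such as --web.enable-admin-api). It is convenient, but it carries risk, and dynamic file configuration already removes the need to reload by hand.
By default every service is reached over plain HTTP, with no password authentication or SSL encryption, which is another risk. The fix: put the Prometheus port behind an nginx proxy and add an HTTPS certificate. For node_exporter, versions from 1.0 onward can use HTTPS plus basic_auth for added security.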

No monitoring, no operations! And security matters too!
