Metrics code

Metrics implementation in Sentry

The metrics implementation lives in pkg/sentry/monitoring/metrics.go.

Setup

Variable definitions

The file defines the following metrics-related variables:

var (
	// Metrics definitions.
	csrReceivedTotal = stats.Int64(
		"sentry/cert/sign/request_received_total",
		"The number of CSRs received.",
		stats.UnitDimensionless)
	certSignSuccessTotal = stats.Int64(
		"sentry/cert/sign/success_total",
		"The number of certificates issuances that have succeeded.",
		stats.UnitDimensionless)
	certSignFailedTotal = stats.Int64(
		"sentry/cert/sign/failure_total",
		"The number of errors occurred when signing the CSR.",
		stats.UnitDimensionless)
	serverTLSCertIssueFailedTotal = stats.Int64(
		"sentry/servercert/issue_failed_total",
		"The number of server TLS certificate issuance failures.",
		stats.UnitDimensionless)
	issuerCertChangedTotal = stats.Int64(
		"sentry/issuercert/changed_total",
		"The number of issuer cert updates, when issuer cert or key is changed",
		stats.UnitDimensionless)
	issuerCertExpiryTimestamp = stats.Int64(
		"sentry/issuercert/expiry_timestamp",
		"The unix timestamp, in seconds, when issuer/root cert will expire.",
		stats.UnitDimensionless)

	// Metrics Tags.
	failedReasonKey = tag.MustNewKey("reason")
	noKeys          = []tag.Key{}
)

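There are currently 6 metrics:

  • csrReceivedTotal: the number of CSRs received
  • certSignSuccessTotal: the number of certificates issued successfully
  • certSignFailedTotal: the number of errors that occurred while signing CSRs
  • serverTLSCertIssueFailedTotal: the number of server TLS certificate issuance failures
  • issuerCertChangedTotal: the number of issuer certificate updates, counted whenever the issuer cert or key changes
  • issuerCertExpiryTimestamp: the Unix timestamp, in seconds, at which the issuer/root certificate expires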

Initialization

InitMetrics() registers the metrics:

func InitMetrics() error {
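	// Register one view per metric. The five counters use view.Count();
	// the expiry timestamp uses view.LastValue(), so only the latest value is kept.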
	return view.Register(
		diagUtils.NewMeasureView(csrReceivedTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(certSignSuccessTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(certSignFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
		diagUtils.NewMeasureView(serverTLSCertIssueFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
		diagUtils.NewMeasureView(issuerCertChangedTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(issuerCertExpiryTimestamp, noKeys, view.LastValue()),
	)
}
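To make the difference between the two aggregations concrete, here is a small in-memory model that uses only the standard library. The names miniView, countAgg, lastValueAgg and record are hypothetical and are not part of the stats/view packages used above; the sketch only mimics the behaviour: a Count view keeps one counter per combination of tag values (so certSignFailedTotal, registered with failedReasonKey, gets one row per reason) and ignores the recorded value, while a LastValue view keeps only the most recent measurement.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// aggregation mimics the two aggregation kinds used in InitMetrics.
// Hypothetical model for illustration, not the real view API.
type aggregation int

const (
	countAgg     aggregation = iota // like view.Count(): number of measurements
	lastValueAgg                    // like view.LastValue(): most recent value
)

// miniView is a hypothetical stand-in for a registered view: a measure name,
// the tag keys that split its rows, and an aggregation.
type miniView struct {
	name    string
	tagKeys []string
	agg     aggregation
	rows    map[string]int64
}

func newMiniView(name string, tagKeys []string, agg aggregation) *miniView {
	return &miniView{name: name, tagKeys: tagKeys, agg: agg, rows: map[string]int64{}}
}

// record applies one measurement; only tags listed in tagKeys select the row.
func (v *miniView) record(value int64, tags map[string]string) {
	parts := make([]string, 0, len(v.tagKeys))
	for _, k := range v.tagKeys {
		parts = append(parts, k+"="+tags[k])
	}
	row := strings.Join(parts, ",")
	switch v.agg {
	case countAgg:
		v.rows[row]++ // the value is ignored: each call counts once
	case lastValueAgg:
		v.rows[row] = value
	}
}

func main() {
	failed := newMiniView("sentry/cert/sign/failure_total", []string{"reason"}, countAgg)
	failed.record(1, map[string]string{"reason": "cert_parse"})
	failed.record(1, map[string]string{"reason": "cert_parse"})
	failed.record(1, map[string]string{"reason": "example_reason"}) // hypothetical reason

	expiry := newMiniView("sentry/issuercert/expiry_timestamp", nil, lastValueAgg)
	expiry.record(1700000000, nil)
	expiry.record(1800000000, nil)

	keys := make([]string, 0, len(failed.rows))
	for k := range failed.rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s{%s} = %d\n", failed.name, k, failed.rows[k])
	}
	fmt.Printf("%s = %d\n", expiry.name, expiry.rows[""])
}
```

This also explains why every counter in the file records M(1): with a Count aggregation only the number of Record calls matters.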

Collecting metrics

CSR-related metrics

CertSignRequestReceived() counts the CSRs received:

// CertSignRequestReceived counts when CSR received.
func CertSignRequestReceived() {
	stats.Record(context.Background(), csrReceivedTotal.M(1))
}

CertSignSucceed() counts the requests that were processed successfully:

func CertSignSucceed() {
	stats.Record(context.Background(), certSignSuccessTotal.M(1))
}

CertSignFailed() counts the requests that failed, tagging each failure with a reason:

func CertSignFailed(reason string) {
	stats.RecordWithTags(
		context.Background(),
		diagUtils.WithTags(certSignFailedTotal.Name(), failedReasonKey, reason),
		certSignFailedTotal.M(1))
}

All three are called from SignCertificate() in server.go, the function that handles CSR requests:

func (s *server) SignCertificate(ctx context.Context, req *sentryv1pb.SignCertificateRequest) (*sentryv1pb.SignCertificateResponse, error) {
	// Count on entry: this is the number of CSRs received.
	monitoring.CertSignRequestReceived()
	......

	// Every error path records a failure before returning.
	if err != nil {
		monitoring.CertSignFailed("cert_parse")
		return nil, err
	}
	......
	// If the CSR is processed successfully, record a success.
	monitoring.CertSignSucceed()

	return resp, nil
}

Certificate expiry

IssuerCertExpiry() records when the issuer/root certificate expires:

// IssuerCertExpiry records root cert expiry.
func IssuerCertExpiry(expiry *time.Time) {
	stats.Record(context.Background(), issuerCertExpiryTimestamp.M(expiry.Unix()))
}

It is called from createCAServer() in sentry.go:

func (s *sentry) createCAServer(ctx context.Context) (ca.CertificateAuthority, identity.Validator) {
	certAuth, authorityErr := ca.NewCertificateAuthority(s.conf)
	trustStoreErr := certAuth.LoadOrStoreTrustBundle(ctx)
	......
	certExpiry := certAuth.GetCACertBundle().GetIssuerCertExpiry()
	monitoring.IssuerCertExpiry(certExpiry)
	......
	return certAuth, v
}

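While the CA server is being created, it loads the trust bundle and then reads the issuer certificate's expiry from the CA cert bundle; this is where the expiry metric is recorded.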

Server certificate issuance failures

ServerCertIssueFailed() records a failure to issue the server certificate:

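// Note: in this version the reason argument is not attached as a tag,
// even though the view is registered with failedReasonKey.
func ServerCertIssueFailed(reason string) {
	stats.Record(context.Background(), serverTLSCertIssueFailedTotal.M(1))
}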

It is called from server.go:

func (s *server) Run(ctx context.Context, port int, trustBundler ca.TrustRootBundler) error {
	......
	tlsOpt := s.tlsServerOption(trustBundler)
	s.srv = grpc.NewServer(tlsOpt)
	......
}

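When the sentry server starts, Run() builds the TLS server option for the gRPC server. tlsServerOption() installs a GetCertificate callback that obtains sentry's own server certificate lazily: it is fetched during a TLS handshake whenever no certificate is cached yet or the cached one needs refreshing (needsRefresh with serverCertExpiryBuffer):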

func (s *server) tlsServerOption(trustBundler ca.TrustRootBundler) grpc.ServerOption {
	cp := trustBundler.GetTrustAnchors()

	config := &tls.Config{
		ClientCAs: cp,
		// Require cert verification
		ClientAuth: tls.RequireAndVerifyClientCert,
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			if s.certificate == nil || needsRefresh(s.certificate, serverCertExpiryBuffer) {
				cert, err := s.getServerCertificate()
				if err != nil {
					monitoring.ServerCertIssueFailed("server_cert")
					log.Error(err)
					return nil, fmt.Errorf("failed to get TLS server certificate: %w", err)
				}
				s.certificate = cert
			}
	......
}

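If fetching the certificate fails, the failure is recorded with ServerCertIssueFailed("server_cert") and the handshake returns an error.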

Issuer certificate changes

IssuerCertChanged() counts changes to the issuer certificate:

func IssuerCertChanged() {
	stats.Record(context.Background(), issuerCertChangedTotal.M(1))
}

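It is called from main() in main.go. After startup, sentry watches the issuer certificate (by default the "issuer.crt" file under "/var/run/dapr/credentials"), and every change signal received on issuerEvent increments the counter: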

func main() {
  ......
			func(ctx context.Context) error {
				select {
				case <-ctx.Done():
					return nil

				case <-issuerEvent:
					monitoring.IssuerCertChanged()
					log.Debug("received issuer credentials changed signal")
				......
	}
  ......
  	// Watch for changes in the watchDir
	mngr.Add(func(ctx context.Context) error {
		log.Infof("starting watch on filesystem directory: %s", watchDir)
		return fswatcher.Watch(ctx, watchDir, issuerEvent)
	})
}