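Call flow

Based on the functionality each building block provides, analyze the complete call flow from the moment a request is sent until it has been fully processed.

The main goal is to understand the main request-processing flow, how it is implemented in code, and the related structural design, without going deep into details.

Code repositories

Walk through all code implementations repository by repository, expanding every detail.

The main goal is to explore every corner of the dapr code implementation and gain a complete understanding of the code.

Goal: study the Dapr source code in depth and thoroughly master the design and implementation of dapr.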

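1 - Service invocation source code analysis

Source code analysis of the Dapr service invocation building block

1.1 - Main flow of service invocation

Analysis of the main flow of service invocation

1.1.1 - Flow overview

Overview of the Dapr service invocation flow and APIs

APIs and ports

The Dapr runtime exposes two external APIs: the Dapr HTTP API and the Dapr gRPC API. In addition, communication between two dapr runtimes (the Dapr internal API) always uses gRPC.

The default ports exposed by the two Dapr APIs are:

3500: HTTP port, configurable with the command-line flag dapr-http-port
50001: gRPC port, configurable with the command-line flag dapr-grpc-port

The Dapr internal API port is a special case: it has no fixed default and instead takes an arbitrary free port. It can also be set with the command-line flag dapr-internal-grpc-port.

To send requests to the server-side application, dapr needs to know which port the application listens on. This is set with the command-line flag app-port. The Dapr samples commonly use port 3000.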

Call flow

Via HTTP

title Service Invoke via HTTP
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    client
]
participant SDK_client [
    =SDK
    ----
    client
]
end box
participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

box "App-2"
participant user_code_server [
    =App-2
    ----
    server
]
end box

user_code_client -> SDK_client : Invoke\nService() 
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port
|||
daprd_server -[#blue]> user_code_server :  http (localhost)
note right: HTTP endpoint "method-1" @ 3000

daprd_server <[#blue]-- user_code_server
daprd_client <[#red]-- daprd_server
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

Via gRPC

title Service Invoke via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    client
]
participant SDK_client [
    =SDK
    ----
    client
]
end box
participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

box "App-2"
participant SDK_server [
    =SDK
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]
end box
user_code_server -> SDK_server: AddServiceInvocationHandler("method-1")
SDK_server -> SDK_server: save handler in invokeHandlers["method-1"]
SDK_server --> user_code_server
user_code_client -> SDK_client : Invoke\nService() 
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
|||
daprd_server -[#blue]> SDK_server : gRPC (localhost)
note right: 50001\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
SDK_server -> SDK_server: get handler by invokeHandlers["method-1"]
SDK_server -> user_code_server : invoke handler of "method-1"

SDK_server <-- user_code_server
daprd_server <[#blue]-- SDK_server
daprd_client <[#red]-- daprd_server
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

Via gRPC proxying

title Service Invoke via gRPC proxying
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    client
]
participant SDK_client [
    =SDK
    ----
    client
]
end box
participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

box "App-2"
participant SDK_server [
    =gRPC
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]
end box
user_code_server -> SDK_server
SDK_server --> user_code_server
user_code_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC\n/user.services.ServiceName/Method-1
|||
daprd_client -[#red]> daprd_server : gRPC proxy (remote call)
note right: gRPC\n/user.services.ServiceName/Method-1
|||
daprd_server -[#blue]> SDK_server : gRPC (localhost)
note right: gRPC\n/user.services.ServiceName/Method-1
SDK_server -> user_code_server : 

SDK_server <-- user_code_server
daprd_server <[#blue]-- SDK_server
daprd_client <[#red]-- daprd_server
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

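1.1.2 - Runtime initialization related to service invocation

The initialization flow in the Dapr runtime that relates to service invocation

When the dapr runtime starts and initializes, it must open the API ports and mount the corresponding handlers to receive and process outbound service invocation requests. To receive inbound requests from other dapr runtimes, it also starts the dapr internal server.

Dapr HTTP API Server (outbound)

Starting the HTTP server in the dapr runtime

The dapr runtime's HTTP server is built on fasthttp.

During initialization at startup, the dapr runtime starts the HTTP server. The code is in pkg/runtime/runtime.go: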

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
	......
	// Start HTTP Server
	err = a.startHTTPServer(a.runtimeConfig.HTTPPort, a.runtimeConfig.PublicPort, a.runtimeConfig.ProfilePort, a.runtimeConfig.AllowedOrigins, pipeline)
	if err != nil {
		log.Fatalf("failed to start HTTP server: %s", err)
	}
	......
}

func (a *DaprRuntime) startHTTPServer(......) error {
	a.daprHTTPAPI = http.NewAPI(......)

	server := http.NewServer(a.daprHTTPAPI, ......)
	// StartNonBlocking starts the fasthttp server
	if err := server.StartNonBlocking(); err != nil {
		return err
	}
}

The implementation of StartNonBlocking() is in pkg/http/server.go:

// StartNonBlocking starts a new server in a goroutine.
func (s *server) StartNonBlocking() error {
	......
	for _, apiListenAddress := range s.config.APIListenAddresses {
		l, err := net.Listen("tcp", fmt.Sprintf("%s:%v", apiListenAddress, s.config.Port))
		listeners = append(listeners, l)
	}

	for _, listener := range listeners {
		// customServer is created in a loop because each instance
		// has a handle on the underlying listener.
		customServer := &fasthttp.Server{
			Handler:            handler,
			MaxRequestBodySize: s.config.MaxRequestBodySize * 1024 * 1024,
			ReadBufferSize:     s.config.ReadBufferSize * 1024,
			StreamRequestBody:  s.config.StreamRequestBody,
		}
		s.servers = append(s.servers, customServer)

		go func(l net.Listener) {
			if err := customServer.Serve(l); err != nil {
				log.Fatal(err)
			}
		}(listener)
	}
}

Mounting the DirectMessaging HTTP endpoints

During HTTP API initialization, the DirectMessaging HTTP endpoints are mounted on the fasthttp server. The code is in pkg/http/api.go:

func NewAPI(
	appID string,
	appChannel channel.AppChannel,
	directMessaging messaging.DirectMessaging,
	......
	shutdown func()) API {

	api := &api{
		appChannel:      appChannel,
		directMessaging: directMessaging,
		......
	}

	// Append the DirectMessaging HTTP endpoints
	api.endpoints = append(api.endpoints, api.constructDirectMessagingEndpoints()...)
}

The details of the DirectMessaging HTTP endpoints are in the constructDirectMessagingEndpoints() method:

func (a *api) constructDirectMessagingEndpoints() []Endpoint {
	return []Endpoint{
		{
			Methods:           []string{router.MethodWild},
			Route:             "invoke/{id}/method/{method:*}",
			Alias:             "{method:*}",
			Version:           apiVersionV1,
			KeepParamUnescape: true,
			Handler:           a.onDirectMessage,
		},
	}
}

Note the Route path "invoke/{id}/method/{method:*}": the dapr sdk sends its HTTP requests to URLs of this form.
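To make the route concrete, here is a minimal sketch of how a client could assemble the invoke URL. The helper buildInvokeURL and its parameters are hypothetical and not part of any Dapr SDK; only the /v1.0/invoke/{id}/method/{method} layout and the default port 3500 come from the text above.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildInvokeURL is a hypothetical helper that assembles the service
// invocation URL of the Dapr HTTP API on localhost.
// The method part is left unescaped because the route pattern {method:*}
// allows it to span several path segments.
func buildInvokeURL(httpPort int, appID, method string) string {
	return fmt.Sprintf("http://localhost:%d/v1.0/invoke/%s/method/%s",
		httpPort, url.PathEscape(appID), method)
}

func main() {
	// 3500 is the default Dapr HTTP port mentioned earlier.
	fmt.Println(buildInvokeURL(3500, "app-2", "method-1"))
}
```

With the values used in the diagrams, this yields http://localhost:3500/v1.0/invoke/app-2/method/method-1.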

title Dapr HTTP API 
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]

-[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/invoke/{id}/method/{method}
|||
<[#blue]-- daprd_client

Dapr gRPC API Server (outbound)

Starting the gRPC server

During initialization at startup, the dapr runtime starts the gRPC server. The code is in pkg/runtime/runtime.go:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    // Create and start internal and external gRPC servers
	grpcAPI := a.getGRPCAPI()
    
	err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
    ......
}

func (a *DaprRuntime) startGRPCAPIServer(api grpc.API, port int) error {
	serverConf := a.getNewServerConfig(a.runtimeConfig.APIListenAddresses, port)
	server := grpc.NewAPIServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.globalConfig.Spec.APISpec, a.proxy)
    if err := server.StartNonBlocking(); err != nil {
		return err
	}
	......
}

// NewAPIServer returns a new user facing gRPC API server.
func NewAPIServer(api API, config ServerConfig, ......) Server {
	return &server{
		api:         api,
		config:      config,
		kind:        apiServer, // const apiServer = "apiServer"
		......
	}
}

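Registering the Dapr API

For the dapr runtime's gRPC server to serve the Dapr API, the API must be registered with it.

The registration code is in pkg/grpc/server.go. When StartNonBlocking() starts the gRPC server, it registers the service: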

func (s *server) StartNonBlocking() error {
	if s.kind == internalServer {
		internalv1pb.RegisterServiceInvocationServer(server, s.api)
	} else if s.kind == apiServer {
		// Note: s.api (the gRPC api implementation) is passed in
		runtimev1pb.RegisterDaprServer(server, s.api)
	}
	......
}

The RegisterDaprServer() method is implemented in pkg/proto/runtime/v1/dapr_grpc.pb.go:

func RegisterDaprServer(s grpc.ServiceRegistrar, srv DaprServer) {
	// srv is the gRPC api implementation
	s.RegisterService(&Dapr_ServiceDesc, srv)
}

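Dapr_ServiceDesc definition

The file pkg/proto/runtime/v1/dapr_grpc.pb.go contains the gRPC service definition of the Dapr service. This is gRPC code generated by protoc.

Dapr_ServiceDesc defines each method of the Dapr service. The one related to service invocation is InvokeService: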

var Dapr_ServiceDesc = grpc.ServiceDesc{
	ServiceName: "dapr.proto.runtime.v1.Dapr",
	HandlerType: (*DaprServer)(nil),
	Methods: []grpc.MethodDesc{
		{
			MethodName: "InvokeService",             // registered method name
			Handler:    _Dapr_InvokeService_Handler, // handler bound to the method
		},
		......
	},
	Metadata: "dapr/proto/runtime/v1/dapr.proto",
}

This tells the gRPC server: when a gRPC request arrives for the InvokeService method of the dapr.proto.runtime.v1.Dapr service, pass it to _Dapr_InvokeService_Handler.
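The routing idea behind a service descriptor can be illustrated with a small stand-alone sketch. The types below (handlerFunc, methodDesc, serviceDesc, registry) are simplified stand-ins invented for illustration; they are not the grpc-go types, and real handlers have a richer signature. The sketch only shows how a full method path of the form /<service>/<method> can be mapped to a handler.

```go
package main

import (
	"errors"
	"fmt"
)

// handlerFunc is a simplified stand-in for a gRPC method handler.
type handlerFunc func(req string) string

// methodDesc and serviceDesc are reduced, hypothetical versions of a
// method descriptor and a service descriptor.
type methodDesc struct {
	methodName string
	handler    handlerFunc
}

type serviceDesc struct {
	serviceName string
	methods     []methodDesc
}

// registry maps a full method path ("/<service>/<method>") to its handler.
type registry map[string]handlerFunc

// register indexes every method of the service under its full path.
func (r registry) register(sd serviceDesc) {
	for _, m := range sd.methods {
		r["/"+sd.serviceName+"/"+m.methodName] = m.handler
	}
}

// route looks up the handler for a full method path and invokes it.
func (r registry) route(fullMethod, req string) (string, error) {
	h, ok := r[fullMethod]
	if !ok {
		return "", errors.New("unimplemented: " + fullMethod)
	}
	return h(req), nil
}

func main() {
	r := registry{}
	r.register(serviceDesc{
		serviceName: "dapr.proto.runtime.v1.Dapr",
		methods: []methodDesc{
			{methodName: "InvokeService", handler: func(req string) string { return "invoked " + req }},
		},
	})
	resp, err := r.route("/dapr.proto.runtime.v1.Dapr/InvokeService", "method-1")
	fmt.Println(resp, err)
}
```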

title Dapr gRPC API 
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]

-[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
<[#blue]-- daprd_client

The handler associated with the InvokeService method, _Dapr_InvokeService_Handler, is implemented as follows:

func _Dapr_InvokeService_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
	in := new(InvokeServiceRequest)
	if err := dec(in); err != nil {
		return nil, err
	}
	if interceptor == nil {
		return srv.(DaprServer).InvokeService(ctx, in)
	}
	info := &grpc.UnaryServerInfo{
		Server:     srv,
		FullMethod: "/dapr.proto.runtime.v1.Dapr/InvokeService",
	}
	handler := func(ctx context.Context, req interface{}) (interface{}, error) {
		// srv here is the gRPC api implementation
		return srv.(DaprServer).InvokeService(ctx, req.(*InvokeServiceRequest))
	}
	return interceptor(ctx, in, info, handler)
}

In the end, the call reaches the InvokeService method of the DaprServer interface implementation, that is, the gRPC API implementation.

Dapr Internal API Server (inbound)

Starting the gRPC server

During initialization at startup, the dapr runtime starts the gRPC internal server. The code is in pkg/runtime/runtime.go:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
	err = a.startGRPCInternalServer(grpcAPI, a.runtimeConfig.InternalGRPCPort)
	if err != nil {
		log.Fatalf("failed to start internal gRPC server: %s", err)
	}
	log.Infof("internal gRPC server is running on port %v", a.runtimeConfig.InternalGRPCPort)
    ......
}

func (a *DaprRuntime) startGRPCInternalServer(api grpc.API, port int) error {
	serverConf := a.getNewServerConfig([]string{""}, port)
	server := grpc.NewInternalServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.authenticator, a.proxy)
	if err := server.StartNonBlocking(); err != nil {
		return err
	}
	a.apiClosers = append(a.apiClosers, server)

	return nil
}

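Special handling: port

The port of the gRPC internal server is a special case. It can be specified with the command-line flag --dapr-internal-grpc-port. If it is not specified, a random free port is used rather than a fixed value. This differs from the dapr HTTP API server and the dapr gRPC API server.

The implementation is in pkg/runtime/cli.go: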

func FromFlags() (*DaprRuntime, error) {	
	var daprInternalGRPC int
	if *daprInternalGRPCPort != "" {
		daprInternalGRPC, err = strconv.Atoi(*daprInternalGRPCPort)
		if err != nil {
			return nil, errors.Wrap(err, "error parsing dapr-internal-grpc-port")
		}
	} else {
		daprInternalGRPC, err = grpc.GetFreePort()
		if err != nil {
			return nil, errors.Wrap(err, "failed to get free port for internal grpc server")
		}
	}
    ......
}
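One common way to obtain a free port in Go is to listen on port 0 and read back the port the operating system assigned. The sketch below shows that technique. The function getFreePort is written for illustration, and whether Dapr's grpc.GetFreePort() works exactly this way is an assumption, not something the excerpt above confirms.

```go
package main

import (
	"fmt"
	"net"
)

// getFreePort asks the OS for an unused TCP port by binding to port 0,
// then releases the listener and returns the assigned port number.
// Another process could claim the port between Close and reuse, so the
// result is a best-effort hint rather than a reservation.
func getFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := getFreePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("free port:", port)
}
```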

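Special handling: reusing the gRPC API handler

The implementation of the Dapr gRPC internal API is somewhat unusual:

It starts its own gRPC server and has its own port.
However, the handler registered to process requests reuses the Dapr gRPC API implementation.

In the dapr runtime initialization code, the grpcAPI object is shared by the gRPC API server and the gRPC internal server: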

grpcAPI := a.getGRPCAPI()

err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
err = a.startGRPCInternalServer(grpcAPI, a.runtimeConfig.InternalGRPCPort)

From a design point of view this is not a good choice: it mixes the handling of outbound and inbound requests and hurts code readability.
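The sharing can be pictured with a small sketch in which one object satisfies two interfaces and is handed to two servers. All names here (publicAPI, internalAPI, api, server) are hypothetical simplifications written for illustration; they only mimic the relationship between the shared grpcAPI object and the two server kinds.

```go
package main

import "fmt"

// publicAPI stands in for the outward-facing Dapr gRPC API.
type publicAPI interface {
	InvokeService(method string) string
}

// internalAPI stands in for the runtime-to-runtime internal API.
type internalAPI interface {
	CallLocal(method string) string
}

// api implements both interfaces, like the shared grpcAPI object.
type api struct{ appID string }

func (a *api) InvokeService(method string) string { return a.appID + " outbound " + method }
func (a *api) CallLocal(method string) string     { return a.appID + " inbound " + method }

// server holds a handler and picks the interface to use by its kind.
type server struct {
	kind    string
	handler interface{}
}

func (s server) handle(method string) string {
	switch s.kind {
	case "apiServer":
		return s.handler.(publicAPI).InvokeService(method)
	case "internalServer":
		return s.handler.(internalAPI).CallLocal(method)
	}
	return ""
}

func main() {
	shared := &api{appID: "app-1"}
	apiSrv := server{kind: "apiServer", handler: shared}
	internalSrv := server{kind: "internalServer", handler: shared}
	fmt.Println(apiSrv.handle("method-1"))
	fmt.Println(internalSrv.handle("method-1"))
}
```

Splitting api into two separate types, one per interface, would be one way to keep the outbound and inbound paths apart.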

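Registering the Dapr internal API

For the dapr runtime's gRPC server to serve the Dapr internal API, the API must be registered with it.

The registration code is in pkg/grpc/server.go. When StartNonBlocking() starts the gRPC server, it registers the service: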

func (s *server) StartNonBlocking() error {
		if s.kind == internalServer {
			// Note: s.api (the gRPC api implementation) is passed in
			internalv1pb.RegisterServiceInvocationServer(server, s.api)
		} else if s.kind == apiServer {
			runtimev1pb.RegisterDaprServer(server, s.api)
		}
		......
}

The RegisterServiceInvocationServer() method is implemented in pkg/proto/internals/v1/service_invocation_grpc.pb.go:

func RegisterServiceInvocationServer(s grpc.ServiceRegistrar, srv ServiceInvocationServer) {
	// srv is the gRPC api implementation
	s.RegisterService(&ServiceInvocation_ServiceDesc, srv)
}

ServiceInvocation_ServiceDesc definition

The file pkg/proto/internals/v1/service_invocation_grpc.pb.go contains the gRPC service definition of the internal service. This is gRPC code generated by protoc.

ServiceInvocation_ServiceDesc defines two methods. The one related to service invocation is CallLocal:

var ServiceInvocation_ServiceDesc = grpc.ServiceDesc{
	ServiceName: "dapr.proto.internals.v1.ServiceInvocation",
	HandlerType: (*ServiceInvocationServer)(nil),
	Methods: []grpc.MethodDesc{
		{
			MethodName: "CallActor",
			Handler:    _ServiceInvocation_CallActor_Handler,
		},
		{
			MethodName: "CallLocal",
			Handler:    _ServiceInvocation_CallLocal_Handler,
		},
	},
	Streams:  []grpc.StreamDesc{},
	Metadata: "dapr/proto/internals/v1/service_invocation.proto",
}

This tells the gRPC server: when a gRPC request arrives for the CallLocal method of the dapr.proto.internals.v1.ServiceInvocation service, pass it to _ServiceInvocation_CallLocal_Handler.

title Dapr gRPC internal API 
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]

-[#red]> daprd_client : gRPC (remote call)
note right: gRPC API @ random port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
|||
<[#red]-- daprd_client

The handler associated with the CallLocal method, _ServiceInvocation_CallLocal_Handler, is implemented as follows:

func _ServiceInvocation_CallLocal_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
	in := new(InternalInvokeRequest)
	if err := dec(in); err != nil {
		return nil, err
	}
	if interceptor == nil {
		return srv.(ServiceInvocationServer).CallLocal(ctx, in)
	}
	info := &grpc.UnaryServerInfo{
		Server:     srv,
		FullMethod: "/dapr.proto.internals.v1.ServiceInvocation/CallLocal",
	}
	handler := func(ctx context.Context, req interface{}) (interface{}, error) {
		// srv here is the gRPC api implementation
		return srv.(ServiceInvocationServer).CallLocal(ctx, req.(*InternalInvokeRequest))  
	}
	return interceptor(ctx, in, info, handler)
}

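In the end, the call reaches the CallLocal method of the ServiceInvocationServer interface implementation, that is, the gRPC API implementation.

1.1.3 - Client SDK sends the outbound service invocation request

The Dapr client SDK wraps the dapr API and sends the outbound service invocation request

Java SDK implementation

For an example of using service invoke in business code, see java-sdk/examples/src/main/java/io/dapr/examples/invoke/http/InvokeClient.java. A sketch of the code: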

DaprClient client = (new DaprClientBuilder()).build();
byte[] response = client.invokeMethod(SERVICE_APP_ID, "say", message, HttpExtension.POST, null,
            byte[].class).block();

In the Java SDK, service invoke uses HTTP by default while other methods use gRPC by default. The DaprClientProxy class initializes two dapr clients:

client field: of type DaprClientGrpc, connected to 127.0.0.1:50001
methodInvocationOverrideClient field: of type DaprClientHttp, connected to 127.0.0.1:3500

The service invoke method goes through HTTP by default, using the DaprClientHttp type (file src/main/java/io/dapr/client/DaprClientHttp.java):

  @Override
  public <T> Mono<T> invokeMethod(String appId, String methodName,......) {
    return methodInvocationOverrideClient.invokeMethod(appId, methodName, request, httpExtension, metadata, clazz);
  }
  
  public <T> Mono<T> invokeMethod(InvokeMethodRequest invokeMethodRequest, TypeRef<T> type) {
    try {
      final String appId = invokeMethodRequest.getAppId();
      final String method = invokeMethodRequest.getMethod();
      ......
      Mono<DaprHttp.Response> response = Mono.subscriberContext().flatMap(
          context -> this.client.invokeApi(httpMethod, pathSegments,
              httpExtension.getQueryParams(), serializedRequestBody, headers, context)
      );
  }

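Here the various parameters of the HTTP request are set according to the request. While debugging you can see data like that shown in the figure below:

The code that finally sends the HTTP request is the doInvokeApi() method in src/main/java/io/dapr/client/DaprHttp.java: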

  private CompletableFuture<Response> doInvokeApi(String method,
                               String[] pathSegments,
                               Map<String, List<String>> urlParameters,
                               byte[] content, Map<String, String> headers,
                               Context context) {
      ......
      Request.Builder requestBuilder = new Request.Builder()
        .url(urlBuilder.build())
        .addHeader(HEADER_DAPR_REQUEST_ID, requestId);
      
    CompletableFuture<Response> future = new CompletableFuture<>();
    this.httpClient.newCall(request).enqueue(new ResponseFutureCallback(future));
    return future;
  }

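The HTTP request sent to the dapr runtime is shown in the figure below:

This calls the dapr runtime's HTTP API.

Note: the request URL has the form /v1.0/invoke/{id}/method/{method}, which matches the DirectMessaging route registered by the HTTP API in the dapr runtime.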

title Service Invoke via HTTP
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    client
]
participant SDK_client [
    =SDK
    ----
    client
]
end box
participant daprd_client [
    =daprd
    ----
    client
]

user_code_client -> SDK_client : invokeMethod() 
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/invoke/app-2/method/method-1
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

Go SDK implementation

For an example of using service invoke in Go business code, see https://github.com/dapr/go-sdk/blob/main/examples/service/client/main.go. A sketch of the code:

client, err := dapr.NewClient()
content := &dapr.DataContent{
		ContentType: "text/plain",
		Data:        []byte("hellow"),
	}
// invoke a method named "method-1" on another dapr enabled service named "app-2"
resp, err := client.InvokeMethodWithContent(ctx, "app-2", "method-1", "post", content)

The Go SDK defines the Client interface in client/client.go:

// Client is the interface for Dapr client implementation.
type Client interface {
    	// InvokeMethod invokes service without raw data
	InvokeMethod(ctx context.Context, appID, methodName, verb string) (out []byte, err error)

	// InvokeMethodWithContent invokes service with content
	InvokeMethodWithContent(ctx context.Context, appID, methodName, verb string, content *DataContent) (out []byte, err error)

	// InvokeMethodWithCustomContent invokes app with custom content (struct + content type).
	InvokeMethodWithCustomContent(ctx context.Context, appID, methodName, verb string, contentType string, content interface{}) (out []byte, err error)
    ......
}

All three methods are implemented in client/invoke.go and only assemble the InvokeRequest object. The core implementation is in the invokeServiceWithRequest method:

func (c *GRPCClient) invokeServiceWithRequest(ctx context.Context, req *pb.InvokeServiceRequest) (out []byte, err error) {
	resp, err := c.protoClient.InvokeService(c.withAuthToken(ctx), req)
	......
}

InvokeService() is gRPC code generated by protoc, located in dapr/proto/runtime/v1/dapr_grpc.pb.go. Its implementation:

func (c *daprClient) InvokeService(ctx context.Context, in *InvokeServiceRequest, opts ...grpc.CallOption) (*v1.InvokeResponse, error) {
	out := new(v1.InvokeResponse)
	err := c.cc.Invoke(ctx, "/dapr.proto.runtime.v1.Dapr/InvokeService", in, out, opts...)
	......
}

Note: the gRPC service called here is dapr.proto.runtime.v1.Dapr and the method is InvokeService, matching the gRPC API in the dapr runtime.

title Service Invoke via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    client
]
participant SDK_client [
    =SDK
    ----
    client
]
end box
participant daprd_client [
    =daprd
    ----
    client
]

user_code_client -> SDK_client : InvokeMethodWithContent() 
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

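Other SDKs

TODO

Summary

Every language SDK implements the path from a client SDK API call to sending a remote call request to the dapr runtime. The implementations differ in some respects:

go sdk
- All requests go through the gRPC API.
Java sdk
- Service invoke goes through the HTTP API by default; other requests go through the gRPC API by default.
Other SDKs
- To be updated

1.1.4 - Dapr Runtime receives outbound service invocation requests

The Dapr runtime receives outbound requests from applications through the gRPC API and the HTTP API

The Dapr runtime receives outbound service invocation requests from clients in two ways: the gRPC API and the HTTP API. After receiving a request, the dapr runtime forwards the outbound request to the dapr runtime of the target service.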

title Daprd Receive outbound Request
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

 -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500 \n/v1.0/invoke/app-2/method/method-1
 -[#blue]> daprd_client : gRPC (localhost)
note right: GRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
daprd_client -> daprd_client: name resolution
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)

HTTP API

When the runtime initializes and registers the HTTP service, it binds the handler implementation to the URL route:

func (a *api) constructDirectMessagingEndpoints() []Endpoint {
	return []Endpoint{
		{
			Methods:           []string{router.MethodWild},
			Route:             "invoke/{id}/method/{method:*}",
			Alias:             "{method:*}",
			Version:           apiVersionV1,
			KeepParamUnescape: true,
			Handler:           a.onDirectMessage,
		},
	}
}

When a service invoke HTTP request arrives, fasthttp routes it to the Handler, that is, the onDirectMessage() method of the HTTP API implementation.

onDirectMessage is implemented in pkg/http/api.go. A sketch:

func (a *api) onDirectMessage(reqCtx *fasthttp.RequestCtx) {
	......
  req := invokev1.NewInvokeMethodRequest(...)
	resp, err := a.directMessaging.Invoke(reqCtx, targetID, req)
	......
}

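Remark: the HTTP API method onDirectMessage() is poorly named. Following the gRPC API, it should be called InvokeService(). The reason: this method is exposed to external callers, so its name should express the externally visible function, InvokeService, rather than reveal that the internal implementation calls directMessaging.

The HTTP API implementation is simple: apart from basic request/response parameter handling, it delegates forwarding of the request to directMessaging.

gRPC API

When the runtime initializes and registers the gRPC service, it binds the gRPC API implementation to the InvokeService gRPC method.

When a service invoke gRPC request arrives, it enters the InvokeService method in pkg/grpc/api.go: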

func (a *api) InvokeService(ctx context.Context, in *runtimev1pb.InvokeServiceRequest) (*commonv1pb.InvokeResponse, error) {
	......
	resp, err := a.directMessaging.Invoke(ctx, in.Id, req)
	......
	return resp.Message(), respError
}

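The gRPC API implementation is equally simple: apart from basic request/response parameter handling, it delegates forwarding of the request to directMessaging.

Name Resolution

TBD

1.1.5 - Dapr Runtime forwards outbound requests

The client-side Dapr runtime forwards the outbound request to the remote server-side Dapr runtime

Dapr runtimes communicate with each other over gRPC, through the Dapr gRPC internal API. As a special case, this API uses a random free port rather than a default port, although the port can also be specified with the command-line flag dapr-internal-grpc-port.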

title Daprd-Daprd Communication
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

 -[#blue]> daprd_client : HTTP (localhost)
 -[#blue]> daprd_client : gRPC (localhost)
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal

DirectMessaging in pkg/messaging/direct_messaging.go is responsible for forwarding requests to the remote dapr runtime.

Interface

The DirectMessaging interface definition, used to invoke a remote app:

// DirectMessaging is the API interface for invoking a remote app.
type DirectMessaging interface {
	Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error)
}

It has only one method, Invoke.

Implementation flow

Flow overview

Implementation of the Invoke method:

func (d *directMessaging) Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	app, err := d.getRemoteApp(targetAppID)

	if app.id == d.appID && app.namespace == d.namespace {
		// The target appid is the caller's own appid. This is an odd scenario, so this branch is ignored for now.
		return d.invokeLocal(ctx, req)
	}
	return d.invokeWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, app, d.invokeRemote, req)
}

The invokeRemote method, simplified:

func (d *directMessaging) invokeRemote(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	// Establish the connection
	conn, err := d.connectionCreatorFn(context.TODO(), appAddress, appID, namespace, false, false, false)
	// Build the gRPC stub to act as the client
	clientV1 := internalv1pb.NewServiceInvocationClient(conn)
	// Call the gRPC CallLocal method to send the remote request to the other Dapr runtime
	resp, err := clientV1.CallLocal(ctx, req.Proto(), opts...)
	// Handle the response
	return invokev1.InternalInvokeResponse(resp)
}

Sending the gRPC request to the remote dapr runtime

The CallLocal() method is implemented in service_invocation_grpc.pb.go, which is gRPC code generated by protoc:

func (c *serviceInvocationClient) CallLocal(ctx context.Context, in *InternalInvokeRequest, opts ...grpc.CallOption) (*InternalInvokeResponse, error) {
	out := new(InternalInvokeResponse)
	err := c.cc.Invoke(ctx, "/dapr.proto.internals.v1.ServiceInvocation/CallLocal", in, out, opts...)
	if err != nil {
		return nil, err
	}
	return out, nil
}

As shown, this gRPC request calls the CallLocal method of the dapr.proto.internals.v1.ServiceInvocation service.

hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal

Implementation details

Resolving the remote address

hide footbox
skinparam style strictuml

participant directMessaging 
participant "Name resolver\n(consul/kubernetes/mdns)" as localNameReSolver

directMessaging -> localNameReSolver : ResolveID()
localNameReSolver -> localNameReSolver: loadBalance()
note right: kubernetes: dns name\ndns: dns name\nconsul: one address(random)\nmdns: one address(round robin)
localNameReSolver --> directMessaging
note right: return only one address in local cluster

hide footbox
skinparam style strictuml

participant directMessaging 
participant "Local Name resolver\n(consul/kubernetes/mdns)" as localNameReSolver
participant "External Name resolver\n(synchronizer)" as externalNameReSolver

directMessaging -> localNameReSolver : ResolveID()
localNameReSolver --> directMessaging
note right: return service instance list in local cluster
directMessaging -[#red]> externalNameReSolver : ResolveID()
externalNameReSolver --> directMessaging
note right: return service instance list in external clusters
directMessaging -[#red]> directMessaging: combine the instance list
directMessaging -[#red]> directMessaging: filter by cluster strategy
note right: local-first\nexternal-first\nbroadcast\nlocal-only\nexternal-only
directMessaging -> directMessaging: loadBalance()

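1.1.6 - Dapr Runtime receives inbound service invocation requests

The Dapr runtime receives inbound requests from the client-side Dapr runtime through the gRPC internal API

Dapr runtimes communicate with each other through the gRPC internal API, which supports only the gRPC protocol.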

hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]

daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
daprd_server -> daprd_server : interceptor
daprd_server -[#blue]>  : appChannel.InvokeMethod()

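Receiving the request

When the runtime initializes and registers the gRPC service, it binds the gRPC internal API implementation to the CallLocal gRPC method. gRPC requests for the CallLocal method of the dapr.proto.internals.v1.ServiceInvocation service are passed to _ServiceInvocation_CallLocal_Handler: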

func _ServiceInvocation_CallLocal_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
	......
	if interceptor == nil {
		return srv.(ServiceInvocationServer).CallLocal(ctx, in)
	}
	info := &grpc.UnaryServerInfo{
		Server:     srv,
		FullMethod: "/dapr.proto.internals.v1.ServiceInvocation/CallLocal",
	}
	handler := func(ctx context.Context, req interface{}) (interface{}, error) {
		// srv here is the gRPC api implementation
		return srv.(ServiceInvocationServer).CallLocal(ctx, req.(*InternalInvokeRequest))  
	}
	return interceptor(ctx, in, info, handler)
}

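Finally the request enters the CallLocal() method for processing.

Remark: for initialization details, see the earlier section on runtime initialization (1.1.2).

Along the way the request passes through an interceptor flow, whose details are covered later.

Forwarding the request

When an internal invoke gRPC request arrives, it enters the CallLocal method in pkg/grpc/api.go: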

func (a *api) CallLocal(ctx context.Context, in *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error) {
	// 1. Build the request
	req, err := invokev1.InternalInvokeRequest(in)
	if a.accessControlList != nil {
		......
	}
	// 2. Send the request to the application through appChannel
	resp, err := a.appChannel.InvokeMethod(ctx, req)
	// 3. Handle the response
	return resp.Proto(), err
}

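The handling is straightforward: the request is essentially forwarded through the app channel, and the runtime itself adds hardly any extra processing logic. InternalInvokeRequest() only does some simple parameter handling: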

// InternalInvokeRequest creates InvokeMethodRequest object from InternalInvokeRequest pb object.
func InternalInvokeRequest(pb *internalv1pb.InternalInvokeRequest) (*InvokeMethodRequest, error) {
	req := &InvokeMethodRequest{r: pb}
	if pb.Message == nil {
		return nil, errors.New("Message field is nil")
	}

	return req, nil
}

Access control

Along the way there is access control logic:

	if a.accessControlList != nil {
		// An access control policy has been specified for the app. Apply the policies.
		operation := req.Message().Method
		var httpVerb commonv1pb.HTTPExtension_Verb
		// Get the http verb in case the application protocol is http
		if a.appProtocol == config.HTTPProtocol && req.Metadata() != nil && len(req.Metadata()) > 0 {
			httpExt := req.Message().GetHttpExtension()
			if httpExt != nil {
				httpVerb = httpExt.GetVerb()
			}
		}
		callAllowed, errMsg := acl.ApplyAccessControlPolicies(ctx, operation, httpVerb, a.appProtocol, a.accessControlList)

		if !callAllowed {
			return nil, status.Errorf(codes.PermissionDenied, errMsg)
		}
	}

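Details are covered later.

1.1.7 - Dapr Runtime forwards inbound requests

The server-side Dapr runtime forwards the inbound request to the server-side application

Protocol and port configuration

The Dapr runtime forwards inbound requests to the server-side application: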

title Daprd-Daprd Communication
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    client
]
participant daprd_server [
    =daprd
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]

daprd_client -[#red]> daprd_server : Dapr gRPC internal API (remote call)
daprd_server -[#blue]> user_code_server : Dapr HTTP channel API (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
daprd_server -[#blue]> user_code_server : Dapr gRPC channel API (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke

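The communication protocol of the app channel can be HTTP or gRPC. It is specified with the command-line flag app-protocol and defaults to HTTP.
The port on which the application receives requests is specified with the command-line flag app-port and has no default.
To limit the load placed on the application, a maximum concurrency setting is also provided, specified with the command-line flag app-max-concurrency.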
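A maximum concurrency limit is commonly implemented with a buffered channel used as a counting semaphore. The sketch below shows that technique. The limitedChannel type and the peakConcurrency helper are hypothetical, and whether the Dapr app channels use exactly this mechanism is an assumption rather than something shown in this section.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limitedChannel caps the number of in-flight calls.
// A nil slots channel means "no limit" (maxConcurrency <= 0).
type limitedChannel struct {
	slots chan struct{}
}

func newLimitedChannel(maxConcurrency int) *limitedChannel {
	c := &limitedChannel{}
	if maxConcurrency > 0 {
		c.slots = make(chan struct{}, maxConcurrency)
	}
	return c
}

// invoke blocks until a slot is free, runs call, then frees the slot.
func (c *limitedChannel) invoke(call func()) {
	if c.slots != nil {
		c.slots <- struct{}{}
		defer func() { <-c.slots }()
	}
	call()
}

// peakConcurrency fires `calls` goroutines through a limited channel
// and reports the highest number of calls observed in flight at once.
func peakConcurrency(limit, calls int) int {
	ch := newLimitedChannel(limit)
	var mu sync.Mutex
	var wg sync.WaitGroup
	inFlight, peak := 0, 0
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ch.invoke(func() {
				mu.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				mu.Unlock()
				time.Sleep(time.Millisecond) // simulated app work
				mu.Lock()
				inFlight--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak in-flight calls with limit 2:", peakConcurrency(2, 10))
}
```

The default value -1 of app-max-concurrency maps naturally to the unlimited case in this sketch.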

Request sending flow

As analyzed earlier, when an internal invoke gRPC request arrives, it enters the CallLocal method in pkg/grpc/api.go:

func (a *api) CallLocal(ctx context.Context, in *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error) {
	......
	resp, err := a.appChannel.InvokeMethod(ctx, req)
  ......
}

The request is then sent through appChannel.

Creating the app channel

The app channel is created during runtime initialization, in the initRuntime() method of pkg/runtime/runtime.go:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    ......
    a.blockUntilAppIsReady()

	err = a.createAppChannel()
	a.daprHTTPAPI.SetAppChannel(a.appChannel)
	grpcAPI.SetAppChannel(a.appChannel)
    ......
}

Implementation of createAppChannel(), which currently supports only HTTP and gRPC:

func (a *DaprRuntime) createAppChannel() error {
	// An app port must be configured in order to create the app channel
	if a.runtimeConfig.ApplicationPort > 0 {
		var channelCreatorFn func(port, maxConcurrency int, spec config.TracingSpec, sslEnabled bool, maxRequestBodySize int, readBufferSize int) (channel.AppChannel, error)

		switch a.runtimeConfig.ApplicationProtocol {
		case GRPCProtocol:
			channelCreatorFn = a.grpc.CreateLocalChannel
		case HTTPProtocol:
			channelCreatorFn = http_channel.CreateLocalChannel
		default:
			// only HTTP and gRPC are supported
			return errors.Errorf("cannot create app channel for protocol %s", string(a.runtimeConfig.ApplicationProtocol))
		}

		ch, err := channelCreatorFn(a.runtimeConfig.ApplicationPort, a.runtimeConfig.MaxConcurrency, a.globalConfig.Spec.TracingSpec, a.runtimeConfig.AppSSL, a.runtimeConfig.MaxRequestBodySize, a.runtimeConfig.ReadBufferSize)
		a.appChannel = ch
	} else {
		log.Warn("app channel is not initialized. did you make sure to configure an app-port?")
	}

	return nil
}

app channel configuration parameters

Three configuration items closely related to the app channel are read from command-line flags:

func FromFlags() (*DaprRuntime, error) {
    ......
    appPort := flag.String("app-port", "", "The port the application is listening on")
	appProtocol := flag.String("app-protocol", string(HTTPProtocol), "Protocol for the application: grpc or http")	
	appMaxConcurrency := flag.Int("app-max-concurrency", -1, "Controls the concurrency level when forwarding requests to user code")

TracingSpec / AppSSL / MaxRequestBodySize / ReadBufferSize are discussed later and are not expanded here.

HTTP channel implementation

The HTTP channel is implemented in pkg/channel/http/http_channel.go. Its InvokeMethod() method:

func (h *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
  ......
	switch req.APIVersion() {
	case internalv1pb.APIVersion_V1:
		rsp, err = h.invokeMethodV1(ctx, req)
  ......
	return rsp, err
}

For now there is only the invokeMethodV1 version:

func (h *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	// 1. build the HTTP request
	channelReq := h.constructRequest(ctx, req)
	// 2. send the request to the application
	err := h.client.DoTimeout(channelReq, resp, channel.DefaultChannelRequestTimeout)
	// 3. process the returned response
	rsp := h.parseChannelResponse(req, resp, err)
	return rsp, nil
}

This converts the received request into a standard HTTP request and sends it to the user code via fasthttp. The conversion into a standard HTTP request is done in constructRequest():

func (h *Channel) constructRequest(ctx context.Context, req *invokev1.InvokeMethodRequest) *fasthttp.Request {
	var channelReq = fasthttp.AcquireRequest()

	// Construct app channel URI: VERB http://localhost:3000/method?query1=value1
	uri := fmt.Sprintf("%s/%s", h.baseAddress, req.Message().GetMethod())
	channelReq.SetRequestURI(uri)
	channelReq.URI().SetQueryString(req.EncodeHTTPQueryString())
	channelReq.Header.SetMethod(req.Message().HttpExtension.Verb.String())

	// Recover headers
	invokev1.InternalMetadataToHTTPHeader(ctx, req.Metadata(), channelReq.Header.Set)

  ......
}

As a result, the server-side user code does not need to import the Dapr SDK; it only has to expose a standard HTTP endpoint.

title Daprd-Daprd Communication
hide footbox
skinparam style strictuml

participant daprd_server [
    =daprd
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]

daprd_server -[#blue]> user_code_server : HTTP (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
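
The URI shape in the diagram note can be reproduced with the standard library. This is a minimal sketch, not the fasthttp-based code quoted above; buildAppURL is a hypothetical helper, and the base address and query values are example inputs.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildAppURL is a hypothetical helper that mirrors the pattern
// "<baseAddress>/<method>?<query>" used for the app channel URI.
func buildAppURL(baseAddress, method string, query map[string]string) string {
	values := url.Values{}
	for k, v := range query {
		values.Set(k, v)
	}
	u := fmt.Sprintf("%s/%s", baseAddress, method)
	// Encode sorts by key and escapes values; an empty query adds nothing
	if q := values.Encode(); q != "" {
		u += "?" + q
	}
	return u
}

func main() {
	fmt.Println(buildAppURL("http://localhost:3000", "method", map[string]string{"query1": "value1"}))
}
```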

gRPC channel implementation

The CreateLocalChannel() method in pkg/grpc/grpc.go:

// CreateLocalChannel creates a new gRPC AppChannel.
func (g *Manager) CreateLocalChannel(port, maxConcurrency int, spec config.TracingSpec, sslEnabled bool, maxRequestBodySize int, readBufferSize int) (channel.AppChannel, error) {
	// the IP address is hard-coded to 127.0.0.1
	conn, err := g.GetGRPCConnection(context.TODO(), fmt.Sprintf("127.0.0.1:%v", port), "", "", true, false, sslEnabled)
  ......
	g.AppClient = conn
	ch := grpc_channel.CreateLocalChannel(port, maxConcurrency, conn, spec, maxRequestBodySize, readBufferSize)
	return ch, nil
}

The implementation is in the InvokeMethod() method of pkg/channel/grpc/grpc_channel.go:

func (g *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
  ......
	switch req.APIVersion() {
	case internalv1pb.APIVersion_V1:
		rsp, err = g.invokeMethodV1(ctx, req)
  ......
	return rsp, err
}

For now there is only the invokeMethodV1 version:

func (g *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	// 1. create the gRPC client for AppCallback
	clientV1 := runtimev1pb.NewAppCallbackClient(g.client)
	// 2. call the OnInvoke() method of AppCallback
	resp, err := clientV1.OnInvoke(ctx, req.Message(), grpc.Header(&header), grpc.Trailer(&trailer))
	// 3. process the returned response
	return rsp.WithMessage(resp), nil
}

The gRPC channel works by calling a gRPC service on the server-side application, specifically the OnInvoke() method of AppCallback.

title Dapr gRPC Channel
hide footbox
skinparam style strictuml

participant daprd_server [
    =daprd
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]


daprd_server -[#blue]> user_code_server : gRPC (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke

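In other words, to support the gRPC channel the server-side application must implement the AppCallback gRPC server. Unlike the HTTP case, this is intrusive to the server-side application.

1.1.8 - The server-side app receives inbound requests

The server-side app receives standard HTTP requests, or implements AppCallbackServer to accept gRPC requests

The OnInvoke method in pkg/proto/runtime/v1/appcallback.pb.go: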

// AppCallbackServer is the server API for AppCallback service.
type AppCallbackServer interface {
	// Invokes service method with InvokeRequest.
	OnInvoke(context.Context, *v1.InvokeRequest) (*v1.InvokeResponse, error)
}

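To receive the service invoke requests that daprd forwards from the client, the server-side application also needs to do some work.

Receiving HTTP requests

For standard HTTP requests arriving through the HTTP channel, the server-side application only needs to expose a standard HTTP port; no Dapr SDK is required.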

title Daprd-Daprd Communication
hide footbox
skinparam style strictuml

participant daprd_server [
    =daprd
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]

daprd_server -[#blue]> user_code_server : HTTP (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
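
A server-side application that receives such a request can be written with nothing but net/http. The following is a minimal sketch under assumptions: the route "/method" and the query key "query1" are taken from the diagram note, and callApp stands in for daprd by sending a request to the handler in-process.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newAppMux builds a plain HTTP application with no Dapr SDK involved.
// The route "/method" is an example, not a Dapr requirement.
func newAppMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/method", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s query1=%s", r.Method, r.URL.Query().Get("query1"))
	})
	return mux
}

// callApp plays the role of daprd in this sketch: it sends one request
// to the app handler in-process and returns the response body.
func callApp(verb, target string) string {
	req := httptest.NewRequest(verb, target, nil)
	rec := httptest.NewRecorder()
	newAppMux().ServeHTTP(rec, req)
	body, _ := io.ReadAll(rec.Result().Body)
	return string(body)
}

func main() {
	fmt.Println(callApp(http.MethodPost, "/method?query1=value1"))
}
```

In a real deployment the mux would be served with http.ListenAndServe on the port passed to daprd as app-port.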

Receiving gRPC requests

For gRPC requests arriving through the gRPC channel, the server-side application must implement the OnInvoke() method of the gRPC AppCallback service:

title Dapr gRPC Channel
hide footbox
skinparam style strictuml

participant daprd_server [
    =daprd
    ----
    server
]
participant user_code_server [
    =App-2
    ----
    server
]


daprd_server -[#blue]> user_code_server : gRPC (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke

The proto definition of AppCallbackServer is in the file dapr/proto/runtime/v1/appcallback.proto in the dapr repository:

service AppCallback {
  // Invokes service method with InvokeRequest.
  rpc OnInvoke (common.v1.InvokeRequest) returns (common.v1.InvokeResponse) {}
  ......
}

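The concrete implementations of AppCallbackServer are spread across the SDKs for the different languages.

go-sdk implementation

The implementation is the OnInvoke method in service/grpc/invoke.go of go-sdk. The main flow is: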

func (s *Server) OnInvoke(ctx context.Context, in *cpb.InvokeRequest) (*cpb.InvokeResponse, error) {
	if fn, ok := s.invokeHandlers[in.Method]; ok {
		e := &cc.InvocationEvent{}
		ct, er := fn(ctx, e)
		return &cpb.InvokeResponse{......}, nil
	}
	return nil, fmt.Errorf("method not implemented: %s", in.Method)
}

s.invokeHandlers stores the request handlers, keyed by the method parameter. AddServiceInvocationHandler() adds a mapping from a method name to its handler:

// Server is the gRPC service implementation for Dapr.
type Server struct {
	invokeHandlers  map[string]common.ServiceInvocationHandler
}
type  ServiceInvocationHandler func(ctx context.Context, in *InvocationEvent) (out *Content, err error)

func (s *Server) AddServiceInvocationHandler(method string, fn func(ctx context.Context, in *cc.InvocationEvent) (our *cc.Content, err error)) error {
	s.invokeHandlers[method] = fn
	return nil
}

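This means the server-side application does not need to provide gRPC proto definitions for these methods, nor expose them directly over gRPC. It only needs to implement the OnInvoke() method of AppCallback and register the methods it wants to expose. OnInvoke() thus acts as a simple API gateway.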

title Dapr AppCallback OnInvoke gRPC impl
hide footbox
skinparam style strictuml

participant AppCallback [
    =AppCallback
    ----
    OnInvoke()
]

participant invokeHandlers
participant handler

-[#blue]> AppCallback : gRPC OnInvoke()
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
AppCallback -> invokeHandlers: find handler by method name
invokeHandlers --> AppCallback: registered handler
AppCallback -> handler: call handler
note right: type  ServiceInvocationHandler \nfunc(ctx context.Context, in *InvocationEvent) \n(out *Content, err error)
handler --> AppCallback
<-[#blue]- AppCallback
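
The dispatch-by-method-name pattern shown in the diagram can be sketched without any gRPC dependency. This is a simplified illustration, not the go-sdk code: server, handler, addHandler and onInvoke are hypothetical stand-ins that use plain byte slices instead of the SDK's InvocationEvent and Content types.

```go
package main

import (
	"errors"
	"fmt"
)

// handler is a simplified stand-in for a service invocation handler.
type handler func(data []byte) ([]byte, error)

// server keeps a registry of handlers keyed by method name.
type server struct {
	invokeHandlers map[string]handler
}

func newServer() *server {
	return &server{invokeHandlers: map[string]handler{}}
}

// addHandler registers fn under the given method name.
func (s *server) addHandler(method string, fn handler) error {
	if method == "" || fn == nil {
		return errors.New("method name and handler are required")
	}
	s.invokeHandlers[method] = fn
	return nil
}

// onInvoke is the single entry point: it looks up the handler by method
// name and calls it, acting as a small gateway.
func (s *server) onInvoke(method string, data []byte) ([]byte, error) {
	fn, ok := s.invokeHandlers[method]
	if !ok {
		return nil, fmt.Errorf("method not implemented: %s", method)
	}
	return fn(data)
}

func main() {
	s := newServer()
	_ = s.addHandler("echo", func(data []byte) ([]byte, error) { return data, nil })

	out, err := s.onInvoke("echo", []byte("hello"))
	fmt.Println(string(out), err)

	_, err = s.onInvoke("missing", nil)
	fmt.Println(err)
}
```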

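User code example

When developing a Dapr-enabled Go server-side application, the user starts a Dapr service server inside the application and then adds the various handlers, including a ServiceInvocationHandler, as in the following example (example/serving/grpc/main.go in go-sdk):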

func main() {
	// create a Dapr service server
	s, err := daprd.NewService(":50001")

	// add a service to service invocation handler
	if err := s.AddServiceInvocationHandler("echo", echoHandler); err != nil {
		log.Fatalf("error adding invocation handler: %v", err)
	}

	// start the server
	if err := s.Start(); err != nil {
		log.Fatalf("server error: %v", err)
	}
}

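java-sdk implementation

No server-side implementation was found in the Java SDK; this is still to be confirmed.

1.2 - Design and implementation of name resolution

Name resolution

1.2.1 - Name resolution overview

Name resolution

Introduction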

Name resolvers provide a common way to interact with different name resolvers, which are used to return the address or IP of other services your applications may connect to.


Interface definition

A compatible name resolver needs to implement the Resolver interface in nameresolution.go.

// Resolver is the interface for name resolvers.
type Resolver interface {
	// Init initializes name resolver.
	Init(metadata Metadata) error
	// ResolveID resolves name to address.
	ResolveID(req ResolveRequest) (string, error)
}

// ResolveRequest represents a service discovery resolver request.
type ResolveRequest struct {
	ID        string
	Namespace string
	Port      int
	Data      map[string]string
}
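
To show what implementing this interface involves, here is a minimal sketch of a toy resolver backed by a static table. The Metadata and ResolveRequest types below are simplified local stand-ins declared only so the example compiles on its own, and staticResolver is hypothetical; it is not one of Dapr's resolvers.

```go
package main

import "fmt"

// Simplified local stand-ins for the nameresolution types (assumptions).
type Metadata struct {
	Configuration map[string]string
}

type ResolveRequest struct {
	ID        string
	Namespace string
	Port      int
}

type Resolver interface {
	Init(metadata Metadata) error
	ResolveID(req ResolveRequest) (string, error)
}

// staticResolver is a hypothetical resolver that maps app ids to hosts
// from a fixed table supplied at Init time.
type staticResolver struct {
	hosts map[string]string
}

// compile-time check that staticResolver satisfies Resolver
var _ Resolver = (*staticResolver)(nil)

func (s *staticResolver) Init(metadata Metadata) error {
	s.hosts = map[string]string{}
	for id, host := range metadata.Configuration {
		s.hosts[id] = host
	}
	return nil
}

func (s *staticResolver) ResolveID(req ResolveRequest) (string, error) {
	host, ok := s.hosts[req.ID]
	if !ok {
		return "", fmt.Errorf("unknown app id %s", req.ID)
	}
	return fmt.Sprintf("%s:%d", host, req.Port), nil
}

func main() {
	r := &staticResolver{}
	_ = r.Init(Metadata{Configuration: map[string]string{"app-2": "10.0.0.7"}})
	fmt.Println(r.ResolveID(ResolveRequest{ID: "app-2", Namespace: "default", Port: 50002}))
}
```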

1.2.2 - Usage

How name resolution is used in the service invoke flow

Resolving the address

The name resolver is called in only one place:

func (d *directMessaging) getRemoteApp(appID string) (remoteApp, error) {
	// extract the id and namespace from appID
	// appID may be in a format such as "appID.namespace"
	id, namespace, err := d.requestAppIDAndNamespace(appID)
	if err != nil {
		return remoteApp{}, err
	}

	// run the resolver
	request := nr.ResolveRequest{ID: id, Namespace: namespace, Port: d.grpcPort}
	address, err := d.resolver.ResolveID(request)
	if err != nil {
		return remoteApp{}, err
	}

	// return the remoteApp with its address
	return remoteApp{
		namespace: namespace,
		id:        id,
		address:   address,
	}, nil
}
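
The "appID.namespace" convention mentioned in the comments can be illustrated with a small parser. This is a sketch of the assumed behavior, not the body of requestAppIDAndNamespace: splitAppID is a hypothetical helper, and falling back to a default namespace when none is given is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAppID is a hypothetical helper: "app" uses the caller's namespace,
// "app.ns" carries its own namespace, anything else is rejected.
func splitAppID(targetAppID, defaultNamespace string) (id, namespace string, err error) {
	items := strings.Split(targetAppID, ".")
	switch len(items) {
	case 1:
		return items[0], defaultNamespace, nil
	case 2:
		return items[0], items[1], nil
	default:
		return "", "", fmt.Errorf("invalid app id %s", targetAppID)
	}
}

func main() {
	fmt.Println(splitAppID("app-2", "default"))
	fmt.Println(splitAppID("app-2.production", "default"))
}
```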

The resolved address is used in directMessaging's Invoke() to perform the remote call:

// Invoke takes a message requests and invokes an app, either local or remote.
func (d *directMessaging) Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	app, err := d.getRemoteApp(targetAppID)
	if err != nil {
		return nil, err
	}

	// if the target app's id and namespace both match those of directMessaging, call invokeLocal()
	if app.id == d.appID && app.namespace == d.namespace {
		return d.invokeLocal(ctx, req)
	}
  
	// otherwise call invokeRemote, wrapped in the retry mechanism
	return d.invokeWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, app, d.invokeRemote, req)
}

invokeWithRetry(), with the retry code omitted:

func (d *directMessaging) invokeWithRetry(
	ctx context.Context,
	numRetries int,
	backoffInterval time.Duration,
	app remoteApp,
	fn func(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error),
	req *invokev1.InvokeMethodRequest,
) (*invokev1.InvokeMethodResponse, error) {
  
}
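
Since the retry body is omitted above, the following minimal sketch shows what a linear retry with a fixed backoff interval can look like. It is not Dapr's implementation: it retries on any error, whereas a real implementation would likely retry only on specific failure codes. The simplified signature and the flakyCall helper are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// invokeWithRetry calls fn up to numRetries times, sleeping for
// backoffInterval between failed attempts (linear backoff).
func invokeWithRetry(numRetries int, backoffInterval time.Duration, fn func() (string, error)) (string, error) {
	if numRetries < 1 {
		numRetries = 1
	}
	var lastErr error
	for attempt := 0; attempt < numRetries; attempt++ {
		resp, err := fn()
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(backoffInterval)
	}
	return "", fmt.Errorf("failed after %d attempts: %w", numRetries, lastErr)
}

// flakyCall returns a function that fails `failures` times, then succeeds.
func flakyCall(failures int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("connection unavailable")
		}
		return fmt.Sprintf("ok after %d calls", calls), nil
	}
}

func main() {
	fmt.Println(invokeWithRetry(3, 10*time.Millisecond, flakyCall(2)))
	fmt.Println(invokeWithRetry(3, 10*time.Millisecond, flakyCall(5)))
}
```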

invokeRemote()

func (d *directMessaging) invokeRemote(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	// create the gRPC connection to the remote daprd at appAddress
	conn, teardown, err := d.connectionCreatorFn(context.TODO(), appAddress, appID, namespace, false, false, false)
	defer teardown()
	if err != nil {
		return nil, err
	}

	ctx = d.setContextSpan(ctx)

	d.addForwardedHeadersToMetadata(req)
	d.addDestinationAppIDHeaderToMetadata(appID, req)

	clientV1 := internalv1pb.NewServiceInvocationClient(conn)

	var opts []grpc.CallOption
	opts = append(opts, grpc.MaxCallRecvMsgSize(d.maxRequestBodySize*1024*1024), grpc.MaxCallSendMsgSize(d.maxRequestBodySize*1024*1024))

	resp, err := clientV1.CallLocal(ctx, req.Proto(), opts...)
	if err != nil {
		return nil, err
	}

	return invokev1.InternalInvokeResponse(resp)
}

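1.2.3 - mDNS name resolution

mDNS name resolution implementation

Basic input and output

Skipping details and error handling, and in particular leaving out all the (quite complex) synchronization code, here is just the input and output: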

// ResolveID resolves a name to an address via mDNS.
func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
	m.browseOne(ctx, req.ID, published)

	select {
	case addr := <-sub.AddrChan:
		return addr, nil
	case err := <-sub.ErrChan:
		return "", err
	case <-time.After(subscriberTimeout):
		return "", fmt.Errorf("timeout waiting for address for app id %s", req.ID)
	}
}

func (m *Resolver) browseOne(ctx context.Context, appID string, published chan struct{}) {
  err := m.browse(browseCtx, appID, onFirst)
}

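Note: only req.ID is used. req.Namespace is never used, which means the mDNS resolver does not support namespaces at all.

How mDNS resolution works

The core of the mDNS implementation is in the browseOne() method: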

func (m *Resolver) browseOne(ctx context.Context, appID string, published chan struct{}) {
	// start a goroutine to run asynchronously
	go func() {
		var addr string

		browseCtx, cancel := context.WithCancel(ctx)
		defer cancel()

		// prepare the callback: browsing is canceled once the first address arrives, hence the name browseOne
		onFirst := func(ip string) {
			addr = ip
			cancel() // cancel to stop browsing.
		}

		m.logger.Debugf("Browsing for first mDNS address for app id %s", appID)

		// run the browse
		err := m.browse(browseCtx, appID, onFirst)
		// error handling omitted
		......
    
		m.pubAddrToSubs(appID, addr)

		published <- struct{}{} // signal that all subscribers have been notified.
	}()
}

Next, the implementation of browse:

// browse performs a non-blocking mDNS network browse for the provided app id.
func (m *Resolver) browse(ctx context.Context, appID string, onEach func(ip string)) error {
  ......
}

First, a Resolver is built with zeroconf.NewResolver:

  import "github.com/grandcat/zeroconf"

	resolver, err := zeroconf.NewResolver(nil)
  if err != nil {
		return fmt.Errorf("failed to initialize resolver: %w", err)
	}
  ......

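zeroconf is a pure Go library that uses multicast DNS-SD to browse and resolve services on the network and to register its own services on the local network.

The mDNS resolution itself is performed by resolver.Browse(); the results are sent asynchronously to the entries channel: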

	entries := make(chan *zeroconf.ServiceEntry)	
  if err = resolver.Browse(ctx, appID, "local.", entries); err != nil {
		return fmt.Errorf("failed to browse: %w", err)
	}

Each service entry returned from the mDNS browse is handled as follows:

	// handle each service entry returned from the mDNS browse.
	go func(results <-chan *zeroconf.ServiceEntry) {
		for {
			select {
			case entry := <-results:
				if entry == nil {
					break
				}
				// call handleEntry to process each returned service entry
				handleEntry(entry)
			case <-ctx.Done():
				// all service entries have been processed, or an error occurred (cancel or timeout)
				// the browse must exit now, but first drain any results already received and not yet handled
				for len(results) > 0 {
					handleEntry(<-results)
				}

				if errors.Is(ctx.Err(), context.Canceled) {
					m.logger.Debugf("mDNS browse for app id %s canceled.", appID)
				} else if errors.Is(ctx.Err(), context.DeadlineExceeded) {
					m.logger.Debugf("mDNS browse for app id %s timed out.", appID)
				}

				return // stop listening for results.
			}
		}
	}(entries)

The handleEntry() implementation:

	handleEntry := func(entry *zeroconf.ServiceEntry) {
		for _, text := range entry.Text {
			// check whether this entry belongs to the app id being looked up
			if text != appID {
				m.logger.Debugf("mDNS response doesn't match app id %s, skipping.", appID)
				break
			}

			m.logger.Debugf("mDNS response for app id %s received.", appID)

			// check whether there is an IPv4 or IPv6 address
			hasIPv4Address := len(entry.AddrIPv4) > 0
			hasIPv6Address := len(entry.AddrIPv6) > 0

			if !hasIPv4Address && !hasIPv6Address {
				m.logger.Debugf("mDNS response for app id %s doesn't contain any IPv4 or IPv6 addresses, skipping.", appID)
				break
			}

			var addr string
			port := entry.Port
			// only the first address is used for now
			// TODO: we currently only use the first IPv4 and IPv6 address.
			// We should understand the cases in which additional addresses
			// are returned and whether we need to support them.
			// add the address to the cache; the cache is examined later
			if hasIPv4Address {
				addr = fmt.Sprintf("%s:%d", entry.AddrIPv4[0].String(), port)
				m.addAppAddressIPv4(appID, addr)
			}
			if hasIPv6Address {
				addr = fmt.Sprintf("%s:%d", entry.AddrIPv6[0].String(), port)
				m.addAppAddressIPv6(appID, addr)
			}

			// invoke the callback; as noted earlier, the browse is canceled once the first address is obtained
			if onEach != nil {
				onEach(addr) // invoke callback.
			}
		}
	}

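This completes the mDNS resolution from ID to address.

Cache design

mDNS is very slow, so resolved addresses are cached for performance. The code above stores these addresses once resolution completes: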

// addAppAddressIPv4 adds an IPv4 address to the
// cache for the provided app id.
func (m *Resolver) addAppAddressIPv4(appID string, addr string) {
	m.ipv4Mu.Lock()
	defer m.ipv4Mu.Unlock()

	m.logger.Debugf("Adding IPv4 address %s for app id %s cache entry.", addr, appID)
	if _, ok := m.appAddressesIPv4[appID]; !ok {
		var addrList addressList
		m.appAddressesIPv4[appID] = &addrList
	}
	m.appAddressesIPv4[appID].add(addr)
}

Before resolving, ResolveID() first checks whether the cache already holds an address and, if so, uses it directly:

func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
	// check for cached IPv4 addresses for this app id first.
	if addr := m.nextIPv4Address(req.ID); addr != nil {
		return *addr, nil
	}

	// check for cached IPv6 addresses for this app id second.
	if addr := m.nextIPv6Address(req.ID); addr != nil {
		return *addr, nil
	}
  ......
}

Getting the address for an appID from the cache:

// nextIPv4Address returns the next IPv4 address for
// the provided app id from the cache.
func (m *Resolver) nextIPv4Address(appID string) *string {
	m.ipv4Mu.RLock()
	defer m.ipv4Mu.RUnlock()
	addrList, exists := m.appAddressesIPv4[appID]
	if exists {
		addr := addrList.next()
		if addr != nil {
			m.logger.Debugf("found mDNS IPv4 address in cache: %s", *addr)

			return addr
		}
	}

	return nil
}

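addrList.next() is worth a closer look: it returns a single address rather than the whole address list. In other words, when there are multiple addresses, addrList.next() effectively implements load balancing.

Load balancing

The addressList struct is made up of: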

// addressList represents a set of addresses along with
// data used to control and access said addresses.
type addressList struct {
	addresses []address
	counter   int
	mu        sync.RWMutex
}

Besides the address slice, there is a counter and a read-write lock for concurrency protection.

// max integer value supported on this architecture.
const maxInt = int(^uint(0) >> 1)

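// next gets the next address from the list given the current round-robin
// implementation. There are no guarantees on the selection beyond
// best-effort linear iteration.
func (a *addressList) next() *string {
	// acquire the read lock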
	a.mu.RLock()
	defer a.mu.RUnlock()

	if len(a.addresses) == 0 {
		return nil
	}
	// if counter reaches maxInt, start over from zero
	if a.counter == maxInt {
		a.counter = 0
	}
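	// take counter modulo the number of addresses, pick the address at that index, then increment counter
	// this is the simplest and most common round-robin algorithm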
	index := a.counter % len(a.addresses)
	addr := a.addresses[index]
	a.counter++

	return &addr.ip
}
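
The same round-robin selection can be written as a small standalone type. This is a minimal sketch, not the resolver's code: roundRobin is a hypothetical type, and it deliberately takes a full sync.Mutex because next() writes to the counter, a choice made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin is a hypothetical, simplified counterpart of addressList.
type roundRobin struct {
	mu        sync.Mutex
	addresses []string
	counter   int
}

// next returns the next address in rotation; ok is false when empty.
func (r *roundRobin) next() (addr string, ok bool) {
	// a full lock is used because the counter is modified below
	r.mu.Lock()
	defer r.mu.Unlock()

	if len(r.addresses) == 0 {
		return "", false
	}
	addr = r.addresses[r.counter%len(r.addresses)]
	// wrapping here keeps the counter bounded, so no maxInt check is needed
	r.counter = (r.counter + 1) % len(r.addresses)
	return addr, true
}

func main() {
	rr := &roundRobin{addresses: []string{"10.0.0.1:50002", "10.0.0.2:50002"}}
	for i := 0; i < 3; i++ {
		addr, _ := rr.next()
		fmt.Println(addr)
	}
}
```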

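Concurrency protection

To avoid multiple requests resolving the same ID at the same time, a concurrency protection mechanism is used: for a given ID, only one request performs the resolution, and the other requests wait for its result: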


// ResolveID resolves name to address via mDNS.
func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {

	sub := NewSubscriber()

	// add the sub to the pool of subs for this app id.
	m.subMu.Lock()
	appIDSubs, exists := m.subs[req.ID]
	if !exists {
		// WARN: must set appIDSubs variable for use below.
		appIDSubs = NewSubscriberPool(sub)
		m.subs[req.ID] = appIDSubs
	} else {
		appIDSubs.Add(sub)
	}
	m.subMu.Unlock()

	// only one subscriber per pool will perform the first browse for the
	// requested app id. The rest will subscribe for an address or error.
	var once *sync.Once
	var published chan struct{}
	ctx, cancel := context.WithTimeout(context.Background(), browseOneTimeout)
	defer cancel()
	appIDSubs.Once.Do(func() {
		published = make(chan struct{})
		m.browseOne(ctx, req.ID, published)

		// once will only be set for the first browser.
		once = new(sync.Once)
	})
	......
}

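Summary

The mDNS name resolver returns a simple IP address plus port (IPv4 or IPv6), such as "192.168.0.100:8000".

1.2.4 - kubernetes

kubernetes name resolution implementation

Implementation

The kubernetes implementation is extremely simple: it just assembles a Kubernetes service name following the format required for Kubernetes services: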

// ResolveID resolves name to address in Kubernetes.
func (k *resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
	// Dapr requires this formatting for Kubernetes services
	return fmt.Sprintf("%s-dapr.%s.svc.%s:%d", req.ID, req.Namespace, k.clusterDomain, req.Port), nil
}

Here req.ID and req.Namespace map to the Kubernetes service name and namespace. Note that the Kubernetes service name is the ID with a "-dapr" suffix appended. Port comes from the request parameter and is simply concatenated.
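
The format string can be exercised on its own. The following minimal sketch wraps it in a hypothetical helper, kubernetesAddress, with example inputs; only the format string itself comes from the code above.

```go
package main

import "fmt"

// kubernetesAddress is a hypothetical helper that applies the same format
// string as the resolver: "<id>-dapr.<namespace>.svc.<clusterDomain>:<port>".
func kubernetesAddress(id, namespace, clusterDomain string, port int) string {
	return fmt.Sprintf("%s-dapr.%s.svc.%s:%d", id, namespace, clusterDomain, port)
}

func main() {
	fmt.Println(kubernetesAddress("app1", "default", "cluster.local", 80))
}
```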

Setting clusterDomain

clusterDomain is slightly more involved. Its default value is "cluster.local", set when the Resolver is constructed:

const (
	DefaultClusterDomain = "cluster.local"
)

type resolver struct {
	logger        logger.Logger
	clusterDomain string
}

// NewResolver creates Kubernetes name resolver.
func NewResolver(logger logger.Logger) nameresolution.Resolver {
	return &resolver{
		logger:        logger,
		clusterDomain: DefaultClusterDomain,
	}
}

The default can be overridden by setting a metadata entry named "clusterDomain" in the configuration:

const (
	ClusterDomainKey     = "clusterDomain"
)

func (k *resolver) Init(metadata nameresolution.Metadata) error {
	configInterface, err := config.Normalize(metadata.Configuration)
	if err != nil {
		return err
	}
	if config, ok := configInterface.(map[string]string); ok {
		clusterDomain := config[ClusterDomainKey]
		if clusterDomain != "" {
			k.clusterDomain = clusterDomain
		}
	}

	return nil
}

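Summary

The kubernetes name resolver returns a plain Kubernetes service name such as "app1-dapr.default.svc.cluster.local:80", not an IP address in the usual sense.

1.2.5 - dns

dns name resolution implementation

Implementation

The dns implementation is also extremely simple. Like the kubernetes implementation, it just assembles a Kubernetes service name following the DNS format: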

// ResolveID resolves name to address in orchestrator.
func (k *resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
	return fmt.Sprintf("%s-dapr.%s.svc:%d", req.ID, req.Namespace, req.Port), nil
}

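All parameters come from the request; the method only concatenates them.

Summary

The DNS name resolver returns a plain Kubernetes service name such as "app1-dapr.default.svc:80", not an IP address in the usual sense.

1.2.6 - consul

consul name resolution implementation

Initialization

Initialization reads the configuration and establishes the connection: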

func (r *resolver) Init(metadata nr.Metadata) error {
	var err error

	r.config, err = getConfig(metadata)
	if err != nil {
		return err
	}

	if err = r.client.InitClient(r.config.Client); err != nil {
		return fmt.Errorf("failed to init consul client: %w", err)
	}

	// register service to consul
	......

	return nil
}

Service registration

Inside Init, the resolver can also register the service with consul if the configuration asks for it:

	// register service to consul
	if r.config.Registration != nil {
		if err := r.client.Agent().ServiceRegister(r.config.Registration); err != nil {
			return fmt.Errorf("failed to register consul service: %w", err)
		}

		r.logger.Infof("service:%s registered on consul agent", r.config.Registration.Name)
	} else if _, err := r.client.Agent().Self(); err != nil {
		return fmt.Errorf("failed check on consul agent: %w", err)
	}

Resolver implementation

The consul name resolver implementation is fairly simple:

// ResolveID resolves name to address via consul.
func (r *resolver) ResolveID(req nr.ResolveRequest) (string, error) {
	cfg := r.config
	// query consul for the healthy instances of the service
	// only req.ID is used; the namespace is not used
	services, _, err := r.client.Health().Service(req.ID, "", true, cfg.QueryOptions)
	if err != nil {
		return "", fmt.Errorf("failed to query healthy consul services: %w", err)
	}

	if len(services) == 0 {
		return "", fmt.Errorf("no healthy services found with AppID:%s", req.ID)
	}

	// shuffle swaps the positions of the given services at random
	shuffle := func(services []*consul.ServiceEntry) []*consul.ServiceEntry {
		for i := len(services) - 1; i > 0; i-- {
			rndbig, _ := rand.Int(rand.Reader, big.NewInt(int64(i+1)))
			j := rndbig.Int64()

			services[i], services[j] = services[j], services[i]
		}

		return services
	}

	// shuffle first, then take the first entry; this amounts to random load balancing
	svc := shuffle(services)[0]

	addr := ""

	// read the address and port
	if port, ok := svc.Service.Meta[cfg.DaprPortMetaKey]; ok {
		if svc.Service.Address != "" {
			addr = fmt.Sprintf("%s:%s", svc.Service.Address, port)
		} else if svc.Node.Address != "" {
			addr = fmt.Sprintf("%s:%s", svc.Node.Address, port)
		} else {
			return "", fmt.Errorf("no healthy services found with AppID:%s", req.ID)
		}
	} else {
		return "", fmt.Errorf("target service AppID:%s found but DAPR_PORT missing from meta", req.ID)
	}

	return addr, nil
}

Summary

The consul name resolver returns a simple IP/port string such as "192.168.0.100:80". When there are multiple instances, it selects one at random internally.
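
The random selection relies on a Fisher-Yates shuffle driven by crypto/rand. Here is a minimal standalone sketch of that technique over plain strings. Unlike the resolver code above, it returns the error from rand.Int instead of ignoring it; shuffle and sameElements are names used only in this example.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"sort"
)

// shuffle performs an in-place Fisher-Yates shuffle using crypto/rand.
func shuffle(items []string) ([]string, error) {
	for i := len(items) - 1; i > 0; i-- {
		// pick j uniformly from [0, i]
		n, err := rand.Int(rand.Reader, big.NewInt(int64(i+1)))
		if err != nil {
			return nil, err
		}
		j := n.Int64()
		items[i], items[j] = items[j], items[i]
	}
	return items, nil
}

// sameElements reports whether a and b hold the same strings in any order.
func sameElements(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	instances := []string{"10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"}
	shuffled, err := shuffle(append([]string(nil), instances...))
	if err != nil {
		fmt.Println("shuffle failed:", err)
		return
	}
	// the first element after shuffling is the randomly chosen instance
	fmt.Println(shuffled[0], sameElements(instances, shuffled))
}
```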

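1.3 - Design and implementation of access control

Access control

2 - Pub/sub source code analysis

Source code analysis of the Dapr pub/sub building block

2.1 - Main flow of publishing

Analysis of the main publishing flow

2.1.1 - Flow overview

Overview of the Dapr publishing flow and API

API and ports

The Dapr runtime exposes two APIs, the Dapr HTTP API and the Dapr gRPC API. By default, the two Dapr APIs are exposed on these ports:

3500: HTTP port, configurable with the command-line flag dapr-http-port
50001: gRPC port, configurable with the command-line flag dapr-grpc-port

gRPC API

The gRPC API is defined in the Dapr service in the file dapr/proto/runtime/v1/dapr.proto: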

service Dapr {
  // Publishes events to the specific topic.
  rpc PublishEvent(PublishEventRequest) returns (google.protobuf.Empty) {}
  ......
}

// PublishEventRequest is the message to publish event data to pubsub topic
message PublishEventRequest {
  // The name of the pubsub component
  string pubsub_name = 1;

  // The pubsub topic
  string topic = 2;

  // The data which will be published to topic.
  bytes data = 3;

  // The content type for the data (optional).
  string data_content_type = 4;

  // The metadata passing to pub components
  //
  // metadata property:
  // - key : the key of the message.
  map<string, string> metadata = 5;
}

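The main parameters are:

pubsub_name: the name of the Dapr pubsub component
topic: the target topic of the published message
data: the message data

Optional parameters:

data_content_type: the content type of the message data
metadata: optional metadata, used for extension

HTTP API

The HTTP API has no explicit, separate definition, but it can be inferred from the code. In pkg/http/api.go, the endpoint used for publish is built as follows: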

func (a *api) constructPubSubEndpoints() []Endpoint {
	return []Endpoint{
		{
			// send a POST or PUT request
			Methods: []string{fasthttp.MethodPost, fasthttp.MethodPut},
			// to this URL
			Route:   "publish/{pubsubname}/{topic:*}",
			Version: apiVersionV1,
			Handler: a.onPublish,
		},
	}
}

So the daprd URL used for publish looks like http://localhost:3500/v1.0/publish/pubsubname1/topic1.

The handler method a.onPublish() reads its parameters as follows (other details omitted):

const (
	pubsubnameparam = "pubsubname"
)

// read pubsubname from the URL
pubsubName := reqCtx.UserValue(pubsubnameparam).(string)
// read topic from the URL
topic := reqCtx.UserValue(topicParam).(string)
// read the message data from the HTTP body
body := reqCtx.PostBody()
// read data_content_type from the HTTP Content-Type header
contentType := string(reqCtx.Request.Header.Peek("Content-Type"))

// read metadata from the HTTP URL query
metadata := getMetadataFromRequest(reqCtx)

Reading the metadata is slightly more involved: all URL query parameters are read, and the key prefix decides whether each one is metadata:

const (
	metadataPrefix        = "metadata."
)

func getMetadataFromRequest(reqCtx *fasthttp.RequestCtx) map[string]string {
	metadata := map[string]string{}
	// visit all URL query parameters
	reqCtx.QueryArgs().VisitAll(func(key []byte, value []byte) {
		queryKey := string(key)
		// a query parameter whose key starts with "metadata." is treated as a metadata key
		if strings.HasPrefix(queryKey, metadataPrefix) {
			// strip the "metadata." prefix from the key
			k := strings.TrimPrefix(queryKey, metadataPrefix)
			metadata[k] = string(value)
		}
	})

	return metadata
}

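Summary: the complete daprd URL used for publish looks like http://localhost:3500/v1.0/publish/pubsubname1/topic1?metadata.k1=v1&metadata.k2=v2&metadata.k3=v3. The message content is passed in the HTTP body, and the content type of the message can be passed through the Content-Type header.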
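
The prefix-based metadata extraction can be tried with the standard library alone. This is a minimal sketch that uses net/url instead of fasthttp; metadataFromQuery is a hypothetical helper, and keeping the last value when a key repeats is an assumption made for this example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const metadataPrefix = "metadata."

// metadataFromQuery is a hypothetical helper that collects every query
// parameter whose key starts with "metadata." and strips that prefix.
func metadataFromQuery(rawURL string) (map[string]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	metadata := map[string]string{}
	for key, values := range u.Query() {
		if strings.HasPrefix(key, metadataPrefix) && len(values) > 0 {
			// if a key repeats, the last value wins (assumption)
			metadata[strings.TrimPrefix(key, metadataPrefix)] = values[len(values)-1]
		}
	}
	return metadata, nil
}

func main() {
	md, err := metadataFromQuery("http://localhost:3500/v1.0/publish/pubsubname1/topic1?metadata.k1=v1&metadata.k2=v2&other=x")
	fmt.Println(md, err)
}
```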

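Publish flow

gRPC protocol

By default, publishing uses gRPC. On the default port 50001, daprd receives the gRPC request that the client sends through the Dapr SDK via the PublishEvent() method of the registered Dapr service. It then publishes the event to the underlying message broker according to the concrete component implementation. The flow is roughly: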

title Pub-Sub via gRPC Protocol
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =User Code
    ----
    producer
]
participant SDK_client [
    =Dapr SDK
    ----
    producer
]
end box
participant daprd_client [
    =daprd
    ----
    producer
]
participant message_broker as "Message Broker"

user_code_client -> SDK_client : PublishEvent() 
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
note right: PublishEvent() @ Dapr service
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001
|||
daprd_client -[#red]> message_broker : native protocol (remote call)
|||
message_broker --[#red]> daprd_client :
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

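HTTP protocol

HTTP works similarly: on the default port 3500, daprd receives the HTTP request that the client sends through the Dapr SDK at the URL described above. The flow is roughly: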

title Pub-Sub via HTTP Protocol
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =User Code
    ----
    producer
]
participant SDK_client [
    =Dapr SDK
    ----
    producer
]
end box
participant daprd_client [
    =daprd
    ----
    producer
]
participant message_broker as "Message Broker"

user_code_client -> SDK_client : PublishEvent() 
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
note right: POST http://localhost:3500/v1.0/publish/pubsubname1/topic1?\nmetadata.k1=v1&metadata.k2=v2&metadata.k3=v3
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500
|||
daprd_client -[#red]> message_broker : native protocol (remote call)
|||
message_broker --[#red]> daprd_client :
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

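2.1.2 - Publish-related runtime initialization

Publish-related initialization flow in the Dapr runtime

When the Dapr runtime initializes at startup, it needs to open the API ports and mount the corresponding handlers to receive and process the publish requests of pub/sub. It also needs to start the pubsub components according to the configuration files so that it can connect to the external message broker.

Starting the Dapr gRPC API Server (outbound)

Starting the gRPC server

During initialization at Dapr runtime startup, the gRPC server is started. The code is in pkg/runtime/runtime.go: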

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    // Create and start internal and external gRPC servers
	grpcAPI := a.getGRPCAPI()
    
	err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
    ......
}

func (a *DaprRuntime) startGRPCAPIServer(api grpc.API, port int) error {
	serverConf := a.getNewServerConfig(a.runtimeConfig.APIListenAddresses, port)
	server := grpc.NewAPIServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.globalConfig.Spec.APISpec, a.proxy)
    if err := server.StartNonBlocking(); err != nil {
		return err
	}
	......
}

// NewAPIServer returns a new user facing gRPC API server.
func NewAPIServer(api API, config ServerConfig, ......) Server {
	return &server{
		api:         api,
		config:      config,
		kind:        apiServer, // const apiServer = "apiServer"
		......
	}
}

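Registering the Dapr API

For the Dapr runtime's gRPC server to serve the Dapr API, the Dapr service that defines the API must be registered on the gRPC server.

The registration code is in pkg/grpc/server.go. The StartNonBlocking() method registers the service when it starts the gRPC server: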

func (s *server) StartNonBlocking() error {
		if s.kind == internalServer {
			internalv1pb.RegisterServiceInvocationServer(server, s.api)
		} else if s.kind == apiServer {
			runtimev1pb.RegisterDaprServer(server, s.api) // note: s.api (the gRPC API implementation) is passed in
		}
		......
}

The RegisterDaprServer() method is implemented in pkg/proto/runtime/v1/dapr_grpc.pb.go:

func RegisterDaprServer(s grpc.ServiceRegistrar, srv DaprServer) {
	s.RegisterService(&Dapr_ServiceDesc, srv) // srv is the gRPC API implementation
}

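Dapr_ServiceDesc definition

The file pkg/proto/runtime/v1/dapr_grpc.pb.go contains the gRPC service definition of the Dapr service. This is gRPC code generated by protoc.

Dapr_ServiceDesc defines each method of the Dapr service. The one related to publishing is the PublishEvent method: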

var Dapr_ServiceDesc = grpc.ServiceDesc{
	ServiceName: "dapr.proto.runtime.v1.Dapr",
	HandlerType: (*DaprServer)(nil),
	Methods: []grpc.MethodDesc{
		{
			MethodName: "PublishEvent",             // registered method name
			Handler:    _Dapr_PublishEvent_Handler, // handler bound to the method
		},
        ......
        },
	},
	Metadata: "dapr/proto/runtime/v1/dapr.proto",
}

This tells the gRPC server: when a gRPC request arrives for the PublishEvent method of the dapr.proto.runtime.v1.Dapr service, pass it to _Dapr_PublishEvent_Handler.

title Dapr publish gRPC API 
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    producer
]

-[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/PublishEvent
|||
<[#blue]-- daprd_client

The handler bound to the PublishEvent method, _Dapr_PublishEvent_Handler, is implemented as follows:

func _Dapr_PublishEvent_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
	in := new(PublishEventRequest)
	if err := dec(in); err != nil {
		return nil, err
	}
	if interceptor == nil {
		return srv.(DaprServer).PublishEvent(ctx, in)
	}
	info := &grpc.UnaryServerInfo{
		Server:     srv,
		FullMethod: "/dapr.proto.runtime.v1.Dapr/PublishEvent",
	}
	handler := func(ctx context.Context, req interface{}) (interface{}, error) {
		return srv.(DaprServer).PublishEvent(ctx, req.(*PublishEventRequest))
	}
	return interceptor(ctx, in, info, handler)
}

The call finally reaches the PublishEvent method of the DaprServer interface implementation, that is, the gRPC API implementation.
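The routing described above (service name plus method name selects a handler, and an optional interceptor wraps the call) can be modeled with the standard library alone. The following is a minimal sketch, not the grpc-go implementation: the dispatcher, unaryHandler, unaryInterceptor, and countingInterceptor names are hypothetical, and requests are plain strings instead of protobuf messages.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// unaryHandler processes one decoded request.
type unaryHandler func(ctx context.Context, req string) (string, error)

// unaryInterceptor wraps a handler call. It is a simplified, hypothetical
// stand-in for grpc.UnaryServerInterceptor.
type unaryInterceptor func(ctx context.Context, req, fullMethod string, next unaryHandler) (string, error)

// dispatcher maps "/<service>/<method>" to a handler, similar in spirit to
// the routing table a gRPC server derives from a ServiceDesc.
type dispatcher struct {
	handlers    map[string]unaryHandler
	interceptor unaryInterceptor // optional, may be nil
}

func newDispatcher(interceptor unaryInterceptor) *dispatcher {
	return &dispatcher{handlers: map[string]unaryHandler{}, interceptor: interceptor}
}

// register binds a method of a service to its handler.
func (d *dispatcher) register(service, method string, h unaryHandler) {
	d.handlers["/"+service+"/"+method] = h
}

// call routes a request to the registered handler, going through the
// interceptor when one is configured.
func (d *dispatcher) call(fullMethod, req string) (string, error) {
	h, ok := d.handlers[fullMethod]
	if !ok {
		return "", errors.New("unimplemented: " + fullMethod)
	}
	ctx := context.Background()
	if d.interceptor == nil {
		return h(ctx, req)
	}
	return d.interceptor(ctx, req, fullMethod, h)
}

// publishEvent is a placeholder for the real API implementation.
func publishEvent(_ context.Context, topic string) (string, error) {
	return "published:" + topic, nil
}

// countingInterceptor counts calls and then invokes the handler.
func countingInterceptor(counter *int) unaryInterceptor {
	return func(ctx context.Context, req, _ string, next unaryHandler) (string, error) {
		*counter++
		return next(ctx, req)
	}
}

func main() {
	calls := 0
	d := newDispatcher(countingInterceptor(&calls))
	d.register("dapr.proto.runtime.v1.Dapr", "PublishEvent", publishEvent)
	out, err := d.call("/dapr.proto.runtime.v1.Dapr/PublishEvent", "topic-1")
	fmt.Println(out, err, calls)
}
```

With a nil interceptor the handler is invoked directly, which mirrors the `interceptor == nil` branch of the generated handler shown earlier.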

Starting the Dapr HTTP API server (outbound)

Starting the HTTP server in the dapr runtime

The dapr runtime's HTTP server is built on fasthttp.

During initialization at dapr runtime startup, the HTTP server is started. The code is in pkg/runtime/runtime.go:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
  ......
  // Start HTTP Server
	err = a.startHTTPServer(a.runtimeConfig.HTTPPort, a.runtimeConfig.PublicPort, a.runtimeConfig.ProfilePort, a.runtimeConfig.AllowedOrigins, pipeline)
	if err != nil {
		log.Fatalf("failed to start HTTP server: %s", err)
	}
  ......
}

func (a *DaprRuntime) startHTTPServer(......) error {
	a.daprHTTPAPI = http.NewAPI(......)

	server := http.NewServer(a.daprHTTPAPI, ......)
	if err := server.StartNonBlocking(); err != nil { // StartNonBlocking starts the fasthttp server
		return err
	}
}

Mounting the PubSub HTTP endpoints

While the HTTP API is being initialized, the PubSub HTTP endpoints are mounted on the fasthttp server. The code is in pkg/http/api.go:

func NewAPI(
  appID string,
	appChannel channel.AppChannel,
	directMessaging messaging.DirectMessaging,
  ......
  	shutdown func()) API {
  
  	api := &api{
		appChannel:               appChannel,
		directMessaging:          directMessaging,
		......
	}
  
  	// append the PubSub HTTP endpoints
  	api.endpoints = append(api.endpoints, api.constructPubSubEndpoints()...)
}

The details of the PubSub HTTP endpoints are in the constructPubSubEndpoints() method:

func (a *api) constructPubSubEndpoints() []Endpoint {
	return []Endpoint{
		{
			Methods: []string{fasthttp.MethodPost, fasthttp.MethodPut},
			Route:   "publish/{pubsubname}/{topic:*}",
			Version: apiVersionV1,
			Handler: a.onPublish,
		},
	}
}

Note the Route path "publish/{pubsubname}/{topic:*}": the dapr sdk issues HTTP publish requests through URLs of this form.
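As a rough illustration of what a client has to send, the sketch below builds the publish URL and a POST request for it using only the Go standard library. The publishURL and newPublishRequest helpers are hypothetical names, the host 127.0.0.1 and the JSON content type are assumptions, and escaping the topic as a single path segment is a simplification (the `{topic:*}` wildcard may accept more than that).

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
)

// publishURL builds the sidecar publish endpoint for a pubsub component and
// topic, following the /v1.0/publish/{pubsubname}/{topic} route.
func publishURL(httpPort int, pubsubName, topic string) string {
	return fmt.Sprintf("http://127.0.0.1:%d/v1.0/publish/%s/%s",
		httpPort, url.PathEscape(pubsubName), url.PathEscape(topic))
}

// newPublishRequest prepares (but does not send) the HTTP publish request.
func newPublishRequest(httpPort int, pubsubName, topic string, jsonBody []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost,
		publishURL(httpPort, pubsubName, topic), bytes.NewReader(jsonBody))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newPublishRequest(3500, "messagebus", "testingtopic", []byte(`{"orderId":1}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request would be a matter of passing it to an http.Client while a sidecar is listening on the chosen port.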

title Dapr Publish HTTP API 
hide footbox
skinparam style strictuml

participant daprd_client [
    =daprd
    ----
    producer
]

-[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/publish/{pubsubname}/{topic:*}
|||
<[#blue]-- daprd_client

pubsub component initialization

To support pubsub, pubsub components must be configured for the dapr runtime.

pubSubRegistry and the pubSubs list

The DaprRuntime struct holds pubSubRegistry and the pubSubs list:

type DaprRuntime struct {
	......
	pubSubRegistry         pubsub_loader.Registry
	pubSubs                map[string]pubsub.PubSub
	......
}

Both fields are initialized when the runtime is constructed:

func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration, accessControlList *config.AccessControlList, resiliencyProvider resiliency.Provider) *DaprRuntime {
	ctx, cancel := context.WithCancel(context.Background())
	return &DaprRuntime{
		......
		pubSubs:                map[string]pubsub.PubSub{},
		pubSubRegistry:         pubsub_loader.NewRegistry(),
        ......

PubSubRegistry holds the list of pubsub components

pubSubRegistry holds every pubsub component supported by the dapr runtime:

pubSubRegistry struct {
    messageBuses map[string]func() pubsub.PubSub
}

The runtime binary (cmd/daprd/main.go) lists every pubsub component. This is also the direct link between the dapr and components-contrib repositories:

err = rt.Run(
		......
		runtime.WithPubSubs(
			pubsub_loader.New("azure.eventhubs", func() pubs.PubSub {
				return pubsub_eventhubs.NewAzureEventHubs(logContrib)
			}),
			pubsub_loader.New("azure.servicebus", func() pubs.PubSub {
				return servicebus.NewAzureServiceBus(logContrib)
			}),
			pubsub_loader.New("gcp.pubsub", func() pubs.PubSub {
				return pubsub_gcp.NewGCPPubSub(logContrib)
			}),
			pubsub_loader.New("hazelcast", func() pubs.PubSub {
				return pubsub_hazelcast.NewHazelcastPubSub(logContrib)
			}),
			pubsub_loader.New("jetstream", func() pubs.PubSub {
				return pubsub_jetstream.NewJetStream(logContrib)
			}),
			pubsub_loader.New("kafka", func() pubs.PubSub {
				return pubsub_kafka.NewKafka(logContrib)
			}),
			pubsub_loader.New("mqtt", func() pubs.PubSub {
				return pubsub_mqtt.NewMQTTPubSub(logContrib)
			}),
			pubsub_loader.New("natsstreaming", func() pubs.PubSub {
				return natsstreaming.NewNATSStreamingPubSub(logContrib)
			}),
			pubsub_loader.New("pulsar", func() pubs.PubSub {
				return pubsub_pulsar.NewPulsar(logContrib)
			}),
			pubsub_loader.New("rabbitmq", func() pubs.PubSub {
				return rabbitmq.NewRabbitMQ(logContrib)
			}),
			pubsub_loader.New("redis", func() pubs.PubSub {
				return pubsub_redis.NewRedisStreams(logContrib)
			}),
			pubsub_loader.New("snssqs", func() pubs.PubSub {
				return pubsub_snssqs.NewSnsSqs(logContrib)
			}),
			pubsub_loader.New("in-memory", func() pubs.PubSub {
				return pubsub_inmemory.New(logContrib)
			}),
		),
    ......
)

During initialization the runtime stores this pubsub component information in pubSubRegistry:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    ......
	a.pubSubRegistry.Register(opts.pubsubs...)
}

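Note that the component list in pubSubRegistry covers every component the dapr runtime supports, but not every component is loaded when the runtime starts. Components are loaded on demand: the component configuration files (yaml) decide which component instances are loaded and initialized.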
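The distinction between "registered" and "loaded" can be sketched as a registry of factory functions, where nothing is instantiated until a configured component type asks for it. This is a simplified model rather than the dapr code: the stubPubSub type, the fullName helper, and the "pubsub." key prefix are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// PubSub is a reduced stand-in for the component interface.
type PubSub interface {
	Name() string
}

// stubPubSub is a hypothetical component used only for this sketch.
type stubPubSub struct{ name string }

func (s stubPubSub) Name() string { return s.name }

// registry stores one factory per supported component type.
type registry struct {
	factories map[string]func() PubSub
}

func newRegistry() *registry {
	return &registry{factories: map[string]func() PubSub{}}
}

// fullName derives the lookup key; the "pubsub." prefix is an assumption.
func fullName(name string) string {
	return strings.ToLower("pubsub." + name)
}

// Register records a factory; it does not create the component.
func (r *registry) Register(name string, factory func() PubSub) {
	r.factories[fullName(name)] = factory
}

// Create instantiates a component for a configured type such as "pubsub.redis".
func (r *registry) Create(componentType string) (PubSub, error) {
	factory, ok := r.factories[strings.ToLower(componentType)]
	if !ok {
		return nil, fmt.Errorf("couldn't find message bus %s", componentType)
	}
	return factory(), nil
}

func main() {
	r := newRegistry()
	r.Register("redis", func() PubSub { return stubPubSub{name: "redis"} })
	r.Register("kafka", func() PubSub { return stubPubSub{name: "kafka"} })

	// Only the type named in the (hypothetical) yaml file is instantiated.
	ps, err := r.Create("pubsub.redis")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded:", ps.Name())
}
```

Storing factories instead of instances keeps startup cheap: supporting many brokers costs one map entry each, and only the configured ones open connections.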

The runtime loads pubsub components

Components are loaded together when the dapr runtime initializes:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    ......
	a.pubSubRegistry.Register(opts.pubsubs...)
	a.secretStoresRegistry.Register(opts.secretStores...)
	a.stateStoreRegistry.Register(opts.states...)
    ......
    err = a.loadComponents(opts)
    a.flushOutstandingComponents()
    ......
}

Component loading has two implementations, KubernetesMode and StandaloneMode:

func (a *DaprRuntime) loadComponents(opts *runtimeOpts) error {
    	var loader components.ComponentLoader

	switch a.runtimeConfig.Mode {
	case modes.KubernetesMode:
		loader = components.NewKubernetesComponents(a.runtimeConfig.Kubernetes, a.namespace, a.operatorClient, a.podName)
	case modes.StandaloneMode:
		loader = components.NewStandaloneComponents(a.runtimeConfig.Standalone)
	default:
		return errors.Errorf("components loader for mode %s not found", a.runtimeConfig.Mode)
	}
    comps, err := loader.LoadComponents()
    ......
}

In KubernetesMode, the component CRDs in k8s are read:

func (k *KubernetesComponents) LoadComponents() ([]components_v1alpha1.Component, error) {
	resp, err := k.client.ListComponents(context.Background(), &operatorv1pb.ListComponentsRequest{
		Namespace: k.namespace,
		PodName:   k.podName,
	}, ......
}

In StandaloneMode, the component CRD files are read from the directory given by the ComponentsPath setting (--componentspath):

func (s *StandaloneComponents) LoadComponents() ([]components_v1alpha1.Component, error) {
	files, err := os.ReadDir(s.config.ComponentsPath)
	......
}

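Summary

Once the HTTP server and the gRPC server have been initialized, the dapr runtime is ready to receive publish requests.

2.1.3 - The client SDK sends the publish request

The Dapr client SDK wraps the Dapr API and sends the pubsub publish request

Java SDK implementation

For an example of using pubsub in application code, see /src/main/java/io/dapr/examples/pubsub/http/Publisher.java in the dapr java-sdk. In outline: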

DaprClient client = (new DaprClientBuilder()).build();
String message = String.format("This is message #%d", i);
client.publishEvent(
    "messagebus",
    "testingtopic",
    message,
    singletonMap(Metadata.TTL_IN_SECONDS, "1000")).block();

Apart from service invoke, which defaults to HTTP, every method in the java SDK defaults to gRPC. The DaprClientProxy class initializes two dapr clients:

client field: of type DaprClientGrpc, connected to 127.0.0.1:50001
methodInvocationOverrideClient field: of type DaprClientHttp, connected to 127.0.0.1:3500

The pubsub methods go through gRPC by default, using the DaprClientGrpc type (file src/main/java/io/dapr/client/DaprClientGrpc.java):

  @Override
  public Mono<Void> publishEvent(PublishEventRequest request) {
    try {
      String pubsubName = request.getPubsubName();
      String topic = request.getTopic();
      Object data = request.getData();
      DaprProtos.PublishEventRequest.Builder envelopeBuilder = DaprProtos.PublishEventRequest.newBuilder()
      ......
      return Mono.subscriberContext().flatMap(
              context ->
                  this.<Empty>createMono(
                      it -> intercept(context, asyncStub).publishEvent(envelopeBuilder.build(), it)
                  )
      ).then();
  }

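Here the parameters of the PublishEvent request are set according to the request. When debugging, the data looks like the figure below:

[Figure: java-client-grpc]

The gRPC request sent to the dapr runtime is shown in the figure below:

[Figure: java-client-grpc-send]

The gRPC service called here is dapr.proto.runtime.v1.Dapr and the method is PublishEvent, matching the gRPC API set up during the dapr runtime initialization described in the previous section.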

title PublishEvent via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
    =App-1
    ----
    producer
]
participant SDK_client [
    =SDK
    ----
    producer
]
end box
participant daprd_client [
    =daprd
    ----
    producer
]

user_code_client -> SDK_client : PublishEvent() 
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n"dapr.proto.runtime.v1.Dapr/PublishEvent"
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client

Go SDK implementation

For an example of using pubsub in Go application code, see https://github.com/dapr/go-sdk/blob/main/examples/pubsub/pub/pub.go. In outline:

client, err := dapr.NewClient()
err = client.PublishEvent(ctx, pubsubName, topicName, data)

The Go SDK defines the Client interface in client/client.go:

// Client is the interface for Dapr client implementation.
type Client interface {
	// PublishEvent publishes data onto topic in specific pubsub component.
	PublishEvent(ctx context.Context, pubsubName, topicName string, data interface{}, opts ...PublishEventOption) error
    ......
}

The method is implemented in client/pubsub.go and does little more than assemble the PublishEventRequest object:

func (c *GRPCClient) PublishEvent(ctx context.Context, pubsubName, topicName string, data interface{}, opts ...PublishEventOption) error {
    request := &pb.PublishEventRequest{
		PubsubName: pubsubName,
		Topic:      topicName,
	}
	_, err := c.protoClient.PublishEvent(c.withAuthToken(ctx), request)
	......
}

PublishEvent() is gRPC code generated by protoc, located in dapr/proto/runtime/v1/dapr_grpc.pb.go. Its implementation is:

func (c *daprClient) PublishEvent(ctx context.Context, in *PublishEventRequest, opts ...grpc.CallOption) (*emptypb.Empty, error) {
	out := new(emptypb.Empty)
	err := c.cc.Invoke(ctx, "/dapr.proto.runtime.v1.Dapr/PublishEvent", in, out, opts...)
	if err != nil {
		return nil, err
	}
	return out, nil
}

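Note: the gRPC service called here is dapr.proto.runtime.v1.Dapr and the method is PublishEvent, matching the gRPC API in the dapr runtime.

Other SDKs

TODO

2.1.4 - The Dapr runtime handles the publish request from the client

Code analysis of how the Dapr runtime receives publish requests from clients

The dapr runtime offers both HTTP and gRPC. The runtime initialization section above described how both protocols are prepared to receive publish requests from clients. This section describes how the dapr runtime handles a publish request once it arrives.

gRPC API

In the gRPC API implementation, the PublishEvent() method handles incoming publish requests. Its main flow is roughly the following four steps: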

type api struct {
    pubsubAdapter              runtimePubsub.Adapter
}

func (a *api) PublishEvent(ctx context.Context, in *runtimev1pb.PublishEventRequest) (*emptypb.Empty, error) {
  // 1. find the pubsub component that can handle the request, by name
  thepubsub := a.pubsubAdapter.GetPubSub(pubsubName)
  // 2. handle parameter details, e.g. whether to wrap the data as a cloudevent
  // details omitted here, covered later
  // 3. build the PublishRequest object
  req := pubsub.PublishRequest{
		PubsubName: pubsubName,
		Topic:      topic,
		Data:       data,
		Metadata:   in.Metadata,
	}
  // 4. delegate the actual sending to the pubsub component
  err := a.pubsubAdapter.Publish(&req)
}

Finding the pubsub component that handles the request

  // check that pubsubAdapter has been initialized; if not, return an error
  if a.pubsubAdapter == nil {
		err := status.Error(codes.FailedPrecondition, messages.ErrPubsubNotConfigured)
		apiServerLogger.Debug(err)
		return &emptypb.Empty{}, err
	}

	pubsubName := in.PubsubName
  // validate the request: the pubsubName parameter must not be empty
	if pubsubName == "" {
		err := status.Error(codes.InvalidArgument, messages.ErrPubsubEmpty)
		apiServerLogger.Debug(err)
		return &emptypb.Empty{}, err
	}

  // look up the component in pubsubAdapter by the pubsubName parameter
	thepubsub := a.pubsubAdapter.GetPubSub(pubsubName)
	if thepubsub == nil {
    // not found: return an error
		err := status.Errorf(codes.InvalidArgument, messages.ErrPubsubNotFound, pubsubName)
		apiServerLogger.Debug(err)
		return &emptypb.Empty{}, err
	}

The GetPubSub() method is simple: it does a plain map lookup by pubsubName among the pubsub components that have already been initialized:

// GetPubSub is an adapter method to find a pubsub by name.
func (a *DaprRuntime) GetPubSub(pubsubName string) pubsub.PubSub {
	ps, ok := a.pubSubs[pubsubName]
	if !ok {
		return nil
	}
	return ps.component
}

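Delegating the send to the pubsub component

func (a *DaprRuntime) Publish(req *pubsub.PublishRequest) error {
  // the component is looked up by name a second time here
  // TBD: this could be optimized by passing along the component found earlier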
	ps, ok := a.pubSubs[req.PubsubName]
	if !ok {
		return runtimePubsub.NotFoundError{PubsubName: req.PubsubName}
	}

  // check whether the pubsub operation is allowed
	if allowed := a.isPubSubOperationAllowed(req.PubsubName, req.Topic, ps.scopedPublishings); !allowed {
		return runtimePubsub.NotAllowedError{Topic: req.Topic, ID: a.runtimeConfig.ID}
	}

  // apply the resiliency policy
	policy := a.resiliency.ComponentOutboundPolicy(a.ctx, req.PubsubName)
	return policy(func(ctx context.Context) (err error) {
    // finally call the Publish method of the underlying component to send the request
		return ps.component.Publish(req)
	})
}
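The three stages of Publish (lookup, scope check, policy-wrapped call) can be sketched in isolation. The code below is a simplified model under stated assumptions: miniRuntime, pubsubItem, flakyComponent, topicAllowed, and withRetry are hypothetical names, the scope check treats an empty allow-list as "everything allowed", and the resiliency policy is reduced to a fixed number of retries.

```go
package main

import (
	"errors"
	"fmt"
)

type publishRequest struct {
	PubsubName string
	Topic      string
	Data       []byte
}

type component interface {
	Publish(req *publishRequest) error
}

// pubsubItem pairs a component with its publishing scope.
type pubsubItem struct {
	component     component
	allowedTopics []string // empty means every topic is allowed (assumption)
}

type miniRuntime struct {
	pubSubs map[string]pubsubItem
	retries int
}

var (
	errNotFound   = errors.New("pubsub not found")
	errNotAllowed = errors.New("topic not allowed")
	errTransient  = errors.New("transient failure")
)

func topicAllowed(topic string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, t := range allowed {
		if t == topic {
			return true
		}
	}
	return false
}

// withRetry is a toy policy: run op up to attempts times, stop on success.
func withRetry(attempts int, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
	}
	return err
}

// Publish mirrors the three stages: lookup, scope check, policy-wrapped call.
func (rt *miniRuntime) Publish(req *publishRequest) error {
	ps, ok := rt.pubSubs[req.PubsubName]
	if !ok {
		return errNotFound
	}
	if !topicAllowed(req.Topic, ps.allowedTopics) {
		return errNotAllowed
	}
	return withRetry(rt.retries, func() error { return ps.component.Publish(req) })
}

// flakyComponent fails a fixed number of times before succeeding.
type flakyComponent struct {
	failures int
	calls    int
}

func (f *flakyComponent) Publish(_ *publishRequest) error {
	f.calls++
	if f.calls <= f.failures {
		return errTransient
	}
	return nil
}

func main() {
	flaky := &flakyComponent{failures: 2}
	rt := &miniRuntime{retries: 3, pubSubs: map[string]pubsubItem{
		"messagebus": {component: flaky, allowedTopics: []string{"orders"}},
	}}
	err := rt.Publish(&publishRequest{PubsubName: "messagebus", Topic: "orders"})
	fmt.Println(err, flaky.calls)
}
```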

HTTP API

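The HTTP API handles the request the same way as the gRPC API. The difference is that, because of the HTTP protocol, there is no single runtimev1pb.PublishEventRequest object carrying all request parameters as in the gRPC API, so the HTTP API has an extra step that extracts the parameters from the request.

Extracting all parameters from the HTTP request

The first part of the onPublish() method in the HTTP API implementation extracts every parameter needed for publish from the HTTP request: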

func (a *api) onPublish(reqCtx *fasthttp.RequestCtx) {
  // 1. pubsubName
  pubsubName := reqCtx.UserValue(pubsubnameparam).(string)
  // 2. topic
  topic := reqCtx.UserValue(topicParam).(string)
  // 3. data
  body := reqCtx.PostBody()
  // 4. data content type
	contentType := string(reqCtx.Request.Header.Peek("Content-Type"))
  // 5. metadata
	metadata := getMetadataFromRequest(reqCtx)
  
  // the rest of the processing is the same as for gRPC
  ......
}
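To make the extraction step concrete, here is a small standard-library sketch that pulls the pubsub name, the topic, and metadata out of a request path and query string. It does not use fasthttp or the dapr router: parsePublishPath and metadataFromQuery are hypothetical helpers, and reading metadata from query parameters prefixed with "metadata." is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parsePublishPath splits "/v1.0/publish/{pubsubname}/{topic:*}" into its
// two parameters. The topic keeps any further slashes (wildcard behavior).
func parsePublishPath(path string) (pubsubName, topic string, ok bool) {
	const prefix = "/v1.0/publish/"
	if !strings.HasPrefix(path, prefix) {
		return "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(path, prefix), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// metadataFromQuery collects query parameters of the form metadata.<key>=<value>.
func metadataFromQuery(rawQuery string) map[string]string {
	const prefix = "metadata."
	out := map[string]string{}
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return out
	}
	for key, vals := range values {
		if strings.HasPrefix(key, prefix) && len(vals) > 0 {
			out[strings.TrimPrefix(key, prefix)] = vals[0]
		}
	}
	return out
}

func main() {
	name, topic, ok := parsePublishPath("/v1.0/publish/messagebus/orders/eu")
	fmt.Println(name, topic, ok)
	fmt.Println(metadataFromQuery("metadata.ttlInSeconds=1000&other=1"))
}
```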

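2.1.5 - Component implementation

Components implement the actual publish functionality

The Publish() method in the component interface

Between the dapr runtime API implementations (both HTTP API and gRPC API) and the underlying pubsub components sits a simple internal interface that defines what a pubsub component does: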

// PubSub is the interface for message buses.
type PubSub interface {
	Init(metadata Metadata) error
	Features() []Feature
	Publish(req *PublishRequest) error
	Subscribe(ctx context.Context, req SubscribeRequest, handler Handler) error
	Close() error
}

Publish() sends a message. The fields of its PublishRequest parameter match the Dapr API definition:

// PublishRequest is the request to publish a message.
type PublishRequest struct {
	Data        []byte            `json:"data"`
	PubsubName  string            `json:"pubsubname"`
	Topic       string            `json:"topic"`
	Metadata    map[string]string `json:"metadata"`
	ContentType *string           `json:"contentType,omitempty"`
}

redis component implementation

Taking redis streams as an example, here is the implementation of the Publish method:

func (r *redisStreams) Publish(req *pubsub.PublishRequest) error {
	_, err := r.client.XAdd(r.ctx, &redis.XAddArgs{
		Stream:       req.Topic,
		MaxLenApprox: r.metadata.maxLenApprox,
		Values:       map[string]interface{}{"data": req.Data},
	}).Result()
	if err != nil {
		return fmt.Errorf("redis streams: error from publish: %s", err)
	}

	return nil
}

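The redis streams implementation is simple: the req.Topic parameter names the redis stream to write to, and the entry is a map in which the key "data" holds req.Data.

2.2 - Subscription main flow

Analysis of the main subscription flow

2.2.1 - Flow overview

Overview of the Dapr subscription flow and API

API and ports

The subscription flow actually consists of three sub-flows:

1. Getting the application's subscriptions

daprd needs to know what the application subscribes to.

In the implementation, dapr requires the application to collect its subscription information and expose it in a specified way (the SDK can help), so that daprd can retrieve it by sending a request to the application.

2. Subscribing to messages

Once daprd has the application's subscription information, it can subscribe through the subscription mechanism of the underlying component.

3. Forwarding messages to the application

When daprd receives a subscribed message from the underlying component, it must forward the message to the application.

Sub-flows 1 and 3 both require daprd to call the application, so dapr needs to know which port the application listens on for subscription requests. This is set with the command-line parameter app-port. Dapr examples usually use port 3000.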

gRPC API

The gRPC API is defined in the AppCallback service in dapr/proto/runtime/v1/appcallback.proto:

service AppCallback {
  // sub-flow 1: get the application's subscriptions
  rpc ListTopicSubscriptions(google.protobuf.Empty) returns (ListTopicSubscriptionsResponse) {}

  // sub-flow 3: forward messages to the application
  rpc OnTopicEvent(TopicEventRequest) returns (TopicEventResponse) {}
  ......
}

ListTopicSubscriptionsResponse is defined as:

message ListTopicSubscriptionsResponse {
  repeated common.v1.TopicSubscription subscriptions = 1;
}

message TopicSubscription {
  // name of the pubsub component
  string pubsub_name = 1;

  // topic to subscribe to
  string topic = 2;

  // optional parameters, covered later
  map<string,string> metadata = 3;
  TopicRoutes routes = 5;
  string dead_letter_topic = 6;
}

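In other words, an application can have multiple subscriptions, and each must provide the pubsub_name and topic parameters.

TopicEventRequest is defined as:

message TopicEventRequest {
  // ignore these parameters for now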
  string id = 1;
  string source = 2;
  string type = 3;
  string spec_version = 4;
  string path = 9;

  // basic information about the event
  string data_content_type = 5;
  bytes data = 7;
  string topic = 6;
  string pubsub_name = 8;
}

HTTP API

Subscription flow

HTTP protocol

title Subscribe via http
hide footbox
skinparam style strictuml
box "App-1"
participant user_code [
    =App-1
    ----
    producer
]
participant SDK [
    =SDK
    ----
    producer
]
end box
participant daprd [
    =daprd
    ----
    producer
]
participant message_broker as "Message Broker"

SDK -> user_code: collection subscribe
user_code --> SDK

daprd -[#blue]> SDK : http
note left: appChannel.InvokeMethod("dapr/subscribe")
SDK --[#blue]> daprd : 

daprd -[#red]> message_broker : subscribe topics
message_broker --[#red]> daprd

|||
|||
|||
|||

message_broker -[#red]> daprd: event
daprd -[#blue]> SDK : http
note left: appChannel.InvokeMethod("/{route}")
SDK -> user_code : 
user_code --> SDK
SDK --[#blue]> daprd
|||

gRPC protocol

title Subscribe via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code [
    =App-1
    ----
    producer
]
participant SDK [
    =SDK
    ----
    producer
]
end box
participant daprd [
    =daprd
    ----
    producer
]
participant message_broker as "Message Broker"

SDK -> user_code: collection subscribe
user_code --> SDK

daprd -[#blue]> SDK : gRPC
note left: appChannel.ListTopicSubscriptions()
SDK --[#blue]> daprd : 

daprd -[#red]> message_broker : subscribe topics
message_broker --[#red]> daprd

|||
|||
|||
|||

message_broker -[#red]> daprd: event
daprd -[#blue]> SDK : gRPC
note left: appChannel.OnTopicEvent()
SDK -> user_code : 
user_code --> SDK
SDK --[#blue]> daprd
|||

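2.2.2 - Subscription-related runtime initialization

The subscription-related initialization flow in the Dapr runtime

When the dapr runtime initializes at startup, it needs to:

call the application to get its subscription information, for example which topics it subscribes to
start the pubsub components from the configuration files so they can connect to the external message broker and subscribe
forward subscribed events to the application

The Dapr runtime initializes the component list

During initialization the dapr runtime creates the connection to the app, called the app channel, and then begins pubsub initialization: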

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
	......
    // a dedicated goroutine handles component initialization
    go a.processComponents()
    err = a.loadComponents(opts)
    
	// wait until the application is ready (only if an app port is set)
	a.blockUntilAppIsReady()

	// create the app channel
	err = a.createAppChannel()
    // the app channel supports http and grpc
	a.daprHTTPAPI.SetAppChannel(a.appChannel)
	grpcAPI.SetAppChannel(a.appChannel)
    ......
    
    // begin pubsub initialization
    a.startSubscribing()
}

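This part contains intricate logic that initializes components in parallel and resolves their mutual dependencies. Ignoring those details, here is just the code that performs component initialization: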

func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
	switch category {
	case pubsubComponent:
		return a.initPubSub(comp)
	......
	}
	return nil
}

func (a *DaprRuntime) initPubSub(c components_v1alpha1.Component) error {
	pubSub, err := a.pubSubRegistry.Create(c.Spec.Type, c.Spec.Version)

    // initialize the pubSub component
	err = pubSub.Init(pubsub.Metadata{
		Properties: properties,
	})

	pubsubName := c.ObjectMeta.Name
	a.pubSubs[pubsubName] = pubSub
	return nil
}

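After this completes, a.pubSubs holds the pubsub components that are configured and initialized.

Starting the pubsub components

Subscription initialization happens in the last phase of dapr runtime startup:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
    ......
    // begin pubsub initialization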
    a.startSubscribing()
}

The startSubscribing() method processes the pubSub components one by one:

func (a *DaprRuntime) startSubscribing() {
	for name, pubsub := range a.pubSubs {
		if err := a.beginPubSub(name, pubsub); err != nil {
			log.Errorf("error occurred while beginning pubsub %s: %s", name, err)
		}
	}
}

The beginPubSub method does two things: 1. gets the application's subscription information 2. makes the component start subscribing

func (a *DaprRuntime) beginPubSub(name string, ps pubsub.PubSub) error {
	var publishFunc func(ctx context.Context, msg *pubsubSubscribedMessage) error
    ......
	topicRoutes, err := a.getTopicRoutes()
    ......
}

Getting the application's subscription information (AppCallback)

The getTopicRoutes() method can get the application's subscription information over either HTTP or gRPC:

func (a *DaprRuntime) getTopicRoutes() (map[string]TopicRoute, error) {
    ......
    if a.runtimeConfig.ApplicationProtocol == HTTPProtocol {
        // use the http channel
		subscriptions, err = runtime_pubsub.GetSubscriptionsHTTP(a.appChannel, log)
	} else if a.runtimeConfig.ApplicationProtocol == GRPCProtocol {
        // use the grpc channel
		client := runtimev1pb.NewAppCallbackClient(a.grpc.AppClient)
		subscriptions, err = runtime_pubsub.GetSubscriptionsGRPC(client, log)
	}
    ......
}

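For HTTP, the call goes through the InvokeMethod method defined on AppChannel. This method was originally designed for service invoke: the dapr runtime uses it to forward inbound HTTP service invoke requests to the application acting as the server. Here it is used to call the dapr/subscribe path: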

func GetSubscriptionsHTTP(channel channel.AppChannel, log logger.Logger) ([]Subscription, error) {
    req := invokev1.NewInvokeMethodRequest("dapr/subscribe")
    channel.InvokeMethod(ctx, req)
    ......
}

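Thoughts: in theory this is a convenient enough approach, but it feels a little odd that pubsub initialization relies on functionality from the service invoke module. Sending an HTTP request directly would not be complicated either. Also, the methods of the http AppChannel / app callback are not symmetric with those of the grpc AppChannel / app callback, which makes the design less elegant.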
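To illustrate the remark that a direct HTTP request would be simple, the sketch below fetches /dapr/subscribe with net/http and decodes the JSON answer. It is not how the dapr runtime does it: fetchSubscriptions and demo are hypothetical helpers, the Subscription struct only models the four fields discussed in this chapter, treating a 404 as "no subscriptions" is an assumption, and the demo runs against an in-process httptest server that plays the role of the application.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Subscription models one entry of the /dapr/subscribe response.
type Subscription struct {
	PubsubName string            `json:"pubsubName"`
	Topic      string            `json:"topic"`
	Route      string            `json:"route"`
	Metadata   map[string]string `json:"metadata"`
}

// fetchSubscriptions asks the application for its subscriptions directly.
func fetchSubscriptions(client *http.Client, appBaseURL string) ([]Subscription, error) {
	resp, err := client.Get(appBaseURL + "/dapr/subscribe")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotFound {
		return nil, nil // assumption: the app declares no subscriptions
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var subs []Subscription
	if err := json.NewDecoder(resp.Body).Decode(&subs); err != nil {
		return nil, err
	}
	return subs, nil
}

// demo starts a fake application and queries it.
func demo() ([]Subscription, error) {
	app := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/dapr/subscribe" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `[{"pubsubName":"messagebus","topic":"testingtopic","route":"/testingtopic","metadata":{}}]`)
	}))
	defer app.Close()
	return fetchSubscriptions(app.Client(), app.URL)
}

func main() {
	subs, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range subs {
		fmt.Println(s.PubsubName, s.Topic, s.Route)
	}
}
```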

For gRPC, the runtime simply calls the ListTopicSubscriptions() method of the gRPC AppCallbackClient:

resp, err = channel.ListTopicSubscriptions(context.Background(), &emptypb.Empty{})

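The pubsub component starts subscribing

Once the application's subscription information is available, the dapr runtime knows which topics the application needs. It can then proceed with the subscription: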

func (a *DaprRuntime) beginPubSub(name string, ps pubsub.PubSub) error {
	var publishFunc func(ctx context.Context, msg *pubsubSubscribedMessage) error
    ......
    // get the subscription information
	topicRoutes, err := a.getTopicRoutes()
    ......
    // start subscribing
    for topic, route := range v.routes {
        // subscribe to each topic on the current pubsub component
        err := ps.Subscribe(pubsub.SubscribeRequest{
			Topic:    topic,
			Metadata: route.metadata,
        }, func(ctx context.Context, msg *pubsub.NewMessage) error {......})
    }
}

The Subscribe() method used here is defined on the PubSub interface, which every dapr pubsub component implements:

type PubSub interface {
	Publish(req *PublishRequest) error
	Subscribe(req SubscribeRequest, handler Handler) error
}

The concrete implementation of the handler is covered later.
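The Subscribe-with-handler contract is easy to see in a toy in-memory bus: subscribing stores a callback per topic, and publishing invokes the callbacks registered for that topic. This is a hedged sketch, not the dapr in-memory component: inMemoryBus, message, and handler are hypothetical names, and delivery is synchronous to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// message is what a handler receives for each published event.
type message struct {
	Topic string
	Data  []byte
}

// handler is the callback a subscriber hands to the component.
type handler func(msg *message) error

// inMemoryBus is a toy pubsub component.
type inMemoryBus struct {
	mu       sync.RWMutex
	handlers map[string][]handler
}

func newInMemoryBus() *inMemoryBus {
	return &inMemoryBus{handlers: map[string][]handler{}}
}

// Subscribe registers a handler for one topic.
func (b *inMemoryBus) Subscribe(topic string, h handler) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[topic] = append(b.handlers[topic], h)
	return nil
}

// Publish delivers the data to every handler of the topic, in order,
// and stops at the first handler error.
func (b *inMemoryBus) Publish(topic string, data []byte) error {
	b.mu.RLock()
	hs := append([]handler(nil), b.handlers[topic]...)
	b.mu.RUnlock()

	for _, h := range hs {
		if err := h(&message{Topic: topic, Data: data}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	bus := newInMemoryBus()
	_ = bus.Subscribe("orders", func(msg *message) error {
		fmt.Printf("received on %s: %s\n", msg.Topic, msg.Data)
		return nil
	})
	_ = bus.Publish("orders", []byte("hello"))
	_ = bus.Publish("other", []byte("nobody listens"))
}
```

In the runtime, the handler passed to Subscribe is where a received message would be forwarded to the application.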

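2.2.3 - The client SDK provides subscription information to dapr

The Dapr client SDK wraps the Dapr API and accepts the ListTopicSubscriptions request sent by dapr

How it works

A subscription has four key pieces of information. They are defined in the dapr proto as follows: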

message TopicSubscription {
  // Required. The name of the pubsub containing the topic below to subscribe to.
  string pubsub_name = 1;

  // Required. The name of topic which will be subscribed
  string topic = 2;

  // The optional properties used for this topic's subscription e.g. session id
  map<string,string> metadata = 3;

  // The optional routing rules to match against. In the gRPC interface, OnTopicEvent
  // is still invoked but the matching path is sent in the TopicEventRequest.
  TopicRoutes routes = 5;
}

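pubsub_name names the pubsub component to use, topic is the topic to subscribe to, metadata carries extension information, and routes tells dapr how to deliver subscribed events to the application.

TODO: the handling differs between the HTTP protocol and the gRPC protocol.

The java sdk wraps this as follows: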

public class DaprTopicSubscription {
  private final String pubsubName;
  private final String topic;
  private final String route;
  private final Map<String, String> metadata;
}

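The dapr sdk should make it easy for the application to provide the subscription information above.

Java SDK implementation

For an example of using subscribe in application code, see /src/main/java/io/dapr/examples/pubsub/http/subscribe.java in the dapr java-sdk. In outline:

// start the application and listen on a port, usually 3000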
public static void main(String[] args) throws Exception {
    ......
	DaprApplication.start(port); 
}

@RestController
public class SubscriberController {
  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}")
  @PostMapping(path = "/testingtopic")
  public Mono<Void> handleMessage(@RequestBody(required = false) CloudEvent<String> cloudEvent) {
      ......
  }
}

The SDK collects subscription information

The @Topic annotation in the code above is provided by the dapr java sdk and marks a topic to subscribe to. The code is in src/main/java/io/dapr/Topic.java:

@Documented
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Topic {
    String name();
    String pubsubName();
    String metadata() default "{}";
}

Topic collection follows the typical springboot style. The code is in sdk-springboot/src/main/java/io/dapr/springboot/DaprBeanPostProcessor.java:

@Component
public class DaprBeanPostProcessor implements BeanPostProcessor {
  @Override
  public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
    subscribeToTopics(bean.getClass(), embeddedValueResolver);
    return bean;
  }
}

The subscribeToTopics() method scans for the @Topic and @PostMapping annotations to get the subscription information:

private static void subscribeToTopics(Class clazz, EmbeddedValueResolver embeddedValueResolver) {

    for (Method method : clazz.getDeclaredMethods()) {
      // get the @Topic annotation
      Topic topic = method.getAnnotation(Topic.class);
      if (topic == null) {
        continue;
      }

      String route = topic.name();
      // get the @PostMapping annotation
      PostMapping mapping = method.getAnnotation(PostMapping.class);

      // derive the route from the PostMapping annotation
      if (mapping != null && mapping.path() != null && mapping.path().length >= 1) {
        route = mapping.path()[0];
      } else if (mapping != null && mapping.value() != null && mapping.value().length >= 1) {
        route = mapping.value()[0];
      }

      String topicName = embeddedValueResolver.resolveStringValue(topic.name());
      String pubSubName = embeddedValueResolver.resolveStringValue(topic.pubsubName());
      if ((topicName != null) && (topicName.length() > 0) && pubSubName != null && pubSubName.length() > 0) {
        try {
          TypeReference<HashMap<String, String>> typeRef
                  = new TypeReference<HashMap<String, String>>() {};
          Map<String, String> metadata = MAPPER.readValue(topic.metadata(), typeRef);
          // save the subscription information
          DaprRuntime.getInstance().addSubscribedTopic(pubSubName, topicName, route, metadata);
        } catch (JsonProcessingException e) {
          throw new IllegalArgumentException("Error while parsing metadata: " + e.toString());
        }
      }
    }
  }

DaprRuntime is a singleton that holds the list of subscribed topics:

class DaprRuntime {
    private final Set<String> subscribedTopics = new HashSet<>();
    private final List<DaprTopicSubscription> subscriptions = new ArrayList<>();
    
    public synchronized void addSubscribedTopic(String pubsubName,
                                                String topicName,
                                                String route,
                                                Map<String,String> metadata) {
        if (!this.subscribedTopics.contains(topicName)) {
            this.subscribedTopics.add(topicName);
            this.subscriptions.add(new DaprTopicSubscription(pubsubName, topicName, route, metadata));
        }
    }
}
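The same bookkeeping can be sketched in Go to make one property of the snippet above visible: because the "already seen" set is keyed by topic name only, a second subscription to the same topic is dropped even when it names a different pubsub component. The subscriptionStore type and its methods are hypothetical and only mirror the logic shown above.

```go
package main

import (
	"fmt"
	"sync"
)

type subscription struct {
	PubsubName string
	Topic      string
	Route      string
}

// subscriptionStore mirrors the snippet above: a set of seen topic names
// plus the ordered list of accepted subscriptions.
type subscriptionStore struct {
	mu         sync.Mutex
	seenTopics map[string]bool
	subs       []subscription
}

func newSubscriptionStore() *subscriptionStore {
	return &subscriptionStore{seenTopics: map[string]bool{}}
}

// Add stores the subscription unless the topic name was seen before.
// It reports whether the subscription was accepted.
func (s *subscriptionStore) Add(pubsubName, topic, route string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seenTopics[topic] {
		return false
	}
	s.seenTopics[topic] = true
	s.subs = append(s.subs, subscription{PubsubName: pubsubName, Topic: topic, Route: route})
	return true
}

// List returns a copy of the accepted subscriptions.
func (s *subscriptionStore) List() []subscription {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]subscription(nil), s.subs...)
}

func main() {
	store := newSubscriptionStore()
	fmt.Println(store.Add("messagebus", "testingtopic", "/testingtopic"))
	// Same topic, different component: rejected by the topic-only key.
	fmt.Println(store.Add("otherbus", "testingtopic", "/other"))
	fmt.Println(len(store.List()))
}
```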

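The SDK exposes subscription information

To make dapr easy to use in the springboot ecosystem, the dapr java sdk provides DaprController, which offers common features such as health checks along with various dapr-related endpoints. One of them is the endpoint that provides subscription information to the dapr runtime: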

@RestController
public class DaprController {
  ......
  @GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
  public byte[] daprSubscribe() throws IOException {
    return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
  }
}

This URL exposes all the topic information collected earlier. It can be opened directly in a browser at http://127.0.0.1:3000/dapr/subscribe, and the response is:

[{"pubsubName":"messagebus","topic":"testingtopic","route":"/testingtopic","metadata":{}}]

Go SDK implementation

For an example of using subscribe in Go application code, see https://github.com/dapr/go-sdk/blob/main/examples/pubsub/sub/sub.go. In outline:

func main() {
    s := daprd.NewService(":8080")
    err := s.AddTopicEventHandler(defaultSubscription, eventHandler)
    err = s.Start()
}

func eventHandler(ctx context.Context, e *common.TopicEvent) (retry bool, err error) {
	......
	return false, nil
}

The SDK collects subscription information

The Go SDK defines the Service interface:

// Service represents Dapr callback service.
type Service interface {
	// AddTopicEventHandler appends provided event handler with its topic and optional metadata to the service.
	// Note, retries are only considered when there is an error. Lack of error is considered as a success
	AddTopicEventHandler(sub *Subscription, fn TopicEventHandler) error
	......
}

Subscription is defined as follows:

// Subscription represents single topic subscription.
type Subscription struct {
	PubsubName string `json:"pubsubname"`
	Topic string `json:"topic"`
	Metadata map[string]string `json:"metadata,omitempty"`
	Route string `json:"route"`
	......
}

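This specifies the four main subscription parameters.

The SDK exposes subscription information

The go sdk has two mechanisms, http and grpc, for exposing endpoints.

The http implementation is in http/topic.go: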

func (s *Server) AddTopicEventHandler(sub *common.Subscription, fn common.TopicEventHandler) error {
	if err := s.topicRegistrar.AddSubscription(sub, fn); err != nil {
		return err
	}

    // register the http handler, binding Route to fn
	s.mux.Handle(sub.Route, optionsHandler(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			......
			retry, err := fn(r.Context(), &te)
			......
		})))
	return nil
}

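The grpc implementation is similar.

Other SDKs

TODO

3 - workflow source code analysis

Source code analysis of the Dapr workflow building block

3.1 - workflow main flow

Source code analysis of the Dapr workflow main flow

3.1.1 - workflow app start flow

Source code analysis of the workflow app start flow

3.1.1.1 - Flow overview

Overview of the workflow app start flow

Overall flow

When a workflow app starts, typical code looks like this: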

    // Register the OrderProcessingWorkflow and its activities with the builder.
    WorkflowRuntimeBuilder builder = new WorkflowRuntimeBuilder().registerWorkflow(OrderProcessingWorkflow.class);
    builder.registerActivity(NotifyActivity.class);
    builder.registerActivity(ProcessPaymentActivity.class);
    builder.registerActivity(RequestApprovalActivity.class);
    builder.registerActivity(ReserveInventoryActivity.class);
    builder.registerActivity(UpdateInventoryActivity.class);

    // Build and then start the workflow runtime pulling and executing tasks
    try (WorkflowRuntime runtime = builder.build()) {
      System.out.println("Start workflow runtime");
      runtime.start(false);
    }

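This process registers the workflow and the activities, then starts the workflow runtime. The workflow runtime starts a worker that continuously fetches work items from the dapr sidecar, both workflow tasks and activity tasks, executes them, and returns the results to the dapr sidecar.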
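The worker loop described here can be sketched as: receive a work item, pick the registered function by task kind and name, run it, and report the result. The Go code below is an illustrative model only, not the DurableTask or dapr SDK: worker, workItem, taskResult, and the channel that stands in for the stream of work items from the sidecar are all hypothetical.

```go
package main

import "fmt"

type taskKind int

const (
	orchestratorTask taskKind = iota
	activityTask
)

// workItem stands in for a task handed out by the sidecar.
type workItem struct {
	Kind  taskKind
	Name  string
	Input string
}

// taskResult is what the worker reports back after running a task.
type taskResult struct {
	Name   string
	Output string
	OK     bool
}

type taskFunc func(input string) string

// worker keeps separate registries for workflows and activities.
type worker struct {
	orchestrators map[string]taskFunc
	activities    map[string]taskFunc
}

func newWorker() *worker {
	return &worker{orchestrators: map[string]taskFunc{}, activities: map[string]taskFunc{}}
}

func (w *worker) registerWorkflow(name string, fn taskFunc) { w.orchestrators[name] = fn }
func (w *worker) registerActivity(name string, fn taskFunc) { w.activities[name] = fn }

// run drains the work item stream, dispatches by task kind and name, and
// hands each result to complete (the "return result to sidecar" step).
func (w *worker) run(items <-chan workItem, complete func(taskResult)) {
	for item := range items {
		registry := w.activities
		if item.Kind == orchestratorTask {
			registry = w.orchestrators
		}
		fn, ok := registry[item.Name]
		if !ok {
			complete(taskResult{Name: item.Name, Output: "not registered"})
			continue
		}
		complete(taskResult{Name: item.Name, Output: fn(item.Input), OK: true})
	}
}

func main() {
	w := newWorker()
	w.registerWorkflow("OrderProcessingWorkflow", func(in string) string { return "scheduled:" + in })
	w.registerActivity("NotifyActivity", func(in string) string { return "notified:" + in })

	items := make(chan workItem, 2)
	items <- workItem{Kind: orchestratorTask, Name: "OrderProcessingWorkflow", Input: "order-1"}
	items <- workItem{Kind: activityTask, Name: "NotifyActivity", Input: "order-1"}
	close(items)

	w.run(items, func(r taskResult) { fmt.Println(r.Name, r.Output, r.OK) })
}
```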

@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Sidecar" as DaprSidecar

WorkflowApp -> WorkflowApp: registerWorkflow()

WorkflowApp -> WorkflowApp: registerActivity()

WorkflowApp -[#red]> WorkflowApp: WorkflowRuntime.start()


WorkflowApp -> DaprSidecar: WorkflowRuntime.getWorkItems()
DaprSidecar --> WorkflowApp: 

loop has next task

alt is orchestration task

WorkflowApp -> WorkflowApp: execute orchestration task
WorkflowApp -> DaprSidecar: completeOrchestratorTask()
DaprSidecar --> WorkflowApp: 

else is activity task

WorkflowApp -> WorkflowApp: execute activity task
WorkflowApp -> DaprSidecar: completeActivityTask()
DaprSidecar --> WorkflowApp: 

end

end

@enduml

Detailed flow

register workflow

@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK

WorkflowApp -> DaprJavaSDK: registerWorkflow()
DaprJavaSDK -> DurableTaskJavaSDK: addOrchestration()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 

@enduml

register activity

@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK

WorkflowApp -> DaprJavaSDK: registerActivity()
DaprJavaSDK -> DurableTaskJavaSDK: registerActivity()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 

@enduml

start workflow runtime

@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK

WorkflowApp -> DaprJavaSDK: WorkflowRuntime.start()
DaprJavaSDK -> DurableTaskJavaSDK: worker.start()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 
@enduml

worker execute tasks

@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK

WorkflowApp -> DaprJavaSDK: registerWorkflow()
DaprJavaSDK -> DurableTaskJavaSDK: addOrchestration()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 

WorkflowApp -> DaprJavaSDK: registerActivity()
DaprJavaSDK -> DurableTaskJavaSDK: registerActivity()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 

WorkflowApp -> DaprJavaSDK: WorkflowRuntime.start()
DaprJavaSDK -> DurableTaskJavaSDK: worker.start()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp: 
@enduml

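3.1.1.2 - Source code for building the workflow runtime

Source code for building the workflow runtime during the workflow app start flow

Calling code

Typical code for building a WorkflowRuntime in a workflow app looks like this: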

    // Register the OrderProcessingWorkflow and its activities with the builder.
    WorkflowRuntimeBuilder builder = new WorkflowRuntimeBuilder().registerWorkflow(OrderProcessingWorkflow.class);
    builder.registerActivity(NotifyActivity.class);
    builder.registerActivity(ProcessPaymentActivity.class);
    builder.registerActivity(RequestApprovalActivity.class);
    builder.registerActivity(ReserveInventoryActivity.class);
    builder.registerActivity(UpdateInventoryActivity.class);

    // Build and then start the workflow runtime pulling and executing tasks
    try (WorkflowRuntime runtime = builder.build()) {
      System.out.println("Start workflow runtime");
      runtime.start(false);
    }

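Implementation

WorkflowRuntimeBuilder

This class lives in the Dapr Java SDK.

In its implementation, WorkflowRuntimeBuilder keeps its own record of the registered workflows and activities, and it also creates an instance of DurableTaskGrpcWorkerBuilder from the DurableTask Java SDK.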

import com.microsoft.durabletask.DurableTaskGrpcWorkerBuilder;

public class WorkflowRuntimeBuilder {
  private static volatile WorkflowRuntime instance;
  private DurableTaskGrpcWorkerBuilder builder;
  private Logger logger;
  private Set<String> workflows = new HashSet<String>();
  private Set<String> activities = new HashSet<String>();

  /**
   * Constructs the WorkflowRuntimeBuilder.
   */
  public WorkflowRuntimeBuilder() {
    this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(
                          NetworkUtils.buildGrpcManagedChannel(WORKFLOW_INTERCEPTOR));
    this.logger = Logger.getLogger(WorkflowRuntimeBuilder.class.getName());
  }

The registerWorkflow() method delegates the request to DurableTaskGrpcWorkerBuilder and also records the workflow in its own workflows set:

  public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
    this.builder = this.builder.addOrchestration(
        new OrchestratorWrapper<>(clazz)
    );
    this.logger.log(Level.INFO, "Registered Workflow: " +  clazz.getSimpleName());
    this.workflows.add(clazz.getSimpleName());
    return this;
  }

The registerActivity() method is similar: besides delegating to DurableTaskGrpcWorkerBuilder, it records the activity in its own activities set:

  public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
    this.builder = this.builder.addActivity(
        new ActivityWrapper<>(clazz)
    );
    this.logger.log(Level.INFO, "Registered Activity: " +  clazz.getSimpleName());
    this.activities.add(clazz.getSimpleName());
  }

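OrchestratorWrapper and ActivityWrapper wrap the class as a TaskOrchestrationFactory and a TaskActivityFactory, respectively.

The build() method calls build() on DurableTaskGrpcWorkerBuilder to create a DurableTaskGrpcWorker and passes it to a new WorkflowRuntime instance. Because instance is a static field guarded by double-checked locking, the WorkflowRuntime is created only once, and later build() calls return the same instance.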

  public WorkflowRuntime build() {
    if (instance == null) {
      synchronized (WorkflowRuntime.class) {
        if (instance == null) {
          instance = new WorkflowRuntime(this.builder.build());
        }
      }
    }
    this.logger.log(Level.INFO, "Successfully built dapr workflow runtime");
    return instance;
  }
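The single-instance behavior of build() can be illustrated in isolation. The following is a minimal sketch, not the SDK code: FakeRuntime, Builder, and the label field are hypothetical stand-ins for WorkflowRuntime and WorkflowRuntimeBuilder, and only the double-checked locking shape is taken from the excerpt above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; only the locking pattern mirrors the excerpt.
final class SingletonBuildSketch {
    static final AtomicInteger CREATED = new AtomicInteger();

    static final class FakeRuntime {
        final String label;

        FakeRuntime(String label) {
            this.label = label;
            CREATED.incrementAndGet();
        }
    }

    static final class Builder {
        // Shared by every Builder, like the static `instance` field in the excerpt.
        private static volatile FakeRuntime instance;
        private final String label;

        Builder(String label) {
            this.label = label;
        }

        FakeRuntime build() {
            if (instance == null) {                      // first check, without the lock
                synchronized (FakeRuntime.class) {
                    if (instance == null) {              // second check, under the lock
                        instance = new FakeRuntime(label);
                    }
                }
            }
            return instance;
        }
    }

    public static void main(String[] args) {
        FakeRuntime first = new Builder("first").build();
        FakeRuntime second = new Builder("second").build();
        // Both builders hand back the runtime created by the first build() call.
        System.out.println(first == second);
        System.out.println(second.label);
    }
}
```

One consequence visible in the sketch: whatever a second builder was configured with is ignored once an instance already exists.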

DurableTaskGrpcWorkerBuilder

This class lives in the DurableTask Java SDK.

DurableTaskGrpcWorkerBuilder stores orchestrationFactories and activityFactories, along with the information needed to connect to the sidecar, such as the port and the gRPC channel:

public final class DurableTaskGrpcWorkerBuilder {
    final HashMap<String, TaskOrchestrationFactory> orchestrationFactories = new HashMap<>();
    final HashMap<String, TaskActivityFactory> activityFactories = new HashMap<>();
    int port;
    Channel channel;
    DataConverter dataConverter;
    Duration maximumTimerInterval;
......
}

addOrchestration() stores the TaskOrchestrationFactory in orchestrationFactories, keyed by its name:

    public DurableTaskGrpcWorkerBuilder addOrchestration(TaskOrchestrationFactory factory) {
        String key = factory.getName();
        ......
        this.orchestrationFactories.put(key, factory);
        return this;
    }

Similarly, addActivity() stores the TaskActivityFactory in activityFactories, keyed by its name:

    public DurableTaskGrpcWorkerBuilder addActivity(TaskActivityFactory factory) {
        String key = factory.getName();
        ......
        this.activityFactories.put(key, factory);
        return this;
    }

The build() method creates a DurableTaskGrpcWorker object:

    public DurableTaskGrpcWorker build() {
        return new DurableTaskGrpcWorker(this);
    }

The DurableTaskGrpcWorker constructor copies the registered orchestrationFactories and activityFactories, then creates a TaskHubSidecarServiceBlockingStub as sidecarClient for later communication with the Dapr sidecar:

public final class DurableTaskGrpcWorker implements AutoCloseable {
    private final HashMap<String, TaskOrchestrationFactory> orchestrationFactories = new HashMap<>();
    private final HashMap<String, TaskActivityFactory> activityFactories = new HashMap<>();

    private final TaskHubSidecarServiceBlockingStub sidecarClient;

    DurableTaskGrpcWorker(DurableTaskGrpcWorkerBuilder builder) {
        this.orchestrationFactories.putAll(builder.orchestrationFactories);
        this.activityFactories.putAll(builder.activityFactories);

        Channel sidecarGrpcChannel;
        if (builder.channel != null) {
            // The caller is responsible for managing the channel lifetime
            this.managedSidecarChannel = null;
            sidecarGrpcChannel = builder.channel;
        } else {
            // Construct our own channel using localhost + a port number
            int port = DEFAULT_PORT;
            if (builder.port > 0) {
                port = builder.port;
            }

            // Need to keep track of this channel so we can dispose it on close()
            this.managedSidecarChannel = ManagedChannelBuilder
                    .forAddress("localhost", port)
                    .usePlaintext()
                    .build();
            sidecarGrpcChannel = this.managedSidecarChannel;
        }

        this.sidecarClient = TaskHubSidecarServiceGrpc.newBlockingStub(sidecarGrpcChannel);
        this.dataConverter = builder.dataConverter != null ? builder.dataConverter : new JacksonDataConverter();
        this.maximumTimerInterval = builder.maximumTimerInterval != null ? builder.maximumTimerInterval : DEFAULT_MAXIMUM_TIMER_INTERVAL;
    }

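Conclusion

WorkflowRuntimeBuilder in the Dapr Java SDK and DurableTaskGrpcWorkerBuilder in the DurableTask Java SDK both exist to help build the objects that are ultimately used: WorkflowRuntime and DurableTaskGrpcWorker.

3.1.1.3 - Source code for starting the workflow runtime

Source code for starting the workflow runtime during the workflow app start flow

Calling code

Typical code for starting a WorkflowRuntime in a workflow app looks like this: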

    // Build and then start the workflow runtime pulling and executing tasks
    try (WorkflowRuntime runtime = builder.build()) {
      System.out.println("Start workflow runtime");
      // block is hard-coded to false here, so this call does not block
      runtime.start(false);
    }

Implementation

WorkflowRuntime

This class lives in the Dapr Java SDK.

WorkflowRuntime is just a thin wrapper around DurableTaskGrpcWorker:

public class WorkflowRuntime implements AutoCloseable {

  private DurableTaskGrpcWorker worker;

  public WorkflowRuntime(DurableTaskGrpcWorker worker) {
    this.worker = worker;
  }
  ......

  public void start(boolean block) {
    if (block) {
      this.worker.startAndBlock();
    } else {
      this.worker.start();
    }
  }
}

DurableTaskGrpcWorker

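This class lives in the DurableTask Java SDK.

The actual implementation is in DurableTaskGrpcWorker. WorkflowRuntime.start() forwards to the worker, whose start() runs startAndBlock() on a new thread: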


  public void start(boolean block) {
    if (block) {
      this.worker.startAndBlock();
    } else {
      // 1. block is hard-coded to false, so only this branch is taken
      this.worker.start();
    }
  }

  public void start() {
    // 2. Run startAndBlock() on a new thread, so this call does not block
    new Thread(this::startAndBlock).start();
  }
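The non-blocking start can be tried out without a sidecar. This is a minimal sketch under stated assumptions: NonBlockingStartSketch, awaitLoopEntered(), and stopAndJoin() are hypothetical helpers added for observation, and start() returns the Thread here only so the example can join it (the method in the excerpt returns void).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class NonBlockingStartSketch {
    private final CountDownLatch loopEntered = new CountDownLatch(1);
    private final CountDownLatch stopSignal = new CountDownLatch(1);

    // Stand-in for startAndBlock(): occupies its thread until told to stop.
    void startAndBlock() {
        loopEntered.countDown();
        try {
            stopSignal.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Same shape as start() in the excerpt: the blocking loop runs on a new thread.
    Thread start() {
        Thread thread = new Thread(this::startAndBlock, "worker-sketch");
        thread.start();
        return thread;
    }

    // Hypothetical helper: waits until the background loop has begun.
    boolean awaitLoopEntered(long timeoutMillis) {
        try {
            return loopEntered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Hypothetical helper: releases the loop and reports whether the thread ended.
    boolean stopAndJoin(Thread thread, long timeoutMillis) {
        stopSignal.countDown();
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        NonBlockingStartSketch worker = new NonBlockingStartSketch();
        Thread thread = worker.start();   // returns right away
        System.out.println("loop entered: " + worker.awaitLoopEntered(1000));
        System.out.println("stopped: " + worker.stopAndJoin(thread, 1000));
    }
}
```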

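The startAndBlock() method

This is the most critical code.

It is not covered here; see the next chapter on how the workflow runtime runs.

3.1.2 - workflow app run flow

Source code analysis of the workflow app run flow

3.1.2.1 - Overview of the workflow app run flow

Overview of the source code that runs the workflow runtime in a workflow app

As the previous chapter showed, once the workflow runtime is started, the task-processing flow begins.

The implementation is the startAndBlock() method of the DurableTaskGrpcWorker class in the DurableTask Java SDK.

This is the most critical code.

It first builds two executors, which run orchestration tasks and activity tasks respectively: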

        TaskOrchestrationExecutor taskOrchestrationExecutor = new TaskOrchestrationExecutor(
                this.orchestrationFactories,
                this.dataConverter,
                this.maximumTimerInterval,
                logger);
        TaskActivityExecutor taskActivityExecutor = new TaskActivityExecutor(
                this.activityFactories,
                this.dataConverter,
                logger);

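The arguments passed in include orchestrationFactories and activityFactories, which hold everything registered during the build phase.

Getting work items

Next comes an infinite loop that calls sidecarClient.getWorkItems(). A second loop then iterates over the returned work item stream for as long as the stream stays open. If a StatusRuntimeException is thrown, the worker sleeps for 5 seconds and then retries.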

while (true) {
  try {
      GetWorkItemsRequest getWorkItemsRequest = GetWorkItemsRequest.newBuilder().build();
      Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
      while (workItemStream.hasNext()) {
        ......
      }
  } catch (StatusRuntimeException e) {
      ......
      // Retry after 5 seconds
      try {
          Thread.sleep(5000);
      } catch (InterruptedException ex) {
          break;
      }
  }
}
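The reconnect-and-retry shape of this loop can be exercised with a fake stream source. This is a minimal sketch, assuming a hypothetical poll() helper: the Supplier stands in for sidecarClient.getWorkItems(), RuntimeException stands in for StatusRuntimeException, and the maxItems bound exists only so the example terminates (the real loop runs until interrupted).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

final class RetryLoopSketch {
    // Opens a stream, drains it, and reopens it when it ends or when opening fails.
    static List<String> poll(Supplier<Iterator<String>> openStream, int maxItems, long retryDelayMillis) {
        List<String> handled = new ArrayList<>();
        while (handled.size() < maxItems) {
            try {
                Iterator<String> stream = openStream.get();
                while (stream.hasNext() && handled.size() < maxItems) {
                    handled.add(stream.next());
                }
            } catch (RuntimeException e) {
                // Back off before the next attempt, as the excerpt does with 5 seconds.
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        Supplier<Iterator<String>> flaky = () -> {
            if (attempts[0]++ == 0) {
                throw new IllegalStateException("sidecar unavailable");
            }
            return List.of("orchestrator-1", "activity-1").iterator();
        };
        System.out.println(poll(flaky, 2, 10));
        System.out.println("attempts: " + attempts[0]);
    }
}
```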

There are only two types of work item, orchestrator and activity:

while (workItemStream.hasNext()) {
    WorkItem workItem = workItemStream.next();
    RequestCase requestType = workItem.getRequestCase();
    if (requestType == RequestCase.ORCHESTRATORREQUEST) {
        ......
    } else if (requestType == RequestCase.ACTIVITYREQUEST) {
        ......
    } else {
        logger.log(Level.WARNING, "Received and dropped an unknown '{0}' work-item from the sidecar.", requestType);
    }
}
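The three-way branch can be sketched without the generated protobuf classes. In this minimal sketch the RequestCase enum is a hand-written stand-in for the generated one (protobuf generates a REQUEST_NOT_SET constant for an unset oneof named request), and the returned strings are illustrative descriptions only.

```java
final class WorkItemDispatchSketch {
    // Hand-written stand-in for the enum that protobuf generates for `oneof request`.
    enum RequestCase { ORCHESTRATORREQUEST, ACTIVITYREQUEST, REQUEST_NOT_SET }

    // Returns a description of what the worker would do for each case.
    static String dispatch(RequestCase requestType) {
        if (requestType == RequestCase.ORCHESTRATORREQUEST) {
            return "run orchestrator, then completeOrchestratorTask()";
        } else if (requestType == RequestCase.ACTIVITYREQUEST) {
            return "run activity, then completeActivityTask()";
        } else {
            // Unset or unrecognized work items are logged and dropped.
            return "drop unknown work item";
        }
    }

    public static void main(String[] args) {
        for (RequestCase requestCase : RequestCase.values()) {
            System.out.println(requestCase + " -> " + dispatch(requestCase));
        }
    }
}
```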

Executing an orchestrator task

The orchestrator task is executed by taskOrchestrationExecutor, and the result is then returned to the Dapr sidecar.

OrchestratorRequest orchestratorRequest = workItem.getOrchestratorRequest();

TaskOrchestratorResult taskOrchestratorResult = taskOrchestrationExecutor.execute(
        orchestratorRequest.getPastEventsList(),
        orchestratorRequest.getNewEventsList());

OrchestratorResponse response = OrchestratorResponse.newBuilder()
        .setInstanceId(orchestratorRequest.getInstanceId())
        .addAllActions(taskOrchestratorResult.getActions())
        .setCustomStatus(StringValue.of(taskOrchestratorResult.getCustomStatus()))
        .build();

this.sidecarClient.completeOrchestratorTask(response);

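Note: it is somewhat odd that a gRPC bidirectional stream is not used both to fetch tasks and to return their results; instead, the result is sent through a separate completeOrchestratorTask() call.

Executing an activity task

Similarly, the activity task is executed by taskActivityExecutor, and the result is then returned to the Dapr sidecar.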

ActivityRequest activityRequest = workItem.getActivityRequest();

String output = null;
TaskFailureDetails failureDetails = null;
try {
    output = taskActivityExecutor.execute(
        activityRequest.getName(),
        activityRequest.getInput().getValue(),
        activityRequest.getTaskId());
} catch (Throwable e) {
    failureDetails = TaskFailureDetails.newBuilder()
        .setErrorType(e.getClass().getName())
        .setErrorMessage(e.getMessage())
        .setStackTrace(StringValue.of(FailureDetails.getFullStackTrace(e)))
        .build();
}

ActivityResponse.Builder responseBuilder = ActivityResponse.newBuilder()
        .setInstanceId(activityRequest.getOrchestrationInstance().getInstanceId())
        .setTaskId(activityRequest.getTaskId());

if (output != null) {
    responseBuilder.setResult(StringValue.of(output));
}

if (failureDetails != null) {
    responseBuilder.setFailureDetails(failureDetails);
}

this.sidecarClient.completeActivityTask(responseBuilder.build());
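The key point in this snippet is that an activity failure is not rethrown: it is captured and sent back as part of the response. A minimal sketch of that pattern follows, assuming a hypothetical Response class in place of the generated ActivityResponse and a plain Function in place of the activity executor.

```java
import java.util.function.Function;

final class ActivityResultSketch {
    // Hypothetical stand-in for ActivityResponse: either a result or failure details.
    static final class Response {
        final int taskId;
        final String result;        // null when the activity failed
        final String errorType;     // null when the activity succeeded
        final String errorMessage;  // null when the activity succeeded

        Response(int taskId, String result, String errorType, String errorMessage) {
            this.taskId = taskId;
            this.result = result;
            this.errorType = errorType;
            this.errorMessage = errorMessage;
        }
    }

    // Runs the activity and converts any Throwable into failure details.
    static Response run(int taskId, String input, Function<String, String> activity) {
        String output = null;
        String errorType = null;
        String errorMessage = null;
        try {
            output = activity.apply(input);
        } catch (Throwable e) {
            errorType = e.getClass().getName();
            errorMessage = e.getMessage();
        }
        return new Response(taskId, output, errorType, errorMessage);
    }

    public static void main(String[] args) {
        Response ok = run(1, "hello", String::toUpperCase);
        Response failed = run(2, "hello", in -> {
            throw new IllegalStateException("payment rejected");
        });
        System.out.println(ok.result);
        System.out.println(failed.errorType + ": " + failed.errorMessage);
    }
}
```

In both cases a response is produced, so the sidecar always learns the outcome of the task.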

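3.1.2.2 - Getting work items

Overview of the source code that fetches work items while the workflow runtime is running

Calling code for getting work items

DurableTaskGrpcWorker calls sidecarClient.getWorkItems() to fetch work items.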

private final TaskHubSidecarServiceBlockingStub sidecarClient;

while (true) {
  try {
      GetWorkItemsRequest getWorkItemsRequest = GetWorkItemsRequest.newBuilder().build();
      Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
      while (workItemStream.hasNext()) {
        ......
      }
  } catch (StatusRuntimeException e) { ...... }
}

Implementation

proto definition

TaskHubSidecarServiceBlockingStub is gRPC code generated from a protobuf file; the protobuf definition is in submodules/durabletask-protobuf/protos/orchestrator_service.proto.

service TaskHubSidecarService {
    ......
    rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
    ......
}

GetWorkItemsRequest and WorkItem are defined as:

message GetWorkItemsRequest {
    // No parameters currently
}

message WorkItem {
    oneof request {
        OrchestratorRequest orchestratorRequest = 1;
        ActivityRequest activityRequest = 2;
    }
}

A WorkItem is either an OrchestratorRequest or an ActivityRequest.

OrchestratorRequest

message OrchestratorRequest {
    string instanceId = 1;
    google.protobuf.StringValue executionId = 2;
    repeated HistoryEvent pastEvents = 3;
    repeated HistoryEvent newEvents = 4;
}

ActivityRequest

message ActivityRequest {
    string name = 1;
    google.protobuf.StringValue version = 2;
    google.protobuf.StringValue input = 3;
    OrchestrationInstance orchestrationInstance = 4;
    int32 taskId = 5;
}

HistoryEvent

message HistoryEvent {
    int32 eventId = 1;
    google.protobuf.Timestamp timestamp = 2;
    oneof eventType {
        ExecutionStartedEvent executionStarted = 3;
        ExecutionCompletedEvent executionCompleted = 4;
        ExecutionTerminatedEvent executionTerminated = 5;
        TaskScheduledEvent taskScheduled = 6;
        TaskCompletedEvent taskCompleted = 7;
        TaskFailedEvent taskFailed = 8;
        SubOrchestrationInstanceCreatedEvent subOrchestrationInstanceCreated = 9;
        SubOrchestrationInstanceCompletedEvent subOrchestrationInstanceCompleted = 10;
        SubOrchestrationInstanceFailedEvent subOrchestrationInstanceFailed = 11;
        TimerCreatedEvent timerCreated = 12;
        TimerFiredEvent timerFired = 13;
        OrchestratorStartedEvent orchestratorStarted = 14;
        OrchestratorCompletedEvent orchestratorCompleted = 15;
        EventSentEvent eventSent = 16;
        EventRaisedEvent eventRaised = 17;
        GenericEvent genericEvent = 18;
        HistoryStateEvent historyState = 19;
        ContinueAsNewEvent continueAsNew = 20;
        ExecutionSuspendedEvent executionSuspended = 21;
        ExecutionResumedEvent executionResumed = 22;
    }
}

Worker call

The workflow app fetches work items by calling sidecarClient.getWorkItems().

Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);

This is generated gRPC stub code, so it is not examined in detail.

TaskHubSidecarService server implementation

The server side of the TaskHubSidecarService gRPC service defined in protobuf is implemented in the durabletask-go repository.

The gRPC stub code generated from protobuf is in these files:

internal/protos/orchestrator_service_grpc.pb.go
internal/protos/orchestrator_service.pb.go

The server-side implementation is in backend/executor.go:

// GetWorkItems implements protos.TaskHubSidecarServiceServer
func (g *grpcExecutor) GetWorkItems(req *protos.GetWorkItemsRequest, stream protos.TaskHubSidecarService_GetWorkItemsServer) error {
    ......

	// The worker client invokes this method, which streams back work-items as they arrive.
	for {
		select {
		case <-stream.Context().Done():
			g.logger.Infof("work item stream closed")
			return nil
		case wi := <-g.workItemQueue:
			if err := stream.Send(wi); err != nil {
				return err
			}
		case <-g.streamShutdownChan:
			return errShuttingDown
		}
	}
}

So the work item stream returned to the client is fed from g.workItemQueue:

type grpcExecutor struct {
    ......
	workItemQueue        chan *protos.WorkItem
}

How workItemQueue works

workItemQueue is defined in grpcExecutor:

type grpcExecutor struct {
	workItemQueue        chan *protos.WorkItem
    ......
}

grpcExecutor is created in the NewGrpcExecutor() function. Note that workItemQueue is created as an unbuffered channel:

// NewGrpcExecutor returns the Executor object and a method to invoke to register the gRPC server in the executor.
func NewGrpcExecutor(be Backend, logger Logger, opts ...grpcExecutorOptions) (executor Executor, registerServerFn func(grpcServer grpc.ServiceRegistrar)) {
	grpcExecutor := &grpcExecutor{
		workItemQueue:        make(chan *protos.WorkItem),
		backend:              be,
		logger:               logger,
		pendingOrchestrators: &sync.Map{},
		pendingActivities:    &sync.Map{},
	}

    ......
}
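An unbuffered Go channel hands each item directly from sender to receiver: a send blocks until someone receives. The closest standard Java analogue is SynchronousQueue, which is used in the minimal sketch below. The relay() helper and the string items are hypothetical; the consumer thread plays the role of GetWorkItems(), and the calling thread plays the role of the code that writes to workItemQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

final class WorkItemQueueSketch {
    // Pushes items through a zero-capacity queue and returns them in arrival order.
    static List<String> relay(List<String> items) {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        List<String> streamed = new ArrayList<>();

        // Consumer: like `wi := <-g.workItemQueue` followed by stream.Send(wi).
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items.size(); i++) {
                    streamed.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String item : items) {
                queue.put(item);   // blocks until the consumer takes the item
            }
            consumer.join();       // makes the consumer's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return streamed;
    }

    public static void main(String[] args) {
        System.out.println(relay(List.of("orchestrator-1", "activity-1", "activity-2")));
    }
}
```

The practical consequence is that a producer cannot get ahead of the stream: if no worker is connected and receiving, the send waits.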

Data is written to workItemQueue in two places:

ExecuteOrchestrator()

func (executor *grpcExecutor) ExecuteOrchestrator(......) {
    ......
        workItem := &protos.WorkItem{
        Request: &protos.WorkItem_OrchestratorRequest{
            OrchestratorRequest: &protos.OrchestratorRequest{
                InstanceId:  string(iid),
                ExecutionId: nil,
                PastEvents:  oldEvents,
                NewEvents:   newEvents,
            },
        },
    }

    executor.workItemQueue <- workItem
}

ExecuteActivity()

func (executor *grpcExecutor) ExecuteActivity(......) {
    workItem := &protos.WorkItem{
	Request: &protos.WorkItem_ActivityRequest{
		ActivityRequest: &protos.ActivityRequest{
			Name:                  task.Name,
			Version:               task.Version,
			Input:                 task.Input,
			OrchestrationInstance: &protos.OrchestrationInstance{InstanceId: string(iid)},
			TaskId:                e.EventId,
		},
	},
    }

    executor.workItemQueue <- workItem
}

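Who calls ExecuteOrchestrator() and ExecuteActivity() is traced in the next section.

Summary

Work items originate in the Dapr sidecar; the implementation is in backend/executor.go in the durabletask-go project.

3.1.2.3 - Executing an orchestrator task

Source code for executing an orchestrator task while the workflow runtime is running

Review

As seen earlier, the code that executes an orchestrator task is in client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java in the durabletask-java repository.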

TaskOrchestrationExecutor taskOrchestrationExecutor = new TaskOrchestrationExecutor(
        this.orchestrationFactories,
        this.dataConverter,
        this.maximumTimerInterval,
        logger);
......
Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
while (workItemStream.hasNext()) {
    WorkItem workItem = workItemStream.next();
    RequestCase requestType = workItem.getRequestCase();
    if (requestType == RequestCase.ORCHESTRATORREQUEST) {
        OrchestratorRequest orchestratorRequest = workItem.getOrchestratorRequest();

        TaskOrchestratorResult taskOrchestratorResult = taskOrchestrationExecutor.execute(
                orchestratorRequest.getPastEventsList(),
                orchestratorRequest.getNewEventsList());

        OrchestratorResponse response = OrchestratorResponse.newBuilder()
                .setInstanceId(orchestratorRequest.getInstanceId())
                .addAllActions(taskOrchestratorResult.getActions())
                .setCustomStatus(StringValue.of(taskOrchestratorResult.getCustomStatus()))
                .build();

        this.sidecarClient.completeOrchestratorTask(response);
    }
    ......

Implementation details

TaskOrchestrationExecutor

The definition and constructor of the TaskOrchestrationExecutor class:

 final class TaskOrchestrationExecutor {

    private static final String EMPTY_STRING = "";
    private final HashMap<String, TaskOrchestrationFactory> orchestrationFactories;
    private final DataConverter dataConverter;
    private final Logger logger;
    private final Duration maximumTimerInterval;

    public TaskOrchestrationExecutor(
            HashMap<String, TaskOrchestrationFactory> orchestrationFactories,
            DataConverter dataConverter,
            Duration maximumTimerInterval,
            Logger logger) {
        this.orchestrationFactories = orchestrationFactories;
        this.dataConverter = dataConverter;
        this.maximumTimerInterval = maximumTimerInterval;
        this.logger = logger;
    }

Here orchestrationFactories holds the workflows that were registered earlier through registerWorkflow().

The execute() method:

public TaskOrchestratorResult execute(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
    ContextImplTask context = new ContextImplTask(pastEvents, newEvents);

    boolean completed = false;
    try {
        // Play through the history events until either we've played through everything
        // or we receive a yield signal
        while (context.processNextEvent()) { /* no method body */ }
        completed = true;
    } catch (OrchestratorBlockedException orchestratorBlockedException) {
        logger.fine("The orchestrator has yielded and will await for new events.");
    } catch (ContinueAsNewInterruption continueAsNewInterruption) {
        logger.fine("The orchestrator has continued as new.");
        context.complete(null);
    } catch (Exception e) {
        // The orchestrator threw an unhandled exception - fail it
        // TODO: What's the right way to log this?
        logger.warning("The orchestrator failed with an unhandled exception: " + e.toString());
        context.fail(new FailureDetails(e));
    }

    if ((context.continuedAsNew && !context.isComplete) || (completed && context.pendingActions.isEmpty() && !context.waitingForEvents())) {
        // There are no further actions for the orchestrator to take so auto-complete the orchestration.
        context.complete(null);
    }

    return new TaskOrchestratorResult(context.pendingActions.values(), context.getCustomStatus());
}
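The role of OrchestratorBlockedException is easier to see in a toy model. The sketch below is an assumption-laden simplification, not the SDK's mechanism: Context, callActivity(), BlockedException, and the string-encoded actions are all hypothetical. It only illustrates the idea that the orchestrator function is re-run from the top on every work item, results already in the history are returned immediately, and the first call without a recorded result schedules an action and unwinds the stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ReplaySketch {
    // Thrown to unwind the orchestrator when it must wait for a result.
    static final class BlockedException extends RuntimeException { }

    static final class Context {
        private final Map<String, String> completed;   // results already present in the history
        final List<String> pendingActions = new ArrayList<>();

        Context(Map<String, String> completed) {
            this.completed = completed;
        }

        String callActivity(String name) {
            String result = completed.get(name);
            if (result == null) {
                pendingActions.add("schedule:" + name);
                throw new BlockedException();
            }
            return result;
        }
    }

    interface Orchestrator {
        String run(Context ctx);
    }

    // Runs the orchestrator once against the given history and returns the actions to send back.
    static List<String> execute(Orchestrator orchestrator, Map<String, String> history) {
        Context ctx = new Context(history);
        try {
            String output = orchestrator.run(ctx);
            ctx.pendingActions.add("complete:" + output);
        } catch (BlockedException e) {
            // The orchestrator yielded; it will be replayed when new events arrive.
        }
        return ctx.pendingActions;
    }

    public static void main(String[] args) {
        Orchestrator order = ctx -> ctx.callActivity("reserve") + "+" + ctx.callActivity("pay");
        System.out.println(execute(order, Map.of()));
        System.out.println(execute(order, Map.of("reserve", "ok")));
        System.out.println(execute(order, Map.of("reserve", "ok", "pay", "paid")));
    }
}
```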

This is only the main flow; the details are implemented in the private inner class ContextImplTask.

ContextImplTask

The definition and constructor of ContextImplTask, which uses OrchestrationHistoryIterator:

private class ContextImplTask implements TaskOrchestrationContext {
    private final OrchestrationHistoryIterator historyEventPlayer;
    ......

    public ContextImplTask(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
        this.historyEventPlayer = new OrchestrationHistoryIterator(pastEvents, newEvents);
    }
    ......

    private boolean processNextEvent() {
        return this.historyEventPlayer.moveNext();
    }
}

OrchestrationHistoryIterator

The class definition and constructor of OrchestrationHistoryIterator. Here pastEvents and newEvents are the data carried in the orchestratorRequest that the daprd sidecar returns from getWorkItems().

private class OrchestrationHistoryIterator {
    private final List<HistoryEvent> pastEvents;
    private final List<HistoryEvent> newEvents;

    private List<HistoryEvent> currentHistoryList;
    private int currentHistoryIndex;

    public OrchestrationHistoryIterator(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
        this.pastEvents = pastEvents;
        this.newEvents = newEvents;
        this.currentHistoryList = pastEvents;
    }

currentHistoryList initially points to pastEvents, and currentHistoryIndex is 0.

Next, look at the moveNext() method:

    public boolean moveNext() {
        if (this.currentHistoryList == pastEvents && this.currentHistoryIndex >= pastEvents.size()) {
            // currentHistoryList points to pastEvents and the index has moved past its last element,
            // so switch to this.newEvents and reset currentHistoryIndex to 0 (the first element).
            // Move forward to the next list
            this.currentHistoryList = this.newEvents;
            this.currentHistoryIndex = 0;

            // All of pastEvents has been traversed, which means the replay is complete.
            ContextImplTask.this.setDoneReplaying();
        }

        if (this.currentHistoryList == this.newEvents && this.currentHistoryIndex >= this.newEvents.size()) {
            // currentHistoryList points to newEvents and the index has moved past its last element.
            // The traversal is finished and there are no more elements, so return false to signal the end.
            // We're done enumerating the history
            return false;
        }

        // Process the next event in the history
        // Fetch the current element, then increment currentHistoryIndex to point at the next one
        HistoryEvent next = this.currentHistoryList.get(this.currentHistoryIndex++);
        // Handle the event
        ContextImplTask.this.processEvent(next);
        return true;
    }
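The two-list traversal can be run on its own to see where the replay boundary falls. This is a minimal sketch: HistoryIteratorSketch and the string events are hypothetical, the replaying flag stands in for setDoneReplaying(), and the processed list stands in for processEvent().

```java
import java.util.ArrayList;
import java.util.List;

final class HistoryIteratorSketch {
    private final List<String> pastEvents;
    private final List<String> newEvents;
    private List<String> current;
    private int index;

    boolean replaying = true;                             // cleared once pastEvents is exhausted
    final List<String> processed = new ArrayList<>();     // stand-in for processEvent()

    HistoryIteratorSketch(List<String> pastEvents, List<String> newEvents) {
        this.pastEvents = pastEvents;
        this.newEvents = newEvents;
        this.current = pastEvents;
    }

    boolean moveNext() {
        // Reference comparison, as in the excerpt: which list are we walking?
        if (current == pastEvents && index >= pastEvents.size()) {
            current = newEvents;
            index = 0;
            replaying = false;
        }
        if (current == newEvents && index >= newEvents.size()) {
            return false;
        }
        String next = current.get(index++);
        processed.add((replaying ? "replay:" : "new:") + next);
        return true;
    }

    public static void main(String[] args) {
        HistoryIteratorSketch it = new HistoryIteratorSketch(
                List.of("executionStarted", "taskScheduled"),
                List.of("taskCompleted"));
        while (it.moveNext()) { /* drain the history */ }
        System.out.println(it.processed);
    }
}
```

Events from pastEvents are processed while the replaying flag is still set; only the events in newEvents are processed after it is cleared.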

Event handling is implemented in the processEvent() method of ContextImplTask:

        private void processEvent(HistoryEvent e) {
            boolean overrideSuspension = e.getEventTypeCase() == HistoryEvent.EventTypeCase.EXECUTIONRESUMED || e.getEventTypeCase() == HistoryEvent.EventTypeCase.EXECUTIONTERMINATED;
            if (this.isSuspended && !overrideSuspension) {
                this.handleEventWhileSuspended(e);
            } else {
                switch (e.getEventTypeCase()) {
                    case ORCHESTRATORSTARTED:
                        Instant instant = DataConverter.getInstantFromTimestamp(e.getTimestamp());
                        this.setCurrentInstant(instant);
                        break;
                    case ORCHESTRATORCOMPLETED:
                        // No action
                        break;
                    case EXECUTIONSTARTED:
                        ExecutionStartedEvent startedEvent = e.getExecutionStarted();
                        String name = startedEvent.getName();
                        this.setName(name);
                        String instanceId = startedEvent.getOrchestrationInstance().getInstanceId();
                        this.setInstanceId(instanceId);
                        String input = startedEvent.getInput().getValue();
                        this.setInput(input);
                        TaskOrchestrationFactory factory = TaskOrchestrationExecutor.this.orchestrationFactories.get(name);
                        if (factory == null) {
                            // Try getting the default orchestrator
                            factory = TaskOrchestrationExecutor.this.orchestrationFactories.get("*");
                        }
                        // TODO: Throw if the factory is null (orchestration by that name doesn't exist)
                        TaskOrchestration orchestrator = factory.create();
                        orchestrator.run(this);
                        break;
//                case EXECUTIONCOMPLETED:
//                    break;
//                case EXECUTIONFAILED:
//                    break;
                    case EXECUTIONTERMINATED:
                        this.handleExecutionTerminated(e);
                        break;
                    case TASKSCHEDULED:
                        this.handleTaskScheduled(e);
                        break;
                    case TASKCOMPLETED:
                        this.handleTaskCompleted(e);
                        break;
                    case TASKFAILED:
                        this.handleTaskFailed(e);
                        break;
                    case TIMERCREATED:
                        this.handleTimerCreated(e);
                        break;
                    case TIMERFIRED:
                        this.handleTimerFired(e);
                        break;
                    case SUBORCHESTRATIONINSTANCECREATED:
                        this.handleSubOrchestrationCreated(e);
                        break;
                    case SUBORCHESTRATIONINSTANCECOMPLETED:
                        this.handleSubOrchestrationCompleted(e);
                        break;
                    case SUBORCHESTRATIONINSTANCEFAILED:
                        this.handleSubOrchestrationFailed(e);
                        break;
//                case EVENTSENT:
//                    break;
                    case EVENTRAISED:
                        this.handleEventRaised(e);
                        break;
//                case GENERICEVENT:
//                    break;
//                case HISTORYSTATE:
//                    break;
//                case EVENTTYPE_NOT_SET:
//                    break;
                    case EXECUTIONSUSPENDED:
                        this.handleExecutionSuspended(e);
                        break;
                    case EXECUTIONRESUMED:
                        this.handleExecutionResumed(e);
                        break;
                    default:
                        throw new IllegalStateException("Don't know how to handle history type " + e.getEventTypeCase());
                }
            }
        }
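The suspension check at the top of processEvent() can be modeled separately. The excerpt shows only that events arriving while suspended are routed to handleEventWhileSuspended(), unless they are EXECUTIONRESUMED or EXECUTIONTERMINATED. The sketch below assumes, without confirmation from the excerpt, that such events are buffered and processed on resume; the class, the reduced EventType enum, and the buffering are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class SuspensionGateSketch {
    enum EventType { TASKCOMPLETED, EXECUTIONSUSPENDED, EXECUTIONRESUMED, EXECUTIONTERMINATED }

    private boolean suspended;
    private final Queue<EventType> buffered = new ArrayDeque<>();
    final List<EventType> handled = new ArrayList<>();

    void processEvent(EventType e) {
        boolean overrideSuspension =
                e == EventType.EXECUTIONRESUMED || e == EventType.EXECUTIONTERMINATED;
        if (suspended && !overrideSuspension) {
            buffered.add(e);   // assumption: deferred until the orchestration resumes
            return;
        }
        switch (e) {
            case EXECUTIONSUSPENDED:
                suspended = true;
                break;
            case EXECUTIONRESUMED:
                suspended = false;
                // Assumption: events held back during suspension are processed now, in order.
                while (!buffered.isEmpty()) {
                    processEvent(buffered.poll());
                }
                break;
            default:
                handled.add(e);
        }
    }

    public static void main(String[] args) {
        SuspensionGateSketch gate = new SuspensionGateSketch();
        gate.processEvent(EventType.EXECUTIONSUSPENDED);
        gate.processEvent(EventType.TASKCOMPLETED);
        System.out.println("while suspended: " + gate.handled);
        gate.processEvent(EventType.EXECUTIONRESUMED);
        System.out.println("after resume: " + gate.handled);
    }
}
```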

Which code actually runs here depends on the event that is passed in.

Handling the EXECUTIONSTARTED event

This is the proto definition of ExecutionStartedEvent:

message ExecutionStartedEvent {
    string name = 1;
    google.protobuf.StringValue version = 2;
    google.protobuf.StringValue input = 3;
    OrchestrationInstance orchestrationInstance = 4;
    ParentInstanceInfo parentInstance = 5;
    google.protobuf.Timestamp scheduledStartTimestamp = 6;
    TraceContext parentTraceContext = 7;
    google.protobuf.StringValue orchestrationSpanID = 8;
}

The EXECUTIONSTARTED event is handled as follows:

case EXECUTIONSTARTED:
    ExecutionStartedEvent startedEvent = e.getExecutionStarted();
    String name = startedEvent.getName();
    this.setName(name);
    String instanceId = startedEvent.getOrchestrationInstance().getInstanceId();
    this.setInstanceId(instanceId);
    String input = startedEvent.getInput().getValue();
    this.setInput(input);
    TaskOrchestrationFactory factory = TaskOrchestrationExecutor.this.orchestrationFactories.get(name);
    if (factory == null) {
        // Try getting the default orchestrator
        factory = TaskOrchestrationExecutor.this.orchestrationFactories.get("*");
    }
    // TODO: Throw if the factory is null (orchestration by that name doesn't exist)
    TaskOrchestration orchestrator = factory.create();
    orchestrator.run(this);
    break;

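Basic information such as name, instanceId, and input is set directly on ContextImplTask.

The factory is looked up in orchestrationFactories by name; if none is found, the default factory registered under "*" is used.

A TaskOrchestration is created from the factory, and then orchestrator.run() is called: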

    TaskOrchestration orchestrator = factory.create();
    orchestrator.run(this);
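The lookup-with-fallback step can be sketched with a plain map. This is a minimal sketch: FactoryLookupSketch, Orchestration, and register() are hypothetical names, and the explicit error for a missing factory is an addition for illustration (the excerpt leaves that case as a TODO).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class FactoryLookupSketch {
    interface Orchestration {
        String run(String input);
    }

    private final Map<String, Supplier<Orchestration>> factories = new HashMap<>();

    void register(String name, Supplier<Orchestration> factory) {
        factories.put(name, factory);
    }

    String start(String name, String input) {
        Supplier<Orchestration> factory = factories.get(name);
        if (factory == null) {
            // Fall back to the default orchestrator, registered under "*".
            factory = factories.get("*");
        }
        if (factory == null) {
            // Not in the excerpt: fail clearly instead of hitting a NullPointerException.
            throw new IllegalStateException("No orchestration registered for '" + name + "'");
        }
        // A fresh orchestration object is created for every execution.
        return factory.get().run(input);
    }

    public static void main(String[] args) {
        FactoryLookupSketch executor = new FactoryLookupSketch();
        executor.register("OrderWorkflow", () -> in -> "order:" + in);
        executor.register("*", () -> in -> "default:" + in);
        System.out.println(executor.start("OrderWorkflow", "42"));
        System.out.println(executor.start("UnknownWorkflow", "42"));
    }
}
```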

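This brings us back to the TaskOrchestration implementation.

OrchestratorWrapper

OrchestratorWrapper in the Dapr Java SDK implements the TaskOrchestrationFactory interface; its create() method returns the TaskOrchestration: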

class OrchestratorWrapper<T extends Workflow> implements TaskOrchestrationFactory {
  @Override
  public TaskOrchestration create() {
    return ctx -> {
      T workflow;
      try {
        workflow = this.workflowConstructor.newInstance();
      } ......
    };
  }
}
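The wrapper's job is to turn a Class object into something that can produce a fresh workflow instance on demand. A minimal sketch of that idea follows; Workflow, GreetingWorkflow, and Wrapper are hypothetical stand-ins, and using the canonical class name as the registration name is an assumption based on scheduleNewWorkflow() passing clazz.getCanonicalName().

```java
import java.lang.reflect.Constructor;

final class WrapperSketch {
    interface Workflow {
        String run(String input);
    }

    static final class GreetingWorkflow implements Workflow {
        public GreetingWorkflow() { }

        @Override
        public String run(String input) {
            return "hello " + input;
        }
    }

    static final class Wrapper<T extends Workflow> {
        private final Constructor<T> constructor;
        private final String name;

        Wrapper(Class<T> clazz) {
            this.name = clazz.getCanonicalName();   // assumption: name used for registration
            try {
                // Resolve the no-arg constructor once, at registration time.
                this.constructor = clazz.getDeclaredConstructor();
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("No no-arg constructor on " + name, e);
            }
        }

        String getName() {
            return name;
        }

        // Creates a new workflow object for each execution.
        T create() {
            try {
                return constructor.newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
    }

    public static void main(String[] args) {
        Wrapper<GreetingWorkflow> wrapper = new Wrapper<>(GreetingWorkflow.class);
        System.out.println(wrapper.getName());
        System.out.println(wrapper.create().run("dapr"));
    }
}
```

Creating a new instance per execution fits the replay model: no state is carried over on the workflow object between runs.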

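3.1.3 - client app start flow

Source code analysis of the client app start flow

3.1.3.1 - Flow overview

Overview of the client app start flow

Overall flow

When the client app starts, typical code looks like this (details and exception handling omitted):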

DaprWorkflowClient workflowClient = new DaprWorkflowClient();
String instanceId = workflowClient.scheduleNewWorkflow(OrderProcessingWorkflow.class, order);
workflowClient.waitForInstanceStart(instanceId, Duration.ofSeconds(10), false);
WorkflowInstanceStatus workflowStatus = workflowClient.waitForInstanceCompletion(instanceId,
          Duration.ofSeconds(30),

Here the workflowClient is initialized and then used to schedule a workflow instance, which includes waiting for the instance to start and waiting for it to complete.

@startuml
participant "Client App" as ClientApp
participant "Dapr Sidecar" as DaprSidecar

ClientApp -> ClientApp: create workflow client

ClientApp -[#red]> DaprSidecar: scheduleNewWorkflow()
DaprSidecar --> ClientApp: instanceId

ClientApp -> DaprSidecar: waitForInstanceStart(instanceId)
DaprSidecar --> ClientApp: 

ClientApp -> DaprSidecar: waitForInstanceCompletion(instanceId)
DaprSidecar --> ClientApp: 

@enduml

3.1.3.2 - client app start flow

Source code analysis of the client app start flow

DaprWorkflowClient

DaprWorkflowClient in the Dapr Java SDK wraps the DurableTaskClient from the DurableTask Java SDK:

public class DaprWorkflowClient implements AutoCloseable {

  private DurableTaskClient innerClient;
  private ManagedChannel grpcChannel;

  private DaprWorkflowClient(ManagedChannel grpcChannel) {
    this(createDurableTaskClient(grpcChannel), grpcChannel);
  }

  private DaprWorkflowClient(DurableTaskClient innerClient, ManagedChannel grpcChannel) {
    this.innerClient = innerClient;
    this.grpcChannel = grpcChannel;
  }

  private static DurableTaskClient createDurableTaskClient(ManagedChannel grpcChannel) {
    return new DurableTaskGrpcClientBuilder()
        .grpcChannel(grpcChannel)
        .build();
  }
  ......
}

The scheduleNewWorkflow() method delegates to the scheduleNewOrchestrationInstance() method of DurableTaskClient:

  public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input, String instanceId) {
    return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input, instanceId);
  }

DurableTaskClient and DurableTaskGrpcClient

Both classes live in the DurableTask Java SDK.

The implementation of scheduleNewOrchestrationInstance() in DurableTaskGrpcClient:

@Override
public String scheduleNewOrchestrationInstance(
        String orchestratorName,
        NewOrchestrationInstanceOptions options) {
    if (orchestratorName == null || orchestratorName.length() == 0) {
        throw new IllegalArgumentException("A non-empty orchestrator name must be specified.");
    }

    Helpers.throwIfArgumentNull(options, "options");

    CreateInstanceRequest.Builder builder = CreateInstanceRequest.newBuilder();
    builder.setName(orchestratorName);

    String instanceId = options.getInstanceId();
    if (instanceId == null) {
        instanceId = UUID.randomUUID().toString();
    }
    builder.setInstanceId(instanceId);

    String version = options.getVersion();
    if (version != null) {
        builder.setVersion(StringValue.of(version));
    }

    Object input = options.getInput();
    if (input != null) {
        String serializedInput = this.dataConverter.serialize(input);
        builder.setInput(StringValue.of(serializedInput));
    }

    Instant startTime = options.getStartTime();
    if (startTime != null) {
        Timestamp ts = DataConverter.getTimestampFromInstant(startTime);
        builder.setScheduledStartTimestamp(ts);
    }

    CreateInstanceRequest request = builder.build();
    CreateInstanceResponse response = this.sidecarClient.startInstance(request);
    return response.getInstanceId();
}

Most of the method builds the CreateInstanceRequest; the last step calls sidecarClient.startInstance() to reach the sidecar.

proto definition

TaskHubSidecarServiceBlockingStub (the type behind sidecarClient) is gRPC code generated from a protobuf file. Its protobuf definition is in submodules/durabletask-protobuf/protos/orchestrator_service.proto.

service TaskHubSidecarService {
    ......
    // Starts a new orchestration instance.
    rpc StartInstance(CreateInstanceRequest) returns (CreateInstanceResponse);
    ......
}

The CreateInstanceRequest message is defined as:

message CreateInstanceRequest {
    string instanceId = 1;
    string name = 2;
    google.protobuf.StringValue version = 3;
    google.protobuf.StringValue input = 4;
    google.protobuf.Timestamp scheduledStartTimestamp = 5;
    OrchestrationIdReusePolicy orchestrationIdReusePolicy = 6;
}

Note: the purpose of the version field is not clear yet; revisit it when reading further details.

The CreateInstanceResponse message is very simple and has only an instanceId field:

message CreateInstanceResponse {
    string instanceId = 1;
}

Code implementation

StartInstance is implemented in backend/executor.go of the durabletask-go repository:

func (g *grpcExecutor) StartInstance(ctx context.Context, req *protos.CreateInstanceRequest) (*protos.CreateInstanceResponse, error) {
	instanceID := req.InstanceId
	ctx, span := helpers.StartNewCreateOrchestrationSpan(ctx, req.Name, req.Version.GetValue(), instanceID)
	defer span.End()

	e := helpers.NewExecutionStartedEvent(req.Name, instanceID, req.Input, nil, helpers.TraceContextFromSpan(span))
	if err := g.backend.CreateOrchestrationInstance(ctx, e, WithOrchestrationIdReusePolicy(req.OrchestrationIdReusePolicy)); err != nil {
		return nil, err
	}

	return &protos.CreateInstanceResponse{InstanceId: instanceID}, nil
}

StartNewCreateOrchestrationSpan() method

Implementation of helpers.StartNewCreateOrchestrationSpan():

func StartNewCreateOrchestrationSpan(
	ctx context.Context, name string, version string, instanceID string,
) (context.Context, trace.Span) {
	attributes := []attribute.KeyValue{
		{Key: "durabletask.type", Value: attribute.StringValue("orchestration")},
		{Key: "durabletask.task.name", Value: attribute.StringValue(name)},
		{Key: "durabletask.task.instance_id", Value: attribute.StringValue(instanceID)},
	}
	return startNewSpan(ctx, "create_orchestration", name, version, attributes, trace.SpanKindClient, time.Now().UTC())
}

Implementation of startNewSpan():

func startNewSpan(
	ctx context.Context,
	taskType string,
	taskName string,
	taskVersion string,
	attributes []attribute.KeyValue,
	kind trace.SpanKind,
	timestamp time.Time,
) (context.Context, trace.Span) {
	var spanName string
	if taskVersion != "" {
		spanName = taskType + "||" + taskName + "||" + taskVersion
		attributes = append(attributes, attribute.KeyValue{
			Key:   "durabletask.task.version",
			Value: attribute.StringValue(taskVersion),
		})
	} else if taskName != "" {
		spanName = taskType + "||" + taskName
	} else {
		spanName = taskType
	}

	var span trace.Span
	ctx, span = tracer.Start(
		ctx,
		spanName,
		trace.WithSpanKind(kind),
		trace.WithTimestamp(timestamp),
		trace.WithAttributes(attributes...),
	)
	return ctx, span
}

Building spanName takes three branches because taskVersion and taskName may each be empty (in principle taskName should never be empty):

spanName = taskType + "||" + taskName + "||" + taskVersion
spanName = taskType + "||" + taskName
spanName = taskType
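The three forms above can be reproduced in isolation. The following is a minimal sketch, not the durabletask-go code itself: buildSpanName is a hypothetical helper that copies only the string-building branches of startNewSpan() and leaves out the tracer and attributes.

```go
package main

import "fmt"

// buildSpanName is a hypothetical helper that mirrors the branching shown
// above: the version branch is checked first, then the name branch.
func buildSpanName(taskType, taskName, taskVersion string) string {
	if taskVersion != "" {
		return taskType + "||" + taskName + "||" + taskVersion
	}
	if taskName != "" {
		return taskType + "||" + taskName
	}
	return taskType
}

func main() {
	fmt.Println(buildSpanName("create_orchestration", "OrderWorkflow", "v2"))
	fmt.Println(buildSpanName("create_orchestration", "OrderWorkflow", ""))
	fmt.Println(buildSpanName("create_orchestration", "", ""))
	// Edge case: a version without a name leaves an empty middle segment.
	fmt.Println(buildSpanName("create_orchestration", "", "v2"))
}
```

Because the version is checked first, a non-empty version combined with an empty name yields a span name with an empty middle segment, which is one reason the name is expected never to be empty.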

NewExecutionStartedEvent() method

This line builds an ExecutionStartedEvent:

e := helpers.NewExecutionStartedEvent(req.Name, instanceID, req.Input, nil, helpers.TraceContextFromSpan(span))

The implementation is:

func NewExecutionStartedEvent(
	name string,
	instanceId string,
	input *wrapperspb.StringValue,
	parent *protos.ParentInstanceInfo,
	parentTraceContext *protos.TraceContext,
) *protos.HistoryEvent {
	return &protos.HistoryEvent{
		EventId:   -1,
		Timestamp: timestamppb.New(time.Now()),
		EventType: &protos.HistoryEvent_ExecutionStarted{
			ExecutionStarted: &protos.ExecutionStartedEvent{
				Name:           name,
				ParentInstance: parent,
				Input:          input,
				OrchestrationInstance: &protos.OrchestrationInstance{
					InstanceId:  instanceId,
					ExecutionId: wrapperspb.String(uuid.New().String()),
				},
				ParentTraceContext: parentTraceContext,
			},
		},
	}
}

Note: the version field is not used here.

CreateOrchestrationInstance() method

The most important code:

if err := g.backend.CreateOrchestrationInstance(ctx, e, WithOrchestrationIdReusePolicy(req.OrchestrationIdReusePolicy)); err != nil {
  return nil, err
}

Backend is an interface; its CreateOrchestrationInstance() method is declared as follows:

type Backend interface {
	// CreateOrchestrationInstance creates a new orchestration instance with a history event that
	// wraps a ExecutionStarted event.
	CreateOrchestrationInstance(context.Context, *HistoryEvent, ...OrchestrationIdReusePolicyOptions) error
	......
}
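The variadic ...OrchestrationIdReusePolicyOptions parameter follows Go's functional options pattern: each option is a function that mutates a policy value, and the callee applies them in order. A minimal sketch of the pattern, using hypothetical stand-in types (reusePolicy, policyOption, withReusePolicy) rather than the real durabletask-go definitions:

```go
package main

import "fmt"

// reusePolicy is a hypothetical stand-in for api.OrchestrationIdReusePolicy.
type reusePolicy struct {
	Action   string
	Statuses []string
}

// policyOption is a hypothetical stand-in for
// backend.OrchestrationIdReusePolicyOptions.
type policyOption func(*reusePolicy)

// withReusePolicy copies the caller's policy into the target.
// In this sketch a nil policy is simply ignored.
func withReusePolicy(p *reusePolicy) policyOption {
	return func(target *reusePolicy) {
		if p == nil {
			return
		}
		target.Action = p.Action
		target.Statuses = append([]string(nil), p.Statuses...)
	}
}

// resolvePolicy starts from the zero value and applies each option in order.
func resolvePolicy(opts ...policyOption) *reusePolicy {
	policy := &reusePolicy{}
	for _, opt := range opts {
		opt(policy)
	}
	return policy
}

func main() {
	p := resolvePolicy(withReusePolicy(&reusePolicy{
		Action:   "TERMINATE",
		Statuses: []string{"RUNNING"},
	}))
	fmt.Println(p.Action, p.Statuses)
	fmt.Println(resolvePolicy().Action == "")
}
```

The benefit of this shape is that callers that do not care about the reuse policy pass nothing, and new options can be added later without changing the interface signature.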

daprd implementation

In the daprd sidecar, the backend is wired up in pkg/runtime/wfengine/wfengine.go of the dapr/dapr repository:

func (wfe *WorkflowEngine) ConfigureGrpcExecutor() {
	// Enable lazy auto-starting the worker only when a workflow app connects to fetch work items.
	autoStartCallback := backend.WithOnGetWorkItemsConnectionCallback(func(ctx context.Context) error {
		// NOTE: We don't propagate the context here because that would cause the engine to shut
		//       down when the client disconnects and cancels the passed-in context. Once it starts
		//       up, we want to keep the engine running until the runtime shuts down.
		if err := wfe.Start(context.Background()); err != nil {
			// This can happen if the workflow app connects before the sidecar has finished initializing.
			// The client app is expected to continuously retry until successful.
			return fmt.Errorf("failed to auto-start the workflow engine: %w", err)
		}
		return nil
	})

	// Create a channel that can be used to disconnect the remote client during shutdown.
	wfe.disconnectChan = make(chan any, 1)
	disconnectHelper := backend.WithStreamShutdownChannel(wfe.disconnectChan)

	wfe.executor, wfe.registerGrpcServerFn = backend.NewGrpcExecutor(wfe.Backend, wfLogger, autoStartCallback, disconnectHelper)
}

The WorkflowEngine is initialized in pkg/runtime/runtime.go, using the backend supplied by the processor:

	// Creating workflow engine after components are loaded
	wfe := wfengine.NewWorkflowEngine(a.runtimeConfig.id, a.globalConfig.GetWorkflowSpec(), a.processor.WorkflowBackend())
	wfe.ConfigureGrpcExecutor()
	a.workflowEngine = wfe

	processor := processor.New(processor.Options{
		ID:             runtimeConfig.id,
		Namespace:      namespace,
		IsHTTP:         runtimeConfig.appConnectionConfig.Protocol.IsHTTP(),
		ActorsEnabled:  len(runtimeConfig.actorsService) > 0,
		Registry:       runtimeConfig.registry,
		ComponentStore: compStore,
		Meta:           meta,
		GlobalConfig:   globalConfig,
		Resiliency:     resiliencyProvider,
		Mode:           runtimeConfig.mode,
		PodName:        podName,
		Standalone:     runtimeConfig.standalone,
		OperatorClient: operatorClient,
		GRPC:           grpc,
		Channels:       channels,
	})

ActorBackend

ActorBackend implements the Backend interface defined by durabletask-go:

type ActorBackend struct {
	orchestrationWorkItemChan chan *backend.OrchestrationWorkItem
	activityWorkItemChan      chan *backend.ActivityWorkItem
	startedOnce               sync.Once
	config                    actorsBackendConfig
	activityActorOpts         activityActorOpts
	workflowActorOpts         workflowActorOpts

	actorRuntime  actors.ActorRuntime
	actorsReady   atomic.Bool
	actorsReadyCh chan struct{}
}

Implementation of the CreateOrchestrationInstance() method:

func (abe *ActorBackend) CreateOrchestrationInstance(ctx context.Context, e *backend.HistoryEvent, opts ...backend.OrchestrationIdReusePolicyOptions) error {
	if err := abe.validateConfiguration(); err != nil {
		return err
	}

	// Validate the input
	var workflowInstanceID string
	if es := e.GetExecutionStarted(); es == nil {
		return errors.New("the history event must be an ExecutionStartedEvent")
	} else if oi := es.GetOrchestrationInstance(); oi == nil {
		return errors.New("the ExecutionStartedEvent did not contain orchestration instance information")
	} else {
		workflowInstanceID = oi.GetInstanceId()
	}

	policy := &api.OrchestrationIdReusePolicy{}
	for _, opt := range opts {
		opt(policy)
	}

	eventData, err := backend.MarshalHistoryEvent(e)
	if err != nil {
		return err
	}

	requestBytes, err := json.Marshal(CreateWorkflowInstanceRequest{
		Policy:          policy,
		StartEventBytes: eventData,
	})
	if err != nil {
		return fmt.Errorf("failed to marshal CreateWorkflowInstanceRequest: %w", err)
	}

	// Invoke the well-known workflow actor directly, which will be created by this invocation request.
	// Note that this request goes directly to the actor runtime, bypassing the API layer.
	req := internalsv1pb.NewInternalInvokeRequest(CreateWorkflowInstanceMethod).
		WithActor(abe.config.workflowActorType, workflowInstanceID).
		WithData(requestBytes).
		WithContentType(invokev1.JSONContentType)
	start := time.Now()
	_, err = abe.actorRuntime.Call(ctx, req)
	elapsed := diag.ElapsedSince(start)
	if err != nil {
		// failed request to CREATE workflow, record count and latency metrics.
		diag.DefaultWorkflowMonitoring.WorkflowOperationEvent(ctx, diag.CreateWorkflow, diag.StatusFailed, elapsed)
		return err
	}
	// successful request to CREATE workflow, record count and latency metrics.
	diag.DefaultWorkflowMonitoring.WorkflowOperationEvent(ctx, diag.CreateWorkflow, diag.StatusSuccess, elapsed)
	return nil
}

The key line is:

_, err = abe.actorRuntime.Call(ctx, req)

The call is made through an actor.

The ActorRuntime is injected as follows:

func (abe *ActorBackend) SetActorRuntime(ctx context.Context, actorRuntime actors.ActorRuntime) {
	abe.actorRuntime = actorRuntime
	if abe.actorsReady.CompareAndSwap(false, true) {
		close(abe.actorsReadyCh)
	}
}

It is called from the initWorkflowEngine() method in pkg/runtime/runtime.go:

func (a *DaprRuntime) initWorkflowEngine(ctx context.Context) error {
	wfComponentFactory := wfengine.BuiltinWorkflowFactory(a.workflowEngine)

	// If actors are not enabled, still invoke SetActorRuntime on the workflow engine with `nil` to unblock startup
	if abe, ok := a.workflowEngine.Backend.(interface {
		SetActorRuntime(ctx context.Context, actorRuntime actors.ActorRuntime)
	}); ok {
		log.Info("Configuring workflow engine with actors backend")
		var actorRuntime actors.ActorRuntime
		if a.runtimeConfig.ActorsEnabled() {
			actorRuntime = a.actor
		}
		abe.SetActorRuntime(ctx, actorRuntime)
	}
  ......
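The type assertion against an anonymous interface in initWorkflowEngine() is a common Go idiom for optional capabilities: the runtime only injects the actor runtime into backends that happen to expose a SetActorRuntime method. A minimal sketch of the idiom, with hypothetical types (workflowBackend, actorBackend, otherBackend, runtimeHandle) standing in for the Dapr ones:

```go
package main

import "fmt"

// runtimeHandle is a hypothetical stand-in for actors.ActorRuntime.
type runtimeHandle struct{ name string }

// workflowBackend is a hypothetical stand-in for the Backend interface.
type workflowBackend interface {
	Start() error
}

// actorBackend supports the optional SetRuntime capability.
type actorBackend struct{ rt *runtimeHandle }

func (b *actorBackend) Start() error                 { return nil }
func (b *actorBackend) SetRuntime(rt *runtimeHandle) { b.rt = rt }

// otherBackend does not support it.
type otherBackend struct{}

func (otherBackend) Start() error { return nil }

// configure injects the runtime only into backends that expose SetRuntime
// and reports whether the injection happened.
func configure(b workflowBackend, rt *runtimeHandle) bool {
	setter, ok := b.(interface{ SetRuntime(rt *runtimeHandle) })
	if !ok {
		return false
	}
	setter.SetRuntime(rt)
	return true
}

func main() {
	ab := &actorBackend{}
	fmt.Println(configure(ab, &runtimeHandle{name: "actors"}), ab.rt.name)
	fmt.Println(configure(otherBackend{}, nil))
}
```

This keeps the Backend interface free of actor-specific methods while still letting the actor-based backend receive its dependency.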

actorRuntime implementation

The ActorRuntime interface is defined as:

// ActorRuntime is the main runtime for the actors subsystem.
type ActorRuntime interface {
	Actors
	io.Closer
	Init(context.Context) error
	IsActorHosted(ctx context.Context, req *ActorHostedRequest) bool
	GetRuntimeStatus(ctx context.Context) *runtimev1pb.ActorRuntime
	RegisterInternalActor(ctx context.Context, actorType string, actor InternalActorFactory, actorIdleTimeout time.Duration) error
}

ActorRuntime embeds the Actors interface, which is where the Call() method is declared:

// Actors allow calling into virtual actors as well as actor state management.
type Actors interface {
	// Call an actor.
	Call(ctx context.Context, req *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error)
  ......
}

Implementation of the Call() method:

func (a *actorsRuntime) Call(ctx context.Context, req *internalv1pb.InternalInvokeRequest) (res *internalv1pb.InternalInvokeResponse, err error) {
	err = a.placement.WaitUntilReady(ctx)
	if err != nil {
		return nil, fmt.Errorf("failed to wait for placement readiness: %w", err)
	}

	// Do a lookup to check if the actor is local
	actor := req.GetActor()
	actorType := actor.GetActorType()
	lar, err := a.placement.LookupActor(ctx, internal.LookupActorRequest{
		ActorType: actorType,
		ActorID:   actor.GetActorId(),
	})
	if err != nil {
		return nil, err
	}

	if a.isActorLocal(lar.Address, a.actorsConfig.Config.HostAddress, a.actorsConfig.Config.Port) {
		// If this is an internal actor, we call it using a separate path
		internalAct, ok := a.getInternalActor(actorType, actor.GetActorId())
		if ok {
			res, err = a.callInternalActor(ctx, req, internalAct)
		} else {
			res, err = a.callLocalActor(ctx, req)
		}
	} else {
		res, err = a.callRemoteActorWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, a.callRemoteActor, lar.Address, lar.AppID, req)
	}

	if err != nil {
		if res != nil && actorerrors.Is(err) {
			return res, err
		}
		return nil, err
	}
	return res, nil
}

The key step is the call to placement.LookupActor(), which resolves the address of the target actor:

	lar, err := a.placement.LookupActor(ctx, internal.LookupActorRequest{
		ActorType: actorType,
		ActorID:   actor.GetActorId(),
	})

placement implementation

The PlacementService interface is defined as:

type PlacementService interface {
	io.Closer

	Start(context.Context) error
	WaitUntilReady(ctx context.Context) error
	LookupActor(ctx context.Context, req LookupActorRequest) (LookupActorResponse, error)
	AddHostedActorType(actorType string, idleTimeout time.Duration) error
	ReportActorDeactivation(ctx context.Context, actorType, actorID string) error

	SetHaltActorFns(haltFn HaltActorFn, haltAllFn HaltAllActorsFn)
	SetOnAPILevelUpdate(fn func(apiLevel uint32))
	SetOnTableUpdateFn(fn func())

	// PlacementHealthy returns true if the placement service is healthy.
	PlacementHealthy() bool
	// StatusMessage returns a custom status message.
	StatusMessage() string
}

The implementation is in pkg/actors/placement/placement.go:

// LookupActor resolves to actor service instance address using consistent hashing table.
func (p *actorPlacement) LookupActor(ctx context.Context, req internal.LookupActorRequest) (internal.LookupActorResponse, error) {
	// Retry here to allow placement table dissemination/rebalancing to happen.
	policyDef := p.resiliency.BuiltInPolicy(resiliency.BuiltInActorNotFoundRetries)
	policyRunner := resiliency.NewRunner[internal.LookupActorResponse](ctx, policyDef)
	return policyRunner(func(ctx context.Context) (res internal.LookupActorResponse, rErr error) {
		rAddr, rAppID, rErr := p.doLookupActor(ctx, req.ActorType, req.ActorID)
		if rErr != nil {
			return res, fmt.Errorf("error finding address for actor %s/%s: %w", req.ActorType, req.ActorID, rErr)
		} else if rAddr == "" {
			return res, fmt.Errorf("did not find address for actor %s/%s", req.ActorType, req.ActorID)
		}
		res.Address = rAddr
		res.AppID = rAppID
		return res, nil
	})
}

doLookupActor():

func (p *actorPlacement) doLookupActor(ctx context.Context, actorType, actorID string) (string, string, error) {
	// Take the read lock
	p.placementTableLock.RLock()
	defer p.placementTableLock.RUnlock()

	if p.placementTables == nil {
		return "", "", errors.New("placement tables are not set")
	}

	// First find the hashing table entry for this actorType
	t := p.placementTables.Entries[actorType]
	if t == nil {
		return "", "", nil
	}
	host, err := t.GetHost(actorID)
	if err != nil || host == nil {
		return "", "", nil //nolint:nilerr
	}
	return host.Name, host.AppID, nil
}

p.placementTables has the following struct definition:

type ConsistentHashTables struct {
	Version string
	Entries map[string]*Consistent
}

The Consistent struct is defined as:

// Consistent represents a data structure for consistent hashing.
type Consistent struct {
	hosts             map[uint64]string
	sortedSet         []uint64
	loadMap           map[string]*Host
	totalLoad         int64
	replicationFactor int

	sync.RWMutex
}

The GetHost() method behind host, err := t.GetHost(actorID):

func (c *Consistent) GetHost(key string) (*Host, error) {
	h, err := c.Get(key)
	if err != nil {
		return nil, err
	}

	return c.loadMap[h], nil
}
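The hosts, sortedSet and loadMap fields suggest the usual consistent-hashing layout: hashed points on a ring, a sorted slice for binary search, and a map from host name to host details. The following is a simplified sketch of that lookup, assuming FNV-1a hashing and a fixed number of virtual nodes per host. The type ring and its methods are hypothetical, and Dapr's actual hash function and load handling are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a hypothetical, simplified consistent-hash ring: each host is
// placed at several virtual points, and a key belongs to the first point at
// or after its own hash (wrapping around to the start).
type ring struct {
	points []uint64          // sorted hashes of all virtual nodes
	owner  map[uint64]string // virtual node hash -> host name
}

func hashOf(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func newRing(replicas int, hosts ...string) *ring {
	r := &ring{owner: map[uint64]string{}}
	for _, host := range hosts {
		for i := 0; i < replicas; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", host, i))
			if _, taken := r.owner[p]; taken {
				continue // very unlikely collision: keep the first owner
			}
			r.owner[p] = host
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// get returns the host responsible for key.
func (r *ring) get(key string) (string, error) {
	if len(r.points) == 0 {
		return "", errors.New("no hosts in ring")
	}
	k := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= k })
	if i == len(r.points) {
		i = 0 // wrap around to the first point
	}
	return r.owner[r.points[i]], nil
}

func main() {
	r := newRing(100, "10.0.0.1:50002", "10.0.0.2:50002", "10.0.0.3:50002")
	for _, id := range []string{"workflow-1", "workflow-2", "workflow-3"} {
		host, _ := r.get(id)
		fmt.Printf("%s -> %s\n", id, host)
	}
}
```

The property that matters for actors is determinism: every sidecar holding the same table resolves a given actor ID to the same host, so no coordination is needed per call.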

4 - Studying the kit repository source code

Dapr source code study: the kit repository

https://github.com/dapr/kit

4.1 - Introduction to the kit repository

Holds shared utility code

About the kit repository

Shared utility code for Dapr runtime.

https://github.com/dapr/kit

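There is not much in it so far: only three packages, logger, config and retry.

Background of the kit repository

The kit repository was extracted later. Its code originally lived in the dapr repository and was used by other code there. Later, code in the components-contrib repository started using this common code as well, which created a circular dependency:

The dapr repository depends on the components-contrib repository: it uses the component implementations in components-contrib.
The components-contrib repository depends on the dapr repository: it uses the common code in dapr.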

participant dapr
participant       "components-contrib"       as components
dapr -> components : for component impl 
components -> dapr : for common code

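To make the dependencies clearer and avoid the cycle, this common code was moved out of the dapr repository into the separate kit repository. The dependencies then look like this:

The dapr repository depends on the components-contrib repository: it uses the component implementations in components-contrib.
The dapr repository depends on the kit repository: it uses the common code in kit.
The components-contrib repository depends on the kit repository: it uses the common code in kit.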

participant dapr
participant       "components-contrib"       as components
participant kit

dapr -> kit : for common code
components -> kit : for common code
dapr -> components : for component impl

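4.2 - Studying the logger source code

Source code study of the Dapr logger package

4.2.1 - Studying logger.go

Defines the logger-related log types, schema, log levels and interfaces, and keeps the global list of loggers

Source code study of logger.go in the Dapr logger package, which defines the logger-related log types, schema, log levels and interfaces, and keeps the global list of loggers.

Logger-related definitions

log type

There are two log types, normal log and request: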

const (
	// LogTypeLog is normal log type
	LogTypeLog = "log"
	// LogTypeRequest is Request log type
	LogTypeRequest = "request"
	......
)

log schema

const (
    ......
	// Field names that defines Dapr log schema
	logFieldTimeStamp = "time"
	logFieldLevel     = "level"
	logFieldType      = "type"
	logFieldScope     = "scope"
	logFieldMessage   = "msg"
	logFieldInstance  = "instance"
	logFieldDaprVer   = "ver"
	logFieldAppID     = "app_id"
)

log level

The log levels are nothing special; the definitions are conventional:

const (
	// DebugLevel has verbose message
	DebugLevel LogLevel = "debug"
	// InfoLevel is default log level
	InfoLevel LogLevel = "info"
	// WarnLevel is for logging messages about possible issues
	WarnLevel LogLevel = "warn"
	// ErrorLevel is for logging errors
	ErrorLevel LogLevel = "error"
	// FatalLevel is for logging fatal messages. The system shuts down after logging the message.
	FatalLevel LogLevel = "fatal"

	// UndefinedLevel is for undefined log level
	UndefinedLevel LogLevel = "undefined"
)

Note: FatalLevel has a special meaning ("The system shuts down after logging the message."), so it must not be used casually.

The toLogLevel() method converts a string to a LogLevel, case-insensitively:

// toLogLevel converts to LogLevel
func toLogLevel(level string) LogLevel {
	switch strings.ToLower(level) {
	case "debug":
		return DebugLevel
	case "info":
		return InfoLevel
	case "warn":
		return WarnLevel
	case "error":
		return ErrorLevel
	case "fatal":
		return FatalLevel
	}

	// unsupported log level by Dapr
	return UndefinedLevel
}

Logger interface definition

// Logger includes the logging api sets
type Logger interface {
	// EnableJSONOutput enables JSON formatted output log
	EnableJSONOutput(enabled bool)

	// SetAppID sets dapr_id field in log. Default value is empty string
	SetAppID(id string)
	// SetOutputLevel sets log output level
	SetOutputLevel(outputLevel LogLevel)

	// WithLogType specify the log_type field in log. Default value is LogTypeLog
	WithLogType(logType string) Logger

	// Info logs a message at level Info.
	Info(args ...interface{})
	// Infof logs a message at level Info.
	Infof(format string, args ...interface{})
	// Debug logs a message at level Debug.
	Debug(args ...interface{})
	// Debugf logs a message at level Debug.
	Debugf(format string, args ...interface{})
	// Warn logs a message at level Warn.
	Warn(args ...interface{})
	// Warnf logs a message at level Warn.
	Warnf(format string, args ...interface{})
	// Error logs a message at level Error.
	Error(args ...interface{})
	// Errorf logs a message at level Error.
	Errorf(format string, args ...interface{})
	// Fatal logs a message at level Fatal then the process will exit with status set to 1.
	Fatal(args ...interface{})
	// Fatalf logs a message at level Fatal then the process will exit with status set to 1.
	Fatalf(format string, args ...interface{})
}

Creating and retrieving loggers

Global logger list

// globalLoggers is the collection of Dapr Logger that is shared globally.
// TODO: User will disable or enable logger on demand.
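var globalLoggers = map[string]Logger{} // the map holds every logger instance
var globalLoggersLock = sync.RWMutex{}  // a read-write lock protects the map

Creating a new logger or getting an existing one

Once created, a logger is stored in the global loggers map, which means only one instance is created per logger name.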

// NewLogger creates new Logger instance.
func NewLogger(name string) Logger {
	globalLoggersLock.Lock() // take the write lock
	defer globalLoggersLock.Unlock()

	logger, ok := globalLoggers[name]
	if !ok {
		logger = newDaprLogger(name)
		globalLoggers[name] = logger
	}

	return logger
}

See dapr_logger.go for the details of newDaprLogger().

Getting all loggers created so far

func getLoggers() map[string]Logger {
	globalLoggersLock.RLock() // take the read lock
	defer globalLoggersLock.RUnlock()

	l := map[string]Logger{}
	for k, v := range globalLoggers {
		l[k] = v
	}

	return l
}
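The two functions above combine two idioms: get-or-create under a write lock, and returning a copy of the map under a read lock so callers can iterate without holding the lock. A self-contained sketch of both, where registry, namedLogger, getOrCreate and snapshot are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// namedLogger is a hypothetical stand-in for the Logger implementation.
type namedLogger struct{ name string }

// registry is a hypothetical type bundling the map and its lock.
type registry struct {
	mu      sync.RWMutex
	loggers map[string]*namedLogger
}

func newRegistry() *registry {
	return &registry{loggers: map[string]*namedLogger{}}
}

// getOrCreate takes the write lock because it may insert into the map.
func (r *registry) getOrCreate(name string) *namedLogger {
	r.mu.Lock()
	defer r.mu.Unlock()
	l, ok := r.loggers[name]
	if !ok {
		l = &namedLogger{name: name}
		r.loggers[name] = l
	}
	return l
}

// snapshot takes the read lock and returns a copy of the map, so callers
// can range over it (or modify it) without touching the registry.
func (r *registry) snapshot() map[string]*namedLogger {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]*namedLogger, len(r.loggers))
	for k, v := range r.loggers {
		out[k] = v
	}
	return out
}

func main() {
	r := newRegistry()
	a := r.getOrCreate("dapr.runtime")
	b := r.getOrCreate("dapr.runtime")
	fmt.Println(a == b) // same name, same instance

	snap := r.snapshot()
	delete(snap, "dapr.runtime") // only the copy is changed
	fmt.Println(len(r.snapshot()))
}
```

Note that the copy is shallow: the map is duplicated, but the logger instances inside it are shared, which is what lets ApplyOptionsToLoggers reconfigure the registered loggers through the returned map.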

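4.2.2 - Studying dapr_logger.go

daprLogger is the actual logging implementation

Source code analysis of dapr_logger.go in the Dapr logger package; daprLogger is the actual logging implementation.

daprLogger struct definition

The daprLogger struct is backed by logrus: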

// daprLogger is the implemention for logrus
type daprLogger struct {
	// name is the name of logger that is published to log as a scope
	name string
	// loger is the instance of logrus logger
	logger *logrus.Entry
}

Creating a Dapr logger

The logic for creating a Dapr logger:

func newDaprLogger(name string) *daprLogger {
	// logrus is the underlying implementation
	newLogger := logrus.New()
	// write to stdout
	newLogger.SetOutput(os.Stdout)

	dl := &daprLogger{
		name: name,
		logger: newLogger.WithFields(logrus.Fields{
			logFieldScope: name,
			// the normal log type is the default
			logFieldType:  LogTypeLog,
		}),
	}

	// set whether JSON output is enabled; defaultJSONOutput is false
	dl.EnableJSONOutput(defaultJSONOutput)

	return dl
}

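Enabling JSON output

The function name is slightly misleading: it actually initializes the logger, and enabling JSON is only part of what it does. It also resets the log fields (scope, type, instance, ver), so app_id has to be set again afterwards: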

// EnableJSONOutput enables JSON formatted output log
func (l *daprLogger) EnableJSONOutput(enabled bool) {
	var formatter logrus.Formatter

	fieldMap := logrus.FieldMap{
		// If time field name is conflicted, logrus adds "fields." prefix.
		// So rename to unused field @time to avoid the confliction.
		logrus.FieldKeyTime:  logFieldTimeStamp,
		logrus.FieldKeyLevel: logFieldLevel,
		logrus.FieldKeyMsg:   logFieldMessage,
	}

	hostname, _ := os.Hostname()
	l.logger.Data = logrus.Fields{
		logFieldScope:    l.logger.Data[logFieldScope],
		logFieldType:     LogTypeLog,
		logFieldInstance: hostname,
		logFieldDaprVer:  DaprVersion,
	}

	if enabled {
		formatter = &logrus.JSONFormatter{
			TimestampFormat: time.RFC3339Nano,
			FieldMap:        fieldMap,
		}
	} else {
		formatter = &logrus.TextFormatter{
			TimestampFormat: time.RFC3339Nano,
			FieldMap:        fieldMap,
		}
	}

	l.logger.Logger.SetFormatter(formatter)
}

Logger settings

Setting DaprVersion

var DaprVersion string = "unknown"

func (l *daprLogger) EnableJSONOutput(enabled bool) {
	l.logger.Data = logrus.Fields{
        ......
		logFieldDaprVer:  DaprVersion,
	}
}

The value of DaprVersion comes from the Makefile (dapr/Makefile):

LOGGER_PACKAGE_NAME := github.com/dapr/kit/logger

DEFAULT_LDFLAGS:=-X $(BASE_PACKAGE_NAME)/pkg/version.gitcommit=$(GIT_COMMIT) \
  -X $(BASE_PACKAGE_NAME)/pkg/version.gitversion=$(GIT_VERSION) \
  -X $(BASE_PACKAGE_NAME)/pkg/version.version=$(DAPR_VERSION) \
  -X $(LOGGER_PACKAGE_NAME).DaprVersion=$(DAPR_VERSION)

Setting the app ID

Sets the app_id field of the log; it is empty by default.

// SetAppID sets app_id field in log. Default value is empty string
func (l *daprLogger) SetAppID(id string) {
	l.logger = l.logger.WithField(logFieldAppID, id)
}

This method is called when the loggers are initialized; see ApplyOptionsToLoggers in options.go:

func ApplyOptionsToLoggers(options *Options) error {
	......
		if options.appID != undefinedAppID {
			v.SetAppID(options.appID)
		}
}

Setting the log level

// SetOutputLevel sets log output level
func (l *daprLogger) SetOutputLevel(outputLevel LogLevel) {
   l.logger.Logger.SetLevel(toLogrusLevel(outputLevel))
}

func toLogrusLevel(lvl LogLevel) logrus.Level {
	// ignore error because it will never happens
	l, _ := logrus.ParseLevel(string(lvl))
	return l
}

This sets the level on the existing daprLogger instance; nothing special.

Setting the log type

The default is the normal log type. To set a different log type:

// WithLogType specify the log_type field in log. Default value is LogTypeLog
func (l *daprLogger) WithLogType(logType string) Logger {
   // a new daprLogger struct is constructed here and returned
   return &daprLogger{
      name:   l.name,
      logger: l.logger.WithField(logFieldType, logType),
   }
}

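Questions and TODO:

Why construct and return a new struct (still typed as Logger) instead of setting l.logger directly?
Could this cause problems? Loggers are stored in the global logger map after creation, keyed by the plain name rather than name + log type, so the map cannot hold two logger objects of different types under one name.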
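One way to reason about these questions is to model what happens to the shared instance. logrus's Entry.WithField returns a new entry with copied fields instead of modifying the receiver, so the logger returned by WithLogType() is a derived view, while the instance stored in the global map keeps its original type. The sketch below imitates that behavior with hypothetical types (core, entry, sketchLogger) and does not use logrus itself.

```go
package main

import "fmt"

// core is a hypothetical stand-in for the shared *logrus.Logger
// (it carries settings such as the level).
type core struct{ level string }

// entry is a hypothetical stand-in for *logrus.Entry: a set of fields plus
// a pointer to the shared core.
type entry struct {
	core   *core
	fields map[string]string
}

// withField returns a new entry with copied fields; the receiver is not modified.
func (e *entry) withField(k, v string) *entry {
	fields := make(map[string]string, len(e.fields)+1)
	for key, val := range e.fields {
		fields[key] = val
	}
	fields[k] = v
	return &entry{core: e.core, fields: fields}
}

// sketchLogger is a hypothetical stand-in for daprLogger.
type sketchLogger struct {
	name  string
	entry *entry
}

// withLogType returns a derived logger and leaves the receiver untouched.
func (l *sketchLogger) withLogType(t string) *sketchLogger {
	return &sketchLogger{name: l.name, entry: l.entry.withField("type", t)}
}

func main() {
	shared := &sketchLogger{
		name:  "dapr.runtime",
		entry: &entry{core: &core{level: "info"}, fields: map[string]string{"type": "log"}},
	}
	reqLogger := shared.withLogType("request")

	// The registered logger keeps its type; only the derived one changes.
	fmt.Println(shared.entry.fields["type"], reqLogger.entry.fields["type"])

	// Both still share the same core, so a level change is visible to both.
	shared.entry.core.level = "debug"
	fmt.Println(reqLogger.entry.core.level)
}
```

Under this reading, setting l.logger directly would change the type for every user of the shared logger, which is probably why a new struct is returned. The derived logger is not registered in the global map, but it still shares the underlying logrus logger, so level and formatter changes apply to it too.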

Logger implementation

All log-writing methods simply delegate to l.logger (*logrus.Entry):

// Info logs a message at level Info.
func (l *daprLogger) Info(args ...interface{}) {
	l.logger.Log(logrus.InfoLevel, args...)
}

// Infof logs a message at level Info.
func (l *daprLogger) Infof(format string, args ...interface{}) {
	l.logger.Logf(logrus.InfoLevel, format, args...)
}

// Debug logs a message at level Debug.
func (l *daprLogger) Debug(args ...interface{}) {
	l.logger.Log(logrus.DebugLevel, args...)
}

// Debugf logs a message at level Debug.
func (l *daprLogger) Debugf(format string, args ...interface{}) {
	l.logger.Logf(logrus.DebugLevel, format, args...)
}

// Warn logs a message at level Warn.
func (l *daprLogger) Warn(args ...interface{}) {
	l.logger.Log(logrus.WarnLevel, args...)
}

// Warnf logs a message at level Warn.
func (l *daprLogger) Warnf(format string, args ...interface{}) {
	l.logger.Logf(logrus.WarnLevel, format, args...)
}

// Error logs a message at level Error.
func (l *daprLogger) Error(args ...interface{}) {
	l.logger.Log(logrus.ErrorLevel, args...)
}

// Errorf logs a message at level Error.
func (l *daprLogger) Errorf(format string, args ...interface{}) {
	l.logger.Logf(logrus.ErrorLevel, format, args...)
}

// Fatal logs a message at level Fatal then the process will exit with status set to 1.
func (l *daprLogger) Fatal(args ...interface{}) {
	l.logger.Fatal(args...)
}

// Fatalf logs a message at level Fatal then the process will exit with status set to 1.
func (l *daprLogger) Fatalf(format string, args ...interface{}) {
	l.logger.Fatalf(format, args...)
}

Note the implementation of logrus's Fatalf() method: after writing the log entry it calls ExitFunc (which defaults to os.Exit when not set):

func (entry *Entry) Fatalf(format string, args ...interface{}) {
	entry.Logf(FatalLevel, format, args...)
	entry.Logger.Exit(1)
}
func (logger *Logger) Exit(code int) {
	runHandlers()
	if logger.ExitFunc == nil {
		logger.ExitFunc = os.Exit
	}
	logger.ExitFunc(code)
}

This terminates the process, so use it with care.
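Because the exit goes through an overridable function, the behavior can be observed without killing the process, for example in a test. The sketch below imitates the pattern with a hypothetical exitingLogger type and does not use logrus.

```go
package main

import (
	"fmt"
	"os"
)

// exitingLogger is a hypothetical, stripped-down logger with an
// overridable exit hook, imitating the ExitFunc pattern shown above.
type exitingLogger struct {
	exitFunc func(int)
}

// fatalf writes the message and then calls the exit hook with status 1.
func (l *exitingLogger) fatalf(format string, args ...interface{}) {
	fmt.Fprintf(os.Stderr, format+"\n", args...)
	if l.exitFunc == nil {
		l.exitFunc = os.Exit
	}
	l.exitFunc(1)
}

func main() {
	exitCode := -1
	// Replace the hook so the process keeps running and the code is recorded.
	l := &exitingLogger{exitFunc: func(code int) { exitCode = code }}
	l.fatalf("cannot start: %s", "port in use")
	fmt.Println("recorded exit code:", exitCode)
}
```

With the default os.Exit, deferred functions in the calling goroutine do not run, which is another reason to reserve Fatal and Fatalf for unrecoverable startup errors.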

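4.2.3 - Studying options.go

Sets logger-related properties, including parsing flags from command-line arguments

Source code study of options.go in the Dapr logger package, which sets logger-related properties, including parsing flags from command-line arguments.

Default properties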

const (
	defaultJSONOutput  = false
	defaultOutputLevel = "info"
	undefinedAppID     = ""
)

Options struct definition

The Options struct has just three fields:

// Options defines the sets of options for Dapr logging
type Options struct {
   // appID is the unique id of Dapr Application
   // (empty by default)
   appID string

   // JSONFormatEnabled is the flag to enable JSON formatted log
   // (false by default)
   JSONFormatEnabled bool

   // OutputLevel is the level of logging
   // (info by default)
   OutputLevel string
}

Setter methods

// SetOutputLevel sets the log output level
func (o *Options) SetOutputLevel(outputLevel string) error {
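   // Question: the check and the assignment are inconsistent: toLogLevel lower-cases
   // its input, but the value stored below keeps any uppercase letters.
   // TODO: improve this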
   if toLogLevel(outputLevel) == UndefinedLevel {
      return errors.Errorf("undefined Log Output Level: %s", outputLevel)
   }
   o.OutputLevel = outputLevel
   return nil
}

// SetAppID sets Dapr ID
func (o *Options) SetAppID(id string) {
   o.appID = id
}
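One possible way to remove the inconsistency noted in the comment is to normalize the value once and store the normalized form. This is only a sketch of that idea, not the kit implementation: sketchOptions, knownLevels and setOutputLevel are hypothetical names, and the standard library's fmt.Errorf is used in place of the errors package used by the original.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchOptions is a hypothetical, reduced version of Options.
type sketchOptions struct {
	OutputLevel string
}

// knownLevels lists the levels accepted by toLogLevel in the text above.
var knownLevels = map[string]bool{
	"debug": true, "info": true, "warn": true, "error": true, "fatal": true,
}

// setOutputLevel validates the lower-cased level and stores that same
// normalized value, so the check and the assignment cannot disagree.
func (o *sketchOptions) setOutputLevel(level string) error {
	normalized := strings.ToLower(level)
	if !knownLevels[normalized] {
		return fmt.Errorf("undefined Log Output Level: %s", level)
	}
	o.OutputLevel = normalized
	return nil
}

func main() {
	var o sketchOptions
	fmt.Println(o.setOutputLevel("WARN"), o.OutputLevel)
	fmt.Println(o.setOutputLevel("verbose"), o.OutputLevel)
}
```

A rejected value leaves the previously stored level unchanged, so a failed call never puts the options into an invalid state.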

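Question: why are the fields and their setters inconsistent?

JSONFormatEnabled is a public field with no setter.
OutputLevel is a public field with a setter, and the setter validates its input.
- This raises a question: since the field is public, doesn't assigning to it directly bypass the validation?
appID is a private field with a setter, but the setter does nothing beyond plain assignment, so why not simply make the field public?

A check shows:

SetOutputLevel is not called anywhere in the dapr/dapr project.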

Default construction

Returns the default value of each field; nothing special:

// DefaultOptions returns default values of Options
func DefaultOptions() Options {
   return Options{
      JSONFormatEnabled: defaultJSONOutput,
      appID:             undefinedAppID,
      OutputLevel:       defaultOutputLevel,
   }
}

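Note: unlike Java, Go cannot assign a default value where a field is declared, which is sometimes inconvenient.

Reading log properties from command-line flags

Reads the log-level and log-as-json flags from the command line and sets OutputLevel and JSONFormatEnabled: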

// AttachCmdFlags attaches log options to command flags
func (o *Options) AttachCmdFlags(
   stringVar func(p *string, name string, value string, usage string),
   boolVar func(p *bool, name string, value bool, usage string)) {
	if stringVar != nil {
		stringVar(
			&o.OutputLevel,
			"log-level",
			defaultOutputLevel,
			"Options are debug, info, warn, error, or fatal (default info)")
	}
	if boolVar != nil {
		boolVar(
			&o.JSONFormatEnabled,
			"log-as-json",
			defaultJSONOutput,
			"print log as JSON (default false)")
	}
}

Note: this is probably why the OutputLevel and JSONFormatEnabled fields are public.

This method is called in the initialization code of every binary (runtime (that is, daprd) / injector / operator / placement / sentry):

loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)

Note: at this point the value of OutputLevel is set directly without validation, bypassing the check in SetOutputLevel.
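Taking the registration functions as parameters means the options do not depend on a particular flag set: the global flag.StringVar works, and so does the method value of a private flag.FlagSet. The sketch below demonstrates this with a reduced copy of the method. sketchLogOptions and parseLogFlags are hypothetical names introduced for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// sketchLogOptions is a hypothetical, reduced version of Options.
type sketchLogOptions struct {
	JSONFormatEnabled bool
	OutputLevel       string
}

// attachCmdFlags accepts the registration functions rather than a flag set,
// so any flag set with compatible StringVar/BoolVar functions can be used.
func (o *sketchLogOptions) attachCmdFlags(
	stringVar func(p *string, name string, value string, usage string),
	boolVar func(p *bool, name string, value bool, usage string),
) {
	if stringVar != nil {
		stringVar(&o.OutputLevel, "log-level", "info",
			"Options are debug, info, warn, error, or fatal (default info)")
	}
	if boolVar != nil {
		boolVar(&o.JSONFormatEnabled, "log-as-json", false,
			"print log as JSON (default false)")
	}
}

// parseLogFlags registers the flags on a private FlagSet and parses args.
func parseLogFlags(args []string) (sketchLogOptions, error) {
	var opts sketchLogOptions
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	opts.attachCmdFlags(fs.StringVar, fs.BoolVar)
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseLogFlags([]string{"--log-level", "DEBUG", "--log-as-json"})
	// The level is stored exactly as typed; nothing validates it at this stage.
	fmt.Println(opts.OutputLevel, opts.JSONFormatEnabled, err)
}
```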

Applying the properties to all loggers

// ApplyOptionsToLoggers applys options to all registered loggers
func ApplyOptionsToLoggers(options *Options) error {
   // "all loggers" means every logger stored in the global logger map
   internalLoggers := getLoggers()

   // Apply formatting options first
   for _, v := range internalLoggers {
      v.EnableJSONOutput(options.JSONFormatEnabled)

      if options.appID != undefinedAppID {
         v.SetAppID(options.appID)
      }
   }

   daprLogLevel := toLogLevel(options.OutputLevel)
   if daprLogLevel == UndefinedLevel {
      // the validity of the OutputLevel value is checked here
      return errors.Errorf("invalid value for --log-level: %s", options.OutputLevel)
   }

   for _, v := range internalLoggers {
      v.SetOutputLevel(daprLogLevel)
   }
   return nil
}

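TODO: OutputLevel is validated in two places, and one of them is never used. A PR to clean this up is planned.

This method is indeed called in the initialization code of every binary (runtime (that is, daprd) / injector / operator / placement / sentry):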

loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
......
// Apply options to all loggers
loggerOptions.SetAppID(*appID)
if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
   return nil, err
}

TODO: ApplyOptionsToLoggers would be better renamed to convey that the options come from the command line; otherwise the error "invalid value for --log-level" looks odd.

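4.3 - Studying the config source code

Source code study of the Dapr config package

4.3.1 - Studying decode.go

Parses configuration information out of a config.

Source code study of decode.go in the Dapr config package.

Decoder-related definitions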

StringDecoder

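// StringDecoder is used as a way for custom types (or alias types) to override
// the basic decoding function in the `decodeString` DecodeHook.
// `encoding.TextUnmarshaler` was not used because it matches many Go types
// and would have potentially unexpected results.
// Specifying a custom decoding func should be very intentional.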
type StringDecoder interface {
	DecodeString(value string) error
}
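A type opts in to custom decoding simply by implementing DecodeString. The sketch below shows what such a type could look like. The protocol type and the decodeProtocol helper are hypothetical and only illustrate the interface; they are not part of the kit repository.

```go
package main

import (
	"fmt"
	"strings"
)

// stringDecoder repeats the interface above so the sketch is self-contained.
type stringDecoder interface {
	DecodeString(value string) error
}

// protocol is a hypothetical custom type that only accepts two spellings.
type protocol string

// DecodeString uses a pointer receiver because it must modify the value.
func (p *protocol) DecodeString(value string) error {
	switch v := strings.ToLower(value); v {
	case "http", "grpc":
		*p = protocol(v)
		return nil
	default:
		return fmt.Errorf("unsupported protocol %q", value)
	}
}

// decodeProtocol is a hypothetical helper that decodes through the interface,
// the way a decode hook would after detecting that the target implements it.
func decodeProtocol(raw string) (protocol, error) {
	var p protocol
	var d stringDecoder = &p
	err := d.DecodeString(raw)
	return p, err
}

func main() {
	fmt.Println(decodeProtocol("GRPC"))
	fmt.Println(decodeProtocol("tcp"))
}
```

Since the method has a pointer receiver, only *protocol satisfies the interface, so a hook that wants to support such types has to check the pointer type as well as the type itself.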

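Decode() method

// Decode decodes generic map values from `input` to `output`, while providing helpful error information.
// `output` must be a pointer to a Go struct that contains `mapstructure` struct tags on the fields that
// should be decoded. This function is useful when decoding values from configuration files parsed as
// `map[string]interface{}` or component metadata parsed as `map[string]string`.
//
// Most of the heavy lifting is handled by the mapstructure library. A custom decoder is used to handle
// decoding string values to the supported primitive types.
func Decode(input interface{}, output interface{}) error {
	// Build the mapstructure decoder
	decoder, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{ // nolint:exhaustivestruct
		Result:     output,
		DecodeHook: decodeString, // our hook is installed here
	})
	if err != nil {
		return err
	}
	// Delegate the decoding to the mapstructure decoder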
	return decoder.Decode(input)
}

DecodeHookFunc is defined as:

type DecodeHookFunc interface{}

A DecodeHookFunc must be one of the following three function types:

// DecodeHookFuncType is a DecodeHookFunc which has complete information about
// the source and target types.
type DecodeHookFuncType func(reflect.Type, reflect.Type, interface{}) (interface{}, error)

// DecodeHookFuncKind is a DecodeHookFunc which knows only the Kinds of the
// source and target types.
type DecodeHookFuncKind func(reflect.Kind, reflect.Kind, interface{}) (interface{}, error)

// DecodeHookFuncValue is a DecodeHookFunc which has complete access to both the source and target
// values.
type DecodeHookFuncValue func(from reflect.Value, to reflect.Value) (interface{}, error)

config实现中采用的是第一种：有 source 和 target 类型的完整信息。

decodeString()方法

decodeString()方法的实现：

func decodeString(
	f reflect.Type,
	t reflect.Type,
	data interface{}) (interface{}, error) {
	if t.Kind() == reflect.String && f.Kind() != reflect.String {
		return fmt.Sprintf("%v", data), nil
	}
	if f.Kind() == reflect.Ptr {
		f = f.Elem()
		data = reflect.ValueOf(data).Elem().Interface()
	}
	if f.Kind() != reflect.String {
		return data, nil
	}

	dataString, ok := data.(string)
	if !ok {
		return nil, errors.Errorf("expected string: got %s", reflect.TypeOf(data))
	}

	var result interface{}
	var decoder StringDecoder

	if t.Implements(typeStringDecoder) {
		result = reflect.New(t.Elem()).Interface()
		decoder = result.(StringDecoder)
	} else if reflect.PtrTo(t).Implements(typeStringDecoder) {
		result = reflect.New(t).Interface()
		decoder = result.(StringDecoder)
	}

	if decoder != nil {
		if err := decoder.DecodeString(dataString); err != nil {
			if t.Kind() == reflect.Ptr {
				t = t.Elem()
			}

			return nil, errors.Errorf("invalid %s %q: %v", t.Name(), dataString, err)
		}

		return result, nil
	}

	switch t {
	case typeDuration:
		// Check for simple integer values and treat them
		// as milliseconds
		if val, err := strconv.Atoi(dataString); err == nil {
			return time.Duration(val) * time.Millisecond, nil
		}

		// Convert it by parsing
		d, err := time.ParseDuration(dataString)

		return d, invalidError(err, "duration", dataString)
	case typeTime:
		// Convert it by parsing
		t, err := time.Parse(time.RFC3339Nano, dataString)
		if err == nil {
			return t, nil
		}
		t, err = time.Parse(time.RFC3339, dataString)

		return t, invalidError(err, "time", dataString)
	}

	switch t.Kind() { // nolint: exhaustive
	case reflect.Uint:
		val, err := strconv.ParseUint(dataString, 10, 64)

		return uint(val), invalidError(err, "uint", dataString)
	case reflect.Uint64:
		val, err := strconv.ParseUint(dataString, 10, 64)

		return val, invalidError(err, "uint64", dataString)
	case reflect.Uint32:
		val, err := strconv.ParseUint(dataString, 10, 32)

		return uint32(val), invalidError(err, "uint32", dataString)
	case reflect.Uint16:
		val, err := strconv.ParseUint(dataString, 10, 16)

		return uint16(val), invalidError(err, "uint16", dataString)
	case reflect.Uint8:
		val, err := strconv.ParseUint(dataString, 10, 8)

		return uint8(val), invalidError(err, "uint8", dataString)

	case reflect.Int:
		val, err := strconv.ParseInt(dataString, 10, 64)

		return int(val), invalidError(err, "int", dataString)
	case reflect.Int64:
		val, err := strconv.ParseInt(dataString, 10, 64)

		return val, invalidError(err, "int64", dataString)
	case reflect.Int32:
		val, err := strconv.ParseInt(dataString, 10, 32)

		return int32(val), invalidError(err, "int32", dataString)
	case reflect.Int16:
		val, err := strconv.ParseInt(dataString, 10, 16)

		return int16(val), invalidError(err, "int16", dataString)
	case reflect.Int8:
		val, err := strconv.ParseInt(dataString, 10, 8)

		return int8(val), invalidError(err, "int8", dataString)

	case reflect.Float32:
		val, err := strconv.ParseFloat(dataString, 32)

		return float32(val), invalidError(err, "float32", dataString)
	case reflect.Float64:
		val, err := strconv.ParseFloat(dataString, 64)

		return val, invalidError(err, "float64", dataString)

	case reflect.Bool:
		val, err := strconv.ParseBool(dataString)

		return val, invalidError(err, "bool", dataString)

	default:
		return data, nil
	}
}
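The typeDuration case above is easy to misread, so here is a minimal standalone sketch of just that rule: a bare integer string is treated as milliseconds, and anything else is handed to time.ParseDuration. The helper name parseDuration is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseDuration follows the same two-step rule as the typeDuration case:
// a bare integer is read as milliseconds, anything else goes to time.ParseDuration.
func parseDuration(s string) (time.Duration, error) {
	if n, err := strconv.Atoi(s); err == nil {
		return time.Duration(n) * time.Millisecond, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, in := range []string{"500", "5s", "1m30s", "abc"} {
		d, err := parseDuration(in)
		fmt.Printf("%q -> %v (err: %v)\n", in, d, err)
	}
}
```

In practice this means that a metadata value of "500" and a value of "500ms" decode to the same duration.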

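4.3.2 - Source code study of normalize.go

Normalizing data for JSON

A source code study of the normalize.go file in the Dapr config package.

It converts map[interface{}]interface{} into map[string]interface{} so that the data can be normalized for JSON and used during component initialization.

Implementation: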

func Normalize(i interface{}) (interface{}, error) {
	var err error
	switch x := i.(type) { // only three kinds of values are normalized; anything else is returned as-is
	case map[interface{}]interface{}: // 1. map[interface{}]interface{}: normalize both keys and values
		m2 := map[string]interface{}{}
		for k, v := range x {
			if strKey, ok := k.(string); ok {
				// convert the key to string and keep normalizing the value
				if m2[strKey], err = Normalize(v); err != nil {
					return nil, err
				}
			} else {
				// keys must be strings, otherwise return an error
				return nil, fmt.Errorf("error parsing config field: %v", k)
			}
		}

		return m2, nil
	case map[string]interface{}: // 2. map[string]interface{}: only the values need normalizing
		m2 := map[string]interface{}{}
		for k, v := range x {
			if m2[k], err = Normalize(v); err != nil {
				return nil, err
			}
		}

		return m2, nil
	case []interface{}: // 3. []interface{}: normalize every element in place
		for i, v := range x {
			if x[i], err = Normalize(v); err != nil {
				return nil, err
			}
		}
	}

	return i, nil
}
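To make the effect concrete, the sketch below runs a condensed copy of the logic above (renamed normalize so that it is self-contained) on a hypothetical nested input of the kind some YAML decoders produce. The input values are made up for illustration.

```go
package main

import "fmt"

// normalize is a condensed copy of the Normalize logic shown above.
func normalize(i interface{}) (interface{}, error) {
	switch x := i.(type) {
	case map[interface{}]interface{}:
		out := make(map[string]interface{}, len(x))
		for k, v := range x {
			key, ok := k.(string)
			if !ok {
				return nil, fmt.Errorf("error parsing config field: %v", k)
			}
			nv, err := normalize(v)
			if err != nil {
				return nil, err
			}
			out[key] = nv
		}
		return out, nil
	case map[string]interface{}:
		out := make(map[string]interface{}, len(x))
		for k, v := range x {
			nv, err := normalize(v)
			if err != nil {
				return nil, err
			}
			out[k] = nv
		}
		return out, nil
	case []interface{}:
		for idx, v := range x {
			nv, err := normalize(v)
			if err != nil {
				return nil, err
			}
			x[idx] = nv
		}
	}
	return i, nil
}

func main() {
	// Hypothetical input with interface{} keys at two levels.
	in := map[interface{}]interface{}{
		"retry": map[interface{}]interface{}{"maxRetries": 3},
	}
	out, err := normalize(in)
	fmt.Printf("%T %v\n", out, err)

	// A non-string key is rejected.
	_, err = normalize(map[interface{}]interface{}{42: "x"})
	fmt.Println(err)
}
```

After normalization every map level has string keys, which is what encoding/json requires when marshalling a map.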

4.3.3 - Source code study of prefix.go

Stripping a prefix from keys

A source code study of the prefix.go file in the Dapr config package.

Implementation

func PrefixedBy(input interface{}, prefix string) (interface{}, error) {
	normalized, err := Normalize(input)
	if err != nil {
		// the only error Normalize can return: the input is a map[interface{}]interface{} and one of its keys is not a string
		return input, err
	}
	input = normalized

	if inputMap, ok := input.(map[string]interface{}); ok {
		converted := make(map[string]interface{}, len(inputMap))
		for k, v := range inputMap {
			if strings.HasPrefix(k, prefix) {
				key := uncapitalize(strings.TrimPrefix(k, prefix)) // strip the prefix from the key
				converted[key] = v
			}
		}

		return converted, nil
	} else if inputMap, ok := input.(map[string]string); ok {
		converted := make(map[string]string, len(inputMap))
		for k, v := range inputMap {
			if strings.HasPrefix(k, prefix) {
				key := uncapitalize(strings.TrimPrefix(k, prefix)) // strip the prefix from the key
				converted[key] = v
			}
		}

		return converted, nil
	}

	return input, nil
}

uncapitalize() lowercases only the first character of the string and leaves the rest unchanged:

func uncapitalize(str string) string {
	if len(str) == 0 {
		return str
	}

	vv := []rune(str) // Introduced later
	vv[0] = unicode.ToLower(vv[0])

	return string(vv)
}
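The following sketch shows the combined effect of prefix filtering and uncapitalize() on a flat string map. It is a simplified stand-in for PrefixedBy() that skips normalization, and the metadata keys are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// uncapitalize lowercases only the first rune, as in the code above.
func uncapitalize(s string) string {
	if s == "" {
		return s
	}
	r := []rune(s)
	r[0] = unicode.ToLower(r[0])
	return string(r)
}

// stripPrefix keeps only the keys that start with prefix and renames them
// by removing the prefix and lowercasing the first remaining character.
func stripPrefix(in map[string]string, prefix string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		if strings.HasPrefix(k, prefix) {
			out[uncapitalize(strings.TrimPrefix(k, prefix))] = v
		}
	}
	return out
}

func main() {
	// Hypothetical component metadata; the key names are illustrative.
	meta := map[string]string{
		"backOffMaxRetries": "3",
		"backOffPolicy":     "exponential",
		"host":              "localhost:6379",
	}
	fmt.Println(stripPrefix(meta, "backOff"))
}
```

Keys without the prefix (host in this example) are dropped from the result rather than passed through.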

Usage

PrefixedBy() is called by DecodeConfigWithPrefix() in retry.go:

func DecodeConfigWithPrefix(c *Config, input interface{}, prefix string) error {
	input, err := config.PrefixedBy(input, prefix)
	if err != nil {
		return err
	}

	return DecodeConfig(c, input)
}

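4.4 - Source code study of retry

Source code study of the Dapr retry package

4.4.1 - Source code study of retry.go

Retry configuration and back-off generation

A source code study of the retry.go file in the Dapr retry package.

Retry policy

There are two policies for the interval between retries: PolicyConstant uses a fixed delay, and PolicyExponential grows the delay exponentially.

// PolicyType denotes if the back off delay should be constant or exponential.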
type PolicyType int

const (
	// PolicyConstant is a backoff policy that always returns the same backoff delay.
	PolicyConstant PolicyType = iota
	// PolicyExponential is a backoff implementation that increases the backoff period
	// for each retry attempt using a randomization function that grows exponentially.
	PolicyExponential
)

Retry configuration

// Config encapsulates the back off policy configuration.
type Config struct {
	Policy PolicyType `mapstructure:"policy"`

	// Constant back off
	Duration time.Duration `mapstructure:"duration"`

	// Exponential back off
	InitialInterval     time.Duration `mapstructure:"initialInterval"`
	RandomizationFactor float32       `mapstructure:"randomizationFactor"`
	Multiplier          float32       `mapstructure:"multiplier"`
	MaxInterval         time.Duration `mapstructure:"maxInterval"`
	MaxElapsedTime      time.Duration `mapstructure:"maxElapsedTime"`

	// Additional options
	MaxRetries int64 `mapstructure:"maxRetries"`
}

Note: every field carries a mapstructure tag so that the struct can be decoded with mapstructure.

The default configuration is:

func DefaultConfig() Config {
	return Config{
		Policy:              PolicyConstant,  // constant interval by default
		Duration:            5 * time.Second, // the default interval is 5 seconds
		InitialInterval:     backoff.DefaultInitialInterval,
		RandomizationFactor: backoff.DefaultRandomizationFactor,
		Multiplier:          backoff.DefaultMultiplier,
		MaxInterval:         backoff.DefaultMaxInterval,
		MaxElapsedTime:      backoff.DefaultMaxElapsedTime,
		MaxRetries:          -1, // retry forever by default
	}
}

The default configuration without retries:

// This may be useful for brokers that can handle retries on their own.
func DefaultConfigWithNoRetry() Config {
	c := DefaultConfig()
	c.MaxRetries = 0 // MaxRetries is set to 0

	return c
}

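Decoding the configuration

DecodeConfig() decodes a generic input (typically a map parsed from a configuration file or from component metadata) into Config: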

func DecodeConfig(c *Config, input interface{}) error {
	// Use the default config if `c` is empty/zero value.
	var emptyConfig Config
	if *c == emptyConfig { // c is a zero-value Config that has never been assigned, so start from the default config
		*c = DefaultConfig()
	}

	return config.Decode(input, c)
}

DecodeConfigWithPrefix() first strips the prefix from the keys and normalizes keys and values, then decodes the input into Config:

func DecodeConfigWithPrefix(c *Config, input interface{}, prefix string) error {
	input, err := config.PrefixedBy(input, prefix) // strip the prefix and normalize keys and values
	if err != nil {
		return err
	}

	return DecodeConfig(c, input)
}

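The DecodeString() method parses the retry policy. Because it has the signature required by StringDecoder, the decodeString hook in the config package can decode a string such as "exponential" into a PolicyType: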

func (p *PolicyType) DecodeString(value string) error {
	switch strings.ToLower(value) {
	case "constant":
		*p = PolicyConstant
	case "exponential":
		*p = PolicyExponential
	default:
		return errors.Errorf("unexpected back off policy type: %s", value)
	}

	return nil
}

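Generating back-off intervals

NewBackOff() returns a BackOff instance that can be used directly with NotifyRecover or backoff.RetryNotify. The instance does not stop when a context is canceled; to support cancellation (recommended), use NewBackOffWithContext. Because the underlying back-off implementations are not always thread-safe, NewBackOff or NewBackOffWithContext should be called for each use of NotifyRecover or backoff.RetryNotify.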

func (c *Config) NewBackOff() backoff.BackOff {
	var b backoff.BackOff
	switch c.Policy {
	case PolicyConstant: // 1. constant policy: simply use the configured interval (5 seconds by default)
		b = backoff.NewConstantBackOff(c.Duration)
	case PolicyExponential: // 2. exponential policy: delegate to the backoff library and pass the settings through
		eb := backoff.NewExponentialBackOff()
		eb.InitialInterval = c.InitialInterval
		eb.RandomizationFactor = float64(c.RandomizationFactor)
		eb.Multiplier = float64(c.Multiplier)
		eb.MaxInterval = c.MaxInterval
		eb.MaxElapsedTime = c.MaxElapsedTime
		b = eb
	}

	if c.MaxRetries >= 0 {
		b = backoff.WithMaxRetries(b, uint64(c.MaxRetries))
	}

	return b
}
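To give a feel for what the exponential policy produces, here is a deliberately simplified sketch that lists the delays for a given initial interval, multiplier, and maximum interval. It is not the backoff library's implementation: randomization and MaxElapsedTime are left out, and the function names and the values used in demoIntervals are assumptions for this example.

```go
package main

import (
	"fmt"
	"time"
)

// nextIntervals lists the delays of a simplified exponential back-off:
// each delay is the previous one times multiplier, capped at max.
// Randomization is intentionally omitted.
func nextIntervals(initial, max time.Duration, multiplier float64, n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	cur := initial
	for i := 0; i < n; i++ {
		out = append(out, cur)
		next := time.Duration(float64(cur) * multiplier)
		if next > max {
			next = max
		}
		cur = next
	}
	return out
}

// demoIntervals uses made-up settings: start at 500ms, double each time, cap at 4s.
func demoIntervals() []time.Duration {
	return nextIntervals(500*time.Millisecond, 4*time.Second, 2, 5)
}

func main() {
	fmt.Println(demoIntervals()) // [500ms 1s 2s 4s 4s]
}
```

With PolicyConstant the same list would simply repeat Duration, and a non-negative MaxRetries cuts the sequence off after that many retries.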

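NewBackOffWithContext() returns a BackOff instance for use with NotifyRecover or backoff.RetryNotify directly. If the provided context is canceled, retrying is canceled as well.

Because the underlying back-off implementations are not always thread-safe, NewBackOff or NewBackOffWithContext should be called for each use of NotifyRecover or backoff.RetryNotify.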

func (c *Config) NewBackOffWithContext(ctx context.Context) backoff.BackOff {
	b := c.NewBackOff()

	return backoff.WithContext(b, ctx)
}

Recovery notification

The standard backoff.RetryNotify is defined as follows:

func RetryNotify(operation Operation, b BackOff, notify Notify) error {
   return RetryNotifyWithTimer(operation, b, notify, nil)
}

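// Operation is executed by Retry() or RetryNotify().
// The operation will be retried using a backoff policy if it returns an error.
type Operation func() error
// Notify is a notify-on-error function.
// It receives an operation error and the backoff delay if the operation failed (with an error).
// Note that if the backoff policy states to stop retrying, the notify function is not called.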
type Notify func(error, time.Duration)

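If a problem needs several retries before it recovers, using RetryNotify directly has two drawbacks:

Notify() is called many times, once per failed attempt.
It is hard to tell when the operation has recovered: "recovery" means one or more consecutive failures followed by a success (the first attempt that no longer fails).

NotifyRecover() is a wrapper around backoff.RetryNotify that adds another callback for the case where an operation failed earlier but later recovered. The main purpose of the wrapper is to call "notify" only when the operation fails for the first time and "recovered" when it finally succeeds. This helps limit log messages to the events that operators need to be alerted on.

NotifyRecover() wraps both the Operation and the Notify functions: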

func NotifyRecover(operation backoff.Operation, b backoff.BackOff, notify backoff.Notify, recovered func()) error {
	var notified bool

	return backoff.RetryNotify(func() error {
		err := operation()

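		// notified == true means notify has already fired, i.e. one or more consecutive errors occurred.
		// err == nil means operation no longer fails.
		// Only the combination of the two counts as a "recovery".
		if err == nil && notified {
			notified = false // reset notified so that later successes of operation() do not trigger recovered() again
			recovered()      // the condition is met, trigger recovered() once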
		}

		return err
	}, b, func(err error, d time.Duration) {
		if !notified { // call the real notify() only on the first failure and ignore the rest
			notify(err, d)
			notified = true
		}
	})
}
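The behavior can be checked with a small self-contained sketch. Here retryNotify is a simplified stand-in for backoff.RetryNotify (a fixed number of attempts, no sleeping), and notifyRecover applies the same wrapping as the code above. Both names, and the run helper, are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// retryNotify is a stand-in for backoff.RetryNotify: it retries operation up to
// maxAttempts times without sleeping and calls notify after each failure that
// will be followed by another attempt.
func retryNotify(operation func() error, maxAttempts int, notify func(error)) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = operation(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			notify(err)
		}
	}
	return err
}

// notifyRecover applies the same wrapping idea as NotifyRecover above.
func notifyRecover(operation func() error, maxAttempts int, notify func(error), recovered func()) error {
	var notified bool
	return retryNotify(func() error {
		err := operation()
		if err == nil && notified {
			notified = false
			recovered()
		}
		return err
	}, maxAttempts, func(err error) {
		if !notified {
			notify(err)
			notified = true
		}
	})
}

// run simulates an operation that fails `failures` times before succeeding and
// reports how many times each callback fired.
func run(failures int) (notifies, recoveries int, err error) {
	calls := 0
	err = notifyRecover(func() error {
		calls++
		if calls <= failures {
			return errors.New("temporary failure")
		}
		return nil
	}, 10, func(error) { notifies++ }, func() { recoveries++ })
	return notifies, recoveries, err
}

func main() {
	n, r, err := run(3)
	fmt.Println(n, r, err) // 1 1 <nil>
}
```

Three consecutive failures produce a single notify call and a single recovered call, and an operation that succeeds on the first attempt triggers neither.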

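Note: the variable name notified is not very clear. Its meaning is not really "has a notification been sent" but "an error has occurred and has not yet been recovered from". A name such as errorNotRecovered would express that more clearly.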

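5 - Source code study of the dapr repository

Dapr source code study: the dapr repository

The code in the dapr repository:

https://github.com/dapr/dapr

5.1 - Source code study of utility code

Source code study of Dapr utility code

Utility code is code used purely as a tool. It usually sits at the very bottom of the call chain, carries no Dapr-specific logic, and focuses on one particular function that higher-level code builds on.

Utility code is at the lowest level of the code dependency graph.

5.1.1 - Source code study of concurrency

Source code study of the Dapr concurrency package

The concurrency package is small; for now it contains only limiter.go.

5.1.1.1 - Source code study of limiter.go

Implementation and usage of the concurrency limiter

A source code study of the limiter.go file in the Dapr concurrency package: the implementation and usage of the concurrency limiter.

Key point: it makes full use of the properties of Go channels.

Implementation

Limiter struct definition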

// Limiter object
type Limiter struct {
   limit         int
   tickets       chan int
   numInProgress int32
}

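Field descriptions:

limit: the maximum number of concurrent jobs. This is a configuration value, 100 by default, and it is not modified after initialization.
tickets: a Go channel used to hold and hand out tickets.
numInProgress: the number of jobs currently running; this is live state.

Building the Limiter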

const (
   // DefaultLimit is the default concurrency limit
   DefaultLimit = 100
)

// NewLimiter allocates a new ConcurrencyLimiter
func NewLimiter(limit int) *Limiter {
   if limit <= 0 {
      limit = DefaultLimit
   }

   // allocate a limiter instance
   c := &Limiter{
      limit:   limit,
      // the capacity of the tickets channel is set to limit
      tickets: make(chan int, limit),
   }

   // allocate the tickets:
   // start with as many available tickets as limit
   for i := 0; i < c.limit; i++ {
      c.tickets <- i
   }

   return c
}

The Execute method

// Execute adds a function to the execution queue.
// if num of go routines allocated by this instance is < limit
// launch a new go routine to execute job
// else wait until a go routine becomes available
func (c *Limiter) Execute(job func(param interface{}), param interface{}) int {
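   // Take a ticket from the channel.
   // If a ticket is available, the number of goroutines has not reached limit yet, so another goroutine can be started to run the job.
   // If no ticket is available, the number of goroutines has reached limit and throttling kicks in: Execute blocks here until a running job finishes and releases its ticket.
   ticket := <-c.tickets
   // Once a ticket is acquired, atomically increment numInProgress.
   atomic.AddInt32(&c.numInProgress, 1)
   // Start a goroutine to run the job.
   go func(param interface{}) {
      // Use defer to clean up after the job completes.
      defer func() {
         // Return the ticket to the channel so that later jobs can acquire it.
         c.tickets <- ticket
         // Atomically decrement numInProgress.
         atomic.AddInt32(&c.numInProgress, -1)
      }()

      // Run the job.
      job(param)
   }(param)

   // Return the ticket number that was used.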
   return ticket
}

The Wait method

Wait blocks until every goroutine that obtained a ticket through Execute() has finished.

// Wait blocks until all the previously Executed jobs have completed running.
//
// IMPORTANT: calling the Wait function while keep calling Execute leads to
//            un-desired race conditions
func (c *Limiter) Wait() {
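   // Read every ticket back from the channel; whenever a job releases a ticket, Wait competes for it.
   // Once Wait holds all `limit` tickets, every job has finished, and no further job can obtain a ticket.
   // Wait is not guaranteed to win each ticket right away: if Execute() is still being called, a new job may grab the ticket first, and Wait has to compete again when that job finishes and releases it.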
   for i := 0; i < c.limit; i++ {
      <-c.tickets
   }
}

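Usage

Throttling batch operations that run in parallel

The GetBulkState() methods in pkg/grpc/api.go and pkg/http/api.go use the limiter to cap the concurrency of a bulk operation:

// Build the limiter; the limit comes from the Parallelism field of the request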
limiter := concurrency.NewLimiter(int(in.Parallelism))
n := len(reqs)
for i := 0; i < n; i++ {
   fn := func(param interface{}) {
		......
   }
   // submit the job to the limiter
   limiter.Execute(fn, &reqs[i])
}

// wait for all jobs to finish
limiter.Wait()

The actor code contains a similar pattern:

limiter := concurrency.NewLimiter(actorMetadata.RemindersMetadata.PartitionCount)
for i := range getRequests {
    fn := func(param interface{}) {
    	......
    }
    limiter.Execute(fn, &bulkResponse[i])
}
limiter.Wait()
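The ticket mechanism can be tried out in isolation. The sketch below is a condensed re-implementation of the same idea (not the Dapr package itself) plus a hypothetical helper, peakConcurrency, that records the highest number of jobs observed running at once.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// limiter is a condensed re-implementation of the ticket idea shown above.
type limiter struct {
	limit   int
	tickets chan int
}

func newLimiter(limit int) *limiter {
	l := &limiter{limit: limit, tickets: make(chan int, limit)}
	for i := 0; i < limit; i++ {
		l.tickets <- i
	}
	return l
}

// execute blocks while `limit` jobs are already running.
func (l *limiter) execute(job func()) {
	ticket := <-l.tickets
	go func() {
		defer func() { l.tickets <- ticket }()
		job()
	}()
}

// wait returns once all tickets have been collected, i.e. all jobs are done.
func (l *limiter) wait() {
	for i := 0; i < l.limit; i++ {
		<-l.tickets
	}
}

// peakConcurrency runs `jobs` short jobs through a limiter and returns the
// highest number of jobs observed running at the same time.
func peakConcurrency(limit, jobs int) int32 {
	var running, peak int32
	l := newLimiter(limit)
	for i := 0; i < jobs; i++ {
		l.execute(func() {
			n := atomic.AddInt32(&running, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond)
			atomic.AddInt32(&running, -1)
		})
	}
	l.wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(peakConcurrency(3, 20) <= 3) // true
}
```

One consequence of this design is that wait drains the channel without refilling it, so a limiter is effectively single-use: calling execute again after wait would block. This matches the usage above, where a new limiter is created for each bulk operation.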

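5.2 - Source code study of library helper code

Source code study of Dapr library helper code

Library helper code wraps third-party libraries to make them more convenient to use. It also usually sits near the bottom of the call chain and focuses on one specific area, possibly with a small amount of Dapr logic mixed in.

Library helper code is at the second-lowest level of the dependency graph, just one level above utility code.

5.2.1 - Source code study of grpc

Source code study of the Dapr grpc package

5.2.1.1 - Source code study of util.go

Currently it has only two functions, which convert state option types

A source code analysis of the util.go file in the Dapr grpc package. Currently it has only two functions, which convert state option types.

The stateConsistencyToString function

stateConsistencyToString converts a StateOptions_StateConsistency to a string: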

func stateConsistencyToString(c commonv1pb.StateOptions_StateConsistency) string {
	switch c {
	case commonv1pb.StateOptions_CONSISTENCY_EVENTUAL:
		return "eventual"
	case commonv1pb.StateOptions_CONSISTENCY_STRONG:
		return "strong"
	}

	return ""
}

The stateConcurrencyToString function

stateConcurrencyToString converts a StateOptions_StateConcurrency to a string:

func stateConcurrencyToString(c commonv1pb.StateOptions_StateConcurrency) string {
	switch c {
	case commonv1pb.StateOptions_CONCURRENCY_FIRST_WRITE:
		return "first-write"
	case commonv1pb.StateOptions_CONCURRENCY_LAST_WRITE:
		return "last-write"
	}

	return ""
}

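5.2.1.2 - Source code study of port.go

It has a single function, GetFreePort, which obtains a free port.

A source code analysis of the port.go file in the Dapr grpc package. It has a single function, GetFreePort, which obtains a free port.

The GetFreePort function

GetFreePort asks the operating system for a free, available port: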

// GetFreePort returns a free port from the OS
func GetFreePort() (int, error) {
	addr, err := net.ResolveTCPAddr("tcp", "localhost:0")
	if err != nil {
		return 0, err
	}

	l, err := net.ListenTCP("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

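Setting the port to 0 lets the operating system pick an available port automatically. Note that the listener must be closed before returning, otherwise the port stays occupied.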
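A short usage sketch follows, with a local copy of the function (renamed getFreePort) so that it compiles on its own. It also illustrates a caveat of this technique: the port is released when the function returns, so another process could in principle claim it before the caller binds to it.

```go
package main

import (
	"fmt"
	"net"
)

// getFreePort is a local copy of the GetFreePort logic shown above.
func getFreePort() (int, error) {
	addr, err := net.ResolveTCPAddr("tcp", "localhost:0")
	if err != nil {
		return 0, err
	}
	l, err := net.ListenTCP("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := getFreePort()
	if err != nil {
		fmt.Println("no free port:", err)
		return
	}
	// The port was released when getFreePort returned, so another process
	// could claim it before the next line runs.
	l, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", port))
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
}
```

This is presumably how the random port for the Dapr internal gRPC API is chosen when dapr-internal-grpc-port is not set, although the call site is not shown in this section.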

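5.2.1.3 - Source code study of dial.go

Currently it has a single function, which returns the address prefix used when dialing

A source code analysis of the dial.go file in the Dapr grpc package. Currently it has a single function, which returns the address prefix used when dialing.

The GetDialAddressPrefix function

GetDialAddressPrefix returns the dial prefix for gRPC client connections for a given DaprMode: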

// GetDialAddressPrefix returns a dial prefix for a gRPC client connections
// For a given DaprMode.
func GetDialAddressPrefix(mode modes.DaprMode) string {
	if runtime.GOOS == "windows" {
		return ""
	}

	switch mode {
	case modes.KubernetesMode:
		return "dns:///"
	default:
		return ""
	}
}

Note: in Kubernetes mode it returns "dns:///", except on Windows, where the prefix is always empty.

It is called only from GetGRPCConnection() in grpc.go:

// GetGRPCConnection returns a new grpc connection for a given address and inits one if doesn't exist
func (g *Manager) GetGRPCConnection(address, id string, namespace string, skipTLS, recreateIfExists, sslEnabled bool) (*grpc.ClientConn, error) {
    dialPrefix := GetDialAddressPrefix(g.mode)
    ......
    conn, err := grpc.DialContext(ctx, dialPrefix+address, opts...)
    ......
}

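5.3 - Source code study of foundation code

Source code study of Dapr foundation code

Foundation code is the most basic part of the Dapr code base. It is already part of Dapr's own logic, but it sits at a fairly low level, is not on Dapr's main path, and is usually small.

In the dependency graph, foundation code sits above utility code and library helper code.

5.3.1 - Source code study of version

Source code study of the Dapr version package

Implementation

The version code is very simple: a single version.go with only a few lines: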

// Values for these are injected by the build.
var (
   version = "edge"
   commit  string
)

// Version returns the Dapr version. This is either a semantic version
// number or else, in the case of unreleased code, the string "edge".
func Version() string {
   return version
}

// Commit returns the git commit SHA for the code that Dapr was built from.
func Commit() string {
   return commit
}

version: either a semantic version such as 1.0.0, or edge for unreleased code
commit: the git commit at build time

How the values are injected

Values for these are injected by the build.

So how are they injected? The build cannot call code, and both variables are private.

The answer is in the Makefile of the dapr repository:

# git rev-list -1 HEAD yields the hash of the git commit
# e.g. 63147334aa246d76f9f65708c257460567a1cff4
GIT_COMMIT  = $(shell git rev-list -1 HEAD)
# git describe --always --abbrev=7 --dirty yields the version description
# e.g. v1.0.0-rc.4-5-g6314733
GIT_VERSION = $(shell git describe --always --abbrev=7 --dirty)

ifdef REL_VERSION
   DAPR_VERSION := $(REL_VERSION)
else
   DAPR_VERSION := edge
endif

BASE_PACKAGE_NAME := github.com/dapr/dapr

DEFAULT_LDFLAGS:=-X $(BASE_PACKAGE_NAME)/pkg/version.commit=$(GIT_VERSION) -X $(BASE_PACKAGE_NAME)/pkg/version.version=$(DAPR_VERSION)

ifeq ($(origin DEBUG), undefined)
  BUILDTYPE_DIR:=release
  LDFLAGS:="$(DEFAULT_LDFLAGS) -s -w"
else ifeq ($(DEBUG),0)
  BUILDTYPE_DIR:=release
  LDFLAGS:="$(DEFAULT_LDFLAGS) -s -w"
else
  BUILDTYPE_DIR:=debug
  GCFLAGS:=-gcflags="all=-N -l"
  LDFLAGS:="$(DEFAULT_LDFLAGS)"
  $(info Build with debugger information)
endif

define genBinariesForTarget
.PHONY: $(5)/$(1)
$(5)/$(1):
	CGO_ENABLED=$(CGO) GOOS=$(3) GOARCH=$(4) go build $(GCFLAGS) -ldflags=$(LDFLAGS) \
	-o $(5)/$(1) $(2)/;
endef

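TODO: study this Makefile in more detail when time permits. The key part is DEFAULT_LDFLAGS: the Go linker flag -X importpath.name=value sets the value of a string variable at link time, which is how version and commit are filled in even though they are unexported. Note that, as written, commit receives GIT_VERSION (the git describe output) rather than GIT_COMMIT.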
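The linker mechanism can be reproduced in a single file. The sketch below is a standalone program, not the Dapr version package; because it lives in package main, the import path used with -X is main rather than github.com/dapr/dapr/pkg/version. The build command in the comment uses made-up values.

```go
package main

import "fmt"

// Defaults that apply when the binary is built without -ldflags.
// Only package-level string variables can be set with -X.
var (
	version = "edge"
	commit  string
)

// Build with injected values, for example:
//
//	go build -ldflags "-X main.version=1.0.0 -X main.commit=abc1234" -o demo .
//
// Without the flags the program reports version=edge and an empty commit.
func main() {
	fmt.Printf("version=%s commit=%q\n", version, commit)
}
```

The variables do not need to be exported, since the linker addresses them by their fully qualified symbol name rather than through Go's visibility rules.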

5.3.2 - Source code study of modes

Source code study of the Dapr modes package

Implementation

The modes code is very simple: a single modes.go with only a few lines:

// DaprMode is the runtime mode for Dapr.
type DaprMode string

const (
	// KubernetesMode is a Kubernetes Dapr mode
	KubernetesMode DaprMode = "kubernetes"
	// StandaloneMode is a Standalone Dapr mode
	StandaloneMode DaprMode = "standalone"
)

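Dapr has two runtime modes:

kubernetes mode
standalone mode

Summary of the runtime modes

Differences between the two modes:

How the configuration is read:
- In standalone mode a local file is read; its path is given by the command-line flag config.
- In kubernetes mode a CRD stored in k8s is read; its name is given by the command-line flag config.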
```
config := flag.String("config", "", "Path to config file, or name of a configuration object")
```
TODO

5.3.3 - Source code study of cors

Source code study of the Dapr cors package

Implementation

The cors code is very simple: a single cors.go with only a few lines:

// DefaultAllowedOrigins is the default origins allowed for the Dapr HTTP servers
const DefaultAllowedOrigins = "*"

Reading the AllowedOrigins configuration

AllowedOrigins is passed in at startup through the command-line flag allowed-origins; its default value is DefaultAllowedOrigins ("*"). It is then handed to NewRuntimeConfig():

func FromFlags() (*DaprRuntime, error) {
	allowedOrigins := flag.String("allowed-origins", cors.DefaultAllowedOrigins, "Allowed HTTP origins")

	runtimeConfig := NewRuntimeConfig(*appID, placementAddresses, *controlPlaneAddress, *allowedOrigins ......)
}

It is then stored in the AllowedOrigins field of the Config returned by NewRuntimeConfig():

func NewRuntimeConfig(
   id string, placementAddresses []string,
   controlPlaneAddress, allowedOrigins ......) *Config {
   return &Config{
   	AllowedOrigins:      allowedOrigins,
   	......
   }
}

Using the AllowedOrigins configuration

The useCors() method in pkg/http/server.go:

func (s *server) useCors(next fasthttp.RequestHandler) fasthttp.RequestHandler {
   if s.config.AllowedOrigins == cors_dapr.DefaultAllowedOrigins {
      return next
   }

   log.Infof("enabled cors http middleware")
   origins := strings.Split(s.config.AllowedOrigins, ",")
   corsHandler := s.getCorsHandler(origins)
   return corsHandler.CorsMiddleware(next)
}
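The decision made by useCors() can be isolated into a few lines. In the sketch below, corsOrigins is a hypothetical helper that reports whether a CORS middleware would be installed and for which origins; the example origins are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultAllowedOrigins mirrors cors.DefaultAllowedOrigins.
const defaultAllowedOrigins = "*"

// corsOrigins reports whether a CORS middleware is needed and, if so,
// the list of origins parsed from the comma-separated setting.
func corsOrigins(allowed string) ([]string, bool) {
	if allowed == defaultAllowedOrigins {
		// Default setting: the handler chain is left untouched.
		return nil, false
	}
	return strings.Split(allowed, ","), true
}

func main() {
	fmt.Println(corsOrigins("*"))
	fmt.Println(corsOrigins("http://a.example,http://b.example"))
}
```

Since strings.Split does not trim whitespace, a value such as "http://a.example, http://b.example" would yield a second origin with a leading space, so the flag value is best written without spaces.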

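5.3.4 - Source code study of proto

Source code study of the Dapr proto package

5.3.5 - Source code study of config

Source code study of the Dapr config package

5.3.6 - Source code study of credentials

Source code study of the Dapr credentials package

5.3.6.1 - Source code study of certchain.go

The CertChain struct holds the PEM values of the certificate trust chain

A source code study of the certchain.go file in the Dapr credentials package. The CertChain struct holds the PEM values of the certificate trust chain.

CertChain struct definition

The CertChain struct holds the PEM values of the certificate trust chain: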

// CertChain holds the certificate trust chain PEM values
type CertChain struct {
	RootCA []byte
	Cert   []byte
	Key    []byte
}

The LoadFromDisk function for loading certificates

LoadFromDisk reads a CertChain from the given file paths:

// LoadFromDisk returns a CertChain from a given directory
func LoadFromDisk(rootCertPath, issuerCertPath, issuerKeyPath string) (*CertChain, error) {
   rootCert, err := ioutil.ReadFile(rootCertPath)
   if err != nil {
      return nil, err
   }
   cert, err := ioutil.ReadFile(issuerCertPath)
   if err != nil {
      return nil, err
   }
   key, err := ioutil.ReadFile(issuerKeyPath)
   if err != nil {
      return nil, err
   }
   return &CertChain{
      RootCA: rootCert,
      Cert:   cert,
      Key:    key,
   }, nil
}

Usage

In placement's main.go, the TLS certificates are loaded when mTLS is enabled:

func loadCertChains(certChainPath string) *credentials.CertChain {
   tlsCreds := credentials.NewTLSCredentials(certChainPath)

   log.Info("mTLS enabled, getting tls certificates")
   // try to load certs from disk, if not yet there, start a watch on the local filesystem
   chain, err := credentials.LoadFromDisk(tlsCreds.RootCertPath(), tlsCreds.CertPath(), tlsCreds.KeyPath())
	......
}

The operator does the same in operator.go when MTLSEnabled is set:

var certChain *credentials.CertChain
if o.config.MTLSEnabled {
   log.Info("mTLS enabled, getting tls certificates")
   // try to load certs from disk, if not yet there, start a watch on the local filesystem
   chain, err := credentials.LoadFromDisk(o.config.Credentials.RootCertPath(), o.config.Credentials.CertPath(), o.config.Credentials.KeyPath())
   ......
}

Note: the two snippets above are nearly identical and would be worth refactoring.

sentry calls it as well:

func (c *defaultCA) validateAndBuildTrustBundle() (*trustRootBundle, error) {
	var (
		issuerCreds     *certs.Credentials
		rootCertBytes   []byte
		issuerCertBytes []byte
	)

	// certs exist on disk or getting created, load them when ready
	if !shouldCreateCerts(c.config) {
		err := detectCertificates(c.config.RootCertPath)
		if err != nil {
			return nil, err
		}

		certChain, err := credentials.LoadFromDisk(c.config.RootCertPath, c.config.IssuerCertPath, c.config.IssuerKeyPath)
		if err != nil {
			return nil, errors.Wrap(err, "error loading cert chain from disk")
		}

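TODO: look at the certificate details separately later.

5.3.6.2 - Source code study of credentials.go

The TLSCredentials struct holds the various certificate-related paths

A source code study of the credentials.go file in the Dapr credentials package. The TLSCredentials struct holds the various certificate-related paths.

TLSCredentials struct definition

It has a single field, credentialsPath: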

// TLSCredentials holds paths for credentials
type TLSCredentials struct {
   credentialsPath string
}

The constructor is straightforward:

// NewTLSCredentials returns a new TLSCredentials
func NewTLSCredentials(path string) TLSCredentials {
   return TLSCredentials{
      credentialsPath: path,
   }
}

Methods for getting the paths

Path() returns credentialsPath, the directory that holds the TLS credentials:

// Path returns the directory holding the TLS credentials
func (t *TLSCredentials) Path() string {
   return t.credentialsPath
}

Getting the paths of the root cert, the cert, and the cert key:

// RootCertPath returns the file path for the root cert
func (t *TLSCredentials) RootCertPath() string {
   return filepath.Join(t.credentialsPath, RootCertFilename)
}

// CertPath returns the file path for the cert
func (t *TLSCredentials) CertPath() string {
   return filepath.Join(t.credentialsPath, IssuerCertFilename)
}

// KeyPath returns the file path for the cert key
func (t *TLSCredentials) KeyPath() string {
   return filepath.Join(t.credentialsPath, IssuerKeyFilename)
}
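As a rough illustration of how these accessors resolve, the sketch below rebuilds the struct with lowercase names. The three file-name constants are not shown in this section, so the values used here (ca.crt, issuer.crt, issuer.key) and the directory are assumptions for the example and may differ from the real constants in the credentials package.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// File names assumed for this sketch; the real constants live in the
// credentials package and are not shown in this document.
const (
	rootCertFilename   = "ca.crt"
	issuerCertFilename = "issuer.crt"
	issuerKeyFilename  = "issuer.key"
)

// tlsCredentials mirrors TLSCredentials: it only remembers a directory.
type tlsCredentials struct {
	credentialsPath string
}

func (t tlsCredentials) rootCertPath() string {
	return filepath.Join(t.credentialsPath, rootCertFilename)
}

func (t tlsCredentials) certPath() string {
	return filepath.Join(t.credentialsPath, issuerCertFilename)
}

func (t tlsCredentials) keyPath() string {
	return filepath.Join(t.credentialsPath, issuerKeyFilename)
}

func main() {
	// Hypothetical directory.
	c := tlsCredentials{credentialsPath: "/var/run/dapr/credentials"}
	fmt.Println(c.rootCertPath())
	fmt.Println(c.certPath())
	fmt.Println(c.keyPath())
}
```

These three paths are exactly the arguments that the placement and operator snippets above pass to LoadFromDisk().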

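5.3.6.3 - Source code study of tls.go

Building a tls.Config object from a cert/key pair

A source code study of the tls.go file in the Dapr credentials package: building a tls.Config object from a cert/key pair.

The TLSConfigFromCertAndKey() function

TLSConfigFromCertAndKey() returns a tls.Config object from a valid cert/key pair in PEM format: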

// TLSConfigFromCertAndKey return a tls.config object from valid cert/key pair in PEM format.
func TLSConfigFromCertAndKey(certPem, keyPem []byte, serverName string, rootCA *x509.CertPool) (*tls.Config, error) {
	cert, err := tls.X509KeyPair(certPem, keyPem)
	if err != nil {
		return nil, err
	}

	// nolint:gosec
	config := &tls.Config{
		InsecureSkipVerify: false,
		RootCAs:            rootCA,
		ServerName:         serverName,
		Certificates:       []tls.Certificate{cert},
	}

	return config, nil
}

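5.3.6.4 - Source code study of grpc.go

Getting the server options and the client options

A source code study of the grpc.go file in the Dapr credentials package: getting the gRPC server options and client options.

The GetServerOptions() function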

func GetServerOptions(certChain *CertChain) ([]grpc.ServerOption, error) {
	opts := []grpc.ServerOption{}
	if certChain == nil {
		return opts, nil
	}

	cp := x509.NewCertPool()
	cp.AppendCertsFromPEM(certChain.RootCA)

	cert, err := tls.X509KeyPair(certChain.Cert, certChain.Key)
	if err != nil {
		// Note: the error is discarded here, so the caller receives options without TLS and a nil error.
		return opts, nil
	}

	// nolint:gosec
	config := &tls.Config{
		ClientCAs: cp,
		// Require cert verification
		ClientAuth:   tls.RequireAndVerifyClientCert,
		Certificates: []tls.Certificate{cert},
	}
	opts = append(opts, grpc.Creds(credentials.NewTLS(config)))

	return opts, nil
}

The GetClientOptions() function

func GetClientOptions(certChain *CertChain, serverName string) ([]grpc.DialOption, error) {
	opts := []grpc.DialOption{}
	if certChain != nil {
		cp := x509.NewCertPool()
		ok := cp.AppendCertsFromPEM(certChain.RootCA)
		if !ok {
			return nil, errors.New("failed to append PEM root cert to x509 CertPool")
		}
		config, err := TLSConfigFromCertAndKey(certChain.Cert, certChain.Key, serverName, cp)
		if err != nil {
			return nil, errors.Wrap(err, "failed to create tls config from cert and key")
		}
		opts = append(opts, grpc.WithTransportCredentials(credentials.NewTLS(config)))
	} else {
		opts = append(opts, grpc.WithInsecure())
	}
	return opts, nil
}

TODO: the details will be studied later; I am not familiar with the encryption side.

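5.4 - Source code study of the runtime

Source code study of the Dapr runtime

5.4.1 - Source code study of options.go

Used to customize the components included in the runtime

A source code study of the options.go file in the Dapr runtime package, which is used to customize the components included in the runtime.

runtimeOpts struct definition

runtimeOpts encapsulates the components to include in the runtime: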

type (
	// runtimeOpts encapsulates the components to include in the runtime.
	runtimeOpts struct {
		secretStores    []secretstores.SecretStore
		states          []state.State
		pubsubs         []pubsub.PubSub
		nameResolutions []nameresolution.NameResolution
		inputBindings   []bindings.InputBinding
		outputBindings  []bindings.OutputBinding
		httpMiddleware  []http.Middleware
	}
)

The Option type

Option is a function type used to customize the runtime:

type (
	// Option is a function that customizes the runtime.
	Option func(o *runtimeOpts)
)

The With... functions for customizing the runtime

Several WithXxx() functions are provided to customize the runtime's components:


// WithSecretStores adds secret store components to the runtime.
func WithSecretStores(secretStores ...secretstores.SecretStore) Option {
	return func(o *runtimeOpts) {
		o.secretStores = append(o.secretStores, secretStores...)
	}
}

// WithStates adds state store components to the runtime.
func WithStates(states ...state.State) Option {
	return func(o *runtimeOpts) {
		o.states = append(o.states, states...)
	}
}

// WithPubSubs adds pubsub store components to the runtime.
func WithPubSubs(pubsubs ...pubsub.PubSub) Option {
	return func(o *runtimeOpts) {
		o.pubsubs = append(o.pubsubs, pubsubs...)
	}
}

// WithNameResolutions adds name resolution components to the runtime.
func WithNameResolutions(nameResolutions ...nameresolution.NameResolution) Option {
	return func(o *runtimeOpts) {
		o.nameResolutions = append(o.nameResolutions, nameResolutions...)
	}
}

// WithInputBindings adds input binding components to the runtime.
func WithInputBindings(inputBindings ...bindings.InputBinding) Option {
	return func(o *runtimeOpts) {
		o.inputBindings = append(o.inputBindings, inputBindings...)
	}
}

// WithOutputBindings adds output binding components to the runtime.
func WithOutputBindings(outputBindings ...bindings.OutputBinding) Option {
	return func(o *runtimeOpts) {
		o.outputBindings = append(o.outputBindings, outputBindings...)
	}
}

// WithHTTPMiddleware adds HTTP middleware components to the runtime.
func WithHTTPMiddleware(httpMiddleware ...http.Middleware) Option {
	return func(o *runtimeOpts) {
		o.httpMiddleware = append(o.httpMiddleware, httpMiddleware...)
	}
}

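Each of these functions simply appends to the corresponding component field of the runtimeOpts struct; the collected values are used later when the runtime is initialized.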
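This is the functional options pattern. The sketch below shows how such options are typically consumed; it is a reduced model that uses plain strings in place of the component types, and newRuntimeOpts is a hypothetical helper, since the code that applies the options is not shown in this section.

```go
package main

import "fmt"

// runtimeOpts is a reduced model with strings standing in for components.
type runtimeOpts struct {
	states  []string
	pubsubs []string
}

// Option customizes runtimeOpts, as in the code above.
type Option func(o *runtimeOpts)

// WithStates appends state store names.
func WithStates(states ...string) Option {
	return func(o *runtimeOpts) {
		o.states = append(o.states, states...)
	}
}

// WithPubSubs appends pubsub names.
func WithPubSubs(pubsubs ...string) Option {
	return func(o *runtimeOpts) {
		o.pubsubs = append(o.pubsubs, pubsubs...)
	}
}

// newRuntimeOpts applies every option, in order, to an empty runtimeOpts.
func newRuntimeOpts(opts ...Option) runtimeOpts {
	var o runtimeOpts
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	o := newRuntimeOpts(WithStates("redis"), WithPubSubs("kafka"), WithStates("etcd"))
	fmt.Println(o.states, o.pubsubs) // [redis etcd] [kafka]
}
```

Because each option appends rather than overwrites, the same WithXxx() function can be passed several times and the results accumulate.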

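5.4.2 - Source code study of config.go

Defines the runtime Config struct, its default values, and its constructor

A source code study of the config.go file in the Dapr runtime package, which defines the runtime Config struct, its default values, and its constructor.

Constant definitions

Protocol; currently only http and grpc are supported: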

// Protocol is a communications protocol
type Protocol string

const (
	// GRPCProtocol is a gRPC communication protocol
	GRPCProtocol Protocol = "grpc"
	// HTTPProtocol is a HTTP communication protocol
	HTTPProtocol Protocol = "http"
)

Default values for the various ports:

const (
	// DefaultDaprHTTPPort is the default http port for Dapr
	DefaultDaprHTTPPort = 3500
	// DefaultDaprAPIGRPCPort is the default API gRPC port for Dapr
	DefaultDaprAPIGRPCPort = 50001
	// DefaultProfilePort is the default port for profiling endpoints
	DefaultProfilePort = 7777
	// DefaultMetricsPort is the default port for metrics endpoints
	DefaultMetricsPort = 9090
)

Default HTTP settings; for now there is only MaxRequestBodySize:

const (
	// DefaultMaxRequestBodySize is the default option for the maximum body size in MB for Dapr HTTP servers
	DefaultMaxRequestBodySize = 4
)

Config 结构体

// Config holds the Dapr Runtime configuration
type Config struct {
	ID                   string
	HTTPPort             int
	ProfilePort          int
	EnableProfiling      bool
	APIGRPCPort          int
	InternalGRPCPort     int
	ApplicationPort      int
	ApplicationProtocol  Protocol
	Mode                 modes.DaprMode
	PlacementAddresses   []string
	GlobalConfig         string
	AllowedOrigins       string
	Standalone           config.StandaloneConfig
	Kubernetes           config.KubernetesConfig
	MaxConcurrency       int
	mtlsEnabled          bool
	SentryServiceAddress string
	CertChain            *credentials.CertChain
	AppSSL               bool
	MaxRequestBodySize   int
}

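This is a bit messy: all fields are flat, and the list will only keep growing.

Building the Config

The constructor builds the Config struct by plain assignment; the parameter list is really too long: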

// NewRuntimeConfig returns a new runtime config
func NewRuntimeConfig(
   id string, placementAddresses []string,
   controlPlaneAddress, allowedOrigins, globalConfig, componentsPath, appProtocol, mode string,
   httpPort, internalGRPCPort, apiGRPCPort, appPort, profilePort int,
   enableProfiling bool, maxConcurrency int, mtlsEnabled bool, sentryAddress string, appSSL bool, maxRequestBodySize int) *Config {
   return &Config{
      ID:                  id,
      HTTPPort:            httpPort,
      InternalGRPCPort:    internalGRPCPort,
      APIGRPCPort:         apiGRPCPort,
      ApplicationPort:     appPort,
      ProfilePort:         profilePort,
      ApplicationProtocol: Protocol(appProtocol),
      Mode:                modes.DaprMode(mode),
      PlacementAddresses:  placementAddresses,
      GlobalConfig:        globalConfig,
      AllowedOrigins:      allowedOrigins,
      Standalone: config.StandaloneConfig{
         ComponentsPath: componentsPath,
      },
      Kubernetes: config.KubernetesConfig{
         ControlPlaneAddress: controlPlaneAddress,
      },
      EnableProfiling:      enableProfiling,
      MaxConcurrency:       maxConcurrency,
      mtlsEnabled:          mtlsEnabled,
      SentryServiceAddress: sentryAddress,
      AppSSL:               appSSL,
      MaxRequestBodySize:   maxRequestBodySize,
   }
}

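5.4.3 - Source code study of cli.go

Parses command-line flags and returns a DaprRuntime instance

A study of cli.go in the Dapr runtime package, which parses command-line flags and returns a DaprRuntime instance.

cli.go contains essentially a single function, FromFlags().

FromFlags() overview

FromFlags() parses the command-line flags and returns a DaprRuntime instance: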

// FromFlags parses command flags and returns DaprRuntime instance
func FromFlags() (*DaprRuntime, error) {
   ......
   return NewDaprRuntime(runtimeConfig, globalConfig, accessControlList), nil
}

Parsing the command-line flags

General flags

The code is as follows:

mode := flag.String("mode", string(modes.StandaloneMode), "Runtime mode for Dapr")
daprHTTPPort := flag.String("dapr-http-port", fmt.Sprintf("%v", DefaultDaprHTTPPort), "HTTP port for Dapr API to listen on")
daprAPIGRPCPort := flag.String("dapr-grpc-port", fmt.Sprintf("%v", DefaultDaprAPIGRPCPort), "gRPC port for the Dapr API to listen on")
daprInternalGRPCPort := flag.String("dapr-internal-grpc-port", "", "gRPC port for the Dapr Internal API to listen on")
appPort := flag.String("app-port", "", "The port the application is listening on")
profilePort := flag.String("profile-port", fmt.Sprintf("%v", DefaultProfilePort), "The port for the profile server")
appProtocol := flag.String("app-protocol", string(HTTPProtocol), "Protocol for the application: grpc or http")
componentsPath := flag.String("components-path", "", "Path for components directory. If empty, components will not be loaded. Self-hosted mode only")
config := flag.String("config", "", "Path to config file, or name of a configuration object")
appID := flag.String("app-id", "", "A unique ID for Dapr. Used for Service Discovery and state")
controlPlaneAddress := flag.String("control-plane-address", "", "Address for a Dapr control plane")
sentryAddress := flag.String("sentry-address", "", "Address for the Sentry CA service")
placementServiceHostAddr := flag.String("placement-host-address", "", "Addresses for Dapr Actor Placement servers")
allowedOrigins := flag.String("allowed-origins", cors.DefaultAllowedOrigins, "Allowed HTTP origins")
enableProfiling := flag.Bool("enable-profiling", false, "Enable profiling")
runtimeVersion := flag.Bool("version", false, "Prints the runtime version")
appMaxConcurrency := flag.Int("app-max-concurrency", -1, "Controls the concurrency level when forwarding requests to user code")
enableMTLS := flag.Bool("enable-mtls", false, "Enables automatic mTLS for daprd to daprd communication channels")
appSSL := flag.Bool("app-ssl", false, "Sets the URI scheme of the app to https and attempts an SSL connection")
daprHTTPMaxRequestSize := flag.Int("dapr-http-max-request-size", -1, "Increasing max size of request body in MB to handle uploading of big files. By default 4 MB.")

TODO: there should be documentation for the command-line flags; go through them once against that documentation.

Registering the logging-related flags

loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)

Registering the metrics-related flags

metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)

// attaching only metrics-port option
metricsExporter.Options().AttachCmdFlag(flag.StringVar)

Then run the actual parsing:

flag.Parse()

Handling the version flag

If only the version flag was given, the process prints the version information and exits:

runtimeVersion := flag.Bool("version", false, "Prints the runtime version")

if *runtimeVersion {
   fmt.Println(version.Version())
   os.Exit(0)
}

Initializing logging and metrics

Logging initialization

Initialize the loggers from the logger options:

loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)

if *appID == "" {
   return nil, errors.New("app-id parameter cannot be empty")
}

// Apply options to all loggers
loggerOptions.SetAppID(*appID)
if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
   return nil, err
}

Once logging is initialized, log output can be used freely:

log.Infof("starting Dapr Runtime -- version %s -- commit %s", version.Version(), version.Commit())
log.Infof("log level set to: %s", loggerOptions.OutputLevel)

Metrics initialization

Initialize the Dapr metrics exporter:

// Initialize dapr metrics exporter
if err := metricsExporter.Init(); err != nil {
   log.Fatal(err)
}

Parsing the configuration

Parsing the Dapr port settings

dapr-http-port / dapr-grpc-port / profile-port / dapr-internal-grpc-port / app-port:

daprHTTP, err := strconv.Atoi(*daprHTTPPort)
if err != nil {
   return nil, errors.Wrap(err, "error parsing dapr-http-port flag")
}

daprAPIGRPC, err := strconv.Atoi(*daprAPIGRPCPort)
if err != nil {
   return nil, errors.Wrap(err, "error parsing dapr-grpc-port flag")
}

profPort, err := strconv.Atoi(*profilePort)
if err != nil {
   return nil, errors.Wrap(err, "error parsing profile-port flag")
}

var daprInternalGRPC int
if *daprInternalGRPCPort != "" {
   daprInternalGRPC, err = strconv.Atoi(*daprInternalGRPCPort)
   if err != nil {
      return nil, errors.Wrap(err, "error parsing dapr-internal-grpc-port")
   }
} else {
   daprInternalGRPC, err = grpc.GetFreePort()
   if err != nil {
      return nil, errors.Wrap(err, "failed to get free port for internal grpc server")
   }
}

var applicationPort int
if *appPort != "" {
   applicationPort, err = strconv.Atoi(*appPort)
   if err != nil {
      return nil, errors.Wrap(err, "error parsing app-port")
   }
}
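
The port handling above follows one pattern: parse the flag if it is set, otherwise fall back (a random free port for the internal gRPC port). A minimal sketch of that pattern follows. resolvePort and freePort are hypothetical helpers written for this note, and freePort only stands in for grpc.GetFreePort(); it is an assumption about how such a helper can work, not Dapr's implementation.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// freePort asks the OS for an unused TCP port by listening on port 0.
// Assumption: this stands in for grpc.GetFreePort() from the source.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// resolvePort (hypothetical helper) parses a port flag value and falls back
// to a random free port when the flag was left empty.
func resolvePort(flagName, value string) (int, error) {
	if value == "" {
		return freePort()
	}
	port, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("error parsing %s flag: %w", flagName, err)
	}
	return port, nil
}

func main() {
	// An empty value triggers the free-port fallback.
	p, err := resolvePort("dapr-internal-grpc-port", "")
	fmt.Println(p, err)
}
```

Note that the port is only free at the moment it is probed; another process could grab it before the server binds to it.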

Parsing the other settings

Next, settings such as maxRequestBodySize / placementAddresses / concurrency / appProtocol are parsed:

var maxRequestBodySize int
if *daprHTTPMaxRequestSize != -1 {
   maxRequestBodySize = *daprHTTPMaxRequestSize
} else {
   maxRequestBodySize = DefaultMaxRequestBodySize
}

placementAddresses := []string{}
if *placementServiceHostAddr != "" {
   placementAddresses = parsePlacementAddr(*placementServiceHostAddr)
}

var concurrency int
if *appMaxConcurrency != -1 {
   concurrency = *appMaxConcurrency
}

appPrtcl := string(HTTPProtocol)
if *appProtocol != string(HTTPProtocol) {
   appPrtcl = *appProtocol
}

Building the three runtime configurations

Building runtimeConfig

runtimeConfig := NewRuntimeConfig(*appID, placementAddresses, *controlPlaneAddress, *allowedOrigins, *config, *componentsPath,
   appPrtcl, *mode, daprHTTP, daprInternalGRPC, daprAPIGRPC, applicationPort, profPort, *enableProfiling, concurrency, *enableMTLS, *sentryAddress, *appSSL, maxRequestBodySize)

mTLS-related configuration:

if *enableMTLS {
   runtimeConfig.CertChain, err = security.GetCertChain()
   if err != nil {
      return nil, err
   }
}

Building globalConfig

var globalConfig *global_config.Configuration

Depending on the config flag and the Dapr mode, the corresponding configuration is loaded:

config := flag.String("config", "", "Path to config file, or name of a configuration object")

if *config != "" {
   switch modes.DaprMode(*mode) {
      case modes.KubernetesMode:
      client, conn, clientErr := client.GetOperatorClient(*controlPlaneAddress, security.TLSServerName, runtimeConfig.CertChain)
      if clientErr != nil {
         return nil, clientErr
      }
      defer conn.Close()
      namespace = os.Getenv("NAMESPACE")
      globalConfig, configErr = global_config.LoadKubernetesConfiguration(*config, namespace, client)
      case modes.StandaloneMode:
      globalConfig, _, configErr = global_config.LoadStandaloneConfiguration(*config)
   }

   if configErr != nil {
      log.Debugf("Config error: %v", configErr)
   }
}

if configErr != nil {
   log.Fatalf("error loading configuration: %s", configErr)
}

In short: Kubernetes mode reads the CRD, standalone mode reads a local configuration file.

If config is not set, the default global configuration is used:

if globalConfig == nil {
   log.Info("loading default configuration")
   globalConfig = global_config.LoadDefaultConfiguration()
}

Building accessControlList

var accessControlList *global_config.AccessControlList

accessControlList, err = global_config.ParseAccessControlSpec(globalConfig.Spec.AccessControlSpec, string(runtimeConfig.ApplicationProtocol))
if err != nil {
   log.Fatalf(err.Error())
}

Constructing the DaprRuntime instance

Finally the DaprRuntime instance is constructed:

return NewDaprRuntime(runtimeConfig, globalConfig, accessControlList), nil

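5.4.4 - Source code study of the Runtime App Channel

A study of the App Channel in the Dapr runtime

5.4.4.1 - Source code study of channel.go

Defines the AppChannel interface and its methods

A study of channel.go in the Dapr channel package, which defines the AppChannel interface and its methods.

AppChannel is the abstraction for communicating with user code.

The constant DefaultChannelAddress: since Dapr is normally deployed as a sidecar, the default channel address is 127.0.0.1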

const (
   // DefaultChannelAddress is the address that user application listen to
   DefaultChannelAddress = "127.0.0.1"
)

Method definitions:

// AppChannel is an abstraction over communications with user code
type AppChannel interface {
   GetBaseAddress() string
   InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error)
}

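5.4.4.2 - Source code study of grpc_channel.go

The gRPC implementation of AppChannel.

A study of grpc_channel.go in the Dapr channel package, the gRPC implementation of AppChannel.

Channel struct definition

Channel is a concrete AppChannel implementation used to interact with gRPC-based user code.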

// Channel is a concrete AppChannel implementation for interacting with gRPC based user code
type Channel struct {
	// gRPC client connection
	client           *grpc.ClientConn
	// address of the user code (the application)
	baseAddress      string
	// Go channel used to limit concurrency
	ch               chan int
	tracingSpec      config.TracingSpec
	appMetadataToken string
}

Creating the Channel struct

// CreateLocalChannel creates a gRPC connection with user code
func CreateLocalChannel(port, maxConcurrency int, conn *grpc.ClientConn, spec config.TracingSpec) *Channel {
	c := &Channel{
		client:           conn,
		// baseAddress is simply "ip:port"
		baseAddress:      fmt.Sprintf("%s:%d", channel.DefaultChannelAddress, port),
		tracingSpec:      spec,
		appMetadataToken: auth.GetAppToken(),
	}
	if maxConcurrency > 0 {
		// if a concurrency limit is requested, create the Go channel used to enforce it
		c.ch = make(chan int, maxConcurrency)
	}
	return c
}

GetBaseAddress method

// GetBaseAddress returns the application base address
func (g *Channel) GetBaseAddress() string {
   return g.baseAddress
}

This method returns the base address of the app, to which sub-paths can be appended, for example:

func (a *actorsRuntime) startAppHealthCheck(opts ...health.Option) {
	healthAddress := fmt.Sprintf("%s/healthz", a.appChannel.GetBaseAddress())
	ch := health.StartEndpointHealthCheck(healthAddress, opts...)
	......
}

Note: the actor runtime is the only place that uses this method

InvokeMethod method

InvokeMethod calls the user code over gRPC:

// InvokeMethod invokes user code via gRPC
func (g *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
   var rsp *invokev1.InvokeMethodResponse
   var err error

   switch req.APIVersion() {
   case internalv1pb.APIVersion_V1:
      // only the v1 version is supported at the moment
      rsp, err = g.invokeMethodV1(ctx, req)

   default:
      // Reject unsupported version
      // any other version is rejected
      rsp = nil
      err = status.Error(codes.Unimplemented, fmt.Sprintf("Unsupported spec version: %d", req.APIVersion()))
   }

   return rsp, err
}

Implementation of invokeMethodV1()

// invokeMethodV1 calls user applications using daprclient v1
func (g *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
   if g.ch != nil {
      // send an int into ch, equivalent to incrementing the current concurrency count
      g.ch <- 1
   }

   // create an app callback client
   clientV1 := runtimev1pb.NewAppCallbackClient(g.client)
   // convert the internal metadata into gRPC metadata
   grpcMetadata := invokev1.InternalMetadataToGrpcMetadata(ctx, req.Metadata(), true)

   if g.appMetadataToken != "" {
      grpcMetadata.Set(auth.APITokenHeader, g.appMetadataToken)
   }

   // Prepare gRPC Metadata
   ctx = metadata.NewOutgoingContext(context.Background(), grpcMetadata)

   var header, trailer metadata.MD
   // call the user code
   resp, err := clientV1.OnInvoke(ctx, req.Message(), grpc.Header(&header), grpc.Trailer(&trailer))

   if g.ch != nil {
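      // receive an int from ch, equivalent to decrementing the current concurrency count
      // this release is not protected: if the code above panics, the slot is never returned and the counter is off
      // doing the release in a defer would be safer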
      <-g.ch
   }

   var rsp *invokev1.InvokeMethodResponse
   if err != nil {
      // Convert status code
      respStatus := status.Convert(err)
      // Prepare response
      rsp = invokev1.NewInvokeMethodResponse(int32(respStatus.Code()), respStatus.Message(), respStatus.Proto().Details)
   } else {
      rsp = invokev1.NewInvokeMethodResponse(int32(codes.OK), "", nil)
   }

   rsp.WithHeaders(header).WithTrailers(trailer)

   return rsp.WithMessage(resp), nil
}
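
The comment inside invokeMethodV1() suggests releasing the concurrency slot in a defer. A minimal sketch of that idea follows. The limiter type and its methods are hypothetical names introduced for this note, not part of Dapr; the buffered channel plays the same role as Channel.ch.

```go
package main

import "fmt"

// limiter is a hypothetical stand-in for the Channel's concurrency control:
// a buffered channel used as a counting semaphore.
type limiter struct {
	ch chan int
}

// do runs fn while holding one slot. The slot is released in a defer, so it
// is returned even if fn panics.
func (l *limiter) do(fn func()) {
	if l.ch != nil {
		l.ch <- 1
		defer func() { <-l.ch }()
	}
	fn()
}

// inFlight reports how many slots are currently held.
func (l *limiter) inFlight() int {
	return len(l.ch)
}

func main() {
	l := &limiter{ch: make(chan int, 2)}

	func() {
		// Recover so the demo can continue after the simulated failure.
		defer func() { _ = recover() }()
		l.do(func() { panic("user code failed") })
	}()

	fmt.Println("slots held after panic:", l.inFlight())
}
```

With the release in a defer, the slot count is back to zero after the panic. With the release written after the call, as in the source, the slot would stay occupied.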

This method is used in the following places:

actor: callLocalActor() and deactivateActor()
gRPC API: CallLocal()
messaging: invokeLocal() in direct_message
runtime:
- getConfigurationHTTP()
- isAppSubscribedToBinding()
- publishMessageHTTP()
- sendBindingEventToApp()

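5.5 - Source code study of Components

A study of the Dapr Components source code

5.5.1 - Source code study of the Binding component

A study of the Dapr Binding component source code

5.5.2 - Source code study of the Middleware component

A study of the Dapr Middleware component source code

5.5.3 - Source code study of the NameResolution component

A study of the Dapr NameResolution component source code

5.5.4 - Source code study of the PubSub component

A study of the Dapr PubSub component source code

5.5.5 - Source code study of the SecretStores component

A study of the Dapr SecretStores component source code

5.5.6 - Source code study of the Store component

A study of the Dapr Store component source code

5.5.7 - Source code study of the workflow component

A study of the Dapr workflow component source code

5.5.7.1 - Source code study of registry.go

Struct definitions

Registry struct

The Registry struct is the component registry that stores registered workflow implementations and returns them on request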

import (
	wfs "github.com/dapr/components-contrib/workflows"
)
// Registry is an interface for a component that returns registered state store implementations.
type Registry struct {
	Logger             logger.Logger
	workflowComponents map[string]func(logger.Logger) wfs.Workflow
}

Workflow here is defined in components-contrib.

Default Registry

Creating the default Registry

The package defines a default Registry, a singleton, and an exported one at that:

// DefaultRegistry is the singleton with the registry .
var DefaultRegistry *Registry = NewRegistry()

// NewRegistry is used to create workflow registry.
func NewRegistry() *Registry {
	return &Registry{
		workflowComponents: map[string]func(logger.Logger) wfs.Workflow{},
	}
}

RegisterComponent() method

RegisterComponent() adds one or more entries to the workflowComponents field of the Registry struct

func (s *Registry) RegisterComponent(componentFactory func(logger.Logger) wfs.Workflow, names ...string) {
	for _, name := range names {
		s.workflowComponents[createFullName(name)] = componentFactory
	}
}

func createFullName(name string) string {
	return strings.ToLower("workflow." + name)
}

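The key is "workflow." + name in lower case; the value is the componentFactory passed in, a function that returns a Workflow implementation when given a logger.

Create() method

Create() builds the workflow implementation that matches the given name and version: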

func (s *Registry) Create(name, version, logName string) (wfs.Workflow, error) {
	if method, ok := s.getWorkflowComponent(name, version, logName); ok {
		return method(), nil
	}
	return nil, fmt.Errorf("couldn't find wokflow %s/%s", name, version)
}

The key logic lives in getWorkflowComponent():

func (s *Registry) getWorkflowComponent(name, version, logName string) (func() wfs.Workflow, bool) {
	nameLower := strings.ToLower(name)
	versionLower := strings.ToLower(version)
    // build the key as nameLower+"/"+versionLower
    // then look it up in the workflowComponents field of the Registry struct
    // TODO: at registration time the key is `"workflow." + name` in lower case
	workflowFn, ok := s.workflowComponents[nameLower+"/"+versionLower]
	if ok {
		return s.wrapFn(workflowFn, logName), true
	}
    // not found: check whether this is the initial version
	if components.IsInitialVersion(versionLower) {
        // for the initial version the version is not appended; look up by name alone
        // TODO: does this require name to start with "workflow."?
		workflowFn, ok = s.workflowComponents[nameLower]
		if ok {
			return s.wrapFn(workflowFn, logName), true
		}
	}
	return nil, false
}

If a registered workflow factory is found in workflowComponents, that factory is used to produce the workflow implementation:

func (s *Registry) wrapFn(componentFactory func(logger.Logger) wfs.Workflow, logName string) func() wfs.Workflow {
	return func() wfs.Workflow {
        // the registry's logger is reused as the logger of the workflow implementation
		l := s.Logger
		if logName != "" && l != nil {
            // add a component field to the logger, with logName as its value
			l = l.WithFields(map[string]any{
				"component": logName,
			})
		}
        // finally call the factory to build the workflow implementation
		return componentFactory(l)
	}
}

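Summary

The content of the key needs to be checked carefully:

whether it carries the "workflow." prefix
whether it carries a version, or whether the version is the initial version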
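
The two key questions can be checked with a stripped-down model of the registry. This is a sketch under stated assumptions: the registry type below is hypothetical and stores plain strings instead of factories, and isInitialVersion is an assumption about how components.IsInitialVersion() behaves (empty, "v0" and "v1" count as the initial version); that function is not shown in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a hypothetical, reduced version of the workflow Registry:
// factories are replaced by strings to keep the focus on the keys.
type registry struct {
	components map[string]string
}

// register stores the value under "workflow." + name, lower-cased,
// mirroring createFullName() in the source.
func (r *registry) register(value string, names ...string) {
	for _, name := range names {
		r.components[strings.ToLower("workflow."+name)] = value
	}
}

// isInitialVersion is an assumption about components.IsInitialVersion().
func isInitialVersion(version string) bool {
	v := strings.ToLower(version)
	return v == "" || v == "v0" || v == "v1"
}

// lookup follows getWorkflowComponent(): try "name/version" first, then fall
// back to the bare name when the version is the initial one.
func (r *registry) lookup(name, version string) (string, bool) {
	nameLower := strings.ToLower(name)
	versionLower := strings.ToLower(version)
	if v, ok := r.components[nameLower+"/"+versionLower]; ok {
		return v, true
	}
	if isInitialVersion(versionLower) {
		v, ok := r.components[nameLower]
		return v, ok
	}
	return "", false
}

func main() {
	r := &registry{components: map[string]string{}}
	r.register("engine", "dapr")

	_, ok := r.lookup("workflow.dapr", "v1")
	fmt.Println("workflow.dapr v1:", ok)
	_, ok = r.lookup("dapr", "v1")
	fmt.Println("dapr v1:", ok)
	_, ok = r.lookup("workflow.dapr", "v2")
	fmt.Println("workflow.dapr v2:", ok)
}
```

In this model only the first lookup succeeds. The caller has to pass the full "workflow."-prefixed name, and a non-initial version is found only if the component was registered under a name that already contains "/version".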

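5.6 - Source code study of Healthz

A study of the Dapr Healthz source code

5.6.1 - Source code study of health.go

The client side of health checking

An analysis of health.go in the Dapr health package, the client-side implementation of health checking

Implementation

Option function type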

// Option is an a function that applies a health check option
type Option func(o *healthCheckOptions)

healthCheckOptions struct definition

The healthCheckOptions struct:

type healthCheckOptions struct {
	initialDelay      time.Duration
	requestTimeout    time.Duration
	failureThreshold  int
	interval          time.Duration
	successStatusCode int
}

With functions

The WithXxx functions set the five health check options above; each returns an Option function:

// WithInitialDelay sets the initial delay for the health check
func WithInitialDelay(delay time.Duration) Option {
	return func(o *healthCheckOptions) {
		o.initialDelay = delay
	}
}

// WithFailureThreshold sets the failure threshold for the health check
func WithFailureThreshold(threshold int) Option {
	return func(o *healthCheckOptions) {
		o.failureThreshold = threshold
	}
}

// WithRequestTimeout sets the request timeout for the health check
func WithRequestTimeout(timeout time.Duration) Option {
	return func(o *healthCheckOptions) {
		o.requestTimeout = timeout
	}
}

// WithSuccessStatusCode sets the status code for the health check
func WithSuccessStatusCode(code int) Option {
	return func(o *healthCheckOptions) {
		o.successStatusCode = code
	}
}

// WithInterval sets the interval for the health check
func WithInterval(interval time.Duration) Option {
	return func(o *healthCheckOptions) {
		o.interval = interval
	}
}

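StartEndpointHealthCheck function

StartEndpointHealthCheck starts a health check against the given address with the given options. It returns a channel that emits true when the endpoint is healthy and false once the failure condition is met.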

// StartEndpointHealthCheck starts a health check on the specified address with the given options.
// It returns a channel that will emit true if the endpoint is healthy and false if the failure conditions
// Have been met.
func StartEndpointHealthCheck(endpointAddress string, opts ...Option) chan bool {
	options := &healthCheckOptions{}
	applyDefaults(options)

	// run each Option function to set the health check options
	for _, o := range opts {
		o(options)
	}
	signalChan := make(chan bool, 1)

	go func(ch chan<- bool, endpointAddress string, options *healthCheckOptions) {
		// interval between health checks, every 5 seconds by default
		ticker := time.NewTicker(options.interval)
		failureCount := 0
		// sleep for initialDelay before starting the health checks
		time.Sleep(options.initialDelay)

		// create the HTTP client, with requestTimeout as its read timeout
		client := &fasthttp.Client{
			MaxConnsPerHost:           5, // Limit Keep-Alive connections
			ReadTimeout:               options.requestTimeout,
			MaxIdemponentCallAttempts: 1,
		}

		req := fasthttp.AcquireRequest()
		req.SetRequestURI(endpointAddress)
		req.Header.SetMethod(fasthttp.MethodGet)
		defer fasthttp.ReleaseRequest(req)

		for range ticker.C {
			resp := fasthttp.AcquireResponse()
			err := client.DoTimeout(req, resp, options.requestTimeout)
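			// the response status code decides whether the check succeeded: successStatusCode
			if err != nil || resp.StatusCode() != options.successStatusCode {
				// check failed, increment the failure counter
				failureCount++
				// once consecutive failures reach failureThreshold, the check is considered failed and false is sent on the channel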
				if failureCount == options.failureThreshold {
					ch <- false
				}
			} else {
				// check succeeded, send true on the channel
				ch <- true
				// and reset failureCount
				failureCount = 0
			}
			fasthttp.ReleaseResponse(resp)
		}
	}(signalChan, endpointAddress, options)
	return signalChan
}

applyDefaults() sets the default values:

const (
	initialDelay      = time.Second * 1
	failureThreshold  = 2
	requestTimeout    = time.Second * 2
	interval          = time.Second * 5
	successStatusCode = 200
)

func applyDefaults(o *healthCheckOptions) {
   o.failureThreshold = failureThreshold
   o.initialDelay = initialDelay
   o.requestTimeout = requestTimeout
   o.successStatusCode = successStatusCode
   o.interval = interval
}

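Summary of the health check procedure

A health check against a given address endpointAddress proceeds as follows:

Sleep for initialDelay before the first check, because the target may still be initializing
Run one health check every interval
Each health check sends an HTTP GET request to endpointAddress with a timeout of requestTimeout
Inspect the response to decide whether the target is healthy
- a response whose status code is successStatusCode counts as a success for this check
- a timeout, or a status code other than successStatusCode, counts as a failure for this check
On failure, increment the counter and check again after interval
When consecutive failures reach the threshold failureThreshold, report the health check as failed
A single success clears the accumulated failure count and reports the health check as successful.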
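
The counting rules above can be replayed without any network. The sketch below is a hypothetical helper, signals, written for this note: it takes a sequence of probe outcomes and returns what the goroutine in StartEndpointHealthCheck would send on its channel, using the same `failureCount == failureThreshold` comparison as the source.

```go
package main

import "fmt"

// signals (hypothetical helper) replays a sequence of probe outcomes
// (true = probe succeeded) and returns the values the health check goroutine
// would send on its channel.
func signals(probes []bool, failureThreshold int) []bool {
	var out []bool
	failureCount := 0
	for _, ok := range probes {
		if !ok {
			failureCount++
			// false is sent only at the moment the threshold is reached.
			if failureCount == failureThreshold {
				out = append(out, false)
			}
			continue
		}
		// Any success is reported and resets the counter.
		out = append(out, true)
		failureCount = 0
	}
	return out
}

func main() {
	fmt.Println(signals([]bool{false, false, false, true, false}, 2))
}
```

One consequence of the equality comparison is that false is reported once per failure streak: the third consecutive failure in the example produces no further signal.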

5.6.2 - Source code study of server.go

The healthz server implementation

An analysis of server.go in the Dapr health package, the healthz server implementation

Implementation

Health server

Interface definition of the healthz server:

// Server is the interface for the healthz server
type Server interface {
	Run(context.Context, int) error
	Ready()
	NotReady()
}

The server struct; the ready field holds the state:

type server struct {
	ready bool
	log   logger.Logger
}

Function that creates the healthz server:

// NewServer returns a new healthz server
func NewServer(log logger.Logger) Server {
   return &server{
      log: log,
   }
}

The two methods that set the ready state:

// Ready sets a ready state for the endpoint handlers
func (s *server) Ready() {
   s.ready = true
}

// NotReady sets a not ready state for the endpoint handlers
func (s *server) NotReady() {
   s.ready = false
}

Running the healthz server

Run starts an HTTP server with a healthz endpoint; the port is given by the port parameter:

// Run starts a net/http server with a healthz endpoint
func (s *server) Run(ctx context.Context, port int) error {
   router := http.NewServeMux()
   router.Handle("/healthz", s.healthz())

   srv := &http.Server{
      Addr:    fmt.Sprintf(":%d", port),
      Handler: router,
   }
   ...
}

The rest of Run starts listening and handles shutdown:

   doneCh := make(chan struct{})

   go func() {
      select {
      case <-ctx.Done():
         s.log.Info("Healthz server is shutting down")
         shutdownCtx, cancel := context.WithTimeout(
            context.Background(),
            time.Second*5,
         )
         defer cancel()
         srv.Shutdown(shutdownCtx) // nolint: errcheck
      case <-doneCh:
      }
   }()

   s.log.Infof("Healthz server is listening on %s", srv.Addr)
   err := srv.ListenAndServe()
   if err != http.ErrServerClosed {
      s.log.Errorf("Healthz server error: %s", err)
   }
   close(doneCh)
   return err
}

Request handling in the healthz server

healthz() is the handler of the health endpoint; it returns an HTTP status code based on the current value of the server's ready field:

// healthz is a health endpoint handler
func (s *server) healthz() http.Handler {
   return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
      var status int
      if s.ready {
         // ready: return 200
         status = http.StatusOK
      } else {
         // not ready: return 503
         status = http.StatusServiceUnavailable
      }
      w.WriteHeader(status)
   })
}

Usage

The healthz server is used by injector / placement / sentry / operator; each of these processes starts it from its main function.

injector

injector starts it on port 8080:

const (
	healthzPort = 8080
)

func main() {
   ......
	go func() {
		healthzServer := health.NewServer(log)
		healthzServer.Ready()

		healthzErr := healthzServer.Run(ctx, healthzPort)
		if healthzErr != nil {
			log.Fatalf("failed to start healthz server: %s", healthzErr)
		}
	}()
	......
}

placement

placement starts it on port 8080 by default (the port can be changed with a command-line flag):

const (
	defaultHealthzPort       = 8080
)

func main() {
	flag.IntVar(&cfg.healthzPort, "healthz-port", cfg.healthzPort, "sets the HTTP port for the healthz server")
   ......
	go startHealthzServer(cfg.healthzPort)
	......
}

func startHealthzServer(healthzPort int) {
	healthzServer := health.NewServer(log)
	healthzServer.Ready()

	if err := healthzServer.Run(context.Background(), healthzPort); err != nil {
		log.Fatalf("failed to start healthz server: %s", err)
	}
}

sentry

sentry starts it on port 8080:

const (
	healthzPort = 8080
)

func main() {
   ......
	go func() {
		healthzServer := health.NewServer(log)
		healthzServer.Ready()

		err := healthzServer.Run(ctx, healthzPort)
		if err != nil {
			log.Fatalf("failed to start healthz server: %s", err)
		}
	}()
	......
}

operator

operator starts it on port 8080:

const (
	healthzPort = 8080
)

func main() {
   ......
	go func() {
		healthzServer := health.NewServer(log)
		healthzServer.Ready()

		err := healthzServer.Run(ctx, healthzPort)
		if err != nil {
			log.Fatalf("failed to start healthz server: %s", err)
		}
	}()
	......
}

daprd

Note that daprd does not use the healthz server; it adds the healthz function directly on top of the Dapr HTTP API.

The code is in http/api.go:

func NewAPI(......
   api.endpoints = append(api.endpoints, api.constructHealthzEndpoints()...)
	return api
}

func (a *api) constructHealthzEndpoints() []Endpoint {
   return []Endpoint{
      {
         Methods: []string{fasthttp.MethodGet},
         Route:   "healthz",
         Version: apiVersionV1,
         Handler: a.onGetHealthz,
      },
   }
}

onGetHealthz() handles the request:

func (a *api) onGetHealthz(reqCtx *fasthttp.RequestCtx) {
   if !a.readyStatus {
      msg := NewErrorResponse("ERR_HEALTH_NOT_READY", messages.ErrHealthNotReady)
      respondWithError(reqCtx, fasthttp.StatusInternalServerError, msg)
      log.Debug(msg)
   } else {
      respondEmpty(reqCtx)
   }
}

func respondEmpty(ctx *fasthttp.RequestCtx) {
	ctx.Response.SetBody(nil)
	ctx.Response.SetStatusCode(fasthttp.StatusNoContent)
}

Note: on success the status code returned here is 204 (StatusNoContent), not the usual 200 OK.

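5.7 - Source code study of Metrics

A study of the Dapr Metrics source code

5.7.1 - Source code study of exporter.go

Exporter is the interface for metrics exporters; only Prometheus is supported at present

An analysis of exporter.go in the Dapr metrics package, covering the struct definitions and method implementations. Only Prometheus is supported at present.

Exporter definition and implementation

Exporter interface definition

The Exporter interface: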

// Exporter is the interface for metrics exporters
type Exporter interface {
	// Init initializes metrics exporter
	Init() error
	// Options returns Exporter options
	Options() *Options
}

exporter struct definition

The exporter struct:

// exporter is the base struct
type exporter struct {
	namespace string
	options   *Options
	logger    logger.Logger
}

Building the exporter

// NewExporter creates new MetricsExporter instance
func NewExporter(namespace string) Exporter {
	// TODO: support multiple exporters
	return &promMetricsExporter{
		&exporter{
			namespace: namespace,
			options:   defaultMetricOptions(),
			logger:    logger.NewLogger("dapr.metrics"),
		},
		nil,
	}
}

Only the Prometheus exporter (promMetricsExporter) is supported at present.

Implementation of the interface method Options()

Options() simply returns m.options:

// Options returns current metric exporter options
func (m *exporter) Options() *Options {
	return m.options
}

The values are assigned in defaultMetricOptions().

Prometheus exporter implementation

promMetricsExporter struct definition

// promMetricsExporter is prometheus metric exporter
type promMetricsExporter struct {
	*exporter
	ocExporter *ocprom.Exporter
}

It embeds exporter (the equivalent of inheritance) and adds an ocprom.Exporter field.

Implementation of the interface method Init()

Initializes the OpenCensus exporter:


// Init initializes opencensus exporter
func (m *promMetricsExporter) Init() error {
	if !m.exporter.Options().MetricsEnabled {
		return nil
	}

	// Add default health metrics for process
	
	// registers the default health metrics: process information and Go runtime information
	registry := prom.NewRegistry()
	registry.MustRegister(prom.NewProcessCollector(prom.ProcessCollectorOpts{}))
	registry.MustRegister(prom.NewGoCollector())

	var err error
	m.ocExporter, err = ocprom.NewExporter(ocprom.Options{
		Namespace: m.namespace,
		Registry:  registry,
	})

	if err != nil {
		return errors.Errorf("failed to create Prometheus exporter: %v", err)
	}

	// register exporter to view
	view.RegisterExporter(m.ocExporter)

	// start metrics server
	return m.startMetricServer()
}

Implementation of startMetricServer()

Starts the metrics server. The listening port comes from MetricsPort in the options, and the path is defaultMetricsPath:


const (
	defaultMetricsPath     = "/"
)

// startMetricServer starts metrics server
func (m *promMetricsExporter) startMetricServer() error {
	if !m.exporter.Options().MetricsEnabled {
		// skip if metrics is not enabled
		return nil
	}

	addr := fmt.Sprintf(":%d", m.options.MetricsPort())

	if m.ocExporter == nil {
		return errors.New("exporter was not initialized")
	}

	m.exporter.logger.Infof("metrics server started on %s%s", addr, defaultMetricsPath)
	go func() {
		mux := http.NewServeMux()
		mux.Handle(defaultMetricsPath, m.ocExporter)

		if err := http.ListenAndServe(addr, mux); err != nil {
			m.exporter.logger.Fatalf("failed to start metrics server: %v", err)
		}
	}()

	return nil
}

5.7.2 - Source code study of options.go

Configuration options for metrics

A study of options.go in the Dapr metrics package

Implementation

Options struct definition

// Options defines the sets of options for Dapr logging
type Options struct {
	// OutputLevel is the level of logging
	MetricsEnabled bool

	metricsPort string
}

Default values

The default metrics port is 9090, and metrics are enabled by default:

const (
	defaultMetricsPort    = "9090"
	defaultMetricsEnabled = true
)

func defaultMetricOptions() *Options {
	return &Options{
		metricsPort:    defaultMetricsPort,
		MetricsEnabled: defaultMetricsEnabled,
	}
}

Implementation of MetricsPort()

MetricsPort() returns the metrics port; if the configured value cannot be parsed, the default port 9090 is used:

// MetricsPort gets metrics port.
func (o *Options) MetricsPort() uint64 {
	port, err := strconv.ParseUint(o.metricsPort, 10, 64)
	if err != nil {
		// Use default metrics port as a fallback
		port, _ = strconv.ParseUint(defaultMetricsPort, 10, 64)
	}

	return port
}
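
The fallback can be tried out in isolation. The sketch below is a hypothetical free function, metricsPort, that mirrors the body of Options.MetricsPort() with the raw string passed in as a parameter.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultMetricsPort = "9090"

// metricsPort (hypothetical free function) mirrors Options.MetricsPort():
// an unparsable value silently falls back to the default port.
func metricsPort(raw string) uint64 {
	port, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		// Use the default metrics port as a fallback.
		port, _ = strconv.ParseUint(defaultMetricsPort, 10, 64)
	}
	return port
}

func main() {
	fmt.Println(metricsPort("9100"), metricsPort("not-a-port"), metricsPort("-1"))
}
```

The fallback only covers values that fail to parse as an unsigned integer. A value such as "70000" parses successfully and is returned as is, although it is not a valid TCP port.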

Methods that register the command-line flags

AttachCmdFlags() method

AttachCmdFlags() registers both command-line flags, metrics-port and enable-metrics:

// AttachCmdFlags attaches metrics options to command flags
func (o *Options) AttachCmdFlags(
	stringVar func(p *string, name string, value string, usage string),
	boolVar func(p *bool, name string, value bool, usage string)) {
	stringVar(
		&o.metricsPort,
		"metrics-port",
		defaultMetricsPort,
		"The port for the metrics server")
	boolVar(
		&o.MetricsEnabled,
		"enable-metrics",
		defaultMetricsEnabled,
		"Enable prometheus metric")
}

AttachCmdFlag() method

AttachCmdFlag() registers only the metrics-port flag (not enable-metrics):

// AttachCmdFlag attaches single metrics option to command flags
func (o *Options) AttachCmdFlag(
	stringVar func(p *string, name string, value string, usage string)) {
	stringVar(
		&o.metricsPort,
		"metrics-port",
		defaultMetricsPort,
		"The port for the metrics server")
}

Usage

AttachCmdFlag(), which registers only metrics-port, is called when the Dapr runtime starts (and nowhere else):

metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)

// attaching only metrics-port option
metricsExporter.Options().AttachCmdFlag(flag.StringVar)

AttachCmdFlags(), which registers both metrics-port and enable-metrics, is called by injector / operator / placement / sentry:

func init() {
	metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
	metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)
}
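
Because AttachCmdFlags() takes the registration functions as parameters, it works with any flag set, not only the global one. A minimal sketch of this follows. The options type and parse helper are hypothetical names for this note; the sketch binds to a private flag.FlagSet instead of the global flag.StringVar / flag.BoolVar used in the source.

```go
package main

import (
	"flag"
	"fmt"
)

type options struct {
	metricsEnabled bool
	metricsPort    string
}

// attachCmdFlags mirrors the shape of Options.AttachCmdFlags(): it receives
// the registration functions instead of calling the flag package directly.
func (o *options) attachCmdFlags(
	stringVar func(p *string, name string, value string, usage string),
	boolVar func(p *bool, name string, value bool, usage string)) {
	stringVar(&o.metricsPort, "metrics-port", "9090", "The port for the metrics server")
	boolVar(&o.metricsEnabled, "enable-metrics", true, "Enable prometheus metric")
}

// parse (hypothetical helper) binds the options to a private FlagSet and
// parses the given arguments.
func parse(args []string) (*options, error) {
	o := &options{}
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	o.attachCmdFlags(fs.StringVar, fs.BoolVar)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	o, err := parse([]string{"-metrics-port=9100", "-enable-metrics=false"})
	fmt.Println(o.metricsPort, o.metricsEnabled, err)
}
```

The flag values only land in the struct after Parse has run, which is why the exporter's Init() in the runtime is called after flag.Parse().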

5.8 - workflow source code

The Dapr workflow source code

5.8.1 - workflow API

API definition of Dapr workflow

proto definition

dapr/proto/runtime/v1/dapr.proto

service Dapr {
  // Starts a new instance of a workflow
  rpc StartWorkflowAlpha1 (StartWorkflowRequest) returns (StartWorkflowResponse) {}

  // Gets details about a started workflow instance
  rpc GetWorkflowAlpha1 (GetWorkflowRequest) returns (GetWorkflowResponse) {}

  // Purge Workflow
  rpc PurgeWorkflowAlpha1 (PurgeWorkflowRequest) returns (google.protobuf.Empty) {}

  // Terminates a running workflow instance
  rpc TerminateWorkflowAlpha1 (TerminateWorkflowRequest) returns (google.protobuf.Empty) {}

  // Pauses a running workflow instance
  rpc PauseWorkflowAlpha1 (PauseWorkflowRequest) returns (google.protobuf.Empty) {}

  // Resumes a paused workflow instance
  rpc ResumeWorkflowAlpha1 (ResumeWorkflowRequest) returns (google.protobuf.Empty) {}

  // Raise an event to a running workflow instance
  rpc RaiseEventWorkflowAlpha1 (RaiseEventWorkflowRequest) returns (google.protobuf.Empty) {}
}

workflow has no scenario in which the sidecar sends requests to the application, so there is no appcallback.

Generated Go code

pkg/proto/runtime/v1 holds the Go code generated from the proto files

For example pkg/proto/runtime/v1/dapr_grpc.pb.go

5.8.2 - workflow HTTP API

HTTP API implementation of Dapr workflow

pkg/http/api.go

Building the workflow endpoints

const (
	workflowComponent        = "workflowComponent"
	workflowName             = "workflowName"
)

func NewAPI(opts APIOpts) API {
	api := &api{
        ......
	api.endpoints = append(api.endpoints, api.constructWorkflowEndpoints()...)
	return api
}

constructWorkflowEndpoints() is implemented in pkg/http/api_workflow.go:

func (a *api) constructWorkflowEndpoints() []Endpoint {
	return []Endpoint{
		{
			Methods: []string{http.MethodGet},
			Route:   "workflows/{workflowComponent}/{instanceID}",
			Version: apiVersionV1alpha1,
			Handler: a.onGetWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{instanceID}/raiseEvent/{eventName}",
			Version: apiVersionV1alpha1,
			Handler: a.onRaiseEventWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{workflowName}/start",
			Version: apiVersionV1alpha1,
			Handler: a.onStartWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{instanceID}/pause",
			Version: apiVersionV1alpha1,
			Handler: a.onPauseWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{instanceID}/resume",
			Version: apiVersionV1alpha1,
			Handler: a.onResumeWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{instanceID}/terminate",
			Version: apiVersionV1alpha1,
			Handler: a.onTerminateWorkflowHandler(),
		},
		{
			Methods: []string{http.MethodPost},
			Route:   "workflows/{workflowComponent}/{instanceID}/purge",
			Version: apiVersionV1alpha1,
			Handler: a.onPurgeWorkflowHandler(),
		},
	}
}
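To make the route table concrete, the sketch below builds the sidecar URLs that correspond to these routes. Two values are assumptions, since neither appears in the listing: the sidecar address (localhost on the default HTTP port 3500) and the path prefix "v1.0-alpha1" used for apiVersionV1alpha1. The component name "dapr" and the workflow name "order-flow" are purely hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// Assumed values: neither is shown in the listing above.
const (
	sidecarAddr   = "http://localhost:3500" // default Dapr HTTP port
	versionPrefix = "v1.0-alpha1"           // assumed rendering of apiVersionV1alpha1
)

// startWorkflowURL builds the URL for the ".../{workflowName}/start" route.
// instanceID is optional and travels as a query parameter.
func startWorkflowURL(component, workflow, instanceID string) string {
	u := fmt.Sprintf("%s/%s/workflows/%s/%s/start",
		sidecarAddr, versionPrefix, url.PathEscape(component), url.PathEscape(workflow))
	if instanceID != "" {
		u += "?" + url.Values{"instanceID": {instanceID}}.Encode()
	}
	return u
}

// instanceURL builds the URL for routes that address an existing instance.
// An empty action yields the GET status route; otherwise action is a path
// suffix such as "pause", "resume", "terminate" or "purge".
func instanceURL(component, instanceID, action string) string {
	u := fmt.Sprintf("%s/%s/workflows/%s/%s",
		sidecarAddr, versionPrefix, url.PathEscape(component), url.PathEscape(instanceID))
	if action != "" {
		u += "/" + action
	}
	return u
}

func main() {
	fmt.Println(startWorkflowURL("dapr", "order-flow", "abc123"))
	fmt.Println(instanceURL("dapr", "abc123", ""))
	fmt.Println(instanceURL("dapr", "abc123", "pause"))
}
```

Note that the start route is the only one keyed by workflow name; every other route is keyed by instance ID.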

Handler implementation

pkg/http/api_workflow.go

onStartWorkflowHandler()

// Route:   "workflows/{workflowComponent}/{workflowName}/start?instanceID={instanceID}",
// Workflow Component: Component specified in yaml
// Workflow Name: Name of the workflow to run
// Instance ID: Identifier of the specific run
func (a *api) onStartWorkflowHandler() http.HandlerFunc {
	return UniversalHTTPHandler(
		a.universal.StartWorkflowAlpha1,
		// UniversalHTTPHandlerOpts is a generic struct
		UniversalHTTPHandlerOpts[*runtimev1pb.StartWorkflowRequest, *runtimev1pb.StartWorkflowResponse]{
			// We pass the input body manually rather than parsing it using protojson
			SkipInputBody: true,
			InModifier: func(r *http.Request, in *runtimev1pb.StartWorkflowRequest) (*runtimev1pb.StartWorkflowRequest, error) {
				in.WorkflowName = chi.URLParam(r, workflowName)
				in.WorkflowComponent = chi.URLParam(r, workflowComponent)

				// The instance ID is optional. If not specified, we generate a random one.
				in.InstanceId = r.URL.Query().Get(instanceID)
				if in.InstanceId == "" {
					randomID, err := uuid.NewRandom()
					if err != nil {
						return nil, err
					}
					in.InstanceId = randomID.String()
				}

				// We accept the HTTP request body as the input to the workflow
				// without making any assumptions about its format.
				var err error
				in.Input, err = io.ReadAll(r.Body)
				if err != nil {
					return nil, messages.ErrBodyRead.WithFormat(err)
				}
				return in, nil
			},
			SuccessStatusCode: http.StatusAccepted,
		})
}

onGetWorkflowHandler()

// Route: GET "workflows/{workflowComponent}/{instanceID}"
func (a *api) onGetWorkflowHandler() http.HandlerFunc {
	return UniversalHTTPHandler(
		a.universal.GetWorkflowAlpha1,
		UniversalHTTPHandlerOpts[*runtimev1pb.GetWorkflowRequest, *runtimev1pb.GetWorkflowResponse]{
			InModifier: workflowInModifier[*runtimev1pb.GetWorkflowRequest],
		})
}

workflowInModifier() is a shared helper that reads the two parameters WorkflowComponent and InstanceId:

// Shared InModifier method for all universal handlers for workflows that adds the "WorkflowComponent" and "InstanceId" properties
func workflowInModifier[T runtimev1pb.WorkflowRequests](r *http.Request, in T) (T, error) {
	in.SetWorkflowComponent(chi.URLParam(r, workflowComponent))
	in.SetInstanceId(chi.URLParam(r, instanceID))
	return in, nil
}
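The helper works for every workflow request type because its type parameter is constrained to types that expose the two setters. A self-contained sketch of that technique follows; the `workflowRequest` constraint and the two request structs are hypothetical and only modeled on the names in the listing.

```go
package main

import "fmt"

// workflowRequest is a hypothetical constraint modeled on
// runtimev1pb.WorkflowRequests: any type with these two setters qualifies.
type workflowRequest interface {
	SetWorkflowComponent(string)
	SetInstanceId(string)
}

type getRequest struct{ component, instanceID string }

func (g *getRequest) SetWorkflowComponent(c string) { g.component = c }
func (g *getRequest) SetInstanceId(id string)       { g.instanceID = id }

type pauseRequest struct{ component, instanceID string }

func (p *pauseRequest) SetWorkflowComponent(c string) { p.component = c }
func (p *pauseRequest) SetInstanceId(id string)       { p.instanceID = id }

// fillCommon sets the two shared fields and returns the request with its
// concrete type preserved, so callers need no type assertion afterwards.
func fillCommon[T workflowRequest](in T, component, instanceID string) T {
	in.SetWorkflowComponent(component)
	in.SetInstanceId(instanceID)
	return in
}

func main() {
	g := fillCommon(&getRequest{}, "dapr", "abc")
	p := fillCommon(&pauseRequest{}, "dapr", "xyz")
	fmt.Println(g.component, g.instanceID, p.component, p.instanceID)
}
```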

5.8.3 - workflow gRPC API

gRPC API implementation of Dapr workflow

proto definition

dapr/proto/runtime/v1/dapr.proto

service Dapr {
  // Starts a new instance of a workflow
  rpc StartWorkflowAlpha1 (StartWorkflowRequest) returns (StartWorkflowResponse) {}

  // Gets details about a started workflow instance
  rpc GetWorkflowAlpha1 (GetWorkflowRequest) returns (GetWorkflowResponse) {}

  // Purge Workflow
  rpc PurgeWorkflowAlpha1 (PurgeWorkflowRequest) returns (google.protobuf.Empty) {}

  // Terminates a running workflow instance
  rpc TerminateWorkflowAlpha1 (TerminateWorkflowRequest) returns (google.protobuf.Empty) {}

  // Pauses a running workflow instance
  rpc PauseWorkflowAlpha1 (PauseWorkflowRequest) returns (google.protobuf.Empty) {}

  // Resumes a paused workflow instance
  rpc ResumeWorkflowAlpha1 (ResumeWorkflowRequest) returns (google.protobuf.Empty) {}

  // Raise an event to a running workflow instance
  rpc RaiseEventWorkflowAlpha1 (RaiseEventWorkflowRequest) returns (google.protobuf.Empty) {}
}

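Workflow has no scenario in which the sidecar sends requests toward the application, so there is no appcallback.

Generated Go code

pkg/proto/runtime/v1 holds the Go code generated from the proto files

For example pkg/proto/runtime/v1/dapr_grpc.pb.go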

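5.9 - State management source code

Source code of Dapr state management

5.9.1 - Overview of the state management source code

Overview of the Dapr state management source code

Source code of state management

5.9.2 - Source code analysis of state management initialization

Source code analysis of Dapr state management initialization

State Store Registry

Preparing the stateStoreRegistry initialization

The stateStoreRegistry is initialized during runtime initialization: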

func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration) *DaprRuntime {
  ......
  stateStoreRegistry:     state_loader.NewRegistry(),
}

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {	
  ......
  a.stateStoreRegistry.Register(opts.states...)
  ......
}

These opts come from the configuration supplied at runtime startup, for example in cmd/daprd/main.go:

func main() {
	rt, err := runtime.FromFlags()
	if err != nil {
		log.Fatal(err)
	}

	err = rt.Run(
    ......
    runtime.WithStates(
			state_loader.New("redis", func() state.Store {
				return state_redis.NewRedisStateStore(logContrib)
			}),
			state_loader.New("consul", func() state.Store {
				return consul.NewConsulStateStore(logContrib)
			}),
			state_loader.New("azure.blobstorage", func() state.Store {
				return state_azure_blobstorage.NewAzureBlobStorageStore(logContrib)
			}),
			state_loader.New("azure.cosmosdb", func() state.Store {
				return state_cosmosdb.NewCosmosDBStateStore(logContrib)
			}),
			state_loader.New("azure.tablestorage", func() state.Store {
				return state_azure_tablestorage.NewAzureTablesStateStore(logContrib)
			}),
			//state_loader.New("etcd", func() state.Store {
			//	return etcd.NewETCD(logContrib)
			//}),
			state_loader.New("cassandra", func() state.Store {
				return cassandra.NewCassandraStateStore(logContrib)
			}),
			state_loader.New("memcached", func() state.Store {
				return memcached.NewMemCacheStateStore(logContrib)
			}),
			state_loader.New("mongodb", func() state.Store {
				return mongodb.NewMongoDB(logContrib)
			}),
			state_loader.New("zookeeper", func() state.Store {
				return zookeeper.NewZookeeperStateStore(logContrib)
			}),
			state_loader.New("gcp.firestore", func() state.Store {
				return firestore.NewFirestoreStateStore(logContrib)
			}),
			state_loader.New("postgresql", func() state.Store {
				return postgresql.NewPostgreSQLStateStore(logContrib)
			}),
			state_loader.New("sqlserver", func() state.Store {
				return sqlserver.NewSQLServerStateStore(logContrib)
			}),
			state_loader.New("hazelcast", func() state.Store {
				return hazelcast.NewHazelcastStore(logContrib)
			}),
			state_loader.New("cloudstate.crdt", func() state.Store {
				return cloudstate.NewCRDT(logContrib)
			}),
			state_loader.New("couchbase", func() state.Store {
				return couchbase.NewCouchbaseStateStore(logContrib)
			}),
			state_loader.New("aerospike", func() state.Store {
				return aerospike.NewAerospikeStateStore(logContrib)
			}),
		),
    ......
}

This is where the various state store implementations are configured.

How the State Store Registry is implemented

pkg/components/state/registry.go defines the registry interface and data structure:

// Registry is an interface for a component that returns registered state store implementations
type Registry interface {
	Register(components ...State)
	CreateStateStore(name string) (state.Store, error)
}

type stateStoreRegistry struct {
	stateStores map[string]func() state.Store
}

state.Store is the standard state store interface defined by dapr, and every implementation must follow it. It is defined in github.com/dapr/components-contrib/state/store.go:

// Store is an interface to perform operations on store
type Store interface {
	Init(metadata Metadata) error
	Delete(req *DeleteRequest) error
	BulkDelete(req []DeleteRequest) error
	Get(req *GetRequest) (*GetResponse, error)
	Set(req *SetRequest) error
	BulkSet(req []SetRequest) error
}

During the runtime initialization shown earlier, each implementation uses the New method to associate a name with its state store factory:

type State struct {
	Name          string
	FactoryMethod func() state.Store
}

func New(name string, factoryMethod func() state.Store) State {
	return State{
		Name:          name,
		FactoryMethod: factoryMethod,
	}
}

State Store initialization flow

pkg/runtime/runtime.go:

State initialization happens during runtime initialization:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
	......
	go a.processComponents()
	......
}

func (a *DaprRuntime) processComponents() {
   for {
      comp, more := <-a.pendingComponents
      if !more {
         a.pendingComponentsDone <- true
         return
      }
      if err := a.processOneComponent(comp); err != nil {
         log.Errorf("process component %s error, %s", comp.Name, err)
      }
   }
}
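The loop in processComponents is a common Go worker shape: receive from a channel until it is closed, then signal completion on a second channel. A minimal sketch of the same shape, using hypothetical names (`drain`, `run`) and plain strings in place of component objects:

```go
package main

import (
	"errors"
	"fmt"
)

// drain processes every value sent on pending, logs failures, and reports
// completion on done once pending has been closed.
func drain(pending <-chan string, done chan<- bool, handle func(string) error) {
	for name := range pending { // equivalent to the "comp, more := <-ch" loop
		if err := handle(name); err != nil {
			fmt.Printf("process component %s error, %s\n", name, err)
		}
	}
	done <- true
}

// run feeds names through drain and returns the ones handled without error.
func run(names []string) []string {
	pending := make(chan string)
	done := make(chan bool)
	var ok []string
	go drain(pending, done, func(n string) error {
		if n == "" {
			return errors.New("empty component name")
		}
		ok = append(ok, n)
		return nil
	})
	for _, n := range names {
		pending <- n
	}
	close(pending)
	<-done // wait until the worker has drained the channel
	return ok
}

func main() {
	fmt.Println(run([]string{"statestore", "", "pubsub"}))
}
```

As in the listing, a failing component is logged and skipped rather than stopping the loop.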

processOneComponent:

func (a *DaprRuntime) processOneComponent(comp components_v1alpha1.Component) error {
	res := a.preprocessOneComponent(&comp)
  
	compCategory := a.figureOutComponentCategory(comp)

	......
	return nil
}

doProcessOneComponent:

func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
	switch category {
	case stateComponent:
		return a.initState(comp)
	}
		......
	return nil
}

Implementation of the initState method:

// Refer for state store api decision  https://github.com/dapr/dapr/blob/master/docs/decision_records/api/API-008-multi-state-store-api-design.md
func (a *DaprRuntime) initState(s components_v1alpha1.Component) error {
	// Build the state store (this is where the components code starts being integrated)
	store, err := a.stateStoreRegistry.CreateStateStore(s.Spec.Type)
	if err != nil {
		log.Warnf("error creating state store %s: %s", s.Spec.Type, err)
		diag.DefaultMonitoring.ComponentInitFailed(s.Spec.Type, "creation")
		return err
	}
	if store != nil {
		props := a.convertMetadataItemsToProperties(s.Spec.Metadata)
		// The store implementation from components is initialized here, for example by establishing connections
		err := store.Init(state.Metadata{
			Properties: props,
		})
		if err != nil {
			diag.DefaultMonitoring.ComponentInitFailed(s.Spec.Type, "init")
			log.Warnf("error initializing state store %s: %s", s.Spec.Type, err)
			return err
		}

		// Keep the initialized store implementation in the runtime
		a.stateStores[s.ObjectMeta.Name] = store

		// set specified actor store if "actorStateStore" is true in the spec.
		actorStoreSpecified := props[actorStateStore]
		if actorStoreSpecified == "true" {
			if a.actorStateStoreCount++; a.actorStateStoreCount == 1 {
				a.actorStateStoreName = s.ObjectMeta.Name
			}
		}
		diag.DefaultMonitoring.ComponentInitialized(s.Spec.Type)
	}

	if a.actorStateStoreName == "" || a.actorStateStoreCount != 1 {
		log.Warnf("either no actor state store or multiple actor state stores are specified in the configuration, actor stores specified: %d", a.actorStateStoreCount)
	}

	return nil
}

The CreateStateStore method is implemented in pkg/components/state/registry.go:

func (s *stateStoreRegistry) CreateStateStore(name string) (state.Store, error) {
	if method, ok := s.stateStores[name]; ok {
		return method(), nil
	}
	return nil, errors.Errorf("couldn't find state store %s", name)
}
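Taken together, New, Register and CreateStateStore form a small factory registry: names map to constructor functions, and a store instance is only built when a component of that type is configured. The sketch below reproduces the pattern with hypothetical, reduced types (`store`, `component`, `registry`). It keys the map by the plain name, which is a simplification and may differ from how the real registry derives its keys.

```go
package main

import "fmt"

// store is a reduced, hypothetical version of the state.Store interface.
type store interface{ Name() string }

type memStore struct{}

func (memStore) Name() string { return "memory" }

// component pairs a name with the factory that builds the store.
type component struct {
	Name          string
	FactoryMethod func() store
}

// registry maps component names to factories.
type registry struct{ factories map[string]func() store }

func newRegistry() *registry {
	return &registry{factories: map[string]func() store{}}
}

// Register records the factory of each component under its name.
func (r *registry) Register(cs ...component) {
	for _, c := range cs {
		r.factories[c.Name] = c.FactoryMethod
	}
}

// Create builds a new store instance, or fails if the name is unknown.
func (r *registry) Create(name string) (store, error) {
	if f, ok := r.factories[name]; ok {
		return f(), nil
	}
	return nil, fmt.Errorf("couldn't find state store %s", name)
}

func main() {
	r := newRegistry()
	r.Register(component{Name: "memory", FactoryMethod: func() store { return memStore{} }})

	s, err := r.Create("memory")
	fmt.Println(s.Name(), err)

	_, err = r.Create("redis")
	fmt.Println(err)
}
```

Storing factories instead of instances means unused store types cost nothing beyond a map entry.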

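5.9.3 - Source code analysis of state handling in the runtime

Source code analysis of how the Dapr runtime handles state management

The runtime code that handles state requests is in pkg/grpc/api.go.

get state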

func (a *api) GetState(ctx context.Context, in *runtimev1pb.GetStateRequest) (*runtimev1pb.GetStateResponse, error) {
  // Look up the state store that matches the store name,
  // so the store name in the request must match the name in the yaml file
	store, err := a.getStateStore(in.StoreName)
	if err != nil {
		apiServerLogger.Debug(err)
		return &runtimev1pb.GetStateResponse{}, err
	}

	req := state.GetRequest{
		Key:      a.getModifiedStateKey(in.Key),
		Metadata: in.Metadata,
		Options: state.GetStateOption{
			Consistency: stateConsistencyToString(in.Consistency),
		},
	}

  // Run the query
  // For the Redis store, this actually runs HGETALL first and falls back to GET if that fails
	getResponse, err := store.Get(&req)
	if err != nil {
		err = fmt.Errorf("ERR_STATE_GET: %s", err)
		apiServerLogger.Debug(err)
		return &runtimev1pb.GetStateResponse{}, err
	}

	response := &runtimev1pb.GetStateResponse{}
	if getResponse != nil {
		response.Etag = getResponse.ETag
		response.Data = getResponse.Data
	}
	return response, nil
}

get bulk state

The bulk get method is implemented by the runtime as a wrapper around get, so the underlying state store only needs to implement the single-key get.

func (a *api) GetBulkState(ctx context.Context, in *runtimev1pb.GetBulkStateRequest) (*runtimev1pb.GetBulkStateResponse, error) {
   store, err := a.getStateStore(in.StoreName)
   if err != nil {
      apiServerLogger.Debug(err)
      return &runtimev1pb.GetBulkStateResponse{}, err
   }

   resp := &runtimev1pb.GetBulkStateResponse{}
   // If Parallelism <= 0, the default value of 100 is used
   limiter := concurrency.NewLimiter(int(in.Parallelism))

   for _, k := range in.Keys {
      fn := func(param interface{}) {
         req := state.GetRequest{
            Key:      a.getModifiedStateKey(param.(string)),
            Metadata: in.Metadata,
         }

         r, err := store.Get(&req)
         item := &runtimev1pb.BulkStateItem{
            Key: param.(string),
         }
         if err != nil {
            item.Error = err.Error()
         } else if r != nil {
            item.Data = r.Data
            item.Etag = r.ETag
         }
         resp.Items = append(resp.Items, item)
      }

      limiter.Execute(fn, k)
   }
   limiter.Wait()

   return resp, nil
}

save state

func (a *api) SaveState(ctx context.Context, in *runtimev1pb.SaveStateRequest) (*empty.Empty, error) {
   store, err := a.getStateStore(in.StoreName)
   if err != nil {
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }

   reqs := []state.SetRequest{}
   for _, s := range in.States {
      req := state.SetRequest{
         Key:      a.getModifiedStateKey(s.Key),
         Metadata: s.Metadata,
         Value:    s.Value,
         ETag:     s.Etag,
      }
      if s.Options != nil {
         req.Options = state.SetStateOption{
            Consistency: stateConsistencyToString(s.Options.Consistency),
            Concurrency: stateConcurrencyToString(s.Options.Concurrency),
         }
      }
      reqs = append(reqs, req)
   }

   // Call the store's BulkSet method
   // So it seems the store's Set method is never called by the runtime at all?
   err = store.BulkSet(reqs)
   if err != nil {
      err = fmt.Errorf("ERR_STATE_SAVE: %s", err)
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }
   return &empty.Empty{}, nil
}

delete state

func (a *api) DeleteState(ctx context.Context, in *runtimev1pb.DeleteStateRequest) (*empty.Empty, error) {
   store, err := a.getStateStore(in.StoreName)
   if err != nil {
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }

   req := state.DeleteRequest{
      Key:      a.getModifiedStateKey(in.Key),
      Metadata: in.Metadata,
      ETag:     in.Etag,
   }
   if in.Options != nil {
      req.Options = state.DeleteStateOption{
         Concurrency: stateConcurrencyToString(in.Options.Concurrency),
         Consistency: stateConsistencyToString(in.Options.Consistency),
      }
   }

   // Call the store's Delete method
   // The store's BulkDelete method is not called,
   // and the runtime does not expose a BulkDelete method either
   err = store.Delete(&req)
   if err != nil {
      err = fmt.Errorf("ERR_STATE_DELETE: failed deleting state with key %s: %s", in.Key, err)
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }
   return &empty.Empty{}, nil
}

Execute State Transaction

To support transactions, a store must implement the TransactionalStore interface:

type TransactionalStore interface {
   // Init is the same as in the regular Store interface
   Init(metadata Metadata) error
   // Multi is the added method
   Multi(request *TransactionalStateRequest) error
}
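Transaction support is therefore an optional capability: the runtime holds every store as a plain store and discovers at call time, through a type assertion, whether it also offers Multi. A reduced sketch of that check, with hypothetical interfaces and stores (`store`, `transactionalStore`, `plainStore`, `txStore`):

```go
package main

import (
	"errors"
	"fmt"
)

// store is the capability every state store has (reduced for the sketch).
type store interface {
	Get(key string) (string, error)
}

// transactionalStore is the optional extra capability.
type transactionalStore interface {
	Multi(ops []string) error
}

// plainStore supports only Get.
type plainStore struct{}

func (plainStore) Get(string) (string, error) { return "", nil }

// txStore supports Get (via embedding) and Multi.
type txStore struct {
	plainStore
	applied int
}

func (t *txStore) Multi(ops []string) error {
	t.applied += len(ops)
	return nil
}

// runTransaction rejects stores that do not implement transactionalStore.
func runTransaction(s store, ops []string) error {
	ts, ok := s.(transactionalStore)
	if !ok {
		return errors.New("ERR_STATE_STORE_NOT_SUPPORTED")
	}
	return ts.Multi(ops)
}

func main() {
	fmt.Println(runTransaction(plainStore{}, []string{"upsert k1"}))

	tx := &txStore{}
	fmt.Println(runTransaction(tx, []string{"upsert k1", "delete k2"}), tx.applied)
}
```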

Implementation of the runtime's ExecuteStateTransaction method:

func (a *api) ExecuteStateTransaction(ctx context.Context, in *runtimev1pb.ExecuteStateTransactionRequest) (*empty.Empty, error) {
   if a.stateStores == nil || len(a.stateStores) == 0 {
      err := errors.New("ERR_STATE_STORE_NOT_CONFIGURED")
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }

   storeName := in.StoreName

   if a.stateStores[storeName] == nil {
      err := errors.New("ERR_STATE_STORE_NOT_FOUND")
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }

   // Check whether the store is a TransactionalStore
   transactionalStore, ok := a.stateStores[storeName].(state.TransactionalStore)
   if !ok {
      err := errors.New("ERR_STATE_STORE_NOT_SUPPORTED")
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }

   // Build the request
   operations := []state.TransactionalStateOperation{}
   for _, inputReq := range in.Operations {
      var operation state.TransactionalStateOperation
      var req = inputReq.Request
      switch state.OperationType(inputReq.OperationType) {
      case state.Upsert:
         setReq := state.SetRequest{
            Key: a.getModifiedStateKey(req.Key),
            // Limitation:
            // components that cannot handle byte array need to deserialize/serialize in
            // component specific way in components-contrib repo.
            Value:    req.Value,
            Metadata: req.Metadata,
            ETag:     req.Etag,
         }

         if req.Options != nil {
            setReq.Options = state.SetStateOption{
               Concurrency: stateConcurrencyToString(req.Options.Concurrency),
               Consistency: stateConsistencyToString(req.Options.Consistency),
            }
         }

         operation = state.TransactionalStateOperation{
            Operation: state.Upsert,
            Request:   setReq,
         }

      case state.Delete:
         delReq := state.DeleteRequest{
            Key:      a.getModifiedStateKey(req.Key),
            Metadata: req.Metadata,
            ETag:     req.Etag,
         }

         if req.Options != nil {
            delReq.Options = state.DeleteStateOption{
               Concurrency: stateConcurrencyToString(req.Options.Concurrency),
               Consistency: stateConsistencyToString(req.Options.Consistency),
            }
         }

         operation = state.TransactionalStateOperation{
            Operation: state.Delete,
            Request:   delReq,
         }

      default:
         err := fmt.Errorf("ERR_OPERATION_NOT_SUPPORTED: operation type %s not supported", inputReq.OperationType)
         apiServerLogger.Debug(err)
         return &empty.Empty{}, err
      }

      operations = append(operations, operation)
   }
 
   // Call the state store's Multi method to run the operations as one transaction
   err := transactionalStore.Multi(&state.TransactionalStateRequest{
      Operations: operations,
      Metadata:   in.Metadata,
   })

   if err != nil {
      err = fmt.Errorf("ERR_STATE_TRANSACTION: %s", err)
      apiServerLogger.Debug(err)
      return &empty.Empty{}, err
   }
   return &empty.Empty{}, nil
}

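5.9.4 - Source code analysis of the Redis state store implementation

Source code analysis of how the Redis implementation handles Dapr state management

Redis implementation of state management

The Redis implementation lives in the dapr/components-contrib repository, in /state/redis/redis.go: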

// StateStore is a Redis state store
type StateStore struct {
	client   *redis.Client
	json     jsoniter.API
	metadata metadata
	replicas int

	logger logger.Logger
}

// NewRedisStateStore returns a new redis state store
func NewRedisStateStore(logger logger.Logger) *StateStore {
	return &StateStore{
		json:   jsoniter.ConfigFastest,
		logger: logger,
	}
}

Initialization

When the dapr runtime initializes, the Redis state implementation is registered:

state_loader.New("redis", func() state.Store {
    return state_redis.NewRedisStateStore(logContrib)
}),

The Init method is then called by the dapr runtime during state initialization. The Redis implementation is:

// Init does metadata and connection parsing
func (r *StateStore) Init(metadata state.Metadata) error {
	m, err := parseRedisMetadata(metadata)
	if err != nil {
		return err
	}
	r.metadata = m

	if r.metadata.failover {
		r.client = r.newFailoverClient(m)
	} else {
		r.client = r.newClient(m)
	}

	if _, err = r.client.Ping().Result(); err != nil {
		return fmt.Errorf("redis store: error connecting to redis at %s: %s", m.host, err)
	}

	r.replicas, err = r.getConnectedSlaves()

	return err
}

get state

How get is implemented:

// Get retrieves state from redis with a key
func (r *StateStore) Get(req *state.GetRequest) (*state.GetResponse, error) {
   res, err := r.client.DoContext(context.Background(), "HGETALL", req.Key).Result() // Prefer values with ETags
   if err != nil {
      return r.directGet(req) //Falls back to original get
   }
   if res == nil {
      // Empty result, case 1
      return &state.GetResponse{}, nil
   }
   vals := res.([]interface{})
   if len(vals) == 0 {
      // Empty result, case 2:
      // when no value is found for the key, an empty response is returned rather than an error
      return &state.GetResponse{}, nil
   }

   data, version, err := r.getKeyVersion(vals)
   if err != nil {
      return nil, err
   }
   return &state.GetResponse{
      Data: []byte(data),
      ETag: version,
   }, nil
}

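Implementation with ETag support

To support ETags, the state data (the data field, in byte[] form) cannot simply be stored as the value of a plain Redis key/value pair. The "value" has to carry the fields that the ETag needs in addition to data, such as version.

The Redis state implementation is designed as follows: for each state item stored in Redis, the value is a hash, and that hash holds several pieces of information under different fields:

data: the state data
version: the version needed for the ETag

That is why HGETALL was used earlier to fetch every field/value pair of the hash. The getKeyVersion method then reads data and version out of those pairs: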

func (r *StateStore) getKeyVersion(vals []interface{}) (data string, version string, err error) {
   seenData := false
   seenVersion := false
   for i := 0; i < len(vals); i += 2 {
      field, _ := strconv.Unquote(fmt.Sprintf("%q", vals[i]))
      switch field {
      case "data":
         data, _ = strconv.Unquote(fmt.Sprintf("%q", vals[i+1]))
         seenData = true
      case "version":
         version, _ = strconv.Unquote(fmt.Sprintf("%q", vals[i+1]))
         seenVersion = true
      }
   }
   if !seenData || !seenVersion {
      return "", "", errors.New("required hash field 'data' or 'version' was not found")
   }
   return data, version, nil
}

The response carries the ETag:

return &state.GetResponse{
      Data: []byte(data),
      ETag: version,
   }, nil

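Implementation without ETag support

If the HGETALL command fails, the code falls back to the plain case, where Redis holds only the data and no ETag. The data is then stored as a simple key/value pair and read directly with a plain GET command: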

func (r *StateStore) directGet(req *state.GetRequest) (*state.GetResponse, error) {
   res, err := r.client.DoContext(context.Background(), "GET", req.Key).Result()
   if err != nil {
      return nil, err
   }

   if res == nil {
      return &state.GetResponse{}, nil
   }

   s, _ := strconv.Unquote(fmt.Sprintf("%q", res))
   return &state.GetResponse{
      Data: []byte(s),
   }, nil
}

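Note: this design has a performance problem. If the data in Redis is stored as a plain key/value pair with no ETag, every read takes two round trips: the HGETALL command fails first, and the read then falls back to a second attempt with GET.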
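The cost of the fallback can be made visible with a fake client that only counts commands. Everything in this sketch is hypothetical (`fakeRedis`, `read`, the sample keys); it models the fact that Redis rejects HGETALL on a key that holds a plain string, which is what triggers the second command.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeRedis is a hypothetical in-memory stand-in that counts commands.
type fakeRedis struct {
	strings map[string]string            // keys stored as plain strings
	hashes  map[string]map[string]string // keys stored as hashes
	calls   int
}

// hgetall fails for keys that hold a plain string, as Redis does.
func (f *fakeRedis) hgetall(key string) (map[string]string, error) {
	f.calls++
	if _, isString := f.strings[key]; isString {
		return nil, errors.New("WRONGTYPE Operation against a key holding the wrong kind of value")
	}
	return f.hashes[key], nil
}

func (f *fakeRedis) get(key string) (string, error) {
	f.calls++
	return f.strings[key], nil
}

// read follows the same order as the Get method: HGETALL first, GET on error.
func read(f *fakeRedis, key string) (data, etag string, err error) {
	h, err := f.hgetall(key)
	if err != nil {
		data, err = f.get(key)
		return data, "", err
	}
	return h["data"], h["version"], nil
}

func main() {
	f := &fakeRedis{
		strings: map[string]string{"plain": "v1"},
		hashes:  map[string]map[string]string{"tagged": {"data": "v2", "version": "3"}},
	}

	data, etag, _ := read(f, "tagged")
	fmt.Println(data, etag, "commands so far:", f.calls)

	data, etag, _ = read(f, "plain")
	fmt.Println(data, etag, "commands so far:", f.calls)
}
```

Reading the hash-typed key costs one command, while reading the plain key costs two.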

save state

The Redis implementation has a Set method and a BulkSet method:

// Set saves state into redis
func (r *StateStore) Set(req *state.SetRequest) error {
   return state.SetWithOptions(r.setValue, req)
}

// BulkSet performs a bulks save operation
func (r *StateStore) BulkSet(req []state.SetRequest) error {
   for i := range req {
      err := r.Set(&req[i])
      if err != nil {
         // This behavior is debatable:
         // according to the code, as soon as one save fails it returns and abandons the remaining operations
         return err
      }
   }

   return nil
}

The actual work is done in the r.setValue method:

func (r *StateStore) setValue(req *state.SetRequest) error {
   err := state.CheckRequestOptions(req.Options)
   if err != nil {
      return err
   }
   
   // Parse the ETag; it must be convertible to an integer
   ver, err := r.parseETag(req.ETag)
   if err != nil {
      return err
   }

   // Last-write-wins means writing unconditionally, whether or not the ETag matches,
   // so ver is reset to 0 here
   if req.Options.Concurrency == state.LastWrite {
      ver = 0
   }

   bt, _ := utils.Marshal(req.Value, r.json.Marshal)

   // Use the EVAL command to run a Lua script; the script body is setQuery
   _, err = r.client.DoContext(context.Background(), "EVAL", setQuery, 1, req.Key, ver, bt).Result()
   if err != nil {
      return fmt.Errorf("failed to set key %s: %s", req.Key, err)
   }

   // If strong consistency is requested and there is at least one replica,
   if req.Options.Consistency == state.Strong && r.replicas > 0 {
      // wait until all replicas have acknowledged the write
      _, err = r.client.DoContext(context.Background(), "WAIT", r.replicas, 1000).Result()
      if err != nil {
         return fmt.Errorf("timed out while waiting for %v replicas to acknowledge write", r.replicas)
      }
   }

   return nil
}

More Redis details:

The setQuery script

setQuery                 = "local var1 = redis.pcall(\"HGET\", KEYS[1], \"version\"); if type(var1) == \"table\" then redis.call(\"DEL\", KEYS[1]); end; if not var1 or type(var1)==\"table\" or var1 == \"\" or var1 == ARGV[1] or ARGV[1] == \"0\" then redis.call(\"HSET\", KEYS[1], \"data\", ARGV[2]) return redis.call(\"HINCRBY\", KEYS[1], \"version\", 1) else return error(\"failed to set key \" .. KEYS[1]) end"

The WAIT numreplicas timeout command: https://redis.io/commands/wait
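The setQuery script is dense, so it helps to restate its decision in ordinary code. The sketch below mirrors the script's main branch for a key that is already a hash or does not exist yet: the write goes through when there is no stored version, when the caller's version matches, or when the caller passes 0, and the version is then incremented. It is an in-memory illustration with hypothetical names (`memStore`, `entry`), and it leaves out the branch in which the script deletes a key of the wrong type before writing.

```go
package main

import "fmt"

// entry models one Redis hash with the two fields the script touches.
type entry struct {
	data    string
	version int
}

// memStore is a hypothetical in-memory stand-in for Redis.
type memStore map[string]*entry

// set follows the script's condition: write when the key has no version yet,
// when the caller's version matches, or when the caller passes 0.
// It returns the incremented version, like HINCRBY does.
func (m memStore) set(key string, ver int, data string) (int, error) {
	e, exists := m[key]
	if exists && ver != 0 && e.version != ver {
		return 0, fmt.Errorf("failed to set key %s", key)
	}
	if !exists {
		e = &entry{}
		m[key] = e
	}
	e.data = data
	e.version++
	return e.version, nil
}

func main() {
	m := memStore{}
	v1, _ := m.set("k", 0, "a")   // first write, version becomes 1
	v2, _ := m.set("k", v1, "b")  // matching ETag, version becomes 2
	_, err := m.set("k", v1, "c") // stale ETag, rejected
	v3, _ := m.set("k", 0, "d")   // ver 0 skips the check (last-write-wins)
	fmt.Println(v1, v2, err, v3)
}
```

This also shows why setValue resets ver to 0 for last-write-wins: 0 is the value that bypasses the comparison.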

delete state

// Delete performs a delete operation
func (r *StateStore) Delete(req *state.DeleteRequest) error {
   err := state.CheckRequestOptions(req.Options)
   if err != nil {
      return err
   }
   return state.DeleteWithOptions(r.deleteValue, req)
}

// It calls Delete in a loop internally
// The BulkDelete method is not exposed to the dapr runtime
// BulkDelete performs a bulk delete operation
func (r *StateStore) BulkDelete(req []state.DeleteRequest) error {
   for i := range req {
      err := r.Delete(&req[i])
      if err != nil {
         return err
      }
   }

   return nil
}

The actual work is done in the r.deleteValue method:

func (r *StateStore) deleteValue(req *state.DeleteRequest) error {
   if req.ETag == "" {
      // An empty ETag is replaced with "0", the zero value
      req.ETag = "0"
   }
   _, err := r.client.DoContext(context.Background(), "EVAL", delQuery, 1, req.Key, req.ETag).Result()

   if err != nil {
      return fmt.Errorf("failed to delete key '%s' due to ETag mismatch", req.Key)
   }

   return nil
}

More Redis details:

The delQuery script

delQuery                 = "local var1 = redis.pcall(\"HGET\", KEYS[1], \"version\"); if not var1 or type(var1)==\"table\" or var1 == ARGV[1] or var1 == \"\" or ARGV[1] == \"0\" then return redis.call(\"DEL\", KEYS[1]) else return error(\"failed to delete \" .. KEYS[1]) end"

State Transaction

The Redis state store implements TransactionalStore. Its Multi method:

// Multi performs a transactional operation. succeeds only if all operations succeed, and fails if one or more operations fail
func (r *StateStore) Multi(request *state.TransactionalStateRequest) error {
   // Uses the TxPipeline wrapper provided by go-redis
   pipe := r.client.TxPipeline()
   for _, o := range request.Operations {
      if o.Operation == state.Upsert {
         req := o.Request.(state.SetRequest)

         bt, _ := utils.Marshal(req.Value, r.json.Marshal)

         pipe.Set(req.Key, bt, defaultExpirationTime)
      } else if o.Operation == state.Delete {
         req := o.Request.(state.DeleteRequest)
         pipe.Del(req.Key)
      }
   }

   _, err := pipe.Exec()
   return err
}

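5.10 - Bindings source code

Source code of Dapr bindings

5.10.1 - Overview of the bindings source code

Overview of the Dapr bindings source code

5.10.2 - Source code analysis of bindings initialization

Source code analysis of Dapr bindings initialization

Binding Registry

Preparing the Binding Registry initialization

The Binding Registry is initialized during runtime initialization: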

func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration) *DaprRuntime {
  ......
  bindingsRegistry:       bindings_loader.NewRegistry(),
}

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {	
  ......
  a.bindingsRegistry.RegisterInputBindings(opts.inputBindings...)
  a.bindingsRegistry.RegisterOutputBindings(opts.outputBindings...)
  ......
}

These opts come from the configuration supplied at runtime startup, for example in cmd/daprd/main.go:

func main() {
	rt, err := runtime.FromFlags()
	if err != nil {
		log.Fatal(err)
	}

	err = rt.Run(
    ......
    runtime.WithInputBindings(
			bindings_loader.NewInput("aws.sqs", func() bindings.InputBinding {
				return sqs.NewAWSSQS(logContrib)
			}),
			bindings_loader.NewInput("aws.kinesis", func() bindings.InputBinding {
				return kinesis.NewAWSKinesis(logContrib)
			}),
			bindings_loader.NewInput("azure.eventhubs", func() bindings.InputBinding {
				return eventhubs.NewAzureEventHubs(logContrib)
			}),
			bindings_loader.NewInput("kafka", func() bindings.InputBinding {
				return kafka.NewKafka(logContrib)
			}),
			bindings_loader.NewInput("mqtt", func() bindings.InputBinding {
				return mqtt.NewMQTT(logContrib)
			}),
			bindings_loader.NewInput("rabbitmq", func() bindings.InputBinding {
				return bindings_rabbitmq.NewRabbitMQ(logContrib)
			}),
			bindings_loader.NewInput("azure.servicebusqueues", func() bindings.InputBinding {
				return servicebusqueues.NewAzureServiceBusQueues(logContrib)
			}),
			bindings_loader.NewInput("azure.storagequeues", func() bindings.InputBinding {
				return storagequeues.NewAzureStorageQueues(logContrib)
			}),
			bindings_loader.NewInput("gcp.pubsub", func() bindings.InputBinding {
				return pubsub.NewGCPPubSub(logContrib)
			}),
			bindings_loader.NewInput("kubernetes", func() bindings.InputBinding {
				return kubernetes.NewKubernetes(logContrib)
			}),
			bindings_loader.NewInput("azure.eventgrid", func() bindings.InputBinding {
				return eventgrid.NewAzureEventGrid(logContrib)
			}),
			bindings_loader.NewInput("twitter", func() bindings.InputBinding {
				return twitter.NewTwitter(logContrib)
			}),
			bindings_loader.NewInput("cron", func() bindings.InputBinding {
				return cron.NewCron(logContrib)
			}),
		),
    runtime.WithOutputBindings(
			bindings_loader.NewOutput("aws.sqs", func() bindings.OutputBinding {
				return sqs.NewAWSSQS(logContrib)
			}),
			bindings_loader.NewOutput("aws.sns", func() bindings.OutputBinding {
				return sns.NewAWSSNS(logContrib)
			}),
			bindings_loader.NewOutput("aws.kinesis", func() bindings.OutputBinding {
				return kinesis.NewAWSKinesis(logContrib)
			}),
			bindings_loader.NewOutput("azure.eventhubs", func() bindings.OutputBinding {
				return eventhubs.NewAzureEventHubs(logContrib)
			}),
			bindings_loader.NewOutput("aws.dynamodb", func() bindings.OutputBinding {
				return dynamodb.NewDynamoDB(logContrib)
			}),
			bindings_loader.NewOutput("azure.cosmosdb", func() bindings.OutputBinding {
				return bindings_cosmosdb.NewCosmosDB(logContrib)
			}),
			bindings_loader.NewOutput("gcp.bucket", func() bindings.OutputBinding {
				return bucket.NewGCPStorage(logContrib)
			}),
			bindings_loader.NewOutput("http", func() bindings.OutputBinding {
				return http.NewHTTP(logContrib)
			}),
			bindings_loader.NewOutput("kafka", func() bindings.OutputBinding {
				return kafka.NewKafka(logContrib)
			}),
			bindings_loader.NewOutput("mqtt", func() bindings.OutputBinding {
				return mqtt.NewMQTT(logContrib)
			}),
			bindings_loader.NewOutput("rabbitmq", func() bindings.OutputBinding {
				return bindings_rabbitmq.NewRabbitMQ(logContrib)
			}),
			bindings_loader.NewOutput("redis", func() bindings.OutputBinding {
				return redis.NewRedis(logContrib)
			}),
			bindings_loader.NewOutput("aws.s3", func() bindings.OutputBinding {
				return s3.NewAWSS3(logContrib)
			}),
			bindings_loader.NewOutput("azure.blobstorage", func() bindings.OutputBinding {
				return blobstorage.NewAzureBlobStorage(logContrib)
			}),
			bindings_loader.NewOutput("azure.servicebusqueues", func() bindings.OutputBinding {
				return servicebusqueues.NewAzureServiceBusQueues(logContrib)
			}),
			bindings_loader.NewOutput("azure.storagequeues", func() bindings.OutputBinding {
				return storagequeues.NewAzureStorageQueues(logContrib)
			}),
			bindings_loader.NewOutput("gcp.pubsub", func() bindings.OutputBinding {
				return pubsub.NewGCPPubSub(logContrib)
			}),
			bindings_loader.NewOutput("azure.signalr", func() bindings.OutputBinding {
				return signalr.NewSignalR(logContrib)
			}),
			bindings_loader.NewOutput("twilio.sms", func() bindings.OutputBinding {
				return sms.NewSMS(logContrib)
			}),
			bindings_loader.NewOutput("twilio.sendgrid", func() bindings.OutputBinding {
				return sendgrid.NewSendGrid(logContrib)
			}),
			bindings_loader.NewOutput("azure.eventgrid", func() bindings.OutputBinding {
				return eventgrid.NewAzureEventGrid(logContrib)
			}),
			bindings_loader.NewOutput("cron", func() bindings.OutputBinding {
				return cron.NewCron(logContrib)
			}),
			bindings_loader.NewOutput("twitter", func() bindings.OutputBinding {
				return twitter.NewTwitter(logContrib)
			}),
			bindings_loader.NewOutput("influx", func() bindings.OutputBinding {
				return influx.NewInflux(logContrib)
			}),
		),
    ......
}

The implementations of the various input bindings and output bindings are configured here.

How the Binding Registry is implemented

pkg/components/bindings/registry.go defines several data structures:

type (
	// InputBinding is an input binding component definition.
	InputBinding struct {
		Name          string
		FactoryMethod func() bindings.InputBinding
	}

	// OutputBinding is an output binding component definition.
	OutputBinding struct {
		Name          string
		FactoryMethod func() bindings.OutputBinding
	}

	// Registry is the interface of a components that allows callers to get registered instances of input and output bindings
	Registry interface {
		RegisterInputBindings(components ...InputBinding)
		RegisterOutputBindings(components ...OutputBinding)
		CreateInputBinding(name string) (bindings.InputBinding, error)
		CreateOutputBinding(name string) (bindings.OutputBinding, error)
	}

	bindingsRegistry struct {
		inputBindings  map[string]func() bindings.InputBinding
		outputBindings map[string]func() bindings.OutputBinding
	}
)

During the runtime initialization shown earlier, each implementation uses the NewInput and NewOutput methods to associate a name with the corresponding InputBinding/OutputBinding factory:

// NewInput creates a InputBinding.
func NewInput(name string, factoryMethod func() bindings.InputBinding) InputBinding {
	return InputBinding{
		Name:          name,
		FactoryMethod: factoryMethod,
	}
}

// NewOutput creates a OutputBinding.
func NewOutput(name string, factoryMethod func() bindings.OutputBinding) OutputBinding {
	return OutputBinding{
		Name:          name,
		FactoryMethod: factoryMethod,
	}
}

The RegisterInputBindings and RegisterOutputBindings methods register the input binding and output binding implementations; they are called during runtime initialization:

// RegisterInputBindings registers one or more new input bindings.
func (b *bindingsRegistry) RegisterInputBindings(components ...InputBinding) {
	for _, component := range components {
		b.inputBindings[createFullName(component.Name)] = component.FactoryMethod
	}
}

// RegisterOutputBindings registers one or more new output bindings.
func (b *bindingsRegistry) RegisterOutputBindings(components ...OutputBinding) {
	for _, component := range components {
		b.outputBindings[createFullName(component.Name)] = component.FactoryMethod
	}
}

func createFullName(name string) string {
	// createFullName adds the common prefix "bindings." to every name
	return fmt.Sprintf("bindings.%s", name)
}
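The registration and lookup described above boil down to a map from a full component type to a factory function. The following is a minimal, self-contained sketch of that idea; the OutputBinding interface, the fakeRedis type, and the register/create method names are hypothetical stand-ins for illustration and are not the Dapr types.

```go
package main

import (
	"errors"
	"fmt"
)

// OutputBinding is a hypothetical stand-in for bindings.OutputBinding.
type OutputBinding interface {
	Name() string
}

// fakeRedis is a hypothetical implementation used only for this sketch.
type fakeRedis struct{}

func (fakeRedis) Name() string { return "fake-redis" }

// registry maps a full component type ("bindings.<name>") to its factory.
type registry struct {
	outputBindings map[string]func() OutputBinding
}

func newRegistry() *registry {
	return &registry{outputBindings: map[string]func() OutputBinding{}}
}

// createFullName adds the common "bindings." prefix, as in the quoted code.
func createFullName(name string) string {
	return fmt.Sprintf("bindings.%s", name)
}

// register stores the factory under the prefixed name.
func (r *registry) register(name string, factory func() OutputBinding) {
	r.outputBindings[createFullName(name)] = factory
}

// create looks the factory up by full name and calls it.
func (r *registry) create(fullName string) (OutputBinding, error) {
	if factory, ok := r.outputBindings[fullName]; ok {
		return factory(), nil
	}
	return nil, errors.New("couldn't find output binding " + fullName)
}

func main() {
	r := newRegistry()
	r.register("redis", func() OutputBinding { return fakeRedis{} })

	b, err := r.create("bindings.redis")
	fmt.Println(b.Name(), err)

	// The short name is not a key in the map, so this lookup fails.
	_, err = r.create("redis")
	fmt.Println(err)
}
```

Because the prefix is added at registration time, a lookup has to use the full type such as bindings.redis, which is the value a component carries in its Spec.Type field.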

Binding initialization flow

pkg/runtime/runtime.go:

Bindings are initialized as part of runtime initialization:

func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
	......
	go a.processComponents()
	......
}

func (a *DaprRuntime) processComponents() {
   for {
      comp, more := <-a.pendingComponents
      if !more {
         a.pendingComponentsDone <- true
         return
      }
      if err := a.processOneComponent(comp); err != nil {
         log.Errorf("process component %s error, %s", comp.Name, err)
      }
   }
}
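The loop in processComponents relies on the two-value channel receive: the second value becomes false once the channel is closed and drained. A minimal sketch of the same pattern follows; the component type and the drain function are hypothetical names chosen for this example.

```go
package main

import "fmt"

// component is a hypothetical stand-in for components_v1alpha1.Component.
type component struct{ Name string }

// drain processes components until the channel is closed, then signals done.
func drain(pending <-chan component, done chan<- bool, process func(component) error) {
	for {
		comp, more := <-pending
		if !more {
			// The channel is closed and empty: report completion and stop.
			done <- true
			return
		}
		if err := process(comp); err != nil {
			fmt.Printf("process component %s error, %s\n", comp.Name, err)
		}
	}
}

func main() {
	pending := make(chan component)
	done := make(chan bool)
	var seen []string

	go drain(pending, done, func(c component) error {
		seen = append(seen, c.Name)
		return nil
	})

	pending <- component{Name: "statestore"}
	pending <- component{Name: "redis-binding"}
	close(pending)
	<-done // wait until the worker has observed the close
	fmt.Println(seen)
}
```

A failure in one component is only logged, so the loop keeps processing the components that follow.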

processOneComponent:

func (a *DaprRuntime) processOneComponent(comp components_v1alpha1.Component) error {
	res := a.preprocessOneComponent(&comp)
  
	compCategory := a.figureOutComponentCategory(comp)

	......
	return nil
}

doProcessOneComponent:

func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
	switch category {
	case bindingsComponent:
		return a.initBinding(comp)
		......
	}
	return nil
}

initBinding:

func (a *DaprRuntime) initBinding(c components_v1alpha1.Component) error {
	if err := a.initOutputBinding(c); err != nil {
		log.Errorf("failed to init output bindings: %s", err)
		return err
	}

	if err := a.initInputBinding(c); err != nil {
		log.Errorf("failed to init input bindings: %s", err)
		return err
	}
	return nil
}

This is where the input binding and output binding are initialized.

Output binding initialization

pkg/runtime/runtime.go:

func (a *DaprRuntime) initOutputBinding(c components_v1alpha1.Component) error {
	// create the output binding instance from the registry by component type
	binding, err := a.bindingsRegistry.CreateOutputBinding(c.Spec.Type)
	if err != nil {
		log.Warnf("failed to create output binding %s (%s): %s", c.ObjectMeta.Name, c.Spec.Type, err)
		diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation")
		return err
	}

	if binding != nil {
		err := binding.Init(bindings.Metadata{
			Properties: a.convertMetadataItemsToProperties(c.Spec.Metadata),
			Name:       c.ObjectMeta.Name,
		})
		if err != nil {
			log.Errorf("failed to init output binding %s (%s): %s", c.ObjectMeta.Name, c.Spec.Type, err)
			diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init")
			return err
		}
		log.Infof("successful init for output binding %s (%s)", c.ObjectMeta.Name, c.Spec.Type)
		a.outputBindings[c.ObjectMeta.Name] = binding
		diag.DefaultMonitoring.ComponentInitialized(c.Spec.Type)
	}
	return nil
}

The CreateOutputBinding method is implemented in pkg/components/bindings/registry.go:

// Create instantiates an output binding based on `name`.
func (b *bindingsRegistry) CreateOutputBinding(name string) (bindings.OutputBinding, error) {
	if method, ok := b.outputBindings[name]; ok {
		// call the factory method to create the concrete output binding
		return method(), nil
	}
	return nil, errors.Errorf("couldn't find output binding %s", name)
}

Input binding initialization

TODO

5.10.3 - Source code analysis of the Redis output binding

Source code analysis of Dapr's Redis output binding implementation

Note: according to https://github.com/dapr/docs/blob/master/concepts/bindings/README.md, Redis implements only the output binding.

Output binding implementation

The Redis implementation lives in dapr/components-contrib, in /bindings/redis/redis.go:

func (r *Redis) Operations() []bindings.OperationKind {
	// only the create operation is supported
	return []bindings.OperationKind{bindings.CreateOperation}
}

func (r *Redis) Invoke(req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
	// the key is passed through the request metadata
	if val, ok := req.Metadata["key"]; ok && val != "" {
		key := val
		// call the standard redis client to run the SET command
		_, err := r.client.DoContext(context.Background(), "SET", key, req.Data).Result()
		if err != nil {
			return nil, err
		}
		return nil, nil
	}
	return nil, errors.New("redis binding: missing key on write request metadata")
}
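The contract of this Invoke method is simple: the payload travels in Data, the key travels in request metadata, and a missing key is an error. The sketch below reproduces that contract with an in-memory map in place of Redis; invokeRequest and memoryBinding are hypothetical types introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// invokeRequest is a hypothetical stand-in for bindings.InvokeRequest.
type invokeRequest struct {
	Data     []byte
	Metadata map[string]string
}

// memoryBinding stores values in a map instead of issuing a Redis SET.
type memoryBinding struct {
	store map[string][]byte
}

// invoke writes req.Data under the key found in the request metadata.
func (m *memoryBinding) invoke(req *invokeRequest) error {
	if key, ok := req.Metadata["key"]; ok && key != "" {
		m.store[key] = req.Data
		return nil
	}
	return errors.New("memory binding: missing key on write request metadata")
}

func main() {
	b := &memoryBinding{store: map[string][]byte{}}

	err := b.invoke(&invokeRequest{
		Data:     []byte("hello"),
		Metadata: map[string]string{"key": "greeting"},
	})
	fmt.Println(string(b.store["greeting"]), err)

	// Without a key in the metadata the write is rejected.
	err = b.invoke(&invokeRequest{Data: []byte("hello")})
	fmt.Println(err)
}
```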

Full analysis

Initialization:

During dapr runtime initialization, the Redis output binding implementation is associated with its name:

bindings_loader.NewOutput("redis", func() bindings.OutputBinding {
   return redis.NewRedis(logContrib)
}),

The Init method is then called by the dapr runtime when the output binding is initialized. The Redis implementation is:

// Init performs metadata parsing and connection creation
func (r *Redis) Init(meta bindings.Metadata) error {
	// parse the metadata
	m, err := r.parseMetadata(meta)
	if err != nil {
		return err
	}

	// redis connection options
	opts := &redis.Options{
		Addr:            m.host,
		Password:        m.password,
		DB:              defaultDB,
		MaxRetries:      m.maxRetries,
		MaxRetryBackoff: m.maxRetryBackoff,
	}

	/* #nosec */
	if m.enableTLS {
		opts.TLSConfig = &tls.Config{
			InsecureSkipVerify: m.enableTLS,
		}
	}

	// create the redis client and verify the connection with a ping
	r.client = redis.NewClient(opts)
	_, err = r.client.Ping().Result()
	if err != nil {
		return fmt.Errorf("redis binding: error connecting to redis at %s: %s", m.host, err)
	}

	return err
}

5.10.4 - Source code analysis of output binding request handling

Source code analysis of how Dapr handles output binding requests

The InvokeBinding method in pkg/grpc/api.go:

func (a *api) InvokeBinding(ctx context.Context, in *runtimev1pb.InvokeBindingRequest) (*runtimev1pb.InvokeBindingResponse, error) {
	req := &bindings.InvokeRequest{
		Metadata:  in.Metadata,
		Operation: bindings.OperationKind(in.Operation),
	}
	if in.Data != nil {
		req.Data = in.Data
	}

	r := &runtimev1pb.InvokeBindingResponse{}
	// the key part of the implementation is here
	resp, err := a.sendToOutputBindingFn(in.Name, req)
	if err != nil {
		err = fmt.Errorf("ERR_INVOKE_OUTPUT_BINDING: %s", err)
		apiServerLogger.Debug(err)
		return r, err
	}

	if resp != nil {
		r.Data = resp.Data
		r.Metadata = resp.Metadata
	}
	return r, nil
}

The sendToOutputBindingFn function is wired up here:

func (a *DaprRuntime) getGRPCAPI() grpc.API {
	return grpc.NewAPI(a.runtimeConfig.ID, a.appChannel, a.stateStores, a.secretStores, a.getPublishAdapter(), a.directMessaging, a.actor, a.sendToOutputBinding, a.globalConfig.Spec.TracingSpec)
}

The sendToOutputBinding method is implemented in pkg/runtime/runtime.go:

func (a *DaprRuntime) sendToOutputBinding(name string, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
   if req.Operation == "" {
      return nil, errors.New("operation field is missing from request")
   }

   // look up the already registered binding by name
   if binding, ok := a.outputBindings[name]; ok {
      ops := binding.Operations()
      for _, o := range ops {
         // check whether this binding supports the requested operation
         if o == req.Operation {
            // key step: hand over to the concrete implementation
            return binding.Invoke(req)
         }
      }
      supported := make([]string, len(ops))
      for _, o := range ops {
         supported = append(supported, string(o))
      }
      return nil, errors.Errorf("binding %s does not support operation %s. supported operations:%s", name, req.Operation, strings.Join(supported, " "))
   }
   return nil, errors.Errorf("couldn't find output binding %s", name)
}
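One detail in the listing above is worth noting: make([]string, len(ops)) creates a slice that already holds len(ops) empty strings, so the later append calls add the operation names after those empty entries and the joined error message begins with extra spaces. The sketch below isolates the operation check and builds the list with a zero length and a preset capacity instead; operationKind and checkOperation are hypothetical names for this example, not Dapr identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// operationKind is a hypothetical stand-in for bindings.OperationKind.
type operationKind string

// checkOperation returns nil when requested is among ops, otherwise an
// error that lists the supported operations.
func checkOperation(name string, ops []operationKind, requested operationKind) error {
	for _, o := range ops {
		if o == requested {
			return nil
		}
	}
	// Length 0 with capacity len(ops): append fills the slice from the start.
	supported := make([]string, 0, len(ops))
	for _, o := range ops {
		supported = append(supported, string(o))
	}
	return fmt.Errorf("binding %s does not support operation %s. supported operations: %s",
		name, requested, strings.Join(supported, " "))
}

func main() {
	ops := []operationKind{"create"}
	fmt.Println(checkOperation("redis", ops, "create"))
	fmt.Println(checkOperation("redis", ops, "get"))
}
```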

5.10.5 - Summary of resource binding metadata

Summary of metadata in Dapr resource bindings

A summary of how metadata is designed and used across the binding implementations:

Implementation	Configuration-level metadata	Request-level metadata
alicloud oss		key
HTTP	url / method	none
cron	schedule	none
MQTT	url / topic	none
RabbitMQ	host / queueName / durable / deleteWhenUnused / prefetchCount	ttlInSeconds
Redis	host / password / enableTLS / maxRetries / maxRetryBackoff	key
Influx	url / token / org / bucket	none
Kafka	brokers / topics / publishTopic / consumerGroup / authRequired / saslUsername / saslPassword	key
Kubernetes	namespace / resyncPeriodInSec	none
twilio-sendgrid	apiKey / emailFrom / emailTo / subject / emailCc / emailBcc	emailFrom / emailTo / subject / emailCc / emailBcc
twilio-sms	toNumber / fromNumber / accountSid / authToken / timeout	toNumber
twitter	consumerKey / consumerSecret / accessToken / accessSecret / query	query / lang / result / since_id
gcp-bucket	bucket / type / project_id / private_key_id / private_key / client_email / client_id / auth_uri / token_uri / auth_provider_x509_cert_url / client_x509_cert_url	name
gcp-pubsub	topic / subscription / type / project_id / private_key_id / private_key / client_email / client_id / auth_uri / token_uri / auth_provider_x509_cert_url / client_x509_cert_url	topic
Azure-blobstorage	storageAccount / storageAccessKey / container	blobName / ContentType / ContentMD5 / ContentEncoding / ContentLanguage / ContentDisposition / CacheControl
Azure-cosmosDB	url / masterKey / database / collection / partitionKey	none
Azure-EventGrid	tenantId / subscriptionId / clientId / clientSecret / subscriberEndpoint / handshakePort / scope / eventSubscriptionName / accessKey / topicEndpoint	none
Azure-EventHubs	connection / consumerGroup / storageAccountName / storageAccountKey / storageContainerName / partitionID / partitionKey	partitionKey
Azure-ServiceBusQueues	connectionString / queueName / ttl	id / correlationID / ttlInSeconds
Azure-SignalR	connectionString / hub	hub / group / user
Azure-storagequeue		ttlInSeconds
Aws-dynamodb	region / endpoint / accessKey / secretKey / table	none
Aws-kinesis	streamName / consumerName / region / endpoint / accessKey / secretKey / mode	partitionKey
Aws-s3	region / endpoint / accessKey / secretKey / bucket	key
Aws-sns	topicArn / region / endpoint / accessKey / secretKey	none
Aws-sqs	queueName / region / endpoint / accessKey / secretKey	none

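5.11 - Source code analysis of the Injector

Source code analysis of the Dapr Injector

5.11.1 - Injector code implementation

Code implementation of the Dapr Injector

The inject flow

Take stateapp from the e2e tests as an example.

The application's original Deployment

tests/apps/stateapp/service.yaml contains the Service and Deployment definitions for stateapp.

There is nothing special about the Service definition: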

kind: Service
apiVersion: v1
metadata:
  name: stateapp
  labels:
    testapp: stateapp
spec:
  selector:
    testapp: stateapp
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: LoadBalancer

The Deployment definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateapp
  labels:
    testapp: stateapp
spec:
  replicas: 1
  selector:
    matchLabels:
      testapp: stateapp
  template: # pod definition of stateapp
    metadata:
      labels:
        testapp: stateapp
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "stateapp"
        dapr.io/app-port: "3000"
    spec:   # container definition of stateapp; for now the pod defines only this one container
      containers:
      - name: stateapp
        image: docker.io/YOUR_DOCKER_ALIAS/e2e-stateapp:dev
        ports:
        - containerPort: 3000
        imagePullPolicy: Always

Looking only at the annotations in the stateapp pod definition:

      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "stateapp"
        dapr.io/app-port: "3000"

Source code

getPodPatchOperations:

func (i *injector) getPodPatchOperations(ar *v1beta1.AdmissionReview,
	namespace, image string, kubeClient *kubernetes.Clientset, daprClient scheme.Interface) ([]PatchOperation, error) {
	req := ar.Request
	var pod corev1.Pod
	if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
		errors.Wrap(err, "could not unmarshal raw object")
		return nil, err
	}

	log.Infof(
		"AdmissionReview for Kind=%v, Namespace=%v Name=%v (%v) UID=%v "+
			"patchOperation=%v UserInfo=%v",
		req.Kind,
		req.Namespace,
		req.Name,
		pod.Name,
		req.UID,
		req.Operation,
		req.UserInfo,
	)

	if !isResourceDaprEnabled(pod.Annotations) || podContainsSidecarContainer(&pod) {
		return nil, nil
	}
  ...

An example of this info log line:

{"instance":"dapr-sidecar-injector-5f6f4bb6df-n5dsk","level":"info","msg":"AdmissionReview for Kind=/v1, Kind=Pod, Namespace=dapr-tests Name= () UID=d0126a13-9efd-432e-894a-5ddbee55898c patchOperation=CREATE UserInfo={system:serviceaccount:kube-system:replicaset-controller 3e5de149-07a3-434e-a8de-209abee69760 [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]}","scope":"dapr.injector","time":"2020-09-25T07:07:07.6482457Z","type":"log","ver":"edge"}

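This shows that the Injector starts working when a pod CREATE operation occurs in the dapr-tests namespace.

isResourceDaprEnabled(pod.Annotations) checks whether dapr is enabled for the pod: the pod must carry an annotation named dapr.io/enabled that is set to true; the default is false: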

const (
	daprEnabledKey                    = "dapr.io/enabled"
)
func isResourceDaprEnabled(annotations map[string]string) bool {
	return getBoolAnnotationOrDefault(annotations, daprEnabledKey, false)
}

podContainsSidecarContainer checks whether the pod already contains the dapr sidecar by looking for a container named daprd:

const (
	sidecarContainerName              = "daprd"
)
func podContainsSidecarContainer(pod *corev1.Pod) bool {
	for _, c := range pod.Spec.Containers {
		if c.Name == sidecarContainerName {
			return true
		}
	}
	return false
}
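Taken together, the two checks decide whether a pod gets a sidecar injected. The sketch below combines them on plain Go values. The body of getBoolAnnotationOrDefault is not shown in this section, so boolAnnotationOrDefault here is an assumed implementation that accepts only the literal value true (case-insensitive); the real helper may accept other spellings. hasContainer and shouldInject are likewise hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// boolAnnotationOrDefault is an assumed implementation for illustration:
// a missing key yields the default, and only "true" counts as enabled.
func boolAnnotationOrDefault(annotations map[string]string, key string, defaultValue bool) bool {
	value, ok := annotations[key]
	if !ok {
		return defaultValue
	}
	return strings.EqualFold(strings.TrimSpace(value), "true")
}

// hasContainer reports whether a container with the given name exists.
func hasContainer(names []string, target string) bool {
	for _, n := range names {
		if n == target {
			return true
		}
	}
	return false
}

// shouldInject mirrors the early-return condition in getPodPatchOperations:
// inject only when dapr is enabled and no daprd container is present yet.
func shouldInject(annotations map[string]string, containerNames []string) bool {
	return boolAnnotationOrDefault(annotations, "dapr.io/enabled", false) &&
		!hasContainer(containerNames, "daprd")
}

func main() {
	enabled := map[string]string{"dapr.io/enabled": "true"}
	fmt.Println(shouldInject(enabled, []string{"stateapp"}))
	fmt.Println(shouldInject(enabled, []string{"stateapp", "daprd"}))
	fmt.Println(shouldInject(map[string]string{}, []string{"stateapp"}))
}
```

The second check keeps the webhook idempotent: a pod that already carries a daprd container is left unchanged.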

Continuing with getPodPatchOperations():

	id := getAppID(pod)
	// Keep DNS resolution outside of getSidecarContainer for unit testing.
	placementAddress := fmt.Sprintf("%s:80", getKubernetesDNS(placementService, namespace))
	sentryAddress := fmt.Sprintf("%s:80", getKubernetesDNS(sentryService, namespace))
	apiSrvAddress := fmt.Sprintf("%s:80", getKubernetesDNS(apiAddress, namespace))

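getAppID(pod) reads the application id from the annotations. Note that "dapr.io/id" is deprecated and is to be removed after 1.0; it is replaced by "dapr.io/app-id". If neither annotation is set, the pod name is used: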

const (
	appIDKey                          = "dapr.io/app-id"
  	// Deprecated, remove in v1.0
	idKey                 = "dapr.io/id"
)
func getAppID(pod corev1.Pod) string {
	id := getStringAnnotationOrDefault(pod.Annotations, appIDKey, "")
	if id != "" {
		return id
	}

	return getStringAnnotationOrDefault(pod.Annotations, idKey, pod.GetName())
}
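The lookup order in getAppID is a three-step fallback: the new annotation, then the deprecated one, then the pod name. A minimal sketch of that chain follows; stringAnnotationOrDefault is an assumed implementation of the helper (its body is not shown in this section) and appID is a hypothetical name that takes the pod name as a plain string.

```go
package main

import "fmt"

// stringAnnotationOrDefault is an assumed implementation for illustration:
// it returns the annotation value when the key is present, else the default.
func stringAnnotationOrDefault(annotations map[string]string, key, defaultValue string) string {
	if value, ok := annotations[key]; ok {
		return value
	}
	return defaultValue
}

// appID resolves the application id: dapr.io/app-id first, then the
// deprecated dapr.io/id, and finally the pod name.
func appID(annotations map[string]string, podName string) string {
	if id := stringAnnotationOrDefault(annotations, "dapr.io/app-id", ""); id != "" {
		return id
	}
	return stringAnnotationOrDefault(annotations, "dapr.io/id", podName)
}

func main() {
	fmt.Println(appID(map[string]string{"dapr.io/app-id": "stateapp"}, "pod-1"))
	fmt.Println(appID(map[string]string{"dapr.io/id": "legacy"}, "pod-1"))
	fmt.Println(appID(map[string]string{}, "pod-1"))
}
```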

Determining mtlsEnabled

	var trustAnchors string
	var certChain string
	var certKey string
	var identity string

	mtlsEnabled := mTLSEnabled(daprClient)
	if mtlsEnabled {
		trustAnchors, certChain, certKey = getTrustAnchorsAndCertChain(kubeClient, namespace)
		identity = fmt.Sprintf("%s:%s", req.Namespace, pod.Spec.ServiceAccountName)
	}

Surprisingly, mTLSEnabled makes this decision by listing the dapr configurations in all namespaces:

const (
	// NamespaceAll is the default argument to specify on a context when you want to list or filter resources across all namespaces
	NamespaceAll string = ""
)
func mTLSEnabled(daprClient scheme.Interface) bool {
	resp, err := daprClient.ConfigurationV1alpha1().Configurations(meta_v1.NamespaceAll).List(meta_v1.ListOptions{})
	if err != nil {
		return defaultMtlsEnabled
	}

	for _, c := range resp.Items {
		if c.GetName() == defaultConfig {  // "daprsystem"
			return c.Spec.MTLSSpec.Enabled
		}
	}
	return defaultMtlsEnabled
}

Whether to enable mtls is decided by reading a k8s resource; tests/config/dapr_mtls_off_config.yaml has an example:

apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: daprsystem # the name must be daprsystem
spec:
  mtls:
    enabled: "false"  # configures whether mtls is enabled
    workloadCertTTL: "1h"
    allowedClockSkew: "20m"

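But this example is a trap: enabled is written as the quoted string "false" where a boolean is expected, so listing the configuration fails with: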

E0925 09:37:53.480772       1 reflector.go:153] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:224: Failed to list *v1alpha1.Configuration: v1alpha1.ConfigurationList.Items: []v1alpha1.Configuration: v1alpha1.Configuration.Spec: v1alpha1.ConfigurationSpec.MTLSSpec: v1alpha1.MTLSSpec.Enabled: ReadBool: expect t or f, but found ", error found in #10 byte of ...|enabled":"false","wo|..., bigger context ...|pec":{"mtls":{"allowedClockSkew":"20m","enabled":"false","workloadCertTTL":"1h"}}},{"apiVersion":"da|...
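The failure can be reproduced in isolation: a JSON string cannot be decoded into a Go bool field. The sketch below uses the standard library encoding/json, which is an assumption made for this example; the decoder that produced the log above words its error differently, but the quoted value is rejected for the same reason. mtlsSpec and parseMTLS are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mtlsSpec is a hypothetical reduction of the mtls section to one bool field.
type mtlsSpec struct {
	Enabled bool `json:"enabled"`
}

// parseMTLS decodes a JSON document into mtlsSpec.
func parseMTLS(raw string) (mtlsSpec, error) {
	var spec mtlsSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	// A quoted value is a JSON string, which does not fit a bool field.
	_, err := parseMTLS(`{"enabled":"false"}`)
	fmt.Println("quoted:", err)

	// An unquoted value is a JSON boolean and decodes cleanly.
	spec, err := parseMTLS(`{"enabled":false}`)
	fmt.Println("unquoted:", spec.Enabled, err)
}
```

In the YAML file this corresponds to writing enabled: false without the quotes.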

The resulting application pod definition

apiVersion: v1
kind: Pod
metadata:
  annotations:
    dapr.io/app-id: stateapp
    dapr.io/app-port: "3000"
    dapr.io/enabled: "true"
    dapr.io/sidecar-cpu-limit: "4.0"
    dapr.io/sidecar-cpu-request: "0.5"
    dapr.io/sidecar-memory-limit: 512Mi
    dapr.io/sidecar-memory-request: 250Mi
  creationTimestamp: "2020-09-25T07:07:07Z"
  generateName: stateapp-567b6b9c6f-
  labels:
    pod-template-hash: 567b6b9c6f
    testapp: stateapp
  name: stateapp-567b6b9c6f-84kzb
  namespace: dapr-tests
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: stateapp-567b6b9c6f
    uid: 25a34367-79ed-4e19-868a-5b063a45b1f4
  resourceVersion: "146616"
  selfLink: /api/v1/namespaces/dapr-tests/pods/stateapp-567b6b9c6f-84kzb
  uid: 0f4060df-0312-4d73-91c1-6f085462b33d
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
  containers:
  - env:
    - name: DAPR_HTTP_PORT
      value: "3500"
    - name: DAPR_GRPC_PORT
      value: "50001"
    image: docker.io/skyao/e2e-stateapp:dev-linux-amd64
    imagePullPolicy: Always
    name: stateapp
    ports:
    - containerPort: 3000
      name: http
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-qncjc
      readOnly: true
  - args:
    - --mode
    - kubernetes
    - --dapr-http-port
    - "3500"
    - --dapr-grpc-port
    - "50001"
    - --dapr-internal-grpc-port
    - "50002"
    - --app-port
    - "3000"
    - --app-id
    - stateapp
    - --control-plane-address
    - dapr-api.dapr-system.svc.cluster.local:80
    - --app-protocol
    - http
    - --placement-host-address
    - dapr-placement.dapr-system.svc.cluster.local:80
    - --config
    - ""
    - --log-level
    - info
    - --app-max-concurrency
    - "-1"
    - --sentry-address
    - dapr-sentry.dapr-system.svc.cluster.local:80
    - --metrics-port
    - "9090"
    - --enable-mtls
    command:
    - /daprd
    env:
    - name: DAPR_HOST_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: NAMESPACE
      value: dapr-tests
    - name: DAPR_TRUST_ANCHORS
      value: |
        -----BEGIN CERTIFICATE-----
        MIIB3TCCAYKgAwIBAgIRAMra+wjgMY6ABDtu3/vJ0NcwCgYIKoZIzj0EAwIwMTEX
        MBUGA1UEChMOZGFwci5pby9zZW50cnkxFjAUBgNVBAMTDWNsdXN0ZXIubG9jYWww
        HhcNMjAwOTI1MDU1ODAzWhcNMjEwOTI1MDU1ODAzWjAxMRcwFQYDVQQKEw5kYXBy
        LmlvL3NlbnRyeTEWMBQGA1UEAxMNY2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEG
        CCqGSM49AwEHA0IABE/w/8YBtRJPYNJkcDM05e9PhrbGjBU/RQd09J909OJebDe8
        rthysygWrcGYHYKziKK2Pyc1j4ua2xklLC5DFEWjezB5MA4GA1UdDwEB/wQEAwIC
        BDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDwYDVR0TAQH/BAUwAwEB
        /zAdBgNVHQ4EFgQUQ2v6OiayM9V4DPAU6UZHGe/nc1swGAYDVR0RBBEwD4INY2x1
        c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNJADBGAiEAtVBx9vDXiRE3fXJTU2yK11W5
        eo+Ce4+U6/vXDtzw4PUCIQDlLOB45ihOAhhLVLG9akhgwJOrgZLEW3FZjRabpSsb
        og==
        -----END CERTIFICATE-----        
    - name: DAPR_CERT_CHAIN
      value: |
        -----BEGIN CERTIFICATE-----
        MIIBxDCCAWqgAwIBAgIQQ1sfEH4aYacFZwBau+aOozAKBggqhkjOPQQDAjAxMRcw
        FQYDVQQKEw5kYXByLmlvL3NlbnRyeTEWMBQGA1UEAxMNY2x1c3Rlci5sb2NhbDAe
        Fw0yMDA5MjUwNTU4MDNaFw0yMTA5MjUwNTU4MDNaMBgxFjAUBgNVBAMTDWNsdXN0
        ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAARhj7MQ1uiOkZvJ0AYV
        uiFca/Iu9D5O98E5JN1mjCohRawk+QT1PjW05YtmyVji4Tt6ckIMvOXwG3aoTsGO
        UbRio30wezAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4E
        FgQUTPUh0WWBB5baKs3aJjMzInVLX/EwHwYDVR0jBBgwFoAUQ2v6OiayM9V4DPAU
        6UZHGe/nc1swGAYDVR0RBBEwD4INY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNI
        ADBFAiBO0oCadeYyLM+RkSAYPSTtjMyEZ0wv1/BsWuUMg+KZ6AIhALHnT0pxiqlj
        miYT4WZWvaBc17AbUh1efgV2DAaNKm54
        -----END CERTIFICATE-----
                
    - name: DAPR_CERT_KEY
      value: |
        -----BEGIN EC PRIVATE KEY-----
        MHcCAQEEIDj6niLJ5ep+fDdY71bKyWl9RZHrXyRjND6pWySL2Q4UoAoGCCqGSM49
        AwEHoUQDQgAEYY+zENbojpGbydAGFbohXGvyLvQ+TvfBOSTdZowqIUWsJPkE9T41
        tOWLZslY4uE7enJCDLzl8Bt2qE7BjlG0Yg==
        -----END EC PRIVATE KEY-----        
    - name: SENTRY_LOCAL_IDENTITY
      value: default:dapr-tests
    image: docker.io/skyao/daprd:dev-linux-amd64
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1.0/healthz
        port: 3500
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 6
      successThreshold: 1
      timeoutSeconds: 3
    name: daprd
    ports:
    - containerPort: 3500
      name: dapr-http
      protocol: TCP
    - containerPort: 50001
      name: dapr-grpc
      protocol: TCP
    - containerPort: 50002
      name: dapr-internal
      protocol: TCP
    - containerPort: 9090
      name: dapr-metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v1.0/healthz
        port: 3500
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 6
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      limits:
        cpu: "4"
        memory: 512Mi
      requests:
        cpu: 500m
        memory: 250Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-qncjc
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: docker-desktop
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-qncjc
    secret:
      defaultMode: 420
      secretName: default-token-qncjc
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T07:07:07Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T07:07:07Z"
    message: 'containers with unready status: [daprd]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T07:07:07Z"
    message: 'containers with unready status: [daprd]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T07:07:07Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://26a1d85ac6e2accd833832681b8dc2aa809e3c0fcfa293398bd5e7c2e8bf3e2b
    image: skyao/daprd:dev-linux-amd64
    imageID: docker-pullable://skyao/daprd@sha256:387f3bf4e7397c43dca9ac2d248a9ce790b1c1888aa0d6de3b07107ce124752f
    lastState:
      terminated:
        containerID: docker://26a1d85ac6e2accd833832681b8dc2aa809e3c0fcfa293398bd5e7c2e8bf3e2b
        exitCode: 1
        finishedAt: "2020-09-25T08:03:14Z"
        reason: Error
        startedAt: "2020-09-25T08:03:04Z"
    name: daprd
    ready: false
    restartCount: 21
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=daprd pod=stateapp-567b6b9c6f-84kzb_dapr-tests(0f4060df-0312-4d73-91c1-6f085462b33d)
        reason: CrashLoopBackOff
  - containerID: docker://737745ace04213c9519ad1f91e248015c89a80e2b3d61081c3c530d1c89bdbae
    image: skyao/e2e-stateapp:dev-linux-amd64
    imageID: docker-pullable://skyao/e2e-stateapp@sha256:16351b331f1338a61348c9a87fce43728369f1bf18ee69d9d45fb13db0283644
    lastState: {}
    name: stateapp
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-09-25T07:07:24Z"
  hostIP: 192.168.65.3
  phase: Running
  podIP: 10.1.0.194
  podIPs:
  - ip: 10.1.0.194
  qosClass: Burstable
  startTime: "2020-09-25T07:07:07Z"

Other

The injector's own pod definition

dapr-sidecar-injector

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/path: /
    prometheus.io/port: "9090"
    prometheus.io/scrape: "true"
  creationTimestamp: "2020-09-25T05:57:37Z"
  generateName: dapr-sidecar-injector-5f6f4bb6df-
  labels:
    app: dapr-sidecar-injector
    app.kubernetes.io/component: sidecar-injector
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/name: dapr
    app.kubernetes.io/part-of: dapr
    app.kubernetes.io/version: dev-linux-amd64
    pod-template-hash: 5f6f4bb6df
  name: dapr-sidecar-injector-5f6f4bb6df-n5dsk
  namespace: dapr-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dapr-sidecar-injector-5f6f4bb6df
    uid: ff47b1df-6da7-4a19-b99d-15622ca3a485
  resourceVersion: "133143"
  selfLink: /api/v1/namespaces/dapr-system/pods/dapr-sidecar-injector-5f6f4bb6df-n5dsk
  uid: 40df3834-4df2-495a-aa26-5b2a22de7639
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
        weight: 1
  containers:
  - args:
    - --log-level
    - info
    - --log-as-json
    - --metrics-port
    - "9090"
    command:
    - /injector
    env:
    - name: TLS_CERT_FILE
      value: /dapr/cert/tls.crt
    - name: TLS_KEY_FILE
      value: /dapr/cert/tls.key
    - name: SIDECAR_IMAGE
      value: docker.io/skyao/daprd:dev-linux-amd64
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: docker.io/skyao/dapr:dev-linux-amd64
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    name: dapr-sidecar-injector
    ports:
    - containerPort: 4000
      name: https
      protocol: TCP
    - containerPort: 9090
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 5
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 3
      periodSeconds: 3
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /dapr/cert
      name: cert
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: dapr-operator-token-lgpvc
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: docker-desktop
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: dapr-operator
  serviceAccountName: dapr-operator
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: cert
    secret:
      defaultMode: 420
      secretName: dapr-sidecar-injector-cert
  - name: dapr-operator-token-lgpvc
    secret:
      defaultMode: 420
      secretName: dapr-operator-token-lgpvc
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T05:57:37Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T05:58:10Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T05:58:10Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-09-25T05:57:37Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a820646b468a07eabdd89ca133f062a93e85256afc6c19c1bdf13b56980ec5e9
    image: skyao/dapr:dev-linux-amd64
    imageID: docker-pullable://skyao/dapr@sha256:77003eee9fd02d9fc24c2e9f385a6c86223bc35915cede98a8897c0dfc51ee61
    lastState: {}
    name: dapr-sidecar-injector
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-09-25T05:58:06Z"
  hostIP: 192.168.65.3
  phase: Running
  podIP: 10.1.0.188
  podIPs:
  - ip: 10.1.0.188
  qosClass: BestEffort
  startTime: "2020-09-25T05:57:37Z"

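5.11.2 - Studying the main.go source

The main code of the Dapr Injector

Source code analysis of the main.go file in the Dapr injector.

The init() method

init() performs initialization, including the flags (logger, metrics).

Defining and reading flags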

func init() {
	loggerOptions := logger.DefaultOptions()
	// this registers `log-level` and `log-as-json`
	loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)

	metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
	
	// this registers `metrics-port` and `enable-metrics`
	metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)

	flag.Parse()

For reference, the Command section in the injector pod description:

    Command:
      /injector
    Args:
      --log-level
      info
      --log-as-json
      --enable-metrics
      --metrics-port
      9090
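The arguments listed above map directly onto the standard flag package. The sketch below parses the same four flags with a dedicated FlagSet so that it can run on an explicit argument slice; the options struct, the parseArgs function, and all default values are assumptions for this example and are not taken from the Dapr logger or metrics packages.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical container for the four injector flags.
type options struct {
	logLevel      string
	logAsJSON     bool
	enableMetrics bool
	metricsPort   string
}

// parseArgs registers the flags on a private FlagSet and parses args.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("injector", flag.ContinueOnError)
	// Default values below are assumptions made for this sketch.
	fs.StringVar(&o.logLevel, "log-level", "info", "log output level")
	fs.BoolVar(&o.logAsJSON, "log-as-json", false, "print logs as JSON")
	fs.BoolVar(&o.enableMetrics, "enable-metrics", true, "enable metrics")
	fs.StringVar(&o.metricsPort, "metrics-port", "9090", "metrics port")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseArgs([]string{
		"--log-level", "info",
		"--log-as-json",
		"--enable-metrics",
		"--metrics-port", "9090",
	})
	fmt.Printf("%+v %v\n", o, err)
}
```

Boolean flags such as --log-as-json take no value on the command line; their presence alone sets them to true.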

Initializing the logger

	// Apply options to all loggers
	if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
		log.Fatal(err)
	} else {
		log.Infof("log level set to: %s", loggerOptions.OutputLevel)
	}

Initializing metrics

	// Initialize dapr metrics exporter
	if err := metricsExporter.Init(); err != nil {
		log.Fatal(err)
	}

	// Initialize injector service metrics
	if err := monitoring.InitMetrics(); err != nil {
		log.Fatal(err)
	}

The main() method

Reading the configuration

The configuration is read from environment variables:

func main() {
	logger.DaprVersion = version.Version()
	log.Infof("starting Dapr Sidecar Injector -- version %s -- commit %s", version.Version(), version.Commit())

	ctx := signals.Context()
	cfg, err := injector.GetConfigFromEnvironment()
	if err != nil {
		log.Fatalf("error getting config: %s", err)
	}
	......
}

Getting the daprClient

	kubeClient := utils.GetKubeClient()
	conf := utils.GetConfig()
	daprClient, _ := scheme.NewForConfig(conf)

Starting the healthz server

	go func() {
		healthzServer := health.NewServer(log)
		healthzServer.Ready()

		healthzErr := healthzServer.Run(ctx, healthzPort)
		if healthzErr != nil {
			log.Fatalf("failed to start healthz server: %s", healthzErr)
		}
	}()

service account

	uids, err := injector.AllowedControllersServiceAccountUID(ctx, kubeClient)
	if err != nil {
		log.Fatalf("failed to get authentication uids from services accounts: %s", err)
	}

Creating the injector

	injector.NewInjector(uids, cfg, daprClient, kubeClient).Run(ctx)

graceful shutdown

Graceful shutdown is simply a 5-second sleep:

	shutdownDuration := 5 * time.Second
	log.Infof("allowing %s for graceful shutdown to complete", shutdownDuration)
	<-time.After(shutdownDuration)

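5.11.3 - Studying the config.go source code

The config code of the Dapr Injector

Source code analysis of the config.go file in the Dapr injector package.

Implementation

Config struct definition

Configuration options for the injector: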

// Config represents configuration options for the Dapr Sidecar Injector webhook server
type Config struct {
	TLSCertFile            string `envconfig:"TLS_CERT_FILE" required:"true"`
	TLSKeyFile             string `envconfig:"TLS_KEY_FILE" required:"true"`
	SidecarImage           string `envconfig:"SIDECAR_IMAGE" required:"true"`
	SidecarImagePullPolicy string `envconfig:"SIDECAR_IMAGE_PULL_POLICY"`
	Namespace              string `envconfig:"NAMESPACE" required:"true"`
}

NewConfigWithDefaults() method

It sets a single default value, for SidecarImagePullPolicy:

func NewConfigWithDefaults() Config {
	return Config{
		SidecarImagePullPolicy: "Always",
	}
}

This method is only called by the GetConfigFromEnvironment() method below.

GetConfigFromEnvironment() method

Reads the configuration from the environment:

func GetConfigFromEnvironment() (Config, error) {
	c := NewConfigWithDefaults()
	err := envconfig.Process("", &c)
	return c, err
}

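envconfig.Process() uses reflection to inspect the Config struct, then reads each field's value from the environment variable named in its envconfig tag. Fields tagged required:"true" must be set, otherwise an error is returned; for the other fields, the default from NewConfigWithDefaults() is kept when the variable is absent.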
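
To make the reflection step concrete, here is a simplified sketch of how a struct can be filled from tagged environment variable names. It is not the envconfig library's code: the process function is hypothetical, only handles string fields, and takes the lookup function as a parameter (os.LookupEnv has the same signature) so that the example does not depend on the real environment.

```go
package main

import (
	"fmt"
	"reflect"
)

// config mirrors two fields of the Config struct shown above.
type config struct {
	TLSCertFile            string `envconfig:"TLS_CERT_FILE" required:"true"`
	SidecarImagePullPolicy string `envconfig:"SIDECAR_IMAGE_PULL_POLICY"`
}

// process fills the string fields of the struct that spec points to.
// The variable name comes from the `envconfig` tag; a missing variable
// is an error only when the field is tagged required:"true".
func process(spec interface{}, lookup func(string) (string, bool)) error {
	v := reflect.ValueOf(spec).Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		name := field.Tag.Get("envconfig")
		if name == "" {
			continue
		}
		value, ok := lookup(name)
		if !ok {
			if field.Tag.Get("required") == "true" {
				return fmt.Errorf("required key %s missing value", name)
			}
			continue // keep the default that is already in the struct
		}
		v.Field(i).SetString(value)
	}
	return nil
}

func main() {
	env := map[string]string{"TLS_CERT_FILE": "/dapr/cert/tls.crt"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	c := config{SidecarImagePullPolicy: "Always"}
	err := process(&c, lookup)
	fmt.Println(c.TLSCertFile, c.SidecarImagePullPolicy, err)
}
```

In this run SIDECAR_IMAGE_PULL_POLICY is not set, so the pre-filled default "Always" survives, which matches the role NewConfigWithDefaults() plays above.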

This method is called from exactly one place, at the beginning of the injector's main function:

func main() {
   log.Infof("starting Dapr Sidecar Injector -- version %s -- commit %s", version.Version(), version.Commit())

   ctx := signals.Context()
   cfg, err := injector.GetConfigFromEnvironment()
   if err != nil {
      log.Fatalf("error getting config: %s", err)
   }
   ......  
}

Running a command such as k describe pod dapr-sidecar-injector-6f656b7dd-sg87p -n dapr-system (where k is presumably an alias for kubectl) shows the description of the injector pod, including this Environment section:

    Environment:
      TLS_CERT_FILE:              /dapr/cert/tls.crt
      TLS_KEY_FILE:               /dapr/cert/tls.key
      SIDECAR_IMAGE:              docker.io/skyao/daprd:dev-linux-amd64
      SIDECAR_IMAGE_PULL_POLICY:  IfNotPresent
      NAMESPACE:                  dapr-system (v1:metadata.namespace)

Injector pod description for reference

The complete description of the injector pod, kept here for reference:

Name:         dapr-sidecar-injector-6f656b7dd-sg87p
Namespace:    dapr-system
Priority:     0
Node:         docker-desktop/192.168.65.3
Start Time:   Mon, 19 Apr 2021 15:04:07 +0800
Labels:       app=dapr-sidecar-injector
              app.kubernetes.io/component=sidecar-injector
              app.kubernetes.io/managed-by=helm
              app.kubernetes.io/name=dapr
              app.kubernetes.io/part-of=dapr
              app.kubernetes.io/version=dev-linux-amd64
              pod-template-hash=6f656b7dd
Annotations:  prometheus.io/path: /
              prometheus.io/port: 9090
              prometheus.io/scrape: true
Status:       Running
IP:           10.1.2.162
IPs:
  IP:           10.1.2.162
Controlled By:  ReplicaSet/dapr-sidecar-injector-6f656b7dd
Containers:
  dapr-sidecar-injector:
    Container ID:  docker://544dabf00bdaba9cf8f320218dd0b7e6d2ebce7fbf5184ce162d58bc693162d9
    Image:         docker.io/skyao/dapr:dev-linux-amd64
    Image ID:      docker-pullable://skyao/dapr@sha256:b4843ee78eabf014e15749bc4daa5c249ce3d33f796a89aaba9d117dd3dc76c9
    Ports:         4000/TCP, 9090/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /injector
    Args:
      --log-level
      info
      --log-as-json
      --enable-metrics
      --metrics-port
      9090
    State:          Running
      Started:      Mon, 19 Apr 2021 15:04:08 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=5
    Readiness:      http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=5
    Environment:
      TLS_CERT_FILE:              /dapr/cert/tls.crt
      TLS_KEY_FILE:               /dapr/cert/tls.key
      SIDECAR_IMAGE:              docker.io/skyao/daprd:dev-linux-amd64
      SIDECAR_IMAGE_PULL_POLICY:  IfNotPresent
      NAMESPACE:                  dapr-system (v1:metadata.namespace)
    Mounts:
      /dapr/cert from cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from dapr-operator-token-cjpnd (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dapr-sidecar-injector-cert
    Optional:    false
  dapr-operator-token-cjpnd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  dapr-operator-token-cjpnd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  17m   default-scheduler  Successfully assigned dapr-system/dapr-sidecar-injector-6f656b7dd-sg87p to docker-desktop
  Normal  Pulled     17m   kubelet            Container image "docker.io/skyao/dapr:dev-linux-amd64" already present on machine
  Normal  Created    17m   kubelet            Created container dapr-sidecar-injector
  Normal  Started    17m   kubelet            Started container dapr-sidecar-injector

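5.11.4 - Studying the injector.go source code

The injector.go code of the Dapr Injector

Main flow

Interface and struct definitions and creation

Injector is the interface of the Dapr runtime sidecar injection component.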

// Injector is the interface for the Dapr runtime sidecar injection component
type Injector interface {
   Run(ctx context.Context)
}

Definition of the injector struct:

type injector struct {
   config       Config
   deserializer runtime.Decoder
   server       *http.Server
   kubeClient   *kubernetes.Clientset
   daprClient   scheme.Interface
   authUIDs     []string
}

Creating a new injector struct (this function is called from the injector's main function):

// NewInjector returns a new Injector instance with the given config
func NewInjector(authUIDs []string, config Config, daprClient scheme.Interface, kubeClient *kubernetes.Clientset) Injector {
   mux := http.NewServeMux()

   i := &injector{
      config: config,
      deserializer: serializer.NewCodecFactory(
         runtime.NewScheme(),
      ).UniversalDeserializer(),
      // Configure the HTTP server; it is only started later, in Run()
      server: &http.Server{
         Addr:    fmt.Sprintf(":%d", port),
         Handler: mux,
      },
      kubeClient: kubeClient,
      daprClient: daprClient,
      authUIDs:   authUIDs,
   }

   // The mutate endpoint that Kubernetes calls
   mux.HandleFunc("/mutate", i.handleRequest)
   return i
}

Run() method

Run is the core method:

func (i *injector) Run(ctx context.Context) {
   doneCh := make(chan struct{})

   // Start a goroutine that waits for a signal on either ctx or doneCh
   go func() {
      select {
      case <-ctx.Done():
         log.Info("Sidecar injector is shutting down")
         shutdownCtx, cancel := context.WithTimeout(
            context.Background(),
            time.Second*5,
         )
         defer cancel()
         i.server.Shutdown(shutdownCtx) // nolint: errcheck
      case <-doneCh:
      }
   }()

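   // Log the startup message; this line can be found in the injector log
   log.Infof("Sidecar injector is listening on %s, patching Dapr-enabled pods", i.server.Addr)
   // TODO: this sometimes fails because of a certificate problem, which stops the injector from working; to be checked later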
   err := i.server.ListenAndServeTLS(i.config.TLSCertFile, i.config.TLSKeyFile)
   if err != http.ErrServerClosed {
      log.Errorf("Sidecar injector error: %s", err)
   }
   close(doneCh)
}

The startup message logged by Run() can be compared with the injector log, as printed by k logs dapr-sidecar-injector-86b8dc4dcd-bkbgw -n dapr-system:

{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"log level set to: info","scope":"dapr.injector","time":"2021-05-11T01:13:20.1904136Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"metrics server started on :9090/","scope":"dapr.metrics","time":"2021-05-11T01:13:20.1907347Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"starting Dapr Sidecar Injector -- version edge -- commit v1.0.0-rc.4-163-g9a4210a-dirty","scope":"dapr.injector","time":"2021-05-11T01:13:20.191669Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"Healthz server is listening on :8080","scope":"dapr.injector","time":"2021-05-11T01:13:20.1928941Z","type":"log","ver":"unknown"}

{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"Sidecar injector is listening on :4000, patching Dapr-enabled pods","scope":"dapr.injector","time":"2021-05-11T01:13:20.208587Z","type":"log","ver":"unknown"}

handleRequest method

handleRequest handles the mutate calls coming from the Kubernetes API server:

mux.HandleFunc("/mutate", i.handleRequest)

func (i *injector) handleRequest(w http.ResponseWriter, r *http.Request) {
  ......
}

The code is fairly long, so some details are omitted.

Read the request body, then validate its length and the Content-Type:

defer r.Body.Close()

var body []byte
if r.Body != nil {
   if data, err := ioutil.ReadAll(r.Body); err == nil {
      body = data
   }
}
if len(body) == 0 {
   log.Error("empty body")
   http.Error(w, "empty body", http.StatusBadRequest)
   return
}

contentType := r.Header.Get("Content-Type")
if contentType != "application/json" {
  log.Errorf("Content-Type=%s, expect application/json", contentType)
  http.Error(
    w,
    "invalid Content-Type, expect `application/json`",
    http.StatusUnsupportedMediaType,
  )

  return
}

Deserialize the body and run some basic validation:

ar := v1.AdmissionReview{}
_, gvk, err := i.deserializer.Decode(body, nil, &ar)
if err != nil {
   log.Errorf("Can't decode body: %v", err)
} else {
   if !utils.StringSliceContains(ar.Request.UserInfo.UID, i.authUIDs) {
      err = errors.Wrapf(err, "unauthorized request")
      log.Error(err)
   } else if ar.Request.Kind.Kind != "Pod" {
      err = errors.Wrapf(err, "invalid kind for review: %s", ar.Kind)
      log.Error(err)
   } else {
      patchOps, err = i.getPodPatchOperations(&ar, i.config.Namespace, i.config.SidecarImage, i.config.SidecarImagePullPolicy, i.kubeClient, i.daprClient)
   }
}

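getPodPatchOperations is the core code and is examined in detail later.

Handle the errors that may have occurred above, together with the result of getPodPatchOperations(), in one place: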

diagAppID := getAppIDFromRequest(ar.Request)

if err != nil {
   admissionResponse = toAdmissionResponse(err)
   log.Errorf("Sidecar injector failed to inject for app '%s'. Error: %s", diagAppID, err)
   monitoring.RecordFailedSidecarInjectionCount(diagAppID, "patch")
} else if len(patchOps) == 0 {
   // len(patchOps) == 0 means nothing was changed; return Allowed: true
   admissionResponse = &v1.AdmissionResponse{
      Allowed: true,
   }
} else {
   var patchBytes []byte
   // Serialize patchOps to JSON
   patchBytes, err = json.Marshal(patchOps)
   if err != nil {
      admissionResponse = toAdmissionResponse(err)
   } else {
      // Return the AdmissionResponse
      admissionResponse = &v1.AdmissionResponse{
         Allowed: true,
         Patch:   patchBytes,
         PatchType: func() *v1.PatchType {
            pt := v1.PatchTypeJSONPatch
            return &pt
         }(),
      }
   }
}

Assemble the AdmissionReview:

admissionReview := v1.AdmissionReview{}
if admissionResponse != nil {
   admissionReview.Response = admissionResponse
   if ar.Request != nil {
      admissionReview.Response.UID = ar.Request.UID
      admissionReview.SetGroupVersionKind(*gvk)
   }
}

Serialize the response and return it:

log.Infof("ready to write response ...")
respBytes, err := json.Marshal(admissionReview)
if err != nil {
   http.Error(
      w,
      err.Error(),
      http.StatusInternalServerError,
   )

   log.Errorf("Sidecar injector failed to inject for app '%s'. Can't deserialize response: %s", diagAppID, err)
   monitoring.RecordFailedSidecarInjectionCount(diagAppID, "response")
}
w.Header().Set("Content-Type", "application/json")
if _, err := w.Write(respBytes); err != nil {
   log.Error(err)
} else {
   log.Infof("Sidecar injector succeeded injection for app '%s'", diagAppID)
   monitoring.RecordSuccessfulSidecarInjectionCount(diagAppID)
}

Helper code

toAdmissionResponse method

toAdmissionResponse creates a Kubernetes AdmissionResponse from an error:

// toAdmissionResponse is a helper function to create an AdmissionResponse
// with an embedded error
func toAdmissionResponse(err error) *v1.AdmissionResponse {
   return &v1.AdmissionResponse{
      Result: &metav1.Status{
         Message: err.Error(),
      },
   }
}

Getting the AppID

getAppIDFromRequest() extracts the AppID from the AdmissionRequest:

func getAppIDFromRequest(req *v1.AdmissionRequest) string {
   // default App ID
   appID := ""

   // if req is not given
   if req == nil {
      return appID
   }

   var pod corev1.Pod
   // Unmarshal the raw JSON of the pod into a Pod object
   if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
      log.Warnf("could not unmarshal raw object: %v", err)
   } else {
      // Then read the appID from the pod
      appID = getAppID(pod)
   }

   return appID
}

getAppID() is implemented as follows: it first reads the "dapr.io/app-id" annotation and, if it is not set, falls back to the pod name as the default AppID:

const appIDKey = "dapr.io/app-id"
func getAppID(pod corev1.Pod) string {
	return getStringAnnotationOrDefault(pod.Annotations, appIDKey, pod.GetName())
}
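
getStringAnnotationOrDefault() itself is not quoted here. A plausible reconstruction, under the assumption that an empty annotation value is treated like a missing one, could look like this:

```go
package main

import "fmt"

const appIDKey = "dapr.io/app-id"

// getStringAnnotationOrDefault is a hypothetical reconstruction: it
// returns the annotation value when the key is present and non-empty,
// and defaultValue otherwise. Treating "" as missing is an assumption.
func getStringAnnotationOrDefault(annotations map[string]string, key, defaultValue string) string {
	if v, ok := annotations[key]; ok && v != "" {
		return v
	}
	return defaultValue
}

func main() {
	annotated := map[string]string{appIDKey: "app-2"}
	fmt.Println(getStringAnnotationOrDefault(annotated, appIDKey, "pod-name"))
	// Reading from a nil map is safe in Go and yields the default.
	fmt.Println(getStringAnnotationOrDefault(nil, appIDKey, "pod-name"))
}
```

The first call prints app-2 and the second prints pod-name, mirroring the fallback to pod.GetName() in getAppID().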

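Branch code

ServiceAccount-related code

AllowedControllersServiceAccountUID() returns an array of UIDs: the service accounts that are allowed on the webhook handler. Only the first entry, replicaset-controller, is mandatory; a lookup failure for any other entry is logged as a warning and that entry is skipped: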

var allowedControllersServiceAccounts = []string{
	"replicaset-controller",
	"deployment-controller",
	"cronjob-controller",
	"job-controller",
	"statefulset-controller",
}

// AllowedControllersServiceAccountUID returns an array of UID, list of allowed service account on the webhook handler
func AllowedControllersServiceAccountUID(ctx context.Context, kubeClient *kubernetes.Clientset) ([]string, error) {
   allowedUids := []string{}
   for i, allowedControllersServiceAccount := range allowedControllersServiceAccounts {
      saUUID, err := getServiceAccount(ctx, kubeClient, allowedControllersServiceAccount)
      // i == 0 => "replicaset-controller" is the only one mandatory
      if err != nil && i == 0 {
         return nil, err
      } else if err != nil {
         log.Warnf("Unable to get SA %s UID (%s)", allowedControllersServiceAccount, err)
         continue
      }
      allowedUids = append(allowedUids, saUUID)
   }

   return allowedUids, nil
}

func getServiceAccount(ctx context.Context, kubeClient *kubernetes.Clientset, allowedControllersServiceAccount string) (string, error) {
	ctxWithTimeout, cancel := context.WithTimeout(ctx, getKubernetesServiceAccountTimeoutSeconds*time.Second)
	defer cancel()

	sa, err := kubeClient.CoreV1().ServiceAccounts(metav1.NamespaceSystem).Get(ctxWithTimeout, allowedControllersServiceAccount, metav1.GetOptions{})
	if err != nil {
		return "", err
	}

	return string(sa.ObjectMeta.UID), nil
}

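5.11.5 - Studying the patch_operation.go source code

The patch_operation.go code of the Dapr Injector

The code is very simple: it only defines the PatchOperation struct, which represents a single, discrete change to be applied to a Kubernetes resource.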

// PatchOperation represents a discreet change to be applied to a Kubernetes resource
type PatchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

5.11.6 - Studying the pod_patch.go source code

The pod_patch.go code of the Dapr Injector

Main flow

getPodPatchOperations() is the most important method; this is where the injector modifies the pod:


func (i *injector) getPodPatchOperations(ar *v1.AdmissionReview,
	namespace, image, imagePullPolicy string, kubeClient *kubernetes.Clientset, daprClient scheme.Interface) ([]PatchOperation, error) {
	......
	return patchOps, nil
}

Parse the request to get the pod object (this looks like a repeat of the unmarshalling already done in getAppIDFromRequest()):

req := ar.Request
var pod corev1.Pod
if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
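   // Note: as quoted, the result of errors.Wrap is discarded, so the caller receives the unwrapped err
   errors.Wrap(err, "could not unmarshal raw object")
   return nil, err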
}

Decide whether the injector needs to do anything:

if !isResourceDaprEnabled(pod.Annotations) || podContainsSidecarContainer(&pod) {
   return nil, nil
}

// Dapr counts as enabled when the "dapr.io/enabled" annotation is set to true; the default is false
const daprEnabledKey = "dapr.io/enabled"
func isResourceDaprEnabled(annotations map[string]string) bool {
	return getBoolAnnotationOrDefault(annotations, daprEnabledKey, false)
}

// Checks whether the pod already contains the Dapr sidecar container
const sidecarContainerName = "daprd"
func podContainsSidecarContainer(pod *corev1.Pod) bool {
	for _, c := range pod.Spec.Containers {
		// Loop over all containers in the pod and look for one named "daprd"
		if c.Name == sidecarContainerName {
			return true
		}
	}
	return false
}

Create the daprd sidecar container:

sidecarContainer, err := getSidecarContainer(pod.Annotations, id, image, imagePullPolicy, req.Namespace, apiSrvAddress, placementAddress, tokenMount, trustAnchors, certChain, certKey, sentryAddress, mtlsEnabled, identity)

The details of getSidecarContainer() are covered later; first, the rest of the main flow:

patchOps := []PatchOperation{}
envPatchOps := []PatchOperation{}
var path string
var value interface{}
if len(pod.Spec.Containers) == 0 {
   // The pod has no containers (in which situation would a pod have no containers?)
   path = containersPath
   value = []corev1.Container{*sidecarContainer}
} else {
   // Append the daprd sidecar, and add the Dapr env vars to the existing containers
   envPatchOps = addDaprEnvVarsToContainers(pod.Spec.Containers)
   // TODO: which rules apply to the value of path?
   path = "/spec/containers/-"
   value = sidecarContainer
}

	patchOps = append(
		patchOps,
		PatchOperation{
			Op:    "add",
			Path:  path,
			Value: value,
		},
	)
	patchOps = append(patchOps, envPatchOps...)

addDaprEnvVarsToContainers

// This function add Dapr environment variables to all the containers in any Dapr enabled pod.
// The containers can be injected or user defined.
func addDaprEnvVarsToContainers(containers []corev1.Container) []PatchOperation {
   portEnv := []corev1.EnvVar{
      {
         Name:  userContainerDaprHTTPPortName,
         Value: strconv.Itoa(sidecarHTTPPort),
      },
      {
         Name:  userContainerDaprGRPCPortName,
         Value: strconv.Itoa(sidecarAPIGRPCPort),
      },
   }
   envPatchOps := make([]PatchOperation, 0, len(containers))
   for i, container := range containers {
      path := fmt.Sprintf("%s/%d/env", containersPath, i)
      patchOps := getEnvPatchOperations(container.Env, portEnv, path)
      envPatchOps = append(envPatchOps, patchOps...)
   }
   return envPatchOps
}
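
getEnvPatchOperations() is not quoted in these notes. The sketch below is a guess at its behavior, stated as an assumption: when a container has no env array, one operation creates it, and otherwise each variable that is not already defined is appended with "/-". The envVar and patchOperation types are simplified stand-ins, and the variable names DAPR_HTTP_PORT and DAPR_GRPC_PORT are assumed values for the two constants used above.

```go
package main

import "fmt"

// envVar is a simplified stand-in for corev1.EnvVar.
type envVar struct{ Name, Value string }

// patchOperation is a simplified stand-in for PatchOperation.
type patchOperation struct {
	Op    string
	Path  string
	Value interface{}
}

// envPatchOperations is a hypothetical reconstruction of
// getEnvPatchOperations: it never overwrites a variable that the
// user has already defined on the container.
func envPatchOperations(existing, toAdd []envVar, path string) []patchOperation {
	if len(existing) == 0 {
		// No env array yet: create it in one operation.
		return []patchOperation{{Op: "add", Path: path, Value: toAdd}}
	}
	present := make(map[string]bool, len(existing))
	for _, e := range existing {
		present[e.Name] = true
	}
	ops := []patchOperation{}
	for _, e := range toAdd {
		if present[e.Name] {
			continue
		}
		ops = append(ops, patchOperation{Op: "add", Path: path + "/-", Value: e})
	}
	return ops
}

func main() {
	ports := []envVar{{"DAPR_HTTP_PORT", "3500"}, {"DAPR_GRPC_PORT", "50001"}}
	existing := []envVar{{"DAPR_HTTP_PORT", "9999"}}
	for _, op := range envPatchOperations(existing, ports, "/spec/containers/0/env") {
		fmt.Println(op.Op, op.Path, op.Value)
	}
}
```

With a user-defined DAPR_HTTP_PORT already present, only the gRPC port variable is appended in this run.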

Branch flow: mTLS handling

mtlsEnabled := mTLSEnabled(daprClient)
if mtlsEnabled {
   trustAnchors, certChain, certKey = getTrustAnchorsAndCertChain(kubeClient, namespace)
   identity = fmt.Sprintf("%s:%s", req.Namespace, pod.Spec.ServiceAccountName)
}

func mTLSEnabled(daprClient scheme.Interface) bool {
   resp, err := daprClient.ConfigurationV1alpha1().Configurations(meta_v1.NamespaceAll).List(meta_v1.ListOptions{})
   if err != nil {
      log.Errorf("Failed to load dapr configuration from k8s, use default value %t for mTLSEnabled: %s", defaultMtlsEnabled, err)
      return defaultMtlsEnabled
   }

   for _, c := range resp.Items {
      if c.GetName() == defaultConfig {
         return c.Spec.MTLSSpec.Enabled
      }
   }
   log.Infof("Dapr system configuration (%s) is not found, use default value %t for mTLSEnabled", defaultConfig, defaultMtlsEnabled)
   return defaultMtlsEnabled
}

Branch flow: serviceaccount

6 - Studying the components-contrib repository source code

Dapr source code study: the components-contrib repository

The code in the components-contrib repository:

https://github.com/dapr/components-contrib

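6.1 - Studying the workflow component source code

The implementation of the workflow component in the components-contrib repository

6.1.1 - Workflow definition and operations

The workflow definition and its operations in detail

There is not much code, so it is all covered together.

Interface definition

Workflow interface

The Workflow interface defines the operations that can be performed on a workflow: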

var ErrNotImplemented = errors.New("this component doesn't implement the current API operation")

type Workflow interface {
	Init(metadata Metadata) error
	Start(ctx context.Context, req *StartRequest) (*StartResponse, error)
	Terminate(ctx context.Context, req *TerminateRequest) error
	Get(ctx context.Context, req *GetRequest) (*StateResponse, error)
	RaiseEvent(ctx context.Context, req *RaiseEventRequest) error
	Purge(ctx context.Context, req *PurgeRequest) error
	Pause(ctx context.Context, req *PauseRequest) error
	Resume(ctx context.Context, req *ResumeRequest) error
}

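Init initializes the workflow implementation.

Start / Terminate / Pause / Resume manage the lifecycle of a workflow.

A component that does not implement one of these operations must return an error; the shared error value ErrNotImplemented is provided for this purpose.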

Operations

Init operation

Initialization is done through metadata, as with other components:

type Workflow interface {
	Init(metadata Metadata) error
	......
}

type Metadata struct {
	metadata.Base `json:",inline"`
}

Start operation

The Start operation starts a workflow:

type Workflow interface {
	Start(ctx context.Context, req *StartRequest) (*StartResponse, error)
	......
}

// StartRequest is the struct describing a start workflow request.
type StartRequest struct {
	InstanceID    string            `json:"instanceID"`
	Options       map[string]string `json:"options"`
	WorkflowName  string            `json:"workflowName"`
	WorkflowInput []byte            `json:"workflowInput"`
}

type StartResponse struct {
	InstanceID string `json:"instanceID"`
}

The request parameters of the Start operation are:

InstanceID: string
Options: map[string]string
WorkflowName: string
WorkflowInput: []byte

The response parameter of the Start operation is:

InstanceID: string

Terminate operation

The Terminate operation terminates a workflow:

type Workflow interface {
	Terminate(ctx context.Context, req *TerminateRequest) error
}

type TerminateRequest struct {
	InstanceID string `json:"instanceID"`
}

The request of the Terminate operation only needs a single InstanceID parameter.

Get operation

The Get operation retrieves the state of a workflow instance:

type Workflow interface {
	Get(ctx context.Context, req *GetRequest) (*StateResponse, error)
	......
}

type GetRequest struct {
	InstanceID string `json:"instanceID"`
}

type StateResponse struct {
	Workflow *WorkflowState `json:"workflow"`
}

type WorkflowState struct {
	InstanceID    string            `json:"instanceID"`
	WorkflowName  string            `json:"workflowName"`
	CreatedAt     time.Time         `json:"startedAt"`
	LastUpdatedAt time.Time         `json:"lastUpdatedAt"`
	RuntimeStatus string            `json:"runtimeStatus"`
	Properties    map[string]string `json:"properties"`
}

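The request of the Get operation only needs a single InstanceID parameter.

The response of the Get operation wraps a WorkflowState, whose fields are:

InstanceID: string
WorkflowName: string
CreatedAt: time.Time (serialized as startedAt)
LastUpdatedAt: time.Time
RuntimeStatus: string
Properties: map[string]string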

Purge operation

The Purge operation purges a workflow, that is, it removes the stored data of a workflow instance:

type Workflow interface {
	Purge(ctx context.Context, req *PurgeRequest) error
}

type PurgeRequest struct {
	InstanceID string `json:"instanceID"`
}

The request of the Purge operation only needs a single InstanceID parameter.

Pause operation

The Pause operation pauses a workflow:

type Workflow interface {
	Pause(ctx context.Context, req *PauseRequest) error
}

type PauseRequest struct {
	InstanceID string `json:"instanceID"`
}

The request of the Pause operation only needs a single InstanceID parameter.

Resume operation

The Resume operation resumes a paused workflow:

type Workflow interface {
	Resume(ctx context.Context, req *ResumeRequest) error
}

type ResumeRequest struct {
	InstanceID string `json:"instanceID"`
}

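The request of the Resume operation only needs a single InstanceID parameter.

6.1.2 - Temporal integration

The implementation of the Temporal integration

Workflow definition

The TemporalWF struct holds the Temporal client: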

type TemporalWF struct {
	client client.Client
	logger logger.Logger
}

The temporalMetadata struct defines the metadata:

type temporalMetadata struct {
	Identity  string `json:"identity" mapstructure:"identity"`
	HostPort  string `json:"hostport" mapstructure:"hostport"`
	Namespace string `json:"namespace" mapstructure:"namespace"`
}

Creating the workflow

NewTemporalWorkflow() method

// NewTemporalWorkflow returns a new workflow.
func NewTemporalWorkflow(logger logger.Logger) workflows.Workflow {
	s := &TemporalWF{
		logger: logger,
	}
	return s
}

Init() method

func (c *TemporalWF) Init(metadata workflows.Metadata) error {
	c.logger.Debugf("Temporal init start")
	m, err := c.parseMetadata(metadata)
	if err != nil {
		return err
	}
	cOpt := client.Options{}
	if m.HostPort != "" {
		cOpt.HostPort = m.HostPort
	}
	if m.Identity != "" {
		cOpt.Identity = m.Identity
	}
	if m.Namespace != "" {
		cOpt.Namespace = m.Namespace
	}
	// Create the workflow client
	newClient, err := client.Dial(cOpt)
	if err != nil {
		return err
	}
	c.client = newClient

	return nil
}

func (c *TemporalWF) parseMetadata(meta workflows.Metadata) (*temporalMetadata, error) {
	var m temporalMetadata
	err := metadata.DecodeMetadata(meta.Properties, &m)
	return &m, err
}

Workflow operations

Start

func (c *TemporalWF) Start(ctx context.Context, req *workflows.StartRequest) (*workflows.StartResponse, error) {
	c.logger.Debugf("starting workflow")

	if len(req.Options) == 0 {
		c.logger.Debugf("no options provided")
		return nil, errors.New("no options provided. At the very least, a task queue is needed")
	}

	if _, ok := req.Options["task_queue"]; !ok {
		c.logger.Debugf("no task queue provided")
		return nil, errors.New("no task queue provided")
	}
	taskQ := req.Options["task_queue"]

	opt := client.StartWorkflowOptions{ID: req.InstanceID, TaskQueue: taskQ}

	var inputArgs interface{}
	if err := decodeInputData(req.WorkflowInput, &inputArgs); err != nil {
		return nil, fmt.Errorf("error decoding workflow input data: %w", err)
	}

	run, err := c.client.ExecuteWorkflow(ctx, opt, req.WorkflowName, inputArgs)
	if err != nil {
		return nil, fmt.Errorf("error executing workflow: %w", err)
	}
	wfStruct := workflows.StartResponse{InstanceID: run.GetID()}
	return &wfStruct, nil
}

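The code is tightly coupled to Temporal: WorkflowInput is decoded and then handed straight to Temporal. Dapr adds no abstraction or wrapping of its own here; it simply passes the input through.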
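
decodeInputData() is not quoted in these notes. Assuming that the input bytes are JSON and that empty input is allowed, a hypothetical reconstruction could be as simple as the following, which also shows what the decoded value looks like before it is passed on:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeInputData is a hypothetical reconstruction: it decodes JSON
// bytes into result and leaves result untouched for empty input.
func decodeInputData(data []byte, result interface{}) error {
	if len(data) == 0 {
		return nil
	}
	return json.Unmarshal(data, result)
}

func main() {
	var inputArgs interface{}
	if err := decodeInputData([]byte(`{"orderID": 42}`), &inputArgs); err != nil {
		panic(err)
	}
	// A JSON object decodes into map[string]interface{}; numbers become float64.
	fmt.Printf("%T %v\n", inputArgs, inputArgs)
}
```

Since the target is an empty interface, the workflow on the Temporal side receives generic maps, slices and scalars rather than a typed struct, which fits the pass-through observation above.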

Terminate

func (c *TemporalWF) Terminate(ctx context.Context, req *workflows.TerminateRequest) error {
	c.logger.Debugf("terminating workflow")

	err := c.client.TerminateWorkflow(ctx, req.InstanceID, "", "")
	if err != nil {
		return fmt.Errorf("error terminating workflow: %w", err)
	}
	return nil
}

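7 - Studying the sentry source code

Dapr source code study: sentry

7.1 - The main function entry point of sentry

The entry point of the sentry module is in the file cmd/sentry/main.go.

Preparation

Reading command-line arguments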

const (
	defaultCredentialsPath = "/var/run/dapr/credentials"
	// defaultDaprSystemConfigName is the default resource object name for Dapr System Config.
	defaultDaprSystemConfigName = "daprsystem"

	healthzPort = 8080
)

func main() {
	configName := flag.String("config", defaultDaprSystemConfigName, "Path to config file, or name of a configuration object")
	credsPath := flag.String("issuer-credentials", defaultCredentialsPath, "Path to the credentials directory holding the issuer data")
	flag.StringVar(&credentials.RootCertFilename, "issuer-ca-filename", credentials.RootCertFilename, "Certificate Authority certificate filename")
	flag.StringVar(&credentials.IssuerCertFilename, "issuer-certificate-filename", credentials.IssuerCertFilename, "Issuer certificate filename")
	flag.StringVar(&credentials.IssuerKeyFilename, "issuer-key-filename", credentials.IssuerKeyFilename, "Issuer private key filename")
	trustDomain := flag.String("trust-domain", "localhost", "The CA trust domain")
	tokenAudience := flag.String("token-audience", "", "Expected audience for tokens; multiple values can be separated by a comma")
......
}

The logger and metrics flags are registered separately:

	loggerOptions := logger.DefaultOptions()
	loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)

	metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
	metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)

Getting the path of the Kubernetes config file:

	var kubeconfig *string
	if home := homedir.HomeDir(); home != "" {
		// A home directory exists: default to the .kube/config file under it
		kubeconfig = flag.String("kubeconfig", filepath.Join(home, ".kube", "config"), "(optional) absolute path to the kubeconfig file")
	} else {
		// No home directory: the full kubeconfig file path has to be passed via `--kubeconfig`
		kubeconfig = flag.String("kubeconfig", "", "absolute path to the kubeconfig file")
	}

Finally, parse the flags:

flag.Parse()

Setting environment variables

The value of kubeconfig is stored in the KUBE_CONFIG environment variable:

var (
	KubeConfigVar = "KUBE_CONFIG"
)

if err := utils.SetEnvVariables(map[string]string{
		utils.KubeConfigVar: *kubeconfig,
	}); err != nil {
		log.Fatalf("error set env failed:  %s", err.Error())
	}

Initialization

These log lines mark the actual start of initialization:

	log.Infof("starting sentry certificate authority -- version %s -- commit %s", buildinfo.Version(), buildinfo.Commit())
	log.Infof("log level set to: %s", loggerOptions.OutputLevel)

Initializing metrics

	// Initialize dapr metrics exporter
	if err := metricsExporter.Init(); err != nil {
		log.Fatal(err)
	}

Initializing monitoring

	if err := monitoring.InitMetrics(); err != nil {
		log.Fatal(err)
	}

Reading the configuration

	// Build the file paths
	issuerCertPath := filepath.Join(*credsPath, credentials.IssuerCertFilename) // issuer.crt
	issuerKeyPath := filepath.Join(*credsPath, credentials.IssuerKeyFilename)   // issuer.key
	rootCertPath := filepath.Join(*credsPath, credentials.RootCertFilename)     // ca.crt

	// Read the sentry configuration
	config, err := config.FromConfigName(*configName)
	if err != nil {
		log.Warn(err)
	}

	// Store the certificate-related paths and parameters
	config.IssuerCertPath = issuerCertPath
	config.IssuerKeyPath = issuerKeyPath
	config.RootCertPath = rootCertPath
	config.TrustDomain = *trustDomain
	if *tokenAudience != "" {
		config.TokenAudience = tokenAudience
	}

Starting the services

Starting the sentry server

	ca := sentry.NewSentryCA()

	// Start the server in background
	err = ca.Start(runCtx, config)
	if err != nil {
		log.Fatalf("failed to restart sentry server: %s", err)
	}

Starting the health server

	log.Infof("starting watch on filesystem directory: %s", watchDir)

	// Start the health server in background
	go func() {
		healthzServer := health.NewServer(log)
		healthzServer.Ready()

		if innerErr := healthzServer.Run(runCtx, healthzPort); innerErr != nil {
			log.Fatalf("failed to start healthz server: %s", innerErr)
		}
	}()

Watching the directory for changes

  issuerEvent := make(chan struct{})
  watchDir := filepath.Dir(config.IssuerCertPath)

  // Watch for changes in the watchDir
	// This also blocks until runCtx is canceled
	fswatcher.Watch(runCtx, watchDir, issuerEvent)

This function blocks until runCtx is canceled (which means the sentry process is about to exit).

When a file is updated, issuerEvent receives an event. The code that handles issuerEvent:

	go func() {
		// Restart the server when the issuer credentials change
		var restart <-chan time.Time
		for {
			select {
			case <-issuerEvent:
				monitoring.IssuerCertChanged()
				log.Debug("received issuer credentials changed signal")
				// Batch all signals within 2s of each other
				if restart == nil {
					// An issuerEvent is not handled immediately; a restart event is scheduled 2 seconds later instead
					// All issuerEvents arriving within those 2 seconds are handled together by that restart event
					restart = time.After(2 * time.Second)
				}
			case <-restart:
				// Receiving restart means issuerEvents have been accumulating for 2 seconds and can now be handled in one go
				log.Warn("issuer credentials changed; reloading")
				innerErr := ca.Restart(runCtx, config)
				if innerErr != nil {
					log.Fatalf("failed to restart sentry server: %s", innerErr)
				}
				// Reset restart to nil so that later issuerEvents start a new 2-second window
				restart = nil
			}
		}
	}()

Exit

	shutdownDuration := 5 * time.Second
	log.Infof("allowing %s for graceful shutdown to complete", shutdownDuration)
	<-time.After(shutdownDuration)

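Summary

Leaving aside the non-core code, the main job of sentry's main function is to start the sentry CA server and watch the credentials directory; if anything changes there, the CA server is restarted.

7.2 - The proto definition of sentry

Proto service definition

The proto service of the sentry module is defined in the file dapr/proto/sentry/v1/sentry.proto.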

service CA {
  // A request for a time-bound certificate to be signed.
  //
  // The requesting side must provide an id for both loosely based
  // And strong based identities.
  rpc SignCertificate (SignCertificateRequest) returns (SignCertificateResponse) {}
}

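SignCertificate() requests that a time-bound certificate be signed. The requesting side must provide an id that serves both loosely based and strong based identities.

Definition of SignCertificateRequest: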

message SignCertificateRequest {
  string id = 1;
  string token = 2;
  string trust_domain = 3;
  string namespace = 4;
  // A PEM-encoded x509 CSR.
  bytes certificate_signing_request = 5;
}

Definition of SignCertificateResponse:

message SignCertificateResponse {
  // A PEM-encoded x509 Certificate.
  bytes workload_certificate = 1;

  // A list of PEM-encoded x509 Certificates that establish the trust chain
  // between the workload certificate and the well-known trust root cert.
  repeated bytes trust_chain_certificates = 2;

  google.protobuf.Timestamp valid_until = 3;
}

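trust_chain_certificates is a list of PEM-encoded x509 certificates that establish the trust chain between workload_certificate and the well-known trust root cert.

7.3 - The sentry code

The main implementation of the sentry module is in the file pkg/sentry/sentry.go.

Definitions

The CA interface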

type CertificateAuthority interface {
	Start(context.Context, config.SentryConfig) error
	Stop()
	Restart(context.Context, config.SentryConfig) error
}

Start and Restart have the same function signature.

The sentry struct

type sentry struct {
	conf        config.SentryConfig // sentry configuration, initialized by main and passed in at startup
	ctx         context.Context     // derived in Start() from the context that main passes in
	cancel      context.CancelFunc
	server      server.CAServer // CA server
	restartLock sync.Mutex      // lock used by restart
	running     chan bool
	stopping    chan bool
}

Main flow

sentry.go is called from sentry's main.go. The workflow comes down to three things:

// 1. Initialize
ca := sentry.NewSentryCA()
// 2. Start
err = ca.Start(runCtx, config)
// 3. Restart when needed
innerErr := ca.Restart(runCtx, config)

Note: sentry's main.go never calls Stop() directly; Stop() is only called from within Restart().

Initializing sentry

Implementation of NewSentryCA():

// NewSentryCA returns a new Sentry Certificate Authority instance.
func NewSentryCA() CertificateAuthority {
	return &sentry{
		running: make(chan bool, 1),
	}
}

It does nothing beyond initializing the running channel, which is buffered with a capacity of 1.

Starting sentry

// Start the server in background.
func (s *sentry) Start(ctx context.Context, conf config.SentryConfig) error {
	// If the server is already running, return an error
	select {
	case s.running <- true:
	default:
		return errors.New("CertificateAuthority server is already running")
	}

	// Create the CA server
	s.conf = conf
	certAuth, v := s.createCAServer()

	// Start the server in background
	s.ctx, s.cancel = context.WithCancel(ctx)
	go s.run(certAuth, v)

	// Wait 100ms to ensure a clean startup
	time.Sleep(100 * time.Millisecond)

	return nil
}

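The main job is to create the CA server and then run it in the background. The non-blocking send on the running channel at the top of Start() rejects a second start while the server is already running.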
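
The "already running" check in Start() can be tried out in isolation. The following is a minimal sketch, assuming a hypothetical runGuard type that stands in for the running field; release() is a simplification introduced here (sentry itself closes and recreates the channel when the context is cancelled).

```go
package main

import (
	"errors"
	"fmt"
)

// runGuard is a hypothetical stand-in for the `running` field of the sentry struct.
type runGuard struct {
	running chan bool
}

func newRunGuard() *runGuard {
	// Capacity 1: the single buffer slot is the "server is running" marker.
	return &runGuard{running: make(chan bool, 1)}
}

// tryStart succeeds only when the buffer slot is free.
func (g *runGuard) tryStart() error {
	select {
	case g.running <- true:
		return nil
	default:
		// The slot is taken, so a server is already running.
		return errors.New("already running")
	}
}

// release frees the slot again (assumption of this sketch, not sentry's code).
func (g *runGuard) release() {
	select {
	case <-g.running:
	default:
	}
}

func main() {
	g := newRunGuard()
	fmt.Println("first start:", g.tryStart())
	fmt.Println("second start:", g.tryStart())
	g.release()
	fmt.Println("start after release:", g.tryStart())
}
```

A non-blocking send keeps the guard free of extra locks: the select with a default branch never waits, so a caller finds out immediately whether it won the slot.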

Creating the CA server

The createCAServer() method loads the trust anchors and issuer certs, then creates a new CA:

// Loads the trust anchors and issuer certs, then creates a new CA.
func (s *sentry) createCAServer() (ca.CertificateAuthority, identity.Validator) {
	// Create CA
	certAuth, authorityErr := ca.NewCertificateAuthority(s.conf)
	if authorityErr != nil {
		log.Fatalf("error getting certificate authority: %s", authorityErr)
	}
	log.Info("certificate authority loaded")

	// Load the trust bundle
	trustStoreErr := certAuth.LoadOrStoreTrustBundle()
	if trustStoreErr != nil {
		log.Fatalf("error loading trust root bundle: %s", trustStoreErr)
	}
	certExpiry := certAuth.GetCACertBundle().GetIssuerCertExpiry()
	if certExpiry == nil {
		log.Fatalf("error loading trust root bundle: missing certificate expiry")
	} else {
		// Need to be in an else block for the linter
		log.Infof("trust root bundle loaded. issuer cert expiry: %s", certExpiry.String())
	}
	monitoring.IssuerCertExpiry(certExpiry)

	// Create identity validator
	v, validatorErr := s.createValidator()
	if validatorErr != nil {
		log.Fatalf("error creating validator: %s", validatorErr)
	}
	log.Info("validator created")

	return certAuth, v
}

The method returns a ca.CertificateAuthority and an identity.Validator.

Creating the identity.Validator

Implementation details of createValidator:

func (s *sentry) createValidator() (identity.Validator, error) {
	if config.IsKubernetesHosted() { // determined from the KUBERNETES_SERVICE_HOST environment variable
		// we're in Kubernetes, create client and init a new serviceaccount token validator
		kubeClient, err := k8s.GetClient()
		if err != nil {
			return nil, fmt.Errorf("failed to create kubernetes client: %w", err)
		}

		// TODO: Remove once the NoDefaultTokenAudience feature is finalized
		noDefaultTokenAudience := false

		// Create the Kubernetes validator
		return kubernetes.NewValidator(kubeClient, s.conf.GetTokenAudiences(), noDefaultTokenAudience), nil
	}
  
	// Create the self-hosted validator
	return selfhosted.NewValidator(), nil
}
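
The platform switch in createValidator() can be sketched without any Kubernetes dependency. This is a minimal illustration: the validator interface, the two struct types, and the pickValidator helper are hypothetical names introduced here, and treating a non-empty KUBERNETES_SERVICE_HOST as "hosted on Kubernetes" is an assumption based on the comment in the code above.

```go
package main

import (
	"fmt"
	"os"
)

// validator is a hypothetical, reduced stand-in for identity.Validator.
type validator interface {
	Name() string
}

type kubernetesValidator struct{}

func (kubernetesValidator) Name() string { return "kubernetes" }

type selfhostedValidator struct{}

func (selfhostedValidator) Name() string { return "selfhosted" }

// isKubernetesHosted assumes that a non-empty KUBERNETES_SERVICE_HOST
// means the process runs inside a Kubernetes pod.
func isKubernetesHosted(getenv func(string) string) bool {
	return getenv("KUBERNETES_SERVICE_HOST") != ""
}

// pickValidator mirrors the branch structure of createValidator().
// The environment lookup is injected so that it can be replaced in tests.
func pickValidator(getenv func(string) string) validator {
	if isKubernetesHosted(getenv) {
		return kubernetesValidator{}
	}
	return selfhostedValidator{}
}

func main() {
	fmt.Println("validator:", pickValidator(os.Getenv).Name())
}
```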

Running sentry

The run method runs the CA server and blocks until the server shuts down:

// Runs the CA server.
// This method blocks until the server is shut down.
func (s *sentry) run(certAuth ca.CertificateAuthority, v identity.Validator) {
	s.server = server.NewCAServer(certAuth, v)

	// In background, watch for the root certificate's expiration
	go watchCertExpiry(s.ctx, certAuth)

	// Watch for context cancelation to stop the server
	go func() {
		<-s.ctx.Done()
		s.server.Shutdown()
		close(s.running)
		s.running = make(chan bool, 1)
		if s.stopping != nil {
			close(s.stopping)
		}
	}()

	// Start the server; this is a blocking call
	log.Infof("sentry certificate authority is running, protecting y'all")
	serverRunErr := s.server.Run(s.conf.Port, certAuth.GetCACertBundle())
	if serverRunErr != nil {
		log.Fatalf("error starting gRPC server: %s", serverRunErr)
	}
}

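This starts the CA's gRPC server so that it can accept external requests.

Monitoring certificate expiry

The run() method starts a goroutine that monitors the root certificate's expiry. If the certificate is close to expiring, or has already expired, a warning is logged.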

// Watches certificates' expiry and shows an error message when they're nearing expiration time.
// This is a blocking method that should be run in its own goroutine.
func watchCertExpiry(ctx context.Context, certAuth ca.CertificateAuthority) {
	log.Debug("starting root certificate expiration watcher")
	// The ticker fires once per hour
	certExpiryCheckTicker := time.NewTicker(time.Hour)
	for {
		select {
		case <-certExpiryCheckTicker.C:
			caCrt := certAuth.GetCACertBundle().GetRootCertPem()
			block, _ := pem.Decode(caCrt)
			cert, certParseErr := x509.ParseCertificate(block.Bytes)
			if certParseErr != nil {
				log.Warn("could not determine Dapr root certificate expiration time")
				break
			}
			if cert.NotAfter.Before(time.Now().UTC()) {
				// Warn if the certificate has already expired
				log.Warn("Dapr root certificate expiration warning: certificate has expired.")
				break
			}
			if (cert.NotAfter.Add(-30 * 24 * time.Hour)).Before(time.Now().UTC()) {
				// Also warn if fewer than 30 days of validity remain
				expiryDurationHours := int(cert.NotAfter.Sub(time.Now().UTC()).Hours())
				log.Warnf("Dapr root certificate expiration warning: certificate expires in %d days and %d hours", expiryDurationHours/24, expiryDurationHours%24)
			} else {
				validity := cert.NotAfter.Sub(time.Now().UTC())
				log.Debugf("Dapr root certificate is still valid for %s", validity.String())
			}
		case <-ctx.Done():
			log.Debug("terminating root certificate expiration watcher")
			certExpiryCheckTicker.Stop()
			return
		}
	}
}
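
The three-way decision inside the ticker branch (expired, fewer than 30 days left, still valid) can be pulled out into a pure function. This is a minimal sketch: expiryStatus, warnWindow, base, and hoursAfterBase are hypothetical names introduced here, and fixed timestamps replace time.Now() so that the result is reproducible.

```go
package main

import (
	"fmt"
	"time"
)

// warnWindow matches the 30-day threshold used by watchCertExpiry.
const warnWindow = 30 * 24 * time.Hour

// base is an arbitrary fixed "now" used by this sketch.
var base = time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)

// hoursAfterBase returns base shifted by h hours (h may be negative).
func hoursAfterBase(h int) time.Time {
	return base.Add(time.Duration(h) * time.Hour)
}

// expiryStatus classifies a certificate's NotAfter relative to now.
// For the "expiring" case it also returns the remaining days and hours,
// computed the same way as in the log message of watchCertExpiry.
func expiryStatus(notAfter, now time.Time) (status string, days, hours int) {
	if notAfter.Before(now) {
		return "expired", 0, 0
	}
	if notAfter.Add(-warnWindow).Before(now) {
		remaining := int(notAfter.Sub(now).Hours())
		return "expiring", remaining / 24, remaining % 24
	}
	return "valid", 0, 0
}

func main() {
	status, d, h := expiryStatus(hoursAfterBase(49), base)
	fmt.Printf("%s: %d days and %d hours left\n", status, d, h)
}
```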

Stopping sentry

// Stop the server.
func (s *sentry) Stop() {
	log.Info("sentry certificate authority is shutting down")
	if s.cancel != nil {
		s.stopping = make(chan bool)
		s.cancel()
		<-s.stopping
		s.stopping = nil
	}
}
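
Stop() and the goroutine started in run() form a small handshake: Stop() creates the stopping channel, cancels the context, and waits until the goroutine closes that channel after its cleanup. The following is a minimal sketch of that handshake, assuming a hypothetical worker type whose cleaned flag stands in for the server shutdown.

```go
package main

import (
	"context"
	"fmt"
)

// worker is a hypothetical, reduced stand-in for the sentry struct.
type worker struct {
	cancel   context.CancelFunc
	stopping chan bool
	cleaned  bool // stands in for s.server.Shutdown()
}

// start launches the goroutine that waits for cancellation.
func (w *worker) start(parent context.Context) {
	ctx, cancel := context.WithCancel(parent)
	w.cancel = cancel
	go func() {
		<-ctx.Done()
		w.cleaned = true
		// Signal the caller of stop() that cleanup is complete.
		if w.stopping != nil {
			close(w.stopping)
		}
	}()
}

// stop cancels the context and blocks until the goroutine has cleaned up.
// As in the shape it mirrors, calling stop() twice without a start() in
// between would block, because nothing is left to close the second channel.
func (w *worker) stop() {
	if w.cancel != nil {
		w.stopping = make(chan bool)
		w.cancel()
		<-w.stopping
		w.stopping = nil
	}
}

// newStartedWorker is a convenience helper for this sketch.
func newStartedWorker() *worker {
	w := &worker{}
	w.start(context.Background())
	return w
}

func main() {
	w := newStartedWorker()
	w.stop()
	fmt.Println("cleaned up:", w.cleaned)
}
```

Creating the stopping channel before calling cancel() matters: the goroutine only reads the field after the context is done, so it is guaranteed to see the channel that stop() is waiting on.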

Restarting sentry

The Restart() method restarts sentry:

func (s *sentry) Restart(ctx context.Context, conf config.SentryConfig) error {
	s.restartLock.Lock()
	defer s.restartLock.Unlock()
	log.Info("sentry certificate authority is restarting")
	s.Stop()
	// Wait 200ms to ensure a clean shutdown
	time.Sleep(200 * time.Millisecond)
	return s.Start(ctx, conf)
}

Steps:

Acquire the lock
Stop sentry
Sleep for 200 milliseconds
Start sentry again

7.4 - CA server code

The CA server is implemented in pkg/sentry/server/server.go.

Definitions

The CAServer interface

// CAServer is an interface for the Certificate Authority server.
type CAServer interface {
	Run(port int, trustBundle ca.TrustRootBundler) error
	Shutdown()
}

The server struct

type server struct {
	certificate *tls.Certificate
	certAuth    ca.CertificateAuthority
	srv         *grpc.Server // gRPC server that serves requests from clients
	validator   identity.Validator
}

Main flow

server.go is called from sentry.go. The workflow comes down to three things:

// 1. Initialize the CA server
s.server = server.NewCAServer(certAuth, v)
// 2. Run the CA server
s.server.Run(s.conf.Port, certAuth.GetCACertBundle())
// 3. Shut down the CA server when needed
s.server.Shutdown()

Initializing the CA server

// NewCAServer returns a new CA Server running a gRPC server.
func NewCAServer(ca ca.CertificateAuthority, validator identity.Validator) CAServer {
	return &server{
		certAuth:  ca,
		validator: validator,
	}
}

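It just stores the two arguments passed in; both are created in sentry.go.

Running the CA server

The CA server provides two main capabilities:

Basic gRPC service: serves requests from clients
Security: the service must be protected, so clients must present a certificate that validates against the trust root cert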

// Run starts a secured gRPC server for the Sentry Certificate Authority.
// It enforces client side cert validation using the trust root cert.
func (s *server) Run(port int, trustBundler ca.TrustRootBundler) error {
	addr := fmt.Sprintf(":%d", port)
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return fmt.Errorf("could not listen on %s: %w", addr, err)
	}

	tlsOpt := s.tlsServerOption(trustBundler)
	// Create the gRPC server
	s.srv = grpc.NewServer(tlsOpt)
	// Register the CA server with the gRPC server
	sentryv1pb.RegisterCAServer(s.srv, s)

	// Start the gRPC server on the listening address
	if err := s.srv.Serve(lis); err != nil {
		return fmt.Errorf("grpc serve error: %w", err)
	}
	return nil
}

trustBundler is passed in from sentry.go and is covered in detail later.

Shutting down the CA server

func (s *server) Shutdown() {
	if s.srv != nil {
		// Call gRPC's GracefulStop, which shuts down only after in-flight requests complete
		s.srv.GracefulStop()
	}
}

Client security

The tlsServerOption() method prepares the TLS options for client connections:

func (s *server) tlsServerOption(trustBundler ca.TrustRootBundler) grpc.ServerOption {
	cp := trustBundler.GetTrustAnchors()

	//nolint:gosec
	config := &tls.Config{
		ClientCAs: cp,
		// Client certificates are required and verified here
		// Require cert verification
		ClientAuth: tls.RequireAndVerifyClientCert,
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			if s.certificate == nil || needsRefresh(s.certificate, serverCertExpiryBuffer) {
				// If the CA server has no certificate yet, or it needs refreshing, create or refresh it
				cert, err := s.getServerCertificate()
				if err != nil {
					monitoring.ServerCertIssueFailed("server_cert")
					log.Error(err)
					return nil, fmt.Errorf("failed to get TLS server certificate: %w", err)
				}
				s.certificate = cert
			}
			return s.certificate, nil
		},
	}
	return grpc.Creds(credentials.NewTLS(config))
}

Implementation of needsRefresh():

func needsRefresh(cert *tls.Certificate, expiryBuffer time.Duration) bool {
	leaf := cert.Leaf
	if leaf == nil {
		return true
	}

	// Check if the leaf certificate is about to expire.
	// "About to expire" means within serverCertExpiryBuffer, i.e. 15 minutes
	return leaf.NotAfter.Add(-serverCertExpiryBuffer).Before(time.Now().UTC())
}
const (
	serverCertExpiryBuffer = time.Minute * 15
)
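
The combination of GetCertificate and needsRefresh() amounts to a lazily refreshed cache: a certificate is issued on first use and reissued once it comes within the buffer of its expiry. The following is a minimal sketch of that idea, with no real certificates involved; leafCert, certSource, and the one-hour TTL are assumptions introduced here, and the refresh check takes the current time as a parameter so the behaviour is reproducible.

```go
package main

import (
	"fmt"
	"time"
)

// expiryBuffer matches serverCertExpiryBuffer (15 minutes).
const expiryBuffer = 15 * time.Minute

// base is an arbitrary fixed point in time used by this sketch.
var base = time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)

func minutesAfterBase(m int) time.Time {
	return base.Add(time.Duration(m) * time.Minute)
}

// leafCert is a hypothetical stand-in that keeps only the expiry time.
type leafCert struct {
	notAfter time.Time
}

// certSource caches the current certificate and counts issuances.
type certSource struct {
	current *leafCert
	ttl     time.Duration
	issued  int
}

func newCertSource() *certSource {
	return &certSource{ttl: time.Hour} // the TTL is an assumption of this sketch
}

// needsRefresh reports whether the certificate expires within the buffer.
func needsRefresh(c *leafCert, now time.Time) bool {
	return c.notAfter.Add(-expiryBuffer).Before(now)
}

// get returns the cached certificate, issuing a new one when needed.
func (s *certSource) get(now time.Time) *leafCert {
	if s.current == nil || needsRefresh(s.current, now) {
		s.current = &leafCert{notAfter: now.Add(s.ttl)}
		s.issued++
	}
	return s.current
}

func main() {
	s := newCertSource()
	for _, m := range []int{0, 44, 46} {
		s.get(minutesAfterBase(m))
		fmt.Printf("minute %d: issued so far = %d\n", m, s.issued)
	}
}
```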

The getServerCertificate() method generates the server-side certificate:

func (s *server) getServerCertificate() (*tls.Certificate, error) {
	csrPem, pkPem, err := csr.GenerateCSR("", false)
	if err != nil {
		return nil, err
	}

	now := time.Now().UTC()
	issuerExp := s.certAuth.GetCACertBundle().GetIssuerCertExpiry()
	if issuerExp == nil {
		return nil, errors.New("could not find expiration in issuer certificate")
	}
	serverCertTTL := issuerExp.Sub(now)

	resp, err := s.certAuth.SignCSR(csrPem, s.certAuth.GetCACertBundle().GetTrustDomain(), nil, serverCertTTL, false)
	if err != nil {
		return nil, err
	}

	certPem := resp.CertPEM
	certPem = append(certPem, s.certAuth.GetCACertBundle().GetIssuerCertPem()...)
	if rootCertPem := s.certAuth.GetCACertBundle().GetRootCertPem(); len(rootCertPem) > 0 {
		certPem = append(certPem, rootCertPem...)
	}

	cert, err := tls.X509KeyPair(certPem, pkPem)
	if err != nil {
		return nil, err
	}

	return &cert, nil
}

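The signed certificate is concatenated with the issuer certificate and, when present, the root certificate to form the PEM chain. For more detail, see the implementation of certAuth.SignCSR().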
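
The way getServerCertificate() assembles its PEM chain (leaf first, then issuer, then an optional root) can be reproduced with the standard library alone. This is a minimal sketch: buildChain, countBlocks, and fakePEM are hypothetical helpers, and the PEM blocks carry placeholder bytes rather than real certificates.

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// fakePEM wraps placeholder bytes in a CERTIFICATE block.
// The payload is NOT a real certificate; only the PEM framing matters here.
func fakePEM(label string) []byte {
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: []byte(label)})
}

// buildChain concatenates leaf, issuer, and (if present) root, in that order.
func buildChain(leaf, issuer, root []byte) []byte {
	chain := append([]byte{}, leaf...) // copy so the caller's slice is not modified
	chain = append(chain, issuer...)
	if len(root) > 0 {
		chain = append(chain, root...)
	}
	return chain
}

// countBlocks walks the input with pem.Decode and counts the blocks found.
func countBlocks(data []byte) int {
	n := 0
	for {
		block, rest := pem.Decode(data)
		if block == nil {
			return n
		}
		n++
		data = rest
	}
}

func main() {
	chain := buildChain(fakePEM("leaf"), fakePEM("issuer"), fakePEM("root"))
	fmt.Println("blocks in chain:", countBlocks(chain))
}
```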

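Signing certificates

The SignCertificate() method handles CSR requests originating from Dapr sidecars. It receives a request carrying an identity and an initial cert, and returns to the caller a signed certificate including the trust chain, along with an expiry date.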

// SignCertificate handles CSR requests originating from Dapr sidecars.
// The method receives a request with an identity and initial cert and returns
// A signed certificate including the trust chain to the caller along with an expiry date.
func (s *server) SignCertificate(ctx context.Context, req *sentryv1pb.SignCertificateRequest) (*sentryv1pb.SignCertificateResponse, error) {
	monitoring.CertSignRequestReceived()

	csrPem := req.GetCertificateSigningRequest()

	// Parse the CSR in the request
	csr, err := certs.ParsePemCSR(csrPem)
	if err != nil {
		err = fmt.Errorf("cannot parse certificate signing request pem: %w", err)
		log.Error(err)
		monitoring.CertSignFailed("cert_parse")
		return nil, err
	}

	// Validate the CSR
	err = s.certAuth.ValidateCSR(csr)
	if err != nil {
		err = fmt.Errorf("error validating csr: %w", err)
		log.Error(err)
		monitoring.CertSignFailed("cert_validation")
		return nil, err
	}

	// Validate the identity of the requester
	err = s.validator.Validate(req.GetId(), req.GetToken(), req.GetNamespace())
	if err != nil {
		err = fmt.Errorf("error validating requester identity: %w", err)
		log.Error(err)
		monitoring.CertSignFailed("req_id_validation")
		return nil, err
	}

	// Sign the certificate
	identity := identity.NewBundle(csr.Subject.CommonName, req.GetNamespace(), req.GetTrustDomain())
	signed, err := s.certAuth.SignCSR(csrPem, csr.Subject.CommonName, identity, -1, false)
	if err != nil {
		err = fmt.Errorf("error signing csr: %w", err)
		log.Error(err)
		monitoring.CertSignFailed("cert_sign")
		return nil, err
	}

	// Prepare the data to be returned
	certPem := signed.CertPEM
	issuerCert := s.certAuth.GetCACertBundle().GetIssuerCertPem()
	rootCert := s.certAuth.GetCACertBundle().GetRootCertPem()

	certPem = append(certPem, issuerCert...)
	if len(rootCert) > 0 {
		certPem = append(certPem, rootCert...)
	}

	if len(certPem) == 0 {
		err = errors.New("insufficient data in certificate signing request, no certs signed")
		log.Error(err)
		monitoring.CertSignFailed("insufficient_data")
		return nil, err
	}

	expiry := timestamppb.New(signed.Certificate.NotAfter)
	if err = expiry.CheckValid(); err != nil {
		return nil, fmt.Errorf("could not validate certificate validity: %w", err)
	}

	// Assemble the response struct
	resp := &sentryv1pb.SignCertificateResponse{
		WorkloadCertificate:    certPem,
		TrustChainCertificates: [][]byte{issuerCert, rootCert},
		ValidUntil:             expiry,
	}

	monitoring.CertSignSucceed()

	return resp, nil
}

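Summary

The implementation is straightforward. It consists mostly of certificate operations, which call for some background knowledge.

7.5 - csr code

Logic for handling CSRs

The CSR-related logic is implemented in pkg/sentry/csr/csr.go.

Preparation

Constant definitions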

const (
	blockTypeECPrivateKey = "EC PRIVATE KEY" // EC private key
	blockTypePrivateKey   = "PRIVATE KEY"    // PKCS#8 private key
	encodeMsgCSR          = "CERTIFICATE REQUEST"
	encodeMsgCert         = "CERTIFICATE"
)

Global variable definitions

// The OID for the SAN extension (http://www.alvestrand.no/objectid/2.5.29.17.html)
var oidSubjectAlternativeName = asn1.ObjectIdentifier{2, 5, 29, 17}

Implementation

Generating a CSR

The GenerateCSR() method creates an X.509 certificate signing request and a private key:

// GenerateCSR creates a X.509 certificate sign request and private key.
func GenerateCSR(org string, pkcs8 bool) ([]byte, []byte, error) {
	// Generate the EC private key
	key, err := certs.GenerateECPrivateKey()
	if err != nil {
		return nil, nil, fmt.Errorf("unable to generate private keys: %w", err)
	}

	// Generate the CSR template
	templ, err := genCSRTemplate(org)
	if err != nil {
		return nil, nil, fmt.Errorf("error generating csr template: %w", err)
	}

	// Create the certificate request
	csrBytes, err := x509.CreateCertificateRequest(rand.Reader, templ, key)
	if err != nil {
		return nil, nil, fmt.Errorf("failed to create CSR: %w", err)
	}

	// PEM-encode the CSR and the private key
	crtPem, keyPem, err := encode(true, csrBytes, key, pkcs8)
	return crtPem, keyPem, err
}

The CSR template implementation sets only Organization:

func genCSRTemplate(org string) (*x509.CertificateRequest, error) {
	return &x509.CertificateRequest{
		Subject: pkix.Name{
			Organization: []string{org},
		},
	}, nil
}

Implementation of encode(), which PEM-encodes the CSR (or certificate) together with the private key:

func encode(csr bool, csrOrCert []byte, privKey *ecdsa.PrivateKey, pkcs8 bool) ([]byte, []byte, error) {
	// Choose between the "CERTIFICATE" and "CERTIFICATE REQUEST" block types
	encodeMsg := encodeMsgCert
	if csr {
		encodeMsg = encodeMsgCSR
	}
	// Perform the encoding
	csrOrCertPem := pem.EncodeToMemory(&pem.Block{Type: encodeMsg, Bytes: csrOrCert})

	var encodedKey, privPem []byte
	var err error

	if pkcs8 {
		// For pkcs8, marshal the private key as a PKCS#8 private key ("PRIVATE KEY")
		if encodedKey, err = x509.MarshalPKCS8PrivateKey(privKey); err != nil {
			return nil, nil, err
		}
		// PEM-encode the PKCS#8 private key from above into memory
		privPem = pem.EncodeToMemory(&pem.Block{Type: blockTypePrivateKey, Bytes: encodedKey})
	} else {
		// Otherwise, marshal the private key as an EC private key ("EC PRIVATE KEY")
		encodedKey, err = x509.MarshalECPrivateKey(privKey)
		if err != nil {
			return nil, nil, err
		}
		privPem = pem.EncodeToMemory(&pem.Block{Type: blockTypeECPrivateKey, Bytes: encodedKey})
	}
	return csrOrCertPem, privPem, nil
}
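
The CSR generation and encoding steps above map closely onto the Go standard library. The following is a minimal sketch using only crypto/x509 and encoding/pem; generateCSR and parseCSR are hypothetical names introduced here, only the non-PKCS#8 branch is shown, and "example-org" is a placeholder organization.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
)

// generateCSR creates a P-256 key and a CSR whose subject carries only the organization.
// It returns the PEM-encoded CSR and the PEM-encoded EC private key.
func generateCSR(org string) (csrPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	tmpl := &x509.CertificateRequest{
		Subject: pkix.Name{Organization: []string{org}},
	}
	csrDER, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, fmt.Errorf("create CSR: %w", err)
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, fmt.Errorf("marshal key: %w", err)
	}
	csrPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: csrDER})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return csrPEM, keyPEM, nil
}

// parseCSR decodes a PEM CSR and verifies its self-signature.
func parseCSR(csrPEM []byte) (*x509.CertificateRequest, error) {
	block, _ := pem.Decode(csrPEM)
	if block == nil {
		return nil, errors.New("CSR is not PEM encoded")
	}
	req, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse CSR: %w", err)
	}
	if err := req.CheckSignature(); err != nil {
		return nil, fmt.Errorf("CSR signature: %w", err)
	}
	return req, nil
}

func main() {
	csrPEM, _, err := generateCSR("example-org")
	if err != nil {
		panic(err)
	}
	req, err := parseCSR(csrPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("organization:", req.Subject.Organization)
}
```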

Generating the base certificate

The generateBaseCert() method returns a basic non-CA certificate that can be turned into a workload or CA certificate by adding a subject, key usage, and additional properties:

// generateBaseCert returns a base non-CA cert that can be made a workload or CA cert
// By adding subjects, key usage and additional properties.
func generateBaseCert(ttl, skew time.Duration, publicKey interface{}) (*x509.Certificate, error) {
	// Create a new serial number
	serNum, err := newSerialNumber()
	if err != nil {
		return nil, err
	}

	now := time.Now().UTC()
	// Allow for clock skew with the NotBefore validity bound.
	// NotBefore is backdated by skew so that peers whose clocks run slightly behind still accept the certificate.
	notBefore := now.Add(-1 * skew)
	notAfter := now.Add(ttl)

	// Create and return the x509 certificate
	return &x509.Certificate{
		SerialNumber: serNum,
		NotBefore:    notBefore,
		NotAfter:     notAfter,
		PublicKey:    publicKey,
	}, nil
}

Implementation details of creating a new serial number:

func newSerialNumber() (*big.Int, error) {
	// Upper bound for the serial number: 1 << 128
	serialNumLimit := new(big.Int).Lsh(big.NewInt(1), 128)
	// Pick a random number within that range
	serialNum, err := rand.Int(rand.Reader, serialNumLimit)
	if err != nil {
		return nil, fmt.Errorf("error generating serial number: %w", err)
	}
	return serialNum, nil
}

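Generating the base certificate is the first step in generating every other certificate.

Generating the root cert CSR

The GenerateRootCertCSR() method returns an x509 certificate for a CA root cert. Despite the CSR in its name, what it returns is an unsigned *x509.Certificate populated with root CA settings: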

// GenerateRootCertCSR returns a CA root cert x509 Certificate.
func GenerateRootCertCSR(org, cn string, publicKey interface{}, ttl, skew time.Duration) (*x509.Certificate, error) {
	// Generate the base certificate first
	cert, err := generateBaseCert(ttl, skew, publicKey)
	if err != nil {
		return nil, err
	}

	// Set the certificate parameters
	cert.KeyUsage = x509.KeyUsageCertSign
	cert.ExtKeyUsage = append(cert.ExtKeyUsage, x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth)
	cert.Subject = pkix.Name{
		CommonName:   cn,
		Organization: []string{org},
	}
	cert.DNSNames = []string{cn}
	cert.IsCA = true
	cert.BasicConstraintsValid = true
	cert.SignatureAlgorithm = x509.ECDSAWithSHA256
	return cert, nil
}
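
To see how a certificate populated this way becomes a usable root, the sketch below fills in the same kind of fields and self-signs the result with the standard library. It is a minimal illustration under stated assumptions: newRootCert and newDemoRoot are hypothetical helpers, the self-signing step is not part of GenerateRootCertCSR() itself, and the names and durations are placeholders.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newRootCert builds a root CA template and self-signs it.
func newRootCert(org, cn string, ttl, skew time.Duration) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, fmt.Errorf("generate key: %w", err)
	}
	// Random serial number below 1 << 128, as in newSerialNumber().
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return nil, fmt.Errorf("serial number: %w", err)
	}
	now := time.Now().UTC()
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		NotBefore:             now.Add(-skew), // backdated to tolerate clock skew
		NotAfter:              now.Add(ttl),
		KeyUsage:              x509.KeyUsageCertSign,
		Subject:               pkix.Name{CommonName: cn, Organization: []string{org}},
		DNSNames:              []string{cn},
		IsCA:                  true,
		BasicConstraintsValid: true,
	}
	// Self-signed: the template acts as its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, fmt.Errorf("create certificate: %w", err)
	}
	return x509.ParseCertificate(der)
}

// newDemoRoot uses placeholder values chosen for this sketch.
func newDemoRoot() (*x509.Certificate, error) {
	return newRootCert("example-org", "example-root", 24*time.Hour, 15*time.Minute)
}

func main() {
	cert, err := newDemoRoot()
	if err != nil {
		panic(err)
	}
	fmt.Println("subject:", cert.Subject.CommonName, "isCA:", cert.IsCA)
}
```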

Generating the CSR certificate

The GenerateCSRCertificate() method returns an x509 certificate built from a CSR, a signing cert, a public key, a signing private key, and a duration:

// GenerateCSRCertificate returns an x509 Certificate from a CSR, signing cert, public key, signing private key and duration.
func GenerateCSRCertificate(csr *x509.CertificateRequest, subject string, identityBundle *identity.Bundle, signingCert *x509.Certificate, publicKey interface{}, signingKey crypto.PrivateKey,
	ttl, skew time.Duration, isCA bool,
) ([]byte, error) {
	// Generate the base certificate first
	cert, err := generateBaseCert(ttl, skew, publicKey)
	if err != nil {
		return nil, fmt.Errorf("error generating csr certificate: %w", err)
	}
	if isCA {
		cert.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageCRLSign
	} else {
		cert.KeyUsage = x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment
		cert.ExtKeyUsage = append(cert.ExtKeyUsage, x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth)
	}

	if subject == "cluster.local" {
		cert.Subject = pkix.Name{
			CommonName: subject,
		}
		cert.DNSNames = []string{subject}
	}

	cert.Issuer = signingCert.Issuer
	cert.IsCA = isCA
	cert.IPAddresses = csr.IPAddresses
	cert.Extensions = csr.Extensions
	cert.BasicConstraintsValid = true
	cert.SignatureAlgorithm = csr.SignatureAlgorithm

	if identityBundle != nil {
		spiffeID, err := identity.CreateSPIFFEID(identityBundle.TrustDomain, identityBundle.Namespace, identityBundle.ID)
		if err != nil {
			return nil, fmt.Errorf("error generating spiffe id: %w", err)
		}

		rv := []asn1.RawValue{
			{
				Bytes: []byte(spiffeID),
				Class: asn1.ClassContextSpecific,
				Tag:   asn1.TagOID,
			},
			{
				Bytes: []byte(fmt.Sprintf("%s.%s.svc.cluster.local", subject, identityBundle.Namespace)),
				Class: asn1.ClassContextSpecific,
				Tag:   2,
			},
		}

		b, err := asn1.Marshal(rv)
		if err != nil {
			return nil, fmt.Errorf("failed to marshal asn1 raw value for spiffe id: %w", err)
		}

		cert.ExtraExtensions = append(cert.ExtraExtensions, pkix.Extension{
			Id:       oidSubjectAlternativeName,
			Value:    b,
			Critical: true, // According to x509 and SPIFFE specs, a SubjAltName extension must be critical if subject name and DNS are not present.
		})
	}

	return x509.CreateCertificate(rand.Reader, cert, signingCert, publicKey, signingKey)
}

A lot of x509 domain knowledge is involved here.

7.6 - certs code

Logic for handling certs

The certs-related logic is implemented in pkg/sentry/certs/certs.go.

Preparation

Constant definitions

const (
	BlockTypeCertificate     = "CERTIFICATE"
	BlockTypeECPrivateKey    = "EC PRIVATE KEY"
	BlockTypePKCS1PrivateKey = "RSA PRIVATE KEY"
	BlockTypePKCS8PrivateKey = "PRIVATE KEY"
)

Note: some of these constants duplicate the ones defined in csr.go.

Struct definitions

The Credentials struct holds a certificate and a private key:

// Credentials holds a certificate and private key.
type Credentials struct {
	PrivateKey  crypto.PrivateKey
	Certificate *x509.Certificate
}

Implementation

Decoding a PEM key

DecodePEMKey() takes a PEM key as a byte slice and returns an object representing an RSA or EC private key:

func DecodePEMKey(key []byte) (crypto.PrivateKey, error) {
	// Decode the PEM key
	block, _ := pem.Decode(key)
	if block == nil {
		return nil, errors.New("key is not PEM encoded")
	}
  
	// Continue parsing according to the block type
	switch block.Type {
	case BlockTypeECPrivateKey:
    // EC Private Key
		return x509.ParseECPrivateKey(block.Bytes)
	case BlockTypePKCS1PrivateKey:
    // PKCS1 Private Key
		return x509.ParsePKCS1PrivateKey(block.Bytes)
	case BlockTypePKCS8PrivateKey:
    // PKCS8 Private Key
		return x509.ParsePKCS8PrivateKey(block.Bytes)
	default:
		return nil, fmt.Errorf("unsupported block type %s", block.Type)
	}
}

Decoding PEM certificates

The DecodePEMCertificates() method takes a byte slice of PEM-encoded x509 certificates and returns all of them as a slice of x509.Certificate objects:

func DecodePEMCertificates(crtb []byte) ([]*x509.Certificate, error) {
	certs := []*x509.Certificate{}
	// The crtb slice may contain several certificates
	for len(crtb) > 0 {
		var err error
		var cert *x509.Certificate

		// Decode a single PEM certificate
		cert, crtb, err = decodeCertificatePEM(crtb)
		if err != nil {
			return nil, err
		}
		if cert != nil {
			// it's a cert, add to pool
			certs = append(certs, cert)
		}
	}
	return certs, nil
}

The decodeCertificatePEM() method decodes a single PEM certificate:

func decodeCertificatePEM(crtb []byte) (*x509.Certificate, []byte, error) {
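	// Perform the PEM decoding.
	// pem.Decode() finds the next PEM-formatted block (certificate, private key, etc.) in the input.
	// It returns that block and the remainder of the input.
	// Note that the remainder is returned; when nothing is left, its length is 0.
	// If no PEM data is found, block is nil and the remainder is the whole input.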
	block, crtb := pem.Decode(crtb)
	if block == nil {
		return nil, crtb, errors.New("invalid PEM certificate")
	}
	if block.Type != BlockTypeCertificate {
		return nil, nil, nil
	}
	// Parse the x509 certificate
	c, err := x509.ParseCertificate(block.Bytes)
	return c, crtb, err
}

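Getting PEM credentials from files

The PEMCredentialsFromFiles() method takes a key/cert pair and returns a validated Credentials wrapper. As the signature shows, it receives the PEM-encoded contents as byte slices rather than file paths: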

func PEMCredentialsFromFiles(certPem, keyPem []byte) (*Credentials, error) {
	// Decode the PEM key
	pk, err := DecodePEMKey(keyPem)
	if err != nil {
		return nil, err
	}

	// Decode the PEM certificates
	// If there are several certificates, only the first one is used afterwards
	crts, err := DecodePEMCertificates(certPem)
	if err != nil {
		return nil, err
	}

	if len(crts) == 0 {
		return nil, errors.New("no certificates found")
	}

	// Check that the private key matches the certificate's PublicKey
	match := matchCertificateAndKey(pk, crts[0])
	if !match {
		return nil, errors.New("error validating credentials: public and private key pair do not match")
	}

	// Build the Credentials struct and return it
	creds := &Credentials{
		PrivateKey:  pk,
		Certificate: crts[0],
	}

	return creds, nil
}

The matchCertificateAndKey() method checks that the private key matches the certificate's PublicKey:

func matchCertificateAndKey(pk any, cert *x509.Certificate) bool {
	// Match according to the type of the private key:
	// take the certificate's public key of the corresponding type and compare it with the public key derived from the private key
	switch key := pk.(type) {
	case *ecdsa.PrivateKey:
    // ecdsa PrivateKey
		if cert.PublicKeyAlgorithm != x509.ECDSA {
			return false
		}
		pub, ok := cert.PublicKey.(*ecdsa.PublicKey)
		return ok && pub.Equal(key.Public())
	case *rsa.PrivateKey:
    // rsa PrivateKey
		if cert.PublicKeyAlgorithm != x509.RSA {
			return false
		}
		pub, ok := cert.PublicKey.(*rsa.PublicKey)
		return ok && pub.Equal(key.Public())
	case ed25519.PrivateKey:
    // ed25519 Private Key
		if cert.PublicKeyAlgorithm != x509.Ed25519 {
			return false
		}
		pub, ok := cert.PublicKey.(ed25519.PublicKey)
		return ok && pub.Equal(key.Public())
	default:
		return false
	}
}
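
The comparison at the heart of matchCertificateAndKey() can be exercised without building certificates. The following is a minimal sketch that compares a private key against a bare public key; keysMatch, newECKey, and newEdKey are hypothetical helpers, and the RSA branch is left out to keep the example short.

```go
package main

import (
	"crypto"
	"crypto/ecdsa"
	"crypto/ed25519"
	"crypto/elliptic"
	"crypto/rand"
	"fmt"
)

// keysMatch reports whether certPub is the public half of priv.
// A type mismatch between the two keys counts as "no match".
func keysMatch(priv crypto.PrivateKey, certPub crypto.PublicKey) bool {
	switch key := priv.(type) {
	case *ecdsa.PrivateKey:
		pub, ok := certPub.(*ecdsa.PublicKey)
		return ok && pub.Equal(key.Public())
	case ed25519.PrivateKey:
		pub, ok := certPub.(ed25519.PublicKey)
		return ok && pub.Equal(key.Public())
	default:
		return false
	}
}

// newECKey generates a P-256 key and panics on failure (sketch only).
func newECKey() *ecdsa.PrivateKey {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return key
}

// newEdKey generates an Ed25519 key pair and panics on failure (sketch only).
func newEdKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	a, b := newECKey(), newECKey()
	fmt.Println("own public key:", keysMatch(a, &a.PublicKey))
	fmt.Println("foreign public key:", keysMatch(a, &b.PublicKey))
}
```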

Creating a cert pool from PEM

The CertPoolFromPEM() method returns a CertPool built from PEM-encoded certificates:

func CertPoolFromPEM(certPem []byte) (*x509.CertPool, error) {
	// Decode the PEM certificates
	certs, err := DecodePEMCertificates(certPem)
	if err != nil {
		return nil, err
	}
	if len(certs) == 0 {
		return nil, errors.New("no certificates found")
	}

	// Create the cert pool from the certificates
	return certPoolFromCertificates(certs), nil
}

The implementation of certPoolFromCertificates() is simple:

func certPoolFromCertificates(certs []*x509.Certificate) *x509.CertPool {
	// Create the cert pool
	pool := x509.NewCertPool()
	for _, c := range certs {
		// Add each certificate to the pool
		pool.AddCert(c)
	}
	return pool
}

Parsing a PEM CSR

ParsePemCSR() builds an x509 certificate request from the given PEM-encoded certificate signing request:

func ParsePemCSR(csrPem []byte) (*x509.CertificateRequest, error) {
	// PEM decoding
	block, _ := pem.Decode(csrPem)
	if block == nil {
		return nil, errors.New("certificate signing request is not properly encoded")
	}
  
	// Try to parse the x509 certificate request
	csr, err := x509.ParseCertificateRequest(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("failed to parse X.509 certificate signing request: %w", err)
	}
	return csr, nil
}

Generating an EC private key

The GenerateECPrivateKey() method returns a new EC private key on the P-256 curve:

func GenerateECPrivateKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

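A lot of x509 domain knowledge is involved here.

7.7 - certs store code

The storage (store) implementation for certs

The certs-related storage logic is implemented in pkg/sentry/certs/store.go.

Preparation

Constant definitions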

const (
	defaultSecretNamespace = "default"
)

Implementation

Storing credentials

The StoreCredentials() method stores the trust bundle either in the Kubernetes secret store or on local disk, depending on the hosting platform:

func StoreCredentials(ctx context.Context, conf config.SentryConfig, rootCertPem, issuerCertPem, issuerKeyPem []byte) error {
	if config.IsKubernetesHosted() {
		// Hosted on Kubernetes
		return storeKubernetes(ctx, rootCertPem, issuerCertPem, issuerKeyPem)
	}
  
	// Otherwise self-hosted
	return storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem, conf.RootCertPath, conf.IssuerCertPath, conf.IssuerKeyPath)
}

Storage in Kubernetes

The storeKubernetes() method stores the credentials in the Kubernetes secret store:

// Some constants are defined in consts.go
const (
	TrustBundleK8sSecretName = "dapr-trust-bundle" /* #nosec */
)

func storeKubernetes(ctx context.Context, rootCertPem, issuerCertPem, issuerCertKey []byte) error {
	// Prepare the Kubernetes client
	kubeClient, err := kubernetes.GetClient()
	if err != nil {
		return err
	}

	// Get the namespace
	namespace := getNamespace()
	// Fetch the secret through the Kubernetes API (only the not-found error is handled below)
	secret, err := kubeClient.CoreV1().Secrets(namespace).Get(context.TODO(), consts.TrustBundleK8sSecretName, metav1.GetOptions{})
	if errors.IsNotFound(err) {
		return fmt.Errorf("failed to get secret %w", err)
	}

	// Save rootCertPem / issuerCertPem / issuerCertKey into the secret's Data
	secret.Data = map[string][]byte{
		credentials.RootCertFilename:   rootCertPem,
		credentials.IssuerCertFilename: issuerCertPem,
		credentials.IssuerKeyFilename:  issuerCertKey,
	}

	// Persist the secret with an update
	// We update and not create because sentry expects a secret to already exist
	_, err = kubeClient.CoreV1().Secrets(namespace).Update(ctx, secret, metav1.UpdateOptions{})
	if err != nil {
		return fmt.Errorf("failed saving secret to kubernetes: %w", err)
	}
	return nil
}

getNamespace() reads the NAMESPACE environment variable to determine the current namespace, falling back to "default":

const (
	defaultSecretNamespace = "default"
)

func getNamespace() string {
	namespace := os.Getenv("NAMESPACE")
	if namespace == "" {
		namespace = defaultSecretNamespace
	}
	return namespace
}

Storage when self-hosted

The storeSelfhosted() method stores the credentials in local files:

func StoreCredentials(...) {
  ......
	return storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem, conf.RootCertPath, conf.IssuerCertPath, conf.IssuerKeyPath)
  }

func storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem []byte, rootCertPath, issuerCertPath, issuerKeyPath string) error {
	// Save the three items to three separate files
	err := os.WriteFile(rootCertPath, rootCertPem, 0o644)
	if err != nil {
		return fmt.Errorf("failed saving file to %s: %w", rootCertPath, err)
	}

	err = os.WriteFile(issuerCertPath, issuerCertPem, 0o644)
	if err != nil {
		return fmt.Errorf("failed saving file to %s: %w", issuerCertPath, err)
	}

	err = os.WriteFile(issuerKeyPath, issuerKeyPem, 0o644)
	if err != nil {
		return fmt.Errorf("failed saving file to %s: %w", issuerKeyPath, err)
	}
	return nil
}

rootCertPem / issuerCertPem / issuerKeyPem are saved to the three file paths given by the sentry configuration: conf.RootCertPath / conf.IssuerCertPath / conf.IssuerKeyPath.

As a reminder, here is the code in main.go that reads the related configuration:

const (
	defaultCredentialsPath = "/var/run/dapr/credentials"
)

var (
	// RootCertFilename is the filename that holds the root certificate.
	RootCertFilename = "ca.crt"
	// IssuerCertFilename is the filename that holds the issuer certificate.
	IssuerCertFilename = "issuer.crt"
	// IssuerKeyFilename is the filename that holds the issuer key.
	IssuerKeyFilename = "issuer.key"
)

func main() {
  ......
credsPath := flag.String("issuer-credentials", defaultCredentialsPath, "Path to the credentials directory holding the issuer data")	
  flag.StringVar(&credentials.RootCertFilename, "issuer-ca-filename", credentials.RootCertFilename, "Certificate Authority certificate filename")
	flag.StringVar(&credentials.IssuerCertFilename, "issuer-certificate-filename", credentials.IssuerCertFilename, "Issuer certificate filename")
	flag.StringVar(&credentials.IssuerKeyFilename, "issuer-key-filename", credentials.IssuerKeyFilename, "Issuer private key filename")

	issuerCertPath := filepath.Join(*credsPath, credentials.IssuerCertFilename)
	issuerKeyPath := filepath.Join(*credsPath, credentials.IssuerKeyFilename)
	rootCertPath := filepath.Join(*credsPath, credentials.RootCertFilename)

	......
	config.IssuerCertPath = issuerCertPath
	config.IssuerKeyPath = issuerKeyPath
	config.RootCertPath = rootCertPath
  ......
}

So by default these three files under the "/var/run/dapr/credentials" directory are used:

“ca.crt”
“issuer.crt”
“issuer.key”
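
How the three default paths come together can be checked with a few lines of Go. This is a minimal sketch: credentialPaths and resolvePaths are hypothetical names introduced here, flag parsing is left out, and the printed paths assume a Unix-style path separator.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Default file names, as listed in the main.go excerpt above.
const (
	rootCertFilename   = "ca.crt"
	issuerCertFilename = "issuer.crt"
	issuerKeyFilename  = "issuer.key"
)

// credentialPaths groups the three resolved paths (hypothetical type).
type credentialPaths struct {
	RootCert   string
	IssuerCert string
	IssuerKey  string
}

// resolvePaths joins the credentials directory with each file name,
// the same way main.go does with filepath.Join.
func resolvePaths(dir string) credentialPaths {
	return credentialPaths{
		RootCert:   filepath.Join(dir, rootCertFilename),
		IssuerCert: filepath.Join(dir, issuerCertFilename),
		IssuerKey:  filepath.Join(dir, issuerKeyFilename),
	}
}

func main() {
	p := resolvePaths("/var/run/dapr/credentials")
	fmt.Println(p.RootCert)
	fmt.Println(p.IssuerCert)
	fmt.Println(p.IssuerKey)
}
```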

7.8 - metrics code

The metrics implementation in sentry

The metrics-related implementation is in pkg/sentry/monitoring/metrics.go.

Preparation

Variable definitions

A number of metrics-related variables are defined:

var (
	// Metrics definitions.
	csrReceivedTotal = stats.Int64(
		"sentry/cert/sign/request_received_total",
		"The number of CSRs received.",
		stats.UnitDimensionless)
	certSignSuccessTotal = stats.Int64(
		"sentry/cert/sign/success_total",
		"The number of certificates issuances that have succeeded.",
		stats.UnitDimensionless)
	certSignFailedTotal = stats.Int64(
		"sentry/cert/sign/failure_total",
		"The number of errors occurred when signing the CSR.",
		stats.UnitDimensionless)
	serverTLSCertIssueFailedTotal = stats.Int64(
		"sentry/servercert/issue_failed_total",
		"The number of server TLS certificate issuance failures.",
		stats.UnitDimensionless)
	issuerCertChangedTotal = stats.Int64(
		"sentry/issuercert/changed_total",
		"The number of issuer cert updates, when issuer cert or key is changed",
		stats.UnitDimensionless)
	issuerCertExpiryTimestamp = stats.Int64(
		"sentry/issuercert/expiry_timestamp",
		"The unix timestamp, in seconds, when issuer/root cert will expire.",
		stats.UnitDimensionless)

	// Metrics Tags.
	failedReasonKey = tag.MustNewKey("reason")
	noKeys          = []tag.Key{}
)

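There are currently 6 metrics in total:

csrReceivedTotal: the number of CSRs received
certSignSuccessTotal: the number of certificates signed successfully
certSignFailedTotal: the number of errors that occurred when signing CSRs
serverTLSCertIssueFailedTotal: the number of server TLS certificate issuance failures
issuerCertChangedTotal: the number of issuer cert updates, counted when the issuer cert or key changes
issuerCertExpiryTimestamp: the unix timestamp, in seconds, at which the issuer/root cert expires

Initialization

Initializing the metrics: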

func InitMetrics() error {
	// Register views for all 6 metrics
	return view.Register(
		diagUtils.NewMeasureView(csrReceivedTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(certSignSuccessTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(certSignFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
		diagUtils.NewMeasureView(serverTLSCertIssueFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
		diagUtils.NewMeasureView(issuerCertChangedTotal, noKeys, view.Count()),
		diagUtils.NewMeasureView(issuerCertExpiryTimestamp, noKeys, view.LastValue()),
	)
}

Collecting metrics

CSR-related

CertSignRequestReceived() counts the CSRs received:

// CertSignRequestReceived counts when CSR received.
func CertSignRequestReceived() {
	stats.Record(context.Background(), csrReceivedTotal.M(1))
}

CertSignSucceed() counts the requests that were processed successfully:

func CertSignSucceed() {
	stats.Record(context.Background(), certSignSuccessTotal.M(1))
}

CertSignFailed(), in turn, counts the failures, tagged with a reason:

func CertSignFailed(reason string) {
	stats.RecordWithTags(
		context.Background(),
		diagUtils.WithTags(certSignFailedTotal.Name(), failedReasonKey, reason),
		certSignFailedTotal.M(1))
}

All three are called from the SignCertificate() function in server.go, which handles CSR requests:

func (s *server) SignCertificate(ctx context.Context, req *sentryv1pb.SignCertificateRequest) (*sentryv1pb.SignCertificateResponse, error) {
	// Count on entry: this is the number of CSRs received
	monitoring.CertSignRequestReceived()
  ......
  
	// Every error path records a failure count before returning
	if err != nil {
		monitoring.CertSignFailed("cert_parse")
		return nil, err
	}
  ......
	// If the CSR is processed successfully in the end, record a success count
  monitoring.CertSignSucceed()

	return resp, nil
}
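
The counting pattern in SignCertificate() (count on entry, count a tagged failure before each early return, count success at the end) can be illustrated with plain in-memory counters. This is a minimal sketch that deliberately swaps OpenCensus for a hypothetical counters struct; step, okStep, and failStep are likewise names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// counters is a hypothetical in-memory replacement for the OpenCensus measures.
type counters struct {
	received  int
	succeeded int
	failed    map[string]int // keyed by failure reason
}

func newCounters() *counters {
	return &counters{failed: map[string]int{}}
}

// step is one stage of request handling together with its failure reason.
type step struct {
	reason string
	run    func() error
}

func okStep(reason string) step {
	return step{reason: reason, run: func() error { return nil }}
}

func failStep(reason string) step {
	return step{reason: reason, run: func() error { return errors.New("simulated failure") }}
}

// handle runs the steps in order and records metrics along the way.
func (c *counters) handle(steps []step) error {
	c.received++ // counted on entry, whatever happens next
	for _, s := range steps {
		if err := s.run(); err != nil {
			c.failed[s.reason]++ // counted before the early return
			return fmt.Errorf("%s: %w", s.reason, err)
		}
	}
	c.succeeded++ // reached only when every step passed
	return nil
}

func main() {
	c := newCounters()
	_ = c.handle([]step{okStep("cert_parse"), okStep("cert_validation")})
	_ = c.handle([]step{okStep("cert_parse"), failStep("cert_validation")})
	fmt.Println(c.received, c.succeeded, c.failed)
}
```

With this shape, received always equals succeeded plus the sum of all failure counts, which makes the three metrics easy to cross-check on a dashboard.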

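Certificate expiry

The IssuerCertExpiry() method records the certificate expiry time. Its doc comment speaks of the root cert, while the value passed in at the call site is the issuer cert expiry: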

// IssuerCertExpiry records root cert expiry.
func IssuerCertExpiry(expiry *time.Time) {
	stats.Record(context.Background(), issuerCertExpiryTimestamp.M(expiry.Unix()))
}

It is called from the createCAServer() function in sentry.go:

func (s *sentry) createCAServer(ctx context.Context) (ca.CertificateAuthority, identity.Validator) {
	certAuth, authorityErr := ca.NewCertificateAuthority(s.conf)
	trustStoreErr := certAuth.LoadOrStoreTrustBundle(ctx)
	......
	certExpiry := certAuth.GetCACertBundle().GetIssuerCertExpiry()
	monitoring.IssuerCertExpiry(certExpiry)
	......
	return certAuth, v
}

While the CA server is being created, the trust bundle is loaded and the certificate's validity period is checked; the expiry time is recorded at this point.

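Server certificate issuance failure

ServerCertIssueFailed() records a failure to issue the server certificate. Note that in the code shown here the reason parameter is accepted but not attached as a tag: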

func ServerCertIssueFailed(reason string) {
	stats.Record(context.Background(), serverTLSCertIssueFailedTotal.M(1))
}

It is called from server.go:


func (s *server) Run(ctx context.Context, port int, trustBundler ca.TrustRootBundler) error {
  ......
  tlsOpt := s.tlsServerOption(trustBundler)
  s.srv = grpc.NewServer(tlsOpt)
  ......
}

When the sentry server starts its gRPC server, it needs the TLS server options, and while building them it fetches the sentry server's own server-side certificate:

func (s *server) tlsServerOption(trustBundler ca.TrustRootBundler) grpc.ServerOption {
	cp := trustBundler.GetTrustAnchors()

	config := &tls.Config{
		ClientCAs: cp,
		// Require cert verification
		ClientAuth: tls.RequireAndVerifyClientCert,
		GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
			if s.certificate == nil || needsRefresh(s.certificate, serverCertExpiryBuffer) {
				cert, err := s.getServerCertificate()
				if err != nil {
					monitoring.ServerCertIssueFailed("server_cert")
					log.Error(err)
					return nil, fmt.Errorf("failed to get TLS server certificate: %w", err)
				}
				s.certificate = cert
			}
	......
}

If fetching the certificate fails, the failure is recorded.

Issuer certificate change

IssuerCertChanged() records a change to the issuer credentials:

func IssuerCertChanged() {
	stats.Record(context.Background(), issuerCertChangedTotal.M(1))
}

It is called from the main() function in main.go. After startup, sentry watches the issuer certificate (by default the "issuer.crt" file under "/var/run/dapr/credentials"):

func main() {
  ......
			func(ctx context.Context) error {
				select {
				case <-ctx.Done():
					return nil

				case <-issuerEvent:
					monitoring.IssuerCertChanged()
					log.Debug("received issuer credentials changed signal")
				......
	}
  ......
  	// Watch for changes in the watchDir
	mngr.Add(func(ctx context.Context) error {
		log.Infof("starting watch on filesystem directory: %s", watchDir)
		return fswatcher.Watch(ctx, watchDir, issuerEvent)
	})
}

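7.9 - The identity implementation in the sentry library

identity

7.9.1 - identity.go

Definition of the identity struct

Struct definition

// Bundle contains all the elements needed to identify a workload across trust domains and namespaces: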

type Bundle struct {
	ID          string
	Namespace   string
	TrustDomain string
}

There are just three elements: ID, Namespace, and TrustDomain.

The NewBundle() method

NewBundle() returns a new identity Bundle.

func NewBundle(id, namespace, trustDomain string) *Bundle {
	// Empty namespace and trust domain result in an empty bundle
  // If either namespace or trust domain is empty, return an empty bundle (nil)
	if namespace == "" || trustDomain == "" {
		return nil
	}

  // Otherwise simply assign the three fields
	return &Bundle{
		ID:          id,
		Namespace:   namespace,
		TrustDomain: trustDomain,
	}
}

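namespace and trustDomain are optional parameters: if either one is empty, nil is returned instead of a Bundle, so callers need to handle a nil result.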
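
Because NewBundle() can return nil, callers have to check the result before touching its fields. The following is a minimal sketch of that check. Bundle and NewBundle are copied from the listing above; describe is a hypothetical helper introduced only for illustration, and the argument values are made up.

```go
package main

import "fmt"

// Bundle mirrors the struct shown above.
type Bundle struct {
	ID          string
	Namespace   string
	TrustDomain string
}

// NewBundle is copied from the listing above.
func NewBundle(id, namespace, trustDomain string) *Bundle {
	if namespace == "" || trustDomain == "" {
		return nil
	}
	return &Bundle{ID: id, Namespace: namespace, TrustDomain: trustDomain}
}

// describe is a hypothetical helper: it shows the nil check a caller needs.
func describe(b *Bundle) string {
	if b == nil {
		return "no identity bundle"
	}
	return fmt.Sprintf("%s in namespace %s (trust domain %s)", b.ID, b.Namespace, b.TrustDomain)
}

func main() {
	fmt.Println(describe(NewBundle("app-1", "default", "public")))
	fmt.Println(describe(NewBundle("app-1", "", "public")))
}
```

Returning nil rather than a half-filled struct keeps an incomplete identity from being used by accident, at the cost of a nil check at every call site.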

7.9.2 - validator.go

The validator interface definition

Interface definition

Validator verifies the identity of a certificate request using an ID and a token:

type Validator interface {
	Validate(id, token, namespace string) error
}

7.9.3 - spiffe.go

Creating a SPIFFE ID

The CreateSPIFFEID method

CreateSPIFFEID() creates a SPIFFE-compliant unique ID from the given trustDomain, namespace, and appID:

func CreateSPIFFEID(trustDomain, namespace, appID string) (string, error) {
  // None of trustDomain, namespace, and appID may be empty
	if trustDomain == "" {
		return "", errors.New("can't create spiffe id: trust domain is empty")
	}
	if namespace == "" {
		return "", errors.New("can't create spiffe id: namespace is empty")
	}
	if appID == "" {
		return "", errors.New("can't create spiffe id: app id is empty")
	}

	// Validate according to the SPIFFE spec
	if strings.ContainsRune(trustDomain, ':') {
    // trustDomain must not contain ":"
		return "", errors.New("trust domain cannot contain the ':' character")
	}
  // trustDomain must not be longer than 255 bytes
	if len([]byte(trustDomain)) > 255 {
		return "", errors.New("trust domain cannot exceed 255 bytes")
	}

  // Assemble the SPIFFE ID
	id := fmt.Sprintf("spiffe://%s/ns/%s/%s", trustDomain, namespace, appID)
	if len([]byte(id)) > 2048 {
    // Check that the SPIFFE ID is no longer than 2048 bytes
		return "", errors.New("spiffe id cannot exceed 2048 bytes")
	}
	return id, nil
}

7.9.4 - validator.go for Kubernetes

The validator implementation for Kubernetes

Preparation

Struct definition

The validator struct is defined as:

type validator struct {
	client    k8s.Interface
	auth      kauth.AuthenticationV1Interface
	audiences []string
}

Method for creating the validator

NewValidator() creates a new validator struct:

func NewValidator(client k8s.Interface, audiences []string) identity.Validator {
	return &validator{
		client:    client,
		auth:      client.AuthenticationV1(),
		audiences: audiences,
	}
}

Implementation

Validate() verifies the identity of a certificate request using the ID and the token:

func (v *validator) Validate(id, token, namespace string) error {
  // id and token must not be empty
	if id == "" {
		return fmt.Errorf("%s: id field in request must not be empty", errPrefix)
	}
	if token == "" {
		return fmt.Errorf("%s: token field in request must not be empty", errPrefix)
	}

	// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
	var canTryWithNilAudience, showDefaultTokenAudienceWarning bool

	audiences := v.audiences
  
	if len(audiences) == 0 {
    // Handle the special case where the user did not set an audience explicitly
    // In that case fall back to the default sentryConsts.ServiceAccountTokenAudience, "dapr.io/sentry"
		audiences = []string{sentryConsts.ServiceAccountTokenAudience}

		// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
		// Because the user did not specify an explicit audience and is instead relying on the default, if the authentication fails we can retry with nil audience
    // Remember that this is the special case: if authentication fails, retry with a nil audience
		canTryWithNilAudience = true
	}
	tokenReview := &kauthapi.TokenReview{
		Spec: kauthapi.TokenReviewSpec{
			Token:     token,
			Audiences: audiences,
		},
	}

tr: // TODO: Remove in Dapr 1.12 to enforce setting an explicit audience

	prts, err := v.executeTokenReview(tokenReview)
	if err != nil {
		// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
		if canTryWithNilAudience {
			// Retry with a nil audience, which means the default audience for the K8s API server
			tokenReview.Spec.Audiences = nil
			showDefaultTokenAudienceWarning = true
			canTryWithNilAudience = false
			goto tr
		}

		return err
	}

	// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
	if showDefaultTokenAudienceWarning {
		log.Warn("WARNING: Sentry accepted a token with the audience for the Kubernetes API server. This is deprecated and only supported to ensure a smooth upgrade from Dapr pre-1.10.")
	}

	if len(prts) != 4 || prts[0] != "system" {
		return fmt.Errorf("%s: provided token is not a properly structured service account token", errPrefix)
	}

	podSa := prts[3]
	podNs := prts[2]

  // Check the namespace
	if namespace != "" {
		if podNs != namespace {
			return fmt.Errorf("%s: namespace mismatch. received namespace: %s", errPrefix, namespace)
		}
	}

  // Check the id
	if id != podNs+":"+podSa {
		return fmt.Errorf("%s: token/id mismatch. received id: %s", errPrefix, id)
	}
	return nil
}

executeTokenReview() performs the token review and returns an error if the token is invalid or the review fails:

func (v *validator) executeTokenReview(tokenReview *kauthapi.TokenReview) ([]string, error) {
	review, err := v.auth.TokenReviews().Create(context.TODO(), tokenReview, v1.CreateOptions{})
	if err != nil {
		return nil, fmt.Errorf("%s: token review failed: %w", errPrefix, err)
	}
	if review.Status.Error != "" {
		return nil, fmt.Errorf("%s: invalid token: %s", errPrefix, review.Status.Error)
	}
	if !review.Status.Authenticated {
		return nil, fmt.Errorf("%s: authentication failed", errPrefix)
	}
	return strings.Split(review.Status.User.Username, ":"), nil
}
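
The checks that Validate() applies to the split username can be tried out without a Kubernetes API server. The sketch below assumes the usual service account username form system:serviceaccount:<namespace>:<name>; checkIdentity is a hypothetical helper that mirrors only the tail of Validate() and takes the username directly instead of running a token review. The error prefix is dropped for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkIdentity is a hypothetical helper mirroring the tail of Validate():
// it receives the username that the token review would have reported.
func checkIdentity(username, id, namespace string) error {
	prts := strings.Split(username, ":")
	if len(prts) != 4 || prts[0] != "system" {
		return errors.New("provided token is not a properly structured service account token")
	}
	podNs, podSa := prts[2], prts[3]

	// The namespace is only compared when the request carries one.
	if namespace != "" && podNs != namespace {
		return fmt.Errorf("namespace mismatch. received namespace: %s", namespace)
	}
	// The id must be "<namespace>:<service account>".
	if id != podNs+":"+podSa {
		return fmt.Errorf("token/id mismatch. received id: %s", id)
	}
	return nil
}

func main() {
	username := "system:serviceaccount:default:my-app" // illustrative value

	fmt.Println(checkIdentity(username, "default:my-app", "default"))
	fmt.Println(checkIdentity(username, "default:other", "default"))
	fmt.Println(checkIdentity(username, "default:my-app", "prod"))
}
```

The first call passes, the second fails the id comparison, and the third fails the namespace comparison, which matches the order of the checks in Validate().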

7.9.5 - validator.go for self-hosted mode

The validator implementation for self-hosted mode

In self-hosted mode no validation is actually performed:

func NewValidator() identity.Validator {
	return &validator{}
}

type validator struct{}

func (v *validator) Validate(id, token, namespace string) error {
	// no validation for self hosted.
	return nil
}

The code is only a skeleton kept in place to satisfy the Validator interface.

This means that no identity validation takes place in self-hosted mode.
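
The "skeleton that satisfies the interface" idea can be sketched in a self-contained form. The interface and the no-op validator follow the listings above; the blank-identifier assertion is a common Go idiom added here for illustration and is not claimed to be part of the Dapr source. The identity package is flattened into main so that the sketch compiles on its own.

```go
package main

import "fmt"

// Validator matches the interface shown in validator.go above.
type Validator interface {
	Validate(id, token, namespace string) error
}

// validator is the no-op implementation for self-hosted mode.
type validator struct{}

// Validate accepts every request: no validation for self hosted.
func (v *validator) Validate(id, token, namespace string) error {
	return nil
}

// Compile-time assertion (illustrative idiom): the build fails
// if *validator ever stops satisfying Validator.
var _ Validator = (*validator)(nil)

// NewValidator returns the no-op validator behind the interface.
func NewValidator() Validator {
	return &validator{}
}

func main() {
	v := NewValidator()
	// Even empty arguments pass, unlike the Kubernetes validator.
	fmt.Println(v.Validate("", "", "") == nil)
}
```

Keeping a no-op implementation lets the rest of sentry call Validate() unconditionally instead of branching on the hosting mode.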

8 - Java SDK

Source code analysis of the Dapr Java SDK

8.1 - Java SDK overview

Overview of the Dapr Java SDK

Project structure

The main sub-projects are:

sdk
sdk-autogen
sdk-springboot
sdk-tests

8.2 - sdk-autogen

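The sdk-autogen sub-project: generates Java code from the proto files

8.2.1 - pom.xml

The sdk-autogen sub-project: contents of pom.xml

Basic definitions

Dependencies

The declared project dependencies are:

javax.annotation-api: provided
grpc-netty-shaded: runtime
grpc-protobuf
grpc-stub
grpc-testing: test

The grpc version is 1.42.1.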

  <properties>
    <grpc.version>1.42.1</grpc.version>
  </properties>

<dependencies>
    <dependency>
      <groupId>javax.annotation</groupId>
      <artifactId>javax.annotation-api</artifactId>
      <version>1.3.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-netty-shaded</artifactId>
      <version>${grpc.version}</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-protobuf</artifactId>
      <version>${grpc.version}</version>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-stub</artifactId>
      <version>${grpc.version}</version>
    </dependency>
    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-testing</artifactId>
      <version>${grpc.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

Directories used for code generation

Two directories:

input: proto
output: generated-sources

  <properties>
    <protobuf.output.directory>${project.build.directory}/generated-sources</protobuf.output.directory>
    <protobuf.input.directory>${project.build.directory}/proto</protobuf.input.directory>
  </properties>

Maven plugins

download-maven-plugin

download-maven-plugin is used to download the proto files.

What the plugin does can be summarized as:

Use the wget goal to download common.proto from ${dapr.proto.baseurl}/common/v1/common.proto into ${protobuf.input.directory}/dapr/proto/common/v1
Use the wget goal to download dapr.proto from ${dapr.proto.baseurl}/runtime/v1/dapr.proto into ${protobuf.input.directory}
Use the wget goal to download appcallback.proto from ${dapr.proto.baseurl}/runtime/v1/appcallback.proto into ${protobuf.input.directory}

<plugin>
        <groupId>com.googlecode.maven-download-plugin</groupId>
        <artifactId>download-maven-plugin</artifactId>
        <version>1.6.0</version>
        <executions>
          <execution>
            <id>getCommonProto</id>
            <!-- the wget goal actually binds itself to this phase by default -->
            <phase>initialize</phase>
            <goals>
              <goal>wget</goal>
            </goals>
            <configuration>
              <url>${dapr.proto.baseurl}/common/v1/common.proto</url>
              <outputFileName>common.proto</outputFileName>
              <!-- default target location, just to demonstrate the parameter -->
              <outputDirectory>${protobuf.input.directory}/dapr/proto/common/v1</outputDirectory>
            </configuration>
          </execution>
          <execution>
            <id>getDaprProto</id>
            <!-- the wget goal actually binds itself to this phase by default -->
            <phase>initialize</phase>
            <goals>
              <goal>wget</goal>
            </goals>
            <configuration>
              <url>${dapr.proto.baseurl}/runtime/v1/dapr.proto</url>
              <outputFileName>dapr.proto</outputFileName>
              <!-- default target location, just to demonstrate the parameter -->
              <outputDirectory>${protobuf.input.directory}</outputDirectory>
            </configuration>
          </execution>
          <execution>
            <id>getDaprClientProto</id>
            <!-- the wget goal actually binds itself to this phase by default -->
            <phase>initialize</phase>
            <goals>
              <goal>wget</goal>
            </goals>
            <configuration>
              <url>${dapr.proto.baseurl}/runtime/v1/appcallback.proto</url>
              <outputFileName>appcallback.proto</outputFileName>
              <!-- default target location, just to demonstrate the parameter -->
              <outputDirectory>${protobuf.input.directory}</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>

protoc-jar-maven-plugin

This is the key step: protoc-jar-maven-plugin generates Java code from the proto files.

<plugin>
        <groupId>com.github.os72</groupId>
        <artifactId>protoc-jar-maven-plugin</artifactId>
        <version>3.11.4</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <protocVersion>${protobuf.version}</protocVersion>
              <addProtoSources>inputs</addProtoSources>
              <includeMavenTypes>direct</includeMavenTypes>
              <includeStdTypes>true</includeStdTypes>
              <inputDirectories>
                <include>${protobuf.input.directory}/dapr/proto/common/v1</include>
                <include>${protobuf.input.directory}</include>
              </inputDirectories>
              <outputTargets>
                <outputTarget>
                  <type>java</type>
                  <outputDirectory>${protobuf.output.directory}</outputDirectory>
                </outputTarget>
                <outputTarget>
                  <type>grpc-java</type>
                  <outputDirectory>${protobuf.output.directory}</outputDirectory>
                  <pluginArtifact>io.grpc:protoc-gen-grpc-java:${grpc.version}</pluginArtifact>
                </outputTarget>
              </outputTargets>
            </configuration>
          </execution>
        </executions>
      </plugin>

spotbugs-maven-plugin

Nothing special here; it just skips the findbugs check for the auto-generated code.

      <plugin>
        <groupId>com.github.spotbugs</groupId>
        <artifactId>spotbugs-maven-plugin</artifactId>
        <configuration>
          <!-- Skip findbugs for auto-generated code -->
          <skip>true</skip>
        </configuration>
      </plugin>

maven-javadoc-plugin

Nothing special here.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>3.2.0</version>
    <executions>
        <execution>
        <id>attach-javadocs</id>
        <goals>
            <goal>jar</goal>
        </goals>
        </execution>
    </executions>
    </plugin>

maven-source-plugin

Nothing special here.

<plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <id>attach-sources</id>
            <goals>
              <goal>jar-no-fork</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

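Execution results and analysis

Running the code generation

Run the mvn install command to see the code generation process and its results.

The download-maven-plugin plugin first downloads the proto files into the target/proto directory.

Then the protoc-jar-maven-plugin plugin generates Java code from these proto files.

After compilation, both the proto files and the class files are placed in the target/classes directory.

Finally they are packaged into a jar, together with the corresponding sources and javadoc jars.

Unpacking this jar shows that its contents match the contents of the target/classes directory.

It contains not only the Java class files but also the proto files.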

Things to note

The dapr proto files come from ${dapr.proto.baseurl} and are downloaded with the wget goal.

dapr.proto.baseurl is defined in the pom.xml file in the root directory of java-sdk:

<dapr.proto.baseurl>https://raw.githubusercontent.com/dapr/dapr/v1.7.0-rc.2/dapr/proto</dapr.proto.baseurl>

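This is where the version of the proto files comes in (the branch, tag, or commit id they live on). If local development involves changes to the proto files, this URL has to be updated to point at the right proto files. Conversely, if the code generated from the proto files does not reflect new changes in the proto, the first thing to check is whether this URL is correct.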

8.3 - sdk

The sdk sub-project: the core of the Java SDK

8.3.1 - Serialization

Design and implementation of serialization in the Java SDK

8.3.1.1 - Background

Background of serialization in the Java SDK

Documentation

https://github.com/dapr/java-sdk#how-to-use-a-custom-serializer

The readme of the dapr java-sdk project contains this passage:

How to use a custom serializer

This SDK provides a basic serialization for request/response objects but also for state objects. Applications should provide their own serialization for production scenarios.

8.3.1.2 - DaprObjectSerializer

The DaprObjectSerializer interface defines Dapr's object serializer

Interface definition

The DaprObjectSerializer interface is simple and is defined as follows:

// Serializes and deserializes application objects
public interface DaprObjectSerializer {

  // Serializes the given object as byte[].
  byte[] serialize(Object o) throws IOException;

  // Deserializes the given byte[] into an object.
  <T> T deserialize(byte[] data, TypeRef<T> type) throws IOException;

  // Returns the content type of the request
  String getContentType();
}

getContentType() reports the content type, while serialize() and deserialize() implement serialization and deserialization, that is, conversion between objects and byte[] in both directions.

8.3.1.3 - DefaultObjectSerializer

DefaultObjectSerializer is Dapr's default object serializer

DefaultObjectSerializer extends ObjectSerializer. Its serialize() and deserialize() simply delegate to ObjectSerializer, while getContentType() is hard-coded to return "application/json":

public class DefaultObjectSerializer extends ObjectSerializer implements DaprObjectSerializer {

  @Override
  public byte[] serialize(Object o) throws IOException {
    return super.serialize(o);
  }

  @Override
  public <T> T deserialize(byte[] data, TypeRef<T> type) throws IOException {
    return super.deserialize(data, type);
  }

  @Override
  public String getContentType() {
    return "application/json";
  }
}

8.3.1.4 - ObjectSerializer

ObjectSerializer is the base class that carries the logic of Dapr's default object serializer

Class definition

public class ObjectSerializer {
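  // Default constructor, protected to keep the class from being instantiated outside the package while still allowing it to be extended.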
  protected ObjectSerializer() {
  }
}

Jackson-related settings

  protected static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
      .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
      .setSerializationInclusion(JsonInclude.Include.NON_NULL);

Implementation of serialize()

public byte[] serialize(Object state) throws IOException {
    if (state == null) {
      return null;
    }

    if (state.getClass() == Void.class) {
      return null;
    }

    // Have this check here to be consistent with deserialization (see deserialize() method below).
    if (state instanceof byte[]) {
      return (byte[]) state;
    }

    // Proto buffer class is serialized directly.
    if (state instanceof MessageLite) {
      return ((MessageLite) state).toByteArray();
    }

    // Not string, not primitive, so it is a complex type: we use JSON for that.
    return OBJECT_MAPPER.writeValueAsBytes(state);
  }

Implementation of deserialize()

These two methods are simple delegates:

  public <T> T deserialize(byte[] content, TypeRef<T> type) throws IOException {
    return deserialize(content, OBJECT_MAPPER.constructType(type.getType()));
  }

  public <T> T deserialize(byte[] content, Class<T> clazz) throws IOException {
    return deserialize(content, OBJECT_MAPPER.constructType(clazz));
  }

The actual implementation is here:

  private <T> T deserialize(byte[] content, JavaType javaType) throws IOException {
    // Mirrors what serialize() does
    if ((javaType == null) || javaType.isTypeOrSubTypeOf(Void.class)) {
      return null;
    }

    // Java primitive types are handed to deserializePrimitives()
    // Note that content may be null or an empty array at this point
    if (javaType.isPrimitive()) {
      return deserializePrimitives(content, javaType);
    }

    // Mirrors what serialize() does
    if (content == null) {
      return null;
    }

    // Deserialization of GRPC response fails without this check since it does not come as base64 encoded byte[].
    // TBD: the reason for this check is not fully understood yet
    if (javaType.hasRawClass(byte[].class)) {
      return (T) content;
    }

    // Mirrors what serialize() does, but the zero-length check comes after the byte[] check
    if (content.length == 0) {
      return null;
    }

    // CloudEvent support: a CloudEvent is deserialized separately
    if (javaType.hasRawClass(CloudEvent.class)) {
      return (T) CloudEvent.deserialize(content);
    }

    // gRPC MessageLite support: call the parseFrom method via reflection
    if (javaType.isTypeOrSubTypeOf(MessageLite.class)) {
      try {
        Method method = javaType.getRawClass().getDeclaredMethod("parseFrom", byte[].class);
        if (method != null) {
          return (T) method.invoke(null, content);
        }
      } catch (NoSuchMethodException e) {
        // It was a best effort. Skip this try.
      } catch (Exception e) {
        throw new IOException(e);
      }
    }

    // Only as the last step, fall back to standard JSON deserialization with Jackson
    return OBJECT_MAPPER.readValue(content, javaType);
  }

The deserializePrimitives() method

Parsing of primitive types:

private static <T> T deserializePrimitives(byte[] content, JavaType javaType) throws IOException {
    if ((content == null) || (content.length == 0)) {
      // Special handling when content is null or empty: return the type's default value
      if (javaType.hasRawClass(boolean.class)) {
        return (T) Boolean.FALSE;
      }

      if (javaType.hasRawClass(byte.class)) {
        return (T) Byte.valueOf((byte) 0);
      }

      if (javaType.hasRawClass(short.class)) {
        return (T) Short.valueOf((short) 0);
      }

      if (javaType.hasRawClass(int.class)) {
        return (T) Integer.valueOf(0);
      }

      if (javaType.hasRawClass(long.class)) {
        return (T) Long.valueOf(0L);
      }

      if (javaType.hasRawClass(float.class)) {
        return (T) Float.valueOf(0);
      }

      if (javaType.hasRawClass(double.class)) {
        return (T) Double.valueOf(0);
      }

      if (javaType.hasRawClass(char.class)) {
        return (T) Character.valueOf(Character.MIN_VALUE);
      }

      return null;
    }

    // Non-empty values are deserialized with Jackson
    return OBJECT_MAPPER.readValue(content, javaType);
  }

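Summary

In this code a lot of special-case logic runs before Jackson gets involved. In principle that logic is independent of the Jackson serialization scheme, so any other DaprObjectSerializer implementation would have to repeat all of it, with the risk of duplicated code and inconsistent behavior.

A more reasonable design would extract this logic, apply it first during serialization and deserialization, and only then hand over to DaprObjectSerializer.

There is also the dependency conflict problem: the current DaprObjectSerializer approach does not offer a complete solution, and the Jackson dependency is still hard-coded.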

8.3.2 - HTTP client

The HTTP client in the Java SDK

8.3.2.1 - DaprHttp

The okhttp3 + Jackson implementation of Dapr HTTP

Constants

  public static final String API_VERSION = "v1.0";

  public static final String ALPHA_1_API_VERSION = "v1.0-alpha1";

  private static final String HEADER_DAPR_REQUEST_ID = "X-DaprRequestId";

  private static final String DEFAULT_HTTP_SCHEME = "http";

  private static final Set<String> ALLOWED_CONTEXT_IN_HEADERS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("grpc-trace-bin", "traceparent", "tracestate")));

HTTP method definitions:

  public enum HttpMethods {
    NONE,
    GET,
    PUT,
    POST,
    DELETE,
    HEAD,
    CONNECT,
    OPTIONS,
    TRACE
  }

Basic class definitions

  public static class Response {
    private byte[] body;
    private Map<String, String> headers;
    private int statusCode;
    ......
  }

DaprHttp class definition

  private final OkHttpClient httpClient;
  private final int port;
  private final String hostname;

  DaprHttp(String hostname, int port, OkHttpClient httpClient) {
    this.hostname = hostname;
    this.port = port;
    this.httpClient = httpClient;
  }

Implementation of invokeApi()

invokeApi() has several overloads. They all end up in doInvokeApi() below, which executes the HTTP request:

  /**
   * Invokes an API that returns a text payload.
   *
   * @param method        HTTP method.
   * @param pathSegments  Array of path segments (/a/b/c -> ["a", "b", "c"]).
   * @param urlParameters Parameters in the URL
   * @param content       payload to be posted.
   * @param headers       HTTP headers.
   * @param context       OpenTelemetry's Context.
   * @return CompletableFuture for Response.
   */
private CompletableFuture<Response> doInvokeApi(String method,
                               String[] pathSegments,
                               Map<String, List<String>> urlParameters,
                               byte[] content, Map<String, String> headers,
                               Context context) {
    // The method parameters are essentially a very simplified abstraction of an HTTP request

    // Use a UUID as the requestId
    final String requestId = UUID.randomUUID().toString();
    RequestBody body;

    // Assemble the okhttp3 request
    String contentType = headers != null ? headers.get(Metadata.CONTENT_TYPE) : null;
    MediaType mediaType = contentType == null ? MEDIA_TYPE_APPLICATION_JSON : MediaType.get(contentType);
    if (content == null) {
      body = mediaType.equals(MEDIA_TYPE_APPLICATION_JSON)
          ? REQUEST_BODY_EMPTY_JSON
          : RequestBody.Companion.create(new byte[0], mediaType);
    } else {
      body = RequestBody.Companion.create(content, mediaType);
    }
    HttpUrl.Builder urlBuilder = new HttpUrl.Builder();
    urlBuilder.scheme(DEFAULT_HTTP_SCHEME)
        .host(this.hostname)
        .port(this.port);
    for (String pathSegment : pathSegments) {
      urlBuilder.addPathSegment(pathSegment);
    }
    Optional.ofNullable(urlParameters).orElse(Collections.emptyMap()).entrySet().stream()
        .forEach(urlParameter ->
            Optional.ofNullable(urlParameter.getValue()).orElse(Collections.emptyList()).stream()
              .forEach(urlParameterValue ->
                  urlBuilder.addQueryParameter(urlParameter.getKey(), urlParameterValue)));

    Request.Builder requestBuilder = new Request.Builder()
        .url(urlBuilder.build())
        .addHeader(HEADER_DAPR_REQUEST_ID, requestId);
    if (context != null) {
      context.stream()
          .filter(entry -> ALLOWED_CONTEXT_IN_HEADERS.contains(entry.getKey().toString().toLowerCase()))
          .forEach(entry -> requestBuilder.addHeader(entry.getKey().toString(), entry.getValue().toString()));
    }
    if (HttpMethods.GET.name().equals(method)) {
      requestBuilder.get();
    } else if (HttpMethods.DELETE.name().equals(method)) {
      requestBuilder.delete();
    } else {
      requestBuilder.method(method, body);
    }

    String daprApiToken = Properties.API_TOKEN.get();
    if (daprApiToken != null) {
      requestBuilder.addHeader(Headers.DAPR_API_TOKEN, daprApiToken);
    }

    if (headers != null) {
      Optional.ofNullable(headers.entrySet()).orElse(Collections.emptySet()).stream()
          .forEach(header -> {
            requestBuilder.addHeader(header.getKey(), header.getValue());
          });
    }
    // The request is fully assembled: build the request object
    Request request = requestBuilder.build();

    // Send the okhttp3 request and return the CompletableFuture
    CompletableFuture<Response> future = new CompletableFuture<>();
    this.httpClient.newCall(request).enqueue(new ResponseFutureCallback(future));
    return future;
  }

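When the HTTP request is assembled, note how the headers are handled:

request id: "X-DaprRequestId", set to a UUID
dapr api token: "dapr-api-token", read from the system property "dapr.api.token" or the environment variable "DAPR_API_TOKEN"
headers passed explicitly by the caller: forwarded as-is
OpenTelemetry values: the method tries to read the three headers "grpc-trace-bin", "traceparent", and "tracestate" from the OpenTelemetry context passed in, and forwards them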

8.3.2.2 - DaprHttpBuilder

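Builder for DaprHttp, based on okhttp

There is nothing special in the code; just note where the okhttp parameters come from.

Also note that okhttp's MaxRequestsPerHost defaults to 5, which is a major pitfall: in the sidecar case every request goes to the same host, so the builder sets it to the same value as maxRequests.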

private DaprHttp buildDaprHttp() {
    // Double-checked locking
    if (OK_HTTP_CLIENT == null) {
      synchronized (LOCK) {
        if (OK_HTTP_CLIENT == null) {
          OkHttpClient.Builder builder = new OkHttpClient.Builder();
          Duration readTimeout = Duration.ofSeconds(Properties.HTTP_CLIENT_READ_TIMEOUT_SECONDS.get());
          builder.readTimeout(readTimeout);

          Dispatcher dispatcher = new Dispatcher();
          dispatcher.setMaxRequests(Properties.HTTP_CLIENT_MAX_REQUESTS.get());
          // Major pitfall here!
          // The maximum number of requests for each host to execute concurrently.
          // Default value is 5 in okhttp which is totally UNACCEPTABLE!
          // For sidecar case, set it the same as maxRequests.
          dispatcher.setMaxRequestsPerHost(Properties.HTTP_CLIENT_MAX_REQUESTS.get());
          builder.dispatcher(dispatcher);

          ConnectionPool pool = new ConnectionPool(Properties.HTTP_CLIENT_MAX_IDLE_CONNECTIONS.get(),
                  KEEP_ALIVE_DURATION, TimeUnit.SECONDS);
          builder.connectionPool(pool);

          OK_HTTP_CLIENT = builder.build();
        }
      }
    }

    return new DaprHttp(Properties.SIDECAR_IP.get(), Properties.HTTP_PORT.get(), OK_HTTP_CLIENT);
  }
}

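8.3.3 - gRPC client

The gRPC client in the Java SDK

8.3.4 - Dapr client

The Dapr client in the Java SDK

8.3.4.1 - DaprClient

DaprClient interface definition

// Generic client adapter to be used regardless of which gRPC or HTTP client implementation is required.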
public interface DaprClient extends AutoCloseable {

  Mono<Void> waitForSidecar(int timeoutInMilliseconds);

  Mono<Void> shutdown();
}

The remaining methods all correspond to the Dapr API, and every method is written in a reactive style, for example:

  Mono<Void> publishEvent(String pubsubName, String topicName, Object data);

  Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata);

  Mono<Void> publishEvent(PublishEventRequest request);

8.3.4.2 - DaprPreviewClient

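The DaprPreviewClient interface defines the preview and alpha APIs

The DaprPreviewClient interface currently contains only the newly added configuration API methods and the state query method:

// Generic client adapter to be used regardless of which gRPC or HTTP client implementation is required.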
public interface DaprPreviewClient extends AutoCloseable {

  Mono<ConfigurationItem> getConfiguration(String storeName, String key);

  Flux<List<ConfigurationItem>> subscribeToConfiguration(String storeName, String... keys);

  <T> Mono<QueryStateResponse<T>> queryState(String storeName, String query, TypeRef<T> type);
}

Note: the distributed lock methods have not been added yet, presumably because their implementation has not started.

8.3.4.3 - AbstractDaprClient

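The AbstractDaprClient abstract base class

// Abstract class with convenience methods shared between the client implementations.
abstract class AbstractDaprClient implements DaprClient, DaprPreviewClient {
  // Jackson is still hard-coded here!
  // TBD: check where this is used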
  protected static final ObjectMapper JSON_REQUEST_MAPPER = new ObjectMapper();

  protected DaprObjectSerializer objectSerializer;

  protected DaprObjectSerializer stateSerializer;

    AbstractDaprClient(
      DaprObjectSerializer objectSerializer,
      DaprObjectSerializer stateSerializer) {
    this.objectSerializer = objectSerializer;
    this.stateSerializer = stateSerializer;
  }
}

The other methods are mostly delegating methods without substantive content; the real implementations belong in the subclasses.

  @Override
  public Mono<Void> publishEvent(String pubsubName, String topicName, Object data) {
    return this.publishEvent(pubsubName, topicName, data, null);
  }

    @Override
  public Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata) {
    PublishEventRequest req = new PublishEventRequest(pubsubName, topicName, data)
        .setMetadata(metadata);
    return this.publishEvent(req).then();
  }

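These overloads can be seen as syntactic sugar: they let callers use the API directly without constructing a more complex request object such as PublishEventRequest.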

8.3.4.4 - DaprClientHttp

The HTTP implementation of the Dapr client

Class definition

public class DaprClientHttp extends AbstractDaprClient {

  private final DaprHttp client;
  private final boolean isObjectSerializerDefault;
  private final boolean isStateSerializerDefault;

  DaprClientHttp(DaprHttp client, DaprObjectSerializer objectSerializer, DaprObjectSerializer stateSerializer) {
    super(objectSerializer, stateSerializer);
    this.client = client;
    this.isObjectSerializerDefault = objectSerializer.getClass() == DefaultObjectSerializer.class;
    this.isStateSerializerDefault = stateSerializer.getClass() == DefaultObjectSerializer.class;
  }

   DaprClientHttp(DaprHttp client) {
    this(client, new DefaultObjectSerializer(), new DefaultObjectSerializer());
  }
}

Client-specific methods

The waitForSidecar() method

waitForSidecar() waits for the sidecar to become available by trying to connect to the configured sidecar IP address and port.

  public Mono<Void> waitForSidecar(int timeoutInMilliseconds) {
    return Mono.fromRunnable(() -> {
      try {
        NetworkUtils.waitForSocket(Properties.SIDECAR_IP.get(), Properties.HTTP_PORT.get(), timeoutInMilliseconds);
      } catch (InterruptedException e) {
        throw new RuntimeException(e);
      }
    });
  }

The close() method

close() is required by java.lang.AutoCloseable, which DaprClient extends:

  @Override
  public void close() {
    // Simply close the HTTP client
    client.close();
  }

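Implementation of the Dapr API methods

The publishEvent() method

publishEvent() has two main tasks:

Assemble the parameters of the request, including the HTTP method, path, and query parameters, plus the event data serialized to byte[]
Call DaprHttp to send the HTTP request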

@Override
  public Mono<Void> publishEvent(PublishEventRequest request) {
    try {
      String pubsubName = request.getPubsubName();
      String topic = request.getTopic();
      Object data = request.getData();
      Map<String, String> metadata = request.getMetadata();

      if (topic == null || topic.trim().isEmpty()) {
        throw new IllegalArgumentException("Topic name cannot be null or empty.");
      }

      byte[] serializedEvent = objectSerializer.serialize(data);
      // Content-type can be overwritten on a per-request basis.
      // It allows CloudEvents to be handled differently, for example.
      String contentType = request.getContentType();
      if (contentType == null || contentType.isEmpty()) {
        contentType = objectSerializer.getContentType();
      }
      Map<String, String> headers = Collections.singletonMap("content-type", contentType);

      String[] pathSegments = new String[]{ DaprHttp.API_VERSION, "publish", pubsubName, topic };

      Map<String, List<String>> queryArgs = metadataToQueryArgs(metadata);
      return Mono.subscriberContext().flatMap(
          context -> this.client.invokeApi(
              DaprHttp.HttpMethods.POST.name(), pathSegments, queryArgs, serializedEvent, headers, context
          )
      ).then();
    } catch (Exception ex) {
      return DaprException.wrapMono(ex);
    }
  }
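To make the assembled request more concrete, the sketch below builds the URL that such a publish call targets. It is illustrative only: PublishUrlSketch and buildPublishUrl() are hypothetical names, the value "v1.0" for the API version segment and the "metadata." prefix for query parameters are assumptions based on the public Dapr HTTP API, and URLEncoder is used as a simple stand-in for proper path encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Illustrative only: shows the shape of the URL a publish call ends up targeting.
class PublishUrlSketch {

  static String encode(String value) {
    // URLEncoder performs form encoding (a space becomes '+'); good enough for this sketch.
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  static String buildPublishUrl(String baseUrl, String pubsubName, String topic, Map<String, String> metadata) {
    // Same validation as the publishEvent() listing above.
    if (topic == null || topic.trim().isEmpty()) {
      throw new IllegalArgumentException("Topic name cannot be null or empty.");
    }
    StringBuilder url = new StringBuilder(baseUrl)
        .append("/v1.0/publish/") // assumed value of the API version path segment
        .append(encode(pubsubName)).append('/').append(encode(topic));
    if (metadata != null && !metadata.isEmpty()) {
      StringJoiner query = new StringJoiner("&", "?", "");
      // TreeMap gives a stable parameter order, which keeps the output predictable.
      for (Map.Entry<String, String> entry : new TreeMap<>(metadata).entrySet()) {
        // Assumption: metadata travels as query parameters prefixed with "metadata.".
        query.add(encode("metadata." + entry.getKey()) + "=" + encode(entry.getValue()));
      }
      url.append(query);
    }
    return url.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildPublishUrl(
        "http://127.0.0.1:3500", "messagebus", "orders", Map.of("ttlInSeconds", "120")));
  }
}
```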

shutdown() method

Note that shutdown() shuts down the sidecar itself, so it also has to send a request to the sidecar:

  @Override
  public Mono<Void> shutdown() {
    String[] pathSegments = new String[]{ DaprHttp.API_VERSION, "shutdown" };
    return Mono.subscriberContext().flatMap(
            context -> client.invokeApi(DaprHttp.HttpMethods.POST.name(), pathSegments,
                null, null, context))
        .then();
  }

The HTTP request is ultimately sent through DaprHttp, the client field of DaprClientHttp.

8.3.4.5 - DaprClientGrpc

The gRPC implementation of the Dapr client

Class definition


public class DaprClientGrpc extends AbstractDaprClient {

  private Closeable channel;

  private DaprGrpc.DaprStub asyncStub;

  DaprClientGrpc(
      Closeable closeableChannel,
      DaprGrpc.DaprStub asyncStub,
      DaprObjectSerializer objectSerializer,
      DaprObjectSerializer stateSerializer) {
    super(objectSerializer, stateSerializer);
    this.channel = closeableChannel;
    this.asyncStub = intercept(asyncStub);
  }

}

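Client-specific methods

waitForSidecar() method

waitForSidecar() waits for the sidecar to become available by trying to connect to the configured sidecar IP address and port.

The only difference from the HTTP implementation is the port.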

  @Override
  public Mono<Void> waitForSidecar(int timeoutInMilliseconds) {
    return Mono.fromRunnable(() -> {
      try {
        NetworkUtils.waitForSocket(Properties.SIDECAR_IP.get(), Properties.GRPC_PORT.get(), timeoutInMilliseconds);
      } catch (InterruptedException e) {
        throw new RuntimeException(e);
      }
    });
  }

close() method

close() is required by java.lang.AutoCloseable, which the DaprClient interface extends:

  public void close() throws Exception {
    if (channel != null) {
      DaprException.wrap(() -> {
        // close the channel
        channel.close();
        return true;
      }).call();
    }
  }

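Implementation of the Dapr API methods

publishEvent() method

publishEvent() does two things:

Assemble the parameters of the gRPC request by building a PublishEventRequest object
Call the corresponding method on the gRPC asyncStub to send the gRPC request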

@Override
  public Mono<Void> publishEvent(PublishEventRequest request) {
    try {
      String pubsubName = request.getPubsubName();
      String topic = request.getTopic();
      Object data = request.getData();
      DaprProtos.PublishEventRequest.Builder envelopeBuilder = DaprProtos.PublishEventRequest.newBuilder()
          .setTopic(topic)
          .setPubsubName(pubsubName)
          .setData(ByteString.copyFrom(objectSerializer.serialize(data)));

      // Content-type can be overwritten on a per-request basis.
      // It allows CloudEvents to be handled differently, for example.
      String contentType = request.getContentType();
      if (contentType == null || contentType.isEmpty()) {
        contentType = objectSerializer.getContentType();
      }
      envelopeBuilder.setDataContentType(contentType);

      Map<String, String> metadata = request.getMetadata();
      if (metadata != null) {
        envelopeBuilder.putAllMetadata(metadata);
      }

      return Mono.subscriberContext().flatMap(
          context ->
              this.<Empty>createMono(
                  it -> intercept(context, asyncStub).publishEvent(envelopeBuilder.build(), it)
              )
      ).then();
    } catch (Exception ex) {
      return DaprException.wrapMono(ex);
    }
  }

8.3.5 - opencensus

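The Dapr Java SDK provides support for OpenCensus.

8.3.6 - Annotations

The Dapr Java SDK provides annotation support.

8.3.6.1 - Topic annotation

The Topic annotation provides support for subscribing.

The @Topic annotation subscribes to a topic. Its pubsubName, name, and metadata attributes map to the pubsubName, topic, and metadata fields of the Dapr pub/sub API: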

@Documented
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Topic {
  String name();
  String pubsubName();
  String metadata() default "{}";

  // Rule used to match incoming cloud events.
  Rule rule() default @Rule(match = "", priority = 0);
}

A typical use of the @Topic annotation:

  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}")
  @PostMapping(path = "/testingtopic")
  public Mono<Void> handleMessage(@RequestBody(required = false) CloudEvent<?> cloudEvent) {
    ......
  }

8.3.6.2 - Rule annotation

The Rule annotation expresses a matching rule

The @Rule annotation expresses the rule used to match incoming events.

@Documented
@Target(ElementType.ANNOTATION_TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface Rule {

  // Common Expression Language (CEL) expression used to match incoming cloud events.
  String match();

  // Priority of the rule, used for ordering. Lower numbers take precedence.
  int priority();
}

A typical use of the @Rule annotation:

  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
          rule = @Rule(match = "event.type == \"v2\"", priority = 1))
  @PostMapping(path = "/testingtopicV2")
  public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
    ......
  }

8.4 - actors

SDK subproject: implementation of the actor model

8.4.1 - client

actor client

8.4.2 - actor runtime

actor runtime

8.5 - springboot

SDK subproject: Spring Boot integration

8.5.1 - spring auto configuration

Spring Boot auto-configuration for the SDK

META-INF

Following standard Spring Boot practice, the file src/main/resources/META-INF/spring.factories contains:

org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
io.dapr.springboot.DaprAutoConfiguration

DaprAutoConfiguration

DaprAutoConfiguration itself is very simple:

@Configuration
@ConditionalOnWebApplication
@ComponentScan("io.dapr.springboot")
public class DaprAutoConfiguration {
}

DaprBeanPostProcessor

DaprBeanPostProcessor processes the Dapr annotations.

@Component
public class DaprBeanPostProcessor implements BeanPostProcessor {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  private final EmbeddedValueResolver embeddedValueResolver;

  DaprBeanPostProcessor(ConfigurableBeanFactory beanFactory) {
    embeddedValueResolver = new EmbeddedValueResolver(beanFactory);
  }
  ......
}

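The documentation of postProcessBeforeInitialization() on the BeanPostProcessor interface reads:

Apply this BeanPostProcessor to the given new bean instance before any bean initialization callbacks (like InitializingBean's afterPropertiesSet or a custom init-method). The bean will already be populated with property values. The returned bean instance may be a wrapper around the original.

In other words, this method is called for every bean before its initialization callbacks run (but after its properties are populated), which gives us a hook to plug in our own logic. Here it scans the bean for methods carrying the Dapr Topic annotation: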

  @Override
  public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
    if (bean == null) {
      return null;
    }

    subscribeToTopics(bean.getClass(), embeddedValueResolver);

    return bean;
  }

The implementation of subscribeToTopics() is examined in detail later, together with the rule-matching code.

postProcessAfterInitialization() has no special logic and simply returns the original bean:

  @Override
  public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException {
    return bean;
  }

8.5.2 - controller

The Spring Boot controller that handles Dapr callback requests

@RestController
public class DaprController {
}

healthz endpoint

The endpoint used for health checks. Its path is "/healthz" and its implementation is empty.

@GetMapping(path = "/healthz")
public void healthz() {
}

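TBD: should some additional state be taken into account here? This endpoint is served by the application, so as it stands it answers OK as soon as the application process and port are reachable, regardless of whether the functionality behind it is actually working.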

dapr configuration endpoint

The endpoint the Dapr sidecar calls to fetch configuration from the application. Its path is "/dapr/config":

@GetMapping(path = "/dapr/config", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprConfig() throws IOException {
  return ActorRuntime.getInstance().serializeConfig();
}

Judging from the ActorRuntime code, though, this config is specifically the actor configuration:

  public byte[] serializeConfig() throws IOException {
    return INTERNAL_SERIALIZER.serialize(this.config);
  }

  private ActorRuntime(ManagedChannel channel, DaprClient daprClient) throws IllegalStateException {
    this.config = new ActorRuntimeConfig();
  }

dapr subscribe endpoint

The endpoint the Dapr sidecar calls to fetch the application's pub/sub subscriptions. Its path is "/dapr/subscribe":

@GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprSubscribe() throws IOException {
  return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
}

actor endpoint

Endpoints for actors, covering deactivate, invoke actor method, invoke actor timer, and invoke actor reminder:

@DeleteMapping(path = "/actors/{type}/{id}")
  public Mono<Void> deactivateActor(@PathVariable("type") String type,
                                    @PathVariable("id") String id) {
    return ActorRuntime.getInstance().deactivate(type, id);
  }

  @PutMapping(path = "/actors/{type}/{id}/method/{method}")
  public Mono<byte[]> invokeActorMethod(@PathVariable("type") String type,
                                        @PathVariable("id") String id,
                                        @PathVariable("method") String method,
                                        @RequestBody(required = false) byte[] body) {
    return ActorRuntime.getInstance().invoke(type, id, method, body);
  }

  @PutMapping(path = "/actors/{type}/{id}/method/timer/{timer}")
  public Mono<Void> invokeActorTimer(@PathVariable("type") String type,
                                     @PathVariable("id") String id,
                                     @PathVariable("timer") String timer,
                                     @RequestBody byte[] body) {
    return ActorRuntime.getInstance().invokeTimer(type, id, timer, body);
  }

  @PutMapping(path = "/actors/{type}/{id}/method/remind/{reminder}")
  public Mono<Void> invokeActorReminder(@PathVariable("type") String type,
                                        @PathVariable("id") String id,
                                        @PathVariable("reminder") String reminder,
                                        @RequestBody(required = false) byte[] body) {
    return ActorRuntime.getInstance().invokeReminder(type, id, reminder, body);
  }

8.5.3 - topic subscription

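Implements topic subscription for pub/sub

Reading the topic subscription annotations

The code that subscribes to topics lives in the subscribeToTopics() method of DaprBeanPostProcessor and is called while each bean is being initialized.

An example of the Topic annotation in use: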

  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
          rule = @Rule(match = "event.type == \"v2\"", priority = 1))
  @PostMapping(path = "/testingtopicV2")
  public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
    ......
  }

Reading the Topic annotation

postProcessBeforeInitialization() therefore has to scan each bean and parse every method that carries the Topic annotation:

@Override
public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
  subscribeToTopics(bean.getClass(), embeddedValueResolver);
  return bean;
}

private static void subscribeToTopics(Class clazz, EmbeddedValueResolver embeddedValueResolver) {
    if (clazz == null) {
      return;
    }

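    // Recurse into the superclass first, so processing starts from the parent of the current class.
    // Because the parent always goes first, the recursion climbs all the way up to the top-level Object class.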
    subscribeToTopics(clazz.getSuperclass(), embeddedValueResolver);
    // Get all methods declared on the current class
    for (Method method : clazz.getDeclaredMethods()) {
      // Check whether the method carries the Dapr Topic annotation
      Topic topic = method.getAnnotation(Topic.class);
      if (topic == null) {
        continue;
      }

      // The method carries the Dapr Topic annotation, so process it.
      // First read the annotation attributes: topic name, pubsub name, rule
      Rule rule = topic.rule();
      String topicName = embeddedValueResolver.resolveStringValue(topic.name());
      String pubSubName = embeddedValueResolver.resolveStringValue(topic.pubsubName());
      // rule is itself an annotation; read its match attribute
      String match = embeddedValueResolver.resolveStringValue(rule.match());
      if ((topicName != null) && (topicName.length() > 0) && pubSubName != null && pubSubName.length() > 0) {
        // topicName and pubSubName must not be empty (metadata and rule may be empty)
        try {
          TypeReference<HashMap<String, String>> typeRef
                  = new TypeReference<HashMap<String, String>>() {};
          // Read the metadata attribute of the Topic annotation
          Map<String, String> metadata = MAPPER.readValue(topic.metadata(), typeRef);
          // Read the route information; details in the next section
          List<String> routes = getAllCompleteRoutesForPost(clazz, method, topicName);
          for (String route : routes) {
            // Add the route information to the Dapr runtime.
            // Details in the next section
            DaprRuntime.getInstance().addSubscribedTopic(
                pubSubName, topicName, match, rule.priority(), route, metadata);
          }
        } catch (JsonProcessingException e) {
          throw new IllegalArgumentException("Error while parsing metadata: " + e);
        }
      }
    }
  }
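The parent-first reflection scan can be tried out in isolation with a stripped-down annotation. The sketch below is self-contained and uses only the JDK: DemoTopic, BaseHandler, ChildHandler, and scan() are hypothetical stand-ins, and property placeholder resolution, rules, metadata, and routes are deliberately left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class TopicScanSketch {

  // Simplified stand-in for the SDK's @Topic; only two attributes are modeled.
  @Target(ElementType.METHOD)
  @Retention(RetentionPolicy.RUNTIME)
  @interface DemoTopic {
    String name();

    String pubsubName();
  }

  static class BaseHandler {
    @DemoTopic(name = "base-topic", pubsubName = "messagebus")
    public void onBase() { }
  }

  static class ChildHandler extends BaseHandler {
    @DemoTopic(name = "child-topic", pubsubName = "messagebus")
    public void onChild() { }
  }

  // Walks the class hierarchy parent-first, collecting "pubsub/topic" for each annotated method.
  static void collect(Class<?> clazz, List<String> found) {
    if (clazz == null) {
      return; // the superclass of java.lang.Object is null, which ends the recursion
    }
    collect(clazz.getSuperclass(), found);
    for (Method method : clazz.getDeclaredMethods()) {
      DemoTopic topic = method.getAnnotation(DemoTopic.class);
      if (topic == null) {
        continue; // not a subscription handler
      }
      found.add(topic.pubsubName() + "/" + topic.name());
    }
  }

  static List<String> scan(Class<?> clazz) {
    List<String> found = new ArrayList<>();
    collect(clazz, found);
    return found;
  }

  public static void main(String[] args) {
    System.out.println(scan(ChildHandler.class));
  }
}
```

Because getDeclaredMethods() only returns methods declared on the class itself, the explicit recursion into the superclass is what picks up handlers inherited from a base controller.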

Reading the route information

Routes are configured as follows:

  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
          rule = @Rule(match = "event.type == \"v2\"", priority = 1))
  @PostMapping(path = "/testingtopicV2")
  public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
    ......
  }

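getAllCompleteRoutesForPost() builds the complete routes for the handler method by combining the class-level @RequestMapping with the method-level POST mapping: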

private static List<String> getAllCompleteRoutesForPost(Class clazz, Method method, String topicName) {
    List<String> routesList = new ArrayList<>();
    RequestMapping clazzRequestMapping =
        (RequestMapping) clazz.getAnnotation(RequestMapping.class);
    String[] clazzLevelRoute = null;
    if (clazzRequestMapping != null) {
      clazzLevelRoute = clazzRequestMapping.value();
    }
    // Read the route information on the method; note that it must be a POST mapping
    String[] postValueArray = getRoutesForPost(method, topicName);
    if (postValueArray != null && postValueArray.length >= 1) {
      for (String postValue : postValueArray) {
        if (clazzLevelRoute != null && clazzLevelRoute.length >= 1) {
          for (String clazzLevelValue : clazzLevelRoute) {
            // The complete route is the class-level path plus the method-level path
            String route = clazzLevelValue + confirmLeadingSlash(postValue);
            routesList.add(route);
          }
        } else {
          routesList.add(postValue);
        }
      }
    }
    return routesList;
  }
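The combination step is a plain cross product of class-level prefixes and method-level paths, which the following Spring-free sketch reproduces. RouteSketch and completeRoutes() are hypothetical names, and the body of confirmLeadingSlash() is an assumption about what the helper of the same name does (it is not shown in the listing above).

```java
import java.util.ArrayList;
import java.util.List;

class RouteSketch {

  // Assumed behavior of the helper used above: prefix "/" when the path lacks one.
  static String confirmLeadingSlash(String path) {
    if (path != null && !path.isEmpty() && !path.startsWith("/")) {
      return "/" + path;
    }
    return path;
  }

  // Cross product of class-level prefixes and method-level paths.
  static List<String> completeRoutes(String[] classLevel, String[] methodLevel) {
    List<String> routes = new ArrayList<>();
    if (methodLevel == null) {
      return routes;
    }
    for (String methodPath : methodLevel) {
      if (classLevel != null && classLevel.length >= 1) {
        for (String prefix : classLevel) {
          routes.add(prefix + confirmLeadingSlash(methodPath));
        }
      } else {
        // No class-level mapping: the method-level path is used as-is.
        routes.add(methodPath);
      }
    }
    return routes;
  }

  public static void main(String[] args) {
    System.out.println(completeRoutes(new String[] {"/api", "/v2"}, new String[] {"testingtopicV2"}));
  }
}
```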

getRoutesForPost() reads the @PostMapping annotation on the method that carries the @Topic annotation in order to obtain the route path, as in this example:

  @Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
          rule = @Rule(match = "event.type == \"v2\"", priority = 1))
  @PostMapping(path = "/testingtopicV2")
  public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
    ......
  }

The implementation of getRoutesForPost() is as follows:

private static String[] getRoutesForPost(Method method, String topicName) {
    String[] postValueArray = new String[] {topicName};
    // Read the PostMapping annotation
    PostMapping postMapping = method.getAnnotation(PostMapping.class);
    if (postMapping != null) {
      // A PostMapping annotation is present
      if (postMapping.path() != null && postMapping.path().length >= 1) {
        // If the path attribute is set, take the value from it
        postValueArray = postMapping.path();
      } else if (postMapping.value() != null && postMapping.value().length >= 1) {
        // If the path attribute is not set, take the value from the value attribute of PostMapping
        postValueArray = postMapping.value();
      }
    } else {
      // No PostMapping annotation: try the RequestMapping annotation instead
      RequestMapping reqMapping = method.getAnnotation(RequestMapping.class);
      for (RequestMethod reqMethod : reqMapping.method()) {
        // The RequestMethod must be POST
        if (reqMethod == RequestMethod.POST) {
          // Again read the value of path or value
          if (reqMapping.path() != null && reqMapping.path().length >= 1) {
            postValueArray = reqMapping.path();
          } else if (reqMapping.value() != null && reqMapping.value().length >= 1) {
            postValueArray = reqMapping.value();
          }
          break;
        }
      }
    }
    return postValueArray;
  }

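To summarize, getRoutesForPost() reads the route from the method that carries the @Topic annotation, that is, the address to which events for the subscription should later be delivered. The lookup rules are:

Prefer the PostMapping annotation; if it is absent, fall back to a RequestMapping annotation whose RequestMethod is POST
Prefer the path attribute of that annotation; if it is not set, fall back to value
If neither attribute yields a path, the topic name itself is used as the route

Saving the topic subscription information

Once the topic subscription information has been read, it is saved through the addSubscribedTopic() method of DaprRuntime: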

public synchronized void addSubscribedTopic(String pubsubName,
                                              String topicName,
                                              String match,
                                              int priority,
                                              String route,
                                              Map<String,String> metadata) {
    // Use pubsubName and topicName as the key
    DaprTopicKey topicKey = new DaprTopicKey(pubsubName, topicName);

    // Get the builder for the key, creating one if it does not exist yet
    DaprSubscriptionBuilder builder = subscriptionBuilders.get(topicKey);
    if (builder == null) {
      builder = new DaprSubscriptionBuilder(pubsubName, topicName);
      subscriptionBuilders.put(topicKey, builder);
    }

    // A non-empty match adds a rule; an empty match sets the default path
    if (match.length() > 0) {
      builder.addRule(route, match, priority);
    } else {
      builder.setDefaultPath(route);
    }

    if (metadata != null && !metadata.isEmpty()) {
      builder.setMetadata(metadata);
    }
  }

Recall that the calling code is:

// Read the route information
List<String> routes = getAllCompleteRoutesForPost(clazz, method, topicName);
for (String route : routes) {
  // Add the route information to the Dapr runtime.
  DaprRuntime.getInstance().addSubscribedTopic(
      pubSubName, topicName, match, rule.priority(), route, metadata);
}

So the whole reading phase described above boils down to collecting these six topic-subscription parameters and then saving them.

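Answering with the topic subscription information

In DaprController, the daprSubscribe() method exposes the path /dapr/subscribe so that the Dapr sidecar can read it to obtain the topic subscriptions of the current application: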

@GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprSubscribe() throws IOException {
  return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
}

The listSubscribedTopics() method of DaprRuntime returns exactly the topic subscription information saved earlier:

  public synchronized DaprTopicSubscription[] listSubscribedTopics() {
    List<DaprTopicSubscription> values = subscriptionBuilders.values().stream()
            .map(b -> b.build()).collect(Collectors.toList());
    return values.toArray(new DaprTopicSubscription[0]);
  }
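The save-then-list pattern can be sketched without the SDK types. In this minimal sketch, SubscriptionRegistrySketch, Subscription, and Rule are hypothetical stand-ins for DaprRuntime, DaprSubscriptionBuilder, and its rule entries; metadata is omitted, and sorting rules by priority at insertion time is an assumption, since the builder's internals are not shown above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubscriptionRegistrySketch {

  // One routing rule: match expression, target path, and priority (lower wins).
  static final class Rule {
    final String match;
    final String path;
    final int priority;

    Rule(String match, String path, int priority) {
      this.match = match;
      this.path = path;
      this.priority = priority;
    }
  }

  // Everything registered for one (pubsub, topic) pair.
  static final class Subscription {
    final String pubsubName;
    final String topic;
    final List<Rule> rules = new ArrayList<>();
    String defaultPath;

    Subscription(String pubsubName, String topic) {
      this.pubsubName = pubsubName;
      this.topic = topic;
    }
  }

  private final Map<String, Subscription> subscriptions = new LinkedHashMap<>();

  synchronized void add(String pubsubName, String topic, String match, int priority, String route) {
    // Registrations for the same pubsub and topic accumulate in a single entry.
    Subscription subscription = subscriptions.computeIfAbsent(
        pubsubName + "|" + topic, key -> new Subscription(pubsubName, topic));
    if (match != null && !match.isEmpty()) {
      subscription.rules.add(new Rule(match, route, priority));
      // Assumption: rules are kept ordered so that the lowest priority number comes first.
      subscription.rules.sort(Comparator.comparingInt((Rule rule) -> rule.priority));
    } else {
      subscription.defaultPath = route;
    }
  }

  synchronized List<Subscription> list() {
    return new ArrayList<>(subscriptions.values());
  }
}
```

Keying the map by pubsub name and topic is what lets several handler methods (a default handler plus rule-specific ones) end up in one subscription entry.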

Flow summary

The whole topic subscription flow is illustrated below:

title topic subscription
hide footbox
skinparam style strictuml


box "Application" #LightBlue
participant DaprBeanPostProcessor
participant bean
participant DaprRuntime
participant DaprController
end box
participant daprd

-> DaprBeanPostProcessor: postProcessBeforeInitialization(bean)

DaprBeanPostProcessor -> bean: get @topic
bean --> DaprBeanPostProcessor
 
alt if bean has @topic
DaprBeanPostProcessor -> bean: parse @topic @rule
bean --> DaprBeanPostProcessor: pubsub name, topic name, match,\n priority, routes, metadata

DaprBeanPostProcessor -> DaprRuntime: addSubscribedTopic()

DaprRuntime -> DaprRuntime: save in map\n subscriptionBuilders

DaprRuntime --> DaprBeanPostProcessor
end
<-- DaprBeanPostProcessor

daprd -> DaprController: get subscription
DaprController -> DaprRuntime: listSubscribedTopics()
DaprRuntime --> DaprController
DaprController --> daprd

8.6 - workflow

SDK subproject: workflow

8.6.1 - Workflow definition

The workflow definition

Workflow

The Workflow definition is very simple:

public abstract class Workflow {
  // the explicit default constructor could probably be omitted
  public Workflow(){
  }

  public abstract WorkflowStub create();

  public void run(WorkflowContext ctx) {
    this.create().run(ctx);
  }
}

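create() is the template method that builds a WorkflowStub; run() calls create() to obtain the WorkflowStub and then executes the stub's run() method.

WorkflowStub

WorkflowStub is a single-method interface intended for functional-style programming, and it is annotated with @java.lang.FunctionalInterface.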

@FunctionalInterface
public interface WorkflowStub {
  void run(WorkflowContext ctx);
}

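The javadoc of @FunctionalInterface reads:

An informative annotation type used to indicate that an interface type declaration is intended to be a functional interface as defined by the Java Language Specification. Conceptually, a functional interface has exactly one abstract method. Since default methods have an implementation, they are not abstract. If an interface declares an abstract method overriding one of the public methods of java.lang.Object, that also does not count toward the interface's abstract method count since any implementation of the interface will have an implementation from java.lang.Object or elsewhere.

Note that instances of functional interfaces can be created with lambda expressions, method references, or constructor references.

If a type is annotated with this annotation type, compilers are required to generate an error message unless:

The type is an interface type and not an annotation type, enum, or class.

The annotated type satisfies the requirements of a functional interface.

However, the compiler will treat any interface meeting the definition of a functional interface as a functional interface regardless of whether or not a FunctionalInterface annotation is present on the interface declaration.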
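How the template method and the functional interface fit together can be shown with a small self-contained sketch. DemoContext, DemoStub, DemoWorkflow, and GreetingWorkflow are hypothetical stand-ins for WorkflowContext, WorkflowStub, Workflow, and a user-defined workflow; the real context offers far more than a log list.

```java
import java.util.ArrayList;
import java.util.List;

class WorkflowStubSketch {

  // Minimal stand-in for WorkflowContext: it only records what the workflow did.
  static class DemoContext {
    final List<String> log = new ArrayList<>();
  }

  // Same shape as WorkflowStub: a single abstract method, so a lambda can implement it.
  @FunctionalInterface
  interface DemoStub {
    void run(DemoContext ctx);
  }

  // Same shape as Workflow: create() is the hook, run() is the fixed template.
  abstract static class DemoWorkflow {
    abstract DemoStub create();

    void run(DemoContext ctx) {
      create().run(ctx);
    }
  }

  static class GreetingWorkflow extends DemoWorkflow {
    @Override
    DemoStub create() {
      // The lambda body becomes the implementation of DemoStub.run(ctx).
      return ctx -> ctx.log.add("hello from workflow");
    }
  }

  public static void main(String[] args) {
    DemoContext ctx = new DemoContext();
    new GreetingWorkflow().run(ctx);
    System.out.println(ctx.log);
  }
}
```

With this split, a workflow author only writes the lambda returned from create(), while the base class decides when and with which context it runs.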

WorkflowContext

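Surprisingly, the definition of WorkflowContext is quite complex, far more than a simple context object.

Basic methods of WorkflowContext

The WorkflowContext interface defines a large number of methods. Some of the basic ones are: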

public interface WorkflowContext {
  // exposes the logger object used to print logs during subsequent execution
  Logger getLogger();

  // gets the name of the workflow
  String getName();

  // gets the id of the workflow instance
  String getInstanceId();

  // gets the current orchestration time (UTC)
  Instant getCurrentInstant();

  // completes the current workflow; output is the serialized output of the completed workflow
  void complete(Object output);
  ......
}

waitForExternalEvent() method

The WorkflowContext interface defines three waitForExternalEvent() interface methods and one default implementation:

public interface WorkflowContext {
  ......
  <V> Task<V> waitForExternalEvent(String name, Duration timeout, Class<V> dataType) throws TaskCanceledException;

  <V> Task<Void> waitForExternalEvent(String name, Duration timeout) throws TaskCanceledException;

  <V> Task<Void> waitForExternalEvent(String name) throws TaskCanceledException;

  default <V> Task<V> waitForExternalEvent(String name, Class<V> dataType) {
    try {
      return this.waitForExternalEvent(name, null, dataType);
    } catch (TaskCanceledException e) {
      // This should never happen because of the max duration
      throw new RuntimeException("An unexpected exception was throw while waiting for an external event.", e);
    }
  }
  ......
}

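The javadoc of waitForExternalEvent reads:

Waits for an event named name to be raised and returns a Task that completes when the event is received or is canceled when the timeout expires.

If the current orchestrator is not yet waiting for an event named name, the event is saved in the orchestration instance state and dispatched immediately when this method is called. This event saving occurs even if the current orchestrator cancels the wait operation before the event is received.

Orchestrators can wait for the same event name multiple times, so waiting for multiple events with the same name is allowed. Each external event received by an orchestrator completes just one task returned by this method.

Note in particular: this Task type is com.microsoft.durabletask.Task. Using it directly in the Dapr workflow interface definition means that Dapr workflow is tightly coupled to durabletask.

callActivity() method

The WorkflowContext interface defines one callActivity() interface method plus several default methods that overload callActivity() with different parameter lists: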

public interface WorkflowContext {
  ......
  <V> Task<V> callActivity(String name, Object input, TaskOptions options, Class<V> returnType);

  default Task<Void> callActivity(String name) {
    return this.callActivity(name, null, null, Void.class);
  }

  default Task<Void> callActivity(String name, Object input) {
    return this.callActivity(name, input, null, Void.class);
  }

  default <V> Task<V> callActivity(String name, Class<V> returnType) {
    return this.callActivity(name, null, null, returnType);
  }

  default <V> Task<V> callActivity(String name, Object input, Class<V> returnType) {
    return this.callActivity(name, input, null, returnType);
  }

  default Task<Void> callActivity(String name, Object input, TaskOptions options) {
    return this.callActivity(name, input, options, Void.class);
  }
  ......
}

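The javadoc of callActivity reads:

Asynchronously invokes an activity with the specified input and returns a new task that completes when the activity completes. If the activity completes successfully, the value of the returned task is the activity's output. If the activity fails, the returned task completes exceptionally with a TaskFailedException.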
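The overload pattern itself, one abstract method plus default methods that fill in the missing arguments, can be sketched without durabletask. Everything here is hypothetical: OverloadSketch, DemoContext, and describing() are illustration names, TaskOptions is dropped, and the methods return a descriptive String instead of a Task so that the delegation is easy to observe.

```java
class OverloadSketch {

  interface DemoContext {
    // The only method an implementation has to provide.
    <V> String callActivity(String name, Object input, Class<V> returnType);

    // Convenience overloads supply defaults and delegate to the full signature.
    default String callActivity(String name) {
      return callActivity(name, null, Void.class);
    }

    default String callActivity(String name, Object input) {
      return callActivity(name, input, Void.class);
    }

    // When the second argument is a Class, Java picks this overload over the Object one
    // because Class<V> is the more specific parameter type.
    default <V> String callActivity(String name, Class<V> returnType) {
      return callActivity(name, null, returnType);
    }
  }

  // An implementation that just describes the call it received.
  static DemoContext describing() {
    return new DemoContext() {
      @Override
      public <V> String callActivity(String name, Object input, Class<V> returnType) {
        return name + "(" + input + ") -> " + returnType.getSimpleName();
      }
    };
  }

  public static void main(String[] args) {
    DemoContext ctx = describing();
    System.out.println(ctx.callActivity("Notify"));
    System.out.println(ctx.callActivity("Sum", 42));
    System.out.println(ctx.callActivity("Get", String.class));
  }
}
```

Keeping a single abstract method means an implementing class such as DaprWorkflowContextImpl only has to supply one real implementation per operation.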

isReplaying() method

isReplaying() reports whether the workflow is currently replaying a previous execution:

public interface WorkflowContext {
  ......
  boolean isReplaying();
}

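The javadoc of isReplaying reads:

Gets a value indicating whether the workflow is currently replaying a previous execution.

Workflow functions are "replayed" after being unloaded from memory in order to reconstruct local variable state. During a replay, previously executed tasks are completed automatically with the previously seen values stored in the workflow history. Once the workflow reaches the point where it is no longer replaying existing history, this method returns false.

You can use this method if you have logic that needs to run only when not replaying. For example, certain types of application logging may become too noisy when duplicated as part of a replay. The application code can check whether the function is being replayed and then issue the log statements only when this value is false.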
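The logging use case can be illustrated with a tiny replay-aware logger. This is a sketch under stated assumptions: ReplayAwareLogSketch is a hypothetical class, the replay flag is supplied through a BooleanSupplier instead of a real workflow context, and messages are collected in a list instead of going to a logging framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class ReplayAwareLogSketch {

  private final BooleanSupplier replaying;
  private final List<String> lines = new ArrayList<>();

  ReplayAwareLogSketch(BooleanSupplier replaying) {
    this.replaying = replaying;
  }

  // Drops the message while history is being replayed, records it otherwise.
  void info(String message) {
    if (replaying.getAsBoolean()) {
      return;
    }
    lines.add(message);
  }

  List<String> lines() {
    return lines;
  }

  public static void main(String[] args) {
    boolean[] replayFlag = {true};
    ReplayAwareLogSketch log = new ReplayAwareLogSketch(() -> replayFlag[0]);
    log.info("step 1 done"); // suppressed: this step is being replayed from history
    replayFlag[0] = false;
    log.info("step 2 done"); // recorded: this step is executing for the first time
    System.out.println(log.lines());
  }
}
```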

allOf() and anyOf() methods

  <V> Task<List<V>> allOf(List<Task<V>> tasks) throws CompositeTaskFailedException;

  Task<Task<?>> anyOf(List<Task<?>> tasks);

  default Task<Task<?>> anyOf(Task<?>... tasks) {
    return this.anyOf(Arrays.asList(tasks));
  }

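The javadoc of allOf reads:

Returns a new task that completes when all the given tasks complete. If any of the given tasks completes with an exception, the returned task also completes with a CompositeTaskFailedException containing details of the first encountered failure. The value of the returned task is an ordered list of the return values of the given tasks. If no tasks are provided, returns an already completed task with an empty value.

This method is useful for awaiting the completion of a set of independent tasks before continuing to the next step in the orchestration, as in the following example:

  Task<String> t1 = ctx.callActivity("MyActivity", String.class);
  Task<String> t2 = ctx.callActivity("MyActivity", String.class);
  Task<String> t3 = ctx.callActivity("MyActivity", String.class);

  List<String> orderedResults = ctx.allOf(List.of(t1, t2, t3)).await();

An exception in any of the given tasks results in an unchecked CompositeTaskFailedException. This exception can be inspected to obtain the failure details of individual tasks.

  try {
    List<String> orderedResults = ctx.allOf(List.of(t1, t2, t3)).await();
  } catch (CompositeTaskFailedException e) {
    List<Exception> exceptions = e.getExceptions();
  }

Note in particular: this CompositeTaskFailedException type is com.microsoft.durabletask.CompositeTaskFailedException. Using it directly in the Dapr workflow interface definition means that Dapr workflow is tightly coupled to durabletask.

The javadoc of anyOf reads:

Returns a new task that completes when any of the given tasks completes. The value of the new task is a reference to the completed task object. If no tasks are provided, returns a task that never completes.

This method is useful for waiting on multiple concurrent tasks and performing a task-specific operation when the first one completes, as in the following example:

  Task<Void> event1 = ctx.waitForExternalEvent("Event1");
  Task<Void> event2 = ctx.waitForExternalEvent("Event2");
  Task<Void> event3 = ctx.waitForExternalEvent("Event3");

  Task<?> winner = ctx.anyOf(event1, event2, event3).await();
  if (winner == event1) {
    // ...
  } else if (winner == event2) {
    // ...
  } else if (winner == event3) {
    // ...
  }

The anyOf method can also be used to implement long-running timeouts, as in the following example:

  Task<Void> activityTask = ctx.callActivity("SlowActivity");
  Task<Void> timeoutTask = ctx.createTimer(Duration.ofMinutes(30));

  Task<?> winner = ctx.anyOf(activityTask, timeoutTask).await();
  if (winner == activityTask) {
    // completion case
  } else {
    // timeout case
  }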
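The two result shapes (an ordered list for allOf, the winning task itself for anyOf) can be mimicked with the JDK's CompletableFuture. This is only an analogy under stated assumptions: FanInSketch and its two helpers are hypothetical, CompletableFuture has none of the durability or replay behavior of durabletask's Task, and failure handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class FanInSketch {

  // Completes with the results of all inputs, in input order rather than completion order.
  static <V> CompletableFuture<List<V>> allOf(List<CompletableFuture<V>> tasks) {
    CompletableFuture<Void> all = CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]));
    return all.thenApply(ignored -> {
      List<V> results = new ArrayList<>();
      for (CompletableFuture<V> task : tasks) {
        results.add(task.join()); // every task is already complete here, so join() does not block
      }
      return results;
    });
  }

  // Completes with the first input that finishes; the value is the winning future itself.
  static CompletableFuture<CompletableFuture<?>> anyOf(List<CompletableFuture<?>> tasks) {
    CompletableFuture<CompletableFuture<?>> winner = new CompletableFuture<>();
    for (CompletableFuture<?> task : tasks) {
      // complete() only takes effect for the first caller, so later finishers are ignored.
      task.whenComplete((value, error) -> winner.complete(task));
    }
    return winner;
  }

  public static void main(String[] args) {
    CompletableFuture<String> first = CompletableFuture.completedFuture("a");
    CompletableFuture<String> second = CompletableFuture.completedFuture("b");
    System.out.println(allOf(List.of(first, second)).join());

    CompletableFuture<Void> neverFinishes = new CompletableFuture<>();
    List<CompletableFuture<?>> race = List.of(neverFinishes, first);
    System.out.println(anyOf(race).join() == first);
  }
}
```

Returning the winning task instead of its value is what makes the identity comparisons such as winner == event1 in the javadoc examples possible.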

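createTimer() method

Creates a durable timer that expires after the specified delay.

Specifying a long delay (for example, a delay of a few days or more) may result in the creation of multiple, internally managed durable timers. The orchestrator code does not need to be aware of this behavior. However, it may be visible in framework logs and in the stored history state.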

  Task<Void> createTimer(Duration duration);

  default Task<Void> createTimer(ZonedDateTime zonedDateTime) {
    throw new UnsupportedOperationException("This method is not implemented.");
  }

getInput() method

getInput() returns the deserialized input of the current task orchestration.

<V> V getInput(Class<V> targetType);

callSubWorkflow()

callSubWorkflow() asynchronously invokes another workflow as a sub-workflow:

  default Task<Void> callSubWorkflow(String name) {
    return this.callSubWorkflow(name, null);
  }

  default Task<Void> callSubWorkflow(String name, Object input) {
    return this.callSubWorkflow(name, input, null);
  }

  default <V> Task<V> callSubWorkflow(String name, Object input, Class<V> returnType) {
    return this.callSubWorkflow(name, input, null, returnType);
  }

  default <V> Task<V> callSubWorkflow(String name, Object input, String instanceID, Class<V> returnType) {
    return this.callSubWorkflow(name, input, instanceID, null, returnType);
  }

  default Task<Void> callSubWorkflow(String name, Object input, String instanceID, TaskOptions options) {
    return this.callSubWorkflow(name, input, instanceID, options, Void.class);
  }

  <V> Task<V> callSubWorkflow(String name,
                              @Nullable Object input,
                              @Nullable String instanceID,
                              @Nullable TaskOptions options,
                              Class<V> returnType);

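The javadoc of callSubWorkflow() reads:

Asynchronously invokes another workflow as a sub-workflow and returns a task that completes when the sub-workflow completes. If the sub-workflow completes successfully, the value of the returned task is the sub-workflow's output. If the sub-workflow fails, the returned task completes exceptionally with a TaskFailedException.

A sub-workflow has its own instance ID, history, and status, independent of the parent workflow that started it. Breaking a large orchestration down into sub-workflows has several advantages:

Splitting a large orchestration into a series of smaller sub-workflows can make the code more maintainable.

Distributing orchestration logic across multiple compute nodes concurrently is useful when the orchestration logic needs to coordinate a large number of tasks.

Memory usage and CPU overhead can be reduced by keeping the history of the parent orchestration smaller.

The disadvantage is that starting a sub-workflow and processing its output incurs overhead. This is typically only a concern for very small orchestrations.

Because sub-workflows are independent of their parents, terminating a parent orchestration does not affect any sub-workflows.

continueAsNew()

continueAsNew() restarts the orchestration with a new input and clears its history: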

  default void continueAsNew(Object input) {
    this.continueAsNew(input, true);
  }

  void continueAsNew(Object input, boolean preserveUnprocessedEvents);
}

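The javadoc of continueAsNew() reads:

Restarts the orchestration with a new input and clears its history.

This method is primarily designed for eternal orchestrations, that is, orchestrations that may never complete. It works by restarting the orchestration, providing it with a new input, and truncating the existing orchestration history. This allows an orchestration to keep running indefinitely without its history growing without bound. The benefits of periodically truncating the history include lower memory usage, smaller storage volumes, and shorter orchestrator replays when rebuilding state.

When an orchestrator calls continueAsNew, the results of any incomplete tasks are discarded. For example, if a timer is scheduled and continueAsNew is called before the timer fires, the timer event is discarded. The only exception is external events. By default, if an orchestration has received an external event but not yet processed it, the event is saved in the orchestration state until it is consumed by a call to waitForExternalEvent. These events remain in memory even after the orchestrator restarts via continueAsNew. This behavior can be disabled by passing false for the preserveUnprocessedEvents parameter.

Orchestrator implementations should complete immediately after calling the continueAsNew method.

8.6.2 - DaprWorkflowContextImpl implementation

The DaprWorkflowContextImpl implementation

Class definition

DaprWorkflowContextImpl implements the WorkflowContext interface by delegating to an internal field, innerContext, which is a com.microsoft.durabletask.TaskOrchestrationContext: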

import com.microsoft.durabletask.TaskOrchestrationContext;

public class DaprWorkflowContextImpl implements WorkflowContext {
  private final TaskOrchestrationContext innerContext;
  private final Logger logger;
  ......
}

The constructors just assign the fields after the necessary null checks:

public DaprWorkflowContextImpl(TaskOrchestrationContext context) throws IllegalArgumentException {
    this(context, LoggerFactory.getLogger(WorkflowContext.class));
  }

  public DaprWorkflowContextImpl(TaskOrchestrationContext context, Logger logger) throws IllegalArgumentException {
    if (context == null) {
      throw new IllegalArgumentException("Context cannot be null");
    }
    if (logger == null) {
      throw new IllegalArgumentException("Logger cannot be null");
    }

    this.innerContext = context;
    this.logger = logger;
  }

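Method implementations

Except for getLogger(), which additionally returns a no-op logger while the workflow is replaying, every method simply delegates to the corresponding method on innerContext. The names mostly match; isReplaying() maps to getIsReplaying() and callSubWorkflow() maps to callSubOrchestrator():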

  public Logger getLogger() {
    if (this.innerContext.getIsReplaying()) {
      return NOPLogger.NOP_LOGGER;
    }
    return this.logger;
  }

  public String getName() {
    return this.innerContext.getName();
  }

  public String getInstanceId() {
    return this.innerContext.getInstanceId();
  }

  public Instant getCurrentInstant() {
    return this.innerContext.getCurrentInstant();
  }

  public boolean isReplaying() {
    return this.innerContext.getIsReplaying();
  }

  public <V> Task<V> callSubWorkflow(String name, @Nullable Object input, @Nullable String instanceID,
                                     @Nullable TaskOptions options, Class<V> returnType) {

    return this.innerContext.callSubOrchestrator(name, input, instanceID, options, returnType);
  }

  public void continueAsNew(Object input) {
    this.innerContext.continueAsNew(input);
  }

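Summary

This class is essentially a thin wrapper around com.microsoft.durabletask.TaskOrchestrationContext: all functionality is delegated to it, and the design and most of the method names follow it as well.

The Dapr workflow implementation is, for all practical purposes, completely bound to durabletask.

8.6.3 - runtime package

Code in the runtime package

8.6.3.1 - WorkflowRuntime implementation

The WorkflowRuntime implementation

WorkflowRuntime is a thin wrapper around durabletask's DurableTaskGrpcWorker: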

import com.microsoft.durabletask.DurableTaskGrpcWorker;

public class WorkflowRuntime implements AutoCloseable {

  private DurableTaskGrpcWorker worker;

  public WorkflowRuntime(DurableTaskGrpcWorker worker) {
    this.worker = worker;
  }
  ......   
}

start() and close() are then simply delegated to durabletask's DurableTaskGrpcWorker:

  public void start() {
    this.start(true);
  }

  public void start(boolean block) {
    if (block) {
      this.worker.startAndBlock();
    } else {
      this.worker.start();
    }
  }

  public void close() {
    if (this.worker != null) {
      this.worker.close();
      this.worker = null;
    }
  }

8.6.3.2 - WorkflowRuntimeBuilder implementation

The WorkflowRuntimeBuilder implementation

Class definition

WorkflowRuntimeBuilder builds a WorkflowRuntime. Just as WorkflowRuntime is a thin wrapper around durabletask's DurableTaskGrpcWorker, WorkflowRuntimeBuilder is a thin wrapper around durabletask's DurableTaskGrpcWorkerBuilder:

import com.microsoft.durabletask.DurableTaskGrpcWorkerBuilder;

public class WorkflowRuntimeBuilder {
  private static volatile WorkflowRuntime instance;
  private DurableTaskGrpcWorkerBuilder builder;

  public WorkflowRuntimeBuilder() {
    this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(NetworkUtils.buildGrpcManagedChannel());
  }
  ......
}

The details of grpcChannel() are covered later.

registerWorkflow() method

registerWorkflow() registers a workflow class; it delegates to the addOrchestration() method of DurableTaskGrpcWorkerBuilder:

  public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
    this.builder = this.builder.addOrchestration(
        new OrchestratorWrapper<>(clazz)
    );

    return this;
  }

registerActivity() method

registerActivity() registers an activity class; it delegates to the addActivity() method of DurableTaskGrpcWorkerBuilder:

  public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
    this.builder = this.builder.addActivity(
        new ActivityWrapper<>(clazz)
    );
  }

build() method

build() implements a simple singleton, so only one WorkflowRuntime instance is ever built:

private static volatile WorkflowRuntime instance;  

public WorkflowRuntime build() {
    if (instance == null) {
      synchronized (WorkflowRuntime.class) {
        if (instance == null) {
          instance = new WorkflowRuntime(this.builder.build());
        }
      }
    }
    return instance;
  }
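
One consequence of this singleton is worth spelling out: the instance is static while the registrations live in each builder object, so whichever builder calls build() first decides what the runtime contains. The following is a minimal, self-contained sketch of that behaviour. RuntimeSketch and RuntimeBuilderSketch are hypothetical stand-ins, not Dapr SDK classes, and a plain list of names replaces the durabletask worker builder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for WorkflowRuntime: it only remembers what was registered.
final class RuntimeSketch {
  private final List<String> registered;

  RuntimeSketch(List<String> registered) {
    this.registered = Collections.unmodifiableList(new ArrayList<>(registered));
  }

  List<String> registered() {
    return registered;
  }
}

// Hypothetical stand-in for WorkflowRuntimeBuilder, using the same singleton pattern.
final class RuntimeBuilderSketch {
  // volatile is what makes double-checked locking safe under the Java memory model
  private static volatile RuntimeSketch instance;
  private final List<String> names = new ArrayList<>();

  RuntimeBuilderSketch register(String name) {
    names.add(name);
    return this;
  }

  RuntimeSketch build() {
    if (instance == null) {                    // first check, without the lock
      synchronized (RuntimeSketch.class) {
        if (instance == null) {                // second check, under the lock
          instance = new RuntimeSketch(names);
        }
      }
    }
    return instance;                           // later builders get the first instance
  }
}
```

In this sketch, a second builder that registers "B" and then calls build() gets back the first runtime, which only knows about "A". If the SDK code shown above behaves the same way, all workflows and activities should be registered on one builder before build() is called.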

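Details of building the grpcChannel

DurableTaskGrpcWorkerBuilder needs a grpcChannel when it is constructed, and this grpcChannel is created by the NetworkUtils.buildGrpcManagedChannel() method.

NetworkUtils is a general-purpose network utility class located in sdk/src/main/java/io/dapr/utils/NetworkUtils.java. The property definitions that the method reads (accessed through the Properties class) and the implementation of buildGrpcManagedChannel() are as follows: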

  
private static final String DEFAULT_SIDECAR_IP = "127.0.0.1";
private static final Integer DEFAULT_GRPC_PORT = 50001;

public static final Property<String> SIDECAR_IP = new StringProperty(
      "dapr.sidecar.ip",
      "DAPR_SIDECAR_IP",
      DEFAULT_SIDECAR_IP);

  public static final Property<Integer> GRPC_PORT = new IntegerProperty(
      "dapr.grpc.port",
      "DAPR_GRPC_PORT",
      DEFAULT_GRPC_PORT);

  public static final Property<String> GRPC_ENDPOINT = new StringProperty(
      "dapr.grpc.endpoint",
      "DAPR_GRPC_ENDPOINT",
      null);

public static ManagedChannel buildGrpcManagedChannel() {
    // read the dapr sidecar IP from a system property or environment variable
    String address = Properties.SIDECAR_IP.get();
    // read the dapr grpc port from a system property or environment variable
    int port = Properties.GRPC_PORT.get();
    // TLS is not used by default
    boolean insecure = true;
    // read the dapr grpc endpoint from a system property or environment variable
    String grpcEndpoint = Properties.GRPC_ENDPOINT.get();
    if ((grpcEndpoint != null) && !grpcEndpoint.isEmpty()) {
      // if the dapr grpc endpoint is set, its content overrides the values above
      URI uri = URI.create(grpcEndpoint);
      // the scheme decides between plaintext (http) and TLS (anything else, e.g. https)
      insecure = uri.getScheme().equalsIgnoreCase("http");
      // use the port from grpcEndpoint if present, otherwise 80 for http and 443 for https
      port = uri.getPort() > 0 ? uri.getPort() : (insecure ? 80 : 443);
      // override the dapr sidecar address
      address = uri.getHost();
      if ((uri.getPath() != null) && !uri.getPath().isEmpty()) {
        address += uri.getPath();
      }
    }
    
    // build a grpc channel connected to the resolved address
    ManagedChannelBuilder<?> builder = ManagedChannelBuilder.forAddress(address, port)
        .userAgent(Version.getSdkVersion());
    if (insecure) {
      builder = builder.usePlaintext();
    }
    return builder.build();
  }
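
To make the override rules easier to follow, here is a minimal sketch that reproduces only the address, port and plaintext decision from the method above, using java.net.URI from the JDK. GrpcTargetSketch and its resolve() method are hypothetical names introduced for illustration; the sketch takes the three configuration values as parameters instead of reading them from Properties, and it does not create a gRPC channel.

```java
import java.net.URI;

// Hypothetical helper: mirrors the address/port/plaintext decision only.
final class GrpcTargetSketch {
  final String address;
  final int port;
  final boolean insecure;

  private GrpcTargetSketch(String address, int port, boolean insecure) {
    this.address = address;
    this.port = port;
    this.insecure = insecure;
  }

  static GrpcTargetSketch resolve(String sidecarIp, int grpcPort, String grpcEndpoint) {
    String address = sidecarIp;
    int port = grpcPort;
    boolean insecure = true;                       // plaintext unless an endpoint says otherwise
    if (grpcEndpoint != null && !grpcEndpoint.isEmpty()) {
      URI uri = URI.create(grpcEndpoint);
      insecure = uri.getScheme().equalsIgnoreCase("http");
      // URI.getPort() returns -1 when no port is given
      port = uri.getPort() > 0 ? uri.getPort() : (insecure ? 80 : 443);
      address = uri.getHost();
      if (uri.getPath() != null && !uri.getPath().isEmpty()) {
        address += uri.getPath();
      }
    }
    return new GrpcTargetSketch(address, port, insecure);
  }
}
```

With no endpoint, the sketch falls back to the sidecar IP and gRPC port with plaintext. With an endpoint such as https://dapr.example.com (a made-up host), it resolves to port 443 with TLS. Because the logic calls uri.getScheme() unconditionally, the endpoint in this sketch has to include a scheme such as http:// or https://.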

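From a deployment point of view, the workflow runtime runs inside the application process on the client side and connects to the dapr sidecar through the durabletask SDK, over gRPC.

This design is a little odd: the dapr SDK and the dapr sidecar do not communicate through the standard dapr API here, but through the durabletask SDK.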

8.6.3.3 - OrchestratorWrapper implementation

Code implementation of OrchestratorWrapper

Background

When the registerWorkflow() method of WorkflowRuntimeBuilder registers a workflow class, it delegates to the addOrchestration() method of DurableTaskGrpcWorkerBuilder:

import com.microsoft.durabletask.TaskOrchestrationFactory;  

public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
    this.builder = this.builder.addOrchestration(
        new OrchestratorWrapper<>(clazz)
    );

    return this;
  }

The addOrchestration() method takes a com.microsoft.durabletask.TaskOrchestrationFactory as its parameter:

public interface TaskOrchestrationFactory {
    String getName();
    TaskOrchestration create();
}

An implementation of TaskOrchestrationFactory therefore has to be provided.

Class definition

The OrchestratorWrapper class implements the com.microsoft.durabletask.TaskOrchestrationFactory interface:

class OrchestratorWrapper<T extends Workflow> implements TaskOrchestrationFactory {
  private final Constructor<T> workflowConstructor;
  private final String name;
  ......  
}

Constructor:

  public OrchestratorWrapper(Class<T> clazz) {
    // obtain and store the name
    this.name = clazz.getCanonicalName();
    try {
      // obtain the no-argument Constructor
      this.workflowConstructor = clazz.getDeclaredConstructor();
    } catch (NoSuchMethodException e) {
      throw new RuntimeException(
          String.format("No constructor found for workflow class '%s'.", this.name), e
      );
    }
  }

Interface implementation

The getName() method required by the TaskOrchestrationFactory interface simply returns the name obtained above:

  @Override
  public String getName() {
    return name;
  }

The create() method required by the TaskOrchestrationFactory interface has to return a durabletask TaskOrchestration. TaskOrchestration is a @FunctionalInterface with a single run() method:

@FunctionalInterface
public interface TaskOrchestration {
    void run(TaskOrchestrationContext ctx);
}

The TaskOrchestration instance can therefore be written compactly as a lambda:

import com.microsoft.durabletask.TaskOrchestration;

  @Override
  public TaskOrchestration create() {
    return ctx -> {
      T workflow;
      try {
        // create a workflow instance through the workflow's constructor
        workflow = this.workflowConstructor.newInstance();
      } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
        throw new RuntimeException(
            String.format("Unable to instantiate instance of workflow class '%s'", this.name), e
        );
      }
      // wrap the durable task context in dapr's workflow context, DaprWorkflowContextImpl,
      // then execute workflow.run()
      workflow.run(new DaprWorkflowContextImpl(ctx));
    };

  }
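
The pattern used here (resolve the no-argument constructor once, then create a fresh object on every invocation of the returned lambda) can be tried out in isolation. The sketch below is a self-contained illustration under stated assumptions: SketchWorkflow, GreetingWorkflow and ReflectiveFactorySketch are hypothetical names, and java.util.function.Function stands in for durabletask's TaskOrchestration.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.function.Function;

// Hypothetical stand-in for the Workflow interface.
interface SketchWorkflow {
  String run(String input);
}

// Hypothetical workflow with an implicit no-argument constructor.
class GreetingWorkflow implements SketchWorkflow {
  @Override
  public String run(String input) {
    return "hello " + input;
  }
}

// Hypothetical factory that follows the same shape as OrchestratorWrapper.
final class ReflectiveFactorySketch<T extends SketchWorkflow> {
  private final Constructor<T> constructor;
  private final String name;

  ReflectiveFactorySketch(Class<T> clazz) {
    this.name = clazz.getCanonicalName();
    try {
      // fails early, at registration time, if there is no no-argument constructor
      this.constructor = clazz.getDeclaredConstructor();
    } catch (NoSuchMethodException e) {
      throw new RuntimeException("No constructor found for class '" + name + "'.", e);
    }
  }

  String getName() {
    return name;
  }

  // Returns a function that instantiates a new workflow object on every call.
  Function<String, String> create() {
    return input -> {
      T workflow;
      try {
        workflow = constructor.newInstance();
      } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
        throw new RuntimeException("Unable to instantiate '" + name + "'", e);
      }
      return workflow.run(input);
    };
  }
}
```

Two things follow from this shape: a class without a no-argument constructor is rejected when it is registered rather than when it first runs, and because a new object is created per invocation, state kept in instance fields does not carry over from one run to the next.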

8.6.3.4 - ActivityWrapper implementation

Code implementation of ActivityWrapper

Background

When the registerActivity() method of WorkflowRuntimeBuilder registers an activity class, it delegates to the addActivity() method of DurableTaskGrpcWorkerBuilder:

import com.microsoft.durabletask.TaskActivityFactory;

  public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
    this.builder = this.builder.addActivity(
        new ActivityWrapper<>(clazz)
    );
  }

The addActivity() method takes a com.microsoft.durabletask.TaskActivityFactory as its parameter:

public interface TaskActivityFactory {
    String getName();
    TaskActivity create();
}

An implementation of TaskActivityFactory therefore has to be provided.

Class definition

The ActivityWrapper class implements the com.microsoft.durabletask.TaskActivityFactory interface:

public class ActivityWrapper<T extends WorkflowActivity> implements TaskActivityFactory {
  private final Constructor<T> activityConstructor;
  private final String name;
  ......  
}

Constructor:

  public ActivityWrapper(Class<T> clazz) {
    this.name = clazz.getCanonicalName();
    try {
      this.activityConstructor = clazz.getDeclaredConstructor();
    } catch (NoSuchMethodException e) {
      throw new RuntimeException(
          String.format("No constructor found for activity class '%s'.", this.name), e
      );
    }
  }

Interface implementation

The getName() method required by the TaskActivityFactory interface simply returns the name obtained above:

  @Override
  public String getName() {
    return name;
  }

The create() method required by the TaskActivityFactory interface has to return a durabletask TaskActivity. TaskActivity is a @FunctionalInterface with a single run() method:

@FunctionalInterface
public interface TaskActivity {
    Object run(TaskActivityContext ctx);
}

The TaskActivity instance can therefore be written compactly as a lambda:

import com.microsoft.durabletask.TaskActivity;

  @Override
  public TaskActivity create() {
    return ctx -> {
      Object result;
      T activity;
      
      try {
        activity = this.activityConstructor.newInstance();
      } catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
        throw new RuntimeException(
            String.format("Unable to instantiate instance of activity class '%s'", this.name), e
        );
      }

      result = activity.run(new WorkflowActivityContext(ctx));
      return result;
    };
  }
}

8.6.3.5 - WorkflowActivity implementation

Code implementation of WorkflowActivity

WorkflowActivity interface definition

The WorkflowActivity interface defines an activity:

public interface WorkflowActivity {
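  /**
   * Executes the activity logic and returns a value which will be serialized and
   * returned to the calling orchestrator.
   *
   * @param ctx provides information about the current activity execution, like the activity's
   *     name and the input data provided to it by the orchestrator.
   * @return any serializable value to be returned to the calling orchestrator.
   */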
  Object run(WorkflowActivityContext ctx);
}

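The javadoc of WorkflowActivity reads as follows:

Common interface for task activity implementations.

Activities are the basic unit of work in a durable task orchestration. Activities are the tasks that are orchestrated in the business process. For example, you might create an orchestrator to process an order. The tasks may involve checking the inventory, charging the customer, and creating a shipment. Each task would be a separate activity. These activities may be executed serially, in parallel, or in some combination of both.

Unlike task orchestrators, activities are not restricted in the type of work they can do. Activity functions are frequently used to make network calls or run CPU-intensive operations. An activity can also return data back to the orchestrator function. The durable task runtime guarantees that each called activity function will be executed at least once during an orchestration's execution.

Because activities only guarantee at-least-once execution, it is recommended that activity logic be implemented as idempotent whenever possible.

Activities are scheduled by orchestrators using one of the io.dapr.workflows.WorkflowContext.callActivity method overloads.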
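
The at-least-once guarantee means an activity's side effect may be attempted more than once. As a rough illustration of what "idempotent" means in practice, the sketch below guards a side effect with a key so that repeated executions for the same order produce it only once. IdempotentChargeSketch is a hypothetical class, not part of the Dapr SDK, and the in-memory map is an assumption for the sake of a runnable example; real activity code would need a durable store shared across processes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical activity logic: charges an order at most once per orderId,
// even if the same activity execution is repeated.
final class IdempotentChargeSketch {
  // Hypothetical record of processed orders (in memory only, for illustration).
  private final Map<String, String> receipts = new ConcurrentHashMap<>();
  // Counts how many times the real side effect ran.
  private final AtomicInteger chargeCount = new AtomicInteger();

  String charge(String orderId) {
    // computeIfAbsent runs the mapping function only when the key is not yet present,
    // so a repeated call returns the stored receipt instead of charging again
    return receipts.computeIfAbsent(orderId, id -> {
      chargeCount.incrementAndGet();
      return "receipt-" + id;
    });
  }

  int chargeCount() {
    return chargeCount.get();
  }
}
```

Calling charge("order-1") twice returns the same receipt and performs the side effect once, which is the property the javadoc recommends for activity logic.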

WorkflowActivityContext

WorkflowActivityContext is a thin wrapper around durabletask's TaskActivityContext:

import com.microsoft.durabletask.TaskActivityContext;

public class WorkflowActivityContext implements TaskActivityContext {
  private final TaskActivityContext innerContext;

  public WorkflowActivityContext(TaskActivityContext context) throws IllegalArgumentException {
    if (context == null) {
      throw new IllegalArgumentException("Context cannot be null");
    }
    this.innerContext = context;
  }
  ......
}

The getName() and getInput() methods required by the TaskActivityContext interface simply delegate to the wrapped durabletask TaskActivityContext:

  public String getName() {
    return this.innerContext.getName();
  }

  public <T> T getInput(Class<T> targetType) {
    return this.innerContext.getInput(targetType);
  }

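Note: this wrapper does not actually isolate the dapr SDK from the durabletask SDK. WorkflowActivityContext still implements durabletask's TaskActivityContext, so the two remain tightly coupled, which raises the question of what the wrapper is for.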

8.6.4 - client package

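Code in the client package

8.6.4.1 - DaprWorkflowClient implementation

Code implementation of DaprWorkflowClient

Definition and creation

Class definition

DaprWorkflowClient defines the client operations for managing Dapr workflow instances.

Note the word “manage” here!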

import com.microsoft.durabletask.DurableTaskClient;

public class DaprWorkflowClient implements AutoCloseable {

  DurableTaskClient innerClient;
  ManagedChannel grpcChannel;
    
  public DaprWorkflowClient() {
    this(NetworkUtils.buildGrpcManagedChannel());
  }
    
  private DaprWorkflowClient(ManagedChannel grpcChannel) {
    this(createDurableTaskClient(grpcChannel), grpcChannel);
  }
    
  private DaprWorkflowClient(DurableTaskClient innerClient, ManagedChannel grpcChannel) {
    this.innerClient = innerClient;
    this.grpcChannel = grpcChannel;
  }

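The implementation again wraps durabletask's DurableTaskClient, which needs a grpcChannel when it is created.

The key point is how this grpcChannel is created. The constructors that accept a grpcChannel are private in the code above, so a caller using the public no-argument constructor always gets a channel built by NetworkUtils.buildGrpcManagedChannel().

Creating the grpcChannel

The implementation is the same as in WorkflowRuntimeBuilder: both call NetworkUtils.buildGrpcManagedChannel().

NetworkUtils.buildGrpcManagedChannel() is called in three places in the dapr java sdk:

WorkflowRuntimeBuilder: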

  public WorkflowRuntimeBuilder() {
    this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(NetworkUtils.buildGrpcManagedChannel());
  }

DaprWorkflowClient:

  public DaprWorkflowClient() {
    this(NetworkUtils.buildGrpcManagedChannel());
  }

DaprClientBuilder:

final ManagedChannel channel = NetworkUtils.buildGrpcManagedChannel();

Creating the DurableTaskClient

The DurableTaskClient is created by simply calling durabletask's DurableTaskGrpcClientBuilder:

import com.microsoft.durabletask.DurableTaskGrpcClientBuilder;

private static DurableTaskClient createDurableTaskClient(ManagedChannel grpcChannel) {
    return new DurableTaskGrpcClientBuilder()
        .grpcChannel(grpcChannel)
        .build();
  }

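close() method

The close() method shuts down the DaprWorkflowClient: it closes the wrapped durabletask DurableTaskClient and then shuts down the grpcChannel held by the client, waiting up to 5 seconds for termination: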

  public void close() throws InterruptedException {
    try {
      if (this.innerClient != null) {
        this.innerClient.close();
        this.innerClient = null;
      }
    } finally {
      if (this.grpcChannel != null && !this.grpcChannel.isShutdown()) {
        this.grpcChannel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
        this.grpcChannel = null;
      }
    }
  }
}

Operating on workflow instances

scheduleNewWorkflow() method

The scheduleNewWorkflow() method schedules a new workflow, that is, it creates and starts a new workflow instance. The method returns the workflow instance id:

package io.dapr.workflows.client;  

public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz) {
    return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName());
  }

  public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input) {
    return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input);
  }

  public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input, String instanceId) {
    return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input, instanceId);
  }

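The implementation delegates entirely to durabletask's DurableTaskClient, using the canonical name of the workflow class as the orchestration name.

terminateWorkflow() method

The terminateWorkflow() method terminates the execution of a workflow instance. It takes the workflow instance id previously returned by scheduleNewWorkflow():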

  public void terminateWorkflow(String workflowInstanceId, @Nullable Object output) {
    this.innerClient.terminate(workflowInstanceId, output);
  }

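The output parameter is optional and carries the output of the terminated workflow instance.

getInstanceState() method

The getInstanceState() method fetches the state of a workflow instance. It also takes the workflow instance id previously returned by scheduleNewWorkflow():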

  @Nullable
  public WorkflowInstanceStatus getInstanceState(String instanceId, boolean getInputsAndOutputs) {
    OrchestrationMetadata metadata = this.innerClient.getInstanceMetadata(instanceId, getInputsAndOutputs);
    if (metadata == null) {
      return null;
    }
    return new WorkflowInstanceStatus(metadata);
  }

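The implementation calls the getInstanceMetadata() method of durabletask's DurableTaskClient to obtain an OrchestrationMetadata and then converts it to the WorkflowInstanceStatus defined by dapr. If no metadata is returned, the method returns null.

The details are covered in the WorkflowInstanceStatus implementation.

waitForInstanceStart() method

The waitForInstanceStart() method waits for a workflow instance to start executing: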

  @Nullable
  public WorkflowInstanceStatus waitForInstanceStart(String instanceId, Duration timeout, boolean getInputsAndOutputs)
      throws TimeoutException {

    OrchestrationMetadata metadata = this.innerClient.waitForInstanceStart(instanceId, timeout, getInputsAndOutputs);
    return metadata == null ? null : new WorkflowInstanceStatus(metadata);
  }

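The javadoc of the waitForInstanceStart() method reads:

Waits for a workflow to start running and returns a WorkflowInstanceStatus object that contains metadata about the started instance and, optionally, its input, output, and custom status payloads.

A “started” workflow instance is any instance not in the “Pending” state.

If a workflow instance is already running when this method is called, the method returns immediately.

waitForInstanceCompletion() method

The waitForInstanceCompletion() method waits for a workflow instance to finish executing: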

  @Nullable
  public WorkflowInstanceStatus waitForInstanceCompletion(String instanceId, Duration timeout,
                                                          boolean getInputsAndOutputs) throws TimeoutException {

    OrchestrationMetadata metadata =
        this.innerClient.waitForInstanceCompletion(instanceId, timeout, getInputsAndOutputs);
    return metadata == null ? null : new WorkflowInstanceStatus(metadata);
  }

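The javadoc of the waitForInstanceCompletion() method reads:

Waits for a workflow to complete and returns a WorkflowInstanceStatus object that contains metadata about the completed instance.

A “completed” workflow instance is any instance in one of the terminal states, for example the “Completed”, “Failed”, or “Terminated” states.

Workflows are long-running and could take hours, days, or months to complete. Workflows can also be eternal, in which case they never complete unless terminated. In such cases this call may block indefinitely, so care must be taken to use an appropriate timeout. If a workflow instance is already complete when this method is called, the method returns immediately.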
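
To illustrate why the timeout matters for workflows that may never reach a terminal state, here is a minimal sketch of a bounded wait. It is not how durabletask implements waitForInstanceCompletion(); BoundedWaitSketch, its Status enum and the polling approach are all assumptions made for illustration, with a Supplier standing in for the status lookup.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: polls a status source until it is terminal or the timeout elapses.
final class BoundedWaitSketch {
  enum Status { PENDING, RUNNING, COMPLETED, FAILED, TERMINATED }

  static boolean isTerminal(Status status) {
    return status == Status.COMPLETED || status == Status.FAILED || status == Status.TERMINATED;
  }

  static Status waitForCompletion(Supplier<Status> statusSource, Duration timeout, Duration pollInterval)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      Status current = statusSource.get();
      if (isTerminal(current)) {
        return current;                 // already complete: return immediately
      }
      if (System.nanoTime() - deadline >= 0) {
        // an eternal workflow ends up here instead of blocking forever
        throw new TimeoutException("workflow did not complete within " + timeout);
      }
      Thread.sleep(pollInterval.toMillis());
    }
  }
}
```

A status source that stays in RUNNING makes the sketch throw TimeoutException once the deadline passes, which mirrors the TimeoutException declared by the SDK method above.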

purgeInstance() method

The purgeInstance() method removes the state of a workflow instance from the workflow state store:

  public boolean purgeInstance(String workflowInstanceId) {
    PurgeResult result = this.innerClient.purgeInstance(workflowInstanceId);
    if (result != null) {
      return result.getDeletedInstanceCount() > 0;
    }
    return false;
  }

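It returns true if the workflow state was found and successfully purged, and false otherwise.

raiseEvent() method

The raiseEvent() method sends an event notification message to a waiting workflow instance: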

  public void raiseEvent(String workflowInstanceId, String eventName, Object eventPayload) {
    this.innerClient.raiseEvent(workflowInstanceId, eventName, eventPayload);
  }

TaskHub methods

When these two methods are meant to be used is not yet clear, so they are skipped for now.

  public void createTaskHub(boolean recreateIfExists) {
    this.innerClient.createTaskHub(recreateIfExists);
  }

  public void deleteTaskHub() {
    this.innerClient.deleteTaskHub();
  }

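8.6.4.2 - WorkflowInstanceStatus implementation

Code implementation of WorkflowInstanceStatus

Class definition and constructor

WorkflowInstanceStatus represents a snapshot of the current state of a workflow instance, including its metadata.

WorkflowInstanceStatus is again a wrapper around durabletask: internally it holds a durabletask OrchestrationMetadata, plus the FailureDetails carried by that OrchestrationMetadata: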

import com.microsoft.durabletask.FailureDetails;
import com.microsoft.durabletask.OrchestrationMetadata;

public class WorkflowInstanceStatus {

  private final OrchestrationMetadata orchestrationMetadata;

  @Nullable
  private final WorkflowFailureDetails failureDetails;
    
  public WorkflowInstanceStatus(OrchestrationMetadata orchestrationMetadata) {
    if (orchestrationMetadata == null) {
      throw new IllegalArgumentException("OrchestrationMetadata cannot be null");
    }
    this.orchestrationMetadata = orchestrationMetadata;
    FailureDetails details = orchestrationMetadata.getFailureDetails();
    if (details != null) {
      this.failureDetails = new WorkflowFailureDetails(details);
    } else {
      this.failureDetails = null;
    }
  }

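Once obtained, the FailureDetails is converted to dapr's WorkflowFailureDetails. The details are covered in the WorkflowFailureDetails implementation.

Delegating methods

8.6.4.3 - WorkflowFailureDetails implementation

Code implementation of WorkflowFailureDetails

WorkflowFailureDetails is just a very thin wrapper around durabletask's FailureDetails: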

public class WorkflowFailureDetails {

  FailureDetails workflowFailureDetails;

  /**
   * Class constructor.
   * @param failureDetails failure Details
   */
  public WorkflowFailureDetails(FailureDetails failureDetails) {
    this.workflowFailureDetails = failureDetails;
  }

The individual methods are then delegated:

  public String getErrorType() {
    return workflowFailureDetails.getErrorType();
  }

  public String getErrorMessage() {
    return workflowFailureDetails.getErrorMessage();
  }

  public String getStackTrace() {
    return workflowFailureDetails.getStackTrace();
  }