将按照两个维度进行 dapr 源码学习:
- 调用流程:根据每个构建块提供的功能,分析从请求发出到请求处理完成的整个调用流程。主要目标是了解请求处理的主流程和代码实现方式,以及相关的结构设计,不深入展开细节。
- 代码仓库:按照每个代码仓库来遍历所有代码实现,会展开所有细节。主要目标是摸清 dapr 代码实现的每一个角落,实现对代码的全面了解。
目标:深入学习 Dapr 源代码,深度掌握 dapr 设计实现。
Dapr runtime 对外提供两个 API,分别是 Dapr HTTP API 和 Dapr gRPC API。另外两个 dapr runtime 之间的通讯 (Dapr internal API) 固定用 gRPC 协议。
两个 Dapr API 对外暴露的端口,默认是:
- HTTP API:默认端口 3500,可以通过命令行参数 dapr-http-port 设置
- gRPC API:默认端口 50001,可以通过命令行参数 dapr-grpc-port 设置
Dapr internal API 是内部端口,比较特殊,没有固定的默认值,而是取任意随机可用端口。也可以通过命令行参数 dapr-internal-grpc-port 设置。
为了向服务器端的应用发送请求,dapr 需要获知应用在哪个端口监听并处理请求,这个信息通过命令行参数 app-port
设置。Dapr 的示例中一般喜欢用 3000 端口。
title Service Invoke via HTTP
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
client
]
participant SDK_client [
=SDK
----
client
]
end box
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
box "App-2"
participant user_code_server [
=App-2
----
server
]
end box
user_code_client -> SDK_client : Invoke\nService()
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port
|||
daprd_server -[#blue]> user_code_server : http (localhost)
note right: HTTP endpoint "method-1" @ 3000
daprd_server <[#blue]-- user_code_server
daprd_client <[#red]-- daprd_server
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
title Service Invoke via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
client
]
participant SDK_client [
=SDK
----
client
]
end box
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
box "App-2"
participant SDK_server [
=SDK
----
server
]
participant user_code_server [
=App-2
----
server
]
end box
user_code_server -> SDK_server: AddServiceInvocationHandler("method-1")
SDK_server -> SDK_server: save handler in invokeHandlers["method-1"]
SDK_server --> user_code_server
user_code_client -> SDK_client : Invoke\nService()
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
|||
daprd_server -[#blue]> SDK_server : gRPC (localhost)
note right: 50001\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
SDK_server -> SDK_server: get handler by invokeHandlers["method-1"]
SDK_server -> user_code_server : invoke handler of "method-1"
SDK_server <-- user_code_server
daprd_server <[#blue]-- SDK_server
daprd_client <[#red]-- daprd_server
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
title Service Invoke via gRPC proxying
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
client
]
participant SDK_client [
=SDK
----
client
]
end box
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
box "App-2"
participant SDK_server [
=gRPC
----
server
]
participant user_code_server [
=App-2
----
server
]
end box
user_code_server -> SDK_server
SDK_server --> user_code_server
user_code_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC\n/user.services.ServiceName/Method-1
|||
daprd_client -[#red]> daprd_server : gRPC proxy (remote call)
note right: gRPC\n/user.services.ServiceName/Method-1
|||
daprd_server -[#blue]> SDK_server : gRPC (localhost)
note right: gRPC\n/user.services.ServiceName/Method-1
SDK_server -> user_code_server :
SDK_server <-- user_code_server
daprd_server <[#blue]-- SDK_server
daprd_client <[#red]-- daprd_server
user_code_client <[#blue]-- daprd_client
在 dapr runtime 启动进行初始化时,需要开启 API 端口并挂载相应的 handler 来接收并处理服务调用的 outbound 请求。另外为了接收来自其他 dapr runtime 的 inbound 请求,还要启动 dapr internal server。
dapr runtime 的 HTTP server 用的是 fasthttp。
在 dapr runtime 启动时的初始化过程中,会启动 HTTP server, 代码在 pkg/runtime/runtime.go
中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
// Start HTTP Server
err = a.startHTTPServer(a.runtimeConfig.HTTPPort, a.runtimeConfig.PublicPort, a.runtimeConfig.ProfilePort, a.runtimeConfig.AllowedOrigins, pipeline)
if err != nil {
log.Fatalf("failed to start HTTP server: %s", err)
}
......
}
func (a *DaprRuntime) startHTTPServer(......) error {
a.daprHTTPAPI = http.NewAPI(......)
server := http.NewServer(a.daprHTTPAPI, ......)
if err := server.StartNonBlocking(); err != nil { // StartNonBlocking 启动 fasthttp server
return err
}
}
StartNonBlocking() 的实现代码在 pkg/http/server.go
中:
// StartNonBlocking starts a new server in a goroutine.
func (s *server) StartNonBlocking() error {
......
for _, apiListenAddress := range s.config.APIListenAddresses {
l, err := net.Listen("tcp", fmt.Sprintf("%s:%v", apiListenAddress, s.config.Port))
listeners = append(listeners, l)
}
for _, listener := range listeners {
// customServer is created in a loop because each instance
// has a handle on the underlying listener.
customServer := &fasthttp.Server{
Handler: handler,
MaxRequestBodySize: s.config.MaxRequestBodySize * 1024 * 1024,
ReadBufferSize: s.config.ReadBufferSize * 1024,
StreamRequestBody: s.config.StreamRequestBody,
}
s.servers = append(s.servers, customServer)
go func(l net.Listener) {
if err := customServer.Serve(l); err != nil {
log.Fatal(err)
}
}(listener)
}
}
在 HTTP API 的初始化过程中,会在 fasthttp server 上挂载 DirectMessaging 的 HTTP 端点,代码在 pkg/http/api.go
中:
func NewAPI(
appID string,
appChannel channel.AppChannel,
directMessaging messaging.DirectMessaging,
......
shutdown func()) API {
api := &api{
appChannel: appChannel,
directMessaging: directMessaging,
......
}
// 附加 DirectMessaging 的 HTTP 端点
api.endpoints = append(api.endpoints, api.constructDirectMessagingEndpoints()...)
}
DirectMessaging 的 HTTP 端点的具体信息在 constructDirectMessagingEndpoints() 方法中:
func (a *api) constructDirectMessagingEndpoints() []Endpoint {
return []Endpoint{
{
Methods: []string{router.MethodWild},
Route: "invoke/{id}/method/{method:*}",
Alias: "{method:*}",
Version: apiVersionV1,
KeepParamUnescape: true,
Handler: a.onDirectMessage,
},
}
}
注意这里的 Route 路径 "invoke/{id}/method/{method:*}",dapr sdk 就是通过这样的 url 来发起 HTTP 请求。
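下面是一个假设性的最简示意:不借助 dapr SDK,直接用 Go 标准库按这个路由向本地 daprd 的 HTTP API 发起 service invoke 请求,其中 app-2、method-1 和 3500 端口沿用本文示例中的取值。

package main

import (
    "fmt"
    "io"
    "net/http"
    "strings"
)

func main() {
    // 对应路由 invoke/{id}/method/{method:*}:id=app-2,method=method-1
    url := "http://localhost:3500/v1.0/invoke/app-2/method/method-1"

    // 请求 body 会被 daprd 原样转发给目标应用
    resp, err := http.Post(url, "text/plain", strings.NewReader("hello"))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.StatusCode, string(body))
}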
title Dapr HTTP API
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
-[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/invoke/{id}/method/{method}
|||
<[#blue]-- daprd_client
在 dapr runtime 启动时的初始化过程中,会启动 gRPC server, 代码在 pkg/runtime/runtime.go
中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
// Create and start internal and external gRPC servers
grpcAPI := a.getGRPCAPI()
err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
......
}
func (a *DaprRuntime) startGRPCAPIServer(api grpc.API, port int) error {
serverConf := a.getNewServerConfig(a.runtimeConfig.APIListenAddresses, port)
server := grpc.NewAPIServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.globalConfig.Spec.APISpec, a.proxy)
if err := server.StartNonBlocking(); err != nil {
return err
}
......
}
// NewAPIServer returns a new user facing gRPC API server.
func NewAPIServer(api API, config ServerConfig, ......) Server {
return &server{
api: api,
config: config,
kind: apiServer, // const apiServer = "apiServer"
......
}
}
为了让 dapr runtime 的 gRPC 服务器能挂载 Dapr API,需要将其注册上去。
注册的代码实现在 pkg/grpc/server.go
中, StartNonBlocking() 方法在启动 grpc 服务器时,会进行服务注册:
func (s *server) StartNonBlocking() error {
if s.kind == internalServer {
internalv1pb.RegisterServiceInvocationServer(server, s.api)
} else if s.kind == apiServer {
runtimev1pb.RegisterDaprServer(server, s.api) // 注意:s.api (即 gRPC api 实现) 被传递进去
}
......
}
而 RegisterDaprServer() 方法的实现代码在 pkg/proto/runtime/v1/dapr_grpc.pb.go
:
func RegisterDaprServer(s grpc.ServiceRegistrar, srv DaprServer) {
s.RegisterService(&Dapr_ServiceDesc, srv) // srv 即 gRPC api 实现
}
在文件 pkg/proto/runtime/v1/dapr_grpc.pb.go
中有 Dapr Service 的 grpc 服务定义,这是 protoc 生成的 gRPC 代码。
Dapr_ServiceDesc 中有 Dapr Service 各个方法的定义,和服务调用相关的是 InvokeService
方法:
var Dapr_ServiceDesc = grpc.ServiceDesc{
ServiceName: "dapr.proto.runtime.v1.Dapr",
HandlerType: (*DaprServer)(nil),
Methods: []grpc.MethodDesc{
{
MethodName: "InvokeService", # 注册方法名
Handler: _Dapr_InvokeService_Handler, # 关联实现的 Handler
},
......
},
Metadata: "dapr/proto/runtime/v1/dapr.proto",
}
这一段是告诉 gRPC server: 如果收到访问 dapr.proto.runtime.v1.Dapr
服务的 InvokeService
方法的 gRPC 请求,请把请求转给 _Dapr_InvokeService_Handler
处理。
title Dapr gRPC API
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
-[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
<[#blue]-- daprd_client
而 InvokeService
方法相关联的 handler 方法 _Dapr_InvokeService_Handler
的实现代码是:
func _Dapr_InvokeService_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
in := new(InvokeServiceRequest)
if err := dec(in); err != nil {
return nil, err
}
if interceptor == nil {
return srv.(DaprServer).InvokeService(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: "/dapr.proto.runtime.v1.Dapr/InvokeService",
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
return srv.(DaprServer).InvokeService(ctx, req.(*InvokeServiceRequest)) // 这里调用的 srv 即 gRPC api 实现
}
return interceptor(ctx, in, info, handler)
}
最后调用到了 DaprServer 接口实现的 InvokeService 方法,也就是 gRPC API 实现。
在 dapr runtime 启动时的初始化过程中,会启动 gRPC internal server, 代码在 pkg/runtime/runtime.go
中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
err = a.startGRPCInternalServer(grpcAPI, a.runtimeConfig.InternalGRPCPort)
if err != nil {
log.Fatalf("failed to start internal gRPC server: %s", err)
}
log.Infof("internal gRPC server is running on port %v", a.runtimeConfig.InternalGRPCPort)
......
}
func (a *DaprRuntime) startGRPCInternalServer(api grpc.API, port int) error {
serverConf := a.getNewServerConfig([]string{""}, port)
server := grpc.NewInternalServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.authenticator, a.proxy)
if err := server.StartNonBlocking(); err != nil {
return err
}
a.apiClosers = append(a.apiClosers, server)
return nil
}
grpc internal server 的端口比较特殊,可以通过命令行参数 "--dapr-internal-grpc-port" 指定,而如果没有指定,是取一个随机的可用端口,而不是取某个固定值。这一点和 dapr HTTP api server 以及 dapr gRPC api server 不同。
具体代码实现在文件 pkg/runtime/cli.go
中:
func FromFlags() (*DaprRuntime, error) {
var daprInternalGRPC int
if *daprInternalGRPCPort != "" {
daprInternalGRPC, err = strconv.Atoi(*daprInternalGRPCPort)
if err != nil {
return nil, errors.Wrap(err, "error parsing dapr-internal-grpc-port")
}
} else {
daprInternalGRPC, err = grpc.GetFreePort()
if err != nil {
return nil, errors.Wrap(err, "failed to get free port for internal grpc server")
}
}
......
}
Dapr gRPC internal API 实现时有点特殊:
dapr runtime 的初始化代码中,grpcAPI 对象是 gRPC API Server 和 gRPC Internal Server 共用的:
grpcAPI := a.getGRPCAPI()
err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
err = a.startGRPCInternalServer(grpcAPI, a.runtimeConfig.InternalGRPCPort)
从设计的角度看,这样做不好:混淆了对 outbound 请求和 inbound 请求的处理,影响代码可读性。
为了让 dapr runtime 的 gRPC 服务器能挂载 Dapr internal API,需要进行注册。
注册的代码实现在 pkg/grpc/server.go
中, StartNonBlocking() 方法在启动 grpc 服务器时,会进行服务注册:
func (s *server) StartNonBlocking() error {
if s.kind == internalServer {
internalv1pb.RegisterServiceInvocationServer(server, s.api) // 注意:s.api (即 gRPC api 实现) 被传递进去
} else if s.kind == apiServer {
runtimev1pb.RegisterDaprServer(server, s.api)
}
......
}
而 RegisterServiceInvocationServer() 方法的实现代码在 pkg/proto/internals/v1/service_invocation_grpc.pb.go
:
func RegisterServiceInvocationServer(s grpc.ServiceRegistrar, srv ServiceInvocationServer) {
s.RegisterService(&ServiceInvocation_ServiceDesc, srv) // srv 即 gRPC api 实现
}
在文件 pkg/proto/internals/v1/service_invocation_grpc.pb.go
中有 internal Service 的 grpc 服务定义,这是 protoc 生成的 gRPC 代码。
ServiceInvocation_ServiceDesc 中有两个方法的定义,和服务调用相关的是 CallLocal
方法:
var ServiceInvocation_ServiceDesc = grpc.ServiceDesc{
ServiceName: "dapr.proto.internals.v1.ServiceInvocation",
HandlerType: (*ServiceInvocationServer)(nil),
Methods: []grpc.MethodDesc{
{
MethodName: "CallActor",
Handler: _ServiceInvocation_CallActor_Handler,
},
{
MethodName: "CallLocal",
Handler: _ServiceInvocation_CallLocal_Handler,
},
},
Streams: []grpc.StreamDesc{},
Metadata: "dapr/proto/internals/v1/service_invocation.proto",
}
这一段是告诉 gRPC server: 如果收到访问 dapr.proto.internals.v1.ServiceInvocation
服务的 CallLocal
方法的 gRPC 请求,请把请求转给 _ServiceInvocation_CallLocal_Handler
处理。
title Dapr gRPC internal API
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
-[#red]> daprd_client : gRPC (remote call)
note right: gRPC API @ random port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
|||
<[#red]-- daprd_client
而 CallLocal
方法相关联的 handler 方法 _ServiceInvocation_CallLocal_Handler
的实现代码是:
func _ServiceInvocation_CallLocal_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
in := new(InternalInvokeRequest)
if err := dec(in); err != nil {
return nil, err
}
if interceptor == nil {
return srv.(ServiceInvocationServer).CallLocal(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: "/dapr.proto.internals.v1.ServiceInvocation/CallLocal",
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
// 这里调用的 srv 即 gRPC api 实现
return srv.(ServiceInvocationServer).CallLocal(ctx, req.(*InternalInvokeRequest))
}
return interceptor(ctx, in, info, handler)
}
最后调用到了 ServiceInvocationServer 接口实现的 CallLocal 方法,也就是 gRPC API 实现。
在业务代码中使用 service invoke 功能的示例可参考文件 java-sdk/examples/src/main/java/io/dapr/examples/invoke/http/InvokeClient.java
,代码示意如下:
DaprClient client = (new DaprClientBuilder()).build();
byte[] response = client.invokeMethod(SERVICE_APP_ID, "say", message, HttpExtension.POST, null,
byte[].class).block();
Java sdk 中 service invoke 默认使用 HTTP,而其他方法默认使用 gRPC。在 DaprClientProxy 类中初始化了两个 DaprClient:
service invoke 方法默认走 HTTP ,使用的是 DaprClientHttp 类型 (文件为 src/main/java/io/dapr/client/DaprClientHttp.java):
@Override
public <T> Mono<T> invokeMethod(String appId, String methodName,......) {
return methodInvocationOverrideClient.invokeMethod(appId, methodName, request, httpExtension, metadata, clazz);
}
public <T> Mono<T> invokeMethod(InvokeMethodRequest invokeMethodRequest, TypeRef<T> type) {
try {
final String appId = invokeMethodRequest.getAppId();
final String method = invokeMethodRequest.getMethod();
......
Mono<DaprHttp.Response> response = Mono.subscriberContext().flatMap(
context -> this.client.invokeApi(httpMethod, pathSegments,
httpExtension.getQueryParams(), serializedRequestBody, headers, context)
);
}
在这里根据请求条件设置 HTTP 请求的各种参数,debug 时可以看到这些参数的具体数据。
最后发出 HTTP 请求的代码在 src/main/java/io/dapr/client/DaprHttp.java
中的 doInvokeApi() 方法:
private CompletableFuture<Response> doInvokeApi(String method,
String[] pathSegments,
Map<String, List<String>> urlParameters,
byte[] content, Map<String, String> headers,
Context context) {
......
Request.Builder requestBuilder = new Request.Builder()
.url(urlBuilder.build())
.addHeader(HEADER_DAPR_REQUEST_ID, requestId);
CompletableFuture<Response> future = new CompletableFuture<>();
this.httpClient.newCall(request).enqueue(new ResponseFutureCallback(future));
return future;
}
最终发出给 dapr runtime 的 HTTP 请求,调用的就是 dapr runtime 的 HTTP API。
title Service Invoke via HTTP
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
client
]
participant SDK_client [
=SDK
----
client
]
end box
participant daprd_client [
=daprd
----
client
]
user_code_client -> SDK_client : invokeMethod()
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/invoke/app-2/method/method-1
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
在 go 业务代码中使用 service invoke 功能的示例可参考 https://github.com/dapr/go-sdk/blob/main/examples/service/client/main.go,代码示意如下:
client, err := dapr.NewClient()
content := &dapr.DataContent{
ContentType: "text/plain",
Data: []byte("hello"),
}
// invoke a method named "method-1" on another dapr enabled service named "app-2"
resp, err := client.InvokeMethodWithContent(ctx, "app-2", "method-1", "post", content)
Go sdk 中定义了 Client 接口,文件为 client/client.go
:
// Client is the interface for Dapr client implementation.
type Client interface {
// InvokeMethod invokes service without raw data
InvokeMethod(ctx context.Context, appID, methodName, verb string) (out []byte, err error)
// InvokeMethodWithContent invokes service with content
InvokeMethodWithContent(ctx context.Context, appID, methodName, verb string, content *DataContent) (out []byte, err error)
// InvokeMethodWithCustomContent invokes app with custom content (struct + content type).
InvokeMethodWithCustomContent(ctx context.Context, appID, methodName, verb string, contentType string, content interface{}) (out []byte, err error)
......
}
这三个方法的实现在 client/invoke.go
中,都只是实现了对 InvokeServiceRequest 对象的组装,核心的代码实现在 invokeServiceWithRequest 方法中:
func (c *GRPCClient) invokeServiceWithRequest(ctx context.Context, req *pb.InvokeServiceRequest) (out []byte, err error) {
resp, err := c.protoClient.InvokeService(c.withAuthToken(ctx), req)
......
}
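这里组装 InvokeServiceRequest 的方式大致如下(假设性的示意代码,并非 go-sdk 源码原文,import 路径和字段名以实际的 proto 生成代码为准):

package example

import (
    commonv1pb "github.com/dapr/dapr/pkg/proto/common/v1"
    runtimev1pb "github.com/dapr/dapr/pkg/proto/runtime/v1"
    anypb "google.golang.org/protobuf/types/known/anypb"
)

// buildInvokeServiceRequest 示意把 appID/methodName/content 组装成请求对象的过程
func buildInvokeServiceRequest(appID, methodName, contentType string, data []byte) *runtimev1pb.InvokeServiceRequest {
    return &runtimev1pb.InvokeServiceRequest{
        // 目标应用的 app id,即 gRPC API 实现中用到的 in.Id
        Id: appID,
        Message: &commonv1pb.InvokeRequest{
            Method:      methodName,
            Data:        &anypb.Any{Value: data},
            ContentType: contentType,
            // verb 信息放在 HttpExtension 中,方便服务器端是 HTTP 应用时还原出 HTTP 请求
            HttpExtension: &commonv1pb.HTTPExtension{
                Verb: commonv1pb.HTTPExtension_POST,
            },
        },
    }
}

组装好的请求对象随后交给 invokeServiceWithRequest(),由 protoClient.InvokeService() 发往 daprd。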
InvokeService() 是 protoc 生成的 grpc 代码,在 dapr/proto/runtime/v1/dapr_grpc.pb.go
中,实现如下:
func (c *daprClient) InvokeService(ctx context.Context, in *InvokeServiceRequest, opts ...grpc.CallOption) (*v1.InvokeResponse, error) {
out := new(v1.InvokeResponse)
err := c.cc.Invoke(ctx, "/dapr.proto.runtime.v1.Dapr/InvokeService", in, out, opts...)
......
}
注意: 这里调用的 gRPC 服务是 dapr.proto.runtime.v1.Dapr
, 方法是 InvokeService
,和 dapr runtime 中 gRPC API 对应。
title Service Invoke via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
client
]
participant SDK_client [
=SDK
----
client
]
end box
participant daprd_client [
=daprd
----
client
]
user_code_client -> SDK_client : InvokeMethodWithContent()
note left: appId="app-2"\nmethodName="method-1"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
TODO
所有的语言 SDK 都实现了从客户端 SDK API 调用到发出远程调用请求给 dapr runtime 的功能。具体实现上会有一些差别:
go sdk
全部请求走 gRPC API。
Java sdk
其他SDK
Dapr runtime 有两种方式接收来自客户端发起的服务调用的 outbound 请求:gRPC API 和 HTTP API。在接收到请求之后,dapr runtime 会将 outbound 请求转发给目标服务的 dapr runtime。
title Daprd Receive inbound Request
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
-[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500 \n/v1.0/invoke/app-2/method/method-1
-[#blue]> daprd_client : gRPC (localhost)
note right: GRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/InvokeService
|||
daprd_client -> daprd_client: name resolution
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
Runtime 初始化时,在注册 HTTP 服务时绑定了 handler 实现和 URL 路由:
func (a *api) constructDirectMessagingEndpoints() []Endpoint {
return []Endpoint{
{
Methods: []string{router.MethodWild},
Route: "invoke/{id}/method/{method:*}",
Alias: "{method:*}",
Version: apiVersionV1,
KeepParamUnescape: true,
Handler: a.onDirectMessage,
},
}
}
当 service invoke 的 HTTP 请求进来后,就会被 fasthttp 路由到 Handler 即 HTTP API 实现的 onDirectMessage() 方法中进行处理。
onDirectMessage 的实现代码在文件 pkg/http/api.go
, 示意如下:
func (a *api) onDirectMessage(reqCtx *fasthttp.RequestCtx) {
......
req := invokev1.NewInvokeMethodRequest(...)
resp, err := a.directMessaging.Invoke(reqCtx, targetID, req)
......
}
备注: HTTP API 的这个 onDirectMessage() 方法取名不对,应该效仿 gRPC API,取名为 InvokeService(). 理由是:这是暴露给外部调用的方法,取名应该表现出它对外暴露的功能,即InvokeService。而不应该暴露内部的实现是调用 directMessaging。
HTTP API 的实现也简单,同样,除了基本的请求/应答参数处理之外,就是将转发请求的事情交给了 directMessaging。
Runtime 初始化时,在注册 gRPC 服务时绑定了 gRPC API 实现和 InvokeService gRPC 方法。
当 service invoke 的 gRPC 请求进来后,就会进入 pkg/grpc/api.go
中的 InvokeService 方法:
func (a *api) InvokeService(ctx context.Context, in *runtimev1pb.InvokeServiceRequest) (*commonv1pb.InvokeResponse, error) {
......
resp, err := a.directMessaging.Invoke(ctx, in.Id, req)
......
return resp.Message(), respError
}
gRPC API 的实现特别简单,除了基本的请求/应答参数处理之外,就是将转发请求的事情交给了 directMessaging。
TBD
Dapr runtime 之间相互通讯采用的是 gRPC 协议,定义有 Dapr gRPC internal API。比较特殊的是,采用随机空闲端口而不是默认端口。但也可以通过命令行参数 dapr-internal-grpc-port
指定。
title Daprd-Daprd Communication
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
-[#blue]> daprd_client : HTTP (localhost)
-[#blue]> daprd_client : gRPC (localhost)
|||
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
pkg/messaging/direct_messaging.go
中的 DirectMessaging 负责实现转发请求给远程 dapr runtime。
DirectMessaging 接口定义,用来调用远程应用:
// DirectMessaging is the API interface for invoking a remote app.
type DirectMessaging interface {
Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error)
}
只有一个 invoke 方法。
invoke 方法的实现:
func (d *directMessaging) Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
app, err := d.getRemoteApp(targetAppID)
if app.id == d.appID && app.namespace == d.namespace {
return d.invokeLocal(ctx, req) // 如果调用的 appid 就是自己的 appid,这个场景好奇怪。忽略这里的代码先
}
return d.invokeWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, app, d.invokeRemote, req)
}
invokeRemote 方法的代码简化如下:
func (d *directMessaging) invokeRemote(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
// 建立连接
conn, err := d.connectionCreatorFn(context.TODO(), appAddress, appID, namespace, false, false, false)
// 构建 gRPC stub 作为 client
clientV1 := internalv1pb.NewServiceInvocationClient(conn)
// 调用 gRPC 的 CallLocal 方法发出远程调用请求到另外一个 Dapr runtime
resp, err := clientV1.CallLocal(ctx, req.Proto(), opts...)
// 处理应答
return invokev1.InternalInvokeResponse(resp)
}
CallLocal() 方法的实现在 service_invocation_grpc.pb.go
中,这是 protoc 生成的 gRPC 代码:
func (c *serviceInvocationClient) CallLocal(ctx context.Context, in *InternalInvokeRequest, opts ...grpc.CallOption) (*InternalInvokeResponse, error) {
out := new(InternalInvokeResponse)
err := c.cc.Invoke(ctx, "/dapr.proto.internals.v1.ServiceInvocation/CallLocal", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
可以看到这个 gRPC 请求调用的是 dapr.proto.internals.v1.ServiceInvocation 服务的 CallLocal 方法。
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
hide footbox
skinparam style strictuml
participant directMessaging
participant "Name resolver\n(consul/kubenetes/mdns)" as localNameReSolver
directMessaging -> localNameReSolver : ResolveID()
localNameReSolver -> localNameReSolver: loadBalance()
note right: kubernetes: dns name\ndns: dns name\nconsul: one address(random)\nmdns: one address(round robin)
localNameReSolver --> directMessaging
note right: return only one address in local cluster
hide footbox
skinparam style strictuml
participant directMessaging
participant "Local Name resolver\n(consul/kubenetes/mdns)" as localNameReSolver
participant "External Name resolver\n(synchronizer)" as externalNameReSolver
directMessaging -> localNameReSolver : ResolveID()
localNameReSolver --> directMessaging
note right: return service instance list in local cluster
directMessaging -[#red]> externalNameReSolver : ResolveID()
externalNameReSolver --> directMessaging
note right: return service instance list in external clusters
directMessaging -[#red]> directMessaging: combine the instance list
directMessaging -[#red]> directMessaging: filter by cluster strategy
note right: local-first\nexternal-first\nbroadcast\nlocal-only\nexternal-only
directMessaging -> directMessaging: loadBalance()
Dapr runtime 之间相互通讯走的是 gRPC internal API,这个 API 也只支持 gRPC 协议。
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
daprd_client -[#red]> daprd_server : gRPC (remote call)
note right: internal API @ random free port\n/dapr.proto.internals.v1.ServiceInvocation/CallLocal
daprd_server -> daprd_server : interceptor
daprd_server -[#blue]> : appChannel.InvokeMethod()
Runtime 初始化时,在注册 gRPC 服务时绑定了 gRPC Internal API 实现和 CallLocal gRPC 方法。对于访问 dapr.proto.internals.v1.ServiceInvocation
服务的 CallLocal
方法的 gRPC 请求,会将请求转给 _ServiceInvocation_CallLocal_Handler
处理:
func _ServiceInvocation_CallLocal_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
......
if interceptor == nil {
return srv.(ServiceInvocationServer).CallLocal(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: "/dapr.proto.internals.v1.ServiceInvocation/CallLocal",
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
// 这里调用的 srv 即 gRPC api 实现
return srv.(ServiceInvocationServer).CallLocal(ctx, req.(*InternalInvokeRequest))
}
return interceptor(ctx, in, info, handler)
}
最后进入 CallLocal() 方法进行处理。
备注:初始化的细节,请见前面章节 “Runtime初始化”
期间会有一个 interceptor 的处理流程,细节后面展开。
当 internal invoke 的 gRPC 请求进来后,就会进入 pkg/grpc/api.go
中的 CallLocal 方法:
func (a *api) CallLocal(ctx context.Context, in *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error) {
// 1. 构造请求
req, err := invokev1.InternalInvokeRequest(in)
if a.accessControlList != nil {
......
}
// 2. 通过 appChannel 向应用发出请求
resp, err := a.appChannel.InvokeMethod(ctx, req)
// 3. 处理应答
return resp.Proto(), err
}
处理方式很清晰,基本上就是将请求通过 app channel 转发。Runtime 本身并没有什么额外的处理逻辑。InternalInvokeRequest() 只是简单处理一下参数:
// InternalInvokeRequest creates InvokeMethodRequest object from InternalInvokeRequest pb object.
func InternalInvokeRequest(pb *internalv1pb.InternalInvokeRequest) (*InvokeMethodRequest, error) {
req := &InvokeMethodRequest{r: pb}
if pb.Message == nil {
return nil, errors.New("Message field is nil")
}
return req, nil
}
期间会有一个 access control (访问控制)的逻辑:
if a.accessControlList != nil {
// An access control policy has been specified for the app. Apply the policies.
operation := req.Message().Method
var httpVerb commonv1pb.HTTPExtension_Verb
// Get the http verb in case the application protocol is http
if a.appProtocol == config.HTTPProtocol && req.Metadata() != nil && len(req.Metadata()) > 0 {
httpExt := req.Message().GetHttpExtension()
if httpExt != nil {
httpVerb = httpExt.GetVerb()
}
}
callAllowed, errMsg := acl.ApplyAccessControlPolicies(ctx, operation, httpVerb, a.appProtocol, a.accessControlList)
if !callAllowed {
return nil, status.Errorf(codes.PermissionDenied, errMsg)
}
}
细节后面展开。
Dapr runtime 将 inbound 请求转发给服务器端应用:
title Daprd-Daprd Communication
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
client
]
participant daprd_server [
=daprd
----
server
]
participant user_code_server [
=App-2
----
server
]
daprd_client -[#red]> daprd_server : Dapr gRPC internal API (remote call)
daprd_server -[#blue]> user_code_server : Dapr HTTP channel API (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
daprd_server -[#blue]> user_code_server : Dapr gRPC channel API (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
建立 app channel 需要的几个配置信息是:
- 应用监听请求的端口:通过命令行参数 app-port 指定,没有默认值
- 应用使用的协议:通过命令行参数 app-protocol 指定,默认是 HTTP
- 转发请求给应用时的并发度:通过命令行参数 app-max-concurrency 指定
前面分析过,当 internal invoke 的 gRPC 请求进来后,就会进入 pkg/grpc/api.go
中的 CallLocal 方法:
func (a *api) CallLocal(ctx context.Context, in *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error) {
......
resp, err := a.appChannel.InvokeMethod(ctx, req)
......
}
然后通过 appChannel 发送请求。
app channel 的建立是在 runtime 初始化时,在 pkg/runtime/runtime.go
的 initRuntime() 方法中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
a.blockUntilAppIsReady()
err = a.createAppChannel()
a.daprHTTPAPI.SetAppChannel(a.appChannel)
grpcAPI.SetAppChannel(a.appChannel)
......
}
createAppChannel() 的实现,目前只支持 HTTP 和 gRPC:
func (a *DaprRuntime) createAppChannel() error {
// 为了建立 app channel,必须配置有 app port
if a.runtimeConfig.ApplicationPort > 0 {
var channelCreatorFn func(port, maxConcurrency int, spec config.TracingSpec, sslEnabled bool, maxRequestBodySize int, readBufferSize int) (channel.AppChannel, error)
switch a.runtimeConfig.ApplicationProtocol {
case GRPCProtocol:
channelCreatorFn = a.grpc.CreateLocalChannel
case HTTPProtocol:
channelCreatorFn = http_channel.CreateLocalChannel
default:
// 只支持 HTTP 和 gRPC
return errors.Errorf("cannot create app channel for protocol %s", string(a.runtimeConfig.ApplicationProtocol))
}
ch, err := channelCreatorFn(a.runtimeConfig.ApplicationPort, a.runtimeConfig.MaxConcurrency, a.globalConfig.Spec.TracingSpec, a.runtimeConfig.AppSSL, a.runtimeConfig.MaxRequestBodySize, a.runtimeConfig.ReadBufferSize)
a.appChannel = ch
} else {
log.Warn("app channel is not initialized. did you make sure to configure an app-port?")
}
return nil
}
和 app channel 密切相关的三个配置项,可以从命令行参数中获取:
func FromFlags() (*DaprRuntime, error) {
......
appPort := flag.String("app-port", "", "The port the application is listening on")
appProtocol := flag.String("app-protocol", string(HTTPProtocol), "Protocol for the application: grpc or http")
appMaxConcurrency := flag.Int("app-max-concurrency", -1, "Controls the concurrency level when forwarding requests to user code")
TracingSpec / AppSSL / MaxRequestBodySize / ReadBufferSize 后面细说,先不展开。
HTTP Channel 的实现在文件 pkg/channel/http/http_channel.go
中,其 InvokeMethod()方法:
func (h *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
......
switch req.APIVersion() {
case internalv1pb.APIVersion_V1:
rsp, err = h.invokeMethodV1(ctx, req)
......
return rsp, err
}
暂时只有 invokeMethodV1 版本:
func (h *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
// 1. 构建HTTP请求
channelReq := h.constructRequest(ctx, req)
// 2. 发送请求到应用
err := h.client.DoTimeout(channelReq, resp, channel.DefaultChannelRequestTimeout)
// 3. 处理返回的应答
rsp := h.parseChannelResponse(req, resp, err)
return rsp, nil
}
这是将收到的请求内容,转成HTTP协议的标准格式,然后通过 fasthttp 发给用户代码。其中转为标准http请求的代码在方法 constructRequest() 中:
func (h *Channel) constructRequest(ctx context.Context, req *invokev1.InvokeMethodRequest) *fasthttp.Request {
var channelReq = fasthttp.AcquireRequest()
// Construct app channel URI: VERB http://localhost:3000/method?query1=value1
uri := fmt.Sprintf("%s/%s", h.baseAddress, req.Message().GetMethod())
channelReq.SetRequestURI(uri)
channelReq.URI().SetQueryString(req.EncodeHTTPQueryString())
channelReq.Header.SetMethod(req.Message().HttpExtension.Verb.String())
// Recover headers
invokev1.InternalMetadataToHTTPHeader(ctx, req.Metadata(), channelReq.Header.Set)
......
}
这样服务器端的用户代码就可以不引入 dapr sdk,只需要提供标准 http endpoint 即可。
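下面是一个假设性的最简示意:服务器端应用只用 Go 标准库,在 app-port(本文示例中为 3000)上暴露一个名为 method-1 的 HTTP endpoint,即可接收 daprd 通过 HTTP channel 转发过来的请求。

package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // daprd 会把请求转发为 VERB http://localhost:3000/method-1?query1=value1
    http.HandleFunc("/method-1", func(w http.ResponseWriter, r *http.Request) {
        body, _ := io.ReadAll(r.Body)
        defer r.Body.Close()
        log.Printf("verb=%s query=%s body=%s", r.Method, r.URL.RawQuery, body)
        // 应答内容会被 daprd 原样返回给调用方
        fmt.Fprintf(w, "echo: %s", body)
    })
    // 端口要和启动 daprd 时的 app-port 参数保持一致
    log.Fatal(http.ListenAndServe(":3000", nil))
}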
title Daprd-Daprd Communication
hide footbox
skinparam style strictuml
participant daprd_server [
=daprd
----
server
]
participant user_code_server [
=App-2
----
server
]
daprd_server -[#blue]> user_code_server : HTTP (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
pkg/grpc/grpc.go
中的 CreateLocalChannel() 方法:
// CreateLocalChannel creates a new gRPC AppChannel.
func (g *Manager) CreateLocalChannel(port, maxConcurrency int, spec config.TracingSpec, sslEnabled bool, maxRequestBodySize int, readBufferSize int) (channel.AppChannel, error) {
// IP地址写死了 127.0.0.1
conn, err := g.GetGRPCConnection(context.TODO(), fmt.Sprintf("127.0.0.1:%v", port), "", "", true, false, sslEnabled)
......
g.AppClient = conn
ch := grpc_channel.CreateLocalChannel(port, maxConcurrency, conn, spec, maxRequestBodySize, readBufferSize)
return ch, nil
}
实现代码在 pkg/channel/grpc/grpc_channel.go
的 InvokeMethod()方法中:
func (g *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
......
switch req.APIVersion() {
case internalv1pb.APIVersion_V1:
rsp, err = g.invokeMethodV1(ctx, req)
......
return rsp, err
}
暂时只有 invokeMethodV1 版本:
func (g *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
// 1. 创建 AppCallback 的 grpc client
clientV1 := runtimev1pb.NewAppCallbackClient(g.client)
// 2. 调用 AppCallback 的 OnInvoke() 方法
resp, err := clientV1.OnInvoke(ctx, req.Message(), grpc.Header(&header), grpc.Trailer(&trailer))
// 3. 处理返回的应答
return rsp.WithMessage(resp), nil
}
gRPC channel 是通过 gRPC 协议调用服务器端应用上的 gRPC 服务完成,具体是 AppCallback 的 OnInvoke() 方法。
title Dapr gRPC Channel
hide footbox
skinparam style strictuml
participant daprd_server [
=daprd
----
server
]
participant user_code_server [
=App-2
----
server
]
daprd_server -[#blue]> user_code_server : gRPC (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
也就是说:如果要支持 gRPC channel,则要求服务器端应用必须实现 AppCallback gRPC 服务器,这一点和 HTTP 不同,对服务器端应用是有侵入的。
pkg/proto/runtime/v1/appcallback.pb.go
中的 OnInvoke 方法:
// AppCallbackServer is the server API for AppCallback service.
type AppCallbackServer interface {
// Invokes service method with InvokeRequest.
OnInvoke(context.Context, *v1.InvokeRequest) (*v1.InvokeResponse, error)
}
为了接收来自daprd转发的来自客户端的service invoke 请求,服务器端的应用也需要做一些处理。
对于通过 HTTP channel 过来的标准HTTP请求,服务器端的应用只需要提供标准的HTTP端口即可,无须引入dapr SDK。
title Daprd-Daprd Communication
hide footbox
skinparam style strictuml
participant daprd_server [
=daprd
----
server
]
participant user_code_server [
=App-2
----
server
]
daprd_server -[#blue]> user_code_server : HTTP (localhost)
note right: HTTP endpoint @ 3000\nVERB http://localhost:3000/method?query1=value1
对于通过 gRPC channel 过来的 gRPC 请求,服务器端的应用则需要实现 gRPC AppCallback 服务的 OnInvoke() 方法:
title Dapr gRPC Channel
hide footbox
skinparam style strictuml
participant daprd_server [
=daprd
----
server
]
participant user_code_server [
=App-2
----
server
]
daprd_server -[#blue]> user_code_server : gRPC (localhost)
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
AppCallbackServer 的 proto 定义在 dapr 仓库下的文件dapr/proto/runtime/v1/appcallback.proto
中:
service AppCallback {
// Invokes service method with InvokeRequest.
rpc OnInvoke (common.v1.InvokeRequest) returns (common.v1.InvokeResponse) {}
......
}
而 AppCallbackServer 的具体实现则分布在各个不同语言的 sdk 里面。
实现在 go-sdk 的 service/grpc/invoke.go
文件的 OnInvoke方法,主要流程为:
func (s *Server) OnInvoke(ctx context.Context, in *cpb.InvokeRequest) (*cpb.InvokeResponse, error) {
if fn, ok := s.invokeHandlers[in.Method]; ok {
e := &cc.InvocationEvent{}
ct, er := fn(ctx, e)
return &cpb.InvokeResponse{......}, nil
}
return nil, fmt.Errorf("method not implemented: %s", in.Method)
}
其中 s.invokeHandlers
中保存处理请求的方法(由参数method作为key)。AddServiceInvocationHandler() 用于增加方法名和 handler 的映射 :
// Server is the gRPC service implementation for Dapr.
type Server struct {
invokeHandlers map[string]common.ServiceInvocationHandler
}
type ServiceInvocationHandler func(ctx context.Context, in *InvocationEvent) (out *Content, err error)
func (s *Server) AddServiceInvocationHandler(method string, fn func(ctx context.Context, in *cc.InvocationEvent) (our *cc.Content, err error)) error {
s.invokeHandlers[method] = fn
return nil
}
这意味着,在服务器端的应用中,并不需要为这些方法提供 gRPC 相关的 proto 定义,也不需要直接通过 gRPC 把这些方法暴露出去,只需要实现 AppCallback 的 OnInvoke() 方法,然后把需要对外暴露的方法注册即可,OnInvoke() 方法相当于一个简单的 API 网关。
title Dapr AppCallback OnInvoke gRPC impl
hide footbox
skinparam style strictuml
participant AppCallback [
=AppCallback
----
OnInvoke()
]
participant invokeHandlers
participant handler
-[#blue]> AppCallback : gRPC OnInvoke()
note right: gRPC endpoint @ 3000\n/dapr.proto.runtime.v1.AppCallback/OnInvoke
AppCallback -> invokeHandlers: find handler by method name
invokeHandlers --> AppCallback: registered handler
AppCallback -> handler: call handler
note right: type ServiceInvocationHandler \nfunc(ctx context.Context, in *InvocationEvent) \n(out *Content, err error)
handler --> AppCallback
<-[#blue]- AppCallback
用户在开发支持 dapr 的 go 服务器端应用时,需要在应用中启动 dapr service server,然后添加各种 handler,包括 ServiceInvocationHandler,如下面这个例子(go-sdk下的 example/serving/grpc/main.go
):
func main() {
// create a Dapr service server
s, err := daprd.NewService(":50001")
// add a service to service invocation handler
if err := s.AddServiceInvocationHandler("echo", echoHandler); err != nil {
log.Fatalf("error adding invocation handler: %v", err)
}
// start the server
if err := s.Start(); err != nil {
log.Fatalf("server error: %v", err)
}
}
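例子中注册的 echoHandler 需要符合前面的 ServiceInvocationHandler 签名。一个假设性的实现示意如下(与上面的 main() 在同一个文件中,并非 go-sdk 示例源码原文):

import (
    "context"
    "errors"
    "log"

    "github.com/dapr/go-sdk/service/common"
)

// echoHandler 把收到的数据原样返回(示意)
func echoHandler(ctx context.Context, in *common.InvocationEvent) (out *common.Content, err error) {
    if in == nil {
        return nil, errors.New("invocation parameter required")
    }
    log.Printf("echo - ContentType:%s, Verb:%s, QueryString:%s, Data:%s",
        in.ContentType, in.Verb, in.QueryString, string(in.Data))
    // 返回的 Content 会由 OnInvoke() 包装成 InvokeResponse 回给 daprd
    out = &common.Content{
        Data:        in.Data,
        ContentType: in.ContentType,
    }
    return out, nil
}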
java SDK 中没有找到服务器端实现的代码?待确定。
Name resolvers provide a common way to interact with different name resolvers, which are used to return the address or IP of other services your applications may connect to.
命名解析器提供了一种与不同命名解析器互动的通用方法,这些解析器用于返回你的应用程序可能要连接到的其他服务的地址或IP。
兼容的名称解析器需要实现 nameresolution.go
文件中的 Resolver
接口。
// Resolver是命名解析器的接口。
type Resolver interface {
// Init initializes name resolver.
Init(metadata Metadata) error
// ResolveID resolves name to address.
ResolveID(req ResolveRequest) (string, error)
}
// ResolveRequest 表示服务发现解析器请求。
type ResolveRequest struct {
ID string
Namespace string
Port int
Data map[string]string
}
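作为理解这个接口的参考,下面是一个假设性的最简 Resolver 实现示意(非 dapr 源码,import 路径按 components-contrib 仓库的惯例假定):把任何 appID 都解析为本机上的固定端口。

package static

import (
    "fmt"

    nr "github.com/dapr/components-contrib/nameresolution"
)

// staticResolver 把所有 appID 都解析为 127.0.0.1 上的固定端口,仅用于演示接口
type staticResolver struct {
    port int
}

// NewResolver 创建一个 static name resolver
func NewResolver(port int) nr.Resolver {
    return &staticResolver{port: port}
}

// Init initializes name resolver.
func (r *staticResolver) Init(metadata nr.Metadata) error {
    return nil
}

// ResolveID resolves name to address.
func (r *staticResolver) ResolveID(req nr.ResolveRequest) (string, error) {
    if req.ID == "" {
        return "", fmt.Errorf("app id is required")
    }
    return fmt.Sprintf("127.0.0.1:%d", r.port), nil
}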
name resolver 被调用的地方只有一个:
func (d *directMessaging) getRemoteApp(appID string) (remoteApp, error) {
// 从appID中获取id和namespace
// appID 可能是类似 "appID.namespace" 的格式
id, namespace, err := d.requestAppIDAndNamespace(appID)
if err != nil {
return remoteApp{}, err
}
// 执行 resolver 的解析
request := nr.ResolveRequest{ID: id, Namespace: namespace, Port: d.grpcPort}
address, err := d.resolver.ResolveID(request)
if err != nil {
return remoteApp{}, err
}
// 返回 remoteApp 的地址
return remoteApp{
namespace: namespace,
id: id,
address: address,
}, nil
}
解析出来的地址在 directMessaging 的 Invoke() 中使用,用来执行远程调用:
// Invoke takes a message requests and invokes an app, either local or remote.
func (d *directMessaging) Invoke(ctx context.Context, targetAppID string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
app, err := d.getRemoteApp(targetAppID)
if err != nil {
return nil, err
}
// 如果目标应用的 id 和 namespace 都和 directMessaging 的一致,则执行 invokeLocal()
if app.id == d.appID && app.namespace == d.namespace {
return d.invokeLocal(ctx, req)
}
// 这是在带有重试机制的情况下调用 invokeRemote
return d.invokeWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, app, d.invokeRemote, req)
}
invokeWithRetry() 中忽略重试的代码:
func (d *directMessaging) invokeWithRetry(
ctx context.Context,
numRetries int,
backoffInterval time.Duration,
app remoteApp,
fn func(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error),
req *invokev1.InvokeMethodRequest,
) (*invokev1.InvokeMethodResponse, error) {
}
invokeRemote()
func (d *directMessaging) invokeRemote(ctx context.Context, appID, namespace, appAddress string, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
//
conn, teardown, err := d.connectionCreatorFn(context.TODO(), appAddress, appID, namespace, false, false, false)
defer teardown()
if err != nil {
return nil, err
}
ctx = d.setContextSpan(ctx)
d.addForwardedHeadersToMetadata(req)
d.addDestinationAppIDHeaderToMetadata(appID, req)
clientV1 := internalv1pb.NewServiceInvocationClient(conn)
var opts []grpc.CallOption
opts = append(opts, grpc.MaxCallRecvMsgSize(d.maxRequestBodySize*1024*1024), grpc.MaxCallSendMsgSize(d.maxRequestBodySize*1024*1024))
resp, err := clientV1.CallLocal(ctx, req.Proto(), opts...)
if err != nil {
return nil, err
}
return invokev1.InternalInvokeResponse(resp)
}
跳过细节和错误处理,尤其是去除所有同步保护代码(很复杂),只简单看输入和输出:
// ResolveID 通过 mDNS 将名称解析为地址。
func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
m.browseOne(ctx, req.ID, published)
select {
case addr := <-sub.AddrChan:
return addr, nil
case err := <-sub.ErrChan:
return "", err
case <-time.After(subscriberTimeout):
return "", fmt.Errorf("timeout waiting for address for app id %s", req.ID)
}
}
func (m *Resolver) browseOne(ctx context.Context, appID string, published chan struct{}) {
err := m.browse(browseCtx, appID, onFirst)
}
注意:只用到了 req.ID,全程没有使用 req.Namespace,也就是说 mDNS 根本不支持 Namespace。
mdns 的核心实现在 browseOne() 方法中:
func (m *Resolver) browseOne(ctx context.Context, appID string, published chan struct{}) {
// 启动一个 goroutine 异步执行
go func() {
var addr string
browseCtx, cancel := context.WithCancel(ctx)
defer cancel()
// 准备回调函数,收到第一个地址之后就取消 browse,所以这个函数名为 browseOne
onFirst := func(ip string) {
addr = ip
cancel() // cancel to stop browsing.
}
m.logger.Debugf("Browsing for first mDNS address for app id %s", appID)
// 执行 browse
err := m.browse(browseCtx, appID, onFirst)
// 忽略错误处理
......
m.pubAddrToSubs(appID, addr)
published <- struct{}{} // signal that all subscribers have been notified.
}()
}
继续看 browse 的实现:
// browse 将对所提供的 App ID 进行无阻塞的 mdns 网络浏览
func (m *Resolver) browse(ctx context.Context, appID string, onEach func(ip string)) error {
......
}
首先通过 zeroconf.NewResolver 构建一个 Resolver:
import "github.com/grandcat/zeroconf"
resolver, err := zeroconf.NewResolver(nil)
if err != nil {
return fmt.Errorf("failed to initialize resolver: %w", err)
}
......
zeroconf 是一个纯Golang库,采用多播 DNS-SD 来浏览和解析网络中的服务,并在本地网络中注册自己的服务。
执行mdns解析的代码是 resolver.Browse() 方法,解析的结果会异步发送到 entries 这个 channel 中:
entries := make(chan *zeroconf.ServiceEntry)
if err = resolver.Browse(ctx, appID, "local.", entries); err != nil {
return fmt.Errorf("failed to browse: %w", err)
}
每个从 mDNS browse 返回的 service entry 会这样处理:
// handle each service entry returned from the mDNS browse.
go func(results <-chan *zeroconf.ServiceEntry) {
for {
select {
case entry := <-results:
if entry == nil {
break
}
// 调用 handleEntry 方法来处理每个返回的 service entry
handleEntry(entry)
case <-ctx.Done():
// 如果所有 service entry 都处理完成了,或者是出错(取消或者超时)
// 此时需要退出 browse,但在退出之前需要检查一下是否有已经收到但还没有处理的结果
for len(results) > 0 {
handleEntry(<-results)
}
if errors.Is(ctx.Err(), context.Canceled) {
m.logger.Debugf("mDNS browse for app id %s canceled.", appID)
} else if errors.Is(ctx.Err(), context.DeadlineExceeded) {
m.logger.Debugf("mDNS browse for app id %s timed out.", appID)
}
return // stop listening for results.
}
}
}(entries)
handleEntry() 方法的实现:
handleEntry := func(entry *zeroconf.ServiceEntry) {
for _, text := range entry.Text {
// 检查appID看是否是自己要查找的app
if text != appID {
m.logger.Debugf("mDNS response doesn't match app id %s, skipping.", appID)
break
}
m.logger.Debugf("mDNS response for app id %s received.", appID)
// 检查是否有 IPv4 或者 ipv6 地址
hasIPv4Address := len(entry.AddrIPv4) > 0
hasIPv6Address := len(entry.AddrIPv6) > 0
if !hasIPv4Address && !hasIPv6Address {
m.logger.Debugf("mDNS response for app id %s doesn't contain any IPv4 or IPv6 addresses, skipping.", appID)
break
}
var addr string
port := entry.Port
// 目前只支持取第一个地址
// TODO: we currently only use the first IPv4 and IPv6 address.
// We should understand the cases in which additional addresses
// are returned and whether we need to support them.
// 加入到缓存中,缓存后面细看
if hasIPv4Address {
addr = fmt.Sprintf("%s:%d", entry.AddrIPv4[0].String(), port)
m.addAppAddressIPv4(appID, addr)
}
if hasIPv6Address {
addr = fmt.Sprintf("%s:%d", entry.AddrIPv6[0].String(), port)
m.addAppAddressIPv6(appID, addr)
}
// 开始回调,就是前面说的拿到第一个地址就取消 browse
if onEach != nil {
onEach(addr) // invoke callback.
}
}
}
至此就完成了 mdns 的解析,从 ID 到 address。
mdns 是非常慢的,为了性能就需要缓存解析后的地址,前面的代码在解析完成之后会保存这些地址:
// addAppAddressIPv4 adds an IPv4 address to the
// cache for the provided app id.
func (m *Resolver) addAppAddressIPv4(appID string, addr string) {
m.ipv4Mu.Lock()
defer m.ipv4Mu.Unlock()
m.logger.Debugf("Adding IPv4 address %s for app id %s cache entry.", addr, appID)
if _, ok := m.appAddressesIPv4[appID]; !ok {
var addrList addressList
m.appAddressesIPv4[appID] = &addrList
}
m.appAddressesIPv4[appID].add(addr)
}
在解析之前,在 ResolveID() 方法中会先尝试检查缓存中是否有数据,如果有就直接使用:
func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
// check for cached IPv4 addresses for this app id first.
if addr := m.nextIPv4Address(req.ID); addr != nil {
return *addr, nil
}
// check for cached IPv6 addresses for this app id second.
if addr := m.nextIPv6Address(req.ID); addr != nil {
return *addr, nil
}
......
}
从缓存中获取appID对应的地址:
// nextIPv4Address returns the next IPv4 address for
// the provided app id from the cache.
func (m *Resolver) nextIPv4Address(appID string) *string {
m.ipv4Mu.RLock()
defer m.ipv4Mu.RUnlock()
addrList, exists := m.appAddressesIPv4[appID]
if exists {
addr := addrList.next()
if addr != nil {
m.logger.Debugf("found mDNS IPv4 address in cache: %s", *addr)
return addr
}
}
return nil
}
addrList.next() 比较有意思,这里不是要获取地址列表,而是取单个地址。也就是说,当有多个地址时,这里 addrList.next() 实际上实现了负载均衡 ^0^
addressList 结构体的组成:
// addressList represents a set of addresses along with
// data used to control and access said addresses.
type addressList struct {
addresses []address
counter int
mu sync.RWMutex
}
除了地址数组之外,还有一个 counter ,以及并发保护的读写锁。
// max integer value supported on this architecture.
const maxInt = int(^uint(0) >> 1)
// next 按照当前的轮询实现从列表中获取下一个地址。除了尽力而为的线性迭代之外,不对选择结果做任何保证。
func (a *addressList) next() *string {
// 获取读锁
a.mu.RLock()
defer a.mu.RUnlock()
if len(a.addresses) == 0 {
return nil
}
// 如果 counter 达到 maxInt,就从头再来
if a.counter == maxInt {
a.counter = 0
}
// 用 counter 对地址数量求余,取余数所对应的地址,然后 counter 递增
// 相当于一个最简单常见的轮询算法
index := a.counter % len(a.addresses)
addr := a.addresses[index]
a.counter++
return &addr.ip
}
为了避免多个请求同时去解析同一个 ID,因此设计了并发保护机制,对于单个ID,只容许一个请求执行解析,其他请求会等待这个解析的结果:
// ResolveID resolves name to address via mDNS.
func (m *Resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
sub := NewSubscriber()
// add the sub to the pool of subs for this app id.
m.subMu.Lock()
appIDSubs, exists := m.subs[req.ID]
if !exists {
// WARN: must set appIDSubs variable for use below.
appIDSubs = NewSubscriberPool(sub)
m.subs[req.ID] = appIDSubs
} else {
appIDSubs.Add(sub)
}
m.subMu.Unlock()
// only one subscriber per pool will perform the first browse for the
// requested app id. The rest will subscribe for an address or error.
var once *sync.Once
var published chan struct{}
ctx, cancel := context.WithTimeout(context.Background(), browseOneTimeout)
defer cancel()
appIDSubs.Once.Do(func() {
published = make(chan struct{})
m.browseOne(ctx, req.ID, published)
// once will only be set for the first browser.
once = new(sync.Once)
})
......
}
mdns name resolver 返回的是一个简单的 ip 地址+端口(v4或者v6),形如 “192.168.0.100:8000”。
kubernetes 的实现超级简单,直接按照 Kubernetes services 的格式要求,拼出一个 Kubernetes services 的 name 即可:
// ResolveID resolves name to address in Kubernetes.
func (k *resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
// Dapr requires this formatting for Kubernetes services
return fmt.Sprintf("%s-dapr.%s.svc.%s:%d", req.ID, req.Namespace, k.clusterDomain, req.Port), nil
}
其中, req.ID 和 req.Namespace 对应到 Kubernetes 的 service name 和 namespace,注意这里的 Kubernetes service 是在 ID 后面加了 “-dapr” 后缀。Port 来自请求参数,简单拼接而已。
clusterDomain 稍微复杂一点,默认值是 “cluster.local”,在构建 Resolver 时设置:
const (
DefaultClusterDomain = "cluster.local"
)
type resolver struct {
logger logger.Logger
clusterDomain string
}
// NewResolver creates Kubernetes name resolver.
func NewResolver(logger logger.Logger) nameresolution.Resolver {
return &resolver{
logger: logger,
clusterDomain: DefaultClusterDomain,
}
}
可以在配置中设置名为 “clusterDomain” 的 metadata 来覆盖默认值:
const (
ClusterDomainKey = "clusterDomain"
)
func (k *resolver) Init(metadata nameresolution.Metadata) error {
configInterface, err := config.Normalize(metadata.Configuration)
if err != nil {
return err
}
if config, ok := configInterface.(map[string]string); ok {
clusterDomain := config[ClusterDomainKey]
if clusterDomain != "" {
k.clusterDomain = clusterDomain
}
}
return nil
}
kubernetes name resolver 返回的是一个简单的 Kubernetes services 的 name,形如 “app1-dapr.default.svc.cluster.local:80”。而不是一般意义上的 IP 地址。
dns 的实现也是超级简单,类似 kubernetes 的实现,直接按照 DNS 的格式要求,拼出一个 DNS name 即可:
// ResolveID resolves name to address in orchestrator.
func (k *resolver) ResolveID(req nameresolution.ResolveRequest) (string, error) {
return fmt.Sprintf("%s-dapr.%s.svc:%d", req.ID, req.Namespace, req.Port), nil
}
所有参数都来自请求,只是拼接而已。
DNS name resolver 返回的是一个简单的 Kubernetes services 的 name,形如 “app1-dapr.default.svc:80”。而不是一般意义上的 IP 地址。
初始化需要读取配置,建立连接:
func (r *resolver) Init(metadata nr.Metadata) error {
var err error
r.config, err = getConfig(metadata)
if err != nil {
return err
}
if err = r.client.InitClient(r.config.Client); err != nil {
return fmt.Errorf("failed to init consul client: %w", err)
}
// register service to consul
......
return nil
}
在 init 函数中,还可以根据配置的要求执行 consul 的服务注册功能:
// register service to consul
if r.config.Registration != nil {
if err := r.client.Agent().ServiceRegister(r.config.Registration); err != nil {
return fmt.Errorf("failed to register consul service: %w", err)
}
r.logger.Infof("service:%s registered on consul agent", r.config.Registration.Name)
} else if _, err := r.client.Agent().Self(); err != nil {
return fmt.Errorf("failed check on consul agent: %w", err)
}
consul 命名解析器的实现比较简单:
// ResolveID resolves name to address via consul.
func (r *resolver) ResolveID(req nr.ResolveRequest) (string, error) {
cfg := r.config
// 查询 consul 中对应服务的健康实例
// 只用到 req.ID,namespace 没有用到
services, _, err := r.client.Health().Service(req.ID, "", true, cfg.QueryOptions)
if err != nil {
return "", fmt.Errorf("failed to query healthy consul services: %w", err)
}
if len(services) == 0 {
return "", fmt.Errorf("no healthy services found with AppID:%s", req.ID)
}
// shuffle:洗牌,将传入的 services 按照随机方式对调位置
shuffle := func(services []*consul.ServiceEntry) []*consul.ServiceEntry {
for i := len(services) - 1; i > 0; i-- {
rndbig, _ := rand.Int(rand.Reader, big.NewInt(int64(i+1)))
j := rndbig.Int64()
services[i], services[j] = services[j], services[i]
}
return services
}
// 先洗牌,然后取结果中的第一个地址,相当于负载均衡中的随机算法
svc := shuffle(services)[0]
addr := ""
// 取地址和port信息
if port, ok := svc.Service.Meta[cfg.DaprPortMetaKey]; ok {
if svc.Service.Address != "" {
addr = fmt.Sprintf("%s:%s", svc.Service.Address, port)
} else if svc.Node.Address != "" {
addr = fmt.Sprintf("%s:%s", svc.Node.Address, port)
} else {
return "", fmt.Errorf("no healthy services found with AppID:%s", req.ID)
}
} else {
return "", fmt.Errorf("target service AppID:%s found but DAPR_PORT missing from meta", req.ID)
}
return addr, nil
}
consul name resolver 返回的是一个简单的ip/端口字符串,形如 “192.168.0.100:80”。对于多个实例,内部实现了随机算法。
Dapr runtime 对外提供两个 API,分别是 Dapr HTTP API 和 Dapr gRPC API。两个 Dapr API 对外暴露的端口,默认是:
- HTTP API:默认端口 3500,可以通过命令行参数 dapr-http-port 设置
- gRPC API:默认端口 50001,可以通过命令行参数 dapr-grpc-port 设置
gRPC API 定义在 dapr/proto/runtime/v1/dapr.proto
文件中的 Dapr service 中:
service Dapr {
// Publishes events to the specific topic.
rpc PublishEvent(PublishEventRequest) returns (google.protobuf.Empty) {}
......
}
// PublishEventRequest is the message to publish event data to pubsub topic
message PublishEventRequest {
// The name of the pubsub component
string pubsub_name = 1;
// The pubsub topic
string topic = 2;
// The data which will be published to topic.
bytes data = 3;
// The content type for the data (optional).
string data_content_type = 4;
// The metadata passing to pub components
//
// metadata property:
// - key : the key of the message.
map<string, string> metadata = 5;
}
主要的参数是:pubsub_name、topic 和 data。
可选参数有:data_content_type 和 metadata。
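下面是一个假设性的示意:直接使用 protoc 生成的 Dapr client,按照上面的 proto 定义组装 PublishEventRequest 并调用 PublishEvent(import 路径按 dapr 仓库的 pkg/proto 目录假定):

package main

import (
    "context"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    runtimev1pb "github.com/dapr/dapr/pkg/proto/runtime/v1"
)

func main() {
    // 连接本地 daprd 的 gRPC API(默认端口 50001)
    conn, err := grpc.Dial("localhost:50001", grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := runtimev1pb.NewDaprClient(conn)
    // 按照 PublishEventRequest 的定义填写主要参数和可选参数
    _, err = client.PublishEvent(context.Background(), &runtimev1pb.PublishEventRequest{
        PubsubName:      "pubsubname1",
        Topic:           "topic1",
        Data:            []byte(`{"hello":"world"}`),
        DataContentType: "application/json",
        Metadata:        map[string]string{"k1": "v1"},
    })
    if err != nil {
        log.Fatal(err)
    }
}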
HTTP API 没有明确的单独定义,不过可以从代码中获知。在 pkg/http/api.go
中,构建用于 publish 的 endpoint 的代码如下:
func (a *api) constructPubSubEndpoints() []Endpoint {
return []Endpoint{
{
// 发送 POST 或者 PUT 请求
Methods: []string{fasthttp.MethodPost, fasthttp.MethodPut},
// 到这个 URL
Route: "publish/{pubsubname}/{topic:*}",
Version: apiVersionV1,
Handler: a.onPublish,
},
}
}
因此,用于 publish 的 daprd URL 类似于 http://localhost:3500/v1.0/publish/pubsubname1/topic1
。
处理请求的 handler 方法 a.onPublish() 中读取参数的代码如下(忽略其他细节):
const (
pubsubnameparam = "pubsubname"
)
// 从 url 中读取 pubsubname
pubsubName := reqCtx.UserValue(pubsubnameparam).(string)
// 从 url 中读取 topic
topic := reqCtx.UserValue(topicParam).(string)
// 从 HTTP body 中读取 data
body := reqCtx.PostBody()
// 从 HTTP 的 Content-Type header 中读取 data_content_type
contentType := string(reqCtx.Request.Header.Peek("Content-Type"))
// 从 HTTP URL query 中读取 metadata
metadata := getMetadataFromRequest(reqCtx)
Metadata 的读取要稍微复杂一些,需要读取所有的 url query 参数,然后根据 key 的前缀判断是不是 metadata:
const (
metadataPrefix = "metadata."
)
func getMetadataFromRequest(reqCtx *fasthttp.RequestCtx) map[string]string {
metadata := map[string]string{}
// 遍历所有的 url query 参数
reqCtx.QueryArgs().VisitAll(func(key []byte, value []byte) {
queryKey := string(key)
// 如果 query 参数的 key 以 "metadata." 开头,就视为一个 metadata 的key
if strings.HasPrefix(queryKey, metadataPrefix) {
// key 的 前缀 "metadata." 要去掉
k := strings.TrimPrefix(queryKey, metadataPrefix)
metadata[k] = string(value)
}
})
return metadata
}
总结:用于 publish 的完整的 daprd URL 类似于 http://localhost:3500/v1.0/publish/pubsubname1/topic1?metadata.k1=v1&metadata.k2=v2&metadata.k3=v3
。消息内容通过 HTTP body 传递,另外可以通过 Content-Type header 传递消息内容类型参数。
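按照这个 URL 格式,一个不依赖 dapr SDK 的最简发布示意如下(假设性示例,使用 Go 标准库):

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // URL 路径对应路由 publish/{pubsubname}/{topic:*},metadata 通过 query 参数传递
    url := "http://localhost:3500/v1.0/publish/pubsubname1/topic1?metadata.k1=v1&metadata.k2=v2"

    // 消息内容放在 HTTP body 中,Content-Type header 对应 data_content_type
    body := bytes.NewBufferString(`{"hello":"world"}`)
    resp, err := http.Post(url, "application/json", body)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("publish status:", resp.StatusCode)
}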
默认情况下使用 gRPC 协议进行消息发布,daprd 在默认的 50001 端口,通过注册的 dapr service 的 PublishEvent() 方法接收来自客户端通过 dapr SDK 发出的 gRPC 请求,之后根据具体的组件实现,对底层实际使用的消息中间件发布事件。流程大体如下:
title Pub-Sub via gRPC Protocol
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=User Code
----
producer
]
participant SDK_client [
=Dapr SDK
----
producer
]
end box
participant daprd_client [
=daprd
----
producer
]
participant message_broker as "Message Broker"
user_code_client -> SDK_client : PublishEvent()
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
note right: PublishEvent() @ Dapr service
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001
|||
daprd_client -[#red]> message_broker : native protocol (remote call)
|||
message_broker --[#red]> daprd_client :
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
使用 HTTP 协议时类似,daprd 在默认的 3500 端口,通过前面所述的 URL 接收客户端通过 dapr SDK 发出的 HTTP 请求。流程大体如下:
title Pub-Sub via HTTP Protocol
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=User Code
----
producer
]
participant SDK_client [
=Dapr SDK
----
producer
]
end box
participant daprd_client [
=daprd
----
producer
]
participant message_broker as "Message Broker"
user_code_client -> SDK_client : PublishEvent()
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
note right: POST http://localhost:3500/v1.0/publish/pubsubname1/topic1?\nmetadata.k1=v1&metadata.k2=v2&metadata.k3=v3
SDK_client -[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500
|||
daprd_client -[#red]> message_broker : native protocol (remote call)
|||
message_broker --[#red]> daprd_client :
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
在 dapr runtime 启动进行初始化时,需要开启 API 端口并挂载相应的 handler 来接收并处理发布订阅中的发布请求。另外需要根据配置文件启动 pubsub component 以便连接到外部 message broker。
在 dapr runtime 启动时的初始化过程中,会启动 gRPC server, 代码在 pkg/runtime/runtime.go
中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
// Create and start internal and external gRPC servers
grpcAPI := a.getGRPCAPI()
err = a.startGRPCAPIServer(grpcAPI, a.runtimeConfig.APIGRPCPort)
......
}
func (a *DaprRuntime) startGRPCAPIServer(api grpc.API, port int) error {
serverConf := a.getNewServerConfig(a.runtimeConfig.APIListenAddresses, port)
server := grpc.NewAPIServer(api, serverConf, a.globalConfig.Spec.TracingSpec, a.globalConfig.Spec.MetricSpec, a.globalConfig.Spec.APISpec, a.proxy)
if err := server.StartNonBlocking(); err != nil {
return err
}
......
}
// NewAPIServer returns a new user facing gRPC API server.
func NewAPIServer(api API, config ServerConfig, ......) Server {
return &server{
api: api,
config: config,
kind: apiServer, // const apiServer = "apiServer"
......
}
}
为了让 dapr runtime 的 gRPC 服务器能挂载 Dapr API,需要将定义 dapr api 的 dapr service 注册到 gRPC 服务器上去。
注册的代码实现在 pkg/grpc/server.go
中, StartNonBlocking() 方法在启动 grpc 服务器时,会进行服务注册:
func (s *server) StartNonBlocking() error {
if s.kind == internalServer {
internalv1pb.RegisterServiceInvocationServer(server, s.api)
} else if s.kind == apiServer {
runtimev1pb.RegisterDaprServer(server, s.api) // 注意:s.api (即 gRPC api 实现) 被传递进去
}
......
}
而 RegisterDaprServer() 方法的实现代码在 pkg/proto/runtime/v1/dapr_grpc.pb.go
:
func RegisterDaprServer(s grpc.ServiceRegistrar, srv DaprServer) {
s.RegisterService(&Dapr_ServiceDesc, srv) // srv 即 gRPC api 实现
}
在文件 pkg/proto/runtime/v1/dapr_grpc.pb.go
中有 Dapr Service 的 grpc 服务定义,这是 protoc 生成的 gRPC 代码。
Dapr_ServiceDesc 中有 Dapr Service 各个方法的定义,和发布相关的是 PublishEvent
方法:
var Dapr_ServiceDesc = grpc.ServiceDesc{
ServiceName: "dapr.proto.runtime.v1.Dapr",
HandlerType: (*DaprServer)(nil),
Methods: []grpc.MethodDesc{
{
MethodName: "PublishEvent", // 注册方法名
Handler: _Dapr_PublishEvent_Handler, // 关联实现的 Handler
},
......
},
Metadata: "dapr/proto/runtime/v1/dapr.proto",
}
这一段是告诉 gRPC server: 如果收到访问 dapr.proto.runtime.v1.Dapr
服务的 PublishEvent
方法的 gRPC 请求,请把请求转给 _Dapr_PublishEvent_Handler
处理。
title Dapr publish gRPC API
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
producer
]
-[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n/dapr.proto.runtime.v1.Dapr/PublishEvent
|||
<[#blue]-- daprd_client
而 PublishEvent
方法相关联的 handler 方法 _Dapr_PublishEvent_Handler
的实现代码是:
func _Dapr_PublishEvent_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) {
in := new(PublishEventRequest)
if err := dec(in); err != nil {
return nil, err
}
if interceptor == nil {
return srv.(DaprServer).PublishEvent(ctx, in)
}
info := &grpc.UnaryServerInfo{
Server: srv,
FullMethod: "/dapr.proto.runtime.v1.Dapr/PublishEvent",
}
handler := func(ctx context.Context, req interface{}) (interface{}, error) {
return srv.(DaprServer).PublishEvent(ctx, req.(*PublishEventRequest))
}
return interceptor(ctx, in, info, handler)
}
最后调用到了 DaprServer 接口实现的 PublishEvent 方法,也就是 gRPC API 实现。
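顺带一提,_Dapr_PublishEvent_Handler 签名中的 interceptor 就是标准的 grpc-go UnaryServerInterceptor。下面用一个极简的日志 interceptor 示意 “interceptor -> handler -> API 实现” 这条调用链(仅为示意,并非 dapr 实际使用的 interceptor):
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
)

// loggingInterceptor 在调用真正的 handler(例如 _Dapr_PublishEvent_Handler 中
// 包装的 srv.PublishEvent)前后打印日志
func loggingInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	log.Printf("before %s", info.FullMethod) // 例如 /dapr.proto.runtime.v1.Dapr/PublishEvent
	resp, err := handler(ctx, req)           // 转给下一层,最终到达 API 实现
	log.Printf("after %s, err=%v", info.FullMethod, err)
	return resp, err
}

func main() {
	// 通过 ServerOption 把 interceptor 挂到 gRPC server 上;
	// dapr 的 gRPC server 也是用类似方式挂载 tracing / metrics 等 interceptor(此处仅为示意)
	_ = grpc.NewServer(grpc.UnaryInterceptor(loggingInterceptor))
}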
dapr runtime 的 HTTP server 用的是 fasthttp。在 dapr runtime 启动时的初始化过程中,会启动 HTTP server,代码在 pkg/runtime/runtime.go 中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
// Start HTTP Server
err = a.startHTTPServer(a.runtimeConfig.HTTPPort, a.runtimeConfig.PublicPort, a.runtimeConfig.ProfilePort, a.runtimeConfig.AllowedOrigins, pipeline)
if err != nil {
log.Fatalf("failed to start HTTP server: %s", err)
}
......
}
func (a *DaprRuntime) startHTTPServer(......) error {
a.daprHTTPAPI = http.NewAPI(......)
server := http.NewServer(a.daprHTTPAPI, ......)
if err := server.StartNonBlocking(); err != nil { // StartNonBlocking 启动 fasthttp server
return err
}
}
在 HTTP API 的初始化过程中,会在 fast http server 上挂载 PubSub 的 HTTP 端点,代码在 pkg/http/api.go
中:
func NewAPI(
appID string,
appChannel channel.AppChannel,
directMessaging messaging.DirectMessaging,
......
shutdown func()) API {
api := &api{
appChannel: appChannel,
directMessaging: directMessaging,
......
}
// 附加 PubSub 的 HTTP 端点
api.endpoints = append(api.endpoints, api.constructPubSubEndpoints()...)
......
return api
}
PubSub 的 HTTP 端点的具体信息在 constructPubSubEndpoints() 方法中:
func (a *api) constructPubSubEndpoints() []Endpoint {
return []Endpoint{
{
Methods: []string{fasthttp.MethodPost, fasthttp.MethodPut},
Route: "publish/{pubsubname}/{topic:*}",
Version: apiVersionV1,
Handler: a.onPublish,
},
}
}
注意这里的 Route 路径 "publish/{pubsubname}/{topic:*}",dapr sdk 就是通过这样的 URL 来发起 HTTP publish 请求。
title Dapr Publish HTTP API
hide footbox
skinparam style strictuml
participant daprd_client [
=daprd
----
producer
]
-[#blue]> daprd_client : HTTP (localhost)
note right: HTTP API @ 3500\n/v1.0/publish/{pubsubname}/{topic:*}
|||
<[#blue]-- daprd_client
为了提供对 pubsub 的功能支持,需要为 dapr runtime 配置 pubsub component。
DaprRuntime 的结构体中保存有 pubSubRegistry 和 pubSubs 列表:
type DaprRuntime struct {
......
pubSubRegistry pubsub_loader.Registry
pubSubs map[string]pubsub.PubSub
......
}
runtime 构建时会初始化这两个结构体:
func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration, accessControlList *config.AccessControlList, resiliencyProvider resiliency.Provider) *DaprRuntime {
ctx, cancel := context.WithCancel(context.Background())
return &DaprRuntime{
......
pubSubs: map[string]pubsub.PubSub{},
pubSubRegistry: pubsub_loader.NewRegistry(),
......
pubSubRegistry 用于保存 dapr runtime 中支持的所有 pubsub component :
pubSubRegistry struct {
messageBuses map[string]func() pubsub.PubSub
}
在 runtime binary (cmd/daprd/main.go
)的代码中,会列举出所有的 pubsub component,这也是 dapr 和 components-contrib 两个仓库的直接联系:
err = rt.Run(
......
runtime.WithPubSubs(
pubsub_loader.New("azure.eventhubs", func() pubs.PubSub {
return pubsub_eventhubs.NewAzureEventHubs(logContrib)
}),
pubsub_loader.New("azure.servicebus", func() pubs.PubSub {
return servicebus.NewAzureServiceBus(logContrib)
}),
pubsub_loader.New("gcp.pubsub", func() pubs.PubSub {
return pubsub_gcp.NewGCPPubSub(logContrib)
}),
pubsub_loader.New("hazelcast", func() pubs.PubSub {
return pubsub_hazelcast.NewHazelcastPubSub(logContrib)
}),
pubsub_loader.New("jetstream", func() pubs.PubSub {
return pubsub_jetstream.NewJetStream(logContrib)
}),
pubsub_loader.New("kafka", func() pubs.PubSub {
return pubsub_kafka.NewKafka(logContrib)
}),
pubsub_loader.New("mqtt", func() pubs.PubSub {
return pubsub_mqtt.NewMQTTPubSub(logContrib)
}),
pubsub_loader.New("natsstreaming", func() pubs.PubSub {
return natsstreaming.NewNATSStreamingPubSub(logContrib)
}),
pubsub_loader.New("pulsar", func() pubs.PubSub {
return pubsub_pulsar.NewPulsar(logContrib)
}),
pubsub_loader.New("rabbitmq", func() pubs.PubSub {
return rabbitmq.NewRabbitMQ(logContrib)
}),
pubsub_loader.New("redis", func() pubs.PubSub {
return pubsub_redis.NewRedisStreams(logContrib)
}),
pubsub_loader.New("snssqs", func() pubs.PubSub {
return pubsub_snssqs.NewSnsSqs(logContrib)
}),
pubsub_loader.New("in-memory", func() pubs.PubSub {
return pubsub_inmemory.New(logContrib)
}),
),
......
)
runtime 在初始化时会将这些 pubsub component 信息保存在 pubSubRegistry 中:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
a.pubSubRegistry.Register(opts.pubsubs...)
}
需要注意的是,pubSubRegistry 中保存的是 dapr runtime 支持的所有 pubsub 组件,但并不是每个组件在 runtime 启动时都会被装载。组件的装载是按需的,由组件配置文件(yaml)决定装载和初始化哪些组件的实例。
组件在 dapr runtime 初始化时统一装载:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
a.pubSubRegistry.Register(opts.pubsubs...)
a.secretStoresRegistry.Register(opts.secretStores...)
a.stateStoreRegistry.Register(opts.states...)
......
err = a.loadComponents(opts)
a.flushOutstandingComponents()
......
}
loadComponents() 有两种实现方式,分别对应 KubernetesMode 和 StandaloneMode:
func (a *DaprRuntime) loadComponents(opts *runtimeOpts) error {
var loader components.ComponentLoader
switch a.runtimeConfig.Mode {
case modes.KubernetesMode:
loader = components.NewKubernetesComponents(a.runtimeConfig.Kubernetes, a.namespace, a.operatorClient, a.podName)
case modes.StandaloneMode:
loader = components.NewStandaloneComponents(a.runtimeConfig.Standalone)
default:
return errors.Errorf("components loader for mode %s not found", a.runtimeConfig.Mode)
}
comps, err := loader.LoadComponents()
......
}
KubernetesMode 下读取的是 k8s 下的 component CRD:
func (k *KubernetesComponents) LoadComponents() ([]components_v1alpha1.Component, error) {
resp, err := k.client.ListComponents(context.Background(), &operatorv1pb.ListComponentsRequest{
Namespace: k.namespace,
PodName: k.podName,
}, ......
}
StandaloneMode 下读取的是由 ComponentsPath 配置(--componentspath
)指定的目录下的 component CRD 文件:
func (s *StandaloneComponents) LoadComponents() ([]components_v1alpha1.Component, error) {
files, err := os.ReadDir(s.config.ComponentsPath)
......
}
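StandaloneMode 下“扫描目录、解析 component yaml”的过程,可以用下面的极简 Go 代码示意(假设使用 gopkg.in/yaml.v3,结构体只保留了最关键的几个字段,并非 dapr 中 Component 定义的完整形态):
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"

	"gopkg.in/yaml.v3"
)

// component 只保留了示意所需的最少字段
type component struct {
	Kind     string `yaml:"kind"`
	Metadata struct {
		Name string `yaml:"name"`
	} `yaml:"metadata"`
	Spec struct {
		Type    string `yaml:"type"`    // 例如 pubsub.redis
		Version string `yaml:"version"` // 例如 v1
	} `yaml:"spec"`
}

func loadComponents(dir string) ([]component, error) {
	files, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var comps []component
	for _, f := range files {
		if f.IsDir() || !strings.HasSuffix(f.Name(), ".yaml") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, f.Name()))
		if err != nil {
			return nil, err
		}
		var c component
		if err := yaml.Unmarshal(data, &c); err != nil {
			return nil, err
		}
		if c.Kind == "Component" {
			comps = append(comps, c)
		}
	}
	return comps, nil
}

func main() {
	comps, err := loadComponents("./components")
	if err != nil {
		panic(err)
	}
	for _, c := range comps {
		fmt.Printf("component %s: type=%s version=%s\n", c.Metadata.Name, c.Spec.Type, c.Spec.Version)
	}
}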
在完成 HTTP server 和 gRPC server 的初始化之后,dapr runtime 就做好了接收 publish 请求的准备。
在业务代码中使用 pubsub 功能的示例可参考文件 dapr java-sdk 中的代码 /src/main/java/io/dapr/examples/pubsub/http/Publisher.java
,代码示意如下:
DaprClient client = (new DaprClientBuilder()).build();
String message = String.format("This is message #%d", i);
client.publishEvent(
"messagebus",
"testingtopic",
message,
singletonMap("ttlInSeconds", "1000")).block();
java SDK 中除了 service invoke 默认使用 HTTP,其他方法都默认使用 gRPC,在 DaprClientProxy 类中初始化了两个 dapr client。
pubsub 方法默认走 gRPC ,使用的是 DaprClientGrpc
类型 (文件为 src/main/java/io/dapr/client/DaprClientGrpc.java
):
@Override
public Mono<Void> publishEvent(PublishEventRequest request) {
try {
String pubsubName = request.getPubsubName();
String topic = request.getTopic();
Object data = request.getData();
DaprProtos.PublishEventRequest.Builder envelopeBuilder = DaprProtos.PublishEventRequest.newBuilder()
......
return Mono.subscriberContext().flatMap(
context ->
this.<Empty>createMono(
it -> intercept(context, asyncStub).publishEvent(envelopeBuilder.build(), it)
)
).then();
}
在这里根据请求条件设置 PublishEvent 请求的各种参数,debug 时可以观察到这些数据被填入请求对象,随后以 gRPC 请求的形式发给 dapr runtime。
这里调用的 gRPC 服务是 dapr.proto.runtime.v1.Dapr
, 方法是 PublishEvent
,和前一章中 dapr runtime 初始化中设定的 gRPC API 对应。
title PublishEvent via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code_client [
=App-1
----
producer
]
participant SDK_client [
=SDK
----
producer
]
end box
participant daprd_client [
=daprd
----
producer
]
user_code_client -> SDK_client : PublishEvent()
note left: pubsub_name="name-1"\ntopic="topic-1"\ndata="[...]"\ndata_content_type=""\nmetadata="[...]"
SDK_client -[#blue]> daprd_client : gRPC (localhost)
note right: gRPC API @ 50001\n"dapr.proto.runtime.v1.Dapr/PublishEvent"
|||
SDK_client <[#blue]-- daprd_client
user_code_client <-- SDK_client
在 go 业务代码中使用 publish 功能的示例可参考 https://github.com/dapr/go-sdk/blob/main/examples/pubsub/pub/pub.go,代码示意如下:
client, err := dapr.NewClient()
err = client.PublishEvent(ctx, pubsubName, topicName, data)
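把上面两行补全成一个可独立运行的小程序,大致如下(示意代码,pubsub 组件名和 topic 均为示例值,需要与 component 配置对应):
package main

import (
	"context"
	"log"

	dapr "github.com/dapr/go-sdk/client"
)

func main() {
	// NewClient 默认通过 gRPC 连接本机的 daprd(默认 50001 端口)
	client, err := dapr.NewClient()
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := context.Background()
	// pubsubName / topicName 为示例值,需要与 component yaml 中的配置对应
	data := []byte(`{"orderId":"100"}`)
	if err := client.PublishEvent(ctx, "messagebus", "testingtopic", data); err != nil {
		log.Fatalf("publish failed: %v", err)
	}
	log.Println("published")
}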
Go SDK 中定义了 Client 接口,文件为 client/client.go
:
// Client is the interface for Dapr client implementation.
type Client interface {
// PublishEvent publishes data onto topic in specific pubsub component.
PublishEvent(ctx context.Context, pubsubName, topicName string, data interface{}, opts ...PublishEventOption) error
......
}
方法的实现在 client/pubsub.go
中,都只是实现了对 PublishEventRequest 对象的组装:
func (c *GRPCClient) PublishEvent(ctx context.Context, pubsubName, topicName string, data interface{}, opts ...PublishEventOption) error {
request := &pb.PublishEventRequest{
PubsubName: pubsubName,
Topic: topicName,
}
_, err := c.protoClient.PublishEvent(c.withAuthToken(ctx), request)
......
}
PublishEvent() 是 protoc 生成的 grpc 代码,在 dapr/proto/runtime/v1/dapr_grpc.pb.go
中,实现如下:
func (c *daprClient) PublishEvent(ctx context.Context, in *PublishEventRequest, opts ...grpc.CallOption) (*emptypb.Empty, error) {
out := new(emptypb.Empty)
err := c.cc.Invoke(ctx, "/dapr.proto.runtime.v1.Dapr/PublishEvent", in, out, opts...)
if err != nil {
return nil, err
}
return out, nil
}
注意:这里调用的 gRPC 服务是 dapr.proto.runtime.v1.Dapr,方法是 PublishEvent,和 dapr runtime 中 gRPC API 对应。
TODO
在 dapr runtime 中,提供 HTTP 和 gRPC 两种协议,前面 runtime 初始化时介绍了 HTTP 和 gRPC 两种协议是如何在 runtime 初始化时准备好接收来自客户端的 publish 请求的。现在我们介绍在接收到来自客户端的 publish 请求后,dapr runtime 是如何处理请求的。
在 gRPC API 的实现中,PublishEvent() 方法负责处理接收到的 publish 请求,其主要流程大体是如下4个步骤:
type api struct {
pubsubAdapter runtimePubsub.Adapter
}
func (a *api) PublishEvent(ctx context.Context, in *runtimev1pb.PublishEventRequest) (*emptypb.Empty, error) {
// 1. 根据名称找到可以处理请求的 pubsub 组件
thepubsub := a.pubsubAdapter.GetPubSub(pubsubName)
// 2. 处理参数的细节:如是否要封装为 cloudevent
// 细节忽略,后续展开
// 3. 构建 PublishRequest 请求对象
req := pubsub.PublishRequest{
PubsubName: pubsubName,
Topic: topic,
Data: data,
Metadata: in.Metadata,
}
// 4. 委托 pubsub 组件来负责具体的请求发送
err := a.pubsubAdapter.Publish(&req)
}
上面是主流程的骨架,PublishEvent() 方法开头的参数检查逻辑展开如下:
// 检查是否有初始化 pubsubAdapter,没有的话报错退出
if a.pubsubAdapter == nil {
err := status.Error(codes.FailedPrecondition, messages.ErrPubsubNotConfigured)
apiServerLogger.Debug(err)
return &emptypb.Empty{}, err
}
pubsubName := in.PubsubName
// 检查请求,pubsubName 参数不能为空
if pubsubName == "" {
err := status.Error(codes.InvalidArgument, messages.ErrPubsubEmpty)
apiServerLogger.Debug(err)
return &emptypb.Empty{}, err
}
// 根据 pubsubName 参数在 pubsubAdapter 中找到对应的组件
thepubsub := a.pubsubAdapter.GetPubSub(pubsubName)
if thepubsub == nil {
// 如果找不到,则报错退出
err := status.Errorf(codes.InvalidArgument, messages.ErrPubsubNotFound, pubsubName)
apiServerLogger.Debug(err)
return &emptypb.Empty{}, err
}
GetPubSub() 方法的实现很简单,就是根据 pubsubName 在现有已经初始化的 pubsub 组件中进行简单的map查找:
// GetPubSub is an adapter method to find a pubsub by name.
func (a *DaprRuntime) GetPubSub(pubsubName string) pubsub.PubSub {
ps, ok := a.pubSubs[pubsubName]
if !ok {
return nil
}
return ps.component
}
func (a *DaprRuntime) Publish(req *pubsub.PublishRequest) error {
// 这里又根据名称做了一次查找
// TBD:可以考虑做代码优化了,从前面把找到的组件传递过来就好了
ps, ok := a.pubSubs[req.PubsubName]
if !ok {
return runtimePubsub.NotFoundError{PubsubName: req.PubsubName}
}
// 检查 pubsub 操作是否被容许
if allowed := a.isPubSubOperationAllowed(req.PubsubName, req.Topic, ps.scopedPublishings); !allowed {
return runtimePubsub.NotAllowedError{Topic: req.Topic, ID: a.runtimeConfig.ID}
}
// 执行策略
policy := a.resiliency.ComponentOutboundPolicy(a.ctx, req.PubsubName)
return policy(func(ctx context.Context) (err error) {
// 最终调用到底层实际组件的 Publish 方法来发送请求
return ps.component.Publish(req)
})
}
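这里的 policy 本质上是一个“包裹执行”的高阶函数:先根据组件名拿到对应的出站弹性策略,再把真正的发送动作作为闭包交给它执行。下面用一个带简单重试的极简示意来说明这种形态(并非 dapr resiliency 模块的真实实现):
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// outboundPolicy 把“执行一次操作”的闭包包装进重试逻辑中,
// 形式上与 runtime 中 policy(func(ctx) error) 的用法一致(仅为示意)
func outboundPolicy(ctx context.Context, retries int) func(op func(ctx context.Context) error) error {
	return func(op func(ctx context.Context) error) error {
		var err error
		for i := 0; i <= retries; i++ {
			if err = op(ctx); err == nil {
				return nil
			}
			time.Sleep(100 * time.Millisecond)
		}
		return err
	}
}

func main() {
	policy := outboundPolicy(context.Background(), 2)
	attempt := 0
	err := policy(func(ctx context.Context) error {
		attempt++
		if attempt < 3 {
			return errors.New("broker unavailable")
		}
		return nil // 第三次“发送”成功
	})
	fmt.Println("attempts:", attempt, "err:", err)
}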
HTTP API 的处理方式和 gRPC API 是一致的,只是由于 HTTP 协议的原因,HTTP API 在请求参数的获取上无法像 gRPC API 那样有一个 runtimev1pb.PublishEventRequest 对象完整地封装所有请求参数,因此会多出一个从 HTTP 请求中提取参数的过程。
HTTP API 实现中的 onPublish() 方法的前面一段代码就是在处理如何从 HTTP 请求中获取 publish 所需的所有参数:
func (a *api) onPublish(reqCtx *fasthttp.RequestCtx) {
// 1. pubsubName
pubsubName := reqCtx.UserValue(pubsubnameparam).(string)
// 2. topic
topic := reqCtx.UserValue(topicParam).(string)
// 3. data
body := reqCtx.PostBody()
// 4. data content type
contentType := string(reqCtx.Request.Header.Peek("Content-Type"))
// 5. metadata
metadata := getMetadataFromRequest(reqCtx)
// 后续处理和 gRPC 协议一致
......
}
在 dapr runtime API 实现(包括 HTTP API 和 gRPC API)和底层 pubsub 组件之间,还有一个简单的内部接口,定义了 pubsub 组件的功能:
// PubSub is the interface for message buses.
type PubSub interface {
Init(metadata Metadata) error
Features() []Feature
Publish(req *PublishRequest) error
Subscribe(ctx context.Context, req SubscribeRequest, handler Handler) error
Close() error
}
其中的 Publish() 用来发送消息。请求参数 PublishRequest 的字段和 Dapr API 定义中保持一致:
// PublishRequest is the request to publish a message.
type PublishRequest struct {
Data []byte `json:"data"`
PubsubName string `json:"pubsubname"`
Topic string `json:"topic"`
Metadata map[string]string `json:"metadata"`
ContentType *string `json:"contentType,omitempty"`
}
以 redis stream 为例,看看 publish 方法的实现:
func (r *redisStreams) Publish(req *pubsub.PublishRequest) error {
_, err := r.client.XAdd(r.ctx, &redis.XAddArgs{
Stream: req.Topic,
MaxLenApprox: r.metadata.maxLenApprox,
Values: map[string]interface{}{"data": req.Data},
}).Result()
if err != nil {
return fmt.Errorf("redis streams: error from publish: %s", err)
}
return nil
}
redis stream 的实现很简单,req.Topic 参数指定要写入的 redis stream,内容为一个map,其中 key “data” 的值为 req.Data。
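为了加深对这个内部接口的理解,下面给出一个极简的内存版 pubsub 组件示意(仅演示接口形态,不是 dapr 官方 in-memory 组件的真实代码;PublishRequest、Handler 等类型这里做了同名的最小化假设,以便示例自包含):
package main

import (
	"context"
	"fmt"
	"sync"
)

// 下面这些类型是对上文接口中同名类型的最小化假设,仅为让示例自包含
type (
	PublishRequest struct {
		Data       []byte
		PubsubName string
		Topic      string
	}
	NewMessage struct {
		Data  []byte
		Topic string
	}
	Handler func(ctx context.Context, msg *NewMessage) error
)

// inMemoryPubSub:把每个 topic 的订阅 handler 存在 map 里,Publish 时同步回调
type inMemoryPubSub struct {
	mu       sync.RWMutex
	handlers map[string][]Handler
}

func newInMemoryPubSub() *inMemoryPubSub {
	return &inMemoryPubSub{handlers: map[string][]Handler{}}
}

func (p *inMemoryPubSub) Subscribe(ctx context.Context, topic string, handler Handler) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.handlers[topic] = append(p.handlers[topic], handler)
	return nil
}

func (p *inMemoryPubSub) Publish(req *PublishRequest) error {
	p.mu.RLock()
	defer p.mu.RUnlock()
	for _, h := range p.handlers[req.Topic] {
		if err := h(context.Background(), &NewMessage{Data: req.Data, Topic: req.Topic}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ps := newInMemoryPubSub()
	_ = ps.Subscribe(context.Background(), "topic1", func(ctx context.Context, msg *NewMessage) error {
		fmt.Printf("got message on %s: %s\n", msg.Topic, msg.Data)
		return nil
	})
	_ = ps.Publish(&PublishRequest{PubsubName: "in-memory", Topic: "topic1", Data: []byte("hello")})
}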
订阅流程实际包含三个子流程:
获取应用订阅消息
daprd 需要获知应用的订阅信息。
实现中,dapr 会要求应用收集订阅信息并通过指定方式暴露(SDK 可以提供帮助),以便 daprd 可以通过给应用发送请求来获取这些订阅信息。
执行消息订阅
Daprd 在拿到应用的订阅信息之后,就可以使用底层组件的订阅机制进行消息订阅。
转发消息给应用
daprd 收到来自底层组件的订阅的消息之后,需要将消息转发给应用。
以上子流程1和3都需要 daprd 主动访问应用,因此 dapr 需要获知应用在哪个端口监听并处理订阅请求,这个信息通过命令行参数 app-port
设置。Dapr 的示例中一般喜欢用 3000 端口。
gRPC API 定义在 dapr/proto/runtime/v1/appcallback.proto
文件中的 AppCallback service 中:
service AppCallback {
// 子流程1:获取应用订阅消息
rpc ListTopicSubscriptions(google.protobuf.Empty) returns (ListTopicSubscriptionsResponse) {}
// 子流程3:转发消息给应用
rpc OnTopicEvent(TopicEventRequest) returns (TopicEventResponse) {}
......
}
ListTopicSubscriptionsResponse 的定义:
message ListTopicSubscriptionsResponse {
repeated common.v1.TopicSubscription subscriptions = 1;
}
message TopicSubscription {
// pubsub的组件名
string pubsub_name = 1;
// 要订阅的topic
string topic = 2;
// 可选参数,后面展开
map<string,string> metadata = 3;
TopicRoutes routes = 5;
string dead_letter_topic = 6;
}
即应用可以有多个消息订阅,每个订阅都必须提供 pubsub_name 和 topic 参数。
TopicEventRequest 的定义:
message TopicEventRequest {
// 这几个参数先忽略
string id = 1;
string source = 2;
string type = 3;
string spec_version = 4;
string path = 9;
// 事件的基本信息
string data_content_type = 5;
bytes data = 7;
string topic = 6;
string pubsub_name = 8;
}
title Subscribe via http
hide footbox
skinparam style strictuml
box "App-1"
participant user_code [
=App-1
----
consumer
]
participant SDK [
=SDK
----
consumer
]
end box
participant daprd [
=daprd
----
consumer
]
participant message_broker as "Message Broker"
SDK -> user_code: collect subscriptions
user_code --> SDK
daprd -[#blue]> SDK : http
note left: appChannel.InvokeMethod("dapr/subscribe")
SDK --[#blue]> daprd :
daprd -[#red]> message_broker : subscribe topics
message_broker --[#red]> daprd
|||
|||
|||
|||
message_broker -[#red]> daprd: event
daprd -[#blue]> SDK : http
note left: appChannel.InvokeMethod("/{route}")
SDK -> user_code :
user_code --> SDK
SDK --[#blue]> daprd
|||
title Subscribe via gRPC
hide footbox
skinparam style strictuml
box "App-1"
participant user_code [
=App-1
----
consumer
]
participant SDK [
=SDK
----
consumer
]
end box
participant daprd [
=daprd
----
consumer
]
participant message_broker as "Message Broker"
SDK -> user_code: collect subscriptions
user_code --> SDK
daprd -[#blue]> SDK : gRPC
note left: appChannel.ListTopicSubscriptions()
SDK --[#blue]> daprd :
daprd -[#red]> message_broker : subscribe topics
message_broker --[#red]> daprd
|||
|||
|||
|||
message_broker -[#red]> daprd: event
daprd -[#blue]> SDK : gRPC
note left: appChannel.OnTopicEvent()
SDK -> user_code :
user_code --> SDK
SDK --[#blue]> daprd
|||
在 dapr runtime 启动进行初始化时,会创建和 app 的连接(称为 app channel),然后开始发布订阅的初始化:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
// 有一个单独的 go routine 负责处理 component 的初始化
go a.processComponents()
err = a.loadComponents(opts)
// 等待应用ready: 前提是设置了 app port
a.blockUntilAppIsReady()
// 创建 app channel
err = a.createAppChannel()
// app channel 支持 http 和 grpc
a.daprHTTPAPI.SetAppChannel(a.appChannel)
grpcAPI.SetAppChannel(a.appChannel)
......
// 开始发布订阅的初始化
a.startSubscribing()
}
这里有一段复杂的并行初始化components并处理相互依赖的逻辑,忽略这些细节,只看执行 component 初始化的代码:
func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
switch category {
case pubsubComponent:
return a.initPubSub(comp)
......
}
return nil
}
func (a *DaprRuntime) initPubSub(c components_v1alpha1.Component) error {
pubSub, err := a.pubSubRegistry.Create(c.Spec.Type, c.Spec.Version)
// 初始化 pubSub component
err = pubSub.Init(pubsub.Metadata{
Properties: properties,
})
pubsubName := c.ObjectMeta.Name
a.pubSubs[pubsubName] = pubSub
return nil
}
这个执行完成之后,a.pubSubs 中便保存有当前配置并初始化好的 pubsub 组件列表。
订阅的初始化发生在 dapr runtime 启动过程的最后阶段:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
// 开始发布订阅的初始化
a.startSubscribing()
}
startSubscribing() 方法逐个处理 pubSub 组件:
func (a *DaprRuntime) startSubscribing() {
for name, pubsub := range a.pubSubs {
if err := a.beginPubSub(name, pubsub); err != nil {
log.Errorf("error occurred while beginning pubsub %s: %s", name, err)
}
}
}
beginPubSub 方法做了两个事情: 1. 获取应用的订阅信息 2. 让组件开始订阅
func (a *DaprRuntime) beginPubSub(name string, ps pubsub.PubSub) error {
var publishFunc func(ctx context.Context, msg *pubsubSubscribedMessage) error
......
topicRoutes, err := a.getTopicRoutes()
......
}
在 getTopicRoutes() 方法中,可以通过 HTTP 或者 gRPC 的方式来获取应用订阅信息:
func (a *DaprRuntime) getTopicRoutes() (map[string]TopicRoute, error) {
......
if a.runtimeConfig.ApplicationProtocol == HTTPProtocol {
// 走 http channel
subscriptions, err = runtime_pubsub.GetSubscriptionsHTTP(a.appChannel, log)
} else if a.runtimeConfig.ApplicationProtocol == GRPCProtocol {
// 走 grpc channel
client := runtimev1pb.NewAppCallbackClient(a.grpc.AppClient)
subscriptions, err = runtime_pubsub.GetSubscriptionsGRPC(client, log)
}
......
}
对于 HTTP 方式,调用的是 AppChannel 上定义的 InvokeMethod 方法,这个方法原来设计是用来实现 service invoke 的,dapr runtime 用来通过它将 service invoke 的 http inbound 请求转发给作为服务器端的应用。而在这里,被用来调用 dapr/subscribe
路径:
func GetSubscriptionsHTTP(channel channel.AppChannel, log logger.Logger) ([]Subscription, error) {
req := invokev1.NewInvokeMethodRequest("dapr/subscribe")
channel.InvokeMethod(ctx, req)
......
}
感想:理论上说这也不失为一种方便的方式,只是总感觉有点怪:pubsub 模块的初始化用到了 service invoke 模块的功能,而直接发个 http 请求代码也不复杂。另外 http AppChannel / app callback 的方法和 grpc AppChannel / app callback 不对称,这在设计上缺乏美感。
对于 gRPC 方式,就比较老实的调用了 gRPC AppCallbackClient 的方法 ListTopicSubscriptions():
resp, err = channel.ListTopicSubscriptions(context.Background(), &emptypb.Empty{})
在获取到应用的订阅信息之后,dapr runtime 就知道这个应用需要订阅哪些topic了。因此就可以继续开始订阅操作:
func (a *DaprRuntime) beginPubSub(name string, ps pubsub.PubSub) error {
var publishFunc func(ctx context.Context, msg *pubsubSubscribedMessage) error
......
// 获取订阅信息
topicRoutes, err := a.getTopicRoutes()
......
// 开始订阅
for topic, route := range v.routes {
// 在当前 pubsub 组件上为每个 topic 进行订阅
err := ps.Subscribe(pubsub.SubscribeRequest{
Topic: topic,
Metadata: route.metadata,
}, func(ctx context.Context, msg *pubsub.NewMessage) error {......}
}
}
这里的 Subscribe() 方法的定义在 PubSub 接口上,每个 dapr pubsub 组件都会实现这个接口:
type PubSub interface {
Publish(req *PublishRequest) error
Subscribe(req SubscribeRequest, handler Handler) error
}
handler 方法的具体实现后面再展开。
对于订阅信息而言,有四个关键的信息。在 dapr proto 中的定义如下:
message TopicSubscription {
// Required. The name of the pubsub containing the topic below to subscribe to.
string pubsub_name = 1;
// Required. The name of topic which will be subscribed
string topic = 2;
// The optional properties used for this topic's subscription e.g. session id
map<string,string> metadata = 3;
// The optional routing rules to match against. In the gRPC interface, OnTopicEvent
// is still invoked but the matching path is sent in the TopicEventRequest.
TopicRoutes routes = 5;
}
pubsub_name 指定要使用的 pubsub component,topic 是要订阅的主题, metadata 携带扩展信息,而 routes 路由则是标记 dapr 应该如何将订阅到的事件发送给应用。
TODO:对于 HTTP 协议和 gRPC 协议处理会有不同。
java sdk中的封装如下:
public class DaprTopicSubscription {
private final String pubsubName;
private final String topic;
private final String route;
private final Map<String, String> metadata;
}
dapr sdk 需要帮助应用方便的提供上述订阅信息。
在业务代码中使用 subscribe 功能的示例可参考文件 dapr java-sdk 中的代码 /src/main/java/io/dapr/examples/pubsub/http/subscribe.java
,代码示意如下:
// 启动应用,监听端口,一般喜欢使用 3000
public static void main(String[] args) throws Exception {
......
DaprApplication.start(port);
}
@RestController
public class SubscriberController {
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}")
@PostMapping(path = "/testingtopic")
public Mono<Void> handleMessage(@RequestBody(required = false) CloudEvent<String> cloudEvent) {
......
}
}
上面代码中的 @Topic 注解是 dapr java sdk 提供的,用来标记需要进行 subscribe 的 topic,代码在src/main/java/io/dapr/Topic.java
:
@Documented
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Topic {
String name();
String pubsubName();
String metadata() default "{}";
}
topic 的收集是典型的 springboot 风格,代码在 sdk-springboot/src/main/java/io/dapr/springboot/DaprBeanPostProcessor.java
:
@Component
public class DaprBeanPostProcessor implements BeanPostProcessor {
@Override
public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
subscribeToTopics(bean.getClass(), embeddedValueResolver);
return bean;
}
}
subscribeToTopics() 方法通过扫描 @topic 注解和 @PostMapping 注解来获取订阅相关的信息:
private static void subscribeToTopics(Class clazz, EmbeddedValueResolver embeddedValueResolver) {
for (Method method : clazz.getDeclaredMethods()) {
// 获取 @topic 注解
Topic topic = method.getAnnotation(Topic.class);
if (topic == null) {
continue;
}
String route = topic.name();
// 获取 @PostMapping 注解
PostMapping mapping = method.getAnnotation(PostMapping.class);
// 根据 PostMapping 注解获取 route 信息
if (mapping != null && mapping.path() != null && mapping.path().length >= 1) {
route = mapping.path()[0];
} else if (mapping != null && mapping.value() != null && mapping.value().length >= 1) {
route = mapping.value()[0];
}
String topicName = embeddedValueResolver.resolveStringValue(topic.name());
String pubSubName = embeddedValueResolver.resolveStringValue(topic.pubsubName());
if ((topicName != null) && (topicName.length() > 0) && pubSubName != null && pubSubName.length() > 0) {
try {
TypeReference<HashMap<String, String>> typeRef
= new TypeReference<HashMap<String, String>>() {};
Map<String, String> metadata = MAPPER.readValue(topic.metadata(), typeRef);
// 保存 subscribe 信息
DaprRuntime.getInstance().addSubscribedTopic(pubSubName, topicName, route, metadata);
} catch (JsonProcessingException e) {
throw new IllegalArgumentException("Error while parsing metadata: " + e.toString());
}
}
}
}
DaprRuntime 是一个单例对象,这里保存有订阅的 topic 列表:
class DaprRuntime {
private final Set<String> subscribedTopics = new HashSet<>();
private final List<DaprTopicSubscription> subscriptions = new ArrayList<>();
public synchronized void addSubscribedTopic(String pubsubName,
String topicName,
String route,
Map<String,String> metadata) {
if (!this.subscribedTopics.contains(topicName)) {
this.subscribedTopics.add(topicName);
this.subscriptions.add(new DaprTopicSubscription(pubsubName, topicName, route, metadata));
}
}
}
为了让 dapr 在 springboot 体系中方便使用,dapr java sdk 提供了 DaprController ,以提供诸如健康检查等通用功能,还有和dapr相关的各种端点,其中就有为 dapr runtime 提供订阅信息的接口:
@RestController
public class DaprController {
......
@GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprSubscribe() throws IOException {
return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
}
}
通过这个URL,就可以将之前收集到的 topic 信息都暴露出去,可以在浏览器中直接访问 http://127.0.0.1:3000/dapr/subscribe
,应答内容为:
[{"pubsubName":"messagebus","topic":"testingtopic","route":"/testingtopic","metadata":{}}]
在 go 业务代码中使用 subscribe 功能的示例可参考 https://github.com/dapr/go-sdk/blob/main/examples/pubsub/sub/sub.go,代码示意如下:
func main() {
s := daprd.NewService(":8080")
err := s.AddTopicEventHandler(defaultSubscription, eventHandler)
err = s.Start()
}
func eventHandler(ctx context.Context, e *common.TopicEvent) (retry bool, err error) {
......
return false, nil
}
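示例中引用的 defaultSubscription 是一个 *common.Subscription。把它的定义补上之后,一个完整可运行的订阅端示意大致如下(组件名、topic、route 均为示例值,不必与原示例一致):
package main

import (
	"context"
	"log"

	"github.com/dapr/go-sdk/service/common"
	daprd "github.com/dapr/go-sdk/service/http"
)

// 订阅信息:pubsub 组件名、topic、route(示例值)
var defaultSubscription = &common.Subscription{
	PubsubName: "messagebus",
	Topic:      "testingtopic",
	Route:      "/testingtopic",
}

func eventHandler(ctx context.Context, e *common.TopicEvent) (retry bool, err error) {
	log.Printf("event - pubsub: %s, topic: %s, id: %s", e.PubsubName, e.Topic, e.ID)
	return false, nil
}

func main() {
	// 在 8080 端口启动 callback service(即 --app-port 指向的端口)
	s := daprd.NewService(":8080")

	if err := s.AddTopicEventHandler(defaultSubscription, eventHandler); err != nil {
		log.Fatalf("error adding topic subscription: %v", err)
	}
	if err := s.Start(); err != nil {
		log.Fatalf("error listening: %v", err)
	}
}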
Go sdk 中定义了 Service 接口:
// Service represents Dapr callback service.
type Service interface {
// AddTopicEventHandler appends provided event handler with its topic and optional metadata to the service.
// Note, retries are only considered when there is an error. Lack of error is considered as a success
AddTopicEventHandler(sub *Subscription, fn TopicEventHandler) error
......
}
Subscription 的定义如下:
// Subscription represents single topic subscription.
type Subscription struct {
PubsubName string `json:"pubsubname"`
Topic string `json:"topic"`
Metadata map[string]string `json:"metadata,omitempty"`
Route string `json:"route"`
......
}
这样订阅相关的主要4个参数就通过这个方式指明了。
go sdk 中有 http 和 grpc 两套机制可以实现对外暴露访问端点。
http 的实现在 http/topic.go
中:
func (s *Server) AddTopicEventHandler(sub *common.Subscription, fn common.TopicEventHandler) error {
if err := s.topicRegistrar.AddSubscription(sub, fn); err != nil {
return err
}
// 注册 http handle,关联 Route 和 fn
s.mux.Handle(sub.Route, optionsHandler(http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
......
retry, err := fn(r.Context(), &te)
......
}
}
grpc类似。
TODO
workflow app 启动时,典型代码如下:
// Register the OrderProcessingWorkflow and its activities with the builder.
WorkflowRuntimeBuilder builder = new WorkflowRuntimeBuilder().registerWorkflow(OrderProcessingWorkflow.class);
builder.registerActivity(NotifyActivity.class);
builder.registerActivity(ProcessPaymentActivity.class);
builder.registerActivity(RequestApprovalActivity.class);
builder.registerActivity(ReserveInventoryActivity.class);
builder.registerActivity(UpdateInventoryActivity.class);
// Build and then start the workflow runtime pulling and executing tasks
try (WorkflowRuntime runtime = builder.build()) {
System.out.println("Start workflow runtime");
runtime.start(false);
}
这个过程中,注册了 workflow 和 activity,然后 start workflow runtime。workflow runtime 会启动 worker,从 dapr sidecar 持续获取工作任务,包括 workflow task 和 activity task,然后执行这些任务并把任务结果返回给到 dapr sidecar。
@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Sidecar" as DaprSidecar
WorkflowApp -> WorkflowApp: registerWorkflow()
WorkflowApp -> WorkflowApp: registerActivity()
WorkflowApp -[#red]> WorkflowApp: WorkflowRuntime.start()
WorkflowApp -> DaprSidecar: WorkflowRuntime.getWorkItems()
DaprSidecar --> WorkflowApp:
loop has next task
alt is orchestration task
WorkflowApp -> WorkflowApp: execute orchestration task
WorkflowApp -> DaprSidecar: completeOrchestratorTask()
DaprSidecar --> WorkflowApp:
else is activity task
WorkflowApp -> WorkflowApp: execute activity task
WorkflowApp -> DaprSidecar: completeActivityTask()
DaprSidecar --> WorkflowApp:
end
end
@enduml
@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK
WorkflowApp -> DaprJavaSDK: registerWorkflow()
DaprJavaSDK -> DurableTaskJavaSDK: addOrchestration()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
@enduml
@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK
WorkflowApp -> DaprJavaSDK: registerActivity()
DaprJavaSDK -> DurableTaskJavaSDK: registerActivity()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
@enduml
@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK
WorkflowApp -> DaprJavaSDK: WorkflowRuntime.start()
DaprJavaSDK -> DurableTaskJavaSDK: worker.start()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
@enduml
@startuml
participant "Workflow App" as WorkflowApp
participant "Dapr Java SDK" as DaprJavaSDK
participant "DurableTask Java SDK" as DurableTaskJavaSDK
WorkflowApp -> DaprJavaSDK: registerWorkflow()
DaprJavaSDK -> DurableTaskJavaSDK: addOrchestration()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
WorkflowApp -> DaprJavaSDK: registerActivity()
DaprJavaSDK -> DurableTaskJavaSDK: registerActivity()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
WorkflowApp -> DaprJavaSDK: WorkflowRuntime.start()
DaprJavaSDK -> DurableTaskJavaSDK: worker.start()
DurableTaskJavaSDK --> DaprJavaSDK
DaprJavaSDK --> WorkflowApp:
@enduml
workflow app 中构建 WorkflowRuntime 的典型使用代码如下:
// Register the OrderProcessingWorkflow and its activities with the builder.
WorkflowRuntimeBuilder builder = new WorkflowRuntimeBuilder().registerWorkflow(OrderProcessingWorkflow.class);
builder.registerActivity(NotifyActivity.class);
builder.registerActivity(ProcessPaymentActivity.class);
builder.registerActivity(RequestApprovalActivity.class);
builder.registerActivity(ReserveInventoryActivity.class);
builder.registerActivity(UpdateInventoryActivity.class);
// Build and then start the workflow runtime pulling and executing tasks
try (WorkflowRuntime runtime = builder.build()) {
System.out.println("Start workflow runtime");
runtime.start(false);
}
这个类在 dapr java sdk。
WorkflowRuntimeBuilder 的实现中,自己会保存 workflows 和 activities 信息,也会构建一个来自 DurableTask java sdk 的 DurableTaskGrpcWorkerBuilder 的实例。
import com.microsoft.durabletask.DurableTaskGrpcWorkerBuilder;
public class WorkflowRuntimeBuilder {
private static volatile WorkflowRuntime instance;
private DurableTaskGrpcWorkerBuilder builder;
private Logger logger;
private Set<String> workflows = new HashSet<String>();
private Set<String> activities = new HashSet<String>();
/**
* Constructs the WorkflowRuntimeBuilder.
*/
public WorkflowRuntimeBuilder() {
this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(
NetworkUtils.buildGrpcManagedChannel(WORKFLOW_INTERCEPTOR));
this.logger = Logger.getLogger(WorkflowRuntimeBuilder.class.getName());
}
registerWorkflow() 方法的实现,除了将请求代理给 DurableTaskGrpcWorkerBuilder 之外,还自己保存到 workflows 集合中:
public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
this.builder = this.builder.addOrchestration(
new OrchestratorWrapper<>(clazz)
);
this.logger.log(Level.INFO, "Registered Workflow: " + clazz.getSimpleName());
this.workflows.add(clazz.getSimpleName());
return this;
}
registerActivity() 方法的实现类似,除了将请求代理给 DurableTaskGrpcWorkerBuilder 之外,还自己保存到 activities 集合中:
public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
this.builder = this.builder.addActivity(
new ActivityWrapper<>(clazz)
);
this.logger.log(Level.INFO, "Registered Activity: " + clazz.getSimpleName());
this.activities.add(clazz.getSimpleName());
}
OrchestratorWrapper 和 ActivityWrapper 负责将 class 包装为 TaskOrchestrationFactory 和 TaskActivityFactory。
build() 方法调用 DurableTaskGrpcWorkerBuilder 的 build() 方法构建出一个 DurableTaskGrpcWorker ,然后传递给 WorkflowRuntime 的新实例。
public WorkflowRuntime build() {
if (instance == null) {
synchronized (WorkflowRuntime.class) {
if (instance == null) {
instance = new WorkflowRuntime(this.builder.build());
}
}
}
this.logger.log(Level.INFO, "Successfully built dapr workflow runtime");
return instance;
}
这个类在 durabletask java sdk 中。
DurableTaskGrpcWorkerBuilder 保存 orchestrationFactories 和 activityFactories,还有和 sidecar 连接的一些信息如端口,grpc channel:
public final class DurableTaskGrpcWorkerBuilder {
final HashMap<String, TaskOrchestrationFactory> orchestrationFactories = new HashMap<>();
final HashMap<String, TaskActivityFactory> activityFactories = new HashMap<>();
int port;
Channel channel;
DataConverter dataConverter;
Duration maximumTimerInterval;
......
}
addOrchestration() 将 TaskOrchestrationFactory 保存到 orchestrationFactories 中,key为 name:
public DurableTaskGrpcWorkerBuilder addOrchestration(TaskOrchestrationFactory factory) {
String key = factory.getName();
......
this.orchestrationFactories.put(key, factory);
return this;
}
类似的, addActivity() 将 TaskActivityFactory 保存到 activityFactories 中,key为 name:
public DurableTaskGrpcWorkerBuilder addActivity(TaskActivityFactory factory) {
String key = factory.getName();
......
this.activityFactories.put(key, factory);
return this;
}
build() 方法构建出 DurableTaskGrpcWorker() 对象:
public DurableTaskGrpcWorker build() {
return new DurableTaskGrpcWorker(this);
}
DurableTaskGrpcWorker 的构造函数中会保存注册好的 orchestrationFactories 和 activityFactories,然后构建 TaskHubSidecarServiceGrpc 对象作为 sidecarClient,用于后续和 dapr sidecar 交互:
public final class DurableTaskGrpcWorker implements AutoCloseable {
private final HashMap<String, TaskOrchestrationFactory> orchestrationFactories = new HashMap<>();
private final HashMap<String, TaskActivityFactory> activityFactories = new HashMap<>();
private final TaskHubSidecarServiceBlockingStub sidecarClient;
DurableTaskGrpcWorker(DurableTaskGrpcWorkerBuilder builder) {
this.orchestrationFactories.putAll(builder.orchestrationFactories);
this.activityFactories.putAll(builder.activityFactories);
Channel sidecarGrpcChannel;
if (builder.channel != null) {
// The caller is responsible for managing the channel lifetime
this.managedSidecarChannel = null;
sidecarGrpcChannel = builder.channel;
} else {
// Construct our own channel using localhost + a port number
int port = DEFAULT_PORT;
if (builder.port > 0) {
port = builder.port;
}
// Need to keep track of this channel so we can dispose it on close()
this.managedSidecarChannel = ManagedChannelBuilder
.forAddress("localhost", port)
.usePlaintext()
.build();
sidecarGrpcChannel = this.managedSidecarChannel;
}
this.sidecarClient = TaskHubSidecarServiceGrpc.newBlockingStub(sidecarGrpcChannel);
this.dataConverter = builder.dataConverter != null ? builder.dataConverter : new JacksonDataConverter();
this.maximumTimerInterval = builder.maximumTimerInterval != null ? builder.maximumTimerInterval : DEFAULT_MAXIMUM_TIMER_INTERVAL;
}
dapr java sdk 中的 WorkflowRuntimeBuilder 和 durabletask java sdk 中的 DurableTaskGrpcWorkerBuilder,都是用来帮助构建最终要使用的 WorkflowRuntime 和 DurableTaskGrpcWorker。
workflow app 中启动 WorkflowRuntime 的典型使用代码如下:
// Build and then start the workflow runtime pulling and executing tasks
try (WorkflowRuntime runtime = builder.build()) {
System.out.println("Start workflow runtime");
//这里写死了 block=false,不会 block
runtime.start(false);
}
这个类在 dapr java sdk。
WorkflowRuntime 只是对 DurableTaskGrpcWorker 的一个简单包装:
public class WorkflowRuntime implements AutoCloseable {
private DurableTaskGrpcWorker worker;
public WorkflowRuntime(DurableTaskGrpcWorker worker) {
this.worker = worker;
}
......
public void start(boolean block) {
if (block) {
this.worker.startAndBlock();
} else {
this.worker.start();
}
}
}
真实的实现代码在 DurableTaskGrpcWorker 中,这个类在 durabletask java sdk 中。
public void start(boolean block) {
if (block) {
this.worker.startAndBlock();
} else {
// 1. block写死false了,所以只会进入到这里
this.worker.start();
}
}
public void start() {
// 2. 启动线程来执行 startAndBlock,所以是不阻塞的
new Thread(this::startAndBlock).start();
}
这是最关键的代码。
这里不展开,看下一章 workflow runtime 的运行。
上一章看到 workflow runtime start 之后,就会启动任务处理的流程。
代码实现在 durabletask java sdk 中的 DurableTaskGrpcWorker 类的 startAndBlock()方法中。
这是最关键的代码。
先构建两个 executor,负责执行 Orchestration task 和 activity task:
TaskOrchestrationExecutor taskOrchestrationExecutor = new TaskOrchestrationExecutor(
this.orchestrationFactories,
this.dataConverter,
this.maximumTimerInterval,
logger);
TaskActivityExecutor taskActivityExecutor = new TaskActivityExecutor(
this.activityFactories,
this.dataConverter,
logger);
传入的参数有 orchestrationFactories 和 activityFactories,之前构建时注册的信息都保存在这里面。
然后就是一个无限循环,在循环中调用 sidecarClient.getWorkItems(), 针对返回的 workitem stream,还有一个无限循环。而且如果遇到 StatusRuntimeException ,还会sleep之后继续。
while (true) {
try {
GetWorkItemsRequest getWorkItemsRequest = GetWorkItemsRequest.newBuilder().build();
Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
while (workItemStream.hasNext()) {
......
}
} catch(StatusRuntimeException e){
......
// Retry after 5 seconds
try {
Thread.sleep(5000);
} catch (InterruptedException ex) {
break;
}
}
}
work items 的类型只有两种 orchestrator 和 activity:
while (workItemStream.hasNext()) {
WorkItem workItem = workItemStream.next();
RequestCase requestType = workItem.getRequestCase();
if (requestType == RequestCase.ORCHESTRATORREQUEST) {
......
} else if (requestType == RequestCase.ACTIVITYREQUEST) {
......
} else {
logger.log(Level.WARNING, "Received and dropped an unknown '{0}' work-item from the sidecar.", requestType);
}
}
通过 taskOrchestrationExecutor 执行 orchestrator task,然后将结果返回给到 dapr sidecar。
OrchestratorRequest orchestratorRequest = workItem.getOrchestratorRequest();
TaskOrchestratorResult taskOrchestratorResult = taskOrchestrationExecutor.execute(
orchestratorRequest.getPastEventsList(),
orchestratorRequest.getNewEventsList());
OrchestratorResponse response = OrchestratorResponse.newBuilder()
.setInstanceId(orchestratorRequest.getInstanceId())
.addAllActions(taskOrchestratorResult.getActions())
.setCustomStatus(StringValue.of(taskOrchestratorResult.getCustomStatus()))
.build();
this.sidecarClient.completeOrchestratorTask(response);
备注:比较奇怪的是这里为什么不用 grpc 双向 stream 来获取任务和返回任务执行结果,而是通过另外一个 completeOrchestratorTask() 方法来发起请求。
类似的,通过 taskActivityExecutor 执行 activity task,然后将结果返回给 dapr sidecar。
ActivityRequest activityRequest = workItem.getActivityRequest();
String output = null;
TaskFailureDetails failureDetails = null;
try {
output = taskActivityExecutor.execute(
activityRequest.getName(),
activityRequest.getInput().getValue(),
activityRequest.getTaskId());
} catch (Throwable e) {
failureDetails = TaskFailureDetails.newBuilder()
.setErrorType(e.getClass().getName())
.setErrorMessage(e.getMessage())
.setStackTrace(StringValue.of(FailureDetails.getFullStackTrace(e)))
.build();
}
ActivityResponse.Builder responseBuilder = ActivityResponse.newBuilder()
.setInstanceId(activityRequest.getOrchestrationInstance().getInstanceId())
.setTaskId(activityRequest.getTaskId());
if (output != null) {
responseBuilder.setResult(StringValue.of(output));
}
if (failureDetails != null) {
responseBuilder.setFailureDetails(failureDetails);
}
this.sidecarClient.completeActivityTask(responseBuilder.build());
DurableTaskGrpcWorker 会调用 sidecarClient.getWorkItems() 来获取工作任务。
private final TaskHubSidecarServiceBlockingStub sidecarClient;
while (true) {
try {
GetWorkItemsRequest getWorkItemsRequest = GetWorkItemsRequest.newBuilder().build();
Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
while (workItemStream.hasNext()) {
......
}
} catch (StatusRuntimeException e) { ...... }
}
TaskHubSidecarServiceBlockingStub 是根据 protobuf 文件生成的 grpc 代码,其 protobuf 定义在submodules/durabletask-protobuf/protos/orchestrator_service.proto
文件中。
service TaskHubSidecarService {
......
rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
......
}
GetWorkItemsRequest 和 WorkItem 的消息定义为:
message GetWorkItemsRequest {
// No parameters currently
}
message WorkItem {
oneof request {
OrchestratorRequest orchestratorRequest = 1;
ActivityRequest activityRequest = 2;
}
}
WorkItem 可能是 OrchestratorRequest 或者 ActivityRequest 。
message OrchestratorRequest {
string instanceId = 1;
google.protobuf.StringValue executionId = 2;
repeated HistoryEvent pastEvents = 3;
repeated HistoryEvent newEvents = 4;
}
message ActivityRequest {
string name = 1;
google.protobuf.StringValue version = 2;
google.protobuf.StringValue input = 3;
OrchestrationInstance orchestrationInstance = 4;
int32 taskId = 5;
}
message HistoryEvent {
int32 eventId = 1;
google.protobuf.Timestamp timestamp = 2;
oneof eventType {
ExecutionStartedEvent executionStarted = 3;
ExecutionCompletedEvent executionCompleted = 4;
ExecutionTerminatedEvent executionTerminated = 5;
TaskScheduledEvent taskScheduled = 6;
TaskCompletedEvent taskCompleted = 7;
TaskFailedEvent taskFailed = 8;
SubOrchestrationInstanceCreatedEvent subOrchestrationInstanceCreated = 9;
SubOrchestrationInstanceCompletedEvent subOrchestrationInstanceCompleted = 10;
SubOrchestrationInstanceFailedEvent subOrchestrationInstanceFailed = 11;
TimerCreatedEvent timerCreated = 12;
TimerFiredEvent timerFired = 13;
OrchestratorStartedEvent orchestratorStarted = 14;
OrchestratorCompletedEvent orchestratorCompleted = 15;
EventSentEvent eventSent = 16;
EventRaisedEvent eventRaised = 17;
GenericEvent genericEvent = 18;
HistoryStateEvent historyState = 19;
ContinueAsNewEvent continueAsNew = 20;
ExecutionSuspendedEvent executionSuspended = 21;
ExecutionResumedEvent executionResumed = 22;
}
}
workflow app 中通过调用 sidecarClient.getWorkItems() 方法来获取 work items。
Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
这里面就是 grpc stub 的生成代码,不细看。
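虽然生成代码不展开,但如果想直观感受这个 server-streaming 接口的形态,可以用 Go 写一个最小的 worker 客户端来消费它,大致如下(示意代码:protos 指 protobuf 生成的 Go 包,import 路径和端口都是假设值,以实际仓库与 sidecar 配置为准):
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// 假设:protobuf 生成的 Go 代码所在的包,实际 import 路径以 durabletask-go 仓库为准
	"example.com/durabletask/protos"
)

func main() {
	// 端口为假设值,实际取决于 sidecar 的 gRPC 端口配置
	conn, err := grpc.Dial("localhost:50001", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := protos.NewTaskHubSidecarServiceClient(conn)
	// GetWorkItems 返回一个 server-streaming 的 stream,worker 在上面循环 Recv()
	stream, err := client.GetWorkItems(context.Background(), &protos.GetWorkItemsRequest{})
	if err != nil {
		log.Fatal(err)
	}
	for {
		wi, err := stream.Recv()
		if err != nil {
			log.Printf("stream closed: %v", err)
			return
		}
		// WorkItem 是 oneof:要么是 orchestrator task,要么是 activity task
		switch {
		case wi.GetOrchestratorRequest() != nil:
			log.Printf("orchestrator task: instance=%s", wi.GetOrchestratorRequest().GetInstanceId())
		case wi.GetActivityRequest() != nil:
			log.Printf("activity task: name=%s", wi.GetActivityRequest().GetName())
		}
	}
}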
TaskHubSidecarService 这个 protobuf 定义的 grpc service 的服务器端,代码实现在 durabletask-go 仓库中。
服务器端代码实现在 backend/executor.go
中:
// GetWorkItems implements protos.TaskHubSidecarServiceServer
func (g *grpcExecutor) GetWorkItems(req *protos.GetWorkItemsRequest, stream protos.TaskHubSidecarService_GetWorkItemsServer) error {
......
// The worker client invokes this method, which streams back work-items as they arrive.
for {
select {
case <-stream.Context().Done():
g.logger.Infof("work item stream closed")
return nil
case wi := <-g.workItemQueue:
if err := stream.Send(wi); err != nil {
return err
}
case <-g.streamShutdownChan:
return errShuttingDown
}
}
}
所以返回给客户端调用的 work item stream 的数据来自 g.workItemQueue。workItemQueue 在 grpcExecutor 中定义:
type grpcExecutor struct {
workItemQueue chan *protos.WorkItem
......
}
grpcExecutor 在 NewGrpcExecutor() 方法中构建:
// NewGrpcExecutor returns the Executor object and a method to invoke to register the gRPC server in the executor.
func NewGrpcExecutor(be Backend, logger Logger, opts ...grpcExecutorOptions) (executor Executor, registerServerFn func(grpcServer grpc.ServiceRegistrar)) {
grpcExecutor := &grpcExecutor{
workItemQueue: make(chan *protos.WorkItem),
backend: be,
logger: logger,
pendingOrchestrators: &sync.Map{},
pendingActivities: &sync.Map{},
}
......
}
将数据写入 workItemQueue 的地方有两个:
ExecuteOrchestrator()
func (executor *grpcExecutor) ExecuteOrchestrator(......) {
......
workItem := &protos.WorkItem{
Request: &protos.WorkItem_OrchestratorRequest{
OrchestratorRequest: &protos.OrchestratorRequest{
InstanceId: string(iid),
ExecutionId: nil,
PastEvents: oldEvents,
NewEvents: newEvents,
},
},
}
executor.workItemQueue <- workItem
}
ExecuteActivity()
func (executor *grpcExecutor) ExecuteActivity(......) {
workItem := &protos.WorkItem{
Request: &protos.WorkItem_ActivityRequest{
ActivityRequest: &protos.ActivityRequest{
Name: task.Name,
Version: task.Version,
Input: task.Input,
OrchestrationInstance: &protos.OrchestrationInstance{InstanceId: string(iid)},
TaskId: e.EventId,
},
},
}
executor.workItemQueue <- workItem
}
继续跟踪看 ExecuteOrchestrator() 和 ExecuteActivity() 方法是被谁调用的,这个细节在下一节中。
获取工作任务的任务源头在 dapr sidecar,代码实现在 durabletask-go 项目的 backend/executor.go
中。
前面看到执行 orchestrator task 的代码实现在 durabletask java sdk 的 client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java
中。
TaskOrchestrationExecutor taskOrchestrationExecutor = new TaskOrchestrationExecutor(
this.orchestrationFactories,
this.dataConverter,
this.maximumTimerInterval,
logger);
......
Iterator<WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
while (workItemStream.hasNext()) {
WorkItem workItem = workItemStream.next();
RequestCase requestType = workItem.getRequestCase();
if (requestType == RequestCase.ORCHESTRATORREQUEST) {
OrchestratorRequest orchestratorRequest = workItem.getOrchestratorRequest();
TaskOrchestratorResult taskOrchestratorResult = taskOrchestrationExecutor.execute(
orchestratorRequest.getPastEventsList(),
orchestratorRequest.getNewEventsList());
OrchestratorResponse response = OrchestratorResponse.newBuilder()
.setInstanceId(orchestratorRequest.getInstanceId())
.addAllActions(taskOrchestratorResult.getActions())
.setCustomStatus(StringValue.of(taskOrchestratorResult.getCustomStatus()))
.build();
this.sidecarClient.completeOrchestratorTask(response);
}
......
TaskOrchestrationExecutor 类的定义和构造函数:
final class TaskOrchestrationExecutor {
private static final String EMPTY_STRING = "";
private final HashMap<String, TaskOrchestrationFactory> orchestrationFactories;
private final DataConverter dataConverter;
private final Logger logger;
private final Duration maximumTimerInterval;
public TaskOrchestrationExecutor(
HashMap<String, TaskOrchestrationFactory> orchestrationFactories,
DataConverter dataConverter,
Duration maximumTimerInterval,
Logger logger) {
this.orchestrationFactories = orchestrationFactories;
this.dataConverter = dataConverter;
this.maximumTimerInterval = maximumTimerInterval;
this.logger = logger;
}
其中 orchestrationFactories 是从前面 registerWorkflow()时保存的已经注册的工作流信息。
execute() 方法:
public TaskOrchestratorResult execute(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
ContextImplTask context = new ContextImplTask(pastEvents, newEvents);
boolean completed = false;
try {
// Play through the history events until either we've played through everything
// or we receive a yield signal
while (context.processNextEvent()) { /* no method body */ }
completed = true;
} catch (OrchestratorBlockedException orchestratorBlockedException) {
logger.fine("The orchestrator has yielded and will await for new events.");
} catch (ContinueAsNewInterruption continueAsNewInterruption) {
logger.fine("The orchestrator has continued as new.");
context.complete(null);
} catch (Exception e) {
// The orchestrator threw an unhandled exception - fail it
// TODO: What's the right way to log this?
logger.warning("The orchestrator failed with an unhandled exception: " + e.toString());
context.fail(new FailureDetails(e));
}
if ((context.continuedAsNew && !context.isComplete) || (completed && context.pendingActions.isEmpty() && !context.waitingForEvents())) {
// There are no further actions for the orchestrator to take so auto-complete the orchestration.
context.complete(null);
}
return new TaskOrchestratorResult(context.pendingActions.values(), context.getCustomStatus());
}
这里只是主要流程,细节实现在内部私有类 ContextImplTask 中。
ContextImplTask 的定义和构造函数,使用到 OrchestrationHistoryIterator。
private class ContextImplTask implements TaskOrchestrationContext {
private final OrchestrationHistoryIterator historyEventPlayer;
......
public ContextImplTask(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
this.historyEventPlayer = new OrchestrationHistoryIterator(pastEvents, newEvents);
}
......
private boolean processNextEvent() {
return this.historyEventPlayer.moveNext();
}
}
OrchestrationHistoryIterator 的类定义和构造函数,其中 pastEvents 和 newEvents 是 daprd sidecar 在 getWorkItems() 返回的 orchestratorRequest 中携带的数据。
private class OrchestrationHistoryIterator {
private final List<HistoryEvent> pastEvents;
private final List<HistoryEvent> newEvents;
private List<HistoryEvent> currentHistoryList;
private int currentHistoryIndex;
public OrchestrationHistoryIterator(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
this.pastEvents = pastEvents;
this.newEvents = newEvents;
this.currentHistoryList = pastEvents;
}
currentHistoryList 初始化指向 pastEvents,currentHistoryIndex 为0。
然后继续看 moveNext() 方法:
public boolean moveNext() {
if (this.currentHistoryList == pastEvents && this.currentHistoryIndex >= pastEvents.size()) {
// 如果当前 currentHistoryList 指向的是 pastEvents,并且已经指到最后一个元素了。
// 那么 moveNext 就应该指向 this.newEvents,然后将 currentHistoryIndex 设置为0 (即指向第一个元素)
// Move forward to the next list
this.currentHistoryList = this.newEvents;
this.currentHistoryIndex = 0;
// 这意味着 pastEvents 的遍历结束,即 replay 完成。
ContextImplTask.this.setDoneReplaying();
}
if (this.currentHistoryList == this.newEvents && this.currentHistoryIndex >= this.newEvents.size()) {
// 如果当前 currentHistoryList 指向的是 newEvents,并且已经指到最后一个元素了。
// 此时已经完成游历,没有更多元素,返回 false 表示可以结束了。
// We're done enumerating the history
return false;
}
// Process the next event in the history
// 获取当前元素,然后 currentHistoryIndex +1 指向下一个元素
HistoryEvent next = this.currentHistoryList.get(this.currentHistoryIndex++);
// 处理事件
ContextImplTask.this.processEvent(next);
return true;
}
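这个“先重放 pastEvents、再消费 newEvents”的双列表游标,用 Go 重写一遍会更直观(仅为帮助理解的示意,并非 durabletask 的实际代码):
package main

import "fmt"

type historyIterator struct {
	pastEvents []string // 用 string 代替 HistoryEvent,仅为示意
	newEvents  []string
	current    []string
	index      int
	replaying  bool
}

func newHistoryIterator(past, latest []string) *historyIterator {
	return &historyIterator{pastEvents: past, newEvents: latest, current: past, replaying: true}
}

// moveNext 与 Java 版 OrchestrationHistoryIterator.moveNext() 的逻辑对应:
// pastEvents 走完之后切换到 newEvents,newEvents 也走完则返回 false
func (it *historyIterator) moveNext() (string, bool) {
	if it.replaying && it.index >= len(it.pastEvents) {
		it.current = it.newEvents
		it.index = 0
		it.replaying = false // 对应 setDoneReplaying()
	}
	if !it.replaying && it.index >= len(it.newEvents) {
		return "", false
	}
	e := it.current[it.index]
	it.index++
	return e, true
}

func main() {
	it := newHistoryIterator(
		[]string{"OrchestratorStarted", "ExecutionStarted"}, // pastEvents:重放
		[]string{"TaskCompleted"},                           // newEvents:新事件
	)
	for e, ok := it.moveNext(); ok; e, ok = it.moveNext() {
		fmt.Println("process event:", e, "replaying:", it.replaying)
	}
}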
处理事件的代码实现在 ContextImplTask 的 processEvent() 方法中:
private void processEvent(HistoryEvent e) {
boolean overrideSuspension = e.getEventTypeCase() == HistoryEvent.EventTypeCase.EXECUTIONRESUMED || e.getEventTypeCase() == HistoryEvent.EventTypeCase.EXECUTIONTERMINATED;
if (this.isSuspended && !overrideSuspension) {
this.handleEventWhileSuspended(e);
} else {
switch (e.getEventTypeCase()) {
case ORCHESTRATORSTARTED:
Instant instant = DataConverter.getInstantFromTimestamp(e.getTimestamp());
this.setCurrentInstant(instant);
break;
case ORCHESTRATORCOMPLETED:
// No action
break;
case EXECUTIONSTARTED:
ExecutionStartedEvent startedEvent = e.getExecutionStarted();
String name = startedEvent.getName();
this.setName(name);
String instanceId = startedEvent.getOrchestrationInstance().getInstanceId();
this.setInstanceId(instanceId);
String input = startedEvent.getInput().getValue();
this.setInput(input);
TaskOrchestrationFactory factory = TaskOrchestrationExecutor.this.orchestrationFactories.get(name);
if (factory == null) {
// Try getting the default orchestrator
factory = TaskOrchestrationExecutor.this.orchestrationFactories.get("*");
}
// TODO: Throw if the factory is null (orchestration by that name doesn't exist)
TaskOrchestration orchestrator = factory.create();
orchestrator.run(this);
break;
// case EXECUTIONCOMPLETED:
// break;
// case EXECUTIONFAILED:
// break;
case EXECUTIONTERMINATED:
this.handleExecutionTerminated(e);
break;
case TASKSCHEDULED:
this.handleTaskScheduled(e);
break;
case TASKCOMPLETED:
this.handleTaskCompleted(e);
break;
case TASKFAILED:
this.handleTaskFailed(e);
break;
case TIMERCREATED:
this.handleTimerCreated(e);
break;
case TIMERFIRED:
this.handleTimerFired(e);
break;
case SUBORCHESTRATIONINSTANCECREATED:
this.handleSubOrchestrationCreated(e);
break;
case SUBORCHESTRATIONINSTANCECOMPLETED:
this.handleSubOrchestrationCompleted(e);
break;
case SUBORCHESTRATIONINSTANCEFAILED:
this.handleSubOrchestrationFailed(e);
break;
// case EVENTSENT:
// break;
case EVENTRAISED:
this.handleEventRaised(e);
break;
// case GENERICEVENT:
// break;
// case HISTORYSTATE:
// break;
// case EVENTTYPE_NOT_SET:
// break;
case EXECUTIONSUSPENDED:
this.handleExecutionSuspended(e);
break;
case EXECUTIONRESUMED:
this.handleExecutionResumed(e);
break;
default:
throw new IllegalStateException("Don't know how to handle history type " + e.getEventTypeCase());
}
}
}
这里具体会执行什么代码,就要看给过来的 event 是什么了。
这是 ExecutionStartedEvent 的 proto 定义:
message ExecutionStartedEvent {
string name = 1;
google.protobuf.StringValue version = 2;
google.protobuf.StringValue input = 3;
OrchestrationInstance orchestrationInstance = 4;
ParentInstanceInfo parentInstance = 5;
google.protobuf.Timestamp scheduledStartTimestamp = 6;
TraceContext parentTraceContext = 7;
google.protobuf.StringValue orchestrationSpanID = 8;
}
EXECUTIONSTARTED 事件的处理:
case EXECUTIONSTARTED:
ExecutionStartedEvent startedEvent = e.getExecutionStarted();
String name = startedEvent.getName();
this.setName(name);
String instanceId = startedEvent.getOrchestrationInstance().getInstanceId();
this.setInstanceId(instanceId);
String input = startedEvent.getInput().getValue();
this.setInput(input);
TaskOrchestrationFactory factory = TaskOrchestrationExecutor.this.orchestrationFactories.get(name);
if (factory == null) {
// Try getting the default orchestrator
factory = TaskOrchestrationExecutor.this.orchestrationFactories.get("*");
}
// TODO: Throw if the factory is null (orchestration by that name doesn't exist)
TaskOrchestration orchestrator = factory.create();
orchestrator.run(this);
break;
name / instanceId / input 等基本信息直接设置在 ContextImplTask 上。
factory 要从 orchestrationFactories 里面根据 name 查找,如果没有找到,则获取默认。
从 factory 创建 TaskOrchestration,再运行 orchestrator.run():
TaskOrchestration orchestrator = factory.create();
orchestrator.run(this);
这就回到 TaskOrchestration 的实现了。
Dapr java sdk 中的 OrchestratorWrapper 实现了 TaskOrchestrationFactory 接口,其 create() 方法返回的就是 TaskOrchestration:
class OrchestratorWrapper<T extends Workflow> implements TaskOrchestrationFactory {
@Override
public TaskOrchestration create() {
return ctx -> {
T workflow;
try {
workflow = this.workflowConstructor.newInstance();
} ......
};
}
}
client app 启动时,典型代码如下(忽略细节和异常处理):
DaprWorkflowClient workflowClient = new DaprWorkflowClient();
String instanceId = workflowClient.scheduleNewWorkflow(OrderProcessingWorkflow.class, order);
workflowClient.waitForInstanceStart(instanceId, Duration.ofSeconds(10), false);
WorkflowInstanceStatus workflowStatus = workflowClient.waitForInstanceCompletion(instanceId,
Duration.ofSeconds(30), true);
这个过程中,初始化 workflowClient,然后通过 workflowClient 调度执行了一个 workflow 实例:包括等待实例启动,等待实例完成。
@startuml
participant "Client App" as ClientApp
participant "Dapr Sidecar" as DaprSidecar
ClientApp -> ClientApp: create workflow client
ClientApp -[#red]> DaprSidecar: scheduleNewWorkflow()
DaprSidecar --> ClientApp: instanceId
ClientApp -> DaprSidecar: waitForInstanceStart(instanceId)
DaprSidecar --> ClientApp:
ClientApp -> DaprSidecar: waitForInstanceCompletion(instanceId)
DaprSidecar --> ClientApp:
@enduml
Dapr java SDK 中的 DaprWorkflowClient,包裹了 durabletask java sdk 的 DurableTaskClient:
public class DaprWorkflowClient implements AutoCloseable {
private DurableTaskClient innerClient;
private ManagedChannel grpcChannel;
private DaprWorkflowClient(ManagedChannel grpcChannel) {
this(createDurableTaskClient(grpcChannel), grpcChannel);
}
private DaprWorkflowClient(DurableTaskClient innerClient, ManagedChannel grpcChannel) {
this.innerClient = innerClient;
this.grpcChannel = grpcChannel;
}
private static DurableTaskClient createDurableTaskClient(ManagedChannel grpcChannel) {
return new DurableTaskGrpcClientBuilder()
.grpcChannel(grpcChannel)
.build();
}
......
}
scheduleNewWorkflow()方法代理给了 DurableTaskClient 的 scheduleNewOrchestrationInstance() 方法:
public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input, String instanceId) {
return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input, instanceId);
}
这两个类在 durabletask java sdk 中。
DurableTaskGrpcClient 的 scheduleNewOrchestrationInstance() 方法的实现:
@Override
public String scheduleNewOrchestrationInstance(
String orchestratorName,
NewOrchestrationInstanceOptions options) {
if (orchestratorName == null || orchestratorName.length() == 0) {
throw new IllegalArgumentException("A non-empty orchestrator name must be specified.");
}
Helpers.throwIfArgumentNull(options, "options");
CreateInstanceRequest.Builder builder = CreateInstanceRequest.newBuilder();
builder.setName(orchestratorName);
String instanceId = options.getInstanceId();
if (instanceId == null) {
instanceId = UUID.randomUUID().toString();
}
builder.setInstanceId(instanceId);
String version = options.getVersion();
if (version != null) {
builder.setVersion(StringValue.of(version));
}
Object input = options.getInput();
if (input != null) {
String serializedInput = this.dataConverter.serialize(input);
builder.setInput(StringValue.of(serializedInput));
}
Instant startTime = options.getStartTime();
if (startTime != null) {
Timestamp ts = DataConverter.getTimestampFromInstant(startTime);
builder.setScheduledStartTimestamp(ts);
}
CreateInstanceRequest request = builder.build();
CreateInstanceResponse response = this.sidecarClient.startInstance(request);
return response.getInstanceId();
}
前面一大段都是为了构建 CreateInstanceRequest,最后调用 sidecarClient.startInstance() 方法去访问 sidecar。
TaskHubSidecarServiceBlockingStub 是根据 protobuf 文件生成的 grpc 代码,其 protobuf 定义在submodules/durabletask-protobuf/protos/orchestrator_service.proto 文件中。
service TaskHubSidecarService {
......
// Starts a new orchestration instance.
rpc StartInstance(CreateInstanceRequest) returns (CreateInstanceResponse);
......
}
CreateInstanceRequest 消息的定义为:
message CreateInstanceRequest {
string instanceId = 1;
string name = 2;
google.protobuf.StringValue version = 3;
google.protobuf.StringValue input = 4;
google.protobuf.Timestamp scheduledStartTimestamp = 5;
OrchestrationIdReusePolicy orchestrationIdReusePolicy = 6;
}
备注:这个version字段不知道是做什么的?后面注意看看细节。
CreateInstanceResponse 消息的定义很简单,只有一个 instanceId 字段。
message CreateInstanceResponse {
string instanceId = 1;
}
StartInstance 的代码实现在 backend/executor.go 中:
func (g *grpcExecutor) StartInstance(ctx context.Context, req *protos.CreateInstanceRequest) (*protos.CreateInstanceResponse, error) {
instanceID := req.InstanceId
ctx, span := helpers.StartNewCreateOrchestrationSpan(ctx, req.Name, req.Version.GetValue(), instanceID)
defer span.End()
e := helpers.NewExecutionStartedEvent(req.Name, instanceID, req.Input, nil, helpers.TraceContextFromSpan(span))
if err := g.backend.CreateOrchestrationInstance(ctx, e, WithOrchestrationIdReusePolicy(req.OrchestrationIdReusePolicy)); err != nil {
return nil, err
}
return &protos.CreateInstanceResponse{InstanceId: instanceID}, nil
}
helpers.StartNewCreateOrchestrationSpan() 方法的实现:
func StartNewCreateOrchestrationSpan(
ctx context.Context, name string, version string, instanceID string,
) (context.Context, trace.Span) {
attributes := []attribute.KeyValue{
{Key: "durabletask.type", Value: attribute.StringValue("orchestration")},
{Key: "durabletask.task.name", Value: attribute.StringValue(name)},
{Key: "durabletask.task.instance_id", Value: attribute.StringValue(instanceID)},
}
return startNewSpan(ctx, "create_orchestration", name, version, attributes, trace.SpanKindClient, time.Now().UTC())
}
startNewSpan()的实现:
func startNewSpan(
ctx context.Context,
taskType string,
taskName string,
taskVersion string,
attributes []attribute.KeyValue,
kind trace.SpanKind,
timestamp time.Time,
) (context.Context, trace.Span) {
var spanName string
if taskVersion != "" {
spanName = taskType + "||" + taskName + "||" + taskVersion
attributes = append(attributes, attribute.KeyValue{
Key: "durabletask.task.version",
Value: attribute.StringValue(taskVersion),
})
} else if taskName != "" {
spanName = taskType + "||" + taskName
} else {
spanName = taskType
}
var span trace.Span
ctx, span = tracer.Start(
ctx,
spanName,
trace.WithSpanKind(kind),
trace.WithTimestamp(timestamp),
trace.WithAttributes(attributes...),
)
return ctx, span
}
构建 spanName 的逻辑稍显复杂,因为 taskVersion 和 taskName 可能为空(按说 taskName 不应该为空),三种情况分别生成:
- taskVersion 非空:taskType + "||" + taskName + "||" + taskVersion(同时追加 durabletask.task.version 属性)
- 仅 taskName 非空:taskType + "||" + taskName
- 两者都为空:taskType
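下面用一个独立的 Go 小例子还原这段 spanName 拼接逻辑(仅为示意,buildSpanName 这个函数名是本文虚构的,并省略了追加 version 属性的部分):
package main

import "fmt"

// buildSpanName 还原 startNewSpan() 中拼接 span 名称的逻辑:
// 按 taskVersion / taskName 是否为空,生成三种形式之一。
func buildSpanName(taskType, taskName, taskVersion string) string {
	switch {
	case taskVersion != "":
		return taskType + "||" + taskName + "||" + taskVersion
	case taskName != "":
		return taskType + "||" + taskName
	default:
		return taskType
	}
}

func main() {
	fmt.Println(buildSpanName("create_orchestration", "OrderProcessingWorkflow", "")) // create_orchestration||OrderProcessingWorkflow
	fmt.Println(buildSpanName("create_orchestration", "", ""))                        // create_orchestration
}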
这行代码的作用是构建一个 ExecutionStartedEvent 事件:
e := helpers.NewExecutionStartedEvent(req.Name, instanceID, req.Input, nil, helpers.TraceContextFromSpan(span))
具体实现为:
func NewExecutionStartedEvent(
name string,
instanceId string,
input *wrapperspb.StringValue,
parent *protos.ParentInstanceInfo,
parentTraceContext *protos.TraceContext,
) *protos.HistoryEvent {
return &protos.HistoryEvent{
EventId: -1,
Timestamp: timestamppb.New(time.Now()),
EventType: &protos.HistoryEvent_ExecutionStarted{
ExecutionStarted: &protos.ExecutionStartedEvent{
Name: name,
ParentInstance: parent,
Input: input,
OrchestrationInstance: &protos.OrchestrationInstance{
InstanceId: instanceId,
ExecutionId: wrapperspb.String(uuid.New().String()),
},
ParentTraceContext: parentTraceContext,
},
},
}
}
备注:这里没有用到 version 字段
最关键的代码:
if err := g.backend.CreateOrchestrationInstance(ctx, e, WithOrchestrationIdReusePolicy(req.OrchestrationIdReusePolicy)); err != nil {
return nil, err
}
Backend 是一个 interface,CreateOrchestrationInstance() 方法定义如下:
type Backend interface {
// CreateOrchestrationInstance creates a new orchestration instance with a history event that
// wraps a ExecutionStarted event.
CreateOrchestrationInstance(context.Context, *HistoryEvent, ...OrchestrationIdReusePolicyOptions) error
......
}
在 daprd sidecar 的代码实现中,这个 backend 是这样构建的,代码在 dapr/dapr 仓库的 pkg/runtime/wfengine/wfengine.go 中:
func (wfe *WorkflowEngine) ConfigureGrpcExecutor() {
// Enable lazy auto-starting the worker only when a workflow app connects to fetch work items.
autoStartCallback := backend.WithOnGetWorkItemsConnectionCallback(func(ctx context.Context) error {
// NOTE: We don't propagate the context here because that would cause the engine to shut
// down when the client disconnects and cancels the passed-in context. Once it starts
// up, we want to keep the engine running until the runtime shuts down.
if err := wfe.Start(context.Background()); err != nil {
// This can happen if the workflow app connects before the sidecar has finished initializing.
// The client app is expected to continuously retry until successful.
return fmt.Errorf("failed to auto-start the workflow engine: %w", err)
}
return nil
})
// Create a channel that can be used to disconnect the remote client during shutdown.
wfe.disconnectChan = make(chan any, 1)
disconnectHelper := backend.WithStreamShutdownChannel(wfe.disconnectChan)
wfe.executor, wfe.registerGrpcServerFn = backend.NewGrpcExecutor(wfe.Backend, wfLogger, autoStartCallback, disconnectHelper)
}
WorkflowEngine 的初始化代码在 pkg/runtime/runtime.go 中:
// Creating workflow engine after components are loaded
wfe := wfengine.NewWorkflowEngine(a.runtimeConfig.id, a.globalConfig.GetWorkflowSpec(), a.processor.WorkflowBackend())
wfe.ConfigureGrpcExecutor()
a.workflowEngine = wfe
processor := processor.New(processor.Options{
ID: runtimeConfig.id,
Namespace: namespace,
IsHTTP: runtimeConfig.appConnectionConfig.Protocol.IsHTTP(),
ActorsEnabled: len(runtimeConfig.actorsService) > 0,
Registry: runtimeConfig.registry,
ComponentStore: compStore,
Meta: meta,
GlobalConfig: globalConfig,
Resiliency: resiliencyProvider,
Mode: runtimeConfig.mode,
PodName: podName,
Standalone: runtimeConfig.standalone,
OperatorClient: operatorClient,
GRPC: grpc,
Channels: channels,
})
ActorBackend 实现了 durabletask-go 定义的 Backend 接口:
type ActorBackend struct {
orchestrationWorkItemChan chan *backend.OrchestrationWorkItem
activityWorkItemChan chan *backend.ActivityWorkItem
startedOnce sync.Once
config actorsBackendConfig
activityActorOpts activityActorOpts
workflowActorOpts workflowActorOpts
actorRuntime actors.ActorRuntime
actorsReady atomic.Bool
actorsReadyCh chan struct{}
}
CreateOrchestrationInstance() 方法的实现:
func (abe *ActorBackend) CreateOrchestrationInstance(ctx context.Context, e *backend.HistoryEvent, opts ...backend.OrchestrationIdReusePolicyOptions) error {
if err := abe.validateConfiguration(); err != nil {
return err
}
// 对输入做必要的检查
var workflowInstanceID string
if es := e.GetExecutionStarted(); es == nil {
return errors.New("the history event must be an ExecutionStartedEvent")
} else if oi := es.GetOrchestrationInstance(); oi == nil {
return errors.New("the ExecutionStartedEvent did not contain orchestration instance information")
} else {
workflowInstanceID = oi.GetInstanceId()
}
policy := &api.OrchestrationIdReusePolicy{}
for _, opt := range opts {
opt(policy)
}
eventData, err := backend.MarshalHistoryEvent(e)
if err != nil {
return err
}
requestBytes, err := json.Marshal(CreateWorkflowInstanceRequest{
Policy: policy,
StartEventBytes: eventData,
})
if err != nil {
return fmt.Errorf("failed to marshal CreateWorkflowInstanceRequest: %w", err)
}
// Invoke the well-known workflow actor directly, which will be created by this invocation request.
// Note that this request goes directly to the actor runtime, bypassing the API layer.
req := internalsv1pb.NewInternalInvokeRequest(CreateWorkflowInstanceMethod).
WithActor(abe.config.workflowActorType, workflowInstanceID).
WithData(requestBytes).
WithContentType(invokev1.JSONContentType)
start := time.Now()
_, err = abe.actorRuntime.Call(ctx, req)
elapsed := diag.ElapsedSince(start)
if err != nil {
// failed request to CREATE workflow, record count and latency metrics.
diag.DefaultWorkflowMonitoring.WorkflowOperationEvent(ctx, diag.CreateWorkflow, diag.StatusFailed, elapsed)
return err
}
// successful request to CREATE workflow, record count and latency metrics.
diag.DefaultWorkflowMonitoring.WorkflowOperationEvent(ctx, diag.CreateWorkflow, diag.StatusSuccess, elapsed)
return nil
}
关键代码在:
_, err = abe.actorRuntime.Call(ctx, req)
这是通过 actor 来进行调用。
其中 ActorRuntime 是这样设置进来的:
func (abe *ActorBackend) SetActorRuntime(ctx context.Context, actorRuntime actors.ActorRuntime) {
abe.actorRuntime = actorRuntime
if abe.actorsReady.CompareAndSwap(false, true) {
close(abe.actorsReadyCh)
}
}
调用的地方在 pkg/runtime/runtime.go 的 initWorkflowEngine() 方法中:
func (a *DaprRuntime) initWorkflowEngine(ctx context.Context) error {
wfComponentFactory := wfengine.BuiltinWorkflowFactory(a.workflowEngine)
// If actors are not enabled, still invoke SetActorRuntime on the workflow engine with `nil` to unblock startup
if abe, ok := a.workflowEngine.Backend.(interface {
SetActorRuntime(ctx context.Context, actorRuntime actors.ActorRuntime)
}); ok {
log.Info("Configuring workflow engine with actors backend")
var actorRuntime actors.ActorRuntime
if a.runtimeConfig.ActorsEnabled() {
actorRuntime = a.actor
}
abe.SetActorRuntime(ctx, actorRuntime)
}
......
ActorRuntime 的 interface 定义:
// ActorRuntime is the main runtime for the actors subsystem.
type ActorRuntime interface {
Actors
io.Closer
Init(context.Context) error
IsActorHosted(ctx context.Context, req *ActorHostedRequest) bool
GetRuntimeStatus(ctx context.Context) *runtimev1pb.ActorRuntime
RegisterInternalActor(ctx context.Context, actorType string, actor InternalActorFactory, actorIdleTimeout time.Duration) error
}
ActorRuntime 内嵌(继承)了 Actors interface,Call() 方法在 Actors 中定义:
// Actors allow calling into virtual actors as well as actor state management.
type Actors interface {
// Call an actor.
Call(ctx context.Context, req *internalv1pb.InternalInvokeRequest) (*internalv1pb.InternalInvokeResponse, error)
......
}
Call()方法的代码实现:
func (a *actorsRuntime) Call(ctx context.Context, req *internalv1pb.InternalInvokeRequest) (res *internalv1pb.InternalInvokeResponse, err error) {
err = a.placement.WaitUntilReady(ctx)
if err != nil {
return nil, fmt.Errorf("failed to wait for placement readiness: %w", err)
}
// Do a lookup to check if the actor is local
actor := req.GetActor()
actorType := actor.GetActorType()
lar, err := a.placement.LookupActor(ctx, internal.LookupActorRequest{
ActorType: actorType,
ActorID: actor.GetActorId(),
})
if err != nil {
return nil, err
}
if a.isActorLocal(lar.Address, a.actorsConfig.Config.HostAddress, a.actorsConfig.Config.Port) {
// If this is an internal actor, we call it using a separate path
internalAct, ok := a.getInternalActor(actorType, actor.GetActorId())
if ok {
res, err = a.callInternalActor(ctx, req, internalAct)
} else {
res, err = a.callLocalActor(ctx, req)
}
} else {
res, err = a.callRemoteActorWithRetry(ctx, retry.DefaultLinearRetryCount, retry.DefaultLinearBackoffInterval, a.callRemoteActor, lar.Address, lar.AppID, req)
}
if err != nil {
if res != nil && actorerrors.Is(err) {
return res, err
}
return nil, err
}
return res, nil
}
关键代码在这里,调用 placement.LookupActor() 方法来查找要调用的目标actor的地址:
lar, err := a.placement.LookupActor(ctx, internal.LookupActorRequest{
ActorType: actorType,
ActorID: actor.GetActorId(),
})
PlacementService 的接口定义:
type PlacementService interface {
io.Closer
Start(context.Context) error
WaitUntilReady(ctx context.Context) error
LookupActor(ctx context.Context, req LookupActorRequest) (LookupActorResponse, error)
AddHostedActorType(actorType string, idleTimeout time.Duration) error
ReportActorDeactivation(ctx context.Context, actorType, actorID string) error
SetHaltActorFns(haltFn HaltActorFn, haltAllFn HaltAllActorsFn)
SetOnAPILevelUpdate(fn func(apiLevel uint32))
SetOnTableUpdateFn(fn func())
// PlacementHealthy returns true if the placement service is healthy.
PlacementHealthy() bool
// StatusMessage returns a custom status message.
StatusMessage() string
}
代码实现在 pkg/actors/placement/placement.go 中:
// LookupActor resolves to actor service instance address using consistent hashing table.
func (p *actorPlacement) LookupActor(ctx context.Context, req internal.LookupActorRequest) (internal.LookupActorResponse, error) {
// Retry here to allow placement table dissemination/rebalancing to happen.
policyDef := p.resiliency.BuiltInPolicy(resiliency.BuiltInActorNotFoundRetries)
policyRunner := resiliency.NewRunner[internal.LookupActorResponse](ctx, policyDef)
return policyRunner(func(ctx context.Context) (res internal.LookupActorResponse, rErr error) {
rAddr, rAppID, rErr := p.doLookupActor(ctx, req.ActorType, req.ActorID)
if rErr != nil {
return res, fmt.Errorf("error finding address for actor %s/%s: %w", req.ActorType, req.ActorID, rErr)
} else if rAddr == "" {
return res, fmt.Errorf("did not find address for actor %s/%s", req.ActorType, req.ActorID)
}
res.Address = rAddr
res.AppID = rAppID
return res, nil
})
}
doLookupActor():
func (p *actorPlacement) doLookupActor(ctx context.Context, actorType, actorID string) (string, string, error) {
// 加读锁
p.placementTableLock.RLock()
defer p.placementTableLock.RUnlock()
if p.placementTables == nil {
return "", "", errors.New("placement tables are not set")
}
// 先根据 actorType 找到符合要求的 Entries
t := p.placementTables.Entries[actorType]
if t == nil {
return "", "", nil
}
host, err := t.GetHost(actorID)
if err != nil || host == nil {
return "", "", nil //nolint:nilerr
}
return host.Name, host.AppID, nil
}
p.placementTables 的结构体定义如下:
type ConsistentHashTables struct {
Version string
Entries map[string]*Consistent
}
Consistent 的结构体定义如下:
// Consistent represents a data structure for consistent hashing.
type Consistent struct {
hosts map[uint64]string
sortedSet []uint64
loadMap map[string]*Host
totalLoad int64
replicationFactor int
sync.RWMutex
}
host, err := t.GetHost(actorID)
这行代码对应的 GetHost() 方法实现:
func (c *Consistent) GetHost(key string) (*Host, error) {
h, err := c.Get(key)
if err != nil {
return nil, err
}
return c.loadMap[h], nil
}
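GetHost() 背后是标准的一致性哈希查找:对 key 做哈希,再在有序的虚拟节点数组上找顺时针方向的第一个节点。下面是一个极简的示意实现(并非 dapr placement 的真实代码,哈希函数、虚拟节点数等都是假设):
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring 是一致性哈希环的极简示意:每个 host 按若干虚拟节点散列到环上,
// 查找 key 时取顺时针方向第一个虚拟节点对应的 host。
type ring struct {
	hosts     map[uint64]string // 虚拟节点哈希值 -> host 名称
	sortedSet []uint64          // 排序后的虚拟节点哈希值
}

func hashOf(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func newRing(hosts []string, replicas int) *ring {
	r := &ring{hosts: map[uint64]string{}}
	for _, host := range hosts {
		for i := 0; i < replicas; i++ {
			h := hashOf(fmt.Sprintf("%s-%d", host, i))
			r.hosts[h] = host
			r.sortedSet = append(r.sortedSet, h)
		}
	}
	sort.Slice(r.sortedSet, func(i, j int) bool { return r.sortedSet[i] < r.sortedSet[j] })
	return r
}

// getHost 对应 Consistent.GetHost() 的语义:根据 actorID 在环上定位 host。
func (r *ring) getHost(key string) string {
	h := hashOf(key)
	idx := sort.Search(len(r.sortedSet), func(i int) bool { return r.sortedSet[i] >= h })
	if idx == len(r.sortedSet) {
		idx = 0 // 超过最大值时环回到起点
	}
	return r.hosts[r.sortedSet[idx]]
}

func main() {
	r := newRing([]string{"10.0.0.1:50002", "10.0.0.2:50002"}, 100)
	fmt.Println(r.getHost("workflow-instance-123")) // 同一个 actorID 总是落到同一个 host
}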
Shared utility code for Dapr runtime.
目前内容很少,只有 logger/config/retry 三个package。
kit 仓库是后来提取出来的仓库,原来的代码存放在 dapr 仓库中,被 dapr 仓库中的其他代码使用。后来 components-contrib 仓库的代码也使用了这些基础代码,这导致了一个循环依赖:
participant dapr
participant "components-contrib" as components
dapr -> components : for component impl
components -> dapr : for common code
为了让依赖关系更加的清晰,避免循环依赖,因此将这些基础代码从 dapr 仓库中移出来存放在单独的 kit仓库中,之后的依赖关系就是这样:
participant dapr
participant "components-contrib" as components
participant kit
dapr -> kit : for common code
components -> kit : for common code
dapr -> components : for component impl
Dapr Logger package中的logger.go文件的源码学习,定义 logger 相关的日志类型、schema、日志级别、接口,以及保存全局 logger 列表。
log类型分为 普通 log 和 request 两种:
const (
// LogTypeLog is normal log type
LogTypeLog = "log"
// LogTypeRequest is Request log type
LogTypeRequest = "request"
......
)
const (
......
// Field names that defines Dapr log schema
logFieldTimeStamp = "time"
logFieldLevel = "level"
logFieldType = "type"
logFieldScope = "scope"
logFieldMessage = "msg"
logFieldInstance = "instance"
logFieldDaprVer = "ver"
logFieldAppID = "app_id"
)
log level 没啥特别,很传统的定义:
const (
// DebugLevel has verbose message
DebugLevel LogLevel = "debug"
// InfoLevel is default log level
InfoLevel LogLevel = "info"
// WarnLevel is for logging messages about possible issues
WarnLevel LogLevel = "warn"
// ErrorLevel is for logging errors
ErrorLevel LogLevel = "error"
// FatalLevel is for logging fatal messages. The system shuts down after logging the message.
FatalLevel LogLevel = "fatal"
// UndefinedLevel is for undefined log level
UndefinedLevel LogLevel = "undefined"
)
注意:FatalLevel 有特别的意义,"The system shuts down after logging the message",所以这个级别不能随便用。
toLogLevel() 方法将字符串转为 LogLevel,大小写不敏感:
// toLogLevel converts to LogLevel
func toLogLevel(level string) LogLevel {
switch strings.ToLower(level) {
case "debug":
return DebugLevel
case "info":
return InfoLevel
case "warn":
return WarnLevel
case "error":
return ErrorLevel
case "fatal":
return FatalLevel
}
// unsupported log level by Dapr
return UndefinedLevel
}
// Logger includes the logging api sets
type Logger interface {
// EnableJSONOutput enables JSON formatted output log
EnableJSONOutput(enabled bool)
// SetAppID sets dapr_id field in log. Default value is empty string
SetAppID(id string)
// SetOutputLevel sets log output level
SetOutputLevel(outputLevel LogLevel)
// WithLogType specify the log_type field in log. Default value is LogTypeLog
WithLogType(logType string) Logger
// Info logs a message at level Info.
Info(args ...interface{})
// Infof logs a message at level Info.
Infof(format string, args ...interface{})
// Debug logs a message at level Debug.
Debug(args ...interface{})
// Debugf logs a message at level Debug.
Debugf(format string, args ...interface{})
// Warn logs a message at level Warn.
Warn(args ...interface{})
// Warnf logs a message at level Warn.
Warnf(format string, args ...interface{})
// Error logs a message at level Error.
Error(args ...interface{})
// Errorf logs a message at level Error.
Errorf(format string, args ...interface{})
// Fatal logs a message at level Fatal then the process will exit with status set to 1.
Fatal(args ...interface{})
// Fatalf logs a message at level Fatal then the process will exit with status set to 1.
Fatalf(format string, args ...interface{})
}
// globalLoggers is the collection of Dapr Logger that is shared globally.
// TODO: User will disable or enable logger on demand.
var globalLoggers = map[string]Logger{} // map保存所有的logger实例
var globalLoggersLock = sync.RWMutex{} // 用读写锁对map进行保护
logger创建之后会保存在 global loggers 中,这意味着每个 name 的logger只会创建一个实例。
// NewLogger creates new Logger instance.
func NewLogger(name string) Logger {
globalLoggersLock.Lock() // 加写锁
defer globalLoggersLock.Unlock()
logger, ok := globalLoggers[name]
if !ok {
logger = newDaprLogger(name)
globalLoggers[name] = logger
}
return logger
}
newDaprLogger() 方法的细节见 dapr_logger.go。
func getLoggers() map[string]Logger {
globalLoggersLock.RLock() // 加读锁
defer globalLoggersLock.RUnlock()
l := map[string]Logger{}
for k, v := range globalLoggers {
l[k] = v
}
return l
}
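结合上面的接口定义,kit logger 的典型用法大致如下(示意代码,假设项目已引入 github.com/dapr/kit 依赖,包路径见后文 Makefile 中的 LOGGER_PACKAGE_NAME):
package main

import "github.com/dapr/kit/logger"

// 同名 logger 只会创建一次:多次 NewLogger("dapr.demo") 返回的是同一个实例。
var log = logger.NewLogger("dapr.demo")

func main() {
	log.Info("hello from kit logger")           // 默认 info 级别、文本格式,输出到 stdout
	log.Debugf("not printed, level=%s", "info") // 默认级别是 info,debug 不会输出

	// 调整输出格式与级别(对应 Logger 接口中的方法)
	log.EnableJSONOutput(true)
	log.SetOutputLevel(logger.DebugLevel)
	log.Debug("now printed as JSON")
}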
Dapr logger package中的dapr_logger.go文件的源码分析,daprLogger 是实际的日志实现。
daprLogger 结构体,底层实现是 logrus :
// daprLogger is the implemention for logrus
type daprLogger struct {
// name is the name of logger that is published to log as a scope
name string
// loger is the instance of logrus logger
logger *logrus.Entry
}
创建Dapr logger的逻辑:
func newDaprLogger(name string) *daprLogger {
// 底层是 logrus
newLogger := logrus.New()
// 输出到 stdout
newLogger.SetOutput(os.Stdout)
dl := &daprLogger{
name: name,
logger: newLogger.WithFields(logrus.Fields{
logFieldScope: name,
// 默认是普通log类型
logFieldType: LogTypeLog,
}),
}
// 设置是否启用json输出,defaultJSONOutput默认是false
dl.EnableJSONOutput(defaultJSONOutput)
return dl
}
函数名起得有点小问题:实际做的是初始化 logger 的输出字段和格式,是否 enable JSON 输出只是其中一部分逻辑:
// EnableJSONOutput enables JSON formatted output log
func (l *daprLogger) EnableJSONOutput(enabled bool) {
var formatter logrus.Formatter
fieldMap := logrus.FieldMap{
// If time field name is conflicted, logrus adds "fields." prefix.
// So rename to unused field @time to avoid the confliction.
logrus.FieldKeyTime: logFieldTimeStamp,
logrus.FieldKeyLevel: logFieldLevel,
logrus.FieldKeyMsg: logFieldMessage,
}
hostname, _ := os.Hostname()
l.logger.Data = logrus.Fields{
logFieldScope: l.logger.Data[logFieldScope],
logFieldType: LogTypeLog,
logFieldInstance: hostname,
logFieldDaprVer: DaprVersion,
}
if enabled {
formatter = &logrus.JSONFormatter{
TimestampFormat: time.RFC3339Nano,
FieldMap: fieldMap,
}
} else {
formatter = &logrus.TextFormatter{
TimestampFormat: time.RFC3339Nano,
FieldMap: fieldMap,
}
}
l.logger.Logger.SetFormatter(formatter)
}
var DaprVersion string = "unknown"
func (l *daprLogger) EnableJSONOutput(enabled bool) {
l.logger.Data = logrus.Fields{
......
logFieldDaprVer: DaprVersion,
}
}
DaprVersion 的值来自于 makefile(dapr/Makefile):
LOGGER_PACKAGE_NAME := github.com/dapr/kit/logger
DEFAULT_LDFLAGS:=-X $(BASE_PACKAGE_NAME)/pkg/version.gitcommit=$(GIT_COMMIT) \
-X $(BASE_PACKAGE_NAME)/pkg/version.gitversion=$(GIT_VERSION) \
-X $(BASE_PACKAGE_NAME)/pkg/version.version=$(DAPR_VERSION) \
-X $(LOGGER_PACKAGE_NAME).DaprVersion=$(DAPR_VERSION)
设置日志的 app_id 字段,默认为空。
// SetAppID sets app_id field in log. Default value is empty string
func (l *daprLogger) SetAppID(id string) {
l.logger = l.logger.WithField(logFieldAppID, id)
}
这个方法在 logger 被初始化时调用进行设置,见 options.go 中的 ApplyOptionsToLoggers() 方法:
func ApplyOptionsToLoggers(options *Options) error {
......
if options.appID != undefinedAppID {
v.SetAppID(options.appID)
}
}
// SetOutputLevel sets log output level
func (l *daprLogger) SetOutputLevel(outputLevel LogLevel) {
l.logger.Logger.SetLevel(toLogrusLevel(outputLevel))
}
func toLogrusLevel(lvl LogLevel) logrus.Level {
// ignore error because it will never happens
l, _ := logrus.ParseLevel(string(lvl))
return l
}
这个是在原有的 daprLogger 实例上进行设置,没啥特殊。
默认是普通 log 类型,如果要设置log类型:
// WithLogType specify the log_type field in log. Default value is LogTypeLog
func (l *daprLogger) WithLogType(logType string) Logger {
// 这里重新构造了一个新的 daprLogger 结构体,然后返回
return &daprLogger{
name: l.name,
logger: l.logger.WithField(logFieldType, logType),
}
}
疑问和TODO:
所有的写log的方法都简单代理给了 l.logger (*logrus.Entry):
// Info logs a message at level Info.
func (l *daprLogger) Info(args ...interface{}) {
l.logger.Log(logrus.InfoLevel, args...)
}
// Infof logs a message at level Info.
func (l *daprLogger) Infof(format string, args ...interface{}) {
l.logger.Logf(logrus.InfoLevel, format, args...)
}
// Debug logs a message at level Debug.
func (l *daprLogger) Debug(args ...interface{}) {
l.logger.Log(logrus.DebugLevel, args...)
}
// Debugf logs a message at level Debug.
func (l *daprLogger) Debugf(format string, args ...interface{}) {
l.logger.Logf(logrus.DebugLevel, format, args...)
}
// Warn logs a message at level Warn.
func (l *daprLogger) Warn(args ...interface{}) {
l.logger.Log(logrus.WarnLevel, args...)
}
// Warnf logs a message at level Warn.
func (l *daprLogger) Warnf(format string, args ...interface{}) {
l.logger.Logf(logrus.WarnLevel, format, args...)
}
// Error logs a message at level Error.
func (l *daprLogger) Error(args ...interface{}) {
l.logger.Log(logrus.ErrorLevel, args...)
}
// Errorf logs a message at level Error.
func (l *daprLogger) Errorf(format string, args ...interface{}) {
l.logger.Logf(logrus.ErrorLevel, format, args...)
}
// Fatal logs a message at level Fatal then the process will exit with status set to 1.
func (l *daprLogger) Fatal(args ...interface{}) {
l.logger.Fatal(args...)
}
// Fatalf logs a message at level Fatal then the process will exit with status set to 1.
func (l *daprLogger) Fatalf(format string, args ...interface{}) {
l.logger.Fatalf(format, args...)
}
注意 logrus 的 Fatalf() 方法的实现,在输出日志之后会调用 ExitFunc(如果没设置则默认是 os.Exit):
func (entry *Entry) Fatalf(format string, args ...interface{}) {
entry.Logf(FatalLevel, format, args...)
entry.Logger.Exit(1)
}
func (logger *Logger) Exit(code int) {
runHandlers()
if logger.ExitFunc == nil {
logger.ExitFunc = os.Exit
}
logger.ExitFunc(code)
}
这会导致进程退出。因此要慎用。
Dapr logger package中的 options.go 文件的源码学习,设置logger相关的属性,包括从命令行参数中解析标记。
const (
defaultJSONOutput = false
defaultOutputLevel = "info"
undefinedAppID = ""
)
Options 结构体,就三个字段:
// Options defines the sets of options for Dapr logging
type Options struct {
// appID is the unique id of Dapr Application
// 默认为空
appID string
// JSONFormatEnabled is the flag to enable JSON formatted log
// 默认为false
JSONFormatEnabled bool
// OutputLevel is the level of logging
// 默认为 info
OutputLevel string
}
// SetOutputLevel sets the log output level
func (o *Options) SetOutputLevel(outputLevel string) error {
// 疑问:这里检查和赋值存在不一致:toLogLevel() 做了 ToLower,所以带大写字母的 outputLevel 也能通过检查,
// 但保存到 OutputLevel 字段的仍然是原始字符串
// TODO:改进一下
if toLogLevel(outputLevel) == UndefinedLevel {
return errors.Errorf("undefined Log Output Level: %s", outputLevel)
}
o.OutputLevel = outputLevel
return nil
}
// SetAppID sets Dapr ID
func (o *Options) SetAppID(id string) {
o.appID = id
}
疑问:为什么字段和设置方法的命名不统一?
检查发现:
返回每个字段的默认值,没啥特殊:
// DefaultOptions returns default values of Options
func DefaultOptions() Options {
return Options{
JSONFormatEnabled: defaultJSONOutput,
appID: undefinedAppID,
OutputLevel: defaultOutputLevel,
}
}
备注:go 不像 java 可以在字段定义时直接赋值一个默认值,有时还真不方便。
在命令行参数中读取 log-level 和 log-as-json 两个标记,并设置 OutputLevel 和 JSONFormatEnabled:
// AttachCmdFlags attaches log options to command flags
func (o *Options) AttachCmdFlags(
stringVar func(p *string, name string, value string, usage string),
boolVar func(p *bool, name string, value bool, usage string)) {
if stringVar != nil {
stringVar(
&o.OutputLevel,
"log-level",
defaultOutputLevel,
"Options are debug, info, warn, error, or fatal (default info)")
}
if boolVar != nil {
boolVar(
&o.JSONFormatEnabled,
"log-as-json",
defaultJSONOutput,
"print log as JSON (default false)")
}
}
备注:这大概就是 OutputLevel 和 JSONFormatEnabled 两个字段是 public 的原因?
这个方法会在每个二进制文件(runtime(也就是daprd) / injector / operator / placement / sentry) 的初始化代码中调用:
loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
注意:这个时候 OutputLevel 的值是没有经过检查而直接设值的,绕开了 SetOutputLevel 方法的检查。
// ApplyOptionsToLoggers applys options to all registered loggers
func ApplyOptionsToLoggers(options *Options) error {
// 所有的 logger 指的是保存在全局 logger map 中所有 logger
internalLoggers := getLoggers()
// Apply formatting options first
for _, v := range internalLoggers {
v.EnableJSONOutput(options.JSONFormatEnabled)
if options.appID != undefinedAppID {
v.SetAppID(options.appID)
}
}
daprLogLevel := toLogLevel(options.OutputLevel)
if daprLogLevel == UndefinedLevel {
// 在这里做了 OutputLevel 值的有效性检查
return errors.Errorf("invalid value for --log-level: %s", options.OutputLevel)
}
for _, v := range internalLoggers {
v.SetOutputLevel(daprLogLevel)
}
return nil
}
TODO:OutputLevel 赋值有效性检查的地方现在发现有两个,其中一个还没有被使用。准备PR修订。
查了一下这个方法的确是在每个二进制文件(runtime(也就是daprd) / injector / operator / placement / sentry) 的初始化代码中调用:
loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
......
// Apply options to all loggers
loggerOptions.SetAppID(*appID)
if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
return nil, err
}
TODO:ApplyOptionsToLoggers 这个方法名最好修改,增加"来自命令行的 options"的语义,否则报错 "invalid value for --log-level" 就会很奇怪。
Dapr config package中的 decode.go 文件的源码学习。
// StringDecoder被用作自定义类型(或别名类型)来覆盖 `decodeString` DecodeHook中的基本解码功能的一种方式。
// `encoding.TextMashaller`没有被使用,是因为它与许多Go类型相匹配,并且会有潜在的意外结果。
// 指定一个自定义的解码 func 应该是非常刻意(有意为之)的行为。
type StringDecoder interface {
DecodeString(value string) error
}
// Decode()将通用map值从 `input` 解码到 `output`,同时提供有用的错误信息。
// `output`必须是一个指向Go结构体的指针,该结构体包含应被解码的字段的 `mapstructure` 结构体标签。
// 这个函数在解码被解析为 `map[string]interface{}` 的配置文件或被解析为`map[string]string` 的组件元数据的值时很有用。
//
// 大部分繁重的工作都由 mapstructure 库处理。自定义的解码器被用来处理将字符串值解码为支持的原生类型。
func Decode(input interface{}, output interface{}) error {
// 构建mapstructure的decoder
decoder, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{ // nolint:exhaustivestruct
Result: output,
DecodeHook: decodeString, // 这里植入我们的hook
})
if err != nil {
return err
}
// 委托给mapstructure的decoder进行解码
return decoder.Decode(input)
}
DecodeHookFunc 的定义:
type DecodeHookFunc interface{}
DecodeHookFunc 要求必须是下面三种函数类型之一:
// DecodeHookFuncType is a DecodeHookFunc which has complete information about
// the source and target types.
type DecodeHookFuncType func(reflect.Type, reflect.Type, interface{}) (interface{}, error)
// DecodeHookFuncKind is a DecodeHookFunc which knows only the Kinds of the
// source and target types.
type DecodeHookFuncKind func(reflect.Kind, reflect.Kind, interface{}) (interface{}, error)
// DecodeHookFuncRaw is a DecodeHookFunc which has complete access to both the source and target
// values.
type DecodeHookFuncValue func(from reflect.Value, to reflect.Value) (interface{}, error)
config实现中采用的是第一种: 有 source 和 target 类型的完整信息。
decodeString()方法的实现:
func decodeString(
f reflect.Type,
t reflect.Type,
data interface{}) (interface{}, error) {
if t.Kind() == reflect.String && f.Kind() != reflect.String {
return fmt.Sprintf("%v", data), nil
}
if f.Kind() == reflect.Ptr {
f = f.Elem()
data = reflect.ValueOf(data).Elem().Interface()
}
if f.Kind() != reflect.String {
return data, nil
}
dataString, ok := data.(string)
if !ok {
return nil, errors.Errorf("expected string: got %s", reflect.TypeOf(data))
}
var result interface{}
var decoder StringDecoder
if t.Implements(typeStringDecoder) {
result = reflect.New(t.Elem()).Interface()
decoder = result.(StringDecoder)
} else if reflect.PtrTo(t).Implements(typeStringDecoder) {
result = reflect.New(t).Interface()
decoder = result.(StringDecoder)
}
if decoder != nil {
if err := decoder.DecodeString(dataString); err != nil {
if t.Kind() == reflect.Ptr {
t = t.Elem()
}
return nil, errors.Errorf("invalid %s %q: %v", t.Name(), dataString, err)
}
return result, nil
}
switch t {
case typeDuration:
// Check for simple integer values and treat them
// as milliseconds
if val, err := strconv.Atoi(dataString); err == nil {
return time.Duration(val) * time.Millisecond, nil
}
// Convert it by parsing
d, err := time.ParseDuration(dataString)
return d, invalidError(err, "duration", dataString)
case typeTime:
// Convert it by parsing
t, err := time.Parse(time.RFC3339Nano, dataString)
if err == nil {
return t, nil
}
t, err = time.Parse(time.RFC3339, dataString)
return t, invalidError(err, "time", dataString)
}
switch t.Kind() { // nolint: exhaustive
case reflect.Uint:
val, err := strconv.ParseUint(dataString, 10, 64)
return uint(val), invalidError(err, "uint", dataString)
case reflect.Uint64:
val, err := strconv.ParseUint(dataString, 10, 64)
return val, invalidError(err, "uint64", dataString)
case reflect.Uint32:
val, err := strconv.ParseUint(dataString, 10, 32)
return uint32(val), invalidError(err, "uint32", dataString)
case reflect.Uint16:
val, err := strconv.ParseUint(dataString, 10, 16)
return uint16(val), invalidError(err, "uint16", dataString)
case reflect.Uint8:
val, err := strconv.ParseUint(dataString, 10, 8)
return uint8(val), invalidError(err, "uint8", dataString)
case reflect.Int:
val, err := strconv.ParseInt(dataString, 10, 64)
return int(val), invalidError(err, "int", dataString)
case reflect.Int64:
val, err := strconv.ParseInt(dataString, 10, 64)
return val, invalidError(err, "int64", dataString)
case reflect.Int32:
val, err := strconv.ParseInt(dataString, 10, 32)
return int32(val), invalidError(err, "int32", dataString)
case reflect.Int16:
val, err := strconv.ParseInt(dataString, 10, 16)
return int16(val), invalidError(err, "int16", dataString)
case reflect.Int8:
val, err := strconv.ParseInt(dataString, 10, 8)
return int8(val), invalidError(err, "int8", dataString)
case reflect.Float32:
val, err := strconv.ParseFloat(dataString, 32)
return float32(val), invalidError(err, "float32", dataString)
case reflect.Float64:
val, err := strconv.ParseFloat(dataString, 64)
return val, invalidError(err, "float64", dataString)
case reflect.Bool:
val, err := strconv.ParseBool(dataString)
return val, invalidError(err, "bool", dataString)
default:
return data, nil
}
}
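结合上面 decodeString() 的逻辑,Decode() 的一个典型使用示意如下(demoMetadata 是本文虚构的结构体,假设 config 包路径为 github.com/dapr/kit/config):
package main

import (
	"fmt"
	"time"

	"github.com/dapr/kit/config"
)

// 组件 metadata 通常是 map[string]string,借助 Decode() 可以把字符串值
// 解码成 Go 结构体中的原生类型(数字、bool、Duration 等)。
type demoMetadata struct {
	MaxRetries int           `mapstructure:"maxRetries"`
	Timeout    time.Duration `mapstructure:"timeout"`
	Enabled    bool          `mapstructure:"enabled"`
}

func main() {
	input := map[string]string{
		"maxRetries": "3",
		"timeout":    "5000", // 纯数字字符串会被当作毫秒处理
		"enabled":    "true",
	}
	var md demoMetadata
	if err := config.Decode(input, &md); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", md) // 输出类似 {MaxRetries:3 Timeout:5s Enabled:true}
}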
Dapr config package中的 normalize.go 文件的源码学习。
将 map[interface{}]interface{} 转换为 map[string]interface{},以便对 JSON 进行标准化处理,并在组件初始化时使用。
代码实现:
func Normalize(i interface{}) (interface{}, error) {
var err error
switch x := i.(type) { // 只标准化三种类型,其他类型直接返回
case map[interface{}]interface{}: // 1. 对于map[interface{}]interface{},key和value都要做正常化
m2 := map[string]interface{}{}
for k, v := range x {
if strKey, ok := k.(string); ok {
// 将key的类型改成string,value继续做正常化
if m2[strKey], err = Normalize(v); err != nil {
return nil, err
}
} else {
// 要求key一定是string,否则报错
return nil, fmt.Errorf("error parsing config field: %v", k)
}
}
return m2, nil
case map[string]interface{}: // 2. 对于map[string]interface{},只需要对value做正常化
m2 := map[string]interface{}{}
for k, v := range x {
if m2[k], err = Normalize(v); err != nil {
return nil, err
}
}
return m2, nil
case []interface{}: // 3. 对于[]interface{}这样的数组,每个数组元素都做正常化
for i, v := range x {
if x[i], err = Normalize(v); err != nil {
return nil, err
}
}
}
return i, nil
}
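Normalize() 的典型输入来自 yaml 解析结果。下面是一个使用示意(额外引入 gopkg.in/yaml.v2 只是为了构造 map[interface{}]interface{} 输入,config 包路径同样假设为 github.com/dapr/kit/config):
package main

import (
	"fmt"

	"github.com/dapr/kit/config"
	"gopkg.in/yaml.v2"
)

// yaml.v2 默认把嵌套 map 解析成 map[interface{}]interface{},
// 这样的结构无法直接做 JSON 序列化,Normalize() 把 key 统一转成 string。
func main() {
	var raw interface{}
	if err := yaml.Unmarshal([]byte("metadata:\n  host: localhost\n  port: 6379\n"), &raw); err != nil {
		panic(err)
	}

	normalized, err := config.Normalize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", normalized) // 所有 key 均为 string,之后可以安全地 json.Marshal
}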
Dapr config package中的 prefix.go 文件的源码学习。
func PrefixedBy(input interface{}, prefix string) (interface{}, error) {
normalized, err := Normalize(input)
if err != nil {
// 唯一可能来自normalize的错误是: 输入是map[interface{}]interface{},而某个key不是字符串
return input, err
}
input = normalized
if inputMap, ok := input.(map[string]interface{}); ok {
converted := make(map[string]interface{}, len(inputMap))
for k, v := range inputMap {
if strings.HasPrefix(k, prefix) {
key := uncapitalize(strings.TrimPrefix(k, prefix)) // 去掉key的前缀
converted[key] = v
}
}
return converted, nil
} else if inputMap, ok := input.(map[string]string); ok {
converted := make(map[string]string, len(inputMap))
for k, v := range inputMap {
if strings.HasPrefix(k, prefix) {
key := uncapitalize(strings.TrimPrefix(k, prefix)) // 去掉key的前缀
converted[key] = v
}
}
return converted, nil
}
return input, nil
}
uncapitalize() 方法将字符串的首字母转为小写:
func uncapitalize(str string) string {
if len(str) == 0 {
return str
}
vv := []rune(str)
vv[0] = unicode.ToLower(vv[0])
return string(vv)
}
被 retry.go 的 DecodeConfigWithPrefix() 方法调用
func DecodeConfigWithPrefix(c *Config, input interface{}, prefix string) error {
input, err := config.PrefixedBy(input, prefix)
if err != nil {
return err
}
return DecodeConfig(c, input)
}
Dapr retry package中的 retry.go 文件的源码学习。
多次重试之间的间隔策略,有两种:PolicyConstant 是固定值,PolicyExponential是指数增长。
// PolicyType 表示后退延迟(back off delay)应该是固定值还是指数增长。
// PolicyType denotes if the back off delay should be constant or exponential.
type PolicyType int
const (
// PolicyConstant is a backoff policy that always returns the same backoff delay.
// PolicyConstant是一个总是返回相同退避延迟的退避策略。
PolicyConstant PolicyType = iota
// PolicyExponential is a backoff implementation that increases the backoff period
// for each retry attempt using a randomization function that grows exponentially.
// PolicyExponential是一个退避实现,它使用一个以指数增长的随机化函数来增加每次重试的退避周期。
PolicyExponential
)
// Config 封装了退避策略的配置。
type Config struct {
Policy PolicyType `mapstructure:"policy"`
// Constant back off
Duration time.Duration `mapstructure:"duration"`
// Exponential back off
InitialInterval time.Duration `mapstructure:"initialInterval"`
RandomizationFactor float32 `mapstructure:"randomizationFactor"`
Multiplier float32 `mapstructure:"multiplier"`
MaxInterval time.Duration `mapstructure:"maxInterval"`
MaxElapsedTime time.Duration `mapstructure:"maxElapsedTime"`
// Additional options
MaxRetries int64 `mapstructure:"maxRetries"`
}
注意:每个字段都标记了 mapstructure tag,这是为了使用 mapstructure 进行解码。
默认配置为:
func DefaultConfig() Config {
return Config{
Policy: PolicyConstant, // 默认为固定间隔
Duration: 5 * time.Second, // 间隔时间默认是5秒钟
InitialInterval: backoff.DefaultInitialInterval,
RandomizationFactor: backoff.DefaultRandomizationFactor,
Multiplier: backoff.DefaultMultiplier,
MaxInterval: backoff.DefaultMaxInterval,
MaxElapsedTime: backoff.DefaultMaxElapsedTime,
MaxRetries: -1, // 默认一直进行重试
}
}
不带重试的默认配置:
// 这对那些可以自行处理重试的broker来说可能很有用。
func DefaultConfigWithNoRetry() Config {
c := DefaultConfig()
c.MaxRetries = 0 // MaxRetries 设置为0
return c
}
DecodeConfig() 方法将输入(通常是 map 形式的配置)解码为 Config 结构体:
func DecodeConfig(c *Config, input interface{}) error {
// Use the default config if `c` is empty/zero value.
var emptyConfig Config
if *c == emptyConfig { // 如果c是一个初始化之后没有进行赋值的Config结构体,则改用默认配置的Config
*c = DefaultConfig()
}
return config.Decode(input, c)
}
DecodeConfigWithPrefix() 方法在解码为 Config 之前,先去除前缀,并对 key 和 value 做标准化:
func DecodeConfigWithPrefix(c *Config, input interface{}, prefix string) error {
input, err := config.PrefixedBy(input, prefix) // 去除前缀,并进行key和value的正常化
if err != nil {
return err
}
return DecodeConfig(c, input)
}
DecodeString()方法解析重试策略:
func (p *PolicyType) DecodeString(value string) error {
switch strings.ToLower(value) {
case "constant":
*p = PolicyConstant
case "exponential":
*p = PolicyExponential
default:
return errors.Errorf("unexpected back off policy type: %s", value)
}
return nil
}
NewBackOff() 方法返回一个 BackOff 实例,可直接与 NotifyRecover 或 backoff.RetryNotify 一起使用。该实例不会因为上下文取消而停止;要支持取消(推荐),请使用 NewBackOffWithContext。由于底层的回退实现并不总是线程安全的,每次使用 NotifyRecover 或 backoff.RetryNotify 时都应该重新调用 NewBackOff 或 NewBackOffWithContext。
func (c *Config) NewBackOff() backoff.BackOff {
var b backoff.BackOff
switch c.Policy {
case PolicyConstant: // 1. 对于固定周期只需要返回配置项中设定的时间间隔,默认5秒钟
b = backoff.NewConstantBackOff(c.Duration)
case PolicyExponential: // 2. 对于指数周期,通过 backoff 类库来实现,简单透传配置参数
eb := backoff.NewExponentialBackOff()
eb.InitialInterval = c.InitialInterval
eb.RandomizationFactor = float64(c.RandomizationFactor)
eb.Multiplier = float64(c.Multiplier)
eb.MaxInterval = c.MaxInterval
eb.MaxElapsedTime = c.MaxElapsedTime
b = eb
}
if c.MaxRetries >= 0 {
b = backoff.WithMaxRetries(b, uint64(c.MaxRetries))
}
return b
}
NewBackOffWithContext() 方法返回一个 BackOff 实例,以便与 NotifyRecover 或 backoff.RetryNotify 直接使用;如果提供的上下文被取消,则用于取消重试。同样地,由于底层的回退实现并不总是线程安全的,每次使用 NotifyRecover 或 backoff.RetryNotify 时都应该重新调用 NewBackOff 或 NewBackOffWithContext。
func (c *Config) NewBackOffWithContext(ctx context.Context) backoff.BackOff {
b := c.NewBackOff()
return backoff.WithContext(b, ctx)
}
标准 backoff.RetryNotify 的用法:
func RetryNotify(operation Operation, b BackOff, notify Notify) error {
return RetryNotifyWithTimer(operation, b, notify, nil)
}
// Operation 是由Retry()或RetryNotify()执行的。
// 如果该操作返回错误,将使用退避策略重试。
type Operation func() error
// Notify是一个出错通知的函数。
// 如果操作失败(有错误),它会收到一个操作错误和回退延迟。
// 注意,如果退避政策要求停止重试。通知函数不会被调用。
type Notify func(error, time.Duration)
如果出现问题需要多次重试才能恢复,直接使用 backoff.RetryNotify 会有两个问题:每次失败都会触发 notify,产生大量重复的告警/日志;而最终恢复成功时却没有任何通知。
NotifyRecover() 方法是 backoff.RetryNotify 的封装器,它为之前操作失败但后来恢复的情况增加了另一个回调。这个包装器的主要目的是只在操作第一次失败时调用 "notify",在最后成功时调用 "recovered",这有助于把日志信息限制在操作者真正需要关注的事件上。NotifyRecover() 包装了 Operation() 和 Notify() 函数:
func NotifyRecover(operation backoff.Operation, b backoff.BackOff, notify backoff.Notify, recovered func()) error {
var notified bool
return backoff.RetryNotify(func() error {
err := operation()
// notified为true说明之前执行过notify,即出现了一次或者多次连续错误。
// err为空说明operation不再出错
// 这才可以成为"恢复"
if err == nil && notified {
notified = false // 重置 notified ,下一次 operation() 再成功也不会再出发recovered()
recovered() // 满足逻辑,可以触发一次 recovered() 方法
}
return err
}, b, func(err error, d time.Duration) {
if !notified { // 只在第一次时调用真正的notify()函数,其他情况下忽略
notify(err, d)
notified = true
}
})
}
备注:感觉 notified 这个变量的取名不够清晰,它的语义不应该是"是否触发了通知",而是"是否发生了错误且一直没有恢复",改成类似 errorNotRecovered 之类的会更清晰一些。
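把 Config、NewBackOffWithContext() 和 NotifyRecover() 串起来,一个典型的重试用法示意如下(假设 retry 包路径为 github.com/dapr/kit/retry,业务逻辑是虚构的):
package main

import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/dapr/kit/retry"
)

func main() {
	// 构造重试配置:固定间隔 1 秒,最多重试 5 次
	cfg := retry.DefaultConfig()
	cfg.Duration = time.Second
	cfg.MaxRetries = 5

	b := cfg.NewBackOffWithContext(context.Background())

	attempts := 0
	err := retry.NotifyRecover(func() error {
		attempts++
		if attempts < 3 {
			return errors.New("connection refused") // 前两次失败
		}
		return nil // 第三次成功
	}, b, func(err error, d time.Duration) {
		// 只在第一次失败时被调用
		log.Printf("operation failed: %v, retrying in %s", err, d)
	}, func() {
		// 从失败中恢复成功时被调用一次
		log.Print("operation recovered")
	})
	if err != nil {
		log.Fatalf("gave up after retries: %v", err)
	}
}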
工具类代码指完全作为工具使用的代码,这些代码往往是在代码调用链的最底层,自身没有任何特定逻辑,只专注于完成某个特定的功能,作为上层代码的工具使用。
工具类代码处于代码依赖关系的最底层。
concurrency package 的代码不多,暂时只有一个 limiter.go。
Dapr concurrency package中的 limiter.go 文件的源码学习,并发限制器(limiter)的代码实现和使用场景。
重点:充分利用 golang chan 的特性
// Limiter object
type Limiter struct {
limit int
tickets chan int
numInProgress int32
}
字段说明:
const (
// DefaultLimit is the default concurrency limit
DefaultLimit = 100
)
// NewLimiter allocates a new ConcurrencyLimiter
func NewLimiter(limit int) *Limiter {
if limit <= 0 {
limit = DefaultLimit
}
// allocate a limiter instance
c := &Limiter{
limit: limit,
// tickets chan 的 size 设置为 limit
tickets: make(chan int, limit),
}
// allocate the tickets:
// 开始时先准备和limit数量相当的可用 tickets
for i := 0; i < c.limit; i++ {
c.tickets <- i
}
return c
}
// Execute adds a function to the execution queue.
// if num of go routines allocated by this instance is < limit
// launch a new go routine to execute job
// else wait until a go routine becomes available
func (c *Limiter) Execute(job func(param interface{}), param interface{}) int {
// 从 chan 中拿一个有效票据
// 如果当前 chan 中有票据,则说明 go routines 的数量还没有达到 limit 的最大限制,还可以继续启动go routine执行job
// 如果当前 chan 中没有票据,则说明 go routines 的数量已经达到 limit 的最大限制,需要限速了。execute方法会阻塞在这里,等待有job执行完成释放票据
ticket := <-c.tickets
// 拿到之后更新numInProgress,数量加一,要求是原子操作
atomic.AddInt32(&c.numInProgress, 1)
// 启动 go routine 执行 job
go func(param interface{}) {
// 通过defer来做 job 完成后的清理
defer func() {
// 将票据释放给 chan,这样后续的 job 有机会申请到
c.tickets <- ticket
// 更新numInProgress,数量减一,要求是原子操作
atomic.AddInt32(&c.numInProgress, -1)
}()
// 执行job
job(param)
}(param)
// 返回当前的票据号
return ticket
}
Wait() 方法会阻塞,等待所有已经通过 Execute() 方法拿到票据的 go routine 执行完毕。
// Wait will block all the previously Executed jobs completed running.
//
// IMPORTANT: calling the Wait function while keep calling Execute leads to
// un-desired race conditions
func (c *Limiter) Wait() {
// 这是从 chan 中读取所有的票据,只要有任何票据被 job 释放都会去争抢
// 最后wait()方法获取到所有的票据,其他 job 自然就无法获取票据从而阻塞住所有job的工作
// 但这并不能保证一定能第一时间抢的到,如果还有其他的 job 也在调用 execute() 方法申请票据,那只有等这个 job 完成工作释放票据时再次争抢
for i := 0; i < c.limit; i++ {
<-c.tickets
}
}
在 pkg/grpc/api.go 和 pkg/http/api.go 的 GetBulkState() 方法中,通过 limiter 来限制批量操作的并发数量:
// 构建 limiter,limit 参数由请求参数中的 Parallelism 指定
limiter := concurrency.NewLimiter(int(in.Parallelism))
n := len(reqs)
for i := 0; i < n; i++ {
fn := func(param interface{}) {
......
}
// 提交 job 给 limiter
limiter.Execute(fn, &reqs[i])
}
// 等待所有的 job 执行完成
limiter.Wait()
在 actor 中也有类似的代码:
limiter := concurrency.NewLimiter(actorMetadata.RemindersMetadata.PartitionCount)
for i := range getRequests {
fn := func(param interface{}) {
......
}
limiter.Execute(fn, &bulkResponse[i])
}
limiter.Wait()
类库类代码指为了更方便地使用第三方类库而封装的辅助代码,这些代码也通常处于代码调用链的底层,专注于完成某方面特定的功能,可能会带有一点点 dapr 的逻辑。
类库类代码处于代码依赖关系的倒数第二层,仅仅比工具类代码高一层。
Dapr grpc package中的 util.go文件的源码分析,目前只有用于转换state参数类型的两个方法。
stateConsistencyToString 方法将 StateOptions_StateConsistency 转为 string:
func stateConsistencyToString(c commonv1pb.StateOptions_StateConsistency) string {
switch c {
case commonv1pb.StateOptions_CONSISTENCY_EVENTUAL:
return "eventual"
case commonv1pb.StateOptions_CONSISTENCY_STRONG:
return "strong"
}
return ""
}
stateConcurrencyToString 方法将 StateOptions_StateConcurrency 转为 string:
func stateConcurrencyToString(c commonv1pb.StateOptions_StateConcurrency) string {
switch c {
case commonv1pb.StateOptions_CONCURRENCY_FIRST_WRITE:
return "first-write"
case commonv1pb.StateOptions_CONCURRENCY_LAST_WRITE:
return "last-write"
}
return ""
}
Dapr grpc package中的 port.go文件的源码分析,只有一个 GetFreePort 方法用于获取一个空闲的端口。
GetFreePort 方法从操作系统获取一个空闲可用的端口:
// GetFreePort returns a free port from the OS
func GetFreePort() (int, error) {
addr, err := net.ResolveTCPAddr("tcp", "localhost:0")
if err != nil {
return 0, err
}
l, err := net.ListenTCP("tcp", addr)
if err != nil {
return 0, err
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port, nil
}
通过将端口设置为 0,让操作系统自动分配一个可用的端口。注意返回前一定要关闭这个监听(defer l.Close()),否则端口会一直被占用。
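"监听 0 端口让操作系统分配"这个技巧可以用一个不依赖 dapr 内部包的小例子来验证(示意代码):
package main

import (
	"fmt"
	"net"
)

// 监听 "localhost:0" 让操作系统分配一个空闲端口,读取实际端口号后立刻关闭监听。
func getFreePort() (int, error) {
	l, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := getFreePort()
	if err != nil {
		panic(err)
	}
	fmt.Println("free port:", port)
	// 注意:端口在 Close() 之后才真正释放,理论上存在被其他进程抢占的竞态窗口。
}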
Dapr grpc package中的 dial.go文件的源码分析,目前只有用于建连获取地址前缀的一个方法。
GetDialAddressPrefix 为给定的 DaprMode 返回 dial 前缀,用于 gRPC 客户端建连:
// GetDialAddressPrefix returns a dial prefix for a gRPC client connections
// For a given DaprMode.
func GetDialAddressPrefix(mode modes.DaprMode) string {
if runtime.GOOS == "windows" {
return ""
}
switch mode {
case modes.KubernetesMode:
return "dns:///"
default:
return ""
}
}
注意:Kubernetes 模式下返回 "dns:///"。
调用场景,只在 grpc.go 的 GetGRPCConnection() 方法中被调用:
// GetGRPCConnection returns a new grpc connection for a given address and inits one if doesn't exist
func (g *Manager) GetGRPCConnection(address, id string, namespace string, skipTLS, recreateIfExists, sslEnabled bool) (*grpc.ClientConn, error) {
dialPrefix := GetDialAddressPrefix(g.mode)
......
conn, err := grpc.DialContext(ctx, dialPrefix+address, opts...)
......
}
基础代码是 Dapr 代码中最基础的部分,这些代码已经是 dapr 自身逻辑的组成部分,但处于比较偏底层,也不是 dapr 的主要链路,通常代码量也不大。
基础代码在依赖关系中位于工具类代码和类库类代码之上。
version 的代码超级简单,就一个 version.go,内容也只有一点点:
// Values for these are injected by the build.
var (
version = "edge"
commit string
)
// Version returns the Dapr version. This is either a semantic version
// number or else, in the case of unreleased code, the string "edge".
func Version() string {
return version
}
// Commit returns the git commit SHA for the code that Dapr was built from.
func Commit() string {
return commit
}
version 的值要么是 1.0.0 这种语义化版本号,要不就是 edge,表示未发布的代码。
注释说 "Values for these are injected by the build.",那是怎么注入的呢?Build 总不能调用代码,而且这两个变量也是 private 的。
Dapr 下的 Makefile 文件中:
# git rev-list -1 HEAD 得到的 git commit 的 hash 值
# 如:63147334aa246d76f9f65708c257460567a1cff4
GIT_COMMIT = $(shell git rev-list -1 HEAD)
# git describe --always --abbrev=7 --dirty 得到的是版本信息
# 如:v1.0.0-rc.4-5-g6314733
GIT_VERSION = $(shell git describe --always --abbrev=7 --dirty)
ifdef REL_VERSION
DAPR_VERSION := $(REL_VERSION)
else
DAPR_VERSION := edge
endif
BASE_PACKAGE_NAME := github.com/dapr/dapr
DEFAULT_LDFLAGS:=-X $(BASE_PACKAGE_NAME)/pkg/version.commit=$(GIT_VERSION) -X $(BASE_PACKAGE_NAME)/pkg/version.version=$(DAPR_VERSION)
ifeq ($(origin DEBUG), undefined)
BUILDTYPE_DIR:=release
LDFLAGS:="$(DEFAULT_LDFLAGS) -s -w"
else ifeq ($(DEBUG),0)
BUILDTYPE_DIR:=release
LDFLAGS:="$(DEFAULT_LDFLAGS) -s -w"
else
BUILDTYPE_DIR:=debug
GCFLAGS:=-gcflags="all=-N -l"
LDFLAGS:="$(DEFAULT_LDFLAGS)"
$(info Build with debugger information)
endif
define genBinariesForTarget
.PHONY: $(5)/$(1)
$(5)/$(1):
CGO_ENABLED=$(CGO) GOOS=$(3) GOARCH=$(4) go build $(GCFLAGS) -ldflags=$(LDFLAGS) \
-o $(5)/$(1) $(2)/;
endef
TODO:没看懂,有时间详细研究一下这个makefile。
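这里可以先补充一点背景(基于 go 工具链的通用机制,不是对这个 makefile 的完整解读):go build 的 -ldflags "-X 包路径.变量名=值" 可以在链接期覆盖包级 string 变量的值,即使变量是未导出的,这就是 version/commit 被"注入"的方式。一个最小示例(模块名 example.com/demo 为假设):
// version.go(假设模块名为 example.com/demo)
package main

import "fmt"

// 这两个变量的值可以在构建时通过 -ldflags -X 注入,例如:
//   go build -ldflags "-X main.version=1.0.0 -X main.commit=6314733" .
var (
	version = "edge"
	commit  string
)

func main() {
	fmt.Printf("version=%s commit=%s\n", version, commit)
}
注意 -X 只能作用于 string 类型的包级变量,这也是 version 和 commit 都定义为 string 的原因。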
modes 的代码超级简单,就一个 modes.go,内容也只有一点点:
// DaprMode is the runtime mode for Dapr.
type DaprMode string
const (
// KubernetesMode is a Kubernetes Dapr mode
KubernetesMode DaprMode = "kubernetes"
// StandaloneMode is a Standalone Dapr mode
StandaloneMode DaprMode = "standalone"
)
Dapr有两种运行模式
两种模式的差异:
配置文件读取的方式:
- Standalone 模式:通过命令行参数 config 指定配置文件的路径。
- Kubernetes 模式:通过命令行参数 config 指定 Configuration 配置对象的名称。
config := flag.String("config", "", "Path to config file, or name of a configuration object")
TODO
cors 的代码超级简单,就一个 cors.go,内容也只有一点点:
// DefaultAllowedOrigins is the default origins allowed for the Dapr HTTP servers
const DefaultAllowedOrigins = "*"
AllowedOrigins 配置在启动时通过命令行参数 allowed-origins 传入,默认值为 DefaultAllowedOrigins ("*")。然后传入给 NewRuntimeConfig() 方法:
func FromFlags() (*DaprRuntime, error) {
allowedOrigins := flag.String("allowed-origins", cors.DefaultAllowedOrigins, "Allowed HTTP origins")
runtimeConfig := NewRuntimeConfig(*appID, placementAddresses, *controlPlaneAddress, *allowedOrigins ......)
}
之后通过 NewRuntimeConfig() 保存在 Config 结构体的 AllowedOrigins 字段中:
func NewRuntimeConfig(
id string, placementAddresses []string,
controlPlaneAddress, allowedOrigins ......) *Config {
return &Config{
AllowedOrigins: allowedOrigins,
......
}
之后在 pkg/http/server.go 的 useCors() 方法中使用这个配置:
func (s *server) useCors(next fasthttp.RequestHandler) fasthttp.RequestHandler {
if s.config.AllowedOrigins == cors_dapr.DefaultAllowedOrigins {
return next
}
log.Infof("enabled cors http middleware")
origins := strings.Split(s.config.AllowedOrigins, ",")
corsHandler := s.getCorsHandler(origins)
return corsHandler.CorsMiddleware(next)
}
Dapr credentials package中的 certchain.go 文件的源码学习,CertChain 结构体持有证书信任链的 PEM 内容。
CertChain 结构体持有证书信任链的PEM值:
// CertChain holds the certificate trust chain PEM values
type CertChain struct {
RootCA []byte
Cert []byte
Key []byte
}
LoadFromDisk 方法从给定目录中读取 CertChain:
// LoadFromDisk retruns a CertChain from a given directory
func LoadFromDisk(rootCertPath, issuerCertPath, issuerKeyPath string) (*CertChain, error) {
rootCert, err := ioutil.ReadFile(rootCertPath)
if err != nil {
return nil, err
}
cert, err := ioutil.ReadFile(issuerCertPath)
if err != nil {
return nil, err
}
key, err := ioutil.ReadFile(issuerKeyPath)
if err != nil {
return nil, err
}
return &CertChain{
RootCA: rootCert,
Cert: cert,
Key: key,
}, nil
}
placement 的 main.go 中,如果 mTLS 开启了,则会读取 tls 证书:
func loadCertChains(certChainPath string) *credentials.CertChain {
tlsCreds := credentials.NewTLSCredentials(certChainPath)
log.Info("mTLS enabled, getting tls certificates")
// try to load certs from disk, if not yet there, start a watch on the local filesystem
chain, err := credentials.LoadFromDisk(tlsCreds.RootCertPath(), tlsCreds.CertPath(), tlsCreds.KeyPath())
......
}
operator 的 operator.go 中,也会判断,如果 MTLSEnabled :
var certChain *credentials.CertChain
if o.config.MTLSEnabled {
log.Info("mTLS enabled, getting tls certificates")
// try to load certs from disk, if not yet there, start a watch on the local filesystem
chain, err := credentials.LoadFromDisk(o.config.Credentials.RootCertPath(), o.config.Credentials.CertPath(), o.config.Credentials.KeyPath())
......
}
备注:上面两段代码重复度极高,最好能重构一下。
sentry 中也有调用:
func (c *defaultCA) validateAndBuildTrustBundle() (*trustRootBundle, error) {
var (
issuerCreds *certs.Credentials
rootCertBytes []byte
issuerCertBytes []byte
)
// certs exist on disk or getting created, load them when ready
if !shouldCreateCerts(c.config) {
err := detectCertificates(c.config.RootCertPath)
if err != nil {
return nil, err
}
certChain, err := credentials.LoadFromDisk(c.config.RootCertPath, c.config.IssuerCertPath, c.config.IssuerKeyPath)
if err != nil {
return nil, errors.Wrap(err, "error loading cert chain from disk")
}
TODO: 证书相关的细节后面单独细看。
Dapr credentials package中的 credentials.go文件的源码学习,TLSCredentials 结构体持有证书相关的各种 path。
只有一个字段 credentialsPath:
// TLSCredentials holds paths for credentials
type TLSCredentials struct {
credentialsPath string
}
构造方法很简单:
// NewTLSCredentials returns a new TLSCredentials
func NewTLSCredentials(path string) TLSCredentials {
return TLSCredentials{
credentialsPath: path,
}
}
获取 credentialsPath,这个path中保存有 TLS 证书:
// Path returns the directory holding the TLS credentials
func (t *TLSCredentials) Path() string {
return t.credentialsPath
}
分别获取 root cert / cert / cert key 的 path:
// RootCertPath returns the file path for the root cert
func (t *TLSCredentials) RootCertPath() string {
return filepath.Join(t.credentialsPath, RootCertFilename)
}
// CertPath returns the file path for the cert
func (t *TLSCredentials) CertPath() string {
return filepath.Join(t.credentialsPath, IssuerCertFilename)
}
// KeyPath returns the file path for the cert key
func (t *TLSCredentials) KeyPath() string {
return filepath.Join(t.credentialsPath, IssuerKeyFilename)
}
Dapr credentials package中的 tls.go文件的源码学习,从 cert/key 中装载 tls.config 对象。
TLSConfigFromCertAndKey() 方法从 PEM 格式中有效的 cert/key 对中返回 tls.config 对象:
// TLSConfigFromCertAndKey return a tls.config object from valid cert/key pair in PEM format.
func TLSConfigFromCertAndKey(certPem, keyPem []byte, serverName string, rootCA *x509.CertPool) (*tls.Config, error) {
cert, err := tls.X509KeyPair(certPem, keyPem)
if err != nil {
return nil, err
}
// nolint:gosec
config := &tls.Config{
InsecureSkipVerify: false,
RootCAs: rootCA,
ServerName: serverName,
Certificates: []tls.Certificate{cert},
}
return config, nil
}
Dapr credentials package中的 grpc.go文件的源码学习,获取服务器端选项和客户端选项。
func GetServerOptions(certChain *CertChain) ([]grpc.ServerOption, error) {
opts := []grpc.ServerOption{}
if certChain == nil {
return opts, nil
}
cp := x509.NewCertPool()
cp.AppendCertsFromPEM(certChain.RootCA)
cert, err := tls.X509KeyPair(certChain.Cert, certChain.Key)
if err != nil {
return opts, nil
}
// nolint:gosec
config := &tls.Config{
ClientCAs: cp,
// Require cert verification
ClientAuth: tls.RequireAndVerifyClientCert,
Certificates: []tls.Certificate{cert},
}
opts = append(opts, grpc.Creds(credentials.NewTLS(config)))
return opts, nil
}
func GetClientOptions(certChain *CertChain, serverName string) ([]grpc.DialOption, error) {
opts := []grpc.DialOption{}
if certChain != nil {
cp := x509.NewCertPool()
ok := cp.AppendCertsFromPEM(certChain.RootCA)
if !ok {
return nil, errors.New("failed to append PEM root cert to x509 CertPool")
}
config, err := TLSConfigFromCertAndKey(certChain.Cert, certChain.Key, serverName, cp)
if err != nil {
return nil, errors.Wrap(err, "failed to create tls config from cert and key")
}
opts = append(opts, grpc.WithTransportCredentials(credentials.NewTLS(config)))
} else {
opts = append(opts, grpc.WithInsecure())
}
return opts, nil
}
TODO: 好吧,细节后面看,加密我不熟。
Dapr runtime package中的 options.go 文件的源码学习,用于定制 runtime 中包含的组件。
runtimeOpts封装了需要包含在 runtime 中的 component:
type (
// runtimeOpts encapsulates the components to include in the runtime.
runtimeOpts struct {
secretStores []secretstores.SecretStore
states []state.State
pubsubs []pubsub.PubSub
nameResolutions []nameresolution.NameResolution
inputBindings []bindings.InputBinding
outputBindings []bindings.OutputBinding
httpMiddleware []http.Middleware
}
)
Option 方法用于定制 runtime:
type (
// Option is a function that customizes the runtime.
Option func(o *runtimeOpts)
)
提供多个 WithXxx() 方法,用于定制 runtime 的组件:
// WithSecretStores adds secret store components to the runtime.
func WithSecretStores(secretStores ...secretstores.SecretStore) Option {
return func(o *runtimeOpts) {
o.secretStores = append(o.secretStores, secretStores...)
}
}
// WithStates adds state store components to the runtime.
func WithStates(states ...state.State) Option {
return func(o *runtimeOpts) {
o.states = append(o.states, states...)
}
}
// WithPubSubs adds pubsub store components to the runtime.
func WithPubSubs(pubsubs ...pubsub.PubSub) Option {
return func(o *runtimeOpts) {
o.pubsubs = append(o.pubsubs, pubsubs...)
}
}
// WithNameResolutions adds name resolution components to the runtime.
func WithNameResolutions(nameResolutions ...nameresolution.NameResolution) Option {
return func(o *runtimeOpts) {
o.nameResolutions = append(o.nameResolutions, nameResolutions...)
}
}
// WithInputBindings adds input binding components to the runtime.
func WithInputBindings(inputBindings ...bindings.InputBinding) Option {
return func(o *runtimeOpts) {
o.inputBindings = append(o.inputBindings, inputBindings...)
}
}
// WithOutputBindings adds output binding components to the runtime.
func WithOutputBindings(outputBindings ...bindings.OutputBinding) Option {
return func(o *runtimeOpts) {
o.outputBindings = append(o.outputBindings, outputBindings...)
}
}
// WithHTTPMiddleware adds HTTP middleware components to the runtime.
func WithHTTPMiddleware(httpMiddleware ...http.Middleware) Option {
return func(o *runtimeOpts) {
o.httpMiddleware = append(o.httpMiddleware, httpMiddleware...)
}
}
这些方法都只是简单的往 runtimeOpts 结构体的各个组件字段里面保存信息,用于后续 runtime 的初始化。
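这些 WithXxx() 就是 Go 里典型的 functional options 模式。下面用一个独立的小例子还原这个模式本身(示意代码,类型和函数名都是虚构的,并非 dapr 的真实 API;daprd 的 main 函数会用真实的组件工厂构造这些 Option 再传给 runtime):
package main

import "fmt"

// functional options 模式的极简示意:与 runtimeOpts / Option / WithXxx 的写法一一对应。
type opts struct {
	states  []string
	pubsubs []string
}

type Option func(*opts)

func WithStates(names ...string) Option {
	return func(o *opts) { o.states = append(o.states, names...) }
}

func WithPubSubs(names ...string) Option {
	return func(o *opts) { o.pubsubs = append(o.pubsubs, names...) }
}

// newRuntime 模拟 runtime 初始化:把所有 Option 依次应用到 opts 上。
func newRuntime(options ...Option) *opts {
	o := &opts{}
	for _, opt := range options {
		opt(o)
	}
	return o
}

func main() {
	rt := newRuntime(
		WithStates("state.redis", "state.mongodb"),
		WithPubSubs("pubsub.redis"),
	)
	fmt.Printf("%+v\n", rt)
}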
Dapr runtime package中的 config.go 文件的源码学习,定义 Protocol、各种默认端口常量、Config 结构体以及 NewRuntimeConfig() 构建方法。
protocol,目前只支持 http 和 grpc :
// Protocol is a communications protocol
type Protocol string
const (
// GRPCProtocol is a gRPC communication protocol
GRPCProtocol Protocol = "grpc"
// HTTPProtocol is a HTTP communication protocol
HTTPProtocol Protocol = "http"
)
各种端口的默认值:
const (
// DefaultDaprHTTPPort is the default http port for Dapr
DefaultDaprHTTPPort = 3500
// DefaultDaprAPIGRPCPort is the default API gRPC port for Dapr
DefaultDaprAPIGRPCPort = 50001
// DefaultProfilePort is the default port for profiling endpoints
DefaultProfilePort = 7777
// DefaultMetricsPort is the default port for metrics endpoints
DefaultMetricsPort = 9090
)
http默认配置,目前只有一个 MaxRequestBodySize :
const (
// DefaultMaxRequestBodySize is the default option for the maximum body size in MB for Dapr HTTP servers
DefaultMaxRequestBodySize = 4
)
// Config holds the Dapr Runtime configuration
type Config struct {
ID string
HTTPPort int
ProfilePort int
EnableProfiling bool
APIGRPCPort int
InternalGRPCPort int
ApplicationPort int
ApplicationProtocol Protocol
Mode modes.DaprMode
PlacementAddresses []string
GlobalConfig string
AllowedOrigins string
Standalone config.StandaloneConfig
Kubernetes config.KubernetesConfig
MaxConcurrency int
mtlsEnabled bool
SentryServiceAddress string
CertChain *credentials.CertChain
AppSSL bool
MaxRequestBodySize int
}
有点乱,所有的字段都是扁平的,以后越加越多。。。
简单赋值构建 Config 结构体,这个参数实在是太多了一点:
// NewRuntimeConfig returns a new runtime config
func NewRuntimeConfig(
id string, placementAddresses []string,
controlPlaneAddress, allowedOrigins, globalConfig, componentsPath, appProtocol, mode string,
httpPort, internalGRPCPort, apiGRPCPort, appPort, profilePort int,
enableProfiling bool, maxConcurrency int, mtlsEnabled bool, sentryAddress string, appSSL bool, maxRequestBodySize int) *Config {
return &Config{
ID: id,
HTTPPort: httpPort,
InternalGRPCPort: internalGRPCPort,
APIGRPCPort: apiGRPCPort,
ApplicationPort: appPort,
ProfilePort: profilePort,
ApplicationProtocol: Protocol(appProtocol),
Mode: modes.DaprMode(mode),
PlacementAddresses: placementAddresses,
GlobalConfig: globalConfig,
AllowedOrigins: allowedOrigins,
Standalone: config.StandaloneConfig{
ComponentsPath: componentsPath,
},
Kubernetes: config.KubernetesConfig{
ControlPlaneAddress: controlPlaneAddress,
},
EnableProfiling: enableProfiling,
MaxConcurrency: maxConcurrency,
mtlsEnabled: mtlsEnabled,
SentryServiceAddress: sentryAddress,
AppSSL: appSSL,
MaxRequestBodySize: maxRequestBodySize,
}
}
Dapr runtime package中的 cli.go 文件的源码学习,解析命令行标记并返回 DaprRuntime 实例。
cli.go 基本上就一个 FromFlags() 方法。
FromFlags() 方法解析命令行标记并返回 DaprRuntime 实例:
// FromFlags parses command flags and returns DaprRuntime instance
func FromFlags() (*DaprRuntime, error) {
......
return NewDaprRuntime(runtimeConfig, globalConfig, accessControlList), nil
}
代码如下:
mode := flag.String("mode", string(modes.StandaloneMode), "Runtime mode for Dapr")
daprHTTPPort := flag.String("dapr-http-port", fmt.Sprintf("%v", DefaultDaprHTTPPort), "HTTP port for Dapr API to listen on")
daprAPIGRPCPort := flag.String("dapr-grpc-port", fmt.Sprintf("%v", DefaultDaprAPIGRPCPort), "gRPC port for the Dapr API to listen on")
daprInternalGRPCPort := flag.String("dapr-internal-grpc-port", "", "gRPC port for the Dapr Internal API to listen on")
appPort := flag.String("app-port", "", "The port the application is listening on")
profilePort := flag.String("profile-port", fmt.Sprintf("%v", DefaultProfilePort), "The port for the profile server")
appProtocol := flag.String("app-protocol", string(HTTPProtocol), "Protocol for the application: grpc or http")
componentsPath := flag.String("components-path", "", "Path for components directory. If empty, components will not be loaded. Self-hosted mode only")
config := flag.String("config", "", "Path to config file, or name of a configuration object")
appID := flag.String("app-id", "", "A unique ID for Dapr. Used for Service Discovery and state")
controlPlaneAddress := flag.String("control-plane-address", "", "Address for a Dapr control plane")
sentryAddress := flag.String("sentry-address", "", "Address for the Sentry CA service")
placementServiceHostAddr := flag.String("placement-host-address", "", "Addresses for Dapr Actor Placement servers")
allowedOrigins := flag.String("allowed-origins", cors.DefaultAllowedOrigins, "Allowed HTTP origins")
enableProfiling := flag.Bool("enable-profiling", false, "Enable profiling")
runtimeVersion := flag.Bool("version", false, "Prints the runtime version")
appMaxConcurrency := flag.Int("app-max-concurrency", -1, "Controls the concurrency level when forwarding requests to user code")
enableMTLS := flag.Bool("enable-mtls", false, "Enables automatic mTLS for daprd to daprd communication channels")
appSSL := flag.Bool("app-ssl", false, "Sets the URI scheme of the app to https and attempts an SSL connection")
daprHTTPMaxRequestSize := flag.Int("dapr-http-max-request-size", -1, "Increasing max size of request body in MB to handle uploading of big files. By default 4 MB.")
TODO:应该有命令行参数的文档,对照文档学习一遍。
loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
// attaching only metrics-port option
metricsExporter.Options().AttachCmdFlag(flag.StringVar)
然后执行解析:
flag.Parse()
如果只是version命令,则打印版本信息之后就可以退出进程了:
runtimeVersion := flag.Bool("version", false, "Prints the runtime version")
if *runtimeVersion {
fmt.Println(version.Version())
os.Exit(0)
}
根据日志属性初始化logger:
loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
if *appID == "" {
return nil, errors.New("app-id parameter cannot be empty")
}
// Apply options to all loggers
loggerOptions.SetAppID(*appID)
if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
return nil, err
}
完成日志初始化之后就可以愉快地打印日志了:
log.Infof("starting Dapr Runtime -- version %s -- commit %s", version.Version(), version.Commit())
log.Infof("log level set to: %s", loggerOptions.OutputLevel)
初始化dapr metrics exporter:
// Initialize dapr metrics exporter
if err := metricsExporter.Init(); err != nil {
log.Fatal(err)
}
dapr-http-port / dapr-grpc-port / profile-port / dapr-internal-grpc-port / app-port :
daprHTTP, err := strconv.Atoi(*daprHTTPPort)
if err != nil {
return nil, errors.Wrap(err, "error parsing dapr-http-port flag")
}
daprAPIGRPC, err := strconv.Atoi(*daprAPIGRPCPort)
if err != nil {
return nil, errors.Wrap(err, "error parsing dapr-grpc-port flag")
}
profPort, err := strconv.Atoi(*profilePort)
if err != nil {
return nil, errors.Wrap(err, "error parsing profile-port flag")
}
var daprInternalGRPC int
if *daprInternalGRPCPort != "" {
daprInternalGRPC, err = strconv.Atoi(*daprInternalGRPCPort)
if err != nil {
return nil, errors.Wrap(err, "error parsing dapr-internal-grpc-port")
}
} else {
daprInternalGRPC, err = grpc.GetFreePort()
if err != nil {
return nil, errors.Wrap(err, "failed to get free port for internal grpc server")
}
}
var applicationPort int
if *appPort != "" {
applicationPort, err = strconv.Atoi(*appPort)
if err != nil {
return nil, errors.Wrap(err, "error parsing app-port")
}
}
继续解析 maxRequestBodySize / placementAddresses / concurrency / appProtocol 等配置:
var maxRequestBodySize int
if *daprHTTPMaxRequestSize != -1 {
maxRequestBodySize = *daprHTTPMaxRequestSize
} else {
maxRequestBodySize = DefaultMaxRequestBodySize
}
placementAddresses := []string{}
if *placementServiceHostAddr != "" {
placementAddresses = parsePlacementAddr(*placementServiceHostAddr)
}
var concurrency int
if *appMaxConcurrency != -1 {
concurrency = *appMaxConcurrency
}
appPrtcl := string(HTTPProtocol)
if *appProtocol != string(HTTPProtocol) {
appPrtcl = *appProtocol
}
runtimeConfig := NewRuntimeConfig(*appID, placementAddresses, *controlPlaneAddress, *allowedOrigins, *config, *componentsPath,
appPrtcl, *mode, daprHTTP, daprInternalGRPC, daprAPIGRPC, applicationPort, profPort, *enableProfiling, concurrency, *enableMTLS, *sentryAddress, *appSSL, maxRequestBodySize)
MTLS相关的配置:
if *enableMTLS {
runtimeConfig.CertChain, err = security.GetCertChain()
if err != nil {
return nil, err
}
}
var globalConfig *global_config.Configuration
根据 config 配置文件的配置,还有 dapr 模式的配置,读取相应的配置文件:
config := flag.String("config", "", "Path to config file, or name of a configuration object")
if *config != "" {
switch modes.DaprMode(*mode) {
case modes.KubernetesMode:
client, conn, clientErr := client.GetOperatorClient(*controlPlaneAddress, security.TLSServerName, runtimeConfig.CertChain)
if clientErr != nil {
return nil, clientErr
}
defer conn.Close()
namespace = os.Getenv("NAMESPACE")
globalConfig, configErr = global_config.LoadKubernetesConfiguration(*config, namespace, client)
case modes.StandaloneMode:
globalConfig, _, configErr = global_config.LoadStandaloneConfiguration(*config)
}
if configErr != nil {
log.Debugf("Config error: %v", configErr)
}
}
if configErr != nil {
log.Fatalf("error loading configuration: %s", configErr)
}
简单说:kubernetes 模式下读取CRD,standalone 模式下读取本地配置文件。
如果 config 没有配置,则使用默认的 global 配置:
if globalConfig == nil {
log.Info("loading default configuration")
globalConfig = global_config.LoadDefaultConfiguration()
}
var accessControlList *global_config.AccessControlList
accessControlList, err = global_config.ParseAccessControlSpec(globalConfig.Spec.AccessControlSpec, string(runtimeConfig.ApplicationProtocol))
if err != nil {
log.Fatalf(err.Error())
}
最后构造 DaprRuntime 实例:
return NewDaprRuntime(runtimeConfig, globalConfig, accessControlList), nil
Dapr channel package中的 channel.go 文件的源码学习,定义 AppChannel 接口和方法。
AppChannel 是和用户代码进行通讯的抽象。
常量定义 DefaultChannelAddress,考虑到 dapr 通常是以 sidecar 模式部署的,因此默认channel 地址是 127.0.0.1
const (
// DefaultChannelAddress is the address that user application listen to
DefaultChannelAddress = "127.0.0.1"
)
方法定义:
// AppChannel is an abstraction over communications with user code
type AppChannel interface {
GetBaseAddress() string
InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error)
}
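AppChannel 的抽象很薄,写一个用于测试的假实现就能看出它的边界(仅为示意,非 dapr 代码):
// 仅为示意:一个用于单元测试的 AppChannel 假实现
type fakeAppChannel struct{}

func (f *fakeAppChannel) GetBaseAddress() string {
	// 假设应用监听在本地 3000 端口
	return "http://127.0.0.1:3000"
}

func (f *fakeAppChannel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
	// 不真正调用应用,直接返回一个 OK 的空应答
	return invokev1.NewInvokeMethodResponse(int32(codes.OK), "", nil), nil
}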
Dapr channel package中的 grpc_channel.go 文件的源码学习,AppChannel 的 gRPC 实现。
Channel是一个具体的AppChannel实现,用于与基于gRPC的用户代码进行交互。
// Channel is a concrete AppChannel implementation for interacting with gRPC based user code
type Channel struct {
// grpc 客户端连接
client *grpc.ClientConn
// user code(应用)的地址
baseAddress string
// 限流用的 go chan
ch chan int
tracingSpec config.TracingSpec
appMetadataToken string
}
// CreateLocalChannel creates a gRPC connection with user code
func CreateLocalChannel(port, maxConcurrency int, conn *grpc.ClientConn, spec config.TracingSpec) *Channel {
c := &Channel{
client: conn,
// baseAddress 就是 "ip:port"
baseAddress: fmt.Sprintf("%s:%d", channel.DefaultChannelAddress, port),
tracingSpec: spec,
appMetadataToken: auth.GetAppToken(),
}
if maxConcurrency > 0 {
// 如果有并发控制要求,则创建用于并发控制的go channel
c.ch = make(chan int, maxConcurrency)
}
return c
}
// GetBaseAddress returns the application base address
func (g *Channel) GetBaseAddress() string {
return g.baseAddress
}
这个方法用来获取 app 的基础地址,可以在其基础上拼接其他子路径,如:
func (a *actorsRuntime) startAppHealthCheck(opts ...health.Option) {
healthAddress := fmt.Sprintf("%s/healthz", a.appChannel.GetBaseAddress())
ch := health.StartEndpointHealthCheck(healthAddress, opts...)
......
}
备注:只有 actor 这一个地方用到了这个方法
InvokeMethod 方法通过 gRPC 调用 user code:
// InvokeMethod invokes user code via gRPC
func (g *Channel) InvokeMethod(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
var rsp *invokev1.InvokeMethodResponse
var err error
switch req.APIVersion() {
case internalv1pb.APIVersion_V1:
// 目前只支持 v1 版本
rsp, err = g.invokeMethodV1(ctx, req)
default:
// Reject unsupported version
// 其他版本会被拒绝
rsp = nil
err = status.Error(codes.Unimplemented, fmt.Sprintf("Unsupported spec version: %d", req.APIVersion()))
}
return rsp, err
}
invokeMethodV1() 的实现
// invokeMethodV1 calls user applications using daprclient v1
func (g *Channel) invokeMethodV1(ctx context.Context, req *invokev1.InvokeMethodRequest) (*invokev1.InvokeMethodResponse, error) {
if g.ch != nil {
// 往 ch 里面发一个int,等价于当前并发数量 + 1
g.ch <- 1
}
// 创建一个 app callback 的 client
clientV1 := runtimev1pb.NewAppCallbackClient(g.client)
// 将内部 metadata 转为 grpc 的 metadata
grpcMetadata := invokev1.InternalMetadataToGrpcMetadata(ctx, req.Metadata(), true)
if g.appMetadataToken != "" {
grpcMetadata.Set(auth.APITokenHeader, g.appMetadataToken)
}
// Prepare gRPC Metadata
ctx = metadata.NewOutgoingContext(context.Background(), grpcMetadata)
var header, trailer metadata.MD
// 调用user code
resp, err := clientV1.OnInvoke(ctx, req.Message(), grpc.Header(&header), grpc.Trailer(&trailer))
if g.ch != nil {
// 从 ch 中读取一个int,等价于当前并发数量 - 1
// 但这个操作并没有额外保护,如果上面的代码发生 panic,岂不是这个计数器就出错了?
// 考虑把这个操作放在 defer 中进行会比较安全
<-g.ch
}
var rsp *invokev1.InvokeMethodResponse
if err != nil {
// Convert status code
respStatus := status.Convert(err)
// Prepare response
rsp = invokev1.NewInvokeMethodResponse(int32(respStatus.Code()), respStatus.Message(), respStatus.Proto().Details)
} else {
rsp = invokev1.NewInvokeMethodResponse(int32(codes.OK), "", nil)
}
rsp.WithHeaders(header).WithTrailers(trailer)
return rsp.WithMessage(resp), nil
}
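按照上面注释的思路,一个更稳妥的写法是把“释放令牌”放进 defer,这样即使调用过程中 panic 也不会把计数弄乱(仅为示意,并非 dapr 实际实现):
// 仅为示意:用带缓冲 channel 限流,acquire 返回的 release 交给 defer 调用
func (g *Channel) acquire() (release func()) {
	if g.ch == nil {
		return func() {}
	}
	g.ch <- 1
	return func() { <-g.ch }
}

// invokeMethodV1 中的用法示意:
//   release := g.acquire()
//   defer release()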
TODO: 整理使用 InvokeMethod 的地方。
下面是 workflow registry 的源码学习。Registry 结构体用来注册并返回已注册的 workflow 实现:
import (
wfs "github.com/dapr/components-contrib/workflows"
)
// Registry is an interface for a component that returns registered state store implementations.
type Registry struct {
Logger logger.Logger
workflowComponents map[string]func(logger.Logger) wfs.Workflow
}
这里的 Workflow 在 components-contrib 中定义。
package 中定义了一个默认的 Registry,是 public 的 singleton:
// DefaultRegistry is the singleton with the registry .
var DefaultRegistry *Registry = NewRegistry()
// NewRegistry is used to create workflow registry.
func NewRegistry() *Registry {
return &Registry{
workflowComponents: map[string]func(logger.Logger) wfs.Workflow{},
}
}
RegisterComponent() 方法在 Registry 结构体的 workflowComponents 字段中加入一条或多条记录:
func (s *Registry) RegisterComponent(componentFactory func(logger.Logger) wfs.Workflow, names ...string) {
for _, name := range names {
s.workflowComponents[createFullName(name)] = componentFactory
}
}
func createFullName(name string) string {
return strings.ToLower("workflow." + name)
}
key 是 "workflow." + name
转小写, value 是传入的 componentFactory,这是一个函数,只要传入一个 logger,就能返回 Workflow 实现。
Create() 方法根据指定的 name、version 来构建对应的 workflow 实现:
func (s *Registry) Create(name, version, logName string) (wfs.Workflow, error) {
if method, ok := s.getWorkflowComponent(name, version, logName); ok {
return method(), nil
}
return nil, fmt.Errorf("couldn't find wokflow %s/%s", name, version)
}
关键实现代码在 getWorkflowComponent() 方法中:
func (s *Registry) getWorkflowComponent(name, version, logName string) (func() wfs.Workflow, bool) {
nameLower := strings.ToLower(name)
versionLower := strings.ToLower(version)
// 用 nameLower+"/"+versionLower 拼接出 key
// 然后在 register 结构体的 workflowComponents 字段中查找
// TODO: 保存的时候 key 是 `"workflow." + name` 转小写,注意与这里的查找 key 核对
workflowFn, ok := s.workflowComponents[nameLower+"/"+versionLower]
if ok {
return s.wrapFn(workflowFn, logName), true
}
// 如果没有找到,看看是不是 InitialVersion
if components.IsInitialVersion(versionLower) {
// 如果是 InitialVersion,则不需要拼接 version 内容,直接通过 name 来查找
// TODO:这要求 name 必须是 "workflow." 开头?
workflowFn, ok = s.workflowComponents[nameLower]
if ok {
return s.wrapFn(workflowFn, logName), true
}
}
return nil, false
}
如果在 workflowComponents 字段中找到了注册的 workflow factory,则用这个 factory 生成 workflow 的实现:
func (s *Registry) wrapFn(componentFactory func(logger.Logger) wfs.Workflow, logName string) func() wfs.Workflow {
return func() wfs.Workflow {
// Registry 的 logger 会被用来做 workflow 实现的 logger
l := s.Logger
if logName != "" && l != nil {
// 在 logger 中增加 component 字段,值为 logName
l = l.WithFields(map[string]any{
"component": logName,
})
}
// 最后调用 factory 的方法来构建 workflow 实现
return componentFactory(l)
}
}
需要小心核对 key 的内容:注册时 key 是 createFullName(name),即 "workflow." + name 再转小写;而 Create() 传入的 name 应该是组件的完整类型(形如 "workflow.xxx"),先按 name+"/"+version 查找,查不到且 version 属于初始版本(如 v0/v1)时再退回用 name 直接查找。因此按 "temporal" 注册的组件,用 "workflow.temporal"/"v1" 查找时走的是回退分支。
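用一个假设的组件名走一遍 key 的匹配过程(仅为示意):
// 仅为示意:注册一个假设的 "temporal" workflow 组件
DefaultRegistry.RegisterComponent(func(l logger.Logger) wfs.Workflow {
	return nil // 实际应返回具体的 Workflow 实现,这里省略
}, "temporal")
// 此时 workflowComponents 中的 key 为 "workflow.temporal"

// 假设 Create 传入的 name 是完整类型 "workflow.temporal"、version 为 "v1":
// 1. 先查 "workflow.temporal/v1",未命中
// 2. "v1" 属于初始版本,回退查 "workflow.temporal",命中
wf, err := DefaultRegistry.Create("workflow.temporal", "v1", "wfengine")
_ = wf
_ = err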
Dapr health package中的 health.go 文件的源码分析,health checking的客户端实现
// Option is an a function that applies a health check option
type Option func(o *healthCheckOptions)
healthCheckOptions 结构体
type healthCheckOptions struct {
initialDelay time.Duration
requestTimeout time.Duration
failureThreshold int
interval time.Duration
successStatusCode int
}
WithXxx 方法用来设置上述5个健康检查的选项,每个方法都返回一个 Option 函数:
// WithInitialDelay sets the initial delay for the health check
func WithInitialDelay(delay time.Duration) Option {
return func(o *healthCheckOptions) {
o.initialDelay = delay
}
}
// WithFailureThreshold sets the failure threshold for the health check
func WithFailureThreshold(threshold int) Option {
return func(o *healthCheckOptions) {
o.failureThreshold = threshold
}
}
// WithRequestTimeout sets the request timeout for the health check
func WithRequestTimeout(timeout time.Duration) Option {
return func(o *healthCheckOptions) {
o.requestTimeout = timeout
}
}
// WithSuccessStatusCode sets the status code for the health check
func WithSuccessStatusCode(code int) Option {
return func(o *healthCheckOptions) {
o.successStatusCode = code
}
}
// WithInterval sets the interval for the health check
func WithInterval(interval time.Duration) Option {
return func(o *healthCheckOptions) {
o.interval = interval
}
}
StartEndpointHealthCheck 方法用给定的选项在指定的地址上启动健康检查。它返回一个通道,如果端点是健康的则发出true,如果满足失败条件则发出false。
// StartEndpointHealthCheck starts a health check on the specified address with the given options.
// It returns a channel that will emit true if the endpoint is healthy and false if the failure conditions
// Have been met.
func StartEndpointHealthCheck(endpointAddress string, opts ...Option) chan bool {
options := &healthCheckOptions{}
applyDefaults(options)
// 执行每个 Option 函数来设置健康检查的选项
for _, o := range opts {
o(options)
}
signalChan := make(chan bool, 1)
go func(ch chan<- bool, endpointAddress string, options *healthCheckOptions) {
// 设置健康检查的间隔时间 interval,默认5秒一次
ticker := time.NewTicker(options.interval)
failureCount := 0
// 先 sleep initialDelay 时间再开始健康检查
time.Sleep(options.initialDelay)
// 创建 http client,设置请求超时时间为 requestTimeout
client := &fasthttp.Client{
MaxConnsPerHost: 5, // Limit Keep-Alive connections
ReadTimeout: options.requestTimeout,
MaxIdemponentCallAttempts: 1,
}
req := fasthttp.AcquireRequest()
req.SetRequestURI(endpointAddress)
req.Header.SetMethod(fasthttp.MethodGet)
defer fasthttp.ReleaseRequest(req)
for range ticker.C {
resp := fasthttp.AcquireResponse()
err := client.DoTimeout(req, resp, options.requestTimeout)
// 通过检查应答的状态码来判断健康检查是否成功: successStatusCode
if err != nil || resp.StatusCode() != options.successStatusCode {
// 健康检查失败,错误计数器加一
failureCount++
// 如果连续错误次数达到阈值 failureThreshold,则视为健康检查失败,发送false到channel
if failureCount == options.failureThreshold {
ch <- false
}
} else {
// 健康检查成功,发送 true 到 channel
ch <- true
// 同时重置 failureCount
failureCount = 0
}
fasthttp.ReleaseResponse(resp)
}
}(signalChan, endpointAddress, options)
return signalChan
}
applyDefaults() 方法设置默认属性:
const (
initialDelay = time.Second * 1
failureThreshold = 2
requestTimeout = time.Second * 2
interval = time.Second * 5
successStatusCode = 200
)
func applyDefaults(o *healthCheckOptions) {
o.failureThreshold = failureThreshold
o.initialDelay = initialDelay
o.requestTimeout = requestTimeout
o.successStatusCode = successStatusCode
o.interval = interval
}
对某一个给定地址 endpointAddress 进行健康检查的步骤为:先等待 initialDelay,然后每隔 interval 发一次 GET 请求(超时为 requestTimeout);应答状态码等于 successStatusCode 则视为成功,往 channel 发送 true 并清零失败计数;否则失败计数加一,连续失败达到 failureThreshold 时往 channel 发送 false。
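一个典型的使用方式示意(地址与参数均为假设值,非 dapr 实际代码):
// 仅为示意:使用默认 interval(5 秒),调整失败阈值和初始延迟
ch := health.StartEndpointHealthCheck("http://127.0.0.1:3000/healthz",
	health.WithFailureThreshold(3),
	health.WithInitialDelay(3*time.Second),
)
go func() {
	for healthy := range ch {
		if !healthy {
			fmt.Println("app endpoint is unhealthy")
		}
	}
}()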
Dapr health package中的 server.go 文件的源码分析,healthz server的实现
healthz server 的接口定义:
// Server is the interface for the healthz server
type Server interface {
Run(context.Context, int) error
Ready()
NotReady()
}
server 结构体,ready 字段保存状态:
type server struct {
ready bool
log logger.Logger
}
创建 healthz server的方法:
// NewServer returns a new healthz server
func NewServer(log logger.Logger) Server {
return &server{
log: log,
}
}
设置 ready 状态的两个方法:
// Ready sets a ready state for the endpoint handlers
func (s *server) Ready() {
s.ready = true
}
// NotReady sets a not ready state for the endpoint handlers
func (s *server) NotReady() {
s.ready = false
}
Run 方法启动一个带有 healthz 端点的 http 服务器,端口通过参数 port 指定:
// Run starts a net/http server with a healthz endpoint
func (s *server) Run(ctx context.Context, port int) error {
router := http.NewServeMux()
router.Handle("/healthz", s.healthz())
srv := &http.Server{
Addr: fmt.Sprintf(":%d", port),
Handler: router,
}
...
}
Run 方法的后半段:先启动一个 goroutine 监听 ctx 取消,在 5 秒超时内优雅关闭服务器;然后调用 ListenAndServe 阻塞运行:
doneCh := make(chan struct{})
go func() {
select {
case <-ctx.Done():
s.log.Info("Healthz server is shutting down")
shutdownCtx, cancel := context.WithTimeout(
context.Background(),
time.Second*5,
)
defer cancel()
srv.Shutdown(shutdownCtx) // nolint: errcheck
case <-doneCh:
}
}()
s.log.Infof("Healthz server is listening on %s", srv.Addr)
err := srv.ListenAndServe()
if err != http.ErrServerClosed {
s.log.Errorf("Healthz server error: %s", err)
}
close(doneCh)
return err
}
healthz() 方法是 health endpoint 的 handler,根据当前 healthz server 的 ready 字段的状态值返回 HTTP 状态码:
// healthz is a health endpoint handler
func (s *server) healthz() http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
var status int
if s.ready {
// ready 返回 200
status = http.StatusOK
} else {
// 不 ready 则返回 503
status = http.StatusServiceUnavailable
}
w.WriteHeader(status)
})
}
healthz server 在 injector / placement / sentry / operator 中都有使用,这些进程都是在 main 方法中启动 healthz server。
injector 启动在 8080 端口:
const (
healthzPort = 8080
)
func main() {
......
go func() {
healthzServer := health.NewServer(log)
healthzServer.Ready()
healthzErr := healthzServer.Run(ctx, healthzPort)
if healthzErr != nil {
log.Fatalf("failed to start healthz server: %s", healthzErr)
}
}()
......
}
placement 默认启动在 8080 端口(也可以通过命令行参数修改端口):
const (
defaultHealthzPort = 8080
)
func main() {
flag.IntVar(&cfg.healthzPort, "healthz-port", cfg.healthzPort, "sets the HTTP port for the healthz server")
......
go startHealthzServer(cfg.healthzPort)
......
}
func startHealthzServer(healthzPort int) {
healthzServer := health.NewServer(log)
healthzServer.Ready()
if err := healthzServer.Run(context.Background(), healthzPort); err != nil {
log.Fatalf("failed to start healthz server: %s", err)
}
}
sentry 启动在 8080 端口:
const (
healthzPort = 8080
)
func main() {
......
go func() {
healthzServer := health.NewServer(log)
healthzServer.Ready()
err := healthzServer.Run(ctx, healthzPort)
if err != nil {
log.Fatalf("failed to start healthz server: %s", err)
}
}()
......
}
operator 启动在 8080 端口:
const (
healthzPort = 8080
)
func main() {
......
go func() {
healthzServer := health.NewServer(log)
healthzServer.Ready()
err := healthzServer.Run(ctx, healthzPort)
if err != nil {
log.Fatalf("failed to start healthz server: %s", err)
}
}()
......
}
特别指出:daprd 没有使用 healthz server,daprd 是直接在 dapr HTTP api 的基础上增加了 healthz 的功能。
具体代码在 http/api.go 中:
func NewAPI(......
api.endpoints = append(api.endpoints, api.constructHealthzEndpoints()...)
return api
}
func (a *api) constructHealthzEndpoints() []Endpoint {
return []Endpoint{
{
Methods: []string{fasthttp.MethodGet},
Route: "healthz",
Version: apiVersionV1,
Handler: a.onGetHealthz,
},
}
}
onGetHealthz() 方法处理请求:
func (a *api) onGetHealthz(reqCtx *fasthttp.RequestCtx) {
if !a.readyStatus {
msg := NewErrorResponse("ERR_HEALTH_NOT_READY", messages.ErrHealthNotReady)
respondWithError(reqCtx, fasthttp.StatusInternalServerError, msg)
log.Debug(msg)
} else {
respondEmpty(reqCtx)
}
}
func respondEmpty(ctx *fasthttp.RequestCtx) {
ctx.Response.SetBody(nil)
ctx.Response.SetStatusCode(fasthttp.StatusNoContent)
}
注意:这里成功时返回的状态码是 204 StatusNoContent,而不是通常的 200 OK。
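例如用 Go 检查 daprd 是否就绪(假设 daprd HTTP 端口为默认的 3500,仅为示意):
// 仅为示意:daprd 就绪时 /v1.0/healthz 返回 204
resp, err := http.Get("http://localhost:3500/v1.0/healthz")
if err == nil {
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNoContent {
		// daprd is ready
	}
}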
Dapr metrics package中的 exporter.go文件的源码分析,包括结构体定义、方法实现。当前只支持 Prometheus。
Exporter 接口定义:
// Exporter is the interface for metrics exporters
type Exporter interface {
// Init initializes metrics exporter
Init() error
// Options returns Exporter options
Options() *Options
}
exporter 结构体定义:
// exporter is the base struct
type exporter struct {
namespace string
options *Options
logger logger.Logger
}
// NewExporter creates new MetricsExporter instance
func NewExporter(namespace string) Exporter {
// TODO: support multiple exporters
return &promMetricsExporter{
&exporter{
namespace: namespace,
options: defaultMetricOptions(),
logger: logger.NewLogger("dapr.metrics"),
},
nil,
}
}
当前只支持 promMetrics 的 Exporter。
Options() 方法简单返回 m.options:
// Options returns current metric exporter options
func (m *exporter) Options() *Options {
return m.options
}
具体的赋值在 defaultMetricOptions() 中,见后面 options.go 的分析。
// promMetricsExporter is prometheus metric exporter
type promMetricsExporter struct {
*exporter
ocExporter *ocprom.Exporter
}
内嵌 exporter (相当于继承),还有一个 ocprom.Exporter 字段。
初始化 opencensus 的 exporter:
// Init initializes opencensus exporter
func (m *promMetricsExporter) Init() error {
if !m.exporter.Options().MetricsEnabled {
return nil
}
// Add default health metrics for process
// 添加默认的 health metrics: 进程信息,和 go 信息
registry := prom.NewRegistry()
registry.MustRegister(prom.NewProcessCollector(prom.ProcessCollectorOpts{}))
registry.MustRegister(prom.NewGoCollector())
var err error
m.ocExporter, err = ocprom.NewExporter(ocprom.Options{
Namespace: m.namespace,
Registry: registry,
})
if err != nil {
return errors.Errorf("failed to create Prometheus exporter: %v", err)
}
// register exporter to view
view.RegisterExporter(m.ocExporter)
// start metrics server
return m.startMetricServer()
}
启动 MetricServer, 监听端口来自 options 的 MetricsPort,监听路径为 defaultMetricsPath:
const (
defaultMetricsPath = "/"
)
// startMetricServer starts metrics server
func (m *promMetricsExporter) startMetricServer() error {
if !m.exporter.Options().MetricsEnabled {
// skip if metrics is not enabled
return nil
}
addr := fmt.Sprintf(":%d", m.options.MetricsPort())
if m.ocExporter == nil {
return errors.New("exporter was not initialized")
}
m.exporter.logger.Infof("metrics server started on %s%s", addr, defaultMetricsPath)
go func() {
mux := http.NewServeMux()
mux.Handle(defaultMetricsPath, m.ocExporter)
if err := http.ListenAndServe(addr, mux); err != nil {
m.exporter.logger.Fatalf("failed to start metrics server: %v", err)
}
}()
return nil
}
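按上面的默认配置,指标暴露在 9090 端口的根路径上,可以简单验证一下(仅为示意):
// 仅为示意:拉取 Prometheus 文本格式的指标
resp, err := http.Get("http://localhost:9090/")
if err == nil {
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // 输出 dapr 前缀的各项指标
}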
Dapr metrics package中的 options.go文件的源码学习
// Options defines the sets of options for Dapr logging
type Options struct {
// OutputLevel is the level of logging
MetricsEnabled bool
metricsPort string
}
metrics 默认端口 9090, 默认启用 metrics:
const (
defaultMetricsPort = "9090"
defaultMetricsEnabled = true
)
func defaultMetricOptions() *Options {
return &Options{
metricsPort: defaultMetricsPort,
MetricsEnabled: defaultMetricsEnabled,
}
}
MetricsPort() 方法用于获取 metrics 端口,如果配置错误,则使用默认端口 9090:
// MetricsPort gets metrics port.
func (o *Options) MetricsPort() uint64 {
port, err := strconv.ParseUint(o.metricsPort, 10, 64)
if err != nil {
// Use default metrics port as a fallback
port, _ = strconv.ParseUint(defaultMetricsPort, 10, 64)
}
return port
}
AttachCmdFlags() 方法解析 metrics-port 和 enable-metrics 两个命令行标记:
// AttachCmdFlags attaches metrics options to command flags
func (o *Options) AttachCmdFlags(
stringVar func(p *string, name string, value string, usage string),
boolVar func(p *bool, name string, value bool, usage string)) {
stringVar(
&o.metricsPort,
"metrics-port",
defaultMetricsPort,
"The port for the metrics server")
boolVar(
&o.MetricsEnabled,
"enable-metrics",
defaultMetricsEnabled,
"Enable prometheus metric")
}
AttachCmdFlag() 方法只解析 metrics-port 命令行标记(不解析 enable-metrics ) :
// AttachCmdFlag attaches single metrics option to command flags
func (o *Options) AttachCmdFlag(
stringVar func(p *string, name string, value string, usage string)) {
stringVar(
&o.metricsPort,
"metrics-port",
defaultMetricsPort,
"The port for the metrics server")
}
只解析 metrics-port 命令行标记 的 AttachCmdFlag() 方法在 dapr runtime 启动时被调用(也只被这一个地方调用):
metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
// attaching only metrics-port option
metricsExporter.Options().AttachCmdFlag(flag.StringVar)
而解析 metrics-port 和 enable-metrics 两个命令行标记的 AttachCmdFlags() 方法被 injector / operator / placement / sentry 调用:
func init() {
metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)
}
dapr/proto/runtime/v1/dapr.proto
service Dapr {
// Starts a new instance of a workflow
rpc StartWorkflowAlpha1 (StartWorkflowRequest) returns (StartWorkflowResponse) {}
// Gets details about a started workflow instance
rpc GetWorkflowAlpha1 (GetWorkflowRequest) returns (GetWorkflowResponse) {}
// Purge Workflow
rpc PurgeWorkflowAlpha1 (PurgeWorkflowRequest) returns (google.protobuf.Empty) {}
// Terminates a running workflow instance
rpc TerminateWorkflowAlpha1 (TerminateWorkflowRequest) returns (google.protobuf.Empty) {}
// Pauses a running workflow instance
rpc PauseWorkflowAlpha1 (PauseWorkflowRequest) returns (google.protobuf.Empty) {}
// Resumes a paused workflow instance
rpc ResumeWorkflowAlpha1 (ResumeWorkflowRequest) returns (google.protobuf.Empty) {}
// Raise an event to a running workflow instance
rpc RaiseEventWorkflowAlpha1 (RaiseEventWorkflowRequest) returns (google.protobuf.Empty) {}
}
workflow 没有 sidecar 往应用方向发请求的场景,也就是没有 appcallback 。
pkg/proto/runtime/v1
下存放的是根据 proto 生成的 go 代码
比如 pkg/proto/runtime/v1/dapr_grpc.pb.go
pkg/http/api.go
const (
workflowComponent = "workflowComponent"
workflowName = "workflowName"
)
func NewAPI(opts APIOpts) API {
api := &api{
......
api.endpoints = append(api.endpoints, api.constructWorkflowEndpoints()...)
return api
}
constructWorkflowEndpoints() 方法的实现在 pkg/http/api_workflow.go
中:
func (a *api) constructWorkflowEndpoints() []Endpoint {
return []Endpoint{
{
Methods: []string{http.MethodGet},
Route: "workflows/{workflowComponent}/{instanceID}",
Version: apiVersionV1alpha1,
Handler: a.onGetWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{instanceID}/raiseEvent/{eventName}",
Version: apiVersionV1alpha1,
Handler: a.onRaiseEventWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{workflowName}/start",
Version: apiVersionV1alpha1,
Handler: a.onStartWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{instanceID}/pause",
Version: apiVersionV1alpha1,
Handler: a.onPauseWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{instanceID}/resume",
Version: apiVersionV1alpha1,
Handler: a.onResumeWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{instanceID}/terminate",
Version: apiVersionV1alpha1,
Handler: a.onTerminateWorkflowHandler(),
},
{
Methods: []string{http.MethodPost},
Route: "workflows/{workflowComponent}/{instanceID}/purge",
Version: apiVersionV1alpha1,
Handler: a.onPurgeWorkflowHandler(),
},
}
}
pkg/http/api_workflow.go
// Route: "workflows/{workflowComponent}/{workflowName}/start?instanceID={instanceID}",
// Workflow Component: Component specified in yaml
// Workflow Name: Name of the workflow to run
// Instance ID: Identifier of the specific run
func (a *api) onStartWorkflowHandler() http.HandlerFunc {
return UniversalHTTPHandler(
a.universal.StartWorkflowAlpha1,
// UniversalHTTPHandlerOpts 是范型结构体
UniversalHTTPHandlerOpts[*runtimev1pb.StartWorkflowRequest, *runtimev1pb.StartWorkflowResponse]{
// We pass the input body manually rather than parsing it using protojson
SkipInputBody: true,
InModifier: func(r *http.Request, in *runtimev1pb.StartWorkflowRequest) (*runtimev1pb.StartWorkflowRequest, error) {
in.WorkflowName = chi.URLParam(r, workflowName)
in.WorkflowComponent = chi.URLParam(r, workflowComponent)
// instance id 是可选的,如果没有指定则生成一个随机的
// The instance ID is optional. If not specified, we generate a random one.
in.InstanceId = r.URL.Query().Get(instanceID)
if in.InstanceId == "" {
randomID, err := uuid.NewRandom()
if err != nil {
return nil, err
}
in.InstanceId = randomID.String()
}
// HTTP request body 直接用来做 workflow 的 Input
// We accept the HTTP request body as the input to the workflow
// without making any assumptions about its format.
var err error
in.Input, err = io.ReadAll(r.Body)
if err != nil {
return nil, messages.ErrBodyRead.WithFormat(err)
}
return in, nil
},
SuccessStatusCode: http.StatusAccepted,
})
}
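对应的 HTTP 调用示意(假设 workflow 组件名为 "dapr"、workflow 名为 OrderWorkflow、daprd HTTP 端口为默认的 3500,均为假设值):
// 仅为示意:启动一个 workflow 实例,HTTP body 原样作为 workflow 的 Input
url := "http://localhost:3500/v1.0-alpha1/workflows/dapr/OrderWorkflow/start?instanceID=order-1"
resp, err := http.Post(url, "application/json", strings.NewReader(`{"itemsOrdered": 1}`))
if err == nil {
	defer resp.Body.Close()
	// 成功时返回 202 Accepted(见上面的 SuccessStatusCode)
}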
// Route: POST "workflows/{workflowComponent}/{instanceID}"
func (a *api) onGetWorkflowHandler() http.HandlerFunc {
return UniversalHTTPHandler(
a.universal.GetWorkflowAlpha1,
UniversalHTTPHandlerOpts[*runtimev1pb.GetWorkflowRequest, *runtimev1pb.GetWorkflowResponse]{
InModifier: workflowInModifier[*runtimev1pb.GetWorkflowRequest],
})
}
workflowInModifier() 方法是通用方法,读取 WorkflowComponent 和 InstanceId 两个参数:
// Shared InModifier method for all universal handlers for workflows that adds the "WorkflowComponent" and "InstanceId" properties
func workflowInModifier[T runtimev1pb.WorkflowRequests](r *http.Request, in T) (T, error) {
in.SetWorkflowComponent(chi.URLParam(r, workflowComponent))
in.SetInstanceId(chi.URLParam(r, instanceID))
return in, nil
}
状态管理的源码
stateStoreRegistry Registry 的初始化在 runtime 初始化时进行:
func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration) *DaprRuntime {
......
stateStoreRegistry: state_loader.NewRegistry(),
}
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
a.stateStoreRegistry.Register(opts.states...)
......
}
这些 opts 来自 runtime 启动时的配置,如 cmd/daprd/main.go 下:
func main() {
rt, err := runtime.FromFlags()
if err != nil {
log.Fatal(err)
}
err = rt.Run(
......
runtime.WithStates(
state_loader.New("redis", func() state.Store {
return state_redis.NewRedisStateStore(logContrib)
}),
state_loader.New("consul", func() state.Store {
return consul.NewConsulStateStore(logContrib)
}),
state_loader.New("azure.blobstorage", func() state.Store {
return state_azure_blobstorage.NewAzureBlobStorageStore(logContrib)
}),
state_loader.New("azure.cosmosdb", func() state.Store {
return state_cosmosdb.NewCosmosDBStateStore(logContrib)
}),
state_loader.New("azure.tablestorage", func() state.Store {
return state_azure_tablestorage.NewAzureTablesStateStore(logContrib)
}),
//state_loader.New("etcd", func() state.Store {
// return etcd.NewETCD(logContrib)
//}),
state_loader.New("cassandra", func() state.Store {
return cassandra.NewCassandraStateStore(logContrib)
}),
state_loader.New("memcached", func() state.Store {
return memcached.NewMemCacheStateStore(logContrib)
}),
state_loader.New("mongodb", func() state.Store {
return mongodb.NewMongoDB(logContrib)
}),
state_loader.New("zookeeper", func() state.Store {
return zookeeper.NewZookeeperStateStore(logContrib)
}),
state_loader.New("gcp.firestore", func() state.Store {
return firestore.NewFirestoreStateStore(logContrib)
}),
state_loader.New("postgresql", func() state.Store {
return postgresql.NewPostgreSQLStateStore(logContrib)
}),
state_loader.New("sqlserver", func() state.Store {
return sqlserver.NewSQLServerStateStore(logContrib)
}),
state_loader.New("hazelcast", func() state.Store {
return hazelcast.NewHazelcastStore(logContrib)
}),
state_loader.New("cloudstate.crdt", func() state.Store {
return cloudstate.NewCRDT(logContrib)
}),
state_loader.New("couchbase", func() state.Store {
return couchbase.NewCouchbaseStateStore(logContrib)
}),
state_loader.New("aerospike", func() state.Store {
return aerospike.NewAerospikeStateStore(logContrib)
}),
),
......
}
在这里配置各种 state store 的实现。
pkg/components/state/registry.go,定义了registry的接口和数据结构:
// Registry is an interface for a component that returns registered state store implementations
type Registry interface {
Register(components ...State)
CreateStateStore(name string) (state.Store, error)
}
type stateStoreRegistry struct {
stateStores map[string]func() state.Store
}
state.Store 是 dapr 定义的标准 state store的接口,所有的实现都要遵循这个接口。定义在 github.com/dapr/components-contrib/state/store.go
文件中:
// Store is an interface to perform operations on store
type Store interface {
Init(metadata Metadata) error
Delete(req *DeleteRequest) error
BulkDelete(req []DeleteRequest) error
Get(req *GetRequest) (*GetResponse, error)
Set(req *SetRequest) error
BulkSet(req []SetRequest) error
}
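为了直观说明组件作者需要实现什么,下面是一个最小的内存版 Store 骨架(仅为示意,省略并发保护与 ETag,类型以 components-contrib 的定义为准):
// 仅为示意:内存版 state.Store 骨架
type memStore struct {
	data map[string][]byte
}

func (m *memStore) Init(metadata state.Metadata) error {
	m.data = map[string][]byte{}
	return nil
}

func (m *memStore) Get(req *state.GetRequest) (*state.GetResponse, error) {
	return &state.GetResponse{Data: m.data[req.Key]}, nil
}

func (m *memStore) Set(req *state.SetRequest) error {
	b, err := json.Marshal(req.Value)
	if err != nil {
		return err
	}
	m.data[req.Key] = b
	return nil
}

func (m *memStore) Delete(req *state.DeleteRequest) error {
	delete(m.data, req.Key)
	return nil
}

func (m *memStore) BulkSet(req []state.SetRequest) error {
	for i := range req {
		if err := m.Set(&req[i]); err != nil {
			return err
		}
	}
	return nil
}

func (m *memStore) BulkDelete(req []state.DeleteRequest) error {
	for i := range req {
		if err := m.Delete(&req[i]); err != nil {
			return err
		}
	}
	return nil
}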
前面 runtime 初始化时,每个实现都通过 New 方法将 name 和对应的 state store 关联起来:
type State struct {
Name string
FactoryMethod func() state.Store
}
func New(name string, factoryMethod func() state.Store) State {
return State{
Name: name,
FactoryMethod: factoryMethod,
}
}
pkg/runtime/runtime.go :
State 的初始化在 runtime 初始化时进行:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
go a.processComponents()
......
}
func (a *DaprRuntime) processComponents() {
for {
comp, more := <-a.pendingComponents
if !more {
a.pendingComponentsDone <- true
return
}
if err := a.processOneComponent(comp); err != nil {
log.Errorf("process component %s error, %s", comp.Name, err)
}
}
}
processOneComponent:
func (a *DaprRuntime) processOneComponent(comp components_v1alpha1.Component) error {
res := a.preprocessOneComponent(&comp)
compCategory := a.figureOutComponentCategory(comp)
......
return nil
}
doProcessOneComponent:
func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
switch category {
case stateComponent:
return a.initState(comp)
}
......
return nil
}
initState方法的实现:
// Refer for state store api decision https://github.com/dapr/dapr/blob/master/docs/decision_records/api/API-008-multi-state-store-api-design.md
func (a *DaprRuntime) initState(s components_v1alpha1.Component) error {
// 构建 state store(这里才开始集成components的代码)
store, err := a.stateStoreRegistry.CreateStateStore(s.Spec.Type)
if err != nil {
log.Warnf("error creating state store %s: %s", s.Spec.Type, err)
diag.DefaultMonitoring.ComponentInitFailed(s.Spec.Type, "creation")
return err
}
if store != nil {
props := a.convertMetadataItemsToProperties(s.Spec.Metadata)
// components的store实现在这里做初始化,如建连
err := store.Init(state.Metadata{
Properties: props,
})
if err != nil {
diag.DefaultMonitoring.ComponentInitFailed(s.Spec.Type, "init")
log.Warnf("error initializing state store %s: %s", s.Spec.Type, err)
return err
}
// 将初始化完成的store实现存放在runtime中
a.stateStores[s.ObjectMeta.Name] = store
// set specified actor store if "actorStateStore" is true in the spec.
actorStoreSpecified := props[actorStateStore]
if actorStoreSpecified == "true" {
if a.actorStateStoreCount++; a.actorStateStoreCount == 1 {
a.actorStateStoreName = s.ObjectMeta.Name
}
}
diag.DefaultMonitoring.ComponentInitialized(s.Spec.Type)
}
if a.actorStateStoreName == "" || a.actorStateStoreCount != 1 {
log.Warnf("either no actor state store or multiple actor state stores are specified in the configuration, actor stores specified: %d", a.actorStateStoreCount)
}
return nil
}
其中 CreateStateStore 方法的实现在 pkg/components/state/registry.go
中:
func (s *stateStoreRegistry) CreateStateStore(name string) (state.Store, error) {
if method, ok := s.stateStores[name]; ok {
return method(), nil
}
return nil, errors.Errorf("couldn't find state store %s", name)
}
runtime 处理 state 请求的代码在 pkg/grpc/api.go
中。
func (a *api) GetState(ctx context.Context, in *runtimev1pb.GetStateRequest) (*runtimev1pb.GetStateResponse, error) {
// 找 store name 对应的 state store
// 所以请求里面的 store name,必须对应 yaml 文件里面的 name
store, err := a.getStateStore(in.StoreName)
if err != nil {
apiServerLogger.Debug(err)
return &runtimev1pb.GetStateResponse{}, err
}
req := state.GetRequest{
Key: a.getModifiedStateKey(in.Key),
Metadata: in.Metadata,
Options: state.GetStateOption{
Consistency: stateConsistencyToString(in.Consistency),
},
}
// 执行查询
// 里面实际上会先执行 HGETALL 命令,失败后再执行 GET 命令
getResponse, err := store.Get(&req)
if err != nil {
err = fmt.Errorf("ERR_STATE_GET: %s", err)
apiServerLogger.Debug(err)
return &runtimev1pb.GetStateResponse{}, err
}
response := &runtimev1pb.GetStateResponse{}
if getResponse != nil {
response.Etag = getResponse.ETag
response.Data = getResponse.Data
}
return response, nil
}
get bulk 方法的实现是由 runtime 封装 get 方法而成,底层 state store 只需要实现单个查询的 get 即可。
func (a *api) GetBulkState(ctx context.Context, in *runtimev1pb.GetBulkStateRequest) (*runtimev1pb.GetBulkStateResponse, error) {
store, err := a.getStateStore(in.StoreName)
if err != nil {
apiServerLogger.Debug(err)
return &runtimev1pb.GetBulkStateResponse{}, err
}
resp := &runtimev1pb.GetBulkStateResponse{}
// 如果 Parallelism <= 0,则取默认值100
limiter := concurrency.NewLimiter(int(in.Parallelism))
for _, k := range in.Keys {
fn := func(param interface{}) {
req := state.GetRequest{
Key: a.getModifiedStateKey(param.(string)),
Metadata: in.Metadata,
}
r, err := store.Get(&req)
item := &runtimev1pb.BulkStateItem{
Key: param.(string),
}
if err != nil {
item.Error = err.Error()
} else if r != nil {
item.Data = r.Data
item.Etag = r.ETag
}
resp.Items = append(resp.Items, item)
}
limiter.Execute(fn, k)
}
limiter.Wait()
return resp, nil
}
func (a *api) SaveState(ctx context.Context, in *runtimev1pb.SaveStateRequest) (*empty.Empty, error) {
store, err := a.getStateStore(in.StoreName)
if err != nil {
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
reqs := []state.SetRequest{}
for _, s := range in.States {
req := state.SetRequest{
Key: a.getModifiedStateKey(s.Key),
Metadata: s.Metadata,
Value: s.Value,
ETag: s.Etag,
}
if s.Options != nil {
req.Options = state.SetStateOption{
Consistency: stateConsistencyToString(s.Options.Consistency),
Concurrency: stateConcurrencyToString(s.Options.Concurrency),
}
}
reqs = append(reqs, req)
}
// 调用 store 的 BulkSet 方法
// 事实上store的Set方法根本没有被 runtime 调用???
err = store.BulkSet(reqs)
if err != nil {
err = fmt.Errorf("ERR_STATE_SAVE: %s", err)
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
return &empty.Empty{}, nil
}
func (a *api) DeleteState(ctx context.Context, in *runtimev1pb.DeleteStateRequest) (*empty.Empty, error) {
store, err := a.getStateStore(in.StoreName)
if err != nil {
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
req := state.DeleteRequest{
Key: a.getModifiedStateKey(in.Key),
Metadata: in.Metadata,
ETag: in.Etag,
}
if in.Options != nil {
req.Options = state.DeleteStateOption{
Concurrency: stateConcurrencyToString(in.Options.Concurrency),
Consistency: stateConsistencyToString(in.Options.Consistency),
}
}
// 调用 store 的delete方法
// store 的 BulkDelete 方法没有调用
// runtime 也没有对外暴露 BulkDelete 方法
err = store.Delete(&req)
if err != nil {
err = fmt.Errorf("ERR_STATE_DELETE: failed deleting state with key %s: %s", in.Key, err)
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
return &empty.Empty{}, nil
}
如果要支持事务,则要求实现 TransactionalStore 接口:
type TransactionalStore interface {
// Init方法是和普通store接口一致的
Init(metadata Metadata) error
// 增加的是 Multi 方法
Multi(request *TransactionalStateRequest) error
}
runtime 的 ExecuteStateTransaction 方法的实现:
func (a *api) ExecuteStateTransaction(ctx context.Context, in *runtimev1pb.ExecuteStateTransactionRequest) (*empty.Empty, error) {
if a.stateStores == nil || len(a.stateStores) == 0 {
err := errors.New("ERR_STATE_STORE_NOT_CONFIGURED")
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
storeName := in.StoreName
if a.stateStores[storeName] == nil {
err := errors.New("ERR_STATE_STORE_NOT_FOUND")
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
// 检测是否是 TransactionalStore
transactionalStore, ok := a.stateStores[storeName].(state.TransactionalStore)
if !ok {
err := errors.New("ERR_STATE_STORE_NOT_SUPPORTED")
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
// 构造请求
operations := []state.TransactionalStateOperation{}
for _, inputReq := range in.Operations {
var operation state.TransactionalStateOperation
var req = inputReq.Request
switch state.OperationType(inputReq.OperationType) {
case state.Upsert:
setReq := state.SetRequest{
Key: a.getModifiedStateKey(req.Key),
// Limitation:
// components that cannot handle byte array need to deserialize/serialize in
// component sepcific way in components-contrib repo.
Value: req.Value,
Metadata: req.Metadata,
ETag: req.Etag,
}
if req.Options != nil {
setReq.Options = state.SetStateOption{
Concurrency: stateConcurrencyToString(req.Options.Concurrency),
Consistency: stateConsistencyToString(req.Options.Consistency),
}
}
operation = state.TransactionalStateOperation{
Operation: state.Upsert,
Request: setReq,
}
case state.Delete:
delReq := state.DeleteRequest{
Key: a.getModifiedStateKey(req.Key),
Metadata: req.Metadata,
ETag: req.Etag,
}
if req.Options != nil {
delReq.Options = state.DeleteStateOption{
Concurrency: stateConcurrencyToString(req.Options.Concurrency),
Consistency: stateConsistencyToString(req.Options.Consistency),
}
}
operation = state.TransactionalStateOperation{
Operation: state.Delete,
Request: delReq,
}
default:
err := fmt.Errorf("ERR_OPERATION_NOT_SUPPORTED: operation type %s not supported", inputReq.OperationType)
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
operations = append(operations, operation)
}
// 调用 state store 的 Multi 方法执行有事务性的多个操作
err := transactionalStore.Multi(&state.TransactionalStateRequest{
Operations: operations,
Metadata: in.Metadata,
})
if err != nil {
err = fmt.Errorf("ERR_STATE_TRANSACTION: %s", err)
apiServerLogger.Debug(err)
return &empty.Empty{}, err
}
return &empty.Empty{}, nil
}
Redis的实现在 dapr/components-contrib 下,/state/redis/redis.go 中:
// StateStore is a Redis state store
type StateStore struct {
client *redis.Client
json jsoniter.API
metadata metadata
replicas int
logger logger.Logger
}
// NewRedisStateStore returns a new redis state store
func NewRedisStateStore(logger logger.Logger) *StateStore {
return &StateStore{
json: jsoniter.ConfigFastest,
logger: logger,
}
}
在 dapr runtime 初始化时,关联 redis 的 state 实现:
state_loader.New("redis", func() state.Store {
return state_redis.NewRedisStateStore(logContrib)
}),
然后 Init 方法会在 state 初始化时被 dapr runtime 调用,Redis的实现内容为:
// Init does metadata and connection parsing
func (r *StateStore) Init(metadata state.Metadata) error {
m, err := parseRedisMetadata(metadata)
if err != nil {
return err
}
r.metadata = m
if r.metadata.failover {
r.client = r.newFailoverClient(m)
} else {
r.client = r.newClient(m)
}
if _, err = r.client.Ping().Result(); err != nil {
return fmt.Errorf("redis store: error connecting to redis at %s: %s", m.host, err)
}
r.replicas, err = r.getConnectedSlaves()
return err
}
get的实现方式:
// Get retrieves state from redis with a key
func (r *StateStore) Get(req *state.GetRequest) (*state.GetResponse, error) {
res, err := r.client.DoContext(context.Background(), "HGETALL", req.Key).Result() // Prefer values with ETags
if err != nil {
return r.directGet(req) //Falls back to original get
}
if res == nil {
// 结果为空的处理1
return &state.GetResponse{}, nil
}
vals := res.([]interface{})
if len(vals) == 0 {
// 结果为空的处理2
// 所以如果没有找到对应key的值,是给空应答,而不是报错
return &state.GetResponse{}, nil
}
data, version, err := r.getKeyVersion(vals)
if err != nil {
return nil, err
}
return &state.GetResponse{
Data: []byte(data),
ETag: version,
}, nil
}
要支持 ETag,就不能简单用 redis 的 key/value 方式直接在 value 中存放 state 的数据(data 字段,byte[] 格式),这个“value”还需要包含 data 之外的 ETag 相关字段,比如 version。
redis state 实现的设计方式是:每个存储在 redis 中的 state item,其 value 是一个 hashmap,在这个 hashmap 中通过不同的 key 存放多个信息:data 字段存放实际数据,version 字段存放版本号(作为 ETag)。
所以前面要用 HGETALL 命令把这个 hashmap 的所有 key/value 都取出来,然后通过 getKeyVersion 方法从中读取 data 和 version:
func (r *StateStore) getKeyVersion(vals []interface{}) (data string, version string, err error) {
seenData := false
seenVersion := false
for i := 0; i < len(vals); i += 2 {
field, _ := strconv.Unquote(fmt.Sprintf("%q", vals[i]))
switch field {
case "data":
data, _ = strconv.Unquote(fmt.Sprintf("%q", vals[i+1]))
seenData = true
case "version":
version, _ = strconv.Unquote(fmt.Sprintf("%q", vals[i+1]))
seenVersion = true
}
}
if !seenData || !seenVersion {
return "", "", errors.New("required hash field 'data' or 'version' was not found")
}
return data, version, nil
}
返回的时候,带上ETag:
return &state.GetResponse{
Data: []byte(data),
ETag: version,
}, nil
如果 HGETALL 命令执行失败,则fall back到普通场景:redis中只简单保存数据,没有etag。此时保存方式就是简单的key/value,用简单的 GET 命令直接读取:
func (r *StateStore) directGet(req *state.GetRequest) (*state.GetResponse, error) {
res, err := r.client.DoContext(context.Background(), "GET", req.Key).Result()
if err != nil {
return nil, err
}
if res == nil {
return &state.GetResponse{}, nil
}
s, _ := strconv.Unquote(fmt.Sprintf("%q", res))
return &state.GetResponse{
Data: []byte(s),
}, nil
}
备注:这个设计有个性能问题:如果 redis 中的数据是用简单 key/value 存储、没有 etag,则每次读取都要进行两次操作:第一次 HGETALL 命令失败,然后 fall back 用 GET 命令再读第二次。
redis 的实现有 Set 和 BulkSet 两个方法:
// Set saves state into redis
func (r *StateStore) Set(req *state.SetRequest) error {
return state.SetWithOptions(r.setValue, req)
}
// BulkSet performs a bulks save operation
func (r *StateStore) BulkSet(req []state.SetRequest) error {
for i := range req {
err := r.Set(&req[i])
if err != nil {
// 这个地方有异议
// 按照代码逻辑,只要有一个save操作失败,就直接return而放弃后续的操作
return err
}
}
return nil
}
实际实现在 r.setValue 方法中:
func (r *StateStore) setValue(req *state.SetRequest) error {
err := state.CheckRequestOptions(req.Options)
if err != nil {
return err
}
// 解析etag,要求etag必须是可以转为整型
ver, err := r.parseETag(req.ETag)
if err != nil {
return err
}
// LastWrite win意味着无视ETag的异同,强制写入
// 所以这里重置 ver 为 0
if req.Options.Concurrency == state.LastWrite {
ver = 0
}
bt, _ := utils.Marshal(req.Value, r.json.Marshal)
// 用 EVAL 命令执行一段 LUA 脚本,脚本内容为 setQuery
_, err = r.client.DoContext(context.Background(), "EVAL", setQuery, 1, req.Key, ver, bt).Result()
if err != nil {
return fmt.Errorf("failed to set key %s: %s", req.Key, err)
}
// 如果要求强一致性,而且副本数量大于0
if req.Options.Consistency == state.Strong && r.replicas > 0 {
// 则需要等待所有副本数都写入成功
_, err = r.client.DoContext(context.Background(), "WAIT", r.replicas, 1000).Result()
if err != nil {
return fmt.Errorf("timed out while waiting for %v replicas to acknowledge write", r.replicas)
}
}
return nil
}
更多redis细节:
setQuery = "local var1 = redis.pcall(\"HGET\", KEYS[1], \"version\"); if type(var1) == \"table\" then redis.call(\"DEL\", KEYS[1]); end; if not var1 or type(var1)==\"table\" or var1 == \"\" or var1 == ARGV[1] or ARGV[1] == \"0\" then redis.call(\"HSET\", KEYS[1], \"data\", ARGV[2]) return redis.call(\"HINCRBY\", KEYS[1], \"version\", 1) else return error(\"failed to set key \" .. KEYS[1]) end"
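setQuery 这段 LUA 脚本是实现 ETag 语义的关键,整理成多行更容易读(内容与上面完全一致,仅调整排版,用 Go 原始字符串表示):
// 仅为排版示意:与上面的 setQuery 等价
const setQueryReadable = `
local var1 = redis.pcall("HGET", KEYS[1], "version");
if type(var1) == "table" then
  redis.call("DEL", KEYS[1]);
end;
if not var1 or type(var1)=="table" or var1 == "" or var1 == ARGV[1] or ARGV[1] == "0" then
  redis.call("HSET", KEYS[1], "data", ARGV[2])
  return redis.call("HINCRBY", KEYS[1], "version", 1)
else
  return error("failed to set key " .. KEYS[1])
end`
// 即:version 不存在、为空、与传入的 ETag 相等、或 ETag 为 "0"(LastWrite)时才写入 data 并把 version 加一,否则报错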
// Delete performs a delete operation
func (r *StateStore) Delete(req *state.DeleteRequest) error {
err := state.CheckRequestOptions(req.Options)
if err != nil {
return err
}
return state.DeleteWithOptions(r.deleteValue, req)
}
// 内部循环调用 Delete
// BulkDelete 方法没有暴露给 dapr runtime
// BulkDelete performs a bulk delete operation
func (r *StateStore) BulkDelete(req []state.DeleteRequest) error {
for i := range req {
err := r.Delete(&req[i])
if err != nil {
return err
}
}
return nil
}
实际实现在 r.deleteValue 方法中:
func (r *StateStore) deleteValue(req *state.DeleteRequest) error {
if req.ETag == "" {
// ETag的空值则改为 “0” / 零值
req.ETag = "0"
}
_, err := r.client.DoContext(context.Background(), "EVAL", delQuery, 1, req.Key, req.ETag).Result()
if err != nil {
return fmt.Errorf("failed to delete key '%s' due to ETag mismatch", req.Key)
}
return nil
}
更多redis细节:
delQuery = "local var1 = redis.pcall(\"HGET\", KEYS[1], \"version\"); if not var1 or type(var1)==\"table\" or var1 == ARGV[1] or var1 == \"\" or ARGV[1] == \"0\" then return redis.call(\"DEL\", KEYS[1]) else return error(\"failed to delete \" .. KEYS[1]) end"
redis state store 实现了 TransactionalStore,它的 Multi方式:
// Multi performs a transactional operation. succeeds only if all operations succeed, and fails if one or more operations fail
func (r *StateStore) Multi(request *state.TransactionalStateRequest) error {
// 用的是 redis-go 封装的 TxPipeline
pipe := r.client.TxPipeline()
for _, o := range request.Operations {
if o.Operation == state.Upsert {
req := o.Request.(state.SetRequest)
bt, _ := utils.Marshal(req.Value, r.json.Marshal)
pipe.Set(req.Key, bt, defaultExpirationTime)
} else if o.Operation == state.Delete {
req := o.Request.(state.DeleteRequest)
pipe.Del(req.Key)
}
}
_, err := pipe.Exec()
return err
}
Binding Registry 的初始化在 runtime 初始化时进行:
func NewDaprRuntime(runtimeConfig *Config, globalConfig *config.Configuration) *DaprRuntime {
......
bindingsRegistry: bindings_loader.NewRegistry(),
}
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
a.bindingsRegistry.RegisterInputBindings(opts.inputBindings...)
a.bindingsRegistry.RegisterOutputBindings(opts.outputBindings...)
......
}
这些 opts 来自 runtime 启动时的配置,如 cmd/daprd/main.go 下:
func main() {
rt, err := runtime.FromFlags()
if err != nil {
log.Fatal(err)
}
err = rt.Run(
......
runtime.WithInputBindings(
bindings_loader.NewInput("aws.sqs", func() bindings.InputBinding {
return sqs.NewAWSSQS(logContrib)
}),
bindings_loader.NewInput("aws.kinesis", func() bindings.InputBinding {
return kinesis.NewAWSKinesis(logContrib)
}),
bindings_loader.NewInput("azure.eventhubs", func() bindings.InputBinding {
return eventhubs.NewAzureEventHubs(logContrib)
}),
bindings_loader.NewInput("kafka", func() bindings.InputBinding {
return kafka.NewKafka(logContrib)
}),
bindings_loader.NewInput("mqtt", func() bindings.InputBinding {
return mqtt.NewMQTT(logContrib)
}),
bindings_loader.NewInput("rabbitmq", func() bindings.InputBinding {
return bindings_rabbitmq.NewRabbitMQ(logContrib)
}),
bindings_loader.NewInput("azure.servicebusqueues", func() bindings.InputBinding {
return servicebusqueues.NewAzureServiceBusQueues(logContrib)
}),
bindings_loader.NewInput("azure.storagequeues", func() bindings.InputBinding {
return storagequeues.NewAzureStorageQueues(logContrib)
}),
bindings_loader.NewInput("gcp.pubsub", func() bindings.InputBinding {
return pubsub.NewGCPPubSub(logContrib)
}),
bindings_loader.NewInput("kubernetes", func() bindings.InputBinding {
return kubernetes.NewKubernetes(logContrib)
}),
bindings_loader.NewInput("azure.eventgrid", func() bindings.InputBinding {
return eventgrid.NewAzureEventGrid(logContrib)
}),
bindings_loader.NewInput("twitter", func() bindings.InputBinding {
return twitter.NewTwitter(logContrib)
}),
bindings_loader.NewInput("cron", func() bindings.InputBinding {
return cron.NewCron(logContrib)
}),
),
runtime.WithOutputBindings(
bindings_loader.NewOutput("aws.sqs", func() bindings.OutputBinding {
return sqs.NewAWSSQS(logContrib)
}),
bindings_loader.NewOutput("aws.sns", func() bindings.OutputBinding {
return sns.NewAWSSNS(logContrib)
}),
bindings_loader.NewOutput("aws.kinesis", func() bindings.OutputBinding {
return kinesis.NewAWSKinesis(logContrib)
}),
bindings_loader.NewOutput("azure.eventhubs", func() bindings.OutputBinding {
return eventhubs.NewAzureEventHubs(logContrib)
}),
bindings_loader.NewOutput("aws.dynamodb", func() bindings.OutputBinding {
return dynamodb.NewDynamoDB(logContrib)
}),
bindings_loader.NewOutput("azure.cosmosdb", func() bindings.OutputBinding {
return bindings_cosmosdb.NewCosmosDB(logContrib)
}),
bindings_loader.NewOutput("gcp.bucket", func() bindings.OutputBinding {
return bucket.NewGCPStorage(logContrib)
}),
bindings_loader.NewOutput("http", func() bindings.OutputBinding {
return http.NewHTTP(logContrib)
}),
bindings_loader.NewOutput("kafka", func() bindings.OutputBinding {
return kafka.NewKafka(logContrib)
}),
bindings_loader.NewOutput("mqtt", func() bindings.OutputBinding {
return mqtt.NewMQTT(logContrib)
}),
bindings_loader.NewOutput("rabbitmq", func() bindings.OutputBinding {
return bindings_rabbitmq.NewRabbitMQ(logContrib)
}),
bindings_loader.NewOutput("redis", func() bindings.OutputBinding {
return redis.NewRedis(logContrib)
}),
bindings_loader.NewOutput("aws.s3", func() bindings.OutputBinding {
return s3.NewAWSS3(logContrib)
}),
bindings_loader.NewOutput("azure.blobstorage", func() bindings.OutputBinding {
return blobstorage.NewAzureBlobStorage(logContrib)
}),
bindings_loader.NewOutput("azure.servicebusqueues", func() bindings.OutputBinding {
return servicebusqueues.NewAzureServiceBusQueues(logContrib)
}),
bindings_loader.NewOutput("azure.storagequeues", func() bindings.OutputBinding {
return storagequeues.NewAzureStorageQueues(logContrib)
}),
bindings_loader.NewOutput("gcp.pubsub", func() bindings.OutputBinding {
return pubsub.NewGCPPubSub(logContrib)
}),
bindings_loader.NewOutput("azure.signalr", func() bindings.OutputBinding {
return signalr.NewSignalR(logContrib)
}),
bindings_loader.NewOutput("twilio.sms", func() bindings.OutputBinding {
return sms.NewSMS(logContrib)
}),
bindings_loader.NewOutput("twilio.sendgrid", func() bindings.OutputBinding {
return sendgrid.NewSendGrid(logContrib)
}),
bindings_loader.NewOutput("azure.eventgrid", func() bindings.OutputBinding {
return eventgrid.NewAzureEventGrid(logContrib)
}),
bindings_loader.NewOutput("cron", func() bindings.OutputBinding {
return cron.NewCron(logContrib)
}),
bindings_loader.NewOutput("twitter", func() bindings.OutputBinding {
return twitter.NewTwitter(logContrib)
}),
bindings_loader.NewOutput("influx", func() bindings.OutputBinding {
return influx.NewInflux(logContrib)
}),
),
......
}
在这里配置各种 input binding 和 output binding 的实现。
pkg/components/bindings/registry.go,定义了多个数据结构:
type (
// InputBinding is an input binding component definition.
InputBinding struct {
Name string
FactoryMethod func() bindings.InputBinding
}
// OutputBinding is an output binding component definition.
OutputBinding struct {
Name string
FactoryMethod func() bindings.OutputBinding
}
// Registry is the interface of a components that allows callers to get registered instances of input and output bindings
Registry interface {
RegisterInputBindings(components ...InputBinding)
RegisterOutputBindings(components ...OutputBinding)
CreateInputBinding(name string) (bindings.InputBinding, error)
CreateOutputBinding(name string) (bindings.OutputBinding, error)
}
bindingsRegistry struct {
inputBindings map[string]func() bindings.InputBinding
outputBindings map[string]func() bindings.OutputBinding
}
)
前面 runtime 初始化时,每个实现都通过 NewInput 方法和 NewOutput方法,将 name 和对应的InputBinding/OutputBinding关联起来:
// NewInput creates a InputBinding.
func NewInput(name string, factoryMethod func() bindings.InputBinding) InputBinding {
return InputBinding{
Name: name,
FactoryMethod: factoryMethod,
}
}
// NewOutput creates a OutputBinding.
func NewOutput(name string, factoryMethod func() bindings.OutputBinding) OutputBinding {
return OutputBinding{
Name: name,
FactoryMethod: factoryMethod,
}
}
RegisterInputBindings 和 RegisterOutputBindings 方法用来注册 input binding 和 output binding
的实现,在runtime 初始化时被调用:
// RegisterInputBindings registers one or more new input bindings.
func (b *bindingsRegistry) RegisterInputBindings(components ...InputBinding) {
for _, component := range components {
b.inputBindings[createFullName(component.Name)] = component.FactoryMethod
}
}
// RegisterOutputBindings registers one or more new output bindings.
func (b *bindingsRegistry) RegisterOutputBindings(components ...OutputBinding) {
for _, component := range components {
b.outputBindings[createFullName(component.Name)] = component.FactoryMethod
}
}
func createFullName(name string) string {
// createFullName统一增加前缀 bindings.
return fmt.Sprintf("bindings.%s", name)
}
pkg/runtime/runtime.go :
Binding 的初始化在 runtime 初始化时进行:
func (a *DaprRuntime) initRuntime(opts *runtimeOpts) error {
......
go a.processComponents()
......
}
func (a *DaprRuntime) processComponents() {
for {
comp, more := <-a.pendingComponents
if !more {
a.pendingComponentsDone <- true
return
}
if err := a.processOneComponent(comp); err != nil {
log.Errorf("process component %s error, %s", comp.Name, err)
}
}
}
processOneComponent:
func (a *DaprRuntime) processOneComponent(comp components_v1alpha1.Component) error {
res := a.preprocessOneComponent(&comp)
compCategory := a.figureOutComponentCategory(comp)
......
return nil
}
doProcessOneComponent:
func (a *DaprRuntime) doProcessOneComponent(category ComponentCategory, comp components_v1alpha1.Component) error {
switch category {
case bindingsComponent:
return a.initBinding(comp)
......
}
return nil
}
initBinding:
func (a *DaprRuntime) initBinding(c components_v1alpha1.Component) error {
if err := a.initOutputBinding(c); err != nil {
log.Errorf("failed to init output bindings: %s", err)
return err
}
if err := a.initInputBinding(c); err != nil {
log.Errorf("failed to init input bindings: %s", err)
return err
}
return nil
}
在这里进行 input binding 和 output binding 的初始化。
pkg/runtime/runtime.go:
func (a *DaprRuntime) initOutputBinding(c components_v1alpha1.Component) error {
// 根据组件的 spec.type 从 registry 中创建 output binding 实例
binding, err := a.bindingsRegistry.CreateOutputBinding(c.Spec.Type)
if err != nil {
log.Warnf("failed to create output binding %s (%s): %s", c.ObjectMeta.Name, c.Spec.Type, err)
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "creation")
return err
}
if binding != nil {
err := binding.Init(bindings.Metadata{
Properties: a.convertMetadataItemsToProperties(c.Spec.Metadata),
Name: c.ObjectMeta.Name,
})
if err != nil {
log.Errorf("failed to init output binding %s (%s): %s", c.ObjectMeta.Name, c.Spec.Type, err)
diag.DefaultMonitoring.ComponentInitFailed(c.Spec.Type, "init")
return err
}
log.Infof("successful init for output binding %s (%s)", c.ObjectMeta.Name, c.Spec.Type)
a.outputBindings[c.ObjectMeta.Name] = binding
diag.DefaultMonitoring.ComponentInitialized(c.Spec.Type)
}
return nil
}
其中 CreateOutputBinding 方法的实现在 pkg/components/bindings/registry.go
中:
// Create instantiates an output binding based on `name`.
func (b *bindingsRegistry) CreateOutputBinding(name string) (bindings.OutputBinding, error) {
if method, ok := b.outputBindings[name]; ok {
// 调用 factory 方法生成具体实现的 outputBinding
return method(), nil
}
return nil, errors.Errorf("couldn't find output binding %s", name)
}
TODO
备注:根据 https://github.com/dapr/docs/blob/master/concepts/bindings/README.md 的描述,redis 只实现了 output binding。
Redis的实现在 dapr/components-contrib 下,/bindings/redis/redis.go 中:
func (r *Redis) Operations() []bindings.OperationKind {
// 只支持create
return []bindings.OperationKind{bindings.CreateOperation}
}
func (r *Redis) Invoke(req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
// 通过 metadata 传递 key
if val, ok := req.Metadata["key"]; ok && val != "" {
key := val
// 调用标准 redis 客户端,执行 SET 命令
_, err := r.client.DoContext(context.Background(), "SET", key, req.Data).Result()
if err != nil {
return nil, err
}
return nil, nil
}
return nil, errors.New("redis binding: missing key on write request metadata")
}
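结合上面的 Operations() 和 Invoke() 实现,调用方(也就是 dapr runtime)大致会这样使用这个 output binding。下面是一段示意代码(非 dapr 源码,redisBinding 为上文创建的 redis binding 实例):
// 示意:调用 redis output binding 的 create 操作,key 通过请求级别 metadata 传入
req := &bindings.InvokeRequest{
	Operation: bindings.CreateOperation,          // redis 只支持 create
	Data:      []byte(`{"hello":"world"}`),       // 将被 SET 到 redis 的 value
	Metadata:  map[string]string{"key": "mykey"}, // SET 命令的 key
}
resp, err := redisBinding.Invoke(req)
// 成功时 resp 和 err 都是 nil(见上面 Invoke 的实现,成功路径返回 nil, nil)
_ = resp
_ = err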
初始化:
在 dapr runtime 初始化时,关联 redis 的 output binding实现:
bindings_loader.NewOutput("redis", func() bindings.OutputBinding {
return redis.NewRedis(logContrib)
}),
然后 Init 方法会在 output binding初始化时被 dapr runtime 调用,Redis的实现内容为:
// Init performs metadata parsing and connection creation
func (r *Redis) Init(meta bindings.Metadata) error {
// 解析metadata
m, err := r.parseMetadata(meta)
if err != nil {
return err
}
// redis 连接属性
opts := &redis.Options{
Addr: m.host,
Password: m.password,
DB: defaultDB,
MaxRetries: m.maxRetries,
MaxRetryBackoff: m.maxRetryBackoff,
}
/* #nosec */
if m.enableTLS {
opts.TLSConfig = &tls.Config{
InsecureSkipVerify: m.enableTLS,
}
}
// 建立redis连接
r.client = redis.NewClient(opts)
_, err = r.client.Ping().Result()
if err != nil {
return fmt.Errorf("redis binding: error connecting to redis at %s: %s", m.host, err)
}
return err
}
pkg/grpc/api.go
中的 InvokeBinding 方法:
func (a *api) InvokeBinding(ctx context.Context, in *runtimev1pb.InvokeBindingRequest) (*runtimev1pb.InvokeBindingResponse, error) {
req := &bindings.InvokeRequest{
Metadata: in.Metadata,
Operation: bindings.OperationKind(in.Operation),
}
if in.Data != nil {
req.Data = in.Data
}
r := &runtimev1pb.InvokeBindingResponse{}
// 关键实现在这里
resp, err := a.sendToOutputBindingFn(in.Name, req)
if err != nil {
err = fmt.Errorf("ERR_INVOKE_OUTPUT_BINDING: %s", err)
apiServerLogger.Debug(err)
return r, err
}
if resp != nil {
r.Data = resp.Data
r.Metadata = resp.Metadata
}
return r, nil
}
sendToOutputBindingFn 方法的初始化在这里:
func (a *DaprRuntime) getGRPCAPI() grpc.API {
return grpc.NewAPI(a.runtimeConfig.ID, a.appChannel, a.stateStores, a.secretStores, a.getPublishAdapter(), a.directMessaging, a.actor, a.sendToOutputBinding, a.globalConfig.Spec.TracingSpec)
}
sendToOutputBinding 方法的实现在 pkg/runtime/runtime.go
:
func (a *DaprRuntime) sendToOutputBinding(name string, req *bindings.InvokeRequest) (*bindings.InvokeResponse, error) {
if req.Operation == "" {
return nil, errors.New("operation field is missing from request")
}
// 根据 name 找已经注册好的 binding
if binding, ok := a.outputBindings[name]; ok {
ops := binding.Operations()
for _, o := range ops {
// 找到该 binding 支持的 operation
if o == req.Operation {
// 关键代码,需要转到具体的实现了
return binding.Invoke(req)
}
}
supported := make([]string, len(ops))
for _, o := range ops {
supported = append(supported, string(o))
}
return nil, errors.Errorf("binding %s does not support operation %s. supported operations:%s", name, req.Operation, strings.Join(supported, " "))
}
return nil, errors.Errorf("couldn't find output binding %s", name)
}
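从调用方的角度把这条链路走一遍:应用(或 SDK)通过 Dapr gRPC API 发起 InvokeBinding 请求,daprd 收到后经 sendToOutputBindingFn → sendToOutputBinding → binding.Invoke() 落到具体实现。下面是一段示意性的 gRPC 客户端代码(非 dapr 源码,端口、binding 名称均为示例):
// 示意:直接用生成的 gRPC client 调用 Dapr 的 InvokeBinding
conn, err := grpc.Dial("localhost:50001", grpc.WithInsecure())
if err != nil {
	log.Fatal(err)
}
defer conn.Close()
client := runtimev1pb.NewDaprClient(conn)
resp, err := client.InvokeBinding(context.Background(), &runtimev1pb.InvokeBindingRequest{
	Name:      "myredis",                         // Component yaml 中的 metadata.name
	Operation: "create",                          // 必须在该 binding 的 Operations() 返回值中
	Data:      []byte(`{"hello":"world"}`),
	Metadata:  map[string]string{"key": "mykey"}, // 请求级别 metadata
})
fmt.Println(resp, err)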
总结一下各种binding实现中 metadata 的设计和使用:
实现 | 配置级别的metadata | 请求级别的metadata |
---|---|---|
alicloud oss | key | |
HTTP | url / method | 无 |
cron | schedule | 无 |
MQTT | url / topic | 无 |
RabbitMQ | host / queueName / durable / deleteWhenUnused / prefetchCount | ttlInSeconds |
Redis | host / password / enableTLS / maxRetries / maxRetryBackoff | key |
Influx | url / token / org / bucket | 无 |
Kafka | brokers / topics / publishTopic / consumerGroup / authRequired / saslUsername / saslPassword | key |
Kubernetes | namespace / resyncPeriodInSec | 无 |
twilio-sendgrid | apiKey / emailFrom / emailTo / subject / emailCc / emailBcc | emailFrom / emailTo / subject / emailCc / emailBcc |
twilio-sms | toNumber / fromNumber / accountSid / authToken / timeout | toNumber |
twitter | consumerKey / consumerSecret / accessToken / accessSecret / query | query / lang / result / since_id |
gcp-bucket | bucket / type / project_id / private_key_id / private_key / client_email / client_id / auth_uri / token_uri / auth_provider_x509_cert_url / client_x509_cert_url | name |
gcp-pubsub | topic / subscription / type / project_id / private_key_id / private_key / client_email / client_id / auth_uri / token_uri / auth_provider_x509_cert_url / client_x509_cert_url | topic |
Azure-blobstorage | storageAccount / storageAccessKey / container | blobName / ContentType / ContentMD5 / ContentEncoding / ContentLanguage / ContentDisposition / CacheControl |
Azure-cosmosDB | url / masterKey / database / collection / partitionKey | 无 |
Azure-EventGrid | tenantId / subscriptionId / clientId / clientSecret / subscriberEndpoint / handshakePort / scope / eventSubscriptionName / accessKey / topicEndpoint | 无 |
Azure-EventHubs | connection / consumerGroup / storageAccountName / storageAccountKey / storageContainerName / partitionID / partitionKey | partitionKey |
Azure-ServiceBusQueues | connectionString / queueName / ttl | id / correlationID / ttlInSeconds |
Azure-SignalR | connectionString / hub | hub / group / user |
Azure-storagequeue | ttlInSeconds | |
Aws-dynamodb | region / endpoint / accessKey / secretKey / table | 无 |
Aws-kinesis | streamName / consumerName / region / endpoint / accessKey / secretKey / mode | partitionKey |
Aws-s3 | region / endpoint / accessKey / secretKey / bucket | key |
Aws-sns | topicArn / region / endpoint / accessKey / secretKey | 无 |
Aws-sqs | queueName / region / endpoint / accessKey / secretKey | 无 |
以e2e中的 stateapp 为例。
tests/apps/stateapp/service.yaml
中是 stateapp 的 Service 定义和 Deployment定义。
Service的定义没有什么特殊:
kind: Service
apiVersion: v1
metadata:
name: stateapp
labels:
testapp: stateapp
spec:
selector:
testapp: stateapp
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: LoadBalancer
deployment的定义:
apiVersion: apps/v1
kind: Deployment
metadata:
name: stateapp
labels:
testapp: stateapp
spec:
replicas: 1
selector:
matchLabels:
testapp: stateapp
template: # stateapp的pod定义
metadata:
labels:
testapp: stateapp
annotations:
dapr.io/enabled: "true"
dapr.io/app-id: "stateapp"
dapr.io/app-port: "3000"
    spec: # stateapp的container定义,目前pod中只定义了这一个container
containers:
- name: stateapp
image: docker.io/YOUR_DOCKER_ALIAS/e2e-stateapp:dev
ports:
- containerPort: 3000
imagePullPolicy: Always
单独看 stateapp 的 pod 定义的 annotations ,
annotations:
dapr.io/enabled: "true"
dapr.io/app-id: "stateapp"
dapr.io/app-port: "3000"
getPodPatchOperations:
func (i *injector) getPodPatchOperations(ar *v1beta1.AdmissionReview,
namespace, image string, kubeClient *kubernetes.Clientset, daprClient scheme.Interface) ([]PatchOperation, error) {
req := ar.Request
var pod corev1.Pod
if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
errors.Wrap(err, "could not unmarshal raw object")
return nil, err
}
log.Infof(
"AdmissionReview for Kind=%v, Namespace=%v Name=%v (%v) UID=%v "+
"patchOperation=%v UserInfo=%v",
req.Kind,
req.Namespace,
req.Name,
pod.Name,
req.UID,
req.Operation,
req.UserInfo,
)
if !isResourceDaprEnabled(pod.Annotations) || podContainsSidecarContainer(&pod) {
return nil, nil
}
...
这个info日志打印的例子如下:
{"instance":"dapr-sidecar-injector-5f6f4bb6df-n5dsk","level":"info","msg":"AdmissionReview for Kind=/v1, Kind=Pod, Namespace=dapr-tests Name= () UID=d0126a13-9efd-432e-894a-5ddbee55898c patchOperation=CREATE UserInfo={system:serviceaccount:kube-system:replicaset-controller 3e5de149-07a3-434e-a8de-209abee69760 [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]}","scope":"dapr.injector","time":"2020-09-25T07:07:07.6482457Z","type":"log","ver":"edge"}
可以看到在 namespace dapr-tests 下,pod 发生 CREATE 操作时 Injector 开始工作。
isResourceDaprEnabled(pod.Annotations)
检查该 pod 是否启用了 dapr,判断的方式是看 pod 是否有名为 dapr.io/enabled 的 annotation 并且设置为 true,缺省为 false:
const (
daprEnabledKey = "dapr.io/enabled"
)
func isResourceDaprEnabled(annotations map[string]string) bool {
return getBoolAnnotationOrDefault(annotations, daprEnabledKey, false)
}
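getBoolAnnotationOrDefault 的代码这里没有贴出来,它的逻辑大致如下(示意实现,非 dapr 源码原文):annotation 的值是字符串,需要把 "true"/"false" 之类的字符串解析成 bool,取不到或解析失败时返回默认值。
// 示意实现:从 annotation 中读取 bool 值,取不到时返回默认值
func getBoolAnnotationOrDefault(annotations map[string]string, key string, defaultValue bool) bool {
	v, ok := annotations[key]
	if !ok {
		return defaultValue
	}
	// 真实实现可能还接受 y/yes/on 等写法,这里只用标准库做示意
	b, err := strconv.ParseBool(strings.ToLower(v))
	if err != nil {
		return defaultValue
	}
	return b
}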
podContainsSidecarContainer 检查 pod 是不是已经包含 dapr的sidecar,判断的方式是看 container 的名字是不是 daprd
:
const (
sidecarContainerName = "daprd"
)
func podContainsSidecarContainer(pod *corev1.Pod) bool {
for _, c := range pod.Spec.Containers {
if c.Name == sidecarContainerName {
return true
}
}
return false
}
继续getPodPatchOperations():
id := getAppID(pod)
// Keep DNS resolution outside of getSidecarContainer for unit testing.
placementAddress := fmt.Sprintf("%s:80", getKubernetesDNS(placementService, namespace))
sentryAddress := fmt.Sprintf("%s:80", getKubernetesDNS(sentryService, namespace))
apiSrvAddress := fmt.Sprintf("%s:80", getKubernetesDNS(apiAddress, namespace))
getAppID(pod) 通过读取 annotation 来获取应用 id,注意 "dapr.io/id" 已经废弃,1.0 之后将被删除,替换为 "dapr.io/app-id":
const (
appIDKey = "dapr.io/app-id"
// Deprecated, remove in v1.0
idKey = "dapr.io/id"
)
func getAppID(pod corev1.Pod) string {
id := getStringAnnotationOrDefault(pod.Annotations, appIDKey, "")
if id != "" {
return id
}
return getStringAnnotationOrDefault(pod.Annotations, idKey, pod.GetName())
}
var trustAnchors string
var certChain string
var certKey string
var identity string
mtlsEnabled := mTLSEnabled(daprClient)
if mtlsEnabled {
trustAnchors, certChain, certKey = getTrustAnchorsAndCertChain(kubeClient, namespace)
identity = fmt.Sprintf("%s:%s", req.Namespace, pod.Spec.ServiceAccountName)
}
mTLSEnabled判断的方式,居然是读取所有的namespace下的dapr configuration:
const (
// NamespaceAll is the default argument to specify on a context when you want to list or filter resources across all namespaces
NamespaceAll string = ""
)
func mTLSEnabled(daprClient scheme.Interface) bool {
resp, err := daprClient.ConfigurationV1alpha1().Configurations(meta_v1.NamespaceAll).List(meta_v1.ListOptions{})
if err != nil {
return defaultMtlsEnabled
}
for _, c := range resp.Items {
if c.GetName() == defaultConfig { // "daprsystem"
return c.Spec.MTLSSpec.Enabled
}
}
return defaultMtlsEnabled
}
通过读取k8s的资源来判断是否要开启 mtls,tests/config/dapr_mtls_off_config.yaml
有example内容:
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
name: daprsystem # 名字一定要是 daprsystem
spec:
mtls:
enabled: "false" # 在这里配置要不要开启 mtls
workloadCertTTL: "1h"
allowedClockSkew: "20m"
但这里有一个坑:Configuration 中 mtls.enabled 期望的是布尔值,如果像上面的测试配置那样写成带引号的字符串 "false",解析时就会报错:
E0925 09:37:53.480772 1 reflector.go:153] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:224: Failed to list *v1alpha1.Configuration: v1alpha1.ConfigurationList.Items: []v1alpha1.Configuration: v1alpha1.Configuration.Spec: v1alpha1.ConfigurationSpec.MTLSSpec: v1alpha1.MTLSSpec.Enabled: ReadBool: expect t or f, but found ", error found in #10 byte of ...|enabled":"false","wo|..., bigger context ...|pec":{"mtls":{"allowedClockSkew":"20m","enabled":"false","workloadCertTTL":"1h"}}},{"apiVersion":"da|...
下面是注入 sidecar 之后的 stateapp pod 完整 yaml,可以看到 daprd container 已经被注入到 pod 中:
apiVersion: v1
kind: Pod
metadata:
annotations:
dapr.io/app-id: stateapp
dapr.io/app-port: "3000"
dapr.io/enabled: "true"
dapr.io/sidecar-cpu-limit: "4.0"
dapr.io/sidecar-cpu-request: "0.5"
dapr.io/sidecar-memory-limit: 512Mi
dapr.io/sidecar-memory-request: 250Mi
creationTimestamp: "2020-09-25T07:07:07Z"
generateName: stateapp-567b6b9c6f-
labels:
pod-template-hash: 567b6b9c6f
testapp: stateapp
name: stateapp-567b6b9c6f-84kzb
namespace: dapr-tests
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: stateapp-567b6b9c6f
uid: 25a34367-79ed-4e19-868a-5b063a45b1f4
resourceVersion: "146616"
selfLink: /api/v1/namespaces/dapr-tests/pods/stateapp-567b6b9c6f-84kzb
uid: 0f4060df-0312-4d73-91c1-6f085462b33d
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- amd64
containers:
- env:
- name: DAPR_HTTP_PORT
value: "3500"
- name: DAPR_GRPC_PORT
value: "50001"
image: docker.io/skyao/e2e-stateapp:dev-linux-amd64
imagePullPolicy: Always
name: stateapp
ports:
- containerPort: 3000
name: http
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-qncjc
readOnly: true
- args:
- --mode
- kubernetes
- --dapr-http-port
- "3500"
- --dapr-grpc-port
- "50001"
- --dapr-internal-grpc-port
- "50002"
- --app-port
- "3000"
- --app-id
- stateapp
- --control-plane-address
- dapr-api.dapr-system.svc.cluster.local:80
- --app-protocol
- http
- --placement-host-address
- dapr-placement.dapr-system.svc.cluster.local:80
- --config
- ""
- --log-level
- info
- --app-max-concurrency
- "-1"
- --sentry-address
- dapr-sentry.dapr-system.svc.cluster.local:80
- --metrics-port
- "9090"
- --enable-mtls
command:
- /daprd
env:
- name: DAPR_HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: NAMESPACE
value: dapr-tests
- name: DAPR_TRUST_ANCHORS
value: |
-----BEGIN CERTIFICATE-----
MIIB3TCCAYKgAwIBAgIRAMra+wjgMY6ABDtu3/vJ0NcwCgYIKoZIzj0EAwIwMTEX
MBUGA1UEChMOZGFwci5pby9zZW50cnkxFjAUBgNVBAMTDWNsdXN0ZXIubG9jYWww
HhcNMjAwOTI1MDU1ODAzWhcNMjEwOTI1MDU1ODAzWjAxMRcwFQYDVQQKEw5kYXBy
LmlvL3NlbnRyeTEWMBQGA1UEAxMNY2x1c3Rlci5sb2NhbDBZMBMGByqGSM49AgEG
CCqGSM49AwEHA0IABE/w/8YBtRJPYNJkcDM05e9PhrbGjBU/RQd09J909OJebDe8
rthysygWrcGYHYKziKK2Pyc1j4ua2xklLC5DFEWjezB5MA4GA1UdDwEB/wQEAwIC
BDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwDwYDVR0TAQH/BAUwAwEB
/zAdBgNVHQ4EFgQUQ2v6OiayM9V4DPAU6UZHGe/nc1swGAYDVR0RBBEwD4INY2x1
c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNJADBGAiEAtVBx9vDXiRE3fXJTU2yK11W5
eo+Ce4+U6/vXDtzw4PUCIQDlLOB45ihOAhhLVLG9akhgwJOrgZLEW3FZjRabpSsb
og==
-----END CERTIFICATE-----
- name: DAPR_CERT_CHAIN
value: |
-----BEGIN CERTIFICATE-----
MIIBxDCCAWqgAwIBAgIQQ1sfEH4aYacFZwBau+aOozAKBggqhkjOPQQDAjAxMRcw
FQYDVQQKEw5kYXByLmlvL3NlbnRyeTEWMBQGA1UEAxMNY2x1c3Rlci5sb2NhbDAe
Fw0yMDA5MjUwNTU4MDNaFw0yMTA5MjUwNTU4MDNaMBgxFjAUBgNVBAMTDWNsdXN0
ZXIubG9jYWwwWTATBgcqhkjOPQIBBggqhkjOPQMBBwNCAARhj7MQ1uiOkZvJ0AYV
uiFca/Iu9D5O98E5JN1mjCohRawk+QT1PjW05YtmyVji4Tt6ckIMvOXwG3aoTsGO
UbRio30wezAOBgNVHQ8BAf8EBAMCAQYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4E
FgQUTPUh0WWBB5baKs3aJjMzInVLX/EwHwYDVR0jBBgwFoAUQ2v6OiayM9V4DPAU
6UZHGe/nc1swGAYDVR0RBBEwD4INY2x1c3Rlci5sb2NhbDAKBggqhkjOPQQDAgNI
ADBFAiBO0oCadeYyLM+RkSAYPSTtjMyEZ0wv1/BsWuUMg+KZ6AIhALHnT0pxiqlj
miYT4WZWvaBc17AbUh1efgV2DAaNKm54
-----END CERTIFICATE-----
- name: DAPR_CERT_KEY
value: |
-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIDj6niLJ5ep+fDdY71bKyWl9RZHrXyRjND6pWySL2Q4UoAoGCCqGSM49
AwEHoUQDQgAEYY+zENbojpGbydAGFbohXGvyLvQ+TvfBOSTdZowqIUWsJPkE9T41
tOWLZslY4uE7enJCDLzl8Bt2qE7BjlG0Yg==
-----END EC PRIVATE KEY-----
- name: SENTRY_LOCAL_IDENTITY
value: default:dapr-tests
image: docker.io/skyao/daprd:dev-linux-amd64
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /v1.0/healthz
port: 3500
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 6
successThreshold: 1
timeoutSeconds: 3
name: daprd
ports:
- containerPort: 3500
name: dapr-http
protocol: TCP
- containerPort: 50001
name: dapr-grpc
protocol: TCP
- containerPort: 50002
name: dapr-internal
protocol: TCP
- containerPort: 9090
name: dapr-metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /v1.0/healthz
port: 3500
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 6
successThreshold: 1
timeoutSeconds: 3
resources:
limits:
cpu: "4"
memory: 512Mi
requests:
cpu: 500m
memory: 250Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-qncjc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: docker-desktop
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-qncjc
secret:
defaultMode: 420
secretName: default-token-qncjc
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-09-25T07:07:07Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-09-25T07:07:07Z"
message: 'containers with unready status: [daprd]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-09-25T07:07:07Z"
message: 'containers with unready status: [daprd]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-09-25T07:07:07Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://26a1d85ac6e2accd833832681b8dc2aa809e3c0fcfa293398bd5e7c2e8bf3e2b
image: skyao/daprd:dev-linux-amd64
imageID: docker-pullable://skyao/daprd@sha256:387f3bf4e7397c43dca9ac2d248a9ce790b1c1888aa0d6de3b07107ce124752f
lastState:
terminated:
containerID: docker://26a1d85ac6e2accd833832681b8dc2aa809e3c0fcfa293398bd5e7c2e8bf3e2b
exitCode: 1
finishedAt: "2020-09-25T08:03:14Z"
reason: Error
startedAt: "2020-09-25T08:03:04Z"
name: daprd
ready: false
restartCount: 21
started: false
state:
waiting:
message: back-off 5m0s restarting failed container=daprd pod=stateapp-567b6b9c6f-84kzb_dapr-tests(0f4060df-0312-4d73-91c1-6f085462b33d)
reason: CrashLoopBackOff
- containerID: docker://737745ace04213c9519ad1f91e248015c89a80e2b3d61081c3c530d1c89bdbae
image: skyao/e2e-stateapp:dev-linux-amd64
imageID: docker-pullable://skyao/e2e-stateapp@sha256:16351b331f1338a61348c9a87fce43728369f1bf18ee69d9d45fb13db0283644
lastState: {}
name: stateapp
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2020-09-25T07:07:24Z"
hostIP: 192.168.65.3
phase: Running
podIP: 10.1.0.194
podIPs:
- ip: 10.1.0.194
qosClass: Burstable
startTime: "2020-09-25T07:07:07Z"
dapr-sidecar-injector
apiVersion: v1
kind: Pod
metadata:
annotations:
prometheus.io/path: /
prometheus.io/port: "9090"
prometheus.io/scrape: "true"
creationTimestamp: "2020-09-25T05:57:37Z"
generateName: dapr-sidecar-injector-5f6f4bb6df-
labels:
app: dapr-sidecar-injector
app.kubernetes.io/component: sidecar-injector
app.kubernetes.io/managed-by: helm
app.kubernetes.io/name: dapr
app.kubernetes.io/part-of: dapr
app.kubernetes.io/version: dev-linux-amd64
pod-template-hash: 5f6f4bb6df
name: dapr-sidecar-injector-5f6f4bb6df-n5dsk
namespace: dapr-system
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: dapr-sidecar-injector-5f6f4bb6df
uid: ff47b1df-6da7-4a19-b99d-15622ca3a485
resourceVersion: "133143"
selfLink: /api/v1/namespaces/dapr-system/pods/dapr-sidecar-injector-5f6f4bb6df-n5dsk
uid: 40df3834-4df2-495a-aa26-5b2a22de7639
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
weight: 1
containers:
- args:
- --log-level
- info
- --log-as-json
- --metrics-port
- "9090"
command:
- /injector
env:
- name: TLS_CERT_FILE
value: /dapr/cert/tls.crt
- name: TLS_KEY_FILE
value: /dapr/cert/tls.key
- name: SIDECAR_IMAGE
value: docker.io/skyao/daprd:dev-linux-amd64
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: docker.io/skyao/dapr:dev-linux-amd64
imagePullPolicy: Always
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
name: dapr-sidecar-injector
ports:
- containerPort: 4000
name: https
protocol: TCP
- containerPort: 9090
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 5
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
runAsUser: 1000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dapr/cert
name: cert
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: dapr-operator-token-lgpvc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: docker-desktop
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: dapr-operator
serviceAccountName: dapr-operator
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: cert
secret:
defaultMode: 420
secretName: dapr-sidecar-injector-cert
- name: dapr-operator-token-lgpvc
secret:
defaultMode: 420
secretName: dapr-operator-token-lgpvc
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2020-09-25T05:57:37Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2020-09-25T05:58:10Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2020-09-25T05:58:10Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2020-09-25T05:57:37Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://a820646b468a07eabdd89ca133f062a93e85256afc6c19c1bdf13b56980ec5e9
image: skyao/dapr:dev-linux-amd64
imageID: docker-pullable://skyao/dapr@sha256:77003eee9fd02d9fc24c2e9f385a6c86223bc35915cede98a8897c0dfc51ee61
lastState: {}
name: dapr-sidecar-injector
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2020-09-25T05:58:06Z"
hostIP: 192.168.65.3
phase: Running
podIP: 10.1.0.188
podIPs:
- ip: 10.1.0.188
qosClass: BestEffort
startTime: "2020-09-25T05:57:37Z"
Dapr injector 中的 main.go 文件的源码分析。
init() 进行初始化,包括 logger 和 metrics 相关 flag 的定义与解析:
func init() {
loggerOptions := logger.DefaultOptions()
// 这里设定了 `log-level` 和 `log-as-json`
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
// 这里设定了 `metrics-port` 和 `enable-metrics`
metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)
flag.Parse()
参考 injector pod yaml文件中 Command 段:
Command:
/injector
Args:
--log-level
info
--log-as-json
--enable-metrics
--metrics-port
9090
// Apply options to all loggers
if err := logger.ApplyOptionsToLoggers(&loggerOptions); err != nil {
log.Fatal(err)
} else {
log.Infof("log level set to: %s", loggerOptions.OutputLevel)
}
// Initialize dapr metrics exporter
if err := metricsExporter.Init(); err != nil {
log.Fatal(err)
}
// Initialize injector service metrics
if err := monitoring.InitMetrics(); err != nil {
log.Fatal(err)
}
从环境变量中读取配置:
func main() {
logger.DaprVersion = version.Version()
log.Infof("starting Dapr Sidecar Injector -- version %s -- commit %s", version.Version(), version.Commit())
ctx := signals.Context()
cfg, err := injector.GetConfigFromEnvironment()
if err != nil {
log.Fatalf("error getting config: %s", err)
}
......
}
kubeClient := utils.GetKubeClient()
conf := utils.GetConfig()
daprClient, _ := scheme.NewForConfig(conf)
go func() {
healthzServer := health.NewServer(log)
healthzServer.Ready()
healthzErr := healthzServer.Run(ctx, healthzPort)
if healthzErr != nil {
log.Fatalf("failed to start healthz server: %s", healthzErr)
}
}()
uids, err := injector.AllowedControllersServiceAccountUID(ctx, kubeClient)
if err != nil {
log.Fatalf("failed to get authentication uids from services accounts: %s", err)
}
injector.NewInjector(uids, cfg, daprClient, kubeClient).Run(ctx)
简单的sleep 5秒作为 graceful shutdown :
shutdownDuration := 5 * time.Second
log.Infof("allowing %s for graceful shutdown to complete", shutdownDuration)
<-time.After(shutdownDuration)
Dapr injector package中的 config.go 文件的源码分析。
Injector 相关的配置项定义:
// Config represents configuration options for the Dapr Sidecar Injector webhook server
type Config struct {
TLSCertFile string `envconfig:"TLS_CERT_FILE" required:"true"`
TLSKeyFile string `envconfig:"TLS_KEY_FILE" required:"true"`
SidecarImage string `envconfig:"SIDECAR_IMAGE" required:"true"`
SidecarImagePullPolicy string `envconfig:"SIDECAR_IMAGE_PULL_POLICY"`
Namespace string `envconfig:"NAMESPACE" required:"true"`
}
只设置了一个 SidecarImagePullPolicy 的默认值:
func NewConfigWithDefaults() Config {
return Config{
SidecarImagePullPolicy: "Always",
}
}
这个方法只被下面的 GetConfigFromEnvironment() 方法调用。
从环境中获取配置
func GetConfigFromEnvironment() (Config, error) {
c := NewConfigWithDefaults()
err := envconfig.Process("", &c)
return c, err
}
envconfig.Process() 的代码实现会通过反射读取到 Config 结构体的信息,然后根据设定的环境变量名来读取。
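用一段小例子感受一下 envconfig 的行为(示意代码,非 dapr 源码;环境变量的值是随便写的):只要环境变量名和 struct tag 对上,Process 就会把值填进结构体;标记为 required 的字段如果没有对应的环境变量,则返回错误。
// 示意:envconfig 按 struct tag 从环境变量填充 Config
os.Setenv("TLS_CERT_FILE", "/dapr/cert/tls.crt")
os.Setenv("TLS_KEY_FILE", "/dapr/cert/tls.key")
os.Setenv("SIDECAR_IMAGE", "docker.io/example/daprd:latest")
os.Setenv("NAMESPACE", "dapr-system")
// SIDECAR_IMAGE_PULL_POLICY 不设置,保留 NewConfigWithDefaults() 给的默认值 "Always"
cfg, err := injector.GetConfigFromEnvironment()
if err != nil {
	log.Fatalf("error getting config: %s", err)
}
fmt.Println(cfg.SidecarImage, cfg.SidecarImagePullPolicy) // docker.io/example/daprd:latest Always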
这个方法的调用只有一个地方,在injector main 函数的开始位置:
func main() {
log.Infof("starting Dapr Sidecar Injector -- version %s -- commit %s", version.Version(), version.Commit())
ctx := signals.Context()
cfg, err := injector.GetConfigFromEnvironment()
if err != nil {
log.Fatalf("error getting config: %s", err)
}
......
}
通过命令如 k describe pod dapr-sidecar-injector-6f656b7dd-sg87p -n dapr-system
拿到 injector pod 的yaml 文件,可以看到 Environment 的这一段:
Environment:
TLS_CERT_FILE: /dapr/cert/tls.crt
TLS_KEY_FILE: /dapr/cert/tls.key
SIDECAR_IMAGE: docker.io/skyao/daprd:dev-linux-amd64
SIDECAR_IMAGE_PULL_POLICY: IfNotPresent
NAMESPACE: dapr-system (v1:metadata.namespace)
以下是完整的 injector pod yaml,留着备用:
Name: dapr-sidecar-injector-6f656b7dd-sg87p
Namespace: dapr-system
Priority: 0
Node: docker-desktop/192.168.65.3
Start Time: Mon, 19 Apr 2021 15:04:07 +0800
Labels: app=dapr-sidecar-injector
app.kubernetes.io/component=sidecar-injector
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=dapr
app.kubernetes.io/part-of=dapr
app.kubernetes.io/version=dev-linux-amd64
pod-template-hash=6f656b7dd
Annotations: prometheus.io/path: /
prometheus.io/port: 9090
prometheus.io/scrape: true
Status: Running
IP: 10.1.2.162
IPs:
IP: 10.1.2.162
Controlled By: ReplicaSet/dapr-sidecar-injector-6f656b7dd
Containers:
dapr-sidecar-injector:
Container ID: docker://544dabf00bdaba9cf8f320218dd0b7e6d2ebce7fbf5184ce162d58bc693162d9
Image: docker.io/skyao/dapr:dev-linux-amd64
Image ID: docker-pullable://skyao/dapr@sha256:b4843ee78eabf014e15749bc4daa5c249ce3d33f796a89aaba9d117dd3dc76c9
Ports: 4000/TCP, 9090/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/injector
Args:
--log-level
info
--log-as-json
--enable-metrics
--metrics-port
9090
State: Running
Started: Mon, 19 Apr 2021 15:04:08 +0800
Ready: True
Restart Count: 0
Liveness: http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=5
Readiness: http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=5
Environment:
TLS_CERT_FILE: /dapr/cert/tls.crt
TLS_KEY_FILE: /dapr/cert/tls.key
SIDECAR_IMAGE: docker.io/skyao/daprd:dev-linux-amd64
SIDECAR_IMAGE_PULL_POLICY: IfNotPresent
NAMESPACE: dapr-system (v1:metadata.namespace)
Mounts:
/dapr/cert from cert (ro)
/var/run/secrets/kubernetes.io/serviceaccount from dapr-operator-token-cjpnd (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
cert:
Type: Secret (a volume populated by a Secret)
SecretName: dapr-sidecar-injector-cert
Optional: false
dapr-operator-token-cjpnd:
Type: Secret (a volume populated by a Secret)
SecretName: dapr-operator-token-cjpnd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned dapr-system/dapr-sidecar-injector-6f656b7dd-sg87p to docker-desktop
Normal Pulled 17m kubelet Container image "docker.io/skyao/dapr:dev-linux-amd64" already present on machine
Normal Created 17m kubelet Created container dapr-sidecar-injector
Normal Started 17m kubelet Started container dapr-sidecar-injector
Injector 是Dapr运行时 sidecar 注入组件的接口。
// Injector is the interface for the Dapr runtime sidecar injection component
type Injector interface {
Run(ctx context.Context)
}
injector 结构体定义:
type injector struct {
config Config
deserializer runtime.Decoder
server *http.Server
kubeClient *kubernetes.Clientset
daprClient scheme.Interface
authUIDs []string
}
创建新的 injector 结构体(这个方法在 injector 的 main 方法中被调用):
// NewInjector returns a new Injector instance with the given config
func NewInjector(authUIDs []string, config Config, daprClient scheme.Interface, kubeClient *kubernetes.Clientset) Injector {
mux := http.NewServeMux()
i := &injector{
config: config,
deserializer: serializer.NewCodecFactory(
runtime.NewScheme(),
).UniversalDeserializer(),
// 构造 http server,真正的监听在 Run() 中进行
server: &http.Server{
Addr: fmt.Sprintf(":%d", port),
Handler: mux,
},
kubeClient: kubeClient,
daprClient: daprClient,
authUIDs: authUIDs,
}
// 给 k8s 调用的 mutate 端点
mux.HandleFunc("/mutate", i.handleRequest)
return i
}
最核心的 Run 方法:
func (i *injector) Run(ctx context.Context) {
doneCh := make(chan struct{})
// 启动go routing,监听 ctx 和 doneCh 的信号
go func() {
select {
case <-ctx.Done():
log.Info("Sidecar injector is shutting down")
shutdownCtx, cancel := context.WithTimeout(
context.Background(),
time.Second*5,
)
defer cancel()
i.server.Shutdown(shutdownCtx) // nolint: errcheck
case <-doneCh:
}
}()
// 打印启动时的日志,这行日志可以通过 kubectl logs 命令看到(见下文)
log.Infof("Sidecar injector is listening on %s, patching Dapr-enabled pods", i.server.Addr)
// TODO:这里有时会报错,证书有问题,导致injector无法正常工作,后面再来检查
err := i.server.ListenAndServeTLS(i.config.TLSCertFile, i.config.TLSKeyFile)
if err != http.ErrServerClosed {
log.Errorf("Sidecar injector error: %s", err)
}
close(doneCh)
}
可以对比通过 k logs dapr-sidecar-injector-86b8dc4dcd-bkbgw -n dapr-system
命令查看到的 injector 日志内容:
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"log level set to: info","scope":"dapr.injector","time":"2021-05-11T01:13:20.1904136Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"metrics server started on :9090/","scope":"dapr.metrics","time":"2021-05-11T01:13:20.1907347Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"starting Dapr Sidecar Injector -- version edge -- commit v1.0.0-rc.4-163-g9a4210a-dirty","scope":"dapr.injector","time":"2021-05-11T01:13:20.191669Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"Healthz server is listening on :8080","scope":"dapr.injector","time":"2021-05-11T01:13:20.1928941Z","type":"log","ver":"unknown"}
{"instance":"dapr-sidecar-injector-86b8dc4dcd-bkbgw","level":"info","msg":"Sidecar injector is listening on :4000, patching Dapr-enabled pods","scope":"dapr.injector","time":"2021-05-11T01:13:20.208587Z","type":"log","ver":"unknown"}
handleRequest方法用来处理来自 k8s api server的 mutate 调用:
mux.HandleFunc("/mutate", i.handleRequest)
func (i *injector) handleRequest(w http.ResponseWriter, r *http.Request) {
......
}
代码比较长,忽略部分细节代码。
读取请求的body,验证长度和content-type:
defer r.Body.Close()
var body []byte
if r.Body != nil {
if data, err := ioutil.ReadAll(r.Body); err == nil {
body = data
}
}
if len(body) == 0 {
log.Error("empty body")
http.Error(w, "empty body", http.StatusBadRequest)
return
}
contentType := r.Header.Get("Content-Type")
if contentType != "application/json" {
log.Errorf("Content-Type=%s, expect application/json", contentType)
http.Error(
w,
"invalid Content-Type, expect `application/json`",
http.StatusUnsupportedMediaType,
)
return
}
反序列化body,并做一些基本的验证:
ar := v1.AdmissionReview{}
_, gvk, err := i.deserializer.Decode(body, nil, &ar)
if err != nil {
log.Errorf("Can't decode body: %v", err)
} else {
if !utils.StringSliceContains(ar.Request.UserInfo.UID, i.authUIDs) {
err = errors.Wrapf(err, "unauthorized request")
log.Error(err)
} else if ar.Request.Kind.Kind != "Pod" {
err = errors.Wrapf(err, "invalid kind for review: %s", ar.Kind)
log.Error(err)
} else {
patchOps, err = i.getPodPatchOperations(&ar, i.config.Namespace, i.config.SidecarImage, i.config.SidecarImagePullPolicy, i.kubeClient, i.daprClient)
}
}
getPodPatchOperations 是核心代码,后面细看。
统一处理前面可能产生的错误,以及 getPodPatchOperations() 的处理结果:
diagAppID := getAppIDFromRequest(ar.Request)
if err != nil {
admissionResponse = toAdmissionResponse(err)
log.Errorf("Sidecar injector failed to inject for app '%s'. Error: %s", diagAppID, err)
monitoring.RecordFailedSidecarInjectionCount(diagAppID, "patch")
} else if len(patchOps) == 0 {
// len(patchOps) == 0 表示什么都没改,返回 Allowed: true
admissionResponse = &v1.AdmissionResponse{
Allowed: true,
}
} else {
var patchBytes []byte
// 将 patchOps 序列化为json
patchBytes, err = json.Marshal(patchOps)
if err != nil {
admissionResponse = toAdmissionResponse(err)
} else {
// 返回AdmissionResponse
admissionResponse = &v1.AdmissionResponse{
Allowed: true,
Patch: patchBytes,
PatchType: func() *v1.PatchType {
pt := v1.PatchTypeJSONPatch
return &pt
}(),
}
}
}
组装 AdmissionReview:
admissionReview := v1.AdmissionReview{}
if admissionResponse != nil {
admissionReview.Response = admissionResponse
if ar.Request != nil {
admissionReview.Response.UID = ar.Request.UID
admissionReview.SetGroupVersionKind(*gvk)
}
}
将应答序列化并返回:
log.Infof("ready to write response ...")
respBytes, err := json.Marshal(admissionReview)
if err != nil {
http.Error(
w,
err.Error(),
http.StatusInternalServerError,
)
log.Errorf("Sidecar injector failed to inject for app '%s'. Can't deserialize response: %s", diagAppID, err)
monitoring.RecordFailedSidecarInjectionCount(diagAppID, "response")
}
w.Header().Set("Content-Type", "application/json")
if _, err := w.Write(respBytes); err != nil {
log.Error(err)
} else {
log.Infof("Sidecar injector succeeded injection for app '%s'", diagAppID)
monitoring.RecordSuccessfulSidecarInjectionCount(diagAppID)
}
toAdmissionResponse 方法用于从一个 error 创建 k8s 的 AdmissionResponse :
// toAdmissionResponse is a helper function to create an AdmissionResponse
// with an embedded error
func toAdmissionResponse(err error) *v1.AdmissionResponse {
return &v1.AdmissionResponse{
Result: &metav1.Status{
Message: err.Error(),
},
}
}
getAppIDFromRequest() 方法从 AdmissionRequest 中获取AppID:
func getAppIDFromRequest(req *v1.AdmissionRequest) string {
// default App ID
appID := ""
// if req is not given
if req == nil {
return appID
}
var pod corev1.Pod
// 解析pod的raw数据为json
if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
log.Warnf("could not unmarshal raw object: %v", err)
} else {
// 然后从pod信息中获取appID
appID = getAppID(pod)
}
return appID
}
getAppID()方法的实现如下,首先读取 “dapr.io/app-id” 的 Annotation,如果没有,则取 pod 的 name 作为默认AppID:
const appIDKey = "dapr.io/app-id"
func getAppID(pod corev1.Pod) string {
return getStringAnnotationOrDefault(pod.Annotations, appIDKey, pod.GetName())
}
AllowedControllersServiceAccountUID()方法返回UID数组,这些是 webhook handler 上容许的 service account 列表:
var allowedControllersServiceAccounts = []string{
"replicaset-controller",
"deployment-controller",
"cronjob-controller",
"job-controller",
"statefulset-controller",
}
// AllowedControllersServiceAccountUID returns an array of UID, list of allowed service account on the webhook handler
func AllowedControllersServiceAccountUID(ctx context.Context, kubeClient *kubernetes.Clientset) ([]string, error) {
allowedUids := []string{}
for i, allowedControllersServiceAccount := range allowedControllersServiceAccounts {
saUUID, err := getServiceAccount(ctx, kubeClient, allowedControllersServiceAccount)
// i == 0 => "replicaset-controller" is the only one mandatory
if err != nil && i == 0 {
return nil, err
} else if err != nil {
log.Warnf("Unable to get SA %s UID (%s)", allowedControllersServiceAccount, err)
continue
}
allowedUids = append(allowedUids, saUUID)
}
return allowedUids, nil
}
func getServiceAccount(ctx context.Context, kubeClient *kubernetes.Clientset, allowedControllersServiceAccount string) (string, error) {
ctxWithTimeout, cancel := context.WithTimeout(ctx, getKubernetesServiceAccountTimeoutSeconds*time.Second)
defer cancel()
sa, err := kubeClient.CoreV1().ServiceAccounts(metav1.NamespaceSystem).Get(ctxWithTimeout, allowedControllersServiceAccount, metav1.GetOptions{})
if err != nil {
return "", err
}
return string(sa.ObjectMeta.UID), nil
}
代码非常简单,只定义了一个结构体 PatchOperation,用来表示要应用于Kubernetes资源的一个单独的变化。
// PatchOperation represents a discreet change to be applied to a Kubernetes resource
type PatchOperation struct {
Op string `json:"op"`
Path string `json:"path"`
Value interface{} `json:"value,omitempty"`
}
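PatchOperation 对应的其实就是 JSON Patch(RFC 6902)里的一条操作,后面 AdmissionResponse 里的 PatchTypeJSONPatch 也印证了这一点,Path 遵循 JSON Pointer 语法。下面用一小段示意代码(非 dapr 源码,镜像名为示例)看一下 injector 追加 daprd container 时生成的 patch 长什么样:
// 示意:追加 daprd container 的一条 JSON Patch 操作
op := PatchOperation{
	Op:   "add",
	Path: "/spec/containers/-", // "-" 表示追加到 containers 数组末尾
	Value: map[string]interface{}{
		"name":  "daprd",
		"image": "docker.io/example/daprd:latest",
	},
}
b, _ := json.Marshal([]PatchOperation{op})
fmt.Println(string(b))
// [{"op":"add","path":"/spec/containers/-","value":{"image":"docker.io/example/daprd:latest","name":"daprd"}}]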
getPodPatchOperations() 是最重要的方法,injector 对 pod 的修改就在这里进行:
func (i *injector) getPodPatchOperations(ar *v1.AdmissionReview,
namespace, image, imagePullPolicy string, kubeClient *kubernetes.Clientset, daprClient scheme.Interface) ([]PatchOperation, error) {
......
return patchOps, nil
}
解析request,得到 pod 对象 (这里和前面重复了?):
req := ar.Request
var pod corev1.Pod
if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
errors.Wrap(err, "could not unmarshal raw object")
return nil, err
}
判断是否需要 injector 做处理:
if !isResourceDaprEnabled(pod.Annotations) || podContainsSidecarContainer(&pod) {
return nil, nil
}
// 判断是否启动了dapr,依据是是否设置 annotation "dapr.io/enabled" 为 true,默认为false
const daprEnabledKey = "dapr.io/enabled"
func isResourceDaprEnabled(annotations map[string]string) bool {
return getBoolAnnotationOrDefault(annotations, daprEnabledKey, false)
}
// 判断是否包含了 dapr 的 sidecar container
const sidecarContainerName = "daprd"
func podContainsSidecarContainer(pod *corev1.Pod) bool {
for _, c := range pod.Spec.Containers {
// 检测方式是循环pod中的所有container,检查是否有container的名字为 "daprd"
if c.Name == sidecarContainerName {
return true
}
}
return false
}
创建 daprd sidecar container:
sidecarContainer, err := getSidecarContainer(pod.Annotations, id, image, imagePullPolicy, req.Namespace, apiSrvAddress, placementAddress, tokenMount, trustAnchors, certChain, certKey, sentryAddress, mtlsEnabled, identity)
getSidecarContainer()的细节后面看,先走完主流程。
patchOps := []PatchOperation{}
envPatchOps := []PatchOperation{}
var path string
var value interface{}
if len(pod.Spec.Containers) == 0 {
// 如果pod的container数量为0(什么情况下会有这种没有container的pod?)
path = containersPath
value = []corev1.Container{*sidecarContainer}
} else {
// 将 daprd 的sidecar 加入
envPatchOps = addDaprEnvVarsToContainers(pod.Spec.Containers)
// TODO:path 的设值有什么规范或者要求?
path = "/spec/containers/-"
value = sidecarContainer
}
patchOps = append(
patchOps,
PatchOperation{
Op: "add",
Path: path,
Value: value,
},
)
patchOps = append(patchOps, envPatchOps...)
// This function add Dapr environment variables to all the containers in any Dapr enabled pod.
// The containers can be injected or user defined.
func addDaprEnvVarsToContainers(containers []corev1.Container) []PatchOperation {
portEnv := []corev1.EnvVar{
{
Name: userContainerDaprHTTPPortName,
Value: strconv.Itoa(sidecarHTTPPort),
},
{
Name: userContainerDaprGRPCPortName,
Value: strconv.Itoa(sidecarAPIGRPCPort),
},
}
envPatchOps := make([]PatchOperation, 0, len(containers))
for i, container := range containers {
path := fmt.Sprintf("%s/%d/env", containersPath, i)
patchOps := getEnvPatchOperations(container.Env, portEnv, path)
envPatchOps = append(envPatchOps, patchOps...)
}
return envPatchOps
}
mtlsEnabled := mTLSEnabled(daprClient)
if mtlsEnabled {
trustAnchors, certChain, certKey = getTrustAnchorsAndCertChain(kubeClient, namespace)
identity = fmt.Sprintf("%s:%s", req.Namespace, pod.Spec.ServiceAccountName)
}
func mTLSEnabled(daprClient scheme.Interface) bool {
resp, err := daprClient.ConfigurationV1alpha1().Configurations(meta_v1.NamespaceAll).List(meta_v1.ListOptions{})
if err != nil {
log.Errorf("Failed to load dapr configuration from k8s, use default value %t for mTLSEnabled: %s", defaultMtlsEnabled, err)
return defaultMtlsEnabled
}
for _, c := range resp.Items {
if c.GetName() == defaultConfig {
return c.Spec.MTLSSpec.Enabled
}
}
log.Infof("Dapr system configuration (%s) is not found, use default value %t for mTLSEnabled", defaultConfig, defaultMtlsEnabled)
return defaultMtlsEnabled
}
components-contrib仓库中的代码:
代码量比较少,就放在一起看吧。
workflow 接口定义了 workflow 上要履行的操作:
var ErrNotImplemented = errors.New("this component doesn't implement the current API operation")
type Workflow interface {
Init(metadata Metadata) error
Start(ctx context.Context, req *StartRequest) (*StartResponse, error)
Terminate(ctx context.Context, req *TerminateRequest) error
Get(ctx context.Context, req *GetRequest) (*StateResponse, error)
RaiseEvent(ctx context.Context, req *RaiseEventRequest) error
Purge(ctx context.Context, req *PurgeRequest) error
Pause(ctx context.Context, req *PauseRequest) error
Resume(ctx context.Context, req *ResumeRequest) error
}
其中 Init 是初始化 workflow 实现。
Start / Terminate / Pause / Resume 是 workflow 的生命周期管理。
如果没有实现上述操作,则需要返回错误,而错误信息在 ErrNotImplemented 中有统一给出。
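也就是说,某个实现不支持的操作直接返回这个统一的错误即可。下面是一段示意代码(非 dapr 源码,myWorkflow 为假设的实现类型):
// 示意:不支持的操作直接返回统一的 ErrNotImplemented
func (s *myWorkflow) RaiseEvent(ctx context.Context, req *workflows.RaiseEventRequest) error {
	return workflows.ErrNotImplemented
}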
通过 metadata 进行初始化,和其他组件类似:
type Workflow interface {
Init(metadata Metadata) error
......
}
type Metadata struct {
metadata.Base `json:",inline"`
}
start 操作用来开始一个工作流:
type Workflow interface {
Start(ctx context.Context, req *StartRequest) (*StartResponse, error)
......
}
// StartRequest is the struct describing a start workflow request.
type StartRequest struct {
InstanceID string `json:"instanceID"`
Options map[string]string `json:"options"`
WorkflowName string `json:"workflowName"`
WorkflowInput []byte `json:"workflowInput"`
}
type StartResponse struct {
InstanceID string `json:"instanceID"`
}
Start 操作的请求参数包括 InstanceID、Options、WorkflowName 和 WorkflowInput;响应参数只有 InstanceID,即启动的工作流实例 ID。
Terminate 操作用来终止一个 workflow:
type Workflow interface {
Terminate(ctx context.Context, req *TerminateRequest) error
}
type TerminateRequest struct {
InstanceID string `json:"instanceID"`
}
Terminate 操作的请求只需要传递一个 InstanceID 参数。
Get 操作用来获取一个工作流实例的状态:
type Workflow interface {
Get(ctx context.Context, req *GetRequest) (*StateResponse, error)
......
}
type GetRequest struct {
InstanceID string `json:"instanceID"`
}
type StateResponse struct {
Workflow *WorkflowState `json:"workflow"`
}
type WorkflowState struct {
InstanceID string `json:"instanceID"`
WorkflowName string `json:"workflowName"`
CreatedAt time.Time `json:"startedAt"`
LastUpdatedAt time.Time `json:"lastUpdatedAt"`
RuntimeStatus string `json:"runtimeStatus"`
Properties map[string]string `json:"properties"`
}
Get 操作的请求只需要传递一个 InstanceID 参数。
Get 操作的响应参数是 WorkflowState,字段有 InstanceID、WorkflowName、CreatedAt、LastUpdatedAt、RuntimeStatus 和 Properties。
Purge 操作用来清除一个 workflow 实例的数据:
type Workflow interface {
Purge(ctx context.Context, req *PurgeRequest) error
}
type PurgeRequest struct {
InstanceID string `json:"instanceID"`
}
Purge 操作的请求只需要传递一个 InstanceID 参数。
Pause 操作用来暂停一个 workflow:
type Workflow interface {
Pause(ctx context.Context, req *PauseRequest) error
}
type PauseRequest struct {
InstanceID string `json:"instanceID"`
}
Pause 操作的请求只需要传递一个 InstanceID 参数。
Resume 操作用来继续一个 workflow:
type Workflow interface {
Resume(ctx context.Context, req *ResumeRequest) error
}
type ResumeRequest struct {
InstanceID string `json:"instanceID"`
}
Resume 操作的请求只需要传递一个 InstanceID 参数。
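把上面这些操作串成一个典型的调用序列,大致如下(示意代码,非 dapr 源码;wf 是任意一个 workflows.Workflow 实现,task_queue 这个 option 是下文 temporal 实现所要求的):
// 示意:Workflow 组件的典型生命周期调用
ctx := context.Background()
startResp, err := wf.Start(ctx, &workflows.StartRequest{
	InstanceID:    "order-123",
	WorkflowName:  "OrderWorkflow",
	Options:       map[string]string{"task_queue": "orders"},
	WorkflowInput: []byte(`{"amount":100}`),
})
if err != nil {
	log.Fatal(err)
}
state, _ := wf.Get(ctx, &workflows.GetRequest{InstanceID: startResp.InstanceID})
fmt.Println(state.Workflow.RuntimeStatus)
_ = wf.Pause(ctx, &workflows.PauseRequest{InstanceID: startResp.InstanceID})
_ = wf.Resume(ctx, &workflows.ResumeRequest{InstanceID: startResp.InstanceID})
_ = wf.Terminate(ctx, &workflows.TerminateRequest{InstanceID: startResp.InstanceID})
_ = wf.Purge(ctx, &workflows.PurgeRequest{InstanceID: startResp.InstanceID})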
TemporalWF 结构体包含 temporal 的 client:
type TemporalWF struct {
client client.Client
logger logger.Logger
}
temporalMetadata 结构体定义 metadata:
type temporalMetadata struct {
Identity string `json:"identity" mapstructure:"identity"`
HostPort string `json:"hostport" mapstructure:"hostport"`
Namespace string `json:"namespace" mapstructure:"namespace"`
}
// NewTemporalWorkflow returns a new workflow.
func NewTemporalWorkflow(logger logger.Logger) workflows.Workflow {
s := &TemporalWF{
logger: logger,
}
return s
}
func (c *TemporalWF) Init(metadata workflows.Metadata) error {
c.logger.Debugf("Temporal init start")
m, err := c.parseMetadata(metadata)
if err != nil {
return err
}
cOpt := client.Options{}
if m.HostPort != "" {
cOpt.HostPort = m.HostPort
}
if m.Identity != "" {
cOpt.Identity = m.Identity
}
if m.Namespace != "" {
cOpt.Namespace = m.Namespace
}
// Create the workflow client
newClient, err := client.Dial(cOpt)
if err != nil {
return err
}
c.client = newClient
return nil
}
func (c *TemporalWF) parseMetadata(meta workflows.Metadata) (*temporalMetadata, error) {
var m temporalMetadata
err := metadata.DecodeMetadata(meta.Properties, &m)
return &m, err
}
func (c *TemporalWF) Start(ctx context.Context, req *workflows.StartRequest) (*workflows.StartResponse, error) {
c.logger.Debugf("starting workflow")
if len(req.Options) == 0 {
c.logger.Debugf("no options provided")
return nil, errors.New("no options provided. At the very least, a task queue is needed")
}
if _, ok := req.Options["task_queue"]; !ok {
c.logger.Debugf("no task queue provided")
return nil, errors.New("no task queue provided")
}
taskQ := req.Options["task_queue"]
opt := client.StartWorkflowOptions{ID: req.InstanceID, TaskQueue: taskQ}
var inputArgs interface{}
if err := decodeInputData(req.WorkflowInput, &inputArgs); err != nil {
return nil, fmt.Errorf("error decoding workflow input data: %w", err)
}
run, err := c.client.ExecuteWorkflow(ctx, opt, req.WorkflowName, inputArgs)
if err != nil {
return nil, fmt.Errorf("error executing workflow: %w", err)
}
wfStruct := workflows.StartResponse{InstanceID: run.GetID()}
return &wfStruct, nil
}
代码和 temporal 的牵连还是很重的,WorkflowInput 相当于透传给了 temporal ,dapr 对此没有做任何的抽象和封装,只是简单透传。
func (c *TemporalWF) Terminate(ctx context.Context, req *workflows.TerminateRequest) error {
c.logger.Debugf("terminating workflow")
err := c.client.TerminateWorkflow(ctx, req.InstanceID, "", "")
if err != nil {
return fmt.Errorf("error terminating workflow: %w", err)
}
return nil
}
sentry 模块的入口在文件 cmd/sentry/main.go
中。
const (
defaultCredentialsPath = "/var/run/dapr/credentials"
// defaultDaprSystemConfigName is the default resource object name for Dapr System Config.
defaultDaprSystemConfigName = "daprsystem"
healthzPort = 8080
)
func main() {
configName := flag.String("config", defaultDaprSystemConfigName, "Path to config file, or name of a configuration object")
credsPath := flag.String("issuer-credentials", defaultCredentialsPath, "Path to the credentials directory holding the issuer data")
flag.StringVar(&credentials.RootCertFilename, "issuer-ca-filename", credentials.RootCertFilename, "Certificate Authority certificate filename")
flag.StringVar(&credentials.IssuerCertFilename, "issuer-certificate-filename", credentials.IssuerCertFilename, "Issuer certificate filename")
flag.StringVar(&credentials.IssuerKeyFilename, "issuer-key-filename", credentials.IssuerKeyFilename, "Issuer private key filename")
trustDomain := flag.String("trust-domain", "localhost", "The CA trust domain")
tokenAudience := flag.String("token-audience", "", "Expected audience for tokens; multiple values can be separated by a comma")
......
}
logger 和 metrics 的参数需要展开:
loggerOptions := logger.DefaultOptions()
loggerOptions.AttachCmdFlags(flag.StringVar, flag.BoolVar)
metricsExporter := metrics.NewExporter(metrics.DefaultMetricNamespace)
metricsExporter.Options().AttachCmdFlags(flag.StringVar, flag.BoolVar)
获取 k8s 的 配置文件路径:
var kubeconfig *string
if home := homedir.HomeDir(); home != "" {
// 读取 home 路径
kubeconfig = flag.String("kubeconfig", filepath.Join(home, ".kube", "config"), "(optional) absolute path to the kubeconfig file")
} else {
// 通过 `--kubeconfig` 传递完整的 kubeconfig 文件路径
kubeconfig = flag.String("kubeconfig", "", "absolute path to the kubeconfig file")
}
最后解析一把:
flag.Parse()
将 kubeconfig 的值设置到 KUBE_CONFIG 环境变量:
var (
KubeConfigVar = "KUBE_CONFIG"
)
if err := utils.SetEnvVariables(map[string]string{
utils.KubeConfigVar: *kubeconfig,
}); err != nil {
log.Fatalf("error set env failed: %s", err.Error())
}
这行日志标记着初始化正式开始:
log.Infof("starting sentry certificate authority -- version %s -- commit %s", buildinfo.Version(), buildinfo.Commit())
log.Infof("log level set to: %s", loggerOptions.OutputLevel)
// Initialize dapr metrics exporter
if err := metricsExporter.Init(); err != nil {
log.Fatal(err)
}
if err := monitoring.InitMetrics(); err != nil {
log.Fatal(err)
}
// 拼凑文件路径
issuerCertPath := filepath.Join(*credsPath, credentials.IssuerCertFilename) //issuer.crt
issuerKeyPath := filepath.Join(*credsPath, credentials.IssuerKeyFilename) // issuer.key
rootCertPath := filepath.Join(*credsPath, credentials.RootCertFilename) // ca.crt
// 读取 sentry 配置:
config, err := config.FromConfigName(*configName)
if err != nil {
log.Warn(err)
}
// 保存证书相关的各个路径和参数
config.IssuerCertPath = issuerCertPath
config.IssuerKeyPath = issuerKeyPath
config.RootCertPath = rootCertPath
config.TrustDomain = *trustDomain
if *tokenAudience != "" {
config.TokenAudience = tokenAudience
}
ca := sentry.NewSentryCA()
// Start the server in background
err = ca.Start(runCtx, config)
if err != nil {
log.Fatalf("failed to restart sentry server: %s", err)
}
log.Infof("starting watch on filesystem directory: %s", watchDir)
// Start the health server in background
go func() {
healthzServer := health.NewServer(log)
healthzServer.Ready()
if innerErr := healthzServer.Run(runCtx, healthzPort); innerErr != nil {
log.Fatalf("failed to start healthz server: %s", innerErr)
}
}()
issuerEvent := make(chan struct{})
watchDir := filepath.Dir(config.IssuerCertPath)
// Watch for changes in the watchDir
// This also blocks until runCtx is canceled
fswatcher.Watch(runCtx, watchDir, issuerEvent)
这个函数会一直阻塞直到 runCtx 被取消(这意味着要退出 sentry 进程)。
如果有文件更新,则 issuerEvent 会收到 event,issuerEvent 相关的处理代码:
go func() {
// Restart the server when the issuer credentials change
var restart <-chan time.Time
for {
select {
case <-issuerEvent:
monitoring.IssuerCertChanged()
log.Debug("received issuer credentials changed signal")
// Batch all signals within 2s of each other
if restart == nil {
// issuerEvent 不会被直接处理,而是安排在 2 秒发一个 restart event
// 2秒之内的各种 issuerEvent 都会被这个 restart event 集中处理
restart = time.After(2 * time.Second)
}
case <-restart:
// 收到 restart,意味着 issuerEvent 已经积攒了 2 秒钟,可以统一处理了
log.Warn("issuer credentials changed; reloading")
innerErr := ca.Restart(runCtx, config)
if innerErr != nil {
log.Fatalf("failed to restart sentry server: %s", innerErr)
}
// 重置 restart,恢复原样,以便处理 2 秒之后的后续 issuerEvent
restart = nil
}
}
}()
shutdownDuration := 5 * time.Second
log.Infof("allowing %s for graceful shutdown to complete", shutdownDuration)
<-time.After(shutdownDuration)
去除非核心代码,sentry main 函数的主要功能是启动 sentry 的 ca server, 并监控目录,如果有变化则重启 ca server。
sentry 模块的 proto 服务定义在文件 dapr/proto/sentry/v1/sentry.proto
中。
service CA {
// A request for a time-bound certificate to be signed.
//
// The requesting side must provide an id for both loosely based
// And strong based identities.
rpc SignCertificate (SignCertificateRequest) returns (SignCertificateResponse) {}
}
SignCertificate() 方法用于请求签署一个有时间限制的证书。请求方必须提供一个既可用于松散身份、也可用于强身份的 ID。
SignCertificateRequest 的定义:
message SignCertificateRequest {
string id = 1;
string token = 2;
string trust_domain = 3;
string namespace = 4;
// A PEM-encoded x509 CSR.
bytes certificate_signing_request = 5;
}
SignCertificateResponse 的定义:
message SignCertificateResponse {
// A PEM-encoded x509 Certificate.
bytes workload_certificate = 1;
// A list of PEM-encoded x509 Certificates that establish the trust chain
// between the workload certificate and the well-known trust root cert.
repeated bytes trust_chain_certificates = 2;
google.protobuf.Timestamp valid_until = 3;
}
trust_chain_certificates 是一个 PEM 编码的 x509 证书的列表,这些证书在 workload_certificate 和众所周知的信任根证书(trust root cert)之间建立信任链。
sentry 模块的主要实现在文件 pkg/sentry/sentry.go
中。
type CertificateAuthority interface {
Start(context.Context, config.SentryConfig) error
Stop()
Restart(context.Context, config.SentryConfig) error
}
Start 和 Restart 的函数签名是一样的。
type sentry struct {
conf config.SentryConfig // sentry的配置,启动时由 main 函数初始化后传入
ctx context.Context // 启动时由 main 函数初始化后传入
cancel context.CancelFunc
server server.CAServer // CA server
restartLock sync.Mutex // 用于 restart 的锁
running chan bool
stopping chan bool
}
Sentry.go 被 sentry main.go 调用,主要工作流程就是三个事情:
// 1. 初始化
ca := sentry.NewSentryCA()
// 2. 启动
err = ca.Start(runCtx, config)
// 3. 在需要时重启
innerErr := ca.Restart(runCtx, config)
备注:sentry main.go 没有调用 sentry的 stop(),这个 stop() 只在 restart() 方法中被调用。
NewSentryCA() 的实现:
// NewSentryCA returns a new Sentry Certificate Authority instance.
func NewSentryCA() CertificateAuthority {
return &sentry{
running: make(chan bool, 1),
}
}
什么都没干,只是初始化了 running 这个channel。
// Start the server in background.
func (s *sentry) Start(ctx context.Context, conf config.SentryConfig) error {
// If the server is already running, return an error
select {
case s.running <- true:
default:
return errors.New("CertificateAuthority server is already running")
}
// Create the CA server
s.conf = conf
certAuth, v := s.createCAServer()
// Start the server in background
s.ctx, s.cancel = context.WithCancel(ctx)
go s.run(certAuth, v)
// Wait 100ms to ensure a clean startup
time.Sleep(100 * time.Millisecond)
return nil
}
主要工作就是创建 CA server,然后运行服务。
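Start 开头那个 select 值得单独说一下:running 是一个容量为 1 的 channel,相当于一个非阻塞的"是否已在运行"标记。下面用一段独立的小例子(非 dapr 源码)演示这个惯用法:
// 示意:用容量为 1 的 channel 实现"只允许启动一次"的非阻塞检查
running := make(chan bool, 1)
tryStart := func() error {
	select {
	case running <- true:
		// 抢到唯一的缓冲位,说明之前没有实例在运行,可以启动
		return nil
	default:
		// 缓冲已满,说明已经在运行
		return errors.New("already running")
	}
}
fmt.Println(tryStart()) // <nil>
fmt.Println(tryStart()) // already running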
createCAServer() 方法加载信任锚和签发者证书,然后创建一个新的CA:
// Loads the trust anchors and issuer certs, then creates a new CA.
func (s *sentry) createCAServer() (ca.CertificateAuthority, identity.Validator) {
// Create CA
certAuth, authorityErr := ca.NewCertificateAuthority(s.conf)
if authorityErr != nil {
log.Fatalf("error getting certificate authority: %s", authorityErr)
}
log.Info("certificate authority loaded")
// Load the trust bundle
trustStoreErr := certAuth.LoadOrStoreTrustBundle()
if trustStoreErr != nil {
log.Fatalf("error loading trust root bundle: %s", trustStoreErr)
}
certExpiry := certAuth.GetCACertBundle().GetIssuerCertExpiry()
if certExpiry == nil {
log.Fatalf("error loading trust root bundle: missing certificate expiry")
} else {
// Need to be in an else block for the linter
log.Infof("trust root bundle loaded. issuer cert expiry: %s", certExpiry.String())
}
monitoring.IssuerCertExpiry(certExpiry)
// Create identity validator
v, validatorErr := s.createValidator()
if validatorErr != nil {
log.Fatalf("error creating validator: %s", validatorErr)
}
log.Info("validator created")
return certAuth, v
}
方法返回 ca.CertificateAuthority 和 identity.Validator 。
createValidator 的实现细节:
func (s *sentry) createValidator() (identity.Validator, error) {
if config.IsKubernetesHosted() { // 通过 KUBERNETES_SERVICE_HOST 环境变量来判断
// we're in Kubernetes, create client and init a new serviceaccount token validator
kubeClient, err := k8s.GetClient()
if err != nil {
return nil, fmt.Errorf("failed to create kubernetes client: %w", err)
}
// TODO: Remove once the NoDefaultTokenAudience feature is finalized
noDefaultTokenAudience := false
// 创建 kubernetes 的 Validator
return kubernetes.NewValidator(kubeClient, s.conf.GetTokenAudiences(), noDefaultTokenAudience), nil
}
// 创建 selfhosted 的 Validator
return selfhosted.NewValidator(), nil
}
run 方法运行 CA server,阻塞直到服务器关闭:
// Runs the CA server.
// This method blocks until the server is shut down.
func (s *sentry) run(certAuth ca.CertificateAuthority, v identity.Validator) {
s.server = server.NewCAServer(certAuth, v)
// In background, watch for the root certificate's expiration
go watchCertExpiry(s.ctx, certAuth)
// Watch for context cancelation to stop the server
go func() {
<-s.ctx.Done()
s.server.Shutdown()
close(s.running)
s.running = make(chan bool, 1)
if s.stopping != nil {
close(s.stopping)
}
}()
// Start the server; this is a blocking call
log.Infof("sentry certificate authority is running, protecting y'all")
serverRunErr := s.server.Run(s.conf.Port, certAuth.GetCACertBundle())
if serverRunErr != nil {
log.Fatalf("error starting gRPC server: %s", serverRunErr)
}
}
启动 ca 的 grpc server 以便接收外部请求。
run() 方法中启动了一个 goroutine,用于监控证书是否过期。如果快要过期了,则会打印警告信息。
// Watches certificates' expiry and shows an error message when they're nearing expiration time.
// This is a blocking method that should be run in its own goroutine.
func watchCertExpiry(ctx context.Context, certAuth ca.CertificateAuthority) {
log.Debug("starting root certificate expiration watcher")
// ticker 每小时触发一次
certExpiryCheckTicker := time.NewTicker(time.Hour)
for {
select {
case <-certExpiryCheckTicker.C:
caCrt := certAuth.GetCACertBundle().GetRootCertPem()
block, _ := pem.Decode(caCrt)
cert, certParseErr := x509.ParseCertificate(block.Bytes)
if certParseErr != nil {
log.Warn("could not determine Dapr root certificate expiration time")
break
}
if cert.NotAfter.Before(time.Now().UTC()) {
// 已经过期则报警
log.Warn("Dapr root certificate expiration warning: certificate has expired.")
break
}
if (cert.NotAfter.Add(-30 * 24 * time.Hour)).Before(time.Now().UTC()) {
// 有效期不足30天也报警
expiryDurationHours := int(cert.NotAfter.Sub(time.Now().UTC()).Hours())
log.Warnf("Dapr root certificate expiration warning: certificate expires in %d days and %d hours", expiryDurationHours/24, expiryDurationHours%24)
} else {
validity := cert.NotAfter.Sub(time.Now().UTC())
log.Debugf("Dapr root certificate is still valid for %s", validity.String())
}
case <-ctx.Done():
log.Debug("terminating root certificate expiration watcher")
certExpiryCheckTicker.Stop()
return
}
}
}
// Stop the server.
func (s *sentry) Stop() {
log.Info("sentry certificate authority is shutting down")
if s.cancel != nil {
s.stopping = make(chan bool)
s.cancel()
<-s.stopping
s.stopping = nil
}
}
Restart() 方法重启 sentry:
func (s *sentry) Restart(ctx context.Context, conf config.SentryConfig) error {
s.restartLock.Lock()
defer s.restartLock.Unlock()
log.Info("sentry certificate authority is restarting")
s.Stop()
// Wait 200ms to ensure a clean shutdown
time.Sleep(200 * time.Millisecond)
return s.Start(ctx, conf)
}
步骤:先通过 restartLock 加锁防止并发重启,再调用 Stop() 停止当前服务,等待 200ms 确保完全关闭,最后用新配置调用 Start() 重新启动。
ca server 的实现在文件 pkg/sentry/server/server.go 中。
// CAServer is an interface for the Certificate Authority server.
type CAServer interface {
Run(port int, trustBundle ca.TrustRootBundler) error
Shutdown()
}
type server struct {
certificate *tls.Certificate
certAuth ca.CertificateAuthority
srv *grpc.Server // grpc server,用来对外提供 grpc 服务
validator identity.Validator
}
server.go 被 sentry.go 调用,主要工作流程就是三个事情:
// 1. 初始化CA server
s.server = server.NewCAServer(certAuth, v)
// 2. 运行CA server
s.server.Run(s.conf.Port, certAuth.GetCACertBundle())
// 3. 在需要时关闭CA server
s.server.Shutdown()
// NewCAServer returns a new CA Server running a gRPC server.
func NewCAServer(ca ca.CertificateAuthority, validator identity.Validator) CAServer {
return &server{
certAuth: ca,
validator: validator,
}
}
保存传递进来的参数,这两个参数在 sentry.go 中初始化。
CA server 主要提供两个功能:
// Run starts a secured gRPC server for the Sentry Certificate Authority.
// It enforces client side cert validation using the trust root cert.
func (s *server) Run(port int, trustBundler ca.TrustRootBundler) error {
addr := fmt.Sprintf(":%d", port)
lis, err := net.Listen("tcp", addr)
if err != nil {
return fmt.Errorf("could not listen on %s: %w", addr, err)
}
tlsOpt := s.tlsServerOption(trustBundler)
// 创建 grpc server
s.srv = grpc.NewServer(tlsOpt)
// 注册 ca server 到 grpc server
sentryv1pb.RegisterCAServer(s.srv, s)
// 启动 grpc server 监听服务地址
if err := s.srv.Serve(lis); err != nil {
return fmt.Errorf("grpc serve error: %w", err)
}
return nil
}
trustBundler 是从 sentry.go 中传递过来,后面详细展开。
func (s *server) Shutdown() {
if s.srv != nil {
// 调用 grpc 的 GracefulStop,会在请求完成后再关闭
s.srv.GracefulStop()
}
}
tlsServerOption() 方法,为客户端连接准备 tls 相关的选项:
func (s *server) tlsServerOption(trustBundler ca.TrustRootBundler) grpc.ServerOption {
cp := trustBundler.GetTrustAnchors()
//nolint:gosec
config := &tls.Config{
ClientCAs: cp,
// 这里要求验证客户端证书
// Require cert verification
ClientAuth: tls.RequireAndVerifyClientCert,
GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
if s.certificate == nil || needsRefresh(s.certificate, serverCertExpiryBuffer) {
// 如果ca server的证书为空,或者需要刷新,则开始创建/刷新证书
cert, err := s.getServerCertificate()
if err != nil {
monitoring.ServerCertIssueFailed("server_cert")
log.Error(err)
return nil, fmt.Errorf("failed to get TLS server certificate: %w", err)
}
s.certificate = cert
}
return s.certificate, nil
},
}
return grpc.Creds(credentials.NewTLS(config))
}
needsRefresh() 方法的实现:
func needsRefresh(cert *tls.Certificate, expiryBuffer time.Duration) bool {
leaf := cert.Leaf
if leaf == nil {
return true
}
// Check if the leaf certificate is about to expire.
// 检查是不是快要过期了:15 分钟
return leaf.NotAfter.Add(-serverCertExpiryBuffer).Before(time.Now().UTC())
}
const (
serverCertExpiryBuffer = time.Minute * 15
)
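下面用一个独立的小例子(示意代码,非 dapr 源码,判断逻辑内联了一份)演示这个刷新判断:当证书剩余有效期小于 15 分钟的缓冲期时就会触发刷新:
package main
import (
	"fmt"
	"time"
)
const expiryBuffer = 15 * time.Minute // 对应 serverCertExpiryBuffer
// 与上面 needsRefresh() 相同的判断逻辑
func shouldRefresh(notAfter time.Time) bool {
	return notAfter.Add(-expiryBuffer).Before(time.Now().UTC())
}
func main() {
	fmt.Println(shouldRefresh(time.Now().UTC().Add(10 * time.Minute))) // true:10 分钟后过期,需要刷新
	fmt.Println(shouldRefresh(time.Now().UTC().Add(time.Hour)))        // false:1 小时后过期,暂不需要
}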
getServerCertificate() 方法负责生成服务器端的证书:
func (s *server) getServerCertificate() (*tls.Certificate, error) {
csrPem, pkPem, err := csr.GenerateCSR("", false)
if err != nil {
return nil, err
}
now := time.Now().UTC()
issuerExp := s.certAuth.GetCACertBundle().GetIssuerCertExpiry()
if issuerExp == nil {
return nil, errors.New("could not find expiration in issuer certificate")
}
serverCertTTL := issuerExp.Sub(now)
resp, err := s.certAuth.SignCSR(csrPem, s.certAuth.GetCACertBundle().GetTrustDomain(), nil, serverCertTTL, false)
if err != nil {
return nil, err
}
certPem := resp.CertPEM
certPem = append(certPem, s.certAuth.GetCACertBundle().GetIssuerCertPem()...)
if rootCertPem := s.certAuth.GetCACertBundle().GetRootCertPem(); len(rootCertPem) > 0 {
certPem = append(certPem, rootCertPem...)
}
cert, err := tls.X509KeyPair(certPem, pkPem)
if err != nil {
return nil, err
}
return &cert, nil
}
更多细节要看 certAuth.SignCSR() 方法的实现。
SignCertificate() 方法处理从 dapr sidecar 发起的 CSR 请求。这个方法接收带有 identity 和初始证书的请求,并为调用者返回包括信任链在内的签名证书和过期时间。
// SignCertificate handles CSR requests originating from Dapr sidecars.
// The method receives a request with an identity and initial cert and returns
// A signed certificate including the trust chain to the caller along with an expiry date.
func (s *server) SignCertificate(ctx context.Context, req *sentryv1pb.SignCertificateRequest) (*sentryv1pb.SignCertificateResponse, error) {
monitoring.CertSignRequestReceived()
csrPem := req.GetCertificateSigningRequest()
// 解析请求中的 CSR
csr, err := certs.ParsePemCSR(csrPem)
if err != nil {
err = fmt.Errorf("cannot parse certificate signing request pem: %w", err)
log.Error(err)
monitoring.CertSignFailed("cert_parse")
return nil, err
}
// 验证 CSR
err = s.certAuth.ValidateCSR(csr)
if err != nil {
err = fmt.Errorf("error validating csr: %w", err)
log.Error(err)
monitoring.CertSignFailed("cert_validation")
return nil, err
}
// 验证请求身份
err = s.validator.Validate(req.GetId(), req.GetToken(), req.GetNamespace())
if err != nil {
err = fmt.Errorf("error validating requester identity: %w", err)
log.Error(err)
monitoring.CertSignFailed("req_id_validation")
return nil, err
}
// 签名证书
identity := identity.NewBundle(csr.Subject.CommonName, req.GetNamespace(), req.GetTrustDomain())
signed, err := s.certAuth.SignCSR(csrPem, csr.Subject.CommonName, identity, -1, false)
if err != nil {
err = fmt.Errorf("error signing csr: %w", err)
log.Error(err)
monitoring.CertSignFailed("cert_sign")
return nil, err
}
// 准备要返回的各种数据
certPem := signed.CertPEM
issuerCert := s.certAuth.GetCACertBundle().GetIssuerCertPem()
rootCert := s.certAuth.GetCACertBundle().GetRootCertPem()
certPem = append(certPem, issuerCert...)
if len(rootCert) > 0 {
certPem = append(certPem, rootCert...)
}
if len(certPem) == 0 {
err = errors.New("insufficient data in certificate signing request, no certs signed")
log.Error(err)
monitoring.CertSignFailed("insufficient_data")
return nil, err
}
expiry := timestamppb.New(signed.Certificate.NotAfter)
if err = expiry.CheckValid(); err != nil {
return nil, fmt.Errorf("could not validate certificate validity: %w", err)
}
// 组装 response 结构体
resp := &sentryv1pb.SignCertificateResponse{
WorkloadCertificate: certPem,
TrustChainCertificates: [][]byte{issuerCert, rootCert},
ValidUntil: expiry,
}
monitoring.CertSignSucceed()
return resp, nil
}
实现很简单,就是涉及到证书的各种操作,需要有相关的背景知识。
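作为补充,下面是一个示意性的 sidecar 侧调用流程(非 dapr 源码;CAClient stub、字段名和导入路径按 proto 定义推测):先用 csr.GenerateCSR() 生成 CSR,再携带身份信息调用 SignCertificate:
import (
	"context"
	"google.golang.org/grpc"
	sentryv1pb "github.com/dapr/dapr/pkg/proto/sentry/v1" // 假设的导入路径
	"github.com/dapr/dapr/pkg/sentry/csr"                 // 假设的导入路径
)
func requestWorkloadCert(ctx context.Context, conn *grpc.ClientConn, appID, token, namespace, trustDomain string) (*sentryv1pb.SignCertificateResponse, error) {
	// 生成 CSR;返回的私钥 pem 在真实场景中需要保留,用于之后组装 tls.Certificate
	csrPem, _, err := csr.GenerateCSR("", false)
	if err != nil {
		return nil, err
	}
	client := sentryv1pb.NewCAClient(conn)
	return client.SignCertificate(ctx, &sentryv1pb.SignCertificateRequest{
		Id:                        appID,
		Token:                     token,
		Namespace:                 namespace,
		TrustDomain:               trustDomain,
		CertificateSigningRequest: csrPem,
	})
}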
csr 相关的逻辑实现在文件 pkg/sentry/csr/csr.go 中。
const (
blockTypeECPrivateKey = "EC PRIVATE KEY" // EC private key
blockTypePrivateKey = "PRIVATE KEY" // PKCS#8 private key
encodeMsgCSR = "CERTIFICATE REQUEST"
encodeMsgCert = "CERTIFICATE"
)
// The OID for the SAN extension (http://www.alvestrand.no/objectid/2.5.29.17.html)
var oidSubjectAlternativeName = asn1.ObjectIdentifier{2, 5, 29, 17}
GenerateCSR() 方法创建 X.509 certificate sign request 和私钥:
// GenerateCSR creates a X.509 certificate sign request and private key.
func GenerateCSR(org string, pkcs8 bool) ([]byte, []byte, error) {
// 生成 ec 私钥
key, err := certs.GenerateECPrivateKey()
if err != nil {
return nil, nil, fmt.Errorf("unable to generate private keys: %w", err)
}
// 生成 csr 模版
templ, err := genCSRTemplate(org)
if err != nil {
return nil, nil, fmt.Errorf("error generating csr template: %w", err)
}
// 创建证书请求
csrBytes, err := x509.CreateCertificateRequest(rand.Reader, templ, key)
if err != nil {
return nil, nil, fmt.Errorf("failed to create CSR: %w", err)
}
// 编码证书
crtPem, keyPem, err := encode(true, csrBytes, key, pkcs8)
return crtPem, keyPem, err
}
生成 csr 模版的实现,只设置了 Organization:
func genCSRTemplate(org string) (*x509.CertificateRequest, error) {
return &x509.CertificateRequest{
Subject: pkix.Name{
Organization: []string{org},
},
}, nil
}
编码证书的实现代码:
func encode(csr bool, csrOrCert []byte, privKey *ecdsa.PrivateKey, pkcs8 bool) ([]byte, []byte, error) {
// 判断是 "CERTIFICATE" 还是 "CERTIFICATE REQUEST"
encodeMsg := encodeMsgCert
if csr {
encodeMsg = encodeMsgCSR
}
// 执行编码
csrOrCertPem := pem.EncodeToMemory(&pem.Block{Type: encodeMsg, Bytes: csrOrCert})
var encodedKey, privPem []byte
var err error
if pkcs8 {
// 如果是 pkcs8,需要将私钥编码为 PKCS8 私钥 / "PRIVATE KEY"
if encodedKey, err = x509.MarshalPKCS8PrivateKey(privKey); err != nil {
return nil, nil, err
}
// 将上面的 PKCS8 私钥编码到内存
privPem = pem.EncodeToMemory(&pem.Block{Type: blockTypePrivateKey, Bytes: encodedKey})
} else {
// 不是 pkcs8 的话,需要将私钥编码为 EC 私钥 / "EC PRIVATE KEY"
encodedKey, err = x509.MarshalECPrivateKey(privKey)
if err != nil {
return nil, nil, err
}
privPem = pem.EncodeToMemory(&pem.Block{Type: blockTypeECPrivateKey, Bytes: encodedKey})
}
return csrOrCertPem, privPem, nil
}
generateBaseCert() 方法返回一个基本的非CA证书,该证书可以通过添加 subject、key usage 和附加属性成为一个工作负载或CA证书:
// generateBaseCert returns a base non-CA cert that can be made a workload or CA cert
// By adding subjects, key usage and additional properties.
func generateBaseCert(ttl, skew time.Duration, publicKey interface{}) (*x509.Certificate, error) {
// 创建一个新的序列号
serNum, err := newSerialNumber()
if err != nil {
return nil, err
}
now := time.Now().UTC()
// Allow for clock skew with the NotBefore validity bound.
// 允许在 NotBefore 有效期内出现时钟偏移。
notBefore := now.Add(-1 * skew)
notAfter := now.Add(ttl)
// 创建并返回 x509 证书
return &x509.Certificate{
SerialNumber: serNum,
NotBefore: notBefore,
NotAfter: notAfter,
PublicKey: publicKey,
}, nil
}
创建一个新的序列号的代码实现细节:
func newSerialNumber() (*big.Int, error) {
// 序列号的最大值,1 << 128
serialNumLimit := new(big.Int).Lsh(big.NewInt(1), 128)
// 在这个区间内取随机数
serialNum, err := rand.Int(rand.Reader, serialNumLimit)
if err != nil {
return nil, fmt.Errorf("error generating serial number: %w", err)
}
return serialNum, nil
}
生成其他证书(根证书、工作负载证书)的第一步,都是先生成这个基础证书。
GenerateRootCertCSR() 方法返回 CA root cert x509 证书:
// GenerateRootCertCSR returns a CA root cert x509 Certificate.
func GenerateRootCertCSR(org, cn string, publicKey interface{}, ttl, skew time.Duration) (*x509.Certificate, error) {
// 先生成基本证书
cert, err := generateBaseCert(ttl, skew, publicKey)
if err != nil {
return nil, err
}
// 设置证书的参数
cert.KeyUsage = x509.KeyUsageCertSign
cert.ExtKeyUsage = append(cert.ExtKeyUsage, x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth)
cert.Subject = pkix.Name{
CommonName: cn,
Organization: []string{org},
}
cert.DNSNames = []string{cn}
cert.IsCA = true
cert.BasicConstraintsValid = true
cert.SignatureAlgorithm = x509.ECDSAWithSHA256
return cert, nil
}
GenerateCSRCertificate() 方法返回 x509 Certificate,输入为 CSR / 签名证书 / 公钥 / 签名私钥和持续时间:
// GenerateCSRCertificate returns an x509 Certificate from a CSR, signing cert, public key, signing private key and duration.
func GenerateCSRCertificate(csr *x509.CertificateRequest, subject string, identityBundle *identity.Bundle, signingCert *x509.Certificate, publicKey interface{}, signingKey crypto.PrivateKey,
ttl, skew time.Duration, isCA bool,
) ([]byte, error) {
// 先生成基本证书
cert, err := generateBaseCert(ttl, skew, publicKey)
if err != nil {
return nil, fmt.Errorf("error generating csr certificate: %w", err)
}
if isCA {
cert.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageCRLSign
} else {
cert.KeyUsage = x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment
cert.ExtKeyUsage = append(cert.ExtKeyUsage, x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth)
}
if subject == "cluster.local" {
cert.Subject = pkix.Name{
CommonName: subject,
}
cert.DNSNames = []string{subject}
}
cert.Issuer = signingCert.Issuer
cert.IsCA = isCA
cert.IPAddresses = csr.IPAddresses
cert.Extensions = csr.Extensions
cert.BasicConstraintsValid = true
cert.SignatureAlgorithm = csr.SignatureAlgorithm
if identityBundle != nil {
spiffeID, err := identity.CreateSPIFFEID(identityBundle.TrustDomain, identityBundle.Namespace, identityBundle.ID)
if err != nil {
return nil, fmt.Errorf("error generating spiffe id: %w", err)
}
rv := []asn1.RawValue{
{
Bytes: []byte(spiffeID),
Class: asn1.ClassContextSpecific,
Tag: asn1.TagOID,
},
{
Bytes: []byte(fmt.Sprintf("%s.%s.svc.cluster.local", subject, identityBundle.Namespace)),
Class: asn1.ClassContextSpecific,
Tag: 2,
},
}
b, err := asn1.Marshal(rv)
if err != nil {
return nil, fmt.Errorf("failed to marshal asn1 raw value for spiffe id: %w", err)
}
cert.ExtraExtensions = append(cert.ExtraExtensions, pkix.Extension{
Id: oidSubjectAlternativeName,
Value: b,
Critical: true, // According to x509 and SPIFFE specs, a SubjAltName extension must be critical if subject name and DNS are not present.
})
}
return x509.CreateCertificate(rand.Reader, cert, signingCert, publicKey, signingKey)
}
这里涉及很多 x509 相关的领域知识。
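为帮助理解签出的证书与信任链的关系,下面是一个示意性的验证片段(非 dapr 源码):用根证书池加上中间签发者证书,校验 GenerateCSRCertificate() 返回的 DER 编码证书:
import "crypto/x509"
func verifyWorkloadCert(leafDER []byte, roots, intermediates *x509.CertPool) error {
	leaf, err := x509.ParseCertificate(leafDER)
	if err != nil {
		return err
	}
	_, err = leaf.Verify(x509.VerifyOptions{
		Roots:         roots,         // 信任根(root cert)
		Intermediates: intermediates, // 签发者证书(issuer cert)
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}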
certs 相关的逻辑实现在文件 pkg/sentry/certs/certs.go 中。
const (
BlockTypeCertificate = "CERTIFICATE"
BlockTypeECPrivateKey = "EC PRIVATE KEY"
BlockTypePKCS1PrivateKey = "RSA PRIVATE KEY"
BlockTypePKCS8PrivateKey = "PRIVATE KEY"
)
备注:这里的常量定义和 csr.go 中的有部分重复。
Credentials 结构体包含一个证书和一个 私钥:
// Credentials holds a certificate and private key.
type Credentials struct {
PrivateKey crypto.PrivateKey
Certificate *x509.Certificate
}
DecodePEMKey() 接收一个 PEM key 字节数组并返回一个代表 RSA 或 EC 私钥 的对象:
func DecodePEMKey(key []byte) (crypto.PrivateKey, error) {
// 解码 pem key
block, _ := pem.Decode(key)
if block == nil {
return nil, errors.New("key is not PEM encoded")
}
// 按照类型进行后续解析处理
switch block.Type {
case BlockTypeECPrivateKey:
// EC Private Key
return x509.ParseECPrivateKey(block.Bytes)
case BlockTypePKCS1PrivateKey:
// PKCS1 Private Key
return x509.ParsePKCS1PrivateKey(block.Bytes)
case BlockTypePKCS8PrivateKey:
// PKCS8 Private Key
return x509.ParsePKCS8PrivateKey(block.Bytes)
default:
return nil, fmt.Errorf("unsupported block type %s", block.Type)
}
}
DecodePEMCertificates() 方法接收一个 PEM 编码的 x509 证书字节数组,并以 x509.Certificate 对象片断的方式返回所有证书:
func DecodePEMCertificates(crtb []byte) ([]*x509.Certificate, error) {
certs := []*x509.Certificate{}
// crtb 数组可能包含多个证书
for len(crtb) > 0 {
var err error
var cert *x509.Certificate
// 解码单个 pem 证书
cert, crtb, err = decodeCertificatePEM(crtb)
if err != nil {
return nil, err
}
if cert != nil {
// it's a cert, add to pool
certs = append(certs, cert)
}
}
return certs, nil
}
decodeCertificatePEM() 方法解码单个 pem 证书:
func decodeCertificatePEM(crtb []byte) (*x509.Certificate, []byte, error) {
// 执行pem 解码
// pem.Decode() 方法将在输入中找到下一个 PEM 格式的块(证书,私钥 等)的输入。
// 它返回该块和输入的其余部分。
// 注意是返回剩余部分,当没有更多部分时,返回的长度为0
// 如果没有找到PEM数据,则返回 block 为nil,其余部分返回整个输入。
block, crtb := pem.Decode(crtb)
if block == nil {
return nil, crtb, errors.New("invalid PEM certificate")
}
if block.Type != BlockTypeCertificate {
return nil, nil, nil
}
// 解码 x509 证书
c, err := x509.ParseCertificate(block.Bytes)
return c, crtb, err
}
PEMCredentialsFromFiles() 方法接收 PEM 编码的证书和私钥内容,并返回一个经过验证的 Credentials 包装器:
func PEMCredentialsFromFiles(certPem, keyPem []byte) (*Credentials, error) {
// 解码 PEM key
pk, err := DecodePEMKey(keyPem)
if err != nil {
return nil, err
}
// 解码 PEM 证书
// 如果有多个证书,实际后续只使用多个证书中的第一个
crts, err := DecodePEMCertificates(certPem)
if err != nil {
return nil, err
}
if len(crts) == 0 {
return nil, errors.New("no certificates found")
}
// 检查私钥和证书的 PublicKey 是否匹配
match := matchCertificateAndKey(pk, crts[0])
if !match {
return nil, errors.New("error validating credentials: public and private key pair do not match")
}
// 构建 Credentials 结构体并返回
creds := &Credentials{
PrivateKey: pk,
Certificate: crts[0],
}
return creds, nil
}
matchCertificateAndKey() 方法检查私钥和证书的 PublicKey 是否匹配 :
func matchCertificateAndKey(pk any, cert *x509.Certificate) bool {
// 根据私钥的类型进行匹配
// 实际是根据私钥类型的不同,获取到 cert 相应的 PublicKey,然后和私钥的 PublicKey 对比看是否相同
switch key := pk.(type) {
case *ecdsa.PrivateKey:
// ecdsa PrivateKey
if cert.PublicKeyAlgorithm != x509.ECDSA {
return false
}
pub, ok := cert.PublicKey.(*ecdsa.PublicKey)
return ok && pub.Equal(key.Public())
case *rsa.PrivateKey:
// rsa PrivateKey
if cert.PublicKeyAlgorithm != x509.RSA {
return false
}
pub, ok := cert.PublicKey.(*rsa.PublicKey)
return ok && pub.Equal(key.Public())
case ed25519.PrivateKey:
// ed25519 Private Key
if cert.PublicKeyAlgorithm != x509.Ed25519 {
return false
}
pub, ok := cert.PublicKey.(ed25519.PublicKey)
return ok && pub.Equal(key.Public())
default:
return false
}
}
CertPoolFromPEM() 方法从一个 PEM 编码的证书字符串返回一个 CertPool:
func CertPoolFromPEM(certPem []byte) (*x509.CertPool, error) {
// 解码 PEM 证书
certs, err := DecodePEMCertificates(certPem)
if err != nil {
return nil, err
}
if len(certs) == 0 {
return nil, errors.New("no certificates found")
}
// 从多个证书中创建 cert pool
return certPoolFromCertificates(certs), nil
}
certPoolFromCertificates() 方法的实现很简单:
func certPoolFromCertificates(certs []*x509.Certificate) *x509.CertPool {
// 创建 cert pool
pool := x509.NewCertPool()
for _, c := range certs {
// 将每个证书添加到 pool
pool.AddCert(c)
}
return pool
}
ParsePemCSR() 使用给定的 PEM 编码的证书签名请求构建一个 x509 证书请求:
func ParsePemCSR(csrPem []byte) (*x509.CertificateRequest, error) {
// pem 解码
block, _ := pem.Decode(csrPem)
if block == nil {
return nil, errors.New("certificate signing request is not properly encoded")
}
// 尝试 x509 解码证书请求
csr, err := x509.ParseCertificateRequest(block.Bytes)
if err != nil {
return nil, fmt.Errorf("failed to parse X.509 certificate signing request: %w", err)
}
return csr, nil
}
GenerateECPrivateKey() 方法返回一个新的 EC 私钥(P-256 曲线的 ECDSA 私钥):
func GenerateECPrivateKey() (*ecdsa.PrivateKey, error) {
return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}
这里涉及很多 x509 相关的领域知识。
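下面是一个示意性的组合用法(非 dapr 源码,导入路径按文件位置推测):读取 issuer 证书/私钥文件构造 Credentials,再从 root 证书构造 CertPool,把上面 certs.go 中的几个辅助函数串起来:
import (
	"crypto/x509"
	"os"
	"github.com/dapr/dapr/pkg/sentry/certs" // 假设的导入路径
)
func loadIssuerCredentials(certPath, keyPath, rootPath string) (*certs.Credentials, *x509.CertPool, error) {
	certPem, err := os.ReadFile(certPath)
	if err != nil {
		return nil, nil, err
	}
	keyPem, err := os.ReadFile(keyPath)
	if err != nil {
		return nil, nil, err
	}
	rootPem, err := os.ReadFile(rootPath)
	if err != nil {
		return nil, nil, err
	}
	// 解码并校验证书与私钥是否匹配
	creds, err := certs.PEMCredentialsFromFiles(certPem, keyPem)
	if err != nil {
		return nil, nil, err
	}
	// 用 root 证书构造信任池
	pool, err := certs.CertPoolFromPEM(rootPem)
	if err != nil {
		return nil, nil, err
	}
	return creds, pool, nil
}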
certs 相关的存储逻辑实现在文件 pkg/sentry/certs/store.go 中。
const (
defaultSecretNamespace = "default"
)
StoreCredentials() 方法将 trust bundle 存储在 Kubernetes secret store 或者本地磁盘上,取决于托管的平台:
func StoreCredentials(ctx context.Context, conf config.SentryConfig, rootCertPem, issuerCertPem, issuerKeyPem []byte) error {
if config.IsKubernetesHosted() {
// 如果是 k8s 托管来
return storeKubernetes(ctx, rootCertPem, issuerCertPem, issuerKeyPem)
}
// 否则是自托管
return storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem, conf.RootCertPath, conf.IssuerCertPath, conf.IssuerKeyPath)
}
storeKubernetes() 方法将凭证存储在 Kubernetes secret store 中:
// 部分常量定义在 consts.go 中
const (
TrustBundleK8sSecretName = "dapr-trust-bundle" /* #nosec */
)
func storeKubernetes(ctx context.Context, rootCertPem, issuerCertPem, issuerCertKey []byte) error {
// 准备 k8s client
kubeClient, err := kubernetes.GetClient()
if err != nil {
return err
}
// 获取 namespace
namespace := getNamespace()
// 调用 k8s API 的方法获取 secret
secret, err := kubeClient.CoreV1().Secrets(namespace).Get(context.TODO(), consts.TrustBundleK8sSecretName, metav1.GetOptions{})
if errors.IsNotFound(err) {
return fmt.Errorf("failed to get secret %w", err)
}
// 将 rootCertPem / issuerCertPem / issuerCertKey 保存到 secret 的 Data 中
secret.Data = map[string][]byte{
credentials.RootCertFilename: rootCertPem,
credentials.IssuerCertFilename: issuerCertPem,
credentials.IssuerKeyFilename: issuerCertKey,
}
// 更新保存 secret
// We update and not create because sentry expects a secret to already exist
_, err = kubeClient.CoreV1().Secrets(namespace).Update(ctx, secret, metav1.UpdateOptions{})
if err != nil {
return fmt.Errorf("failed saving secret to kubernetes: %w", err)
}
return nil
}
其中 getNamespace() 读取环境变量 “NAMESPACE” 来获知当前的命名空间,缺省值为 “default”:
const (
defaultSecretNamespace = "default"
)
func getNamespace() string {
namespace := os.Getenv("NAMESPACE")
if namespace == "" {
namespace = defaultSecretNamespace
}
return namespace
}
storeSelfhosted() 方法将凭证存储在本地文件中:
func StoreCredentials(...) {
......
return storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem, conf.RootCertPath, conf.IssuerCertPath, conf.IssuerKeyPath)
}
func storeSelfhosted(rootCertPem, issuerCertPem, issuerKeyPem []byte, rootCertPath, issuerCertPath, issuerKeyPath string) error {
// 分别将三个内容保存到三个文件中
err := os.WriteFile(rootCertPath, rootCertPem, 0o644)
if err != nil {
return fmt.Errorf("failed saving file to %s: %w", rootCertPath, err)
}
err = os.WriteFile(issuerCertPath, issuerCertPem, 0o644)
if err != nil {
return fmt.Errorf("failed saving file to %s: %w", issuerCertPath, err)
}
err = os.WriteFile(issuerKeyPath, issuerKeyPem, 0o644)
if err != nil {
return fmt.Errorf("failed saving file to %s: %w", issuerKeyPath, err)
}
return nil
}
rootCertPem / issuerCertPem / issuerKeyPem 分别保存到 conf.RootCertPath / conf.IssuerCertPath / conf.IssuerKeyPath 这三个 sentry 配置指定的文件路径中。
回顾一下 main.go 中读取相关配置的代码实现:
const (
defaultCredentialsPath = "/var/run/dapr/credentials"
)
var (
// RootCertFilename is the filename that holds the root certificate.
RootCertFilename = "ca.crt"
// IssuerCertFilename is the filename that holds the issuer certificate.
IssuerCertFilename = "issuer.crt"
// IssuerKeyFilename is the filename that holds the issuer key.
IssuerKeyFilename = "issuer.key"
)
func main() {
......
credsPath := flag.String("issuer-credentials", defaultCredentialsPath, "Path to the credentials directory holding the issuer data")
flag.StringVar(&credentials.RootCertFilename, "issuer-ca-filename", credentials.RootCertFilename, "Certificate Authority certificate filename")
flag.StringVar(&credentials.IssuerCertFilename, "issuer-certificate-filename", credentials.IssuerCertFilename, "Issuer certificate filename")
flag.StringVar(&credentials.IssuerKeyFilename, "issuer-key-filename", credentials.IssuerKeyFilename, "Issuer private key filename")
issuerCertPath := filepath.Join(*credsPath, credentials.IssuerCertFilename)
issuerKeyPath := filepath.Join(*credsPath, credentials.IssuerKeyFilename)
rootCertPath := filepath.Join(*credsPath, credentials.RootCertFilename)
......
config.IssuerCertPath = issuerCertPath
config.IssuerKeyPath = issuerKeyPath
config.RootCertPath = rootCertPath
......
}
可见默认是使用 “/var/run/dapr/credentials” 目录下的 ca.crt、issuer.crt 和 issuer.key 这三个文件。
metrics 相关的实现在文件 pkg/sentry/monitoring/metrics.go 中。
定义了一些和 metrics 相关的变量:
var (
// Metrics definitions.
csrReceivedTotal = stats.Int64(
"sentry/cert/sign/request_received_total",
"The number of CSRs received.",
stats.UnitDimensionless)
certSignSuccessTotal = stats.Int64(
"sentry/cert/sign/success_total",
"The number of certificates issuances that have succeeded.",
stats.UnitDimensionless)
certSignFailedTotal = stats.Int64(
"sentry/cert/sign/failure_total",
"The number of errors occurred when signing the CSR.",
stats.UnitDimensionless)
serverTLSCertIssueFailedTotal = stats.Int64(
"sentry/servercert/issue_failed_total",
"The number of server TLS certificate issuance failures.",
stats.UnitDimensionless)
issuerCertChangedTotal = stats.Int64(
"sentry/issuercert/changed_total",
"The number of issuer cert updates, when issuer cert or key is changed",
stats.UnitDimensionless)
issuerCertExpiryTimestamp = stats.Int64(
"sentry/issuercert/expiry_timestamp",
"The unix timestamp, in seconds, when issuer/root cert will expire.",
stats.UnitDimensionless)
// Metrics Tags.
failedReasonKey = tag.MustNewKey("reason")
noKeys = []tag.Key{}
)
目前总共有 6 个 metrics 指标,即上面定义的 csrReceivedTotal、certSignSuccessTotal、certSignFailedTotal、serverTLSCertIssueFailedTotal、issuerCertChangedTotal 和 issuerCertExpiryTimestamp。
初始化 metrics:
func InitMetrics() error {
// 将 6 个 metrics 指标都注册起来
return view.Register(
diagUtils.NewMeasureView(csrReceivedTotal, noKeys, view.Count()),
diagUtils.NewMeasureView(certSignSuccessTotal, noKeys, view.Count()),
diagUtils.NewMeasureView(certSignFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
diagUtils.NewMeasureView(serverTLSCertIssueFailedTotal, []tag.Key{failedReasonKey}, view.Count()),
diagUtils.NewMeasureView(issuerCertChangedTotal, noKeys, view.Count()),
diagUtils.NewMeasureView(issuerCertExpiryTimestamp, noKeys, view.LastValue()),
)
}
CertSignRequestReceived() 对接收到的 csr 数量进行计数:
// CertSignRequestReceived counts when CSR received.
func CertSignRequestReceived() {
stats.Record(context.Background(), csrReceivedTotal.M(1))
}
另外 CertSignSucceed() 会对处理成功的情况进行计数:
func CertSignSucceed() {
stats.Record(context.Background(), certSignSuccessTotal.M(1))
}
而 CertSignFailed() 则会对处理失败的情况进行计数:
func CertSignFailed(reason string) {
stats.RecordWithTags(
context.Background(),
diagUtils.WithTags(certSignFailedTotal.Name(), failedReasonKey, reason),
certSignFailedTotal.M(1))
}
三者的调用点为 server.go 中的 SignCertificate() 函数,这个函数负责处理 csr 请求:
func (s *server) SignCertificate(ctx context.Context, req *sentryv1pb.SignCertificateRequest) (*sentryv1pb.SignCertificateResponse, error) {
// 进来就计数:这是 接收到的 csr 数量
monitoring.CertSignRequestReceived()
......
// 每一个错误在return之前都要进行一次失败计数
if err != nil {
monitoring.CertSignFailed("cert_parse")
return nil, err
}
......
// 如果最后 csr 处理成功,则进行成功计数
monitoring.CertSignSucceed()
return resp, nil
}
IssuerCertExpiry() 方法记录 root cert 有效期的情况:
// IssuerCertExpiry records root cert expiry.
func IssuerCertExpiry(expiry *time.Time) {
stats.Record(context.Background(), issuerCertExpiryTimestamp.M(expiry.Unix()))
}
调用点在 sentry.go 中的 createCAServer() 函数中:
func (s *sentry) createCAServer(ctx context.Context) (ca.CertificateAuthority, identity.Validator) {
certAuth, authorityErr := ca.NewCertificateAuthority(s.conf)
trustStoreErr := certAuth.LoadOrStoreTrustBundle(ctx)
......
certExpiry := certAuth.GetCACertBundle().GetIssuerCertExpiry()
monitoring.IssuerCertExpiry(certExpiry)
......
return certAuth, v
}
在 CA server 的创建过程中,会加载 trust bundle并检查证书的有效期,在这里记录有效期的数据收集。
ServerCertIssueFailed() 记录服务器证书签发失败。
func ServerCertIssueFailed(reason string) {
stats.Record(context.Background(), serverTLSCertIssueFailedTotal.M(1))
}
调用点在 server.go 中:
func (s *server) Run(ctx context.Context, port int, trustBundler ca.TrustRootBundler) error {
......
tlsOpt := s.tlsServerOption(trustBundler)
s.srv = grpc.NewServer(tlsOpt)
......
}
sentry server启动过程中,在启动 grpc server 时,需要获取 tls server 的参数,期间要获取 sentry server 的服务器端证书:
func (s *server) tlsServerOption(trustBundler ca.TrustRootBundler) grpc.ServerOption {
cp := trustBundler.GetTrustAnchors()
config := &tls.Config{
ClientCAs: cp,
// Require cert verification
ClientAuth: tls.RequireAndVerifyClientCert,
GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
if s.certificate == nil || needsRefresh(s.certificate, serverCertExpiryBuffer) {
cert, err := s.getServerCertificate()
if err != nil {
monitoring.ServerCertIssueFailed("server_cert")
log.Error(err)
return nil, fmt.Errorf("failed to get TLS server certificate: %w", err)
}
s.certificate = cert
}
......
}
如果获取失败,则会记录这个失败信息。
IssuerCertChanged() 记录发行人凭证的变更:
func IssuerCertChanged() {
stats.Record(context.Background(), issuerCertChangedTotal.M(1))
}
调用点在 main.go 中的 main() 函数中,sentry 在启动后会监视发行者证书(默认为 “/var/run/dapr/credentials” 下的 “issuer.crt” 文件):
func main() {
......
func(ctx context.Context) error {
select {
case <-ctx.Done():
return nil
case <-issuerEvent:
monitoring.IssuerCertChanged()
log.Debug("received issuer credentials changed signal")
......
}
......
// Watch for changes in the watchDir
mngr.Add(func(ctx context.Context) error {
log.Infof("starting watch on filesystem directory: %s", watchDir)
return fswatcher.Watch(ctx, watchDir, issuerEvent)
})
}
// Bundle 包含了足以识别一个跨信任域和命名空间的工作负载的所有元素:
type Bundle struct {
ID string
Namespace string
TrustDomain string
}
其实就三个元素: ID / Namespace 以及 TrustDomain
NewBundle() 方法返回一个新的 identity Bundle。
func NewBundle(id, namespace, trustDomain string) *Bundle {
// Empty namespace and trust domain result in an empty bundle
// 如果 namespace 或者 trust domain 为空,则返回空的 bundle(nil)
if namespace == "" || trustDomain == "" {
return nil
}
// 否则只是简单地赋值三个属性
return &Bundle{
ID: id,
Namespace: namespace,
TrustDomain: trustDomain,
}
}
当 namespace 或 trustDomain 为空时,NewBundle 将返回一个 nil 值。
Validator 通过使用 ID 和 token 来验证证书请求的身份
type Validator interface {
Validate(id, token, namespace string) error
}
CreateSPIFFEID() 方法从给定的 trustDomain, namespace, appID 创建符合 SPIFFE 标准的唯一ID:
func CreateSPIFFEID(trustDomain, namespace, appID string) (string, error) {
// trustDomain, namespace, appID 三者都不能为空
if trustDomain == "" {
return "", errors.New("can't create spiffe id: trust domain is empty")
}
if namespace == "" {
return "", errors.New("can't create spiffe id: namespace is empty")
}
if appID == "" {
return "", errors.New("can't create spiffe id: app id is empty")
}
// 根据 SPIFFE 规范进行验证
// Validate according to the SPIFFE spec
if strings.ContainsRune(trustDomain, ':') {
// trustDomain不能带":"
return "", errors.New("trust domain cannot contain the ':' character")
}
// trustDomain 的长度不能大于255个 byte
if len([]byte(trustDomain)) > 255 {
return "", errors.New("trust domain cannot exceed 255 bytes")
}
// 拼接出 SPIFFE ID
id := fmt.Sprintf("spiffe://%s/ns/%s/%s", trustDomain, namespace, appID)
if len([]byte(id)) > 2048 {
// 验证 SPIFFE ID 长度不大于 2048
return "", errors.New("spiffe id cannot exceed 2048 bytes")
}
return id, nil
}
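一个最小的用法示例(参数值为假设):
id, err := identity.CreateSPIFFEID("cluster.local", "default", "myapp")
if err != nil {
	// 处理错误
}
// id == "spiffe://cluster.local/ns/default/myapp"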
kubernetes validator 结构体定义:
type validator struct {
client k8s.Interface
auth kauth.AuthenticationV1Interface
audiences []string
}
NewValidator() 方法创建新的 validator 结构体:
func NewValidator(client k8s.Interface, audiences []string) identity.Validator {
return &validator{
client: client,
auth: client.AuthenticationV1(),
audiences: audiences,
}
}
Validate() 实现通过使用 ID 和 token 来验证证书请求的身份:
func (v *validator) Validate(id, token, namespace string) error {
// id 和 token 不能为空
if id == "" {
return fmt.Errorf("%s: id field in request must not be empty", errPrefix)
}
if token == "" {
return fmt.Errorf("%s: token field in request must not be empty", errPrefix)
}
// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
var canTryWithNilAudience, showDefaultTokenAudienceWarning bool
audiences := v.audiences
if len(audiences) == 0 {
// 处理用户没有显式设置 audience 的特殊情况
// 此时采用默认是 sentryConsts.ServiceAccountTokenAudience "dapr.io/sentry"
audiences = []string{sentryConsts.ServiceAccountTokenAudience}
// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
// Because the user did not specify an explicit audience and is instead relying on the default, if the authentication fails we can retry with nil audience
// 并记录下来这是特殊情况,如果认证失败则应该尝试 audience 为 nil 的情况
canTryWithNilAudience = true
}
tokenReview := &kauthapi.TokenReview{
Spec: kauthapi.TokenReviewSpec{
Token: token,
Audiences: audiences,
},
}
tr: // TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
prts, err := v.executeTokenReview(tokenReview)
if err != nil {
// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
if canTryWithNilAudience {
// Retry with a nil audience, which means the default audience for the K8s API server
tokenReview.Spec.Audiences = nil
showDefaultTokenAudienceWarning = true
canTryWithNilAudience = false
goto tr
}
return err
}
// TODO: Remove in Dapr 1.12 to enforce setting an explicit audience
if showDefaultTokenAudienceWarning {
log.Warn("WARNING: Sentry accepted a token with the audience for the Kubernetes API server. This is deprecated and only supported to ensure a smooth upgrade from Dapr pre-1.10.")
}
if len(prts) != 4 || prts[0] != "system" {
return fmt.Errorf("%s: provided token is not a properly structured service account token", errPrefix)
}
podSa := prts[3]
podNs := prts[2]
// 检验 namespace
if namespace != "" {
if podNs != namespace {
return fmt.Errorf("%s: namespace mismatch. received namespace: %s", errPrefix, namespace)
}
}
// 检验 id
if id != podNs+":"+podSa {
return fmt.Errorf("%s: token/id mismatch. received id: %s", errPrefix, id)
}
return nil
}
executeTokenReview() 方法执行 tokenReview,如果 token 无效或者失败则返回错误:
func (v *validator) executeTokenReview(tokenReview *kauthapi.TokenReview) ([]string, error) {
review, err := v.auth.TokenReviews().Create(context.TODO(), tokenReview, v1.CreateOptions{})
if err != nil {
return nil, fmt.Errorf("%s: token review failed: %w", errPrefix, err)
}
if review.Status.Error != "" {
return nil, fmt.Errorf("%s: invalid token: %s", errPrefix, review.Status.Error)
}
if !review.Status.Authenticated {
return nil, fmt.Errorf("%s: authentication failed", errPrefix)
}
return strings.Split(review.Status.User.Username, ":"), nil
}
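补充说明:TokenReview 成功后,Status.User.Username 的格式为 "system:serviceaccount:<namespace>:<serviceaccount>",所以上面 Validate() 中对 prts 各下标的取值如下(示意片段,service account 名称为假设):
prts := strings.Split("system:serviceaccount:default:myapp-sa", ":")
// prts[0] == "system"
// prts[2] == "default"  -> podNs,与请求中的 namespace 比对
// prts[3] == "myapp-sa" -> podSa,与 podNs 拼成 "default:myapp-sa" 后再与请求中的 id 比对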
selfhosted 下实际没有做验证:
func NewValidator() identity.Validator {
return &validator{}
}
type validator struct{}
func (v *validator) Validate(id, token, namespace string) error {
// no validation for self hosted.
return nil
}
只是保留了一套代码框架,以满足 Validator 接口的要求。
这意味着在 selfhosted 下是不会进行身份验证的。
主要有以下子项目:
定义的项目依赖:
其中 grpc 版本为 1.42.1。
<properties>
<grpc.version>1.42.1</grpc.version>
</properties>
<dependencies>
<dependency>
<groupId>javax.annotation</groupId>
<artifactId>javax.annotation-api</artifactId>
<version>1.3.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>${grpc.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-testing</artifactId>
<version>${grpc.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
两个目录:
<properties>
<protobuf.output.directory>${project.build.directory}/generated-sources</protobuf.output.directory>
<protobuf.input.directory>${project.build.directory}/proto</protobuf.input.directory>
</properties>
download-maven-plugin 用来下载 proto 文件。
插件的功能可以简单理解为:在 initialize 阶段通过 wget 将指定 url 的 proto 文件下载到指定的 outputDirectory 目录。
<plugin>
<groupId>com.googlecode.maven-download-plugin</groupId>
<artifactId>download-maven-plugin</artifactId>
<version>1.6.0</version>
<executions>
<execution>
<id>getCommonProto</id>
<!-- the wget goal actually binds itself to this phase by default -->
<phase>initialize</phase>
<goals>
<goal>wget</goal>
</goals>
<configuration>
<url>${dapr.proto.baseurl}/common/v1/common.proto</url>
<outputFileName>common.proto</outputFileName>
<!-- default target location, just to demonstrate the parameter -->
<outputDirectory>${protobuf.input.directory}/dapr/proto/common/v1</outputDirectory>
</configuration>
</execution>
<execution>
<id>getDaprProto</id>
<!-- the wget goal actually binds itself to this phase by default -->
<phase>initialize</phase>
<goals>
<goal>wget</goal>
</goals>
<configuration>
<url>${dapr.proto.baseurl}/runtime/v1/dapr.proto</url>
<outputFileName>dapr.proto</outputFileName>
<!-- default target location, just to demonstrate the parameter -->
<outputDirectory>${protobuf.input.directory}</outputDirectory>
</configuration>
</execution>
<execution>
<id>getDaprClientProto</id>
<!-- the wget goal actually binds itself to this phase by default -->
<phase>initialize</phase>
<goals>
<goal>wget</goal>
</goals>
<configuration>
<url>${dapr.proto.baseurl}/runtime/v1/appcallback.proto</url>
<outputFileName>appcallback.proto</outputFileName>
<!-- default target location, just to demonstrate the parameter -->
<outputDirectory>${protobuf.input.directory}</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
最关键的地方,protoc-jar-maven-plugin 用于将 proto 文件生成 java 代码。
<plugin>
<groupId>com.github.os72</groupId>
<artifactId>protoc-jar-maven-plugin</artifactId>
<version>3.11.4</version>
<executions>
<execution>
<phase>generate-sources</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<protocVersion>${protobuf.version}</protocVersion>
<addProtoSources>inputs</addProtoSources>
<includeMavenTypes>direct</includeMavenTypes>
<includeStdTypes>true</includeStdTypes>
<inputDirectories>
<include>${protobuf.input.directory}/dapr/proto/common/v1</include>
<include>${protobuf.input.directory}</include>
</inputDirectories>
<outputTargets>
<outputTarget>
<type>java</type>
<outputDirectory>${protobuf.output.directory}</outputDirectory>
</outputTarget>
<outputTarget>
<type>grpc-java</type>
<outputDirectory>${protobuf.output.directory}</outputDirectory>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:${grpc.version}</pluginArtifact>
</outputTarget>
</outputTargets>
</configuration>
</execution>
</executions>
</plugin>
没啥特殊,只是为自动生成的代码跳过 findbugs
<plugin>
<groupId>com.github.spotbugs</groupId>
<artifactId>spotbugs-maven-plugin</artifactId>
<configuration>
<!-- Skip findbugs for auto-generated code -->
<skip>true</skip>
</configuration>
</plugin>
没啥特殊。
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
没啥特殊。
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
执行 mvn install 命令,就可以看到代码生成的过程和结果。
download-maven-plugin 插件首先会下载 proto 文件到 target/proto 目录。
之后 protoc-jar-maven-plugin 插件会将这些 proto 文件生成 java 代码。
编译完成之后 proto 文件和 class 文件都被放到 target/classes 目录。
最后被打包为 jar 包,以及对应的 sources 和 javadoc 的 jar。
解开这个 jar 包,可以看到里面的文件内容和 target/classes 目录里面的内容是一致的:里面不仅仅有 java class 文件,还有 proto 文件。
dapr proto 文件是来源于 ${dapr.proto.baseurl},通过 wget 命令下载。
而 dapr.proto.baseurl 在 java-sdk 根目录下的 pom.xml 文件中定义:
<dapr.proto.baseurl>https://raw.githubusercontent.com/dapr/dapr/v1.7.0-rc.2/dapr/proto</dapr.proto.baseurl>
这里就涉及到 proto 文件的版本(所在分支 / tag /commit id)。本地开发时如果涉及到 proto 文件的修改,就需要更新这里的 url 地址以对应正确的 proto 文件。反过来说,如果发现根据 proto 生成的代码没有反映出 proto 中新的修改,则应该第一时间检查这个 url 地址的有效性。
https://github.com/dapr/java-sdk#how-to-use-a-custom-serializer
dapr java-sdk 项目的 readme 中有这么一段介绍:
How to use a custom serializer
如何使用一个自定义的序列化器
This SDK provides a basic serialization for request/response objects but also for state objects. Applications should provide their own serialization for production scenarios.
这个SDK为请求/响应对象提供了一个基本的序列化,但也为状态对象提供了序列化。应用程序应该为生产场景提供他们自己的序列化。
DaprObjectSerializer 接口很简单,定义如下:
// 对应用程序的对象进行序列化和反序列化
public interface DaprObjectSerializer {
// 将给定的对象序列化为byte[].
byte[] serialize(Object o) throws IOException;
// 将给定的byte[]反序列化为一个对象。
<T> T deserialize(byte[] data, TypeRef<T> type) throws IOException;
// 返回请求的内容类型
String getContentType();
}
getContentType() 方法获知内容的类型,serialize() 和 deserialize() 分别实现序列化和反序列化,即实现对象和 byte[] 的相互转换。
DefaultObjectSerializer 继承自 ObjectSerializer, serialize 和 deserialize 都只是代理给 ObjectSerializer ,而 getContentType() 方法则 hard code 为返回 “application/json”:
public class DefaultObjectSerializer extends ObjectSerializer implements DaprObjectSerializer {
@Override
public byte[] serialize(Object o) throws IOException {
return super.serialize(o);
}
@Override
public <T> T deserialize(byte[] data, TypeRef<T> type) throws IOException {
return super.deserialize(data, type);
}
@Override
public String getContentType() {
return "application/json";
}
}
public class ObjectSerializer {
// 默认构造函数,以避免类在包外被实例化,但仍可以被继承。
protected ObjectSerializer() {
}
}
protected static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
.setSerializationInclusion(JsonInclude.Include.NON_NULL);
public byte[] serialize(Object state) throws IOException {
if (state == null) {
return null;
}
if (state.getClass() == Void.class) {
return null;
}
// Have this check here to be consistent with deserialization (see deserialize() method below).
if (state instanceof byte[]) {
return (byte[]) state;
}
// Proto buffer class is serialized directly.
if (state instanceof MessageLite) {
return ((MessageLite) state).toByteArray();
}
// Not string, not primitive, so it is a complex type: we use JSON for that.
return OBJECT_MAPPER.writeValueAsBytes(state);
}
这两个方法都是简单代理:
public <T> T deserialize(byte[] content, TypeRef<T> type) throws IOException {
return deserialize(content, OBJECT_MAPPER.constructType(type.getType()));
}
public <T> T deserialize(byte[] content, Class<T> clazz) throws IOException {
return deserialize(content, OBJECT_MAPPER.constructType(clazz));
}
具体实现在这里:
private <T> T deserialize(byte[] content, JavaType javaType) throws IOException {
// 对应 serialize 的做法
if ((javaType == null) || javaType.isTypeOrSubTypeOf(Void.class)) {
return null;
}
// 如果是 java 基本类型,则交给 deserializePrimitives() 方法处理
// 注意此时 content 有可能是 null 或者 空数组
if (javaType.isPrimitive()) {
return deserializePrimitives(content, javaType);
}
// 对应 serialize 的做法
if (content == null) {
return null;
}
// Deserialization of GRPC response fails without this check since it does not come as base64 encoded byte[].
// 如果没有这个检查,GRPC响应的反序列化就会失败,因为它不是以 base64 编码的 byte[] 形式出现的。
// TBD:这里有点不是太理解
if (javaType.hasRawClass(byte[].class)) {
return (T) content;
}
// 对应 serialize 的做法,但长度为零的检测放在 byte[] 检测之后
if (content.length == 0) {
return null;
}
// 对 CloudEvent 的支持:如果是 CloudEvent,则单独序列化
if (javaType.hasRawClass(CloudEvent.class)) {
return (T) CloudEvent.deserialize(content);
}
// 对 grpc MessageLite 的支持:通过反射调用 parseFrom 方法
if (javaType.isTypeOrSubTypeOf(MessageLite.class)) {
try {
Method method = javaType.getRawClass().getDeclaredMethod("parseFrom", byte[].class);
if (method != null) {
return (T) method.invoke(null, content);
}
} catch (NoSuchMethodException e) {
// It was a best effort. Skip this try.
} catch (Exception e) {
throw new IOException(e);
}
}
// 最后才通过 jackson 进行标准的 json 序列化
return OBJECT_MAPPER.readValue(content, javaType);
}
对原生类型的解析:
private static <T> T deserializePrimitives(byte[] content, JavaType javaType) throws IOException {
if ((content == null) || (content.length == 0)) {
// content 为null或者空的特殊处理,相当于是缺省值
if (javaType.hasRawClass(boolean.class)) {
return (T) Boolean.FALSE;
}
if (javaType.hasRawClass(byte.class)) {
return (T) Byte.valueOf((byte) 0);
}
if (javaType.hasRawClass(short.class)) {
return (T) Short.valueOf((short) 0);
}
if (javaType.hasRawClass(int.class)) {
return (T) Integer.valueOf(0);
}
if (javaType.hasRawClass(long.class)) {
return (T) Long.valueOf(0L);
}
if (javaType.hasRawClass(float.class)) {
return (T) Float.valueOf(0);
}
if (javaType.hasRawClass(double.class)) {
return (T) Double.valueOf(0);
}
if (javaType.hasRawClass(char.class)) {
return (T) Character.valueOf(Character.MIN_VALUE);
}
return null;
}
// 对于非空值,通过 jackson 进行反序列化
return OBJECT_MAPPER.readValue(content, javaType);
}
这个代码中,在 jackson 处理之前有很多特殊逻辑,这些逻辑理论上应该是独立于 jackson 序列化方案的,如果要引入其他 DaprObjectSerializer 的实现,这些特殊逻辑都要重复 n 次,有代码重复和逻辑不一致的风险。
最好是能把这些逻辑提取出来,在序列化和反序列化时先用这些通用逻辑处理一遍,最后再交给 DaprObjectSerializer,会比较合理。
再有就是依赖冲突问题,目前的 DaprObjectSerializer 方案没有给出完整的解决方案。jackson 的依赖还是写死的。
public static final String API_VERSION = "v1.0";
public static final String ALPHA_1_API_VERSION = "v1.0-alpha1";
private static final String HEADER_DAPR_REQUEST_ID = "X-DaprRequestId";
private static final String DEFAULT_HTTP_SCHEME = "http";
private static final Set<String> ALLOWED_CONTEXT_IN_HEADERS =
Collections.unmodifiableSet(new HashSet<>(Arrays.asList("grpc-trace-bin", "traceparent", "tracestate")));
HTTP 方法定义:
public enum HttpMethods {
NONE,
GET,
PUT,
POST,
DELETE,
HEAD,
CONNECT,
OPTIONS,
TRACE
}
public static class Response {
private byte[] body;
private Map<String, String> headers;
private int statusCode;
......
}
private final OkHttpClient httpClient;
private final int port;
private final String hostname;
DaprHttp(String hostname, int port, OkHttpClient httpClient) {
this.hostname = hostname;
this.port = port;
this.httpClient = httpClient;
}
这个方法有多个重载,最终的实现如下,用来执行http调用请求:
/**
* 调用API,返回文本格式有效载荷。
*
* @param method HTTP method.
* @param pathSegments Array of path segments (/a/b/c -> ["a", "b", "c"]).
* @param urlParameters Parameters in the URL
* @param content payload to be posted.
* @param headers HTTP headers.
* @param context OpenTelemetry's Context.
* @return CompletableFuture for Response.
*/
private CompletableFuture<Response> doInvokeApi(String method,
String[] pathSegments,
Map<String, List<String>> urlParameters,
byte[] content, Map<String, String> headers,
Context context) {
// 方法入口参数基本就是一个非常简化的 HTTP 请求的格式抽象
// 取 UUID 为 requestId
final String requestId = UUID.randomUUID().toString();
RequestBody body;
//组装 okhttp3 的 request
String contentType = headers != null ? headers.get(Metadata.CONTENT_TYPE) : null;
MediaType mediaType = contentType == null ? MEDIA_TYPE_APPLICATION_JSON : MediaType.get(contentType);
if (content == null) {
body = mediaType.equals(MEDIA_TYPE_APPLICATION_JSON)
? REQUEST_BODY_EMPTY_JSON
: RequestBody.Companion.create(new byte[0], mediaType);
} else {
body = RequestBody.Companion.create(content, mediaType);
}
HttpUrl.Builder urlBuilder = new HttpUrl.Builder();
urlBuilder.scheme(DEFAULT_HTTP_SCHEME)
.host(this.hostname)
.port(this.port);
for (String pathSegment : pathSegments) {
urlBuilder.addPathSegment(pathSegment);
}
Optional.ofNullable(urlParameters).orElse(Collections.emptyMap()).entrySet().stream()
.forEach(urlParameter ->
Optional.ofNullable(urlParameter.getValue()).orElse(Collections.emptyList()).stream()
.forEach(urlParameterValue ->
urlBuilder.addQueryParameter(urlParameter.getKey(), urlParameterValue)));
Request.Builder requestBuilder = new Request.Builder()
.url(urlBuilder.build())
.addHeader(HEADER_DAPR_REQUEST_ID, requestId);
if (context != null) {
context.stream()
.filter(entry -> ALLOWED_CONTEXT_IN_HEADERS.contains(entry.getKey().toString().toLowerCase()))
.forEach(entry -> requestBuilder.addHeader(entry.getKey().toString(), entry.getValue().toString()));
}
if (HttpMethods.GET.name().equals(method)) {
requestBuilder.get();
} else if (HttpMethods.DELETE.name().equals(method)) {
requestBuilder.delete();
} else {
requestBuilder.method(method, body);
}
String daprApiToken = Properties.API_TOKEN.get();
if (daprApiToken != null) {
requestBuilder.addHeader(Headers.DAPR_API_TOKEN, daprApiToken);
}
if (headers != null) {
Optional.ofNullable(headers.entrySet()).orElse(Collections.emptySet()).stream()
.forEach(header -> {
requestBuilder.addHeader(header.getKey(), header.getValue());
});
}
// 完成 request 的组装,构建 request 对象
Request request = requestBuilder.build();
// 发出 okhttp3 的请求,然后返回 CompletableFuture
CompletableFuture<Response> future = new CompletableFuture<>();
this.httpClient.newCall(request).enqueue(new ResponseFutureCallback(future));
return future;
}
在 http 请求组装过程中,注意 header 的处理:requestId、trace 相关 header(grpc-trace-bin / traceparent / tracestate)、dapr api token 以及调用方传入的自定义 header 都会被加入请求。
代码没啥特殊的,就注意一下 okhttp 的一些参数的获取。
另外 MaxRequestsPerHost 默认为5,这是一个超级大坑!
private DaprHttp buildDaprHttp() {
// 双重检查锁
if (OK_HTTP_CLIENT == null) {
synchronized (LOCK) {
if (OK_HTTP_CLIENT == null) {
OkHttpClient.Builder builder = new OkHttpClient.Builder();
Duration readTimeout = Duration.ofSeconds(Properties.HTTP_CLIENT_READ_TIMEOUT_SECONDS.get());
builder.readTimeout(readTimeout);
Dispatcher dispatcher = new Dispatcher();
dispatcher.setMaxRequests(Properties.HTTP_CLIENT_MAX_REQUESTS.get());
//这里有一个超级大坑!
// The maximum number of requests for each host to execute concurrently.
// Default value is 5 in okhttp which is totally UNACCEPTABLE!
// For sidecar case, set it the same as maxRequests.
dispatcher.setMaxRequestsPerHost(Properties.HTTP_CLIENT_MAX_REQUESTS.get());
builder.dispatcher(dispatcher);
ConnectionPool pool = new ConnectionPool(Properties.HTTP_CLIENT_MAX_IDLE_CONNECTIONS.get(),
KEEP_ALIVE_DURATION, TimeUnit.SECONDS);
builder.connectionPool(pool);
OK_HTTP_CLIENT = builder.build();
}
}
}
return new DaprHttp(Properties.SIDECAR_IP.get(), Properties.HTTP_PORT.get(), OK_HTTP_CLIENT);
}
}
相关的几个参数的获取:
对于 okhttp,还必须主动设置 max requests per host 参数,不然默认值 5 对于 sidecar 场景来说完全不可用。
// 无论需要何种GRPC或HTTP客户端实现,都可以使用通用客户端适配器。
public interface DaprClient extends AutoCloseable {
Mono<Void> waitForSidecar(int timeoutInMilliseconds);
Mono<Void> shutdown();
}
其他方法都是和 dapr api 相关的方法,然后所有的方法都是实现了 reactive 风格,如:
Mono<Void> publishEvent(String pubsubName, String topicName, Object data);
Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata);
Mono<Void> publishEvent(PublishEventRequest request);
DaprPreviewClient 接口定义,目前只有新增的 configuration api 的方法和 state query 的方法:
// 无论需要何种GRPC或HTTP客户端实现,都可以使用通用客户端适配器。
public interface DaprPreviewClient extends AutoCloseable {
Mono<ConfigurationItem> getConfiguration(String storeName, String key);
Flux<List<ConfigurationItem>> subscribeToConfiguration(String storeName, String... keys);
<T> Mono<QueryStateResponse<T>> queryState(String storeName, String query, TypeRef<T> type);
}
备注:distributed lock 的方法还没有加上来,估计是还没有开始实现。
// 抽象类,具有客户端实现之间共同的便利方法。
abstract class AbstractDaprClient implements DaprClient, DaprPreviewClient {
// 这里还是写死了 jackson!
// TBD: 看下是哪里在用
protected static final ObjectMapper JSON_REQUEST_MAPPER = new ObjectMapper();
protected DaprObjectSerializer objectSerializer;
protected DaprObjectSerializer stateSerializer;
AbstractDaprClient(
DaprObjectSerializer objectSerializer,
DaprObjectSerializer stateSerializer) {
this.objectSerializer = objectSerializer;
this.stateSerializer = stateSerializer;
}
}
其他方法的实现基本都是一些代理方法,没有实质性内容,实际实现都在子类中。
@Override
public Mono<Void> publishEvent(String pubsubName, String topicName, Object data) {
return this.publishEvent(pubsubName, topicName, data, null);
}
@Override
public Mono<Void> publishEvent(String pubsubName, String topicName, Object data, Map<String, String> metadata) {
PublishEventRequest req = new PublishEventRequest(pubsubName, topicName, data)
.setMetadata(metadata);
return this.publishEvent(req).then();
}
这些方法重载可以理解成一些语法糖,可以不用构造复杂的请求对象如 PublishEventRequest 就可以方便的直接使用而已。
public class DaprClientHttp extends AbstractDaprClient {
private final DaprHttp client;
private final boolean isObjectSerializerDefault;
private final boolean isStateSerializerDefault;
DaprClientHttp(DaprHttp client, DaprObjectSerializer objectSerializer, DaprObjectSerializer stateSerializer) {
super(objectSerializer, stateSerializer);
this.client = client;
this.isObjectSerializerDefault = objectSerializer.getClass() == DefaultObjectSerializer.class;
this.isStateSerializerDefault = stateSerializer.getClass() == DefaultObjectSerializer.class;
}
DaprClientHttp(DaprHttp client) {
this(client, new DefaultObjectSerializer(), new DefaultObjectSerializer());
}
}
waitForSidecar() 方法通过连接指定的 sidecar ip地址和端口来判断并等待 sidecar 是不是可用。
public Mono<Void> waitForSidecar(int timeoutInMilliseconds) {
return Mono.fromRunnable(() -> {
try {
NetworkUtils.waitForSocket(Properties.SIDECAR_IP.get(), Properties.HTTP_PORT.get(), timeoutInMilliseconds);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
});
}
close() 方法是实现 java.lang.AutoCloseable 的要求,DaprClient 继承了这个接口:
@Override
public void close() {
// 简单的关闭 http client
client.close();
}
publishEvent() 方法主要是两个任务:用 objectSerializer 序列化 data 并组装 HTTP 请求,然后调用 dapr 的 publish API 发布事件:
@Override
public Mono<Void> publishEvent(PublishEventRequest request) {
try {
String pubsubName = request.getPubsubName();
String topic = request.getTopic();
Object data = request.getData();
Map<String, String> metadata = request.getMetadata();
if (topic == null || topic.trim().isEmpty()) {
throw new IllegalArgumentException("Topic name cannot be null or empty.");
}
byte[] serializedEvent = objectSerializer.serialize(data);
// Content-type can be overwritten on a per-request basis.
// It allows CloudEvents to be handled differently, for example.
String contentType = request.getContentType();
if (contentType == null || contentType.isEmpty()) {
contentType = objectSerializer.getContentType();
}
Map<String, String> headers = Collections.singletonMap("content-type", contentType);
String[] pathSegments = new String[]{ DaprHttp.API_VERSION, "publish", pubsubName, topic };
Map<String, List<String>> queryArgs = metadataToQueryArgs(metadata);
return Mono.subscriberContext().flatMap(
context -> this.client.invokeApi(
DaprHttp.HttpMethods.POST.name(), pathSegments, queryArgs, serializedEvent, headers, context
)
).then();
} catch (Exception ex) {
return DaprException.wrapMono(ex);
}
}
注意这个 shutdown() 方法是关闭 sidecar,因此也是需要发送请求到 sidecar 的:
@Override
public Mono<Void> shutdown() {
String[] pathSegments = new String[]{ DaprHttp.API_VERSION, "shutdown" };
return Mono.subscriberContext().flatMap(
context -> client.invokeApi(DaprHttp.HttpMethods.POST.name(), pathSegments,
null, null, context))
.then();
}
http 请求最终是通过 DaprHttp(底层为 OkHttpClient)发出去的。
public class DaprClientGrpc extends AbstractDaprClient {
private Closeable channel;
private DaprGrpc.DaprStub asyncStub;
DaprClientGrpc(
Closeable closeableChannel,
DaprGrpc.DaprStub asyncStub,
DaprObjectSerializer objectSerializer,
DaprObjectSerializer stateSerializer) {
super(objectSerializer, stateSerializer);
this.channel = closeableChannel;
this.asyncStub = intercept(asyncStub);
}
}
waitForSidecar() 方法通过连接指定的 sidecar ip地址和端口来判断并等待 sidecar 是不是可用。
和 HTTP 的实现差别只是端口不同。
@Override
public Mono<Void> waitForSidecar(int timeoutInMilliseconds) {
return Mono.fromRunnable(() -> {
try {
NetworkUtils.waitForSocket(Properties.SIDECAR_IP.get(), Properties.GRPC_PORT.get(), timeoutInMilliseconds);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
});
}
close() 方法是实现 java.lang.AutoCloseable 的要求,DaprClient 继承了这个接口:
public void close() throws Exception {
if (channel != null) {
DaprException.wrap(() -> {
// 关闭channel
channel.close();
return true;
}).call();
}
}
publishEvent() 方法主要是两个任务:用 objectSerializer 序列化 data 并组装 gRPC 请求,然后调用 dapr 的 publish API 发布事件:
@Override
public Mono<Void> publishEvent(PublishEventRequest request) {
try {
String pubsubName = request.getPubsubName();
String topic = request.getTopic();
Object data = request.getData();
DaprProtos.PublishEventRequest.Builder envelopeBuilder = DaprProtos.PublishEventRequest.newBuilder()
.setTopic(topic)
.setPubsubName(pubsubName)
.setData(ByteString.copyFrom(objectSerializer.serialize(data)));
// Content-type can be overwritten on a per-request basis.
// It allows CloudEvents to be handled differently, for example.
String contentType = request.getContentType();
if (contentType == null || contentType.isEmpty()) {
contentType = objectSerializer.getContentType();
}
envelopeBuilder.setDataContentType(contentType);
Map<String, String> metadata = request.getMetadata();
if (metadata != null) {
envelopeBuilder.putAllMetadata(metadata);
}
return Mono.subscriberContext().flatMap(
context ->
this.<Empty>createMono(
it -> intercept(context, asyncStub).publishEvent(envelopeBuilder.build(), it)
)
).then();
} catch (Exception ex) {
return DaprException.wrapMono(ex);
}
}
@Topic 注解用来订阅某个主题,pubsubName, name, metadata 分别对应 dapr pub/sub API 中的 pubsubName, topic, metadata 字段:
@Documented
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Topic {
String name();
String pubsubName();
String metadata() default "{}";
// 用于匹配传入的 cloud event 的规则。
Rule rule() default @Rule(match = "", priority = 0);
}
以下是 @topic 注解使用的典型例子:
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}")
@PostMapping(path = "/testingtopic")
public Mono<Void> handleMessage(@RequestBody(required = false) CloudEvent<?> cloudEvent) {
......
}
@Rule 注解用来描述匹配规则:
@Documented
@Target(ElementType.ANNOTATION_TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface Rule {
// 用于匹配传入的 cloud event 的通用表达式语言( Common Expression Language / CEL)表达。
String match();
// 规则的优先级,用于排序。最低的数字有更高的优先权。
int priority();
}
以下是 @Rule 注解使用的典型例子:
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
rule = @Rule(match = "event.type == \"v2\"", priority = 1))
@PostMapping(path = "/testingtopicV2")
public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
......
}
按照 springboot 的标准做法,src/main/resources/META-INF/spring.factories
文件内容如下:
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
io.dapr.springboot.DaprAutoConfiguration
DaprAutoConfiguration 的内容非常简单:
@Configuration
@ConditionalOnWebApplication
@ComponentScan("io.dapr.springboot")
public class DaprAutoConfiguration {
}
DaprBeanPostProcessor 用来处理 dapr 注解。
@Component
public class DaprBeanPostProcessor implements BeanPostProcessor {
private static final ObjectMapper MAPPER = new ObjectMapper();
private final EmbeddedValueResolver embeddedValueResolver;
DaprBeanPostProcessor(ConfigurableBeanFactory beanFactory) {
embeddedValueResolver = new EmbeddedValueResolver(beanFactory);
}
......
}
BeanPostProcessor 接口的 postProcessBeforeInitialization() 的说明如下:
在任何 Bean 初始化回调(如 InitializingBean 的 afterPropertiesSet 或自定义 init-method)之前,将此 BeanPostProcessor 应用于给定的新 Bean 实例。该 bean 此时已经被填充了属性值。返回的 Bean 实例可能是一个围绕原始 Bean 的包装器。
也就是每个 bean 在初始化回调之前都会调用这个方法以便植入我们需要的逻辑,如在这里就需要扫描 bean 是否带有 dapr 的 @Topic 注解:
@Override
public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
if (bean == null) {
return null;
}
subscribeToTopics(bean.getClass(), embeddedValueResolver);
return bean;
}
subscribeToTopics() 方法的具体实现后面再详细看,期间还有规则匹配的实现代码。
postProcessAfterInitialization() 方法没有特殊逻辑,简单返回原始bean:
@Override
public Object postProcessAfterInitialization(Object bean, String beanName) throws BeansException {
return bean;
}
DaprController 是 dapr springboot 集成内置的 controller,对外暴露 dapr sidecar 需要访问的各个 endpoint:
@RestController
public class DaprController {
}
用于 health check 的 endpoint,路径为 “/healthz”,实现为空。
@GetMapping(path = "/healthz")
public void healthz() {
}
TBD:这里是否要考虑 sidecar 的某些状态?目前的实现是只要进程和端口可以访问就会应答状态 OK,而不管 sidecar 中的功能是否正常。
用于获取 dapr sidecar 的自身配置, 路径为 “/dapr/config”
@GetMapping(path = "/dapr/config", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprConfig() throws IOException {
return ActorRuntime.getInstance().serializeConfig();
}
但看 ActorRuntime 的代码实现,这个 config 是指 actor configuration:
public byte[] serializeConfig() throws IOException {
return INTERNAL_SERIALIZER.serialize(this.config);
}
private ActorRuntime(ManagedChannel channel, DaprClient daprClient) throws IllegalStateException {
this.config = new ActorRuntimeConfig();
}
用于获取当前 dapr sidecar 的 pub/sub 订阅信息,路径为 “/dapr/subscribe”:
@GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprSubscribe() throws IOException {
return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
}
用于 actor 的 endpoint,包括 deactivate, invoke actor method, invoke actor timer 和 invoke actor reminder:
@DeleteMapping(path = "/actors/{type}/{id}")
public Mono<Void> deactivateActor(@PathVariable("type") String type,
@PathVariable("id") String id) {
return ActorRuntime.getInstance().deactivate(type, id);
}
@PutMapping(path = "/actors/{type}/{id}/method/{method}")
public Mono<byte[]> invokeActorMethod(@PathVariable("type") String type,
@PathVariable("id") String id,
@PathVariable("method") String method,
@RequestBody(required = false) byte[] body) {
return ActorRuntime.getInstance().invoke(type, id, method, body);
}
@PutMapping(path = "/actors/{type}/{id}/method/timer/{timer}")
public Mono<Void> invokeActorTimer(@PathVariable("type") String type,
@PathVariable("id") String id,
@PathVariable("timer") String timer,
@RequestBody byte[] body) {
return ActorRuntime.getInstance().invokeTimer(type, id, timer, body);
}
@PutMapping(path = "/actors/{type}/{id}/method/remind/{reminder}")
public Mono<Void> invokeActorReminder(@PathVariable("type") String type,
@PathVariable("id") String id,
@PathVariable("reminder") String reminder,
@RequestBody(required = false) byte[] body) {
return ActorRuntime.getInstance().invokeReminder(type, id, reminder, body);
}
订阅 topic 的具体代码实现在类 DaprBeanPostProcessor 的 subscribeToTopics() 方法中,在 bean 初始化时被调用。
topic 注解使用的例子如下:
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
rule = @Rule(match = "event.type == \"v2\"", priority = 1))
@PostMapping(path = "/testingtopicV2")
public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
......
}
现在需要在 postProcessBeforeInitialization() 方法中扫描并解析所有有 topic 注解的 bean:
@Override
public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
subscribeToTopics(bean.getClass(), embeddedValueResolver);
return bean;
}
private static void subscribeToTopics(Class clazz, EmbeddedValueResolver embeddedValueResolver) {
if (clazz == null) {
return;
}
// 先用 Superclass 做一次递归调用,这样就会从当前类的父类开始先处理
// 由于每次都是父类先执行,因此这会一直递归到最顶层的 Object 类
subscribeToTopics(clazz.getSuperclass(), embeddedValueResolver);
// 取当前类的所有方法
for (Method method : clazz.getDeclaredMethods()) {
// 然后看方法上是不是标记了 dapr 的 topic 注解
Topic topic = method.getAnnotation(Topic.class);
if (topic == null) {
continue;
}
// 如果方法上有标记 dapr 的 topic 注解,则开始处理
// 先获取 topic 注解上的属性 topic name, pubsub name, rule
Rule rule = topic.rule();
String topicName = embeddedValueResolver.resolveStringValue(topic.name());
String pubSubName = embeddedValueResolver.resolveStringValue(topic.pubsubName());
// rule 也是一个注解,获取 match 属性
String match = embeddedValueResolver.resolveStringValue(rule.match());
if ((topicName != null) && (topicName.length() > 0) && pubSubName != null && pubSubName.length() > 0) {
// topicName 和 pubSubName 不能为空 (metadata 可以为空,rule可以为空)
try {
TypeReference<HashMap<String, String>> typeRef
= new TypeReference<HashMap<String, String>>() {};
// 读取 topic 注解上的 metadata 属性
Map<String, String> metadata = MAPPER.readValue(topic.metadata(), typeRef);
// 读取路由信息,细节看下一节
List<String> routes = getAllCompleteRoutesForPost(clazz, method, topicName);
for (String route : routes) {
// 将读取的路由信息添加到 dapr runtime 中。
// 细节看下一节
DaprRuntime.getInstance().addSubscribedTopic(
pubSubName, topicName, match, rule.priority(), route, metadata);
}
} catch (JsonProcessingException e) {
throw new IllegalArgumentException("Error while parsing metadata: " + e);
}
}
}
}
路由信息配置方法如下:
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
rule = @Rule(match = "event.type == \"v2\"", priority = 1))
@PostMapping(path = "/testingtopicV2")
public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
......
}
getAllCompleteRoutesForPost() 方法负责读取 @PostMapping / @RequestMapping 注解上的路由信息:
private static List<String> getAllCompleteRoutesForPost(Class clazz, Method method, String topicName) {
List<String> routesList = new ArrayList<>();
RequestMapping clazzRequestMapping =
(RequestMapping) clazz.getAnnotation(RequestMapping.class);
String[] clazzLevelRoute = null;
if (clazzRequestMapping != null) {
clazzLevelRoute = clazzRequestMapping.value();
}
// 读取该方法上的路由信息,注意必须是 POST
String[] postValueArray = getRoutesForPost(method, topicName);
if (postValueArray != null && postValueArray.length >= 1) {
for (String postValue : postValueArray) {
if (clazzLevelRoute != null && clazzLevelRoute.length >= 1) {
for (String clazzLevelValue : clazzLevelRoute) {
// 完整的路由路径应该是类级别 + 方法级别
String route = clazzLevelValue + confirmLeadingSlash(postValue);
routesList.add(route);
}
} else {
routesList.add(postValue);
}
}
}
return routesList;
}
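举一个示意性的例子(OrderController、/orders、/created、ordertopic 均为假设的名字):当类上有 @RequestMapping、方法上有 @PostMapping 时,最终上报给 dapr 的订阅路由是两者拼接的结果:
import io.dapr.Topic;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
@RestController
@RequestMapping("/orders")          // 类级别路由
public class OrderController {
    @Topic(name = "ordertopic", pubsubName = "messagebus")
    @PostMapping("/created")        // 方法级别路由
    public Mono<Void> onOrderCreated() {
        // getAllCompleteRoutesForPost() 计算出来的完整路由为 "/orders/created"
        return Mono.empty();
    }
}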
getRoutesForPost() 方法用来读取 @Topic 注解所在方法的 @PostMapping 注解,以便获得路由的 path 信息,对应例子如下:
@Topic(name = "testingtopic", pubsubName = "${myAppProperty:messagebus}",
rule = @Rule(match = "event.type == \"v2\"", priority = 1))
@PostMapping(path = "/testingtopicV2")
public Mono<Void> handleMessageV2(@RequestBody(required = false) CloudEvent cloudEvent) {
......
}
getRoutesForPost() 方法的代码实现如下:
private static String[] getRoutesForPost(Method method, String topicName) {
String[] postValueArray = new String[] {topicName};
// 读取 PostMapping 注解
PostMapping postMapping = method.getAnnotation(PostMapping.class);
if (postMapping != null) {
// 如果有 PostMapping 注解
if (postMapping.path() != null && postMapping.path().length >= 1) {
// 如果 path 属性有设置则从 path 属性取值
postValueArray = postMapping.path();
} else if (postMapping.value() != null && postMapping.value().length >= 1) {
// 如果 path 属性没有设置则直接从 PostMapping 注解的 value 中取值
postValueArray = postMapping.value();
}
} else {
// 如果没有 PostMapping 注解,则尝试读取 RequestMapping 注解
RequestMapping reqMapping = method.getAnnotation(RequestMapping.class);
for (RequestMethod reqMethod : reqMapping.method()) {
// 要求 RequestMethod 为 POST
if (reqMethod == RequestMethod.POST) {
// 同样读取 path 或者 value 的值
if (reqMapping.path() != null && reqMapping.path().length >= 1) {
postValueArray = reqMapping.path();
} else if (reqMapping.value() != null && reqMapping.value().length >= 1) {
postValueArray = reqMapping.value();
}
break;
}
}
}
return postValueArray;
}
getRoutesForPost() 方法的解读:就是从标记了 @Topic 注解的方法上读取路由信息,也就是后续订阅的事件应该发送的地址。读取的逻辑为:
- 默认路由为 topic name;
- 如果方法上有 @PostMapping 注解,优先取 path 属性,path 没有设置则取 value 属性;
- 如果没有 @PostMapping 注解,则读取 @RequestMapping 注解,要求其 method 中包含 POST,同样按 path、value 的顺序取值。
topic 订阅信息在读取之后,就会通过 DaprRuntime 的 addSubscribedTopic() 方法保存起来:
public synchronized void addSubscribedTopic(String pubsubName,
String topicName,
String match,
int priority,
String route,
Map<String,String> metadata) {
// 用 pubsubName 和 topicName 做 key
DaprTopicKey topicKey = new DaprTopicKey(pubsubName, topicName);
// 获取 key 对应的 builder,没有的话就创建一个
DaprSubscriptionBuilder builder = subscriptionBuilders.get(topicKey);
if (builder == null) {
builder = new DaprSubscriptionBuilder(pubsubName, topicName);
subscriptionBuilders.put(topicKey, builder);
}
// match 不为空则添加 rule,为空则采用默认路径
if (match.length() > 0) {
builder.addRule(route, match, priority);
} else {
builder.setDefaultPath(route);
}
if (metadata != null && !metadata.isEmpty()) {
builder.setMetadata(metadata);
}
}
考虑到调用的地方代码是:
// 读取路由信息
List<String> routes = getAllCompleteRoutesForPost(clazz, method, topicName);
for (String route : routes) {
// 将读取的路由信息添加到 dapr runtime 中。
DaprRuntime.getInstance().addSubscribedTopic(
pubSubName, topicName, match, rule.priority(), route, metadata);
}
所以前面的读取流程可以理解为就是读取和 topic 订阅有关的上述6个参数,然后保存起来。
在 DaprController 中,daprSubscribe() 方法对外暴露路径 /dapr/subscribe
,以便让 dapr sidecar 可以通过读取该路径来获取当前应用的 topic 订阅信息:
@GetMapping(path = "/dapr/subscribe", produces = MediaType.APPLICATION_JSON_VALUE)
public byte[] daprSubscribe() throws IOException {
return SERIALIZER.serialize(DaprRuntime.getInstance().listSubscribedTopics());
}
而 DaprRuntime 的 listSubscribedTopics() 方法获取的就是前面保存起来的 topic 订阅信息:
public synchronized DaprTopicSubscription[] listSubscribedTopics() {
List<DaprTopicSubscription> values = subscriptionBuilders.values().stream()
.map(b -> b.build()).collect(Collectors.toList());
return values.toArray(new DaprTopicSubscription[0]);
}
整个 topic 订阅流程的示意图如下:
title topic subscription
hide footbox
skinparam style strictuml
box "Application" #LightBlue
participant DaprBeanPostProcessor
participant bean
participant DaprRuntime
participant DaprController
end box
participant daprd
-> DaprBeanPostProcessor: postProcessBeforeInitialization(bean)
DaprBeanPostProcessor -> bean: get @topic
bean --> DaprBeanPostProcessor
alt if bean has @topic
DaprBeanPostProcessor -> bean: parse @topic @rule
bean --> DaprBeanPostProcessor: pubsub name, topic name, match,\n priority, routes, metadata
DaprBeanPostProcessor -> DaprRuntime: addSubscribedTopic()
DaprRuntime -> DaprRuntime: save in map\n subscriptionBuilders
DaprRuntime --> DaprBeanPostProcessor
end
<-- DaprBeanPostProcessor
daprd -> DaprController: get subscription
DaprController -> DaprRuntime: listSubscribedTopics()
DaprRuntime --> DaprController
DaprController --> daprd
Workflow 的定义很简单:
public abstract class Workflow {
// 默认构造函数应该可以不用写的
public Workflow(){
}
public abstract WorkflowStub create();
public void run(WorkflowContext ctx) {
this.create().run(ctx);
}
}
create() 方法是创建 WorkflowStub 的模板方法;run() 方法先通过执行 create() 创建 WorkflowStub,再执行 WorkflowStub 的 run() 方法。
WorkflowStub 是一个单方法的接口定义,用于支持函数式编程,标注有 java.lang.FunctionalInterface(即 @FunctionalInterface)注解。
@FunctionalInterface
public interface WorkflowStub {
void run(WorkflowContext ctx);
}
@FunctionalInterface 的 javadoc 描述如下:
一种信息性注解类型,用于表明接口类型声明是 Java 语言规范所定义的函数接口。从概念上讲,一个函数接口只有一个抽象方法。由于默认方法有一个实现,所以它们不是抽象方法。如果一个接口声明了一个覆盖 java.lang.Object 公共方法之一的抽象方法,该方法也不计入接口的抽象方法数,因为接口的任何实现都将有一个来自 java.lang.Object 或其他地方的实现。
请注意,函数接口的实例可以通过 lambda 表达式、方法引用或构造器引用来创建。
如果一个类型被注释为该注释类型,编译器必须生成一条错误信息,除非:
- 该类型是接口类型,而不是注解类型、枚举或类。
- 注解的类型满足函数接口的要求。
然而,无论接口声明中是否有 FunctionalInterface 注解,编译器都会将任何符合函数接口定义的接口视为函数接口。
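结合 Workflow 和 WorkflowStub 的定义,一个最小的 workflow 实现示意如下(DemoWorkflow 为假设的类名,"DemoActivity" 为假设的 activity 名;callActivity/complete 等方法属于后面介绍的 WorkflowContext):
import io.dapr.workflows.Workflow;
import io.dapr.workflows.WorkflowStub;
public class DemoWorkflow extends Workflow {
    @Override
    public WorkflowStub create() {
        // create() 返回一个 lambda,即 WorkflowStub.run(ctx) 的实现
        return ctx -> {
            ctx.getLogger().info("workflow started: " + ctx.getInstanceId());
            // 调用名为 "DemoActivity" 的 activity,并等待其字符串结果
            String result = ctx.callActivity("DemoActivity", "input-data", String.class).await();
            // 完成 workflow,result 会被序列化为 workflow 的输出
            ctx.complete(result);
        };
    }
}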
出乎意料的是 WorkflowContext 的定义超级复杂,远远不是一个 上下文 那么简单。
WorkflowContext 接口上定义了大量的方法,其中部分基本方法如下:
public interface WorkflowContext {
// 通过这个方法传递 logger 对象以供在后续执行时打印日志
Logger getLogger();
// 获取 workflow 的 name
String getName();
// 获取 workflow instance 的 id
String getInstanceId();
//获取当前协调时间(UTC)
Instant getCurrentInstant();
// 完成当前 workflow,输出是完成的 workflow 的序列化输出
void complete(Object output);
......
}
WorkflowContext 接口上定义了三个 waitForExternalEvent() 接口方法和一个默认实现:
public interface WorkflowContext {
......
<V> Task<V> waitForExternalEvent(String name, Duration timeout, Class<V> dataType) throws TaskCanceledException;
<V> Task<Void> waitForExternalEvent(String name, Duration timeout) throws TaskCanceledException;
<V> Task<Void> waitForExternalEvent(String name) throws TaskCanceledException;
default <V> Task<V> waitForExternalEvent(String name, Class<V> dataType) {
try {
return this.waitForExternalEvent(name, null, dataType);
} catch (TaskCanceledException e) {
// This should never happen because of the max duration
throw new RuntimeException("An unexpected exception was throw while waiting for an external event.", e);
}
}
......
}
waitForExternalEvent 的 javadoc 描述如下:
等待名为 name 的事件发生,并返回一个 Task,该任务在收到事件时完成,或在超时时取消。
如果当前协调器尚未等待名为 name 的事件,那么事件将保存在协调器实例状态中,并在调用此方法时立即派发。即使当前协调器在收到事件前取消了等待操作,事件保存也会发生。
协调器可以多次等待同一事件名,因此允许等待多个同名事件。协调器收到的每个外部事件将只完成本方法返回的一个任务。
特别注意: 这个 Task 的类型是 com.microsoft.durabletask.Task
,直接用在 dapr workflow 的接口定义上,意味着 dapr workflow 彻底和 durabletask 绑定。
WorkflowContext 接口上定义了 callActivity() 接口方法,以及多个不同参数的重载默认方法:
public interface WorkflowContext {
......
<V> Task<V> callActivity(String name, Object input, TaskOptions options, Class<V> returnType);
default Task<Void> callActivity(String name) {
return this.callActivity(name, null, null, Void.class);
}
default Task<Void> callActivity(String name, Object input) {
return this.callActivity(name, input, null, Void.class);
}
default <V> Task<V> callActivity(String name, Class<V> returnType) {
return this.callActivity(name, null, null, returnType);
}
default <V> Task<V> callActivity(String name, Object input, Class<V> returnType) {
return this.callActivity(name, input, null, returnType);
}
default Task<Void> callActivity(String name, Object input, TaskOptions options) {
return this.callActivity(name, input, options, Void.class);
}
......
}
callActivity 的 javadoc 描述如下:
使用指定的 input 异步调用一个 activity,并在 activity 完成时返回一个新的 task。如果 activity 成功完成,返回的 task 值将是 task 的输出。如果 activity 失败,返回的 task 将以 TaskFailedException 异常完成。
isReplaying() 用来判断当前工作流当前是否正在重放之前的执行:
public interface WorkflowContext {
......
boolean isReplaying();
}
isReplaying 的 javadoc 描述如下:
获取一个值,指示工作流当前是否正在重放之前的执行。
工作流函数从内存中卸载后会进行 “重放”,以重建本地变量状态。在重放过程中,先前执行的任务将自动使用存储在工作流历史记录中的先前查看值完成。一旦工作流达到不再重放现有历史记录的程度,此方法将返回 false。
如果您的逻辑只需要在不重放时运行,则可以使用此方法。例如,某些类型的应用程序日志在作为重放的一部分进行复制时可能会变得过于嘈杂。应用程序代码可以检查函数是否正在重放,然后在该值为 false 时发出日志语句。
allOf() 和 anyOf() 方法用于组合等待多个任务:
<V> Task<List<V>> allOf(List<Task<V>> tasks) throws CompositeTaskFailedException;
Task<Task<?>> anyOf(List<Task<?>> tasks);
default Task<Task<?>> anyOf(Task<?>... tasks) {
return this.anyOf(Arrays.asList(tasks));
}
allOf 的 javadoc 描述如下:
返回一个新任务,该任务在所有给定任务完成后完成。如果任何给定任务在完成时出现异常,返回的任务也会在完成时出现 CompositeTaskFailedException,其中包含第一次遇到的故障的详细信息。返回的任务值是给定任务返回值的有序列表。如果没有提供任务,则返回值为空的已完成任务。
该方法适用于在继续协调的下一步之前等待一组独立任务的完成,如下面的示例:
Task<String> t1 = ctx.callActivity("MyActivity", String.class);
Task<String> t2 = ctx.callActivity("MyActivity", String.class);
Task<String> t3 = ctx.callActivity("MyActivity", String.class);
List<String> orderedResults = ctx.allOf(List.of(t1, t2, t3)).await();
任何给定任务出现异常都会导致非受查的 CompositeTaskFailedException 异常。可以通过检查该异常来获取单个任务的失败详情:
try {
    List<String> orderedResults = ctx.allOf(List.of(t1, t2, t3)).await();
} catch (CompositeTaskFailedException e) {
    List<Exception> exceptions = e.getExceptions();
}
特别注意: 这个 CompositeTaskFailedException 的类型是 com.microsoft.durabletask.CompositeTaskFailedException
,直接用在 dapr workflow 的接口定义上,意味着 dapr workflow 彻底和 durabletask 绑定。
anyOf 的 javadoc 描述如下:
当任何给定任务完成时,返回一个已完成的新任务。新任务的值是已完成任务对象的引用。如果没有提供任务,则返回一个永不完成的任务。
该方法适用于等待多个并发任务,并在第一个任务完成时执行特定于任务的操作,如下面的示例:
Task<Void> event1 = ctx.waitForExternalEvent("Event1");
Task<Void> event2 = ctx.waitForExternalEvent("Event2");
Task<Void> event3 = ctx.waitForExternalEvent("Event3");
Task<?> winner = ctx.anyOf(event1, event2, event3).await();
if (winner == event1) {
    // ...
} else if (winner == event2) {
    // ...
} else if (winner == event3) {
    // ...
}
anyOf 方法还可用于实现长时间超时,如下面的示例:
Task<Void> activityTask = ctx.callActivity("SlowActivity");
Task<Void> timeoutTask = ctx.createTimer(Duration.ofMinutes(30));
Task<?> winner = ctx.anyOf(activityTask, timeoutTask).await();
if (winner == activityTask) {
    // 完成情况:activity 先于 timer 完成
} else {
    // 超时情况:timer 先于 activity 完成
}
createTimer() 方法创建一个在指定延迟后过期的 durable timer。
指定较长的延迟(例如,几天或更长时间的延迟)可能会导致创建多个内部管理的 durable timer。协调器代码不需要意识到这种行为。不过,框架日志和存储的历史状态中可能会显示这种行为。
Task<Void> createTimer(Duration duration);
default Task<Void> createTimer(ZonedDateTime zonedDateTime) {
throw new UnsupportedOperationException("This method is not implemented.");
}
getInput() 方法获取当前任务协调器的反序列化输入。
<V> V getInput(Class<V> targetType);
callSubWorkflow() 方法异步调用另一个工作流作为子工作流:
default Task<Void> callSubWorkflow(String name) {
return this.callSubWorkflow(name, null);
}
default Task<Void> callSubWorkflow(String name, Object input) {
return this.callSubWorkflow(name, input, null);
}
default <V> Task<V> callSubWorkflow(String name, Object input, Class<V> returnType) {
return this.callSubWorkflow(name, input, null, returnType);
}
default <V> Task<V> callSubWorkflow(String name, Object input, String instanceID, Class<V> returnType) {
return this.callSubWorkflow(name, input, instanceID, null, returnType);
}
default Task<Void> callSubWorkflow(String name, Object input, String instanceID, TaskOptions options) {
return this.callSubWorkflow(name, input, instanceID, options, Void.class);
}
<V> Task<V> callSubWorkflow(String name,
@Nullable Object input,
@Nullable String instanceID,
@Nullable TaskOptions options,
Class<V> returnType);
callSubWorkflow() 的 javadoc 描述如下:
异步调用另一个工作流作为子工作流,并在子工作流完成时返回一个任务。如果子工作流成功完成,返回的任务值将是 activity 的输出。如果子工作流失败,返回的任务将以 TaskFailedException 异常完成。
子工作流有自己的 instance ID、历史和状态,与启动它的父工作流无关。将大型协调分解为子工作流有很多好处:
- 将大型协调拆分成一系列较小的子工作流可以使代码更易于维护。
- 如果协调逻辑需要协调大量任务,那么在多个计算节点上并发分布协调逻辑就非常有用。
- 通过保持较小的父协调历史记录,可以减少内存使用和 CPU 开销。
缺点是启动子工作流和处理其输出会产生开销。这通常只适用于非常小的协调。
由于子工作流独立于父工作流,因此终止父协调不会影响任何子工作流。
continueAsNew() 方法使用新输入重启协调并清除其历史记录:
default void continueAsNew(Object input) {
this.continueAsNew(input, true);
}
void continueAsNew(Object input, boolean preserveUnprocessedEvents);
}
continueAsNew() 的 javadoc 描述如下:
使用新输入重启协调并清除其历史记录。
该方法主要针对永恒协调(eternal orchestrations),即可能永远无法完成的协调。它的工作原理是重新启动协调,为其提供新的输入,并截断现有的协调历史。它允许协调无限期地继续运行,而不会让其历史记录无限制地增长。定期截断历史记录的好处包括降低内存使用率、减少存储容量,以及在重建状态时缩短协调器重播时间。
当协调器调用 continueAsNew 时,任何未完成任务的结果都将被丢弃。例如,如果计划了一个定时器,但在定时器启动前调用了 continueAsNew,那么定时器事件将被丢弃。唯一的例外是外部事件。默认情况下,如果协调收到外部事件但尚未处理,则会通过调用 waitForExternalEvent 将该事件保存在协调状态单元中。即使协调器使用 continueAsNew 重新启动,这些事件也会保留在内存中。可以通过为 preserveUnprocessedEvents 参数值指定 false 来禁用此行为。
协调器实现应在调用 continueAsNew 方法后立即完成。
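一个永恒协调(eternal orchestration)的最小示意,假设存在名为 "CleanupActivity" 的 activity,仅演示 createTimer + continueAsNew 的组合:
import java.time.Duration;
import io.dapr.workflows.Workflow;
import io.dapr.workflows.WorkflowStub;
public class EternalCleanupWorkflow extends Workflow {
    @Override
    public WorkflowStub create() {
        return ctx -> {
            // 执行一次清理任务
            ctx.callActivity("CleanupActivity").await();
            // 等待一小时
            ctx.createTimer(Duration.ofHours(1)).await();
            // 用新的输入重启协调,截断历史记录,避免历史无限增长
            ctx.continueAsNew(null);
        };
    }
}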
DaprWorkflowContextImpl 类实现了 WorkflowContext 接口,实现上将调用代理给内部字段 innerContext,这是一个 com.microsoft.durabletask.TaskOrchestrationContext:
import com.microsoft.durabletask.TaskOrchestrationContext;
public class DaprWorkflowContextImpl implements WorkflowContext {
private final TaskOrchestrationContext innerContext;
private final Logger logger;
......
}
构造函数只是简单赋值,加了一些必要的 null 检测:
public DaprWorkflowContextImpl(TaskOrchestrationContext context) throws IllegalArgumentException {
this(context, LoggerFactory.getLogger(WorkflowContext.class));
}
public DaprWorkflowContextImpl(TaskOrchestrationContext context, Logger logger) throws IllegalArgumentException {
if (context == null) {
throw new IllegalArgumentException("Context cannot be null");
}
if (logger == null) {
throw new IllegalArgumentException("Logger cannot be null");
}
this.innerContext = context;
this.logger = logger;
}
除 getLogger() 外的所有方法的实现都是简单的代理给 innerContext 的同名方法:
public Logger getLogger() {
if (this.innerContext.getIsReplaying()) {
return NOPLogger.NOP_LOGGER;
}
return this.logger;
}
public String getName() {
return this.innerContext.getName();
}
public String getInstanceId() {
return this.innerContext.getInstanceId();
}
public Instant getCurrentInstant() {
return this.innerContext.getCurrentInstant();
}
public boolean isReplaying() {
return this.innerContext.getIsReplaying();
}
public <V> Task<V> callSubWorkflow(String name, @Nullable Object input, @Nullable String instanceID,
@Nullable TaskOptions options, Class<V> returnType) {
return this.innerContext.callSubOrchestrator(name, input, instanceID, options, returnType);
}
public void continueAsNew(Object input) {
this.innerContext.continueAsNew(input);
}
这个类基本就是 com.microsoft.durabletask.TaskOrchestrationContext
的简单包裹,所有功能都代理给 com.microsoft.durabletask.TaskOrchestrationContext
, 包括设计甚至方法名。
dapr 的 workflow 实现基本是完全绑定在 durabletask 上的。
WorkflowRuntime 简单封装了 durabletask 的 DurableTaskGrpcWorker:
import com.microsoft.durabletask.DurableTaskGrpcWorker;
public class WorkflowRuntime implements AutoCloseable {
private DurableTaskGrpcWorker worker;
public WorkflowRuntime(DurableTaskGrpcWorker worker) {
this.worker = worker;
}
......
}
然后将 start() 和 close() 方法简单的代理给 durabletask 的 DurableTaskGrpcWorker:
public void start() {
this.start(true);
}
public void start(boolean block) {
if (block) {
this.worker.startAndBlock();
} else {
this.worker.start();
}
}
public void close() {
if (this.worker != null) {
this.worker.close();
this.worker = null;
}
}
WorkflowRuntimeBuilder 用来构建 WorkflowRuntime,类似 WorkflowRuntime 只是简单封装了 durabletask 的 DurableTaskGrpcWorker, WorkflowRuntimeBuilder 的实现也是简单封装了 durabletask 的 DurableTaskGrpcWorkerBuilder:
import com.microsoft.durabletask.DurableTaskGrpcWorkerBuilder;
public class WorkflowRuntimeBuilder {
private static volatile WorkflowRuntime instance;
private DurableTaskGrpcWorkerBuilder builder;
public WorkflowRuntimeBuilder() {
this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(NetworkUtils.buildGrpcManagedChannel());
}
......
}
grpcChannel()的细节后面细看。
registerWorkflow() 方法注册 workflow 对象,实际代理给 DurableTaskGrpcWorkerBuilder 的 addOrchestration() 方法:
public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
this.builder = this.builder.addOrchestration(
new OrchestratorWrapper<>(clazz)
);
return this;
}
registerActivity() 方法注册 activity 对象,实际代理给 DurableTaskGrpcWorkerBuilder 的 addActivity() 方法:
public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
this.builder = this.builder.addActivity(
new ActivityWrapper<>(clazz)
);
}
build() 方法实现了一个简单的单例,只容许构建一个 WorkflowRuntime 的 instance:
private static volatile WorkflowRuntime instance;
public WorkflowRuntime build() {
if (instance == null) {
synchronized (WorkflowRuntime.class) {
if (instance == null) {
instance = new WorkflowRuntime(this.builder.build());
}
}
}
return instance;
}
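把 WorkflowRuntimeBuilder 和 WorkflowRuntime 串起来,一个最小的 worker 启动示意如下(DemoWorkflow、DemoActivityImpl 为前后文示意中假设的实现类):
import io.dapr.workflows.runtime.WorkflowRuntime;
import io.dapr.workflows.runtime.WorkflowRuntimeBuilder;
public class WorkerApp {
    public static void main(String[] args) throws Exception {
        WorkflowRuntimeBuilder builder = new WorkflowRuntimeBuilder();
        // 注册 workflow 与 activity(注册名即类的全限定名)
        builder.registerWorkflow(DemoWorkflow.class);
        builder.registerActivity(DemoActivityImpl.class);
        // build() 返回单例;start() 默认阻塞,开始从 sidecar 拉取并执行任务
        try (WorkflowRuntime runtime = builder.build()) {
            runtime.start();
        }
    }
}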
DurableTaskGrpcWorkerBuilder() 在构建时,需要设置 grpcChannel,而这个 grpcChannel 是通过 NetworkUtils.buildGrpcManagedChannel() 方法来实现的。
NetworkUtils.buildGrpcManagedChannel() 在 sdk/src/main/java/io/dapr/utils/NetworkUtils.java
文件中,是一个通用的网络工具类。buildGrpcManagedChannel() 方法的实现如下:
private static final String DEFAULT_SIDECAR_IP = "127.0.0.1";
private static final Integer DEFAULT_GRPC_PORT = 50001;
public static final Property<String> SIDECAR_IP = new StringProperty(
"dapr.sidecar.ip",
"DAPR_SIDECAR_IP",
DEFAULT_SIDECAR_IP);
public static final Property<Integer> GRPC_PORT = new IntegerProperty(
"dapr.grpc.port",
"DAPR_GRPC_PORT",
DEFAULT_GRPC_PORT);
public static final Property<String> GRPC_ENDPOINT = new StringProperty(
"dapr.grpc.endpoint",
"DAPR_GRPC_ENDPOINT",
null);
public static ManagedChannel buildGrpcManagedChannel() {
// 从系统属性或者环境变量中读取 dapr sidecar 的IP
String address = Properties.SIDECAR_IP.get();
// 从系统属性或者环境变量中读取 dapr grpc 端口
int port = Properties.GRPC_PORT.get();
// 默认不用https
boolean insecure = true;
// 从系统属性或者环境变量中读取 dapr grpc 端点信息
String grpcEndpoint = Properties.GRPC_ENDPOINT.get();
if ((grpcEndpoint != null) && !grpcEndpoint.isEmpty()) {
// 如果 dapr grpc 端点不为空,则用 grpc 端点的内容覆盖
URI uri = URI.create(grpcEndpoint);
// 通过 schema 是不是 http 来判断是 http 还是 https
insecure = uri.getScheme().equalsIgnoreCase("http");
// grpcEndpoint 如果设置有端口则采用,没有设置则根据是 http 还是 https 来选择 80 或者 443 端口
port = uri.getPort() > 0 ? uri.getPort() : (insecure ? 80 : 443);
// 覆盖 dapr sidecar 的地址
address = uri.getHost();
if ((uri.getPath() != null) && !uri.getPath().isEmpty()) {
address += uri.getPath();
}
}
// 构建连接到指定地址的 grpc channel
ManagedChannelBuilder<?> builder = ManagedChannelBuilder.forAddress(address, port)
.userAgent(Version.getSdkVersion());
if (insecure) {
builder = builder.usePlaintext();
}
return builder.build();
}
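一个最小示意:通过系统属性(或环境变量 DAPR_GRPC_ENDPOINT)覆盖默认的 127.0.0.1:50001,示例中的地址为假设值:
import io.dapr.utils.NetworkUtils;
import io.grpc.ManagedChannel;
public class ChannelExample {
    public static void main(String[] args) {
        // 也可以不设置,默认连接 127.0.0.1:50001
        System.setProperty("dapr.grpc.endpoint", "https://dapr.example.com:443");
        ManagedChannel channel = NetworkUtils.buildGrpcManagedChannel();
        // ... 用该 channel 构建 durabletask 的 client / worker
        channel.shutdownNow();
    }
}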
从部署来看,workflow runtime 运行在 client 一侧的 app 应用程序内部,然后通过 durabletask 的 sdk 以 grpc 协议连接到 dapr sidecar。
这个设计有点奇怪:dapr sdk 和 dapr sidecar 之间没有走标准的 dapr API,而是走 durabletask 的 sdk。
WorkflowRuntimeBuilder 的 registerWorkflow() 方法在注册 workflow 对象时,实际代理给 DurableTaskGrpcWorkerBuilder 的 addOrchestration() 方法:
import com.microsoft.durabletask.TaskOrchestrationFactory;
public <T extends Workflow> WorkflowRuntimeBuilder registerWorkflow(Class<T> clazz) {
this.builder = this.builder.addOrchestration(
new OrchestratorWrapper<>(clazz)
);
return this;
}
而 addOrchestration() 方法的输入参数为 com.microsoft.durabletask.TaskOrchestrationFactory
:
public interface TaskOrchestrationFactory {
String getName();
TaskOrchestration create();
}
因此需要提供一个 TaskOrchestrationFactory 的实现。
OrchestratorWrapper 类实现了 com.microsoft.durabletask.TaskOrchestrationFactory
接口:
class OrchestratorWrapper<T extends Workflow> implements TaskOrchestrationFactory {
private final Constructor<T> workflowConstructor;
private final String name;
......
}
构造函数:
public OrchestratorWrapper(Class<T> clazz) {
// 获取并设置 name
this.name = clazz.getCanonicalName();
try {
// 获取 Constructor
this.workflowConstructor = clazz.getDeclaredConstructor();
} catch (NoSuchMethodException e) {
throw new RuntimeException(
String.format("No constructor found for workflow class '%s'.", this.name), e
);
}
}
TaskOrchestrationFactory 接口要求的 getName() 方法,直接返回前面获取的 name:
@Override
public String getName() {
return name;
}
TaskOrchestrationFactory 接口要求的 create() 方法,要返回一个 durabletask 的 TaskOrchestration ,而 TaskOrchestration 是一个 @FunctionalInterface,仅有一个 run() 方法:
@FunctionalInterface
public interface TaskOrchestration {
void run(TaskOrchestrationContext ctx);
}
因此构建 TaskOrchestration 实例的方式被简写为:
import com.microsoft.durabletask.TaskOrchestration;
@Override
public TaskOrchestration create() {
return ctx -> {
T workflow;
try {
// 通过 workflow 的构造器生成一个 workflow 实例
workflow = this.workflowConstructor.newInstance();
} catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
throw new RuntimeException(
String.format("Unable to instantiate instance of workflow class '%s'", this.name), e
);
}
// 将 durable task 的 context 包装为 dapr 的 workflow context DaprWorkflowContextImpl
// 然后执行 workflow.run()
workflow.run(new DaprWorkflowContextImpl(ctx));
};
}
WorkflowRuntimeBuilder 的 registerActivity() 方法在注册 activity 对象时,实际代理给 DurableTaskGrpcWorkerBuilder 的 addActivity() 方法:
import com.microsoft.durabletask.TaskOrchestrationFactory;
public <T extends WorkflowActivity> void registerActivity(Class<T> clazz) {
this.builder = this.builder.addActivity(
new ActivityWrapper<>(clazz)
);
}
而 addActivity() 方法的输入参数为 com.microsoft.durabletask.TaskActivityFactory
:
public interface TaskActivityFactory {
String getName();
TaskActivity create();
}
因此需要提供一个 TaskActivityFactory 的实现。
ActivityWrapper 类实现了 com.microsoft.durabletask.TaskActivityFactory
接口:
public class ActivityWrapper<T extends WorkflowActivity> implements TaskActivityFactory {
private final Constructor<T> activityConstructor;
private final String name;
......
}
构造函数:
public ActivityWrapper(Class<T> clazz) {
this.name = clazz.getCanonicalName();
try {
this.activityConstructor = clazz.getDeclaredConstructor();
} catch (NoSuchMethodException e) {
throw new RuntimeException(
String.format("No constructor found for activity class '%s'.", this.name), e
);
}
}
TaskActivityFactory 接口要求的 getName() 方法,直接返回前面获取的 name:
@Override
public String getName() {
return name;
}
TaskActivityFactory 接口要求的 create() 方法,要返回一个 durabletask 的 TaskActivity ,而 TaskActivity 是一个 @FunctionalInterface,仅有一个 run() 方法:
@FunctionalInterface
public interface TaskActivity {
Object run(TaskActivityContext ctx);
}
因此构建 TaskActivity 实例的方式被简写为:
import com.microsoft.durabletask.TaskActivity;
@Override
public TaskActivity create() {
return ctx -> {
Object result;
T activity;
try {
activity = this.activityConstructor.newInstance();
} catch (InstantiationException | IllegalAccessException | InvocationTargetException e) {
throw new RuntimeException(
String.format("Unable to instantiate instance of activity class '%s'", this.name), e
);
}
result = activity.run(new WorkflowActivityContext(ctx));
return result;
};
}
}
WorkflowActivity 接口定义了 Activity:
public interface WorkflowActivity {
/**
* 执行活动逻辑并返回一个值,该值将被序列化并返回给调用的协调器。
*
* @param ctx 提供有关当前活动执行的信息,如活动名称和协调程序提供给它的输入数据。
* @return 要返回给调用协调器的任何可序列化值。
*/
Object run(WorkflowActivityContext ctx);
}
WorkflowActivity 的 javadoc 描述如下:
任务活动实现的通用接口。
活动(Activity)是 durable task 协调的基本工作单位。活动(Activity)是在业务流程中进行协调的任务。例如,您可以创建一个协调器来处理订单。这些任务包括检查库存、向客户收费和创建装运。每个任务都是一个单独的活动(Activity)。这些活动(Activity)可以串行执行、并行执行或两者结合执行。
与任务协调器不同的是,活动(Activity)在工作类型上不受限制。活动(Activity)函数经常用于进行网络调用或运行 CPU 密集型操作。活动(Activity)还可以将数据返回给协调器函数。 durable task 运行时保证每个被调用的活动(Activity)函数在协调执行期间至少被执行一次。
由于活动(Activity)只能保证至少执行一次,因此建议尽可能将活动(Activity)逻辑作为幂等逻辑来实现。
协调器使用 io.dapr.workflows.WorkflowContext.callActivity 方法重载之一来调度活动。
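一个最小的 activity 实现示意(DemoActivityImpl 为假设的类名,对应前面 registerActivity() 注册的类):
import io.dapr.workflows.runtime.WorkflowActivity;
import io.dapr.workflows.runtime.WorkflowActivityContext;
public class DemoActivityImpl implements WorkflowActivity {
    @Override
    public Object run(WorkflowActivityContext ctx) {
        // 获取协调器传入的输入(按 String 反序列化)
        String input = ctx.getInput(String.class);
        // 返回值会被序列化后返回给调用的协调器
        return "processed: " + input;
    }
}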
WorkflowActivityContext 简单包装了 durabletask 的 TaskActivityContext :
import com.microsoft.durabletask.TaskActivityContext;
public class WorkflowActivityContext implements TaskActivityContext {
private final TaskActivityContext innerContext;
public WorkflowActivityContext(TaskActivityContext context) throws IllegalArgumentException {
if (context == null) {
throw new IllegalArgumentException("Context cannot be null");
}
this.innerContext = context;
}
......
}
TaskActivityContext 接口要求的 getName() 和 getInput() 方法都简单代理给了内部的 durabletask 的 TaskActivityContext :
public String getName() {
return this.innerContext.getName();
}
public <T> T getInput(Class<T> targetType) {
return this.innerContext.getInput(targetType);
}
备注:这样的包装并没有起到隔离 dapr sdk 和 durabletask sdk 的目的,还是紧密的耦合在一起,包装的意义何在?
DaprWorkflowClient 定义管理 Dapr 工作流实例的客户端操作。
注意这里是 “管理” !
import com.microsoft.durabletask.DurableTaskClient;
public class DaprWorkflowClient implements AutoCloseable {
DurableTaskClient innerClient;
ManagedChannel grpcChannel;
public DaprWorkflowClient() {
this(NetworkUtils.buildGrpcManagedChannel());
}
private DaprWorkflowClient(ManagedChannel grpcChannel) {
this(createDurableTaskClient(grpcChannel), grpcChannel);
}
private DaprWorkflowClient(DurableTaskClient innerClient, ManagedChannel grpcChannel) {
this.innerClient = innerClient;
this.grpcChannel = grpcChannel;
}
实现上依然是包装 durabletask 的 DurableTaskClient , 而 durabletask 的 DurableTaskClient 在创建时需要传入一个 grpcChannel。
关键点在于这个 grpcChannel 的创建,可以从外部传入,如果没有传入则可以通过 NetworkUtils.buildGrpcManagedChannel() 方法进行创建。
实现和之前 WorkflowRuntimeBuilder 中的一致,都是调用 NetworkUtils.buildGrpcManagedChannel()
方法。
NetworkUtils.buildGrpcManagedChannel()
方法在 dapr java sdk 中一共有3处调用:
WorkflowRuntimeBuilder:
public WorkflowRuntimeBuilder() {
this.builder = new DurableTaskGrpcWorkerBuilder().grpcChannel(NetworkUtils.buildGrpcManagedChannel());
}
DaprWorkflowClient:
public DaprWorkflowClient() {
this(NetworkUtils.buildGrpcManagedChannel());
}
DaprClientBuilder
final ManagedChannel channel = NetworkUtils.buildGrpcManagedChannel();
DurableTaskClient 的创建是简单的调用 durabletask 的 DurableTaskGrpcClientBuilder 来实现的:
import com.microsoft.durabletask.DurableTaskGrpcClientBuilder;
private static DurableTaskClient createDurableTaskClient(ManagedChannel grpcChannel) {
return new DurableTaskGrpcClientBuilder()
.grpcChannel(grpcChannel)
.build();
}
close() 方法用于关闭 DaprWorkflowClient,内部实现为关闭包装的 durabletask 的 DurableTaskClient 以及创建时传入的 grpcChannel:
public void close() throws InterruptedException {
try {
if (this.innerClient != null) {
this.innerClient.close();
this.innerClient = null;
}
} finally {
if (this.grpcChannel != null && !this.grpcChannel.isShutdown()) {
this.grpcChannel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
this.grpcChannel = null;
}
}
}
}
scheduleNewWorkflow() 方法调度一个新的 workflow ,即创建并开始一个新的 workflow instance,这个方法返回 workflow instance id:
package io.dapr.workflows.client;
public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz) {
return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName());
}
public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input) {
return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input);
}
public <T extends Workflow> String scheduleNewWorkflow(Class<T> clazz, Object input, String instanceId) {
return this.innerClient.scheduleNewOrchestrationInstance(clazz.getCanonicalName(), input, instanceId);
}
实现完全代理给 durabletask 的 DurableTaskClient 。
terminateWorkflow() 方法终止一个 workflow instance 的执行,需要传入之前从 scheduleNewWorkflow() 方法中得到的 workflow instance id。
public void terminateWorkflow(String workflowInstanceId, @Nullable Object output) {
this.innerClient.terminate(workflowInstanceId, output);
}
output 参数是可选的,用来传递被终止的 workflow instance 的输出。
getInstanceState() 方法获取 workflow instance 的状态,同样需要传入之前从 scheduleNewWorkflow() 方法中得到的 workflow instance id:
@Nullable
public WorkflowInstanceStatus getInstanceState(String instanceId, boolean getInputsAndOutputs) {
OrchestrationMetadata metadata = this.innerClient.getInstanceMetadata(instanceId, getInputsAndOutputs);
if (metadata == null) {
return null;
}
return new WorkflowInstanceStatus(metadata);
}
实现为调用 durabletask 的 DurableTaskClient 的 getInstanceMetadata() 方法来获取 OrchestrationMetadata,然后转换为 dapr 定义的 WorkflowInstanceStatus。
这里的细节在 WorkflowInstanceStatus 类实现中展开。
waitForInstanceStart() 方法等待 workflow instance 执行的开始:
@Nullable
public WorkflowInstanceStatus waitForInstanceStart(String instanceId, Duration timeout, boolean getInputsAndOutputs)
throws TimeoutException {
OrchestrationMetadata metadata = this.innerClient.waitForInstanceStart(instanceId, timeout, getInputsAndOutputs);
return metadata == null ? null : new WorkflowInstanceStatus(metadata);
}
waitForInstanceStart() 方法的 javadoc 描述为:
等待工作流开始运行,并返回一个 WorkflowInstanceStatus 对象,该对象包含已启动实例的元数据,以及可选的输入、输出和自定义状态有效载荷。
“已启动” 的工作流实例是指未处于 “Pending” 状态的任何实例。
如果调用该方法时工作流实例已在运行,该方法将立即返回。
waitForInstanceCompletion() 方法等待 workflow instance 执行的完成:
@Nullable
public WorkflowInstanceStatus waitForInstanceCompletion(String instanceId, Duration timeout,
boolean getInputsAndOutputs) throws TimeoutException {
OrchestrationMetadata metadata =
this.innerClient.waitForInstanceCompletion(instanceId, timeout, getInputsAndOutputs);
return metadata == null ? null : new WorkflowInstanceStatus(metadata);
}
waitForInstanceCompletion() 方法的 javadoc 描述为:
等待工作流完成,并返回一个包含已完成实例元数据的 WorkflowInstanceStatus 对象。
“已完成” 的工作流实例是指处于终止状态之一的任何实例。例如,“Completed”、“Failed” 或 “Terminated” 状态。
工作流是长期运行的,可能需要数小时、数天或数月才能完成。工作流也可能是长久的,在这种情况下,除非终止,否则永远不会完成。在这种情况下,该调用可能会无限期阻塞,因此必须注意确保使用适当的超时。如果调用该方法时工作流实例已经完成,该方法将立即返回。
purgeInstance() 方法从工作流状态存储中清除工作流实例的状态:
public boolean purgeInstance(String workflowInstanceId) {
PurgeResult result = this.innerClient.purgeInstance(workflowInstanceId);
if (result != null) {
return result.getDeletedInstanceCount() > 0;
}
return false;
}
如果找到工作流状态并成功清除,则返回 true,否则返回 false。
raiseEvent() 方法向等待中的工作流实例发送事件通知消息:
public void raiseEvent(String workflowInstanceId, String eventName, Object eventPayload) {
this.innerClient.raiseEvent(workflowInstanceId, eventName, eventPayload);
}
下面这两个方法暂时还不知道在什么情况下使用,暂时忽略:
public void createTaskHub(boolean recreateIfExists) {
this.innerClient.createTaskHub(recreateIfExists);
}
public void deleteTaskHub() {
this.innerClient.deleteTaskHub();
}
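把上面这些管理方法串起来,client 侧一个最小的使用示意如下(假设 DemoWorkflow 已经通过 WorkflowRuntime 注册并运行):
import java.time.Duration;
import io.dapr.workflows.client.DaprWorkflowClient;
import io.dapr.workflows.client.WorkflowInstanceStatus;
public class WorkflowClientExample {
    public static void main(String[] args) throws Exception {
        try (DaprWorkflowClient client = new DaprWorkflowClient()) {
            // 调度一个新的 workflow instance,返回 instance id
            String instanceId = client.scheduleNewWorkflow(DemoWorkflow.class, "input-data");
            // 最多等待 60 秒,等待执行完成并获取输入输出
            WorkflowInstanceStatus status =
                client.waitForInstanceCompletion(instanceId, Duration.ofSeconds(60), true);
            // status 是实例元数据的快照(可能为 null)
            System.out.println("completed: " + (status != null));
            // 从状态存储中清理该 instance 的历史
            client.purgeInstance(instanceId);
        }
    }
}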
WorkflowInstanceStatus 代表工作流实例当前状态的快照,包括元数据。
WorkflowInstanceStatus 的实现依然是包装 durabletask,内部是一个 durabletask 的 OrchestrationMetadata,以及 OrchestrationMetadata 携带的 FailureDetails:
import com.microsoft.durabletask.FailureDetails;
import com.microsoft.durabletask.OrchestrationMetadata;
public class WorkflowInstanceStatus {
private final OrchestrationMetadata orchestrationMetadata;
@Nullable
private final WorkflowFailureDetails failureDetails;
public WorkflowInstanceStatus(OrchestrationMetadata orchestrationMetadata) {
if (orchestrationMetadata == null) {
throw new IllegalArgumentException("OrchestrationMetadata cannot be null");
}
this.orchestrationMetadata = orchestrationMetadata;
FailureDetails details = orchestrationMetadata.getFailureDetails();
if (details != null) {
this.failureDetails = new WorkflowFailureDetails(details);
} else {
this.failureDetails = null;
}
}
获取 FailureDetails 之后将转为 dapr 的 WorkflowFailureDetails, 这里的细节在 WorkflowFailureDetails 类实现中展开。
WorkflowFailureDetails 只是非常简单的包装了 durabletask 的 FailureDetails
public class WorkflowFailureDetails {
FailureDetails workflowFailureDetails;
/**
* Class constructor.
* @param failureDetails failure Details
*/
public WorkflowFailureDetails(FailureDetails failureDetails) {
this.workflowFailureDetails = failureDetails;
}
然后代理各种方法:
public String getErrorType() {
return workflowFailureDetails.getErrorType();
}
public String getErrorMessage() {
return workflowFailureDetails.getErrorMessage();
}
public String getStackTrace() {
return workflowFailureDetails.getStackTrace();
}