Split generated serde code from transport handling by introducing middlewares #343
This makes good sense to me. I like the pipeline-based model, and I appreciate the testing benefits we'll see as a result of this (re)organization. I don't think we'll see much compile-time (speed) benefit from the change, will we? It seems like we'll still be generating a good amount of code for the clients, servers, and serialization functions. We may still want to consider some small namespace changes, such as …
If we agree on this approach, should we also include this in the 2.0 milestone?
I agree that it should be about the same.
I'd like to get #323 completed before moving forward with this.
This sounds great, I love the idea of being able to plug custom middleware into the middle of the `stack`.
This all sounds good (and is the approach that Absinthe uses), but why tie this internal refactor to 2.0 being released? Seems like a good candidate for 2.1.
This isn't in the 2.0 milestone :) but this should be usable externally, it isn't internal-only.
Perfect. I guess you're right, it will have some effects, like having to do …
After discussion with @thecodeboss we are going to try a different approach, at least for the first two tasks, which should allow lower-level access to the thrift serialization and deserialization, and provide idiomatic support for multiple protocols. After completing it we should be in much better shape for 2.0, and could move the remaining steps to a different repo (to avoid more complex parts than needed in the thrift library itself).

Introduce a `Thrift.Serializable` protocol:

```elixir
defprotocol Thrift.Serializable do
  def serialize(thrift_struct, protocol_struct)
  def deserialize(thrift_struct, protocol_struct)
end
```

The implementations (per thrift struct) of the protocol will delegate to a `SerDe` module:

```elixir
# inside each generated thrift struct module
defimpl Thrift.Serializable do
  def serialize(%name{} = thrift_struct, protocol_struct) do
    Module.concat(name, SerDe).serialize(protocol_struct, thrift_struct)
  end

  def deserialize(%name{} = thrift_struct, protocol_struct) do
    Module.concat(name, SerDe).deserialize(protocol_struct, thrift_struct)
  end
end
```

With each struct getting its own `SerDe` protocol:

```elixir
defprotocol NameSpace.Struct.SerDe do
  alias NameSpace.Struct

  @spec serialize(protocol_struct, Struct.t) :: protocol_struct when protocol_struct: var
  def serialize(protocol_struct, thrift_struct)

  @spec deserialize(protocol_struct) :: {Struct.t, protocol_struct} | :error when protocol_struct: var
  def deserialize(protocol_struct)

  @spec deserialize(protocol_struct, Struct.t) :: {Struct.t, protocol_struct} | :error when protocol_struct: var
  def deserialize(protocol_struct, thrift_struct)
end
```
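For example, a round trip would dispatch through the struct's `SerDe` implementation for the given protocol struct (a sketch, using the `%Thrift.Protocol.Binary{}` struct from the snippet below):

```elixir
# Hypothetical round trip: the thrift struct picks the generated SerDe
# implementation, the protocol struct picks the wire format.
empty = %Thrift.Protocol.Binary{payload: <<>>}
binary = Thrift.Serializable.serialize(%NameSpace.Struct{}, empty)
{%NameSpace.Struct{} = _decoded, _rest} = Thrift.Serializable.deserialize(%NameSpace.Struct{}, binary)
```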
Therefore each thrift protocol (i.e. TBinary, TCompact #333) would implement this protocol for each struct in a schema, and this would be the only code generated for those protocols (e.g. `NameSpace.Struct.SerDe.Thrift.Protocol.Binary`).

To avoid unnecessary allocations and dispatching logic the generated code (per protocol) will make assumptions about the defined implementations of other structs (for its protocol), and call the implementation module directly with a binary in place of the protocol struct. This is an internal optimization that won't be exposed externally as the elixir Protocol will delegate based on the protocol struct, so that code can never be reached externally. Note we already do a call with the same overhead and assumptions between structs today. This would be equivalent to:

```elixir
def serialize(binary, %_struct{nested: nested_struct}) when is_binary(binary) do
  [NameSpace.NestedStruct.SerDe.Thrift.Protocol.Binary.serialize(binary, nested_struct), 0]
end

def serialize(%Thrift.Protocol.Binary{payload: payload} = wrapper, struct) do
  %Thrift.Protocol.Binary{wrapper | payload: serialize(payload, struct)}
end

def deserialize(binary) when is_binary(binary) do
  deserialize_struct(binary, %NameSpace.Struct{})
end

def deserialize(wrapper) do
  deserialize(wrapper, %NameSpace.Struct{})
end

def deserialize(%Thrift.Protocol.Binary{payload: payload} = wrapper, default_struct) do
  case deserialize_struct(payload, default_struct) do
    {struct, rest} ->
      {struct, %Thrift.Protocol.Binary{payload: rest}}

    :error ->
      :error
  end
end

defp deserialize_struct(<<_field_id::16, bin::binary>>, acc) do
  case NameSpace.NestedStruct.SerDe.Thrift.Protocol.Binary.deserialize(bin) do
    {nested_struct, rest} ->
      deserialize_struct(rest, %{acc | nested: nested_struct})

    :error ->
      :error
  end
end
```

Note the extra `0` appended when serializing, the field-stop byte terminating the struct.

In order to fully decouple transports from protocols we also need to implement a `Thrift.Message` struct with its own `SerDe` protocol:

```elixir
defmodule Thrift.Message do
  @enforce_keys [:type, :seq_id, :name, :payload]
  defstruct [:type, :seq_id, :name, :payload]

  @type t() :: t(term())
  @type t(payload) :: %__MODULE__{payload: payload}

  defprotocol SerDe do
    alias Thrift.Message
    alias Thrift.TApplicationException

    @spec serialize(payload, Message.t(payload | struct)) :: payload when payload: var
    def serialize(payload, msg)

    @spec deserialize(payload) :: {Message.t(payload), payload} | :error when payload: var
    def deserialize(payload)

    @spec deserialize(payload, Message.t(thrift_struct)) :: {Message.t(thrift_struct | TApplicationException.t), payload} | :error when thrift_struct: struct, payload: var
    @spec deserialize(payload, Message.t(nil)) :: {Message.t(payload), payload} | :error when payload: var
    def deserialize(payload, msg)
  end

  defimpl Thrift.Serializable do
    def serialize(msg, payload), do: SerDe.serialize(payload, msg)
    def deserialize(msg, payload), do: SerDe.deserialize(payload, msg)
  end
end
```

Note the parameterization of `t(payload)` in the specs above.
This is a very well-thought-out approach, thanks for doing this @fishcakez. I believe it adds a couple layers of complexity by now having multiple protocols to reason about (…). For the last example, are you suggesting something like:

```elixir
defimpl Thrift.Message.SerDe, for: FramedTransport do
  def serialize(payload, msg), do: ...
  def deserialize(payload), do: ...
end
```

where each transport would implement the `Thrift.Message.SerDe` protocol?
The protocols we generate code for (TBinary, TCompact) would implement the `Thrift.Message.SerDe` protocol.
@fishcakez are you suggesting we pull this into the 2.0 milestone?
@jparise yes, but only the last part. I actually started a bit towards this; plan to have a PR by the end of the week.
Currently we generate the client and server code for TFramedTransport and there is somewhat tight coupling. However there are four different parts to this end-to-end pipeline: the generated client functions, the protocol (serialization, e.g. TBinary), the transport (e.g. TFramedTransport), and the server with its handler.

To decouple, we provide a common data structure that is passed through the pipeline on each of the peers:
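A sketch of that structure (the same `Thrift.Message` discussed in the comments above):

```elixir
defmodule Thrift.Message do
  # type: :call | :oneway | :reply | :exception
  # seq_id: matches a response to its request
  # name: the method name, e.g. "ping"
  # payload: the request/response struct, or a protocol struct
  #          (e.g. %Thrift.Binary{}) holding its serialized form
  defstruct [:type, :seq_id, :name, :payload]
end
```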
On the client side the function for a call (or oneway) would create a `%Thrift.Message{}` with the method name, request type and the payload as the request struct. This would be passed to the protocol layer, which replaces the request struct in the message with a new struct (e.g. `%Thrift.Binary{}`) containing the serialized request struct. The protocol then passes the message to the transport layer (e.g. `Thrift.Binary.Framed.Client`), which converts the message to data and sends it on the socket. The transport layer then receives the response, deserializes it into a new `Thrift.Message` with a serialized payload struct (e.g. `%Thrift.Binary{}`) and returns the message to the protocol layer. The protocol layer deserializes the payload to the response struct, puts that in the message's payload and returns the message to the generated client function. Finally the client function handles the response struct and converts it to the function's result, i.e. `{:ok, result} | {:error, Exception.t}`.

This means our client side control flow is `Client -> Protocol -> Transport -> Protocol -> Client`. For the server side we reverse this to `Transport -> Protocol -> Server -> Protocol -> Transport`.

On the server side the transport layer (e.g. `Thrift.Binary.Framed.Server`) receives data from the socket and deserializes it into a message containing the method name, request type, sequence id and serialized payload (e.g. a `Thrift.Binary` struct). The message is passed to the protocol layer, which deserializes the payload to the request struct and passes the message to the server layer. The server layer unrolls the request struct and dispatches it to the server's callback module, which handles the request and returns a response. The server layer turns this response into the response struct, puts it in the message struct with the new type (e.g. `:reply`) and returns it to the protocol layer. The protocol layer serializes the response struct into its own payload (e.g. `%Thrift.Binary{}`) and returns it to the transport layer. The transport layer serializes the message and sends it over the socket.

Let's take this through an example with the following schema:
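For instance, a minimal service with a single method (the sketches below assume this shape):

```thrift
service Example {
  bool ping()
}
```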
The generated client code will look approximately like:
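A sketch of that generated code; `Example.PingArgs` and `Example.PingResult` stand in for whatever names the generator gives the args and result structs:

```elixir
defmodule Example.Client do
  alias Thrift.Message

  # Build a :call message for ping, run it through the middleware stack and
  # convert the reply message into the function's result.
  def ping(stack) do
    message = %Message{type: :call, seq_id: 0, name: "ping", payload: %Example.PingArgs{}}

    case Thrift.Pipeline.call(message, stack) do
      %Message{type: :reply, payload: %Example.PingResult{success: success}} ->
        {:ok, success}

      %Message{type: :exception, payload: exception} ->
        {:error, exception}
    end
  end
end
```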
However it is awkward to have to pass the `stack` around to all the clients, so we will allow a stack and client to be compiled into their own module:
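Something like the following, where `use Thrift.Client` is a hypothetical macro that compiles the stack in and defines the service's functions:

```elixir
defmodule Example.MyClient do
  use Thrift.Client,
    service: Example.Client,
    stack: [
      {Example.Service.Binary, []},
      {Thrift.Binary.Framed, [pool: Example.MyClient.Pool]}
    ]
end
```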
And then create the `Example.MyClient` with `ping/0`, with function spec:
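Assuming `ping` returns `bool` as in the schema above:

```elixir
@spec ping() :: {:ok, boolean} | {:error, Exception.t}
```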
The client `stack` is a list of modules and arguments:
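For the binary protocol over the framed transport, roughly (the option names are illustrative):

```elixir
stack = [
  # protocol middleware: serializes the request, deserializes the reply
  {Example.Service.Binary, []},
  # terminal transport middleware: performs the framed socket round trip
  {Thrift.Binary.Framed, [pool: Example.MyClient.Pool]}
]
```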
With `Thrift.Pipeline.call/2` definition approximately:
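A sketch, assuming `{module, args}` entries where the final entry terminates the pipeline with a `call/2`:

```elixir
defmodule Thrift.Pipeline do
  # The last element of the stack handles the message itself; every other
  # element receives a `next` fun that continues down the stack.
  def call(message, [{module, args}]) do
    module.call(message, args)
  end

  def call(message, [{module, args} | rest]) do
    module.call(message, &call(&1, rest), args)
  end
end
```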
With `Example.Service.Binary.call/3` definition approximately (`next` being `&Thrift.Binary.Framed.call(&1, opts)`):
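A sketch of the client-side protocol middleware, reusing the struct names from the client example above and the `Thrift.Serializable` protocol from the comments for the (de)serialization step:

```elixir
defmodule Example.Service.Binary do
  alias Thrift.Message

  # Serialize the request struct into a %Thrift.Binary{} payload, let the
  # transport do the round trip, then deserialize the reply payload back
  # into the result struct.
  def call(%Message{name: "ping", payload: args} = message, next, _opts) do
    binary = Thrift.Serializable.serialize(args, %Thrift.Binary{payload: <<>>})
    %Message{payload: reply} = replied = next.(%Message{message | payload: binary})
    {result, _rest} = Thrift.Serializable.deserialize(%Example.PingResult{}, reply)
    %Message{replied | payload: result}
  end
end
```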
With `Thrift.Binary.Framed.call/2` definition approximately:
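A sketch of the terminal transport middleware; `Pool.request/3` is an assumed helper:

```elixir
defmodule Thrift.Binary.Framed do
  alias Thrift.Message

  # Terminal middleware: there is no `next`, just a socket round trip. The
  # pool checks out a connection, sends the framed request and returns the
  # decoded reply message.
  def call(%Message{} = message, opts) do
    pool = Keyword.fetch!(opts, :pool)
    Thrift.Binary.Framed.Pool.request(pool, message, opts)
  end
end
```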
The `Thrift.Binary.Framed.Pool` would check out a connection, send the request, receive the reply, and check the connection back in.

On the server side we also have a compiled module:
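Again with a hypothetical `use Thrift.Server` counterpart (`Example.Handler` being the handler callback module):

```elixir
defmodule Example.MyServer do
  use Thrift.Server,
    service: Example.Service,
    stack: [
      {Example.Service.Binary, []},
      {Example.Service.Handler, {Example.Handler, []}}
    ]
end
```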
It is started like:
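Perhaps under a supervisor, with the transport options given at start (the `port` option is illustrative):

```elixir
children = [
  {Example.MyServer, port: 9090}
]

Supervisor.start_link(children, strategy: :one_for_one)
```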
The server `stack` is a similar list of modules:
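Mirroring the client stack, with the handler middleware as the terminal element:

```elixir
stack = [
  # protocol middleware: deserializes the request, serializes the reply
  {Example.Service.Binary, []},
  # terminal middleware: dispatches to the handler's callbacks
  {Example.Service.Handler, {Example.Handler, []}}
]
```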
The `Thrift.Binary.Framed.Server` would have a function to handle each request, approximately:
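A sketch; the header parsing follows the strict binary protocol, and the surrounding acceptor (not shown) frames the reply message and writes it back to the socket:

```elixir
defmodule Thrift.Binary.Framed.Server do
  alias Thrift.Message

  # Parse the message header, wrap the still-serialized args in a
  # %Thrift.Binary{} payload and run the message through the stack.
  def handle_request(<<0x8001::16, 0, type_id, len::32, name::binary-size(len),
                       seq_id::32, args::binary>>, stack) do
    message = %Message{
      type: message_type(type_id),
      seq_id: seq_id,
      name: name,
      payload: %Thrift.Binary{payload: args}
    }

    Thrift.Pipeline.call(message, stack)
  end

  defp message_type(1), do: :call
  defp message_type(4), do: :oneway
end
```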
With `Example.Service.Binary.call/3` definition approximately (`next` being `&Example.Service.Handler.call(&1, {handler, opts})`):
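The server-side clause of the same protocol middleware could look like:

```elixir
# Deserialize the request payload into the args struct, let the handler
# middleware produce the reply struct, then serialize it back into a
# %Thrift.Binary{} payload for the transport.
def call(%Message{name: "ping", payload: %Thrift.Binary{} = binary} = message, next, _opts) do
  {args, _rest} = Thrift.Serializable.deserialize(%Example.PingArgs{}, binary)
  %Message{payload: result} = replied = next.(%Message{message | payload: args})
  %Message{replied | payload: Thrift.Serializable.serialize(result, %Thrift.Binary{payload: <<>>})}
end
```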
With `Example.Service.Handler.call/2` definition approximately:
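With the `{handler, opts}` tuple coming straight from the stack entry:

```elixir
defmodule Example.Service.Handler do
  alias Thrift.Message

  # Terminal middleware on the server: unroll the args struct, dispatch to
  # the handler module's callback and wrap the result as a :reply.
  def call(%Message{name: "ping", payload: %Example.PingArgs{}} = message, {handler, _opts}) do
    success = handler.ping()
    %Message{message | type: :reply, payload: %Example.PingResult{success: success}}
  end
end
```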
Once these are in place we can support custom middleware as elements in the `stack` on the client and server side. For example we could support `Thrift.BinaryCompact` (#333) as a protocol instead of our current `Thrift.Binary`. We could support both, or even additional custom protocols, on the same server, as we could dispatch to the correct protocol module based on the "magic bytes" in the framed binary message.

Notably the middlewares are identical on client and server, so a single middleware can be reused on both sides, for example to measure per-method latency, which is awkward to do right now. There are many other common or generic middlewares that would be useful. Fortunately there is a protocol-agnostic middleware library at https://github.com/fishcakez/stack that we can build on top of, which supports retries, concurrency limits, request criticality, deadlines and distributed tracing out of the box.
We don't currently support a thrift protocol/transport that supports distributed tracing. There are open source thrift protocols that do, `TTwitter` from finagle and `THeader` from fbthrift, which could be supported via middlewares. These would require an additional `headers` field in the `Thrift.Message` struct:
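Roughly, with a default so protocols without headers can ignore the field:

```elixir
defmodule Thrift.Message do
  @enforce_keys [:type, :seq_id, :name, :payload]
  # headers: e.g. trace ids, deadlines, client name
  defstruct [:type, :seq_id, :name, :payload, headers: %{}]
end
```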
The middleware design also enables a convenient way to test clients and servers, because we can write a client that calls through to the server side implementation without the protocol or transport:
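For example, a stack containing only the handler middleware dispatches client calls straight into the handler, in process (`Example.TestHandler` is illustrative):

```elixir
# No protocol or transport middleware: the request struct is passed to the
# server-side handler middleware directly.
test_stack = [{Example.Service.Handler, {Example.TestHandler, []}}]

{:ok, true} = Example.Client.ping(test_stack)
```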
We would implement this in stages, including `stack` for public middlewares.