Ephemeral Control API #5136
Replies: 5 comments 6 replies
-
IMHO, these solutions don't fit:
While I think we should focus on these:

K8S API Aggregation
I honestly didn't know about this feature of K8S at all; from a quick look it seems somewhat similar to how admission webhooks work... Even if our controllers don't support it, maybe we can reuse some code from the webhooks to implement it?! It's definitely worth investigating. Did you already do some investigation around it? If we could have an endpoint that is already connected and authenticated following the K8S flow and service accounts, and we just need to care about implementing the API, then I think we can achieve a great result, like:
But again, I need to read a bit more about it to understand if I'm just dreaming 😄

New CRDs
What I like here is:
I think the trick here is to create a CRD which has the same semantics as a Job: once created, you can't remove it until it's done, nor can you mutate it. Because we can make these "command" CRDs implementation-specific, this also avoids modifying the
If I have to choose between these 2 approaches, I would prefer the K8S API Aggregation, but again, we need to understand its feasibility...
-
Some key info around aggregated APIs:
-
Interesting K8S Issue on the topic (with a final comment from @n3wscott ; )
-
I am not sure the Aggregation API is feasible, because of what Paul said. Since we talk about CRDs... Did we think about a specific
-
After spending some time researching the K8S Aggregation API, I don't think it is the right approach, at least for the current KafkaChannel Subscription Replay use case. Everything I've seen (K8S docs, apiserver-builder-alpha, Sample API Server) indicates we would be going against the grain, using this in a manner for which it was not intended. The Aggregation API is focused on allowing advanced or alternate handling of Custom Resources above and beyond what the standard CRD mechanism allows. It is still, though, focused on managing the CRUD lifecycle of a Custom Resource, whereas what we are trying to do is expose some additional "functionality" that is "related" to a Custom Resource. The closest paradigm that might work would be to add a sub-resource for our "functionality" to the Subscription Custom Resource. There are several problems with this approach, including:
Additionally, the concerns provided above by Paul about CRDs and Aggregated APIs in the same API Group, potential support from cloud providers, etc. are major impediments. Also, supporting the Aggregation API requires significantly more effort (initial and ongoing) than CRDs and is probably not something to be taken lightly. If we instead choose to create a new "Request" or "Command" CRD for the Replay feature, then there's no longer a need to use the Aggregation API over the simpler CRD-based approach. I'm thinking this is the most "standard" way of handling this in K8S and we should choose this approach?
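To make the "Request" / "Command" CRD idea concrete, here is a rough sketch of what an instance of such a resource could look like. The kind, group, and field names below (`SubscriptionReplay`, `kafka.eventing.knative.dev`, `offsetTime`, etc.) are purely hypothetical illustrations, not an agreed design:

```yaml
# Hypothetical example only: names and fields are illustrative, not a proposal.
apiVersion: kafka.eventing.knative.dev/v1alpha1
kind: SubscriptionReplay            # one-shot "command" resource, Job-like semantics
metadata:
  name: replay-my-subscription
  namespace: default
spec:
  channelRef:                       # the KafkaChannel whose Subscription is replayed
    name: my-kafka-channel
  subscriptionRef:
    name: my-subscription
  offsetTime: "2021-06-01T00:00:00Z"  # reposition consumer offsets to this timestamp
status:
  state: Pending                    # e.g. Pending -> Completed; never re-executed
```

A reconciler would perform the offset reset exactly once and record the result in `status`, which sidesteps the "stale declarative config" problem described in this discussion.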
-
Ephemeral Control API
The standard Kubernetes & Knative APIs use a declarative configuration (YAML) which can be a bad fit for certain ephemeral control operations. This discussion aims to standardize an approach for handling such use cases in order to achieve consistency across the Knative landscape.
The ephemeral control operation in question is one which has an immediate one-time impact on the system, for which specifying some static configuration would be out-of-date after the operation has taken effect.
A specific use case is driving the need for this capability and will be used as an example in this discussion, but resolving the larger issue of how to handle such operations is the actual goal.
Example Use Case
Kafka Topics are persisted which allows subscribers to consume prior messages. They are also partitioned for scalability reasons, and the messages in these partitions are ordered. Individual Consumers of a partition track their own "offset" into the partition. This position tracking allows for the prevention of message loss during restarts, etc. If a subscriber experiences unexpected downtime it can reposition its offset back to an earlier time in order to recover what would otherwise be lost messages.
The eventing-kafka KafkaChannel implementation would like to expose this ability to end users so that they can perform the recovery described above, as detailed in Issue #477. The design for implementing this capability is well understood, except for "HOW" to expose this capability to the end user. Meaning, what is the API for them to request a re-position of the offset?
The operation of re-positioning the offsets is an ephemeral, one-time (per usage) change to the system, which would be awkward to specify in static configuration YAML. Once the new offset (or offset timestamp) was specified in configuration and the offset adjusted, the value would no longer be valid/relevant. In fact, if it was not removed or flagged as "completed" or "processed", it could cause an unintended additional re-positioning upon Pod restart or re-reconciliation.
Exposing a REST API on the KafkaChannel Controller which allows users to reposition their subscriptions is straightforward enough, but it would set a new precedent as to how such operations are handled, which might not be desirable. Therefore, I'd like to brainstorm options for handling such requests and hopefully settle on an approach.
Options (Brainstorming ; )
Controller REST APIs: Simply allow Controllers (and/or other Deployments?) to expose a REST API as desired. We could define some specifications around the naming/pathing of operations. This differs from all other K8S / Knative configuration.
K8S API Aggregation: Kubernetes exposes the ability to extend the core K8S api with custom endpoints via the APIService resource. The K8S Docs describe the complicated auth setup required for proxying requests. Not sure if this works with existing CRD Controllers?
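For reference, registering an aggregated API is done with an `APIService` resource, which tells the kube-apiserver to proxy an entire group/version to an in-cluster Service. A minimal sketch, where the group name and backing Service are hypothetical placeholders:

```yaml
# Sketch of an APIService registration; the group and service names are made up.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.control.knative.dev
spec:
  group: control.knative.dev        # requests for this group/version...
  version: v1alpha1
  service:                          # ...are proxied by kube-apiserver to this Service
    name: kafkachannel-control-api
    namespace: knative-eventing
    port: 443
  groupPriorityMinimum: 100
  versionPriority: 100
  # caBundle: <base64 CA used to verify the backend's serving certificate>
```

The backend Service must then implement the discovery and resource endpoints itself, which is part of the extra implementation burden discussed above.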
New CRDs: New Custom Resources could be created to capture the desire for such operations. This will require new Controllers to reconcile the CRDs. The Controllers would have to work together with the existing Controllers of the actual Custom Resources being managed (e.g. KafkaChannels), which might be awkward. These CRDs would have their own state (e.g. "Pending" or "Completed"). Questions remain around their overall lifecycle once completed - are they ever removed?
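A sketch of what defining such a "command" CRD might look like; the group, kind, and field names are hypothetical placeholders, and the status subresource is where the reconciler would record "Pending"/"Completed":

```yaml
# Hypothetical CRD definition for a one-shot "command" resource.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: subscriptionreplays.kafka.eventing.knative.dev
spec:
  group: kafka.eventing.knative.dev
  scope: Namespaced
  names:
    kind: SubscriptionReplay
    plural: subscriptionreplays
    singular: subscriptionreplay
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}                  # controller records Pending/Completed here
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                subscriptionRef:
                  type: object
                  properties:
                    name: { type: string }
                offsetTime:
                  type: string      # timestamp to reposition the offsets to
```

Job-like immutability (no spec mutation after creation) would have to be enforced separately, e.g. via a validating admission webhook.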
Treat As State: It might be possible to keep such configuration in the declarative YAML by treating it as State and/or History information. The contextual problem of the "one-time operation" must be handled to prevent unintended repetitive execution of the operation.
Kubernetes Jobs: Something along the lines of K8S Jobs might be a possibility. This is similar to the New CRDs approach, but instead of a Controller it is a new Deployment. This is heavy-weight, requiring new build images for distinct operations, etc.
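As a rough illustration of the Jobs-based option, the operation could be packaged as a container image and run once per request; the image name and arguments below are entirely hypothetical:

```yaml
# Hypothetical Job wrapping a one-shot offset-reset tool; image/args are made up.
apiVersion: batch/v1
kind: Job
metadata:
  name: replay-my-subscription
spec:
  backoffLimit: 0                  # don't blindly retry a partially-applied reset
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: offset-reset
          image: example.com/kafka-offset-reset:latest   # hypothetical image
          args:
            - "--channel=my-kafka-channel"
            - "--subscription=my-subscription"
            - "--to-datetime=2021-06-01T00:00:00Z"
```

This gets "run once and done" semantics for free from the Job controller, at the cost of building and maintaining a dedicated image per operation.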
Annotations: Some minimal standards around naming of light-weight annotations. These annotations would have to be removed/flagged, similar to the other solutions, to prevent unintended re-execution. This would probably require new Controllers as well, since the resource in question might not already have one (e.g. KafkaChannel replay would have to watch Subscriptions, since that is the granularity at which the annotations would be needed).
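For the annotations option, a user might patch the Subscription with a one-shot annotation that a controller watches, acts on, and then clears; the annotation key and value format here are hypothetical:

```yaml
# Hypothetical annotation on a Subscription; the key/value format is made up.
apiVersion: messaging.knative.dev/v1
kind: Subscription
metadata:
  name: my-subscription
  annotations:
    # a controller would act on this once, then remove it or mark it processed
    kafka.eventing.knative.dev/replay-offset-time: "2021-06-01T00:00:00Z"
spec:
  channel:
    apiVersion: messaging.knative.dev/v1beta1
    kind: KafkaChannel
    name: my-kafka-channel
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: my-consumer
```

Note the race this implies: the "command" lives in mutable metadata, so the remove-after-processing step is what prevents re-execution on the next reconcile.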
Other Ideas?
Additional Considerations
Resources