
[FR] Consider dial protocol tracing #411

Open
tallclair opened this issue Oct 5, 2022 · 2 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@tallclair
Contributor

It would be cool to have tracing on the initial dial request, but I think this would require a protocol-level enhancement to DialResponse (grpc).

The initial flow (happy path) looks like:

client -> server -> agent -[ endpoint ]- agent -> server -> client

It would be great to have latency information for each hop. In other words, something like the following (a rough sketch of the client-side computation appears after the list):

  1. Server records dial_req received timestamp (and server ID?)
  2. Agent records dial_req received timestamp (and agent ID?)
  3. Agent records endpoint dial complete timestamp
  4. Agent includes traces in the DialResponse
  5. Server records dial_resp received timestamp
  6. Server adds request & response received timestamps to DialResponse
  7. Client constructs the full trace, records latency metrics for each hop, logs full trace at high verbosity
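
A minimal sketch of that client-side reconstruction (step 7) in Go. Every field name here is an assumption: the current DialResponse proto carries none of these timestamps, so this only illustrates the per-hop arithmetic:

```go
package main

import (
	"fmt"
	"time"
)

// DialTrace holds the timestamps described in steps 1-6 above.
// All fields are hypothetical; nothing like this exists in the proto today.
type DialTrace struct {
	ClientReqSent      time.Time // client, just before sending the dial request
	ServerReqReceived  time.Time // step 1
	AgentReqReceived   time.Time // step 2
	EndpointDialDone   time.Time // step 3
	ServerRespReceived time.Time // step 5
	ClientRespReceived time.Time // client, on receiving the DialResponse
}

// HopLatencies computes step 7: the latency of each hop from the
// collected timestamps.
func (t DialTrace) HopLatencies() map[string]time.Duration {
	return map[string]time.Duration{
		"client->server":  t.ServerReqReceived.Sub(t.ClientReqSent),
		"server->agent":   t.AgentReqReceived.Sub(t.ServerReqReceived),
		"agent->endpoint": t.EndpointDialDone.Sub(t.AgentReqReceived),
		"agent->server":   t.ServerRespReceived.Sub(t.EndpointDialDone),
		"server->client":  t.ClientRespReceived.Sub(t.ServerRespReceived),
	}
}

func main() {
	// Fabricated timestamps, purely to exercise the computation.
	base := time.Now()
	tr := DialTrace{
		ClientReqSent:      base,
		ServerReqReceived:  base.Add(2 * time.Millisecond),
		AgentReqReceived:   base.Add(5 * time.Millisecond),
		EndpointDialDone:   base.Add(25 * time.Millisecond),
		ServerRespReceived: base.Add(28 * time.Millisecond),
		ClientRespReceived: base.Add(30 * time.Millisecond),
	}
	for hop, d := range tr.HopLatencies() {
		fmt.Printf("%-16s %v\n", hop, d)
	}
}
```

One caveat worth noting up front: the timestamps come from three different machines, so each hop's latency includes whatever clock skew exists between client, server, and agent.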

It looks like there's OpenCensus gRPC integration that we should investigate, but I'm not sure if it would work with our multiplexed streams. At the very least, we should make sure our design fits the OpenCensus tracing spec.
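
For reference, a minimal sketch of the standard OpenCensus gRPC client wiring, using the go.opencensus.io/plugin/ocgrpc stats handler; the address and credentials are placeholders. The handler instruments individual RPCs, which with our long-lived tunnel means one span per gRPC stream rather than one per dialed endpoint, hence the uncertainty above:

```go
package main

import (
	"log"

	"go.opencensus.io/plugin/ocgrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// ocgrpc's stats handler records a span plus latency metrics per RPC.
	// With a multiplexed tunnel, that is one span per gRPC stream, not
	// one per tunneled dial. Address and credentials are placeholders.
	conn, err := grpc.Dial("proxy-server:8090",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithStatsHandler(&ocgrpc.ClientHandler{}),
	)
	if err != nil {
		log.Fatalf("dial proxy server: %v", err)
	}
	defer conn.Close()
}
```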

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2023
@tallclair
Contributor Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 4, 2023