compactor is not honoring all API delete log requests #14985

Open
LinLo opened this issue Nov 18, 2024 · 0 comments

Describe the bug
In a Helm deployment in simple scalable mode (and possibly in distributed mode as well), several backend pods each run a compactor, so there are several "endpoints" for the delete API. Delete requests are sent to the "loki-backend" service, and the Kubernetes service then forwards each request to an arbitrary backend pod.

I found that POST delete requests that land on a backend pod whose compactor is not "active" (or perhaps not running at all) are still accepted, but stay in the "received" status.
I also found that GET delete requests follow the same service path as POST delete requests, so the reply differs depending on which backend pod the loki-backend service load-balances the request to.
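
One way to confirm that the backend pods disagree is to bypass the service and ask each pod directly. The sketch below is only an illustration: the namespace `loki-ns` comes from the URLs in this report, while the label selector `app.kubernetes.io/component=backend`, the HTTP port `3100`, and the helper names are assumptions that may need adapting. It goes through the Kubernetes API server's pod proxy, so the caller needs permission on `pods/proxy`, and it needs `jq`.

```shell
#!/usr/bin/env bash
# Assumptions: adjust the namespace, selector and port to your deployment.
NAMESPACE="${NAMESPACE:-loki-ns}"
SELECTOR="${SELECTOR:-app.kubernetes.io/component=backend}"
PORT="${PORT:-3100}"

# Print the delete-request queries known to one backend pod, one per line.
pod_delete_queries() {
  kubectl get --raw \
    "/api/v1/namespaces/$NAMESPACE/pods/$1:$PORT/proxy/loki/api/v1/delete" \
    | jq -r '.[].query' | sort -u
}

# Print the lines that are in sorted file $1 but not in sorted file $2.
missing_from() {
  comm -23 "$1" "$2"
}

# For every backend pod, list the delete requests that other pods know about
# but this pod does not.
compare_backends() {
  local dir all p pod f
  dir="$(mktemp -d)"
  all="$(mktemp)"
  for p in $(kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o name); do
    pod="${p#pod/}"
    pod_delete_queries "$pod" > "$dir/$pod"
  done
  sort -u "$dir"/* > "$all"
  for f in "$dir"/*; do
    echo "== $(basename "$f"): requests missing on this pod =="
    missing_from "$all" "$f"
  done
  rm -rf "$dir" "$all"
}

# Usage (requires cluster access):
#   compare_backends
```

If the behavior described above is present, at least one pod should list requests as missing.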

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Loki 3.2.1.
  2. Configuration: no TLS/SSL and no auth for this test.
    • Simple scalable mode with 3 backend, 3 write, and 3 read pods, and a single tenant (fake)
      compactor:
        compaction_interval: 10m
        delete_request_cancel_period: 5m
        delete_request_store: s3
        retention_delete_delay: 5m
        retention_delete_worker_count: 150
        retention_enabled: true
      limits_config:
        deletion_mode: filter-and-delete
        max_query_lookback: 744h
        query_timeout: 300s
        reject_old_samples: true
        reject_old_samples_max_age: 24h
        retention_period: 744h
      
  3. Send several POST delete requests with curl (adapt the start date and the TLS/auth settings to your deployment)
    • Example:
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="1"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="2"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="3"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="4"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="5"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      
  4. Run the GET delete request several times with curl (adapt the TLS/auth settings to your deployment):
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    
  5. You will see that the responses do not all contain the full list of previous delete requests. The content depends on which loki-backend pod answered.
  6. Repeat the GET delete step (4) after some time (5 to 15 min) and you will see that the delete requests marked "processed" all come from a single compactor (on a single loki-backend pod). The logs of the corresponding loki-backend pod show that only delete requests received by the "active/running" compactor are honored. The others are never taken into account (unless the active/running compactor moves to another pod, for example because its loki-backend pod is deleted).
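
Steps 3 and 4 can also be scripted as loops, which makes the reproduction easier to repeat. This is a sketch under the same setup as above (gateway URL from this report, no TLS, no auth). `DRY_RUN` and the function names are hypothetical additions for this example. By default the script only prints the curl commands, without shell quoting; set `DRY_RUN=0` to actually send them.

```shell
#!/usr/bin/env bash
# LOKI_URL is the gateway address used in this report.
LOKI_URL="${LOKI_URL:-http://loki-gateway.loki-ns.svc.cluster.local}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = execute

# Start timestamp used in the report: now minus 2682000 seconds.
delete_start() {
  echo $(( $(date +%s) - 2682000 ))
}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 3: submit one delete request per job value given as an argument.
submit_deletes() {
  local start job
  start="$(delete_start)"
  for job in "$@"; do
    run curl -s -G -X POST "$LOKI_URL/loki/api/v1/delete" \
      --data-urlencode "query={job!=\"$job\"}" \
      --data-urlencode "start=$start"
  done
}

# Step 4: list the delete requests $1 times in a row.
list_deletes() {
  local i
  for i in $(seq 1 "$1"); do
    run curl -s -X GET "$LOKI_URL/loki/api/v1/delete"
  done
}

submit_deletes 1 2 3 4 5
list_deletes 5
```

With `DRY_RUN=0`, piping each `list_deletes` response through `jq` as in step 4 shows the differing lists.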

Expected behavior

  1. All POST delete requests must be honored by the active/running compactor.
  2. All GET delete requests should return the status of every request.
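
As a rough check of this expectation, the statuses returned by the GET endpoint can be tallied once the cancel period and a compaction interval have passed. The helpers below are hypothetical and only look at the `status` field of each entry. They assume that "received" is the status of a request that has not been picked up yet, as observed in this report.

```shell
#!/usr/bin/env bash
# Read one status per line on stdin and print a count per distinct status.
tally_statuses() {
  sort | uniq -c
}

# Read one status per line on stdin; succeed only if none is still "received".
none_stuck() {
  ! grep -qx 'received'
}

# Usage (requires jq and access to the gateway used in this report):
#   curl -s 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' \
#     | jq -r '.[].status' | tally_statuses
```

Because of the load balancing described above, a single call only reflects one backend pod, so the check would need to be repeated several times to be meaningful.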

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
If applicable, add any output to help explain your problem.

Let me know if you need more information (such as Helm values or loki-backend pod logs).
The only workaround I have found so far is to deploy a single backend instance.
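
For reference, the workaround could be applied roughly as follows. The release name `loki` and the chart reference `grafana/loki` are assumptions, and the sketch relies on the chart's `backend.replicas` value. The function only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
RELEASE="${RELEASE:-loki}"          # assumption: Helm release name
NAMESPACE="${NAMESPACE:-loki-ns}"   # namespace taken from the URLs in this report
CHART="${CHART:-grafana/loki}"      # assumption: chart reference

# Print the helm command that scales the backend to $1 replicas.
scale_backend_cmd() {
  echo helm upgrade "$RELEASE" "$CHART" -n "$NAMESPACE" \
    --reuse-values --set "backend.replicas=$1"
}

scale_backend_cmd 1
# To apply it after review:
#   eval "$(scale_backend_cmd 1)"
```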
