Reduced performance of sending logs #92

Open
KhafRuslan opened this issue Aug 2, 2024 · 5 comments

@KhafRuslan

KhafRuslan commented Aug 2, 2024

At a certain point, under heavy load, we ran into low throughput when sending logs via promtail.
[screenshot: promtail log send rate under load]
For comparison, here is the rate at which promtail reads logs from the file with the same configuration; in this screenshot promtail delivered all messages to Loki.
[screenshot: promtail log read rate]

Promtail client configuration:

clients:
  - url: http://127.0.0.1:3111/loki/api/v1/push
    batchwait: 1s
    batchsize: 100
    backoff_config:
      min_period: 100ms
      max_period: 5s
    external_labels:
      job: ${HOSTNAME}
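
As a side note on the client config above: promtail's batchsize is specified in bytes (the upstream default is roughly 1 MiB), so batchsize: 100 flushes a tiny batch on almost every entry. Below is a hedged sketch of a larger-batch variant for comparison only; the thread does not establish whether this setting contributed to the slowdown.

clients:
  - url: http://127.0.0.1:3111/loki/api/v1/push
    batchwait: 1s
    # batchsize is in bytes; ~1 MiB (1048576) is promtail's default.
    # 100 bytes forces a near-per-entry flush and many small HTTP pushes.
    batchsize: 1048576
    backoff_config:
      min_period: 100ms
      max_period: 5s
    external_labels:
      job: ${HOSTNAME}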

The solution was simple: we brought up a second Loki log receiver. After that, you can see the drop in the graph above, and the end result is the same.
[screenshot: send rate after adding a second receiver]
Average resource utilization of an instance was no higher than 30 percent.

@lmangani
Contributor

lmangani commented Aug 2, 2024

The qryn process is single-threaded, so you either need to scale out multiple writers/readers and distribute traffic across them to reach your desired capacity, or use the qryn otel-collector and write directly into ClickHouse at full speed. Remember that most of the performance is on the ClickHouse side.
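
For illustration, a minimal sketch of the scale-out option as a docker-compose fragment. The image name (qxip/qryn) and the CLICKHOUSE_SERVER/CLICKHOUSE_PORT/PORT environment variables are assumptions to be verified against the qryn docs; a load balancer (or separate promtail instances) would then split push traffic across the two writers, since listing several URLs under promtail's clients: duplicates logs to every endpoint rather than balancing between them.

services:
  qryn-writer-1:
    image: qxip/qryn:latest           # assumed image name; verify against the qryn docs
    environment:
      CLICKHOUSE_SERVER: clickhouse   # assumed env var names; verify against the qryn docs
      CLICKHOUSE_PORT: "8123"
      PORT: "3100"
    ports:
      - "3100:3100"
  qryn-writer-2:
    image: qxip/qryn:latest
    environment:
      CLICKHOUSE_SERVER: clickhouse
      CLICKHOUSE_PORT: "8123"
      PORT: "3101"
    ports:
      - "3101:3101"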

@KhafRuslan
Author

KhafRuslan commented Aug 3, 2024

Sorry, the panel descriptions were probably what caused the confusion. I do use the qryn otel-collector; that is where I ran into the problem. Single-receiver configuration:

receivers:
  loki:
    protocols:
      grpc:
        endpoint: 0.0.0.0:3200
      http:
        endpoint: 0.0.0.0:3100

processors:
  batch/logs:
    send_batch_size: 8600
    timeout: 400ms
  memory_limiter/logs:
    limit_percentage: 100
    check_interval: 2s

exporters:
  qryn:
    dsn: http://qryn-chp1...
    logs:
      format: raw
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
      max_interval: 30s
    sending_queue:
      queue_size: 1200
    timeout: 10s

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      exporters: [qryn]
      processors: [batch/logs]
      receivers: [loki]
  telemetry:
    logs:
      level: "debug"
    metrics:
      address: 0.0.0.0:8888
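
One detail worth noting in the config above: memory_limiter/logs is declared under processors but is not referenced in the logs pipeline, so it never runs. If it was meant to be active, the collector convention is to place memory_limiter first in the pipeline; a sketch of that variant (not necessarily what was running here):

service:
  pipelines:
    logs:
      receivers: [loki]
      processors: [memory_limiter/logs, batch/logs]
      exporters: [qryn]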

@lmangani
Contributor

lmangani commented Aug 8, 2024

If you are using the otel-collector to ingest, then I would assume the bottleneck is either in the collector or in ClickHouse rather than in qryn itself. Did you observe any resource bottlenecks while operating the setup?

@KhafRuslan
Author

KhafRuslan commented Aug 9, 2024

The problem I ran into is not in qryn; it is with the qryn otel-collector. Perhaps I misunderstood your comment. I am not sure it is a resource problem, because everything works correctly when I bring up another receiver.

@lmangani
Contributor

lmangani commented Sep 4, 2024

I am not sure it is a resource problem, because everything works correctly when I bring up another receiver.

We definitely need to investigate this further to understand what the root cause is. Could you show the multi-receiver config too?
