‘du’ and ‘df’ showing different usage #14914

Open · GeiserX opened this issue Nov 14, 2024 · 0 comments

GeiserX commented Nov 14, 2024

Describe the bug
Every week or so I have to manually restart the Loki containers, because the storage fills up even though it is seemingly "empty". This happens in two clusters. It has been going on for a long time (I have upgraded Loki many times and the problem persists), but I cannot say for sure when it started.
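
For reference, the only workaround I have found is restarting the pods. A sketch of that manual step (the StatefulSet name "loki-stack" matches my pod names below; <namespace> is a placeholder):

# Rolling-restart the single-binary StatefulSet; this releases the space
# held by deleted-but-still-open files until it fills up again.
kubectl -n <namespace> rollout restart statefulset/loki-stack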

A Grafana Champion in the official forum encouraged me to open this bug issue: https://community.grafana.com/t/du-and-df-showing-different-usage/134459

To Reproduce
Steps to reproduce the behavior:

  1. Install Loki with this values.yaml file (an install command is sketched after the file):
deploymentMode: SingleBinary
loki:
  image:
    tag: 3.2.0
  config: |
    {{- if .Values.enterprise.enabled}}
    {{- tpl .Values.enterprise.config . }}
    {{- else }}
    auth_enabled: {{ .Values.loki.auth_enabled }}
    {{- end }}

    {{- with .Values.loki.server }}
    server:
      {{- toYaml . | nindent 2}}
    {{- end}}

    pattern_ingester:
      enabled: {{ .Values.loki.pattern_ingester.enabled }}

    memberlist:
    {{- if .Values.loki.memberlistConfig }}
      {{- toYaml .Values.loki.memberlistConfig | nindent 2 }}
    {{- else }}
    {{- if .Values.loki.extraMemberlistConfig}}
    {{- toYaml .Values.loki.extraMemberlistConfig | nindent 2}}
    {{- end }}
      join_members:
        - {{ include "loki.memberlist" . }}
        {{- with .Values.migrate.fromDistributed }}
        {{- if .enabled }}
        - {{ .memberlistService }}
        {{- end }}
        {{- end }}
    {{- end }}

    {{- with .Values.loki.ingester }}
    ingester:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- if .Values.loki.commonConfig}}
    common:
    {{- toYaml .Values.loki.commonConfig | nindent 2}}
      storage:
      {{- include "loki.commonStorageConfig" . | nindent 4}}
    {{- end}}

    {{- with .Values.loki.limits_config }}
    limits_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    runtime_config:
      file: /etc/loki/runtime-config/runtime-config.yaml

    {{- with .Values.chunksCache }}
    {{- if .enabled }}
    chunk_store_config:
      chunk_cache_config:
        default_validity: {{ .defaultValidity }}
        background:
          writeback_goroutines: {{ .writebackParallelism }}
          writeback_buffer: {{ .writebackBuffer }}
          writeback_size_limit: {{ .writebackSizeLimit }}
        memcached:
          batch_size: {{ .batchSize }}
          parallelism: {{ .parallelism }}
        memcached_client:
          addresses: dnssrvnoa+_memcached-client._tcp.{{ template "loki.fullname" $ }}-chunks-cache.{{ $.Release.Namespace }}.svc
          consistent_hash: true
          timeout: {{ .timeout }}
          max_idle_conns: 72
    {{- end }}
    {{- end }}

    {{- if .Values.loki.schemaConfig }}
    schema_config:
    {{- toYaml .Values.loki.schemaConfig | nindent 2}}
    {{- end }}

    {{- if .Values.loki.useTestSchema }}
    schema_config:
    {{- toYaml .Values.loki.testSchemaConfig | nindent 2}}
    {{- end }}

    {{ include "loki.rulerConfig" . }}

    {{- if or .Values.tableManager.retention_deletes_enabled .Values.tableManager.retention_period }}
    table_manager:
      retention_deletes_enabled: {{ .Values.tableManager.retention_deletes_enabled }}
      retention_period: {{ .Values.tableManager.retention_period }}
    {{- end }}

    query_range:
      align_queries_with_step: true
      {{- with .Values.loki.query_range }}
      {{- tpl (. | toYaml) $ | nindent 2 }}
      {{- end }}
      {{- if .Values.resultsCache.enabled }}
      {{- with .Values.resultsCache }}
      cache_results: true
      results_cache:
        cache:
          default_validity: {{ .defaultValidity }}
          background:
            writeback_goroutines: {{ .writebackParallelism }}
            writeback_buffer: {{ .writebackBuffer }}
            writeback_size_limit: {{ .writebackSizeLimit }}
          memcached_client:
            consistent_hash: true
            addresses: dnssrvnoa+_memcached-client._tcp.{{ template "loki.fullname" $ }}-results-cache.{{ $.Release.Namespace }}.svc
            timeout: {{ .timeout }}
            update_interval: 1m
      {{- end }}
      {{- end }}

    {{- with .Values.loki.storage_config }}
    storage_config:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.query_scheduler }}
    query_scheduler:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.compactor }}
    compactor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.analytics }}
    analytics:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.querier }}
    querier:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.index_gateway }}
    index_gateway:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend }}
    frontend:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.frontend_worker }}
    frontend_worker:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    {{- with .Values.loki.distributor }}
    distributor:
      {{- tpl (. | toYaml) $ | nindent 4 }}
    {{- end }}

    tracing:
      enabled: {{ .Values.loki.tracing.enabled }}
  auth_enabled: false
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
    grpc_server_max_recv_msg_size: 52434304 # ~50MiB
    grpc_server_max_send_msg_size: 52434304 # ~50MiB
  limits_config:
    retention_period: 30d
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
    per_stream_rate_limit: 5MB
    per_stream_rate_limit_burst: 15MB
    tsdb_max_query_parallelism: 128
    split_queries_by_interval: 1h
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 3
    compactor_address: '{{ include "loki.compactorAddress" . }}'
  storage:
    bucketNames:
      chunks: loki-boltdb
      ruler: loki-boltdb
    type: s3
    s3:
      endpoint: ...
      region: ...
      secretAccessKey: ${S3_SECRET_ACCESS_KEY}
      accessKeyId: ${S3_ACCESS_KEY_ID}
      s3ForcePathStyle: true
      sse_encryption: false
      insecure: false
      http_config:
        idle_conn_timeout: 90s
        response_header_timeout: 0s
        insecure_skip_verify: true
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  schemaConfig:
    configs:
      - from: "2024-04-16"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
      - from: "2024-09-24"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
  storage_config:
    hedging:
      at: "250ms"
      max_per_second: 20
      up_to: 3
  compactor:
    working_directory: /var/loki/data/retention
    compaction_interval: 10m
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 150
    delete_request_store: s3
  pattern_ingester:
    enabled: false
  querier:
    max_concurrent: 8
  ingester:
    chunk_block_size: 262144
    chunk_retain_period: 1m
    chunk_target_size: 1572864 # 1.5MB
    chunk_encoding: snappy
    max_chunk_age: 2h
    chunk_idle_period: 1h
    lifecycler:
      final_sleep: 0s
      ring:
        replication_factor: 1
        heartbeat_timeout: 10m

  frontend:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'
    log_queries_longer_than: 20s
  frontend_worker:
    scheduler_address: '{{ include "loki.querySchedulerAddress" . }}'

ingress:
  enabled: false

test:
  enabled: false
lokiCanary:
  enabled: false

singleBinary:
  replicas: 2
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-boltdb-bucket
          key: AWS_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: loki-boltdb-bucket
          key: AWS_SECRET_ACCESS_KEY
  resources:
    limits:
      cpu: "5"
      memory: 8Gi
    requests:
      cpu: 200m
      memory: 3Gi

  persistence:
    enabled: true
    size: 100Gi
    enableStatefulSetAutoDeletePVC: false

write:
  replicas: 0
read:
  replicas: 0
backend:
  replicas: 0

resultsCache:
  enabled: true
  replicas: 2

chunksCache:
  enabled: true
  replicas: 2

gateway:
  enabled: false

loki:
  limits_config:
    retention_stream:
      - selector: '{app="..."}'
        priority: 1
        period: 7d
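
For completeness, this is roughly how the chart is deployed with the file above (a sketch using the standard Grafana Helm charts repo; the release name "loki-stack" matches my pod names, and <namespace> is a placeholder):

# Add the Grafana charts repo and deploy/upgrade the release with the values above
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki-stack grafana/loki -n <namespace> -f values.yaml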

Expected behavior
Loki should stop keeping deleted files open, so that I don't need to manually restart the containers.

Environment:

  • Infrastructure: K8s
  • Deployment tool: Helm

Screenshots, Promtail config, or terminal output
Repeating the steps from my Grafana forum post. It has been running for 28h, and these are the results:

~ ❯ k exec -it loki-stack-0 -c loki -- df -h
/dev/rbd16               98.2G      4.4G     93.8G   5% /var/loki
...
~ ❯ k exec -it loki-stack-0 -c loki -- du -hs /var/loki
543.1M	/var/loki

Notice there is already a large gap between what df reports as used (4.4G) and what du can actually find on disk (543.1M); the difference is space held by deleted files that are still open.

~ ❯ k exec -it loki-stack-1 -c loki -- lsof +L1
1	/usr/bin/loki	10	/var/loki/data/retention/deletion/delete_requests/delete_requests
1	/usr/bin/loki	42	/var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731576632/00000000
1	/usr/bin/loki	54	/var/loki/wal/00061847
1	/usr/bin/loki	55	/var/loki/tsdb-shipper-active/multitenant/index_20041/1731575732-loki-stack-1-1713781691510093445.tsdb
1	/usr/bin/loki	57	/var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731576632/00000000
1	/usr/bin/loki	67	/var/loki/wal/checkpoint.061846.tmp/00000000
1	/usr/bin/loki	71	/var/loki/tsdb-shipper-cache/index_20041/fake/1731576863927558043-compactor-1731533767289-1731576608650-a2e01e88.tsdb
... # plus many more sockets and pipes
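
The deleted-but-open files can also be spotted without lsof by reading the file-descriptor symlinks of PID 1 via /proc (a minimal sketch, assuming a Linux /proc and a BusyBox-style shell in the image):

# Targets of /proc fd symlinks gain a " (deleted)" suffix once the file is unlinked
k exec -it loki-stack-1 -c loki -- sh -c 'ls -l /proc/1/fd 2>/dev/null | grep "(deleted)" | grep /var/loki'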

Let's investigate all these files:

~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/data/retention/deletion/delete_requests/delete_requests
  16.0K -rw-rw-r--    1 loki     loki       16.0K May  7  2024 /var/loki/data/retention/deletion/delete_requests/delete_requests
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731576632/00000000
  72.0K -rw-r--r--    1 loki     loki       68.0K Nov 14 09:44 /var/loki/tsdb-shipper-active/wal/s3_2024-09-24/1731576632/00000000
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/wal/00061847
  89.3M -rw-r--r--    1 loki     loki       89.3M Nov 14 09:44 /var/loki/wal/00061847
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/multitenant/index_20041/1731575732-loki-stack-1-1713781691510093445.tsdb
  64.0K -rw-r--r--    1 loki     loki       62.4K Nov 14 09:30 /var/loki/tsdb-shipper-active/multitenant/index_20041/1731575732-loki-stack-1-1713781691510093445.tsdb
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731576632/00000000
ls: /var/loki/tsdb-shipper-active/wal/s3_2024-04-16/1731576632/00000000: No such file or directory
command terminated with exit code 1
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/wal/checkpoint.061846.tmp/00000000
ls: /var/loki/wal/checkpoint.061846.tmp/00000000: No such file or directory
command terminated with exit code 1
~ ❯ k exec -it loki-stack-1 -c loki -- ls -lash /var/loki/tsdb-shipper-cache/index_20041/fake/1731576863927558043-compactor-1731533767289-1731576608650-a2e01e88.tsdb
   5.5M -rw-r--r--    1 loki     loki        5.5M Nov 14 09:34 /var/loki/tsdb-shipper-cache/index_20041/fake/1731576863927558043-compactor-1731533767289-1731576608650-a2e01e88.tsdb
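
If it helps, the space held purely by deleted-but-open files can be summed directly from /proc (a sketch under the same assumptions; readlink and stat are the BusyBox applets here):

# Sum the sizes of PID 1's open fds whose /var/loki targets were deleted;
# this is the space that df counts but du can no longer see.
k exec -it loki-stack-1 -c loki -- sh -c '
  total=0
  for fd in /proc/1/fd/*; do
    tgt=$(readlink "$fd")
    case "$tgt" in
      /var/loki/*" (deleted)")
        sz=$(stat -L -c %s "$fd" 2>/dev/null) && total=$((total + sz)) ;;
    esac
  done
  echo "$((total / 1024 / 1024)) MiB held by deleted files"'

On a healthy pod this should be near zero; here it should roughly match the df/du gap above.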

Let me know if I should provide any more info to properly debug this.
