K8smeta does not populate its events even though it is configured correctly and there are no errors #514

Open
fjellvannet opened this issue Jun 29, 2024 · 12 comments
Labels: kind/bug Something isn't working

fjellvannet commented Jun 29, 2024

Describe the bug
I set up k8smeta and the k8s-metacollector with the following command (see line 273 in Falco's official Helm chart):

helm install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true

I then add a custom syscall rule that triggers often in my deployment and uses a k8smeta field, k8smeta.pod.name to be precise. I would expect this field to be populated, but it returns N/A. Sorry for the long bug report, I just included a lot of context :)

How to reproduce it

Deploy Falco with the following command using its Helm chart:
helm upgrade --install falco falcosecurity/falco --namespace falco --create-namespace -f falco-values.yaml

falco-values.yaml has the following contents:

falco:
  rules_file:
    - /etc/falco/rules.d

driver:
  kind: ebpf

collectors:
  kubernetes:
    enabled: false

falcosidekick:
  enabled: true
  webui:
    enabled: true

customRules:
  rules-k8smeta.yaml: |-
    - macro: k8s_containers
      condition: >
        (container.image.repository in (gcr.io/google_containers/hyperkube-amd64,
        gcr.io/google_containers/kube2sky,
        docker.io/sysdig/sysdig, sysdig/sysdig,
        fluent/fluentd-kubernetes-daemonset, prom/prometheus,
        falco_containers,
        falco_no_driver_containers,
        ibm_cloud_containers,
        velero/velero,
        quay.io/jetstack/cert-manager-cainjector, weaveworks/kured,
        quay.io/prometheus-operator/prometheus-operator,
        registry.k8s.io/ingress-nginx/kube-webhook-certgen, quay.io/spotahome/redis-operator,
        registry.opensource.zalan.do/acid/postgres-operator, registry.opensource.zalan.do/acid/postgres-operator-ui,
        rabbitmqoperator/cluster-operator, quay.io/kubecost1/kubecost-cost-model,
        docker.io/bitnami/prometheus, docker.io/bitnami/kube-state-metrics, mcr.microsoft.com/oss/azure/aad-pod-identity/nmi)
        or (k8s.ns.name = "kube-system"))

    - macro: never_true
      condition: (evt.num=0)

    - macro: container
      condition: (container.id != host)

    - macro: k8s_api_server
      condition: (fd.sip.name="kubernetes.default.svc.cluster.local")
    
    - macro: user_known_contact_k8s_api_server_activities
      condition: (never_true)
    
    - rule: Custom Contact K8S API Server From Container
      desc: >
        Detect attempts to communicate with the K8S API Server from a container by non-profiled users. Kubernetes APIs play a 
        pivotal role in configuring the cluster management lifecycle. Detecting potential unauthorized access to the API server 
        is of utmost importance. Audit your complete infrastructure and pinpoint any potential machines from which the API server 
        might be accessible based on your network layout. If Falco can't operate on all these machines, consider analyzing the 
        Kubernetes audit logs (typically drained from control nodes, and Falco offers a k8saudit plugin) as an additional data 
        source for detections within the control plane.
      condition: >
        evt.type=connect and evt.dir=< 
        and (fd.typechar=4 or fd.typechar=6) 
        and container 
        and k8s_api_server 
        and not k8s_containers 
        and not user_known_contact_k8s_api_server_activities
      output: Custom Unexpected connection to K8s API Server from container (connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=%fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline k8s_podname=%k8smeta.pod.name orig_podname=%k8s.pod.name terminal=%proc.tty %container.info)
      priority: NOTICE
      tags: [maturity_stable, container, network, k8s, mitre_discovery, T1565]

The included custom rule is a copy of the upstream Contact K8S API Server From Container rule with all its dependencies. The only modification is that two new fields, k8s_podname=%k8smeta.pod.name and orig_podname=%k8s.pod.name, are added to the output. The orig_podname field is populated: it shows the same value as k8s.pod.name in the output. However, k8s_podname remains N/A, and I would expect it to be populated whenever the same value is available in k8s.pod.name, which is said to be kept only for backwards compatibility (line 250 in Falco's official Helm chart).

Expected behaviour
I would expect that if k8s.pod.name is populated with a value, k8smeta.pod.name should also be populated.

Screenshots
k8smeta-fields not available
Checking out the events in the UI, we see that the k8s_podname field remains N/A while orig_podname gets the same value as k8s.pod.name.

metacollector is running
falco-k8s-metacollector is running in the same namespace as the falco-pods and the UI.

k8smeta plugin is installed
Logs from the artifact-install-container show that k8smeta is indeed installed correctly.

Environment

  • Falco version: 0.38.1 (x86_64)
  • System info: {"machine":"x86_64","nodename":"falco-fq2mk","release":"6.8.0-36-generic","sysname":"Linux","version":"#36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024"},"version":"0.38.1"}
  • Cloud provider or hardware configuration: microk8s on three VMs connected in a cluster, each with 4 vCPUs and 16 GB RAM. microk8s runs its own containerd, whose socket lives at /var/snap/microk8s/common/run/containerd.sock. Falco does not find it there, so I created an empty file at /run/containerd/containerd.sock and ran sudo mount --bind /var/snap/microk8s/common/run/containerd.sock /run/containerd/containerd.sock to make it accessible to Falco. That seems to work: before this change the pod and container names were N/A in the UI as well, and now they are populated, so Falco appears to have access to the containerd socket at least. Changing the containerd-socket volume in the DaemonSet to mount /var/snap/microk8s/common/run into Falco instead of /run/containerd also works.
  • OS: Ubuntu
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
  • Kernel: Linux microk8s-1 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Installation method:
    helm upgrade --install falco falcosecurity/falco --namespace falco --create-namespace -f falco-values.yaml, see above.

Additional context
/etc/falco/falco.yaml pulled from one of the falco-pods:

base_syscalls:
  custom_set: []
  repair: false
buffered_outputs: false
config_files:
- /etc/falco/config.d
engine:
  ebpf:
    buf_size_preset: 4
    drop_failed_exit: false
    probe: ${HOME}/.falco/falco-bpf.o
  kind: ebpf
falco_libs:
  thread_table_size: 262144
file_output:
  enabled: false
  filename: ./events.txt
  keep_alive: false
grpc:
  bind_address: unix:///run/falco/falco.sock
  enabled: false
  threadiness: 0
grpc_output:
  enabled: false
http_output:
  ca_bundle: ""
  ca_cert: ""
  ca_path: /etc/falco/certs/
  client_cert: /etc/falco/certs/client/client.crt
  client_key: /etc/falco/certs/client/client.key
  compress_uploads: false
  echo: false
  enabled: true
  insecure: false
  keep_alive: false
  mtls: false
  url: http://falco-falcosidekick:2801
  user_agent: falcosecurity/falco
json_include_output_property: true
json_include_tags_property: true
json_output: true
libs_logger:
  enabled: false
  severity: debug
load_plugins:
- k8smeta
log_level: info
log_stderr: true
log_syslog: true
metrics:
  convert_memory_to_mb: true
  enabled: false
  include_empty_values: false
  interval: 1h
  kernel_event_counters_enabled: true
  libbpf_stats_enabled: true
  output_rule: true
  resource_utilization_enabled: true
  rules_counters_enabled: true
  state_counters_enabled: true
output_timeout: 2000
outputs:
  max_burst: 1000
  rate: 0
outputs_queue:
  capacity: 0
plugins:
- init_config: null
  library_path: libk8saudit.so
  name: k8saudit
  open_params: http://:9765/k8s-audit
- library_path: libcloudtrail.so
  name: cloudtrail
- init_config: ""
  library_path: libjson.so
  name: json
- init_config:
    collectorHostname: falco-k8s-metacollector.falco.svc
    collectorPort: 45000
    nodeName: ${FALCO_K8S_NODE_NAME}
  library_path: libk8smeta.so
  name: k8smeta
priority: debug
program_output:
  enabled: false
  keep_alive: false
  program: 'jq ''{text: .output}'' | curl -d @- -X POST https://hooks.slack.com/services/XXX'
rule_matching: first
rules_file:
- /etc/falco/rules.d
stdout_output:
  enabled: true
syscall_event_drops:
  actions:
  - log
  - alert
  max_burst: 1
  rate: 0.03333
  simulate_drops: false
  threshold: 0.1
syscall_event_timeouts:
  max_consecutives: 1000
syslog_output:
  enabled: true
time_format_iso_8601: false
watch_config_files: true
webserver:
  enabled: true
  k8s_healthz_endpoint: /healthz
  listen_port: 8765
  prometheus_metrics_enabled: false
  ssl_certificate: /etc/falco/falco.pem
  ssl_enabled: false
  threadiness: 0

As far as I can see, k8smeta and the k8s-metacollector are configured correctly in this config as well. When I experimented with changing the port or hostname of the metacollector I got errors, and the same happened when I turned on SSL without fixing the certificates. This screenshot from the falco container log also confirms that k8smeta is running: it says that it received at least one event from the k8s-metacollector, indicating that their connection should be OK.

k8smeta plugin is healthy - no errors in logs
Here too the k8smeta plugin looks healthy. When I removed collectors.kubernetes.enabled=true, Falco refused to start, complaining that my rule in rules-k8smeta.yaml used an invalid value, namely k8smeta.pod.name, which is another indication that k8smeta is likely set up correctly.
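
Two extra checks can confirm the same thing independently of the logs (a sketch only: the pod name below is a placeholder, the falco container name is assumed to be the chart default, and the probe assumes an image that ships nc, such as netshoot):

# If the plugin is loaded, its fields appear in Falco's field list.
kubectl exec -n falco <falco-pod> -c falco -- falco --list | grep k8smeta

# Probe the metacollector service/port from the plugin's init_config above.
kubectl run -n falco tmp-probe --rm -it --restart=Never \
  --image=docker.io/nicolaka/netshoot -- nc -zv falco-k8s-metacollector.falco.svc 45000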

@fjellvannet fjellvannet added the kind/bug Something isn't working label Jun 29, 2024
alacuku (Member) commented Jul 4, 2024

Hi @fjellvannet, unfortunately, I'm not able to reproduce your issue. It works on my side.

I installed Falco:

helm install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true

I added the custom rule as you did.

And here is the output of Falco:

Thu Jul 4 15:17:12 2024: [info] [k8smeta] The plugin received at least one event from the k8s-metacollector
15:17:14.901142503: Notice Custom Unexpected connection to K8s API Server from container (connection=10.16.1.11:50383->10.0.0.1:80 lport=50383 rport=80 fd_type=ipv4 fd_proto=udp evt_type=connect user=root user_uid=0 user_loginuid=-1 process=curl proc_exepath=/usr/bin/curl parent=zsh command=curl kubernetes.default k8s_podname=tmp-shell orig_podname=tmp-shell terminal=34816 container_id=82d0121584ee container_image=docker.io/nicolaka/netshoot container_image_tag=latest container_name=tmp-shell k8s_ns=tmp-namespace k8s_pod_name=tmp-shell)
15:17:14.901195293: Notice Custom Unexpected connection to K8s API Server from container (connection=10.16.1.11:57012->10.0.0.1:80 lport=57012 rport=80 fd_type=ipv4 fd_proto=tcp evt_type=connect user=root user_uid=0 user_loginuid=-1 process=curl proc_exepath=/usr/bin/curl parent=zsh command=curl kubernetes.default k8s_podname=tmp-shell orig_podname=tmp-shell terminal=34816 container_id=82d0121584ee container_image=docker.io/nicolaka/netshoot container_image_tag=latest container_name=tmp-shell k8s_ns=tmp-namespace k8s_pod_name=tmp-shell)

fjellvannet (Author) commented Jul 4, 2024

I just reinstalled everything in the same way and still have the error.

Did you try on microk8s specifically? I don't know if it is part of the problem.

To give Falco access to the containerd socket (which microk8s puts in a snap-specific directory instead of the default location), I had to create the empty files /run/containerd/containerd.sock and /run/containerd/containerd.sock.ttrpc and add these lines to /etc/fstab:

/var/snap/microk8s/common/run/containerd.sock /run/containerd/containerd.sock none bind 0 0
/var/snap/microk8s/common/run/containerd.sock.ttrpc /run/containerd/containerd.sock.ttrpc none bind 0 0

This makes the microk8s containerd socket accessible to Falco at the default location. The hack fixes the k8s.pod.name field, but not k8smeta.pod.name.
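
In command form, the workaround is roughly the following (a sketch; the mount -a step, or a reboot, is needed for the fstab entries to take effect):

# Bind-mount targets must exist as empty files before mounting.
sudo touch /run/containerd/containerd.sock /run/containerd/containerd.sock.ttrpc
echo '/var/snap/microk8s/common/run/containerd.sock /run/containerd/containerd.sock none bind 0 0' | sudo tee -a /etc/fstab
echo '/var/snap/microk8s/common/run/containerd.sock.ttrpc /run/containerd/containerd.sock.ttrpc none bind 0 0' | sudo tee -a /etc/fstab
# Apply the new fstab entries without rebooting.
sudo mount -a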

What kind of cluster did you use to test?

alacuku (Member) commented Jul 5, 2024

Hey @fjellvannet, I used a kubeadm cluster. Can you share the instructions on how to create your environment?

fjellvannet (Author) commented Jul 5, 2024

Start with a vanilla Ubuntu 24.04 server amd64 machine.

Install the microk8s snap:
sudo snap install microk8s --classic

Add your user to the microk8s group so you can manage microk8s without sudo:
sudo usermod -aG microk8s "$USER"

Set up the kubeconfig for kubectl, helm, etc.; microk8s config prints it out.
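
For example (writing to ~/.kube/config is just the usual default location, not something microk8s requires):

mkdir -p ~/.kube
microk8s config > ~/.kube/config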

Install the kube-prometheus-stack, as grafana constantly triggers my custom rule:
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n kube-metrics --create-namespace

Create falco-rules-k8smeta.yaml with the following content:

customRules:
  rules-k8smeta.yaml: |-
    - macro: k8s_containers
      condition: >
        (container.image.repository in (gcr.io/google_containers/hyperkube-amd64,
        gcr.io/google_containers/kube2sky,
        docker.io/sysdig/sysdig, sysdig/sysdig,
        fluent/fluentd-kubernetes-daemonset, prom/prometheus,
        falco_containers,
        falco_no_driver_containers,
        ibm_cloud_containers,
        velero/velero,
        quay.io/jetstack/cert-manager-cainjector, weaveworks/kured,
        quay.io/prometheus-operator/prometheus-operator,
        registry.k8s.io/ingress-nginx/kube-webhook-certgen, quay.io/spotahome/redis-operator,
        registry.opensource.zalan.do/acid/postgres-operator, registry.opensource.zalan.do/acid/postgres-operator-ui,
        rabbitmqoperator/cluster-operator, quay.io/kubecost1/kubecost-cost-model,
        docker.io/bitnami/prometheus, docker.io/bitnami/kube-state-metrics, mcr.microsoft.com/oss/azure/aad-pod-identity/nmi)
        or (k8s.ns.name = "kube-system"))

    - macro: never_true
      condition: (evt.num=0)

    - macro: container
      condition: (container.id != host)

    - macro: k8s_api_server
      condition: (fd.sip.name="kubernetes.default.svc.cluster.local")
    
    - macro: user_known_contact_k8s_api_server_activities
      condition: (never_true)
    
    - rule: Custom Contact K8S API Server From Container
      desc: >
        Detect attempts to communicate with the K8S API Server from a container by non-profiled users. Kubernetes APIs play a 
        pivotal role in configuring the cluster management lifecycle. Detecting potential unauthorized access to the API server 
        is of utmost importance. Audit your complete infrastructure and pinpoint any potential machines from which the API server 
        might be accessible based on your network layout. If Falco can't operate on all these machines, consider analyzing the 
        Kubernetes audit logs (typically drained from control nodes, and Falco offers a k8saudit plugin) as an additional data 
        source for detections within the control plane.
      condition: >
        evt.type=connect and evt.dir=< 
        and (fd.typechar=4 or fd.typechar=6) 
        and container 
        and k8s_api_server 
        and not k8s_containers 
        and not user_known_contact_k8s_api_server_activities
      output: Custom Unexpected connection to K8s API Server from container (connection=%fd.name lport=%fd.lport rport=%fd.rport fd_type=%fd.type fd_proto=%fd.l4proto evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline k8s_podname=%k8smeta.pod.name orig_podname=%k8s.pod.name terminal=%proc.tty %container.info)
      priority: NOTICE
      tags: [maturity_stable, container, network, k8s, mitre_discovery, T1565]

Deploy falco using Helm and make sure the custom rule is evaluated before the default rule:

helm upgrade --install falco falcosecurity/falco \
    --namespace falco \
    --create-namespace \
    --set collectors.kubernetes.enabled=true \
    --set falco.rules_file="{/etc/falco/rules.d}" \
  -f falco-rules-k8smeta.yaml

Create the following bash script, which adjusts the path of the volume that mounts the containerd socket into Falco. microk8s runs its own containerd instance, and its socket is stored at /var/snap/microk8s/common/run/containerd.sock, where Falco cannot find it without help. The script modifies the containerd-socket volume in the falco DaemonSet so that the socket from the snap location is mounted correctly. I have not found a way to make this adaptation directly in the Helm chart; it would make life much easier if it could simply be set there. The bind mount from my previous comment has the same effect, but for reliable reproducibility I think the patch script is better.

#!/bin/bash

# Name and namespace of the Falco DaemonSet deployed by the Helm chart
DAEMONSET_NAME="falco"
NAMESPACE="falco"

# Find the index of the 'containerd-socket' volume
INDEX=$(kubectl -n "$NAMESPACE" get daemonset "$DAEMONSET_NAME" -o json | jq '.spec.template.spec.volumes | map(.name) | index("containerd-socket")')

# Check if the volume was found
if [ "$INDEX" = "null" ]; then
    echo "Volume 'containerd-socket' not found."
    exit 1
fi

# Construct the JSON Patch that points the volume's hostPath at the microk8s snap directory
PATCH="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/volumes/$INDEX/hostPath/path\", \"value\": \"/var/snap/microk8s/common/run\"}]"

# Apply the patch
kubectl -n "$NAMESPACE" patch daemonset "$DAEMONSET_NAME" --type='json' -p="$PATCH"
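
To confirm the patch landed, something like this should print the new hostPath (just a sanity check using the same jq tooling as the script):

kubectl -n falco get daemonset falco -o json \
  | jq -r '.spec.template.spec.volumes[] | select(.name == "containerd-socket") | .hostPath.path'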

When the DaemonSet has rolled out and the pods have restarted, enjoy:

falco 09:30:43.245032810: Notice Custom Unexpected connection to K8s API Server from container (connection=10.1.10.89:41658->10.152.183.1:443 lport=41658 rport=443 fd_type=ipv4 fd_proto=tcp evt_type=connect user=<NA> user_uid=472 user_loginuid=-1 process=python proc_exepath=/usr/local/bin/python3.12 parent=python command=python -u /app/sidecar.py k8s_podname=<NA> orig_podname=kube-prometheus-stack-grafana-86844f6b47-t8cg2 terminal=0 container_id=5db2c3be25ce container_image=quay.io/kiwigrid/k8s-sidecar container_image_tag=1.26.1 container_name=grafana-sc-dashboard k8s_ns=kube-metrics k8s_pod_name=kube-prometheus-stack-grafana-86844f6b47-t8cg2)

In the k8smeta field k8s_podname, which I added, the value is <NA>.

If I have set up something wrong here or forgotten something from the documentation, please tell me :)
I could not figure it out myself, at least, because there are no error messages indicating a problem.

@fjellvannet fjellvannet reopened this Jul 5, 2024
alacuku (Member) commented Jul 23, 2024

Hi @fjellvannet, it turns out that you are right. The plugin does not populate fields for containers that existed before Falco was deployed. We are working on a fix. We are going to release a new plugin version in the coming days.

Thanks for your effort in helping us to discover the bug.

alacuku (Member) commented Jul 25, 2024

Hey @fjellvannet, the latest helm chart of Falco includes the fix. Could you please try it out?

fjellvannet (Author) commented Jul 25, 2024

I have tested it. Redeployed falco with the same rule.

$ helm list -n falco
NAME 	NAMESPACE	REVISION	UPDATED                              	STATUS  	CHART      	APP VERSION
falco	falco    	1       	2024-07-25 18:41:23.607662 +0200 CEST	deployed	falco-4.7.0	0.38.1

Unfortunately it made no difference:

Screenshot 2024-07-25 at 18:56:57

It is again the same custom rule, now triggered from the istio-kiali container, which was deployed before Falco. The k8smeta field is still N/A.

alacuku (Member) commented Jul 26, 2024

@fjellvannet, can you share the Falco logs? The plugin should have scanned /proc for existing processes.

fjellvannet (Author) commented Jul 26, 2024

I ran the following commands to produce these logs:

helm upgrade --install falco falcosecurity/falco -f falco-rules-k8smeta.yaml -f falco-values.yaml -n falco --create-namespace
stern -n falco ".*" | tee falco-logs.txt
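
If stern is not available, plain kubectl can collect roughly the same logs (the label selector below assumes the chart's standard labels, which may differ):

kubectl logs -n falco -l app.kubernetes.io/name=falco --all-containers --prefix --tail=-1 | tee falco-logs.txt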

falco-logs.txt with the log messages can be found here, along with the Helm files:
falco-logs.txt
falco-rules-k8smeta.yaml.txt
falco-values.yaml.txt

leogr (Member) commented Aug 28, 2024

Is this still an issue? 🤔

fjellvannet (Author) commented Aug 28, 2024 via email

alacuku (Member) commented Nov 8, 2024

Hey @fjellvannet, this PR fixes the issue. I've tested it in production, and it's been performing well so far.
