Refine CVE check in check script for k8s version policy #779

Open
wants to merge 19 commits into
base: main
Changes from 10 commits
Commits
19 commits
d96edd7
adding functions for getting more information & debug
piobig2871 Oct 10, 2024
0caf99d
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Oct 11, 2024
ebfe951
solving conflicts
piobig2871 Oct 16, 2024
e635411
fixing git inset
piobig2871 Oct 16, 2024
a298678
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Oct 16, 2024
cfc3dc6
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Oct 16, 2024
54ee694
feat: Add Kubernetes pod image scanning and improve error handling
piobig2871 Oct 18, 2024
cc87097
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Oct 21, 2024
38921f1
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Oct 22, 2024
12987e5
removing comments
piobig2871 Oct 23, 2024
1d332ae
reseting standard to it's original form
piobig2871 Nov 4, 2024
11daeac
reverting ClusterInfo to its original shape, removing kubeconfig fiel…
piobig2871 Nov 4, 2024
e62b347
removing unused kubeconfig variable
piobig2871 Nov 4, 2024
5ab2bd0
fixing pylint and docstring formatting
piobig2871 Nov 4, 2024
5e2b764
Update Tests/kaas/k8s-version-policy/k8s_version_policy.py
piobig2871 Nov 4, 2024
3348960
fixing pylint and resolving conflict which appeard after review
piobig2871 Nov 4, 2024
4a59878
fixing the script with providing proper images list to check out
piobig2871 Nov 7, 2024
cea9840
femoving unused lines
piobig2871 Nov 7, 2024
c676f5f
Merge branch 'main' into 526-refine-cve-check-in-scs-0210-v2-test-script
piobig2871 Nov 13, 2024
33 changes: 25 additions & 8 deletions Standards/scs-0210-v2-k8s-version-policy.md
@@ -55,14 +55,31 @@ window period.
In order to keep up-to-date with the latest Kubernetes features, bug fixes and security improvements,
the provided Kubernetes versions should be kept up-to-date with new upstream releases:

- The latest minor version MUST be provided no later than 4 months after release.
- The latest patch version MUST be provided no later than 2 weeks after release.
- This time period MUST be even shorter for patches that fix critical CVEs.
In this context, a critical CVE is a CVE with a CVSS base score >= 8 according
to the CVSS version used in the original CVE record (e.g., CVSSv3.1).
It is RECOMMENDED to provide a new patch version in a 2-day time period after their release.
- New versions MUST be tested before being rolled out on productive infrastructure;
at least the [CNCF E2E tests][cncf-conformance] should be passed beforehand.
1. Minor Versions:
- The latest minor version MUST be provided no later than 4 months after release.

2. Patch Versions:
- The latest patch version MUST be provided no later than 1 week after release.
- This time period MUST be even shorter for patches that fix critical CVEs.
In this context, a critical CVE is a CVE with a CVSS base score >= 8 according
to the CVSS version used in the original CVE record (e.g., CVSSv3.1).
It is RECOMMENDED to provide a new patch version in a 2-day time period after their release.
- New versions MUST be tested before being rolled out on productive infrastructure;
at least the [CNCF E2E tests][cncf-conformance] should be passed beforehand.

3. CI Integration
* Trivy
- Providers should integrate Trivy into their CI pipeline to automatically scan Kubernetes cluster components,
including kubelet, apiserver, and others.
- The CI job MUST fail if critical vulnerabilities (CVSS >= 8) are detected in the cluster components.
- JSON reports from Trivy scans should be reviewed, and Trivy's experimental status should be monitored for changes
in output formats.
* nvdlib (Fallback):
- If Trivy fails or cannot meet requirements, nvdlib MUST be used as a fallback to query CVE data for Kubernetes
versions, leveraging CPE-based searches to track vulnerabilities for specific versions.
- Providers using nvdlib MUST periodically query for critical vulnerabilities affecting the Kubernetes version in production.

4. TBD
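
The "CI job MUST fail if critical vulnerabilities (CVSS >= 8) are detected" rule proposed above could be checked roughly as in the following sketch. It assumes a Trivy-style JSON report in which each finding carries a `CVSS` mapping from data source to `V3Score`/`V2Score`; that layout, the helper names, and the placeholder CVE IDs are assumptions for illustration and are not part of this PR. Taking the highest score across sources is a conservative simplification of "the CVSS version used in the original CVE record".

```python
# Sketch only: the report layout (Results -> Vulnerabilities -> CVSS) is an
# assumption about Trivy's JSON output and should be checked against a real report.
CRITICAL_CVSS_THRESHOLD = 8.0


def max_cvss_score(vulnerability: dict) -> float:
    """Return the highest CVSS base score recorded for one finding (0.0 if none)."""
    scores = []
    for source in (vulnerability.get("CVSS") or {}).values():
        for key in ("V3Score", "V2Score"):
            score = source.get(key)
            if isinstance(score, (int, float)):
                scores.append(float(score))
    return max(scores, default=0.0)


def critical_findings(report: dict, threshold: float = CRITICAL_CVSS_THRESHOLD) -> list:
    """Collect the IDs of all findings whose score reaches the threshold."""
    found = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when nothing was found.
        for vuln in result.get("Vulnerabilities") or []:
            if max_cvss_score(vuln) >= threshold:
                found.append(vuln.get("VulnerabilityID", "<unknown>"))
    return found


# Hypothetical report with made-up IDs, used only to exercise the helpers.
sample_report = {
    "Results": [
        {
            "Target": "demo",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "CVSS": {"nvd": {"V3Score": 9.8}}},
                {"VulnerabilityID": "CVE-0000-0002", "CVSS": {"nvd": {"V3Score": 5.3}}},
            ],
        }
    ]
}
```

A CI step could then exit non-zero whenever `critical_findings(report)` is non-empty.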
Contributor

This standard has been stabilized already, for better or worse. New requirements can only be introduced in a new major version (then v3). However, I'm not sure that this was the original objective of this PR; here, we mainly wanted some tooling for the compliance check, and the providers are free to use whatever tools they want. (We can put these items into the implementation notes though, but only as non-authoritative recommendation!)

Author

piobig2871 Oct 16, 2024

Does that mean I should restore the original version of the standard?

Contributor

It would be good to make your research results available. We should just reframe them as guidelines for operators. We could write a blog post. I would then ask you to get feedback from Team Container. It would be good to talk to people who already use Trivy.

Author

What I have done right now is restore the original standard text and drop the changes.

As for the code, several changes were made:

  • Integrated Trivy for scanning Kubernetes pod images for security vulnerabilities.
  • Fixed issue with ClusterInfo object being incorrectly passed where kubeconfig path was expected.
  • Added logging improvements to provide clearer insights during version compliance checks.
  • Refined the code structure to handle K8s image scanning and cluster versioning in an async manner.

Contributor

> What I have done right now is restore the original standard text and drop the changes

This is not what I see.

Contributor

> I would then ask you to get feedback from Team Container

Have you done that?

Author

> What I have done right now is restore the original standard text and drop the changes

@mbuechse I do apologize. I have reverted it now; the revert got lost somewhere in my local git repository amid the mess with the branches.

Author

piobig2871 Nov 4, 2024

> > I would then ask you to get feedback from Team Container
>
> Have you done that?

I have not. I will bring that topic up on the next container call (there was no container call at all last week).

Contributor

Sounds good!


At the same time, providers must support Kubernetes versions at least as long as the
official sources as described in [Kubernetes Support Period][k8s-support-period]:
97 changes: 95 additions & 2 deletions Tests/kaas/k8s-version-policy/k8s_version_policy.py
@@ -24,9 +24,9 @@
(c) Hannes Baum <[email protected]>, 6/2023
(c) Martin Morgenstern <[email protected]>, 2/2024
(c) Matthias Büchse <[email protected]>, 3/2024
(c) Piotr Bigos <[email protected]>
SPDX-License-Identifier: CC-BY-SA-4.0
"""

from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
@@ -35,11 +35,13 @@
import asyncio
import contextlib
import getopt
import json
import kubernetes_asyncio
import logging
import logging.config
import re
import requests
import subprocess
import sys
import yaml

@@ -93,6 +95,10 @@ class HelpException(BaseException):
    """Exception raised if the help functionality is called"""


class CriticalException(BaseException):
    """Exception raised if critical CVEs are found"""


class Config:
    kubeconfig = None
    context = None
@@ -275,6 +281,7 @@ def __contains__(self, version: K8sVersion) -> bool:
class ClusterInfo:
    version: K8sVersion
    name: str
    kubeconfig: str


async def request_cve_data(session: aiohttp.ClientSession, cveid: str) -> dict:
@@ -381,6 +388,80 @@ async def collect_cve_versions(session: aiohttp.ClientSession) -> set:
    return cfvs


async def run_trivy_scan(image: str) -> dict:
    """
    Run Trivy scan on the specified image and return the results as a dictionary.

    Args:
        image (str): The Docker image to scan.

    Returns:
        dict: Parsed JSON results from Trivy.
    """
    try:
        # Run the Trivy scan command
        result = await asyncio.create_subprocess_exec(
            'trivy',
            'image',
            '--format', 'json',
            '--no-progress',
            image,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        )

        stdout, stderr = await result.communicate()

        if result.returncode != 0:
            logger.error("Trivy scan failed: %s", stderr.decode().strip())
            return {}

        # Parse the JSON output from Trivy
        return json.loads(stdout.decode())

    except Exception as e:
        logger.error("Error running Trivy scan: %s", e)
        return {}
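
One thing to note about the function above is that it returns `{}` both when the scan fails and when an error is caught, so a caller cannot tell a failed scan from a report without findings. A minimal sketch of an alternative, assuming it is acceptable to raise on failure: the helper name `run_json_command` is hypothetical, and the demo runs the Python interpreter in place of `trivy` so that the example is self-contained.

```python
import asyncio
import json
import sys


async def run_json_command(*argv: str) -> dict:
    """Run a command, fail loudly on a non-zero exit code, and parse stdout as JSON."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        # Raising keeps "scan failed" distinguishable from "scan found nothing".
        raise RuntimeError(
            f"{argv[0]} exited with {proc.returncode}: {stderr.decode().strip()}"
        )
    return json.loads(stdout.decode())


def demo() -> dict:
    """Stand-in for a real scan: a child Python process prints a tiny JSON document."""
    code = 'import json; print(json.dumps({"Results": []}))'
    return asyncio.run(run_json_command(sys.executable, "-c", code))


def demo_failure() -> str:
    """Show that a failing command surfaces as an exception rather than an empty dict."""
    try:
        asyncio.run(run_json_command(sys.executable, "-c", "import sys; sys.exit(3)"))
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```

With a helper like this, the caller decides whether a failed scan should abort the check or only be logged.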


async def get_k8s_pod_images(kubeconfig, context=None) -> list[str]:
    """Get the list of container images used by all the pods in the Kubernetes cluster."""
    cluster_config = await kubernetes_asyncio.config.load_kube_config(kubeconfig, context)
Contributor

This variable is not used (this fact gets reported by flake8 as well), and this doesn't seem right?!

Author

deleted


    async with kubernetes_asyncio.client.ApiClient() as api:
        v1 = kubernetes_asyncio.client.CoreV1Api(api)
        pods = await v1.list_pod_for_all_namespaces(watch=False)

    images = set()
    for pod in pods.items:
        for container in pod.spec.containers:
            images.add(container.image)

        if pod.spec.init_containers:
            for container in pod.spec.init_containers:
                images.add(container.image)

    return list(images)
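
The image collection itself does not need a cluster connection and can be separated into a pure function that is easy to test. The following is a sketch under that assumption: `collect_images` is a hypothetical helper, and the fake pods only mimic the `spec.containers` / `spec.init_containers` attributes used in the loop above.

```python
from types import SimpleNamespace


def collect_images(pods) -> list:
    """Return the sorted, de-duplicated image names of all (init) containers."""
    images = set()
    for pod in pods:
        # init_containers is None for pods without init containers.
        containers = list(pod.spec.containers or [])
        containers += list(pod.spec.init_containers or [])
        for container in containers:
            images.add(container.image)
    return sorted(images)


def fake_pod(images, init_images=None):
    """Build a minimal stand-in for a pod object (illustration only)."""
    return SimpleNamespace(
        spec=SimpleNamespace(
            containers=[SimpleNamespace(image=name) for name in images],
            init_containers=(
                [SimpleNamespace(image=name) for name in init_images]
                if init_images is not None
                else None
            ),
        )
    )


example_pods = [
    fake_pod(["registry.example/app:1.0"], ["registry.example/init:1.0"]),
    fake_pod(["registry.example/app:1.0"]),
]
```

Sorting the result keeps the scan order stable between runs, which makes logs easier to compare.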


async def scan_k8s_images(kubeconfig, context=None) -> None:
    """Scan the images used in the Kubernetes cluster for vulnerabilities."""
    images_to_scan = await get_k8s_pod_images(kubeconfig, context)

    for image in images_to_scan:
        logger.info("Scanning image: %s", image)
        scan_results = await run_trivy_scan(image)

        if scan_results:
            for result in scan_results.get('Results', []):
                for vulnerability in result.get('Vulnerabilities', []):
                    # Single-line message; the original multi-line f-string left stray quotes in the output.
                    logger.warning(
                        "Vulnerability found in image %s: %s (Severity: %s)",
                        image, vulnerability['VulnerabilityID'], vulnerability['Severity'],
                    )
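
The loop above awaits one scan at a time, so the images are scanned sequentially even though the function is async. If parallel scans are wanted, one possible approach is bounded concurrency with a semaphore, sketched below. `scan_all` and `fake_scan` are hypothetical names; `fake_scan` stands in for `run_trivy_scan` so that the example runs without Trivy.

```python
import asyncio


async def scan_all(images, scan_one, max_parallel: int = 4) -> dict:
    """Scan all images with at most max_parallel scans in flight; map image -> report."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(image):
        # The semaphore caps how many scans run at the same time.
        async with semaphore:
            return image, await scan_one(image)

    pairs = await asyncio.gather(*(guarded(image) for image in images))
    return dict(pairs)


async def fake_scan(image: str) -> dict:
    """Stand-in scanner (illustration only): returns an empty report for any image."""
    await asyncio.sleep(0)
    return {"image": image, "Results": []}


def demo() -> dict:
    return asyncio.run(scan_all(["a:1", "b:2", "c:3"], fake_scan, max_parallel=2))
```

Capping the parallelism matters here because each scan starts a separate process and may download image layers.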


async def get_k8s_cluster_info(kubeconfig, context=None) -> ClusterInfo:
    """Get the k8s version of the cluster under test."""
    cluster_config = await kubernetes_asyncio.config.load_kube_config(kubeconfig, context)
@@ -389,7 +470,7 @@ async def get_k8s_cluster_info(kubeconfig, context=None) -> ClusterInfo:
        version_api = kubernetes_asyncio.client.VersionApi(api)
        response = await version_api.get_code()
        version = parse_version(response.git_version)
        return ClusterInfo(version, cluster_config.current_context['name'])
        return ClusterInfo(version, cluster_config.current_context['name'], kubeconfig=kubeconfig)


def check_k8s_version_recency(
@@ -474,11 +555,23 @@ async def main(argv):
        logger.critical("The EOL data in %s is outdated and we cannot reliably run this script.", EOLDATA_FILE)
        return 1

    kubeconfig_path = config.kubeconfig

    connector = aiohttp.TCPConnector(limit=5)
    async with aiohttp.ClientSession(connector=connector) as session:
        cve_affected_ranges = await collect_cve_versions(session)
    releases_data = fetch_k8s_releases_data()

    try:
        logger.info(f"Checking cluster specified by {kubeconfig_path}")
Contributor

This needs some more explanation, because almost the same message will be displayed in line 577 (only better, because there, it contains the context as well).

Author

Would this be satisfactory?

f""" Initiating scan on the Kubernetes cluster specified by kubeconfig at '{kubeconfig_path}'
{' with context ' + config.context if config.context else ''}.
Fetching cluster information and verifying access.""")

The scanner provides additional information about each vulnerability in its output.

Contributor

My point was that it looks like a duplicate of the other line, and it appears to me that the script now tries to achieve the same objective with two different means, one after the other.

        cluster = await get_k8s_cluster_info(config.kubeconfig, config.context)
        await scan_k8s_images(cluster.kubeconfig)

    except CriticalException as e:
        logger.critical(e)
        logger.debug("Exception info", exc_info=True)
        return 1

    try:
        context_desc = f"context '{config.context}'" if config.context else "default context"
        logger.info("Checking cluster specified by %s in %s.", context_desc, config.kubeconfig)