-
Hello @goneall, I can't reproduce your case (the SPDX file doesn't contain duplicates).
What did I miss?
-
@DmitriyLewen Thanks for the quick review. I'll get the latest code and see if the problem reproduces. If it does, I'll do some debugging and try to come up with a smaller reproducible example.
-
@DmitriyLewen I wasn't able to reproduce the problem with the "smaller" file system referenced above, but I was able to reproduce it with a much larger file system scan. The scan took a couple of hours and produced a 29MB SBOM containing 10,549 duplicate SPDX IDs. The files analyzed are from the following repositories of the CNCF Keycloak project, each followed by the git hash of the version scanned:
I'll go through the duplicates and see if I can find a smaller set of files that reproduces the problem, so you don't have to run such a large (time-consuming) scan.
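A quick way to count duplicated SPDX IDs in a large SBOM like this is a short script over the JSON output. This is a minimal sketch that assumes an SPDX 2.x JSON document whose packages and files sit in top-level `packages` and `files` arrays, each element carrying an `SPDXID` field. The function name `duplicate_spdx_ids` and the script name are hypothetical. The script reports distinct IDs that occur more than once, which may not use the same counting convention as the 10,549 figure above.

```python
import json
import sys
from collections import Counter


def duplicate_spdx_ids(doc: dict) -> dict[str, int]:
    """Return {SPDXID: occurrences} for every SPDX ID used more than once.

    Assumes an SPDX 2.x JSON document; only the top-level "packages" and
    "files" arrays are inspected.
    """
    counts: Counter = Counter()
    for section in ("packages", "files"):
        for element in doc.get(section, []):
            spdx_id = element.get("SPDXID")
            if spdx_id is not None:
                counts[spdx_id] += 1
    # Keep only IDs that appear more than once.
    return {spdx_id: n for spdx_id, n in counts.items() if n > 1}


def main(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    dups = duplicate_spdx_ids(doc)
    print(f"{len(dups)} SPDX IDs are used more than once")
    for spdx_id, n in sorted(dups.items()):
        print(f"{spdx_id}: {n}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python find_dup_spdx_ids.py <sbom.spdx.json>")
```

For example, `python find_dup_spdx_ids.py output.spdx.json` prints the number of duplicated IDs followed by each ID and its count. Sorting the output makes it easier to compare runs.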
-
Here are 3 duplicates from the output. I'm now scanning just the integration-arquillian directory from the keycloak repo to see if the problem reproduces with a smaller set of files. It's still fairly large (around 20MB) and is taking a long time to scan (over 20 minutes and still running). I'll update the discussion in about 8 hours (it's getting late my time).
-
@DmitriyLewen The above file reproduces the problem. I ran this on version v0.55.2, git hash 928c7c0 (I wasn't able to get the latest development code to run on my local machine). Unzip the file and run the command:
Here's the resulting JSON file:
The file has 317 duplicates. One example is:
-
Created #7824
-
Description
This is similar to #6204, but may be a different issue.
When scanning a large repository as a file system with SPDX as the output format, the output contains multiple packages with the same SPDX ID.
From a cursory look, these appear to be multiple entries for exactly the same package.
I suspect that, similar to #6204, the same package is referenced from more than one file (e.g., the same Python dependency listed in two separate requirements.txt files).
Desired Behavior
For SPDX SBOMs, package SPDX IDs must be unique within an SPDX document; duplicates are not allowed.
If the duplicate SPDX ID represents the exact same package, the duplicate should just be removed.
If the duplicate SPDX ID represents a different package, a different SPDX ID should be generated.
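The desired behavior above can be sketched as a post-processing pass over the `packages` array of the SPDX JSON output. This is a minimal illustration under stated assumptions, not Trivy's implementation. It treats "exact same package" as "the two JSON objects are identical". The `-2`, `-3` suffix used to mint new IDs is a hypothetical naming scheme. The `relationships` array is left untouched, so any relationship that should point at a renamed package would still need to be rewritten separately.

```python
import copy
import json


def dedupe_packages(packages: list[dict]) -> list[dict]:
    """Make package SPDX IDs unique, following the desired behavior.

    - A package identical to one already kept under the same SPDXID is dropped.
    - A different package that reuses an SPDXID gets a new ID with a numeric
      suffix (the "-N" scheme is hypothetical, not Trivy's).
    Relationships that reference renamed IDs are NOT updated here.
    """
    used_ids = {pkg["SPDXID"] for pkg in packages}  # never mint an ID that already exists
    kept: dict[str, list[str]] = {}  # original SPDXID -> fingerprints of packages kept under it
    result: list[dict] = []
    for pkg in packages:
        spdx_id = pkg["SPDXID"]
        # Canonical JSON (sorted keys) so key order does not affect equality.
        fingerprint = json.dumps(pkg, sort_keys=True)
        variants = kept.setdefault(spdx_id, [])
        if fingerprint in variants:
            continue  # exact same package: just remove the duplicate
        variants.append(fingerprint)
        new_pkg = copy.deepcopy(pkg)  # leave the caller's data untouched
        if len(variants) > 1:
            # Different package sharing an SPDX ID: generate a fresh, unused ID.
            suffix = len(variants)
            new_id = f"{spdx_id}-{suffix}"
            while new_id in used_ids:
                suffix += 1
                new_id = f"{spdx_id}-{suffix}"
            used_ids.add(new_id)
            new_pkg["SPDXID"] = new_id
        result.append(new_pkg)
    return result
```

Comparing whole objects is the strictest reading of "exact same package". A looser rule, such as matching on name, version, and purl only, would merge more entries but could hide real differences such as distinct source file locations.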
Actual Behavior
Duplicate SPDX IDs are found.
For example, in the attached MaterialX-2024-08-21-trivy-spdx.json you will find:
as well as:
Reproduction Steps
Target
Filesystem
Scanner
License
Output Format
SPDX
Mode
Standalone
Debug Output
Please ping me if you need the debug output; it is quite large.
Operating System
Ubuntu
Version
Checklist
trivy clean --all