flux diff artifact: Print the differences in human readable form. #4916
@octo your fork is behind, please rebase with upstream main and force push. Also, please fix the diff test that's currently failing.
✅ rebased
@octo what happens if Helm charts are inside the artifact? Are the templated YAMLs excluded, or will the CLI crash?
We can't assume an artifact has Kubernetes YAMLs, Flux artifacts are used for many other things besides YAML, for example Terraform modules that work with tofu-controller. I think we need to put the YAML diff behind a flag e.g. |
I don't know, since I haven't tried. My suspicion is that the YAML parsing will fail and that the command errors out. In that case, we're probably better off using a non-semantic diff, something akin to using |
I think there is value in showing a semantic diff for YAML content, but we need to make it opt-in and exclude templated files, like those in Helm charts. |
@stefanprodan I added code that uses The I unified the handling of the two arguments, allowing each to be a directory, Unfortunately, the |
Example: `# Check if local files differ from remote | ||
flux diff artifact oci://ghcr.io/stefanprodan/manifests:podinfo:6.2.0 --path=./kustomize`, |
This is a breaking change, please undo it.
The previous `flux diff artifact <path1> --path=<path2>` syntax is still supported for backwards compatibility. This simply makes `flux diff artifact <path1> <path2>` the documented syntax. If you don't like it, I can revert back to the `--path` syntax, but compatibility should not be broken.
cmd/flux/diff_artifact.go
    args = append(args, c.flags...)
    args = append(args, fromDir, toDir)

    cmd := exec.CommandContext(ctx, c.name, args...)
This was discussed in the flux dev meeting recently. We have avoided executing external commands so far. Since this is part of the CLI, the default diff utility may not be present on platforms like Windows, and the alternative tools may have their own ways of displaying a diff. I think it would be better to have consistent output where we can.
I believe for non-yaml content, we want to have text diff in unified format. Upstream Go has a private diff implementation https://github.com/golang/go/blob/master/src/internal/diff/diff.go which provides a unified diff result given two file contents. For an example, try running this snippet in the Go playground https://go.dev/play/p/TPPc4BAMKDp . The example shows a diff of some terraform configuration.
In the past, we have taken other such code from upstream Go and adapted it for our needs, for example https://github.com/fluxcd/pkg/blob/tar/v0.8.0/tar/tar.go#L6. We can do something similar for this. We can have a new diff package in https://github.com/fluxcd/pkg with some niceness for our use case. It would be nice to add an option to apply colors to added and removed lines. The same can be surfaced as a flag to toggle colored output.
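As a rough illustration of the coloring idea above, here is a minimal sketch that post-processes an already generated unified diff. The function name `colorizeUnified` and the `enabled` toggle are hypothetical and not part of any Flux package; the header detection is a simple prefix heuristic, so a removed line whose content itself starts with `-- ` would be mistaken for a file header.

```go
package main

import (
	"fmt"
	"strings"
)

// ANSI escape sequences for red, green, and reset.
const (
	ansiRed   = "\x1b[31m"
	ansiGreen = "\x1b[32m"
	ansiReset = "\x1b[0m"
)

// colorizeUnified (hypothetical helper) wraps added and removed lines of a
// unified diff in ANSI colors. File headers ("--- ", "+++ ") and hunk headers
// are left untouched. When enabled is false the input is returned unchanged.
func colorizeUnified(diff string, enabled bool) string {
	if !enabled {
		return diff
	}
	lines := strings.Split(diff, "\n")
	for i, line := range lines {
		switch {
		case strings.HasPrefix(line, "--- "), strings.HasPrefix(line, "+++ "):
			// File header: keep as-is (prefix heuristic, see lead-in).
		case strings.HasPrefix(line, "+"):
			lines[i] = ansiGreen + line + ansiReset
		case strings.HasPrefix(line, "-"):
			lines[i] = ansiRed + line + ansiReset
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	diff := "--- kustomize/hpa.yaml\n+++ kustomize/hpa.yaml\n@@ -2,3 +2,4 @@\n metadata:\n   name: podinfo\n+  namespace: default\n spec:"
	fmt.Println(colorizeUnified(diff, true))
}
```

Keeping the coloring separate from the diff computation means the same toggle could be surfaced as a CLI flag regardless of which diff implementation ends up being used.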
✅ done – the `--differ` flag now takes one of three arguments:
- `dyff`: Use the `dyff` package to generate a semantic YAML diff.
- `external`: Execute the command in the `FLUX_EXTERNAL_DIFF` environment variable. Using `--differ external` without setting `FLUX_EXTERNAL_DIFF` is now an error; there is no default command being executed. That means flux makes no assumptions about the system it runs on.
- `unified`: Use github.com/hexops/gotextdiff to generate a unified diff without invoking external commands.
I looked into unified diffs and Golang quite a bit more than I wanted to, and the situation is surprisingly bad. I opted to use github.com/hexops/gotextdiff, which is a clone of one of the internal-only packages by the Go authors. The repo is marked as archived as of earlier this year, but overall this seems like the least bad option.
> In the past, we have taken other such code from upstream Go and adapted it for our needs
As a maintainer myself, I'd work hard to avoid that. Copying and adapting code rather than referencing it means that all the maintenance overhead is on the Flux maintainers, which can snowball quickly.
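The `--differ` behavior described in this comment can be sketched roughly as follows. This is not the code from the PR: `resolveDiffer` and its injected `lookupEnv` parameter are hypothetical names, and the error messages are made up. Only the three accepted values and the `FLUX_EXTERNAL_DIFF` requirement come from the comment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveDiffer (hypothetical) validates the value of the --differ flag and
// returns the external command to execute, if any. lookupEnv is injected so
// the function can be tested without touching the real environment.
func resolveDiffer(differ string, lookupEnv func(string) (string, bool)) (string, error) {
	switch differ {
	case "dyff", "unified":
		// In-process differs: no external command involved.
		return "", nil
	case "external":
		cmd, ok := lookupEnv("FLUX_EXTERNAL_DIFF")
		if !ok || cmd == "" {
			// No default command: refusing is safer than guessing.
			return "", errors.New("--differ external requires FLUX_EXTERNAL_DIFF to be set")
		}
		return cmd, nil
	default:
		return "", fmt.Errorf("unknown differ %q: must be one of dyff, external, unified", differ)
	}
}

func main() {
	for _, d := range []string{"dyff", "external", "unified", "bogus"} {
		cmd, err := resolveDiffer(d, os.LookupEnv)
		fmt.Printf("differ=%s cmd=%q err=%v\n", d, cmd, err)
	}
}
```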
Thanks for updating the implementation. Sorry for the late reply, we were busy with a release and I wanted to discuss this in the dev meeting before further discussion. I'm adding section headers below to make it easier to see the topics discussed below, as I have heard my long replies are difficult to parse for some 😄
External command execution
We discussed external command execution today, and we would still prefer not to execute an external command, for the same reasons as before. We don't execute other binaries/commands anywhere in Flux yet, and this can open up questions about similar considerations elsewhere. We can have a consistent diff result with our diff code independently. If there's a need to use any other diff program, users can always combine `flux pull artifact` with their program to achieve that. If needed, we can document it in our docs or even publish a blog post about such a use case.
We can discuss this further if you would like to.
Unified diff implementation
Regarding the unified diff implementation, in my initial research, I did come across gotextdiff which used the code from golang/tools and found it to be archived. Then I found a fork of it https://github.com/Shogoki/gotextdiff which uses the code from golang/go, but doesn't seem to have much going on, stale README from the original repo.
I did further research today into diff algorithms, more specifically the Git diff algorithms, see https://git-scm.com/docs/diff-options#Documentation/diff-options.txt---diff-algorithmpatienceminimalhistogrammyers. I found this nice article that describes the differences between the algorithms: https://luppeng.wordpress.com/2020/10/10/when-to-use-each-of-the-git-diff-algorithms/.
The default algorithm in both Git diff and GNU diff is the Myers diff algorithm. The code in golang/tools also implements the Myers diff algorithm. The code in golang/go implements an anchored diff algorithm, based on the patience diff algorithm. Patience diff seems to produce clearer and more efficient results, based on multiple sources I have read. This finding will be used below.
In today's meeting, we discussed depending on an archived package. Since the package is archived, we have no option other than to take ownership of the code. If we own the code, we can fix any bugs we find and also add improvements to it as we need them. Hence, we can't depend on the archived gotextdiff package. In addition, it may be better to use the patience diff algorithm over the Myers diff algorithm. Since we have to copy the diff code anyway for these reasons, it seems better to use the one from golang/go, which implements the patience diff algorithm, and add the modifications I mentioned above regarding coloring of the unified diff results.
I hope this provides enough details about our considerations. We can discuss more if you have any other concerns and ideas about it.
Diff result presentation
In the meeting, we also discussed the presentation of the diff result. The current output looks like the following:
fromFiles = {"kustomize/deployment.yaml", "kustomize/hpa.yaml", "kustomize/kustomization.yaml", "kustomize/service.yaml"}
toFiles = {"kustomize/hpa.yaml"}
Only in /tmp/flux-diff-artifact410276048: kustomize/deployment.yaml
...
I did notice that `fromFiles` and `toFiles` are written to stderr, maybe to separate the debug output from the actual result.
I don't have a good example to share of what we would like it to be yet; I can work on that and share later. But it's along the lines of a summary of which files were modified, added, or deleted, similar to what git status shows, as this would be more relevant for the use case of a Flux OCI user, in addition to the actual diff output. Maybe it would involve another flag to show the full diff, with only a summary shown by default. I'll see if I can share some good examples after thinking more about it. If this gives you any ideas, please feel free to share and we can discuss them.
Here's what I came up with for the output. I have not discussed the details of this with anyone, so this is more as a proposal to gather some feedback.
I took some liberty in what constitutes a diff result, as mentioned above about presenting a simplified summary by default and using an additional option to show the full diff of what changed. I think it's okay for a flux user who may just need a brief summary of which files changed.
In the following, I used `oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0` as the artifact. I deleted `service.yaml`, introduced `ingress.yaml` and modified `hpa.yaml`.
For the default diff result, we can show something like the following:
$ flux diff artifact oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0 ./test-manifests/
new file: kustomize/ingress.yaml
deleted: kustomize/service.yaml
modified: kustomize/hpa.yaml
Maybe we can color these too, green for new file and red for deleted, and maybe for modified too. Or maybe there's no need for coloring at all here.
Unmodified files are not listed.
Update: An advantage of showing a summary with just the file names is that it can also list binary files which can't be diffed.
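To make the proposed summary more concrete, here is a minimal sketch of how it could be computed from two already extracted directory trees. The helper names `readTree` and `summarize` are hypothetical, and reading every file fully into memory is a simplification; comparing checksums would work just as well. Because only bytes are compared, binary files are covered too.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// readTree (hypothetical) returns the contents of all regular files below
// root, keyed by slash-separated path relative to root.
func readTree(root string) (map[string][]byte, error) {
	files := map[string][]byte{}
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, etc.
		}
		rel, err := filepath.Rel(root, p)
		if err != nil {
			return err
		}
		data, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		files[filepath.ToSlash(rel)] = data
		return nil
	})
	return files, err
}

// summarize (hypothetical) returns one line per changed file, sorted.
// Unmodified files are not listed.
func summarize(from, to map[string][]byte) []string {
	var out []string
	for name, b := range to {
		a, ok := from[name]
		switch {
		case !ok:
			out = append(out, "new file: "+name)
		case !bytes.Equal(a, b):
			out = append(out, "modified: "+name)
		}
	}
	for name := range from {
		if _, ok := to[name]; !ok {
			out = append(out, "deleted: "+name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: summary <from-dir> <to-dir>")
		return
	}
	from, err := readTree(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	to, err := readTree(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	for _, line := range summarize(from, to) {
		fmt.Println(line)
	}
}
```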
To show details of the changes, a new flag can be passed. I'm calling this `--full`; `--long` may also work, but we can discuss and change it if we do things this way.
$ flux diff artifact oci://ghcr.io/stefanprodan/manifests/podinfo:6.2.0 ./test-manifests/ --full
--- kustomize/hpa.yaml
+++ kustomize/hpa.yaml
@@ -2,6 +2,7 @@
kind: HorizontalPodAutoscaler
metadata:
name: podinfo
+ namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
--- /dev/null
+++ kustomize/ingress.yaml
@@ -1 +1,6 @@
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+ name: minimal-ingress
+ annotations:
+ nginx.ingress.kubernetes.io/rewrite-target: /
...
This is the detailed diff. Only changes are shown; unmodified files are not shown. For a new file, every line of its content is shown with a `+` prefix; for a deleted file, every line is shown with a `-` prefix.
For a modified file, the diff starts with the relative name of the file with respect to the artifact:
--- kustomize/hpa.yaml
+++ kustomize/hpa.yaml
...
For a new file or a deleted file, similar to what git diff does, we can show the empty part as `/dev/null` at the top:
--- /dev/null
+++ kustomize/ingress.yaml
...
...
--- kustomize/service.yaml
+++ /dev/null
...
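The header convention above is small enough to sketch directly. `fileHeader` is a hypothetical helper, not existing Flux code; it takes the path relative to the artifact root and whether the file exists on each side.

```go
package main

import "fmt"

// fileHeader (hypothetical) returns the two header lines of a unified diff
// for one file. A side on which the file does not exist is shown as
// /dev/null, similar to what git diff prints.
func fileHeader(name string, inFrom, inTo bool) string {
	from, to := name, name
	if !inFrom {
		from = "/dev/null" // new file: nothing on the "from" side
	}
	if !inTo {
		to = "/dev/null" // deleted file: nothing on the "to" side
	}
	return fmt.Sprintf("--- %s\n+++ %s\n", from, to)
}

func main() {
	fmt.Print(fileHeader("kustomize/hpa.yaml", true, true))
	fmt.Print(fileHeader("kustomize/ingress.yaml", false, true))
	fmt.Print(fileHeader("kustomize/service.yaml", true, false))
}
```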
For binary files, we can either skip it as we can't show any diff of it or, we can also do what git diff does, print a line about it, something like:
...
Binary files /dev/null and b/abinaryfile differ
...
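For the binary case, one possible sketch is shown below. It assumes a file counts as binary when a NUL byte appears in its first 8000 bytes, which to my knowledge is the heuristic Git applies; the function names `isBinary` and `binaryNotice` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// isBinary (hypothetical) reports whether data looks like binary content.
// Assumption: a NUL byte within the first 8000 bytes means "binary".
func isBinary(data []byte) bool {
	const sniffLen = 8000
	if len(data) > sniffLen {
		data = data[:sniffLen]
	}
	return bytes.IndexByte(data, 0) != -1
}

// binaryNotice (hypothetical) formats the one-line notice printed instead of
// a textual diff.
func binaryNotice(from, to string) string {
	return fmt.Sprintf("Binary files %s and %s differ", from, to)
}

func main() {
	png := []byte{0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a, 0x00}
	if isBinary(png) {
		fmt.Println(binaryNotice("/dev/null", "b/abinaryfile"))
	}
}
```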
This may be overkill but this is what I think of it as of now. All these are just for the plain text diff. YAML manifests will just use what dyff provides. But maybe we can modify that output too if we agree on some enhancements we find here. Update: Thinking more about it, I think regardless of the differ, doing the summary by default seems better and only use dyff or text diff for detailed diff.
Open for any suggestion and discussion about it.
This option allows to exclude certain files from extraction. This is going to be used by `flux diff artifact` to only extract "interesting" files from an archive for comparison with another source. See also: fluxcd/flux2#4916 Signed-off-by: Florian Forster <[email protected]>
I was hoping to use `flux diff artifact` as part of a CI pipeline to show the difference between the merge request and the currently deployed artifact. The existing implementation doesn't work for us, since it only compares the checksums.
This commit changes the output produced by `flux diff artifact` to a list of changes in human readable form. The code is using the `dyff` package to produce a semantic diff of the YAML files. That means, for example, that changes in the order of map fields are ignored, while changes in the order of lists are not.
Example output:
```
$ ./bin/flux diff artifact "oci://${IMAGE}" --path=example-service/
spec.replicas (apps/v1/Deployment/example-service-t205j6/backend-production)
  ± value change
    - 1
    + 7

✗ "oci://registry.gitlab.com/${REDACTED}/example-service-t205j6/deploy:production" and "example-service/" differ
```
The new `--brief` / `-q` flag enables users to revert to the previous behavior of only printing a has changed/has not changed line.
Signed-off-by: Florian Forster <[email protected]>
This fixes the `TestDiffArtifact` unit test. Signed-off-by: Florian Forster <[email protected]>
Also updates the list of options passed to `dyff.CompareInputFiles` to be the same as in the internal `build` package. Signed-off-by: Florian Forster <[email protected]>
Artifacts may contain other file types, not just YAML files, meaning the semantic YAML diff provided by `dyff` is not a safe default. This change implements purely textual diffing using the `diff` command line tool. This tool can be overridden by users using the `FLUX_EXTERNAL_DIFF` environment variable. Users that store Kubernetes resource manifests in the artifact can re-enable the semantic YAML diff behavior using the `--semantic-diff yaml` flag.
The arguments to the diff subcommand may be:
* A directory
* A .tar.gz or .tgz file
* An OCI url
* An individual file
The two arguments to the command are treated the same way, allowing users to diff in either direction.
Signed-off-by: Florian Forster <[email protected]>
@stefanprodan @darkowlzz Thank you so much for your reviews! I have addressed all review comments and rebased my branch on top of |
@@ -6,6 +6,7 @@ go 1.22.4
replace gopkg.in/yaml.v3 => gopkg.in/yaml.v3 v3.0.1

require (
bitbucket.org/creachadair/stringset v0.0.14
This project is not Apache 2.0 licensed, so we can't use it.
Based on my reading, it seems the set union operation is the primary need for this package in the code, which is implemented in https://bitbucket.org/creachadair/stringset/src/v0.0.14/stringset.go#lines-156:168. I believe we should be able to use `Set` from apimachinery https://github.com/kubernetes/apimachinery/blob/v0.31.0/pkg/util/sets/set.go#L143, or implement the same using their data types.
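For reference, the union needed here is small enough that it could also be written without any dependency. The sketch below uses only the standard library; the `stringSet` type and its methods are hypothetical stand-ins and are not the apimachinery `sets.Set` API.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet is a hypothetical minimal set of strings backed by a map.
type stringSet map[string]struct{}

// newSet builds a set from the given items.
func newSet(items ...string) stringSet {
	s := stringSet{}
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// union returns a new set containing every element of s and other.
// Neither input is modified.
func (s stringSet) union(other stringSet) stringSet {
	out := make(stringSet, len(s)+len(other))
	for k := range s {
		out[k] = struct{}{}
	}
	for k := range other {
		out[k] = struct{}{}
	}
	return out
}

// sorted returns the elements in lexical order for deterministic output.
func (s stringSet) sorted() []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	fromFiles := newSet("kustomize/deployment.yaml", "kustomize/hpa.yaml", "kustomize/service.yaml")
	toFiles := newSet("kustomize/hpa.yaml", "kustomize/ingress.yaml")
	// Iterating over the union visits every file present on either side once.
	for _, name := range fromFiles.union(toFiles).sorted() {
		fmt.Println(name)
	}
}
```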
@@ -157,6 +159,7 @@ require (
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/hashicorp/golang-lru/arc/v2 v2.0.5 // indirect
github.com/hashicorp/golang-lru/v2 v2.0.5 // indirect
github.com/hexops/gotextdiff v1.0.3
This package is no longer maintained, so we can't use it.
I was hoping to use `flux diff artifact` as part of a CI pipeline to show the difference between the merge request and the currently deployed artifact. The existing implementation doesn't work for us, since it only compares the checksums.
This commit changes the output produced by `flux diff artifact` to a list of changes in human readable form. The code is using the `dyff` package to produce a semantic diff of the YAML files. That means, for example, that changes in the order of map fields are ignored, while changes in the order of lists are not.
Example output:
The new `--brief` / `-q` flag enables users to revert to the previous behavior of only printing a has changed/has not changed line.
Closes: #3839