WORKDIR learned to cache its potential output layer #3341

mzihlmann · 2024-10-13T04:08:59Z

Description

When WORKDIR is called on a non-existent directory, kaniko is kind enough to create that directory for you, which adds a layer. However, kaniko does not cache that layer, so every invocation emits a completely new image from that point onwards. Within the same stage this is not obvious: the caching mechanism still finds the later layers, so you get a 100% cache hit rate thereafter, yet the resulting image is completely new. In multi-stage builds, or in builds that depend on the newly emitted image, this is catastrophic: they consider the entire image's SHA when determining whether the cache is hit, so the new image invalidates the entire cache.

So far the workaround was to ensure that the directory exists before calling WORKDIR to avoid creating it implicitly, as RUN statements can be cached:

RUN mkdir /app
WORKDIR /app

With this change, the layer potentially created by WORKDIR is cached too, in a similar vein to how RUN statements are cached.

There is some optimization potential left on the table here: we sometimes know a priori whether a layer should be created at all, and we always know which directory is affected. For now I copied the code from RUN to make it work, but this is suboptimal, as that code assumes no a priori knowledge. I'm open to suggestions.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

Includes unit tests
Adds integration tests if needed.

See the contribution guide for more details.

Reviewer Notes

The code flow looks good.
Unit tests and/or integration tests added.

Release Notes

kaniko learned to cache layers created by WORKDIR

mzihlmann · 2024-10-13T04:11:30Z

pkg/commands/workdir.go

@@ -81,14 +84,116 @@ func (w *WorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile

 // FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist
 func (w *WorkdirCommand) FilesToSnapshot() []string {
-	return w.snapshotFiles
+	return nil


WORKDIR /, for example, would result in an empty list, which was not cacheable in my trials. Because it is not cacheable, it causes a cache miss every time, which requires the entire filesystem to be unpacked. This is copied from the RUN command, since a RUN command can also create no files, e.g. RUN echo "Hello World". Somehow, if I return nil instead of an empty list, the caching mechanism handles it gracefully; I'm currently investigating why.

I was able to resolve this by allowing the cache to contain empty images, which indicate that the command did not change any files and that we are aware of this. With this modification we can report back an empty list and no longer need to use the FS snapshot function from the RUN command.
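For readers less familiar with Go: a nil slice and an empty, non-nil slice both have length 0, but they compare differently against nil, so a caller can treat them as separate signals. The sketch below illustrates that distinction with a hypothetical `snapshotMode` helper. Whether kaniko's build loop branches exactly like this is an assumption, not something this thread confirms.

```go
package main

import "fmt"

// snapshotMode is a hypothetical helper showing how a caller could tell
// three cases apart from the value a FilesToSnapshot-style method returns.
func snapshotMode(files []string) string {
	switch {
	case files == nil:
		// No list at all: the command does not know what it changed.
		return "full filesystem snapshot"
	case len(files) == 0:
		// Explicit empty list: the command knows it changed nothing.
		return "no files changed"
	default:
		return "snapshot listed files"
	}
}

func main() {
	fmt.Println(snapshotMode(nil))
	fmt.Println(snapshotMode([]string{}))
	fmt.Println(snapshotMode([]string{"/app"}))
}
```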

mzihlmann · 2024-10-13T04:12:18Z

pkg/commands/workdir.go

 func (w *WorkdirCommand) MetadataOnly() bool {
 	return false
 }
+
+func (r *WorkdirCommand) RequiresUnpackedFS() bool {
+	return true


WORKDIR required me to unpack the filesystem, as otherwise the user is not known:

error building image: error building stage: failed to execute command: identifying uid and gid for user app: user app is not a uid and does not exist on the system

mzihlmann · 2024-10-13T04:14:07Z

pkg/commands/workdir.go

+}
+
+func (w *WorkdirCommand) ShouldCacheOutput() bool {
+	return w.shdCache


I was wondering whether we could optimize this instruction to not cache the output when no files were created. I'm not sure how to tell, from the cache consumer's side, whether a cache entry is missing or intentionally absent.
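One way to think about the question is Go's "comma ok" map lookup: an entry that exists but holds nothing is distinguishable from an entry that was never written. The sketch below uses a hypothetical in-memory `layerCache` with made-up keys. kaniko's real cache is a registry of images, so this only illustrates the idea of storing an empty entry on purpose.

```go
package main

import "fmt"

// layerCache is a hypothetical in-memory stand-in for a layer cache.
// A key that is present with zero files means "ran before, changed nothing";
// an absent key means "never cached".
type layerCache map[string][]string

// lookup reports the cached files and whether an entry exists at all.
func (c layerCache) lookup(key string) (files []string, hit bool) {
	files, hit = c[key]
	return files, hit
}

func main() {
	cache := layerCache{
		"workdir-/":    {},       // stored on purpose: no directory was created
		"workdir-/app": {"/app"}, // the implicitly created directory
	}
	for _, key := range []string{"workdir-/", "workdir-/app", "workdir-/srv"} {
		files, hit := cache.lookup(key)
		fmt.Printf("%s hit=%t files=%d\n", key, hit, len(files))
	}
}
```

If empty entries were skipped instead, the consumer would see the same result for "nothing to store" and "not built yet", and would have to rebuild in both cases.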

mzihlmann · 2024-10-13T04:17:24Z

pkg/commands/workdir.go

+
+func (wr *CachingWorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile.BuildArgs) error {
+	var err error
+	logrus.Info("Cmd: workdir")


I copied that entire block of code here to ensure that, even on a cache hit, we still perform the metadata operation and actually change the workdir. I could of course put that in a function, but I would prefer to pass the resolved directory into the cache so that I can have a single line here, reusing not only the functionality but also the result.

cfg.WorkingDir = wr.resolvedWorkingDir

Passing the result is not possible, as either the regular or the cached ExecuteCommand function is called, never both. So I opted for sharing the functionality: instead of extracting a nameless subroutine that fits the entire bill, I extracted the more meaningful ToAbsPath function. I think the remaining few lines are fine to duplicate; the remaining calls have too many dependencies to be neatly tucked away in a function.

Hmmm... passing the result would be possible, but it requires some creative thinking. We do store the layer as an image when we push to the cache, so we could change the WORKDIR on that image and thereby store that variable too. Come to think of it, labels on the cached image could be used to pass arbitrary data between the cache creation and cache reuse stages.

But this would be a bigger redefinition of what the cache does and how it is used, which is not suitable for a small PR like this. Still interesting, though.
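The extracted ToAbsPath function is not shown in this excerpt. As a rough sketch of what such a helper might do, the hypothetical `toAbsPath` below resolves a WORKDIR argument against the current working directory, following the Dockerfile rule that a relative WORKDIR is relative to the previous one. The name, signature and the fallback to `/` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// toAbsPath is a hypothetical helper: it resolves a WORKDIR argument against
// the current working directory. Container paths use forward slashes, so the
// "path" package is used rather than "path/filepath".
func toAbsPath(current, target string) string {
	if path.IsAbs(target) {
		return path.Clean(target)
	}
	if current == "" {
		// Assumption: an unset working directory is treated as the root.
		current = "/"
	}
	return path.Join(current, target)
}

func main() {
	fmt.Println(toAbsPath("", "app"))     // relative to the root
	fmt.Println(toAbsPath("/app", "src")) // relative to the previous WORKDIR
	fmt.Println(toAbsPath("/app", "/"))   // absolute target wins
}
```

With a helper like this shared, both the regular and the cached ExecuteCommand can compute the same resolved directory and then set `WorkingDir` on the config independently.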

mzihlmann · 2024-10-13T04:20:09Z

pkg/commands/workdir.go

+		extractFn: util.ExtractFile,
+	}
+}
+
 func (w *WorkdirCommand) MetadataOnly() bool {
 	return false


Sometimes calling WORKDIR is metadata-only, so there might be some optimization potential here, depending on whether we can know this a priori.

For example, calling

WORKDIR /

without any further context is guaranteed to be metadata-only in all images, to my understanding.

mzihlmann · 2024-10-13T04:21:13Z

pkg/commands/commands.go

@@ -78,7 +78,7 @@ func GetCommand(cmd instructions.Command, fileContext util.FileContext, useNewRu
 	case *instructions.EnvCommand:
 		return &EnvCommand{cmd: c}, nil
 	case *instructions.WorkdirCommand:
-		return &WorkdirCommand{cmd: c}, nil
+		return &WorkdirCommand{cmd: c, shdCache: cacheRun}, nil


I piggybacked on the flag for RUN instructions, as I don't think it's sensible to have a separate flag for each instruction.

Caching COPY layers probably got its own flag because, depending on the context, the files might be huge and defeat the purpose of a cache, which is to speed things up by downloading a layer instead of executing commands. This should not be a problem in our simple case here, so piggybacking should be fine.

fix: WORKDIR learned to cache it's output layer

5f0cdd2

mzihlmann added 2 commits October 13, 2024 15:05

move ToAbsPath to a function

fec8be0

support empty images in cache to handle WORKDIR w/o implicit folders …

2366c6d

…created
