WORKDIR learned to cache its potential output layer #3341

Open
wants to merge 3 commits into
base: main
mzihlmann

Fixes #3340

Description

When WORKDIR is called on a non-existent directory, kaniko is kind enough to create that directory for you, which adds a layer to the image. However, kaniko does not cache that layer, so every invocation emits a completely new image from that point onwards. Within the same stage this is not obvious, because the caching mechanism still pulls, so you get a 100% cache hit rate thereafter, yet the image is completely new. In multistage builds, or builds that depend on the newly emitted image, this is catastrophic: they consider the entire image's SHA when determining whether the cache is hit, so the new image invalidates the entire cache.
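To illustrate why a changed image digest invalidates everything downstream, here is a minimal sketch. It assumes a simplified cache key derived only from the parent image digest and the instruction text; `cacheKey` and the digests are hypothetical, and kaniko's real key derivation takes more inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a key from the parent image digest and the instruction.
// Hypothetical simplification, not kaniko's actual composite key.
func cacheKey(parentDigest, instruction string) string {
	sum := sha256.Sum256([]byte(parentDigest + "\n" + instruction))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same Dockerfile, but the base stage was emitted with a new digest
	// because the uncached WORKDIR layer was recreated.
	first := cacheKey("sha256:aaaa", "RUN make")
	second := cacheKey("sha256:bbbb", "RUN make")
	fmt.Println("keys match:", first == second)
}
```

Because the parent digest feeds into the key, an identical instruction on top of a re-emitted image can never hit the old cache entry.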

So far the workaround was to ensure that the directory exists before calling WORKDIR to avoid creating it implicitly, as RUN statements can be cached:

RUN mkdir /app
WORKDIR /app

With this change, the layer potentially created by WORKDIR is cached too, in a similar vein to how RUN statements are cached.

There is some optimization potential left on the table here, as we sometimes know a priori whether a layer should be created at all, and we always know which directory is affected. For now I copied the code from RUN to make it work, but this is suboptimal, as that code assumes no a priori knowledge. I'm open to suggestions.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes unit tests
  • Adds integration tests if needed.

See the contribution guide for more details.

Reviewer Notes

  • The code flow looks good.
  • Unit tests and/or integration tests added.

Release Notes

  • kaniko learned to cache layers created by WORKDIR

@@ -81,14 +84,116 @@ func (w *WorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile

 // FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist
 func (w *WorkdirCommand) FilesToSnapshot() []string {
-	return w.snapshotFiles
+	return nil
Author

WORKDIR /, for example, would result in an empty list, which was not cacheable in my trials. Because it is not cacheable, it causes a cache miss every time, requiring the entire file system to be unpacked. This is copied from the RUN command, as a RUN command can result in no files being created, e.g. RUN echo "Hello World". Somehow, if I return nil instead of an empty list, the caching mechanism handles it gracefully; I'm currently investigating why.

Author

I was able to resolve this by allowing the cache to contain empty images, which simply indicate that the command did not change any files and that we are aware of this. With this modification we can report back an empty list and no longer need the FS snapshot function from the RUN command.
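A minimal sketch of the idea, assuming an in-memory map as a stand-in for the layer cache: an entry with an empty file list is still a hit, which distinguishes "the command changed nothing" from "nothing is cached". The `layerCache` type, the `lookup` method and the keys are hypothetical and not part of kaniko.

```go
package main

import "fmt"

// layerCache maps a cache key to the files captured in the cached layer.
// Hypothetical stand-in for a registry-backed layer cache.
type layerCache map[string][]string

// lookup returns the cached files and whether an entry exists at all.
func (c layerCache) lookup(key string) (files []string, hit bool) {
	files, hit = c[key]
	return files, hit
}

func main() {
	cache := layerCache{
		"workdir-/app": {"/app"}, // WORKDIR /app created the directory
		"workdir-/":    {},       // WORKDIR / changed nothing, recorded on purpose
	}
	for _, key := range []string{"workdir-/app", "workdir-/", "workdir-/srv"} {
		files, hit := cache.lookup(key)
		fmt.Printf("%s hit=%t files=%d\n", key, hit, len(files))
	}
}
```

With this shape the consumer never has to guess: a missing key means the command must be executed, while an empty entry means it can be skipped.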

func (w *WorkdirCommand) MetadataOnly() bool {
	return false
}

func (r *WorkdirCommand) RequiresUnpackedFS() bool {
	return true
Author

@mzihlmann mzihlmann Oct 13, 2024

WORKDIR required me to unpack the filesystem, as otherwise the user is not known:

error building image: error building stage: failed to execute command: identifying uid and gid for user app: user app is not a uid and does not exist on the system

}

func (w *WorkdirCommand) ShouldCacheOutput() bool {
	return w.shdCache
Author

I was wondering whether we could optimize this instruction to not cache the output if no files were created. I'm not sure how to tell, from the cache consumer's side, whether a cache entry is missing or absent on purpose.


func (wr *CachingWorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile.BuildArgs) error {
	var err error
	logrus.Info("Cmd: workdir")
Author

I copied that entire block of code here to ensure that, even on a cache hit, we still perform the metadata operation and actually change the workdir. I could of course put that in a function, but I would prefer to pass the resolved directory into the cache, so that I can have a single line here: not only reusing the functionality, but reusing the result.

cfg.WorkingDir = wr.resolvedWorkingDir

Author

@mzihlmann mzihlmann Oct 13, 2024

Passing the result is not possible, as either the regular or the cached ExecuteCommand function is called, never both. So I opted for sharing the functionality: instead of extracting a nameless subroutine that fits the entire bill, I extracted the more meaningful ToAbsPath function. I think the remaining few lines are fine to duplicate; the remaining calls have too many dependencies to be neatly tucked away in a function.
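For illustration, a helper of this kind could look roughly like the sketch below. This is not the PR's ToAbsPath: the name `toAbsPath`, its signature and the fallback to `/` for an empty working directory are assumptions, and the sketch ignores build-arg and environment substitution.

```go
package main

import (
	"fmt"
	"path"
)

// toAbsPath resolves a WORKDIR argument against the current working directory.
// Hypothetical sketch; the PR's ToAbsPath may differ in signature and details.
func toAbsPath(target, currentWorkdir string) string {
	if path.IsAbs(target) {
		return path.Clean(target)
	}
	if currentWorkdir == "" {
		currentWorkdir = "/" // assumption: root when no WORKDIR has been set yet
	}
	return path.Join(currentWorkdir, target)
}

func main() {
	fmt.Println(toAbsPath("/app", "/srv"))   // /app
	fmt.Println(toAbsPath("build", "/app"))  // /app/build
	fmt.Println(toAbsPath("../etc", "/app")) // /etc
	fmt.Println(toAbsPath("app", ""))        // /app
}
```

Both the regular and the cached ExecuteCommand could call such a helper and then assign the result to the config's working directory, which keeps the duplicated part down to a few lines.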

Author

@mzihlmann mzihlmann Oct 15, 2024

Hmm, passing the result would be possible, but it requires some creative thinking. We store the layer as an image when we push to the cache, so we could change the WORKDIR on that image and thereby store that variable too. Come to think of it, labels on the cached image could be used to pass arbitrary data between the cache creation and cache reuse stages.

But this would be a bigger redefinition of what the cache does and how it is used, not suitable for a small PR like this. Still interesting, though.

		extractFn: util.ExtractFile,
	}
}

func (w *WorkdirCommand) MetadataOnly() bool {
	return false
Author

Sometimes calling WORKDIR is metadata-only, so there might be some optimization potential here, depending on whether we can know this a priori.

For example, calling

WORKDIR /

without any further context is guaranteed to be metadata only in all images to my understanding.
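A minimal sketch of such an a priori check, assuming the target has already been resolved to an absolute path. `knownMetadataOnly` is a hypothetical helper, not part of kaniko, and it deliberately recognizes only the root directory; every other target would still need a filesystem check.

```go
package main

import (
	"fmt"
	"path"
)

// knownMetadataOnly reports whether a WORKDIR target is guaranteed to exist
// without inspecting the filesystem. Hypothetical helper: only the root
// directory qualifies, everything else is treated as unknown.
func knownMetadataOnly(absTarget string) bool {
	return path.Clean(absTarget) == "/"
}

func main() {
	for _, target := range []string{"/", "/.", "/app/..", "/app"} {
		fmt.Printf("%-8s metadata-only=%t\n", target, knownMetadataOnly(target))
	}
}
```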

@@ -78,7 +78,7 @@ func GetCommand(cmd instructions.Command, fileContext util.FileContext, useNewRu
 	case *instructions.EnvCommand:
 		return &EnvCommand{cmd: c}, nil
 	case *instructions.WorkdirCommand:
-		return &WorkdirCommand{cmd: c}, nil
+		return &WorkdirCommand{cmd: c, shdCache: cacheRun}, nil
Author

I piggy-backed on the flag for RUN instructions as I don't think it's sensible to have a separate flag for each instruction.

Author

Caching of COPY layers probably got its own flag because, depending on the context, the files might be huge and defeat the purpose of a cache, which is to speed things up by downloading a layer instead of executing commands. That should not be a problem in our simple case here, so piggy-backing should be fine.

Successfully merging this pull request may close these issues.

layer created implicitly by WORKDIR is not cached