Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not resolve path we pass into decorated function? #44

Closed
yarikoptic opened this issue Apr 30, 2021 · 7 comments · Fixed by #63
Closed

do not resolve path we pass into decorated function? #44

yarikoptic opened this issue Apr 30, 2021 · 7 comments · Fixed by #63
Assignees

Comments

@yarikoptic
Copy link
Member

prompted by investigating dandi/dandisets#38 -- it seems we resolve path which we pass to underlying function:

code -- we pass os.realpath resolved path into original function
 86             @wraps(f)
 87             def fingerprinter(path, *args, **kwargs):
 88                 # we need to dereference symlinks and use that path in the function
(Pdb) 
 89                 # call signature
 90                 path_orig = path
 91                 path = op.realpath(path)
 92                 if path != path_orig:
 93                     lgr.log(5, "Dereferenced %r into %r", path_orig, path)
 94                 if op.isdir(path):
 95                     fprint = self._get_dir_fingerprint(path)
 96                 else:
 97                     fprint = self._get_file_fingerprint(path)
 98                 if fprint is None:
 99                     lgr.debug("Calling %s directly since no fingerprint for %r", f, path)
(Pdb) 
100                     # just call the function -- we have no fingerprint,
101                     # probably does not exist or permissions are wrong
102                     ret = f(path, *args, **kwargs)
103                 # We should still pass through if file was modified just now,
104                 # since that could mask out quick modifications.
105                 # Target use cases will not be like that.
106                 elif fprint.modified_in_window(self._min_dtime):
107                     lgr.debug("Calling %s directly since too short for %r", f, path)
108                     ret = f(path, *args, **kwargs)
109                 else:
110                     lgr.debug("Calling memoized version of %s for %s", f, path)
(Pdb) 
111                     # If there is a fingerprint -- inject it into the signature
112                     kwargs_ = kwargs.copy()
113                     kwargs_[fingerprint_kwarg] = fprint.to_tuple() + (
114                         tuple(self._tokens) if self._tokens else ()
115                     )
116  ->                 ret = fingerprinted(path, *args, **kwargs_)
117                 lgr.log(1, "Returning value %r", ret)
118                 return ret
119     
120             # and we memoize actually that function
121             return fingerprinter
(Pdb) p path
'/mnt/backup/dandi/dandisets/000003/.git/annex/objects/64/J4/SHA256E-s8417855886--918e541a92b934d7543acc92981ed0425eadc0d24dcabf9d8e5b44924dbb1046.nwb/SHA256E-s8417855886--918e541a92b934d7543acc92981ed0425eadc0d24dcabf9d8e5b44924dbb1046.nwb'
(Pdb) p path_orig
PosixPath('/mnt/backup/dandi/dandisets/000003/sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140327_behavior+ecephys.nwb')

we should check on original motivation and either it could/should be avoided!

@yarikoptic
Copy link
Member Author

@jwodder could you please have a look either there is an easy/safe resolution for it?

@jwodder
Copy link
Member

jwodder commented Jul 8, 2021

@yarikoptic I believe the motivation was something like ensuring that identical relative paths run from different working directories wouldn't be confused, though now that file fingerprints include the inode number, that may be less of an issue.

I'm not clear on why this is a problem. Does git-annex use different symlink paths for identical files?

@yarikoptic
Copy link
Member Author

IIRC it was more of a fact that underlying function needed that original path, so it could lookup annex key from a symlink here https://github.com/dandi/dandisets/blob/master/tools/backups2datalad.py#L263 . I guess logic could just operate straight on the file under .git/annex/objects but overall I think the caching decorator should not alter the actual path passed into the function it is decorating so there is no "decorator specific logic needed".

I think decorator indeed must operate on the resolved paths (to cache all the inodes etc) but pass original path into the decorated function.

@jwodder
Copy link
Member

jwodder commented Jul 8, 2021

@yarikoptic I've tried to update the code to pass the original path to the decorated function in #45, but the tests are failing for some inscrutable reason. I feel inclined to blame the _min_dtime feature.

@jwodder
Copy link
Member

jwodder commented Dec 17, 2021

@yarikoptic I've gone back to this, as this or something similar is needed for #58. The reason the tests are failing is because the unresolved path is still being seen by joblib and used in accessing the cache, which means that when the tests reach this point where the previously-cached file is accessed through a symlink, passing the unresolved symlink path to the decorated function results in an uncached call and a cache miss, which is a change in behavior. One way to resolve this would be to tell joblib to ignore the "path" argument itself, but if we applied that to #58, then non-path attributes of instances would be ignored, which seems undesirable. A related solution would be to give memoize_path a pass_unresolved_path: BOOL = False parameter for controlling whether the resolved or unresolved path is passed to the decorated function, and when the unresolved path is passed, we ignore the "path" argument, and we either document the risks of combining this feature with instance methods or else explicitly disallow mixing them.

@yarikoptic
Copy link
Member Author

I have pushed 3637aaa to #45 -- check it out (commit provides a brief description for the "idea"). I am not yet sure on the necessity of your original commits in #45 but let's see if that change does any good

@jwodder
Copy link
Member

jwodder commented Dec 17, 2021

@yarikoptic So what should #58 do about resolving path attributes when decorating methods?

yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Dec 17, 2021
…ng kwargs arg

Also includes fix to test_memoize_path_dir from con#45 by @jwodder, which I
do not grasp necessity of yet in how it doesn't fail on master but fails with
these changes.

A minimalistic alternative to con#45
which fixes con#44
yarikoptic added a commit to yarikoptic/fscacher that referenced this issue Dec 17, 2021
…ng kwargs arg

Also includes fix to test_memoize_path_dir from con#45 by @jwodder, which I
do not grasp necessity of yet in how it doesn't fail on master but fails with
these changes.

A minimalistic alternative to con#45
which fixes con#44
jwodder pushed a commit to yarikoptic/fscacher that referenced this issue Dec 17, 2021
…ng kwargs arg

Also includes fix to test_memoize_path_dir from con#45 by @jwodder, which I
do not grasp necessity of yet in how it doesn't fail on master but fails with
these changes.

A minimalistic alternative to con#45
which fixes con#44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants