Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making close() of AbstractBufferedFile idempotent #1686

Open
sugibuchi opened this issue Sep 21, 2024 · 2 comments
Open

Consider making close() of AbstractBufferedFile idempotent #1686

sugibuchi opened this issue Sep 21, 2024 · 2 comments

Comments

@sugibuchi
Copy link

The current implementation of close() in AbstractBufferedFile is like the following.

    def close(self):
        """Close file

        Finalizes writes, discards cache
        """
        if getattr(self, "_unclosable", False):
            return
        if self.closed:
            return
        if self.mode == "rb":
            self.cache = None
        else:
            if not self.forced:
                self.flush(force=True)     # <--- can throw exceptions

            if self.fs is not None:
                self.fs.invalidate_cache(self.path)
                self.fs.invalidate_cache(self.fs._parent(self.path))

        self.closed = True  # <---  we always need to reach here.

https://github.com/fsspec/filesystem_spec/blob/2024.9.0/fsspec/spec.py#L2022-L2041

This method calls flush() to finalize file writing. However, the file will continue to be considered as "open" in case of exceptions in flush() as the code does not reach self.closed = True.

If close() is called again, the file object will try to flush data again. This can lead to unexpected side effects, and it does not satisfy the convention defined by IOBase.close().

Flush and close this stream. This method has no effect if the file is already closed. Once the file is closed, any operation on the file (e.g. reading or writing) will raise a ValueError.
As a convenience, it is allowed to call this method more than once; only the first call, however, will have an effect.
https://docs.python.org/3/library/io.html#io.IOBase.close

As I reported in #1685, the garbage collection also calls this close(). Therefore, even if we explicitly close a file object once, its close() is called at least twice if flush() throws an exception. In addition, the file close() function's idempotence is generally an expected characteristic in many programming languages.

To make the behaviour of file objects simpler and more predictable in case of errors, it would be better always to mark self.closed = True regardless the result of flush().

    def close(self):
        ...
        try:
            if getattr(self, "_unclosable", False):
            ...
        finally:
            self.closed = True
@martindurant
Copy link
Member

I see your point. A context exit should certainly make the file closed, and calling it twice during del is pretty bad (I'll comment on that issue in a moment). However, I wonder whether there's space in the scheme you propose (with finally) for what to do when the final upload fails. Should other things be cleaned up at the time (buffer/cache, etc)?

@sugibuchi
Copy link
Author

However, I wonder whether there's space in the scheme you propose (with finally) for what to do when the final upload fails. Should other things be cleaned up at the time (buffer/cache, etc)?

I think a better implementation is to mark a file object "closed" immediately after flush() regardless of its result, as this flush() is supposed to be the last operation for the file object to touch the file.

    def close(self):
        
        try:
            ...
            else:
                if not self.forced:
                    self.flush(force=True)
        finally:
            self.closed = True
            
            self.buffer = None
            self.cache = None
            if self.fs is not None:
                self.fs.invalidate_cache(self.path)
                self.fs.invalidate_cache(self.fs._parent(self.path))    

Then we should release resources and perform other operations, such as cleaning, etc. after marking the file object closed.

With this modification, we can ensure that a concrete file object class inheriting AbstractBufferedFile always releases its internal resources after closing a file (regardless of whether it succeeded or not) using the following idiom.

class MyFile(AbstractBufferedFile):
    ...
    def close(self):
        try:
            super().close()
        finally:
            if not self.closed:
                release_some_expensive_resources()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants