Implement an S3 "cache" layer for S3 access? #72

Open
phirstgemini opened this issue Oct 8, 2024 · 0 comments

Consider whether it would be good to implement an access layer for S3 that also acts as a local cache.

Pros:

  • Resolves collisions on S3 access - admittedly this could be solved much more simply by using temporary file names.
  • Potentially significant performance gains - would avoid re-download cycles for e.g. previews.
  • Potentially significant performance gains for header update exports - again, by avoiding re-downloads.
  • Provides a single code point for interface to S3.
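
For comparison, the simpler temporary-file approach to the collision problem mentioned in the first bullet might look roughly like the sketch below. The `fetch` callable is hypothetical and stands in for whatever S3 download helper the project actually uses; the function name and the `.partial-` prefix are illustrative as well.

```python
import os
import tempfile
from typing import Callable


def fetch_atomically(fetch: Callable[[str], None], dest_path: str) -> str:
    """Download into a unique temporary file, then rename it into place.

    `fetch` is a hypothetical callable that writes the object to the path
    it is given (for example a thin wrapper around an S3 download call).
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the destination directory so that the final
    # rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    os.close(fd)
    try:
        fetch(tmp_path)
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Never leave a half-written file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

Two processes fetching the same file would each write to their own temporary name, so neither can observe the other's partial download. This avoids collisions but not the repeated downloads, which is what the cache would add.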

Cons:

  • Additional complexity
  • Potential for edge cases leaving the cache in a mess
  • S3 errors would show up in the cache layer rather than in the calling code - for example, it might not be apparent that an upload_file call was unable to store the data to S3, unless we implement specific write-through / write-back calls.
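
One way to keep S3 failures visible to callers would be an explicit write-through call, sketched below. This is only an illustration under assumptions: `upload` is a hypothetical callable that raises on failure, and `WriteThroughError` and `store_write_through` are names invented for this example rather than existing code.

```python
import shutil
from pathlib import Path
from typing import Callable


class WriteThroughError(RuntimeError):
    """Raised when the upload to S3 fails during a write-through store."""


def store_write_through(src: str, cache_dir: str,
                        upload: Callable[[str, str], None]) -> Path:
    """Upload to S3 first and only then keep a cached copy.

    `upload(local_path, key)` is a hypothetical callable that raises on
    failure; it stands in for whatever S3 helper the project uses.
    """
    key = Path(src).name
    try:
        upload(src, key)
    except Exception as exc:
        # Surface the S3 failure to the caller instead of hiding it
        # behind a cache entry that looks healthy.
        raise WriteThroughError(f"upload of {key} failed") from exc
    cached = Path(cache_dir) / key
    shutil.copy2(src, cached)
    return cached
```

Because the cached copy is only created after the upload succeeds, a file present in the cache implies it reached S3. A write-back variant would reverse that order and would need some other way to report deferred failures.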

Implementation concepts:

  • s3_staging directory becomes more of a cache
  • database table to record and maintain state - filename, path, size, refcount and last_used columns.
  • Python module (possibly simply the ORM class) for access.
    • Would be used as a context manager for file access (the `__enter__` and `__exit__` methods would take care of updating the refcount and last_used columns).
    • Note - do not maintain any state in the Python class, only in the database table. The class could be instantiated in several Python processes simultaneously, potentially for the same file, so a Python singleton won't cut it for this.
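
A minimal sketch of the table and context manager described above follows. It uses the standard-library `sqlite3` module purely so the example is self-contained; the real implementation would presumably use the project's own database and ORM. The table name `s3_cache` and the function names are assumptions, and only the column names come from the list above.

```python
import sqlite3
import time
from contextlib import contextmanager

SCHEMA = """
CREATE TABLE IF NOT EXISTS s3_cache (
    filename  TEXT PRIMARY KEY,
    path      TEXT NOT NULL,
    size      INTEGER NOT NULL,
    refcount  INTEGER NOT NULL DEFAULT 0,
    last_used REAL NOT NULL
)
"""


def init_db(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(SCHEMA)
    conn.close()


def refcount_of(db_path: str, filename: str) -> int:
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT refcount FROM s3_cache WHERE filename = ?", (filename,)
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else 0


@contextmanager
def cached_file(db_path: str, filename: str, path: str, size: int):
    """Hold a reference to a cache entry for the duration of the block.

    All state lives in the table; nothing is kept on a Python object, so
    several processes can use the same entry at once.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: create the row if needed, then claim it
            conn.execute(
                "INSERT OR IGNORE INTO s3_cache "
                "(filename, path, size, refcount, last_used) "
                "VALUES (?, ?, ?, 0, ?)",
                (filename, path, size, time.time()),
            )
            conn.execute(
                "UPDATE s3_cache SET refcount = refcount + 1, last_used = ? "
                "WHERE filename = ?",
                (time.time(), filename),
            )
        try:
            yield path
        finally:
            with conn:  # release the reference even if the body raised
                conn.execute(
                    "UPDATE s3_cache SET refcount = refcount - 1, "
                    "last_used = ? WHERE filename = ?",
                    (time.time(), filename),
                )
    finally:
        conn.close()
```

Usage would be along the lines of `with cached_file(db, "a.fits", "/s3_staging/a.fits", 2880) as path: ...`, where the path and size are illustrative values. The sketch omits the actual download on a cache miss and does not handle a process that dies while holding a reference, which is one of the edge cases noted under Cons.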

Other notes:

  • Would be nice to expand this to handle compression too, possibly both with and without S3, and remove that code from diskfile. That's potentially a lot of added complexity though.
@phirstgemini phirstgemini added this to the Someday Maybe milestone Oct 8, 2024
@phirstgemini phirstgemini self-assigned this Oct 8, 2024