Unsure if this is a good plan. Right now, if there's a collision, in the sense that the same file gets uploaded twice at the same time, the second one will overwrite the first one, simply by virtue of `open(filename, 'wb')` on the same filename in upload_staging. When the first one completes it will add it to the ingest queue; that will either fail to ingest for a malformed FITS file, or will keep deferring because the file was recently modified, etc. The second add to the ingest queue will only happen if the first one has already started, so it should all be good in the end, though this doesn't seem like the most robust or clear way to deal with it.
If we use unique tmpfile names in upload_staging, we're just deferring the collision until the point where we copy the file to dataflow or upload it to S3, which doesn't seem obviously better.
Maybe we could add some other kind of collision protection, so that the second upload attempt gets rejected until the first one has completed. We could simply check for the existence of filename in upload_staging and reject the attempt if it's already there, though then we'd need to be very careful to clean up the partial file if an upload fails, since we'd no longer be able to simply retry and overwrite the failed one.