-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Account for block size in filesystem store for eviction purposes #661
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
62f13e8
to
ae859a9
Compare
ae859a9
to
2f5e4f9
Compare
2f5e4f9
to
c190eaa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Only a few nit-picky comments.
Thanks a lot 😄
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained
-- commits
line 2 at r1:
nit: Please expand out the commit here.
nativelink-store/src/filesystem_store.rs
line 134 at r1 (raw file):
Self: Sized; /// Returns the underlying reference to where the filesize is stored.
nit: This comment needs updated.
nativelink-store/src/filesystem_store.rs
line 135 at r1 (raw file):
/// Returns the underlying reference to where the filesize is stored. fn get_data_size(&mut self) -> &mut u64;
nit: While we are here, can we rename this .data_size_mut()
? (For consistency I'd also rename get_file_size()
-> file_size()
.
This will bring us slightly more inline with rust name convention.
nativelink-store/src/filesystem_store.rs
line 137 at r1 (raw file):
fn get_data_size(&mut self) -> &mut u64; /// Returns the actual size of the underlying file on the disk after accounting for filesystem block size
super nit: End in a period.
nativelink-store/src/filesystem_store.rs
line 138 at r1 (raw file):
/// Returns the actual size of the underlying file on the disk after accounting for filesystem block size fn get_file_size(&self) -> u64;
nit: to reduce ambiguity here, I suggest get_size_on_disk()
. It's probably confusing which one is which the way it's named.
nativelink-store/tests/filesystem_store_test.rs
line 1151 at r1 (raw file):
// Ensure that get_file_size() returns the correct number // block_size if content length is < block_size // ceil(content length / block_size) * block_size if content length is > block_size
nit: I think this comment is inverted and a bit confusing.
nativelink-store/tests/filesystem_store_test.rs
line 1154 at r1 (raw file):
#[tokio::test] async fn get_file_size_uses_block_size() -> Result<(), Error> { let short_digest = DigestInfo::try_new(HASH1, VALUE1.len())?;
nit: Can we just declare a value here with with the size that we are seeking in our test?
I really don't like talking about variable sized items when it's the thing we care about.
Maybe just declare const VALUE_4K
and const VALUE_1K
inline here and use them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained
c190eaa
to
dc4a00d
Compare
Thanks for the review! I made most of the changes but had a couple questions/comments
I'm not sure how to do this, could you clarify what you mean by expand out the commit? I've been amending the main commit since that's what the guidelines recommended
Since another recommendation was to just make it
I think the comment is confusing because the 3rd line is actually applicable in case 2 as well as 3. There's no reason to separate it out. I updated it to include an example as well |
dc4a00d
to
429acb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained
a discussion (no related file):
No action required, but I wanted to mention:
Ext4 filesystems don't actually use ceil(data_size / block_size) * block_size
... They will in fact inline data into the inode block if the data can fit (from what I remember reading)... This works similar to how short-string optimization works.
Again, no action, but I think it's important to know that our calculations are very "rough" and more of a "worst-case".
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Please expand out the commit here.
Yeah, could you put in a short paragraph explaining what was done?
We have tools we use internally to roll this up into the release notes, and this is definitely something we want to include in the notes.
Format is as follows:
[Title]
[Short paragraph goes here]
I think we actually changed some stuff recently, so the commit message may not be the appropriate place... Can you do it here in the commit and in the github description? We require squash merge now due to the way github signs commits (we are still fiddling with some of the subtle details).
nativelink-store/tests/filesystem_store_test.rs
line 1151 at r3 (raw file):
// Ensure that get_file_size() returns the correct number // ceil(content length / block_size) * block_size // Content length is 1B and block_size is 4B: size on disk = 4B
nit: Lets not talk in bytes for blocksize and just always talk in 4k
chunks. I understand what you are saying in the comment, but I had to re-read it like 3 times to realize the block size was a non-standard size.
Maybe elude to something more like:
(Assume block size of 4k)
1B data size = 4K size on disk
5K data size = 8K size on disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained
429acb5
to
f655804
Compare
f655804
to
d3ebed2
Compare
Sure, sorry about the confusion in the test case comment. I updated the comment and the actual test case to use the default block size and generate a string of 1K and 5K length for the contents. I added some extra information about the changes made in the commit description as well as how to restore the previous behavior. I saw the PR was approved but could you review it to make sure I did it a way your release note aggregator will pick up before I merge it @allada ? Thanks for the heads up about how Ext4 works, makes sense that there would be some optimizations in there! |
- eviction_store will now treat FileEntries as if they consume the worst case amount of space on disk based on block size to prevent out of space errors. Prior behavior can be restored by setting `"block_size": 1` under filesystem in configuration file - Add `FileEntry::size_on_disk` - worst case disk space consumed by entry - Add test cases to `filesystem_store_test` ensuring correct `size_on_disk` - Add field in filesystem config for setting block size
d3ebed2
to
235613a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 1 of 1 LGTMs obtained
Description
Fixes #595
Should prevent scenarios where fragmented files take up more space than is allowed by
max_bytes
and could therefore exceed the maximum space availablefile_size
has been renamed todata_size
since the actual file size on the disk after accounting for block size is provided byget_file_size()
now.FileEntry::len
will now return get_file_size() since it is used by theeviction_map
to determine what to evict based on sizeType of change
How Has This Been Tested?
All affected existing tests have been updated to use a block size of 1 so that they do not have a max_bytes limit smaller than block size (which will hang but wouldn't occur in production)
A new test has been added to ensure that the
get_file_size
calculation returns the correct result for different inputsChecklist
bazel test //...
passes locallygit amend
This change is