-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement vblock split write amp reduction algo #582
base: master
Are you sure you want to change the base?
Conversation
lib/cn/kvset_split.c
Outdated
perfc_rwc++; | ||
if (perfc_ison(pc, PERFC_RA_CNCOMP_RBYTES) || perfc_ison(pc, PERFC_RA_CNCOMP_WBYTES)) { | ||
struct mblock_props props; | ||
log_debug("Cloned mblock (0x%" PRIx64 ") starting at offset %ld", src_mbid, off); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these for my debugging purposes, but they could probably be removed or changed. Open to thoughts.
lib/cn/kvset_split.c
Outdated
bool overlaps; | ||
bool accessed; | ||
uint16_t vblk_idx; | ||
uint offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a uint
because this is what I get back from the kvset_iter_next_vref
API.
Workload I used for testing: kmt -j64 -i2g -Rx -s1 -f %023lu -l977 -okeydist=0 /mnt/hse kvs1 kvs-oparams cn_dsplit_size=32 With this, I notice a 4.68% reduction in KVDB size at the end of the run. |
a2bdd13
to
de63cae
Compare
You should also have seen no gc operations... |
Greg how could I double check that? Is there a log message I should look out for? |
de63cae
to
b27a545
Compare
6625ce7
to
4d35f54
Compare
b0e3e43
to
5d57b45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good!
But I feel the vblocks_split()
function warrants a comment at the top briefly explaining the algorithm.
It doesn't work though 😕 |
Ok, so here are the facts. If I am remembering correctly from when I was last touching this PR, you can either end up in 1 of 2 situations. One is a segfault due to a null pointer dereference, but I couldn't tell you what the backtrace is. The other situation is the following:
My steps for reproducing:
Been a while since I tried this workflow with gdb since I kept hitting an internal gdb bug. I am using a debug build, and the run will usually fail at >= 3.10 space amp, so pretty far into the load. |
5d57b45
to
63d5d06
Compare
And this is the backtrace in the other scenario:
|
I have some pretty spammy logs in this diff. So what you'll see when you encounter either the segfault or the assertion failure is something like the following:
The problematic portion being the 4096 alen, which is most likely completely wrong. Following ancestries of blocks like this you can see that pretty much the entire block is a hole due cloning at an offset and then punching. Right now, I am trying to understand if the min and max keys of the vblocks need to change after cloning/punching. |
During a vblock split, we can save write amp by using mpool_mblock_punch() to FALLOC_FL_PUNCH_HOLE a portion of the mblock for both the left and right hand side destination mblocks. Signed-off-by: Tristan Partin <[email protected]> Co-authored-by: Nabeel Meeramohideen Mohamed <[email protected]>
63d5d06
to
20010dc
Compare
I believe last column is space amp (Greg and unstructured data 😄). Doing some quick math would indicate that vgarb had a 99.07% reduction with this patch using the aforementioned workload.
|
Looking at |
yeah, i was having issues with that when running db-bench... what stats are you looking at? in general, i don't think our stats include the garbage, which is why i asked nabeel to create the vgarb stat.. sadly, our stats are reported in SI units, and du is reported in powers of two, does that make a difference? |
9f2903f
to
4367c87
Compare
Well more debugging to do...sigh |
4367c87
to
74e2875
Compare
This PR has at least two issues that I think were documented in JIRA, but since we no longer have JIRA here we are :).
Issue 1 manifests itself if you abort at the end of the vblocks_split() function. Issue 2 is harder to notice, but you will fail because total bytes is <= used bytes of the vblock. You would also notice a vblock has an alen of a single page (4096). I think @gsramdasi noticed one other issue similar to how issue 2 manifests even when a vblock was going through its first split. This might be an indicator of a third undiagnosed issue. |
During a vblock split, we can save write amp by using mpool_mblock_punch() to FALLOC_FL_PUNCH_HOLE a portion of the mblock for both the left and right hand side destination mblocks.
Signed-off-by: Tristan Partin [email protected]
Co-authored-by: Nabeel Meeramohideen Mohamed [email protected]