Perf[BMQ,MQB]: stable GC history preventing allocations #498

Open · 678098 wants to merge 1 commit into main from 241103_queue_history_GC_opt
Conversation

Collaborator

@678098 678098 commented Nov 3, 2024

Problem overview

Consider a cluster processing 100k PUT msgs/s at its limit, meaning it can barely handle this rate of PUTs without NACKs, and producers generate exactly this rate of messages.
Consider a rollover or any other lag causing the QUEUE dispatcher thread to stall for 100 ms.
After this time, there will be 10k pending PUT messages waiting in the QUEUE dispatcher queue:
[Put1]...[Put10000]

Now consider a GC call executed before this block of PUTs. It cleaned some items from the queue history, but not all of them, and another GC call was scheduled to the QUEUE dispatcher:
[Put1]...[Put10000][gcHistory]

To execute this gcHistory, we first need to process all the PUTs ahead of it (they are earlier in the queue).
So we process 10k PUTs and add them to the queue history. Since in this scenario producers generate 100k msgs/s and the cluster can process 100k msgs/s and no more, by the time we have processed Put1-Put10000, there will be another 10k PUTs in the queue:
[gcHistory][Put10001]...[Put20000]

So when we finally execute gcHistory, we remove at most 1k messages from the history (k_GC_MESSAGES_BATCH_SIZE = 1000). But in the same interval we add 10k new items to the queue history and receive another 10k PUTs into the QUEUE dispatcher queue, delaying the next gcHistory.

This process continues indefinitely. We constantly allocate new items in the queue history items pool, and we eventually upsize the number of buckets in the ordered hashmap, causing a performance lag and a full rehash of the hashtable.
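
To make the failure mode concrete, below is a minimal, self-contained C++ sketch of a batched, scheduled history GC of this kind. All names here (QueueHistorySketch, d_enqueueToDispatcher, and so on) are hypothetical illustrations, not the actual BlazingMQ code.

#include <deque>
#include <functional>

// Hypothetical sketch of a batched history GC running on the QUEUE
// dispatcher thread.  It erases at most k_GC_MESSAGES_BATCH_SIZE expired
// items per invocation; if more remain, the follow-up call is enqueued
// behind every PUT already waiting in the dispatcher queue.
struct QueueHistorySketch {
    static const int      k_GC_MESSAGES_BATCH_SIZE = 1000;

    std::deque<long long> d_history;      // insertion times, oldest first
    long long             d_retentionNs;  // how long items must be kept

    // Stand-in for enqueueing a callback to the QUEUE dispatcher queue.
    std::function<void(std::function<void()>)> d_enqueueToDispatcher;

    void gcHistory(long long nowNs)
    {
        int erased = 0;
        while (erased < k_GC_MESSAGES_BATCH_SIZE && !d_history.empty() &&
               nowNs - d_history.front() > d_retentionNs) {
            d_history.pop_front();
            ++erased;
        }
        if (erased == k_GC_MESSAGES_BATCH_SIZE) {
            // Batch limit reached with expired items still left.  The
            // follow-up GC lands behind the ~10k queued PUTs, each of which
            // adds one more item to the history before it runs.
            d_enqueueToDispatcher([this, nowNs]() { gcHistory(nowNs); });
        }
    }
};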

20M items in the queue history after a 5-minute run at 80k msgs/s:
[screenshot: queue history metrics after the 5-minute run]

3GB allocated for queue history (Handles):

:::::::::: :::::::::: ALLOCATORS >> Last snapshot was 1.00 m ago.
                                 | Bytes Allocated| -delta-| Max Bytes Allocated| Allocations| -delta-| Deallocations| -delta-
---------------------------------+----------------+--------+--------------------+------------+--------+--------------+--------
allocators                       |   7,424,269,408|  86,800|       7,424,336,192| 106,779,796| 315,271|   103,623,032| 315,254
  *direct*                       |           1,584|        |               1,584|          10|        |             1|        
  Application                    |   7,423,728,624|  87,632|       7,423,748,064|  84,002,540| 243,085|    80,846,206| 243,063
    *direct*                     |           9,600|        |              11,440|       1,570|        |         1,544|        
    ClusterCatalog               |   3,426,265,408|     528|       3,426,279,424|  17,366,654|  39,631|    17,365,735|  39,630
      *direct*                   |           3,776|        |               3,776|          21|        |             3|        
      cl6_dc3                    |   3,426,261,632|     528|       3,426,275,648|  17,366,633|  39,631|    17,365,732|  39,630
        *direct*                 |       1,019,184|        |           1,056,640|   4,229,079|        |     4,228,883|        
        StorageManager           |   3,382,069,840|     528|       3,382,083,680|  13,137,171|  39,631|    13,136,714|  39,630
          *direct*               |           7,920|        |               7,920|          41|        |            12|        
          Partition1             |   3,361,992,272|     528|       3,362,006,112|  13,136,758|  39,631|    13,136,583|  39,630
            *direct*             |           5,824|     528|              19,664|  13,136,547|  39,631|    13,136,522|  39,630
            QueueStorage         |   3,163,421,248|        |       3,163,421,408|         139|        |            40|        
              *direct*           |           2,096|        |               2,256|          15|        |             4|        
              Handles            |   3,059,921,280|        |       3,059,921,280|          63|        |            20|     

Alternative 1

Set k_GC_MESSAGES_BATCH_SIZE = 1000000:
[screenshot: queue history metrics with the increased batch size]

Alternative 2

Inserting into the history is the only operation that can increase its size, and if we know that GC is required, we can enforce GC on any insert operation (a sketch in code follows the list below). The workflow is:

  • We execute the scheduled gcHistory and find that we cleaned 1000 items from the history, but more remain. We set the d_requireGC flag on the history.
  • Next, a PUT event arrives. We insert into the history on PUT. insert checks the d_requireGC flag and executes GC for another batch of items.
  • This continues for one or more PUTs, but eventually the GC finishes because there are no more items to collect. We unset the d_requireGC flag, so the following PUTs do not trigger GC.
  • The situation stays this way until the next scheduler call possibly sets d_requireGC again.
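
A minimal C++ sketch of this workflow, again with hypothetical names: the real history is an ordered hashmap rather than a deque, and the insert-time batch size shown here is an arbitrary placeholder, not a value from the actual change.

#include <deque>

// Hypothetical sketch of Alternative 2: insert() amortizes the GC over
// incoming PUTs whenever the scheduled gcHistory could not finish its work
// and left d_requireGC set.
struct QueueHistorySketch {
    static const int      k_INSERT_GC_MESSAGES_BATCH_SIZE = 10;

    std::deque<long long> d_history;      // insertion times, oldest first
    long long             d_retentionNs;  // how long items must be kept
    bool                  d_requireGC;    // set by the scheduled gcHistory

    void insert(long long nowNs)
    {
        if (d_requireGC) {
            int erased = 0;
            while (erased < k_INSERT_GC_MESSAGES_BATCH_SIZE &&
                   !d_history.empty() &&
                   nowNs - d_history.front() > d_retentionNs) {
                d_history.pop_front();
                ++erased;
            }
            if (erased < k_INSERT_GC_MESSAGES_BATCH_SIZE) {
                // Ran out of expired items: stop paying the GC cost on PUTs
                // until the scheduler flags the history again.
                d_requireGC = false;
            }
        }
        d_history.push_back(nowNs);  // the PUT itself adds one history item
    }
};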

So we pay some CPU time on PUT operations to do the GC, but in return:

  • We save CPU time on allocating elements in the history items pool.
  • We save CPU on rehashing the history hashmap when it grows to a bigger scale.
  • We occupy less memory, and memory growth is no longer uncontrolled.
  • We save some CPU on enqueuing the gcHistory callback to the same dispatcher queue.

[screenshot: queue history metrics with the proposed change]

Comparison

An 80k msgs/s priority stress test shows the same results as a 90k msgs/s test with the proposed change. This means that with this change we can actually hold up better at higher throughputs.

[screenshots: priority stress test results]

@678098 678098 requested a review from a team as a code owner November 3, 2024 02:18
@678098 678098 force-pushed the 241103_queue_history_GC_opt branch from 992dd4c to adcdc17 Compare November 3, 2024 02:20

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 354 of commit adcdc17 has completed with FAILURE

@678098 678098 force-pushed the 241103_queue_history_GC_opt branch from adcdc17 to 310abe9 Compare November 3, 2024 04:06
@678098 678098 force-pushed the 241103_queue_history_GC_opt branch from 310abe9 to 1196b50 Compare November 3, 2024 22:21

@bmq-oss-ci bmq-oss-ci bot left a comment

Build 355 of commit 1196b50 has completed with FAILURE

@chrisbeard
Contributor

Before, performing gc on history was always sort of best-effort and done in the background. I think I agree on approach 1 not being ideal: it doesn't get at the core of the problem, which is the mismatched pacing of PUTs and gc; gc is still too periodic. Approach 2 is interesting; I hadn't considered moving gc into the critical path.

With approach 2, if (and only if) background gc can't keep the history limited to the configured time, the broker starts doing gc on the critical data path when processing incoming PUTs. When doing that gc, k_INSERT_GC_MESSAGES_BATCH_SIZE controls how aggressive the gc is, and since a PUT is only one message, any value > 1 will eventually reduce history to the desired size?

I'm curious whether you've got any measurements of the gc time; do we have a rough number for how long it takes? I suspect it's pretty cheap. It'd be interesting to see whether a smaller gc batch on PUTs helps at all (in terms of latency, maybe).

@678098
Collaborator Author

678098 commented Nov 5, 2024

I hadn't considered moving gc into the critical path.

Actually, we already do gc from the same QUEUE dispatcher thread, just more rarely.

and since a PUT is only one message, any value > 1 will eventually reduce history to the desired size?

That is correct. However, there should also be some overhead from calling gc, making some initial checks, entering the loop, etc. I need to check whether it's inlined or not.
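
(To spell out the arithmetic: while expired items remain, each PUT inserts exactly one item into the history and the insert-time GC removes up to k_INSERT_GC_MESSAGES_BATCH_SIZE of them, so the net change per PUT is at most 1 - k_INSERT_GC_MESSAGES_BATCH_SIZE; any batch size of 2 or more therefore shrinks the backlog by at least one item per PUT.)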

I'm curious whether you've got any measurements of the gc time; do we have a rough number for how long it takes? I suspect it's pretty cheap.

Yes, I think it is cheap, but I will also add benchmarks.
