issue: Implement doorbell batching for the new API #164
Open: pasis wants to merge 169 commits into Mellanox:vNext from pasis:storage-api-db-batching-2
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
At most one element of this vector is ever used. Once the rfs constructor completes, there must be exactly one attach_flow_data element in the case of ring_simple. For ring_tap this element remains null. Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alex Briskin <[email protected]>
Set errno to ETIMEDOUT and return -1 from recv when a socket times out, instead of returning 0 with errno 0. This happens, for instance, on a TCP keep-alive timeout. Signed-off-by: Alexander Grissik <[email protected]>
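The return-value change above lets a caller tell a timed-out connection from an orderly shutdown. A minimal sketch of such a check, assuming POSIX `recv()` semantics; `RecvOutcome` and `classify_recv` are hypothetical names used only for illustration and are not part of XLIO:

```cpp
#include <cerrno>
#include <sys/types.h>

// Hypothetical classification of a recv()-style result.
enum class RecvOutcome { Data, PeerClosed, TimedOut, WouldBlock, Error };

// Classify a recv() result from its return value and the errno captured
// immediately after the call.
inline RecvOutcome classify_recv(ssize_t rc, int err)
{
    if (rc > 0) {
        return RecvOutcome::Data;
    }
    if (rc == 0) {
        return RecvOutcome::PeerClosed; // orderly shutdown only
    }
    if (err == ETIMEDOUT) {
        return RecvOutcome::TimedOut; // e.g. TCP keep-alive expiry
    }
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return RecvOutcome::WouldBlock;
    }
    return RecvOutcome::Error;
}
```

With the old behavior (0 return value and 0 errno), a timeout would have been indistinguishable from `PeerClosed` in a check like this.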
The idea is to scan all rpm/deb packages for personal emails, since we should not release packages containing such emails. The scan covers both the metadata and the changelog of each package. Issue: HPCINFRA-919 Signed-off-by: Daniel Pressler <[email protected]>
poll_group takes an additional reference to each of its rings, but doesn't release it when the group is destroyed. This leads to two issues: 1. Extra resources stay in use if the user destroys a polling group before the application terminates. 2. Polling is not possible for a destroyed group. Therefore, if there are uncompleted WQEs in the SQ, the respective sockets won't report TX completions and cannot be fully terminated. The ring needs to be destroyed to flush all the completions. Release all the native rings explicitly in the poll_group destructor to resolve both issues. Signed-off-by: Dmytro Podgornyi <[email protected]>
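The reference imbalance described above can be illustrated with a simplified model. This is only a sketch: `Ring` and `PollGroup` are hypothetical stand-ins with a plain integer reference count, not the actual XLIO classes.

```cpp
#include <vector>

// Hypothetical stand-in for a reference-counted ring.
struct Ring {
    int refs = 1; // the reference held by the ring's original owner
    void get() { ++refs; }
    void put() { --refs; }
};

// Hypothetical stand-in for a polling group.
class PollGroup {
public:
    void add_ring(Ring *ring)
    {
        ring->get(); // the group takes an additional reference
        m_rings.push_back(ring);
    }

    ~PollGroup()
    {
        // Release exactly the references taken in add_ring(); without this
        // loop every ring would keep one stale reference after destruction.
        for (Ring *ring : m_rings) {
            ring->put();
        }
    }

private:
    std::vector<Ring *> m_rings;
};
```

In this model the ring can only reach a zero count, and therefore be destroyed and flushed, once the group has dropped its reference.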
When an RX packet event happens, XLIO passes ownership of the buffer to the user, who later releases it explicitly. However, XLIO frees the buffer unconditionally right after emitting the event. Fix this and free buffers only if the user doesn't provide an RX event callback. Signed-off-by: Dmytro Podgornyi <[email protected]>
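The ownership rule from this fix can be sketched as follows. `Buffer`, `RxCallback` and `deliver_rx` are hypothetical names chosen for the example; the real XLIO types and signatures differ.

```cpp
#include <functional>

// Hypothetical RX buffer; `freed` marks that the library reclaimed it.
struct Buffer {
    bool freed = false;
};

using RxCallback = std::function<void(Buffer &)>;

// Deliver a received buffer. Returns true if ownership moved to the user.
inline bool deliver_rx(Buffer &buf, const RxCallback &cb)
{
    if (cb) {
        cb(buf);     // the user now owns buf and releases it explicitly later
        return true; // the library must not free it here
    }
    buf.freed = true; // no callback registered: the library frees the buffer
    return false;
}
```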
reclaim_recv_single_buffer() accumulates buffers in a list. In the performance oriented API we want to reuse hot buffers immediately, so reclaim_recv_buffers() implementation is more suitable. Signed-off-by: Dmytro Podgornyi <[email protected]>
The memory callback provides the hugepage size of the underlying pages. Replace the hardcoded 0 with the real hugepage size. Keep the page size in the xlio_allocator object. This field is relevant only for the hugepage allocation method and is 0 in all other cases. Signed-off-by: Dmytro Podgornyi <[email protected]>
XLIO Socket API must guarantee that the XLIO_SOCKET_EVENT_TERMINATED is not followed by any other events. Therefore, all the TX completion events must be completed by that moment. Do a polling iteration before calling socket destructor to increase the chance that all the relevant WQEs are completed. This mechanism needs to be improved in the future. Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_init_ex() changes some default parameters. However, a global object can trigger the safe_mce_sys() constructor at startup. Therefore, we need to re-read the environment variables to guarantee that the changed parameters take effect. Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid using connect() with sock fd interface, because fd_collection doesn't keep xlio_socket_t objects. Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_socket_t objects aren't connected to the fd_collection anymore. Therefore, all the methods must be called from the sockinfo_tcp objects directly. Also, xlio_socket_fd() is not relevant anymore and can be removed. Signed-off-by: Dmytro Podgornyi <[email protected]>
Fix iteration over the std::list of TCP sockets when a socket is erased during the iteration. Resolved by advancing the iterator before the erase. Signed-off-by: Iftah Levi <[email protected]>
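The advance-before-erase pattern mentioned above works because `std::list::erase` invalidates only the iterator to the erased element. A minimal sketch, using a list of integers where a negative value stands in (hypothetically) for a socket that must be removed:

```cpp
#include <list>

// Erase every negative entry while iterating over the list.
inline void erase_closed(std::list<int> &socks)
{
    for (auto it = socks.begin(); it != socks.end();) {
        auto cur = it++; // advance first: erase() invalidates only `cur`
        if (*cur < 0) {
            socks.erase(cur);
        }
    }
}
```

An equivalent formulation is `it = socks.erase(it)`, which uses the iterator returned by `erase`.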
rdma-core limits the number of UARs per context to 16 by default. After creating 16 QPs, XLIO receives duplicate blueflame registers for each subsequent QP. As a result, the blueflame doorbell method can write WQEs concurrently without serialization, which leads to data corruption. BlueFlame can hurt throughput, since the copy to the blueflame register is expensive. It can improve latency in some low-latency scenarios; however, XLIO targets high traffic/PPS rates. Removing the blueflame method also slightly improves performance in some scenarios. BlueFlame can be brought back in the future to improve low-latency scenarios, but it will need rework to avoid the data corruption. Signed-off-by: Dmytro Podgornyi <[email protected]>
The inline WQE branch is not likely in most throughput scenarios. Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid calling register_socket_timer_event when a socket is already registered (TIME-WAIT). Although there is no functional issue with that, it posts events to the internal-thread at too high a rate. This leads to lock contention inside the internal-thread and degraded HTTP CPS performance. Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Gal Noam <[email protected]>
UTLS uses tcp_tx_express() for non-blocking sockets. However, this TX method doesn't support XLIO_RX_POLL_ON_TX_TCP. Additional RX polling improves scenarios such as web servers. Insert RX polling into the UTLS TX path to resolve the performance degradation. Signed-off-by: Dmytro Podgornyi <[email protected]>
In heavy CPS scenarios a socket may go to the TIME-WAIT state and be reused before the first TCP timer registration is performed by the internal-thread. 1. Setting timer_registered=true while posting the event prevents a second attempt to post the event. 2. Adding a sanity check in add_new_timer that verifies that the socket is not already in the timer map. Signed-off-by: Alexander Grissik <[email protected]>
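The two measures above can be modeled in a few lines. This is a simplified sketch: `Socket`, `TimerMap` and `post_timer_event` are hypothetical stand-ins, and only the names `timer_registered` and `add_new_timer` come from the commit message.

```cpp
#include <cstddef>
#include <unordered_set>

struct Socket {
    bool timer_registered = false;
};

// Hypothetical timer map owned by the internal thread.
class TimerMap {
public:
    // Sanity check: refuse a socket that is already in the map.
    bool add_new_timer(Socket *s) { return m_timers.insert(s).second; }
    std::size_t size() const { return m_timers.size(); }

private:
    std::unordered_set<Socket *> m_timers;
};

// Posting side: the flag is set when the event is posted, not when the
// internal thread handles it, so a reused socket cannot post twice.
inline bool post_timer_event(Socket &s, int &posted_events)
{
    if (s.timer_registered) {
        return false;
    }
    s.timer_registered = true;
    ++posted_events;
    return true;
}
```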
Added a new env parameter, XLIO_MAX_TSO_SIZE. It lets the user control the maximum TSO size instead of taking the maximum supported by HW. The default size is 256KB (the maximum on current HW). Values higher than the HW capabilities won't be taken into account. Signed-off-by: Iftah Levi <[email protected]>
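One plausible reading of the rule above is that the effective limit is the smaller of the requested value and the hardware capability. The sketch below encodes that assumption; `effective_max_tso` is a hypothetical helper, and the actual XLIO logic for oversized values may differ (for example, by falling back to the default).

```cpp
#include <algorithm>
#include <cstdint>

// Assumption: a request above the HW capability is capped at that capability.
inline std::uint32_t effective_max_tso(std::uint32_t requested, std::uint32_t hw_cap)
{
    return std::min(requested, hw_cap);
}
```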
Signed-off-by: Gal Noam <[email protected]>
PBUF_NONE was used mistakenly instead of PBUF_DESC_NONE. Signed-off-by: Dmytro Podgornyi <[email protected]>
The field doesn't have to be initialized if we copy the data. This is an extra operation, so move it to the else branch. Signed-off-by: Dmytro Podgornyi <[email protected]>
The inline part of fill_wqe() is overcomplicated. Hide it in a separate method, so that refactoring can be isolated. Also, don't check the multiple scatter-gather case against the inline criteria. This is an unlikely scenario because the TCP layer copies non-zcopy data to a single buffer until it's full. Signed-off-by: Dmytro Podgornyi <[email protected]>
Signed-off-by: Dmytro Podgornyi <[email protected]>
The URGENT flag requests a TX completion for the respective WQE. This is required for zerocopy interfaces where the user cannot specify the last send operation explicitly. Otherwise, TX completion batching can lead to a deadlock if the user stops sending data and waits for the completions. The CALLBACK flag requests that a callback be called once the buffer is released by XLIO. Signed-off-by: Dmytro Podgornyi <[email protected]>
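The interaction between completion batching and the URGENT flag can be sketched like this. `FLAG_URGENT`, `SIGNAL_EVERY` and `CompletionBatcher` are hypothetical; in particular the batch size of 16 is an arbitrary value for the example, not XLIO's.

```cpp
#include <cstdint>

constexpr std::uint32_t FLAG_URGENT = 1u << 0; // hypothetical flag value
constexpr unsigned SIGNAL_EVERY = 16;          // hypothetical batch size

class CompletionBatcher {
public:
    // Returns true if this WQE should request a TX completion.
    bool should_signal(std::uint32_t flags)
    {
        // An urgent WQE always requests a completion; otherwise only every
        // SIGNAL_EVERY-th WQE does.
        if ((flags & FLAG_URGENT) || ++m_unsignaled >= SIGNAL_EVERY) {
            m_unsignaled = 0;
            return true;
        }
        return false;
    }

private:
    unsigned m_unsignaled = 0;
};
```

Without the urgent path, a sender that stops after fewer than `SIGNAL_EVERY` WQEs would wait forever for a completion that was never requested.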
Signed-off-by: Dmytro Podgornyi <[email protected]>
This makes it possible to achieve doorbell batching similar to TX completion batching. The XLIO extra socket API provides explicit flush functionality. Such a flush operation closes the WQE accumulation session and rings the doorbell. This guarantees doorbells and zcopy completions even if the user stops sending data (as long as the user calls the final flush). Signed-off-by: Dmytro Podgornyi <[email protected]>
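The batching scheme described above can be modeled with a counter of pending WQEs that a flush converts into a single doorbell. `SendQueue` and its methods are hypothetical names for this sketch and do not reflect the XLIO implementation.

```cpp
#include <cstddef>

// Hypothetical send queue that defers the doorbell until flush().
class SendQueue {
public:
    // Write a WQE into the queue without notifying the hardware.
    void post_wqe() { ++m_pending; }

    // Close the accumulation session: one doorbell covers all pending WQEs.
    void flush()
    {
        if (m_pending == 0) {
            return; // nothing accumulated, no doorbell needed
        }
        ++m_doorbells;
        m_pending = 0;
    }

    std::size_t pending() const { return m_pending; }
    std::size_t doorbells() const { return m_doorbells; }

private:
    std::size_t m_pending = 0;
    std::size_t m_doorbells = 0;
};
```

In this model, eight posted WQEs followed by one flush cost a single doorbell, and the final flush is what guarantees progress once the sender goes idle.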
Signed-off-by: Dmytro Podgornyi <[email protected]>
bot:retest
AlexanderGrissik force-pushed the vNext branch from cfce2e8 to 51c2340 on August 18, 2024 08:58
Description
The new API requires the user to call an explicit flush. We can implement doorbell batching and close an unfinished batch on flush. This guarantees that doorbells and zcopy completions make progress even if the user stops sending data.
What
Implement doorbell batching for the new API.
Why?
Up to 4% performance improvement depending on scenario.
Change type
What kind of change does this PR introduce?
Check list