Support network namespaces in exasock #11
This PR aims to introduce partial network namespace support with minimal changes to the code.
The main challenge in implementing netns support is that different processes (and sockets) can have different routing tables. Therefore the exasock-dst tables must either be additionally indexed by the netns inum or be held separately for each namespace. Because my aim was to minimize the performance impact of this addition, I opted for separate per-namespace state, which allows us to continue using 32-bit hashing.
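To illustrate the trade-off, here is a minimal userspace sketch of the "separate state per namespace" option. It is not the exasock code: `netns_state`, `dst_entry`, `netns_state_get`, `dst_insert`, `dst_lookup`, the table sizes and the hash function are all hypothetical names and values chosen for the example. The point it tries to show is that the namespace is resolved once, off the lookup path, so the per-table hash key stays a plain 32-bit destination address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, for illustration only. */
#define DST_TABLE_BITS 8u
#define DST_TABLE_SIZE (1u << DST_TABLE_BITS)
#define MAX_NETNS 8u

struct dst_entry {
    uint32_t dst_addr;   /* destination IPv4 address (the 32-bit hash key) */
    uint32_t next_hop;   /* stand-in for the cached routing result */
    int used;
};

/* One destination table per network namespace, identified by its inum. */
struct netns_state {
    unsigned int inum;
    int used;
    struct dst_entry table[DST_TABLE_SIZE];
};

static struct netns_state netns_states[MAX_NETNS];

/* Find the state for a namespace, allocating a slot on first use.
 * Returns NULL when all slots are taken. */
static struct netns_state *netns_state_get(unsigned int inum)
{
    struct netns_state *free_slot = NULL;

    for (size_t i = 0; i < MAX_NETNS; i++) {
        if (netns_states[i].used && netns_states[i].inum == inum)
            return &netns_states[i];
        if (!netns_states[i].used && free_slot == NULL)
            free_slot = &netns_states[i];
    }
    if (free_slot != NULL) {
        free_slot->used = 1;
        free_slot->inum = inum;
    }
    return free_slot;
}

/* The hash takes only the 32-bit address; the namespace is not part of the key. */
static uint32_t dst_hash(uint32_t addr)
{
    return (addr * 2654435761u) >> (32u - DST_TABLE_BITS);
}

/* Insert or update an entry using linear probing. Returns 0, or -1 if full. */
static int dst_insert(struct netns_state *ns, uint32_t addr, uint32_t next_hop)
{
    assert(ns != NULL);
    uint32_t h = dst_hash(addr);

    for (uint32_t i = 0; i < DST_TABLE_SIZE; i++) {
        struct dst_entry *e = &ns->table[(h + i) & (DST_TABLE_SIZE - 1u)];
        if (!e->used || e->dst_addr == addr) {
            e->dst_addr = addr;
            e->next_hop = next_hop;
            e->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Look up an address in one namespace's table. Returns 1 if found, else 0. */
static int dst_lookup(const struct netns_state *ns, uint32_t addr,
                      uint32_t *next_hop)
{
    assert(ns != NULL);
    uint32_t h = dst_hash(addr);

    for (uint32_t i = 0; i < DST_TABLE_SIZE; i++) {
        const struct dst_entry *e = &ns->table[(h + i) & (DST_TABLE_SIZE - 1u)];
        if (!e->used)
            return 0;
        if (e->dst_addr == addr) {
            *next_hop = e->next_hop;
            return 1;
        }
    }
    return 0;
}
```

With this layout the same destination address can map to different next hops in two namespaces without widening the key. The alternative, a single table keyed by (inum, address), would need a wider key or an extra comparison on every lookup.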
Most of the required changes consist of passing network namespace structures through various parts of the module. No userspace changes are needed for now.
The main current limitation is that several places still assume "socket namespace = process namespace", which is not necessarily true. To fix this we'd need to make userspace aware of each socket's namespace and use the tables accordingly. This would also require changes to several ioctls so that userspace can mmap several destination tables and request `exasock_dst_queue` for a given netns.
Another small edge case is that we no longer update the destination table when we resend SEQ, because doing so would mean tracking the network namespace in TCP requests. IMO that is too much complex code for too little gain.
Overall this is not a complete solution, for the reasons above, but it improves things to the point where it covers most cases (containers, isolated processes, etc.) with an understandable patch and no hot-path changes.