Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue: 3884801 Check for resident hugepages to avoid SIGBUS #204

Open
wants to merge 2 commits into
base: vNext
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions README
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ Example:
XLIO DETAILS: SigIntr Ctrl-C Handle Enabled [XLIO_HANDLE_SIGINTR]
XLIO DETAILS: SegFault Backtrace Disabled [XLIO_HANDLE_SIGSEGV]
XLIO DETAILS: Print a report Disabled [XLIO_PRINT_REPORT]
XLIO DETAILS: Quick start Disabled [XLIO_QUICK_START]
XLIO DETAILS: Ring allocation logic TX 0 (Ring per interface) [XLIO_RING_ALLOCATION_LOGIC_TX]
XLIO DETAILS: Ring allocation logic RX 0 (Ring per interface) [XLIO_RING_ALLOCATION_LOGIC_RX]
XLIO INFO : Ring migration ratio TX -1 [XLIO_RING_MIGRATION_RATIO_TX]
Expand Down Expand Up @@ -308,6 +309,13 @@ When Enabled, print backtrace if segmentation fault happens.
Value range is 0 to 1
Default value is 0 (Disabled)

XLIO_QUICK_START
Avoid expensive extra checks to reduce the initialization time. This may result
in failures in case of a system misconfiguration.
For example, if the parameter is enabled and hugepages are requested beyond the
cgroup limit, XLIO crashes due to an access to an unmapped page.
Default value is 0 (Disabled)

XLIO_ZC_BUFS
Number of global zerocopy data buffer elements allocation.
Default value is 200000
Expand Down
2 changes: 2 additions & 0 deletions src/core/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -501,6 +501,8 @@ void print_xlio_global_settings()
safe_mce_sys().handle_segfault ? "Enabled " : "Disabled");
VLOG_PARAM_STRING("Print a report", safe_mce_sys().print_report, MCE_DEFAULT_PRINT_REPORT,
SYS_VAR_PRINT_REPORT, safe_mce_sys().print_report ? "Enabled " : "Disabled");
VLOG_PARAM_STRING("Quick start", safe_mce_sys().quick_start, MCE_DEFAULT_QUICK_START,
SYS_VAR_QUICK_START, safe_mce_sys().quick_start ? "Enabled " : "Disabled");

VLOG_PARAM_NUMSTR("Ring allocation logic TX", safe_mce_sys().ring_allocation_logic_tx,
MCE_DEFAULT_RING_ALLOCATION_LOGIC_TX, SYS_VAR_RING_ALLOCATION_LOGIC_TX,
Expand Down
39 changes: 38 additions & 1 deletion src/core/util/hugepage_mgr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,44 @@ void *hugepage_mgr::alloc_hugepages_helper(size_t &size, size_t hugepage)
if (ptr == MAP_FAILED) {
ptr = nullptr;
__log_info_dbg("mmap failed (errno=%d), skipping hugepage %zu kB", errno, hugepage / 1024U);
} else {
} else if (!safe_mce_sys().quick_start) {
/* Check whether all the pages are resident. In a container, allocation beyond the limit can
* be successful and lead to SIGBUS on an access.
*/
const size_t pages_nr = actual_size / hugepage;
size_t resident_nr = 0;
char *page_ptr = reinterpret_cast<char *>(ptr);
int rc = 0;

/* Checking a single page per hugepage in a loop is more efficient than a single mincore()
* syscall for the entire range. A single syscall would also require an array allocation
* which can grow to tens of MB for the preallocated memory region.
*/
for (size_t i = 0; rc == 0 && i < pages_nr; ++i) {
unsigned char vec;
rc = mincore(page_ptr, 1, &vec);
resident_nr += rc == 0 ? (vec & 1U) : 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to count or we can just fail on the first bad hugepage ptr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can stop, however, this is expected to be very unlikely case, so no point in this optimization and I'd suggest choosing the more readable variant (not necessarily current variant)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that stopping on the first one may create simpler impl, maybe even without many different dbg_logs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we can loop while ptr < end_ptr ... and avoid many different vars and stuff

page_ptr += hugepage;
}

if (rc != 0 || resident_nr != pages_nr) {
int rc2 = munmap(ptr, actual_size);
if (rc2 < 0) {
__log_info_dbg("munmap failed (errno=%d)", errno);
}
if (rc < 0) {
__log_info_dbg("mincore() failed to verify hugepages (errno=%d)", errno);
} else if (resident_nr != pages_nr) {
__log_info_dbg("Not all hugepages are resident (allocated=%zu resident=%zu)",
AlexanderGrissik marked this conversation as resolved.
Show resolved Hide resolved
pages_nr, resident_nr);
}
__log_info_dbg("Cannot use hugepages, skipping hugepage %zu kB", hugepage / 1024U);

ptr = nullptr;
}
}
if (ptr) {
// Success.
size = actual_size;
}
return ptr;
Expand Down
5 changes: 5 additions & 0 deletions src/core/util/sys_vars.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -761,6 +761,7 @@ void mce_sys_var::get_env_params()
service_enable = MCE_DEFAULT_SERVICE_ENABLE;

print_report = MCE_DEFAULT_PRINT_REPORT;
quick_start = MCE_DEFAULT_QUICK_START;
log_level = VLOG_DEFAULT;
log_details = MCE_DEFAULT_LOG_DETAILS;
log_colors = MCE_DEFAULT_LOG_COLORS;
Expand Down Expand Up @@ -1084,6 +1085,10 @@ void mce_sys_var::get_env_params()
print_report = atoi(env_ptr) ? true : false;
}

if ((env_ptr = getenv(SYS_VAR_QUICK_START))) {
quick_start = atoi(env_ptr) ? true : false;
}

if ((env_ptr = getenv(SYS_VAR_LOG_FILENAME))) {
read_env_variable_with_pid(log_filename, sizeof(log_filename), env_ptr);
}
Expand Down
5 changes: 4 additions & 1 deletion src/core/util/sys_vars.h
Original file line number Diff line number Diff line change
Expand Up @@ -340,6 +340,7 @@ struct mce_sys_var {
uint32_t mce_spec;

bool print_report;
bool quick_start;
vlog_levels_t log_level;
uint32_t log_details;
char log_filename[PATH_MAX];
Expand Down Expand Up @@ -558,6 +559,7 @@ extern mce_sys_var &safe_mce_sys();
#define SYS_VAR_HANDLE_SIGINTR "XLIO_HANDLE_SIGINTR"
#define SYS_VAR_HANDLE_SIGSEGV "XLIO_HANDLE_SIGSEGV"
#define SYS_VAR_STATS_FD_NUM "XLIO_STATS_FD_NUM"
#define SYS_VAR_QUICK_START "XLIO_QUICK_START"

#define SYS_VAR_RING_ALLOCATION_LOGIC_TX "XLIO_RING_ALLOCATION_LOGIC_TX"
#define SYS_VAR_RING_ALLOCATION_LOGIC_RX "XLIO_RING_ALLOCATION_LOGIC_RX"
Expand Down Expand Up @@ -710,7 +712,8 @@ extern mce_sys_var &safe_mce_sys();
#define MCE_DEFAULT_APP_ID ("XLIO_DEFAULT_APPLICATION_ID")
#define MCE_DEFAULT_HANDLE_SIGINTR (true)
#define MCE_DEFAULT_HANDLE_SIGFAULT (false)
#define MCE_DEFAULT_STATS_FD_NUM 0
#define MCE_DEFAULT_STATS_FD_NUM (0)
#define MCE_DEFAULT_QUICK_START (false)
#define MCE_DEFAULT_RING_ALLOCATION_LOGIC_TX (RING_LOGIC_PER_THREAD)
#define MCE_DEFAULT_RING_ALLOCATION_LOGIC_RX (RING_LOGIC_PER_THREAD)
#define MCE_DEFAULT_RING_MIGRATION_RATIO_TX (-1)
Expand Down