Run dotCMS with jemalloc #30619

wezell · 2024-11-11T15:01:48Z

Parent Issue

No response

Task

Running dotCMS in our cloud infra, we have seen many pods get OOM-killed at the container level. This happens even when the JVM appears to have plenty of heap headroom. In a more perfect world, I would expect the JVM to die with an internal OOM, at which point we would know that Java needs a bigger -Xmx, but that is not what is happening. Instead, it seems these containers are using untracked, off-heap system memory, which in some cases keeps growing until the container is killed.

Right now, our best guidance on sizing dotCMS's JVM in a container with large heaps is to set -Xmx to ~65% of available memory. This means that if we want to run with a 10GB heap, we need a 16GB RAM limit on the pod. 6GB of overhead for the underlying OS is kind of nuts and leads to resource over-allocation and excessive costs. It would be ideal (and more $$ efficient) if we could tighten that up to, say, running -Xmx10g in a pod with a 12GB RAM limit.
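The sizing arithmetic above can be sketched as a couple of shell helpers. This is only an illustration: the function names `pod_limit_gb` and `heap_pct` are made up for this example and are not part of dotCMS or its images.

```shell
#!/bin/sh
# Hypothetical helpers for the sizing guidance above; not part of dotCMS.

# Smallest whole-GB pod limit such that the heap is at most pct% of it,
# i.e. ceil(heap_gb * 100 / pct) using integer arithmetic.
pod_limit_gb() {
  heap_gb=$1
  pct=$2
  echo $(( (heap_gb * 100 + pct - 1) / pct ))
}

# Heap size as a whole-number percentage of the pod limit (rounded down).
heap_pct() {
  heap_gb=$1
  limit_gb=$2
  echo $(( heap_gb * 100 / limit_gb ))
}

pod_limit_gb 10 65   # prints 16: a 10GB heap at ~65% needs a 16GB limit
heap_pct 10 12       # prints 83: -Xmx10g in a 12GB pod is ~83% heap
```

In other words, the goal is to move from a heap ratio of about 65% to something closer to 83%.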

It seems that in some cases libraries that rely on JNI and "unsafe" off-heap memory allocations can cause system memory usage to leak or grow in a way that is very difficult to track. Apparently glibc's malloc, the default memory allocator on Linux, can hold on to memory in a way that makes it impossible for the system to reclaim. A fix for this is to replace glibc's malloc implementation with one that does a better job of allowing memory to be reclaimed. jemalloc is a memory allocator implementation designed to reduce memory fragmentation and allow system memory to be reclaimed.
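One way to confirm that an allocator swap actually took effect is to look for the library in a process's memory map. The sketch below assumes a Linux `/proc` filesystem; the helper name `maps_has_jemalloc` is hypothetical, and a real check would target the JVM's PID rather than the current shell.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given maps file
# (for example /proc/<pid>/maps) lists libjemalloc.
maps_has_jemalloc() {
  grep -q 'libjemalloc' "$1" 2>/dev/null
}

# Illustration only: inspect the current shell process.
if maps_has_jemalloc "/proc/$$/maps"; then
  echo "jemalloc is mapped into PID $$"
else
  echo "jemalloc is not mapped into PID $$"
fi
```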

I was looking into implementing a new image filter using libvips, which is a high-performance image library. This would rely on a JNI implementation. From reading about libvips, running it can cause memory usage to grow unbounded unless you use jemalloc or some other non-default memory allocator. This got me thinking that this might be behind some of our problems too. I know we use JNI in a number of places, including our image resizing libraries and Sass compiler, and with all the libs we include, we are probably using a bunch of "unsafe" operations in a number of places. I don't have a smoking-gun test case, but my gut is that moving to jemalloc has very few downsides and a very real possibility of improving our container memory usage profile.

Proposed Objective

Application Performance

Proposed Priority

Priority 2 - Important

Acceptance Criteria

Download the dotCMS Docker image and run:

export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2
export MALLOC_CONF=stats_print:true
java -version

You should see jemalloc stat output printed out.
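The path above is specific to arm64. A minimal sketch of the same check that also covers x86_64 follows, assuming a Debian/Ubuntu-style image where jemalloc is installed under the multiarch library directory. The `jemalloc_path` helper is hypothetical, and the x86_64 path is an assumption that is not confirmed by this issue.

```shell
#!/bin/sh
# Hypothetical helper: map a machine architecture to the assumed
# Debian/Ubuntu multiarch location of libjemalloc.
jemalloc_path() {
  case "$1" in
    aarch64|arm64) echo "/usr/lib/aarch64-linux-gnu/libjemalloc.so.2" ;;
    x86_64|amd64)  echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2" ;;
    *)             return 1 ;;
  esac
}

lib=$(jemalloc_path "$(uname -m)") || lib=""
if [ -n "$lib" ] && [ -f "$lib" ]; then
  # stats_print:true asks jemalloc to dump its statistics at process exit.
  LD_PRELOAD="$lib" MALLOC_CONF=stats_print:true java -version
else
  echo "jemalloc not found for $(uname -m); skipping preload check" >&2
fi
```

Setting the variables inline on the `java` command, rather than exporting them, keeps the preload from leaking into other commands run in the same shell.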

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

References:

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response

github-actions · 2024-11-11T16:58:37Z

PRs:

issue 30619 jemalloc #30621
wezell added Triage Type : Task labels Nov 11, 2024

wezell added this to dotCMS - Product Planning Nov 11, 2024

github-project-automation bot moved this to New in dotCMS - Product Planning Nov 11, 2024

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) adding jemalloc to our java-base image. It is not set by default but is just available. ref #30619

c18fe59

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) adding comment to sdkman so we know where the java version information is being set ref #30619

163f3df

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) setting jemalloc by default, can be unset by an env var

69e89fa

ref #30619

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) setting jemalloc by default, can be unset by an env var

e6e2062

ref #30619

wezell linked a pull request Nov 11, 2024 that will close this issue

issue 30619 jemalloc #30621

Merged

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) setting jemalloc by default, can be unset by an env var

3a335df

ref #30619

wezell added a commit that referenced this issue Nov 11, 2024

feat(perf) setting jemalloc by default, can be unset by an env var

cc50651

ref #30619

wezell closed this as completed in #30621 Nov 12, 2024

github-project-automation bot moved this from New to Internal QA in dotCMS - Product Planning Nov 12, 2024

github-actions bot mentioned this issue Nov 12, 2024

issue 30619 jemalloc #30621

Merged