Possible memory leak when collecting events for profiling #625

kif · 2022-10-05T06:39:39Z

Describe the bug
Very large (host) memory consumption has been observed when running an OpenCL application in profiling mode.
Example: processing 10000 images of 4 Mpix (int32) with ~6 kernels per image on an Nvidia Tesla A40 gets the process OOM-killed on a computer with 200 GB of memory, even though the computer could hold all the images, uncompressed, in memory.
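
The figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes "4Mpix" means 4 × 10^6 pixels (with 4 × 2^20 the total is about 5% larger); it confirms that the whole dataset is about 160 GB, below the 200 GB of the machine, and that 10000 images × 6 kernels gives the 60000 kernel launches used in the expected-behavior estimate.

```python
# Back-of-the-envelope check of the numbers quoted in the report.
# Assumption: "4Mpix" is read as 4 * 10**6 pixels.
N_IMAGES = 10_000
PIXELS_PER_IMAGE = 4 * 10**6
BYTES_PER_PIXEL = 4      # int32
KERNELS_PER_IMAGE = 6    # "~6 kernels per image"


def dataset_bytes(n_images=N_IMAGES, pixels=PIXELS_PER_IMAGE,
                  itemsize=BYTES_PER_PIXEL):
    """Total size of all images held uncompressed in memory."""
    return n_images * pixels * itemsize


total = dataset_bytes()
print(f"all images: {total / 1e9:.0f} GB")                 # all images: 160 GB
print(f"kernel launches: {N_IMAGES * KERNELS_PER_IMAGE}")  # kernel launches: 60000
```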

I ran Python's tracemalloc tool on the application without finding any noticeable leak (at the Python level), indicating that the leaked memory was allocated by malloc calls made outside the scope of Python. I also investigated a possible leak in HDF5 via h5py, since all data were read and written in this format, but this was not the case.
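
A minimal sketch of that kind of check, assuming a Linux host: tracemalloc only sees allocations made through Python's allocator, so comparing its growth with the process RSS separates Python-level leaks from native ones (for instance in the OpenCL driver or libhdf5). The helper names are hypothetical, and `ru_maxrss` reports the peak RSS rather than the current one.

```python
import resource
import sys
import tracemalloc


def rss_mb():
    """Peak resident set size of this process, in MB.

    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def python_heap_growth(step, repeat=10):
    """Run `step` several times and report how much memory Python kept,
    according to tracemalloc, alongside the peak RSS before and after."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    rss_before = rss_mb()
    for _ in range(repeat):
        step()
    after = tracemalloc.take_snapshot()
    rss_after = rss_mb()
    tracemalloc.stop()
    # Per-line differences, largest first; their sum is the net growth.
    stats = after.compare_to(before, "lineno")
    growth = sum(stat.size_diff for stat in stats)
    return growth, rss_before, rss_after, stats[:5]
```

If `growth` stays near zero while the RSS keeps climbing from one call to the next, the retained memory lives outside the Python allocator, which matches what was observed here.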

When profiling is disabled, the memory consumption does not exceed a few percent of the total memory.

To Reproduce
Investigated in:
silx-kit/pyFAI#1744

Expected behavior
Some memory growth is expected from keeping the list of all events, but it should not exceed 3.4 MB for 60000 kernels (when each is stored as a 2-field namedtuple).
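
The 3.4 MB bound corresponds to 60000 × 56 bytes, the size of a two-field namedtuple on 64-bit CPython 3.8+. The sketch below reproduces that estimate; it counts only the tuple containers, so the stored values and the list's own pointer array (8 bytes per entry) add a little more. The `Timing` record is hypothetical.

```python
import sys
from collections import namedtuple

# Hypothetical record type: one entry per kernel launch.
Timing = namedtuple("Timing", ["start", "end"])


def event_list_bytes(n_kernels):
    """Size of the tuple containers alone (payload objects not counted)."""
    return n_kernels * sys.getsizeof(Timing(0, 0))


print(sys.getsizeof(Timing(0, 0)))     # 56 on 64-bit CPython 3.8+
print(event_list_bytes(60_000) / 1e6)  # 3.36 (MB) on 64-bit CPython 3.8+
```

Gigabytes of growth are therefore several orders of magnitude above what the event list itself can account for.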

Environment

OS: Linux Debian 11 + Ubuntu 20.04
ICD Loader and version: ocl-icd 2.2.14-2
ICD and version: Nvidia 470.141.03-1~deb11u1
CPU/GPU: Nvidia Titan V & Tesla A40
Python version: 3.9 + 3.9
PyOpenCL version: 2021.1.2 + 2021.2.13

Additional context
The list of events is handled at https://github.com/silx-kit/silx/blob/master/src/silx/opencl/processing.py#L288


inducer · 2022-10-05T07:25:24Z

Can something like valgrind maybe provide details on where those allocations are taking place?

kif · 2022-10-05T12:24:05Z

Here are the valgrind "massif" profiles for two runs of the program on a limited number of images (2000), with and without profiling enabled. Valgrind still points at h5py rather than pyopencl, but toggling the option makes a 16 GB difference in memory consumption.

[Massif profile without profiling: image not preserved]

[Massif profile with profiling: image not preserved]

kif · 2022-10-05T13:02:05Z

I ran it several more times, and it looks like OpenCL profiling prevents the memory from being freed.

kif · 2022-10-06T11:15:29Z

So I tried to collect only the timestamps of each event instead of keeping the complete event.
For now, the patch is implemented in:
silx-kit/silx#3690
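
A minimal sketch of the idea, written independently of that patch (the class and field names are hypothetical, not the silx API): record each event's start and end timestamps once the command has completed and keep no reference to the event, so that any host memory the runtime ties to a live event can presumably be released. It relies only on what `pyopencl.Event` provides on a profiling-enabled queue: `wait()` and `profile.start` / `profile.end`.

```python
from collections import namedtuple

# Hypothetical record type: kernel name plus its device timestamps (ns).
KernelTiming = namedtuple("KernelTiming", ["name", "start", "end"])


class TimestampLog:
    """Store profiling timestamps instead of the event objects themselves.

    `event` is assumed to behave like a pyopencl.Event enqueued on a queue
    created with cl.command_queue_properties.PROFILING_ENABLE: it provides
    wait() and the profile.start / profile.end attributes (nanoseconds).
    """

    def __init__(self):
        self.timings = []

    def record(self, name, event):
        # Profiling info is only available once the command has completed.
        event.wait()
        self.timings.append(
            KernelTiming(name, event.profile.start, event.profile.end))
        # No reference to `event` is kept here, so it can be collected
        # as soon as the caller drops its own reference.

    def total_ms(self):
        """Summed device execution time of all recorded kernels, in ms."""
        return sum(t.end - t.start for t in self.timings) / 1e6


# Usage (hypothetical kernel and buffer names):
# log = TimestampLog()
# evt = prg.my_kernel(queue, shape, None, d_buf)
# log.record("my_kernel", evt)
```

Calling `wait()` for every event synchronises the host with the device after each kernel; recording the events of one image and converting them in a batch afterwards would keep the queue busier while still bounding how many events are alive at once.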

The memory profile now looks like this. One would have expected 10 memory releases (since 10 files are processed), but fewer are visible.

kif · 2022-10-18T14:42:41Z

I ran into something similar in another project, but profiling was not involved this time:
https://github.com/kif/multianalyzer/blob/main/multianalyzer/opencl.py
The pattern was similar: read data from an HDF5 file with large chunks and send them to the GPU.
But once again, I was unable to reproduce the behaviour in a self-contained script.
Calling the pyopencl.array method finalize helps free the memory on the CPU side.

kif added the bug label Oct 5, 2022

kif mentioned this issue Oct 6, 2022: Eventless OpenCL profiling silx-kit/silx#3690 (Merged)
