
Use system's default rocgdb instead of AOMP's #853

Open: wants to merge 1 commit into aomp-dev

@saiislam (Member) commented Mar 6, 2024

rocgdb requires libpython.so, which the system's default rocgdb is more likely to find.

The rocgdb in AOMP/bin complains about a missing libpython.so file.

@jplehr (Contributor) commented Mar 6, 2024

I don't think we want to test the system's rocgdb (by accident or on purpose).

@ronlieb (Contributor) commented Mar 7, 2024

I also wonder why we don't want to test the rocgdb we built and packaged?

@dpalermo (Contributor) commented Mar 7, 2024

Instead of a missing libpython, I am seeing a different complaint:

[r6 ~]$ /COD/LATEST/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+

This seems to have started with the 2023-12-20 build:

[r6 ~]$ /COD/2023-12-20/aomp/bin/rocgdb
amd-dbgapi library version mismatch, got 0.70.1, need 0.71+

It works before that date:

[r6 ~]$ /COD/2023-12-19/aomp/bin/rocgdb
GNU gdb (AOMP_18.0-1) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) q

@dpalermo (Contributor) commented Mar 7, 2024

We are actually staging python libs into COD to allow tools that are linked to specific versions of python shared objects to work. If you see a missing python lib error, paste in the exact error message and the system you saw it on.

@saiislam (Member, Author) commented Mar 7, 2024

> We are actually staging python libs into COD to allow tools that are linked to specific versions of python shared objects to work. If you see a missing python lib error, paste in the exact error message and the system you saw it on.

I get the same error with both the 2024-03-07 build and the 2023-12-04 build.

Note: these results are from r11.

/COD/2024-03-07/aomp/bin/clang++  -g -O0    -fopenmp --offload-arch=gfx90a  -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COD/2024-03-07/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COD/2024-03-07/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127
/COW/2023-12-04/aomp/bin/clang++  -g -O0    -fopenmp --offload-arch=gfx90a  -D__OFFLOAD_ARCH_gfx90a__ clang-325070.cpp -o clang-325070
/COW/2023-12-04/aomp/bin/rocgdb -x doit.gdb --args ./clang-325070 0 2>&1 | tee run.log
/COW/2023-12-04/aomp/bin/rocgdb: error while loading shared libraries: libpython3.8.so.1.0: cannot open shared object file: No such file or directory
make: *** [../Makefile.rules:71: run] Error 127
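The dynamic loader stops at the first library it cannot resolve, so the error above names only libpython3.8.so.1.0. A minimal sketch, assuming the usual `name => not found` lines that `ldd` prints for unresolved dependencies, that lists every missing library at once; `missing_libs` is a hypothetical helper name, not part of AOMP:

```shell
# Hypothetical helper: read `ldd <binary>` output on stdin and print the
# name of every shared library the loader could not resolve.
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

For example, `ldd /COD/2024-03-07/aomp/bin/rocgdb | missing_libs` prints the unresolved names one per line. Prefixing `ldd` with a candidate `LD_LIBRARY_PATH` is a quick way to check whether that path covers all of them before launching rocgdb.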

@dpalermo (Contributor) commented Mar 7, 2024

Looking back at the original thread in the 'CI OpenMP compiler daily triage group' Teams chat that motivated the staging fix, you will need to do the following on a 22.04 system:

[r11 ~]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8  LD_LIBRARY_PATH=/COD/LATEST/aomp/lib /COD/LATEST/aomp/bin/rocgdb
GNU gdb (AOMP_19.0-0) 13.2
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb)

Note that setting the above environment variables also fixes the 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error now seen on 20.04 systems.

This is not a "fix" so much as a workaround for running a rocgdb built on an older OS.

It doesn't help us in this situation, but the moral of the story is: don't link your product against the Python shared objects. Backward compatibility is simply not guaranteed (at least not when building on 20.04 and running on 22.04).
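The PYTHONHOME/PYTHONPATH/LD_LIBRARY_PATH workaround above can be wrapped so the three variables are always set consistently for whichever AOMP build is under test. A minimal sketch, assuming the `/COD/<date>/aomp` layout and the `lib/python3.8` directory shown in this thread; `AOMP_ROOT`, `aomp_rocgdb_setenv`, and `run_aomp_rocgdb` are hypothetical names, not existing AOMP tooling:

```shell
# Hypothetical helpers wrapping the workaround from this thread. Source this
# file, then call run_aomp_rocgdb. The default root and the python3.8
# directory are assumptions taken from the layout shown above.

# Export the variables rocgdb needs to find the staged Python runtime and
# the staged librocm-dbgapi under an AOMP install root.
aomp_rocgdb_setenv() {
  root=$1
  export PYTHONHOME="$root/lib/python3.8"
  export PYTHONPATH="$root/lib/python3.8"
  # Put the staged libs first so they win over /opt/rocm; keep any
  # existing entries after them.
  export LD_LIBRARY_PATH="$root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Run the AOMP rocgdb with that environment. The ( ) body is a subshell,
# so neither the exports nor the exec affect the calling shell.
run_aomp_rocgdb() (
  root=${AOMP_ROOT:-/COD/LATEST/aomp}
  aomp_rocgdb_setenv "$root"
  exec "$root/bin/rocgdb" "$@"
)
```

For example, in bash, after sourcing the file, `AOMP_ROOT=/COD/2024-03-07/aomp run_aomp_rocgdb -x doit.gdb --args ./clang-325070 0` would run the reproducer from earlier in the thread against the staged libraries. Prepending to an existing `LD_LIBRARY_PATH` rather than replacing it is a design choice of this sketch; the command in the thread replaces it outright.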

@dpalermo (Contributor) commented Mar 7, 2024

The 'amd-dbgapi library version mismatch, got 0.70.1, need 0.71+' error occurs even on the same system where rocgdb was built. Without LD_LIBRARY_PATH set, rocgdb picks up the library from the system's /opt/rocm:

[r5 /COD/LATEST/aomp]$ ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
        librocm-dbgapi.so.0 => /opt/rocm-5.7.0/lib/librocm-dbgapi.so.0 (0x00007f8046edb000)

With the workaround, it picks up the staged librocm-dbgapi.so.0:

[r5 /COD/LATEST/aomp]$ PYTHONHOME=/COD/LATEST/aomp/lib/python3.8 PYTHONPATH=/COD/LATEST/aomp/lib/python3.8  LD_LIBRARY_PATH=/COD/LATEST/aomp/lib ldd /COD/LATEST/aomp/bin/rocgdb | grep dbgapi
        librocm-dbgapi.so.0 => /COD/LATEST/aomp/lib/librocm-dbgapi.so.0 (0x00007f25342d1000)
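The `ldd | grep dbgapi` check above can be turned into a pass/fail test, for example in a CI step that should refuse to run an AOMP rocgdb against the system's librocm-dbgapi. A minimal sketch, assuming the `name => path (address)` line format shown in the output above; the helper names are hypothetical:

```shell
# Hypothetical helpers: read `ldd <binary>` output on stdin and report where
# librocm-dbgapi.so.0 was resolved from.

# Print the resolved path, or nothing if ldd reports "not found".
dbgapi_resolved_path() {
  awk '$1 == "librocm-dbgapi.so.0" && $2 == "=>" { print ($3 == "not" ? "" : $3) }'
}

# Succeed only if the library resolves under <aomp-root>/lib; otherwise
# explain on stderr and fail.
dbgapi_from_root() {
  root=$1
  path=$(dbgapi_resolved_path)
  case $path in
    "$root"/lib/*) return 0 ;;
  esac
  echo "librocm-dbgapi.so.0 resolved to '${path:-not found}', not under $root/lib" >&2
  return 1
}
```

Going by the r5 output above, `ldd /COD/LATEST/aomp/bin/rocgdb | dbgapi_from_root /COD/LATEST/aomp` would be expected to fail (it resolves to /opt/rocm-5.7.0/lib), while the same pipeline with `LD_LIBRARY_PATH=/COD/LATEST/aomp/lib` in front of `ldd` would be expected to pass.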

This feels like a CMake bug in rocgdb: it should look for shared libraries relative to its installed location first.
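The thread does not show how rocgdb's build sets its library search path. A common way for an ELF binary to prefer libraries relative to its installed location is an RPATH/RUNPATH entry beginning with `$ORIGIN`, which the dynamic loader expands to the directory containing the binary. A minimal sketch, assuming the `Library rpath: [...]` / `Library runpath: [...]` lines that binutils `readelf -d` prints, to check whether a binary carries such an entry; the helper names are hypothetical:

```shell
# Hypothetical helpers: read `readelf -d <binary>` output on stdin.

# Print each RPATH/RUNPATH entry on its own line.
elf_search_paths() {
  sed -n 's/.*Library r[un]*path: \[\(.*\)]$/\1/p' | tr ':' '\n'
}

# Succeed if any entry is relative to the binary's own directory ($ORIGIN).
has_origin_path() {
  elf_search_paths | grep -q '^\$ORIGIN'
}
```

For example, `readelf -d /COD/LATEST/aomp/bin/rocgdb | has_origin_path || echo "no \$ORIGIN-relative search path"`. If the check fails, linking with `-Wl,-rpath,'$ORIGIN/../lib'` is one conventional remedy. Two caveats: the loader searches DT_RUNPATH after `LD_LIBRARY_PATH`, so an explicit `LD_LIBRARY_PATH` still takes precedence, and an rpath only affects shared-library lookup, so the PYTHONHOME/PYTHONPATH part of the workaround would presumably still be needed.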
