examples/shallow_sphere gives segmentation fault #642

Open
rjleveque opened this issue Apr 21, 2020 · 21 comments

Comments

@rjleveque
Member

I'm trying to build the pyclaw galleries and everything works fine with v5.7.0rc except in examples/shallow_sphere where running Rossby_wave.py or test_shallow_sphere.py gives a segmentation fault.

@mandli
Member

mandli commented Apr 22, 2020

I got it to work with 5.7. What seg fault were you seeing?

@rjleveque
Member Author

Not very informative:

[shallow_sphere] como $ python Rossby_wave.py 
Segmentation fault: 11

Googling segmentation fault 11 python turns up various discussions but they all seem a few years old.
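
One way to get more than the bare "Segmentation fault: 11" is Python's standard-library faulthandler module, which writes the Python-level traceback when the interpreter receives a fatal signal. The sketch below is only an illustration: the helper name `enable_crash_traceback` and the log file name are made up here, and the traceback shows which Python call was active, not the failing Fortran line.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Ask CPython to write the Python traceback to log_path on a fatal signal.

    Hypothetical helper for illustration; not part of pyclaw.
    """
    # The file must stay open for the life of the process, because
    # faulthandler writes to its file descriptor when the signal arrives.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_traceback("crash.log")` at the top of Rossby_wave.py, or running `python -X faulthandler Rossby_wave.py` without editing the script, should at least show which wrapped routine was being called when the crash happened.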

@mandli
Member

mandli commented Apr 22, 2020

Ugh, I wish we had better error codes for this stuff. I will try to reproduce this again and report back.

@mandli
Member

mandli commented Apr 22, 2020

If you remove the *.so files in that examples directory does the f2py wrapping give any errors?
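
A minimal sketch of that cleanup step, assuming the compiled extensions sit directly in the example directory (as the `-o ./classic2.cpython-36m-darwin.so` style output paths suggest). The function name `remove_compiled_extensions` is hypothetical and not part of pyclaw.

```python
from pathlib import Path


def remove_compiled_extensions(directory):
    """Delete every *.so file directly inside `directory`.

    Returns the names of the removed files so the caller can see
    which extension modules will have to be rebuilt.
    """
    removed = []
    for so_file in sorted(Path(directory).glob("*.so")):
        so_file.unlink()
        removed.append(so_file.name)
    return removed
```

Running `remove_compiled_extensions(".")` inside examples/shallow_sphere and then rerunning `python Rossby_wave.py` should make pyclaw notice the missing extension modules and rerun the f2py build, so that any compiler messages are visible.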

@rjleveque
Member Author

Just some new gfortran warnings similar to what I get these days with the fortran code...

(geo5) [shallow_sphere] como $ python Rossby_wave.py 
/Users/rjl/clawpack_src/clawpack_master/clawpack/pyclaw/util.py:78: UserWarning: missing extension modules
  warnings.warn("missing extension modules")
/Users/rjl/clawpack_src/clawpack_master/clawpack/pyclaw/util.py:79: UserWarning: running python setup.py build_ext -i in /Users/rjl/clawpack_src/clawpack_master/clawpack/pyclaw/examples/shallow_sphere
  warnings.warn("running python setup.py build_ext -i in %s" % working_dir)
running build_ext
running build_src
build_src
building extension "shallow_sphere.classic2" sources
f2py options: []
  adding 'build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/fortranobject.c' to sources.
  adding 'build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere' to include_dirs.
  adding 'build/src.macosx-10.7-x86_64-3.6/shallow_sphere/classic2-f2pywrappers.f' to sources.
building extension "shallow_sphere.problem" sources
f2py options: []
  adding 'build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/fortranobject.c' to sources.
  adding 'build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere' to include_dirs.
build_src: building npy-pkg config files
customize UnixCCompiler
customize UnixCCompiler using build_ext
get_default_fcompiler: matching types: '['gnu95', 'nag', 'absoft', 'ibm', 'intel', 'gnu', 'g95', 'pg']'
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'shallow_sphere.classic2' extension
compiling C sources
C compiler: gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/rjl/miniconda/envs/geo5/include -arch x86_64 -I/Users/rjl/miniconda/envs/geo5/include -arch x86_64

compile options: '-Ibuild/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere -I/Users/rjl/miniconda/envs/geo5/lib/python3.6/site-packages/numpy/core/include -I/Users/rjl/miniconda/envs/geo5/include/python3.6m -c'
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -g -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
compile options: '-Ibuild/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere -I/Users/rjl/miniconda/envs/geo5/lib/python3.6/site-packages/numpy/core/include -I/Users/rjl/miniconda/envs/geo5/include/python3.6m -c'
gfortran:f90: ./step2qcor.f90
gfortran:f90: ./qcor.f90
./qcor.f90:2:27:

    2 |     subroutine qcor(ixy,i,m,aux,q,maxm,num_eqn,num_ghost,qc)
      |                           1
Warning: Unused dummy argument 'm' at (1) [-Wunused-dummy-argument]
gfortran:f90: /Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/limiter.f90
gfortran:f90: /Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/philim.f90
gfortran:f90: /Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/flux2.f90
gfortran:f90: /Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:111:55:

  111 |                     do 23 i = 1-num_ghost, mx+num_ghost
      |                                                       1
Warning: Fortran 2018 deleted feature: Shared DO termination label 23 at (1)
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:117:59:

  117 |                         do 24 i = 1-num_ghost, mx+num_ghost
      |                                                           1
Warning: Fortran 2018 deleted feature: Shared DO termination label 24 at (1)
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:124:59:

  124 |                         do 25 i = 1-num_ghost, mx+num_ghost
      |                                                           1
Warning: Fortran 2018 deleted feature: Shared DO termination label 25 at (1)
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:193:55:

  193 |                     do 73 j = 1-num_ghost, my+num_ghost
      |                                                       1
Warning: Fortran 2018 deleted feature: Shared DO termination label 73 at (1)
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:199:59:

  199 |                         do 74 j = 1-num_ghost, my+num_ghost
      |                                                           1
Warning: Fortran 2018 deleted feature: Shared DO termination label 74 at (1)
/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.f90:206:59:

  206 |                         do 75 j = 1-num_ghost, my+num_ghost
      |                                                           1
Warning: Fortran 2018 deleted feature: Shared DO termination label 75 at (1)
gfortran:f77: build/src.macosx-10.7-x86_64-3.6/shallow_sphere/classic2-f2pywrappers.f
/usr/local/bin/gfortran -Wall -g -arch x86_64 -Wall -g -undefined dynamic_lookup -bundle build/temp.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/classic2module.o build/temp.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/fortranobject.o build/temp.macosx-10.7-x86_64-3.6/step2qcor.o build/temp.macosx-10.7-x86_64-3.6/qcor.o build/temp.macosx-10.7-x86_64-3.6/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/limiter.o build/temp.macosx-10.7-x86_64-3.6/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/philim.o build/temp.macosx-10.7-x86_64-3.6/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/flux2.o build/temp.macosx-10.7-x86_64-3.6/Users/rjl/clawpack_src/clawpack_master/pyclaw/src/pyclaw/classic/step2ds.o build/temp.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/classic2-f2pywrappers.o -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0/../../.. -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0/../../.. -lgfortran -o ./classic2.cpython-36m-darwin.so
building 'shallow_sphere.problem' extension
compiling C sources
C compiler: gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/rjl/miniconda/envs/geo5/include -arch x86_64 -I/Users/rjl/miniconda/envs/geo5/include -arch x86_64

compile options: '-Ibuild/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere -I/Users/rjl/miniconda/envs/geo5/lib/python3.6/site-packages/numpy/core/include -I/Users/rjl/miniconda/envs/geo5/include/python3.6m -c'
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -g -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -arch x86_64 -fPIC -O2 -fopenmp -O3 -funroll-loops
compile options: '-Ibuild/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere -I/Users/rjl/miniconda/envs/geo5/lib/python3.6/site-packages/numpy/core/include -I/Users/rjl/miniconda/envs/geo5/include/python3.6m -c'
gfortran:f90: ./mapc2p.f90
gfortran:f90: ./setaux.f90
./setaux.f90:76:42:

   76 |         do 15 i=1-num_ghost,mx+num_ghost+1
      |                                          1
Warning: Fortran 2018 deleted feature: Shared DO termination label 15 at (1)
./setaux.f90:104:40:

  104 |         do 20 i=1-num_ghost,mx+num_ghost
      |                                        1
Warning: Fortran 2018 deleted feature: Shared DO termination label 20 at (1)
gfortran:f90: ./qinit.f90
./qinit.f90:43:20:

   43 |         do 20 j=1,my
      |                    1
Warning: Fortran 2018 deleted feature: Shared DO termination label 20 at (1)
./qinit.f90:3:23:

    3 |     dx,dy,q,num_aux,aux,Rsphere)
      |                       1
Warning: Unused dummy argument 'aux' at (1) [-Wunused-dummy-argument]
./qinit.f90:50:0:

   50 |                 theta = dasin(yp/rad)
      | 
Warning: 'theta' may be used uninitialized in this function [-Wmaybe-uninitialized]
gfortran:f90: ./src2.f90
./src2.f90:2:25:

    2 |     subroutine src2(maxmx,maxmy,num_eqn,num_ghost,mx,my,xlower,ylower, &
      |                         1
Warning: Unused dummy argument 'maxmx' at (1) [-Wunused-dummy-argument]
./src2.f90:2:31:

    2 |     subroutine src2(maxmx,maxmy,num_eqn,num_ghost,mx,my,xlower,ylower, &
      |                               1
Warning: Unused dummy argument 'maxmy' at (1) [-Wunused-dummy-argument]
./src2.f90:2:49:

    2 |     subroutine src2(maxmx,maxmy,num_eqn,num_ghost,mx,my,xlower,ylower, &
      |                                                 1
Warning: Unused dummy argument 'num_ghost' at (1) [-Wunused-dummy-argument]
./src2.f90:3:25:

    3 |     dx,dy,q,num_aux,aux,t,dt,Rsphere)
      |                         1
Warning: Unused dummy argument 't' at (1) [-Wunused-dummy-argument]
/usr/local/bin/gfortran -Wall -g -arch x86_64 -Wall -g -undefined dynamic_lookup -bundle build/temp.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/problemmodule.o build/temp.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/build/src.macosx-10.7-x86_64-3.6/shallow_sphere/fortranobject.o build/temp.macosx-10.7-x86_64-3.6/mapc2p.o build/temp.macosx-10.7-x86_64-3.6/setaux.o build/temp.macosx-10.7-x86_64-3.6/qinit.o build/temp.macosx-10.7-x86_64-3.6/src2.o -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0 -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0/../../.. -L/usr/local/Cellar/gcc/9.2.0_3/lib/gcc/9/gcc/x86_64-apple-darwin18/9.2.0/../../.. -lgfortran -o ./problem.cpython-36m-darwin.so
/Users/rjl/clawpack_src/clawpack_master/clawpack/pyclaw/util.py:84: UserWarning: successfully executed python setup.py build_ext -i in /Users/rjl/clawpack_src/clawpack_master/clawpack/pyclaw/examples/shallow_sphere
  warnings.warn("successfully executed python setup.py build_ext -i in %s" % working_dir)
Segmentation fault: 11

@donnaaboise
Contributor

donnaaboise commented Apr 23, 2020 via email

@ketch
Member

ketch commented Apr 23, 2020

I wonder if those "shared DO termination label"s are actually a problem? We should get rid of them anyway. I'll do that later today.

@rjleveque
Member Author

rjleveque commented Apr 23, 2020

I haven't run into problems with GeoClaw with the termination label warnings, but I agree we need to eventually clean it all up.

Also @mjberger recently noticed you can add the gfortran flag -std=legacy to FFLAGS to avoid some other problems with the latest gfortran not working on legacy code. I'm not sure how to add this to the f2py compilation however, since changing my FFLAGS for this example didn't have any effect.

@mandli
Member

mandli commented Apr 26, 2020

f2py has a somewhat complex set of flags, since it acts as a C compiler and a Fortran compiler and somehow gets Python in there. I think if you want to pass Fortran flags you need to use the --f90flags= flag to f2py.
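
As a rough sketch of what that could look like, the snippet below only assembles an f2py command line that forwards -std=legacy to the Fortran compiler. It assumes f2py is invoked directly as `python -m numpy.f2py` rather than through the example's setup.py, and the helper name `f2py_command` is hypothetical; the module name and source files are the ones listed in the build log above.

```python
import sys


def f2py_command(module_name, sources, fortran_flags):
    """Return the argument list for a direct f2py build with extra Fortran flags.

    Hypothetical helper: it bypasses the example's setup.py, which is an
    assumption made for illustration only.
    """
    return [
        sys.executable, "-m", "numpy.f2py", "-c",
        f"--f90flags={fortran_flags}",  # flags for free-form (.f90) sources
        f"--f77flags={fortran_flags}",  # flags for fixed-form (.f) sources
        "-m", module_name,
        *sources,
    ]


cmd = f2py_command(
    "problem",
    ["mapc2p.f90", "setaux.f90", "qinit.f90", "src2.f90"],
    "-std=legacy",
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the example directory would attempt the build. The same helper could carry debugging flags such as `-g -fcheck=all -fbacktrace` instead, which may give a more useful message than a bare segfault.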

@ketch
Member

ketch commented Apr 27, 2020

I cleaned up all the loop labels in #643.

@ketch
Member

ketch commented Sep 25, 2024

Currently this test is failing in a similar way on the CI test, but for me and at least one other person it passes locally on a Macbook.

@mandli
Member

mandli commented Sep 25, 2024

This works for me too on Apple M1, macOS 14.7. Does someone have access to ifort that could see if that might produce the problem? I would hazard a guess that it is a problem in the Fortran allocation that gfortran either cleans up or does not care about when linked in Python.

@ketch
Member

ketch commented Oct 6, 2024

@rjleveque could you check if this test still fails for you locally? i.e. do

cd clawpack/pyclaw/examples/shallow_sphere
pytest

If it fails, then try doing this in an IPython session (the trailing ? is IPython syntax) and let me know the output:

from clawpack.pyclaw.examples.shallow_sphere import sw_sphere_problem
sw_sphere_problem.src2?

I'm suspicious that f2py might be generating a slightly different signature for the src2 wrapper on some systems.
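
For anyone without IPython, the same information should be available from the wrapper's docstring, since f2py stores the generated call signature in `__doc__`. The helper below is a hypothetical convenience, not part of pyclaw, and works on any Python callable.

```python
def wrapper_signature(func):
    """Return the docstring of `func`, where f2py records the wrapper signature.

    Falls back to a placeholder when no docstring is present.
    """
    doc = getattr(func, "__doc__", None)
    return doc.strip() if doc else "<no docstring available>"


# Intended use (requires a built clawpack install, so it is left as a comment):
#   from clawpack.pyclaw.examples.shallow_sphere import sw_sphere_problem
#   print(wrapper_signature(sw_sphere_problem.src2))
```

Comparing this output between a machine where the test passes and one where it segfaults would show whether the argument lists or array shapes differ.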

@pavelkomarov
Contributor

pavelkomarov commented Oct 17, 2024

When I run pytest locally (on an Intel i7 MacBook), I do not get the segmentation fault for shallow_sphere; it passes fine. But I do get a test failure for one of the advection tests.

============================================================= FAILURES =============================================================
__________________________________ TestAdvectionVarCoeff1D.test_sharpclaw_custom_time_integrator ___________________________________

self = <examples.advection_1d_variable.test_variable_coefficient_advection.TestAdvectionVarCoeff1D object at 0x11140d550>

    def test_sharpclaw_custom_time_integrator(self):
        #Load Butcher Tableaux for custom time integrators
        rk_methods_dict = np.load(os.path.join(thisdir,'rk_methods.npy'),allow_pickle=True).item()
        rk_names = list(rk_methods_dict.keys())
        for rk_name in rk_names:
            rk_coeffs = rk_methods_dict[rk_name]
>           assert error(test_name='sharpclaw_custom_time_integrator_'+rk_name,kernel_language='Fortran',solver_type='sharpclaw',
                             time_integrator='RK',rk_coeffs=rk_coeffs)<1e-6, f"Failed for {rk_name}"
E           AssertionError: Failed for SSP33
E           assert np.float64(1.1389149780820139e-06) < 1e-06
E            +  where np.float64(1.1389149780820139e-06) = error(test_name=('sharpclaw_custom_time_integrator_' + 'SSP33'), kernel_language='Fortran', solver_type='sharpclaw', time_integrator='RK', rk_coeffs=[array([[0.  , 0.  , 0.  ],\n       [1.  , 0.  , 0.  ],\n       [0.25, 0.25, 0.  ]]), array([0.16666667, 0.16666667, 0.66666667]), array([0. , 1. , 0.5])])

examples/advection_1d_variable/test_variable_coefficient_advection.py:50: AssertionError

Are the tests numerically brittle? Can we set them up to accept some value +/- some error range? For example, see the accuracy_thresholds in test_ppr_learns() here.

Does the code or do the tests need division by zero checks or sqrt(negative) checks to preempt numerical problems? Or do we expect the segfault has a less banal cause?
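
To make the "value +/- some error range" idea concrete, here is a small sketch using the numbers from the failure above. The function `within_threshold` and the 25% slack are hypothetical choices for illustration, not how the pyclaw tests are written.

```python
def within_threshold(error_value, threshold=1e-6, rel_slack=0.0):
    """Return True if error_value is below threshold, widened by rel_slack.

    rel_slack=0.25 means the accepted bound is 25% above the nominal threshold.
    """
    return error_value < threshold * (1.0 + rel_slack)


observed = 1.1389149780820139e-06  # error reported for SSP33 in the failure above

print(within_threshold(observed))                  # False: 1.139e-6 is not < 1e-6
print(within_threshold(observed, rel_slack=0.25))  # True: 1.139e-6 < 1.25e-6
```

Whether widening the bound is appropriate depends on why the error differs between machines; a slack that hides a real regression would defeat the purpose of the test.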

@mandli
Member

mandli commented Oct 17, 2024

What flags are you using for compilation with ifort? In general these types of problems can have issues if the compiler decides to optimize in unfortunate ways, such as reordering operations. The Intel compiler suite allows more control over this and can check for other issues, which is why it may be fruitful to add some debugging flags to the compilation.

@pavelkomarov
Contributor

pavelkomarov commented Oct 17, 2024

What is ifort?

@mandli
Member

mandli commented Oct 17, 2024

The Intel Fortran compiler. If you are using gfortran then I am a bit surprised that you are seeing a segfault, as I thought we were not able to reproduce it with the GCC compilers.

@pavelkomarov
Contributor

pavelkomarov commented Oct 17, 2024

I'm not seeing a segfault on this architecture. I'm seeing an ordinary test failure.

I'm just wondering whether some kind of checking can forestall both of these things. Can the root cause ultimately be reduced to how the tests or code were written? Because that we can control. Which compiler is used in the wild or on CI is difficult to control, though obviously we'd like to work on all of them.

@mandli
Member

mandli commented Oct 17, 2024

Ah, sorry, getting confused as to where is what and which is failing...

We really do not want to do the checks as (a) they are relatively expensive and (b) the code should fail if a sqrt of a negative number is being taken or an overflow is occurring due to a divide by zero. The algorithms have certain guarantees regarding positivity, for instance, so these may well be true bugs. These algorithms tend to be brittle under aggressive compiler optimization, which is hard to predict. It's best to find the cause and see if there's a more stable means of computing the problematic bit or enforcing order of operations, for instance.
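
A tiny illustration of why the order of operations matters: floating-point addition is not associative, so a regrouping that is harmless in exact arithmetic can change the last bits of a result. The values below are arbitrary and chosen only to show the effect; they are not taken from the solver.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c  # the grouping as written in the source
regrouped = a + (b + c)      # a mathematically equivalent regrouping

print(left_to_right == regrouped)      # False
print(abs(left_to_right - regrouped))  # on the order of 1e-16
```

A difference this small is usually harmless, but in a scheme that relies on a quantity staying non-negative it can be enough to tip a value across zero. In Fortran, explicit parentheses are one way to pin the grouping of an expression.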

@rjleveque
Member Author

Sorry for the slow response to the request from @ketch in #642 (comment)

I just tried and it seems to work fine using v5.11.0, on my MacBook M1 with gfortran 13.2.0.

Running

cd pyclaw/examples/shallow_sphere
pytest

gives no error and also

python Rossby_wave.py iplot=1

now runs and produces plots.

Also, when I checked out the master branch in pyclaw, which includes PRs #729 - #732, then reinstalled and ran pytest again, it still worked.

@ketch
Member

ketch commented Oct 18, 2024

> Are the tests numerically brittle? Can we set them up to accept some value +/- some error range? For example, see the accuracy_thresholds in test_ppr_learns() here.

The tests are already set up to accept a range of values intended to handle rounding errors on different architectures. You're welcome to open a new issue to discuss this failure, which I believe is not related to the shallow_sphere segfault.


5 participants