
Surface Interpolation Function [Fixes Issue45] #52

Closed · wants to merge 8 commits
Conversation

ojeda-e
Member

@ojeda-e ojeda-e commented Jul 22, 2021

This is a starting point to solve #45

Changes in this PR:

  • Added function to calculate interpolation of z_surface (surface_interpolation). Includes docstrings.
  • Added tests of surface_interpolation.

As suggested by @orbeckst in this comment, the function included here uses NumPy. Thanks for encouraging NumPy!

Surfaces included in tests are dummy_arrays with the following characteristics:

  • size 3x3 with all values 150 and one np.nan.
  • size 4x3 with all values 150 and two np.nan.
  • size 4x4 with all values 150 and two np.nan.
  • size 3x3 with lots of np.nan. 💪🏽

I'll probably be asked to add more tests, but this is OK as a start. @lilyminium will probably give me lots of ideas. 😅

Thanks.
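For readers following along, here is a minimal, self-contained sketch of the kind of NaN interpolation discussed in this PR. The function name and the one-row-at-a-time design are illustrative assumptions, not the PR's exact code:

```python
import numpy as np

def surface_interpolation_1d(row):
    """Fill np.nan entries of a 1D array by linear interpolation (hypothetical helper)."""
    mask = ~np.isnan(row)            # True where values are defined
    index = np.arange(row.shape[0])  # positions to evaluate
    # np.interp fills the undefined positions from the defined ones
    return np.interp(index, index[mask], row[mask])

print(surface_interpolation_1d(np.array([150., np.nan, 150.])))  # [150. 150. 150.]
print(surface_interpolation_1d(np.array([100., np.nan, 200.])))  # [100. 150. 200.]
```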

@pep8speaks

pep8speaks commented Jul 22, 2021

Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 7:88: W291 trailing whitespace

Comment last updated at 2021-08-07 22:39:35 UTC

@codecov

codecov bot commented Jul 22, 2021

Codecov Report

Merging #52 (7d7f4ca) into main (8a52295) will not change coverage.
The diff coverage is 100.00%.

membrane_curvature/surface.py (outdated)
    # interpolate values in array
    interpolated_array = np.interp(index_array,
                                   np.flatnonzero(~mask_nans),
                                   array_surface[~mask_nans])
Member

Since mask_nans is used twice in inverted form, and never in its original form, perhaps it is worth it to invert at the original isnan location rather than inverting the original value twice?

Member Author

I completely missed that; good point. You mean having this instead, correct?

    mask_nans = ~np.isnan(array_surface)  # invert original

    index_array = np.arange(array_surface.shape[0])

    interpolated_array = np.interp(index_array,
                                   np.flatnonzero(mask_nans),   # not inverted
                                   array_surface[mask_nans])    # not inverted

])
def test_surface_interpolation(dummy_surface, expected_interpolated_surface):
    surface = surface_interpolation(dummy_surface)
    assert_almost_equal(surface, expected_interpolated_surface)
Member

For arrays of floats, the docstring for assert_almost_equal suggests using assert_allclose these days for better consistency.
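As a quick illustration of the suggestion (values invented): assert_allclose checks that |actual - desired| <= atol + rtol * |desired| elementwise, with rtol defaulting to 1e-7 and atol to 0:

```python
import numpy as np
from numpy.testing import assert_allclose

computed = np.array([1.0000001, 2.0000002])
expected = np.array([1.0, 2.0])

# Passes: each elementwise difference is within rtol * |expected|
assert_allclose(computed, expected, rtol=1e-6)
```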

Member Author

Thanks. I remember @orbeckst mentioned that MDAnalysis typically uses assert_almost_equal. Maybe we can have his opinion on what would be best here?

Member

@orbeckst orbeckst Aug 3, 2021

MDAnalysis uses assert_almost_equal in preference to assert_array_almost_equal. assert_allclose might be more recent than when we started using numpy tests, so MDAnalysis might just be outdated and should switch eventually. Follow @tylerjereddy's advice :-).

membrane_curvature/surface.py
membrane_curvature/surface.py (outdated)
@ojeda-e ojeda-e linked an issue Jul 25, 2021 that may be closed by this pull request
@ojeda-e
Member Author

ojeda-e commented Aug 5, 2021

The pushed changes include:

  • replaced assert_almost_equal with assert_allclose. For two assertions I set the tolerance to rtol=6, otherwise the tests fail. Not sure if you are OK with this change. (Thanks @tylerjereddy for the suggestion and @orbeckst for confirming.)
  • added interpolation to base.py and tests for grids of different sizes and with a different number of undefined values.
  • In base.py, kept the data for z_surface and interpolated_z_surface as suggested by @lilyminium.

I know base.py could be written better, but maybe we can leave that improvement for later and work on the essentials for now, if that's OK.
Hopefully I'm not missing anything relevant here. Thanks!

@ojeda-e
Member Author

ojeda-e commented Aug 7, 2021

@lilyminium sorry, it's my fault because I made a bit of a mess here.
To clarify:
def test_mean_curvature_small passes with assert_allclose. I don't know why I ended up adding rtol; it should go. (I added the comment here)

The problem is in def test_gaussian_curvature_all. The test passes with rtol=4 or atol=1e-7. The docs say assert_allclose "... compares the difference between actual and desired to atol + rtol * abs(desired)."
So probably better to have assert_allclose(k, k_test, atol=1e-7)?

@lilyminium
Member

So probably better have assert_allclose(k, k_test, atol=1e-7) ?

I think so. Even if you know yourself that the 400% difference is small because the real value is small, it communicates to future users that you expect the degree of difference to be around 0.0000001.
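A small sketch of why atol matters near zero (numbers invented for illustration):

```python
import numpy as np
from numpy.testing import assert_allclose

k = np.array([1e-8])       # actual value, very close to zero
k_test = np.array([5e-8])  # desired value, also very close to zero

# Relatively, these differ by 400%, so a pure rtol check would need an
# enormous tolerance. An absolute tolerance states the expected scale:
assert_allclose(k, k_test, atol=1e-7)  # |1e-8 - 5e-8| = 4e-8 <= 1e-7
```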

Member

@lilyminium lilyminium left a comment

The tests here are a good start! However, I think you definitely need to vary your values (maybe by doing tests on your added data files) for robust checking. I am already reasonably certain that self.results.average_interpolated_z_surface etc. are not calculated the way you want them to be, which wouldn't have been caught with the current test cases (although there aren't actually tests for those yet -- they need some too!).


def _conclude(self):
    self.results.average_z_surface = np.nanmean(self.results.z_surface, axis=0)
    self.results.average_mean = np.nanmean(self.results.mean, axis=0)
    self.results.average_gaussian = np.nanmean(self.results.gaussian, axis=0)
    if self.interpolation:
        self.results.average_interpolated_z_surface = np.nanmean(
            self.results.interpolated_z_surface[self._frame_index], axis=0)
Member

Suggested change
-            self.results.interpolated_z_surface[self._frame_index], axis=0)
+            self.results.interpolated_z_surface, axis=0)

And with the others too -- if you specify frame_index, you'll wind up only getting the mean of the last frame.
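To see why, here is a toy example (shapes and values invented): indexing with the last frame before averaging both discards every earlier frame and collapses the wrong axis:

```python
import numpy as np

# Hypothetical per-frame surfaces, shape (n_frames, n_x, n_y)
surfaces = np.stack([np.full((2, 2), 10.), np.full((2, 2), 30.)])

frame_index = 1  # what a _frame_index attribute holds after the last frame

wrong = np.nanmean(surfaces[frame_index], axis=0)  # mean over rows of frame 1 only
right = np.nanmean(surfaces, axis=0)               # mean over the frame axis

print(wrong)  # last frame only, and the (n_x, n_y) shape is lost
print(right)  # true per-bin average over frames: all 20s, shape (2, 2)
```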

Member

Do you need np.nanmean? Do you expect np.nan values in your interpolated surfaces? If not, using np.mean will reveal errors in case there ever are np.nan values.
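A minimal illustration of the point (toy arrays):

```python
import numpy as np

clean = np.array([[1., 2.], [3., 4.]])
leaky = np.array([[1., np.nan], [3., 4.]])  # an unexpected NaN slipped in

print(np.mean(clean, axis=0))     # [2. 3.]
print(np.mean(leaky, axis=0))     # second entry is nan: the error surfaces loudly
print(np.nanmean(leaky, axis=0))  # nanmean silently skips the NaN, hiding the bug
```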


def interpolation_by_array(array_surface):
    """
    Interpolates values contained in `array_surface` over axis
Member

Over which axis?

Comment on lines +155 to +165

    mask_nans = ~np.isnan(array_surface)

    # index of array_surface
    index_array = np.arange(array_surface.shape[0])

    # interpolate values in array
    interpolated_array = np.interp(index_array,
                                   np.flatnonzero(mask_nans),
                                   array_surface[mask_nans])

    return interpolated_array
Member

I'm not quite understanding this. mask_nans is 2D, with shape (n_x, n_y). index_array is 1D, with shape (n_x,). np.flatnonzero(mask_nans) and array_surface[mask_nans] will be 1D, of length between 0 and n_x * n_y. So I think that means you are only ever operating on the first row of array_surface?

I guess my question is then, why is array_surface 2D? Why not just pass in one row at a time? In addition, as it is a 2D surface, why use a 1D interpolation function? Why not a 2D one?


"""

interpolated_surface = np.apply_along_axis(interpolation_by_array, axis=0, arr=array_surface)
Member

Ok, I see that you are applying interpolation_by_array by row, then. In that case I suggest amending the documentation of interpolation_by_array to specify that the input is 1-dimensional with length n_x, instead of (n_x, n_y). Although I think doing 2D interpolation would be less arbitrary (choosing which axis is x and which is y is basically random, right?)
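A sketch of how the axis choice changes the result with np.apply_along_axis; interp_1d here is a hypothetical stand-in for interpolation_by_array:

```python
import numpy as np

def interp_1d(a):
    # hypothetical 1D linear NaN fill, mirroring the approach in this PR
    mask = ~np.isnan(a)
    idx = np.arange(a.size)
    return np.interp(idx, idx[mask], a[mask])

surface = np.array([[  0.,   100.],
                    [np.nan, 200.],
                    [100.,   300.]])

by_cols = np.apply_along_axis(interp_1d, 0, surface)  # fills down each column
by_rows = np.apply_along_axis(interp_1d, 1, surface)  # fills along each row

print(by_cols[1, 0])  # 50.0  -> midpoint of 0 and 100 in the column
print(by_rows[1, 0])  # 200.0 -> the only defined neighbour in the row is 200
```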

Comment on lines +215 to +237
(np.array(([150., 150., 150.],
[150., np.nan, 150.],
[150., 150., 150.])),
np.full((3, 3), 150.)),
# array 3x4 with all 150 and two nans
(np.array([[150., 150, 150., 150.],
[150., np.nan, np.nan, 150.],
[150., 150., 150., 150.]]),
np.full((3, 4), 150.)),
# array 4x4 with all 150 and two nans
(np.array([[150., 150, 150., 150.],
[150., np.nan, np.nan, 150.],
[150., 130., 140., 150.],
[150., 150., 150., 150.]]),
np.array([[150., 150, 150., 150.],
[150., 140., 145., 150.],
[150., 130., 140., 150.],
[150., 150., 150., 150.]])),
# array 3x3 with lots of nans
(np.array([[np.nan, np.nan, 150.],
[150, np.nan, 150.],
[np.nan, 150., np.nan]]),
np.full((3, 3), 150.))
Member

Could you please add some tests with more interesting values? e.g. in the one below, the 145 number could have come from interpolating either along the x-axis or the y-axis. In order to be diagnostic of potential current and future bugs, I'd be interested in seeing one where using 1D interpolation gives different results if it's run across the x-axis, vs. the y-axis.

(np.array([[150., 150,  150.,  150.],
               [150., np.nan, np.nan,  150.],
               [150., 130., 140.,  150.],
               [150., 150., 150.,  150.]]),
     np.array([[150., 150, 150.,  150.],
               [150., 140., 145.,  150.],
               [150., 130., 140.,  150.],
               [150., 150., 150.,  150.]])),
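One way to build such a diagnostic case (values invented): surround a NaN with different neighbours along each axis, so column-wise and row-wise 1D interpolation disagree. interp_1d is a hypothetical 1D NaN-fill helper:

```python
import numpy as np

def interp_1d(a):
    # hypothetical 1D linear NaN fill
    mask = ~np.isnan(a)
    idx = np.arange(a.size)
    return np.interp(idx, idx[mask], a[mask])

# Column neighbours of the NaN are 100/300; row neighbours are 400/400.
surface = np.array([[  0., 100.,     0.],
                    [400., np.nan, 400.],
                    [  0., 300.,     0.]])

down_cols = np.apply_along_axis(interp_1d, 0, surface)[1, 1]   # (100+300)/2 = 200.0
along_rows = np.apply_along_axis(interp_1d, 1, surface)[1, 1]  # (400+400)/2 = 400.0
```

A test expecting 200.0 here passes only if interpolation runs down columns, which makes the axis assumption explicit.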

Comment on lines +606 to +629
@pytest.mark.parametrize('dim_x, dim_y, x_bins, y_bins, dummy_array, expected_interp_surf', [
    # array 3x3 with all 150 and one nan
    (300, 300, 3, 3, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.],
                               [0., 100., 150.], [100., 100., np.nan], [200., 100., 150.],
                               [0., 200., 150.], [100., 200., 150.], [200., 200., 150.]]),
     np.full((1, 3, 3), 150.)),
    # array 3x3 with all 150 and two nans
    (300, 300, 3, 3, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.],
                               [0., 100., 150.], [100., 100., np.nan], [200., 100., 150.],
                               [0., 200., np.nan], [100., 200., 150.], [200., 200., 150.]]),
     np.full((1, 3, 3), 150.)),
    # array 3x4 with all 150 and three nans
    (300, 400, 3, 4, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.],
                               [0., 100., 150.], [100., 100., np.nan], [200., 100., 150.],
                               [0., 200., 150.], [100., 200., np.nan], [200., 200., np.nan],
                               [0., 300., 150.], [100., 300., 150.], [200., 300., 150.]]),
     np.full((1, 3, 4), 150.)),
    # array 4x4 with all 120 and many nans
    (400, 400, 4, 4, np.array([[0., 0., np.nan], [100., 0., 120.], [200., 0., 120.], [300., 0., np.nan],
                               [0., 100., 120.], [100., 100., np.nan], [200., 100., 120.], [300., 100., 120.],
                               [0., 200., 120.], [100., 200., np.nan], [200., 200., np.nan], [300., 200., 120.],
                               [0., 300., np.nan], [100., 300., 120.], [200., 300., 120.], [300., 300., np.nan]]),
     np.full((1, 4, 4), 120.))
])
Member

Here as well, tests where the expected values aren't all one number are needed to really check assumptions such as:

  • is it interpolating along the axis I think it is
  • is it interpolating in a linear way (vs. some random other polynomial)
  • is it even interpolating, or just taking the most common number?
  • are we sure it's interpolating independently for every frame, or is it interpolating across different frames?

etc.

I am, incidentally, somewhat surprised that coordinates are getting wrapped with np.nan values in them. @richardjgowers should this be happening...?

@ojeda-e ojeda-e closed this Dec 29, 2023

Successfully merging this pull request may close these issues.

Undefined values in bins occupied by insertion in membrane.
5 participants