Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __array__ method to pyopencl.Array #301

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

jni
Copy link

@jni jni commented Oct 9, 2019

Hello! We are beginning to experiment with OpenCL over at scikit-image (see e.g. scikit-image/scikit-image#4215). For us, it would be ideal np.asarray(pyopencl_array) worked transparently. The __array__ method just calls get, but enables easier compatibility within the NumPy ecosystem.

Here's the approach demonstrated with monkeypatching:

In [1]: import numpy as np
In [2]: from pyopencl.array import Array                                                                                      
In [3]: from gputools import OCLArray                                                                                         
In [4]: arr = OCLArray.from_array(np.random.random((32, 32)))                                                                 
In [5]: type(arr)                                                                                                             
Out[5]: pyopencl.array.Array
In [6]: np.asarray(arr)                                                                                                       
Out[6]: 
array([[array(0.2397247), array(0.77347383), array(0.96216983), ...,
        array(0.5608753), array(0.30588483), array(0.52376195)],
       [array(0.70506864), array(0.22409207), array(0.84294354), ...,
        array(0.21627319), array(0.72259475), array(0.36756701)],
       [array(0.37779019), array(0.23074081), array(0.31592551), ...,
        array(0.45015637), array(0.08316402), array(0.15636401)],
       ...,
       [array(0.30017419), array(0.39220295), array(0.0588779), ...,
        array(0.49011532), array(0.6003594), array(0.78047825)],
       [array(0.14872931), array(0.76511074), array(0.69126287), ...,
        array(0.0361155), array(0.07902312), array(0.12513675)],
       [array(0.80042382), array(0.16620888), array(0.28844463), ...,
        array(0.065685), array(0.9465674), array(0.56786556)]],
      dtype=object)

In [7]: Array.__array__ = Array.get                                                                                           
In [8]: np.asarray(arr)                                                                                                       
Out[8]: 
array([[0.2397247 , 0.77347383, 0.96216983, ..., 0.5608753 , 0.30588483,
        0.52376195],
       [0.70506864, 0.22409207, 0.84294354, ..., 0.21627319, 0.72259475,
        0.36756701],
       [0.37779019, 0.23074081, 0.31592551, ..., 0.45015637, 0.08316402,
        0.15636401],
       ...,
       [0.30017419, 0.39220295, 0.0588779 , ..., 0.49011532, 0.6003594 ,
        0.78047825],
       [0.14872931, 0.76511074, 0.69126287, ..., 0.0361155 , 0.07902312,
        0.12513675],
       [0.80042382, 0.16620888, 0.28844463, ..., 0.065685  , 0.9465674 ,
        0.56786556]])

For more information, see:
https://docs.scipy.org/doc/numpy/user/basics.dispatch.html#module-numpy.doc.dispatch
https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

Thank you!

@jni
Copy link
Author

jni commented Oct 15, 2019

@inducer Yikes, looks like a bunch of stuff broken in CI that I hope is independent from this change and that probably takes priority over this. However, could I get some comment from you about whether this would be a welcome addition? Thanks!

@inducer
Copy link
Owner

inducer commented Oct 15, 2019

CI says the failure is in the test you added:

_ test_array_method[<context factory for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz' on 'Portable Computing Language'>] _
Traceback (most recent call last):
  File "/home/vsts/work/1/s/test/test_array.py", line 670, in test_array_method
    arr = make_random_array(queue, np.float32, (32, 32))
  File "/home/vsts/work/1/s/test/test_array.py", line 75, in make_random_array
    return rand(queue, shape=(size,), dtype=dtype)
  File "/home/vsts/work/1/s/.miniconda3/envs/testing/lib/python2.7/site-packages/pyopencl/clrandom.py", line 765, in rand
    result = Array(queue, shape, dtype)
  File "/home/vsts/work/1/s/.miniconda3/envs/testing/lib/python2.7/site-packages/pyopencl/array.py", line 500, in __init__
    context, cl.mem_flags.READ_WRITE, alloc_nbytes)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl.Buffer(context: pyopencl._cl.Context, flags: int, size: int = 0, hostbuf: object = None)

Invoked with: <pyopencl.Context at 0x55cc47968910 on <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz' on 'Portable Computing Language' at 0x55cc329c4fa0>>, 1, (32, 32, 32, 32, 32, 32, 32, 32)

(You can click through to "view details on Azure Pipelines" from the CI results)

I'm not sure I'm a fan of this, as Array.get() can be quite expensive if you're not careful, especially if you end up sloshing data back and forth between CPU and GPU inadvertently. So I'd like to retain an obvious syntactic hint that what you're doing is not free.

@inducer
Copy link
Owner

inducer commented Oct 15, 2019

I should ask, could you explain the specific need filled by having asarray work? Is it just convenience or something deeper?

@jni
Copy link
Author

jni commented Oct 15, 2019

Ooops! Sorry, I somehow didn't find that line in the logs!

Re the utility of it: it's common and very useful to allow np.asarray on array-likes to work as expected, as it allows unifying code in pipelines that, of necessity, mix OpenCL and NumPy arrays. In my case, I'm trying to allow dispatching certain operations in scikit-image to the GPU, if OpenCL is correctly configured. Due to the prevalence of scikit-image, this must be optional. This means that I will end up peppering a lot of code with .get()s that could otherwise be totally transparent. In this code, for example, a simple shim around gputools.affine could allow me get rid of that entire if clause:

jni/scikit-image@0c6fb69#diff-28412ceac9e3406ba1842d7ba3938e44R177-R183

Yes, I understand that there are performance considerations, but a certain amount of responsibility should fall on library creators. In this case, because not everything I need is implemented in gputools, some operations have to happen in NumPy, and it'd be great if I could just feed in my pyopencl arrays transparently.

This whole endeavour is reasonably contentious, so the smaller I can keep the impact to the code, the better...

Thanks!

@inducer
Copy link
Owner

inducer commented Oct 17, 2019

  • Your code is still failing the Flake8 linter, see CI. FWIW, it is expected that your code passes CI before it gets merged.
  • What will this do for numpy_array + cl_array?
  • Will np.asarray make another copy? Or just return what __array__ returns?

@jni
Copy link
Author

jni commented Oct 18, 2019

@inducer Apologies for the churn!

Your code is still failing the Flake8 linter, see CI.

Fixed.

FWIW, it is expected that your code passes CI before it gets merged.

Of course. Thanks for the reminder. =)

The Travis failures appear to be NumPy compilation failures, which I think are unrelated? (That's what confused me with the first commit.)

What will this do for numpy_array + cl_array?

This currently fails with a ValueError in PyOpenCL:

In [6]: np.random.random((32, 32)) + arr                                                                                                                                                      
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-80631e85a40f> in <module>
----> 1 np.random.random((32, 32)) + arr

~/miniconda3/envs/cf/lib/python3.6/site-packages/pyopencl/array.py in __add__(self, other)
    920         else:
    921             # add a scalar
--> 922             if other == 0:
    923                 return self.copy()
    924             else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This failure is the same with and without this PR. More work needs to be done to support this, which I'd be interested in pursuing but probably belongs in a new PR. I think the correct way to get automatic coercion here is to raise NotImplemented, but I would have to verify this.

Will np.asarray make another copy?

Only if a specific dtype is requested that is different from the one provided by __array__. From the np.asarray docstring:

No copy is performed if the input is already an ndarray with matching dtype and order. If a is a subclass of ndarray, a base class ndarray is returned.

@d-v-b
Copy link

d-v-b commented Oct 30, 2019

👍 to this change... It would make dispatching to pyopencl arrays much easier.

@emmanuelle
Copy link

@jni if you think that the CI issue is with numpy compilation, did you try rebasing on master and see if it disappears?

@jni
Copy link
Author

jni commented Jan 10, 2020

@inducer, @emmanuelle's suggestion worked! Would you mind giving this another round of review? =) Thank you!

@jni
Copy link
Author

jni commented Feb 15, 2020

@inducer do you have any lingering doubts about this? If it doesn't fit with your vision for pyopencl, please let me know, as I can potentially get this done from our end with subclassing or monkey-patching. I just prefer to contribute to the libraries I find useful — but I don't mean to presume what should go in!

Thanks again!

Base automatically changed from master to main March 8, 2021 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants