The array API standard makes it feasible to take advantage of increased interoperability between array libraries. A single array-based code path can be written that is agnostic to the underlying array type, i.e. the arrays can come from NumPy, CuPy, Dask, JAX, or any other array-API-compliant library. This makes it simple to take code organised around NumPy (single-threaded CPU) and deploy it in multithreaded CPU (e.g. Dask) or alternative-architecture (e.g. CUDA GPU) scenarios. It also allows a choice between lazy evaluation (Dask, JAX) and eager evaluation (NumPy), and, in principle, just-in-time compilation (JAX).
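As a rough sketch of the lazy vs eager point (assuming Dask is installed as an optional extra; the function name total is just for illustration), the same array-API code evaluates immediately under NumPy but only builds a task graph under Dask until a result is requested:

import dask.array as da
import numpy as np
from array_api_compat import array_namespace

def total(x):
    xp = array_namespace(x)  # resolve the namespace of the incoming array
    return xp.sum(x)

print(total(np.ones((1000, 1000))))   # eager: computed immediately
lazy = total(da.ones((1000, 1000)))   # lazy: only a task graph so far
print(lazy.compute())                 # evaluation happens here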
Details on its aims, implementation and stakeholders can be found here: Array API
The array_api_compat library has been developed to act as a common interface that simplifies the practical implementation of this intent. It is a pure Python wrapper providing aliases and helper functions. Array API compat
There aren't many alternatives to this approach that capture both speed-up through CPU parallelisation and GPU support while maintaining an interface similar to pure NumPy.
Dependencies:
We wouldn't need to add any extra array libraries as hard dependencies; they can be used on an 'if available' basis. NumPy should remain the default array-processing library.
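Something like the following could keep NumPy as the only hard dependency while exposing the others opportunistically (the _try_import helper, the OPTIONAL_BACKENDS registry and get_backend are hypothetical names, just to show the shape of the idea):

import importlib
import array_api_compat.numpy as np  # NumPy stays a hard dependency

def _try_import(name):
    # Return the compat module if the backing library is installed, else None.
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# Optional backends, used only when the user has them installed.
OPTIONAL_BACKENDS = {
    "torch": _try_import("array_api_compat.torch"),
    "cupy": _try_import("array_api_compat.cupy"),
    "dask": _try_import("array_api_compat.dask.array"),
}

def get_backend(name="numpy"):
    if name == "numpy":
        return np
    module = OPTIONAL_BACKENDS.get(name)
    if module is None:
        raise ValueError(f"Backend '{name}' is not installed or not supported")
    return module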
Challenges:
Identifying which parts of the code are naturally agnostic to where they are processed, and which are constrained to the CPU (e.g. file I/O). We might need to add checks that move objects in memory to the required device when needed (e.g. if processing on a CUDA GPU is requested). The array_api_compat library adds .device and .to_device functionality to make this simple (hopefully).
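For the device question, something along these lines might work (matmul_on is a hypothetical wrapper; the device() and to_device() helpers are the ones array_api_compat provides):

from array_api_compat import array_namespace, device, to_device

def matmul_on(x, y, target=None):
    # Hypothetical wrapper: move both inputs to a requested device
    # (e.g. "cuda" for torch/cupy arrays) before dispatching, so the
    # caller does not have to care where the arrays currently live.
    if target is not None:
        x = to_device(x, target)
        y = to_device(y, target)
    xp = array_namespace(x, y)
    return xp.matmul(x, y)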
I've added a code snippet below to demo how this can be implemented and how it leads to array-type interoperability. At its core is the namespace query, which works out which array implementation has been passed to the function and then uses the matching implementation of the required operation (matmul in this case), i.e. the NumPy version, the Dask version, etc. Simple scalar operations on arrays or basic element-wise multiplication can be done as normal, I think.
Any changes we make should be fully backwards compatible, as they extend the functionality of each function or class rather than modifying it. All tests can remain the same.
from array_api_compat import array_namespace
import array_api_compat.numpy as np
import array_api_compat.torch as torch
from numpy.typing import NDArray
from array_api_compat.torch import Tensor
from typing import Union

APIArray = Union[NDArray, Tensor]

def function(x: APIArray, y: APIArray):
    # determine the namespace of the incoming arrays
    xp = array_namespace(x, y)
    # now use xp as the array library namespace
    return xp.matmul(x, y)

def set_backend(backend):
    if backend == "numpy":
        return np
    elif backend == "torch":
        return torch
    else:
        raise ValueError(f"Unsupported backend: {backend}")

backend = set_backend("numpy")
a = backend.asarray([(1, 2, 3), (4, 5, 6)])
b = backend.asarray([(1, 2), (3, 4), (5, 6)])
c = function(a, b)
print(c)
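Switching backends needs no change to function itself; assuming torch is installed, the same call simply returns a torch tensor:

backend = set_backend("torch")
a = backend.asarray([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
b = backend.asarray([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)])
print(function(a, b))  # a torch.Tensor this time, with the NumPy path untouched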