-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33801: [Python] Generate on the fly Python Extension Types wrapping C++ Extension Types #34483
Conversation
Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍 |
Would it be possible to re-open this issue? I'm awaiting review from Arrow maintainers here. |
Yes of course, the cleaning was done exactly to surface the prs that still needed attention |
@sjperkins sorry about closing all your PRs, and about the slow review here. I will try to get back to this one and #34469 shortly. Other question: you couldn't reopen this PR yourself? I mean, did the GitHub UI allow that? Because if only committers can reopen PRs, that would be useful to know because the messaging should be different then if we would do some auto-closing in the future. |
No problem! I assumed that the upcoming Arrow 12.0.0 release was important.
Yes, the UI didn't seem to give me the option to reopen the PR. |
But I do have the ability to close the PR.. |
OK, this does seem possible if I close the PR. Apologies I responded late yesterday evening, so I may have conflated the force push in #33805 or missed the button in this issue. |
@jorisvandenbossche Regarding this issue and #33997, they're somewhat exploratory because they dynamically generate Python types. I'm interested in whether these strategies would be a useful way of exposing pure C++ Extension Types in Python. For instance, def to_numpy(self):
....
array_cls = uuid_type.__arrow_ext_class__()
array_cls.to_numpy = to_numpy However, perhaps this should be done through the existing PyExtensionType mechanisms (both Python and C++)? |
Some further context to motivate for this: It would be useful to efficiently convert nested FixedSizeListArray's into numpy arrays. The default produces deeply nested numpy arrays with This may also be relevant to #8510 which, from imperfect memory, uses FixedSizeListArray's to represent Tensors. |
Can you reopen it now? (it might depend on whether you closed it yourself) |
Ah I cannot. I think it does depend on the closer. |
OK, that's good to know, thanks for checking! |
I am not a huge fan of the idea of creating classes on the fly .. Also, does this give something more useful than wrapping it just in a base class as is done now? (because right now this generated class also doesn't have any extension-type specific logic?)
Yeah, so if we have a way to register a python class to use as the type class for an extension type implemented in C++ (#33997), you can override the
Yes, that one is close to being merged, and then we can expose this in python, which might be a useful exercise to see how this goes / what we can learn from this (but of course it's an internal one, so we can hard code support for it) |
Yes, I understand the suggestion is a bit exotic. I think what it gives is the ability to associate specific Python classes with C++ Extension Types/Arrays and attach (mainly)
Thanks for pointing this out, this gives me a broader perspective on what would be useful.
I'll think about this a bit more. let me know if there're approaches you think would be useful to experiment with. |
Closing because the use of dynamic types is a bit too exotic. |
Alternative implementation to #34469
Rationale for this change
Explained in greater detail in #33997, but my summary follows:
ExtensionArray
andExtensionScalar
types are associated with C++ Extensions exposed throughBaseExtensionType
, rather than hand-coded Python types (which would be laborious to generate).What changes are included in this PR?
get_cpp_extension_type
is called a Python wrapper CppExtensionType is created and cached, along with Python ExtensionArrays and ExtensionScalarsCppExtensionType.__arrow_ext_serialize__()
andCppExtensionType.__arrow_ext_deserialize__()
defer toCExtensionType.Serialize()
andCExtensionType.Deserialize()
respectively.Are these changes tested?
__arrow_ext_class__()
and__arrow_ext_scalar_class__()
are tested__arrow_ext_serialize__()
and__arrow_ext_deserialize__()
are not tested yet. I'd appreciate feedback on whether the general approach is useful going forward before adding tests.Are there any user-facing changes?