-
Not ideal. Option 1 breaks symmetry, as unpacking would not yield the same object back. Option 2 enables all manner of bad things. For unpacking, it's not as bad.
-
I think msgpack is a JSON-like format, not a format optimized for large data. I am happy when I see people loving msgpack, but I feel it is the wrong choice in some cases.
-
True. Even if you were to add padding bytes to the message, there is no way to ensure that the message itself is aligned.
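To illustrate why padding alone is not enough: a sender can only pad relative to the start of the message, but whether an element ends up aligned depends on the absolute address of the buffer the receiver happens to hold. The sketch below (the helper alignment_padding is hypothetical, not part of msgpack or numpy) places the same uint32 data at two offsets into one buffer and asks numpy whether each view is aligned.

```python
import numpy as np


def alignment_padding(address: int, alignment: int) -> int:
    """Hypothetical helper: bytes to skip so address + padding is a multiple of alignment."""
    return (-address) % alignment


# Stand-in for a received message. Its address is chosen by the allocator,
# not by whoever serialised the data.
buf = bytearray(64)
base = np.frombuffer(buf, dtype=np.uint8).ctypes.data

# The padding only works because it is computed from the absolute address;
# shifting by a single byte breaks alignment again.
aligned_off = alignment_padding(base, np.dtype(np.uint32).alignment)
misaligned_off = aligned_off + 1

a = np.frombuffer(buf, dtype=np.uint32, count=4, offset=aligned_off)
m = np.frombuffer(buf, dtype=np.uint32, count=4, offset=misaligned_off)
print(a.flags.aligned, m.flags.aligned)  # True False
```

Note that numpy still builds the misaligned view without copying; it just records that the data is not aligned.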
-
It's true that there are no alignment guarantees, but numpy doesn't prevent users from creating arrays from memory addresses that are not aligned with the array data type. Sure, performance afterwards might suffer (depending on how much of the data you will actually read/process), but at least you can save the initial copy.

import numpy as np

data = b'\x00\x00\x00\x00\xff\xff\xff\xff'
# view over the whole buffer as two native-endian uint32 values
aligned = np.ndarray(buffer=data, dtype=np.uint32, shape=(2,))
# view starting one byte in: bytes 00 00 00 ff read as a big-endian uint32
misaligned = np.ndarray(buffer=memoryview(data)[1:], dtype='>u4', shape=(1,))
print(aligned[0])
# 0
print(misaligned[0])
# 255
# the data pointers differ by one byte (absolute addresses vary between runs)
print(f"address: 0x{aligned.__array_interface__['data'][0]:02x}")
# address: 0x7fa40fb93a10
print(f"address: 0x{misaligned.__array_interface__['data'][0]:02x}")
# address: 0x7fa40fb93a11
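The same zero-copy view can also be built with np.frombuffer and its offset argument instead of slicing a memoryview. A short sketch, reusing the bytes from the example above, that also checks no copy was made via np.shares_memory and the array flags:

```python
import numpy as np

data = b'\x00\x00\x00\x00\xff\xff\xff\xff'

# Start one byte into the buffer and read a single big-endian uint32.
misaligned = np.frombuffer(data, dtype='>u4', count=1, offset=1)

# A uint8 view over the whole buffer, used to check for overlapping memory.
src = np.frombuffer(data, dtype=np.uint8)

print(np.shares_memory(misaligned, src))  # True: both view the bytes object
print(misaligned.flags.owndata)           # False: the array borrows its data
print(misaligned.flags.writeable)         # False: bytes is a read-only buffer
print(int(misaligned[0]))                 # 255
```

If the misaligned access turns out to be a bottleneck, np.ascontiguousarray(misaligned) or misaligned.copy() produces an aligned copy, so the copy is paid only when it is actually needed.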
-
Hi,
This is a bit related to #547. Some weeks ago I spent a bit of time investigating the deserialisation side of msgpack, specifically in the context of serialising xarrays, or more generally numpy arrays. At first I expected a zero-copy mechanism to be available when using unpackb with a buffer, but I quickly realised that during deserialisation the raw bytes are inevitably copied into a bytes object, which is what users get back.

I experimented locally with adding a new bytes_memoryview option to the unpackb method to get zero-copy numpy array deserialisation, and it works as expected: the numpy arrays point directly to the memory underlying the input to unpackb, so deserialisation has near-zero cost. This also requires a change to msgpack-numpy, but on the other hand we could use our own default/object_hook functions and not depend on msgpack-numpy.

Questions:
Unpacker wouldn't benefit from this, or at least not without heavy refactoring, is that correct? My current diff contains a small change for Unpacker, but in retrospect I realise I don't need it, and it would probably confuse people.

See below for the benchmark code, results in Python 3.10 and 3.11 (noisy, this is my work laptop), and the current diff.
After producing these plots, I realised that the bottom-left panel isn't relevant, as it's mostly a copy of the top-left panel, but hopefully it doesn't distract too much from the results shown.
Results in Python 3.10
Results in Python 3.11
Current diff: rtobar@76b2888
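The difference between getting a bytes copy back and getting a view onto the input buffer can be sketched without msgpack itself. This is not the bytes_memoryview option from the diff, nor how msgpack's unpackb works internally: it is a hand-written decoder, under the assumption that the message is a single top-level msgpack bin object (bin 8/16/32). The helpers pack_bin and unpack_bin_view are hypothetical names. The decoder returns a memoryview slice instead of bytes, and np.frombuffer wraps that slice, so the array points straight into the input buffer.

```python
import struct

import numpy as np

# msgpack "bin" family: marker byte, then the payload length as a big-endian
# unsigned integer (bin 8 -> 1 byte, bin 16 -> 2 bytes, bin 32 -> 4 bytes).
_BIN_LENGTH_FORMATS = {0xC4: ">B", 0xC5: ">H", 0xC6: ">I"}


def pack_bin(payload: bytes) -> bytes:
    """Hypothetical helper: encode payload as one msgpack bin object (assumes len < 2**32)."""
    n = len(payload)
    if n < 2**8:
        header = struct.pack(">BB", 0xC4, n)
    elif n < 2**16:
        header = struct.pack(">BH", 0xC5, n)
    else:
        header = struct.pack(">BI", 0xC6, n)
    return header + payload


def unpack_bin_view(buf) -> memoryview:
    """Hypothetical helper: decode one top-level bin object, returning a view (no copy)."""
    view = memoryview(buf)
    fmt = _BIN_LENGTH_FORMATS.get(view[0])
    if fmt is None:
        raise ValueError(f"not a msgpack bin object: 0x{view[0]:02x}")
    (n,) = struct.unpack_from(fmt, view, 1)
    start = 1 + struct.calcsize(fmt)
    if len(view) < start + n:
        raise ValueError("truncated message")
    return view[start:start + n]


packed = bytearray(pack_bin(bytes(range(8))))
payload = unpack_bin_view(packed)
arr = np.frombuffer(payload, dtype=np.uint8)  # wraps the view, still no copy

packed[-1] = 0xFF  # mutate the *input* buffer...
print(arr[-1])     # ...and the array sees it: 255
```

The trade-off of returning views is visible in the last two lines: the array keeps the input buffer alive and reflects any later changes to it, which a copied bytes object would not.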