memoryview for raw bytes elements #550

rtobar · 2023-08-01T07:29:12Z

Hi,

Somewhat related to #547: some weeks ago I spent a bit of time investigating the deserialisation side of msgpack, specifically in the context of serialising xarrays, or more generally numpy arrays. At first I expected a zero-copy mechanism to be available when using unpackb with a buffer, but I quickly realised that during deserialisation raw bytes are inevitably copied into a new bytes object, which is what users get back.

I experimented locally with adding a new bytes_memoryview option to unpackb to get zero-copy numpy array deserialisation, and I see it working as expected: numpy arrays point directly to the memory underlying the input to unpackb, so deserialisation has near-zero cost. This also requires a change to msgpack-numpy, but OTOH we could use our own default/object_hook functions and not depend on msgpack-numpy.
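The difference between the copying and the zero-copy path can be illustrated with the standard library alone. This is only a sketch of the underlying idea, not of the patch itself: the `message` buffer and its 4-byte header are made up for the illustration and are not msgpack's wire format.

```python
# Stand-in for a serialised message: a 4-byte header followed by a payload.
# (Illustrative layout only; this is not msgpack's wire format.)
message = bytearray(b"HEAD" + bytes(range(8)))

# Copying path: bytes(...) creates a new, independent object,
# much like the bytes objects that unpackb currently hands back.
copied = bytes(message[4:])

# Zero-copy path: a memoryview slice refers to the same underlying memory.
view = memoryview(message)[4:]

# Mutating the original buffer is visible through the view but not the copy.
message[4] = 0xFF
print(copied[0])  # 0
print(view[0])    # 255
```

An array built on top of such a view (for example with np.frombuffer) would share the input's memory in the same way, which is where the near-zero deserialisation cost comes from.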

Questions:

1. Given this speedup, would there in general be willingness to accept a patch to allow for this use case?
2. Unpacker wouldn't benefit from this, or at least not without heavy refactoring, is that correct? My current diff contains a small change for Unpacker, but in retrospect I realise I don't need it, and it would probably confuse people.

See below for the benchmark code, results in Python 3.10 and 3.11 (noisy, this is my work laptop), and the current diff.

import msgpack
import msgpack_numpy as mnp
import numpy as np
import xarray as xa
import timeit
import functools
import seaborn as sns
import pandas as pd
import matplotlib.pyplot

mser = lambda x: msgpack.packb(x, default=mnp.encode)
packer = msgpack.Packer(default=mnp.encode, autoreset=False)
def mser_packer(x):
    packer.reset()
    packer.pack(x)
    return packer
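# NOTE: bytes_memoryview is the experimental unpackb option from the diff linked below
# (it is not part of released msgpack); its value comes from the global set in run_all().
munser = lambda x: msgpack.unpackb(x, object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)
munser_packer = lambda x: msgpack.unpackb(x.getbuffer(), object_hook=mnp.decode, bytes_memoryview=bytes_memoryview)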

all_benchmarks = {
    "write_only": {
        'msgpack.packb': [mser],
        'msgpack.Packer': [mser_packer],
    },
    "write_read": {
        'msgpack.packb + msgpack.unpackb': [mser, munser],
        'msgpack.Packer + msgpack.unpackb(Packer.getbuffer())': [mser_packer, munser_packer],
    }
}

def benchmark_for_sizes(functions, nitems):
    results = []
    for nitem in nitems:
        arr = np.random.rand(nitem)
        xarr = xa.Dataset({"x": arr, "y": arr + 30})
        size = xarr.nbytes
        print(f"  {nitem=}, {size=}")
        timer = timeit.Timer('functools.reduce(lambda v, f: f(v), functions, xarr.to_dict("array"))', setup="import functools", globals=locals())
        n_executions, total_duration = timer.autorange()
        duration = total_duration / n_executions
        results.append((size, duration))
    return results


def run_benchmarks(nitems, benchmarks):
    results = {}
    for name, functions in benchmarks.items():
        print(f"Benchmarking {name}")
        results[name] = benchmark_for_sizes(functions, nitems)
    flat_results = list((name, size, duration) for name, values in results.items() for size, duration in values)
    df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
    df['Size [MB]'] = df['Size'] / 1024 / 1024
    df['Speed [MB/s]'] = df['Size'] / 1024 / 1024 / df["Duration"]
    return df

def run_all():
    global bytes_memoryview
    sns.set_theme()
    nitems = list(range(10000000, 2000000, -120000))
    dfs = []
    for use_memoryview in (False, True):
        bytes_memoryview = use_memoryview
        for group, benchmarks in all_benchmarks.items():
            df = run_benchmarks(nitems, benchmarks)
            df["Group"] = group
            df["bytes_memoryview"] = bytes_memoryview
            dfs.append(df)

    df = pd.concat(dfs)
    sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]', kind='line', hue='Benchmark', col='Group', row="bytes_memoryview")
    matplotlib.pyplot.savefig('benchmark_results.png')

if __name__ == '__main__':
    run_all()

After producing these plots, I realised that the bottom-left panel isn't relevant, as it's mostly a copy of the top-left panel, but hopefully it doesn't distract too much from the results being shown.

Results in Python 3.10

Results in Python 3.11

Current diff: rtobar@76b2888

jfolz · 2023-08-01T12:57:28Z

bytes_memoryview as an option seems too broad to me. It's similar in concept to the raw option of the Unpacker, where suddenly all your strings are now bytes. Really, what we want is to avoid copying large objects, not to change how bytes are handled.
msgpack-numpy was always a bit of a crude hack (sorry Lev ^^) to avoid double packing of arrays. Adding types not in the spec is exactly what ext types are for, but it's slow with the current API: you first have to pack your array bytes and metadata, and then msgpack packs the result again as an ext type. Currently both approaches require copying data, and the only way to fix it that I can think of is to allow ExtType to either:

1. contain objects that need to be packed, or
2. directly write to the packer's buffer.

Not ideal. Option 1 breaks symmetry, as unpacking would not yield the same object back. Option 2 enables all manner of bad things.

For unpacking it's not as bad. ExtType could optionally be given a memoryview instead of bytes and we can unpack from there. Ignoring memory alignment (which hopefully numpy deals with by itself), this should enable unpacking arrays without copying.
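For reference, here is a minimal sketch of an ext-type round trip with the API as it currently stands. The type code 42 and the helper names are hypothetical, and the example is restricted to 1-D float64 arrays for brevity (a real implementation would also carry dtype and shape metadata).

```python
import msgpack
import numpy as np

ARRAY_EXT_CODE = 42  # hypothetical, application-chosen ext type code


def encode_array(obj):
    """default hook: wrap a 1-D float64 array in an ExtType."""
    if isinstance(obj, np.ndarray) and obj.dtype == np.float64 and obj.ndim == 1:
        # tobytes() makes a first copy; packb then copies those bytes
        # again into its own output buffer.
        return msgpack.ExtType(ARRAY_EXT_CODE, obj.tobytes())
    raise TypeError(f"cannot serialise {type(obj)!r}")


def decode_array(code, data):
    """ext_hook: data arrives as a bytes object copied out of the input."""
    if code == ARRAY_EXT_CODE:
        # frombuffer itself does not copy, but it wraps the bytes object
        # created by the unpacker rather than the original input buffer.
        return np.frombuffer(data, dtype=np.float64)
    return msgpack.ExtType(code, data)


original = np.arange(4, dtype=np.float64)
packed = msgpack.packb({"x": original}, default=encode_array)
restored = msgpack.unpackb(packed, ext_hook=decode_array)["x"]

print(np.array_equal(original, restored))
print(restored.flags.writeable)  # read-only: backed by an immutable bytes object
```

If ext_hook were handed a memoryview over the input instead, the same np.frombuffer call would wrap the input memory directly and the copy on the unpacking side would disappear.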

rtobar Aug 2, 2023
Author

Thanks @jfolz for all those insights!

> bytes_memoryview as an option seems too broad to me. It's similar in concept to raw option of the Unpacker, where suddenly all your strings are now bytes

Exactly, I modelled it on that for this quick experiment, but I do realise it's a big hammer and not necessarily the best way to go about this, hence I wanted to open up the discussion.

> Adding types not in the spec is exactly what ext types are for, but it's slow with the current API. You have to first pack your array bytes & metadata, and then msgpack packs it again as an ext type.

Thanks, I hadn't looked into that corner of the code yet, but a quick glance confirmed what you just said here.

I'd say there's a third option though: have a new, enriched class (ExtType2, for the sake of this text) for providing external types (ExtType is a simple namedtuple ATM) that, instead of simply providing a code and data, can: a) calculate the size of its serialisation, and b) be given a Packer object, so that ExtType2 can issue its own msgpack_pack_ext / msgpack_pack_raw_body calls (maybe offered by Packer as part of its external interface for ExtType2). That way ExtType2 is in control of what is written into the buffer, but not of the buffer allocation/resizing itself.
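A rough, standard-library-only sketch of that idea follows. Everything in it is hypothetical: the class name ArrayExt, the methods packed_size and write_to, and the use of a plain bytearray as a stand-in for the Packer's buffer are assumptions made for illustration, not existing msgpack API. For simplicity it always emits the ext 32 layout from the msgpack spec (marker 0xc9, 4-byte big-endian length, 1-byte type), whereas a real packer would choose the smallest ext variant.

```python
import struct
from array import array


class ArrayExt:
    """Hypothetical 'ExtType2'-style object: knows its size and writes itself."""

    def __init__(self, code, payload):
        self.code = code
        # View the payload as raw bytes without copying it.
        self.payload = memoryview(payload).cast("B")

    def packed_size(self):
        # ext 32 header: 1 marker byte + 4 length bytes + 1 type byte.
        return 6 + self.payload.nbytes

    def write_to(self, out):
        # Header first, then the body straight from the source buffer,
        # so the payload is copied exactly once.
        out.extend(struct.pack(">BIb", 0xC9, self.payload.nbytes, self.code))
        out.extend(self.payload)


values = array("d", [1.0, 2.0, 3.0])
ext = ArrayExt(42, values)

out = bytearray()  # stand-in for the packer-owned buffer
ext.write_to(out)
print(len(out) == ext.packed_size())
```

Here the object decides what bytes are written, while the owner of `out` keeps control of allocation and resizing, which is the split of responsibilities described above.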

I might have a play with this idea and post back with any findings.

> For unpacking it's not as bad. ExtType could optionally be given a memoryview instead of bytes and we can unpack from there

Yes, that looks feasible. It might require an option though to select whether bytes or a memoryview is given, for backwards compatibility with older ext_hooks.

methane · 2023-08-02T07:31:26Z

Maintainer

I think msgpack is a JSON-like format, not a format optimized for large data.
There is no alignment. It means msgpack is not designed for mmap or zero-copy.

I am happy when I see people loving msgpack, but I feel it is sometimes the wrong choice.

jfolz · 2023-08-03T16:30:44Z

> There is no alignment. It means msgpack is not designed for mmap or zero-copy.

True. Even if you were to add padding bytes to the message, there is no way to ensure that the message itself is aligned.
While true zero-copy for arrays cannot be achieved, I still think the current API around ext types is not ideal and this feature of the format is underused as a result.
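The alignment point can be made concrete by looking at header sizes alone. The helper below is a hypothetical illustration that follows the bin 8/16/32 layouts in the msgpack spec; it shows that the payload offset within a message depends on the payload length, before even considering where the message itself sits in memory.

```python
def bin_header_size(payload_len):
    """Bytes used by the msgpack bin 8/16/32 header for a given payload length."""
    if payload_len < 2**8:
        return 2  # 0xc4 marker + 1-byte length
    if payload_len < 2**16:
        return 3  # 0xc5 marker + 2-byte length
    if payload_len < 2**32:
        return 5  # 0xc6 marker + 4-byte length
    raise ValueError("payload too large for a msgpack bin")


# Offset of the payload when the bin is the only element in the message.
for n in (8, 1024, 1_000_000):
    offset = bin_header_size(n)
    print(n, offset, offset % 8 == 0)
```

None of these offsets is a multiple of 8, so a float64 payload would only be aligned if the message happened to start at a matching odd address, which the format has no way to request.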

rtobar · 2023-08-05T05:51:12Z

Author

> There is no alignment. It means msgpack is not designed for mmap or zero-copy.

It's true that there are no alignment guarantees, but numpy doesn't prevent users from creating arrays from memory addresses that are not aligned with the array data type. Sure, performance afterwards might suffer (depending on how much of the data you actually read/process), but at least you can save the initial copy.

data = b'\x00\x00\x00\x00\xff\xff\xff\xff'
aligned = np.ndarray(buffer=data, dtype=np.uint32, shape=(2,))
misaligned = np.ndarray(buffer=memoryview(data)[1:], dtype='>u4', shape=(1,))
print(aligned[0])
# 0
print(misaligned[0])
# 255
print(f"address: 0x{aligned.__array_interface__['data'][0]:02x}")
# address: 0x7fa40fb93a10
print(f"address: 0x{misaligned.__array_interface__['data'][0]:02x}")
# address: 0x7fa40fb93a11
