
Convenience functions to insert / fetch when an attach field is in table definition #1156

Open
MaxFBurg opened this issue Mar 21, 2024 · 1 comment


@MaxFBurg
Contributor

Feature Request

Problem

When inserting into a table that has a field such as result : attach@minio, the table's insert method expects a file path for that field. Similarly, fetch downloads the file and returns its path. This is often inconvenient because (i) the data stored in the file is usually needed as an object in the Python script being executed, and (ii) the saved or downloaded files remain on local storage even after the script has terminated.

Requirements

Possible solution: Introduce a parameter to insert that automatically saves the data to be inserted to a file, inserts that file into the table, and then removes it. Similarly, fetch could download the file, load its contents, and return the loaded data to the Python script.
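The proposed insert/fetch behavior can be sketched independently of DataJoint. The helpers pickled_rows and load_pickled below are hypothetical names (not part of DataJoint), and the sketch assumes the attach values should be serialized with pickle. pickled_rows writes each attach value to a temporary .pkl file and yields copies of the rows holding those paths, deleting the files when the with-block exits; load_pickled reads a downloaded file and then removes it from local storage.

```python
import os
import pickle
import tempfile
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Sequence


@contextmanager
def pickled_rows(
    rows: Sequence[Dict[str, Any]], attach_keys: Sequence[str]
) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical helper: yield row copies whose attach attributes hold paths to pickle files.

    The files live in a temporary directory that is deleted when the with-block exits.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        out = []
        for row in rows:
            row = dict(row)  # copy, so the caller's dict is not modified
            for ak in attach_keys:
                # a unique name per value avoids collisions between rows and attributes
                path = os.path.join(temp_dir, f"{uuid.uuid4().hex}.pkl")
                with open(path, "wb") as f:
                    pickle.dump(row[ak], f)
                row[ak] = path
            out.append(row)
        yield out


def load_pickled(path: str) -> Any:
    """Hypothetical helper: load a downloaded pickle file, then delete it."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    os.remove(path)
    return obj


# Assumed usage with a table that has `result : attach@minio` (not runnable without a database):
# with pickled_rows([{"id": 1, "result": {"a": [1, 2]}}], ["result"]) as rows:
#     MyTable.insert(rows)
# obj = load_pickled((MyTable & {"id": 1}).fetch1("result"))
```

Using a context manager ties the lifetime of the temporary files to the insert call, which addresses point (ii) of the problem on the insert side.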

Justification

See the Problem section.

Alternative Considerations

As a workaround, I currently use an AttachMixin, i.e., my table is defined as class MyTable(AttachMixin, dj.Computed). The mixin could serve as the basis for the feature I suggested, although it would need some improvement.

import os
import pickle
import tempfile
import uuid
from typing import Any, Dict, Optional, Sequence

import numpy as np


class AttachMixin:
    """Pickle Python objects into attach attributes on insert and unpickle them on fetch."""

    def attach_insert(self, keys: Sequence[Dict[str, Any]], attach_keys: Sequence[str]) -> None:
        if isinstance(attach_keys, str):
            raise ValueError("attach_keys must be a list of attribute names, not a single string")

        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            rows = []
            for key in keys:
                row = dict(key)  # copy, so the caller's dicts are not modified
                for ak in attach_keys:
                    # a unique file name per value avoids collisions between rows
                    path = os.path.join(temp_dir, uuid.uuid4().hex + ".pkl")
                    with open(path, "wb") as f:
                        pickle.dump(row[ak], f)
                    row[ak] = path
                rows.append(row)

            # insert while the files still exist; the directory is deleted on exit
            self.insert(rows)

    def attach_insert1(self, key: Dict[str, Any], attach_keys: Sequence[str]) -> None:
        self.attach_insert([key], attach_keys)

    def attach_fetch(self, *attrs: str, key: Optional[Dict[str, Any]] = None, **kwargs) -> Any:
        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            ret = (self & (key or {})).fetch(*attrs, download_path=temp_dir, **kwargs)
            # unpickle before the temporary directory (and the downloads) is deleted
            return self._load(ret)

    def attach_fetch1(self, *attrs: str, key: Optional[Dict[str, Any]] = None, **kwargs) -> Any:
        # fetch1 itself raises a DataJointError unless exactly one row matches
        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            ret = (self & (key or {})).fetch1(*attrs, download_path=temp_dir, **kwargs)
            return self._load(ret)

    def _load(self, value: Any) -> Any:
        """Recursively replace paths to downloaded .pkl files with their unpickled contents."""
        if self._is_pkl_path(value):
            with open(value, "rb") as f:
                return pickle.load(f)
        if isinstance(value, dict):  # fetch(as_dict=True) rows, or fetch1() without attributes
            return {k: self._load(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):  # e.g. one array per attribute in fetch("a", "b")
            return type(value)(self._load(v) for v in value)
        if isinstance(value, np.ndarray) and value.dtype.names is None:
            out = np.empty(value.shape, dtype=object)
            for idx, v in np.ndenumerate(value):
                out[idx] = self._load(v)
            return out
        # other values (e.g. primary-key attributes, structured arrays) are returned unchanged
        return value

    def _is_pkl_path(self, value: Any) -> bool:
        return (
            isinstance(value, (str, os.PathLike))
            and str(value).endswith(".pkl")
            and os.path.isfile(value)
        )

Related

These issues might be (loosely) related:
#1109
#1099

If you think such a feature would be a helpful addition to DataJoint, I would be happy to help implement it.

@ttngu207
Contributor

ttngu207 commented Apr 9, 2024

I think you're suggesting some sort of user-provided functions that run on insert and on fetch for the attach type.
This is very much the idea of DataJoint's AttributeAdapter feature - see here

With that feature, you can define a new DataJoint datatype (e.g., attach_pkl or something similar).
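As a rough sketch of that suggestion, assuming the classic dj.AttributeAdapter interface (an attribute_type string plus put and get methods) and the minio store named in the issue, a pickle-backed attach type might look like the following. The class name PickleAttachAdapter and the type name attach_pkl are hypothetical, and depending on the DataJoint version, adapted types may need to be enabled explicitly, so check the documentation for your release. The import fallback exists only so the sketch runs without DataJoint installed.

```python
import os
import pickle
import tempfile
import uuid
from typing import Any

try:
    import datajoint as dj

    _AdapterBase = dj.AttributeAdapter
except ImportError:  # fallback so the sketch runs without DataJoint installed
    _AdapterBase = object


class PickleAttachAdapter(_AdapterBase):
    """Hypothetical adapter storing any picklable object as an attach@minio file."""

    attribute_type = "attach@minio"  # assumed store name, taken from the issue

    def put(self, obj: Any) -> str:
        # DataJoint uploads the file at the returned path during insert.
        # Limitation: this file is not removed afterwards; it stays in the system temp directory.
        path = os.path.join(tempfile.gettempdir(), f"{uuid.uuid4().hex}.pkl")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        return path

    def get(self, path: str) -> Any:
        # `path` is the locally downloaded attachment: load it, then delete it.
        with open(path, "rb") as f:
            obj = pickle.load(f)
        os.remove(path)
        return obj


# Registration sketch (requires a configured "minio" store and a database connection):
# schema = dj.Schema("my_schema", context={"attach_pkl": PickleAttachAdapter()})
#
# @schema
# class MyTable(dj.Manual):
#     definition = """
#     id : int
#     ---
#     result : <attach_pkl>
#     """
```

One trade-off of this design: get can delete the downloaded file immediately, but put can only return a path, so the pickle written for an insert remains in the temp directory after the upload.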

See some examples here:
