Two-level indexing for parallelization of I/O #6

Open
cphyc opened this issue Feb 19, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

cphyc commented Feb 19, 2024

At the moment, yt employs a global index to locate data. For many datasets (incl. RAMSES), data are spread across files with a mapping that relies on a space-filling curve (e.g. the Hilbert curve or the Z-curve).

This mapping is typically lightweight and could be used as an intermediate coarse-grained index (similar to what has been done for SPH, as far as I understand).
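To make the idea concrete, here is a minimal sketch of such a coarse lookup. It is not yt or RAMSES code: it uses the Z-curve (Morton order) rather than the Hilbert curve purely for brevity, and it assumes each file owns a contiguous range of keys on a coarse grid of the unit cube. The names `morton3`, `files_touching_box` and `first_keys` are hypothetical.

```python
from bisect import bisect_right
from itertools import product


def morton3(ix, iy, iz, bits):
    """Interleave the bits of three cell coordinates into a Z-curve key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def files_touching_box(lo, hi, bits, first_keys):
    """Return the sorted ids of files owning a coarse cell that overlaps the box.

    lo, hi     : box corners in unit-cube coordinates.
    bits       : the coarse grid has 2**bits cells per dimension.
    first_keys : sorted first key of each file (assumed layout, must start at 0);
                 file i owns the keys in [first_keys[i], first_keys[i + 1]).
    """
    n = 1 << bits
    ranges = []
    for a, b in zip(lo, hi):
        # Clamp to the grid so that points on the upper boundary stay inside.
        i0 = max(0, min(n - 1, int(a * n)))
        i1 = max(0, min(n - 1, int(b * n)))
        ranges.append(range(i0, i1 + 1))

    files = set()
    for ix, iy, iz in product(*ranges):
        key = morton3(ix, iy, iz, bits)
        files.add(bisect_right(first_keys, key) - 1)
    return sorted(files)


# Two files on a 2x2x2 coarse grid: file 0 owns keys 0-3, file 1 owns keys 4-7.
print(files_touching_box((0.1, 0.1, 0.1), (0.4, 0.4, 0.9), bits=1, first_keys=[0, 4]))
```

Because every coarse cell overlapping the box is visited, a file holding data inside the box cannot be missed under the stated layout assumption. False positives are possible, which is what the finer per-file index would resolve.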

Having a lightweight coarse-grained index (with no false negatives!) would allow it to be copied to however many MPI tasks yt is running on. As a second step, the current index (from file to position in file) would provide finer refinement (do we need to read any data? where is it located on disk?). This would allow the list of files to be distributed among the tasks deterministically, based on their intersection with the coarse index; each task would then read only those of its files that actually intersect the selection.
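A rough sketch of that distribution step, under simplifying assumptions: the coarse index is modelled as one axis-aligned bounding box per file, and `rank`/`size` are passed in as plain integers (with mpi4py they would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`). All function names here are hypothetical and not part of yt.

```python
def boxes_overlap(a_lo, a_hi, b_lo, b_hi):
    """True if two axis-aligned boxes overlap or touch (conservative test)."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


def coarse_candidates(coarse_index, q_lo, q_hi):
    """File ids whose coarse bounding box intersects the query box.

    coarse_index maps file id -> (lo, hi). Sorting makes the result identical
    on every task, so no communication is needed to agree on the split.
    """
    return sorted(fid for fid, (lo, hi) in coarse_index.items()
                  if boxes_overlap(lo, hi, q_lo, q_hi))


def files_for_rank(candidates, rank, size):
    """Deterministic round-robin share of the candidate files for one task."""
    return candidates[rank::size]


# Four files tiling the unit cube in x and y (illustrative data only).
coarse_index = {
    0: ((0.0, 0.0, 0.0), (0.5, 0.5, 1.0)),
    1: ((0.5, 0.0, 0.0), (1.0, 0.5, 1.0)),
    2: ((0.0, 0.5, 0.0), (0.5, 1.0, 1.0)),
    3: ((0.5, 0.5, 0.0), (1.0, 1.0, 1.0)),
}
candidates = coarse_candidates(coarse_index, (0.1, 0.1, 0.1), (0.9, 0.4, 0.9))
for rank in range(2):
    print(rank, files_for_rank(candidates, rank, size=2))
```

Each task would then run the existing fine-grained index only on its own share of the files, and skip any file that turns out not to intersect the selection after all.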

Some of the code required for this is already in place for the RAMSES dataset (yt-project/yt#4734 for having a two-level index, yt-project/yt#4730 for how one could use this to parallelize I/O).

@cphyc cphyc added the enhancement New feature or request label Feb 19, 2024
@matthewturk
Member

We should do this. I had an attempt at something related but not identical that I can no longer find; essentially, it was to make .index a list instead of a single attribute, and then have multiple grid index objects that could live on it. Same for particles, which are already closer to that.

Perhaps what we could try:

  • Make .index a wrapper, which acts basically as an index collection. It would need to respond to all of the same API calls, but we'd want to explore how the attributes that are different between different index types would be supplied.
  • Have it allow multiple index objects, for grids, cells, particles, etc. And it can have multiple of each type. (This would enable overlaying particles on grids more easily, for instance.)
  • Then this top-level index could mediate the coarse indexing, which particles handle already.
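One possible shape for such a wrapper, as a toy sketch only: `BoxIndex` and `IndexCollection` are hypothetical stand-ins, not existing yt classes, and each child index is reduced to a bounding box so that the coarse step can be shown in a few lines.

```python
class BoxIndex:
    """Stand-in for a grid, cell or particle index covering a bounding box."""

    def __init__(self, kind, name, lo, hi):
        self.kind = kind      # e.g. "grid" or "particle"
        self.name = name
        self.lo = tuple(lo)
        self.hi = tuple(hi)

    def intersects(self, q_lo, q_hi):
        # Conservative overlap test: touching boxes count as intersecting.
        return all(l <= qh and ql <= h
                   for l, h, ql, qh in zip(self.lo, self.hi, q_lo, q_hi))


class IndexCollection:
    """Top-level index that holds several child indexes of any kind."""

    def __init__(self):
        self._indexes = []

    def add(self, index):
        self._indexes.append(index)

    def of_kind(self, kind):
        return [idx for idx in self._indexes if idx.kind == kind]

    def coarse_select(self, q_lo, q_hi):
        """Coarse step: keep only the child indexes that may hold data."""
        return [idx for idx in self._indexes if idx.intersects(q_lo, q_hi)]

    def __iter__(self):
        return iter(self._indexes)

    def __len__(self):
        return len(self._indexes)


index = IndexCollection()
index.add(BoxIndex("grid", "gas_left", (0.0, 0.0, 0.0), (0.5, 1.0, 1.0)))
index.add(BoxIndex("grid", "gas_right", (0.5, 0.0, 0.0), (1.0, 1.0, 1.0)))
index.add(BoxIndex("particle", "stars", (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
print([idx.name for idx in index.coarse_select((0.6, 0.1, 0.1), (0.9, 0.9, 0.9))])
```

How attributes that differ between index types would be exposed through the wrapper is left open here, as in the list above.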
