
Caching: Add CACHE_VERSION attribute to CalcJob and Parser #6328

Merged
sphuber merged 1 commit into aiidateam:main on Apr 16, 2024

Conversation

@sphuber (Contributor) commented Mar 20, 2024


Recently, the hashing mechanism was changed to remove all version
information, both that of `aiida-core` and of the plugin if applicable,
from the hash computation. The reason was that with that information,
each version change in core and plugin would essentially reset the
cache. By removing the version information, updating the code would no
longer invalidate the entire cache.

The downside of this change, however, is that when the implementation of
a plugin, most notably a `CalcJob` plugin, changes significantly such
that a change in its attributes/inputs would really change its content,
this would no longer be reflected in the hash, which would remain
identical. This could lead to false positives when users update certain
plugins.

To give some control to plugin developers in case of changes where the
hash would have to be effectively reset, the `Node` class now defines
the `CACHE_VERSION` class attribute. By default this is set to `None`
but a `Data` plugin can change it to some string value. To facilitate
the same API for process plugins, such as `CalcJob`s, the
`CACHE_VERSION` attribute is also added to the `Process` base class.
This value is automatically added to the node's attributes, under the
key `_aiida_cache_version`, such that it is included in the computed hash.
This way, a plugin developer can change this attribute in their plugin
to force the hash to change.

The version has to be stored in the attributes of the node and cannot
just be taken from the plugin's class attribute directly when computing
the hash, because in that case, when the hash is recalculated in the
future, a potentially different value of `CACHE_VERSION` could be used
(whichever version of the plugin is installed at that time).
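To make the last paragraph concrete, here is a minimal, library-free sketch; `compute_hash` and `SomeCalcJob` are hypothetical illustrations, not aiida-core API:

import hashlib

def compute_hash(objects: dict) -> str:
    # Stand-in for the hash aiida-core computes over a node's content.
    return hashlib.blake2b(repr(sorted(objects.items())).encode()).hexdigest()

class SomeCalcJob:  # hypothetical plugin class
    CACHE_VERSION = 1

# At store time, the version is frozen into the node's attributes:
attributes = {'x': 1, '_aiida_cache_version': SomeCalcJob.CACHE_VERSION}
original_hash = compute_hash(attributes)

SomeCalcJob.CACHE_VERSION = 2  # the plugin gets updated later

# Rehashing from the stored attributes reproduces the original hash, whereas
# reading the value from the (updated) class would silently change it:
assert compute_hash(attributes) == original_hash
assert compute_hash({'x': 1, '_aiida_cache_version': SomeCalcJob.CACHE_VERSION}) != original_hash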

@sphuber (Contributor, Author) commented Mar 20, 2024

@danielhollas @unkcpz I tried to implement something that would allow plugin developers to still control cache invalidation if their plugins change significantly, after we remove all version information from the hash calculation. However, it is not as easy as it seemed.

Essentially, we need to allow a plugin to define some class attribute that enters the hash computation. This has to be stored on the node somehow, though, and cannot just be taken from the class when the node gets stored, because otherwise in the future, when the hash of the node is recalculated, it will take the current cache version of the plugin class, which may have been updated. Since it is important information, I didn't want to store this "cache version" in the extras, as they are mutable and the value could get lost. So the attributes are the only remaining place. However, this has all sorts of knock-on consequences: up until now, AiiDA never automatically stored "internal" values in the attributes column. This assumption is now broken, and that breaks all sorts of other parts of the code. For example, `Dict.get_dict()` will now include the automatically set `_aiida_cache_version` attribute. I can update the code to explicitly ignore this, but that is really a workaround, and as we can see, other tests are still failing due to this change. It is not clear to me yet whether this is acceptable, but it is a pretty significant departure from the status quo, so I am really hesitant. Not sure if there is another solution that I am missing for us to permanently store this cache version.
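To illustrate the knock-on effect, a sketch of the broken behavior described above (not intended behavior; assumes a loaded profile):

from aiida.orm import Dict

node = Dict({'a': 1})
node.store()

# With the cache version silently written to the attributes, the node's
# content is polluted:
node.get_dict()  # -> {'a': 1, '_aiida_cache_version': ...} instead of {'a': 1}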

We have had talks in the past about having "internal" attributes that are set just by aiida-core's internals and never by plugins. This would require a significant change to the node model and the ORM. Not really too keen on having to do that at this point, but maybe that is the only proper way of solving this issue. Thoughts?

@danielhollas (Collaborator)

Yeah, I realized late last night the hash recalculation will make things tricky :-(
Just thinking, doesn't this also apply to the current design that stores aiida and plugin versions in the hash?

@sphuber (Contributor, Author) commented Mar 20, 2024

Just thinking, doesn't this also apply to the current design that stores aiida and plugin versions in the hash?

It does, because currently the `Node` class automatically adds the module version:

version = importlib.import_module(top_level_module).__version__

which would be removed by #6215

Note that for calcjobs, the versions of both core and plugin are also stored in the attributes. Since those are fixed, they don't suffer from the same problem when rehashing. However, #6215 also proposes to remove them from the hash calculation.

@danielhollas (Collaborator)

Note that for calcjobs, the versions of both core and plugin are also stored in the attributes. Since those are fixed, they don't suffer from the same problem when rehashing.

Sorry, I don't follow, can you expand on this part? If those versions are stored (under which key?) why is adding an extra field problematic?

@sphuber (Contributor, Author) commented Mar 20, 2024

Sorry, I don't follow, can you expand on this part? If those versions are stored (under which key?) why is adding an extra field problematic?

The important part here is that the behavior is different for data and process nodes. The `Process` class will automatically store version information in the attributes of its `ProcessNode`, see:

def _setup_version_info(self) -> None:

This is fine for process nodes, since those cannot be subclassed by plugins, so they cannot control what is stored in the attributes anyway. This is different for data plugins, though, where the attributes are essentially the one thing that plugins can control. As we have seen, the `Dict` node for example uses the attributes to store its content. If aiida-core now starts "polluting" this with internal attributes, that no longer gives full control to the plugin.
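For reference, the version information automatically stored by `_setup_version_info` looks something like this (a sketch; the values shown are illustrative):

node.base.attributes.get('version')
# -> {'core': '2.5.0', 'plugin': '1.1.0'}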

@danielhollas (Collaborator)

I see, that is subtle indeed. So if I understand correctly, the approach in this PR could work for Processes, because you could modify aiida-core to add it to ProcessNode attributes? Or does this have unintended side effects as well?

@sphuber (Contributor, Author) commented Mar 20, 2024

I see, that is subtle indeed. So if I understand correctly, the approach in this PR could work for Processes, because you could modify aiida-core to add it to ProcessNode attributes? Or does this have unintended side effects as well?

Hmm, perhaps we could indeed only add it to `ProcessNode`s, because that is already what we are doing more or less. The problem is really only with `Data` nodes. I was focused on adding it there as well, because the `Data` implementation could also change. But I now remember that I already argued a few weeks ago that arguably the implementation for `Data` plugins is less important and there ultimately the hash really is defined just by the data contents. So maybe adding this cache version just to process nodes is sufficient for our use case, and we avoid all the problems of polluting the `Data` attributes namespace.

@unkcpz (Member) commented Mar 20, 2024

The use case is clear to me now after a first pass of review. Just one comment.

I didn't want to store this "cache version" in the extras, as they are mutable and the value could get lost.

I think this could be a very straightforward implementation: the hash is already stored in the extras, so I don't see why it would be a problem to store the cache version in the extras as well.

But I now remember that I already argued a few weeks ago that arguably the implementation for `Data` plugins is less important and there ultimately the hash really is defined just by the data contents.

Indeed, the direct purpose is to avoid changes to the `Process` input/output interfaces bringing about false positives.

Yeah, I realized late last night the hash recalculation will make things tricky :-(

@danielhollas If I understand correctly, there will be no hash recalculation: if the plugin updates the cache version, the new `ProcessNode` will simply have a different hash and the old `ProcessNode` will be unreachable by new calculations.
If the plugin version is rolled back, the old `ProcessNode` will become available again.

@danielhollas (Collaborator)

Yeah, that makes a lot of sense! I guess I don't even understand how caching works for a `Data` node, or what the purpose of that would even be? I thought of caching as always tied to processes?

By default this is set to `None` but a `Data` plugin can change it to some string value.

Any specific reason for this to be a string? An int seems simpler and more straightforward to think about; evicting the cache would simply mean bumping it by +1.

@sphuber sphuber force-pushed the feature/node-cache-version-control branch from 8daadfc to 9730f7a on March 20, 2024 11:24
@danielhollas (Collaborator)

I think this could be a very straightforward implementation: the hash is already stored in the extras, so I don't see why it would be a problem to store the cache version in the extras as well.

Well, there's a difference in consequences when these extras are dropped. If the hash is dropped, the node should simply not be available for caching (right?), so you'll get a false negative for a cache hit. But if the cache version somehow got dropped, then you could get false positive cache hits (but I guess only after the hash would also be recalculated?). These considerations do seem a bit academic. At the same time, adding it to the `Process` attributes does not seem any more difficult in terms of implementation?

@danielhollas If I understand correctly, there will be no hash recalculation

Sure, but it's possible that due to a DB migration introduced by AiiDA-core, we would ask users to recalculate their hashes manually (as we did last time, although whether that's necessary is up for discussion I guess).

@sphuber (Contributor, Author) commented Mar 20, 2024

Yeah, that makes a lot of sense! I guess I don't even understand how caching works for a `Data` node, or what the purpose of that would even be? I thought of caching as always tied to processes?

`Data` nodes are not really cached in that sense; they just have a hash, which is used in the computation of the process's hash.

Any specific reason for this to be a string? An int seems simpler and more straightforward to think about; evicting the cache would simply mean bumping it by +1.

No particular reason. Could even accept either. For now I haven't really thought about limitations/validation of this class attribute; I first wanted to see if the generic concept would be possible.

If I understand correctly, there will be no hash recalculation: if the plugin updates the cache version, the new `ProcessNode` will simply have a different hash and the old `ProcessNode` will be unreachable by new calculations.
If the plugin version is rolled back, the old `ProcessNode` will become available again.

@unkcpz The problem here, as Daniel has pointed out, is that the public API (and the CLI as well) has methods to clear the hash from the extras. On top of that, the hash is just stored in the extras, which can be (accidentally) deleted. If at that point you have to recalculate the hash, you had better make sure you have all the data that went into the original hash. So we need to permanently store this specific cache version (and cannot rely on the extras, as they can be accidentally removed).
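For illustration, a sketch of the fragility (assuming the `_aiida_hash` extra key that aiida-core uses to store the hash; `node` is any stored node):

# The hash lives in the mutable extras, so nothing prevents it from being dropped:
node.base.extras.delete('_aiida_hash')

# It can then be recomputed, e.g. with `verdi node rehash`. If the cache
# version also lived in the extras and was dropped alongside the hash, the
# recomputation would have no way to recover the value that went into the
# original hash.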

@sphuber sphuber force-pushed the feature/node-cache-version-control branch from 9730f7a to 4df4753 on March 20, 2024 11:33
@sphuber (Contributor, Author) commented Mar 20, 2024

I have updated the implementation to just change the behavior for `Process` plugins. They can set the `CACHE_VERSION` class attribute, which will get set on the `ProcessNode` in the `version.cache` attribute. For now I put it there as it integrates nicely with the other version information of core and the plugin. This would just force @unkcpz's implementation to change slightly, as he can now no longer drop the `version` attribute in its entirety from the objects that go into the hash, but would have to keep the `version.cache` subkey. An alternative would be to not nest it in the `version` attribute, but have it top-level in `version_cache` or something like that.
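With this intermediate design, the stored attributes would look something like this (a sketch; values are illustrative):

node.base.attributes.get('version')
# -> {'core': '2.6.0', 'plugin': '3.0.0', 'cache': 'some-value'}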

@@ -92,5 +93,6 @@ def get_version_info(self, plugin: str | type) -> dict[t.Any, dict[t.Any, t.Any]]
return self._cache[plugin]

self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_PLUGIN] = version_plugin
self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_CACHE] = getattr(plugin, 'CACHE_VERSION', None)
@danielhollas (Collaborator)

Not sure I follow the logic in this function, but should we try to only do this for Process subclasses?

@sphuber (Contributor, Author)

What other classes? The only other node subclasses are `Data` nodes, and we just went through why that would cause issues while not even being necessary.

@danielhollas (Collaborator)

Sorry for not being clear. If I understand this function, it is defined for anything that a plugin defines, e.g. both `Data` and `Process` classes. Since now the `CACHE_VERSION` is only on `Process` classes, this means that `self._cache[plugin][KEY_VERSION_ROOT][KEY_VERSION_CACHE]` will always be `None` for `Data` classes? Or am I totally off the mark here?

@sphuber (Contributor, Author)

The thing is that this PluginVersionProvider is only used by the Process class (through the Runner) to add the result to its attributes. So Data nodes are not affected.

@sphuber (Contributor, Author)

See this part of the code:

def _setup_version_info(self) -> None:

@danielhollas (Collaborator)

I briefly went through the implementation and it mostly makes sense to me.

Now it got me thinking: should aiida-core use the same mechanism for invalidating caches, instead of forcing a DB migration? E.g. instead of putting an aiida version itself in the cache, have `CACHE_VERSION` defined somewhere (effectively globally). A DB migration is a chore (I suspect), but the biggest issue is that it is not reversible.

@sphuber sphuber force-pushed the feature/node-cache-version-control branch 2 times, most recently from 9fcafea to 1388ae1 on March 20, 2024 17:00
@sphuber (Contributor, Author) commented Mar 22, 2024

Now it got me thinking: should aiida-core use the same mechanism for invalidating caches, instead of forcing a DB migration? E.g. instead of putting an aiida version itself in the cache, have `CACHE_VERSION` defined somewhere (effectively globally). A DB migration is a chore (I suspect), but the biggest issue is that it is not reversible.

Very interesting idea. I think this might actually work and would indeed get rid of having to add a database migration to reset the cache in case the implementation changes. I don't see any immediate problems with it, but as always, the devil is in the details with caching. I think the best thing would be to simply try it, so I will give it a shot and see what comes up.

@sphuber (Contributor, Author) commented Mar 25, 2024

Thought the idea over a bit. Although it should properly "invalidate" existing caches, since the hash of future processes with identical inputs will now be different due to the changed `CACHE_VERSION` attribute, the stored hash of existing nodes will remain and be incorrect. It is therefore possible, albeit very unlikely, that a future process can produce a hash that is identical to an existing one despite having different inputs. I guess then that this approach might be pragmatically functional, but theoretically it is incorrect and leaves margin for false positives.

We might say that this theoretical problem is so unlikely that it doesn't really justify the inconvenience that migrations add. However, if we were to accept this solution, we would go back to the previous problem where we would have to store this global cache version somewhere on a node when it gets stored, so that its hash can be recomputed later. But with the current database layout, that could only be stored in the attributes, and that gives problems for the `Dict` class, and potentially other `Data` plugins. So we are back to square one.

I am therefore leaning towards just keeping database migrations to invalidate hashes if aiida-core significantly updates the hashing algorithm, and in this PR we add the capability for plugin developers to invalidate the hashes of process plugins (i.e. effectively `CalcJob`s).

pinging @danielhollas

@danielhollas (Collaborator)

Sorry, I'm at a conference this week. I need to think this through a bit; I'll try to reply on Wednesday if I am able, by Friday at the latest.

@danielhollas (Collaborator)

Although it should properly "invalidate" existing caches, since the hash of future processes with identical inputs will now be different due to the changed `CACHE_VERSION` attribute, the stored hash of existing nodes will remain and be incorrect.

Hm, I am not sure I fully follow this scenario. What exactly do you mean by "incorrect" in this case? It seems to me that the hash will be consistent with the CACHE_VERSION that it was created with, no?

It is therefore possible, albeit very unlikely, that a future process can produce a hash that is identical to an existing one despite having different inputs.

This seems to describe a hash collision? Those should be vanishingly unlikely, right? I mean, you can get a collision regardless of this scheme by pure bad luck? But I might be misunderstanding what you're trying to say.

However, if we were to accept this solution, we would go back to the previous problem where we would have to store this global cache version somewhere on a node when it gets stored, so that its hash can be recomputed later. But with the current database layout, that could only be stored in the attributes, and that gives problems for the `Dict` class, and potentially other `Data` plugins. So we are back to square one.

I thought that above we decided we don't really have to worry about the `Data` nodes, since their hash is by construction defined by their contents? I guess you're right that in the case of aiida-core it might not be the same situation, if the core hashing algorithm is somehow changed.

I am therefore leaning towards just keeping database migrations to invalidate hashes if aiida-core significantly updates the hashing algorithm, and in this PR we add the capability for plugin developers to invalidate the hashes of process plugins (i.e. effectively `CalcJob`s).

I think for this PR let's go with that. We can always decide to rethink/improve the aiida-core side later, I guess.

@sphuber (Contributor, Author) commented Apr 3, 2024

I agree that hash collisions are always a risk, but since they are inherent to hashing algorithms and extremely unlikely, we shouldn't worry about them. However, what I was trying to sketch out is that hash collisions are to be expected and fine as long as your hashing algorithm doesn't change. But here we were talking about the scenario where the hashing implementation in aiida-core changes. It is a bit more complicated in aiida-core, because our algorithm is composed of the actual hashing algorithm (currently blake2b) and the selection of the node's data that actually goes into the hash. So far we have mostly, and only ever, changed the latter part, but we could imagine changing the actual hashing algorithm.

What I was thinking in my comment above is that hash collisions are acceptable as long as you stay within the same algorithm. However, when we change the objects to hash, we are changing the algorithm, and then collision risks no longer follow the rules of the underlying hashing algorithm, right?

Anyway, there is a more practical objection I think. Let's go back to why we want to introduce some mechanism to invalidate hashes. The reason we have been removing hash extras through database migrations is not to prevent false positives but rather to make sure that caching keeps working (we just didn't think that including the versions would fully undo this work 😅 ). If the hashing algorithm changes, the computed hash for a data node with attributes {x} would be different. A calculation that has that node as input will therefore not be cached, even though it should because the data is exactly the same. This is why we were resetting hashes, to make sure that old data nodes are properly matched with the new hashing algorithm. So the reset is not to invalidate the existing data node hashes, but to make sure they are up to date.

This is exactly the opposite of the motivation for the feature we are discussing adding here, which is to give control to calcjob plugins to invalidate their hash if their implementation changes such that the outputs are expected to change even for identical inputs.

Conclusion: I think we should keep using database migrations if aiida-core changes the hashing algorithm, to ensure that existing data nodes remain valid cache sources, but also provide a `CACHE_VERSION` attribute for `CalcJob` plugin writers to invalidate existing nodes.

@danielhollas (Collaborator)

This is exactly the opposite of the motivation for the feature we are discussing adding here, which is to give control to calcjob plugins to invalidate their hash if their implementation changes such that the outputs are expected to change even for identical inputs.

That's a very good point. 👍 Agreed.

I am not sure what the state of the current PR is; feel free to ping me for review.

@sphuber (Contributor, Author) commented Apr 3, 2024

I am not sure what the state of the current PR is; feel free to ping me for review.

I think the current summary of the situation is as follows:

  1. Including the version (core and plugin) in the hash computation makes caching effectively useless, so we remove it (PR Loose the hashing computing without include core version #6215)
  2. Removing the version, however, runs the risk of false positives when CalcJob plugin implementations change significantly.
  3. Therefore an attribute is added to the `CalcJob` class that is included in the objects that enter the hash. This attribute can be changed by plugin implementations to effectively reset the cache. (This PR)
  4. Whenever the hashing algorithm in aiida-core is changed, a database migration is necessary to reset all hashes and recompute them. Without it, the existing cache would essentially be unusable.

This is currently implemented by PR #6215 and this PR. So I think this is ready for review.

@sphuber (Contributor, Author) commented Apr 3, 2024

I just thought of one more potential complication. Adding a cache version for the `CalcJob` class may not be sufficient. The outputs of a calcjob are really the generation of input files from input nodes through the `CalcJob`, as well as the creation of the outputs through the `Parser`. So far we never included the parser version separately, because the argument was that typically it would be part of the same plugin package as the `CalcJob`, and the package version was included in the hash calculation. This is now getting removed, but we are only putting the version of the `CalcJob` back in. I think we should add a similar attribute for the `Parser`.

@danielhollas (Collaborator)

I think we should add a similar attribute for the `Parser`.

Yeah, unfortunate but reasonable.

At some point I was also thinking about whether we could somehow detect `Parser`/`CalcJob` changes automatically via some code introspection. But even if we were able to do that, it would kind of defeat the purpose, since even non-functional changes would affect the caching behaviour.

@sphuber sphuber force-pushed the feature/node-cache-version-control branch 2 times, most recently from 1297858 to 436662c on April 15, 2024 09:46
@sphuber sphuber force-pushed the feature/node-cache-version-control branch 2 times, most recently from 6335b91 to 2e46256 on April 15, 2024 12:16
@sphuber sphuber marked this pull request as ready for review April 15, 2024 12:18
@sphuber (Contributor, Author) commented Apr 15, 2024

@danielhollas @unkcpz I have now implemented the final state of the discussion. It essentially adds the `CACHE_VERSION` class attribute to the `Process` and `Parser` classes. Plugin developers can now change this in their `CalcJob` and `Parser` plugins to reset the cache. If these versions are set, they will be written to the `CalcJobNode` attributes together with the versions of aiida-core and the plugin package, i.e. `node.base.attributes.get('version')` would return something like:

{
    'core': '2.6.0',
    'plugin': '3.0.0',
    'cache': 'some-value', # Value of `CalcJob.CACHE_VERSION`
    'parser': 'other-value', # Value of `Parser.CACHE_VERSION`
}

Some open questions/action points:

  • The `CACHE_VERSION` is currently added to the `Process` base class, but would probably only really be used by the `CalcJob` class. Technically, calcfunctions are also considered for caching, since they are calculation processes and not workflows, but there the process class `FunctionProcess` is created dynamically and so the developer cannot set the `CACHE_VERSION`. But then again, they wouldn't have to, because the function source code is included in the hash calculation I think, so changing it would already invalidate the cache anyway. So maybe it would make more sense to move `CACHE_VERSION` to the `CalcJob`? Less "correct" in the generic sense, but it certainly makes a whole lot more sense practically speaking.
  • The key `cache` for the `CalcJob.CACHE_VERSION` also makes little sense. If we agree to move the attribute from the `Process` to the `CalcJob` class, then just naming this `calc_job_cache_version` and changing the parser key to `parser_cache_version` would be clearest, right?
  • Currently there is no runtime type-checking on the `CACHE_VERSION`; I have type-hinted it as `str | None`. I don't see a reason why we should be restrictive in principle. Unless this unrestrictiveness actually makes it more difficult for the user to adopt a sensible versioning strategy? I am not sure what the best approach would be here.
  • These changes would still require Loose the hashing computing without include core version #6215 to be changed. Currently that drops the entire version attribute dictionary from the hash calculation, but it would have to keep the cache and parser keys. That is easy enough to do though once we sign off on the concept.
  • Once signed off, I will add documentation of course.

@danielhollas (Collaborator)

So maybe it would make more sense to move `CACHE_VERSION` to the `CalcJob`?

+1 If we realize we need it to be more generic, we can always lift it up the class chain.

then just naming this `calc_job_cache_version` and changing the parser key to `parser_cache_version` would be clearest, right?

Seems ok to me.

Currently there is no runtime type-checking on the `CACHE_VERSION`; I have type-hinted it as `str | None`. I don't see a reason why we should be restrictive in principle. Unless this unrestrictiveness actually makes it more difficult for the user to adopt a sensible versioning strategy? I am not sure what the best approach would be here.

I would vote to make it an int, to make it clear that any versioning strategy other than "bump this integer by one" doesn't make sense, since the purpose of this version is essentially only to invalidate the cache (e.g. semver doesn't make sense). I don't have a strong opinion about whether we should type-check this at runtime.

Currently that drops the entire version attribute dictionary from the hash calculation, but it would have to keep the cache and parser keys. That is easy enough to do though once we sign off on the concept.

I am wondering if it would be clearer to have a separate attribute `cache_version` or something like that. That would make it easier to see that we do not hash anything inside `version`, and the current code in #6215 would work as-is and wouldn't need to get more complicated.

@sphuber sphuber force-pushed the feature/node-cache-version-control branch from 2e46256 to 6ab6e60 on April 15, 2024 15:27
@sphuber (Contributor, Author) commented Apr 15, 2024

I would vote to make it an int, to make it clear that any versioning strategy other than "bump this integer by one" doesn't make sense, since the purpose of this version is essentially only to invalidate the cache (e.g. semver doesn't make sense). I don't have a strong opinion about whether we should type-check this at runtime.

Done. For now there is no validation though. I am not sure we need it.

I am wondering if it would be clearer to have a separate attribute `cache_version` or something like that. That would make it easier to see that we do not hash anything inside `version`, and the current code in #6215 would work as-is and wouldn't need to get more complicated.

Very good point, I like it. I have implemented it.
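With both changes, the stored data would look something like this (a sketch with illustrative values, following the layout agreed above):

node.base.attributes.get('version')        # -> {'core': '2.6.0', 'plugin': '3.0.0'}
node.base.attributes.get('cache_version')  # -> {'calc_job': 1, 'parser': 1}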

@sphuber sphuber force-pushed the feature/node-cache-version-control branch from 6ab6e60 to c16a73b on April 15, 2024 20:47
@sphuber (Contributor, Author) commented Apr 15, 2024

@danielhollas this should be ready for final review. I have added the docs as well.

@sphuber sphuber changed the title Caching: Add Node.CACHE_VERSION to node attributes Caching: Add CACHE_VERSION attribute to CalcJob and Parser Apr 15, 2024
@sphuber sphuber force-pushed the feature/node-cache-version-control branch from c16a73b to a29554f on April 15, 2024 21:10
@danielhollas (Collaborator) left a comment

@sphuber this looks great, thank you! Just some nits.

Three resolved review comments on docs/source/topics/provenance/caching.rst (outdated).
@sphuber sphuber force-pushed the feature/node-cache-version-control branch from a29554f to e705186 on April 16, 2024 07:55
Recently, the hashing mechanism was changed to remove all version
information, both that of `aiida-core` and of the plugin if applicable,
from the hash computation. The reason was that with that information,
each version change in core and plugin would essentially reset the
cache. By removing the version information, updating the code would no
longer invalidate the entire cache.

The downside of this change, however, is that when the implementation of
a plugin, most notably `CalcJob` and `Parser` plugins, changes
significantly such that a change in its attributes/inputs would really
change its content, this would no longer be reflected in the hash, which
would remain identical. This could lead to false positives when users
update certain plugins.

To give some control to plugin developers in case of changes where the
hash would have to be effectively reset, the `CalcJob` and `Parser`
classes now define the `CACHE_VERSION` class attribute. By default this
is set to `None` but it can be set to an integer by a plugin developer.
At this point, it is stored in the `cache_version` attribute under the
`calc_job` or `parser` key, respectively. Since this attribute is
included in the hash calculation, changing the value would effectively
reset the cache for nodes generated with one of the plugins.

The concept _could_ have been added to the `Node` base class to stay
generic; however, this had complications for `Data` nodes. Since the
version data would have to be stored in the attributes, in order to
prevent the risk of it being mutated or lost, it would interfere with
the actual data of the node. For example, the `Dict` node is entirely
defined by its attributes, so AiiDA cannot store any data in that
namespace. Luckily, not having this cache version information in nodes
other than `CalcJobNode`s is not problematic, as the problem of
inducing false positives by changing plugin code is really only relevant
to `CalcJob` and `Parser` plugins.
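As a usage sketch (`SomeCalcJob` and `SomeParser` are hypothetical plugin classes; `CalcJob`, `Parser` and `CACHE_VERSION` are the names introduced by this change):

from aiida.engine import CalcJob
from aiida.parsers import Parser

class SomeCalcJob(CalcJob):
    # Bump this integer whenever the implementation changes such that identical
    # inputs would produce different results; this effectively resets the cache
    # for nodes generated by this plugin.
    CACHE_VERSION = 1

class SomeParser(Parser):
    CACHE_VERSION = 1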
@sphuber sphuber force-pushed the feature/node-cache-version-control branch from 2a1745e to 0a6bc89 on April 16, 2024 08:18
@sphuber sphuber merged commit 39d0f31 into aiidateam:main Apr 16, 2024
20 checks passed
@sphuber sphuber deleted the feature/node-cache-version-control branch April 16, 2024 09:12