[DISCUSSION] Reduce redundancy in model specifications #264
This is a timely discussion, as I have been thinking about this on the QCFractal side for a while. I think conceptually what you have under "Describe alternatives you've considered" might be a bit cleaner. The way I think of it: ...
On the QCFractal side, things will be a bit different, but it could possibly re-use some of these classes in QCElemental. What to do with ...
Thanks for the thoughts, @bennybp! A few questions to help decide between the two alternatives. Question: would one ever define a computation without defining the molecule on which to perform it? It appears to me the answer is most likely "no," which suggests to me that a specification without a molecule may not be needed. Assuming I'm wrong in my statement above: are there cases where we want to define how to run a computation without defining the molecule on which we want to perform it? I.e., cases where the object of computation contains a specification but no Molecule?
If I understand you correctly, here you're saying an ... I'd push back by saying the ... That said, I think my thinking boils down to two points: ...
Re: How does my question about use cases for an input specification sans a Molecule ...
Interesting points!
Yes and no. As far as QCEngine is concerned, no (@loriab can correct me if I am wrong). However, in databases (like QCFractal), you can define a specification and then a list of molecules to apply that specification to. That way you only store the specification once for many calculations. (This separation will always exist in QCFractal whether it exists in QCElemental/QCSchema or not.)
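For illustration, here is what that "one specification, many molecules" pattern looks like with the models as they stand today, so the specification fields get repeated inside every AtomicInput; the molecules and method below are placeholders, not anything from this thread:

from qcelemental.models import AtomicInput, Molecule

# One specification, defined a single time...
spec = {
    "driver": "gradient",
    "model": {"method": "b3lyp", "basis": "6-31g"},
    "keywords": {},
}

# ...applied to a whole list of molecules (placeholder geometries).
molecules = [
    Molecule(symbols=["He"], geometry=[0.0, 0.0, 0.0]),
    Molecule(symbols=["Ne"], geometry=[0.0, 0.0, 0.0]),
]

inputs = [AtomicInput(molecule=mol, **spec) for mol in molecules]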
I wouldn't say it's compensating. Optimizations have two main molecules (the initial and the final geometry), so we should be explicit (better than implicit!) about which is which. Imagine someone using autocomplete in Jupyter notebooks or something.
Yes, but it is up to the optimization program how to populate those fields. Also, it is not necessarily correct to think that the molecule the user submits is the same as the initial molecule of the result. I tend to think of it this way: ...
This is what I'm moving towards in QCFractal (think of it in terms of a relational database). But that doesn't necessarily mean QCElemental must follow suit. Program is the interesting case. On one hand, you pass an input into a program, and so the program is outside the input. I.e., you could take the same input and run it with a different program. On the other hand, it really is convenient to have that in there, isn't it :)
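That "program lives outside the input" view matches how qcng.compute is already called; a tiny illustration (the program names are just examples and assume those codes are installed):

import qcengine as qcng
from qcelemental.models import AtomicInput, Molecule

inp = AtomicInput(
    molecule=Molecule(symbols=["He"], geometry=[0.0, 0.0, 0.0]),
    driver="energy",
    model={"method": "hf", "basis": "sto-3g"},
)

# The same input can be handed to different programs; the program is not part of the input.
result_a = qcng.compute(inp, "psi4")
result_b = qcng.compute(inp, "nwchem")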
We could enforce that this always be the case. But optimizers don't always use external programs for gradients (e.g., Gaussian does it all by itself). That also allows for some optimizations and tighter coupling between the optimizer and the gradient calculations. This could be worked around by setting the qc program to be the same as the optimization program, though. Again, interesting discussion!
Great context to have. Thank you!
The implementation as it stands does indeed return the initial molecule submitted by the user as the initial_molecule on the result.
Makes sense from a relational database perspective--you want to segment out values that belong in unique tables and build the correct relationships between them. On a higher-level note, I'd be curious to know your long-term plans for procedures. Are there plans for a more general specification for computing procedures? If so, perhaps a rethinking of the core compute inputs to that interface would be helpful to properly define the general case for procedures. If procedures will remain just optimizations for the foreseeable future, perhaps that's not helpful. I'm planning to use and/or expand QCEngine quite a lot throughout my grad school experience, so I'd love to know the longer-term plans for procedure design--it's something I've been thinking about a lot recently. I'm up for a Zoom call to brainstorm this idea and hear about your plans for the library, if helpful :)
@bennybp I've been doing some more tooling around with the models, specifically doing more large-scale batch computations. After those experiments I'm seeing the wisdom in what you suggested of being able to specify the more granular parts of the computation as separate objects. This makes it easier to define a computation--like a specific energy evaluation--and map it across a library of molecules. I might be totally wrong in my earlier comments--a QCElemental API that looks more like a normalized db schema may be spot on! I'm liking the composition-rather-than-inheritance approach better, i.e., something along the lines of the sketch below. Anyways, just more random food for thought. Generally, I'm leaning your way of more granular definitions as soon as one does bulk computations. And I'm tending to prefer a compositional approach.
class QCInputSpecification(ProtoModel):
    driver: DriverEnum = Field(DriverEnum.gradient, description=str(DriverEnum.__doc__))
    model: Model = Field(..., description=str(Model.__doc__))
    keywords: Dict[str, Any] = Field({}, description="The program specific keywords to be used.")

class AtomicInput(ProtoModel):
    input_specification: QCInputSpecification
    molecule: Molecule
    # Protocols here or QCInputSpecification? Seems it should live with keywords. There's an argument
    # for keywords being here instead of QCInputSpecification as well. Could be nice to have separation
    # between things that are general across packages (like driver/model) and specific to a QC package
    # like keywords and protocols (are protocols package specific??).
    protocols: ...
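A possible usage of the composed models sketched above, just to show the intended ergonomics; the method, keywords, and the molecules iterable are placeholders, and the snippet assumes the same names/imports as the sketch:

# Define the specification once...
spec = QCInputSpecification(
    driver=DriverEnum.gradient,
    model=Model(method="b3lyp", basis="6-31g"),
    keywords={"scf_type": "df"},
)

# ...then pair it with each molecule in a batch without repeating the spec fields.
batch = [AtomicInput(input_specification=spec, molecule=mol) for mol in molecules]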
I am now getting to this part in my modifications to QCFractal. It's been a few months, and I'm wondering if you have any other thoughts. I do have some, but I would like to know if you have discovered anything in the meantime before I write them out.
Hi Ben--good to hear from you again on this! No new thoughts. The main thoughts I have, after our discussion above and working through various implementations myself, are: ...
Happy to discuss more if at all helpful, but I think these two points summarize my high-level thoughts. Thanks for continuing to advance the project--I think you are doing such important work for the field :)
@bennybp I do have an additional thought now :) Another layer at which model consolidation would be great and lead to a cleaner API (IMO):

class AtomicResult(ProtoModel):
    input_data: AtomicInput
    ...

class FailedOperation(ProtoModel):
    input_data: AtomicInput
    ...

class OptimizationResult(ProtoModel):
    input_data: OptimizationInput

Currently the result models do not handle this uniformly. I noticed this discrepancy when writing code to handle successful/failed computations. It's super nice to be able to say result.input_data regardless of whether the computation succeeded or failed. Another win for the compositional model you were suggesting above.
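A tiny illustration of why the uniform field is convenient, assuming the consolidated result models sketched just above (the describe helper itself is hypothetical):

def describe(result) -> str:
    # Same attribute path whether this is an AtomicResult or a FailedOperation.
    mol = result.input_data.molecule
    status = "ok" if result.success else "failed"
    return f"{status}: {mol.get_molecular_formula()}"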
One other benefit to this approach on ...
Is your feature request related to a problem? Please describe.
As I've worked through compute implementations of single point calculations and the optimization procedures in qcengine, I've found some redundancies and inconsistencies in how inputs are defined. Would it be worth consolidating some of these data models to a more unified approach? Specifically, the AtomicInput, QCInputSpecification, and OptimizationInput models handle similar (and sometimes duplicate) data differently. Perhaps a unified approach would make the API simpler and more consistent?
The way it is now
QCInputSpecification and AtomicInput contain the exact same fields, with the only difference being AtomicInput having a protocols and molecule attribute. Given that the QCInputSpecification is what is used inside of various ProcedureHarness classes--like pyberny's harness--to define the QC gradient calculation, perhaps it would make more sense to directly use the AtomicInput specification to describe this task? This would consolidate where the Molecule is defined for computations across single point and procedure tasks and provide a unified way for defining single point calculations, whether requested on their own (via qcng.compute) or as part of a procedure.
Describe the solution you'd like
I would suggest eliminating the QCInputSpecification class and updating the OptimizationInput class as follows:
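A rough illustration of what this consolidated OptimizationInput could look like--the engine field and initial_molecule property follow the names used later in this issue, and the exact Field definitions are a sketch rather than a final design:

from typing import Any, Dict
from pydantic import Field
from qcelemental.models import AtomicInput, Molecule, ProtoModel


class OptimizationInput(ProtoModel):
    # The full gradient specification, including the Molecule, reusing the same
    # AtomicInput interface used for standalone single point computations.
    input_specification: AtomicInput = Field(..., description="AtomicInput (including the Molecule) for the gradient steps.")
    # Explicit engine field instead of smuggling "program" through keywords.
    engine: str = Field(..., description="The QC program that computes the gradients.")
    keywords: Dict[str, Any] = Field({}, description="Optimizer-specific keywords.")

    @property
    def initial_molecule(self) -> Molecule:
        # Convenience accessor so existing harness code can keep asking for the molecule.
        return self.input_specification.molecule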
This solves a few problems:
1. Eliminates the QCInputSpecification model, which is really an AtomicInput minus the Molecule.
2. Defines the gradient calculation with an AtomicInput declaration, the same interface used for single point computations using qcng.compute(). I propose this is advantageous over having a second way to declare this computation for optimization procedures, with a QCInputSpecification and a Molecule separately defined on the OptimizationInput model.
3. Eliminates .pop()-ing "program" from the OptimizationInput.keywords dictionary--not entirely intuitive, and one has to read the source code itself to realize this is the required implementation to define the compute engine.
Describe alternatives you've considered
AtomicInput could inherit from QCInputSpecification and then only define its additional .protocols and .molecule attributes. This would at least reduce the code overhead of redundant field declarations that serve the same purpose. But I think the real intent of the library is better served by using the AtomicInput object throughout to define single point calculations, even when those calculations are used inside of a procedure.
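For reference, a minimal sketch of that inheritance alternative; QCInputSpecification, Molecule, and Field are the existing classes, while AtomicResultProtocols stands in for whatever the current protocols model is:

# Inheritance alternative: AtomicInput re-uses the QCInputSpecification fields
# and only adds what is unique to it.
class AtomicInput(QCInputSpecification):
    molecule: Molecule = Field(..., description="The molecule on which to perform the computation.")
    protocols: AtomicResultProtocols = Field(AtomicResultProtocols(), description="Protocols controlling returned data.")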
Additional context
If this proposal has support I'll gladly draft up a PR for the code and update the associated procedures (berny, geometric, optking). The only update they'd need, if we implemented the initial_molecule property noted above, is to get the engine/program from the explicitly defined OptimizationInput.engine field rather than executing an input_data.keywords.pop("program") command. I think the explicit declaration of the compute engine is better than implicitly placing it into keywords.
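To make the harness-side change concrete, the difference would be roughly as below (hypothetical harness code illustrating the described behavior, not a quote of the qcengine source):

# Current pattern inside a ProcedureHarness: the gradient program is smuggled
# through the optimizer keywords.
program = input_data.keywords.pop("program")

# Proposed pattern: the engine is an explicit, documented field.
program = input_data.engine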
Thanks for your time to review and consider this!