-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Multiple Python Versions #175
Conversation
Adds support for having multiple evaluation python versions installed in the docker container. A utility to automatically generate correct dockerfile instructions and nsjail mounts based on the available versions is also included. Signed-off-by: Hassan Abouelela <[email protected]>
Adds a version argument to the eval API, and a GET endpoint to retrieve all enabled versions. Signed-off-by: Hassan Abouelela <[email protected]>
Signed-off-by: Hassan Abouelela <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this decouple the interpreter the web app uses and what eval/nsjail uses?
snekbox/api/resources/eval.py
Outdated
"version": { | ||
"type": "string", | ||
"oneOf": [{"const": name} for name in _VERSION_DISPLAY_NAMES], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative proposal: pass a path to an executable instead of a version. This would leave the door open for supporting arbitrary executables in the future. So far nothing in the API is actually specific to Python or even a compiler/interpreter. The downside is that it's less user-friendly to know the full path to an executable than it is to specify a Python version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead of discovering which versions are available, users could discover which binaries are available and what their paths are. We don't need to list everything that's mounted; can continue to use the same approach of configuring that in a JSON file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting idea, I'll into implementing it. I'd probably keep the display names in the JSON files so end-users (bot) can still display human-readable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach would be to base it off nsjail config files. Each binary (including different Python versions) would have its own config file, and we would scan a directory for config files. The API would accept a config name. This would offer more flexibility in supporting multiple binaries, as different binaries may require different mounts, environment variables, etc. I don't think this approach is mutually exclusive with my previous proposal, so we can revisit this in the future as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how much of the config will be repeated. I imagine a lot of the python (for the cpython implementations) configs will be very repetitive, but even other interpreters would presumably need similar cgroup configuration, mounts, etc. We might be able to get around the duplication problem by providing overrides to be applied in python (or if nsjail supports combining configs?) instead of multiple config files. Either way, it's a problem for another day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair enough if we stick with something like versions.json. However, I am considering that maybe arbitrary paths is better. It's more flexible, it's less complex, and requires no additional configuration. What problem or use case are we trying to solve by restricting the binaries to a predefined set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to define the python versions statically somewhere at any rate, so we can actually define them in the dockerfile. We might be able to get away with an image that has all python versions already so we don't need to specify them, but that would require an external image most likely. I linked an example in the original issue, but that is no longer maintained.
On a more general level though, I'm not sure arbitrary binaries is the best approach. Different binaries would require different configurations, be that resources, mounts, or permissions. The most flexible system would be statically defined sets of configurations - harking back to your multiple configs idea - and at that point we'd end up with end users selecting from a pre-defined set of human names.
This system could actually define multiple binaries per config-set as well by passing the path as an argument, or more ideally, defining the paths in a dict and passing the name to the API which resolves the correct binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most flexible system would be statically defined sets of configurations
I agree, and we want to leave the door open for that in the future.
Different binaries would require different configurations, be that resources, mounts, or permissions.
This is true, but not always. It is feasible for a single NsJail config to support multiple binaries (though it can only have one default entry point I think).
We'd need to define the python versions statically somewhere at any rate
That only applies to Python (so far), and I consider it an implementation detail as far as the API is concerned. I believe we need a stronger reason to influence the design of our API.
defining the paths in a dict and passing the name to the API which resolves the correct binary.
I find this redundant. It seems simpler and more intuitive to create an NsJail config, which effectively defines an environment, and have everything in that environment be available via the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is feasible for a single NsJail config to support multiple binaries (though it can only have one default entry point I think).
This is true, but we can probably just avoid defining a default entry point in the first place. That's more or less what this PR does for python.
It seems simpler and more intuitive to create an NsJail config, which effectively defines an environment, and have everything in that environment be available via the API.
My problem is it moves the burden of knowing what binaries are available on the API from the project (which can reasonably control the matter) to the user (which can not directly affect it). For instance, if we use this PR to implement support for multiple python versions on bot, we can't trivially define what versions are available to the bot without essentially hardcoding it. A change to snekbox would necessitate a change to the users as well.
Regardless of which approach we choose, it'd be our burden as maintainers to ensure if we provide a binary for eval in one version, that we also have it available for future versions, or declaring an incompatibility if needed. This is much harder when we don't actually know what binaries users need, and have to assume everything is used.
Explicitly defining which binaries the API supports (and returning them in /info
) alleviates both problem. At that point a display name is just the icing on the cake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My problem is it moves the burden of knowing what binaries are available on the API from the project (which can reasonably control the matter) to the user (which can not directly affect it).
I don't necessarily see that as a problem. I've seen snekbox as primarily an internal service for which users build their own front ends. Thus, the users are in control of both, and they will be aware of what is available in the snekbox environment. It'd be a different story if the snekbox API was directly available on the internet for anyone to use, but I do not think that is the use case we should be prioritising.
This is much harder when we don't actually know what binaries users need, and have to assume everything is used.
We could include a list of binaries in our documentation, and it does not need to be exhaustive. We can just list the Python binaries. If users want to use other binaries, then that's on them, and we won't guarantee the availability of those undocumented binaries.
It is sort of decoupled in the sense they are not enforced to be the same version as the one in the pyproject. The server will use one of the versions in versions.json, particularly the one labeled main. They are the same binaries though. |
I think the version the application uses should always match what we target with I don't know if there is any way for us to actually enforce this. I suppose we can mostly rely on CI to tell us whether the dependencies are compatible. |
The main challenge is figuring that version out during the dockerfile creation step, and verifying the version string against whatever is configured in the file. It might be doable, do we want that enforcement? |
No, I don't think we need to do anything for that. I think the tests + image build than run in CI gives us confidence in the dependencies being compatible. It's not worth adding complexity to our image. If there was a straightforward way, then sure. |
Remove the pointless function and allow the file-level constants to be imported by external callers. Signed-off-by: Hassan Abouelela <[email protected]>
Move the information endpoint from the eval resource into its own location. This endpoint currently only returns the python version but can easily be expanded with more info later. Signed-off-by: Hassan Abouelela <[email protected]>
Signed-off-by: Hassan Abouelela <[email protected]>
Signed-off-by: Hassan Abouelela <[email protected]>
12a7de9
to
2e8c91d
Compare
Signed-off-by: Hassan Abouelela <[email protected]>
Signed-off-by: Hassan Abouelela <[email protected]>
2e8c91d
to
db8d635
Compare
Adds an image with only one python version installed. This can be useful in development environments for other projects where the multi-version features are less useful than a more efficient container. Signed-off-by: Hassan Abouelela <[email protected]>
db8d635
to
3d9eff4
Compare
Nearly done, just need to wrap up some dependency work. I've tested the new CI on my fork. |
Closing this now that #181 is merged |
Closes #158
Adds support for evaluating code in different python versions via the
version
argument.This PR is a PoC to discuss the approach, and find issues with the idea. There's a versions.json file which specifies the supported python versions, and a make target for re-generating the relevant configuration files and dockerfile to enable the new versions. Packages are isolated based on python version.
The current approach was chosen for being relatively easy, quick, and lightweight when creating the container, however it is a little hacky. I am open to suggestions on other good methods for installing multiple python versions. Part of this proposal is to publish two different snekbox images, one with all images, and a lightweight one with just one version for developers.
Other things I've attempted included building all the versions in the container, manually and with pyenv, but those inevitably take way too long and produce really large containers. I've also tried adding deadsnakes and installing from apt, but I struggled to actually get it working within our image. This might still be a viable solution if we want to investigate it.