tests_l2: Update readme #104

Merged
merged 1 commit into from
Sep 20, 2023

@vbedida79
Contributor

vbedida79 commented Jul 31, 2023

Updated the readme for tests/l2 GPU: set the SELinux boolean container_use_devices to on (with setsebool) on the GPU node.
This is needed for GPU workloads to run and access host devices. It will be removed once #107 is resolved.
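
As a rough sketch of the workaround described above, the boolean could be set from a workstation through a node debug pod. The node name `gpu-node-1` and the helper functions `set_bool_cmd`, `get_bool_cmd`, and `run` are hypothetical names used only for illustration, and the sketch assumes `oc` is logged in with sufficient privileges to debug nodes. It only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: enable the SELinux boolean container_use_devices on a GPU node.
# NODE is a hypothetical node name; replace it with the real GPU node.
NODE="${NODE:-gpu-node-1}"

# Build the command that sets the boolean on the host through a debug pod.
set_bool_cmd() {
  echo "oc debug node/$1 -- chroot /host setsebool container_use_devices on"
}

# Build the command that reads the boolean back for verification.
get_bool_cmd() {
  echo "oc debug node/$1 -- chroot /host getsebool container_use_devices"
}

# Dry run by default: print the command; execute it only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    sh -c "$1"
  else
    echo "$1"
  fi
}

run "$(set_bool_cmd "$NODE")"
run "$(get_bool_cmd "$NODE")"
```

Note that `setsebool` without `-P` changes only the runtime value, so the setting would likely need to be reapplied after a node reboot.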

Signed-off-by: vbedida79 [email protected]

@vbedida79
Contributor Author

vbedida79 commented Jul 31, 2023

@mregmi @uMartinXu please review

@mregmi
Member

mregmi commented Jul 31, 2023

LGTM

@@ -28,6 +28,10 @@ $ oc logs intel-sgx-job-4tnh5
```
### Verify Intel® Data Center GPU provisioning
This workload runs [clinfo](https://github.com/Oblomov/clinfo) utilizing the i915 resource from GPU provisioning and displays the related GPU information.
* Before deploying any workload, please ensure to set access for ```/dev/``` folder on the GPU node, with the command below:

Contributor

Looks like we need an SELinux policy for the workload pod, @mregmi? If that is the case, do we have an existing one that the workload pod can use?

Contributor Author

@vbedida79 vbedida79 Aug 1, 2023

We can use the container_device_t SELinux policy. It works with a custom OCP SCC.
I wonder how feasible it is for users to declare it for every workload/e2e with other operators/dependencies. @mregmi @uMartinXu thoughts?

Contributor Author

#107 with container_device_t does not include enough permissions for the workload to run and might require more. For this release, I suggest we use this PR. I will keep the other PR open and verify it with the correct permissions. @mregmi @uMartinXu

Member

Yeah, for now let's go with the boolean. We will have to give extra permissions later.

@hershpa
Contributor

hershpa commented Aug 30, 2023

@vbedida79, can you add the target milestone for this PR?

@vbedida79 vbedida79 modified the milestones: v1.1.0, v1.0.1 Sep 5, 2023
@uMartinXu
Contributor

We are now working on issue #107 and need more time to figure out a proper solution, so we will close this PR.

@uMartinXu uMartinXu closed this Sep 13, 2023
@vbedida79 vbedida79 reopened this Sep 19, 2023
@hershpa hershpa added the gpu Intel GPU label Sep 19, 2023
@@ -28,6 +28,10 @@ $ oc logs intel-sgx-job-4tnh5
```
### Verify Intel® Data Center GPU provisioning
This workload runs [clinfo](https://github.com/Oblomov/clinfo) utilizing the i915 resource from GPU provisioning and displays the related GPU information.
* Before deploying any workload, please ensure to set access for ```/dev/``` folder on the GPU node, with the command below:
Contributor

Can we change it to:

To work around issue, please run the below command on the node with the GPU where your workload will run.

Contributor Author

Done.

Signed-off-by: vbedida79 <[email protected]>
@uMartinXu
Contributor

Looks good to me!

@uMartinXu uMartinXu merged commit a389ae0 into intel:main Sep 20, 2023
1 check failed
Labels
gpu Intel GPU
4 participants