Blog for running gpu based functions on Fission #271

Merged: 11 commits merged into main on Sep 26, 2024

Conversation

soharab-ic
Contributor

No description provided.

netlify bot commented Sep 19, 2024

Deploy Preview for fission-website ready!

Name Link
🔨 Latest commit d966fd3
🔍 Latest deploy log https://app.netlify.com/sites/fission-website/deploys/66f511c567721300082d8e2a
😎 Deploy Preview https://deploy-preview-271--fission-website.netlify.app

@techmaharaj
Contributor

General:

  • We can improve the intro: add more substance on serverless functions, the rise in AI workloads, how serverless can help, and how Fission can be used to deploy GPU functions.
  • [Important] We should add a section on "Why run GPU based functions on Fission", maybe 2-4 points with reasons. That will be really helpful.
  • Let us also state at the beginning, before the pre-requisites, what we are doing in this blog post. Currently it is not mentioned anywhere.
  • We can find an opportunity to link to our GPU blog post on InfraCloud.

Specific:

  • Let's be clear about the pre-requisites: a system with an Nvidia card, a Kubernetes cluster, Fission installed (link to the Fission installation doc), and the Nvidia-related components installed (link to the respective pages). All of these should be bullet points under the sub-heading "Pre-requisites."
  • The steps under the "Steps" section are also not clear.
  • Can we improve the sentiment example? Currently the statement is hardcoded and the output in the last step just says Positive, which left me wondering where I had provided the statement. Two things can be done here: either provide a way to pass the statement as an argument, or print the statement in the output.

Signed-off-by: Md Soharab Ansari <[email protected]>
Create a function using the package. Note that we are passing `sentiment.main` as the entrypoint.

```bash
$ fission fn create --name sentiment-fn --pkg sentiment-pkg --entrypoint "sentiment.main"
```

Member

Environment reference is missing here.

Contributor Author

Package is already associated with an environment.

- The `fission env create` command creates two deployments: one named `poolmgr-python-default-*` for the environment and another named `python-*` for the builder.
- Edit the environment deployment and add GPU resources to the `python` environment container.
Member

Can we provide the `kubectl edit` command here?
Also, doing this via `kubectl patch` would be easier for the reader to follow.

- Use the `kubectl patch` command to patch the environment deployment: add GPU resources to the `python` environment container and set `nodeSelector` so that pods are scheduled on a GPU node.

```bash
kubectl patch deployment poolmgr-python-default-5560759 -p '{"spec": {"template": {"spec":{"containers":[{"name":"python","resources": {"limits": {"nvidia.com/gpu": "1"}, "requests": {"nvidia.com/gpu": "1"}}}]}}}}'
```

Member

Please consider patching the environment spec, not the deployment.
Users should not change Fission's internal objects.

Contributor Author

soharab-ic commented Sep 25, 2024

@sanketsudake Patching the resources via the environment spec works, but the nodeSelector patch does not. I am seeing this error:

Error: error applying specs: environment apply failed: Environment.fission.io "pytorch" is invalid: spec.runtime.podspec.containers: Required value

Environment spec used:
apiVersion: fission.io/v1
kind: Environment
metadata:
  creationTimestamp: null
  name: pytorch
spec:
  builder:
    command: build
    container:
      name: ""
      resources: {}
    image: ghcr.io/fission/python-builder
  imagepullsecret: ""
  keeparchive: false
  poolsize: 1
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
  runtime:
    container:
      name: ""
      resources: {}
    image: ghcr.io/fission/python-env
    podSpec:
      nodeSelector:
        kubernetes.io/hostname: "gpu-node03"
  version: 3

Member

Typically the container name needs to be the same as the environment name, because the pod has two containers: runtime and fetcher.

`containers` is a required field in the PodSpec: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#podspec-v1-core

See, for example: https://github.com/fission/fission/blob/93869d3bc8b6097143e5488366dae178488b4474/test/tests/test_specs/test_spec_merge/specs/env-nodend.yaml#L25

We can probably skip the node selector part in the blog post and keep only the resource requests.
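
A minimal sketch of what this comment describes, assuming the environment is named `pytorch` as in the spec quoted above: the `podspec` carries a `containers` entry whose name matches the environment name, and the node selector sits beside it. This has not been verified against a cluster. The hostname `gpu-node03` and the image names are copied from the quoted spec, and whether Fission accepts a container entry that sets only a name is an assumption.

```yaml
apiVersion: fission.io/v1
kind: Environment
metadata:
  name: pytorch
spec:
  version: 3
  poolsize: 1
  builder:
    command: build
    image: ghcr.io/fission/python-builder
  # GPU request and limit, unchanged from the quoted spec.
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
  runtime:
    image: ghcr.io/fission/python-env
    podspec:
      # PodSpec requires a containers list. The name is assumed to
      # have to match the environment name so it merges with the
      # runtime container rather than adding a new one.
      containers:
        - name: pytorch
      nodeSelector:
        kubernetes.io/hostname: "gpu-node03"
```

The only change relative to the failing spec is the added `containers` entry, which is the field the validation error reports as missing.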

Contributor Author

Skipping the node selector part.

Member

Yes, please update the blog accordingly.
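
For reference, the resources-only variant agreed on in this thread might look like the sketch below. It is the spec quoted earlier with the `podSpec` block removed and the empty default fields left out. All field names and values come from that spec; omitting the empty defaults is an assumption.

```yaml
apiVersion: fission.io/v1
kind: Environment
metadata:
  name: pytorch
spec:
  version: 3
  poolsize: 1
  builder:
    command: build
    image: ghcr.io/fission/python-builder
  runtime:
    image: ghcr.io/fission/python-env
  # Request one GPU for the environment pods. For extended resources
  # such as nvidia.com/gpu, Kubernetes requires requests to equal limits.
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
```

A file like this would typically sit in the project's specs directory and be applied with `fission spec apply`, which leaves Fission's internal deployments untouched.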

sanketsudake merged commit f6b43e4 into main on Sep 26, 2024
8 checks passed