Runpod.sh Refactor and new implementations. #14
base: master
Conversation
Was getting errors due to missing files; testing a refactor.
runpod.sh refactor
testing new approach in main.sh
fixing main.py link
Refactor GPU Parallelization and Benchmark Execution Logic
changed to agieval branch to fix requirements
Added a debug function that lets you know the script is still active.
shortening test
added gptq and 4bit options
fixed indentation
added gekko requirement
add bitsandbytes
add dtype to gptq
Changes will need to be made to the Colab notebook to add the 4-bit function.
Adding 4-bit increased processing times; I will be looking into how to fix this. I think I need to push a variable to GPTQ.
Need to push the variable bnb_4bit_compute_dtype="auto" (first attempt did not work); may need to add a function to detect the dtype and then pass it to GPTQ. Future ideas...
Running without issue; failing to run? (Need to test further.)
Converting to draft until I am able to solve the discovered issues.
change debug print statement
Attempting to pass through bnb_dtype.
fixed truthful qa and bigbench
adjusting for error: ImportError: /usr/local/lib/python3.10/dist-packages/vllm/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
TL;DR: Refactored the code in Runpod.sh and fixed any issues this caused in downstream files.
Updated the evaluation-harness repo from https://github.com/dmahan93/lm-evaluation-harness to https://github.com/EleutherAI/lm-evaluation-harness and made the necessary changes to Table.py to fix any formatting issues.
Runpod.sh:
GPU Detection and Parallelization Logic:
Added logic to set a flag for parallelization if multiple GPUs are detected.
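A minimal sketch of what such a check could look like; the NUM_GPUS and PARALLELIZE variable names are illustrative, not necessarily the ones used in the script:

```bash
# Count the GPUs visible to the pod and only enable parallelization when more than one is found.
NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)

if [ "$NUM_GPUS" -gt 1 ]; then
    PARALLELIZE="True"
else
    PARALLELIZE="False"
fi
echo "Detected $NUM_GPUS GPU(s); parallelize=$PARALLELIZE"
```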
Test Quantization:
Set up the ability to run models quantized with GPTQ or loaded in 4-bit for testing.
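For illustration, the quantization choice could be folded into the model arguments roughly like this; the MODEL and MODEL_ARGS names and the exact option strings are assumptions, not the script's actual values:

```bash
# Append the requested quantization option to the model_args string passed to the harness.
MODEL_ARGS="pretrained=$MODEL,dtype=auto"

if [ "$AUTOGPTQ" == "True" ]; then
    MODEL_ARGS="$MODEL_ARGS,autogptq=True"      # load a GPTQ-quantized checkpoint
elif [ "$LOAD_IN_4BIT" == "True" ]; then
    MODEL_ARGS="$MODEL_ARGS,load_in_4bit=True"  # load with bitsandbytes 4-bit
fi
```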
Package Installation:
Added installation of the deepspeed, gekko, and auto-gptq libraries.
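Along the lines of (bitsandbytes is also added per the commit list):

```bash
# Extra dependencies needed for DeepSpeed and the quantized (GPTQ / 4-bit) paths.
pip install deepspeed gekko auto-gptq bitsandbytes
```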
Environment Variable Handling:
Introduced LOAD_IN_4BIT and AUTOGPTQ environment variables with default fallbacks.
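A sketch of the fallback pattern; the default values shown here are assumptions:

```bash
# Use the value provided by the Runpod template if set, otherwise fall back to a default.
LOAD_IN_4BIT="${LOAD_IN_4BIT:-False}"
AUTOGPTQ="${AUTOGPTQ:-False}"
```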
Script Structure:
Introduced the functions run_benchmark_nous and run_benchmark_openllm to encapsulate the benchmark-running logic so future benchmarks can follow the same pattern.
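A rough outline of the shape of these functions; the CLI flags follow the lm-evaluation-harness main.py interface, but the exact arguments, paths, and output locations here are illustrative:

```bash
run_benchmark_nous() {
    local task="$1"
    # Run one Nous-suite task against the configured model and write results to its own JSON file.
    python main.py \
        --model hf-causal \
        --model_args "$MODEL_ARGS" \
        --tasks "$task" \
        --output_path "./${task}.json"
}

run_benchmark_openllm() {
    local task="$1"
    local num_fewshot="$2"
    # Open LLM Leaderboard tasks additionally take a few-shot count.
    python main.py \
        --model hf-causal \
        --model_args "$MODEL_ARGS" \
        --tasks "$task" \
        --num_fewshot "$num_fewshot" \
        --output_path "./${task}.json"
}
```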
Refactoring the Benchmark Execution:
Replaced benchmark execution blocks with calls to the newly defined functions.
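The call sites might then look something like this; the BENCHMARK variable and the task names and few-shot counts are examples, not a definitive list:

```bash
# Dispatch to the wrappers instead of repeating the full python invocation for every task.
if [ "$BENCHMARK" == "nous" ]; then
    run_benchmark_nous "agieval"
    run_benchmark_nous "bigbench"
    run_benchmark_nous "truthfulqa"
elif [ "$BENCHMARK" == "openllm" ]; then
    run_benchmark_openllm "arc_challenge" 25
    run_benchmark_openllm "hellaswag" 10
fi
```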
Improved Error Handling and Logging:
Added a check for the presence of main.py before attempting to run it.
Replaced the paths for main.py and other files to resolve "Cannot find file" errors.
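A minimal sketch of the check, assuming the harness is cloned into a lm-evaluation-harness directory (the path is illustrative):

```bash
# Fail early with an explicit message instead of a bare "Cannot find file" error later on.
if [ ! -f "lm-evaluation-harness/main.py" ]; then
    echo "ERROR: lm-evaluation-harness/main.py not found; did the clone succeed?"
    exit 1
fi
```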
Infinite Sleep for Debug Mode:
Added a message to indicate that the script is in debug mode, providing clarity during script execution.
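Something along these lines, assuming a DEBUG environment variable gates the behavior:

```bash
# Announce debug mode and keep the pod alive so logs and outputs can be inspected.
if [ "$DEBUG" == "True" ]; then
    echo "DEBUG mode: benchmarks finished, sleeping indefinitely so the pod stays up."
    sleep infinity
fi
```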