
Python Clean-Up Pass 2: Tests & Code Improvements #1293

Closed
wants to merge 15 commits into from

Conversation

@cosmic-snow (Collaborator) commented Jul 30, 2023

Describe your changes

  • Python tests:
    • Add pytest markers for better test selection control
    • Add the ability to set model folders other than the default via the new PYTEST_MODEL_PATH environment variable (tests only)
    • Add tests for the gpt4all.gpt4all module / GPT4All class
    • Add tests for backend code (ctypes related) in the gpt4all.pyllmodel module
      • not exhaustive but the important parts are there
      • uncovered minor bugs (embeddings memory management) and a minor missing feature (embeddings memory requirement)
    • Add tests for the rest of the gpt4all.pyllmodel module in test_pyllmodel.py (focused on the LLModel class)
  • Many more type hints:
    • pyllmodel module
    • gpt4all module
  • Improve embedding logic in LLModel class
  • Factor out some logic into separate methods in GPT4All to make it more flexible:
    • Model initialization logic: so the heavy lifting could be handled differently in a subclass (e.g. deferred).
    • Chat session open/close: so it's easier to manage a session if there are separate localities (e.g. when using a framework).

There are no API breaking changes.

Pytest Markers

  • For the documentation on how markers work see: https://docs.pytest.org/en/7.4.x/how-to/mark.html
  • The following markers are now defined (see pytest.ini):
    • inference: all tests that do inference on a model. These are typically very taxing on both CPU and RAM
    • online: tests that require an internet connection (e.g. for models.json or the embeddings model)
    • slow: tests which typically take more than a few seconds; e.g. the heavier inference tests
    • c_api_call: only in test_backend.py, on tests which call the C API. While other Python tests can ultimately call the backend, too, the idea is that these can be run after backend changes.
  • Example: pytest -m "not inference and not online"
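Registered markers like these live in the `markers` section of pytest.ini. A sketch of what the definitions could look like (the marker names come from this PR; the exact description wording is illustrative):

```ini
[pytest]
markers =
    inference: tests that run inference on a model (taxing on both CPU and RAM)
    online: tests that require an internet connection (e.g. models.json, embeddings model)
    slow: tests that typically take more than a few seconds
    c_api_call: tests in test_backend.py that call the C API directly
```

Registering markers this way lets `pytest -m` expressions select subsets, and `--strict-markers` can then catch typos in marker names.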

Specifying Model Path with PYTEST_MODEL_PATH

Tests now look for an environment variable PYTEST_MODEL_PATH which can be used to specify model folders. Example Linux:

$ export PYTEST_MODEL_PATH="${HOME}/.local/share/nomic.ai/GPT4All:${HOME}/.cache/gpt4all"

More details in tests/__init__.py in the docstring.
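Since the variable follows the usual PATH convention, splitting it into folders is straightforward. A minimal sketch of how tests could consume it (hypothetical helper, not the actual code in tests/__init__.py; `get_model_folders` is an illustrative name):

```python
import os


def get_model_folders():
    """Split the optional PYTEST_MODEL_PATH variable into a list of folders.

    Returns None when the variable is unset, so the caller can fall back
    to the GPT4All default model directory.
    """
    raw = os.environ.get("PYTEST_MODEL_PATH")
    if raw is None:
        return None  # let GPT4All trigger its own default-directory fallback
    # Same separator convention as PATH: ':' on Linux/macOS, ';' on Windows.
    return [folder for folder in raw.split(os.pathsep) if folder]
```

With the Linux example above, this would yield the two folders in order, so tests can probe each until a model file is found.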

Issue ticket number and link

None

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • I have added thorough documentation for my code.
  • I have tagged PR with relevant project labels. I acknowledge that a PR without labels may be dismissed.
  • If this PR addresses a bug, I have provided both a screenshot/video of the original bug and the working solution.

Demo

None

Steps to Reproduce

mostly: pytest gpt4all-bindings/python 😉

Notes

I'm not sure yet about everything I'll include in this one; I might split it up further.

@cosmic-snow added the python-bindings label (gpt4all-bindings Python specific issues) on Jul 30, 2023
@cosmic-snow (Collaborator, Author) commented Jul 30, 2023

The first few commits are still from the previous PR. Ignore them; they will be rebased eventually. (Edit: Done.)

@cosmic-snow (Collaborator, Author):

This has become quite big even before refactoring.

At the moment, it's basically all tests. I think I'll do just a few code improvements now. And I won't worry about everything else I wanted to do here.

I'll open it up for review in any case. From here, it'll be some changes and squashing a bunch of commits.

@cosmic-snow cosmic-snow marked this pull request as ready for review August 13, 2023 21:50
@cosmic-snow (Collaborator, Author) commented Aug 14, 2023

I think I'll leave it at this, just some squashing still to do. Maybe some very minor touch-ups, as well.

  - Mark decorators in test_gpt4all.py
  - Add note to the makefile
  - pytest marks need to be defined in a config file, here: pytest.ini
  - see: https://docs.pytest.org/en/7.4.x/how-to/mark.html
  - Specify optional PYTEST_MODEL_PATH environment variable.
  - Format is like other PATH env vars, i.e. multiple folders allowed.
  - If set, these are the folder(s) in which models will be looked up
    for the tests.
  - Tests are focused on the backend, i.e. the C API (through ctypes)
  - Add another pytest marker specifically for the backend: c_api_call
  - Tests for the rest of the pyllmodel module which are mostly focused
    on the LLModel class.
  - module gpt4all
  - module pyllmodel
  - edit gpt4all_python.md to match the _format_chat_prompt_template() change

  - Separate _open_chat_session() and _close_chat_session() methods so
    it's easier to manage a session if there are separate
    localities (e.g. when using a framework).
…i#1293)

  - Make it more flexible so e.g. the heavy lifting could be handled
    differently in a subclass.
@apage43 (Member) commented Aug 24, 2023

It appears that if PYTEST_MODEL_PATH is not set, many tests fail with 'Invalid model path' instead of using a reasonable default:

======== short test summary info ========
FAILED gpt4all/tests/test_gpt4all.py::test_inference - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_orca_3b - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_falcon - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_llama_7b - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_llama_13b - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_mpt - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_replit - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_groovy - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_hparams - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_falcon - ValueError: Invalid model directory: None
FAILED gpt4all/tests/test_gpt4all.py::test_inference_mpt - ValueError: Invalid model directory: None
ERROR gpt4all/tests/test_backend.py::test_llmodel_model_create - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_model_create2 - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_model_destroy - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_required_mem - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_loadModel_valid - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_loadModel_invalid - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_isModelLoaded - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_prompt - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_embedding - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[0] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[1] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[2] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[4] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[8] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_valid[65] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_exception[None] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_exception[10011] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_exception[sixteen] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_invalid[-1] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_invalid[-4] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'
ERROR gpt4all/tests/test_backend.py::test_llmodel_get_set_threadCount_invalid[-257] - TypeError: unsupported operand type(s) for /: 'NoneType' and 'str'

@apage43 (Member) commented Aug 25, 2023

With a directory specified, some tests that check output fail on Mac because the Metal implementation does not produce the same result:

======== short test summary info ========
FAILED gpt4all/tests/test_gpt4all.py::test_inference_long_groovy - requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(2487988224 bytes read, 1297260057 more expected)', IncompleteRead(2487988224 bytes read, 1297260057 more expected))
FAILED gpt4all/tests/test_gpt4all.py::test_inference_hparams - AssertionError: assert 'Paris' in '1.\n'
FAILED gpt4all/tests/test_gpt4all.py::TestGPT4AllInstance::test_generate - AssertionError: assert ' peaches and...instructions?' == ' apples and ... and bananas.'
FAILED gpt4all/tests/test_gpt4all.py::TestGPT4AllInstance::test_generate_streaming - AssertionError: assert ' peaches and...instructions?' == ' apples and ... and bananas.'
FAILED gpt4all/tests/test_pyllmodel.py::TestLLModelInstance::test_prompt_model - assert ' And let him...e bed." In an' == '\nTheir eyes... on the prize'
FAILED gpt4all/tests/test_pyllmodel.py::TestLLModelInstance::test_prompt_model_streaming - assert ' And let him...e bed." In an' == '\nTheir eyes... on the prize'
======== 6 failed, 87 passed, 1 warning in 1638.82s (0:27:18) ========

@cosmic-snow (Collaborator, Author) commented Aug 25, 2023

Thanks for having a look.

It appears that if PYTEST_MODEL_PATH is not set, many tests fail with 'Invalid model path' instead of using a reasonable default:

Good catch, I failed to consider that. Although it already handles the "reasonable default" case by returning None, since None is the default value that makes the GPT4All class trigger its own fallback. I need to do the same for the backend tests, for now at least.

With a directory specified, some tests that check output fail on Mac because the Metal implementation does not produce the same result:

Ah, interesting. I'd wager I don't have Metal support on CI? @acrastt was kind enough to help me run some tests on a Mac because I don't have one, but that was without Metal, too. I guess I'll just make the tests more general.

Edit:

  • Not sure what that first error in test_gpt4all.py::test_inference_long_groovy is about, though. That's not something I've touched, I think; I've only changed the model folder logic there. And the test passes here. Was that on a machine without internet access?
  • I had not changed the logic in test_gpt4all.py::test_inference_hparams either, and it's not deterministic (no temp=0).

I have inference off most of the time here, so that slipped through. By the way, is there a way to make it truly deterministic across all architectures? temp=0 is not enough.

@cosmic-snow (Collaborator, Author) commented Aug 25, 2023

By the way, looking at those pre-existing tests, I don't understand what do_long_input() is supposed to achieve in test_gpt4all. It might be a holdover from earlier, I'll investigate that later.

Edit: I guess it's fine, in that it tests whether longer inputs are possible. Although it's not very useful anymore, since n_predict changed from 128 to 4096 in LLModel.

@apage43 (Member) commented Sep 2, 2023

that it tests whether longer inputs are possible

This is what it's for: specifically, I added it to test that memory management for long contexts in the backend has not broken.

@apage43 (Member) left a review comment:

Outputs are not necessarily 100% consistent across backends or build configurations. Tests should probably not check for exact output strings; if they do, they should force running on the CPU (though even with Metal off, the macOS build may get different results, as ggml on macOS still differs significantly from the Windows and Linux builds).

test_inference_hparams still fails on my machine because 'Paris' does not appear in the first 3 tokens.
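One way to avoid brittle exact-string checks is a tolerant assertion helper that looks for expected keywords instead of comparing full outputs. A hypothetical sketch (not code from this PR; `assert_contains_any` is an illustrative name):

```python
def assert_contains_any(output: str, keywords: list[str]) -> None:
    """Pass if any of the expected keywords appears in the model output.

    Exact model output varies across backends (CPU vs. Metal) and build
    configurations, so checking for keywords is more robust than
    comparing exact strings.
    """
    lowered = output.lower()
    assert any(keyword.lower() in lowered for keyword in keywords), (
        f"none of {keywords!r} found in output: {output!r}"
    )
```

A test like test_inference_hparams could then assert that "Paris" appears anywhere in a longer completion, rather than within the first few tokens, which is what breaks on macOS here.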

@cosmic-snow (Collaborator, Author):

Alright, I hope that's that.

If it's ok like this, I think I'll rebase onto main and merge those fixes into their respective commits.

@apage43 (Member) commented Sep 2, 2023

All tests now pass for me if I do set PYTEST_MODEL_PATH, but some still fail without it: https://gist.github.com/apage43/75247c0111363cde4dedd81b80217caa

Weird that this doesn't repro for you. If you're sure the var is not set and it all passes, it could be a macOS-specific thing, in which case I'm okay with calling that out in the Python bindings README and leaving it that way for now; I'll try to debug it myself later.

@cosmic-snow (Collaborator, Author) commented Sep 2, 2023

Weird that this doesn't repro for you. If you're sure the var is not set and it all passes, it could be a macOS-specific thing, in which case I'm okay with calling that out in the Python bindings README and leaving it that way for now; I'll try to debug it myself later.

I'm not sure about it, I'll have to try again. I'm usually running all of these through VS Codium, which comes with .env file support out of the box. It's very convenient, but there's still a possibility I didn't test it quite right because of that.

If you don't set the PYTEST_MODEL_PATH, the fallback will be ~/.cache/gpt4all. Are all of your models in that directory?

Actually, this is strange:

        if model_path is None:
            try:
                os.makedirs(DEFAULT_MODEL_DIRECTORY, exist_ok=True)
            except OSError as exc:
                raise ValueError(
                    f"Failed to create model download directory at {DEFAULT_MODEL_DIRECTORY}: {exc}. "
                    "Please specify model_path."
                )
            model_path = DEFAULT_MODEL_DIRECTORY
        else:
            model_path = model_path.replace("\\", "\\\\")
    
        if not os.path.exists(model_path):
>           raise ValueError(f"Invalid model directory: {model_path}")
E           ValueError: Invalid model directory: None

How is it still None at that point? None should trigger the creation and setting of the DEFAULT_MODEL_DIRECTORY.

Maybe I'll have to remove all my models and paths to find out what's going on here.

Edit: Alright, I see the problem. I guess I just failed to remove the env var here when I tried previously. There's a conversion from None to 'None' happening there, too. 🤦‍♂️

The idea is to eventually accept files/folders both as Path and str, but this version isn't there yet.
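The None-to-'None' conversion mentioned above is easy to reproduce in isolation. A minimal sketch of the pitfall (hypothetical, not the actual test code; the env var name is illustrative):

```python
import os

# When an environment variable is unset, os.environ.get() returns None.
model_path = os.environ.get("SOME_UNSET_MODEL_PATH_VAR")

# Converting it to a string produces the literal 'None', not a real path,
# which later fails the os.path.exists() check with
# "ValueError: Invalid model directory: None".
as_string = str(model_path)

# Safer: only convert when a value is actually present, so the caller's
# DEFAULT_MODEL_DIRECTORY fallback can still kick in on None.
safe_path = None if model_path is None else str(model_path)
```

This is also why the traceback above shows `Invalid model directory: None`: the check `if model_path is None` never fires once the value has been stringified.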

@apage43 (Member) commented Sep 5, 2023

If you don't set the PYTEST_MODEL_PATH, the fallback will be ~/.cache/gpt4all. Are all of your models in that directory?

They are, though for me it is a symlink to the path the GUI puts them in:

lrwxr-xr-x 1 aaron staff 57 Jun 30 14:20 /Users/aaron/.cache/gpt4all -> /Users/aaron/Library/Application Support/nomic.ai/GPT4All

@cosmic-snow (Collaborator, Author):

Alright, so it looks like the version that finally brings the project near upstream llama.cpp again (i.e. 2.5.0 of the chat GUI; GGUF support) is about to be ready.

I guess the PR is outdated now (I haven't followed everything closely), but the idea is still the same: if/when it gets accepted, refine things in a branch of the repo itself so there's access to CI, i.e. CI can run when commits are pushed.

How exactly that happens is up to you.

@manyoso (Collaborator) commented Mar 6, 2024

This is outdated and to be closed?

@manyoso manyoso closed this Mar 6, 2024
@cosmic-snow (Collaborator, Author):

This was outdated a while ago because you guys didn't want to move it forward or merge it when it was ready.

@cebtenzzre (Member):

This was outdated a while ago because you guys didn't want to move it forward or merge it when it was ready.

If you're still interested in updating this PR to GGUF and the latest bindings, I could run CI for you and review/merge it. apage43 hasn't been assigned to GPT4All for a while.

@cosmic-snow (Collaborator, Author) commented Mar 10, 2024

If you're still interested in updating this PR to GGUF and the latest bindings, I could run CI for you and review/merge it. apage43 hasn't been assigned to GPT4All for a while.

Why not talk to me on Discord while I'm there? Here's the summary of what has been going on and the state this is in:

  • It was basically ready by mid-August 2023
  • I had to pipe up on Discord to even get someone to look at it
  • There was one issue with PYTEST_MODEL_PATH which I didn't catch, although I thought I had it covered, because of my setup (the variable was set in multiple places, so removing it once was not enough to uncover the issue)
  • However, I fixed that with 21744b8 (early September)
  • I never heard back from anyone even when bringing it up.

Now I get that you people have been busy, but it's simply at a point where someone would have needed to spend some time on it. That shouldn't have been too much to ask.

I can't decide this for you.

  • I don't have a Mac myself, so the only way I could have the tests run was through another volunteer, @acrastt (no one at Nomic volunteered). And this wasn't even on one of the newer M-series machines, but it became clear there were differences between the platforms.

So what I would have liked to see was the following:

  • Running the latest commit (21744b8) to see that it's OK.
  • Figuring out if and how to move it to separate branch in the project itself, so it could run on CI. Note:
    • It's likely that not all the tests should be active for that, some are resource-heavy.
    • The work I did with Pytest markers was exactly to be able to select a subset of tests.
    • I don't know what resource constraints there are on CI nor do I know what it costs you, so this is something you'd need to decide.
  • With that in place, try to bring everything up-to-date (that means current main)
  • Only at that point, see whether it'd be a good idea to actually merge it into main.

If you're still interested in updating this PR to GGUF and the latest bindings, I could run CI for you and review/merge it.

You'll have to excuse me, but at this point in time, my motivation to contribute any more time to the project is at an all-time low.

And regardless, the things I've outlined above have never been addressed and you guys basically refuse to talk to me about it.
