Grand Overhaul: Refactored Core Structure, Introduced New Features, E… #1295

Open
wants to merge 1 commit into base: main

Conversation

6ixGODD
Contributor

@6ixGODD 6ixGODD commented Oct 18, 2024

Refactoring of graphrag/query Module

Description

This pull request introduces a significant refactoring of the graphrag/query module within the GraphRAG project. The
primary objectives of this refactoring are:

  • Decoupling the Query Module: Transform the query component into an independent package, fully decoupled from
    other modules.
  • Enhancing Code Reusability and Modularity: Implement a modular design for the entire lifecycle of the GraphRAG
    query pipeline, promoting loose coupling and facilitating future maintenance and extension.
  • Improving the Python API: Provide a more user-friendly and convenient Python API, simplifying the creation and
    management of GraphRAG clients.
  • Eliminating Redundancies: Remove redundant modules and parameters (e.g., the question_gen module), streamlining
    the codebase.
  • Comprehensive Documentation: Add detailed docstrings and extensive type annotations throughout the codebase,
    ensuring code reliability and passing mypy checks.
  • Enhanced CLI and GUI Tools: Introduce a more powerful CLI tool with rich parameter combinations and an optional
    GUI built with PyQt6.
  • Unified Streaming Implementation: Employ a more elegant approach to handle both streaming and non-streaming
    outputs within a single method.
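The unified streaming approach mentioned in the last bullet can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`MiniClient`, `_generate`), not the actual implementation: a single `chat` method returns either the full response or an iterator of chunks depending on a `stream` flag.

```python
from typing import Iterator, Union

# Hypothetical sketch: one method serving both streaming and
# non-streaming callers, selected by a single `stream` flag.
class MiniClient:
    def _generate(self, message: str) -> Iterator[str]:
        # Stand-in for token-by-token LLM output.
        for token in message.split():
            yield token + " "

    def chat(self, message: str, stream: bool = False) -> Union[str, Iterator[str]]:
        chunks = self._generate(message)
        if stream:
            return chunks            # caller iterates over chunks as they arrive
        return "".join(chunks)       # caller gets the fully assembled text


client = MiniClient()
full = client.chat("hello world")                       # non-streaming: a string
parts = list(client.chat("hello world", stream=True))   # streaming: chunks
```

Keeping one code path for both modes avoids the duplicated prompt-building and search logic that separate streaming/non-streaming methods tend to accumulate.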

Related Issues

N/A

Proposed Changes

1. Project Layout

The codebase has been reorganized, promoting separation of concerns and ease of navigation. The new structure is as
follows:

query/
├── __init__.py             # Package initialization
├── __main__.py             # CLI entry point
├── _base_client.py         # Base client templates
├── _cli/                   # CLI layer
│   ├── __init__.py
│   ├── _api.py             # CLI API
│   ├── _cli.py             # CLI main program
│   ├── _qt/                # GUI layer
│   │   ├── __init__.py
│   │   └── _app.py         # GUI main program
│   └── _utils.py           # CLI utilities
├── _client.py              # GraphRAG clients
├── _config.py              # Configuration classes
├── _defaults.py            # Default constants
├── _search/                # Search layer
│   ├── __init__.py
│   ├── _context/           # Context module
│   │   ├── __init__.py
│   │   ├── _builders/      # Context builders
│   │   ├── _loaders/       # Context loaders
│   │   └── _types.py       # Type hints
│   ├── _defaults.py        # Search layer defaults
│   ├── _engine/            # Engine module
│   │   ├── __init__.py
│   │   ├── _base_engine.py # Base engine template
│   │   ├── _global.py      # Global search engine
│   │   └── _local.py       # Local search engine
│   ├── _input/             # Input module
│   │   ├── __init__.py
│   │   ├── _loaders/       # Input loaders
│   │   └── _retrieval/     # Input retrieval
│   ├── _llm/               # LLM module
│   │   ├── __init__.py
│   │   ├── _base_llm.py    # Base LLM template
│   │   ├── _chat.py        # Chat LLM
│   │   ├── _embedding.py   # Text Embedding
│   │   └── _types.py       # Type hints
│   ├── _model/             # Data models
│   └── _types/             # Type hints
│       ├── __init__.py
│       ├── _search.py
│       ├── _search_chunk.py
│       ├── _search_verbose.py
│       └── _search_chunk_verbose.py
├── _utils/                 # Utilities
│   ├── __init__.py
│   ├── _text.py            # Text utilities
│   └── _utils.py           # General utilities
├── _vector_stores/         # Vector storage layer
│   ├── __init__.py
│   ├── _base_vector_store.py
│   └── _lancedb.py
├── _version.py             # Version information
├── errors.py               # Error types
└── types.py                # Type hints
  • The query module is now fully decoupled from other modules, making it usable as a standalone package.
  • The code is reorganized to promote modularity, facilitating easier maintenance and potential future extensions.

2. Enhanced Python API

2.1 Initialize

Users can easily create a GraphRAGClient instance from a configuration file, a dictionary, environment variables, or a
configuration object.

a) From Configuration File

e.g.,

from graphrag.query import GraphRAGClient

config_file = "config.yaml"
client = GraphRAGClient.from_config_file(config_file)
  • The configuration file can be in YAML, JSON, or TOML format. Refer to the graphrag.example.yaml file for an example.
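For reference, a minimal YAML configuration mirroring the dictionary form shown below might look like this (key names are inferred from the configuration dictionary example; consult graphrag.example.yaml for the authoritative schema):

```yaml
chat:
  api_key: API_KEY
  base_url: BASE_URL
  model: MODEL

embedding:
  api_key: API_KEY
  base_url: BASE_URL
  model: MODEL
```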
b) From Configuration Dictionary

e.g.,

from graphrag.query import AsyncGraphRAGClient

config = {
    "chat":      {
        "api_key":  "API_KEY",
        "base_url": "BASE_URL",
        "model":    "MODEL"
    },
    "embedding": {
        "api_key":  "API_KEY",
        "base_url": "BASE_URL",
        "model":    "MODEL"
    }
}

client = AsyncGraphRAGClient.from_config_dict(config)
c) From Configuration Object

If you prefer to use a configuration object and an optional logger, you can pass them directly to the constructor:

import logging

from graphrag.query import (
    ChatLLMConfig,
    EmbeddingConfig,
    GraphRAGClient,
    GraphRAGConfig,
)

logger = logging.getLogger(__name__)
config = GraphRAGConfig(
    chat=ChatLLMConfig(api_key="API_KEY", base_url="BASE_URL", model="MODEL"),
    embedding=EmbeddingConfig(api_key="API_KEY", base_url="BASE_URL", model="MODEL")
)

client = GraphRAGClient(config=config, logger=logger)
d) From Environment Variables

You can also initialize a client using environment variables:

export GRAPHRAG_QUERY__CHAT_LLM__API_KEY=API_KEY
export GRAPHRAG_QUERY__CHAT_LLM__MODEL=MODEL
export GRAPHRAG_QUERY__EMBEDDING__API_KEY=API_KEY
export GRAPHRAG_QUERY__EMBEDDING__MODEL=MODEL

Or create a .env file in the project root directory:

GRAPHRAG_QUERY__CHAT_LLM__API_KEY=API_KEY
GRAPHRAG_QUERY__CHAT_LLM__MODEL=MODEL

GRAPHRAG_QUERY__EMBEDDING__API_KEY=API_KEY
GRAPHRAG_QUERY__EMBEDDING__MODEL=MODEL

Then initialize the client:

from graphrag.query import GraphRAGClient, GraphRAGConfig

config = GraphRAGConfig()
client = GraphRAGClient(config=config)
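The double underscore (`__`) in these variable names acts as a nesting delimiter, so GRAPHRAG_QUERY__CHAT_LLM__API_KEY maps to the chat LLM's api_key field. A rough, hypothetical sketch of how such names can be parsed into a nested dictionary (the actual implementation presumably relies on a settings library):

```python
PREFIX = "GRAPHRAG_QUERY__"

def env_to_nested(environ: dict) -> dict:
    """Parse PREFIX-ed variables into a nested dict using `__` as delimiter."""
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # skip unrelated environment variables
        node = config
        *parents, leaf = name[len(PREFIX):].lower().split("__")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


env = {
    "GRAPHRAG_QUERY__CHAT_LLM__API_KEY": "API_KEY",
    "GRAPHRAG_QUERY__CHAT_LLM__MODEL": "MODEL",
    "GRAPHRAG_QUERY__EMBEDDING__MODEL": "MODEL",
    "UNRELATED_VAR": "ignored",
}
nested = env_to_nested(env)
```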

2.2 Chatting with GraphRAG

a) Simple Chat

You can chat with GraphRAG using the chat method:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
)

print(response.choice.message.content)

b) Streaming Chat

Or, in streaming mode:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    stream=True
)

for chunk in response:
    print(chunk.choice.delta.content, end="")

client.close()  # Close the client
c) Using with Statement

You can also use the with statement to manage the client's lifecycle:

from graphrag.query import GraphRAGClient, GraphRAGConfig

config: GraphRAGConfig = ...
with GraphRAGClient(config=config) as client:
    response = client.chat(
        engine="local",
        message=[
            {"role": "user", "content": "What is the purpose of life?"},
            {"role": "assistant", "content": "The purpose of life is to be happy."},
            {"role": "user", "content": "What is the meaning of happiness?"}
        ],
        stream=True
    )

    for chunk in response:
        print(chunk.choice.delta.content, end="")
d) Verbose Search Results

If you want to collect verbose search results, you can set the verbose parameter to True:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    verbose=True
)

print(response.model_dump())

Or, in streaming mode:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    stream=True,
    verbose=True
)

for chunk in response:
    print(chunk.model_dump())
e) Async Client

AsyncGraphRAGClient provides an asynchronous version of the GraphRAGClient:

import asyncio

from graphrag.query import AsyncGraphRAGClient, GraphRAGConfig

config: GraphRAGConfig = ...


async def main():
    client = AsyncGraphRAGClient(config=config)
    response = await client.chat(
        engine="local",
        message=[
            {"role": "user", "content": "What is the purpose of life?"},
            {"role": "assistant", "content": "The purpose of life is to be happy."},
            {"role": "user", "content": "What is the meaning of happiness?"}
        ],
        stream=True
    )

    async for chunk in response:
        print(chunk.choice.delta.content, end="")

    await client.close()  # Or you can use the async context manager


asyncio.run(main())
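The final comment notes that the async context manager can be used instead of an explicit close(). The pattern looks like the following, shown with a hypothetical stand-in client since AsyncGraphRAGClient itself needs live credentials:

```python
import asyncio

# Hypothetical stand-in illustrating the async context manager pattern
# mentioned above; AsyncGraphRAGClient would be used the same way.
class DemoAsyncClient:
    def __init__(self):
        self.closed = False

    async def chat(self, message: str) -> str:
        return f"echo: {message}"

    async def close(self) -> None:
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()  # close() runs automatically on exit


async def main() -> str:
    async with DemoAsyncClient() as client:
        return await client.chat("hi")


result = asyncio.run(main())
```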

3. Streamlined CLI and GUI Tools

3.1 CLI Parameters

Execute the following command:

python -m graphrag.query --help

The available options are:

usage: python -m query [-h] [--verbose] [--engine {local,global}] [--stream] --chat-api-key CHAT_API_KEY [--chat-base-url CHAT_BASE_URL] --chat-model CHAT_MODEL
                       --embedding-api-key EMBEDDING_API_KEY [--embedding-base-url EMBEDDING_BASE_URL] --embedding-model EMBEDDING_MODEL --context-dir CONTEXT_DIR
                       [--mode {console,gui}] [--sys-prompt SYS_PROMPT] [-V]

GraphRAG Query CLI

options:
  -h, --help            show this help message and exit
  --verbose, -v         enable verbose logging (default: False)
  --engine {local,global}, -e {local,global}
                        engine to use for the query (default: local)
  --stream, -s          enable streaming output (default: False)
  --chat-api-key CHAT_API_KEY, -k CHAT_API_KEY
                        API key for the Chat API (default: None)
  --chat-base-url CHAT_BASE_URL, -b CHAT_BASE_URL
                        base URL for the chat API (default: None)
  --chat-model CHAT_MODEL, -m CHAT_MODEL
                        model to use for the chat API (default: None)
  --embedding-api-key EMBEDDING_API_KEY, -K EMBEDDING_API_KEY
                        API key for the embedding API (default: None)
  --embedding-base-url EMBEDDING_BASE_URL, -B EMBEDDING_BASE_URL
                        base URL for the embedding API (default: None)
  --embedding-model EMBEDDING_MODEL, -M EMBEDDING_MODEL
                        model to use for the embedding API (default: None)
  --context-dir CONTEXT_DIR, -c CONTEXT_DIR
                        directory containing the context data (default: None)
  --mode {console,gui}, -o {console,gui}
                        mode to execute the GraphRAG engine (default: console)
  --sys-prompt SYS_PROMPT, -p SYS_PROMPT
                        system prompt file in TXT format to use for the local engine (default: None)
  -V, --version         show program's version number and exit

3.2 Usage Examples

We can get started with the CLI using the corpus from the official GraphRAG tutorial:

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./input/pg24022.txt

Then run the indexing pipeline (omitted here for brevity).

a) Console Mode
python -m graphrag.query --engine local \
                         --chat-api-key API_KEY \
                         --chat-model MODEL \
                         --embedding-api-key API_KEY \
                         --embedding-model MODEL \
                         --context-dir ./output \
                         --mode console \
                         --stream

Or, more concisely:

python -m graphrag.query -e local \
                         -k API_KEY \
                         -m MODEL \
                         -K API_KEY \
                         -M MODEL \
                         -c ./output \
                         -o console \
                         -s

Here is an example screenshot:

[Screenshot: console mode]

b) GUI Mode
python -m graphrag.query --engine local \
                         --chat-api-key API_KEY \
                         --chat-model MODEL \
                         --embedding-api-key API_KEY \
                         --embedding-model MODEL \
                         --context-dir ./output \
                         --mode gui

Here is an example screenshot:

[Screenshot: GUI mode]

4. Web API

The refactored query module has been applied to a web service in
the graphrag-server repository, providing an OpenAI-compatible
Chat API interface.

git clone https://github.com/6ixGODD/graphrag-server.git

cd graphrag-server

cp .env.example .env

Then modify the .env file with the appropriate API keys and models.

Write a simple Python script to run the web service:

from server import create_app

app = create_app()

if __name__ == '__main__':
    import uvicorn

    uvicorn.run(app, host='127.0.0.1', port=8000)

Then you can use the OpenAI SDK to interact with the web service:

import openai

client = openai.OpenAI(
    api_key="API_KEY",
    base_url="http://127.0.0.1:8000/api",
)
  • Detailed documentation and deployment instructions (e.g., using Gunicorn and Docker) will be provided in future
    updates.
  • Currently, there is no detailed docstring documentation for the web service; this will be added subsequently.

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

As mentioned, this PR involves significant code changes, but I believe it is a positive step forward. With thorough
testing, it will give developers a more stable and modular version of GraphRAG to integrate into their applications,
yielding greater overall benefit.

However, for this PR to be merged, some additional documentation work and test case development may require
collaboration with the official team.

@6ixGODD 6ixGODD requested review from a team as code owners October 18, 2024 13:05
@JoedNgangmeni

PLEASE REVIEW THIS! PEOPLE ARE WAITING!!!!

@knguyen1

PLEASE REVIEW THIS! PEOPLE ARE WAITING!!!!

You need to rebase to main.
