Grand Overhaul: Refactored Core Structure, Introduced New Features, E… #1295

Open
wants to merge 1 commit into base: main

Conversation

6ixGODD
Contributor

@6ixGODD 6ixGODD commented Oct 18, 2024

Refactoring of graphrag/query Module

Description

This pull request introduces a significant refactoring of the graphrag/query module within the GraphRAG project. The
primary objectives of this refactoring are:

  • Decoupling the Query Module: Transform the query component into an independent package, fully decoupled from
    other modules.
  • Enhancing Code Reusability and Modularity: Implement a modular design for the entire lifecycle of the GraphRAG
    query pipeline, promoting loose coupling and facilitating future maintenance and extension.
  • Improving the Python API: Provide a more user-friendly and convenient Python API, simplifying the creation and
    management of GraphRAG clients.
  • Eliminating Redundancies: Remove redundant modules and parameters (e.g., the question_gen module), streamlining
    the codebase.
  • Comprehensive Documentation: Add detailed docstrings and extensive type annotations throughout the codebase,
    ensuring code reliability and passing mypy checks.
  • Enhanced CLI and GUI Tools: Introduce a more powerful CLI tool with rich parameter combinations and an optional
    GUI built with PyQt6.
  • Unified Streaming Implementation: Employ a more elegant approach to handle both streaming and non-streaming
    outputs within a single method.
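The unified streaming approach mentioned in the last bullet can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`MiniClient`, `_generate`), not the actual implementation: a single `chat` method returns either the full response or an iterator of chunks depending on a `stream` flag.

```python
from typing import Iterator, Union

# Hypothetical sketch: one method serving both streaming and
# non-streaming callers, selected by a single `stream` flag.
class MiniClient:
    def _generate(self, message: str) -> Iterator[str]:
        # Stand-in for token-by-token LLM output.
        for token in message.split():
            yield token + " "

    def chat(self, message: str, stream: bool = False) -> Union[str, Iterator[str]]:
        chunks = self._generate(message)
        if stream:
            return chunks            # caller iterates over chunks as they arrive
        return "".join(chunks)       # caller gets the fully assembled text


client = MiniClient()
full = client.chat("hello world")                       # non-streaming: a string
parts = list(client.chat("hello world", stream=True))   # streaming: chunks
```

Keeping one code path for both modes avoids the duplicated prompt-building and search logic that separate streaming/non-streaming methods tend to accumulate.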

Related Issues

N/A

Proposed Changes

1. Project Layout

The codebase has been reorganized, promoting separation of concerns and ease of navigation. The new structure is as
follows:

query/
├── __init__.py             # Package initialization
├── __main__.py             # CLI entry point
├── _base_client.py         # Base client templates
├── _cli/                   # CLI layer
│   ├── __init__.py
│   ├── _api.py             # CLI API
│   ├── _cli.py             # CLI main program
│   ├── _qt/                # GUI layer
│   │   ├── __init__.py
│   │   └── _app.py         # GUI main program
│   └── _utils.py           # CLI utilities
├── _client.py              # GraphRAG clients
├── _config.py              # Configuration classes
├── _defaults.py            # Default constants
├── _search/                # Search layer
│   ├── __init__.py
│   ├── _context/           # Context module
│   │   ├── __init__.py
│   │   ├── _builders/      # Context builders
│   │   ├── _loaders/       # Context loaders
│   │   └── _types.py       # Type hints
│   ├── _defaults.py        # Search layer defaults
│   ├── _engine/            # Engine module
│   │   ├── __init__.py
│   │   ├── _base_engine.py # Base engine template
│   │   ├── _global.py      # Global search engine
│   │   └── _local.py       # Local search engine
│   ├── _input/             # Input module
│   │   ├── __init__.py
│   │   ├── _loaders/       # Input loaders
│   │   └── _retrieval/     # Input retrieval
│   ├── _llm/               # LLM module
│   │   ├── __init__.py
│   │   ├── _base_llm.py    # Base LLM template
│   │   ├── _chat.py        # Chat LLM
│   │   ├── _embedding.py   # Text Embedding
│   │   └── _types.py       # Type hints
│   ├── _model/             # Data models
│   └── _types/             # Type hints
│       ├── __init__.py
│       ├── _search.py
│       ├── _search_chunk.py
│       ├── _search_verbose.py
│       └── _search_chunk_verbose.py
├── _utils/                 # Utilities
│   ├── __init__.py
│   ├── _text.py            # Text utilities
│   └── _utils.py           # General utilities
├── _vector_stores/         # Vector storage layer
│   ├── __init__.py
│   ├── _base_vector_store.py
│   └── _lancedb.py
├── _version.py             # Version information
├── errors.py               # Error types
└── types.py                # Type hints
  • The query module is now fully decoupled from other modules, making it usable as a standalone package.
  • The code is reorganized to promote modularity, facilitating easier maintenance and potential future extensions.

2. Enhanced Python API

2.1 Initialize

Users can easily create a GraphRAGClient instance from a configuration file, a dictionary, environment variables, or a
configuration object.

a) From Configuration File

e.g.,

from graphrag.query import GraphRAGClient

config_file = "config.yaml"
client = GraphRAGClient.from_config_file(config_file)
  • The configuration file can be in YAML, JSON, or TOML format. Refer to the graphrag.example.yaml file for an example.
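For reference, a minimal YAML configuration mirroring the dictionary form shown below might look like this (key names are inferred from the configuration dictionary example; consult graphrag.example.yaml for the authoritative schema):

```yaml
chat:
  api_key: API_KEY
  base_url: BASE_URL
  model: MODEL

embedding:
  api_key: API_KEY
  base_url: BASE_URL
  model: MODEL
```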
b) From Configuration Dictionary

e.g.,

from graphrag.query import AsyncGraphRAGClient

config = {
    "chat":      {
        "api_key":  "API_KEY",
        "base_url": "BASE_URL",
        "model":    "MODEL"
    },
    "embedding": {
        "api_key":  "API_KEY",
        "base_url": "BASE_URL",
        "model":    "MODEL"
    }
}

client = AsyncGraphRAGClient.from_config_dict(config)
c) From Configuration Object

If you prefer to use a configuration object and an optional logger, you can pass them directly to the constructor:

import logging

from graphrag.query import (
    ChatLLMConfig,
    EmbeddingConfig,
    GraphRAGClient,
    GraphRAGConfig,
)

logger = logging.getLogger(__name__)
config = GraphRAGConfig(
    chat=ChatLLMConfig(api_key="API_KEY", base_url="BASE_URL", model="MODEL"),
    embedding=EmbeddingConfig(api_key="API_KEY", base_url="BASE_URL", model="MODEL")
)

client = GraphRAGClient(config=config, logger=logger)
d) From Environment Variables

You can also initialize a client using environment variables:

export GRAPHRAG_QUERY__CHAT_LLM__API_KEY=API_KEY
export GRAPHRAG_QUERY__CHAT_LLM__MODEL=MODEL
export GRAPHRAG_QUERY__EMBEDDING__API_KEY=API_KEY
export GRAPHRAG_QUERY__EMBEDDING__MODEL=MODEL

Or create a .env file in the project root directory:

GRAPHRAG_QUERY__CHAT_LLM__API_KEY=API_KEY
GRAPHRAG_QUERY__CHAT_LLM__MODEL=MODEL

GRAPHRAG_QUERY__EMBEDDING__API_KEY=API_KEY
GRAPHRAG_QUERY__EMBEDDING__MODEL=MODEL

Then initialize the client:

from graphrag.query import GraphRAGClient, GraphRAGConfig

config = GraphRAGConfig()
client = GraphRAGClient(config=config)
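The double underscore (`__`) in these variable names acts as a nesting delimiter, so GRAPHRAG_QUERY__CHAT_LLM__API_KEY maps to the chat LLM's api_key field. A rough, hypothetical sketch of how such names can be parsed into a nested dictionary (the actual implementation presumably relies on a settings library):

```python
PREFIX = "GRAPHRAG_QUERY__"

def env_to_nested(environ: dict) -> dict:
    """Parse PREFIX-ed variables into a nested dict using `__` as delimiter."""
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # skip unrelated environment variables
        node = config
        *parents, leaf = name[len(PREFIX):].lower().split("__")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


env = {
    "GRAPHRAG_QUERY__CHAT_LLM__API_KEY": "API_KEY",
    "GRAPHRAG_QUERY__CHAT_LLM__MODEL": "MODEL",
    "GRAPHRAG_QUERY__EMBEDDING__MODEL": "MODEL",
    "UNRELATED_VAR": "ignored",
}
nested = env_to_nested(env)
```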

2.2 Chatting with GraphRAG

a) Simple Chat

You can chat with GraphRAG using the chat method:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
)

print(response.choice.message.content)

b) Streaming Chat

Or, in streaming mode:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    stream=True
)

for chunk in response:
    print(chunk.choice.delta.content, end="")

client.close()  # Close the client
c) Using with Statement

You can also use the with statement to manage the client's lifecycle:

from graphrag.query import GraphRAGClient, GraphRAGConfig

config: GraphRAGConfig = ...
with GraphRAGClient(config=config) as client:
    response = client.chat(
        engine="local",
        message=[
            {"role": "user", "content": "What is the purpose of life?"},
            {"role": "assistant", "content": "The purpose of life is to be happy."},
            {"role": "user", "content": "What is the meaning of happiness?"}
        ],
        stream=True
    )

    for chunk in response:
        print(chunk.choice.delta.content, end="")
d) Verbose Search Results

If you want to collect verbose search results, you can set the verbose parameter to True:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    verbose=True
)

print(response.model_dump())

Or, in streaming mode:

from graphrag.query import GraphRAGClient

client: GraphRAGClient = ...
response = client.chat(
    engine="local",
    message=[
        {"role": "user", "content": "What is the purpose of life?"},
        {"role": "assistant", "content": "The purpose of life is to be happy."},
        {"role": "user", "content": "What is the meaning of happiness?"}
    ],
    stream=True,
    verbose=True
)

for chunk in response:
    print(chunk.model_dump())
e) Async Client

AsyncGraphRAGClient provides an asynchronous version of the GraphRAGClient:

import asyncio

from graphrag.query import AsyncGraphRAGClient, GraphRAGConfig

config: GraphRAGConfig = ...


async def main():
    client = AsyncGraphRAGClient(config=config)
    response = await client.chat(
        engine="local",
        message=[
            {"role": "user", "content": "What is the purpose of life?"},
            {"role": "assistant", "content": "The purpose of life is to be happy."},
            {"role": "user", "content": "What is the meaning of happiness?"}
        ],
        stream=True
    )

    async for chunk in response:
        print(chunk.choice.delta.content, end="")

    await client.close()  # Or you can use the async context manager


asyncio.run(main())
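The final comment notes that the async context manager can be used instead of an explicit close(). The pattern looks like the following, shown with a hypothetical stand-in client since AsyncGraphRAGClient itself needs live credentials:

```python
import asyncio

# Hypothetical stand-in illustrating the async context manager pattern
# mentioned above; AsyncGraphRAGClient would be used the same way.
class DemoAsyncClient:
    def __init__(self):
        self.closed = False

    async def chat(self, message: str) -> str:
        return f"echo: {message}"

    async def close(self) -> None:
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()  # close() runs automatically on exit


async def main() -> str:
    async with DemoAsyncClient() as client:
        return await client.chat("hi")


result = asyncio.run(main())
```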

3. Streamlined CLI and GUI Tools

3.1 CLI Parameters

Execute the following command:

python -m graphrag.query --help

The available options are:

usage: python -m query [-h] [--verbose] [--engine {local,global}] [--stream] --chat-api-key CHAT_API_KEY [--chat-base-url CHAT_BASE_URL] --chat-model CHAT_MODEL
                       --embedding-api-key EMBEDDING_API_KEY [--embedding-base-url EMBEDDING_BASE_URL] --embedding-model EMBEDDING_MODEL --context-dir CONTEXT_DIR
                       [--mode {console,gui}] [--sys-prompt SYS_PROMPT] [-V]

GraphRAG Query CLI

options:
  -h, --help            show this help message and exit
  --verbose, -v         enable verbose logging (default: False)
  --engine {local,global}, -e {local,global}
                        engine to use for the query (default: local)
  --stream, -s          enable streaming output (default: False)
  --chat-api-key CHAT_API_KEY, -k CHAT_API_KEY
                        API key for the Chat API (default: None)
  --chat-base-url CHAT_BASE_URL, -b CHAT_BASE_URL
                        base URL for the chat API (default: None)
  --chat-model CHAT_MODEL, -m CHAT_MODEL
                        model to use for the chat API (default: None)
  --embedding-api-key EMBEDDING_API_KEY, -K EMBEDDING_API_KEY
                        API key for the embedding API (default: None)
  --embedding-base-url EMBEDDING_BASE_URL, -B EMBEDDING_BASE_URL
                        base URL for the embedding API (default: None)
  --embedding-model EMBEDDING_MODEL, -M EMBEDDING_MODEL
                        model to use for the embedding API (default: None)
  --context-dir CONTEXT_DIR, -c CONTEXT_DIR
                        directory containing the context data (default: None)
  --mode {console,gui}, -o {console,gui}
                        mode to execute the GraphRAG engine (default: console)
  --sys-prompt SYS_PROMPT, -p SYS_PROMPT
                        system prompt file in TXT format to use for the local engine (default: None)
  -V, --version         show program's version number and exit

3.2 Usage Examples

We can get started with the CLI using the corpus from the official GraphRAG tutorial:

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./input/pg24022.txt

Then run the indexing pipeline (omitted here for brevity).

a) Console Mode
python -m graphrag.query --engine local \
                         --chat-api-key API_KEY \
                         --chat-model MODEL \
                         --embedding-api-key API_KEY \
                         --embedding-model MODEL \
                         --context-dir ./output \
                         --mode console \
                         --stream

Or, more concisely:

python -m graphrag.query -e local \
                         -k API_KEY \
                         -m MODEL \
                         -K API_KEY \
                         -M MODEL \
                         -c ./output \
                         -o console \
                         -s

Here is an example screenshot:

[Screenshot: console mode]

b) GUI Mode
python -m graphrag.query --engine local \
                         --chat-api-key API_KEY \
                         --chat-model MODEL \
                         --embedding-api-key API_KEY \
                         --embedding-model MODEL \
                         --context-dir ./output \
                         --mode gui

Here is an example screenshot:

[Screenshot: GUI mode]

4. Web API

The refactored query module has been applied to a web service in
the graphrag-server repository, providing an OpenAI-compatible
Chat API interface.

git clone https://github.com/6ixGODD/graphrag-server.git

cd graphrag-server

cp .env.example .env

Then modify the .env file with the appropriate API keys and models.

Write a simple Python script to run the web service:

from server import create_app

app = create_app()

if __name__ == '__main__':
    import uvicorn

    uvicorn.run(app, host='127.0.0.1', port=8000)

Then you can use the OpenAI SDK to interact with the web service:

import openai

client = openai.OpenAI(
    api_key="API_KEY",
    base_url="http://127.0.0.1:8000/api",
)
  • Detailed documentation and deployment instructions (e.g., using Gunicorn and Docker) will be provided in future
    updates.
  • Currently, there is no detailed docstring documentation for the web service; this will be added subsequently.

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

As mentioned, this PR involves significant code changes, but I believe it is a positive step forward. With thorough
testing, it will give developers a more stable and modular version of GraphRAG to integrate into their applications,
yielding greater overall benefit.

However, for this PR to be merged, some additional documentation work and test case development may require
collaboration with the official team.

@6ixGODD 6ixGODD requested review from a team as code owners October 18, 2024 13:05
@JoedNgangmeni

PLEASE REVIEW THIS! PEOPLE ARE WAITING!!!!

@knguyen1

PLEASE REVIEW THIS! PEOPLE ARE WAITING!!!!

You need to rebase to main.
