
Knowledge #1567

Open

wants to merge 14 commits into base: main
Conversation

bhancockio
Collaborator

No description provided.

@lorenzejay lorenzejay self-requested a review November 14, 2024 20:22
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource


class Knowledge(BaseModel):
Collaborator:

Let's write some docs about:

  1. how to use this
  2. setting your own custom embedder for this

@@ -0,0 +1,82 @@
import os
Collaborator:

I'd drop the Ollama version: support OpenAI, then let anyone bring their own embedder function (super easy), and set up `knowledge_config` the same way `embedder_config` is set up for our RAG storage.
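A bring-your-own embedder could look roughly like this. This is a minimal sketch: `BaseEmbedder` here is a stand-in mirroring the interface quoted in this diff, and `HashEmbedder` is a hypothetical toy (deterministic character-count vectors), not a real embedding model.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseEmbedder(ABC):
    """Stand-in for the embedder interface in this PR (illustrative)."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        """Return one embedding vector per chunk."""
        ...


class HashEmbedder(BaseEmbedder):
    """Toy custom embedder: buckets character codes into a fixed-size vector."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        vectors = []
        for chunk in chunks:
            vec = [0.0] * self.dim
            for ch in chunk:
                vec[ord(ch) % self.dim] += 1.0
            vectors.append(vec)
        return vectors
```

Anything satisfying `embed_chunks` could then be passed wherever the OpenAI embedder is used today.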

Comment on lines +25 to +31
@abstractmethod
def add(self, embedder: BaseEmbedder) -> None:
    """Process content, chunk it, compute embeddings, and save them."""
    pass

def get_embeddings(self) -> List[np.ndarray]:
    """Return the list of embeddings for the chunks."""
Collaborator:

Let's make this save to the project directory instead of the root.
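Resolving storage under the project directory could be sketched like this. The `.crewai/knowledge` layout and the function name are assumptions for illustration, not crewai's actual paths.

```python
import os
from typing import Optional


def knowledge_storage_path(project_dir: Optional[str] = None) -> str:
    """Return (and create) a storage dir under the project, not the root.

    The ".crewai/knowledge" subpath is an assumed layout for this sketch.
    """
    base = project_dir or os.getcwd()
    path = os.path.join(base, ".crewai", "knowledge")
    os.makedirs(path, exist_ok=True)
    return path
```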

# Compute embeddings for the new chunks
new_embeddings = embedder.embed_chunks(new_chunks)
# Save the embeddings
self.chunk_embeddings.extend(new_embeddings)
Collaborator:

We should also be saving this to a DB and persisting it, like our RAG storage.
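A minimal persistence sketch in that spirit, using stdlib `sqlite3` so it is self-contained; crewai's actual RAG storage layer differs (it is vector-DB backed), and the table layout here is an assumption.

```python
import json
import sqlite3
from typing import Dict, List


def save_embeddings(
    db_path: str, chunks: List[str], embeddings: List[List[float]]
) -> None:
    """Upsert chunk -> embedding rows so re-runs don't duplicate work."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS knowledge (chunk TEXT PRIMARY KEY, embedding TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO knowledge VALUES (?, ?)",
        [(c, json.dumps(e)) for c, e in zip(chunks, embeddings)],
    )
    con.commit()
    con.close()


def load_embeddings(db_path: str) -> Dict[str, List[float]]:
    """Load all persisted chunk -> embedding pairs."""
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT chunk, embedding FROM knowledge").fetchall()
    con.close()
    return {chunk: json.loads(emb) for chunk, emb in rows}
```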

Collaborator:

We should do this so we can generate the embeddings for files once, then just check whether they already exist. Otherwise, users will spend tokens regenerating embeddings that already exist.
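The generate-once idea can be sketched as a content-hash cache wrapped around any embed function; `CachingEmbedder` is a hypothetical name, and a real version would back the cache with the persisted DB rather than a dict.

```python
import hashlib
from typing import Callable, Dict, List


class CachingEmbedder:
    """Wraps an embed function; only unseen chunks (by SHA-256) hit the API."""

    def __init__(self, embed_fn: Callable[[List[str]], List[List[float]]]):
        self.embed_fn = embed_fn
        self.cache: Dict[str, List[float]] = {}

    @staticmethod
    def _key(chunk: str) -> str:
        return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

    def embed_chunks(self, chunks: List[str]) -> List[List[float]]:
        missing = [c for c in chunks if self._key(c) not in self.cache]
        if missing:
            # Only the cache misses cost tokens.
            for chunk, vec in zip(missing, self.embed_fn(missing)):
                self.cache[self._key(chunk)] = vec
        return [self.cache[self._key(c)] for c in chunks]
```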

Collaborator:

I'll help with this.

from pydantic import BaseModel, ConfigDict, Field

from crewai.knowledge.embedder.base_embedder import BaseEmbedder

Collaborator:

Extending this to save embeddings to a DB, then using the Knowledge class to query from there.
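The query side could be sketched as cosine similarity over a persisted chunk-to-embedding map; in the real Knowledge class this ranking would be delegated to the vector store, so this is only an illustration of the shape.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query(
    store: Dict[str, List[float]], query_vec: List[float], k: int = 3
) -> List[str]:
    """Return the top-k chunks ranked by cosine similarity to query_vec."""
    ranked = sorted(
        store.items(), key=lambda kv: cosine(kv[1], query_vec), reverse=True
    )
    return [chunk for chunk, _ in ranked[:k]]
```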

@@ -85,6 +88,10 @@ class Agent(BaseAgent):
llm: Union[str, InstanceOf[LLM], Any] = Field(
    description="Language model that will run the agent.", default=None
)
knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
    default=None,
    description="Knowledge sources for the agent.",
)
Collaborator:

Would be better to declare this on the Crew class: the task prompt would query the relevant knowledge there, trickling down to the agent level, rather than defining it here on the agent.
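The suggested design could be sketched like this: knowledge sources declared once on the Crew, with agents falling back to them. Dataclasses stand in for the pydantic models, plain strings stand in for `BaseKnowledgeSource`, and `sources_for` is a hypothetical helper, not part of the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    role: str
    knowledge_sources: Optional[List[str]] = None  # per-agent override


@dataclass
class Crew:
    agents: List[Agent]
    knowledge_sources: Optional[List[str]] = None  # shared, crew-level

    def sources_for(self, agent: Agent) -> List[str]:
        # Agent-level sources win; otherwise trickle down from the crew.
        return agent.knowledge_sources or self.knowledge_sources or []
```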

3 participants