Conduct research on given URLs without forgetting and add more research #734

Open
wants to merge 3 commits into
base: master

Makesh-Srinivasan
Contributor

Hi!

I am making this pull request to fix the issue with source_urls being reset inside the conduct_research() function, which causes GPTR to forget the user-input URLs.

Additionally, I am introducing a new parameter to the GPTResearcher class called add_additional_sources (bool). This parameter allows GPTR to gather more context from a default web search in addition to the user-input URLs, thereby increasing the overall scope of research for the query or sub-query.

HOW: If add_additional_sources is set, GPTR scrapes both the user-input URLs and the results of the default web search, so it researches the user's sources as well as the sources it finds on its own. If it is unset, GPTR scrapes only the user-input URLs and builds the answer from the context gathered there. If the query is unrelated to the URLs' contents, we log a message so the user knows the answer is generated from the model's inherent knowledge from its training data and not through 'research'.
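To make the branching above concrete, here is a minimal, self-contained sketch. It is not the code in this PR and not GPTR's actual API: `gather_context`, `scrape`, and `web_search` are hypothetical names, the two callables are injected stand-ins for the real scraper and retriever, and an empty result is used as a simplified stand-in for "the query is unrelated to the URLs' contents".

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def gather_context(
    query: str,
    source_urls: List[str],
    add_additional_sources: bool,
    scrape: Callable[[List[str]], List[str]],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Hypothetical helper: collect context snippets for one query or sub-query."""
    context: List[str] = []

    if source_urls:
        # Always scrape the user-input URLs. Pass a copy so the caller's
        # list is never modified or reset by this function.
        context.extend(scrape(list(source_urls)))

        if add_additional_sources:
            # Also scrape what the default web search finds.
            # Assumption of this sketch: URLs the user already gave are skipped.
            found = [url for url in web_search(query) if url not in source_urls]
            context.extend(scrape(found))
    else:
        # No user-input URLs: fall back to the default web search alone.
        context.extend(scrape(web_search(query)))

    if not context:
        # Simplification: "no context gathered" stands in for "query is
        # unrelated to the sources".
        logger.warning(
            "No relevant context found for %r; the answer will come from "
            "the model's training data, not from research.",
            query,
        )

    return context
```

With a fake scraper and a fake search function passed in, the three cases (URLs only, URLs plus web search, web search only) can be exercised without any network access.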

WHY: The intent of providing source_urls is to have GPTR scour the user-provided webpages. Because conduct_research was forgetting the URLs, we were unable to scrape them; with this fix, they can be scraped. However, the user may overlook cases where the query is unrelated to the hardcoded source_urls, which causes GPTR to generate answers from its own knowledge rather than from new research. To address this, I introduce the new parameter add_additional_sources, which lets GPTR scour the user-provided sources and also conduct web searches, increasing the context available for the answer. This way, even if the sources do not match the query, the default web search still lets GPTR perform authentic research instead of generating the answer from the model's pre-trained weights, as it did before. The feature is also useful when the user wants research drawn not only from the hardcoded URLs they provide but also from other related sources on the internet, which are infeasible for the user to add manually every time but which GPTR can find easily.

Other functions remain the same. I have also cleaned up parts of the code and comments relevant to the new modifications.

Thanks,
Makesh Srinivasan

@assafelovic
Owner

@Makesh-Srinivasan this is great, love it! Would you mind also adding a section to the documentation that explains how to use it? https://github.com/assafelovic/gpt-researcher/blob/master/docs/docs/gpt-researcher/tailored-research.md

Thank you!

@Makesh-Srinivasan
Contributor Author

Sure, I can do that :) Thanks!

@ElishaKay
Collaborator

ElishaKay commented Nov 11, 2024

resolved conflicts & pending merge here.
Welcome to the Git Tree @Makesh-Srinivasan - feel free to ping me on Discord to get added to our Contributors wall of honor

#982


3 participants