Releases: deepset-ai/haystack
v1.17.2-rc1
This release fixes a bug in telemetry collection caused by the generalimport library. Since we already switched to a different library on the unstable branch, we decided to backport the introduction of lazy_import to 1.17.x, thus fixing the bug at its root.
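For readers curious what the lazy-import pattern buys us: importing a wrapped module always succeeds, and the ImportError only fires when the module is first used. Below is a minimal, hypothetical sketch of the idea; it is not the actual lazy_import implementation shipped in this release.

import importlib

class LazyModule:
    """Defer an ImportError from import time to first use (illustrative only)."""

    def __init__(self, module_name, install_hint):
        self._module_name = module_name
        self._install_hint = install_hint
        self._module = None

    def __getattr__(self, attr):
        # Import on first attribute access; fail with a helpful message if missing.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise ImportError(
                    f"'{self._module_name}' is not installed. {self._install_hint}"
                ) from exc
        return getattr(self._module, attr)

posthog = LazyModule("posthog", "Run 'pip install posthog' to enable telemetry.")
# Nothing fails yet; an error is raised only if an attribute like
# posthog.capture is used while the package is absent.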
What's Changed
Full Changelog: v1.17.1...v1.17.2-rc1
v1.17.1
v1.17.0
⭐ Highlights
🗣️ Introducing ConversationalAgent
Great news! We’re introducing the ConversationalAgent – a type of Agent specifically implemented to create chat applications! With its memory integration, the new ConversationalAgent enables human-like conversation with large language models (LLMs). If you’re worried about the token limit of your model, there is an option to condense the chat history with ConversationSummaryMemory before injecting the history into the prompt.
To get started, just initialize ConversationalAgent with a PromptNode and start chatting.
from haystack.agents.conversational import ConversationalAgent
from haystack.agents.memory import ConversationSummaryMemory

# prompt_node is an existing PromptNode; user_input is the user's message
summary_memory = ConversationSummaryMemory(prompt_node=prompt_node)
conversational_agent = ConversationalAgent(
    prompt_node=prompt_node,
    memory=summary_memory,
)
conversational_agent.run(user_input)
To try it out, check out the new ConversationalAgent Tutorial, see the full example, or visit our documentation!
🎉 Now using transformers 4.29.1
With this release, Haystack depends on the latest version of the transformers library.
🧠 More LLMs
Haystack now supports command from Cohere and claude from Anthropic!
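As a quick, hedged example of what that looks like in practice (the exact model identifiers below are assumptions; check the PromptNode documentation for the supported names):

from haystack.nodes import PromptNode

# Cohere's command model; cohere_api_key is your Cohere API key
cohere_node = PromptNode("command", api_key=cohere_api_key, max_length=256)

# Anthropic's claude model; anthropic_api_key is your Anthropic API key
claude_node = PromptNode("claude-v1", api_key=anthropic_api_key, max_length=256)

print(cohere_node("Summarize the history of the printing press in two sentences."))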
🤖 New error reporting strategy around 3rd-party dependencies
One of the challenges with a multi-purpose NLP framework like Haystack is to find the sweet spot of a turn-key solution implementing multiple NLP use cases without getting into dependency hell. With the new features around generative AI recently shipped, we got several requests about avoiding pulling in too many unrelated dependencies when, say, one just needs a PromptNode.
We heard your feedback and lowered the number of packages a simple pip install farm-haystack pulls in (and we'll keep doing it)! To keep the user experience as smooth as possible, by using the generalimport library, we defer dependency errors from "import time" down to "actual usage time" – so that you don't have to ask yourself "Why do I need this database client to run PromptNode?" anymore.
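Concretely, the difference looks like this (the error message below is illustrative, not the exact text Haystack prints):

# With a minimal install, importing Haystack and LLM-only nodes just works:
from haystack.nodes import PromptNode

node = PromptNode("google/flan-t5-base")  # no database client needed

# A node whose optional dependency is absent only fails when you actually use it,
# and the error tells you which extra to install, along the lines of:
#   ImportError: failed to import 'elasticsearch' ...
#   run 'pip install farm-haystack[elasticsearch]' to use ElasticsearchDocumentStore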
⚠️ MilvusDocumentStore Deprecated in Haystack
With Haystack 1.17, we have moved the MilvusDocumentStore out of the core haystack project, and we will maintain it in the haystack-extras repo. To continue using Milvus, check out the instructions on how to install the package separately in its readme.
What's Changed
⚠️ Breaking Changes
- refactor: Update schema objects to handle Dataframes in to_{dict,json} and from_{dict,json} by @sjrl in #4747
- chore: remove deprecated MilvusDocumentStore by @masci in #4951
- chore: remove BaseKnowledgeGraph by @masci in #4953
- chore: remove deprecated node PDFToTextOCRConverter by @masci in #4982
Pipeline
- chore: upgrade transformers to 4.28.1 by @vblagoje in #4665
- chore: fixed reader loading test for hf-hub starting 0.14.0 by @mayankjobanputra in #4607
- bug: (rest_api) remove full logging of overwritten env variables by @duffn in #4791
- fix: preserve root_node in JoinNode's output by @ZanSara in #4820
- feat: Send pipeline config hash every 100 runs by @bogdankostic in #4884
- feat: add BLIP support in TransformersImageToText by @anakin87 in #4912
- fix: EvaluationResult serialization changes dataframes by @tstadel in #4906
- fix: shaper exception when retriever returns 0 docs by @yuanwu2017 in #4929
- fix: Use AutoTokenizer instead of DPR specific tokenizer by @bogdankostic in #4898
- fix: Fix necessary extra for MarkdownConverter by @bogdankostic in #4947
DocumentStores
- fix: Add support for _split_overlap meta to Pinecone and dict metadata in general to Weaviate by @bogdankostic in #4805
- fix: str issues in squad_to_dpr by @PhilipMay in #4826
- feat: introduce generalimport by @ZanSara in #4662
- feat: Support authentication using AuthBearerToken and AuthClientCredentials in Weaviate by @hsm207 in #4028
Documentation
- fix: loads local HF Models in PromptNode pipeline by @saitejamalyala in #4670
- fix: README latest and main installation by @dfokina in #4741
- fix: SentenceTransformersRanker's predict_batch returns wrong number of documents by @vblagoje in #4756
- feat: add Google API to search engine providers by @Pouyanpi in #4722
- bug: fix filtering in MemoryDocumentStore (v2) by @ZanSara in #4768
- refactor: Extract ToolsManager, add it to Agent by composition by @vblagoje in #4794
- chore: move custom linter to a separate package by @masci in #4790
- refactor!: Deprecate name param in PromptTemplate and introduce template_name instead by @bogdankostic in #4810
- chore: revert Deprecate name param in PromptTemplate and introduce prompt_name instead by @bogdankostic in #4834
- chore: remove optional imports in v2 by @ZanSara in #4855
- test: Update unit tests for schema by @sjrl in #4835
- feat: allow filtering documents on all fields (v2) by @ZanSara in #4773
- feat: Add Anthropic invocation layer by @silvanocerza in #4818
- fix: improve Document comparison (v2) by @ZanSara in #4860
- feat: Add Cohere PromptNode invocation layer by @vblagoje in #4827
- fix: Support for gpt-4-32k by @dnetguru in #4825
- fix: Document v2 JSON serialization by @ZanSara in #4863
- fix: Dynamic max_answers for SquadProcessor (fixes IndexError when max_answers is less than the number of answers in the dataset) by @benheckmann in #4817
- feat: Add agent memory by @vblagoje in #4829
- fix: Make sure summary memory is cumulative by @vblagoje in #4932
- feat: Add conversational agent by @vblagoje in #4931
- docs: Small fix to PromptTemplate API docs by @sjrl in #4870
- build: Remove mmh3 dependency by @julian-risch in #4896
- docstrings update in web.py by @dfokina in #4921
- feat: Add max_tokens to BaseGenerator params by @vblagoje in #4168
- fix: change parameter name to request_with_retry by @ZanSara in #4950
- fix: Adjust tool pattern to support multi-line inputs by @vblagoje in #4801
- feat: enable passing generation_kwargs to the PromptNode in pipeline.run() by @faaany in #4832
- fix: Remove streaming LLM tracking; they are all streaming now by @vblagoje in #4944
- feat: HFInferenceEndpointInvocationLayer streaming support by @vblagoje in #4819
- fix: Fix request_with_retry kwargs by @silvanocerza in #4980
Other Changes
- fix: Allow to set num_beams in HFInvocationLayer by @sywangyi in #4731
- ci: Execute pipelines and utils unit tests in CI by @bogdankostic in #4749
- refactor: Make agent test more robust by @vblagoje in #4767
- ci: Add coverage tracking with Coveralls by @silvanocerza in #4772
- test: move several modeling tests in e2e/ by @ZanSara in #4308
- chore: Added deprecation tests for seq2seq generator and RAG Generator by @mayankjobanputra in #4782
- feat: Add HF local runtime token streaming support by @vblagoje in #4652
- fix: load the local finetuning model from pipeline yaml (#4729) by @yuanwu2017 in #4760
- test: Add others folder to unit test job by @silvanocerza in https://github.com/deepset-ai/haysta...
v1.17.0-rc2
v1.17.0-rc1
v1.16.1
What's Changed
- fix: update ImportError for 'metrics' dependency by @bilgeyucel in #4778
Full Changelog: v1.16.0...v1.16.1
v1.16.0
⭐️ Highlights
Using GPT-4 through PromptNode and Agent
Haystack now supports GPT-4 through PromptNode and Agent. This means you can use the latest advancements in large language modeling to make your NLP applications more accurate and efficient.
To get started, create a PromptModel for GPT-4 and plug it into your PromptNode. Just like with ChatGPT, you can use GPT-4 in a chat scenario and ask follow-up questions, as shown in this example:
from haystack.nodes import PromptModel, PromptNode

# api_key is your OpenAI API key
prompt_model = PromptModel("gpt-4", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
{"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
More flexible routing of Documents with RouteDocuments
This release includes an enhancement to the RouteDocuments node, which makes Document routing even more flexible.
The RouteDocuments node now not only returns Documents matched by the split_by or metadata_values parameter, but also creates an extra route for unmatched Documents. This means that you won't accidentally filter out any Documents due to missing metadata fields. Additionally, the update adds support for using List[List[str]] as input type to metadata_values, so multiple metadata values can be grouped into a single output.
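Here is a brief sketch of the grouped-values routing. The "language" metadata field and its values are made up for illustration, and the exact output layout may differ; see the RouteDocuments documentation for details.

from haystack import Document
from haystack.nodes import RouteDocuments

docs = [
    Document(content="Hallo Welt", meta={"language": "de"}),
    Document(content="Hello world", meta={"language": "en"}),
    Document(content="Bonjour le monde", meta={"language": "fr"}),
    Document(content="No language field at all"),
]

# Group en/de into one route and fr into another; Documents with a missing or
# unlisted value go to an extra route instead of being silently dropped.
router = RouteDocuments(split_by="language", metadata_values=[["en", "de"], ["fr"]])
output, _ = router.run(documents=docs)  # output maps routes (output_1, output_2, ...) to Document lists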
Deprecating RAGenerator and Seq2SeqGenerator
RAGenerator and Seq2SeqGenerator are deprecated and will be removed in version 1.18. We advise using the more powerful PromptNode instead, which can use RAG and Seq2Seq models as well. The following example shows how to use PromptNode as a replacement for Seq2SeqGenerator:
from haystack import Document
from haystack.nodes import PromptNode, PromptTemplate

p = PromptNode("vblagoje/bart_lfqa")
# Start by defining a question/query
query = "Why does water heated to room temperature feel colder than the air around it?"
# Given the question above, suppose the documents below were found in some document store
documents = [
"when the skin is completely wet. The body continuously loses water by...",
"at greater pressures. There is an ambiguity, however, as to the meaning of the terms 'heating' and 'cooling'...",
"are not in a relation of thermal equilibrium, heat will flow from the hotter to the colder, by whatever pathway...",
"air condition and moving along a line of constant enthalpy toward a state of higher humidity. A simple example ...",
"Thermal contact conductance. In physics, thermal contact conductance is the study of heat conduction between solid ...",
]
# Manually concatenate the question and support documents into BART input
# conditioned_doc = "<P> " + " <P> ".join([d for d in documents])
# query_and_docs = "question: {} context: {}".format(query, conditioned_doc)
# Or use the PromptTemplate as shown here
pt = PromptTemplate("lfqa", "question: {query} context: {join(documents, delimiter='<P>')}")
res = p.prompt(prompt_template=pt, query=query, documents=[Document(d) for d in documents])
⚠️ Breaking Changes
Refactoring of our dependency management
We added the following extras as optional dependencies for Haystack: stats, metrics, preprocessing, file-conversion, and elasticsearch. To keep using certain components, you need to install farm-haystack with these new extras:
Component | Installation extra
---|---
PreProcessor | farm-haystack[preprocessing]
DocxToTextConverter | farm-haystack[file-conversion]
TikaConverter | farm-haystack[file-conversion]
LangdetectDocumentLanguageClassifier | farm-haystack[file-conversion]
ElasticsearchDocumentStore | farm-haystack[elasticsearch]
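For example, to keep both the PreProcessor and the ElasticsearchDocumentStore working after the upgrade, install the two extras together:

pip install 'farm-haystack[preprocessing,elasticsearch]'

The quotes prevent shells like zsh from interpreting the square brackets.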
Dropping support for Python 3.7
Since Python 3.7 will reach end of life in June 2023, we will no longer support it as of Haystack version 1.16.
Smaller Breaking Changes
- Using TableCell instead of Span to indicate the coordinates of a table cell (#4616); see the sketch after this list
- Default save_dir for FARMReader's train method changed to f"./saved_models/{self.inferencer.model.language_model.name}" (#4553)
- Using PreProcessor with split_respect_sentence_boundary set to True might return a different set of Documents than in v1.15 (#4470)
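To illustrate the first change: table answers are now addressed by row/column coordinates rather than flattened character offsets. A minimal sketch, assuming TableCell lives in haystack.schema and exposes row and col fields as described in #4616:

from haystack.schema import TableCell

# Before: Span(start, end) offsets into a linearized table.
# Now: explicit cell coordinates, e.g. the third row, first column.
cell = TableCell(row=2, col=0)
print(cell.row, cell.col)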
What's Changed
Breaking Changes
- feat: Deduplicate duplicate Answers resulting from overlapping Documents in FARMReader by @bogdankostic in #4470
- feat: Change default save_dir for FARMReader.train by @GitIgnoreMaybe in #4553
- feat!: drop Python3.7 support by @ZanSara in #4421
- refactor!: extract evaluation and statistical dependencies by @ZanSara in #4457
- refactor!: extract preprocessing and file conversion deps by @ZanSara in #4605
- feat: Implementation of Table Cell Proposal by @sjrl in #4616
Pipeline
- fix: Fix pipeline config and agent tools hashing for telemetry by @silvanocerza in #4508
- refactor: Adjust WhisperTranscriber to pipeline run methods by @vblagoje in #4510
- Adding filtering support for Weaviate when used for BM25 querying by @zoltan-fedor in #4385
- test: Remove duplicate whisper test by @julian-risch in #4567
- fix: provide a fallback for PyMuPDF by @masci in #4564
- Docs: Shaper API update by @agnieszka-m in #4542
- Docs: Update Whisper API. by @agnieszka-m in #4539
- refactor: remove variadic parameters in WebSearch initialization; make new nodes directly importable by @anakin87 in #4581
- test: Add pytest fixture to block requests in unit tests by @silvanocerza in #4433
- test: Rework conftest by @silvanocerza in #4614
- feat: arbitrary crawler_depth for Crawler class by @benheckmann in #4623
- fix: ParsrConverter list element added by @Namoush in #4562
- fix: make langdetect truly optional by @ZanSara in #4686
- feat: More flexible routing for RouteDocuments node by @sjrl in #4690
- docs: Adapt Shaper docstrings regarding dropping metadata by @bogdankostic in #4655
DocumentStores
- fix: Check for date fields in weaviate meta update by @joekitsmith in #4371
- chore: skip Milvus tests by @ZanSara in #4654
- docs: Add deprecation information to doc string of MilvusDocumentStore by @bogdankostic in #4658
- Ignore cross-reference properties when loading documents by @masci in #4664
- fix: PineconeDocumentStore error when delete_documents right after initialization by @Namoush in #4609
- fix: remove warnings from the more recent Elasticsearch client by @masci in #4602
- fix: Fixing the Weaviate BM25 query builder bug by @zoltan-fedor in #4703
Documentation
- Docs: Update Seq2SeqGen models and docstrings lg by @agnieszka-m in #4595
- feat: Load documents from remote - helper function by @TuanaCelik in #4545
- refactor: Remove unnecessary literal_eval when parsing env var by @silvanocerza in #4570
- Docs: Fix QuestionGenerator and Summarizer docstrings by @agnieszka-m in #4594
- refactor: Rework prompt tests by @silvanocerza in #4600
- feat: Add util method to make HTTP requests with configurable retry by @silvanocerza in #4627
- refactor: Rework invocation layers by @silvanocerza in #4615
- refactor: Add 503 as status code that triggers retry in request_with_retry by @silvanocerza in #4640
- feat: initial implementation of MemoryDocumentStore for new Pipelines by @ZanSara in #4447
- docs: Add PDFToTextOCRConverter to API Docs by @bogdankostic in #4656
- Docs: Add max length unit to PromptNode API docs by @agnieszka-m in #4601
- fix: Add model_max_length model_kwargs parameter to HF PromptNode by @vblagoje in #4651
- feat: Add chatgpt streaming by @vblagoje in #4659
- feat: Add Hugging Face inferencing PromptNode layer by @vblagoje in #4641
- refactor: node->component by @ZanSara in #4687
- feat: Add AzureChatGPT Capability using new InvocationLayer style by @recrudesce in #4675
...
v1.15.1
v1.15.1-rc1
v1.15.0
⭐ Highlights
Build Agents Yourself with Open Source
Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These agents have the power to answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents allow you to build similar experiences in an open source way: your own environment, full control and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.
from haystack.agents import Agent, Tool
from haystack.pipelines import WebQAPipeline

# web_retriever, web_qa_pn, agent_pn, and prompt_template are defined elsewhere
web_qa_tool = Tool(
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)
agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)
agent.run(query="<Your question here!>")
Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!
Flexible PromptTemplates
Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:
from haystack.nodes import PromptTemplate, AnswerParser

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
    "Context: {join(documents)}\n"
    "Question: {query}\n"
    "Answer: ",
    output_parser=AnswerParser(),
)
More details here.
Using ChatGPT through PromptModel
A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:
from haystack.nodes import PromptModel, PromptNode

# api_key is your OpenAI API key
prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
{"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)
Haystack Extras
We now have another repo, haystack-extras, with extra Haystack components, such as the audio nodes AnswerToSpeech and DocumentToSpeech. For example, these two can be installed via:
pip install farm-haystack-text2speech
What's Changed
Breaking Changes
- feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
- feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
- build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
- chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
- refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
- fix: Fix debug on PromptNode by @recrudesce in #4483
- feat: PromptTemplate extensions by @tstadel in #4378
Pipeline
- feat: Add JsonConverter node by @bglearning in #4130
- fix: Shaper store all outputs from function by @sjrl in #4223
- refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
- fix: add option to not override results by Shaper by @tstadel in #4231
- feat: reduce and focus telemetry by @ZanSara in #4087
- refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
- refact: mark unit tests under the test/nodes/** path by @masci in #4235
- fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
- test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
- test: mock all Translator tests and move one to e2e by @ZanSara in #4290
- fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
- feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
- test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
- fix: EvalResult load migration by @tstadel in #4289
- feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
- refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
- fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
- docs: TransformersImageToText - inform about supported models, better exception handling by @anakin87 in #4310
- fix: check that answer is not None before accessing it in table.py by @culms in #4376
- feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
- Add Whisper node by @vblagoje in #4335
- tests: Mark Crawler tests correctly by @silvanocerza in #4435
- test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
- fix: issue evaluation check for content type by @ju-gu in #4181
- feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
- refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
- refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
- refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
- refactor: remove telemetry v1 by @ZanSara in #4496
- feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
- feat: Add agent tools by @vblagoje in #4437
- refactor: reduce telemetry events count by @ZanSara in #4501
DocumentStores
- fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
- fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
- fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
- refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498
Documentation
- feat: add top_k to PromptNode by @tstadel in #4159
- feat: Add Agent by @julian-risch in #4148
- ci: Automate OpenAPI specs upload to Readme.io by @silvanocerza in #4228
- ci: Refactor docs config and generation by @silvanocerza in #4280
- feat: Add Azure as OpenAI endpoint by @vblagoje in #4170
- refactor: Allow flexible document id generation by @danielbichuetti in https://github.com/deepset-a...