⭐ Highlights

This is a major Haystack release with many new features. The release blog post has a detailed summary. Below are the top highlights:

Milvus Document Store

Milvus is an open-source vector database. With the MilvusDocumentStore contributed by @lalitpagaria, embedding-based Retrievers such as the DensePassageRetriever or EmbeddingRetriever can use production-ready Milvus servers for large-scale deployments.

Knowledge Graph

An experimental Knowledge Graph integration is introduced using GraphDB. The GraphDBKnowledgeGraph stores triples and executes SPARQL queries. It can be combined with the Text2SparqlRetriever to convert natural language queries to SPARQL.

Pipeline configuration with YAML

Pipelines can now be configured with YAML. This makes it easier to share query and indexing configurations, reproduce setups, A/B test Pipelines, and move from development to production.

REST APIs

The REST APIs are revamped to use Pipelines for querying and for indexing files. The YAML configuration lives in rest_api/pipelines.yaml. The new API endpoints are more generic to accommodate custom Pipeline configurations.

Confidence Scores

Answers now have a probability score that is better calibrated to the model's confidence. It ranges from 0 to 1, where 0 signifies very low confidence and 1 very high confidence.
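
As a rough illustration of how a client might use the calibrated score, the sketch below drops answers that fall under a cutoff and orders the rest. The `probability` key follows the sample /query response shown in these notes; the helper name `filter_confident`, the 0.5 threshold, and the sample answers are hypothetical and not part of Haystack.

```python
from typing import Any, Dict, List


def filter_confident(
    answers: List[Dict[str, Any]], threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep answers whose probability is at or above threshold, best first.

    Hypothetical helper: assumes each answer dict carries a "probability"
    value in the range 0-1. Answers without that key are treated as 0.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    kept = [a for a in answers if a.get("probability", 0.0) >= threshold]
    return sorted(kept, key=lambda a: a["probability"], reverse=True)


# Made-up answers, only to show the call.
example_answers = [
    {"answer": "A", "probability": 0.92},
    {"answer": "B", "probability": 0.08},
    {"answer": "C", "probability": 0.61},
]
confident = filter_confident(example_answers, threshold=0.5)
print([a["answer"] for a in confident])
```

A suitable threshold depends on the model and the data, so it is worth tuning it on labelled examples rather than relying on a fixed default.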

Web Crawler

A Selenium-based web crawler is now part of Haystack, thanks to a contribution from @DIVYA-19. It takes a list of URLs as input and converts the extracted text to Haystack Documents.

⚠️ Breaking Changes

REST APIs

The REST APIs got a major revamp with this release.

The /doc-qa and /faq-qa endpoints are replaced with a more generic POST /query endpoint. This new endpoint uses Pipelines under the hood, which can be configured in rest_api/pipeline.yaml.

The new /query endpoint expects a single query per request instead of a list of query strings.
The new request format is:

{
    "query": "Why did the revenue change?"
}

and the response looks like this:

{
    "query": "Why did the revenue change?",
    "answers": [
        {
            "answer": "rapid technological change and evolving industry standards",
            "question": null,
            "score": 0.543937623500824,
            "probability": 0.014070278964936733,
            "context": "tion process. The market for our products is intensely competitive and is characterized by rapid technological change and evolving industry standards.",
            "offset_start": 91,
            "offset_end": 149,
            "offset_start_in_doc": 511,
            "offset_end_in_doc": 569,
            "document_id": "f30273b2-4d49-40d8-8824-43b3b6a0ea57",
            "meta": {
                "_split_id": "7"
            }
        },
        {
             // other answers
        }
    ]
}
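
For clients migrating from the old endpoints, the following is a minimal sketch of the single-query request format using only the Python standard library. The base URL `http://localhost:8000`, the helper names `build_query_request` and `best_answer`, and the second sample question are assumptions for illustration; only the `/query` path, the `query` field, and the `answers`/`answer`/`probability` keys come from the formats shown above.

```python
import json
from typing import Any, Dict, Optional
from urllib import request

# Assumption: the REST API is reachable at this base URL; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_query_request(query: str, base_url: str = BASE_URL) -> request.Request:
    """Build (but do not send) a POST /query request carrying a single query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def best_answer(response: Dict[str, Any]) -> Optional[str]:
    """Return the text of the highest-probability answer, or None if there are none."""
    answers = response.get("answers") or []
    if not answers:
        return None
    return max(answers, key=lambda a: a.get("probability", 0.0))["answer"]


# Clients that used to send a list of questions now issue one request per question.
questions = ["Why did the revenue change?", "Who is the CEO?"]
prepared = [build_query_request(q) for q in questions]

# To actually send a request (requires a running server):
#   with request.urlopen(prepared[0]) as resp:
#       payload = json.loads(resp.read())
#       print(best_answer(payload))
```

Building the requests separately from sending them keeps the payload shape easy to inspect and test without a live server.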

The /doc-qa-feedback & /faq-qa-feedback endpoints are replaced with a new generic /feedback endpoint.

Created At Timestamp

Previously, all documents/labels in SQLDocumentStore and FAISSDocumentStore had a field called created to store the creation timestamp, while ElasticsearchDocumentStore did not have any timestamp field. Now, all document stores have a created_at field for documents and labels.
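
If you have exported documents or labels as plain dictionaries that still carry the old field name, a small conversion step along these lines may help. This is only a sketch: `rename_created_field` is a hypothetical helper, the sample record is made up, and the code works on in-memory dicts rather than on any document store or database schema.

```python
from typing import Any, Dict


def rename_created_field(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with a legacy "created" key moved to "created_at".

    Hypothetical helper for exported records held as plain dicts; it does not
    touch any document store. An existing "created_at" value is kept as-is.
    """
    migrated = dict(record)
    if "created" in migrated:
        legacy_value = migrated.pop("created")
        migrated.setdefault("created_at", legacy_value)
    return migrated


# Made-up record, only to show the call.
old_record = {"id": "doc-1", "text": "hello", "created": "2021-03-01 12:00:00"}
new_record = rename_created_field(old_record)
print(new_record)
```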

RAGenerator

The top_k_answers parameter in the RAGenerator is renamed to top_k for consistency across Haystack components.

Custom Query for Elasticsearch

The placeholder terms in custom_query should no longer have quotes around them. See #762 in the detailed changes below for more details.
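
The sketch below shows what an unquoted placeholder looks like and why the convention is convenient: the substituted value can carry its own JSON quoting, which also lets a placeholder expand to a list. The field names (`text`, `title`, `year`) and the `render_query` helper are assumptions for illustration, and the substitution shown here is a stand-in built on `string.Template`, not Haystack's own code.

```python
import json
from string import Template

# Placeholders appear WITHOUT surrounding quotes.
# The field names ("text", "title", "year") are illustrative assumptions.
custom_query = """
{
    "size": 10,
    "query": {
        "bool": {
            "should": [{"multi_match": {
                "query": ${query},
                "type": "most_fields",
                "fields": ["text", "title"]}}],
            "filter": [{"terms": {"year": ${years}}}]
        }
    }
}
"""


def render_query(template: str, query: str, years: list) -> dict:
    """Substitute JSON-encoded values and parse the result.

    Stand-in for illustration only; this is not Haystack's substitution code.
    """
    rendered = Template(template).substitute(
        query=json.dumps(query),  # adds the quotes and escapes inner ones
        years=json.dumps(years),  # renders a JSON array
    )
    return json.loads(rendered)


body = render_query(custom_query, 'revenue "change"', ["2019", "2020"])
print(body["query"]["bool"]["should"][0]["multi_match"]["query"])
```

Had the template wrapped `${years}` in quotes, the rendered text would no longer be a valid JSON array, which is one reason to keep quoting out of the template.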

🤓 Detailed Changes

Pipeline

Fix execution of Pipelines with parallel nodes #901 (@oryx1729)
Add abstract run method to BaseComponent #887 (@tholor)
Add support for parallel paths in Pipeline #884 (@oryx1729)
Add runtime parameters to component initialization #873 (@oryx1729)
Add support for indexing pipelines #816 (@oryx1729)
Add translator with support for many generic input parameters #782 (@lalitpagaria)
Fix building Pipeline with YAML #800 (@oryx1729)
Load Pipeline with YAML config file #785 (@oryx1729)
Add evaluation nodes for Pipelines #904 (@brandenchan)
Fix passing a list as parameter value in Pipeline YAML #952 (@oryx1729)

Document Store

Fix Elasticsearch auth #871 (@grafke)
Allow more options for Elasticsearch client (auth, multiple hosts) #845 (@tholor)
Fix ElasticsearchDocumentStore.query_by_embedding() #823 (@oryx1729)
Introduce incremental updates for embeddings in document stores #812 (@oryx1729)
Add method to get metadata values for a key from Elasticsearch #776 (@oryx1729)
Fix refresh behaviour for Elasticsearch delete #794 (@oryx1729)
Milvus integration #771 (@lalitpagaria)
Add flag for use of window queries in SQLDocumentStore #768 (@oryx1729)
Remove quotes around placeholders in Elasticsearch custom query #762 (@oryx1729)
Fix delete_all_documents for the SQLDocumentStore #761 (@oryx1729)

Retriever

Improve DPR conversion #826 (@Timoeller)
Fix DPR training batch size #898 (@brandenchan)
Upgrade FAISS to 1.7.0 #834 (@tholor)
Allow non-standard Tokenizers (e.g. CamemBERT) for DPR via new arg #811 (@psorianom)

Modeling

Add model versioning support #784 (@brandenchan)
Improve preprocessing and adding of eval data #780 (@Timoeller)
SQuAD to DPR dataset converter #765 (@psorianom)
Remove RAG todos after transformers update #781 (@Timoeller)
Update FARM version #936 (@Timoeller)

REST API

Refactor REST APIs to use Pipelines #922 (@oryx1729)
Add PDF converter in Dockerfiles #877 (@oryx1729)
Update GPU Docker image (CUDA 11, fix FAISS) #836 (@tholor)
Add API endpoint to export accuracy metrics from user feedback + created_at timestamp #803 (@tholor)
Fix file upload API #808 (@oryx1729)

File Converter

Add Markdown file converter #875 (@lalitpagaria)
Fix encoding for pdftotext (Russian characters, German umlauts, etc.). Fix version in download instructions #813 (@tholor)

Crawler

Add crawler to get texts from websites #775 (@DIVYA-19)

Knowledge Graph

Add Knowledge Graph example #934 (@julian-risch)

Annotation Tool

Annotation Tool: data is not persisted when using local version #853 #855 (@venuraja79)

Search UI

Fix UI when API returns fewer answers than expected #828 (@tholor)

CI

Revamp CI #825 (@oryx1729)
Fix mypy typing #792 (@oryx1729)
Fix pdftotext dependency in CI #788 (@tholor)

Misc Fixes

Add indentation to markup files #947 (@julian-risch)
Reduce precision in pipeline eval print functions #943 (@lewtun)
Fix division by zero error in EvalRetriever #938 (@lewtun)
Log warning in FAISS and Milvus for filters #913 (@peteradorjan)
Fix "cannot allocate memory" exception by specifying max_processes #910 (@mosheber)
Fix error when is_impossible does not exist #870 (@voidful)
Fix validation for split_respect_sentence_boundary in Preprocessor #869 (@oryx1729)
Fix boolean progress_bar for disabling tqdm progressbar #863 (@tholor)
Remove conditional import of FAISS for Windows #819 (@oryx1729)
Make tqdm progress bars optional (less verbose prod logs) #796 (@tholor)
Fix error when is_impossible not is_impossible and JSON dump encoding error #868 (@voidful)
Fix NLTK download in Preprocessor #852 (@mrtunguyen)

Documentation

Add Milvus to the retriever / document store table #931 (@lewtun)
Fixing inconsistency #926 (@guillim)
Better default value for mp chunksize #923 (@Timoeller)
Run Grammarly over README.md #890 (@peterdemin)
Remove tf-idf youtube link #888 (@ms10596)
Add Milvus Documentation #838 (@brandenchan)
Fix link to Quick Demo in ToC. #831 (@aantti)
Revamp Readme #820 (@brandenchan)
Update tutorials (torch versions, ES version, replace Finder with Pipeline) #814 (@tholor)
Choose correct similarity fns during benchmark runs & re-run benchmarks #773 (@brandenchan)
Docs v0.7.0 #757 (@PiffPaffM)
Fix top_k param in RAG tutorials #906 (@Timoeller)
Integrate sentence transformers into benchmarks #843 (@Timoeller)

🙏 Thanks to our contributors

A big thank you to all the contributors for this release: @aantti, @brandenchan, @DIVYA-19, @grafke, @guillim, @julian-risch, @lalitpagaria, @lewtun, @mosheber, @mrtunguyen, @ms10596, @oryx1729, @peteradorjan, @PiffPaffM, @psorianom, @tholor, @Timoeller, @venuraja79, and @voidful.

We would like to thank everyone who participated in the insightful discussions on GitHub and our community Slack!

v0.8.0