milvus-io · AnthonyTsu1984 · Sep 19, 2024 · Sep 16, 2024 · Sep 16, 2024 · Sep 17, 2024
diff --git a/site/en/embeddings/embed-with-jina.md b/site/en/embeddings/embed-with-jina.md
@@ -24,22 +24,44 @@ Then, instantiate the `JinaEmbeddingFunction`:
 from pymilvus.model.dense import JinaEmbeddingFunction
 
 jina_ef = JinaEmbeddingFunction(
-    model_name="jina-embeddings-v2-base-en", # Defaults to `jina-embeddings-v2-base-en`
-    api_key=JINAAI_API_KEY # Provide your Jina AI API key
+    model_name="jina-embeddings-v3", # Defaults to `jina-embeddings-v3`
+    api_key=JINAAI_API_KEY, # Provide your Jina AI API key
+    task="retrieval.passage", # Specify the task
+    dimensions=1024, # Defaults to 1024
 )
 ```
 
 __Parameters__:
 
 - `model_name` (*string*)
 
-  The name of the Jina AI embedding model to use for encoding. You can specify any of the available Jina AI embedding model names, for example, `jina-embeddings-v2-base-en`, `jina-embeddings-v2-small-en`, etc. If you leave this parameter unspecified, `jina-embeddings-v2-base-en` will be used. For a list of available models, refer to [Jina Embeddings](https://jina.ai/embeddings).
+  The name of the Jina AI embedding model to use for encoding. You can specify any of the available Jina AI embedding model names, for example, `jina-embeddings-v3`, `jina-embeddings-v2-base-en`, etc. If you leave this parameter unspecified, `jina-embeddings-v3` will be used. For a list of available models, refer to [Jina Embeddings](https://jina.ai/embeddings).
 
 - `api_key` (*string*)
 
   The API key for accessing the Jina AI API.
 
-To create embeddings for documents, use the `encode_documents()` method:
+- `task` (*string*)
+
+  The type of input passed to the model. Required for embedding models v3 and higher.
+
+  - `"retrieval.passage"`: Used to encode large documents in retrieval tasks at indexing time.
+  - `"retrieval.query"`: Used to encode user queries or questions in retrieval tasks.
+  - `"classification"`: Used to encode text for text classification tasks.
+  - `"text-matching"`: Used to encode text for similarity matching, such as measuring similarity between two sentences.
+  - `"clustering"`: Used for clustering or reranking tasks.
+
+- `dimensions` (*int*)
+
+  The number of dimensions the resulting output embeddings should have. Defaults to 1024. Only supported for embedding models v3 and higher. 
+
+- `late_chunking` (*bool*)
+
+  Controls whether to use [late chunking](https://arxiv.org/abs/2409.04701), the chunking method Jina AI introduced in September 2024, when encoding a batch of sentences. Defaults to `False`. When set to `True`, the Jina AI API concatenates all sentences in the input field and feeds them to the model as a single string. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list.
+
+To create embeddings for documents, use the `encode_documents()` method. It is designed for document embeddings in asymmetric retrieval tasks, such as indexing documents for search or recommendation, and uses `retrieval.passage` as the task.
+
 
 ```python
 docs = [
@@ -59,18 +81,18 @@ print("Dim:", jina_ef.dim, docs_embeddings[0].shape)
 The expected output is similar to the following:
 
 ```python
-Embeddings: [array([-4.88487840e-01, -4.28095880e-01,  4.90086500e-01, -1.63274320e-01,
-        3.43437800e-01,  3.21476880e-01,  2.83173790e-02, -3.10403670e-01,
-        4.76985040e-01, -1.77410420e-01, -3.84803180e-01, -2.19224200e-01,
-       -2.52898000e-01,  6.62411900e-02, -8.58173100e-01,  1.05221800e+00,
+Embeddings: [array([9.80641991e-02, -8.51697400e-02,  7.36531913e-02,  1.42558888e-02,
+       -2.23589484e-02,  1.68494112e-03, -3.50753777e-02, -3.11530549e-02,
+       -3.26012149e-02,  5.04568312e-03,  3.69836427e-02,  3.48948985e-02,
+        8.19722563e-03,  5.88679723e-02, -6.71099266e-03, -1.82369724e-02,
 ...
-       -2.04462400e-01,  7.14229800e-01, -1.66823000e-01,  8.72551440e-01,
-        5.53560140e-01,  8.92506300e-01, -2.39408610e-01, -4.22413560e-01,
-       -3.19551350e-01,  5.59153850e-01,  2.44338100e-01, -8.60452100e-01])]
-Dim: 768 (768,)
+        2.48654783e-02,  3.43279652e-02, -1.66154150e-02, -9.90478322e-03,
+       -2.96043139e-03, -8.57473817e-03, -7.39028037e-04,  6.25024503e-03,
+       -1.08831357e-02, -4.00776342e-02,  3.25369164e-02, -1.42691191e-03])]
+Dim: 1024 (1024,)
 ```
 
-To create embeddings for queries, use the `encode_queries()` method:
+To create embeddings for queries, use the `encode_queries()` method. It is designed for query embeddings in asymmetric retrieval tasks, such as search queries or questions, and uses `retrieval.query` as the task.
 
 ```python
 queries = ["When was artificial intelligence founded", 
@@ -85,13 +107,41 @@ print("Dim", jina_ef.dim, query_embeddings[0].shape)
 The expected output is similar to the following:
 
 ```python
-Embeddings: [array([-5.99164660e-01, -3.49827350e-01,  8.22405160e-01, -1.18632730e-01,
-        5.78107540e-01,  1.09789170e-01,  2.91604200e-01, -3.29306450e-01,
-        2.93779640e-01, -2.17880800e-01, -6.84535440e-01, -3.79752000e-01,
-       -3.47541800e-01,  9.20846100e-02, -6.13804400e-01,  6.31312800e-01,
+Embeddings: [array([8.79201014e-03,  1.47551354e-02,  4.02722731e-02, -2.52991207e-02,
+        1.12719582e-02,  3.75947170e-02,  3.97946090e-02, -7.36681819e-02,
+       -2.17952449e-02, -1.16298944e-02, -6.83426252e-03, -5.12507409e-02,
+        5.26071340e-02,  6.75181448e-02,  3.92445624e-02, -1.40817231e-02,
 ...
-       -1.84993740e-02,  9.38629150e-01,  2.74858470e-02,  1.09396360e+00,
-        3.96270750e-01,  7.44445800e-01, -1.95404050e-01, -6.08383200e-01,
-       -3.75076300e-01,  3.87512200e-01,  8.11889650e-01, -3.76407620e-01])]
-Dim 768 (768,)
+        8.81703943e-03,  4.24629413e-02, -2.32944116e-02, -2.05193572e-02,
+       -3.22035812e-02,  2.81896023e-03,  3.85326855e-02,  3.64372656e-02,
+       -1.65050142e-02, -4.26847413e-02,  2.02664156e-02, -1.72684863e-02])]
+Dim 1024 (1024,)
+```
+
+To create embeddings for similarity matching (such as semantic textual similarity or symmetric retrieval), text classification, clustering, or reranking, set the `task` parameter to the matching value when instantiating the `JinaEmbeddingFunction` class.
+
+
+```python
+from pymilvus.model.dense import JinaEmbeddingFunction
+
+jina_ef = JinaEmbeddingFunction(
+    model_name="jina-embeddings-v3", # Defaults to `jina-embeddings-v3`
+    api_key=JINAAI_API_KEY, # Provide your Jina AI API key
+    task="text-matching",
+    dimensions=1024, # Defaults to 1024
+)
+
+texts = [
+    "Follow the white rabbit.",  # English
+    "Sigue al conejo blanco.",  # Spanish
+    "Suis le lapin blanc.",  # French
+    "跟着白兔走。",  # Chinese
+    "اتبع الأرنب الأبيض.",  # Arabic
+    "Folge dem weißen Kaninchen.",  # German
+]
+
+embeddings = jina_ef(texts)
+
+# Compute similarities
+print(embeddings[0] @ embeddings[1].T)
 ```
diff --git a/site/en/integrations/integrate_with_jina.md b/site/en/integrations/integrate_with_jina.md
@@ -42,13 +42,18 @@ Jina AI's core embedding model, excels in understanding detailed text, making it
 from pymilvus.model.dense import JinaEmbeddingFunction
 
 jina_api_key = "<YOUR_JINA_API_KEY>"
-ef = JinaEmbeddingFunction("jina-embeddings-v2-base-en", jina_api_key)
+ef = JinaEmbeddingFunction(
+    "jina-embeddings-v3", 
+    jina_api_key,
+    task="retrieval.passage",
+    dimensions=1024
+)
 
 query = "what is information retrieval?"
 doc = "Information retrieval is the process of finding relevant information from a large collection of data or documents."
 
-qvecs = ef.encode_queries([query])
-dvecs = ef.encode_documents([doc])
+qvecs = ef.encode_queries([query])  # This method uses `retrieval.query` as the task
+dvecs = ef.encode_documents([doc])  # This method uses `retrieval.passage` as the task
 ```
 
 ## Bilingual Embeddings
@@ -120,8 +125,14 @@ from pymilvus.model.dense import JinaEmbeddingFunction
 from pymilvus import MilvusClient
 
 jina_api_key = "<YOUR_JINA_API_KEY>"
-ef = JinaEmbeddingFunction("jina-embeddings-v2-base-en", jina_api_key)
-DIMENSION = 768  # size of jina-embeddings-v2-base-en
+DIMENSION = 1024  # `jina-embeddings-v3` supports output sizes of 32, 64, 128, 256, 512, 768, and 1024, so embeddings can be truncated to fit your application
+ef = JinaEmbeddingFunction(
+    "jina-embeddings-v3", 
+    jina_api_key,
+    task="retrieval.passage",
+    dimensions=DIMENSION,
+)
+
 
 doc = [
     "In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.",
@@ -130,7 +141,7 @@ doc = [
     "The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems.",
 ]
 
-dvecs = ef.encode_documents(doc)
+dvecs = ef.encode_documents(doc) # This method uses `retrieval.passage` as the task
 
 data = [
     {"id": i, "vector": dvecs[i], "text": doc[i], "subject": "history"}
@@ -162,7 +173,7 @@ With all data in Milvus vector database, we can now perform semantic search by g
 
 ```python
 queries = "What event in 1956 marked the official birth of artificial intelligence as a discipline?"
-qvecs = ef.encode_queries([queries])
+qvecs = ef.encode_queries([queries]) # This method uses `retrieval.query` as the task
 
 res = milvus_client.search(
     collection_name=COLLECTION_NAME,  # target collection