I was facing an issue when inserting data using JSON input files: if the order of keys differed from the schema order, an error was thrown.

Example:
input:

```json
"rows": [
  {"book_id": 101, "word_count": 13, "book_intro": [1.1, 1.2], "book_name": "bhaskar_test_1"},
  {"book_id": 102, "word_count": 25, "book_intro": [2.1, 2.2], "book_name": "bhaskar_test_2"},
  {"book_id": 103, "word_count": 7, "book_intro": [3.1, 3.2], "book_name": "bhaskar_test_3"},
  {"book_id": 104, "word_count": 12, "book_intro": [4.1, 4.2], "book_name": "bhaskar_test_4"},
  {"book_id": 105, "word_count": 34, "book_intro": [5.1, 5.2], "book_name": "bhaskar_test_5"}
]
```
While creating the schema:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

book_id = FieldSchema(
    name="book_id",
    dtype=DataType.INT64,
    is_primary=True,
)
book_name = FieldSchema(
    name="book_name",
    dtype=DataType.VARCHAR,
    max_length=200,
)
word_count = FieldSchema(
    name="word_count",
    dtype=DataType.INT64,
)
book_intro = FieldSchema(name="book_intro", dtype=DataType.FLOAT_VECTOR, dim=2)

schema = CollectionSchema(
    fields=[book_id, book_name, word_count, book_intro],
    description="Test book search",
)
collection_name = "book"
collection = Collection(
    name=collection_name,
    schema=schema,
    using=conn_name,
    shards_num=2,
)
```
As you can see, the order is different:

```python
list(i.name for i in tmp_fields)    # ['book_id', 'book_name', 'word_count', 'book_intro']
list(i.name for i in infer_fields)  # ['book_id', 'word_count', 'book_intro', 'book_name']
```
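The mispairing can be reproduced with a minimal sketch in plain Python (field names taken from the example above; this stands in for the real pymilvus validation loop):

```python
# Schema fields in declaration order vs. fields inferred from the JSON key order.
tmp_fields = ["book_id", "book_name", "word_count", "book_intro"]
infer_fields = ["book_id", "word_count", "book_intro", "book_name"]

# zip() pairs the two lists positionally, so once the orders diverge,
# every subsequent field is validated against the wrong counterpart.
mismatches = [(t, i) for t, i in zip(tmp_fields, infer_fields) if t != i]
print(mismatches)
# [('book_name', 'word_count'), ('word_count', 'book_intro'), ('book_intro', 'book_name')]
```

Only `book_id` happens to line up; the other three pairs are wrong, which is why the type check fails.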
which raised the exception below:

```
Exception has occurred: DataNotMatchException
(note: full exception trace is shown but execution is paused at: _run_module_as_main)
<DataNotMatchException: (code=1, message=The data type of field book_name doesn't match, expected: VARCHAR, got INT64)>
```
In other words, the mismatch reported for `book_name` actually comes from the data of the input field `word_count`:

"The data type of field book_name doesn't match, expected: VARCHAR, got INT64 for input field name word_count"
Solution:

I sort both field lists by name before validation, so that `zip()` pairs each schema field with the matching inferred field:

```python
infer_fields.sort(key=lambda x: x.name)
tmp_fields.sort(key=lambda x: x.name)
```
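A minimal sketch of the effect of the sort (using simple named tuples in place of the real pymilvus `FieldSchema` objects, with names and types from the example above):

```python
from collections import namedtuple

Field = namedtuple("Field", ["name", "dtype"])

# Schema order vs. JSON key order, as in the failing example.
tmp_fields = [Field("book_id", "INT64"), Field("book_name", "VARCHAR"),
              Field("word_count", "INT64"), Field("book_intro", "FLOAT_VECTOR")]
infer_fields = [Field("book_id", "INT64"), Field("word_count", "INT64"),
                Field("book_intro", "FLOAT_VECTOR"), Field("book_name", "VARCHAR")]

# Sort both lists by field name so zip() pairs matching fields.
infer_fields.sort(key=lambda x: x.name)
tmp_fields.sort(key=lambda x: x.name)

for tmp, infer in zip(tmp_fields, infer_fields):
    assert tmp.name == infer.name and tmp.dtype == infer.dtype
print("all fields match")
```

Since both lists are sorted with the same key, each position now holds the same field in both lists, and the per-field type check compares like with like.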