I was facing an issue when inserting data using JSON input files: if the order of keys differed from the schema order, an error was thrown.

Example:
input:

```json
"rows": [
  {"book_id": 101, "word_count": 13, "book_intro": [1.1, 1.2], "book_name": "bhaskar_test_1"},
  {"book_id": 102, "word_count": 25, "book_intro": [2.1, 2.2], "book_name": "bhaskar_test_2"},
  {"book_id": 103, "word_count": 7, "book_intro": [3.1, 3.2], "book_name": "bhaskar_test_3"},
  {"book_id": 104, "word_count": 12, "book_intro": [4.1, 4.2], "book_name": "bhaskar_test_4"},
  {"book_id": 105, "word_count": 34, "book_intro": [5.1, 5.2], "book_name": "bhaskar_test_5"}
]
```
While creating the schema:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

book_id = FieldSchema(
    name="book_id",
    dtype=DataType.INT64,
    is_primary=True,
)
book_name = FieldSchema(
    name="book_name",
    dtype=DataType.VARCHAR,
    max_length=200,
)
word_count = FieldSchema(
    name="word_count",
    dtype=DataType.INT64,
)
book_intro = FieldSchema(name="book_intro", dtype=DataType.FLOAT_VECTOR, dim=2)

schema = CollectionSchema(
    fields=[book_id, book_name, word_count, book_intro],
    description="Test book search",
)
collection_name = "book"
collection = Collection(
    name=collection_name,
    schema=schema,
    using=conn_name,
    shards_num=2,
)
```
As you can see, the order is different:

```python
list(i.name for i in tmp_fields)    # ['book_id', 'book_name', 'word_count', 'book_intro']
list(i.name for i in infer_fields)  # ['book_id', 'word_count', 'book_intro', 'book_name']
```
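The mispairing can be reproduced with a minimal sketch in plain Python (field names taken from the example above; this stands in for the real pymilvus validation loop):

```python
# Schema fields in declaration order vs. fields inferred from the JSON key order.
tmp_fields = ["book_id", "book_name", "word_count", "book_intro"]
infer_fields = ["book_id", "word_count", "book_intro", "book_name"]

# zip() pairs the two lists positionally, so once the orders diverge,
# every subsequent field is validated against the wrong counterpart.
mismatches = [(t, i) for t, i in zip(tmp_fields, infer_fields) if t != i]
print(mismatches)
# [('book_name', 'word_count'), ('word_count', 'book_intro'), ('book_intro', 'book_name')]
```

Only `book_id` happens to line up; the other three pairs are wrong, which is why the type check fails.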
which raised the exception below:

```
Exception has occurred: DataNotMatchException
(note: full exception trace is shown but execution is paused at: _run_module_as_main)
<DataNotMatchException: (code=1, message=The data type of field book_name doesn't match, expected: VARCHAR, got INT64)>
```
In other words, the mismatch reported for `book_name` actually comes from the data of the input field `word_count`:

"The data type of field book_name doesn't match, expected: VARCHAR, got INT64 for input field name word_count"
Solution:

I sort both field lists by name before validation, so that `zip()` pairs each schema field with the matching inferred field:

```python
infer_fields.sort(key=lambda x: x.name)
tmp_fields.sort(key=lambda x: x.name)
```
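A minimal sketch of the effect of the sort (using simple named tuples in place of the real pymilvus `FieldSchema` objects, with names and types from the example above):

```python
from collections import namedtuple

Field = namedtuple("Field", ["name", "dtype"])

# Schema order vs. JSON key order, as in the failing example.
tmp_fields = [Field("book_id", "INT64"), Field("book_name", "VARCHAR"),
              Field("word_count", "INT64"), Field("book_intro", "FLOAT_VECTOR")]
infer_fields = [Field("book_id", "INT64"), Field("word_count", "INT64"),
                Field("book_intro", "FLOAT_VECTOR"), Field("book_name", "VARCHAR")]

# Sort both lists by field name so zip() pairs matching fields.
infer_fields.sort(key=lambda x: x.name)
tmp_fields.sort(key=lambda x: x.name)

for tmp, infer in zip(tmp_fields, infer_fields):
    assert tmp.name == infer.name and tmp.dtype == infer.dtype
print("all fields match")
```

Since both lists are sorted with the same key, each position now holds the same field in both lists, and the per-field type check compares like with like.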