[Enhancement]: Add a property to ExtraList for extracting data #2362

Open
1 task done
CaoHaiNam opened this issue Nov 20, 2024 · 1 comment
Assignees
Labels
kind/enhancement New feature or request

Comments

@CaoHaiNam
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

What would you like to be added?

I would like the ExtraList class to expose a property that returns its underlying data as a plain list, without the extra metadata or custom string formatting.

Why is this needed?

Currently, on the latest master branch, search and query functions return an ExtraList object. The extra metadata is helpful, but the wrapper makes it awkward to get at the underlying data directly for further processing or calculations.

Adding this feature will improve usability and make the class more intuitive for users who need direct access to the data.

Anything else?

Current behavior:

result = ExtraList([1, 2, 3], extra={"total": 3})
print(result)  # Output: data: ['1', '2', '3'], extra_info: {'total': 3}

# Attempting to process data
sum(result)  # Fails, as `ExtraList` doesn't directly behave like a list for some operations.

Desired behavior:

result = ExtraList([1, 2, 3], extra={"total": 3})
print(result.data)  # Output: [1, 2, 3]
sum(result.data)    # Works as expected.
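
To make the request concrete, here is a minimal sketch of what such a property could look like. The class name `ExtraListSketch` is a hypothetical stand-in written for illustration; it is not the pymilvus implementation, and it assumes the real class is (or could be) built on top of `list` with an `extra` attribute.

```python
from typing import Any, Dict, List, Optional


class ExtraListSketch(list):
    """Hypothetical stand-in for ExtraList; NOT the pymilvus implementation."""

    def __init__(self, *args: Any, extra: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(*args)
        # Metadata travels alongside the list items.
        self.extra = extra or {}

    @property
    def data(self) -> List[Any]:
        # Return a shallow copy as a plain list, without the extra metadata.
        return list(self)


result = ExtraListSketch([1, 2, 3], extra={"total": 3})
plain = result.data
```

Returning a shallow copy means that changes to `plain` do not affect `result`. Whether a copy or a view is preferable is a design choice for the maintainers.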

Here’s an example of the current issue when using ExtraList with the Milvus query functionality:

import random

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections

# Connect to a local Milvus instance
connections.connect("default", host="localhost", port="19530")

# Define schema and collection
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("film_date", DataType.INT64),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection("test_collection_query", schema)

# Insert sample data
data = [
    [i for i in range(10)],
    [i + 2000 for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
collection.insert(data)

# Create index and load collection
index_param = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
collection.create_index("films", index_param)
collection.load()

# Perform query
expr = "film_id == 3"
res = collection.query(expr, output_fields=["film_date"])

# Current behavior
print(res)
# Output: data: ["{'film_id': 3, 'film_date': 2003}"], cannot access data directly

# Desired behavior with a new property
print(res.data)
# Output: [{'film_id': 3, 'film_date': 2003}], a plain list object accessible for further calculations.

In the example above, querying the Milvus collection returns an ExtraList, which makes it difficult to extract the query result directly for further processing. A .data property, or a similar method, would give users direct access to the query results for calculations, transformations, or other downstream processing.
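
Until such a property exists, a possible workaround is to copy the rows into plain Python objects before processing them. The sketch below assumes only that the query result is iterable and that each row behaves like a mapping; I have not verified this against every pymilvus version. The helper name `to_plain_rows` is hypothetical, and a hard-coded list stands in for a real query result.

```python
from typing import Any, Dict, Iterable, List


def to_plain_rows(result: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: copy an iterable of row mappings into plain dicts."""
    return [dict(row) for row in result]


# Stand-in for a query result; a real one would come from collection.query(...).
rows = to_plain_rows([{"film_id": 3, "film_date": 2003}])

# Downstream processing on plain Python objects.
film_dates = [row["film_date"] for row in rows]
```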

@CaoHaiNam
Contributor Author

/assign @CaoHaiNam
