Increasing search time when using page and limit #1309

Open
christianskou07 opened this issue Nov 12, 2024 · 0 comments

If one wants to fetch all attributes of a certain type, the recommended approach seems to be to use the page and limit parameters, iterating through pages until the number of attributes returned is less than limit.

As an example, I have more than 5 million attributes of type md5 in my instance and want to fetch all of them.

From experiments it seems that, regardless of the value of limit, the search time increases with the page number. I measured it as follows:

import time

def fetch_attributes(self):
    response_count = 1
    l = 20000  # attributes per page
    p = 1      # current page
    sum_attr = 0
    sum_time = 0
    while response_count > 0:
        t0 = time.time()
        attributes = self.client.search(controller="attributes", return_format="json", type_attribute=["md5"], page=p, limit=l)
        t1 = time.time()
        total = t1 - t0
        response_count = len(attributes["Attribute"])
        p += 1
        sum_attr += response_count
        sum_time += total
        print(f"fetched {response_count} attributes in {total}, sum_attr = {sum_attr}, sum_time = {sum_time}")

Example output (see attachment for full output):

fetched 20000 attributes in 4.9379072189331055, sum_attr = 20000, sum_time = 4.9379072189331055
fetched 20000 attributes in 4.651666879653931, sum_attr = 40000, sum_time = 9.589574098587036
fetched 20000 attributes in 4.8340137004852295, sum_attr = 60000, sum_time = 14.423587799072266
fetched 20000 attributes in 3.9235310554504395, sum_attr = 80000, sum_time = 18.347118854522705
fetched 20000 attributes in 4.641859292984009, sum_attr = 100000, sum_time = 22.988978147506714
...
fetched 20000 attributes in 12.544357299804688, sum_attr = 1380000, sum_time = 558.8374326229095
fetched 20000 attributes in 11.658548831939697, sum_attr = 1400000, sum_time = 570.4959814548492
fetched 20000 attributes in 12.361718893051147, sum_attr = 1420000, sum_time = 582.8577003479004
fetched 20000 attributes in 13.921313285827637, sum_attr = 1440000, sum_time = 596.779013633728

Please do not focus on the absolute values, but rather on the clear trend of queries taking longer and longer as pages are iterated.
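
My guess (an assumption on my part, I have not verified this against the MISP code) is that page/limit is translated into an SQL LIMIT ... OFFSET ... query, where the database still has to scan and discard all rows before the offset, so the per-query cost grows roughly linearly with the page number even though each page returns the same number of rows. A toy cost model of that effect:

```python
# Toy model of OFFSET-based pagination (hypothetical cost model, not MISP code):
# to serve page p, the database walks past (p - 1) * limit skipped rows
# plus the limit rows it actually returns.
def page_cost(page, limit):
    return (page - 1) * limit + limit  # rows scanned for this page

limit = 20000
costs = [page_cost(p, limit) for p in range(1, 6)]
print(costs)  # per-page cost grows linearly with the page number
```

If that is the cause, this would match the output above, where pages deep into the result set take roughly three times as long as the first pages.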

Below I have included some of the configuration parameters I found relevant. From htop it does not look like CPU or memory is exhausted, though I am not an expert at interpreting it.

php memory_limit is set to 8192 MB.

/etc/my.cnf.d/server.cnf:

...
[mysqld]
datadir=/data/mysql-data
innodb_buffer_pool_size=4G
innodb_io_capacity=1000
innodb_log_file_size=600MB
innodb_read_io_threads=16
...

MISP version 2.4.198
PyMISP version 2.5.1
MariaDB version 11.4.3

Full print output:
output.txt

Please let me know if you need any more information, or if this issue belongs in the MISP project instead.
