Increasing search time when using page and limit #1309

Open
christianskou07 opened this issue Nov 12, 2024 · 0 comments

If one wants to fetch all attributes of a certain type, the recommended approach seems to be to use the page and limit parameters, iterating through pages until the number of attributes returned is less than limit.

As an example, I have more than 5 million attributes of type md5 in my instance and want to fetch all of them.

From experiments it seems that, regardless of the value of limit, the search time increases with the page number. I measured it as follows:

import time

def fetch_attributes(self):
    response_count = 1
    l = 20000  # attributes per page
    p = 1      # current page
    sum_attr = 0
    sum_time = 0
    while response_count > 0:
        t0 = time.time()
        attributes = self.client.search(controller="attributes", return_format="json", type_attribute=["md5"], page=p, limit=l)
        t1 = time.time()
        total = t1 - t0
        response_count = len(attributes["Attribute"])
        p += 1
        sum_attr += response_count
        sum_time += total
        print(f"fetched {response_count} attributes in {total}, sum_attr = {sum_attr}, sum_time = {sum_time}")

Example output (see attachment for full output):

fetched 20000 attributes in 4.9379072189331055, sum_attr = 20000, sum_time = 4.9379072189331055
fetched 20000 attributes in 4.651666879653931, sum_attr = 40000, sum_time = 9.589574098587036
fetched 20000 attributes in 4.8340137004852295, sum_attr = 60000, sum_time = 14.423587799072266
fetched 20000 attributes in 3.9235310554504395, sum_attr = 80000, sum_time = 18.347118854522705
fetched 20000 attributes in 4.641859292984009, sum_attr = 100000, sum_time = 22.988978147506714
...
fetched 20000 attributes in 12.544357299804688, sum_attr = 1380000, sum_time = 558.8374326229095
fetched 20000 attributes in 11.658548831939697, sum_attr = 1400000, sum_time = 570.4959814548492
fetched 20000 attributes in 12.361718893051147, sum_attr = 1420000, sum_time = 582.8577003479004
fetched 20000 attributes in 13.921313285827637, sum_attr = 1440000, sum_time = 596.779013633728

Please do not focus on the absolute values, but rather on the clear trend of queries taking longer and longer as pages are iterated.
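
My guess (an assumption on my part, I have not verified this against the MISP code) is that page/limit is translated into an SQL LIMIT ... OFFSET ... query, where the database still has to scan and discard all rows before the offset, so the per-query cost grows roughly linearly with the page number even though each page returns the same number of rows. A toy cost model of that effect:

```python
# Toy model of OFFSET-based pagination (hypothetical cost model, not MISP code):
# to serve page p, the database walks past (p - 1) * limit skipped rows
# plus the limit rows it actually returns.
def page_cost(page, limit):
    return (page - 1) * limit + limit  # rows scanned for this page

limit = 20000
costs = [page_cost(p, limit) for p in range(1, 6)]
print(costs)  # per-page cost grows linearly with the page number
```

If that is the cause, this would match the output above, where pages deep into the result set take roughly three times as long as the first pages.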

Below I have included some of the configuration parameters I found relevant. From htop it does not look like CPU or memory is exhausted, though I am not an expert at interpreting it.

php memory_limit is set to 8192 MB.

/etc/my.cnf.d/server.cnf:

...
[mysqld]
datadir=/data/mysql-data
innodb_buffer_pool_size=4G
innodb_io_capacity=1000
innodb_log_file_size=600MB
innodb_read_io_threads=16
...

MISP version 2.4.198
PyMISP version 2.5.1
MariaDB version 11.4.3

Full print output:
output.txt

Please let me know if you need any more information, or if this issue belongs in the MISP project instead.
