Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory limit (total) exceeded in Dataset preview #135

Open
handgunman opened this issue May 3, 2024 · 2 comments
Open

Memory limit (total) exceeded in Dataset preview #135

handgunman opened this issue May 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@handgunman
Copy link
Contributor

When adding computed fields with aggregation to the dataset preview, a query is built that consumes excessive memory in Clickhouse.
A query of the form:
default
query: SELECT
sum(t1.sumfield) AS res_0,
...
t1.id AS res_4,
t1.datecommit AS res_5,
t1.ndssum AS res_6,
t1.numberfield AS res_7
...
FROM "default".cheks_v AS t1
GROUP BY
res_1,
res_2,
res_3,
res_4,
res_5,
res_6,
res_7,
...

In fact, grouping is performed on all fields of the sample.
And only then the limitation is done.
If the table size is comparable to the RAM size, an error occurs.
Since grouping by all fields is performed, you should change the query to this structure:

select sum(a), a,b,c
from (select a,b,c from t limit 10)
group by a,b,c

limit before group by

Otherwise, it is almost impossible to use the preview on large amounts of data.

@KonstantAnxiety
Copy link
Contributor

Hey @handgunman! Your suggestion totally makes sense, but if we were to just put the LIMIT inside the original subquery (the one that select fields from selected tables) instead of limiting the result after all calculations, the resulting values would change and in a lot of cases be different from what one expects, all in all, this is an expected behaviour.

We are currently working out ways to make dataset previews more user-friendly when they become too heavy to calculate, we may as well consider limiting the portion of the original table(s) when querying the source for a preview

@resure resure added the enhancement New feature or request label May 22, 2024
@handgunman
Copy link
Contributor Author

Hi!
Since we are talking about a preview where basically GROUP BY ALL is done, there will be no difference in the result.

https://fiddle.clickhouse.com/acfbd276-d3eb-4899-bba5-824ec9690da4

Of course you can't do that in charts that don't have GROUP BY ALL. But in preview mode, need to do that.

Because the problem is not some heavy calculations that the user does. The problem is grouping by all fields, which the DBMS tries to perform on heavy tables.
In charts, there are usually several fields involved in the grouping and no such problem there.

Cheers!
Sergey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants