Querying a parquet file is 4 times slower than clickhouse-local #115
Comments
A parquet file 1/10 the size has the same problem.
|
Thanks for the report @l1t1, we appreciate it! This definitely helps refine our scope. The next chdb versions should improve any gaps in performance, although clickhouse-local and chdb are built slightly differently (jemalloc, for one), so details about the execution context are extremely important. I cannot reproduce this performance gap on my system, for instance.
|
It is not only when querying a file; querying numbers_mt() shows the same issue. The max_threads values are the same.
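For context, a rough sketch of how the setting and a numbers_mt() timing can be compared from chdb's Python API (the specific aggregation and the "CSV" format argument are illustrative assumptions, not the exact query from this report):

```python
import time
import chdb

# Check the setting chdb actually uses, then time a numbers_mt() scan.
print(chdb.query("SELECT getSetting('max_threads')", "CSV"))

start = time.time()
chdb.query("SELECT avg(number) FROM numbers_mt(1, 1000000000)", "CSV")
print(f"chdb numbers_mt elapsed: {time.time() - start:.1f}s")
```

The same two statements can be run with `clickhouse local -q "..."` to confirm that max_threads matches on both sides of the comparison.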
|
I ran them on Termux + proot-distro Ubuntu 22.04.
|
Another phone didn't show the jemalloc message but has the same issue too.
The CPU info:
|
Maybe it only occurs on a slow CPU; on the faster phone, when increasing the data size, chdb runs as fast as clickhouse-local.
|
Thanks for the details @l1t1, this is interesting and most likely means the performance issue is related to jemalloc use in chdb, or even to build options for certain processors or instruction sets. Could we compare the two CPUs' features by any chance?
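Something like the sketch below could dump the model and feature flags on each phone for comparison (just an illustration, assuming /proc/cpuinfo is readable in the Termux/proot environment; ARM kernels report a `Features` line, x86 a `flags` line):

```python
# Print CPU model and feature flags so the two phones can be compared.
# Assumes a Linux /proc/cpuinfo is available.
with open("/proc/cpuinfo") as f:
    seen = set()
    for line in f:
        key = line.split(":", 1)[0].strip().lower()
        if key in ("model name", "hardware", "features", "flags") and key not in seen:
            seen.add(key)
            print(line.strip())
```
|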
The slow CPU mentioned is:
When increasing the data size (not the parquet file size), the gap is smaller.
|
(you don't have to strictly follow this form)
Describe the situation
SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));
chdb takes 400 s, clickhouse-local takes 100 s.
How to reproduce
CREATE TABLE statements for all tables involved:
select number::int i FROM numbers_mt(1,1000000000) t into outfile '/data/t.parquet';
SELECT avg(i) FROM file('/data/t.parquet') group by round(log10(i));
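For reference, a minimal sketch of how the chdb side of this comparison could be timed from Python (the query and file path are from above; the "CSV" output format argument is an assumption):

```python
import time
import chdb

# Same aggregation as above, run through chdb's Python API and timed.
# /data/t.parquet is the file produced by the INTO OUTFILE statement.
sql = """
SELECT avg(i)
FROM file('/data/t.parquet')
GROUP BY round(log10(i))
"""

start = time.time()
result = chdb.query(sql, "CSV")  # output format assumed
print(result)
print(f"chdb elapsed: {time.time() - start:.1f}s")
```

The clickhouse-local side can be timed the same way from the shell, e.g. `time clickhouse local -q "SELECT avg(i) FROM file('/data/t.parquet') GROUP BY round(log10(i))"`, so both numbers come from the same machine and file.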
Expected performance
What are your performance expectations, and why do you think they are realistic? Has it been working faster in older ClickHouse releases? Is it working faster in some other specific system?
I hope chdb runs as fast as clickhouse local.
Additional context
Add any other context about the problem here.
BTW, for the generating statement
select number::int i FROM numbers_mt(1,1000000000) t into outfile '/data/t.parquet';
chdb runs as fast as clickhouse-local.